Publications



RDF Digest: Efficient Summarization of RDF/S KBs

The exponential growth of the web and the extended use of semantic web technologies have brought to the fore the need for quick understanding, flexible exploration and selection of complex web documents and schemas. To this end, ontology summarization aspires to produce an abridged version of the original ontology that highlights its most representative concepts. In this paper, we present RDF Digest, a novel platform that automatically produces summaries of RDF/S Knowledge Bases (KBs). A summary is a valid RDFS document/graph that includes the most representative concepts of the schema, adapted to the corresponding instances. To construct this graph, our algorithm exploits the semantics and the structure of the schema and the distribution of the corresponding data/instances. A preliminary evaluation demonstrates the benefits of our approach and the considerable advantages gained.


RDF Digest: Ontology Exploration Using Summaries

Ontology summarization aspires to produce an abridged version of the original ontology that highlights its most representative concepts. In this paper, we present RDF Digest, a novel platform that automatically produces and visualizes summaries of RDF/S Knowledge Bases (KBs). A summary is a valid RDFS document/graph that includes the most representative concepts of the schema, adapted to the corresponding instances. To construct this graph, our algorithm exploits the semantics and the structure of the schema and the distribution of the corresponding data/instances. A novel feature of our platform is that it allows summary exploration through extensible summaries. The aim of this demonstration is to let users explore data sources through summaries and to deepen their understanding of the various algorithms used.


Semantically-enabled Personal Medical Information Recommender

The World Wide Web has become patients' first choice for informing themselves about their disease, side effects and possible treatments. While the knowledge patients gain from the internet is widely regarded as having a positive influence on treatment, the quality and diversity of the available information have drawn considerable criticism. In this paper we demonstrate the Personal Medical Information Recommender (PMIR), a semantically-enabled, intelligent platform that empowers patients to search a high-quality set of web documents for relevant medical knowledge. In addition, the platform automatically provides intelligent, personalized recommendations according to individual preferences and medical conditions. To demonstrate the platform, we will use example patients to show the functionality of the system. We will then allow conference participants to interact directly with the system and test its capabilities.


Understanding Ontology Evolution Beyond Deltas

The dynamic nature of the data on the Web gives rise to a multitude of problems related to the description and analysis of the evolution of such data. Traditional approaches for identifying and analyzing changes are descriptive, focusing on the provision of a "delta" that describes the changes and often overwhelming the user with large amounts of information. Here, we take an alternative approach which aims at giving a high-level overview of the change process and at identifying the most important changes in the ontology. To do so, we consider different metrics of "change intensity", taking into account the changes that affected each class and its neighborhood, as well as ontological information related to the importance and connectivity of each class in the different versions. We argue that this approach allows a better understanding of the intent (rather than the actions) of the editor and helps the curator analyzing the changes to focus on what matters; traditional delta-based approaches can subsequently be used for a more fine-grained analysis.
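To make the idea of a neighborhood-aware "change intensity" concrete, here is a minimal Python sketch. The scoring formula (a class's own change count plus a weight `alpha` times the change counts of its direct neighbors), the names `change_intensity` and `alpha`, and the example classes are all hypothetical; the metrics described above also weigh the importance and connectivity of each class across versions, which this sketch leaves out.

```python
from collections import defaultdict


def change_intensity(changes, edges, alpha=0.5):
    """Hypothetical change-intensity score for each class.

    changes: dict mapping class name -> number of changes that affected it
    edges:   iterable of (class_a, class_b) schema edges, treated as undirected
    alpha:   hypothetical weight given to changes in the direct neighborhood
    """
    neighbors = defaultdict(set)
    for a, b in edges:
        neighbors[a].add(b)
        neighbors[b].add(a)
    classes = set(changes) | set(neighbors)
    # Own changes count fully; each neighbor's changes count with weight alpha.
    return {
        c: changes.get(c, 0) + alpha * sum(changes.get(n, 0) for n in neighbors[c])
        for c in classes
    }


# Illustrative data: Person changed a lot, Course not at all.
changes = {"Person": 4, "Student": 1, "Course": 0}
edges = [("Person", "Student"), ("Student", "Course")]
scores = change_intensity(changes, edges)
ranked = sorted(scores, key=scores.get, reverse=True)
```

Here `ranked` lists Person, Student, Course: Student has only one direct change but inherits intensity from its heavily edited neighbor, which is the kind of "intent" signal a flat delta does not expose.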


Ontology Understanding without Tears: The summarization approach

Given the explosive growth in both data size and schema complexity, data sources are becoming increasingly difficult to use and comprehend. Summarization aspires to produce an abridged version of the original data source highlighting its most representative concepts. In this paper, we present an advanced version of RDF Digest, a novel platform that automatically produces and visualizes high-quality summaries of RDF/S Knowledge Bases (KBs). A summary is a valid RDFS graph that includes the most representative concepts of the schema, adapted to the corresponding instances. To construct this graph, we designed and implemented two algorithms that exploit both the structure of the corresponding graph and the semantics of the KB. First, we identify the most important nodes using the notion of relevance. Then, we explore how to select the edges connecting these nodes by maximizing the importance of the selected edges either locally or globally. An extensive evaluation compares our system with two other systems and shows the benefits of our approach and the considerable advantages gained.
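The first step, ranking schema classes and keeping the top k, can be sketched in Python as follows. The relevance formula used here (schema degree scaled by the class's share of instances) is a hypothetical stand-in chosen only to show how structure and instance distribution can be combined; it is not RDF Digest's definition of relevance, and `top_k_classes`, `schema_edges` and `instance_counts` are illustrative names.

```python
from collections import Counter


def top_k_classes(schema_edges, instance_counts, k):
    """Rank classes by a hypothetical relevance score and keep the top k.

    schema_edges:    iterable of (domain_class, property, range_class) triples
    instance_counts: dict mapping class -> number of instances in the KB
    relevance(c) = degree(c) * (1 + instances(c) / total_instances)
    """
    degree = Counter()
    for domain, _prop, rng in schema_edges:
        degree[domain] += 1  # a self-loop such as Person-knows-Person counts twice
        degree[rng] += 1
    total = sum(instance_counts.values()) or 1
    relevance = {
        c: degree[c] * (1 + instance_counts.get(c, 0) / total) for c in degree
    }
    # Highest relevance first; ties broken by class name for a stable result.
    return sorted(relevance, key=lambda c: (-relevance[c], c))[:k]


schema_edges = [
    ("Person", "worksFor", "Organization"),
    ("Person", "knows", "Person"),
    ("Article", "author", "Person"),
    ("Article", "publishedIn", "Journal"),
]
instance_counts = {"Person": 60, "Article": 30, "Organization": 10}
selected = top_k_classes(schema_edges, instance_counts, 2)
```

With this data `selected` is Person and Article. The selected nodes still have to be linked into a valid sub-schema graph, which is the job of the edge-selection step.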


Exploring Importance Measures for Summarizing RDF/S KBs

Given the explosive growth in the size and complexity of the Data Web, there is now, more than ever, a need for methods and tools that facilitate the understanding and exploration of RDF/S Knowledge Bases (KBs). To this end, summarization approaches try to produce an abridged version of the original data source, highlighting the most representative concepts. Two questions are central to summarization: how to identify the most important nodes, and then how to link them in order to produce a valid sub-schema graph. In this paper, we try to answer the first question by revisiting six well-known measures from graph theory and adapting them for RDF/S KBs. We then model the problem of linking those nodes as a graph Steiner tree problem (GSTP), employing approximations and heuristics to speed up the execution of the respective algorithms. Our experiments show the added value of our approach, since a) our adaptations outperform current state-of-the-art measures for selecting the most important nodes and b) the constructed summary has better quality in terms of the additional nodes introduced into the generated summary.
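Linking the selected nodes as a graph Steiner tree problem can be illustrated with the classic metric-closure heuristic: compute shortest paths between the terminals, build a minimum spanning tree over those distances, and expand each tree edge back into its path. This is a well-known 2-approximation, not necessarily the approximation used in the paper. The sketch assumes an unweighted, undirected, connected schema graph, and the class names are hypothetical.

```python
from collections import deque
from itertools import combinations


def bfs_predecessors(adj, source):
    """Predecessor map of fewest-edge paths from source (BFS)."""
    prev = {source: None}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in sorted(adj[u]):  # sorted for deterministic paths
            if v not in prev:
                prev[v] = u
                queue.append(v)
    return prev


def path_to(prev, target):
    """Rebuild the path from the BFS source to target."""
    path = []
    while target is not None:
        path.append(target)
        target = prev[target]
    return path[::-1]


def steiner_approx(adj, terminals):
    """Metric-closure MST heuristic (2-approximation) for the graph Steiner tree problem."""
    terminals = sorted(terminals)
    preds = {t: bfs_predecessors(adj, t) for t in terminals}
    # Edge weights of the metric closure: path length (in edges) between terminals.
    dist = {(a, b): len(path_to(preds[a], b)) - 1 for a, b in combinations(terminals, 2)}
    in_tree, edges = {terminals[0]}, set()
    while len(in_tree) < len(terminals):
        # Prim's step: cheapest closure edge leaving the tree (ties broken by name).
        a, b = min(
            ((x, y) for x in in_tree for y in terminals if y not in in_tree),
            key=lambda p: (dist[tuple(sorted(p))], p),
        )
        in_tree.add(b)
        path = path_to(preds[a], b)  # expand the closure edge into graph edges
        edges.update(frozenset(e) for e in zip(path, path[1:]))
    return edges


def build_adj(edge_list):
    adj = {}
    for u, v in edge_list:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


adj = build_adj([
    ("Person", "Organization"), ("Person", "Article"), ("Article", "Journal"),
    ("Organization", "City"), ("Journal", "Publisher"), ("Publisher", "City"),
])
tree_edges = steiner_approx(adj, {"Person", "Journal", "City"})
```

For the three terminals the result uses four edges and introduces two extra (Steiner) nodes, Publisher and Organization; the number of such additional nodes is exactly the quality criterion mentioned above.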


RDF Query Answering Using Apache Spark: Review and Assessment

The explosion of the web and the abundance of linked data demand effective and efficient methods for storage, management and querying. More specifically, the ever-increasing size and number of RDF data collections raise the need for efficient query answering and dictate the use of distributed data management systems for effectively partitioning and querying them. To this end, Apache Spark is one of the most active big-data approaches, with more and more systems adopting it for efficient, distributed data management. The purpose of this paper is to provide an overview of existing works dealing with efficient query answering over RDF data using Apache Spark. We discuss the characteristics and key dimensions of such systems, describe novel ideas in the area along with their drawbacks, and provide directions for future work.
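Partitioning is central to the systems surveyed here. One common strategy in distributed RDF stores is to hash triples by subject, so that star-shaped queries (one subject, several predicates) can be answered within a single partition. The plain-Python sketch below illustrates that idea without Spark; in Spark it would correspond to key-based partitioning of the triple collection. The function names and example IRIs are hypothetical, and this is not the design of any specific surveyed system.

```python
from zlib import crc32


def partition_of(subject, num_partitions):
    """Stable hash of the subject (crc32, unlike Python's per-process salted hash())."""
    return crc32(subject.encode("utf-8")) % num_partitions


def partition_by_subject(triples, num_partitions):
    """Place every (s, p, o) triple in the partition chosen by its subject."""
    parts = [[] for _ in range(num_partitions)]
    for s, p, o in triples:
        parts[partition_of(s, num_partitions)].append((s, p, o))
    return parts


def star_query(parts, subject, predicates):
    """Answer a star pattern by scanning only the subject's partition.

    Returns the sorted (predicate, object) pairs; no other partition is read.
    """
    local = parts[partition_of(subject, len(parts))]
    return sorted((p, o) for s, p, o in local if s == subject and p in predicates)


triples = [
    (":alice", ":name", '"Alice"'),
    (":alice", ":worksFor", ":acme"),
    (":bob", ":name", '"Bob"'),
    (":acme", ":locatedIn", ":heraklion"),
]
parts = partition_by_subject(triples, 4)
answer = star_query(parts, ":alice", {":name", ":worksFor"})
```

The trade-off is that path-shaped queries (object of one triple joined with the subject of another) generally cross partitions, which is why the surveyed systems explore different partitioning schemes.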


Exploring RDF/S KBs Using Summaries

Ontology summarization aspires to produce an abridged version of the original data source highlighting its most important concepts. However, in an ideal scenario, the user should not be limited to static summaries. Starting from the summary, s/he should be able to further explore the data source, requesting more detailed information about a particular part of it. In this paper, we present a new approach enabling the dynamic exploration of summaries through two novel operations, zoom and extend. Extend focuses on a specific subgraph of the initial summary, whereas zoom operates on the whole graph; both provide granular information access to the end user. We show that calculating these operators is NP-complete and provide approximations for their calculation. We then show that using extend we can answer more queries focusing on specific nodes, whereas using global zoom we can answer more queries overall. Finally, we show that the algorithms employed can efficiently approximate both operators.
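The contrast between the two operators (local versus global) can be shown with a small greedy sketch. Here `extend` adds the k most important nodes within a few hops of a focus node, while `zoom` adds the k most important nodes from the whole graph. The greedy rules, the `radius` and `k` parameters, and the importance scores are hypothetical. The actual operators are NP-complete and must also keep the result a connected, valid RDFS graph; this sketch only chooses which nodes to add and leaves linking them aside.

```python
from collections import deque


def nodes_within(adj, focus, radius):
    """All nodes at most `radius` hops from `focus` (BFS)."""
    dist = {focus: 0}
    queue = deque([focus])
    while queue:
        u = queue.popleft()
        if dist[u] == radius:
            continue  # do not expand beyond the radius
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return set(dist)


def _add_top_k(candidates, importance, summary, k):
    # Most important candidates first; ties broken by name for determinism.
    best = sorted(candidates, key=lambda n: (-importance[n], n))[:k]
    return set(summary) | set(best)


def extend(adj, importance, summary, focus, k, radius=1):
    """Hypothetical local operator: grow the summary only around `focus`."""
    return _add_top_k(nodes_within(adj, focus, radius) - set(summary), importance, summary, k)


def zoom(adj, importance, summary, k):
    """Hypothetical global operator: grow the summary anywhere in the graph."""
    return _add_top_k(set(adj) - set(summary), importance, summary, k)


adj = {}
for u, v in [("A", "B"), ("B", "C"), ("C", "D"), ("A", "E"), ("E", "F")]:
    adj.setdefault(u, set()).add(v)
    adj.setdefault(v, set()).add(u)
importance = {"A": 10, "B": 3, "C": 8, "D": 6, "E": 2, "F": 9}
summary = {"A", "C"}
local_view = extend(adj, importance, summary, "C", k=1)
global_view = zoom(adj, importance, summary, k=1)
```

Extending around C adds D, its most important unseen neighbor, whereas zooming adds F, the most important node anywhere. This mirrors the finding that extend serves queries about specific nodes while zoom covers more queries overall.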



This work was partially supported by the EU projects eHealthMonitor (FP7-287509), DIACHRON (FP7-601043), iManageCancer (H2020-643529), MyHealthAvatar (FP7-600929) and EURECA (FP7-288048).