Entity-Centric Summarization

Recent years have seen explosive growth in automatic information extraction from large text corpora, ranging from the entire Wikipedia to online news articles. As a consequence, we now have access to large, automatically populated knowledge graphs such as DBpedia, Freebase, and YAGO2. On such graphs, a natural question to ask is ``Given a set of entities, how can we best characterize the relationships among them?''. Many recent results addressing this question share the following structure: take the graph structure of the knowledge graph (possibly with type and relationship information) and compute a smaller, simpler structure that best describes the common relationship(s) among the input entities. The resulting structure could be a simple path, a minimum-weight Steiner tree, or a general subgraph that connects the input entities while optimizing a graph-structural property such as the sum of edge weights, edge density, or conductance. Unfortunately, none of these solutions pays specific attention to the interpretability of the results. In many cases the resulting graph structure is either very hard to interpret or requires extensive background and contextual information to interpret.
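To make the simplest of these structures concrete, the following is a minimal sketch of computing a connecting path between two entities. It is an illustration under stated assumptions, not a method taken from any of the cited lines of work: the toy triples, the entity and relation names, and the function names `build_adjacency` and `connecting_path` are all hypothetical, edges are treated as undirected and unweighted, and plain breadth-first search stands in for the weighted optimizations mentioned above.

```python
from collections import deque

# Hypothetical toy knowledge graph as (subject, relation, object) triples.
TRIPLES = [
    ("Alice", "worksAt", "AcmeLabs"),
    ("Bob", "worksAt", "AcmeLabs"),
    ("Bob", "bornIn", "Springfield"),
    ("Carol", "livesIn", "Springfield"),
]


def build_adjacency(triples):
    """Index triples by both endpoints so edges can be walked in either direction."""
    adj = {}
    for s, r, o in triples:
        adj.setdefault(s, []).append((o, (s, r, o)))
        adj.setdefault(o, []).append((s, (s, r, o)))
    return adj


def connecting_path(triples, source, target):
    """Return the triples on a fewest-edges path from source to target, or None."""
    adj = build_adjacency(triples)
    if source not in adj or target not in adj:
        return None
    # parent maps each visited node to (previous node, triple used to reach it).
    parent = {source: None}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            break
        for neighbor, triple in adj[node]:
            if neighbor not in parent:
                parent[neighbor] = (node, triple)
                queue.append(neighbor)
    if target not in parent:
        return None
    # Walk back from the target to the source, then reverse.
    path = []
    node = target
    while parent[node] is not None:
        previous, triple = parent[node]
        path.append(triple)
        node = previous
    path.reverse()
    return path


if __name__ == "__main__":
    for triple in connecting_path(TRIPLES, "Alice", "Carol"):
        print(triple)
```

Even on this toy graph, the result is only a bare sequence of triples; nothing in it explains why the connection matters, which is the interpretability gap described above.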

On the other hand, the text sources used to populate these knowledge graphs contain much more information than what has been extracted. With this insight, many researchers have considered the problem of generating so-called support documents that help users better understand a relationship. However, the user is still burdened with the task of combining the support documents for the individual relationships in the result graph.

In this research, we aim to overcome this limitation and make relationship graphs easy for end users to interpret, while retaining the power of knowledge discovery based on graph analytics, which goes beyond document-level search. The idea we pursue is to generate entity-centric summaries for one or more entities connected in the form of a graph structure.