
Opinosis: A Graph Based Approach to Abstractive Summarization of Highly Redundant Opinions


We present a novel graph-based summarization framework (Opinosis) that generates concise abstractive summaries of highly redundant opinions. Evaluation results on summarizing user reviews show that Opinosis summaries have better agreement with human summaries compared to the baseline extractive method. The summaries are readable, reasonably well-formed and are informative enough to convey the major opinions.


The Big Idea

The Opinosis summarization framework focuses on generating very short abstractive summaries from large amounts of text. These summaries can resemble micropinions or “micro-reviews” of the kind you see on sites like Twitter and Foursquare. The idea of the algorithm is to use a word-graph data structure, referred to as the Opinosis-Graph, to represent the text to be summarized. The resulting graph is then repeatedly explored to find meaningful paths, which in turn become candidate summary phrases. The Opinosis summarizer is considered a “shallow” abstractive summarizer: it uses the original text itself to generate summaries (this makes it shallow), but it can generate phrases that were not seen in the original text because of the way paths are explored (and this makes it abstractive rather than purely extractive). The summarization framework was evaluated on an opinion (user review) dataset. The approach itself is very general in that it can be applied to any corpus containing a high amount of redundancy, for example, Twitter comments or user comments on blog/news articles.
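To make the idea concrete, here is a minimal sketch of the two core steps: building a word-adjacency graph from redundant sentences, and walking high-redundancy paths to produce candidate phrases. This is my own illustration, not the paper's implementation; the POS annotations, positional references, and path scoring used in the actual paper are omitted, and all function names and thresholds here are made up for the sketch.

```python
from collections import defaultdict

def build_word_graph(sentences):
    """Build a rough Opinosis-Graph-style structure: each unique word is
    a node, and each edge counts how often one word directly follows
    another across all input sentences."""
    edges = defaultdict(int)
    starts = defaultdict(int)
    for sent in sentences:
        words = sent.lower().split()
        if words:
            starts[words[0]] += 1
        for a, b in zip(words, words[1:]):
            edges[(a, b)] += 1
    return starts, edges

def candidate_paths(starts, edges, min_redundancy=2, max_len=6):
    """Walk the graph from frequent start words, following only edges
    seen at least `min_redundancy` times; each completed walk is a
    candidate summary phrase. The real algorithm scores and prunes
    paths much more carefully."""
    results = []

    def walk(path):
        if len(path) >= max_len:
            results.append(" ".join(path))
            return
        nexts = [b for (a, b), c in edges.items()
                 if a == path[-1] and c >= min_redundancy]
        if not nexts:
            results.append(" ".join(path))
            return
        for b in nexts:
            if b not in path:  # avoid cycles
                walk(path + [b])

    for word, count in starts.items():
        if count >= min_redundancy:
            walk([word])
    return results

reviews = [
    "the screen is very nice",
    "the screen is very nice and bright",
    "the screen is nice",
]
print(candidate_paths(*build_word_graph(reviews)))
# → ['the screen is very nice']
```

Note how the low-frequency continuation "and bright" is pruned away, while the path shared by most of the redundant sentences survives as the candidate phrase.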

Another example is an Opinosis summary for a car (2007 Acura), generated using the OpinRank Edmunds dataset.

Additional Thoughts

While most research projects in data mining and NLP focus on technical complexity, the focus of Opinosis was practicality: it uses a very shallow representation of text, relying mostly on redundancy to generate summaries. This is not too much to ask given that we live in an era of big data and have ample user reviews on the Web to work with. Even though the Opinosis paper uses part-of-speech tags in its graph representation, you don’t have to use them at all; the algorithm will still work fine as long as you have a sufficient volume of reviews and make a few tweaks in finding sentence breaks.

Related Summarization Works

Other works using a similar graph data structure

  • Discovering Related Clinical Concepts – This paper focuses on using a concept graph similar to the Opinosis-Graph to mine clinical concepts that are highly related. For example, the drug Advair is highly related to concepts like inhaler, puff, Diskus, Singulair, tiotropium, albuterol, Combivent, and Spiriva. Such concepts are easily discovered using the Concept-Graph in this paper.
  • Multi-sentence compression: Finding shortest paths in word graphs
    Katja Filippova’s work was used to summarize news (Google News) in both English and Spanish, while Opinosis was evaluated on user reviews from various sources (English only). She studies the informativeness and grammaticality of the generated sentences; in a similar way, we evaluate these aspects by studying how close the Opinosis summaries are to the human-composed summaries in terms of information overlap and readability (using human assessors).
  • Peilin Yang and Hui Fang – Contextual Suggestion – Another related work uses the Opinosis algorithm to extract terms from reviews for the purpose of contextual suggestion. This was done as part of the Contextual Suggestion TREC task, where Yang and Fang had the highest rank and MRR scores. Their paper can be found here: An Opinion-aware Approach to Contextual Suggestion. The details of the TREC run can be found here: Overview of the TREC 2013 Contextual Suggestion Track.

Opinosis Presentation Slides
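Since the post contrasts Opinosis with Filippova's shortest-path compression, a toy version of that second idea may help clarify the difference. The sketch below is my own simplified illustration, not Filippova's actual algorithm (which also uses POS tags, stopword handling, and additional path constraints): it builds a word graph over redundant sentences, weights each edge by the inverse of its bigram frequency, and returns the cheapest start-to-end path, so the most redundant wording wins.

```python
import heapq
from collections import defaultdict

def compress(sentences):
    """Toy multi-sentence compression: pick the cheapest <s>-to-</s>
    path in a word graph where frequent bigrams are cheap edges."""
    freq = defaultdict(int)
    for sent in sentences:
        words = ["<s>"] + sent.lower().split() + ["</s>"]
        for a, b in zip(words, words[1:]):
            freq[(a, b)] += 1

    graph = defaultdict(list)
    for (a, b), f in freq.items():
        graph[a].append((b, 1.0 / f))  # frequent edges cost less

    # Dijkstra from <s> to </s>
    heap = [(0.0, "<s>", ["<s>"])]
    seen = set()
    while heap:
        cost, node, path = heapq.heappop(heap)
        if node == "</s>":
            return " ".join(path[1:-1])  # drop the markers
        if node in seen:
            continue
        seen.add(node)
        for nxt, w in graph[node]:
            if nxt not in seen:
                heapq.heappush(heap, (cost + w, nxt, path + [nxt]))
    return ""

reviews = [
    "the phone arrived quickly",
    "the phone arrived very quickly",
    "the phone arrived quickly and safely",
]
print(compress(reviews))
# → the phone arrived quickly
```

The contrast with the Opinosis-style sketch is that this approach commits to a single globally cheapest path, whereas Opinosis enumerates and scores many candidate paths.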


Abstractive Summarization Papers

While much work has been done in the area of extractive summarization, there has been limited study of abstractive summarization, as it is much harder to achieve (going by the definition of true abstraction). Existing work in abstractive summarization may not be truly abstractive, and even when it is, it may not be fully automated. This page contains a very small collection of summarization methods that are non-extractive. Note that this is by no means an exhaustive list; it is just a starting point.

Related Papers

  1. UNL Document Summarization
    Virach S., Potipiti T. and Charoenporn T. UNL document summarization. Proceedings of the First International Workshop on Multimedia Annotation (MMA2001), Tokyo, Japan, January 2001.
  2. FRUMP
    DeJong G. An overview of the FRUMP system. In Strategies for Natural Language Processing, W. G. Lehnert and M. H. Ringle (eds.), 149–176. Hillsdale, New Jersey: Erlbaum, 1982.
  3. Radev D. R. and McKeown K. R. Generating natural language summaries from multiple on-line sources. Computational Linguistics 1998; 24(3):469–500.
    Summary: Involves shallow syntactic and semantic analysis, concept identification, and text regeneration. The method was developed through the study of a corpus of abstracts written by professional abstractors.
  4. Harabagiu S. and Lacatusu F. Generating Single and Multi-Document Summaries with GISTEXTER. In Workshop on Text Summarization (in conjunction with ACL 2002 and including the DARPA/NIST-sponsored DUC 2002 Meeting on Text Summarization), Philadelphia, USA, 2002.
  5. SUMUM
    Saggion H. and Lapalme G. Generating Indicative-Informative Summaries with SumUM.
  6. Opinosis
    Ganesan K., Zhai C. X. and Han J. Opinosis: A Graph Based Approach to Abstractive Summarization of Highly Redundant Opinions. Proceedings of the 23rd International Conference on Computational Linguistics (COLING), Beijing, China, pp. 340–348, 2010.
    Summary: Uses a word-graph data structure for each input document to find promising paths that act as candidate summaries. Leverages three key properties of the graph that help generate summaries that are more abstractive (shallow abstraction). This method is syntax-lean: it uses only POS tags.