Text Summarization

Micropinion Generation: An Unsupervised Approach to Generating Ultra-Concise Summaries of Opinions

Ganesan, Kavita, ChengXiang Zhai, and Evelyne Viegas. “Micropinion generation: an unsupervised approach to generating ultra-concise summaries of opinions.” Proceedings of the 21st international conference on World Wide Web. ACM, 2012.

Abstract

This paper presents a new unsupervised approach to generating ultra-concise summaries of opinions. We formulate the problem of generating such a micropinion summary as an optimization problem, where we seek a set of concise and non-redundant phrases that are readable and represent key opinions in text. We measure representativeness based on a modified mutual information function and model readability with an n-gram language model. We propose some heuristic algorithms to efficiently solve this optimization problem. Evaluation results show that our unsupervised approach outperforms other state-of-the-art summarization methods and that the generated summaries are informative and readable.


Opinosis: A Graph Based Approach to Abstractive Summarization of Highly Redundant Opinions

Abstract

We present a novel graph-based summarization framework (Opinosis) that generates concise abstractive summaries of highly redundant opinions. Evaluation results on summarizing user reviews show that Opinosis summaries have better agreement with human summaries compared to the baseline extractive method. The summaries are readable, reasonably well-formed and are informative enough to convey the major opinions.


The Big Idea

The Opinosis summarization framework focuses on generating very short abstractive summaries from large amounts of text. These summaries can resemble micropinions or "micro-reviews" of the kind you see on sites like Twitter and Foursquare. The idea of the algorithm is to use a word-graph data structure, referred to as the Opinosis-Graph, to represent the text to be summarized. The resulting graph is then repeatedly explored to find meaningful paths, which in turn become candidate summary phrases. The Opinosis summarizer is considered a "shallow" abstractive summarizer: it uses the original text itself to generate summaries (this makes it shallow), but it can generate phrases that were not previously seen in the original text because of the way paths are explored (and this makes it abstractive rather than purely extractive). The summarization framework was evaluated on an opinion (user review) dataset. The approach itself is quite general in that it can be applied to any corpus containing a high amount of redundancy, for example Twitter comments or user comments on blog/news articles.
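To make the idea a bit more concrete, here is a minimal Python sketch (not the authors' implementation) that builds a simple word-adjacency graph from a few redundant sentences and enumerates well-supported paths as candidate phrases; the function names, the support threshold, and the toy reviews are all illustrative.

from collections import defaultdict

def build_word_graph(sentences):
    """Build a simple word-adjacency graph: each node is a word, and each
    directed edge records which sentences (and positions) contain the bigram."""
    graph = defaultdict(lambda: defaultdict(list))
    for sid, sentence in enumerate(sentences):
        words = sentence.lower().split()
        for pos in range(len(words) - 1):
            graph[words[pos]][words[pos + 1]].append((sid, pos))
    return graph

def candidate_phrases(graph, start, max_len=6, min_support=2):
    """Enumerate paths starting at `start` whose every edge is supported by at
    least `min_support` sentences; these paths act as candidate summary phrases."""
    results = []
    def walk(node, path):
        if len(path) >= 2:
            results.append(" ".join(path))
        if len(path) >= max_len:
            return
        for nxt, refs in graph[node].items():
            if len({sid for sid, _ in refs}) >= min_support and nxt not in path:
                walk(nxt, path + [nxt])
    walk(start, [start])
    return results

reviews = [
    "the battery life is very good",
    "battery life is very good indeed",
    "the battery life is good",
]
g = build_word_graph(reviews)
print(candidate_phrases(g, "battery"))  # e.g. "battery life is very good"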

Here is another example of an Opinosis summary, for a car (Acura 2007), generated using the OpinRank Edmunds dataset.

Additional Thoughts

While most research projects in data mining and NLP focus on technical complexity, the focus of Opinosis was practicality: it uses a very shallow representation of the text, relying mostly on redundancy to generate summaries. This is not too much to ask given that we live in an era of big data and have ample user reviews on the Web to work with. Even though the Opinosis paper uses part-of-speech tags in its graph representation, you do not have to use them at all; the algorithm will still work fine as long as you have a sufficient volume of reviews and make a few tweaks in finding sentence breaks.

Related Summarization Works

Other works using a similar graph data structure

  • Discovering Related Clinical Concepts – This paper uses a concept graph similar to the Opinosis-Graph to mine clinical concepts that are highly related. For example, the drug Advair is highly related to concepts like inhaler, puff, Diskus, Singulair, tiotropium, albuterol, Combivent, and Spiriva. Such related concepts are easily discovered using the Concept-Graph described in the paper.
  • Multi-sentence compression: Finding shortest paths in word graphs
    Katja's work was used to summarize news (Google News) in both English and Spanish, while Opinosis was evaluated on user reviews from various sources (English only). She studies the informativeness and grammaticality of the generated sentences; in a similar way, we evaluate these aspects by studying how close the Opinosis summaries are to the human-composed summaries in terms of information overlap and readability (using a human assessor).
  • Peilin Yang and Hui Fang – Contextual Suggestion – Another related work uses the Opinosis algorithm to extract terms from reviews for the purpose of contextual suggestion. This was done as part of the Contextual Suggestion TREC Task, and it turns out that Yang and Fang had the highest rank and MRR scores in this track. Their paper can be found here: An Opinion-aware Approach to Contextual Suggestion. The details of the TREC run can be found here: Overview of the TREC 2013 Contextual Suggestion Track.


Abstractive Summarization Papers

While much work has been done in the area of extractive summarization, there has been limited study of abstractive summarization, as it is much harder to achieve (going by the definition of true abstraction). Existing work in abstractive summarization may not be truly abstractive, and even when it is, it may not be fully automated. This page contains a very small collection of summarization methods that are non-extractive. Please note that this is by no means an exhaustive list; it is just a starting point.

Related Papers

  1. UNL Document Summarization
    Virach S, Potipiti T and Charoenporn T. UNL document summarization. Proceedings of the First International Workshop on Multimedia Annotation (MMA2001), Tokyo, Japan, January 2001
  2. FRUMP
    DeJong G. An Overview of the FRUMP System. In Strategies for Natural Language Processing, W. G. Lehnert and M. H. Ringle (eds.), 149–176. Hillsdale, New Jersey: Erlbaum, 1982
  3. SUMMONS
    Radev D R and McKeown K R. Generating natural language summaries from multiple on-line sources. Computational Linguistics 1998; 24(3):469–500
    Summary: Involves shallow syntactic and semantic analysis, concept identification, and text regeneration. Method was developed through the study of a corpus of abstracts written by professional abstractors.
  4. GISTEXTER
    Harabagiu, S., Lacatusu, F. Generating Single and Multi-Document Summaries with GISTEXTER. In Workshop on Text Summarization (In Conjunction with the ACL 2002 and including the DARPA/NIST sponsored DUC 2002 Meeting on Text Summarization) Philadelphia, USA, 2002.
  5. SUMUM
    Saggion, H. and Lapalme, G. Generating Indicative-Informative Summaries with SumUM
  6. OPINOSIS
    Ganesan, K. A., C. X. Zhai, and J. Han, “Opinosis: A Graph Based Approach to Abstractive Summarization of Highly Redundant Opinions”, Proceedings of the 23rd International Conference on Computational Linguistics (COLING), Beijing, China, pp. 340–348, 2010.
    Summary: Uses a word-graph data structure for each input document to find promising paths that act as candidate summaries. Leverages three key properties of the graph that help generate summaries that are more abstractive (shallow abstraction). This method is syntax-lean – it uses only POS tags.

A Step by Step Guide to Working with ROUGE for Summary Evaluation

I have been trying to use the ROUGE Perl toolkit to evaluate one of my research projects but found it really hard to get proper documentation on its usage. So I decided to piece together some information that may be helpful to others. I actually learnt the basics of using ROUGE from the MEAD documentation! If you have successfully installed ROUGE and need to set up the evaluation mechanism, read on. If you need information on how to install ROUGE, go through the README file in the ROUGE package; basically, the trick is in the successful installation of the Perl modules. If you need to understand at a high level how ROUGE works as a metric, you can read this article.

First off, to evaluate a summarization system you need two types of summaries. One is the system-generated summaries, referred to as 'peer summaries'; the other is the reference or gold-standard summaries, known as 'model summaries'. Reference summaries are usually written by humans, and it has been shown that using multiple reference summaries yields more reliable ROUGE scores than using just one. Note that ROUGE can handle any number of peer summaries (if generated by multiple systems) and any number of model summaries. All you really have to do is specify all of this in an XML file.

New! Edit 2018 – Java-based ROUGE: For those who cannot get the Perl version to work on Windows or Linux, you can use the ROUGE 2.0 package, which has Unicode support and full documentation. The settings are greatly simplified and no special formatting is needed for the reference and system summaries.

Getting Started

To get started, create a directory structure as follows anywhere on your system (a small script that sets up this layout is sketched after the list):

  • <your-project-name>/
    • models/ — contains all reference summaries that will be used for evaluation. Each file can be identified by the set of documents for which the summary was generated. Say a summary was generated for document set 3 by human 2; then the file name can be something like human2_doc3.html.
    • systems/ — contains all system-generated summaries. Each file can be identified by the id of the system and the set of documents for which the summary was generated. Say a summary was generated for document set 3 by system 1; then the file name can be something like system1_doc3.html.
    • settings.xml — this is the core file that specifies which peer summaries should be evaluated against which model summaries. A detailed explanation is given below.
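If you prefer to script this setup, here is a minimal Python sketch; the project name and the example file names in the comments are purely illustrative.

import os

# Create the project layout described above (names are illustrative).
project = "my-rouge-eval"
for sub in ("models", "systems"):
    os.makedirs(os.path.join(project, sub), exist_ok=True)

# Example naming convention:
#   models/human2_doc3.html   -> reference summary by human 2 for document set 3
#   systems/system1_doc3.html -> summary by system 1 for document set 3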

How to format settings.xml?

Here I will only explain the basic syntax for formatting the core settings file. I am assuming that this file will be generated using a script, so the formatting is really important. To learn how to format the system and model files, look at the examples in <ROUGE_HOME>/sample-test/SL2003 or check out the samples below.

  1. The file should typically start with: <ROUGE_EVAL version="1.55">
  2. Then, each summarization task must be enclosed between these tags: <EVAL ID="TASK_#">TASK_DETAILS</EVAL>
  3. Within this enclosure you need to specify where to find the model and peer summaries, so make sure to include these tags:
    <MODEL-ROOT> parent_dir_to_model_files </MODEL-ROOT>
    <PEER-ROOT> parent_dir_to_peer_files </PEER-ROOT>
  4. Followed by: <INPUT-FORMAT TYPE="SEE"> </INPUT-FORMAT>
  5. For each summarization task, we need to specify the system summaries and the reference summaries to evaluate against. Here is an example:
    <PEERS> — list of system-generated summaries for the same task
    <P ID="1">1.html</P> — system 1's summary, found in 1.html
    <P ID="2">2.html</P> — system 2's summary, found in 2.html
    </PEERS>
    <MODELS> — list of reference summaries for the same task
    <M ID="0">0.html</M> — reference summary 1 for this task is in 0.html
    <M ID="1">1.html</M> — reference summary 2 for this task is in 1.html
    </MODELS>

For the next summarization task, repeat from point 2. Finally, finish by closing the root tag with </ROUGE_EVAL>.
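Since the settings file is usually produced by a script, here is a minimal Python sketch of such a generator for a single task; the directory names, file names, and ids are illustrative and should be replaced with your own.

import os

# Peer (system) and model (reference) summaries for one evaluation task.
peers = {"1": "1.html", "2": "2.html"}    # system id -> peer summary file
models = {"0": "0.html", "1": "1.html"}   # model id  -> reference summary file

lines = ['<ROUGE_EVAL version="1.55">',
         '<EVAL ID="TASK_1">',
         '<MODEL-ROOT>my-rouge-eval/models</MODEL-ROOT>',
         '<PEER-ROOT>my-rouge-eval/systems</PEER-ROOT>',
         '<INPUT-FORMAT TYPE="SEE"></INPUT-FORMAT>',
         '<PEERS>']
lines += ['<P ID="%s">%s</P>' % (pid, fname) for pid, fname in peers.items()]
lines += ['</PEERS>', '<MODELS>']
lines += ['<M ID="%s">%s</M>' % (mid, fname) for mid, fname in models.items()]
lines += ['</MODELS>', '</EVAL>', '</ROUGE_EVAL>']

os.makedirs("my-rouge-eval", exist_ok=True)
with open("my-rouge-eval/settings.xml", "w") as out:
    out.write("\n".join(lines) + "\n")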

How to format my model/peer summaries?

The format that I use is usually HTML; I am not sure if other formats are supported. The same format is used for both your reference/model/gold-standard summaries and your peer/system summaries. You may have to generate these files using a script (a small script for producing this format is sketched after the example below). Each summary has its own file, and each sentence of the summary must be on its own line, so you may have to segment your summaries first (if they are not already segmented). Here is an example of a model summary in a format that ROUGE understands; it has three sentences, as indicated by the ids.

<html>
<head><title>filename_here</title></head>
<body bgcolor="white">
<a name="1">[1]</a> <a href="#1" id=1>This unit is generally quite accurate.</a>
<a name="2">[2]</a> <a href="#2" id=2>Set-up and usage are considered to be very easy.</a>
<a name="3">[3]</a> <a href="#3" id=3>The maps can be updated, and tend to be reliable.</a>
</body>
</html>
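If your summaries start out as plain text with one sentence per line, a small script can wrap them in this format. Below is a rough Python sketch; the function name and the example file name are illustrative.

import html

def to_see_html(title, sentences, out_path):
    """Write one summary (a list of sentences) in the HTML format shown above,
    with each sentence on its own numbered line."""
    with open(out_path, "w") as out:
        out.write("<html>\n<head><title>%s</title></head>\n" % html.escape(title))
        out.write('<body bgcolor="white">\n')
        for i, sent in enumerate(sentences, start=1):
            out.write('<a name="%d">[%d]</a> <a href="#%d" id=%d>%s</a>\n'
                      % (i, i, i, i, html.escape(sent.strip())))
        out.write("</body>\n</html>\n")

# Example usage (file name follows the naming convention described earlier):
to_see_html("human2_doc3", [
    "This unit is generally quite accurate.",
    "Set-up and usage are considered to be very easy.",
    "The maps can be updated, and tend to be reliable.",
], "human2_doc3.html")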

Where to obtain gold-standard/model summaries?

Well, this really depends on your application. If you have a handful of documents that you need to summarize, then just get your peers to write summaries for you; about 3-5 would be a good number in my opinion. Just give the summary writers very general instructions and make sure you do not influence them in any way. If you have a large number of documents to summarize, you could consider using an online workforce like Amazon's Mechanical Turk.

How to run my evaluation tasks?

Once you have prepared the system summaries, model summaries, and settings file as described above, running the evaluation is actually pretty straightforward. Here is an example:

./ROUGE-1.5.5.pl -e data -f A -a -x -s -m -2 4 -u <your-project-name>/settings.xml

This example evaluates using ROUGE-SU4:

  • -e specifies the location of the data directory that comes with ROUGE. This is mandatory because it contains the stop-word files.
  • -a says to evaluate all systems specified in the settings file
  • -m turns on stemming
  • -2 4 -u says to use ROUGE-SU (skip-bigram) with a maximum skip distance of 4 and to also compute unigram scores, i.e. ROUGE-SU4
  • -x says that you do not want ROUGE-L to be computed (it is computed by default)

To get a list of adjustable parameters, just run ./ROUGE-1.5.5.pl without any parameters.
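If you would rather drive ROUGE from a script and collect the output in a text file for later analysis (see the next section), something along these lines should work; the paths are placeholders for your own ROUGE installation and project directory.

import subprocess

# Placeholder paths: point these at your ROUGE install and your project.
rouge_home = "/path/to/ROUGE-1.5.5"
cmd = [rouge_home + "/ROUGE-1.5.5.pl",
       "-e", rouge_home + "/data",
       "-f", "A", "-a", "-x", "-s", "-m", "-2", "4", "-u",
       "my-rouge-eval/settings.xml"]

# Pipe all ROUGE output to a text file for later parsing.
with open("rouge_results.txt", "w") as out:
    subprocess.run(cmd, stdout=out, check=True)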

How do I analyze my ROUGE scores?

ROUGE produces output in a format that cannot be easily analyzed, so you essentially have to write a script to parse the results into a format suitable to you. I have written a Perl script to parse the results into CSV format; it allows you to visualize and analyze your results in OpenOffice or Excel. All you need to do is pipe all your ROUGE results to a text file and provide that as input to the Perl script. Download the tool here.
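If you would rather roll your own parser in Python instead of the Perl tool, a rough sketch is shown below; it assumes the usual "Average_R/P/F" lines in the piped ROUGE output and writes one CSV row per matching line.

import csv
import re

# Matches lines like:
#   1 ROUGE-SU4 Average_F: 0.28256 (95%-conf.int. 0.25843 - 0.30786)
pattern = re.compile(
    r"^(\S+)\s+(ROUGE-\S+)\s+Average_([RPF]):\s+([\d.]+)\s+"
    r"\(95%-conf\.int\.\s+([\d.]+)\s+-\s+([\d.]+)\)")

with open("rouge_results.txt") as results, \
     open("rouge_results.csv", "w", newline="") as out:
    writer = csv.writer(out)
    writer.writerow(["system", "metric", "measure", "score", "ci_lower", "ci_upper"])
    for line in results:
        match = pattern.match(line.strip())
        if match:
            writer.writerow(match.groups())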

Jackknifing with ROUGE

Jackknifing is typically used when human summaries need to be comparable with system-generated ones; this assumes you have multiple human (reference/model) summaries. ROUGE used to implement jackknifing internally, but this was removed as of version 1.5.5. I do not know the rationale for this, but if you need jackknifing it is pretty simple to implement yourself.

Say you have K reference summaries; you compute ROUGE scores over K sets of K-1 reference summaries, which means you leave out one reference summary each time. If you are attempting to compute human performance, then the reference summary that you leave out temporarily becomes your 'system' or 'peer' summary. Once you have the K ROUGE scores, you just average them to get the final ROUGE score. The Rouge2CSV Perl tool will help you combine and average these scores if you pipe all your ROUGE results to one file.
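Here is a small Python sketch of that leave-one-out procedure; score_task is a placeholder for however you obtain a single ROUGE score for one peer summary against a given subset of references (for example, by writing a settings.xml for that subset and running ROUGE on it).

from itertools import combinations

def jackknife(peer, references, score_task):
    """Average ROUGE scores over the K leave-one-out subsets of K references."""
    k = len(references)
    subsets = combinations(references, k - 1)          # K sets of K-1 references
    scores = [score_task(peer, list(s)) for s in subsets]
    return sum(scores) / len(scores)                   # final score = the average

# Toy usage with a dummy scorer, just to show the mechanics:
refs = ["ref1.html", "ref2.html", "ref3.html"]
dummy_scorer = lambda peer, subset: float(len(subset))
print(jackknife("system1.html", refs, dummy_scorer))   # averages 3 leave-one-out runs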

I Am Getting WordNet Exceptions

The WordNet setup seems to be a problem that a lot of people run into. You essentially need to build a link to the WordNet exception database. This was the solution given by Chin-Yew Lin:

cd data/WordNet-2.0-Exceptions/
./buildExeptionDB.pl . exc WordNet-2.0.exc.db

cd ../
ln -s WordNet-2.0-Exceptions/WordNet-2.0.exc.db WordNet-2.0.exc.db

Where do I find the latest version of ROUGE?