sentiment analysis

Micropinion Generation: An Unsupervised Approach to Generating Ultra-Concise Summaries of Opinions

Ganesan, Kavita, ChengXiang Zhai, and Evelyne Viegas. “Micropinion generation: an unsupervised approach to generating ultra-concise summaries of opinions.” Proceedings of the 21st international conference on World Wide Web. ACM, 2012.


This paper presents a new unsupervised approach to generating ultra-concise summaries of opinions. We formulate the problem of generating such a micropinion summary as an optimization problem, where we seek a set of concise and non-redundant phrases that are readable and represent key opinions in text. We measure representativeness based on a modified mutual information function and model readability with an n-gram language model. We propose some heuristic algorithms to efficiently solve this optimization problem. Evaluation results show that our unsupervised approach outperforms other state of the art summarization methods and the generated summaries are informative and readable.
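To make the representativeness idea more concrete, here is a minimal sketch of standard pointwise mutual information (PMI) computed over sentence-level co-occurrence. It is not the paper's modified mutual information function, whose exact form is not given here; the function name `pmi_table` and the sentence-level co-occurrence window are assumptions made purely for illustration.

```python
import math
from collections import Counter
from itertools import combinations


def pmi_table(sentences):
    """Return standard PMI (log base 2) for word pairs sharing a sentence.

    Illustrative only: the paper uses a *modified* mutual information
    function that is not reproduced here.
    """
    word_counts = Counter()
    pair_counts = Counter()
    for sentence in sentences:
        words = set(sentence.lower().split())  # count each word once per sentence
        word_counts.update(words)
        pair_counts.update(frozenset(p) for p in combinations(sorted(words), 2))

    n = len(sentences)
    scores = {}
    for pair, c_xy in pair_counts.items():
        x, y = tuple(pair)
        p_xy = c_xy / n
        p_x = word_counts[x] / n
        p_y = word_counts[y] / n
        scores[pair] = math.log2(p_xy / (p_x * p_y))
    return scores


reviews = ["battery life good", "battery life short", "screen good"]
scores = pmi_table(reviews)
```

In this toy input, "battery" and "life" always appear together, so their pair scores higher than "battery" and "good", which co-occur only once.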




Linguistic Understanding of Complaints and Praises in User Reviews

Ganesan, Kavita, and Guangyu Zhou. “Linguistic Understanding of Complaints and Praises in User Reviews.” Proceedings of NAACL-HLT. 2016.

Gist of paper

This is a short study paper that categorizes positive and negative review sentences into 4 categories: positive only, praise, negative only and complaint. The intuition is that praise sentences and complaints tend to be more informative than plain positive only or negative only sentences. This paper thus tries to understand the properties of the text that we consider to be complaints and praises. Our analysis shows several interesting findings, including:

  •  complaints tend to use the past tense more than the other 3 categories
  •  complaints and praises are generally longer and contain more nouns than positive only or negative only sentences
  •  praise sentences tend to use more adjectives than other types of sentences
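The properties listed above (past-tense use, sentence length, noun and adjective counts) can be approximated with simple counts over POS-tagged text. The following is a minimal sketch, assuming sentences are already tagged in `word/TAG` form with Penn Treebank tags; the tag groupings and the function name `sentence_features` are illustrative assumptions, not the feature definitions used in the paper.

```python
from collections import Counter

# Assumed Penn Treebank tag groupings (illustrative, not taken from the paper).
PAST_TENSE_TAGS = {"VBD", "VBN"}          # past tense, past participle
NOUN_TAGS = {"NN", "NNS", "NNP", "NNPS"}
ADJECTIVE_TAGS = {"JJ", "JJR", "JJS"}


def sentence_features(tagged_sentence):
    """Count length, past-tense verbs, nouns and adjectives in a word/TAG sentence."""
    tags = [tok.rsplit("/", 1)[1] for tok in tagged_sentence.split() if "/" in tok]
    counts = Counter(tags)
    return {
        "length": len(tags),
        "past_tense": sum(counts[t] for t in PAST_TENSE_TAGS),
        "nouns": sum(counts[t] for t in NOUN_TAGS),
        "adjectives": sum(counts[t] for t in ADJECTIVE_TAGS),
    }


features = sentence_features("the/DT battery/NN died/VBD after/IN two/CD days/NNS ./.")
```

Averaging these counts per category (complaint, praise, positive only, negative only) would give a rough way to compare the categories along the dimensions described above.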


  • Paper
  • Dataset – coming soon!


Mining tag clouds and emoticons behind community feedback

Ganesan, K. A., N. Sundaresan, and H. Deo, “Mining tag clouds and emoticons behind community feedback”, WWW ’08: Proceeding of the 17th international conference on World Wide Web, Beijing, China, ACM, pp. 1181–1182, 2008.


In this paper we describe our mining system which automatically mines tags from feedback text in an eCommerce scenario. It renders these tags in a visually appealing manner. Further, emoticons are attached to mined tags to add sentiment to the visual aspect.

Download Paper

Leveraging large amounts of opinions for decision making

Opinion Driven Decision Support System (ODSS) refers to the use of large amounts of online opinions to facilitate business and consumer decision making. The idea is to combine the strengths of search technologies with opinion mining and analysis tools to provide a synergistic decision making platform. The research and engineering problems related to developing such a system include:
  1. opinion acquisition
  2. opinion based search
  3. opinion summarization
  4. presentation of results
Opinions in this case can be an aggregation of user reviews, blog comments, Facebook status updates, Tweets and so on. Essentially, any opinion-containing text on specific topics or entities qualifies as a candidate for building an ODSS platform. Here’s a description of some of the research and engineering problems involved in developing an ODSS platform:

1. Search Capabilities Based on Opinions

The goal of opinion-based search is to help users find entities of interest based on their key requirements. Since a user is often interested in choosing an entity based on opinions about that entity, a system that ranks entities according to the user’s personal preferences would support the user’s decision-making task more directly. For example, when looking for hotels at a destination, a user may only want to consider hotels that other people thought were clean. Finding and ranking hotels by how well they satisfy such a requirement would significantly reduce the number of entities under consideration, facilitating decision making. Unlike traditional search, the query in this case is a set of preferences and the result is a set of entities that match these preferences. The challenge is to accurately match the user’s preferences with existing opinions in order to recommend the best entities. This special ranking problem is referred to as Opinion-Based Entity Ranking. Many existing opinion mining techniques could potentially be used for this new ranking task. I have explored information retrieval based techniques to solve this ranking problem specifically, and there have been a few follow-up works (from other groups) trying other approaches.
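As a rough illustration of the ranking task described above, the sketch below scores each entity by how often the preference keywords occur in its reviews, normalized by review length. This is a deliberately naive baseline written for illustration; it is not the retrieval model explored in the author's work, and the function name `rank_entities` and the sample data are hypothetical.

```python
from collections import Counter


def rank_entities(preferences, reviews_by_entity):
    """Rank entities by normalized frequency of preference keywords in their reviews.

    Naive illustration only: it ignores negation ("not clean"), synonyms,
    and any weighting of rare versus common terms.
    """
    ranked = []
    for entity, reviews in reviews_by_entity.items():
        tokens = " ".join(reviews).lower().split()
        counts = Counter(tokens)
        total = len(tokens) or 1  # avoid division by zero for entities with no reviews
        score = sum(counts[term.lower()] / total for term in preferences)
        ranked.append((entity, score))
    return sorted(ranked, key=lambda pair: pair[1], reverse=True)


hotels = {
    "Hotel A": ["clean room", "very clean and quiet"],
    "Hotel B": ["dirty room", "noisy street"],
}
ranking = rank_entities(["clean"], hotels)
```

Here the query is the preference list `["clean"]` and the result is the list of hotels ordered by score, which mirrors the "preferences in, entities out" framing of the problem.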

2. Opinion Summarization (i.e. Sentiment Analysis + Text Summarization)

Opinion summaries play a critical role in helping users better analyze entities in consideration (e.g. products, physicians, cars, politicians). Users are often looking for major concerns or advantages when selecting a specific entity. Thus, a summary that quickly highlights the key opinions about the entity would significantly help exploration of entities and aid decision making. The field of opinion summarization has long been explored, with most techniques focused on generating structured summaries on a fixed set of topics. In the last few years, textual summaries of opinions have been gaining more and more popularity. Bing Liu’s Opinion Mining Tutorial covers some of these recent works, or you can refer to this article, point (5).

3. Opinion Acquisition (i.e. Opinion or Sentiment Crawling)

To support accurate search and analysis based on opinions, opinionated content is essential. Relying on opinions from just one source makes the information not only unreliable but also incomplete, due to variations in opinions as well as potential bias in that source. Although many applications rely on large amounts of opinions, there has been very limited work on collecting and integrating a complete set of opinions. I recently explored a very simple method for collecting large amounts of opinions on arbitrary entities.

The idea of an Opinion Driven Decision Support System (ODSS) was developed as part of my thesis. For more information, please see Kavita’s thesis.

Opinosis Summarization Demo Software (Command Line Jar)

The Opinosis Summarizer Software is a demo version of a summarizer that generates concise abstractive summaries of highly redundant text. It was primarily used to summarize opinions, and thus it can be regarded as an opinion summarization software. However, since the underlying approach is general and assumes no domain knowledge, with a few minor tweaks it can be used on any highly redundant text (e.g. Twitter comments, comments on blog or news articles). Note that this requires code changes. The demo version mainly works with user reviews.

The Opinosis Summarizer is a simple jar file. All it requires is that you have a work directory defined. This directory will hold all the input files, output files and any other resources. The following instructions will guide you through generating summaries using the Opinosis Summarizer. Please also note that the jar file has to be run from the command line and cannot be integrated into your existing code base. The Web API version will allow you to do that.

Platform: platform independent
Required Software: JRE 1.6 and above
License: Demo


Opinosis Summarizer Usage

Download Library & Set-Up Directory Structure

Once you have unpacked the zip file, you will see the following items in the directory:

opinosis_lib/ - Contains helper jar files
opinosis.jar - The library that performs the summarization task
documentation.pdf - Set-up instructions
opinosis_sample/ - Sample directory structure of the work directory

Now you need to define a new work directory similar to opinosis_sample. You must have the following directory structure.

input/  - All the text to be summarized. One file per document.
output/ - Summarization Results (opinosis summaries)
etc/    - Other resources, such as the settings file, will be stored here.

Now copy the file from opinosis_sample/etc/ into <your_work_folder>/etc/. This file contains all the application-specific settings. See below on how to change these settings.
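The set-up steps above can also be scripted. The following is a minimal sketch that creates the three required sub-directories and, optionally, copies whatever files are found in a sample etc/ folder; the helper name `create_work_dir` and its arguments are hypothetical and are not part of the Opinosis distribution.

```python
import shutil
from pathlib import Path


def create_work_dir(work_dir, sample_etc=None):
    """Create input/, output/ and etc/ under work_dir.

    If sample_etc is given (e.g. the path to opinosis_sample/etc/),
    every regular file in it is copied into <work_dir>/etc/.
    """
    work = Path(work_dir)
    for name in ("input", "output", "etc"):
        (work / name).mkdir(parents=True, exist_ok=True)
    if sample_etc is not None:
        for item in Path(sample_etc).iterdir():
            if item.is_file():
                shutil.copy(item, work / "etc" / item.name)
    return work
```

For example, `create_work_dir("my_work_folder", "opinosis_sample/etc")` would leave a folder ready to receive POS-annotated files in input/.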

Set-up Input Files

Currently, Opinosis only accepts POS annotated sentences as the input. We assume that each input file contains a set of related sentences (one line per sentence) with POS annotations in the following format:

"that/DT has/VBZ never/RB happened/VBN before/RB ./."
"It/NN never/RB happened/VBN before/RB ./."
"xx/NN yy/VB ......."

To generate POS annotations in the above format, you could use the following POS Tagger. Each input file would represent one summarization task, so it should contain a set of clustered sentences. For example one file for all sentences related to the “battery life of an ipod nano”, and another for all sentences related to the “ease of use of the ipod nano”. Please make sure that you have sufficient redundancies in each input file (i.e. > 60 related sentences).
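If your tagger returns (word, tag) pairs rather than text, a small helper can write them in the word/TAG format shown above and check that a line is well formed. This is a sketch under assumptions: the function names are hypothetical, and the check tolerates the surrounding quotation marks shown in the example without assuming whether Opinosis itself requires them.

```python
def to_tagged_line(tagged_tokens):
    """Join (word, tag) pairs into a single 'word/TAG word/TAG ...' line."""
    return " ".join(f"{word}/{tag}" for word, tag in tagged_tokens)


def is_well_formed(line):
    """Return True if every whitespace-separated token looks like word/TAG."""
    tokens = line.strip().strip('"').split()
    if not tokens:
        return False
    for tok in tokens:
        word, sep, tag = tok.rpartition("/")  # split on the last slash
        if not (sep and word and tag):
            return False
    return True


line = to_tagged_line(
    [("It", "NN"), ("never", "RB"), ("happened", "VBN"), ("before", "RB"), (".", ".")]
)
```

Running `is_well_formed` over every line of an input file, and counting the lines, is a quick way to confirm both the format and the redundancy guideline of more than 60 related sentences.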

Running the Opinosis Summarizer

Assuming you have gone over the first two steps above, to start generating summaries type the following:

java -jar opinosis.jar -b <path_to_work_folder>
-b: base directory where input and output directories are found (work directory).

All the Opinosis generated summaries will be found in <path_to_work_folder>/output/. If you want to run the examples from opinosis_sample/, execute the following:

java -jar opinosis.jar -b opinosis_sample/

Opinosis Summarizer Parameter Settings

To change the various properties for summary generation, just look into the file found in the <your_work_folder>/etc/ directory. This file contains a list of configurable parameters. Here is an explanation of these parameters:

Opinosis Parameter Settings
redundancy : Controls the minimum redundancy requirement.  This enforces that a selected path contains at least the minimum specified redundancy. This has to be an absolute value. Setting this value to more than 2 is not recommended unless you have very high redundancies. This corresponds to sigma_r in the paper.
gap : Controls the maximum gap allowed between 2 adjacent nodes. If you set this to a very large value, your summaries may have grammatical issues. The recommended setting is between 2 and 5; the minimum acceptable setting is 2 and the default is 3. This corresponds to sigma_gap in the paper and has to be an absolute value.
max_summary : The number of candidates to select as the summary. This corresponds to the summary size, sigma_ss in the paper. This has to be an absolute value.
scoring_function :
Which scoring functions to use?
1-    only redundancy
2-    redundancy & path length
3-    redundancy & log(path length), default (and recommended)
collapse :
Should we collapse structures? Recall may be low when structures are not collapsed. Possible values are true or false
run_id :
This is just to give the current run a logical name. Any string describing the run would be ideal.
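To show how the three scoring_function options listed above differ, here is a simplified sketch that scores a single candidate path from its redundancy and its length. It is an illustration only, not the Opinosis implementation: the function name `score_path`, the use of a single redundancy value per path, and the natural logarithm are all assumptions.

```python
import math


def score_path(redundancy, path_length, scoring_function=3):
    """Score one candidate path under the three documented options.

    1: only redundancy
    2: redundancy & path length
    3: redundancy & log(path length)
    The log base (natural log here) is an assumption.
    """
    if scoring_function == 1:
        return redundancy
    if scoring_function == 2:
        return redundancy * path_length
    if scoring_function == 3:
        return redundancy * math.log(path_length)
    raise ValueError("scoring_function must be 1, 2 or 3")


basic = score_path(3, 4, scoring_function=1)
weighted = score_path(3, 4, scoring_function=2)
log_weighted = score_path(3, 4, scoring_function=3)
```

Under these assumptions, option 2 favors long paths strongly, while option 3 still rewards length but dampens it, which is one plausible reading of why the log variant is the recommended default.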