Sentiment Analysis

OpinoFetch: A Practical and Efficient Approach to Collecting Opinions on Arbitrary Entities


The abundance of opinions on the Web is becoming a critical source of information in a variety of application areas such as business intelligence, market research and online shopping. Unfortunately, due to the rapid growth of online content, there is no single source from which to obtain a comprehensive set of opinions about a specific entity or topic, which severely limits access to such content. While previous work has focused on mining and summarizing online opinions, there has been limited work on automatically collecting opinion content on the Web. In this paper, we propose a lightweight and practical approach to collecting opinion-containing pages, namely review pages, on the Web for arbitrary entities. We leverage existing Web search engines and use a novel information network called the FetchGraph to efficiently obtain review pages for entities of interest. Our experiments in three different domains show that our method is more effective than plain search engine results and that it collects entity-specific review pages efficiently with reasonable precision and accuracy.


The Idea

The goal of this paper is to discover review content from arbitrary sources. The intuition is that reviews are often scattered across many sites, so looking at just a few sources often results in data sparsity. OpinoFetch makes no assumptions about the type of entity it gathers reviews on or about the sources that should show up for each entity, which makes it a very general approach. In one run, you could be looking for all reviews related to cars; in the next, all review content related to restaurants. In the OpinoFetch paper, we gathered review pages in three distinct domains, namely electronics, hotels and attractions. Here is an example of the sites discovered for various entities:
[Figure: site distribution]
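The FetchGraph itself is not described on this page, so the following is only a toy sketch of the general idea of using search results for many entities to spot recurring review sites. It is not the paper's FetchGraph or its scoring. It assumes the search results have already been retrieved (the entity names and example.com URLs are made up) and keeps a page only if its site hosts candidate pages for at least `min_entities` entities, a crude, hypothetical proxy for "this is a review site".

```python
from collections import defaultdict
from urllib.parse import urlparse

# Hypothetical search-engine results: entity -> candidate page URLs.
# In practice these would come from querying a Web search engine.
SEARCH_RESULTS = {
    "Hotel A": ["https://reviews.example.com/hotel-a",
                "https://blog.example.org/my-trip",
                "https://travel.example.net/hotel-a"],
    "Hotel B": ["https://reviews.example.com/hotel-b",
                "https://news.example.org/hotel-b-opens"],
    "Hotel C": ["https://travel.example.net/hotel-c",
                "https://reviews.example.com/hotel-c"],
}


def site_of(url):
    """Use the host name as the 'site' of a page."""
    return urlparse(url).netloc


def site_entity_links(results):
    """Link each site to the set of entities it has candidate pages for."""
    links = defaultdict(set)
    for entity, urls in results.items():
        for url in urls:
            links[site_of(url)].add(entity)
    return links


def collect_review_pages(results, min_entities=2):
    """Keep pages whose site recurs across at least `min_entities` entities
    (a hypothetical heuristic, not the paper's FetchGraph scoring)."""
    links = site_entity_links(results)
    return {
        entity: [u for u in urls if len(links[site_of(u)]) >= min_entities]
        for entity, urls in results.items()
    }


if __name__ == "__main__":
    for entity, pages in collect_review_pages(SEARCH_RESULTS).items():
        print(entity, pages)
```

Here the blog and news pages drop out because each of their sites appears for only one entity, while the two sites that recur across entities are kept.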


Opinion-Based Entity Ranking

Ganesan, Kavita, and ChengXiang Zhai. “Opinion-based entity ranking.” Information Retrieval 15.2 (2012): 116-150.


The deployment of Web 2.0 technologies has led to rapid growth of opinions and reviews on the Web, such as reviews of products and opinions about people. Such content can be very useful in helping people find interesting entities such as products, businesses and people based on their individual preferences or tradeoffs. Most existing work on leveraging opinionated content has focused on integrating and summarizing opinions on entities to help users better digest them. In this paper, we propose a different way of leveraging opinionated content: directly ranking entities based on a user’s preferences. Our idea is to represent each entity with the text of all the reviews of that entity. Given a user’s keyword query that expresses the desired features of an entity, we can then rank all candidate entities based on how well the opinions on these entities match the user’s preferences. We study several methods for solving this problem, including both standard text retrieval models and some extensions of these models. Experimental results on ranking entities based on opinions in two different domains (hotels and cars) show that the proposed extensions are effective and improve ranking accuracy over the standard text retrieval models for this task.
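A minimal sketch of the baseline idea: each entity is represented by the concatenation of its reviews, and entities are ranked against a keyword query with BM25, one standard text retrieval model. The paper's extensions are not shown; the whitespace tokenizer, the toy reviews and the common defaults `k1=1.2`, `b=0.75` are assumptions.

```python
import math
from collections import Counter


def tokenize(text):
    return text.lower().split()  # naive whitespace tokenizer (assumption)


def build_entity_docs(reviews):
    """Represent each entity by the concatenated text of all its reviews."""
    return {entity: tokenize(" ".join(texts)) for entity, texts in reviews.items()}


def bm25_rank(query, docs, k1=1.2, b=0.75):
    """Return entity names sorted by BM25 score for the query, best first."""
    n = len(docs)
    avgdl = sum(len(d) for d in docs.values()) / n
    df = Counter()
    for d in docs.values():
        df.update(set(d))  # document frequency of each term
    scores = {}
    for entity, d in docs.items():
        tf = Counter(d)
        score = 0.0
        for t in tokenize(query):
            if tf[t] == 0:
                continue
            idf = math.log(1 + (n - df[t] + 0.5) / (df[t] + 0.5))
            norm = tf[t] + k1 * (1 - b + b * len(d) / avgdl)
            score += idf * tf[t] * (k1 + 1) / norm
        scores[entity] = score
    return sorted(scores, key=scores.get, reverse=True)


reviews = {
    "Hotel A": ["room was clean and quiet", "very clean bathroom"],
    "Hotel B": ["noisy street but friendly staff", "room was dirty"],
    "Hotel C": ["friendly staff", "quiet location and clean room"],
}
print(bm25_rank("clean quiet", build_entity_docs(reviews)))
```

The query here plays the role of the user's stated preferences, so the ranking reflects how strongly each hotel's reviews mention being clean and quiet.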




Comprehensive Review of Opinion Summarization

Kim, Hyun Duk, Kavita A. Ganesan, Parikshit Sondhi, and ChengXiang Zhai. “Comprehensive Review of Opinion Summarization” (Opinion Mining Survey). 2011.

This survey zooms in on recent research in opinion summarization, which is concerned with generating effective summaries of opinions so that users can quickly understand the underlying sentiments. Since summaries come in various formats, the survey divides the approaches into the commonly studied aspect-based summarization and non-aspect-based approaches (which include visualization, contrastive summarization and text summarization). The survey also lists opinion-related datasets and available demos.




Micropinion Generation: An Unsupervised Approach to Generating Ultra-Concise Summaries of Opinions

Ganesan, Kavita, ChengXiang Zhai, and Evelyne Viegas. “Micropinion generation: an unsupervised approach to generating ultra-concise summaries of opinions.” Proceedings of the 21st international conference on World Wide Web. ACM, 2012.


This paper presents a new unsupervised approach to generating ultra-concise summaries of opinions. We formulate the problem of generating such a micropinion summary as an optimization problem, in which we seek a set of concise, non-redundant phrases that are readable and represent the key opinions in the text. We measure representativeness with a modified mutual information function and model readability with an n-gram language model. We propose heuristic algorithms to solve this optimization problem efficiently. Evaluation results show that our unsupervised approach outperforms other state-of-the-art summarization methods and that the generated summaries are informative and readable.
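The paper's exact scoring functions and heuristic algorithms are not reproduced here. Assuming each candidate phrase already carries a representativeness score and a readability score, a greedy sketch of this kind of constrained selection (maximize the combined scores, stay within a word budget, avoid redundant phrases) might look like this. The thresholds, the word budget and the Jaccard redundancy test are illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    phrase: str
    rep: float   # representativeness score, assumed precomputed
    read: float  # readability score, assumed precomputed


def jaccard(a, b):
    """Word-overlap similarity between two phrases (illustrative choice)."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    return len(sa & sb) / len(sa | sb)


def micropinion_greedy(cands, max_words=10, min_rep=0.0, min_read=0.0,
                       max_sim=0.5):
    """Greedily pick high-scoring, non-redundant phrases within a word budget."""
    chosen, used = [], 0
    for c in sorted(cands, key=lambda c: c.rep + c.read, reverse=True):
        n = len(c.phrase.split())
        if c.rep < min_rep or c.read < min_read or used + n > max_words:
            continue  # fails a score threshold or would exceed the budget
        if any(jaccard(c.phrase, p.phrase) > max_sim for p in chosen):
            continue  # too similar to an already selected phrase
        chosen.append(c)
        used += n
    return [c.phrase for c in chosen]


candidates = [
    Candidate("battery life is great", 0.9, 0.8),
    Candidate("great battery life", 0.85, 0.7),
    Candidate("screen is bright", 0.6, 0.9),
    Candidate("bright screen", 0.5, 0.6),
]
print(micropinion_greedy(candidates, max_words=7))
```

Greedy selection is just one simple way to approximate such an objective; it is fast but offers no optimality guarantee.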




Linguistic Understanding of Complaints and Praises in User Reviews

Ganesan, Kavita, and Guangyu Zhou. “Linguistic Understanding of Complaints and Praises in User Reviews.” Proceedings of NAACL-HLT. 2016.

Gist of the Paper

This is a short study paper that categorizes positive and negative review sentences into four categories: positive only, praise, negative only and complaint. The intuition is that praise and complaint sentences tend to be more informative than plain positive-only or negative-only sentences. The paper therefore examines the linguistic properties of the sentences we consider complaints and praises. Our analysis yields several interesting findings, including:

  •  complaints tend to use the past tense more than the other three categories
  •  complaints and praises are generally longer and contain more nouns than positive only or negative only sentences
  •  praise sentences tend to use more adjectives than other types of sentences
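To make these findings concrete, here is a sketch of how such properties could be measured, assuming sentences have already been POS-tagged with Penn Treebank tags (`VBD` for past tense, `NN*` for nouns, `JJ*` for adjectives) by any tagger. The function names and the example sentences are illustrative; this is not the paper's feature extraction code.

```python
from statistics import mean


def sentence_features(tagged):
    """tagged: non-empty list of (token, Penn Treebank tag) pairs."""
    n = len(tagged)
    tags = [tag for _, tag in tagged]
    return {
        "length": n,
        "past_tense_ratio": sum(t == "VBD" for t in tags) / n,
        "noun_ratio": sum(t.startswith("NN") for t in tags) / n,
        "adjective_ratio": sum(t.startswith("JJ") for t in tags) / n,
    }


def category_profile(labeled):
    """labeled: list of (category, tagged_sentence); mean features per category."""
    by_cat = {}
    for cat, tagged in labeled:
        by_cat.setdefault(cat, []).append(sentence_features(tagged))
    return {cat: {k: mean(f[k] for f in feats) for k in feats[0]}
            for cat, feats in by_cat.items()}


complaint = [("The", "DT"), ("waiter", "NN"), ("ignored", "VBD"), ("us", "PRP"),
             ("and", "CC"), ("the", "DT"), ("food", "NN"), ("arrived", "VBD"),
             ("cold", "JJ")]
positive = [("Great", "JJ"), ("place", "NN")]
profile = category_profile([("complaint", complaint), ("positive only", positive)])
print(profile)
```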


  • Paper
  • Dataset – coming soon!


Mining tag clouds and emoticons behind community feedback

Ganesan, K. A., N. Sundaresan, and H. Deo, “Mining tag clouds and emoticons behind community feedback”, WWW ’08: Proceeding of the 17th international conference on World Wide Web, Beijing, China, ACM, pp. 1181–1182, 2008.


In this paper, we describe a mining system that automatically mines tags from feedback text in an e-commerce setting and renders these tags in a visually appealing manner. Emoticons are attached to the mined tags to add sentiment to the visualization.
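The paper's mining pipeline is not detailed here, so the following is a toy sketch under stated assumptions: tags are frequent word bigrams after stopword removal, each tag's emoticon comes from a tiny hypothetical sentiment lexicon, and the tag's count is what could drive its font size in the rendered cloud.

```python
from collections import Counter

# Hypothetical mini lexicons; a real system would use richer resources.
POSITIVE = {"fast", "great", "good", "excellent"}
NEGATIVE = {"slow", "bad", "broken", "late"}
STOPWORDS = {"the", "a", "was", "is", "and", "very"}


def mine_tags(feedback, top_k=3):
    """Return (tag, count, emoticon) for the most frequent word bigrams."""
    counts, polarity = Counter(), Counter()
    for text in feedback:
        words = [w.strip(".,!?").lower() for w in text.split()]
        words = [w for w in words if w and w not in STOPWORDS]
        for a, b in zip(words, words[1:]):
            tag = f"{a} {b}"
            counts[tag] += 1
            # Net polarity: +1 per positive word, -1 per negative word.
            polarity[tag] += sum(w in POSITIVE for w in (a, b))
            polarity[tag] -= sum(w in NEGATIVE for w in (a, b))
    tags = []
    for tag, count in counts.most_common(top_k):
        p = polarity[tag]
        emoticon = ":)" if p > 0 else ":(" if p < 0 else ":|"
        tags.append((tag, count, emoticon))  # count could set the font size
    return tags


feedback = [
    "Fast shipping, great seller!",
    "fast shipping and good packaging",
    "Slow shipping, item was broken",
]
print(mine_tags(feedback))
```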


Opinosis: A Graph Based Approach to Abstractive Summarization of Highly Redundant Opinions


We present a novel graph-based summarization framework (Opinosis) that generates concise abstractive summaries of highly redundant opinions. Evaluation results on summarizing user reviews show that Opinosis summaries have better agreement with human summaries compared to the baseline extractive method. The summaries are readable, reasonably well-formed and are informative enough to convey the major opinions.


The Big Idea

The Opinosis summarization framework focuses on generating very short abstractive summaries from large amounts of text. These summaries can resemble the micropinions or “micro-reviews” that you see on sites like Twitter and Foursquare. The idea of the algorithm is to represent the text to be summarized with a word graph data structure referred to as the Opinosis-Graph. The graph is then repeatedly explored to find meaningful paths, which in turn become candidate summary phrases. The Opinosis summarizer is considered a “shallow” abstractive summarizer: it is shallow because it uses the original text itself to generate summaries, but it is abstractive rather than purely extractive because the way paths are explored lets it generate phrases that were not seen in the original text. The summarization framework was evaluated on an opinion (user review) dataset. The approach itself is very general, however: it can be applied to any corpus containing a high degree of redundancy, for example, tweets or user comments on blog and news articles.
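The following is a heavily simplified sketch of the word-graph idea, not the actual Opinosis implementation. Each unique word is a node that remembers where it occurred (sentence id, position), edges link adjacent words, and a path's redundancy is the number of occurrences that can follow it with at most `gap` positions between consecutive words. It assumes whitespace tokenization, no POS tags, and made-up validity rules (start at a sentence-initial word, end at a sentence-final word), and it ranks candidates only by redundancy and length. Opinosis's real start/end rules, path collapsing and scoring are not reproduced.

```python
from collections import defaultdict


def build_graph(sentences):
    """One node per unique (lowercased) word. Each node stores its positional
    references (sentence id, word position); edges link adjacent words."""
    refs = defaultdict(list)   # word -> [(sid, pid), ...]
    edges = defaultdict(set)   # word -> words that directly follow it
    for sid, sentence in enumerate(sentences):
        words = sentence.lower().split()
        for pid, word in enumerate(words):
            refs[word].append((sid, pid))
            if pid + 1 < len(words):
                edges[word].add(words[pid + 1])
    return refs, edges


def extend(current, following, gap):
    """Keep references of the next word that occur in the same sentence, at most
    `gap` positions after a reference of the current word. The number of
    surviving references is used as the path's redundancy."""
    cur = set(current)
    return [(s, p) for s, p in following
            if any((s, q) in cur for q in range(p - gap, p))]


def opinosis_paths(sentences, min_redundancy=2, gap=2, max_len=6):
    refs, edges = build_graph(sentences)
    # Simplified validity rules (assumptions, not the paper's): a path starts at
    # a word that begins some sentence and ends at a word that ends some sentence.
    starts = {s.lower().split()[0] for s in sentences}
    ends = {s.lower().split()[-1] for s in sentences}
    found = {}

    def explore(path, path_refs):
        if len(path) >= 2 and path[-1] in ends:
            found[" ".join(path)] = len(path_refs)
        if len(path) == max_len:
            return
        for nxt in edges[path[-1]]:
            nxt_refs = extend(path_refs, refs[nxt], gap)
            if len(nxt_refs) >= min_redundancy:   # prune non-redundant paths
                explore(path + [nxt], nxt_refs)

    for word in starts:
        if len(refs[word]) >= min_redundancy:
            explore([word], refs[word])
    # Rank candidates by redundancy, then by length (a simplified score).
    return sorted(found.items(),
                  key=lambda kv: (kv[1], len(kv[0].split())), reverse=True)


reviews = [
    "the battery life is great",
    "battery life is great",
    "the battery life is very good",
    "the screen is great",
]
print(opinosis_paths(reviews))
```

Only "battery life is great" survives here, because it is the only valid path shared by at least two sentences; with real review volumes many more redundant paths would emerge.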

Here is an example of an Opinosis summary for a car (the 2007 Acura) generated using the OpinRank Edmunds dataset:
[Figure: Opinosis summary for the 2007 Acura]

Additional Thoughts

While most research projects in data mining and NLP focus on technical complexity, the focus of Opinosis was practicality: it uses a very shallow representation of text and relies mostly on redundancy to generate summaries. That is not too much to ask in an era of big data, when ample user reviews are available on the Web. Even though the Opinosis paper uses part-of-speech tags in its graph representation, you don’t have to use them at all; the algorithm will still work fine as long as you have a sufficient volume of reviews and make a few tweaks to how sentence breaks are found.


Other works using a similar graph data structure

  • Discovering Related Clinical Concepts – This paper uses a concept graph similar to the Opinosis-Graph to mine highly related clinical concepts. For example, the drug Advair is highly related to concepts like inhaler, puff, Diskus, Singulair, tiotropium, albuterol, Combivent and Spiriva. Such concepts are easily discovered using the Concept-Graph in this paper.
  • Multi-sentence compression: Finding shortest paths in word graphs
    Katja Filippova’s work was used to summarize news (Google News) in both English and Spanish, while Opinosis was evaluated on user reviews from various sources (English only). She studies the informativeness and grammaticality of the generated sentences; in a similar way, we evaluate these aspects by measuring how close Opinosis summaries are to human-composed summaries in terms of information overlap and readability (using a human assessor).
  • Peilin Yang and Hui Fang – Contextual Suggestion – Another related work uses the Opinosis algorithm to extract terms from reviews for contextual suggestion, as part of the TREC Contextual Suggestion Track. Yang and Fang obtained the highest rank and MRR scores in this track. Their paper can be found here: An Opinion-aware Approach to Contextual Suggestion. The details of the TREC run can be found here: Overview of the TREC 2013 Contextual Suggestion Track.



Leveraging large amounts of opinions for decision making

Opinion-Driven Decision Support System (ODSS) refers to the use of large amounts of online opinions to facilitate business and consumer decision making. The idea is to combine the strengths of search technologies with opinion mining and analysis tools to provide a synergistic decision-making platform. The research and engineering problems involved in developing such a system include:
  1. opinion acquisition
  2. opinion based search
  3. opinion summarization
  4. presentation of results
Opinions in this case can be aggregated from user reviews, blog comments, Facebook status updates, tweets and so on. Essentially any opinion-containing text on specific topics or entities qualifies as a candidate for building an ODSS platform. Here is a description of some of the research and engineering problems involved in developing an ODSS platform:

1. Search Capabilities Based on Opinions

The goal of opinion-based search is to help users find entities of interest based on their key requirements. Since users often choose an entity based on the opinions about it, a system that ranks entities according to a user’s personal preferences would provide more direct support for the user’s decision-making task. For example, when looking for a hotel at a destination, a user may only want to consider hotels that other people found clean. Finding and ranking hotels by how well they satisfy such a requirement would significantly reduce the number of entities under consideration, facilitating decision making. Unlike traditional search, the query in this case is a set of preferences, and the result is a set of entities that match these preferences. The challenge is to accurately match the user’s preferences with existing opinions in order to recommend the best entities. This special ranking problem is referred to as Opinion-Based Entity Ranking. Many existing opinion mining techniques could potentially be used for this new ranking task. I have explored information-retrieval-based techniques specifically for this ranking problem, and there have been a few follow-up works (from other groups) trying other approaches.

2. Opinion Summarization (i.e. Sentiment Analysis + Text Summarization)

Opinion summaries play a critical role in helping users analyze the entities under consideration (e.g., products, physicians, cars, politicians). Users often look for major concerns or advantages when selecting a specific entity, so a summary that quickly highlights the key opinions about the entity would significantly help users explore entities and make decisions. Opinion summarization has long been explored, with most techniques focused on generating summaries on a fixed set of topics; these are referred to as structured summaries. In the last few years, textual summaries of opinions have become increasingly popular. Bing Liu’s Opinion Mining Tutorial covers some of these recent works, or you can refer to point (5) of this article.
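To illustrate what a structured summary on a fixed set of topics looks like, here is a minimal sketch. It assumes sentences have already been assigned an aspect and a polarity by upstream components (not shown, hypothetical) and simply tallies them per aspect, keeping the first sentence seen for each aspect and polarity as an example. The aspect names and sentences are made up.

```python
from collections import Counter


def structured_summary(labeled, aspects):
    """labeled: (sentence, aspect, polarity) triples, polarity "positive" or
    "negative". Returns one line per aspect plus a first example sentence for
    each (aspect, polarity) pair."""
    counts = {aspect: Counter() for aspect in aspects}
    examples = {}
    for sentence, aspect, polarity in labeled:
        if aspect not in counts:
            continue  # only the fixed set of aspects is summarized
        counts[aspect][polarity] += 1
        examples.setdefault((aspect, polarity), sentence)
    lines = [f"{a}: +{counts[a]['positive']} / -{counts[a]['negative']}"
             for a in aspects]
    return lines, examples


labeled = [
    ("Rooms were spotless", "cleanliness", "positive"),
    ("Bathroom was dirty", "cleanliness", "negative"),
    ("Very clean lobby", "cleanliness", "positive"),
    ("Staff were rude", "service", "negative"),
    ("Great view from the balcony", "view", "positive"),
]
lines, examples = structured_summary(labeled, ["cleanliness", "service"])
print("\n".join(lines))
```

Opinions about topics outside the fixed list (here, the view) are simply dropped, which is the main limitation that textual summaries try to address.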

3. Opinion Acquisition (i.e. Opinion or Sentiment Crawling)

To support accurate search and analysis based on opinions, opinionated content is imperative. Relying on opinions from just one source makes the information not only unreliable but also incomplete, due to variations in opinions as well as potential bias in any specific source. Although many applications rely on large amounts of opinions, there has been very limited work on collecting and integrating a complete set of opinions. I recently explored a very simple method for collecting large amounts of opinions on arbitrary entities.

The idea of an Opinion-Driven Decision Support System (ODSS) was developed as part of my thesis. For more information, please see Kavita’s thesis.