OpinoFetch: A Practical and Efficient Approach to Collecting Opinions on Arbitrary Entities

OpinoFetch: A Practical and Efficient Approach to Collecting Opinions on Arbitrary Entities, Ganesan, Kavita A., and Zhai, ChengXiang, Information Retrieval Journal (2015)


The abundance of opinions on the Web is now becoming a critical source of information in a variety of application areas such as business intelligence, market research and online shopping. Unfortunately, due to the rapid growth of online content, there is no one source to obtain a comprehensive set of opinions about a specific entity or a topic, making access to such content severely limited. While previous works have focused on mining and summarizing online opinions, there is limited work on exploring the automatic collection of opinion content on the Web. In this paper, we propose a lightweight and practical approach to collecting opinion-containing pages, namely review pages on the Web, for arbitrary entities. We leverage existing Web search engines and use a novel information network called the FetchGraph to efficiently obtain review pages for entities of interest. Our experiments in three different domains show that our method is more effective than plain search engine results and that we are able to collect entity-specific review pages efficiently with reasonable precision and accuracy.

The Big Idea

OpinoFetch is a practical framework for collecting review content for arbitrary entities. In a business intelligence use case, such as gathering user reviews for all Apple products, OpinoFetch can obtain review pages from across the Web for the products of interest. The same framework can also be applied to obtain review pages for businesses (e.g., reviews for a set of restaurants) or for people (e.g., reviews of physicians). The framework is flexible in that you can specify a mixed set of entities from very different domains. OpinoFetch can fetch blog pages containing reviews, review pages from user or expert review sites, reviews from e-commerce sites, and other sources.


  • Source code
  • Dataset