OpinoFetch: A Practical and Efficient Approach to Collecting Opinions on Arbitrary Entities


The abundance of opinions on the Web is becoming a critical source of information in application areas such as business intelligence, market research and online shopping. Unfortunately, due to the rapid growth of online content, there is no single source from which to obtain a comprehensive set of opinions about a specific entity or topic, which severely limits access to such content. While previous work has focused on mining and summarizing online opinions, there has been limited work on automatically collecting opinion content on the Web. In this paper, we propose a lightweight and practical approach to collecting opinion-containing pages, namely review pages, on the Web for arbitrary entities. We leverage existing Web search engines and use a novel information network called the FetchGraph to efficiently obtain review pages for entities of interest. Our experiments in three different domains show that our method is more effective than plain search engine results and that we are able to collect entity-specific review pages efficiently with reasonable precision and accuracy.


The Idea

The goal of this paper is to discover review content from arbitrary sources. The intuition is that reviews are often scattered across the Web, so looking into just a few sources would often result in data sparsity problems. The OpinoFetch approach makes no assumption about the type of entity it can gather reviews on or the sources that should show up for each entity, which makes it a very general approach. In one run, you could be looking for all reviews related to cars; in the next, it could be all review content related to restaurants. In the OpinoFetch paper, we looked into gathering review pages in three distinct domains, namely electronics, hotels and attractions. This is an example of the sites discovered for various entities:

[Figure: site distribution]
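
To make the notion of a site distribution concrete, here is a minimal sketch of how review-page URLs collected for several entities could be tallied by host. It is not the paper's implementation and does not involve the FetchGraph. The function name `site_distribution`, the input structure, the entity names and the URLs are all hypothetical placeholders chosen for illustration.

```python
from collections import Counter
from urllib.parse import urlparse


def site_distribution(pages_by_entity):
    """Count how many collected review pages come from each host.

    pages_by_entity maps an entity name to a list of review-page URLs.
    This structure is an assumption made for the sketch.
    """
    counts = Counter()
    for urls in pages_by_entity.values():
        for url in urls:
            host = urlparse(url).netloc.lower()
            # Treat "www.site" and "site" as the same source.
            if host.startswith("www."):
                host = host[len("www."):]
            if host:
                counts[host] += 1
    return counts


# Hypothetical placeholder data; these are not results from the paper.
collected = {
    "hotel A": [
        "https://www.reviews-one.example/hotel-a",
        "https://travel-blog.example/stay-at-hotel-a",
    ],
    "camera B": [
        "https://reviews-one.example/camera-b",
        "https://gadget-forum.example/t/camera-b",
    ],
}

for host, n in site_distribution(collected).most_common():
    print(host, n)
```

Because the tally only looks at URLs, the same function works whether a run targets hotels, electronics or any other kind of entity, which mirrors the entity-agnostic spirit described above.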


author = {Ganesan, Kavita and Zhai, ChengXiang},
year = {2015},
month = {10},
pages = {},
title = {OpinoFetch: a practical and efficient approach to collecting opinions on arbitrary entities},
volume = {18},
journal = {Information Retrieval Journal}