What is ROUGE and how does it work for evaluating summaries?

ROUGE stands for Recall-Oriented Understudy for Gisting Evaluation. It is essentially a set of metrics for evaluating automatic summarization of texts as well as machine translation. It works by comparing an automatically produced summary or translation against a set of reference summaries (typically human-produced). This article provides an intuitive explanation of how ROUGE works.
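As a rough illustration of the comparison described above, here is a minimal sketch of a ROUGE-1 style score (unigram overlap) against a single reference. The function name `rouge_1` and the whitespace tokenization are assumptions made for this example; real ROUGE implementations also handle stemming, multiple references, and longer n-grams.

```python
from collections import Counter

def rouge_1(candidate: str, reference: str) -> dict:
    """Unigram-overlap scores between a candidate and one reference (illustrative only)."""
    # Assumption: lowercase + whitespace split stands in for real tokenization.
    cand_counts = Counter(candidate.lower().split())
    ref_counts = Counter(reference.lower().split())

    # Clipped overlap: each word counts at most as often as it appears in both texts.
    overlap = sum((cand_counts & ref_counts).values())

    recall = overlap / max(sum(ref_counts.values()), 1)
    precision = overlap / max(sum(cand_counts.values()), 1)
    f1 = 0.0 if overlap == 0 else 2 * precision * recall / (precision + recall)
    return {"recall": recall, "precision": precision, "f1": f1}

scores = rouge_1(
    candidate="the cat was found under the bed",
    reference="the cat was under the bed",
)
print(scores)
```

Recall here answers "how much of the reference did the system summary recover?", which is why ROUGE is described as recall-oriented.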

What is text similarity?

When talking about text similarity, different people have slightly different notions of what it means. In essence, the goal is to compute how ‘close’ two pieces of text are in (1) meaning or (2) surface form. The first is referred to as semantic similarity and the second as lexical similarity. Although the methods for lexical similarity …
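To make lexical similarity concrete, here is a minimal sketch using Jaccard similarity over word sets. Jaccard is only one possible lexical measure and is chosen here as an assumption for illustration; the function name `jaccard_similarity` is hypothetical.

```python
def jaccard_similarity(text_a: str, text_b: str) -> float:
    """Share of distinct words that two texts have in common (illustrative only)."""
    # Assumption: lowercase + whitespace split stands in for real tokenization.
    words_a = set(text_a.lower().split())
    words_b = set(text_b.lower().split())
    if not words_a and not words_b:
        return 1.0  # two empty texts are treated as identical
    return len(words_a & words_b) / len(words_a | words_b)

similarity = jaccard_similarity("the cat sat on the mat", "the cat lay on the rug")
print(similarity)
```

Note that this only measures surface overlap: two sentences with the same meaning but different wording would score low.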

Leveraging large amounts of opinions for decision making

Opinion Driven Decision Support System (ODSS) refers to the use of large amounts of online opinions to facilitate business and consumer decision making. The idea is to combine the strengths of search technologies with opinion mining and analysis tools to provide a synergistic decision making platform. The research and engineering problems related to developing such …

What are Stop Words?

When working with text mining applications, we often hear the terms “stop words”, “stop word list”, or even “stop list”. Stop words are basically a set of commonly used words in any language, not just English. The reason why stop words are critical to many applications is that, if we remove the words that …
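A minimal sketch of filtering stop words from a piece of text is shown below. The `STOP_WORDS` set is a tiny hypothetical list invented for this example, not a standard stop list, and `remove_stop_words` is an illustrative name.

```python
# Assumption: a tiny hand-picked stop list, for illustration only.
STOP_WORDS = {"a", "an", "the", "is", "of", "and", "to", "in", "on"}

def remove_stop_words(text: str) -> list[str]:
    """Return the lowercase tokens of `text` that are not in STOP_WORDS."""
    # Assumption: lowercase + whitespace split stands in for real tokenization.
    return [token for token in text.lower().split() if token not in STOP_WORDS]

filtered = remove_stop_words("The cat is on the mat")
print(filtered)
```

In practice the stop list is usually chosen per language and per application rather than fixed.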

Abstractive Summarization Papers

While much work has been done in the area of extractive summarization, there has been limited study of abstractive summarization, as this is much harder to achieve (going by the definition of true abstraction). This page contains a very small collection of summarization methods that are non-extractive…

User Review Datasets

If you are looking for user review datasets for opinion analysis / sentiment analysis tasks, there are quite a few out there. The datasets below contain reviews from Rotten Tomatoes, Amazon, TripAdvisor, Yelp, Edmunds.com and so on. Here are some of the many datasets available out there: Dataset Domain Description Courtesy Of Movie Reviews Data …

Useful tips on using MEAD Summarization Toolkit

These are some handy notes for MEAD. What is MEAD? MEAD is a publicly available framework for summarization. It is not really an ‘algorithm’. By default (I guess when it was first implemented) it was developed based on a centroid-based approach…