Datasets for Text Mining and NLP Tasks

Stack Overflow Dataset

This dataset contains 20,000 Stack Overflow questions in JSON, each with 19 attributes, including the post body, title, and tags.
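As a rough illustration of how question records like these might be loaded and inspected, here is a minimal Python sketch. The field names (`title`, `body`, `tags`), the `|` tag separator, and the sample records are all assumptions made for illustration; the actual file layout (a single JSON array versus one JSON object per line) and attribute names should be checked against the downloaded data.

```python
import json
from collections import Counter


def parse_records(text):
    """Parse either a JSON array or JSON Lines text into a list of dicts."""
    text = text.strip()
    if text.startswith("["):
        return json.loads(text)
    return [json.loads(line) for line in text.splitlines() if line.strip()]


def tag_counts(records, tag_field="tags", sep="|"):
    """Count tag frequencies; tags may be a list or a delimited string.

    The field name and separator are hypothetical defaults.
    """
    counts = Counter()
    for rec in records:
        tags = rec.get(tag_field) or []
        if isinstance(tags, str):
            tags = [t for t in tags.split(sep) if t]
        counts.update(tags)
    return counts


# Made-up sample records standing in for the real file contents.
sample = """
{"title": "How do I tokenize text?", "body": "...", "tags": "python|nlp"}
{"title": "Parsing JSON in Java", "body": "...", "tags": "java|json"}
{"title": "Stemming vs lemmatization", "body": "...", "tags": "nlp"}
"""

records = parse_records(sample)
print(len(records), "records with attributes:", sorted(records[0].keys()))
print(tag_counts(records).most_common(3))
```

With the real dataset, the `sample` string would be replaced by the text read from the downloaded file.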

User Review Datasets

If you are looking for user review datasets for opinion analysis or sentiment analysis tasks, quite a few are available. The datasets below contain reviews from Rotten Tomatoes, Amazon, TripAdvisor, Yelp, and other sources. …

Opinosis Dataset – Topic related review sentences

This dataset contains sentences extracted from user reviews on a given topic. Example topics are “performance of Toyota Camry” and “sound quality of iPod Nano”. In total there are 51 such topics, each with approximately 100 sentences on average. The reviews were obtained from various sources, including TripAdvisor (hotels) and other sites covering cars and various electronics. This dataset was used for an automatic text summarization project.