If you are looking for user review datasets for opinion analysis or sentiment analysis tasks, there are quite a few out there. The datasets below contain reviews from Rotten Tomatoes, Amazon, TripAdvisor, Yelp, Edmunds.com and so on. Here are some of the many datasets available:
Dataset | Domain | Description | Courtesy Of |
---|---|---|---|
Movie Reviews Data Set | Movies | A collection of movie reviews used for various opinion analysis tasks. The reviews are split into positive and negative classes, and also into subjective and objective sentences. This dataset was initially used to predict polarity ratings (+ve/-ve). | Pang & Lee |
Multi-Domain Sentiment Dataset | Products (books, DVDs, ...) | Product reviews from Amazon.com covering various product types (such as books, DVDs and musical instruments). The data has been split into positive and negative reviews. There are more than 100,000 reviews in this dataset, each with its corresponding star rating. This dataset was initially used to predict polarity ratings (+ve/-ve). | Blitzer et al. |
LARA Review Dataset | Hotels & Products | Reviews from Amazon.com and TripAdvisor. It contains attributes such as author name, content, date and ratings. This dataset was initially used to decompose user reviews into preference ratings on aspects. | Wang et al. |
Opinosis Review Dataset | Hotels, Cars, Electronics | Topic-related sentences extracted from user reviews. You will find 51 topics with approximately 100 sentences each on average. The reviews were obtained from multiple sources: TripAdvisor (hotels), Edmunds.com (cars) and Amazon.com (various electronics). This dataset was used for text summarization of opinions. | Ganesan et al. |
OpinRank Tripadvisor and Edmunds.com Dataset | Hotels & Cars | Reviews of cars and hotels collected from TripAdvisor (~259,000 reviews) and Edmunds (~42,230 reviews). For cars, the extracted fields include dates, author names, favorites and the full textual review. For hotels, the fields include the date, review title and full review text; gold standard judgments for ranking are also included. This dataset was initially used for opinion-based entity ranking. | Ganesan & Zhai |
Restaurant Review Dataset | Restaurants | Contains a total of 52,077 reviews. The fields include rating information, review counts, percent and cuisine type. | Elhadad |
SNAP Review Dataset | Products | Contains 34,686,770 Amazon user reviews from 6,643,669 users. This dataset was initially used for recommendation systems. | McAuley |
MovieLens Dataset | Movies | Please note that the review text is not available. | GroupLens Research Project at the University of Minnesota |
Micropinion Generation Dataset (CNET) | Electronics | 330 review texts. The reviews cover products from various categories such as TVs, cell phones and GPS devices. This dataset was used for text summarization of opinions. | Ganesan & Zhai |