Software

Opinosis Text Summarization Web API

The Opinosis REST API is available to all academic researchers. You can access the API with a command line tool such as cURL, or from any programming language using standard HTTP request and response libraries.

The advantage of the REST API over the Java jar file is that you can integrate the API directly into your code base, which makes evaluation and building on the API output much easier. Please follow these steps to start using the API:

Steps for Consuming API

  • Create a Mashape account (Mashape manages API keys and access) and subscribe to the basic plan of this API. Usage is free up to a certain limit.
  • Use the examples below to start making requests to the Opinosis Web API.
  • Use this page to learn how to set the Opinosis parameters.

Example JSON Input & Output:

The Opinosis Web API accepts a JSON input and returns a JSON output.

Here is the sample request: JSON request.
And here is the sample response for that request: JSON response.

Example JSON Request using cURL

Example JSON Request using Python
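
Below is a minimal sketch using the Python requests library. The endpoint URL, header name, and JSON field names are placeholders rather than the actual API definition; substitute the values shown on your Mashape dashboard and in the sample JSON request above.

import requests

# Placeholder endpoint and key -- use the values from your Mashape dashboard
API_URL = "https://opinosis.example.p.mashape.com/summarize"
API_KEY = "<your_mashape_key>"

# Illustrative payload; the real field names are defined by the sample JSON request
payload = {
    "text": "the battery life is great . the battery life is very long ...",
    "redundancy": 2,
    "gap": 3,
}

headers = {
    "X-Mashape-Key": API_KEY,  # Mashape expects your API key in a request header
    "Content-Type": "application/json",
}

response = requests.post(API_URL, json=payload, headers=headers)
response.raise_for_status()
print(response.json())  # JSON output containing the generated summary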

Example JSON Request using PHP

Opinosis Summarization Demo Software (Command Line Jar)

The Opinosis Summarizer Software is a demo version of a summarizer that generates concise abstractive summaries of highly redundant text. It was primarily used to summarize opinions, and thus it can be regarded as an opinion summarization tool. However, since the underlying approach is general and assumes no domain knowledge, it can be used on any highly redundant text (e.g., Twitter comments, or comments on blog and news articles) with a few minor tweaks. Note that this requires code changes. The demo version mainly works with user reviews.

The Opinosis Summarizer is a simple jar file. All it requires is a defined work directory, which holds all the input files, output files, and any other resources. The following instructions will guide you through generating summaries using the Opinosis Summarizer. Please also note that the jar file has to be run from the command line and cannot be integrated into your existing code base; the Web API version allows you to do that.

Platform: platform independent
Required Software: JRE 1.6 and above
License: Demo


Opinosis Summarizer Usage


Download Library & Set-Up Directory Structure

Once you have unpacked the zip file, you will see the following items in the directory:

opinosis_lib/ – Contains helper jar files
opinosis.jar – The library that performs the summarization task
documentation.pdf – Set-up instructions
opinosis_sample/ – Sample directory structure of the work directory

Now you need to define a new work directory similar to opinosis_sample, with the following directory structure:

<your_work_folder>/
input/  - All the text to be summarized. One file per document.
output/ - Summarization Results (opinosis summaries)
etc/    - Other resources like opinosis.properties will be stored here.

Now copy the opinosis.properties file from opinosis_sample/etc/ into <your_work_folder>/etc/. This file contains all the application-specific settings; see below for how to change them.
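
If you prefer to script this set-up step, here is a minimal Python sketch that creates the same directory layout and copies the sample settings file. The work folder name my_opinosis_work and the relative location of the unpacked opinosis_sample directory are assumptions made for illustration.

import shutil
from pathlib import Path

# <your_work_folder>; the name is arbitrary
work = Path("my_opinosis_work")

# create the input/, output/ and etc/ sub-directories described above
for sub in ("input", "output", "etc"):
    (work / sub).mkdir(parents=True, exist_ok=True)

# copy the default settings shipped with the zip into your work folder
shutil.copy("opinosis_sample/etc/opinosis.properties", work / "etc" / "opinosis.properties")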


Set-up Input Files

Currently, Opinosis only accepts POS-annotated sentences as input. We assume that each input file contains a set of related sentences (one line per sentence) with POS annotations in the following format:

"that/DT has/VBZ never/RB happened/VBN before/RB ./."
"It/NN never/RB happened/VBN before/RB ./."
"xx/NN yy/VB ......."

To generate POS annotations in the above format, you could use the following POS Tagger. Each input file represents one summarization task, so it should contain a set of clustered sentences: for example, one file for all sentences related to the “battery life of an iPod Nano”, and another for all sentences related to the “ease of use of the iPod Nano”. Please make sure that you have sufficient redundancy in each input file (i.e., more than 60 related sentences).
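
As one way to produce this format (NLTK is an assumption here, not necessarily the POS Tagger referred to above), the sketch below uses NLTK's Penn Treebank tagger to write one word/TAG-annotated sentence per line. The input file name and the work folder are hypothetical.

import nltk

# one-time downloads of the tokenizer and tagger models
nltk.download("punkt")
nltk.download("averaged_perceptron_tagger")

# a cluster of related sentences about a single topic
sentences = [
    "that has never happened before .",
    "It never happened before .",
]

# write one POS-annotated sentence per line, e.g. "that/DT has/VBZ never/RB ..."
with open("my_opinosis_work/input/battery_life_ipod_nano.txt", "w") as out:
    for sentence in sentences:
        tagged = nltk.pos_tag(nltk.word_tokenize(sentence))
        out.write(" ".join(word + "/" + tag for word, tag in tagged) + "\n")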


Running the Opinosis Summarizer

Assuming you have gone through the first two steps above, type the following to start generating summaries:

java -jar opinosis.jar -b <path_to_work_folder>
-b: base directory where input and output directories are found (work directory).

All the Opinosis-generated summaries will be found in <path_to_work_folder>/output/. If you want to run the examples from opinosis_sample/, execute the following:

java -jar opinosis.jar -b opinosis_sample/


Opinosis Summarizer Parameter Settings

To change the various properties for summary generation, edit the opinosis.properties file found in the <your_work_folder>/etc/ directory. This file contains a list of configurable parameters, which are explained below:

redundancy : Controls the minimum redundancy requirement. This enforces that a selected path contains at least the minimum specified redundancy. This has to be an absolute value. Setting this value to more than 2 is not recommended unless you have very high redundancies. This corresponds to sigma_r in the paper.
gap : Controls the minimum gap allowed between 2 adjacent nodes. If you set this to a very large value, your summaries may have grammatical issues. The recommended setting is between 2 and 5; the minimum acceptable setting is 2, and the default is 3. This corresponds to sigma_gap in the paper and has to be an absolute value.
max_summary : The number of candidates to select as the summary. This corresponds to the summary size, sigma_ss in the paper. This has to be an absolute value.
scoring_function : Which scoring function to use?
    1 - redundancy only
    2 - redundancy & path length
    3 - redundancy & log(path length), the default (and recommended)
collapse : Should structures be collapsed? Recall may be low when structures are not collapsed. Possible values are true or false.
run_id : Gives the current run a logical name. Any string describing the run would be ideal.
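
Putting these together, a minimal opinosis.properties might look like the sketch below. The values follow the recommendations above, but max_summary and run_id are purely illustrative, and the exact key/value syntax should be checked against the copy shipped in opinosis_sample/etc/.

redundancy=2
gap=3
max_summary=2
scoring_function=3
collapse=true
run_id=ipod_nano_battery_life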

rouge2csv – Script to Interpret ROUGE Scores

This is a Perl script that helps interpret ROUGE scores generated by the original Perl implementation of ROUGE. If you need instructions on how to set up ROUGE to evaluate your summarization tasks, go here.

Assuming you have piped all your ROUGE results to a file, this tool collects the ROUGE scores into separate CSV files depending on the n-grams used. For example, all ROUGE-1 scores will be collected into a ROUGE-1.csv file, and all ROUGE-2 scores will be in ROUGE-2.csv. The precision, recall, and F-scores will be comma separated, which allows you to easily visualize your results in Excel or OpenOffice. If you have ROUGE scores from identical runs (as usually happens when you use jackknifing), the scores will be averaged.

Here is a sample input and corresponding output file: [ Input | Output ]. You will notice multiple results with the same run id + ROUGE-N combination in the input file. This is due to the jackknifing procedure that I used. In the output, however, you will see only one instance, as the scores have been averaged. If you do not use jackknifing, you will most likely have one ROUGE score per run, so you need not worry about this.