ROUGE 2.0 Documentation – Java Package for Evaluation of Summarization Tasks


Please use the updated documentation.

ROUGE 2.0 is a Java package for the evaluation of summarization tasks. It builds on the Perl implementation of ROUGE, supporting the existing ROUGE measures as well as some updated and improved ones. This article outlines how the ROUGE 2.0 package can be used to evaluate your summarization tasks.

Before you begin…

  • Please ensure that you have downloaded the most recent version of ROUGE 2.0 before proceeding.
  • Reference summary refers to the “gold standard” or human summaries.
  • System summary refers to the machine-generated summaries.
  • You can have any number of reference summaries for each summarization task.
  • You can have any number of system summaries for each summarization task.

Step 1: Unpack ROUGE 2.0

The first step is to unpack the ROUGE 2.0 distribution to any location on disk. After unpacking ROUGE 2.0, you will see the following directory structure:

  • resources/ – This directory contains replaceable stop-word lists, POS taggers, and other language-related resources.
  • test-summarization/ – This is a test project containing “reference” and “system” folders. It currently has samples for English and Persian.
  • results.csv – This is a sample results file produced by ROUGE 2.0.
  • The properties file – This is where you configure all your settings. This is the file you will be working with most extensively (see Step 4 for details).
  • rouge2.0.jar – This is the runnable ROUGE 2.0 jar file that you can use to compute ROUGE scores.

Step 2: Getting Started – Create a project directory

To get started, create a directory structure as follows anywhere on your system:

Since we avoid any complex HTML formatting for evaluation, summaries are matched up purely by file naming convention. Please follow the instructions in the next step for the naming convention.
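As a sketch of the layout, assuming the same structure as the bundled test-summarization project (a folder for reference summaries and a folder for system summaries; the project name here is only illustrative):

```
my-summarization-project/
    reference/    <- gold-standard (human) summaries
    system/       <- machine-generated summaries
```

Point project.dir in the properties file (Step 4) at this directory.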

Step 3: Generate your system and reference summaries

In generating your system and reference summaries, please adhere to the following naming convention for evaluation. The naming convention for ROUGE 2.0 is very simple: file names combine the task name with the reference or system summary name.

Reference Summary Naming Convention

Your reference summary should be named as follows:

So if you have a summarization document called news1 and you have 3 human-composed reference summaries, the files would be similar to:
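For illustration, following the convention described below (task name, then an underscore, then a summary name of your choosing), the three reference files might be named:

```
news1_reference1.txt
news1_reference2.txt
news1_reference3.txt
```

The part after the underscore and the file extension are illustrative; only the task-name prefix and the underscore matter for matching.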

The underscore is mandatory, and the first part of the file name always refers to the summarization task (i.e. document or text to be summarized). Please see samples under test-summarization/reference/.

System Summary Naming Convention

Your system summary should be named as follows:

If you have a document called news1 (as earlier) and you have 4 systems (e.g. variations of the same summarization algorithm), each system would generate a separate summary for the same summarization task. The system files would thus be similar to:
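For illustration, using the same task-name-plus-underscore convention, the four system files might be named (the names after the underscore are illustrative):

```
news1_systemA.txt
news1_systemB.txt
news1_systemC.txt
news1_systemD.txt
```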

The underscore is mandatory, and the first part of the file name always refers to the summarization task (i.e. document or text to be summarized). Note that news1 in the reference folder matches up with news1 in the system folder. Each of the system summaries for news1 would be individually evaluated against all reference summaries for news1 in the reference folder (scores averaged). Please see samples under test-summarization/system/.

Reference and System Summary Formatting

Use one file for each reference or system summary. Each sentence in your system or reference summary should be on a separate line. There is no need to number the sentences or to place the text in an HTML file. Any file extension is permitted as long as each sentence in the file is on a separate line. For example, if your system summary file has 3 sentences, it would contain text like this:
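A hypothetical three-sentence summary file (the sentences themselves are invented for illustration) would simply be:

```
The storm made landfall early on Monday.
Thousands of residents were evacuated from coastal areas.
Officials expect power to be restored within a week.
```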

More examples can be found in test-summarization/system or test-summarization/reference

Step 4: Configure

Now, before you start evaluating your summaries, you need to configure the properties file located in the root of the ROUGE 2.0 installation. This is where you specify the ROUGE-N type that you want to evaluate, the stop words to use, the output file, synonym handling, and so on. You will be working with this file quite extensively. The table below describes the ROUGE 2.0 parameters and how to set them. The most important parameters are:

  • project.dir
  • rouge.type (and parameters related to this selection)
  • output
  • project.dir – Location of your summaries (reference and system summaries). By default it points to test-summarization within the root folder, which contains some test reference and system summaries. (Required; default: test-summarization/)
  • rouge.type – topic, topicUniq, or normal. Select normal for the typical ROUGE-N evaluation. (Required; default: normal)
  • ngram – The n-gram size. Only applicable if rouge.type=normal. (Depends; default: 1)
  • stopwords.use – Whether to remove stop words (true/false). (Optional; default: false)
  • stopwords.file – Location of the stop-words file. This can be changed based on language. By default it uses the same list as the Perl version of ROUGE. (Optional; default: resources/stopwords-rouge-default.txt)
  • topic.type – Only set this if topic or topicUniq is used. This should be the POS form in lowercase (based on Stanford’s POS tagger), for example nn|jj, jj, jj|vb, or vbp. (Depends; default: nn|jj)
  • synonyms.use – Whether to use synonyms (true/false). (Optional; default: false)
  • output – Where to send the results: file or console. (Optional; default: file)
  • outputFile – The file to write results to. Set this only if output=file. (Optional; default: results.csv)
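Putting these parameters together, a minimal properties file for a standard bigram (ROUGE-2) evaluation might look like this. The values are the defaults listed above, with ngram changed to 2 as an example:

```properties
# Location of the project directory containing reference/ and system/
project.dir=test-summarization/
# normal = standard ROUGE-N evaluation
rouge.type=normal
# n-gram size (2 = ROUGE-2); only used when rouge.type=normal
ngram=2
stopwords.use=false
stopwords.file=resources/stopwords-rouge-default.txt
synonyms.use=false
# write results to the file named by outputFile
output=file
outputFile=results.csv
```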

Step 5: Running ROUGE 2.0 Jar File

Once you have generated your system summaries and formatted your reference summaries as specified in Step 3, and configured the ROUGE properties file as in Step 4, the next step is to run the evaluation package. Assuming ROUGE 2.0 was unpacked into C:\projects\rouge2.0, you can execute ROUGE 2.0 from any Linux or Windows machine as follows:
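As a minimal sketch, assuming a standard Java runtime is on your path and that the jar picks up the properties file from the installation root (as noted below), the invocation is along these lines:

```shell
# from the ROUGE 2.0 installation directory
cd C:\projects\rouge2.0
java -jar rouge2.0.jar
```

On Linux, substitute the equivalent path for the cd step.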

This step uses all the system summaries and corresponding reference summaries from your project directory defined in Step 2 and computes the appropriate ROUGE scores (as specified in the properties file). By default, the properties file in the root of the ROUGE 2.0 installation will be used. You can either modify this copy, or use a properties file located elsewhere on disk as follows:

The output of the evaluation is printed to the console, and also written to a file if output=file is set in the properties file. Here is an example of the output printed to the screen:

Here is an example of the results file produced:

By default, if outputFile in the properties file is not changed, the output will be written to results.csv in the root of the ROUGE 2.0 installation.

Running ROUGE 2.0 for Unicode Texts (Persian, Tamil, etc.)

One of the problems with the original Perl version of ROUGE is that it does not support evaluation of Unicode-based texts, because of the way the text is tokenized. This package has been tested with Persian texts and thus works in cases where the original Perl package fails. To make sure that this package works for Unicode texts, please ensure that:

  • In the properties file, synonyms.use is set to false. When this is true, ROUGE 2.0 tries to POS-tag the text, and if there is no suitable POS tagger among the Stanford POS-tagging libraries, this will cause issues.

Other than this setting, you are ready to go! Just follow Steps 1–5 as described above.