Word Cloud in Python for Jupyter Notebooks and Web Apps

About a year ago, I looked high and low for a Python word cloud library that I could use from within my Jupyter notebook, one flexible enough to use counts or TF-IDF when needed, or to simply accept a set of words and corresponding weights. I was a bit surprised that nothing like that already existed within libraries like plotly. All I wanted was to get a quick understanding of my text data and word vectors. That’s probably not too much to ask?

Here I am a year later, using my own word_cloud visualization library. It’s not the prettiest or the most sophisticated, but it works for my use cases. I decided to share it so that others could use it as well. After installation, here are a few ways you could use it:

Generate word clouds with a single text document

This example showcases how you can generate a word cloud from just one document. The colors can be randomized, but this example uses the default color settings. By default, words are weighted by their counts unless you explicitly ask for TF-IDF weighting. TF-IDF weighting makes sense only if you have a lot of documents to start with.

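from word_cloud.word_cloud_generator import WordCloud
from IPython.display import HTML
#ENGLISH_STOP_WORDS is assumed here to be scikit-learn's built-in stop word list
from sklearn.feature_extraction.text import ENGLISH_STOP_WORDS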

#only one news article here, passed as a one-element list
texts=["""MEXICO CITY — Newly formed Hurricane Willa rapidly intensified off Mexico's Pacific coast Sunday and early Monday and became a major Category 5 storm, the U.S. National Hurricane Center said. As of 11 a.m. ET., Willa had maximum sustained winds of 160 mph -- just 3 mph over the threshold for a Category 5.
Willa was "potentially catastrophic," forecasters warned. The hurricane center said it could make landfall along Mexico's southwestern coast Tuesday afternoon or evening and bring with it a life-threatening storm surge -- especially near and to the south of where the center of Willa makes landfall.
Near the coast, the surge will be accompanied by large and destructive waves. Willa is also forecast to bring high winds and heavy rainfall.
"Slight weakening is forecast to begin on Tuesday, but Willa is expected to be an extremely dangerous major hurricane when it reaches the coast of Mexico," the center said.
A map made by the U.S. National Hurricane Center shows the projected path for Hurricane Willa as of 11 a.m. ET on Oct. 22, 2018. NATIONAL HURRICANE CENTER
The center said Willa was about 175 miles south-southwest of Las Islas Marias, Mexico, and some 135 miles southwest of Cabo Corrientes, Mexico, and was moving north at about 7 mph.
Hurricane-force winds extended outward up to 30 miles from the center and tropical-storm-force winds extended outward up to 105 miles.
A hurricane warning was posted for a stretch of shore between San Blas and Mazatlan. A tropical storm warning was in effect for Playa Perula to San Blas and north of Mazatlan to Bahia Tempehuaya.
Forecasters said Willa is expected to produce storm total rainfall accumulations of 6 to 12 inches, with local amounts up to 18 inches, across portions of western Jalisco, western Nayarit, and southern Sinaloa in Mexico. The rainfall could cause life-threatening flash flooding and landslides.
Farther inland, Willa is expected to produce rainfall amounts of 2 to 4 inches across portions of Zacateca, Durango, southeast Chihuahua, and Coahuila in Mexico, with local amounts up to 6 inches possible. That could cause life-threatening flash flooding.
After Willa makes its way across Mexico, it could drop between 1 and 3 inches of rain on central and southern Texas during the middle of the week, CBS News contributing meteorologist Jeff Berardelli reports. The additional rainfall could cause additional flooding in already saturated areas."""]

wc=WordCloud(use_tfidf=False,stopwords=ENGLISH_STOP_WORDS)

#don't randomize color, show only top 50
embed_code=wc.get_embed_code(text=texts,random_color=False,topn=50)
HTML(embed_code)
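
To see why TF-IDF only pays off with many documents, here is a minimal sketch of the textbook formula (term count times log of inverse document frequency). It is an illustration only: the helper names `tokenize` and `tfidf_weights` are made up for this post, and I am not claiming this is how word_cloud computes its weights internally. Libraries such as scikit-learn smooth the idf term, so their numbers differ, but with one document every term still gets the same idf.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase a string and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def tfidf_weights(documents):
    """Return one {term: tf-idf} dict per document (unsmoothed textbook formula)."""
    tokenized = [tokenize(doc) for doc in documents]
    n_docs = len(tokenized)
    # Document frequency: in how many documents does each term appear?
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    weights = []
    for tokens in tokenized:
        counts = Counter(tokens)
        weights.append({
            term: count * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights


# With a single document, every term has idf = log(1/1) = 0,
# so the weighting carries no information at all.
single = tfidf_weights(["hurricane willa hurricane warning"])

# With several documents, a term shared by all of them drops to zero,
# while terms specific to one document keep a positive weight.
several = tfidf_weights([
    "hurricane willa hurricane warning",
    "hurricane michael storm surge",
    "hurricane season forecast",
])
```

In the second call, "hurricane" appears in every document and is pushed down, while "willa" stands out in the first one. That contrast is exactly what a single document cannot give you, which is why plain counts are the sensible default there.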

Generate word clouds from multiple documents

Let’s say you have 100 documents from one news category and you just want to see what the common mentions are.

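from word_cloud.word_cloud_generator import WordCloud
from IPython.display import HTML
from nltk.corpus import reuters
import nltk
#ENGLISH_STOP_WORDS is assumed here to be scikit-learn's built-in stop word list
from sklearn.feature_extraction.text import ENGLISH_STOP_WORDS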

wc=WordCloud(use_tfidf=False,stopwords=ENGLISH_STOP_WORDS)

nltk.download('reuters')

#get the ids of all articles related to coffee
category_docs = reuters.fileids("coffee")

#use raw content from up to 100 documents (slicing is safe even if fewer exist)
list_of_documents=[reuters.raw(document_id) for document_id in category_docs[:100]]

embed_code=wc.get_embed_code(text=list_of_documents,random_color=True,topn=50)
HTML(embed_code)
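
If you want a rough idea of what "common mentions" means when counts are pooled across documents, the sketch below does it by hand with the standard library. The `STOP_WORDS` set and the `top_terms` helper are hypothetical and exist only for this illustration; the library's own tokenization and stop word handling may well differ.

```python
import re
from collections import Counter

# Hypothetical, tiny stop word list for illustration only.
STOP_WORDS = {"the", "a", "of", "in", "and", "to", "for", "after"}


def top_terms(documents, topn=3, stop_words=STOP_WORDS):
    """Count terms across all documents and return the topn most frequent."""
    counts = Counter()
    for doc in documents:
        tokens = re.findall(r"[a-z]+", doc.lower())
        counts.update(t for t in tokens if t not in stop_words)
    return counts.most_common(topn)


docs = [
    "Coffee exports rose in Brazil",
    "Brazil coffee quota talks continue",
    "Coffee prices fall after the quota talks",
]
top = top_terms(docs)
print(top)
```

Here "coffee" comes out on top because it appears in all three toy documents, which is the same kind of signal the count-based cloud shows at a larger scale.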

Generate word clouds from existing weights

Let’s say you have a set of words with corresponding weights, and you just want to visualize them. All you need to do is make sure that the weights are normalized to the range [0, 1].

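from word_cloud.word_cloud_generator import WordCloud
from IPython.display import HTML
import pandas as pd
#ENGLISH_STOP_WORDS is assumed here to be scikit-learn's built-in stop word list
from sklearn.feature_extraction.text import ENGLISH_STOP_WORDS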

wc=WordCloud(use_tfidf=False,stopwords=ENGLISH_STOP_WORDS)

#words with corresponding weights
list_of_scores=[['nice-work',0.2],['great-job',0.7],['cool-place',0.1],['cool-cloud',0.6],['phrase-cloud',0.34],['word-cloud',0.625],['nice-colors',0.525],['small-font',0.4],['fun-place',0.6],['awesome',0.4],['intelligent',0.4],['medium-font',0.4],['crazy',0.2],['smart',0.3],['ambitious',0.4]]

embed_code=wc.get_embed_code(text_scores=pd.DataFrame(list_of_scores),random_color=True,topn=50)
HTML(embed_code)
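
If your raw scores are not already in the [0, 1] range, one simple option is to divide every score by the largest one. The `normalize_scores` helper below is a hypothetical name used only for this sketch, and it assumes all scores are non-negative; other schemes, such as min-max scaling, would work too.

```python
def normalize_scores(raw_scores):
    """Scale raw [word, score] pairs so that the largest score becomes 1.0.

    Assumes every score is non-negative and at least one is positive.
    """
    if any(score < 0 for _, score in raw_scores):
        raise ValueError("expected non-negative scores")
    max_score = max(score for _, score in raw_scores)
    if max_score == 0:
        raise ValueError("expected at least one positive score")
    return [[word, score / max_score] for word, score in raw_scores]


# Raw counts (or any other non-negative scores) for a few words.
raw = [["word-cloud", 25], ["phrase-cloud", 10], ["awesome", 5]]
scaled = normalize_scores(raw)
# scaled == [["word-cloud", 1.0], ["phrase-cloud", 0.4], ["awesome", 0.2]]
```

The resulting list has the same shape as list_of_scores in the example above, so it can be wrapped in pd.DataFrame and passed as text_scores in the same way.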

Please feel free to propose changes to prettify the output — just open a pull request with your changes.
