About a year ago, I looked high and low for a Python word cloud library that I could use from within my Jupyter notebook. I wanted it to be flexible enough to weight words by counts or by TF-IDF when needed, or to simply accept a set of words and their corresponding weights. I was a bit surprised that nothing like that already existed in libraries like plotly. All I wanted was a quick way to understand my text data and word vectors. That’s probably not too much to ask, right?
Here I am a year later, using my own Python-based word_cloud visualization library. It’s simple and works for many use cases, so I decided to share it for others to use as well. After installation, here are a few ways you can use it:
Generate a word cloud from a single text document
This example shows how to generate a word cloud from just one document. The colors can be randomized, but here they follow the default color settings. By default, words are weighted by their counts unless you explicitly ask for TF-IDF weighting (use_tfidf=True). TF-IDF weighting only makes sense when you have many documents to start with: the inverse document frequency measures how many documents a word appears in, so with a single document every word gets the same IDF and the weights reduce to counts.
```python
from IPython.display import HTML
from sklearn.feature_extraction.text import ENGLISH_STOP_WORDS  # stop list passed to WordCloud below
from word_cloud.word_cloud_generator import WordCloud

# only one news article here
texts = ["""MEXICO CITY -- Newly formed Hurricane Willa rapidly intensified off Mexico's Pacific coast Sunday and early Monday and became a major Category 5 storm, the U.S. National Hurricane Center said.
As of 11 a.m. ET., Willa had maximum sustained winds of 160 mph -- just 3 mph over the threshold for a Category 5. Willa was "potentially catastrophic," forecasters warned.
The hurricane center said it could make landfall along Mexico's southwestern coast Tuesday afternoon or evening and bring with it a life-threatening storm surge -- especially near and to the south of where the center of Willa makes landfall.
Near the coast, the surge will be accompanied by large and destructive waves. Willa is also forecast to bring high winds and heavy rainfall.
"Slight weakening is forecast to begin on Tuesday, but Willa is expected to be an extremely dangerous major hurricane when it reaches the coast of Mexico," the center said.
A map made by the U.S. National Hurricane Center shows the projected path for Hurricane Willa as of 11 a.m. ET on Oct. 22, 2018.
The center said Willa was about 175 miles south-southwest of Las Islas Marias, Mexico, and some 135 miles southwest of Cabo Corrientes, Mexico, and was moving north at about 7 mph.
Hurricane-force winds extended outward up to 30 miles from the center and tropical-storm-force winds extended outward up to 105 miles.
A hurricane warning was posted for a stretch of shore between San Blas and Mazatlan. A tropical storm warning was in effect for Playa Perula to San Blas and north of Mazatlan to Bahia Tempehuaya.
Forecasters said Willa is expected to produce storm total rainfall accumulations of 6 to 12 inches, with local amounts up to 18 inches, across portions of western Jalisco, western Nayarit, and southern Sinaloa in Mexico.
The rainfall could cause life-threatening flash flooding and landslides.
Farther inland, Willa is expected to produce rainfall amounts of 2 to 4 inches across portions of Zacateca, Durango, southeast Chihuahua, and Coahuila in Mexico, with local amounts up to 6 inches possible. That could cause life-threatening flash flooding.
After Willa makes its way across Mexico, it could drop between 1 and 3 inches of rain on central and southern Texas during the middle of the week, CBS News contributing meteorologist Jeff Berardelli reports.
The additional rainfall could cause additional flooding in already saturated areas."""]

wc = WordCloud(use_tfidf=False, stopwords=ENGLISH_STOP_WORDS)

# don't randomize color, show only the top 50 words
embed_code = wc.get_embed_code(text=texts, random_color=False, topn=50)
HTML(embed_code)
```
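To make count weighting concrete, here is a minimal pure-Python sketch of the general idea: count the non-stop-word tokens across the input texts, keep the top `n`, and scale the counts so the most frequent word gets a weight of 1.0. This is an assumption about the approach, not word_cloud’s actual implementation; `count_weights` and the small `STOP_WORDS` set are hypothetical.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop list; the example above uses
# scikit-learn's ENGLISH_STOP_WORDS instead.
STOP_WORDS = {"the", "a", "of", "and", "to", "is", "it", "in", "on"}


def count_weights(texts, stop_words=STOP_WORDS, topn=50):
    """Hypothetical helper: top-n (word, weight) pairs, counts scaled so the max is 1.0."""
    counts = Counter()
    for text in texts:
        # Lowercase, then treat runs of ASCII letters, digits and apostrophes as tokens.
        tokens = re.findall(r"[a-z0-9']+", text.lower())
        counts.update(t for t in tokens if t not in stop_words)
    if not counts:
        return []
    max_count = max(counts.values())
    # most_common keeps first-seen order among words with equal counts.
    return [(word, n / max_count) for word, n in counts.most_common(topn)]
```

For example, `count_weights(["Willa is a hurricane. Willa is a major hurricane near the coast."], topn=3)` returns `[('willa', 1.0), ('hurricane', 1.0), ('major', 0.5)]`.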
Generate word clouds from multiple documents
Let’s say you have 100 documents from one news category, and you just want to see what the common mentions are.
```python
import nltk
from IPython.display import HTML
from nltk.corpus import reuters
from sklearn.feature_extraction.text import ENGLISH_STOP_WORDS  # stop list passed to WordCloud below
from word_cloud.word_cloud_generator import WordCloud

nltk.download('reuters')  # fetch the Reuters corpus (only downloads once)

wc = WordCloud(use_tfidf=False, stopwords=ENGLISH_STOP_WORDS)

# get all articles related to coffee
category_docs = reuters.fileids("coffee")

# use the raw content of (up to) the first 100 documents
list_of_documents = [reuters.raw(doc_id) for doc_id in category_docs[:100]]

embed_code = wc.get_embed_code(text=list_of_documents, random_color=True, topn=50)
HTML(embed_code)
```
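With many documents, switching to `use_tfidf=True` down-weights words that show up in most of them. The sketch below illustrates one common TF-IDF variant, the smoothed IDF that scikit-learn uses by default, and sums each word’s score over all documents. It is an assumption for illustration, not necessarily how word_cloud computes its weights, and `tfidf_weights` is a hypothetical helper.

```python
import math
import re
from collections import Counter


def tfidf_weights(documents, stop_words=frozenset(), topn=50):
    """Hypothetical helper: sum each word's TF-IDF over all documents, scaled to [0, 1].

    Uses the smoothed IDF  idf = ln((1 + n) / (1 + df)) + 1, where n is the number
    of documents and df is the number of documents containing the word.
    """
    tokenized = [
        [t for t in re.findall(r"[a-z0-9']+", doc.lower()) if t not in stop_words]
        for doc in documents
    ]
    n_docs = len(tokenized)
    # Document frequency: how many documents contain each word at least once.
    doc_freq = Counter(word for tokens in tokenized for word in set(tokens))
    scores = Counter()
    for tokens in tokenized:
        for word, tf in Counter(tokens).items():
            idf = math.log((1 + n_docs) / (1 + doc_freq[word])) + 1
            scores[word] += tf * idf
    if not scores:
        return []
    max_score = max(scores.values())
    return [(word, score / max_score) for word, score in scores.most_common(topn)]
```

For instance, with the documents `"coffee quota quota"`, `"coffee prices"` and `"coffee exports"`, plain counts rank `coffee` first, but `tfidf_weights` ranks `quota` first because `coffee` appears in every document. With a single document every IDF equals 1, so the result matches plain counts.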
Generate word clouds from existing weights
Let’s say you have a set of words with corresponding weights, and you just want to visualize them. All you need to do is make sure the weights are normalized to the range [0, 1].
```python
import pandas as pd
from IPython.display import HTML
from sklearn.feature_extraction.text import ENGLISH_STOP_WORDS  # stop list passed to WordCloud below
from word_cloud.word_cloud_generator import WordCloud

wc = WordCloud(use_tfidf=False, stopwords=ENGLISH_STOP_WORDS)

# words (or phrases) with corresponding weights in [0, 1]
list_of_scores = [['nice-work', 0.2], ['great-job', 0.7], ['cool-place', 0.1],
                  ['cool-cloud', 0.6], ['phrase-cloud', 0.34], ['word-cloud', 0.625],
                  ['nice-colors', 0.525], ['small-font', 0.4], ['fun-place', 0.6],
                  ['awesome', 0.4], ['intelligent', 0.4], ['medium-font', 0.4],
                  ['crazy', 0.2], ['smart', 0.3], ['ambitious', 0.4]]

embed_code = wc.get_embed_code(text_scores=pd.DataFrame(list_of_scores), random_color=True, topn=50)
HTML(embed_code)
```
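If your weights are not already in [0, 1] (for example, raw frequencies or similarity scores), one simple option is to divide every weight by the largest one. The helper below is a hypothetical sketch of that; its output has the same `[word, weight]` shape as `list_of_scores`, so it can be wrapped in `pd.DataFrame` and passed as `text_scores` the same way.

```python
def normalize_scores(scores):
    """Hypothetical helper: scale non-negative [word, weight] pairs so the largest weight is 1.0."""
    if not scores:
        return []
    if any(weight < 0 for _, weight in scores):
        raise ValueError("weights must be non-negative")
    max_weight = max(weight for _, weight in scores)
    if max_weight == 0:
        raise ValueError("at least one weight must be positive")
    # Dividing by the maximum preserves the ratios between weights.
    return [[word, weight / max_weight] for word, weight in scores]
```

For example, `normalize_scores([['great-job', 14], ['word-cloud', 7], ['crazy', 2]])` maps the weights to 1.0, 0.5 and roughly 0.143. Min-max scaling is another option, but it maps the smallest weight to 0, which would effectively hide that word.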
Please feel free to propose changes to prettify the output: just open a pull request with your changes.
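How can I install word_cloud in a Jupyter notebook?
You can install it with pip. The git+ssh URL below requires an SSH key registered with GitHub; if you don’t have one, git+https://github.com/kavgan/word_cloud.git works as well: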
pip install git+ssh://git@github.com/kavgan/word_cloud.git
Usage examples are here: https://github.com/kavgan/word_cloud#quick-start
How do I create a word cloud for Tamil?
Good question. Have you tried it with the current code base? You may have to change the stop list.
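Besides the stop list, tokenization may also matter for Tamil. In Python’s `re` module, `\w` does not match Tamil vowel signs or the virama, so a `\w`-based pattern (such as scikit-learn’s default `token_pattern`, `(?u)\b\w\w+\b`) can cut a Tamil word into fragments; I haven’t checked whether word_cloud’s own tokenizer is affected. The sketch below is a hypothetical workaround that splits on whitespace and skips words from a custom stop list. `count_words` is a hypothetical helper, and the one-word Tamil stop list is only illustrative.

```python
import string
from collections import Counter


def count_words(texts, stop_words):
    """Hypothetical helper: count whitespace-separated words, skipping stop words.

    Splitting on whitespace keeps Tamil vowel signs and the virama attached to
    their letters instead of splitting the word at them.
    """
    counts = Counter()
    for text in texts:
        for token in text.lower().split():
            # Trim ASCII punctuation such as trailing commas and periods.
            token = token.strip(string.punctuation)
            if token and token not in stop_words:
                counts[token] += 1
    return counts


# Illustrative one-word Tamil stop list ("மற்றும்" means "and"); a real list would be much longer.
TAMIL_STOP_WORDS = {"மற்றும்"}
```

To try a custom stop list with the library itself, pass it through the same `stopwords` argument used in the examples above, e.g. `WordCloud(use_tfidf=False, stopwords=TAMIL_STOP_WORDS)`.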