This is a collection of tutorials and other articles that I have written over the past few years. To receive notifications of new articles, you can follow my blog. I also host some tutorial-related code in my GitHub repo.
Text Mining / NLP Basics
- What is text preprocessing and how to get started?
- What are n-grams?
- What is term frequency?
- What is inverse document frequency?
- What is text similarity?
- All about stop words
- How to compile a custom stop word list
- Giving an IDF effect to probability of word counts
- Understanding Nouns in WordNet
Machine Learning / Vectorization
- How to build a text-classifier from scratch with Logistic Regression
- How to generate embeddings of phrases – Phrase2Vec
- Getting started with Word2Vec and making it work
- How to start using pretrained word embeddings
- How to correctly use TfidfTransformer and TfidfVectorizer
- How to use CountVectorizer for feature extraction
- Computing precision and recall for a multi-class classification problem
- Extracting important keywords from text using sklearn and tf-idf
- Interesting tasks within Opinion Mining and Sentiment Analysis
- Opinion Mining Tutorial
- What is the difference between Micropinions and Micro-Reviews?
Clinical Text Mining
Useful Scripts / Tools / Code Snippets
- WordCloud for Data Scientists
- Extracting phrases from text
- Shell script for generating term frequencies in descending order
- Simple shell command to get word count in files
- Kavita’s Python Cheat Sheet
- ROUGE 2.0 – Evaluation of Textual Summaries in Java
- ROUGE2CSV – Script to Interpret Output from ROUGE 1.5.5
- Prepare4Rouge – Script to Prepare Summaries for ROUGE 1.5.5