Kavita’s Articles

These are the most recent articles that I’ve written. To receive notifications of new articles, you can subscribe to my blog.
For all the code samples, you can star or fork this repository.

Technical Deep Dive

FastText vs. Word2vec: A Quick Comparison

One question that often comes up is: what’s the difference between fastText and Word2Vec? Aren’t they the same? Yes and no. They are conceptually the same, but there is a minor difference: fastText operates at a character…

Word2Vec: A Comparison Between CBOW, SkipGram & SkipGramSI

Word2Vec is a widely used word representation technique that uses neural networks under the hood. The resulting word representations, or embeddings, can be used to infer semantic similarity between words and phrases, expand queries, surface related concepts…

HashingVectorizer vs. CountVectorizer

Previously, we learned how to use CountVectorizer for text processing. In place of CountVectorizer, you also have the option of using HashingVectorizer. In this tutorial, we will learn how HashingVectorizer differs from CountVectorizer and when to use…

10+ Examples for Using CountVectorizer

Scikit-learn’s CountVectorizer is used to transform a corpus of text into a vector of term/token counts. It also provides the capability to preprocess your text data prior to generating the vector representation, making it a…

Easily Access Pre-trained Word Embeddings with Gensim

What are pre-trained embeddings and why? Pre-trained word embeddings are vector representations of words trained on a large dataset. With pre-trained embeddings, you will essentially be using the weights and vocabulary from the end result of the…

The Business Side of AI

Before AI, Invest in A Big Data Strategy

Big data describes the volumes of data, both structured and unstructured, that your company generates every single day. Analysts at Gartner estimate that more than 80 percent of enterprise data is unstructured….