NLP

Classify news articles with logistic regression and python

Build Your First Text Classifier in Python with Logistic Regression

Text classification is the automatic process of predicting one or more categories given a piece of text. For example, predicting if an email is legit or spammy. Thanks to Gmail’s spam classifier, I don’t see or hear from spammy emails! Other than spam detection, text classifiers can be used to determine sentiment in social media …

Build Your First Text Classifier in Python with Logistic Regression Read More »

Easily Access Pre-trained Word Embeddings with Gensim

What are pre-trained embeddings and why? Pre-trained word embeddings are vector representation of words trained on a large dataset. With pre-trained embeddings, you will essentially be using the weights and vocabulary from the end result of the training process done by….someone else! (It could also be you) One benefit of using pre-trained embeddings is that …

Easily Access Pre-trained Word Embeddings with Gensim Read More »

What is ROUGE and how it works for evaluation of summaries?

ROUGE stands for Recall-Oriented Understudy for Gisting Evaluation. It is essentially of a set of metrics for evaluating automatic summarization of texts as well as machine translation. It works by comparing an automatically produced summary or translation against a set of reference summaries (typically human-produced). This article provides an intuitive explanation of how ROUGE works. 

What is text similarity?

When talking about text similarity, different people have a slightly different notion on what text similarity means. In essence, the goal is to compute how ‘close’ two pieces of text are in (1) meaning or (2) surface closeness. The first is referred to as semantic similarity and the latter is referred to as lexical similarity. Although the methods for lexical similarity …

What is text similarity? Read More »