Applied NLP Basics

General articles that most NLP, text mining, and data science practitioners should be aware of.

Build Your First Text Classifier in Python with Logistic Regression

Text classification is the automatic process of predicting one or more categories for a piece of text, for example, predicting whether an email is legitimate or spam. Thanks to Gmail’s spam classifier, I don’t see or hear from spammy emails! Beyond spam detection, text classifiers can be used to determine sentiment in social media …
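
To make the spam example concrete, here is a minimal sketch of a bag-of-words logistic regression classifier written with only the Python standard library. It is not the code from the full article: the toy training texts, the function names (`train`, `predict_proba`), and the hyperparameters are all assumptions chosen for illustration, and a real project would more likely use a library implementation.

```python
import math
from collections import Counter

# Toy training data (hypothetical, for illustration only): 1 = spam, 0 = legitimate.
TEXTS = [
    "win free money now",
    "free prize claim now",
    "cheap money offer",
    "meeting agenda notes today",
    "agenda report review today",
    "team notes lunch",
]
LABELS = [1, 1, 1, 0, 0, 0]


def tokenize(text):
    """Lowercase and split on whitespace."""
    return text.lower().split()


def featurize(text, vocab):
    """Turn text into a vector of word counts over the vocabulary."""
    counts = Counter(tokenize(text))
    return [counts[word] for word in vocab]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train(texts, labels, epochs=300, lr=0.5):
    """Fit weights and bias with batch gradient descent on the log loss."""
    vocab = sorted({word for text in texts for word in tokenize(text)})
    X = [featurize(text, vocab) for text in texts]
    weights = [0.0] * len(vocab)
    bias = 0.0
    n = len(X)
    for _ in range(epochs):
        grad_w = [0.0] * len(vocab)
        grad_b = 0.0
        for x, y in zip(X, labels):
            p = sigmoid(bias + sum(w * xi for w, xi in zip(weights, x)))
            err = p - y  # gradient of the log loss with respect to the score
            for j, xi in enumerate(x):
                grad_w[j] += err * xi
            grad_b += err
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return vocab, weights, bias


def predict_proba(text, model):
    """Return the estimated probability that the text is spam."""
    vocab, weights, bias = model
    x = featurize(text, vocab)
    return sigmoid(bias + sum(w * xi for w, xi in zip(weights, x)))


model = train(TEXTS, LABELS)
print(predict_proba("claim your free prize", model))  # above 0.5, i.e. spam
print(predict_proba("team meeting notes", model))     # below 0.5, i.e. legitimate
```

Words the model never saw during training (such as "your") are simply ignored, which is one reason real classifiers need far more training data than this sketch uses.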

What is text similarity?

When talking about text similarity, people often have slightly different notions of what it means. In essence, the goal is to compute how ‘close’ two pieces of text are in (1) meaning or (2) surface form. The first is referred to as semantic similarity and the second as lexical similarity. Although the methods for lexical similarity …
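
As a rough illustration of lexical similarity, the sketch below scores two texts by the overlap of their word sets (Jaccard similarity). Jaccard is just one possible surface-level measure, picked here as an assumption, and the function name `jaccard_similarity` is hypothetical.

```python
def jaccard_similarity(text_a, text_b):
    """Share of distinct words the two texts have in common (0.0 to 1.0)."""
    tokens_a = set(text_a.lower().split())
    tokens_b = set(text_b.lower().split())
    if not tokens_a and not tokens_b:
        return 1.0  # two empty texts are treated as identical
    return len(tokens_a & tokens_b) / len(tokens_a | tokens_b)


score = jaccard_similarity("the cat ate the mouse", "the mouse ate the cat food")
print(score)  # 0.8
```

The two sentences share four of five distinct words, so they score 0.8 lexically even though they describe very different events. That gap is exactly what separates lexical from semantic similarity.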

What are Stop Words?

When working with text mining applications, we often hear the terms “stop words”, “stop word list”, or even “stop list”. Stop words are a set of commonly used words in a language, and every language has them, not just English. The reason why stop words are critical to many applications is that, if we remove the words that …
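
Here is a minimal sketch of stop word removal. The tiny `STOP_WORDS` set and the function name `remove_stop_words` are assumptions made for this example; real stop lists are much longer and vary by language and application.

```python
# A deliberately tiny, hypothetical stop list for illustration.
STOP_WORDS = {"a", "an", "the", "is", "in", "of", "to", "and"}


def remove_stop_words(text, stop_words=STOP_WORDS):
    """Lowercase the text, split on whitespace, and drop stop words."""
    return [token for token in text.lower().split() if token not in stop_words]


filtered = remove_stop_words("The quick brown fox is in the garden")
print(filtered)  # ['quick', 'brown', 'fox', 'garden']
```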
