5 Examples of Text Classification in Practice

AI is transforming nearly every industry, and text analysis is a key area of interest. That’s because organizations face an explosion in unstructured data, much of it text. By common estimates, nearly 80% of the data at most organizations is unstructured, and the volume is quickly becoming impractical for humans to analyze alone.

We’ve already talked about some best practices for building a text classifier, but how can a tool like this help your business? Let’s take a closer look at document classification and some real-world examples.

What Is Document Classification?

Organizations need to classify documents so that their text data is easier to manage and utilize. For example, companies may need to classify incoming customer support tickets so they get sent to the right customer support agents.

Example classification of support tickets

With a manual approach, staff would need to sort through each text and assign a label or category to it individually. The problem is that manual classification can be time-consuming, error-prone, and cost-prohibitive.

That’s why many organizations are turning to machine learning (ML) and natural language processing (NLP) to automatically sort texts into one of several predefined categories. Texts may be very short (e.g., tweets) or entire documents (e.g., news articles). In either case, categorizing them quickly makes the organization more efficient and frees up staff for higher-level work.
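Here is a minimal sketch of a classifier that routes support tickets into predefined categories. It uses multinomial naive Bayes, a common baseline for text classification, and does not follow any specific company's method. The categories and training tickets are hypothetical, and a production system would need far more labeled data.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())


class NaiveBayesClassifier:
    """Multinomial naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, texts: list[str], labels: list[str]) -> "NaiveBayesClassifier":
        self.class_counts = Counter(labels)        # number of documents per category
        self.word_counts = defaultdict(Counter)    # token frequencies per category
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        self.total_docs = len(labels)
        return self

    def predict(self, text: str) -> str:
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            # Sum log-probabilities instead of multiplying them to avoid underflow.
            score = math.log(n_docs / self.total_docs)
            for token in tokenize(text):
                score += math.log((counts[token] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical labeled tickets; a real system would train on thousands.
tickets = [
    ("I was charged twice on my invoice", "billing"),
    ("Please refund the duplicate payment", "billing"),
    ("The app crashes when I log in", "technical"),
    ("Error message after the latest update", "technical"),
    ("How do I change my shipping address", "account"),
    ("I want to update my account email", "account"),
]
texts, labels = zip(*tickets)
model = NaiveBayesClassifier().fit(list(texts), list(labels))
print(model.predict("I need a refund for this invoice"))  # billing
```

Naive Bayes is shown because it fits in a few lines and trains instantly. Other model families, such as logistic regression or transformer models, follow the same train-then-predict pattern.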

5 Practical Text Classification Examples

With the value of text classification clear, here are five practical use cases business leaders should know about.

1. Gmail Spam Classifier

Spam has always been annoying for email users, and these unwanted messages can cost office workers a considerable amount of time to deal with manually. Most email services filter spam emails based on a number of rules or factors, such as the sender’s email address, malicious hyperlinks, suspicious phrases, and more. But there’s no single definition of spam, and some unwanted emails can still reach users.
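A rule-based filter like the one described above can be sketched as a weighted checklist. The blocked domain, phrases, weights, and threshold below are hypothetical. A message that trips none of the rules scores zero and gets through, which is why rules alone miss some spam.

```python
import re

# Hypothetical rules and weights for illustration; production filters combine
# far more signals and tune them on data.
BLOCKED_SENDER_DOMAINS = {"spam.example"}
SUSPICIOUS_PHRASES = ("you have won", "act now", "wire transfer")
URL_HOST = re.compile(r"https?://([^/\s]+)", re.IGNORECASE)
IP_HOST = re.compile(r"\d{1,3}(?:\.\d{1,3}){3}(?::\d+)?")


def spam_score(sender: str, body: str) -> int:
    """Add up points for each rule the message triggers."""
    score = 0
    domain = sender.rsplit("@", 1)[-1].lower()
    if domain in BLOCKED_SENDER_DOMAINS:
        score += 3
    lowered = body.lower()
    score += 2 * sum(phrase in lowered for phrase in SUSPICIOUS_PHRASES)
    # Links to a bare IP address instead of a domain name are a common phishing sign.
    score += 2 * sum(1 for host in URL_HOST.findall(body) if IP_HOST.fullmatch(host))
    return score


def is_spam(sender: str, body: str, threshold: int = 3) -> bool:
    """Flag the message when its rule score reaches the (hypothetical) threshold."""
    return spam_score(sender, body) >= threshold


print(is_spam("promo@spam.example", "Hello"))          # True
print(is_spam("alice@example.com", "Lunch at noon?"))  # False
```

A lightly obfuscated message such as "Y0u h@ve w0n" matches none of these rules. That gap is what motivates the learned filters described next.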

Spam classification at a high level. Source: developers.google.com

That’s why, in 2019, Google upgraded its Gmail filters using TensorFlow, the company’s own machine learning platform. The new ML models block an additional 100 million spam messages every day. They can also learn patterns over time from what individual Gmail users consider spam.

2. Great Wolf Lodge’s Sentiment Classifier

Great Wolf Lodge (GWL), a chain of resorts and indoor water parks, has expanded its broad digital strategy by using AI to classify customer comments based on sentiment. They developed what they call the Great Wolf Lodge’s Artificial Intelligence Lexicographer (GAIL).

GWL builds on the concept of the net promoter score (NPS) to gauge the experience of individual customers. Rather than relying on a numeric NPS rating, GAIL determines whether a customer is a net promoter, detractor, or neutral party from the free-text responses in monthly customer surveys. This is analogous to predicting whether customer sentiment is positive, negative, or neutral: GAIL essentially “reads” the comments and forms an opinion.

A net promoter score can be used to identify detractors, promoters, or passives (neutral party). Source: helpshift.com
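The standard NPS arithmetic works on a 0 to 10 rating. Ratings of 9 and 10 are promoters, 7 and 8 are passives, and 0 to 6 are detractors. NPS is the percentage of promoters minus the percentage of detractors. The sketch below computes it from ratings. It also computes it from hypothetical sentiment labels that a text classifier might predict, following the article's analogy. The sentiment-to-group mapping is an assumption, not a description of how GAIL works internally.

```python
def nps_category(rating: int) -> str:
    """Map a 0-10 "how likely are you to recommend us?" rating to an NPS group."""
    if not 0 <= rating <= 10:
        raise ValueError("NPS ratings range from 0 to 10")
    if rating >= 9:
        return "promoter"
    if rating >= 7:
        return "passive"
    return "detractor"


def nps_from_groups(groups: list[str]) -> float:
    """NPS = % promoters - % detractors, a value between -100 and 100."""
    if not groups:
        raise ValueError("need at least one response")
    return 100 * (groups.count("promoter") - groups.count("detractor")) / len(groups)


# Assumed mapping for the article's analogy: a sentiment classifier's label
# for a free-text comment stands in for the respondent's NPS group.
SENTIMENT_TO_GROUP = {"positive": "promoter", "neutral": "passive", "negative": "detractor"}

ratings = [10, 9, 8, 7, 6, 3]
print(nps_from_groups([nps_category(r) for r in ratings]))  # 0.0

predicted = ["positive", "negative", "positive", "neutral"]  # hypothetical model output
print(nps_from_groups([SENTIMENT_TO_GROUP[s] for s in predicted]))  # 25.0
```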

Through this effort, the company hopes to better understand its guests and improve the customer experience. For example, by analyzing comments from detractors, Great Wolf Lodge can identify areas of its service that need improvement.

GAIL was trained using over 67,000 reviews and has an accuracy of 95 percent. Analyzing this unstructured data manually would take far too long for humans, but GAIL can parse this data in seconds and determine whether the author is a net promoter, detractor, or neutral party.

3. Facebook’s Hate Speech Detection

With nearly 1.7 billion daily active users, Facebook inevitably hosts content that violates its rules, including hate speech. Defining and detecting hate speech is one of the biggest political and technical challenges for Facebook and similar platforms.

Facebook addresses this problem with an AI text classifier that automatically flags likely hate speech. Human reviewers then assess the AI-flagged posts in the same way as posts reported by users. The scale is substantial: the platform removed 9.6 million pieces of hate speech content in the first quarter of 2020 alone.
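In this workflow, AI-flagged posts join user reports in the same human review process. It can be sketched as a single queue with two entry points. The class name, the 0.8 flagging threshold, and the reviewer callback are all hypothetical; this is an illustration of the human-in-the-loop pattern, not Facebook's actual pipeline.

```python
from collections import deque
from typing import Callable


class ModerationQueue:
    """Hypothetical single review queue fed by an automatic classifier and by user reports.

    Human reviewers handle items from either source in exactly the same way.
    """

    def __init__(self, flag_threshold: float = 0.8) -> None:
        self.flag_threshold = flag_threshold  # hypothetical cut-off, not Facebook's
        self.pending: deque = deque()         # (post_id, source) pairs, first in, first out

    def submit_classifier_score(self, post_id: str, hate_probability: float) -> bool:
        """Queue a post only if the model's score clears the threshold."""
        if hate_probability >= self.flag_threshold:
            self.pending.append((post_id, "classifier"))
            return True
        return False

    def submit_user_report(self, post_id: str) -> None:
        """Queue a user-reported post unconditionally."""
        self.pending.append((post_id, "user_report"))

    def review_all(self, reviewer: Callable[[str], bool]) -> list[str]:
        """Pass each pending post to a human decision; return the IDs that were removed."""
        removed = []
        while self.pending:
            post_id, _source = self.pending.popleft()
            if reviewer(post_id):
                removed.append(post_id)
        return removed


queue = ModerationQueue()
queue.submit_classifier_score("post-1", 0.93)  # flagged by the model
queue.submit_classifier_score("post-2", 0.10)  # below threshold, not queued
queue.submit_user_report("post-3")             # reported by a user
# Stand-in for a human reviewer who decides only post-1 violates the rules.
print(queue.review_all(lambda post_id: post_id == "post-1"))  # ['post-1']
```

The threshold trades reviewer workload against missed content. Lowering it sends more borderline posts to humans.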

Volume of AI-based hate speech removal on Facebook. Source: Wired

Detecting hate speech, however, is much harder than detecting violent or explicit content. AI algorithms must use NLP to understand the subtle meaning of the text and analyze the cultural context and nuance being expressed. Only then can they determine whether it’s offensive without incorrectly penalizing innocent content.

Examples of hate speech, which is harder to detect than violent or explicit content. Source: arxiv.org

To help AI better support the human reviewers in the loop, Facebook has released the Hateful Memes dataset. It contains more than 10,000 memes that combine images and text and is intended to spur new research.

4. Bipartisan Press’s Political Bias Detector

The Bipartisan Press is a news outlet that aims to promote transparent journalism by labeling the bias of every article it publishes. More recently, the publication has turned to AI and NLP to predict political bias systematically.

What’s your political bias? Source: The Bipartisan Press

The publication experimented with multiple ML algorithms, datasets, and configurations. It found that the best political bias predictor was a model built on Google’s BERT transformer architecture. The training dataset that gave the best predictions was based on Ad Fontes Media’s list of articles, which were manually labeled on a per-article basis. The Bipartisan Press now uses its AI tool to classify and score its own articles on two dimensions: direction (left- or right-leaning) and degree of bias (from minimal to extreme).
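The article does not say how the model's output is represented. This sketch assumes, hypothetically, that the model produces one signed score from -1.0 (far left) to +1.0 (far right). It shows how that score could be turned into the two labels mentioned above: direction and degree of bias. The scale and cut-offs are illustrative, not The Bipartisan Press's.

```python
def describe_bias(score: float) -> tuple[str, str]:
    """Split a signed bias score into (direction, degree).

    Assumes a hypothetical scale from -1.0 (far left) to +1.0 (far right);
    the cut-offs below are illustrative only.
    """
    if not -1.0 <= score <= 1.0:
        raise ValueError("score must be in [-1, 1]")
    direction = "left" if score < 0 else "right" if score > 0 else "center"
    magnitude = abs(score)
    # Walk the buckets from least to most biased and return the first that fits.
    for upper, degree in ((0.25, "minimal"), (0.5, "moderate"), (0.75, "strong")):
        if magnitude < upper:
            return direction, degree
    return direction, "extreme"


print(describe_bias(-0.1))  # ('left', 'minimal')
print(describe_bias(0.9))   # ('right', 'extreme')
```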

5. LinkedIn’s Inappropriate Profile Flagging

LinkedIn has more than 590 million professionals in over 200 countries. To keep the platform safe and professional, LinkedIn puts a lot of effort into detecting and remediating behavior that violates its Terms of Service, such as spam, scams, harassment, and misinformation. One such effort is detecting and removing profiles with inappropriate content, which can range from profanity to advertisements for illegal services.

At first, the platform flagged profiles that contained words or phrases from a manually maintained list of inappropriate terms. This approach didn’t scale and limited how many inappropriate profiles LinkedIn could surface. Over time, the growing list of offending words and phrases also became much harder to manage.
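The word-list approach can be sketched in a few lines. The sketch also shows the weakness: every new spelling or phrase has to be added by hand. The blocklist terms are hypothetical.

```python
import re

# Hypothetical blocklist for illustration; LinkedIn's real list is not public.
BLOCKLIST = ["buy followers", "cheap pills", "miracle cure"]


def build_blocklist_pattern(terms: list[str]) -> re.Pattern:
    """Compile one case-insensitive regex that matches any listed term as whole words."""
    # Longest terms first so overlapping entries prefer the more specific match.
    alternatives = "|".join(re.escape(t) for t in sorted(terms, key=len, reverse=True))
    return re.compile(r"\b(?:" + alternatives + r")\b", re.IGNORECASE)


PATTERN = build_blocklist_pattern(BLOCKLIST)


def flag_profile(text: str) -> list[str]:
    """Return the distinct blocklisted terms found in a profile's text."""
    return sorted({match.lower() for match in PATTERN.findall(text)})


print(flag_profile("Cheap pills shipped overnight!"))  # ['cheap pills']
print(flag_profile("Ch3ap p1lls shipped overnight!"))  # []
```

The second call slips through because "ch3ap p1lls" is not on the list. Closing that gap means adding yet another entry, which is the maintenance burden described above.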

Now LinkedIn flags profiles that contain inappropriate content using a machine learning model. This classification model was trained on a dataset of public profile content labeled as “appropriate” or “inappropriate”, which was carefully curated to limit false positives. LinkedIn continues to refine its model and training set. It is also exploring Microsoft’s translation services to extend the approach to all of the platform’s supported languages.
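The article credits careful curation of the training data for limiting false positives. A complementary technique is to choose the model's decision threshold on held-out labeled data, so that the false-positive rate stays under a target. It is shown here as an illustration, not as LinkedIn's documented method. The scores and labels are made up.

```python
import math


def pick_threshold(scores: list[float], labels: list[int], max_fpr: float = 0.01) -> float:
    """Lowest score cut-off whose false-positive rate on held-out data is <= max_fpr.

    scores: model probabilities that each profile is inappropriate.
    labels: human-curated ground truth, 1 = inappropriate, 0 = appropriate.
    Returns math.inf (flag nothing) if no cut-off meets the target.
    """
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not negatives:
        raise ValueError("need at least one appropriate example")
    # Raising the threshold can only lower the false-positive rate, so the first
    # candidate that meets the target also flags the most profiles (best recall).
    for t in sorted(set(scores)):
        false_positives = sum(s >= t for s in negatives)
        if false_positives / len(negatives) <= max_fpr:
            return t
    return math.inf


# Made-up validation scores and labels, for illustration only.
scores = [0.95, 0.90, 0.80, 0.70, 0.40, 0.30, 0.20, 0.10]
labels = [1, 1, 0, 1, 0, 0, 0, 0]
print(pick_threshold(scores, labels, max_fpr=0.0))  # 0.9
print(pick_threshold(scores, labels, max_fpr=0.2))  # 0.7
```

A stricter target (0.0) raises the cut-off and misses one inappropriate profile. A looser one (0.2) catches all three but wrongly flags one appropriate profile.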

Consider Document Classification For Your Business

As you can see, text classification has a wide range of use cases for business. Unstructured data continues to grow at an enormous pace, and the most innovative companies are using ML and AI to harness this information to achieve greater business results.

Keep Learning & Succeed With AI

  • Join my AI Integrated newsletter, which clears the AI confusion and teaches you how to successfully integrate AI to achieve profitability and growth in your business.
  • Read The Business Case for AI to learn applications, strategies, and best practices to be successful with AI (select companies using the book: government agencies, automakers like Mercedes Benz, beverage makers, and e-commerce companies such as Flipkart).
  • Work directly with me to improve AI understanding in your organization, accelerate AI strategy development and get meaningful outcomes from every AI initiative.
