What we’re going to do

We’re going to build a simple Named Entity Recogniser (NER) using FlairNLP. 

What we’re going to use

  • Google Colab
  • FlairNLP

What is Named Entity Recognition

Named Entity Recognition, also called entity extraction, pulls key data such as names, locations, organisations and other predefined classes out of a body of text. It is useful for analysing news articles, scripts, contracts and more.

A film location manager could use this to filter out locations from a script, or a movie recommendation system could ingest a script for a given movie and build a series of keywords associated with each movie automatically – tagging actors, characters, locations – making it easier to recommend and discover new movies. We’ll run through a more detailed tutorial on how you could do this next time. 

Take the following text:

Costco is an American retailer founded by Jim Sinegal

An NER system will output data like the following:

Costco - ORG (organisation)
American - LOC (location)
Jim Sinegal - PER (person)

Here, the system identifies a person (PER), a location (LOC) and an organisation (ORG) in the text above.

What is FlairNLP

FlairNLP is a Natural Language Processing library developed by Humboldt University of Berlin.

From the GitHub page, Flair is:

  • A powerful NLP library. Flair allows you to apply state-of-the-art natural language processing (NLP) models to your text, such as named entity recognition (NER), part-of-speech tagging (PoS), special support for biomedical data, sense disambiguation and classification, with support for a rapidly growing number of languages.
  • A text embedding library. Flair has simple interfaces that allow you to use and combine different word and document embeddings, including Flair embeddings, BERT embeddings and ELMo embeddings.
  • A PyTorch NLP framework. Flair builds directly on PyTorch, making it easy to train your own models and experiment with new approaches using Flair embeddings and classes.

1 – Getting Ready

We’re going to run a pre-trained Named Entity Recognition model over a sample sentence.

Firstly, we need to install Flair. Open up a Colab project and run: 

pip install flair

pip install pandas

This will download and install the required packages to use flair on Colab. 

2 – Use a pre-trained NER model

Flair provides a pre-trained named entity recognition model, which is downloaded the first time you load it. All you need to do is run the following code:

from flair.data import Sentence
from flair.models import SequenceTagger
 
# load tagger
tagger = SequenceTagger.load("flair/ner-english")
 
# make example sentence
sentence = Sentence("Costco is an American Retailer founded by Jim Sinegal")
 
# predict NER tags
tagger.predict(sentence)
 
# print sentence
print(sentence)
 
# print predicted NER spans
print('The following NER tags are found:')
# iterate over entities and print
for entity in sentence.get_spans('ner'):
   print(entity)

Running this code will give you the following output:

Sentence: "Costco is an American Retailer founded by Jim Sinegal"   [− Tokens: 9  − Token-Labels: "Costco <S-ORG> is an American <B-ORG> Retailer <E-ORG> founded by Jim <B-PER> Sinegal <E-PER>"]
The following NER tags are found:
Span [1]: "Costco"   [− Labels: ORG (0.9986)]
Span [4,5]: "American Retailer"   [− Labels: ORG (0.9212)]
Span [8,9]: "Jim Sinegal"   [− Labels: PER (0.9987)]

The pre-trained model does a pretty good job of extracting named entities from our sentence without much difficulty. 

Costco is correctly identified as an Organisation. “American Retailer” is also tagged as an Organisation, which is inaccurate, but you can see where it comes from: paired with “Retailer”, the capitalised phrase looks like a company name, and the data the model was trained on may associate it with a company. I expected “American” to be tagged as a Location instead. It’s not a major issue, as we can train our own model at a later date.

And lastly, Jim Sinegal is correctly identified as a Person, with a confidence score of 0.9987.
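The Token-Labels line in the output above puts a prefix on each tag: S- marks a single-token entity, B- the beginning of a multi-token entity and E- its end. This is commonly called the BIOES scheme, where I- marks tokens inside a longer entity and O marks tokens outside any entity. The following is a minimal sketch of how such per-token tags can be merged into spans. It is not Flair’s own implementation; the function name decode_bioes and the explicit "O" tags for unlabelled tokens are assumptions made for illustration.

```python
from typing import List, Tuple


def decode_bioes(tokens: List[str], tags: List[str]) -> List[Tuple[str, str]]:
    """Merge per-token BIOES tags into (entity text, label) pairs."""
    spans = []
    current_tokens: List[str] = []
    current_label = None
    for token, tag in zip(tokens, tags):
        prefix, _, label = tag.partition("-")
        if prefix == "S":
            # single-token entity
            spans.append((token, label))
            current_tokens, current_label = [], None
        elif prefix == "B":
            # start collecting a multi-token entity
            current_tokens, current_label = [token], label
        elif prefix in ("I", "E") and current_label == label:
            current_tokens.append(token)
            if prefix == "E":
                # entity is complete
                spans.append((" ".join(current_tokens), label))
                current_tokens, current_label = [], None
        else:
            # "O" or an inconsistent tag: drop any partial entity
            current_tokens, current_label = [], None
    return spans


tokens = "Costco is an American Retailer founded by Jim Sinegal".split()
tags = ["S-ORG", "O", "O", "B-ORG", "E-ORG", "O", "O", "B-PER", "E-PER"]
print(decode_bioes(tokens, tags))
# [('Costco', 'ORG'), ('American Retailer', 'ORG'), ('Jim Sinegal', 'PER')]
```

In practice you rarely need to do this by hand, since sentence.get_spans('ner') already returns the merged spans.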

You can now package the above into a microservice with an API endpoint and create a basic SaaS system without doing much else!
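As a rough idea of what such a microservice could look like, here is a minimal sketch using only Python’s standard library HTTP server. The names flair_extractor, handle_request, make_handler and serve, the JSON request shape {"text": "..."} and the port are all assumptions for illustration, as is the use of span.text, span.tag and span.score to read the results. Treat it as a starting point under those assumptions rather than a production setup.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer
from typing import Callable, Dict, List, Tuple

Extractor = Callable[[str], List[Dict[str, object]]]


def flair_extractor() -> Extractor:
    """Build an extractor backed by Flair (imported lazily, model loaded once)."""
    from flair.data import Sentence
    from flair.models import SequenceTagger

    tagger = SequenceTagger.load("flair/ner-english")

    def extract(text: str) -> List[Dict[str, object]]:
        sentence = Sentence(text)
        tagger.predict(sentence)
        # Assumption: spans expose .text, .tag and .score in your Flair version.
        return [
            {"text": span.text, "label": span.tag, "score": round(span.score, 4)}
            for span in sentence.get_spans("ner")
        ]

    return extract


def handle_request(body: bytes, extract: Extractor) -> Tuple[int, dict]:
    """Validate a JSON request body and return (HTTP status, response payload)."""
    try:
        payload = json.loads(body.decode("utf-8"))
        text = payload["text"]
    except (ValueError, KeyError, TypeError):
        return 400, {"error": 'expected JSON like {"text": "..."}'}
    if not isinstance(text, str) or not text.strip():
        return 400, {"error": "text must be a non-empty string"}
    return 200, {"entities": extract(text)}


def make_handler(extract: Extractor):
    """Create a request handler class bound to the given extractor."""

    class NerHandler(BaseHTTPRequestHandler):
        def do_POST(self):
            length = int(self.headers.get("Content-Length", 0))
            status, payload = handle_request(self.rfile.read(length), extract)
            data = json.dumps(payload).encode("utf-8")
            self.send_response(status)
            self.send_header("Content-Type", "application/json")
            self.send_header("Content-Length", str(len(data)))
            self.end_headers()
            self.wfile.write(data)

    return NerHandler


def serve(extract: Extractor, port: int = 8000) -> None:
    """Start the server; this call blocks until the process is stopped."""
    HTTPServer(("0.0.0.0", port), make_handler(extract)).serve_forever()
```

To try it, call serve(flair_extractor()) and POST a JSON body such as {"text": "Costco is an American Retailer founded by Jim Sinegal"} to the server. Keeping handle_request separate from the HTTP handler means the validation logic can be tested with a stub extractor, without loading the model.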

Let’s break down what the code is doing:

Step 1 – Load the Models

from flair.data import Sentence
from flair.models import SequenceTagger
# load tagger
tagger = SequenceTagger.load("flair/ner-english")

Step 2 – Create your Sentence or Load your text.

Change the sentence to anything you’d like. It could be a search query, it could be a sentence from a story paragraph or newspaper clipping. 

# make example sentence
sentence = Sentence("Costco is an American Retailer founded by Jim Sinegal")

Step 3 – Predict

The last part of the code runs the magic. The function tagger.predict(sentence) runs the NER model over the sentence and stores the predicted labels on the sentence object.

We then print out the output for each entity found using a for loop. 

# predict NER tags
tagger.predict(sentence)
# print sentence
print(sentence)
# print predicted NER spans
print('The following NER tags are found:')
# iterate over entities and print
for entity in sentence.get_spans('ner'):
   print(entity)
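Printing each span is fine for a demo, but downstream code usually wants plain data. The sketch below groups entity texts by label and drops low-confidence predictions. The function name group_entities and the 0.95 threshold are assumptions for illustration, and the code assumes each span exposes text, tag and score attributes. To keep it runnable without downloading the model, it uses simple stand-in objects carrying the values from the output shown earlier.

```python
from collections import defaultdict
from types import SimpleNamespace
from typing import Dict, Iterable, List


def group_entities(spans: Iterable, min_score: float = 0.0) -> Dict[str, List[str]]:
    """Group entity texts by label, skipping predictions below min_score."""
    grouped = defaultdict(list)
    for span in spans:
        if span.score >= min_score:
            grouped[span.tag].append(span.text)
    return dict(grouped)


# Stand-ins for the spans printed above, so the sketch runs without Flair.
spans = [
    SimpleNamespace(text="Costco", tag="ORG", score=0.9986),
    SimpleNamespace(text="American Retailer", tag="ORG", score=0.9212),
    SimpleNamespace(text="Jim Sinegal", tag="PER", score=0.9987),
]
print(group_entities(spans, min_score=0.95))
# {'ORG': ['Costco'], 'PER': ['Jim Sinegal']}
```

With Flair loaded, you would pass sentence.get_spans('ner') in place of the stand-in list. Note that the threshold happens to filter out the questionable “American Retailer” prediction here, but a confidence cut-off is no guarantee of correctness in general.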

Example Use Case for NER systems

So what can you actually do with named entity recognition?

With the huge amount of unstructured data being generated by consumers every single minute, we need to create systems which can help classify and discover content that is relevant to us. 

Search 

  • By extracting key entities from text data, we can build more relevant search results for internal search engines, academic search engines, travel sites and movie results. 
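One simple way to use extracted entities for search is an inverted index that maps each entity to the documents mentioning it. The following is a minimal sketch under that assumption. The document IDs and entity lists are hypothetical sample data, and build_entity_index and search are names invented for this example.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple

EntityKey = Tuple[str, str]  # (lower-cased entity text, label)


def build_entity_index(
    docs: Dict[str, Iterable[Tuple[str, str]]]
) -> Dict[EntityKey, Set[str]]:
    """Map each (entity text, label) pair to the IDs of documents containing it."""
    index = defaultdict(set)
    for doc_id, entities in docs.items():
        for text, label in entities:
            index[(text.lower(), label)].add(doc_id)
    return dict(index)


def search(index: Dict[EntityKey, Set[str]], text: str, label: str) -> List[str]:
    """Return the sorted IDs of documents that mention the given entity."""
    return sorted(index.get((text.lower(), label), set()))


# Hypothetical documents with entities already extracted by an NER model.
docs = {
    "doc-1": [("Costco", "ORG"), ("Jim Sinegal", "PER")],
    "doc-2": [("Costco", "ORG")],
}
index = build_entity_index(docs)
print(search(index, "costco", "ORG"))       # ['doc-1', 'doc-2']
print(search(index, "Jim Sinegal", "PER"))  # ['doc-1']
```

Keying on the label as well as the text keeps, for example, a person and an organisation with the same name apart.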

Content Classification

  • Classify text into topics to search and organise documents easily.

Ecommerce Product Data

  • Do you run an online store? Imagine feeding in free text that mentions the product brand, description, title, price and quantity available, and building out your structured data automatically.

Customer Care

  • Easily tag product names, categories, contact information, location and more to build out more efficient routing and handling of queries. 

Summary

You can use FlairNLP or other libraries to help you structure data from text. This is a simple tutorial which does nothing more than use the FlairNLP example found here to show you how easy it really is to extract value from text. 

If you’d like help or more information about this, feel free to get in touch.

Take a look at the source code in Google Colab here.

Citations

@inproceedings{akbik2018coling,
  title={Contextual String Embeddings for Sequence Labeling},
  author={Akbik, Alan and Blythe, Duncan and Vollgraf, Roland},
  booktitle = {{COLING} 2018, 27th International Conference on Computational Linguistics},
  pages     = {1638--1649},
  year      = {2018}
}