Normal AI - Detecting people, places, things and more

Named entity recognition (NER) is a kind of token classification, sometimes called span classification. Instead of classifying an entire document, it pulls out the bits and pieces you might be interested in: people's names, organizations, places, amounts of money, etc.
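To make the "token classification" idea concrete, here is a minimal sketch of how per-token labels turn into entity spans. It assumes the common BIO tagging scheme (B- begins an entity, I- continues it, O is outside any entity), which this page doesn't spell out. The function name bio_to_spans and the sample sentence are made up for illustration.

```python
def bio_to_spans(tokens, tags):
    """Group per-token BIO tags into (label, text) spans."""
    spans = []
    current_label, current_tokens = None, []

    def flush():
        # Store the span collected so far, if there is one.
        if current_tokens:
            spans.append((current_label, " ".join(current_tokens)))

    for token, tag in zip(tokens, tags):
        starts_new = tag.startswith("B-") or (
            tag.startswith("I-") and tag[2:] != current_label
        )
        if starts_new:
            # A B- tag (or a stray I- tag with a different label) opens a new span.
            flush()
            current_label, current_tokens = tag[2:], [token]
        elif tag.startswith("I-"):
            # Continue the span that is already open.
            current_tokens.append(token)
        else:
            # "O" means the token is not part of any entity.
            flush()
            current_label, current_tokens = None, []
    flush()
    return spans


# Hand-written example: the tags below are what a model would predict per token.
tokens = ["Angela", "Merkel", "visited", "Paris", "with", "Siemens", "executives", "."]
tags = ["B-PER", "I-PER", "O", "B-LOC", "O", "B-ORG", "O", "O"]
print(bio_to_spans(tokens, tags))
# [('PER', 'Angela Merkel'), ('LOC', 'Paris'), ('ORG', 'Siemens')]
```

The model labels every token, and the spans you actually care about fall out of grouping neighboring labels.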

Use cases

NER could be used to pull out the names of everyone mentioned in an email thread or quoted in a newspaper article.

Models

Popular Models

Hugging Face has plenty of NER models listed under “token classification”. The most popular is StanfordAIMI/stanford-deidentifier-base, but it’s built for de-identifying medical text, not general-purpose NER! You’ll probably have better luck with dslim/bert-base-NER for English or Davlan/bert-base-multilingual-cased-ner-hrl for multilingual text.
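Here's a rough sketch of running one of these models with the transformers library's pipeline function. The helper names load_ner and entities_by_type and the min_score cutoff are my own assumptions for illustration, and the sample list is hand-written to mimic the shape of the pipeline's output rather than copied from a real run.

```python
def load_ner(model_name="dslim/bert-base-NER"):
    """Build a token-classification pipeline (downloads the model on first use)."""
    # Imported lazily so the helper below works without transformers installed.
    from transformers import pipeline

    # aggregation_strategy="simple" merges word pieces into whole entities.
    return pipeline(
        "token-classification",
        model=model_name,
        aggregation_strategy="simple",
    )


def entities_by_type(results, min_score=0.5):
    """Group pipeline output into {entity type: [unique entity texts]}.

    min_score is an arbitrary cutoff chosen for this sketch.
    """
    grouped = {}
    for item in results:
        if item["score"] < min_score:
            continue  # skip low-confidence predictions
        words = grouped.setdefault(item["entity_group"], [])
        if item["word"] not in words:
            words.append(item["word"])
    return grouped


# Hand-written sample in the shape the pipeline returns (not real model output).
sample = [
    {"entity_group": "PER", "score": 0.99, "word": "Ada Lovelace", "start": 0, "end": 12},
    {"entity_group": "ORG", "score": 0.97, "word": "Acme", "start": 23, "end": 27},
    {"entity_group": "PER", "score": 0.98, "word": "Ada Lovelace", "start": 40, "end": 52},
    {"entity_group": "LOC", "score": 0.30, "word": "Mars", "start": 60, "end": 64},
]
print(entities_by_type(sample))
```

With transformers and a backend such as PyTorch installed, entities_by_type(load_ner()(text)) should give you the same kind of dictionary for your own text. Swapping in Davlan/bert-base-multilingual-cased-ner-hrl is just a matter of passing a different model_name.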

State of the art

Named entity recognition on Papers with Code shows that pretty much every top model scores in the 90s, so I wouldn’t worry too much about which model you pick.

If you’re looking at a model that comes in different sizes, the larger ones often perform much better at NER. For example, many spaCy tutorials suggest en_core_web_sm, which is the small English model. Instead, you probably want en_core_web_lg, which is much larger and includes word vectors. The model name usually makes it obvious whether you’re looking at a smaller version of a “full” model.
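If you want to see the difference for yourself, one option is to run the same text through both spaCy models and compare what each one finds. This is only a sketch: load_model, extract_entities and diff_entities are helper names I made up, and the two entity lists at the bottom are invented to show the comparison, not real model output.

```python
def load_model(name):
    """Load an installed spaCy model, e.g. en_core_web_sm or en_core_web_lg."""
    # Imported lazily so diff_entities can be used without spaCy installed.
    import spacy

    return spacy.load(name)


def extract_entities(nlp, text):
    """Return (text, label) pairs for every entity the model finds."""
    return [(ent.text, ent.label_) for ent in nlp(text).ents]


def diff_entities(small, large):
    """Show which entities only one of the two models found."""
    return {
        "only_small": [e for e in small if e not in large],
        "only_large": [e for e in large if e not in small],
    }


# Invented lists standing in for the output of extract_entities on each model.
small_found = [("Apple", "ORG")]
large_found = [("Apple", "ORG"), ("Tim Cook", "PERSON")]
print(diff_entities(small_found, large_found))
```

Each model has to be installed before load_model will work, for example with python -m spacy download en_core_web_lg. After that, diff_entities(extract_entities(load_model("en_core_web_sm"), text), extract_entities(load_model("en_core_web_lg"), text)) gives a quick look at what the small model is missing on your own data.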