Named entity recognition (NER) is also known as token classification or span classification. Instead of classifying an entire document, it pulls out bits and pieces you might be interested in: names, organization names, places, money, etc.
Use cases
NER could be used to pull out the names of everyone mentioned in an email thread or quoted in a newspaper article.
Try it out
Example from here. Click “Submit” or test with your own text!
Models
Popular Models
Hugging Face has plenty of NER models listed under “token classification”. The most popular is StanfordAIMI/stanford-deidentifier-base, but it’s just for medical work! You’ll probably have better luck with dslim/bert-base-NER or Davlan/bert-base-multilingual-cased-ner-hrl.
State of the art
Named entity recognition on Papers with Code shows that pretty much everything sits in the 90’s, so I wouldn’t worry too much about what model you pick.
If you’re looking at a model with different size options, larger ones often perform much better at NER. For example, many spaCy tutorials suggest you use en_core_web_sm
which is small. Instead, you probably want en_core_web_lg
which has a lot more data in it. The model name should make it obvious as to whether it’s a smaller version of a “full” model.