Overview
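Named entity recognition (NER) is a fundamental step in information extraction. It locates the spans of text that mention people, places, organizations, and similar items and assigns each span a type, so that systems can determine 'who,' 'where,' and 'what' is being discussed in a piece of text.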
Common Entity Types
- PERSON: Real or fictional people.
- ORG: Companies, agencies, institutions.
- GPE: Geopolitical entities such as countries, cities, states.
- DATE: Absolute or relative dates or periods.
- MONEY: Monetary values, including unit.
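To make these labels concrete, the sketch below represents a hand-annotated sentence as character-offset spans. The sentence, the `Entity` class, and the `annotate` helper are illustrative assumptions, not the output or API of any particular NER library.

```python
from dataclasses import dataclass

# Labels taken from the list above.
ENTITY_TYPES = {"PERSON", "ORG", "GPE", "DATE", "MONEY"}


@dataclass(frozen=True)
class Entity:
    text: str   # surface string of the mention
    label: str  # one of ENTITY_TYPES
    start: int  # character offset where the mention begins
    end: int    # character offset one past the last character


def annotate(sentence, mentions):
    """Build Entity spans from hand-picked (substring, label) pairs.

    Each substring is located with str.find, so this only works when the
    mention occurs verbatim in the sentence (the first occurrence is used).
    """
    entities = []
    for text, label in mentions:
        if label not in ENTITY_TYPES:
            raise ValueError(f"unknown label: {label}")
        start = sentence.find(text)
        if start == -1:
            raise ValueError(f"mention not found: {text!r}")
        entities.append(Entity(text, label, start, start + len(text)))
    return entities


# Hypothetical example sentence, annotated by hand.
SENTENCE = "Ada Lovelace joined Acme Corp in London on 3 May 2021 for $5,000."
ENTITIES = annotate(SENTENCE, [
    ("Ada Lovelace", "PERSON"),
    ("Acme Corp", "ORG"),
    ("London", "GPE"),
    ("3 May 2021", "DATE"),
    ("$5,000", "MONEY"),
])

for ent in ENTITIES:
    print(f"{ent.label:<6} [{ent.start}:{ent.end}] {ent.text}")
```

Storing offsets rather than only the strings keeps each mention tied to its position in the original text, which matters when the same name appears more than once.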
How it Works
Modern NER systems typically treat the task as sequence labeling. A model (such as a Bi-LSTM-CRF or a Transformer) processes the text and assigns a tag to each token, indicating whether the token is part of an entity and, if so, which type of entity.
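Per-token tags still have to be grouped into entity mentions. One common convention is the BIO scheme, where `B-TYPE` begins an entity, `I-TYPE` continues it, and `O` marks a token outside any entity. The text above does not say which scheme is meant, so BIO is an assumption here, and the tags in the example are written by hand as a stand-in for model output. A minimal sketch of the grouping step, with `decode_bio` as a hypothetical helper name:

```python
def decode_bio(tokens, tags):
    """Group BIO-tagged tokens into (entity_text, entity_type) pairs."""
    if len(tokens) != len(tags):
        raise ValueError("tokens and tags must have the same length")

    entities = []
    current_tokens, current_type = [], None

    def flush():
        # Close the entity being built, if any, and reset the state.
        nonlocal current_tokens, current_type
        if current_tokens:
            entities.append((" ".join(current_tokens), current_type))
        current_tokens, current_type = [], None

    for token, tag in zip(tokens, tags):
        if tag == "O":
            flush()
            continue
        prefix, _, etype = tag.partition("-")
        if prefix == "B" or etype != current_type:
            # A new entity starts here. An I- tag that does not continue
            # the open entity is leniently treated as a start as well.
            flush()
            current_type = etype
        current_tokens.append(token)

    flush()  # close an entity that runs to the end of the sequence
    return entities


# Hand-written tags standing in for the output of a trained tagger.
TOKENS = ["Ada", "Lovelace", "joined", "Acme", "Corp", "in", "London", "."]
TAGS = ["B-PERSON", "I-PERSON", "O", "B-ORG", "I-ORG", "O", "B-GPE", "O"]

print(decode_bio(TOKENS, TAGS))
```

Joining tokens with a single space is a simplification; a real pipeline would more likely map tokens back to character offsets in the original text.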
Applications
- Content categorization for news and research.
- Improving search engine results.
- Automating data entry from documents.
- Powering question-answering systems.