An NLP task that identifies and classifies named entities in text — such as people, organizations, locations, dates, and monetary values — enabling structured extraction from unstructured language.
In Depth
Named Entity Recognition transforms unstructured text into structured data by identifying specific, named things — people (Barack Obama), organizations (OpenAI), locations (San Francisco), dates (March 2024), products (iPhone 15), and monetary values ($4.2 billion). NER is a foundational component of many NLP pipelines: it is rarely the end goal, but its output enables downstream tasks like relation extraction, knowledge graph construction, and question answering.
Modern NER systems are typically formulated as sequence labeling problems: each token in the text receives a label such as B-PER (beginning of a person entity), I-PER (inside a person entity), B-ORG, I-ORG, or O (outside any entity). Bidirectional LSTMs with CRF (Conditional Random Field) output layers were the dominant approach before Transformers; today, fine-tuned BERT models set the state of the art on standard NER benchmarks like CoNLL-2003, because bidirectional attention captures the full sentence context needed to resolve entity ambiguity.
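The BIO scheme above can be sketched with a small helper that converts entity spans into per-token labels; the tokens, span format, and entity types here are illustrative assumptions, not any particular corpus's convention:

```python
def bio_tags(tokens, entities):
    """Convert (start, end, type) token spans into BIO labels.

    `entities` holds half-open token ranges: (0, 2, "PER") labels
    tokens[0:2] as one person entity.
    """
    tags = ["O"] * len(tokens)  # default: outside any entity
    for start, end, etype in entities:
        tags[start] = f"B-{etype}"          # first token of the span
        for i in range(start + 1, end):
            tags[i] = f"I-{etype}"          # continuation tokens
    return tags

tokens = ["Barack", "Obama", "visited", "San", "Francisco", "in", "March", "2024"]
spans = [(0, 2, "PER"), (3, 5, "LOC"), (6, 8, "DATE")]
print(bio_tags(tokens, spans))
# ['B-PER', 'I-PER', 'O', 'B-LOC', 'I-LOC', 'O', 'B-DATE', 'I-DATE']
```

A sequence labeler learns to predict exactly this tag sequence, one label per token; the B-/I- split is what lets it separate two adjacent entities of the same type.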
The challenges of NER are significant. The same string can be different entity types depending on context: 'Apple' is a company in 'Apple reported record earnings' but a fruit in 'she ate an apple'. Nested entities — 'National Institutes of Health' contains 'Health' as its own entity — require specialized architectures. New entity types and emerging proper nouns pose out-of-vocabulary challenges. Domain-specific NER (clinical, financial, legal) requires fine-tuning on in-domain annotated data because general-purpose models miss specialized entity types.
NER is the bridge between unstructured language and structured data — identifying the 'who, what, where, and when' in text and making it machine-readable for search, analytics, and knowledge extraction.
Frequently Asked Questions
What entities does NER recognize?
Standard entity types include: PERSON (people's names), ORG (organizations, companies), GPE/LOC (geopolitical entities, locations), DATE/TIME (temporal expressions), MONEY (monetary values), and PERCENT (percentages). Domain-specific NER extends to medical entities (diseases, drugs, genes), legal entities (laws, court cases), and financial entities (tickers, instruments).
What is the best NER model?
For quick, general-purpose use: spaCy's pre-trained models offer fast, accurate NER out of the box. For maximum accuracy: fine-tuned BERT or RoBERTa models on domain-specific data. For zero-shot or custom entities: LLMs like GPT-4 or Claude can extract entities defined in a prompt without any training. The best choice depends on speed requirements, domain specificity, and whether you need custom entity types.
How is NER used in business?
Common applications include: document processing (extracting names, dates, and amounts from contracts and invoices), compliance monitoring (identifying regulated entities in financial communications), customer support (extracting product names and order IDs from tickets), knowledge graph construction (building entity-relationship databases from unstructured text), and news monitoring (tracking mentions of companies and people).
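For highly regular fields like order IDs, amounts, and ISO dates, a rule-based extractor is often a pragmatic complement to a learned model; the patterns and the ORD- ID format below are illustrative assumptions, not a production schema:

```python
import re

# Simple gazetteer of regexes for machine-generated fields (hypothetical formats).
PATTERNS = {
    "ORDER_ID": re.compile(r"\bORD-\d{6}\b"),          # e.g. ORD-104233
    "MONEY": re.compile(r"\$\d[\d,]*(?:\.\d{2})?"),    # e.g. $4,219.00
    "DATE": re.compile(r"\b\d{4}-\d{2}-\d{2}\b"),      # e.g. 2024-03-18
}

def extract_fields(text):
    """Return every match for each field type, keyed by label."""
    return {label: pat.findall(text) for label, pat in PATTERNS.items()}

ticket = "Order ORD-104233 placed on 2024-03-18 was charged $4,219.00 twice."
print(extract_fields(ticket))
# {'ORDER_ID': ['ORD-104233'], 'MONEY': ['$4,219.00'], 'DATE': ['2024-03-18']}
```

Rules handle the deterministic fields cheaply and predictably, leaving the statistical model to cover free-form entities like product names and people.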