The NER (Named Entity Recognition) model enables detection of soft PII, such as person names, organizations, and locations, that can’t be captured by regex patterns.
Model Modes
| Mode | Description | Size | Use Case |
|---|---|---|---|
| `'disabled'` | No NER, regex only | 0 (no download) | Fast processing, structured PII only |
| `'quantized'` | Smaller quantized model | ~280 MB | Recommended for most use cases |
| `'standard'` | Full-size model | ~1.1 GB | Maximum accuracy |
| `'custom'` | Your own ONNX model | Varies | Domain-specific models |
Basic Setup
Download Progress
You can track model download progress.
Confidence Thresholds
Each NER entity has a confidence score between 0.0 and 1.0. Configure minimum thresholds to discard low-confidence detections.
Case Fallback
The NER model is case-sensitive and works best on properly capitalized text, so lowercase names like "tom" or "sarah" can be missed. Enable `caseFallback` to run a second NER pass on title-cased text and merge any new detections. Without `caseFallback`, neither "tom" nor "sarah" would be detected.
How it works
- The primary NER pass runs on the original text
- A second pass runs on title-cased text (e.g. "tom" → "Tom")
- New detections from the fallback pass that don’t overlap with primary detections are merged in
- Fallback detections keep the original lowercase text and character offsets
- A confidence penalty is applied to fallback detections to reduce false positives
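The steps above can be sketched as a small merge function. This is a minimal illustration, not Rehydra's implementation: `Detection`, `Detector`, `titleCase`, `detectWithCaseFallback`, and the toy detector are all hypothetical names, and the penalty defaults to 0.85, the documented default for `caseFallbackPenalty`. Title-casing only touches ASCII `a`-`z`, which keeps offsets aligned with the original text.

```typescript
// Hypothetical detection shape for this sketch; Rehydra's real result type may differ.
interface Detection {
  type: string;
  start: number; // inclusive character offset into the original text
  end: number; // exclusive character offset
  text: string;
  confidence: number;
}

// Stand-in for one NER pass over a string (hypothetical).
type Detector = (text: string) => Detection[];

// Upper-case the first ASCII letter of each word. Only a-z is changed, so the
// string length, and therefore every character offset, stays the same.
function titleCase(text: string): string {
  return text.replace(/(^|\s)([a-z])/g, (_m: string, pre: string, ch: string) => pre + ch.toUpperCase());
}

function overlaps(a: Detection, b: Detection): boolean {
  return a.start < b.end && b.start < a.end;
}

function detectWithCaseFallback(text: string, detect: Detector, penalty = 0.85): Detection[] {
  const primary = detect(text); // pass 1: original text
  const fallback = detect(titleCase(text)); // pass 2: title-cased text
  const merged = [...primary];
  for (const d of fallback) {
    // Primary detections win; only add fallback spans that overlap none of them.
    if (primary.some((p) => overlaps(p, d))) continue;
    merged.push({
      ...d,
      text: text.slice(d.start, d.end), // keep the original (lowercase) surface text
      confidence: d.confidence * penalty, // down-weight to offset title-casing false positives
    });
  }
  return merged.sort((a, b) => a.start - b.start);
}

// Toy detector: flags a fixed name list, but only when capitalized,
// mimicking a case-sensitive model.
const KNOWN_NAMES = new Set(["Tom", "Sarah"]);
const toyDetect: Detector = (text) => {
  const found: Detection[] = [];
  for (const m of text.matchAll(/[A-Za-z]+/g)) {
    const start = m.index ?? 0;
    if (KNOWN_NAMES.has(m[0])) {
      found.push({ type: "PERSON", start, end: start + m[0].length, text: m[0], confidence: 0.9 });
    }
  }
  return found;
};

const result = detectWithCaseFallback("tom met Sarah", toyDetect);
console.log(result);
```

Here "tom" is found only by the fallback pass and comes back with its lowercase text, offsets 0 to 3, and confidence 0.9 × 0.85 ≈ 0.765, while "Sarah", already found by the primary pass, keeps its original 0.9.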
Confidence penalty
Fallback detections receive a confidence penalty because title-casing can introduce false positives: their confidence is multiplied by `caseFallbackPenalty` (default 0.85). The penalty is tunable; lower values down-weight fallback detections more strongly.
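The interaction between the minimum confidence thresholds and this penalty can be worked out directly. The sketch below assumes the penalty is applied before the threshold check and uses hypothetical names (`ScoredEntity`, `Thresholds`, `passesThreshold`, per-type overrides); it illustrates the arithmetic only, not Rehydra's actual options.

```typescript
// Hypothetical shapes and option names; they illustrate the arithmetic only.
interface ScoredEntity {
  type: string;
  confidence: number; // raw model confidence in [0, 1]
  fromFallback: boolean; // true if found only by the title-cased pass
}

interface Thresholds {
  defaultMin: number; // minimum confidence for any type
  byType?: Record<string, number>; // optional per-type overrides (assumption)
}

// Assumes the case-fallback penalty is applied before the threshold check.
function passesThreshold(e: ScoredEntity, t: Thresholds, penalty = 0.85): boolean {
  const score = e.fromFallback ? e.confidence * penalty : e.confidence;
  const min = t.byType?.[e.type] ?? t.defaultMin;
  return score >= min;
}

// Lowest raw confidence a fallback-only detection needs to survive threshold `min`.
function minRawFallbackConfidence(min: number, penalty = 0.85): number {
  return min / penalty;
}

const thresholds: Thresholds = { defaultMin: 0.7, byType: { PERSON: 0.6 } };

console.log(passesThreshold({ type: "ORG", confidence: 0.8, fromFallback: false }, thresholds));
console.log(passesThreshold({ type: "ORG", confidence: 0.8, fromFallback: true }, thresholds));
console.log(minRawFallbackConfidence(0.7));
```

With a 0.7 threshold and the default 0.85 penalty, a fallback-only detection needs a raw confidence of about 0.82 to be kept: an ORG at 0.8 survives when found by the primary pass but is dropped (0.8 × 0.85 = 0.68) when found only by the fallback pass.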
Auto-Download Control
By default, models are downloaded automatically. Automatic downloading can be turned off.
Manual Model Management
Models can be pre-downloaded, and the local model cache can be managed manually.
Inference Server Backend
For batch processing or GPU acceleration, NER inference can be offloaded to a remote server.
Custom Models
You can use your own ONNX model. Custom models must follow the same input/output format as the default models. See the model training guide for details.
Cache Locations
Models are cached locally for offline use.
Node.js
| Platform | Location |
|---|---|
| macOS | ~/Library/Caches/rehydra/models/ |
| Linux | ~/.cache/rehydra/models/ |
| Windows | %LOCALAPPDATA%/rehydra/models/ |
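To inspect or clear the cache from a script, the table above can be mirrored in a small helper. `rehydraModelCacheDir` is a hypothetical function, not part of Rehydra; falling back to `AppData\Local` when `LOCALAPPDATA` is unset and treating unknown platforms like Linux are assumptions of this sketch. Parameters default to the current process so the function is also easy to test.

```typescript
import * as os from "node:os";
import * as path from "node:path";

// Mirrors the Node.js cache-location table; a hypothetical helper, not Rehydra's API.
function rehydraModelCacheDir(
  platform: NodeJS.Platform = process.platform,
  home: string = os.homedir(),
  env: NodeJS.ProcessEnv = process.env,
): string {
  switch (platform) {
    case "darwin":
      return path.posix.join(home, "Library", "Caches", "rehydra", "models");
    case "win32": {
      // Assumption: fall back to the usual LOCALAPPDATA location if the variable is unset.
      const localAppData = env.LOCALAPPDATA ?? path.win32.join(home, "AppData", "Local");
      return path.win32.join(localAppData, "rehydra", "models");
    }
    default:
      // Linux, and (by assumption) any other platform.
      return path.posix.join(home, ".cache", "rehydra", "models");
  }
}

console.log(rehydraModelCacheDir());
```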
Browser
In browsers, models are stored using:
- Origin Private File System (OPFS) for large model files
- IndexedDB for metadata
NER-Detected Types
| Type | Examples |
|---|---|
| `PERSON` | John Smith, Maria, Dr. Johnson |
| `ORG` | Acme Corp, Google, United Nations |
| `LOCATION` | Berlin, Germany, Central Park |
| `ADDRESS` | 123 Main Street |
| `DATE_OF_BIRTH` | born on March 15, 1990 |
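Restricting detection to a subset of these types can be pictured as an allow-list over the results. This sketch filters after detection for clarity; the library may instead skip disabled types internally. The type names come from the table above, while `NerEntity` and `keepTypes` are hypothetical.

```typescript
// Entity type names from the NER-Detected Types table.
type NerEntityType = "PERSON" | "ORG" | "LOCATION" | "ADDRESS" | "DATE_OF_BIRTH";

// Hypothetical entity shape for illustration.
interface NerEntity {
  type: NerEntityType;
  text: string;
}

// Keep only entities whose type is in the allow-list.
function keepTypes(entities: NerEntity[], allowed: readonly NerEntityType[]): NerEntity[] {
  const allow = new Set<NerEntityType>(allowed);
  return entities.filter((e) => allow.has(e.type));
}

const detected: NerEntity[] = [
  { type: "PERSON", text: "John Smith" },
  { type: "ORG", text: "Acme Corp" },
  { type: "LOCATION", text: "Berlin" },
];

const peopleAndOrgs = keepTypes(detected, ["PERSON", "ORG"]);
console.log(peopleAndOrgs);
```

Typing the allow-list with the `NerEntityType` union lets the compiler reject misspelled type names.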
Disabling Specific NER Types
You can detect only certain entity types by disabling the ones you don’t need.
Performance Tips
Reuse the anonymizer instance
Model loading is expensive, so create the anonymizer once and reuse it across calls rather than constructing a new one for each request.
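One way to enforce "create once, reuse" is to memoize the async setup. This is a generic sketch: `once` and `getAnonymizer` are hypothetical, and the factory body stands in for whatever expensive setup your anonymizer performs, such as loading the NER model. It caches the promise rather than the resolved instance, so concurrent callers that arrive while the model is still loading share a single load, and it clears the cache on failure so a later call can retry.

```typescript
// Memoize an async factory: the first call starts it, later calls share the same promise.
function once<T>(factory: () => Promise<T>): () => Promise<T> {
  let cached: Promise<T> | undefined;
  return () => {
    if (cached === undefined) {
      cached = factory().catch((err: unknown) => {
        cached = undefined; // forget a failed load so the next call can retry
        throw err;
      });
    }
    return cached;
  };
}

// Hypothetical stand-in for expensive anonymizer setup (e.g. loading the NER model).
let loadCount = 0;
const getAnonymizer = once(async () => {
  loadCount += 1; // counts how often the expensive setup actually runs
  return { createdAt: Date.now() };
});
```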
Use quantized model for most cases
The quantized model is ~95% as accurate as the standard model but about 4x smaller, and slightly faster:
| Model | Size | Inference Time |
|---|---|---|
| Standard | ~1.1 GB | ~120ms |
| Quantized | ~280 MB | ~100ms |
Skip NER for structured-only PII
If you only need structured PII such as emails, phone numbers, and IBANs, use the `'disabled'` mode and rely on regex detection alone.
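The mode guidance on this page can be condensed into a small decision helper. `pickNerMode` is hypothetical: the mode names come from the Model Modes table and the NER-only types from the NER-Detected Types table, while labels such as `EMAIL`, `PHONE`, and `IBAN` are illustrative placeholders for regex-detected types, not confirmed Rehydra identifiers.

```typescript
// Mode names from the Model Modes table; the helper itself is hypothetical.
type NerMode = "disabled" | "quantized" | "standard" | "custom";

// Entity types that require the NER model (from the NER-Detected Types table).
const NER_TYPES = new Set(["PERSON", "ORG", "LOCATION", "ADDRESS", "DATE_OF_BIRTH"]);

function pickNerMode(requiredTypes: string[], maximizeAccuracy = false): NerMode {
  const needsNer = requiredTypes.some((t) => NER_TYPES.has(t));
  if (!needsNer) return "disabled"; // regex-only structured PII: fastest option
  return maximizeAccuracy ? "standard" : "quantized"; // quantized is the recommended default
}

console.log(pickNerMode(["EMAIL", "PHONE", "IBAN"]));
console.log(pickNerMode(["EMAIL", "PERSON"]));
```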
Next Steps
- Semantic Enrichment: add gender and location attributes
- Custom Recognizers: add domain-specific detection patterns