Configure the NER model for detecting names, organizations, and locations
The NER (Named Entity Recognition) model enables detection of soft PII like person names, organizations, and locations that can’t be captured by regex patterns.
    import { createAnonymizer } from 'rehydra';

    const anonymizer = createAnonymizer({
      ner: {
        mode: 'quantized',
        onStatus: (status) => console.log(status),
      },
    });

    await anonymizer.initialize(); // Downloads model on first use

    const result = await anonymizer.anonymize('Hello John Smith from Acme Corp!');
    // "Hello <PII type="PERSON" id="1"/> from <PII type="ORG" id="1"/>!"
The NER model is case-sensitive and works best on properly capitalized text, so lowercase names like "tom" or "sarah" can be missed. Enable caseFallback to run a second NER pass on title-cased text and merge any new detections:
    import { createAnonymizer } from 'rehydra';

    const anonymizer = createAnonymizer({
      ner: {
        mode: 'quantized',
        caseFallback: true, // Run a second pass on title-cased text
      },
    });

    await anonymizer.initialize();

    await anonymizer.anonymize('hey tom, can you ask sarah to call me?');
    // "hey <PII type="PERSON" id="1"/>, can you ask <PII type="PERSON" id="2"/> to call me?"
Without caseFallback, neither "tom" nor "sarah" would be detected.
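To make the two-pass idea concrete, here is a minimal sketch of how a title-casing pass and a merge step could work. It is not the library's implementation: the `Span` type and the `titleCase`, `overlaps`, and `mergeDetections` helpers are hypothetical names introduced for illustration, and the merge rule (keep every first-pass span, add only non-overlapping fallback spans) is an assumption.

```typescript
// Hypothetical span shape for illustration; not the library's actual type.
interface Span {
  start: number; // inclusive character offset
  end: number;   // exclusive character offset
  type: string;  // e.g. 'PERSON'
  confidence: number;
}

// Upper-case the first letter of each word. For ASCII text this keeps the
// string length unchanged, so offsets found in the title-cased text map
// directly back to the original text.
// Limitation: \b also matches after an apostrophe ("don't" -> "Don'T").
function titleCase(text: string): string {
  return text.replace(/\b[a-z]/g, (c) => c.toUpperCase());
}

// Two half-open ranges overlap if each starts before the other ends.
function overlaps(a: Span, b: Span): boolean {
  return a.start < b.end && b.start < a.end;
}

// Assumed merge rule: keep all first-pass spans and add only those
// fallback spans that do not overlap any first-pass span.
function mergeDetections(primary: Span[], fallback: Span[]): Span[] {
  const added = fallback.filter((f) => !primary.some((p) => overlaps(p, f)));
  return [...primary, ...added].sort((a, b) => a.start - b.start);
}

console.log(titleCase('hey tom, can you ask sarah to call me?'));
// prints "Hey Tom, Can You Ask Sarah To Call Me?"
```

Because the title-cased text has the same length as the input, a span found at offsets 4 to 7 in the second pass still points at "tom" in the original message.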
Fallback detections receive a confidence penalty (their confidence is multiplied by caseFallbackPenalty, default 0.85) since title-casing can introduce false positives. You can tune this by setting caseFallbackPenalty.
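The effect of the penalty is simple arithmetic, sketched below. The 0.85 default comes from the text above; the `applyFallbackPenalty` and `isKept` helpers and the 0.8 acceptance threshold are hypothetical and only show how a lower or higher penalty might change which fallback detections survive.

```typescript
// Default taken from the documentation text above.
const DEFAULT_CASE_FALLBACK_PENALTY = 0.85;

// Scale a fallback detection's confidence by the penalty factor.
function applyFallbackPenalty(
  confidence: number,
  penalty: number = DEFAULT_CASE_FALLBACK_PENALTY,
): number {
  return confidence * penalty;
}

// Hypothetical acceptance check: the threshold is an assumption for
// illustration, not a documented library setting.
function isKept(confidence: number, threshold: number): boolean {
  return confidence >= threshold;
}

const raw = 0.9; // confidence reported by the title-cased pass
const threshold = 0.8; // assumed acceptance threshold

// Default penalty: 0.9 * 0.85 is about 0.765, below the assumed threshold.
console.log(isKept(applyFallbackPenalty(raw), threshold)); // prints false

// Penalty of 1.0 disables the reduction, so the detection is kept.
console.log(isKept(applyFallbackPenalty(raw, 1.0), threshold)); // prints true
```

A penalty closer to 1.0 trusts fallback detections more; a lower value trades recall on lowercase names for fewer false positives.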
Enabling caseFallback doubles NER inference time since it runs two passes. Use it when your input text contains informal or uncapitalized names (chat messages, transcripts, etc.).
When inference is delegated to a remote server, tokenized text is sent to the server instead of running the ONNX model locally. The server must accept the same input format and return logits in the expected shape.