Semantic enrichment adds contextual attributes to PII placeholders, helping translation systems maintain grammatical correctness across languages.

Why Semantic Attributes?

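Many languages have grammatical gender agreement. Without knowing the gender of the person behind a placeholder, a translation system cannot choose the correct agreement for pronouns, articles, adjectives, or role nouns: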
// Without semantic enrichment: the translator has to guess and often defaults to masculine
Input:  "<PII type="PERSON" id="1"/> is our new manager."
German: "<PII type="PERSON" id="1"/> ist unser neuer Manager."

// With semantic enrichment
Input:  "<PII type="PERSON" gender="female" id="1"/> is our new manager."
German: "<PII type="PERSON" gender="female" id="1"/> ist unsere neue Managerin."  // Correct feminine agreement

Enable Semantic Enrichment

import { createAnonymizer } from 'rehydra';

const anonymizer = createAnonymizer({
  ner: { mode: 'quantized' },
  semantic: { 
    enabled: true,
    onStatus: (status) => console.log(status),
  }
});

await anonymizer.initialize();

const result = await anonymizer.anonymize('Hello Maria Schmidt from Berlin!');
console.log(result.anonymizedText);
// "Hello <PII type="PERSON" gender="female" id="1"/> from <PII type="LOCATION" scope="city" id="2"/>!"
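The attributes travel inside the placeholder tags of anonymizedText, so whatever consumes the text (an MT pre- or post-processor, for instance) has to read them back out. A minimal regex-based sketch follows; parsePlaceholders and the Placeholder shape are hypothetical helpers, not rehydra exports, and the sketch assumes the self-closing <PII .../> form with double-quoted attributes shown above.

```typescript
// Hypothetical parsed form of one placeholder (not a rehydra type).
interface Placeholder {
  type: string;
  id: string;
  attributes: Record<string, string>; // semantic extras, e.g. { gender: "female" }
}

// Matches self-closing tags such as <PII type="PERSON" gender="female" id="1"/>.
const TAG_PATTERN = /<PII\s+([^>]*?)\s*\/>/g;
// Matches a single key="value" pair inside a tag.
const ATTR_PATTERN = /(\w+)="([^"]*)"/g;

function parsePlaceholders(text: string): Placeholder[] {
  const placeholders: Placeholder[] = [];
  for (const tag of text.matchAll(TAG_PATTERN)) {
    const attrs: Record<string, string> = {};
    for (const [, key, value] of tag[1].matchAll(ATTR_PATTERN)) {
      attrs[key] = value;
    }
    // Separate the identifying attributes from the semantic ones.
    const { type = "", id = "", ...semantic } = attrs;
    placeholders.push({ type, id, attributes: semantic });
  }
  return placeholders;
}

const parsed = parsePlaceholders(
  'Hello <PII type="PERSON" gender="female" id="1"/> from <PII type="LOCATION" scope="city" id="2"/>!',
);
console.log(parsed[0].attributes.gender); // female
console.log(parsed[1].attributes.scope); // city
```

Keeping type and id apart from the remaining attributes means new semantic attributes show up in attributes without changes to the parser.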

Semantic Attributes

Person Gender

| Attribute value | Meaning | Example names |
|---|---|---|
| gender="male" | Masculine name | John, Michael, Hans |
| gender="female" | Feminine name | Maria, Sarah, Anna |
| gender="neutral" | Ambiguous or unknown | Alex, Jordan, Sam |

Location Scope

| Attribute value | Meaning | Examples |
|---|---|---|
| scope="city" | City or town | Berlin, Paris, Tokyo |
| scope="country" | Country | Germany, France, Japan |
| scope="region" | Region or state | Bavaria, California, Hokkaido |
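If you consume these attributes from TypeScript, modelling the documented values as string-literal unions lets the compiler reject typos such as "femal". The names below (Gender, LocationScope, isGender, isScope, toGender) are illustrative and not taken from rehydra's exports; treating unknown values as "neutral" is a design choice of this sketch.

```typescript
// Documented attribute values as readonly tuples, so types and runtime checks share one source.
const GENDERS = ["male", "female", "neutral"] as const;
const SCOPES = ["city", "country", "region"] as const;

type Gender = (typeof GENDERS)[number];
type LocationScope = (typeof SCOPES)[number];

// Type guards for narrowing raw attribute strings read from a placeholder.
function isGender(value: string): value is Gender {
  return (GENDERS as readonly string[]).includes(value);
}

function isScope(value: string): value is LocationScope {
  return (SCOPES as readonly string[]).includes(value);
}

// Missing or unrecognized values fall back to "neutral" (ambiguous/unknown).
function toGender(value: string | undefined): Gender {
  return value !== undefined && isGender(value) ? value : "neutral";
}

console.log(toGender("female")); // female
console.log(toGender(undefined)); // neutral
console.log(isScope("country")); // true
```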

Semantic Data

Semantic enrichment uses lookup databases (~12 MB total):
  • Name database: First names with gender associations
  • Location database: Cities, countries, regions with classifications
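Conceptually, each database maps a normalized key (a first name or a place name) to a classification. The sketch below imitates that with two small in-memory Maps filled from the example tables above; rehydra's real data format and matching rules are not described here, so lookupGender and lookupScope are hypothetical.

```typescript
type Gender = "male" | "female" | "neutral";
type LocationScope = "city" | "country" | "region";

// Toy stand-ins for the name and location databases (entries from the tables above).
const NAMES = new Map<string, Gender>([
  ["john", "male"], ["hans", "male"],
  ["maria", "female"], ["anna", "female"],
  ["alex", "neutral"],
]);

const LOCATIONS = new Map<string, LocationScope>([
  ["berlin", "city"], ["paris", "city"],
  ["germany", "country"], ["france", "country"],
  ["bavaria", "region"], ["california", "region"],
]);

// Look up the first token of a full name, case-insensitively; unknown names are "neutral".
function lookupGender(fullName: string): Gender {
  const first = fullName.trim().split(/\s+/)[0].toLowerCase();
  return NAMES.get(first) ?? "neutral";
}

// Unknown places return undefined, so a caller could simply omit the scope attribute (assumption).
function lookupScope(location: string): LocationScope | undefined {
  return LOCATIONS.get(location.trim().toLowerCase());
}

console.log(lookupGender("Maria Schmidt")); // female
console.log(lookupScope("Tokyo")); // undefined (not in this toy table)
```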

First-Use Download

const anonymizer = createAnonymizer({
  semantic: { 
    enabled: true,
    autoDownload: true,  // Default: auto-download if not cached
    onDownloadProgress: (progress) => {
      console.log(`${progress.file}: ${progress.percent}%`);
    }
  }
});

Manual Data Management

import { 
  isSemanticDataDownloaded,
  downloadSemanticData,
  clearSemanticDataCache 
} from 'rehydra';

// Check if data is cached
const hasData = await isSemanticDataDownloaded();

// Pre-download
await downloadSemanticData((progress) => {
  console.log(`${progress.file}: ${progress.percent}%`);
});

// Clear cache
await clearSemanticDataCache();
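A common pattern is to pre-download once at application startup so the first anonymize() call does not have to wait for the roughly 12 MB fetch. The helper below receives the check and download functions as parameters so it can be exercised without network access; ensureSemanticData is hypothetical, and the { file, percent } progress shape is inferred from the examples above.

```typescript
// Progress shape as used in the examples above (assumed, not a published type).
interface DownloadProgress {
  file: string;
  percent: number;
}

type IsDownloaded = () => Promise<boolean>;
type Download = (onProgress: (p: DownloadProgress) => void) => Promise<void>;

// Hypothetical helper: download only when the data is not cached yet.
// Resolves to true if a download was performed.
async function ensureSemanticData(
  isDownloaded: IsDownloaded,
  download: Download,
  log: (line: string) => void = console.log,
): Promise<boolean> {
  if (await isDownloaded()) {
    return false; // already cached, nothing to do
  }
  await download((p) => log(`${p.file}: ${p.percent}%`));
  return true;
}
```

With rehydra installed, the real functions from the import above would be passed in, for example await ensureSemanticData(isSemanticDataDownloaded, downloadSemanticData); this assumes their signatures match the calls shown in this section.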

Title Extraction

When semantic enrichment is enabled, honorific titles are extracted and kept visible:
const result = await anonymizer.anonymize('Contact Dr. Maria Schmidt');
// "Contact Dr. <PII type="PERSON" gender="female" id="1"/>"
Supported titles:
  • Academic: Dr., Prof., PhD
  • Honorific: Mr., Mrs., Ms., Miss
  • Professional: Rev., Hon.
  • German: Herr, Frau, Dr.
  • French: M., Mme., Mlle.
  • And many more…
Titles remain visible because they’re often important for translation and don’t reveal the person’s identity on their own.
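One way to picture the extraction: split a detected person span into tokens, peel off leading tokens that are known titles, and mask only the remainder. This is a simplified sketch with a title list copied from the list above; splitTitle is hypothetical and not necessarily how rehydra implements it.

```typescript
// Subset of the supported titles listed above (illustrative, not exhaustive).
const TITLES = new Set([
  "Dr.", "Prof.", "Mr.", "Mrs.", "Ms.", "Miss",
  "Rev.", "Hon.", "Herr", "Frau", "M.", "Mme.", "Mlle.",
]);

interface SplitName {
  titles: string[]; // stays visible in the output
  name: string;     // replaced by the <PII> placeholder
}

// Consume leading tokens that are known titles; the rest is the name.
function splitTitle(span: string): SplitName {
  const tokens = span.trim().split(/\s+/);
  let i = 0;
  // Stop before the last token so at least one token is always kept as the name.
  while (i < tokens.length - 1 && TITLES.has(tokens[i])) {
    i++;
  }
  return { titles: tokens.slice(0, i), name: tokens.slice(i).join(" ") };
}

const split = splitTitle("Prof. Dr. Hans Müller");
console.log([...split.titles, "<PII/>"].join(" ")); // Prof. Dr. <PII/>
```

Matching whole tokens keeps "Mrs." from being confused with "Mr.", and multiple titles are peeled off in order. A bare token match cannot tell the French title "M." from an English initial such as "M. Smith", which is the kind of ambiguity a locale hint can help resolve.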

Locale Hints

Pass a locale hint as the second argument to anonymize() to improve detection accuracy:
const result = await anonymizer.anonymize(
  'Bonjour Jean-Pierre de Lyon!',
  'fr-FR'  // French locale
);
The locale helps with:
  • Name gender inference (culture-specific names)
  • Title recognition (Mr. vs Herr vs M.)
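Why a locale matters for gender inference is easiest to see with a name whose association differs between cultures: "Andrea" is typically masculine in Italian but feminine in German and English. The sketch keys a toy name table by language subtag and falls back to a locale-independent "*" entry; inferGender and its data are hypothetical, not rehydra's name database.

```typescript
type Gender = "male" | "female" | "neutral";

// Toy data: per-language entries plus an optional "*" entry that applies to any locale.
const NAME_GENDER = new Map<string, Record<string, Gender>>([
  ["andrea", { it: "male", de: "female", en: "female" }],
  ["maria", { "*": "female" }],
]);

// Infer gender from a first name using only the language subtag of a
// BCP 47 locale ("fr-FR" -> "fr"). Unknown or ambiguous names yield "neutral".
function inferGender(firstName: string, locale?: string): Gender {
  const entry = NAME_GENDER.get(firstName.toLowerCase());
  if (entry === undefined) return "neutral";
  const lang = locale?.split("-")[0].toLowerCase();
  const byLang = lang !== undefined ? entry[lang] : undefined;
  return byLang ?? entry["*"] ?? "neutral";
}

console.log(inferGender("Andrea", "it-IT")); // male
console.log(inferGender("Andrea", "de-DE")); // female
console.log(inferGender("Andrea")); // neutral
```

Without a locale, a culture-dependent name like this stays "neutral" rather than being guessed.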

Configuration Options

const anonymizer = createAnonymizer({
  semantic: {
    enabled: true,
    autoDownload: true,
    onStatus: (status) => console.log(status),
    onDownloadProgress: (progress) => {
      console.log(`${progress.file}: ${progress.percent}%`);
    }
  },
  defaultPolicy: {
    enableSemanticMasking: true,  // Set automatically when semantic.enabled is true
  }
});

Cache Locations

Semantic data is cached locally:

Node.js

| Platform | Location |
|---|---|
| macOS | ~/Library/Caches/rehydra/semantic-data/ |
| Linux | ~/.cache/rehydra/semantic-data/ |
| Windows | %LOCALAPPDATA%/rehydra/semantic-data/ |
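If a script needs to inspect or delete the cache directory directly, the table can be turned into a path resolver with Node's os and path modules. This is a sketch based only on the table; semanticCacheDir is hypothetical, and rehydra's own resolution logic may differ.

```typescript
import * as os from "node:os";
import * as path from "node:path";

// Sketch of the cache-location table; defaults come from the current process,
// explicit arguments allow resolving the path for another platform.
function semanticCacheDir(
  platform: string = process.platform,
  home: string = os.homedir(),
  env: Record<string, string | undefined> = process.env,
): string {
  if (platform === "win32") {
    // %LOCALAPPDATA% is normally set on Windows; the fallback is an assumption.
    const localAppData = env["LOCALAPPDATA"] ?? path.win32.join(home, "AppData", "Local");
    return path.win32.join(localAppData, "rehydra", "semantic-data");
  }
  if (platform === "darwin") {
    return path.posix.join(home, "Library", "Caches", "rehydra", "semantic-data");
  }
  // Linux (and, in this sketch, any other Unix-like platform).
  return path.posix.join(home, ".cache", "rehydra", "semantic-data");
}

console.log(semanticCacheDir());
```

Using path.win32 and path.posix explicitly produces the target platform's separators regardless of where the script runs.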

Browser

Uses IndexedDB for cross-session persistence.

Use Cases

German, French, Spanish, and many other languages have grammatical gender. Semantic attributes help MT systems choose correctly gendered forms:
EN: "Please contact our colleague <PII gender="female"/> for assistance."
DE: "Bitte wenden Sie sich für Unterstützung an unsere Kollegin <PII gender="female"/>."
Different location types can require different prepositions, for example in French:
"I'm in Paris" (city) → "Je suis à Paris"
"I'm in France" (country) → "Je suis en France"
The scope attribute helps translation systems choose correctly.
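To make this concrete, here is a toy French example of choosing the preposition for "in <location>" from the scope value: cities take "à", while countries take "en", "au" or "aux" depending on the grammatical gender and number of the country name. The form parameter is a hypothetical extra hint, because scope alone does not say whether a country name is feminine, masculine, or plural; regions are left out because their rules vary more.

```typescript
type LocationScope = "city" | "country" | "region";

// Grammatical form of the French country name: a hypothetical extra hint,
// since the scope attribute alone does not carry it.
type CountryForm = "feminine" | "masculine" | "plural";

// Pick the French preposition for "in <location>".
// Simplified: cities written with an article (e.g. "Le Caire" -> "au Caire") are ignored.
function frenchIn(name: string, scope: LocationScope, form?: CountryForm): string | undefined {
  if (scope === "city") return "à";
  if (scope === "country") {
    if (form === "plural") return "aux"; // aux États-Unis
    if (form === "feminine") return "en"; // en France
    if (form === "masculine") {
      // Masculine names starting with a vowel take "en" (en Iran); others take "au" (au Japon).
      return /^[aeiouàâéèêîô]/i.test(name) ? "en" : "au";
    }
  }
  return undefined; // regions, or countries without a form hint, are left to the caller
}

console.log(`Je suis ${frenchIn("Paris", "city")} Paris`); // Je suis à Paris
console.log(`Je suis ${frenchIn("France", "country", "feminine")} France`); // Je suis en France
```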
Beyond translation, semantic attributes enable:
  • Gender-aware text generation
  • Location-based content filtering
  • Name normalization

Next Steps