Extend Rehydra with custom recognizers for domain-specific identifiers like order numbers, employee IDs, or proprietary formats.

Quick Start

import { createAnonymizer, createCustomIdRecognizer, PIIType } from 'rehydra';

// Create a custom recognizer
const orderRecognizer = createCustomIdRecognizer([
  {
    name: 'Order Number',
    pattern: /\bORD-[A-Z0-9]{8}\b/g,
    type: PIIType.CASE_ID,
  },
]);

// Register it
const anonymizer = createAnonymizer();
anonymizer.getRegistry().register(orderRecognizer);
await anonymizer.initialize();

// Use it
const result = await anonymizer.anonymize('Your order ORD-ABC12345 is shipped');
// "Your order <PII type="CASE_ID" id="1"/> is shipped"

Custom ID Recognizer

For simple pattern-based detection, use createCustomIdRecognizer:
import { createCustomIdRecognizer, PIIType } from 'rehydra';

const recognizer = createCustomIdRecognizer([
  {
    name: 'Order Number',           // Descriptive name
    pattern: /\bORD-[A-Z0-9]{8}\b/g,  // Regex pattern (must have 'g' flag)
    type: PIIType.CASE_ID,          // PII type to assign
  },
  {
    name: 'Employee ID',
    pattern: /\bEMP-[0-9]{6}\b/g,
    type: PIIType.CUSTOMER_ID,
  },
  {
    name: 'Support Ticket',
    pattern: /\bTICKET-[0-9]+\b/gi,   // Case-insensitive
    type: PIIType.CASE_ID,
  },
]);
Always include the g (global) flag in your regex patterns. Without it, only the first match will be found.
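The effect of the flag can be seen with plain JavaScript regex behavior, independent of Rehydra. The following is a minimal sketch; the sample text and variable names are made up for illustration, and it does not show how Rehydra iterates over matches internally.

```typescript
const shipmentText = 'Orders ORD-ABC12345 and ORD-XYZ67890 shipped';

// Without the g flag, match() stops at the first hit.
const firstOnly = shipmentText.match(/\bORD-[A-Z0-9]{8}\b/);

// With the g flag, match() returns every hit.
const everyHit = shipmentText.match(/\bORD-[A-Z0-9]{8}\b/g);

const firstOnlyIds: string[] = firstOnly ? [firstOnly[0]] : [];
const everyHitIds: string[] = everyHit ? Array.from(everyHit) : [];

console.log(firstOnlyIds.length, everyHitIds.length);
```

The second order number is only found when the pattern is global, which is why every pattern passed to createCustomIdRecognizer should carry the flag.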

Available PII Types for Custom Patterns

| Type | Use Case |
| --- | --- |
| PIIType.CASE_ID | Order numbers, ticket IDs, reference numbers |
| PIIType.CUSTOMER_ID | Customer IDs, employee IDs, account numbers |
| PIIType.ACCOUNT_NUMBER | Financial account numbers |
| PIIType.TAX_ID | Tax identification numbers |
| PIIType.NATIONAL_ID | National ID numbers, SSN, etc. |

Extending RegexRecognizer

For more control, extend the RegexRecognizer class:
import { createAnonymizer, RegexRecognizer, PIIType, SpanMatch } from 'rehydra';

class MyCustomRecognizer extends RegexRecognizer {
  constructor() {
    super(
      'MyCustomRecognizer',     // Name
      PIIType.CASE_ID,          // PII type
      /\bMY-[A-Z0-9]+\b/g,      // Pattern
      1.0                       // Confidence (0.0-1.0)
    );
  }

  // Override to add validation
  protected validate(match: string): boolean {
    // Add custom validation logic
    return match.length >= 6 && match.length <= 20;
  }

  // Override to modify confidence based on context
  protected getConfidence(match: string): number {
    // Higher confidence for longer matches
    return match.length > 10 ? 1.0 : 0.9;
  }
}

const anonymizer = createAnonymizer();
anonymizer.getRegistry().register(new MyCustomRecognizer());
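To see how validation and confidence interact, the two overrides can be exercised outside the library. The sketch below is a standalone approximation, not Rehydra's internals: `Candidate`, `detectCandidates`, `isValidMatch`, and `confidenceFor` are hypothetical names that mirror the logic of `validate()` and `getConfidence()` from the class above.

```typescript
// Hypothetical result shape, used only for this illustration.
interface Candidate {
  text: string;
  confidence: number;
}

const myIdPattern = /\bMY-[A-Z0-9]+\b/g;

// Same rule as validate() in MyCustomRecognizer.
const isValidMatch = (match: string): boolean =>
  match.length >= 6 && match.length <= 20;

// Same rule as getConfidence() in MyCustomRecognizer.
const confidenceFor = (match: string): number =>
  match.length > 10 ? 1.0 : 0.9;

// Find all matches, drop the ones that fail validation, score the rest.
function detectCandidates(input: string): Candidate[] {
  const matches: string[] = input.match(myIdPattern) ?? [];
  return matches
    .filter(isValidMatch)
    .map((text) => ({ text, confidence: confidenceFor(text) }));
}

const candidates = detectCandidates('Refs MY-1, MY-ABC123 and MY-ABCDEFGH99');
console.log(candidates);
```

Here `MY-1` is rejected by the length check, `MY-ABC123` is kept with confidence 0.9, and `MY-ABCDEFGH99` is kept with confidence 1.0.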

Built-in Recognizers

Rehydra includes these recognizers by default:
| Recognizer | PII Type | Pattern Examples |
| --- | --- | --- |
| emailRecognizer | EMAIL | [email protected] |
| phoneRecognizer | PHONE | +49 30 123456 |
| ibanRecognizer | IBAN | DE89370400440532013000 |
| bicSwiftRecognizer | BIC_SWIFT | COBADEFFXXX |
| creditCardRecognizer | CREDIT_CARD | 4111111111111111 |
| ipAddressRecognizer | IP_ADDRESS | 192.168.1.1, ::1 |
| urlRecognizer | URL | https://example.com |

Recognizer Registry

Manage recognizers through the registry:
const registry = anonymizer.getRegistry();

// Register a recognizer
registry.register(myRecognizer);

// List all recognizers
const recognizers = registry.getAll();

// Get specific recognizer by name
const email = registry.get('EmailRecognizer');

// Remove a recognizer
registry.unregister('MyCustomRecognizer');

Create a New Registry

For complete control, create a custom registry:
import { createRegistry, emailRecognizer, phoneRecognizer } from 'rehydra';

// Start with empty registry
const registry = createRegistry();

// Add only the recognizers you need
registry.register(emailRecognizer);
registry.register(phoneRecognizer);
registry.register(myCustomRecognizer);

// Use custom registry
const anonymizer = createAnonymizer({ registry });

Disabling Built-in Recognizers

To disable specific built-in recognizers:
import { createAnonymizer, PIIType } from 'rehydra';

const anonymizer = createAnonymizer({
  defaultPolicy: {
    // Only enable specific regex types
    regexEnabledTypes: new Set([
      PIIType.EMAIL,
      PIIType.PHONE,
      // Exclude IBAN, CREDIT_CARD, etc.
    ]),
  }
});

Pattern Best Practices

Prevent matching substrings:
// ❌ Bad: also matches the "ORD-12345" inside "ABCORD-12345XYZ"
pattern: /ORD-[0-9]+/g

// ✅ Good: only matches complete pattern
pattern: /\bORD-[0-9]+\b/g
Avoid overly broad patterns:
// ❌ Too broad: matches any numbers
pattern: /[0-9]+/g

// ✅ Specific: matches expected format
pattern: /\bORD-[0-9]{6,10}\b/g
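The difference between a loose and a bounded pattern is easy to check with plain regex calls before registering anything. This is a small sketch with made-up sample strings; `countMatches` is a hypothetical helper, not a Rehydra function.

```typescript
const loosePattern = /ORD-[0-9]+/g;
const boundedPattern = /\bORD-[0-9]{6,10}\b/g;

const sampleInputs = [
  'ORD-123456',        // well-formed order number
  'ABCORD-123456XYZ',  // order-like text embedded in a longer token
  'ORD-12',            // too few digits
  'Invoice 20240101',  // unrelated number
];

// Count how many times a pattern matches in a string.
const countMatches = (pattern: RegExp, text: string): number =>
  (text.match(pattern) ?? []).length;

const looseHits = sampleInputs.map((s) => countMatches(loosePattern, s));
const boundedHits = sampleInputs.map((s) => countMatches(boundedPattern, s));

console.log(looseHits.join(','), boundedHits.join(','));
```

The loose pattern fires on the first three inputs, while the bounded pattern accepts only the well-formed one.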
Validate matches for accuracy:
class ValidatedRecognizer extends RegexRecognizer {
  protected validate(match: string): boolean {
    // Check format
    if (!match.startsWith('ORD-')) return false;
    
    // Check length
    if (match.length < 10) return false;
    
    // Check checksum if applicable
    // (validateChecksum is assumed to be a method you add to this class)
    return this.validateChecksum(match);
  }
}
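What a checksum helper looks like depends entirely on your ID scheme. As a sketch, assume a hypothetical format in which the last digit of the order number equals the sum of the preceding digits modulo 10. Both this scheme and the standalone `validateChecksum` function below are invented for illustration.

```typescript
// Hypothetical scheme: "ORD-" + digits, where the final digit is a check
// digit equal to (sum of the other digits) mod 10.
function validateChecksum(match: string): boolean {
  const digits = match.slice('ORD-'.length);

  // Require at least one payload digit plus the check digit.
  if (!/^[0-9]{2,}$/.test(digits)) return false;

  const payload = digits.slice(0, -1);
  const expected = Number(digits.charAt(digits.length - 1));

  let sum = 0;
  for (let i = 0; i < payload.length; i++) {
    sum += Number(payload.charAt(i));
  }
  return sum % 10 === expected;
}

console.log(validateChecksum('ORD-1234561')); // digits 1..6 sum to 21, check digit 1
console.log(validateChecksum('ORD-1234560')); // wrong check digit
```

Inside a recognizer, the same logic would live in a method on the class so that `validate()` can call it.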
Complex patterns can slow detection:
// ❌ Slow: complex lookahead/lookbehind
pattern: /(?<=ID:)\s*[A-Z]+(?=\s)/g

// ✅ Fast: simple pattern + post-processing
pattern: /ID:\s*[A-Z]+/g
// Then strip "ID:" in validate()
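One way to do the post-processing on the regex side is a capture group, which isolates the ID without any lookaround. The sketch below uses only standard regex calls; `extractIds` is a hypothetical helper, and how Rehydra exposes the matched text for trimming is not covered here. The simple pattern also does not require trailing whitespace, so it is not a strict equivalent of the lookaround version.

```typescript
// The capture group isolates the ID; the "ID:" prefix is matched but not captured.
const prefixedIdPattern = /ID:\s*([A-Z]+)/g;

function extractIds(text: string): string[] {
  const ids: string[] = [];
  prefixedIdPattern.lastIndex = 0; // global regexes keep state between exec() calls
  let m: RegExpExecArray | null;
  while ((m = prefixedIdPattern.exec(text)) !== null) {
    const id = m[1];
    if (id !== undefined) ids.push(id);
  }
  return ids;
}

const extracted = extractIds('ID: ABC and ID:XYZ done');
console.log(extracted);
```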

Testing Recognizers

import { describe, it, expect } from 'vitest';
import { createCustomIdRecognizer, PIIType, createAnonymizer } from 'rehydra';

describe('Order Recognizer', () => {
  const recognizer = createCustomIdRecognizer([
    { name: 'Order', pattern: /\bORD-[A-Z0-9]{8}\b/g, type: PIIType.CASE_ID },
  ]);

  it('detects valid order numbers', async () => {
    const anonymizer = createAnonymizer();
    anonymizer.getRegistry().register(recognizer);
    await anonymizer.initialize();

    const result = await anonymizer.anonymize('Order: ORD-ABC12345');
    expect(result.entities).toHaveLength(1);
    expect(result.entities[0].type).toBe('CASE_ID');
  });

  it('ignores invalid patterns', async () => {
    const anonymizer = createAnonymizer();
    anonymizer.getRegistry().register(recognizer);
    await anonymizer.initialize();

    const result = await anonymizer.anonymize('Order: ORD-123');  // Too short
    expect(result.entities).toHaveLength(0);
  });
});
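When testing a pattern on its own, keep in mind that a global regex is stateful: `test()` and `exec()` advance `lastIndex`, so reusing one pattern object across assertions can give surprising results. The sketch below shows the standard JavaScript behavior and a stateless helper; `matchesOf` is a hypothetical name, and whether Rehydra resets `lastIndex` internally is not stated here.

```typescript
const sharedOrderPattern = /\bORD-[A-Z0-9]{8}\b/g;
const orderSample = 'Order: ORD-ABC12345';

// The first call finds the match and leaves lastIndex at the end of it.
const firstCall = sharedOrderPattern.test(orderSample);
// The second call resumes from lastIndex, finds nothing, and resets it to 0.
const secondCall = sharedOrderPattern.test(orderSample);

// Stateless alternative: match against a fresh copy of the pattern.
function matchesOf(pattern: RegExp, text: string): string[] {
  const fresh = new RegExp(pattern.source, pattern.flags);
  return text.match(fresh) ?? [];
}

const validMatches = matchesOf(sharedOrderPattern, orderSample);
const shortMatches = matchesOf(sharedOrderPattern, 'Order: ORD-123');

console.log(firstCall, secondCall, validMatches, shortMatches);
```

Using a helper like this in unit tests keeps each assertion independent of the ones that ran before it.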
