Extend Rehydra with custom recognizers for domain-specific identifiers like order numbers, employee IDs, or proprietary formats.
Quick Start
import { createAnonymizer, createCustomIdRecognizer, PIIType } from 'rehydra';

// Create a custom recognizer
const orderRecognizer = createCustomIdRecognizer([
  {
    name: 'Order Number',
    pattern: /\bORD-[A-Z0-9]{8}\b/g,
    type: PIIType.CASE_ID,
  },
]);

// Register it
const anonymizer = createAnonymizer();
anonymizer.getRegistry().register(orderRecognizer);
await anonymizer.initialize();

// Use it
const result = await anonymizer.anonymize('Your order ORD-ABC12345 is shipped');
// "Your order <PII type="CASE_ID" id="1"/> is shipped"
Custom ID Recognizer
For simple pattern-based detection, use createCustomIdRecognizer:
import { createCustomIdRecognizer, PIIType } from 'rehydra';

const recognizer = createCustomIdRecognizer([
  {
    name: 'Order Number', // Descriptive name
    pattern: /\bORD-[A-Z0-9]{8}\b/g, // Regex pattern (must have 'g' flag)
    type: PIIType.CASE_ID, // PII type to assign
  },
  {
    name: 'Employee ID',
    pattern: /\bEMP-[0-9]{6}\b/g,
    type: PIIType.CUSTOMER_ID,
  },
  {
    name: 'Support Ticket',
    pattern: /\bTICKET-[0-9]+\b/gi, // Case-insensitive
    type: PIIType.CASE_ID,
  },
]);
Always include the g (global) flag in your regex patterns. Without it, only the first match will be found.
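The effect of the g flag can be seen with plain JavaScript regex calls, independent of Rehydra. The sketch below compares String.prototype.match with and without the flag. The ensureGlobal helper is hypothetical (it is not part of the Rehydra API) and only illustrates one way to guard against a forgotten flag before passing a pattern on.

```typescript
// Plain JavaScript regex behavior; nothing here calls Rehydra.
const sampleText = 'Orders ORD-ABC12345 and ORD-XYZ67890 shipped';

// Without the g flag, match() stops at the first occurrence.
const firstOnly = sampleText.match(/\bORD-[A-Z0-9]{8}\b/);

// With the g flag, match() returns every occurrence.
const allMatches = sampleText.match(/\bORD-[A-Z0-9]{8}\b/g);

// Hypothetical helper: returns the pattern unchanged if it is already global,
// otherwise a copy with the g flag appended.
function ensureGlobal(pattern: RegExp): RegExp {
  return pattern.global
    ? pattern
    : new RegExp(pattern.source, pattern.flags + 'g');
}

// A pattern written with only the i flag still finds both IDs once repaired.
const repaired = sampleText.match(ensureGlobal(/\bORD-[A-Z0-9]{8}\b/i));

console.log(firstOnly ? firstOnly[0] : null); // ORD-ABC12345
console.log(allMatches); // [ 'ORD-ABC12345', 'ORD-XYZ67890' ]
console.log(repaired ? repaired.length : 0); // 2
```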
Available PII Types for Custom Patterns
| Type | Use Case |
| --- | --- |
| PIIType.CASE_ID | Order numbers, ticket IDs, reference numbers |
| PIIType.CUSTOMER_ID | Customer IDs, employee IDs, account numbers |
| PIIType.ACCOUNT_NUMBER | Financial account numbers |
| PIIType.TAX_ID | Tax identification numbers |
| PIIType.NATIONAL_ID | National ID numbers, SSN, etc. |
Extending RegexRecognizer
For more control, extend the RegexRecognizer class:
import { createAnonymizer, RegexRecognizer, PIIType, SpanMatch } from 'rehydra';

class MyCustomRecognizer extends RegexRecognizer {
  constructor() {
    super(
      'MyCustomRecognizer', // Name
      PIIType.CASE_ID, // PII type
      /\bMY-[A-Z0-9]+\b/g, // Pattern
      1.0 // Confidence (0.0-1.0)
    );
  }

  // Override to add validation
  protected validate(match: string): boolean {
    // Add custom validation logic
    return match.length >= 6 && match.length <= 20;
  }

  // Override to modify confidence based on context
  protected getConfidence(match: string): number {
    // Higher confidence for longer matches
    return match.length > 10 ? 1.0 : 0.9;
  }
}

const anonymizer = createAnonymizer();
anonymizer.getRegistry().register(new MyCustomRecognizer());
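As a rough mental model of how a pattern, a validate hook, and a getConfidence hook can fit together, here is a stand-alone sketch. It is not Rehydra's implementation: the MatchSpan shape and the scanWithValidation function are invented for illustration, and the library's real internals may differ. The two callbacks reuse the logic from the class above.

```typescript
// Illustrative model only; none of these names come from Rehydra.
interface MatchSpan {
  start: number;
  end: number;
  text: string;
  confidence: number;
}

function scanWithValidation(
  input: string,
  pattern: RegExp,
  validate: (match: string) => boolean,
  getConfidence: (match: string) => number
): MatchSpan[] {
  // Work on a global copy so the caller's regex state (lastIndex) is untouched.
  const flags = pattern.global ? pattern.flags : pattern.flags + 'g';
  const re = new RegExp(pattern.source, flags);
  const found: MatchSpan[] = [];
  let m: RegExpExecArray | null;
  while ((m = re.exec(input)) !== null) {
    if (m[0].length === 0) {
      re.lastIndex++; // avoid looping forever on empty matches
      continue;
    }
    if (!validate(m[0])) continue; // drop candidates that fail validation
    found.push({
      start: m.index,
      end: m.index + m[0].length,
      text: m[0],
      confidence: getConfidence(m[0]),
    });
  }
  return found;
}

const foundSpans = scanWithValidation(
  'Refs MY-1 MY-ABC123 MY-ABCDEFGH99',
  /\bMY-[A-Z0-9]+\b/g,
  (s) => s.length >= 6 && s.length <= 20,
  (s) => (s.length > 10 ? 1.0 : 0.9)
);

// MY-1 is rejected by the length check; the other two survive.
console.log(foundSpans.map((s) => s.text + ' @' + s.start + ' conf=' + s.confidence));
```

In this model, validation removes false positives before any scoring happens, so getConfidence only ever sees matches that already passed the format checks.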
Built-in Recognizers
Rehydra includes these recognizers by default:
| Recognizer | PII Type | Pattern Examples |
| --- | --- | --- |
| emailRecognizer | EMAIL | [email protected] |
| phoneRecognizer | PHONE | +49 30 123456 |
| ibanRecognizer | IBAN | DE89370400440532013000 |
| bicSwiftRecognizer | BIC_SWIFT | COBADEFFXXX |
| creditCardRecognizer | CREDIT_CARD | 4111111111111111 |
| ipAddressRecognizer | IP_ADDRESS | 192.168.1.1, ::1 |
| urlRecognizer | URL | https://example.com |
Recognizer Registry
Manage recognizers through the registry:
const registry = anonymizer.getRegistry();

// Register a recognizer
registry.register(myRecognizer);

// List all recognizers
const recognizers = registry.getAll();

// Get specific recognizer by name
const email = registry.get('EmailRecognizer');

// Remove a recognizer
registry.unregister('MyCustomRecognizer');
Create a New Registry
For complete control, create a custom registry:
import { createAnonymizer, createRegistry, emailRecognizer, phoneRecognizer } from 'rehydra';

// Start with empty registry
const registry = createRegistry();

// Add only the recognizers you need
registry.register(emailRecognizer);
registry.register(phoneRecognizer);
registry.register(myCustomRecognizer);

// Use custom registry
const anonymizer = createAnonymizer({ registry });
Disabling Built-in Recognizers
To disable specific built-in recognizers:
import { createAnonymizer, PIIType } from 'rehydra';

const anonymizer = createAnonymizer({
  defaultPolicy: {
    // Only enable specific regex types
    regexEnabledTypes: new Set([
      PIIType.EMAIL,
      PIIType.PHONE,
      // Exclude IBAN, CREDIT_CARD, etc.
    ]),
  },
});
Pattern Best Practices
Use word boundaries to prevent matching substrings:

// ❌ Bad: matches inside "ABCORD-12345XYZ"
pattern: /ORD-[0-9]+/g
// ✅ Good: only matches the complete pattern
pattern: /\bORD-[0-9]+\b/g

Avoid overly broad patterns:

// ❌ Too broad: matches any run of digits
pattern: /[0-9]+/g
// ✅ Specific: matches the expected format
pattern: /\bORD-[0-9]{6,10}\b/g

Validate matches for accuracy:

class ValidatedRecognizer extends RegexRecognizer {
  protected validate(match: string): boolean {
    // Check format
    if (!match.startsWith('ORD-')) return false;
    // Check length
    if (match.length < 10) return false;
    // Check checksum if applicable
    // (validateChecksum is a placeholder for your own checksum method)
    return this.validateChecksum(match);
  }
}
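The checksum step in the validate override is left open, so here is one possible shape for it. This sketch assumes, purely for illustration, that the digits after the ORD- prefix carry a Luhn check digit. That is an assumption about your ID format, not something Rehydra prescribes, and luhnValid and validateOrderId are hypothetical stand-alone functions.

```typescript
// Standard Luhn check over a string of decimal digits.
function luhnValid(digits: string): boolean {
  if (!/^[0-9]+$/.test(digits)) return false;
  let sum = 0;
  let doubleNext = false; // the rightmost digit is not doubled
  for (let i = digits.length - 1; i >= 0; i--) {
    let d = digits.charCodeAt(i) - 48; // '0' is char code 48
    if (doubleNext) {
      d *= 2;
      if (d > 9) d -= 9;
    }
    sum += d;
    doubleNext = !doubleNext;
  }
  return sum % 10 === 0;
}

// Hypothetical validator: same format and length checks as above,
// followed by the assumed Luhn check on the part after "ORD-".
function validateOrderId(match: string): boolean {
  if (!match.startsWith('ORD-')) return false;
  if (match.length < 10) return false;
  return luhnValid(match.slice(4));
}

console.log(validateOrderId('ORD-79927398713')); // true
console.log(validateOrderId('ORD-79927398710')); // false (wrong check digit)
console.log(validateOrderId('ORD-123')); // false (too short)
```

Inside a recognizer subclass, the body of validateOrderId could become the validate override, with luhnValid as a private helper.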
Testing Recognizers
import { describe, it, expect } from 'vitest';
import { createCustomIdRecognizer, PIIType, createAnonymizer } from 'rehydra';

describe('Order Recognizer', () => {
  const recognizer = createCustomIdRecognizer([
    { name: 'Order', pattern: /\bORD-[A-Z0-9]{8}\b/g, type: PIIType.CASE_ID },
  ]);

  it('detects valid order numbers', async () => {
    const anonymizer = createAnonymizer();
    anonymizer.getRegistry().register(recognizer);
    await anonymizer.initialize();

    const result = await anonymizer.anonymize('Order: ORD-ABC12345');
    expect(result.entities).toHaveLength(1);
    expect(result.entities[0].type).toBe('CASE_ID');
  });

  it('ignores invalid patterns', async () => {
    const anonymizer = createAnonymizer();
    anonymizer.getRegistry().register(recognizer);
    await anonymizer.initialize();

    const result = await anonymizer.anonymize('Order: ORD-123'); // Too short
    expect(result.entities).toHaveLength(0);
  });
});
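Before wiring a pattern into the anonymizer, it can help to check the regex on its own. The sketch below uses only built-in RegExp behavior; findAllMatches is a hypothetical helper, not a Rehydra function. It also shows a pitfall to watch for in tests: a regex with the g flag remembers its position between test() calls, so reusing one instance can produce a surprising false.

```typescript
const ORDER_PATTERN = /\bORD-[A-Z0-9]{8}\b/g;

// Hypothetical helper: returns every match as a plain string array.
// String.prototype.match with a global regex starts from index 0 each time,
// so it is safe to call repeatedly with the same pattern.
function findAllMatches(pattern: RegExp, input: string): string[] {
  if (!pattern.global) throw new Error('pattern must have the g flag');
  const found = input.match(pattern);
  return found === null ? [] : Array.from(found);
}

console.log(findAllMatches(ORDER_PATTERN, 'Order: ORD-ABC12345')); // [ 'ORD-ABC12345' ]
console.log(findAllMatches(ORDER_PATTERN, 'Order: ORD-123')); // [] (too short)
console.log(findAllMatches(ORDER_PATTERN, 'XORD-ABC12345')); // [] (no word boundary)
console.log(findAllMatches(ORDER_PATTERN, 'ord-abc12345')); // [] (case-sensitive)

// Pitfall: test() on a global regex advances lastIndex after a match.
const sharedPattern = /\bORD-[A-Z0-9]{8}\b/g;
const firstCall = sharedPattern.test('ORD-ABC12345'); // true, lastIndex is now 12
const secondCall = sharedPattern.test('ORD-ABC12345'); // false, search began at 12
console.log(firstCall, secondCall); // true false
```

If a test needs test() on a global pattern, resetting lastIndex to 0 or creating a fresh RegExp per assertion avoids the stale state.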
Next Steps