The LLM proxy middleware intercepts requests to LLM APIs, anonymizes PII in outgoing messages, and rehydrates PII in responses — all transparently.
The proxy module is only available in Node.js and Bun. It is not included in the browser build.

Supported Providers

Provider | Detection | Streaming
OpenAI (and compatible APIs) | URL, Bearer sk- header | SSE
Anthropic | URL, x-api-key / anthropic-version headers | SSE
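As a mental model, the detection heuristics in the table above can be sketched like this; the helper name and exact matching rules are illustrative assumptions, not the library's actual code:

```typescript
// Sketch only: classify a request by URL and characteristic headers.
function detectProvider(url: string, headers: Headers): 'openai' | 'anthropic' | null {
  // Anthropic: URL match, or its distinctive headers
  if (url.includes('api.anthropic.com') || headers.has('x-api-key') || headers.has('anthropic-version')) {
    return 'anthropic';
  }
  // OpenAI (and compatible): URL match, or a Bearer token with the sk- prefix
  const auth = headers.get('authorization') ?? '';
  if (url.includes('api.openai.com') || auth.startsWith('Bearer sk-')) {
    return 'openai';
  }
  return null; // unknown: pass the request through untouched
}
```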

Integration Methods

From simplest to most flexible:
Method | Use Case
wrapLLMClient() | Wrap an OpenAI or Anthropic SDK client
createRehydraFetch() | Drop-in fetch replacement
createRehydraProxy() | HTTP middleware for any framework
createRehydraProxyServer() | Standalone proxy server

Wrap an SDK Client

The simplest approach — wrap your existing OpenAI or Anthropic client:
import OpenAI from 'openai';
import { wrapLLMClient } from 'rehydra/proxy';
import { InMemoryKeyProvider, SQLitePIIStorageProvider } from 'rehydra';

const storage = new SQLitePIIStorageProvider('./pii.db');
await storage.initialize();

const client = new OpenAI();

const wrappedClient = wrapLLMClient(client, {
  anonymizer: { ner: { mode: 'quantized' } },
  keyProvider: new InMemoryKeyProvider(),
  piiStorageProvider: storage,
  getSessionId: (req) => 'session-123',
});

// Use exactly as before — PII is anonymized/rehydrated automatically
const response = await wrappedClient.chat.completions.create({
  model: 'gpt-4',
  messages: [{ role: 'user', content: 'My name is John Smith and my email is john@acme.com' }],
});

console.log(response.choices[0].message.content);
// Response contains original PII values (rehydrated)
Works the same way with Anthropic:
import Anthropic from '@anthropic-ai/sdk';
import { wrapLLMClient } from 'rehydra/proxy';
import { InMemoryKeyProvider } from 'rehydra';

const client = new Anthropic();
const wrappedClient = wrapLLMClient(client, {
  anonymizer: { ner: { mode: 'quantized' } },
  keyProvider: new InMemoryKeyProvider(),
  piiStorageProvider: storage,
  getSessionId: (req) => 'session-123',
});

Custom Fetch

Replace the fetch function used by any SDK:
import OpenAI from 'openai';
import { createRehydraFetch } from 'rehydra/proxy';
import { InMemoryKeyProvider, SQLitePIIStorageProvider } from 'rehydra';

const storage = new SQLitePIIStorageProvider('./pii.db');
await storage.initialize();

const rehydraFetch = createRehydraFetch({
  anonymizer: { ner: { mode: 'quantized' } },
  keyProvider: new InMemoryKeyProvider(),
  piiStorageProvider: storage,
  getSessionId: (req) => 'session-123',
});

// Use with OpenAI SDK
const client = new OpenAI({ fetch: rehydraFetch });

// Or use directly
const response = await rehydraFetch('https://api.openai.com/v1/chat/completions', {
  method: 'POST',
  headers: { 'Authorization': `Bearer ${apiKey}`, 'Content-Type': 'application/json' },
  body: JSON.stringify({
    model: 'gpt-4',
    messages: [{ role: 'user', content: 'Contact john@example.com' }],
  }),
});

Proxy Middleware

For framework integration (Hono, Bun.serve, etc.):
import { createRehydraProxy } from 'rehydra/proxy';
import { InMemoryKeyProvider, SQLitePIIStorageProvider } from 'rehydra';

const storage = new SQLitePIIStorageProvider('./pii.db');
await storage.initialize();

const proxy = createRehydraProxy({
  upstream: 'https://api.openai.com',
  anonymizer: { ner: { mode: 'quantized' } },
  keyProvider: new InMemoryKeyProvider(),
  piiStorageProvider: storage,
  getSessionId: (req) => 'session-123',
});

// Use with Bun.serve
Bun.serve({
  port: 8080,
  fetch: proxy,
});

// Or with Hono
import { Hono } from 'hono';
const app = new Hono();
app.all('/v1/*', (c) => proxy(c.req.raw));
Configure your LLM client to point at the proxy:
const client = new OpenAI({
  baseURL: 'http://localhost:8080/v1',
});

Standalone Proxy Server

Start a proxy server with one function call:
import { createRehydraProxyServer } from 'rehydra/proxy';
import { InMemoryKeyProvider, SQLitePIIStorageProvider } from 'rehydra';

const storage = new SQLitePIIStorageProvider('./pii.db');
await storage.initialize();

const server = await createRehydraProxyServer({
  upstream: 'https://api.openai.com',
  port: 8080,
  anonymizer: { ner: { mode: 'quantized' } },
  keyProvider: new InMemoryKeyProvider(),
  piiStorageProvider: storage,
  getSessionId: (req) => 'session-123',
});

console.log(`Proxy running at http://${server.host}:${server.port}`);

// Point any client at the proxy
// const client = new OpenAI({ baseURL: 'http://localhost:8080/v1' });

// Later: shut down
await server.close();

Streaming Support

All proxy methods support streaming (SSE) responses. PII is rehydrated in each streamed chunk:
const wrappedClient = wrapLLMClient(client, {
  anonymizer: { ner: { mode: 'quantized' } },
  keyProvider: new InMemoryKeyProvider(),
  piiStorageProvider: storage,
  getSessionId: (req) => 'session-123',
});

// Streaming works transparently
const stream = await wrappedClient.chat.completions.create({
  model: 'gpt-4',
  messages: [{ role: 'user', content: 'Tell me about John Smith' }],
  stream: true,
});

for await (const chunk of stream) {
  process.stdout.write(chunk.choices[0]?.delta?.content || '');
  // PII values are rehydrated in each chunk
}

Streaming Internals

PII placeholders like <PII type="EMAIL" id="1"/> can span SSE chunk boundaries. The proxy buffers incomplete tags across chunks and only rehydrates once a complete tag is available, so streaming rehydration is reliable regardless of how the upstream chunks its response. You can disable streaming rehydration if needed:
const wrappedClient = wrapLLMClient(client, {
  keyProvider: new InMemoryKeyProvider(),
  piiStorageProvider: storage,
  handleStreaming: false,  // SSE responses pass through without rehydration
});
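The buffering behavior described above can be pictured with a small sketch; the function name and the placeholder-matching regex are assumptions for illustration, not Rehydra's internals:

```typescript
// Sketch only: hold back a trailing partial tag until the next chunk
// completes it, and rehydrate every complete <PII .../> tag in between.
function makeChunkRehydrator(lookup: (type: string, id: string) => string) {
  let pending = '';
  return (chunk: string): string => {
    let text = pending + chunk;
    pending = '';
    // Rehydrate all complete placeholder tags
    text = text.replace(/<PII type="([^"]+)" id="([^"]+)"\/>/g, (_m, type, id) => lookup(type, id));
    // If the text ends in what could be an unfinished tag, buffer it
    const partial = text.match(/<[^>]*$/);
    if (partial) {
      pending = partial[0];
      return text.slice(0, -partial[0].length);
    }
    return text;
  };
}
```

A real implementation would also flush any remaining buffered text when the stream ends.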

Passthrough Behavior

The proxy only intercepts POST requests with application/json content type. Everything else is forwarded unchanged:
  • Non-POST requests (GET, OPTIONS, etc.) — passed through to upstream
  • Non-JSON content types — passed through to upstream
  • Non-JSON responses — passed through with original status code
This means health checks, CORS preflight, and other non-chat requests work without interference.
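The rule above amounts to a simple check. A hypothetical sketch of the decision, not the proxy's actual code:

```typescript
// Sketch of the passthrough decision: only POST + application/json is intercepted.
function shouldIntercept(req: Request): boolean {
  if (req.method !== 'POST') return false; // GET, OPTIONS, etc. pass through
  const contentType = req.headers.get('content-type') ?? '';
  return contentType.includes('application/json'); // non-JSON bodies pass through
}
```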

Tool Call Handling

The proxy automatically anonymizes and rehydrates tool/function call arguments. If the LLM returns a tool call whose arguments contain PII placeholders, Rehydra rehydrates them before your code sees the result. This works for both non-streaming and streaming responses:
const wrappedClient = wrapLLMClient(client, {
  anonymizer: { ner: { mode: 'quantized' } },
  keyProvider: new InMemoryKeyProvider(),
  piiStorageProvider: storage,
  getSessionId: () => 'session-123',
});

const response = await wrappedClient.chat.completions.create({
  model: 'gpt-4',
  messages: [{ role: 'user', content: 'Send an email to John Smith at john@acme.com' }],
  tools: [{
    type: 'function',
    function: {
      name: 'send_email',
      parameters: {
        type: 'object',
        properties: {
          to: { type: 'string' },
          name: { type: 'string' },
        },
      },
    },
  }],
});

// Tool call arguments are rehydrated — original PII values restored
const args = JSON.parse(response.choices[0].message.tool_calls[0].function.arguments);
// { to: "john@acme.com", name: "John Smith" }
In streaming mode, tool call argument chunks are buffered per tool call index and rehydrated when the tool call completes.
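The per-index buffering just described can be sketched as follows; the types and helper are illustrative assumptions, not the library's internals:

```typescript
// Sketch only: OpenAI streams tool-call arguments as fragments tagged with an
// index. Concatenate per index, then rehydrate once the tool call completes.
type ToolCallDelta = { index: number; function?: { name?: string; arguments?: string } };

function makeToolCallBuffer(rehydrate: (s: string) => string) {
  const buffers = new Map<number, string>();
  return {
    push(delta: ToolCallDelta) {
      const prev = buffers.get(delta.index) ?? '';
      buffers.set(delta.index, prev + (delta.function?.arguments ?? ''));
    },
    // Called when the stream signals that this tool call is finished
    finish(index: number): string {
      return rehydrate(buffers.get(index) ?? '');
    },
  };
}
```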

Automated Tool Execution Loop

For server-side agentic workflows, the proxy can manage the full multi-round tool execution loop automatically. Provide an onToolCall callback and the proxy will:
  1. Rehydrate tool call arguments (so your function receives real PII values)
  2. Call your callback with the tool name and parsed arguments
  3. Anonymize the tool result before sending it back to the LLM
  4. Repeat until the LLM responds with no tool calls, or maxToolRounds is reached
import OpenAI from 'openai';
import { createRehydraFetch } from 'rehydra/proxy';
import { InMemoryKeyProvider, SQLitePIIStorageProvider } from 'rehydra';

const storage = new SQLitePIIStorageProvider('./pii.db');
await storage.initialize();

const rehydraFetch = createRehydraFetch({
  anonymizer: { ner: { mode: 'quantized' } },
  keyProvider: new InMemoryKeyProvider(),
  piiStorageProvider: storage,
  getSessionId: () => 'session-1',

  // Tool execution callback — arguments are already rehydrated
  onToolCall: async (name, args, toolCallId) => {
    if (name === 'send_email') {
      return await sendEmail(args.to as string, args.body as string);
    }
    if (name === 'lookup_user') {
      return await db.users.find(args.email as string);
    }
    return { error: `Unknown tool: ${name}` };
  },

  maxToolRounds: 5,  // default: 10
});

const client = new OpenAI({ fetch: rehydraFetch });

// The proxy handles the full tool loop — this single call may
// result in multiple LLM round-trips behind the scenes
const response = await client.chat.completions.create({
  model: 'gpt-4',
  messages: [{ role: 'user', content: 'Send a welcome email to john@acme.com' }],
  tools: [/* your tool definitions */],
});

// Final response (after all tool rounds complete)
console.log(response.choices[0].message.content);
The automated tool loop only works with non-streaming requests. For streaming tool calls, handle the tool loop manually using the rehydrated arguments from each streamed response.

PII System Instruction

When PII is detected in an outgoing request, the proxy automatically injects a system instruction telling the LLM to preserve PII placeholder tags in its response. This prevents the model from inventing replacement values. The instruction is injected as an OpenAI system message or an Anthropic system field, depending on the provider. You can customize this behavior with the systemInstruction config option:
const rehydraFetch = createRehydraFetch({
  // ...

  // Custom instruction
  systemInstruction: 'Do not modify any <PII> XML tags in the conversation.',

  // Or disable entirely:
  // systemInstruction: false,

  // Or omit the option for the built-in default (recommended)
});
Value | Behavior
undefined (default) | Built-in instruction injected when PII is detected
string | Your custom instruction, injected when PII is detected
false | No instruction injected
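The injection step can be sketched roughly as follows; the helper name and the merge behavior for an existing Anthropic system prompt are assumptions, while the two provider shapes come from the description above:

```typescript
// Sketch only: OpenAI takes a system message, Anthropic a top-level `system` field.
type ChatBody = { messages: Array<{ role: string; content: string }>; system?: string };

function injectSystemInstruction(
  body: ChatBody,
  provider: 'openai' | 'anthropic',
  instruction: string | false,
): ChatBody {
  if (instruction === false) return body; // disabled: leave the request untouched
  if (provider === 'openai') {
    // OpenAI: prepend a system message
    return { ...body, messages: [{ role: 'system', content: instruction }, ...body.messages] };
  }
  // Anthropic: use the top-level system field, keeping any existing prompt
  return { ...body, system: body.system ? `${instruction}\n\n${body.system}` : instruction };
}
```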

Session Management

Use getSessionId to associate requests with sessions. This enables consistent entity IDs and cross-request rehydration:
const wrappedClient = wrapLLMClient(client, {
  anonymizer: { ner: { mode: 'quantized' } },
  keyProvider: new InMemoryKeyProvider(),
  piiStorageProvider: storage,
  getSessionId: (req) => {
    // Extract session from custom header, URL, or other context
    const url = new URL(req.url);
    return url.searchParams.get('session') || 'default';
  },
});

Error Handling

When the proxy encounters an error, it returns a JSON response with the following shape:
{
  "error": {
    "message": "description of what went wrong",
    "type": "rehydra_proxy_error"
  }
}
Status Code | Meaning
400 | Invalid JSON in the request body
500 | Internal proxy error during anonymization or rehydration
502 | Upstream LLM API is unreachable
503 | Proxy not ready (initialization failed)
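Client code can branch on that error shape to distinguish proxy failures from upstream ones. The wrapper below is hypothetical; only the error shape and status codes come from the tables above:

```typescript
// Hypothetical client-side helper that surfaces rehydra proxy errors distinctly.
async function callThroughProxy(fetchFn: typeof fetch, url: string, init?: RequestInit): Promise<Response> {
  const res = await fetchFn(url, init);
  if (!res.ok) {
    const body = await res.clone().json().catch(() => null);
    if (body?.error?.type === 'rehydra_proxy_error') {
      // 400/500/502/503 emitted by the proxy itself, per the table above
      throw new Error(`Proxy error (${res.status}): ${body.error.message}`);
    }
  }
  return res; // upstream errors and successes flow through unchanged
}
```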

Next Steps

Streaming

Stream-level anonymization for chunked text

Proxy API Reference

Complete proxy API documentation