LLM Proxy (Middleman)¶

Middleman is Hawk's built-in LLM proxy. It runs on ECS Fargate and routes model API calls to providers (OpenAI, Anthropic, Google Vertex, DeepSeek, Fireworks, and more) with automatic token refresh and access control.

How It Works¶

When evaluations run on the cluster, Inspect AI sends model API calls through Middleman instead of directly to providers. Middleman:

Authenticates the request using the runner's scoped credentials
Routes the request to the correct provider API
Handles token refresh and retries
Enforces model group permissions

Setting Up API Keys¶

Store provider API keys in AWS Secrets Manager:

scripts/dev/set-api-keys.sh <env> OPENAI_API_KEY=sk-...

This stores the key and restarts Middleman. Set multiple keys at once:

scripts/dev/set-api-keys.sh <env> OPENAI_API_KEY=sk-... ANTHROPIC_API_KEY=sk-ant-...

Supported Providers¶

OPENAI_API_KEY, ANTHROPIC_API_KEY, GEMINI_API_KEY, DEEPINFRA_TOKEN, DEEPSEEK_API_KEY, FIREWORKS_API_KEY, META_API_KEY, MISTRAL_API_KEY, OPENROUTER_API_KEY, TOGETHER_API_KEY, XAI_API_KEY.

Known issue: Middleman crashes on startup if no GCP project is set

Middleman's startup unconditionally initializes Vertex URLs, which requires GOOGLE_CLOUD_PROJECT_FOR_PUBLIC_MODELS (or a project_id in GOOGLE_APPLICATION_CREDENTIALS_JSON) — even when no Vertex/Gemini models are configured. Symptom: Middleman fails ALB health checks (Target.Timeout on port 3500) and hawk eval-set returns Middleman timeout.

Workaround until fixed: set the config to a real GCP project (if you use Gemini) or a sentinel like none otherwise, then redeploy:

pulumi config set hawk:middlemanGcpProjectForPublicModels none
pulumi up

High Availability¶

By default Middleman runs a single replica. Setting hawk:highAvailability: "true" runs multiple replicas across AZs, which requires the shared Valkey cache (hawk:valkeyEnabled: "true" or an external hawk:valkeyUrl) — replicas share their provider-key/model caches and serialize secret writes through Valkey; without it they serve divergent caches and race secret writes. Deploys fail fast if HA is enabled without Valkey. See the configuration reference.

Bypassing the Proxy¶

To use your own API keys instead of Middleman, pass them as secrets and disable the proxy's token refresh:

runner:
  environment:
    INSPECT_ACTION_RUNNER_REFRESH_URL: ""

Then pass your API key as a secret:

hawk eval-set config.yaml --secret OPENAI_API_KEY

Managing Models¶

Model configurations are stored in the database and organized into model groups for access control — a user must belong to a model's group to use it. A fresh deploy starts with an empty model registry, so hawk eval-set fails with Middleman error: Models not found until you add at least one model.

Models are managed through Middleman's admin API, wrapped by the hawk proxy models CLI commands.

Granting admin¶

The admin API is gated by is_admin=true. There are two ways to grant it:

Auth0 / Okta: emit the https://middleman.metr.org/claims/admin: true JWT claim. Manage admins in your identity provider.
Cognito (default open-source deploy): set hawk:middlemanAdminGroups in your Pulumi config to a non-empty list (e.g. ["middleman-admin"]) and add users to that Cognito group. The default is empty, so the group-based admin path is opt-in.

For Cognito:

# 1. One-time per stack: opt in to the group-based admin path
pulumi config set --path 'hawk:middlemanAdminGroups[0]' middleman-admin
pulumi up   # short rolling restart of Middleman so the env var lands

# 2. Create the Cognito group and add yourself to it
scripts/dev/manage-cognito-groups.sh <stack> create middleman-admin
scripts/dev/manage-cognito-groups.sh <stack> add-user middleman-admin you@example.com

# 3. Re-authenticate so the new group appears in your token
hawk login

Adding and managing models¶

hawk proxy models add claude-haiku-4-5 \
    --group model-access-public \
    --config '{"lab":"anthropic","danger_name":"claude-haiku-4-5","are_details_secret":false,"dead":false,"vision":true,"max_tokens_keyword":"max_tokens","request_timeout_minutes":30,"stream":false}'

hawk proxy models list                       # show configured models
hawk proxy models get claude-haiku-4-5       # show one model
hawk proxy models update claude-haiku-4-5 --config '{...}'
hawk proxy models deactivate <name>          # disable without deleting
hawk proxy models reload                     # force cache reload
hawk proxy secrets list                      # list configured provider keys

Valid lab values include anthropic, openai, gemini, vertex, deepseek, mistral, xai, and more — see middleman/src/middleman/models.py.

Model group naming

Model groups use the prefix model-access-<name> (e.g. model-access-public), and the user's JWT must carry a matching group. For Cognito users without explicit group claims, hawk:defaultPermissions provides the fallback (default "model-access-public"). The admin group itself must not use the model-access- prefix and must not overlap with defaultPermissions — Middleman refuses to start if either constraint is violated.

Admin gate covers provider keys too

The admin gate covers both model management (/admin/models/) and provider-key rotation (/admin/secrets/provider-keys). Anyone who can run cognito-idp:AdminAddUserToGroup on your user pool can grant admin transitively — treat that IAM permission as equivalent to Middleman admin in policy reviews.

Revoking admin: removing a user from the admin Cognito group does not immediately revoke their existing access token; it stays admin until its TTL expires (1 hour by default for Cognito access tokens). Refresh tokens (30 days default) mint fresh access tokens without the removed claim, so the practical revocation window is the access-token TTL. For immediate revocation, run aws cognito-idp admin-user-global-sign-out --user-pool-id <pool> --username <email> — this invalidates all of that user's tokens.

Deploying Changes¶

Middleman runs on ECS Fargate. Deployments are triggered by pushing to the main branch, which builds a new Docker image and updates the ECS service via CI/CD.

Running Locally¶

cd middleman
# Add API keys to .env (see example.env)
docker compose up --build

Testing the Passthrough API¶

uv run scripts/exercise_passthrough.py --help

This script tests the passthrough API against multiple providers (Anthropic, OpenAI, OpenRouter).