Skip to content

Security

This page covers Hawk's security architecture, access control, audit logging, and optional AWS security services.

Authentication

Hawk uses OIDC (OpenID Connect) for all authentication. JWTs are validated at every service boundary — the API server, Middleman (LLM proxy), the web viewer, and Lambda functions.

Default: Cognito

When no OIDC provider is configured, Hawk creates a Cognito user pool automatically. Users are managed via the AWS Console or the helper scripts.

Managing Users

scripts/dev/create-cognito-user.sh <stack> user@example.com

Managing Model Access Groups

Cognito automatically includes group memberships in the cognito:groups claim of access tokens. Create groups matching the model groups configured in Middleman:

# Create a model access group
scripts/dev/manage-cognito-groups.sh <stack> create model-access-openai

# Add a user to the group
scripts/dev/manage-cognito-groups.sh <stack> add-user model-access-openai user@example.com

# List all groups
scripts/dev/manage-cognito-groups.sh <stack> list

Users who aren't in any group fall back to hawk:defaultPermissions (default: model-access-public), which grants access to models in the public group.

External OIDC Provider (Okta, Auth0, etc.)

For production deployments, we recommend using your organization's identity provider. Use the autodiscovery script to generate Pulumi config from your issuer URL:

python scripts/dev/discover-oidc.py <your-issuer-url> <your-client-id> <your-audience>

This prints the full set of hawk:oidc* config values to add to your Pulumi.<stack>.yaml. See Pulumi.example.yaml for the complete list of OIDC settings.

OIDC App Requirements

Your OIDC application must:

  • Use PKCE (Proof Key for Code Exchange) — no client secret
  • Support the authorization_code grant type
  • Include these redirect URIs:
    • http://localhost:18922/callback — Hawk CLI login
    • https://viewer.<your-domain>/oauth/complete — web viewer login

Required JWT Claims

Hawk extracts permissions from the permissions claim (or scp as a fallback). The claim value can be either a JSON array of strings or a space-separated string:

{
  "sub": "user123",
  "iss": "https://login.example.com/oauth2/default",
  "aud": "your-audience",
  "permissions": ["model-access-openai", "model-access-anthropic"]
}

Or equivalently:

{
  "permissions": "model-access-openai model-access-anthropic"
}

The group names must match the groups assigned to models in Middleman (see Model Groups below).

No permissions claim?

If the JWT has no permissions or scp claim, Hawk falls back to hawk:defaultPermissions (default: model-access-public). This is how Cognito users get access without custom claims.

Setting Up Your Identity Provider

The exact steps vary by provider, but the general approach is:

  1. Create an OIDC application in your IdP (Okta, Auth0, Entra ID, etc.) with PKCE and the redirect URIs above.

  2. Create groups in your IdP for each model access level you need (e.g., model-access-openai, model-access-anthropic). These must match the group names you assign to models in Middleman.

  3. Add a custom claim named permissions to your access tokens that includes the user's group memberships. Most IdPs support this via:

    • Okta: Custom claim on an authorization server using a Groups expression
    • Auth0: Post-login Action that adds groups to the access token
    • Entra ID (Azure AD): App roles or group claims in the token configuration
    • Keycloak: Protocol mapper for group membership
  4. Configure Hawk with your IdP's client ID, audience, and issuer URL in your Pulumi stack config.

Access Control

Model Groups

Model access is controlled through model groups. Each model configured in Middleman belongs to a group (e.g., model-access-openai, model-access-anthropic). Users must have a matching group in their JWT permissions claim to:

  • Use the model for evaluations (via Middleman)
  • View evaluation results that used that model (via the web viewer and API)

This means evaluation results are automatically restricted to users who have access to the models used in the evaluation.

Models are assigned to groups by Middleman admins when configuring the model. For example, a model with group: "model-access-openai" requires the user to have model-access-openai in their JWT permissions claim. The model-access-public group is the default and grants access to models intended for all users.

How Group Membership Flows

flowchart LR
    IdP["Identity Provider<br/>(Okta, Auth0, Cognito)"]
    JWT["JWT with<br/>permissions claim"]
    API["Hawk API"]
    MM["Middleman"]
    TB["Token Broker<br/>(Lambda)"]
    S3["S3 (eval logs)"]
    DB["PostgreSQL<br/>(RLS)"]

    IdP -->|"issues"| JWT
    JWT -->|"validates"| API
    JWT -->|"validates"| MM
    JWT -->|"exchanges for<br/>scoped credentials"| TB
    TB -->|"scoped access"| S3
    API -->|"RLS enforces<br/>group membership"| DB
  1. Identity Provider issues JWTs with model group memberships in the permissions claim
  2. Middleman validates the JWT and checks the user's groups before routing model API calls
  3. Token Broker validates the user's model group permissions, then exchanges the JWT for scoped AWS credentials tied to a specific job via AWS session tags
  4. PostgreSQL RLS (Row-Level Security) restricts database queries to evaluation results the user is authorized to see

Administrative Roles

Hawk has one administrative role: Middleman Admin. Admins can:

  • Create, update, and delete model configurations
  • View all models regardless of group membership
  • Manage provider API keys

To grant admin access, add a boolean claim to your OIDC access tokens:

Claim Purpose
https://middleman.metr.org/claims/admin Full admin access (production + non-production)
https://middleman.metr.org/claims/dev-admin Admin access for non-production environments only

The dev-admin claim is only accepted when the Middleman environment variable MIDDLEMAN_ACCEPT_DEV_ADMIN=true is set (which Hawk configures automatically for non-production stacks).

Example JWT with admin access:

{
  "sub": "admin-user",
  "permissions": ["model-access-openai"],
  "https://middleman.metr.org/claims/admin": true
}

In your IdP, create a group (e.g., middleman-admins) and configure a custom claim that emits true when the user is a member of that group.

Sensitive Model Protection

Hawk protects sensitive model information through codenames:

  • Models are given a public_name (codename) that is used everywhere in the system
  • The real model identifier (danger_name) is only known to Middleman
  • Evaluation results, logs, and the web viewer only show the public_name
  • Observability data (Datadog, Sentry) is scrubbed to prevent danger_name leakage — API keys, auth headers, and model identifiers are filtered at multiple layers

Users without the appropriate model group cannot access evaluation results that used that model.

Sandbox Isolation

Evaluations run in isolated Kubernetes pods with:

  • Separate namespaces — each evaluation gets its own namespace for the runner and sandbox pods
  • Network policies — Cilium network policies block egress to VPC infrastructure (primary subnet CIDR, EC2 IMDS, EKS Pod Identity) from sandbox pods, and can restrict no-internet pods to DNS-only egress
  • Resource limits — CPU and memory constraints per pod
  • StatefulSets — sandbox pods auto-restart on failure

Alternative Sandbox Providers

While Kubernetes is the default sandbox environment, Hawk's architecture does not strictly require it. EC2-based sandboxing and other providers (e.g., Modal) can be used as alternatives. The sandbox provider is configured per evaluation.

Audit Logging

Application-Level Logging

  • Hawk API — all API requests are logged to CloudWatch with user identity, action, and resource context
  • Middleman — model API calls are logged with user identity, model (public name only), and token usage. Request/response bodies are not logged.
  • Token Broker — credential exchanges are logged with user identity and requested scope

AWS CloudTrail

CloudTrail is enabled by default in AWS accounts and logs all AWS API calls. CloudTrail Insights (anomaly detection for API call rates and error rates) can be enabled separately via the infra-shared repository.

VPC Flow Logs

VPC flow logs are enabled for all traffic and sent to CloudWatch Logs at /aws/vpc/flowlogs/<env>. Retention follows hawk:cloudwatchLogsRetentionDays (default: 14 days).

Endpoint Protection (CrowdStrike Falcon)

Hawk optionally deploys the CrowdStrike Falcon sensor to protect infrastructure hosts. Enable it with hawk:enableCrowdstrike: "true" and a Secrets Manager secret containing your CrowdStrike API credentials (see Configuration).

What's Protected

Target OS / Arch Installation Method
GPU nodes (Karpenter) AL2023 / x86_64 Sensor RPM installed via EC2NodeClass userData at boot
Tailscale subnet router AL2023 / ARM64 Sensor RPM installed via cloud-init at boot
Default nodes (Karpenter) Bottlerocket / x86_64 DaemonSet via falcon-sensor Helm chart (requires Falcon Images Download scope)

The sensor is downloaded from the CrowdStrike API at instance boot using the Sensor Download: Read API scope. The DaemonSet approach (for Bottlerocket nodes) pulls a container image from registry.crowdstrike.com, which requires the Falcon Images Download: Read scope — part of the Falcon Cloud Security with Containers add-on.

Tailscale ZTA Integration

With the Falcon sensor on the subnet router, you can enable CrowdStrike ZTA with Tailscale to gate tailnet access based on device posture. This requires a separate API client with Hosts: Read and Zero Trust: Read scopes, configured in the Tailscale admin console under Device Posture Integrations. Note that ZTA scores apply to the router instance itself, not to traffic routed through it.

AWS Security Services

AWS security services (GuardDuty, Security Hub, AWS Config, CloudTrail Insights) are managed by the infra-shared repository, not by Hawk. See infra-shared for configuration details.

For production deployments, consider:

  • Setting hawk:eksPublicEndpoint: "false" and using Tailscale for private cluster access
  • Setting hawk:albInternal: "true" to make the ALB private (requires VPN)
  • Setting hawk:protectResources: "true" to prevent accidental deletion of S3 buckets and secrets

Monitoring & Observability

CloudWatch

All services log to CloudWatch by default. Log retention is configurable via hawk:cloudwatchLogsRetentionDays (default: 14 days).

Key log groups:

  • Hawk API logs
  • Middleman logs
  • Lambda function logs
  • GuardDuty findings (when enabled): /aws/events/guardduty/<env>
  • Security Hub findings (when enabled): /aws/events/securityhub/<env>

Datadog (Optional)

For richer monitoring, Hawk supports Datadog integration:

hawk:enableDatadog: "true"

This enables:

  • APM — distributed tracing across API, Middleman, and Lambda functions
  • Log forwarding — CloudWatch logs forwarded to Datadog
  • Custom metrics — token usage, import counts, evaluation durations
  • Sensitive data filteringdanger_name, API keys, and auth headers are scrubbed from all telemetry

Budget Alerts

Monitor AWS spending with budget alerts:

hawk:budgetLimit: "10000"
hawk:budgetNotificationEmails:
  - "team@example.com"
hawk:budgetNotificationThresholds:
  - 80
  - 100

External Dependencies

Hawk interacts with the following external services:

Service Purpose Required?
Docker Hub Pull base container images Yes (login recommended to avoid rate limits)
LLM Providers Model API calls via Middleman At least one provider API key required
OIDC Provider User authentication Optional (Cognito used by default)
Datadog Monitoring and observability Optional
Slack Budget alerts Optional
CrowdStrike Falcon Endpoint protection for EKS nodes and subnet router Optional
Cloudflare DNS delegation Optional
GitHub CI/CD via Pulumi Deploy Optional

Network Security

  • TLS everywhere — all external traffic uses TLS via ACM certificates
  • Private subnets — EKS nodes, RDS, and ECS tasks run in private subnets with no direct internet access
  • NAT Gateways — outbound internet access from private subnets goes through NAT gateways
  • Security groups — restrict traffic between components (ALB → ECS, ECS → RDS, etc.)
  • VPC endpoints — S3 traffic stays within the VPC via a Gateway endpoint
  • Cilium network policies — pod-level network isolation within Kubernetes