Quick Start¶
Just want to run evals?
If you already have access to a Hawk deployment, you just need the CLI. See Installation for setup and usage.
This gets you from zero to a working Hawk deployment on AWS. You'll need an AWS account and a domain name. You can use your existing OIDC identity provider for authentication, or a Cognito user pool by default.
Do not deploy to us-east-1
Hawk has two known us-east-1-specific failure modes:
- EKS does not support
us-east-1eas a control-plane AZ, but Hawk's VPC uses every available AZ. Workaround ininfra/core/__init__.pyfilters it but causes destructive churn for existing 1e subnets. - us-east-1 uses the legacy
ec2.internalDNS suffix instead of<region>.compute.internal. Bottlerocket'splutodoesn't handle this and EKS nodes never join the cluster → no evals can run.
Use us-west-2 (project default, most-tested) or an EU region like eu-west-1 / eu-central-1. This warning should be removed once both issues are fixed upstream.
1. Install prerequisites¶
You also need Docker running — the deploy builds ~12 container images.
Or on Linux, install Pulumi, uv, the AWS CLI, Python 3.13+, Node.js 22 (see .nvmrc), pnpm, Docker, and jq.
2. Clone the repo¶
3. Set up Pulumi state backend¶
Create an S3 bucket and KMS key for Pulumi state:
aws s3 mb s3://my-org-hawk-pulumi-state # must be globally unique
aws kms create-alias --alias-name alias/pulumi-secrets \
--target-key-id $(aws kms create-key --query KeyMetadata.KeyId --output text)
Log in to the S3 backend:
Credential troubleshooting
If pulumi login fails with NoCredentialProviders, your AWS credentials aren't visible to Pulumi. Make sure you ran aws configure (not just aws login, which doesn't persist credentials for other tools). If using SSO profiles, ensure AWS_PROFILE is set, or export credentials explicitly:
## 4. Create and configure your stack
```bash
cd infra
pulumi stack init my-org --secrets-provider="awskms://alias/pulumi-secrets"
cp ../Pulumi.example.yaml ../Pulumi.my-org.yaml
Edit Pulumi.my-org.yaml with your values. At minimum, you need:
Replace domain placeholders
Names like example.com are IANA documentation domains, not domains you can register. Use DNS names you control; the values in the example config are illustrative only.
config:
aws:region: us-west-2
hawk:domain: hawk.example.com # domain you control — used for API and service routing
hawk:publicDomain: example.com # parent domain for DNS zones and TLS certs
hawk:primarySubnetCidr: "10.0.0.0/16"
That's enough to get started. The environment name defaults to your stack name. Hawk will create a Cognito user pool for authentication automatically.
If you already have an OIDC provider (Okta, Auth0, etc.), you can use it instead:
# Optional: use your own OIDC provider instead of Cognito
hawk:oidcClientId: "your-client-id"
hawk:oidcAudience: "your-audience"
hawk:oidcIssuer: "https://login.example.com/oauth2/default"
5. Deploy¶
Before your first deploy, make sure Docker Hub authentication is set up — the build pulls base images from Docker Hub, which rate-limits anonymous pulls:
docker login # Docker Hub — required; anonymous pulls are rate-limited (https://hub.docker.com/)
docker login dhi.io # Docker Hardened Images — Hawk's Python base lives here (free Community tier; same Docker Hub credentials work)
Secrets encryption (AWS KMS)
With pulumi stack init ... --secrets-provider="awskms://alias/pulumi-secrets" (step 4), secret stack configuration is encrypted using KMS, not a passphrase. Do not set PULUMI_CONFIG_PASSPHRASE or rely on passphrase-based encryption for Hawk stacks. If Pulumi prompts for a passphrase, the stack is probably not using the KMS secrets provider — align the stack with step 4 (see Pulumi: changing secrets providers) instead of configuring a passphrase.
This creates roughly 200+ AWS resources including a VPC, EKS cluster, ALB, ECS services, Aurora PostgreSQL, S3 buckets, Lambda functions, and more. First deploy takes about 15-20 minutes.
Custom domain / DNS setup
If you want TLS certificates and public DNS for your deployment, set hawk:createPublicZone: "true" in your stack config. This creates a Route 53 hosted zone for your publicDomain. You'll then need to delegate DNS to this zone — see Configuration Reference: DNS / Cloudflare for options including automated Cloudflare delegation.
6. Set up LLM API keys¶
Hawk routes model API calls through its built-in LLM proxy (Middleman). You need to provide at least one provider's API key:
This stores the key in Secrets Manager and restarts Middleman. You can set multiple keys at once:
Replace <env> with your hawk:env value (e.g., production). Supported providers: OpenAI, Anthropic, Gemini, DeepInfra, DeepSeek, Fireworks, Mistral, OpenRouter, Together, xAI.
7. Create a user (Cognito only)¶
If you're using the default Cognito authentication, create a user:
The script reads the Cognito user pool from your Pulumi stack outputs, creates the user, and prints the login credentials. Skip this step if you're using your own OIDC provider.
To control which models a user can access, add them to Cognito groups matching the model groups configured in Middleman:
scripts/dev/manage-cognito-groups.sh <stack> create model-access-openai
scripts/dev/manage-cognito-groups.sh <stack> add-user model-access-openai you@example.com
See Security: Access Control for details.
8. Install the Hawk CLI and run your first eval¶
uv pip install "hawk[cli] @ git+https://github.com/METR/hawk#subdirectory=hawk"
# Configure the CLI to point to your deployment
uv run python scripts/dev/generate-env.py <stack> > hawk/.env
hawk login
hawk eval-set hawk/examples/simple.eval-set.yaml
hawk logs -f # watch it run
hawk web # open results in browser