CLI Reference

Authentication

| Command | Description |
| --- | --- |
| hawk login | Authenticate via OAuth2 Device Authorization flow |
| hawk auth access-token | Print a valid access token to stdout |
| hawk auth refresh-token | Print the current refresh token |

Evaluations

| Command | Description |
| --- | --- |
| hawk eval-set CONFIG | Submit an evaluation set |
| hawk local eval-set CONFIG | Run eval locally |

hawk eval-set options:

| Option | Description |
| --- | --- |
| --image URI | Full container image URI for the runner |
| --image-tag TAG | Specify runner image tag (within the default repo) |
| --secrets-file FILE | Load secrets from file (repeatable) |
| --secret NAME | Pass env var as secret (repeatable) |
| --skip-confirm | Skip unknown field warnings |
| --log-dir-allow-dirty | Allow dirty log directory |
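
As a rough illustration of how these options combine, the sketch below wraps a submission in a small shell function. The function name `submit_eval_set`, the config and secrets file names, and the secret name are placeholders, not names defined by Hawk. It assumes `hawk` is on the PATH and that options may follow the CONFIG argument.

```shell
# Hypothetical wrapper: submit an eval set with secrets loaded from a file
# plus one secret taken from an environment variable.
# Assumes `hawk` is on PATH; nothing runs until the function is called.
submit_eval_set() {
  config="$1"        # path to the eval-set config (placeholder)
  secrets_file="$2"  # file passed to --secrets-file
  secret_name="$3"   # env var name passed to --secret
  hawk eval-set "$config" \
    --secrets-file "$secrets_file" \
    --secret "$secret_name"
}

# Example call (all names are illustrative):
# submit_eval_set eval-set.yaml secrets.env MY_API_KEY
```

Because both `--secrets-file` and `--secret` are repeatable, the same pattern extends to several files or variables by repeating the flag.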

Scans

| Command | Description |
| --- | --- |
| hawk scan run CONFIG | Start a Scout scan |
| hawk scan resume [ID] | Resume an interrupted scan |
| hawk local scan CONFIG | Run scan locally |

Monitoring

| Command | Description |
| --- | --- |
| hawk logs [JOB_ID] | View logs (-f to follow, -n for line count) |
| hawk status [JOB_ID] | JSON status report (--hours for log window) |

hawk logs options:

| Option | Description |
| --- | --- |
| -n, --lines INT | Number of lines to show (default: 100) |
| -f, --follow | Follow mode: continuously poll for new logs |
| --hours INT | Hours of data to search (default: 5 years) |
| --poll-interval FLOAT | Seconds between polls in follow mode (default: 3.0) |
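
The following is a minimal sketch of tailing a job's logs with these options. The function name `follow_logs` and its default values are illustrative choices, and it assumes `hawk` is on the PATH.

```shell
# Hypothetical helper: follow a job's logs, starting from the most recent
# lines and polling at a chosen interval.
# Assumes `hawk` is on PATH; nothing runs until the function is called.
follow_logs() {
  job_id="$1"            # job to follow
  lines="${2:-100}"      # initial line count (100 matches the documented default)
  interval="${3:-3.0}"   # seconds between polls (3.0 matches the documented default)
  hawk logs "$job_id" --follow --lines "$lines" --poll-interval "$interval"
}

# Example call (the job ID is a placeholder):
# follow_logs my-job-id 200 5
```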

Viewing Results

| Command | Description |
| --- | --- |
| hawk web [EVAL_SET_ID] | Open eval set in browser |
| hawk view-sample UUID | Open a specific sample in browser |
| hawk list jobs | List your launched jobs (eval-sets and scans); --all for all visible jobs |
| hawk list eval-sets | List all eval sets |
| hawk list evals [ID] | List evals in an eval set |
| hawk list samples [ID] | List samples in an eval set |
| hawk transcript UUID | Download a sample transcript |
| hawk transcripts [ID] | Download all transcripts for an eval set |
| hawk download-artifacts [ID] | Download sample artifact files for an eval set |

hawk list samples options:

| Option | Description |
| --- | --- |
| --eval TEXT | Filter to a specific eval file |
| --limit INT | Max samples to show (default: 50) |

hawk transcript / hawk transcripts options:

| Option | Description |
| --- | --- |
| --output-dir DIR | Write to files instead of stdout |
| --raw | Raw JSON instead of markdown |
| --limit INT | Limit number of transcripts |
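
As a hedged example, the function below saves a bounded number of transcripts for an eval set as raw JSON files. The name `save_transcripts`, the directory name, and the default limit of 10 are assumptions made for illustration, and `hawk` is assumed to be on the PATH.

```shell
# Hypothetical helper: download transcripts for an eval set as raw JSON
# into a directory instead of printing markdown to stdout.
# Assumes `hawk` is on PATH; nothing runs until the function is called.
save_transcripts() {
  eval_set_id="$1"   # eval set to download from
  out_dir="$2"       # directory passed to --output-dir
  limit="${3:-10}"   # illustrative cap on the number of transcripts
  hawk transcripts "$eval_set_id" --output-dir "$out_dir" --raw --limit "$limit"
}

# Example call (ID and directory are placeholders):
# save_transcripts my-eval-set ./transcripts 25
```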

hawk download-artifacts options:

| Option | Description |
| --- | --- |
| --sample UUID | Download artifacts for one sample only |
| --output-dir DIR / -o DIR | Output directory (default: artifacts/<eval-set-id>) |

Artifacts are written as <output-dir>/<sample-uuid>/<artifact-path>. When --output-dir is omitted, the output directory is artifacts/<eval-set-id>. Existing files are overwritten.
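
The path layout described above can be sketched as a small shell function that predicts where a given artifact will land. The function name `artifact_path` and the sample values are hypothetical; the function only formats a path and does not call `hawk`.

```shell
# Hypothetical helper: compute the on-disk location of a downloaded artifact,
# following <output-dir>/<sample-uuid>/<artifact-path>.
artifact_path() {
  eval_set_id="$1"    # eval set the artifact belongs to
  sample_uuid="$2"    # sample that produced the artifact
  artifact="$3"       # artifact path relative to the sample
  # When no output directory is given, fall back to artifacts/<eval-set-id>.
  output_dir="${4:-artifacts/$eval_set_id}"
  printf '%s/%s/%s\n' "$output_dir" "$sample_uuid" "$artifact"
}

# Example calls (all values are placeholders):
# artifact_path my-eval-set sample-1 logs/run.txt
# artifact_path my-eval-set sample-1 logs/run.txt /tmp/out
```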

When EVAL_SET_ID is omitted, Hawk uses the last eval set from the current session.

Management

| Command | Description |
| --- | --- |
| hawk stop [EVAL_SET_ID] | Stop eval gracefully, scoring partial work (--error, --sample UUID) |
| hawk delete [EVAL_SET_ID] | Delete eval set's Kubernetes resources (logs are kept) |
| hawk edit-samples FILE | Submit sample edits (JSON or JSONL) |
| hawk import PATH | Import locally-produced .eval files into the warehouse (--name NAME) |

Other

| Command | Description |
| --- | --- |
| hawk config | Print the current CLI configuration |
| hawk models | List models accessible via the LLM proxy |
| hawk scan-export SCANNER_RESULT_UUID | Export scan results as CSV |

Human Registry

Manage external participants and their SSH public keys. Registered humans can perform evaluations themselves, for example to create human baselines.

| Command | Description |
| --- | --- |
| hawk human register --name NAME --ssh-key KEY | Register a new human |
| hawk human list | List all registered humans |
| hawk human update NAME --ssh-key KEY | Update a human's SSH public key |
| hawk human delete NAME | Remove a human from the registry |

Human Evaluations

Run evaluations where a registered human does the work inside the sandbox instead of an LLM agent.

| Command | Description |
| --- | --- |
| hawk human eval start CONFIG --human NAME | Start a human evaluation (same secrets/options as hawk eval-set) |
| hawk human eval ssh-command [EVAL_SET_ID] | Print a copy-paste-ready SSH command for the sandbox (defaults to the most recently started eval-set) |

ssh-command polls the eval logs for the agent's SSH connection details and prints an ssh -J command that hops through the shared jumphost to reach the sandbox pod. Pass --timeout SECONDS (default: 600) to bound how long it waits for the sandbox to come up.
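
A minimal sketch of requesting the SSH command with a custom wait time follows. The function name `human_ssh_command` is hypothetical, and `hawk` is assumed to be on the PATH.

```shell
# Hypothetical helper: ask Hawk for the sandbox SSH command, waiting up to
# a given number of seconds for the sandbox to come up.
# Assumes `hawk` is on PATH; nothing runs until the function is called.
human_ssh_command() {
  eval_set_id="$1"      # eval set started with `hawk human eval start`
  timeout="${2:-600}"   # seconds to wait (600 matches the documented default)
  hawk human eval ssh-command "$eval_set_id" --timeout "$timeout"
}

# Example call (the eval set ID is a placeholder):
# human_ssh_command my-eval-set 120
```

The printed command is meant to be copied and run by the registered human, so the helper only prints it and does not open the connection itself.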