CLI Reference
Authentication
| Command |
Description |
hawk login |
Authenticate via OAuth2 Device Authorization flow |
hawk auth access-token |
Print a valid access token to stdout |
hawk auth refresh-token |
Print the current refresh token |
Evaluations
| Command |
Description |
hawk eval-set CONFIG |
Submit an evaluation set |
hawk local eval-set CONFIG |
Run eval locally |
hawk eval-set options:
| Option |
Description |
--image URI |
Full container image URI for the runner |
--image-tag TAG |
Specify runner image tag (within the default repo) |
--secrets-file FILE |
Load secrets from file (repeatable) |
--secret NAME |
Pass env var as secret (repeatable) |
--skip-confirm |
Skip unknown field warnings |
--log-dir-allow-dirty |
Allow dirty log directory |
Scans
| Command |
Description |
hawk scan run CONFIG |
Start a Scout scan |
hawk scan resume [ID] |
Resume an interrupted scan |
hawk local scan CONFIG |
Run scan locally |
Monitoring
| Command |
Description |
hawk logs [JOB_ID] |
View logs (-f to follow, -n for line count) |
hawk status [JOB_ID] |
JSON status report (--hours for log window) |
hawk logs options:
| Option |
Description |
-n, --lines INT |
Number of lines to show (default: 100) |
-f, --follow |
Follow mode — continuously poll for new logs |
--hours INT |
Hours of data to search (default: 5 years) |
--poll-interval FLOAT |
Seconds between polls in follow mode (default: 3.0) |
Viewing Results
| Command |
Description |
hawk web [EVAL_SET_ID] |
Open eval set in browser |
hawk view-sample UUID |
Open a specific sample in browser |
hawk list jobs |
List your launched jobs (eval-sets and scans); --all for all visible jobs |
hawk list eval-sets |
List all eval sets |
hawk list evals [ID] |
List evals in an eval set |
hawk list samples [ID] |
List samples in an eval set |
hawk transcript UUID |
Download a sample transcript |
hawk transcripts [ID] |
Download all transcripts for an eval set |
hawk download-artifacts [ID] |
Download sample artifact files for an eval set |
hawk list samples options:
| Option |
Description |
--eval TEXT |
Filter to a specific eval file |
--limit INT |
Max samples to show (default: 50) |
hawk transcript / hawk transcripts options:
| Option |
Description |
--output-dir DIR |
Write to files instead of stdout |
--raw |
Raw JSON instead of markdown |
--limit INT |
Limit number of transcripts |
hawk download-artifacts options:
| Option |
Description |
--sample UUID |
Download artifacts for one sample only |
--output-dir DIR / -o DIR |
Output directory (default: artifacts/<eval-set-id>) |
Artifacts are written as <output-dir>/<sample-uuid>/<artifact-path>. When --output-dir is omitted, the output directory is artifacts/<eval-set-id>. Existing files are overwritten.
When EVAL_SET_ID is omitted, Hawk uses the last eval set from the current session.
Management
| Command |
Description |
hawk stop [EVAL_SET_ID] |
Stop eval gracefully, scoring partial work (--error, --sample UUID) |
hawk delete [EVAL_SET_ID] |
Delete eval set's Kubernetes resources (logs are kept) |
hawk edit-samples FILE |
Submit sample edits (JSON or JSONL) |
hawk import PATH |
Import locally-produced .eval files into the warehouse (--name NAME) |
Other
| Command |
Description |
hawk config |
Print the current CLI configuration |
hawk models |
List models accessible via the LLM proxy |
hawk scan-export SCANNER_RESULT_UUID |
Export scan results as CSV |
Human Registry
Manage external participants and their SSH public keys. This feature allows humans to perform an evaluation, for example to create human baselines.
| Command |
Description |
hawk human register --name NAME --ssh-key KEY |
Register a new human |
hawk human list |
List all registered humans |
hawk human update NAME --ssh-key KEY |
Update a human's SSH public key |
hawk human delete NAME |
Remove a human from the registry |
Human Evaluations
Run evaluations where a registered human does the work inside the sandbox instead of an LLM agent.
| Command |
Description |
hawk human eval start CONFIG --human NAME |
Start a human evaluation (same secrets/options as hawk eval-set) |
hawk human eval ssh-command [EVAL_SET_ID] |
Print a copy-paste-ready SSH command for the sandbox (defaults to the most recently started eval-set) |
ssh-command polls the eval logs for the agent's SSH connection details and prints a ssh -J command that hops through the shared jumphost to reach the sandbox pod. Pass --timeout SECONDS (default 600) to bound how long it waits for the sandbox to come up.