Oracle Agent

The Oracle Agent runs your solution/solve.sh in the task environment and verifies it passes all tests. It's the first line of validation for your task.

The Oracle Agent will be run 3 times.

Getting Started

Practice Notebook

Download the Jupyter notebook for hands-on practice:

What You'll Learn

What the Oracle Agent is and how it works
How to run the Oracle Agent on your task
How to interpret Oracle output
How to debug failing runs
How to iterate and fix issues

What is the Oracle Agent?

The Oracle Agent is an automated agent that:

1. Starts your Docker environment
2. Executes your solution/solve.sh commands
3. Runs your tests to verify completion
4. Reports pass/fail results

If the Oracle Agent can't complete your task, neither can AI agents.

Key Concept: The Oracle Agent verifies that your task is solvable. If the Oracle can't solve your task, it's broken.
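The run-solution, run-tests, report sequence described above can be pictured as a small shell function. This is a conceptual sketch only, not Harbor's implementation: `oracle_check` is a hypothetical name, the Docker step is skipped entirely, and both the solution and the tests are assumed to be plain bash scripts that signal failure with a non-zero exit status.

```shell
#!/usr/bin/env bash
# Conceptual sketch only: NOT how Harbor is implemented.
# oracle_check SOLUTION_SCRIPT TEST_SCRIPT   (hypothetical helper)
# Runs the solution, then the tests, and prints a PASS/FAIL verdict.
oracle_check() {
  local solution="$1" tests="$2"

  # Execute the reference solution; any non-zero exit counts as a failure.
  if ! bash "$solution"; then
    echo "RESULT: FAIL (solution did not complete)"
    return 1
  fi

  # Run the tests against whatever the solution produced.
  if ! bash "$tests"; then
    echo "RESULT: FAIL (tests failed)"
    return 1
  fi

  echo "RESULT: PASS"
}
```

The point of the sketch is the ordering: a test failure is only meaningful once the solution itself has run to completion, which is why the two failure modes are reported separately.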

Running the Oracle Agent

# Basic run
harbor run -a oracle -p <task-folder>

# With verbose output
harbor run -a oracle -p <task-folder> -v
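Because the Oracle Agent is run 3 times, it can help to repeat the run locally and catch intermittent failures before submitting. The following is a minimal sketch: `repeat_check` is a hypothetical helper, and it assumes that `harbor run` exits with a non-zero status when the Oracle fails, which is not confirmed here.

```shell
#!/usr/bin/env bash
# repeat_check N COMMAND [ARGS...]   (hypothetical helper)
# Runs COMMAND N times and stops at the first non-zero exit status.
repeat_check() {
  local n="$1"; shift
  local i
  for ((i = 1; i <= n; i++)); do
    echo "--- run $i of $n ---"
    if ! "$@"; then
      echo "run $i failed" >&2
      return 1
    fi
  done
  echo "all $n runs passed"
}

# Example (ASSUMES harbor exits non-zero when the Oracle fails):
#   repeat_check 3 harbor run -a oracle -p <task-folder>
```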

Expected Output

Successful Run

Starting task: my-task
Building Docker environment...
Running oracle solution...
  ✓ Step 1 completed
  ✓ Step 2 completed
  ✓ Step 3 completed
Running tests...
  ✓ test_output_exists PASSED
  ✓ test_format_correct PASSED
  ✓ test_values_valid PASSED

RESULT: PASS

Failed Run

Starting task: my-task
Building Docker environment...
Running oracle solution...
  ✓ Step 1 completed
  ✗ Step 2 failed: command not found

RESULT: FAIL
Error: Solution did not complete successfully

Debugging Failures

Debugging Workflow

When the Oracle Agent fails, follow this workflow:

Step 1: Identify the Failure

- Read the error message carefully
- Which step failed?
- What was the error?
- What file/command was involved?

Step 2: Reproduce Interactively

harbor tasks start-env -p <task-folder> -i

Inside the container, run commands one by one to find the issue.

Step 3: Fix and Re-test

1. Update solution/solve.sh or environment/Dockerfile
2. Run Oracle again
3. Repeat until passing

Solution Fails

If your solution doesn't run:

Enter interactive mode: harbor tasks start-env -p <task-folder> -i
Run commands manually to find the failing step
Check for:
- Typos in commands
- Missing dependencies
- Wrong file paths
- Permission issues
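Instead of pasting commands one at a time, the script can also be run with tracing so that the failing step shows up on its own. This is a sketch that assumes solve.sh is a bash script; `trace_script` is a hypothetical helper, and the path to solve.sh inside the container depends on your setup.

```shell
#!/usr/bin/env bash
# trace_script SCRIPT   (hypothetical helper)
# Runs SCRIPT with -e (stop at the first failing command) and -x (print each
# command to stderr before it runs), so the last traced line is the step
# that failed.
trace_script() {
  local script="$1" status=0
  bash -ex "$script" || status=$?
  echo "exit status: $status"
  return "$status"
}

# Example, from inside the interactive container (path is an assumption):
#   trace_script solution/solve.sh
```

The trace goes to stderr, so the script's normal output can still be redirected or inspected separately.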

Tests Fail

If the solution runs but tests fail:

Check test output for specific failures
Verify your solution actually produces expected output
Check for:
- Incorrect output format
- Missing files
- Off-by-one errors
- Edge cases not handled
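When a test compares generated output against a known-good result, a direct diff often reveals format problems and off-by-one errors faster than reading the assertion message. A minimal sketch follows; `compare_output` and the file names in the example are hypothetical, and your task's tests may check output in a different way.

```shell
#!/usr/bin/env bash
# compare_output EXPECTED_FILE ACTUAL_FILE   (hypothetical helper)
# Reports a missing output file, or shows a unified diff when contents differ.
compare_output() {
  local expected="$1" actual="$2"
  if [ ! -f "$actual" ]; then
    echo "missing file: $actual"
    return 2
  fi
  if diff -u "$expected" "$actual"; then
    echo "output matches"
  else
    echo "output differs (see diff above)"
    return 1
  fi
}

# Example (both paths are illustrative placeholders):
#   compare_output expected.csv /app/output.csv
```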

Environment Issues

If the container won't build:

Check Dockerfile syntax in environment/Dockerfile
Verify base image exists and is accessible
Check dependency installation commands
Try building manually:

cd <task-folder>/environment
docker build -t test .

Common Issues

Issue	Cause	Solution
"command not found"	Missing dependency	Add to environment/Dockerfile
"file not found"	Wrong path	Use absolute paths
"permission denied"	File permissions	Check chmod in environment/Dockerfile
Tests time out	Solution too slow	Optimize or increase timeout

Common Debugging Scenarios

Missing File:

Error: No such file: /app/data/input.csv

Fix: Check the COPY commands and file paths in environment/Dockerfile

Command Not Found:

Error: grep: command not found

Fix: Add the missing package to the apt-get install command in environment/Dockerfile
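A quick way to catch "command not found" before running the whole solution is to check every tool the script relies on from inside the container. This is a sketch: `missing_commands` is a hypothetical helper, and the command names in the example are placeholders for whatever your solve.sh actually calls.

```shell
#!/usr/bin/env bash
# missing_commands CMD...   (hypothetical helper)
# Prints each command that is not on PATH; returns non-zero if any is missing.
missing_commands() {
  local cmd missing=0
  for cmd in "$@"; do
    if ! command -v "$cmd" >/dev/null 2>&1; then
      echo "missing: $cmd"
      missing=1
    fi
  done
  return "$missing"
}

# Example (the tool list is illustrative):
#   missing_commands grep python3 jq
```

Anything reported as missing is a candidate for the install step in environment/Dockerfile.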

Test Assertion Failed:

AssertionError: Expected 42, got 41

Fix: Debug your solution logic

Timeout:

Error: Task exceeded timeout (1800s)

Fix: Optimize solution or increase timeout in task.toml ([agent].timeout_sec)
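To see how close the solution comes to the limit, it can be timed inside the container under the same time budget. This sketch assumes GNU coreutils `timeout` is available (typical in Linux images, but not guaranteed); `run_with_limit` is a hypothetical helper, and the 1800 in the example simply mirrors the error message above.

```shell
#!/usr/bin/env bash
# run_with_limit LIMIT_SECONDS COMMAND [ARGS...]   (hypothetical helper)
# Runs COMMAND under GNU timeout and reports elapsed time or a timeout.
run_with_limit() {
  local limit="$1"; shift
  local start=$SECONDS status=0
  timeout "$limit" "$@" || status=$?
  local elapsed=$((SECONDS - start))
  if [ "$status" -eq 124 ]; then
    # GNU timeout exits with 124 when the time limit is reached.
    echo "timed out after ${limit}s"
  else
    echo "finished in ${elapsed}s (exit status $status)"
  fi
  return "$status"
}

# Example (script path is an assumption about your container layout):
#   run_with_limit 1800 bash solution/solve.sh
```

If the solution finishes well inside the limit locally but still times out under the Oracle, the slow part is more likely the environment build or the tests than the solution itself.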

Oracle vs Real Agents

Oracle Agent	Real Agents (GPT-5, etc.)
Runs your solution/solve.sh	Generate their own solution
Always deterministic	May vary between runs
Tests task validity	Tests task difficulty
Must always pass	May fail (that's the goal!)

Best Practices

Run the Oracle early and often - Don't wait until submission
Fix Oracle failures first - If the Oracle fails, the task is broken
Check all test output - Understand why tests pass/fail
Keep environment minimal - Easier to debug

Practice Exercise

Using the practice notebook:

Load the sample task
Run the Oracle (it will fail)
Identify the issue
Fix the solution
Run Oracle again until passing

Next Steps

Run against real agents
Review CI checks