Oracle Agent
The Oracle Agent runs your solution/solve.sh in the task environment and verifies it passes all tests. It's the first line of validation for your task.
The Oracle Agent will be run 3 times.
Getting Started
Practice Notebook
Download the Jupyter notebook for hands-on practice:
What You'll Learn
- What the Oracle Agent is and how it works
- How to run the Oracle Agent on your task
- How to interpret Oracle output
- How to debug failing runs
- How to iterate and fix issues
What is the Oracle Agent?
The Oracle Agent is an automated agent that:
1. Starts your Docker environment
2. Executes your solution/solve.sh commands
3. Runs your tests to verify completion
4. Reports pass/fail results
If the Oracle Agent can't complete your task, neither can AI agents.
Key Concept: The Oracle Agent verifies that your task is solvable. If the Oracle can't solve your task, the task is broken.
Running the Oracle Agent
# Basic run
harbor run -a oracle -p <task-folder>
# With verbose output
harbor run -a oracle -p <task-folder> -v
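Because the Oracle Agent is run 3 times, it can help to repeat the run locally and confirm that every attempt passes. The following is a minimal sketch, not part of harbor: run_repeated is a hypothetical helper, and it assumes the wrapped command exits with a non-zero status on failure, which this page does not confirm for harbor run.

```shell
# Hypothetical helper: run a command several times and count the failures.
# Assumption: the command signals failure with a non-zero exit status.
run_repeated() {
  local runs="$1"
  shift
  local failures=0
  local i=1
  while [ "$i" -le "$runs" ]; do
    if "$@"; then
      echo "run $i: PASS"
    else
      echo "run $i: FAIL"
      failures=$((failures + 1))
    fi
    i=$((i + 1))
  done
  echo "failures: $failures of $runs"
  # Succeed only when every run succeeded.
  [ "$failures" -eq 0 ]
}

# Example (not executed here):
#   run_repeated 3 harbor run -a oracle -p <task-folder>
```

A single failure out of three usually points to non-determinism in the solution or the tests, which is worth fixing before moving on.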
Expected Output
Successful Run
Starting task: my-task
Building Docker environment...
Running oracle solution...
✓ Step 1 completed
✓ Step 2 completed
✓ Step 3 completed
Running tests...
✓ test_output_exists PASSED
✓ test_format_correct PASSED
✓ test_values_valid PASSED
RESULT: PASS
Failed Run
Starting task: my-task
Building Docker environment...
Running oracle solution...
✓ Step 1 completed
✗ Step 2 failed: command not found
RESULT: FAIL
Error: Solution did not complete successfully
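If you save the Oracle output to a file, the verdict and the failing lines can be pulled out with standard tools. This is a sketch under assumptions: the helper names are hypothetical, and the log is assumed to contain the RESULT: and ✗ lines in the form shown in the sample output above.

```shell
# Hypothetical helpers for a saved Oracle log.
# Assumption: the log uses the "RESULT: ..." and "✗ ..." lines from the
# sample output on this page.

# Print the final verdict (e.g. PASS or FAIL); fail if no RESULT line exists.
oracle_result() {
  local verdict
  verdict="$(sed -n 's/^RESULT: *//p' "$1" | tail -n 1)"
  [ -n "$verdict" ] && echo "$verdict"
}

# Print every line that reports a failed step or test.
oracle_failures() {
  sed -n '/✗/p' "$1"
}

# Example (not executed here; assumes the output can be redirected to a file):
#   harbor run -a oracle -p <task-folder> > oracle.log 2>&1
#   oracle_result oracle.log
#   oracle_failures oracle.log
```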
Debugging Failures
Debugging Workflow
When the Oracle Agent fails, follow this workflow:
Step 1: Identify the Failure
- Read the error message carefully
- Which step failed?
- What was the error?
- What file/command was involved?
Step 2: Reproduce Interactively
harbor tasks start-env -p <task-folder> -i
Inside the container, run commands one by one to find the issue.
Step 3: Fix and Re-test
1. Update solution/solve.sh or environment/Dockerfile
2. Run Oracle again
3. Repeat until passing
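Running commands one by one in Step 2 can be approximated with bash tracing, which prints each command before it runs and stops at the first failure. The sketch below is illustrative only: trace_script is a hypothetical helper, and it assumes bash and a copy of your solution script are available inside the container.

```shell
# Hypothetical helper: run a script with tracing and stop at the first failure.
# bash -x prints each command before running it; -e exits on the first
# command that fails, so the last traced line is the failing step.
trace_script() {
  local status=0
  bash -ex "$1" || status=$?
  echo "exit status: $status" >&2
  return "$status"
}

# Example inside the container (the path is an assumption):
#   trace_script solution/solve.sh
```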
Solution Fails
If your solution doesn't run:
- Enter interactive mode:
harbor tasks start-env -p <task-folder> -i
- Run commands manually to find the failing step
- Check for:
  - Typos in commands
  - Missing dependencies
  - Wrong file paths
  - Permission issues
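Several of these causes can be checked in one pass before you step through the solution. The helpers below are a hypothetical sketch using only standard shell built-ins; the command names and paths you pass in are your own.

```shell
# Hypothetical preflight checks for missing dependencies and wrong paths.

# Report any command that is not on PATH (a missing dependency or a typo).
missing_commands() {
  local cmd
  for cmd in "$@"; do
    command -v "$cmd" >/dev/null 2>&1 || echo "missing command: $cmd"
  done
}

# Report any path that does not exist, and any existing path that the
# current user cannot read.
check_paths() {
  local target
  for target in "$@"; do
    if [ ! -e "$target" ]; then
      echo "missing path: $target"
    elif [ ! -r "$target" ]; then
      echo "not readable: $target"
    fi
  done
}

# Example inside the container (names are placeholders):
#   missing_commands grep jq python3
#   check_paths /app/data/input.csv
```

No output from either helper means the checks found nothing wrong.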
Tests Fail
If the solution runs but tests fail:
- Check test output for specific failures
- Verify your solution actually produces expected output
- Check for:
  - Incorrect output format
  - Missing files
  - Off-by-one errors
  - Edge cases not handled
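One way to verify the output is to compare what the solution wrote against a file holding what the tests expect. This is a minimal sketch: compare_output is a hypothetical helper and both file names are placeholders for whatever your tests actually check.

```shell
# Hypothetical helper: compare the solution's output with the expected output.
compare_output() {
  local actual="$1"
  local expected="$2"
  if [ ! -f "$actual" ]; then
    echo "missing file: $actual"
    return 1
  fi
  # diff prints the differing lines and exits non-zero when the files differ.
  # Lines marked "<" come from the expected file, ">" from the actual file.
  diff "$expected" "$actual"
}

# Example (file names are placeholders):
#   compare_output /app/output.txt /tmp/expected.txt
```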
Environment Issues
If the container won't build:
- Check Dockerfile syntax in environment/Dockerfile
- Verify base image exists and is accessible
- Check dependency installation commands
- Try building manually:
cd <task-folder>/environment
docker build -t test .
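To check that the base image exists and is accessible, you can extract it from the Dockerfile and try pulling it on its own. The sketch below is illustrative: base_images is a hypothetical helper, and it does not distinguish real images from earlier build-stage names used in multi-stage builds.

```shell
# Hypothetical helper: print the image named on each FROM line of a Dockerfile.
# Flags that precede the image name (such as --platform=...) are skipped.
base_images() {
  awk 'toupper($1) == "FROM" {
    for (i = 2; i <= NF; i++) {
      if ($i !~ /^--/) { print $i; break }
    }
  }' "$1"
}

# Example (not executed here):
#   docker pull "$(base_images environment/Dockerfile | head -n 1)"
```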
Common Issues
| Issue | Cause | Solution |
|---|---|---|
| "command not found" | Missing dependency | Add to environment/Dockerfile |
| "file not found" | Wrong path | Use absolute paths |
| "permission denied" | File permissions | Check chmod in environment/Dockerfile |
| Tests timeout | Solution too slow | Optimize or increase timeout |
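The table can also be read as a simple lookup from error text to first action. The following sketch encodes it as a shell case statement; suggest_fix is a hypothetical helper, and the patterns are only the ones named on this page plus common capitalisation variants.

```shell
# Hypothetical helper: map an error message to the fix suggested in the table.
# Returns non-zero when the message matches none of the known patterns.
suggest_fix() {
  case "$1" in
    *"command not found"*)
      echo "Add the missing dependency to environment/Dockerfile" ;;
    *"file not found"*|*"No such file"*)
      echo "Use absolute paths" ;;
    *"permission denied"*|*"Permission denied"*)
      echo "Check chmod in environment/Dockerfile" ;;
    *timeout*|*"timed out"*)
      echo "Optimize the solution or increase the timeout" ;;
    *)
      echo "No known fix; reproduce the failure interactively"
      return 1 ;;
  esac
}

# Example:
#   suggest_fix "grep: command not found"
```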
Common Debugging Scenarios
Missing File:
Error: No such file: /app/data/input.csv
Fix: Check the COPY commands and file paths in environment/Dockerfile
Command Not Found:
Error: grep: command not found
Fix: Add the package to the apt-get install command in environment/Dockerfile
Test Assertion Failed:
AssertionError: Expected 42, got 41
Fix: Debug your solution logic
Timeout:
Error: Task exceeded timeout (1800s)
Fix: Optimize solution or increase timeout in task.toml ([agent].timeout_sec)
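Before raising the timeout, it helps to measure how long the solution actually takes. The sketch below times an arbitrary command against a budget in seconds; within_timeout is a hypothetical helper, and the 1800 in the example simply mirrors the sample error message rather than a documented default.

```shell
# Hypothetical helper: time a command and compare it with a timeout budget.
# Usage: within_timeout <limit-seconds> <command> [args...]
within_timeout() {
  local limit="$1"
  shift
  local start end elapsed
  start="$(date +%s)"
  # The command's own exit status is ignored; only its duration is measured.
  "$@" || true
  end="$(date +%s)"
  elapsed=$((end - start))
  echo "elapsed: ${elapsed}s (limit: ${limit}s)"
  [ "$elapsed" -le "$limit" ]
}

# Example inside the container (the path is an assumption):
#   within_timeout 1800 bash solution/solve.sh
```

Timing uses whole seconds, so treat results close to the limit as too slow.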
Oracle vs Real Agents
| Oracle Agent | Real Agents (GPT-5, etc.) |
|---|---|
| Runs your solution/solve.sh | Generate their own solution |
| Always deterministic | May vary between runs |
| Tests task validity | Tests task difficulty |
| Must always pass | May fail (that's the goal!) |
Best Practices
- Run oracle early and often: don't wait until submission
- Fix oracle failures first: if oracle fails, task is broken
- Check all test output: understand why tests pass/fail
- Keep environment minimal: easier to debug
Practice Exercise
Using the practice notebook:
- Load the sample task
- Run the Oracle (it will fail)
- Identify the issue
- Fix the solution
- Run Oracle again until passing
Next Steps
- Run against real agents
- Review CI checks