Writing Oracle Solution
The oracle solution (solution/solve.sh) is an expert-authored script that reliably completes the task. It's used to verify that the task is solvable. For milestone tasks, also include solution/solve1.sh ... solution/solveN.sh, where each solveX.sh is independently scoped to milestone X, and solve.sh chains them in order.
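A minimal sketch of how solve.sh might chain the milestone scripts, assuming they sit in the same directory as solve.sh and are numbered consecutively from 1. The `run_milestones` helper is hypothetical and not part of the harness.

```shell
#!/bin/bash
set -euo pipefail

# Hypothetical helper: run solve1.sh, solve2.sh, ... from the given directory
# in numeric order. It stops at the first missing number, and set -e aborts
# the whole chain as soon as one milestone script fails.
run_milestones() {
    local dir="$1"
    local i=1
    while [ -f "$dir/solve$i.sh" ]; do
        echo "Running milestone $i"
        bash "$dir/solve$i.sh"
        i=$((i + 1))
    done
}

# Look for the milestone scripts next to this script.
run_milestones "$(dirname "$0")"
```

Because each solveX.sh is invoked as its own process, a milestone script can still be run on its own when only that milestone needs checking.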
Getting Started
What You'll Learn
- Structure of a good solution/solve.sh file
- How to transfer commands from interactive testing
- Best practices for deterministic solutions
- Common pitfalls to avoid
Basic Structure
#!/bin/bash
set -e # Exit on error
# Step 1: Navigate to working directory
cd /app
# Step 2: Perform the task
sed -i 's/bug/fix/' main.py
# Step 3: Verify the fix (replace 'expected' with the value process() should return)
python -c "from main import process; assert process('test') == 'expected'"
# Step 4: Any cleanup or final steps
echo "Task completed successfully"
Key Principles
1. Demonstrate Command Sequence
The solution should show the steps to derive the answer, not just the answer itself.
Good:
#!/bin/bash
set -e
# Find and fix the bug
grep -r "TypeError" /app/logs/ | head -1
# Found: main.py:42: TypeError: 'NoneType' object
# Fix the bug
sed -i '42s/data.process()/data.process() if data else None/' /app/main.py
# Verify
python -m pytest /app/tests/ -v
Bad:
#!/bin/bash
# Don't just echo the answer!
echo "42" > /output/answer.txt
2. Must Be Deterministic
Running the solution multiple times should produce the same result.
Avoid:
- Random values without seeds
- Time-dependent operations
- Network calls to external services
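One way to check that repeated runs agree is to execute the same steps twice and compare the outputs byte for byte. The sketch below is illustrative only: the `generate` function is a hypothetical stand-in for the real solution steps, and the temporary files stand in for the task's output.

```shell
#!/bin/bash
set -euo pipefail

# Hypothetical stand-in for the solution's real steps: writes a sorted word list.
# LC_ALL=C pins the collation order so the result does not depend on the host locale.
generate() {
    printf 'pear\napple\nfig\n' | LC_ALL=C sort > "$1"
}

# Run the steps twice into separate files.
out1=$(mktemp)
out2=$(mktemp)
generate "$out1"
generate "$out2"

# cmp -s exits 0 only when the two files are byte-for-byte identical.
if cmp -s "$out1" "$out2"; then
    echo "deterministic"
else
    echo "outputs differ between runs" >&2
    exit 1
fi
```

A check like this only catches differences between two runs on the same machine, so it complements the rules above without replacing them.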
Written by a Human
The solution should be written by you, not generated by an LLM. Minimal LLM assistance for syntax is acceptable.
Advanced Patterns
Multi-Step Solutions
#!/bin/bash
set -e
# Step 1: Set up environment
cd /app
source venv/bin/activate
# Step 2: Fix first issue
python -c "
import json
# Load the config, disable debug mode, and write it back
with open('config.json') as f:
    config = json.load(f)
config['debug'] = False
with open('config.json', 'w') as f:
    json.dump(config, f)
"
# Step 3: Fix second issue
sed -i 's/localhost/0.0.0.0/' server.py
# Step 4: Restart service
pkill -f server.py || true
python server.py &
sleep 2
# Step 5: Verify
curl -s http://localhost:8080/health | grep -q "ok"
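The fixed `sleep 2` in Step 4 assumes the server is always ready within two seconds. A possible alternative is to poll until the health check succeeds, as sketched below. The `wait_for` helper is hypothetical, and the 30-attempt limit is an arbitrary choice.

```shell
#!/bin/bash
set -euo pipefail

# Hypothetical helper: retry a command once per second until it succeeds
# or the attempt limit is reached.
# Usage: wait_for <max_attempts> <command> [args...]
wait_for() {
    local attempts="$1"
    shift
    local i
    for ((i = 1; i <= attempts; i++)); do
        if "$@" > /dev/null 2>&1; then
            return 0
        fi
        sleep 1
    done
    echo "Timed out waiting for: $*" >&2
    return 1
}

# In the script above, this could replace `sleep 2`:
# wait_for 30 curl -sf http://localhost:8080/health
```

The `-f` flag makes curl return a non-zero exit status on HTTP errors, so an error response is not mistaken for a ready server.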
Using Python in Solution
#!/bin/bash
set -e
cd /app
python << 'EOF'
import pandas as pd
df = pd.read_csv('/data/input.csv')
df['total'] = df['price'] * df['quantity']
df.to_csv('/data/output.csv', index=False)
EOF
Interactive Commands (solution.yaml)
For tasks requiring interactive tools like vim:
# solution/solution.yaml
commands:
  - type: interactive
    command: vim /app/file.txt
    inputs:
      - "dd"     # Delete line
      - "i"      # Insert mode
      - "hello"  # Type text
      - "<Esc>"  # Exit insert
      - ":wq"    # Save and quit
Common Mistakes
Hardcoding Answers
# WRONG: This just outputs the answer
echo "The answer is 42" > /output/result.txt
# RIGHT: This derives the answer
cd /app
python calculate.py > /output/result.txt
Non-Deterministic Solutions
# WRONG: Random without seed
python -c "import random; print(random.choice([1,2,3]))"
# RIGHT: Deterministic
python -c "import random; random.seed(42); print(random.choice([1,2,3]))"
Missing Error Handling
# WRONG: Silent failures
cd /nonexistent || true
do_something
# RIGHT: Fail fast
set -e
cd /app
do_something
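A slightly stricter variant of the fail-fast pattern, sketched under assumptions: `set -u` and `pipefail` are optional additions not required by this guide, the `ERR` trap is only there to report where the script stopped, and the temporary directory stands in for /app. It uses GNU `sed -i`, as in the examples above.

```shell
#!/bin/bash
# -e: exit on the first failing command
# -u: treat unset variables as errors
# -o pipefail: a pipeline fails if any command in it fails
set -euo pipefail

# Report the failing line before the script exits, so a broken step is easy to locate.
trap 'echo "solve.sh failed at line $LINENO" >&2' ERR

workdir=$(mktemp -d)   # stand-in for /app in this sketch
cd "$workdir"

# Each step must succeed before the next one runs.
echo "bug" > main.py
sed -i 's/bug/fix/' main.py
grep -q "fix" main.py
echo "All steps succeeded"
```

Note that `set -e` ignores a failing command on the left-hand side of `&&` or `||`, so placing each step on its own line keeps the fail-fast behavior predictable.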
Testing Your Solution
Run Locally
# Enter the container
harbor tasks start-env -p <task-folder> -i
# Inside container, run your solution steps manually
# Verify each step works as expected
Run Oracle Agent
harbor run -a oracle -p <task-folder>
The oracle agent should PASS. If it fails, either:
- Your solution has bugs
- Your tests are too strict
- The task has issues
Next Steps
- Write tests
- Run oracle agent