Submission Checklist

Use this checklist before every submission to ensure your task is complete and will pass review.

For detailed requirements, see Task Requirements, which gives complete specifications for each component.

Pre-Submission Verification

instruction.md - Clear, human-written instructions (requirements)
task.toml - Complete configuration with all required sections (requirements)
environment/Dockerfile - Builds successfully; dependencies and base image pinned (requirements)
solution/solve.sh - Deterministic, human-written solution (requirements)
tests/test.sh - Uses uv, produces reward file (requirements)
tests/test_outputs.py - Tests have docstrings and verify behavior
optional milestones.md - Include only if the task has milestones. Give a short 1-2 sentence description of each milestone.
optional solution/solve1.sh ... solution/solveN.sh - Include only if the task has milestones. Each solveX.sh is scoped to milestone X only.
optional tests/test_m1.py ... tests/test_mN.py - Include only if the task has milestones. Each test_mX.py scores completion of milestone X only.
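
The file list above can be checked mechanically before zipping. The following is a minimal sketch, not part of any official tooling: the helper name `missing_files` is hypothetical, and it covers only the six always-required files, not the optional milestone files.

```python
from pathlib import Path

# Required files, copied from the checklist above.
REQUIRED_FILES = [
    "instruction.md",
    "task.toml",
    "environment/Dockerfile",
    "solution/solve.sh",
    "tests/test.sh",
    "tests/test_outputs.py",
]


def missing_files(task_folder: str) -> list[str]:
    """Return the required files that are absent from task_folder (hypothetical helper)."""
    root = Path(task_folder)
    return [name for name in REQUIRED_FILES if not (root / name).is_file()]
```

An empty return value means every required file is present; it says nothing about whether the contents meet the requirements.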

Every submission should include a rubric that is aligned to the task.
Generate a synthetic rubric via the submission UI in the Cognyzer Platform, then edit it for accuracy and completeness.
The rubric must include at least three criteria that assign negative rewards (for example, -1).
See the Rubrics page for workflow details and quality criteria.
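
The negative-reward rule can be expressed as a quick check. This is a sketch under a loud assumption: the rubric is represented here as a list of dictionaries with a numeric `reward` key, which is a hypothetical shape chosen for illustration and not the platform's actual export format.

```python
def count_negative_criteria(rubric: list[dict]) -> int:
    """Count criteria whose reward is below zero (assumes a hypothetical 'reward' key)."""
    return sum(1 for criterion in rubric if criterion["reward"] < 0)


def has_enough_negative_criteria(rubric: list[dict], minimum: int = 3) -> bool:
    """Return True if the rubric has at least `minimum` negative-reward criteria."""
    return count_negative_criteria(rubric) >= minimum


# Illustrative rubric; the criterion texts are made up for this example.
example_rubric = [
    {"text": "Produces the requested output file", "reward": 2},
    {"text": "Hard-codes the expected answer", "reward": -1},
    {"text": "Deletes or edits the tests", "reward": -1},
    {"text": "Leaves the service in a broken state", "reward": -1},
]
```

With this example data, `has_enough_negative_criteria(example_rubric)` is satisfied because three criteria carry a reward of -1.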

All requirements have corresponding tests
All tests verify described requirements, with complete coverage of the prompt (explicit, implicit, and edge cases).
Anti-cheating measures in place (no hints or exposed answers)
Tests check behavior, not implementation
Complies with Quality Guidelines
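
To illustrate "tests check behavior, not implementation", here is a minimal sketch in the style of tests/test_outputs.py. The function `dedupe` is a hypothetical stand-in for whatever the task produces; the point is that each test has a docstring and asserts on observable results, including an edge case, without inspecting how the result was computed.

```python
def dedupe(items):
    """Hypothetical function under test: drop duplicates, keep first-seen order."""
    seen = set()
    result = []
    for item in items:
        if item not in seen:
            seen.add(item)
            result.append(item)
    return result


def test_dedupe_preserves_first_seen_order():
    """Duplicates are removed and each item keeps the position of its first occurrence."""
    assert dedupe([3, 1, 3, 2, 1]) == [3, 1, 2]


def test_dedupe_handles_empty_input():
    """Edge case: an empty list yields an empty list."""
    assert dedupe([]) == []
```

A test that instead searched the solution's source for a particular loop or library call would be checking implementation, and would reject other correct solutions.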

harbor run -a oracle -p <task-folder>

harbor tasks check <task-folder> -m openai/@openai/gpt-5.2

harbor run -a terminus-2 -m openai/@openai/gpt-5.2 -p <task-folder>

harbor run -a terminus-2 -m anthropic/@anthropic/claude-opus-4-6 -p <task-folder>
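
If you run these four commands often, they can be assembled for a given task folder. The sketch below only builds and prints the command lines exactly as listed above; the helper name `harbor_commands` is hypothetical, and nothing is assumed about harbor beyond the arguments already shown.

```python
import shlex


def harbor_commands(task_folder: str) -> list[list[str]]:
    """Return the four commands from this checklist as argument lists (hypothetical helper)."""
    return [
        ["harbor", "run", "-a", "oracle", "-p", task_folder],
        ["harbor", "tasks", "check", task_folder, "-m", "openai/@openai/gpt-5.2"],
        ["harbor", "run", "-a", "terminus-2", "-m", "openai/@openai/gpt-5.2", "-p", task_folder],
        ["harbor", "run", "-a", "terminus-2", "-m", "anthropic/@anthropic/claude-opus-4-6", "-p", task_folder],
    ]


if __name__ == "__main__":
    # Print shell-ready command lines; replace "my-task" with your task folder.
    for command in harbor_commands("my-task"):
        print(shlex.join(command))
```

Keeping the commands as argument lists means they can be passed to `subprocess.run` unchanged if you prefer to execute them from Python.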

Difficulty	Pass Rate	Description
Hard	<= 20%	Requires deep expertise, multi-step reasoning
Medium	21-80%	Moderate complexity, some domain knowledge
Easy	> 80%	Straightforward but still challenging
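
The table's thresholds can be written as a small function. This is a sketch; the function name is hypothetical, and one assumption is needed because the table leaves a gap between 20% and 21%: any rate above 20 and up to 80 is treated as Medium here.

```python
def classify_difficulty(pass_rate_percent: float) -> str:
    """Map a pass rate (0-100) to the difficulty labels in the table above."""
    if not 0 <= pass_rate_percent <= 100:
        raise ValueError("pass rate must be between 0 and 100")
    if pass_rate_percent <= 20:
        return "Hard"
    # Assumption: fractional rates between 20 and 21 count as Medium.
    if pass_rate_percent <= 80:
        return "Medium"
    return "Easy"
```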

Use the Reviewer Checklist to validate your task against the same criteria reviewers use before you submit.

Would I understand this task as a first-time reader? - If no → Clarify instructions
Are there any ambiguous requirements? - If yes → Make them explicit
Could an agent cheat on this task? - If yes → Add anti-cheating measures
Do tests verify actual behavior? - If no → Rewrite to test behavior
Is the solution deterministic? - If no → Add seeds, remove randomness
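
For the determinism question, "add seeds" usually means giving every random choice a fixed, explicit seed. A minimal sketch, assuming the solution calls a Python helper; the function name and the seed value 1234 are illustrative only.

```python
import random


def sample_ids(population: list[int], k: int, seed: int = 1234) -> list[int]:
    """Pick k items reproducibly by using a private, seeded generator."""
    rng = random.Random(seed)  # local generator, so global random state is untouched
    return rng.sample(population, k)
```

Calling `sample_ids` twice with the same arguments returns the same list, so repeated oracle runs produce identical output.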

See Platform Submission Guide for detailed submission steps.

If you've completed all items above, upload your ZIP file to the Cognyzer Expert Platform.

Good luck! 🎉