Submission Checklist
Use this checklist before every submission to ensure your task is complete and will pass review.
For detailed requirements, see Task Requirements, which gives complete specifications for each component.
Pre-Submission Verification
Task Design
- Problem statement is clear and unambiguous
- All requirements are explicitly stated
- Uses absolute paths (e.g., /app/file.txt)
- Output files are named in instructions
- Data schemas are fully specified
- Difficulty target: < 80% pass rate
Required Files
- instruction.md - Clear, human-written instructions (requirements)
- task.toml - Complete configuration with all required sections (requirements)
- environment/Dockerfile - Builds successfully; dependencies and base image pinned (requirements)
- solution/solve.sh - Deterministic, human-written solution (requirements)
- tests/test.sh - Uses uv, produces reward file (requirements)
- tests/test_outputs.py - Tests with docstrings that verify behavior
- optional milestones.md - Include only if the task has milestones. Give a short, 1-2 sentence description of each milestone.
- optional solution/solve1.sh...solution/solveN.sh - Include only if the task has milestones. Each solution is scoped to milestone X only.
- optional tests/test_m1.py...tests/test_mN.py - Include only if the task has milestones. Each file scores only the completion of milestone X.
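The required-file list above can be checked mechanically before zipping. The following is a minimal sketch, assuming the task lives in a local directory; `missing_required_files` is a hypothetical helper, not part of any official tooling, and it deliberately skips the optional milestone files.

```python
from pathlib import Path

# File names taken from the Required Files list; the optional milestone
# files are not checked here.
REQUIRED_FILES = [
    "instruction.md",
    "task.toml",
    "environment/Dockerfile",
    "solution/solve.sh",
    "tests/test.sh",
    "tests/test_outputs.py",
]


def missing_required_files(task_dir: Path) -> list[str]:
    """Return the required files that are absent from task_dir (hypothetical helper)."""
    return [name for name in REQUIRED_FILES if not (task_dir / name).is_file()]
```

An empty return value means every required file is present; it says nothing about whether the contents meet the requirements.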
Rubric
- Every submission should include a rubric that is aligned to the task.
- You generate a synthetic rubric via the submission UI in the Cognyzer Platform, then edit it for accuracy and completeness.
- The rubric must include at least three criteria that assign negative rewards (for example, -1).
- See the Rubrics page for workflow details and quality criteria.
Quality Standards
- All requirements have corresponding tests
- All tests verify described requirements, with complete coverage of the prompt (explicit requirements, implicit requirements, and edge cases)
- Anti-cheating measures in place (no hints or exposed answers)
- Tests check behavior, not implementation
- Complies with Quality Guidelines
Automated Checks
Oracle Agent
harbor run -a oracle -p <task-folder>
- Oracle agent PASSES
CI Checks
harbor tasks check <task-folder> -m openai/@openai/gpt-5.2
- pinned_dependencies ✓
- typos ✓
- tests_or_solution_in_image ✓
- test_deps_in_image ✓
- check_canary ✓
- check_dockerfile_references ✓
- check_test_sh ✓
- check_task_absolute_path ✓
- check_privileged_containers ✓
- ruff ✓
- check_task_sizes ✓
- validate_task_fields ✓
LLMaJ Checks
- behavior_in_task_description ✓
- behavior_in_tests ✓
- informative_test_docstrings ✓
- anti_cheating_measures ✓
- structured_data_schema ✓
- hardcoded_solution ✓
- file_reference_mentioned ✓
Real Agent Testing
Run Against GPT-5.2
harbor run -a terminus-2 -m openai/@openai/gpt-5.2 -p <task-folder>
- Run 1: PASS / FAIL
- Run 2: PASS / FAIL
- Run 3: PASS / FAIL
Run Against Claude Opus 4.6
harbor run -a terminus-2 -m anthropic/@anthropic/claude-opus-4-6 -p <task-folder>
- Run 1: PASS / FAIL
- Run 2: PASS / FAIL
Difficulty Calculation
| Difficulty | Pass Rate | Description |
|---|---|---|
| Hard | <= 20% | Requires deep expertise, multi-step reasoning |
| Medium | 21-80% | Moderate complexity, some domain knowledge |
| Easy | > 80% | Straightforward but still challenging |
- Best pass rate: %
- Difficulty: Easy / Medium / Hard
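The table's thresholds can be applied as in the sketch below. Two assumptions are made here that the checklist does not state: "best pass rate" is read as the highest pass rate across the tested models, and any rate above 20% and up to 80% is treated as Medium. Both function names are hypothetical.

```python
def pass_rate(results: list[bool]) -> float:
    """Return the percentage of passing runs for one model."""
    if not results:
        raise ValueError("at least one run is required")
    return 100.0 * sum(results) / len(results)


def difficulty(best_pass_rate: float) -> str:
    """Map a pass rate (in percent) to the difficulty labels in the table."""
    if best_pass_rate <= 20:
        return "Hard"
    if best_pass_rate <= 80:
        return "Medium"
    return "Easy"


# Example: one of three GPT runs passed, neither Claude run passed.
gpt_runs = [False, True, False]
claude_runs = [False, False]
best = max(pass_rate(gpt_runs), pass_rate(claude_runs))
label = difficulty(best)  # "Medium"
```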
Final Review
Use the Reviewer Checklist to validate your task against the same criteria reviewers use before you submit.
Self-Check Questions
- Would I understand this task as a first-time reader?
  - If no → Clarify instructions
- Are there any ambiguous requirements?
  - If yes → Make them explicit
- Could an agent cheat on this task?
  - If yes → Add anti-cheating measures
- Do tests verify actual behavior?
  - If no → Rewrite to test behavior
- Is the solution deterministic?
  - If no → Add seeds, remove randomness
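For the determinism question, one common way to "add seeds" in a Python step of a solution is a locally seeded generator. This is a minimal sketch; `sample_ids` and the seed value are hypothetical, and a real solution may not need randomness at all.

```python
import random


def sample_ids(population: list[int], k: int, seed: int = 1234) -> list[int]:
    """Pick k items reproducibly by using a locally seeded generator."""
    rng = random.Random(seed)  # avoids depending on global random state
    # Sorting removes any dependence on the order in which items were drawn.
    return sorted(rng.sample(population, k))
```

Calling the function twice with the same arguments returns the same list, which is what a deterministic solution needs.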
Submission Method
- Created ZIP of files (not folder)
- All required files included in ZIP
- Uploaded to terminus-project-v2 on Cognyzer Expert Platform
- Metadata filled in
See Platform Submission Guide for detailed submission steps.
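The "ZIP of files (not folder)" item is read here as meaning that the task files sit at the root of the archive rather than inside a top-level folder; that reading is an assumption, and the Platform Submission Guide is authoritative. Under that assumption, a minimal sketch with a hypothetical helper `zip_task_contents`:

```python
import zipfile
from pathlib import Path


def zip_task_contents(task_dir: Path, zip_path: Path) -> list[str]:
    """Archive the files inside task_dir so they sit at the ZIP root."""
    names = []
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for file in sorted(task_dir.rglob("*")):
            # Skip directories and the archive itself if it lives in task_dir.
            if not file.is_file() or file.resolve() == zip_path.resolve():
                continue
            # Paths are stored relative to task_dir, so there is no wrapping folder.
            arcname = file.relative_to(task_dir).as_posix()
            archive.write(file, arcname)
            names.append(arcname)
    return names
```

The returned list of archive names can be compared against the required files before uploading.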
Ready?
If you've completed all items above, upload your ZIP file to the Cognyzer Expert Platform.
Good luck! 🎉
Need Help?
- Slack: #ec-terminus-submission
- Reviewer Checklist
- Troubleshooting guide
- FAQ