Cognyzer Terminus-2.0 Docs

Submission Checklist

Use this checklist before every submission to ensure your task is complete and will pass review.

For complete specifications of each component, see Task Requirements.


Pre-Submission Verification

Task Design

  • Problem statement is clear and unambiguous
  • All requirements are explicitly stated
  • Uses absolute paths (e.g., /app/file.txt)
  • Output files are named in instructions
  • Data schemas are fully specified
  • Difficulty target: < 80% pass rate

Required Files

  • instruction.md - Clear, human-written instructions (requirements)
  • task.toml - Complete configuration with all required sections (requirements)
  • environment/Dockerfile - Builds successfully; dependencies and base image pinned (requirements)
  • solution/solve.sh - Deterministic, human-written solution (requirements)
  • tests/test.sh - Uses uv, produces reward file (requirements)
  • tests/test_outputs.py - Tests with docstrings, verify behavior
  • milestones.md (optional) - Include only if the task has milestones. Give a short, 1-2 sentence description of each milestone.
  • solution/solve1.sh ... solution/solveN.sh (optional) - Include only if the task has milestones. Each solveX.sh is scoped to milestone X only.
  • tests/test_m1.py ... tests/test_mN.py (optional) - Include only if the task has milestones. Each test_mX.py scores completion of milestone X only.
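
The file list above can be checked mechanically before you zip anything. The following is a minimal sketch, not part of the harbor tooling: the function name `missing_required_files` is hypothetical, and it only confirms that each required path exists, not that its contents meet the requirements.

```python
from pathlib import Path

# Required paths, copied from the checklist above.
REQUIRED_FILES = [
    "instruction.md",
    "task.toml",
    "environment/Dockerfile",
    "solution/solve.sh",
    "tests/test.sh",
    "tests/test_outputs.py",
]


def missing_required_files(task_folder):
    """Return the required paths that do not exist under task_folder."""
    root = Path(task_folder)
    return [rel for rel in REQUIRED_FILES if not (root / rel).is_file()]
```

Calling `missing_required_files("<task-folder>")` returns an empty list when every required file is present. Milestone files are left out on purpose, since they are optional.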

Rubric

  • Every submission should include a rubric that is aligned to the task.
  • You generate a synthetic rubric via the submission UI in the Cognyzer Platform, then edit it for accuracy and completeness.
  • The rubric must include at least three criteria that assign negative rewards (for example, -1).
  • See the Rubrics page for workflow details and quality criteria.
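
The negative-reward rule is easy to verify once the rubric is in a structured form. The sketch below assumes a rubric represented as a list of dictionaries with `description` and `reward` keys. That shape is an illustration only; the platform's actual rubric format may differ.

```python
def count_negative_criteria(rubric):
    """Count criteria whose reward is below zero."""
    return sum(1 for criterion in rubric if criterion["reward"] < 0)


def has_enough_negative_criteria(rubric, minimum=3):
    """Check the checklist rule: at least `minimum` negative-reward criteria."""
    return count_negative_criteria(rubric) >= minimum


# Hypothetical rubric, for illustration only.
example_rubric = [
    {"description": "Writes the output file to the stated path", "reward": 2},
    {"description": "Hardcodes the expected answer", "reward": -1},
    {"description": "Modifies the test files", "reward": -1},
    {"description": "Leaves the output file empty", "reward": -1},
]
print(has_enough_negative_criteria(example_rubric))  # prints True
```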

Quality Standards

  • All requirements have corresponding tests
  • All tests verify described requirements, with complete coverage of the prompt (explicit requirements, implicit requirements, and edge cases)
  • Anti-cheating measures in place (no hints or exposed answers)
  • Tests check behavior, not implementation
  • Complies with Quality Guidelines

Automated Checks

Oracle Agent

harbor run -a oracle -p <task-folder>
  • Oracle agent PASSES

CI Checks

harbor tasks check <task-folder> -m openai/@openai/gpt-5.2
  • pinned_dependencies ✓
  • typos ✓
  • tests_or_solution_in_image ✓
  • test_deps_in_image ✓
  • check_canary ✓
  • check_dockerfile_references ✓
  • check_test_sh ✓
  • check_task_absolute_path ✓
  • check_privileged_containers ✓
  • ruff ✓
  • check_task_sizes ✓
  • validate_task_fields ✓

LLMaJ (LLM-as-a-Judge) Checks

  • behavior_in_task_description ✓
  • behavior_in_tests ✓
  • informative_test_docstrings ✓
  • anti_cheating_measures ✓
  • structured_data_schema ✓
  • hardcoded_solution ✓
  • file_reference_mentioned ✓

Real Agent Testing

Run Against GPT-5.2

harbor run -a terminus-2 -m openai/@openai/gpt-5.2 -p <task-folder>
  • Run 1: PASS / FAIL
  • Run 2: PASS / FAIL
  • Run 3: PASS / FAIL

Run Against Claude Opus 4.6

harbor run -a terminus-2 -m anthropic/@anthropic/claude-opus-4-6 -p <task-folder>
  • Run 1: PASS / FAIL
  • Run 2: PASS / FAIL

Difficulty Calculation

Difficulty   Pass Rate   Description
Hard         <= 20%      Requires deep expertise, multi-step reasoning
Medium       21-80%      Moderate complexity, some domain knowledge
Easy         > 80%       Straightforward but still challenging
  • Best pass rate: ___%
  • Difficulty: Easy / Medium / Hard
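
The calculation can be sketched as follows. Two assumptions are made here that the table does not state: "best pass rate" is taken to mean the highest per-model pass rate, and a rate that falls between 20% and 21% is treated as Medium. The run outcomes are made-up examples.

```python
def pass_rate(results):
    """Percentage of passing runs; results is a list of booleans."""
    if not results:
        raise ValueError("at least one run is required")
    return 100.0 * sum(results) / len(results)


def difficulty(best_rate):
    """Map a pass rate (in percent) to the labels in the table above."""
    if best_rate <= 20:
        return "Hard"
    if best_rate <= 80:
        return "Medium"
    return "Easy"


# Hypothetical run outcomes, one list per model (True = PASS).
runs = {
    "gpt-5.2": [True, False, False],
    "claude-opus-4-6": [False, True],
}
best = max(pass_rate(r) for r in runs.values())
print(f"Best pass rate: {best:.0f}% -> {difficulty(best)}")
# prints: Best pass rate: 50% -> Medium
```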

Final Review

Use the Reviewer Checklist to validate your task against the same criteria reviewers use before you submit.

Self-Check Questions

  1. Would I understand this task as a first-time reader? - If no → Clarify instructions

  2. Are there any ambiguous requirements? - If yes → Make them explicit

  3. Could an agent cheat on this task? - If yes → Add anti-cheating measures

  4. Do tests verify actual behavior? - If no → Rewrite to test behavior

  5. Is the solution deterministic? - If no → Add seeds, remove randomness
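
For question 5, one common way to remove randomness from a Python helper is to use a locally seeded generator. This is a generic sketch; the function `sample_rows` and its parameters are hypothetical and not taken from any task.

```python
import random


def sample_rows(rows, k, seed=0):
    """Pick k rows reproducibly using a locally seeded generator."""
    rng = random.Random(seed)  # local generator; global random state is untouched
    return rng.sample(rows, k)
```

With the same `rows`, `k`, and `seed`, repeated calls return the same selection, so a solution built on it produces the same output on every run.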


Submission Method

  • Created a ZIP of the task files themselves (not of the enclosing folder)
  • All required files included in ZIP
  • Uploaded to terminus-project-v2 on Cognyzer Expert Platform
  • Metadata filled in
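
One way to build such an archive is sketched below using Python's standard `zipfile` module. It assumes that "files, not folder" means the task files should sit at the root of the archive rather than inside a wrapping directory. The function name `zip_task_contents` is hypothetical.

```python
import zipfile
from pathlib import Path


def zip_task_contents(task_folder, zip_path):
    """Zip every file under task_folder, storing paths relative to task_folder."""
    root = Path(task_folder)
    zip_path = Path(zip_path)
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for path in sorted(root.rglob("*")):
            # Skip directories and the archive itself if it lives inside root.
            if path.is_file() and path.resolve() != zip_path.resolve():
                archive.write(path, arcname=path.relative_to(root).as_posix())
    return zip_path
```

After running it, `zipfile.ZipFile(zip_path).namelist()` should show entries such as `instruction.md` and `tests/test.sh`, with no leading folder name.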

See Platform Submission Guide for detailed submission steps.


Ready?

If you've completed all items above, upload your ZIP file to the Cognyzer Expert Platform.

Good luck! 🎉


Need Help?

  • Slack: #ec-terminus-submission
  • Reviewer Checklist
  • Troubleshooting guide
  • FAQ