Submit Conversions
This page shows how to package the results of a ScarfBench evaluation and submit them as a pull request to the repository github.com/scarfbench/submit.git.
We assume you have the scarf CLI installed (see Setup) and that you’ve run an evaluation producing an --eval-out directory containing the conversion results.
Overview
- Prepare the conversion artifacts from your evaluation output.
- Create a dedicated branch and add the conversion files and metadata.
- Push the branch to github.com/scarfbench/submit.git and open a pull request.
Prepare the artifacts (all eval runs)
If you want to submit the entire evaluation output tree (for example /tmp/eval_out) as-is, copy the whole eval_out directory into your submission payload. This is the simplest approach when you want to preserve every run and its context.
```shell
# Root containing one or more agent eval directories (each may contain run_* subdirs)
EVAL_ROOT=/tmp/eval_out
PAYLOAD=/tmp/submit-payload
mkdir -p "$PAYLOAD"

# Copy the whole eval_out tree into the payload
cp -a "$EVAL_ROOT" "$PAYLOAD/eval_out"
```

This places the full eval_out hierarchy under /tmp/submit-payload/eval_out. Verify the payload contains only evaluation artifacts and no secrets or files external to the eval output.
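One way to act on that check is to scan the payload for strings that look like credentials before committing. A minimal sketch with illustrative patterns only (AWS-style key IDs and PEM private-key headers here are examples; extend the expression for your environment):

```shell
# Flag files whose contents look like credentials (patterns are examples only)
grep -rEl 'AKIA[0-9A-Z]{16}|-----BEGIN [A-Z ]*PRIVATE KEY-----' \
  /tmp/submit-payload/eval_out \
  && echo "possible secrets found; review before committing" \
  || echo "no obvious secrets found"
```

This is a coarse filter, not a substitute for reviewing the diff before pushing.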
Create a branch and commit
```shell
cd /tmp/submit-payload
git init
git remote add origin https://github.com/scarfbench/submit.git
git checkout -b submit/$(date +%Y%m%d)-all-evals
git add .
git commit -m "Submit: full eval_out submission ($(date +%Y%m%d))"
```

If you have a fork or push privileges, push the branch. If not, fork the repo on GitHub and change the origin remote to your fork URL before pushing.
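The remote swap mentioned above can look like the following; YOUR_USER is a placeholder for your GitHub username:

```shell
# Point origin at your fork instead of the upstream repo (YOUR_USER is a placeholder)
git remote set-url origin "https://github.com/YOUR_USER/submit.git"
git remote -v  # confirm the new URL before pushing
```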
```shell
git push origin HEAD
```

Open the pull request
Open a pull request on GitHub against github.com/scarfbench/submit.git with a descriptive title and body. For multi-run submissions, include:
- Which conversion pairs are included (list or summarize).
- Which agents produced the conversions (agent names from agent.toml).
- Paths to the eval runs (examples under /tmp/eval_out/.../run_*).
- Any validation results or notes (attach validation.log files or paste relevant excerpts).
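To keep that list accurate, the PR body can be drafted from the payload itself, e.g. by enumerating the run directories. A sketch (the output path /tmp/pr_body.md and the agent name are illustrative); the result can be pasted into GitHub's web form or passed to `gh pr create --body-file`:

```shell
# Draft a PR body that lists every run directory found in the payload
PAYLOAD=/tmp/submit-payload
{
  echo "Agents included: codex-cli"
  echo "Runs in this submission:"
  find "$PAYLOAD/eval_out" -type d -name 'run_*' 2>/dev/null | sort | sed 's/^/- /'
} > /tmp/pr_body.md
```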
Suggested PR title and body for a full eval_out submission:
Title: Submit conversions: full eval_out submission (agents: codex-cli, ...)
Body:

- Agents included: codex-cli, <other-agents>
- Eval root: `eval_out/` (full evaluation output included under `eval_out/`)
- Contents: complete evaluation outputs including per-run `input/`, `output/`, `validation/`, and `metadata.json`
- Validation: see `validation/*.run.log` files inside the `eval_out` tree

Notes:

- This submission contains the entire `eval_out` directory produced by `scarf eval run`. Reviewers can inspect individual runs under `eval_out/<agent>/<run_*>`.

Verify submission locally
Before opening the PR, you can run local validation steps to sanity-check the submission. For example, validate each run's conversions (replace paths accordingly):
```shell
# Validate each run directory found in the payload
scarf validate -vv --conversions-dir /tmp/submit-payload/eval_out --benchmark-dir /path/to/benchmark
```

What to include and what to avoid
- Include: the conversions/ tree, metadata.json, validation.log, and a short README describing how the conversion was produced.
- Avoid: any secrets, API keys, or large binaries. Only include what is necessary for reviewers to validate the conversion.
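The "no large binaries" rule is easy to check mechanically. A sketch with an arbitrary 5 MB threshold:

```shell
# List files above ~5 MB in the payload; review or drop them before committing
find /tmp/submit-payload -type f -size +5M -exec ls -lh {} \;
```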
Review checklist for maintainers
- Confirm agent metadata (metadata.json) identifies the agent and run.
- Confirm the conversion tree builds or runs basic tests (where possible).
- Check validation.log for failing tests or obvious errors.
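The last check can be scripted. The FAIL/ERROR markers below are assumptions about what the logs contain, so adjust the patterns to the actual log format:

```shell
# Surface failure markers across all logs in a submission
grep -rniE 'fail|error' /tmp/submit-payload/eval_out --include='*.log' \
  && echo "review the matches above" \
  || echo "no failure markers found"
```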