
Scarf (short for Self-Contained Application Refactoring) is a benchmark suite of Java applications, each implemented in Jakarta EE, Quarkus, and Spring, for evaluating agentic transformation between these frameworks. The suite enables systematic assessment of AI agents’ ability to migrate enterprise Java applications while preserving functionality, idiomatic patterns, and architectural integrity across different runtime environments.

The benchmark includes comprehensive examples ranging from focused layer-specific demonstrations to complete production-grade applications, each with verified implementations across all supported frameworks.

Manual Conversions with Developer Verification

All applications in this benchmark have been manually converted and verified by experienced developers. Each implementation has undergone rigorous testing to ensure functional correctness, adherence to framework-specific idioms, and preservation of architectural integrity across Jakarta EE, Quarkus, and Spring frameworks.

If you use ScarfBench, please consider citing it as follows.

@misc{scarfbench,
title = {ScarfBench},
author = {{IBM}},
year = {2026},
howpublished = {\url{https://github.com/scarfbench}},
note = {GitHub repository}
}

For any questions, feedback, or suggestions, please contact the authors:

Name             Email
Rahul Krishna    i.m.ralk@gmail.com
Bridget McGinn   bridget.mcginn@ibm.com
Raju Pavuluri    pavuluri@us.ibm.com