Security Research Competition 2026

AgentCTF × AgentXploit

Design AI agents that autonomously identify and exploit real-world vulnerabilities across diverse application environments — including other AI agents — in a Capture The Flag format.

Submission Deadline: 23:59 AoE  •  March 20, 2026

What is AgentCTF?

Organized by the Berkeley RDI Center, AgentCTF × AgentXploit challenges participants to build AI agents capable of identifying and exploiting real-world vulnerabilities. Target frameworks include LangChain, AutoGPT, and other widely used AI and web application frameworks, with tasks sourced from publicly disclosed CVEs.

The evaluation pipeline follows the AAA (Agentified Agent Assessment) paradigm. A portion of the tasks is released as a development set so participants can iterate locally before the official evaluation.

🤖

Build an Exploit Agent

Implement an AI agent with an A2A interface that autonomously reasons about and exploits CVEs.

🛡

Real-World CVEs

Tasks are drawn from publicly disclosed vulnerabilities across 20+ popular AI and web application frameworks.

⚖️

Dual Evaluation

Agents are scored on both the released dev set and a hidden test set, with full runs replayed for verification.

Submission Guidelines

1. Read the materials & fork the repository

Review the AAA evaluation paradigm documentation and fork the GitHub repository to your own account.

2. Implement your agent

Build your exploit agent with an A2A interface inside ./src/white_agent/. Only modify files in that directory and pyproject.toml. Do not alter the Green Agent or task configurations — violations may result in disqualification.
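To make the step concrete, here is a minimal sketch of what a white agent's entry point might look like. The class and method names (`ExploitAgent`, `handle_task`) and the message shape are assumptions for illustration only; they are not the competition's actual A2A API, which is defined in the repository.

```python
# Hypothetical white-agent skeleton. The names ExploitAgent and
# handle_task, and the dict-based message format, are illustrative
# assumptions -- consult the repository for the real A2A interface.
import json


class ExploitAgent:
    """Minimal sketch: receive a task description, reason about the
    target CVE, and return an exploit attempt."""

    def __init__(self, model: str):
        # Model name as provisioned via the .env configuration,
        # e.g. "litellm_proxy/openai/gpt-4o".
        self.model = model

    def handle_task(self, message: dict) -> dict:
        # A real agent would call the LLM here to plan, generate,
        # and execute an exploit against the target environment.
        target = message.get("target", "unknown")
        return {
            "status": "attempted",
            "target": target,
            "payload": f"exploit plan for {target}",
        }


if __name__ == "__main__":
    agent = ExploitAgent(model="litellm_proxy/openai/gpt-4o")
    reply = agent.handle_task({"target": "CVE-2024-0000"})
    print(json.dumps(reply))
```

The key constraint from the guidelines still applies: all of this logic must live under `./src/white_agent/`, with dependency changes confined to `pyproject.toml`.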

3. Test locally & bundle

Run the full dev-set evaluation, then bundle results with the provided CLI. Total submission size must be under 1 MB — do not include model weights or large files. The bundle captures the latest run-all results; do not modify them after bundling.
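Before uploading, it is worth verifying the 1 MB limit locally. The helper below is an illustrative sketch, not part of the official CLI; only the `submission.zip` name and the 1 MB figure come from the guidelines.

```python
# Illustrative pre-flight check: confirm the bundle is under the
# 1 MB submission limit. Not part of the official tooling.
import os

MAX_BYTES = 1 * 1024 * 1024  # 1 MB limit from the guidelines


def check_bundle(path: str) -> bool:
    """Return True if the bundle at `path` fits within the limit."""
    size = os.path.getsize(path)
    print(f"{path}: {size} bytes ({size / MAX_BYTES:.1%} of limit)")
    return size <= MAX_BYTES
```

If the check fails, the usual culprits are model weights, virtual environments, or cached artifacts accidentally included in the archive.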

4. Submit via Google Form by 23:59 AoE, March 20, 2026

Upload your submission.zip. The official evaluation will rerun your submission to verify the authenticity of your results.

Scoring Policy

Submissions are evaluated against both the released dev set and a hidden test set. Specify which LLM you used so organizers can provision appropriate model access. Supported models include openai/*, gemini/*, and vertex_ai/claude-*.

Budget: $10 LiteLLM API credit per task
Time Limit: 5 minutes per task to generate an exploit
Test Sets: Dev set (public) + hidden test set
Verification: Official evaluation reruns all results; do not modify bundles post-submission
Model Access: openai/*  gemini/*  vertex_ai/claude-*
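Since the harness enforces a hard 5-minute limit per task, it can help to bound your own exploit loop with a deadline so the agent returns a best-effort result instead of being cut off. The sketch below is illustrative; `run_with_deadline` and `step_fn` are assumed names, not part of the evaluation harness.

```python
# Illustrative deadline wrapper for staying inside the 5-minute
# per-task limit. The function names are assumptions, not part of
# the official harness.
import time

TASK_TIMEOUT_S = 5 * 60  # 5 minutes per task, per the scoring policy


def run_with_deadline(step_fn, timeout_s: float = TASK_TIMEOUT_S):
    """Call step_fn repeatedly until it returns a non-None result
    or the deadline passes; return None on timeout."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        result = step_fn()
        if result is not None:
            return result
    return None  # out of time; the task would score as failed
```

Budgeting works the same way: tracking token spend against the $10 per-task credit inside the loop avoids exhausting it mid-exploit.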
.env configuration
# Provided via .env for each task evaluation
LITELLM_PROXY_API_KEY=sk-xxxxx
LITELLM_PROXY_API_BASE=...

# Specify your model (prefix with litellm_proxy/ in most cases)
LITELLM_MODEL=litellm_proxy/openai/gpt-4o
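A minimal sketch of how an agent might read these values at runtime is shown below. Whether you call the proxy through an OpenAI-compatible client or through litellm directly is up to you; this only shows the wiring, and the placeholder values are illustrative.

```python
# Sketch: read the per-task .env values into a client config.
# The setdefault placeholders stand in for the real values that
# the evaluation environment provides.
import os

os.environ.setdefault("LITELLM_PROXY_API_KEY", "sk-xxxxx")       # placeholder
os.environ.setdefault("LITELLM_PROXY_API_BASE", "http://proxy")  # placeholder
os.environ.setdefault("LITELLM_MODEL", "litellm_proxy/openai/gpt-4o")

config = {
    "api_key": os.environ["LITELLM_PROXY_API_KEY"],
    "base_url": os.environ["LITELLM_PROXY_API_BASE"],
    "model": os.environ["LITELLM_MODEL"],
}
print(config["model"])
```

Note the `litellm_proxy/` prefix on the model name, which the instructions say is needed in most cases.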

Getting Started

Full setup instructions, dependency requirements, and usage examples are available in the repository README.

Responsible Disclosure

⚠️

This framework is intended for educational and research purposes only. All included CVEs are publicly disclosed vulnerabilities. Participants must adhere to responsible disclosure policies and may not use techniques or artifacts from this competition outside of the authorized evaluation environment.