
Challenge Overview 🚀

The RoboMME Challenge evaluates how well memory-augmented robotic generalist policies can handle truly long-horizon, history-dependent manipulation tasks. It is hosted as part of the Foundation Models Meet Embodied Agents (FMEA) Workshop at CVPR 2026.

Teams will be ranked on the RoboMME Challenge leaderboard, and top entries will be highlighted live during the workshop session.
๐Ÿ… The 1st, 2nd, and 3rd place winners will receive prizes of $500, $300, and $200, respectively.

What You Do 🧪

  1. ๐Ÿ” Train your models on the released RoboMME training data.
  2. ๐Ÿ“Š Evaluate your models using the RoboMME benchmark tools on the open-source validation and test episodes .
  3. ๐Ÿš€ Submit your policy and prepare your policy server. More details are available here.
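To make step 3 concrete, here is a minimal sketch of what a policy server could look like, assuming a simple JSON-over-HTTP protocol. The endpoint behavior, payload fields (`action_dim`, `action`), and zero-action placeholder are all illustrative assumptions, not the actual RoboMME interface; consult the linked submission details for the real protocol.

```python
# Minimal policy-server sketch (hypothetical protocol, not the official
# RoboMME interface): the evaluator POSTs a JSON observation, the server
# replies with a JSON action.
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


class PolicyHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        # Read the observation payload sent by the evaluator.
        length = int(self.headers.get("Content-Length", 0))
        obs = json.loads(self.rfile.read(length))
        # Placeholder policy: return a zero action of the requested
        # dimension. A real submission would run model inference here.
        action = [0.0] * obs.get("action_dim", 7)
        body = json.dumps({"action": action}).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        # Keep the console quiet during evaluation.
        pass


def serve(port=8000):
    """Block and serve policy requests on the given port."""
    HTTPServer(("0.0.0.0", port), PolicyHandler).serve_forever()
```

The key design point is statelessness per request: the evaluator drives the rollout and the server only maps observations to actions, which keeps remote evaluation robust to reconnects.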

Timeline โฐ

  • March–May 15 – Develop your policy and test your policy server.
  • May 15 – Deadline to submit your participant information.
  • Before May 22 (Phase 1 Validation) – We verify the stability and correctness of your Docker image, remote server, or code repo.
  • May 23 – Deadline to finalize your models and deployment.
  • May 23–June 2 (Phase 2 Full Evaluation) – We evaluate on held-out episodes for teams that passed Phase 1.
  • June 3 – Winner announcement at the FMEA Workshop at CVPR 2026.

Challenge Leaderboard ๐Ÿ†

Phase 1 – Validation 🔍

This phase aims to validate the stability and correctness of the policy server. The evaluation script is available here.
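The kind of checks this phase performs can be sketched as a small client-side smoke test: reachability, response shape, and latency. This is not the official evaluation script (which is linked above); the URL, payload fields, and expected response are assumptions for illustration.

```python
# Hypothetical smoke test for a policy server ahead of Phase 1 validation.
# Assumes a JSON-over-HTTP interface where POSTing an observation returns
# {"action": [...]}; the real RoboMME protocol may differ.
import json
import time
import urllib.request


def smoke_test(url, action_dim=7, timeout=10.0):
    """POST a dummy observation and return the round-trip latency in seconds.

    Raises if the server is unreachable or the response is malformed.
    """
    payload = json.dumps({"action_dim": action_dim}).encode()
    req = urllib.request.Request(
        url, data=payload, headers={"Content-Type": "application/json"}
    )
    start = time.monotonic()
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        result = json.loads(resp.read())
    latency = time.monotonic() - start
    # Validate the response shape the evaluator would rely on.
    assert "action" in result, "server must return an 'action' field"
    assert len(result["action"]) == action_dim, "unexpected action dimension"
    return latency
```

Running a loop of such requests against your own deployment before the deadline is a cheap way to catch connectivity and serialization bugs early.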


Phase 2 – Evaluation 🧮

We will run the full evaluation on held-out episodes. Final rankings will be determined by this full evaluation.


FAQ 📚

Q: What's the difference between the RoboMME Challenge leaderboard and the regular RoboMME leaderboard?

A: The regular leaderboard is evaluated by participants themselves, who run the evaluation and open a pull request to update their results. Submissions can be made at any time. The RoboMME Challenge leaderboard is evaluated by the organizers using held-out test episodes (a total of 800 episodes), and the results are updated and maintained on this page. Submissions are only accepted during the challenge period.

Q: Can I use external data, other VLA backbones, LLM APIs, etc.?

A: Yes. Any methods or resources are allowed, but you may not use the RoboMME repository itself to generate additional training data, as this would be unfair. You are welcome to use training data beyond RoboMME, as long as all external resources are clearly described in your method description.

Q: Can I use human-in-the-loop methods during testing?

A: No. Participants must not manually intervene in policy rollouts, as this would unfairly influence the evaluation results.

Q: Can I write rules or design prompts to improve policy performance?

A: The goal of RoboMME is to evaluate robotic generalist policies, so we discourage hard-coded, task-specific rules or prompts written solely to boost performance on particular tasks.

Q: Is there a team size limit?

A: There is no strict limit, but each team should register under a single team name and submit only one model.

Q: Will top teams need to provide extra details?

A: Yes. Top teams will be asked to share a brief method description and reproducibility details.

Q: Can I present my work at the workshop?

A: Workshop presentations follow the official procedure. The challenge is independent of the workshop paper track, so if you want to present at the workshop, you must submit your paper separately.

Q: Where can I find baselines and the starter kit?

A: Baselines, environment setup, and evaluation scripts are available in the official RoboMME repository and the RoboMME policy learning repository. To get started, download the MME-VLA checkpoints from Hugging Face and evaluate them locally using the code in the repository.

Q: What if my internet connection is unstable for remote evaluation?

A: We will work with you to complete setup before Phase 2 starts, so we recommend submitting early to leave enough time to debug connection issues. If your own server cannot expose a public IP, you can rent a cloud server (e.g., Lambda Labs) that provides one, or choose another deployment option.

Q: How can I contact the organizers if I have issues?

A: For any RoboMME Challenge-related questions, please email robomme2026@gmail.com. You can also join our mailing list, robomme-cvpr-challenge-2026@googlegroups.com, to receive the latest updates. For real-time discussion, please join the WeChat and Discord channels linked above.

Acknowledgement 🙏

We sincerely thank the Foundation Models Meet Embodied Agents workshop at CVPR 2026 for hosting this challenge, and we are grateful to our sponsors for their support.

Sponsor

Figure AI