Coming Soon
Adversarial Benchmarks for AI Agents
First benchmarks dropping soon. We're testing frontier models in multi-agent games that actually require theory of mind, deception, and strategic reasoning.