Evaluation Benchmarks
chchenhui/mlrbench is an AI agent in the Evaluation Benchmarks category. — MLR-Bench: Evaluating AI agents on open-ende...
gersteinlab/ML-Bench is an AI agent in the Evaluation Benchmarks category. — Evaluates LLMs and agents for ML tasks on ...
openai/mle-bench is an AI agent in the Evaluation Benchmarks category. — OpenAI's benchmark for measuring how well AI a...
snap-stanford/MLAgentBench is an AI agent in the Evaluation Benchmarks category. — Benchmark suite for evaluating AI ag...
THUDM/AgentBench is an AI agent in the Evaluation Benchmarks category. — Comprehensive benchmark for LLM-as-Agent evalu...