chchenhui/mlrbench

Open Source

Evaluation Benchmarks Updated Apr 1, 2026

🔄 Updated Apr 2026 🖥️ Self-hostable

Overview

chchenhui/mlrbench is an AI agent in the Evaluation Benchmarks category. — MLR-Bench: Evaluating AI agents on open-ended ML research. 201 tasks from NeurIPS/ICLR/ICML workshops.

Problem It Solves

This tool addresses challenges in the evaluation benchmarks domain.

Target Audience: Developers and teams working with evaluation benchmarks automation.

Inputs

• User configuration
• API credentials (if required)
• Task parameters

Outputs

• Automated task results
• Status reports
• Generated content or actions

Example Workflow

1 User configures the agent with required parameters
2 Agent receives input data or trigger
3 Agent processes the request using its core logic
4 Agent interacts with external services if needed
5 Results are returned to the user

Sample System Prompt


              You are chchenhui/mlrbench, an AI assistant. Help the user accomplish their task efficiently.

Tools & Technologies

LLM APIs Python

Alternatives

• AutoGPT
• LangChain Agents
• CrewAI

FAQs

Is this agent open-source?: Yes
Can this agent be self-hosted?: Yes
What skill level is required?: Intermediate

Rate This Agent

Your rating:

Reviews

Loading reviews...