Hugging Face open-sources ml-intern, an ML research agent that automatically reads papers, selects data, and runs training


ME News report, April 22 (UTC+8). According to monitoring by Dongcha Beating, Hugging Face has open-sourced ml-intern, an ML research agent that can autonomously complete the full workflow of “reading papers, organizing datasets, launching GPU training, evaluating results, and iterating for improvement.” The project is built on Hugging Face's own smolagents framework and offers two entry points, a CLI and a web interface. The code is available on GitHub.

ml-intern’s toolchain is built around the Hugging Face ecosystem. It searches for papers on arXiv and HF Papers and reads in depth along citation chains. It browses datasets on the HF Hub, checks their quality, and reformats them before using them for training. When no GPU is available locally, it can call HF Jobs to start cloud training jobs. After training finishes, it automatically reads the evaluation outputs, diagnoses the causes of any failures, and reruns training.

By default, it uses Claude Sonnet 4.5 to drive the decision-making loop, with up to 300 iterations per run. Contexts exceeding 170k tokens are automatically compressed.
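
The loop described above, with an iteration cap and context compression, can be sketched roughly as follows. This is a minimal illustration and not ml-intern's actual code: only the two constants come from the article, while `count_tokens`, `compact`, `run_agent`, and the `step` callback are hypothetical names. The token count is a crude whitespace word count and the "compression" is a placeholder summary string, where a real agent would use the model's tokenizer and an LLM-written summary.

```python
from typing import Callable, List, Tuple

MAX_ITERATIONS = 300         # per-run cap reported in the article
COMPACT_THRESHOLD = 170_000  # token threshold reported in the article


def count_tokens(messages: List[str]) -> int:
    # Hypothetical stand-in: whitespace word count, not a real tokenizer.
    return sum(len(m.split()) for m in messages)


def compact(messages: List[str], keep_last: int = 4) -> List[str]:
    # Hypothetical compression: replace older messages with a single
    # placeholder summary and keep the most recent ones verbatim.
    if len(messages) <= keep_last:
        return messages
    older = messages[:-keep_last]
    summary = f"[summary of {len(older)} earlier messages]"
    return [summary] + messages[-keep_last:]


def run_agent(
    step: Callable[[List[str]], Tuple[str, bool]],
    task: str,
    max_iterations: int = MAX_ITERATIONS,
    threshold: int = COMPACT_THRESHOLD,
) -> List[str]:
    # `step` stands in for one model call plus tool execution; it returns
    # the new message and a flag saying whether the task is finished.
    messages = [task]
    for _ in range(max_iterations):
        if count_tokens(messages) > threshold:
            messages = compact(messages)
        reply, done = step(messages)
        messages.append(reply)
        if done:
            break
    return messages
```

The cap bounds cost when the agent never reaches a stopping condition, and compaction keeps the prompt from growing without limit over a long run.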

In its release post, Hugging Face provided three case studies. For scientific reasoning tasks, the agent found the OpenScience and NemoTron-CrossThink datasets by following the citation chain of benchmark papers, produced 7 difficulty-filtered variants from ARC, SciQ, and MMLU, and ran 12 rounds of SFT on Qwen3-1.7B. The GPQA score rose from 10% to 32% in less than 10 hours.

In medical scenarios, the agent judged that the existing datasets were not of sufficient quality, wrote scripts to generate 1,100 synthetic data points, expanded the dataset 50-fold for training, and outperformed Codex by 60% on HealthBench.

In competitive math scenarios, the agent wrote its own GRPO training scripts and launched training on A100 GPUs via HF Spaces. After observing reward collapse, it ran ablation experiments to identify the causes.
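
For readers unfamiliar with GRPO, its core step is scoring each sampled answer relative to the other answers for the same prompt. The sketch below shows that group-relative normalization and one simple symptom a training script might watch for. It is illustrative only: the article does not say how the agent's script was written or what form the reward collapse took, and the function names, the use of population standard deviation, and the "no signal" check are all assumptions.

```python
from statistics import mean, pstdev
from typing import List


def group_advantages(rewards: List[float], eps: float = 1e-6) -> List[float]:
    # Group-relative advantage: subtract the group's mean reward and divide
    # by its standard deviation (population std here, by assumption).
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


def has_no_signal(rewards: List[float], tol: float = 1e-8) -> bool:
    # Hypothetical diagnostic: if every answer in a group earns the same
    # reward, all advantages are zero and the group yields no gradient.
    return pstdev(rewards) < tol


mixed = group_advantages([1.0, 0.0, 1.0, 0.0])
flat = group_advantages([0.0, 0.0, 0.0, 0.0])
```

With mixed rewards, correct answers receive positive advantages and incorrect ones negative. When all rewards in a group are identical, every advantage is zero, which is one way a reward signal can stop driving learning.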

(Source: BlockBeats)
