Letting AI modify its own training code: Recursive breaks three algorithm optimization records

ME AI News: according to Beating monitoring, AI startup Recursive has announced the first batch of experimental results from its automated research system. The system can generate ideas, write code, run experiments, and verify results on its own, and it surpassed the best publicly reported results on three benchmarks: fixed-budget training, the NanoGPT Speedrun, and GPU kernel optimization. The results indicate that on tasks with clear objectives and fast feedback, the system is already finding optimization opportunities that humans missed.

In the 5-minute NanoChat Autoresearch training task, the system reduced the validation loss to 0.9109 bits per byte (BPB) and cut the training time needed to reach the same loss by about 23%, a speedup of roughly 1.3x. The key change strengthens short-range context: bigrams and trigrams of tokens are hashed into a fixed-size embedding table, and the resulting embeddings are mixed into the attention value path through a learnable gate, so the model can use local information directly at very low overhead.
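The source describes the mechanism but not the code. The sketch below illustrates the general idea in plain Python under several assumptions that are not from Recursive: a single shared table for both n-gram orders, a simple polynomial hash, and a per-channel sigmoid gate. `hash_ngram` and `HashedNgramValueMixer` are hypothetical names, and the gate logits are left at zero here rather than trained.

```python
from __future__ import annotations

import math
import random

MOD = (1 << 61) - 1  # large prime modulus for the rolling hash


def hash_ngram(ngram: tuple[int, ...], table_size: int) -> int:
    """Deterministically hash a token n-gram into a bucket in [0, table_size)."""
    h = len(ngram)  # seed with the order so bigrams and trigrams hash differently
    for tok in ngram:
        h = (h * 1_000_003 + tok + 1) % MOD
    return h % table_size


class HashedNgramValueMixer:
    """Hypothetical sketch: add gated bigram/trigram embeddings to attention values."""

    def __init__(self, table_size: int, dim: int, seed: int = 0) -> None:
        rng = random.Random(seed)
        self.table_size = table_size
        # One fixed-size table shared by both n-gram orders; colliding n-grams share a row.
        self.table = [[rng.gauss(0.0, 0.02) for _ in range(dim)] for _ in range(table_size)]
        # Per-channel gate logits; a real model would learn these (here fixed at 0, gate = 0.5).
        self.gate_logits = [0.0] * dim

    def local_embedding(self, tokens: list[int], t: int) -> list[float]:
        """Sum the embeddings of the bigram and trigram ending at position t."""
        out = [0.0] * len(self.gate_logits)
        for n in (2, 3):
            if t + 1 >= n:  # skip orders that would reach before the sequence start
                bucket = hash_ngram(tuple(tokens[t + 1 - n : t + 1]), self.table_size)
                out = [o + r for o, r in zip(out, self.table[bucket])]
        return out

    def mix_values(self, tokens: list[int], values: list[list[float]]) -> list[list[float]]:
        """Return v_t + sigmoid(gate) * local_t for every position t."""
        gate = [1.0 / (1.0 + math.exp(-g)) for g in self.gate_logits]
        mixed = []
        for t, v in enumerate(values):
            local = self.local_embedding(tokens, t)
            mixed.append([vi + gi * li for vi, gi, li in zip(v, gate, local)])
        return mixed
```

Because lookups are just hashing plus a table read and an element-wise add, the extra cost per token is small next to attention itself; the trade-off is that unrelated n-grams can collide in the same bucket, which a larger table makes less likely.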

In the NanoGPT Speedrun, which the community has been optimizing for more than two years, the system cut the time to reach the target loss from 79.7 seconds to 77.5 seconds. The optimizations include extending FP8 forward computation into the attention path to raise throughput, and rewriting the fused MLP kernel so that it stores only the squared-ReLU activations and recomputes intermediate values during backpropagation, which reduces memory reads and writes.
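The article does not spell out which intermediates are recomputed. One identity that makes "store only the squared-ReLU output" workable is sketched below, assuming the activation is a = relu(h)^2: since relu(h) is non-negative, relu(h) = sqrt(a), so the gradient 2 * relu(h) can be rebuilt from the stored output and the pre-activation h never has to be saved. This is an illustrative assumption in plain Python, not Recursive's kernel.

```python
import math


def relu2_forward(h: list[float]) -> list[float]:
    """Squared ReLU a = max(h, 0)**2; `a` is the only tensor kept for backward."""
    return [max(x, 0.0) ** 2 for x in h]


def relu2_backward_from_output(a: list[float], grad_a: list[float]) -> list[float]:
    """Gradient w.r.t. the pre-activation h, recomputed from the stored output a.

    a = relu(h)**2 and relu(h) >= 0, so relu(h) = sqrt(a) and
    da/dh = 2 * relu(h) = 2 * sqrt(a). Where h <= 0, a = 0 and the gradient is 0.
    """
    return [2.0 * math.sqrt(ai) * g for ai, g in zip(a, grad_a)]


def relu2_backward_from_input(h: list[float], grad_a: list[float]) -> list[float]:
    """Reference gradient that needs the pre-activation h (what the rewrite avoids storing)."""
    return [2.0 * max(x, 0.0) * g for x, g in zip(h, grad_a)]
```

The output a is needed in the backward pass anyway, because the weight gradient of the following linear layer is computed from it, so keeping a and dropping h removes one full-size tensor from memory traffic. The recomputed sqrt can differ from the direct formula by floating-point rounding.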

In the GPU kernel optimization benchmark SOL-ExecBench, the system raised the average SOL (speed-of-light) score on the NVIDIA B200, which measures how close a kernel comes to the hardware's theoretical limit, from 0.699 to 0.754. That closes about 18% of the remaining gap to the limit (from 0.301 to 0.246). The generated solutions include folding the GRN scaling into the weights of the following linear layer, packing expert routing scores and their indices into key-value pairs for intra-warp reductions, and using low-level PTX instructions to pack FP4 values in NVFP4 MoE kernels while keeping intermediate calculations in FP32 to limit error accumulation.
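The source does not show the routing kernel. The sketch below illustrates one common way to pack a score and an index into a single 64-bit key so that one max-reduction returns both the best score and its expert index. It simulates in Python the xor-shuffle butterfly a CUDA warp would perform with `__shfl_xor_sync`; the order-preserving float encoding and the lower-index tie-break are assumptions, not details from Recursive.

```python
from __future__ import annotations

import struct


def float_to_ordered_u32(x: float) -> int:
    """Map a float32 to a uint32 whose unsigned order matches the float order."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    # Negative floats: flip every bit; non-negative floats: set the sign bit.
    return bits ^ 0xFFFFFFFF if bits & 0x80000000 else bits | 0x80000000


def ordered_u32_to_float(u: int) -> float:
    """Invert float_to_ordered_u32."""
    bits = u ^ 0x80000000 if u & 0x80000000 else u ^ 0xFFFFFFFF
    return struct.unpack("<f", struct.pack("<I", bits))[0]


def pack(score: float, index: int) -> int:
    """High 32 bits: ordered score; low 32 bits: inverted index so ties favor the lower index."""
    return (float_to_ordered_u32(score) << 32) | (0xFFFFFFFF - index)


def unpack(key: int) -> tuple[float, int]:
    return ordered_u32_to_float(key >> 32), 0xFFFFFFFF - (key & 0xFFFFFFFF)


def warp_argmax(scores: list[float]) -> tuple[float, int]:
    """Simulate a 32-lane butterfly max-reduction over packed (score, index) keys."""
    assert len(scores) == 32, "one score per warp lane"
    lanes = [pack(s, i) for i, s in enumerate(scores)]
    offset = 16
    while offset > 0:
        # Each lane reads its xor partner's key (like __shfl_xor_sync) and keeps the max.
        lanes = [max(lanes[i], lanes[i ^ offset]) for i in range(32)]
        offset //= 2
    return unpack(lanes[0])  # after 5 steps every lane holds the same winner
```

Comparing one integer per step replaces a compare-then-select on two separate values, which is why packing tends to shorten the reduction; a top-k router would repeat the reduction after masking out each winner.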

To keep the AI from gaming the benchmarks to inflate its scores, the system applies multi-level correctness audits that filter out invalid speedups. (Source: BlockBeats)
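The article does not describe the audit levels. The hypothetical `audit` helper below sketches the general pattern under stated assumptions: a candidate's speedup would count only if it leaves its inputs untouched, matches the reference output's shape, and agrees numerically on fresh random inputs every trial, which also catches candidates that cache a single answer. The levels, tolerances, and names are illustrative choices, not Recursive's.

```python
import math
import random
from typing import Callable, List, Tuple

Vector = List[float]


def audit(
    candidate: Callable[[Vector], Vector],
    reference: Callable[[Vector], Vector],
    make_input: Callable[[random.Random], Vector],
    trials: int = 20,
    seed: int = 0,
    rtol: float = 1e-5,
    atol: float = 1e-6,
) -> Tuple[bool, str]:
    """Hypothetical multi-level audit; a speedup only counts if every check passes."""
    rng = random.Random(seed)
    for trial in range(trials):
        x = make_input(rng)  # fresh random input each trial defeats cached outputs
        expected = reference(list(x))  # the reference works on a private copy
        snapshot = list(x)
        got = candidate(x)
        # Level 1: the candidate must not tamper with its inputs.
        if x != snapshot:
            return False, f"trial {trial}: candidate mutated its input"
        # Level 2: output structure must match the reference.
        if not isinstance(got, list) or len(got) != len(expected):
            return False, f"trial {trial}: output shape mismatch"
        # Level 3: values must agree within tolerance.
        if not all(math.isclose(g, e, rel_tol=rtol, abs_tol=atol) for g, e in zip(got, expected)):
            return False, f"trial {trial}: numerical mismatch"
    return True, "ok"
```

Timing would be measured separately, and only for candidates that pass; a real harness would also check dtypes, edge-case inputs, and tensor-specific tolerances.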

This page may contain third-party content, which is provided for information purposes only (not representations/warranties) and should not be considered as an endorsement of its views by Gate, nor as financial or professional advice. See Disclaimer for details.