Tencent Hunyuan Releases UniRL: Unified Multimodal Reinforcement Learning Infrastructure

ME AI Message: Tencent Hunyuan has launched UniRL, a reinforcement learning infrastructure for unified multimodal models, and has released two new algorithms, DRPO and Flow-DPPO. UniRL covers diffusion/flow-matching models, LLMs/VLMs, and unified multimodal models (such as Hunyuan-Image 3 and Bagel) with a single post-training loop: generation → scoring → advantage → update → synchronization. Models and algorithms are treated as independent axes, so any model × algorithm combination can be covered. The framework supports pluggable rollout engines (training-side, SGLang, or vLLM-Omni), FSDP2 sharding, and three deployment modes. Flow-DPPO introduces trust-region policy optimization based on precise divergence for flow/diffusion models; DRPO provides smooth, advantage-weighted quadratic regularization for LLM RL. The code has been open-sourced. (Source: AiHot)
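The five-stage loop and the "model × algorithm" separation can be illustrated with a toy sketch. Nothing below is UniRL's actual API: `ToyPolicy`, `group_normalized_advantage`, `reinforce_update`, and `post_training_loop` are hypothetical names, the "model" is a single scalar parameter, and the update is a plain REINFORCE-style step with a group-normalized baseline rather than DRPO or Flow-DPPO. It only shows how the model and the algorithm might be passed into one shared loop as independent pieces.

```python
import random
import statistics
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class ToyPolicy:
    """Hypothetical stand-in for any model family: a Gaussian with one learnable mean."""
    mean: float = 0.0

    def generate(self, rng: random.Random, n: int) -> List[float]:
        # Sample n outputs from N(mean, 1).
        return [rng.gauss(self.mean, 1.0) for _ in range(n)]


def group_normalized_advantage(rewards: List[float]) -> List[float]:
    # Center and scale rewards within the sampled group (illustrative choice).
    mu = statistics.fmean(rewards)
    sigma = statistics.pstdev(rewards) or 1.0  # avoid division by zero
    return [(r - mu) / sigma for r in rewards]


def reinforce_update(policy: ToyPolicy, samples: List[float],
                     advantages: List[float], lr: float) -> None:
    # For N(mean, 1), d/d(mean) log p(x) = x - mean.
    grads = [a * (x - policy.mean) for x, a in zip(samples, advantages)]
    policy.mean += lr * statistics.fmean(grads)


def post_training_loop(
    trainer: ToyPolicy,
    rollout: ToyPolicy,
    reward_fn: Callable[[float], float],
    advantage_fn: Callable[[List[float]], List[float]],
    update_fn: Callable[[ToyPolicy, List[float], List[float], float], None],
    steps: int,
    group_size: int = 16,
    lr: float = 0.1,
    seed: int = 0,
) -> ToyPolicy:
    rng = random.Random(seed)
    for _ in range(steps):
        samples = rollout.generate(rng, group_size)   # 1. generation
        rewards = [reward_fn(x) for x in samples]     # 2. scoring
        advantages = advantage_fn(rewards)            # 3. advantage
        update_fn(trainer, samples, advantages, lr)   # 4. update
        rollout.mean = trainer.mean                   # 5. synchronization
    return trainer


if __name__ == "__main__":
    trainer, rollout = ToyPolicy(), ToyPolicy()
    post_training_loop(
        trainer, rollout,
        reward_fn=lambda x: -(x - 3.0) ** 2,  # reward peaks at x = 3
        advantage_fn=group_normalized_advantage,
        update_fn=reinforce_update,
        steps=300,
    )
    print(f"learned mean: {trainer.mean:.2f}")
```

Because the loop only sees a policy object and an update function, swapping either one leaves the other untouched, which is the sense in which the two axes are independent in this sketch. The separate `rollout` copy mimics a pluggable rollout engine that must receive fresh weights after each update.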