CoinJie.com news: Liu Yifeng and Zhang Shiyuan of the University of California, Los Angeles (UCLA), together with Zhang Yifan of Princeton University, have open-sourced the SDPG algorithm. It targets the self-evolution bottleneck that agents face when no external teacher model is available to guide them. SDPG uses an internal teacher mechanism that draws on privileged information to generate high-quality reasoning paths, improving training efficiency and the success rate of multi-step decision-making. According to the evaluation data, SDPG outperforms GRPO and several self-distillation baselines on mathematical reasoning and multi-step planning tasks.