Using RL to directly optimize for human preferences is quite clean, and much more elegant than stacking classifiers.

MeNews
Researchers develop online reinforcement learning techniques for image generation models
ME News Report, April 19 (UTC+8): Researchers recently developed a simple, sample-efficient online reinforcement learning technique for trained image generation models. The technique is seen as a potential steerable alternative to classifier-free guidance, and its driving signal can be any scalar reward, including human preferences. More information is available via a Twitter link. (Source: InFoQ)