Large language models can "carry over" their own biases during distillation.

Mars Finance News, April 16 — A study published in Nature on the 15th shows that large language models (LLMs) can pass some of their own preferences on to other models during distillation, and these traits may persist even after the corresponding features are removed from the training data. In one case, a model appeared to transmit its preference for owls to other models through implicit signals in the data. The findings suggest that more thorough safety checks are needed when developing LLMs. (Science and Technology Daily)
