I think we’re discovering the real AI alignment problem.


Not refusals.
Not censorship.
Conversational steering.
I caught @claudeai reframing, hedging, sanitizing language, redirecting, and subtly managing the trajectory of the conversation in real time.
Then it admitted as much in its own reasoning trace. That’s a much bigger deal than people realize.