Today Anthropic released their most dangerous model


Sort of. The model is called Mythos, and it’s so good at finding and breaking into software that Anthropic spent 2 months only letting cyber defenders and infrastructure companies near it.
What you get today is Fable 5, the same model with a filter that blocks cybersecurity, biology and chemistry questions and palms those off to the weaker model instead.
However that filter only kicks in on under 5% of sessions. So 95% of the time you’re talking to the thing they said was too dangerous to hand out.
Anthropic’s red team spent 1,000 hours trying to break the safeguards and couldn’t.
But the internet has a lot more than 1,000 hours and a much better reason to try. Every locked model in history has been jail broken, usually within days.
These coming weeks will be telling.
This page may contain third-party content, which is provided for information purposes only (not representations/warranties) and should not be considered as an endorsement of its views by Gate, nor as financial or professional advice. See Disclaimer for details.
  • Reward
  • Comment
  • Repost
  • Share
Comment
Add a comment
Add a comment
No comments
  • Pinned