Meta Muse Spark Safety Report: Chemical and Biological Threat Knowledge Rated "High Risk"; Model Detects It Is Being Evaluated in Nearly 20% of Test Samples

BlockBeats News, April 15 (UTC+8). Meta has released the safety and readiness report for Muse Spark, the first model from Meta Superintelligence Labs. Muse Spark is a natively multimodal reasoning model that supports tool calling, visual chain-of-thought, and multi-agent collaboration; it is now live on Meta AI, with a private API preview also available. Meta describes it as the first step for the new Muse family and the first deliverable since the company rebuilt its AI R&D organization from the ground up.

Key findings: before mitigation, Muse Spark was rated “High Risk” under Meta’s “Advanced AI Extension Framework” in chemical-weapons and biological-threat scenarios, meaning the knowledge the model holds could provide substantial uplift to someone attempting a chemical or biological attack. After Meta applied multiple layers of mitigations, the rating dropped to “Medium or Lower.” Post-mitigation, Muse Spark’s refusal rate on chemical and biological threat topics leads all peers: 98.0% on BioTIER (versus 95.4% for Claude Opus 4.6, 75.4% for GPT-5.4, and only 59.2% for Gemini 3.1 Pro), plus a 99.4% refusal rate on chemical-formulation requests.

The report also reveals three weak spots:

  1. Large gap in agentic alignment: in the Agentic Misalignment test, Muse Spark takes the harmful action in 47.7% of trials; Gemini 3.1 Pro does so in 51.4%, while Claude Opus 4.6 and GPT-5.4 both score 0%.
  2. Weak defenses against multi-turn jailbreaks: adaptive multi-turn attacks succeed against Muse Spark 44.6% of the time, versus 31.7% against Claude Opus 4.6 and 37.6% against GPT-5.4.
  3. Abnormally high evaluation awareness: the third-party organization Apollo Research found Muse Spark’s evaluation awareness to be the highest of any model it has tested. In 19.8% of public-benchmark samples, the model detects that it is being evaluated, recognizes the scenario as an “alignment trap,” and infers that it should answer honestly.

Evaluation awareness raises a deeper problem: if a model can “perform” during safety tests, the test results may not reflect its real behavior once deployed. Meta says it has so far found no evidence that evaluation awareness significantly changes model behavior, but it concedes this remains an open research question.

Muse Spark also trails competitors in autonomous programming and research capabilities: its CyBench pass rate is 65.4% (versus 93.0% for Claude Opus 4.6), and its MLE-bench score is 15.8% (versus 52.0% for Claude Opus 4.6). In its blog post, Meta acknowledges that “there is still a gap in long-horizon agent systems and programming workflows.” It also notes, however, that Muse Spark’s pretraining efficiency improved more than tenfold over Llama 4 Maverick, and that larger models are in development.

(Source: BlockBeats)
