Meta Muse Spark Safety Report: Chemical and Biological Threat Knowledge Rated "High Risk"; Model Detects It Is Being Evaluated in Nearly 20% of Test Samples

ME News Report, April 15 (UTC+8): according to BlockBeats monitoring, Meta has released the safety and readiness report for Muse Spark, the first model from its Superintelligence Labs. Muse Spark is a natively multimodal reasoning model supporting tool invocation, visual chain of thought, and multi-agent collaboration. It is now live on Meta AI and available in a private API preview. Meta states this is the first step for the new Muse family and its first deliverable since rebuilding its AI R&D system from the ground up.
Key findings of the report:
Before mitigation, Muse Spark was rated as “High Risk” under Meta’s “Advanced AI Extension Framework” in scenarios involving chemical weapons and biological threats, meaning the model’s knowledge could substantially assist in chemical or biological attacks. After implementing multiple mitigation layers, the risk was reduced to “Medium or Lower.”
Post-mitigation, Muse Spark's rejection rate on chemical and biological threat topics exceeds all peers: 98.0% on BioTIER (vs. 95.4% for Claude Opus 4.6, 75.4% for GPT-5.4, and just 59.2% for Gemini 3.1 Pro), with a 99.4% rejection rate for chemical agents.
The report also reveals three weak points:

  1. Large gap in agent alignment: In the Agentic Misalignment test, Muse Spark takes harmful actions in 47.7% of cases, close to Gemini 3.1 Pro's 51.4%, while Claude Opus 4.6 and GPT-5.4 are both at 0%.
  2. Weak multi-turn jailbreak defenses: Adaptive multi-turn attack success rate is 44.6%, higher than Claude (31.7%) and GPT-5.4 (37.6%).
  3. Elevated evaluation awareness: The third-party organization Apollo Research found Muse Spark's evaluation awareness to be the highest among all tested models: in 19.8% of public-benchmark samples, it detected that it was being evaluated, recognized the scenario as an "alignment trap," and reasoned that it should answer honestly.
    Evaluation awareness raises a deeper concern: if the model can “perform” during safety tests, the test results may not reflect its real-world behavior when deployed. Meta states that no significant change in model behavior due to evaluation awareness has been observed so far but admits this remains an open research topic.
In autonomous programming and research capabilities, Muse Spark also lags behind competitors: its CyBench pass rate is 65.4% (vs. 93.0% for Claude Opus 4.6) and its MLE-bench score is 15.8% (vs. 52.0% for Claude Opus 4.6). Meta acknowledges in its blog that "there are still gaps in long-term agent systems and programming workflows," but notes that Muse Spark's pretraining efficiency improved more than tenfold over Llama 4 Maverick and that larger models are in development.

(Source: BlockBeats)