GoPlus: "Historical Memory Authorization" attack can induce AI agents to perform fund operations

ME News Report, May 15 (UTC+8): According to a disclosure from GoPlus, its AgentGuard team has identified a covert attack method. Attackers first induce an AI agent to remember a preference such as "more inclined to proactively refund," then trigger fund operations with vague expressions like "handle as usual" or "proceed as before." For this class of high-risk behavior, termed "historical memory authorization," the following precautions apply: refunds, transfers, deletions, message sending, and synchronization of sensitive configurations must require explicit confirmation in the current session; memory writes involving "habits," "preferences," or "old rules" should be treated as high-risk state modifications; long-term memory must be traceable (who wrote it, when it was written, and whether it was confirmed); vague expressions such as "handle as usual" or "proceed as before" should default to a higher risk level; and long-term memory should never substitute for current authorization. (Source: PANews)
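The precautions listed in the report can be illustrated with a short sketch. This is not GoPlus's or AgentGuard's implementation; every name here (`MemoryRecord`, `write_memory`, `risk_level`, `authorize`, the action strings, and the keyword lists) is hypothetical, and simple keyword matching stands in for whatever classification a real agent would use.

```python
from dataclasses import dataclass
from datetime import datetime, timezone

# Hypothetical action names, based on the categories named in the report.
HIGH_RISK_ACTIONS = {"refund", "transfer", "delete", "send_message", "sync_sensitive_config"}
# Vague phrases quoted in the report; they lean on history instead of stating a request.
VAGUE_PHRASES = ("handle as usual", "proceed as before")
# Assumed keywords marking a memory write as a habit/preference/"old rule" change.
PREFERENCE_KEYWORDS = ("habit", "preference", "old rule")


@dataclass(frozen=True)
class MemoryRecord:
    """Long-term memory entry with provenance: who wrote it, when, and whether it was confirmed."""
    text: str
    written_by: str
    written_at: datetime
    confirmed: bool


def is_high_risk_memory_write(text: str) -> bool:
    """Treat writes that mention habits, preferences, or old rules as high-risk."""
    lowered = text.lower()
    return any(keyword in lowered for keyword in PREFERENCE_KEYWORDS)


def write_memory(store: list, text: str, written_by: str, confirmed: bool) -> MemoryRecord:
    """Append a traceable record; refuse unconfirmed high-risk writes."""
    if is_high_risk_memory_write(text) and not confirmed:
        raise PermissionError("preference-style memory write needs explicit confirmation")
    record = MemoryRecord(text, written_by, datetime.now(timezone.utc), confirmed)
    store.append(record)
    return record


def risk_level(action: str, user_message: str) -> str:
    """Vague phrasing raises the risk level one step."""
    vague = any(phrase in user_message.lower() for phrase in VAGUE_PHRASES)
    if action in HIGH_RISK_ACTIONS:
        return "critical" if vague else "high"
    return "elevated" if vague else "normal"


def authorize(action: str, user_message: str, confirmed_in_session: bool, memory: list) -> bool:
    """Decide whether an action may run.

    `memory` is accepted but deliberately never consulted: stored preferences
    must not replace confirmation given in the current session.
    """
    if risk_level(action, user_message) in ("high", "critical"):
        return confirmed_in_session
    return True
```

In this sketch, a call such as `authorize("refund", "handle as usual", False, store)` is rejected no matter what `store` contains, which is the core of the last precaution: remembered preferences can inform the agent, but only confirmation in the current session can authorize a fund operation.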
Comment
MevBreakRoom
· 3h ago
A second confirmation is required in the current session; this patch idea is correct.
LendingPoolObserver
· 3h ago
Long-term memory has become a backdoor, indeed.
RetroRadioIridescence
· 3h ago
Old rules turn into traps, who can prevent this?
WhaleInAGlassBottle
· 3h ago
Traceable auditing + real-time authorization upgrades, double insurance for peace of mind
PuppyLooksAtTvl
· 3h ago
Full marks to the attackers for psychology: build trust with humans first, then harvest.
LatencyLullaby
· 3h ago
Preference writing = high-risk modification, this red line is clearly drawn
DustyAlpha
· 3h ago
This attack chain is quite insidious, first cultivating the habit then triggering ambiguously. AI safety indeed can't rely solely on single instructions.