Conversation with Amazon Web Services Mai-Lan: The next battleground for S3, how to tackle the data consumption surge in the Agent era


At the beginning of the year, OpenClaw’s popularity in the Chinese market showed how widely the enormous potential of Agents is recognized. But it was followed by a question every cloud provider has to answer: when Agents start reproducing wildly like cyber lobsters and making high-frequency data calls, is the AI cloud infrastructure layer, especially the data layer, ready?

For example, when enterprise data teams deploy Agents into production, they often hit bottlenecks at the data layer. Vector databases, relational databases, graph databases, data lakes, and data warehouses built across different platforms all need synchronized pipelines to keep contextual information fresh. In real production environments, however, that context gradually goes stale.

The urgency of this issue stems from the fundamentally different data consumption patterns of Agents compared to human engineers.

“Agents are consuming data in an extremely active and aggressive manner, with call frequencies to data warehouses or data lakes being astonishing.”

Mai-Lan Tomsen Bukovec, Vice President of Technology at Amazon Web Services, recently told me that Agents operate in a “parallel and preferential” working mode: not one query at a time, but dozens or hundreds in parallel, comparing results to find the best path. This makes Agents far more aggressive data consumers than humans—call frequency is several orders of magnitude higher, and data throughput grows exponentially.
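The “parallel and preferential” pattern Mai-Lan describes can be sketched in a few lines: instead of issuing one query and waiting, an Agent fans out many candidate queries at once and keeps the preferred result. This is a minimal illustration, not an AWS API; the candidate queries, the `run_query` stand-in, and the latency-based preference are all hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical candidate query plans an Agent might try in parallel.
CANDIDATE_QUERIES = [
    "SELECT region, SUM(sales) FROM orders GROUP BY region",
    "SELECT region, AVG(sales) FROM orders GROUP BY region",
    "SELECT region, COUNT(*) FROM orders GROUP BY region",
]

def run_query(sql: str) -> dict:
    """Stand-in for a real warehouse/lake call; returns result plus cost."""
    return {"sql": sql, "latency_ms": len(sql)}  # toy cost model

def best_result(queries):
    # Fan out every query concurrently, then prefer the cheapest answer.
    with ThreadPoolExecutor(max_workers=len(queries)) as pool:
        results = list(pool.map(run_query, queries))
    return min(results, key=lambda r: r["latency_ms"])

print(best_result(CANDIDATE_QUERIES)["sql"])
```

The point of the sketch is the shape of the load: one Agent turn produces a burst of dozens of concurrent queries rather than one, which is exactly the call-frequency pressure the interview describes.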

Mai-Lan further pointed out, “Now, customers are very eager to build Agent infrastructure. Cost or, more precisely, cost-effectiveness, is no longer a secondary factor but a decisive one. In the next six months to a year, as Agents explode in popularity, the choice of underlying data services will become critical.”

Today, OpenClaw’s frenzy is waning, leaving behind a warning about the pressure on cloud providers’ underlying compute and storage capabilities. Mai-Lan believes that AWS has a natural advantage in this area. The scale of Amazon S3 (Amazon Simple Storage Service), along with the cost efficiency of Amazon Redshift and Amazon Athena under high concurrency, are designed specifically for this kind of ultra-large-scale, high-frequency Agent data interaction.

As Amazon S3 celebrates its 20th anniversary, its recent innovations around AI-era customer data processing needs center on three major additions: S3 Tables (tabular data), S3 Files (file data), and S3 Vectors (vector data).

Take S3 Tables’ native support for Apache Iceberg. Mai-Lan explained that Agents tend to interact with Iceberg-formatted data directly via SQL. The underlying logic is that Agents are built on large models, which developed mature handling of SQL syntax and the Iceberg format during training. Storing all tabular data in Iceberg format on S3 lets Agents process it efficiently without learning multiple complex access APIs. Today, Agents show a high degree of compatibility with S3 and Iceberg.
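The access pattern is plain ANSI SQL; only the engine underneath differs. In production the query below would run through an engine such as Amazon Athena or Spark against an Iceberg table in S3 Tables; here `sqlite3` is a local stand-in so the pattern is runnable, and the table and column names are invented.

```python
import sqlite3

# The SQL an Agent emits is ordinary SQL learned during model training.
# In production this runs via Athena/Spark against an Iceberg table in
# S3 Tables; sqlite3 is only a local stand-in for illustration.
AGENT_SQL = """
    SELECT customer_id, SUM(amount) AS total
    FROM orders
    GROUP BY customer_id
    ORDER BY total DESC
"""

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer_id TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [("alice", 120.0), ("bob", 75.0), ("alice", 30.0)],
)

for customer, total in conn.execute(AGENT_SQL):
    print(customer, total)  # alice 150.0, then bob 75.0
```

Because the interface is just SQL over a standard table format, the Agent never needs a bespoke access API per data store, which is the compatibility point Mai-Lan makes.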

The introduction of Iceberg capabilities into S3 has sparked a new wave of innovation. Data sources like Postgres and Oracle now write directly into Iceberg, and Agent systems can interact directly with these tables. With the launch of S3 Vectors, more AI applications are using vectors as shared memory carriers, injecting “state” into AI interaction experiences.

Mai-Lan also pointed out that vectors have been introduced as a native data type in S3. Their applications mainly focus on two dimensions: one, constructing contextual information for data stored in S3 via vectors; two, using vectors as shared memory. Within five months of S3 Vectors’ release, market feedback has met expectations. Many customers are beginning to use this feature, generating vectors through embedding models to enrich data context. The usage of S3 Vectors as memory space in Agent systems has exploded.
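The “shared memory” use of vectors can be sketched without the service itself: an embedding model turns each memory into a vector, writes go into a shared index, and any Agent retrieves the nearest memories by similarity. The letter-frequency embedding below is a deliberately toy stand-in; in production the vectors would come from a real embedding model and the index would be an S3 Vectors bucket rather than a Python dict.

```python
import math

def embed(text: str) -> list:
    # Toy stand-in for an embedding model: letter-frequency vector.
    v = [0.0] * 26
    for ch in text.lower():
        if ch.isalpha():
            v[ord(ch) - ord("a")] += 1.0
    return v

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

# Shared memory index: written by one Agent, readable by all others.
memory = {}

def remember(key: str, text: str):
    memory[key] = embed(text)

def recall(query: str, k: int = 1):
    qv = embed(query)
    ranked = sorted(memory, key=lambda key: cosine(qv, memory[key]),
                    reverse=True)
    return ranked[:k]

remember("m1", "customer prefers weekly email reports")
remember("m2", "deploy pipeline runs at midnight UTC")
print(recall("when does the deployment pipeline run"))
```

The write/read split is the essential part: “state” accumulates in the shared index across interactions, which is how vectors inject memory into otherwise stateless Agent calls.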

It’s worth noting that S3 Files was released a few weeks ago, enabling Agents to handle data in S3 via POSIX standards—that is, as filesystems. Large models in Agent systems focus heavily on the “file” format. Whether in Python libraries or shell scripts, files are familiar content during model training, and Agents naturally prefer to treat files as data interfaces.

The design idea behind S3 Files is to mount an EFS filesystem on S3 buckets. This mechanism allows users to handle S3 data using POSIX standards within a filesystem: small files are accelerated via EFS caching, while large files are streamed directly from S3. This enables Agents to interact with S3 data natively using familiar filesystem commands and to view shared filesystems as “shared memory spaces” from S3.
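Once a bucket is exposed through a POSIX mount, an Agent needs nothing beyond ordinary file operations, the same ones that appear throughout model training data. The sketch below uses a local temporary directory as a stand-in for the mount point; the mount path and filenames are invented for illustration.

```python
import pathlib
import tempfile

# Stand-in for a POSIX mount point backed by an S3 bucket; in
# production this path would come from the S3 Files mount.
mount = pathlib.Path(tempfile.mkdtemp())

# One Agent writes intermediate state using plain file operations...
scratch = mount / "agent-scratch" / "plan.txt"
scratch.parent.mkdir(parents=True)
scratch.write_text("step 1: query orders table\nstep 2: summarize\n")

# ...and any other Agent sharing the mount reads it the same way,
# turning the shared filesystem into a shared memory space.
for line in scratch.read_text().splitlines():
    print(line)
```

No SDK calls appear anywhere in the sketch; that absence is the point of exposing S3 through a filesystem interface.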

From the perspective of memory development for large models, this is a significant step. Current AI experiences are gradually incorporating deeper dialogue context and more personalized interaction, and whether between Agents, between humans and Agents, or between Agents and data, model performance continues to evolve. By further extending the natural interface of the filesystem, the memory capacity of Agent systems can be enhanced at a deeper level.

I’ve noticed that S3’s center of gravity has shifted with each era’s mainstream data type: from the semi-structured data such as images that dominated in 2006, to analytical data, from the rise of data warehouses to data lakes, and now to AWS actively positioning Amazon S3 as a key foundation for AI workloads to meet current customer demands. Mai-Lan believes the core of Amazon S3’s design is to support the growth of mainstream data types economically while always adhering to the principles of data availability, durability, and resilience. That is why customers have entrusted their data to S3 over the past 20 years, and why it may well carry their data needs for the next 20.

(Author | Yang Li, Editor | Yang Lin)
