Author: Sebastian Melendez; Source: Artemis; Translation: Shan Ouba, Golden Finance
Stablecoins are currently the focus of the market, with significant news almost every day. Last week, Stripe announced it will acquire the wallet company Privy, and PayPal announced it will natively mint PYUSD on Stellar. The news is coming thick and fast, and it can be overwhelming. As more companies enter the field, demand for tracking and acquiring stablecoin data keeps growing. Yet in our conversations with clients, people keep asking the same four questions:
My job at Artemis is to collect, organize, and summarize stablecoin data every day to answer these questions. Today, we will debunk some “seemingly simple” data myths and show how hard these questions really are to answer.
Accessing on-chain data independently is remarkably expensive, and the technical barriers are just as high. Although raw blockchain data has become more accessible over the past five years, many hurdles remain. Mainstream data providers such as Dune, Flipside, Allium, and Goldsky each have their strengths, but none covers every key blockchain.
Actual Situation:
Nowadays, almost every company is launching its own blockchain, each with its own peculiar characteristics, making data analysis extremely complex.
If you want a comprehensive understanding of how your stablecoin is used, and want to spot potential opportunities, you need to analyze every relevant chain, not just the ones you have deployed on. As multi-chain strategies develop and analytical needs deepen, the data infrastructure becomes more complex as well.
Taking PYUSD as an example:
Once you have integrated LayerZero’s OFT cross-chain protocol, to truly see the whole picture, you must master:
Worse still, users may bridge tokens to yet more chains, which compounds the data problem with every new destination.
The problem is not just the chains you are currently live on, but the continuous expansion of the entire ecosystem, with new chains emerging one after another. This leads to the second problem: architectural fragmentation.
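To make the multi-chain bookkeeping concrete, here is a minimal sketch of how per-chain supply snapshots might be rolled up into one figure. It assumes a lock-and-mint style bridge, where tokens locked on the source chain back the bridged copies elsewhere; the chain names, amounts, and the `circulating_supply` helper are all hypothetical and are not taken from any real PYUSD data.

```python
from collections import defaultdict
from decimal import Decimal

# Hypothetical per-chain snapshots (illustrative numbers only).
# "native"  = minted directly on that chain
# "bridged" = copies created on a chain by a bridge
SNAPSHOTS = [
    {"chain": "ethereum", "kind": "native", "amount": Decimal("700")},
    {"chain": "solana", "kind": "native", "amount": Decimal("250")},
    {"chain": "stellar", "kind": "native", "amount": Decimal("50")},
    {"chain": "chain_x", "kind": "bridged", "amount": Decimal("20")},
]

# Assumption: tokens locked in bridge contracts on the source chain
# back the bridged copies, so they must not be counted twice.
LOCKED_IN_BRIDGES = {"ethereum": Decimal("20")}


def circulating_supply(snapshots, locked):
    """Return (per-chain supply, total supply) after removing locked tokens."""
    per_chain = defaultdict(Decimal)
    for snap in snapshots:
        per_chain[snap["chain"]] += snap["amount"]
    for chain, amount in locked.items():
        per_chain[chain] -= amount
    total = sum(per_chain.values(), Decimal(0))
    return dict(per_chain), total


per_chain, total = circulating_supply(SNAPSHOTS, LOCKED_IN_BRIDGES)
print(per_chain)
print(total)
```

Every additional bridge destination adds another snapshot source and another locked balance to reconcile, which is why the accounting grows harder as the token spreads.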
Think back to the early 2000s, when sending someone a file did not mean they could open it. PowerPoint files would not open, videos lacked codecs, and every system operated on its own, so nothing worked together seamlessly. Even elementary school students were tormented by these issues.
The current blockchain world is just as chaotic as it was back then.
The most active chains today (Solana, Tron, Ethereum, TON, Stellar, Aptos) have vastly different data architectures.
For example:
Understanding this on-chain activity means untangling an increasingly complex web of technologies.
Look at PYUSD again:
Previously, you only needed to understand the architecture of Ethereum, Solana, and LayerZero. But now that it has landed on Stellar, you also need to understand:
In other words, you have to become an expert in each chain just to access and parse its data, let alone extract insights from it.
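One way to picture the fragmentation is as a set of per-chain adapters that map each chain's transfer record into a single common shape. The sketch below is illustrative only: the input dictionaries are simplified, hypothetical record shapes (real decoded logs, parsed instructions, and payment operations carry many more fields), the field names are assumptions, and the decimal precision is passed in by the caller rather than looked up.

```python
from dataclasses import dataclass
from decimal import Decimal


@dataclass(frozen=True)
class Transfer:
    chain: str
    sender: str
    receiver: str
    amount: Decimal  # human-readable token units


def from_evm_log(log: dict, decimals: int) -> Transfer:
    # Hypothetical decoded ERC-20 Transfer event: "value" is an integer in base units.
    amount = Decimal(log["value"]).scaleb(-decimals)
    return Transfer("ethereum", log["from"], log["to"], amount)


def from_solana_ix(ix: dict, decimals: int) -> Transfer:
    # Hypothetical parsed SPL token transfer. Balances live in token accounts,
    # so this sketch assumes the owner wallets were already resolved upstream.
    amount = Decimal(ix["amount"]).scaleb(-decimals)
    return Transfer("solana", ix["source_owner"], ix["destination_owner"], amount)


def from_stellar_payment(op: dict) -> Transfer:
    # Hypothetical payment operation: the amount is assumed to arrive as a decimal string.
    return Transfer("stellar", op["from"], op["to"], Decimal(op["amount"]))


transfers = [
    from_evm_log({"from": "0xaaa", "to": "0xbbb", "value": 1_500_000}, decimals=6),
    from_solana_ix({"source_owner": "walletA", "destination_owner": "walletB", "amount": 1_500_000}, decimals=6),
    from_stellar_payment({"from": "GAAA", "to": "GBBB", "amount": "1.5000000"}),
]
for t in transfers:
    print(t)
```

Each adapter hides one chain's quirks (base units, account models, string amounts), and each new chain means writing and maintaining another one.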
Many people think that once the data access problem is solved, user insights will follow easily. Suppose you have sorted out access and captured balance and transaction datasets for every chain. What have you actually obtained?
The answer is: a pile of noise.
On-chain addresses are merely strings of letters and numbers, and wallet balances are often inaccurate or misleading. Raw blockchain data does not equate to insights; it is just a messy pile of data that requires extremely complex cleaning and processing to become valuable.
**The reality: understanding what happens on-chain is inseparable from context and off-chain data.**
Even if you have gone to great lengths to collect on-chain data, you still cannot answer the key questions: Who is using your stablecoin? Where are they?
All you can say is: “My stablecoin has been used.” This is not actionable and does not help you understand: user behavior, market penetration, growth opportunities. To achieve these insights, you must rely on off-chain context. And the real question is: what off-chain data do you need, and how do you obtain it?
Application and protocol tags: There is no single source of truth for labeling on-chain activity. Flipside, Dune, the Open Label Initiative, block explorers, and Arkham all provide some information, but each has its own schema and limited coverage. To answer questions such as “What application does this address belong to?” or “What kinds of use cases are we seeing?”, you need to unify these fragmented label sources and manually tag important wallet addresses. Otherwise you are left with raw transaction data, which says nothing about actual usage patterns.
The reality is that solving this labeling problem requires significant resources and industry connections. You need to establish partnerships with major L1s and protocols to build a comprehensive labeling dataset. Most teams lack the bandwidth or the connections to do this manually, which is why many analytics efforts hit a bottleneck after acquiring raw blockchain data. The context layer is where the real work begins.
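The unification step described above can be sketched as a priority merge: when several sources label the same address, the most trusted source wins. This is a minimal illustration, not how any particular provider works; the source names, addresses, labels, and the `unify_labels` helper are all hypothetical.

```python
# Hypothetical label sources; every entry here is invented for illustration.
SOURCES = {
    "manual": {"0xaaa": "Exchange A hot wallet"},
    "provider_1": {"0xaaa": "exchange", "0xbbb": "DEX pool"},
    "provider_2": {"0xBBB": "dex", "0xccc": "Bridge"},
}

# Assumed trust order: manual tags first, then each provider in turn.
PRIORITY = ["manual", "provider_1", "provider_2"]


def unify_labels(sources, priority):
    """Merge label sources so the highest-priority source wins per address."""
    unified = {}
    # Apply the lowest-priority source first so later ones overwrite it.
    for name in reversed(priority):
        for address, label in sources.get(name, {}).items():
            # Lowercasing suits hex-style addresses only; case-sensitive
            # formats (for example base58) would need their own normalizer.
            unified[address.lower()] = {"label": label, "source": name}
    return unified


for address, info in sorted(unify_labels(SOURCES, PRIORITY).items()):
    print(address, info)
```

Keeping the winning source next to each label makes conflicts auditable, which matters when providers disagree or when a manual tag later needs to be revisited.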
Blockchains are far more complex than they seem. The industry has begun to standardize around specific design patterns for token transfers over the past few years, but this was not always the case. When bridging first became popular, there was no community standard for tracking cross-chain activity. This makes it hard to track balances and transfers accurately, especially for tokens that predate these standards. You need to understand the specific history and quirks of each chain to get accurate data.
**Reality: the “database model” of a blockchain keeps changing, so you must become an “on-chain historian” to obtain accurate data.**
It is easy to forget that these ecosystems are constantly changing. Take Solana as an example: both its architecture (how the blockchain operates) and its token program (how tokens are created and transferred) have undergone significant upgrades.
People often say that a blockchain is an immutable, public, append-only database. While this is generally true now, it was not always the case in the early days. Optimism is a good example: the chain did not simply launch from a genesis event and keep running. It was completely relaunched a few months later.
What is the result? There is no complete dataset on all token transfers on the original Optimism chain.
Why is this important? The missing data is crucial for understanding the current and historical activities of major stablecoins on the OP mainnet, including USDC, USDT, and DAI. Without this data, you cannot obtain a complete dataset and cannot calculate accurate wallet balances.
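A small worked example shows why missing history breaks balances. Wallet balances are usually reconstructed by replaying every transfer, so if early transfers are absent, some wallets appear to spend tokens they never received. The addresses, amounts, and helper names below are hypothetical and do not represent real Optimism data.

```python
from collections import defaultdict

MINT = "0x0"  # placeholder for the mint address; its balance is expected to be negative


def balances_from_transfers(transfers):
    """Replay (sender, receiver, amount) tuples into per-address balances."""
    balances = defaultdict(int)
    for sender, receiver, amount in transfers:
        balances[sender] -= amount
        balances[receiver] += amount
    return dict(balances)


def suspicious_wallets(balances, ignore=(MINT,)):
    """Addresses with negative balances, a sign that earlier transfers are missing."""
    return sorted(addr for addr, bal in balances.items() if bal < 0 and addr not in ignore)


full_history = [
    (MINT, "alice", 100),
    ("alice", "bob", 40),
    ("bob", "carol", 10),
]
# Simulate a dataset that starts after a relaunch: the first transfer is gone.
truncated_history = full_history[1:]

print(balances_from_transfers(full_history))
print(balances_from_transfers(truncated_history))
print(suspicious_wallets(balances_from_transfers(truncated_history)))
```

A negative balance is the easy case to detect. A wallet that only received tokens in the missing period simply shows a balance that is too low, with nothing in the data to flag it.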
Building an accurate dataset requires becoming a blockchain historian. Understanding the subtle evolution of each chain and explaining all these historical differences takes years of effort.
Blockchain data poses challenges that simply do not exist in other industries. Even though it is nominally “open and transparent”, extracting meaningful insights requires off-chain data, integration with dozens of data providers, contextual information scattered across crypto Twitter and official documentation, and a team of more than 10 engineers. Otherwise, you are like the blind men feeling the elephant, chasing a phantom market that changes at the speed of light.