Developer Guide for TEE

Authors: prateek, roshan, siddhartha & linguine (Marlin), krane (Asula)
Compilation: Shew, GodRealmX

Since Apple announced Private Cloud Compute and NVIDIA introduced confidential computing in its GPUs, Trusted Execution Environments (TEEs) have become increasingly popular. Their confidentiality guarantees help protect user data (which may include private keys), and their isolation guarantees ensure that programs deployed in them cannot be tampered with, whether by humans, other programs, or the operating system. It is therefore unsurprising that TEEs are widely used to build products in the Crypto x AI field.

Like any new technology, TEEs are going through a phase of optimistic experimentation. This article aims to give developers and general readers a basic conceptual guide to what TEEs are, their security model, common vulnerabilities, and best practices for using them securely. (Note: to make the text easier to follow, we have deliberately replaced some TEE jargon with simpler equivalents.)

What is a TEE?

A TEE is an isolated environment within a processor or data center where programs can run without interference from the rest of the system. Preventing such interference requires a set of design measures, chiefly strict access control: the rest of the system's access to the programs and data inside the TEE is tightly restricted. TEEs are now ubiquitous in phones, servers, PCs, and cloud environments, which makes them accessible and reasonably priced.

This may sound vague and abstract. In practice, different chip vendors and cloud providers implement TEEs in different ways, but the fundamental goal is the same: to prevent other programs from interfering with the TEE.

Most readers use biometrics to log into their devices, for example unlocking a phone with a fingerprint. But how do we ensure that a malicious app, website, or jailbroken operating system cannot access and steal that biometric data? Beyond encrypting the data, the circuitry in a TEE-equipped device simply does not allow any program outside the TEE to access the memory and processor regions holding the sensitive data.

Hardware wallets are another TEE-like application scenario. A hardware wallet connects to a computer and communicates with it in a sandboxed way, but the computer cannot directly read the mnemonic stored on the wallet. In both scenarios, users trust the device manufacturer to design the chip correctly and to ship appropriate firmware updates so that the confidential data inside can never be exported or viewed.

Security Model

Unfortunately, there are many different TEE implementations (Intel SGX, Intel TDX, AMD SEV, AWS Nitro Enclaves, ARM TrustZone), and each requires its own security modeling and analysis. In the rest of this article we focus mainly on Intel SGX, Intel TDX, and AWS Nitro Enclaves, because these systems have the most users and the most mature, readily available development tooling. They are also the TEE systems most commonly used in Web3.

Generally speaking, the workflow of an application deployed in a TEE is as follows (a hedged build-and-run sketch follows the list):

  1. The developer writes some code, which may or may not be open source.
  2. The developer packages the code into an Enclave Image File (EIF) that can run inside the TEE.
  3. The EIF is hosted on a server with TEE capability. In some cases the developer hosts the EIF on their own TEE-capable machine and serves it from there.
  4. Users interact with the application through its predefined interfaces.
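
As a concrete but hedged illustration of steps 2 and 3, the sketch below assumes AWS Nitro Enclaves and its nitro-cli tool; the image name, EIF path, and resource sizes are placeholders, and the exact output format of build-enclave may differ across versions.

```python
# Hypothetical sketch of steps 2-3 above, assuming AWS Nitro Enclaves and nitro-cli.
import json
import subprocess

def build_and_run_enclave(docker_uri: str, eif_path: str) -> dict:
    # Step 2: package the application (here, a Docker image) into an EIF.
    build = subprocess.run(
        ["nitro-cli", "build-enclave",
         "--docker-uri", docker_uri,
         "--output-file", eif_path],
        capture_output=True, text=True, check=True,
    )
    # build-enclave reports the EIF's code measurements (PCR values) as JSON;
    # publishing these lets third parties compare them against remote attestations.
    measurements = json.loads(build.stdout)

    # Step 3: launch the EIF inside the TEE on this host (sizes are illustrative).
    subprocess.run(
        ["nitro-cli", "run-enclave",
         "--eif-path", eif_path,
         "--cpu-count", "2",
         "--memory", "2048"],
        check=True,
    )
    return measurements

if __name__ == "__main__":
    pcrs = build_and_run_enclave("my-agent:latest", "my-agent.eif")
    print(pcrs)
```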

Obviously, there are three potential risks here:

  • Developers: What does the code packaged into the EIF actually do? It may not match the business logic the project advertises, and it may steal users' private data.
  • Servers: Is the server running the expected EIF, and is that EIF really executing inside a TEE? The server could also be running other programs inside the TEE.
  • Vendors: Is the TEE's design itself secure? Is there a backdoor that leaks all TEE data to the vendor?

Fortunately, TEEs now have mechanisms that address most of these risks: reproducible builds and remote attestation.

So what is a reproducible build? Modern software development typically pulls in a large number of dependencies (external tools, libraries, frameworks), and any of these dependency files can be compromised. Package managers such as npm therefore record a hash of each dependency as its unique identifier; if a downloaded dependency does not match the recorded hash, npm treats it as having been modified.

Reproducible builds can be thought of as a set of standards whose goal is that building the same code on any machine, following a predefined build process, always yields the same hash. In practice, identifiers other than a plain hash can also be used; we refer to this build artifact identifier as the code measurement.

Nix is a commonly used tool for reproducible builds. When a program's source code is public, anyone can inspect it to make sure the developers have not inserted anything malicious, build it with Nix, and check whether the resulting artifact has the same code measurement/hash as the one the project has deployed in production. But how do we learn the code measurement of the program actually running inside the TEE? That is where remote attestation comes in.
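
As a minimal sketch of the measurement-comparison step, the example below assumes, purely for illustration, that the measurement is a plain SHA-256 hash of the build artifact; real TEE platforms use their own measurement formats (e.g. MRENCLAVE or PCR values), and the paths and expected value are placeholders.

```python
import hashlib
from pathlib import Path

def measure(artifact_path: str) -> str:
    """Hash a locally built artifact; stands in for a platform code measurement."""
    return hashlib.sha256(Path(artifact_path).read_bytes()).hexdigest()

def verify_reproducible_build(local_artifact: str, published_measurement: str) -> bool:
    """Compare our own build of the open-source code against the measurement
    the project claims to be running in production."""
    return measure(local_artifact) == published_measurement

if __name__ == "__main__":
    ok = verify_reproducible_build("result/enclave.eif", "d2a8...")  # placeholder values
    print("build matches published measurement" if ok else "MISMATCH - do not trust")
```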

A remote attestation is a message signed by the TEE platform (the trusted party) that includes, among other things, the code measurement of the program and the version of the TEE platform. It lets an external observer confirm that a program is running inside a genuine TEE of a specific platform version, in a location no one else can access.

Reproducible builds and remote attestation together let any user learn exactly which code is running inside the TEE and which platform version it is running on, preventing the developer or the server from acting maliciously.

However, when using a TEE it is always necessary to trust the vendor. If the vendor acts maliciously, it can simply forge remote attestations. So if the vendor is part of your threat model, avoid relying on TEEs alone and combine them with ZK proofs or a consensus protocol.

The Charm of TEE

In our view, TEEs owe their popularity to the following features, particularly how easy they make deploying AI agents:

  • Performance: TEEs can run LLMs with performance and cost comparable to ordinary servers, whereas zkML needs enormous compute to generate ZK proofs for LLM inference.
  • GPU support: NVIDIA offers confidential computing in its recent GPU generations (Hopper, Blackwell, etc.).
  • Correctness: LLM inference is non-deterministic; running the same prompt multiple times can yield different results, so multiple nodes (including observers attempting to create fraud proofs) may never agree on the result of an LLM run. In that setting we can instead trust that an LLM running inside a TEE cannot be manipulated by malicious actors and that the program inside the TEE always runs as written, which makes TEEs a better fit than opML or consensus for guaranteeing the reliability of LLM inference results.
  • Confidentiality: Data inside a TEE is invisible to external programs, so private keys generated or received inside the TEE stay secret. This can be used to assure users that any message signed with such a key was produced by the program inside the TEE. Users can safely entrust a private key to a TEE together with some signing conditions and be confident that every signature the TEE emits satisfies those conditions (see the sketch after this list).
  • Networking: With the right tooling, programs inside a TEE can access the Internet securely, without revealing queries or responses to the host server, while still letting third parties verify that the data was retrieved correctly. This is very useful for pulling information from third-party APIs and for outsourcing computation to trusted but proprietary model providers.
  • Write access: Unlike ZK schemes, code running in a TEE can construct messages (tweets, transactions, etc.) and send them out over APIs and RPC endpoints.
  • Developer-friendly: TEE frameworks and SDKs let people write code in almost any language and deploy it to a TEE as easily as to a cloud server.
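
Below is a minimal sketch of the confidentiality point above: an enclave-held key that only signs messages meeting a preset condition. The policy, message format, and choice of Ed25519 are illustrative assumptions rather than any framework's API; a real deployment would also expose the public key through remote attestation.

```python
# Illustrative sketch only: an in-enclave signer that enforces a preset signing policy.
# Assumes the third-party 'cryptography' package; the policy and message format are made up.
from cryptography.hazmat.primitives import serialization
from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey

class EnclaveSigner:
    def __init__(self) -> None:
        # Generated inside the TEE; intended never to leave enclave memory.
        self._key = Ed25519PrivateKey.generate()

    def public_key_bytes(self) -> bytes:
        # Published (e.g. embedded in a remote attestation) so users can verify signatures.
        return self._key.public_key().public_bytes(
            serialization.Encoding.Raw, serialization.PublicFormat.Raw
        )

    def sign_if_allowed(self, message: bytes) -> bytes:
        # Hypothetical policy: only sign messages explicitly marked as agent actions.
        if not message.startswith(b"AGENT_ACTION:"):
            raise PermissionError("message does not satisfy the signing policy")
        return self._key.sign(message)

if __name__ == "__main__":
    signer = EnclaveSigner()
    sig = signer.sign_if_allowed(b"AGENT_ACTION:post_tweet hello")
    print(len(sig), "byte signature from key", signer.public_key_bytes().hex()[:16])
```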

For better or worse, it is currently hard to find alternatives to TEEs for a considerable number of use cases. We believe TEEs further expand the design space for on-chain applications and may enable entirely new use cases.

TEE is not a silver bullet

Programs running in a TEE are still susceptible to a range of attacks and bugs. Like smart contracts, they are prone to a whole class of issues. For simplicity, we group the potential vulnerabilities as follows:

  • Developer negligence
  • Runtime Vulnerability
  • Architectural design defects
  • Operational Issues

Developer Negligence

Developers can undermine the security guarantees of a TEE program through intentional or unintentional mistakes in the code. This includes:

  • Opaque code: The TEE security model relies on external verifiability, so code transparency is essential for third parties to review and verify the program.
  • Unchecked code measurements: Even if the code is public, it means little unless a third party actually rebuilds it and compares the resulting measurement against the one reported in the remote attestation. This is like receiving a ZK proof but never verifying it.
  • Unsafe code: Even if keys are generated and managed carefully inside the TEE, the program logic may still leak them during external calls, and the code may contain backdoors or vulnerabilities. This demands higher standards of development and auditing than traditional backend work, closer to smart contract development.
  • Supply chain attacks: Modern software development pulls in large amounts of third-party code, and supply chain attacks pose a significant threat to the integrity of TEE applications.

Runtime Vulnerabilities

No matter how cautious developers are, they can still fall victim to runtime vulnerabilities. They must carefully consider whether any of the following affect their project's security guarantees:

  • Dynamic code: Not all code can be kept transparent. Some use cases require dynamically loading and executing opaque code inside the TEE at runtime. Such code can easily leak secrets or break invariants and must be guarded against carefully.
  • Dynamic data: Most applications consume external APIs and other data sources during execution, so the security model expands to include those sources, which play the same role that oracles play in DeFi. Incorrect or stale data can be disastrous; for AI agents, for example, this includes over-reliance on hosted LLM services such as Claude.
  • Insecure and unreliable communication: A TEE runs inside a host server, and from a security standpoint that host is a perfect man-in-the-middle (MitM) between the TEE and the outside world. The host can not only eavesdrop on the TEE's external connections and read whatever is sent in the clear, but also inspect and restrict connections to specific IPs and inject packets into connections in order to impersonate either side.

For example, a matching engine running in a TEE and handling encrypted orders still cannot guarantee fair ordering (anti-MEV), because routers, gateways, and the host can still drop, delay, or prioritize packets based on their source IP.

Architectural Flaws

The technology stack of a TEE application should be chosen with caution. The following issues can arise when architecting one:

  • Large attack surface: An application's attack surface is the amount of code that must be completely secure for the application to be safe. Code with a large attack surface is very hard to audit and can hide bugs or exploitable vulnerabilities. This often trades off against developer experience: a TEE program that relies on Docker has a much larger attack surface than one that does not, and enclaves built on full-featured operating systems have a larger attack surface than those built on the most minimal ones.
  • Portability and liveness: In Web3, applications must be censorship-resistant: anyone should be able to spin up a TEE and take over from inactive participants, which requires the application inside the TEE to be portable. The biggest challenge here is key portability. Some TEE systems offer built-in key derivation, but keys derived this way are bound to the specific machine and cannot be regenerated inside a TEE on another server, which typically pins the program to one machine and is insufficient for portability.
  • Insecure root of trust: For example, when an AI agent runs in a TEE, how do you verify that a given address really belongs to that agent? If this is not designed carefully, the actual root of trust can end up being an external third party or a key custody service rather than the TEE itself.

Operational Issues

Last but not least, there are practical considerations around actually operating the server that runs your TEE programs.

  • Insecure platform versions: TEE platforms receive occasional security updates, which are reflected as the platform version in remote attestations. If your TEE is not running on an up-to-date platform version, attackers can steal keys from it using known attack vectors. Worse, a platform version that is secure today may be insecure tomorrow.
  • Lack of physical security: Some TEEs are vulnerable to side-channel attacks that typically require physical access to and control over the host server, so physical security is an important layer of defense in depth. A related concept is cloud attestation: proving that the TEE is running inside a cloud data center with physical security guarantees.

Build a secure TEE program

We divide our suggestions into the following points:

  • The safest solution
  • Necessary precautions
  • Use-case-specific recommendations

1. The safest solution: no external dependencies

Building a highly secure application can mean eliminating external dependencies such as external inputs, APIs, or services to shrink the attack surface. The application then operates independently, with no external interactions that could compromise its integrity or security. While this limits what the application can do, it provides a very high level of security.

If the model runs locally inside the TEE, this level of security is achievable for most Crypto x AI use cases.

2. Necessary precautions

Whether or not the application has external dependencies, the following are required.

Treat TEE applications like smart contracts, not backend services: keep update frequency low and test rigorously

Writing, testing, and updating TEE programs should be as rigorous as for smart contracts. Like smart contracts, TEEs operate in highly sensitive, tamper-resistant environments where bugs or unexpected behavior can lead to serious consequences, including a complete loss of funds. Thorough audits, extensive testing, and minimal, carefully reviewed updates are essential to the integrity and reliability of TEE-based applications.

Audit the code and check the build pipeline

The security of an application depends not only on the code itself, but also on the tools used in the build process. A secure build pipeline is crucial for preventing vulnerabilities. TEE only guarantees that the provided code will run as expected, but it cannot fix defects introduced during the build process.

To reduce risks, code must be rigorously tested and audited to eliminate errors and prevent unnecessary information leakage. Additionally, reproducible builds play a critical role, especially when code is developed by one party and used by another. Reproducible builds allow anyone to verify that the program executed inside the TEE matches the original source code, ensuring transparency and trust. Without reproducible builds, it is nearly impossible to determine the exact contents of the program executed inside the TEE, which jeopardizes the security of the application.

For example, the source code of DeepWorm (a project that runs a simulated worm brain inside a TEE) is fully open, and the executable that runs in the TEE is built reproducibly with a Nix pipeline.

Use audited or verified libraries

When handling sensitive data inside a TEE program, use only audited libraries for key management and private data processing. Unaudited libraries can expose keys and compromise the application's security. Prefer thoroughly reviewed, security-focused dependencies to maintain the confidentiality and integrity of the data.

Always verify the proof from TEE

Users interacting with a TEE must verify the remote attestation (or whatever verification mechanism the TEE provides) to ensure the interaction is secure and trustworthy. Without these checks, the host server can manipulate responses, and there is no way to distinguish genuine TEE output from tampered data. Remote attestation provides crucial evidence of the code and configuration running inside the TEE, letting us check whether the program being executed is the expected one.

Verification can be done on-chain (Intel SGX, AWS Nitro), off-chain using ZK proofs (Intel SGX, AWS Nitro), by users themselves, or via hosted services (such as t16z or MarlinHub).
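
Below is a minimal sketch of what "verifying the attestation" involves, assuming the platform-specific steps (parsing the attestation document and validating its certificate chain back to the vendor) are handled by a platform library and that the resulting fields are exposed as a plain dict. The field names, expected measurement, and minimum platform version are illustrative placeholders.

```python
import secrets

# Published by the project after a reproducible build (placeholder value).
EXPECTED_MEASUREMENT = "9f2c..."
# Oldest platform/TCB version we still consider secure (placeholder).
MIN_PLATFORM_VERSION = 7

def verify_attestation(att: dict, expected_nonce: bytes) -> None:
    """Checks to perform *after* the vendor signature chain has been validated
    by a platform-specific library (that step is out of scope here)."""
    if att["code_measurement"] != EXPECTED_MEASUREMENT:
        raise ValueError("enclave is not running the expected code")
    if att["platform_version"] < MIN_PLATFORM_VERSION:
        raise ValueError("enclave is on an outdated, insecure platform version")
    if att["nonce"] != expected_nonce:
        raise ValueError("attestation is not fresh (possible replay)")

if __name__ == "__main__":
    nonce = secrets.token_bytes(16)
    # In practice this dict would come from the TEE's attestation endpoint.
    attestation = {
        "code_measurement": EXPECTED_MEASUREMENT,
        "platform_version": 8,
        "nonce": nonce,
    }
    verify_attestation(attestation, nonce)
    print("attestation checks passed")
```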

3. Use-case-specific recommendations

Based on the use case and structure of the application, the following tips may help to make your application more secure.

Ensure that user interactions with TEE are always performed over a secure channel

The server hosting the TEE is fundamentally untrusted: it can intercept and modify communications. In some cases it may be acceptable for the server to read data as long as it cannot modify it; in others, even reading is unacceptable. To mitigate these risks, establish a secure end-to-end encrypted channel between the user and the TEE. At a minimum, make sure messages carry a signature so their authenticity and origin can be verified, and have users check the TEE's remote attestation to confirm they are communicating with the correct TEE. This preserves both the integrity and the confidentiality of the communication.

For example, Oyster supports secure TLS certificate issuance using CAA records and RFC 8657. It also provides a TEE-native TLS protocol called Scallop that does not rely on WebPKI.
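
As a minimal user-side sketch of the signature check described above, the example below assumes the TEE publishes an Ed25519 public key inside its (already verified) remote attestation and signs every response with the corresponding private key; the message format is illustrative, and the third-party cryptography package is assumed.

```python
# User-side check: verify that a response was signed by the enclave's attested key.
from cryptography.exceptions import InvalidSignature
from cryptography.hazmat.primitives import serialization
from cryptography.hazmat.primitives.asymmetric.ed25519 import (
    Ed25519PrivateKey,
    Ed25519PublicKey,
)

def verify_tee_response(attested_pubkey: bytes, response: bytes, signature: bytes) -> bool:
    """Return True only if the response was signed by the key bound to the attestation."""
    try:
        Ed25519PublicKey.from_public_bytes(attested_pubkey).verify(signature, response)
        return True
    except InvalidSignature:
        return False

if __name__ == "__main__":
    # Simulate the enclave side so the example is self-contained.
    enclave_key = Ed25519PrivateKey.generate()
    attested_pubkey = enclave_key.public_key().public_bytes(
        serialization.Encoding.Raw, serialization.PublicFormat.Raw
    )
    response = b'{"price": "42.0"}'
    signature = enclave_key.sign(response)

    assert verify_tee_response(attested_pubkey, response, signature)
    assert not verify_tee_response(attested_pubkey, b'{"price": "43.0"}', signature)
    print("signature checks behave as expected")
```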

Remember that TEE memory is transient

TEE memory is transient: when the TEE shuts down, its contents (including encryption keys) are lost. Without a secure mechanism for persisting this information, critical data can become permanently inaccessible, potentially locking up funds or halting operations.

One solution combines a multi-party computation (MPC) network with a decentralized storage system such as IPFS. The MPC network splits the key across multiple nodes so that no single node ever holds the complete key, while still allowing the network to reconstruct it when needed. Data encrypted under this key can then be stored safely on IPFS.

If needed, and provided specific conditions are met, the MPC network can release the key to a new TEE server running the same image. This approach provides resilience and strong security, keeping data accessible and confidential even in untrusted environments.
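
Below is a minimal sketch of the key-splitting idea using Shamir secret sharing over a prime field. The prime, threshold, and share count are illustrative, and a production MPC network would use audited libraries and would not reconstruct the key in one place outside a TEE.

```python
import secrets

# A Mersenne prime larger than any 256-bit key (illustrative choice).
PRIME = 2**521 - 1

def split_key(key: bytes, n: int, k: int) -> list[tuple[int, int]]:
    """Split a key into n shares, any k of which can reconstruct it."""
    secret = int.from_bytes(key, "big")
    assert secret < PRIME
    coeffs = [secret] + [secrets.randbelow(PRIME) for _ in range(k - 1)]

    def poly(x: int) -> int:
        acc = 0
        for c in reversed(coeffs):
            acc = (acc * x + c) % PRIME
        return acc

    return [(x, poly(x)) for x in range(1, n + 1)]

def recover_key(shares: list[tuple[int, int]], key_len: int = 32) -> bytes:
    """Lagrange-interpolate the sharing polynomial at x = 0 to recover the secret."""
    secret = 0
    for xi, yi in shares:
        num, den = 1, 1
        for xj, _ in shares:
            if xj == xi:
                continue
            num = (num * -xj) % PRIME
            den = (den * (xi - xj)) % PRIME
        secret = (secret + yi * num * pow(den, -1, PRIME)) % PRIME
    return secret.to_bytes(key_len, "big")

if __name__ == "__main__":
    key = secrets.token_bytes(32)          # e.g. a key generated inside the TEE
    shares = split_key(key, n=5, k=3)      # distributed to 5 MPC nodes
    assert recover_key(shares[:3]) == key  # any 3 shares suffice
    print("key recovered from 3 of 5 shares")
```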

Another option is for the TEE to send the relevant transactions to a set of MPC servers that each sign independently; the signatures are then aggregated and the transaction is submitted on-chain. This approach is far less flexible, though: it cannot be used to store API keys, passwords, or arbitrary data (without a trusted third-party storage service).

Reduce Attack Surface

For security-critical use cases it is worth sacrificing developer experience to minimize peripheral dependencies. For example, Dstack ships a minimal Yocto-based kernel that includes only the modules Dstack needs. It may even be worth preferring older technology like SGX over TDX, since SGX does not require a boot loader or operating system to be part of the TEE.

Physical Isolation

Physically isolating the TEE from possible human intervention further strengthens its security. We can rely on data centers and cloud providers to host TEE servers and trust them to provide physical security, but projects like Spacecoin are exploring a rather interesting alternative: space. The SpaceTEE paper proposes security measures such as measuring the satellite's moment of inertia after launch to verify that it has not deviated from expectations on its way into orbit.

Multiple Provers

Just as Ethereum relies on multiple client implementations to reduce the risk that a single bug takes down the whole network, a multi-prover setup uses different TEE implementations to improve security and resilience. By running the same computation across several TEE platforms, it ensures that a vulnerability in one TEE implementation does not compromise the entire application. The approach requires the computation to be deterministic (or requires defining consensus among the TEE implementations when it is not), but it brings fault isolation, redundancy, and cross-verification, making it a good choice for applications that need strong reliability guarantees (a minimal cross-check sketch follows).
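
A minimal sketch of the cross-check for a deterministic computation, assuming each platform's result has already been extracted from a verified attestation; the platform names, result type, and quorum size are illustrative.

```python
from collections import Counter

def cross_check(results_by_platform: dict[str, str], quorum: int) -> str:
    """Accept a result only if at least `quorum` independent TEE platforms agree."""
    counts = Counter(results_by_platform.values())
    result, votes = counts.most_common(1)[0]
    if votes < quorum:
        raise RuntimeError(f"no quorum among TEE platforms: {dict(counts)}")
    return result

if __name__ == "__main__":
    # Each value would come from that platform's attested output.
    results = {"intel-tdx": "0xabc123", "amd-sev": "0xabc123", "aws-nitro": "0xabc123"}
    print(cross_check(results, quorum=2))
```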

Looking to the future

TEEs have clearly become a very exciting field. As mentioned earlier, the ubiquity of AI and its constant access to sensitive user data means that large tech companies such as Apple and NVIDIA are building TEEs into their products and offering them as part of their platforms.

On the other hand, the crypto community has always been intensely focused on security. As developers try to expand on-chain applications and use cases, TEEs have gained popularity as a solution that strikes the right balance between functionality and trust assumptions. While TEEs are not as trust-minimized as pure ZK solutions, we expect them to be the path along which Web3 companies and the products of large tech companies gradually converge.
