When your browser becomes an agent

Author: Mario Chow & Figo @IOSG

Introduction

In the past 12 months, the relationship between web browsers and automation has changed dramatically. Nearly every major tech company is racing to build autonomous browser agents. The trend became unmistakable from late 2024 onward: OpenAI launched Agent mode (initially Operator) in January 2025, Anthropic released the "Computer Use" capability for Claude, Google DeepMind introduced Project Mariner, Opera announced its agentic browser Neon, and Perplexity AI launched the Comet browser. The signal is clear: the future of AI lies in agents that can autonomously navigate the web.

This trend is not just about adding smarter chatbots to browsers; it is a fundamental shift in how machines interact with the digital environment. Browser agents are a class of AI systems that can "see" web pages and take actions, clicking links, filling out forms, scrolling, and typing text, much like a human user. This model promises to unlock enormous productivity and economic value, because it can automate tasks that still require human intervention or are too complicated for traditional scripts to handle.

▲ GIF demonstration: an AI browser agent in action, following instructions to navigate to a target dataset page, take screenshots, and extract the required data.

Who will win the AI browser war?

Almost all major tech companies (as well as some startups) are developing their own browser AI agent solutions. Here are some of the most representative projects:

OpenAI – Agent Mode

OpenAI's Agent mode (formerly known as Operator, launched in January 2025) is an AI agent with a built-in browser. Operator can handle a variety of repetitive online tasks, such as filling out web forms, ordering groceries, and scheduling meetings, all accomplished through the standard web interfaces humans use.

▲ AI agents schedule meetings like professional assistants: check calendars, find available time slots, create events, send confirmations, and generate .ics files for you.

Anthropic – Claude's "Computer Use"

In late 2024, Anthropic introduced a new "Computer Use" capability for Claude 3.5, enabling it to operate computers and browsers like a human: Claude can see the screen, move the cursor, click buttons, and type text. It was the first agent tool of its kind from a major model provider to enter public beta, letting developers have Claude navigate websites and applications automatically. Anthropic positions it as an experimental feature, aimed primarily at automating multi-step workflows on the web.

Perplexity – Comet

AI startup Perplexity, known for its Q&A engine, launched the Comet browser in mid-2025 as an AI-driven alternative to Chrome. At its core, Comet features a conversational AI search engine built into the address bar (omnibox), capable of providing instant answers and summaries instead of traditional search links.

Comet also ships with a built-in Comet Assistant, a resident sidebar agent that can automatically perform everyday tasks across websites. It can, for example, summarize your open emails, schedule meetings, manage browser tabs, or browse and fetch web information on your behalf.

Comet aims to integrate browsing seamlessly with AI assistance by letting the agent perceive the current page's content through the sidebar interface.

Real-world application scenarios for browser agents

So far we have reviewed how major technology companies (OpenAI, Anthropic, Perplexity, and others) package browser-agent capabilities into different products. To better understand their value, let's look at how these capabilities apply to everyday life and business workflows.

Routine web automation

E-commerce and personal shopping

A very practical scenario is to delegate shopping and booking tasks to an agent. The agent can automatically fill your online shopping cart and place orders based on a fixed list, or search for the lowest prices among multiple retailers and complete the checkout process on your behalf.

For travel, you can have AI perform tasks like this: "Help me book a flight to Tokyo next month (with a ticket price under $800), and also book a hotel with free Wi-Fi." The agent will handle the entire process: searching for flights, comparing options, filling in passenger information, and completing the hotel reservation, all done through airline and hotel websites. This level of automation far surpasses existing travel bots: it’s not just recommendations, but direct execution of purchases.

Improving office efficiency

Agents can automate many repetitive business operations that people perform in their browsers. For example, organizing emails and extracting to-do items, or checking for openings across multiple calendars and automatically scheduling meetings. Perplexity's Comet assistant can already summarize your inbox content through a web interface or add events to your schedule. Agents can also log into SaaS tools to generate regular reports, update spreadsheets, or submit forms, once they have your authorization. Imagine an HR agent that can automatically log into different job boards to post positions; or a sales agent that can update lead data in a CRM system. These everyday mundane tasks would typically consume a lot of employee time, but AI can accomplish them by automating web forms and page operations.

Beyond single tasks, agents can chain complete workflows across multiple web systems: logging into various dashboards to troubleshoot issues, or orchestrating processes such as onboarding a new employee (creating accounts on several SaaS sites). Because every step runs through a different web interface, this is exactly where browser agents shine. Essentially, any multi-step operation that currently requires opening multiple websites can be delegated to an agent.

Current challenges and limitations

Despite their great potential, today's browser agents are still far from perfect. Current implementations expose some long-standing technical and infrastructure challenges:

Architecture mismatch

The modern web was designed for human-operated browsers and has gradually evolved to actively resist automation. Data is often buried in HTML/CSS optimized for visual presentation, gated behind interactive gestures (hovering, swiping), or reachable only through undocumented APIs.

On top of this, anti-scraping and anti-fraud systems add further barriers. These tools combine IP reputation, browser fingerprinting, JavaScript challenge responses, and behavioral analysis (randomness of mouse movements, typing rhythm, dwell time). Ironically, the more "perfect" and efficient an AI agent behaves, filling out forms instantly and never making mistakes, the more likely it is to be flagged as malicious automation. This can lead to hard failures: an agent from OpenAI or Google may complete every step up to checkout, only to be blocked by a CAPTCHA or a secondary security filter.

Human-optimized interfaces combined with bot-hostile defense layers force agents into a fragile "human imitation" strategy. The approach fails often and has a low success rate: without human intervention, the completion rate for full transactions is still under one-third.
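
To make this concrete, below is a minimal sketch of the kind of "human imitation" an agent might layer on top of a general automation library such as Playwright. It is illustrative only: the URL, selectors, and timings are placeholder assumptions, not taken from any product mentioned above.

```typescript
import { chromium } from 'playwright';

// Naive "human imitation": random pauses, per-keystroke delays, and stepped
// mouse movement so the session looks less like scripted automation.
// The URL and selectors below are placeholders for illustration only.
const randomPause = (min: number, max: number) =>
  new Promise((resolve) => setTimeout(resolve, min + Math.random() * (max - min)));

async function humanLikeCheckout() {
  const browser = await chromium.launch({ headless: false });
  const page = await browser.newPage();

  await page.goto('https://example-shop.test/checkout');
  await randomPause(800, 2000);                      // simulate "reading time"

  await page.mouse.move(200, 300, { steps: 25 });    // move the cursor in steps
  await page.click('#email');
  await page.type('#email', 'user@example.com', { delay: 120 }); // per-key delay

  await randomPause(500, 1500);
  await page.click('#place-order');
  // Even with this pacing, IP reputation, fingerprinting, and CAPTCHAs can
  // still block the flow, which is why unattended success rates stay low.
  await browser.close();
}

humanLikeCheckout().catch(console.error);
```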

Trust and security concerns

Giving agents full control usually requires access to sensitive information: login credentials, cookies, two-factor authentication tokens, even payment details. This raises understandable concerns for both users and businesses:

What happens if the agent makes a mistake or is deceived by a malicious website?

If an agent agrees to terms of service or executes a transaction, who is responsible?

Given these risks, current systems generally take a cautious approach:

Google's Mariner will not enter credit card details or accept terms of service; it hands those steps back to the user.

OpenAI's Operator prompts the user to take over for logins or CAPTCHA challenges.

Anthropic's Claude-driven agents may refuse to log in outright, citing security concerns.

The result: frequent pauses and handovers between AI and human undermine the seamless automation these agents promise.

Despite these obstacles, progress remains rapid. Companies like OpenAI, Google, and Anthropic learn from failures with each iteration. As demand grows, a kind of "co-evolution" is likely: websites become more agent-friendly where it benefits them, while agents keep improving their ability to mimic human behavior and get past existing barriers.

Methods and Opportunities

Current browser agents face two very different realities: on one side, the hostile Web2 environment, where anti-scraping and security defenses are everywhere; on the other, the open Web3 environment, where automation is often encouraged. This difference shapes the direction of the various solutions.

The solutions below fall roughly into two categories: those that help agents cope with the hostile Web2 environment, and those that are native to Web3.

Although the challenges facing browser agents remain significant, new projects keep emerging to address them directly. The cryptocurrency and decentralized finance (DeFi) ecosystem is becoming a natural testing ground because it is open, programmable, and far less hostile to automation. Open APIs, smart contracts, and on-chain transparency eliminate many of the friction points common in the Web2 world.

Below are four types of solutions, each addressing one or more of today's core limitations.

Agent-native browsers for on-chain operations

These browsers are designed from the ground up to be driven by autonomous agents and are deeply integrated with blockchain protocols. Unlike a stock Chrome setup, which needs extra dependencies such as Selenium, Playwright, or wallet plugins to automate on-chain operations, agent-native browsers expose direct APIs and trusted execution paths for agents to call.

In decentralized finance, transaction validity depends on cryptographic signatures rather than on whether the user looks "human." In on-chain environments, agents can therefore skip the CAPTCHAs, fraud scores, and device-fingerprint checks common in the Web2 world. If pointed at Web2 sites such as Amazon, however, these browsers gain no special advantage and will still trigger the usual anti-bot defenses.
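
For contrast, here is a rough sketch of what "validity by signature" means in practice: an agent can construct, sign, and submit a transaction directly over RPC, with no web UI in the loop. This uses the generic ethers.js library rather than any particular agent browser's API; the RPC endpoint, token and spender addresses, and key handling are placeholder assumptions.

```typescript
import { ethers } from 'ethers';

// A generic on-chain action executed without any web UI: no DOM parsing,
// no wallet pop-up, no CAPTCHA. Validity depends only on the signature.
// RPC URL, private key source, and contract addresses are placeholders.
const provider = new ethers.JsonRpcProvider('https://rpc.example-chain.test');
const wallet = new ethers.Wallet(process.env.AGENT_PRIVATE_KEY!, provider);

// Minimal ERC-20 interface for an illustrative approval step.
const erc20 = new ethers.Contract(
  '0x0000000000000000000000000000000000000001', // placeholder token address
  ['function approve(address spender, uint256 amount) returns (bool)'],
  wallet
);

async function approveSpender() {
  // The agent maps a high-level intent ("let the router spend 100 tokens")
  // straight to a protocol call instead of clicking through a front end.
  const tx = await erc20.approve(
    '0x0000000000000000000000000000000000000002', // placeholder spender address
    ethers.parseUnits('100', 6)
  );
  console.log('submitted:', tx.hash);
  await tx.wait(); // confirmation depends on the signature, not on "human-ness"
}

approveSpender().catch(console.error);
```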

The value of agent-native browsers is not some magical ability to access every website; it lies in:

Native blockchain integration: built-in wallet and signature support, no need to go through MetaMask pop-ups or parse the dApp front-end DOM.

Automation-first design: provides stable high-level instructions that can be directly mapped to protocol operations.

Security model: fine-grained access control and sandboxing keep private keys safe during automation.

Performance optimization: multiple on-chain calls can execute in parallel, with no browser rendering or UI latency.

Case study: Donut

Donut integrates blockchain data and operations as first-class citizens. Users (or their agents) can hover to view real-time risk indicators of tokens or directly input natural language commands like "/swap 100 USDC to SOL". By bypassing the hostile friction points of Web2, Donut allows agents to operate at full speed in DeFi, enhancing liquidity, arbitrage, and market efficiency.
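
As a rough illustration of how such a slash command could be mapped to a structured protocol operation, consider the sketch below. It is hypothetical and not Donut's actual implementation; the command grammar and the SwapIntent shape are invented for explanation.

```typescript
// Hypothetical parser turning a "/swap" command into a structured intent
// that an agent-native browser could hand to a DEX routing layer.
interface SwapIntent {
  action: 'swap';
  amount: number;
  fromToken: string;
  toToken: string;
}

function parseSwapCommand(input: string): SwapIntent | null {
  // Matches commands like "/swap 100 USDC to SOL"
  const match = input.trim().match(/^\/swap\s+([\d.]+)\s+(\w+)\s+to\s+(\w+)$/i);
  if (!match) return null;
  return {
    action: 'swap',
    amount: Number(match[1]),
    fromToken: match[2].toUpperCase(),
    toToken: match[3].toUpperCase(),
  };
}

console.log(parseSwapCommand('/swap 100 USDC to SOL'));
// -> { action: 'swap', amount: 100, fromToken: 'USDC', toToken: 'SOL' }
```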

Verifiable and trusted agent execution

Granting agents sensitive permissions carries significant risk. Solutions in this category use Trusted Execution Environments (TEEs) or zero-knowledge proofs (ZKPs) to attest to an agent's expected behavior before execution, letting users and counterparties verify agent actions without exposing private keys or credentials.

Case study: Phala Network

Phala uses TEEs (such as Intel SGX) to isolate and protect the execution environment, thereby preventing Phala operators or attackers from spying on or tampering with agent logic and data. A TEE is like a hardware-backed "safe room," ensuring confidentiality (invisible to outsiders) and integrity (unable to be modified by outsiders).

For browser agents, this means an agent can log in, hold session tokens, or process payment information while that sensitive data never leaves the secure vault. Even if the user's machine, operating system, or network is compromised, the data cannot leak. This directly addresses one of the biggest obstacles to agent adoption: trust around sensitive credentials and operations.
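
Conceptually, the trust boundary looks something like the sketch below: the agent only ever holds an opaque handle, while the raw credential stays inside the protected environment. This is an illustrative model, not Phala's SDK; every name in it is an assumption, and the in-memory vault merely stands in for what real TEE hardware would enforce.

```typescript
// Illustrative sketch of the trust boundary a TEE provides. This is NOT
// Phala's API; all names and shapes here are assumptions for explanation.
interface EnclaveVault {
  storeCredential(secret: string): Promise<{ handle: string }>;
  useCredential(handle: string, action: 'login' | 'sign', target: string): Promise<string>;
}

// In-memory stand-in so the sketch runs; a real TEE (e.g. SGX-backed) would
// enforce this boundary in hardware rather than in process memory.
class MockVault implements EnclaveVault {
  private secrets = new Map<string, string>();

  async storeCredential(secret: string) {
    const handle = `cred-${this.secrets.size + 1}`;
    this.secrets.set(handle, secret);
    return { handle };
  }

  async useCredential(handle: string, action: 'login' | 'sign', target: string) {
    if (!this.secrets.has(handle)) throw new Error('unknown handle');
    // The secret is used here, inside the "enclave"; only a result leaves.
    return `${action}-result-for-${target}`;
  }
}

async function agentLogin() {
  const vault: EnclaveVault = new MockVault();
  const { handle } = await vault.storeCredential('user-password-or-api-key');
  // The agent sees only the handle and the action result, never the secret.
  return vault.useCredential(handle, 'login', 'https://example-service.test');
}

agentLogin().then(console.log).catch(console.error);
```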

Decentralized structured data networks

Modern anti-bot systems do not just check whether requests are "too fast" or "automated"; they combine IP reputation, browser fingerprints, JavaScript challenge responses, and behavioral analysis (cursor movement, typing rhythm, session history). Agents connecting from data-center IPs, or from perfectly reproducible browsing environments, are easy to spot.

To address this, networks of this type stop scraping pages optimized for humans and instead collect and serve machine-readable data directly, or route traffic through real human browsing environments. This sidesteps the parsing and anti-scraping stages where traditional crawlers are weakest, giving agents cleaner and more reliable inputs.

By routing agent traffic through these real-world sessions, the distributed network lets AI agents access web content the way humans do, without immediately triggering a block.
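
In practice, routing an agent's session through such a network can look as simple as pointing the automation browser at a residential proxy endpoint. A minimal Playwright sketch follows; the gateway address and credentials are placeholders, not any specific provider's configuration.

```typescript
import { chromium } from 'playwright';

// Minimal sketch: launch the agent's browser with traffic routed through a
// residential proxy endpoint. Server address and credentials are placeholders.
async function browseViaResidentialProxy() {
  const browser = await chromium.launch({
    proxy: {
      server: 'http://residential-gateway.example.test:8000',
      username: process.env.PROXY_USER ?? '',
      password: process.env.PROXY_PASS ?? '',
    },
  });

  const page = await browser.newPage();
  await page.goto('https://example.com/pricing');

  // Requests now egress from a residential IP, which helps with IP-reputation
  // checks; fingerprint and behavioral checks still apply.
  console.log(await page.title());
  await browser.close();
}

browseViaResidentialProxy().catch(console.error);
```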

Case studies

Grass: A decentralized data/DePIN network in which users share idle residential bandwidth, providing geographically diverse, residential-grade access for public web data collection and model training.

WootzApp: An open-source mobile browser that supports cryptocurrency payments, featuring a backend proxy and zero-knowledge identity; it gamifies AI/data tasks for consumers.

Sixpence: A distributed browser network that routes AI-agent traffic through browsing sessions contributed by participants worldwide.

This is not a complete solution, however. Behavioral detection (mouse and scroll trajectories), account-level restrictions (KYC, account age), and fingerprint consistency checks can still trigger blocks. Distributed networks are therefore best viewed as a base layer of anonymity that must be combined with human-like execution strategies to be fully effective.

Web Standards for Agents (Prospective)

A growing number of technology communities and standards bodies are exploring a question: if web users are no longer only humans, how should websites handle automated agents safely and compliantly?

This has sparked discussion of emerging standards and mechanisms that would let websites explicitly declare "trusted agents are allowed here" and offer a secure channel for interaction, instead of intercepting every agent as a "bot attack" by default, as happens today.

"Agent Allowed" tag: Just like the robots.txt that search engines comply with, future web pages may include a tag in the code that tells the browser agent "this can be accessed safely." For example, if you use a proxy to book a flight, the website will not pop up a bunch of CAPTCHA challenges, but will directly provide an authenticated interface.

API Gateway for Certified Agents: Websites can open dedicated access for verified agents, like a "fast lane." Agents do not need to simulate human clicks or inputs, but instead take a more stable API route to complete orders, payments, or data queries.

W3C Discussion: The World Wide Web Consortium (W3C) is researching how to establish a standardized channel for "managed automation." This means that in the future, we may have a set of globally applicable rules allowing trusted agents to be recognized and accepted by websites, while maintaining security and accountability.
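
To make these ideas concrete, here is a purely hypothetical sketch of what an agent policy and its discovery might look like. None of this is an adopted standard: the /.well-known path, field names, and flow are invented for illustration.

```typescript
// Hypothetical "agent policy" a site might publish, in the spirit of robots.txt.
// Every field name here is invented; no such standard exists yet.
interface AgentPolicy {
  agentsAllowed: boolean;           // analogous to an "Agent Allowed" tag
  certifiedApiEndpoint?: string;    // a "fast lane" for verified agents
  allowedActions: string[];         // e.g. ["search", "book", "pay"]
}

// Hypothetical discovery: fetch the policy from a well-known path and decide
// whether to use the certified API route or fall back to the regular UI.
async function resolveAgentRoute(origin: string): Promise<string> {
  const res = await fetch(`${origin}/.well-known/agent-policy.json`); // invented path
  if (!res.ok) return origin; // no policy published: treat it as a normal site

  const policy = (await res.json()) as AgentPolicy;
  if (policy.agentsAllowed && policy.certifiedApiEndpoint) {
    // A real standard would also define how the agent proves its identity,
    // e.g. via a signed token sent in a request header.
    return policy.certifiedApiEndpoint;
  }
  return origin;
}

resolveAgentRoute('https://airline.example.test').then(console.log);
```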

These explorations are still early, but once implemented they could greatly improve the human ↔ agent ↔ website relationship. Imagine agents no longer straining to mimic human mouse movements to "fool" risk controls, and instead completing tasks openly through an officially sanctioned channel.

Crypto-native infrastructure may lead the way here, because on-chain applications already rely on open APIs and smart contracts that are friendly to automation. Traditional Web2 platforms, especially those that depend on advertising or anti-fraud systems, are likely to stay defensive. But as users and businesses come to value the efficiency gains of automation, these standardization efforts could become key catalysts pushing the whole internet toward an "agent-first architecture."

Conclusion

Browser agents are evolving from simple conversational tools into autonomous systems capable of completing complex online workflows. This shift reflects a broader trend: embedding automation directly into the core interface through which users interact with the internet. The productivity potential is immense, but so are the challenges, including overcoming entrenched anti-bot mechanisms and ensuring security, trust, and responsible use.

In the short term, better agent reasoning, faster execution, tighter integration with existing services, and advances in distributed networks should gradually improve reliability. In the long term, "agent-friendly" standards may take hold wherever automation benefits both service providers and users. The transition will not be uniform, though: adoption will be faster in automation-friendly environments like DeFi, and slower on Web2 platforms that rely heavily on controlling user interaction.

Going forward, competition among technology companies will increasingly turn on how their agents navigate real-world constraints, whether they can be integrated safely into critical workflows, and whether they deliver consistent results across diverse online environments. Whether all of this ultimately reshapes the "browser wars" will depend not on technical strength alone, but on the ability to build trust, align incentives, and prove tangible value in everyday use.
