How does Codex use computers? Three access points and permission boundaries


> Original Title: Three Ways Codex Can Use a Computer
>
> Original Author: Jason
>
> Translation: Peggy, BlockBeats
>
> Editor's Note: This article outlines three entry points for Codex to operate outside its environment: Computer Use, the Chrome extension, and the in-app Browser. While all three seem to address the issue of "letting Codex use a computer," they correspond to different task scenarios, permission boundaries, and levels of trust.
>
> Among them, Computer Use covers the broadest scope. On macOS and Windows it allows direct control over authorized native applications, system settings, and iOS simulators, and it can even complete workflows across multiple applications. It is suitable for GUI processes without API, plugin, or structured tool support, but at the cost of slower speed and the widest permission boundary. The Chrome extension is ideal for tasks relying on login states, cookies, multiple tabs, and browser identity, such as Gmail, LinkedIn, Salesforce, internal backends, or logged-in research across multiple sites. The in-app Browser leans more toward development and debugging scenarios, and is especially suited for local services, visual bugs, responsive layouts, and design annotations; it does not inherit the user's normal browsing login state and has narrower capabilities, but offers stronger isolation.
>
> The core judgment of the article is that Codex is not limited to a single "way to use a computer." What matters is choosing the narrowest, safest, and most structured interface for the task. If plugins or MCP can do the job, use them before visual control; if the task only involves web development, prioritize the in-app Browser; when the user's browser identity and login status are needed, switch to Chrome; only when structured tools cannot cover the task and desktop GUI interaction is essential should Computer Use serve as the last mile.
>
> Appshots are not a fourth way to control the computer but a tool to "show" the current screen context to Codex.
> They solve the context input problem, while Browser, Chrome, and Computer Use address action execution. Seen together, this layered approach reveals a key aspect of AI agent productization: not granting models unlimited permissions but narrowing permissions, clarifying boundaries, and allowing users to retain oversight of key actions.

Below is the original text:

There are three ways for Codex to use a computer: Computer Use, the Chrome extension, and the in-app Browser. They overlap in places, which can be confusing.

After reading this article, you'll know how to install and trigger each of the three, when to use each one, how Appshots and Developer Mode connect them, and what to include in AGENTS.md so Codex can choose the appropriate interface itself.

The simplified version is:

![](https://img-cdn.gateio.im/social/moments-eadd666417-f3d504dfcf-8b7abd-62a40f)

That said, whenever possible, prefer plugins or MCP. For example, a Slack plugin can retrieve a thread more precisely than clicking around in Slack, and actions generated by a GitHub plugin are easier to verify than having Codex drive the webpage. Visual control is best used where structured tools reach their limits.

Everything can be @Computer
===========================

Computer Use is the broadest of these three interfaces. It enables Codex to view and manipulate graphical interfaces on macOS and Windows, including windows, menus, keyboard input, and the clipboard within authorized applications.

It is usually the slowest. Structured plugins can call APIs directly; Computer Use has to observe the interface, decide where to click, wait for the application to respond, and then check the next state. This visual loop takes time, but it allows Codex to operate applications that have no available API.

On macOS, slow does not have to mean disruptive. Computer Use can operate authorized applications in the background while you continue using other parts of your computer.
Often, I open an app while using Codex, only to find Codex quietly completing a workflow there in the background.

Depending on which applications you have installed and authorized, the targets can include Spotify, Xcode, System Settings, iOS simulators, or even your iPhone through iPhone Mirroring. It can switch between multiple applications and handle workflows that span them.

Use it when the task involves:

- Native desktop applications like Spotify or financial apps
- iOS simulators, iPhone Mirroring, or other workflows requiring a graphical interface
- System or application settings
- Data sources without plugins or APIs
- Workflows involving switching between multiple applications
- The last step missing from a structured integration

Installation: Open Codex Settings > Computer Use, then click Install.

Trigger: Mention @Computer, or explicitly ask Codex to use Computer Use. As model capabilities improve, Codex will invoke it automatically when needed.

Here are some example scenarios:

My favorite example started with a package being stolen. Amazon told me I'd need about 25 minutes to reach customer service. I handed the chat to a Codex thread using Computer Use, which checked the chat window every five minutes until a customer service agent appeared, then switched to checking every minute while it tried to get a refund. When I returned from a shower, the refund was completed.

> Use @Computer to open Spotify, find my Discover Weekly playlist, and start it. Do not change my account or subscription settings.

> Use @Computer to open iPhone Mirroring, reproduce the onboarding bug in the iOS app, and take a screenshot of the failing state. Fix the smallest relevant code path, then run the same flow again.

I also use Computer Use as the "last mile" in structured workflows. During a video release, Codex could read feedback from Slack, modify code, and render a new video, but the Slack integration in that thread couldn't upload files. So, Computer Use clicked Add File to complete this missing step.
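
The customer-service story above follows a simple two-rate polling pattern: check rarely while waiting in the queue, then check often once someone is on the other end. The sketch below only illustrates that timing logic. The names `PollPlan`, `next_check_delay`, and `schedule` are hypothetical, and nothing here reflects how Codex or Computer Use is actually implemented.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PollPlan:
    """Hypothetical polling intervals, taken from the story in the text."""
    idle_seconds: int = 300    # every five minutes while waiting in the queue
    active_seconds: int = 60   # every minute once an agent has joined


def next_check_delay(agent_present: bool, plan: PollPlan = PollPlan()) -> int:
    """Return how long to wait before looking at the chat window again."""
    return plan.active_seconds if agent_present else plan.idle_seconds


def schedule(observations: list[bool]) -> list[int]:
    """Map a sequence of 'is an agent present?' observations to check times.

    Each returned value is the time, in seconds from the start, at which
    the corresponding observation would be made.
    """
    elapsed = 0
    times = []
    for agent_present in observations:
        times.append(elapsed)
        elapsed += next_check_delay(agent_present)
    return times


if __name__ == "__main__":
    # Two checks with nobody there, then an agent shows up.
    print(schedule([False, False, True, True]))
```

The slow interval keeps the visual loop cheap during a long wait, and the fast interval keeps the conversation responsive once it starts.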
Computer Use also has the widest trust boundary among the three. Give it one specific app or workflow at a time. Close sensitive apps that are not part of the task, check permission prompts carefully, and supervise anything involving finance, accounts, payments, credentials, privacy, or system security changes.

Use @Chrome to handle multiple tabs and login states
====================================================

The Codex Chrome extension allows Codex to access your logged-in Chrome state. Use it when a task depends on account info, cookies, browser profiles, or tabs that are already open and authenticated.

This interface suits workflows in tools like:

- Gmail or LinkedIn
- Salesforce or customer support backends
- Internal dashboards
- Logged-in research across multiple sites
- Forms relying on your account or browser extensions

Installation: Open Codex Plugins, add Chrome, and follow the setup steps. Codex will guide you to install the Codex Chrome extension and approve Chrome permissions. When the extension shows Connected, start a new thread.

Trigger: Mention @Chrome, or explicitly ask Codex to use your logged-in Chrome browser:

> Use @Chrome to review the open customer account, compare it with the support ticket in the other tab, and draft the missing fields. Stop before submitting.

Chrome tasks run within tab groups, which helps keep related tabs together. Unlike the in-app Browser, this interface carries your browser identity, which makes it both more powerful and more sensitive.

Another major advantage is multi-tab control. Chrome can associate multiple tabs with the same task, reading context from one page, cross-referencing info on another, and continuing the workflow on a third. Computer Use can also drive browsers visually, but Chrome understands the task as a browser workflow, not just a sequence of screen coordinates.

Recently, I handed a tab with an open Strudel Composer page to Codex, asking it to make the music more interesting. Chrome provided the selected tab and the WebMCP tools exposed on that page.
Codex analyzed the musical structure, rewrote harmonies and overall form over four minutes, adjusted tempo, saved the track, and continued playback. It didn't need to visually search for each control because Chrome combined the tab context with the page's structured capabilities.

I also used it to run a long-lived Codex thread for Twitter. The general instructions were:

> Every day, use Chrome to check my DMs, read relevant news, and look for feedback or mentions I should know about. Add anything durable to my vault. Do not post or send messages.

What's interesting isn't just that Codex can open Twitter, but that this thread can stay logged into the same environment long-term, connecting findings to local files and leaving reviewable results.

The trust boundary here is crucial. Websites may interpret Codex's clicks, form submissions, and sent messages as your actions. Web content itself is untrusted input. Separate high-stakes steps clearly: research, navigation, and drafting can be automated, but sending, publishing, purchasing, or submitting should wait for your review.

If the entire task is browser-based, prioritize Chrome over Computer Use. Chrome provides the native browser context needed for such tasks without expanding access to the entire desktop.

Use in-app @Browser to develop your website
===========================================

The in-app Browser is a browser embedded within a Codex thread. It shares the same rendered page with Codex, which makes it especially suited for building and debugging web applications.

I usually start with it for:

- Local development servers
- File-based preview pages
- Public pages that don't require login
- Reproducing visual bugs
- Checking responsive layouts
- Leaving design feedback on page elements

Its key constraint is isolation. The in-app Browser doesn't use your regular browser profile, cookies, extensions, login sessions, or existing tabs. When the task involves accounts, this is a limitation; when it doesn't, it's a useful boundary.
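
The "narrowest interface first" rule that runs through these sections can be written down as a small decision function. This is a minimal sketch of the article's guidance, not Codex's actual routing logic: the `Task` fields, the `Interface` enum, and `choose_interface` are hypothetical names introduced only for illustration.

```python
from dataclasses import dataclass
from enum import Enum


class Interface(Enum):
    PLUGIN_OR_MCP = "plugin or MCP"
    BROWSER = "@Browser"
    CHROME = "@Chrome"
    COMPUTER_USE = "@Computer"


@dataclass(frozen=True)
class Task:
    """Hypothetical description of a task, for illustration only."""
    has_structured_tool: bool = False  # a plugin or MCP server covers the task
    is_web_only: bool = False          # the whole task happens on web pages
    needs_login_state: bool = False    # relies on cookies, sessions, or open tabs


def choose_interface(task: Task) -> Interface:
    """Pick the narrowest interface that can still complete the task."""
    if task.has_structured_tool:
        # Structured tools are more precise and easier to verify.
        return Interface.PLUGIN_OR_MCP
    if task.is_web_only and not task.needs_login_state:
        # Isolated browser: no profile, cookies, or existing tabs.
        return Interface.BROWSER
    if task.is_web_only:
        # Browser identity is required, but not the whole desktop.
        return Interface.CHROME
    # Desktop GUI interaction is the last resort.
    return Interface.COMPUTER_USE


if __name__ == "__main__":
    print(choose_interface(Task(is_web_only=True)).value)
    print(choose_interface(Task(is_web_only=True, needs_login_state=True)).value)
    print(choose_interface(Task()).value)
```

The order of the checks mirrors the trust boundaries described in the text: each later branch grants broader access than the one before it.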
Setup: Open Codex Plugins, add the Browser plugin, and enable it.

Trigger: Mention @Browser in your prompt, or explicitly ask Codex to use the in-app Browser:

> Use @Browser to open vite app on http://localhost:3000/, reproduce the mobile overflow bug, fix it, and verify the same route on desktop and mobile widths.

This creates a tight feedback loop: Codex can edit code, manipulate pages, check rendering, take screenshots, and re-verify after fixes.

My favorite feature is annotation. When reviewing a local app, I can click elements directly or select areas to leave comments. Style controls let me preview and give precise feedback on text, fonts, spacing, and colors. I often combine this with voice input and process guidance: review pages, leave comments, and queue additional suggestions while Codex processes current feedback. The page becomes a living specification.

This is especially useful for design work. I often ask Codex to turn an idea, research package, or project status into a single index.html file, then open it in the in-app Browser. Instead of describing the entire design in prompts, I can directly annotate: "This hierarchy is reversed," "Don't make this look like a card," "These controls need more space," or "Use this font scale site-wide." Codex receives comments with relevant screenshots and element context, modifies the file, then reopens the page for the next iteration.

> Create a single-file index.html for this project brief and open it in the in-app @Browser.

This cycle feels more like collaborating on a canvas with a designer than exchanging screenshots and descriptions.

The in-app Browser also works well as a starting point for hybrid workflows. In another thread, I opened a Tweet in the in-app Browser for Codex to investigate related discussions. The visible page helped it identify which Tweet I meant; then Codex switched to Twitter CLI, retrieving 38 replies, including nested replies hidden in the browser view.
This exemplifies the "use the narrowest interface" principle: confirm context visually in the browser, then perform deeper searches with structured tools.

There are trade-offs. The isolation of the in-app Browser makes it excellent for development but unsuitable for handling Google login, passkeys, or sites relying on browser extensions. When identity matters, switch to Chrome.

Appshots
========

An Appshot isn't a fourth way for Codex to control the computer. It's a method to point Codex to your current visual context.

On a Mac, pressing CMD twice captures the most recent window. Codex attaches an image and all available text to the thread. You can create an Appshot of an error, an email, a design, a settings panel, or a stranger's form, then simply say:

This is the mental model I find easiest to remember: Appshots are how you point to something on your computer; Browser, Chrome, and Computer Use are how Codex takes action.

Currently, Appshots are created via the Codex app on macOS. They capture the frontmost window, not the entire desktop. This makes them a useful way to provide focused context without granting control over the application.

How to follow these developments
================================

These interfaces evolve rapidly. If you want practical details rather than waiting for a big release summary:

- Follow Ari Weinstein (@AriX) for updates on Computer Use and Appshots
- Follow James Sun (@JamesZmSun) for Chrome-related content
- Follow Andrew Ambrosino (@ajambrosino) for Codex app releases and broader desktop product narratives
- Follow OpenAI Developers (@OpenAIDevs) for broader news on Codex and the OpenAI Platform
