OpenAI has just launched GPT-5.3-Codex, a new agentic coding model that extends Codex from writing and reviewing code to handling a broad range of work on a computer. The model combines the frontier coding performance of GPT-5.2-Codex with the reasoning and professional knowledge capabilities of GPT-5.2 in a single system, and it runs 25% faster for Codex users thanks to infrastructure and inference improvements.
For developers, GPT-5.3-Codex is positioned as a coding agent that can execute long-running tasks involving research, tool use, and complex execution, while remaining steerable ‘much like a colleague’ during a run.
Frontier agentic capabilities and benchmark results
OpenAI evaluates GPT-5.3-Codex on four key benchmarks that target real-world coding and agentic behavior: SWE-Bench Pro, Terminal-Bench 2.0, OSWorld-Verified, and GDPval.

On SWE-Bench Pro, a contamination-resistant benchmark built from real GitHub issues and pull requests across four languages, GPT-5.3-Codex reaches 56.8% with xhigh reasoning effort. This slightly improves on GPT-5.2-Codex and GPT-5.2 at the same effort level. Terminal-Bench 2.0, which measures the terminal skills that coding agents need, shows a larger gap: GPT-5.3-Codex reaches 77.3%, considerably higher than previous models.


On OSWorld-Verified, an agentic computer-use benchmark where agents complete productivity tasks in a visual desktop environment, GPT-5.3-Codex reaches 64.7%. Humans score around 72% on this benchmark, which gives a rough human-level reference point.
For professional knowledge work, GPT-5.3-Codex is evaluated with GDPval, an evaluation released in 2025 that measures performance on well-specified tasks across 44 occupations. GPT-5.3-Codex achieves 70.9% wins or ties on GDPval, matching GPT-5.2 at high reasoning effort. These tasks include creating presentations, spreadsheets, and other work products that align with typical professional workflows.
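A "wins or ties" figure is a simple ratio over graded comparisons. As a rough illustration only, the sketch assumes each GDPval task comparison is recorded with one of the hypothetical labels `"win"`, `"tie"`, or `"loss"`; the labels and the function name `win_or_tie_rate` are illustrative and are not part of any published GDPval tooling.

```python
from collections import Counter
from typing import Iterable

# Hypothetical outcome labels for each graded task comparison (illustrative only).
VALID_OUTCOMES = {"win", "tie", "loss"}


def win_or_tie_rate(outcomes: Iterable[str]) -> float:
    """Return the fraction of graded comparisons that are wins or ties."""
    counts = Counter(outcomes)
    unknown = set(counts) - VALID_OUTCOMES
    if unknown:
        raise ValueError(f"unexpected outcome labels: {sorted(unknown)}")
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no graded comparisons supplied")
    # A tie counts toward the headline metric just like a win.
    return (counts["win"] + counts["tie"]) / total


if __name__ == "__main__":
    # Made-up grades for ten tasks: 5 wins, 2 ties, 3 losses.
    grades = ["win"] * 5 + ["tie"] * 2 + ["loss"] * 3
    print(f"{win_or_tie_rate(grades):.1%}")  # prints 70.0%
```

Counting ties alongside wins means the metric credits any deliverable judged at least as good as the one it is compared against, which is why the result is reported as "wins or ties" rather than wins alone.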
A notable systems detail is that GPT-5.3-Codex achieves its results with fewer tokens than previous models, allowing users to “build more” within the same context and cost budgets.
Beyond coding: GDPval and OSWorld
OpenAI emphasizes that software developers, designers, product managers, and data scientists perform a wide range of tasks beyond code generation. GPT-5.3-Codex is built to support work across the software lifecycle: debugging, deployment, monitoring, writing PRDs, editing copy, running user research, tests, and metrics.
With custom skills similar to those used in prior GDPval experiments, GPT-5.3-Codex produces complete work products. Examples in the official OpenAI blog include financial advice slide decks, a retail training document, an NPV analysis spreadsheet, and a fashion presentation. Each GDPval task is designed by a domain expert and reflects realistic work from that occupation.


On OSWorld, GPT-5.3-Codex demonstrates stronger computer-use capabilities than previous GPT models. OSWorld-Verified requires the model to use vision to complete diverse tasks in a desktop environment, which aligns closely with how agents operate real applications and tools rather than only generating text.
An interactive collaborator in the Codex app
As models become more capable, OpenAI frames the main challenge as human supervision and control of many agents working in parallel. The Codex app is designed to make managing and directing agents easier, and with GPT-5.3-Codex it gains more interactive behavior.
Codex now provides frequent updates during a run so users can see key decisions and progress. Instead of waiting for a single final output, users can ask questions, discuss approaches, and steer the model in real time. GPT-5.3-Codex explains what it is doing and responds to feedback while preserving context. This ‘follow-up behavior’ can be configured in the Codex app settings.
A model that helped train and deploy itself
GPT-5.3-Codex is the first model in this family that was ‘instrumental in creating itself.’ OpenAI used early versions of GPT-5.3-Codex to debug its own training, manage deployment, and diagnose test results and evaluations.
The OpenAI research team used Codex to monitor and debug the training run, track patterns throughout the training process, analyze interaction quality, propose fixes, and build applications that visualize behavioral differences relative to prior models. The development team used Codex to optimize and adapt the serving harness, identify context rendering bugs, find the root causes of low cache hit rates, and dynamically scale GPU clusters to maintain stable latency under traffic surges.
During alpha testing, a researcher asked GPT-5.3-Codex to quantify the additional work completed per turn and its effect on productivity. The model generated regex-based classifiers to estimate clarification frequency, positive and negative responses, and task progress, then ran them over session logs and produced a report. Codex also helped build new data pipelines and richer visualizations when standard dashboard tools were insufficient, and summarized insights from thousands of data points in under 3 minutes.
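OpenAI has not published the classifiers the model wrote. As a minimal sketch of the general technique, assuming session logs are available as lists of user-turn strings, a keyword-regex classifier can tag each turn and then aggregate how often each label appears. The label names, the patterns, and the helpers `classify_turn` and `summarize_sessions` are all hypothetical.

```python
from __future__ import annotations

import re
from collections import Counter

# Hypothetical labels and keyword patterns; the real classifiers are not public.
PATTERNS = {
    "clarification": re.compile(r"\b(?:what do you mean|can you clarify|could you explain)\b", re.IGNORECASE),
    "positive": re.compile(r"\b(?:thanks|great|perfect|looks good)\b", re.IGNORECASE),
    "negative": re.compile(r"\b(?:wrong|doesn't work|not what i asked|broken)\b", re.IGNORECASE),
    "progress": re.compile(r"\b(?:done|fixed|tests pass(?:ed)?|merged)\b", re.IGNORECASE),
}


def classify_turn(text: str) -> set[str]:
    """Return every label whose pattern matches somewhere in this turn."""
    return {label for label, pattern in PATTERNS.items() if pattern.search(text)}


def summarize_sessions(sessions: list[list[str]]) -> dict[str, float]:
    """Return, for each label, the fraction of all turns that carry it."""
    counts: Counter[str] = Counter()
    total = 0
    for session in sessions:
        for turn in session:
            total += 1
            counts.update(classify_turn(turn))  # each label counted once per turn
    return {label: counts[label] / total if total else 0.0 for label in PATTERNS}


if __name__ == "__main__":
    # Made-up log: two sessions, two user turns each.
    demo = [
        ["Can you clarify the API?", "Thanks, looks good"],
        ["That's wrong", "The tests pass now"],
    ]
    for label, share in summarize_sessions(demo).items():
        print(f"{label}: {share:.0%}")  # each label prints 25% for this demo
```

Keyword regexes are cheap and easy to audit, but they are brittle: they miss paraphrases and can fire on negations (for example, "not broken" still matches the negative pattern), so numbers from a sketch like this are rough estimates rather than measurements.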
Cybersecurity capabilities and safeguards
GPT-5.3-Codex is the first model OpenAI classifies as ‘High capability’ for cybersecurity-related tasks under its Preparedness Framework, and the first model it has directly trained to identify software vulnerabilities. OpenAI states that it has no definitive evidence that the model can automate cyber attacks end-to-end, and it is taking a precautionary approach with its most comprehensive cybersecurity safety stack to date.
Mitigations include safety training, automated monitoring, trusted access for advanced capabilities, and enforcement pipelines that incorporate threat intelligence. OpenAI is launching a ‘Trusted Access for Cyber’ pilot, expanding the private beta of Aardvark, a security research agent, and offering free codebase scanning for widely used open-source projects such as Next.js, where Codex was recently used to identify vulnerabilities that were subsequently disclosed.
Key Takeaways
- Unified frontier model for coding and work: GPT-5.3-Codex combines the coding strength of GPT-5.2-Codex with the reasoning and professional capabilities of GPT-5.2 in a single agentic model, and runs 25% faster in Codex.
- State-of-the-art on coding and agent benchmarks: The model sets new highs on SWE-Bench Pro (56.8% at xhigh) and Terminal-Bench 2.0 (77.3%), and achieves 64.7% on OSWorld-Verified and 70.9% wins or ties on GDPval, typically with fewer tokens than previous models.
- Supports long-horizon web and app development: Using skills such as ‘develop web game’ and generic follow-ups like ‘fix the bug’ and ‘improve the game,’ GPT-5.3-Codex autonomously developed complex racing and diving games over millions of tokens, demonstrating sustained multi-step development capability.
- Instrumental in its own training and deployment: Early versions of GPT-5.3-Codex were used to debug the training run, analyze behavior, optimize the serving stack, build custom pipelines, and summarize large-scale alpha logs, making it the first Codex model ‘instrumental in creating itself.’
- High-capability cyber model with guarded access: GPT-5.3-Codex is the first OpenAI model rated ‘High capability’ for cyber and the first directly trained to identify software vulnerabilities. OpenAI pairs this with Trusted Access for Cyber, an expanded Aardvark beta, and free codebase scanning for projects such as Next.js.