OpenAI Introduces GPT 5.2: A Long-Term Workhorse for Agents, Codes and Information Work

nimda December 11, 2025

0 5 4 minutes read

OpenAI Introduces GPT 5.2: A Long-Term Workhorse for Agents, Codes and Information Work

Opena has just shipped GPT-5.2, its most advanced model for professional work and long agents, and it is not wrapped beyond chatgpt and API.

GPT-5.2 is a family of three variants. In ChatGpt, users see chatgpt-5.2 instantly, thinking like a pro. In the API, the models are parallel gpt-5.2-chat-latest, gpt-5.2again gpt-5.2-pro. Fast target with daily help and learning, think target complex work and agents, and pro arocates accoute over difficult technical and analytical tasks.

Benchmark profile, from GDPVAL to SWE benchmark

GPT-5.2 thinking is positioned as the main workhorse for real world information work. In GDPVAL, an assessment of well-defined knowledge jobs in 44 jobs in 9 major sectors, it beats or includes the top industrial experts in 70.9 percent of the time and 11 percent of the time. For Engineering teams this means that the model can reliably produce artifacts such as presentations, spreadsheets, schedules, and diagrams given programmed instructions.

Looking at the interior of the Junior Investing Banking Spreadsheet Design Spreads Delieling Activities, the average score is from 59.1 percent with GPT-5.1 to 68.4 percent and 78.4 percent with GPT-5.2 Pro. These functions include three statement models and purchase models given formatting constraints and citations, representing the functionality of many structured areas.

In Software Engineering, GPT-5.2 Reasoning reaches 55.6 percent in SWE-Bench Pro and 80.0 percent in SWE-Bench certified. SWE-BENCH PRO tests high-quality production in multiple languages, while Bench-Bench is certified with a focus on Python.

Long context and agentic mobility

Long context is an important design target. The thinking of GPT-5.2 sets a new state of the art in OpenAi MrCRV2, a Benchmark that puts many of the same “needle” questions into a long discussion “with steps that the model can produce the correct answer. It is the first model that reports that it reaches a high accuracy of 4 percent with 4 different MRCR in 256k tokens.

With sources beyond that context, GPT-5.2 thinking meets the answers /compact The end, which makes a context agreement to extend the active window of TOOLT HIGHT, RUNEN RUN DIVE. This is appropriate if you are building agents that call for multiple tools in multiple steps and need to maintain state above the raw token limit.

In the use of tools, the logic of GPT-5.2 reaches 98.7 percent in tau2-Bench Telecom, multi Turn Customer Support where the model should be a tool with a place for good work flow. Official examples from OpenAI Reject Post Show Scarios as a traveler with a delayed flight, lost connection, lost accommodation and compensation in a fixed order while GPT-5.1 leaves the stairs undefined.

Theory, science and mathematics

The quality of vision is also progressive. GPT-5.2 reasoning prices are almost the same as Chancurming prices and intuitive benchmarks such as the Charxiv consultation and the Python Tool when the Python tool is enabled. The model shows an improved understanding of the images, for example when it labels the components of the motherboard with balanced boxes, GPT-5.2 identifies other regions with the installation of GPT-5.1.

For scientific workloads, gpt-5.2 Pro is 93.2 percent and GPT-5.2 Thinking is 92.4 percent in GPQA Diamond, and GPT-5.2 Thinking Solves 40.3 percent for TIER 3.3 Python-enabled tools. These benches include degrees in CRIDICE physics, Chemistry, Biology and Expertwork Early use where GPT-5.2 Pro has had an effect on evidence in the validation of human learning under human validation.

Comparison table

Template	Basic posture	Context Window / Max Output	Information Cusaff	Amazing benchmarks (think / pro vs gpt-5.1 think)
GPT-5.1	Flagship model for coding jobs and agentic jobs with configurable consulting effort	400,000 token core, 128,000 max output	2024-09-30	SWE-Bench Pro 50.8 50.8 percent, SW-Bench verified 76.3 percent, Arc-Agi-1 72.8 percent 72.8 percent, arc-agi-2 17.6 percent
GPT-5.2 (thinking)	A new Flagship model for coding and agent jobs across the industry and long-running agents	400,000 token core, 128,000 max output	2025-08-31	GDPVAL wins or ties 70.9 Percent 70.9% vs industry experts, SWE-BENCH Pro 80.0 percent, arc-agi-1 percent 86.2 percent, arc-agi percent
GPT-5.2 Pro	The advanced version of GPT-5.2 for the most difficult reasoning and scientific loads, produces intelligent answers and specific objects	400,000 token core, 128,000 max output	2025-08-31	GPQA Diamond 93.2 Percent 92.4 percent thinking GPT-5.2 thinking and 88.1 percent GPT-5.1 Thinking, ARC-AGI-2 percent 94.2

Key acquisition

GPT-5.2 Thinking is a new automation model: It replaces the GPT-5.1 logic as the main model for coding, information work and agents, while keeping the same output of 400k and 128k max output, but with the functionality of arc-agi, arc-agi and scientific QA.
Greater accuracy surpasses GPT-5.1 by the same measure: In the main benchmarks, the GPT-5.2 logic goes from 50.8 percent to 55.6 percent to 86.6 percent to 86.6 percent to 86.6 percent to 52.6 percent to 88.6 percent to arc-agi-2, while keeping the parameters of the comparable token.
GPT-5.2 Pro is aimed at high performance scientific consulting: GPT-5.2 Pro is a different high-level method that is highly interchangeable with scientific tasks, for example 93,2 percent of thinking in GPPT-5.2 thinking and 88,1 percent of GPT-5.1 Thinking, and high scores in arc-agi tiers.

AsifAzzaq is the CEO of MarktechPost Media Inc.. as a visionary entrepreneur and developer, Asifi is committed to harnessing the power of social intelligence for good. His latest effort is the launch of the intelligence media platform, MarktechPpost, which stands out for its deep understanding of machine learning and deep learning stories that are technically sound and easily understood by a wide audience. The platform sticks to more than two million monthly views, which shows its popularity among the audience.

Follow Marktechpost: Add us as a favorite source on Google.

Source link

nimda December 11, 2025

0 5 4 minutes read