Introducing GPT‑5.2

URL to Any, 5 months ago

OpenAI has introduced GPT‑5.2, its most capable model series for professional knowledge work to date. It delivers state‑of‑the‑art performance across multiple benchmarks, stronger long‑context reasoning, more reliable tool use, improved vision understanding, and better everyday usability in ChatGPT. The roll‑out begins today for paid ChatGPT plans, and the models are available in the API now.

Model performance and benchmarks
Capabilities, workflows, and availability
FAQ
Conclusion

Model performance and benchmarks

GPT‑5.2 Thinking sets a new bar for real‑world, professional use. Across well‑specified knowledge work tasks, coding, science, and abstract reasoning, it consistently advances the state of the art while reducing errors.

Knowledge work (GDPval)

On GDPval (44 occupations), GPT‑5.2 Thinking beats or ties top industry professionals in 70.9% of comparisons, judged by experts.
It produced GDPval outputs >11× faster and at <1% of expert cost (based on historical estimates; ChatGPT speed may vary). With human oversight, this suggests substantial economic value for day‑to‑day professional work.
Internal finance tasks: average score on junior investment banking spreadsheet modeling rose 9.3 percentage points vs GPT‑5.1 (from 59.1% to 68.4%).
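
The finance figure above is a gain in percentage points, which is easy to confuse with a relative improvement. A minimal Python sketch of the arithmetic, using only the two scores quoted above (the function names are illustrative and not part of the announcement):

```python
def percentage_point_gain(old_pct: float, new_pct: float) -> float:
    """Absolute difference between two percentage scores, in points."""
    return new_pct - old_pct


def relative_gain(old_pct: float, new_pct: float) -> float:
    """Improvement expressed as a fraction of the old score."""
    return (new_pct - old_pct) / old_pct


# Scores quoted above: GPT-5.1 vs GPT-5.2 Thinking on the internal finance tasks.
old_score, new_score = 59.1, 68.4

print(f"{percentage_point_gain(old_score, new_score):.1f} percentage points")
print(f"{relative_gain(old_score, new_score):.1%} relative improvement")
```

The same scores therefore correspond to a 9.3-point absolute gain and a relative improvement of roughly 15.7% over the GPT‑5.1 score.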

Coding (SWE‑Bench and engineering)

SWE‑Bench Pro: 55.6% (new state of the art), improving over GPT‑5.1 Thinking at 50.8%.
SWE‑bench Verified: 80.0% (new high).
Early testers report noticeably stronger front‑end capabilities, including complex or unconventional UI and 3D elements—better daily support for engineers across the stack.

Factuality and reliability

On de‑identified ChatGPT queries, responses containing errors were ~30% less common with GPT‑5.2 Thinking than with GPT‑5.1 Thinking.
Like any model, GPT‑5.2 can still be wrong; double‑check critical work.

Capabilities, workflows, and availability

GPT‑5.2 improves long‑context reasoning, vision, agentic tool use, and overall workflow reliability. It also ships with practical updates for ChatGPT and the API.

Long‑context reasoning

On OpenAI MRCRv2 (integrating information across long documents), GPT‑5.2 Thinking delivers leading accuracy, including near‑100% on the 4‑needle variant out to 256k tokens.
Practical impact: coherent analysis across long reports, contracts, research papers, transcripts, and multi‑file projects.
Extended workflows: GPT‑5.2 Thinking works with the Responses /compact endpoint to stretch the effective context window for tool‑heavy, long‑running tasks.

Vision understanding

Error rates are roughly halved on chart reasoning and software interface understanding.
Better spatial layout comprehension: in component identification (e.g., a motherboard), GPT‑5.2 places more accurate bounding boxes and understands relative positions better than GPT‑5.1.

Agentic tool use and complex workflows

Tau2‑bench Telecom: 98.7% (new state of the art), reflecting reliable tool use across long, multi‑turn tasks.
Latency‑sensitive scenarios: stronger performance at reasoning.effort='none', outpacing GPT‑5.1 and GPT‑4.1.
Real‑world effect: more complete end‑to‑end workflows (e.g., a traveler’s multi‑step support case—rebooking, special‑assistance seating, compensation—coordinated across agents and tools).

Science, math, and general reasoning

GPQA Diamond (graduate‑level, Google‑proof Q&A): GPT‑5.2 Pro 93.2%, GPT‑5.2 Thinking 92.4%.
FrontierMath (Tier 1–3): GPT‑5.2 Thinking solves 40.3% of expert‑level problems (new state of the art).
ARC‑AGI‑1 (Verified): GPT‑5.2 Pro crosses 90% (first model to do so), improving from last year’s o3‑preview while reducing cost ~390× to reach that performance.
ARC‑AGI‑2 (Verified): GPT‑5.2 Thinking 52.9%; GPT‑5.2 Pro 54.2%—stronger fluid reasoning for novel, abstract problems.
Case in point: in recent work with GPT‑5.2 Pro on a well‑specified statistical learning problem, the model proposed a proof that the authors later verified and external experts reviewed, illustrating how frontier models can assist math research under close human supervision.

GPT‑5.2 in ChatGPT

GPT‑5.2 Instant: faster everyday workhorse for info‑seeking, how‑tos, technical writing, translation; clearer explanations and structure.
GPT‑5.2 Thinking: for deeper work—coding, long‑document summarization, Q&A on uploads, step‑by‑step math/logic, structured planning and decisions.
GPT‑5.2 Pro: highest‑quality option for difficult questions; fewer major errors; stronger performance in complex domains like programming.

Safety updates

Builds on safe completion research, improving helpfulness within robust safety boundaries.
Strengthened responses for sensitive conversations (signs of suicide or self‑harm, mental health distress, emotional reliance). Targeted interventions reduced undesirable responses vs GPT‑5.1.
Early roll‑out of age prediction to automatically apply content protections for users under 18; complements existing parental controls.

Availability and pricing

ChatGPT: rolling out GPT‑5.2 (Instant, Thinking, Pro) to paid plans (Plus, Pro, Go, Business, Enterprise). GPT‑5.1 remains under legacy models for three months, then sunsets.
API: GPT‑5.2 Thinking in Responses/Chat Completions as gpt-5.2; GPT‑5.2 Instant as gpt-5.2-chat-latest; GPT‑5.2 Pro as gpt-5.2-pro.
Reasoning effort: GPT‑5.2 Thinking and Pro support a new fifth reasoning effort level, xhigh, and reasoning effort is configurable for GPT‑5.2 Pro.
Pricing: GPT‑5.2 costs $1.75/1M input tokens and $14/1M output tokens, with a 90% discount on cached inputs. Despite a higher per‑token price than GPT‑5.1, GPT‑5.2 often delivers lower total cost for a given quality level because it uses tokens more efficiently. GPT‑5.2 Pro is priced at $21/1M input tokens and $168/1M output tokens.
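
To make the GPT‑5.2 rates above concrete, here is a rough per-request cost estimator in Python. It is a sketch under stated assumptions: cached tokens are treated as a subset of the input tokens and billed at 10% of the input rate (the 90% discount), and the function and constant names are hypothetical.

```python
INPUT_USD_PER_M = 1.75    # GPT-5.2 input price per 1M tokens, as quoted above
OUTPUT_USD_PER_M = 14.00  # GPT-5.2 output price per 1M tokens, as quoted above
CACHED_DISCOUNT = 0.90    # 90% discount on cached input tokens


def estimate_cost_usd(input_tokens: int, output_tokens: int,
                      cached_input_tokens: int = 0) -> float:
    """Estimate the cost of one request in US dollars.

    Assumption: cached_input_tokens is the portion of input_tokens
    that was served from the cache.
    """
    if output_tokens < 0:
        raise ValueError("output_tokens must be non-negative")
    if not 0 <= cached_input_tokens <= input_tokens:
        raise ValueError("cached_input_tokens must be between 0 and input_tokens")

    uncached_tokens = input_tokens - cached_input_tokens
    cached_rate = INPUT_USD_PER_M * (1 - CACHED_DISCOUNT)
    total_per_million = (
        uncached_tokens * INPUT_USD_PER_M
        + cached_input_tokens * cached_rate
        + output_tokens * OUTPUT_USD_PER_M
    )
    return total_per_million / 1_000_000


# 100k input tokens (80k of them cached) and 2k output tokens.
print(f"${estimate_cost_usd(100_000, 2_000, cached_input_tokens=80_000):.4f}")
# The same request with nothing cached.
print(f"${estimate_cost_usd(100_000, 2_000):.4f}")
```

In this example the cached request comes to about $0.077 versus about $0.203 uncached, which shows how much of the bill the cache discount can remove when prompts share a long prefix.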

FAQ

Is GPT‑5.2 available in ChatGPT today?

Yes. It begins rolling out to paid plans today. Deployment is gradual, so if you don’t see it yet, check again later.

What API model names should developers use?

gpt-5.2 (Thinking) in Responses and Chat Completions.
gpt-5.2-chat-latest (Instant) in Responses.
gpt-5.2-pro (Pro) in Responses, with configurable reasoning effort including xhigh.
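
A minimal sketch of how the model names above might be wired into a request body. It only builds a dictionary and sends nothing over the network. The `model`, `input`, and `reasoning.effort` fields reflect the Responses API request shape and should be checked against the official API reference. Only the `none` and `xhigh` effort levels are named in this article, so the three middle levels are an assumption, and the helper name and tier keys are hypothetical.

```python
import json
from typing import Optional

# Model names as listed in this article.
MODEL_BY_TIER = {
    "thinking": "gpt-5.2",
    "instant": "gpt-5.2-chat-latest",
    "pro": "gpt-5.2-pro",
}

# "none" and "xhigh" appear in this article; "low", "medium", "high" are assumed.
EFFORT_LEVELS = ("none", "low", "medium", "high", "xhigh")


def build_request(tier: str, prompt: str, effort: Optional[str] = None) -> dict:
    """Build a request body for the given tier without sending it."""
    if tier not in MODEL_BY_TIER:
        raise ValueError(f"unknown tier: {tier!r}")
    body = {"model": MODEL_BY_TIER[tier], "input": prompt}
    if effort is not None:
        if effort not in EFFORT_LEVELS:
            raise ValueError(f"unknown reasoning effort: {effort!r}")
        body["reasoning"] = {"effort": effort}
    return body


print(json.dumps(build_request("thinking", "Summarize this contract.", "xhigh"), indent=2))
```

This sketch does not check which effort levels a particular model accepts, so a real integration would need to confirm that against the API documentation before sending the request.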

How does GPT‑5.2 perform in coding tasks?

It sets a new SOTA on SWE‑Bench Pro (55.6%) and reaches 80% on SWE‑bench Verified, with better reliability for debugging, feature implementation, refactoring, and shipping fixes end‑to‑end.

Can GPT‑5.2 reliably handle long documents?

Yes. GPT‑5.2 Thinking achieves leading MRCRv2 results and near‑perfect accuracy on the 4‑needle variant out to 256k tokens, enabling coherent analysis across very long materials.

Does GPT‑5.2 reduce hallucinations?

On internal tests with de‑identified ChatGPT queries, erroneous responses dropped ~30% vs GPT‑5.1 Thinking, improving day‑to‑day dependability.

Is GPT‑5.2 safe to use for sensitive topics?

It includes targeted safety interventions and safe completion techniques, plus an early roll‑out of age prediction that applies content protections for users under 18. Human oversight remains essential for critical decisions.

Conclusion

GPT‑5.2 marks a meaningful step forward in practical intelligence: higher benchmark scores, better long‑context and visual understanding, more reliable tool use, and clearer everyday interactions in ChatGPT. Combined with safety improvements and new API options, it is well‑suited to end‑to‑end professional workflows—from spreadsheets and slides to production code and complex multi‑turn support.
