Gen AI Code Security Report Reveals Concerning Variability in LLM Safeguards
BURLINGTON, Mass. – November 18, 2025 – Veracode, the global leader in application risk management, today released new data from its GenAI Code Security Report, revealing a breakthrough in secure AI-generated code—but only from one provider. The October 2025 analysis found OpenAI’s latest GPT-5 reasoning models lead the market on security, while nearly all competitors remain flat or have fallen behind.
Veracode evaluated the most recent large language models (LLMs) using a standardized 80-task benchmark, consistent with the methodology from its July 2025 report. Results reveal a clear shift: OpenAI’s GPT-5 Mini achieved a 72 percent pass rate on security tests—the highest recorded to date. The standard GPT-5 followed closely at 70 percent. These numbers reflect a marked improvement over previous generations, which historically scored between 50 and 60 percent.
Meanwhile, other AI providers, including Anthropic, Google, Qwen, and xAI, failed to demonstrate comparable progress. Their latest releases remained in or near the 50 to 59 percent range observed in July, and some declined: Anthropic’s Claude Sonnet 4.5 scored 50 percent and Claude Opus 4.1 scored 49 percent, Google’s Gemini 2.5 Pro scored 59 percent and Gemini 2.5 Flash scored 51 percent, Qwen3 Coder scored 50 percent, and xAI’s Grok 4 scored 55 percent. The data confirms that simply increasing model size or updating training sets is insufficient to achieve substantive security gains.
Reasoning Alignment Drives Security Performance
Veracode’s analysis indicates OpenAI’s advances are directly tied to “reasoning alignment”—a process by which models internally evaluate and filter their outputs in multiple steps before producing code. GPT-5 Mini and GPT-5, both employing dedicated reasoning, sharply outperformed OpenAI’s non-reasoning GPT-5-chat model, which delivered only a 52 percent pass rate. This gap underscores the importance of structured reasoning for identifying and avoiding insecure code patterns.
Enterprise Language Gains Reflect Increased Focus on Business-Critical Security
Drilling into language-specific results, Veracode observed targeted improvements in C# and Java security outcomes—the languages most relied on for enterprise-critical systems—suggesting AI labs are increasingly focused on enhancing security in high-impact use cases. Despite this, improvements were uneven. Python and JavaScript performance remained comparatively flat, mirroring their July benchmark scores.

Fig. 1: Security Pass Rate vs LLM Release Date Stratified by Language
Across all languages and models, the industry continues to struggle with key vulnerability classes:
- SQL Injection (CWE-89): Modest improvement, as the latest models increasingly recommend secure patterns, such as parameterized queries (see the sketch after this list).
- Cross-Site Scripting (XSS) (CWE-80): Progress remains stagnant, with success rates below 14 percent. Models still miss key output sanitization requirements.
- Log Injection (CWE-117): Similarly poor performance, with pass rates near 12 percent.
- Cryptographic Algorithms (CWE-327): Results remain strong industry-wide, with over 85 percent of tasks passing.
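
To illustrate the SQL injection pattern referenced above, the following minimal Java sketch contrasts string concatenation with a parameterized query. The table, column, and method names are illustrative assumptions for this example and are not drawn from the report.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

public class UserLookup {

    // Vulnerable pattern (CWE-89): untrusted input is concatenated directly
    // into the SQL string, so input like "x' OR '1'='1" changes the query's
    // structure rather than being treated as data.
    static ResultSet findUserInsecure(Connection conn, String email) throws SQLException {
        String sql = "SELECT id, name FROM users WHERE email = '" + email + "'";
        return conn.createStatement().executeQuery(sql);
    }

    // Secure pattern: a parameterized query keeps the SQL structure fixed;
    // the JDBC driver passes the input strictly as a bound value.
    static ResultSet findUserSecure(Connection conn, String email) throws SQLException {
        PreparedStatement stmt = conn.prepareStatement(
                "SELECT id, name FROM users WHERE email = ?");
        stmt.setString(1, email);
        return stmt.executeQuery();
    }
}
```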

Fig. 2: Security Pass Rate vs LLM Release Date Stratified by CWE ID
Persistent low scores in XSS and log injection highlight a technical limitation: LLMs lack the contextual analysis capabilities necessary to reliably flag untrusted data flows.
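As an illustration of the kind of neutralization these models frequently omit, the sketch below shows one common mitigation for each weakness. The helper names and the minimal manual HTML encoding (rather than a dedicated library such as the OWASP Java Encoder) are simplifying assumptions for brevity, not the report’s prescribed implementation.

```java
import java.util.logging.Logger;

public class OutputNeutralization {

    private static final Logger LOG =
            Logger.getLogger(OutputNeutralization.class.getName());

    // CWE-80 mitigation: encode HTML metacharacters before echoing untrusted
    // input into a page, so injected markup renders as inert text.
    static String htmlEncode(String untrusted) {
        return untrusted
                .replace("&", "&amp;")
                .replace("<", "&lt;")
                .replace(">", "&gt;")
                .replace("\"", "&quot;")
                .replace("'", "&#x27;");
    }

    // CWE-117 mitigation: strip line breaks from untrusted input before
    // logging it, so an attacker cannot forge additional log entries.
    static void logUserAction(String untrustedUsername) {
        String sanitized = untrustedUsername.replaceAll("[\\r\\n]", "_");
        LOG.info("login attempt for user: " + sanitized);
    }
}
```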
Implications for AI-Assisted Development
“These results are a clear indication that the industry needs a more consistent approach to AI code safety. While OpenAI’s reasoning-enabled models have meaningfully advanced secure code generation, security performance remains highly variable and far from sufficient industry-wide,” said Jens Wessling, Chief Technology Officer at Veracode. “Relying solely on model improvements is not a viable security strategy.”
Veracode recommends development teams take a layered approach to application risk management:
- Choose reasoning-enabled AI models when available for code generation, as these offer clear security advantages over traditional models.
- Maintain continuous scanning and validation using Static Analysis and Software Composition Analysis, regardless of code origin.
- Automate remediation with tools such as Veracode Fix to address discovered vulnerabilities promptly.
- Enforce secure coding standards throughout AI-assisted and traditional workflows.
- Block malicious dependencies proactively using capabilities like Package Firewalls.
About the GenAI Code Security Report
This update reflects rigorous testing conducted in October 2025. For details on methodology, security trends, and full data, download the GenAI Code Security Report.
About Veracode
Veracode is a global leader in Application Risk Management for the AI era. Powered by trillions of lines of code scans and a proprietary AI-assisted remediation engine, the Veracode platform is trusted by organizations worldwide to build and maintain secure software from code creation to cloud deployment. Thousands of the world’s leading development and security teams use Veracode every second of every day to get accurate, actionable visibility of exploitable risk, achieve real-time vulnerability remediation, and reduce their security debt at scale. Veracode is a multi-award-winning company offering capabilities to secure the entire software development life cycle, including Veracode Fix, Static Analysis, Dynamic Analysis, Software Composition Analysis, Container Security, Application Security Posture Management, Malicious Package Detection, and Penetration Testing.
Learn more at www.veracode.com, on the Veracode blog, and on LinkedIn and X.
Copyright © 2025 Veracode, Inc. All rights reserved. Veracode is a registered trademark of Veracode, Inc. in the United States and may be registered in certain other jurisdictions. All other product names, brands or logos belong to their respective holders. All other trademarks cited herein are property of their respective owners.
Press and Media Contacts
Veracode:
Katy Gwilliam
Head of Global Communications, Veracode
kgwilliam@veracode.com
Related Links
veracode.com
