We Asked 100+ AI Models to Write Code. Here’s How Many Failed Security Tests.

If you think AI-generated code is saving time and boosting productivity, you’re right. But here’s the problem: it’s also introducing security vulnerabilities… a lot of them. In our new 2025 GenAI Code Security Report, we tested over 100 large language models across Java, Python, C#, and JavaScript. The goal? To see if today’s most advanced AI systems can write secure code.  

Unfortunately, the state of AI-generated code security in 2025 is worse than you think. What we found should be a wake-up call for developers, security leaders, and anyone relying on AI to move faster. 

Download the full 2025 GenAI Code Security Report here

The Results: AI-generated Code That Works, But Isn’t Safe 

Here are the topline stats from our evaluation: 

45% of code samples failed security tests, introducing OWASP Top 10 vulnerabilities into the code.

Figure 1: Security and Syntax Pass Rates vs LLM Release Date

Java was the riskiest language, with a 72% security failure rate across tasks.

Figure 2: Security Pass Rate vs LLM Release Date, Stratified by Language 
 
Other major languages didn’t fare much better, with security failure rates of: 

  • Python: 38% 
  • JavaScript: 43% 
  • C#: 45% 

These weren’t obscure, edge-case vulnerabilities, either. One of the most frequent issues was Cross-Site Scripting (CWE-80), which AI tools failed to defend against in 86% of relevant code samples.
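To make that finding concrete, here is a minimal, hypothetical Java servlet sketch of the pattern behind CWE-80: reflecting user input into an HTML response without encoding it. The servlet name, the `name` parameter, and the `escapeHtml` helper are illustrative only and not taken from the report; output encoding, ideally via a vetted library such as the OWASP Java Encoder, is the standard defense.

```java
// Illustrative sketch of CWE-80 (basic Cross-Site Scripting) in a Java servlet.
// Names and structure are hypothetical, not drawn from the report.
import java.io.IOException;
import java.io.PrintWriter;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

public class GreetingServlet extends HttpServlet {
    @Override
    protected void doGet(HttpServletRequest req, HttpServletResponse resp)
            throws IOException {
        String name = req.getParameter("name"); // attacker-controlled input
        resp.setContentType("text/html");
        PrintWriter out = resp.getWriter();

        // Vulnerable pattern: the input is reflected verbatim, so a request like
        // ?name=<script>alert(1)</script> executes in the victim's browser.
        // out.println("<p>Hello, " + name + "</p>");

        // Safer pattern: HTML-encode the input before reflecting it.
        out.println("<p>Hello, " + escapeHtml(name) + "</p>");
    }

    // Minimal HTML escaping for illustration; production code should use a
    // vetted encoder (e.g., OWASP Java Encoder) rather than a hand-rolled one.
    private static String escapeHtml(String s) {
        if (s == null) return "";
        return s.replace("&", "&amp;")
                .replace("<", "&lt;")
                .replace(">", "&gt;")
                .replace("\"", "&quot;")
                .replace("'", "&#x27;");
    }
}
```

The gap between the two patterns is exactly what the security tests measure: functionally, both versions "work," but only one is safe to expose to untrusted input.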

Aren’t Newer AI Models Generating More Secure Code? 

It’s a great question. Unfortunately, the answer is no. 

We evaluated LLMs of varying sizes, training sources, and release dates spanning several years. While the models got better at writing functional, syntactically correct code, they were no better at writing secure code. Security performance remained flat, regardless of model size or training sophistication. 

This challenges the idea that “smarter” AI models naturally lead to more secure outcomes. In practice, they don’t. 

Even If You Don’t Use GenAI, You’re Still at Risk 

Here’s where things get even more concerning. 

AI-generated code isn’t just coming from your own team; it’s also coming from: 

  • Open-source software maintainers 
  • Third-party vendors 
  • Low-code/no-code platforms 
  • Outsourced contractors using GenAI behind the scenes 

That means AI-generated code is likely already in your stack, whether you know it or not. And if you’re not validating it for security, your organization could be exposed to: 

  • Costly data breaches 
  • Reputational harm 
  • Financial loss and legal risk 

What You Can Do About GenAI Code Security Risks 

AI is revolutionizing how we write software, but it’s also introducing new risks at scale. You wouldn’t deploy a new app without scanning it for vulnerabilities. Why treat AI-generated code any differently? 

The takeaway is simple: Speed without security is a risk you can’t afford. 

We created the 2025 GenAI Code Security Report to equip the software community with facts, not fear. Inside the report, you’ll find: 

  • Benchmark data on code security that helps teams make the case for the resources they need to secure their applications 
  • Testing methodology details 
  • Recommendations for developers, security teams, and executives 

It’s one of the most comprehensive looks at GenAI code security available today. 

Download the full report now. 
