Gartner analyst Neil MacDonald has written that "Byte Code Analysis is Not the Same as Binary Analysis." He describes the difference between statically analyzing binary code, which runs on an x86, ARM, or SPARC CPU, and statically analyzing bytecode, which runs on a virtual machine such as the Java VM or the .NET CLR. As more companies with software security testing technology wade into the "no source available" pool (come on in guys, the water is nice), it is important to understand what capabilities you need for software assurance when you don't have access to source.

If the software you are concerned about is written in a language such as C or C++ and then compiled into an executable binary, as the majority of commercial software is, you will need true binary analysis. The analysis technology provided by Ounce Labs and Fortify Software isn't capable of understanding this native compiled code. The other situation where you will need binary analysis is when you have access to some of the source but other parts of your software are in binary form. This is common because most C/C++ programs, written by enterprises and software vendors alike, are partially built with compiled libraries that are distributed in binary form. If you are only looking at the source-available subset of the software, you are not covering 100% of the code. You will also need binary analysis, and not just bytecode analysis, if your Java code uses JNI or your .NET assemblies call into unmanaged code.
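
To make the JNI point concrete, here is a hypothetical sketch (the class and library names are made up, and this is not code from any particular product). A bytecode-only scan sees nothing but the Java declaration below; the implementation of encrypt lives in a compiled C/C++ shared library, and only binary analysis can look inside it.

// Hypothetical JNI example: the Java side is bytecode, the real work is native.
public class NativeCrypto {
    static {
        System.loadLibrary("crypto_jni"); // loads a compiled native library (illustrative name)
    }

    // Declared in Java but implemented in compiled C/C++; bytecode analysis stops here.
    public static native byte[] encrypt(byte[] plaintext, byte[] key);
}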

Even within the set of bytecode analysis techniques available today there are significant differences in technology. At Veracode, we generate our software analysis model directly from the bytecode, with no lossy intermediate step back to source code. Source-code static analysis tool companies have taken an indirect route to analysis. Their tools first use a bytecode decompiler to create source code from the bytecode, and then build an analysis model from that decompiled source. This means that any code generation decisions made by the compiler, which are in the executing software, will be missing from this model. I would say this isn't really even bytecode analysis at all. It is source analysis of decompiled bytecode.
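
To illustrate the difference (a minimal sketch, not taken from any product's documentation), consider the small Java class below. The '+' concatenation in greet() is rewritten by the compiler when it emits bytecode: older versions of javac generate StringBuilder.append calls, and Java 9 and later emit an invokedynamic call to StringConcatFactory. Running javap -c ConcatDemo shows those instructions; a model built by decompiling back to source sees only the original '+' expression, not what actually executes.

// Illustrative sketch: what executes is the bytecode javac generates,
// not the source-level expression a decompiler reconstructs.
public class ConcatDemo {
    static String greet(String name) {
        // A single '+' expression in source; in the emitted bytecode it becomes
        // either StringBuilder calls or an invokedynamic StringConcatFactory call.
        return "Hello, " + name + "!";
    }

    public static void main(String[] args) {
        System.out.println(greet("world"));
    }
}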

Bytecode analysis and binary analysis are important technologies for assuring the integrity of the software supply chain. These techniques are a powerful addition to first-generation static analysis, where source was required. Make sure you are getting the capabilities of true binary analysis and direct bytecode analysis to protect your organization from application security risk.


About Chris Wysopal

Chris Wysopal, co-founder and CTO of Veracode, is recognized as an expert and a well-known speaker in the information security field. He has given keynotes at computer security events and has testified on Capitol Hill on the subjects of government computer security and how vulnerabilities are discovered in software. His opinions on Internet security are highly sought after and most major print and media outlets have featured stories on Mr. Wysopal and his work. At Veracode, Mr. Wysopal is responsible for the security analysis capabilities of Veracode technology.

Comments (3)

Terence | July 28, 2009 9:27 am

Thanks for the article. Can you explain this in more detail: "This means that any code generation decisions made by the compiler, which are in the executing software, will be missing from this model." What exactly would be lost, and how would it affect the analysis?

cwysopal | August 3, 2009 2:54 pm

Hi Terence,

There is a good paper by Thomas Reps from the University of Wisconsin that covers some of these differences: "WYSINWYX: What You See Is Not What You eXecute."

Here is an excerpt:

Less widely recognized is that even when the original source code is available, source-code analysis has certain drawbacks [Howard 2002; WHDC 2007]. The reason is that computers do not execute source code; they execute machine-code programs that are generated from source code. The transformation from source code to machine code can introduce subtle but important differences between what a programmer intended and what is actually executed by the processor. For instance, the following compiler-induced vulnerability was discovered during the Windows security push in 2002 [Howard 2002]: the Microsoft C++ .NET compiler reasoned that because the program fragment shown below on the left never uses the values written by memset (intended to scrub the buffer pointed to by password), the memset call could be removed—thereby leaving sensitive information exposed in the freelist at runtime.

memset(password, '\0', len);   /* intended to scrub the password buffer */
free(password);



Such a vulnerability is invisible in the original source code; it can only be detected by examining the low-level code emitted by the optimizing compiler. We call this the WYSINWYX phenomenon (pronounced “wiz-in-wicks”): What You See [in source code] Is Not What You eXecute [Reps et al. 2005; Balakrishnan et al. 2007; Balakrishnan 2007].

WYSINWYX is not restricted to the presence or absence of procedure calls; on the contrary, it is pervasive.


kme | August 14, 2009 7:01 am

For another example, there was a recent Linux kernel vulnerability where the code looked like:

foo = sk->bar;   /* sk is dereferenced here first */

if (!sk)         /* so the compiler assumes sk cannot be NULL and drops this check */
        return;

The compiler reasoned that the if() statement could be optimised out, because if sk were NULL then the earlier dereference would already have faulted. This created an exploitable NULL pointer dereference, because userspace can map pages at address zero and then call into the kernel.
