Analysis of binary files without access to the source code has become more prevalent over the last five years or so. Of course, Java decompilers have been around almost as long as Java itself, but Java bytecode is not machine code. I’m talking about analysis of native machine code (x86 or PowerPC instructions), and not of object code (.o or .obj files), which carries relocation and symbol information. In other words, the actual programs that run on real computers.
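To make the distinction between object files and finished executables concrete, here is a minimal sketch that reads the first few header bytes of a file and reports which kind it is and which instruction set it targets. It assumes the ELF container format, which is my choice for illustration only (Windows .obj and .exe files use COFF/PE instead), and the helper `make_header` is a hypothetical stand-in that fabricates a 20-byte header so the example runs without any real binary on disk.

```python
import struct

# Values of the e_type and e_machine fields as defined by the ELF specification.
ELF_TYPES = {
    1: "relocatable object",   # ET_REL: a .o file, still has relocations
    2: "executable",           # ET_EXEC: a linked program
    3: "shared object or PIE", # ET_DYN
    4: "core dump",            # ET_CORE
}
ELF_MACHINES = {3: "x86", 20: "PowerPC", 21: "PowerPC64", 62: "x86-64"}


def classify_elf(header: bytes):
    """Return (file kind, machine) from the first 20 bytes of an ELF file."""
    if len(header) < 20 or header[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    # Byte 5 (EI_DATA) gives the byte order: 1 = little-endian, 2 = big-endian.
    if header[5] == 1:
        endian = "<"
    elif header[5] == 2:
        endian = ">"
    else:
        raise ValueError("unknown byte order")
    # e_type and e_machine are two 16-bit fields right after the 16-byte ident.
    e_type, e_machine = struct.unpack_from(endian + "HH", header, 16)
    kind = ELF_TYPES.get(e_type, f"unknown type {e_type}")
    machine = ELF_MACHINES.get(e_machine, f"unknown machine {e_machine}")
    return kind, machine


def make_header(ei_data: int, e_type: int, e_machine: int) -> bytes:
    """Hypothetical helper: build a fake 20-byte ELF header for the demo."""
    ident = b"\x7fELF" + bytes([1, ei_data, 1]) + bytes(9)  # 16 bytes
    endian = "<" if ei_data == 1 else ">"
    return ident + struct.pack(endian + "HH", e_type, e_machine)


if __name__ == "__main__":
    print(classify_elf(make_header(1, 2, 3)))   # a linked little-endian x86 program
    print(classify_elf(make_header(2, 1, 20)))  # a big-endian PowerPC object file
```

For a real file you would pass in the first 20 bytes read from disk in binary mode. Note that this only tells you what kind of file you have; a linked executable may still contain some symbols, or none at all if it has been stripped.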
The University of Wisconsin has had its CodeSurfer/x86 project since about 2003. It uses a combination of disassembly and custom static analyses to automatically analyze x86 binaries for security vulnerabilities, with a research slant. Of course, Veracode is using static binary analysis for commercial security analysis services. Researchers at the University of Arizona have been investigating alias issues and register liveness in executable code. There has been work on DSP (Digital Signal Processor) binaries in Europe and elsewhere. There are even PhD theses on binary analysis (including my own, currently under examination).
The author of IDA Pro is beta testing a decompiler-like visualization plug-in called Hex-Rays. Issue 64 of Phrack Magazine has an article entitled “Automated vulnerability auditing in machine code”. We’ve come a long way since any analysis of binary programs was compared to making pigs from sausages.
It seems to me that the benefits of binary analysis are moving from underground to mainstream. In terms of coverage, binary analysis is a superset of source code analysis. Often an organization uses third-party applications or libraries in software development and cannot legally or practically access the source code. In addition, compiler, optimizer and OS bugs, security vulnerabilities, and malicious behavior can all affect an application’s security state. Binary analysis follows the data flow of the entire compiled application as the OS or platform will actually execute it, including both the intended and the unintended functionality in the code.
You can always compile source code into binary form, but you cannot, in general, turn binary code back into its source. Binary analysis therefore covers both the part of your software that you have source code for and the binary part that you do not.
The increasing availability of binary analysis tools will surely lead to more effective discovery of vulnerabilities by all parties, including those who write malware. It makes sense that software developers should also take advantage of binary analysis for security checking.