A few of us were hanging out in the CA Veracode kitchen the other day and got to discussing the idea of programmatically injecting vulnerabilities into software. This is essentially the opposite of the problem that most security vendors, including ourselves, are trying to solve -- that is, detecting vulnerabilities. Clearly there's not much business value in making software less safe, though you could imagine such a tool being used for educational purposes or as a way to mass-produce QA test cases.
It sounds easy, right? Certainly it would be easy to inject the types of classic security problems that are trivially detectable. For example:
- printf-style calls that use %s format string specifiers, e.g. rewriting printf("%s", buf) as printf(buf)
- Replacing scanf() calls with everybody's favorite function, gets() (Hi AIX developers!)
Or, on the web application side:
- Removing input validation or sanitization routines, e.g. a regex replacement that filters user-supplied data
Of course, when you start messing with input validation, you run the risk of altering the intended operation of the program. Maybe that regex replacement you removed was security-related, but on the other hand, maybe it's performing a transformation that's relevant to the application logic. If you didn't care about the program actually being able to function, you could:
- Ignore malloc() return values, whatever you feel like
I could go on forever but you probably get the point. The trick would be finding the ones that didn't make the program segfault after 30 seconds of operation. Then again, is it really important that the vulnerable version of the program behaves identically to the original under normal operating circumstances? That constraint makes it more challenging; otherwise, it's kind of a boring exercise. You could argue that it's important if you plan to use the modified version to test a fuzzer (or other dynamic analysis tool) but not for static analysis.
Either way, you'd eventually hit the same boundaries as a vulnerability detection tool. There would still be entire classes of vulnerabilities that could not be addressed effectively. How do you create authorization bypass issues without an understanding of what's being protected and from whom? How do you inject CSRF without knowing which functions are meaningful and which tokens/identifiers are safe to remove? And you can forget about business logic flaws entirely. Basically, it's hard to break something so specific without a decent understanding of how that something was designed to function in the first place.
Now that my brain is sufficiently uncluttered, I can get back to doing real work.