A few of us were hanging out in the Veracode kitchen the other day and got to discussing the idea of programmatically injecting vulnerabilities into software. This is essentially the opposite of the problem that most security vendors, including ourselves, are trying to solve -- that is, detecting vulnerabilities. Clearly there's not much business value in making software less safe, though you could imagine such a tool being used for educational purposes or as a way to mass-produce QA test cases.
It sounds easy, right? Certainly it would be easy to inject the types of classic security problems that are trivially detectable. For example:
- Replace bounded string manipulation calls with unbounded ones, e.g. `strcpy()` instead of `strncpy()`
- Introduce format string vulnerabilities into `printf`-style calls that use `%s` format string specifiers, e.g. turning `printf("%s", buf)` into `printf(buf)`
- Replace `scanf()` calls with everybody's favorite function, `gets()` (Hi AIX developers!)
- Create type mismatches and potential integer coercion issues by replacing unsigned variables with their signed counterparts
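Most of those swaps are mechanical enough that a few lines of Python could do them. Here's a toy sketch of the bounded-to-unbounded rewrite (the `inject` helper and its regex are mine for illustration; a real tool would parse the source rather than pattern-match it, since regexes over C will misfire on comments, macros, nested calls, and so on):

```python
import re

# Naive source-to-source "vulnerability injector": swap bounded C string
# calls for their unbounded counterparts. Purely illustrative.
UNBOUNDED = {"strncpy": "strcpy", "strncat": "strcat"}

def unbound_call(match: re.Match) -> str:
    """Rewrite e.g. strncpy(dst, src, n) as strcpy(dst, src)."""
    name, args = match.group(1), match.group(2)
    parts = [a.strip() for a in args.split(",")]
    # Drop the trailing size argument and rename the call.
    return "{}({})".format(UNBOUNDED[name], ", ".join(parts[:-1]))

def inject(source: str) -> str:
    return re.sub(r"\b(strncpy|strncat)\(([^)]*)\)", unbound_call, source)

print(inject("strncpy(buf, user_input, n);"))
# strcpy(buf, user_input);
```

Crude as it is, this already produces a plausible-looking overflow; the hard part isn't the rewrite, it's everything the regex gets wrong.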
Or, on the web application side:
- Create SQL injection by replacing prepared statements with concatenated queries -- not trivial but there are a limited number of database APIs, and SQL statements follow a defined syntax, so it wouldn't be that hard
- Inject XSS by removing all calls to known output encoding routines
- Disable input validation by removing all calls to mechanisms such as regex replacement
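The prepared-statement rewrite can be sketched the same way. The call shape below (a DB-API style `execute()` with a single `?` placeholder) is an assumption for illustration -- real database APIs vary enough that a production tool would need a proper parser per API, which is exactly the "limited number of database APIs" observation above:

```python
import re

# Toy SQL injection injector: rewrite a parameterized execute() call
# into string concatenation. The single-placeholder pattern is an
# assumption; it is nowhere near general.
PATTERN = re.compile(r'execute\(\s*"([^"]*?)\?"\s*,\s*\((\w+),?\)\s*\)')

def break_query(source: str) -> str:
    def concatenate(match: re.Match) -> str:
        query, var = match.group(1), match.group(2)
        # execute("... = ?", (v,))  ->  execute("... = '" + v + "'")
        return 'execute("{0}\'" + {1} + "\'")'.format(query, var)
    return PATTERN.sub(concatenate, source)

safe = 'cursor.execute("SELECT * FROM users WHERE name = ?", (name,))'
print(break_query(safe))
# cursor.execute("SELECT * FROM users WHERE name = '" + name + "'")
```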
Of course, when you start messing with input validation, you run the risk of altering the intended operation of the program. Maybe that regex replacement you removed was security-related, but on the other hand, maybe it was performing a transformation relevant to the application logic. If you didn't care about the program actually being able to function, you could:
- Arbitrarily shorten character arrays
- Use the incorrect functions to manipulate standard and/or wide strings
- Add or remove calls to `free()`
- Swap calls to functions with subtly different semantics, like `memcpy()` and `memmove()`
- Remove checks for NULL -- string contents, `malloc()` return values, whatever you feel like
- Create off-by-one errors in loop counters and array indexes
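The off-by-one item is another transformation that's trivial to express and hard to do safely. A minimal sketch, assuming C-style `for` loops and again using a regex as a stand-in for real parsing (it will also happily loosen comparisons that have nothing to do with loop counters):

```python
import re

# Toy off-by-one injector: loosen strict loop bounds (i < n -> i <= n).
def inject_off_by_one(source: str) -> str:
    def loosen(match: re.Match) -> str:
        # Replace the first "<" inside the matched for-header condition.
        return match.group(0).replace("<", "<=", 1)
    # Match "for (init; cond < bound;" where cond isn't already "<=".
    return re.sub(r"for\s*\([^;]*;[^;<]*<[^=][^;]*;", loosen, source)

loop = "for (i = 0; i < len; i++) buf[i] = src[i];"
print(inject_off_by_one(loop))
# for (i = 0; i <= len; i++) buf[i] = src[i];
```

Whether the resulting one-byte overwrite is exploitable, or even reachable, is exactly the part a dumb rewriter can't tell you.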
I could go on forever but you probably get the point. The trick would be finding the ones that didn't make the program segfault after 30 seconds of operation. Then again, is it really important that the vulnerable version of the program behaves identically to the original under normal operating circumstances? That constraint makes it more challenging; otherwise, it's kind of a boring exercise. You could argue that it's important if you plan to use the modified version to test a fuzzer (or other dynamic analysis tool) but not for static analysis.
Either way, you'd eventually hit the same boundaries as a vulnerability detection tool. There would still be entire classes of vulnerabilities that could not be addressed effectively. How do you create authorization bypass issues without an understanding of what's being protected and from whom? How do you inject CSRF without knowing which functions are meaningful and which tokens/identifiers are safe to remove? And you can forget about business logic flaws entirely. Basically, it's hard to break something so specific without a decent understanding of how that something was designed to function in the first place.
Now that my brain is sufficiently uncluttered, I can get back to doing real work.