Aug 1, 2016

Vulnerable Method detection now available for Python projects

By Darius Foo

SourceClear now supports Vulnerable Method detection for both Java and Python projects. In addition to notifying you of the vulnerable libraries you're using, we will now let you know exactly where you are using the vulnerable code. Of course, if it turns out you're not actually vulnerable, we'll let you know that too. More signal, less noise.

How does it work?

To support Vulnerable Methods in Java we analyze bytecode, which gives us a few nice things for free:

  • Types: We know if a symbol is a primitive or an object, what class it is, etc.
  • Static name resolution: Given a method call and the types in context, we know exactly the set of methods it may resolve to.

All this information helps us build an accurate call graph for Java. With Python, the situation is a bit more complicated.

Python packages are distributed on PyPI as source code, and the vast majority of them are untyped. This means we need to get creative and compute or infer types ourselves before we can resolve names. Typing in a language like Python is also complicated by the presence of many dynamic features.

Classes

Classes are first-class, mutable values in Python, which makes it impossible to assign an object a canonical class as we can in Java. Classes can be modified at runtime, created on the fly, and associated with different objects as needed.

def meow(self):
    return 'meow'

def catify(c):
    c.bark = meow
    return c

@catify
class Dog(object):
    def bark(self):
        return 'woof'

Dog().bark() # meow

Objects can have their class change under them, and their methods retroactively modified. Local modifications to the methods of a single object are also possible.
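The three cases above can be seen in a short sketch. The class and function names here (Cat, purr, quack) are made up for illustration:

```python
import types


class Dog(object):
    def speak(self):
        return 'woof'


class Cat(object):
    def speak(self):
        return 'meow'


rex = Dog()
tom = Cat()

# 1. An object's class can change under it.
rex.__class__ = Cat
class_swapped = rex.speak()    # 'meow'

# 2. Replacing a method on the class retroactively affects existing objects.
def purr(self):
    return 'purr'

Cat.speak = purr
retroactive = tom.speak()      # 'purr'

# 3. A method can be patched onto a single object only.
def quack(self):
    return 'quack'

tom.speak = types.MethodType(quack, tom)
local = tom.speak()            # 'quack'
untouched = rex.speak()        # 'purr', rex still uses the class-level method
```

None of these changes is visible in the original class definitions, which is why a static analysis cannot simply read the answer off the source.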

Metaclasses

Metaclasses compound this. A class definition can have radically different semantics depending on its metaclass, so even the definition itself cannot be taken as canonical.

class Metaclass(type):
    def __init__(cls, name, bases, dct):
        # custom initialisation code
        print "metaclass __init__ %s" % name

class Animal(object):
    __metaclass__ = Metaclass

class Dog(Animal):
    def __init__(self):
        print 'woof'

Dog()
# metaclass __init__ Animal
# metaclass __init__ Dog
# woof

Magic methods

Magic methods invalidate the simplifying assumptions that would otherwise help type inference (e.g. that the arguments of + are always numeric, or that anything which can be called is a function).

class Dog(object):
    def __call__(self):
        print 'woof'

d = Dog()
d() # woof
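The same applies to operators. As a small illustration (the Pack class is hypothetical), + can be given a meaning that has nothing to do with numbers:

```python
class Pack(object):
    def __init__(self, members):
        self.members = list(members)

    def __add__(self, other):
        # Here `+` merges two packs; no numeric types are involved.
        return Pack(self.members + other.members)


merged = Pack(['rex']) + Pack(['fido', 'spot'])
names = merged.members    # ['rex', 'fido', 'spot']
```

An analysis that sees a + b therefore has to know the type of a before it can tell which code runs.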

Inferring Types

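Python, like JavaScript, calls for "flow-sensitive typing": the type of a variable can change from one point in a program to the next, so it has to be tracked along each control-flow path. Facebook has written a good deal recently about their work on analyzing JavaScript programs, and it is worth a read.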
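To make the idea concrete, here is a small sketch (the function names are invented for illustration) in which the type of a variable depends on where you are in the function:

```python
def shout(x):
    # On entry, x may be None or a str.
    if x is None:
        x = 'woof'        # on this path x becomes a str
    # Every path reaching this line has x as a str, so .upper() is safe.
    return x.upper()


def parse(raw):
    value = raw           # value is a str here
    value = int(value)    # from here on, value is an int
    return value + 1


loud_default = shout(None)    # 'WOOF'
loud_given = shout('meow')    # 'MEOW'
parsed = parse('41')          # 42
```

A flow-insensitive analysis would have to assign x and value a single type for the whole function, and would either reject these programs or lose precision.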

We explored a similar approach for Python and drew inspiration from open source projects, particularly Pysonar, which uses a form of abstract interpretation, traversing programs and collecting type information along the way.
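As a rough picture of what abstract interpretation means here, the sketch below walks a program without running it and records the set of types each name may hold. It is a toy written for this post against Python 3.8+'s ast module, not PySonar's or our actual implementation; it only understands constants, names, simple assignments and if statements, and the labels 'unknown' and 'unbound' are our own placeholders.

```python
import ast


def infer(node, env):
    """Return the set of type names an expression may evaluate to."""
    if isinstance(node, ast.Constant):
        return {type(node.value).__name__}
    if isinstance(node, ast.Name):
        return set(env.get(node.id, {'unknown'}))
    return {'unknown'}


def run(stmts, env):
    """Abstractly execute statements and return the resulting environment."""
    for stmt in stmts:
        if isinstance(stmt, ast.Assign):
            types = infer(stmt.value, env)
            for target in stmt.targets:
                if isinstance(target, ast.Name):
                    env[target.id] = types
        elif isinstance(stmt, ast.If):
            # The condition is never evaluated: explore both branches,
            # then join them so a name may hold either branch's types.
            then_env = run(stmt.body, dict(env))
            else_env = run(stmt.orelse, dict(env))
            env = {}
            for name in set(then_env) | set(else_env):
                env[name] = (then_env.get(name, {'unbound'})
                             | else_env.get(name, {'unbound'}))
    return env


source = """
x = 1
if flag:
    x = 'one'
    y = 2.0
z = x
"""
env = run(ast.parse(source).body, {})
# env['x'] == {'int', 'str'}, env['y'] == {'float', 'unbound'}, env['z'] == {'int', 'str'}
```

A real analysis also has to model functions, classes, attributes and imports, which is where most of the difficulty described above comes from.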

The next step was to generate call graphs.

A call graph of a program captures the calling relationships between methods. Each node represents a method and each edge (m1, m2) indicates that method m1 calls method m2. It is directed and possibly cyclic.

This was relatively straightforward, assuming names were properly resolved by the type-checking phase.
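For intuition, here is a deliberately naive sketch of building such a graph with Python's ast module. It only follows direct calls to bare names between top-level functions; build_call_graph and the sample functions are invented for this example, and this is not how our analysis resolves names.

```python
import ast


def build_call_graph(source):
    """Map each top-level function name to the set of names it calls."""
    graph = {}
    for node in ast.parse(source).body:
        if isinstance(node, ast.FunctionDef):
            callees = set()
            for inner in ast.walk(node):
                # Only direct calls to a bare name, such as f(), are handled.
                # Calls through attributes or variables need resolved types.
                if isinstance(inner, ast.Call) and isinstance(inner.func, ast.Name):
                    callees.add(inner.func.id)
            graph[node.name] = callees
    return graph


source = """
def parse(data):
    return load(data)

def load(data):
    if not data:
        return parse('default')
    return data

def main():
    parse('x')
"""
graph = build_call_graph(source)
# {'parse': {'load'}, 'load': {'parse'}, 'main': {'parse'}}
```

Note the cycle between parse and load. Anything like obj.method() is skipped entirely, which is exactly the gap that type inference has to fill.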

With the call graphs in hand, we were able to use our existing machinery to transitively expand the set of vulnerable methods in a given library, finally arriving at the call chains that lead to vulnerable code.
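One way to picture this step, assuming the call graph is a plain dictionary from caller to callees: walk the edges backwards to find everything that can reach a vulnerable method, then search forwards from your own code to recover a concrete chain. The function and method names below are hypothetical, and this is a sketch rather than our production machinery.

```python
from collections import deque


def expand_vulnerable(graph, vulnerable):
    """Return every method that can transitively reach a vulnerable method."""
    callers = {}
    for caller, callees in graph.items():
        for callee in callees:
            callers.setdefault(callee, set()).add(caller)
    affected = set(vulnerable)
    worklist = list(vulnerable)
    while worklist:
        method = worklist.pop()
        for caller in callers.get(method, ()):
            if caller not in affected:
                affected.add(caller)
                worklist.append(caller)
    return affected


def find_call_chain(graph, entry, vulnerable):
    """Return a shortest call chain from entry to a vulnerable method, or None."""
    queue = deque([[entry]])
    seen = {entry}
    while queue:
        path = queue.popleft()
        if path[-1] in vulnerable:
            return path
        for callee in sorted(graph.get(path[-1], ())):
            if callee not in seen:    # guards against cycles
                seen.add(callee)
                queue.append(path + [callee])
    return None


graph = {
    'app.main': {'app.render', 'lib.safe'},
    'app.render': {'lib.template'},
    'lib.template': {'lib.unsafe_eval', 'app.render'},
    'lib.safe': set(),
    'lib.unsafe_eval': set(),
}
affected = expand_vulnerable(graph, {'lib.unsafe_eval'})
chain = find_call_chain(graph, 'app.main', {'lib.unsafe_eval'})
# chain == ['app.main', 'app.render', 'lib.template', 'lib.unsafe_eval']
```

Because both functions only see method names and edges, nothing in them depends on whether the graph came from Java bytecode or Python source.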

Making the call graph processing phase agnostic to language and source code allowed us to improve support for Python's dynamic features without any downstream changes.

Check it out

Once you've installed (or upgraded) the latest agent, you can scan this example repository to see Vulnerable Method Detection for Python in action:

srcclr scan --url https://github.com/srcclr/example-python

Our Vulnerable Methods feature for Python and Java is available today to all SourceClear Pro customers.


Darius is a software engineer on the SCA team at Veracode, helping developers use open source software more safely.