What was the first thing you learned about network security? There's a good chance it had something to do with port scanning. After scanning a few boxes, you realized that modern operating systems have a lot of open ports by default, meaning a lot of services. Some had an obvious purpose, like telnet on tcp/23 or ftp on tcp/21. Others left you wondering: what the heck is listening on tcp/515 or tcp/7100? And remember, you couldn't ask Google because it didn't exist (well, maybe it did, depending on when you got into security).
Your first real lesson about locking down a host was how to reduce its attack surface. You learned how to disable services using /etc/inetd.conf. Then you learned about rc.d and how to prevent unnecessary services from being launched at startup. Next, maybe you configured the X server to disallow remote connections, or moved on to removing setuid permissions from files. As you worked, you'd periodically re-scan the box to gauge progress, asking yourself, "Have I removed everything I don't need?" The underlying motivation, of course, is that an attacker can't hack something that isn't there.
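That periodic re-scan can be sketched as a plain TCP connect check. This is a minimal sketch, not a replacement for a dedicated scanner: it only attempts a full handshake on each listed port and treats refusals, timeouts, and unreachable hosts all as "not open." The class and method names are hypothetical, and the port list simply reuses the examples above.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: a minimal TCP connect "re-scan" of a handful of ports. */
public final class ConnectScan {

    private ConnectScan() {}

    /** Returns the subset of ports on which a TCP handshake succeeds within the timeout. */
    public static List<Integer> openPorts(String host, int[] ports, int timeoutMillis) {
        List<Integer> open = new ArrayList<>();
        for (int port : ports) {
            // A completed handshake means something is listening on this port.
            try (Socket socket = new Socket()) {
                socket.connect(new InetSocketAddress(host, port), timeoutMillis);
                open.add(port);
            } catch (IOException e) {
                // Refused, timed out, or unreachable: treat the port as not open.
            }
        }
        return open;
    }

    public static void main(String[] args) {
        // The suspects from the text: ftp, telnet, tcp/515, and tcp/7100.
        int[] ports = {21, 23, 515, 7100};
        System.out.println("Open on localhost: " + openPorts("127.0.0.1", ports, 500));
    }
}
```

Run it before and after each hardening step; a shrinking list is the progress you were looking for.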
You learned how to extend those concepts to the network: configuring firewall rules, router ACLs, VLANs, and so on. Segmenting the network. Creating a DMZ. No need to dwell on this; you get the idea.
Eventually, people realized that applications had an attack surface too. Web servers and application servers got a lot of attention, followed closely by custom web applications. "What do you mean you can execute SQL queries against my database? That's impossible, I have a firewall!"
Some companies, the ones who could afford it anyway, started to build security into their development cycle. Doing threat modeling during the design phase made sense, because hey, it's much cheaper to fix security holes in a whiteboard drawing than it is to rewrite your authorization module from scratch after it's in production.
Let's talk strictly about custom web applications now. What I've observed is that most development groups, even the ones who actively engage in threat modeling, do not understand their web application's attack surface. The lead architect can whiteboard a high-level diagram of all the major components and how they interact. Individual developers can go a bit deeper, telling you which files they touch, what database permissions they need, or how various pieces of data are encrypted in storage. At the end of this exercise you have a complete picture of the processes, data flows, protocols, privilege boundaries, external entities, and so on, and you're well on your way to understanding all of the potential attack vectors.
Or are you?
What often gets overlooked or glossed over is the impact of external libraries or packages. Nobody writes everything from scratch. A typical list of third-party libraries for a Java-based Web 2.0 application might include DWR, GWT, Axis, and Dojo, plus about 30 other libraries to do everything from logging to parsing to image manipulation. Nine times out of ten, the libraries will be installed in full, using the default configuration from page one of the README file.
Why is this relevant? Because just as those old Unix boxes exposed unnecessary services, libraries expose unnecessary code. Let's say you installed Dojo to simplify the process of creating an HTML table with rows and columns that can be sorted on demand. Did you remember to remove all the .js files you didn't need? Or maybe you installed Axis or DWR or anything else that has its own Servlet(s) for processing requests. Have you compared what that Servlet can do against what you need it to do?
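One way to answer the first question is to diff what you deployed against what your pages actually load. The sketch below walks a library directory and lists every .js file that is not on a "used" list you supply. The class name and file names are hypothetical, and building the "used" list (from your pages or build output) is the real work, assumed here as an input. Treat the output as candidates only: libraries like Dojo can load modules on demand, so check for dynamically loaded files before deleting anything.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;
import java.util.stream.Stream;

/** Hypothetical helper: lists deployed .js files that the application never references. */
public final class UnusedScripts {

    private UnusedScripts() {}

    /** Returns .js paths (relative to libraryRoot, '/'-separated) that are not in {@code used}. */
    public static List<String> findUnused(Path libraryRoot, Set<String> used) {
        try (Stream<Path> files = Files.walk(libraryRoot)) {
            return files
                    .filter(Files::isRegularFile)
                    .map(p -> libraryRoot.relativize(p).toString().replace('\\', '/'))
                    .filter(name -> name.endsWith(".js"))
                    .filter(name -> !used.contains(name))
                    .sorted()
                    .collect(Collectors.toList());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Hypothetical inputs: the only scripts the sortable table actually needs.
        Set<String> used = Set.of("dojo/dojo.js", "dojox/grid/DataGrid.js");
        Path root = Path.of(args.length > 0 ? args[0] : "webapp/js");
        if (!Files.isDirectory(root)) {
            System.err.println("No such directory: " + root);
            return;
        }
        for (String file : findUnused(root, used)) {
            System.out.println("unused: " + file);
        }
    }
}
```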
A fictitious example may help illustrate further. Imagine you just downloaded a new library called WhizBang. Following the installation instructions, you define and map two servlets, WhizServlet and BangServlet, in your web.xml file, and then configure the library to integrate with your web app. After a bit of trial and error, it's functional. Yay! This is where most developers stop.
Nobody asks, "How much of this do I actually need?" Case in point: what if your application only uses WhizServlet? BangServlet is still exposed, and you don't even use it! Similarly, what if WhizServlet takes an "action" parameter that can be "view", "edit", or "delete", and your application only uses "view"? You're still exposing the other actions to anybody who knows the URL syntax (pretty trivial to figure out if the library is open source). You wouldn't expose large chunks of your own code that you weren't using, so why should it be any different with libraries?
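To make that concrete, here is a minimal allowlist sketch for the fictitious WhizBang library, written in plain Java so it runs on its own. The URL patterns "/whiz" and "/bang" and the class WhizBangGate are hypothetical. In a real application this check would sit in a servlet Filter mapped in front of the library's servlets, and the cleaner fix for BangServlet is simply not to map it in web.xml (assuming the library doesn't register it some other way). The point is the default: anything not explicitly listed, whether a servlet or an action, is rejected.

```java
import java.util.Map;
import java.util.Set;

/** Hypothetical allowlist gate for the fictitious WhizBang library. */
public final class WhizBangGate {

    // Assumed URL patterns. "/bang" (BangServlet) is deliberately absent because
    // the app never uses it; "/whiz" only permits the one action the app needs.
    private static final Map<String, Set<String>> ALLOWED_ACTIONS =
            Map.of("/whiz", Set.of("view"));

    private WhizBangGate() {}

    /** Returns true only if both the servlet path and the action are explicitly allowed. */
    public static boolean isAllowed(String servletPath, String action) {
        // Immutable Map.of/Set.of collections reject null lookups, so check first.
        if (servletPath == null || action == null) {
            return false;
        }
        Set<String> actions = ALLOWED_ACTIONS.get(servletPath);
        if (actions == null) {
            return false; // unknown or unused servlet, e.g. "/bang"
        }
        return actions.contains(action);
    }

    public static void main(String[] args) {
        System.out.println(isAllowed("/whiz", "view"));   // true
        System.out.println(isAllowed("/whiz", "delete")); // false
        System.out.println(isAllowed("/bang", "view"));   // false
    }
}
```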
This post is getting kind of long so I'm going to split it up. In the next post, I'll continue the discussion of attack surface minimization, as well as some of the tradeoffs that go along with this approach.