Dr. David Brumley, a professor at Carnegie Mellon University and CEO of ForAllSecure, explains what fuzzing is and how companies can use it to improve application security and speed up their software development life cycle.
The concept of fuzzing, or fuzz testing, is decades old, but it isn’t well known outside of cyber security circles. That needs to change. Luckily, Dr. David Brumley, one of the best in the digital security business, was kind enough to give me a fuzzing 101 lesson not too long ago, and I can share it with you.
Dr. Brumley is a professor at Carnegie Mellon University and CEO of ForAllSecure. He also built the fuzzing technology that won the DARPA Cyber Grand Challenge. In this exclusive TechRepublic cyber security lesson, Dr. Brumley explains what fuzzing is and how companies can use it to help improve both their application security processes and software development cycles. The following is a transcript of the video edited for readability.
Bill Detwiler: So, David, thanks for joining me, and let’s jump right to it. What is fuzzing?
Dr. David Brumley: Well, as you said, fuzzing was named about 25 years ago. The story is that Professor Bart Miller and his graduate students were looking at the reliability of Unix, Microsoft, and Apple applications, and they noticed something kind of funny. When they gave these applications random input, they could cause about a third of them to crash. A pretty big number, right? It was really like the proverbial monkeys typing on a keyboard.
Bill Detwiler: Right.
Dr. David Brumley: But instead of creating Shakespeare, they found serious security issues.
Bill Detwiler: That’s worse, right?
Dr. David Brumley: It’s worse. It’s much worse. So let me explain how fuzzing works, and I’m going to use an analogy here. Think of a program like a maze, right? We know that when a programmer is developing code, the program performs different computations depending on what the user gives it. So here the program is the maze, and let’s just pretend we have a little robot up here. The input to the program is going to be directions for our robot through the maze.
So for example, we can give the robot these directions, which I’m going to write up here: down, left, down, right. And he’s going to take two rights, meaning he’s going to go to the right twice. And then he’s going to go down a bunch of times. So you can think about giving our little robot this input, and the robot is going to take that as directions and follow this path through the program. He’s going to go down, left, down, first right, second right, then a bunch of downs.
And when you look at this, we might have thought there was a little bug here, but we can verify that this is actually okay. There’s no actual bug here. And this is what’s happening when a developer writes a unit test: they come up with an input and make sure that it gets the right output.
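To make the analogy concrete, here is a minimal Python sketch of the maze as a tiny “program” plus one hand-written unit test. The layout, the `run_robot` function, and the rule that a move into a wall is simply ignored are all hypothetical choices for illustration; a raised `RuntimeError` stands in for a crash.

```python
# Hypothetical maze: 'S' start, 'E' exit, '#' wall, 'X' a cell where the program crashes.
MAZE = [
    "##S##",
    "#...#",
    "#.#.#",
    "#X#.#",
    "###E#",
]
MOVES = {"U": (-1, 0), "D": (1, 0), "L": (0, -1), "R": (0, 1)}


def run_robot(directions):
    """Follow the directions and return the robot's final (row, col)."""
    row, col = 0, MAZE[0].index("S")
    for step in directions:
        d_row, d_col = MOVES[step]
        r, c = row + d_row, col + d_col
        # A move into a wall or off the grid is ignored: the robot bumps and stays put.
        if 0 <= r < len(MAZE) and 0 <= c < len(MAZE[r]) and MAZE[r][c] != "#":
            row, col = r, c
        # Stepping onto the bug cell is our stand-in for a crash.
        if MAZE[row][col] == "X":
            raise RuntimeError(f"program crashed at {(row, col)}")
    return row, col


def test_robot_reaches_exit():
    # A unit test checks exactly one hand-picked path: down, right, then three downs.
    assert run_robot("DRDDD") == (4, 3)


test_robot_reaches_exit()
```

The test passes, but it exercises a single route; an input such as `"DLDD"` walks straight into the `X` cell and is never tried.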
Now, the problem is, if you think about this maze, we’ve only checked one path through it, and there are other potential bugs lurking out there. What fuzzing does is automate this idea of coming up with an input, running the program, and seeing if we find a bug.
So for example, if we just switch these directions a little bit, we have down, left, down, but instead of taking two rights, we only take one right, and then go down and follow some more directions. The robot may take this particular path through the program: down, left, down, right, and instead of going right twice, it only goes right once and then down. Say it comes over here, and we find that the program crashes.
Now, what Bart originally found, of course, was that providing random input, so it wasn’t structured like this, could actually cause applications to crash pretty often. Now we’re on our third generation of fuzzing techniques. It’s no longer monkeys typing on a keyboard. There’s a lot more technology behind it, but the idea is still the same: we automatically generate input and see if the program crashes or not. And here’s the cool thing: it can be completely automated. By making the computer do this, as opposed to a developer writing unit tests, you can go through thousands of these iterations in a single second.
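A purely random, first-generation version of that generate-and-run loop might look like the sketch below. It uses a hypothetical toy maze, and `random_fuzz` with its `trials` and `max_len` parameters is an illustrative name, not a real tool’s API. The fixed seed only makes the run repeatable.

```python
import random

# Hypothetical maze: 'S' start, 'E' exit, '#' wall, 'X' a cell where the program crashes.
MAZE = [
    "##S##",
    "#...#",
    "#.#.#",
    "#X#.#",
    "###E#",
]
MOVES = {"U": (-1, 0), "D": (1, 0), "L": (0, -1), "R": (0, 1)}


def run_robot(directions):
    """Follow the directions; raise RuntimeError (our 'crash') on an 'X' cell."""
    row, col = 0, MAZE[0].index("S")
    for step in directions:
        d_row, d_col = MOVES[step]
        r, c = row + d_row, col + d_col
        # Moves into a wall or off the grid are ignored.
        if 0 <= r < len(MAZE) and 0 <= c < len(MAZE[r]) and MAZE[r][c] != "#":
            row, col = r, c
        if MAZE[row][col] == "X":
            raise RuntimeError(f"program crashed at {(row, col)}")
    return row, col


def random_fuzz(trials=10_000, max_len=8, seed=0):
    """Generate random direction strings; return the first one that crashes, or None."""
    rng = random.Random(seed)
    for _ in range(trials):
        length = rng.randint(1, max_len)
        directions = "".join(rng.choice("UDLR") for _ in range(length))
        try:
            run_robot(directions)
        except RuntimeError:
            return directions
    return None


crash = random_fuzz()
print("crashing input:", crash)
```

Each trial is an independent guess; nothing learned from one run shapes the next.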
Let me contrast this with static analysis, because I know a lot of people hear about static analysis and fuzzing and wonder what the difference is. Static analysis looks at the program, but it never actually runs it. It’s saying, well, there may be a problem here, maybe a problem here; maybe it already knows this part is okay; maybe it thinks there’s a problem here, and so on and so forth. But it never actually proves there’s a problem.
Bill Detwiler: So it’s looking for patterns in the code?
Dr. David Brumley: It’s just looking for patterns. And if you actually look at this maze, right, you can say, well, static analysis flagged this spot, but there’s no way our little robot can get over there. It’s blocked. Static analysis can potentially find more bugs, but you have to staff someone to manually review what it reports. Fuzzing, on the other hand, incrementally explores the program to find lots and lots of problems. For example, Google has a project where they’re checking Google Chrome and many of the open source libraries Google uses, and they found 25,000 bugs completely automatically with zero false positives over the last three years.
I also want to set security aside and ask: how can this benefit the developer? Because security is not always a cost; it can actually be a benefit. We all know that the better we test a program, the more reliable it’s going to be in the field. And we also know developers don’t particularly like writing test cases. When you use fuzzing to come up with different inputs that execute all these paths, those inputs are really just test cases, and you can use them for regression testing over time. So one of the benefits of fuzzing beyond security is that you can use it to speed up your software development life cycle and produce more trustworthy, better-quality code.
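The blocked-off spot can be caricatured in code. In this hypothetical sketch, the “static” check only pattern-matches the layout for `X` cells without running anything, while the “dynamic” check actually runs the robot. For a maze this small it can afford to try every input up to six moves; a real fuzzer samples inputs rather than enumerating them, and real static analyzers are far more sophisticated than a text scan. The point is only the difference between flagging a spot and reproducing a crash there.

```python
from itertools import product

# Hypothetical maze with two 'X' bug sites; the one in the bottom-left corner is walled off.
MAZE = [
    "##S##",
    "#...#",
    "#.#.#",
    "#X#.#",
    "X##E#",
]
MOVES = {"U": (-1, 0), "D": (1, 0), "L": (0, -1), "R": (0, 1)}


class Crash(Exception):
    """Raised when the robot steps onto an 'X' cell; records where it happened."""

    def __init__(self, cell):
        super().__init__(f"crash at {cell}")
        self.cell = cell


def run_robot(directions):
    row, col = 0, MAZE[0].index("S")
    for step in directions:
        d_row, d_col = MOVES[step]
        r, c = row + d_row, col + d_col
        # Moves into a wall or off the grid are ignored.
        if 0 <= r < len(MAZE) and 0 <= c < len(MAZE[r]) and MAZE[r][c] != "#":
            row, col = r, c
        if MAZE[row][col] == "X":
            raise Crash((row, col))
    return row, col


def static_flags():
    """'Static' check: flag every X in the layout without ever running the robot."""
    return {(r, c) for r, line in enumerate(MAZE) for c, ch in enumerate(line) if ch == "X"}


def dynamic_crashes(max_len=6):
    """'Dynamic' check: run every input up to max_len moves and keep only real crashes."""
    found = set()
    for n in range(1, max_len + 1):
        for moves in product("UDLR", repeat=n):
            try:
                run_robot("".join(moves))
            except Crash as crash:
                found.add(crash.cell)
    return found


flagged, confirmed = static_flags(), dynamic_crashes()
print("false positives:", sorted(flagged - confirmed))  # -> [(4, 0)]
```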
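One way to picture reusing fuzz-generated inputs as regression tests: save the inputs, record each one’s outcome once, and replay them after every change. The saved inputs, `outcome`, `find_regressions`, and the `check_walls=False` flag that simulates a buggy refactor are all hypothetical.

```python
# Hypothetical maze: 'S' start, 'E' exit, '#' wall, 'X' a cell where the program crashes.
MAZE = [
    "##S##",
    "#...#",
    "#.#.#",
    "#X#.#",
    "###E#",
]
MOVES = {"U": (-1, 0), "D": (1, 0), "L": (0, -1), "R": (0, 1)}


def run_robot(directions, check_walls=True):
    """Return the final (row, col); check_walls=False simulates a buggy refactor."""
    row, col = 0, MAZE[0].index("S")
    for step in directions:
        d_row, d_col = MOVES[step]
        r, c = row + d_row, col + d_col
        in_bounds = 0 <= r < len(MAZE) and 0 <= c < len(MAZE[r])
        if in_bounds and (not check_walls or MAZE[r][c] != "#"):
            row, col = r, c
        if MAZE[row][col] == "X":
            raise RuntimeError(f"program crashed at {(row, col)}")
    return row, col


def outcome(program, directions):
    """Run one input and return its final position, or the string 'crash'."""
    try:
        return program(directions)
    except RuntimeError:
        return "crash"


def find_regressions(program, baseline):
    """Replay every saved input and return the ones whose outcome changed."""
    return [d for d, expected in baseline.items() if outcome(program, d) != expected]


# Inputs a fuzzer produced earlier (hypothetical), saved with today's outcomes.
saved_inputs = ["DRDDD", "DRR", "DLU"]
baseline = {d: outcome(run_robot, d) for d in saved_inputs}

print(find_regressions(run_robot, baseline))  # -> []
print(find_regressions(lambda d: run_robot(d, check_walls=False), baseline))  # -> ['DRR', 'DLU']
```

Nobody wrote these inputs by hand, yet replaying them catches the refactor that lets the robot walk through walls.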
Bill Detwiler: So how can companies get started using fuzzing as a technique and what are some of the actual fuzzers that are out there? Let’s talk about that.
Dr. David Brumley: Yeah. So I started off by saying this was invented, or coined, 25 years ago by Professor Bart Miller, and we’re really on our third generation. The original set of fuzzers were what we call black box fuzzers. They would generate input, maybe at random or with some algorithm, and they would just run the program and see if it crashed or not.
Bill Detwiler: Just over and over and over. Okay.
Dr. David Brumley: Just over and over and over again. Now, the problem with that is, if you’re just generating random input, it may not take the robot anywhere. For example, you don’t want to generate input that has the robot going down and back up and back down and so on and so forth. So that was the first generation. These random techniques actually still work today, just not as well.
The second generation are what we call protocol- or grammar-based fuzzers. With these, you have someone manually write a template for how to create those inputs. So in our example here, someone may write a template that says always go down, then go either down or right, next go either left or right, after that maybe go down again or up again, and so on and so forth.
And if you think about what this is doing, it’s constraining the set of things you’re going to explore. So, for example, if you write this protocol or grammar out, it may end up inadvertently checking only part of the program, because you haven’t actually said it’s possible to go over this far. So that’s the second generation. There are great products out there today.
The third generation is what we call instrumentation-guided fuzzing. Instrumentation-guided fuzzing generates an input, watches as the robot executes the path, and learns from that to come up with the next input. Sometimes this is branded as AI fuzzing. I don’t think of it as AI, but it is learning: the more it executes, the more it learns about which paths it has already looked at and where the new places are.
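The blind spot is easy to reproduce with a toy generator. This sketch uses a hypothetical template that is deliberately narrower than the one described above: always go down first, then only down or right. Because the template never produces a left move, the robot can never reach the left-hand side of the maze, where the bug sits. `template_input` and `grammar_fuzz` are illustrative names.

```python
import random

# Hypothetical maze: 'S' start, 'E' exit, '#' wall, 'X' a cell where the program crashes.
MAZE = [
    "##S##",
    "#...#",
    "#.#.#",
    "#X#.#",
    "###E#",
]
MOVES = {"U": (-1, 0), "D": (1, 0), "L": (0, -1), "R": (0, 1)}


def run_robot(directions):
    """Return the set of cells visited; raise RuntimeError if the robot reaches 'X'."""
    row, col = 0, MAZE[0].index("S")
    visited = {(row, col)}
    for step in directions:
        d_row, d_col = MOVES[step]
        r, c = row + d_row, col + d_col
        # Moves into a wall or off the grid are ignored.
        if 0 <= r < len(MAZE) and 0 <= c < len(MAZE[r]) and MAZE[r][c] != "#":
            row, col = r, c
        if MAZE[row][col] == "X":
            raise RuntimeError(f"program crashed at {(row, col)}")
        visited.add((row, col))
    return visited


def template_input(rng):
    """Hypothetical hand-written template: always start down, then only downs or rights."""
    return "D" + "".join(rng.choice("DR") for _ in range(rng.randint(0, 6)))


def grammar_fuzz(trials=2_000, seed=0):
    """Run template-generated inputs; return (cells covered, crashing inputs)."""
    rng = random.Random(seed)
    covered, crashes = set(), []
    for _ in range(trials):
        directions = template_input(rng)
        try:
            covered |= run_robot(directions)
        except RuntimeError:
            crashes.append(directions)
    return covered, crashes


covered, crashes = grammar_fuzz()
print("crashes found:", crashes)  # -> []
print("left-side cells visited:", sorted(c for c in covered if c[1] < 2))  # -> []
```

Running more trials does not help: the template itself rules out the path to the bug.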
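A minimal sketch of that feedback loop, under heavy simplifying assumptions: the “instrumentation” here is just the set of cells the robot visits, and the only mutation is appending one random move to an input already in the corpus. An input is kept only if it reaches a cell no earlier input reached. Real coverage-guided fuzzers such as AFL and libFuzzer get comparable feedback from instrumentation compiled into the target program; `coverage_guided_fuzz` itself is a hypothetical name.

```python
import random

# Hypothetical maze: 'S' start, 'E' exit, '#' wall, 'X' a cell where the program crashes.
MAZE = [
    "##S##",
    "#...#",
    "#.#.#",
    "#X#.#",
    "###E#",
]
MOVES = {"U": (-1, 0), "D": (1, 0), "L": (0, -1), "R": (0, 1)}


def run_robot(directions):
    """Return the set of cells visited; raise RuntimeError if the robot reaches 'X'."""
    row, col = 0, MAZE[0].index("S")
    visited = {(row, col)}
    for step in directions:
        d_row, d_col = MOVES[step]
        r, c = row + d_row, col + d_col
        # Moves into a wall or off the grid are ignored.
        if 0 <= r < len(MAZE) and 0 <= c < len(MAZE[r]) and MAZE[r][c] != "#":
            row, col = r, c
        if MAZE[row][col] == "X":
            raise RuntimeError(f"program crashed at {(row, col)}")
        visited.add((row, col))
    return visited


def coverage_guided_fuzz(iterations=5_000, seed=0):
    """Mutate saved inputs; keep any mutant that reaches a cell no earlier input reached."""
    rng = random.Random(seed)
    corpus = [""]                 # seed input: the robot stays at the start
    coverage = run_robot("")      # every cell reached so far (the feedback signal)
    for _ in range(iterations):
        # Mutation: append one random move to an input already in the corpus.
        candidate = rng.choice(corpus) + rng.choice("UDLR")
        try:
            visited = run_robot(candidate)
        except RuntimeError:
            return candidate, corpus
        if not visited <= coverage:  # reached somewhere new: learn from it, keep the input
            coverage |= visited
            corpus.append(candidate)
    return None, corpus


crash, corpus = coverage_guided_fuzz()
print("crashing input:", crash)
print("inputs kept:", corpus)
```

Because every kept input ends at a newly discovered cell, the corpus grows step by step toward the bug instead of relying on one lucky guess.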
Bill Detwiler: So it’s a little bit of the best of both worlds, right? You have a constrained process, but you’re not missing half of the potential vulnerabilities.
Dr. David Brumley: I think so. And if you look at modern development shops, people like Google and Microsoft, who have put tons of money into this, have settled on instrumentation-guided fuzzing for a reason.