Smart greybox fuzzing

VT Pham, M Böhme, AE Santosa, AR Căciulescu, A Roychoudhury
IEEE Transactions on Software Engineering, 2019
Coverage-based greybox fuzzing (CGF) is one of the most successful approaches for automated vulnerability detection. Given a seed file (as a sequence of bits), a CGF tool randomly flips, deletes, or copies some bits to generate new files. CGF iteratively constructs (and fuzzes) a seed corpus by retaining those generated files that increase coverage. However, for applications that process complex file formats, random bit flips are unlikely to produce valid files (or valid chunks in files). In this work, we introduce smart greybox fuzzing (SGF), which leverages a high-level structural representation of the seed file to generate new files. We define innovative mutation operators that work on the virtual file structure rather than on the bit level, which allows SGF to explore completely new input domains while maintaining file validity. We introduce a novel validity-based power schedule that enables SGF to spend more time generating files that are more likely to pass the parsing stage of the program, which can expose vulnerabilities much deeper in the processing logic. Our evaluation demonstrates the effectiveness of SGF. On several libraries that parse complex chunk-based files, our tool AFLsmart achieves substantially more branch coverage (up to 87 percent improvement) and exposes more vulnerabilities than baseline AFL. AFLsmart discovered 42 zero-day vulnerabilities in widely used, well-tested tools and libraries; 22 CVEs were assigned.
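The baseline CGF loop described in the abstract (mutate a seed at the bit level, run the program, keep the new file if it increases coverage) can be sketched roughly as follows. This is a minimal illustration, not AFL or AFLsmart code: `toy_target`, its branch names, the RIFF-like header check, and the `fuzz` and `mutate` helpers are all hypothetical, and deletion and copying operate on whole bytes rather than individual bits to keep the example short.

```python
import random


def toy_target(data: bytes) -> frozenset:
    """Hypothetical program under test: returns the set of branch IDs it hit."""
    hit = {"entry"}
    if data[:4] == b"RIFF":  # magic number check
        hit.add("magic_ok")
        if len(data) >= 8:
            hit.add("has_size")
            # The size field must match the payload length for the file to parse.
            if int.from_bytes(data[4:8], "little") == len(data) - 8:
                hit.add("size_ok")
    return frozenset(hit)


def mutate(data: bytes, rng: random.Random) -> bytes:
    """Structure-blind mutation: flip one bit, delete one byte, or copy one byte."""
    buf = bytearray(data)
    if not buf:
        return bytes([rng.randrange(256)])
    op = rng.choice(("flip", "delete", "copy"))
    pos = rng.randrange(len(buf))
    if op == "flip":
        buf[pos] ^= 1 << rng.randrange(8)
    elif op == "delete":
        del buf[pos]
    else:
        buf.insert(rng.randrange(len(buf) + 1), buf[pos])
    return bytes(buf)


def fuzz(seeds, target, iterations, rng):
    """Grow the corpus by retaining only mutants that reach new branches."""
    corpus = list(seeds)
    covered = set()
    for seed in corpus:
        covered |= target(seed)
    for _ in range(iterations):
        parent = rng.choice(corpus)
        child = mutate(parent, rng)
        new_branches = target(child) - covered
        if new_branches:
            covered |= new_branches
            corpus.append(child)
    return corpus, covered


if __name__ == "__main__":
    seed = b"RIFF\x05\x00\x00\x00abcd"  # size field is deliberately off by one
    corpus, covered = fuzz([seed], toy_target, 500, random.Random(0))
    print(len(corpus), sorted(covered))
```

Most mutants of this seed damage the magic number or leave the size field inconsistent, which is the validity problem the abstract points to. An SGF-style fuzzer would, under the abstract's description, replace `mutate` with operators that act on parsed chunks and would bias seed selection toward files that parse successfully; neither of those is implemented in this sketch.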
ieeexplore.ieee.org