FormatFuzzer is a framework for high-efficiency, high-quality generation and parsing of binary inputs. It takes a binary template that describes the format of a binary input and generates an executable that produces and parses the given binary format. From a binary template for GIF, for instance, FormatFuzzer produces a GIF generator, also known as a GIF fuzzer.
Generators produced by FormatFuzzer are highly efficient, generating hundreds of valid test inputs per second, in sharp contrast to mutation-based fuzzers, where the large majority of inputs is invalid. Inputs generated by FormatFuzzer are independent of the program under test (or, in fact, of any program), so you can also use them in black-box settings. However, FormatFuzzer also integrates with AFL++ to produce valid inputs that also aim for maximum coverage. In our experiments, this "best of two worlds" approach surpasses all other settings; see our paper for details.
The binary templates used by FormatFuzzer come from the 010 Editor. There are more than 170 binary templates, which can either be used directly with FormatFuzzer or adapted for its use. Out of the box, FormatFuzzer produces formats such as AVI, BMP, GIF, JPG, MIDI, MP3, MP4, PCAP, PNG, WAV, and ZIP; and we keep extending this list every week.
Contributors are welcome! Visit the FormatFuzzer project page to file ideas and issues, or to add pull requests. For details on how FormatFuzzer works and how it compares, read our paper.
FormatFuzzer is available from the FormatFuzzer project page. You can download and unpack the latest release from the releases page.
For the very latest and greatest, you can also clone its git repository:
git clone https://github.com/uds-se/FormatFuzzer.git
All further actions take place in its main folder:
cd FormatFuzzer
To run FormatFuzzer, you need the following:
- Python 3
- A C++ compiler with GNU libraries (notably getopt_long()) such as clang or gcc
- The Python packages py010parser, six, and intervaltree
- A zlib library (for compression functions)
- A boost library (for checksum functions)
If you plan to edit the build and configuration scripts (.ac and .am files), you will also need
- GNU autoconf
- GNU automake
On Debian or Ubuntu Linux:
sudo apt install git g++ make automake python3-pip zlib1g-dev libboost1.71-dev
pip3 install py010parser six intervaltree
On macOS:
xcode-select --install
brew install python3 automake boost
pip3 install py010parser six intervaltree
On all systems, using pip:
pip install py010parser
pip install six
pip install intervaltree
Note: all build commands require you to be in the same folder as this README file. Building a fuzzer outside of this folder is not yet supported.
There is a build.sh script which automates all construction steps. Simply run
./build.sh gif
to create a GIF fuzzer.
This works for all file formats provided in templates/; if there is a file templates/FOO.bt, then ./build.sh FOO will build a fuzzer.
There is a Makefile (source in Makefile.am) which automates all construction steps. (Requires GNU make.) First do
touch configure Makefile.in
then
./configure
and then
make gif-fuzzer
to create a GIF fuzzer.
This works for all file formats provided in templates/; if there is a file templates/FOO.bt, then make FOO-fuzzer will build a fuzzer.
If the above make method does not work, or if you want more control, you may have to proceed manually.
Run the ffcompile compiler to compile the binary template into C++ code. It takes two arguments: the .bt binary template, and a .cpp C++ file to be generated.
./ffcompile templates/gif.bt gif.cpp
Use the following commands to create a fuzzer gif-fuzzer. First, compile the generic command-line driver:
g++ -c -I . -std=c++17 -g -O3 -Wall fuzzer.cpp
(-I . denotes the location of the bt.h file; -std=c++17 sets the C++ standard.)
Then, compile the binary parser/compiler:
g++ -c -I . -std=c++17 -g -O3 -Wall gif.cpp
Finally, link the binary parser/compiler with the command-line driver to obtain an executable. If you use any extra libraries (such as -lz), be sure to specify these here too.
g++ -O3 gif.o fuzzer.o -o gif-fuzzer -lz
FormatFuzzer can be run as a standalone parser, generator or mutator of specific formats. In addition, it can be called by general-purpose fuzzers such as AFL++ to integrate those format-specific capabilities into the fuzzing process (see the section below on AFL++ integration).
The generated fuzzer takes a command as its first argument, followed by options and arguments to that command.
The most important command is fuzz, for producing outputs. Its arguments are the files to be generated in the appropriate format.
Run the generator as
./gif-fuzzer fuzz output.gif
to create a random binary file output.gif, or
./gif-fuzzer fuzz out1.gif out2.gif out3.gif
to create three GIF files out1.gif, out2.gif, and out3.gif.
Note that the gif.bt template we provide has been augmented with special functions to make the generation of valid files easier. If you use an original .bt template file without adaptations, you may get warnings during generation and create invalid files.
You can also run the fuzzer as a parser for binary files, using the parse command. This is useful if you want to test the accuracy of the binary template, or if you want to mutate an input (see "Decision Files", below).
To run the parser, use
./gif-fuzzer parse input.gif
You will see error messages if input.gif cannot be successfully parsed.
While parsing, you can also store all parsing decisions (i.e. which parsing alternatives were taken) in a decision file. This is a sequence of bytes enumerating the decisions taken. Each byte stands for a single parsing decision. A byte value of 0 means that the first alternative was taken, a byte value of 1 means that the second alternative was taken, and so on.
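The byte-to-alternative mapping described above can be illustrated with a small C++ sketch. This is not FormatFuzzer's own code: the helper decode_decisions and the list of alternatives per decision point are hypothetical, introduced only to show how byte value 0 selects the first alternative, 1 the second, and so on.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper (not part of FormatFuzzer): maps each decision byte
// to the alternative it selects. alternatives[i] lists the options that
// are available at the i-th decision point.
std::vector<std::string> decode_decisions(
    const std::vector<unsigned char>& decisions,
    const std::vector<std::vector<std::string>>& alternatives) {
    if (decisions.size() > alternatives.size())
        throw std::invalid_argument("more decisions than decision points");
    std::vector<std::string> chosen;
    for (std::size_t i = 0; i < decisions.size(); ++i) {
        // Byte value 0 selects the first alternative, 1 the second, etc.
        // at() throws std::out_of_range if the byte names no alternative.
        chosen.push_back(alternatives[i].at(decisions[i]));
    }
    return chosen;
}
```

For instance, with the alternatives { "87a", "89a" } at the first decision point, a decision byte of 1 would select "89a".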
You can generate such a decision file when parsing an input:
./gif-fuzzer parse --decisions input.dec input.gif
Here, input.dec stores the decisions made for parsing input.gif.
You can also use such a decision file when generating inputs. The fuzzer will then take the exact same decisions as found during parsing. The following command generates a new GIF file using the decisions determined while parsing input.gif:
./gif-fuzzer fuzz --decisions input.dec input2.gif
If everything works well, both files should be identical:
cmp input.gif input2.gif
By mutating a decision file (e.g. replacing individual bytes), you can create inputs that are similar to the original file parsed. This is useful for interfacing with specific testing strategies and fuzzers such as AFL, where you can use gif-fuzzer and the like as translators from decision files to binary files and back: AFL would mutate decision files, and the program under test would run on the translated binary files. In contrast to mutating binary files directly (as AFL would normally do), this would have the advantage of always having valid inputs, and thus progressing much faster towards coverage.
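As a minimal sketch of such a mutation, the following C++ helpers read a decision file, replace a single byte, and write the result back. The function names read_bytes, write_bytes, and replace_decision are hypothetical and not part of FormatFuzzer; a real fuzzer such as AFL would choose its mutations very differently.

```cpp
#include <cstddef>
#include <fstream>
#include <iterator>
#include <string>
#include <vector>

// Hypothetical helpers (not part of FormatFuzzer) for editing decision files.

// Reads a whole file into a byte vector (empty if the file cannot be opened).
std::vector<unsigned char> read_bytes(const std::string& path) {
    std::ifstream in(path, std::ios::binary);
    std::vector<char> raw((std::istreambuf_iterator<char>(in)),
                          std::istreambuf_iterator<char>());
    return std::vector<unsigned char>(raw.begin(), raw.end());
}

// Writes a byte vector to a file, replacing any previous contents.
void write_bytes(const std::string& path,
                 const std::vector<unsigned char>& bytes) {
    std::ofstream out(path, std::ios::binary);
    out.write(reinterpret_cast<const char*>(bytes.data()),
              static_cast<std::streamsize>(bytes.size()));
}

// Returns a copy of the decisions with one byte replaced;
// at() throws std::out_of_range if index is past the end.
std::vector<unsigned char> replace_decision(
    std::vector<unsigned char> decisions, std::size_t index,
    unsigned char value) {
    decisions.at(index) = value;
    return decisions;
}
```

A mutated file written this way (say, mutated.dec) could then be turned back into a GIF with ./gif-fuzzer fuzz --decisions mutated.dec mutated.gif.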
In addition to the format-specific fuzzers, such as gif-fuzzer, FormatFuzzer can also be compiled into format-specific shared libraries, such as gif.so (for that, simply run ./build.sh gif or make gif.so). Those shared libraries can be loaded by general-purpose fuzzers, such as AFL++.
To run AFL++ with FormatFuzzer, just follow the instructions for our modified version of AFL++. We support different fuzzing strategies, including:
AFL+FFMut: runs AFL++ using FormatFuzzer to provide format-specific smart mutations.
AFL+FFGen: uses FormatFuzzer as a format-specific generator, while AFL++ mutates its decision seeds.
To write your own .bt binary templates (and thus create a high-efficiency fuzzer/parser for this format), read the section Introduction to Templates and Scripts from the 010 Editor Manual.
In many cases, a template for the format you are looking for (or a similar one) may already exist. Have a look at the 010 Editor binary template collection to see whether there is something that you can use or base your format on.
Note that the .bt files provided in the repository generally target parsing files. They can be used for generating files, too; but they often lack precise information on which parts of the input are required.
In this section, we discuss some of the ways in which you can customize .bt files to work well with FormatFuzzer.
For example, for the GIF format, the file templates/gif-orig.bt shows the original binary template, which was only designed for parsing, while the file templates/gif.bt is a modified version which is capable of generating valid GIFs. Comparing the two files, we see that only a small number of changes was required to achieve this.
If you have created a gif-fuzzer, either by running make gif-fuzzer or by using the ffcompile tool, you have already obtained a C++ file gif.cpp which contains an implementation of the GIF generator and parser. This is useful to see how the changes you make to the binary template are translated into executable code. More details on the C++ code are presented in the next section.
The GIF binary template uses the lookahead functions ReadUByte() and ReadUShort() to look ahead at the values of the next bytes in the file before actually parsing them into a struct field. At generation time, we allow those functions to receive an additional argument specifying a set of good known values to pick for the bytes that we look ahead. In addition, we also allow specifying a global set of good known values to always use when calling a particular lookahead function, such as ReadUByte(). Those are stored in the ReadUByteInitValues vector.
By default, our translation procedure ffcompile tries to mine interesting values that have been used in comparisons against lookahead bytes and uses them as a global set of known values. When running
./ffcompile templates/gif.bt gif.cpp
a printed message shows the lookahead functions identified, as well as the mined interesting values:
Finished creating cpp generator.
Lookahead functions found:
ReadUByte
ReadUShort
Mined interesting values:
GlobalColorTableFlag: ['1']
LocalColorTableFlag: ['1']
ReadUByte: ['0x3B', '0x2C']
ReadUShort: ['0xF921', '0xFE21', '0x0121', '0xFF21']
Signature: ['"GIF"']
For GIF generation, however, it is better to specify the set of good known values for ReadUByte() individually at each call to the function. So we define an empty array (size 0)
const local UBYTE ReadUByteInitValues[0];
to overwrite the global set ReadUByteInitValues, and for each call to ReadUByte(), we use an additional argument to specify the set of good values to use for that particular location. The binary template language is also powerful enough to allow this choice to be made based on runtime conditions. For example, in the following code we show how the choice of appropriate values for a ReadUByte() call can depend on the current GIF version we are generating. A GIF version 89a allows one additional possible value for the byte (0x21).
if(GifHeader.Version == "89a")
	local UBYTE values[] = { 0x3B, 0x2C, 0x21 };
else
	local UBYTE values[] = { 0x3B, 0x2C };
while (ReadUByte(FTell(), values) != 0x3B) {
...
}
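For readers more at home in C++ than in the template language, the version-dependent choice above could be expressed as follows. This is only an illustration: the function block_introducers is hypothetical and does not appear in the generated gif.cpp.

```cpp
#include <string>
#include <vector>

// Hypothetical helper (not taken from gif.cpp): returns the byte values
// that may follow at this position, depending on the GIF version.
std::vector<unsigned char> block_introducers(const std::string& version) {
    // 0x3B is the GIF trailer, 0x2C starts an image descriptor.
    std::vector<unsigned char> values = {0x3B, 0x2C};
    // GIF 89a additionally allows 0x21, which introduces an extension block.
    if (version == "89a") values.push_back(0x21);
    return values;
}
```

The generator would then draw the lookahead byte from the returned set, so an 87a file never receives an extension block.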
The remaining edits required for the GIF binary template are similar. For example, each struct field can also specify a set of known good values. For instance, this specifies the correct values for the Version field, 87a and 89a:
char Version[3] = { {"87a"}, {"89a"} };
For debugging purposes, as well as for understanding how to make appropriate changes to improve your generators and parsers, it may be helpful to understand some internal workings of the generated C++ code. Ideally, you should be able to edit the binary template files until they can be used to generate valid files with high probability, so that you do not have to edit the generated C++ code.
The C++ code creates a class for each struct and union defined in the binary template, as well as for native types, such as int.
At construction time, when initializing a variable, we can define a set of good known values that this variable can assume. For example, the constructor call
char_array_class cname(cname_element, { "IHDR", "tEXt", "PLTE", "cHRM", "sRGB", "iEXt", "zEXt", "tIME", "pHYs", "bKGD", "sBIT", "sPLT", "acTL", "fcTL", "fdAT", "IHDR", "IEND" });
would specify 17 good values to use for variable cname. But this is often not enough, as the choice of appropriate chunk types is context sensitive. So we also allow specifying a set of good values at generation time when generating a new chunk. For example, this call could be used to generate an instance of chunk for the first chunk, which must have type IHDR.
GENERATE(chunk, ::g->chunk.generate({ "IHDR" }, false));
When generating the second chunk, we may use this long list of possible chunks that can come between the IHDR chunk and the PLTE chunk:
GENERATE(chunk, ::g->chunk.generate({ "iCCP", "sRGB", "sBIT", "gAMA", "cHRM", "pHYs", "sPLT", "tIME", "zTXt", "tEXt", "iTXt", "eXIf", "oFFs", "pCAL", "sCAL", "acTL", "fcTL", "fdAT", "fRAc", "gIFg", "gIFt", "gIFx", "sTER" }, true));
The generator will then uniformly pick one of the good known values to use for the new instance. We also allow the choice of an evil value, which is not one of the good known values, with small probability 1/128. This feature can be enabled or disabled at any time by using the method set_evil_bit.
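The selection policy described above can be sketched in a few lines of C++. This is an assumption-laden illustration, not the generated code: the function pick_value, its use of std::mt19937, and the way an evil value is produced (random bytes of the same length) are all hypothetical; only the uniform choice and the 1/128 probability come from the description.

```cpp
#include <algorithm>
#include <cstddef>
#include <random>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical sketch (not FormatFuzzer's implementation): picks a value
// for a field from a set of good known values.
std::string pick_value(const std::vector<std::string>& good,
                       bool evil_enabled, std::mt19937& rng) {
    if (good.empty())
        throw std::invalid_argument("need at least one good value");
    std::uniform_int_distribution<int> one_in_128(0, 127);
    if (evil_enabled && one_in_128(rng) == 0) {
        // Evil case (probability 1/128): random bytes that are NOT one of
        // the good values. How real evil values are built is an assumption.
        std::uniform_int_distribution<int> any_byte(0, 255);
        std::size_t len = std::max<std::size_t>(1, good.front().size());
        std::string evil(len, '\0');
        do {
            for (char& c : evil) c = static_cast<char>(any_byte(rng));
        } while (std::find(good.begin(), good.end(), evil) != good.end());
        return evil;
    }
    // Normal case: uniform choice among the good known values.
    std::uniform_int_distribution<std::size_t> index(0, good.size() - 1);
    return good[index(rng)];
}
```

Passing false for evil_enabled plays the role that disabling the feature via set_evil_bit has in the generated code: every returned value is then guaranteed to be one of the good ones.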
All the random choices taken by the generator are made by calling the rand_int() method.
long long rand_int(unsigned long long x, std::function<long long (unsigned char*)> parse);
When running the program as a generator, this method samples an integer from 0 to x-1 by reading bytes from the random buffer. When running the program as a parser, this method uses the parse() function to find out which random bytes must be present in the random buffer in order to generate the target file, and then writes those bytes to the random buffer. The parse function receives as an argument the buffer at the current position of the file and must then return which value would need to be returned by the current call to rand_int() in order to generate this exact file configuration.
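To make the two modes of rand_int() more concrete, here is a simplified C++ sketch. It is not the actual implementation: the struct DecisionSource, its member names, and the encoding of one byte per choice (reduced modulo x) are assumptions made for illustration; only the signature shape and the generator/parser behavior follow the description above.

```cpp
#include <cstddef>
#include <functional>
#include <stdexcept>
#include <vector>

// Hypothetical sketch of the two modes of rand_int(). All names and the
// one-byte-per-choice encoding are assumptions, not FormatFuzzer's code.
struct DecisionSource {
    std::vector<unsigned char> rand_buffer;  // the "random buffer"
    std::size_t rand_pos = 0;                // next byte to read or write
    bool generating = true;                  // false: parsing an existing file
    unsigned char* file_pos = nullptr;       // current position in parsed file

    long long rand_int(unsigned long long x,
                       const std::function<long long(unsigned char*)>& parse) {
        if (x == 0 || x > 256)
            throw std::invalid_argument("sketch supports 1..256 alternatives");
        if (generating) {
            // Generator: derive a value in 0..x-1 from the next random byte.
            if (rand_pos >= rand_buffer.size())
                throw std::out_of_range("random buffer exhausted");
            return static_cast<long long>(rand_buffer[rand_pos++] % x);
        }
        // Parser: ask parse() which value would reproduce the file, then
        // record a byte that yields this value when generating later.
        long long value = parse(file_pos);
        if (value < 0 || static_cast<unsigned long long>(value) >= x)
            throw std::domain_error("parse() returned a value outside 0..x-1");
        rand_buffer.push_back(static_cast<unsigned char>(value));
        ++rand_pos;
        return value;
    }
};
```

In this sketch, copying the rand_buffer filled during parsing into a fresh DecisionSource in generating mode replays the same sequence of choices, which mirrors the round trip between parse --decisions and fuzz --decisions.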
FormatFuzzer was designed and written by Rafael Dutra <[email protected]>.
The concept of a fuzzer compiler was introduced by Rahul Gopinath <[email protected]> and Andreas Zeller <[email protected]>.
FormatFuzzer is Copyright © 2020, 2021 by CISPA Helmholtz Center for Information Security. The following licenses apply:
The FormatFuzzer code (notably, all C++ code and code related to its generation) is subject to the GNU GENERAL PUBLIC LICENSE, as found in COPYING.
As an exception to the above, C++ code generated by FormatFuzzer (i.e., fuzzers and parsers for specific formats) is in the public domain.
The original pfp code, which FormatFuzzer is based upon, is subject to an MIT license, as found in LICENSE-pfp.
Original repository: https://github.com/uds-se/FormatFuzzer