Fuzzer Project Overview
Overview
This is an individual project.
One of the most helpful tools that a security-minded software developer can have is a fuzz-testing tool, or a fuzzer. A fuzzer is a type of exploratory testing tool used for finding weaknesses in a program by scanning its attack surface.
The best fuzzers are highly customizable, so generalized fuzzers are often quite complex to configure and use, and can become out-of-date quickly. Fortunately, we’re software engineers, so we’ll build a fuzzer that can be customized to a specific web application rapidly.
Programming Language
You have two choices of programming language: Ruby or Python. You may choose the specific packages to use; research them and find ones that fit your needs (there are many for both languages). Successful projects have used Ruby with Mechanize, or Python with the Requests package or MechanicalSoup. If you are using Python, past students strongly recommend Python 3 due to some inconsistencies in the 2.7 versions of packages (i.e. use “python3” and “pip3” to run, since Python 2 is long dead).
Think of these libraries as a GUI-less browser: they can simulate everything that a browser does, but programmatically. They make HTTP requests, parse HTML, and a lot more. In particular, they will:
- Handle the HTTP protocol, so you don’t have to worry about networking details
- Parse HTML, so you can find inputs via relatively simple XPath queries or something similar
- Simulate user actions, such as clicking links and submitting forms
Your code will be tested against a modified version of the DVWA we provided in the web application activity. We recommend you use DVWA as your test bed, but make sure your code is general enough to work on any website.
NOTE: The root URL for the CI image is slightly different from what you most likely installed on your local machine. Specifically, the root URL to get to DVWA is http://localhost when using the CI image (not http://localhost/dvwa). This difference is intentional, and your fuzzer should be able to handle that flexibility, since the URL is a command line parameter.
Starter Examples
This example demonstrates a quick script for getting the links from our course web page using Ruby and the Mechanize gem:
require 'mechanize' # you'll need Mechanize installed (gem install mechanize)
# This is some example code that just grabs all the links on a page
# More docs can be found here: http://docs.seattlerb.org/mechanize
agent = Mechanize.new
url = 'http://www.se.rit.edu/~swen-331'
puts "Visiting #{url}"
agent.get(url).links.each do |link|
  puts "Link: #{link.uri}"
end
# Are you getting this error?
# `<class:Persistent>': uninitialized constant Process::RLIMIT_NOFILE (NameError)
# On Windows there's a known issue at the moment (https://github.com/sparklemotion/mechanize/issues/529)
# You can read about the workaround there
Here is some Python that uses MechanicalSoup:
import mechanicalsoup
# Connect to our course website
browser = mechanicalsoup.StatefulBrowser(user_agent='MechanicalSoup')
browser.open("http://www.se.rit.edu/~swen-331")
# Find all links using the CSS selector
for link in browser.page.select('a'):
    print(link.text)
Command-Line interaction
Your fuzzer must run from the command line. We recommend using a command line parsing library, such as Python’s argparse or optparse, or Ruby’s OptionParser (or any library you wish to add).
Depending on your language, your exact command might vary (e.g. python3 fuzz.py or ruby fuzz.rb), but the basic structure should follow this manpage:
fuzz [discover | test] url OPTIONS
COMMANDS:
discover Output a comprehensive, human-readable list of all discovered inputs to the system. Techniques include both crawling and guessing.
test Discover all inputs, then attempt a list of exploit vectors on those inputs. Report anomalies that could be vulnerabilities.
OPTIONS:
Options can be given in any order.
--custom-auth=string Signal that the fuzzer should use hard-coded authentication for a specific application (e.g. dvwa).
Discover options:
--common-words=file Newline-delimited file of common words to be used in page guessing. Required.
--extensions=file Newline-delimited file of path extensions, e.g. ".php". Optional. Defaults to ".php" and the empty string if not specified.
Test options:
--common-words=file Same option as in discover - see above.
--extensions=file Same option as in discover - see above.
--vectors=file Newline-delimited file of common exploits to vulnerabilities. Required.
--sanitized-chars=file Newline-delimited file of characters that should be sanitized from inputs. Optional. Defaults to just < and >.
--sensitive=file Newline-delimited file of data that should never be leaked. It's assumed that this data is in the application's database (e.g. test data), but is not reported in any response. Required.
--slow=500 Number of milliseconds after which a response is considered "slow". Optional. Default is 500 milliseconds.
Example invocations:
# Discover inputs, default extensions, no login
fuzz discover http://localhost:8080 --common-words=mywords.txt
# Discover inputs to DVWA using our hard-coded authentication, port 8080
fuzz discover http://localhost:8080 --custom-auth=dvwa --extensions=extensions.txt --common-words=mywords.txt
# Discover and test DVWA, port 8000, using the defaults for extensions, sanitized characters, and slow threshold
fuzz test http://localhost:8000 --custom-auth=dvwa --common-words=words.txt --vectors=vectors.txt --sensitive=creditcards.txt
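The manpage above could map onto Python’s argparse roughly as follows. This is a minimal sketch, not a required design: the helper names (build_parser, parse_args, read_lines) and the DEFAULT_* constants are hypothetical, and the "Required." checks are done by hand so that --vectors and --sensitive are only demanded for the test command.

```python
import argparse
from typing import List

DEFAULT_EXTENSIONS = [".php", ""]     # manpage default when --extensions is omitted
DEFAULT_SANITIZED_CHARS = ["<", ">"]  # manpage default when --sanitized-chars is omitted


def build_parser() -> argparse.ArgumentParser:
    """Mirror `fuzz [discover | test] url OPTIONS`; options may appear in any order."""
    parser = argparse.ArgumentParser(prog="fuzz")
    parser.add_argument("command", choices=["discover", "test"])
    parser.add_argument("url", help="root URL of the web application")
    parser.add_argument("--custom-auth", help="hard-coded authentication to use, e.g. dvwa")
    parser.add_argument("--common-words", help="file of words for page guessing")
    parser.add_argument("--extensions", help="file of path extensions")
    parser.add_argument("--vectors", help="file of fuzz vectors (test only)")
    parser.add_argument("--sanitized-chars", help="file of characters to sanitize (test only)")
    parser.add_argument("--sensitive", help="file of data that must never leak (test only)")
    parser.add_argument("--slow", type=int, default=500,
                        help="milliseconds after which a response is slow (test only)")
    return parser


def parse_args(argv=None) -> argparse.Namespace:
    parser = build_parser()
    args = parser.parse_args(argv)
    # Enforce the options the manpage marks as "Required." for each command.
    required = ["common_words"]
    if args.command == "test":
        required += ["vectors", "sensitive"]
    missing = [name for name in required if getattr(args, name) is None]
    if missing:
        # parser.error prints a usage message and exits with status 2
        parser.error("missing required option(s): " +
                     ", ".join("--" + name.replace("_", "-") for name in missing))
    return args


def read_lines(path: str) -> List[str]:
    """Read a newline-delimited file, dropping line endings and blank lines."""
    with open(path, encoding="utf-8") as f:
        return [line.rstrip("\r\n") for line in f if line.strip()]
```

Note that the sample CI commands for Part 0 run discover with only --custom-auth=dvwa, so you may prefer to relax the --common-words check until Part 1.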
Example Output
Your output should be human readable. Think of it like a build report you might get in an email that you can review from time to time. It should be detailed enough that you can look into potential vulnerabilities, but readable enough that you’re not wading through raw HTTP output and stack traces.
An example of well-formatted output for the discover command can be found here. (You do not need to match this format exactly.) An example of good output from test can be found here. Note that DVWA and our requirements evolve over time, so these exact outputs may be slightly different from yours; these are just examples of how to format your outputs, not oracles for expected results.
Test Environments
DVWA. Use the DVWA download (the zip file) from our Web Application Activity as your main test environment.
fuzzer-tests. Additionally, we have created a set of simpler test cases at the /fuzzer-tests URL in the DVWA zip file and on the Docker image. You can find the PHP files in the zip file under /htdocs/fuzzer-tests. Your fuzzer should be able to find the following:
…during fuzz discover:
- The main page index.php has two inputs, one called calzone (discoverable via form) and another called message (discoverable via URL parameter parsing).
- There is a page called valid.php
- There is a page called timeout.php
- There is a page called admin.php, not linked from anywhere; it has an input called company (discoverable via form OR URL parsing).
- There is a page called sensitive.php, linked only from admin.php
- There is NOT a page called “CioffisTheBest.html”. That link is dead.
- Your fuzzer should NOT go into an infinite loop.
- Your fuzzer should NOT go to http://se.rit.edu
…during fuzz test:
- index.php, input calzone lacks sanitization
- index.php, input message lacks sanitization
- valid.php has no inputs
- timeout.php has no inputs, but it has a long delay
- admin.php, input company lacks sanitization
- sensitive.php has no inputs, but leaks sensitive data of 123-45-6789
- timeout.php takes at least 2 seconds to load
These test cases are maintained over at this GitHub project. Pull requests welcome!
Submission Instructions
You must use RIT’s installation of GitLab for this project. By a pre-determined date (given by your instructor), please do the following:
- Go to https://git.gccis.rit.edu/ (NOT kgcoe-git and NOT gitlab.com!!)
- Use your RIT login to sign in (“LDAP Login”).
- Create a new project; be sure to have the word “fuzzer” in the title. Set Visibility to Private (please do not share your code, even after this class has finished).
- Now add the instructor and CA to your project:
- On the project page use the Manage > Members menu option.
- The system will display the Project members page. Click on the Invite members button (upper-right).
- In the Invite members dialog: (a) enter the instructor’s name and select the appropriate user, (b) select Reporter in the Role field and (c) click on the Invite button.
- Follow the same instructions to add your section’s Course Assistant.
You are required to push your code to this repository by the deadline. At each deadline, we will automatically pull the code and grade that. You do not need a separate repository for each release; just keep working on the same repository for the entire fuzzer project. Each time you finish a submission, create a tag in the master branch for that submission (see details below).
Please include a file called .gitlab-ci.yml (note the dot at the beginning of the file name) in the root of your repository. Here is the base file you should use; adapt it to your configuration. Note that YAML files do not allow tabs for indentation and are finicky about the number of spaces.
image:
  name: andymeneely/swen331fuzzer # don't change this
  entrypoint: [""] # don't change this

before_script:
  # don't change these either
  - chown -R mysql:mysql /var/lib/mysql /var/run/mysqld
  - echo '[+] Starting mysql...'
  - service mysql start
  - echo '[+] Starting apache'
  - service apache2 start

fuzzrunner:
  script:
    # here is where you can write your commands to run your fuzzer or any custom setup commands
    - echo "hello class"
    # need some example files for vectors and words? These are on the image
    - cat /words.txt
    - cat /vectors.txt
    - cat /badchars.txt
    # An example fuzzer command. Note the url is DIFFERENT than XAMPP example (no /dvwa).
    # Remove whatever you need to.
    - ruby fuzz.rb discover http://localhost/ --custom-auth=dvwa
    - ruby fuzz.rb discover http://127.0.0.1/fuzzer-tests
    - python3 fuzz.py discover http://localhost/ --custom-auth=dvwa
    - python3 fuzz.py discover http://127.0.0.1/fuzzer-tests
  stage: test
This is a continuous integration configuration file. For every commit you push to the repository, your fuzzer will be run against DVWA installed in a clean environment. To see the output, go to GitLab and find your build on the “Pipelines” page. You are strongly encouraged to keep an eye on this output as you work, to make sure your code is working as expected.
Your application should be easy to use from a customer’s perspective. Some notes about your submission:
- For grading, fuzzers will first be run against fuzzer-tests and DVWA on our Docker image. The Docker image has DVWA installed, much like the XAMPP installation from the activity. You must make sure your code runs on the CI successfully! Please note that the version of Python installed on these machines may differ from yours, so you should make sure your application works in these environments.
- Along with functionality, your applications and instructions should be clear and easy to understand and use. Imagine that you are releasing the application to a customer who is only moderately technical. Instructions should be clear, concise and unambiguous. Write a clear README.txt file that includes specific commands like gem install and pip3 install.
- If need be, you may modify your GitLab CI file to do any custom installs. For example, MechanicalSoup is not installed by default in our image, so you may need to add pip3 install MechanicalSoup to your before_script clause (note that it’s pip3, not pip).
Part 0: --custom-auth and CLI
For this initial part, you will need to implement the --custom-auth feature to log into DVWA. Implicit in this is command line parsing of discover and --custom-auth=dvwa (Note: you can use built-in command line parsing libraries like Ruby’s optparse and Python’s argparse [or parse the arguments yourself]).
This fuzzer should work on any web application. But when --custom-auth=dvwa is given, your fuzzer will know the locations of the DVWA setup and login pages (i.e. these are partially hardcoded). With custom authentication turned off, the fuzzer should just crawl the exterior of the web app (and perhaps get lucky if the vector list contains a password).
For the DVWA custom auth, your fuzzer will need to do the following sequence of operations automatically:
- Go to {URL}/setup.php, where {URL} is the given URL from the command line that points to a DVWA instance
- “Click” on the Create / Reset Database button (i.e. submit the form)
- Go to {URL}, and it will forward you to the login page
- Enter “admin” and “password”
- “Click” Login (i.e. submit the form)
- Go to the DVWA Security page ({URL}/security.php)
- Select “Low” and submit the form
- Begin your fuzzing operations. (See rest of instructions)
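One way to automate these steps is sketched below using only Python’s standard library (urllib, http.cookiejar, html.parser) as a stand-in for Mechanize or MechanicalSoup. It assumes stock DVWA markup: that the first form on setup.php, login.php and security.php is the one to submit, that the login fields are named username and password, and that the security level is a select named security. The names FormParser, Session, base_url and dvwa_login are hypothetical. Keeping each form's default fields means hidden values (such as a CSRF token, if the form has one) are sent back automatically, and base_url adds a trailing slash because urljoin("http://localhost/dvwa", "setup.php") would otherwise drop the /dvwa folder.

```python
import http.cookiejar
import urllib.parse
import urllib.request
from html.parser import HTMLParser


class FormParser(HTMLParser):
    """Collect each <form>'s action, method and default field values."""

    def __init__(self):
        super().__init__()
        self.forms = []
        self._select = None  # name of the <select> currently open, if any

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "form":
            self.forms.append({"action": attrs.get("action") or "",
                               "method": (attrs.get("method") or "get").lower(),
                               "fields": {}})
        elif not self.forms:
            return  # ignore fields that sit outside any form
        elif tag in ("input", "textarea") and attrs.get("name"):
            # hidden fields and named submit buttons keep their default values
            # (simplification: checkboxes/radios are not treated specially)
            self.forms[-1]["fields"][attrs["name"]] = attrs.get("value") or ""
        elif tag == "select" and attrs.get("name"):
            self._select = attrs["name"]
            self.forms[-1]["fields"][self._select] = ""
        elif tag == "option" and self._select:
            fields = self.forms[-1]["fields"]
            # the first option is the default unless a later one is marked selected
            if not fields[self._select] or "selected" in attrs:
                fields[self._select] = attrs.get("value") or ""

    def handle_endtag(self, tag):
        if tag == "select":
            self._select = None


def base_url(url):
    """Add a trailing '/' so relative joins keep sub-folders such as /dvwa."""
    return url if url.endswith("/") else url + "/"


class Session:
    """Minimal cookie-keeping 'browser' built on urllib."""

    def __init__(self):
        self.cookies = http.cookiejar.CookieJar()
        self.opener = urllib.request.build_opener(
            urllib.request.HTTPCookieProcessor(self.cookies))

    def open(self, url, data=None):
        """GET url, or POST when data (bytes) is given; return (final URL, HTML)."""
        with self.opener.open(url, data) as resp:
            return resp.geturl(), resp.read().decode("utf-8", "replace")

    def submit_first_form(self, page_url, page_html, overrides):
        """Submit the page's first form, keeping its defaults and applying overrides."""
        parser = FormParser()
        parser.feed(page_html)
        form = parser.forms[0]
        data = urllib.parse.urlencode({**form["fields"], **overrides})
        target = urllib.parse.urljoin(page_url, form["action"])
        if form["method"] == "post":
            return self.open(target, data.encode())
        return self.open(target + "?" + data)


def dvwa_login(root):
    """Run the DVWA custom-auth steps; field names assume a stock DVWA install."""
    root = base_url(root)
    session = Session()
    url, html = session.open(root + "setup.php")
    session.submit_first_form(url, html, {})  # "Create / Reset Database"
    url, html = session.open(root)  # DVWA redirects to login.php
    session.submit_first_form(url, html, {"username": "admin", "password": "password"})
    url, html = session.open(root + "security.php")
    session.submit_first_form(url, html, {"security": "low"})
    return session
```

For the Part 0 check, something like print(dvwa_login(url).open(base_url(url))[1]) would print the home page HTML. The session's cookie jar (session.cookies) also holds the cookies the application set, which Part 1 treats as inputs.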
You cannot assume that DVWA will be installed on a specific server, in a specific folder, or on a specific port. For example, http://example.com:1234/foo/dvwa is a valid URL we might try.
Be sure to document in your README any setup you need. If your code does not work on the CI, the grader will be running this locally. Any confusion in that setup might hurt your grade.
To demonstrate that you are logged in and security is correctly set, go back to the home page and print the HTML of the DVWA home page to stdout. Remove this output after submitting Part 0.
When you are done with your implementation (i.e. it is working, and ready to grade), push your code to gitlab (make sure it works in the CI!), and create a Tag called “Fuzzer-Part0”
Part 1: fuzz discover
On the discovery side, your fuzzer will need to discover as many potential inputs to the system as possible. It will need to do the following:
- Page discovery. The fuzzer must crawl and guess pages.
  - Page Guessing. The fuzzer should use the common word list to discover potentially unlinked pages. Attempt every combination of word and extension (e.g. admin.php, admin.jsp). The lists of words and extensions are in text files referred to in your command line arguments. Your guessing may limit itself to the root of the given URL; you do NOT need to guess on every existing page (although a real fuzzer should!). For example, if the given URL was http://localhost/foo, and your word list has bar and baz, then your fuzzer should guess http://localhost/foo/bar and http://localhost/foo/baz (and you do NOT need to construct other permutations like http://localhost/foo/bar/baz or http://localhost/bar).
  - Link Crawling. Starting from the initial URL given, and from any page guessed, the fuzzer should discover pages on the site by finding links and visiting them (i.e. “crawling”). Keep a list of URLs that your fuzzer confirms exist. Do not follow any links off-site in your crawl. Do not go into an infinite loop. Beware of logout links: you may amend your custom auth feature to be aware of what DVWA’s logout page is and skip crawling it.
- Input discovery. Given a page, the fuzzer should attempt to discover every possible input into the system.
  - Parse URLs. The fuzzer should be able to take a URL and parse it down to manipulate its input and recognize which page a URL goes to. For example, http://localhost/index.jsp?something=a is the same page as http://localhost/index.jsp?anotherthing=b, and there are two inputs that can be fuzzed (something and anotherthing).
  - Form parameters. All input fields to forms should be considered inputs.
  - Cookies. Cookies are values that the application writes to the browser, then reads from later. Since the application reads this data from the browser, cookies are also considered inputs. (e.g. DVWA uses a cookie to set High/Medium/Low security)
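The page-discovery and URL-parsing requirements above could be combined into a small breadth-first crawler like the sketch below. It is illustrative only: guess_urls, page_key, url_inputs and crawl are hypothetical names, fetch is any function you supply that returns a page's HTML or None for a missing page (e.g. a 404), and skipping any URL containing "logout" is an assumed way of protecting the logged-in session. Pages are keyed by their URL without the query string, so index.jsp?something=a and index.jsp?anotherthing=b count as one page with two inputs; tracking visited keys is also what prevents infinite loops. Form parameters and cookies are left out to keep it short.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import parse_qs, urljoin, urlparse, urlunparse


class LinkParser(HTMLParser):
    """Collect the href of every <a> tag."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        href = dict(attrs).get("href")
        if tag == "a" and href:
            self.links.append(href)


def guess_urls(root, words, extensions):
    """Every word+extension combination, guessed only at the root of the given URL."""
    root = root if root.endswith("/") else root + "/"
    return [root + word + ext for word in words for ext in extensions]


def page_key(url):
    """Identify a page by its URL without the query string or fragment."""
    return urlunparse(urlparse(url)._replace(query="", fragment=""))


def url_inputs(url):
    """Query-string parameter names, e.g. ...?something=a -> ['something']."""
    return list(parse_qs(urlparse(url).query, keep_blank_values=True))


def crawl(root, seeds, fetch, skip=("logout",)):
    """Breadth-first crawl that stays on root's host and fetches each page once.

    fetch(url) is supplied by the caller: it returns the page's HTML, or None
    when the page does not exist. Returns {page URL: set of URL input names}.
    """
    host = urlparse(root).netloc
    pages = {}
    visited = set()
    queue = deque(seeds)
    while queue:
        url = queue.popleft()
        if urlparse(url).netloc != host or any(word in url for word in skip):
            continue  # off-site link, or a page (like logout) that would end the session
        key = page_key(url)
        if key in visited:
            if key in pages:  # same page, possibly with new query parameters
                pages[key].update(url_inputs(url))
            continue
        visited.add(key)
        html = fetch(url)
        if html is None:
            continue  # dead link or unsuccessful guess
        pages[key] = set(url_inputs(url))
        parser = LinkParser()
        parser.feed(html)
        queue.extend(urljoin(url, href) for href in parser.links)
    return pages
```

Seeding the queue with both the root URL and every guessed URL means guessed pages are crawled too, e.g. crawl(root, [root] + guess_urls(root, words, extensions), fetch).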
When you are done with your implementation (i.e. it is working, and ready to grade), push your code to gitlab (make sure it works in the CI!), and create a Git tag called “Fuzzer-Part1”
Part 2: fuzz test
Once you’re done with input discovery, it’s time to test. Testing has two parts: trying vectors, and then determining if the outcome was out of the ordinary.
To conduct your test, you must use an external list of fuzz vectors. These are strings of common exploits to vulnerabilities. These lists can be found all over the internet. This list at OWASP is a fine place to start. TIP: when developing, keep this list short and targeted. You’re fuzzing applications you know are vulnerable, so unnecessary vectors can slow your Edit-Compile-Test cycle.
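For inputs discovered in the query string, one simple strategy is to substitute each vector into one parameter at a time, so any anomaly can be traced back to a specific input. The sketch below uses a hypothetical fuzzed_urls helper; form fields could be handled the same way by substituting into the POST data instead of the URL.

```python
from urllib.parse import parse_qsl, urlencode, urlparse, urlunparse


def fuzzed_urls(url, vectors):
    """Yield (parameter, vector, URL) with exactly one query parameter replaced by one vector."""
    parts = urlparse(url)
    params = parse_qsl(parts.query, keep_blank_values=True)
    for i, (name, _original) in enumerate(params):
        for vector in vectors:
            # keep every other parameter at its original value
            fuzzed = params[:i] + [(name, vector)] + params[i + 1:]
            yield name, vector, urlunparse(parts._replace(query=urlencode(fuzzed)))
```

urlencode percent-encodes each vector, so characters like < and > reach the server intact instead of breaking the URL.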
Upon sending in the vector, you’ll need to examine the response to see if the page may have a vulnerability. Here are some signs to look for, and you may think of more.
- Lack of sanitization. Given inputs with data that should be sanitized, the fuzzer should report whether or not the data was actually sanitized. Your fuzzer should default to < and > if a newline-delimited list is not specified by the --sanitized-chars command line option.
  - For example, your fuzzer could try a string like foobar<foobar. If the resulting page had sanitization, then that string would show up as foobar&lt;foobar. But if the original, dangerous string foobar<foobar is still in the resulting page (i.e. the dangerous character was not sanitized, and you know it’s not a false positive because foobar is not a string in the application normally), then you know the page had a lack of sanitization.
- Sensitive data leaked. As a tester, you might have some test data that you know is sensitive and should never be leaked. The system should contain a list of sensitive data, and should check each request and response to see whether that data has been disclosed. This list is in an external text file provided on the command line (--sensitive=file). It ought to include technical words that users should never see, like “MySQL”, as well as personal information like a social security number. Add “MySQL” and “123-45-6789” (without the quotes) to your sensitive data list, as well as anything you feel you need for your testing.
- Delayed response. If the response takes longer than a pre-defined threshold (set with --slow), then there is a potential denial-of-service vulnerability.
- HTTP response codes. If the HTTP response code is not OK (i.e. 200), then something went wrong. Report it. This can happen on broken links, authorization problems, and so on. Translate those HTTP codes into human-readable descriptions.
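The four checks above could be collected into one function that turns a response into human-readable findings, as in this sketch. The names sanitization_probe, timed and check_response are hypothetical; the probe wraps each character in foobar as in the example above, slow_ms corresponds to --slow, and http.HTTPStatus supplies the human-readable reason phrase for non-200 codes. It only inspects the response body, so checking requests for sensitive data would be a separate step.

```python
import time
from http import HTTPStatus


def sanitization_probe(char):
    """Wrap a dangerous character in a marker assumed not to occur in the app normally."""
    return "foobar" + char + "foobar"


def timed(fetch, url):
    """Call fetch(url) and return (result, elapsed milliseconds)."""
    start = time.perf_counter()
    result = fetch(url)
    return result, (time.perf_counter() - start) * 1000


def check_response(body, status, elapsed_ms, sanitized_chars, sensitive, slow_ms=500):
    """Return human-readable findings for one response to a fuzzed request."""
    findings = []
    for char in sanitized_chars:
        # a sanitized page would contain e.g. foobar&lt;foobar instead of the raw probe
        if sanitization_probe(char) in body:
            findings.append(f"Lack of sanitization: {char!r} came back unescaped")
    for secret in sensitive:
        if secret in body:
            findings.append(f"Sensitive data leaked: {secret!r}")
    if elapsed_ms > slow_ms:
        findings.append(f"Slow response: {elapsed_ms:.0f} ms (threshold {slow_ms} ms)")
    if status != HTTPStatus.OK:
        try:
            reason = HTTPStatus(status).phrase
        except ValueError:  # a code the standard library does not know
            reason = "Unknown status"
        findings.append(f"HTTP {status} {reason}")
    return findings
```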
When you are done with your implementation (i.e. it is working, and ready to grade), push your code to gitlab (make sure it works in the CI!), and create a Git tag called Fuzzer-Part2
Grading
Given that we are fuzzing known vulnerable applications, we will run your fuzzer to ensure that it reports errors on those vulnerabilities. However, your fuzzer should not hardcode specific vulnerabilities in specific applications - the point of this exercise is to make your fuzzer discover potential vulnerabilities. The grading breakdown is as follows:
- (25 pts) --custom-auth=dvwa
- (45 pts) fuzz discover
- (30 pts) fuzz test
- Each deliverable must include simple and clear setup steps via README
- Each deliverable must include meaningful, easy to read output
Specifically, we will be assigning points to these categories:
Part 0 (25 pts)
- Shared Fuzzer in gitlab: 1
- On Time: 3
- Runs on CI: 2
- Correct DB Init: 2
- Custom Auth Implemented: 4
- Authenticate into DVWA: 4
- Readme: 3
- Print homepage: 3
- Clean code: 3
Part 1 (45 pts)
- On Time: 2
- Runs on CI: 3
- Link Discovery: 4
- No External Links: 2
- Page Guessing: 4
- Found Pages: 3
- Common Words list: 2
- Guess Pages Crawled: 2
- Parse URLs: 4
- Duplicate URLs handled correctly: 2
- Form Parameters: 4
- Cookies: 4
- Readme: 3
- Output: 3
- Clean code: 3
Part 2 (30 pts)
- On Time: 2
- Runs on CI: 2
- Tests with vectors: 5
- Tests with sanitized-chars: 4
- Tests with sensitive data leaks: 4
- Use parameter for slow response: 2
- Reports non-200 response codes: 3
- Readme: 2
- Clean/readable output: 3
- Clean code: 3
Submission
Submit your source code via GitLab. We will pull from the master branch at the deadline (if you need us to pull from a different branch, let us know when you make your submission). Write up a basic README on getting the code to run in other environments.