Hacker News
Show HN: Pfuzz, a web fuzzer following the Unix philosophy (github.com/codesoap)
93 points by codesoap 12 months ago | 26 comments
I recently dipped my toes into bug bounty hunting and finding security flaws in web applications. As a friend of UNIX shells I was building a repertoire of command line tools to make and analyze HTTP requests. Fortunately there are already many suitable tools like curl, jq, different fuzzers and some really nice tools for specific tasks by Tom Hudson [1].

However, I disliked that the existing fuzzers were monoliths where I had no easy way of creating custom behavior or analyses. They commonly do a multitude of things: creating multiple requests using one or more wordlists, sending the requests (possibly with rate limiting), displaying progress, applying filters to the received responses, and storing the output. If you want something different from the offered features, for example custom delays between requests or a new filter for the responses, your only option is to dig into a moderately large code base and try to adapt it to your needs.

I am a fan of the UNIX philosophy and felt like it could help out here. If there was a common format for communicating HTTP requests and responses, an ecosystem of small, specialized tools could use it to work together and fulfill tasks like fuzzing, while allowing the user to easily create custom behavior by combining the existing tools in different ways or adding small, quick-to-write tools to the ecosystem.

This is what I've attempted with the httpipe format [2]. It is a line-based JSON format for exchanging HTTP requests and responses. I have also built a first set of tools using this format, namely pfuzz [3] for creating HTTP requests from wordlists, preq [4] for sending HTTP requests and receiving their responses, and hpstat [5] for filtering the responses by their HTTP status codes. Since it's a line-based format, many UNIX tools can be used with it as well, and since each line is JSON, jq can also be used for manipulation, filtering and displaying.
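For illustration, a single httpipe record is one JSON object per line (the field names here are taken from the examples later in this thread; the authoritative field set is in [2]), and any JSON-aware tool can operate on it directly:

```shell
# one httpipe record per line; field names as shown in the httpipe examples
line='{"host":"test.net","port":443,"tls":true,"req":"GET /src HTTP/1.1\r\nHost: test.net\r\n\r\n"}'

# standard tools compose naturally; e.g. extract the host with jq
printf '%s\n' "$line" | jq -r .host   # prints: test.net
```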

[1] https://github.com/tomnomnom

[2] https://github.com/codesoap/httpipe

[3] https://github.com/codesoap/pfuzz

[4] https://github.com/codesoap/preq

[5] https://github.com/codesoap/hpstat




This may be a stupid question, but what makes this a fuzzer exactly?

I think I would call this a formatter, as it doesn't seem to do any mutations or executions itself. I would call AFL a fuzzer.

I would think that, at a minimum, a fuzzer would include the mutation aspect, because to me that's what makes fuzzers uniquely useful.

Curious to hear your thoughts, or if mutations are a planned feature I missed.


It seems to me like "fuzzing" has a different meaning in web application penetration testing. Here, "fuzzer" is a term for tools that just generate different requests using wordlists, without adding any mutations. For example, the two popular tools ffuf [1] and wfuzz [2] also call themselves fuzzers.

I see how reusing a term for a different concept is bothersome, but I feel like "fuzzer" is the term that people learning about bug bounty hunting are familiar with.

[1] https://github.com/ffuf/ffuf

[2] https://wfuzz.readthedocs.io/en/latest/


Yeah, this is generally what people mean by "web fuzzing".


Does anyone know of a web-fuzzer-like tool that mutates structures like JSON or file uploads while it tests?

So things that look like reasonably complex HTTP requests but have deficiencies or small variations?

The last few API outages we had in my group were due to JSON payload edge-cases (either malformed or incorrectly structured) that weren't caught by what we thought were pretty extensive E2E tests and validation.


You can use radamsa [1] to create mutations for JSON payloads. There's an example using it with ffuf here: https://github.com/ffuf/ffuf?tab=readme-ov-file#using-extern...

[1]: https://gitlab.com/akihe/radamsa


Thank you so much! Just finished doing a few tests with both of those tools, and it looks like they will be very helpful.


I'm working on a fuzzer for JSON blobs as a side project; more work is needed before the first public release, but my email's in the profile if you're curious.


Doesn’t Wireshark have a format for storing requests/responses? That seems a fair standard to lean on.

I like the idea - I think I have at least two formats for storing expected requests/responses, and probably more.

But standardising - as in not just something my ball-of-twine tools use, but something everyone uses - is great.

I just think it already exists?


To be frank, I hadn't even considered looking at the file format of Wireshark when thinking of existing file formats that I could reuse. I've now taken a brief look and it seems like Wireshark supports quite a lot of different formats [1], but the preferred one seems to be PcapNG [2]. At first glance, there are several attributes that make them less suitable for my purposes:

1. PcapNG, as well as the other file formats, looks like it stores packets, which is a lower level than HTTP requests and unnecessarily verbose for my intended purposes.

2. They are binary formats, which makes them less suitable for printing to stdout. This also means that they are not line-based, so UNIX tools like grep cannot be used effectively.

3. They are not designed for streaming. The httpipe format is line-based and contains no header/global fields. Thus it is trivial to, for example, build a filtering program: it would just read one line at a time and print it again if it matches the filter criteria; the output would automatically be valid httpipe again.

4. Lastly, parsing and composing JSON is something most developers have done before and basically every programming language has libraries for it. This makes it easy for the ecosystem to grow and enables users to build custom tools without too much initial effort.
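As a sketch of point 3, a streaming httpipe filter can be as simple as a shell loop that re-emits matching lines (the "host" field name is taken from the httpipe examples in this thread; a robust version would use jq's select() instead of a substring match):

```shell
# minimal streaming filter: read one httpipe line at a time and
# re-emit it only if it targets a given host; the output is valid
# httpipe again, so further tools can be chained after it
filter_host() {
  while IFS= read -r line; do
    case $line in
      *'"host":"test.net"'*) printf '%s\n' "$line" ;;
    esac
  done
}
```

Usage: `preq < requests.jsonl | filter_host | ...` (preq invocation illustrative).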

[1] https://wiki.wireshark.org/FileFormatReference

[2] https://pcapng.com/


Fair enough. This does seem like one of those things that’s so simple that no-one thinks “oh, I will grab the library to work with it”. That’s kind of a good thing; I mean, JSON is so simple, but I still use a library for reading and writing it.


I implemented my own packet capture stuff for Wireshark; the format is pretty straightforward. The use case was different and I reimplemented the rpcapd protocol, but the packets themselves are easy to dump, assuming you have access to the raw packet information (Ethernet headers and whatnot). You can of course also synthesize that information if needed.


If you are interested in the Unix philosophy and doing web stuff, then perhaps Node-RED [0] might be something for you.

It's a visual Unix-pipes environment and has great support for all things web. The advantage is constructing pipelines visually rather than via the command line.

Node-RED is Node.js based, which unfortunately might make it harder to integrate Go code, but it does support executing command line tools using an exec node.

[0] https://nodered.org


Just wondering about httpipe - why do you split the host / port / tls config:

  {"host":"test.net","port":443,"tls":true,"req":"GET /src HTTP/1.1\r\nHost:

As opposed to something like this?

  {"base": "https://test.net","req":"GET /src HTTP/1.1\r\nHost:


It was a tradeoff. The solution with "base" is more compact and arguably easier to read. On the other hand, it's easier to filter or manipulate httpipe if the fields are separated. For example, to select all requests to a certain host, independent of port or protocol, one could just use this:

    jq -c 'select(.host == "test.net")' my_stored_results.jsonl

The port of some stored requests can be swapped out like this:

    jq -c '.port = 8080' my_stored_requests.jsonl

In other words: I chose to sacrifice compactness for "ease of tinkering".

However, I was also thinking about specifying an optional/informational "url" field or something similar, which also includes the path; "https://test.net/src" for your example. This field would be ignored by tools, but one could more easily copy things over into curl or a browser for further investigations.


Hm yeah, I wondered if it might be something like that. I would say that adding an additional field might be the worst option, as then the info could get out of sync somehow. I'd just pick whichever; if it's the combined one, have some code to explode it into bits, or keep what you have and have some code to combine it, and maybe print the combined one out in a log line.


Good point. Maybe a small 'hp2url' or 'hp2curl' tool, which takes httpipe input and prints URLs or curl commands for each given request, would be a better solution.
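A minimal hp2url could even be a jq one-liner. This is only a sketch: it assumes the field names from the examples above, naively takes the path from the request line, and omits the port when it is the scheme default (none of this is part of any existing tool):

```shell
# hp2url sketch: print a URL for each httpipe request on stdin
hp2url() {
  jq -r '(if .tls then "https" else "http" end) as $scheme
         | $scheme + "://" + .host
           + (if (.tls and .port == 443) or ((.tls | not) and .port == 80)
              then "" else ":" + (.port | tostring) end)
           + (.req | split(" ")[1])'
}
```

For the example above, this would print `https://test.net/src`, ready to paste into curl or a browser.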


Have you considered using the HAR file format used by Chrome DevTools and others for saving network captures?


Only briefly. HAR is extremely verbose, which makes it impractical for storing large amounts of requests and responses. It's also not line-based and not designed for easy streaming/filtering (it has header/config fields), and its more granular structure (e.g. HTTP headers are separate JSON objects) allows less freedom in creating malformed requests, which can be desirable when trying to find bugs in web applications.


Is JSON a good format for storing requests/responses? I mean, it deals only with text... What about streaming, binary data, etc.


I have thought about this a lot, but came to the conclusion that people are most likely to write tools if the format is easy to parse and construct in many programming languages. I think it's hard to find an encoding that meets these criteria better than JSON.

It also has the benefit of being single-lined, since newline characters are escaped, which is necessary for a line-based format. This, in turn, allows the use of many UNIX tools, like grep.
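This is easy to see directly: jq's raw-input mode turns a multi-line HTTP request into a single JSON line, because the CR/LF characters are escaped inside the string.

```shell
# a raw multi-line request becomes one JSON line (\r\n stays escaped)
printf 'GET / HTTP/1.1\r\nHost: example.com\r\n\r\n' | jq -Rsc '{req: .}'
# prints: {"req":"GET / HTTP/1.1\r\nHost: example.com\r\n\r\n"}
```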

However, it certainly is not the most compact format when encoding large amounts of binary data. Gzipping for storage will alleviate the overhead somewhat. Overall, dealing with large amounts of binary data seems like a less common use case in web application testing, so I felt I shouldn't put too much focus on it.


Thanks for the shout-out :)

I like the idea! I'll see if I get a chance to play around with it at work next week.


Wow, didn't expect you to see this, but glad to have you here :-) Don't hesitate to contact me if you have any feedback!


This is great. It lowers the barrier to entry for people who are learning this for the first time, because they can learn each tool one by one, understanding what it actually does, instead of using an all-in-one tool as a kind of magic black box.


Have you found any interesting bugs with it?


I haven't been on the hunt much since I wrote the tools, but the first two bug bounties I ever received were issued for things I found with a web fuzzer. No doubt I would have made that money with pfuzz, if I'd had it back then :-)


Neat! How do I use this to make a webserver running in bash via netcat?



