Show HN: Mitmproxy2swagger – Automagically reverse-engineer REST APIs (github.com/alufers)
691 points by alufers on May 12, 2022 | 85 comments



Wanted to show off my little project which helps with reverse engineering APIs used by various apps. It takes HTTP traffic captured by mitmproxy and generates an OpenAPI specification for a given REST API.

I have used it already on two apps and the results are good enough to write an alternative client or quickly automate some stuff.


mitmproxy dev here, very awesome! :) This seems to be particularly useful to quickly generate clients for reverse-engineered APIs.


Swagger Editor dev who now works at Airbnb here. This is hilarious!


Hilarious indeed! The first thing I thought of with this project is actually AirBnB, because the sort/filter/map view is so terrible and missing features. AirBnB captures data on a bunch of stuff, but doesn't make it possible to search for it in the UI (ever want a property with a lake view or a sauna? AirBnB knows which ones have those things, but they won't let you look for them!)

AirBnB doesn't have an official API but changes the tags so often that scrapers people put up on GitHub go out of date quickly. Now I can run this whenever I want to have actual search functionality (instead of the hobbled crap available on the website) and ensure that whatever flavor of API is available on the website that day is easily queryable!


How will this let you search for a sauna?


Easier to modify requests vs doing it using browser tools. The ability to search for the things I mentioned is actually there, but only via an undocumented url parameter that erases itself every time you pan the map. Doing it via REST calls is much easier than trying to do it in the UI.


What a fantastic idea! I have so many half baked things that some idiot (me) built without documenting the underlying API. This will make life so much easier


This is a really clever project. It seems like an obvious idea once you've seen it, but it clearly isn't. Thank you for sharing it.


This is great :) You can then fuzz your APIs for issues using https://github.com/Endava/cats.


does it capture route/server rendered pages too?


It does, but it will only generate schema descriptions for JSON endpoints. This means that the URL and method will appear in the spec, but not the response/request schema.
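
To illustrate the distinction, here is a minimal sketch of inferring a schema from one captured body while skipping non-JSON content. It is not mitmproxy2swagger's actual implementation; the function names `infer_schema` and `schema_for_body` are hypothetical, and arrays are assumed to be homogeneous.

```python
import json


def infer_schema(value):
    """Return a minimal OpenAPI-style schema dict describing one sample value."""
    if isinstance(value, bool):  # check bool first: bool is a subclass of int
        return {"type": "boolean"}
    if isinstance(value, int):
        return {"type": "integer"}
    if isinstance(value, float):
        return {"type": "number"}
    if isinstance(value, str):
        return {"type": "string"}
    if value is None:
        return {"nullable": True}
    if isinstance(value, list):
        # Assumption: arrays are homogeneous, so only the first item is described.
        return {"type": "array", "items": infer_schema(value[0]) if value else {}}
    if isinstance(value, dict):
        return {
            "type": "object",
            "properties": {key: infer_schema(item) for key, item in value.items()},
        }
    raise TypeError(f"unsupported JSON value: {value!r}")


def schema_for_body(content_type, body):
    """Return a schema for JSON bodies, or None for anything else (e.g. HTML)."""
    if "json" not in content_type.lower():
        return None
    return infer_schema(json.loads(body))
```

With this sketch, a server-rendered page (`text/html`) yields `None`, so only its URL and method could be recorded, while a JSON body yields a nested schema.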


This is really incredible. With a rooted android phone and these tools, plus a couple others [1,2,3], you can get a skeleton to implement a backend for any app you want.

[1]: https://github.com/koxudaxi/fastapi-code-generator

[2]: https://github.com/ioxiocom/openapi-to-fastapi

[3]: https://infosecwriteups.com/hail-frida-the-universal-ssl-pin...


That's interesting, but it won't work with native code that statically links a SSL implementation.


In many applications you can bypass built-in verifications with some Frida [1] code. It requires more effort to do so, of course, as you'd need to find the OpenSSL methods (with a script like this [2]) and bypass the verification in there.

If you're really intent on getting it to work, downloading the binary, patching out the verification function and putting it back is also possible if you're root.

[1]: https://frida.re/docs/android/

[2]: https://mobsecguys.medium.com/exploring-native-functions-wit...


Can this be used to generate REST documentation for your own frontend just by interacting with it? This should be augmented via a crawler that clicks every clickable element recursively.


Totally, but you would need to do some manual cleanup and naming afterwards to make it more useful than just reading the source code. You could also, for example, use your integration tests, if you have some, to capture as many routes as possible.


of course the generated doc should be refined (e.g. filling missing types, error codes) but your lib would save us a lot of work and make the world a better place.


"...and we expect it to be free and open source as our budget for this is zero."


Actual utility/value and price are only vaguely correlated. Many of the most useful things on earth can't be marketed, not because they're not worth the money but because people are extremely greedy in some domains and simultaneously bad at realizing the impact on their lives. E.g., I have never spent a single dollar to access music, despite it being one of the few things in life that bring me intense joy.


I'm glad I can subsidize your music hobby and that you feel no sense of guilt for not supporting the people who "bring you intense joy"


It's vaguely correlated because you don't value the work of others in general. This means that at some point in your life, others did not value your work and showed you that was perfectly acceptable.


Very nice!

On the same note, I wrote a program to generate Python code (requests) from a HAR capture: https://github.com/louisabraham/har2requests

I think using HAR captures is simpler for the end user than spawning mitmproxy as they don't require any installation and are extracted from the network tab of the browser devtools. Is there a reason why you didn't use them?

EDIT: I realized that mitmproxy can also get traffic from other devices like phones. Very cool project, I will think about modifying mine to support mitmproxy captures!


Hey! Just writing to let you know that I've added HAR input support to mitmproxy2swagger.


Wow that's super cool! Thanks!


Oh, I used a python script to generate pre-made requests from HAR recently, I'm pretty sure it was your git ! Very useful :)


Thanks!


Almost exactly a fit against my idea[1] to generate OpenAPI from HAR files. Going to read through to see if I can add HAR support.

[1]: https://github.com/captn3m0/ideas#openapi-specification-gene...


OpenAPI is just the latest version of swagger. Should not be hard to change.

I was able to translate HAR to OpenAPI with this web site's free preview: https://www.apimatic.io/transformer/

I also see others are working on the same thing: https://github.com/dcarr178/har2openapi



Hey! Just writing to let you know that I've added HAR input support to mitmproxy2swagger.


Very interesting! Would this also be able to determine what kind of auth (header tokens, cookies, etc) the APIs require or is that something you still need to detect manually?


At this point yes, but I am working on adding this.


this is absolutely insane!!! I understand capturing the REST api network part, is it then examining the request body, headers being sent back and forth to figure out the API?


Yes, this is basically what this program does.


From what I understand it’s also somewhat how JIT works in various JavaScript engines: observe the sorts of objects (which naively have the performance characteristics of hash tables) you see, and start defining static offsets for fields you observed. The JIT’d (fast) objects may morph over time as new fields are observed, but I’d imagine it’s a similar idea to creating documentation… “this object tends to have these fields, so just pretend those are the only fields it can have, until another request proves otherwise”, with similar guess/checking for their types/etc.


Really awesome, I tried my hand at writing something similar and was surprised at how well it actually ended up working.

I feel like the next step is automatically generating load tests and/or fuzzing tests. Felt like that could be a real product.



Really amazing.

We have hundreds of undocumented endpoints created over the years, and running this tool on our backends will instantly create good documentation.

Thanks for that! Will give feedback if there are any issues.


Can we have this as a browser dev tool please? F12 -> Tab REST -> Create spec from API


This looks amazing. Will it also capture data types like enums by somehow detecting patterns?


I thought about it, but it would be hard to distinguish between an enumerator and just static data. For example if you logged in with only one account it could classify the "username" field as an enumeration, because there is only one captured value.
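
The pitfall can be sketched in a few lines. This is only an illustration of a naive heuristic, not something the tool does; `guess_enums` and its `max_distinct` threshold are hypothetical, and field values are assumed to be strings.

```python
from collections import defaultdict


def collect_values(samples):
    """Map each field name to the set of distinct values seen across samples."""
    observed = defaultdict(set)
    for sample in samples:
        for field, value in sample.items():
            observed[field].add(value)
    return observed


def guess_enums(samples, max_distinct=3):
    """Naively flag every field with few distinct values as an enum."""
    return {
        field: sorted(values)
        for field, values in collect_values(samples).items()
        if len(values) <= max_distinct
    }


# Two captured responses from a single logged-in account.
captured = [
    {"username": "alice", "status": "active"},
    {"username": "alice", "status": "inactive"},
]
guesses = guess_enums(captured)
```

Here `guesses` contains both `status`, which is plausibly an enum, and `username`, which is not: with one account in the capture, the two cases look the same to the heuristic.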


Yeah I imagine that is nearly impossible without capturing data at scale. Awesome tool! I'm super grateful :-)


This is awesome; I’m going to try it as soon as I get back to my desk. I’ve been working on trying to glue together tools to translate Charles proxy output to OpenAPI (swagger). I think it would be a great tool to have in a web app reverse engineering toolbox.


I did something similar a year ago at the company where I work: I wrote a middleware that intercepts all requests (Express.js) and writes them to an OpenAPI YAML file. It diffs previous requests to see which parts of the request path could be variables. The system isn't perfect, but it gets you 95% of the way there, which is better than having no documentation, writing documentation by hand, or keeping the spec file updated with every change people introduce in the code. (Got promoted to tech lead after this :-) )
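
One way such path diffing could work, as a rough sketch: treat two paths with the same number of segments that differ in exactly one segment as instances of the same template. The helper `merge_paths` and the `{id}` placeholder are hypothetical, and this is a guess at the general idea rather than the commenter's middleware.

```python
def merge_paths(first, second, placeholder="{id}"):
    """Return a path template if the paths differ in exactly one segment, else None."""
    first_parts = first.strip("/").split("/")
    second_parts = second.strip("/").split("/")
    if len(first_parts) != len(second_parts):
        return None
    differing = [
        index
        for index, (a, b) in enumerate(zip(first_parts, second_parts))
        if a != b
    ]
    if len(differing) != 1:
        return None
    # Replace the single differing segment with the placeholder.
    first_parts[differing[0]] = placeholder
    return "/" + "/".join(first_parts)
```

For example, `/users/1/posts` and `/users/2/posts` merge into `/users/{id}/posts`. The heuristic is imperfect: it would also merge `/users/1` and `/posts/1`, so some manual review of the result is still needed.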


A little bit off-topic, but does anybody know of something similar for SOAP/WSDL? I'm aware of the SoapUI mock service.


Doesn't wsdl just expose the schema on the server?


WSDL and OpenAPI/Swagger solve similar problems.

Roughly speaking: WSDL is to XML web services as OpenAPI is to REST

They both model the API and message structure of an API. AFAICT WSDL goes a little farther in that you can declare message sequences (I might be giving short shrift to OpenAPI here).


Short of “this requires oauth” I think you are right about openapi


Hi, I would also like to add another tool I'm contributing to at work (cisco) called APIClarity [1]. It aims at reconstructing swagger specifications of REST microservices running in K8S, but can also be run locally.

This is a challenging task and we don't support OpenAPI v3 specs yet (we are working on it).

Feel free to have a look, and get ideas from it :)

We'll also be presenting it at next Kubecon 2022.

[1]: https://github.com/openclarity/apiclarity


Try out https://www.apimatic.io/transformer/ for converting Swagger Specs to OpenAPI


This is great work!

This would come in very handy for codebases where an OpenAPI v3 spec would be welcome, but is too onerous to create by hand. Run this for a bit, have it spit out a nearly complete spec, and tweak it a bit to output the final product.

In fact, it is precisely what we did to generate the OpenAPI docs for NodeBB [1]. We had an undocumented API that we turned into an OpenAPI v3 file.

[1] https://docs.nodebb.org/api/read


The question is maybe a bit off-topic and vague. That's because I struggle to express it with the right terms:

I'm looking for a generic tool to build and then serve:

- Accept incoming request (API contract A)
- Send outgoing request (API contract B), potentially with parameters from the incoming request
- Receive incoming response (API contract B)
- Do some translations/string manipulation
- Send outgoing response (API contract A)


mitmproxy (https://mitmproxy.org/) has scripting support that will let you do most of this.

For example, you can expose mitmproxy, listen to HTTP requests for a specific host (using this API: https://docs.mitmproxy.org/stable/api/mitmproxy/http.html), intercept the request, do whatever API calls you need, and inject a response without ever forwarding the request to the original server.

Alternatively, you could modify the request and then change the request destination, like in this example here: https://docs.mitmproxy.org/stable/addons-examples/#http-redi.... Using the WSGI support, you could even use normal Python annotations to build your own API without doing too much pattern matching: https://docs.mitmproxy.org/stable/addons-examples/#wsgi-flas...


Ok. This sounds great for easy developing. But when I'm hosting this I'm not a mitmproxy. I want to act like a normal server/endpoint for API A.


I don't know any libraries for this in any good backend languages, but I've worked with these packages in NodeJS to do something like that:

- https://www.npmjs.com/package/http-proxy

- https://www.npmjs.com/package/connect

- https://www.npmjs.com/package/harmon

If you don't want to act like a proxy, you're going to approach this like a normal web application that makes HTTP requests using whatever HTTP client your framework of choice uses.
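
As a rough illustration of that "normal web application" approach, here is a sketch using only the Python standard library rather than the NodeJS packages above. Everything specific in it is an assumption made up for the example: the upstream URL, the field names of both contracts, and the handler name.

```python
import json
import urllib.request
from http.server import BaseHTTPRequestHandler

# Hypothetical endpoint implementing API contract B.
UPSTREAM_URL = "http://localhost:9000/v2/items"


def to_upstream_params(incoming):
    """Translate a contract A request body into contract B parameters."""
    return {"q": incoming.get("query", ""), "limit": incoming.get("max_results", 10)}


def to_client_response(upstream):
    """Translate a contract B response into the contract A shape."""
    return {"results": [item["name"].strip() for item in upstream["items"]]}


class TranslatingHandler(BaseHTTPRequestHandler):
    """Serves contract A by calling contract B and translating both directions."""

    def do_POST(self):
        # Read and parse the incoming contract A request.
        length = int(self.headers.get("Content-Length", 0))
        incoming = json.loads(self.rfile.read(length) or b"{}")
        # Build and send the outgoing contract B request.
        request = urllib.request.Request(
            UPSTREAM_URL,
            data=json.dumps(to_upstream_params(incoming)).encode(),
            headers={"Content-Type": "application/json"},
        )
        with urllib.request.urlopen(request) as reply:
            upstream = json.load(reply)
        # Translate the contract B response and send it back as contract A.
        body = json.dumps(to_client_response(upstream)).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)
```

It could be served with `http.server.HTTPServer(("localhost", 8000), TranslatingHandler).serve_forever()`. Keeping the two translation functions pure makes them easy to test without any network traffic.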


Congrats, this is really awesome and I have a use for it right now. It will be really useful for debugging old and undocumented APIs.


I've always wanted to build something similar to this, by reading HAR files captured right out of the devtools. Have you given any thought to that as an alternative input?


Hey! Just writing to let you know that I've added HAR input support to mitmproxy2swagger.


Is it possible to do this on wireshark/tcpdump pcap dumps? Like for finding out hostnames, endpoints and request packets of HTTPS requests that an android app is making?


The problem with pcap is that the requests there would be encrypted and there is basically no practical way to decrypt them.

Mitmproxy solves that by being between the client and server and injecting its own self-signed certificate (which you need to add to the trusted certificates on the phone, which requires root).


See SSLKEYLOGFILE


Explain a bit more, please? Do you mean root is not needed? Isn't that a curl feature?


Various browsers support it to log ssl keys which allows decrypting packet captures without requiring something like mitmproxy.
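
The same mechanism is available outside browsers. As a small sketch, Python's `ssl` module (3.8+, built against OpenSSL 1.1.1 or newer) can write TLS secrets to a key log file through `SSLContext.keylog_filename`; the helper name `context_with_keylog` is hypothetical.

```python
import os
import ssl


def context_with_keylog(path=None):
    """Build a client TLS context that appends session secrets to a key log file.

    Falls back to the SSLKEYLOGFILE environment variable when no path is given.
    ssl.create_default_context() already honours that variable on its own;
    the explicit path is for cases where the variable is not set.
    """
    context = ssl.create_default_context()
    path = path or os.environ.get("SSLKEYLOGFILE")
    if path:
        context.keylog_filename = path
    return context
```

Connections made with such a context log their secrets to the file, which Wireshark can then use to decrypt a matching packet capture. This only helps for clients you control, not for arbitrary third-party apps.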


Really neat! Gives me an idea on using something like this to generate e.g., CURL commands to mimic SSO flows.

Even just documenting an SSO flow as a diagram would be quite neat.


Note that for single resources, Chrome/Edge can do this now. There's a semi-hidden "copy this resource as Curl" option:

https://everything.curl.dev/usingcurl/copyas#:~:text=From%20....

When it works, it's effing magic! Spectacular for very quickly knocking out Bash scripts that test multiple APIs.


Awesome idea! Thank you for creating and sharing!


This is great. Good example too, since Airbnb could do with some improvement to the user chrome: include cleaning fees, etc.


Starred. Does this work with non-emulated iOS or Android http calls in which you may need to disable app level security?


For Android you'll probably need root access (unless the app developer has opted in to loading your user-imported certificate authorities). For iOS this should be easier.

However, many apps apply cert pinning in production builds, which will require tools like Frida to disable them, which in turn requires root access/a jailbreak to function.

Alternatively, you could pull the apps from your phone without root (at least on Android), patch the most obvious cert pinning out (usually in the network manifest file) and install the new version.


I gave this a try today. It was silky smooth! Is it possible to tell Swagger to omit OPTIONS methods?


This is one of the most clever projects I've seen in a while. Nice work.


This is a great idea. Kudos.


Oh I love this so much! This would help me with scraping certain sites.


How did you bypass cert pinning in the video for the Airbnb app?


I didn't, just added a self-signed cert to my keychain on macOS and launched the app as downloaded from App Store.

I guess Airbnb doesn't use cert pinning.


It doesn't have anything to do with mobile. The web client uses the same APIs.


Be interesting to run a fuzzer on the API whilst doing this.


This is absolutely phenomenal!


This is fantastic. Thank you


This is fantastic!


Super nice! We might integrate something similar in Caido proxy.


lol!

step 2: features for training a language model on the request and response variables in the mitm stream and a shim for standing up a fully ml data driven zero code mock backend.


bravo, I've wanted something like this


very nice !


awesome take



