What would be particularly useful is if it saved token values and then (through search) joined them on the response of the auth call to get the initial token.
That way you could easily determine what auth call was needed to get you a token to use the endpoint.
This is super cool. Writing code to drop into the JavaScript console lets you do insane things. I’ve found great success using ChatGPT to help me write the code, which I then just cut and paste into the console. Asking it to “make it all run in parallel using async/await” will massively speed up execution of serial tasks.
For instance, I had GPT help me write browser JS that groks literally thousands of IP addresses in an open security tool that shall not be named. I can vacuum much of their entire database in seconds by making hundreds of async calls. While they do have bot protection on the website, they appear to have no protection at all on their browser APIs once the user has been given a cookie… I suspect this is common.
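The parallelization trick above can be sketched like this (shown in Python with asyncio rather than browser JS, since later comments in the thread move to Python clients; `fetch_one` and the sleep are stand-ins for a real HTTP call):

```python
import asyncio
import time

async def fetch_one(i: int) -> int:
    # Placeholder for a real network call (e.g. an HTTP GET to some endpoint).
    await asyncio.sleep(0.1)
    return i * 2

async def fetch_all(n: int) -> list:
    # gather() starts every coroutine before awaiting any of them, so ten
    # 0.1s "calls" take roughly 0.1s total instead of ~1s run serially.
    return await asyncio.gather(*(fetch_one(i) for i in range(n)))

start = time.perf_counter()
results = asyncio.run(fetch_all(10))
elapsed = time.perf_counter() - start
```

The same shape is what you'd ask ChatGPT for in JS: replace the serial loop with a list of promises and await them all at once.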
As a follow-up: the algorithm that powers this uses the chrome.devtools.network API. Specifically, it consumes the Request object, which follows the HAR 1.2 archive format.
So if you can pass the equivalent of that in Firefox/other browsers to the insert method and switch a few things up, it should be relatively straightforward. I'll think about pulling the core logic out into its own lib.
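For reference, a request entry in HAR 1.2 looks roughly like this (trimmed to the fields most tools care about; values are illustrative):

```json
{
  "method": "GET",
  "url": "https://api.example.com/v1/users?page=2",
  "httpVersion": "HTTP/1.1",
  "headers": [
    { "name": "Accept", "value": "application/json" }
  ],
  "queryString": [
    { "name": "page", "value": "2" }
  ],
  "cookies": [],
  "headersSize": -1,
  "bodySize": 0
}
```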
Do you mean the DevTools protocol[1]? I haven't followed the space, so I have no knowledge of it. On the other hand, there seems to be a polyfilled API on chrome.devtools.network.Request, which OP's extension uses extensively: https://github.com/DefinitelyTyped/DefinitelyTyped/blob/mast...
I made a fork of the Chrome DevTools that adds "Copy as Python" to the right-click menu of each request in the Network tab. You can tell Chrome to use a different version of the DevTools if you start it from the command line.
You could take the OpenAPI json generated from this project and feed it to https://docs.scalar.com/swagger-editor which generates boilerplate in several formats, including Python
1. You should almost always use requests.Session() instead of requests. It's faster, and can make the code shorter.
2. requests can dump to JSON for you by using json=, so you don't need a separate module. It'll even set the content-type header to application/json for you.
vcr.py, playback, and rr do HTTP test recording and playback. httprunner can record and replay HAR. DevTools can save HTTP requests and responses to HAR files.
It seems like you could combine this extension with some of the OpenAPI -> Python projects to get your desired result. (e.g. https://github.com/wy-z/requests-openapi )
Nice, this made me go back and check up on the Gorilla LLM project [1] to see what they are doing with APIs and whether they have applied their fine-tuning to any of the newer foundation models. It looks like things have slowed down since they launched (?), or maybe development is happening elsewhere on some invisible Discord channel. I hope the intersection of API calling and LLMs as logic-processing functions keeps getting attention; it's an important direction for interop across the web.
AFAIK, the LangChain solution loads the entire OpenAPI spec, which consumes a lot of tokens and won't work for many large APIs. For efficient token usage, api2ai divides the task into two steps: API planning and param parsing. The first step takes a summarization of all the endpoints. Once the endpoint is known, we parse the params using the schema of the selected endpoint.
https://www.useoptic.com/ is another one, which is a little more tailored to building & updating OpenAPI specs. Works well on live traffic and/or tests.
> Akita makes monitoring and observing system behavior accessible for every developer. Quickly discover all your endpoints, see which are slowest, and learn which have errors
Translation: Install a Docker extension that intercepts and inspects your network requests to infer the shape of your API.
I feel like when you're targeting developers, you should quickly explain what it is you actually do.
Companies in general should do this, not just ones targeting developers! Instead they have a bunch of vague marketing copy that means nothing. It's a pet peeve.
My favorite is when they think they're keeping it short and to the point, with no bull. So, they'll have a hero section with copy like "Sharpen capacity. Scale across segments. Nuff said." No, not enough said, say more!
> Companies in general should do this, not just ones targeting developers! Instead they have a bunch of vague marketing copy that means nothing. It's a pet peeve.
This seems to appeal to purchasing teams. When you write what the app actually does suddenly it’s technical and the team doesn’t understand what is written any more.
I'll second/third the feature request for auto-including auth headers/calls (as many of the sites I'm trying to understand/use APIs from use persistent keys, and scraping these separately is just unnecessary extra time).
On that same note, I'd greatly appreciate keeping the initial request as a "sample request" within the spec.
I'd also greatly appreciate an option to attempt to automatically scrape for required fields (e.g. try removing each query variable one at a time, look for errors, document them).
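That probing approach could look roughly like this (a hypothetical sketch, not part of the extension; `send` is whatever actually performs the request, injected so the logic is easy to test and rate-limit):

```python
def find_required_params(params: dict, send) -> set:
    """Drop each query parameter in turn; if the request then errors,
    the parameter is presumably required. `send(params) -> status_code`."""
    required = set()
    for name in params:
        trimmed = {k: v for k, v in params.items() if k != name}
        if send(trimmed) >= 400:  # request fails without it => likely required
            required.add(name)
    return required
```

In practice you'd also want to throttle the probes and record the error bodies, since those often double as documentation.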
Genson-js is used to merge JSON Schema objects. Essentially there are 5 schemas that we care about in each request - request bodies, request headers, response bodies, response headers, and query parameters. Each endpoint (which may or may not be parameterised) has only one schema for each of these values.
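As a toy illustration of the merging principle (hand-rolled, not genson-js itself): each observed value yields a schema fragment, and fragments seen for the same endpoint get unioned together:

```python
def infer(value):
    """Infer a minimal JSON Schema fragment from one observed value."""
    if isinstance(value, bool):
        return {"type": "boolean"}
    if isinstance(value, (int, float)):
        return {"type": "number"}
    if isinstance(value, str):
        return {"type": "string"}
    if isinstance(value, list):
        return {"type": "array"}
    if isinstance(value, dict):
        return {"type": "object",
                "properties": {k: infer(v) for k, v in value.items()}}
    return {"type": "null"}

def merge(a, b):
    """Merge two schemas for the same endpoint: matching object types merge
    property-wise; diverging types become an anyOf union."""
    if a == b:
        return a
    if a.get("type") == b.get("type") == "object":
        props = {}
        for key in set(a["properties"]) | set(b["properties"]):
            if key in a["properties"] and key in b["properties"]:
                props[key] = merge(a["properties"][key], b["properties"][key])
            else:
                props[key] = a["properties"].get(key) or b["properties"].get(key)
        return {"type": "object", "properties": props}
    return {"anyOf": [a, b]}
```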
The idea for a crawler is a good one. The core logic that handles spec generation is decoupled from everything else, so it can be extracted into its own library.
But there are approaches that exist for this already, such as har-to-openapi.
Some great ideas here, thank you. I do want to keep it small and focused so I'll forego complex functionality like the Repeater, but you've raised some common pain points I'll tackle.
Very nice! Auto generating type information from looking at permutations of values is hard though. Q: Does this handle optional values? Also, being able to mark string field as "enums" and then collecting the possible values instead of just typing it as "string" would be mega handy.
It doesn't have any way of determining which values are optional, so it doesn't make that distinction. Hear you on the enums, I'll take another look at what's possible without adding overhead.
For sure, there are a few tools out there like Requestly to change API behaviour, but it's a frustrating experience. In terms of the direction, planning to keep this simple so I've no plans for additional features.
It could, since it works with the HAR 1.2 format. There is another library that can do this. It isn't suitable for the front end as it uses QuickType and pulls in a ton of dependencies, but it is far more configurable.
This looks very useful, but what do I do with the discovered data?
Suppose I have a site that runs a search that I want to be able to automate. However, instead of sending the search term in the URL, it updates live (presumably via some API call).
Now suppose I need a one-click solution to be able to open that page and run a specific search.
Is there another Chrome plugin that would allow me to use this API data to make that happen?
Sometimes I click on a path parameter and it doesn't "create" it, even though there are several other examples in the list. Not sure if it's a bug, or something I'm doing wrong.
Overall, this is an absolutely wonderful tool and I've wanted something like this for a long time. Incredibly useful, thank you!!
Damn, I literally built a really similar tool myself using HAR files just a couple of weeks ago! Yours is way more polished though, nice work.
I have a lot of ideas in this space (some PoCs), and I've been starting to scope out a company around them. Would love to chat to see if there's any shared opportunity for both of us!
The problem with this type of tool is that it can only produce a spec based on the traffic it actually sees.
The spec produced will be incomplete (missing paths, methods, response variants, statuses). For completeness you should use a framework like Fastify, NestJS, tsoa, or FastAPI, which have built-in OpenAPI support.
Can be very valuable for reverse-engineering though :)
Really cool. We're using a similar technique at Kadoa to auto-generate scrapers for any website. Analyzing network calls to find the desired data in API responses is one of the first things we do before starting to process the DOM.
The description doesn't explain exactly what this extension does.
I assume it monitors all XHR requests as you browse a website, and if the request/response matches some criteria (e.g. is it JSON?) it assumes it's an API request and logs it?
Is that correct?
If so, it will only work on websites where the frontend is implemented like a PWA, with lots of AJAX calls to fetch data, etc. For sites whose pages are all generated server-side, the extension won't generate any API schema, right?
Edit: Also how does it differentiate "API requests" with regular AJAX content fetching? If a website fetches some arbitrary content via an AJAX request (e.g. some lazy-loaded HTML), that's not an API request. That's just part of a website's layout.
How would it? There isn't any API in the first place with classic websites. You could maybe consider the urlencoded POST requests an API, but then the reply is another HTML page, so how do you formally specify the reply format? "The queried data is somewhere right below the third <h3>, except when there's a new message for the logged-in user, then it's the fourth one."
Worse, for something like SvelteKit load functions, this will think there's a "real API" where what's actually there is an internal detail and will change often.
Yeah but my question remains: by what criteria is a request classed as an "API request"? Websites make tons of XHR requests and not all of them are API requests.
I want to know what this extension does that's different than me looking at the browser's Dev Tools > Network tab.
The documentation states it will 'automatically populate based on JSON requests that fire as you browse the web', so does this mean gRPC/protobuf requests are not captured?
I saw your sibling comment about "keeping it simple"; however, that is a bit counter to "generates OpenAPI specifications", since those are for sure not limited to just application/json request/response bodies.
The latter is likely problematic, but the former is still in wide use, including, strangely enough, the AWS API, although some of their newer services do have an application/json protocol.
I know that's a lot of words, but the tl;dr would be: if you want your extension to be application/json only, then changing the description to say "OpenAPI specifications for application/json handshakes" would help the consumer be on the same page with your goals.
this is very cool!
I just tried using it, unfortunately, my NextJS app dir project makes most requests from the server side, so it was only capturing "posts" made from the client. Is there a way to run it from the server?