Show HN: OpenAPI DevTools – Chrome extension that generates an API spec (github.com/andrewwalsh)
811 points by mrmagoo2 on Oct 25, 2023 | 102 comments
Effortlessly discover API behaviour with a Chrome extension that automatically generates OpenAPI specifications in real time for any app or website.



I wish this would document the auth headers.

What would be particularly useful is if it saved token values and then (through search) matched them against auth-call responses, to find the call that issued the token in the first place.

That way you could easily determine which auth call is needed to get a token for the endpoint.


Great suggestion, I will look into this.


I have used LLaMA to figure out the duplications / path parameters :)


You… you built the thing I’ve been spending 3 days looking for… oh man


This is super cool. Writing code to drop into the JavaScript console lets you do insane things. I’ve found great success using ChatGPT to help me write the code, which I then just cut and paste into the console. Asking it to “make it all run in parallel using async/await” will massively speed up execution of serial tasks.

For instance, I had GPT help me write browser JS that groks literally thousands of IP addresses in an open security tool that shall not be named. I can vacuum much of their entire database in seconds by making hundreds of async calls. While they do have bot protection on the website, they appear to have no protection at all on their browser APIs once the user has been given a cookie… I suspect this is common.
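To give a flavour of it, the kind of snippet GPT produces is roughly this (a minimal sketch, assuming a hypothetical /api/items/:id JSON endpoint on the target site). Pasted into the console, it fires every request concurrently instead of awaiting them one at a time:

    // Hypothetical endpoint, for illustration only
    const ids = Array.from({ length: 500 }, (_, i) => i + 1);

    // Kick off all fetches at once and wait for them together
    const results = await Promise.all(
      ids.map((id) =>
        fetch(`/api/items/${id}`, { credentials: "include" }).then((r) => r.json())
      )
    );

    console.log(`${results.length} records fetched`);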


This is about OpenAPI (Swagger), not OpenAI (ChatGPT).


This is an extension for Chrome, not Safari.


I know.


Love it!

I used https://vite-plugin-web-extension.aklinker1.io/guide/ before for cross-browser extension support. If you don't mind, I could take a look at adding Firefox support (no guarantees).


As a follow-up, the algorithm that powers this makes use of the chrome.devtools.network API. Specifically, it passes along the Request object, which is in the HAR 1.2 archive format.

So if you can pass the equivalent of that in Firefox/other browsers to the insert method and switch things up a bit, it should be relatively straightforward. I will think about pulling out the core logic into its own lib.

https://developer.chrome.com/docs/extensions/reference/devto...

https://developer.chrome.com/docs/extensions/reference/devto...

https://github.com/AndrewWalsh/openapi-devtools/blob/main/sr...
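As a rough sketch of that wiring (not the extension's actual code; insert() below is a stand-in for the method linked above):

    // Runs in the devtools panel; each entry follows the HAR 1.2 entry shape
    chrome.devtools.network.onRequestFinished.addListener((entry) => {
      entry.getContent((body) => {
        // Hypothetical helper: merge this request/response pair into the spec
        insert(entry, body);
      });
    });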


Indeed, I hit an issue here. Firefox maintains a library for a unified extension API: https://github.com/mozilla/webextension-polyfill

Their type definition for the HAR request isn't exported: https://github.com/DefinitelyTyped/DefinitelyTyped/blob/mast...

So I can't drop-in replace the type in https://github.com/AndrewWalsh/openapi-devtools/blob/main/sr...


Also, the polyfill has a promise-based API rather than callbacks, and I don't yet know whether there is a workaround.


Hey absolutely please do, thank you!


This would be excellent


There's also Plasmo which provides some abstractions over the browsers: https://github.com/PlasmoHQ/plasmo


Are devtools extensions/panels standardized?


Do you mean the DevTools protocol[1]? I haven't followed the space, so I have no knowledge of it. On the other hand, there seems to be a polyfilled API for chrome.devtools.network.Request, which OP's extension uses extensively: https://github.com/DefinitelyTyped/DefinitelyTyped/blob/mast...

[1] https://chromedevtools.github.io/devtools-protocol/


I'd love to see FF support on this.


My most common use case here is then wanting to hit the API from Python and adjust the params / URL, etc.

Would love a "copy to python requests" button that

grabs the headers

generates a boilerplate python snippet including the headers and the URL:

    import requests
    import json

    url = '<endpoint>'

    headers = {
        'User-Agent': 'Mozilla/5.0 ...',
        ...
    }

    data = {
        "page": 5,
        "size": 28
        ...
    }

    response = requests.post(url, headers=headers, data=json.dumps(data))

    if response.status_code == 200:
        print(response.json())
    else:
        print(f"Error {response.status_code}: {response.text}")


Steps to do so:

- open the network console

- right click on the request

- click "copy as curl"

- visit https://curlconverter.com/ to convert to Python/Node/any language


Also available as a VSCode extension that automatically matches the pasted content to the programming language used in the current file: https://marketplace.visualstudio.com/items?itemName=curlconv...


I made a fork of the Chrome DevTools that adds "Copy as Python" to the right-click menu of each request in the Network tab. You can tell Chrome to use a different version of the DevTools if you start it from the command line.

https://github.com/curlconverter/curlconverter/issues/64#iss...


Thank you for this. I didn’t know curlconverter existed.


This is my current workflow, though with ChatGPT.

I was just trying to save a few clicks


You made your request sound important to implement when you already have a workaround that doesn't take very much time...

This is why feature bloat is a thing


I was about to say this lol


You could take the OpenAPI JSON generated by this project and feed it to https://docs.scalar.com/swagger-editor, which generates boilerplate in several formats, including Python.


1. You should almost always use requests.Session() instead of requests. It's faster, and can make the code shorter.

2. requests can serialize to JSON for you via json=, so you don't need a separate module. It'll even set the Content-Type header to application/json for you.

  import requests
  
  url = '<endpoint>'
  
  headers = {
      'User-Agent': 'Mozilla/5.0 ...',
      ...
  }
  
  session = requests.Session()
  session.headers.update(headers)
 
  data = {
      "page": 5,
      "size": 28
      ...
  }
  
  response = session.post(url, json=data)
  
  if response.status_code == 200:
      print(response.json())
  else:
      print(f"Error {response.status_code}: {response.text}")


SeleniumIDE can record and save browser test cases to Python: https://github.com/SeleniumHQ/selenium-ide

awesome-test-automation/python-test-automation.md lists a number of ways to wrap selenium/webdriver and also playwright: https://github.com/atinfo/awesome-test-automation/blob/maste...

vcr.py, playback, and rr do HTTP test recording and playback. httprunner can record and replay HAR. DevTools can save HTTP requests and responses to HAR files.

awesome-web-archiving lists a number of tools that work with WARC; but only har2warc: https://github.com/iipc/awesome-web-archiving/blob/main/READ...


You could potentially go one step further and make Python classes that wrap the whole API automatically from the OpenAPI file: https://github.com/mom1/apiclient-pydantic-generator


It seems like you could combine this extension with some of the OpenAPI -> Python projects to get your desired result. (e.g. https://github.com/wy-z/requests-openapi )


wow what a perfect service to steal session cookies


This reminds me a lot of:

https://github.com/alufers/mitmproxy2swagger

However, having the capability delivered in a browser extension is extremely handy!


this comment section is a goldmine :)

Thanks for sharing this, I suspect this is going to be super useful for my work


Nice, this made me go back and check up on the Gorilla LLM project [1] to see what they are doing with APIs and whether they have applied their fine-tuning to any of the newer foundation models. It looks like things have slowed down since they launched (?), or maybe development is happening elsewhere on some invisible Discord channel. I hope the intersection of API calling and LLMs as a logic-processing function keeps getting attention; it's an important direction for interop across the web.

[1] https://github.com/ShishirPatil/gorilla


I open-sourced a tool that takes an OpenAPI spec and lets you control the API using natural language: https://github.com/mquan/api2ai

Let me know if you have any questions or feature requests.


How is this different from what LangChain already offers with their OpenAPI chain?

https://python.langchain.com/docs/use_cases/apis


AFAIK, the LangChain solution loads the entire OpenAPI spec, which consumes a lot of tokens and won't work for many large APIs. For efficient token usage, api2ai divides the task into two steps: API planning and param parsing. The first step takes a summarization of all the endpoints. Once the endpoint is known, we parse params using the schema of the selected endpoint.


There's a similar, more powerful tool if you're into this

https://www.akitasoftware.com/



https://www.useoptic.com/ is another one, which is a little more tailored to building & updating OpenAPI specs. Works well on live traffic and/or tests.


Crikey, if you hadn't directly connected this as similar, I would have no idea what their product would even vaguely do from that landing page.


Agree.

> Akita makes monitoring and observing system behavior accessible for every developer. Quickly discover all your endpoints, see which are slowest, and learn which have errors

Translation: Install a Docker extension that intercepts and inspects your network requests to infer the shape of your API.

I feel like when you're targeting developers, you should quickly explain what it is you actually do.


Companies in general should do this, not just ones targeting developers! Instead they have a bunch of vague marketing copy that means nothing. It's a pet peeve.

My favorite is when they think they're keeping it short and to the point, with no bull. So, they'll have a hero section with copy like "Sharpen capacity. Scale across segments. Nuff said." No, not enough said, say more!


> Companies in general should do this, not just ones targeting developers! Instead they have a bunch of vague marketing copy that means nothing. It's a pet peeve.

This seems to appeal to purchasing teams. When you write what the app actually does, suddenly it's technical and the team doesn't understand what's written any more.


The important distinction is that this is entirely client-side, while Akita requires an agent running server-side.


There are actually a whole lot of them! Keploy comes to mind, and Pixie (eBPF-based).


This is amazing! Figuring out website APIs has always been a huge PITA. With our dlt library project we can turn the OpenAPI spec into pipelines and have the data pushed somewhere: https://www.loom.com/share/2806b873ba1c4e0ea382eb3b4fbaf808?...


This is awesome!

I'll second/third the feature request for auto-including auth headers/calls (as many of the sites I'm trying to understand/use APIs from use persistent keys, and scraping these separately is just unnecessary extra time).

On that same note, I'd greatly appreciate keeping the initial request as a "sample request" within the spec.

I'd also greatly appreciate an option to attempt to automatically scrape for required fields (e.g. try removing each query variable one at a time, look for errors, document them).

Thanks for this :)


This is a first step toward turning the entire web into an API, at least until we hit the login/signup roadblocks (but that's where agents come in).


That used to be called "the semantic web".

Dreams never die and what is old will be new again.


Great project! These features come to mind that would be great additions:

1. Ability to filter response properties.

2. Ability to work with non-JSON (web scraping) by defining a mapping of CSS selectors to response properties.

3. Cross-reference host names of captured requests with publicly documented APIs.

4. If auth headers are found, prompt user for credentials that can then be stored locally.

5. "Repeater" similarly found in Burp Suite.

6. Generate clients on the fly based on the generated OpenAPI spec.


- Allow using it as a library instead of just a browser extension, which would in turn allow:

- Integration with some kind of web crawler to automatically walk a website and extract a database of specifications

Edit: Hmm, it seems that genson-js[1] was used to merge schemas.

1 - https://www.npmjs.com/package/genson-js


Genson-js is used to merge JSON Schema objects. Essentially there are 5 schemas that we care about in each request - request bodies, request headers, response bodies, response headers, and query parameters. Each endpoint (which may or may not be parameterised) has only one schema for each of these values.
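As a rough illustration of that merging step (a sketch assuming genson-js's createSchema/mergeSchemas exports, with made-up request bodies):

    import { createSchema, mergeSchemas } from "genson-js";

    // Two request bodies observed for the same endpoint across two calls
    const first = createSchema({ page: 1, size: 20 });
    const second = createSchema({ page: 2, size: 20, filter: "active" });

    // One merged schema that accepts both shapes; the extension keeps a
    // single merged schema per endpoint for each of the five values above
    const merged = mergeSchemas([first, second]);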

The idea for a crawler is a good one. The core logic that handles spec generation is decoupled from everything else, so it can be extracted into its own library.

But there are approaches that exist for this already, such as har-to-openapi.

https://github.com/jonluca/har-to-openapi


Interesting! Thanks! Awesome project :)


7. Train a machine learning model to recognize and extract tabular and repeated data based on training data.

8. Optionally publish generated OpenAPI specs to a central site or open PR to a GH repo, "awesome-openapi-devtools"?


Some great ideas here, thank you. I do want to keep it small and focused so I'll forego complex functionality like the Repeater, but you've raised some common pain points I'll tackle.


Very nice! Auto-generating type information from looking at permutations of values is hard, though. Q: does this handle optional values? Also, being able to mark a string field as an "enum" and then collect the possible values instead of just typing it as "string" would be mega handy.


It doesn't have any way of determining which values are optional, so it doesn't make that distinction. Hear you on the enums, I'll take another look at what's possible without adding overhead.


Amazing. I’ve often wished this would exist. Thank you.

It was always my step 1 towards Xxx. Keen to know what directions you were thinking of?

I’d love to see more remixing on top of APIs that websites typically only expose for their own use.


For sure, there are a few tools out there like Requestly to change API behaviour, but it's a frustrating experience. In terms of the direction, planning to keep this simple so I've no plans for additional features.


Thanks for sharing the Chrome extension, @mrmagoo2.

It's amazing to see a tool that simplifies the process of generating an OpenAPI spec. This is the best Show HN this year.


Agreed! What would be even more awesome is if it could generate an OpenAPI spec from existing HAR files.


It could, as it works with the HAR 1.2 format. There is another library that can do this. It isn't suitable for the frontend as it uses QuickType and pulls in a ton of dependencies, but it is far more configurable.

https://github.com/jonluca/har-to-openapi
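If anyone wants to try it, usage is roughly along these lines (hedged; generateSpec is the export I recall from its README, and the HAR file here is assumed to come from DevTools' "Save all as HAR"):

    import { readFile } from "node:fs/promises";
    import { generateSpec } from "har-to-openapi";

    // Load a HAR file exported from the browser's Network tab
    const har = JSON.parse(await readFile("./session.har", "utf8"));

    // Produce an OpenAPI document (plus a YAML rendering) from the recorded traffic
    const { spec, yamlSpec } = await generateSpec(har);
    console.log(yamlSpec);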


This looks very useful, but what do I do with the discovered data?

Suppose I have a site that runs a search that I want to be able to automate. However, instead of sending the search term in the URL, it updates live (presumably via some API call).

Now suppose I need a one-click solution to be able to open that page and run a specific search.

Is there another Chrome plugin that would allow me to use this API data to make that happen?


I've had it in mind to build something like this for quite some time, to quickly explore undocumented APIs - looking forward to seeing your progress!


Thank you!


Okay, this is wonderful. Love it already!!

Sometimes I click on a path parameter and it doesn't "create" it, even though there are several other examples in the list. Not sure if it's a bug, or something I'm doing wrong.

Overall, this is an absolutely wonderful tool and I've wanted something like this for a long time. Incredibly useful, thank you!!


That sounds like a bug; I need to test that feature more thoroughly. Thanks for reporting.


Damn I literally built a really similar tool myself using HAR files just a couple weeks ago! Yours is way more polished though, nice work.

I have a lot of ideas in this space (some PoCs), and I've been starting to scope out a company around them. Would love to chat to see if there's any shared opportunity for both of us!


The problem with this type of tool is that it can only produce specs based on the information it can see.

The spec produced will be incomplete (missing paths, methods, response variants, statuses). For a complete spec you should use a framework like Fastify, NestJS, tsoa, or FastAPI, which have built-in OpenAPI support.

Can be very valuable for reverse-engineering though :)


Really cool, we're using a similar technique at Kadoa to auto-generate scrapers for any website. Analyzing network calls to find the desired data in API responses is one of the first things we do before starting to process the DOM.


Cool! Can you add autocomplete of paths to URLs based on the spec now?

so I can be typing in the URL bar for any website I have landed on in the past and tab through all the available routes?

e.g.

- news.ycombinator.com_

- news.ycombinator.com/new

- news.ycombinator.com/submit

- news.ycombinator.com/show

etc.


A Firefox version of this would be super handy! Does that already exist?


The description doesn't explain exactly what this extension does.

I assume it monitors all XHR requests as you browse a website, and if the request/response matches [some criteria (e.g. is JSON?)] it will assume it's an API request and log it?

Is that correct?

If so, it will only work on websites where the frontend is implemented like a PWA, with lots of AJAX calls to fetch data, etc. For sites whose pages are all generated server-side, the extension won't generate any API schema, right?

Edit: Also how does it differentiate "API requests" with regular AJAX content fetching? If a website fetches some arbitrary content via an AJAX request (e.g. some lazy-loaded HTML), that's not an API request. That's just part of a website's layout.


How would it? There isn't any API in the first place with classic websites. You could maybe consider the urlencoded POST requests an API, but then the reply is another HTML page, so how do you formally specify the reply format? "The queried data is somewhere right below the third <h3>, except when there's a new message for the logged-in user, then it's the fourth one."


Obviously - a browser extension can only monitor API calls from the browser.


Not obviously; all it says is it generates a schema "while using" a website.

"Using" could mean navigating between pages, submitting data via forms, etc.


Worse, for something like SvelteKit load functions, this will think there's a "real API" where what's actually there is an internal detail and will change often.

https://kit.svelte.dev/docs/load


It does generate a schema.

> Instantly generate an OpenAPI 3.1 specification for any website or application just by using it


Yeah but my question remains: by what criteria is a request classed as an "API request"? Websites make tons of XHR requests and not all of them are API requests.

I want to know what this extension does that's different than me looking at the browser's Dev Tools > Network tab.


The criteria can be found below. There are no hard and fast rules, but the goal is to only include requests that you might otherwise find in a spec.

https://github.com/AndrewWalsh/openapi-devtools/blob/main/sr...


We at Step CI have a similar tool that acts as a proxy and can generate an OpenAPI spec from request/response pairs.

(You can also use it to generate automated tests)

If you're interested: mish@stepci.com


The documentation states "automatically populate based on JSON requests that fire as you browse the web", so does this mean that gRPC/protobuf requests are not captured?


Anything that isn't a JSON request is specifically ignored.


I saw your sibling comment about "keeping it simple," however that is a bit counter to "generates OpenAPI specifications", since those are certainly not limited to just application/json request/response bodies.

I wanted to draw your attention to "normal" POST application/x-www-form-urlencoded <https://github.com/OAI/OpenAPI-Specification/blob/3.1.0/vers...> and its multipart/form-data friend <https://github.com/OAI/OpenAPI-Specification/blob/3.1.0/vers...>

The latter is likely problematic, but the former is in wide use still, including, strangely enough, the AWS API, although some of their newer services do have an application/json protocol

I know that's a lot of words, but the tl;dr would be that if you want your extension to be application/json only, then changing the description to say "OpenAPI specifications for application/json handshakes" would help the consumer be on the same page with your goals
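For reference, the fragment the spec would need for such a body is small; a hedged sketch of the OpenAPI 3.1 shape (field names made up for illustration):

    // Hypothetical OpenAPI 3.1 fragment for a form-urlencoded request body
    const requestBody = {
      content: {
        "application/x-www-form-urlencoded": {
          schema: {
            type: "object",
            properties: {
              email: { type: "string" },
              plan: { type: "string" },
            },
          },
        },
      },
    };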


You raise a good point and it would be great to account for this. I will take a look at this. Excellent suggestion.


Does gRPC/protobuf have support in web browsers? I don't know of a situation where I'd encounter those in a web application.



This looks super interesting. Works for anything? Damn.


Is there a way to filter out headers?

The result contains headers like content-length and similar.

Also it would be nice if it could factor out common schemas.


Awesome! Any chance of a Safari extension too?


Would be cool if this shared the user-discovered specs to create a database of API specs for the web.


This is very cool! I just tried using it; unfortunately, my Next.js app-dir project makes most requests from the server side, so it was only capturing "posts" made from the client. Is there a way to run it from the server?


I'm sure many developers have wished at some point that such magic existed.


Would love this for apps.


This looks super useful, can’t wait to try it at work tomorrow!


Care to share what it would be useful for?

I mean, and I'm asking as a backend dev: if you have to integrate with some API, you use the provided docs/Swagger UI.

Why/when would you care to rely on an API integration when its interface is not publicly shared?


Because this is for those edge cases where the docs are crap and a Swagger spec is non-existent.


Reverse engineering? You just described it


This could be useful for learning from any site you admire.


Looks really cool! Congrats!



