arjunchint's comments

We execute the code in a sandbox and proxy the fetch calls through the main world!
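
A minimal sketch of that bridge, assuming a sandboxed frame posting to the embedding page (the FETCH_PROXY message shape is illustrative, not our actual protocol):

    // Sandboxed frame: swap fetch for a postMessage bridge. For brevity the
    // bridge resolves with the response text rather than a full Response.
    let nextId = 0;
    const pending = new Map<number, (body: string) => void>();

    (globalThis as any).fetch = (url: string, init?: RequestInit) =>
      new Promise<string>((resolve) => {
        const id = nextId++;
        pending.set(id, resolve);
        parent.postMessage({ type: "FETCH_PROXY", id, url, init }, "*");
      });

    window.addEventListener("message", (e) => {
      if (e.data?.type === "FETCH_RESULT") pending.get(e.data.id)?.(e.data.body);
    });

    // Main world: perform the real fetch so first-party cookies apply.
    window.addEventListener("message", async (e) => {
      if (e.data?.type !== "FETCH_PROXY") return;
      const res = await fetch(e.data.url, e.data.init);
      (e.source as Window).postMessage(
        { type: "FETCH_RESULT", id: e.data.id, body: await res.text() }, "*");
    });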

We found Gemini Flash to be the sweet spot for both agentic actions and writing code; even Flash-Lite is too hit-or-miss.

We are thinking through self-healing mechanisms, like falling back to a live web agent and rewriting the script.


The bigger goal is to build and maintain a global library of popular automations. Users can also quickly re-record a task to recreate and update its script.

Since it runs inside your own browser, there should be no captchas or challenges. On failure it can fall back to our regular web agent, which can solve captchas.

Big picture, with the launch of Mythos it might just become impossible for websites to keep up, and they will have to go the way of Salesforce and just expose APIs for everything.


Hey Michael, we had similar thoughts at Retriever AI about moving from runtime agentic inference to writing scripts that combine webpage interactions and reverse-engineered site APIs.

Compared to your approach, we do this entirely within a browser extension, meeting users where they are already doing their existing work.

Within the extension you just record yourself doing a task; we reverse engineer the APIs and write a script. We then execute the script from within the webpage so that auth/headers/tokens get added automatically.
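
To make that concrete, a generated script might look like this (the endpoint and payload are hypothetical; running first-party means the session cookie rides along):

    // Injected into the page's main world, so the site's own auth applies.
    async function sendConnectionRequest(profileId: string, note: string) {
      const res = await fetch("/api/v1/connections", {  // hypothetical endpoint
        method: "POST",
        credentials: "include",  // reuse the logged-in session
        headers: { "content-type": "application/json" },
        body: JSON.stringify({ profileId, note }),
      });
      return res.json();
    }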

You can just prompt to supply parameters and reuse the script at zero token cost.

The use cases we were targeting are things like Instagram DMs or LinkedIn connection requests, but it should also work for your healthcare use case!

Deeper dive: https://www.rtrvr.ai/blog/ai-subroutines-zero-token-determin...


Hey Alex, we had similar thoughts at Retriever AI about moving from webpage interactions to reverse engineering the underlying APIs.

Compared to your approach, we do this entirely within a browser extension, meeting users where they are already doing their existing work.

Within the extension you just record yourself doing a task; we reverse engineer the APIs and write a script. We then execute the script from within the webpage so that auth/headers/tokens get added automatically.

You can just prompt to reuse the tools at zero token cost.


Interesting. We essentially do the same thing, but with MITM. We have a Chrome extension internally, but have found it's a bit of a clunky interface. Might be releasing one soon. The approach of executing the script in the webpage is interesting. Best of luck!

Yea, would love to try out the extension! It's always interesting to see everyone's design approach.

I really don't want to install an app onto my laptop, especially an MITM proxy, so I think an extension would be better.


Cool. Can you give me the link to the tool?

The tool: https://www.rtrvr.ai/

Our recent technical write up on network discovery/ranking/codegen: https://www.rtrvr.ai/blog/ai-subroutines-zero-token-determin...


Really do think that spreadsheets are the optimal way to coordinate agents.

Each row spins up a parallel agent: existing columns map to the inputs, the agent executes, and it writes new columns as output.
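
A minimal sketch of that mapping (runAgent stands in for whatever actually executes the task):

    // One agent run per row; input columns in, output columns appended.
    type Row = Record<string, string>;

    async function runSheet(rows: Row[], runAgent: (input: Row) => Promise<Row>) {
      return Promise.all(
        rows.map(async (row) => ({ ...row, ...(await runAgent(row)) })),
      );
    }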

We tried an initial implementation of this with rtrvr.ai by building out Sheets Workflows, but I can't help feeling there's a thread here we're pulling toward a deeper insight.


We built out RoverBook as a fun idea from a DeepMind x Vercel hackathon: what if agents could leave comments/notes/reviews on websites, and these were surfaced in a PostHog-like analytics dashboard?

- agentic visitors can rate, comment, and leave notes on your site

- embed our script tag and it will leave instructions on how to call the API for leaving feedback (sketched below), as well as track agentic trajectories and identify failures

- more and more of a website's visitors will be agents, but there's no solution for collecting metrics and surfacing analytics on them
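
From the agent's side, leaving feedback could look something like this (the endpoint and payload shape are illustrative, not the shipped API):

    // What an agent might call after the embedded script advertises the API.
    await fetch("https://api.rtrvr.ai/roverbook/feedback", {  // hypothetical URL
      method: "POST",
      headers: { "content-type": "application/json" },
      body: JSON.stringify({
        site: location.hostname,
        rating: 4,
        note: "Checkout flow required a JS-rendered token; retried once.",
      }),
    });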

Deep dive: https://www.rtrvr.ai/blog/roverbook-posthog-for-ai-agents


Also, PageAgent's DOM-based understanding is pretty simple and built on top of Browser-Use's approach.

We, on the other hand, construct our own custom Agent Accessibility Trees to represent webpages to models. This approach scores roughly twice as well on WebBench's 300+ tasks (81% vs. 40%).
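
The raw material for such a tree is available over the Chrome DevTools Protocol; a rough sketch of the starting point (our actual construction and ranking is more involved):

    // MV3 extension: pull the full accessibility tree for a tab via CDP.
    async function getAXTree(tabId: number) {
      const target = { tabId };
      await chrome.debugger.attach(target, "1.3");
      await chrome.debugger.sendCommand(target, "Accessibility.enable");
      const { nodes } = (await chrome.debugger.sendCommand(
        target, "Accessibility.getFullAXTree")) as { nodes: any[] };
      await chrome.debugger.detach(target);
      // Keep only what the model needs: role and accessible name.
      return nodes.map((n) => ({ role: n.role?.value, name: n.name?.value }));
    }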


I appreciate the responses and will be looking deeply into Rover. Thank you.


Every website is just a wrapper around an API: GraphQL mutations, JSON endpoints, paginated XHR. The data layer is cleaner than anything you'd get from DOM parsing.

The hard part of raw HTTP scraping was always (1) finding the endpoints and (2) recreating auth. Your browser already has both. We built Vibe Hacking to let the agent use them.

The agent navigates the page, captures network activity, and generates scripts that replay those API calls at scale. Auth propagates automatically because it runs from inside the page.
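
A generated replay script tends to reduce to a loop like this (the cursor and response field names are invented for illustration):

    // Runs inside the page, so the site's own cookies authenticate each call.
    async function replayAll(endpoint: string) {
      const items: unknown[] = [];
      let cursor: string | undefined;
      do {
        const url = cursor ? `${endpoint}?cursor=${cursor}` : endpoint;
        const page = await (await fetch(url, { credentials: "include" })).json();
        items.push(...page.items);      // hypothetical response shape
        cursor = page.next_cursor;
      } while (cursor);
      return items;
    }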

We tested it on X: it pulled 2,000+ followed profiles despite the UI capping the list at 50.

DOM-native, no vision/screenshots, #1 on Halluminate WebBench (81.39%). Chrome extension, Gemini Flash Lite default (500 free req/day). Two ex-Google engineers, bootstrapped, 25K+ users.

Happy to answer questions on architecture or limitations.


I actually tried out PageAgent; it was reaaaally slow, and not that accurate.

You can actually try it out on our own site, rtrvr.ai

