
I can totally see the value of agent-driven flows for automating environments that are highly dynamic, poorly specified, error-prone, effectively zero-shot, etc., but that doesn't seem to be at all what you are demonstrating here. Maybe your demos could show something more "challenging" to automate?

As someone who has spent a LOT of my career working on browser automation and testing, speed and cost were always key. Even with existing programmatic tools like Selenium, Playwright, Cypress, etc., speed and headful hosting costs were already big issues. This seems orders of magnitude slower and more expensive. Curious how you pitch this to potential customers.


We've generally seen that the "easy" flows are not actually easy. Workflows with complex branching logic (shown when filling out the Aetna form in the first example in the video), structured scraping (the second example we showed in the video), and login/2FA/intelligent multi-agent scraping (shown in the last example in the video) are all difficult or impossible to do with traditional automation libraries.

We also have an example of a complex, multi-agent workflow here that might be useful for you to look at: https://www.simplex.sh/blog/context-engineering-for-web-agen...


Got it, I only looked at the website, not the YouTube video you posted above; my apologies. On the website, neither the billing platform demo nor the screenshots in the section below convey this value prop very well. Both sections show what appear to be trivial flows without explaining the underlying complexity.

I suppose if you are hitting your target demographic dead-on with your marketing efforts, the value prop should be completely obvious to them, but you could still be more explicit about your differentiation.


Replying to myself here... I would be interested to see a more hybrid approach where an AI could step in to retry / get past failures, or to re-record the automation steps for a flow when something changes, but having AI in the loop for every action, all the time, feels wasteful at best.


Yep, we actually cache flows after the first run! This makes flows that are closer to traditional RPA behave pretty much the same as using Playwright/Puppeteer.
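
To give a rough sense of what that looks like (a simplified sketch, not our exact implementation; the cache file, selectors, and helper names below are hypothetical), a cached flow can be replayed deterministically with plain Playwright, with the agent only needed again if a replay step fails:

    # Sketch only: the agent's first run resolves each step to a concrete
    # action + selector, which is persisted; later runs replay that cache
    # deterministically instead of calling the model for every action.
    import json
    from playwright.sync_api import sync_playwright

    CACHE_PATH = "flows/billing_portal.json"  # hypothetical cache file

    def replay_cached_flow(page, cached_actions):
        for step in cached_actions:
            if step["action"] == "goto":
                page.goto(step["url"])
            elif step["action"] == "fill":
                page.fill(step["selector"], step["value"])
            elif step["action"] == "click":
                page.click(step["selector"])

    with sync_playwright() as p:
        browser = p.chromium.launch()
        page = browser.new_page()
        with open(CACHE_PATH) as f:
            replay_cached_flow(page, json.load(f))
        browser.close()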


Great! I see that further down on the website, which I did not notice before posting this comment. I think this would be valuable to demonstrate / communicate in the billing platform demo, which is the first thing you see and is what captured all of my attention (I never even scrolled down).

Edit: I just re-ran the demo and it seemed way faster this time??? The first time, it said GOAL: PRESS_ENTER... (the agent then proceeded to think about it for 5-8 seconds), which seemed hilarious to me.


Sorry if this is a dumb question, but why would you cache a flow?


This is actually hilarious, because now they can't call it a fluke or an act of God.


Crashing it a second time is, in my books, an act of God. I guess God isn't really that fond of Amazon.


News Media: "Bezos has more money than God"

God: "Hold my staff"


I want to believe this, and I think I still do believe this... What made me waver was an interview I did with an engineer who had previously worked on pedestrian safety simulations at Waymo and had quit over ethical concerns. He wouldn't go into details, obviously, but it did make me think... This was in ~2019 or 2020, though, when they were still early in their development compared to now.


What is the output format here? An iframe? An SDK I can integrate into my webapp? A whitelabeled URL? A non-whitelabeled URL? Where is your documentation?


The output is literally a link, like Typeform, and that link will render the design you've customized within the app.


What model(s) / providers are you using? Are you training on the data that the agent gets access to? Seems like there are some data governance and privacy red flags for anything involving remotely sensitive data...


We're using OpenAI's API for business. They don't train on data sent to the business API, unlike the consumer tier.

This is still an early beta, so at the moment everything is only available with OpenAI's API. However, for people who want to use it in a higher-security environment, we'll support swapping OpenAI for any hosted model API, including on-premise models or models held in private VPCs. That way, people can manage their data with no exfiltration to a third party.
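
For illustration only (the endpoint and model name below are hypothetical placeholders, not part of our product today), pointing an OpenAI-compatible client at a self-hosted server is mostly a base URL change on the client side:

    # Sketch: swapping api.openai.com for an OpenAI-compatible endpoint
    # running inside your own VPC or on-premise. The URL and model name
    # are placeholders for whatever you actually host.
    from openai import OpenAI

    client = OpenAI(
        base_url="https://llm.internal.example.com/v1",  # hypothetical private endpoint
        api_key="not-needed-for-local",                   # many self-hosted servers ignore this
    )

    response = client.chat.completions.create(
        model="your-hosted-model",  # placeholder model name
        messages=[{"role": "user", "content": "Extract the invoice total from this page."}],
    )
    print(response.choices[0].message.content)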


Unfortunately, the techniques you are trying in order to get access to a dormant GitHub account are EXACTLY the same ones that GitHub gets spammed with every day by bad actors attempting supply chain attacks. You don't have anything that proves your identity any more than any rando on the internet, in GitHub's eyes at least. Everything you have presented here may be convincing enough to me, but probably not to GitHub's opsec policies.


Also, suppose your Facebook account was compromised; that's bad, and sad for the person affected. It may cause some media attention if the person is famous.

But if the right GitHub account is compromised, we could see massive supply chain issues, or a big, important web service with millions of users affected.

The downside of making a wrong call here is just really really big.

There are real businesses being deployed from GitHub.


I'm not even convinced it's the real person. Lost your items, lost your email, changed passwords, criminal records. Sounds like a scam for sure.

No offense, OP, but it seems easier to recover the email if you can prove physical identity.


WSJ is paywalled and also actively blocks archive.org crawls / snapshots, so just FYI 99% of people here can't read this article.


This article does a good job of comparing functionality between Codex and Claude, but I see very little discussion here or elsewhere about the actual UX of the CLI tools. Codex is absolute garbage when it comes to the look, feel, and overall polish of the CLI experience (no syntax highlighting, no proper diff displays, no vim mode, poor visual differentiation of user vs. agent messages, etc.). Claude is a tiny bit better. However, both fall flat on their faces compared to some open-source agentic TUIs like Opencode, Crush, etc.


I was just focusing on a few specific things in the Claude Code 2 release notes, namely the two new features, /rewind and /usage, and I was disappointed in both. The post is also probably a little too long as is. And yeah, I've heard a lot of people complain about the Codex CLI experience, though in a previous post I mentioned that many Redditors like this repo for improving it:

https://github.com/just-every/code

I haven't tried Crush, and I hadn't even heard of Opencode, so I'll have to check them out. Thanks for the feedback.


I have tried just about every third-party CLI / TUI, and I personally like Opencode the most. It has the best UX, and the fact that it natively integrates an LSP for the agent to interact with is excellent. It is limited to models available via API, so, for example, it couldn't use Codex at launch.


Yeah, I am curious what the actual resolution of these videos will be. The launch videos at this link will only play in like 360p for me.


Optimization hinders evolution. - Alan Perlis

Write that garbage code as long as it works. PMF doesn't give a shit about your code quality.


In general, a good rule of thumb is to write code only "clean" enough that you / your team / someone else can figure out what the hell you were doing in that particular area of the source code.

