I can totally see the value of agent-driven flows for automating tasks that are highly dynamic, poorly specified, error-prone, zero-shot, etc., but that doesn't seem to be at all what you are demonstrating here. Maybe your demos could show something more "challenging" to automate?
As someone who has spent a LOT of my career working on browser automation and testing, I've found speed and cost were always key. Even with existing programmatic tools like Selenium, Playwright, Cypress, etc., speed and headful hosting costs were already big issues. This seems orders of magnitude slower and more expensive. Curious how you pitch this to potential customers.
We've generally seen the "easy" flows not actually be easy. Workflows that have complex branching logic (shown when filling out the Aetna form in the first example in the video), structured scraping (the second example we showed in the video), and login/2FA/intelligent multi-agent scraping (shown in the last example in the video) are all things that are difficult or impossible to do with traditional automation libraries.
Got it, I only looked at the website, not the YouTube video you posted above, my apologies. On the website, neither the billing platform demo nor the screenshots in the section below convey this value prop very well. Both sections show what appear to be trivial flows without explaining any of the underlying complexities.
I suppose if you are hitting your target demographic dead-on with your marketing efforts, the value prop should be completely obvious to them, but you could still be more explicit about your differentiation.
replying to myself here... I would be interested to see a more hybrid approach where an AI could step in to help retry / get past failures, or as a way of re-recording automation steps for a flow when something changes, but having AI in the loop for every action all the time feels wasteful at best.
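To make the hybrid idea concrete, here is a minimal sketch, not anything the product actually does. Scripted steps run on the fast path, and a hypothetical `agent_fallback` callback is consulted only when a step raises. If the fallback returns a replacement action, that action is run and "re-recorded" onto the step so later runs stay on the cheap path. `Step`, `run_flow`, and `agent_fallback` are all made-up names for illustration.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Step:
    name: str
    run: Callable[[], None]  # fast scripted action, e.g. a recorded click


# Hypothetical fallback signature: given the failed step and the error,
# return a replacement action, or None if the agent cannot recover.
Fallback = Callable[[Step, Exception], Optional[Callable[[], None]]]


def run_flow(steps: List[Step], agent_fallback: Fallback) -> List[str]:
    """Run scripted steps; involve the agent only when a step fails.

    Returns the names of the steps the agent had to repair.
    """
    repaired = []
    for step in steps:
        try:
            step.run()
        except Exception as exc:
            # Slow, expensive path: only reached on failure.
            new_run = agent_fallback(step, exc)
            if new_run is None:
                raise  # agent could not help; surface the original error
            new_run()
            step.run = new_run  # re-record so the next run skips the agent
            repaired.append(step.name)
    return repaired
```

In this sketch the agent cost is paid once per breakage rather than once per action, which is the trade-off I'm arguing for.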
Great! I see that further down on the website, which I did not see before posting this comment. I think this could be valuable to demonstrate / communicate in the billing platform demo, which is the first thing you see and is what captured all of my attention (I never even scrolled down).
Edit: I just re-ran the demo and it seemed way faster this time??? The first time it said GOAL: PRESS_ENTER... (agent proceeds to think about it for 5-8 seconds), which seemed hilarious to me.
I want to believe this, and I think I still do believe this... What makes me waver in my position was an interview I gave to an engineer who had previously worked on pedestrian safety simulations at Waymo and had quit over ethical concerns. He wouldn't go into details obviously, but it did make me think... This was in ~2019 or 2020 though when they were still early in their development compared to now.
What is the output format here? An iframe? An SDK I can integrate into my webapp? A whitelabeled URL? A non-whitelabeled URL? Where is your documentation?
What model(s) / providers are you using? Are you training on the data that the agent gets access to? Seems like there are some data governance and privacy red flags for anything involving remotely sensitive data...
we're using OpenAI's API for business. they don't train on data sent to the business api, unlike the consumer tier
this is still an early beta, so at the moment everything is only available with OpenAI's API. however, for people who want to use it in a higher security environment, we'll support swapping OpenAI for any hosted model API, including on-premise models or models held in private VPCs. that way people can manage their data with no exfiltration to a third party
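As a rough illustration of what a swappable model endpoint could look like (purely a sketch under assumptions, not the product's actual configuration), the snippet below picks an endpoint from a config dict and refuses any host that is not on an allow-list, so traffic cannot silently go to a third party. `ModelEndpoint`, `select_endpoint`, the config keys, and the `llm.internal.example` host are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Set
from urllib.parse import urlparse


@dataclass(frozen=True)
class ModelEndpoint:
    provider: str  # free-form label, e.g. "openai" or "on-prem"
    base_url: str  # where completion requests would be sent


def select_endpoint(config: Dict[str, str], allowed_hosts: Set[str]) -> ModelEndpoint:
    """Build the endpoint from config, rejecting hosts outside the allow-list."""
    endpoint = ModelEndpoint(config["provider"], config["base_url"])
    host = urlparse(endpoint.base_url).hostname
    if host not in allowed_hosts:
        raise ValueError(f"model host {host!r} is not on the allow-list")
    return endpoint


# Hypothetical high-security deployment: only the private host is allowed.
private = select_endpoint(
    {"provider": "on-prem", "base_url": "https://llm.internal.example:8443/v1"},
    allowed_hosts={"llm.internal.example"},
)
```

The allow-list check is the part that matters for data governance: a misconfigured base URL fails loudly instead of shipping data somewhere unexpected.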
Unfortunately, the techniques you are trying in order to get access to a dormant GitHub account are EXACTLY the same ones that GitHub gets spammed with every day by bad actors attempting supply chain attacks. In GitHub's eyes at least, you don't have anything that proves your identity any more than any rando on the internet. Everything you have presented here may be convincing enough to me, but probably not to GitHub's opsec policies.
Also, suppose your Facebook account was compromised: that's bad and sad for the person affected. It may cause some media attention if the person was famous.
But if the right GitHub account is compromised, we could see massive supply chain issues. Or a big important web service with millions of users affected.
The downside of making a wrong call here is just really really big.
There are real businesses being deployed from GitHub.
This article does a good job of comparing functionality between Codex and Claude, but I see very little discussion here or elsewhere about the actual UX of the CLI tools. Codex is absolute garbage when it comes to the look, feel, and overall polish of the CLI experience (no syntax highlighting, no proper diff displays, no vim mode, poor visual differentiation of user vs agent messages, etc). Claude is a tiny bit better. However, both fall flat on their faces compared to some open source agentic TUIs like Opencode, Crush, etc.
I was just focusing on a few specific things in the Claude Code 2 release notes, namely the two features they added, /rewind and /usage, and I was disappointed in both. It's also probably a little too long as is. And yeah, I've heard a lot of people complain about the Codex CLI experience, though in a previous post I mentioned many Redditors like this repo to improve it:
I have tried just about every third party CLI / TUI and I personally like Opencode the most. It has the best UX and the fact it natively integrates LSP for the agent to interact with is excellent. It is limited to models available via API, so for example it couldn't use codex at launch.
In general, a good rule of thumb is to write code only "clean" enough that you / your team / someone else can figure out what the hell you were doing in that particular area of the source code.