Hacker News new | past | comments | ask | show | jobs | submit | cschiller's comments login

Good call! The timing was actually a coincidence, but not unexpected. OpenAI had already announced their plans to work on a desktop agent, so it was only a matter of time.

From our tests, even the latest model snapshots aren't yet reliable enough in positional accuracy. That's why we still augment them with specialized object detection models. As foundation models continue to improve, we believe our QA suite - covering test case management, reporting, agent orchestration, and infrastructure - will become even more relevant for the end user. Exciting times ahead!
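To illustrate what augmenting an LLM with object detection can look like, here is a minimal, hypothetical sketch (not the actual pipeline): the LLM's coarse click prediction is snapped to the nearest detected UI element box, so the tap lands inside a real control even when the model's positional accuracy is off.

```python
# Hypothetical sketch: combine an LLM's approximate click coordinate with
# object-detection bounding boxes by snapping to the nearest element center.
# All names and the data format here are assumptions for illustration.
def snap_to_element(point, boxes):
    """point: (x, y); boxes: list of (x1, y1, x2, y2). Returns a tap target."""
    def center(b):
        return ((b[0] + b[2]) / 2, (b[1] + b[3]) / 2)

    def dist2(p, q):
        return (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2

    best = min(boxes, key=lambda b: dist2(point, center(b)))
    return center(best)

detected = [(0, 0, 100, 40), (0, 100, 100, 140)]  # two detected buttons
print(snap_to_element((48, 55), detected))  # nearer the first button
```

Even a simple post-processing step like this can turn an off-by-a-few-pixels prediction into a reliable tap.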


Thank you! Sonnet 3.5 is indeed a powerful model, and we're actually using it. However, even with the latest version, there are still some limitations affecting our specific use case. For instance, the model struggles to accurately recognize semi-overlaid areas, such as popups that block interactions, and it has trouble consistently detecting when UI elements are in a disabled state.

To address these issues, we enhance the models with our own custom logic and specialized models, which helps us achieve more reliable results.

Looking forward, we expect our QA Studio to become even more powerful as we integrate tools like test management, reporting, and infrastructure, especially as models improve. We're excited about the possibilities ahead!


Hi cschiller, I think we can help you with those issues at Waldo. I guess you are using Appium under the hood to get the UI hierarchy. At Waldo we developed a competing (proprietary) engine that solves a lot of Appium problems.

We provide the most accurate view hierarchy for mobile apps (including React Native and Flutter apps), and we do it under 500ms for each view.

I would love to get in touch: e.de-lansalut [at] tricentis.com

Here is an example of what we are able to do: https://share.waldo.com/7a45b5bd364edbf17c578070ce8bde220240...


Thanks for sharing your experience! Completely agree - there's often a huge gap between the perception that testing is "solved" and the reality of manual QA still being necessary, even for core features. We recently had a call with one of the largest US mobile teams and were surprised to learn they're still doing extensive manual testing because some use cases remain uncovered by traditional tools. It's definitely not as "solved" as many might think.

Thanks for your thoughtful response! Agree that digging into the root cause of a failure, especially in complex microservice setups, can be incredibly time-consuming.

Regarding writing robust e2e tests, I think it really depends on the team's experience and the organization’s setup. We’ve found that in some organizations—particularly those with large, fast-moving engineering teams—test creation and maintenance can still be a bottleneck due to the flakiness of their e2e tests.

For example, we’ve seen an e-commerce team with 150+ mobile engineers struggle to keep their functional tests up-to-date while the company was running copy and marketing experiments. Another team in the food delivery space faced issues where unrelated changes in webviews caused their e2e tests to fail, making it impossible to run tests in a production-like system.

Our goal is to help free up that time so that teams can focus on solving bigger challenges, like the debugging problems you’ve mentioned.



One of our customers recently compared GPT Driver with Maestro’s Robin (formerly App Quality CoPilot). Their mobile platform engineering manager highlighted three key reasons for choosing us: lack of frustration, ease of implementation, and reliability.

To be more concrete, their words were:

- “What you define, you can tweak, touch the detail, and customize, saving you time.”
- “You don’t entirely rely on AI. You stay involved, avoiding misinterpretations by AI.”
- “Flexibility to refine, by using templates and triggering partial tests, features that come from real-world experience. This speeds up the process significantly.”

Our understanding is that because we launched the first version of GPT Driver in April 2023, we’ve built it in an “AI-native” way, while other tools are simply adding AI-based features on top. We worked closely with leading mobile teams, including Duolingo, to ensure we stay as aligned as possible with real-world challenges.

While our focus is on mobile, GPT Driver also works effectively on web platforms.


I agree that it can seem counterintuitive at first to apply LLM solutions to testing. However, in end-to-end testing, we’ve found that introducing a level of flexibility can actually be beneficial.

Take, for example, scenarios involving social logins or payments where external webviews are opened. These often trigger cookie consent forms or other unexpected elements, which the app developer has limited control over. The complexity increases when these elements have unstable identifiers or frequently changing attributes. In such cases, even though the core functionality (e.g., logging in) works as expected, traditional test automation often fails, requiring constant maintenance.

The key, as noted in other comments, is ensuring the solution is good at distinguishing between meaningful test failures and non-issues.
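The kind of flexibility described above can be sketched in a few lines. This is a hypothetical illustration (not GPT Driver's actual implementation): a resolver that first tries a stable identifier and, when the identifier has changed (as often happens in webviews), falls back to fuzzy matching on the visible label.

```python
# Hypothetical sketch: locate a UI element despite unstable identifiers by
# falling back from an exact id to fuzzy text matching. Names and the
# Element structure are assumptions for illustration.
from dataclasses import dataclass
from difflib import SequenceMatcher

@dataclass
class Element:
    identifier: str  # may change between builds/deploys
    text: str        # visible label, usually more stable

def find_element(elements, target_id, target_text, threshold=0.8):
    """Prefer a stable identifier; fall back to fuzzy label matching."""
    for el in elements:
        if el.identifier == target_id:
            return el
    # Identifier changed (e.g. regenerated by a webview) -> match by text.
    best, score = None, 0.0
    for el in elements:
        s = SequenceMatcher(None, el.text.lower(), target_text.lower()).ratio()
        if s > score:
            best, score = el, s
    return best if score >= threshold else None

screen = [
    Element("btn-x9f2", "Accept all cookies"),  # id differs per deploy
    Element("login-button", "Log in"),
]
assert find_element(screen, "login-button", "Log in").text == "Log in"
# The old id "btn-consent" no longer exists, but the label still matches:
assert find_element(screen, "btn-consent", "Accept All Cookies").identifier == "btn-x9f2"
```

A traditional locator-based test would fail the moment the id changes; the fallback keeps the test focused on whether the functionality itself still works.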


I would say that mobile apps are still the primary format for launching new consumer services, incl. new apps like ChatGPT and many others. However, we’ve observed that teams are expected to do more with less: delivering high-quality products while ensuring compliance, often with the same or even smaller team sizes. This is why we focus on minimizing the engineering burden, particularly for repetitive tasks like regression testing, which can be especially painful to maintain in the mobile ecosystem due to the use of third-party integrations (authentication, payments, etc.).

> mobile apps are still the primary format for launching new consumer services, incl. new apps like ChatGPT and many others

OpenAI launched ChatGPT to the public on the web first, and it took, I think, several months from when I first used their public web version until they had an official app for it in the App Store. In the meantime, some third-party apps for using ChatGPT popped up in the App Store. I kept using the web version until the official app showed up. Having the mobile app in the App Store has probably helped them grow to the number of users they have now. But IMO, ChatGPT as a product was not itself “launched” on the App Store, and they seemed to do very well in terms of adoption even when they initially had only the web version. The main point, that mobile apps are still desired, I agree with though.


Yes, great point! We have an 'Assistant' feature where you perform the flow on the device, and we automatically generate the test case as you navigate the app. As you mentioned, it’s a great starting point for quickly automating the functional flow. Afterwards, you can add more detailed assertions as needed. Technically, we do this by using both the UI hierarchy from the app and vision models to generate the test prompt.
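The recording-to-test-case step can be sketched roughly like this. The action format and function names below are assumptions for illustration, not GPT Driver's actual schema: a recorded interaction trace is translated into natural-language test steps.

```python
# Hypothetical sketch: turn a recorded interaction trace (taps, inputs,
# visibility checks) into natural-language test steps.
def actions_to_steps(actions):
    steps = []
    for a in actions:
        if a["type"] == "tap":
            steps.append(f'Tap on "{a["label"]}"')
        elif a["type"] == "input":
            steps.append(f'Enter "{a["value"]}" into "{a["label"]}"')
        elif a["type"] == "assert_visible":
            steps.append(f'Verify that "{a["label"]}" is visible')
    return steps

recorded = [
    {"type": "tap", "label": "Log in"},
    {"type": "input", "label": "Email", "value": "user@example.com"},
    {"type": "assert_visible", "label": "Home feed"},
]
print("\n".join(actions_to_steps(recorded)))
```

In practice the labels would come from the UI hierarchy or a vision model, and the generated steps would then be edited and extended with additional assertions.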

If you are interested in weekly recipe ideas tailored to your preferences, easy-to-follow step-by-step cooking instructions, and a handy shopping list export / cart transfer to Amazon Fresh/Whole Foods Market, feel free to check out Kitchenful (https://www.producthunt.com/products/kitchenful#kitchenful).

It's an app that I have built with my team over the last few months to streamline the process of cooking at home.


So, I did this personal project a couple of years ago where I downloaded a big recipe database and wrote a bunch of parsing/cleaning code to reduce each recipe down to a list of ingredients. Then I analyzed the ingredients to determine which were the most effective (i.e., most frequently used across recipes), as it occurred to me that this could be used to create a shopping list that enables the most home cooking. The resulting ingredient list was fascinating: salt and oil were right at the top. Anyway, you might want to steer people's shopping in the direction that enables the most meals for them.
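The core of that analysis fits in a few lines. Here is a minimal sketch with toy data standing in for the parsed recipe database (the variable names and data are made up for illustration):

```python
# Minimal sketch of the analysis described above: count how often each
# ingredient appears across recipes to find the most "enabling" staples.
from collections import Counter

recipes = [  # toy data standing in for the parsed recipe database
    ["salt", "olive oil", "garlic", "pasta"],
    ["salt", "butter", "flour", "egg"],
    ["olive oil", "salt", "tomato", "basil"],
]

counts = Counter(ing for recipe in recipes for ing in recipe)
staples = [ing for ing, _ in counts.most_common(3)]
print(staples)  # salt appears in every recipe, so it ranks first
```

With a real database you'd likely also normalize ingredient names (plurals, synonyms) before counting, which is where most of the parsing/cleaning effort goes.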


That sounds very cool! Did you share this list by any chance? I'm interested to stock up on some "efficient" products for the times that I'm not meal-planning but want to quickly whip something together.


I'm a happy user of Kitchenful and it has definitely saved me a lot of time when meal-planning. The recipes are tasty too.

I signed up just when they were still in the "e-mailing every client personally phase", and it's been a joy to talk to Christian and see their product improve.


Thank you for putting it so well!

The beauty of it is that everyone is likely to face or notice problems in their own life that offer a chance to be solved. And usually one is not the only one facing them :)

Hopefully this inspires someone to solve a problem which nags them and which they can get passionate about.

We are excited about the opportunity to work on this problem. And are grateful for the constructive feedback we are receiving from the HN community.

