agrnet's comments

agrnet · 2025-10-14T23:18:30 1760483910

could you explain what it means for someone to “have interesting latent spaces”? curious how you’re using that metaphor here

bikeshaving · 2025-10-15T01:35:00 1760492100

I don’t think I’m using it as a metaphor? To “have interesting latent spaces” just means you have access to the actual weights and biases, the artifact produced by fine-tuning/training models, or you can somehow “see” activations as you feed input through the model. These can be turned into interesting 3D visualizations that reveal “latent” connections in the data. Those connections often align with similarities in the actual phenomena which these “spaces” classify, and help us articulate them.

Not many people have the privilege of access to these artifacts, or the skill to interpret these abstract, multi-dimensional spaces. I want more of these visualizations, with more spaces which encode different modalities.

https://en.wikipedia.org/wiki/Latent_space
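A minimal sketch of what a “latent connection” looks like in practice, assuming you already have one activation vector per input. The four-dimensional vectors and the names `TOY_VECTORS`, `cosine_similarity` and `nearest_neighbor` are made up for illustration; real models produce vectors with hundreds or thousands of dimensions, and nothing here comes from an actual model.

```python
import math

# Hypothetical toy "activations", one vector per input.
# The values are invented for illustration only.
TOY_VECTORS = {
    "cat": [0.9, 0.8, 0.1, 0.0],
    "dog": [0.8, 0.9, 0.2, 0.1],
    "car": [0.1, 0.0, 0.9, 0.8],
    "truck": [0.0, 0.2, 0.8, 0.9],
}


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def nearest_neighbor(name, vectors):
    """Return the other key whose vector points in the most similar direction."""
    query = vectors[name]
    others = (key for key in vectors if key != name)
    return max(others, key=lambda key: cosine_similarity(query, vectors[key]))


for name in TOY_VECTORS:
    print(name, "->", nearest_neighbor(name, TOY_VECTORS))
```

With these toy values the animals pair up with each other and so do the vehicles. A 3D visualization is the same idea after projecting the vectors down to three dimensions.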

agrnet · 2025-10-10T18:24:57 1760120697

At least in my industry (highly regulated), I think it would be better if these agentic e2e tools output Playwright code instead of keeping it all under the hood, as no risk-averse regulated company will use a QA agent which could be nondeterministic when rerunning the same test

tarasyarema · 2025-10-10T19:03:39 1760123019

As I mentioned above, Playwright alone won’t make the cut for many of the serious test cases we’ve seen. You need a whole system that ensures your tests are run and improved immediately. We created this project in a way that supports on-premise deployments, but you’ll need to run the whole engine and eventually use some SLMs/LLMs at different stages.

agrnet · 2025-10-11T00:40:02 1760143202

At the end of the day, is the LLM not just calling Playwright APIs? I’d rather have access to the final set of Playwright API steps that the LLM executed to accomplish a goal, rather than just hoping the LLM will choose the same actions again the second time I run it
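As a minimal sketch of that idea (not how any particular tool works): wrap the page object, log every call the agent makes, and replay the log later without the LLM in the loop. `StepRecorder`, `replay` and `FakePage` are hypothetical names, and `FakePage` is a stand-in so the sketch runs without a browser. The method names `goto`, `fill` and `click` mirror Playwright's page API, but a real page object is not exercised here, and only positional arguments are handled.

```python
import json


class StepRecorder:
    """Wraps a page-like object and logs each method call for later replay."""

    def __init__(self, page):
        self._page = page
        self.steps = []

    def __getattr__(self, name):
        # Only reached for attributes not set in __init__, i.e. page methods.
        target = getattr(self._page, name)

        def recorded(*args):
            self.steps.append({"method": name, "args": list(args)})
            return target(*args)

        return recorded


def replay(page, steps):
    """Run previously recorded steps, in order, against a page-like object."""
    for step in steps:
        getattr(page, step["method"])(*step["args"])


class FakePage:
    """Stand-in for a browser page; it only remembers what was called."""

    def __init__(self):
        self.calls = []

    def goto(self, url):
        self.calls.append(("goto", url))

    def fill(self, selector, value):
        self.calls.append(("fill", selector, value))

    def click(self, selector):
        self.calls.append(("click", selector))


# First run: the agent drives the recorder instead of the page directly.
first_run = FakePage()
recorder = StepRecorder(first_run)
recorder.goto("https://example.com/login")
recorder.fill("#user", "alice")
recorder.click("text=Sign in")

# The log is plain JSON, so it can be stored, reviewed and diffed.
script = json.dumps(recorder.steps)

# Second run: same steps, no model involved.
second_run = FakePage()
replay(second_run, json.loads(script))
```

The stored JSON is the part that matters for audit purposes: it is the fixed list of steps a reviewer can sign off on, and every rerun executes exactly that list.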

tarasyarema · 2025-10-11T10:20:46 1760178046

We use PW for the interaction with the browser, but really how we represent what to do is in a custom format (could be executed in other frameworks too). So the PW we could generate would be a subset, where the more interesting parts (custom functions) are not really implemented in PW.

Also, part of our format is specifically about finding a deterministic way of running steps, with automatic healing when they fail. And we also built the whole system in a way that is self-hostable, so in the cases you mention you would be able to have control over what is run and where.

agrnet · 2025-09-30T04:03:09 1759204989

agrnet · 2025-05-15T01:18:34 1747271914

This is why I always turn off these settings immediately when I turn on any video game for the first time. I could never put my finger on why I didn’t like it, but the camera analogy is perfect