Hacker News | cjonas's comments

How do you get by when every major site starts blocking headless browsers? A good example right now is Zillow, but I foresee a world where big chunks of the internet are behind captchas and bot detection.

That's not really a problem for Stagehand. It's a problem for Selenium, Playwright, Puppeteer and others at the browser automation library level.

It's not really a problem for Playwright, because Playwright is really intended to be run by the owners of the website, not as a web scraper.

It may become a real problem for the usefulness of this style of LLM driven browsing.


Playwright does have some docs on scraping, which makes it seem like they do want to support that as a use case: https://playwright.dev/docs/docker#crawling-and-scraping. I'm unfamiliar with that, though, and wouldn't be able to speak to addressing issues with scraping.

Right, but I mean it's a plugin issue outside of Stagehand. That's currently how the space treats the issue.

There are plugins that extend the major browser automation libraries to do this sort of thing.

And undetectable browser automation doesn't use Playwright, Puppeteer, or Selenium at all.


What makes the "agents"? From what I can tell, they don't perform any external actions. I would call these chatbots...


> What makes the "agents"

As most people understand them, a lot of marketing and imagination. There's some fairly dense theory on the underlying concept that is probably inaccessible to someone without a math degree, and much of this research happened well after my academic days so I cannot comment on a deep level, but I'm extremely skeptical of its attempted implementation in the current market, particularly when language models are at the core of it. From what I understand, an agentic system can exist entirely without LLMs and all the Rube Goldberg machinations behind giving it the appearance of working. However, this is the course that every single tech company has gone all in on, so we'll see how it goes. I suspect that LLMs will be discarded in favor of something much better in the mid to far future, but the fact that no one can currently say what that is or even would look like is a little bit concerning to me when large bets are being made that it will just happen inevitably on nearer-term timelines.


By default the "agent" is pretty limited in actions (only talking and thinking).

Currently it's up to users to add their own actions; you can find an example where I gave the agents access to ChatGPT (cf. link).

Obviously, some default actions will be added in the future :D

link to example: https://github.com/Thytu/Agentarium/blob/main/examples/3_add...


I don’t (at very brief glance) see either agentic or workflow code there, I think it’s up to the users of the library to bring their own agents?

(this being the only sensible definition I’ve seen for what has very definitely been a “fund my startup” marketing term until now: https://www.anthropic.com/research/building-effective-agents)


Ya, these are the definitions I'd use.

While all "Agentic" systems will likely use some form of function calling, not all function calling is "Agentic". Most implementations are more "Workflows" than "Agents".
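To make the distinction concrete, here is a rough sketch under my own assumptions, not anyone's actual library code. The tools (`search`, `summarize`) and the `fake_model` chooser are hypothetical stand-ins for real function-calling targets and a real LLM. In the workflow the code fixes the order of calls; in the agent loop the model picks the next call at each step.

```python
from typing import Callable, Dict, List

# Hypothetical tools; in a real system these would be function-calling targets.
def search(query: str) -> str:
    return f"results for {query}"

def summarize(text: str) -> str:
    return f"summary of {text}"

TOOLS: Dict[str, Callable[[str], str]] = {"search": search, "summarize": summarize}

def workflow(question: str) -> str:
    """Workflow: the order of tool calls is fixed in code ahead of time."""
    return summarize(search(question))

def fake_model(history: List[str]) -> str:
    """Stand-in for an LLM (assumption): returns a tool name or 'done'."""
    if len(history) == 1:
        return "search"
    if len(history) == 2:
        return "summarize"
    return "done"

def agent(question: str, choose: Callable[[List[str]], str], max_steps: int = 5) -> str:
    """Agent: the model decides which tool to call next, until it says 'done'."""
    history = [question]
    for _ in range(max_steps):
        choice = choose(history)
        if choice == "done":
            break
        # Feed the latest result into whichever tool the model picked.
        history.append(TOOLS[choice](history[-1]))
    return history[-1]
```

Both paths use function calling, and with this toy chooser they happen to produce the same answer. The difference is only in who controls the sequence: the programmer in `workflow`, the model in `agent`.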


You would need a mathematical theory of embedded agency to dig yourself out of the terminological hellhole.


can you point me to some materials that are easy to digest?


Easy to digest to me is a matter of process plus order. Just because you can boof wine/ai and feel a greater high than if you just took a sip doesn't mean you should. I'd start by setting the table and throwing out your last meal. Start over in the 60s and work forward.

stafford beer on cybernetics (also worth mentioning, norbert wiener): https://www.youtube.com/watch?v=JJ6orMfmorg

Lots of other people start w/ other things but i'm a mgmt minded person so a social engineering + psychology + anthropology oriented lens has always been my anchor.

My first real intro to math where everything clicked was w/ primitive graph theory as it were 2000+ years ago. From there, algebra, geometry, trig, calc, etc. started clicking.


love norbert wiener, fantastic. the great cyberneticians of their times were such far future forward visionaries it's a bit astounding.


A. Demski's illustrated, more blog-style (rather than papers) works online.


What I mean is that we want something like set theory merged with weak computability theory (e.g. Kolmogorov complexity), but where set means environment and embedded agent means element inside the set, and then everything else gets built on top of that theory. This may be an infinitely-large-feeling work because there are many rules & interactions & games, but you are essentially answering what it means to be a rational embedded agent that's part of an environment, one that probably doesn't have the exact same wants, in the abstract & formal sense of the inquiry.

Once you start examining scenarios where there are multiple clones of you that you perhaps cannot tell apart, your memory has been limited, you are facing larger amounts of pain & pleasure than you can handle without going insane, everyone wants the same thing but you also want something different, or someone gets mind-reading powers that might be partial and only work when they reveal their thoughts... then you are analytically working towards the end goal of building a larger catalogue of "everything" & making a theory & terminology out of that.

I would dedicate 2-20 years to the formal work, but there's no funding I know of that I could get & besides, I couldn't promise results, since foundational work yields no results until it does, if it ever does.

---

A related note: Kolmogorov complexity & finitism go along beautifully, but the bit-lifting needed to calculate the memory a computational process takes is so far out of my capacity. Nevertheless, you can define the maximum number ever allowed to exist in a system as a Church numeral & then perhaps take the minimum amount of logical operations required to get there as the maximum amount of memory your operations can take, closing the system, since in, say, subtraction you lose memory to a, b, the process's initial size (hello, Kolmogorov) & the process memory consumed at most (in any procedure, the smallest procedure being the number itself).


"AI agent" -> LLM with function calling

is the new

"AI" -> LLM


Isn't this exactly what ansel tried to do?


I previously used WiseBanyan, which was a robo-advisor with zero fees (their model was to charge more for add-ons like tax-loss harvesting). Then they added small portfolio management fees... Then they sold to another institution, and my fees are now in line with every other robo-advisor.

What would you say to someone who is skeptical about the long-term viability of your low-fee promise?


I imagine what it could have been if it wasn't acquired by Salesforce and it makes me sad


Anyone got the insider scoop on how the "easy" mode of the delta chess app ended up being a savage?


There was a guess somewhere in the video comments that I found convincing. (I watched the video a few days ago and can no longer find the comment).

It claimed some old chess engines used time limits on certain computations as a means of configuring difficulty. Given more time, the engine may be able to look further ahead.

If such limits were tuned for older hardware, then upgrading the computer could significantly increase the difficulty.
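A back-of-the-envelope sketch of that guess, not the actual engine's code: assume the engine does iterative deepening under a fixed time limit, and that a full search to ply d costs roughly `branching ** d` nodes. The function name, the branching factor of 30, and the nodes-per-second figures are all illustrative assumptions.

```python
def depth_reached(time_limit_s: float, nodes_per_second: float, branching: int = 30) -> int:
    """Deepest ply fully searched by iterative deepening within the time limit.

    Assumes (for illustration only) that searching ply d costs branching ** d
    nodes, and that the hardware evaluates nodes_per_second nodes each second.
    """
    budget = time_limit_s * nodes_per_second  # total nodes we can afford
    depth = 0
    spent = 0
    while True:
        cost = branching ** (depth + 1)  # nodes needed to finish the next ply
        if spent + cost > budget:
            return depth
        spent += cost
        depth += 1


# Same "easy" time limit, different hardware (made-up speeds).
old_hw = depth_reached(1.0, 1_000)
new_hw = depth_reached(1.0, 1_000_000)
```

With these made-up numbers the same one-second limit reaches 2 plies on the slow machine and 4 on the fast one, which is the kind of jump that would turn an "easy" setting into a much stronger opponent.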


Favorite: earthworms

Least favorite: spotted knapweed



By the time you've figured out how to install and run this, you've probably learned more than the class you're cheating in had to offer.


This is what happened to me in middle school. I wrote a bunch of programs to “help” me with algebra II tests and ended up inadvertently learning an insanely marketable skill in the process.


Similar story here. I wrote programs in TI-Basic to help with some exam questions, and wrote summaries in TI NoteFolio. I ended up getting so many few-euro donations for them from everyone who used them. Thirty minutes before the exams, we'd be exchanging these things over the TI-link connection.

Of course, nowadays all these TI calculators have exam mode, which blocks the internal memory and lights a bright LED to indicate this mode is enabled. And the older calculators with practically the same feature set (TI-83, TI-84) are now forbidden (to clarify: the same TI-8x calculators are still allowed, but only if they have exam mode). Nothing is fun anymore these days.


Same but high school. I ended up porting the Drug Warz game onto my TI calculator and playing it while in class instead. Now some ~23 years later I'm a Python wizard.


At least, the one techy guy selling them to the other students has.


I vaguely recall some children's book in which the protagonists spent hours and hours poring over their math textbooks and copying all the important things onto hidden cheatsheets, only to realize--to their horror--that they'd actually studied after all.


Ah yes, we apparently would prefer an oligarchic theocracy.


If you're not an expert in something, you should probably listen to those who are...

The general public does not have the time to study climate models and review raw data on vaccine efficacy.

Without overall "trust in science" (or perhaps "trust in the scientific community") we lose the ability to form data driven policy.

The exception being when there is monetary value to be gained by fooling the public. IOW, be skeptical of studies funded by special interests, but

"Do not be so open-minded that your brains fall out."


Just because I listen doesn't mean I'm going to become.

I like making my own opinions, because that's how I become informed. If I'm just taking in what others say, I'm just an NPC acting along their programming.

