
Makes sense, thanks. I wonder whether human web-browsing strategies are optimal for use in an LLM, e.g. given how much faster LLMs are than humans at reading the webpages they find? Regardless, it does seem likely that Google’s dataset is good for something.





Take this example:

A human googles "how much does a tire cost?"

They pick out a website from search results, then nav within it to the correct product page and maybe scroll until the price is visible on screen.

Google captures a lot of that data on third party sites. From Perplexity:

Google Analytics: If the website uses Google Analytics, Google can collect data about user behavior on that site, including page views, time on site, and user flow.

Google Ads: Websites using Google Ads may allow Google to track user interactions for ad targeting and conversion tracking.

Other Google Services: Sites implementing services like Google Tag Manager or using embedded YouTube videos may provide additional tracking opportunities.

So you can imagine that Google has a kajillion training examples that go: search query (which implies task) -> pick webpage -> actions within webpage -> user stops (success), or user backs off site/tries different query (failure)
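A rough sketch of what one such training example might look like. The field names, the `ended_by` values, and the labelling rule are all my own assumptions for illustration, not anything Google has documented:

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Session:
    """One hypothetical browsing session that started from a search."""
    query: str                  # search query, which implies the task
    clicked_url: str            # webpage picked from the results
    actions: List[str] = field(default_factory=list)  # actions within the page
    ended_by: str = "stop"      # assumed values: "stop", "back", "requery"


def label(session: Session) -> str:
    """Assumed heuristic: stopping means success, backing off or re-querying means failure."""
    return "success" if session.ended_by == "stop" else "failure"


sessions = [
    Session("how much does a tire cost?", "https://example.com/tires",
            ["open product page", "scroll to price"], "stop"),
    Session("how much does a tire cost?", "https://example.org/blog",
            ["scroll"], "back"),
]

for s in sessions:
    print(s.query, "->", s.clicked_url, "->", s.actions, "->", label(s))
```

Real signals would be a lot noisier than this (a user who stops may simply have given up), so treat the success/failure label as a weak proxy.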

You can imagine that even a super efficient AI agent still needs to learn how to formulate queries, pick out a site to visit, and nav through the site, i.e. all the same steps a human takes to perform a task. Google's dataset is huge, unparalleled, and perfect for this.



