
> That's a massive barrier to entry. You need enough data and compute to train a massive language model, more compute to run the model against all incoming queries, and then even more compute to handle the extra search load precipitated by use of the language model.

Luckily, most of the time you could improve my user experience by removing that cr*p and giving me my 2007-2009 Google back.

From there you would only need to let users create personal blacklists, share them (this was about the time when auto-generated content started to become popular), and maybe aggregate some popular blacklists into a default one, and it would be better than anything we have seen since.

(I remember having a txt-file with -spammydomain.com -anotherspammer.com etc. that I pasted in at the end of certain searches to take care of sites that either had

- auto-generated content

- or stuffed their pages with black-on-black or white-on-white keywords; a rough sketch of automating that trick follows)
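
Something like this would have automated the paste-in step (just a sketch; the file name, the helper name and the use of -site: instead of bare -domain.com exclusions are for illustration, not what I actually had back then):

    # Rough sketch: read a one-domain-per-line blacklist file and append
    # exclusion operators to a search query, instead of pasting them by hand.
    def blacklisted_query(query, blacklist_path="blacklist.txt"):
        with open(blacklist_path) as f:
            domains = [line.strip() for line in f if line.strip()]
        exclusions = " ".join("-site:" + d for d in domains)
        return query + " " + exclusions

    # blacklisted_query("angular mat-table sort")
    # -> 'angular mat-table sort -site:spammydomain.com -site:anotherspammer.com'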




Giving you 2007 Google might not work because people are using 2020 strategies to game it. But I'm definitely skeptical that that's all there is to it.


I hear this a lot.

But in all honesty it is not the SEO scammers' fault that Google serves me pages that don't contain the words I searched for after I have chosen the verbatim option.

It also isn't the SEO scammers' fault that when I search for Angular mat-table[0] I get a number of pictures of tables with mats on them. That is probably the result of someone playing with some cool AI tools while others are busy finding more efficient ways to ignore customer feedback ;-)

We must manage to keep these two thoughts in our heads simultaneously:

- Black hat SEO has changed

- Google has adapted to another audience and has ditched us power users, hoping we wouldn't notice.

[0]: screenshots of that and some other clear examples of Google and Amazon testing out AI in production here: https://erik.itland.no/tag:aifails


Do you know many black hat techniques? Around 2011-2013 is when Google shifted from being extremely easy to game to very difficult; 2014 was really the end of it. Have a look at some niche-site blogs from the time - revenue from new niche sites tanked from around $1.5k/month each to barely $100 (with a lot more work up front).

Anyway my point is if you rewound the clock to 2008, you'd have a way bigger problem than you might think.


Fine. But we must still be able to separate backend from frontend: it should be possible to upgrade the anti-spam machinery without breaking

- double quotes

- + (ok, they broke that deliberately around the launch of Google+)

- the verbatim operator

All those should be able to work even if the crawler and processing techniques are updated, right?

Also a heads up: I added some more details to my post above; I didn't think you would answer so fast :-)

Edit: I only know the black hat methods that were well known 10 years ago, like:

- backlink farming from comment fields (we protected against it by applying nofollow to all links in comments; see the snippet after this list)

- Google bombing (coordinated efforts to link to certain pages with particular words in the link text, trying to get Google to return a specific result for an unrelated query. I think the canonical example was a bunch of people making links with the text "miserable failure" that all pointed to the White House website.)

- Link-for-link schemes

- etc
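
The nofollow protection mentioned above just means rewriting user-submitted links before rendering them. A rough sketch of the idea (a real implementation would use an HTML parser rather than a regex, and the function name is made up):

    import re

    # Add rel="nofollow" to every <a> tag in user-submitted comment HTML so
    # crawlers don't pass link equity to spammed links. Skips tags that
    # already carry a rel attribute (it won't merge values; this is a sketch).
    def nofollow_links(comment_html):
        return re.sub(r'<a\b(?![^>]*\brel=)', '<a rel="nofollow"', comment_html)

    # nofollow_links('Check <a href="https://spammydomain.com">this</a>!')
    # -> 'Check <a rel="nofollow" href="https://spammydomain.com">this</a>!'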



