Ask HN: Why does Google rank the real Python documentation below content farms?
274 points by josephcsible on Sept 12, 2023 | 142 comments
Do a Google search for "python endswith". Obviously, https://docs.python.org/3/library/stdtypes.html#str.endswith is the correct best hit for that term. Why does Google rank that page below all of the well-known low-quality content farms like GeeksforGeeks, W3Schools, and Tutorialspoint?



Hot take here, but as someone that doesn't code daily, I prefer those sites over the actual docs in most cases.

If I need to get something done quick, those sites will give me a quick 5-second refresher with clear examples.

Actually, in the doc you described as "obviously the correct hit", all I see is

> str.endswith(suffix[, start[, end]])

> Return True if the string ends with the specified suffix, otherwise return False. suffix can also be a tuple of suffixes to look for. With optional start, test beginning at that position. With optional end, stop comparing at that position.

Meanwhile, the first hit in Google for me is Programiz, which has actual real examples without any additional clicking around or trying to understand how the information is structured.

Besides, I know the docs exist, I don't need a google search for it. I'll click on the content farms every time because they've consistently been the fastest way for me to get what I need.


Yeah, I enjoy the content farms. Who needs documentation on a simple problem when you can instead have something easy like this:

Many are asking questions about the Python Documentation How To Check Null Python 2023. Here are some solutions to How To Check Null Python 2023:

<ADVERTISEMENT>

Solution 1: Type "if variable is None"

<ADVERTISEMENT>

The first solution out of many is to type "if variable is None". This is the best way to solve Python issues, recommended by experts.

<ADVERTISEMENT>

Solution 2: Download Windows Computer Cleaner


That seems a bit disingenuous on your part. I picked the top 3 hits on Google and they are all very helpful and to the point: Programiz, W3Schools, Tutorialspoint. Granted, I have uBlock Origin, as most people should anyway.


This got a chuckle out of me, but plenty of content farms for programming are extraordinarily useful. One could make the case that W3Schools is a content farm, and it has taught a whole generation of programmers.


> "if variable is None"

Is missing :

We all know we should do the following, it's much clearer.

> # Example of a dictionary to check None
> # Assign None to a variable
> myvar = None
> # Declare dictionary to check None
> mydictionary = {None: 'None is stored in this variable'}
> print(mydictionary[myvar])

If I expected something, I would try it.
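Joking aside, the idiomatic check quoted above only needs its trailing colon. A minimal sketch (the `describe` helper is hypothetical) that also shows why `is None` is not the same as a plain truthiness test:

```python
def describe(value):
    # "is None" tests identity with the None singleton; note the trailing colon
    if value is None:
        return "value is None"
    # Falsy but not None: 0, "", [], {} all land here
    if not value:
        return "value is falsy but not None"
    return "value is set"


print(describe(None))  # value is None
print(describe(0))     # value is falsy but not None
print(describe("x"))   # value is set
```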


You don't have adblock?


Every recipe website!


Well, I for one think it's quite kind for them to offer a free computer cleaner! Just let me install it and... 1250 errors?!


And then it deletes your .venv folders to save space!


It's really surprising how awful the official Python docs are, considering how much the language has grown of late. If I need to reference core Python docs these days, I almost always go to this version on devdocs.io[1].

Thankfully most of the reference documentation I have to look up are the popular data science libraries like pandas. Their documentation[2] is so much cleaner than core Python.

1: https://devdocs.io/python~3.11/ 2: https://pandas.pydata.org/docs/reference/index.html#api


Except those farms don't do any original research and just copy off each other. They're littered with mistakes and you will see the same mistake pop up across all of them.

These days for obscure terms, you don't even get the luxury of reading garbage written by people who barely understand the topic at hand, instead you get meaningless fluff generated en masse using LLMs.

Honestly, I'd rather spend time parsing whatever Doxygen spits out than try to figure out what needlessly verbose yet inaccurate LLM output is trying to get at.


> Actually, in the doc you described as "obviously the correct hit", all I see is

>> str.endswith(suffix[, start[, end]])

>> Return True if the string ends with the specified suffix, otherwise return False. suffix can also be a tuple of suffixes to look for. With optional start, test beginning at that position. With optional end, stop comparing at that position.

> Meanwhile, the first hit in Google for me is Programiz, which has actual real examples without any additional clicking around or trying to understand how the information is structured.

I'm sorry, but what examples could you possibly want that the official documentation doesn't make clear? It's written as concisely as can be, describing the possible inputs and the expected outputs of the particular function, no? I don't see how sifting through tens of lines explaining what the docs say in two short sentences is preferable.


I don't know if you're trolling or not.

When you get good at it, it is much easier to skim through 10 lines of fluff to find the answer (which is usually visually distinguished in a code block) than it is to parse 4 really dense, terminology-filled sentences.

If I need to know exactly how all the options work, sure the docs are the place to go, but 90% of the time I just need a quick example to go off of.


I'm not trolling. From my perspective, mastering a language includes mastering its standard library (for a language like Python, whose standard library is indeed what everyone uses). Thus, it is always preferable to read the complete documentation for some functionality and pick up every detail along the way, instead of having an idea in your head of how to do the thing you want and picking out only the exact use case you wanted from an example.

> If I need to know exactly how all the options work, sure the docs are the place to go, but 90% of the time I just need a quick example to go off of.

You will almost never get to the point where you'd need to know "exactly how all the options work" because if your routine is "search example, copy example, continue", you won't even know what options exist and that there is a way you could do things different (maybe more efficient? simpler?).


PHP docs get this right in a way that works for many more people than Python's do.


Agreed. For all of the hate PHP gets, the state of their respective docs was a big pain point for me in trying to learn Python after years of writing PHP.

However, the PHP docs, especially for older, less-used functions, are riddled with subtle errors and inconsistencies related to the typing of arguments and handling of edge cases. But the way they're structured is fine! I particularly enjoy how you can type php.net/sprintf and land directly on the doc page you're looking for.


Official Python docs are awful. Same with MSDN. Docs should be more than just auto-generated pydocs that parrot the function signature, which I can easily infer because I can, you know, read.

For any sufficiently complex function that requires me to actually look up the docs, I want example usage. Not all arguments are obvious. Not all return types are obvious. This is especially bad for overloaded functions. Worse is when the docs require a circular graph traversal of clicking endless links to more documentation.

Needless to say, I prefer content farms and blogs to the official docs.


> I'm sorry, but what examples could you possibly want […]?

Any? Any example would do.


Humans are much better at inferring rules from examples than at deducing an example from a concisely stated rule. The official python docs are the latter.


I get your point for vanilla Python docs. Content farm pages can be more helpful for quick look ups here.

But if you use Pandas, Numpy, Scipy, etc., you know how fantastic the official docs are. They are much better than content farm crap. And, yet, in these cases, too, Google ranks those sites higher.

I use DDG a lot more, and it has almost replaced Google for me.

I use Google now only for local uses- gas stations near me, restaurants near me, and so on.

I also highly encourage you to try code.you.com and phind.com. I have been very happy with them.


> Actually, in the doc you described as "obviously the correct hit", all I see is

> > str.endswith(suffix[, start[, end]])

> > Return True if the string ends with the specified suffix, otherwise return False. suffix can also be a tuple of suffixes to look for. With optional start, test beginning at that position. With optional end, stop comparing at that position.

What exactly don't you understand from that concise and official documentation?


Parent wants a real example, not a partial syntax definition.


Why not generate a few examples of your own using the REPL?


Because the page you've linked has 24,599 words in total, out of which only 43 words are dedicated to "str.endswith". You can make a beefy SEO article specifically on this method by talking about it a bit, and then providing a lot of examples in various different scenarios.

The bottom line being that there is nothing you can do about it unless Python themselves fundamentally change how the documentation is structured, which I doubt they have plans to do.


I disagree - Google can and should modify their algorithms until the site that users actually want is first. Tracking which sites users click on for those searches, or maybe which sites users with those searches click on last should give a pretty strong signal that python.org should be ranked far above the spam farms no matter what the content percentage difference is. Content percentage is a clearly gameable metric that shouldn't outweigh other factors.

I'm honestly surprised Google employees haven't fixed this somehow, surely they have thousands of people running into this problem daily.
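A toy sketch of the "which site users click on last" signal described above, assuming each search session is just an ordered list of clicked URLs. The click logs are hypothetical, and nothing here reflects how Google actually collects or weighs such signals:

```python
from collections import Counter
from urllib.parse import urlparse


def last_click_scores(sessions):
    """Count how often each domain was the *last* result clicked in a session.

    Each session is the ordered list of result URLs one user clicked for a
    single query; the final click is treated as "the page that answered it".
    """
    scores = Counter()
    for clicks in sessions:
        if clicks:  # skip sessions where nothing was clicked
            scores[urlparse(clicks[-1]).netloc] += 1
    return scores


# Hypothetical click logs for the query "python endswith"
sessions = [
    ["https://www.geeksforgeeks.org/example", "https://docs.python.org/3/library/stdtypes.html"],
    ["https://www.w3schools.com/example", "https://docs.python.org/3/library/stdtypes.html"],
    ["https://www.programiz.com/example"],
    [],
]
print(last_click_scores(sessions).most_common())
# [('docs.python.org', 2), ('www.programiz.com', 1)]
```

A last-click count is cheap to compute but easy to skew (a user may simply give up), which is presumably why it would be one signal among many rather than a ranking on its own.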


> Google can and should modify their algorithms until the site that users actually want is first.

Who, exactly? A professional Python programmer probably has their IDE/editor set up so it can show brief documentation for `str.endswith` already. Someone who is new to Python (probably not even a programmer yet) might actually prefer the hand-holding from those content farm pages.



As a professional Python programmer, I regularly search for the official documentation on things like the details of MagicMock or other things, for both Django and Python. I'm generally looking up things that are more than "what does str.endswith do", and more like "what's the name of the field of the mock that shows what it was called with", or other things that I know exist but don't use day-to-day.
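For the MagicMock question above, the attributes that record how a mock was called are `call_args` (the most recent call) and `call_args_list` (every call). A short sketch using only the standard library's `unittest.mock`; the mock name and file-name arguments are made up:

```python
from unittest.mock import MagicMock, call

opener = MagicMock()
opener("report.csv", mode="r")
opener("notes.txt")

# call_args: the most recent call; call_args_list: every call, in order
assert opener.call_args == call("notes.txt")
assert opener.call_args_list == [call("report.csv", mode="r"), call("notes.txt")]
assert opener.call_count == 2

# Each recorded call exposes .args and .kwargs (Python 3.8+)
first = opener.call_args_list[0]
print(first.args, first.kwargs)  # ('report.csv',) {'mode': 'r'}

# Built-in assertion helpers raise AssertionError on a mismatch
opener.assert_called_with("notes.txt")
opener.assert_any_call("report.csv", mode="r")
```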


There is much more to documentation than just API documentation. A documentation window in a text editor isn't shareable either.

Also, who'd actually want hand-holding from low quality content?


> A professional Python programmer probably has their IDE/editor set up so it can shows a brief document for `str.endswith` already. Someone who is new to Python (probably not even a programmer yet) might actually prefer the hand-holding from those content farm pages.

Someone who has been exposed to the most basic Python features and knows they want info on str.endswith can just type help(str.endswith) in the Python interactive environment.
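The same lookup works from a script, not just the interactive prompt. A small sketch using `dir()` to find a half-remembered method name and `inspect.getdoc()` to fetch the docstring that `help()` is built from (the exact docstring wording varies between Python versions):

```python
import inspect

# Can't remember the exact name? Filter dir() by a fragment you do remember.
candidates = [name for name in dir(str) if "end" in name]
print(candidates)

# inspect.getdoc returns the docstring with indentation cleaned up,
# so it can be printed, searched, or pasted somewhere else.
doc = inspect.getdoc(str.endswith)
print(doc)
```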


Which is all very well (and for something simple like str.endswith might well be enough), but for many things, even a professional Python programmer might either a) not remember the exact name of the method or other thing they're looking up, or b) want a more detailed explanation or examples showing usage.

None of that is going to be given either in an IDE or from help().


So, maybe google should tell them that and turn away their business?


I’m saying that fact shapes the set of people resorting to Google in the first place and what they are looking for.


It seems that you believe Google is not taking those signals into consideration in their algorithm.

Google search ranking has been an ML-driven algorithm for ages now and takes into account dozens of signals. User rejection (how fast people come back to the search from a result click) and, obviously, which results get clicked more often are examples of that.


I do believe that they do this, but that their weighting is off, sorry for not being clear. I was mostly responding to the parent comment talking about ratios.

Based on other comments saying they like the sites I consider spammy, maybe the weights aren't even wrong, they're just not optimized for me!


> Google can and should modify their algorithms until the site that users actually want is first.

Why?

Google is an ad company that has integrated search.

Why would they cut their own bottom line in order to make data happy?

You aren’t their customer. You are Google’s data. Google’s customers buy your data.

Why would they change anything about their business model, which works amazingly well?


Because not paying attention to the quality of the product means Google is A-okay with the normal brand arc and is on the part where the brand cuts quality to boost profits while coasting on name recognition, i.e., the enshittification phase.

They can coast for an indeterminate amount of time, but with the risk that eventually the consumer will be fed up and stop using them, especially if a decent alternative pops up.


Decent alternatives have been bandied about here for over a decade. Guess what? There are none.

Google’s business model is selling user data to the highest bidder. Google’s other business model is selling opportunity to collect data on people.

You as a search user are not a customer, but a data point.

There is literally no incentive for this to change.

Desiring an ad company to not show ads is a ridiculous desire.


That’s not quite right: users are customers paying for a service, search, with their time and eyeballs.

At the same time google coordinates with their other customer and swaps the time and eyeballs for money.

But users are definitely a customer paying with time in return for search results.

If the product becomes too obnoxious, people don’t even need an alternative to quit. It’s happened with other products. Not everyone will quit, but the customer base will shrink.


Google owns chrome, and chromium.

Google’s data gathering isn’t just search.

Anyone not using Firefox, and even those who are using Firefox, is gifting Google data.

That data is worth more than search results or search data.

Google knows where you browse. Google knows how many clicks, scrolls, and keys you press. Google knows your email contents and preferences.

Search is a tiny portion of Google’s data collection behemoth, and whether users quit Google search for Bing search doesn’t matter much to them.


Their business model depends on users finding their search results useful. I think Google’s business is far more fragile than it looks. That’s why they’re doing all the things that are now in the crosshairs of regulators.


> Google’s customers buy your data

AFAIK, Google doesn’t sell user data.



But will this strategy give them more advertising impressions or clicks? If not, then why bother? Google dropped its desire to be user-friendly quite a while ago, when it became clear that it didn't matter whether they were serving anyone but their advertisers.


Google has never worked this way. By adding more examples to an SEO article you are also actively farming long-tail clicks, which will make people spend more time on the page and if Google was to use that as a metric (there have been rumours) then it is beneficial to the SEO site as well.

Like I said, Python or any other language can fix this issue by changing the entire mindset of how documentation is built. You can then get the OSS community involved on GitHub to slam dunk on all the SEO spammers.

Just look at MDN or Web.dev, both have their content repos up on GitHub and people contribute metric tons of useful info. And both of these sites have healthy standings in the context of SEO.


> Like I said, Python or any other language can fix this issue by changing the entire mindset of how documentation is built.

Please, please, I beg of you, in the name of all that is holy, do not push for SEO bloat in documentation!


No, MDN routinely loses to w3schools and geeksforgeeks too.


I do agree with you but I didn’t feel it was necessary to point that out. MDN doesn’t use SEO titles so my guess is, that is where it loses out a bit, but for most in-depth JS/CSS stuff, I generally get MDN as a top result.

Can’t remember the last time I visited W3S/GFG to be honest.

It’s about to get worse, though, as it appears that DigitalOcean is putting a nail in the coffin for CSS-Tricks, and Google will absolutely demote it because of inactivity; only the most linked-to pages will survive until DO decides to shit the bed and try to move the entire site to their Tutorials platform.


I mean at one point Google never returned image results.

Certainly some of the Python queries could return a card promoting a result from python.org just like other queries return a card promoting wikipedia.org


How do you know their users don’t actually want these other sites rather than the formal docs?


I've been doing a little Python lately. The top hits are 80% ads, and if they even have what you want, it's buried after 2-3 pages of nonsensical crap. This isn't a real choice.


> Google can and should modify their algorithms until the site that users actually want is first.

That is what Google is doing! But remember, the advertiser is the user, not the person running the search.

If you want to be the user, and get good search, pay for Kagi.


Somehow kagi.com manages to rank the top results better than Google:

- a stackoverflow answer that has a useful example: https://stackoverflow.com/a/18351977/

- the Python documentation: https://docs.python.org/3/library/stdtypes.html

- Pandas documentation (they also have an endswith): https://pandas.pydata.org/docs/reference/api/pandas.Series.s...

- a blog post on Python regular expressions (this looks useful but is a wrong result for this query): https://www.johndcook.com/blog/python_regex/

- something that could be a content farm with rephrased Python documentation: https://pythontic.com/concepts/string/introduction

The top three results are good, and I guess it's a matter of taste whether the succinct Stackoverflow answer is better than the verbose official documentation.


You can also raise/lower domains in Kagi. So if you're a Python developer, you can boost docs.python.org to personalize your results.


"Somehow" is due to their different income model from Google's. Their customers are their search users, not advertisers, so it's not surprising they serve the people who pay them. I'm a very, very happy Kagi user due to this!


>The bottom line being that there is nothing you can do

Google's ENTIRE JOB is to surface the CORRECT content for a query. If they are objectively not doing that, then they are in the wrong. Imagine credit card fraud just kept increasing, and instead of adapting and improving their systems, companies just threw up their hands and said "deal with it, not our problem, nothing we can do" as you had to generate a new credit card number every other day because even the most trivial card number enumeration attacks were not stopped.

The fact that Google is completely unwilling to change how they operate to prevent even the simplest, intern-driven SEO trash from overtaking literally authoritative sources should be exhibit A in a trial to break up Google, as evidence of how much negative value they provide to the consumer.

It's the rough equivalent of FedEx just throwing away a third of the packages they are supposed to deliver at random, and people seemingly just saying "eh, nothing can be done."


> Google's ENTIRE JOB is to surface the CORRECT content for a query

Google's job is to serve ads profitably. It's something like 90% of their profit (80% of revenue). To some extent they even have an incentive to show poorer results first because it makes you stay on the page longer and look at more sites (more ad impressions).


Chiming in to boost this thinking: "correct" is only relevant to the profitability of the advertising stream.


Honestly, changing the way they do it is a fine idea. Massive documentation pages were how we did it in the olden days. And I think they're still good for when you want to read through and get an understanding of the whole library. But having that also broken up into a lot of small pages that link back to the main doc would be really helpful to people who are searching for specific solutions.

I think that could mostly be done in an automated fashion to start. Then into the master doc you can start adding hinting and extra material that gets rendered into the sub-pages. So it could be approached gradually.
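A rough sketch of that automated first pass, assuming the small per-method pages are generated from the docstrings Python already ships and link back to the master page using the anchor pattern seen in the URL from the original question. The function name and page layout are hypothetical, and the anchor pattern is assumed rather than checked for every method:

```python
import inspect

# Anchor prefix taken from the docs.python.org URL in the original question.
# Assumption: every public method has an anchor of the form "#<type>.<method>".
BASE_URL = "https://docs.python.org/3/library/stdtypes.html#"


def split_into_method_pages(cls=str):
    """Build one small stub page per public method of ``cls``.

    Returns a dict mapping method name -> page text. Each page reuses the
    shipped docstring and links back to the big master page.
    """
    pages = {}
    for name, member in inspect.getmembers(cls):
        if name.startswith("_") or not callable(member):
            continue  # skip dunder/private names and plain attributes
        summary = inspect.getdoc(member) or "(no docstring)"
        qualified = f"{cls.__name__}.{name}"
        pages[name] = f"{qualified}\n\n{summary}\n\nFull reference: {BASE_URL}{qualified}\n"
    return pages


pages = split_into_method_pages()
print(pages["endswith"])
```

Extra hints and examples could then be added to the master document and rendered into these sub-pages over time, as the comment suggests.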


You're not wrong, but I think now the majority don't want an understanding, they just want answers.


> unless Python themselves fundamentally change how the documentation is structured, which I doubt they have plans to do.

Nor should they, in my opinion. Quality of documentation is more important than that.


I really hope they don't enshittify their documentation just to rank in Google.


Kagi has a simple fix for this. It’s not a hard problem to solve, they just have no incentive to solve it.


Kagi's solution to pin, bump or blacklist some websites is awesome. It's a shame Google doesn't have something as simple and elegant after all these years.
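As a hypothetical illustration of the pin/bump/block idea (not Kagi's actual implementation, which isn't described in this thread), a client-side re-ranker over a list of `(url, score)` results might look like this. The domains, weights, and the 100-point pin boost are arbitrary:

```python
from urllib.parse import urlparse

# Hypothetical user preferences; domains and weights are arbitrary examples.
PINNED = {"docs.python.org"}            # always float to the top
WEIGHTS = {"stackoverflow.com": 1.0,    # bumped
           "www.w3schools.com": -1.0}   # lowered
BLOCKED = {"www.geeksforgeeks.org"}     # never shown
PIN_BOOST = 100.0


def rerank(results):
    """Re-order (url, base_score) pairs according to the preferences above."""
    kept = []
    for url, score in results:
        domain = urlparse(url).netloc
        if domain in BLOCKED:
            continue
        boost = PIN_BOOST if domain in PINNED else WEIGHTS.get(domain, 0.0)
        kept.append((url, score + boost))
    # list.sort() is stable even with reverse=True, so ties keep engine order
    kept.sort(key=lambda item: item[1], reverse=True)
    return [url for url, _ in kept]


results = [
    ("https://www.geeksforgeeks.org/example", 3.0),
    ("https://www.w3schools.com/example", 2.5),
    ("https://stackoverflow.com/a/18351977/", 2.0),
    ("https://docs.python.org/3/library/stdtypes.html", 1.0),
]
print(rerank(results))
```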


Aren't you describing the exact problem that PageRank was supposed to fix?

We've come full circle.


It’s a general problem. In the past I have seen great HN threads sharing exclude lists etc. to avoid the scrapers. Google could, of course, track a few thousand canonical sources so that Python, SO, and a few others don’t get beaten to the top in their specific expert areas, but that doesn’t generalize. Google has no real algorithmic idea who is copying who.

And then it hits me: the proper documentation for things are on pages without ads! Perhaps that’s the signal google needs to start weighing heaviest…? ;)


> Google has no real algorithmic idea who is copying who.

This is an irrelevant distraction, because Google's literal army of nearly 200,000 full time employees, a large number of whom are Python developers, do know. Geeksforgeeks, w3schools, programiz, tutorialspoint, and python-reference.readthedocs, which all rank higher than the official documentation, are not flash-in-the-pan sites. They've been polluting the search results for literally years. Trivial manual processes at Google scale would be 1000% fine and effective.

Google could, if it gave half a shit, give every employee a Chrome browser extension that lets them manually vote on the reasonability of site rankings within known problem genres for their own searches.


I have the feeling that this very page would be downranked very fast if such an extension existed.


Honest question: haven't people developed something akin to mental-ad-masks when it comes to Programiz, G4G, TP, etc.?

Just as I never ever ever click on a web ad and can't even remember the brand it was about 3 seconds after leaving the page, whenever I see G4G, TP, etc., I never ever ever click them. At this point, I think this doesn't even reach my brain but is handled by my spinal cord.


It does reach your brain, and it does affect you, just like a fog of gnats you've grown accustomed to still affects you.

For what it's worth, the ads aren't for you. They're for anybody or anything that might click them, knowingly or not, and that's not you.


SO isn't a canonical source, they're another content farm using nofollow links to the content they appropriate to maintain their search edge.

Its entire purpose is to inject itself as a search middleman. Even worse, I find myself landing there from google results to a question they had closed as not worthy of the site, but worthy enough still to keep up for the search juice.


> I find myself landing there from google results to a question they had closed as not worthy of the site

Well, you did google for the question and not for the answer. ;)


True, and it was even more accurate a few years ago. Now Google moved towards understanding queries and giving context-specific answers rather than a strictly keyword based engine. You were considered a bad googler if you asked it a question, now it's the norm.


Isn't boosting websites without ads against their interest?


This is what I was hoping all the replies would be about :)

Is monetising the web through advertising the reason these scraper sites exist? And removing that would make scraping sites an expense with costs rather than profits?

Perhaps google could do this. Ads on third party sites are probably a small part of their income compared to sponsored ads in search results, so they could be sacrificed and they’d take down their competitors who rely more on the website ads because they don’t have the search?

However, they have no need to compete on search quality at this time. If another player in the search and browser wars started downweighting ad-heavy sites, then this might make Google reluctantly follow suit. But you can imagine the FUD campaigns to paint such an actor as killing the internet and being the bad guy etc…


Personally, I think it's because the docs aren't that good unless you're looking for something very specific. People are talking about the fact that there are no ads, but I would think it's because of the bounce rate. I could see there being a lot of people learning Python who go to the docs, see a large chunk of text with OK examples, and then bounce to the other sites that have more examples. This is something that happened to me a lot when I was doing Python development. Some of the stuff in the docs is very helpful (e.g. asyncio), but for other things, what I actually want to do gets lost in the details. So while ads probably play a part, I think the bounce rate is a bigger factor, especially for people who aren't necessarily developers.


I often skip Python docs for this reason: I find myself scrolling to find actual usage examples of the functions I am looking up, preferably 3 short examples showing typical use cases + idiomatic code. Bonus points if it also references frequent mistakes, i.e. "for X use case you probably want Y instead." I am often surprised to find no usage examples at all, particularly for such a popular language (ex: itertools section). I'm not sure if this is consistent w/ a newer programmer's mindset, but I often find looking at _just_ a few snippets is sufficient to understand the function's intended use case and capabilities, and I can dive into details later if needed (ex: What's performance / memory usage like? How can I extend / combine this? etc).
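In the spirit of that request (a few short examples plus one frequent mistake), a sketch for three `itertools` functions. The pitfall shown, that `groupby` only merges consecutive items, is standard `itertools` behavior; the choice of examples is just illustrative:

```python
from itertools import chain, count, groupby, islice
from operator import itemgetter

# chain: loop over several iterables as if they were one
chained = list(chain([1, 2], [3], (4, 5)))   # [1, 2, 3, 4, 5]

# islice: take the first n items of any iterator, even an infinite one
first_three = list(islice(count(10), 3))     # [10, 11, 12]

# groupby: only groups *consecutive* items with the same key
words = ["apple", "banana", "avocado"]
first_letter = itemgetter(0)
unsorted_keys = [key for key, _ in groupby(words, key=first_letter)]
# ['a', 'b', 'a']  <- "a" appears twice because the input isn't sorted

# Common mistake: to group everything by key, sort by that same key first
grouped = {
    key: list(group)
    for key, group in groupby(sorted(words, key=first_letter), key=first_letter)
}
# {'a': ['apple', 'avocado'], 'b': ['banana']}
print(chained, first_three, unsorted_keys, grouped)
```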


I've come across this many times and I've come to think it's an issue with that specific page in particular. The problem is partly on Python's end: they've got a single page where they've jammed in all of the built in types (str, dict, list, complex, ...). It's a huge list of types and a correspondingly huge page!

I suspect if there were separate pages for each type then it would be ranked higher... and it would actually be more useful. I don't get why they've done it like that.

The higher ranked pages admittedly have a whole page just for a single method, which is too far in the other extreme and is obviously more for SEO than use. But with the Python docs the way they are, we'll never know whether a more sensible official page would beat them or not.

As some evidence that it's not just Google allowing spammers to shine through - sometimes for a search about a built-in type I actually do get official docs at the top, but it turns out to be the tutorial page about the class (which is not a giant mish-mash).


The real question is, why are these content farm sites indexed at all? They are spam, and should be blocked, just like the way Gmail blocks spam. They should never appear in any search result for anything ever, let alone be ranked first!

If someone simply took Google and just applied a huge blocklist so that garbage sites like those never got indexed, it would be the perfect search engine.


Because adding exceptions to an algorithm doesn't scale, they won't add any. Because customer service doesn't scale, they don't have any. And so on. Frustrating isn't it? It's what happens when you let engineers design a product. Letting system behaviour at the limits dictate everything.


They've added plenty of exceptions in the name of "cards". Wikipedia, movies, celebrities, ...


The text ...

> Return True if the string ends with the specified suffix, otherwise return False. suffix can also be a tuple of suffixes to look for. With optional start, test beginning at that position. With optional end, stop comparing at that position.

... without an example is NOT easier to use than an example for people learning Python. It uses language-specific jargon (suffix, tuple), unexpected capitalization, and is needlessly terse.

How about ...

> Check if a string ends with a certain ending (or endings) and return True if it does, or False if it doesn't. You can specify one ending or multiple endings in a tuple. You can also choose where to start and stop checking within the string.

... along with an example of code that can be easily copied? That is the value the other sites provide: readability and usability.
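For comparison, the kind of copy-pasteable example the comment asks for might look like this. It only uses documented `str.endswith` behavior; the file name is made up:

```python
filename = "Report_Final.CSV"

# One ending: returns True or False
print(filename.endswith(".CSV"))  # True
print(filename.endswith(".csv"))  # False: the comparison is case-sensitive

# Normalize case first if you don't care about it
print(filename.lower().endswith(".csv"))  # True

# Several acceptable endings: pass a tuple (a list raises TypeError)
print(filename.lower().endswith((".csv", ".tsv", ".xlsx")))  # True
```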


As others have said, the pages with the most ads are the ones Google wants you to go to. Their highest priority is making money. Sometimes, that means showing you the right, best content, but a lot of the time now that's not the case.

Which is why LLMs have Google scared, in my view.

If an LLM has all the answers, you don't need to hand over your question to Google so it can steer you to the "right" (ad-filled) answers. It just knows, and tells you. Yes, hallucinations are still a problem but they aren't a growing one. LLMs that can provide you a reference to the right docs will be a thing soon if they aren't already.

How does Google make money in a world where fewer and fewer people need to ask them for where to find the answer?


Check out devdocs.io. I'd suggest using it if you want to search official docs. But first you need to enable whatever language you want to search. I enabled Python 3.11, searched `endswith`, and immediately found the documentation.

I know this doesn't answer your question, but I hope this helps you in the interim.




I highly recommend comparing google results with kagi.com results. The extreme difference in quality has a simple explanation: kagi.com is a paid service so it can downrank sites with many ads and tracking scripts.

Google needs that sweet surveillance money so its results are filled with crappy content farms both human- and LLM-generated. Kagi doesn't need to make money so it can happily link to the highest quality sites, even if they don't take part in the targeted advertising economy.


Every one of those "well-known low-quality content farms" is a better result for someone searching "python endswith" than the official documentation.

You don't have to parse through a veritable novel of irrelevant results to find what you're looking for.

They provide example code to show you how to use the method.

They break down the usage more thoroughly than the official docs.

They _show_ you the different parameters you could pass to the method.

Some of them provide interactive REPLs where you can play with and test the method.

The docs break it down _technically_ but they leave questions. Are start/end inclusive? What does it mean to "stop comparing at that position"? Why would you use the start parameter if you're trying to find the end of the string? If you use start does the end parameter count from 0 or from start? What happens if you pass a start or end that are outside the bounds of the string?

Look, I think the Python docs are great and use them all the time. But for the average person looking for info on `endswith` - whether that's someone new trying to understand how it works, or someone experienced looking to understand the parameter types - those pages are more approachable.


FWIW, DuckDuckGo gives me the same crap results[1], but at least the !bang syntax works.[2]

[1]: https://duckduckgo.com/?q=python+endswith

[2]: https://duckduckgo.com/?q=!python+endswith


The state of Python documentation in general is terrible.

"This function accepts the following arguments, half of them are documented below. It also accepts kwargs, but we won't tell you what to put in there or where it ends up. To keep you on your toes, we throw some special mystery exceptions that we won't tell you about. Do they have a common base? It's a secret. Have fun digging in the source code like a chump; Python is easy to read, you'll figure it out."

Not to mention that there aren't any overarching standards like Javadoc. There was a PEP a while back describing reST, with some inconsistent examples of how to use it to document code. Most projects I've seen pick one of the competing standards, half-assedly and inconsistently follow it, and then rely on examples alone to document usage. It's a mixed-up hodgepodge of incomplete prose that's difficult for a machine to read.
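For contrast, here is a sketch of a docstring that does document the keyword arguments and the exceptions, using the reST field-list convention (`:param:`, `:raises:`) that Sphinx understands. The function, its parameters, and `ConfigError` are all hypothetical, made up for illustration; the point is that a consistent convention is also machine-readable.

```python
import inspect

class ConfigError(ValueError):
    """Common base for every error this hypothetical module raises."""

def parse_config(text, *, strict=False, comment="#"):
    """Parse ``key = value`` lines into a dict.

    :param text: The config file contents.
    :type text: str
    :param strict: If true, reject lines that have no ``=``.
    :type strict: bool
    :param comment: Prefix that marks a line as a comment.
    :type comment: str
    :returns: Mapping of stripped keys to stripped values.
    :rtype: dict[str, str]
    :raises ConfigError: If ``strict`` is true and a line is malformed.
    """
    result = {}
    for lineno, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if not line or line.startswith(comment):
            continue  # skip blank lines and comments
        key, sep, value = line.partition("=")
        if not sep:
            if strict:
                raise ConfigError(f"line {lineno}: missing '='")
            continue  # lenient mode: silently ignore malformed lines
        result[key.strip()] = value.strip()
    return result

# Because every field follows one convention, a script can pull them out.
doc = inspect.getdoc(parse_config)
raises = [line for line in doc.splitlines() if line.startswith(":raises")]
print(raises)
```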


There are a lot of ways to look at it.

The "you're not the customer" perspective: you, as a user of Google Search, are not the customer. The customer is the people placing ads on Google Search, and secondarily the people placing ads on the pages Google Search leads users to.

The "it's an algorithm" perspective: Google is a search engine, not a collection of curated links. In the past, Google has been very much against having humans rate results, but I think they actually have focus groups that come into a lab, do some searches, and rate what they see (under the guise of being a different search engine, most likely). Google is very conservative about adjusting their algorithm (or at least has been), and small changes can lead to huge changes in income.


GeeksForGeeks is the bane of my existence and IMO a symptom that the entire system has perverse incentives. Fortunately, I've moved 90% of my code inquiries to LLMs, to great success.


Have you ever seen Google ads on the Python documentation?

But more seriously... I think it's because the language and framing of the python documentation is difficult for a new user to understand.

Yes, Python is miles better than some other languages at documentation, but it's still more of a reference than a tutorial. I remember when I first started learning Python, I read blog entries (e.g. blogspam) more than I did the official docs.

In my intermediate phase, I searched Python docs but I didn't need Google's help for that.


I wanted to start a new search engine that would index content from ad-free websites only.

Then I realised StackOverflow has ads nowadays and my search offering would be useless for like 98% of devs.


Agreed that Google is not very helpful here, but searching the Python docs directly works pretty well:

https://docs.python.org/3/search.html?q=endswith

Otherwise, I really like ChatGPT for this: you put minimal work into the query and it usually fills in useful info. If you use "Advanced Data Analysis" mode, it will run those examples in the browser.


Why exactly is MDN considered a more authoritative source than w3schools?

MDN would, I suppose, be a more authoritative source on what Mozilla thinks. And, presumably, a less authoritative source on what everyone else thinks.

The principal difference, as far as I can see, is that w3schools gives me the same information in 3 pages instead of 10.

My suspicion is that google cares less about what you think, and more about what everyone else thinks.


Because content farms give Google money and the official documentation doesn't.


This shit is so frustrating: every query I ever have for Rails stuff goes to APIdock.


Those sites are plastered with Google ads so it makes Google more money to recommend them over a useful result.

Additionally, Google's guidelines for search ranking prioritize meaningless fluff and spam because they want to waste as much of your time as possible. More wasted time = more ad exposure.

The world's largest search engine is owned by the world's largest advertising company. I am surprised no one saw this coming, lol.


Hey, at least the first result isn't some StackOverflow overlord schooling you on why you should never use endswith, right? /s


Short answer: the real Python documentation pays less money to Google than the content farms do.

Long answer: when balancing the many interests of the parties in search results, the decrease in consumer satisfaction is, by Google's benchmarks, outweighed by the money received from their paying customers.

Use Bing: the results are relevant and it doesn't yet rank paying farms at number one.


It's a genuine problem, not only with Python but with other languages.

However, the API library reference is only one kind of documentation, and not necessarily what everyone is looking for. For whatever language I'm working in, I keep the library docs handy for immediate use, and only search the web when I'm looking for something beyond a dry reference. Maybe I want a tutorial, or a short how-to for a specific task. Maybe I'm looking for something deeper, with context and explanation.

I somewhat agree with another comment here: the library reference docs should be a keystroke or click away in your development environment. Are there plugins for your preferred editor or IDE to make this possible? Use those. If you're looking for a different kind of documentation and it's not part of the official Python site, maybe that's something to be addressed.
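Even without an editor plugin, the reference text for builtins ships with the interpreter itself. A minimal sketch using the standard `inspect` and `pydoc` modules; the same text is what `help(str.endswith)` in a REPL or `python -m pydoc str.endswith` in a shell shows. The exact wording of the docstring varies between Python versions.

```python
import inspect
import pydoc

# The raw docstring bundled with the interpreter (no network needed).
print(inspect.getdoc(str.endswith))

# The fuller help() page, rendered as plain text (pydoc's default text
# renderer adds terminal bold via backspace sequences, plaintext does not).
text = pydoc.render_doc(str.endswith, renderer=pydoc.plaintext)
print(text)
```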


> Obviously, https://docs.python.org/3/library/stdtypes.html#str.endswith is the correct best hit for that term.

Just because it's official doesn't mean it's good documentation. The other results are significantly better for this specific query. They are more elaborate, have better readability, offer examples, and don't force you to search through a long text to find the 3 lines which are relevant to you. Some even have a live test.

The only real benefit python.org offers here is more documentation about the language itself, which is interesting for beginners, but not necessarily for everyone else.


Google doesn't index anchors (the parts after #) so the comparison is between https://docs.python.org/3/library/stdtypes.html and https://www.w3schools.com/python/ref_string_endswith.asp. The former page has a lot of content unrelated to "endswith" and thus is ranked lower. I also think that the "content farms" pages are useful because they offer up lots of examples which the official docs don't.
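The fragment is the part of the URL that browsers keep to themselves: it is not sent in the HTTP request, so the server (and a crawler) only ever sees the whole stdtypes page. The standard library can show the split; this illustrates the URL structure only and says nothing about how Google actually ranks pages.

```python
from urllib.parse import urldefrag

url = "https://docs.python.org/3/library/stdtypes.html#str.endswith"
page, fragment = urldefrag(url)
print(page)      # https://docs.python.org/3/library/stdtypes.html
print(fragment)  # str.endswith
```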


Why do people still use legacy search engines that are no longer fit for the purpose of finding things you need?

Google isn't built with the end users in mind. It's built to exploit people, and sell that onto their actual customers, the advertisers.


While not answering the question, prefixing the query with `site:docs.python.org` will get you what you need. If you're doing this a lot then I recommend at least adding a search shortcut to:

    https://www.google.com/search?q=site%3Adocs.python.org+%s
then you can type something like `py endswith` in the address bar. Or use ddg's "I'm feeling lucky" (prefix the query with !) and go directly to the first result:

    https://duckduckgo.com/?q=!+site%3Adocs.python.org+%s
Even better, just use https://devdocs.io/
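For the curious, this is roughly what the browser does with those templates: it URL-encodes what you typed after the keyword and drops it in place of `%s`. A small sketch; `shortcut_url` is a hypothetical helper, and `quote_plus` only approximates the encoding a given browser uses.

```python
from urllib.parse import quote_plus

# The two shortcut templates from the comment above.
GOOGLE_DOCS = "https://www.google.com/search?q=site%3Adocs.python.org+%s"
DDG_LUCKY = "https://duckduckgo.com/?q=!+site%3Adocs.python.org+%s"

def shortcut_url(template, query):
    """Hypothetical helper: substitute the encoded query for %s."""
    return template.replace("%s", quote_plus(query))

print(shortcut_url(GOOGLE_DOCS, "str endswith"))
# https://www.google.com/search?q=site%3Adocs.python.org+str+endswith
```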


As someone that loves python and uses it daily, the official docs are terrible for getting a quick answer. They're impenetrable as a result of being exhaustive. If you google anything React-related, their official docs will show up first.


Kagi returns the expected link first, for what it’s worth.


We are at a point where I think Google should start manually de-ranking content mills and blogspam, or even stop indexing them altogether.


Google search heavily focuses on the last page that users visit for a query. And python documentation is hard for most novice programmers to understand.

My guess would be that engineers first go to the documentation, don't understand it, and then go to low-quality content farms that answer their questions in natural language. It's low quality, but it's enough for novice use cases such as Python's `endswith`.

And this leads to a big reduction in Google's ranking of python docs.

TL;DR: most novice programmers don't/can't read docs.


I think your TLDR is accurate, but I think the mechanism is likely even simpler than Google looking at the last page visit.

The docs are written by and for experienced programmers. They're very dense with information but light on examples and comments that explain things for dummies.

Python's popularity means the vast majority of Python users are novices who would find the docs hard to read. GeeksforGeeks et al. are providing content for them, and so are actually a better resource to show the majority of searchers (especially given that it's a basic string-method question, something very likely to be searched for by novices).


Your post inspired me to do some research on options for blocking these sites and I stumbled upon the uBlacklist browser add-on. It's open source, easily configurable, supports multiple search engines, and you can even use community block-lists instead of building your own.


Not an answer to OP question, but if you find yourself googling "python endswith" (as per example) then your DX is somehow broken.

Invest in a good code editor with linting. No more googling for such trivial things.


DX: "Developer Experience"

I suspect that someone who needs to know that their developer experience is broken probably doesn't know what "DX" means.


Who googles dev docs today? Use ChatGPT. /s

But seriously, try the same search query with some LLM: you will get a description and a code example. Add your use case and it will fit it into the example.


I have stopped using Google for searching tech references like python.org, and I work at Google (not on search). IMHO it's sad.


I've completely stopped using Google for this kind of thing. ChatGPT or devdocs.io to go straight to the docs


I'm not sure I would consider those results "low-quality". Not necessarily for this use case, but oftentimes third-party sites do a better job of providing examples and readable documentation than the "official docs". Also, if you want to use the official documentation, just use the search bar of docs.python.org. Then Google gets none of your views!


Google isn't here to help you find stuff, it's here to mine you for money.


the worst thing about those sites is they are slow.

I get why at least with the python docs, they're a little dense. some of those others have example uses which I could imagine people find useful.

geeksforgeeks with the login nag page is pretty bad.


Google search is now practically shit. Instead of adapting its ranking so it links to high-quality content, it forces sites to adapt to its ranking model, so they have to produce meaningless crap.


I don't think word count is a new metric. It's just that documentation wasn't much of an ad revenue market in the past. Link juice also flows for backlinks. These sites definitely get heavily linked by blog spam.
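To make the "link juice" point concrete, here is a sketch of the classic PageRank power iteration from the original published algorithm (not Google's current, unpublished ranking), run on a hypothetical toy web: three spam blogs all link to a farm page, while a single tutorial links to the official docs. Page names and link structure are invented for illustration.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Classic PageRank; links maps each page to the pages it links to."""
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        new = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # Each outgoing link passes on an equal share of this page's rank.
                share = damping * rank[page] / len(targets)
                for t in targets:
                    new[t] += share
            else:
                # Dangling page (no outlinks): spread its rank over all pages.
                for t in pages:
                    new[t] += damping * rank[page] / n
        rank = new
    return rank

# Hypothetical toy web.
links = {
    "spamblog1": ["farm"],
    "spamblog2": ["farm"],
    "spamblog3": ["farm"],
    "tutorial": ["docs"],
    "farm": [],
    "docs": [],
}
ranks = pagerank(links)
print(sorted(ranks, key=ranks.get, reverse=True)[:2])  # ['farm', 'docs']
```

Nothing about the content is consulted: the farm wins purely because more pages point at it.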


While I believe this is still a bad metric, I also believe that ultimately it should be up to the user to decide which sites they want ranked first. I'd definitely boost python.org while banning these content farms.


Some day I'm going to start paying these people:

https://kagi.com

DuckDuckGo is a bit better than Google at avoiding content farms and semantic web spam, but I think it's mostly for the same reason Linux and MacOS used to have zero malware.


I hate shilling for the Russians but I've been very impressed with Yandex lately.

It retains that early-2000s Google vibe, where there were still quirky independent sites to discover. Using it also surfaces just how much Google is censoring its own results on controversial topics.


Yandex is great, same as its image search. Props to the Russians heh.


https://kagi.com/search?q=python+endswith checks out:

1: docs.python.org › 3 › library › stdtypes.html

2: pandas.pydata.org › pandas-docs › stable › reference › api

3: This discussion!

I've tried DDG and Brave for a few weeks each, and gone back to them once in a while when Google search has completely failed for things I know exist. They mostly have worse results than Google (purely in terms of relevance of the first page), and Google is trash now. I'll give Kagi a try for a few more days and see, but maybe the web has finally got to the point where paying for search is a useful strategy.


Brave Search is enabling you to do that with Goggles: https://search.brave.com/help/goggles


I have used Brave search for some days.

It's extremely bad.

You can see for yourself.


I use Brave Search pretty regularly. I'm not going to say it's better than Google, but in my experience, it's a not-too-distant second.


For me, the second best, and in many cases better, is DDG.

As I mentioned in another comment, Phind and You Code have been quite good as well.


Because Google is run by business executives and clueless product managers. The engineers have been pushed into the back to think only about whatever the current sprint is, and told to buzz off when it comes to biznass decisions that only those with MBAs can solve.


As far as I know, for technical subjects Google started to serve content-farm results a little while after ChatGPT came out.


I don’t see any ads on any of those mentioned pages…


Are you using an ad blocker?


I'd love a search engine that lowers a page's rank if it contains ads. Maybe even a search engine that blocks all pages containing ads.
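As a toy illustration of that idea, a search engine could re-rank its candidates by subtracting a per-ad penalty from whatever relevance score it already has, or drop ad-carrying pages entirely. Everything here is hypothetical: the `Result` fields, the scores, the penalty value, and the example URLs are invented, and detecting ads reliably would be the hard part.

```python
from dataclasses import dataclass

@dataclass
class Result:
    url: str
    relevance: float  # hypothetical score from some existing base ranker
    ad_count: int     # hypothetical number of ad slots detected on the page

def rerank(results, penalty_per_ad=0.1, block_ads=False):
    """Sort by relevance minus a per-ad penalty; optionally drop pages with ads."""
    if block_ads:
        results = [r for r in results if r.ad_count == 0]
    return sorted(results,
                  key=lambda r: r.relevance - penalty_per_ad * r.ad_count,
                  reverse=True)

results = [
    Result("https://farm.example.com/endswith", relevance=0.9, ad_count=6),
    Result("https://docs.example.org/str.html", relevance=0.8, ad_count=0),
]
print([r.url for r in rerank(results)])
print([r.url for r in rerank(results, block_ads=True)])
```

With the penalty, the ad-free page (0.8) beats the farm (0.9 minus 0.6); with `penalty_per_ad=0.0` the original order comes back.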


That might have a chance of happening with a search engine not built by an advertising company.


probably to do with realpython blocking content with the popup unless you sign in.


Because google is shit.


AI content farms are about to kill Google anyway


Google can't make money serving ads if you go to the ad-free python.org.

Obviously I'm not saying they're singling out python or documentation in general as some kind of cash cow. More realistically the story is that, sites that serve ads make money, and can spend money on cat-and-mousing SEO to keep making more money. Technical docs aren't going to do that. Google could whitelist them but it seems they turn a blind eye for the aforementioned reason.


Google shouldn’t whitelist them but use them as a test case for the ranking algorithm. If a content farm comes up ahead of the official documentation site then the algorithm is obviously broken.
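One way to read that suggestion is as a "golden query" regression check: for a hand-curated set of queries, fail the build if any known content farm ranks above the official page. A sketch under loud assumptions: `FARM_HOSTS`, `GOLDEN`, and `fake_search` are hypothetical stand-ins, not anything a real search engine is known to use.

```python
from urllib.parse import urlparse

# Hypothetical, hand-curated inputs.
FARM_HOSTS = {"farm.example.com"}
GOLDEN = {
    "python endswith": "https://docs.python.org/3/library/stdtypes.html",
}

def farm_outranks_docs(results, official, farm_hosts=FARM_HOSTS):
    """True if a known farm appears before the official URL (or it is missing)."""
    for url in results:
        if url == official:
            return False
        if urlparse(url).hostname in farm_hosts:
            return True
    return True  # official page absent from the results: also a failure

def check_golden(search, golden=GOLDEN):
    """Return the golden queries on which the ranker fails the check."""
    return [q for q, official in golden.items()
            if farm_outranks_docs(search(q), official)]

def fake_search(query):
    # Stand-in for a real ranking function, for demonstration only.
    return [
        "https://farm.example.com/python-endswith",
        "https://docs.python.org/3/library/stdtypes.html",
    ]

print(check_golden(fake_search))  # ['python endswith']
```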


If I had to guess, most Python users would rather read the farm sites than use the official documentation.



