I'm ready to believe the most charitable explanation for this: the new code search (which I have found to be exceptionally good) has a lot more going on under the hood than a normal search engine, which makes it a lot more resource-intensive - limiting it to signed-in accounts saves a huge amount of server resources that would otherwise be used to serve crawlers.
My guess is that the trade-off here is genuinely a question of whether they spend 3-4x (I'm guessing, but I think this is likely a low-ball estimate) on their search infrastructure, vs. having people angry at them for requiring login to use the feature.
The most realistic one is that people will get fed up and register an account, and GitHub can brag about how many new users they brought in.
The other side is that existing users will get pissed at GitHub, because they can't even search anymore without logging in, and sometimes that's a pain (not their PC, a public PC, an incognito tab, the time needed to do 2FA, etc.).
GitHub could have kept the cheap, fast basic search for users who aren't logged in, but they didn't.
I agree with you, but I also feel like you're casting pearls before swine. Some people will always have an incredible amount of entitlement to justify their laziness or incompetence.
I mean, the person opening the issue escalated this into socioeconomic issues just because they don't know how to use grep... What else is there to say?
> I mean, the person opening the issue escalated this into socioeconomic issues just because they don't know how to use grep... What else is there to say?
I suppose one could try to see the value in the argument and engage in good faith discussion but it's easier to flippantly dismiss people like this, I imagine.
In reality, how many people are regularly using github search without having a github account? Can this change really be expected to bring in a meaningful number of users?
Trying to think this through, my own best guess for what is going on is that there is some amount of traffic coming from bots/scripts and github would like to have all queries associated with an account so that they can block accounts? I'm not sure if this makes sense, but I think it's a reasonable possibility.
I honestly don’t understand why. On my computer I log in once and that’s it. They don’t automatically log you out. I haven’t logged into GitHub since setting up my computer.
They justify it with "engagement" and it kinda works to a point, since if you don't get pissed with the broken search and register an account, you're not considered a "user", and if you try to use the tool but don't want to register you're not "engaging" with it, I guess... But it makes the initial contact that much more miserable.
I’m not saying you’re wrong, if anything it makes their decision even more ridiculous, but GitHub has completely saturated the market. Other than pissing people off what do they really benefit from this?
If you really care about being able to search GitHub without logging in, https://grep.app/ is pretty good. I would often use it instead of the old GitHub search because I found the results to be better.
I'm really focusing on the search feature that's no longer available for a given repository unless you're logged in. Everything else is "Microsoft added extras to git". If all I can do on GitHub is use it to git clone, then what's the purpose of it?
It's clear the "extinguish" phase of Microsoft's open source venture has started. There are plenty of very good alternatives; it's time to move on.
There is actually such a place! The Linux kernel (the original use case for Git) is developed by emailing patches to the appropriate mailing list. The mailing list is compatible with self-hosted mail systems.
You don't need an account to send email. You can send emails using nothing but a CLI tool on a *nix machine. Although most mainstream email services will send those to the spam folder these days.
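For the curious, a patch email is just a plain message with a conventional shape. Below is a minimal Python sketch of constructing one by hand; in practice `git format-patch` plus `git send-email` handle the formatting for you, and the list address, file name, and diffstat here are all hypothetical.

```python
import smtplib
from email.message import EmailMessage

# Build a kernel-style patch email by hand. In practice you'd generate
# the body with `git format-patch`, which produces the expected
# "[PATCH]" subject and a unified diff below the "---" separator.
msg = EmailMessage()
msg["From"] = "you@example.com"
msg["To"] = "some-subsystem@vger.kernel.org"  # hypothetical list address
msg["Subject"] = "[PATCH] foo: fix off-by-one in bar_count()"
msg.set_content(
    "bar_count() stopped one element short of the end of the array.\n"
    "\n"
    "Signed-off-by: You <you@example.com>\n"
    "---\n"
    " foo.c | 2 +-\n"
    " 1 file changed, 1 insertion(+), 1 deletion(-)\n"
)

# Sending requires an SMTP server you control; as noted above,
# mainstream providers often flag mail sent this way as spam.
# with smtplib.SMTP("localhost") as s:
#     s.send_message(msg)

print(msg["Subject"])
```

No account, no OAuth, no terms of service: just RFC 5322 over port 25.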
You can browse code on Github just fine (and search within a file!) - you just can't search through the tree, which is a function of whatever platform the code lives on, be it your box or Github or some other web UI.
Do you mean "git clone"? I thought the idea behind git is that you always take the code onto your local machine before doing anything. Sure, GitHub provides some bells and whistles that obviate the need for that from time to time, but those are basically decorations.
Agreed. If this is indeed a resource issue and they want to just block crawlers/bots/anonymous users from using lots of resources then perhaps having the old "less resource intense search" for users who aren't logged in and having the new "more resource intense search" for users who are logged in would be an improvement.
Ideally yes, but the old code search may also require costly infra, like building specialized indexes of the codebase (e.g. an Elasticsearch index).
Since those are "fixed costs" per repo rather than per search, they'd now be much more expensive per search if they were only used by logged-out users.
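To make the amortization point concrete with purely made-up numbers (these are illustrative, not GitHub's actual costs): if keeping an index warm has a fixed monthly cost per repo, the cost per search scales inversely with how often anyone searches it.

```python
# Hypothetical numbers, only to illustrate the fixed-cost argument above:
# a fixed indexing cost amortized over fewer searches costs more per search.
index_cost_per_repo = 10.0  # $/month to maintain an index (made up)

for searches_per_month in (10_000, 100, 1):
    per_search = index_cost_per_repo / searches_per_month
    print(f"{searches_per_month} searches/month -> ${per_search} per search")
```

So if logged-out traffic were the only remaining consumer of the old index, each of those searches would carry the full fixed cost.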
These are prerequisites to what I suspect you have in mind when you consider open source participation. Of which I do a decent amount, logged in of course, but this comes first.
Becoming a member means sharing your personal information with GitHub and abiding by their terms of service. It also implies that they will then track and profile locations where you sign in, technologies you use to interact with their services etc. That's a lot to give for "I want to browse a piece of code someone made for free".
Facebook recently estimated that knowing about a user is worth close to €11/month. That's a lot of money for hosting and indexing git repositories... I guess we need an alternative.
As someone who contributes a decent amount of OSS code on GitHub, I want my code scraped by AI related companies. If GitHub starts making opinionated decisions about how to “protect” my code, then I won’t like that.
> If GitHub starts making opinionated decisions about how to “protect” my code . . .
They already did.
> . . . then I won’t like that.
There is no universe in the multiverse where Microsoft or one of its subsidiaries gives a single flying fuck. Something about those who forget history being doomed to repeat it . . . .
A scraper won't be using search. Only crawling links. If they required login to browse a repository that would stop bots but also cause a 1000x greater riot.
I just assume this to be the case for all future decisions like this. Surely the Reddit API changes were at least partly from this? Reddit is one of the richest sources of (pretty) authentic human interaction that humanity has ever made. Every AI company and startup was going to mercilessly mine this forever. I'm sure Reddit would much rather sell (our) interactions instead.
If they are doing this intentionally, then fuck Microsoft. Bunch of assholes as always. If this is mere incompetence, then again I'm not surprised that they fumbled such a simple task. Grep is literally 50 years old. They don't even have to write any code to do text search, they could use free software if they love it so much. This isn't some amateur startup with a CRUD app and 3 devs. This is one of the largest and wealthiest tech companies in the world, and they can't even get text search to work without a login? What a bunch of losers.
If there is a 3rd explanation, I'd love to hear it.
As for their code search, under the hood it's just Elasticsearch, nothing special. GitHub's explanation sounds like mostly bullshit. Forced authentication of public endpoints is not an appropriate solution for the issue they're claiming. People building bots who actually care about this endpoint can just have their bots log in with free accounts. This doesn't actually prevent load on their servers in any meaningful way.
This may involve running two different sets of (for example, ElasticSearch) clusters, which doesn’t justify itself from a business perspective, especially given that non-logged in users can still clone code and use local tooling for search.
Might also help with thwarting easy access to tokens/keys... especially for those that are not currently in their filtered list.
While a bad actor can certainly scrape or clone the same data, for repos with thousands of files, search is by far the cheapest approach for the attacker while being the most expensive for GitHub.
Aside from performance concerns, there have been many occurrences of bad actors using code search to find hardcoded credentials, and then using that to gain unintended access to additional systems.
I suspect the change has more to do with security concerns (and having an audit trail) than performance.
I think the solution to hard-coded credentials isn't making search harder (security through obscurity == bad) but rather the other things GitHub is doing to detect and mitigate hard-coded credentials.
Search is still possible using Google and other methods, so any theoretical gains from forcing login are dangerous to count on, as vulnerable projects are still vulnerable.
If anything, I think the solution to projects with hard-coded credentials is to make the credentials easier to find and exploit, so they get fixed more quickly after being created. The most dangerous ones are those that are hard to find, so someone can use them for long periods without detection.
Some high-profile cases where credentials were leaked on public GitHub: Uber in 2014 and 2021 [1, 2] and Twitch in 2021.
> Search is still possible using google and other methods
Yes, you can search with Google or other sources. But the thing is, pretty much only GitHub has easy access to all the code present there, readily available to search within minutes of pushing.
You could try mirroring GitHub yourself, but you'd need enormous disk space and bandwidth, and would quickly hit rate limits. You also wouldn't be able to do fast full regex search like you can with GitHub's search, as you don't have their search infrastructure.
Hackers are aware of this and do make use of GitHub's search to identify possible leaked credentials. I recently experimented with uploading a dummy AWS credential pair to a public Git repo, and saw that numerous IPs started trying those credentials less than 5 minutes after I pushed.
> any theoretical gains from forcing login are dangerous to count on as vulnerable projects are still vulnerable
Indeed, requiring login is not going to _solve_ the problem. But it could lessen the impact of leaked credentials at large scale, by making it more difficult for automated systems to harvest them.
Perhaps more significantly, requiring login could give better audit trails in incident response situations, as the logs would indicate which accounts were searching for secrets.
> I think the solution to projects with hard-coded credentials is to make the credentials easier to find and exploit, so they get fixed more quickly after being created
Yes, this can help! There are several other companies that specialize in secret detection. I've also written Nosey Parker, a fast regex-based detection tool that has higher-precision rules than similar tools [4]. GitHub also has its own offering in Advanced Security to address this problem.
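As a toy illustration of how these tools work (a sketch of the general technique, not Nosey Parker's actual rule set), a secret scanner is essentially a set of high-precision regexes run over every blob of text:

```python
import re

# A couple of illustrative patterns. Real tools like Nosey Parker or
# GitHub's secret scanning use far larger, carefully tuned rule sets.
RULES = {
    # AWS access key IDs have a fixed "AKIA" prefix plus 16 more characters.
    "aws_access_key_id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    # Generic password assignments; low precision, shown for contrast.
    "hardcoded_password": re.compile(r"password\s*=\s*['\"][^'\"]+['\"]", re.I),
}

def scan(text):
    """Return (rule_name, matched_string) pairs found in a blob of text."""
    return [(name, m.group(0))
            for name, rx in RULES.items()
            for m in rx.finditer(text)]

# AWS's documented example (non-functional) key ID:
sample = 'aws_key = "AKIAIOSFODNN7EXAMPLE"\npassword = "hunter2"\n'
for name, hit in scan(sample):
    print(name, hit)
```

The precision of the rules is the whole game: a rule with a distinctive fixed prefix (like the AWS one) produces few false positives, while the generic password rule drowns you in noise.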
Yes, you are correct. But requiring that one be logged in to use the search functionality makes a stronger audit trail possible: looking at the search logs would indicate who has been hunting for secrets!
You'd think, with all that work going on under the hood, that they'd be able to support searches for repos by multiple paths:
> Give me repos with project.toml and a flake.nix at the root
The interface supports it, but searches come back empty. I end up leaving scripts running while I sleep which walk search results and narrow them by just checking to see if the files are there.
If you're out there Microsoft, please fix your search so I can stop being a bad citizen.
The new search is still so useless that I just maintain a full local clone of our org's code. Someone was requesting perms to run a script to search across the org today -- it would've taken 10+ minutes to run while respecting undocumented secondary rate limits (which there's no way to check via headers, by the way). I rewrote it to use ripgrep locally, and it's down to 20 seconds.
If that's the case, they should really fall back on a grep of the codebase because whatever is "under the hood" sucks. It routinely returns no results for exact string matches in code. These days I clone the repo and use ripgrep instead of codesearch because it's worse than useless, it's wrong.
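For anyone without ripgrep handy, the same local-clone approach takes only a few lines of Python (a crude, slow stand-in for `rg`, sketched here just to show there's no magic in exact-string search):

```python
import os

def search(root, needle, exts=(".py", ".c", ".go", ".rs", ".js")):
    """Yield (path, line_number, line) for every exact match under root.

    A minimal stand-in for running `rg <needle>` inside a local clone.
    The extension filter is an arbitrary example, not ripgrep's behavior.
    """
    for dirpath, dirnames, filenames in os.walk(root):
        # Skip git internals so we only search the working tree.
        dirnames[:] = [d for d in dirnames if d != ".git"]
        for fn in filenames:
            if not fn.endswith(exts):
                continue
            path = os.path.join(dirpath, fn)
            try:
                with open(path, encoding="utf-8", errors="replace") as f:
                    for i, line in enumerate(f, 1):
                        if needle in line:
                            yield path, i, line.rstrip("\n")
            except OSError:
                continue  # unreadable file; skip it
```

Unlike a server-side index, this can never silently miss an exact string that's actually in the tree.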
They were already rate limiting unauthenticated users.
So we are expected to believe that a hostile user who has a zillion IP addresses (to get around the rate limits) won't also be able to make a handful of accounts?
Yeah but that’s been the case since search existed. It seems unlikely that somehow now that’s become an urgent problem or requiring login will fix anything.
We have been running the Internet without requiring user accounts. Search was possible. Has something changed? Has GitHub been having real problems with bots? No?
Surprised to see this as news as it feels like it's been years at this point. I can scarcely remember when I was last able to search without being logged in.
I understand perfectly why this has been done from a business standpoint as it invariably increases 'user engagement' by driving people to become literal users!
Can we please stop treating GitHub like it's an open platform? It's not. It's a closed, walled garden like every other one. The fact that they host a bunch of open source projects you use doesn't make them better; if anything, it makes them worse, because they've helped these projects wall off a part of their contribution infrastructure behind the lock and key of a corporate account.
If I have to sign up for a "corporate account", or a random self-hosted GitLab account, or your Bugzilla & wiki & gitwhatever... it makes no difference. As a user, it's all the same. Actually, of all of these options, I prefer a "corporate account" (aka GitHub) because I can participate in a nearly infinite number of projects without creating new accounts/logging in/etc.
No one has walled anything off in any way that is materially different than any other option in the space.
I would argue that GitHub has done a LOT of good in the space. Making good software, making it freely available. Keeping it reasonably open and accessible. Keeping it standards-compliant where there are standards. Having APIs for the rest. And in general, giving a huge amount of storage and compute away for free to open source projects.
The beauty with git is that it doesn’t need an account. It can all be done via email. Hosted repos that use that are much more open and allow you to use the tools you want to contribute.
A lot of these walled garden platforms have contributed a ton and github has made source code more easy to host, no doubt. At the same time, we need to ask them to do better and not allow them to concentrate power for when the inevitable enshittification begins.
I think it also needs to be said that GitHub has helped countless open source projects grow: Git hosting, wiki hosting, issues, GitHub Actions, GitHub Pages, a nice API...
There are LOTS of reasons to host your open source project on GitHub
The web has really closed in 2023. StackOverflow, Reddit, GitHub, and Twitter all put the brakes on scraping and API access. The trigger was preventing AI training, combined with a push to increase profitability due to commercial realities (tech recession, new ownership at Stack and X, Reddit wants to IPO).
I believe a long term effect will be a rise in the marketplace for proprietary data. Search engines, AI tools, anyone who needs the data will have to pay for the firehose or API access. (That, in turn, may cause antitrust issues if only the richest companies can afford access to that data.)
... and everybody will just think it's normal for "StackOverflow, Reddit, Github, Twitter" to sell access to content that other people created and own...
If something is truly proprietary, then the compensation should go to its owner, not to some random Git server operator or whatever. And the owner, not the server operator, should be setting the price and terms. If the server needs to make money, it can charge for the basic service itself.
On the other hand, if somebody created something to give it away, on a platform that at that time provided a channel effective for giving things away and advertised itself as suitable for giving things away, then the platform changing the rules midstream is morally piracy of that person's work.
Probably a very large fraction of the material on those platforms would never have been put there in the first place if the actual owners had expected random restrictions and arbitrary access charges.
If they want to radically rewrite the rules like that, then they need to not sell access to any preexisting content unless the original creator explicitly opts in. But we had Reddit, for instance, actively reinstating posts that had been mass-deleted by their authors specifically to prevent that kind of abuse.
To be fair, GitHub still has the vast majority of features available as public anonymous API endpoints, other than code search. It's just that they are rate-limited a lot more aggressively. And making a GitHub account is free and not terribly intrusive. Even then, all APIs are rate-limited even if you are logged in, as they don't want you to kill the servers (that's why tools like https://gitstar-ranking.com/ have trouble, since they rely on free accounts to scrape all repos' GitHub stars).
As for the actual important data (source code), you can simply clone the repo anonymously using HTTPS endpoints, and do your code search there (e.g. there are third-party websites like https://grep.app/ that can do this).
The issue with say Reddit/Twitter is that the source data (posts/comments/upvotes and tweets) is no longer available via APIs, making it impossible / hard to build third-party tools. With GitHub, the source data is still easily accessible, with search (the auxiliary layer) disabled for non-logged-in users. I think they are quite different situations and mixing them together is not useful.
Yes and no. ExpertsExchange was notable for having "walled" responses and was similar to StackOverflow. In fact, I remember when StackOverflow launched, it was compared to ExpertsExchange but "open".
It is time to move from centralized services to something more distributed and "microtransaction-based". I know a lot of people on HN will dislike this, but all the ones you mentioned (Q&A forum, news link + commenting aggregator, Git hosting + approval flow + wiki + ticketing system) can and should be implemented in a completely distributed manner, using Kademlia/BitTorrent technologies, IPFS, and crypto tokens for paying microtransactions.
Going 100% distributed is the only way these sort of things are going to stay "open" and free of corporate greed (we have seen time and time again, original founders may start the project with good intentions, but in the long term, the product gets sold and bean counters take the lead).
A charitable interpretation might be that search requires a fair amount of compute, and is therefore a big denial of service vector.
I am not sure how much behavioral data GitHub can gather from logged in user, and how useful that is compared to the code that is there anyway. Maybe to figure out which parts of code are important? But that isn't really user-specific.
Yes, it's a real problem for anyone offering any sort of search capabilities. Like, about 0.5% of the traffic to my search engine is human. I'm not aware of any search engine that doesn't have similar stats.
Well about 99% of the search requests I got back when I was using cloudflare couldn't get past their bot-mitigation, and of what made it through, at least half looked very automated.
I'm a human and I can't get past cloudflare "bot mitigation" with my browser. Bot mitigation actually just means your browser executing the latest bleeding edge javascript functions to make sure your behavior is monetizable.
No that's not actually true at all. The website always worked with text-only browsers, cloudflare or not. Thoroughly tested with the likes of w3m and dillo.
Virtually all of the traffic that was intercepted claimed to be modern Chrome or Safari or similar, which should be capable of "executing the latest bleeding edge javascript functions".
The primary reason why anyone gets shit from bot mitigation is IP reputation, this is far more important (and effective) than looking at browser characteristics.
I'm a human, yet I am unable to get past Steam's captcha. It is not the only site where I cannot prove I'm not a robot. I'm guessing the amount of collateral damage is worth it to them. I'm not a big gamer, and wouldn't be a big source of revenue for them anyway.
This (behavioural data) is precisely Microsoft's playbook; no charitable interpretations ought to apply. As far as I am concerned, no Open Source project has any justification for still being on the platform as of the day of the MS buyout. It's not as though there aren't good alternatives just a git clone away.
> This (behavioural data) is precisely Microsoft's playbook
What behavioral data can you glean from a code search like Github's? The context is very different than, for example, Google's, so is there really much useful data you can get here?
From a code search in the wild, with no context? Not a lot. From a code search from a person who's logged in, identified? Well, probably still not a lot, but it's another factoid about that person to hang onto the knowledge graph.
Another factor: anonymous faceted regex search across a huge volume of code allows bad actors to find hardcoded credentials and gain access to additional systems, without a good audit trail.
But yes, there are multiple good explanations for why they would lock down the API.
I've been using Sourcegraph when I'm not logged in, which also has the bonus of being a lot better (at least compared to the old search).
It's as simple as appending the repo URL, starting from github.com: e.g. sourcegraph.com/github.com/rust-lang/rust/ (you can try searching for unit on this repo)
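Since the mapping is purely mechanical, it's trivial to script; here's a small helper (a sketch assuming exactly the URL pattern described above):

```python
def to_sourcegraph(github_url):
    """Rewrite a github.com repo URL into its sourcegraph.com equivalent
    by prefixing the host, per the pattern described above."""
    prefix = "https://github.com/"
    if not github_url.startswith(prefix):
        raise ValueError("expected a github.com URL")
    return "https://sourcegraph.com/github.com/" + github_url[len(prefix):]

print(to_sourcegraph("https://github.com/rust-lang/rust"))
# https://sourcegraph.com/github.com/rust-lang/rust
```

Handy as a redirect rule or bookmarklet if you find yourself doing this often.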
I don't see a real issue with forcing you to make a free account and log in. After all it's also permissible to only offer source code in the form of shipping you a CD in return for postage fees. If people don't like how the code is provided they are free to upload it somewhere else.
A paid subscription could be crossing the line. Or maybe it would be fine as long as no profits go to the entity that provided you with the binary. Hard to tell.
It requires accepting additional terms, which seems pretty likely to be a breach of the obligation to provide the sources (if it's the only way to obtain them).
Yes. Open source doesn't require that you distribute the software to anyone. It only requires that the people you do distribute it to (if any) all have access to the source code under an open source license.
If the binaries are publicly available anywhere the sources need to be as well (or at least be provided upon request, but I don't think many developers would like to deal with that)
API usage, yes, as it requires a token, but `git clone` requires no auth and still seems to work from datacenter IPs (e.g. Mullvad), meaning there is little stopping someone from mass cloning.
It's a perverse reality but it seems that in order to keep some ecosystems open one has to take actions which are resource-wasteful (though I would argue in the larger picture it will save resources)
Yes. You can put a piece of code behind a paywall even. The only requirement under GPL and similar copyleft licenses is to make the code available to those you make the software available to (which can be only paying users, if you so wish), and to allow them to redistribute it under the same terms (which tends to mean that if you do make it pay-only, one of your users can just publish the code if they wish). Absolutely nothing in any commonly used open source license requires anyone to post something publicly and freely.
Yes (though the GPL does allow you to make the code available via post and charge a reasonable postage fee), if they're a user in the first place. You don't need to give the code to anyone you didn't distribute the software to.
Even if I agreed with the position of person who posted the question (which I don't really, because search is not necessarily cheap and could be used for a DOS style attack), why do people feel the need to be like this? Like, what benefit is saying, "Is it not enough to monetize every bowel movement, you now feel the need to track which individual lines of code I'm browsing?" here? Chill out.
The underlying argument isn't even that good. Repos themselves are _not_ gated behind a login, so an open repo can still be accessed publicly, including cloning it. And if the point is that the author wants his repos and code to be useful to the public, then any interaction beyond downloading and searching the code, such as creating an issue or making a PR, would require that contributor to log in anyway.
Yes, there is very often a need to quickly search in source code (to figure out how something works, to discuss about it in places other than GitHub issues...).
Cloning is usually fast but not with very large repositories, and you might not even have enough space on your current device to clone them.
By the way, if a lot of people end up cloning every time instead of logging in (after all, logging in to GitHub is a pain with the 2FA), I don't think their systems will see less load.
And not everyone might want to have a GitHub account
HackerNews: Can no longer view the frontpage without seeing a complaint about a free and ad-free service not being provided to users who are not logged in.
If it's free, then you are the product. That's an adage that holds true in most cases, and in this case, your code is being used to train Copilot, among other things. Even if that weren't the case though, and the service really was as benevolent as you seem to believe it is, would you be sitting there clapping as the service gets worse for a decent subset of users? Is that exciting and interesting to you?
Even if your open source code is on GitLab, it can still be used to train Copilot. Most permissive licenses allow training AI models on the code. Any license that forbids AI training would not be considered "open source" under most definitions.
Being on GitHub just makes it easier for them as they don't need to manually clone it from a different service.
Do you have any references for this claim? I browse HN pretty much everyday, and "I can't do X without logging in" doesn't seem to be a popular type of post. The only example I can think of is Reddit's API changes, which were a while ago and loosely related to logging in.
If folks want to continue searching open source, https://sourcegraph.com/search does not require sign in and also includes major projects that are not on GitHub.
Thank you. I don't want to sign into my personal github account on my work computer, so I always use sourcegraph if I need to search a repo.
I remember longing for the ability for github's search to rival git clone + grep, but I never expected a login wall to come with it. IIRC I expected the login wall to just be because the feature was in beta and would be removed when it became the primary search.
The bots can still do a git clone and index everything, this just inconveniences normal users working on "some other" PC (or browser or incognito tab), where they are not logged in, and/or don't want to log in (coworkers PC, 2fa, whatever).
> The bots can still do a git clone and index everything
Of course they can, but then they're gonna be chewing up their own disk space and bandwidth for anything after the initial hit. I think the real problem is that the bots hit the GitHub servers over, and over, and over again.
A lot of bot traffic is just mindless "follow any link" traffic, not specialized bots to do X. It really is hugely pointless and wasteful to have tons of these bots request tons of comparatively expensive search links.
Maybe if the bot operators have the resources, but it's far from trivial to keep an up-to-date mirror of every project on Github, especially if Github is actively putting up barriers to prevent it. Once login is required, it becomes much harder to bypass rate limits because the company can rate-limit signups from unknown domains, enforce 2FA, etc.
They can already do all this (rate limits, etc.) for unregistered users, and most users would never notice (since a human only does a few searches per unit of time), but they decided to require a login anyway.
I run a small website that tracks releases of a piece of software; it probes releases every few hours, with each run consuming several hundred API requests.
GitHub API token limits are pretty generous. There are no paid offerings to increase the limits, so I suppose if you ask GitHub nicely, they will increase them given a reasonable justification.
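For anyone running a similar poller: GitHub's REST API reports its remaining budget in response headers (`X-RateLimit-Remaining` and `X-RateLimit-Reset`, the latter an epoch timestamp, per GitHub's documentation), so a client can pace itself instead of getting cut off. A minimal sketch of that pacing logic:

```python
import time

def seconds_to_wait(headers, now=None):
    """Given GitHub-style rate-limit response headers, return how long to
    sleep before the next request: 0 while budget remains, otherwise
    the time until the X-RateLimit-Reset epoch timestamp."""
    now = time.time() if now is None else now
    remaining = int(headers.get("X-RateLimit-Remaining", 1))
    if remaining > 0:
        return 0.0
    reset = float(headers.get("X-RateLimit-Reset", now))
    return max(0.0, reset - now)

# Example: budget exhausted, reset 30 seconds in the future.
print(seconds_to_wait({"X-RateLimit-Remaining": "0",
                       "X-RateLimit-Reset": "1030"}, now=1000.0))
```

Calling this after each response and sleeping the returned amount keeps a batch job inside the limits without hardcoding request rates.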
I can't think of any software I've ever used that I was this concerned about release schedule. Even if I was a user of your site, I might check it daily, but even that is doubtful.
Searching code server-side (especially with the newly added intellisense-ish features) is probably expensive, yet they think they deserve such a feature for free, when code searching/browsing isn't GitHub's primary goal.
Why not either learn to use local code search tools and clone a repo, or just log in and search then? Are either of these things such a hurdle?
I sympathize that the person opening the post is frustrated by the chain of PEBKAC events that unfolded, but that's not really an issue with GitHub.
The issue opener is behaving like an entitled toddler by making his problems "societal" problems. Honestly, the lack of self awareness with some people is astounding.
My problem with code search is that when I’m searching I’ll randomly get dumped to github.com/search with the search blanked, and anything I put in the search box at that URL is ignored. I have to go back in my history until I’m in the normal GitHub UI again. My search has also been discarded there, but at least I can interact with it.
This is pretty disappointing on GH's part. They seem to be moving further and further away from things that made the platform great.
https://sourcegraph.com/search is a pretty powerful alternative: it lets you search codebases without logging in, and across different code hosts.
You all talk a big game, but your actions are ineffectual. You don't consider the leverage you have to "encourage" Microsoft to turn public GitHub search back on.
Real leverage looks like all of us banding together to boycott Microsoft's most profitable income streams: not GitHub, but cloud services like Azure, Office, gaming, etc.
It's about making unilateral ensh*ttification decisions like this so expensive for parent companies that they become unthinkable. The board and investors sometimes need to be reminded how expensive these decisions really are.
We've all seen stock prices tank by half in one day since the Dot Bomb. A weak signal starts a big wave in this age of algorithmic trading.
A high-profile company publicly switching to AWS/GCP to avoid eating MS's sh@t sends a strong signal.
More importantly, after everything that MS has put us through over the decades, divesting is FUN! :-)
I mean, GitHub is still by far the most open-source-friendly repository hosting service, by orders of magnitude. I wish we had code search back for anonymous GitHub use as well, but the simple fact is there isn't another service that comes close to providing free service to open source software. Now, I know that GitHub built its reputation on hosting open source software, so it's probably not fair to compare other services like Bitbucket or GitLab to it, but I'm not sure what people think they can divest to.
as an alternative, you can add `1s` after `github` in your address bar (i.e. `github1s.com`) to open the repository in a browser-based VS Code, and then use Ctrl+Shift+F to search across all files
Why do it via a third party if you can just change github.com to github.dev and get first-party VSCode? Or better yet, just press the "." (dot) key and VSCode will pop up.
But that is repository-scoped search only, of course, not GitHub-wide.
I see a lot of people upset about the openness of the web or whatever, and I am too, but I don't see many who point to the specific harms this causes. Of course, this leads others to believe that the complaints are nitpicky and lazy. Since I use this feature heavily, I feel like I should weigh in on why this really sucks.
First off, I have a GitHub account. But I'm often in contexts where signing into it is annoying, and intentionally so (it lets me push code!). I usually don't want it on my work machine, even though I search GitHub a whole lot from there. Some of that is directly related to my job, and you could plausibly argue that my employer should somehow compensate GitHub for a free service that indirectly makes them money. But a lot of it is literally my work computer being a second machine that I work on, with open source code running on it, and usually these searches are about contributing value back to the community: either because I want to file a bug later, or to decide whether it's worth my while to contribute code. And I do this all the time from other devices, too: I might have logged into GitHub on my phone once, but my iPad? My mom's computer? Being able to search GitHub is an excellent way to answer people's tech questions at any time; if Google required sign-in everywhere, it would be a similarly massive pain.
Second, and also quite important, is that I can't use GitHub links as a way of pointing people at code anymore. I frequently (check my history!) will post a comment like "yeah the Foo project uses bar API a bunch like this, [GitHub search link]". It's a very quick and very direct way of sharing this information. Of course I have no idea if the people on the other side are logged in or not, which means that if you put them behind a login wall I will slowly stop sharing these, because people will complain that they can't see what I sent them. It's the same way I hesitate to send people links to Twitter/X these days, because whether the content will be accessible is a coin flip. And I'm definitely not going to ask people to clone the repo (on what, their phones?) to see what I'm seeing, so I might as well link to Sourcegraph instead.
This issue blows things way out of proportion. The code is still freely available; GitHub just doesn't want to give away resources for free, because it's a for-profit company.
If you're mad at a for-profit company for not giving away things for free, you're simply delusional.
> This issue blows it way out of proportions. The code is still freely available, Github just doesn't want to give away resources for free because it's a for-profit company. If you're mad at a for-profit company for not giving away things for free, you're simply delusional.
GitHub is still giving away resources for free, they've done it forever because it was their business model: attract as many people as possible through open source repositories, a fraction of which will then use their non-free services (and more recently, also to train AI models).
Now that they're so dominant (and owned by Microsoft) they can afford to worsen the services, but people have every right to be pissed about it and push developers to move their repositories.
I have a simple guess. We are in a post zirp (zero interest rate policy) world. I've seen very large companies embracing degrowth.
Quite possibly Microsoft told its subsidiary GitHub that it needed to spend less money, and this is how they save resources while impacting the users they care about the least.
You also can't sort code-filtered search results by last-indexed anymore. The removal of this feature imposes a new tax on searching popular domains of work involving rapidly changing APIs and library preferences, such as deep learning.
GitHub search is not exactly supported by ads. It was a much more, uh, baffling decision to limit public access to Twitter, which literally serves ad impressions. I guess I can't comprehend that level of business genius.
Controversial opinion: when a service provider makes antagonistic changes to its service, it's actually a good thing, as it pushes people to find different solutions and not depend too heavily on one service provider.
I imagine Microsoft views the code on their platform as an asset, since they can use it all to train AI that they can sell. They don't want anyone else doing the same thing with the code on their platform.
Yes, this is standard now on Github. I noticed this weeks or even a few months ago. Now I often resort to cloning repos and running my own dumb grep. Thanks for being helpful, Github.
Yes, but it does come with some checks and balances: an e-mail account (letting them filter out dodgy domains or throwaway services), a captcha (bot detection of sorts; it may be hidden, and could also be used for public search), the barrier to entry of their signup procedure, etc. And then they have a unique token, your username/ID, to help detect flooding/scraping and to set a limit of e.g. 100 searches per hour, or whatever they've deemed normal and acceptable behaviour.
To be fair, I’m surprised it was open in the first place. For a website that does not have any ads, having its enormous amount of data not only readily available but also searchable felt like a bad choice.
What I hope never happens is the actual enshittification of the site for logged in users.
everyone here will forget we only have encryption today because OpenBSD was in Canada.
every USA-based project had to quit shipping any cryptography code. but who cares about history. who cares that you can only participate in some open source projects today if you have an account with USA companies? nobody cares. screw the Iranian engineering student. and hope the USA doesn't outlaw encryption again.
For training LLMs, it seems asinine to scrape search results instead of just making a clone.
The only automated searches I could envisage MAYBE happening are those looking for vulnerabilities, and there are probably only a few dozen actors doing that.
State actors seem actually more likely to use clones of everything.
I stopped being able to do GitHub code search without being logged in several months ago. The search bar would search through open issues; clicking code on the left in the results prompts me to login. Has been this way since at least June.
Yeah, I used to use GitHub to search for code when I couldn't remember which repository it was in, by searching for keywords that I could remember, but that doesn't work anymore.
I still can’t get over the fact that GitHub blatantly disabled regular search in favour of some Git mumbo-jumbo nonsense that forbids you from looking up recent code. With any new technology, be it an API or otherwise, you can’t explore it on GitHub because “search by recency” is no longer a thing.
What an asinine thing to do. My GitHub usage has dropped to exactly 0% for this very reason. I know I am not alone, either, since their forums are filled with complaints about this.
Not going to use any strong words but I want to. Disgraceful UX choice.
They probably want to tie searches to user info (what people search for, what they use, etc.) to better filter good/bad and popular/unpopular code for fine-tuning things.
If people can search anonymously, it gets a lot harder to data-mine.
If you wished to build an ML bot that taught switches/loops, then instead of cloning every repo, all you'd need to do is search for switches/loops within X language.
Much less so. Cloning can easily be parallelized without running into rate limits of the proprietary search API. And once you've cloned a repo, it's on your system and you won't have to talk to GitHub again. So much more sensible.
Sensible isn't a thing when it comes to malicious activity. Same could be used to cheat the search.
Downloading a repo without even knowing whether it contains the syntax you wish to learn is still going to consume more resources than having a webpage with all the search results thrown at you.
To those arguing that GitHub needs to do this because search is resource intensive, why can't GitHub put their code search behind something like Cloudflare Turnstile?
I actually built my own code search tool for use with GitHub repos, but I've mostly stopped using that because the new GitHub code search is so useful by default: https://simonwillison.net/2020/Nov/28/datasette-ripgrep/