I would be willing to bet that the driving force behind the decision was to make it less trivial for LLM vendors to say "the data was already there under an open license, so we legally undercut Stack Overflow".
The fact that everyone is hoarding data because they think there is a gold rush afoot is obvious. Everyone with loads of data is clamping down, hoping they can get a cut of those AI VC dollars. Except for Wikipedia at least.
But let's be real about the morality here: Stack Overflow is a badge-powered mechanical Turk. It uses 100% unpaid labor to go and search Google for answers and post them on SO, providing a "service"[1]. For it to moralize about the ownership or sanctity of data is irony.
[1] - There are exceptions, obviously. There are true experts who wander the virtual halls of StackOverflow and dole out wisdom. But overwhelmingly it is clear that answers primarily come from people who rush to Google and then copy/paste from blogs and tech papers. And while Stack Overflow dumps are CC because that's the agreement that it made with contributors, a lot of the content on the site was ripped without attribution and in defiance of IP. So...maybe not too many tears for SO.
> There are exceptions, obviously. There are true experts who wander the virtual halls of StackOverflow and dole out wisdom.
This is what makes SO valuable for me. Every year it seems like "trust but verify" requires a little more emphasis on the verify part, so it's extremely valuable when I run into a true expert where I know I can take their input at face value and rely on it.
I think the winners of the AI gold rush will be the ones that figure out how to help users assess the trust they should be putting in the information being surfaced. The problem I see with the current ChatGPT, etc. systems is that they seem to treat visibility and popularity as credibility and expertise when that's not the case. In my experience, the real experts don't say anything unless they can add to the conversation and there's never a lack of low quality information from uninformed participants.
For example, via ChatGPT...
Q: How do I protect against the scrub of death with ZFS?
A: The term "scrub of death" is commonly used to refer to a situation in ZFS where a data scrub operation can lead to catastrophic data loss due to undetected errors or issues. While ZFS is designed to provide data integrity and protection, it is still important to take certain precautions to minimize the risk of encountering such a situation. Here are some steps you can take to protect against the scrub of death with ZFS:
Except it's not a real thing [1]. For bonus fun, read the comment right after Ahrens'.
I don't understand why SO doesn't lean into that. Anyone can train an LLM on the raw data, but SO has the information needed to do a better job of ranking the quality of the inputs, so wouldn't they be able to build an LLM that's significantly better than anyone else with the same raw data? Understanding the quality and reliability of an answer is far more important to me than getting an answer.
What's more frustrating than getting an answer on a programming question and taking hours to figure out that it was complete BS and doesn't work as described?
I don't know much about LLMs, but, if I were SO, I'd be figuring out how to lock down the ranking information as quickly as possible because that's where the value is. The ranking and acceptance of answers, alongside tags, overall user rank, participation frequency, etc. should mean that SO has a significant advantage when it comes to ranking and weighting the input data, right?
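To make the weighting idea concrete, here is a rough sketch of how votes, acceptance and reputation could be folded into a per-answer training weight. The `Answer` fields, the `training_weight` function and every constant in it are made up for illustration; nothing here reflects how SO or any LLM vendor actually weights data.

```python
import math
from dataclasses import dataclass


@dataclass
class Answer:
    score: int              # net votes on the answer
    accepted: bool          # whether the asker accepted it
    author_reputation: int  # answerer's overall reputation


def training_weight(answer: Answer, accepted_bonus: float = 1.5) -> float:
    """Hypothetical heuristic: turn ranking signals into a sampling weight."""
    # Answers the community voted below zero are dropped entirely.
    if answer.score < 0:
        return 0.0
    # Log-scale the votes so one viral answer doesn't drown out the rest.
    vote_term = 1.0 + math.log1p(answer.score)
    # Reputation gives a small boost (assumed factor of 0.1 per order of magnitude).
    rep_term = 1.0 + 0.1 * math.log10(max(answer.author_reputation, 1))
    weight = vote_term * rep_term
    # Accepted answers get an extra (assumed) multiplier.
    if answer.accepted:
        weight *= accepted_bonus
    return weight


expert = Answer(score=250, accepted=True, author_reputation=120_000)
drive_by = Answer(score=1, accepted=False, author_reputation=15)
print(training_weight(expert) > training_weight(drive_by))
```

The point is only that these signals are cheap to combine once you have them, and a scraper working from the raw text alone doesn't have them.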
I want the input from subject matter experts to count the most and SO has the best data set to provide that. I don't see the point of locking down the content when the real value is in the ranking. It's odd that SO doesn't see that considering the entire network is modelled on that idea. Maybe they do and there are bigger changes coming down the pipe.
I think the real debates are going to come in the future if SO releases a paid LLM product that's trained on community contributed content and rankings.
So, so much decentralized tech never gets adoption due to a lack of an identity management layer that nobody wants to build because it can’t be perfectly decentralized and have the account recovery features that 99% of regular folks need. This is an example where perfect is the enemy, nemesis even, of good.
Someone should build an identity system that is optionally centralized or federated (if you like your key custody, you can keep it), migrateable and that ONLY handles identity. That will still be orders of magnitude better than relying on Google, Twitter and friends, simply because there won’t be a glaring conflict of interest of platform rent-seeking.
Moreover, anyone who wants to build decentralized/federated apps doesn’t have to reinvent the wheel poorly. It’s so sad to see project after project fading into the ether because people can’t fucking sign in in a reasonable way.
At least with crypto currency, there’s a somewhat strong argument for individual key custody, but I’m not talking about protecting $20M while on the run from the feds, I’m talking about afternoon shitposting with friends and strangers.
Ahaha.. is this a serious post? I'll take the bait.
If you want to shitpost with friends and strangers, then there is no realistic purpose for identity management, since the main goal is to remain anonymous and true anonymity comes by default on nostr.
If you do want to protect your identity, then protect your keys. In case you missed the last few months, there are browser extensions that do not grant access to private keys, similar to MetaMask and other crypto wallets.
All of these are battle-proven technologies with several years of practice and success in keeping private keys private. You should know that; the question is why don't you know that, or more frankly, why won't you know that.
I think you’re misunderstanding my point. I’m not saying key custody is infeasible. I’m saying current solutions aren’t working for average people, i.e. non-techies who don’t even know what a private key is. Do you disagree with this?
If you agree with this premise, that also rules out browser extensions as a universal solution because most users are on mobile. They also have multiple devices and somewhat frequently forget their credentials. Nostr is amazing and if you read my history I have only good things to say about it. But that doesn’t mean that the UX works for everyone, and I simply argue that key custody is a recurring issue in practice.
(Btw, by shitpost, I mean any random discussion such as the ones here on HN or Reddit. Not 4chan.)
Please notice that the large majority of MetaMask users are complete non-techies. The rules of this mechanism are explained from the beginning: keep your keys somewhere safe. This hasn't been an issue for years now.
I'm sure you can agree that having someone above users with access to their private keys is a serious point of failure for user privacy. That is exactly the reason why nostr remains strongly out of reach when compared to government controlled media.
How does nostr handle determined activists with establishment backing? What if a nostr user posts information an activist wants censored and the activist goes around threatening relays, their hosting providers, anyone connected to people operating the relay?
There is a quintessential feature that distinguishes nostr from any other social network of the past decades: Your private key proves that you wrote a specific text.
Assuming the scenario where a user is basically chased away from major western relays, he can still continue writing new texts with his private key. As long as some relay located somewhere (e.g. China, Panama, Moon, ..) accepts his texts, then others will still be able to read and know it was from that specific person.
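For anyone curious how that works in practice, here is a rough sketch of the event id calculation as I understand it from NIP-01: the id is the SHA-256 of a compact JSON array containing the pubkey, timestamp, kind, tags and content, and the signature is a Schnorr signature over that id. The helper names and the sample values below are made up, and the signature check itself is left out because it needs a secp256k1 library. Treat this as an illustration for ordinary text, not a full client (NIP-01 has its own rules for escaping unusual characters).

```python
import hashlib
import json


def event_id(pubkey: str, created_at: int, kind: int, tags: list, content: str) -> str:
    """Recompute a nostr event id: sha256 over the compact JSON serialization."""
    payload = [0, pubkey, created_at, kind, tags, content]
    # No extra whitespace, UTF-8 output (non-ASCII characters are not \u-escaped).
    serialized = json.dumps(payload, separators=(",", ":"), ensure_ascii=False)
    return hashlib.sha256(serialized.encode("utf-8")).hexdigest()


def id_matches(event: dict) -> bool:
    """Check that an archived event's content still hashes to its claimed id.

    This only catches tampering with the fields; confirming the author also
    requires verifying event["sig"] against event["pubkey"], not shown here.
    """
    expected = event_id(
        event["pubkey"], event["created_at"], event["kind"],
        event["tags"], event["content"],
    )
    return expected == event["id"]


# Placeholder values for illustration only; not a real key or event.
pubkey = "ab" * 32
note = {
    "pubkey": pubkey,
    "created_at": 1700000000,
    "kind": 1,
    "tags": [],
    "content": "hello from some relay on the Moon",
}
note["id"] = event_id(pubkey, note["created_at"], note["kind"], note["tags"], note["content"])

print(id_matches(note))                                  # unchanged copy
print(id_matches({**note, "content": "edited by relay"}))  # altered copy
```

Because the id commits to the pubkey and the content, and the signature commits to the id, it doesn't matter which relay or archive hands you the event: a modified copy simply fails the check.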
There are other ways to censor a person/nostr: 1) Block all traffic related to nostr relays at provider level inside a country. 2) Make it illegal to use nostr since it is "unregulated communication media". 3) Spam the network with hideous/horrible content, and then market the protocol as "darknet" only used by criminals or mentally ill people.
All of these tactics are used often. The thing about nostr is that texts don't live just on relays and anyone can easily archive them. This means that history by specific users can be kept and safeguarded for the future. That is mostly the reason why I like it so much. We only know detailed history thanks to the records that survived to the present day. Any closed platform eventually closes down their data (e.g. reddit, stackoverflow, twitter, etc) but in practice this is the same as denying access to our collective online history. Nostr will survive woketivism or any other *isms.
Maybe we won't even have to wait for LLMs to destroy the web we used to know.