X/Twitter has updated its terms of service to let it use posts for AI training (stackdiary.com)
345 points by skilled on Sept 1, 2023 | 287 comments



Musk has been fixated on this idea that Twitter is a huge treasure trove of data for AI training, often complaining about AI companies using its data for training purposes. I had just assumed that companies like OpenAI were just crawling the web, which included Twitter, rather than targeting Twitter in particular. Is the Twitter data really that valuable for training AIs? What particular qualities does it have that make it particularly useful compared to any other freely available data set?


(1) Twitter's data is accurately timestamped, (2) there's new data constantly flowing in talking about recent events. There's no other source like that in English other than Reddit.

AFAIU neither of those is relevant to GPT-like architectures, but it's not inconceivable that a future model architecture might take advantage of them. Purely from an information-theoretic POV, there are non-zero bits of information in the timestamp and relative ordering of tweets.


> There's no other source like that in English other than Reddit

1) Facebook Posts/Comments, 2) Instagram Posts/Comments, 3) Youtube Comments, 4) Gmail content, 5) LinkedIn Comments, 6) TikTok contents / comments

X and Reddit are definitely valuable, but they're definitely not unique. I think Meta and Google have inherent advantages because their data is not accessible to LLM competitors and they have the actual capabilities to build great LLMs.

Unless X decides to tap AI talent in China, they're going to have a REALLY hard time spinning up a competitive LLM team compared to OpenAI, Google, and Meta, which I think are the top three LLM companies in that order.


The discourse on Facebook & Instagram is significantly different than what you would find on X or Reddit, both in terms of quality & topics.


Facebook groups have, weirdly enough, had a bunch of quality discussions similar to Reddit. Can't speak for Instagram, but FB groups are worth peeking into to follow your favorite software projects.


My local city/town Facebook groups are surprisingly good. Like on Reddit there's always the specter of sketchy weird things happening behind the scenes with the mods/admins, but the day-to-day experience is very much that of chatting and sharing with my neighbors.


Fandom/topic or hobby/writing groups on Facebook are better quality discussion venues than Reddit if you can accept seeing some very obvious instances of spam posts and spam comments.


I haven't noticed a significant difference in discussion quality between The Platform Formerly Known As Twitter and Youtube/Reddit for most topics. Maybe I'm cynical, but the vast majority of public communication on the internet seems to be roughly the same quality in my mind.


> the vast majority of public communication on the internet seems to be roughly the same quality in my mind.

I mean, could it be that it's just that the platforms you're familiar with are similar quality? There are major quality differences. Consider, for example, HN vs Instagram. Do you really see no difference in the quality of discourse, or do you just not use Instagram?


> vast majority

By bulk/raw data volume, I'd say that the vast majority of internet communication is the same quality, yeah-- I'll stick by that assertion. That's not at odds with acknowledging there exist locations where intelligent communication happens. My position is just that the signal to noise ratio is pretty bad in the majority of places.


Not sure which way that cuts


> 3) Youtube Comments

Is probably LLM poison.

> 4) Gmail content

Is huge but also has enormous privacy issues. Most people by default assume their emails are reasonably private, whereas most people wouldn't assume their comments on these platforms are private.


That totally depends on the youtube channel. The tech and project channels I follow have excellent comments, often better than HN and proggit. Eg:

https://www.youtube.com/@HyperspacePirate

https://www.youtube.com/@scottmanley

https://www.youtube.com/@3blue1brown

https://www.youtube.com/@reps

https://www.youtube.com/@AppliedScience


Hiring a lot of experts from China sounds extremely politically challenging.


[flagged]


Oh give it a rest, will you? Hiring AI "also-rans" or whatever term satisfies your sense of national chauvinism then.

For kicks I tried searching for news about "Chinese AI experts" and found the CCP has apparently infiltrated august institutions such as the Financial Times and Harvard Business Review (here, for instance: https://hbr.org/2021/02/is-china-emerging-as-the-global-lead...). Maybe you can go pester them.


I am 100% certain that there is at least a single AI expert in China.

This is bait, right?


Of course, the issue is they don't have an AI industry. The idea that they do is a CCP talking point they've been pushing lately. They're pumping out fake AI crap all over the internet.


That has to be the fastest I’ve seen someone concede a statement is true that they just called “CCP propaganda” a moment ago.


How is AI talent from China related?


I assume it’s for access and expertise into the “Sinosphere” of knowledge. Wherein the other side of the coin is the Anglosphere in the west.


Just take an aggregate of every news article written in the top 100 newspapers.

That's high quality content, timestamped and about current events.

There is very little content on Twitter that compares in quality to one well-written news article.


I would assume this is simply not enough data. You also have access to all public domain books and they usually make up a small fraction of the data used to train a model from scratch. For fine tuning a model, having a unique source of high quality data is probably valuable even if small.


There is information posted to X which the top 100 newspapers are not willing to / do not care to publish.


As we've learned in recent months, it goes both ways.


It could also be argued that the Twitter firehose requires substantial RLHF, de-biasing and moderation controls because of its colloquial nature.


Safety/Alignment researchers have been too fixated on making the one perfect LLM that has zero bias (or, arguably, one preferred bias) in my opinion.

I don't think Musk is the type of person to make the same mistake, so we'll either end up with a Twitter LLM that accurately represents the sum total of the Twitter firehose, and/or many derivative LLMs each having a set of, possibly orthogonal, biases. Honestly, I think the latter is preferable and would represent the diversity of opinions in reality more accurately.

Given the data source, I think it will be important to be able to switch between LLM personalities in the future to get the "crowd truth".


> Twitter LLM that accurately represents the sum total of the Twitter firehose,

We need an xkcd showing a conversation between twitter, reddit, and hacker news based LLMs. Political rage meets memes meets pedantry.


Wire news services come to mind as an alternative.


Perhaps, but the quantity of data is comparatively minuscule.


The quantity of information is probably higher though :)


Of course, but for training data current LLMs seem to need quantity above all else.


I’m sure it depends on the type of prediction task, right?

Current LLMs are trying to predict typical human prose from samples pulled from the internet. So it isn’t as if they are sacrificing quality for quantity. A bunch of text from the internet is a very good representation of typical human prose. Whether it is well written or the descriptions contained in the prose accurately represent, like, actual physical reality is another issue.

Maybe they want to predict something with, like, less dimensionality but more utility than a paragraph of fiction.


Agreed, not that anyone really seems to care so.


> there's new data constantly flowing in talking about recent events.

In which the distinction between "data" and "information" is crucial. Especially now that the "floodgates" have been re-opened regarding misinformation, bots, impersonators and the likes.

Data is crucial when you need a training corpus. But information is crucial when the training must be tuned, limited or just verified.


What about HN?


Probably HN is already part of The Pile [1]?

I guess X is harder to scrape without permission.

[1] https://pile.eleuther.ai


It is indeed:

We introduce new datasets derived from the following sources: PubMed Central, ArXiv, GitHub, the FreeLaw Project, Stack Exchange, the US Patent and Trademark Office, PubMed, Ubuntu IRC, HackerNews, YouTube, PhilPapers, and NIH ExPorter. We also introduce OpenWebText2 and BookCorpus2, which are extensions of the original OpenWebText (Gokaslan and Cohen, 2019) and BookCorpus (Zhu et al., 2015; Kobayashi, 2018) datasets, respectively.

From https://arxiv.org/abs/2101.00027 (The Pile: An 800GB Dataset of Diverse Text for Language Modeling)


...HackerNews...

And there's my incentive to stop posting on HN.

It's been a blast, guys. I'm going back to lurker mode.


> > Probably HN is already part of The Pile

Everybody smile for the camera, or we could just moon them, or both!


HN is a very specific demographic and probably orders of magnitude smaller in scale. Similar, but also not really comparable.


Smaller in scale, yeah, but probably high bias towards technical content and written by people who (mostly) care about how to write properly. There are at least 37,354,035 items that could be indexed, a lot of them high quality; percentage-wise, probably higher quality per post than Twitter, Facebook and other sources.


I think Twitter is not a great place to dig out training data in general. Most of its data is not well structured and/or tagged. Its signal to noise ratio is relatively low. Its texts are generally very short and very dependent on context. Twitter has been largely failing as a short-form video platform. There's some trace of algorithmic generation of interest/topic-based feed on Twitter, but you know, its quality was never great. I guess it's just a hard problem given Twitter's environment.

Its strength is freshness and volume, but I guess these can be achieved without Twitter if you have a strong web crawling infrastructure? Also, the current generation of LLM is not really capable of exploiting minute-level freshness... at least for now.


Also, Twitter is not where people go to be nice. Twitter incentivizes snarky, disparaging, curt behavior because (a) it limits message length to an extent where nice speech doesn't have a place (b) saying nice things gets you likes while saying not-nice things gets you retweets, and retweets are more highly valued by the algorithm.


Twitter’s data would be very valuable for generating tweet-like content: short, self-contained snippets, images and video.

There’s not a lot of data in Twitter today resembling long-form content: essays, news articles, books, scientific papers, etc. That’s probably why Twitter/X expanded the tweet size limit, to be able to collect such data.


Yep! It would be great at generating hot takes and clickbait.


and racist, anti-Semitic content


If Twitter were as much of a treasure trove of user data as Elon thinks it is, then why is Twitter's ad targeting so much worse than Facebook's and Instagram's?


Twitter has the largest database of tweets. If you want an AI that writes tweets, there’s nothing better. Why? Twitter could offer a service that bypasses the need for a community manager… just feed it a press release, or product website, and it will provide a stream of well-crafted ad tweets… or astroturf, even.


Doesn't really matter what the content is as long as the sequences of tokens make sense. That's the goal: predict the next token given the previous N tokens. Higher level structures like tweets just fall out of that but I wouldn't be too surprised if a model trained only on tweets could also generalize to some other structures.
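To make the "predict the next token given the previous N tokens" idea concrete, here's a minimal sketch using plain n-gram counts instead of a neural network. This is a deliberately simplified stand-in, not how GPT-like models are actually trained; the toy corpus, the whitespace tokenization, and the function names `train_ngram` and `predict_next` are all made up for illustration.

```python
from collections import Counter, defaultdict


def train_ngram(texts, n=2):
    """Count which token follows each context of n preceding tokens."""
    counts = defaultdict(Counter)
    for text in texts:
        tokens = text.split()  # whitespace tokenization, a simplification
        for i in range(len(tokens) - n):
            context = tuple(tokens[i:i + n])
            counts[context][tokens[i + n]] += 1
    return counts


def predict_next(counts, context):
    """Return the most frequent continuation of the context, or None if unseen."""
    followers = counts.get(tuple(context))
    if not followers:
        return None
    return followers.most_common(1)[0][0]


# Toy corpus standing in for short posts (invented for this example).
toy_posts = [
    "the cat sat on the mat",
    "the cat sat on the sofa",
    "the cat sat on the mat again",
]
model = train_ngram(toy_posts, n=2)

print(predict_next(model, ["on", "the"]))      # most common follower of "on the"
print(predict_next(model, ["never", "seen"]))  # unseen context
```

The counting model only knows contexts it has literally seen, which is roughly why any generalization beyond tweet-shaped text would have to come from a much larger learned model.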


Twitter and Reddit are extremely valuable to LLMs and makers of both are really kicking themselves over missing the boat with open APIs.


Now they're kicking themselves into irrelevance by restricting access.


Comment datasets are valuable for conversational AI, it’s the same reason Reddit locked down the API I imagine.


Training anything on a network of bot traffic. What a time to be alive...


It's not a change confined to Twitter; it's a glimpse into what could be the new normal for the internet.

You can bet Google/Gmail/YouTube, Amazon, Microsoft, TikTok, and every other Internet platform that works with user-generated content will soon do the same ... if they haven't done so already.


Gmail has already been using your data for classification, data mining, and other AI purposes. I would be legitimately surprised if they're NOT using processed Gmail data (potentially with sensitive data removed, etc.) in training their LLMs or other AI projects.


Yup. To be clear, that's with the free version though.

If you pay, corporate versions of Google Workspace won't train on your data. That's very much by design, since companies don't want anything internal ever being exposed.

But with the free version, that's part of what you're "paying" for it to remain free.


Could you share a source? I'll immediately switch off my gmail onto one of my google workspace domains, if so.


https://workspace.google.com/blog/identity-and-security/prot...

https://9to5google.com/2023/07/03/google-privacy-policy-ai-t...

Although judging from that, it sounds like Google may have changed policy so that even private data in the free versions is no longer used for AI training.

They stopped targeting ads in free Gmail based on your e-mail contents years ago because of the bad press. So maybe they've stopped training AI on free Gmail/Docs data out of similar precaution, now that LLM's are everywhere in the news.


They'd certainly use it to train their spam classifiers, but using that data in a generative model shared with other users would risk information leaks, so they wouldn't do that.


They've been doing it for years. Everything you put on their servers is a source for them to train on. At FB Messenger in the mid to late teens, the suggestion and auto-reply models were trained on the entire corpus of unencrypted messages sent between users.


One of the big reasons why I went with Proton Mail for whitelabeled email service when I stopped hosting my own. Nothing to scrape on the server side.


The absolute irony of Google doing this.


[flagged]


I interpret this as more of a call to greater action, lest people think boycotting Twitter is going to address this problem, while happily continuing their use of Instagram, Google Search, TikTok et al.


The terms now also have a class-action waiver clause


how the fuck are those legal? frankly more disturbing to me than the AI training one, which I find less objectionable than selling people's attention to advertisers


>how the fuck are those legal?

The US doesn't have loser pays and has some of the most expensive litigation in the world, which has created all kinds of problems. Someone can file a lawsuit against you knowing that they're unlikely to win, but in so doing they could cost you hundreds of thousands of dollars for lawyers, so why don't you just go ahead and settle for tens of thousands of dollars? It will cost you less to settle than to win in court.

This flaw was made to scale by class action lawsuits, which more than any other should be loser pays, because there is little question that thousands of people who have each been harmed to the tune of $100 could each front $10 for a meritorious lawsuit. But instead you get opportunistic lawyers signing up anyone they can find for questionable claims, so they can reach a settlement where the plaintiffs each get $7 -- or a $7 gift certificate -- and the lawyers get millions.

This was rightly regarded as a problem but the lawyers had enough political power to prevent a good solution, so what we got instead was to make it easier to force binding arbitration and opt out of class action suits.

Lawyers ruin everything. They even ruin lawyers.


IANAL. According to [1] they are legal per Supreme Court precedent, but it also says:

"Contract formation is increasingly scrutinised. Following Concepcion and its progeny, some courts have focused on issues of contract formation to determine whether the consumer in fact agreed to arbitration and the class action waiver. This inquiry is largely confined to online transactions, where a consumer is deemed to have consented to arbitration by using the business's website to purchase goods or services. These contracts fall within the rubric of "clickwrap," "browsewrap," or "webwrap" agreements and their enforceability is beyond the scope of this article. However, it is important to note that the courts will refuse to enforce class action waivers and arbitration agreements in such agreements when the arbitration provisions were insufficiently conspicuous to ensure the consumer objectively agreed to their terms."

The article also mentions that non-negotiable consumer contracts are viewed with more suspicion by some courts.

1. https://content.next.westlaw.com/practical-law/document/I9f1....


Class action lawsuits suck for victims. The only people it’s good for are lawyers, who become fabulously wealthy, while the victims get a check for two dollars in the mail.

Individual lawsuits are better for everyone.


> AI training one, which I find less objectionable than selling people's attention to advertisers

How do you know that the AI won't be used to sell people's attention to advertisers?


Unfortunately, "legal" means whatever certain specific people agree that it means.


That backfired on them when all the Twitter employees they fired sued them in individual lawsuits, so now they have to do the same discovery process and everything else over and over for each case. Lol. That only backfired because enough individuals had the means to bring a full suit at the same time, though. Unlikely to happen normally.


Oh, I see. For a few seconds I was wondering how a windowing system uses data for AI...


Other big companies should also shorten the names of their premier services, just to add to the confusion.


If anyone is wondering, Netflix is now x.net, Amazon is a.com, Apple is a.us, Android is a.tv and Azure is a.mov.

Facebook will be known as y.us.


I can’t imagine anything worse for the world than an AI trained on whatever Twitter users are tweeting.


Can't wait to add all the thought leader hot takes bots to my ban lists.


This was done because Elon's other company xAI needs this data.


Nearly all terms allow them to use data to improve the product...

"Replacing some shoddy heuristics with a massive AI model" seems like product improvement to me.

Therefore, they were already allowed to train AI models with your data.


These terms are probably needed to cover public usage scenarios where someone could claim ownership or privacy violations even over a public forum.


So that they can train a chatbot? I thought Elon set out to eliminate bots.


Use it to heaven ban folks: https://cosmosmagazine.com/technology/internet/heaven-bannin...

CEO can use it on Elon to keep him placated


Did anyone actually believe him?


Elon has the superpower that makes it so that regardless of the question asked/answered, the audience is 40-40 "always believe" and "never believe", the rest being skeptics or 'don't cares'.

This often works to his advantage; the 'always believe' are loud.


The thing is that almost everyone believed him until it became known that many times he either:

1. didn't know what he was talking about

2. was over-optimistic about timelines to the point that there would be very little difference if it were a lie

3. objectively lied on purpose in order to further self interests

4. exaggerated or over-promised more than could be attributed to salesmanship

5. flip-flopped and refused to acknowledge the change in stance

6. stated things entirely to be vindictive, and let the truth of the statements be irrelevant

The problem isn't that people 'never believe', the problem is 'whatever is said should be suspect to the point that if it is actually true then that is entirely a side-effect of the statement and not the point of it'.


It's been obvious that he's inclined to talk nonsense for a _long, long_ time, tho.

At this point, there are shades of him being E Lon Hubbard to the remaining believers; for most of them it seems totally unshakeable.

(Humanity was extremely fortunate that the existence of L Ron Hubbard and the existence of the Internet did not significantly overlap.)


Everyone believed what?

You seem to be unaware of Elon time or how Elon has made predictions in the past, generally not that "this will happen by date X" but that "this cannot possibly happen _before_ date X". Which are very different statements.

There is a contingent of people who want to retcon any statement ever to paint Elon as a fraud, when reality is much too subtle for trivial blanket labels like you want to apply.


My favorite time on Hacker News was reading all the quasi-fortune tellers and intellectuals say his $44B bid for Twitter was never going to go through, either through his withdrawal or being rejected by the board. They were so set in their hubris that the only debate was whether the SEC was going to sue him after the bid inevitably fell through.


> either through his withdrawal

Which, as I recall, he tried but the courts said 'no'.


they were actually saying he would try to back out of it, which he did, and also that he would fail to do so, which he did


Surprisingly, a lot of people did (and some still do).

How many times has Trump been caught lying or doing a 180 from one day to the next? And there are still tons of people believing him.


Oh, that guy with the self-driving cars and the self-landing rockets and the creepy humanoid robots that can sorta walk? Yeah totally, he hates automated stuff.


other peoples' bots.


You've become the very thing you swore to destroy!


That guy had already smashed into the wall trying to get into the Internet, and not much relevant anymore. I mean, there's inertia, sure.


To eliminate spam bots. He didn't set out to eliminate AI.

I'm honestly surprised that anyone on HN wouldn't understand this.


Ah yes, the platform formerly known as twitter shall surely NEVER allow the weights produced from analysis of its conversational data to go into the production of spambots!

After all, there's definitely no market for a product that has been finely-trained on social media posts to the point where it can perfectly ape social media posts. There's no money for Twitter to make there, so the technology it produces from this analysis shall certainly never find its way into spam bots that create content indistinguishable from genuine humans on Twitter.


Cool theory but you have no supporting arguments.


It really is baffling. Removing bots from Twitter is totally orthogonal to training a bot on Twitter data- arguably the former is a prerequisite for the latter. And we've known that Musk wants to train a competing chatbot for at least 6 months now. Has he ever said anything to indicate that he wants to eliminate bots entirely?

Maybe I'm overthinking it and this is just a rhetorical gotcha.


Yeah, that this willfully stupid a comment is so high up on so important a topic is pretty damning of the value of the conversation here.


"Consumer action," whether organized or individual, has 0% chance of affecting what happens next here. It's all down to IP law and its adjacents. No real reason to expect intentional actions here.


Consumers can stop consuming... ex-Twitter, or whatever we're calling it.


Twitter's quite easy to stop consuming... Easier than Google, Amazon, etc.

It doesn't matter though. It's not just a matter of lock in. There are other reasons.

For one thing, data isn't really data in private chunks. It's only valuable collected.


Can but won’t


Is that true?

What if the consumer action was poisoning the dataset ?


Think of all the consumer action efforts, and (especially) talk on one hand. Think of the successes on the other... basically none but a few minor symbolic wins.


We are all going to be consumed in the AI machines.

"All capabilities are built from data analysis and the data they are analyzing is you. Whether by direct intent or not, all advancements are encroaching on consuming every knowable fact and inference about you. Your soul must be sacrificed to the machine to grant the powers it manifests."

https://www.mindprison.cc/p/ai-end-of-privacy-end-of-sanity


Including the biometric data they also said they're going to start collecting?


Is there a Python interface to Twitter that can use password auth and doesn't require an API key? Is there some existing piece of software I can run that doesn't require paying anything to mass delete my tweets (or xeets or whatever they're called now)?

Not that I'm particularly worried but this is a good reminder that I need to blank my twitter history.


I use Redact[0] to delete all my Tweets, replies, retweets, and likes. It works with over 30 different sites too.

[0]: https://redact.dev/


I used https://tweetdelete.net some time ago. Not sure if it still works today.


I think it does but you need to pay them to use it. I don't feel comfortable linking my identity to my twitter account.


That seems new. A few months ago it was free.


X gon’ take it from ya.


I approve.


Ah, yes, the most annoying AI.

(In fairness, the AI trained on LinkedIn data might be worse.)


That escalated quickly. :O

Any ideas how we, the users of the internet, should act to prevent those steps? Or how to build a much better-connected INTERNET social platform that isn't controlled by mega-corporations?


The obvious first step is to not give these companies your data in the first place.


I can't upvote this enough. I don't know why people clutch their pearls so hard when they find out the free app they are using is trying to monetize their content. I'd actually be critical of X if they didn't do this.

There are plenty of places where you can pay a nominal fee to have your data 100% controlled by you alone. Plus, you can still use X, Insta or whatever to drive traffic to your site. It's not that hard, folks!


Actually there isn't. We live in a world where the freakin TV I paid for is spying on me


> There's plenty of places where you can pay a nominal fee to have your data 100% controlled by you alone.

Do you happen to have a list handy? If I had to do this today the only clear way would be a personal cloud setup.


> Any ideas how we, the users of the internet, should act to prevent those steps?

Don't use services with ToS you don't agree with.


What about when someone else buys a company and changes the ToS, taking possession of work we've already contributed under a different ToS that we were okay with?


Assign no loyalty to any company that forces you into a TOS.

Understand that nothing is more sacred than the Holy Dollar to an entity whose only purpose is to extract money from its customers.

Always assume you'll be betrayed by corporations and store your possessions (data in this case) accordingly.

Diversify your interests and hobbies such that a betrayal by a corporation doesn't significantly impact your life.

At this point, trusting corporations without putting any mitigations in place seems like begging desperately for trouble.


Clearly you should live under a rock if you don't want anyone to cross their fingers behind their backs or take advantage of the fact that you didn't say "no backsies".


That helps, but individual boycotts/market actions can't easily overcome the momentum of preferential attachment because brands are not fungible. Market evangelists/absolutists often rely on naive models of supply and demand, but forget that those models only work under conditions of perfect competition. This means ignoring the many many factors that contribute to inelastic demand like the cost of switching, the lack of ToS clarity, sunk costs, and prior network effects. There are plenty of people who hate Twitter but can't disengage from it because that's where most of their clicks are coming from.


Bad take when the ToS can change on a dime and companies don't give a hoot about retroactive respect.


The problem with TOS changes is they never provide a diff. They provide the entire TOS again, one that most people already skipped. I don’t think it’s fair to even provide a legal TOS that does not match your demographic.

These things need to be shorter and written at an 8th-grade writing level, and I don’t mean that in any pejorative way. It needs to be clear to the average person (and I’m not even sure if I’m being generous there by suggesting an 8th-grade writing/reading level).
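For what it's worth, producing a diff between two ToS versions is not hard technically. Here's a rough sketch using Python's standard `difflib`; the two ToS excerpts are invented for illustration (they are not real terms from any company), and `tos_diff` is just a hypothetical helper name.

```python
import difflib


def tos_diff(old_text, new_text):
    """Return a unified diff (as a list of lines) between two ToS versions."""
    return list(difflib.unified_diff(
        old_text.splitlines(),
        new_text.splitlines(),
        fromfile="tos_old",
        tofile="tos_new",
        lineterm="",
    ))


# Made-up excerpts for illustration only, not real terms of service.
old_tos = (
    "We collect the content you post.\n"
    "We use it to provide the service."
)
new_tos = (
    "We collect the content you post.\n"
    "We use it to provide the service and to train machine learning models."
)

diff_lines = tos_diff(old_tos, new_tos)
print("\n".join(diff_lines))
```

Unchanged lines show up with a leading space, removed lines with `-`, and added lines with `+`, so a reader only has to look at what actually changed instead of rereading the whole document.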


More importantly, I think official organisations like government, local government, safety, charities etc. should not be encouraging people to use non-safe channels to interact. We should be lobbying them to stop.


We need legislation that forces companies to prompt users with a ToS that’s no longer than 400 characters.

The prompt should read “So, you cool with that brah?”, and the word “Accept” needs to be replaced with “Yeah sure why not”, and “No” with “Nah, no thank you”.

Current TOS’s don’t provide enough clarity.

That way any semantic issues down the line can be boiled down to “brah, users said they were cool with it” (to the judge).


Yea that's the first step... but I think that as a group of users we can do much more and build a better internet.

This is like a cold war between companies, in which we as users are the cannon fodder...


Unfortunately that is basically all of them.


For me it was one part deleting my Twitter/Reddit accounts and one part rethinking the role of these sites in my life so I don't end up adopting the same habits elsewhere that will inevitably let me down, commercial or otherwise.


Stop commenting on HN? It'll take government action around copyright of training data + the use of decentralized and private instances of Mastodon to avoid it. Though it's still probably inevitably going to happen.

What I want to know is, what's the endgame? We've proven that AI learning eventually plateaus, so you keep feeding it data and...? Seems like an underpants gnome problem or companies trying to find another stock bump for all of this data lying around (that should be considered toxic instead)



Textbooks and reference books are much more useful for training an LLM than Twitter. Both Reddit and he are vastly overestimating the value of their content for training corpus data.


Depends on your use case. Twitter is pretty good if you want a bot for astroturfing, artificially boosting content and engagement, slipping adverts into comments that look like they come from real people, etc.


They are two completely different use cases. You can’t learn cultural memes and the best taco spots in X city from textbooks.


Any data you send to the cloud will be used to train AI unless you have a binding TOS or contract that says otherwise and that contract cannot be unilaterally amended this way.


I am not surprised at all. All major tech companies are in an arms race at the moment. Everyone wants to create the new thing and AI is the hip thing right now, with good reason. X isn't necessarily a tech company in the same ballpark as Amazon, Google and Meta, but it has significant backing to be able to create something. I wouldn't be surprised if other companies start changing their terms to allow them to phish data for AI training.


My 2 cents, no source, but training AI out of public data will eventually stop, due to the lack of transparency and potential legal issues associated with copyright. AI needs clean, diverse and also specialized data to offer an incentive for companies to build on it, beyond just conversational chatbots. Shameless plug, still early but I am trying to build this with faie.io


OK. So what is the problem?

X and other social networks will do it anyway, and you're just waiting for all the other social networks, like Facebook, Instagram, Threads, LinkedIn, Reddit, TikTok etc., to follow suit and train their AIs on your data.

The solution? If you have a problem with it and you haven't done so already, just delete your accounts on ALL social networks.


Sweet. So, if I don’t log in ever again, or otherwise don’t agree to the new terms, they’ll delete my account for me?


I can see the future of platforms like twitter https://miro.medium.com/v2/resize:fit:1100/format:webp/1*bnU...


As far as I know Twitter always did ML on the data, and they resold it so that others did ML on it.


It's time to regulate these social media companies as utilities. If a platform has more than N million users, and its main feature is communication between humans, then it should be subject to regulation.


I kinda like those "titans of industry" rubbing it in.

It casts the dismal level of governance and regulation of online life in the unforgiving light of an X-ray. Let those skeletons be clear to all.


X/Twitter, Meta/Facebook, Ąćęńś/Microsoft, who's next?

If you need to write X/Twitter to make people understand what you are writing about, this means the rebranding was not good.


I disagree. The change is recent and a rebrand isn't necessarily a failure if people take time to switch.

In twitter's case, they built a lot of culture around "tweeting" and the bird theming. The fact is that none of those are going to change soon.


Time to start x-ing.


Yeah x-ing will not "land" in any language...


I’m ok with it because it’s public and they own the platform.


With data going forward? Or past data? Because I don't agree with using my past data... (kind of rhetorical, as I long ago deleted my accounts, though I'm sure the data is still there)


Technically, not deleting your data would be a GDPR violation of the right to be forgotten. They could be ignoring that but it would be very expensive if they were found out. Probably doesn't apply to models that have already been trained, but should apply effectively going forward.


what's public? you can't access X data without an account and accepting ToS


Meh, the ToS is unenforceable and VPNs with residential IPs are not that expensive. Sure, it's a little harder to scrape than HN, for example, but everyone trains on Twitter data (along with everything else on the internet).


Oh, it's very enforceable. People have literally been imprisoned for years for using alternate tools, like wget, to access a website. The fact that it goes unenforced most of the time, so that everyone breaks / has broken the CFAA laws covering TOS, is a feature of the system. Like other such laws, it only gets enforced when you rock the boat and someone rich/powerful enough buys a district attorney to indict you.


> People have literally been imprisoned for years for using alternate tools, like wget, to access a website

Source?


Weev was attacked by AT&T, who then had him imprisoned for years for using wget instead of a browser to access a public website of theirs. https://www.wired.com/2013/03/att-hacker-gets-3-years/


You can Google it in a second. Aaron Swartz for instance faced up to 50 years. Weev (regardless of the nazi scum that he is) went to jail for it.


I asked because I did google it and I found nothing.

Aaron Swartz was not "imprisoned for years".

Weev "exposed a flaw in AT&T security in June 2010, which allowed the e-mail addresses of iPad users to be revealed.[39] The flaw was part of a publicly-accessible URL, which allowed the group to collect the e-mails without having to break into AT&T's system.[40] Contrary to what it first claimed,[41] the group revealed the security flaw to Gawker Media before AT&T had been notified,[40] and also exposed the data of 114,000 iPad users, including those of celebrities, the government and the military"

That's a pretty silly security breach, but it's still a real security breach. Not comparable to scraping twitter


There's very little chance he would have lost in court.

Obviously, there can be extrajudicial consequences though.


Having to go to court in itself and spend literally months or years with the possibility of being sentenced to most of the rest of your life in prison is itself a very severe punishment.


I think they're talking about the intention of the user in posting the data. People that post on Twitter and HN understand that the visibility is public, unlike Gmail.


This makes sense if you consider the new API price gouging. Much easier to attract those interested in training on Twitter data if it is explicitly allowed.


I'm surprised that people are surprised by this; I'm pretty sure AI is already trained on internet data… Reddit and other sites are very easy to scrape.



Going to be one stupid, stubborn, opinionated LLM


Threads is really nice, has a web version now, and text search coming very soon. No need to help fund insurrection at twitter.


You really don't think the same thing is going to happen there? I'd trust Facebook even less with my data.


That's all well and good but the content is, in my usage, trash. Their algorithm simply doesn't work to surface posts I like and the people I want to follow aren't on there.


I might be taking this as the final cue to archive my own history with the service. Wanted to do that for a long time already.


i mean, haven't they been doing it forever for antispam, antiabuse, ads and discovery systems anyway?

ai hype is exhaustingly stupid.


For a moment I thought the headline said XTerm would be updated to use AI training, and was confused.


OpenAI had access to the twitter database since forever ago anyway with the old terms of service.


So I get paid for using X now?


No, you've been freely and happily giving your content to them for their own use for as long as you've been using the platform. Why would they start paying you for something you've provided for free since the beginning?


It has been this way for a while now, yes. You buy their premium subscription and then you get part of your ad revenue.


You have been for a few months. The payouts are better than many other platforms too.


yeah


No, you use X for free.


How does this work for old tweets if a person doesn't accept the new terms?


If training data is so valuable, then 4chan should be really valuable, right?


Relevant Andy Weir short story: http://www.galactanet.com/oneoff/twarrior.html


who in their right mind wants to train on data from 4.....chan



What if you want to make a chatbot that just argues and disrupts topics ad nauseam?

I'd go right for /b/. Perfect.


Is it possible to opt-out and not allow them to use my account’s data?


Is it a new trend now? Zoom then Microsoft, the list keeps on going.


Elon can take his Twitter data and stick it up his big fat jacksie ...

Train your AI on that ...

Respectfully speaking, fully in accordance with HN terms, conditions and guidelines.


Off-topic but I thought Elon Musk wanted to make X into a WeChat-like everything app where twitter would make up the social media part. Is he instead renaming twitter to X?


> Is he instead renaming twitter to X?

He already did.


And yet the URL is still twitter.com


Would they like to resurrect Tay?


I love how this is justified by "everyone is doing it so it's okay." That's the bandwagon fallacy and I don't buy it.

Twitter (I will not call it X, because that's just stupid), is free to attempt to change their Terms of Service, policies, etc, but we do not have to accept it or agree with it or be resigned to it. Also, it should not be retroactively applied to past content, and it should be an opt-in consent -- but that is pie-in-the-sky wishing at this point given the garbage heap Musk, and others, have made of Twitter.


You're free to delete your content and your account if you're unhappy. There's nothing it "should" be, it's not your company.

Also, it's pretty obvious that everyone was training on Twitter data already before they cracked down on scraping. It is, after all, a public forum.


If someone created an account back when AI was not a thing, they did NOT consent to their data being used for AI:

You cannot expect people to retroactively consent to a thing the existence of which they were not even aware of when they gave consent for some limited OTHER use.

Besides, it's just rude to do things for which there was no consent in general, no matter whether it is AI or anything else. No consent is no consent.

Further, you can't expect people who have a life in general to sit around all day just waiting to figure out where they have to delete accounts before they are misused.

Bazillions of websites nowadays want you to create an account, the normal behavior is that users just abandon accounts which they don't need anymore. Nobody has the time to delete all of them.

Your profile says you were "Senior Director of Monetization at Reddit". Considering all the outrage that company has caused among its users as well (redesign cough cough - just one example out of a truckload), and that the outrage-causing things largely seemed to be aimed at monetization, perhaps you should do some soul searching to figure out whether your values are aligned with common societal morals.

Or in other words: How many more people will you make angry until you realize that maybe you're the baddie?


You can get mad all you want about people using “your data” that you posted on a public forum for whatever, doesn’t change the fact that you were dumb for thinking that your (not important, unique, or interesting) musings would be protected from people using them for whatever they want by posting it on the open internet, much less the website of the company you posted it on, get real.


The fact that you, Jamie Quint (look at his nickname, it says that), a previous Senior Director of Monetization at Reddit (see his profile), answer being accused of a poor moral compass by insulting the person who did (calling them "dumb", and trying to belittle everything they have to say as "not important, unique, or interesting") shows that you in fact have what you've been accused of - not only by me but by the community of reddit as a whole:

A poor, or even no, moral compass.

I'd lean far enough out of the window to speculate that what has often been said about people in positions of power applies here:

Those positions attract people who are completely unable to perceive empathy, and who act solely out of the desire for power and narcissism.

It's a shame, you cause so much harm for society - and you probably are unable to even perceive the harm you're causing because your brain is just not wired to be capable of empathy.

If you want to do the world a favor, go read up what a psychopath is, and by that I do NOT mean to insult you, but rather the actual medical term "psychopath".

Ask yourself whether it applies to you, and learn to protect society from yourself if it does.


You most likely consented to future changes to the ToS. It's kind of like their version of asking a Genie for infinite wishes as one of their three wishes.


So they can just change the ToS to say "Your full bank account balance belongs to Twitter after YYYY-MM-DD" ?


I'd like someone to address this. Surely there's a limit. The statement "your account balance belongs to Twitter" is not against the law (like they can say the service is worth whatever amount of money and that you owe them this money), but you're not allowed to do that, because you're not allowed to retroactively change a contract's terms.

So, surely, you can't change ToS retroactively and expect that any of it applies?


Let's break down what a ToS is. It's the terms they are enforcing to provide you a service. They can refuse you service at any point for any reason. You can have your own terms of service that they must follow or you will refuse to do business with them.

By adding that they can train AI they are trying to get out of a future lawsuit that may happen if the courts require consent for training (currently anyone can train on anything).

Your right to sue them for using the data they hold might be lost if you continue to use after the term change.

If they asked for your car and you refused they could stop service but they can't take your car.

They can't change payment terms from the past and sue for them. But if they change the tos to say it costs more now your next bill will go up. If they say they can use your data now that they hold and you have an active account they could take that as an acceptance that past/future can be used to train ai.

A better example might be a right given. For a year you could download photos for AI training. Today they forbid that for all future and past posted photos. Anything downloaded before the date can be legally used to train.


Basically, they can ask anything that is not forbidden by the law. If you disagree, you can try to get them to court.


Please try imagining a society where everybody does the FULL extent of things which they can legally get away with.

Ask yourself whether that would be a place worth living in, or rather a hellscape.


What, you mean kinda like this?

https://youtu.be/mH3La3RJdNA?si=qIojw1j5NkH0pBCj

... cause that seems like a pretty kickin' party to me.

Oh, wait, there's nothing legal about some of what goes on at those concerts. So, not even that level of fun. Gotcha.


So you think this will work? Every company/country auto-transfers their net worth to Elon? Do I have to explain it to you like a child?


You can’t modify ToS without asking the user’s consent. See Sifuentes v. Dropbox, Inc. (20-cv-07908-HSG).


Given how many "deleted" posts recently resurfaced, you are in fact not free to delete your content, because it won't be deleted, just hidden.


Go through the GDPR deletion process if you care that much; it will very likely be deleted because the fines are massive (up to 4% of annual worldwide turnover).


I am not an EU citizen or resident, so I don't believe that avenue is open to me. Also, Twitter has not of late demonstrated significant susceptibility to regulations.

I will note that we've narrowed the claim from "you're free to delete your content" to "If you live in some countries you're very likely to be able to delete your content", which I agree is probably true.


> I am not an EU citizen or resident,

Resident is enough.


In a very real, practical sense, GDPR can be safely ignored for most non-EU companies.

Europeans seem to believe things like GDPR apply to the entire world. They don't.

If your company has no physical presence within the EU - ignore EU laws as much as you want. There is nothing they can do about it.


You would think so! I've requested deletion under GDPR and here's what you get in response:

> Thank you for your inquiry. You can deactivate your account at any time. When deactivated, your Twitter account, including your display name, username, and public profile, will no longer be viewable on Twitter.com, Twitter for iOS, and Twitter for Android. For up to 30 days after deactivation, it is still possible to restore your Twitter account if it was accidentally or wrongfully deactivated.

> Keep in mind that search engines and other third parties may still retain copies of your public information, like your profile information and public Tweets, even after you have deleted the information from our services or deactivated your account.


They don't have a form to request the information to be deleted under GDPR. I have looked everywhere under help.twitter.com and they just ask you to disable your account.


GDPR means absolutely nothing between an American customer and an American company. It's completely irrelevant.

And I'm not even sure it means anything even for users in the EU. If Twitter doesn't have any offices/subsidiaries/bank accounts in the EU, then even if the EU fined them, I'm not sure how that would ever be enforced?


Twitter has an office in Ireland and is most definitely regulated under GDPR.


Is it still there? I knew they did previously, but last year there were reports it was possibly closing as part of Musk's layoffs.

Looking online I can't find any recent information.

Basically, given the way Musk has been ignoring other regulations and/or not paying for things, I'm wondering if he even cares about GDPR. And if he doesn't care and shuts down any legal European presence, then does it matter?

(Of course if the Ireland office is still active and receiving lots of European advertiser revenue, then of course the GDPR has teeth.)


It's Twitter International Unlimited, headquartered in Ireland. Still operational it seems, https://www.solocheck.ie/Irish-Company/Twitter-International...


My friend works there, it's definitely still there.


Perhaps not for long...

Shutting that office down and safely ignoring the GDPR is probably a valid concession for a US-based business built around data collection.

It's not like EU users will stop or be blocked from using X anyway.


Thanks! OK, GDPR should still have teeth then. Good to know.


Musk only cares about "getting his way" at this point.

Years ago, "dark triad incarnate" was checked by lack of nearly as established a position* and a longer timeline over which to grow and consolidate wealth and power (which does require time and attention).

He's past 50 now, and started transitioning into his own "late Putin phase" (substitute your own 'favorite megalomaniac' at will) quite aggressively in the past 5 years (especially). Now the game is using that wealth and power for a kind of ultimate "spoiled child fantasy camp".

Regardless of how directly any of them channel the childishness of the archetype, the traits are always there - "I'm special", "your (parent-style) 'rules' don't apply to me"**, "I will get my way", etc. It's the whole point, and the motivation that people who don't think this way miss. The motivation that makes the behavior make at least some sense.

Musk is one of the real extreme examples in terms of how transparent the behavior is - whenever he does something that seems hard to explain, ask yourself how the situation might look "in a sandbox". Seriously. This may sound like typical rhetoric, but I'm serious: try it. Twitter is a perfect example - "if I can't have it my way, then I'll make sure no one can have it" ...

... and, just like in the analogy, there are layers of goals. I.e., it's also good if while we (may ultimately) destroy "the sandbox", we can use it to harm those we don't like who've been playing in it. Either directly (e.g., firing employees of Twitter), or in various indirect ways (reporting "troublesome users" to their authoritarian governments [when applicable], etc.).

* Specifically, still needing something from others here and there - most recently and likely the final example: funding for Twitter deal

** People who think this way can't help 'telegraphing' - it's one way they identify members of their own flock, in part. "Nanny state", "snowflake", etc.

"Snowflake" has got to be a personal favorite. Every time someone uses that one, I know I'm going to need a WHINE break after a few sentences... https://youtu.be/tl4VD8uvgec?si=H2MadAVDduLfolfS&t=1m17s


I'm not sure the GDPR can be enforced outside of the EU, therefore one can't be absolutely certain the posts will not appear if accessed from elsewhere.


It is enforced internationally just like copyright law. It is a law for EU users no matter where their data resides. Meta was fined 1.3 billion dollars in spring for a GDPR violation.


Being fined != paying a fine.

Realistically, a company is only obligated to pay the fine if they have a physical presence within the EU, and care to keep that presence.

For some companies, the calculus says pay the fines and cooperate with EU laws.

But for most companies, they can and do safely ignore GDPR and other EU laws. EU laws do not apply outside the EU... despite what many Europeans want to believe.


That is your opinion. We will see what the courts say to Meta and Amazon.


> But for most companies, they can and do safely ignore GDPR and other EU laws. EU laws do not apply outside the EU... despite what many Europeans want to believe.

You’re right but when did you meet Europeans who said that a company selling in India has to respect GDPR? You only need to apply EU regulations if you serve EU citizens, or you get either fined or blocked


[flagged]


Type "twitter deleted posts reappearing" into Google, and browse the dozens of news articles.


[flagged]


If you've used Twitter in the last six months or so, you'll have seen serious and obvious bugs.

Large numbers of people report seeing a bug. Having them all be liars seems rather unlikely.


Seeing bugs is one thing (I stopped using twitter like 5 years ago or so and it was always a bug fest so I can believe it got even worse) but we are talking about a single specific bug here with important privacy consequences. Some reliable proof would be nice.


> There's nothing it "should" be, it's not your company

Well for a lot of us humans, ownership is not a stopper for conversations about ethics in technology.


The ethics argument is asinine. If you don’t like the rules, don’t use the product or start your own company. Just don’t be an insufferable whiner about it.


I think you may have meant “not asinine” or perhaps “I have a terrible attitude and should skip it.”

The ethical discussion is kind of the entire point. We all know what the law is.


> asinine

> insufferable whiner

I don't think this is a proportionate response to a convo about ethics in tech.


If you pull the “we need to have an ethics discussion” card about public data posted on a website you can quit anytime, you’re an insufferable whiner. Sorry not sorry.


Have a nice day.


>You're free to delete your content and your account if you're unhappy.

>Senior Director of Monetization at Reddit

You of all people should know that "free to" and "should" and consent by default is perfectly legal. And yet, it's also gross and slimy. So I guess I'm not at all surprised to find out you're a monetization person.

Gross.


Comments like this act like the consumer has all the power to choose and create the world they want. We don't, companies own this world and major public forums and make decisions together like this which we are powerless to change.

Corporations have more power in our government than individuals. They have a better understanding and coordination. Acting like a public forum owned by a company is immune from criticism because it's a private company is sweeping so much under the rug.

I imagine you wouldn't be where you are in life if you didn't believe such things, though.


Twitter: the public square that owns your speech.


I don’t know how people can be so absolutely shocked that their public posts online could be used by anybody for basically any purpose.

If you want to keep your thoughts private, then maybe don’t post them publicly?

If WhatsApp or another private messaging app started doing this, I’d be right there with the people calling that absolutely unacceptable.

But I’m not surprised at all that Twitter is doing this, and I don’t know how anybody even remotely tech savvy could be.


If only there was some legal regime that turned this conundrum into a scenario whereby people could publicly share their intellectual property while still retaining control of their rights in the material.

This seems so critical to a functioning society that one would have thought it would have been considered in the Constitution. Oh well!


If you think Twitter is violating the constitution or copyright law, then take them to court.

But by agreeing to the terms of use, Twitter retains certain rights over what you post on the platform.

If that is not acceptable, then don’t use Twitter. If you think your thoughts are too valuable for Twitter to use, then write them into a book or blog or some other venue where your intellectual property can be protected.


I do not think Twitter is violating the Constitution unless we are talking about some kind of state actor doctrine vis a vis misinformation censorship under previous ownership.

I do not think there are any violations of intellectual property law given that there is surely a waiver of ownership of posts in the TOS.

I, of course, do not have the kind of free time required to do something like engage with Twitter, and accordingly I have no account, cannot post, and have not agreed to the TOS.

I think you have misconstrued my post, but that’s ok.


the ethical implications of using someone's data without explicit, informed consent for each specific use case is obviously problematic.

the data landscape is ever-evolving, and what was acceptable or even conceivable years ago may not be the same today.

companies should not only be transparent but also dynamically update users on how their data is being used and offer an option to opt-out.

ignoring this not only impacts individual users but also has broader societal implications.

consent fatigue is real; expecting users to keep track and delete their accounts across numerous platforms is neither practical nor ethical.

also, cancelling your account or laboriously deleting all of your content doesn't necessarily guarantee that all your data will be deleted on the backend... did you think your comment through at all?


I did. Right after Musk took over. Doesn’t mean I can’t also have an opinion on the matter.


[flagged]


It's against HN's rules to post like this - please see https://news.ycombinator.com/newsguidelines.html.

Also, could you please stop posting unsubstantive comments and flamebait? You've unfortunately been doing it repeatedly. It's not what this site is for, and destroys what it is for.

If you wouldn't mind reviewing the guidelines and taking the intended spirit of the site more to heart, we'd be grateful.


> The mentality of actively opposing or criticizing anyone who defends a particular individual, organization, or viewpoint can be described as "tribalism"


Tribalism is loyalty to an ingroup, not opposition of "a particular individual, organization, or viewpoint".


Something tells me Elon could announce he's going to shit on all your faces and you'd be like "it's his company he can do it!" While opening your mouth and looking up with a smile.


Could you please stop posting unsubstantive comments and flamebait? You've unfortunately been doing it repeatedly. It's not what this site is for, and destroys what it is for.

If you wouldn't mind reviewing https://news.ycombinator.com/newsguidelines.html and taking the intended spirit of the site more to heart, we'd be grateful. You may not owe $CelebrityBillionaire better but you owe this community better if you're participating in it.


It's been entertaining watching the pro-Elon crowd shrink and get quieter as he repeatedly screws up. I'm not saying every decision he's made has been bad (some, like revenue sharing and using Twitter Blue, are good imo). But he really has been screwing up bad, and in ways that are indefensible by anyone with common sense.


> the pro-Elon crowd shrink and get quieter

Source?

> he really has been screwing up bad, and in ways that are indefensible by anyone with common sense.

Examples?


You don't have to call it X, this is from https://x.ai:

> We are a separate company from X Corp, but will work closely with X (Twitter), Tesla, and other companies to make progress towards our mission

Even they call it Twitter!


I also will not call it X, because I think it's confusing.

Quite a few years ago I remember working with PayPal's X API. Part of me wondered if I misremembered this, but no ... there are still references to it online. Maybe Musk named it. He wanted to name the entire company X, right?

https://www.paypalobjects.com/webstatic/en_US/developer/docs...


yeah. wasn't that even part of the reasons he fell out with the other PayPal peeps


I can't imagine how a retroactive agreement would possibly hold up if this issue were to go to court. It seems like it invalidates the whole ToS if potentially anything can be added at any time to apply across time. Wild. Imagine if a rent contract could do that. You now owe rent for two years ago because your landlord changed your contract to take retroactive effect.


Everyone is doing it, because it’s capitalism! Our public forums are privately managed and controlled by a few people. Venture CAPITALISTS invest in startups, get shares, prop up money-losing economics for years and then sell the shares to the public in an IPO. The corporations have quarterly earnings calls where they have to explain how they are extracting rents from their ecosystem, in order to make “number go up”, ie make shareholders happy.

Open source can liberate us from this, but we need someone to build really good and competitive alternatives to Twitter, Zoom et al.

I started Qbix to do it. LA Weekly just published this piece about my company and what it’s doing differently: https://news.ycombinator.com/item?id=37353229


Capitalism isn't "things I don't like", capitalism is when investors are paid simply for "owning" things rather than being forced to sell their labor in exchange for a wage like everyone else. Your company is selling shares to external investors and is not worker-owned; it is a capitalist firm that pays dividends to people who didn't necessarily do any work for the company. Words have meaning.


Words do indeed have meaning. Absolutely! And I am using the words advisedly, and in their original meaning. Venture capitalists are … well, capitalists! I don’t mischaracterize anything.

Capitalism is characterized by PRIVATE ownership of the “means of production”. That’s the term used in the 19th century, but today we could point to the technological infrastructure which enables each new user to engage with a network.

“Ownership” means exercising exclusive control over this, and excluding others from using (even a copy of) it.

Musk controls Twitter. Zuck controls Facebook. Durov controls Telegram. Moxie controls Signal. And so on. This is centralized control by people who won’t give you their back-end software. They’ll at best let you have your own custom client for a while, until they don’t (Reddit).

But in the meantime they’ll spy on you everywhere so they can mine your data and try to extract profits for shareholders. It’s called surveillance capitalism: https://en.m.wikipedia.org/wiki/Surveillance_capitalism

Cory Doctorow recently wrote about the “enshittification” that happens as the end result of all this private ownership. “I built it — I own it!” Well, if you believe that, you shouldn’t complain when a privately owned company does something, not even when they deplatform you. What you should complain about is the lack of open source alternatives.

Does Linus own Linux?

Does TimBL own the Web?

Does Rasmus Lerdorf own PHP?

Does Vitalik own Ethereum?

Just because one specific company in an ecosystem is privately owned does not mean the network infrastructure is centrally controlled by a few people.

In fact our company has experimented with ways to reward contributors properly:

https://qbix.com/blog/2016/11/17/properly-valuing-contributi...

Wordpress, Drupal, Magento, Linux etc. can be hosted anywhere. It is a free market. By contrast, Twitter and Facebook (oh sorry, X and Meta) are digital feudalism!

https://qbix.com/blog/2021/01/15/open-source-communities/

We are also working on utility tokens that, unlike shares, entitle people only to services in that free market, and not to expect rents to be extracted forever. If Qbix or Automattic extracts too much rent from its open source ecosystem, or doesn’t do the best hosting in, say, Hawaii, then a competitor can arise and compete with them, locally or globally.

In fact, Qbix can be used to host social networks in areas with bad internet, including rural villages, cruise ships and planes. It can help young people of all sexes be educated in rural areas with bad internet. Can the same be said of Google or Facebook? NO! Their capitalist ideas always involve sending the signals back to their own server farms. Whether it’s Project Loon (Google) or the solar-powered drones (Facebook), what they don’t offer is a way for local villages to simply load their own forked copy of the backend software, and owe them nothing!

We do. We give the source code away and help hosting companies install it. We are working on creating an entire decentralized ecosystem where we don’t have centralized control … so if you host locally, you NEVER have to worry about us training our AI models on your data, or any of the other thousands of ways to betray your trust. It’s YOUR choice who will run your infrastructure, and it could be your friend on a local computer, connecting your town over a mesh network:

https://qbix.com/ecosystem


For a long time, the top HN post about Elon Musk was "Elon Musk Deletes Own, SpaceX and Tesla Facebook Pages After #deletefacebook" [0], so he was definitely someone who prized being perceived as not doing what everyone else in Big Tech did.

[0] https://techcrunch.com/2018/03/23/elon-musk-deletes-own-spac...


> Twitter (I will not call it X, because that's just stupid)

https://news.ycombinator.com/item?id=37352719

The Register has started calling it Xitter. I like that!


I can't continue using Twitter after so many terrible decisions by Musk.


> it should not be retroactively applied to past content

If they attempt this, it will open them up to lots of lawsuits.


Why would it? Most sites' TOS already stipulate that by uploading data to the service, you grant a global, irrevocable, unlimited license to use all submitted data for any business purpose without your further consent. I'd be surprised if Twitter hadn't had this for years.


If they are changing the TOS now, that's because the past TOS didn't allow them to do it. And previous interactions are governed by the past TOS.


Most TOS documents are so broadly written that they are pretty much a blank check anyway. Seems like a CYA thing on the company's end.


If that's the case, then Twitter would have no need to change the TOS; their usage would already be permitted and changing the TOS would serve only to bring additional scrutiny


> Twitter (I will not call it X, because that's just stupid)

LOL, it's the latest craze to change company names. When I see their "new" logo, my mind immediately associates it (correctly) with the X11 logo. Facebook is another one that decided to change its name for something that may turn out to be the biggest money burn a company has ever done. Maybe tomorrow we will wake up with Pear instead of Apple, who knows. Now that I mentioned FB, what's the current status of the so-called Metaverse? Are we there yet? Or are they still furiously pouring in millions and millions and getting nothing out of it?


It's always weird when big, established names/brands attempt to rebrand.

Like "the artists formerly known as" Prince, Kanye, Snoop Dogg, etc. There's basically no getting away from the old branding because it has to be included with the new branding so one knows what we're even talking about.

As far as rebrands go, X just seems dumb. The more an article or news segment talks about X, the more it feels like an unfilled Mad Libs made it to air. Or it feels like they're talking about something general, like when X Company does Y thing.


Yeah, this is probably the worst corporate rebranding I have ever seen. They replaced one of the most globally recognized brands with a generic and meaningless one. Plus the rollout was a mess, just like everything else post-Musk Twitter does. Have they even gotten around to updating all of their own branding references yet?


> given the garbage heap Musk, and others, has made of Twitter.

It's a more vibrant, open, and honest community than ever, despite the organized and coordinated (I wonder by who) advertiser boycott. If anything, a lot of garbage has been removed from the heap.


Lol yeah, the shadowy cabal of people who don't like neo-Nazis is conspiring against Musk.

People have been plenty clear about why they find him disgusting, if you care to look.


Who?


X wants your biometric data, your job history, and your education history for 'safety, security, and identification purposes'.

Those items along with your posts and social network will be used for advertising, monetization, tracking, selling services to government agencies, and training their machine learning bots.

Not for your benefit, but for X's uses and benefits. You are the product and X wants to sell your info and attention so Elon can earn his money back.


Microsoft (and others) beat them to the punch on this by at least a decade.


[flagged]


You are free not to give it, if it tells you to jump off a cliff you don’t have to


I would love to know how the people bloviating about this think Twitter worked before.

Of course they’re doing ML training on user data! Every major tech company has been doing ML training on user data for over a decade at this point! Twitter used to have official tutorials on how to do ML on data you pulled from their APIs!

Seriously, how the f** do you all think their moderation algorithms worked before? Unicorn farts? Some guys in a room that could read tweets really fast?

The only thing that’s changed is that they feel the need to put it in their terms of service.

And no, this isn’t a defense of Twitter or Musk. If you actually give a shit about this and don’t just want to perform slacktivism on social media, get off ALL of these platforms. Migrate away from Gmail. Close your Facebook. Use paid services from trusted privacy-focused vendors.


Well, this is one easy way to create the hate-filled alt-right LLM, sorry, "based AI" that Musk wants to build.

https://www.popularmechanics.com/technology/robots/a43126181...


[flagged]


As with any business/product, consumers are also free to say what they think to the business, other users, competitors, and anyone else.


Well, that pushes me from not using my account to deactivating it for good. What a shame how he trashed that site in so little time.


Why the heck is Bluesky still invite only?

Truly, I do not understand why they are not even attempting to take advantage of their competition actively driving people away. And they have the history of Google+ to reflect on! But they're making the same mistake.


Bluesky and also the Fediverse are Open Platforms¹. That means your toot or post reaches anyone including those who want to train AI.

An open platform and open protocol makes it harder to prohibit AI bots from ingesting your published thoughts than when on a private, centralised service.

Mastodon, the fediverse, Bluesky, are really enabling AI learning more than prohibiting it.

¹ well, BSKY is really still just a single server.


Honestly Bluesky is more obsessed with Twitter and Musk than probably Twitter itself.

I spent a few days on there after getting the invite and didn't enjoy the experience. Every other post was making fun of Twitter/Musk and for some reason there's a lot of furry porn there.


I wouldn't know, because Bluesky doesn't let anyone just sign up. I'm sure the invite-only nature produces a lot of interest bubbles.


I have a few invites spare — email me: david [at] davidbarker.me and I'll send one over.


You on a platform for content or just there because their system is nice?


Neither, I just want a usable alternative to Twitter that isn't Facebook and has a chance of not being used only by techies.



