Musk has been fixated on this idea that Twitter is a huge treasure trove of data for AI training, often complaining about AI companies using its data for training purposes. I had assumed that companies like OpenAI were simply crawling the web, which includes Twitter, rather than targeting Twitter in particular. Is the Twitter data really that valuable for training AIs? What qualities does it have that make it particularly useful compared to any other freely available data set?
(1) Twitter's data is accurately timestamped, (2) there's new data constantly flowing in talking about recent events. There's no other source like that in English other than Reddit.
AFAIU neither of those is relevant to GPT-like architectures, but it's not inconceivable that some future model architecture could take advantage of them. Purely from an information-theoretic POV, there are non-zero bits of information in the timestamp and relative ordering of tweets.
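To make that concrete, here is a minimal sketch (purely hypothetical) of the simplest way timestamp information could be exposed even to an ordinary next-token model: prepend it to the text as metadata before tokenization. Nobody is claiming current models are trained this way; it's only meant to illustrate that the timestamp carries usable bits.

    from datetime import datetime, timezone

    def with_time_prefix(tweet_text: str, created_at: datetime) -> str:
        # Prepend a UTC timestamp so a plain next-token model can condition on
        # when something was written. Illustrative only; real training
        # pipelines may handle metadata very differently.
        stamp = created_at.astimezone(timezone.utc).strftime("%Y-%m-%d %H:%M")
        return f"[{stamp} UTC] {tweet_text}"

    print(with_time_prefix("Launch scrubbed due to weather.",
                           datetime(2023, 9, 1, 14, 30, tzinfo=timezone.utc)))
    # [2023-09-01 14:30 UTC] Launch scrubbed due to weather.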
X and Reddit are definitely valuable, but they're definitely not unique. I think Meta and Google have inherent advantages because their data is not accessible to LLM competitors and they have the actual capabilities to build great LLMs.
Unless X decides to tap AI talent in China, they're going to have a REALLY hard time spinning up a competitive LLM team compared to OpenAI, Google, and Meta, which I think are the top three LLM companies in that order.
Facebook groups have, weirdly enough, had a bunch of quality discussions similar to Reddit. Can't speak for Instagram, but FB groups are worth peeking into to follow your favorite software projects.
My local city/town Facebook groups are surprisingly good. Like on Reddit there's always the specter of sketchy weird things happening behind the scenes with the mods/admins, but the day-to-day experience is very much that of chatting and sharing with my neighbors.
Fandom/topic or hobby/writing groups on Facebook are better quality discussion venues than Reddit if you can accept seeing some very obvious instances of spam posts and spam comments.
I haven't noticed a significant difference in discussion quality between The Platform Formerly Known As Twitter and Youtube/Reddit for most topics. Maybe I'm cynical, but the vast majority of public communication on the internet seems to be roughly the same quality in my mind.
> the vast majority of public communication on the internet seems to be roughly the same quality in my mind.
I mean, could it be that it's just that the platforms you're familiar with are similar quality? There are major quality differences. Consider, for example, HN vs Instagram. Do you really see no difference in the quality of discourse, or do you just not use Instagram?
By bulk/raw data volume, I'd say that the vast majority of internet communication is the same quality, yeah-- I'll stick by that assertion. That's not at odds with acknowledging there exist locations where intelligent communication happens. My position is just that the signal to noise ratio is pretty bad in the majority of places.
Email is huge but also has enormous privacy issues. Most people by default assume their emails are reasonably private, whereas most people wouldn't assume their comments on these platforms are private.
Oh give it a rest, will you? Hiring AI "also-rans" or whatever term satisfies your sense of national chauvinism then.
For kicks I tried searching for news about "Chinese AI experts" and found the CCP has apparently infiltrated august institutions such as the Financial Times and Harvard Business Review (here, for instance: https://hbr.org/2021/02/is-china-emerging-as-the-global-lead...). Maybe you can go pester them.
Of course, the issue is they don't have an AI industry. The idea that they do is a CCP talking point they've been pushing lately. They're pumping out fake AI crap all over the internet.
I would assume this is simply not enough data. You also have access to all public domain books and they usually make up a small fraction of the data used to train a model from scratch. For fine tuning a model, having a unique source of high quality data is probably valuable even if small.
Safety/Alignment researchers have been too fixated on making the one perfect LLM that has zero bias (or, arguably, one preferred bias) in my opinion.
I don't think Musk is the type of person to make the same mistake, so we'll end up with either a Twitter LLM that accurately represents the sum total of the Twitter firehose, or many derivative LLMs each having its own set of possibly orthogonal biases, or both. Honestly, I think the latter is preferable and would represent the diversity of opinions in reality more accurately.
Given the data source, I think it will be important to be able to switch between LLM personalities in the future to get the "crowd truth".
I’m sure it depends on the type of prediction task, right?
Current LLMs are trying to predict typical human prose from samples pulled from the internet. So it isn’t as if they are sacrificing quality for quantity. A bunch of text from the internet is a very good representation of typical human prose. Whether it is well written or the descriptions contained in the prose accurately represent, like, actual physical reality is another issue.
Maybe they want to predict something with, like, less dimensionality but more utility than a paragraph of fiction.
> there's new data constantly flowing in talking about recent events.
In which the distinction between "data" and "information" is crucial, especially now that the "floodgates" have been re-opened regarding misinformation, bots, impersonators, and the like.
Data is what you need for a training corpus, but information is what you need when the training must be tuned, limited, or simply verified.
> We introduce new datasets derived from the following sources: PubMed Central, ArXiv, GitHub, the FreeLaw Project, Stack Exchange, the US Patent and Trademark Office, PubMed, Ubuntu IRC, HackerNews, YouTube, PhilPapers, and NIH ExPorter. We also introduce OpenWebText2 and BookCorpus2, which are extensions of the original OpenWebText (Gokaslan and Cohen, 2019) and BookCorpus (Zhu et al., 2015; Kobayashi, 2018) datasets, respectively.
Smaller in scale, yeah, but probably with a heavy bias towards technical content, written by people who (mostly) care about how to write properly. There are at least 37,354,035 items that could be indexed, much of it high quality; percentage-wise, probably higher quality per post than Twitter, Facebook, and other sources.
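For what it's worth, that item count is easy to sanity-check against HN's public Firebase API (assuming the maxitem endpoint is still served at the URL below); the largest item id is roughly the total number of stories plus comments ever created:

    import urllib.request

    # Largest item id ever assigned on HN (stories + comments + jobs + polls).
    # Assumes the public Firebase endpoint is still available at this URL.
    url = "https://hacker-news.firebaseio.com/v0/maxitem.json"
    with urllib.request.urlopen(url) as resp:
        max_item = int(resp.read())
    print(f"Roughly {max_item:,} items created so far.")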
I think Twitter is not a great place to dig out training data in general. Most of its data is not well structured and/or tagged. Its signal-to-noise ratio is relatively low. Its texts are generally very short and very dependent on context. Twitter has also largely failed as a short-form video platform. There's some trace of algorithmically generated interest/topic-based feeds on Twitter, but you know, their quality was never great. I guess it's just a hard problem given Twitter's environment.
Its strength is freshness and volume, but I guess these can be achieved without Twitter if you have a strong web crawling infrastructure? Also, the current generation of LLM is not really capable of exploiting minute-level freshness... at least for now.
Also, Twitter is not where people go to be nice. Twitter incentivizes snarky, disparaging, curt behavior because (a) it limits message length to an extent where nice speech doesn't have a place (b) saying nice things gets you likes while saying not-nice things gets you retweets, and retweets are more highly valued by the algorithm.
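As a toy illustration of that incentive (the weights below are made up, not Twitter's actual ranking formula), weighting retweets above likes means the shareable snark can outscore the pleasant reply even with fewer total interactions:

    # Hypothetical engagement weights; NOT Twitter's real ranking formula.
    WEIGHTS = {"like": 1.0, "reply": 2.0, "retweet": 3.0}

    def engagement_score(likes: int, replies: int, retweets: int) -> float:
        return (WEIGHTS["like"] * likes
                + WEIGHTS["reply"] * replies
                + WEIGHTS["retweet"] * retweets)

    # A "nice" tweet that mostly collects likes vs. a snarky one that gets shared:
    print(engagement_score(likes=200, replies=5, retweets=10))   # 240.0
    print(engagement_score(likes=50, replies=40, retweets=80))   # 370.0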
Twitter’s data would be very valuable for generating tweet-like content: short, self-contained snippets, images and video.
There’s not a lot of data in Twitter today resembling long-form content: essays, news articles, books, scientific papers, etc. That’s probably why Twitter/X expanded the tweet size limit, to be able to collect such data.
If Twitter was as much a treasure trove of user data that Elon thinks it is, then why is Twitter's ad targeting so much worse than Facebook's and Instagram's?
Twitter has the largest database of tweets. If you want an AI that writes tweets, there's nothing better. Why? Twitter could offer a service that bypasses the need for a community manager... just feed it a press release, or a product website, and it will produce a stream of well-crafted ad tweets... or astroturf, even.
Doesn't really matter what the content is as long as the sequences of tokens make sense. That's the goal: predict the next token given the previous N tokens. Higher level structures like tweets just fall out of that but I wouldn't be too surprised if a model trained only on tweets could also generalize to some other structures.
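If it helps, that objective is just a cross-entropy loss over shifted token ids. A minimal sketch in PyTorch; the "model" here is a stand-in embedding plus linear layer rather than any particular LLM, with no attention or causal masking:

    import torch
    import torch.nn as nn

    vocab_size, dim, seq_len = 1000, 64, 16
    # Stand-in "model": embed each token, project back to vocabulary logits.
    model = nn.Sequential(nn.Embedding(vocab_size, dim), nn.Linear(dim, vocab_size))

    tokens = torch.randint(0, vocab_size, (4, seq_len))  # a batch of token ids
    logits = model(tokens[:, :-1])                       # predict from the prefix
    targets = tokens[:, 1:]                              # the "next" token at each step
    loss = nn.functional.cross_entropy(
        logits.reshape(-1, vocab_size), targets.reshape(-1))
    print(loss.item())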
It's not a change confined to Twitter; it's a glimpse into what could be the new normal for the internet.
You can bet Google/Gmail/YouTube, Amazon, Microsoft, TikTok, and every other Internet platform that works with user-generated content will soon do the same ... if they haven't done so already.
Gmail's already been using your data for classification, data mining, and other AI purposes. I would be legitimately surprised if they're NOT using processed Gmail data (potentially with sensitive data removed, etc.) in training their LLMs or other AI projects.
Yup. To be clear, that's with the free version though.
If you pay, corporate versions of Google Workspace won't train on your data. That's very much by design, since companies don't want anything internal ever being exposed.
But with the free version, that's part of what you're "paying" for it to remain free.
Although judging from that, it sounds like Google may have changed policy so that even private data in the free versions is no longer used for AI training.
They stopped targeting ads in free Gmail based on your e-mail contents years ago because of the bad press. So maybe they've stopped training AI on free Gmail/Docs data as a similar precaution, now that LLMs are everywhere in the news.
They'd certainly use it to train their spam classifiers, but using that data in a generative model shared with other users would risk information leaks, so they wouldn't do that.
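Roughly the distinction being drawn: a discriminative spam classifier only ever emits a label, so it can't regurgitate training text the way a generative model conceivably could. A toy sketch with scikit-learn and made-up "emails":

    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.naive_bayes import MultinomialNB
    from sklearn.pipeline import make_pipeline

    # Toy stand-ins for private emails; the classifier's only output is a label.
    emails = ["win a free prize now", "claim your free reward",
              "meeting moved to 3pm", "please review the attached report"]
    labels = ["spam", "spam", "ham", "ham"]

    clf = make_pipeline(CountVectorizer(), MultinomialNB())
    clf.fit(emails, labels)
    print(clf.predict(["free prize waiting for you"]))  # ['spam']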
They've been doing it for years. Everything you put on their servers is a source for them to train on. At FB Messenger in the mid to late teens, the suggestion and auto-reply models were trained on the entire corpus of unencrypted messages sent between users.
I interpret this as more of a call to greater action, lest people think boycotting Twitter is going to address this problem while they happily continue using Instagram, Google search, TikTok, et al.
how the fuck are those legal? frankly more disturbing to me than the AI training one, which I find less objectionable than selling people's attention to advertisers
The US doesn't have loser pays and has some of the most expensive litigation in the world, which has created all kinds of problems. Someone can file a lawsuit against you knowing that they're unlikely to win, but in so doing they could cost you hundreds of thousands of dollars for lawyers, so why don't you just go ahead and settle for tens of thousands of dollars? It will cost you less to settle than to win in court.
This flaw was made to scale by class action lawsuits, which more than any other kind should be loser-pays, because there is little question that thousands of people who have each been harmed to the tune of $100 could each front $10 for a meritorious lawsuit. But instead you get opportunistic lawyers signing up anyone they can find for questionable claims, so they can reach a settlement where the plaintiffs each get $7 -- or a $7 gift certificate -- and the lawyers get millions.
This was rightly regarded as a problem but the lawyers had enough political power to prevent a good solution, so what we got instead was to make it easier to force binding arbitration and opt out of class action suits.
IANAL. According to [1] they are legal per Supreme Court precedent, but it also says:
"Contract formation is increasingly scrutinised. Following Concepcion and its progeny, some courts have focused on issues of contract formation to determine whether the consumer in fact agreed to arbitration and the class action waiver. This inquiry is largely confined to online transactions, where a consumer is deemed to have consented to arbitration by using the business's website to purchase goods or services. These contracts fall within the rubric of "clickwrap," "browsewrap," or "webwrap" agreements and their enforceability is beyond the scope of this article. However, it is important to note that the courts will refuse to enforce class action waivers and arbitration agreements in such agreements when the arbitration provisions were insufficiently conspicuous to ensure the consumer objectively agreed to their terms."
The article also mentions that non-negotiable consumer contracts are viewed with more suspicion by some courts.
Class action lawsuits suck for victims. The only people it’s good for are lawyers, who become fabulously wealthy, while the victims get a check for two dollars in the mail.
That backfired on them when all the Twitter employees they fired sued them in individual lawsuits, so now they have to repeat the same discovery process and everything else for each case. Lol. It only backfired because enough individuals happened to have the means to bring a full suit at the same time, though; that's unlikely to happen normally.
Elon has the superpower that makes it so that regardless of the question asked/answered, the audience is 40-40 "always believe" and "never believe", the rest being skeptics or 'don't cares'.
This often works to his advantage; the 'always believe' crowd is loud.
The thing is that almost everyone believed him until it became known that many times he either:
1. didn't know what he was talking about
2. was over-optimistic about timelines to the point that there would be very little difference if it were a lie
3. objectively lied on purpose in order to further self interests
4. exaggerated or over-promised more than could be attributed to salesmanship
5. flip-flopped and refused to acknowledge the change in stance
6. stated things entirely to be vindictive, and let the truth of the statements be irrelevant
The problem isn't that people 'never believe', the problem is 'whatever is said should be suspect to the point that if it is actually true then that is entirely a side-effect of the statement and not the point of it'.
You seem to be unaware of Elon time, or of how Elon has made predictions in the past: generally not "this will happen by date X" but "this cannot possibly happen _before_ date X". Which are very different statements.
There is a contingent of people who want to retcon any statement ever to paint Elon as a fraud, when reality is much too subtle for trivial blanket labels like you want to apply.
My favorite time on Hacker News was reading all the quasi-fortune tellers and intellectuals say his $44B bid for Twitter was never going to go through, either through his withdrawal or being rejected by the board. They were so set in their hubris that the only debate was whether the SEC was going to sue him after the bid inevitably fell through.
Oh, that guy with the self-driving cars and the self-landing rockets and the creepy humanoid robots that can sorta walk? Yeah totally, he hates automated stuff.
Ah yes, the platform formerly known as twitter shall surely NEVER allow the weights produced from analysis of its conversational data to go into the production of spambots!
After all, there's definitely no market for a product that has been finely-trained on social media posts to the point where it can perfectly ape social media posts. There's no money for Twitter to make there, so the technology it produces from this analysis shall certainly never find its way into spam bots that create content indistinguishable from genuine humans on Twitter.
It really is baffling. Removing bots from Twitter is totally orthogonal to training a bot on Twitter data; arguably the former is a prerequisite for the latter. And we've known that Musk wants to train a competing chatbot for at least 6 months now. Has he ever said anything to indicate that he wants to eliminate bots entirely?
Maybe I'm overthinking it and this is just a rhetorical gotcha.
"Consumer action," whether organized or individual has 0% chance of affecting what happens next here. It's all down to ip law and it's adjacents. No real reason to expect intentional actions here.
Think of all the consumer action efforts, and (especially) talk on one hand. Think of the successes on the other... basically none but a few minor symbolic wins.
We are all going to be consumed in the AI machines.
"All capabilities are built from data analysis and the data they are analyzing is you. Whether by direct intent or not, all advancements are encroaching on consuming every knowable fact and inference about you. Your soul must be sacrificed to the machine to grant the powers it manifests."
Is there a Python interface to Twitter that can use password auth and doesn't require an API key? Is there some existing piece of software I can run that doesn't require paying anything to mass delete my tweets (or xeets or whatever they're called now)?
Not that I'm particularly worried but this is a good reminder that I need to blank my twitter history.
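I don't know of a maintained password-auth library to recommend, but one partial workaround that doesn't need an API key at all: the official account archive you can download already contains every tweet. A rough sketch for pulling the ids out of it; the exact filename and the window.YTD.* variable name vary by archive version, so treat the paths and field layout below as assumptions, and actually deleting the tweets would still require some authenticated client:

    import json
    from pathlib import Path

    # Recent archives ship data/tweets.js (older ones data/tweet.js), which
    # assigns a JSON array to a window.YTD.* variable. Strip the assignment
    # prefix and parse the rest as JSON.
    raw = Path("data/tweets.js").read_text(encoding="utf-8")
    payload = raw[raw.index("["):]  # drop the "window.YTD.tweets.part0 = " prefix
    entries = json.loads(payload)
    tweet_ids = [e["tweet"]["id_str"] for e in entries]  # layout per recent archives
    print(f"{len(tweet_ids)} tweets found; first few:", tweet_ids[:5])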
Any ideas how we, the users of the internet, should act to prevent those steps?
Or building a much better connected Internet social platform that isn't controlled by mega-corporations?
I can't upvote this enough. I don't know why people clutch their pearls so hard when they find out the free app they are using is trying to monetize their content. I'd actually be critical of X if they didn't do this.
There are plenty of places where you can pay a nominal fee to have your data 100% controlled by you alone. Plus, you can still use X, Insta, or whatever to drive traffic to your site. It's not that hard, folks!
What about when someone else buys a company and changes the ToS, taking possession of work we've already contributed under a different ToS that you were okay with?
Clearly you should live under a rock if you don't want anyone to cross their fingers behind their backs or take advantage of the fact that you didn't say "no backsies".
That helps, but individual boycotts/market actions can't easily overcome the momentum of preferential attachment because brands are not fungible. Market evangelists/absolutists often rely on naive models of supply and demand, but forget that those models only work under conditions of perfect competition. This means ignoring the many many factors that contribute to inelastic demand like the cost of switching, the lack of ToS clarity, sunk costs, and prior network effects. There are plenty of people who hate Twitter but can't disengage from it because that's where most of their clicks are coming from.
The problem with TOS changes is they never provide a diff. They provide the entire TOS again, one that most people already skipped. I don’t think it’s fair to even provide a legal TOS that does not match your demographic.
These things need to be shorter and written at an 8th-grade level, and I don't mean that in any pejorative way. It needs to be clear to the average person (and I'm not even sure I'm being generous by suggesting an 8th-grade reading/writing level).
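Producing the missing diff is trivial, for what it's worth. A sketch with Python's standard difflib; the TOS lines here are invented examples:

    import difflib

    old_tos = ["You grant us a license to display your content.",
               "We may show you advertising."]
    new_tos = ["You grant us a license to display your content.",
               "We may show you advertising.",
               "We may use your content to train machine learning models."]

    # A unified diff is exactly the "what changed" summary TOS updates never ship with.
    for line in difflib.unified_diff(old_tos, new_tos,
                                     fromfile="tos_2022.txt", tofile="tos_2023.txt",
                                     lineterm=""):
        print(line)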
More importantly, I think official organisations like government, local government, safety bodies, charities, etc. should not be encouraging people to use non-safe channels to interact. We should be lobbying them to stop.
We need legislation that forces companies to prompt users with a ToS that’s no longer than 400 characters.
The prompt should read “So, you cool with that brah?”, and the word “Accept” needs to be replaced with “Yeah sure why not”, and “No” with “Nah, no thank you”.
Current TOS’s don’t provide enough clarity.
That way any semantic issues down the line can be boiled down to “brah, users said they were cool with it” (to the judge).
For me it was one part deleting my Twitter/Reddit accounts and one part rethinking the role of these sites in my life so I don't end up adopting the same habits elsewhere that will inevitably let me down, commercial or otherwise.
Stop commenting on HN? It'll take government action around copyright of training data + the use of decentralized and private instances of Mastodon to avoid it. Though it's still probably inevitably going to happen.
What I want to know is, what's the endgame? We've proven that AI learning eventually plateaus, so you keep feeding it data and...? Seems like an underpants gnome problem or companies trying to find another stock bump for all of this data lying around (that should be considered toxic instead)
Textbooks and reference books are much more useful for training an LLM than Twitter. Both Reddit and he are vastly overestimating the value of their content as training corpus data.
Depends on your use case; Twitter is pretty good if you want a bot for astroturfing, artificially boosting content and engagement, slipping adverts into comments that look like they come from real people, etc.
Any data you send to the cloud will be used to train AI unless you have a binding TOS or contract that says otherwise and that contract cannot be unilaterally amended this way.
I am not surprised at all. All the major tech companies are in an arms race at the moment. Everyone wants to create the new thing, and AI is the hip thing right now, with good reason.
X isn't necessarily a tech company in the same ballpark as Amazon, Google, and Meta, but it has significant backing and is able to create something.
I wouldn't be surprised if other companies start changing their terms to allow them to fish for data for AI training.
My 2 cents, no source, but training AI out of public data will eventually stop, due to the lack of transparency and potential legal issues associated with copyright. AI needs clean, diverse and also specialized data to offer an incentive for companies to build on it, beyond just conversational chatbots. Shameless plug, still early but I am trying to build this with faie.io
X and other social networks will do this anyway, and you're just waiting for all the other social networks, like Facebook, Instagram, Threads, LinkedIn, Reddit, TikTok, etc., to follow suit and train their AIs on your data.
The solution? If you have a problem with it and you haven't done so already, just delete your accounts on ALL social networks.
It's time to regulate these social media companies as utilities. If a platform has more than N million users, and its main feature is communication between humans, then it should be subject to regulation.
With data going forward? Or past data? Because I don't agree with using my past data... (kind of rhetorical, as I long ago deleted my accounts, though I'm sure the data is still there)
Technically, not deleting your data would be a GDPR violation of the right to be forgotten. They could be ignoring that but it would be very expensive if they were found out. Probably doesn't apply to models that have already been trained, but should apply effectively going forward.
Meh, the ToS is unenforceable, and VPNs with residential IPs are not that expensive. Sure, it's a little harder to scrape than HN, for example, but everyone trains on Twitter data (along with everything else on the internet).
Oh, it's very enforceable. People have literally been imprisoned for years for using alternate tools, like wget, to access a website. The fact that it's not enforced most of the time, even though everyone breaks or has broken the CFAA provisions covering TOS, is a feature of the system. Like other such laws, it only gets enforced when you rock the boat and someone rich/powerful enough buys a district attorney to indict you.
I asked because I did google it and I found nothing.
Aaron Swartz was not "imprisoned for years".
Weev "exposed a flaw in AT&T security in June 2010, which allowed the e-mail addresses of iPad users to be revealed.[39] The flaw was part of a publicly-accessible URL, which allowed the group to collect the e-mails without having to break into AT&T's system.[40] Contrary to what it first claimed,[41] the group revealed the security flaw to Gawker Media before AT&T had been notified,[40] and also exposed the data of 114,000 iPad users, including those of celebrities, the government and the military"
That's a pretty silly security breach, but it's still a real security breach. Not comparable to scraping twitter
Having to go to court in itself and spend literally months or years with the possibility of being sentenced to most of the rest of your life in prison is itself a very severe punishment.
I think they're talking about the intention of the user in posting the data. People that post on Twitter and HN understand that the visibility is public, unlike Gmail.
This makes sense if you consider the new API price gouging. It's much easier to attract those interested in training on Twitter data if it is explicitly allowed.
That's all well and good but the content is, in my usage, trash. Their algorithm simply doesn't work to surface posts I like and the people I want to follow aren't on there.
No, you've been freely and happily giving your content to them for their own use for as long as you've been using the platform. Why would they start paying you for something you've provided for free since the beginning?
Off-topic but I thought Elon Musk wanted to make X into a WeChat-like everything app where twitter would make up the social media part. Is he instead renaming twitter to X?
I love how this is justified by "everyone is doing it so it's okay." That's the bandwagon fallacy and I don't buy it.
Twitter (I will not call it X, because that's just stupid), is free to attempt to change their Terms of Service, policies, etc, but we do not have to accept it or agree with it or be resigned to it. Also, it should not be retroactively applied to past content, and it should be an opt-in consent -- but that is pie-in-the-sky wishing at this point given the garbage heap Musk, and others, has made of Twitter.
If someone created an account back when AI was not a thing, they did NOT consent to their data being used for AI:
You cannot expect people to retroactively consent to a thing the existence of which they were not even aware of when they gave consent for some limited OTHER use.
Besides, it's just rude to do things for which there was no consent in general, no matter whether it is AI or anything else. No consent is no consent.
Further, you can't expect people who have a life in general to sit around all day just waiting to figure out where they have to delete accounts before they are misused.
Bazillions of websites nowadays want you to create an account, the normal behavior is that users just abandon accounts which they don't need anymore. Nobody has the time to delete all of them.
Your profile says you were "Senior Director of Monetization at Reddit". Considering all the outrage that company has caused with its users as well (redesign coughcough - just one example out of a truckload), and that the outrage-causing things largely seemed to be aimed at monetization, perhaps you should do some soul searching to figure out whether your values are aligned with common societal morals.
Or in other words: How many more people will you make angry until you realize that maybe you're the baddie?
You can get mad all you want about people using "your data" that you posted on a public forum for whatever; it doesn't change the fact that you were dumb for thinking that your (not important, unique, or interesting) musings would be protected from people using them for whatever they want once you posted them on the open internet, much less on the website of the company you posted them on. Get real.
The fact that you, Jamie Quint (look at his nickname, it says that), a previous Senior Director of Monetization at Reddit (see his profile), answer being accused of a poor moral compass by insulting the person who did (calling them "dumb", and trying to belittle everything they have to say as "not important, unique, or interesting") shows that you in fact have what you've been accused of - not only by me but by the community of reddit as a whole:
A poor, or even no, moral compass.
I'd lean far enough out of the window to speculate that what has often been said about people in positions of power applies here:
Those positions attract people who are completely unable to perceive empathy, and who act solely out of the desire for power and narcissism.
It's a shame, you cause so much harm for society - and you probably are unable to even perceive the harm you're causing because your brain is just not wired to be capable of empathy.
If you want to do the world a favor, go read up what a psychopath is, and by that I do NOT mean to insult you, but rather the actual medical term "psychopath".
Ask yourself whether it applies to you, and learn to protect society from yourself if it does.
You most likely consented to future changes to the ToS. It's kind of like their version of asking a Genie for infinite wishes as one of their three wishes.
I'd like someone to address this. Surely there's a limit. A statement like "your account balance belongs to Twitter" is not itself against the law (they could claim the service is worth some amount of money and that you owe them that money), but they're not allowed to do that, because you're not allowed to retroactively change a contract's terms.
So, surely, you can't change ToS retroactively and expect that any of it applies?
Let's break down what a ToS is. It's the terms they enforce in order to provide you a service. They can refuse you service at any point, for any reason. You can have your own terms of service that they must follow, or you will refuse to do business with them.
By adding that they can train AI, they are trying to get out of a future lawsuit that may happen if the courts require consent for training (currently anyone can train on anything).
Your right to sue them for using the data they hold might be lost if you continue to use after the term change.
If they asked for your car and you refused they could stop service but they can't take your car.
They can't change payment terms from the past and sue for them. But if they change the ToS to say it costs more now, your next bill will go up. If they say they can now use the data they already hold, and you keep an active account, they could take that as acceptance that past and future data can be used to train AI.
A better example might be a right given. For a year you could download photos for AI training. Today they forbid that for all future and past posted photos. Anything downloaded before the date can be legally used to train.
Go through the GDPR deletion process if you care that much, will very likely be deleted because the fines are massive (4% of annual worldwide turnover)
I am not an EU citizen or resident, so I don't believe that avenue is open to me. Also, Twitter has not of late demonstrated significant susceptibility to regulations.
I will note that we've narrowed the claim from "you're free to delete your content" to "If you live in some countries you're very likely to be able to delete your content", which I agree is probably true.
You would think so! I've requested deletion under GDPR and here's what you get in response:
Thank you for your inquiry. You can deactivate your account at any time. When deactivated, your Twitter account, including your display name, username, and public profile, will no longer be viewable on Twitter.com, Twitter for iOS, and Twitter for Android. For up to 30 days after deactivation, it is still possible to restore your Twitter account if it was accidentally or wrongfully deactivated.
Keep in mind that search engines and other third parties may still retain copies of your public information, like your profile information and public Tweets, even after you have deleted the information from our services or deactivated your account.
They don't have a form to request the information to be deleted under GDPR. I have looked everywhere under help.twitter.com and they just ask you to disable your account.
GDPR means absolutely nothing between an American customer and an American company. It's completely irrelevant.
And I'm not even sure it means anything even for users in the EU. If Twitter doesn't have any offices/subsidiaries/bank accounts in the EU, then even if the EU fined them, I'm not sure how that would ever be enforced?
Is it still there? I knew they did previously, but last year there were reports it was possibly closing as part of Musk's layoffs.
Looking online I can't find any recent information.
Basically, given the way Musk has been ignoring other regulations and/or not paying for things, I'm wondering if he even cares about GDPR. And if he doesn't care and shuts down any legal European presence, then does it matter?
(Of course if the Ireland office is still active and receiving lots of European advertiser revenue, then of course the GDPR has teeth.)
Musk only cares about "getting his way" at this point.
Years ago, "dark triad incarnate" was checked by lack of nearly as established a position* and a longer timeline over which to grow and consolidate wealth and power (which does require time and attention).
He's past 50 now, and started transitioning into his own "late Putin phase" (substitute your own 'favorite megalomaniac' at will) quite aggressively in the past 5 years (especially). Now the game is using that wealth and power for a kind of ultimate "spoiled child fantasy camp".
Regardless of how directly any of them channel the childishness of the archetype, the traits are always there - "I'm special", "your (parent-style) 'rules' don't apply to me"**, "I will get my way", etc. It's the whole point, and the motivation that people who don't think this way miss. The motivation that makes the behavior make at least some sense.
Musk is one of the real extreme examples in terms of how transparent the behavior is - whenever he does something that seems hard to explain, ask yourself how the situation might look "in a sandbox". Seriously. This may sound like typical rhetoric, but I'm serious: try it. Twitter is a perfect example - "if I can't have it my way, then I'll make sure no one can have it" ...
... and, just like in the analogy, there are layers of goals. I.e., it's also good if while we (may ultimately) destroy "the sandbox", we can use it to harm those we don't like who've been playing in it. Either directly (e.g., firing employees of Twitter), or in various indirect ways (reporting "troublesome users" to their authoritarian governments [when applicable], etc.).
* Specifically, still needing something from others here and there - most recently and likely the final example: funding for Twitter deal
** People who think this way can't help 'telegraphing' - it's one way they identify members of their own flock, in part. "Nanny state", "snowflake", etc.
I'm not sure the GDPR can be enforced outside of the EU, therefore one can't be absolutely certain the posts will not appear if accessed from elsewhere.
It is enforced internationally just like copyright law. It is a law for EU users no matter where their data resides. Meta was fined 1.3 billion dollars in spring for GDPR violation.
Realistically, a company is only obligated to pay the fine if they have a physical presence within the EU, and care to keep that presence.
For some companies, the calculus says pay the fines and cooperate with EU laws.
But for most companies, they can and do safely ignore GDPR and other EU laws. EU laws do not apply outside the EU... despite what many Europeans want to believe.
> But for most companies, they can and do safely ignore GDPR and other EU laws. EU laws do not apply outside the EU... despite what many Europeans want to believe.
You’re right but when did you meet Europeans who said that a company selling in India has to respect GDPR?
You only need to apply EU regulations if you serve EU citizens, or you get either fined or blocked
Seeing bugs is one thing (I stopped using twitter like 5 years ago or so and it was always a bug fest so I can believe it got even worse) but we are talking about a single specific bug here with important privacy consequences. Some reliable proof would be nice.
The ethics argument is asinine. If you don’t like the rules, don’t use the product or start your own company. Just don’t be an insufferable whiner about it.
If you pull the “we need to have an ethics discussion” card about public data posted on a website you can quit anytime, you’re an insufferable whiner. Sorry not sorry.
>You're free to delete your content and your account if you're unhappy.
>Senior Director of Monetization at Reddit
You of all people should know that "free to" and "should" and consent by default is perfectly legal. And yet, it's also gross and slimy. So I guess I'm not at all surprised to find out you're a monetization person.
Comments like this act like the consumer has all the power to choose and create the world they want. We don't, companies own this world and major public forums and make decisions together like this which we are powerless to change.
Corporations have more power in our government than individuals. They have a better understanding and coordination. Acting like a public forum owned by a company is immune from criticism because it's a private company is sweeping so much under the rug.
I imagine you wouldn't be where you are in life if you didn't believe such things, though.
If only there was some legal regime that turned this conundrum into a scenario whereby people could publicly share their intellectual property while still retaining control of their rights in the material.
This seems so critical to a functioning society that one would have thought it would have been considered in the Constitution. Oh well!
If you think Twitter is violating the constitution or copyright law, then take them to court.
But by agreeing to the terms of use, Twitter retains certain rights over what you post on the platform.
If that is not acceptable, then don’t use Twitter. If you think your thoughts are too valuable for Twitter to use, then write them into a book or blog or some other venue where your intellectual property can be protected.
I do not think Twitter is violating the Constitution unless we are talking about some kind of state actor doctrine vis a vis misinformation censorship under previous ownership.
I do not think there are any violations of intellectual property law given that there is surely a waiver of ownership of posts in the TOS.
I, of course, do not have the kind of free time required to do something like engage with Twitter, and accordingly I have no account, cannot post, and have not agreed to the TOS.
I think you have misconstrued my post, but that’s ok.
the ethical implications of using someone's data without explicit, informed consent for each specific use case is obviously problematic.
the data landscape is ever-evolving, and what was acceptable or even conceivable years ago may not be the same today.
companies should not only be transparent but also dynamically update users on how their data is being used and offer an option to opt-out.
ignoring this not only impacts individual users but also has broader societal implications.
consent fatigue is real; expecting users to keep track and delete their accounts across numerous platforms is neither practical nor ethical.
also, cancelling your account or laboriously deleting all of your content doesn't necessarily guarantee that all your data will be deleted on the backend... did you think your comment through at all?
Also, could you please stop posting unsubstantive comments and flamebait? You've unfortunately been doing it repeatedly. It's not what this site is for, and destroys what it is for.
If you wouldn't mind reviewing the guidelines and taking the intended spirit of the site more to heart, we'd be grateful.
> The mentality of actively opposing or criticizing anyone who defends a particular individual, organization, or viewpoint can be described as "tribalism"
Something tells me Elon could announce he's going to shit on all your faces and you'd be like "it's his company he can do it!" While opening your mouth and looking up with a smile.
Could you please stop posting unsubstantive comments and flamebait? You've unfortunately been doing it repeatedly. It's not what this site is for, and destroys what it is for.
If you wouldn't mind reviewing https://news.ycombinator.com/newsguidelines.html and taking the intended spirit of the site more to heart, we'd be grateful. You may not owe $CelebrityBillionaire better but you owe this community better if you're participating in it.
It's been entertaining watching the pro-Elon crowd shrink and get quieter as he repeatedly screws up. I'm not saying every decision he's made has been bad (some, like revenue sharing and using Twitter Blue, are good imo). But he really has been screwing up bad, and in ways that are indefensible by anyone with common sense.
I also will not call it X, because I think it's confusing.
Quite a few years ago I remember working in Paypal's X API. Part of me wondered if I misremembered this, but no ... there are still references to it online. Maybe Musk named it. He wanted to name the entire company X, right?
I can't imagine how a retroactive agreement would possibly hold up if this issue were to go to court. It seems like it invalidates the whole ToS if anything can potentially be added at any time and apply across time. Wild. Imagine if a rent contract could do that: you now owe rent for two years ago because your landlord changed your contract today to take retroactive effect.
Everyone is doing it, because it’s capitalism! Our public forums are privately managed and controlled by a few people. Venture CAPITALISTS invest in startups, get shares, prop up money-losing economics for years and then sell the shares to the public in an IPO. The corporations have quarterly earnings calls where they have to explain how they are extracting rents from their ecosystem, in order to make “number go up”, ie make shareholders happy.
Open source can liberate us from this, but we need someone to build really good and competitive alternatives to Twitter, Zoom et al.
Capitalism isn't "things I don't like", capitalism is when investors are paid simply for "owning" things rather than being forced to sell their labor in exchange for a wage like everyone else. Your company is selling shares to external investors and is not worker-owned; it is a capitalist firm that pays dividends to people who didn't necessarily do any work for the company. Words have meaning.
Words do indeed have meaning. Absolutely! And I am using the words advisedly, and in their original meaning. Venture capitalists are … well, capitalists! I don’t mischaracterize anything.
Capitalism is characterized by PRIVATE ownership of the “means of production”. That’s the term used in the 19th century, but today we could point to the technological infrastructure which enables each new user to engage with a network.
“Ownership” means exercising exclusive control over this, and excluding others from using (even a copy of) it.
Musk controls Twitter. Zuck controls Facebook. Durov controls Telegram. Moxie controls Signal. And so on. This is centralized control by people who won’t give you their back-end software. They’ll at best let you have your own custom client for a while, until they don’t (Reddit).
Cory Doctorow recently wrote about the “enshittification” that happens as the end result of all this private ownership. “I built it — I own it!” Well, if you believe that, you shouldn’t complain when a privately owned company does something, not even when they deplatform you. What you should complain about is the lack of open source alternatives.
Does Linus own Linux?
Does TimBL own the Web?
Does Rasmus Lerdorf own PHP?
Does Vitalik own Ethereum?
Just because one specific company in an ecosystem is privately owned does not mean the network infrastructure is centrally controlled by a few people.
In fact our company has experimented with ways to reward contributors properly:
Wordpress, Drupal, Magento, Linux etc. can be hosted anywhere. It is a free market. By contrast, Twitter and Facebook (oh sorry, X and Meta) are digital feudalism!
We also are working on utility tokens that, unlike shares, entitle people only to services in that free market, and not to expect rents to be extracted forever. If Qbix or Automattic extracts too much rents from their open source ecosystem, or doesn’t do the best hosting in, say, Hawaii, then a competitor can arise and compete with them, locally or globally.
In fact, Qbix can be used to host social networks in areas with bad internet, including rural villages, cruise ships, and planes. They can help young people of all sexes be educated in rural areas with bad internet. Can the same be said of Google or Facebook? NO! Their capitalist ideas always involve sending the signals back to their own server farms. Whether it's Project Loon (Google) or the solar-powered drones (Facebook), what they don't offer is letting local villages simply load their own forked copy of the backend software and owe them nothing!
We do. We give the source code away and help hosting companies install it. We are working on creating an entire decentralized ecosystem where we don't have centralized control... so if you host locally, you NEVER have to worry about us training our AI models on your data, or any of the other thousands of ways to betray your trust. It's YOUR choice who will run your infrastructure, and it could be your friend on a local computer, connecting your town over a mesh network:
For a long time, the top HN post about Elon Musk was "Elon Musk Deletes Own, SpaceX and Tesla Facebook Pages After #deletefacebook" [0], so he was definitely someone who prized being perceived as not doing what everyone else in Big Tech did.
Why would they? Most sites TOS already stipulate that by uploading data to the service, you grant a global irrevocable unlimited license to use all submitted data for any business purpose without your further consent. I'd be surprised if Twitter didn't have this for years.
If that's the case, then Twitter would have no need to change the TOS; their usage would already be permitted and changing the TOS would serve only to bring additional scrutiny
> Twitter (I will not call it X, because that's just stupid)
LOL, it's the latest craze to change company names. When I see their "new" logo, somehow my mind immediately associates it (correctly) with the X11 logo. Facebook is another one that decided to change its name, for something that may turn out to be the biggest money burn a company has ever done. Maybe tomorrow we will wake up with Pear instead of Apple, who knows. Now that I mentioned FB, what's the current status of the so-called Metaverse? Are we there yet? Or are they still furiously pouring millions and millions in and getting nothing out of it?
It's always weird when big, established names/brands attempt to rebrand.
Like "the artists formerly known as" Prince, Kanye, Snoop Dogg, etc. There's basically no getting away from the old branding because it has to be included with the new branding so one knows what we're even talking about.
As far as rebrands go, X just seems dumb. The more an article/news segment talks about X, the more it feels like an unfilled Mad Libs made it to air. Or it feels like they're talking about something general, like when X Company does Y thing.
Yeah this is probably the worst corporate rebranding I have ever seen. They replaced one of the most globally recognized brands with a generic and meaningless one. Plus the rollout was a mess, just like everything else post-musk Twitter does. Have they even gotten around to updating all of their own branding references yet?
> given the garbage heap Musk, and others, has made of Twitter.
It's a more vibrant, open, and honest community than ever, despite the organized and coordinated (I wonder by who) advertiser boycott. If anything, a lot of garbage has been removed from the heap.
X wants your biometric data, your job history, and your education history for 'safety, security, and identification purposes'.
Those items along with your posts and social network will be used for advertising, monetization, tracking, selling services to government agencies, and training their machine learning bots.
Not for your benefit, but for X's uses and benefits. You are the product and X wants to sell your info and attention so Elon can earn his money back.
I would love to know how the people bloviating about this think Twitter worked before.
Of course they’re doing ML training on user data! Every major tech company has been doing ML training on user data for over a decade at this point! Twitter used to have official tutorials on how to do ML on data you pulled from their APIs!
Seriously, how the f** do you all think their moderation algorithms worked before? Unicorn farts? Some guys in a room that could read tweets really fast?
The only thing that’s changed is that they feel the need to put it in their terms of service.
And no, this isn’t a defense of Twitter or Musk. If you actually give a shit about this and don’t just want to perform slacktivism on social media, get off ALL of these platforms. Migrate away from Gmail. Close your Facebook. Use paid services from trusted privacy-focused vendors.
Truly I do not understand why they are not even attempting to take advantage of their competition actively driving people away. And they have the history of Google+ to reflect on! But they're making the same mistake.
Bluesky and also the Fediverse are Open Platforms¹. That means your toot or post reaches anyone including those who want to train AI.
An open platform and open protocol makes it harder to prohibit AI bots from ingesting your published thoughts than when on a private, centralised service.
Mastodon, the fediverse, Bluesky, are really enabling AI learning more than prohibiting it.
¹ well, BSKY is really still just a single server.
Honestly Bluesky is more obsessed with Twitter and Musk than probably Twitter itself.
I spent a few days on there after getting the invite and didn't enjoy the experience. Every other post was making fun of Twitter/Musk, and for some reason there's a lot of furry porn there.