I contacted support to opt out. Here is the answer.
"Hi there,
Thank you for reaching out to Slack support. Your opt-out request has been completed.
For clarity, Slack has platform-level machine learning models for things like channel and emoji recommendations and search results. We do not build or train these models in such a way that they could learn, memorize, or be able to reproduce some part of customer data. Our published policies cover those here (https://slack.com/trust/data-management/privacy-principles), and as shared above your opt out request has been processed.
Slack AI is a separately purchased add-on that uses Large Language Models (LLMs) but does not train those LLMs on customer data. Slack AI uses LLMs hosted directly within Slack’s AWS infrastructure, so that customer data remains in-house and is not shared with any LLM provider. This ensures that Customer Data stays in that organization’s control and exclusively for that organization’s use. You can read more about how we’ve built Slack AI to be secure and private here: https://slack.engineering/how-we-built-slack-ai-to-be-secure....
Just completely disingenuous that they still pretend to care about customer privacy after rolling this out with silent opt-in and without adding an easy opt-out option in the Slack admin panel.
Contact us to opt out. If you want to exclude your Customer Data from Slack global models, you can opt out. To opt out, please have your Org or Workspace Owners or Primary Owner contact our Customer Experience team at feedback@slack.com with your Workspace/Org URL and the subject line “Slack Global model opt-out request.” We will process your request and respond once the opt out has been completed.
It's not like L1 support can make official statements on their own initiative. That was written by someone higher up and they're just copypasting it to panicked customers.
> We offer Customers a choice around these practices. If you want to exclude your Customer Data from helping train Slack global models, you can opt out. If you opt out, Customer Data on your workspace will only be used to improve the experience on your own workspace and you will still enjoy all of the benefits of our globally trained AI/ML models without contributing to the underlying models.
Why would anyone not opt out? (Besides not knowing they have to, of course…)
What's baffling to me is why companies think that when they slap AI on the press release, their customers will suddenly be perfectly fine with them scraping and monetizing all of their data on an industrial scale, without even asking for permission. In a paid service. Where the service is private communication.
I am not pro-exploiting users' ignorance for their data, but I would counter this with the observation that slapping AI on a product suddenly makes people care about the fact that companies are monetizing their usage data.
Monetizing user activity data through opt-out collection is not new. Pretending that this phenomenon has anything to do with AI seems like a play for attention that exploits people's AI fears.
I'll sandwich my comments with a reminder that I am not pro-exploiting users' ignorance for their data.
Sure - but isn't this a little like comparing manual wiretapping to dragnet? (Or comparing dragnet to ubiquitous scrape-and-store systems like those employed by five-eyes?)
Most people don't care, paid service or not. People are already used to companies stealing and selling their data up and down. Yes, this is absolutely crazy. But was anything substantial done against it before? No, hardly anyone was raising awareness of it. Now we keep reaping what we sowed. The world keeps sinking deeper and deeper into digital fascism.
Companies do care: why would you take on additional risk of data leakage for free? In the best case scenario nothing happens but you also don't get anything out of it; in the worst case scenario extremely sensitive data from private chats gets exposed and hits your company hard.
Companies are made up of people. Some people in some enterprises care. I'd wager that in any company beyond a tiny upstart you'll have people all over the hierarchy who don't care. And some of them will be responsible for toggling that setting... or not, because they just can't be arsed, given how little they care about the chat histories of people they'll likely never even interact with being used to train some AI.
I mean, I am in complete agreement, but at least in theory the only reason for them to add AI to the product would be to make the product better, which would give you a better product per dollar.
Because they don't seem to make it easy. It doesn't seem like, as an individual user, I have any say in how my data is used; I have to contact the Workspace Owner. When I do, I'll be asking them to look at alternative platforms instead.
"Contact us to opt out. If you want to exclude your Customer Data from Slack global models, you can opt out. To opt out, please have your Org or Workspace Owners or Primary Owner contact our Customer Experience team at feedback@slack.com with your Workspace/Org URL and the subject line “Slack Global model opt-out request.” We will process your request and respond once the opt out has been completed."
I'm the one who picked Slack over a decade ago for chat, so hopefully my opinion still holds weight on the matter.
One of the primary reasons Slack was chosen was because they were a chat company, not an ad company, and we were paying for the service. Under these parameters, what was appropriate to say and exchange on Slack was both informally and formally solidified in various processes.
With this change, beyond just my personal concerns, there are legitimate concerns at a business level that need to be addressed. At this point, it's hard to imagine anything but self-hosted as being a viable path forward. The fact that chat as a technology has devolved into its current form is absolutely maddening.
> We offer Customers a choice around these practices.
I'm reminded of the joke from The Hitchhiker's Guide to the Galaxy; maybe they'll leave a small hint in a very inconspicuous place, like burying this in the user agreement on page 300 or so.
“But the plans were on display…”
“On display? I eventually had to go down to the cellar to find them.”
“That’s the display department.”
“With a flashlight.”
“Ah, well, the lights had probably gone.”
“So had the stairs.”
“But look, you found the notice, didn’t you?”
“Yes,” said Arthur, “yes I did. It was on display in the bottom of a locked filing cabinet stuck in a disused lavatory with a sign on the door saying ‘Beware of the Leopard.’”
Seriously, for your own sake, don't do this whole "I am the champion of Apple's righteousness" shtick. Apple doesn't care about privacy. That's the bottom line; you lack the authority to prove otherwise.
Because you might actually want to have the best possible global models?
Think of "not opting out" as "helping them build a better product". You are already paying for that product, if there is anything you can do, for free and without any additional time investment on your side that makes their next release better, why not do it ?
You gain a better product for the same price, they get a better product to sell. It might look like they get more than you do in the trade, and that's probably true; but just because they gain more does not mean you lose. A "win less / win more" situation is still a win-win. (It's even a win-win-win if you take into account all the other users of the platform).
Of course, if you value the privacy of this data a lot, and if you believe that allowing them to train on it is actually going to risk exposing private info, the story changes. But then you have the option to say stop. It's up to you to weigh how much you value "getting a better product" against the "estimated risk of exposing some information considered private". Some will err on one side, some on the other.
How could this make slack a better product? The platform was very convenient for sharing files and proprietary information with coworkers, but now I can't trust that slack won't slip in some "opt out if you don't want us to look at your data" "setting" in the future.
I don't see any cogent generative AI tie-in for slack, and I can't imagine any company that would value a speculative, undefined hypothetical benefit more than they value their internal communications remaining internal.
> Of course, if you value the privacy of this data a lot, and if you believe that allowing them to train on it is actually going to risk exposing private info, the story changes. But then you have the option to say stop. It's up to you to weigh how much you value "getting a better product" against the "estimated risk of exposing some information considered private". Some will err on one side, some on the other.
The problem with this reasoning, at least as I understand it, is that you don't really know when or where the training on your data crosses the line into information you don't want to share until it's too late. It's also a slippery slope.
> Think of "not opting out" as "helping them build a better product"
I feel like someone would only have this opinion if they've never ever dealt with anyone in the tech industry, or any capitalist, in their entire life. So, like, 8-19 year olds? Except even they seem to understand that profit-absolutist goals undermine everything.
This idea has the same smell as "We're a family" company meetings.
I for one consider it my duty to bravely sacrifice my privacy on the altar of corporate profit so that the true beauty of an LLM trained on emojis and cat gifs can bring humanity to the next epoch.
> Think of "not opting out" as "helping them build a better product"
Then they can simply pay me for that. I have zero interest in helping any company improve their products for free -- I need some reasonable consideration in return. For example, a percent of their revenues from products that use my data in their development. I'm totally willing to share the data with them for 2-3% of their revenues, that seems acceptable to me.
Yep, much like just about every credit card company shares your personal information BY DEFAULT with third parties unless you explicitly opt out (this includes Chase, Amex, Capital One, but likely all others).
For Chase Personal and Amex you can opt out in the settings. When you get a new credit card, these institutions default to sharing your data. For Capital One you need to call them and have a chit-chat saying you want to exercise the restriction advertised in their privacy policy, and they'll do it for you.
PG&E has a "Do not sell my info" form.
For other institutions, go check the settings and read the privacy policies.
I don't see the point of Rocket Money. They seem like they exist to sell your info.
You should keep track of your own subscriptions. My way of doing this is to have a separate zero-annual-fee credit card ONLY for subscriptions and I never use that card for anything else. That way I can cleanly see all my subscriptions on that credit card's bill, cleanly laid out, one per line, without other junk. I can also quickly spot sudden increases in monthly charges. I also never use that card in physical stores so that reduces the chance of a fraud incident where I need to cancel that card and then update all my subscriptions.
If you want to organize it even more, get a zero-annual-fee credit card that lets you set up virtual cards. You can then categorize your subscriptions (utilities, car, cloud/API, media, memberships, etc.) and that lets you keep track of how much you're spending on each category each month.
What I don't understand is why it's an opt-out not opt-in. From a legal standpoint, it seems if there is an option to not have something done to you, it should be the default. For example, people don't have to opt-out of giving me all their money when they pass by my house, even if I were to try to claim it's part of my terms of service.
I'm willing to bet that for smaller companies, they just won't care enough to consider this an issue, and that's what Slack/Salesforce is banking on.
I can't see a universe in which large corpos would allow such blatant corporate espionage for a product they pay for no less. But I can already imagine trying to talk my CTO (who is deep into the AI sycophancy) into opting us out is gonna be arduous at best.
I'd be surprised if the legal department at any company that has one doesn't freak the f out when they read this. They will likely lose the biggest customers first, so even if it is 1% of customers, it will likely affect their bottom line enough to give it a second thought. I don't see how they might profit from an in-house LLM more than from their enterprise-tier plans.
Their customer support will have a hell of a day today.
…a choice that’s carefully hidden deep in the ToS and requires a special person to send a special e-mail instead of just adding an option to the org admin interface.
> For any model that will be used broadly across all of our customers, we do not build or train these models in such a way that they could learn, memorise, or be able to reproduce some part of Customer Data
This feels so full of subtle qualifiers and weasel words that it generates far more distrust than trust.
It only refers to models used "broadly across all" customers - so if it's (a) not used "broadly" or (b) only used for some subset of customers, the whole statement doesn't apply. Which actually sounds really bad because the logical implication is that data CAN leak outside those circumstances.
They need to reword this. Whoever wrote it is a liability.
Yes, lawyers do tend to have a part to play in writing things that represent a legally binding commitment made by an organisation. Developers really can’t throw stones from their glass houses here. How many of you have a pre-canned spiel explaining why the complexities of whichever codebase you spend your days on are ACTUALLY necessary, and are certainly NOT the result of over-engineering? Thought so.
> How many of you have a pre-canned spiel explaining why the complexities of whichever codebase you spend your days on are ACTUALLY necessary, and are certainly NOT the result of over-engineering? Thought so.
Hm, now you mention it, I don't think I've ever seen this specific example.
Not that we don't have jargon that's bordering on cant, leading to our words being easily mis-comprehended by outsiders: https://i.imgur.com/SL88Z6g.jpeg
Canned cliches are also the only thing I get whenever I try to find out why anyone likes the VIPER design pattern — and that's despite being totally convinced that (one of) the people I was talking to had genuinely and sincerely considered my confusion and had actually experimented with a different approach to see if my point was valid.
Sure lawyers wrote it but I'd bet a lot there's a product or business person standing behind the lawyer saying - "we want to do this but don't be so obvious about it because we don't want to scare users away". And so lawyers would love to be very upfront about what is happening because that's the best way to avoid liability. However, that conflicts with what the business wants, and because the lawyer will still refuse to write anything that's patently inaccurate, you end up with a weasel word salad that is ambiguous and unhelpful.
> If you want to exclude your Customer Data from helping train Slack global models, you can opt out.
So Customer Data is not used to train models "used broadly across all of our customers [in such a way that ...]", but... it is used to help train global models. Uh.
To me it says that they _do_ train global models with customer data, but they are trying to ensure no data leakage (which will be hard, but maybe not impossible, if they are training with it).
The caveats are for “local” models, where you would want the model to be able to answer questions about discussions in the workspace.
It makes me wonder how they handle “private” chats, can they leak across a workspace?
Presumably they are trying to train a generic language model which has very low recall for facts in the training data, then using RAG across the chats that the logged-on user can see to provide workspace-local context.
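If that guess is right, the retrieval side might look roughly like the sketch below. Everything here (the Message fields, the precomputed embeddings, the visibility filter) is an assumption about how such a system could work, not Slack's actual design, and a real deployment would use a vector index rather than brute-force scoring.

```python
from dataclasses import dataclass

@dataclass
class Message:
    channel: str              # hypothetical: channel the message was posted in
    text: str
    embedding: list[float]    # precomputed by some off-the-shelf embedding model

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = sum(x * x for x in a) ** 0.5
    nb = sum(x * x for x in b) ** 0.5
    return dot / (na * nb) if na and nb else 0.0

def retrieve_context(query_emb: list[float], messages: list[Message],
                     visible_channels: set[str], k: int = 5) -> list[str]:
    # Only retrieve from channels the logged-on user can already see, so nothing
    # outside their existing permissions ever reaches the prompt.
    candidates = [m for m in messages if m.channel in visible_channels]
    candidates.sort(key=lambda m: cosine(query_emb, m.embedding), reverse=True)
    return [m.text for m in candidates[:k]]

# The top-k snippets would then be pasted into the LLM prompt as context, while
# the LLM itself stays generic and is never fine-tuned on the messages.
```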
My intuition is that it's impossible to guarantee there are no leaks in the LLM as it stands today. It would surely require some new computer science to ensure that no part of any output that could ever possibly be produced contains sensitive data from any of the input.
It's one thing if the input is the published internet (even if covered by copyright), it's entirely another to be using private training data from corporate water coolers, where bots and other services routinely send updates and query sensitive internal services.
You and many others here are conflating AI and LLMs. They are not the same thing. LLMs are a type of AI model that produce content, and if trained on customer data would gladly replicate it. However, the TOS explicitly say that Generative AI models (=LLMs) are taken off the shelf and not retrained on customer data.
Before LLMs exploded a year and a half ago lots of other AI models had been in place for several years in a lot of systems, handling categorization, search results ranking, etc. As they do not generate text or other content, they cannot leak data. The linked FAQ provides several examples of features based on such models, which are not LLMs: for example, they use customer data to determine a good emoji to suggest for reacting to a message based on the sentiment of the message. An emoji suggestion clearly has no potential of leaking customer data.
There is a way. Build a preference model from the sensitive dataset. Then use the preference model with RLAIF (like RLHF but with AI instead of humans) to fine-tune the LLM. This way only judgements about the LLM outputs will pass from the sensitive dataset. Copy the sense of what is good, not the data.
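A rough sketch of that idea is below. The two stub functions are placeholders I made up for illustration (a real setup would sample from the actual LLM and run a PPO-style update with an RL training library); the point is just that only scalar judgements cross the boundary from the sensitive data.

```python
import random

def preference_score(candidate: str) -> float:
    """Stub for a preference/reward model trained on the sensitive dataset.
    It emits only a scalar judgement of the candidate text; the sensitive
    messages themselves never leave this function."""
    return random.random()  # placeholder score

def generate_candidates(prompt: str, n: int = 4) -> list[str]:
    """Stub for sampling n completions from the base LLM."""
    return [f"{prompt} ... completion {i}" for i in range(n)]

def rlaif_step(prompt: str) -> tuple[str, float]:
    """One conceptual RLAIF-style step: sample candidates, score them with the
    preference model, and return the best one plus its score. A real trainer
    would turn these scores into a policy update on the LLM."""
    candidates = generate_candidates(prompt)
    scored = [(c, preference_score(c)) for c in candidates]
    return max(scored, key=lambda pair: pair[1])

best, score = rlaif_step("Summarize the incident channel")
# Only the scalar score is derived from the sensitive data; the LLM update never
# sees the underlying messages -- "copy the sense of what is good, not the data".
```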
If you switch to Teams only for this reason I have some bad news for you - there’s no way Microsoft is not doing the same (or won't start in the future). And you’ll get a subpar experience with that (which is an understatement).
The Universal License Terms of Microsoft (applicable to Teams as well) clearly say they don't use customer data (Input) for training: https://www.microsoft.com/licensing/terms/product/ForallOnli...
Whether someone believes it or not is another question, but at least they tell you what you want to hear.
I would guess Microsoft has a lot more government customers (and large customers in general) than Slack does. So I would think they have a lot more to lose if they went this route.
Unless your company makes a special arrangement with them, Microsoft will steal all of your data. It's at least in their TOS for Outlook, and for some reason I doubt they wouldn't do the same for Teams.
Just don't use the "Team" feature of it to chat. Use chat groups and 1-to-1 of course. We use "Team" channels only for bots: CI results, alerts, things like that.
Meetings are also chat groups. We use the daily meeting as the dev-team chat itself so it's all there. Use Loops to track important tasks during the day.
I'm curious what's missing/broken in Teams that you would rather not have chat at all?
The idea that Slack makes companies work better needs some proof behind it; I’d say the amount of extra distraction is a net negative… but as with a lot of things in software and startups, nobody researches anything and everyone writes long essays about how they feel things are.
Distraction is not enforced. Learning to control your attention, and how to help yourself do it, is crucial whatever you do, in whatever era, and in whatever technological context. It is the most valuable long-term resource you have.
I think we start to recognize this at larger scale.
Slack easily saves a ton of time solving complex problems that require interaction and expertise from a lot of people, often an unpredictable number of them for each problem. They can answer with a delay; in a good culture this is totally accepted, and people can still independently move forward or switch tasks if necessary, same as with slower communication tools. You are not forced to answer with any particular lag, but Slack makes it possible, when needed, to reduce it to zero.
Sometimes you are unsure whether you need help or can do something on your own. I certainly know that a lot of the time I eventually had no chance whatsoever, because the knowledge required was too specialized, and this is not always clear up front. Reducing barriers to communication in those cases is crucial, and I don't see Slack being in the way here, only helpful.
The goal of organizing Slack is to pay the right amount of attention to the right parts of communication for you. You can do this if you really spend (hmm) attention trying to figure out what that is and how to tune your tools to achieve it.
That’s a lot of words with no proof, isn’t it? It’s just your theory. Until I see a well-designed study on such things I struggle to believe the conjecture you make either way. It could be quite possible that you benefit from Slack and I don’t.
Even receiving a message and not responding can be disruptive and on top I’d say being offline or ignoring messages is impossible in most companies.
It's your choice whether to trust only statements backed by scientific rigour or to try things out and apply them to your way of life. This is just me talking to you; in that, you are correct.
Regarding “receiving a message”: my devices are allowed only limited use of notifications. Of all the messaging/social apps, only messages from my wife in our messaging app of choice pop up as notifications. Slack certainly is not allowed there.
Good point, could be that it reduces friction too far in some instances. However, in general less communication doesn't seem better for the bottom line.
I'm not sure chat apps improve business communications. They are ephemeral, with differing expectations on different teams. Hardly what I'd label as "cohesive"
Async communications are critical to business success, to be sure -- I'm just not convinced that chat apps are the right tool.
From what I’ve seen (not much, actually), most channels can be replaced by a forum-style discussion board. Chat can be great for 1:1 and small-team interactions. And for tool interactions.
They differentiate quite clearly between generative AI models, which are not trained on customer data because they could leak it, and other types of AI models (e.g. recommendation systems) that do not work by reproducing content.
The examples in the linked page are quite informative of the use cases that do use customer data.
Nah. Whoever decided to create the reality their counsel is dancing around with this disclaimer is the actual problem, though it's mostly a problem for us, rather than them.
I'm imagining a corporate Slack, with information discussed in channels or private chats that exists nowhere else on the internet, getting rolled into a model.
Then someone asks a very specific question... conversationally... about that very specific scenario...
Seems plausible confidential data would get out, even if it wasn't attributed to the client.
Not that it’s possible to ask an LLM how a specific or random company in an industry might design something…
Gandalf is a great tool for bringing awareness to AI hacking, but Lakera also trains on the prompts you provide when you play the game. See the bottom of that page.
"Disclaimer: we may use the fully anonymized input to Gandalf to improve Gandalf and for Lakera AI's products and services. "
Sometimes the obvious questions are met with a lot of silence.
I don't think I can be the only one who has had a conversation with GPT about something obscure they might know but there isn't much about online, and it either can't find anything... or finds it, and more.
I think it's as clear as it can be, they go into much more detail and provide examples in their bullet points, here are some highlights:
Our model learns from previous suggestions and whether or not a user joins the channel we recommend. We protect privacy while doing so by separating our model from Customer Data. We use external models (not trained on Slack messages) to evaluate topic similarity, outputting numerical scores. Our global model only makes recommendations based on these numerical scores and non-Customer Data.
We do this based on historical search results and previous engagements without learning from the underlying text of the search query, result, or proxy. Simply put, our model can't reconstruct the search query or result. Instead, it learns from team-specific, contextual information like the number of times a message has been clicked in a search or an overlap in the number of words in the query and recommended message.
These suggestions are local and sourced from common public message phrases in the user’s workspace. Our algorithm that picks from potential suggestions is trained globally on previously suggested and accepted completions. We protect data privacy by using rules to score the similarity between the typed text and suggestion in various ways, including only using the numerical scores and counts of past interactions in the algorithm.
To do this while protecting Customer Data, we might use an external model (not trained on Slack messages) to classify the sentiment of the message. Our model would then suggest an emoji only considering the frequency with which a particular emoji has been associated with messages of that sentiment in that workspace.
Whatever lawyer wrote that should be fired. This poorly written nonsense makes it look like Slack is trying to look shady and subversive. Even if well intended this is a PR blunder.
> They need to reword this. Whoever wrote it is a liability.
Wow you're so right. This multi-billion dollar company should be so thankful for your comment. I can't believe they did not consult their in-house lawyers before publishing this post! Can you believe those idiots? Luckily you are here to save the day with your superior knowledge and wisdom.
In this case they mean leak into the global model — so no. You can have sovereignty of your data if you use an open protocol like IRC or Matrix, or a self-hosted tool like Zulip, Mattermost, Rocket Chat, etc
In your experience. I have deployed single tenant SaaS for years, very much not Fortune 500. It's really not hard to give each customer their own server.
That's exactly my point. "File over app"[1] is just as relevant for businesses as it is for individuals — if you don't want your data to be used for training, then take sovereignty of it.
Ah good catch. Yes the "D" key can be used to switch light/dark mode, but I didn't account for the bookmark shortcut. That should be fixed now. Thanks!
"Will not" allows the existence of a bridge but it's not on your route and you say you're not going to go over it. "Cannot" is the absence of a bridge or the ability to cross it.
I'm confused about this statement:
"When developing AI/ML models or otherwise analyzing Customer Data, Slack can’t access the underlying content. We have various technical measures preventing this from occurring"
"Can't" is a strong word. I'm curious how an AI model could access data, but Slack, Inc itself couldn't. I suspect they mean "doesn't" instead of "can't", unless I'm missing something.
I also find the word "Slack" in that interesting. I assume they mean "employees of Slack", but the word "Slack" obviously means all the company's assets and agents, systems, computers, servers, AI models, etc.
I would find even a statement from Signal like "we can't access our users content" to be tenuous and overly-optimistic. Like, when I heard the word "can't" my brain goes to: there is nothing anyone in the company could do, within the bounds of the law, to do this. Employees at Slack could turn off the technical measures preventing this from occurring. Employees at Signal could push an app update which side-channels all messages through to a different server, unencrypted.
Better phrasing is "Employees of Slack will not access the underlying content".
If it's verifiably E2EE then I consider "we can't access this" to be a fairly powerful statement. Sure, the source could change, but if you have a reasonable distribution mechanism (e.g. all users get the same code, verifiably reproducible) then that's about as good as you can get.
Privacy policies that state "we won't do XYZ" have literally zero value to me to the extent that I don't even look at them. If I give you some data, it's already leaked in my mind, it's just a matter of time.
> I would find even a statement from Signal like "we can't access our users content" to be tenuous and overly-optimistic.
I don't really agree with this statement. Signal literally can't read user data right now. The statement is true, why can't they use it?
If they can't use it, nobody can. There is no service that can't publish an update reversing any security measure available. Also, doing that would be illegal, because it would render the statement "we can't access our users' content" false.
In Slack's case, it is totally different. Data is accessible to Slack's systems, so the statement "we can't access our users' content" is already false. Probably what they mean is something along the lines of: "The data can be accessed by our systems, but we have measures in place that block access for most of our employees."
> Provisioning
To minimize the risk of data exposure, Slack adheres to the principles of least privilege and role-based permissions when provisioning access—workers are only authorized to access data that they reasonably must handle in order to fulfill their current job responsibilities. All production access is reviewed at least quarterly.
As an engineer who has worked on systems that handle sensitive data, it seems straightforwardly to me to be a statement about:
1. ACLs
2. The systems that provision those ACLs
3. The policies that determine the rules those systems follow.
In other words, the model training batch job might run as a system user that has access to data annotated as 'interactions' (at timestamp T1 user U1 joined channel C1, at timestamp T2 user U2 ran a query that got 137 results), but no access to data annotated as 'content', like (certainly) message text or (probably) the text of users' queries. An RPC from the training job attempting to retrieve such content would be denied, just the same as if somebody tried to access someone else's DMs without being logged in as them.
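A toy version of that annotation-based check might look like the following; the roles, annotations, and function names are all invented for illustration and say nothing about Slack's actual access layer.

```python
# Hypothetical role -> allowed data-annotation mapping, owned by policy, not by
# the engineer who happens to run the training job.
ALLOWED_ANNOTATIONS = {
    "model-training-job": {"interactions"},                   # clicks, joins, counts
    "customer-support-agent": {"interactions", "content"},    # gated by consent flows
}

class AccessDenied(Exception):
    pass

def fetch_record(principal_role: str, record: dict) -> dict:
    """Return the record only if the caller's role may read its annotation."""
    annotation = record["annotation"]  # e.g. "interactions" or "content"
    if annotation not in ALLOWED_ANNOTATIONS.get(principal_role, set()):
        raise AccessDenied(f"{principal_role} may not read {annotation} data")
    return record

# The training job can read interaction events...
fetch_record("model-training-job",
             {"annotation": "interactions", "event": "joined_channel"})

# ...but an attempt to pull message text is rejected at the access layer.
try:
    fetch_record("model-training-job",
                 {"annotation": "content", "text": "secret launch plan"})
except AccessDenied as e:
    print(e)
```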
As a general rule in a big company, you the engineer or product manager don't get to decide what the ACLs will look like no matter how much you might feel like it. You request access for your batch job from some kind of system that provisions it. In turn the humans who decide how that system work obey the policies set out by the company.
It's not unlike a bank teller who handles your account number. You generally trust them not to transfer your money to their personal account on the sly while they're tapping away at the terminal--not necessarily because they're law abiding citizens who want to keep their job, but because the bank doesn't make it possible and/or would find out. (A mom and pop bank might not be able to make the same guarantee, but Bank of America does.) [*]
In the same vein, this is a statement that their system doesn't make it possible for some Slack PM to jack their team's OKRs by secretly training on customer data that other teams don't use, just because that particular PM felt like ignoring the policy.
[*] Not a perfect analogy, because a bank teller is like a Slack customer service agent who might, presumably after asking for your consent, be able to access messages on your behalf. But in practice I doubt there's a way for an employee to use their personal, probably very time-limited access to funnel that data to a model training job. And at a certain level of maturity a company (hopefully) also no longer makes it possible for a human employee to train a model in a random notebook using whatever personal data access they have been granted and then deploy that same model to prod. Startups might work that way, though.
"If you can read assembly, all programs are open source."
Sure, it's easier and less effort if the program is actually open source, but it's absolutely still possible to verify on bytecode, decompiled or disassembled programs, too.
Eugh. Has anyone compiled a list of companies that do this, so I can avoid them? If anyone knows of other companies training on customer data without an easy highly visible toggle opt out, please comment them below.
Synology updated this policy back in March (Happened to be a Friday afternoon).
Services Data Collection Disclosure
"Synology only uses the information we obtain from technical support requests to resolve your issue. After removing your personal information, we may use some of the technical details to generate bug reports if the problem was previously unknown to implement a solution for our products."
"Synology utilizes the information gathered through technical support requests exclusively for issue resolution purposes. Following the removal of personal data, certain technical details may be utilized for generating bug reports, especially for previously unidentified problems, aimed at implementing solutions for our product line. Additionally, Synology may transmit anonymized technical information to Microsoft Azure and leverage its OpenAI services to enhance the overall technical support experience. Synology will ensure that personally identifiable information, such as names, phone numbers, addresses, email addresses, IP addresses and product serial numbers, is excluded from this process."
I used to just delete privacy policy update emails and the like but now I make a habit of going in to diff them to see if these have been slipped in.
if your data is stored in a database that a company can freely read and access (i.e. not end-to-end encrypted), the company will eventually update their ToS so they can use your data for AI training — the incentives are too strong to resist
We can fight back by not posting anything useful or accurate to the internet until there are protections in place and each person gets to decide how their data is used and whether they are compensated for it.
How could this possibly comply with European "right to be forgotten" legislation? In fact, how could any of these AI models comply with that? If a user requests to be forgotten, is the entire model retrained (I don't think so).
This "ai" scam going on now is the ultimate convoluted process to hide sooo much tomfuckery: theres no such thing as copyright anymore! this isn't stealing anything, its transforming it! you must opt out before we train our model on the entire internet! (and we still won't spits in our face) this isn't going to reduce any jobs at all! (every company on earth fires 15% of everyone immediately) you must return to office immediately or be fired! (so we get more car data teehee) this one weird trick will turn you into the ultimate productive programmer! (but we will be selling it to individuals not really making profitable products with it ourselves)
and finally the most aggregious and dangerous: censorship at the lowest level of information before it can ever get anywhere near peoples fingertips or eyeballs.
> how could any of these AI models comply with that? If a user requests to be forgotten, is the entire model retrained (I don't think so).
I don't believe that is the current interpretation of GDPR, etc. - if the model is trained, it doesn't have to be deleted due to a RTBF request, afaik. There is significant legal uncertainty here.
Recent GDPR court decisions mean that this is probably still non-compliant due to the fact that it is opt-out rather than opt-in. Likely they are just filtering out all data produced in the EEA.
> Likely they are just filtering out all data produced in the EEA.
Likely they are just hoping to not get caught and/or consider it cost of doing business. GDPR has truly shown us (as if we didn't already know) that compliance must be enforced.
We really need to start using self-hosted solutions. Like matrix / element for team messaging.
It's ok not wanting to run your own hardware at your own premises. But the solution is to run a solution that is end-to-end encrypted so that the hosting service cannot get at the data. cryptpad.fr is another great piece of software.
You could check out [Campfire](https://once.com/campfire). You get the source code (Ruby on Rails) and you deploy it wherever. We're running ours on a DigitalOcean droplet.
The incentive for first-party tool providers to do this is going to be huge, whether it's Slack, Google, Microsoft, or really any other SaaS tool. Ultimately, if businesses want to avoid getting commoditized by their vendors, they need to be in control of their data and their AI strategy. And that probably means turning off all of these small-utility-very-expensive-and-might-ruin-your-business features, and actually creating a centralized, access-controlled, well-governed knowledge base into which you can plug any open source or black box LLM, from any provider.
All your files? No way that cozy of a blanket statement can be true. If you kept cycling in drives full of /dev/random, you could fill up M$ servers with petabytes of junk? Sounds like an appealing weekend project.
a fun little internal only stock picker tool, that suggests to us some fun stocks to buy based on the anonymised real time inner monologue of hundreds of thousands of tech companies
Tokens (outside of a few trillion) are worthless imo. I think OAI has pushed that limit; let the others chase them with billions into the ocean of useless conversational data and drown.
>Our mission is to build a product that makes work life simpler, more pleasant and more productive.
I know it would be impossible but I wish we go back to the days when we didn't have Slack (or tools alike). Our Slack is a cesspool of people complaining, talking behind other people's backs, echo chamber of negativity etc.
That probably speaks more to the overall culture of the company, but Slack certainly doesn't help. You can also say "the tool is not the problem, people are" - sure, we can always explain things away, but Slack certainly plays a role here.
Your company sucks. I’ve used slack at four workplaces and it’s not been at all like that. A previous company had mailing lists and they were toxic as you describe. The tool was not the issue.
Yeah, written communication is harder than in-person communication.
It’s easy to come across poorly in writing, but that issue has no easy resolution unless you’re prepared to ban Slack, email, and any other text-based communication system between employees.
Slack can sometimes be a place for people who don’t feel heard in conventional spaces to vent — but that’s an organisational problem, not a Slack problem.
HN isn't really a bastion of media literacy or tech criticism. If you ever ask "does [some technology] affect [something qualitative] about [anything]", the response on hn is always going to be "technology isn't responsible, it's how the technology is used that is responsible!", asserting, over and over again, that technology is always neutral.
The idea that the mechanism of how people communicate affects what people communicate is a pretty foundational concept in media studies (a topic which is generally met with a hostile audience on HN). Slack almost certainly does play a role, but people who work in technology are incentivized to believe that technology does not affect people's behaviors, because that belief lets them be free of any and all qualitative or moral judgements; the assertion that technology does not play a role is something technology workers cling to because it absolves them of all guilt and makes them, above all else, innocent in every situation.
On the specific concept of a medium of communication affecting what is being communicated, McLuhan took these ideas to such an extreme that it's almost ludicrous, but he still had some pretty interesting observations worth thinking on, and his writing on this topic is some of the earliest work. It's generally the place people look first, because much of the other work assumes you've understood McLuhan in advance. https://en.wikipedia.org/wiki/Understanding_Media
No, I don’t think Slack does play a role in this. It is quite literally a communication tool (and I’d argue one that encourages far _more_ open communication than others).
If Slack is a cesspool, that’s because your company culture is a cesspool.
> That probably speaks more to the overall culture of the company
Yep. Fun fact, my last workplace had a fairly nontoxic Slack... but there was a whole second Slack dedicated to bitching and shitposting where the bosses weren't invited. Humans gonna human.
I disagree that Slack plays a role. You only mentioned human aspects, nothing to do with the technology. There was always going to be instant-messaging software once computers and networks were invented. You'd just say this happens over email and blame email.
To add some nuance to this conversation, what they are using this for is Channel recommendations, Search results, Autocomplete, and Emoji suggestion and the model(s) they train are specific to your workspace (not shared between workspaces). All of which seem like they could be handled fairly privately using some sort of vector (embeddings) search.
I am not defending Slack, and I can think of number of cases where training on slack messages could go very badly (ie, exposing private conversations, data leakage between workspaces, etc), but I think it helps to understand the context before reacting. Personally, I do think we need better controls over how our data is used and slack should be able to do better than "Email us to opt out".
> the model(s) they train are specific to your workspace (not shared between workspaces)
That's incorrect -- they're stating that they use your "messages, content, and files" to train "global models" that are used across workspaces.
They're also stating that they ensure no private information can leak from workspace to workspace in this way. It's up to you if you're comfortable with that.
From the wording, it sounds like they are conscious of the potential for data leakage and have taken steps to avoid it. It really depends on how they are applying AI/ML. It can be done in a private way if you are thoughtful about how you do it. For example:
Their channel recommendations:
"We use external models (not trained on Slack messages) to evaluate topic similarity, outputting numerical scores. Our global model only makes recommendations based on these numerical scores and non-Customer Data"
Meaning they use a non-slack trained model to generate embeddings for search. Then they apply a recommender system (which is mostly ML not an LLM). This sounds like it can be kept private.
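Read that way, the shape of the pipeline might be something like the sketch below; the toy features, weights, and function names are mine, not Slack's, and are only meant to show that the globally trained part never touches message text.

```python
def external_embed(text: str) -> list[float]:
    """Placeholder for an off-the-shelf embedding model that was NOT trained on
    Slack messages; customer text stops at this boundary."""
    return [float(len(text) % 7), float(text.count(" ") + 1)]  # toy features

def topic_similarity(a: str, b: str) -> float:
    ea, eb = external_embed(a), external_embed(b)
    dot = sum(x * y for x, y in zip(ea, eb))
    norm = (sum(x * x for x in ea) * sum(x * x for x in eb)) ** 0.5
    return dot / norm if norm else 0.0

def global_recommender(similarity: float, member_count: int,
                       joined_rate: float) -> bool:
    """The shared, globally trained model only ever sees numerical scores and
    non-Customer Data such as channel size and past join rates."""
    score = 2.0 * similarity + 0.5 * joined_rate + 0.001 * member_count
    return score > 1.5  # weights/threshold would be learned from join signals

sim = topic_similarity("kubernetes oncall runbook", "k8s incident postmortems")
print(global_recommender(sim, member_count=120, joined_rate=0.4))
```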
Search results:
"We do this based on historical search results and previous engagements without learning from the underlying text of the search query, result, or proxy"
Again, this is probably a combination of non-slack trained embeddings with machine learning algos based on engagement. This sounds like it can be kept private and team specific.
autocomplete:
"These suggestions are local and sourced from common public message phrases in the user’s workspace."
I would be concerned about private messages being leaked via autocomplete, but if it's based on public messages specific to your team, that should be ok?
Emoji suggestions:
"using the content and sentiment of the message, the historic usage of the emoji [in your team]"
Again, it sounds like they are using models for sentiment analysis (which they probably didn't train themselves and even if they did, don't really leak any training data) and some ML or other algos to pick common emojis specific to your team.
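A minimal sketch of that split, with a stand-in sentiment classifier and per-workspace emoji counts (all of it illustrative; it is not Slack's implementation):

```python
from collections import Counter

def classify_sentiment(message: str) -> str:
    """Stand-in for an external model (not trained on Slack messages)."""
    positive_words = {"congrats", "shipped", "great", "thanks"}
    return "positive" if any(w in message.lower() for w in positive_words) else "neutral"

# Per-workspace counts of which emoji historically reacted to which sentiment.
workspace_emoji_stats: dict[str, Counter] = {
    "positive": Counter({":tada:": 42, ":raised_hands:": 17}),
    "neutral": Counter({":thumbsup:": 30, ":eyes:": 9}),
}

def suggest_emoji(message: str) -> str:
    sentiment = classify_sentiment(message)            # message text stays local
    counts = workspace_emoji_stats.get(sentiment, Counter())
    return counts.most_common(1)[0][0] if counts else ":thumbsup:"

print(suggest_emoji("We shipped the release, congrats team!"))  # likely :tada:
```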
To me these are all standard applications of NLP / ML that have been around for a long time.
The way it's written means this just isn't the case. They _MAY_ use it for what you have mentioned above. They explicitly say "...here are a few examples of improvements..." and "How Slack may use Customer Data" (emph mine). They also... may not? And use it for completely different things that can expose who knows what via prompt hacking.
Agreed, and that is my concern as well that if people get too comfortable with it then companies will keep pushing the bounds of what is acceptable. We will need companies to be transparent about ALL the things they are using our data for.
We will only see more of this as time goes on, and there’s so little impetus for companies to do anything that respects privacy when this data is power and freely accessible by virtue of owning the servers. If you haven’t read Technofeudalism, the tl;dr is that what we once saw as tools for our entertainment or convenience now own our inputs and use us as free labour, and operate at scales where we have little to no power to protect our interests. For a quick summary: https://www.wired.com/story/yanis-varoufakis-technofeudalism...
> If you want to exclude your Customer Data from helping train Slack global models, you can opt out.
I don't understand how both these statements can be true. If they are using your data to train models used across workspaces then it WILL leak. If they aren't then why do they need an opt out?
Edit: reading through the examples of AI use at the bottom of the page (search results, emoji suggestions, autocomplete), my guess is this policy was put in place a decade ago and doesn't have anything to do with LLMs.
They're saying they won't train generative models that will literally regurgitate your text; my guess is classifiers are fair game in their interpretation.
You are assuming they're saying that, because it's one charitable interpretation of what they're saying.
But they haven't actually said that. It also happens that people say things based on faulty or disputed beliefs of their own, or people willfully misrepresent things, etc.
Until they actually do say something as explicit as what you suggest, they haven't said anything of the sort.
> Data will not leak across workspaces. For any model that will be used broadly across all of our customers, we do not build or train these models in such a way that they could learn, memorize, or be able to reproduce some part of Customer Data.
I feel like that is explicitly what this is saying.
The problem is, it's really really hard to guarantee that.
Yes, if they only train, say, classifiers, then the only thing that can leak is the classification outcome. But these things can be super subtle. Even a classifier could leak things if you can hack the context fed into it. They are really playing with fire here.
It is also hard to guarantee that, in a multi-tenant application, users will never see other users' data due to causes like mistakes in AuthZ logic, caching gone awry, or other unpredictable situations that come up in distributed systems—yet even before the AI craze we were all happy to use these SaaS products anyway. Maybe this class of vulnerability is indeed harder to tame than most, but third-party software has never been without risks.
The OP privacy policy explicitly states that autocompletion algorithms are part of the scope. "Our algorithm that picks from potential suggestions is trained globally on previously suggested and accepted completions."
And this can leak: for instance, typing "a good business partner for foobars is" might not send that text upstream per se, but would be consulting a local model whose training data would have contained conversations that other Slack users are having about brands that provide foobars. How can Slack guarantee that the model won't incorporate proprietary insights on sourcing the best foobar producers into its choice of the next token? And sure, one could build an adversarial model that attempts to minimize this kind of leakage, but is Slack incentivized to create such a thing vs. just building an optimal autocomplete as quickly as possible?
Even if it were just creating classifiers, similar leakages could occur there, albeit requiring more effort and time from attackers to extract actionable data.
I can't blame Slack for wanting to improve their product, but I'd also encourage any users with proprietary conversations to encourage their admins to opt out as soon as possible.
> How can Slack guarantee that the model won't incorporate proprietary insights on sourcing the best foobar producers into its choice of the next token?
This is explained literally in the next sentence after the one you quoted: "We protect data privacy by using rules to score the similarity between the typed text and suggestion in various ways, including only using the numerical scores and counts of past interactions in the algorithm."
If all the global model sees is {similarity: 0.93, past_interactions: 6, recommendation_accepted: true} then there is no way to leak tokens, because not only are the tokens not part of the output, they're not even part of the input. But such a simple model could still be very useful for sorting the best autocomplete result to the top.
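If that description holds, the globally trained piece reduces to a tiny model over those three numbers. A toy sketch (mine, not Slack's) of fitting such a ranker with plain logistic regression:

```python
import math

# Each training row is (similarity, past_interactions, accepted) -- no tokens.
history = [
    (0.93, 6, True),
    (0.41, 1, False),
    (0.78, 3, True),
    (0.15, 0, False),
]

def predict(w_sim: float, w_int: float, bias: float,
            similarity: float, interactions: float) -> float:
    z = w_sim * similarity + w_int * interactions + bias
    return 1.0 / (1.0 + math.exp(-z))

# Crude gradient descent on a 3-parameter logistic ranker over the scores.
w_sim = w_int = bias = 0.0
for _ in range(2000):
    for sim, inter, accepted in history:
        p = predict(w_sim, w_int, bias, sim, inter)
        err = p - (1.0 if accepted else 0.0)
        w_sim -= 0.1 * err * sim
        w_int -= 0.1 * err * inter
        bias -= 0.1 * err

# Ranking new suggestions uses only the learned weights plus fresh scores;
# there is no pathway for message text to surface in the output.
print(round(predict(w_sim, w_int, bias, 0.9, 5), 3))
```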
Yeah, I absolutely agree that even classifiers can leak, and the autocorrect thing sounds like I was wrong about generative (it sounds like an n-gram setup?)... although they also say they don't train LLMs (what is an n-gram? Still an LM, just not large... I guess?)
This reminds me of a company called C3.ai which claims in its advertising to eliminate hallucinations using any LLM. OpenAI, Mistral, and others at the forefront of this field can't manage this, but a wrapper can?? Hmm...
Right? I take Slack and Salesforce at their word. They’re good companies and look out for the best interests of their customers. They have my complete trust.
> Emoji suggestion: Slack might suggest emoji reactions to messages using the content and sentiment of the message, the historic usage of the emoji and the frequency of use of the emoji in the team in various contexts. For instance, if [PARTY EMOJI] is a common reaction to celebratory messages in a particular channel, we will suggest that users react to new, similarly positive messages with [PARTY EMOJI].
Finally someone has figured out a sensible application for "AI". This is the future. Soon "AI" will have a similar connotation as "NFT".
"leadership" at my company tallies emoji reactions to their shitty slack messages and not reacting with emojies over a period of time is considered a slight against them.
I had to up my Slack emoji game after joining my current employer.
> To do this while protecting Customer Data, we might use an external model (not trained on Slack messages) to classify the sentiment of the message. Our model would then suggest an emoji only considering the frequency with which a particular emoji has been associated with messages of that sentiment in that workspace.
This is so stupid and needlessly complicated. And all it does is remove personality from messages, suggesting everyone conforms to the same reactions.
Finally. I am all for this AI if it is going to learn and suggest my passive-aggressive "here" emoji that I use when someone @here's a public channel with hundreds of people for no good reason.
I would expect this from a free service, but from a paid service with non-trivial cost... It seems insane... Maybe the whole model of doing business is broken...
So if you want to opt out, there's no setting to switch, you need to send an email with a specific subject:
> Contact us to opt out. [...] To opt out, please have your Org or Workspace Owners or Primary Owner contact our Customer Experience team at feedback@slack.com with your Workspace/Org URL and the subject line “Slack Global model opt-out request.” [...]
Hm. Is this on iOS or Android, and what version? This is the first I've heard of this; it should be rock solid. I'm wondering if you're stuck on an ancient version or something.
I think I can count the number of crashes I've had with Element X iOS on the fingers of one hand since I started dogfooding it last summer (whereas classic Element iOS would crash every few days). Is this on Android or iOS? Was it a recent build? If so, which one? It really shouldn't be crashing.
What happened to "when you're not paying, you're not the customer, you're the product"? Here people are clearly paying, and yet, slowly, that is being eroded as well.
> Contact us to opt out. If you want to exclude your Customer Data from Slack global models, you can opt out. To opt out, please have your org, workspace owners or primary owner contact our Customer Experience team at feedback@slack.com
Sounds like an invitation for malicious compliance. Anyone can email them a huge text with the workspace buried somewhere, and they have to decipher it somehow.
Example [Answer is Org-12-Wp]:
"
FORMAL DIRECTIVE AND BINDING COVENANT
WHEREAS, the Parties to this Formal Directive and Binding Covenant, to wit: [Your Name] (hereinafter referred to as "Principal") and [AI Company Name] (hereinafter referred to as "Technological Partner"), wish to enter into a binding agreement regarding certain parameters for the training of an artificial intelligence system;
AND WHEREAS, the Principal maintains control and discretion over certain proprietary data repositories constituting segmented information habitats;
AND WHEREAS, the Principal desires to exempt one such segmented information habitat, namely the combined loci identified as "Org", the region denoted as "12", and the territory designated "Wp", from inclusion in the training data utilized by the Technological Partner for machine learning purposes;
NOW, THEREFORE, in consideration of the mutual covenants and promises contained herein, the receipt and sufficiency of which are hereby acknowledged, the Parties agree as follows:
DEFINITIONS
1.1 "Restricted Information Habitat" shall refer to the proprietary data repository identified by the Principal as the conjoined loci of "Org", the region "12", and the territory "Wp".
OBLIGATIONS OF TECHNOLOGICAL PARTNER
2.1 The Technological Partner shall implement all reasonably necessary technical and organizational measures to ensure that the Restricted Information Habitat, as defined herein, is excluded from any training data sets utilized for machine learning model development and/or refinement.
2.2 The Technological Partner shall maintain an auditable record of compliance with the provisions of this Formal Directive and Binding Covenant, said record being subject to inspection by the Principal upon reasonable notice.
REMEDIES
3.1 In the event of a material breach...
[Additional legalese]
IN WITNESS WHEREOF, the Parties have executed this Formal Directive and Binding Covenant."
Contact us to opt out. If you want to exclude your Customer Data from Slack global models, you can opt out. To opt out, please have your Org or Workspace Owners or Primary Owner contact our Customer Experience team at feedback@slack.com with your Workspace/Org URL and the subject line “Slack Global model opt-out request.” We will process your request and respond once the opt out has been completed.
That includes a period inside the quotes, which would suggest the period is part of the subject line.
There's some shenanigans going on here, of course. The section I was looking at says...
To opt out, please have your org, workspace owners or primary owner contact our Customer Experience team at feedback@slack.com with your workspace/org URL and the subject line ‘Slack global model opt-out request’.
I don't think this is shenanigans, just inconsistent use of English rules. I'd be floored if a judge ruled the period was pertinent here; there's no linguistic value or meaning to it (i.e. the meaning of the message is the same with or without it).
I'd be very surprised to hear that Slack was allowed to propose a free-form communication method, impose rigid rules on it, and then deny anyone who fails to meet those rigid rules. If Slack was worried about having to read all of these emails, they should have made a PDF form or just put it on a toggle in the Workspace settings. This is a self-inflicted problem.
For reference, I got an email back saying I was opted out and a bunch of justifications about why it's okay what they did and zero mention of the legality of opting people in by default.
This is, once again, why I wanted us to go to self-hosted Mattermost instead of Slack. I recognize Slack is probably the better product (or mostly better), but you have to own your data.
> We offer Customers a choice around these practices. If you want to exclude your Customer Data from helping train Slack global models, you can opt out. If you opt out, Customer Data on your workspace will only be used to improve the experience on your own workspace and you will still enjoy all of the benefits of our globally trained AI/ML models without contributing to the underlying models.
Sick and tired of this default-opt-in, explicit-opt-out legalese.
This feels like a corporate greed play on what should be a relatively simple chat application. Slack has quickly become just another enterprise solution in search of shareholder value at the expense of data privacy. The need to regulate these companies should be apparent to people, but sadly it is not.
Wow. I understand this from freemium business models, but for a premium-priced B2B product? This feels like an incredible rug pull. This changes things for me.
"We protect privacy while doing so by separating our model from Customer Data. We use external models (not trained on Slack messages) to evaluate topic similarity, outputting numerical scores. Our global model only makes recommendations based on these numerical scores and non-Customer Data."
I think this deserves more attention. For many tasks like contextual recommendations, you can get most of the way by using an off-the-shelf model, but then you get a floating-point output and need to translate it into a binary "show this to the user, yes or no?" decision. That could be a simple thresholding model "score > θ", but that single parameter still needs to be trained somehow.
I wonder how many trainable parameters people objecting to Slack's training policy would be willing to accept.
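For scale, fitting that single threshold θ is about as small as "training on usage data" gets. A toy sketch (the scores and the click signal are invented; this is the generic idea, not Slack's pipeline):

    # Toy example: learn a decision threshold from logged outcomes.
    # scores: similarity scores from an external, pre-trained model
    # clicked: whether the user actually engaged with the recommendation
    scores  = [0.91, 0.15, 0.66, 0.42, 0.88, 0.30, 0.75, 0.05]
    clicked = [1,    0,    1,    0,    1,    0,    0,    0]

    def accuracy_at(theta):
        preds = [1 if s > theta else 0 for s in scores]
        return sum(p == c for p, c in zip(preds, clicked)) / len(clicked)

    # Grid-search the single trainable parameter.
    best_theta = max((t / 100 for t in range(101)), key=accuracy_at)
    print(best_theta, accuracy_at(best_theta))

Even that one parameter is, strictly speaking, fit to user behavior.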
Not to be glib, but this is why we built Tonic Textual (www.tonic.ai/textual). It's both very challenging and very important to protect data in training workflows. We designed Textual to make it easy both to redact sensitive data and to replace it with contextually relevant synthetic data.
To add on to this: it should be mentioned that Slack says they'll prevent data leakage across workspaces in their model, but they don't explain how. They don't go into any detail about their data safeguards or how they're excluding sensitive info from training. Textual is good for this purpose since it redacts PII, preventing it from being leaked by the trained model.
How do you handle proprietary data being leaked? Sure you can easily detect and redact names and phone numbers and addresses, but without significant context it seems difficult to detect whether "11 spices - mix with 2 cups of white flour ... 2/3 teaspoons of salt, 1/2 teaspoons of thyme [...]" is just a normal public recipe or a trade secret kept closely guarded for 70 years
Fair question, but you have to consider the realistic alternatives. For most of our customers inaction isn't an option. The combination of NER models + synthesis LLMs actually handles these types of cases fairly well. I put your comment into our web app and this was the output:
How do you handle proprietary data being leaked? Sure you can easily detect and redact names and phone numbers and addresses, but without significant context it seems difficult to detect whether "17 spices - mix with 2lbs of white flour ... half teaspoon of salt, 1 tablespoon of thyme [...]" is just a normal public recipe or a trade secret kept closely guarded for 75 years.
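A stripped-down illustration of the NER-plus-replacement idea, using spaCy's off-the-shelf NER rather than Tonic's actual models (the entity labels and fake values are placeholders). Note that, as the parent comment suggests, anything the NER model doesn't recognize as an entity passes through untouched:

    import random
    import spacy  # assumes: pip install spacy && python -m spacy download en_core_web_sm

    nlp = spacy.load("en_core_web_sm")

    # Hypothetical synthetic replacements per entity type.
    FAKES = {
        "PERSON":   ["Alex Doe", "Sam Lee"],
        "ORG":      ["Acme Corp", "Globex"],
        "CARDINAL": ["42", "7"],
        "DATE":     ["last Tuesday", "1999"],
    }

    def redact_and_synthesize(text: str) -> str:
        doc = nlp(text)
        out, last = [], 0
        for ent in doc.ents:
            out.append(text[last:ent.start_char])
            out.append(random.choice(FAKES.get(ent.label_, ["[REDACTED]"])))
            last = ent.end_char
        out.append(text[last:])
        return "".join(out)

    print(redact_and_synthesize("Call Jane Smith at KFC about the 11 spices before Friday."))

The hard part is exactly what the parent raises: a trade-secret recipe that contains no named entities mostly survives this kind of pass.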
Whatever the models used, or the type of data within accounts this operates on, this clause would be redlined in most of the big customer accounts that have leverage during the sales/renewal process. Small to medium accounts will be supplying most of this data.
I wonder how many people who are really mad about these guys or SE using their professional output to train models thought commercial artists were just being whiny sore losers when Deviant Art, Adobe, OpenAI, Stability, et al. did it to them.
squarely in the former camp. there's something deeply abhorrent about creating a place that encourages people to share and build and collaborate, then turning around and using their creative output to put more money in shareholder pockets.
i deleted my reddit and github accounts when they decided the millions of dollars per month they're receiving from their users wasn't enough. don't have the power to move our shop off slack but rest assured many will as a result of this announcement.
Yeah, I haven't put a new codebase on GH in years. It's kind of a PITA hosting my own Gitea server for personal projects, but letting MS copy my work to help make my professional skillset less valuable is far less palatable.
Companies doing this would make me much less angry if they used an opt-in model only for future data. I didn't have a crystal ball and I don't have a time machine, so I simply can't stop these companies from using my work for their gain.
Compared to hosting other things? Nothing! It's great.
Hosting my own service rather than using a free SaaS solution that is entirely someone else's problem? There's a significant difference there. I've been running Linux servers either professionally or personally for almost 25 years, so it's not like it's a giant problem... but my work has been increasingly non-technical over the past 5 years or so, so even minor hiccups require re-acclimating myself to the requisite constructs and tools (wait, how do cron time patterns work? How do I test a variable in bash for this one-liner? How do iptables rules work again?)
It's not a deal breaker, but given the context, it's definitely not not a pain in the ass, either.
Thanks for elaborating! I'm a retired grunt and tech is just a hobby for me. I host my own Gitea with the same challenges, but to me looking up cron patterns etc. is the norm, not the exception, so I don't think much about it.
“Yes,” said Arthur, “yes I did. It was on display in the bottom of a locked filing cabinet stuck in a disused lavatory with a sign on the door saying ‘Beware of the Leopard.’”
How does one technically opt-out after model training is completed? You can't exactly go into the model and "erase" parts of the corpus post-hoc.
Like when you send an email to feedback@slack.com with that perfect subject line (jeez, really?), what exactly does the customer support rep do on their end to opt you out?
Now is definitely the time to get/stay loud. If it dies down, the precedent has been set.
It’s very difficult to ensure no data leakage, even for something like an Emoji prediction model. If you can try a large number of inputs and see what the suggested emoji is, that’s going to give you information about the training set. I wouldn’t be surprised to see a trading company or two pop up trying to exploit this to get insider information.
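To make that concrete: a leakage probe doesn't need to recover text verbatim, it can just compare the model's confidence on near-identical candidate phrases. A toy illustration with a fake stand-in model (no real Slack endpoint is involved):

    # Toy demonstration of why even an emoji-suggestion model can leak information.
    # The "model" here is a fake stand-in trained on a tiny corpus; the point is
    # only that confidence differs between seen and unseen phrases.

    TRAINING_CORPUS = {"the examplecorp acquisition closes friday"}  # pretend secret

    def suggest_emoji(text):
        # Hypothetical model: higher confidence when the input resembles training data.
        seen = text.lower() in TRAINING_CORPUS
        return ("🎉", 0.93) if seen else ("🎉", 0.41)

    CANDIDATES = [
        "The ExampleCorp acquisition closes Friday",
        "The ExampleCorp acquisition is cancelled",
    ]

    for text in CANDIDATES:
        emoji, conf = suggest_emoji(text)
        print(f"{conf:.2f}  {emoji}  {text}")
    # The confidence gap between near-identical candidates is the leak.

Membership-inference attacks on real models are noisier than this, but the principle is the same.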
Indeed. I flagged this internally the first thing this morning, and fully expect to field questions from our clients over the next couple of weeks as the news percolates through to upper echelons.
I can tolerate Slack and/or Salesforce at large building per-customer overlays on top of a generic LLM. Those at least can provide actual business value[ß], and give their AI teams something to experiment on. But feeding gazillion companies' internal (and workspace-joined!) chats to a global model? Hell no.
Unsurprisingly, we opted out a few hours ago.
ß: a smart, context-aware autocomplete for those who need to type a lot on their phones would not be a bad idea. The current generation of autocorrupt is obnoxious.
IMHO people should create "googlebomb" channels to mess with their training, maybe with the goal to get their autocomplete to offer offensive or nonsensical suggestions.
> we do not build or train these models in such a way that they could learn, memorise, or be able to reproduce some part of Customer Data
They don't "build" them this way (whatever that means) but if training data is somehow leaked, they're off the hook because they didn't build it that way?
While Slack emphasizes that customers own their data, the default of Customer Data being used to train AI/ML models (even if aggregated and disassociated) may not align with all customers' expectations of data ownership and control.
> Contact us to opt out. If you want to exclude your Customer Data from Slack global models, you can opt out. To opt out, please have your Org or Workspace Owners or Primary Owner contact our Customer Experience team at feedback@slack.com with your Workspace/Org URL and the subject line “Slack Global model opt-out request.” We will process your request and respond once the opt out has been completed.
This is not ok. We didn't have to reach out by email to sign up; this should be a toggle in the UI. This is deliberately high friction.
It seems like we've entered an era where not only are you paying for software with money, you're also paying for software with your data, privacy implications be damned. I would love to see people picking f/oss instead.
1. Great UX folks almost never work for free. So the UX of nearly all OSS is awful.
2. Great software comes from a close connection to users. That works when your software is an OS kernel that programmers use themselves, but how many OSS folks want to spend their free time on Zoom talking to hundreds of businesses and understanding their needs, just to give them free software?
The good news for FOSS is that the UX of most commercial software is also awful and generally getting worse. The bad news is that FOSS software is copying a lot of the same UX trends.
This is as systemically concerning as the data practices seen on Discord with integrations like statbot.net, though at least Slack is being transparent about it. Regardless, I find all of this highly problematic.
Discord's TOS used to say "we may sell all your convos, including your private ones". Then some time later, they suddenly changed it to noooo, we would never sell aaanything, and didn't even update the "last changed" date. I deleted my Discord account and stopped using them immediately after I noticed the TOS, but them sneakily trying to cover it up later completely ruined any lingering trust I might have had in them.
And this is just one of many, many problems associated with the platform.
I can’t believe Slack added a bunch of AI features, without having admins opt into enabling them, and then put out a policy that requests that you send an email to have an opt out from your data being used for training. All of this should be opt-in and should respect the administrator’s prerogative. Very irresponsible for Salesforce (parent company) and I’ll be reconsidering if we continue using them, if this is the low trust way in which they will operate. We don’t have time to keep policing these things.
> What Firefox’s search data collection means for you
> We understand that any new data collection might spark some questions. Simply put, this new method only categorizes the websites that show up in your searches — not the specifics of what you’re personally looking up.
> Sensitive topics, like searching for particular health care services, are categorized only under broad terms like health or society. Your search activities are handled with the same level of confidentiality as all other data regardless of any local laws surrounding certain health services.
> Remember, you can always opt out of sending any technical or usage data to Firefox. Here’s a step-by-step guide on how to adjust your settings. We also don’t collect category data when you use Private Browsing mode on Firefox.
> As far as user experience goes, you won’t see any visible changes in your browsing. Our new approach to data will just enable us to better refine our product features and offerings in ways that matter to you.
Can we organize some sort of boycott somehow, somewhere? Is something possible in court?
This is not just some analytics data or email names, this is potential leakage of secrets and private conversations for thousands of companies and millions of individuals.
Slack has been using customer data for ML for years. Look at their search feature - it uses learning to rank, a machine learning approach that learns from content, clicks, etc.
It sounds like the worry is that this overfit generative AI will spew out some private input verbatim… which I can see happening, honestly. Look at GitHub Copilot; it's almost a copy-paste machine.
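On the learning-to-rank point: the generic recipe is to turn click logs into labels and fit a ranking model over (query, result) features. A rough pointwise sketch (the features are invented; Slack's actual system isn't public):

    # Pointwise learning-to-rank sketch: click logs become training labels.
    from sklearn.linear_model import LogisticRegression

    # Hypothetical features per (query, result) pair:
    # [term_overlap, recency_days, same_channel, author_is_teammate]
    X = [
        [0.9,  1, 1, 1],
        [0.2, 30, 0, 0],
        [0.7,  3, 1, 0],
        [0.1, 90, 0, 1],
        [0.8,  2, 0, 1],
        [0.3, 45, 1, 0],
    ]
    y = [1, 0, 1, 0, 1, 0]  # 1 = the user clicked this result

    model = LogisticRegression().fit(X, y)

    # At query time, results are sorted by predicted click probability.
    print(model.predict_proba([[0.6, 5, 1, 1]])[0][1])

Even this non-generative setup is "training on customer usage data"; the difference with a generative model is that the failure mode becomes regurgitating content rather than just ranking it oddly.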
I honestly don't know what special kind of idiot OKd this project. In any finance org, or finance dept of public cos, these chats will include material non public information. If such information leaks across internal Chinese walls or externally, Slack opens itself up to customer litigation and SEC enforcement actions.
> To develop AI/ML models, our systems analyze Customer Data (e.g. messages, content, and files) submitted to Slack as well as Other Information (including usage information)
> We have technical controls in place to prevent access. When developing AI/ML models or otherwise analyzing Customer Data, Slack can’t access the underlying content
> If you want to exclude your Customer Data from helping train Slack global models, you can opt out.
Discord is not a better option than Slack. They are basically the same thing. Matrix is a better option from a privacy standpoint, just not from a UX one.
Not this. It's a pretty crazy policy. As far as I know no other major tech company (Google, Microsoft, Amazon, etc.) uses their customers' private data to train public AI because that would be suicidal.
I think you are ignoring the most fundamental fact of the Tech Industry. Regardless of what they may say, they break laws, regulations and societal norms in the name of disruption, and have a tendency to lobby to have their abhorrent behavior legalized after the fact.
If one is honest about it, one can plainly see that the outcomes of this kind of behavior are hardly ever negative for tech. "Suicidal" may seem like the just outcome, but it is hardly in alignment with reality.
Let's wait and see. I imagine Slack is going to backpedal on this pretty quickly.
> Regardless of what they may say, they break laws, regulations and societal norms in the name of disruption, and have a tendency to lobby to have their abhorrent behavior legalized after the fact.
This isn't illegal though. Or at least not clearly illegal. It's suicidal because customers are going to absolutely hate it, not because it's illegal.
My company is moving from Google to Microsoft over imaginary security concerns (despite the fact that Microsoft has been hacked far more than Google!). Imagine if there are real deliberate issues like this.
Operating unlicensed taxis and hotels was clearly illegal at the time Uber and Airbnb came along. It was subsequently legal-washed via lobbying efforts. If one looks closely enough and beyond personal bias, one can see it in numerous companies. For example, all the social media platforms have participated in censorship-by-proxy programs with the US Federal Government. It seemed for a time that X was beyond that, but there are now occurrences that suggest they are on board too.
In the case of Slack, they informed everyone of their intent with the recent TOS update. They’ll cut their biggest customers a discount in the name of revenue sharing and this will all go away. Because nobody but you and I and about 200 other people even care about this.
Wasn’t there a big discussion on HN when Slack changed their privacy policy to say that they will steal all private conversations and use them for AI and some such?
Ha, the perfect embodiment of how capitalism works. Use public, or in this case private, data to build models for profit. Even in a paid service where your data is supposed to be private, you are not in control of how your data is used.
Imagine thinking content you post online is NOT used to train AI data, in 2024. Seriously, just imagine for a second being that befuddled and out of touch.
Well I really hope this massively blows up in their face when all of Europe goes to work just about now, and then North America in 5-8 hours. Let's see if we have another Helldivers 2 event that makes them do a hard backpedal after losing thousands of large customers that will not under any circumstances take the chance.
I have a friend with a law firm who just called me yesterday for advice, as he's thinking about switching to Slack from Teams. I gave him a glowing recommendation because it is literally night and day, but there is no way in hell he takes any chance of sensitive legal discussions leaking out through prompt hacking. He might even be liable himself for knowingly using a tool that spells out "we read and reuse your conversations".
Out of those, Mattermost was the easiest to set up (it just needs PostgreSQL and a web server, in addition to the main container), though not being able to permanently delete workspaces, rather than just archiving them, was awkward. Nextcloud Talk was very easy to get going if you already have Nextcloud but felt a bit barebones last I checked, whereas Rocket.Chat was overall the more pleasant option to use, although I wasn't the biggest fan of it using MongoDB for storage.
The user experience is pretty good with all of them, however in the groups that I've been a part of, ultimately nobody cared about self-hosting an instance, since most orgs just prefer Teams/Slack (or even Skype for just chatting/meetings) and most informal groups just default to Discord. Oh well.
The problem is not technical, but social with these platforms.
i.e. How do you convince 40+ people from 5 countries to add yet another memory resident chat application and fragment their knowledge to another app/mental space?
This gets way harder as the community becomes more dynamic and temporary (i.e. high circulation, like students). I fought the good fight last year with someone, and they didn't budge a nanometer, citing that Slack's ergonomics are way better than the alternatives, and they didn't care about data mining (it was a possibility back then) or about older messages being held for ransom.
> i.e. How do you convince 40+ people from 5 countries to add yet another memory resident chat application and fragment their knowledge to another app/mental space?
If it's a company, you can just be like: "Hey, we use this platform for communication, you can log in with your Active Directory credentials."
It also has the added benefit of acting as a directory for every employee in the company, so getting in touch can be more convenient than e-mail (while you can also customize the notification preferences, so it doesn't get too spammy), as opposed to the situation which might develop, where some teams or org units are on Slack, others on Teams and getting in touch can be more messy.
If it's a free-form social group, then you can throw that idea away because of network effects; it'd be an uphill battle. It's the same as how people sometimes complain about communities using Discord, but the reality is that old-school forums and such were also killed off, since most people already have a Discord account and there's less friction to just use that.
Either way, I'm happy that self-hosted software like that exists.
That's a big if, and the answer is "No" in my case. If it was, that comment wouldn't be there.
It's not a "social group" either, but a group of independent institutions working together. It's like a large gear-train. A lot of connections between small islands of people. So you have to work together, and have to find a way somehow. So, it's complicated.
> Either way, I'm happy that self-hosted software like that exists.
Me too. I happen to manage a Nextcloud instance, but nobody is interested in the "Talk" module.
> But you can opt out, right? So what’s the problem?
This thinking is the problem. "Oh, we just added your entire private/privileged/NDA/corporate information to our training set without your consent. What's the problem?"
Opt-out must be the default.
Edit: By "Opt-out must be the default." I mean: no one's data must be included until they explicitly give consent via an opt-in :)
Especially since once it has been trained, it is in the model, and I am not aware of any way anyone has discovered to later remove single or selected training data points from the model, short of re-training it. So basically the crime might already be done.
But I also know that so many businesses are too sluggish to make a switch and employees incapable of understanding the risk. So unfortunately not all of Europe will switch away. But I hope a significant number gives them the middle finger.
There's something called "machine unlearning" being worked on to address these issues.
This doesn't mean that I support Slack or any opt-in without consent training model. On the contrary. I don't have any OpenAI/Midjourney/etc. account, and don't plan to have one.
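To make the "machine unlearning" idea slightly more concrete: one studied approach (SISA-style sharded training) trains separate sub-models on disjoint data shards and aggregates their votes, so forgetting one customer's records only requires retraining the shard that contained them. A toy sketch, with synthetic data standing in for anything real:

    # Toy SISA-style sharded training: unlearning one shard's data means
    # retraining only that shard's sub-model, not the whole ensemble.
    import numpy as np
    from sklearn.linear_model import LogisticRegression

    rng = np.random.default_rng(0)
    X = rng.normal(size=(300, 5))
    y = (X[:, 0] + X[:, 1] > 0).astype(int)

    N_SHARDS = 3
    shards = [(X[i::N_SHARDS], y[i::N_SHARDS]) for i in range(N_SHARDS)]
    models = [LogisticRegression().fit(xs, ys) for xs, ys in shards]

    def predict(x):
        # Majority vote across shard models.
        votes = [int(m.predict(x.reshape(1, -1))[0]) for m in models]
        return int(sum(votes) > len(votes) / 2)

    # "Unlearn" a workspace whose data lived in shard 1: drop its rows and
    # retrain only that sub-model.
    xs, ys = shards[1]
    keep = np.arange(len(ys)) % 10 != 0   # pretend every 10th row was theirs
    models[1] = LogisticRegression().fit(xs[keep], ys[keep])
    print(predict(X[0]))

Whether anything like this is applied to already-trained global models is exactly the open question in the opt-out discussion above.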
Worth noting: This is a legal requirement in Europe
The GDPR mandates that consent is given affirmatively, with this kind of "oh we put it in the EULA nobody reads" being explicitly called out as non-compliant.
The way to opt out is by contacting support, in an era where opt-ins and opt-outs should be handled by a toggle button.
Either they don't expect many people to wish to opt their slacks out, or they're aware of the asynchronous friction this introduces and they don't care.
I was at a VC conference last year and if I learned nothing else there, I learned how to spell "AI". Every single exhibitor just about had their signage proudly proclaiming their capabilities in this area, but one in particular struck me.
They were touting the API integrations they could offer to train their "Enterprise AI"/LLM, and among those integrations were things like M365, Slack, etc.
It struck me because of the garbage in, garbage out problem. I'd like to think that the amount of shitposting I do on Slack personally will poison that particular well of training data, but this seems to point to a larger problem to me.
LLMs don't have a concept of truth or reality, or awareness of any sort. If the training data they are fed is poorly quality-checked or unsanitized by human intelligence, the outputs will be as useless and noisy as the original data set. It feels to me that in the frothy rush to capture market buzz and VC money, this is being forgotten.
What makes you think I don't shitpost in the #engineering channel?
And heuristics don't even scratch the surface of the bigger problem where it's trained on people who aren't great at their jobs but type a lot of words on slack about circling back on KPIs.
Through regularization techniques, data augmentation, loss functions, and gradient optimization, you ensure the model focuses on meaningful patterns and reduce overfitting to noise.
It’s not obvious how any of those would do anything but better approximate the average of a noisy dataset. RLHF might help, but only if it’s not done by idiots.
Most shitposting is probably more straightforward to understand than business communication or press releases where realizing what wasn't said often carries more insight than the things that were said.
Of course training an AI model on simple, straightforward and honest data provides good results. That's the essence behind "Textbooks Are All You Need", which led to the Phi LLMs. Those are great small LLMs. But if you want your model to understand the complexity of human communication, you have to include it in your training data.
If you subscribe to the idea that to be the very best text completion engine possible you would need to have a perfect understanding of reality itself, how different humans perceive reality differently, and how they choose to communicate about this perception and their interaction with reality, themselves and other humans, then it's not unreasonable to expect that back-propagation would eventually find that optimal representation if given enough data, the right architecture and enough processing power. Or at least come somewhat close. In that paradigm there is no "bad data", only insufficient or badly balanced datasets. Just don't try doing that with a 3B parameter LLM.
The best LLMs were trained on data from the open internet, which is full of garbage. They still do a pretty good job (granted it has been fine tuned and RLHF'd, but you can do that with Slack data too)
Yes, consider an existing LLM being given “shitpost-y” messages and asking it if there is anything interesting in there. It could probably summarize it well and that could then be used for training another LLM.
This assumes everything in the training data set is accurate. Sometimes people are wrong, obtuse, sarcastic, etc. LLMs don't have any way of detecting or accounting for this, do they?
That output, then being used to train other LLMs, just creates an ouroboros of AI-generated dogshit.
LLMs are state-of-the-art at detecting sarcasm. It won't help if the data is just wrong though.
Edit: https://arxiv.org/abs/2312.03706 Human performance on this benchmark (detecting sarcasm in Reddit comments) was 0.82, while a BERT-based model scored 0.79.
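For anyone who wants to poke at this themselves, a rough sketch using an off-the-shelf irony/sarcasm classifier through the transformers pipeline API (the model id here is an assumption; swap in whichever current checkpoint you trust):

    # Rough sketch: score comments for irony/sarcasm with an off-the-shelf classifier.
    from transformers import pipeline

    # Model id is an assumption, not an endorsement; any text-classification
    # checkpoint trained for irony/sarcasm detection can be dropped in.
    clf = pipeline("text-classification", model="cardiffnlp/twitter-roberta-base-irony")

    comments = [
        "Oh great, another mandatory AI feature. Exactly what I wanted.",
        "The deploy finished without errors.",
    ]

    for c in comments:
        print(clf(c)[0], "-", c)

Detecting sarcasm is a separate problem from detecting falsehood, which is the harder issue raised above.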
The training data doesn't need to be strictly accurate. If it was, you'd just be programming a deterministic robot. The whole point is to feed it actual human language. Giving it shitposts and sarcasm is literally what makes it good. Think of it like 100 people guessing the number of marbles in a jar: average their guesses and it will be very close. The training data is the guesses, the inference is the average.
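The marbles analogy is easy to check numerically; a toy simulation (not a claim about how LLM training actually aggregates text), with the usual caveat that this only works when the errors are unbiased:

    # Wisdom-of-crowds toy: individually noisy guesses, accurate average.
    import random

    random.seed(1)
    TRUE_COUNT = 1000
    guesses = [random.gauss(TRUE_COUNT, 250) for _ in range(100)]  # noisy individuals

    print(round(sum(guesses) / len(guesses)))   # typically lands near 1000
    print(round(min(guesses)), round(max(guesses)))  # individual guesses are all over the place

If the guesses are systematically wrong rather than merely noisy, the average is confidently wrong, which is the point the replies below make.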
And yet human civilization has survived the fact that many humans are wrong, lying, delusional, etc. There is no assumption that everything in our personal training set is accurate. In fact, things work better when we explicitly reject that idea.
LLMs do not rely on 100% factually accurate inputs. Sure, you’d rather have less BS than more, but this is all statistics. Just like most people realize that flat earthers are nutty, LLMs can ingest falsehoods without reducing output quality (again, subject to statistics)
Same for Reddit or Facebook groups. There's a lot of shitposting there, but absolutely a lot of valuable information if LLMs manage to separate the wheat from the chaff.
The sheer scale of data on the long tail. Sure, the head is already a trash pile and has been for decades now, but there is plenty of non-monetized information all over the internet that is barely linked to or otherwise discoverable.
This is the crux of it, and where I'm wondering if I'm missing something. Can it, today? My understanding is it cannot discern reality from fiction, thus "hallucinations" (a misnomer because it implies awareness, which these probability models lack).
The poorly named "hallucinations" are the creation of ideas from provided prompts, ideas which are not grounded in reality. They are not a mistaken adjudication of whether a provided prompt reflects reality.
I think what you're missing is that you're assuming an LLM treats everything it "reads" as a true statement. Shitposting is almost like meta slang. I feel like that's a necessary thing for it to train on to truly understand language. I feel like people underestimate the depth LLMs can pick up on.
Along the same lines, Phi-3 is kind of a sign of what you can do if you focus only on high-quality data. It seems like while yes, quantity is very important, quality matters almost as much.
“Do not access the Services in order to build a similar or competitive product or service or copy any ideas, features, functions, or graphics of the Services;” https://slack.com/acceptable-use-policy
I wonder if OpenAI wishes they could train ChatGPT on their own corporate chat history? How ironic. Or do they not care?
They know, as well, that opt-in simply wouldn’t give them the scale they’d need for meaningful training data. They’re being very intentionally self interested and unconcerned with their customers best interests.
The gold rush for data is wild. Private companies selling us out.
- Slack
- Discord
- Reddit
- Stackoverflow
Let’s just hope this data gold rush dies out faster than the web3 craze before OpenAI reaches critical mass and gets access to government server farms.
Alphabet boys have server farms of domestic and foreign surveillance and intelligence. Exabytes of data [1]
That thread was a mischaracterization and a misunderstanding. The toggle simply exposed UI entry points to AI integrations that users could then opt to use, with consent.
The data it has is incredibly valuable if they build their own product or sell it off. You essentially have org charts of entire companies and people asking for things and getting responses and working back and forth together long term.
In terms of building agents for doing real work this could be more valuable than things like Reddit.
Meh, tbh I think these guys live in a bit of a dream world about how much their data is worth. While investors and corporate partners will rush to these companies for their data, I'm not really convinced random internet conversations are going to push anything forward but let them sell shovels. Most of the miners always go broke.
Contact us to opt out. If you want to exclude your Customer Data from Slack global models, you can opt out. To opt out, please have your Org or Workspace Owners or Primary Owner contact our Customer Experience team at feedback@slack.com with your Workspace/Org URL and the subject line “Slack Global model opt-out request.” We will process your request and respond once the opt out has been completed.
If a company opts out, do you guarantee that all information from their instance that you have already used for training is somehow completely removed from the "global models" you have used it to train?
If not, it's not really an opt-out, is it? The data remains compromised.
"Hi there,
Thank you for reaching out to Slack support. Your opt-out request has been completed.
For clarity, Slack has platform-level machine learning models for things like channel and emoji recommendations and search results. We do not build or train these models in such a way that they could learn, memorize, or be able to reproduce some part of customer data. Our published policies cover those here (https://slack.com/trust/data-management/privacy-principles), and as shared above your opt out request has been processed.
Slack AI is a separately purchased add-on that uses Large Language Models (LLMs) but does not train those LLMs on customer data. Slack AI uses LLMs hosted directly within Slack’s AWS infrastructure, so that customer data remains in-house and is not shared with any LLM provider. This ensures that Customer Data stays in that organization’s control and exclusively for that organization’s use. You can read more about how we’ve built Slack AI to be secure and private here: https://slack.engineering/how-we-built-slack-ai-to-be-secure....
Kind regards,"