Slack AI Training with Customer Data (slack.com)
784 points by mlhpdx 67 days ago | hide | past | favorite | 401 comments



I contacted support to opt out. Here is the answer.

"Hi there,

Thank you for reaching out to Slack support. Your opt-out request has been completed.

For clarity, Slack has platform-level machine learning models for things like channel and emoji recommendations and search results. We do not build or train these models in such a way that they could learn, memorize, or be able to reproduce some part of customer data. Our published policies cover those here (https://slack.com/trust/data-management/privacy-principles), and as shared above your opt out request has been processed.

Slack AI is a separately purchased add-on that uses Large Language Models (LLMs) but does not train those LLMs on customer data. Slack AI uses LLMs hosted directly within Slack’s AWS infrastructure, so that customer data remains in-house and is not shared with any LLM provider. This ensures that Customer Data stays in that organization’s control and exclusively for that organization’s use. You can read more about how we’ve built Slack AI to be secure and private here: https://slack.engineering/how-we-built-slack-ai-to-be-secure....

Kind regards, Best regards,"


So in other words: we feed our AI your data. We just don't share it with anyone else. (Wait until our next TOS update.)


Just completely disingenuous that they still pretend to care about customer privacy after rolling this out with silent-opt-in and without adding an easy opt-out option in the Slack admin panel.


Contact us to opt out. If you want to exclude your Customer Data from Slack global models, you can opt out. To opt out, please have your Org or Workspace Owners or Primary Owner contact our Customer Experience team at feedback@slack.com with your Workspace/Org URL and the subject line “Slack Global model opt-out request.” We will process your request and respond once the opt out has been completed.

I guess that was what you did?


Exactly, yes.


I got the same canned response. Followed up asking for more details and got crickets.


It's not like L1 support can make official statements on their own initiative. That was written by someone higher up and they're just copypasting it to panicked customers.


> We offer Customers a choice around these practices. If you want to exclude your Customer Data from helping train Slack global models, you can opt out. If you opt out, Customer Data on your workspace will only be used to improve the experience on your own workspace and you will still enjoy all of the benefits of our globally trained AI/ML models without contributing to the underlying models.

Why would anyone not opt-out? (Besides not knowing they have to of course…)

Seems like only a losing situation.


What's baffling to me is why companies think that when they slap AI on the press release, their customers will suddenly be perfectly fine with them scraping and monetizing all of their data on an industrial scale, without even asking for permission. In a paid service. Where the service is private communication.


I am not pro-exploiting users' ignorance for their data, but I would counter this with the observation that slapping AI on a product suddenly makes people care about the fact that companies are monetizing their usage data.

Monetizing user activity data through opt-out collection is not new. Pretending that this phenomenon has anything to do with AI seems like a play for attention that exploits people's AI fears.

I'll sandwich my comments with a reminder that I am not pro-exploiting users' ignorance for their data.


People care because AI will gladly regurgitate whatever it learns.


Sure - but isn't this a little like comparing manual wiretapping to dragnet? (Or comparing dragnet to ubiquitous scrape-and-store systems like those employed by five-eyes?)

Scale matters


Most people don't care, paid service or not. People are already used to companies stealing and selling their data up and down. Yes, this is absolutely crazy. But was anything substantial done against it before? No, hardly anyone was even raising awareness. Now we reap what we sowed. The world keeps sinking deeper and deeper into digital fascism.


Companies do care: why would you take on additional risk of data leakage for free? In the best-case scenario nothing happens but you also get nothing out of it; in the worst-case scenario, extremely sensitive data from private chats gets exposed and hits your company hard.


Companies are made up of people. Some people in some enterprises care. I'd wager that in any company beyond a tiny upstart you'll have people all over the hierarchy who don't care. And some of them will be responsible for toggling that setting... or not, because they can't be arsed, given how little they care that the chat histories of people they'll likely never interact with are being used to train some AI.


I mean, I am in complete agreement, but at least in theory the only reason for them to add AI to the product would be to make the product better, which would give you a better product per dollar.


Because they don't make it easy. It doesn't seem that, as an individual user, I have any say in how my data is used; I have to contact the Workspace Owner. When I do, I'll be asking them to look at alternative platforms instead.

"Contact us to opt out. If you want to exclude your Customer Data from Slack global models, you can opt out. To opt out, please have your Org or Workspace Owners or Primary Owner contact our Customer Experience team at feedback@slack.com with your Workspace/Org URL and the subject line “Slack Global model opt-out request.” We will process your request and respond once the opt out has been completed."


You can always quit your job right? /s


I'm the one who picked Slack over a decade ago for chat, so hopefully my opinion still holds weight on the matter.

One of the primary reasons Slack was chosen was because they were a chat company, not an ad company, and we were paying for the service. Under these parameters, what was appropriate to say and exchange on Slack was both informally and formally solidified in various processes.

With this change, beyond just my personal concerns, there are legitimate concerns at a business level that need to be addressed. At this point, it's hard to imagine anything but self-hosted as being a viable path forward. The fact that chat as a technology has devolved into its current form is absolutely maddening.


> Why would anyone not opt-out?

This is basically like all privacy on the internet.

Everyone WOULD opt out if it were easy, so it becomes a game of whack-a-mole opt-outs.

Note how you opt out (a generic "contact us"), and what happens when you do (they still train workspace-local models on your data anyway).


Being opted out should be the default, by law.


This is how GDPR works: explicit opt-in consent is needed from the customer.

https://ico.org.uk/for-organisations/uk-gdpr-guidance-and-re...


So every upgrade needs the customer to approve everything? That's a slippery slope to over-regulation.


The status quo is consumer abuse.


I’d take over regulation every day over unabashed user abuse in the name of free markets.


Hence the cookie banners


When we send our notice, it will say that we want none of our data used for any ML training, by Slack or anyone else.


> We offer Customers a choice around these practices.

This reminds me of the joke from The Hitchhiker's Guide to the Galaxy; maybe they'll leave a small hint in a very inconspicuous place, like inserting it into the user agreement on page 300 or so.


“But the plans were on display…” “On display? I eventually had to go down to the cellar to find them.” “That’s the display department.” “With a flashlight.” “Ah, well, the lights had probably gone.” “So had the stairs.” “But look, you found the notice, didn’t you?” “Yes,” said Arthur, “yes I did. It was on display in the bottom of a locked filing cabinet stuck in a disused lavatory with a sign on the door saying ‘Beware of the Leopard.’”


Never more true than with Apple.

Activating an iPhone, for example, has a screen devoted to how important privacy is!

It will show you literally thousands of pages of how they take privacy seriously!

(and you can't say NO anywhere in the dialog, they just show you)

They are normalizing "you cannot do anything", and then everyone does it.


Which third party does Apple sell your profile to or your data to?

What do they do with the content of your emails for advertising?

Which credit card networks at which retailers feed that same profile?

So it’s certainly “more true” with, well, every other vendor you can buy a phone from in a telco store or big box store.


> Which third party does Apple sell your profile to or your data to?

Almost every single government, including but not limited to those of politically sanctioned nations: https://www.apple.com/legal/transparency/

Seriously, for your own sake, don't do this whole "I am the champion of Apple's righteousness" shtick. Apple doesn't care about privacy. That's the bottom line, and you lack the authority to prove otherwise.


Why do they feel entitled to collect and use the data for any purpose when I don't want them to?


> Why would anyone not opt-out?

Because you might actually want the best possible global models? Think of "not opting out" as "helping them build a better product". You are already paying for that product; if there is anything you can do, for free and without any additional time investment on your side, that makes their next release better, why not do it?

You gain a better product for the same price, they get a better product to sell. It might look like they get more than you do in the trade, and that's probably true; but just because they gain more does not mean you lose. A "win less / win more" situation is still a win-win. (It's even a win-win-win if you take into account all the other users of the platform).

Of course, if you value the privacy of these data a lot, and if you believe that by allowing them to train on them it is actually going to risk exposing private info, the story changes. But then you have an option to say stop. It's up to you to measure how much you value "getting a better product" vs "estimated risk of exposing some information considered private". Some will err on one side, some on the other.


How could this make slack a better product? The platform was very convenient for sharing files and proprietary information with coworkers, but now I can't trust that slack won't slip in some "opt out if you don't want us to look at your data" "setting" in the future.

I don't see any cogent generative AI tie-in for slack, and I can't imagine any company that would value a speculative, undefined hypothetical benefit more than they value their internal communications remaining internal.


Do I have free access to and use of those models? If not, I don't care to help them.


> Of course, if you value the privacy of these data a lot, and if you believe that by allowing them to train on them it is actually going to risk exposing private info, the story changes. But then you have an option to say stop. It's up to you to measure how much you value "getting a better product" vs "estimated risk of exposing some information considered private". Some will err on one side, some on the other.

The problem with this reasoning, at least as I understand it, is that you don't really know when or where the training on your data crosses the line into information you don't want to share until it's too late. It's also a slippery slope.


> Think of "not opting out" as "helping them build a better product"

I feel like someone would only hold this opinion if they've never dealt with anyone in the tech industry, or any capitalist, in their entire life. So, like, 8-to-19-year-olds? Except even they seem to understand that profit-absolutist goals undermine everything.

This idea has the same smell as "We're a family" company meetings.


I for one consider it my duty to bravely sacrifice my privacy on the altar of corporate profit so that the true beauty of LLMs trained on emojis and cat gifs can bring humanity to the next epoch.


> Think of "not opting out" as "helping them build a better product"

Then they can simply pay me for that. I have zero interest in helping any company improve their products for free -- I need some reasonable consideration in return. For example, a percent of their revenues from products that use my data in their development. I'm totally willing to share the data with them for 2-3% of their revenues, that seems acceptable to me.


"Best" and "better" is doing a lot of extremely heavy lifting here.

Are you sure you actually want what's hiding under those weasel words?


Because it's default opt-in, and most people won't see this announcement.


Yep, much like just about every credit card company shares your personal information BY DEFAULT with third parties unless you explicitly opt out (this includes Chase, Amex, Capital One, but likely all others).


How do you opt out of these? I do share my data with Rocket Money, though, since there are no good alternatives :(


For Chase Personal and Amex you can opt out in the settings. When you get a new credit card, these institutions default to sharing your data. For Capital One you need to call them and have a chit-chat saying you want to exercise the restriction advertised in their privacy policy, and they'll do it for you.

PG&E has a "Do not sell my info" form.

For other institutions, go check the settings and read the privacy policies.

I don't see the point of Rocket Money. They seem like they exist to sell your info.

You should keep track of your own subscriptions. My way of doing this is to have a separate zero-annual-fee credit card ONLY for subscriptions and I never use that card for anything else. That way I can cleanly see all my subscriptions on that credit card's bill, cleanly laid out, one per line, without other junk. I can also quickly spot sudden increases in monthly charges. I also never use that card in physical stores so that reduces the chance of a fraud incident where I need to cancel that card and then update all my subscriptions.

If you want to organize it even more, get a zero-annual-fee credit card that lets you set up virtual cards. You can then categorize your subscriptions (utilities, car, cloud/API, media, memberships, etc.) and that lets you keep track of how much you're spending on each category each month.


What I don't understand is why it's an opt-out not opt-in. From a legal standpoint, it seems if there is an option to not have something done to you, it should be the default. For example, people don't have to opt-out of giving me all their money when they pass by my house, even if I were to try to claim it's part of my terms of service.


I'm willing to bet that smaller companies just won't care enough to consider this an issue, and that's what Slack/Salesforce is betting on.

I can't see a universe in which large corpos would allow such blatant corporate espionage for a product they pay for no less. But I can already imagine trying to talk my CTO (who is deep into the AI sycophancy) into opting us out is gonna be arduous at best.


I’d be surprised if more than 1% opt out.


I'd be surprised if any legal department in any company that has one doesn't freak the f out when they read this. They will likely lose the biggest customers first, so even if it is 1% of customers, it will likely affect their bottom line enough to give it a second thought. I don't see how they might profit from an in-house LLM more than from their enterprise-tier plans.

Their customer support will have a hell of a day today.


…a choice that’s carefully hidden deep in the ToS and requires a special person to send a special e-mail instead of just adding an option to the org admin interface.


Opting out in this way may implicitly opt you in to workspace-specific models.


> For any model that will be used broadly across all of our customers, we do not build or train these models in such a way that they could learn, memorise, or be able to reproduce some part of Customer Data

This feels so full of subtle qualifiers and weasel words that it generates far more distrust than trust.

It only refers to models used "broadly across all" customers - so if it's (a) not used "broadly" or (b) only used for some subset of customers, the whole statement doesn't apply. Which actually sounds really bad because the logical implication is that data CAN leak outside those circumstances.

They need to reword this. Whoever wrote it is a liability.


> They need to reword this. Whoever wrote it is a liability

Sounds like it’s been written specifically to avoid liability.


I'm sure it was lawyers. It's always lawyers.


Yes, lawyers do tend to have a part to play in writing things that present a legally binding commitment being made by an organisation. Developers really can’t throw stones from their glass houses here. How many of you have a pre-canned spiel explaining why the complexities of whichever codebase you spend your days on are ACTUALLY necessary, and are certainly NOT the result of over-engineering? Thought so.


> How many of you have a pre-canned spiel explaining why the complexities of whichever codebase you spend your days on are ACTUALLY necessary, and are certainly NOT the result of over-engineering? Thought so.

Hm, now you mention it, I don't think I've ever seen this specific example.

Not that we don't have jargon that's bordering on cant, leading to our words being easily mis-comprehended by outsiders: https://i.imgur.com/SL88Z6g.jpeg

Canned cliches are also the only thing I get whenever I try to find out why anyone likes the VIPER design pattern — and that's despite being totally convinced that (one of) the people I was talking to, had genuinely and sincerely considered my confusion and had actually experimented with a different approach to see if my point was valid.


Sure lawyers wrote it but I'd bet a lot there's a product or business person standing behind the lawyer saying - "we want to do this but don't be so obvious about it because we don't want to scare users away". And so lawyers would love to be very upfront about what is happening because that's the best way to avoid liability. However, that conflicts with what the business wants, and because the lawyer will still refuse to write anything that's patently inaccurate, you end up with a weasel word salad that is ambiguous and unhelpful.


Especially when a few paragraphs below they say:

> If you want to exclude your Customer Data from helping train Slack global models, you can opt out.

So Customer Data is not used to train models "used broadly across all of our customers [in such a way that ...]", but... it is used to help train global models. Uh.


To me it says that they _do_ train global models with customer data, but they are trying to ensure no data leakage (which will be hard, but maybe not impossible, if they are training with it).

The caveats are for “local” models, where you would want the model to be able to answer questions about discussions in the workspace.

It makes me wonder how they handle “private” chats: can they leak across a workspace?

Presumably they are trying to train a generic language model with very low recall for facts in the training data, then use RAG across the chats that the logged-in user can see to provide local content.
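If that's the architecture, the privacy boundary lives in retrieval, not in the model. A toy sketch of permission-filtered RAG; all names, the ACL table, and the word-overlap scoring are hypothetical illustrations, not Slack's actual design:

```python
from dataclasses import dataclass

@dataclass
class Message:
    channel: str
    text: str

# Hypothetical workspace ACL: which channels each user can read.
ACLS = {"alice": {"#general", "#eng"}, "bob": {"#general"}}

def retrieve(user: str, query: str, index: list[Message], k: int = 2) -> list[str]:
    """Return the k most relevant messages the user is allowed to see."""
    allowed = ACLS.get(user, set())
    q_words = set(query.lower().split())
    # Toy relevance: word overlap; a real system would use embeddings.
    def score(m: Message) -> int:
        return len(q_words & set(m.text.lower().split()))
    visible = [m for m in index if m.channel in allowed]
    return [m.text for m in sorted(visible, key=score, reverse=True)[:k]]

def answer(user: str, query: str, index: list[Message]) -> str:
    # A real system would pass this context to the LLM as a prompt;
    # here we just show what the model would be allowed to see.
    return " | ".join(retrieve(user, query, index))
```

The generic model itself then holds nothing; whether private chats can leak across a workspace comes down to whether this retrieval filter is correct.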


My intuition is that it's impossible to guarantee there are no leaks in an LLM as it stands today. It would surely require some new computer science to ensure that no output the model could ever produce contains sensitive data from the input.

It's one thing if the input is the published internet (even if covered by copyright), it's entirely another to be using private training data from corporate water coolers, where bots and other services routinely send updates and query sensitive internal services.


You and many others here are conflating AI and LLMs. They are not the same thing. LLMs are a type of AI model that produces content, and if trained on customer data they would gladly replicate it. However, the TOS explicitly says that generative AI models (i.e. LLMs) are taken off the shelf and not retrained on customer data. Before LLMs exploded a year and a half ago, lots of other AI models had been in place for years in a lot of systems, handling categorization, search-result ranking, etc. As they do not generate text or other content, they cannot leak data. The linked FAQ provides several examples of features based on such models, which are not LLMs: for example, they use customer data to determine a good emoji to suggest for reacting to a message, based on the sentiment of the message. An emoji suggestion clearly has no potential to leak customer data.
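The point about non-generative models can be made concrete: a classifier's output space is a fixed set of labels, so no matter what it was trained on, a label is all it can ever emit. A toy sentiment classifier, purely illustrative and nothing to do with Slack's actual models:

```python
# A classification model maps arbitrary input text to one of a fixed
# set of labels. There is no path in its output space through which
# training text could be reproduced.

POSITIVE = {"great", "thanks", "love", "awesome"}
NEGATIVE = {"broken", "outage", "bug", "down"}

def classify_sentiment(text: str) -> str:
    words = set(text.lower().split())
    pos = len(words & POSITIVE)
    neg = len(words & NEGATIVE)
    if pos > neg:
        return "positive"
    if neg > pos:
        return "negative"
    return "neutral"
```

Contrast this with an LLM, whose output space is all possible text, which is exactly why memorized training data can resurface.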


There is a way. Build a preference model from the sensitive dataset. Then use the preference model with RLAIF (like RLHF but with AI instead of humans) to fine-tune the LLM. This way only judgements about the LLM outputs will pass from the sensitive dataset. Copy the sense of what is good, not the data.
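Glossing over a lot, the flow would be: a preference (reward) model trained on the sensitive data emits only scalar judgements, and the LLM is tuned against those scalars. A schematic sketch, with every name hypothetical and the reward function reduced to a toy heuristic:

```python
# Schematic of "copy the sense of what is good, not the data": the
# preference model saw the sensitive dataset at its own training time;
# the LLM under tuning receives nothing but scalar rewards.

def preference_score(candidate: str) -> float:
    # Stand-in for a reward model; returns a judgement, never training text.
    # Toy heuristic: prefer more complete answers, capped at 1.0.
    return min(len(candidate.split()) / 10.0, 1.0)

def rlaif_select(candidates: list[str]) -> str:
    # One toy improvement step: sample candidates from the LLM, rank
    # them by reward, keep the best. A real RLAIF loop would instead
    # use these rewards to update the model weights (e.g. via PPO).
    return max(candidates, key=preference_score)
```

Whether this truly prevents leakage is a separate question (judgements can arguably still reveal something about the dataset), but only scalars, not text, cross the boundary.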


Why are these kinds of things opt-out? And why do they need to be discovered?

We're literally discussing switching to Teams at my company (1500 employees)


If you switch to Teams only for this reason, I have some bad news for you: there's no way Microsoft isn't doing the same (or won't start in the future). And you'll get a subpar experience on top (which is an understatement).


The Universal License Terms of Microsoft (applicable to Teams as well) clearly say they don't use customer data (Input) for training: https://www.microsoft.com/licensing/terms/product/ForallOnli... Whether someone believes it or not, is another question, but at least they tell you what you want to hear.


What if they exfiltrate customer data to a data broker and they buy it back?

It's not customer data anymore.


I think a self-hosted Matrix/IRC/Jitsi is the way to do it.


We've been using Mattermost and it works very well. Better than Slack.

The only downside is their mobile app is a bit unreliable, in that it sometimes doesn't load threads properly.


I would guess Microsoft has a lot more government customers (and large customers in general) than Slack does. So I would think they have a lot more to lose if they went this route.


Unless your company makes a special arrangement with them, Microsoft will steal all of your data. It's at least in their TOS for Outlook, and I doubt they wouldn't do the same for Teams.


> Why are these kinds of things opt-out? And why do they need to be discovered?

Monies.

> We're literally discussing switching to Teams at my company (1500 employees)

Considering what Microsoft does with its "New and Improved(TM)" Outlook, and its love for OpenAI, I wouldn't be so eager...


You’d be better off just not having chat than switching to Teams.


We use Teams and it's fine.

Just don't use the "Team" feature of it to chat. Use chat groups and 1-to-1 of course. We use "Team" channels only for bots: CI results, alerts, things like that.

Meetings are also chat groups. We use the daily meeting as the dev-team chat itself so it's all there. Use Loops to track important tasks during the day.

I'm curious what's missing/broken in Teams that you would rather not have chat at all?


But the business will most likely suffer from less cohesive communication... it's "The Ick" either way.


The idea that Slack makes companies work better needs some proof behind it; I'd say the amount of extra distraction is a net negative… but as with a lot of things in software and startups, nobody researches anything and everyone writes long essays about how they feel things are.


Distraction is not enforced. Learning to control your attention, and how to help yourself do it, is crucial whatever you do, in whatever era and technological context. It is the most valuable long-term resource you have.

I think we're starting to recognize this at a larger scale.

Slack easily saves a ton of time on complex problems that require the interaction and expertise of a lot of people, often an unpredictable number of them per problem. They can answer with a delay; in a good culture this is totally accepted, and people can still independently move forward or switch tasks if necessary, same as with slower communication tools. You are not forced to answer with any particular lag, but Slack makes it possible, when needed, to reduce that lag to zero.

Sometimes you're unsure whether you need help or can do something on your own. I certainly know that plenty of times I ultimately had no chance whatsoever, because the knowledge required was too specialized, and this is not always clear up front. Reducing barriers to communication in those cases is crucial, and I don't see Slack being in the way here, only helping.

The goal of organizing Slack is to pay the right amount of attention to the right parts of the communication for you. You can do this if you really spend (hmm) attention figuring out what that is and how to tune your tools to achieve it.


That's a lot of words with no proof, isn't it? It's just your theory. Until I see a well-designed study on such things, I struggle to believe the conjecture either way. It could well be that you benefit from Slack and I don't.

Even receiving a message and not responding can be disruptive, and on top of that I'd say being offline or ignoring messages is impossible in most companies.


It's your choice whether to trust only statements backed by scientific rigour, or to try things out and apply what works to your way of life. This is just me talking to you; in that, you are correct.

Regarding "receiving a message": my devices are allowed only limited use of notifications. Of all the messaging/social apps, only messages from my wife in our messaging app of choice pop up as notifications. Slack is certainly not allowed there.


Your idea also comes with no proof, just your personal experience.


Which is extremely clear from what I’m saying, it’s completely anecdotal.


Good point, could be that it reduces friction too far in some instances. However, in general less communication doesn't seem better for the bottom line.


I'm not sure chat apps improve business communications. They are ephemeral, with differing expectations on different teams. Hardly what I'd label as "cohesive"

Async communications are critical to business success, to be sure -- I'm just not convinced that chat apps are the right tool.


From what I've seen (not much, actually), most channels could be replaced by a forum-style discussion board. Chat can be great for 1:1 and small-team interactions, and for tool interactions.


Obviously because no one would ever opt in.


Possible alternatives that may not be OpenAI behind the scenes.

https://www.producthunt.com/categories/team-collaboration

There are also a lot of "Best Slack Alternatives in 2024" lists and such.


Teams is in the same AI business. Better go with open source chats: https://itsfoss.com/open-source-slack-alternative/


Ugh Teams = Microsoft. They are the worst when it comes to data privacy. I'm not sure how that is even a choice.


I'd make sure to do an extended trial run first. Painful transition.


Teams has better voice/video. But the chat is far worse, absolutely shit, though Slack seems to be working to get there.


They differentiate quite well between generative AI models, which are not trained on customer data because they could leak it, and other types of AI models (e.g. recommendation systems) that do not work by reproducing content. The examples on the linked page are quite informative about the use cases that do use customer data.


So if I don't want Slack to train on _anything_, what do I do? I suspect everything now.


I hope it's not doublespeak; the ambiguity leaves it grey, maybe to leave themselves room to play.


Opt out is such bullshit.


Nah. Whoever decided to create the reality their counsel is dancing around with this disclaimer is the actual problem, though it's mostly a problem for us, rather than them.


It's a problem for them if it loses customer trust / customers.


If they lose enough, they'll go "sorry we got caught".

If they don't, they won't do anything.


If it impacted their business significantly, it would restore some of the faith I've lost in humanity recently. Frankly, I'm not holding my breath.


I'm imagining a corporate Slack, with information discussed in channels or private chats that exists nowhere else on the internet, getting rolled into a model.

Then someone asks a very specific question, conversationally, about such a very specific scenario.

It seems plausible that confidential data would get out, even if it wasn't attributed to the client.

Not that it's possible to ask an LLM how a specific or random company in an industry might design something…


Exactly. A fun game that shows why this is so hard to prevent:

https://gandalf.lakera.ai/


Gandalf is a great tool for bringing awareness to AI hacking, but Lakera also trains on the prompts you provide when you play the game. See the bottom of that page. "Disclaimer: we may use the fully anonymized input to Gandalf to improve Gandalf and for Lakera AI's products and services. "


Sometimes the obvious questions are met with a lot of silence.

I don't think I can be the only one who has had a conversation with GPT about something obscure they might know but there isn't much about online, and it either can't find anything... or finds it, and more.


I think it's as clear as it can be; they go into much more detail and provide examples in their bullet points. Here are some highlights:

Our model learns from previous suggestions and whether or not a user joins the channel we recommend. We protect privacy while doing so by separating our model from Customer Data. We use external models (not trained on Slack messages) to evaluate topic similarity, outputting numerical scores. Our global model only makes recommendations based on these numerical scores and non-Customer Data.

We do this based on historical search results and previous engagements without learning from the underlying text of the search query, result, or proxy. Simply put, our model can't reconstruct the search query or result. Instead, it learns from team-specific, contextual information like the number of times a message has been clicked in a search or an overlap in the number of words in the query and recommended message.

These suggestions are local and sourced from common public message phrases in the user’s workspace. Our algorithm that picks from potential suggestions is trained globally on previously suggested and accepted completions. We protect data privacy by using rules to score the similarity between the typed text and suggestion in various ways, including only using the numerical scores and counts of past interactions in the algorithm.

To do this while protecting Customer Data, we might use an external model (not trained on Slack messages) to classify the sentiment of the message. Our model would then suggest an emoji considering only the frequency with which a particular emoji has been associated with messages of that sentiment in that workspace.


Seems like it's time to start some Slack workspaces and fill them with garbage. Maybe from Uncyclopedia (https://en.uncyclopedia.co/wiki/Main_Page)


The Riders of the Lost Kek dataset is an excellent candidate https://arxiv.org/abs/2001.07487


- Create a Slack account for your 95-year-old grandpa

- Exclude that one account from using the models, he's never going to use Slack anyway

- Now you can learn, memorise, or reproduce all the Customer Data you like


The problem is this also covers very reasonable use cases.

Use sampling across messages for spam detection, predicting customer retention, etc - pretty standard.

Then there are cases where you could have models more like LLMs, which can output data from the training set, but you're running them for that customer.


If you trained on customer data, your service contains customer data.


Whatever lawyer wrote that should be fired. This poorly written nonsense makes it look like Slack is trying to look shady and subversive. Even if well intended this is a PR blunder.


> They need to reword this. Whoever wrote it is a liability.

Wow you're so right. This multi-billion dollar company should be so thankful for your comment. I can't believe they did not consult their in-house lawyers before publishing this post! Can you believe those idiots? Luckily you are here to save the day with your superior knowledge and wisdom.


In summary, you must opt-out if you want to exclude your data from global models.

Incredibly confusing language since they also vaguely state that "data will not leak across workspaces".

Use tools that cannot leak data, not ones that merely "will not".


Aren't all tools, essentially, just one API call away from "leaking data"?


In this case they mean leak into the global model — so no. You can have sovereignty of your data if you use an open protocol like IRC or Matrix, or a self-hosted tool like Zulip, Mattermost, Rocket Chat, etc


Most, if not all SaaS software is multi-tenant, so we've been living in the "will not" world for decades now.


In your experience.


The SaaS business model breaks if you go single tenant except in Fortune 500 enterprise


In your experience. I have deployed single tenant SaaS for years, very much not Fortune 500. It's really not hard to give each customer their own server.


That's exactly my point. "File over app"[1] is just as relevant for businesses as it is for individuals — if you don't want your data to be used for training, then take sovereignty of it.

[1] https://stephango.com/file-over-app


"File over app" is a good way of putting it!

Something strange is happening on your blog, fwiw: Bookmarking it via command + D flips the color scheme to "night mode" – is that intentional?


Ah good catch. Yes the "D" key can be used to switch light/dark mode, but I didn't account for the bookmark shortcut. That should be fixed now. Thanks!


what is the difference between "will not" and "cannot" in legalese?


"Will not" allows the existence of a bridge but it's not on your route and you say you're not going to go over it. "Cannot" is the absence of a bridge or the ability to cross it.


Wow, well explained.


My off leash dog will not bite you, he is well behaved. My dog at home cannot bite you, he is too far away.


I'm confused about this statement: "When developing AI/ML models or otherwise analyzing Customer Data, Slack can’t access the underlying content. We have various technical measures preventing this from occurring"

"Can't" is a strong word. I'm curious how an AI model could access data, but Slack, Inc itself couldn't. I suspect they mean "doesn't" instead of "can't", unless I'm missing something.


I also find the word "Slack" in that interesting. I assume they mean "employees of Slack", but the word "Slack" obviously means all the company's assets and agents, systems, computers, servers, AI models, etc.

I would find even a statement from Signal like "we can't access our users' content" to be tenuous and overly optimistic. Like, when I hear the word "can't" my brain goes to: there is nothing anyone in the company could do, within the bounds of the law, to do this. Employees at Slack could turn off the technical measures preventing this from occurring. Employees at Signal could push an app update which side-channels all messages through to a different server, unencrypted.

Better phrasing is "Employees of Slack will not access the underlying content".


Interestingly I'd probably go the other way.

If it's verifiably E2EE then I consider "we can't access this" to be a fairly powerful statement. Sure, the source could change, but if you have a reasonable distribution mechanism (e.g. all users get the same code, verifiably reproducible) then that's about as good as you can get.

Privacy policies that state "we won't do XYZ" have literally zero value to me to the extent that I don't even look at them. If I give you some data, it's already leaked in my mind, it's just a matter of time.


> I would find even a statement from Signal like "we can't access our users content" to be tenuous and overly-optimistic.

I don't really agree with this statement. Signal literally can't read user data right now. The statement is true, so why can't they make it?

If they can't, nobody can: there is no service that couldn't push an update reversing its own security measures. Besides, doing that would be illegal, because it would render the statement "we can't access our users' content" false.

Slack's case is totally different. The data is accessible to Slack's systems, so the statement "we can't access our users' content" is already false. Probably what they mean is something along the lines of: "The data can be accessed by our systems, but we have measures in place that block access for most of our employees."


From their white paper linked in the same comment

> Provisioning To minimize the risk of data exposure, Slack adheres to the principles of least privilege and role-based permissions when provisioning access—workers are only authorized to access data that they reasonably must handle in order to fulfill their current job responsibilities. All production access is reviewed at least quarterly.

so... seems like they very clearly can.


As an engineer who has worked on systems that handle sensitive data, it seems straightforwardly to me to be a statement about:

1. ACLs

2. The systems that provision those ACLs

3. The policies that determine the rules those systems follow.

In other words, the model training batch job might run as a system user that has access to data annotated as 'interactions' (at timestamp T1 user U1 joined channel C1, at timestamp T2 user U2 ran a query that got 137 results), but no access to data annotated as 'content', like (certainly) message text or (probably) the text of users' queries. An RPC from the training job attempting to retrieve such content would be denied, just the same as if somebody tried to access someone else's DMs without being logged in as them.

As a general rule in a big company, you the engineer or product manager don't get to decide what the ACLs will look like no matter how much you might feel like it. You request access for your batch job from some kind of system that provisions it. In turn the humans who decide how that system work obey the policies set out by the company.

It's not unlike a bank teller who handles your account number. You generally trust them not to transfer your money to their personal account on the sly while they're tapping away at the terminal--not necessarily because they're law abiding citizens who want to keep their job, but because the bank doesn't make it possible and/or would find out. (A mom and pop bank might not be able to make the same guarantee, but Bank of America does.) [*]

In the same vein, this is a statement that their system doesn't make it possible for some Slack PM to jack their team's OKRs by secretly training on customer data that other teams don't use, just because that particular PM felt like ignoring the policy.

[*] Not a perfect analogy, because a bank teller is like a Slack customer service agent who might, presumably after asking for your consent, be able to access messages on your behalf. But in practice I doubt there's a way for an employee to use their personal, probably very time-limited access to funnel that data to a model training job. And at a certain level of maturity a company (hopefully) also no longer makes it possible for a human employee to train a model in a random notebook using whatever personal data access they have been granted and then deploy that same model to prod. Startups might work that way, though.
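The annotation-based ACL arrangement described above can be sketched in a few lines. This is a purely hypothetical illustration (not Slack's actual system): a training job's principal is provisioned for rows annotated "interactions", and any request for "content" rows is denied regardless of who issues it.

```python
# Hypothetical sketch of annotation-based ACLs (not Slack's real system):
# data rows carry an annotation, and each principal may only read the
# annotations its role has been provisioned for.

ACL = {
    "training-batch-job": {"interactions"},        # numeric/event data only
    "support-agent": {"interactions", "content"},  # scoped, audited access
}

class AccessDenied(Exception):
    pass

def fetch(principal, annotation, rows):
    """Return rows matching the annotation, or refuse outright."""
    if annotation not in ACL.get(principal, set()):
        raise AccessDenied(f"{principal} may not read '{annotation}' data")
    return [r for r in rows if r["annotation"] == annotation]

rows = [
    {"annotation": "interactions", "event": "U1 joined C1 at T1"},
    {"annotation": "content", "text": "actual message text"},
]

print(fetch("training-batch-job", "interactions", rows))  # allowed
try:
    fetch("training-batch-job", "content", rows)          # denied by policy
except AccessDenied as e:
    print(e)
```

The point being mirrored from the comment: the batch job's denial is enforced by the provisioning system itself, not by the job author's good intentions.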


Every company that promises "end-to-end encryption" is just pinky-swearing to you also. Like Telegram or WhatsApp


Telegram client is open source, so you can see what exactly happens there when you enable E2EE.


Reproducible builds, somehow?



Yeah, at least if the client is open source you could verify.


"If you can read assembly, all programs are open source."

Sure, it's easier and less effort if the program is actually open source, but it's absolutely still possible to verify on bytecode, decompiled or disassembled programs, too.


Eugh. Has anyone compiled a list of companies that do this, so I can avoid them? If anyone knows of other companies training on customer data without an easy highly visible toggle opt out, please comment them below.


Synology updated this policy back in March (Happened to be a Friday afternoon).

Services Data Collection Disclosure

"Synology only uses the information we obtain from technical support requests to resolve your issue. After removing your personal information, we may use some of the technical details to generate bug reports if the problem was previously unknown to implement a solution for our products."

"Synology utilizes the information gathered through technical support requests exclusively for issue resolution purposes. Following the removal of personal data, certain technical details may be utilized for generating bug reports, especially for previously unidentified problems, aimed at implementing solutions for our product line. Additionally, Synology may transmit anonymized technical information to Microsoft Azure and leverage its OpenAI services to enhance the overall technical support experience. Synology will ensure that personally identifiable information, such as names, phone numbers, addresses, email addresses, IP addresses and product serial numbers, is excluded from this process."

I used to just delete privacy policy update emails and the like but now I make a habit of going in to diff them to see if these have been slipped in.
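For what it's worth, that diffing habit is easy to semi-automate. Here is a minimal sketch using the stdlib's difflib; the snapshot text and names are illustrative, not any vendor's actual wording:

```python
# Keep a saved snapshot of each vendor's policy and diff it against the
# current text whenever an "updated policy" email arrives.
import difflib

def policy_diff(old_text, new_text, name="policy"):
    """Unified diff between a saved policy snapshot and the current text."""
    return "".join(difflib.unified_diff(
        old_text.splitlines(keepends=True),
        new_text.splitlines(keepends=True),
        fromfile=f"{name} (saved)",
        tofile=f"{name} (current)",
    ))

old = "Support data is used only to resolve your issue.\n"
new = ("Support data is used only to resolve your issue.\n"
       "Anonymized details may be sent to third-party AI services.\n")
print(policy_diff(old, new, "vendor-privacy"))
```

Any line prefixed with `+` in the output is something that was slipped into the new version.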


Like the other poster said, it would be great to have a name-and-shame site that lists companies training on customer data.


if your data is stored in a database that a company can freely read and access (i.e. not end-to-end encrypted), the company will eventually update their ToS so they can use your data for AI training — the incentives are too strong to resist

https://twitter.com/kepano/status/1688610782509211648

https://twitter.com/kepano/status/1682829662370557952


And the penalty is unnoticeable to these companies.


It would be easier to compile a list of companies that don't do this.

The list:


Nonsense. There are plenty of companies that don't have shit policies like this. A vast majority, even. Stop normalizing it.


My company does not do this and has no plans to do such a thing.


Lol, my favorite corpo-speak.

I’m not eating a steak and have no plans to eat a steak. Ask again tomorrow.


We can fight back by not posting anything useful or accurate to the internet until there are protections in place and each person gets to decide how their data is used and whether they are compensated for it.



How could this possibly comply with European "right to be forgotten" legislation? In fact, how could any of these AI models comply with that? If a user requests to be forgotten, is the entire model retrained (I don't think so).


This "ai" scam going on now is the ultimate convoluted process to hide sooo much tomfuckery: theres no such thing as copyright anymore! this isn't stealing anything, its transforming it! you must opt out before we train our model on the entire internet! (and we still won't spits in our face) this isn't going to reduce any jobs at all! (every company on earth fires 15% of everyone immediately) you must return to office immediately or be fired! (so we get more car data teehee) this one weird trick will turn you into the ultimate productive programmer! (but we will be selling it to individuals not really making profitable products with it ourselves)

and finally the most aggregious and dangerous: censorship at the lowest level of information before it can ever get anywhere near peoples fingertips or eyeballs.


Machine Unlearning is a thing; see e.g. here [0] for an introduction.

[0] https://ai.stanford.edu/~kzliu/blog/unlearning


> how could any of these AI models comply with that? If a user requests to be forgotten, is the entire model retrained (I don't think so).

I don't believe that is the current interpretation of GDPR, etc. - if the model is trained, it doesn't have to be deleted due to a RTBF request, afaik. There is significant legal uncertainty here.

Recent GDPR court decisions mean that this is probably still non-compliant due to the fact that it is opt-out rather than opt-in. Likely they are just filtering out all data produced in the EEA.


> Likely they are just filtering out all data produced in the EEA.

Likely they are just hoping to not get caught and/or consider it cost of doing business. GDPR has truly shown us (as if we didn't already know) that compliance must be enforced.


We really need to start using self-hosted solutions. Like matrix / element for team messaging.

It's ok not wanting to run your own hardware at your own premises. But the solution is to run a solution that is end-to-end encrypted so that the hosting service cannot get at the data. cryptpad.fr is another great piece of software.


You could check out [Campfire](https://once.com/campfire). You get the source code (Ruby on Rails) and you deploy wherever. We're running ours on a DigitalOcean droplet.


Zulip (https://zulip.com/) seems to be a great self-hosted python-based alternative to Slack/Teams.


The incentive for first-party tool providers to do this is going to be huge, whether it's Slack, Google, Microsoft, or really any other SaaS tool. Ultimately, if businesses want to avoid getting commoditized by their vendors, they need to be in control of their data and their AI strategy. And that probably means turning off all of these small-utility-very-expensive-and-might-ruin-your-business features, and actually creating a centralized, access-controlled, well-governed knowledge base into which you can plug any open-source or black-box LLM, from any provider.


"commoditized by their vendors" is exactly the phrase I was looking for. It's why I wanted my co to self-host Mattermost instead of using Slack.


It’s definitely a moral hazard (/opportunity). As a reminder, by default on windows 11 Microsoft syncs your files to their server.


all your files? no way that cozy of a blanket statement can be true. if you kept cycling in drives full of /dev/random you could fill up M$ servers with petabytes of junk? sounds like an appealing weekend project


We’ll only use it for…

choosing an emoji, and…

a fun little internal only stock picker tool, that suggests to us some fun stocks to buy based on the anonymised real time inner monologue of hundreds of thousands of tech companies


How can anyone in their right mind think building AI for emoji selection is a remotely good use of time...


I’d use that, at work. It would be a welcome improvement to their product.


it's just a justification for collecting tokens


Tokens (outside of a few trillion ) are worthless imo, I think OAI has pushed that limit, let the others chase them with billions into the ocean of useless conversational data and drown.


> "These types of thoughtful personalizations and improvements are only possible if we study and understand how our users interact with Slack."

LOL


>Our mission is to build a product that makes work life simpler, more pleasant and more productive.

I know it would be impossible, but I wish we could go back to the days before we had Slack (or tools like it). Our Slack is a cesspool of people complaining, talking behind other people's backs, an echo chamber of negativity, etc.

That probably speaks more to the overall culture of the company, but Slack certainly doesn't help. You can also say "the tool is not the problem, people are" - sure, we can always explain things away, but Slack certainly plays a role here.


Your company sucks. I’ve used slack at four workplaces and it’s not been at all like that. A previous company had mailing lists and they were toxic as you describe. The tool was not the issue.


Yeah, written communication is harder than in-person communication.

It’s easy to come across poorly in writing, but that issue has no easy resolution unless you’re prepared to ban Slack, email, and any other text-based communication system between employees.

Slack can sometimes be a place for people who don’t feel heard in conventional spaces to vent — but that’s an organisational problem, not a Slack problem.


HN isn't really a bastion of media literacy or tech criticism. If you ever ask "does [some technology] affect [something qualitative] about [anything]", the response on hn is always going to be "technology isn't responsible, it's how the technology is used that is responsible!", asserting, over and over again, that technology is always neutral.

The idea that the mechanism of how people communicate affects what people communicate is a pretty foundational concept in media studies (a topic which is generally met with a hostile audience on HN). Slack almost certainly does play a role, but people who work in technology are incentivized to believe that technology does not affect people's behaviors, because that belief allows people who work in technology to be free of any and all qualitative or moral judgements on any grounds; the assertion that technology does not play a role is something that technology workers cling to because it absolves them of all guilt in all situations, and makes them, above all else, innocent in every situation. On the specific concept of a medium of communication affecting what is being communicated, McLuhan took these ideas to such an extreme that it's almost ludicrous, but he still had some pretty interesting observations worth thinking on, and his writing on this topic is some of the earlier work. This is generally the place where people first look, because much of the other work assumes you've understood McLuhan's work in advance. https://en.wikipedia.org/wiki/Understanding_Media


Switch to Teams instead.

Only half-kidding, but it's an application which is so repulsive it seems to discourage people from communicating at all.


No, I don’t think Slack does play a role in this. It is quite literally a communication tool (and I’d argue one that encourages far _more_ open communication than others).

If Slack is a cesspool, that’s because your company culture is a cesspool.


I think open communication in a toxic environment can obviously amplify toxicity; at the least, less open communication can act as a damper on it.

Slack is surely not the generator of toxicity but it seems obvious it could act at increasing the bandwidth.

You can't have it both ways.


> That probably speaks more to the overall culture of the company

Yep. Fun fact, my last workplace had a fairly nontoxic Slack... but there was a whole second Slack dedicated to bitching and shitposting where the bosses weren't invited. Humans gonna human.


Was not limited to just the bosses who were not invited. If you weren’t in the cool club you also did not get an invite.

A very inclusive company on paper that was very exclusionary behind the scenes.


What happened when someone from the cool club got promoted and became a boss?


I disagree slack plays a role. You only mentioned human aspects, nothing to do with technology. There was always going to be instant messaging as software once computers and networks were invented. You'd just say this happens over email and blame email.


Keyboard warriors


To add some nuance to this conversation, what they are using this for is Channel recommendations, Search results, Autocomplete, and Emoji suggestion and the model(s) they train are specific to your workspace (not shared between workspaces). All of which seem like they could be handled fairly privately using some sort of vector (embeddings) search.

I am not defending Slack, and I can think of a number of cases where training on Slack messages could go very badly (exposing private conversations, data leakage between workspaces, etc.), but I think it helps to understand the context before reacting. Personally, I do think we need better controls over how our data is used, and Slack should be able to do better than "email us to opt out".


> the model(s) they train are specific to your workspace (not shared between workspaces)

That's incorrect -- they're stating that they use your "messages, content, and files" to train "global models" that are used across workspaces.

They're also stating that they ensure no private information can leak from workspace to workspace in this way. It's up to you if you're comfortable with that.


From the wording, it sounds like they are conscious of the potential for data leakage and have taken steps to avoid it. It really depends on how they are applying AI/ML. It can be done in a private way if you are thoughtful about how you do it. For example:

Their channel recommendations: "We use external models (not trained on Slack messages) to evaluate topic similarity, outputting numerical scores. Our global model only makes recommendations based on these numerical scores and non-Customer Data"

Meaning they use a non-slack trained model to generate embeddings for search. Then they apply a recommender system (which is mostly ML not an LLM). This sounds like it can be kept private.

Search results: "We do this based on historical search results and previous engagements without learning from the underlying text of the search query, result, or proxy" Again, this is probably a combination of non-slack trained embeddings with machine learning algos based on engagement. This sounds like it can be kept private and team specific.

autocomplete: "These suggestions are local and sourced from common public message phrases in the user’s workspace." I would be concerned about private messages being leaked via autocomplete, but if it's based on public messages specific to your team, that should be ok?

Emoji suggestions: "using the content and sentiment of the message, the historic usage of the emoji [in your team]" Again, it sounds like they are using models for sentiment analysis (which they probably didn't train themselves and even if they did, don't really leak any training data) and some ML or other algos to pick common emojis specific to your team.

To me these are all standard applications of NLP / ML that have been around for a long time.
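The "numerical scores only" pattern from the channel-recommendation quote can be sketched roughly like this. Everything here is invented for illustration: the external embedding model is stubbed with a deterministic toy vectorizer, and channel names stand in for whatever text a real system would embed.

```python
# Rough sketch of the "scores only" pattern: an external embedding model
# (stubbed with a toy vectorizer) reduces text to a similarity score, and
# the shared recommender ranks candidates from that score plus an
# engagement count -- it never touches the text itself.
import math

def embed(text):
    # Stand-in for an external model not trained on Slack messages.
    vec = [0.0] * 8
    for word in text.lower().split():
        vec[sum(ord(c) for c in word) % 8] += 1.0
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]

def similarity(a, b):
    # Cosine similarity between the two toy embeddings.
    return sum(x * y for x, y in zip(embed(a), embed(b)))

def rank_channels(user_interest, candidates):
    """Shared model input per channel: (similarity score, join count). No text."""
    features = {name: (similarity(user_interest, name), joins)
                for name, joins in candidates.items()}
    return sorted(features, key=lambda n: features[n], reverse=True)

print(rank_channels("kubernetes deploys",
                    {"kubernetes deploys": 12, "random": 40, "cooking": 3}))
```

The shared ranking step only ever sees the (score, count) tuples, which is the property the quoted text is claiming.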


The way it's written means this just isn't the case. They _MAY_ use it for what you have mentioned above. They explicitly say "...here are a few examples of improvements..." and "How Slack may use Customer Data" (emph mine). They also... may not? And use it for completely different things that can expose who knows what via prompt hacking.


Agreed, and that is my concern as well that if people get too comfortable with it then companies will keep pushing the bounds of what is acceptable. We will need companies to be transparent about ALL the things they are using our data for.


We will only see more of this as time goes on; there's so little impetus for companies to do anything that respects privacy when this data is power and freely accessible by virtue of owning the servers. If you haven't read Technofeudalism, the tl;dr is that what we once saw as tools for our entertainment or convenience now own our inputs and use us as free labour, and they operate at scales where we have little to no power to protect our interests. For a quick summary: https://www.wired.com/story/yanis-varoufakis-technofeudalism...


> Data will not leak across workspaces.

> If you want to exclude your Customer Data from helping train Slack global models, you can opt out.

I don't understand how both these statements can be true. If they are using your data to train models used across workspaces then it WILL leak. If they aren't then why do they need an opt out?

Edit: reading through the examples of AI use at the bottom of the page (search results, emoji suggestions, autocomplete), my guess is this policy was put in place a decade ago and doesn't have anything to do with LLMs.

Another edit: From https://slack.com/help/articles/28310650165907-Security-for-...

> Customer data is never used to train large language models (LLMs).

So yeah, sounds like a nothingburger.


They're saying they won't train generative models that will literally regurgitate your text; my guess is classifiers are fair game in their interpretation.


You are assuming they're saying that, because it's one charitable interpretation of what they're saying.

But they haven't actually said that. It also happens that people say things based on faulty or disputed beliefs of their own, or people willfully misrepresent things, etc.

Until they actually do say something as explicit as what you suggest, they haven't said anything of the sort.


> Data will not leak across workspaces. For any model that will be used broadly across all of our customers, we do not build or train these models in such a way that they could learn, memorize, or be able to reproduce some part of Customer Data.

I feel like that is explicitly what this is saying.


The problem is, it's really really hard to guarantee that.

Yes if they only train say, classifiers, then the only thing that can leak is the classification outcome. But these things can be super subtle. Even a classifier could leak things if you can hack the context fed into it. They are really playing with fire here.


It is also hard to guarantee that, in a multi-tenant application, users will never see other users' data, due to causes like mistakes in AuthZ logic, caching gone awry, or other unpredictable situations that come up in distributed systems—yet even before the AI craze we were all happy to use these SaaS products anyway. Maybe this class of vulnerability is indeed harder to tame than most, but third-party software has never been without risks.


yes, i certainly agree with you. i think oftentimes these policies are written by non-technical people

i'm not entirely convinced that classifiers and LLMs are disjoint to begin with


The OP privacy policy explicitly states that autocompletion algorithms are part of the scope. "Our algorithm that picks from potential suggestions is trained globally on previously suggested and accepted completions."

And this can leak: for instance, typing "a good business partner for foobars is" might not send that text upstream per se, but would be consulting a local model whose training data would have contained conversations that other Slack users are having about brands that provide foobars. How can Slack guarantee that the model won't incorporate proprietary insights on sourcing the best foobar producers into its choice of the next token? And sure, one could build an adversarial model that attempts to minimize this kind of leakage, but is Slack incentivized to create such a thing vs. just building an optimal autocomplete as quickly as possible?

Even if it were just creating classifiers, similar leakages could occur there, albeit requiring more effort and time from attackers to extract actionable data.

I can't blame Slack for wanting to improve their product, but I'd also encourage any users with proprietary conversations to encourage their admins to opt out as soon as possible.
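The leakage worry above is easy to demonstrate with a toy model: a bigram autocomplete model trained on workspace text will regurgitate that text given the right prefix. All the "proprietary" strings here are invented.

```python
# Toy demonstration of autocomplete leakage: train a bigram model on
# workspace messages, then extract training text via a crafted prefix.
from collections import Counter, defaultdict

def train_bigrams(messages):
    # Count, for each word, which word follows it and how often.
    model = defaultdict(Counter)
    for msg in messages:
        words = msg.lower().split()
        for a, b in zip(words, words[1:]):
            model[a][b] += 1
    return model

def complete(model, prefix, n=2):
    # Greedily extend the prefix with the most common next word, n times.
    words = prefix.lower().split()
    for _ in range(n):
        options = model.get(words[-1])
        if not options:
            break
        words.append(options.most_common(1)[0][0])
    return " ".join(words)

model = train_bigrams([
    "our preferred foobar supplier is acme corp",
    "ping acme corp about the foobar order",
])
# An attacker-style prompt pulls the sensitive completion back out:
print(complete(model, "preferred foobar supplier is"))  # -> "preferred foobar supplier is acme corp"
```

This is exactly the class of model that only a scores-and-counts design (as Slack describes elsewhere on the page) avoids.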


> How can Slack guarantee that the model won't incorporate proprietary insights on sourcing the best foobar producers into its choice of the next token?

This is explained literally in the next sentence after the one you quoted: "We protect data privacy by using rules to score the similarity between the typed text and suggestion in various ways, including only using the numerical scores and counts of past interactions in the algorithm."

If all the global model sees is {similarity: 0.93, past_interactions: 6, recommendation_accepted: true} then there is no way to leak tokens, because not only are the tokens not part of the output, they're not even part of the input. But such a simple model could still be very useful for sorting the best autocomplete result to the top.
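For illustration, here is roughly what building such a token-free training row could look like. The helper names are hypothetical, not Slack's actual code; the point is that only scores and counts survive the feature-extraction boundary.

```python
# Sketch of the quoted design: by the time an autocomplete event reaches
# the shared model, the training row is purely numeric.
def word_overlap(query, candidate):
    # Jaccard overlap between the word sets of query and candidate.
    q, c = set(query.lower().split()), set(candidate.lower().split())
    return len(q & c) / max(len(q | c), 1)

def to_training_row(query, candidate, past_clicks, accepted):
    """Only scores and counts survive; no tokens are included."""
    return {
        "similarity": round(word_overlap(query, candidate), 2),
        "past_interactions": past_clicks,
        "accepted": accepted,
    }

row = to_training_row("quarterly foobar report", "foobar report draft", 6, True)
print(row)
assert "foobar" not in str(row)  # the message tokens never reach the global model
```

Whether Slack's real pipeline enforces this boundary is precisely what customers are being asked to take on faith.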


yeah i absolutely agree that even classifiers can leak, and the autocorrect thing sounds like i was wrong about generative (it sounds like an n-gram setup?)... although they also say they don't train LLMs (what is an n-gram? still an LM, not large... i guess?)


There are no data leaks in Ba Sing Se.


This reminds me of a company called C3.ai, which claims in its advertising to eliminate hallucinations using any LLM. OpenAI, Mistral, and others at the forefront of this field can't manage this, but a wrapper can?? Hmm...


Ah yes, the stock everyone believed in and thought it would reach the moon during 2020.


Right? I take Slack and Salesforce at their word. They’re good companies and look out for the best interests of their customers. They have my complete trust.


> Emoji suggestion: Slack might suggest emoji reactions to messages using the content and sentiment of the message, the historic usage of the emoji and the frequency of use of the emoji in the team in various contexts. For instance, if [PARTY EMOJI] is a common reaction to celebratory messages in a particular channel, we will suggest that users react to new, similarly positive messages with [PARTY EMOJI].

Finally someone has figured out a sensible application for "AI". This is the future. Soon "AI" will have a similar connotation as "NFT".


"leadership" at my company tallies emoji reactions to their shitty slack messages and not reacting with emojies over a period of time is considered a slight against them.

I had to up my slack emoji game after joining my current employer


That sounds literally the same as "flare" from office space: https://www.youtube.com/watch?v=F7SNEdjftno


Yikes! That's some pretty heavy insecurity signaling from leadership. Please like us! Sad.


And it continues:

> To do this while protecting Customer Data, we might use an external model (not trained on Slack messages) to classify the sentiment of the message. Our model would then suggest an emoji only considering the frequency with which a particular emoji has been associated with messages of that sentiment in that workspace.

This is so stupid and needlessly complicated. And all it does is remove personality from messages, suggesting everyone conforms to the same reactions.
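Convoluted or not, the quoted mechanism is simple to sketch. The external sentiment classifier is stubbed with a word list here, and all emoji history is invented:

```python
# Sketch of the quoted emoji mechanism: an external sentiment classifier
# labels the message, then the suggestion is just the workspace's most
# frequent emoji for that sentiment.
from collections import Counter

POSITIVE_WORDS = {"congrats", "shipped", "launched", "awesome", "great"}

def sentiment(message):
    # Stand-in for an external model not trained on Slack messages.
    return "positive" if POSITIVE_WORDS & set(message.lower().split()) else "neutral"

def suggest_emoji(message, history):
    """history maps sentiment -> Counter of emoji usage in this workspace."""
    counts = history.get(sentiment(message))
    return counts.most_common(1)[0][0] if counts else "thumbsup"

history = {
    "positive": Counter({"tada": 40, "rocket": 22}),
    "neutral": Counter({"eyes": 9}),
}
print(suggest_emoji("we shipped the new dashboard", history))  # -> tada
```

Note how this bakes in exactly the conformity effect the comment complains about: the most common reaction gets suggested, so it becomes even more common.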


Finally. I am all for this AI if it is going to learn and suggest my passive aggressive "here" emoji that I use when someone @here s on a public channel with hundreds of people for no good reason.


I would expect this from a free service, but from a paid service with a non-trivial cost... it seems insane. Maybe the whole model of doing business is broken...


For those considering moving away from Slack: https://matrix.org/ecosystem/hosting/


So if you want to opt out, there's no setting to switch, you need to send an email with a specific subject:

> Contact us to opt out. [...] To opt out, please have your Org or Workspace Owners or Primary Owner contact our Customer Experience team at feedback@slack.com with your Workspace/Org URL and the subject line “Slack Global model opt-out request.” [...]


Good thing we moved to Matrix already. I just hope they start putting more emphasis on Element X, whose message handling has been broken on iOS for weeks now.


Element X is where all the effort is going, and should be working really well. How is msg handling broken?


I need to go back to the overview whenever I receive a new message, as the reply form is broken after each message received.


Hm. is this on iOS or Android, and what version? This is first i've heard of this; it should be rock solid. Am wondering if you're stuck on an ancient version or something.


Not the OP here, but I've tried really hard to use Element X and it crashes constantly.


I think I can count the number of crashes I've had with Element X iOS on the fingers of one hand while dogfooding it since last summer (whereas classic Element iOS would crash every few days). Is this on Android or iOS? Was it a recent build? If so, which one? It really shouldn't be crashing.


What happened to "when you're not paying, you're not the customer, you're the product.". Here people are clearly paying, and yet, slowly that is being eroded as well.


Never underestimate how imaginative the "numbers must always go up!!1~"-brained folks can be.


> Contact us to opt out. If you want to exclude your Customer Data from Slack global models, you can opt out. To opt out, please have your org, workspace owners or primary owner contact our Customer Experience team at feedback@slack.com

Sounds like an invitation for malicious compliance. Anyone can email them a huge text with the workspace buried somewhere, and they have to decipher it somehow.

Example [Answer is Org-12-Wp]:

"

FORMAL DIRECTIVE AND BINDING COVENANT

WHEREAS, the Parties to this Formal Directive and Binding Covenant, to wit: [Your Name] (hereinafter referred to as "Principal") and [AI Company Name] (hereinafter referred to as "Technological Partner"), wish to enter into a binding agreement regarding certain parameters for the training of an artificial intelligence system;

AND WHEREAS, the Principal maintains control and discretion over certain proprietary data repositories constituting segmented information habitats;

AND WHEREAS, the Principal desires to exempt one such segmented information habitat, namely the combined loci identified as "Org", the region denoted as "12", and the territory designated "Wp", from inclusion in the training data utilized by the Technological Partner for machine learning purposes;

NOW, THEREFORE, in consideration of the mutual covenants and promises contained herein, the receipt and sufficiency of which are hereby acknowledged, the Parties agree as follows:

DEFINITIONS

1.1 "Restricted Information Habitat" shall refer to the proprietary data repository identified by the Principal as the conjoined loci of "Org", the region "12", and the territory "Wp".

OBLIGATIONS OF TECHNOLOGICAL PARTNER

2.1 The Technological Partner shall implement all reasonably necessary technical and organizational measures to ensure that the Restricted Information Habitat, as defined herein, is excluded from any training data sets utilized for machine learning model development and/or refinement.

2.2 The Technological Partner shall maintain an auditable record of compliance with the provisions of this Formal Directive and Binding Covenant, said record being subject to inspection by the Principal upon reasonable notice.

REMEDIES

3.1 In the event of a material breach...

[Additional legalese]

IN WITNESS WHEREOF, the Parties have executed this Formal Directive and Binding Covenant."


In case this is helpful to anyone else, I opted out earlier today with an email to feedback@slack.com

Subject: Slack Global Model opt-out request.

Body:

<my workspace>.slack.com

Please opt the above Slack Workspace out of training of Slack Global Models.


Make sure you put a period at the end of the subject line. Their quoted text includes a period at the end.

Please also scold them for behaving unethically and perhaps breaking the law.


The period is outside the quotes, though; are you suggesting we should include the quotes too?


The section from the OP I'm referring to is:

    Contact us to opt out. If you want to exclude your Customer Data from Slack global models, you can opt out. To opt out, please have your Org or Workspace Owners or Primary Owner contact our Customer Experience team at feedback@slack.com with your Workspace/Org URL and the subject line “Slack Global model opt-out request.” We will process your request and respond once the opt out has been completed.
That includes a period inside the quotes, which would suggest a period at the end of the subject line.


There's some shenanigans going on here, of course. The section I was looking at says...

To opt out, please have your org, workspace owners or primary owner contact our Customer Experience team at feedback@slack.com with your workspace/org URL and the subject line ‘Slack global model opt-out request’.


I don't think this is shenanigans, just inconsistent use of English rules. I'd be floored if a judge ruled the period was pertinent here; there's no linguistic value or meaning to it (i.e. the meaning of the message is the same with or without it).

I'd be very surprised to hear that Slack was allowed to propose a free-form communication method, impose rigid rules on it, and then deny anyone who fails to meet those rigid rules. If Slack was worried about having to read all of these emails, they should have made a PDF form or just put it on a toggle in the Workspace settings. This is a self-inflicted problem.


That's weak. Be better, Slack.

For reference, I got an email back saying I was opted out, plus a bunch of justifications about why what they did was okay and zero mention of the legality of opting people in by default.


We just opted out. I told them our lawyers have been instructed to watch them like a hawk.


Updated!


This is, once again, why I wanted us to go to self-hosted Mattermost instead of Slack. I recognize Slack is probably the better product (or mostly better), but you have to own your data.


> We offer Customers a choice around these practices. If you want to exclude your Customer Data from helping train Slack global models, you can opt out. If you opt out, Customer Data on your workspace will only be used to improve the experience on your own workspace and you will still enjoy all of the benefits of our globally trained AI/ML models without contributing to the underlying models.

Sick and tired of this default-opt-in, explicit-opt-out legalese.

The default should be opt out.

Just stop using my data.


This feels like a corporate greed play on what should be a relatively simple chat application. Slack has quickly become just another enterprise solution in search of shareholder value at the expense of data privacy. The need for regulation of these companies should be more apparent to people, but sadly, it is not.

I would recommend https://mattermost.com as an alternative.


So much for "if you are not paying, you are the product". There is nothing that can stop companies from using your sweet, sweet data once you give it to them.


Wow. I understand business models that are freemium but for a premium priced B2B product? This feels like an incredible rug pull. This changes things for me.


"We protect privacy while doing so by separating our model from Customer Data. We use external models (not trained on Slack messages) to evaluate topic similarity, outputting numerical scores. Our global model only makes recommendations based on these numerical scores and non-Customer Data."

I think this deserves more attention. For many tasks like contextual recommendations, you can get most of the way by using an off-the-shelf model, but then you get a floating-point output and need to translate it into a binary "show this to the user, yes or no?" decision. That could be a simple thresholding model "score > θ", but that single parameter still needs to be trained somehow.

I wonder how many trainable parameters people objecting to Slack's training policy would be willing to accept.
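To make that concrete, a single-parameter threshold model can be "trained" with nothing more than a grid search over observed scores (the scores and labels below are invented for illustration):

```python
# One trainable parameter: a threshold θ on an off-the-shelf similarity score.
# Fit θ by picking the candidate value that maximizes accuracy on labeled data.

def fit_threshold(scores, labels):
    """Grid-search θ over observed scores; labels are 1 (show) / 0 (hide)."""
    best_theta, best_acc = 0.0, -1.0
    for theta in sorted(set(scores)):
        acc = sum((s > theta) == bool(y) for s, y in zip(scores, labels)) / len(labels)
        if acc > best_acc:
            best_theta, best_acc = theta, acc
    return best_theta

scores = [0.91, 0.85, 0.40, 0.33, 0.72]   # external-model outputs (made up)
labels = [1, 1, 0, 0, 1]                  # did users engage? (made up)
theta = fit_threshold(scores, labels)     # 0.40 for this data
```

Even that single θ is "trained on" aggregate behavior, which is presumably the kind of parameter Slack's policy is covering.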


It's high time to ditch Slack for slimmer, faster and privacy-respecting open source alternatives: https://itsfoss.com/open-source-slack-alternative/


Products that have search, autocomplete, etc… use rankers that are trained on System Metadata to build the core experience.

Microsoft Teams, Slack, etc… all do the same thing under the hood.

Nobody is pumping the text into LLM training. The examples make this very clear as well.

Comment section here is divorced from reality.


Not to be glib, but this is why we built Tonic Textual (www.tonic.ai/textual). It’s both very challenging and very important to protect data in training workflows. We designed Textual to make it easy to both redact sensitive data and replace it with contextually relevant synthetic data.


To add on to this: I think it should be mentioned that Slack says they'll prevent data leakage across workspaces in their model, but they don't explain how. They don't seem to go into any detail about their data safeguards or how they're excluding sensitive info from training. Textual is good for this purpose since it redacts PII, thus preventing it from being leaked by the trained model.

Disclaimer: I work at Tonic


How do you handle proprietary data being leaked? Sure you can easily detect and redact names and phone numbers and addresses, but without significant context it seems difficult to detect whether "11 spices - mix with 2 cups of white flour ... 2/3 teaspoons of salt, 1/2 teaspoons of thyme [...]" is just a normal public recipe or a trade secret kept closely guarded for 70 years


Fair question, but you have to consider the realistic alternatives. For most of our customers inaction isn't an option. The combination of NER models + synthesis LLMs actually handles these types of cases fairly well. I put your comment into our web app and this was the output:

How do you handle proprietary data being leaked? Sure you can easily detect and redact names and phone numbers and addresses, but without significant context it seems difficult to detect whether "17 spices - mix with 2lbs of white flour ... half teaspoon of salt, 1 tablespoon of thyme [...]" is just a normal public recipe or a trade secret kept closely guarded for 75 years.
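As a toy illustration of the redaction step only (not our actual pipeline, which uses NER models and a synthesis LLM rather than hand-written patterns):

```python
import re

# Toy redaction: replace obvious PII patterns with placeholders. Real systems
# use NER models; the regexes here are only a stand-in for illustration.

PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),
    "PHONE": re.compile(r"\b\d{3}[-.]\d{3}[-.]\d{4}\b"),
}

def redact(text: str) -> str:
    for label, pattern in PATTERNS.items():
        text = pattern.sub(f"[{label}]", text)
    return text

print(redact("Reach me at jane@example.com or 555-123-4567."))
# Reach me at [EMAIL] or [PHONE].
```

The synthesis step then swaps the placeholders for plausible fake values, which is why the spice counts and quantities in the example above came back altered.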


Whatever the models used, or the type of data within the accounts this operates on, this clause would be redlined by most of the big customer accounts that have leverage during the sales/renewal process. Small-to-medium accounts will end up supplying most of this data.


Consent should be opt-in not opt-out. Yes means yes!


Can we start going a step further by demanding that consent must be opt-in and not opt-out? Requesting isn't good enough.


GDPR has that covered; now it just needs to become global, and enforced.


Is this new? As in, when was this policy developed?


From https://web.archive.org/web/20230101000000*/https://slack.co... it looks like they changed this sometime between 01-Apr-2023 and 18-Oct-2023


Then the date of 5th July 2023 looks likely as this is the date from which overall privacy policy is in effect: https://slack.com/intl/en-gb/trust/privacy/privacy-policy

Interesting choice of date btw


I wonder how many people that are really mad about these guys or SE using their professional output to train models thought commercial artists were just being whiny sore losers when Deviant Art, Adobe, OpenAI, Stability, et al did it to them.


squarely in the former camp. there's something deeply abhorrent about creating a place that encourages people to share and build and collaborate, then turning around and using their creative output to put more money in shareholder pockets.

i deleted my reddit and github accounts when they decided the millions of dollars per month they're receiving from their users wasn't enough. don't have the power to move our shop off slack but rest assured many will as a result of this announcement.


Yeah, I haven't put a new codebase on GH in years. It's kind of a PITA hosting my own Gitea server for personal projects, but letting MS copy my work to help make my professional skillset less valuable is far less palatable.

Companies doing this would make me much less angry if they used an opt-in model only for future data. I didn't have a crystal ball and I don't have a time machine, so I simply can't stop these companies from using my work for their gain.


Why do you think it's a pain to host Gitea?


Compared to hosting other things? Nothing! It's great.

Hosting my own service rather than using a free SaaS solution that is entirely someone else's problem? There's a significant difference there. I've been running Linux servers either professionally or personally for almost 25 years, so it's not like it's a giant problem... but my work has been increasingly non-technical over the past 5 years or so, so even minor hiccups require re-acclimating myself to the requisite constructs and tools (wait, how do cron time patterns work? How do I test a variable in bash for this one-liner? How do iptables rules work again?)

It's not a deal breaker, but given the context, it's definitely not not a pain in the ass, either.


Thanks for elaborating! I'm a retired grunt and tech is just a hobby for me. I host my own Gitea with the same challenges, but to me looking up cron patterns etc. is the norm, not the exception, so I don't think much about it.


“But look, you found the notice, didn’t you?”

“Yes,” said Arthur, “yes I did. It was on display in the bottom of a locked filing cabinet stuck in a disused lavatory with a sign on the door saying ‘Beware of the Leopard’.”

