> For any model that will be used broadly across all of our customers, we do not build or train these models in such a way that they could learn, memorise, or be able to reproduce some part of Customer Data
This feels so full of subtle qualifiers and weasel words that it generates far more distrust than trust.
It only refers to models used "broadly across all" customers - so if it's (a) not used "broadly" or (b) only used for some subset of customers, the whole statement doesn't apply. Which actually sounds really bad because the logical implication is that data CAN leak outside those circumstances.
They need to reword this. Whoever wrote it is a liability.
Yes, lawyers do tend to have a part to play in writing things that represent a legally binding commitment made by an organisation. Developers really can’t throw stones from their glass houses here. How many of you have a pre-canned spiel explaining why the complexities of whichever codebase you spend your days on are ACTUALLY necessary, and are certainly NOT the result of over-engineering? Thought so.
> How many of you have a pre-canned spiel explaining why the complexities of whichever codebase you spend your days on are ACTUALLY necessary, and are certainly NOT the result of over-engineering? Thought so.
Hm, now you mention it, I don't think I've ever seen this specific example.
Not that we don't have jargon that's bordering on cant, leading to our words being easily mis-comprehended by outsiders: https://i.imgur.com/SL88Z6g.jpeg
Canned cliches are also the only thing I get whenever I try to find out why anyone likes the VIPER design pattern — and that's despite being totally convinced that (one of) the people I was talking to had genuinely and sincerely considered my confusion and had actually experimented with a different approach to see if my point was valid.
Sure lawyers wrote it but I'd bet a lot there's a product or business person standing behind the lawyer saying - "we want to do this but don't be so obvious about it because we don't want to scare users away". And so lawyers would love to be very upfront about what is happening because that's the best way to avoid liability. However, that conflicts with what the business wants, and because the lawyer will still refuse to write anything that's patently inaccurate, you end up with a weasel word salad that is ambiguous and unhelpful.
> If you want to exclude your Customer Data from helping train Slack global models, you can opt out.
So Customer Data is not used to train models "used broadly across all of our customers [in such a way that ...]", but... it is used to help train global models. Uh.
To me it says that they _do_ train global models with customer data, but they are trying to ensure no data leakage (which will be hard, but maybe not impossible, if they are training with it).
The caveats are for “local” models, where you would want the model to be able to answer questions about discussions in the workspace.
It makes me wonder how they handle “private” chats, can they leak across a workspace?
Presumably they are trying to train a generic language model which has very low recall for facts in the training data, then using RAG across the chats that the logged on user can see to provide local content.
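If that's the architecture, a rough sketch of the shape might look like the following. To be clear, this is purely my own illustration, not anything Slack has described: the `index`/`llm` objects and methods like `channels_for_user` and `search` are invented names. The point is just that the generic model never trains on workspace text, and retrieval is scoped to what the requesting user can already see.

```python
# Purely illustrative sketch of "generic LLM + RAG over messages the user can see".
# None of these names are real Slack APIs.

from dataclasses import dataclass

@dataclass
class Message:
    channel_id: str
    text: str

def retrieve_visible_messages(user_id: str, query: str, index) -> list[Message]:
    """Search only the channels/DMs this user is a member of."""
    visible = index.channels_for_user(user_id)         # permission check comes first
    return index.search(query, channels=visible)[:10]  # keyword or embedding search

def answer(user_id: str, question: str, index, llm) -> str:
    context = retrieve_visible_messages(user_id, question, index)
    prompt = "Answer using only the context below.\n\n"
    prompt += "\n".join(f"- {m.text}" for m in context)
    prompt += f"\n\nQuestion: {question}"
    # The LLM stays generic (never fine-tuned on workspace data); any leakage
    # risk is confined to the retrieval step and its permission check.
    return llm.complete(prompt)
```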
My intuition is that it's impossible to guarantee there are no leaks in the LLM as it stands today. It would surely require some new computer science to ensure that no part of any output that could ever possibly be generated contains sensitive data from any of the input.
It's one thing if the input is the published internet (even if covered by copyright), it's entirely another to be using private training data from corporate water coolers, where bots and other services routinely send updates and query sensitive internal services.
You and many others here are conflating AI and LLMs. They are not the same thing. LLMs are a type of AI model that produces content, and if trained on customer data they would gladly replicate it. However, the TOS explicitly say that Generative AI models (=LLMs) are taken off the shelf and not retrained on customer data.
Before LLMs exploded a year and a half ago lots of other AI models had been in place for several years in a lot of systems, handling categorization, search results ranking, etc. As they do not generate text or other content, they cannot leak data. The linked FAQ provides several examples of features based on such models, which are not LLMs: for example, they use customer data to determine a good emoji to suggest for reacting to a message based on the sentiment of the message. An emoji suggestion clearly has no potential of leaking customer data.
There is a way. Build a preference model from the sensitive dataset. Then use the preference model with RLAIF (like RLHF but with AI instead of humans) to fine-tune the LLM. This way only judgements about the LLM outputs will pass from the sensitive dataset. Copy the sense of what is good, not the data.
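For what it's worth, here is a very rough sketch of that idea, with every name invented for illustration (this is not a real pipeline, and a production version would use PPO or DPO rather than plain REINFORCE): the reward model is fit on preference judgements derived from the sensitive corpus, and the only thing that ever reaches the policy's training loss is its scalar score.

```python
# Illustrative only: the reward model sees the sensitive data; the policy LLM
# only ever sees scalar judgements of its own outputs on non-sensitive prompts.

import torch

class RewardModel(torch.nn.Module):
    """Fit on (better, worse) output pairs judged using the sensitive corpus."""
    def __init__(self, encoder, hidden_size: int):
        super().__init__()
        self.encoder = encoder                      # any frozen text encoder (hypothetical interface)
        self.head = torch.nn.Linear(hidden_size, 1)

    def forward(self, texts: list[str]) -> torch.Tensor:
        return self.head(self.encoder.encode(texts)).squeeze(-1)  # one scalar per text

def rlaif_step(policy_llm, reward_model, prompts, optimizer):
    """One REINFORCE-style update. `prompts` are generic, non-sensitive prompts."""
    samples = [policy_llm.generate(p) for p in prompts]
    with torch.no_grad():
        rewards = reward_model([p + s for p, s in zip(prompts, samples)])
    advantages = rewards - rewards.mean()           # simple baseline
    log_probs = policy_llm.log_prob(prompts, samples)
    loss = -(log_probs * advantages).mean()         # push up well-judged samples
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    # Only `rewards` (scalars) crossed over from the sensitive side.
```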
If you switch to Teams only for this reason I have some bad news for you - there’s no way Microsoft is not (or will not start in future) doing the same. And you’ll get a subpar experience with that (which is an understatement).
The Universal License Terms of Microsoft (applicable to Teams as well) clearly say they don't use customer data (Input) for training: https://www.microsoft.com/licensing/terms/product/ForallOnli...
Whether someone believes it or not, is another question, but at least they tell you what you want to hear.
I would guess Microsoft has a lot more government customers (and large customers in general) than Slack does. So I would think they have a lot more to lose if they went this route.
Unless your company makes a special arrangement with them, Microsoft will steal all of your data. It's at least in their TOS for Outlook, and for some reason I doubt they wouldn't do the same for Teams.
Just don't use the "Team" feature of it to chat. Use chat groups and 1-to-1 of course. We use "Team" channels only for bots: CI results, alerts, things like that.
Meetings are also chat groups. We use the daily meeting as the dev-team chat itself so it's all there. Use Loops to track important tasks during the day.
I'm curious what's missing/broken in Teams that you would rather not have chat at all?
The idea that Slack makes companies work better needs some proof behind it; I’d say the amount of extra distraction is a net negative… but as with a lot of things in software and startups, nobody researches anything and everyone writes long essays about how they feel things are.
Distraction is not enforced. Learning to control your attention, and how to help yourself do it, is crucial whatever you do, whenever you do it, and in whatever technological context. It is the most valuable long-term resource you have.
I think we are starting to recognize this at a larger scale.
Slack easily saves a ton of time solving complex problems that require the interaction and expertise of a lot of people, often an unpredictable number of them for each problem. They can answer with a delay; in a good culture this is totally accepted, and people can still independently move forward or switch tasks if necessary, same as with slower communication tools. You are not forced to answer within any particular lag, but Slack makes it possible to reduce that lag to zero when needed.
Sometimes you are unsure whether you need help or can do something on your own. I certainly know that a lot of the time I eventually had no chance whatsoever, because the knowledge required was too specialized, and this is not always clear up front. Reducing barriers to communication in those cases is crucial, and I don't see Slack being in the way here, only helpful.
The goal of organizing Slack is to pay the right amount of attention to the right parts of communication for you. You can do this if you really spend (hmm) attention trying to figure out what that is and how to tune your tools to achieve it.
That’s a lot of words with no proof, isn't it? It's just your theory. Until I see a well-designed study on such things, I struggle to believe the conjecture you make either way. It could quite possibly be that you benefit from Slack and I don't.
Even receiving a message and not responding can be disruptive, and on top of that I’d say being offline or ignoring messages is impossible in most companies.
It is your choice whether to trust only statements backed by scientific rigour or to try things out and apply them to your way of life. This is just me talking to you; in that, you are correct.
Regarding “receiving a message”: my devices are allowed only limited use of notifications. Of all the messaging/social apps, only messages from my wife in our messaging app of choice pop up as notifications. Slack certainly is not allowed there.
Good point, could be that it reduces friction too far in some instances. However, in general less communication doesn't seem better for the bottom line.
I'm not sure chat apps improve business communications. They are ephemeral, with differing expectations on different teams. Hardly what I'd label as "cohesive"
Async communications are critical to business success, to be sure -- I'm just not convinced that chat apps are the right tool.
From what I’ve seen (not much, actually), most channels could be replaced by a forum-style discussion board. Chat can be great for 1:1 and small-team interactions. And for tool interactions.
They differentiate quite clearly between generative AI models, which are not trained on customer data because they could leak it, and other types of AI models (e.g. recommendation systems) that do not work by reproducing content.
The examples in the linked page are quite informative of the use cases that do use customer data.
Nah. Whoever decided to create the reality their counsel is dancing around with this disclaimer is the actual problem, though it's mostly a problem for us, rather than them.
I'm imagining a corporate Slack where information discussed in channels or private chats, information that exists nowhere else on the internet, gets rolled into a model.
Then, someone asks a very specific question.. conversationally.. about such a very specific scenario..
Seems plausible confidential data would get out, even if it wasn't attributed to the client.
Not that it’s possible to ask an LLM how a specific or random company in an industry might design something…
Gandalf is a great tool for bringing awareness to AI hacking, but Lakera also trains on the prompts you provide when you play the game. See the bottom of that page.
"Disclaimer: we may use the fully anonymized input to Gandalf to improve Gandalf and for Lakera AI's products and services. "
Sometimes the obvious questions are met with a lot of silence.
I don't think I can be the only one who has had a conversation with GPT about something obscure they might know but there isn't much about online, and it either can't find anything... or finds it, and more.
I think it's as clear as it can be; they go into much more detail and provide examples in their bullet points. Here are some highlights:
> Our model learns from previous suggestions and whether or not a user joins the channel we recommend. We protect privacy while doing so by separating our model from Customer Data. We use external models (not trained on Slack messages) to evaluate topic similarity, outputting numerical scores. Our global model only makes recommendations based on these numerical scores and non-Customer Data.

> We do this based on historical search results and previous engagements without learning from the underlying text of the search query, result, or proxy. Simply put, our model can't reconstruct the search query or result. Instead, it learns from team-specific, contextual information like the number of times a message has been clicked in a search or an overlap in the number of words in the query and recommended message.

> These suggestions are local and sourced from common public message phrases in the user’s workspace. Our algorithm that picks from potential suggestions is trained globally on previously suggested and accepted completions. We protect data privacy by using rules to score the similarity between the typed text and suggestion in various ways, including only using the numerical scores and counts of past interactions in the algorithm.

> To do this while protecting Customer Data, we might use an external model (not trained on Slack messages) to classify the sentiment of the message. Our model would then suggest an emoji only considering the frequency with which a particular emoji has been associated with messages of that sentiment in that workspace.
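To make the pattern in that last example concrete, here is how I picture it working. This is entirely my own reconstruction from the FAQ wording, with invented names (`EmojiSuggester`, `sentiment_model.classify`), not Slack's code: an off-the-shelf sentiment classifier labels the message, and the only thing learned from the workspace is a table of (sentiment label, emoji) counts, so no message text is ever stored in the learned component.

```python
# My reconstruction of the emoji-suggestion pattern described above (not Slack's code):
# the external model sees text but is never trained on it; the only thing learned
# from the workspace is a text-free table of (sentiment label -> emoji) counts.

from collections import Counter, defaultdict

class EmojiSuggester:
    def __init__(self, sentiment_model):
        self.sentiment_model = sentiment_model      # external, pre-trained, frozen
        self.counts = defaultdict(Counter)          # workspace-local, contains no message text

    def record_reaction(self, message_text: str, emoji: str) -> None:
        label = self.sentiment_model.classify(message_text)   # e.g. "positive"
        self.counts[label][emoji] += 1              # only the label and the emoji are kept

    def suggest(self, message_text: str) -> str | None:
        label = self.sentiment_model.classify(message_text)
        best = self.counts[label].most_common(1)
        return best[0][0] if best else None
```

The same scores-and-counts-only shape seems to be what the search-ranking and autocomplete bullets describe as well.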
Whatever lawyer wrote that should be fired. This poorly written nonsense makes it look like Slack is trying to look shady and subversive. Even if well intended this is a PR blunder.
> They need to reword this. Whoever wrote it is a liability.
Wow you're so right. This multi-billion dollar company should be so thankful for your comment. I can't believe they did not consult their in-house lawyers before publishing this post! Can you believe those idiots? Luckily you are here to save the day with your superior knowledge and wisdom.