I'm not suggesting the concerns aren't valid, but I guess I don't understand why this same principle isn't applied to other internet-connected / cloud software. Do these companies worry that web browsers like Chrome could leak data or applications like Google Docs?
What is it about AI chat bots that makes the risk of a data leak so much higher? Is it something about OpenAI's ToS? Or their relative infancy?
That’s because OpenAI can use any data that you send to ChatGPT for training purposes. [0] They don’t do it with their APIs, btw.
“(c) Use of Content to Improve Services. We do not use Content that you provide to or receive from our API (“API Content”) to develop or improve our Services. We may use Content from Services other than our API (“Non-API Content”) to help develop and improve our Services. You can read more here about how Non-API Content may be used to improve model performance.”
Honestly, from an executive's point of view, why should Samsung trust OpenAI? It exists in a community which is famous for skirting rules; the whole move-fast-and-break-things mentality isn't just at Facebook. Their IP is incredibly valuable and they also have access to a bunch of customer IP. OpenAI needs to establish trust, and that takes time. Microsoft spent years working with these companies to get them to adopt Azure and the like.
"We do not use Content that you provide to or receive from our API ("API Content") to develop or improve our Services."
That sentence does not claim OpenAI does not use such Content for other purposes besides "developing and improving the Services". For example, using the Content in a manner that potentially harms Samsung's business.
What does "develop or improve our Services" even mean? There is no definition.
Third, how would anyone outside of OpenAI know how OpenAI uses the Content? For example, what if OpenAI was using Content from Services other than the API for purposes other than "to develop and improve our Services" (assuming anyone could prove what that even means). How would anyone outside of OpenAI discover this was happening?
If we search these "ToS" for phrases like "You will" or "You will not", we see that users make promises to OpenAI. However, if we search for phrases like "We will" or "We will not", we see that there are no instances where OpenAI promises anything. IMHO, these "ToS" are better characterised as "ToU". Not to mention being found at "/policies/".
As a ChatGPT user, OpenAI does not owe you anything, unless perhaps you are Microsoft. For you, the "terms" can change at any time, for any reason, without any prior notice.
Let's imagine some far-fetched scenario where someone inside the company leaks information that suggests OpenAI is using Content from Services other than the API for purposes other than "improving or developing the Services". Then what?
OpenAI has not promised to refrain from using Content for certain purposes. There is no breach of these ToS if OpenAI uses the Content for whatever purposes it desires.
Maybe Samsung could claim something like (a) OpenAI misrepresented facts in their ToS, (b) that induced Samsung into using OpenAI, and (c) as a result Samsung suffered harm. Needless to say, claims like that are difficult to prove and any recovery is limited. Whatever creative legal claims Samsung could come up with, none of them would fix damage already done to Samsung from its employees having used OpenAI.
>These Terms of Use apply when you use the services of OpenAI, L.L.C. or our affiliates, including our application programming interface, software, tools, developer services, data, documentation, and websites (“Services”).
Though, they don't define "developing and improving".
Based on the way things have gone with them, it kind of leaves one with the impression that they are building the Dolores Umbridge of AI's and if that becomes AGI first, it will be complete hell to live on this planet, or anywhere the AI can find you really.
There is an option in ChatGPT that you can use to turn that off.
> Chat History & Training Save new chats to your history and allow them to be used to improve ChatGPT via model training. Unsaved chats will be deleted from our systems within 30 days.
The default is your chats are their property to train on and use as they wish.
If you dig into the Settings you can disable sharing chats with OpenAI, but you lose access to just about every feature, including saving chats. You can only have one chat at a time, and if the window is closed or refreshed your chat is wiped. It's kind of like if opening a "Private Browsing" window prevented you from having a regular browsing window open and also had no tabs.
For some reason, they still retain your chat for "up to 30 days" despite not letting you save or access it after the page is refreshed.
The 30 days part is most likely a legal compliance bit to cover their asses if their backing data systems ever take a big dump.
They likely use an asynchronous system to track when your chat becomes "finished" and then queue it up for a system that propagates deletes. They have to choose some kind of SLA for that and probably went with a common data-privacy user-data deletion window of 30 days.
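Purely as an illustration of that pattern (this is a guess at the shape of such a system, not anything we know about OpenAI's internals; all names here are made up), a minimal Python sketch of a delete-propagation worker with a 30-day retention window might look like:

    import queue
    import threading
    import time

    RETENTION_SECONDS = 30 * 24 * 3600  # the 30-day window from the policy

    # In-memory stand-in for a durable queue of (chat_id, deletion_due_timestamp).
    delete_queue: "queue.Queue[tuple[str, float]]" = queue.Queue()

    def mark_chat_finished(chat_id: str) -> None:
        """Called when an unsaved chat is abandoned; schedules it for deletion."""
        delete_queue.put((chat_id, time.time() + RETENTION_SECONDS))

    def propagate_delete(chat_id: str) -> None:
        # Hypothetical: remove the chat from the primary store, replicas and backups.
        print(f"deleting {chat_id} from all backing data systems")

    def deletion_worker() -> None:
        """Background worker that deletes chats once the retention window has passed."""
        while True:
            chat_id, due = delete_queue.get()
            wait = due - time.time()
            if wait > 0:
                time.sleep(wait)
            propagate_delete(chat_id)

    threading.Thread(target=deletion_worker, daemon=True).start()

The "SLA" is then just the retention constant plus however long the queue and downstream stores take to catch up.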
But hasn’t Google literally admitted to using user data for training purposes? I remember this being a big deal with Gmail and personalized ads a while back. Personally, it sounds to me like another case of “We think new technology is scary, therefore it’s banned”.
I have seen companies have rules about (or against) cloud computing in general. I remember when decent web-based translation services first came out, and the guidance from BigCorp was to not use them for anything work-related.
From what I personally have seen, this sort of guidance remains. When companies do use things like Google Docs or Microsoft Office365, they likely have some specific contract in place with Google / Microsoft / etc., that the company's legal team has decided they are happy with.
I anticipate that the same will eventually be true of ChatGPT and such, that there will be some paid corporate offering with contract terms that make the company lawyers happy.
Most of my career has been with larger companies, often with high data sensitivity; I can easily imagine that some smaller and/or less data-sensitive companies might not care about any of this.
I work for a company who for a very long time was strongly opposed to employees using any cloud-based infrastructure, including OS or programming language package managers (e.g. apt-get, pip for Python), opting to host their own instance if possible and disallowing usage if not. IT did finally cave and switch to Office365, which has been slowly opening the floodgates to other services being allowed.
This gets into some really interesting corporate governance issues.
The cloud is a terrible bet for many large companies. The benefits are minimal while the risks are huge, however what’s in the best interest for the company is only tangentially related to what happens.
It’s really difficult to ensure companies actually take low probability risks seriously. A 1% chance to lose 10 billion dollars is an easy bet for upper management to make when their personal risks and rewards don’t line up with the company’s risks and rewards.
As somebody who is completely puzzled as to why you would think the cloud is a terrible bet for many companies: are these companies allowing their employees to use Google Search or other parts of the internet, or do they have to look things up in paper books in the corporate library?
Let’s not pretend Google search is what people mean when they say “cloud.” It’s about running internal processes on external systems.
As to the risks, many companies live and die by their internal secrets. These range from private keys, customer lists, trading strategies, and similar trade secrets to actual serious R&D efforts.
Sometimes the damage is obvious, such as crypto exchanges suddenly finding themselves broke, but corporate espionage can also be kept quiet. Losing major government bids because a competitor read some internal memos is a serious risk, and you may never know.
It’s much harder to reconstruct actionable intelligence from a huge stream of people using Google search, even if they’re using it to perform sensitive calculations.
While theoretically possible for something bad to happen, short phrases are much less likely to have actionable/sensitive info in them than the full document corpus of a company.
This is already in the works. Microsoft offers an Azure GPT-4 service for business. They already offer a business version of Copilot as well. I have not gone over the details, but I imagine the usual business and support agreements will be in place for the business tiers.
> Do these companies worry that web browsers like Chrome could leak data or applications like Google Docs?
Yes they do. Where I work the whole google office suite is blocked from inside the network (you have to use MS Office). ChatGPT is blocked. Most web apps that you can copy text or data into are either blocked, or we have an agreement with the provider, or (for open source) we have an internal on-prem fork.
From TechCrunch on March 1, 2023: "Starting today, OpenAI says that it won’t use any data submitted through its API for “service improvements,” including AI model training, unless a customer or organization opts in."
So prior to that, they were willing to use your data for model training. Every service may have leaks/security issues, but few say they'll purposely use your data. OpenAI probably should've promised not to use your data from the beginning; it'll be a hard perception to change now.
Using their API is different from using their web-based interface, where you specifically have to opt out of allowing them to use your chats for training.
It is. Firstly, there are legal agreements when you’re an enterprise user of such solutions; then there are various tools like DLP solutions that integrate with cloud/SaaS services such as Google Docs or Office 365; and lastly there are CASB solutions that allow you to control how corporate users use those services in the first place.
E.g. you’ll probably be able to use the corporate account to sign into the corporate Google Docs or O365 instance, but if you try to sign into your own, it would be blocked and likely also reported, so you might get a call from SecOps down the line.
OpenAI currently offers none of this, and more importantly, it openly uses the data that users submit to it, as well as the responses, for additional training and any other purpose they might come up with.
As for browsers, these are also often configured not to send data outside of the company (and yes, that's possible). Windows 11 web search and other features would also likely be disabled on your corporate device.
This probably isn't at the top of the list of serious concerns, but one problem that's kind of unique to AI is that in general, AI-generated content isn't eligible for copyright protection. Companies might worry about losing copyright on certain things if someone finds out that a lazy employee didn't actually create the content themselves.
Companies generally tend to be wary of cloud services due to data leak concerns. At the very least, they like to be in control of the decision about which services are approved and which are not.
It's about leaking private/proprietary company information. It's not about features.
How will that private/proprietary information be used by OpenAI? Does it include NDA information from another company that they don't have the right to share? How secure is the information stored (think industrial espionage)? There is a lot that needs to be taken into account that even goes beyond this.
With Google Docs, MS Office, Atlassian etc you get a real software product with engineers paid ~$300k per year to fix bugs.
With ChatGPT, you get researchers paid over $1m per year [1] to use you as a training data source and ship stuff with basic bugs and then "feel sorry" when stuff breaks:
https://www.theregister.com/2023/03/23/openai_ceo_leak/
Another position for those like Samsung: preventing ChatGPT use encourages incubation of internal competing solutions.
It’s shadow IT. You are not supposed to leak confidential company data to any other company unless you have the appropriate vendor agreements in place. It’s like loading the source code to Dropbox when you are supposed to use bitbucket.
Don't know about Chrome, because it's an application and not itself harmful, but yes, they think Google Docs will steal their data. Or any other cloud service. They are all banned for employees. I'm surprised that anyone is surprised by this.
I don't think it is at all clear that OpenAI won't use data that is put into it for unclear purposes, and I don't think they have a corporate account feature to guarantee prompt privacy.
We've seen demonstrations of models being tricked into sharing details about their setup or training data. If they are to be trained on what is shared with them then that data could be procured by an attacker.
I would have concerns about Google using my data but I wouldn't be concerned that the data I enter could easily appear in someone else's spreadsheet.
My company explicitly blocks the use of non-corporate controlled cloud products for obvious reasons. All it takes is one person to post an Excel document incorrectly to cause a major incident.
Companies banning the use of ChatGPT level tools going forward will find the rules either flouted, subverted or the employees going elsewhere.
Of course there is a duty on employees to be professional, but the latter group will be the ones taking up opportunities at non-legacy corporations, leaving behind the dinosaur corporations that think they can command the waves.
The answer is to sort your processes, security and training out - new AI is here to stay, and managers cannot stop employees using game-changing tools without looking very foolish and incompetent.
Why? It’s a valid concern in my opinion. You’re feeding OpenAI your intellectual property and just hoping they don’t do anything with it. I have the same concerns with Microsoft’s TypeScript playground
It is a valid concern if you send an entire list of confidential data and ask it to transform that list. However if you ask ChatGPT some questions about coding in general it's no different than searching online.
This comes down to whether you trust your whole organization to be educated about these issues and to be good at judging what's OK.
At the size of Samsung that's just an impossible move, and it's easier to blanket ban a problematic service and have employee request exceptions justifying their use case.
BTW I've been in companies that blanket ban posting stuff online, and got posts security-reviewed when asking for help on vendor community forums. That's totally a thing.
Yes, 20 lines of transforming jsons from one form to another are exactly what OpenAI employees are looking for in all the data they're gathering. How will my company survive after they get their hands on this?
Good opsec/appsec requires doing things that seem unnecessary. And it depends on the context. Passing a private key or raw text customer password to any type of online tool is never a good idea.
It’s not clear whether ChatGPT and the likes would increase productivity at the organization level. And I am talking about the current GPT-4, not some hypothetical AGI. From what I have seen, a large swath of usage is basically just people DDoSing their teams with a lot of words. Things like someone in a marketing team prompting for a “detailed 10-week plan with actual numbers” that naturally has no basis in reality, but will take a lot of effort from their team to decipher the bullshit. Likewise, there are the generated hundreds of lines of code, with tests, that are subtly wrong.
Basically the challenge is fairly straightforward: if one side is machine-generated and the other side is human-validated, the human loses 100% of the time. Either the machine has to be 100% accurate or very close to it, or the human needs tools to help them. As it stands, neither of those conditions is met yet.
Today I was asked by a coworker, "why don't you use ChatGPT?" He had just asked me a question that ChatGPT had failed to answer to his satisfaction. I was able to find a complete and strongly justified answer within 30 seconds. But I suspect he'll continue using the tool. I'll continue thinking for myself.
All these arguments about how it can be wrong are literally identical to the arguments I heard from ignorant people between 2001-2008 on why search engines were unsafe because Google could return wrong answers or steal your data.
IMO the only people crapping on AI either don’t know how to prompt it correctly, aren’t very creative, or frankly weren’t doing that much work in the first place and feel threatened by it. I understand the need to protect intellectual property but 10 years from now there will be two types of companies: Those that embrace AI, and those that crashed and burned.
There have always been two types of companies: those that are run well and have processes to evaluate tools as they fit the business's values and needs, and those that are full of grifters chasing whatever hype is trending at any given moment. Unfortunately, the latter type of company is far from crashing and burning (or at least not quickly enough).
You are neither arguing in good faith, nor even engaging in my argument at all. Please try not to do that. Your two paragraphs amount to nothing more than "if you are criticizing GPT-like tool, you are an idiot", that's not even an argument.
Edit: my comment might seem out of place now, but the comment I was replying to originally had this paragraph:
> I write a lot of code with it and it’s extraordinarily, obviously clear that when prompted right, it increases productivity 2-5x. My company recently banned it and it has been excruciating. Like going back to coding before StackOverflow and Google existed.
Give it the context it needs. If you have internal APIs, give it a .proto file or whatever you use to define an API. Phrase the question in a way that forces it to explain its thought process. Use chain-of-thought reasoning. Find documentation online, give it the documentation, then give it your code, then ask it to do something. Frequently spawn new sessions: if ChatGPT is repeatedly giving bad answers, start a new session, modify the context, and ask in a different way. Basically you have to brute-force it to get the answers you want on harder problems.
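As a sketch of the same idea expressed through the API rather than the web UI (using the 2023-era openai Python package; the .proto snippet, docs excerpt and task below are invented placeholders, not anyone's real code):

    import openai

    openai.api_key = "sk-..."  # your key

    # Hypothetical context: an internal API definition, a docs excerpt, and the code to change.
    proto_definition = "service Billing { rpc Charge(ChargeRequest) returns (ChargeReply); }"
    docs_excerpt = "Charge() is idempotent if you pass a client-generated request_id."
    my_code = "def charge(customer, amount): ..."

    messages = [
        {"role": "system",
         "content": "You are a senior engineer. Explain your reasoning step by step before writing code."},
        {"role": "user",
         "content": (
             f"API definition:\n{proto_definition}\n\n"
             f"Documentation:\n{docs_excerpt}\n\n"
             f"My code:\n{my_code}\n\n"
             "Task: add idempotent retry handling to charge(). "
             "Walk through your reasoning first, then show the full revised function."
         )},
    ]

    response = openai.ChatCompletion.create(model="gpt-4", messages=messages)
    print(response["choices"][0]["message"]["content"])

If the answers start drifting, throw the message list away and rebuild it with adjusted context, which is the API equivalent of spawning a new session.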
This is well said, for the case where correctness is required. But low-trust communications in low-profit activities like tech support, cheap ads, government help lines, and even some professional services won't care; employers gain and employees become redundant, next! In those businesses it is an old story that the cheapest, cruelest and most scofflaw company wins.. it's not even debatable that this is true.. so here comes MSFT to sell it to you.
> Companies banning the use of ChatGPT level tools going forward will find the rules either flouted, subverted or the employees going elsewhere.
Why? Companies typically have many rules that they expect employees to follow. Why would employees disregard these particular rules, or even quit because of them?
Companies can have reasonable cause to block things, or require processes for installing software, etc., but when those burdens become too much time or effort employees will find a way around it.
Almost a decade ago, the company I worked for didn't have good wiki software OR a good request system. My team had a linux server for the purpose of some primitive monitoring and automation of our systems. Apache was already installed...
Within a few weeks we operationalized a new dokuwiki installation, and not long after that we built an internal request system based on Bottle.py (since it didn't require any installation, only a single file).
Seeing that GPT-4 is so incredibly useful to the people I've heard talk about it, there will be employees trying to use it to increase their code quality, communications, planning, etc.
My current employer put out guidance specifically stating not to put any proprietary code into it, no matter how small, nor any confidential information of any kind (don't format your internal earnings email with it, for example).
That seems reasonable, and recognizes how hard it will be for employees to go zero-tolerance, especially if they don't have total network control over work endpoints.
Because it quickly became a tool so useful that I feel I am doing my job better with it than without, now that it’s available. Similar to how I would disregard rules and/or quit if I was not allowed to use my operating system of choice at work, or was denied the use of a specific tool integral to doing my job (well).
Ah, so you are not disregarding this particular rule, but you disregard all rules that you feel impede you. That answers my question, thank you -- this has nothing in particular to do with GPT tools.
No. I follow many rules, whether I like them or not. I picked one simple example of a rule I would consider a dealbreaker for my employment, to illustrate how many people have already started using ChatGPT so heavily in their workflows that they would consider it detrimental if their employer took it away. But yes, it might not really have anything to do with GPT specifically, besides it apparently being a very useful tool for a lot of employees, given that some would at least claim to quit their jobs over being prevented from using it.
Thank you again. I did not mean to claim that you disregard literally all rules, and I apologize for coming across that way.
I find your explanation sound and reasonable. There are many rules that, even if not necessarily liked, are not sufficient grounds to do anything about. But sometimes rules may impede your workflow so much that you find it preferable to either quietly work around the rule, or even to quit.
I appreciate your apology as it sort of came across that way.
I'll give you a real-life anecdotal example to expand a little on my point. My buddy is a front-end developer for a company which produces pretty basic "stuff" (sorry, I don't know anything about front-end) according to him. He says that he's gotten lazy and unmotivated to do anything about it. This leaves him unchallenged and he doesn't really like his job. Once GPT arrived, he's been able to (according to himself) cut out 70% of the boring boilerplate-code type work he has been doing for years, by making GPT write it for him and just verifying that it works. This has ultimately allowed him not only to focus on taking on more interesting projects where he can challenge himself, but also to spend a lot of the time he previously spent writing "bullshit boilerplate code" on learning new and more challenging front-end things.
I can easily imagine people in other jobs, in IT or perhaps in other fields, already using GPT to reduce the boring parts of their jobs. I genuinely cannot recall hearing anyone say that a new IDE, or any other tool since the arrival of the computer itself, reduced their "boring work" load this significantly. So I think at this point it is reasonable to assume that access to GPT will become considered as commonplace as having access to a computer or email (given you work in a field where those are considered basic/primary tools, of course), and that employers will have to adapt. If not, people will disregard rules, go "shadow IT", or even consider quitting.
Perhaps the fact that I work exclusively on the front-end is why I also derive tremendous value from GPT-4, and I have been perplexed by others saying they find no value in GPT-4 for coding. There is so much boilerplate BS that GPT-4 just nails down and lets me move on to bigger things.
Just yesterday, I needed to mock up a quick prototype for a new feature we're developing. I just paste in my existing React component (and it's all front-end code with no sensitive/proprietary information), tell GPT-4 what I want it to do, and it does it.
Is it perfect? No. Does it sometimes get things wrong? Yes. But it's still easier and faster to help guide GPT-4 and tweak its final output than to have done it all myself.
Simple reason: I can refactor the codebase in a dozen different ways in a matter of seconds and choose the best one to work on. I can summon a large volume of unit tests, descriptive logging statements, etc. I can also just dump in the logs and 9 times out of 10 it will tell you right away what the issue is and how you can resolve it.
I.e. basically you can do a lot of work in just a matter of hours. Once you taste the productivity increase from integrating AI into your workflow, you will miss it if it is taken away.
Not to mention you can build, in a matter of seconds, all the handy little tools that will make your daily life way easier.
Employees have been prevented (by rule) from doing things that would make them more productive for a long time. For a trivial example, programmers who are proficient in Emacs not being allowed to install and use Emacs.
I still fail to see why employees will now choose to disregard this particular rule, and either disobey or quit.
So would I. Funnily enough I thought it might happen at a job in the past so I've thought through the "would you quit if work banned Emacs" question quite a bit.
They actually can. But not every business is competent. It is completely irresponsible to query public services with internal company intellectual property, or any type of information that may breach the contract you signed with a responsible and competent business. It is trivial to track what you do on business equipment, and if you think you're smart by subverting that and querying public services through other means, that can and will be solved soon. Someone is going to make a lot of money deploying an AI system that can retroactively track these things, and when that happens, for better or worse, hope you were not irresponsible while working for a business with the technical acumen to trace you. It's not a matter of whether it can be done with today's tech; it's a matter of whether the business you worked for has the means and willpower to do it.
My advice is follow best practice and wait for an official company policy detailing the use of these new services. Otherwise you may find yourself in legal trouble years from now when the traces you left can easily be uncovered by technology that does not forget.
That's strange to me. I'm employed in order to receive a paycheck. If receiving my paycheck is contingent on me not using ChatGPT, then so be it, what do I care?
Employees want to work with good tools and do interesting work. I'm at a small startup and get to spend a lot of my time working with AI - figuring out how we can use it internally, working out where we can integrate it into our product and using it myself to prototype things and automate some of our business processes.
I am hugely fascinated and impressed by AI, and the fact that my work is paying me to spend time using this awesome tool in a real world context is suuuuuuuper good for my job satisfaction.
Depends. If suddenly my company said I was going to work on some deadend legacy crap for the next 5 years, I'm going to nope out ASAP.
If you get fired/quit and any other job you're looking at is going to have you interacting with new languages or AI workflows or something like that you have to assess what value you're losing by working for that company and the risks associated with it.
> I'm employed in order to receive a paycheck. If receiving my paycheck is contingent on me not using ChatGPT, then so be it, what do I care?
I can be employed to receive a paycheck by employers that give me freedom or employers that take away useful tools.
Why do I stick with the employer that gives me less freedom? Not to mention, getting a new job almost always drastically increases the size of said paycheck.
Pretty much the thought-devoid "It's new, so it must be good!" argument people have been pushing for centuries, whether it's music or technology or politics or fashion.
> Companies banning the use of ChatGPT level tools going forward will find the rules either flouted, subverted or the employees going elsewhere.
If my employees are leaking company information through ChatGPT, I'm happy to have them go work for my competitors and leak their information, instead.
> Pretty much the thought-devoid "It's new, so it must be good!" argument
If you think people are just hyping ChatGPT because its new without further reflection, you have stunningly missed the moment and have a rude awakening coming.
I was with you until you said employees would go elsewhere. There will be a group who need to use it because they can't function without some AI-assisted help, but people already working at the company know how to do their jobs; why would they leave?
Do you think regular employees care all that much about ChatGPT? It's a cute toy. If my employer says not to use it for work data, that's just not a big deal for me.
Funnily enough, our company will start to roll out an internal ChatGPT UI under an agreement with Azure not to use the data for training etc., where we would be allowed to share internal data.
Probably best to nip "shadow IT" in the bud with an easy-to-use approved internal ChatGPT connection, rather than hoping that employees don't try make use of the tool surreptitiously.
Samsung may attempt to ban use of the tool for their employees because they're worried about loss of internal data, but when the tool hugely reduces the workload of those employees, they'll find a way to use it anyways.
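For what it's worth, the plumbing for that kind of approved internal connection is mostly configuration. A minimal sketch against an Azure OpenAI deployment (2023-era openai Python package; the resource name, deployment name and key handling are placeholders, not any specific company's setup):

    import openai

    # Azure OpenAI configuration: requests go to the company's own Azure resource,
    # covered by its enterprise agreement, rather than to the public ChatGPT service.
    openai.api_type = "azure"
    openai.api_base = "https://YOUR-RESOURCE-NAME.openai.azure.com/"  # placeholder
    openai.api_version = "2023-05-15"
    openai.api_key = "..."  # e.g. pulled from the company's secret store

    response = openai.ChatCompletion.create(
        engine="gpt-4",  # the deployment name configured in Azure (placeholder)
        messages=[{"role": "user", "content": "Summarise this meeting note: ..."}],
    )
    print(response["choices"][0]["message"]["content"])

The internal UI then just sits in front of calls like this, plus logging and access control.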
Just rolled it out internally at our company as well. It’s nice to have access to GPT-4 without having to pay for it myself (those queries and responses get pricey if you use it a lot).
Super helpful for boilerplate code. Right now, we have a limit of 500 tokens though, so it somewhat limits asking more complex questions.
Interestingly, our internal team supporting it has added a ton of initial prompts depending on what vertical of our business you want to ask about. Kind of neat, as I haven’t tried to get the AI to pretend to be something else as of yet.
(Because I mostly just ask it to write various things to the tune of the “Fresh Prince of Bel Air”)
> Just rolled it out internally at our company as well. It’s nice to have access to GPT-4 without having to pay for it myself (those queries and responses get pricey if you use it a lot).
When you say you use it a lot, do you mean typing in queries yourself, or programmatically and/or recursively generating prompts like AutoGPT?
Because I thought one day I had used it a lot by typing in lots of messages. The next day my usage was less than a few dollars.
I actually don’t know. We just rolled out access to LLMs internally for the first time within the last week and both models were present when I was granted access.
The service has a Studio[1] where users can utilize the Chat capabilities. Our understanding (and approach) has been to utilize this to curate and train the GPT models and capabilities, and then expose those to the wider org via other, already utilized UIs (like MS Teams).
I am looking into open sourcing a PWA ChatGPT client for this. But I am not certain how to handle authentication without a backend or each user needing to add their OpenAI key
The market is definitely there for enterprise LLMs. Everyone is using GPT for work. I use it to provide stubs for memos and to brainstorm - but the real value comes from replace internal “tribal knowledge” with an AI who knows your org in and out.
It kind of boggles my mind that there are people who aren't using LLMs yet.
Sure, it's not everyone, but the people who aren't using them are signaling a major red flag IMO.
They are resistant to change, even if they don't understand the technology; what else are they resisting from their managers/leadership team? Further, I think of the people in my life who have refused to even try it, and they all seem to have a screw or two loose, even if they're successfully making $200k/yr.
All IMO of course, but in tech, I imagine something needs to be 'off' to never try it.
EDIT: Seems I'm getting criticism from people who are using it for inappropriate use cases. I don't use a screwdriver to hammer nails.
They're wildly inappropriate for most things. For similar anecdata, see blockchain fever where everybody shoehorned it in wherever they could, even when a traditional database made more sense.
Consistent conditional logic makes more sense than a risk-laden hallucinating LLM for a lot of workflows.
"Everyone" doesn't need to hammer nails because there's more than just one career and industry. The acceptable quality of the job output varies drastically too.
"It kind of boggles my mind" that people can't see beyond their own life.
I can't say I was impressed with ChatGPT's help when I tried it. I figured quizzing it on reading comprehension would be a great task, given that it is a language-based model and that's a skill seemingly in short supply amongst my coworkers and myself. After confirming that the specifications of a standard I am implementing were within its knowledge, I tried to have it explain the difference between two parts, and it failed so miserably that its understanding of the content was below even my managers', for whom this is only something they occasionally review. Any attempt to correct it only resulted in it providing an apology and new misunderstandings. Outside of work, I tried using it to find an old movie, probably from the '60s, about a man refusing to shave his long beard and featuring a scene with him being chased around his home half shaven, but it merely made up scenes about beard shaving for several other movies. Admittedly, I have not tried uploading any of my company's code to give it a less memory-based task.
I think reading comprehension is a notable weakness - asking it detailed questions about a long text comes up with lots of hallucinations in my experience.
But it's definitely good at some other things. Writing boilerplate texts of various sorts and giving instructions on how to do certain things, notably.
It seems to mostly synthesize common knowledge rather than learning anything. But that can be very useful, a lot of people's job involves doing things like that today.
It's great for generating sample code snippets or refactoring code, but I can't paste my company's intellectual property into it
If I could train a customized version of it on all my company's Slack messages, Jira tickets, e-mails, etc it'd be insanely useful . . . . but I don't think any big company would actually want that, since it wouldn't be able to keep secrets from anyone with access to it
> It kind of boggles my mind that there are people who aren't using LLMs yet.
Maybe it is easier to go through actual verified information than to double-check everything an AI says.
I only use LLMs to restate information that I can half piece together so I can remember the missing bits (like a math proof or derivation), or to point me to recommendations of actual resources. And even with those two things I am very wary.
I think people have different jobs and different skill levels. For some it gives them a boost for others it slows them down. Translating what you want exactly into english is a different way of producing something for many. Some people are really smart and they don't need a calculator. No shame in using a calculator that mostly works.
We already have something developed like that in our company (~30 pax employee owned wealth management firm). It's.... interesting.
We currently use GPT-4 combined with an internal knowledge base we've had since the beginning, and we could practically fire our chief of staff and the admin team. Just kidding, but it's made her team's work so much easier that she can devote more time to the nitty-gritty hard stuff.
The interesting part is that I had a bit of a personality touch added as part of its context, so the AI's character is quite.... villainous.
Enterprise self-hosted ChatGPT is going to be huge.
I am about to release a self hosted GPT that works with OpenAI and Azure OpenAI. It has several enterprise features, mostly around authentication/authorisation. I'll let you know when it drops or you can be a beta tester if you like!
Noted. Just a few questions, what do you mean by "works with OpenAI"? I thought those were closed systems, so is the system basically pretrained and the weights saved? I'm pretty sure that if it were even possible, it would still be misuse per their terms and conditions?
Currently we use an initial semantic search for context injection, which is then passed to GPT for completions. If any LLaMA-style company were to make that second pretrained bit self-hostable for some license fee, I know a bunch of companies in finance of all sizes which would readily pounce on that tool. But I'm fairly certain that's not what OpenAI wants to do.
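For anyone curious what that flow looks like, here is a rough sketch (not our actual code; the knowledge-base entries, model names and single-best-match ranking are just illustrative, using the 2023-era openai Python package and numpy):

    import numpy as np
    import openai

    openai.api_key = "sk-..."

    # Tiny in-memory stand-in for the internal knowledge base.
    documents = [
        "Client onboarding requires a signed IMA and a completed risk questionnaire.",
        "Quarterly reports go out on the fifth business day after quarter end.",
    ]

    def embed(texts):
        resp = openai.Embedding.create(model="text-embedding-ada-002", input=texts)
        return [np.array(d["embedding"]) for d in resp["data"]]

    doc_vectors = embed(documents)

    def answer(question: str) -> str:
        q = embed([question])[0]
        # Semantic search: rank documents by cosine similarity to the question.
        sims = [float(q @ d) / (np.linalg.norm(q) * np.linalg.norm(d)) for d in doc_vectors]
        context = documents[int(np.argmax(sims))]
        # Context injection: the retrieved passage is handed to GPT for the completion.
        resp = openai.ChatCompletion.create(
            model="gpt-4",
            messages=[
                {"role": "system", "content": f"Answer using only this context:\n{context}"},
                {"role": "user", "content": question},
            ],
        )
        return resp["choices"][0]["message"]["content"]

    print(answer("When are quarterly reports generated?"))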
Yes they do. Also, these are completely without any of the safeguards that the public instances have. This is 'on the record' from a Microsoft regional CEO who was pushing this pretty hard.
As of now, these chatbots still make subtle but egregious errors in code that could easily create more bugs than writing it by hand would, if you don't thoroughly audit the output.
It's wise to ban them until they improve or naive users get more instruction how to use them properly. And from experience with some Samsung products, they could do with tightening up their code QA standards a bit.
Code reviews aren't meant to prove correctness. When I review code, I assume the author thought about it, realized this is the best solution, and tested it, none of which happens with the models.
That's what linters are for. Each of us has a limited capacity to take critique from other humans, before we become irritated by it and just ignore it. Therefore, I've learned to leave things that can be caught by automated tools to automated tools.
When I look at a PR, there are two things I try to validate:
1. Do I understand what the code is supposed to do, and why the author chose to do those things that way? If not, then either the code needs rewriting (because it does things in a way that shouldn't be done), or more commenting (because it ain't obvious), or both.
2. Catch anything that is obvious to me, but not to the author or the linter. Maybe there's a different function that does a thing the author didn't know about? And while doing that, it's important to keep context in mind. You could write the kind of code that would be very smart but can't be understood by most, and that code is just as bad as very dumb code (see (1)).
If I have to catch obvious errors, then the author didn't do their job.
If I have to catch spelling mistakes or formatting issues, then the linter didn't do its job.
My job is to catch anything that neither the author nor the tools can.
Which works perfectly fine until someone leaks the fact that NSA have full access to those documents, and occasionally will provide help to a few US companies that are seen as important to the nation.
It is hopelessly naive to question this, but here are some random documented examples in recent times: the NSA got the German BND to spy on the European company Airbus. [1] Then there were the Snowden leaks, which revealed, for example, that the NSA was spying on a Brazilian oil company. [2] Then there is the well-known case of Shell infiltrating the Nigerian government, who then asked the US to spy for them on rival company Gazprom, which was revealed by the diplomatic cable leaks. [3]
Siemens and some other German majors claimed the NSA stole their trade secrets related to wind turbines and gave them to some American companies. I think they claimed it was done via wiretap.
I'm not sure if this claim was ever validated, but it seems wild that those fairly boring companies would come out with it over nothing, whereas it is related to the US services' stated mission.
US government access to their companies' customer files has been a compliance issue in Europe for a while, because it is hard to claim GDPR compliance if your vendor might be required to leak user data outside the judicial system. This is why the Privacy Shield treaty was needed.
This was true even decades before the internet took over the world; I recall quite a few articles some years ago that listed well-known cases. Sure, they won't push for some small startup, but big corporations, sales of commercial planes, military equipment contracts and similar deals worth billions and more?
The CIA/NSA would be failing at their core real mission if they didn't help US interests when they see an opportunity. And with US-based cloud they don't need to hack anything remotely, just fill in another form and go in again.
Makes me think that having strong privacy laws, like e.g. in Switzerland, is a massive win for a given country and its citizens in the long term.
Because whatever you send to ChatGPT may be future training data for their AI model. Mixing up your data with other customer's data is not really part of Slack's business (yet?).
> When you use our non-API consumer services ChatGPT or DALL-E, we may use the data you provide us to improve our models. You can switch off training in ChatGPT settings (under Data Controls) to turn off training for any conversations created while training is disabled or you can submit this form. Once you opt out, new conversations will not be used to train our models.
Not sure why you are attempting to conflate an LLM product offering with these other companies (except maybe GitHub Copilot, which only has public repos as training data). What you enter into ChatGPT becomes part of its training data unless you use the extremely recently available opt-out option. Once it becomes part of the training data, it will end up in many outputs to entities beyond your control. OpenAI doesn't have enough safeguards to prevent every single instance of such leaks, and it is infeasible for them to implement safeguards for every single instance.
Not following. Some EU data is not meant to be stored in AWS zones outside the EU, and if and where that is a concern, AWS is not used at all. Neither is Azure or Google Cloud. Those "100 other SaaS" products are meant to follow clear rules on how data is processed, else they get struck by GDPR.
Not all SaaS companies have security postures equivalent to Amazon/Google/GitHub/MS. Most SaaS companies have only a fraction of the budget the tech giants have to deal with security and privacy regulations.
Some people upload their entire private lives, personal and family photos, to iCloud. Several celebrities were targets of iCloud leaks and their nudes ended up in 'the fappening'.
There is a difference between me choosing where I upload my own data vs. a bank worker uploading sensitive data all over the web. There are contractual agreements, at least in the UK, that customer data will not be sent to or stored in the US, for instance.
Any hacker group would love to boast about it for the street cred, or try to monetize it if they ever did it, or there would at least be some chatter on the darknet about it, which would be reported on in the media anyway.
Nation states wouldn't, but it's safe to assume powerful nation states are already in every cloud provider, at least the US glowy boys anyway according to Snowden.
> Any hacker group would love to boast about it for the street cred or try to monetize it if they ever did, or there would be at least some chatter on the darknet about it which would be reported on in the media anyway.
And you are sure no hacker group has boasted, anywhere on the darknet, about hacking Microsoft? And even if that were true, you take that as strong evidence that Microsoft has not been hacked?
Wonder what the searches reveal about someone? What's worse: search history or DNS logs? I guess it's pretty moot if it's all going to the googs with their DNS and Website Analytics + Ad networks.
There was a "Show HN" post on here that was planning on using this type of AI to help doctors, basically letting them pipe their patients' medical data into an online AI chat.
The way basic privacy and secrecy is ignored in this space is staggering.
The whole space is built around it. The reason is simple: these machine learning systems need massive amounts of data to calibrate for accuracy. They are not intelligent enough to somehow learn on their own. So machine learning companies have figured they can run an aggressive marketing campaign to make people think these systems are intelligent and uncontrollable, to mask their never-ending hunger for our data, hoping everyone is ignorant enough not to catch on.
What remains to be seen is whether we come back to our senses soon enough.
A lot of “experts” working in ai are warning about dangers but what they omit telling is that it’s the humans who own ai that are dangerous.
That’s just plain wrong. Any AI expert worth their salt will tell you that the humans who own, control and operate the AI are the primary danger, at least the way it works today.
There’s a ~3000-word prompt limit, so while it’s bad, whole user datasets are unlikely to have been uploaded… the issue will come when ChatGPT works out how to do live training from prompts and other user interactions. I suppose it will be filtered, but nothing is 100%.
Doesn't necessarily apply to data being processed with plugins. The "Code Interpreter" plugin lets you upload files up to 100 MB for use by AI-generated Python scripts; see https://www.oneusefulthing.org/p/it-is-starting-to-get-stran... for examples (including useful output from vague prompts like "please characterize this dataset", at least if you get lucky).
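For a vague "characterize this dataset" prompt, the generated Python tends to boil down to something like the following (the file name stands in for whatever was uploaded; this is the flavour of it, not literal plugin output):

    import pandas as pd

    # "uploaded_dataset.csv" stands in for whatever file was attached to the chat.
    df = pd.read_csv("uploaded_dataset.csv")

    print(df.shape)                     # number of rows and columns
    print(df.dtypes)                    # inferred column types
    print(df.isna().sum())              # missing values per column
    print(df.describe(include="all"))   # summary statistics, numeric and categorical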
Always have been. I've been working on a healthcare project; they haven't even shared all the repos with us over data leakage concerns, but they were using free web tools to format JSONs and things like that. Boggles the mind.
After skimming through the article it's not too clear to me what the misuse was. My best guess is entering "personal or company related information into the services"?
>My best guess is entering "personal or company related information into the services"?
I'm not really familiar with exactly how ChatGPT works, but does it get trained on input data (search queries)? People are also "leaking" their personal information to Google when they search for something personal like health issues, financial issues, family issues, etc.
What is the privacy policy of these chatbots after all?
My employer (big tech co using AI internally already) just blocked it too. They put out a very well thought out explanation why.
We're investigating standing up our own internal model for internal use to control any risk of our IP leaking out and to be able to vet the training data.
I see lots of casual mention of using it for assistance in writing code.. we view that as fairly dangerous in terms of the risk of accidentally including snippets of open source code.
There are many risks to this. Personally, my own experimentation (for coding) left me pretty ambivalent as to what it can/could actually speed up. ChatGPT is not that fast and it takes time to keep refining what it sends back to you. The window of what it actually helps with seems small right now.
I'm working on an open source enterprise self-hosted LLM integration, pretty sure many people are these days. The difference will be on the business side, rather than the LLM side, as they are quickly becoming commoditized, at least on the open source side, since they are free to use by anyone.
I'd also say that one needs a moat in order to succeed; you can't just provide the LLM, since anyone can do that, you need to provide something more that works even without any AI at all.
How about outside of any workflows, and just one off? I've used ChatGPT twice recently after spending a good amount of time Googling around and finding nothing (specifically https://chatgptonline.ai/chat/), generalising the statements/removing any/all real data.
1. given the following string [...], build me a regex that extracts the [abc] before [`], until the 2nd [xyz], used for extracting a bunch of info from an array
2. give me a list of common beneficiaries, then give me 5 more
Took an hour or so of Googling, then about 10 mins to find an online/open ChatGPT prompt, and about 2 mins to implement the answer in my code. But that's where I draw the line, I'll never use an editor that uses AI in my actual IDE, or expose my code openly to train models
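To give the flavour of the first prompt without the real (redacted) data, here is an entirely made-up stand-in string and the sort of regex that came back:

    import re

    # Invented example; the real string and markers were generalised out above.
    raw = "id=42`name=alice|dept=eng|title=dev"

    # Capture what comes before the backtick, then everything up to the second "|".
    pattern = re.compile(r"^(.*?)`(.*?\|.*?)\|")
    m = pattern.search(raw)
    if m:
        print(m.group(1))  # id=42
        print(m.group(2))  # name=alice|dept=eng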
I think any use of GPT or Copilot is a misuse for companies like Samsung. Let one idiot ask questions about little chunks of code, and another idiot will upload the full codebase.
>ChatGPT is a viral AI chatbot that is trained on huge amounts of data and is able to generate response to user queries. It is a form of so-called generative AI.
The lazy use of viral when talking about computer tech here annoys me slightly.
One of my colleagues was pasting parts of /var/log/messages into ChatGPT trying to debug some sshd-related issue. Is this safe? He said he was masking hostnames before pasting; would that help here?
I hope you're joking here! If not, I would fire him on the spot. If for nothing else, he should not be messing with sshd debugging if he doesn't know what he is doing.
Need legislation to limit and control what data goes into LLMs. People need to be paid or at least always have a choice when it comes to being input for training data.
Yeah, best to lean on legal, who will ask "who owns intellectual property written by OpenAI?", look at the USPTO decision and the Supreme Court's refusal to grant cert[1], and decide "not us."
Can you elaborate on why that would be the case? I ask because we've had some discussion about drafting an 'AI' policy with our CTO and though we haven't said anything officially, we are concerned about exfiltration of secrets, and the accuracy of information.
I do wonder if there will one day be communications formatters, like what Prettier does for code, where no matter the style of writing going in, it will come out consistent. But until there are 'communications rewriters' to match a predefined company tone and style guide, it seems like anyone's use of a tool like this would be ad hoc.
What do you think the most sensible policy is to have right now, and why?
Not the OP, but the way I look at it, exfiltration of secrets can happen in plenty of other ways anyway, and you would need to lock down most of your SaaS software. I treat GPT the same way I would treat any other Google search: don't share your secrets, be careful copying stack traces for weird error debugging, etc.
Then you fire them and sue them for leaking confidential information.
It doesn't matter if they have a checkbox saying 'we totally won't save your information, we promise'; no one should be shoveling confidential data into ChatGPT, and if people keep doing so then they're a security risk.
> What is it about AI chat bots that makes the risk of a data leak so much higher? Is it something about OpenAI's ToS? Or their relative infancy?