
Explicitly calling out that they will not train on enterprise data, along with SOC 2 compliance, is going to put a lot of enterprises at ease and help them embrace ChatGPT in their business processes.

From our discussions with enterprises (trying to sell our LLM apps platform), we quickly learned how sensitive enterprises are when it comes to sharing their data. In many of these organizations, employees are already pasting a lot of sensitive data into ChatGPT unless access to ChatGPT itself is restricted. We know a few companies that ended up deploying chatbot-ui with Azure's OpenAI offering, since Azure claims not to use customer data (https://learn.microsoft.com/en-us/legal/cognitive-services/o...).

We ended up adding support for Azure's OpenAI offering to our platform, as well as open-sourcing our engine to support on-prem deployments (LLMStack - https://github.com/trypromptly/LLMStack), to deal with the privacy concerns these enterprises have.
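
For anyone curious what that looks like in practice, here is a minimal sketch of pointing the openai Python SDK at an Azure OpenAI deployment instead of api.openai.com; the endpoint, key, API version and deployment name below are placeholders for whatever the enterprise has provisioned:

    # Sketch only: route chat completions through an Azure OpenAI resource
    # so prompts stay within the enterprise's Azure tenant.
    from openai import AzureOpenAI

    client = AzureOpenAI(
        azure_endpoint="https://your-resource.openai.azure.com",  # placeholder resource
        api_key="<azure-openai-key>",
        api_version="2023-05-15",  # example API version; use the one your resource supports
    )

    response = client.chat.completions.create(
        model="gpt-35-turbo",  # the Azure *deployment* name, not the model family
        messages=[{"role": "user", "content": "Summarize this internal ticket: ..."}],
    )
    print(response.choices[0].message.content)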




My company (Fortune 500 with 80,000 full time employees) has a policy that forbids the use of any AI or LLM tool. The big concern listed in the policy is that we may inadvertently use someone else’s IP from training data. So, our data going into the tool is one concern, but the other is our using something we are not authorized to use because the tool has it already in its data. How do you prove that that could never occur? The only way I can think of is to provide a comprehensive list of everything the tool was trained on.


It’s a legal unknown. There’s nothing more to it. Your employer has opted for one side of the coin flip, and it’s the risk-averse one. Any reasonably sized org is going to be raising the same questions, but many are instead opting to reap the benefits and take on the legal risk, which is something organisations do all the time anyway.


There is a very real concern building about being “left behind” on these issues.

You’ve got to be early, but not so early that you get legal or business disruptions or consequences.

It’s quite the balancing act for exec teams.


For me that discussion is always hard to grasp. If a human learned coding autodidactically by reading source code, and later wrote new code, they could only do so because they had read licensed code. No one would ask about the license, right?

So why do we care from where LLMs learn?


> So why do we care from where LLMs learn?

Because humans aren't computers, and the similarities between the two, other than the overuse of the word "learning" in the computer's case, are nonexistent?


Are you really asserting that these models aren't learning? What definition of learning are you using?


Don't know if they are, and don't really care either; I especially don't care to anthropomorphize circuitry to the extent that AI proponents tend to.

Humans and computers are two wholly separate entities, and there's zero reason for us to conflate the two. I don't care if another human looks at my code and straight up copies/pastes it; I care very much if an entity backed by a megacorp like Micro$oft does the same, en masse, and sells it for profit, however.


Okay, so the scale at which they sell their service is a good argument that this is different from a human learning.

However, on the other hand we also have the scale at which they learn, which kind of makes every individual source line of code they learn from pretty unimportant. Learning at this scale is a statistical process, and in most cases individual source snippets diminish in the aggregation of millions of others.

Or to put it the other way round, the actual value lies in the effort of collecting the samples, training the models, creating the software required for the whole process, putting everything into a good product and selling it. Again, in my mind, the importance of every individual source repo is too small at this scale to care about their license.


The idea that individual source snippets at this scale diminish in aggregation is undercut by the fact that OpenAI and MSFT are both selling enterprise-flavoured versions of GPT, and the one thing they promise is that enterprise data will not be used to further train GPT.

That is a fear for companies because the individual source snippets and the knowledge "learned" from them is seen as a competitive advantage of which the sources are an integral part - and I think this is a fair point from their side. However then the exact same argument should apply in favour of paying the artists, writers, coders etc whose work has been used to train these models.

So it sounds like they are trying to have their cake and eat it too.


Hmm. You sure this is the same thing? I would say it’s more about confidentiality than about value.

Because what companies want to hide are usually secrets, that are available to (nearly) no one outside of the company. It’s about preventing accidental disclosure.

What AIs are trained on, on the other hand, is publicly available data.

To be clear: what could leak accidentally would have value of course. But here it’s about the single important fact that gets public although it shouldn’t, vs. the billions of pieces from which the trained AI emerges.


It's really not different in scale. Imagine for a moment how much storage space it would take to store the sensory data that any two year old has experienced. That would absolutely dwarf the text-based world the largest of LLMs have experienced.


If you don't care, why are you confidently asserting things you're not even interested in examining? It just drowns out useful comments.


Do humans really read terabytes of C code to learn C?

Humans look at a few examples and extrapolate…


But that also exists in the AI world. It’s called "fine-tuning": an LLM trained on a big general dataset can learn special knowledge with little effort.

I’d guess it’s exactly the same with humans: a human that received good general education can quickly learn specific things like C.


Humans have experienced an amount of data that absolutely dwarfs the amount of data even the largest of LLMs have seen. And they've got billions of years of evolution to build on to boot


You're straying away. Let's talk about learning C.

Also, humans didn't evolve over billions of years.


The process of evolution "from scratch", i.e. from single-celled organisms took billions of years.

This is all relevant because humans aren't born as random chemical soup. We come with pre-trained weights from billions of years of evolution, and fine-tune that with enormous amounts of sensory data for years. Only after that incredibly complex and time-consuming process does a person have the ability to learn from a few examples.

An LLM can generalize from a few examples on a new language that you invent yourself and isn't in the training set. Go ahead and try it.


I can't even convince it to put the parameters in a function call in the correct order, despite repeatedly asking.


There is the element of the unknown with LLMs etc.

There is a legal difference between learning from something and truly making your own version and simply copying.

It's vague of course - take plagiarism in a university science essay - the student has no original data and very likely no original thought - but still there is a difference between simply copying a textbook and writing it in your own words.

Bottom line - how do we know the output of the LLM isn't a verbatim copy of something with the license stripped off?


> So why do we care from where LLMs learn?

same difference there is between painting your own fake Caravaggio and buying a fake Caravaggio (or selling the one you made).

the second one is forgery, the first one is not.


The way I see it is that with AI you have really painted your own Caravaggio, but instead of an electrochemical circuit of a human brain you've employed a virtual network.


> but instead of an electrochemical circuit of a human brain you've employed a virtual network.

technically it is still a tool you are using, as opposed to doing it on your own, with your hands, using your own brain cells that you trained over decades, rather than a virtual electronic brain pre-trained in hours or days by someone else on who knows what.


Okay if it’s about looking at one painting and faking it. However, if you train your model on billions of paintings and create arbitrary new ones from that, it’s just a statistical analysis of what paintings in general are made of.

The importance of the individual painting diminishes at this scale.


And if you look at lots of paintings, and create a new painting which is in a very similar style to an existing painting?

Is that a forgery? Have you infringed on the copyright on all the paintings you looked at?


Why do people bring this up? People are not LLMs and the issues are not the same.


I'd add to this: the damage an LLM could do is much greater than what a human could do in terms of individual production. A person can paint only so many forgeries... A machine can create many, many more. The dilution of value from a person learning is far different from machine learning. The value extracted and diluted is night and day in terms of scale.

Not to say what will/won't happen. In practice, what I've seen doesn't scare me much in terms of what LLMs produce vs. what a person has to clean up after it's produced.


Why are the issues not the same? Are you privileging meat over silicon?


Yes, I am. Most people will.

They are not the same because an LLM is a construct. It is not a living entity with agency, motive, and all the things the law was intended for.

We will see new law as this tech develops.

For an analogy, many people call infringement theft and they are wrong to do so.

They will focus on the "someone getting something without having followed the right process" part, while ignoring the equally important "someone else being denied the use of, or losing, their property" part.

The former is an element in common between theft and infringement. And it is compelling!

But, the real meat in theft is all about people losing property! And that is not common at all.

This AI thing is similar. The common elements are super compelling.

But it just won't be about that in the end. It will be all about the details unique to AI code.


Using the word "construct" isn't adding anything to the conversation. If we bioengineer a sentient human, would you feel OK torturing it because it's "just a construct"? If that's unethical to you, how about half meat and half silicon? How much silicon is too much silicon and makes torture OK?

> Most people will [privilege meat]

"A person is smart. People are dumb, panicky dangerous animals, and you know it". I agree that humans are likely to pass bad laws, because we are mostly just dumb, panicky dangerous animals in the end. That's different than asking an internet commentor why they're being so confident in their opinions though.


If we bioengineer:

Full stop. We've not done that yet. When we do, we can revisit the law / discussion.

We can remedy "construct" this way:

Your engineered human would be a being. Being a being is one primary difference between us and these LLM things we are toying with right now.

And yes, beings are absolutely going to value themselves over non beings. It makes perfect sense to do so.

These LLM entities are not beings. That's fundamental. And it's why an extremely large number of other beings are going to find your comment laughable. I did!

You are attempting to simplify things too much to be meaningful.


Define "being". If it's so fundamental, it should be pretty easy, no?

And I'd like if this were simple. Unfortunately there's too many people throwing around over-simplifications like "They are not the same because an LLM is a construct" or "These LLM entities are not beings". If you'll excuse the comparison, it's like arguing with theists that can't reason about their ideological foundations, but can provide specious soundbites in spades.


It is easy!!

First and foremost:

A being is a living thing with a will to survive, a need for food, and a corporeal existence; in other words, it is born, lives for a time, then dies.

Secondly, beings are unique. Each one has a state that ends when they do and begins when they do. So far, we are unable to copy this state. Maybe we will one day, but that day, should there ever be one, is far away. We will live our lives never seeing this come to pass.

Finally, beings have agency. They do not require prompting.


So these jellyfish aren't "beings" because they can live forever? Or do they magically become "beings" when they die?

https://en.m.wikipedia.org/wiki/Turritopsis_dohrnii

Also twice now you've said the equivalent of "it hasn't happened yet so no need to think about the implications". Respectfully, I think you need to ponder your arguments a bit more carefully. Cheers.


Of course they are beings!

They've got a few fantastic attributes; lots of different beings do. You know the little water bear things are tough as nails! You can freeze them for a century, wake them up, and they'll crawl around like nothing happened.

Naked mole rats don't get any form of cancer. All kinds of beings present in the world have traits like that, and it doesn't affect the definition at all.

You didn't gain any ground with that.

And I will point out, it is you who has the burden in this whole conversation. I am clearly in the majority with what I've said. And I will absolutely privilege meat over silicon any day, for the reasons I've given.

You, on the other hand, have a hell of a sales job ahead of you. Good luck; maybe this little exchange helped a bit. Take care.


> Or do they magically become "beings" when they die?

quoting from your link

> although in practice individuals can still die. In nature, most Turritopsis dohrnii are likely to succumb to predation or disease in the medusa stage without reverting to the polyp form

This sentence does not apply to an LLM.

Also, you can copy an LLM's state and training data and you will have an equivalent LLM; you can't copy the state of a living being.

Mostly because a big chunk of that state is experience. For example, take that jellyfish and cut one of its tentacles, and it will be scarred for life (immortal or not). That can't be copied and most likely never will be.


Regarding the copying of a being state, I'm not really sure that's ever even going to be possible.

So for the sake of argument I'll just amend that and say we can't copy their state. Each being is unique and that's it. They aren't something we copy.

And yes, that means all of those who think somehow they're going to get downloaded into a computer? I'll say it right here and now: that's not going to fucking happen.


Companies don't go around donating their source code to universities either, even if it was only for the purpose of learning.


> So why do we care from where LLMs learn?

Because humans dont put the "Shutterstock" watermark logo on the images they produce.


As with all absolutes*, exceptions exist:

Viagra Boys - In Spite Of Ourselves (with Amy Taylor)

    I absolutely love that the entirety of the video is unpurchased stock footage with the watermark still on it. This is cinematic gold.
https://www.youtube.com/watch?v=WLl1qpDL7YA

* well, most ...


cargo cult programming is real though


> The only way I can think of is to provide a comprehensive list of everything the tool was trained on.

There are some startups working in the space that essentially plan to do something like this. https://www.konfer.ai/aritificial-intelligence-trust-managem... is one I know of that is trying to solve this. They enable these foundation model providers to maintain an inventory of training sources so they can easily deal with coming regulations etc.


Isn’t that a benefit of using a provider?

Microsoft/OpenAI are selling a service. They’re both reputable companies. If it turns out that they are reselling stolen data, are you really liable for purchasing it?

If you buy something that fell off a truck, then you are liable for purchasing stolen goods. But if it turns out that all the bananas in Walmart were stolen from Costco, you're not liable for theft as a customer.

Similarly, I don’t know if Clarkson Intelligence have purchased proper licenses for all the data they are reselling. Maybe they are also scraping some proprietary source, and now you are using someone else’s IP.


> But if it turns out that all the bananas in Walmart were stolen from Costco, you're not liable for theft as a customer.

Actually, that would be fencing stolen goods, and customers could have obligations.

The case of bananas is a bit silly, as returning bananas would not be possible and the value is too small to bother.

But imagine a reputable car dealer selling stolen cars; repossession is far more likely there.


Even if you find a way to successfully forward liability and damages to Microsoft and OpenAI - which I doubt you will be able to, as the damages are determined by your use of the IP - you do not gain the right to use the affected IP and will face a cease and desist for whatever is built upon it.

How legitimate the IP concern is and whether it holds up in court is one thing, but finger pointing will probably not be sufficient.


Also, I think that MS / OpenAI cannot and will not indemnify you. I think that their CEO and CFO have a duty not to...


> How do you prove that that could never occur?

Realistically you can prove that just as well as you can prove that employees aren't using ChatGPT via their cellphones.

There are also organizations that forbid the use of Stack Overflow. As long as employees don't feel like you're holding back their career and skills by prohibiting them from using modern tools, and they keep working there, hey. As long as you pay them enough to stay, people will put up with a lot, even if it hurts them.


Using chatgpt to code is not a skill. It’s a crutch. Any employees that feel held back by not being able to access it aren’t great in the first place.


Using $technological_aide \in {chatgpt,ide,stackoverflow,google,debuggers,compilers,optimizers,high-mem VMs}$ to code is not a skill. It’s a crutch. Any employees that feel held back by not being able to access it aren’t great in the first place.


I don't think using ChatGPT is similar to searching for answers on S.O. Maybe if you were asking people on S.O. to write your code for you, or plugging in exact snippets. The point here is that letting ChatGPT write code directly into your repo is effectively plagiarism and may violate any number of licenses you don't even realize you're breaking, whereas just looking at how other people did something, understanding, and then writing your own code, does not.


Honestly I couldn’t tell you whether copying code out of Stack Overflow or out of ChatGPT is more legally suspect. For SO, you don’t know where the user got the code from either (wrote it themselves? Copied it out of their work repo? Copied from some random GPL source?)


Well, this is why you don't copy code from S.O. You read it, understand why it works, then write your own.


I've been experiencing carpal tunnel on and off for a couple of weeks now. I can tell you that reading through some code generated by "insert llm x" is substantially less painful than writing all of it by my own hand.

Especially if you start understanding how to refine your prompts to the point where you use a single thread for project management and use that thread to generate prompts for other threads.

Not all value to be gained from this is purely copypasta.


Same here. End of last year I had to take more time off than I wanted to because of my wrists and hands. Copilot and GPT4 (combined with good compression braces) got me back in the game.


Take sick leave


Boy, those easy answers are right there! Can't miss 'em.

A guy could wonder why so many of us do not use those answers.

Could it be the details complicate things just enough to take the easy answer off the table?

Perhaps it is just me. What say you?


Do you have a point to make? Maybe I should ask chatgpt to find it because I sure can't.


Yes I do. The point is that blurting out some one-liner fix-all doesn't help anyone, really.


Stack Overflow yes; the others aren’t the same category. Asking someone else to give you code to do X means you have struggled to synthesize the algorithm yourself. It’s not a good habit because it means you struggle to be precise in how the program behaves.


> It’s a crutch

problem is most of the code chatgpt spouts is wrong, in so many subtle ways, that sometimes you just have to run it to prove it.

so basically you have to be better than chatgpt at that particular task to spot its mistakes.

using it blindly is similar to the Gell-Mann amnesia effect

https://theportal.wiki/wiki/The_Gell-Mann_Amnesia_Effect

said by someone who uses chatgpt extensively, it is good for the structure, to get an idea, but as a code generator it kinda sucks.


Thank you.

I am not a programmer and only know some very rudimentary HTML and Java. After hearing everyone enthuse about how they use ChatGPT for everything, I thought I could use it to generate a page that sounded simple enough. The gist of it was that I needed 100 boxes of the same dimensions that text could be inputted into. I figured that it'd be faster with AI than with me setting up an Excel sheet that others would have to access.

Instead, the AI kept spitting out slightly-off code, and no matter how many iterations I did, it did not improve. Had I known the programming language, I would have known what needed to be changed. I think that a lot of highly experienced people are using it as a shorthand to get started, and a lot of inexperienced people are going to use it to produce a lot of shoddy crap. Anyone relying on ChatGPT who doesn't already know what they're doing is setting themselves up for failure.


> said by someone who uses chatgpt extensively, it is good for the structure, to get an idea, but as a code generator it kinda sucks.

Interestingly, the same applies to text-to-image programs. Once you've used these for a while, you realize their utility and value are little more than inspiration or a starting point. Even if you wanted to ignore the ethical implications, very little they produce is usable. LLMs are amazing. However, their end-product application is overrated.


Anyone who can't figure out how to become more efficient with the aid of LLMs is a dinosaur.


I dunno about that. I honestly tried to extract _any_ value from LLMs in my day-to-day work, but aside from them being an OK alternative to Google/SO, I mostly found them to be a crutch.

I never had issues with quickly writing a draft or typing in code. I do realise that for a lot of people, starting on a green field is hard, but for me it's easier.

My going hypothesis is that people are just different, and some get true value out of it while others don't. If it works for you, I'm not gonna call you names for it.


I guess it depends on how popular your thing is. If you are doing something not done before or really unique, then the hope for useful hints is lower.

If you do something that has been done 10,000 times before, or is a mix of two things done over and over, then you are more likely to get useful advice.


Exactly. It's a time-saver for bureaucratic chores. And that's great. There is no need to strive for excellence when mediocrity is required.

Useless for anything that requires originality, elaborate humor, or finesse.


Yes, try it with a proprietary language, a closed source environment and lots of domain and application knowledge required to achieve anything. There ChatGPT is completely out of it.


Here is a bunch of JSON. Output the C# classes that can deserialise it.

An intern in college could do that, but it isn’t worth our time to do.

For this function, write the unit tests. Now you do not have anything that you can blindly commit, but you are at the stage where you are reviewing code.

Could you do all of this by hand? Sure, but you never would; you would use an IDE. ChatGPT is better than an IDE when you know how to use it.
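
To make the workflow concrete, here is a rough sketch of the "paste JSON, get deserialization classes back" step using the openai Python SDK; the model name and sample JSON are placeholders, and the output still needs the same review you'd give any generated code:

    # Sketch: ask the model for C# classes matching a JSON payload, then review the result.
    from openai import OpenAI

    client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

    sample_json = '{"orderId": 123, "customer": {"name": "Ada", "email": "ada@example.com"}}'

    response = client.chat.completions.create(
        model="gpt-4",  # placeholder model name
        messages=[{
            "role": "user",
            "content": "Output C# classes that can deserialise this JSON with System.Text.Json:\n"
                       + sample_json,
        }],
    )
    print(response.choices[0].message.content)  # review, don't blindly commit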


I think it can be a productivity booster. At my company, I need to touch multiple projects in multiple languages. I can ask ChatGPT how to do something, such as enumerate an array, in a language I’m less familiar with so that I can more quickly contribute to a project. Using ChatGPT as a coach, I am also slowly learning new things.


I remember people saying literally the exact same thing almost word for word about Google almost a quarter century ago.


You mean the search engine that NEVER EVER gives me the documentation I'm looking for, but always goes for a vaguely related blog with hundreds of ads?


Horrible metaphor aside, at our company it's not the software developers most overusing ChatGPT.


Sounds like the perspective of someone who never gave that tool a real chance. To me it’s primarily just another aid like IDE inspections, IDE autocompletions etc.

I use GitHub Copilot mainly for smaller chunks of code, as another kind of autocompletion, which basically saves me from typing, Googling, or looking into the docs, and therefore keeps me in the flow.


Can you share your linkedin?


I'd avoid the use of the word crutch, it sounds ableist as fuck; my girlfriend has a joint disease and relies on a crutch to walk. In other words, while I understand it's just a figure of speech: how dare you.

It's a tool that can help, just like an IDE, code generators, code formatters, etc. No need to talk down on it in that fashion, and there's no need to look down your nose at tools or the people that use it.


How is it offensive to use crutch in this context? The implication being that it's a tool that helps people do something they'd struggle with otherwise. I don't see why anyone might be offended by that.


Why is the word "crutch" ableist, when the meaning is exactly analogous? It's a tool to help you do something you can't easily do yourself.


Your girlfriend is less able to walk and thus uses a crutch to compensate, but it doesn't let her walk as well as a normal person. Replace "walk" with "code" and the sentence works for ChatGPT if the grandparent is correct.


> It's a tool that can help, just like an IDE, code generators, code formatters, etc

And like a crutch for someone who cannot walk without it.

Or glasses (which I use), which allow me to regain almost as good vision as someone with no deformity of the eyeballs.


Someone that doesn't use the tools available to them would be like folks hammering in nails with their forehead.


You can't code without access to stackoverflow?

Official documentation is still available…


It’s an interesting question.

To effectively sue you, I believe the plaintiff would have to prove the LLM you were using was trained on that IP and it was not in the public domain. Neither seems very doable.


I don't actually think either of those things are all that hard, certainly it's a gray area until this actually happens but I think AI generation is not all that different from any other copyright situation. Even with regular copyright cases you don't need to prove "how" the copying occurred to show copyright infringement, rather you just have to show that it's the likely explanation (to the level of some standard). Point being, you potentially don't need to prove anything about the AI training as long as you can show that the AI's result is clearly identifiable as your work and is extremely unlikely to be generated any other way.

Ex. CoPilot can insert whole blocks of code with comments and variable names from copyrighted code, if those aspects are sufficiently unique then it's extremely unlikely to be produced any way other than coming from your code. If the code isn't a perfect copy then it's trickier, but that's also the case if I copy your code and remove all the comments, so it's still not all that different from the current status quo.

The bigger question is who gets sued, but I can't imagine any AI company actually making claims about the copyright status of the output of their AI, so it's probably on you for using it.


It could open quite a wide window for patent trolls though, who generally go for settlements under the threat of a protracted court battle which is of minimal cost to them, as they are often single purpose law firms that do that and only that.

Being able to have your legal counsel tell them to go bug openAI could potentially save you from quite a few anklebiters all seeking to get their own piece.


Your observation highlights the complexities of legal actions related to AI-generated content. Proving the exact source of a specific piece of content from a language model like the one I'm based on can indeed be challenging, especially when considering that training data is a mixture of publicly available information. Additionally, the evolving nature of AI technology and the lack of clear legal precedents in many jurisdictions further complicate the matter. However, legal interpretations may vary, and it's advisable for any legal proceedings to involve legal experts well-versed in both AI technology and intellectual property law. Also, check out AC football cases.


Just curious, do they have bans on "traditional" online sources like Google search results, Wikipedia, and Stack Overflow?

From my view, copying information from Google search results isn't that much different from copying the response from ChatGPT.

Notably, Stack Overflow's license is Creative Commons Attribution-ShareAlike, which I believe very few people actually realize when copying snippets from there.


> Notably, Stack Overflow's license is Creative Commons Attribution-ShareAlike, which I believe very few people actually realize when copying snippets from there.

A lot of the snippets would not meet the standard for copyrightable code, though. At least that’s my understanding as non-lawyer.


With SO you also have no guarantee that the person had the license to post that snippet; even that could have been copied from somewhere else. A customer of ours was scanning for and banning SO snippets, if that was the only source that could be determined.


> but the other is our using something we are not authorized to use because the tool has it already in its data.

We won't know if this is legally sound until a company who isn't forbidding A.I. usage gets sued and they claim this as a defense. For all we know the court could determine that, as long as the content isn't directly regurgitated, it's seen as fair use of the input data.


It's not logical, because how can the company prove that could never happen from 80,000 employees writing things?

i.e. Without ChatGPT an employee could still copy and paste something from somewhere. ChatGPT actually doesn't change the equation at all.


They are stupid and don't understand risk vs reward.


So, how do you plan to commercialize your product? I have noticed tons of cloud-based chatbot app providers built on top of the ChatGPT API or Azure API (asking users to provide their API key). Enterprises will still be very wary of putting their data on these multi-tenant platforms. I feel that even if there is encryption, that's not going to be enough. This screams for virtual private LLM stacks for enterprises (the only way to fully isolate).


We have a cloud offering at https://trypromptly.com. We do offer enterprises the ability to host their own vector database to maintain control of their data. We also support interacting with open source LLMs from the platform. Enterprises can bring up https://github.com/go-skynet/LocalAI, run Llama or others and connect to them from their Promptly LLM apps.

We also provide support and some premium processors for enterprise on-prem deployments.


But in order to generate the vectors, I understand that it's necessary to use OpenAI's Embeddings API, which would grant OpenAI access to all client data at the time of vector creation. Is this understanding correct? Or is there a solution for creating high-quality (semantic) embeddings, similar to OpenAI's, but in a private cloud / on-premises environment?


Enterprises with Azure contracts are using the embeddings endpoint from Azure's OpenAI offering.

It is possible to use Llama or BERT models to generate embeddings using LocalAI (https://localai.io/features/embeddings/). This is something we are hoping to enable in LLMStack soon.
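
Since LocalAI exposes an OpenAI-compatible API, an on-prem embeddings call can reuse the same client code; a minimal sketch, assuming a LocalAI instance on localhost with an embeddings-capable model loaded (the model name is a placeholder):

    # Sketch: generate embeddings against a local LocalAI endpoint instead of OpenAI.
    from openai import OpenAI

    client = OpenAI(base_url="http://localhost:8080/v1", api_key="not-needed-locally")

    result = client.embeddings.create(
        model="bert-embeddings",  # placeholder; whatever model the LocalAI instance serves
        input=["Quarterly revenue grew 12% in EMEA."],
    )
    print(len(result.data[0].embedding))  # dimensionality of the returned vector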


Sentence-BERT is at least as good as OpenAI embeddings. But I think more importantly, the Azure OpenAI API is already SOC 2 and HIPAA compliant.
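
For reference, running Sentence-BERT fully on-prem is a few lines with the sentence-transformers library; the model choice here is just a common default, not a recommendation:

    # Sketch: local embeddings with sentence-transformers; nothing leaves the machine.
    from sentence_transformers import SentenceTransformer

    model = SentenceTransformer("all-MiniLM-L6-v2")  # small, widely used example model
    vectors = model.encode(["Customer churn analysis, Q3 draft"])
    print(vectors.shape)  # (1, 384) for this particular model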


> Enterprises can bring up https://github.com/go-skynet/LocalAI, run Llama or others and connect to them from their Promptly LLM apps

So spin up GPU instances and host whatever model in their VPC, and it connects to your SaaS stack? What are they paying you for in this scenario?


> is going to put a lot of enterprises at ease and help them embrace ChatGPT in their business processes.

Except many companies deal with data of other companies, and these companies do not allow the sharing of data.


Usually that’s not a problem; it just means adding OpenAI as a data processor (at least under ISO 27017). There’s a difference between sharing data for commercial purposes (which is usually verboten) and sharing it for data-processing purposes.


At the corp I work for, ChatGPT (even Bing) is blocked at the firewall. Hopefully now we'll be able to use it.


I've been maintaining SOC2 certification for multiple years, and I'm here to say that it's largely performative and an ineffective indicator of security posture.

The SOC2 framework is complex and compliance can be expensive. This can lead organizations to focus on ticking the boxes rather than implementing meaningful security controls.

SOC2 is not a good universal metric for understanding an organization's security culture. It's frightening that this is the best we have for now.


Will be doing a Show HN for https://proc.gg, a generative AI platform I've built during my sabbatical.

I personally believe that in addition to OpenAI's offering, the ability to swap to an open source model e.g. Llama-2 is the way to go for enterprise offerings in order to get full control.


Azure's ridiculous agreement likely put a lot of orgs off. They also shouldn't have tried to "improve" upon OpenAI's APIs. OpenAI's APIs are a little underthought (particularly fine tuning), but so what?


> we quickly learned how sensitive enterprises are when it comes to sharing their data

"They're huge pussies when it comes to security" - Jan the Man[0]

[0] https://memes.getyarn.io/yarn-clip/b3fc68bb-5b53-456d-aec5-4...



