I've done some work with corpus linguistics and quantitative linguistics, and large parts of these disciplines essentially are about facts derived from books in some manner. Modern approaches tend to involve machine learning, deep neural networks and other things fashionable on hackernews, but in general that's an old, traditional area that was working on facts derived from books for decades before the "ML era".
To work on facts derived from books, we source all kinds of books and other written language, such as newspapers. Some publishers and authors are cooperative and helpful for such research; some are uncooperative and intentionally make working on their sources difficult. But even in cases of disagreement and conflict there's no "IP war": the conflict in our case tends to be about practical convenience of access, not about IP, because they don't really have a leg to stand on in claiming a copyright violation. They hold the copyright on the original text, which gives them certain exclusive rights, and there's a bunch of intermediary data that we can't make available to the public without their permission. But these rights don't extend to facts derived from that text. We legally don't need their permission to work on, analyze, transform, publish and use stuff based on facts in the text or facts about the text, and we can do that openly even if they've explicitly made it clear that they don't want us to. That's nothing new; that's established law that probably predates modern computers.
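To make "facts about the text" concrete, here is a minimal sketch of the simplest kind of derived data in corpus work: a word-frequency table. The function name `word_frequencies` and the sample sentence are made up for illustration, and real corpus pipelines use far more careful tokenization than this regular expression.

```python
import re
from collections import Counter


def word_frequencies(text: str) -> Counter:
    """Count how often each lowercase word occurs in the text."""
    # Crude tokenizer (an assumption for this sketch): runs of letters and apostrophes.
    tokens = re.findall(r"[a-z']+", text.lower())
    return Counter(tokens)


# Hypothetical sample text standing in for a book.
sample = "The cat sat on the mat. The mat was flat."
freqs = word_frequencies(sample)

# The two most frequent words and their counts.
print(freqs.most_common(2))
```

The resulting table records how often each word occurs but not the order of the words, so the original sentences cannot be read back out of it.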
Can an AI commit copyright infringement? BERT probably "knows" that Cthulhu is a giant thing evoking squids, tentacle and non-orthonormic dimensions. These are facts based on books, but you can produce copyright infringement based on those facts. It is called "producing a derived work".
In the past years I never managed to get anyone with legal knowledge interested in what they saw as a totally impossible scenario: the idea that AI could one day produce original work by learning its craft, like humans do, from copyrighted works. Their criterion was "if you fed copyrighted work into an algorithm to produce a new work, then that's a derived work".
Humans are somehow imbued with a magic property that allows them to read WH40K books, watch Alien and Predator movies, then produce the Starcraft universe, and have it count as original work.
We do have a philosophico-legal discussion to have there. And way overdue, if I may. The state of copyright is already late in acknowledging the internet; DL-generated work will be even more of a conundrum for it.
I feel like this is speculation. Do you have any citations? It seems to me that in this fuzzy area an AI will be judged identically to a human. You give an example where the StarCraft universe is created and considered original work, but there are many cases where a human learns their craft from copyrighted work and creates a derived work; fanfic is a huge genre of examples. I suspect that in the legal arena the nature and content of the new work will be far more influential in the status of the copyright than the details of how the new work was created.
So, while I think it's likely that something generated by an AI that looks like original work will be considered original work, I think a question that is less clear, and much more important, is whether the AI itself would be considered a derived work. In some ways, it can be argued that an AI is a transformation of the original representation, and that substantial portions of the original work are (or can be) maintained within the AI itself, just as a work can be transformed by a compressor but still be considered to retain its copyright. However, afaik this question remains untested.
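The compressor analogy can be illustrated with a small sketch, assuming Python's standard `zlib` module and a made-up placeholder text. The compressed bytes look nothing like the original, yet the original comes back bit for bit, which is the intuition behind treating a compressed file as still carrying the work. Whether a trained model is comparable is exactly the open question, since a model is not a lossless encoding of its training data.

```python
import zlib

# Placeholder text standing in for a copyrighted work (hypothetical sample).
text = ("It was a dark and stormy night. " * 20).encode("utf-8")

# Lossless transformation: the bytes differ and are shorter...
compressed = zlib.compress(text)

# ...but the original is fully recoverable from them.
restored = zlib.decompress(compressed)

print(len(text), len(compressed), restored == text)
```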
IANAL and all.
Also, I think:
> BERT probably "knows" that Cthulhu is a giant thing evoking squids, tentacle and non-orthonormic dimensions.
Is highly debatable.
One is a natural person simply by the mere fact of having been born. And that's what makes all the difference.
This idea is pretty much the basis of large swathes of jurisprudence across the world, really.
AI, as such, is legally speaking no different from a simple pencil when it comes to writing a book. It's a tool through which a natural person creates a creative work thus establishing a copyright on the part of the natural person.
See, what most people fail to see is that copyright isn't tied to the creative work; it's tied to its creator. Hence why copyright ceases to exist some arbitrary amount of time (20, 40, 70 years) after the creator - a natural person - has died.
So, when you say "a neural network acquires copyright by itself when it generates a new creative work", you are forced to consider whether a neural network is a "person". Which is a can of worms in itself. (consider animals as persons - case: monkey selfie)
I think the more interesting questions are whether the operator of the neural network is the legal author of its creations and whether such creations satisfy the creativity requirements for copyright.
I think the operator would be the author of the work, similar to how the operator of a camera or word processor is the author of works created by those tools. However I think in some cases the work may not meet the creativity requirement.
> “[T]he requisite level of creativity is extremely low.” Even a “slight amount” of creative expression will suffice. ... An author’s expression does not need to “be presented in an innovative or surprising way,” but it “cannot be so mechanical or routine as to require no creativity whatsoever.”
Exactly. Copyright is very murky in that respect. The basic notion of "creativity" is willfully vaguely defined to ensure that there's a universal maxim.
For instance, did you know that databases are copyrightable? Even when their constituent parts consist of uncopyrightable facts? Copyright law considers the database as a "collection" or a whole, and so the entire collection can be seen as a creative work. But copyright only applies to the whole, not the constituent parts.
Another example: suppose you digitize an ancient piece of pottery by making a digital photograph. Have you then created a new creative work of art? Some would argue you did. Why? Because you didn't make a 1:1 copy of the pottery by creating a new physical pot with similar materials. You created an image using a particular mechanism, introducing elements such as lighting, color, contrast,... that may give your image an original element.
The latter example is actually a legal problem for digitization programs of cultural collections. Institutions hire a photographer to digitize a collection, but then discover that the images are pretty much unusable because the photographer is able to enforce their own copyright, i.e. demand a licensing fee every time someone wants to use or download an image. Which implies that institutions are also forced to add legal provisions in any contracts pertaining to the transfer of rights.
Hence why copyright law is rife with exceptions and exemptions. For instance, did you know that any image made by the U.S. Government automatically ends up in the public domain?
The problem with copyright is that digital technology is innately the act of creating copies. Each time I send a request over the Internet, I basically create a copy of the 1's and 0's stored at the other side. The basic tenets of copyright don't concern themselves with conceptual models and higher abstractions. They go back to the fact that a string of 1's and 0's was created on a physical carrier and then copied over to another carrier.
But that's not how humans work: we don't really apply notions of copyright to the physical representation on a disk, we apply them to the ephemeral, assembled representation on our screens and displays. This mismatch is what creates a ton of tension in this space.
Then, is a work produced by a tool that is found to violate copyright also judged to violate that copyright?
Would a reasonable analogy be a pencil that has the works of H.P. Lovecraft etched onto its surface?
Is this speculation to say that Starcraft is protected by copyright as an original work?
Is it speculation to say that it borrows heavily from existing universes?
There is not that much to discuss. If you leave vested interests out, that is. Future generations will see copyright in the same way we look at feudalism or slavery today. Assuming we avoid the future pointed to by Idiocracy.
I think the issues come from the fact that copyright law really fails to represent the realities of creativity in humans. As you point out, the laws don't really address the fact that often the things we create are based on all of our experiences and consumption of creative works, yet a machine which can produce the same process may fall foul of the exclusive rights of reproduction and adaptation. Is it merely the fact that humans have consciousness which means that we are able to do this without violating copyright law?
At least in the US there is more flexibility around derivative works, which gives creators of derivative works some avenue to enforce exclusive rights over their creations, or at least avoid claims from original rights holders. Here in the UK we really lack such flexibility, with the only exceptions along the same lines being 'fair dealing', which is not really a fair comparison because it requires the derivative creator to jump through a bunch of hoops.
Having said that, I'm not sure derivative works are really a suitable legal definition for AI-created works, but until we can have a conversation about the role of originality and creativity and the role of consciousness in those processes, this imperfect definition will probably continue to be applied to those works.
Lawrence Lessig writes a lot about this sort of thing, if you are interested.
EDIT: Also the academic Omri Rachum-Twaig recently wrote a book called 'Copyright Law and Derivative Works: Regulating Creativity' which also covers a lot of issues that are interesting, such as the disconnect between the psychology of creativity and the structure of copyright law.
Pretty sure AI and ghost writers fall into the same legal situation.
A recipe cannot be protected by copyright. This is one of the reasons that online recipe pages have turned into long personal stories with (incidentally) a recipe at the bottom.
A recipe book, however, does have protection -- due to the creative work found in organizing the recipes, choosing which ones to include and to put near each other, and any creative work associated with introductions, photos, or other new expression.
That means that legally, you could buy a ton of recipe books, and then make your own by copying and pasting just the ones you like. You could use the recipes unchanged, but you can't reuse the photos or any descriptive text, or anything but the bare recipe.
Similar logic should apply to the publication or reuse of bare facts.
Of course, law is complicated and nuanced, and lawyers/judges/legislators don't always understand new technology well enough to apply existing principles properly to new worlds.
And dish names might be protectable as trademarks, especially if they are not merely descriptions of the foodstuff.
The ingredients list and preparation steps, or any other text which is purely functional and without a creative component, is what is not protectable.
Interestingly, computer code is generally copyright protected, despite being literally steps a machine follows to perform a task. Courts have ruled that, because there are so many different ways to express any software of nontrivial size, the way the code is written (including comments, variable names, organization, etc.) represents sufficient creative expression to be protected.
I'm actually somewhat surprised that binaries still get the protection, especially since with modern compiler optimizations, it seems like any creative expression in your code would be gone by the time the compiler was done with it.
But hey, as I said above, law is strange.
A mechanical transformation of the recipe (say, converting it to all caps, or changing the font) will still be protected as a derivative work, just like the binaries for a computer program are protected as a derivative work.
A re-phrasing of the recipe in someone else's words which results in the same dish is not protected, just like a re-implementation of a piece of software is not protected.
If I removed comments and/or translated the code in a literal manner, I'm sure it would still qualify as plagiarism in school. Are you implying that it would not or should not be a copyright violation in that case? I have no idea legally, but I would assume the worst.
> If I removed comments and/or translated the code in a literal manner, I'm sure it would still qualify as plagiarism in school.
Plagiarism and copyright violation have nothing to do with each other. Academics routinely copy large segments of text and rely on fair use exemptions to avoid breaching copyright. Meanwhile, copying a couple of innocuous sentences can rise to plagiarism when it would not be a substantive copyright violation.
That is copyright infringement. See SAS Institute, Inc. v. S&H Comp. Sys., 605 F. Supp. 816 (M.D. Tenn. 1985)
Could some pro-IP person help me reconcile the following statements:
1. If there was no copyright, "nobody" would write books/create art, thus we absolutely need to have copyright.
2. Recipes have no copyright, but we are flooded by old and new recipes all the time.
If you claim new recipes are somehow less work than a daily comic strip, thus needing less protection, please consult a chef of a Michelin-starred restaurant about the work needed to come up with new recipes.
One of the most annoying features of a cooking site: a ten-page story about your childhood in Michigan is not necessary for a peach pie recipe.
At the scale of Google Books, we're not talking about using copyrighted recipes unchanged or copying and pasting. If you buy a ton of cookbooks, scan them, make them searchable by the world and derive new information about cooking from analyzing them, can you profit from the derived knowledge? Should that change if the publisher or author made an active decision not to make their content available electronically, or asked you to exclude them from your analysis?
You could then make your recipe book searchable, sure.
Yes, technically, you could patent a recipe, provided that it was sufficiently novel and sufficiently non-obvious to a practitioner of ordinary skill in the art.
An ordinary recipe that only takes well-known ingredients and combines them in well-understood ways, applying well-known techniques is going to have a difficult time passing either the novelty or non-obviousness tests.
I would anticipate that, for your recipe-based patent application to prevail, your recipe would need to include some preparation steps that are themselves new and unusual: for example, a novel method of processing an ingredient, or a novel way of combining two ingredients that relies on some previously unexplored aspect of their chemistry (such as making use of the small ash content of coconut milk products).
Oh dear me, it was a utility patent.
They spent years trying to get it approved and ultimately abandoned the application (presumably realizing they were never going to get it).
And how many introductions to programming compare a program to a recipe? Yet programs are copyrightable. If you're using logic, you already have lost.
I found an amusing web page today - someone wrote a book on programming for Windows, and also has a website, and on a particular page, they have a helpful snippet of code, which is really just a wrapper/reference to a Windows system call. Literally one line, no additional logic. However, they have ~5 lines of copyright declaration above it, saying it is theirs and you can only use it if you buy their book.
I thought this was really funny, given that they are essentially claiming a portion of the API defined by Microsoft just because they wrote a (pretty much the only possible) line that accesses it. It seems to be a controversial area of copyright recently.
I'm sure if you dig around enough you'll find really simple recipes that claim to be covered by copyright.
Code can be written deliberately in a kind of unfinished, unrefined way, and let those who would use it go through a similar process.
But the GPL seems significantly different from the absence of copyright.
In my opinion, it is emerging very foolishly and nonsensically, but it is still emerging, and so much of it is not completely settled.
Oracle v. Google, for example, raised and then incompletely answered some questions like these.
That's what a guy who made a book of trivia thought when he sued Trivial Pursuit for using all of his facts. Turned out, he was wrong.
Worth's reliance on cases involving infringement of one directory by another, see, e.g., Leon v. Pacific Tel. & Tel. Co., 91 F.2d 484 (9th Cir. 1937) (telephone directories), or one list by another, see, e.g., Eckes v. Card Prices Update, 736 F.2d 859 (2d Cir. 1984), is not persuasive. In Leon, plaintiff's entire selection of names and numbers were copied and listed in numerical instead of alphabetical order. Leon, 91 F.2d at 484-85. In Eckes, the plaintiff published a list of 18,000 common baseball cards and selected 5,000 of those cards as "premium" cards; the defendant's listing selected substantially the same 5,000 cards as "premium" cards. Eckes, 736 F.2d at 860-61.
Current law restricts copying but doesn't protect ideas or facts.
Stupid laws are reaching the tipping point at which they end.
EU database right can and does.
While I'm not trying to suggest that your ultimate conclusion might not turn out to be correct, especially in the context of US legislation, that's far from settled and I expect to see quite a few lawsuits whenever this hypothetical actually affects a major player with some market power.
It kinda makes sense, Guinness Book of Records, being an example.
There probably will be such an attempt, making it at least as far as hearings in Congress. (Hopefully no further - hopefully no attempted legislation.)
We should be rejoicing at the ability to have an assistant that digests the world's libraries, not worrying that someone might make a profit off of it without permission.
I think that's worth worrying about. As well, if Google, in their drive to monetize content that they don't own, causes the various publishers and IP owners to go on the legal attack, any other option/startup will be quickly dissuaded from building a similar, or better, assistant.
They're not going to compete with Google in software development, but Google isn't the sole gatekeeper of the book scans.
What does the author mean by "CRS"? Coordinate Reference Systems?
Here's a clear example of ML improving search: voice search. You might not use it, but it's extremely popular in India and other developing markets. "G search has gotten worse as they’ve focused on recency in the index, gotten more tolerant of synonyms, and gotten less strict about quoted phrases." None of these are "machine learning" - these are product decisions. If you wanted to say "Google is not an ML company," you'd point to the outsized human influence on search rankings (see, e.g. https://static.googleusercontent.com/media/guidelines.raterh...).
Google Maps is extremely valuable as a proprietary dataset, and we're all making it better whenever we do a captcha, doing object recognition from streetview. So are YouTube, News, Translate, and so many others.
There are so many papers detailing practical metric improvements from ML: https://arxiv.org/abs/1810.09591 is one ("Replacing the manual scoring function with a gradient boosted decision tree (GBDT) model gave one of the largest step improvements in homes bookings in Airbnb’s history, with many successful iterations to follow", and deeper neural nets offered significant improvements after that).
This is just tiresome. Wasn't piracy supposed to doom us all?
It's regrettable to see Google gaining more power, but the copyright cartel doesn't have a solid moral standing from which to complain.
Then you have simplistic and ridiculous statements like
> When you free something that belongs to someone it’s called stealing.
As if anything that was ever invented, written, done or created by a human being was done so in complete isolation and not based on what others have done before.
Piracy is what drove YouTube's monetization scheme, and there is a race to the bottom among musicians (which is certainly not only caused by piracy). So it may not be doom, but it's at least a PITA.
Example: Asimov saw computers, and robots, etc. but surprisingly (in retrospect, given his brilliance) failed to see networks.
Soldier on, till death and beyond! When did we become such suckers for short term results at the expense of the future...
All we do is refine the materials and building and designs based on the same core principles (this optimization has yielded decent but not paradigm-changing results, unlike what a "new S-curve" would entail). The perceived increase in "lasting power" of our portable devices compared to 40 years ago was largely due to Moore's Law, optimization on the consumption side of the energy equation, not the source.
The lithium ion battery has incrementally improved energy density every year since its commercialization (see Figure 4):
Most breakthroughs or projected-breakthroughs that get Popular Science articles written about them are overstated or never materialize at all. But battery technology is improving over time. It's not improving at the pace of microelectronics, but hardly anything improves that quickly.
I'm really nitpicking the theoretical aspect here, I guess. Is Lithium-ion really different from a first-principles perspective? Is any battery technology not based on electrolyte principles?
See, when I look at a vapor engine, and compare it with an electrical engine, I really do have two different first principles driving motion, two different conversions of energy. When I look at a regular / convection oven (heated resistors) and compare it with a microwave oven, again two fundamentally different ways of heating a solid. Magnetic induction compared with thermodynamic heat conduction. X-ray compared with MRI. All of these are breakthroughs, using different first principles to complete the task.
I fail to see how battery technology is not all based on one and the same fundamental principle. Quoting your second link:
> “Batteries are electrochemical devices that store electrical energy by directly converting it to a chemical form.”
I'm not sure we were talking about the same thing, because I read your comment and it seems to fit my view; it actually substantiates it. Am I misunderstanding these concepts?
Edit: FWIW, I was a heavy daily user of portable music devices in the 1990s, and while new battery tech gave you an extra hour or so every iteration, none of it was life changing. It was a slow increment, not orders of magnitude, which is a sign that we're operating on the same principles, just with more efficiency. My point was that current battery life is fantastically aided by improvements on the consumption side, much more than on the source side. I'm not claiming there's none in the latter, not at all.
Edit 2: TL;DR: I believe there is no fundamentally new physics in "battery" (storing energy), it's been the same thing for centuries (and reportedly was invented but not used in Ancient times). Unlike many other technologies like engines, ovens, body imaging, etc. Please don't hesitate to teach me more.
That's why cordless saws and leaf blowers are practical now but weren't practical 40 years ago. Better batteries made them work. They didn't benefit from Moore's Law.
> If I’d published a non-fiction book in the last 100 years I’d put $10 right now into a class action to prevent this product from hitting the market.
Authors are one hundredth of a percent of the population. If we do create new ways for people whose entire lives and minds are derivative of millennia of civilization to own facts observed in the world around them, the primary funder and beneficiary of such a change would be an even smaller class of people who collect much more of the sweat of the author's brow than the author ever will. The proper response to this is voting any bums who vote for this out of office. If this doesn't work the next step is the guillotine.
> Google will force us to create a new format for information by removing the profitability from the existing one.
The fact that actual scarcity is giving way to plenty in no way suggests that we ought to fight to impose artificial scarcity for the dubious privilege of ensuring that leeches can keep profiting in order to keep a minority of the money filtering down to the people who do the actual work. Perhaps we ought to discover a way for everyone to profitably enjoy the greater bounty instead of glorifying working for a living.
> It’s not like they didn’t tell us they were doing this. Their mission statement was to ‘free the world’s information’. Small wonder they don’t understand privacy. In this case we’re talking about information that’s protected by IP rights. When you free something that belongs to someone it’s called stealing.
Our inherent emotional reaction to real scarcity based on the rivalrous nature of physical goods is a poor foundation to build a case for inventing new rights designed to divvy up the world for the benefit of the rich. I'm sick unto death of hearing proponents of new and inventive varieties of imaginary property describing circumvention of their imaginary rights as "stealing". There are no words in keeping with the dignity of this site that I could use to aptly describe my feelings for the author's words. People like him are emphatically the enemies of the people.
It's just really not that good (yet)...
There's lots wrong with this article.
> Google has only one valuable proprietary dataset
They own none of the map/street view data?
Google already answers questions using books.
And if you too want the world's book repository you can just download it, just illegally. There are big megapacks of it. It's way better data than Google Books; the only thing Google Books might beat you on is original documents for history.
But books are almost dead, maybe a decade?
ML with human assisted help will be able to pop out quality books quite easily. It'll still take a human, just it'll do in a month what took years.
Paper book format maybe is.
"Long form content" is what "book" means now and it is not going away.
But if I'm starting a map company, and I scan in and trace the roads in my competitors' maps? I'd say that's less clear cut - and may well be copyright infringement, even though I'm extracting facts from their publication and creating a new publication containing the same facts.
If I use an entire copyrighted book to train an AI, is it more like the first example, or more like the second?
I.e., anyone contesting book fact aggregation would already have had to sue and win against Wikipedia and Britannica before it.