Wolfram Alpha debuted in 2009. Just imagine what they could have been if they'd opened up "skill" creation to the public.
Given that devices equivalent to Alexa or Google Home could pretty much be launched on Kickstarter, is the ecosystem large enough to allow an open-source or more public version of the AI assistant? Is it needed?
Wolfram Alpha actually has a pretty significant collection of “widgets” (http://www.wolframalpha.com/widgets/). These are commonly embedded on math and science related pages. The more clever widgets take full advantage of Wolfram Alpha’s NLP abilities.
As to why they didn’t have Alexa-style voice skills, I’m not sure speech recognition is a core competency of the company.
The work of a scholar I follow has covered this topic a little. Chris Blattman recently co-wrote an op-ed [0] about his data covering sweatshops in Ethiopia. They found that it wasn't always so clear that these dangerous industrial jobs lifted people out of poverty.
In a really neat trial, they went to factory owners and randomized the acceptance for job applicants, tracking the outcomes of people who found employment elsewhere. Chris originally made his name studying the lives of former child soldiers in Uganda. I recommend his mostly-professional Twitter feed [1].
This seems pretty streamlined and well-documented, although I wonder if they could have delayed the Facebook login. Convincing users to do your data cleaning for you is quite the trick, like the Smithsonian.
I wonder, what more underhanded versions of data collection are possible? Maybe providing the opportunity to clean inline as users search the archive?
> It reinvents how people interact with artificial intelligence by allowing you to teach it instead of guessing what it'll understand.
Putting out an app like this seems like a great idea, if only to have a platform to start collecting and cleaning data (for your MTurkers, relatives, and bored employees). Speaking of which, the privacy policy link goes nowhere...
On a related note, does anyone have details on the penalties for not having a privacy policy on a Google Play app? They seem to be a little more insistent on having one than Apple.
I have been going down the rabbit hole of copyright, fair use, and the Google Books Settlement recently. This article is a great summary including a lot of the peripheral issues, but the "2003 law review article" linked in TFA is nigh unreadable to me, compared to the actual legal opinions and briefs[0].
They are a couple of fascinating documents. The Authors Guild seems gobsmacked by the final ruling, and so am I. Perhaps the SCOTUS was correct to turn down hearing the case, if only to let the issue settle a little more, but it really feels like it's likely to be overturned in the near future.
There are some interesting tidbits in the opinions:
1) In the definitive ruling, the judge decides that the harm done to the market for the books is negligible, or overcome by the transformative "purpose" of the usage ("purpose" is significant because most examples of fair use include some type of new creative "expression"). This is surprising to me.
2) Google Books is ruled fair use in part because the book descriptions (and snippets?) are metadata describing the books, information that should not be controlled by the authors.
The final ruling in Authors Guild v. Google was really just a footnote to the whole saga, though. The article barely mentions it.
The article focuses on the failure of the class action settlement, due to the "perfect being the enemy of the good" (librarians and individual authors objected to the settlement because they hoped Congress would pass a law to free orphan works, but what actually happened is that no progress has been made).
The battle lines around orphan works are interesting because they don't really follow the same contours as a lot of the other disagreements about copyright law. From what I've seen, the main opponents of freeing orphan works are individual content creators and the organizations that purport to represent them, like ASMP.
The fear, I gather, is that large content users won't make much of an effort to contact rights holders and will use orphan works legislation to just take the works for free.
And this is one reason why I believe that copyright should require a minimal-fee registration every ten years. If you keep your registration current, there is no effort required to contact you. If you can't be bothered to do that, your copyright clearly isn't worth much to you and expires. Either way, the status of the work is unambiguous.
In the case of something like a photograph, that means a minimal-fee registration on each photograph every 10 years. This is also exactly the sort of effort that opponents of orphan works legislation feel that large content corporations will take advantage of when all the little guys forget to renew.
I'm actually mostly for orphan works legislation but I understand the perspective of the opponents.
Wouldn't it be easy to have a provision for bulk registration?
Like, "renew the photographs with SHA's .....", and then providing a simple tool to list all the SHA's of all files with a given extension in a directory?
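Something like this would do for the listing tool. It's a minimal sketch in Python, assuming SHA-256 as the hash (the comment only says "SHA") and using made-up names (`sha256_of_file`, `list_hashes`); it is not any real registry's tool or format.

```python
import hashlib
from pathlib import Path


def sha256_of_file(path: Path, chunk_size: int = 1 << 16) -> str:
    """Hash a file in chunks so large photos don't need to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def list_hashes(directory: str, extension: str) -> list[tuple[str, str]]:
    """Return (hex digest, path) for every file under `directory` with `extension`.

    The extension match is case-insensitive and the leading dot is optional,
    so "jpg", ".jpg" and ".JPG" all select the same files.
    """
    suffix = extension if extension.startswith(".") else "." + extension
    results = []
    for path in sorted(Path(directory).rglob("*")):
        if path.is_file() and path.suffix.lower() == suffix.lower():
            results.append((sha256_of_file(path), str(path)))
    return results
```

Calling `list_hashes("photos", "jpg")` gives you the list you would paste into a (hypothetical) bulk renewal form. Note that a content hash only identifies one exact file: a resized or re-encoded copy of the same photograph hashes differently, so a real scheme would need to say which file counts as the work.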
> 2) Google Books is ruled fair use in part because the book descriptions (and snippets?) are metadata describing the books, information that should not be controlled by the authors.
It would be very interesting if, instead of showing verbatim snippets from books, there was an appropriate, high quality machine generated summary. This would be a genuine transformation of the source material.
[As a layperson] the most convincing argument the Authors Guild makes is given I think in their SCOTUS petition: that Google's "fair use" is sidestepping a legitimate business opportunity for the rights holders. Books are not just the paper they're printed on, and authors already as a matter of course hold the rights to plays, movies, etc adapted from the text. Particularly when there is no new expression, it seems to me you are just getting away with not licensing the data. This argument is one of the least-covered in the briefs, however. So.
If I were to make a bar bet, based on my limited knowledge, I would say that any bolder attempt to use mass digitized books for a "transformative purpose" like a chatbot or AI would not pass scrutiny (which kinda sucks, because that would be awesome). That's what I mean by overturned -- perhaps the current GB usage is fine because of point 2) above.
Of course, like many Court issues, the best solution (as yohui alludes above) seems to be to have Congress fix things with real law, such as creating a compulsory licensing scheme like in music.
A legitimate business opportunity affects 1 part of the 4-factor fair use test. There are plenty of other cases where things were found to be fair use despite a market existing. I've met a lot of people who have the same gut feeling that you do, but the legal history is much more complicated than that.
"If Google could find a way to take that corpus, sliced and diced by genre, topic, time period, all the ways you can divide it, and make that available to machine-learning researchers and hobbyists at universities and out in the wild, I’ll bet there’s some really interesting work that could come out of that. Nobody knows what,” Sloan says. He assumes Google is already doing this internally. Jaskiewicz and others at Google would not say."
For books that are scanned, but with no extra licensing, would Google be allowed to do anything with the data? Create a very delocalized n-gram set? Use it as the "test" set (not even cross-validation, where it might influence hyperparams) for a ML algorithm?
Edit: would love to know where Google's authorization derives from, with the ngram set. Somewhere in the judge's orders? A negotiated fee with the Authors Guild?
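To make the n-gram question concrete, here is a minimal sketch of what a pooled ("delocalized") n-gram table looks like. Assumptions are mine: a crude regex tokenizer and the hypothetical names `ngram_counts` and `pooled_counts`. This is not how Google builds its ngram data, just the general idea of keeping counts across the whole corpus rather than per book.

```python
import re
from collections import Counter
from typing import Iterable


def ngram_counts(text: str, n: int = 2) -> Counter:
    """Count n-grams (as tuples of lowercase words) in one text."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def pooled_counts(texts: Iterable[str], n: int = 2) -> Counter:
    """Sum n-gram counts over many texts, discarding which text each came from."""
    total = Counter()
    for text in texts:
        total.update(ngram_counts(text, n))
    return total
```

The output is just frequencies of short word sequences summed over the corpus, with no record of which book or page they came from.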
Ok, here is one of the important opinions in the Google Books settlement, by Judge Chin in 2013 [0]. He basically says (paraphrasing), "I'm going to assume Google has violated copyright by creating digital copies and serving them. But it's fair use, because the new products are transformative".
For example, re:ngrams
"""
Similarly, Google Books is also transformative in the sense that it has transformed book text into data for purposes of substantive research, including data mining and text mining in new areas, thereby opening up new fields of research. Words in books are being used in a way they have not been used before. Google Books has created something new in the use of book text -- the frequency of words and trends in their usage provide substantive information. [...]
On the other hand, fair use has been found even where a defendant benefitted commercially from the unlicensed use of copyrighted works
"""
Data-mining, indexing, quotations, meta-data, have all been extracted before. It seems more like the degree to which Google are/want to do it, rather than the idea to do it?
If I get the same treatment as Google before the law then doesn't this mean I can copy any whole corpus of work, use it, recopy it, share it, make derivative works, etc., all as long as at the end I write something new - a music track inspired by their work, say? That appears to be what the judge is saying when applied to other works??