
Microsoft acquires deep learning startup Maluuba - damvigilante
https://blogs.microsoft.com/blog/2017/01/13/microsoft-acquires-deep-learning-startup-maluuba-ai-pioneer-yoshua-bengio-advisory-role/
======
thewhitetulip
MS has always shown that they can do research; HoloLens is one such product,
and the recent ML enablement spree is another example. It is a good thing that
they are going beyond their comfort zone of milking Windows. What will be
interesting is to see whether their efforts make any difference to the status
quo. I say this after using AzureML, which I liked. There is nothing else on
the internet that lets you build an ML model without knowing a programming
language! It is a web app: you put your data in, click a few buttons, and it
generates Python or R code for you. Just brilliant.

~~~
dforrestwilson1
Thanks for the comment I had not seen this before.

[https://azure.microsoft.com/en-us/services/machine-learning/](https://azure.microsoft.com/en-us/services/machine-learning/)

^Basic version is free to use!

~~~
thewhitetulip
Yes, the basic version is free to use, and it truly is an amazing application.
MS feels like a company with a direction :-)

Had they had this direction ten years ago, Android would probably not have
existed!

------
roymurdock
More interesting info on Maluuba's 2 actual, recently-released datasets:
[http://datasets.maluuba.com/](http://datasets.maluuba.com/)

Their "News QA dataset" contains 120k Q&As collected from CNN articles:

 _Documents are CNN news articles. Questions are written by human users in
natural language. Answers may be multiword passages of the source text.
Questions may be unanswerable.

NewsQA is collected using a 3-stage, siloed process. Questioners see only an
article's headline and highlights. Answerers see the question and the full
article, then select an answer passage. Validators see the article, the
question, and a set of answers that they rank. NewsQA is more natural and more
challenging than previous datasets._

Their "Frames" dataset contains 1369 dialogues for vacation scheduling:

 _With this dataset, we also present a new task: frame tracking. Our main
observation is that decision-making is tightly linked to memory. In effect, to
choose a trip, users and wizards talked about different possibilities,
compared them and went back-and-forth between cities, dates, or vacation
packages.

Current systems are memory-less. They implement slot-filling for search as a
sequential process where the user is asked for constraints one after the other
until a database query can be formulated. Only one set of constraints is kept
in memory. For instance, in the illustration below, on the left, when the user
mentions Montreal, it overwrites Toronto as destination city. However,
behaviours observed in Frames imply that slot values should not be
overwritten. One use-case is comparisons: it is common that users ask to
compare different items and in this case, different sets of constraints are
involved (for instance, different destinations). Frame tracking consists of
keeping in memory all the different sets of constraints mentioned by the user.
It is a generalization of the state tracking task to a setting where not only
the current frame is memorized.

Adding this kind of conversational memory is key to building agents which do
not simply serve as a natural language interface for searching a database but
instead accompany users in their exploration and help them find the best
item._
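
To make the contrast in the quote concrete, here is a minimal sketch in Python. It is a toy illustration, not Maluuba's method: the class names, the `dates` slot, and the rule "start a new frame when a slot gets a conflicting value" are all assumptions made for this example. The real frame tracking task involves predicting which frame each utterance refers to, which this sketch does not attempt.

```python
from dataclasses import dataclass, field


@dataclass
class SlotFillingTracker:
    """Memory-less baseline: one set of constraints, overwritten in place."""
    state: dict = field(default_factory=dict)

    def update(self, slot, value):
        self.state[slot] = value  # a new value replaces the old one
        return self.state


@dataclass
class FrameTracker:
    """Keeps every set of constraints (frame) mentioned so far.

    Hypothetical heuristic: a conflicting slot value forks a new frame
    instead of overwriting the current one.
    """
    frames: list = field(default_factory=list)
    active: int = -1  # index of the current frame, -1 while empty

    def update(self, slot, value):
        if self.active == -1:
            # First constraint ever mentioned: create the first frame.
            self.frames.append({slot: value})
            self.active = 0
        elif self.frames[self.active].get(slot, value) != value:
            # Conflict: copy the current frame and change only this slot.
            new_frame = dict(self.frames[self.active])
            new_frame[slot] = value
            self.frames.append(new_frame)
            self.active = len(self.frames) - 1
        else:
            # New slot (or same value again): extend the current frame.
            self.frames[self.active][slot] = value
        return self.frames[self.active]


# The Toronto/Montreal example from the quote, with a made-up "dates" slot.
baseline = SlotFillingTracker()
baseline.update("destination", "Toronto")
baseline.update("destination", "Montreal")  # Toronto is lost here

tracker = FrameTracker()
tracker.update("dates", "August")
tracker.update("destination", "Toronto")
tracker.update("destination", "Montreal")  # Toronto frame is kept
```

After these calls the baseline holds only the Montreal constraint, while the frame tracker holds two frames (Toronto and Montreal, both with the same dates), which is what a comparison between destinations would need.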

\---

Can anyone with experience in ML/AI comment on how novel/complex these
projects are, and how expensive it would be to build out these datasets? Would
be interesting to see what it takes to publish a few datasets trained on 20
day conversations between real people, and get acquired by
Microsoft/Apple/Google.

~~~
webmaven
_> how expensive it would be to build out these datasets?_

Assuming you have the expertise necessary to design and run the process
in-house, the major expense is going to be compensating the humans in the loop,
which can add up quickly.

This is why organizations that already have access to large datasets have such
a huge advantage.

I think that one of the reasons we're seeing such a rush to deploy chatbots is
that even a minimally-useful bot will quickly start accumulating _extremely_
useful (and very clean) training data.

There is a lot of noise being made about "democratizing AI", but as long as
the best results require a lot of training and huge amounts of training data,
access to that data will remain the bottleneck.

Look for progress on 1-shot and 0-shot learning to get a better feel for how
much progress is made on real democratization.

~~~
roymurdock
Thanks for the pointers - I'm looking forward to seeing how AI/ML is brought
to market, as we hear a lot about research but not as much on the product side
just yet. Sounds like MSFT will be pushing in this direction as well with the
Maluuba team:

 _Last fall, we formed the Artificial Intelligence and Research organization,
bringing engineering and research closer together to accelerate the pipeline
from cutting-edge research to product development. Maluuba, too, has closely
aligned its research and engineering teams, and we’re looking forward to
learning from their experiences as well._

------
clickok
Here's a presentation from one of their researchers, Harm van Seijen[0], to
give you an idea of what sort of work they do.

Applying reinforcement learning to dialogue systems seems incredibly
difficult, but if Maluuba (or others) can get a handle on the problem it would
not be unreasonable to expect another revolution in the vein of applying
convolutional nets to vision.

0:
[https://www.youtube.com/watch?v=s-8WkKhHYqA](https://www.youtube.com/watch?v=s-8WkKhHYqA)

------
Rainymood
Interesting ... honest question: what kind of business models do these ML
firms have?

------
redtrackker
Not sure if this was a successful exit. LinkedIn shows the employee count
being cut in half over the last little while. Also, 2 of the other cofounders
left?

~~~
jpantony
Out of curiosity what made you think 2 cofounders left?

------
EternalData
Go Montreal! woot.

~~~
rocky1138
I thought this company was located in Waterloo.

~~~
mindcruzer
Yeah, last time I checked it was at Columbia & Phillip, right next to UW.

~~~
EternalData
One of the co-founders is pretty big in the Montreal startup scene.

