For my part, one of the biggest realizations I had after many years of applying machine learning was that I got too caught up in the machine learning algorithms themselves. I was often way too eager to guess and check across different algorithms and parameters in search of higher accuracy. Fortunately, there are now automated tools that can do that for you.
However, the key piece of advice I'd give someone new to machine learning is not to get caught up in the different machine learning techniques (SVM vs. random forest vs. neural network, etc.). Instead, spend more time on (1) translating your problem into terms a machine can understand (i.e., how you define and generate your labels) and (2) feature engineering, so that the right variables are available for the machine learning algorithm to use. Focusing on these two things helped me build more accurate models that were more likely to be deployed in the real world.
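To make points (1) and (2) concrete, here is a minimal Python sketch, assuming a hypothetical transaction log of (customer, timestamp, product, amount) tuples and a hypothetical cutoff date. The label is defined only from what happens at or after the cutoff, and the features only from what happens before it, which is what keeps the label from leaking into the features. It is plain Python for illustration, not Featuretools' API.

```python
from datetime import datetime


def make_labels(transactions, cutoff, target_product):
    """Label each customer 1 if they bought target_product at or after cutoff, else 0."""
    labels = {}
    for customer, ts, product, _amount in transactions:
        labels.setdefault(customer, 0)
        if ts >= cutoff and product == target_product:
            labels[customer] = 1
    return labels


def make_features(transactions, cutoff, target_product):
    """Aggregate each customer's history strictly before cutoff into a few features."""
    features = {}
    for customer, ts, product, amount in transactions:
        f = features.setdefault(
            customer,
            {"n_orders": 0, "total_spend": 0.0, "bought_target_before": 0, "last_ts": None},
        )
        if ts >= cutoff:
            continue  # never look past the cutoff, or the label leaks into the features
        f["n_orders"] += 1
        f["total_spend"] += amount
        if product == target_product:
            f["bought_target_before"] = 1
        if f["last_ts"] is None or ts > f["last_ts"]:
            f["last_ts"] = ts
    # Replace the raw timestamp with "days since last purchase" (None if no history).
    for f in features.values():
        last = f.pop("last_ts")
        f["days_since_last"] = None if last is None else (cutoff - last).days
    return features


# Hypothetical toy data: (customer_id, timestamp, product, amount).
SAMPLE = [
    ("a", datetime(2017, 1, 1), "milk", 3.0),
    ("a", datetime(2017, 2, 1), "bread", 2.0),
    ("b", datetime(2017, 1, 5), "bread", 2.5),
    ("b", datetime(2017, 3, 1), "milk", 3.0),
]
CUTOFF = datetime(2017, 2, 15)
```

With the toy data, customer "b" gets label 1 for "milk" because of the purchase after the cutoff, while their features only reflect the earlier bread purchase.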
Feature engineering in particular has become a bit of a passion of mine since that realization. I currently work on an open source project called Featuretools (https://github.com/featuretools/featuretools/) that aims to help people apply feature engineering to transactional or relational datasets. We just put out a tutorial on building models to predict what product a customer will buy next, which is a good hands-on example for beginners to learn from: https://github.com/featuretools/predict_next_purchase
One example I have in mind was a contest where participants were given a series of satellite pictures and asked to write a classifier to detect icebergs and cargo ships (the two are quite similar). As someone else pointed out, trying to use classical computer vision and machine learning on these images will always have some error rate during identification. However, if we were able to extract the speed and trajectory of all objects in the pictures and combine them with AIS data, then finding which ones are ships, which ones are giant pieces of ice, and which ones are non-moving structures to be avoided becomes easy.
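A minimal sketch of that deterministic approach in Python, under loud assumptions: objects have already been detected and associated across successive images into tracks, positions are in a local metric grid (meters) rather than latitude/longitude, AIS reports are given as positions in the same grid, and the speed threshold and matching radius are made-up values, not tuned ones.

```python
import math


def speed_and_heading(track):
    """track: time-sorted list of (t_seconds, x_m, y_m) for ONE object.

    Returns (speed in m/s, heading in degrees; 0 = +y ("north"), 90 = +x ("east")),
    computed from the net displacement between the first and last detections.
    """
    (t0, x0, y0), (t1, x1, y1) = track[0], track[-1]
    dx, dy = x1 - x0, y1 - y0
    speed = math.hypot(dx, dy) / (t1 - t0)
    heading = math.degrees(math.atan2(dx, dy)) % 360
    return speed, heading


def classify(track, ais_positions, still_speed=0.05, match_radius=500.0):
    """Hypothetical rule: stationary -> structure; moving with an AIS report nearby -> ship."""
    speed, _heading = speed_and_heading(track)
    if speed < still_speed:
        return "stationary structure"
    _, x, y = track[-1]
    if any(math.hypot(x - ax, y - ay) <= match_radius for ax, ay in ais_positions):
        return "ship"
    # Moving but not broadcasting AIS: drifting ice, or a vessel without a transponder.
    return "moving, no AIS match (iceberg candidate)"
```

The rules are fully auditable: every decision traces back to a measured speed and a distance to an AIS report.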
So, you have to choose between a black box that will give you results with a given error rate, and a predictable algorithm that anyone can audit. Seems like a no-brainer to me. For what other reason would you choose the first solution, except hype-related decisions?
You claim (probably correctly) that dataset B, which includes velocity and trajectory, is better suited to the problem at hand, and given dataset B, I would suggest that either algorithm A or algorithm B would probably do just fine.
You also claim that algorithm A has "some error rate during identification." But so will algorithm B, and so will either algorithm on either dataset!
The question you should ask is: how much do I care about "black box" vs. "white box", and is there a trade-off? If the black-box solution (algorithm A, the "ML" solution) gives you 10% higher accuracy, and that accuracy is going to save lives, you bet I'd choose it. Or maybe I decide that interpretability is really important for external audit reasons, so I need the white-box solution. But maybe I'd choose both: use the interpretable one, and use the uninterpretable one as a flag for "a human should look at this." Or maybe I'd combine the results of both algorithms to get even higher accuracy.
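One way to set up the "choose both" option, sketched in Python: the interpretable model makes the call, the black-box model only raises a review flag when the two disagree, and an averaged score is kept as the simplest possible combination. The models are hypothetical callables returning a probability, and the 0.5 threshold and 0.3 disagreement margin are arbitrary placeholders.

```python
from dataclasses import dataclass
from typing import Callable

Model = Callable[[object], float]  # hypothetical: returns P(positive class) for an input


@dataclass
class Decision:
    label: bool            # decided by the interpretable (auditable) model alone
    needs_review: bool     # the two models disagree enough to ask a human
    combined_score: float  # simple average of both probabilities


def decide(x, white_box: Model, black_box: Model, threshold=0.5, disagreement=0.3):
    p_white = white_box(x)
    p_black = black_box(x)
    return Decision(
        label=p_white >= threshold,
        needs_review=abs(p_white - p_black) >= disagreement,
        combined_score=(p_white + p_black) / 2,
    )
```

Keeping the final label on the white-box side preserves auditability; the black box only changes who looks at a case, not what the system decides.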
There are just so many ways to configure a solution to the problem you propose, and you are only distinguishing between two of them. In the end the appropriate choice depends on context.
But they're wrong! I read "Deep learning drives machine learning which drives artificial intelligence." This is very wrong. I stopped reading.
ML is one family of approaches for knowledge acquisition in AI, but far from the only one (e.g., logic-based inference is another big one).
DL is a family of approaches in supervised ML. As the author points out, it's a subset of a subset.
But saying that this sub-subset "drives" AI is like saying endocrinology "drives" medicine: not the right mental model at all.
I wish I could say I was passionate about feature engineering. I enjoy where deep learning is heading right now - where that kind of finicky, more-art-than-science approach becomes unnecessary, and the model does a better job detecting features than humans.
Additionally, my company (link in profile) builds a commercial product to help people define and iterate on prediction problems in a structured way, based on the ideas in that paper.
There are also some methodologies out there that can help you label data sets more efficiently. I don't often see them used, but they exist. Look up "active learning" and "semi-supervised learning".
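As a rough illustration of active learning, here is a Python sketch of uncertainty sampling, one common variant: repeatedly fit a model, then ask the human (the "oracle") to label only the unlabeled points the model is least sure about. The model is a deliberately trivial 1-D threshold classifier, chosen only to keep the example self-contained; the data and oracle are hypothetical.

```python
import math


def fit_threshold(labeled):
    """Toy 1-D model: put the decision threshold halfway between the classes.

    Assumes every negative example lies below every positive one.
    """
    highest_neg = max(x for x, y in labeled if y == 0)
    lowest_pos = min(x for x, y in labeled if y == 1)
    return (highest_neg + lowest_pos) / 2


def predict_proba(threshold, x):
    """Logistic score: exactly 0.5 at the threshold."""
    return 1 / (1 + math.exp(-(x - threshold)))


def uncertainty_sample(pool, threshold, k):
    """Pick the k unlabeled points the current model is least sure about."""
    return sorted(pool, key=lambda x: abs(predict_proba(threshold, x) - 0.5))[:k]


def active_learning(labeled, pool, oracle, rounds=4, k=1):
    labeled, pool = list(labeled), list(pool)
    for _ in range(rounds):
        threshold = fit_threshold(labeled)
        for x in uncertainty_sample(pool, threshold, k):
            pool.remove(x)
            labeled.append((x, oracle(x)))  # the "oracle" is the human labeler
    return labeled, fit_threshold(labeled)
```

Starting from two labeled points, [(0, 0), (10, 1)], an unlabeled pool of 1 through 9, and a hidden rule x > 6.3, four queries put the learned threshold at 6.5, while five of the nine pool points are never sent to a human.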
I've been playing around with a similar idea for text. Do you already do that?
Can you please elaborate?
A lot is known. E.g., there's the now classic Draper and Smith, Applied Regression Analysis. Software such as the IBM Scientific Subroutine Package (SSP), SPSS (Statistical Package for the Social Sciences), SAS (Statistical Analysis System), etc. does the arithmetic for texts such as Draper and Smith. For some decades, some of the best users of such applied math were the empirical macroeconomic model builders. E.g., once at a hearing in Congress I heard a guy, IIRC, Adams, talking about that.
Lesson: If you are going to do curve fitting for model building, then a lot is known. Maybe what is new is working with millions of independent variables and trillions of bytes of data. But it stands to reason that there will also be problems with 1, 2, a dozen, or 2 dozen variables and some thousands or millions of bytes of data, and people have been doing a lot of work like that for over half a century. Sometimes they did good work. If you want to do model building on that more modest and common scale, my guess is that you should look mostly at the old, very well done work. Here is just a really short sampling of some of that old work:
- Stephen E. Fienberg, The Analysis of Cross-Classified Data.
- Yvonne M. M. Bishop, Stephen E. Fienberg, Paul W. Holland, Discrete Multivariate Analysis: Theory and Practice.
- Shelby J. Haberman, Analysis of Qualitative Data.
- Analysis of Variance, John Wiley and Sons.
- C. Radhakrishna Rao, Linear Statistical Inference and ..., John Wiley and Sons.
- N. R. Draper and H. Smith, Applied Regression Analysis, John Wiley and Sons.
- Jerome H. Friedman, Richard A. Olshen, Charles J. Stone, Classification and Regression Trees, Wadsworth & Brooks/Cole, Pacific Grove, California.
There is a lesson about curve fitting: the ancient Greek Ptolemy took data on the motions of the planets and fitted circles, circles inside circles, etc., and supposedly, except for some use of Kelly's Variable Constant and Finkel's Fudge Factor, got good fits. The problem: his circles had next to nothing to do with planetary motion; instead, planetary motion is based on ellipses, and that came from more observations, Kepler, and Newton. Lesson: Empirical curve fitting is not the only approach.
Actually, the more mathematical statistics texts, e.g., the ones with theorems and proofs, say, "We KNOW that our system is linear and has just these variables, and we KNOW the statistical properties of our data, e.g., Gaussian errors, independent and identically distributed, and ALL we want to do is get some good estimates of the coefficients, with confidence intervals and t-tests, and confidence intervals on predicted values." Then you can go through all that statistics and see how to do that. But notice the assumptions at the beginning: We KNOW the system is linear, etc., and are ONLY trying to estimate the coefficients that we KNOW exist. That's long been a bit distant from practice and is apparently still farther from current ML practice.
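For the setting described above (the model is KNOWN to be linear with Gaussian i.i.d. errors, and we only want coefficient estimates with standard errors, t-statistics, and confidence intervals), a one-variable version looks like this in Python. The data in the check are made up for illustration. The caller supplies the t critical value, since the standard library has no t distribution; 3.182 is the usual two-sided 95% value for 3 degrees of freedom.

```python
import math


def simple_ols(x, y, t_crit=None):
    """Fit y = b0 + b1*x by least squares under the textbook assumptions
    (the model really is linear; errors are i.i.d. Gaussian).

    Returns b0, b1, the standard error of b1, its t-statistic, and, if a
    critical value t_crit for n - 2 degrees of freedom is supplied, a
    confidence interval for b1.
    """
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    s2 = sse / (n - 2)  # unbiased estimate of the error variance
    se_b1 = math.sqrt(s2 / sxx)
    result = {"b0": b0, "b1": b1, "se_b1": se_b1, "t_b1": b1 / se_b1}
    if t_crit is not None:
        result["ci_b1"] = (b1 - t_crit * se_b1, b1 + t_crit * se_b1)
    return result
```

Every number this returns is only meaningful if the assumptions hold; nothing in the arithmetic checks them.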
Okay, ML for image processing. Okay. I am unsure how much image processing work there is where there is enough good data for ML techniques to do well.
Generally there is much, much more to what can be done with applied math, applied probability, and statistics than curve fitting. My view is that the real opportunities are in this much larger area and not in the recent comparatively small area of ML.
E.g., my startup has some original work in applied probability. Some of that work does some things some people in statistics said could not be done. No, it's doable, but it's not in the books. What is in the books asks too much of my data: the books are trying for too much, and with my data that's impossible. But I'm asking for less than is in the books, and that is possible from my data. I can't go into details in public, but my lesson is this:
There is a lot in applied math and applications that is really powerful and not currently popular, canned, etc.
Shrinkage methods like lasso/elasticnet are less susceptible to these problems.
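A minimal sketch of how lasso and elastic net shrink coefficients, via cyclic coordinate descent with soft-thresholding, in Python. The objective uses the common (1/2n) squared error plus lam*(alpha*L1 + (1 - alpha)/2*L2) parametrization; the demo data are made up, and the columns are assumed centered so no intercept is fitted. This is an illustration, not a replacement for a tuned library implementation.

```python
def soft_threshold(z, gamma):
    """The lasso's shrinkage operator: move z toward 0 by gamma, stopping at 0."""
    if z > gamma:
        return z - gamma
    if z < -gamma:
        return z + gamma
    return 0.0


def elastic_net(X, y, lam, alpha=1.0, sweeps=100):
    """Coordinate descent for
        (1/2n)*||y - X b||^2 + lam*(alpha*||b||_1 + (1 - alpha)/2*||b||^2).
    alpha=1 is the lasso, alpha=0 is ridge. Assumes X columns and y are centered.
    """
    n, p = len(X), len(X[0])
    b = [0.0] * p
    col_sq = [sum(row[j] ** 2 for row in X) / n for j in range(p)]
    for _ in range(sweeps):
        for j in range(p):
            # Correlation of feature j with the residual that ignores feature j.
            rho = sum(
                row[j] * (yi - sum(row[k] * b[k] for k in range(p) if k != j))
                for row, yi in zip(X, y)
            ) / n
            b[j] = soft_threshold(rho, lam * alpha) / (col_sq[j] + lam * (1 - alpha))
    return b


# Made-up demo data: two orthogonal, centered columns; y is mostly the first column.
X_DEMO = [[1, 1], [1, -1], [-1, 1], [-1, -1]]
Y_DEMO = [3.2, 2.8, -2.8, -3.2]
```

With the demo data, elastic_net(X_DEMO, Y_DEMO, lam=0.5) gives about [2.5, 0.0]: the small second coefficient is set exactly to zero, whereas alpha=0 (ridge) only shrinks it to about 0.13.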
Are you able to go into more detail about your startup (problems it is solving)?
My view is that currently there is a lot of content on the Internet, and the total is growing quickly. So, there is a need: helping people find the content they will like for each of their interests.
My view is that current means for this need do well on (rough ballpark guesstimate) about 1/3rd of the content, searches people want to do, and results they want to find. My work is for the "safe for work" parts of the other 2/3rds.
The user interface is really simple; the user experience should be fun, engaging, and rewarding. The user interface, data used, etc. are all very different from anything else I know about.
The crucial, enabling core of the work, the "how to do that", the "secret sauce", is some applied math I derived. It's fair to say that I used some advanced pure math
To the users, my solution is just a Web site. I wrote the code in Microsoft's Visual Basic .NET 4.0 using ASP.NET for the Web pages and ADO.NET for the use of
The monetization is just from ads, at first with relatively good user demographics and later with my own ad
The Web pages are elementary HTML and CSS. ASP.NET wrote a little for me, maybe for some cursor positioning or some such.
The Web pages should look fine on anything from a smartphone to a high-end workstation. The pages should be usable in a window as narrow as 300 pixels; for smaller screens, the pages have both horizontal and vertical scroll bars. The layout is simple, just HTML tables with no DIV elements. The fonts are comparatively large. The contrast is high. There are no icons, pull-downs, pop-ups, roll-overs, overlays, etc. Only simple HTML links and controls are used.
Users don't log in. There is no use of cookies. Users are essentially anonymous and have some of the best privacy. For
browser is optional; the site works fine
maybe sometimes users will have to use
their pointing device to position the
There is some code for off-line "batch" processing of some of the data. The code for the on-line work is about 24,000 programming language statements in about 100,000 lines of typing. I typed in all the code with just my favorite text editor.
There is a little C code; otherwise all the code is in Microsoft's Visual Basic .NET. This is not the old Visual Basic 6 or some such (which I never used) but, instead, the newer Visual Basic that is part of .NET. This newer version appears to be just a particular flavor of syntactic sugar and otherwise as good a way as any to use the .NET classes and the common language runtime (CLR), that is, essentially equivalent to C#.
The code appears to run as intended. The code should have more testing, but so far I know of no bugs. I intend alpha testing soon and then a lot of beta testing announced on Hacker News, AVC.COM, and
For the server farm architecture, there is a Web server, a Web session state server, SQL Server, and two servers for the core applied math and search.
I wrote the session state server using just TCP/IP socket communications, sending and receiving byte arrays containing serialized object instances. The core work of the Web session state server is done by two instances of a standard Microsoft .NET collection class, hopefully based on AVL or red-black balanced binary trees or something equally good.
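The session state design described above could look roughly like this Python sketch: each message is a 4-byte length prefix followed by a serialized object, and the server keeps sessions in an in-memory map. Assumptions: Python's pickle stands in for .NET serialization (and is only safe between trusted machines), a dict stands in for the tree-based .NET collection, and the put/get/delete operation names are hypothetical. demo() runs client and server over a local socket pair instead of a real network.

```python
import pickle
import socket
import struct
import threading

HEADER = struct.Struct("!I")  # 4-byte big-endian length prefix


def send_msg(sock, obj):
    payload = pickle.dumps(obj)
    sock.sendall(HEADER.pack(len(payload)) + payload)


def recv_exact(sock, n):
    buf = bytearray()
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("peer closed the connection")
        buf.extend(chunk)
    return bytes(buf)


def recv_msg(sock):
    (length,) = HEADER.unpack(recv_exact(sock, HEADER.size))
    return pickle.loads(recv_exact(sock, length))


class SessionStore:
    """In-memory session table (a hash map here, not a balanced tree)."""

    def __init__(self):
        self._sessions = {}
        self._lock = threading.Lock()

    def handle(self, request):
        op, session_id, *rest = request
        with self._lock:
            if op == "put":
                self._sessions[session_id] = rest[0]
                return "ok"
            if op == "get":
                return self._sessions.get(session_id)
            if op == "delete":
                return self._sessions.pop(session_id, None)
        raise ValueError(f"unknown operation {op!r}")


def serve_connection(conn, store):
    """Answer requests on one connection until the client hangs up."""
    with conn:
        while True:
            try:
                request = recv_msg(conn)
            except ConnectionError:
                return
            send_msg(conn, store.handle(request))


def demo():
    client, server = socket.socketpair()
    store = SessionStore()
    worker = threading.Thread(target=serve_connection, args=(server, store))
    worker.start()
    send_msg(client, ("put", "sess-42", {"page": 3, "query": "jazz"}))
    ack = recv_msg(client)
    send_msg(client, ("get", "sess-42"))
    state = recv_msg(client)
    client.close()
    worker.join()
    return ack, state
```

The length prefix is what lets the receiver know where one serialized object ends and the next begins on a TCP byte stream.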
The Web servers do not have user affinity: that is, when a user does an HTTP POST back to the server farm, any of many parallel Web servers can receive and process the POST. So, the Web servers are easily scalable. IIRC, Cisco has a box that will do load balancing across such parallel Web servers. Of course, with the Windows software stack, the Web servers use Microsoft's Internet Information Services (IIS). Then IIS starts and runs my Visual Basic .NET code.
Of course, the reason for this lack of user affinity and the easy scalability is the session state server I wrote. For further scaling, it would be easy to run hundreds of such servers in parallel.
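Why the lack of user affinity works can be shown with a small Python simulation: several stateless Web servers take POSTs in round-robin order (a stand-in for the load-balancing box), and every request reads and writes the user's state through one shared store. All class and field names here are hypothetical.

```python
import itertools


class SharedSessionStore:
    """Stand-in for the separate session state server."""

    def __init__(self):
        self._data = {}

    def get(self, session_id):
        return dict(self._data.get(session_id, {}))  # hand out a copy

    def put(self, session_id, state):
        self._data[session_id] = dict(state)


class StatelessWebServer:
    """Keeps nothing about users between requests; all state lives in the store."""

    def __init__(self, name, store):
        self.name = name
        self.store = store

    def handle_post(self, session_id, form):
        state = self.store.get(session_id)
        state["posts"] = state.get("posts", 0) + 1
        state["last_form"] = form
        state["served_by"] = self.name
        self.store.put(session_id, state)
        return state


def simulate(n_posts, n_servers=3):
    store = SharedSessionStore()
    servers = [StatelessWebServer(f"web{i}", store) for i in range(n_servers)]
    balancer = itertools.cycle(servers)  # round-robin: no user affinity
    state = None
    for i in range(n_posts):
        state = next(balancer).handle_post("user-1", {"q": i})
    return state
```

simulate(5) ends with posts == 5 even though three different servers handled the requests, because none of them kept anything about the user locally.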
I have a few code changes in mind. One of them is to replace the Windows facilities for system logs with my own log server. For that, I'll just start with my code for the session state server and essentially replace the use of the collection class instances with a simple file write statement.
I wrote no prototype code. I wrote no code intended only for a "minimum viable product". So far I see no need to refactor the code.
The code is awash in internal comments. For more comments, some long and deep, kept external to the code, there are often tree names in the code pointing to the external comments, and then one keystroke with my favorite editor displays those comments. I have about 6000 files of Windows documentation, mostly from MSDN, and most of the tree names in the comments are to the HTML files of that
I have a little macro that inserts time-date stamp comments in the code, such as:

Modified at 23:19:07 on Thursday, December 14th, 2017.

and I have some simple editor macros that let those comment lines serve as keys in cross-references. That helps.
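A sketch, in Python rather than the author's editor macro language, of the two pieces described above: generating a stamp in the same format as the example, and indexing every file line that contains a given stamp so the stamps can act as cross-reference keys. The function names are hypothetical, "' " is assumed as the comment prefix because that is Visual Basic's comment marker, and day and month names come out in English under the default C locale.

```python
import re
from datetime import datetime


def ordinal(day):
    """1 -> '1st', 2 -> '2nd', 11 -> '11th', 23 -> '23rd'."""
    if 11 <= day % 100 <= 13:
        suffix = "th"
    else:
        suffix = {1: "st", 2: "nd", 3: "rd"}.get(day % 10, "th")
    return f"{day}{suffix}"


def stamp_comment(when: datetime, comment_prefix="' "):
    """Build a stamp line in the same format as the example above."""
    return (f"{comment_prefix}Modified at {when:%H:%M:%S} on {when:%A}, "
            f"{when:%B} {ordinal(when.day)}, {when.year}.")


STAMP = re.compile(
    r"Modified at \d{2}:\d{2}:\d{2} on [A-Za-z]+, [A-Za-z]+ \d{1,2}(?:st|nd|rd|th), \d{4}\."
)


def index_stamps(files):
    """files: {filename: text}. Returns {stamp: [(filename, line_number), ...]}."""
    index = {}
    for name, text in files.items():
        for line_no, line in enumerate(text.splitlines(), start=1):
            for match in STAMP.finditer(line):
                index.setdefault(match.group(0), []).append((name, line_no))
    return index
```

Because each stamp is unique to the second, the same stamp pasted into code and into an external note becomes a key that links the two.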
The code I have is intended for production up to maybe 20 users a second.
For another factor of 10 or 20, there will have to be some tweaks in some parts of the code for more scaling, but some of that scaling functionality is in the code.
For some of the data, a solid state drive (SSD), written maybe once a week and otherwise essentially read-only, read many thousands of times a day, would do wonders for users served per second. Several of the recent 14 TB SSDs could be the core hardware for a significant business.
Current work is sad: system management mud wrestling with an apparently unstable motherboard. With some effort, I finally got what appears to be a good backup of all the files to an external hard disk with a USB interface. So, that work is
Now I'm about to plug together another computer for the rest of the development, gathering data, etc.
I'm thinking of a last-generation approach: AMD FX series processor, DDR3 ECC main memory, SATA hard disks, USB ports for DVD, etc., Windows 7 Professional 64-bit, Windows Server, IIS, and SQL Server.
While you are right that some feature engineering is needed, there's no reason DL can't be a part of your workflow.
For more of the basics, my book on deep learning might help as well (minimal math, compared with the standard textbook):
I (and, I believe, the earlier poster, too) never implied you can't use deep learning on such examples. What we (I think, both) were referring to was the claim that it would absolve you from feature engineering. (Which I understand you also refute.)
> For more of the basics, my book on deep learning might help as well
Congratulations on your book, I know how much hard work that is!
Disclaimer: I make money with deep learning, too... ;-)
I guess what I wanted to do was add a bit of nuance. It can help reduce the amount of feature engineering needed. Of course you still need a baseline representation though. More feature engineering also doesn't hurt. I always think of deep learning in the time series context as a neat SVM kernel with some compression built in. With the right tuning it can give you a better representation which you can use with clustering and whatever else you'd like.
Do you have an opinion on the fast.ai and deeplearning.ai courses?
I finally have some time to work through these and since the deeplearning.ai series starts on December 18th, I'm wondering which one to dive into since I can't tell from the outside how they compare.
The practical value of ML/AI is what’s in between, and that isn’t often discussed amid all the hype. ML/AI can be used to build models that work well with nontabular data (e.g., text and images), and can solve such regression/classification problems more cleanly. (And with tools like Keras, they’re as easy to train and deploy as a normal model.)
For text, great results have been achieved using automata, but they only work for structured strings and break if you add even a little noise.
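A small Python illustration of that brittleness, assuming ISO-style dates as the "structured strings": a regular expression (which describes a regular language, the kind a finite automaton recognizes) accepts the exact format and rejects inputs with a single character of noise.

```python
import re

# Strict pattern for YYYY-MM-DD; it has no notion of "close enough".
ISO_DATE = re.compile(r"(\d{4})-(\d{2})-(\d{2})")


def parse_date(text):
    """Return (year, month, day) if text is exactly YYYY-MM-DD, else None."""
    match = ISO_DATE.fullmatch(text)
    return tuple(int(g) for g in match.groups()) if match else None


# One clean input followed by three noisy variants of the same date.
EXAMPLES = ["2017-12-14", "2017-12-l4", "2017 -12-14", "2017/12/14"]
```

Only the first example parses; a letter "l" in place of a "1", one stray space, or a different separator is enough to lose the match entirely.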
I feel like ML should be considered whenever programming something by hand would require you to deal with many different cases, you have a lot of example data available, and some false positives / false negatives are not a big problem.
I'm as exhausted of the ML hype as anyone else, but I believe this deck tempers expectations.
I highly doubt that BD is doing any ML work right now ... Can the author link to specific research that they are doing using ML?
EDIT: Oh, and expert systems/rules. Lots of em.
EDIT2: Well, an engineering, obviously... :-) Heck, just check Wikipedia on the topic...
is != always will be.
But forget that, let's check Wikipedia, as you suggest.
What are the first few words of the article on Reinforcement Learning, hmm. The very first few words at the very beginning of the article:
"Reinforcement learning (RL) is an area of machine learning..."
Read that and tell me what the last two words are. "Machine learning."
Not that I am judging or anything, but the author's personal website, http://www.jasonmayes.com/, whose link is displayed multiple times, is a giant ad to get hired elsewhere and shows at least some desire for other career opportunities. Not sure that reflects well on the company.
I mean, yeah, we computer folk are supposed to be all self-deprecating and all. But if there is one place we should stop mumbling and talking ourselves down for a second, this is it.
At some point if you want people to know what you do, you're going to have to tell them.
Of course you should be talking about yourself on your resume, but a couple of things are different here:
- WTF is up with the music?
- The 51%/49% thing.
- Publicly asking to be hired, which reflects poorly on his current job at Google.
- Excessively loud self marketing
Why not have a simple site with your accomplishments? Why all the excess bullshit?
And many of the things he did do sound like he could be a good contributor - I guess you can't know about his personality without an interview.
It's entirely reasonable to talk about yourself and your achievements on your resume, and Mr Mayes' site is rather a good example of doing so.
If I see this kind of resume land on my desk, it gets thrown out.
Question to those versed in ML: I want to work on an AI that plays a video game (aspirations of playing something like Rocket League, but I know I need to start smaller with something like an old NES game). I understand these are usually done with Recurrent Neural Networks, but I'm a little lost as to how to get data into the RNN. Will I need to make another AI or CNN to read the screen and interpret it (including the score)? My 30k ft view is that if I can define a 'score', give it a 'reset' button, and define 'inputs (decision targets)', then I just need to give it the screen and let it do its thing. But getting the 'score' is the part I can't figure out short of adding another layer to the classifier.
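The 30k ft view above matches how reinforcement-learning setups are usually framed: an environment with a reset, a step function that takes one input and returns an observation, a reward (here, the change in score), and a done flag. Below is a minimal Python sketch with a hypothetical toy game standing in for the emulator; in a real game, observe() would return screen pixels, and where the score comes from is exactly the open question in the comment.

```python
import random


class ToyGame:
    """Hypothetical stand-in for a game: walk right along a row of cells to the goal."""

    ACTIONS = ("left", "right")  # the "inputs (decision targets)"

    def __init__(self, size=5, max_steps=50):
        self.size, self.max_steps = size, max_steps
        self.reset()

    def reset(self):
        """The 'reset button': start a fresh episode and return the first observation."""
        self.pos, self.score, self.steps = 0, 0, 0
        return self.observe()

    def observe(self):
        # A real game would return the screen here; this returns a one-hot row of cells.
        return tuple(1 if i == self.pos else 0 for i in range(self.size))

    def step(self, action):
        """Apply one input; return (observation, reward, done)."""
        old_score = self.score
        move = 1 if action == "right" else -1
        self.pos = max(0, min(self.size - 1, self.pos + move))
        self.steps += 1
        at_goal = self.pos == self.size - 1
        if at_goal:
            self.score += 10
        done = at_goal or self.steps >= self.max_steps
        return self.observe(), self.score - old_score, done  # reward = score change


def run_episode(env, policy):
    """Generic agent loop: the agent only ever sees observations and rewards."""
    obs, total, steps, done = env.reset(), 0, 0, False
    while not done:
        obs, reward, done = env.step(policy(obs))
        total += reward
        steps += 1
    return total, steps


def make_random_policy(seed=0):
    rng = random.Random(seed)
    return lambda obs: rng.choice(ToyGame.ACTIONS)
```

A learning agent would replace the policy function and update itself from the (observation, reward) pairs; the environment side stays the same.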
It's a nice deck, but I'd hoped the blue slides went more technical without dropping out to various videos. If I wanted videos, I'd go to YouTube directly. Not everyone wants to learn by watching people talk. I learn best when I read; it's unfortunate that youngsters these days think that the written word is now a poor cousin to flashy video.
In the same way that new clothes are no longer for me, and new music is no longer for me, and all good TV shows and films are full of people half my age, I also now feel that I'm being aged off the internet.
I was here first, you young whippersnappers! It's MY lawn.
Google Slides is really slow. That's why this needs two hours.
Most of the real content is in linked videos.
If you really want to understand, you would be much better off starting here:
Though, the options to export as a PDF didn't work for me (either via download or as an export to Google Drive). I'm assuming the presentation is too big.
stopping death means stopping life, every generation is more forgetful of the past than the last. the light ages will be far more destructive, what could possibly motivate you to stop a perpetual pleasure machine? how do you prevent the inevitable conflict between those who insist pain and suffering is an essential part of the human experiment and those who just force them to feel good and change their mind? what will happen to these toddlers in 10-15 years? they will have grown up interfacing with some electronic device for every single day of their young lives, a different type of consciousness shaped by destruction of self-confidence in their own knowledge and memories and a complete trust in the needle-finders of big hay.
this shift will be as important as pre-writing to post-writing, except the transformation won't take centuries and millennia to propagate itself across the planet. a post-memory world, with every human enslaved by their base sensations. the first US president who is an internet addict.
"[Writing] will create forgetfulness in the learners’ souls, because they will not use their memories; they will trust to the external written characters and not remember of themselves. The specific which you have discovered is an aid not to memory, but to reminiscence, and you give your disciples not truth, but only the semblance of truth; they will be hearers of many things and will have learned nothing; they will appear to be omniscient and will generally know nothing; they will be tiresome company, having the show of wisdom without the reality."
the future is a fate worse than death.
Unfortunately it doesn't work I guess.
Oh, and he says he is watching you... Maybe he really means this? Maybe that's why he disabled downloads?
These aren't synonymous; a perceptron is a type of artificial neuron.
Also confusingly, 'multi layer perceptrons' might not contain perceptrons at all.
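To illustrate the distinction, here is a Python sketch of a classic perceptron (hard threshold, weights changed only on mistakes) learning logical AND, next to the kind of smooth sigmoid unit that multilayer networks trained by backpropagation typically use instead. The AND data are a textbook toy example chosen for illustration.

```python
import math


def perceptron_predict(w, b, x):
    """Hard threshold: output 1 only if the weighted sum is strictly positive."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else 0


def perceptron_train(samples, epochs=20, lr=1.0):
    """Rosenblatt's perceptron rule: update the weights only on mistakes."""
    n = len(samples[0][0])
    w, b = [0.0] * n, 0.0
    for _ in range(epochs):
        mistakes = 0
        for x, target in samples:
            error = target - perceptron_predict(w, b, x)
            if error:
                mistakes += 1
                w = [wi + lr * error * xi for wi, xi in zip(w, x)]
                b += lr * error
        if mistakes == 0:
            break  # converged (guaranteed only for linearly separable data)
    return w, b


def sigmoid_neuron(w, b, x):
    """A smooth, differentiable unit of the kind MLPs use, so gradients can flow."""
    return 1 / (1 + math.exp(-(sum(wi * xi for wi, xi in zip(w, x)) + b)))


AND = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
```

The step function has zero gradient almost everywhere, which is why a network of true perceptrons cannot be trained by backpropagation, and why "multilayer perceptrons" in practice are built from units like sigmoid_neuron.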
Does everything to distract us.
Shame the backgrounds gave me a headache anyway
“That would be (very) bad.”