Talking to other groups that had gone through the exact same process, our results were pretty typical. These guys were all very intelligent, and the code and systems they had implemented were pretty impressive. I'm guessing the system they built would have cost a few million dollars if built from scratch. We did use this "AI/ML" in our marketing, so maybe it was paid for by increased sales through the use of buzzwords. But my experience was that in most limited use cases the technology was ineffective.
They said, "That's about what we concluded except that we didn't get around to actually doing that pilot project yet."
I got the job. :-)
Blockchain is mostly a marketing tool, not something you would want to use in production for anything.
Think about that. I wanted to spend money. I wasn't too fussy what it was. Amazon has a decade of my purchasing and browsing history.
And they still failed.
That ONE TIME I buy a unicorn dress for my 2 year old daughter? That's a lifetime of unicorn related merchandise adverts and recommendations for you!
The ultimate goal of the advertising is return on investment, not making you feel interested in the adverts. If, to exaggerate the possibility, 100% of "people who look at tech" are 0% influenced by adverts, but 10% of "people who bought a unicorn thing" will go on to buy another if they're constantly reminded that whoever they bought it for likes unicorns, all of a sudden it would make sense despite being counterintuitive to viewers.
A more commonly discussed example of a similar thing is that it's easy to think "I just bought a (dishwasher, keyboard, etc), I obviously already have one so why am I seeing adverts for them?" Sure, it might be that the company responsible has an incomplete profile and doesn't know you bought one already. But it's also possible that the % of people who just bought the item and then decide they don't like it, return it and buy a different type is high enough to be worth advertising to them.
So they gave up trying to match CPU with GPU and went back to connecting beer to diapers.
For all Facebook knows about me they've always been exceptionally bad at advertising to me, which is remarkable considering what they've got. Google is only very marginally better. Actually, now that I think about it, Amazon's 'customer's also bought' is also pretty bad at the recommendation itself since it not uncommonly recommends incompatible things! ...but it does often succeed at getting me to think more about what else I might need and sometimes leads me to buying other things. At least it's not always recommending the same thing, but rather related things, which is probably a much better way to advertise.
Because it's not really Facebook, it's the advertisers who choose targeting criteria. As an advertiser you have a myriad of options. For example, if you've built a competitor to X, you can target users who've visited X recently, aged N-M, residing in countries A, B and C, and so on. There are options with broader interests too. A poorly targeted ad means poorly selected criteria by the advertiser (or sometimes just the advertiser experimenting) and, consequently, money wasted. Facebook doesn't care though.
Then there's retargeting/remarketing (targeting bounced traffic), already mentioned here, which is probably the stupidest-looking invention that actually works.
I mean, if I could stop seeing washing machines or whatever, I'd probably click it.
You can also block ads (uBlock Origin/uMatrix)
* They don't have suppression set up
* They're using a conversion tracking platform that is slow
* They're testing the returns conversion hypothesis: you have expressed concrete intent, you have bought the product. If it has 5% return rate, you probably still want it, and there's a 5% chance they need to be in the mix.
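To make that last bullet concrete, here's a back-of-the-envelope sketch in Python. Every number in it besides the 5% return rate from the bullet above (the rebuy probability, margin, and impression cost) is a made-up assumption for illustration, not real campaign data.

    # Back-of-the-envelope check of the returns-conversion hypothesis.
    # All numbers are illustrative assumptions, not real campaign data.
    return_rate = 0.05           # chance the buyer returns the item (5%, as above)
    rebuy_if_shown_ad = 0.30     # assumed: share of returners the ad wins back
    margin = 20.0                # assumed margin per replacement sale, in dollars
    cost_per_impression = 0.005  # assumed cost of one retargeting impression

    expected_gain = return_rate * rebuy_if_shown_ad * margin
    print(f"expected gain per buyer shown the ad: ${expected_gain:.2f}")
    print(f"profitable up to {expected_gain / cost_per_impression:.0f} impressions per buyer")

Under these made-up numbers, retargeting a confirmed buyer stays profitable for dozens of impressions, which is why the "you already bought it" ads can be rational rather than broken.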
They show me things similar to the thing I just put in my cart. Sometimes better choices than the one I made.
They also show related things that other people have bought, that many times I end up purchasing.
You can accomplish that with relational algebra in a precomputed data warehouse job, and only for products with strong correlation. The customers' own intelligence supplies enough agency to instil a semblance of intelligence in the data.
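As a sketch of what that precomputed job might look like, here's a toy co-purchase count in Python/pandas. The orders table and product names are invented; a real warehouse would do the same self-join-and-count in SQL at scale.

    import pandas as pd

    # Toy orders table: one row per (order, product). In production this
    # would be a warehouse job; the pairing below is plain relational algebra.
    orders = pd.DataFrame({
        "order_id": [1, 1, 2, 2, 3, 3, 3],
        "product":  ["beer", "diapers", "beer", "diapers", "beer", "chips", "diapers"],
    })

    # Self-join on order_id to get co-purchased pairs, then count each pair.
    pairs = orders.merge(orders, on="order_id")
    pairs = pairs[pairs["product_x"] != pairs["product_y"]]
    together = pairs.groupby(["product_x", "product_y"]).size()

    # "Customers also bought" for beer: keep only strongly correlated pairs.
    print(together.loc["beer"].sort_values(ascending=False))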
I'm doubtful we'll see an AI that makes a serious jump without directly interacting with the world we live in. By that measure cars might be the closest, since learning to interact with the bounds of where the car can and can't go is similar to a toddler learning to crawl.
The team I was on before that one was a bunch of scrappy engineers from Poland, India and the USA with no graduate degrees, but 20 years coding and distributed systems experience each. The difference in problem solving ability, the speed at which they moved, broke down problems, tried out different methods, was staggering.
I think ML is suffering from a prestige problem, and many companies are suffering for it. The wrong people are being hired and promoted, with business leaders calling the shots on who runs machine learning projects without fully understanding who can actually deliver.
If you did a good job at the last company you'll probably do a good job here.
If you did a good job yesterday, you'll probably do a good job today.
For the most part they are usually correct.
Even the coding interviews are just a signal of overall performance given a short amount of time. It's only a sample of data, but if you did your interview right you should be able to protect somewhat against bad candidates getting very lucky. Just like driving a car: bad drivers tend to stay pretty bad, and good drivers tend to stay safe. Even though there are a lot of ways to define what a good driver is, there are clearer ways to define what a bad driver is, and if someone was a bad driver yesterday they are still probably a bad driver.
He would go around to people and say "I heard Joe sucks". If the people strongly defended Joe, he was probably pretty good. If nobody stuck up for him, Joe might indeed suck.
I'm only half-joking here. To me, YT algorithm seems to be a mix of "show random videos you've already watched" + "show random videos from channels and users you watched" + "show the most popular videos in last few hours/days/weeks". It's pretty much worthless, but what are we expecting? Like all things ad economy, the primary metric here isn't whether you like the recommended content, or whether that content challenges you to grow - it's maximizing the amount of videos you watch, because videos are the vehicles for delivering ads to you.
Years ago, there was an app called StumbleUpon. I always found it engaging. Hard to get bored.
Hundreds of times I have told YouTube I'm not interested in a recommended video that I have already watched; it seems to be completely ignored.
I often rewatch a video that I've already seen: many times music, many times comedy, many times something I want to link to another person and end up watching some or all of it again as I find it. Often it's something I remember being interesting (e.g. a VSauce video or a Dan Gilbert TED talk), sometimes a guide or how-to that I want to follow, e.g. a cooking instruction.
Gotta say though, I don't miss my snow blower even just a wee little bit.
Some years ago I heard an anecdote from a developer who had worked on a video game about American football. The gist of it was that they had tested various sophisticated systems for an AI opponent to choose a possible offensive/defensive play, but the one that the players often considered the most "intelligent" was the one that simply made random decisions.
In certain domains, I think, it's quite difficult to beat the perceived performance of an AI system that merely makes committed random decisions (i.e. carried out over time) within a set of reasonable choices. If we don’t understand what an agent is doing, we often assume that there is some devious and subtle purpose behind its actions.
If the AI responded/acted based on a predefined set of recognizable patterns, the player would automatically sense it (pattern matching), and that makes the NPC far less interesting.
Beyond reinforcing our tendency to project, as you say, a personal history on random behavior, it also highlights what a few other people have commented: that in many non-cooperative situations a committed random strategy is extremely effective, and perhaps more effective than a biased, seemingly "rational" strategy. (For another example, I believe Henrich's "The Secret Of Our Success" discusses the possible adaptive benefits of divination as a generator for random strategies among early societies.)
A lot of the best ML right now is effectively about making better conditional probability distributions. You always get random output, but skewed according to the circumstances, and sharp according to confidence in the result.
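A minimal illustration of "random output, skewed by circumstances, sharp by confidence" is temperature-scaled sampling from a softmax. The scores below are arbitrary stand-ins for whatever a real model conditions on.

    import numpy as np

    rng = np.random.default_rng(0)

    def sample(scores, temperature):
        """Random output, skewed by the scores; sharper as temperature drops."""
        z = np.asarray(scores, dtype=float) / temperature
        p = np.exp(z - z.max())   # numerically stable softmax
        p /= p.sum()
        return rng.choice(len(p), p=p), p

    scores = [2.0, 1.0, 0.1]      # arbitrary model scores for three options
    for t in (2.0, 1.0, 0.25):    # high temperature = diffuse, low = sharp
        _, p = sample(scores, t)
        print(t, p.round(3))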
And they say English doesn't have reduplication!
Improving performance of these ranking models was notoriously difficult. 50% of the experiments we'd run would show no statistically significant change, or would even decrease performance. Another 40% or so would improve one funnel KPI, but decrease another, leading to no net improvement in $$. Only 10% or so of experiments would actually show a marginal improvement to cohort LTV.
I'm not sure how much of this is actually "there's very little marginal value to be gained here" versus lack of rigor and a cohesive approach to modeling. The data scientists were very good at what they do, but ownership of models frequently changed hands, and documentation and reporting about what experiments had previously been tried was almost non-existent.
All that to say, productizing ML/AI is very time- and resource-intensive, and it's not always clear why something did/didn't work. It also requires a lot of supporting infrastructure and a data platform that most startups would balk at the cost of.
This encourages a simple first version and incremental complexity, rather than starting out very complex six months in and never having an easy baseline to compare against. A simple baseline can spawn several creative methods of improvement to research.
The other case is that the models should be run against simple cases that are easy to understand and easy to confirm. This way there's always a human QA component available to make sure results are sensible.
That is great for building incredible open source software and a lot of other things that I would not be able to do given 1000 years. However (again, IMHBO) a specific ML technique, or any other specific application of statistics or mathematics, becomes really tricky once your use case is explicitly defined.
You then need intimate and deep knowledge of the tools that you are using (e.g.: should I even use NNs? Should I even use genetic algorithms? Should I even use X?), but ML for most people is shorthand for NNs and their variants, or maybe shorthand for something else specific, rather than ML in principle.
A well-aimed shot at PCA can often solve your problem. Or at least tell you what the problem looks like. This is just an example, but IMHBO people waste their time learning ML and not learning mathematics and statistics.
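For instance, here is the kind of quick PCA sanity check that can tell you what a problem looks like. The data is synthetic and the low-dimensional structure is planted deliberately, so this is only a sketch of the workflow.

    import numpy as np
    from sklearn.decomposition import PCA

    rng = np.random.default_rng(0)
    # Synthetic 10-D data that secretly lives near a 2-D plane.
    latent = rng.normal(size=(500, 2))
    X = latent @ rng.normal(size=(2, 10)) + 0.05 * rng.normal(size=(500, 10))

    pca = PCA().fit(X)
    # If two components explain nearly all the variance, the problem is
    # effectively 2-dimensional: PCA just told you what it looks like.
    print(pca.explained_variance_ratio_.round(3))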
IMHBO I still think that self-driving cars can be solved by defining a list of 1000 or so rules, by hand, by humans, and by consensus. The computer vision part is the ML part.
Yes they did - in fact they had input on defining them and helped in tracking them.
> Did they ever have access to adequate information to truly solve the problem your team needed solved?
They believed so. Their team was also responsible for our company data warehousing so they knew even better than me what data was available. Basically any piece of data that could be available they had access to.
> And was the same result observed for other uses of their recommendation systems?
I did not have first-hand access to the results of their use in other recommendation contexts. As I mentioned in my original post I only had second-hand accounts from other teams that went the same route. They reported similar results to me.
It seems like everyone who joins my company to shake things up follows the same path of wanting personalized content to acquire new customers.
But in reality we just don't have enough data points on people before they become customers to segment them that way. Even if we could, segmenting them accurately is another matter.
Every time, I see people go through the motions of attempting to implement this until they eventually give up.
This idea looks like an obvious win, and big companies have done it before with success, but it is extremely hard, if not impossible, for our small company to pull off.
What they didn't seem to get, however, was that a randomised baseline model will beat another randomised baseline model on a naive comparison 50% of the time, so their understanding of randomness/statistical significance/performance metrics was way off. So while they believed they were testing their models before presenting to management, none of them were implementing their comparisons/measurements properly, and huge parts of their work were just p-hacking: pulling random high-performing results out of the tails of the many models they built and compared.
So while it's good your team makes comparison to baselines (it's alarming how many don't even do that), my experience also suggests a huge number who think they're comparing to reasonable baselines and using metrics to measure their performance aren't actually doing so properly.
1/ the team implemented a naive baseline
2/ they implemented a more sophisticated model that depended on some parameter p
3/ for 100 different values of p, they examined its performance and picked the model with the best performance
Now they're not quite subject to the multiple comparisons problem there, since the models with different values of p aren't independent from one another. But they're not not suffering from it either. It mostly depends on the model. But it's a very easy mistake to make. I'd say many, many academic papers make the same mistake.
Long answer: I have a saying in statistics: "nature abhors two numbers: 0 and 100". In the real world there is no 100%; you have a number of models and a (finite) number of trials/comparisons against whatever metric, and then you have to make a decision.
My point was that their "non randomised" models may in fact have had performance equivalent to a random model, and if that was the case, you would expect them to beat a randomised comparison roughly half the time. With repeated trials of multiple models, the odds of one consistently beating the others (even if its properties were essentially equivalent to a random model) over a small, finite number of trials are much higher than most people realise. Essentially, they're flipping a large number of coins to determine performance, and choosing the coins that consistently come up heads.
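A quick simulation makes the coin-flipping point concrete: build enough models that are pure noise, keep the best-looking one, and it will appear to beat the baseline consistently. The counts below (100 models, 20 trials) are arbitrary choices for illustration.

    import numpy as np

    rng = np.random.default_rng(0)
    n_models, n_trials = 100, 20

    # Each "model" is a fair coin: it truly beats the baseline 50% of the time.
    wins = rng.random((n_models, n_trials)) < 0.5

    # Pick the best-looking model, as the team effectively did.
    best_win_rate = wins.mean(axis=1).max()
    print(f"apparent win rate of the 'best' model: {best_win_rate:.0%}")
    # Typically around 75-85%: a consistently 'winning' model made of pure noise.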
Another observation I'd make is that in the real world, random or average is almost the most facetious thing to compare performance against. We aren't generally in a state of ignorance or randomness, but you see this kind of metric all the time, even from "respected" sources. Two if/then/else statements will generally outperform randomness in a huge number of fields and subject-matter areas.
What's interesting is not that one can build a robot that beats/meets the average human at tennis (the average human is probably incapable of serving out a single game), but that one can build one that performs better than a relatively cheap implementation of our current state of knowledge of the game.
Moving from 2 if/then/else statements to an n parameter complicated model that requires training data and that no one understands and requires huge amounts of power and time to train is not only not progression, it's actually a regression on the current state of affairs. In almost all fields, random or average is the last thing you want to compare against.
"Up Next" problem can easily fall into any of the three buckets.
IMO the YT AI is the opposite of intelligent; it still recommends things I disliked. For some reason this basic rule of not showing something that I explicitly disliked was too hard for it to learn. I wonder if it is truly an AI behind it or just statistics.
The bad algorithm will force the unhappy user to fall back on manually created playlists, leaving fewer people to engage with the algorithm, and the algorithm will probably get worse over time as more users avoid it.
And even more interesting, trying to google "how to make youtube not show X" is a complete fail; it will just show you YouTube video results.
At the risk of projecting, this has the hallmark of bad experimental design. The best experiments are designed to determine which of many theories better account for what we observe.
(When I write "you" or "your" below, I don't mean YOU specifically, but anyone designing the kind of experiment you describe.)
One model of gravity says the position/time curve of a ball dropped from a height should look like X. Another model of gravity says it should look like Y.
You drop many balls, plot their position/time, and see which of the two models' curves match what you observe. The goal isn't to get the curve; the goal is to decide which model is a better picture of our universe. If the plotted curve looks kinda-sorta like X but NOTHING like Y, you've at least learned that Y is not a good model.
What models/theories of customer behavior were your experiments designed to distinguish between? My guess is "none" because someone thinking about the problem scientifically would start with a single experiment whose results are maximally dispositive and go from there. They wouldn't spend a bunch of time up-front designing 12 distinct experiments.
So it wasn't really an experiment in the scientific sense, but rather a kind of random optimization exercise: do 12 somewhat-less-than-random things and see which, if any, improve the metrics we care about.
Random observations aren't bad, but you'd do them when you're trying to build a model, not when you're trying to determine to what extent a model corresponds with reality.
For example, are there any dimensions along which the 12 variants ARE distinguishable from one another? That might point the way to learning something interesting and actionable about your customers.
Did the team treat the random algorithm as the control? Well, if you believe some of your customers are engaged by novelty then maybe random is maximally novel (or at least equivalently novel), and so it's not really a control.
What about negative experiments, i.e., recommendations your current model would predict have a NEGATIVE impact on your KPIs? If those experiments DON'T produce a negative impact, then you've learned that some combination of the following is the case:
1. The current customer model is inaccurate
2. The model is accurate but the KPIs don't measure what you believe they do (test validity)
3. The KPIs measure what you believe they do but the instrumentation is broken
What if you always recommend a video that consists of nothing but 90 minutes of static?
What if you always recommend the video a user just watched?
What if you recommend the Nth prior video a user watched, creating a recommendation cycle?
Imagine if THOSE experiments didn't impact the KPIs, either. In that universe, you'd expect the outcome you observed with your 12 ML experiments.
In fact, after observing 12 distinct ML models give indistinguishable results, I'd be seriously wondering whether my analytics infrastructure was broken and/or whether the KPIs measured what we thought they did.
> What models/theories of customer behavior were your experiments designed to distinguish between? My guess is "none" because someone thinking about the problem scientifically would start with a single experiment whose results are maximally dispositive and go from there.
This is how science is (at least, ought to be) done. This way, the goal is to always be improving your understanding of objective reality.
> They wouldn't spend a bunch of time up-front designing 12 distinct experiments. [...] So it wasn't really an experiment in the scientific sense, but rather a kind of random optimization exercise: do 12 somewhat-less-than-random things and see which, if any, improve the metrics we care about.
The problem is that a lot of AI salesmen tend to hype the "model-free" nature of "predictive" AI for optimizing outcomes/goals, and people who don't know better get carried away with the bandwagon. Overly business-oriented people are susceptible to the ostrich mentality of not wanting to understand the problems with bad tools; they are too focused on the possibility of optimizing money-making. I find the movie "The Big Short" to be a fantastic illustration of this psychology.
It's probably going to lead to a very bad hangover, but for the moment the party's still going on and nobody likes the punch bowl being yanked away.
More on such tradeoffs in a recent case study from DeepMind on Google Play Store app recommendations. Even they acknowledge that the same techniques that surface 30% cost efficiencies in data center cooling may not be completely applicable to "taste".
The issue is that the Netflix dataset has a baked-in assumption that a recommender system should show media that a user is likely to have ranked highly. It may be more important to show the user media they wouldn't have found (and thus ranked) at all. Or perhaps a user will be more engaged with something controversial rather than generically acceptable. Who knows?
I think I'd rather have a random collection of titles than a recommended list for me.
Up-next recommendations won't work without advanced image recognition and topic gathering: titles/tags for most videos are garbage and clickbait, and most of YouTube works through buzzed videos. Some "influencers" well known to a big pool of watchers push a video on some topic (thing/brand), it gets traction from other content creators who produce videos about it, and watchers tend to stick to buzzed topics. It's like news about news.
If your team used ML to recommend up-next videos on your own video hosting, your result simply means your videos are all equally off-topic or uninteresting to your service's audience; or they are garbage.
I can see “random” performing well in a set of <1000 videos, all on similar subject spaces (eg “memes”, or “python”), but recommending relevant stuff gets much harder as the amount of content grows...
IMO even in interface designing you should be arguing from first principles rather than relying on telemetry and other empirical data.
In longform if anyone is interested https://medium.com/@marksaroufim/can-deep-learning-solve-my-...
EDIT: I would consider autoencoders, word2vec, and Reinforcement Learning to be examples of turning a different problem into a supervised learning problem
EDIT 2: Social functions like happiness, emotion and fairness are difficult to state - you can't have a supervised learning problem without a loss function
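For what it's worth, the word2vec example from the first edit shows the trick in miniature: you manufacture (input, label) pairs out of raw text, and suddenly an ordinary supervised loss applies. A toy sketch, where the sentence and window size are arbitrary:

    # word2vec-style reframing: raw text has no labels, but "predict a
    # neighbouring word from a word" IS supervised, with an ordinary loss.
    text = "the cat sat on the mat".split()
    window = 1

    pairs = []  # (input word, target word) training examples
    for i, w in enumerate(text):
        for j in range(max(0, i - window), min(len(text), i + window + 1)):
            if i != j:
                pairs.append((w, text[j]))

    print(pairs[:4])
    # [('the', 'cat'), ('cat', 'the'), ('cat', 'sat'), ('sat', 'cat')]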
Your examples (deep learning applied to perception) are what he argues AI is generally good for.
Humans are smarter than computers. How can a human teach a computer how to do something when the human itself can't teach another human that something?
We haven't solved that problem. The snake is eating its tail.
You can't teach a human how to do something when the methodology to do that is the student trying something and the teacher saying "Yes" or "No".
Well.... why? Why is it yes, or why is it no? What is the difference between what the human or the computer (in general, the student) did and what is good or correct? And then you still have to define "good", and many times that means waiting (in the case of the PDF linked above, perhaps many years) to determine whether the employee the AI picked turned out to be a good employee or not.
And how do you determine that? How do you know if an employee is good or not? We haven't even figured that out yet.
How can we create an AI to pick good employees if human beings don't know how to do that?
Supervised learning isn't going to solve a problem that hasn't been solved, or that perhaps isn't even solvable at all.
In other words, over the years, my heuristic has turned into, "Has a human being solved this problem?" If not, then AI software that claims to is BS.
The closest analogy for humans would be to define a metric and ask a human to figure out how to maximize that metric. That's something we're often pretty good at doing, often in ways that the person defining the metric didn't actually want us to use.
I disagree, I think it's exactly the same. As an example, a human teaching a human how to use an orbital sander to smooth out the rough grain of a piece of wood.
The teacher sees the student bearing down really hard with the sander and hears the RPMs of the sander declining, as measured by the frequency of the sound.
The teacher would help the student improve by saying, "Decrease pressure so that you maximize the RPMs of the sander. Let the velocity of the sander do the work, not the pressure from your hand."
That's a good application of supervised learning. Hiring the right candidate for your company is not.
* excepting some classes of expert systems
The human learning situation you describe works quite differently, though: the student sees either the device alone, or the teacher using the device to demonstrate its functionality. This is the moment most of the actual learning happens: the student creates internal concepts of the device and its interactions with the surroundings. As a result the student can immediately use the device more or less correctly. What's left is just some fine-tuning of parameters like movement vectors, movement speed, applied pressure, etc.
If the student would work like ML, it would: hold the device in random ways, like on the cord, the disc, the actual grip. After a bunch of right/wrong responses she would settle on using the grip mostly. Then (or in parallel) the student would try out random surfaces to use the device on: the own hand (wrong), the face of the teacher (wrong), the wall (wrong), the wood (right), the table (wrong) etc. After a bunch of retries she would settle on using the device on the wood mostly.
It's easy to overlook the actual cognitive accomplishments of us humans in menial tasks like this one, because most of it happens unconsciously. It's not the "I" that is creating the cognitive concepts.
Strangely, I recently had to complete a cognitive test that was essentially that process. I was given a series of pages, each of which had a number of shapes and a multiple choice answer. I was told whether I chose the correct answer, then the page was flipped to the next problem. The heuristic for the correct answer was changed at intervals during the test, without any warning from the tester. I'm told I did OK.
I wonder how an AI would perform on the same test.
What is the mathematical minimum number of questions on such a test, subsequent to the heuristic change, that could guarantee that new heuristic has been learned?
I'm curious about the test. Did it have a name? What were they testing you for?
This situation is called a Multi-armed Bandit. In this setup you have a number of actions at your disposal and need to maximise rewards by selecting the most efficient actions. But the results are stochastic, and the player doesn't know which action is best. They need to 'spend' some time trying out various actions, but then focus on those that work better. In a variant of this problem, the rewards associated with the actions also change over time. It's a very well studied problem, a form of simple reinforcement learning.
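A minimal epsilon-greedy sketch of that setup, with invented payout probabilities; a real bandit solver would likely use something smarter like UCB or Thompson sampling:

    import random

    random.seed(0)
    true_payout = [0.2, 0.5, 0.8]  # hidden reward probability of each arm
    counts = [0, 0, 0]
    values = [0.0, 0.0, 0.0]       # running mean reward per arm
    epsilon = 0.1                  # how often we explore at random

    for _ in range(10_000):
        if random.random() < epsilon:                # explore
            arm = random.randrange(3)
        else:                                        # exploit the best-looking arm
            arm = max(range(3), key=lambda a: values[a])
        reward = 1.0 if random.random() < true_payout[arm] else 0.0
        counts[arm] += 1
        values[arm] += (reward - values[arm]) / counts[arm]  # incremental mean

    print([round(v, 2) for v in values])  # estimates approach the true payouts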
I would agree that it's a very inefficient way of teaching something. It gave me an unexpected insight into machine learning though.
I'm sure the test was designed so that picking the same answer each time or picking one at random would result in a fail.
AGI, in the singularity sense, will be solving problems before we even identify them as problems. Experts in a field can already do this for the layman, and I think it's possible. Some don't. I do.
It'll be super interesting when it flips! When the student becomes the master and we, as a species, start learning from the computer. You can kind of get a sense of this from the Deep Mind founder's presentation on their AI learning how to play the old Atari game Breakout. He says when their engineers watched the computer play the game, it had developed techniques the engineers who wrote the program hadn't even thought of.
Even still, the engineers could teach another human how to play Breakout, so yes, I do believe they did in fact create software that plays Breakout better than they could.
The best chess AIs can beat any human chess player. They use techniques that were never taught to them by a human.
Another example: a machine-learning-driven computer-vision system predicting the sex of a person based on an image of their iris. No human can do this. 
 Learning to predict gender from iris images (PDF) https://www3.nd.edu/~nchawla/papers/BTAS07.pdf
 Richard Sutton and Andrew Barto, Reinforcement Learning: An Introduction, second edition. MIT press, 2018.
- OpenAI: Dota 2 (PPO), GPT-2...
- NVidia: StyleGAN, BigGAN, ProGAN...
From Google in 2018:
"One of the biggest challenges in natural language processing (NLP) is the shortage of training data. Because NLP is a diversified field with many distinct tasks, most task-specific datasets contain only a few thousand or a few hundred thousand human-labeled training examples. However, modern deep learning-based NLP models see benefits from much larger amounts of data, improving when trained on millions, or billions, of annotated training examples. To help close this gap in data, researchers have developed a variety of techniques for training general purpose language representation models using the enormous amount of unannotated text on the web (known as pre-training). The pre-trained model can then be fine-tuned on small-data NLP tasks like question answering and sentiment analysis, resulting in substantial accuracy improvements compared to training on these datasets from scratch."
The unsupervised aspect is the engine driving all modern NLP advancements. Your comment suggests that it is incidental, which is far from the case. Yes, it is often ultimately then used for a downstream supervised task, but it wouldn't work at all without unsupervised training.
Indeed, one of the biggest applications of deep NLP in recent times, machine translation, is (somewhat arguably) entirely unsupervised.
I think that in the future, more and more clever unsupervised approaches will be the path forward in huge AI advances. We've essentially run out of labeled data for a large variety of tasks.
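As a concrete illustration of the pre-train-then-fine-tune pattern the Google quote describes: with the HuggingFace transformers library, a few lines get you a model that was pre-trained on unannotated text and then fine-tuned on a comparatively small labeled dataset. The example sentence is made up.

    # Pre-trained representation, fine-tuned for a small-data task.
    # Requires the `transformers` library; downloads a model on first run.
    from transformers import pipeline

    # Under the hood: a model pre-trained on huge amounts of unannotated
    # text, then fine-tuned on a comparatively tiny labeled sentiment set.
    classifier = pipeline("sentiment-analysis")
    print(classifier("The recommendations were surprisingly good."))
    # e.g. [{'label': 'POSITIVE', 'score': 0.99...}]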
You can define the terms how you want - but in terms of how they're understood in both industry and academia, you are incorrect.
I think I see what you're saying, but that might be a different definition of "supervised". It seems impossible for one half of the same algorithm to be supervised and the other to be unsupervised. But I like your definition (if it was renamed to something else) because you're right that the discriminator is the only thing that pays attention to the training data, whereas the generator does not.
"Formulate the problem as X" - what is your input for how a problem is formulated? That you personally like how it was formulated?
"Probably," - OK, so you assign probability scores? Or do you mean, "likelihood based upon my guess?"
Finally, how do you measure performance? Your own assessment of how good you were at it?
In an extremely narrow sense of pattern recognition of some "image features", i.e. 5% of what a radiologist actually does, he's probably right. But context is the other 95%, and AI is nowhere close to being able to approach expert accuracy in that. It's a goal as far away from reality as AGI.
"AI" tools will probably improve the productivity of radiologists, and there are statistical learning tools that already kind of do that (usually not actually widely used in medical practice, you can say yet, I can say who knows but nice prototype). But actual diagnosis, like the part where an MD makes a judgement call and the part which malpractice insurance is for? Not in any of our lifetimes.
A radiologist friend complains that it's been 10+ years since they've been using speech recognition instead of a human transcriptionist, and all the systems out there are still really bad. Recognizing medical lingo is something you can probably achieve with more training data, but the software that sometimes drops "not" from a scan report is a cost-cutting measure, not a productivity tool. It makes the radiologist worse off because he's got to waste his time proofreading the hell out of it, but the hospital saves money.
I will correct this in future versions of the talk and paper.
One thing I'd love is a look at 'noise' in these systems, specifically injecting noise into them. Add-ons like Noiszy and TrackMeNot claim to help, but I'd imagine that doing so with your GPS location is a bit tougher. I'd love to know more about such tactics, as it seems that opting out of tracking isn't super feasible anymore (despite the effectiveness of the tracking).
Again, great work, please keep it up!
Medical imaging diagnosis was one of them.
Speech recognition/transcription was another. I don't know if it's my accent or my speech patterns (though foreigners regularly compliment my wife and me on our pronunciation), but the tech hasn't gotten noticeably better for me since the days of Dragon NaturallySpeaking, and that was, what... 10 years ago?
Sure, I can "hey Google/siri/alexa" a handful of predefined commands, but I still have to talk in a sort of stoccato "I am talking to a computer" voice, it still only gets it right 90% of the time, and God help you if you try anything new/natural not in the form of "writing Boolean logic programs with my voice".
It is one of my most frustrating everyday software experiences.
Not only is it not getting better, it's actually getting worse, because before I at least had the correct sentence. Now my correct sentence is mangled as it tries to force corrections/substitutions, and I have to continually go back and manually correct the auto-correct.
It seems to work for me on short pre-formed sentences and toy examples (if you communicate using pre-formed phrases and well-worn cliches in your writing, it seems to pick up on and predict them). I wonder whether the "increased accuracy" of modern solutions isn't just, functionally, access to a larger library of lookup rules of stored common/popular phrases and direct translations effectively mined from the training data (a huge part of practical 'AI' advancement has been the scaling of infrastructure and collection of new scales of data, rather than the AI techniques themselves, IMO). The moment I try to write or dictate anything new, original, or lengthy, it turns absolutely pear-shaped.
I interviewed at a startup that seemed fishy. They offer a fully AI-powered customer service chat to banks as an off-the-shelf black box. I highly suspect that they were a pseudo-AI setup. LinkedIn shows that they are light on developers but very heavy on "trainers", probably the people who actually handle the customers: mostly young graduates in unrelated fields, who may believe that their interactions will be the data needed to build a real AI.
I doubt that the AI will ever be built; it's just a glorified Mechanical Turk help desk. I guess the banks will keep it going as long as they see near-human-level outputs.
Pt 1: https://thespinoff.co.nz/the-best-of/06-03-2018/the-mystery-...
Pt 2: https://thespinoff.co.nz/the-best-of/09-03-2018/the-mystery-...
In the Mechanical Turk analogy there is no such capability amplification happening.
A couple of weeks ago such a startup based in London contacted me on LinkedIn - the product really hyped AI, but it all seemed very dubious. My guess was it was really a mix of a simple chatbot with a Mechanical Turk-style second line.
I'm afraid you have misspelled "raise a humongous round from SoftBank". It's an easy typo to make, don't feel bad.
I envision the sticker "human inside" strapped on our algorithms.
The term AI is used as if humanity has now figured out general AI, or artificial general intelligence (AGI). It's quite obvious organizations and people use the term AI to fool the less tech-inclined into thinking it's AGI - a real thinking machine.
I suppose their only real sin was business suits. Everything seems more credible if you say it while wearing a hoodie.
These things better be smart, because they are not low-footprint.
Modern humans have a very heavy carbon footprint, especially in the US. Think of all the things you do and consume and all the carbon involved all through the chain. It's a big number. Computers are extremely efficient compared to that.
computer < human + computer
Of course not every computer has the same specs and footprint, but they should be in roughly the same ballpark.
Given the number of people Facebook employs to censor content, and the mistakes they make, I would label most of Facebook's AI claims as snake oil.
I believe Google Maps has a lot of humans who tidy up the automated mapping algorithms (such as adjusting roads).
Annotation is time consuming and therefore extremely expensive if you have a $100k engineer doing it.
Changes to other types of places can still be made manually by GMaps users themselves, and other users can evaluate them; I guess if a change is "controversial" (a low-rep user made it or people voted against it) a Google employee evaluates it. And beyond a certain level as a GMaps user you can get most changes published immediately.
So what if some corporate hack calls linear regression “AI”? The results speak for themselves. The ML genie is too profitable to go back in the bottle.
"We trained a neural network to oversee the machine output"
Independent companies using AI is far less a concern for me. If they are snake oil, people will learn how to overcome them. Government (especially parts related to enforcement) is what I find scary.
I've therefore been stockpiling popcorn since this law was announced, in anticipation of the inevitable clusterfuck when it has to be applied to a decision made using machine learning.
(Which is pretty much impossible to explain the way the law requires, because even those who made the neural network would be quite at a loss to understand how exactly it came up with that decision, let alone explain it to your average person!)
They optimise a simple set of decision rules which have reasonable accuracy in their application. Quite cool really.
If someone has killed 12 people, being prejudiced about their chance of killing another and using that to determine the length of a sentence seems reasonable.
Even with something like a health inspection: measuring how they store and cook raw chicken is about predicting the health risks to the public eating it, not about measuring the actual number of salmonella outbreaks. And even if they were to measure previous salmonella outbreaks and use them to predict future outbreaks, those are still two different things.