Hacker News
Adventures in Improving AI Economics (a16z.com)
249 points by oliver101 on Aug 13, 2020 | 75 comments



In my experience, there are just a lot of "bad" AI/ML engineers who don't fundamentally understand what data can do, what ML algorithms can handle, and how to piece it all together to produce something of value to the end user. A couple of these people on a team can torpedo a project. Worse are those who actively sabotage projects or simply hinder progress. These may be jaded people who don't believe ML has any value yet hold titles like Data Scientist or ML Engineer, and they can bring team morale down. The economics are similar to a grad-school research project, yet the field is infiltrated by all sorts of people with 3-month certificates who believe they are the star of the show.

The most important element of AI project success is the right people and the right team. Projects are long-term and failure is common. It's not easy to succeed, but cultivating the right people and their mindset is, in my opinion, a needle mover for AI projects, more so than what data is available, what algorithms are tried, or what shiny framework people want to use.


There are people all across the system who have a poor understanding of how machine learning works, especially in the enterprise AI market. The most common hurdle I have seen is having to explain to salespeople and customers that models are not perfect - they often fail on known unknowns and unknown unknowns. No matter how simply you try to explain it with charts and lift curves, every model failure has the potential for customer escalation, costing hundreds of engineering and support hours and depressing margins.


Not trying to nitpick (but here we go), but allowing an escalation to cost hundreds of hours seems like a systemic corporate issue. It's absolutely a customer service hurdle; you can't just tell a customer sending in a ticket to shove it, and throwing something back at them that they signed acknowledging it is never helpful. But even involving engineers in these cut-and-dried scenarios (whether the customer agrees or not) seems like a poor decision.


This is equally true if you replace “ML engineer” with simply “engineer”.


Yes, I've come to the same conclusion. When interviewing ML engineers, I'd rather they be exceptional programmers with passable knowledge of ML than the other way around. If they haven't learned to be good software engineers it's improbable they will in the future, but ML can be learned. In fact an ML team needs a large number of regular software engineers; there's a lot of non-ML code to work on, such as labelling interfaces, data pipelines and CI/CD for models.


Real ML, and not just plug-and-play models, takes a serious amount of knowledge.


On the other hand, unless ML is your core competency as a business, plug and play models can get you really far.


When you consider the chain from inputs to action you can go far with a small toolbox (e.g. logistic regression).
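As a rough sketch of what that small toolbox looks like in practice (assuming scikit-learn; the synthetic features and the 0.8 cutoff are invented for illustration, not from the article):

    # A plain logistic regression going from tabular inputs to an actionable score.
    import numpy as np
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import train_test_split

    rng = np.random.default_rng(0)
    X = rng.normal(size=(5000, 10))  # e.g. user/account features (synthetic here)
    y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.5, size=5000) > 0).astype(int)

    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
    model = LogisticRegression(max_iter=1000).fit(X_train, y_train)

    # Turn scores into an action: only intervene above a probability cutoff.
    scores = model.predict_proba(X_test)[:, 1]
    act_on = scores > 0.8
    print(f"test accuracy: {model.score(X_test, y_test):.2f}, acted on {act_on.sum()} cases")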

Plug-and-Play models in visual recognition basically work at this point in time; in NLP they don't.


I wouldn’t have ML engineers doing ML. They should be working on scaffolding, maintenance, production side, etc. The worst ML scientists (producing ML models) I’ve experienced were software developers who transitioned.


You just need one ML scientist for 4-5 software (or ML) engineers. If you want to optimise time to delivery of products, you have much more to gain by improving the software engineering part, because regular SWE is 90% of the product.

One of the main differences between ML in academia and industry is related to sourcing the training data. In academia they just use available pre-tagged datasets such as ImageNet; in industry you have to collect, clean up, organise, train, and iterate with new data.


Yep, you pretty much hit the nail on the head. A few people in this thread are equating an ML scientist's responsibilities with those of an ML engineer; however, they are very different.

Production development of a model is a very hard problem and it's interesting because I see few companies trying to tackle it. One of them is tecton.ai (heard about them on SED) and I'll be interested to see how they evolve their feature set because it still seems incomplete.


Wow! This is the elephant in the room that doesn't get talked about in that article at all. You're also the only person to mention it, despite the fact that it is the only essential step to a commercial machine learning project.


Curious why you think the worst ML scientists were the former engineers (as someone who does a bit of both).


> ML can be learned

Curious: do you have a recommended pathway for that?


Correct me if I’m wrong but isn’t this fundamentally true for any team working on any project?


No, there's more ambiguity in machine learning projects. When you develop a website, aside from the design, it works or it doesn't. Whether some kind of ML product can work at all is often team-dependent.


> When you develop a website, aside from the design, it works or it doesn’t.

This really isn’t true. There are websites and web apps that “work” but are really suboptimal from a performance and UX perspective. It’s possible to do this right, but it’s much easier to do a poor job. You end up with something that kind of works and may even be profitable but which is a boat anchor around your company compared to a better approach.


Yeah not sure what they were getting at. There are also a lot of people who fundamentally don't know good software engineering practices and cause a ton of tech debt


a16z has a podcast where they explored gross margins a month back. The panel called out AI as an example of a software business that has a high likelihood of not having standard SaaS margins (most of the panel thought this could be a limitation).

The podcast is nice because I think it holistically explores gross margins in a way that helps you understand how they might impact AI as a viable primary business model, and the valuations of companies for which that is the case. Quite complementary to the article.

Might be interesting to people who are interested in this article: https://open.spotify.com/episode/79lJCrHB3nBn1qXCxKA5s7?si=R...


This is a great feature of the AI space for startups - in the short term it reduces competition, in the long term it's not really a problem. If your business is break-even currently, it will be profitable in 2-3 years due to declining cost of compute. In 10 years the compute costs will fall by an order of magnitude and more efficient models will become available, making the economics closer to traditional SaaS.

This does disadvantage smaller bootstrapped businesses though.


Not true if there's enough competition that you need more resources for a bigger model in 2-3 years. Anecdotally, it seems like SOTA model training costs are rising much faster than compute cost is falling.


Maybe, but model quality doesn't scale linearly with model size. The performance-per-dollar metric is more important, and that will definitely improve over time.

Personally, I have a bootstrapped business that uses a transformer model. I'm not worried about supermassive models like GPT-3 because they would be way too expensive to deploy for my use case, with marginal additional quality.


While I haven't listened to this particular one, I just wanted to say that their podcasts are usually quite interesting if you're interested in new technology/science.


Good analysis, and great of them to share their thinking. It does feel like this could have been a tweet saying that the necessary condition for a successful ML solution is applying it to a problem with asymmetric upside.

Great for telling people they should get tested for diseases, terrible for diagnosis. In the first (alerting) case, the consequences of being wrong are no worse than the base rate, since the person wouldn't have been tested otherwise, and the upside saves a life. In the second (diagnosis) case, the consequences of being wrong are catastrophic, and it is substituting for the best available judgment. Similarly, it's great for fraud detection, terrible for making credit decisions, because the false negative rate is essentially externalized. It's good for finding opportunities, bad for providing services. So it's great for funnels and conversion pipelines.

So perhaps there's an ironic Turing test for ML solutions: as a group of people gets larger, mean reversion of their collective intelligence makes them indifferent to the perceived intelligence of the model, whereas a given individual will find the results of the model unsatisfying. From an indifference perspective, AI can fool some of the people all the time, and all the people some of the time, but no confusion matrix satisfies all the people all the time. Economically, ML will be useful for creating simple and cheap services that people who can't afford better will use and then substitute up from when they can afford better - what economists call "inferior goods." There may be a hard limit on ML providing "normal goods" to individuals at scale for this reason. Lots of money to be made, but lots to be wasted tweaking your ROC curve in the hope of creating a normal good.

I yell from the rooftops every chance I get that "the confusion matrix is the product." That is, your FP/FN/TP/TN rate is your product, and you are optimizing your system for the weights your customer assigns to those variables.
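To make the "confusion matrix is the product" point concrete, here is a toy sketch of scoring a model by the dollar weights a customer assigns to each cell (the counts and dollar values below are invented for illustration):

    import numpy as np

    # rows: actual (negative, positive); columns: predicted (negative, positive)
    confusion = np.array([[9200,  300],    # TN, FP
                          [ 150,  350]])   # FN, TP

    # Hypothetical customer-assigned dollar value per outcome (negative = cost).
    value = np.array([[   0.0,  -40.0],    # a false positive triggers a support escalation
                      [-500.0,  120.0]])   # a miss is expensive, a catch earns money

    expected_value = (confusion * value).sum()
    print(f"expected value of shipping this confusion matrix: ${expected_value:,.0f}")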

There is another ML/DL use case I'm hacking on that is about enabling privacy, but even this reduces to the asymmetry of the upside/downside of the confusion matrix. Obviously the article is more nuanced than this, but I think this heuristic is a key tool for reading articles like it.


I appreciate the thoughtful commentary. I couldn't disagree with you more, of course.

There are 2 instances where AI breaks the mold you've cast.

First, executing rote tasks that no human needs to do. Second, and relatedly, while there does seem to be a tough hurdle when it comes to "better than human" execution, there is also an inverted survivorship bias: once a technology is production-ready it is no longer AI. Cars aren't robots, antilock brakes aren't AI. Once a system outperforms a human it's technology, not intelligence.


Our disagreement might be subtle. An old saw of mine is that the Turing test thought experiment is covered by prior art in economics, where the idea of an indifference curve describes the points between amounts of things where people are indifferent to substituting between them.

I agree these things you state aren't intelligent, but nor are computers, nor can they be - people just become indifferent to whether we are dealing with a human or a computer.

My assertion is that we are highly sensitive to substitutes when the downside risk is large, but largely indifferent to them and even like them when they resemble a lottery with good upside at low cost or risk.

Self driving cars are a good example, where someone asked me whether, if I had kids, would I send one to school in traffic in an autonomous vehicle. I told them it would depend on how many kids I had.

But this pretty much describes the dynamic.


> Self driving cars are a good example, where someone asked me whether, if I had kids, would I send one to school in traffic in an autonomous vehicle. I told them it would depend on how many kids I had.

Pretty sure that answer is much less convincing than you think.

In fact, I thought you were right until that point; then I realised that was an answer no parent would ever give, which made me realise there's a lot missing in your hypothesis.


> This is the crux of the AI business dilemma. If the economics are a function of the problem – not the technology per se – how can we improve them?

The article focusses on the costs of resources to build a model (annotated data + compute), but the economics are also affected by the ongoing cost of making a prediction error. False positives and false negatives usually have different costs, and each user might have their own preferences:

e.g. "show me all the content that's a bit relevant" vs "show me just the content that's really relevant".

If you can write out the loss function in $$$ terms not just accuracy, then you're closer to either abandoning the problem or finding a profitable AI model.
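A sketch of what that looks like in code - two hypothetical users put different dollar costs on false positives and false negatives, so the same model gets a different operating threshold for each (the scores and costs below are invented):

    import numpy as np

    rng = np.random.default_rng(1)
    labels = rng.integers(0, 2, size=2000)                                   # 1 = actually relevant
    scores = np.clip(labels * 0.35 + rng.normal(0.4, 0.2, size=2000), 0, 1)  # model scores

    def dollar_loss(threshold, fp_cost, fn_cost):
        preds = scores >= threshold
        fp = np.sum(preds & (labels == 0))   # shown but irrelevant
        fn = np.sum(~preds & (labels == 1))  # relevant but hidden
        return fp * fp_cost + fn * fn_cost

    thresholds = np.linspace(0.05, 0.95, 91)
    for name, fp_cost, fn_cost in [("show me everything a bit relevant", 0.10, 2.00),
                                   ("only the really relevant content",  1.50, 0.25)]:
        best = min(thresholds, key=lambda t: dollar_loss(t, fp_cost, fn_cost))
        print(f"{name}: operate at threshold {best:.2f}")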


Great way of putting it. The trouble there is that it takes an exceptional kind of senior ML person to basically wear a product manager hat all the time and press to translate project success criteria into revenue impact or cost reduction terms.

Having these “glue people” that connect ML engineering to product management is probably the most important thing to running an ML organization.


This is more related to the previous a16z article on the topic, but I found "Data as a Service" by Auren Hoffman a great read for thinking about businesses that sell access to machine learning models:

https://www.safegraph.com/blog/data-as-a-service-bible-every...


"Andreessen Horowitz (known as "a16z") is a venture capital firm in Silicon Valley, California"

In case anyone was as confused as I was about what a16z means - it's just the company not a new abbreviated term related to AI.


Also for anyone too young to remember the dot com boom, the founding partners (Marc Andreessen and Ben Horowitz) are some of the legendary techies from that cycle (of Netscape and LoudCloud/Opsware fame, way ahead of their time)


Yeah, I find this kind of abbreviation annoying. But there are a few words that are commonly abbreviated like this:

i18n -> internationalization

l10n -> localization

g11n -> globalization

l12y -> localizability

a11y -> accessibility

It bothers me because my brain does not jump from the abbreviation to the underlying word. I really need to stop and think about each one. And I get the numbers wrong when writing them.


There was a period of a few months when I was first learning about web apps where I saw "i18n" multiple times. The first time I came across it, I tried to sound it out:

i18n -> I-one-eight-n -> iwonation

I was already a couple of rabbit holes deep at the time and didn't have the mental capacity to look it up and wrap my head around yet another new concept.

"Oh boy." I thought to myself, "One more word I've never heard of, probably representing some complicated CS concept."

I was so annoyed when I finally found out what was going on.


Wait, can you clarify what is actually going on? Is there any rhyme or reason, or are these shortenings just random?

I'm having trouble parsing the grandparent comment...


It's the number of letters between the start and end letters. Yeah, it's annoying.
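In code, the scheme is just first letter, count of the letters in between, last letter - a tiny illustrative helper:

    def numeronym(word: str) -> str:
        # first letter + number of letters in between + last letter
        return word if len(word) <= 3 else f"{word[0]}{len(word) - 2}{word[-1]}".lower()

    for w in ["internationalization", "localization", "accessibility", "AndreessenHorowitz"]:
        print(w, "->", numeronym(w))
    # internationalization -> i18n, localization -> l10n,
    # accessibility -> a11y, AndreessenHorowitz -> a16z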


Also, k8s->kubernetes


What AI companies are they invested in?

Labelbox is the only one I know.

https://a16z.com/portfolio/


Tecton is their most recent high profile ML/AI company: https://a16z.com/2020/04/28/investing-in-tecton/


Hey—this is Brian from Labelbox. Thanks for the mention.

Martin and Matt are incredibly sharp on the trends in AI and have inspired our work at Labelbox immensely. I particularly like the discussion on getting the operations right for building and iterating with ML. Like software development, iterating quickly is key to building a performant model on real world data. The average iteration time for ML is 2-4 weeks in the industry right now. Comparing this to software development averages is stark. Getting the development and operations of ML right can greatly speed up iteration and improve the likelihood of getting to market with performant ML systems.


There are 16 characters between the A of Andreessen and the Z in Horowitz for those that don't get it.


Gr8 article.

I'd add the caveat that software dev processes can be well controlled or not well controlled. AI/ML is not so much a new kind of project as a kind of project likely to be poorly controlled.

Another thing they don't mention is that AI/ML projects break the agile assumption that you can manage with only punch-clock time, not calendar time.

Imagine you have a 2 week sprint and it takes 1 week to train a model. You have to get the training started in the first week, and any tasks that need to be done to start training have to start before that.

This of course means applying PERT-chart thinking even if you don't make PERT charts. It often isn't that hard, but an agile shop that mistakes the map for the territory will start the one-week job consistently on the last day of the sprint. A toy backward pass over that scheduling point is sketched below (the dates and durations are invented):
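    from datetime import date, timedelta

    sprint_end = date(2020, 8, 28)   # two-week sprint: Aug 17-28
    tasks = [("prepare data", 1), ("write training config", 1),
             ("train model", 7), ("evaluate + ship", 2)]

    # Walk backwards from the sprint deadline to find the latest start for each task.
    latest_start = {}
    deadline = sprint_end
    for name, days in reversed(tasks):
        deadline -= timedelta(days=days)
        latest_start[name] = deadline

    for name, _ in tasks:
        print(f"{name}: must start by {latest_start[name]}")
    # "train model" has to start by Aug 19 - i.e. in the first week of the sprint.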

The 'containerization' process they describe is close to the methods used by East Coast defense contractors (in a band between Research Triangle Park and the applied physics department at Johns Hopkins in Baltimore) to get high accuracy. It is also what IBM Watson actually did, as opposed to what people thought it did.

It's amazing those methods have remained so obscure, but the mind that is impressed with BERT is going to be impervious to asymptotes. That article should be telling people to run, not walk, away from those kinds of models -- it's how you end up always a bridesmaid, never a bride.


Good analogy about discovery of Pharma molecules.

It’s really fun to think about the fact that Tesla has more than enough data to unlock autonomous vehicles, but all that is missing is the correct AI architecture to get it working...

Who will figure out how to code that? Will it be a breakthrough, or can sub-optimal architectures eventually reach equilibrium with 10x or 100x the amount of time/data processing?


> Tesla has more than enough data to unlock autonomous vehicles

Many people in the automotive industry, myself included, disagree with this statement pretty strongly. Driving data quantity is not equivalent to quality and they are severely lacking in advanced sensor data.


I'm likely biased because I spend some of my time doing perception research, but I find the "advanced sensors are necessary" argument so odd. We have clear evidence from humans that you don't need them. I expect we'll be doing this sort of thing [0] in toy dynamic scenes from monocular vision in ~1 year, and in real-time on city scenes in ~2. Perception-wise, what more do you need?

Planning and control seem much harder, but that's not a sensing problem.

[0] https://nerf-w.github.io/


So is the claim by Elon Musk that current iterations of Tesla vehicles have all of the sensors and compute power needed to be fully autonomous (Level 4+ I guess?) in the future, via software updates only, a specious one?


It's hard to be absolutist on the response to that: anything is possible, and humans can drive without LIDAR.

But at the moment it seems a strange position to take: we know LIDAR data is useful in many circumstances, and we know it can solve a number of the hard parts of computer vision.


Musk also said he was taking Tesla private at $420 a share, funding secured.

He says a lot of things.


I don’t know, the existence proof is that it takes a 16 year old a few days of driving before they get it well enough...


I'm sorry, but is that true, that Tesla has enough data to unlock autonomous vehicles? My experience is that until you get an ML model to do X, you never know if you have enough data to train it to do X. Or is that just your opinion, that they don't need more data?


Perhaps. It seems it’s still an open question whether AI is just about memorizing your data, or can it actually make reliable decisions during previously unseen scenarios.

Have we already observed, or collected, all that is possible in the “driving” world?


> It seems it’s still an open question whether AI is just about memorizing your data,

No - it's not at all, and this is a well understood problem in building machine learning systems. There are some cases where this occurs but usually this is just overfitting.

Good AI systems generalize well on unseen data.


To the downvoters, I give you AlphaZero.

Not only is every game of Go it plays and wins brand new (so no memorisation), the same system learnt to play Chess given nothing but the rules, and plays in a "style .. unlike any traditional chess engine".

https://deepmind.com/blog/article/alphazero-shedding-new-lig...


I've yet to see an "AI" that is not just memorizing data.


Just memorising data is simple: use a file on disk. The hard part is recognising data when it is slightly different from what the model 'memorised', and deciding which of the millions of things it learned best fits the answer we desire.


Then you haven't really looked.

Most credible machine learning systems work well on unseen data, which by definition isn't memorizing.


> Most credible machine learning systems work well on unseen data, which by definition isn't memorizing.

Sorry, but no. ML models don't generalise well outside the training data, but they can interpolate inside. This question becomes very interesting in the case of GPT-3 which has had a huge corpus of text to train on, so it's probably seen 'everything'. It's still memorising for GPT-3 but also learning to manipulate data, like software algorithms.


> ML models don't generalise well outside the training data, but they can interpolate inside.

I'm unsure if you just misstated this or don't know, but this is wrong.

ML models don't generalise well on data outside the distribution of their training data. But that's an entirely different thing, and doesn't mean at all they are memorising data.

Imagine something training on the US unemployment rate until 2020 being hit with the COVID rate. It wouldn't know what to do, but that doesn't mean it wouldn't work fine on a rate of 5.342% even if it had never seen that rate before.

This is a simplified example, but applies to everything.
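A minimal sketch of that distinction, with synthetic data standing in for the unemployment example (nothing here is real economic data): a model fit on rates in the 3-10% range handles an unseen 5.342% fine, but has no basis for a COVID-style 14.7%.

    import numpy as np
    from sklearn.ensemble import RandomForestRegressor

    rng = np.random.default_rng(0)
    rate = rng.uniform(3.0, 10.0, size=500)          # training inputs: "normal" rates
    outcome = np.sin(rate) + 0.1 * rate + rng.normal(scale=0.05, size=500)

    model = RandomForestRegressor(random_state=0).fit(rate.reshape(-1, 1), outcome)

    for r in [5.342, 14.7]:
        truth = np.sin(r) + 0.1 * r
        pred = model.predict(np.array([[r]]))[0]
        print(f"rate {r:>6}%: predicted {pred:.2f}, underlying function {truth:.2f}")
    # 5.342% (unseen but inside the training range) comes out close;
    # 14.7% (outside the distribution) does not.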

GPT-3 generation of text does pull from memorised training data. There's a lot of stuff going on there, and amongst other things there has never really been a system that does textual generation well. It's also hugely overparameterised, so lots of potential for overfitting. I don't think it's a good example of a "good" AI system - it's very interesting, full of potential, but there are lots of issues.


Can you link an example you find to generalize particularly well?


Sure. Apple's Face Unlock.

It generalises to almost every face, and for the ones it doesn't, its failure mode is safe.

Or something like word embeddings. Works incredibly well, and most "failure" modes are around things like bias, where the behavior reflects the real world.

Or something like AlphaZero. Not only is every new game of Go it plays brand new, it learnt to play Chess given nothing but the rules. That just isn't memorization.

https://deepmind.com/blog/article/alphazero-shedding-new-lig...


Word embeddings suck. Take a look at the graphs under " 2. Linear substructures" in

https://nlp.stanford.edu/projects/glove/

and note that they all involve looking at a small number of points. It is easy to reproduce plots like that but if you try to increase the number of points the result breaks down completely.

It is a curve fitting problem: for a small enough set of points compared to the number of dimensions, you can find a matrix that projects a set of random points to an exactly specified set of points in the plane. If you relax the problem to something like "put colors on the left side, put smells on the right side" you will get better than random performance from that kind of model, but not that much better than random.
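That claim is easy to check numerically: with fewer points than dimensions, a least-squares fit will map random high-dimensional vectors to any 2-D layout you specify, exactly (a quick sketch with made-up sizes):

    import numpy as np

    rng = np.random.default_rng(0)
    n_points, dims = 20, 300
    X = rng.normal(size=(n_points, dims))               # random "word vectors"
    targets = rng.uniform(-1, 1, size=(n_points, 2))    # any 2-D positions we want

    # Least squares finds W with X @ W == targets exactly, since n_points < dims.
    W, *_ = np.linalg.lstsq(X, targets, rcond=None)
    print("max error:", np.abs(X @ W - targets).max())  # ~1e-14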

Word embeddings are a strategy that approaches an asymptote. Systems that are destined for low performance will perform better if you use a word embedding, but they throw away information up front that makes high performance impossible.


This is true, but I don't think you are using word embeddings the way most people use them.

The linear relationship between things like king/queen etc is a cute demo but not really useful or used in practice.

The real usefulness of word embeddings is that similar concepts are close to each other, so they make a great representation for other models (vs something like TF-IDF). These days they have been mostly surpassed in terms of state of the art by full language models, but the point is that simple techniques like averaging the embeddings of the words in a sentence generalised really well to unseen data.
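A toy sketch of that averaging trick - represent a sentence as the mean of its word vectors and compare sentences by cosine similarity (the tiny embedding table here is fabricated; in practice you'd load pre-trained vectors such as GloVe):

    import numpy as np

    emb = {  # toy 3-d "embeddings"
        "the": np.array([0.1, 0.0, 0.1]),
        "cat": np.array([0.9, 0.2, 0.1]),
        "dog": np.array([0.8, 0.3, 0.2]),
        "sat": np.array([0.2, 0.7, 0.1]),
        "ran": np.array([0.3, 0.8, 0.2]),
    }

    def sentence_vec(sentence):
        vecs = [emb[w] for w in sentence.lower().split() if w in emb]
        return np.mean(vecs, axis=0)

    def cosine(a, b):
        return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

    # Sentences with related words end up close, even with no exact sentence "memorised".
    print(cosine(sentence_vec("the cat sat"), sentence_vec("the dog ran")))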

And if you add in subword embeddings they generalise to unseen words, too.

We could talk about how context lets language models do this even better, but I'm still back trying to persuade the OP that this isn't just memorisation and good ML models work well on unseen data!


It's not so straightforward to go from a word representation to a query, sentence, or document representation.

If you come from the tfidf direction you can first tune up BM25 or something based on the ks-divergence, then you can use a random matrix, LDA, or the deep-network autoencoder that I worked on that crushed conventional tfidf vectors to 50-d vectors.

(Like many things people want to apply word vectors to, you go from 50% accuracy here to 70%, but we know it because we tested it on TREC gov2)

Today I'm interested in systems that have an input-to-action orientation, and there you have to be able to put together a story like "these 10 messages are parsed correctly and not by accident". That requires that certain 'king/queen' inferences be done correctly, or alternatively that the system has paths to recover from missing an inference.

Often there is no path to go from "popular models in the new A.I." to "something that can serve customers off the leash" and that's the problem.

Now I do like subword embeddings, but that just points out the problem that there is no such thing as a "word".

Let me justify that.

You can split up English into words like "some text".split() but it is not easy to do it from audio. Speech is punctuated by silences, often in the middle of words whenever you make a "[st]op" sound, so much so that separating words is equivalent to the whole speech understanding problem.

We can turn words into subwords and mash them together with subwords to make words. (e.g. "Fourthmeal", "Juneteenth", "Nihilego")

Also there are many cases where you can replace a phrase with a word or a word with a phrase. Putting 'word' at the center of a model means the system is going to be in trouble with linguistic phenomena that happen 30% of the time.


To expand on the input/action thing, I guess you have to deal with the issue that opposites end up with similar representations in many embedding schemes.

That leads to parsing, which is... OK a lot of the time and completely wrong sometimes, and it's difficult to know which is which.

I think that's one of the big problems in NLP still.


https://arxiv.org/pdf/1805.12177.pdf

I only read the abstract.


Not sure what you think this shows, but there are a lot of reasons why the results they show don't really matter much - or at least might reflect approximately the accuracy a human would also achieve.

Their headline claim is a "1 pixel change reduces accuracy by 30%". The test process for that number is this:

> We choose a random square within the original image and resize the square to be 224x224. The size and location of the square are chosen randomly according to the distribution described in (Szegedy et al., 2015). We then shift that square by one pixel diagonally to create a second image that differs from the first one by translation by a single pixel.

So... they are taking a random square, downsampling to 224, moving and then predicting on that subset of the original image, and measuring the performance against the accuracy of the original prediction.

What this seems to show is that "CNNs aren't as accurate at making predictions on subsets of an image as on the whole image". This is of course to be expected, and is exactly how a human would perform.

There are a bunch of other criticisms too.

Read the ICLR (for which it was rejected) reviews: https://openreview.net/forum?id=HJxYwiC5tm


But then again, how does something like the following work?

https://twitter.com/GoogleAI/status/1293970520753369088

Any idea how it could be fooled?


What do you mean by "fooled"?

I'm very familiar with the BlazeFace and FaceMesh models (which are related to this in Google's MediaPipe framework).

They have weaknesses - they aren't designed for running upside down, for example, so if they get data oriented that way they will tend to fail.

They aren't designed for "liveness" detection, so you can show printed pictures of a face and it will detect them.

But they give you confidence scores etc, so if you give it something like a caricature of a face it will return reasonable confidence numbers indicating it isn't as sure of its predictions.



Most self-driving companies use simulations to see how the model performs in unseen scenarios.


I'm sure that's tremendously helpful but isn't exactly an antidote, is it? You can't really simulate an unforeseen situation. If you can simulate it accurately - then you've foreseen it, and are thus able to account for all relevant variables. If you are randomly generating scenarios, something like fuzz testing or property testing, then I am sure you will discover bugs before they hit production, but you can't be sure you're simulating it accurately.

For example, maybe when your car is T-boned while climbing a steep hill, the suspension behaves in a way you didn't expect and which isn't replicated by your simulation. Or maybe you're sent a batch of decals with bad adhesive, and in the hot sun they begin to slip down the windshield until they end up obscuring or otherwise interfering with a sensor.

The only simulation that can account for every variable, regardless of whether you've anticipated it, is reality.


I'm happy a VC is providing these insights. If the economics of AI don't make sense by helping increase profits or cutting costs... it's going to be a long road to reaching the "promised land."

I'm biased but in a lot of industries synthetic data has the potential to balance the costs from the perspective of data acquisition and preparation as well as model testing.

This article doesn't focus too much on the edge side of things but one pattern I'm seeing is that edge deployment can be notoriously resource intensive and time consuming.


One way to deal with AI/ML shortcomings I've seen is to require human intervention for edge cases, such as a support chatbot that transfers the customer to a human rep if it can't understand the issue. Human intervention isn't mentioned in the article, but maybe they'd put that under "narrow the problem," or they may not consider it a solution since human involvement eats into margins.
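A minimal sketch of that handoff pattern, assuming any intent model that returns a confidence score (the interface and the threshold below are placeholders, not from the article):

    CONFIDENCE_FLOOR = 0.75  # below this, a human takes over

    def handle_message(message: str, intent_model) -> str:
        intent, confidence = intent_model.predict(message)  # hypothetical interface
        if confidence < CONFIDENCE_FLOOR:
            return escalate_to_human(message)   # eats into margins, keeps the customer
        return run_automated_flow(intent, message)

    def escalate_to_human(message: str) -> str:
        # In practice this would open a ticket or route to a live-chat queue.
        return "Transferring you to a human agent."

    def run_automated_flow(intent: str, message: str) -> str:
        return f"Automated response for intent '{intent}'."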

I believe all software companies will be AI companies in <5 years. By then, not having AI/ML would be like not having a database today. There will be no choice but to deal with the long tail, and the competitive advantage will go to the company that does it better. That makes this advice all the more timely and important, and it also means opportunities for startups to innovate in this space. Eg, better model optimization, low-cost operations without regressing to colocated GPUs, etc.

"The critical design element is that each model addresses a global slice of data... There is no substitute, it turns out, for deep domain expertise." Totally true for marketing as well. Much more effective to define audience segments and tailor the messaging and marketing for each.


https://yts.mx/movies/robot-frank-2012 shows subtle issues related to AI in real life.


Indubitably



