Andreessen-Horowitz craps on “AI” startups from a great height (scottlocklin.wordpress.com)
695 points by dostoevsky 42 days ago | 244 comments

"Huge compute bills" usually come from training, or to be more precise, hyperparameter search that's required before you find a model that works well. You could also fail to find such a model, but that's another discussion.

So yeah, you could spend one or two FTE salaries' (or one deep learning PhD's) worth of cash on finding such models for your startup if you insist on helping Jeff Bezos to wipe his tears with crisp hundred dollar bills. That's if you know what you're doing of course. Literally unlimited amounts could be spent if you don't. Or you could do the same for a fraction of the cost by stuffing a rack in your office with consumer grade 2080ti's. Just don't call it a "datacenter" or NVIDIA will have a stroke. Is that too much money? Not in most typical cases, I'd think. If the competitive advantage of what you're doing with DL does not offset the cost of 2 meatspace FTEs, you're doing it wrong.

That, once again, assumes that you know what you're doing, and aren't doing deep learning for the sake of deep learning.

Also, if your startup is venture funded, AWS will give you $100K in credit, hoping that you waste it by misconfiguring your instances and not paying attention to their extremely opaque billing (which is what most of their startup customers proceed to do pretty much straight away). If you do not make these mistakes, that $100K will last for some time, after which you could build out the aforementioned rack full of 2080ti's on prem.

I find it fun how the cost of the cloud is forcing people to consider what absolutely must run in the cloud (presumably for stability and compliance reasons) and what can be brought back on-prem.

We don't train ML models, but we are in a similar boat regarding cloud compute costs. Building our solutions for our clients is a compute-heavy task which is getting expensive in the cloud. We are considering options such as building commodity Threadripper rigs, throwing them in various developers' (home) offices, installing a VPN client on each and then attaching them as build agents to our AWS-hosted Jenkins instance. In this configuration we could drop down to a t3a.micro for Jenkins and still see much faster builds. The reduction in iteration time over a month would easily pay for the new hardware. An obvious next step up from this is to do proper colocation, but I am of a mindset that if I have to start racking servers I am bringing 100% of our infrastructure out of the cloud.

If I worked from home and my employer asked me to install a server in my home, I would tell them to go fuck themselves.

It's noisy, it takes up space, and presumably I'm on call to fix it if it breaks.

You should pay them an extra 24x(PSU wattage)x(peak $/Wh in area) per day for the electricity too.
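A rough sketch of that compensation formula, for anyone who wants to plug in numbers. It assumes the machine draws its full PSU rating around the clock (a worst case) and takes the price per kWh rather than per Wh; the 1000 W and $0.30/kWh figures are made-up illustration values, not from the thread.

```python
def daily_electricity_cost(psu_watts: float, price_per_kwh: float,
                           hours: float = 24.0) -> float:
    """Worst-case daily cost: assumes full PSU draw for every hour."""
    kwh_used = psu_watts / 1000.0 * hours  # watts -> kilowatt-hours
    return kwh_used * price_per_kwh


# Hypothetical inputs: a 1000 W PSU at a peak rate of $0.30 per kWh.
cost = daily_electricity_cost(psu_watts=1000, price_per_kwh=0.30)
print(f"${cost:.2f} per day")  # $7.20 per day
```

Real draw is usually well below the PSU rating, so a metered figure (from a UPS or wall meter) would give a fairer number than this upper bound.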

I'm alarmed that someone in your company felt this idea was appropriate enough to propose.

We would certainly compensate employees. I didn't feel it appropriate to disclose every last detail regarding the arrangement in a thread which is only tangentially-related to the OP.

This was my idea. I am a developer in my company. We are a flat structure. We have a lot of respect for each other. I am on a standup with the CEO every day. We all believe in our product and would happily participate in whatever activity brings it to market more quickly. We do not hire or retain the kind of talent that would flatly refuse to participate in experimental projects like this. At least not without some sort of initial conversation about why it's not a good fit for a particular individual.

I certainly see how someone might share your perspective. I used to work for a soulless megacorp and I could have easily found myself telling my former employer to "go fuck themselves" if a proposal similar to this was imposed upon me.

Try the value data center providers like he.net.

A 42u rack and 1 Gbps connection is $400 per month.

Put cheap supermicro EPYC servers in rather than threadrippers (or build your own). High capacity RDIMMs are cheaper than UDIMMs.

This will give you a much more maintainable solution than workstations at employees' homes connected via an overlay VPN.

There is a maintenance cost for infrastructure that people tend to forget these days.

I think it was funny to presume that the employer wouldn't pay for it. At one of my old roles I had a second laptop I used solely for VMs just because I ran out of space on my primary workstation. A simple fix, orders of magnitude cheaper for a developer compared to more ESXi or cloud hardware.

I guess the ‘go fuck yourselves’ is heavily dependent on how able the company is to do this sort of stuff without involving you.

Do you have an ownership stake in the company?

Tangentially, I once joined a startup where a week after I started the CEO demanded that I run their crawler software on my own personal hardware, on my personal internet connection, 24x7. I (politely) told him to go fuck himself.

I later learned that the CTO had to spend a couple hours talking him out of firing me. I probably should've just quit on the spot anyway.

You can stop at “go f* themselves”, plenty of jobs to choose from.

This is not a new phenomenon. As early as 2009 I worked for a company (ads, but not Google) which outgrew the typical "cloud" cost structure at the time, moved everything to a more traditional datacenter, and saved substantial money even considering the 3 more SREs they had to hire to absorb the increased support needs. AWS charges what the market will bear, and as such it was never designed to make sense for everyone. One needs to re-evaluate on the back of the napkin from time to time.

I once had a borrowed Sun blade server in my home office. The fan in it sounded like an industrial vacuum cleaner. It got moved to a different room and was powered on as little as possible.

Your plan makes sense but be mindful of the acoustics or your devs may grow to hate you.

Excellent point. If we are building these rigs by hand (which is a likely option considering the initial usage context), the cooling solution would probably be a Noctua NH-U14S or similar. I already have one of these in my office attached to a 2950X and it is dead silent. You can definitely hear it when every core is pegged, but it's hardly noticeable over any other arbitrary workstation. The sound is nowhere near as intrusive as something like a blower on a GPU (or god forbid a Sun Microsystems blade).

What you’re saying sounds like a PC custom build enthusiast. I respect that, I like it too.

Please be mindful of the fact that consumer products are not designed for the workstation/server type of load. That is part of why the hardware is cheaper than server hardware. Also, a consumer ISP connection is most likely not as reliable as that of a data center. I work remotely from home and have experienced this many times: bad performance at peak times, or half a day of downtime, can happen without any warning. And account for maintenance: everyone on the team must be able to figure out a problem or deal with getting someone else in to fix it.

I know I sound like a buzzkill. I am writing this with good intentions.

Even if this works right now, it’s not a reliable long-term solution. Maybe instead of dumping a couple of grand on consumer PCs to handle a server’s work, look into building a proper server. Or you could find a datacenter provider to rent their hardware, something that is not as shiny and full of features as AWS.

BTW, the only reason why consumer vacuum cleaners are so loud is because consumers associate loudness with suction power.

"Backpack" style commercial vacuum cleaners have more suction, and are barely audible in comparison.

I stand corrected. It was much louder than a consumer vacuum but my analogy skills are weak.

Your analogy skills were strong, because analogy is rooted in myth, not fact. Achilles did not actually have an Achilles tendon because Achilles did not exist

There does not have to be an incredibly loud functional industrial vacuum cleaner for figuratively everyone to get your analogy, because the Herculean reality of vacuum cleaners is that you cannot clean an Augean stable of Lego on the floor without a lot of noise. If you get my analogy.

To continue the pedantry, I don't think we know for sure that Achilles didn't exist. Troy certainly did, and we have the Iliad to thank for knowing to look for it.

No, we have Schliemann to thank, for digging through it in his lust for anachronistically Indy-Jonesing the archeology. Troy only exists because Schliemann stole the gold. Otherwise it's just some hill in Turkey.

I know the story... Schliemann went looking for Troy because the Iliad told him it existed.

Didn't work on Ararat. Falsus in uno, falsus in omnibus. Anyway, he went to the wrong hill first. But overall yes, myth informs reality sometimes. Shame he didn't find the bones of the giant wooden platypus the Greek ice cream salesmen hid in.

I don't know about backpack style vacuums (the few I've seen seem plenty loud), but the idea that manufacturers deliberately create loud vacuums to make them seem more powerful is ridiculous. There is an entire market segment of vacuums that are marketed as 'quiet', so there is plenty of demand for quiet vacuums. There is simply a trade off between suction / loudness / price, and various models make different trade offs (corresponding to various points in the market).

That some people want quiet vacuums is not proof that some people don't want loud vacuums. Cars and motorcycles are good examples of markets where you see both deliberately quiet and deliberately loud examples.

This doesn't seem any more ridiculous than companies setting a higher price and then always having the item on sale. I think JCPenney famously tried simply lowering their prices and sales went down. I guarantee if you ask the average person "would you rather companies cut the BS and just lower their prices rather than have items on sale in perpetuity" they would of course say lower the prices. But as the anecdote would demonstrate, that isn't how things actually play out.

The point is, when it comes to consumer behavior, I don't think anyone has a clue what to expect. It would not surprise me one bit if vacuum companies make louder vacuums because the consumer thinks they work better.

Is BMW piping in fake "engine noise" through the speakers also "ridiculous" in your opinion?

Once upon a time, a company I worked for convinced Sun Microsystems that it would be cool to provide us with hardware at a discount. The discount hardware was six UltraSPARC E4000 servers, each with 8 CPUs and 8GB of memory, and a Creator3D card in the I/O mezzanine UPA slot.

This company was using them as desktop workstations, in an open office.

One was used as a build host. Often, the shopvac wail of E4000 fans would be cut short by some poor dev going berserk and unplugging the thing when nobody was looking...

> I find it fun how the cost of the cloud is forcing people to consider what absolutely must run in the cloud

Honestly why ever go to the cloud? It seems like a Larry Ellison boondoggle with the absurdly high costs and lock-in. (Ever look at moving your data?)

Running your own metal is cheaper if you actually fund it.

In my experience, teams will rack up thousands in monthly expenses just being parked in a shell on very large On-Demand or Reserved EC2 instances. Basically using them as development boxes without realizing how much they cost.

I've saved a ton of money just giving them dedicated workstations to develop on and then having everyone use a shared EC2 instance to push jobs to a fleet of spot instances for large scale training.

No, also inference is quite expensive. You'll have 100% usage on a $10,000 GPU for 3s per customer image for a decently sized optical flow network. That's 3 hours of compute time for 1 minute of 60fps video.

Now let's say your customer wants to analyze 2 hours = 120 minutes of video and doesn't want to wait more than those 3 hours, then suddenly you need 120 servers with one $10k GPU each to service this one customer within 3 hours of waiting.

Good luck reaching that $1,200,000 customer lifetime value to get a positive ROI on your hardware investment.
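The arithmetic above is easy to check. A minimal sketch using only the figures from the comment (3 s of GPU time per frame, 60 fps, $10,000 per GPU), and assuming one GPU per server and that the work splits perfectly across GPUs:

```python
import math

SECONDS_PER_FRAME = 3    # GPU time per image, as stated in the comment
FPS = 60                 # frames per second of the customer's video
GPU_PRICE_USD = 10_000   # price of one GPU, as stated in the comment


def gpu_hours(video_minutes: float) -> float:
    """GPU-hours needed to process a video of the given length."""
    frames = video_minutes * 60 * FPS
    return frames * SECONDS_PER_FRAME / 3600


def gpus_needed(video_minutes: float, deadline_hours: float) -> int:
    """GPUs required to finish within the deadline (perfect parallelism assumed)."""
    return math.ceil(gpu_hours(video_minutes) / deadline_hours)


print(gpu_hours(1))                           # 3.0 GPU-hours per minute of video
print(gpus_needed(120, 3))                    # 120 GPUs for 2 hours of video in 3 hours
print(gpus_needed(120, 3) * GPU_PRICE_USD)    # 1200000 USD of hardware
```

Relaxing the deadline is the cheap lever here: the same job with a 30-hour deadline needs 12 GPUs instead of 120.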

When I talk about AI, I usually call it "beating the problem to death with cheap computing power". And looking at the average cleverness of AI algorithm training formulas, that seems to be exactly what everyone else is doing, too.

And since I'm being snarky anyway, there's two subdivisions to AI:

supervised learning => remember this

unsupervised learning => approximate this

Both approaches don't put much emphasis on intelligence ;) And both approaches can usually be implemented more efficiently without AI, if you know what you are doing.

Some kinds of inference are expensive, yes, not going to dispute that. But 99.95% of it is actually surprisingly inexpensive. Hell, a lot of useful workloads can be deployed on a cell phone nowadays, and that fraction will increase over time, further reducing inference costs or eliminating them outright (or rather moving them to the consumer).

For the vast majority of people the main expense is creating the combination of a dataset and model that works for their practical problem, with the dataset being the harder (and sometimes more expensive) problem of the two.

The dataset is also their "moat", even though most of them don't realize it, and don't put enough care into that part of the pipeline.

The algorithms that run on cell phones tend to be specially optimized and quality-reduced neural networks. For example, https://arxiv.org/abs/1704.04861

Apple does a pretty good job at that, with no compromise in quality.

I believe that just due to memory constraints, running any high-quality neural network on phones is currently impossible.

State of the art optical flow tracking needs about 10 GB of GPU memory to execute on full HD frames. I wouldn't know of any mainstream phone with that much RAM.

That, BTW, is also the reason why autonomous drones usually downsample the images before AI tracking, which has the nasty side effect of making thin branches, fences, telephone wires, etc. invisible.

> And since I'm being snarky anyway, there's two subdivisions to AI:

> supervised learning => remember this

> unsupervised learning => approximate this

This doesn't make any sense at all.

Both are "remembering" something under some constraint, which forces generalisation.

Supervised learning just "knows" what it is "remembering". Unsupervised learning is just trying to group data into patterns.

> Both approaches don't put much emphasis on intelligence

Seems like most "intelligence" relies a lot on pattern recognition.

> And both approaches can usually be implemented more efficiently without AI, if you know what you are doing.

The evidence is that you are wrong on this for a number of pretty important problems. I don't know much about optical flow, but in the image and text spaces you can't approach the accuracy of neural network approaches with hand crafted features.

I am not sure what you are doing, but can you just compute the similarity between two frames, and analyze only the novel frames?

I.e. I think that in one minute video, 95% of your images do not have new information in them
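A minimal sketch of that idea: keep a frame only when it differs enough from the last frame that was kept. Frames are modeled here as flat lists of grayscale pixel values, and both the mean-absolute-difference metric and the threshold of 10 are illustrative assumptions, not anything from the thread. Whether skipping frames is acceptable at all depends on the workload (optical flow, for one, is defined between consecutive frames).

```python
from typing import Iterable, List, Optional, Sequence

Frame = Sequence[int]  # flattened grayscale pixels in 0..255 (a simplification)


def mean_abs_diff(a: Frame, b: Frame) -> float:
    """Average per-pixel absolute difference between two equally sized frames."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def novel_frames(frames: Iterable[Frame], threshold: float = 10.0) -> List[int]:
    """Return indices of frames that differ from the last kept frame by more than threshold."""
    kept: List[int] = []
    last: Optional[Frame] = None
    for i, frame in enumerate(frames):
        if last is None or mean_abs_diff(frame, last) > threshold:
            kept.append(i)
            last = frame
    return kept


# Tiny made-up "video": frames 1 and 3 are near-duplicates of 0 and 2.
frames = [[0, 0, 0, 0], [1, 0, 0, 1], [200, 200, 0, 0], [201, 199, 0, 0]]
print(novel_frames(frames))  # [0, 2]
```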

"supervised learning => remember this

unsupervised learning => approximate this"

Lol this can't be more wrong lmao. Both areas "remember" and "approximate" things through training. The difference is that unsupervised learning does not have labeled data, thus it has to search for some pattern. Honestly not even computer science graduates would say something like this.

- Or AMD could change their policy of 'never miss an opportunity to miss an opportunity' and offer high-performance OpenCL GPGPU offerings. Then nVidia could have all the stroke they wanted.

- Or Tensorflow/Pytorch could've crapped on OpenCL a little less by releasing a fully functional OpenCL version every time they released a fully functional Cuda version, instead of worshipping Cuda year in and year out.

- Or Google could start selling their TPUv2, if not TPUv3, while they're on the verge of releasing TPUv4.

- Or one of the other big-tech companies (Facebook/Microsoft/Intel) could make and start selling a TPU-equivalent device.

- Or I could finish school and get funded to do all/most of the above ;)

edit: On a more serious note, a cloud/on-prem hybrid is absolutely the right way to go. You should have a 4x 2080 ti rig available 24x7 for every ML engineer. It costs about $6k-8k a piece [0]. Prototype the hell out of your models on on-prem hardware. Then when your setup is in working condition and starts producing good results on small problems, you're ready to do a big computation for final model training. Then you send it to the cloud, for final production run. (Guess what, on a majority of your projects, you might realize, the final production run could be carried out on on-prem itself; you just have to keep it running 24 hours-a-day for a few days or up to a couple weeks.)

[0]: https://l7.curtisnorthcutt.com/the-best-4-gpu-deep-learning-...

As someone who has actually worked on this stuff soup to nuts, it's not as easy as people imagine, because you can't just support some subset of available ops and call it a day. If you want to make OpenCL pie from scratch, you must first make the universe, and support every single stupid thing (among thousands) and even mimic some of the bugs so that models work "the same".

This is hard and time consuming, and this field is hard enough as it is. What makes it even harder is that only NVIDIA has decent, mature tooling. There is some work on ROCm though, so AMD is not _totally_ dead in the water. I'd say they're about 90% dead in the water.

> support every single stupid thing (among thousands) and even mimic some of the bugs so that models work "the same".

Do you need to do the stupid things performantly, though? Because that sounds like a case for skipping microcode shims, and going straight to instructions that trap into a software implementation. Or just running the whole compute-kernel in a driver-side software emulator that then compiles real sub-kernels for each stretch of non-stupid GPGPU instructions, uploads those, and then calls into them from the software emulation at the right moments. Like a profile-guided JIT, but one that can't actually JIT everything, only some things.

I know Tensorflow decided to be CUDA-exclusive for the silly reason that the matrix library they were using (Eigen) only supported CUDA.

I have never recovered from that.

Are you in the Bay area? Would love to chat. Thinking of an idea where your expertise could be very handy. $my-hn-username at gmail.

That article is mostly right, but there's one part that got skimped on that will mess you up big time with about a 20% chance if you run for long enough.

I've been playing with custom-built 2080 Ti workstation for a while: https://www.youtube.com/watch?v=OF3JYEIsjH8

Several issues: 1. the electricity bill is still an issue: I've been paying anywhere between $500 to $1000 per month for this workstation (I always have something to train); 2. anything with a decent memory size (Titan RTX and RTX 8000) costs way too much; 3. once you reach the point of 4-2080Ti-is-not-fast-enough, power management and connectivity setup become a nightmare.

Would love to know other people's opinions on the on-prem setup, especially whether consumer-grade 10GbE is enough connectivity-wise.

10GbE will depend on the workload. In general, I'd assume it's fine because it takes a parallel RAID setup to saturate. Upgrading to 100GbE is pretty unreasonable cost-wise unless you buy network gear from a back alley van dealer.

Although once you reach 4 2080ti, you ought to consider switching to a titanium grade PSU and rewiring if you're in a 100-120 V country. If you're feeling cheap, just steal the phases from two different circuits. Last I looked, most PSUs operate at around 5% lower efficiency on 115 V vs 230 V.

I've encountered some latency issues with allreduce on transformer models due to vocabulary sizes when communication has to cross PCIe lanes. Increasing batch size helps a lot, but low latency and high throughput are universally more helpful in easing these minor concerns (I don't really want to care about my batch size to improve allreduce performance). Hence I'm worried not only about throughput, but also about latency on consumer-grade 10GbE equipment.

> If you're feeling cheap, just steal the phases from two different circuits.

I chuckled, but more seriously, if you can't rewire your house to get a normal 240V circuit, you should not be fucking around with hacks like the above.

>> $500 to $1000 per month

How much is your electricity? I currently run 12 GPUs in my garage pretty much non-stop. 4 GPUs per machine, 3 machines. Each machine draws about 1.2 kW on average (I can tell because each machine is connected through its own rack UPS), which costs 13.2 cents per hour, or $95/mo. Which, IMO, is not bad at all. That's less than $300 per month for 12 GPUs.
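Those figures check out. A quick sketch of the arithmetic, where the $0.11/kWh rate is inferred from the stated 13.2 cents per hour at 1.2 kW (it is not given directly), and a 30-day month of continuous running is assumed:

```python
def monthly_cost(avg_kw: float, price_per_kwh: float,
                 hours_per_day: float = 24, days: int = 30) -> float:
    """Electricity cost of one machine running continuously for a month."""
    return avg_kw * price_per_kwh * hours_per_day * days


# Inferred rate: 13.2 cents/hour divided by 1.2 kW is about $0.11 per kWh.
per_machine = monthly_cost(avg_kw=1.2, price_per_kwh=0.11)
print(f"${per_machine:.2f} per machine")           # $95.04 per machine
print(f"${3 * per_machine:.2f} for three machines")  # $285.12 for three machines
```

The result scales linearly with the rate, so the same three machines at triple the price per kWh would cost roughly three times as much.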

Sorry, it is a 2-month billing cycle. We pay around 30 cents per kWh I think.

As someone hoping to build a world-wide footprint, say 25 to 50 DCs, of servers to deploy to with unmetered bandwidth, what are some alternatives to the usual suspects?

I have come across fly.io, Vultr, Scaleway, Stackpath, Hetzner, and OVH but either they are expensive (in that they charge for bandwidth and uptime) or they do not have a wide enough footprint.

I guess colos are the way to go, but how does one work with colos, allocate servers, deploy to them, ensure security and uptime and so on from a single place, 'cause dealing with them individually might slow down the process? Is there tooling that deals with multi-colo setups, like the tools for multi-cloud such as min.io, k8s, Triton, etc.?

(Hi, I'm from fly.io)

It depends what you need in your datacenters! If you just want servers, and don't care about doing something like anycast, you can find a bunch of local dedicated server providers in a bunch of cities and go to town. But you can't get them all from one provider, really, not with any kind of reasonable budget.

You _could_ buy colo from a place like Equinix in a bunch of cities, and then either use their transit or buy from other transit providers.

But also, unmetered bandwidth isn't a very sustainable service, so I'm curious what you're after? You're usually either going to have to pay for usage, or pay large monthly fixed prices to get reasonable transit connections in each datacenter.

In our case, we're constrained by Anycast. To expand past the 17 usual cities you end up needing to do your own network engineering which we'd rather not do yet.

(thanks mrkurt)

It is anycast that I'm going after. The requirement for unmetered bandwidth (or bandwidth cheaper than AWS et al.) is because the kind of workloads we'd deal with (TURN relays, proxies, tunnels, etc.) get expensive otherwise. For another related workload, per-request pricing gets expensive, again, due to the nature of the workload (to the tune of 100k requests per user per month).

So far, for the former (TURN relays etc), I've found using AWS Global Accelerator and/or GCP's GLB to be the easiest way to do anycast but the bandwidth is slightly on the expensive side. Fly.io matches the pricing in terms of network bandwidth (as promised on the website), so that's a positive but GCP/AWS have a wider footprint. Cloudflare's Magic Transit is another potential solution, but requires an enterprise plan and one needs to bring-your-own-anycast-IP and origin-servers.

For the latter (latency-sensitive workload with ~100k+ reqs / month), Cloudflare Workers (200+ locations minus China) are a great fit though would get expensive once we hit a certain scale. Plus, they're limited to L7 HTTP reqs, only. Whilst, I believe, fly.io can do L4.

Ah interesting! If you're cool talking more about what you're doing with me will you send me an email (kurt@fly.io)?

> As someone hoping to build a world-wide footprint

Does adding an extra 100ms to the response time cost you that much business wise?

As for colos, it depends on scale. If you have 30k servers worldwide, it pays to have someone manage the contracts for you. If not, it pays to go for the painful arseholes like Vodafone, or whoever bought Cable & Wireless's stuff.

As for security, it gets very difficult. You need to make sure that each machine is actually running _what_ you told it, and know if someone has inserted a hypervisor shim between you and your bare metal.

None of that is off the shelf.

Which is why people pay the big boys, so that they can prove chain of custody and have very big locks on the cages.

K8s gives you scheduling and a datastore. For a large globally distributed system it's going to scale like treacle.

For balance, all big cloud providers - aws, gcp, azure, oracle [0] have pretty similar startup plans. Y$$MV

(I'm in full agreement with everything you've written + it's well-phrased and funny. gj!)

[0] that's not a typo - there is such thing as "Oracle cloud"

There’s also the issue that data scientists often want to go running to hyperparameter optimization and neural architecture search. In most cases improving your data pipelines and ensuring the data are clean and efficient will pay off much more quickly.

But manually improving the data pipeline requires an understanding of the problem, whereas doing a hyperparameter optimized architecture search just needs $$$ hardware and no clue on the side of the operator.

Or, to put that another way: if you knew what algorithm the AI would be using to discriminate the signal from the noise in your data, why would you need the AI? Just write that algorithm.

That's not the same thing at all.

You can't hand-build a feature detector as accurate as (say) a ResNet50. Before 2014 people tried to do this with feature detectors like SIFT and HOG. These were patented and made the inventors significant money. If it was still possible to do it then someone would be doing it and making a profit from it.

Hyperparameter search is just optimising the training parameters (things like batch size, or optimiser parameters). This might give another 1% lift on accuracy, but isn't generally a significant factor.
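For concreteness, here is a minimal random-search sketch over the two kinds of parameters mentioned (batch size and an optimiser setting, here the learning rate). The `toy_validation_score` function is a made-up stand-in for "train a model and measure validation accuracy"; its shape, the search ranges, and the trial count are all illustrative assumptions.

```python
import math
import random
from typing import Dict, Tuple


def toy_validation_score(batch_size: int, lr: float) -> float:
    """Hypothetical stand-in for a full training run; peaks at lr=0.01, batch_size=64."""
    return 1.0 - 0.1 * abs(math.log10(lr) + 2) - abs(batch_size - 64) / 1000


def random_search(trials: int, seed: int = 0) -> Tuple[float, Dict[str, float]]:
    """Sample random configurations and keep the best-scoring one."""
    rng = random.Random(seed)
    best_score, best_params = float("-inf"), {}
    for _ in range(trials):
        params = {
            "batch_size": rng.choice([16, 32, 64, 128, 256]),
            "lr": 10 ** rng.uniform(-4, -1),  # log-uniform learning rate
        }
        score = toy_validation_score(**params)  # in practice: one full training run
        if score > best_score:
            best_score, best_params = score, params
    return best_score, best_params


score, params = random_search(trials=50)
print(f"best score {score:.3f} with {params}")
```

Each trial stands for a complete training run, so the compute bill is roughly the trial count times the cost of training once, which is where the search gets expensive.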

> You can't hand-build a feature detector as accurate as (say) a ResNet50.

Yes, you can. If, that is, you can actually understand what the produced model is doing. And, of course, no human can do that, because no human understands the algorithm being employed by the produced model, because it's a really freaking complex algorithm whose optimal formulation really is just a graph of matrix transformations, rather than an imperative procedure that can be described by words like "variable."

This is an important idea to absorb, for the specific case where the AI converges on an optimal algorithm that's actually very simple—because the data has a regular, simple shape—rather than on one that's too complicated for our mortal minds. If you already knew that simple algorithm, then the work you did training an AI just to end up back at that same simple algorithm is wasted effort. An AI can't do better at e.g. being an AND gate, than an actual AND gate can. An AI can't do what wc(1) does better than wc(1) can.

If the data is regular—that is, if a model of its structure can be held fully in a human brain—then jumping immediately to Machine Learning, before trying to just solve the problem with an algorithm, is silly. The only time you should start with ML, is when it's clear that your problem can't be cleanly mapped into the domain of human procedural thinking.

The AI programmers of the 1960s were not wrong to start with Expert Systems (i.e. attempting to write general algorithms) for deduction, and only begrudgingly turn to fuzzy logic later on. Many deduction tasks are algorithmic. If you don't require the context of "common sense", but only need to operate on data types you understand, you can get very far indeed with purely-algorithmic deduction, as e.g. modern RDBMS query planners do. There would be no gain from using ML in RDBMS query planning. It's regular data; the AI's trained model would just be a recapitulation of the query-planning algorithm we already have.

>> But manually improving the data pipeline requires an understanding of the problem

> Or, to put that another way: if you knew what algorithm the AI would be using to discriminate the signal from the noise in your data, why would you need the AI? Just write that algorithm.

My point is that this isn't the same thing at all.

Say your problem is plant detection from mobile phone photos. I can understand everything about plants, and I can manually build a highly optimised data processing pipeline.

But I can't build a feature extractor that outperforms ResNet50. That's the key algorithm.

> If the data is regular—that is, if a model of its structure can be held fully in a human brain—then jumping immediately to Machine Learning, before trying to just solve the problem with an algorithm, is silly.

True, but no one has made that argument. This is specifically about using hyperparameter optimisation vs improving your data pipeline.

> True, but no one has made that argument.

Er, yes, I did, in my original post. The form you quoted was me attempting to be more precise in rephrasing it.

My point—my original point, this whole time—was that applying an advanced “feature extraction” algorithm to a data source whose features are explicitly encoded in a lossless, linearly-recoverable way in the data—what we usually call structured data—is silly.

For example, there’s no point in using ResNet50 to extract the “features” of a formal grammar, like JSON. It’d just be badly simulating a JSON parser.

In fact, there’s pretty much no data structure software engineers use, where ResNet50 would give you more information out than you’d get from just using the ADT interface of the data structure. What features are in a queue? Items and an ordering. What’s in a tree? Items, an ordering, and parent-child relationships. Etc.

The only place where it might make sense to use ML when dealing with structured data is with statistical data structures like Bloom filters. ResNet50 might be able to recover some of the original data out of a Bloom filter, in essence using it as a compressed-sensing tool, or (in the algorithmic CS domain) as a decompressor for a lossy, underdetermined compression codec.


My second point was that, often, it turns out that your data is structured data, even when you didn’t ask for structured data.

Some natural-world datasets are structured!

Example: the standard model of quantum chromodynamics describes a clean digraph of possible spin configurations. You don’t need feature detection when looking at LHC data. The dataset is pre-bucketed, the items pre-tagged, by nature itself.

But more often, what happens is that your data turns out to not be “raw” / primary-source data, but rather a secondary source that was already structured, enriched, and feature-extracted by someone else before you got there.

Scraping social network data? It’s already a graph, and it often already has annotation fields in the JSON graph endpoints describing the relationships between the members. If you don’t just stop and look at the dataset, you might think your feature-extractor is doing something very clever, when actually it’s just finding the explicit pre-chewed “relationship” field and spitting it back out at you.
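To make the "just look at the dataset" point concrete, here is a small sketch that reads an explicit relationship field straight out of a JSON graph payload. The payload shape and field names (`members`, `edges`, `from`, `to`, `relationship`) are hypothetical; real endpoints will differ.

```python
import json

# Hypothetical scraped payload; the field names are made up for illustration.
payload = """
{"members": [{"id": 1, "name": "a"}, {"id": 2, "name": "b"}],
 "edges": [{"from": 1, "to": 2, "relationship": "coworker"}]}
"""

graph = json.loads(payload)

# The "feature" is already in the data: no learned extractor needed.
relationships = {(e["from"], e["to"]): e["relationship"] for e in graph["edges"]}
print(relationships)  # {(1, 2): 'coworker'}
```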


You might not see the relation to a kD-sample-matrix feature-extractor like ResNet50, so here’s some more tightly-analogous examples:

• What if the images in your training dataset turn out to be in Fireworks PNG format, where the raster data contains an embedding of the original vector image it was rendered from? Specializing your feature-extractor to this data is just going to make it learn to find those vectors (and extract features from those), rather than depending on the features in the raster data; and then it’ll fail on images without embedded vector descriptions. And if that’s all you want, why not just use a PNG parser to pull out the vectors?

• What if your audio files turn out to all have been MIDI files rendered out from a certain synthesizer using its default set of instrument patches? Will feature-extraction on this rendered data beat just writing a program to exact-match and decode the instruments back to a MIDI description? Certainly there might be MIDI-level features you want to extract, but will ResNet50 be better at extracting those MIDI-level features for having seen the rendering, as opposed to having been fed the decoded MIDI-level data directly?
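To make the "why not just use a PNG parser" point concrete, here is a rough sketch that walks the chunk list of a PNG and reports private ancillary chunks, which is where editor-specific data would normally sit. That the embedded vector data lives in such chunks is an assumption about the file layout, and the chunk type `exVc` and the demo file are made up for the example.

```python
import struct
import zlib

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def iter_png_chunks(data):
    """Yield (chunk_type, payload) for each chunk in a PNG byte string."""
    if data[:8] != PNG_SIGNATURE:
        raise ValueError("not a PNG file")
    offset = 8
    while offset < len(data):
        # Each chunk: 4-byte big-endian length, 4-byte type, payload, 4-byte CRC.
        length, ctype = struct.unpack(">I4s", data[offset:offset + 8])
        payload = data[offset + 8:offset + 8 + length]
        yield ctype.decode("ascii"), payload
        offset += 12 + length


def private_ancillary_chunks(data):
    """Chunk types whose first two letters are lowercase (ancillary + private)."""
    return [t for t, _ in iter_png_chunks(data) if t[0].islower() and t[1].islower()]


def make_chunk(ctype, payload):
    """Build one PNG chunk (used here only to fabricate a demo file)."""
    crc = zlib.crc32(ctype + payload) & 0xFFFFFFFF
    return struct.pack(">I", len(payload)) + ctype + payload + struct.pack(">I", crc)


# Hypothetical 1x1 grayscale image with one made-up private chunk.
demo_png = (
    PNG_SIGNATURE
    + make_chunk(b"IHDR", struct.pack(">IIBBBBB", 1, 1, 8, 0, 0, 0, 0))
    + make_chunk(b"exVc", b"hypothetical embedded vector data")
    + make_chunk(b"IDAT", zlib.compress(b"\x00\x00"))
    + make_chunk(b"IEND", b"")
)
```

If `private_ancillary_chunks` comes back non-empty for most of a training set, that is a hint the "raw" images are carrying pre-structured data worth reading directly.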

> Er, yes, I did, in my original post. The form you quoted was me attempting to be more precise in rephrasing it.

Ok. No one other than yourself has made that argument.

> Scraping social network data? It’s already a graph, and it often already has annotation fields in the JSON graph endpoints describing the relationships between the members. If you don’t just stop and look at the dataset, you might think your feature-extractor is doing something very clever, when actually it’s just finding the explicit pre-chewed “relationship” field and spitting it back out at you.

I've worked in this exact field, and I've never ever heard of anyone who doesn't do this. I guess someone might.. so if your point is that there are dumb people around then ok.

But that isn't what a feature extractor does! In the graph context a simple feature extractor is something hand coded like the degree of a node, or a more complex learned one is something that maybe does embedding.
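A sketch of the hand-coded kind, with a made-up edge list, just to show how little is going on in the simple case:

```python
from collections import Counter


def degree_features(edges):
    """Map each node to its degree in an undirected graph given as (u, v) pairs."""
    degree = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    return dict(degree)


# Hypothetical friendship graph.
edges = [("ann", "bob"), ("ann", "cat"), ("bob", "cat"), ("cat", "dan")]
features = degree_features(edges)
```

Here `features` maps each member to a single number derived from the graph structure; nothing is read back out of an annotation field.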

If your point is that people should understand their data, then yes that is in data science 101 for good reasons.

"There would be no gain from using ML in RDBMS query planning. It's regular data; the AI's trained model would just be a recapitulation of the query-planning algorithm we already have."

Most of what you wrote seems fine, until I got to this. A query optimizer seems like something that tends to be very opaque, very complex, and in my experience blows up without a good explanation frequently in typical situations. It's also based on a lack of complete data about the problem domain, to the point where an optimal algorithm seems hopeless. I'm not saying an AI approach automatically can be better, but at least you (I) can envision it being better, perhaps less brittle and taking into account dimensions and possibilities a human doesn't. And the non-AI solutions aren't trustworthy in terms of bounded quality so you have not got a lot to lose.

Most (?) RDBMS query planners rely on updating statistics on the data in the table to decide what type of joins to perform.
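As a toy illustration of that (textbook-style cost formulas, not any real planner's model; the function and strategy names are made up):

```python
import math


def choose_join(est_outer_rows, est_inner_rows):
    """Pick the cheaper join strategy from *estimated* row counts."""
    costs = {
        # One index lookup on the inner table per outer row.
        "index_nested_loop": est_outer_rows * math.log2(max(est_inner_rows, 2)),
        # Build a hash table on one side, probe with the other.
        "hash_join": est_outer_rows + est_inner_rows,
    }
    return min(costs, key=costs.get)


small_outer = choose_join(10, 1_000_000)
large_outer = choose_join(1_000_000, 1_000_000)
```

The choice flips purely on the estimates, so if the statistics say 10 rows and the table really has a million, the planner confidently picks the slow plan. That is one plausible source of the "finishes in 15 minutes or a week" experience.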

I can imagine cases (distributed databases, different speed storage) where it would make sense to test the queries and learn which optimisations make sense. It'd be self tuning and able to adapt to changing hardware.

Oracle certainly has statistics and incredibly sophisticated optimization in theory, it's just that in practice, I think it sucks.

I spent way too many years writing ad hoc Oracle SQL that had to complete within a few hours and trying to guess if the optimizer had decided to finish in 15 minutes or a week.

And I would read Tom Kyte where he says obviously your database is set up wrong if the optimizer isn't working for you. And how you should do everything in one big beautiful query that uses all the latest features of Oracle.

Exactly :)

In most cases, unsupervised learning is nothing more than having the AI try to approximate the solution of your highly non-linear loss function. So if there's any way of solving that loss function directly, it will perform like a well-trained AI.

> In most cases

In what cases is this not true?

Of course, you can also automatically search the best data pipeline for your data...

> Just don't call it a "datacenter" or NVIDIA will have a stroke.

Context please :) ?

NVIDIA forces you to buy significantly more expensive cards that perform marginally better if you are using them for datacenter use. They try to enforce not letting businesses use consumer grade gaming cards. I assume this is so cloud providers don't buy up all the supply of graphics cards and make it hard for gamers to get decent cards, like what happened during the bitcoin craze.

No it's just pure price discrimination. They don't care about gamers they just know businesses will pay more if forced to while gamers can't.

I wouldn't say they don't care about gamers, considering that gaming makes up about half of their revenue: https://www.anandtech.com/show/15513/nvidia-releases-q4-fy20...

Sure, sorry, I realize now the point I was trying to make doesn't match my wording. They do care about selling to gamers, but availability to gamers is not in any way why they are forcing more expensive models of essentially the same cards on the HPC and server market. It's all because they can, and the business market is able to bear the cost. Now if they found a true competitor in that market, I think this pricing model would fall apart fast.


Just a guess but maybe it's some licensing issue? https://www.nvidia.com/en-us/drivers/geforce-license/

No Datacenter Deployment. The SOFTWARE is not licensed for datacenter deployment, except that blockchain processing in a datacenter is permitted.

> except that blockchain processing in a datacenter is permitted

Well, Nvidia, y'see, my new blockchain does AI training as its Proof-of-Work step...

well they are the one writing the rules, so i'd side with OP

Datacenter GPUs are mostly identical to the much cheaper consumer versions. The only thing preventing you from running a datacenter with consumer hardware is the licensing agreement you accept.

"The only thing preventing you from running a datacenter with consumer hardware is the licensing agreement you accept."

The consumer cards don't use ECC and memory errors are a common issue (GDDR6 running at the edge of its capabilities). In a gaming situation that means a polygon might be wrong, a visual glitch occurs, a texture isn't rendered right -- things that just don't matter. For scientific purposes that same glitch could be catastrophic.

The "datacenter" cards offer significantly higher performance for some cases (tensor cores, double precision), are designed for data center use, are much more scalable, etc. They also come with over double the memory (which is one of the primary limitations forcing scale outs).

Going with the consumer cards is one of those things that might be Pyrrhic. If funds are super low and you want to just get started, sure, but any implication that the only difference is a license is simply incorrect.

> For scientific purposes that same glitch could be catastrophic.

But for machine learning, some say that stochasticity improves convergence times!

Could you cite an example where ECC requirement on GPU was real and demonstrated to be needed? In practice, I don't know anyone who'd willfully take 10-15% perf hit on GPUs, because of a cosmic ray.

The thermal design for "datacenter" card can be better for sure. And on-board memory size and design. That's about it. For how many x over geforce price tag is that?

"In practice, I don't know anyone who'd willfully take 10-15% perf hit on GPUs, because of a cosmic ray."

Virtually every server in data centers runs on ECC: the notion of not using it is simply absurd. And given that the Tesla V100 gets 900GB/s of memory bandwidth with ECC, versus 616GB/s of memory bandwidth on the 2080Ti without ECC, it's a strawman to begin with.

nvidia further states that there is zero performance penalty for ECC.

As to whether the requirement is "real", Google did an analysis where they found their ECC memory corrected a bit error every 14 to 40 hours per gigabit.
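Taking that figure at face value and applying it to an 11 GB card like the 2080 Ti, the arithmetic looks like this. It assumes the server-DRAM rate carries over to GDDR6, which nothing here establishes, so treat it as an order-of-magnitude sketch only.

```python
def errors_per_day(memory_gigabytes, hours_per_error_per_gigabit):
    """Expected bit errors per day if each gigabit errs once every N hours."""
    gigabits = memory_gigabytes * 8
    return gigabits * 24 / hours_per_error_per_gigabit


low = errors_per_day(11, 40)    # optimistic end of the quoted range
high = errors_per_day(11, 14)   # pessimistic end of the quoted range
```

That comes out to roughly 53 to 151 bit errors per day on a card with no way to detect or correct them.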

"That's about it."

Also ECC memory. Also dramatically higher double precision performance. Dramatically higher tensor performance. Aside from all of that...that's it.

Learned a new word today. Pyrrhic.

And the cooling, amount of ram and the doubles performance.

the chip might be the same, but the rest of it isn't

Granted, it's not worth the $3k price bump, but that's a different issue.

Nah, that's not really it. The reason NVIDIA doesn't allow this is precisely because the additional RAM - functionally the only difference - is not cost efficient. People would like to (and did) use a bunch of consumer 1080s, which is why NVIDIA disallowed precisely that. You had to buy the equivalent pro grade card, which costs easily two or three times that and offers a couple more GB of RAM.

Yep, if you're getting huge bills you should be doing on-prem HPC, e.g. where a 15k budget means 15 kW per container and you're into exotic network designs where 10G won't cut it any more.

eg from 2011 6400 Hadoop nodes like http://bradhedlund.com/2011/11/05/hadoop-network-design-chal...

God only knows what fun you could get up to with modern tech - I miss bleeding edge rnd

I was training ML models on AWS / Google Colab. After racking up a few hundred dollars on AWS I bought a Titan RTX (I also play video games, so it does that very well also).

> Also, if your startup is venture funded, AWS will give you $100K in credit

AFAIK that is limited to <$20k and it expires.

We got $100k but boy oh boy once you’re on it it’s hard to get off. Now we’re close to 50-50 gcloud/aws

Hah! "First hit is free, man!" Makes sense, though. Anybody offering a "free" $100k of anything can afford a lot of MBA time to make sure that the CLTV is well over costs.

Inference is also becoming a bigger contributor to compute bills, especially as models get bigger. With big models like GPT-2, it's not unheard of for teams to scale up to hundreds of GPU instances to handle a surprisingly small number of concurrent users. Things can get expensive pretty quick.

Slow clap

> most people haven’t figured out that ML oriented processes almost never scale like a simpler application would. You will be confronted with the same problem as using SAP; there is a ton of work done up front; all of it custom. I’ll go out on a limb and assert that most of the up front data pipelining and organizational changes which allow for [ML to be used operationally by an org] are probably more valuable than the actual machine learning piece.

Strong agreement from me: I've never worked on deploying ML models, but have worked on deploying operations-research type automated decision systems that have somewhat similar data requirements. Most of the work is client org specific in terms of setting up the human & machine processes to define a data pipeline to provide input and consume output of the clever little black box. A lot of this is super idiosyncratic & non repeatable between different client deployments.

That's because ML and operations-research problems can be reduced to a set of optimization problems, and the underlying math and statistics are all very similar, if not identical in some cases.

And the input matters, a lot. So the differentiating factor isn't the models, it's the data and companies like Google figured it out a long time ago.

In short, find interesting problems, then the solutions -- not the other way around.

"The data" means more than pure computer science people want to admit. In any "advanced" application, that means annotators. Radiologists drawing circles around cancer, attorneys labeling contract clauses as unacceptable, drivers labeling stop signs, etc.

ML is a mining problem. Digitizers are the miners. Annotators are the refiners.

Basically, the system is massively ad-hoc and driven by this large scale annotation, training and testing.

The big question here is, what happens when the world changes next year? You rebuild the application. I know there are companies that advertise doing continuous updating of deep learning models but it seems like calculating total costs and total benefits is going to be hard here.

Sometimes the mine makes money, sometimes it doesn't make sense to run the mine.

To extend the mining metaphor, and relate back to the original articles:

People and organizations are chasing what they believe, or are told to believe, is pay dirt.

Many unfamiliar investors have rushed in, possibly fearing missing out, and fund many of the prospectors, yet many of the prospectors and investors aren't really aware of the costs of running a mine, nor the practices required to run them efficiently.

It turns out that there's more aspects to the value creation process than dig/refine/polish (data/train/predict), especially when usefulness in application matters and there are finite resources available for digging.

Companies selling shovels are some of the primary beneficiaries of this, by selling shovels (i.e. renting compute) funded by the malinvestment.

Additional beneficiaries are the refiners (training experts) that are able to charge steep labor premiums, however organizations are starting to figure out that their refiners are expensive to keep idle and often operate the mines poorly in terms of throughput/cost-effectiveness/repeatability/application (see the various threads on "Data Engineers")

This is correct, however, the distinction between labeling and training is artificial, and probably arises from the fact that ML came from academia, where it was not part of the business process.

I.e. a modern ML system should just plug into the business process from day 0, where the ML task should be performed by human and recorded by the machine.

After a while, the machine would train on this recorded data, and start replacing the humans.

Rinse and repeat.
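A bare-bones sketch of that loop, where the "model" is nothing more than a lookup of the most common recorded human answer. The class name, the threshold, and the keying scheme are all made up; a real system would train an actual model and keep humans in the loop for checking.

```python
from collections import Counter, defaultdict


class ShadowedTask:
    """Route a task to a human until enough decisions are recorded, then to the machine."""

    def __init__(self, min_examples=3):
        self.min_examples = min_examples
        self.log = defaultdict(Counter)  # input key -> counts of human answers

    def decide(self, key, ask_human):
        seen = self.log[key]
        if sum(seen.values()) >= self.min_examples:
            # Stand-in "model": the most common recorded human answer.
            return seen.most_common(1)[0][0], "machine"
        answer = ask_human(key)
        seen[answer] += 1
        return answer, "human"


human_calls = []


def human(key):
    human_calls.append(key)
    return "approve"


task = ShadowedTask(min_examples=2)
results = [task.decide("invoice<100", human) for _ in range(3)]
```

The first two decisions go to the human and get recorded; the third is answered from the log without asking anyone.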

> a modern ML system should just plug into the business process from day 0, where the ML task should be performed by human and recorded by the machine.

Ah, this is a typical thing I hear people in the Valley say: just push it all ... somewhere. No.

If we digitized all microscopy slides, it would require YouTube-scale storage several times over. People think genomics is big. People think reconnaissance imaging is big. They're big, but there's only so much of them.

IF it were digitized, there would be far more pathology whole slide imaging being generated every day. I did some estimates at one point and had to throw a couple orders of magnitude into the genomics data to even make it competitive at enterprise scale.

And keep in mind, we're talking clinical medicine. We want the data now. We're looking at the slides while the glue is still wet. You don't have the bandwidth, no one has the bandwidth, to do some of this stuff the way you propose and maintain the current "business process" of clinical medicine.

Building models and iterating, the old fashioned way, is the only way it makes sense.

Funny, we all thought computers were fast. Turns out it's nowhere near what we need.

They're fast, sure. But not very efficient in certain problem domains, specifically where humans are efficient (for reasons that are IMHO historical, not innate).

This is spot on. Hence the open sourcing of ML code while keeping an iron grip on data.

> And the input matters, a lot. So the differentiating factor isn't the models, it's the data and companies like Google figured it out a long time ago.

The models are likely also a differentiating factor in the sense that there are models that perform much better than others, to the point of completely new functionality. But also all of these models are basically open source currently... So they can't by definition be differentiating between different companies, because all of the companies generally have access to all of the algorithms. At least to all of the types of algorithms.

I just spent $50K on coloc hardware. I'm taking a $10K/mo Azure spend down to a $1K/mo hosting cost.

But the real kicker is that I get x5 the cores, x20 RAM, x10 storage, and a couple of GPUs. I'm running last-generation Infiniband (56gb/sec) and modern U.2 SSDs (say 500MB/sec per device).

I figure it is going to take me about $10K in labor to move and then $1K/mo to maintain and pay for services that are bundled in the cloud. And because I have all this dedicated hardware, I don't have to mess around with docker/k8s/etc.

It's not really a big data problem but it shows the ROI on owning your own hardware. If you need 100 servers for one day per month, the cloud is amazing. But I do a bunch of resampling, simple models, and interactive BI type stuff, so co-loc wins easily.
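Back-of-envelope payback on those numbers, as a sketch. It treats the figures above as exact and ignores depreciation, resale value, and financing, all of which are simplifying assumptions.

```python
import math


def payback_months(upfront, old_monthly, new_monthly):
    """Months until cumulative savings cover the upfront spend."""
    monthly_saving = old_monthly - new_monthly
    if monthly_saving <= 0:
        return None  # never pays back
    return math.ceil(upfront / monthly_saving)


upfront = 50_000 + 10_000      # hardware + migration labor
new_monthly = 1_000 + 1_000    # hosting + maintenance / previously bundled services
months = payback_months(upfront, old_monthly=10_000, new_monthly=new_monthly)
```

With $60K up front and $8K/mo saved, the move pays for itself in 8 months.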

Yes, it's quite obvious when you actually have compute needs. At my current employer, we spent about 100k to build a small single-purpose HPC. One year later, I calculated that the Azure costs (to help bargain for more servers) would have been around 1.5m. This is almost 24/7 use though, and add another ~150k in electricity.

For my own company we built out at two regionally distinct colo facilities. That worked really well and operations was efficient and costs were moderate, clearly tied to CAPEX increments which were predictable.

Recent projects have been on AWS. For a project that is roughly on the scale of our colo in terms of instances, though with aggregate lower performance, we are buying one of our colos every year. It’s insane. Network costs are particularly egregious in AWS.

But there is absolutely no way we’d be permitted to build colo facilities, for many reasons, and even if we could get permission there are many reasons why we would choose not to, due to the resulting death by a thousand cuts orchestrated by the team who happens to have inserted themselves as the owner for DC/colo-like things.

Yes, cloud costs are the cost of having poor internal management, such that inefficiency and incompetence reigns unchallenged. The enormous cost differential is unfortunately borne by the unit doing the work, rather than the one preventing it from being done efficiently.

That is a very accurate summary.

the point of Cloud is that it solves the problem of variable demand.

I used to run on-prem back in the 2000's, and we were constantly dealing with demand fluctuation crises. Spinning up new physical servers to deal with new demand, or being massively over-specced when demand dropped, was a real pain.

I'm starting a new thing this week, and using the Cloud for it because I have no idea what our demand will be. I can start small, scale up with our customer growth, and never have to worry about ordering new servers a month in advance so I have enough capacity when (or if) I need it.

At some point in the future, when our needs are clear and relatively stable, it might make sense to migrate to on-prem and save those costs.

I half-agree. The Cloud specifically solves the problem of _highly_ variable demand.

If your peak demand is 100x your baseline and only happens for ~1h each day, cloud is almost certainly a good choice. If it happens for ~12h a day or it's only 5x your baseline, the cost of the cloud is such that you're likely to save with dedicated hardware, even though much of your hardware sits around doing nothing part of the time.
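Here's a crude model of that trade-off. The 5x per-unit cloud premium is a made-up number, and the model assumes you pay cloud prices only for what runs while dedicated hardware is sized for peak around the clock. The small-peak case in particular is sensitive to the premium you plug in.

```python
def daily_cost(baseline, peak, peak_hours, cloud_premium=5.0):
    """Return (cloud, dedicated) daily cost in 'dedicated unit-hours'."""
    # Cloud: pay only for what runs, but at a premium per unit-hour.
    cloud_units = baseline * 24 + (peak - baseline) * peak_hours
    cloud = cloud_units * cloud_premium
    # Dedicated: provision for peak, 24 hours a day.
    dedicated = peak * 24
    return cloud, dedicated


def cheaper(baseline, peak, peak_hours, cloud_premium=5.0):
    cloud, dedicated = daily_cost(baseline, peak, peak_hours, cloud_premium)
    return "cloud" if cloud < dedicated else "dedicated"


spiky = cheaper(baseline=1, peak=100, peak_hours=1)      # 100x for 1h/day
sustained = cheaper(baseline=1, peak=100, peak_hours=12)  # 100x for 12h/day
mild = cheaper(baseline=1, peak=5, peak_hours=1)          # 5x for 1h/day
```

Under these assumptions only the short, tall spike favors the cloud; the sustained and mild cases both come out cheaper on dedicated hardware.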

> never have to worry about ordering new servers a month in advance so I have enough capacity when (or if) I need it.

There is a middle-ground that's very much worth considering: renting dedicated servers. It's not quite as cost-effective as colocation and owning your hardware when you have at least a cabinet worth of stuff but it does offload the management of the hardware and provisioning to somebody else. They can also usually be provisioned in a matter of minutes.

In some cases (e.g. Packet.net) these machines can even be treated essentially like cloud instances, with hourly pricing.

There's also yet another middle ground: using dedicated to handle the known and predictable baseline traffic and using the cloud to handle the unexpected bursts.

Thanks for the pointer to packet.net. This looks like it will scratch a current itch.

I did similar at my current and last job. Rather than spend $24k/month, I spent $50k, bought a shitton of hardware, built a virtualization cluster at Corp, and upgraded our connections. Accounting thought I was a wizard.

Especially as they can amortize that cost in the annual accounts - there might even be RnD tax credits they can use

Yea we’re seeing this all over the place at Lambda (https://lambdalabs.com). Most people running consistent GPU training or inference jobs are building on-prem clusters or even groups of workstations.

It just doesn’t make financial sense to use the big cloud service providers for those with consistent workloads. I always hear stories where folks have saved hundreds of thousands in infrastructure costs with owning + co-lo.

I agree with this, and I think lambdalabs is quite precisely positioned for on-prem training.

As an aside, thank you for your one-line installer script for tf/keras. Earlier, my team used to spend days figuring out the CUDA/tf/keras/CUDNN etc dependency charts, and you've brought that down to ~0.

+1 for the one line install

We never went cloud, except for ancillary things like build machines, nagios etc that run on tiny VMs. Whenever I looked at the economics I could buy a server of the class we needed for roughly 2x the monthly rent for the equivalent from Amazon.

This whole topic recapitulates all the arguments for business units acquiring and operating their own servers versus continuing to suffer the internal bill-backs from the corporate data center.

Some of the same caveats apply with respect to software updates, configuration control, security, availability, business continuity, disaster recovery, and what happens if the local admin is hit by a bus.

Exactly. These examples are mostly apples-to-oranges comparisons. I have worked over 20 years in ops and it is really hard to do cheaper than AWS _in the long run_. If you are unlucky and bought a batch of SSDs that turn out faulty exactly 1 month after the warranty expires, or you have downtime because of other low-level reasons that AWS shields you from, your co-loc cost can quickly go up. I don't even want to go into the networking hoops; dealing with global network vendors is a whole different problem. If you can be sure you never run into these, or your business is resilient to these sorts of problems, or you have a dedicated, highly skilled team (like Dropbox), then co-loc might be a good idea. Otherwise it is pretty damn hard.

I'm sure you're right for your case. But I'd add one caveat for those less experienced: if you own the hardware, you need to be prepared to go to the colo when something breaks. The various clouds are a much nicer experience when hardware fails. At the very least people should have enough spare capacity that a hardware failure means going sometime in the next couple of weeks, rather than getting up at 3 am and fixing things under pressure.

Or take the middle road and just rent the hardware (aka dedicated hosting). You pay more than colo but still way less than cloud, get the same level of hardware support as a cloud provider but the same performance as colo.

Yep, for example Hetzner offers bare metal servers as well as cloud instances at a very reasonable price. (Not affiliated in any way. Just a happy customer.)

Prior to the cloud, I ran at a coloc facility for 15 years. I break stuff much more often than having it actually fail. So... make yourself robust against human error first and you'll probably cover the hardware side as a side-effect. I am more likely to hose a machine during an OS upgrade and not have time to recover than I am to have an SSD fail.

But spare capacity is a good idea, especially if you have real-time traffic.

Operations teams deal with both. You design your system with enough spare capacity that you can live somewhat degraded for a time - you must if only due to the lead time. Software failures are far far far more common than hardware failures so once you combine these, the occasional midnight trip to the colo is both rare and oddly satisfying for hero types.

I would have assumed the colo provider would offer Remote Hands, so you’d only need to send replacement hardware.

That’s how the DC I used to work in operated.

If you have enough spare capacity and the problem is pretty mundane, sure, that can work. But if not, then it's off to the colo while the rest of the company freaks out.

Network redundancy, electricity redundancy, bandwidth included? Otherwise, it is a bit of apples to oranges. What about firewalls? I mean you could ignore all that and say you only need raw computing power. On the k8s note, nobody is forcing you to use k8s on the top of Azure.

Now do the calculation for ongoing operations for 5 years, taking into consideration normal hardware failure and maintenance cost. You need to swap out old hardware to get a new CPU, etc. I have tried to use co-loc vs cloud for ~100 nodes and cloud won, by 30%.

What colo company did you use?

In Austin, DataFoundry is by far the best. It was overkill for me and I went with something off the beaten path, but they have an amazing facility.

I wound up at a facility run by a fiber vendor because they'd sell me a fixed 250mbps pipe for the same price that a data center would sell me 20mbps pipe that bursts to 1gbps. It only works for me because of the nature of my business -- most people would be better off somewhere else.

Choosing a co-loc facility is complicated. My recommendation is to tour and get quotes from 3-5 vendors in your area before choosing anyone. Ideally, take someone who has done it before.

How did you estimate your hardware needs?

I plan to do this in the near future once my GCP credits are used up (18 months of credits left).

My plan is to temporarily shift to dedicated hardware through a service like Hetzner to evaluate what kind of hardware I need. I can simply redirect a fraction of the traffic and extrapolate. Since this is elastic there will be no upfront costs, but I can play around with different sizes. Once I'm happy with my estimate, buy real hardware and move the rest over.

At least that's the plan. I don't think you can do much more than an educated guess and I think this will be as close as I can get.
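The extrapolation step could be as simple as the sketch below. It assumes load scales linearly with traffic, which is a big assumption, and the function name, the 60% headroom target, and the sample numbers are all made up.

```python
import math


def servers_needed(traffic_fraction, test_servers, peak_utilization, target_utilization=0.6):
    """Linearly extrapolate from a partial-traffic trial to full traffic."""
    # Server-equivalents of work at 100% of traffic.
    full_load = test_servers * peak_utilization / traffic_fraction
    # Leave headroom by aiming for the target utilization.
    return math.ceil(full_load / target_utilization)


# E.g. 10% of traffic kept one rented box at 45% utilization at peak.
estimate = servers_needed(traffic_fraction=0.10, test_servers=1, peak_utilization=0.45)
```

That trial would suggest buying 8 machines of the same class. Rerunning it at a couple of different traffic fractions is a cheap check on whether the linear assumption holds.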

Not AI related btw.

Gee... if only there was a service where you could spin up machines on demand. (joke)

I kinda worked backwards from the cost. I ran the business for a year on Azure but each 'sample' of the resample took about 2 mins so it precluded any near real-time analysis. I ported the kernel to a GPU locally using python/numba and it ran in about 10 seconds and that was enough to seal-the-deal.

From there, I spec-ed out a GPU server and then machines that matched each role in my environment. I decided I was willing to spend $50K and just started loading up the machines.


(I don't think people like the gratuitous imagery generated by that last sentence. Much too real.)

Agreed. Totally unnecessary

The number of places where machine learning can be used effectively from both a cost perspective and a return perspective are small. They are usually tremendously large datasets at gigantic companies, and they probably have to build in house expertise because it's hard to package this up into a product and resell it for various industries, datasets, etc.

Certainly something like autonomous driving needs machine learning to function, but again, these are going to be owned by large corporations, and even when a startup is successful, it's really about the layered technology on-top of machine learning that makes it interesting.

It's kind of like what Kelsey Hightower said about Kubernetes. It's interesting and great, but what will really matter is what service you put on top of it, so much so that whether you use Kubernetes becomes irrelevant.

So I think companies that are focusing on a specific problem, providing that value added service, building it through machine learning, can be successful. While just broadly deploying machine learning as a platform in and of itself can be very challenging.

And I think the autonomous driving space is a great example of that. They are building a value added service in a particular vertical, with tremendous investment, progress, and potentially life changing tech down the road. But as a consumer it's really the autonomous driving that is interesting, not whether they are using AI/machine learning to get there.

“The number of places where machine learning can be used effectively from both a cost perspective and a return perspective are small.”

Thankfully transfer learning and super convergence invalidates this claim.

Using pre-trained models + specific training techniques significantly reduces the amount of data you need, your training time and the cost to create near state of the art models.

Both Kaggle and google colab offer free GPU.

>Thankfully transfer learning and super convergence invalidates this claim.

IME it is nowhere near as universally successful as this suggests.

> Both Kaggle and google colab offer free GPU.

I think this sentence invalidates your argument against:

“The number of places where machine learning can be used effectively from both a cost perspective and a return perspective are small.”

In a hobbyist world, free GPU time is an amazing thing, and you can do a lot of fun and rewarding projects using transfer learning and other techniques that avoid heavy engineering and data processing. In a business world, where your product must consistently and accurately perform well, problems that may be solved by ML need to be heavily scrutinized and researched, because for most problems there are cheaper, faster, more robust solutions. Free GPU time doesn't weigh in at this scale.

Sure, if ML/DL is your core business then yeah, it doesn't make sense.

If ML/DL is an add-on to help augment your business (separate from the core value) then yes transfer learning and free GPU's will get you good returns.

How would you explain the rise (and success) of machine learning in science? A lab that uses some learning-based method will likely be limited to just one or two people (responsible for data acquisition, feature engineering, evaluation, etc.) and extremely finite data.

It's not clear there has been any deep impact actually, but there has been a lot of discussion (and grant proposals)

I've seen a lot of cross pollination of ML and AI techniques into various disciplines. A large percentage just didn't work at all, most of the rest were more "kind of interesting, but". Nothing earthshaking happened although pop sci press likes to talk about it a lot.

If you have more digital data than you used to, using modern free frameworks and toolkits to do basic (i.e. older, boring, but understood) ML stuff to understand it seems to have a reasonable return. Mostly I think this is because it becomes accessible to someone without much background in the area, and you can do reasonable things without having to put 6 months of reading and implementing together before starting.

How do you define success? Adoption? Because right now, writing "we will use machine learning to solve X" in a grant proposal is an easy way to increase chances of getting funding.

I'm not sure there is a rise. 'Science' is a huge domain. Machine learning if I had to guess maybe plays a role in < 1% of them, and that may be overstating it.

Also it's doubtful to even categorize machine-learning as science. The goal of science is to generate insight and knowledge, ML solves particular engineering problems or searches problem spaces, it doesn't build fundamental scientific models.

Can you elaborate on what you mean by "A lab that uses some learning-based method will likely be limited to just one or two people (responsible for data acquisition, feature engineering, evaluation, etc.)" ? I know a bunch of labs that apply machine learning to specific tasks, and the parts you list each can easily take up multiple people for years for a single task - not counting data acquisition, because data is definitely not "extremely finite", you need lots of quality data, and improving data is something that always gets improvements and can easily eat up more manpower than you can have budget, no matter what that budget is.

...because previously, the academics would use an army of undergrads to do the same data labeling that ML accomplishes.

(The dis-economy of scale hurts less if you're already starting from a point with the manual labor.)

It's now a lot cheaper than in the 1980s, when I worked at one of the world's leading hydrodynamics orgs.

I briefly looked at using neural nets to analyse data from an experiment - analysing the efficacy of toilet bowl designs.

The entry level hardware was £250k in 1981 - it was much cheaper to take photos and have a research assistant count squares.

Now you could use fairly cheap commodity hardware to do it.

It would have been an amazing cutting edge project if we could have got some government funding; we did have an in-house knowledge engineer.

It's interesting that the industry constantly has to relearn the idea that tech needs follow business needs, not the other way around. As you said, so many teams rushing to containerize, but if the services you run are piles of junk, do your users care about whether kubernetes can scale based on memory instead of cpu? Similarly, many effective "recommendation engines" are just inverted indexes and not fancy ML models, and are a hell of a lot cheaper.
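To make the inverted-index point concrete, here is a minimal sketch of a "customers who bought X also bought Y" recommender with no ML in it. The basket data and the names `build_inverted_index` and `recommend` are hypothetical, made up for illustration; a real system would need more care around scale and ranking.

```python
from collections import Counter, defaultdict

def build_inverted_index(baskets):
    """Map each item to the set of basket ids that contain it."""
    index = defaultdict(set)
    for basket_id, items in baskets.items():
        for item in items:
            index[item].add(basket_id)
    return index

def recommend(item, baskets, index, top_n=3):
    """Rank other items by how often they share a basket with `item`."""
    counts = Counter()
    for basket_id in index.get(item, ()):
        for other in baskets[basket_id]:
            if other != item:
                counts[other] += 1
    # Sort by count descending, then by name, so ties are deterministic.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [name for name, _ in ranked[:top_n]]

# Hypothetical purchase history, purely for illustration.
BASKETS = {
    "b1": {"tent", "stove", "lantern"},
    "b2": {"tent", "stove"},
    "b3": {"tent", "sleeping bag"},
    "b4": {"novel"},
}
INDEX = build_inverted_index(BASKETS)
print(recommend("tent", BASKETS, INDEX))
```

The lookup only touches baskets that actually contain the query item, which is most of why this kind of thing is cheap to run.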

Having briefly worked for an AI company, I agree with the conclusion that AI companies are more like services businesses than software companies. I would add only one other thing: to me going forward there likely won't be "AI companies" - AI exists to power applications. And in my experience, unless the output is truly differentiated, customers aren't willing to spend more for something "powered by AI" - they just expect that software has evolved to provide the kind of insights that AI sometimes delivers.

For an example of a genuine software company vaguely in this ecosystem, consider companies that build the tools that some AI/ML/optimisation systems use as building blocks. Eg optimisation algorithms.

If you need to solve gnarly industrial scale mixed integer combinatorial optimisation problems in the guts of your ML / optimisation engine, the commercial MIP solvers (Gurobi, CPLEX) or non-MIP based alternative combinatorial optimisation systems (LocalSolver) can often give more optimal results in exponentially less running time than free open source alternatives.

1% more optimal solutions might translate into 1% more net profit for the entire org if you've gone whole hog and are trying to systematically profit optimise the entire business, so depending on the scale of the org it might be an easy business case to invest a few million dollars to set this system in place.

Annual server licenses for this commercial MIP solver software were O(100k) / yr per server & the companies that build these products bake a lot of clever tricks from academia into these products that you can exploit by paying the license fee. (My knowledge of pricing is out of date by about 7 years.)

I'm all for linear optimization and other optimization techniques. It's refreshing to see other people talk about Gurobi, CPLEX, etc... Having done research in the field of scheduling and now getting contacted by companies, it's demoralizing to see that everybody usually speaks about machine learning while many problems can be solved in a more precise way with other techniques.

Aren’t software businesses increasingly like service businesses though?

They deliver now often with backend cloud storage, update near continuously, integrate frequently with outside services, sometimes open source major components iteratively, typically have an evolving API and developer ecosystem to educate, and are sold as subscriptions. It’s not as “human in the loop” as some of the AI described in this article but it’s clearly moving toward services in terms of margins.

Nothing is like the old shrink wrapped software business, basically.

Not from what I see - what I see is software companies using services as a way to shorten time-to-value for the customer. They do this either themselves or via professional services firms.

To me, the services you describe are software-as-a-service - they scale well without adding more humans to the mix. Services businesses, in contrast, generally need more humans to do more work.

I do think you are right that we are entering an age where the margin pressures will continue to increase. As the Amazon quote goes "your margin is my opportunity." In that world, strength accrues to the largest players - which is why AWS is so strong.

I like to joke that AWS should refund money to the startups that buy booths at re:Invent only to find out AWS is rolling out a competing service (with the acknowledgement that AWS entering a space doesn't necessarily mean the end of the competing company).

So, way back in the last millennium, I did my Master's thesis (way smaller deal than a Ph.D. thesis) on neural networks. Since then, I have looked in on it every few years. I think they're cool, I like using them, and writing multi-level backpropagation neural networks used to be one of the first things I'd do in a new language, just to get a feel for how it worked (until pytorch came along and I decided for the first time that using their library was easier than writing my own).

So, it's not like I dislike ML. But, saying an investment is an "AI" startup, ought to be like saying it's a python startup, or saying it's a postgres startup. That ought not to be something you tell people as a defining characteristic of what you do, not because it's a secret but rather because it's not that important to your odds of success. If you used a different language and database, you would probably have about the same odds of success, because it depends more on how well you understand the problem space, and how well you architect your software.

Linear models or other more traditional statistical models can often perform just as well as DL or any other neural network, for the same reason that when you look at a kaggle leaderboard, the difference between the leaders is usually not that big after a while. The limiting factor is in the data, and how well you have transformed/categorized that data, and all the different methods of ML that get thrown at it all end up with similar looking levels of accuracy.

There used to be a saying: "If you don't know how to do it, you don't know how to do it with a computer." AI boosters sometimes sound as if they are suggesting that this is no longer true. They're incorrect. ML is, absolutely, a technique that a good programmer should know about, and may sometimes wish to use, kind of like knowing how a state machine works. It makes no great deal of difference to how likely a business is to succeed.

Saying that you're going to "use AI" is more akin to saying "we're going to have a web application" back in 1998.

Back then a lot of startups didn't have websites, because they were making other products (hardware, boxed software, etc). If they had a website it was just a marketing page.

So saying that you were going to make a "web application" did in fact differentiate you, in that it showed your approach was very different from the boxed software folks, but it didn't tell you much beyond that.

"Web application" came later. In the nineties it was called a "cgi web page" by your webmaster.

In the nineties, there was a huge difference between 1995 and 1998. It wasn't all that apparent to some of us until later, but things moved really fast in that timeframe. The years leading up to 2000 were almost like the imagining of approaching an event horizon or asymptote.

What you’re describing is so hard to convey to people. In 1994 we were building raised-floor data centers with halon for suppressors and marveling at our 2GB behemoth UNIX boxes. And writing our own web application framework using CGI. In ‘99 we were renting a suite at a colo and putting our own hardware there, running ColdFusion web apps. In ‘04 we were renting half a rack at the same colo and trying not to write three tier Java servlet based apps with 1,000 line web.xml files. And then AWS happened.

CGI to ColdFusion to Java servlets. Sounds enterprise-y.

It was all very start-uppy. What were you using to build your commercial web applications in 1996 if not CGI? Mod_perl did not even exist until 1995, and FastCGI didn't exist IIRC until after Netscape released their enterprise server.

Huh. I agree with CGI, but CF certainly had alternatives.

The boring low-risk unsexy thing that works is often underrated. I didn't choose CF — at the time I argued that it was a tool for scrubs, but the VPE said, "Hey, I know it, and I know it can do the job." We launched that CF-based web site on time and sold the company for $350MM fewer than six months later. Only then did we incrementally port it to Java.

I can recall chugging along with a Pentium 133 MHz and 56k dialup between 95-98.

It's fantastic to think that we didn't see 1 GHz CPUs and 1 Mb cable/DSL Internet until 2000.

The resourcefulness from that pre-2k era was amazing! It was leaps and bounds!

I know, but I'm writing to a modern audience. :)

Or from time to time, a webmistress.

> If you don't know how to do it, you don't know how to do it with a computer.

This is so true. We spent decades educating non-technical people that understanding a problem well is a prerequisite to programming it. Take something easy to understand like driving a car, doing it in a computer is now harder.

AI is undoing all that. People reach a vague problem they can't describe and assume computers will magically fix it.

Well the term Postgres or Python startup may not make sense, but a Pytorch or TensorFlow startup may not either. A database startup though, tells me the company is likely going to be in the database field, and most likely is going to try and sell me something I don't need. An AI startup, similarly, is going to either be utilizing existing techniques on industry problems to sell me something I don't need, or making some novel improvement to the training or inference to sell me something else I don't need.


Thank you for the perspective. Now when we talk machine learning are we talking:

L. Pachter and B. Sturmfels. Algebraic Statistics for Computational Biology. Cambridge University Press 2005.

G. Pistone, E. Riccomagno, H. P. Wynn. Algebraic Statistics. CRC Press, 2001.

M. Drton, B. Sturmfels, S. Sullivant. Lectures on Algebraic Statistics. Springer, 2009.

Or more like:

Watanabe, Sumio. Algebraic Geometry and Statistical Learning Theory, Cambridge University Press 2009.

My understanding (I do not do AI or machine learning) is that AI is distinct from these more mathematical analytic perspectives.

Finally, might we argue that generally AI/ML is more easily suited to data that's already high quality eg. CERN data, trade data, drug trial data as opposed to unconstrained data eg. Find the buses in these 1MM jpegs?

Pure CS based AI approaches are primarily for images, text, and maybe graphs and control. The domains are called computer vision, natural language processing, graph learning and reinforcement learning.

For structured data like tables, time series, etc., the techniques are still from statistics. Regression, for example, is the workhorse for numerical prediction problems.

I think a lot of people are missing the point about leaps AI has made because they aren't aware of NLP or CV or reinforcement learning.

So the "AI" mentioned above is stunningly good at finding buses in 1MM images and reasonably good on drug trial and CERN data.

The business models required for making AI business successful haven't been invented yet.

A good AI business model will be a deep stack: an example would be something like precision agriculture, where you'd use AI for designing rice, then use IoT and earth observation to locate the right acreage, monitor growth, and adjust nutrients at crop level, and get dramatically better output with the least wastage and highest nutritional content.

Most AI companies are still started by ex-CS folks who in general aren't aware of deep technical opportunities in other disciplines. I think this will change very soon due to the ubiquity of deep learning training material, libraries and research papers.

> There used to be a saying: "If you don't know how to do it, you don't know how to do it with a computer."

This is a tautology in the narrow sense, but in the broader sense I think there surely exist things that humans don't "know" how to do without a computer, but know how to do with a computer. And the space of solvable problems is expanding, though AI is only a narrow slice of that.

I don't know. I think what we all do, we know how to do it without a computer. Computers just automate stuff for us. It's a very practical saying because it forces you to ask the right question about the problem you're trying to solve. (We all know how to do AirBNB by hand, or Uber by hand, but the mobile app is hyper efficient w/ GPS & 4G, that's all).

I agree with the author's opinion about

> I’ll go out on a limb and assert that most of the up front data pipelining and organizational changes which allow for it are probably more valuable than the actual machine learning piece.

Especially at non-tech companies with outdated internal technology. I've consulted at one of these and the biggest wins from the project (I left before the whole thing finished unfortunately) were overall improvements to the internal data pipeline, such as standardization and consolidation of similar or identical data from different business units.
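As a rough illustration of what "standardization and consolidation" can look like in practice, here is a small sketch that normalizes a customer name and merges records from two business units on it. The record layout and the names `normalize_name` and `consolidate` are assumptions for the example, not anything from the project described above; real matching usually needs far more than string cleanup.

```python
import re

def normalize_name(raw):
    """Lowercase, then collapse punctuation and whitespace runs to one space."""
    cleaned = re.sub(r"[^a-z0-9]+", " ", raw.lower())
    return cleaned.strip()

def consolidate(*sources):
    """Merge records from several units, keyed on the normalized name.

    Earlier sources win; later ones only fill in fields still missing.
    """
    merged = {}
    for records in sources:
        for record in records:
            key = normalize_name(record["name"])
            target = merged.setdefault(key, {})
            for field, value in record.items():
                if field != "name" and value is not None:
                    target.setdefault(field, value)
    return merged

# Hypothetical records from two business units.
SALES = [{"name": "ACME Corp.", "phone": "555-0100", "city": None}]
SUPPORT = [{"name": " acme  corp", "phone": "555-0199", "city": "Boston"}]

print(consolidate(SALES, SUPPORT))
```

Both spellings collapse to the same key, so the two units' rows end up as one record, with the sales phone number kept and the city filled in from support.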

I do data science at a non-tech company with outdated internal technology and I've seen this over and over again. Honestly though, it's worth every penny because often the only way to get the resources to truly solve data pipeline issues is to get an executive to buy some crap from a vendor and force everyone to make it work.

I was a consultant at one of the giant outsourcers and nod my head vigorously at this comment. The least sexy projects were MDM (master data management) but they were absolutely essential to the success of any other fancy analytics/BI/ML project.

Interestingly I too worked on MDM systems about ten years ago, when I was at IBM Research. Ironically, one of my first ideas for applying machine learning was in de-duplication of data in an MDM server. However the technology was a bit too primitive back in 2010 and the project was a hard sell so it was abandoned.

No need to look at AZ for this. If you're building "AI" I wish you a speedy road to being acquired by a company that can put it to use. You've become a high priced recruiting firm.

If you're solving a real problem and use ML in service of solving that problem, then you've got a great moat....happy trusting customers.

It's not complicated

Sssh! Valuations are a function of projected market size and opacity of the problem. Clarity like this collapses the uncertainty and destroys value. If you pour enough capital into rooms full of PhD's something's gotta hit.

My way of saying, you're very, very right.

I wrote an article I published a week ago about how AI is the biggest misnomer in tech history https://medium.com/@seibelj/the-artificial-intelligence-scam...

I wrote it to be tongue-in-cheek in a ranting style, but essentially "AI" businesses and the technology underpinning it are not the silver bullet the media and marketing hype has made it out to be. The linked article about a16z shows how AI is the same story everywhere - enormous capital to get the data and engineers to automate, but even the "good" AI still gets it wrong much of the time, necessitating endless edge-cases, human intervention, and eventually it's a giant ball of poorly understood and impossible-to-maintain pipelines that don't even provide a better result than a few humans with a spreadsheet.

Coming from a fellow masshole: that's a great rant.

There was this meme in the 70s about "self driving cars" following magnetic strips in the road in restricted highways. I remember at the time, being, like 8 and thinking "sure seems like an overly complicated train."

Thanks man! Lifelong masshole here.

Your post was much better than mine, but I appreciate the comment.

>That’s right; that’s why a lone wolf like me, or a small team can do as good or better a job than some firm with 100x the head count and 100m in VC backing.

goes on to say

>I agree, but the hockey stick required for VC backing, and the army of Ph.D.s required to make it work doesn’t really mix well with those limited domains, which have a limited market.

Choose one?

Also assumes running your own data center to be easy. Some people don't want to be up 24x7 monitoring their data center or to buy hardware to accommodate the rare 10 minute peaks in usage.

>rare 10 minute peaks

But is that really the use case here? I haven't worked in ML. But I'm not seeing where you are going to need to handle a 10 minute spike that requires a whole datacenter.

A few months' worth of a quad GPU instance on AWS could pay for a server with similar capacity.

And hardware is pretty resilient these days. Especially if you co-locate it in a datacenter that handles all the internet and power uptime for you. And when something does go wrong, they offer "magic hands" service to go swap out hardware for you. Colocation is surprisingly cheap. As is leasing 'managed' equipment.
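For anyone who wants to check the rent-versus-buy claim against their own numbers, the arithmetic is a one-liner. This is a sketch only: `breakeven_months` is a hypothetical helper, and the figures in the usage line are placeholders, not real AWS or hardware prices.

```python
import math

def breakeven_months(hardware_cost, cloud_monthly, onprem_monthly=0.0):
    """Months of cloud spend needed to cover an up-front hardware purchase.

    `onprem_monthly` covers colo fees, power, and the like. Returns None
    when owning is not cheaper per month, because the purchase would then
    never pay for itself.
    """
    monthly_saving = cloud_monthly - onprem_monthly
    if monthly_saving <= 0:
        return None
    return math.ceil(hardware_cost / monthly_saving)

# Placeholder numbers: plug in your own quotes.
print(breakeven_months(hardware_cost=12000, cloud_monthly=4000, onprem_monthly=1000))
```

It ignores depreciation, staff time, and utilization, which is where the argument usually gets decided.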

Training ML models usually doesn’t have the same uptime requirements as production systems. If your training goes down for a bit, it probably won’t make much difference to the underlying business, in most cases.

That’s why the author found it glaringly obvious that it should be brought in-house. It’s often both the most costly and most “in-housable” compute work involved in these companies.

I don't think these are necessarily contradictory. With pytorch-transformers, you can use a full-blown BERT model like the best in the world. And yet, to make this novel and defensible, you would need to build on top of it and innovate significantly, which would require significant capital to achieve.

I ran a small data cluster for years, the horsepower behind my startup. Other than the Chinese DDoS attacks, running the cluster was absolutely elementary. The idea that running a server or a band of servers is difficult is a bold faced lie. People have got to stop repeating the cloud propaganda.

> Some people don't want to be up 24x7 monitoring their data center or to buy hardware to accommodate the rare 10 minute peaks in usage.

Do you need that for training workloads, and what percentage of a startup's workload is training?

I found it fun to read this after reading this other post that made the rounds today about AI automating most programming work and making program optimization irrelevant: https://bartoszmilewski.com/2020/02/24/math-is-your-insuranc...

A thread about the original article, from a few days ago: https://news.ycombinator.com/item?id=22352750

I predict a great future for startups that sell pickaxes, err, tools for AI.

AI is like the new gold rush. And just like back then, it's not the gold diggers that will get rich.

"Most people in AI forget that the hardest part of building a new AI solution or product is not the AI or algorithms — it’s the data collection and labeling."


(from 2017)

Is it the new gold rush though? I work in a large organisation that has a lot of data and inefficient processes, and we haven’t bought anything.

It hasn’t been for a lack of trying. We’ve had everyone from IBM and Microsoft to small local AI startup try to sell us their magic, but no one has come up with anything meaningful to do with our data that our analysis department isn’t already doing without ML/AI. I guess we could replace some of our analysis department with ML/AI, but working with data is only part of what they do, explaining the data and helping our leadership make sound decisions is their primary function, and it’s kind of hard for ML/AI to do that (trust me).

What we have learned though, is that even though we have a truck load of data, we can’t actually use it unless we have someone on deck who actually understands it. IBM had a run at it, and they couldn’t get their algorithms to understand anything, not even when we tried to help them. I mean, they did come up with some basic models that their machine spotted/learned by itself by trawling through our data, but nothing we didn’t already have. Because even though we have a lot of data, the quality of it is absolute shite. Which is anecdotal, but it’s terrible because it was generated by thousands of human employees over 40 years, and even though I’m guessing, I doubt we’re unique in that aspect.

We’ll continue to do various proof of concepts and listen to what suppliers have to say, but I fully expect most of it to go the way Blockchain did which is where we never actually find a use for it.

With a gold rush, you kind of need the nuggets of gold to sell, and I’m just not seeing that with ML/AI. At least not yet.

AI != gold. The market for selling tools to people who are essentially chasing buzz words is much smaller than that of selling tools to people extracting scarce metals from the ground.

Ultimately the value of selling tools is dependent on the riches being mined actually existing. The value of AI/big data to the average business has yet to be determined

>"Most people in AI forget that the hardest part of building a new AI solution or product is not the AI or algorithms — it’s the data collection and labeling."

A lot of those companies are styled as "AI" companies themselves, aiming to automate the process of labeling.

The main winner here really is Amazon. They get a chunk by serving up infrastructure and in labeling through mechanical turk.

And many times all these AI computations go into solving mundane problems like "What's the likelihood of this Ad to perform well".

AI is so shiny that it makes people want to jump into that boat as fast as they can, but a reasonably objective analysis shows that a huge number of software problems can still be solved without relying on the "AI black box".

You all know a GTX 1070 with 8GB on a gaming laptop with 32GB is still doing wonders and covering 90%+ of business cases when coupled with smart & batch techniques, the likes of which you learn from fast.ai or implement directly in pytorch, right??

> Training a single AI model can cost hundreds of thousands of dollars (or more) in compute resources

Why don't they buy their own hardware for this part? The training process doesn't need to be auto-scalable or failure-resistant or distributed across the world. The value proposition of cloud hosting doesn't seem to make sense here. Surely at this price the answer isn't just "it's more convenient"?

because you are trading cash for speed.

Say you have $8M in funding, and you need to train a model to do x

You can either:

a) gain access to a system that scales on demand and allows instant, actionable results.

b) hire an infrastructure person, someone to write a K8s deployment system. Another person to come in and throw that all away. Another person to negotiate and buy the hardware, and another to install it.

Option b can be the cheapest in the long term, but it carries the most risk of failing before you've even trained a single model. It also costs time, and if speed to market is your thing, then you're shit out of luck.

Why in the world do you need a Kubernetes deployment system to run a single, manual, one-time (or a handful of times), high-compute job?

Because when all you have is a hammer, everything looks like a nail.

We have become so DevOps and cloud dependent that everyone has forgotten how to just run big systems cheaply and efficiently.

Because that high-compute job needs to be distributed on many, many machines, and if you're using cheap preemptible instances you have to handle machines dropping off and joining in while you're running that single job.

It's definitely not something that you can launch manually - perhaps Kubernetes is not the best solution, but you definitely need some automation.
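The core of that automation, whatever scheduler sits on top, is being able to resume from a checkpoint after a machine disappears. Here is a minimal single-process sketch using only the standard library; the function names, the JSON checkpoint format, and the `stop_after` knob that simulates a preemption are all assumptions for illustration, and a real job would checkpoint model weights, not a counter.

```python
import json
import os
import tempfile

def save_checkpoint(path, state):
    """Write state atomically so a preemption mid-write cannot corrupt it."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w") as handle:
        json.dump(state, handle)
    os.replace(tmp_path, path)  # atomic rename on the same filesystem

def load_checkpoint(path):
    """Return the saved state, or a fresh one if no checkpoint exists."""
    if os.path.exists(path):
        with open(path) as handle:
            return json.load(handle)
    return {"step": 0, "total": 0}

def run(path, total_steps, stop_after=None):
    """Run work steps, resuming from the checkpoint at `path`.

    `stop_after` simulates a preemption after that many steps in this call.
    """
    state = load_checkpoint(path)
    done_this_run = 0
    while state["step"] < total_steps:
        if stop_after is not None and done_this_run >= stop_after:
            break  # simulated preemption
        state["total"] += state["step"]  # stand-in for one training step
        state["step"] += 1
        done_this_run += 1
        save_checkpoint(path, state)
    return state
```

Calling `run` again with the same path picks up at the saved step instead of starting over, which is the property you need before cheap preemptible instances make sense.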

because it's not a one-time operation.

also, how else do you sensibly deploy and manage a multi-stage programme on >500 nodes?

I mean we use AWS batch, which is far superior for this sort of thing. SLURM might work for real steel, as would tractor from pixar.

If you're in a position where you need to train a large network: first, I feel bad for you. second, you'll need additional machines to train in a reasonable amount of time.

ML distributed training is all about increasing training velocity and searching for good hyperparameters

> Better user interfaces are sorely underappreciated.

This is why I’m much more excited by AR and VR than AI. Human brains are fucking amazing at certain kinds of data processing and inference and pretty mediocre at others. We should be focusing more on creating interfaces and data visualizations that unlock that superpower for wider applications.

I'm not terribly convinced of point 4.

> Machine learning will be most productive inside large organizations that have data and process inefficiencies.

I strongly believe ML is at worst dangerous and at best pointless here. Data and Process inefficiencies => garbage in, garbage out. ML is NOT a silver bullet in large organisations that have these issues*. I've seen managers try to adopt ML to solve issues, but the results are almost always suspect and/or marginally better than simple if else rules, while requiring multiple people or teams to get all the data and models right.

“ Embrace services. There are huge opportunities to meet the market where it stands. That may mean offering a full-stack translation service rather than translation software or running a taxi service rather than selling self-driving cars. Building hybrid businesses is harder than pure software, but this approach can provide deep insight into customer needs and yield fast-growing, market-defining companies. Services can also be a great tool to kickstart a company’s go-to-market engine – see this post for more on this – especially when selling complex and/or brand new technology. The key is pursue one strategy in a committed way, rather than supporting both software and services customers.”

Exactly wrong and contradicts most of the thesis of the article - that AI often fails to achieve acceptable models because of the individuality, finickiness, edge cases, and human involvement needed to process customer data sets.

The key to profitability is for AI to be a component in a proprietary software package, where the VENDOR studies, determines, and limits the data sets and PRESCRIBES this to the customer, choosing applications many customers agree upon. Edge cases and cat-guacamole situations are detected and ejected, and the AI forms a smaller, but critical efficiency enhancing component of a larger system.

The thesis of the article is that this is going to be called consultancy.

Single-focus disruptors bad. Generic consultancy good - with ML secret sauce, possibly helped by hired specialist human insight.

Companies that can make this work will kill it. Companies that can't will be killed.

It's going to be IBM, Oracle, SAP, etc all over again. Within 10 years there will be a dominant monopolistic player in the ML space. It will be selling corporate ML-as-a-service, doing all of that hard data wrangling and model building etc and setting it up for clients as a packaged service using its own economies of scale and "top sales talent" (it says here).

That's where the big big big big money will be. Not in individual specialist "We ML'd your pizza order/pet food/music choices/bicycle route to work" startups.

Amazon, Google, MS, and maybe the twitching remnants of IBM will be fighting it out in this space. But it's possible they'll get their lunch money stolen by a hungry startup, perhaps in collaboration with someone like McKinsey, or an investment bank, or a quant house with ambitions.

5-10 years after that customisable industrial-grade ML will start trickling down to the personal level. But it will probably have been superseded by primitive AGI by then, which makes prediction difficult - especially about that future.

The big consulting firms have been building in-house ML libraries for common business problems for 3+ years. They don't need to acquire the data startups because as the article points out, these models are commoditized pretty quickly (especially when you have access to the transactional data of many large multinational companies). There is no secret sauce to ML that makes you any more likely to succeed with it than Accenture -- and they have a much deeper pipeline than you do. ML is a mature capability at all of the enterprise-tier consultancies, and they bundle it with their $100M system deployments. The mid-market consultancies are working on it. There is very little money to squeeze out of this market.

We're also a long way off from AGI. Nobody really even has a roadmap to what an AGI would look like. Heck, DNN/ML techniques have been widely-known since the early 90s; they just became practical with access to cloud-scale hardware, so the current situation has been 25+ years in the making.

Nowadays DL models are becoming commodities very fast. By the time you train an NN to solve a particular problem, a new, more efficient model is out somewhere and publicly available. So you need to go through the whole process again or else you risk losing business. The exception is when your NN is so unique that you are handcrafting your own, in which case you take a lot of time to arrive at the best model and you need more PhDs.

Props to the ML community for being so open.

Open does not mean patent-free.

That is a great write up and very accurate description of both the costs and human intervention based on my experience with “AI” tools.

> In the old days of on-premise software, delivering a product meant stamping out and shipping physical media – the cost of running the software, whether on servers or desktops, was borne by the buyer. Today, with the dominance of SaaS, that cost has been pushed back to the vendor. Most software companies pay big AWS or Azure bills every month – the more demanding the software, the higher the bill.

This irrational sheep mentality amuses me. Yes, there are some very specific cases where AWS & co. is clearly a better choice, but in most cases I saw, the TCO of hosting on premises or renting servers is much lower, sometimes by an order of magnitude (in some cases even more). But people insist on doing it because others do it. We'll soon have an entire generation of engineers completely hooked on AWS & co. and not even realizing other solutions are possible, not to mention lower TCO.

AI on the algo side is only half the story -- it has to sit in a domain specific framework to be most effective

I see a lot of 'bolt-on' tech emerging -- it looks mostly snake oil -- there is no obvious way to be competitive against teams that baked it in to the bare metal design

Also most commercial use-cases I've seen need effective ML more than anything else

There are many problems which are simply impossible to do with traditional optimisation or human analysis, that ML can do really well at. But I get the sense that this is not the type of problem that these "AI" startups referred to are addressing. Instead it's like 'here is a problem I can charge for, with some ML magic it will be easy'. This is classic snake oil.

Being able to sift/classify/analyse data with ML really can be a 'moat', an extreme competitive advantage. But using "AI" doesn't automatically get you there.

Separately, AWS is an expensive luxury, which is worth it if for some reason you can't manage your own computers.

It really annoys me when analysts like this guy mangle together things which are obvious and then come up with an unsupported conclusion, like "second AI winter is coming man".

The A16Z piece makes all these points quite clearly. This editorial is trying to put a finer point on a sharp knife.

Agree mostly, but he only talks about AI start-ups that have a 1 to 1 model or at best a 1 to few. There are some AI startups like ours which have a 1 to many model. We use Computer Vision to collect data from video streams and sell data and transformed data through our API. The output of our models is the same for everyone.

Cost-wise, though, the author clearly either doesn't know how it works or assumes all AI startups have huge training sets. For many companies, owning your hardware for training is a very easy step to rationalise cost.

It feels like an article written about all AI companies but actually (very) true only for some AI companies.
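
The 1 to 1 versus 1 to many distinction can be put in numbers with a toy sketch. The figures and function names below are hypothetical placeholders, not data from any real company. The assumption is simply that a 1 to 1 business pays the model cost again for every client, while a 1 to many business pays it once and then only pays a smaller serving cost per client.

```python
def one_to_one_cost(n_clients, cost_per_model):
    """1 to 1: each client needs its own trained and maintained model."""
    return n_clients * cost_per_model


def one_to_many_cost(n_clients, cost_per_model, serving_cost_per_client):
    """1 to many: one shared model, plus a per-client serving cost."""
    return cost_per_model + n_clients * serving_cost_per_client


# Hypothetical inputs, chosen only for illustration.
N_CLIENTS = 100
MODEL_COST = 50_000   # training plus upkeep of one model
SERVING_COST = 1_000  # API and inference cost per client

bespoke = one_to_one_cost(N_CLIENTS, MODEL_COST)
shared = one_to_many_cost(N_CLIENTS, MODEL_COST, SERVING_COST)
print(f"per-client cost: {bespoke / N_CLIENTS} (1 to 1) vs {shared / N_CLIENTS} (1 to many)")
```

Under these assumptions the per-client cost of the shared model keeps falling as clients are added, which is the software-like margin profile the article says most AI startups lack.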

I wonder how much of the formidable amount of computing resources required for deep learning can be attributed to wasteful and inefficient programming practices. A lot of the ML libraries that I see are written in Python with very little attention paid to aspects such as memory usage, cache coherency, concurrency, etc.

If we focused on writing more efficient software instead of demanding bigger and faster machines with more and more GPUs, would the cost of ML become more practical? More importantly, as the author pointed out, would smaller companies have a better chance at making advancements in the field?

Here's what cloud gives you that is very costly to implement internally: cost accountability. Analysts running the same queries over and over would peg internal hardware all the time. When we went to the cloud, we made a budget for each division, problem solved. Same with DS. Give them a blank check and they'll spend it; manage them to a budget and they'll stick to it.
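
A minimal sketch of what per-division cost accountability amounts to, assuming the billing export can be reduced to (division, cost) pairs. The record format, division names and budget figures are hypothetical; real cloud billing exports are tagged differently by each provider.

```python
from collections import defaultdict


def spend_by_division(usage_records):
    """Sum cost per division from an iterable of (division, cost) pairs."""
    totals = defaultdict(float)
    for division, cost in usage_records:
        totals[division] += cost
    return dict(totals)


def over_budget(totals, budgets):
    """Return the divisions whose spend exceeds their budget, sorted by name.

    A division with no budget entry is treated as having a budget of zero.
    """
    return sorted(d for d, spent in totals.items() if spent > budgets.get(d, 0.0))


# Hypothetical monthly usage and budgets, for illustration only.
records = [("analytics", 1200.0), ("data-science", 800.0), ("analytics", 900.0)]
budgets = {"analytics": 2000.0, "data-science": 1000.0}

totals = spend_by_division(records)
print(totals, over_budget(totals, budgets))
```

The same bookkeeping is possible on internal hardware, but it requires metering usage yourself, which is the part the cloud bill hands you for free.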

"(my personal bete-noir; the term “AI” when they mean “machine learning”)"

This is so right. Using the term "artificial intelligence" for machine learning is like using "artificial horses" to describe cars. It is even worse, since we cannot even define what "natural intelligence" actually is. Stop talking about "artificial intelligence".

Or "artificial swans" that "appear even more lifelike".


>The bodywork represents a swan gliding through water. The rear is decorated with a lotus flower design finished in gold leaf, an ancient symbol for divine wisdom. Apart from the normal lights, there are electric bulbs in the swan’s eyes that glow eerily in the dark. The car has an exhaust-driven, eight-tone Gabriel horn that can be operated by means of a keyboard at the back of the car. A ship’s telegraph was used to issue commands to the driver. Brushes were fitted to sweep off the elephant dung collected by the tyres. The swan’s beak is linked to the engine’s cooling system and opens wide to allow the driver to spray steam to clear a passage in the streets. Whitewash could be dumped onto the road through a valve at the back of the car to make the swan appear even more lifelike.

>The car caused panic and chaos in the streets on its first outing and the police had to intervene.

I interviewed at some AI companies a year or two back. They all had teams of people dedicated to support each client: to clean their data, train their models, integrate the domain-specific requirements, customize UIs, etc. They sold themselves as the next AI-powered mega-unicorns, but they were more like boutique consultancies with no obvious path to scale up.

"Boutique consultancy" sums up most AI companies quite well for now. But this may be the only way to empower their clients. One of these startups will find the path to scale up eventually.

Related to the topic of marginal benefits of AI models versus their costs:

Green AI (Roy Schwartz, Jesse Dodge, Noah A. Smith, Oren Etzioni - 2019)


I sometimes contribute to methodology projects in neuroscience ("AI" for scientists). The most tiring part of it is explaining essentially these things over and over. Very interesting to see the sentiment vindicated in Startupistan.

I view AI as the application of ML and ML as the implement (tool). Therefore, tooling efficiency is a competitive advantage of good ML projects.

Nice article. The flip side of the coin is that all these “problems” are potential moats for a well tuned ML company to use to defend market share.

On the other hand, some startups are committing outright fraud in the name of AI. I went to a self-checkout store (AIFI.io). I did not touch anything, but they charged me $35.10. According to the receipt I took 17 packs of snacks :) These guys are committing fraud in the name of AI. They have no technology, no software; they just put up some cameras and opened a store so that they can defraud investors. Anyone can try it if interested: https://www.aifi.io/loop-case-study

> “AI coming for your jobs” meme; AI actually stands for “Alien (or) Immigrant” in this context.

Finally a correct use of "AI".

Well, duh. Unless you invent AGI you're always going to be fitting new models for new clients. The best case scenario is getting bought by a client and becoming their full-time ML tailor.

For a pure ML company to IPO they'd have to both solve intelligence and manufacture their own hardware. FOMO screwed a lot of investors who would've been better off buying Google stock.

Generally, the phrase "from a great height" implies that the height is one of morality, intellect, or valor (each of these decreasing in usage). I'm not exactly sure what the great height Andreessen-Horowitz craps from is composed of. Maybe money?

I think they may just be crapping on them from a reasonable vantage point.

The height is not really about morals. It's more about the blast radius of the shit.

Or like “nuked from orbit”

I guess I won't mention Kubeflow here.....

This is a terrific article. Two thumbs up.

All of this might be true currently, but that's because this current first generation "AI" (technically should just be called ML) is mostly bullshit. To clarify, I don't mean anyone is lying or selling snake oil - what I mean by bullshit is that the vast majority of these services are cooked up by software developers without any background in mathematics, selling adtechy services in domains like product recommendation and sentiment analysis. They are single-discipline applications accessible to devs without science backgrounds and do not rely on substantial expertise from other fields. That makes them narrow in technical scope and easy to rip off (hence no moat, lots of competition, and human reliance and lack of actual software).

The next generation of Machine Learning is just emerging, and looks nothing like this. Funds are being raised, patents are being filed, and everything is in early stage development, so you probably haven't heard much yet - but these ML startups are going after real problems in industry: cross disciplinary applications leveraging the power of heuristic learning to make cross disciplinary designs and decisions currently still limited to the human domain.

I'm talking about the kind of heuristics which currently exist only as human intuition expressed most compactly as concept graphs and, especially, mathematical relationships - e.g. component design with stress and materials constraints, geologic model building, treatment recommendation from a corpus of patient data, etc. ML solutions for problems like these cannot be developed without an intimate understanding of the problem domain. This is a generalist's game. I predict that the most successful ML engineers of the next decade will be those with hard STEM backgrounds, MS and PhD level, who have transitioned to ML. [Un]Fortunately for us, the current buzzwordy types of ML services give the rest of us a bad name, but looking at these upcoming applications the answers to the article tl;dr look different:

>Deep learning costs a lot in compute, for marginal payoffs

The payoffs here are far greater. Designs are in the pipeline which augment industry roles - accelerate design by replacing finite methods with vastly quicker ML for unprecedented iteration. Produce meaningful suggestions during the development of 3D designs. Fetch related technical documents in real time by scanning the progressive design as the engineer works, parsing and probabilistically suggesting alternative paths to research progression. Think Bonzi Buddy on steroids...this is a place for recurring software licenses, not SaaS.

>Machine learning startups generally have no moat or meaningful special sauce

For solving specific, technical problems, neural network design requires a certain degree of intuition with respect to the flow of information through the network, which both optimizes and limits the kind of patterns that a given net can learn. Thus designing NNs for hard-industry applications is predicated upon an intimate understanding of domain knowledge, and these highly specialized neural nets become patentable secret sauces. That's half of the moat - the other half comes from competition for the software developers with first-hand experience in these fields, or a general enough math-heavy background to capture the relationships that are being distilled into nets.

>Machine learning startups are mostly services businesses, not software businesses

Again only true because most current applications are NLP adtechy bullshit. Imagine coding in an IDE powered by an AI (multiple interacting neural nets) which guides the structure of your code at a high level and flags bugs as you write. This, at a more practical level, is the type of software that will eventually change every technical discipline, and you can sell licenses!

>Machine learning will be most productive inside large organizations that have data and process inefficiencies

This next generation goes far past simply optimizing production lines or counting missed pennies or extracting a couple extra percent of value from analytics data. This style of applied ML operates at a deeper level of design which will change everything.

>The next generation of Machine Learning is just emerging, and looks nothing like this. Funds are being raised, patents are being filed, and everything is in early stage development, so you probably haven't heard much yet ...

Citations needed. Large claims: presumably you can name one example of this, and hopefully it's not a company you work at.

I've seen projects on literally all the things you mention: materials science, medical stuff, geology/prospecting. None of them worked well enough to build a standalone business around them. I do know the oil companies are using DL ideas with some small successes, but this only makes sense for them, as they've been working on inverse problems for decades. None of them buy canned software/services: it's all done in house. Probably always will be, same as their other imaging efforts.

>Citations needed. Large claims: presumably you can name one example of this, and hopefully it's not a company you work at.

Unfortunately this is all emerging just now and yes, I do work at such a company, but I'm old enough to not be naively excited by some hot fad. There's something profound just starting to happen, but everyone is keeping the tech rather secret because it isn't developed/differentiated enough yet to keep a competitor from running off with an idea. Disclosure is probably 1-3 years out, at a guess.

>I do know the oil companies are using DL...as their other imaging efforts.

You're correct, and I happen to have experience in this domain - except there are a handful of up-and-comers courting funds from global majors like Shell and BP, and seismic inversion is near the end of the list of novel applications. Petroleum is ground zero for a potential revolution right now, if we can come up with something before the U.S. administration clamps down on fossil fuels.

But we're talking complex algorithms which consist of multiple interacting neural networks. We are rapidly moving toward rudimentary reasoning systems which represent conceptual information encoded in vectors. I'm jaded enough that I wouldn't say we're developing AGI, but if the ideas in progress that I'm familiar with and working on personally pan out, they will be massive baby steps towards something like AGI.

The space is evolving at least as rapidly as the academic side, which I think is an unprecedented pace of development for a novel field of study. I can't help but feel like these are the first steps towards some kind of singularity. There's no question that we are on to something civilization changing with neural networks, what remains to be seen is whether compute scaling will keep up with the needs of this next generation ML. Even if research stopped today, the modern ML zoo has exploded with architectures with fruitful applications across domains. The future is here!

I mean, the fact that you just wall of texted me with "trust me" doesn't inspire a lot of confidence. You could at least point me at an impressive paper or something!

I read all the NIPS stuff every other year; don't see anything game changing in there!

Fossil fuels are done. Even if they get a reprieve for another four years, they'll be done after that. As someone who currently works in the petroleum industry, I wouldn't recommend anyone start a company in it now. It's far too volatile, and the political winds are shifting strongly against it.

I don't believe anyone who says that we're getting close to AGI. There are interesting techniques that might get us closer, but they're so far behind in terms of compute power and scaling that it's hard to see how they could be practical in the near future. Anything that's based on back prop is not going to get there. The singularity is a loooong way away.

Is the misspelling of "Andreessen-Horowitz" and use of "A19H" instead of "a16z" intentional?

I suck at spelling. If I was one of the cool kids I'd claim to be dyslexic.

Hi OP. We built an open-source library called BentoML (https://github.com/bentoml/bentoml) to make model inferencing/serving a lot easier for data scientists in various serving scenarios.

Love to hear your thoughts on our library

I was really hoping that you were about to offer an ML framework to improve spelling.

You mean the fact that they left out an "s" in Andreessen?

We've squeezed another s above.
