So yeah, you could spend one or two FTE salaries' (or one deep learning PhD's) worth of cash on finding such models for your startup if you insist on helping Jeff Bezos to wipe his tears with crisp hundred dollar bills. That's if you know what you're doing of course. Literally unlimited amounts could be spent if you don't. Or you could do the same for a fraction of the cost by stuffing a rack in your office with consumer grade 2080ti's. Just don't call it a "datacenter" or NVIDIA will have a stroke. Is that too much money? Not in most typical cases, I'd think. If the competitive advantage of what you're doing with DL does not offset the cost of 2 meatspace FTEs, you're doing it wrong.
That, once again, assumes that you know what you're doing, and aren't doing deep learning for the sake of deep learning.
Also, if your startup is venture funded, AWS will give you $100K in credit, hoping that you waste it by misconfiguring your instances and not paying attention to their extremely opaque billing (which is what most of their startup customers proceed to do pretty much straight away). If you do not make these mistakes, that $100K will last for some time, after which you could build out the aforementioned rack full of 2080ti's on prem.
We don't train ML models, but we are in a similar boat regarding cloud compute costs. Building our solutions for our clients is a compute-heavy task which is getting expensive in the cloud. We are considering options such as building commodity Threadripper rigs, throwing them in various developers' (home) offices, installing a VPN client on each and then attaching them as build agents to our AWS-hosted Jenkins instance. In this configuration we could drop down to a t3a.micro for Jenkins and still see much faster builds. The reduction in iteration time over a month would easily pay for the new hardware. An obvious next step up from this is to do proper colocation, but I am of a mindset that if I have to start racking servers I am bringing 100% of our infrastructure out of the cloud.
It's noisy, it takes up space, and presumably I'm on call to fix it if it breaks.
You should pay them an extra 24x(PSU wattage)x(peak $/Wh in area) per day for the electricity too.
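Back-of-the-envelope, that formula works out to something like this (the wattage and rate below are made-up placeholders, not real figures):

```python
# Daily electricity reimbursement per the formula above:
# 24h x PSU wattage x peak $/Wh. Worst case: full load all day at peak rates.
def daily_power_cost(psu_watts, dollars_per_kwh):
    return 24 * psu_watts / 1000 * dollars_per_kwh

# e.g. a hypothetical 1000W build rig at a peak rate of $0.30/kWh
print(round(daily_power_cost(1000, 0.30), 2))  # 7.2 ($/day)
```

In practice you'd meter the actual draw rather than assume the PSU's nameplate wattage, but the worst-case number errs in the employee's favor.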
I'm alarmed that someone in your company felt this idea was appropriate enough to propose.
This was my idea. I am a developer in my company. We are a flat structure. We have a lot of respect for each other. I am on a standup with the CEO every day. We all believe in our product and would happily participate in whatever activity brings it to market more quickly. We do not hire or retain the kind of talent that would flatly refuse to participate in experimental projects like this. At least not without some sort of initial conversation about why it's not a good fit for a particular individual.
I certainly see how someone might share your perspective. I used to work for a soulless megacorp and I could have easily found myself telling my former employer to "go fuck themselves" if a proposal similar to this was imposed upon me.
A 42u rack and 1 Gbps connection is $400 per month.
Put cheap supermicro EPYC servers in rather than threadrippers (or build your own). High capacity RDIMMs are cheaper than UDIMMs.
This will give you a much more maintainable solution than workstations in employees' homes via overlay VPN.
There is a maintenance cost for infrastructure that people tend to forget these days.
I later learned that the CTO had to spend a couple hours talking him out of firing me. I probably should've just quit on the spot anyway.
Your plan makes sense but be mindful of the acoustics or your devs may grow to hate you.
Please be mindful of the fact that consumer products are not designed for workstation/server loads. It's part of why the hardware is cheaper than server hardware. Also, a consumer ISP connection is most likely not as reliable as a data center's. I work remotely from home and have experienced this many times: bad performance at peak times, or half a day of downtime, can happen without any warning. And account for maintenance: everyone on the team must be able to figure out a problem, or deal with getting someone else in to fix it.
I know I sound like a buzzkill. I am writing this with good intentions.
Even if this works right now, it's not a reliable long-term solution. Maybe instead of dumping a couple of grand on consumer PCs to handle a server's work, look into building a proper server. Or you could find a datacenter provider and rent their hardware, something that is not as shiny and full of features as AWS.
"Backpack" style commercial vacuum cleaners have more suction, and are barely audible in comparison.
There does not have to be an incredibly loud industrial vacuum cleaner for figuratively everyone to get your analogy, because the Herculean reality of vacuum cleaners is that you cannot clean an Augean stable of Lego off the floor without a lot of noise. If you get my analogy.
The point is, when it comes to consumer behavior, I don't think anyone has a clue what to expect. It would not surprise me one bit if vacuum companies make louder vacuums because the consumer thinks louder means it works better.
This company was using them as desktop workstations, in an open office.
One was used as a build host. Often, the shopvac wail of E4000 fans would be cut short by some poor dev going berserk and unplugging the thing when nobody was looking...
Honestly why ever go to the cloud? It seems like a Larry Ellison boondoggle with the absurdly high costs and lock-in. (Ever look at moving your data?)
Running your own metal is cheaper if you actually fund it.
I've saved a ton of money just giving them dedicated workstations to develop on and then having everyone use a shared EC2 instance to push jobs to a fleet of spot instances for large scale training.
Now let's say your customer wants to analyze 2 hours = 120 minutes of video and doesn't want to wait more than those 3 hours, then suddenly you need 120 servers with one $10k GPU each to service this one customer within 3 hours of waiting.
Good luck reaching that $1,200,000 customer lifetime value to get a positive ROI on your hardware investment.
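For concreteness, the scale-out arithmetic above, assuming (as the numbers imply) roughly 3 GPU-hours of processing per minute of video and $10k per GPU — both placeholder figures:

```python
import math

HOURS_PER_VIDEO_MINUTE = 3.0   # assumed processing cost, as in the comment
GPU_COST = 10_000              # assumed $ per GPU

def servers_needed(video_minutes, deadline_hours):
    # Embarrassingly parallel: split the video minutes across servers
    # so the whole job finishes within the deadline.
    return math.ceil(video_minutes * HOURS_PER_VIDEO_MINUTE / deadline_hours)

n = servers_needed(120, 3)
print(n)             # 120 (servers)
print(n * GPU_COST)  # 1200000 ($ in GPUs alone)
```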
When I talk about AI, I usually call it "beating the problem to death with cheap computing power". And looking at the average cleverness of AI algorithm training formulas, that seems to be exactly what everyone else is doing, too.
And since I'm being snarky anyway, there's two subdivisions to AI:
supervised learning => remember this
unsupervised learning => approximate this
Both approaches don't put much emphasis on intelligence ;) And both approaches can usually be implemented more efficiently without AI, if you know what you are doing.
For the vast majority of people the main expense is creating the combination of a dataset and model that works for their practical problem, with the dataset being the harder (and sometimes more expensive) problem of the two.
The dataset is also their "moat", even though most of them don't realize it, and don't put enough care into that part of the pipeline.
State of the art optical flow tracking needs about 10 GB of GPU memory to execute on full HD frames. I don't know of any mainstream phone with that much RAM.
That, BTW, is also the reason why autonomous drones usually downsample the images before AI tracking, which has the nasty side effect of making thin branches, fences, telephone wires, etc. invisible.
This doesn't make any sense at all.
Both are "remembering" something under some constraint, which forces generalisation.
Supervised learning just "knows" what it is "remembering". Unsupervised learning is just trying to group data into patterns.
Both approaches don't put much emphasis on intelligence
Seems like most "intelligence" relies a lot on pattern recognition.
And both approaches can usually be implemented more efficiently without AI, if you know what you are doing.
The evidence is that you are wrong on this for a number of pretty important problems. I don't know much about optical flow, but in the image and text spaces you can't approach the accuracy of neural network approaches with hand crafted features.
I.e. I think that in one minute video, 95% of your images do not have new information in them
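A minimal sketch of that idea — dropping near-duplicate frames by mean absolute difference. The threshold and the toy "video" below are arbitrary choices for illustration:

```python
import numpy as np

def keep_novel_frames(frames, threshold=10.0):
    """Keep a frame only if it differs enough from the last kept one.

    frames: iterable of equally-shaped uint8 arrays.
    threshold: mean-absolute-difference cutoff (arbitrary choice here).
    """
    kept, last = [], None
    for f in frames:
        f = f.astype(np.int16)  # avoid uint8 wraparound when subtracting
        if last is None or np.abs(f - last).mean() > threshold:
            kept.append(f)
            last = f
    return kept

# A static "video" with one scene change: most frames get dropped.
static = [np.full((4, 4), 50, dtype=np.uint8)] * 10
change = [np.full((4, 4), 200, dtype=np.uint8)] * 10
print(len(keep_novel_frames(static + change)))  # 2
```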
> unsupervised learning => approximate this
Lol this can't be more wrong lmao. Both areas "remember" and "approximate" things through training. The difference is that unsupervised learning does not have labeled data, thus it has to search for some pattern. Honestly not even computer science graduates would say something like this.
- Or Tensorflow/Pytorch could've crapped on OpenCL a little less by releasing a fully functional OpenCL version every time they released a fully functional Cuda version, instead of worshipping Cuda year in and year out.
- Or Google could start selling their TPUv2, if not TPUv3, while they're on the verge of releasing TPUv4.
- Or one of the other big-tech's Facebook/Microsoft/Intel could make and start selling a TPU-equivalent device.
- Or I could finish school and get funded to do all/most of the above ;)
edit: On a more serious note, a cloud/on-prem hybrid is absolutely the right way to go. You should have a 4x 2080 ti rig available 24x7 for every ML engineer. It costs about $6k-8k apiece. Prototype the hell out of your models on on-prem hardware. Then when your setup is in working condition and starts producing good results on small problems, you're ready to do a big computation for final model training. Then you send it to the cloud for the final production run. (Guess what: on a majority of your projects, you might realize the final production run could be carried out on-prem itself; you just have to keep it running 24 hours a day for a few days or up to a couple weeks.)
This is hard and time consuming, and this field is hard enough as it is. What makes it even harder is that only NVIDIA has decent, mature tooling. There is some work on ROCM though, so AMD is not _totally_ dead in the water. I'd say they're about 90% dead in the water.
Do you need to do the stupid things performantly, though? Because that sounds like a case for skipping microcode shims, and going straight to instructions that trap into a software implementation. Or just running the whole compute-kernel in a driver-side software emulator that then compiles real sub-kernels for each stretch of non-stupid GPGPU instructions, uploads those, and then calls into them from the software emulation at the right moments. Like a profile-guided JIT, but one that can't actually JIT everything, only some things.
I have never recovered from that.
Several issues: 1) the electricity bill is still an issue; I've been paying anywhere between $500 and $1000 per month for this workstation (there's always something to train). 2) Anything with a decent memory size (Titan RTX and RTX 8000) costs way too much. 3) Once you've reached the point where 4 2080Tis are not fast enough, power management and connectivity setup become a nightmare.
Would love to know other people's opinions on the on-prem setup, especially whether consumer-grade 10GbE is enough connectivity-wise.
Although once you reach 4 2080tis, you ought to consider switching to a titanium-grade PSU and rewiring if you're in a 100-120V country. If you're feeling cheap, just steal the phases from two different circuits. Last I looked, most PSUs operate at around 5% lower efficiency on 115V vs 230V.
I chuckled, but more seriously: if you can't rewire your house to get a normal 240V circuit, you should not be fucking around with hacks like the above.
How much is your electricity? I currently run 12 GPUs in my garage pretty much non-stop. 4 GPUs per machine, 3 machines. Each machine is about 1.2KW on average (I can tell because each machine is connected through its own rack UPS), or 13.2 cents per hour, or $95/mo. Which, IMO, is not bad at all. That's less than $300 per month for 12 GPUs.
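Those numbers are self-consistent at roughly $0.11/kWh (13.2 cents/hr at 1.2kW implies that rate — inferred here, not stated):

```python
KW_PER_MACHINE = 1.2        # average draw per 4-GPU machine, per the comment
RATE_PER_KWH = 0.11         # implied electricity rate (inferred)
HOURS_PER_MONTH = 24 * 30

per_machine = KW_PER_MACHINE * RATE_PER_KWH * HOURS_PER_MONTH
print(round(per_machine))      # 95 ($/mo per machine)
print(round(per_machine * 3))  # 285 ($/mo for all 12 GPUs)
```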
I have come across fly.io, Vultr, Scaleway, Stackpath, Hetzner, and OVH but either they are expensive (in that they charge for bandwidth and uptime) or do not have a wide enough footprint.
I guess colos are the way to go, but how does one work with colos, allocate servers, deploy to them, ensure security and uptime and so on from a single place, 'cause dealing with them individually might slow down the process? Is there tooling that deals with multiple colos, like the ones for multi-cloud such as min.io, k8s, Triton, etc.?
It depends what you need in your datacenters! If you just want servers, and don't care about doing something like anycast, you can find a bunch of local dedicated server providers in a bunch of cities and go to town. But you can't get them all from one provider, really, not with any kind of reasonable budget.
You _could_ buy colo from a place like Equinix in a bunch of cities, and then either use their transit or buy from other transit providers.
But also, unmetered bandwidth isn't a very sustainable service, so I'm curious what you're after? You're usually either going to have to pay for usage, or pay large monthly fixed prices to get reasonable transit connections in each datacenter.
In our case, we're constrained by Anycast. To expand past the 17 usual cities you end up needing to do your own network engineering which we'd rather not do yet.
It is anycast that I'm going after. The requirement for unmetered bandwidth (or bandwidth cheaper than AWS et al.) is because the kinds of workloads we deal with (TURN relays, proxies, tunnels, etc.) get expensive otherwise. For another related workload, per-request pricing gets expensive, again due to the nature of the workload (to the tune of 100k requests per user per month).
So far, for the former (TURN relays etc), I've found using AWS Global Accelerator and/or GCP's GLB to be the easiest way to do anycast but the bandwidth is slightly on the expensive side. Fly.io matches the pricing in terms of network bandwidth (as promised on the website), so that's a positive but GCP/AWS have a wider footprint. Cloudflare's Magic Transit is another potential solution, but requires an enterprise plan and one needs to bring-your-own-anycast-IP and origin-servers.
For the latter (latency-sensitive workload with ~100k+ reqs / month), Cloudflare Workers (200+ locations minus China) are a great fit though would get expensive once we hit a certain scale. Plus, they're limited to L7 HTTP reqs, only. Whilst, I believe, fly.io can do L4.
Does adding an extra 100ms to the response time cost you that much business wise?
As for colos, it depends on scale. If you have 30k servers worldwide, it pays to have someone manage the contracts for you. If not, it pays to go with the painful arseholes like Vodafone, or whoever bought Cable & Wireless's stuff.
as for security, it gets very difficult. You need to make sure that each machine is actually running _what_ you told it, and know if someone has inserted a hypervisor shim between you and your bare metal.
none of that is off the shelf.
Which is why people pay the big boys, so that they can prove chain of custody and have very big locks on the cages.
K8s gives you scheduling and a datastore. For a large globally distributed system it's going to scale like treacle.
(I'm in full agreement with everything you've written + it's well-phrased and funny. gj!)
that's not a typo - there is such a thing as "Oracle cloud"
You can't hand-build a feature detector as accurate as (say) a ResNet50. Before 2014 people tried to do this with feature detectors like SIFT and HOG. These were patented and made the inventors significant money. If it were still possible to do it, then someone would be doing it and making a profit from it.
Hyperparameter search is just optimising the training parameters (things like batch size, or optimiser parameters). This might give another 1% lift on accuracy, but isn't generally a significant factor.
Yes, you can. If, that is, you can actually understand what the produced model is doing. And, of course, no human can do that, because no human understands the algorithm being employed by the produced model, because it's a really freaking complex algorithm whose optimal formulation really is just a graph of matrix transformations, rather than an imperative procedure that can be described by words like "variable."
This is an important idea to absorb, for the specific case where the AI converges on an optimal algorithm that's actually very simple—because the data has a regular, simple shape—rather than on one that's too complicated for our mortal minds. If you already knew that simple algorithm, then the work you did training an AI just to end up back at that same simple algorithm is wasted effort. An AI can't do better at e.g. being an AND gate, than an actual AND gate can. An AI can't do what wc(1) does better than wc(1) can.
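To make the AND-gate point concrete: you can train a toy perceptron to "learn" AND, and all it can ever converge to is the function you'd have written in one line anyway (a sketch, not a recommendation):

```python
# Training a perceptron to "learn" AND. After all that work, the
# learned model is exactly the function `a and b` with extra steps.
def train_and_gate(epochs=20, lr=0.1):
    w = [0.0, 0.0]
    b = 0.0
    data = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
    for _ in range(epochs):
        for (a, c), target in data:
            pred = 1 if w[0] * a + w[1] * c + b > 0 else 0
            err = target - pred
            # Classic perceptron update rule.
            w[0] += lr * err * a
            w[1] += lr * err * c
            b += lr * err
    return lambda a, c: 1 if w[0] * a + w[1] * c + b > 0 else 0

learned_and = train_and_gate()
print([learned_and(a, b) for a, b in [(0, 0), (0, 1), (1, 0), (1, 1)]])  # [0, 0, 0, 1]
```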
If the data is regular—that is, if a model of its structure can be held fully in a human brain—then jumping immediately to Machine Learning, before trying to just solve the problem with an algorithm, is silly. The only time you should start with ML, is when it's clear that your problem can't be cleanly mapped into the domain of human procedural thinking.
The AI programmers of the 1960s were not wrong to start with Expert Systems (i.e. attempting to write general algorithms) for deduction, and only begrudgingly turn to fuzzy logic later on. Many deduction tasks are algorithmic. If you don't require the context of "common sense", but only need operate on data types you understand, you can get very far indeed with purely-algorithmic deduction, as e.g. modern RDBMS query planners do. There would be no gain from using ML in RDBMS query planning. It's regular data; the AI's trained model would just be a recapitulation of the query-planning algorithm we already have.
> Or, to put that another way: if you knew what algorithm the AI would be using to discriminate the signal from the noise in your data, why would you need the AI? Just write that algorithm.
My point is that this isn't the same thing at all.
Say your problem is plant detection from mobile phone photos. I can understand everything about plants, and I can manually build a highly optimised data processing pipeline.
But I can't build a feature extractor that outperforms ResNet50. That's the key algorithm.
> If the data is regular—that is, if a model of its structure can be held fully in a human brain—then jumping immediately to Machine Learning, before trying to just solve the problem with an algorithm, is silly.
True, but no one has made that argument. This is specifically about using hyperparameter optimisation vs improving your data pipeline.
Er, yes, I did, in my original post. The form you quoted was me attempting to be more precise in rephrasing it.
My point—my original point, this whole time—was that applying an advanced “feature extraction” algorithm to a data source whose features are explicitly encoded in a lossless, linearly-recoverable way in the data—what we usually call structured data—is silly.
For example, there’s no point in using ResNet50 to extract the “features” of a formal grammar, like JSON. It’d just be badly simulating a JSON parser.
In fact, there’s pretty much no data structure software engineers use, where ResNet50 would give you more information out than you’d get from just using the ADT interface of the data structure. What features are in a queue? Items and an ordering. What’s in a tree? Items, an ordering, and parent-child relationships. Etc.
The only place where it might make sense to use ML when dealing with structured data is with statistical data structures like Bloom filters. ResNet50 might be able to recover some of the original data out of a Bloom filter, in essence using it as a compressed-sensing tool, or (in the algorithmic CS domain) as a decompressor for a lossy, underdetermined compression codec.
My second point was that, often, it turns out that your data is structured data, even when you didn’t ask for structured data.
Some natural-world datasets are structured!
Example: the standard model of quantum chromodynamics describes a clean digraph of possible spin configurations. You don’t need feature detection when looking at LHC data. The dataset is pre-bucketed, the items pre-tagged, by nature itself.
But more often, what happens is that your data turns out to not be “raw” / primary-source data, but rather a secondary source that was already structured, enriched, and feature-extracted by someone else before you got there.
Scraping social network data? It’s already a graph, and it often already has annotation fields in the JSON graph endpoints describing the relationships between the members. If you don’t just stop and look at the dataset, you might think your feature-extractor is doing something very clever, when actually it’s just finding the explicit pre-chewed “relationship” field and spitting it back out at you.
You might not see the relation to a kD-sample-matrix feature-extractor like ResNet50, so here’s some more tightly-analogous examples:
• What if the images in your training dataset turn out to be in Fireworks PNG format, where the raster data contains an embedding of the original vector image it was rendered from? Specializing your feature-extractor to this data is just going to make it learn to find those vectors (and extract features from those), rather than depending on the features in the raster data; and then it’ll fail on images without embedded vector descriptions. And if that’s all you want, why not just use a PNG parser to pull out the vectors?
• What if your audio files turn out to all have been MIDI files rendered out from a certain synthesizer using its default set of instrument patches? Will feature-extraction on this rendered data beat just writing a program to exact-match and decode the instruments back to a MIDI description? Certainly there might be MIDI-level features you want to extract, but will ResNet50 be better at extracting those MIDI-level features for having seen the rendering, as opposed to having been fed the decoded MIDI-level data directly?
Ok. No one other than yourself has made that argument.
> Scraping social network data? It’s already a graph, and it often already has annotation fields in the JSON graph endpoints describing the relationships between the members. If you don’t just stop and look at the dataset, you might think your feature-extractor is doing something very clever, when actually it’s just finding the explicit pre-chewed “relationship” field and spitting it back out at you.
I've worked in this exact field, and I've never ever heard of anyone who doesn't do this. I guess someone might.. so if your point is that there are dumb people around then ok.
But that isn't what a feature extractor does! In the graph context a simple feature extractor is something hand coded like the degree of a node, or a more complex learned one is something that maybe does embedding.
If your point is that people should understand their data, then yes that is in data science 101 for good reasons.
Most of what you wrote seems fine, until I got to this. A query optimizer seems like something that tends to be very opaque, very complex, and in my experience blows up without a good explanation frequently in typical situations. It's also based on a lack of complete data about the problem domain, to the point where an optimal algorithm seems hopeless. I'm not saying an AI approach automatically can be better, but at least you (I) can envision it being better, perhaps less brittle and taking into account dimensions and possibilities a human doesn't. And the non-AI solutions aren't trustworthy in terms of bounded quality so you have not got a lot to lose.
I can imagine cases (distributed databases, different speed storage) where it would make sense to test the queries and learn which optimisations make sense. It'd be self tuning and able to adapt to changing hardware.
I spent way too many years writing ad hoc Oracle SQL that had to complete within a few hours and trying to guess if the optimizer had decided to finish in 15 minutes or a week.
And I would read Tom Kyte where he says obviously your database is set up wrong if the optimizer isn't working for you. And how you should do everything in one big beautiful query that uses all the latest features of Oracle.
In most cases, unsupervised learning is nothing more than having the AI try to approximate the solution of your highly non-linear loss function. So if there's any way of solving that loss function directly, it will perform like a well-trained AI.
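A toy illustration of that claim with least squares, where the loss does have a direct solution: gradient descent ("training") and the closed-form answer land in the same place. The data here is synthetic and noiseless, purely for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
true_w = np.array([2.0, -1.0, 0.5])
y = X @ true_w  # noiseless synthetic targets

# "Training": plain gradient descent on the mean squared loss.
w = np.zeros(3)
for _ in range(2000):
    grad = 2 * X.T @ (X @ w - y) / len(y)
    w -= 0.05 * grad

# Direct solution of the same loss (least squares via lstsq).
w_direct = np.linalg.lstsq(X, y, rcond=None)[0]

print(np.allclose(w, w_direct, atol=1e-4))  # True
```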
In what cases is this not true?
Context please :) ?
No Datacenter Deployment. The SOFTWARE is not licensed for datacenter deployment, except that blockchain processing in a datacenter is permitted.
Well, Nvidia, y'see, my new blockchain does AI training as its Proof-of-Work step...
The consumer cards don't use ECC and memory errors are a common issue (GDDR6 running at the edge of its capabilities). In a gaming situation that means a polygon might be wrong, a visual glitch occurs, a texture isn't rendered right -- things that just don't matter. For scientific purposes that same glitch could be catastrophic.
The "datacenter" cards offer significantly higher performance for some case (tensor cores, double precision), are designed for data center use, are much more scalable, etc. They also come with over double the memory (which is one of the primary limitations forcing scale outs).
Going with the consumer cards is one of those things that might be Pyrrhic. If funds are super low and you want to just get started, sure, but any implication that the only difference is a license is simply incorrect.
But for machine learning, some say that stochasticity improves convergence times!
The thermal design of the "datacenter" cards can be better, for sure. And the on-board memory size and design. That's about it. And for how many times the GeForce price tag is that?
Virtually every server in data centers runs on ECC: the notion of not using it is simply absurd. And given that the Tesla V100 gets 900GB/s of memory bandwidth with ECC, versus 616GB/s of memory bandwidth on the 2080Ti without ECC, it's a strawman to begin with.
NVIDIA further states that there is zero performance penalty for ECC.
As to whether the requirement is "real", Google did an analysis where they found their ECC memory corrected a bit error every 14 to 40 hours per gigabit.
"That's about it."
Also ECC memory. Also dramatically higher double precision performance. Dramatically higher tensor performance. Aside from all of that...that's it.
the chip might be the same, but the rest of it isn't
Granted, it's not worth the $3k price bump, but that's a different issue.
eg from 2011 6400 Hadoop nodes like http://bradhedlund.com/2011/11/05/hadoop-network-design-chal...
God only knows what fun you could get up to with modern tech - I miss bleeding edge rnd
AFAIK that is limited to <$20k and it expires.
Strong agreement from me: I've never worked on deploying ML models, but have worked on deploying operations-research type automated decision systems that have somewhat similar data requirements. Most of the work is client org specific in terms of setting up the human & machine processes to define a data pipeline to provide input and consume output of the clever little black box. A lot of this is super idiosyncratic & non repeatable between different client deployments.
And the input matters, a lot. So the differentiating factor isn't the models, it's the data and companies like Google figured it out a long time ago.
In short, find interesting problems, then the solutions -- not the other way around.
ML is a mining problem. Digitizers are the miners. Annotators are the refiners.
The big question here is, what happens when the world changes next year? You rebuild the application. I know there are companies that advertise doing continuous updating of deep learning models but it seems like calculating total costs and total benefits is going to be hard here.
People and organizations are chasing what they believe, or are told to believe, is pay dirt.
Many unfamiliar investors have rushed in, possibly fearing missing out, and fund many of the prospectors, yet many of the prospectors and investors aren't really aware of the costs of running a mine, nor the practices required to run them efficiently.
It turns out that there's more aspects to the value creation process than dig/refine/polish (data/train/predict), especially when usefulness in application matters and there are finite resources available for digging.
Companies selling shovels (i.e. renting compute) are some of the primary beneficiaries of this, funded by the malinvestment.
Additional beneficiaries are the refiners (training experts) that are able to charge steep labor premiums, however organizations are starting to figure out that their refiners are expensive to keep idle and often operate the mines poorly in terms of throughput/cost-effectiveness/repeatability/application (see the various threads on "Data Engineers")
I.e. a modern ML system should just plug into the business process from day 0, where the ML task should be performed by human and recorded by the machine.
After a while, the machine would train on this recorded data, and start replacing the humans.
Rinse and repeat.
Ah, this is a typical thing I hear people in the Valley say: just push it all ... somewhere. No.
If we digitized all microscopy slides, it would require YouTube-scale storage several times over. People think genomics is big. People think reconnaissance imaging is big. They're big, but there's only so much of them.
IF it were digitized, there would be far more pathology whole slide imaging being generated every day. I did some estimates at one point and had to throw a couple orders of magnitude into the genomics data to even make it competitive at enterprise scale.
And keep in mind, we're talking clinical medicine. We want the data now. We're looking at the slides while the glue is still wet. You don't have the bandwidth, no one has the bandwidth, to do some of this stuff the way you propose and maintain the current "business process" of clinical medicine.
Building models and iterating, the old fashioned way, is the only way it makes sense.
The models are likely also a differentiating factor, in the sense that there are models that perform much better than others, to the point of enabling completely new functionality. But all of these models are basically open source currently... So they can't by definition be differentiating between different companies, because all of the companies generally have access to all of the algorithms. At least to all of the types of algorithms.
But the real kicker is that I get 5x the cores, 20x the RAM, 10x the storage, and a couple of GPUs. I'm running last-generation Infiniband (56Gb/sec) and modern U.2 SSDs (say 500MB/sec per device).
I figure it is going to take me about $10K in labor to move and then $1K/mo to maintain and pay for services that are bundled in the cloud. And because I have all this dedicated hardware, I don't have to mess around with docker/k8s/etc.
It's not really a big data problem but it shows the ROI on owning your own hardware. If you need 100 servers for one day per month, the cloud is amazing. But I do a bunch of resampling, simple models, and interactive BI type stuff, so co-loc wins easily.
Recent projects have been on AWS. For a project that is roughly on the scale of our colo in terms of instances, though with aggregate lower performance, we are buying one of our colos every year. It’s insane. Network costs are particularly egregious in AWS.
But there is absolutely no way we'd be permitted to build colo facilities, for many reasons, and even if we could get permission to do so, we would choose not to due to the resulting death by a thousand cuts orchestrated by the team that happens to have inserted themselves as the owner of DC/colo-like things.
I used to run on-prem back in the 2000's, and we were constantly dealing with demand fluctuation crises. Spinning up new physical servers to deal with new demand, or being massively over-specced when demand dropped, was a real pain.
I'm starting a new thing this week, and using the Cloud for it because I have no idea what our demand will be. I can start small, scale up with our customer growth, and never have to worry about ordering new servers a month in advance so I have enough capacity when (or if) I need it.
At some point in the future, when our needs are clear and relatively stable, it might make sense to migrate to on-prem and save those costs.
If your peak demand is 100x your baseline and only happens for ~1h each day, cloud is almost certainly a good choice. If it happens for ~12h a day or it's only 5x your baseline, the cost of the cloud is such that you're likely to save with dedicated hardware, even though much of your hardware sits around doing nothing part of the time.
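A back-of-the-envelope sketch of that break-even logic; all prices here are invented round numbers for illustration, not real cloud or hardware quotes:

```python
# Cloud vs. dedicated break-even sketch. All figures are illustrative
# assumptions: $0.30/unit-hour cloud, $2K/unit hardware amortized over
# 36 months, $15/unit/month for power/colo/ops.

def monthly_cloud_cost(baseline_units, peak_units, peak_hours_per_day,
                       price_per_unit_hour=0.30):
    """Pay-as-you-go: the baseline runs 24/7, the burst only during peak hours."""
    base = baseline_units * 24 * 30 * price_per_unit_hour
    burst = (peak_units - baseline_units) * peak_hours_per_day * 30 * price_per_unit_hour
    return base + burst

def monthly_dedicated_cost(peak_units, unit_capex=2000.0, amortize_months=36,
                           opex_per_unit=15.0):
    """Own enough hardware to cover the peak; amortize the purchase price."""
    return peak_units * (unit_capex / amortize_months + opex_per_unit)

# The two scenarios from the comment: 100x peak for 1h/day vs 5x for 12h/day.
for peak, hours in [(100, 1), (5, 12)]:
    cloud = monthly_cloud_cost(baseline_units=1, peak_units=peak,
                               peak_hours_per_day=hours)
    dedicated = monthly_dedicated_cost(peak_units=peak)
    winner = "cloud" if cloud < dedicated else "dedicated"
    print(f"{peak}x peak for {hours}h/day: cloud ${cloud:.0f}/mo, "
          f"dedicated ${dedicated:.0f}/mo -> {winner}")
```

With these made-up numbers the short tall spike favors cloud and the long shallow one favors dedicated, which is the shape of the argument; plug in your own prices.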
> never have to worry about ordering new servers a month in advance so I have enough capacity when (or if) I need it.
There is a middle-ground that's very much worth considering: renting dedicated servers. It's not quite as cost-effective as colocation and owning your hardware when you have at least a cabinet worth of stuff but it does offload the management of the hardware and provisioning to somebody else. They can also usually be provisioned in a matter of minutes.
In some cases (e.g. Packet.net) these machines can even be treated essentially like cloud instances, with hourly pricing.
There's also yet another middle ground: using dedicated to handle the known and predictable baseline traffic and using the cloud to handle the unexpected bursts.
It just doesn’t make financial sense to use the big cloud service providers for those with consistent workloads. I always hear stories where folks have saved hundreds of thousands in infrastructure costs with owning + co-lo.
As an aside, thank you for your one-line installer script for tf/keras. Earlier, my team used to spend days figuring out the CUDA/tf/keras/CUDNN etc dependency charts, and you've brought that down to ~0.
Some of the same caveats apply with respect to software updates, configuration control, security, availability, business continuity, disaster recovery, and what happens if the local admin is hit by a bus.
But spare capacity is a good idea, especially if you have real-time traffic.
That’s how the DC I used to work in operated.
Now do the calculation for ongoing operations for 5 years, taking into consideration normal hardware failure and maintenance cost. You need to swap out old hardware to get a new CPU, etc. I have tried to use co-loc vs cloud for ~100 nodes and cloud won, by 30%.
I wound up at a facility run by a fiber vendor because they'd sell me a fixed 250Mbps pipe for the same price that a data center would sell me a 20Mbps pipe that bursts to 1Gbps. It only works for me because of the nature of my business -- most people would be better off somewhere else.
Choosing a co-loc facility is complicated. My recommendation is to tour and get quotes from 3-5 vendors in your area before choosing anyone. Ideally, take someone who has done it before.
My plan is to temporarily shift to dedicated hardware through a service like Hetzner to evaluate what kind of hardware I need. I can simply redirect a fraction of the traffic and extrapolate. Since this is elastic there will be no upfront costs, but I can play around with different sizes. Once I'm happy with my estimate, buy real hardware and move the rest over.
At least that's the plan. I don't think you can do much more than an educated guess and I think this will be as close as I can get.
Not AI related btw.
I kinda worked backwards from the cost. I ran the business for a year on Azure, but each 'sample' of the resample took about 2 mins, which precluded any near real-time analysis. I ported the kernel to a GPU locally using python/numba and it ran in about 10 seconds, and that was enough to seal the deal.
From there, I spec-ed out a GPU server and then machines that matched each role in my environment. I decided I was willing to spend $50K and just started loading up the machines.
Certainly something like autonomous driving needs machine learning to function, but again, these are going to be owned by large corporations, and even when a startup is successful, it's really about the layered technology on-top of machine learning that makes it interesting.
It's kind of like what Kelsey Hightower said about Kubernetes. It's interesting and great, but what will really matter is what service you put on top of it, so much so that whether you use Kubernetes becomes irrelevant.
So I think companies that are focusing on a specific problem, providing that value added service, building it through machine learning, can be successful. While just broadly deploying machine learning as a platform in and of itself can be very challenging.
And I think the autonomous driving space is a great example of that. They are building a value added service in a particular vertical, with tremendous investment, progress, and potentially life changing tech down the road. But as a consumer it's really the autonomous driving that is interesting, not whether they are using AI/machine learning to get there.
Thankfully transfer learning and super convergence invalidates this claim.
Using pre-trained models + specific training techniques significantly reduces the amount of data you need, your training time and the cost to create near state of the art models.
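As a toy illustration of the idea, assuming scikit-learn is available (real DL transfer learning would swap in a pretrained network such as an ImageNet model, but the economics are the same: learn a representation on plentiful data, fit only a small head on your scarce labels):

```python
# Toy "transfer learning" economics with scikit-learn: a feature extractor
# fit on a large pool, then a small classifier head trained on ~90 labels.
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = load_digits(return_X_y=True)
# Pretend only 10% of the data has labels for our "target task".
X_pool, X_task, _, y_task = train_test_split(X, y, test_size=0.1, random_state=0)

# "Pretrain" a feature extractor on the big pool (labels unused here).
extractor = PCA(n_components=32, random_state=0).fit(X_pool)

# Train only a small head on the tiny labeled set.
X_tr, X_te, y_tr, y_te = train_test_split(
    extractor.transform(X_task), y_task, test_size=0.5, random_state=0)
head = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
acc = head.score(X_te, y_te)
print(f"accuracy with ~90 labeled examples: {acc:.2f}")
```

Training only the head is cheap enough to run on a free Colab/Kaggle GPU (or a laptop CPU here), which is the point being made about cost.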
Both Kaggle and Google Colab offer free GPUs.
IME it is nowhere near as universally successful as this suggests.
I think this sentence invalidates your argument against:
“The number of places where machine learning can be used effectively from both a cost perspective and a return perspective are small.”
In a hobbyist world, free GPU time is an amazing thing, and you can do a lot of fun and rewarding projects using transfer learning and other techniques that avoid heavy engineering and data processing. In a business world, where your product must consistently and accurately perform well, problems that may be solved by ML need to be heavily scrutinized and researched, because for most problems there are cheaper, faster, more robust solutions. Free GPU time doesn't weigh in at this scale.
If ML/DL is an add-on to help augment your business (separate from the core value) then yes transfer learning and free GPU's will get you good returns.
I've seen a lot of cross pollination of ML and AI techniques into various disciplines. A large percentage just didn't work at all, most of the rest were more "kind of interesting, but". Nothing earthshaking happened although pop sci press likes to talk about it a lot.
If you have more digital data than you used to, using modern free frameworks and toolkits to do basic (i.e. older, boring, but understood) ML stuff to understand it seems to have a reasonable return. Mostly I think this is because it becomes accessible to someone without much background in the area, and you can do reasonable things without having to put 6 months of reading and implementing together before starting.
Also it's doubtful to even categorize machine-learning as science. The goal of science is to generate insight and knowledge, ML solves particular engineering problems or searches problem spaces, it doesn't build fundamental scientific models.
(The dis-economy of scale hurts less if you're already starting from a point with the manual labor.)
I briefly looked at using neural nets to analyse data from an experiment - analysing the efficacy of toilet bowl designs.
The entry-level hardware was £250k in 1981 - it was much cheaper to take photos and have a research assistant count squares.
Now you could use fairly cheap commodity hardware to do it.
It would have been an amazing cutting-edge project if we could have got some government funding; we did have an in-house knowledge engineer.
If you need to solve gnarly industrial-scale mixed-integer combinatorial optimisation problems in the guts of your ML / optimisation engine, the commercial MIP solvers (Gurobi, CPLEX) or non-MIP-based combinatorial optimisation systems (LocalSolver) can often give better solutions in exponentially less running time than free open source alternatives.
1% more optimal solutions might translate into 1% more net profit for the entire org if you've gone whole hog and are trying to systematically profit optimise the entire business, so depending on the scale of the org it might be an easy business case to invest a few million dollars to set this system in place.
Annual server licenses for this commercial MIP solver software were O($100K)/yr per server, and the companies that build these products bake a lot of clever tricks from academia into them that you can exploit by paying the license fee. (My knowledge of pricing is out of date by about 7 years.)
They deliver now often with backend cloud storage, update near continuously, integrate frequently with outside services, sometimes open source major components iteratively, typically have an evolving API and developer ecosystem to educate, and are sold as subscriptions. It’s not as “human in the loop” as some of the AI described in this article but it’s clearly moving toward services in terms of margins.
Nothing is like the old shrink wrapped software business, basically.
To me, the services you describe are software-as-a-service - they scale well without adding more humans to the mix. Services businesses, in contrast, generally need more humans to do more work.
I do think you are right that we are entering an age where the margin pressures will continue to increase. As the Amazon quote goes "your margin is my opportunity." In that world, strength accrues to the largest players - which is why AWS is so strong.
I like to joke that AWS should refund money to the startups that buy booths at re:Invent only to find out AWS is rolling out a competing service (with the acknowledgement that AWS entering a space doesn't necessarily mean the end of the competing company).
So, it's not like I dislike ML. But, saying an investment is an "AI" startup, ought to be like saying it's a python startup, or saying it's a postgres startup. That ought not to be something you tell people as a defining characteristic of what you do, not because it's a secret but rather because it's not that important to your odds of success. If you used a different language and database, you would probably have about the same odds of success, because it depends more on how well you understand the problem space, and how well you architect your software.
Linear models or other more traditional statistical models can often perform just as well as DL or any other neural network, for the same reason that when you look at a kaggle leaderboard, the difference between the leaders is usually not that big after a while. The limiting factor is in the data, and how well you have transformed/categorized that data, and all the different methods of ML that get thrown at it all end up with similar looking levels of accuracy.
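A quick, hedged sanity check of that claim with scikit-learn: a linear model against boosted trees on a toy dataset, default hyperparameters (results will of course vary by problem):

```python
# Compare a linear baseline against a fancier model via 5-fold cross-validation.
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)
models = {
    "linear": make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
    "boosted trees": GradientBoostingClassifier(random_state=0),
}
scores = {name: cross_val_score(m, X, y, cv=5).mean() for name, m in models.items()}
for name, score in scores.items():
    print(f"{name}: {score:.3f}")
```

On this dataset the two land within a couple of points of each other, which is the leaderboard effect described above: past a certain point, the data matters more than the method.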
There used to be a saying: "If you don't know how to do it, you don't know how to do it with a computer." AI boosters sometimes sound as if they are suggesting that this is no longer true. They're incorrect. ML is, absolutely, a technique that a good programmer should know about, and may sometimes wish to use, kind of like knowing how a state machine works. It makes no great deal of difference to how likely a business is to succeed.
Back then a lot of startups didn't have websites, because they were making other products (hardware, boxed software, etc). If they had a website it was just a marketing page.
So saying that you were going to make a "web application" did in fact differentiate you, in that it showed your approach was very different from the boxed software folks, but it didn't tell you much beyond that.
It's fantastic to think that we didn't see 1GHz CPUs and 1Mb cable/DSL internet until 2000.
The resourcefulness from that pre-2k era was amazing! It was leaps and bounds!
This is so true. We spent decades educating non-technical people that understanding a problem well is a prerequisite to programming it. Take something easy to understand, like driving a car: doing it in a computer is much harder.
AI is undoing all that. People reach a vague problem they can't describe and assume computers will magically fix it.
L. Pachter and B. Sturmfels. Algebraic Statistics for Computational Biology. Cambridge University Press 2005.
G. Pistone, E. Riccomagno, H. P. Wynn. Algebraic Statistics. CRC Press, 2001.
M. Drton, B. Sturmfels, S. Sullivant. Lectures on Algebraic Statistics. Springer, 2009.
Or more like:
Watanabe, Sumio. Algebraic Geometry and Statistical Learning Theory, Cambridge University Press 2009.
My understanding (I do not do AI or machine learning) is that AI is distinct from these more mathematical, analytic perspectives.
Finally, might we argue that AI/ML is generally better suited to data that's already high quality, e.g. CERN data, trade data, drug-trial data, as opposed to unconstrained data, e.g. find the buses in these 1MM JPEGs?
For structured data like tables, time series, etc., the techniques are still from statistics. Regression, for example, is the workhorse for numerical prediction problems.
I think a lot of people are missing the point about leaps AI has made because they aren't aware of NLP or CV or reinforcement learning.
So the "AI" mentioned above is stunningly good for buses in 1MM images, and reasonably good for drug-trial and CERN data.
The business models required for making AI business successful haven't been invented yet.
A good AI model will be a deep stack: an example would be something like precision agriculture, where you'd use AI for designing rice, then use IoT and earth observation to locate the right acreages, monitor growth, and adjust nutrients at the crop level, and get dramatically greater output with the least wastage and highest nutritional content.
Most AI companies are still started by ex-CS folks who in general aren't aware of deep technical opportunities in other disciplines.
I think this will change soon very fast due to ubiquity of deep learning training material, libraries and research papers.
This is a tautology in the narrow sense, but in the broader sense I think there surely exist things that humans don't "know" how to do without a computer, but know how to do with a computer. And the space of solveable problems is expanding, though AI is only a narrow slice of that.
> I’ll go out on a limb and assert that most of the up front data pipelining and organizational changes which allow for it are probably more valuable than the actual machine learning piece.
Especially at non-tech companies with outdated internal technology. I've consulted at one of these and the biggest wins from the project (I left before the whole thing finished unfortunately) were overall improvements to the internal data pipeline, such as standardization and consolidation of similar or identical data from different business units.
If you're solving a real problem and use ML in service of solving that problem, then you've got a great moat....happy trusting customers.
It's not complicated
My way of saying, you're very, very right.
I wrote it to be tongue-in-cheek in a ranting style, but essentially "AI" businesses and the technology underpinning them are not the silver bullet the media and marketing hype has made them out to be. The linked article about a16z shows how AI is the same story everywhere - enormous capital to get the data and engineers to automate, but even the "good" AI still gets it wrong much of the time, necessitating endless edge-cases, human intervention, and eventually it's a giant ball of poorly-understood and impossible-to-maintain pipelines that don't even provide a better result than a few humans with a spreadsheet.
There was this meme in the 70s about "self driving cars" following magnetic strips in the road in restricted highways. I remember at the time, being, like 8 and thinking "sure seems like an overly complicated train."
Your post was much better than mine, but I appreciate the comment.
goes on to say
>I agree, but the hockey stick required for VC backing, and the army of Ph.D.s required to make it work doesn’t really mix well with those limited domains, which have a limited market.
It also assumes running your own data center is easy. Some people don't want to be up 24x7 monitoring their data center, or to buy hardware to accommodate the rare 10-minute peaks in usage.
But is that really the use case here? I haven't worked in ML. But I'm not seeing where you are going to need to handle a 10 minute spike that requires a whole datacenter.
A month's worth of a quad gpu instance on AWS could pay for a server with similar capacity in a few months of usage.
And hardware is pretty resilient these days. Especially if you co-locate it in a datacenter that handles all the internet and power up time for you. And when something does go wrong, they offer "magic hands" service to go swap out hardware for you. Colocation is surprisingly cheap. As is leasing 'managed' equipment.
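The payback claim above is easy to sanity-check with rough arithmetic; the figures below are illustrative assumptions, not actual AWS or vendor prices:

```python
# Rough payback arithmetic for cloud GPU vs. owned server. All numbers are
# illustrative assumptions: ~$12/hr for a quad-GPU on-demand instance,
# ~$25K for a comparable server, ~$400/mo for colocation + remote hands.
cloud_hourly = 12.0
hours_per_month = 24 * 30
cloud_monthly = cloud_hourly * hours_per_month   # cost if run 24/7
server_cost = 25_000.0
colo_monthly = 400.0

months_to_payback = server_cost / (cloud_monthly - colo_monthly)
print(f"payback in ~{months_to_payback:.1f} months")
```

Under these assumptions the server pays for itself in about three months of 24/7 use, consistent with the "few months" claim; at lower utilization the payback stretches accordingly.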
That’s why the author found it glaringly obvious that it should be brought in-house. It’s often both the most costly and most “in-housable” compute work involved in these companies.
Do you need that for training workloads, and what percentage of a startups workload is training?
AI is like the new gold rush. And just like back then, it's not the gold diggers that will get rich.
"Most people in AI forget that the hardest part of building a new AI solution or product is not the AI or algorithms — it’s the data collection and labeling."
It hasn’t been for a lack of trying. We’ve had everyone from IBM and Microsoft to small local AI startup try to sell us their magic, but no one has come up with anything meaningful to do with our data that our analysis department isn’t already doing without ML/AI. I guess we could replace some of our analysis department with ML/AI, but working with data is only part of what they do, explaining the data and helping our leadership make sound decisions is their primary function, and it’s kind of hard for ML/AI to do that (trust me).
What we have learned though, is that even though we have a truck load of data, we can’t actually use it unless we have someone on deck who actually understands it. IBM had a run at it, and they couldn’t get their algorithms to understand anything, not even when we tried to help them. I mean, they did come up with some basic models that their machine spotted/learned by itself by trawling through our data, but nothing we didn’t already have. Because even though we have a lot of data, the quality of it is absolute shite. Which is anecdotal, but it’s terrible because it was generated by thousands of human employees over 40 years, and though I’m guessing, I doubt we’re unique in that respect.
We’ll continue to do various proof of concepts and listen to what suppliers have to say, but I fully expect most of it to go the way Blockchain did which is where we never actually find a use for it.
With a gold rush, you kind of need the nuggets of gold to sell, and I’m just not seeing that with ML/AI. At least not yet.
Ultimately the value of selling tools is dependent on the riches being mined actually existing. The value of AI/big data to the average business has yet to be determined
A lot of those companies are styled as "AI" companies themselves, aiming to automate the process of labeling.
The main winner here really is Amazon. They get a chunk by serving up infrastructure and in labeling through mechanical turk.
AI is so shiny that it makes people want to jump as fast as they can onto that boat, but a reasonable, objective analysis shows that a huge number of software problems can still be solved without relying on the "AI black box".
Why don't they buy their own hardware for this part? The training process doesn't need to be auto-scalable or failure-resistant or distributed across the world. The value proposition of cloud hosting doesn't seem to make sense here. Surely at this price the answer isn't just "it's more convenient"?
Say you have $8M in funding, and you need to train a model to do x
You can either:
a) gain access to a system that scales on demand and allows instant, actionable results.
b) hire an infrastructure person, someone to write a K8s deployment system, another person to come in and throw that all away, another person to negotiate and buy the hardware, and another to install it.
Option b can be the cheapest in the long term, but it carries the most risk of failing before you've even trained a single model. It also costs time, and if speed to market is your thing, then you're shit out of luck.
We have become so DevOps and cloud dependent that everyone has forgotten how to just run big systems cheaply and efficiently.
It's definitely not something that you can launch manually - perhaps Kubernetes is not the best solution, but you definitely need some automation.
also, how else do you sensibly deploy and manage a multi-stage programme on >500 nodes?
I mean, we use AWS Batch, which is far superior for this sort of thing. Slurm might work for real steel, as would Tractor from Pixar.
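For the ~500-node case, a minimal sketch of what an AWS Batch array job can look like; the queue and job-definition names here are made-up placeholders, and `dry_run` just builds the request without calling AWS:

```python
# Hedged sketch: fan one stage out over ~500 nodes as an AWS Batch array job.
def submit_array_job(size=500, dry_run=True):
    """Build (and optionally submit) an AWS Batch array-job request."""
    request = dict(
        jobName="resample-stage-1",        # placeholder names throughout
        jobQueue="gpu-spot-queue",
        jobDefinition="resample-job:3",
        arrayProperties={"size": size},    # Batch launches one child per index
    )
    if dry_run:
        return request                     # inspect the request without AWS
    import boto3                           # requires boto3 + AWS credentials
    return boto3.client("batch").submit_job(**request)

req = submit_array_job()
print(req["arrayProperties"])
```

Each child job reads its shard index from the `AWS_BATCH_JOB_ARRAY_INDEX` environment variable and picks up its slice of the work, which replaces a lot of hand-rolled orchestration.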
ML distributed training is all about increasing training velocity and searching for good hyperparameters
This is why I’m much more excited by AR and VR than AI. Human brains are fucking amazing at certain kinds of data processing and inference and pretty mediocre at others. We should be focusing more on creating interfaces and data visualizations that unlock that superpower for wider applications.
> Machine learning will be most productive inside large organizations that have data and process inefficiencies.
I strongly believe ML is at worst dangerous and at best pointless here. Data and process inefficiencies => garbage in, garbage out. ML is NOT a silver bullet in large organisations that have these issues. I've seen managers try to adopt ML to solve issues, but the results are almost always suspect and/or marginally better than simple if-else rules, while requiring multiple people or teams to get all the data and models right.
Exactly wrong and contradicts most of the thesis of the article - that AI often fails to achieve acceptable models because of the individuality, finickiness, edge cases, and human involvement needed to process customer data sets.
The key to profitability is for AI to be a component in a proprietary software package, where the VENDOR studies, determines, and limits the data sets and PRESCRIBES this to the customer, choosing applications many customers agree upon. Edge cases and cat-guacamole situations are detected and ejected, and the AI forms a smaller, but critical efficiency enhancing component of a larger system.
Single-focus disruptors bad. Generic consultancy good - with ML secret sauce, possibly helped by hired specialist human insight.
Companies that can make this work will kill it. Companies that can't will be killed.
It's going to be IBM, Oracle, SAP, etc all over again. Within 10 years there will be a dominant monopolistic player in the ML space. It will be selling corporate ML-as-a-service, doing all of that hard data wrangling and model building etc and setting it up for clients as a packaged service using its own economies of scale and "top sales talent" (it says here).
That's where the big big big big money will be. Not in individual specialist "We ML'd your pizza order/pet food/music choices/bicycle route to work" startups.
Amazon, Google, MS, and maybe the twitching remnants of IBM will be fighting it out in this space. But it's possible they'll get their lunch money stolen by a hungry startup, perhaps in collaboration with someone like McKinsey, or an investment bank, or a quant house with ambitions.
5-10 years after that customisable industrial-grade ML will start trickling down to the personal level. But it will probably have been superseded by primitive AGI by then, which makes prediction difficult - especially about that future.
We're also a long way off from AGI. Nobody really even has a roadmap to what an AGI would look like. Heck, DNN/ML techniques have been widely-known since the early 90s; they just became practical with access to cloud-scale hardware, so the current situation has been 25+ years in the making.
This irrational sheep mentality amuses me. Yes, there are some very specific cases where AWS & co. are clearly a better choice, but in most cases I saw, the TCO of hosting on premises or renting servers is much lower, sometimes by an order of magnitude (in some cases even more). But people insist on doing it because others do it. We'll soon have an entire generation of engineers completely hooked on AWS & co., not even realizing other solutions are possible, not to mention their lower TCO.
I see a lot of 'bolt-on' tech emerging -- it looks mostly snake oil -- there is no obvious way to be competitive against teams that baked it in to the bare metal design
Also most commercial use-cases I've seen need effective ML more than anything else
Being able to sift/classify/analyse data with ML really can be a 'moat', an extreme competitive advantage. But using "AI" doesn't automatically get you there.
Separately, AWS is an expensive luxury, which is worth it if for some reason you can't manage your own computers.
It really annoys me when analysts like this guy mangle together things which are obvious and then come up with an unsupported conclusion, like "second AI winter is coming, man".
Cost-wise, though, he's clearly not knowledgeable about how it works, or at least thinks all AI startups have huge training sets. For many companies, owning your hardware for training is a very easy way to rationalise cost.
It feels like an article written about all AI companies but actually (very) true only for some AI companies.
If we focused on writing more efficient software instead of demanding bigger and faster machines with more and more GPUs, would the cost of ML become more practical? More importantly, as the author pointed out, would smaller companies have a better chance at making advancements in the field?
This is so right. Using a term "artificial intelligence" for machine learning is like using "artificial horses" to describe cars. It is even worse, since we cannot even define what "natural intelligence" actually is. Stop talking about "artificial intelligence".
>The bodywork represents a swan gliding through water. The rear is decorated with a lotus flower design finished in gold leaf, an ancient symbol for divine wisdom. Apart from the normal lights, there are electric bulbs in the swan’s eyes that glow eerily in the dark. The car has an exhaust-driven, eight-tone Gabriel horn that can be operated by means of a keyboard at the back of the car. A ship’s telegraph was used to issue commands to the driver. Brushes were fitted to sweep off the elephant dung collected by the tyres. The swan’s beak is linked to the engine’s cooling system and opens wide to allow the driver to spray steam to clear a passage in the streets. Whitewash could be dumped onto the road through a valve at the back of the car to make the swan appear even more lifelike.
>The car caused panic and chaos in the streets on its first outing and the police had to intervene.
Green AI (Roy Schwartz, Jesse Dodge, Noah A. Smith, Oren Etzioni - 2019)
Finally a correct use of "AI".
For a pure ML company to IPO they'd have to both solve intelligence and manufacture their own hardware. FOMO screwed a lot of investors who would've been better off buying Google stock.
I think they may just be crapping on them from a reasonable vantage point.
The next generation of Machine Learning is just emerging, and looks nothing like this. Funds are being raised, patents are being filed, and everything is in early stage development, so you probably haven't heard much yet - but these ML startups are going after real problems in industry: cross disciplinary applications leveraging the power of heuristic learning to make cross disciplinary designs and decisions currently still limited to the human domain.
I'm talking about the kind of heuristics which currently exist only as human intuition expressed most compactly as concept graphs and, especially, mathematical relationships - e.g. component design with stress and materials constraints, geologic model building, treatment recommendation from a corpus of patient data, etc. ML solutions for problems like these cannot be developed without an intimate understanding of the problem domain. This is a generalist's game. I predict that the most successful ML engineers of the next decade will be those with hard STEM backgrounds, MS and PhD level, who have transitioned to ML. [Un]Fortunately for us, the current buzzwordy types of ML services give the rest of us a bad name, but looking at these upcoming applications the answers to the article tl;dr look different:
>Deep learning costs a lot in compute, for marginal payoffs
The payoffs here are far greater. Designs are in the pipeline which augment industry roles - accelerate design by replacing finite methods with vastly quicker ML for unprecedented iteration. Produce meaningful suggestions during the development of 3D designs. Fetch related technical documents in real time by scanning the progressive design as the engineer works, parsing and probabilistically suggesting alternative paths to research progression. Think Bonzi Buddy on steroids...this is a place for recurring software licenses, not SaaS.
>Machine learning startups generally have no moat or meaningful special sauce
For solving specific, technical problems, neural network design requires a certain degree of intuition with respect to the flow of information through the network, which both optimizes and limits the kind of patterns that a given net can learn. Thus designing NNs for hard-industry applications is predicated upon an intimate understanding of domain knowledge, and these highly specialized neural nets become patentable secret sauces. That's half of the moat; the other half comes from competition for the software developers with first-hand experience in these fields, or a general enough math-heavy background to capture the relationships that are being distilled into nets.
>Machine learning startups are mostly services businesses, not software businesses
Again only true because most current applications are NLP adtechy bullshit. Imagine coding in an IDE powered by an AI (multiple interacting neural nets) which guides the structure of your code at a high level and flags bugs as you write. This, at a more practical level, is the type of software that will eventually change every technical discipline, and you can sell licenses!
>Machine learning will be most productive inside large organizations that have data and process inefficiencies
This next generation goes far past simply optimizing production lines or counting missed pennies or extracting a couple extra percent of value from analytics data. This style of applied ML operates at a deeper level of design which will change everything.
Citations needed. Large claims: presumably you can name one example of this, and hopefully it's not a company you work at.
I've seen projects on literally all the things you mention: materials science, medical stuff, geology/prospecting - none of them worked well enough to build a stand-alone business around them. I do know the oil companies are using DL ideas with some small successes, but this only makes sense for them, as they've been working on inverse problems for decades. None of them buy canned software/services: it's all done in house. Probably always will be, same as their other imaging efforts.
Unfortunately this is all emerging just now and yes, I do work at such a company, but I'm old enough to not be naively excited by some hot fad. There's something profound just starting to happen, but everyone is keeping the tech rather secret because it isn't developed/differentiated enough yet to keep a competitor from running off with an idea. Disclosure is probably 1-3 years out, by my estimate.
>I do know the oil companies are using DL...as their other imaging efforts.
You're correct, and I happen to have experience in this domain - except there are a handful of up-and-comers courting funds from global majors like Shell and BP, and seismic inversion is near the end of the list of novel applications. Petroleum is ground zero for a potential revolution right now, if we can come up with something before the U.S. administration clamps down on fossil fuels.
But we're talking complex algorithms which consist of multiple interacting neural networks. We are rapidly moving toward rudimentary reasoning systems which represent conceptual information encoded in vectors. I'm jaded enough that I wouldn't say we're developing AGI, but if the progressing ideas I'm familiar with and working on personally pan out, they will be massive baby steps towards something like AGI.
The space is evolving at least as rapidly as the academic side, which I think is an unprecedented pace of development for a novel field of study. I can't help but feel like these are the first steps towards some kind of singularity. There's no question that we are on to something civilization changing with neural networks, what remains to be seen is whether compute scaling will keep up with the needs of this next generation ML. Even if research stopped today, the modern ML zoo has exploded with architectures with fruitful applications across domains. The future is here!
I read all the NIPS stuff every other year; don't see anything game changing in there!
I don't believe anyone who says that we're getting close to AGI. There are interesting techniques that might get us closer, but they're so far behind in terms of compute power and scaling that it's hard to see how they could be practical in the near future. Anything that's based on back prop is not going to get there. The singularity is a loooong way away.
Love to hear your thoughts on our library