Are the New M1 Macbooks Any Good for Deep Learning? (betterdatascience.com)
168 points by syntaxing on Feb 15, 2021 | 150 comments



I think a better comparison would have been some of the smaller discrete-GPU-wielding laptops. You can get some of these for similar prices to the M1 Macs.

Also, for me, the answer is no: they're not any good. I use PyTorch. I use custom CUDA code. The reality is brutal but simple: if it doesn't run CUDA, no serious ML research will use it for training anything other than toy models.


> no serious ML research will use it for training anything else than toy models

They will use it to edit code, deploy and remotely run jobs. Serious deep learning on laptops is a non-starter, on any laptop, on account of heat dissipation.


And for that purpose it's nice, but no better than any other semi-modern laptop. Over quarantine I amused some coworkers by doing an entire day's work (not ML, but not stuff that's run locally) on an old VT220, with very little change in workflow.

Being able to do light stuff on one's own consumer hardware, without having to buy something new, is still incredibly helpful to students and other people trying to learn, as well as hobbyists, though.


I mean, it's 'better' in that it runs cooler and faster for longer than anything price-equivalent, but that advantage will eventually become the norm across the industry.


Depends what you mean by 'serious'.

Training Tesla's FSD neural net on a laptop? A student training some models for courses or self study?


Exactly! I would guess that maybe 80% of monthly active deep learning users would benefit from a small accelerator like that. In contrast, probably less than 1% of the deep learning FLOPs will benefit.


> A student training some models for courses or self study?

Serious schools have computing clusters. ML researchers might be interested to know which undergrad is actually training models complex enough to benefit from better hardware.


With WFH nowadays, I wouldn't even want a GPU workstation running at home, considering the noise and power consumption. And with the high costs of a capable DS team, running these workloads in the cloud doesn't make much of a difference either.


Sounds like you haven't seen the cloud costs for processing units handling these workloads, then... They're severely overpriced compared to normal CPU loads.


They even buried it in the conclusion, after a few thousand words & charts designed to give the opposite impression:

> these still aren’t machines made for deep learning. Don’t get me wrong, you can use the MBP for any basic deep learning tasks, but there are better machines in the same price range if you’ll do deep learning daily.


"buried it in the conclusion"

The article has 4 short sentences in the 'conclusion' section which can be found, as expected, at the end of the article.

it really isn't "buried"


But it is. A regular conclusion summarizes the article; it doesn't contradict it.


That's not buried. This is buried.

“But the plans were on display…”

“On display? I eventually had to go down to the cellar to find them.”

“That’s the display department.”

“With a flashlight.”

“Ah, well, the lights had probably gone.”

“So had the stairs.”

“But look, you found the notice, didn’t you?”

“Yes,” said Arthur, “yes I did. It was on display in the bottom of a locked filing cabinet stuck in a disused lavatory with a sign on the door saying ‘Beware of the Leopard.’”


> I think a better comparison would have been some of the smaller discrete GPU wielding laptop.

I agree, but from what I have seen they also perform poorly against Colab. A 1660 Ti 4 GB laptop GPU runs about as fast as the Colab CPU does. [1]

I have not been able to find how the RTX 2080 Super (Laptop) and upcoming RTX 3080 (Laptop) compare to Colab. The 3080 has over 3x as many cores as the 1660 Ti.

[1] https://towardsdatascience.com/google-colab-how-does-it-comp...


> A 1660 Ti 4 GB laptop GPU runs about as fast as the Colab CPU does

According to the chart on that page you linked to, the Lenovo Legion (with the 1660 Ti) is about 30% faster than the Colab GPU. The Lenovo 480s (CPU only, no GPU) is about the same speed as the Colab CPU. So if you can train your model on 4 GB of VRAM, a laptop with a discrete GPU (if you exclude the really basic business-laptop GPUs) may be useful.


Are the Quadro cards any good for it?


They are, other than being more expensive than the consumer cards. When doing things like RNNs, RAM becomes an issue regardless of how powerful the card is in other ways.


In this particular use case, I would heartily recommend the Zephyrus G14.

It can come with an RTX 2060 (Max-Q / low voltage), which supports CUDA, as well as a Ryzen 9 4900HS, which was the fastest mobile CPU until the 5000-series mobile Ryzens came out.

The 2060 + 4900HS configuration went on sale at Best Buy for $1199 multiple times late last year (just check Slickdeals).

I think it's an absolute steal of a machine for that price. Not to mention that it's beautifully built, is lightweight, has an amazing matte color-accurate screen, etc.

There’s also a fan sub for this particular laptop with over 13k subscribers (as of today) on Reddit: https://www.reddit.com/r/ZephyrusG14/


No doubt it's a good laptop. However, I would recommend anyone looking at laptops with machine learning in mind seriously consider a cheaper, less powerful laptop and do their training remotely, either on a workstation or in someone's cloud.


Or just build a cheap desktop with a GPU. An RTX 2060 + NVMe SSD + any CPU is pretty good for most ML, and will only cost $800 or so.


That is easy to say, but good luck finding an RTX 2060 at a reasonable price. I have been trying for a month and a half, and almost all graphics cards are out of stock.


If you are in the US, you can easily buy a prebuilt machine that comes with an RTX 2060 for $1100-1200. Check Dell/Lenovo or Costco. Buying the card independently is pretty much impossible/uneconomical. If you don't want the rest of it, simply list the remains of the system as a cheap PC with an integrated GPU.


I'd recommend against Dell. A better option would be to purchase a gaming rig and install a flavour of Linux on it. There's a video series from Linus Media Group that investigated different OEMs; Dell ended up overcharging/scamming them for services they didn't ask for and gave worse performance [1].

[1] https://youtu.be/Go5tLO6ipxw?t=73


But at that price now you're paying as much as the laptop and getting near-zero portability.


And getting significantly more performance because those components can actually be cooled properly.


I think if someone really needed the performance, they'd be using something much more powerful than the RTX 2060.

However, if your goal is just to have a machine on which you can actually test your CUDA code, then the Zephyrus G14 laptop is perfect. It's portable, has a nice screen, etc.


But you get bigger fans and heatsinks.


Correction: if it doesn't run PyTorch. No one really runs custom CUDA code anymore. If you do, you're an outlier.


I work in computer vision, and pretty much all recent SOTA models have had custom CUDA code. So I'd argue the opposite: in my field, lack of custom CUDA kernels will effectively exclude you from research.


Can you give an example?


https://github.com/princeton-vl/RAFT#optional-efficent-imple...

It's "optional" in the sense that things still calculate correctly on CPU without it, but at a 1000x performance penalty. Or you could skip it if you had 64GB of GPU RAM, which you cannot buy (yet).

So if you actually want to work with this on GPUs that are commercially available, you need it.


Ok, so in this example the CUDA code is only used when you're short on GPU memory. This means if Apple makes a chip with enough memory, CUDA compatibility won't be necessary, right?

Are there any examples where custom CUDA code implements some op that can't be written in PyTorch/TF/JAX/etc.? That would provide better support for your claim that the M1 needs to be able to run CUDA.



Here you go: https://github.com/yanx27/Pointnet_Pointnet2_pytorch - no need for any custom CUDA code.

Note the CUDA kernels in the original repo were added in August 2017. It might have been the case that they needed them at the time, but again, if you need to do something like that today, you're probably an outlier. Modern DL libraries have a pretty vast assortment of ops. There have been a few cases in the last couple of years when I thought I'd need to write a custom op in CUDA (e.g. np.unpackbits), but every time I found a way to implement it with native PyTorch ops that was fast enough for my purposes.
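To make that concrete, here is a rough sketch (mine, not the parent commenter's code) of how something like np.unpackbits can be written with stock PyTorch ops instead of a custom CUDA kernel:

    import torch

    def unpackbits(x: torch.Tensor) -> torch.Tensor:
        """Unpack a uint8 tensor of shape (..., n) into bits of shape (..., n * 8), MSB first."""
        masks = torch.tensor([128, 64, 32, 16, 8, 4, 2, 1], dtype=torch.uint8, device=x.device)
        bits = (x.unsqueeze(-1) & masks) > 0          # broadcast each byte against the 8 bit masks
        return bits.reshape(*x.shape[:-1], -1).to(torch.uint8)

    packed = torch.tensor([[0b10100001]], dtype=torch.uint8)
    print(unpackbits(packed))  # tensor([[1, 0, 1, 0, 0, 0, 0, 1]], dtype=torch.uint8)

Because it is only broadcasting plus a bitwise op, it runs on whatever device the tensor already lives on, which is the point.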

If you're doing DL/CV research, can you give an example from your own work where you really need to run custom CUDA code today?


I am a CUDA fan, but it all depends on how much adoption SYCL manages to actually achieve.


If you say so. I haven't used CUDA in over a year, other than deploying to production. TPUs are probably the future (or something like a TPU, whenever Microsoft launches theirs).


Is there a roadmap for CUDA support?

(Downvoted for asking an honest question. Ahhh, the things you see on Apple-related posts... Emotion-driven bunch.)


A better question would be: when will serious ML researchers stop using proprietary frameworks? The open source projects should work towards using something more open than CUDA, probably Vulkan. It's a real shame that Nvidia has a whole industry in their grip, and nobody seems to mind.


Why would serious ML researchers stop using proprietary frameworks? They're the best out there right now, and ahead of any other open source framework in pretty much every way.

Researchers want to get their work done. They don't want to fight against their tools.


They shouldn't; it was poor wording from me. What I meant was: when will there be non-proprietary frameworks available for researchers to use?

There's also the issue that by relying on proprietary frameworks that work now, you might be painting yourself into a corner if Nvidia changes something in the future and then you have to adapt to them because you have no choice.


When somebody pays somebody to write those frameworks.

When Nvidia sees a need, they can change CUDA overnight to address it, and they pay people to do that.

When you need to do the same in Vulkan, that's a multi-year process until your extension is "open". A researcher whose job is to get something done with ML has better things to do than go through that.


Lower reproducibility because the drivers are black boxes, and worse prices if we're all only buying Nvidia.


We do mind; the incentives in academia are just not set up for people to build a replacement. Google is working on their TPUs, which are even more closed. Facebook, MS, et al. do not seem to be working on porting popular ML frameworks to something else...


CUDA is basically equivalent to the instruction set, the C/C++ language, the APIs, and the standard library, all bundled together. In the CPU world, C/C++, the APIs, and the standard library are designed to be hardware-agnostic, the instruction set is separate and open, and x86 has two players. CUDA became this way because it was designed by the hardware company around its own business needs. We really should be thankful that C/C++, open compilers, APIs, and libraries existed in the CPU world back then, and that hardware vendors like Intel published their raw instruction sets and allowed any software to target them. I can't imagine a world where people use a programming language and libraries designed by a CPU company, and all that software only runs on that company's hardware.


Some things were better back in the day. E-mail could not be invented today; it would be a mix of incompatible mail systems like Facebook mail, Google mail, Apple mail, Slack mail, some distributed open source mail nobody used, etc.


Go read the SMTP and related RFCs and see if you still think that email back in the day wasn't already a mix of incompatible systems. The parts about address formats alone were quite an eye-opener for me.


Yes, I remember IBM had its own email system back in the early '80s when I worked there, and AT&T had its own in the later '80s. Even Prodigy/CompuServe/AOL had proprietary email systems (IIRC) with internet email bolted on.

And getting email to the internet at large was no mean feat: I remember having to do a slew of UUCP addressing to get an email from AT&T to the internet (something along the lines of "astevens@redhill3!ihnp4@mit.edu"). It was the wild west.


I applied for a sysadmin job at BBN in the late 90s, and X.400 knowledge was listed in the description. Thankfully that turned out to be a red herring.


Email wouldn't be invented by private institutions today because nobody in their right mind would willingly get rid of their moat... It'd take government regulation to get there, and you know how efficient that would be. Technical problems are easy in comparison.


Lots of people mind, but AMD screwed themselves by not jumping on deep learning years ago, and the major open source efforts are all corporate-sponsored and standardized around CUDA. If you think you can make something better, please do.


Yes, it is to a large degree AMD's fault, but NVIDIA also acted maliciously by neglecting OpenCL support, which meant that during the critical period 5-6 years ago, there was no realistic chance for the open alternative to succeed.

While developing a small competitor to TensorFlow back then (Leaf), we were one of the few frameworks that also tried to support OpenCL, but the additional dev work made it infeasible.


Easy: provide the graphical tooling, GPGPU debuggers, and mature polyglot support for GPGPU.


SYCL is a good framework that can translate to SPIR-V, OpenCL, CUDA, and so on.


SYCL looks nice in theory, but AMD seems determined to stay irrelevant in GPGPU. To use SYCL on AMD you use a backend that targets ROCm, which still doesn't work on Windows. So why would I use that instead of CUDA and just not care about AMD's GPUs? A portable framework loses a lot of its meaning when it's not portable, and now SYCL just looks like an unnecessary layer over CUDA and OpenMP.

GPGPU has been a thing for 20 years now, and I still can't easily write code that works on Nvidia and AMD and ship it to consumers on Windows. From what I've seen, OpenCL seems to be dying, AMD doesn't care about compute on Windows or on their Radeon cards, and CUDA continues to be the only real option year after year in this growing segment. Why would anyone buy anything other than Nvidia if they're using Photoshop, Blender, DaVinci Resolve, or other compute-heavy consumer software? Maybe it's unrealistic to hope that any library can fix this, and we should just rename GPGPU to "Nvidia compute" and be done with it.


I really don't get this. You can just use a SYCL backend that compiles to OpenCL or SPIR and target AMD GPUs. I did it yesterday. Use ComputeCpp and target AMD GPUs.

If you're using Blender it can absolutely make a ton of sense to use AMD hardware. For most of the time that Blender has supported GPGPU, AMD was the best choice, and I set up rendering servers with AMD hardware for that express purpose.

I feel like a big part of this attitude comes from not actually having tried it. Because SYCL works fine on AMD. In fact, you have more backend options for AMD than for Nvidia.


Which AMD cards though? It's one thing to be able to buy AMD hardware for your data center or workstations, but as far as I can see their latest consumer platform, RDNA2, doesn't support ROCm [0]. Radeon DNA doesn't support Radeon Open Compute. ROCm has never supported Windows, and it doesn't look like there are any future plans for it either. And when looking at which AMD cards support SPIR or SPIR-V I can't find any good list, but I do find issues where AMD removed support in the drivers and told people to use old drivers if they needed it [1]. Compare that to Nvidia, where you can use any GeForce card you can find, so if you have an Nvidia GPU you know CUDA will work.

If you control your own hardware and software stack, maybe an AMD CDNA card is fine, but if you want to ship software to end users it seems difficult to even know what will work. So you either use cross-platform code, with a worse experience on Nvidia and spotty support on AMD, or CUDA only, and accept that it's Nvidia-only but gives you a better experience.

I haven't done a lot of GPGPU programming, but I've tried to look at it from time to time, and I've been disheartened by it every time. Nvidia's handling of OpenCL, AMD's disregard for SPIR. This is what an AMD representative had to say in 2019 [2]:

"For intermediate language, we are currently focusing on direct-to-ISA compilation w/o an intervening IR - it's just LLVMIR to GCN ISA. [...] Future work could include SPIRV support if we address other markets but not currently in the plans."

[0] https://github.com/RadeonOpenCompute/ROCm/issues/1180#issuec...

[1] https://community.amd.com/t5/opencl/spir-support-in-new-driv...

[2] https://github.com/RadeonOpenCompute/ROCm-OpenCL-Runtime/iss...


So basically buy into what Codeplay is selling instead of what Nvidia is selling.


Doesn't Intel's oneAPI and its DPC++ (in partnership with SYCL) attempt to do the same thing or something similar?

oneAPI DPC++ Features Included in SYCL 2020 Final Spec [https://newsroom.intel.com/articles/oneapi-dpc-features-2020...]

Ars Technica's write-up on oneAPI provides a good overview [https://arstechnica.com/gadgets/2020/09/intel-heidelberg-uni...]

[edit: added Ars Technica reference & link]


I agree, DPC++ can also be used as a backend for SYCL IIRC.


Only if you want to write C++, which a diminishing number of ML researchers or practitioners do. SYCL makes more sense for traditional HPC, but since it works at the source level it doesn't make as much sense for ML framework codegen. Something like MLIR will hopefully help though (eventually).


If you're going to be writing CUDA, then I don't think C++ will add a lot of overhead. In fact, I think that SYCL/C++ being single source makes it more ergonomic for GPU programming than higher level languages.


Precisely. And one of the worst companies in its attitude toward the FOSS world. I hope there will be more development from the AMD side here. Otherwise we are doomed.


There is a lot that has been done. AMD's HIP and the open SYCL infrastructure can already run TensorFlow and a large amount of CUDA code. The reason the transition isn't happening is momentum and tradition, really.


CUDA is proprietary and exclusive to Nvidia, so probably not! (Apple and Nvidia haven’t been cooperative for some time)


It's even more absurd because Apple discontinued OpenCL and replaced it with their own proprietary solution too.


Aside from the fact that CUDA is proprietary and very obviously made to lock people into using Nvidia products, the thing is also that it's not just about CUDA. Deep learning primitives use cuDNN, which has professionally written code hand-tuned by PhDs in internal-only assembly, getting performance not available to the public even if you did end up getting a CUDA compiler somehow.


CUDA is Nvidia exclusive, no? I don’t think support for third party graphics hardware will be forthcoming


CUDA is designed by Nvidia to lock software and applications to Nvidia's hardware. CUDA is the core of Nvidia's business model. CUDA is the reason Nvidia has the guts to charge $5000 for a data center or Quadro card when that card uses nearly the same silicon as the consumer versions, and customers will still have to buy their cards.

It's like Apple selling you the iPhone, but also the iOS APIs and application model so you can run iOS apps. Once you run iOS apps, you are in Apple's ecosystem, and both the ISVs and the users find it hard to leave. It's like asking when Apple will officially make iOS APIs and libraries run on Android phones. Neither will ever happen.


Officially, I don't see that coming; I think it's iffy in terms of patents, etc.


Hey, don't jump to conclusions; downvotes fly all around here! (I upvoted your perfectly reasonable and honest question.)


Sure, as a badass SSH machine, it works great for deep learning... when SSH'd to a server with NVIDIA cards.

Now don't get me wrong, I absolutely adore my M1 MacBook Air. It's so good that I've kept my i7 + 2080 + 32 GB RAM desktop turned off for weeks now (outside of gaming). My 8 GB M1 MacBook Air is now my main work computer, and it's spectacular.

That said, the basic MNIST timing benchmarks are decent, but don't get your hopes up for anything more anytime soon. There's simply no way in hell today's M1 SoC can train the much, much bigger and more complex models that hardcore systems with beefy GPUs (12+ GB of RAM on each card) can.

So no, today, the M1 MacBook Airs/Pros are not really good for DL. And honestly, that's fine. Maybe Apple will compete with NVIDIA down the road, and I'd be happy to see that. But I'm glad the author came to the (correct) conclusion that M1s just aren't anywhere near there yet for anything more than basic DL.

Edit: that all said, there might be an interesting use case for M1 Mac Minis as edge inferencing nodes.


It's not a viable offline training machine, but this thing flies with data munging. I have the 16 GB version, and pandas data frames that are 40 GB or so load from CSVs in two minutes with type inference, for example. So long as you can use Google Colab, it's an excellent deep learning laptop where it counts.
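For anyone curious what that kind of workflow looks like, here is a minimal, hypothetical sketch (the file and column names are made up); passing dtypes up front instead of relying purely on inference also keeps the in-memory footprint well below the raw CSV size:

    import pandas as pd

    # Hypothetical columns/dtypes, for illustration only.
    dtypes = {"user_id": "int32", "score": "float32", "label": "category"}
    df = pd.read_csv("events.csv", dtype=dtypes, parse_dates=["timestamp"])

    # Check how much RAM the frame actually occupies once parsed.
    print(df.memory_usage(deep=True).sum() / 1e9, "GB")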


How do you load 40 GB of data onto a device with only 16 GB of RAM? (Or do you mean a 40 GB CSV, which shrinks when parsed into a machine-usable form?)


Training? No, of course not. It's still a laptop GPU/SoC/"Neural Engine"/whatever. Especially when the alternative is free GPU via Google Colab etc, there is no competition.


Alternate headline: TensorFlow with ML Compute on M1 Macs actually works. That by itself is notable. The performance still makes Colab look like a better option, but it's been hard to even track the viability of deep learning training on Macs, and this is a useful milestone to have.


How does Colab make money? Do you have to eventually pay? Or pay to make it faster?

Considering the computationally intensive nature of ML, does it make more sense to train on specialized cloud-based processors?


Colab is awesome. It's not really for any serious training; it's for evaluating and sharing ML models that (may) need a GPU, and for learning. But in many situations it replaces the need to have a laptop with a GPU: it's good enough for getting set up before you engage the real compute to do serious training. If Colab were not free, I would consider paying a subscription for it, which is maybe the end game.

They have a paid version now that is supposed to be faster and more lenient about disconnecting you, but I recently read a comparison that said it wasn't worth it.


Yes, given Google's reputation, relying on anything run by Google that doesn't make (a lot of!) money is a high-risk venture. It's one thing to do so when you have options to fall back on, but if you are buying yourself a machine learning laptop as a long-term (2+ year) investment on the basis that you don't need GPU hardware because of Colab ...


Google Colab doesn't need to make money. It needs to suffocate the space so much that no potential competitor with a commercial offering ever has a chance. The money they make here is money they save by not having to buy a competitor for nine figures later.


I guess it's a testing environment for Google Tensors.


Is anyone really doing serious deep learning locally on a laptop these days? I think being able to play with some models locally is great, and then you'd probably deploy to your remote setup for production-size training/deployment. I don't feel like this is much of a loss.


Not everything in deep learning involves running gigantic models on large GPUs or TPUs in the cloud, though I'm sure that is the norm for a lot of companies. There is a lot of value in being able to run smaller models when you are working on real-time, edge applications, especially on the inference side of things. We do a lot of that where I work, and I find my MacBook Pro very frustrating on that front after previously having had a Dell XPS 15 with a decent enough Nvidia GPU.


You can still easily do that on a smaller remote server with a big GPU; there is not really much time overhead in doing that when set up correctly.


Yes, I do that now, but there are still issues around that workflow and nothing beats being able to prototype locally. Some issues (not major deal breakers, but still a source of annoyance):

- Much more limited IDE experience if you use any graphical IDE. I prefer PyCharm because it is far superior to pretty much anything else out there when working in Python + PyTorch. You need to use a remote desktop solution like VNC in that situation, which is not remotely close to being on par with local code prototyping.

Jupyter notebooks I abhor because of how poor they are in relation to any decent IDE. VS Code is the only tool that has a great remote dev workflow, but it just isn't near the functionality of PyCharm when you use the latter daily. You also don't directly get the ability to plot and visualize data in something like matplotlib when working remotely, which is again an issue. I'll still use it if I have to, when VNC isn't snappy enough.

- If data security is a concern, everything needs to occur through a VPN which is another intermediate step in your workflow to get started every time you open up your laptop.

- If you have spotty internet access or are traveling, the remote dev workflow suffers immensely.

The other thing I should mention is that outside of the standard data exploration + model training/inference workflow, there are other use cases where being able to prototype locally is very advantageous. I write a lot of our tooling and internal libraries (for example, a Keras-like model training framework in PyTorch), and that involves a lot of PyTorch code that references GPUs, but testing and prototyping it could easily be done with small models on a laptop. Not being able to do that at all is really annoying, and having a native IDE experience when writing a library (especially when you rely on several internally developed libraries) is very critical in my experience.
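For what it's worth, the usual way to keep that kind of GPU-referencing library code testable on a GPU-less laptop is to never hard-code the device. A minimal sketch of the pattern (a hypothetical helper, not the commenter's framework):

    import torch

    def train_step(model, batch, optimizer, device=None):
        # Fall back to CPU automatically, so the same code runs on a laptop without CUDA.
        device = device or ("cuda" if torch.cuda.is_available() else "cpu")
        model = model.to(device)
        x, y = (t.to(device) for t in batch)
        optimizer.zero_grad()
        loss = torch.nn.functional.cross_entropy(model(x), y)
        loss.backward()
        optimizer.step()
        return loss.item()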

Again, all of these are not individual deal-breakers, but I feel the pain almost every day.


Some of the most important research papers of the last 10 years amount to tiny code snippets and use very small datasets.

Not everything needs to be GPT-3 scale.


I'd wait a few generations, to be honest; Mac laptops have become notoriously faulty in recent years. They're not the bulletproof machines they once were, and they tend to have a lot of engineering flaws (flex cable and display issues especially). The M1 seems okay so far, as it's an iteration of the previous gen, but they did make a lot of changes as well. I'd hold out for a later version.


I was under the impression things start getting doable at 2 dedicated Nvidia GPUs (preferably with 1 TB of RAM each)?


One decent GPU is usually fine for reasonable things. But the memory is way off: the GPUs I use have 40 GB of memory (a cluster of the higher-end Nvidia A100s) and they're basically state of the art. Most GPUs near SOTA are in the 16 GB range and up.


Yeah, I personally like using an Nvidia Jetson AGX Xavier.

There are some cool packages out there that detect your emotions and attentiveness while driving.

Aside from the Jetson, I have a 2080 Super in my laptop.


1TB? Isn't it more like 12-64GB?


It was clearly a joke, exaggerating for effect.


If it was a joke, that's not clear at all. It definitely doesn't come across as one; it reads more like someone who isn't up to date on GPU memory, which is a fairly niche subject.


One would have to be pretty out of the loop to think that GPU memory was in the terabyte range, though.


Enterprise-grade workstations and servers can be configured with 1 TB of RAM though, so it's not completely inconceivable that you could have a dedicated GPU workstation with 1 TB of graphics RAM. I mean, the Tesla A100 cards have 40 GB of VRAM each, so you'd only need 26 of them to have 1 TB of VRAM in a single machine. While I have no idea if such a machine exists, it certainly could exist; 26 GPUs wouldn't fill even half of a server cabinet.

Edit: actually, Quadro A6000 cards ship with 48 GB of VRAM each, so you only need 22 of those to reach 1 TB total.


Here you go friends

> Radeon Pro SSG (2016) https://www.amd.com/en/products/professional-graphics/radeon...


That's a GPU with a terabyte SSD on it, not a terabyte of VRAM.


It would be really interesting if Micron or Intel collaborated to make a GPU with a ton of 3D XPoint.


I'm not super knowledgeable in machine learning, but given that system RAM (not VRAM) in the TB range is really not that rare these days for high-end workstations, I assumed that the poster meant you'd have 1 TB of working RAM per GPU for the learning job (which would be paged in and out of VRAM as necessary).

I guess the lesson to be learned here is that if you want to use an implausibly huge amount of RAM to make a point, a TB is not safe anymore. Go for exabytes instead; that should be unambiguous for a couple of years.


Haha, that might just be the right takeaway!


I thought it was, but because 1 TB is not too far out of the park, it would have been more obvious if something like '1 exabyte GPU' had been used. Then, at least for a few years to come, it is indeed crazy talk.


A 3-layer convnet probably isn't the last word on whether the M1 is better for "deep learning". Something nontrivial where the GPU/processor is maxed out would be a more interesting comparison; even one of the PyTorch vision tutorials that uses ResNet would, I think, be a more realistic DL comparison.
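For example, a rough sketch of that kind of benchmark (hypothetical; a stock torchvision ResNet-50 on random data, falling back to CPU on machines without CUDA, as the M1 would at the time of this thread):

    import time
    import torch
    import torchvision

    device = "cuda" if torch.cuda.is_available() else "cpu"
    model = torchvision.models.resnet50(num_classes=10).to(device)
    optimizer = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9)
    criterion = torch.nn.CrossEntropyLoss()

    # Synthetic batch; the goal is to saturate the device, not to learn anything.
    x = torch.randn(32, 3, 224, 224, device=device)
    y = torch.randint(0, 10, (32,), device=device)

    start = time.time()
    for _ in range(20):
        optimizer.zero_grad()
        loss = criterion(model(x), y)
        loss.backward()
        optimizer.step()
    if device == "cuda":
        torch.cuda.synchronize()  # make sure queued GPU work is counted in the timing
    print(f"{(time.time() - start) / 20:.3f} s per training step")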


Is a bicycle great for transporting goods? Yes, absolutely, if your goods are like a handful of oranges. No, absolutely not, if your goods are like super heavy car parts.


TensorFlow is natively accelerated on M1. The branch is different: https://github.com/apple/tensorflow_macos

Not sure what the confusion was.


Commoditizing their complement: making it easier to teach ML with a hands-on approach, and to train and demo neural nets, so people and companies end up running them on their cloud machines.


It's not clear to me: are these M1 tests running on the CPU, the GPU, or the neural chip? If the M1 CPU is running at half the speed of a Colab GPU, that seems impressive, given that the software is still immature enough that the Neural Engine/GPU aren't being optimized for.


The M1 TensorFlow branch doesn't use the Neural Engine (ANE). However, it does use the M1 GPU.

The M1 GPU is about 1/3 the speed of a 1080 Ti when looking at the OpenCL score in Geekbench, but it may perform better than that in some cases due to the shared memory architecture of the M1.

As I understand it, the ANE has very low precision, which makes it unsuited for training in TensorFlow.
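If I remember the apple/tensorflow_macos README correctly, the device is selected explicitly through the ML Compute shim, so you can at least check what a benchmark is running on. Roughly (treat the exact call as an assumption; the fork's API may have changed since):

    from tensorflow.python.compiler.mlcompute import mlcompute

    mlcompute.set_mlc_device(device_name="gpu")  # 'cpu', 'gpu', or 'any'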


Just tried out Apple's native TensorFlow for M1 branch yesterday: https://github.com/apple/tensorflow_macos

Installation was a non-issue. The training kept the Mac mini completely silent, and getting 40% of Colab GPU speed is very satisfying for small tests.

PyTorch, on the other hand, is not ready yet (there are installation instructions, but they didn't work).


Comparing an integrated GPU to a discrete GPU seems a bit Apples to Oranges, if you'll pardon the pun.


It's highly unlikely Apple is going to support discrete GPUs in any ARM-based Mac without another major change in business direction. So once the full product line is available, it's going to be as Oranges-to-Oranges as you can reasonably expect.


An M1 Mac sees and mounts a discrete GPU connected over Thunderbolt. It just doesn't have an ARM-flavored driver for it.

Which seems to align just fine with Apple's ongoing efforts to get rid of all the kernel mode drivers and replace them with user mode drivers over time.

I don't think you should read too much into their entry level chip not supporting external GPUs in the first generation.


Interesting decision to go with TensorFlow on a new system. If you have no choice because of existing code it's understandable, though recent research has clearly shifted to PyTorch [1].

Though if you're required to use TensorFlow, will you even be able to switch to the bleeding-edge version? N=1, but all the projects I worked on where customers required TensorFlow had to be on a version < 2.

[1] http://horace.io/pytorch-vs-tensorflow/


To be fair, I found another earlier ML preview to be more balanced (please ignore the title): https://towardsdatascience.com/apples-new-m1-chip-is-a-machi...

As evident from the benchmarks, the result is very dependent on your network. Some get a huge acceleration boost from the neural processor, while others can't be accelerated. I would suggest a try-and-see approach.


Any thoughts on the value proposition if/when >= 64 GB of RAM becomes available on Apple silicon? Even if the chips are relatively slow, the extra RAM could be worth it.


Has anyone done, or seen, any analysis of energy used for ML training rather than clock time? I've done some of my own for compute-heavy workloads, but it wasn't ML on a GPU.


All of this is interesting.

I'm currently using, as many do, a 2080 Ti for ML training. With this, my training runs take under 30 minutes.

I would not use a laptop for training, as I no longer use "developer laptops" that are more expensive and slower than my Linux desktop. Especially now with home office, I no longer need a laptop at all (I understand others do need laptops; YMMV).

But the benchmarks could give a glimpse of what a desktop M1 might do.

What I want to know is: is it 0.5x or 2x the speed of a 2080 Ti?


Appreciate seeing some actual benchmarks. But it is a bit silly to round the results to seconds when they are in the single digit range.


Slightly off-topic, but I found the M1 to be extremely fast at Gaussian processes (using GPyTorch/BoTorch). I'm doing a PhD in machine learning for protein research, and for the heavy Bayesian optimisation loop, the MBP M1 is almost twice as fast as my 16-core Ryzen 3950X "deep learning" machine (both use the CPU).
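For context, the kind of CPU-bound loop being timed here looks roughly like the standard GPyTorch exact-GP fit below (a generic sketch with random data, not the commenter's protein model):

    import torch
    import gpytorch

    class ExactGPModel(gpytorch.models.ExactGP):
        def __init__(self, x, y, likelihood):
            super().__init__(x, y, likelihood)
            self.mean_module = gpytorch.means.ConstantMean()
            self.covar_module = gpytorch.kernels.ScaleKernel(gpytorch.kernels.RBFKernel())

        def forward(self, x):
            return gpytorch.distributions.MultivariateNormal(self.mean_module(x), self.covar_module(x))

    x, y = torch.rand(2000, 10), torch.rand(2000)
    likelihood = gpytorch.likelihoods.GaussianLikelihood()
    model = ExactGPModel(x, y, likelihood)
    mll = gpytorch.mlls.ExactMarginalLogLikelihood(likelihood, model)
    optimizer = torch.optim.Adam(model.parameters(), lr=0.1)

    model.train()
    likelihood.train()
    for _ in range(50):  # this dense-linear-algebra loop is where the CPU time goes
        optimizer.zero_grad()
        loss = -mll(model(x), y)
        loss.backward()
        optimizer.step()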


Maybe not the M1 but perhaps the M3 or M4 Macs. Who knows.

These MacBooks will be superseded anyway, so I'd rather wait until the software I'm using is fully supported and optimised than jump into the first generation of M1 Macs with unoptimised/unsupported software running in Rosetta.

But right now, in general? No, they are not good for deep learning.


.


Why are you running away by hiding your comment? Someone else saved your comment anyway. Next time delete your post if you want to run.

> In general, they’re fantastic.

The unoptimised software and the missing developer tools say otherwise. Especially for deep learning users: if it lacks the tools, they will not use it at all for this use case.

> The question was "Are they good for deep learning" and the answer was no. No one asked about them in general.

Don't you think the answer is to skip the M1 altogether, in general, for developers and deep learning users? At this point, there is no reason to get an M1 Mac at all, since the required software is not even ready for the M1 and the hardware will almost certainly be superseded this year by the M2.

I wouldn't want to be an early adopter of a system that has unoptimised software and would be running under Rosetta. The answer to the question above lies in whether the hardware in the newer-generation Mac products is powerful enough for deep learning. In this case, it is not. So just get a desktop with an RTX 3080 instead, or wait for an M4/M5 Mac.


I don’t think you can delete comments once someone has replied to them.


In general the M1 is fantastic, though.


The unoptimised software and the missing developer tools say otherwise.


> In general, they’re fantastic. The question was "Are they good for deep learning" and the answer was no. No one asked about them in general.


Apple laptops will never be good at anything compute-intensive.

At a certain point, you just need raw throughput, which requires power, which means a bigger chassis with louder fans, which ruins all the aesthetics of Macs.


I've been out of the GPU market for a while.

Any comparisons of the M1, other laptops, and desktops at various GPU price points?

What are some current solutions for doing local development without breaking the bank too much?


There are graphs comparing it to Google Colab GPU/CPU in the article.


There are lots of applications that benefit from GPU acceleration, where even a single small-ish GPU is useful, even outside of deep learning: DL tensor libraries can be useful just for GPU numerical operations + autograd. Moving data back and forth to cloud environments can add a lot of friction.

A laptop that has the performance of a low-end discrete GPU, but is the size and temperature of a regular laptop, would be a very nice thing to have. Hoping software support for the M1 continues to improve.


Apple's hardware is not any good for ML since they abandoned the CUDA ecosystem, giving such customers nothing to match it.


CUDA is owned by Nvidia, and it is closed source in all practical ways.

I'm not an Apple zealot, but I won't ding them for not being able to support a closed-source, walled-garden API.


Are any of these benchmarks actually using the M1's "ML"/"AI" hardware?


The title is promising, but the comparison is subpar and misleading:

- There shouldn't be any comparison to a bare CPU, because the M1 includes a TPU and a typical CPU doesn't

- Comparing it to Colab's GPU is good because the latter includes a TPU, but the OP should have stated which GPU he got; Google Colab allocates different GPU models


Why was the Google Colab TPU left out? It's free, just like the other two...


> but these still aren’t machines made for deep learning.

Betteridge's law of headlines again.


The M1 chip is a classic case of a big old monopoly (Intel) refusing to innovate. Probably all the execs over there will get wildly rich while they loot the company and drive it into the ground.


Failing != Refusing


Most likely there is a bunch of middle management blocking the innovation. It's a refusal.


I think it's more likely that the same mistakes Intel made when approached to supply chips for the iPhone have persisted: focusing on "raw performance" over "efficiency", when it's been clear for years, and especially clear now, that efficiency is key to advancing raw performance in new processor designs.


The hype is getting out of control.

They do have an interesting CPU, but Apple's target market isn't engineers/scientists/gamers, so they traditionally have not done anything interesting with respect to ML- and GPU-type workloads. The closest is video.

I think things might get interesting with an ARM Mac Pro if they can add a lot of cores. I wonder if they will add PCIe slots so they can collaborate with other hardware manufacturers, or if they will navel-gaze some more and close it off.


It seems odd to phrase things as "hype" and to say engineers aren't Apple's target market.

The machine literally has a component in it called a neural engine which Apple advertises as:

> In fact, with a powerful 8‑core GPU, machine learning accelerators, and the Neural Engine, the entire M1 chip is designed to excel at machine learning.

(https://www.apple.com/mac/m1/)

So asking "how well does that really perform" seems like precisely the right question, and to me it'd seem very clear Apple wants a piece of that market.


The Neural Engine is intended solely, or almost solely, for inference, not training. For instance, in a post from last November [1], Apple mentioned their tensorflow_macos fork can use "the GPU in both M1- and Intel-powered Macs for dramatically faster training performance", but didn't mention the Neural Engine. (Incidentally, I think that is the same fork used for the benchmark here.)

[1] https://machinelearning.apple.com/updates/ml-compute-trainin...


Anyone who is serious about ML (and by serious I mean actually training fairly big models with lots of data for a project or as part of their job) is going to use cloud resources or at least have a decent GPU (or several) at home. At no point will they ever look at getting a laptop to do ML on. Not only are you going to pay extra for less performance, but even for CPUs (which are still important for ML, since you need to not only do the matrix math on GPUs but also prepare the datasets and set up the GPUs), you are not going to come even close to the power that desktop chips can put out in any laptop from any manufacturer.

As for generally targeting engineers, the history of the MBP pretty much shows that they don't really care. Things like the virtual Escape key on the Touch Bar, shitty keyboards with sticking keys, hardware designed not to be repairable, and, more recently, releasing the M1 models with spotty software backwards compatibility out the door are really not things you would do if you were targeting tech-minded people.


That's for running end user applications that rely on already trained neural nets.


Usually a mobile device isn't for training, just for running applications.


Does any of the software mentioned in the article even utilize the so-called Neural Engine? If it's just generic software running on the CPU/integrated GPU, especially if under Rosetta (no mention of that in the article), the results represent only the current software support situation, and not much else.


Cue the bitter non-Apple fan who can't handle their success. Of course they will go many-core. Of course they will provide extension points. I am sure you saw the FPGA accelerator in the existing Mac Pro.


The M1 laptops already have Thunderbolt 3 which is 40Gb/s and actually faster than PCIe 4.0 x16 (only ~35Gb/s).

Also, these laptops have 8 CPU cores (4 high-performance, 4 high-efficiency) and 8 entirely separate GPU cores. It's a powerhouse of multiprocessing, especially when you consider it's in a laptop that gets 20 hours of real battery life.


A PCIe 4.0 x16 link is 32GB/s in one direction. Capital B. Quite a bit faster.

It's also fully bidirectional, allowing for the same speed in the other direction at the same time. Thunderbolt is capped at 40Gb/s total bandwidth.


Thunderbolt 3 is 40 Gbit/s (5 GB/s). PCIe 4.0 x16 has over 6x the bandwidth.
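A back-of-envelope check of those numbers (nominal link rates; real-world throughput is lower):

    # PCIe 4.0: 16 GT/s per lane, 128b/130b encoding, 16 lanes, per direction
    pcie4_x16_gbps = 16 * (128 / 130) * 16      # ~252 Gbit/s, i.e. ~31.5 GB/s
    tb3_gbps = 40                               # Thunderbolt 3: 40 Gbit/s total (~5 GB/s)
    print(pcie4_x16_gbps / 8, "GB/s;", round(pcie4_x16_gbps / tb3_gbps, 1), "x Thunderbolt 3")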


Powerhouse is a bit of an overstatement, especially for the GPU, which is honestly a pathetic offering for a device bearing the "Pro" name. Even so, the M1's multi-threaded performance can be topped by the cheaper Ryzen 7 4800U, which was shipping in $500 laptops long before the M1 even hit the market. It's actually kind of disappointing to me that Apple didn't take advantage of the 5nm node they spent so much time and money securing.



