I'd expect the M1 to be terrible at it, with 128-bit vectors and only 4 big cores.
AMD has 256-bit vectors and 8 or 16 cores. Intel has fewer cores (10 or so) but 512-bit vectors on workstation processors. Consumer Intel chips (laptops or desktops) usually have only 256-bit vectors, but that's still double the M1.
Finally, GPUs are 32-wide x 32-bit, aka 1024-bit (NVidia or AMD NAVI), or 64-wide x 32-bit, aka 2048-bit (AMD GCN), and you can see why GPUs do so well on a massively parallel program like Hashcat. Those 1024-bit or 2048-bit SIMD units are arranged 4x per Compute Unit / Workgroup Processor (AMD) or 2x per SM (NVidia), and then they offer 60, 70, 80, 100+ Compute Units / SMs per GPU (depending on chip-specific details).
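To put rough numbers on that, here's a back-of-the-envelope sketch in Python that just multiplies vector width by SIMD units per core and by core count. The widths and counts are the ones quoted above. Treating each CPU core as a single vector unit, picking 60 SMs / CUs for the GPUs, and ignoring clock speed entirely are my own simplifying assumptions, so read it as an order-of-magnitude illustration rather than a benchmark.

```python
def total_simd_bits(vector_bits, units_per_core, cores):
    """Aggregate SIMD width in bits: width x units per core x cores."""
    return vector_bits * units_per_core * cores

# (vector width in bits, SIMD units per core/CU/SM, number of cores/CUs/SMs)
# Assumption: one vector unit per CPU core, 60 CUs/SMs for the GPU rows.
configs = {
    "M1, 4 big cores": (128, 1, 4),
    "AMD CPU, 8 cores": (256, 1, 8),
    "AMD CPU, 16 cores": (256, 1, 16),
    "Intel workstation, 10 cores": (512, 1, 10),
    "NVidia GPU, 60 SMs": (1024, 2, 60),
    "AMD GCN GPU, 60 CUs": (2048, 4, 60),
}

baseline = total_simd_bits(*configs["M1, 4 big cores"])
for name, cfg in configs.items():
    bits = total_simd_bits(*cfg)
    print(f"{name:30s} {bits:>8d} bits  ({bits / baseline:.0f}x the M1 CPU)")
```

Even with those crude assumptions, the GPU rows come out hundreds of times wider than the M1's CPU cores, which is the whole point.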
As Apple still implements OpenCL even on the M1's GPU, it's possible to run on the GPU portion of the M1, and it appears to do reasonably well. It outperforms the low-end Quadro GPU in my laptop and appears to outperform at least 9th gen Intel iGPUs by 5-10x as well.
I'm not very impressed with the 1000MH/s SHA1 Hashcat result. GPUs have moved forward by extraordinary amounts in recent generations.
Even a low-end NVidia 1660 (which comes in laptop flavors and is a generation old at this point) is pushing 6000+ MH/s on SHA1.
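For a feel of what that gap means in practice, here's a quick sketch that converts a hash rate into time to exhaust a keyspace. The two rates are the ones mentioned above; the keyspace (8 characters of lowercase letters and digits) is a made-up example of mine, and real attacks with rules and wordlists won't scale this cleanly.

```python
def seconds_to_exhaust(keyspace, mega_hashes_per_second):
    """Time in seconds to try every candidate at a given rate in MH/s."""
    return keyspace / (mega_hashes_per_second * 1_000_000)

# Hypothetical keyspace: 8 characters drawn from a-z and 0-9.
keyspace = 36 ** 8

for label, rate_mhs in (("M1 GPU", 1000), ("NVidia 1660", 6000)):
    minutes = seconds_to_exhaust(keyspace, rate_mhs) / 60
    print(f"{label}: about {minutes:.1f} minutes at {rate_mhs} MH/s")
```

The ratio between the two times is just the ratio of the hash rates, so a 6x faster card turns most of an hour into a coffee break for this particular keyspace.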
Looking at someone testing laptop GPUs specifically (https://github.com/analsec/hashcatbenchmark), it looks like it's only 3-4x slower than a mobile GTX 1060?
And... also hugely outclassed by even a laptop GTX 1070. Oh well.
I'm still holding to "it's impressive for the size and TDP", even if it's probably not enough to replace "SSH into the workstation at the office to run hashcat" yet.
> I'm still holding to "it's impressive for the size and TDP", even if it's probably not enough to replace "SSH into the workstation at the office to run hashcat" yet.
Jetson Xavier crushes the M1 using only 9 billion transistors. I realize that the M1 has other stuff on it (4 big cores, Neural Engine), but... yeah... the M1 ain't a GPU architecture. It has one, but it's not a "serious" SIMD platform.
M1 has impressive big-core / CPU characteristics though. But Hashcat just ain't where it's promising.
I guess I'm just thinking it looks "good enough" for a particular use case: running NetNTLMv2 through my usual rules/wordlists on pentests. Intel's IGPs have been good enough to do NTLM in a few minutes for a few years, so being able to crack NetNTLM handshakes locally would be nice.
Nvidia's Tegra stuff is really impressive though. I had a shot at playing with hashcat on a Nintendo Switch a while back, and even though that's several gens old and has very limited VRAM, it did surprisingly well. The newer SoCs must really fly.
Push your wattage up to 40W on the platform, and suddenly you're looking at 10W CPU + 30W GPU, and things start to get interesting. All Apple really needs to do is pair their M1 with an AMD GPU (NAVI 2x is looking decent), and they're set. (Rumor has it that Apple is pissed off at NVidia for some reason, so an Apple x NVidia solution seems unlikely.)
I'm not sure if the M1 has enough PCIe lanes. But hypothetically, a future design would include good I/O capabilities and start to scale upwards.
Could it be competitive in this area?
A company just decides where the scope of their work ends at some point.
By comparison with, say, my GTX 1060 (6GB), it appears to be around 4-5 times slower. Other benchmarks seem to confirm this.
I'm very impressed. My Quadro P600 is around 5 times slower than the M1, all while using more power by itself than the entire system.
Granted, that's a very slow (and a generation old) discrete mobile GPU, but for what it is, I think the M1 makes a fine showing. It greatly outperforms my 8th gen Intel brick that weighs 5 pounds and needs a 110 watt power supply, so I'll probably cave and upgrade.
There, as expected, it is comparable to fast 2.5GHz aarch64 phones and faster than 2GHz laptops.
NEON is not as fast as AVX2, only comparable to AVX, and there is nothing like AVX-512.
I.e., 2x slower than a Ryzen with 2x higher clocks, but comparable to old desktop CPUs.