"The solar system's distant reaches exhibit a wealth of anomalous dynamical structure, hinting at the presence of a yet-undetected, massive trans-Neptunian body - Planet 9. Previous analyses have shown how orbital evolution induced by this object can explain the origins of a broad assortment of exotic orbits, ranging from those characterized by high perihelia to those with extreme inclinations. In this work, we shift the focus toward a more conventional class of TNOs, and consider the observed census of long-period, nearly planar, Neptune-crossing objects as a hitherto-unexplored probe of the Planet 9 hypothesis. To this end, we carry out comprehensive N−body simulations that self-consistently model gravitational perturbations from all giant planets, the Galactic tide, as well as passing stars, stemming from initial conditions that account for the primordial giant planet migration and sun's early evolution within a star cluster. Accounting for observational biases, our results reveal that the orbital architecture of this group of objects aligns closely with the predictions of the P9-inclusive model. In stark contrast, the P9-free scenario is statistically rejected at a ∼5σ confidence-level. Accordingly, this work introduces a new line of evidence supporting the existence of Planet 9 and further delineates a series of observational predictions poised for near-term resolution."
This is (partly) outdated. MPS (Metal Performance Shaders) is now, since torch 2.x, fully integrated into standard PyTorch releases; no external backends or special torch versions are needed.
There are few limitations left compared with other backends. Instead of the 'cuda' device, one simply uses 'mps' as the device.
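A minimal sketch of that device selection, using only the stock PyTorch 2.x API:

    import torch

    # Use the Apple-GPU backend when available, otherwise fall back to CPU.
    device = torch.device("mps" if torch.backends.mps.is_available() else "cpu")

    x = torch.randn(1024, 1024, device=device)
    y = x @ x  # the matmul executes on the Apple GPU via Metal Performance Shaders
    print(y.device)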
What remains is this: the optimizations PyTorch provides (especially compile() as of 2.1) focus on CUDA and the historic restrictions that result from CUDA being _not_ unified memory. A lot of energy goes into developing architectural workarounds to limit the copying between graphics hardware and CPU memory, resulting in specialized compilers (like Triton) that move parts of the Python code onto vendor-specific hardware.
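To make that copying concrete, here is a minimal sketch of the explicit host/device round trip a discrete-GPU (CUDA) setup requires; on unified memory this shuffling is precisely what goes away:

    import torch

    x_cpu = torch.randn(4096, 4096)   # tensor lives in host (CPU) memory
    x_gpu = x_cpu.to("cuda")          # explicit host -> device copy
    y_gpu = x_gpu @ x_gpu             # compute runs in GPU memory
    y_cpu = y_gpu.to("cpu")           # explicit device -> host copy back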
Apple's unified memory would make all of those complicated architectural workarounds mostly unnecessary, which they demonstrate with their project.
Getting current DL platforms to support both paradigms (unified and non-unified memory) will be a lot of work. One possible avenue is the MLIR project, currently leveraged by Mojo.
> This is (partly) outdated. MPS (Metal Performance Shaders) is now, since torch 2.x, fully integrated into standard PyTorch releases; no external backends or special torch versions are needed.
Not sure what you're referring to; the link I provided shows how to use the "mps" backend / device from the official PyTorch release.
> A lot of energy goes into developing architectural workarounds to limit the copying between graphics hardware and CPU memory
Does this remark apply to PyTorch running on NVIDIA platforms with unified memory, like the Jetsons?
The project probably at least partially serves as documentation for other platforms to integrate Apple Silicon acceleration. It basically demonstrates how to use macOS Accelerate and Metal MPS (Metal Performance Shaders) from C++ for machine learning and training optimization.
Thus other platforms can simply take this backend code and integrate it. (PyTorch basically did that already, with Apple's help.)
Nailed it.
I think more than partially. What happens in this repo will spread to the other major frameworks, and over time clever ideas that spawn in other projects will be reimplemented, with Apple's adjustments, back into the repo. It's a brilliant and efficient way to interact with the community, and one that can likely be measured in more sales of their hardware over time.
The Neural Engine is not helpful for training; it's inference hardware, whereas this targets training and research. They use Accelerate and Metal (with seemingly similar or identical performance shaders to those their PyTorch adaptation uses), which allows for high-performance training.
This project additionally serves as documentation for other platforms to integrate Apple Silicon, which is good.
Still, being able to run LLaMA 2 on the NPU would be awesome due to the unified memory. Apple restricting its use to only Apple-approved models is frankly irksome.
The main thing about this framework is that it uses unified memory with the GPU. This gives maximum performance. The Neural Engine, on the other hand, is optimized for low-energy inference (which is mostly an advantage on mobile devices) and imposes limitations and restrictions, since its hardware supports only very specific neural network operations. Thus supporting the Neural Engine within a universal machine learning platform doesn't make much sense; it would just be a bottleneck.
The way to use the Neural Engine is to convert existing models that strictly adhere to the limitations of the Neural Engine hardware (which exclude many operations used in unrestricted NN models) for use in energy-constrained inference applications only. It's a different application scenario. A rough sketch of that conversion path is below.
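A rough sketch of that workflow, assuming the coremltools package is installed; the toy two-layer model is purely illustrative, and Core ML decides at load time which ops actually land on the ANE:

    import torch
    import coremltools as ct

    # Toy model standing in for a real network; any traceable module works.
    model = torch.nn.Sequential(torch.nn.Linear(64, 64), torch.nn.ReLU()).eval()
    example = torch.randn(1, 64)
    traced = torch.jit.trace(model, example)

    # Convert to Core ML, requesting CPU + Neural Engine execution (inference only).
    mlmodel = ct.convert(
        traced,
        inputs=[ct.TensorType(shape=example.shape)],
        compute_units=ct.ComputeUnit.CPU_AND_NE,
        convert_to="mlprogram",
    )
    mlmodel.save("model.mlpackage")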