Any comments on performance, today or in the near future? Are there planned features that should provide a big speedup over competitors (kdb, pandas)?


I've primarily been focused on the ergonomics of the language, so I've only tried to make performance "reasonable" for now.

Longer-term performance objectives are:

1. JIT - I designed the VM's bytecode to be both directly interpretable and usable as a mid-level IR for lowering to LLVM. Currently I just interpret everything, since there is almost no runtime overhead for vector operations. Compiled code, however, will greatly speed up scalar code in loops. (There's a toy sketch of the opcode design after this list.)

2. SIMD - Since the VM's opcodes are already statically typed and vector-aware, integrating OpenBLAS and SLEEF (or Intel's MKL and VML) should be straightforward; see the lowering sketch below.

3. MIMD - Ideally I can just lean on existing libraries, though I'm not above embedding OpenMP if that gets the job done (sketch below).

4. Distributed - Now comes the hard part. If we want MPI-level performance, I need to have more sophisticated scheduling. Which leads us to...

5. Streaming - This is the real holy grail. There has been a ton of research in the database community on getting away from the "Volcano model" (iterators). I want the compiler to generate streaming-aware opcodes for the VM based on how the data will be consumed; a toy illustration follows below. I believe this will require a type system that can track the "context" of a computation, similar to how Koka and F* track side effects. I'm not aware of any general-purpose language that has compiled streaming.
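
To make (1) concrete, here's a toy sketch of what I mean by a statically typed, vector-aware opcode. The names are hypothetical, not how interpret.cpp actually lays things out:

    #include <cstddef>
    #include <cstdint>
    #include <vector>

    // Hypothetical layout: each instruction is statically typed and
    // operates on whole vectors held in register slots.
    enum class Op : std::uint8_t { AddF64, MulF64 };

    struct Instr {
        Op op;
        std::uint32_t dst, lhs, rhs;   // register slots
    };

    using Vec = std::vector<double>;

    void interpret(const std::vector<Instr>& code, std::vector<Vec>& regs) {
        for (const Instr& in : code) {
            const Vec& a = regs[in.lhs];
            const Vec& b = regs[in.rhs];
            Vec& d = regs[in.dst];
            d.resize(a.size());
            switch (in.op) {
            case Op::AddF64:
                // One dispatch amortized over the whole vector: this is
                // why plain interpretation is cheap for vector ops.
                for (std::size_t k = 0; k < a.size(); ++k) d[k] = a[k] + b[k];
                break;
            case Op::MulF64:
                for (std::size_t k = 0; k < a.size(); ++k) d[k] = a[k] * b[k];
                break;
            }
        }
    }

Because each instruction already carries its types and shapes, a JIT pass can lower an opcode to LLVM IR without any type recovery; it's the per-instruction dispatch on scalars in a loop that compilation actually rescues.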
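For (2), the lowering is mostly mechanical. A sketch of routing two vector opcodes through cblas (assumes OpenBLAS headers and linking with -lopenblas; SLEEF or VML would cover the transcendental opcodes the same way):

    #include <cstddef>
    #include <vector>
    #include <cblas.h>   // OpenBLAS (or MKL's cblas interface)

    using Vec = std::vector<double>;

    // dst += alpha * src  ->  one BLAS level-1 call instead of a scalar loop
    void op_axpy(double alpha, const Vec& src, Vec& dst) {
        cblas_daxpy(static_cast<int>(dst.size()), alpha,
                    src.data(), 1, dst.data(), 1);
    }

    // dst *= alpha  ->  cblas_dscal
    void op_scale(double alpha, Vec& dst) {
        cblas_dscal(static_cast<int>(dst.size()), alpha, dst.data(), 1);
    }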
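And (3) can start as literally one pragma per vector loop, since the opcodes have no cross-iteration dependencies (compile with -fopenmp):

    #include <cstddef>
    #include <vector>

    // The inner loop of a vector opcode, parallelized across cores.
    // Every iteration is independent, so a bare parallel-for is safe.
    void add_f64(const std::vector<double>& a, const std::vector<double>& b,
                 std::vector<double>& d) {
        d.resize(a.size());
        #pragma omp parallel for
        for (std::ptrdiff_t k = 0;
             k < static_cast<std::ptrdiff_t>(a.size()); ++k)
            d[k] = a[k] + b[k];
    }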
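For (5), a toy illustration of the target shape: instead of chaining Volcano-style next() calls per element, the compiler would fuse a filter/map/reduce opcode pipeline into a single push-based loop, so each value flows through the whole pipeline while still hot in a register (hand-written here; the point is that codegen would emit this shape):

    #include <cstdio>
    #include <vector>

    // Push-based pipeline: filter -> map -> sum fused into one loop,
    // with no intermediate vectors materialized.
    int main() {
        std::vector<double> xs = {1, -2, 3, -4, 5};
        double sum = 0.0;
        for (double x : xs) {        // producer pushes...
            if (x > 0) {             // ...through the filter...
                double y = x * x;    // ...the map...
                sum += y;            // ...into the sink.
            }
        }
        std::printf("%f\n", sum);    // 35.000000
    }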


Looking at interpret.cpp for SIMD potential: I bet you could add an allocator for std::vector that aligns and pads everything to 32 bytes, then just replace all of the scalar op loops with loops over AVX intrinsics. No need for an external library.
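
Roughly this shape (untested sketch; assumes C++17 for aligned operator new, and -mavx):

    #include <cstddef>
    #include <immintrin.h>
    #include <new>
    #include <vector>

    // Minimal 32-byte-aligned allocator so vector.data() is safe for
    // aligned AVX loads/stores.
    template <class T>
    struct AlignedAlloc {
        using value_type = T;
        AlignedAlloc() = default;
        template <class U> AlignedAlloc(const AlignedAlloc<U>&) {}
        T* allocate(std::size_t n) {
            return static_cast<T*>(::operator new(n * sizeof(T),
                                                  std::align_val_t(32)));
        }
        void deallocate(T* p, std::size_t) {
            ::operator delete(p, std::align_val_t(32));
        }
    };
    template <class T, class U>
    bool operator==(const AlignedAlloc<T>&, const AlignedAlloc<U>&) { return true; }
    template <class T, class U>
    bool operator!=(const AlignedAlloc<T>&, const AlignedAlloc<U>&) { return false; }

    using AVec = std::vector<double, AlignedAlloc<double>>;

    // Scalar add loop replaced by 4-wide AVX; the scalar tail disappears
    // if sizes are padded to a multiple of 4 as suggested.
    void add_avx(const AVec& a, const AVec& b, AVec& d) {
        std::size_t n = a.size(), k = 0;
        d.resize(n);
        for (; k + 4 <= n; k += 4) {
            __m256d va = _mm256_load_pd(a.data() + k);
            __m256d vb = _mm256_load_pd(b.data() + k);
            _mm256_store_pd(d.data() + k, _mm256_add_pd(va, vb));
        }
        for (; k < n; ++k) d[k] = a[k] + b[k];   // tail
    }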


That's a possibility for getting something running in the near term. I'm trying to avoid CPU-specific intrinsics, since I have a fantasy that this might run on ARM some day, though that may be getting ahead of myself.


NEON intrinsics are pretty easy as well ;) As long as you are doing simple +-*&| ops, they work the same as SSE.
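
E.g. the same add loop on AArch64 (float64x2_t is 64-bit ARM only, and you get two doubles per op instead of four):

    #include <arm_neon.h>
    #include <cstddef>

    // NEON version of the vector add: 2-wide f64 lanes, scalar tail.
    void add_neon(const double* a, const double* b, double* d, std::size_t n) {
        std::size_t k = 0;
        for (; k + 2 <= n; k += 2) {
            float64x2_t va = vld1q_f64(a + k);
            float64x2_t vb = vld1q_f64(b + k);
            vst1q_f64(d + k, vaddq_f64(va, vb));
        }
        for (; k < n; ++k) d[k] = a[k] + b[k];   // tail
    }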

