Hacker News
High Performance Numeric Programming with Swift: Explorations and Reflections (fast.ai)
113 points by parrt 8 months ago | 45 comments



One thing to look out for is that Swift Arrays aren’t really arrays. https://www.raywenderlich.com/1172-collection-data-structure...:

”1. Accessing any value at a particular index in an array is at worst O(log n), but should usually be O(1).

2. Searching for an object at an unknown index is at worst O(n (log n)), but will generally be O(n).

3. Inserting or deleting an object is at worst O(n (log n)) but will often be O(1). These guarantees subtly deviate from the simple “ideal” array that you might expect from a computer science textbook or the C language, where an array is always a sequence of items laid out contiguously in memory”

If you want a more traditional data structure, use ContiguousArray, which is an array. https://developer.apple.com/documentation/swift/contiguousar...:

”The ContiguousArray type is a specialized array that always stores its elements in a contiguous region of memory. This contrasts with Array, which can store its elements in either a contiguous region of memory or an NSArray instance if its Element type is a class or @objc protocol”
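A minimal sketch of the distinction quoted above, assuming a plain value-type element such as `Int` (the variable names are illustrative and not from the article). Both types can hand out a buffer pointer for bulk reads; `ContiguousArray` simply makes the contiguity guarantee part of the type.

```swift
// Illustrative values; the names here are hypothetical.
let values: [Int] = [1, 2, 3, 4]

// ContiguousArray always stores its elements in one contiguous region.
let packed = ContiguousArray(values)

// Both types expose their storage as a buffer pointer for bulk reads.
let arraySum = values.withUnsafeBufferPointer { buffer in
    buffer.reduce(0, +)
}
let packedSum = packed.withUnsafeBufferPointer { buffer in
    buffer.reduce(0, +)
}
```

Switching between the two is a one-line change at the declaration, so it is cheap to try both when benchmarking.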


My instinct is to find this horrifying. Is there a good reason for this?

If I have an array of ints, is it contiguous?

Especially with modern computers and how important data locality is, the name "array" is kind of sacred. If you want a fancy weird non-array, give THAT the longer, annoying name.


Presumably Objective-C/Cocoa compatibility. Although I wouldn't be able to tell you why that would require non-contiguous storage... surely @NSArray is contiguous?



"If the array’s Element type is a struct or enumeration, Array and ContiguousArray should have similar efficiency."


Thanks for writing up your thoughts!

I find Julia's core design to be excellent for general purpose programming, better than Python in fact, since it essentially solves the expression problem with its type system and multiple dispatch.

Its external program interop is also more pleasant than Python's: https://docs.julialang.org/en/v1/manual/running-external-pro...

Sure, it doesn't have the same general library ecosystem, but even that is being remedied for core areas like web programming: http://genieframework.com/ (a full MVC framework), https://github.com/JuliaGizmos/WebIO.jl (write front end code without javascript) and I'm particularly excited for https://github.com/Keno/julia-wasm, which will allow Julia programs to be compiled for the browser.

For any packages that are Python-only, it has excellent Python interop using the PyCall.jl package, which even allows users to write custom Python classes in Julia.

With regards to numerical programming, it's obviously already far ahead of Swift, and IMO much better placed to beat it in the long run. For example, the WIP Zygote package is able to hook into Julia's compiler to differentiate arbitrary code with zero overhead. Using Cassette.jl, package authors can write custom compiler passes outside the main repo and in pure Julia: https://julialang.org/blog/2018/12/ml-language-compiler

In addition, its macro system, introspection, dynamic typing and value types through abstract typing approach allow for natural development of advanced probabilistic programming languages: https://github.com/TuringLang/Turing.jl, https://github.com/probcomp/Gen, https://github.com/zenna/Omega.jl/pulse


Agree with everything you've said; it's hard to see why one would prefer Swift over Julia for numerical computing. I use Julia for its regex too; it's just nicer. Hopefully, the data munging packages in Julia can catch up to dplyr and data.table, then we are talking!


I am a happy Julia user, but I can imagine if you had a use case where you wanted to compile a binary or shared library, Julia could be a pain.

C++ of course works fine for this but I imagine Swift would be less terrifying to use.



I see a lot of great numerical code in Julia. As a C++ developer, I don’t want to deal with a runtime or various parts of the language, but I imagine I could be more easily brought to the table if Julia code could be exported to a shared object file and linked against, or if it could compile to a direct binary.


I think Julia is the dark horse to eventually take over a wide swath of computing - possibly wider than Java or C++. As others have pointed out there's an effort to produce static Julia executables, and I think it's already possible to produce libraries. One interesting datapoint is that Julia's C FFI is faster than that of C++...

https://github.com/dyu/ffi-overhead

(For those interested, the order of the first few languages is: lua-jit, julia, c(!), c++, zig, nim, d in order of decreasing speed.)

It's extremely well thought out, concise, powerful, and readable. I think Julia's approach to types and multiple dispatch is a better alternative to traditional OO programming.

One thing the author didn't point out is that C++ (clang), Swift, Rust and Julia all use the LLVM infrastructure, resulting in extremely similar if not identical code generation. If datacenter efficiency truly becomes a priority, highly efficient languages like Julia, Rust and Swift will see increasing use for general purpose programming.


Do you know how Julia gets such great ffi performance? Is it inlining, for instance?


It's a bit of a JIT party trick by emitting the target address directly rather than going through the PLT.


https://github.com/queryverse/Query.jl Allows for dplyr syntax to work with any iterable and custom table types using traits...So I think it's already beating R data munging in the flexibility department.

Still missing some verbs, but these will be added.


Given Julia has macros, it will definitely catch up to R and data.table in terms of syntax (if it's not already there). I am thinking more about performance, e.g. see https://h2oai.github.io/db-benchmark/. It shows that Julia is lagging behind on group-by (and, in my experience, many other operations) when compared to R's data.table.

Although I have done some work to make things fast (see https://github.com/xiaodaigh/FastGroupBy.jl), I have yet to update it to Julia v1. Hopefully, I will get to that soon. However, the improvement I have made only works for grouping by up to 2 group-by variables, and I need to learn more about generated functions to make the code more generic. So from my perspective (as someone who's actually spent time trying to optimise these data operations), Julia will take a while to catch up. Hats off to the data.table crew!


Ah yes, that is true. However Julia is tackling a harder problem in that the speed lag is presumably due to optimizing for custom element and table types.


"for custom element and table types": that is highly likely to be true, but unless an equally fast Julia program exists I remain scientifically skeptical. But my prior belief is that Julia can be as fast.


DataFrames.jl doesn't parameterize on its types and tries to rely on function barriers to be fast enough. That's what gives it the speed issue. IMO this isn't the best idea. It's not difficult to write a data table that's easier to optimize than that, but you would have to give up some of the flexibility.
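To make the "function barrier" idea concrete, here is a rough Swift analogy, not a description of how DataFrames.jl is implemented. The `ErasedTable`, `sumKernel` and `sumColumn` names are hypothetical. The table does not know its column types statically, so each operation casts once at the barrier and then runs a kernel that is generic over the concrete element type.

```swift
// Hypothetical type-erased table: column element types are unknown statically.
struct ErasedTable {
    var columns: [String: Any]
}

// Kernel that is generic over the concrete element type.
func sumKernel<T: Numeric>(_ column: [T]) -> T {
    var total: T = 0
    for value in column {
        total += value
    }
    return total
}

// The "function barrier": recover the concrete type once, then call the kernel.
func sumColumn(_ table: ErasedTable, _ name: String) -> Double? {
    if let doubles = table.columns[name] as? [Double] {
        return sumKernel(doubles)
    }
    if let ints = table.columns[name] as? [Int] {
        return Double(sumKernel(ints))
    }
    return nil // missing column or unsupported element type
}

let prices: [Double] = [1.5, 2.5]
let quantities: [Int] = [2, 3]
let table = ErasedTable(columns: ["price": prices, "qty": quantities])
let priceTotal = sumColumn(table, "price")
let quantityTotal = sumColumn(table, "qty")
```

The trade-off mentioned above shows up directly: the erased table accepts any column type, but every operation pays for a dynamic cast, and the compiler cannot optimize across the barrier.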


I am betting Julia will finally make the Python community take JIT as a standard feature seriously.


They do; there have been several high-profile projects to make a JIT for Python... The problem isn't that JITs aren't taken seriously - the problem is that Python isn't designed for a JIT, so it's very hard to get the advantages of using one.


I am well aware of PyPy and friends.

My point is about making JIT a standard feature of CPython.

JavaScript, Dylan, Smalltalk, SELF, Common Lisp aren't less dynamic than Python.


Yeah, I'm just trying to point out that not much would be won by that.

You either need to change the semantics of the language (CPython) or live with a slow JIT for some of the most used features of the language... And changing the semantics of the language is pretty terrible - it fragments your libraries, introduces unbreakable borders prohibiting cross-package optimizations and extensibility, and much worse, CPython is now basically a language that combines the bad stuff of Python and C ;) I know that's a bit too cynical, but coming from a language well designed for JIT (Julia), that's how CPython starts looking to me. You will simply never reach the elegance of a language that was designed from scratch for this.

What I want to point out is that the whole topic is bigger than "JIT my one slow function and make it fast"! A good read about this topic is: http://www.stochasticlifestyle.com/why-numba-and-cython-are-...


And sorry for hijacking your perfectly fine comment - of course it would help Python to embrace modern compiler technologies! ;) It's just always a bit maddening to see how much effort and money is spent on creating immense projects for Python, just to catch up with Julia - while turning Python pretty much into a jitted Frankenstein monster, to cover use cases it was never designed for ;)


Reminds me of this post: https://hackernoon.com/why-is-python-so-slow-e5074b6fe55b. I can't say I understood everything, but I also recall that Python is hard to optimise because it has such a rich object model.


The same rich object model that Common Lisp, Dylan, SELF, JavaScript, Smalltalk enjoy while having JIT support.

For example in Smalltalk, you can destroy all JIT assumptions about a given object by sending a become: message.

https://gbracha.blogspot.com/2009/07/miracle-of-become.html


> With regards to numerical programming, it's obviously already far ahead of swift, and IMO much better placed to beat it in the long run.

Julia's secret sauce is LLVM. Considering the guy behind Swift also made LLVM, I'm inclined to think that Swift will come out ahead.


I'd say LLVM is an important part of Julia's success but not the whole story: it is also a very well designed language.


Really? The 1-based array is really a mistake. They should have followed Ada: allow any starting index and provide keywords to access the first/last element of the array. All this by default, otherwise it doesn't matter.


I don't pretend to understand the feelings of people who care about this issue strongly, so I won't defend the choice vigorously. But I would guess they did that because their target demographic were Fortran and Matlab users, all of whom use 1-based indices, as is usual in math books. At any rate, in Julia you rarely need to write the start index explicitly (because of functions like eachindex) and can easily start your indices anywhere you like if you feel strongly about it.

(I know that to you specifically the fact that the ability to choose the starting index is not builtin means it doesn't matter, but I point it out for others who might not know that it's easy to do in Julia.)
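For comparison on the Swift side of this thread (a small sketch, not Julia code, with illustrative names): Swift collections also let you avoid writing the start index by hand. An `ArraySlice` keeps the indices of the array it came from, so iterating over `indices` and using `first`/`last` works regardless of where the slice starts.

```swift
let samples = [10, 20, 30, 40, 50]

// The slice shares the parent's indices, so it starts at 2, not 0.
let tail = samples[2...]

// Iterate without hard-coding any start or end index.
var visited: [Int] = []
for index in tail.indices {
    visited.append(index)
}

let firstValue = tail.first
let lastValue = tail.last
```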


>Julia's secret sauce is LLVM

That most certainly is not true. See this for example: https://arxiv.org/pdf/1810.09868.pdf


XLA is built on LLVM I think.


I don't think so (https://www.tensorflow.org/xla/developing_new_backend). And even if so, it's not relying on LLVM optimizations (for TPU) or the LLVM API (which is why some of these had to be punted back up to the Julia optimizer for XLA, which was done in a third-party package!).

The point is that Julia's design, type system and multiple dispatch facilitate writing dynamic yet highly optimized code for a variety of backends, even those requiring static semantics (unlike LLVM).

There is no way you can look at that paper (or the Flux ecosystem, or the prob programming languages or the SSA IR autodiff) and chalk up Julia's success to just LLVM.


The GPU backend is (and so's the CPU backend, but that's currently too slow to get much information from). The TPU backend isn't (which is why we targeted XLA in the first place).


Hello folks! I wrote this article - so if you have any questions, feel free to shoot them my way. :)


Jeremy, if you're up for it, could you talk any more about your explorations of Julia?

I see (and agree with) your point about worse non-numeric stuff, but if you stand back and squint I get the impression Julia does most of what you applaud here, with the added benefits of a more transparent compiler and a more numeric-focused community (no need for BasicMath there!).

In particular, Flux.jl seems like a fairly direct competitor of S4TF, and this blogpost [0] really blew me away.

[0] https://www.julialang.org/blog/2018/12/ml-language-compiler


The post says Julia is not good for general purpose programming. I think it is good for that; it's just that it does not have as many packages as Python, that's all. I will offer one reason: Julia syntax is actually very much like Python's in many respects. So how can Python be good for general purpose programming but Julia not? If the sentence were more like "Julia doesn't have as many packages for general programming", then I think it would be more precise.


That sounds totally fair. I haven't used Julia for a couple of years so my comments on it are dated and not well informed. Everyone I know that uses Julia nowadays loves it.


Flux.jl does look terrific. Frankly, part of my interest in this little Swift research project was to pick something that's not at all well explored, and try to dig in to it.

Julia is much more mature for machine learning than Swift at this point. So it would be a better choice if you want something that's at least somewhat ready for use now - but I was really wanting to get in on the ground floor on something that's just getting started.


Interesting article, but I have one nit. I don't think your reasoning behind why Objective C does not support overloading is sound. Selectors are completely incompatible with C function names anyway. For example, they are only relevant in the context of classes (which do not exist in C), and they allow characters that are illegal in C function names in symbols (like ':').

The bigger issue is that to overload a function in a manner similar to C++ you need to have accurate type information, which historically was not available in Objective C since it relies heavily on duck typing and casting objects back through id. If such strong type information was available then the type data could have simply been mangled into the selector by the compiler the same way it is mangled into the symbol name in C++. I suspect that duck typing allowed a lot of productivity and memory wins in the 90s, and that most compilers of the era were not capable of exploiting the strong typing information to optimize as aggressively as they do now, meaning that it was probably the right trade off for the time.

I suppose an alternative implementation of overloading could have been implemented by having objc_msgSend dynamically query the types of all parameters which are overloaded, but that would have resulted in a huge performance hit on every dynamic message dispatch.
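A small Swift sketch of the point about needing accurate static type information (the `describe` functions are hypothetical and only for illustration). Swift picks an overload at compile time from the static type of the argument, so once a value has been erased to `Any`, roughly the situation with `id` in Objective C, the specific overloads are no longer reachable without a dynamic check.

```swift
// Hypothetical overload set, resolved by the static type of the argument.
func describe(_ value: Int) -> String { return "Int overload" }
func describe(_ value: Double) -> String { return "Double overload" }
func describe(_ value: Any) -> String { return "Any overload" }

let count: Int = 3
let ratio: Double = 3.0
let erased: Any = count // static type is now Any, although it holds an Int

let fromInt = describe(count)
let fromDouble = describe(ratio)
let fromErased = describe(erased) // chosen from the static type, not the dynamic one

// Recovering the specific overload needs an explicit dynamic cast.
let recovered = (erased as? Int).map { describe($0) }
```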


Very interesting perspective - many thanks for sharing.


Curious if you ever benchmarked your approach vs, say, going through `Array`/`ContiguousArray` (I think these are slated to converge eventually, FWIW) and using the [`withUnsafeMutableBufferPointer(_:)`](https://developer.apple.com/documentation/swift/array/299477...) calls?

You've gotten into a place with a lot of unidiomatic designs--direct pointer access on COW types, etc.--and it's not clear how much is really necessary:

    extension Array where Element: CanDoMath {

      // instead of this style:
      func sum_outside() -> Element {
        var result: Element = 0 // assumes CanDoMath refines Numeric, so 0 is a valid Element
        let p = self.pointerToStorage // your "get the pointer" method, I think it was just `p`, too?
        for i in 0..<count {
          result += p[i]
        }
        return result
      }

      // how does this compare (in -unchecked mode, at least)?
      func sum_inside() -> Element {
        return self.withUnsafeBufferPointer { buffer -> Element in
          var result: Element = 0
          for v in buffer {
            result += v
          }
          return result
        }
      }
    }

Going the `sum_inside` route for bulk operations makes it easier to remain idiomatic, keep COW around (assuming you want it), benefit from `var/let`, and so on. The only obvious concerns are (a) relative overhead--did you ever benchmark that?--and (b) alignment.

For (b) if you're planning to call things that need particular alignments then as far as I know you will need to write your own storage at this time.
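A self-contained variant of the `sum_inside` idea, as a sketch only: it constrains `Element` to the standard library's `Numeric` protocol instead of the article's own protocol, and the method name `sumInside` is made up for this example. It says nothing about relative performance, which would still need benchmarking.

```swift
extension Array where Element: Numeric {
    // Sums the elements through a buffer pointer scoped to the closure.
    func sumInside() -> Element {
        return withUnsafeBufferPointer { buffer -> Element in
            var result: Element = 0
            for value in buffer {
                result += value
            }
            return result
        }
    }
}

let doubleTotal = [1.5, 2.5, 4.0].sumInside()
let intTotal = [1, 2, 3].sumInside()
```

The pointer never escapes the closure, so the array keeps its usual value semantics at the call site.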


What I'm doing is essentially the same as `withUnsafeMutableBufferPointer`. However I didn't find a way to get concise abstractions using that approach.


It is essentially the same, sure. We have some specialized in-house structs-of-arrays things for doing bulk geometry operations that (behind the scenes) go through `withUnsafeMutableBufferPointer` (etc.) for everything; we keep the code idiomatic, mutation only happens in methods that are marked as mutating, COW still works, and so on.

Thus we hadn't even considered just exposing the pointer and doing it C-style, whence the question as to whether you'd benchmarked the difference between the two.

The abstractions thing is hard; here, the key seems to be defining the bulk operations in terms of pointers (or Swift's "buffer pointers"), essentially what you have in your methods like `SupportsBasicMath.add` and so on. Abstraction is possible here by moving each "type signature"--destination & 1 source? destination & 2 sources? etc.--into "operation protocols", and then having fewer methods but a ton of "operation protocol implementations". Perhaps "more abstraction", definitely not concise. Very dependent on the compiler and inlining, too.
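One possible shape for a "destination & 1 source" bulk operation, sketched under assumptions: `addInPlace` is a hypothetical name, `Numeric` stands in for the article's protocol, and this is not the in-house code described above. Mutation happens only inside a `mutating` method, and copy-on-write still protects other copies of the array.

```swift
extension Array where Element: Numeric {
    // Adds `other` element-wise into self through scoped buffer pointers.
    mutating func addInPlace(_ other: [Element]) {
        precondition(count == other.count, "arrays must have equal length")
        other.withUnsafeBufferPointer { source in
            self.withUnsafeMutableBufferPointer { destination in
                for index in 0..<destination.count {
                    destination[index] += source[index]
                }
            }
        }
    }
}

var accumulator = [1.0, 2.0, 3.0]
let snapshot = accumulator // shares storage until the mutation below
accumulator.addInPlace([10.0, 20.0, 30.0])
```

After the call, `snapshot` still holds the original values because the mutation triggers a copy first, which is the COW behaviour the comment above refers to.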

It's a good writeup nonetheless, was just asking a narrow question.


The pointer approach allows for brevity and idiomatic swift at the place of use. The verbosity behind the scenes need not bother the user.

Because I couldn't find anything that provides that using the approach you are discussing, I didn't investigate its performance characteristics. For me, dev UX comes first, and I wouldn't personally be interested in reading or writing in a language that requires the "with" construct wrapping every calculation.


Key point here is Chris Lattner is working on this stuff. He’s the creator of Swift and LLVM and one of the smartest minds in the industry. Pay attention.



