Swift for TensorFlow Design Overview (github.com)
198 points by asparagui 9 months ago | 45 comments

Whoa... the Python interoperability is incredible.

You can use numpy from Swift, as well as load pickle files!


At first I was fairly disappointed that Swift was chosen over Julia, and I still wish there was strong Julia support because Julia is a great language, but I've slowly been changing my mind and think Swift could be a really good choice long term.

I also just really like Chris' work and trust him to make the right calls until proven otherwise.

I love julia, but it already has a TF library that does a lot of very nice things (like not having to specify a computational graph separately from the execution) and there are other really interesting machine learning libraries (like knet).

Swift is a good choice because there are some reasonably good mobile targets, which Julia does not have. If you're deploying a ML model, you should use swift. If you're developing one, you should use julia.

Flux.jl is backend-agnostic, so it has a pure Julia backend (with GPU code generation) and in-progress ONNX and tensorflowjs export capabilities.

And then once Julia can reliably compile to WebAssembly (already being worked on), your entire app (front end, back end and ML) will work in the browser.

> reasonably good mobile targets

What's the story of Swift for Android nowadays? Last time I looked it didn't seem a practical idea... I mean, even Google chose Dart of all languages for their Flutter cross-platform mobile toolkit. And on the native side Kotlin is gaining all the ground.

To be honest I'd love to see Swift instead of Kotlin and other things for cross platform mobile...

I am not very optimistic that this would ever be a big thing. I see two possibilities how this could happen:

1. Someone would write a 100% source-compatible Swift version for the JVM, or at least a large enough subset, leaving out the parts which handle memory directly.

2. Someone develops an easy to use framework to bridge Swift through the NDK (There would be a layer of C needed to talk from Swift to the JVM code and vice versa).

Option 2 is the more realistic approach, I guess, but it would heavily affect the architecture of the apps built with it. Everyone who has tried to do the same with C++ (which is supported by the NDK directly) and wanted to keep as much code platform-independent as possible knows that.

I see Kotlin Native as more promising here in the future. As far as I know, they have already implemented interoperability with Obj-C / Swift. You could write your Android app completely in Kotlin, leaving out the horrible NDK part because everything runs on the JVM, and use Kotlin Native + Swift in the iOS version.

A little poking around brought me to this: https://www.elementscompiler.com/elements/silver/default.asp...

I know nothing more about it, but it looks worth checking out.

What are the mobile targets besides an iOS device?

I cannot understand why someone would downvote a perfectly valid question.

At the moment, Swift is really only viable for Apple devices. Any other mobile targets are a distant dream.


Pretty much the entire deep learning community is still using Python 2.7. I have moved on to Python 3.6, but Julia is still an unsafe bet because it has an unreliable community behind it.

What does unreliable community even mean? As far as I know the Julia community is strong and healthy. Just check out their Discourse site or the Stack Overflow tag julia-lang.

First of all, I am deeply involved with Julia, so be wary of any biases as I try to stay objective.

Lattner et al. outright say that "[We] picked Swift over Julia because Swift has a much larger community, is syntactically closer to Python, and because we were more familiar with its internal implementation details - which allowed us to implement a prototype much faster." and I only buy the last point – which, mind you, is an argument I would have made as well if I were sitting on a whole team specialised in a language. They acknowledge that “Julia is another great language with an open and active community”, so I would hardly say that your “unreliable” statement is shared by them.

Apart from the very strong practical argument made above, I think “Deployment” is really what it boiled down to. When Google launched TensorFlow, the general sentiment among many in the research community was “why”? As in, why this big move on the part of a major corporation? Back then I stated that the reason I saw was that for better or for worse, Google – and many others – see a future where Machine Learning (or Artificial Intelligence, if you prefer) would be embedded into almost any product. For this to be feasible, you need know-how throughout the company (see the excellent TensorFlow tutorials/training) and the ability to deploy models across their entire spectrum of devices/environments. This is what it would mean to be an “AI first company”. One of the reasons why TensorFlow at its core is a static graph, I would bet, is that they saw deployment as key back then, as they do now.

However, TensorFlow is the very embodiment of the “two language problem” [1]; as anyone that tries to wrap TensorFlow in language X quickly realises, pretty much everything you care about on a high level of abstraction is written in Python and the underlying C++ API is about as bare-bones as it gets. Fun fact, this is why TensorFlow.jl [2] goes through both the Python and C++ API. Now, add to this that PyTorch recently arrived on the scene and quickly became a very real “threat” to TensorFlow, since dynamic graphs (eager execution) are far more intuitive to work with and enable you to express a richer set of models. However, it suffers from deployment issues since it is intrinsically tied to the Python interpreter. TensorFlow Eager was most likely a stop-gap measure and it was perceived to be clunkier than PyTorch; also, after TensorFlow Fold was left to rot, I sincerely doubt that many are willing to buy into yet another code branch to interoperate with TensorFlow “proper”. This all sets the stage for the TensorFlow Swift announcement.

[1]: https://julialang.org/blog/2012/02/why-we-created-julia

[2]: https://github.com/malmaud/TensorFlow.jl

From my academic perspective, this is a great time to be a Machine Learning researcher; we have plenty of options and the attention we are receiving from other software and hardware communities is amazing. As for Swift “vs” Julia, they both have their own issues.

Swift has a nonexistent scientific computing community (this is why I find the “much larger community” argument disingenuous, as mobile application developers do not count in this particular case); they will have to build it entirely from scratch, and community building is difficult. But they have the power of Google and its minions to work on the software itself (Hello DeepMind employees, have you gotten over the sour taste in your mouth from being forced to switch from Torch (Lua) to TensorFlow (Python) yet? I suspect Mountain View has ordered another meal for you!); perhaps this is sufficient to overcome the current lack of a community? Time will tell. There is also the problem of external collaboration that we have seen with TensorFlow: while it is open source, the direction of the development is partially hidden, which makes bringing in open source contributors trickier than necessary – think, no unified public forum where things are discussed.

Julia, while it has a (strong?) scientific computing community, lacks static compilation from the perspective of “deployment” and is ultimately a team of ragtag researchers and open source contributors spread across the globe. Can they keep up? Again, time will tell. There are also the classic “issues” with Julia not being statically compiled, but I can see this swinging either way depending on the overall direction in preference for static vs dynamic over the next decade.

Ultimately, they are both on top of LLVM and I suspect that much will be learnt from the other; there is room for more than one approach. My decision to side with Julia is partially to stay my own course, partially a preference for “the bazaar” development model, and partially because I have a hunch that Julia has a better chance to capture the scientific computing community as a whole which is likely to yield benefits down the line. As I said before, time will tell.

It's going to have enough comparative advantage when a bit more mature that I can see it drawing off from the python community and entrenching itself as a serious contender.

As a longtime fan of C#, this article and use case just pushed me from “why would anyone think they needed to invent Swift” to “wow that’s an extremely cool set of language design constraints!”

> Automatic differentiation in Swift is a compiler IR transformation implemented with static analysis.

Super cool to see this implemented at the language level like this.
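
For anyone unfamiliar with automatic differentiation, here is a minimal Python sketch of the general idea using forward-mode dual numbers. Note that this is a different, runtime technique and not the compiler IR transformation the quoted overview describes; `Dual`, `_lift` and `derivative` are hypothetical names made up for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Dual:
    """A value paired with its derivative (forward-mode AD)."""
    value: float
    deriv: float = 0.0

    def __add__(self, other):
        other = _lift(other)
        # Sum rule: (u + v)' = u' + v'
        return Dual(self.value + other.value, self.deriv + other.deriv)

    __radd__ = __add__

    def __mul__(self, other):
        other = _lift(other)
        # Product rule: (uv)' = u'v + uv'
        return Dual(self.value * other.value,
                    self.deriv * other.value + self.value * other.deriv)

    __rmul__ = __mul__


def _lift(x):
    """Treat plain numbers as constants (derivative 0)."""
    return x if isinstance(x, Dual) else Dual(float(x))


def derivative(f, x):
    """Evaluate df/dx at x by seeding the input's derivative with 1."""
    return f(Dual(float(x), 1.0)).deriv


def f(x):
    return 3 * x * x + 2 * x + 1   # analytically, f'(x) = 6x + 2
```

Calling `derivative(f, 2.0)` propagates the derivative alongside the value through every `+` and `*`. A compiler-level approach aims for the same result without this per-operation runtime bookkeeping.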

Given the fact that the vast majority of people are still on Windows, using Swift (with zero official support for Windows) will artificially limit the use of the project outside the circle of the original developers.

That being said, you can use Swift through WSL, but not directly on Windows.

> That being said, you can use Swift through WSL, but not directly on Windows.

I thought so too, but I've checked the instructions to build on Windows, and now there's a way to build it natively using Visual Studio cl (or clang-cl).

Following the instructions here, it worked for me.


For the record, "official" support is still not there because there's no CI environment for Windows (currently only macOS and a few versions of Ubuntu have CI). So changes aren't guaranteed to not break other platforms.

There's upstream (in-tree) support for Darwin, Linux, Windows, Cygwin, FreeBSD, Android, PS4, and Haiku. But most of those don't have CI support yet.

Same is true for architectures. There's support for ARM, i386, x86_64, PPC64 (BE + LE), and s390x.

IIRC there's been some talk about how to extend the CI infrastructure to support more executor environments, but so far I don't think it's been a high enough priority.

I also tried it, and it didn't work for me. I asked about the error message in the swift forum and got no response...

Using the prebuilt binary for Ubuntu with WSL is WAY more user-friendly.

Another problem is proper tooling. So far my experience with the LSP implementation has been a bit disappointing.

Does Swift have a large community beyond iOS apps? Last I used it was years ago, right after it was introduced by Apple. I'm curious if it has found growth in other areas.

Not really atm. There is a small community around server-side Swift which has done some impressive work so far; the most interesting project is IMO the Vapor framework [1], but at least outside the Apple dev community it hasn't gotten much attention so far. For general systems programming, which would also be possible with Swift, Rust currently seems a lot more popular.

[1]: https://vapor.codes

For people not familiar, Vapor is close to releasing version 3.0, using https://github.com/apple/swift-nio. I've heard rumors that the performance is really good, but I haven't run anything personally.

Swift itself is incredibly fast, at least in microbenchmarks. I last looked at the numbers a year or so ago, but IIRC it was about as fast as C, which is several (as in 5-6) orders of magnitude faster than, say, Python.

In terms of "level of abstraction" vs "speed", Swift is definitely at a Pareto optimum (if that's the term for "can't get better in one dimension without losing something in the other").

Microbenchmarks aren't the be-all yada yada yada... And it's somewhat ironic that the server framework is named Vapor. But for web apps that are computation-heavy, I think it would be a valuable option to have.

Yeah, here are some old Vapor benchmarks for curious people: https://medium.com/@codevapor/server-side-swift-vs-the-other...

I drop in on the Vapor Slack group occasionally. I heard someone mention that version 3.0 was passing Go in benchmarks. Apparently they had some performance regressions integrating nio (it was just released last month) - not sure what the status is now.

I've seen a few server-side Swift frameworks (for building web apps or API apps) such as Perfect[1] and Kitura[2] (which is backed by IBM). It's not clear to me how much uptake they have so far.

[1] https://perfect.org/

[2] https://www.kitura.io/

I was starting to use Kitura but it seems the community is more in favor of Vapor right now. I moved on to Vapor as a result. IBM seems like it might abandon the project like they did their Swift Packages website.

Vapor is the likely winner in terms of Swift frameworks, but this does not say much.

The API's not stable, the framework is in very active development.

I imagine things will settle down in a year or two - until then, there are so many great alternatives (other languages), that I don't see Swift gaining much ground.

For a while it seemed IBM was about to pick Swift as they did before with Java, but then they lost steam.

Ah, was there some news about it, or is it just that IBM seems less active on Swift http/ssl stuff lately?

They seem to be less active.

I wouldn't want to use it for anything that would evolve beyond 100k lines of Swift code until the language improves its build and IDE performance.

And if you're making a backend, why not use Kotlin, which is fairly similar but gives you the entire Java ecosystem?

Still a bit more verbose than Python. What would you gain by doing the same in Swift actually? If you have to type more code to do some experiments, and still have to import Python libraries for extra functions as they do in that example, what's the selling point? (besides it being cool)

Here's some context: https://github.com/tensorflow/swift/blob/master/docs/WhySwif...

Some benefits vs Python are static types (catching more errors at compile-time vs hours into a training run), no GIL, don't have to drop to C++ for higher performance.

Some benefits vs some other (non-Python) languages are a shallow learning curve, small boilerplate, safety-by-default, and a growing community.

Obviously there are places where Python and others have advantages over Swift too, and several of those are called out in the paper.

Is it really likely that you'll catch a type exception hours into a training run? The way tf works it'll catch a shape exception before you even start training.

It's not likely, but I've had it happen to me where 2200 batches in it crashes because the graph has dynamic sizes and the input is not well formed. Static typing would not have helped.

But, static typing could help provide much more useful error messages than what you get from the Python code.
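
One way to guard against the crash-2200-batches-in scenario from this subthread is to validate every batch's shape before training starts, independent of any type system. A minimal pure-Python sketch, assuming batches are plain nested lists; `shape_of` and `validate_batches` are hypothetical helper names, not TensorFlow APIs.

```python
def shape_of(batch):
    """Return the shape of a nested list, or raise if it is ragged."""
    if not isinstance(batch, list):
        return ()            # a scalar has an empty shape
    if not batch:
        return (0,)
    inner = [shape_of(item) for item in batch]
    if any(s != inner[0] for s in inner):
        raise ValueError("ragged input: %r" % (inner,))
    return (len(batch),) + inner[0]


def validate_batches(batches, feature_shape):
    """Check every batch up front; only the batch dimension may vary."""
    for index, batch in enumerate(batches):
        try:
            shape = shape_of(batch)
        except ValueError as err:
            raise ValueError("batch %d: %s" % (index, err)) from None
        if shape[1:] != tuple(feature_shape):
            raise ValueError(
                "batch %d: expected (*, %s), got %s"
                % (index, ", ".join(map(str, feature_shape)), shape))
```

Running `validate_batches(batches, (2,))` over the whole input before the first training step turns a late crash into an early error that names the offending batch. The cost is one extra pass over the data.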

I don't have experience of a type exception hours into run, but for me, deploying code to the test environment is long, so the edit-run cycle is very long. If the compiler can catch the errors locally, then edit-(compile)-run is shortened.

This sort of feeds back on itself. I use estimators, so type errors occur far into a run, but not during training itself, and checkpoints mean very rapid recovery, so ...

I think I actually stopped caring so much because the recovery is so fast.

A big help for me was coding a little script so that I can send a notification to my Android from the CLI and/or Python, so I know when these things need attention.

I think the primary advantage of using this is that you get both the usability of writing imperative code (like PyTorch or TF Eager) and the benefits of having a computation graph (easier distributed training, using TPUs and stuff).

Static typing is nice.

But no GIL is the most important part. Multithreaded preprocessing of the data fed to the network is almost a must these days, but that is hard to achieve in Python. Multiprocessing or that PyTorch magic alleviates the problem, but they are not as performant as multithreading.

In Keras you can just use a Sequence-based generator, set a large number of workers, or even put in multi-GPU callback, and you can parallelize pre-processing as much as you like.

I doubt that Keras can break the limit of the CPython interpreter. Its parallelization is probably also done with multiple processes, in no way as efficient as multithreading.

I might take a look at Keras source just to know :) But still, if you do image augmentation on the fly, you are probably fine even with processes if your batch size is sufficiently large.
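
The thread-versus-process trade-off discussed above can be sketched with the standard library's `concurrent.futures`. This is an illustration only, not how Keras or PyTorch implement their data loaders; `augment` is a hypothetical stand-in for a CPU-bound preprocessing step and `preprocess` is a made-up helper name.

```python
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor


def augment(pixels):
    """Hypothetical CPU-bound step: flip a row and rescale to [0, 1]."""
    return [p / 255.0 for p in reversed(pixels)]


def preprocess(images, workers=4, use_processes=False):
    """Map `augment` over images with a pool of threads or processes.

    Threads share memory, but in CPython the GIL lets only one of them
    execute Python bytecode at a time. Processes run in parallel, but
    every input and output is pickled across the process boundary.
    """
    pool_cls = ProcessPoolExecutor if use_processes else ThreadPoolExecutor
    with pool_cls(max_workers=workers) as pool:
        # pool.map preserves input order in its results.
        return list(pool.map(augment, images))
```

With `use_processes=True` the pickling overhead is paid per item, which is why larger batches per task tend to amortize it better. On platforms that spawn rather than fork workers, the call should sit under an `if __name__ == "__main__":` guard.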

Swift for Tensorflow or Tensorflow for Swift?

Yes, it's right up there with "Windows Subsystem for Linux".
