Why data scientists should start learning Swift (fritz.ai)
55 points by Ethcad 8 months ago | 66 comments



Why would you use Swift as your new data science language when Julia was made for that purpose and Swift was not? Julia's data structures, functions, syntax and libraries were all designed with scientific computing in mind. Swift was designed for general purpose app development.


> Julia's [was] designed with scientific computing in mind. Swift was designed for general purpose app development.

General purpose always wins. Or should always win. Because in reality nobody has any idea what "the purpose" is in the grand scheme of things.

Python succeeded because it was general-purpose enough. Javascript too. We don't want more narrow purpose languages that force us to change the language every time we change the fragment of the stack we work on.

We need an ultra-general purpose language with good support for both OOP and FP, a sane type system, decent performance, and a good "compile to readable JS" story... to unify this damn mess of "diversity" that forces us to over-specialize in narrow niches and drowns us in complexity.

(No, otoh, I don't think "general purpose" should mean "infinite power" or "maximum expressivity". There's a reason why we're not all using Common Lisp and Scala...)


You really should give Julia a go. It's as or more expressive in the general purpose sense of things as Python. It's the best of MATLAB and Python in one neat package.

Edit: Plus, you can pass your data structures out to Python or C for processing. And you can use a whole host of visualization tools.


I did. And it seems awesome.

I just don't see people with a "software engineering" background taking any serious liking to it. So it creates a DIVIDE between the "software engineering folks" (who want regular-looking OOP and basic FP architecting features for APIs and stuff) and the "data science folks" who just want to focus on the algorithms.

The litmus test for a "truly general purpose language" to me would be:

(1) write some algorithmic code in it (with not much concurrency and parallelism)

(2) write some (purposefully heavily overengineered) GUI or web-app (full-stack) code in it in a team of 3+ including at least one guy who's both really junior and another guy who's really sloppy

(3) write something making heavy use of networking, concurrency and parallelism

If all three feel EQUALLY natural in a language, then you've got a truly general purpose language. If not, look for something else.

And I know, people hate general purpose solutions just as much as they hate "expert generalist" people, and they have good reasons too, as we've all (or most) been burnt bad by contact with both such "solutions" and with such self-labeled people in the past. But just because we generally suck at "general purpose" doesn't mean we should stop trying!


> Python succeeded because it was general-purpose enough.

> I just dont see people with "software engineering" background taking any serious liking to it.

I guess my point is that if you found Python general purpose enough, you'd likely find Julia general purpose enough too. If people with "software engineering background" take a serious liking to Python but not to Julia, then the reason probably isn't the language itself, but a combination of lack of popularity and a pre-conceived notion that the language is meant to be "scientific" not "general-purpose", that there aren't enough libraries, that the language might not survive, etc.


And I just came across a link to a new Julia article thanks to HN: https://increment.com/programming-languages/goldilocks-langu...


Expressiveness of the language is one thing. Does it have the vast numbers of quality and well tried and tested libraries that Python does? That's generally more important than expressiveness.


I don't think that's necessarily true but can't help but still agree with you. There are tons of wonderful DSLs and niche languages out there, but with terrible interfaces to other languages that want to do more with it. In my case that's mostly R and Minizinc, but I know there's plenty more with the same problem.

But it doesn't have to be that way. SQL is a domain specific language. Regex is a domain specific language. They take different API strategies: one goes for ubiquitous and standardized interfaces, the other goes for direct embedding. But they both prove that it's not necessary to write everything in a general purpose language.


All programming languages have a limited scope of applicability that is a natural product of the design choices that went into them. The idea of a general purpose language that is exactly the right fit for every task at hand is a myth.

There's nothing wrong with designing a language that is particularly good at data analysis, and which emphasises those features at the expense of others.


> Swift was designed for general purpose app development.

Swift is a general-purpose language, but I would say that its first purpose is for mobile development. (Or, at least, that's where it started.)

> Why would you use Swift as your new data science language when Julia was made for that purpose and Swift was not?

The author calls out Swift as being good because (1) TensorFlow is now supported for it specifically and (2) it is optimized for mobile development. The author seems to feel that being able to deploy machine learning applications to mobile devices (and optimized) is a great boon, and Python is not well-suited to this.


Heh, because 1-based arrays are gross. I'm poking fun of course, but you really shouldn't underestimate how many people are turned off of Julia by this.


It's not like it's the first. Doesn't Fortran default to 1-based? R and Matlab are also 1-based. Julia, like those two, is aimed at a mathematical domain, not at zero-based offsets.

Anyway, it's not hard to get used to.


A bunch of wrongs don't make a right :-)

And honestly, it's because I do numerical programming that I value zero-based offsets. In addition to subscripting arrays (which I could do in any base), I use those subscripts in the math itself. For instance, the zeroth bin of an FFT indicates the zero frequency. I also choose the zeroth array element to represent the constant term (zeroth power) of a polynomial, and so on.

The common places where math notation uses 1-based subscripts (matrix notation) have more to do with people saying "first", "second", etc... With a few exceptions (the Hilbert matrix comes to mind), the base of the subscript isn't actually relevant to the math itself.
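
To make the two examples above concrete, here is a minimal Python sketch (Python being zero-based). The helper names `poly_eval` and `dft` are made up for illustration, and the DFT is the naive O(n^2) definition rather than a real FFT. It only shows that index 0 doubles as "power 0" and "frequency 0".

```python
import cmath

def poly_eval(coeffs, x):
    """Evaluate a polynomial where coeffs[k] multiplies x**k."""
    # With zero-based storage the index k is also the power of x.
    return sum(c * x**k for k, c in enumerate(coeffs))

def dft(signal):
    """Naive discrete Fourier transform; bin k corresponds to frequency k."""
    n = len(signal)
    return [
        sum(signal[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]

# 1 + 2x + 3x^2 evaluated at x = 2: the constant term lives at index 0.
value = poly_eval([1, 2, 3], 2)

# A constant signal puts all of its energy in bin 0, the zero frequency.
spectrum = dft([1.0, 1.0, 1.0, 1.0])
```

In a 1-based language the same code needs a `k - 1` somewhere, either in the exponent or in the subscript.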


Generally math formulae are one based, the FFT and the Taylor expansions being the major exceptions. Natural numbers, by convention, unless you're Bourbaki, start at one.

I do appreciate that zero is easier because of offset calculations, but you really do get used to it and in most cases the compiler figures it out with almost no penalty.


> Generally math formulae are one based, the fft and the Taylor expansions being the major exception.

What formulas do you regularly use which benefit from being 1-based? Note, I'm not asking for instances where the index doesn't really matter and the author of a paper simply chose base-1.

There are clearly cases where zero based is more natural. But I don't regularly use any cases where one based is an improvement.

I will admit that zero-based adds a lot of confusion when communicating to other people.


> A bunch of wrongs don't make a right :-)

Funny how I always thought zero-based numbering[1] was a hack. Only in computer science does a hack become the right way? Any other fields where hacks became the new norm due to technical restrictions?

Your statement is a opinion.

To add to the list of languages using arrays in a "gross" way: pgsql, pascal, lua.

[1]: https://en.wikipedia.org/wiki/Zero-based_numbering#Origin


The way I've thought about this is that it's about whether an index is the name or the offset of an element of an array. Indices have a torsor-like structure, where you can subtract indices i and j to get an offset j-i from index i, and there is the relationship a[j] == a[i + (j-i)]. Zero-indexing is the special case that the name is the offset from the first element.

Because of this, I figure any language that supports 1-indexing should also support arbitrary ranges for the indexing, like in Ada. (Basically, what's so special about 1?)
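
A rough Python sketch of that name-versus-offset idea, assuming a hypothetical `RangedArray` wrapper (not a standard class, and only loosely inspired by Ada's arbitrary index ranges). The index is the "name", and subtracting the first valid index gives the offset into storage.

```python
class RangedArray:
    """List wrapper whose valid indices start at an arbitrary `first`."""

    def __init__(self, first, values):
        self.first = first
        self._values = list(values)

    @property
    def last(self):
        # Largest valid index (the "name" of the final element).
        return self.first + len(self._values) - 1

    def __getitem__(self, index):
        # The name minus the first name is the offset into storage.
        offset = index - self.first
        if not 0 <= offset < len(self._values):
            raise IndexError(f"index {index} outside {self.first}..{self.last}")
        return self._values[offset]

one_based = RangedArray(1, ["a", "b", "c"])
centered = RangedArray(-1, ["a", "b", "c"])
zero_based = RangedArray(0, ["a", "b", "c"])  # special case: name == offset

# The relationship a[j] == a[i + (j - i)] holds for any choice of `first`.
i, j = 1, 3
same = one_based[j] == one_based[i + (j - i)]
```

Here 0-indexing and 1-indexing are just two values of `first`, which is the sense in which neither is special.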


I always thought it neatly mimicked the way we think of age. When you are born you start at 0. You only turn 1 after you have lived a year.


That's an argument for starting at 1, not at 0.

At age 0, you have no elements in your array of years. So the first year is also element 1.

If you want code to reflect the way you've done counting and arithmetic your whole life, you'd want it to start at 1.


That would be defining it as a time span and using the time since start as index.

Which to me is one step away from using the timepoint as the index itself (zero as start), versus (1 year from zero).

How would you index the time before the first birthday?


Insofar as a timespan is the same as counting years, then yes. If elements in the array are years, then the time before your first birthday doesn't appear.


> Your statement is a [sic] opinion.

Yeah, that's why I put a smiley there. :-)


Originally, FFTs were the only time I missed 0-based indexing in Julia. But the neat FFTViews.jl package (https://github.com/JuliaArrays/FFTViews.jl) addresses this, and even improves on it by letting you use periodic indices avoiding the need for functions like fftshift.
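
For readers who have not seen periodic indexing, here is a small Python sketch of the general idea. It is not the FFTViews.jl API. `PeriodicView` is a made-up class, and the string labels stand in for spectrum values stored in the usual FFT output order (non-negative frequencies first, then negative ones).

```python
class PeriodicView:
    """Read-only view where index k wraps around modulo the length."""

    def __init__(self, values):
        self._values = list(values)

    def __len__(self):
        return len(self._values)

    def __getitem__(self, k):
        # Python's % returns a non-negative result for a positive modulus,
        # so negative frequencies map onto the tail of the storage.
        return self._values[k % len(self._values)]

# Usual FFT output order for n = 5: frequencies 0, 1, 2, -2, -1.
bins = PeriodicView(["f0", "f1", "f2", "f-2", "f-1"])

# Reading the bins from -2 to 2 gives the centered order directly,
# without physically rearranging the array.
centered = [bins[k] for k in range(-2, 3)]
```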


"Should array indices start at 0 or 1? My compromise of 0.5 was rejected without, I thought, proper consideration." -- Stan Kelly-Bootle


Also:

  $[ = 1;
(Yes, I'm _old_...)


And don't forget the venerable:

   $| = 1;



I think it’s addressed in the article. He wants one language that spans many domains, including mobile. He even addresses Go in the comments section as not being available on Android.

Personally, I’m not aware of the ability to write mobile apps in Julia.

If I’m wrong, please post links, as that would certainly open up the conversation.


>engineers need a language that treats machine learning as a “first class citizen”

Machine/Deep Learning are not some novel application, we've been multiplying matrices since forever.


Like with Fortran, APL, J/K, R and for at least a decade, Numpy.


Hm, no! Data scientists are not using only TensorFlow... The article mentions libraries like NumPy, SciPy, and Pandas, and Swift either does not have equivalents or has nothing at that level of maturity. It's not only about the beauty of a language; especially for data scientists, it's the variety and maturity of third-party packages designed for them, and TensorFlow is only ONE part. Don't get me wrong... Swift is beautiful and it is great to have TensorFlow running natively on it for certain scenarios, but I don't see Swift as a bright and good language outside the Apple ecosystem (look at C#: great language, still mostly Windows).

If you are starting with Python, definitely start with Python 3 and you are future-proof...


Just to give an example of why a "domain specific" language like Julia is more appealing than a "general purpose" language like Swift: I would like to demonstrate this with the classic Fortran vs C++ discussion in numerical computing.

In Fortran, you can write linear algebra on n-dimensional arrays (much as in NumPy and Julia) very compactly, e.g.

   d(i) = TRANSPOSE(MATMUL(B(i,:),c))
Writing something like this in C++ is absolutely possible and elegant with modern template libraries such as `eigen`. However, the compilation will be slower, the compiler errors will be hard to read, and it is hard to beat Fortran's runtime efficiency for such code.

But it gets more interesting. Think of tensor contractions. This is something where you probably want to implement your own algebra (say for relativistic quantum mechanics or for general relativity) -- or you just stick to the n-dimensional array again and use index-wise loops:

   DO i=1,4
   DO j=1,4
   DO k=1,4
   DO l=1,4
     A(i,j) = A(i,j) + B(k,l)*C(i,k)*D(l,j)  ! note: come up with better examples; assumes A is zeroed first
   END DO
   END DO
   END DO
   END DO
I maintain a templated C++ library to write such expressions in one line instead of 4 loops. But unlike this Fortran code, in order to understand my code, you first have to learn this library. That means you need to learn C++, then the library. In Fortran, it is just Fortran. Nothing more.
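
For anyone who does not read Fortran, here is the same loop nest as a plain Python sketch, assuming it is meant as a contraction that sums over k and l, i.e. A(i,j) = sum over k,l of B(k,l)*C(i,k)*D(l,j). The function names are made up, the matrices are assumed square and equally sized, and indices run from 0 rather than 1.

```python
def contract(B, C, D):
    """Return A with A[i][j] = sum over k, l of B[k][l] * C[i][k] * D[l][j]."""
    n = len(B)
    A = [[0.0] * n for _ in range(n)]  # start from zero so the terms accumulate
    for i in range(n):
        for j in range(n):
            for k in range(n):
                for l in range(n):
                    A[i][j] += B[k][l] * C[i][k] * D[l][j]
    return A

def matmul(X, Y):
    """Plain square-matrix product, used only to cross-check `contract`."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

B = [[1, 2], [3, 4]]
C = [[0, 1], [1, 0]]
D = [[1, 0], [0, 1]]

# This particular contraction is just the matrix product C * B * D.
A = contract(B, C, D)
```

Since this contraction collapses to two matrix products, it is admittedly not the best showcase for index-wise loops; contractions over higher-rank tensors are where the loop form earns its keep.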

Believe it or not: many scientists are not good programmers. Cut-down domain specific languages are perfect for keeping them from losing time on weird compiler features such as "const", templates and all that overhead, time that is hard to win back.


> But contrary to this Fortran code, in order to understand my code, you first have to learn this library

Do you ? I've used Eigen and boost a lot of times and didn't ever need to "learn how the sausage is made", just looking at examples is enough to get stuff to work.


Whether you look up examples or reference documentation does not change the fact that, on top of a given language, you have to learn the new concepts of a library.

In contrast, domain specific languages have exclusive support for certain data types built right into the heart. There is a need for that.


I don't really understand the difference between reading three pages of documentation on a language website or two pages on a language website and one page on a library website.


Correct title: Why TensorFlow developers should start learning Swift

Why on earth should data scientists ditch the IPython/Jupyter/SciPy/etc. ecosystem?


I think there is a subtlety you missed in the article. He didn’t say ditch Python, he said learn Swift. He simply states that 10 years from now people will be using Swift, not Python.

Obviously he may be wrong, but it’s certainly a reasonable prediction.


While the author says "Don’t mistake Swift for TensorFlow as a simple wrapper around TensorFlow to make it easier to use on iOS devices.", the only thing Python is missing from his wishlist of features is "6. Native execution on mobile".

"7. Performance closer to C" is a non-issue - all the parts where performance matters are going to run on CUDA anyway and there's no performance hit there, and very little computing time is spent in the actual python code.


Still an issue, just not where you think. For recent, more efficient CNN architectures _data augmentation_ is a bottleneck when done on a single thread. So Python has to resort to either queues and async (TF approach, worse perf than PyTorch in practice), or use multiprocessing (PyTorch approach, works better but ugly AF under the covers). I would absolutely love to use a multi core-capable language there. The machine does have several dozen cores after all.
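
As a rough illustration of the multi-process workaround in Python, here is a sketch using only the standard library. It is not how PyTorch or TensorFlow actually load data. `augment` is a hypothetical stand-in (a horizontal flip on nested lists) for real, CPU-heavy augmentation, and `augment_batch` is a made-up helper.

```python
from concurrent.futures import ProcessPoolExecutor

def augment(image):
    """Hypothetical stand-in for real augmentation: mirror each row."""
    return [row[::-1] for row in image]

def augment_batch(images, workers=1):
    """Augment a batch, optionally spreading the work over several processes."""
    if workers <= 1:
        # Single-threaded path: the bottleneck described above.
        return [augment(image) for image in images]
    # Separate processes sidestep the GIL, at the cost of pickling every
    # image on the way to the worker and again on the way back.
    with ProcessPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(augment, images))  # map preserves input order

batch = [[[1, 2], [3, 4]], [[5, 6], [7, 8]]]
flipped = augment_batch(batch)
```

Calling `augment_batch(batch, workers=4)` uses the process pool; on platforms that spawn rather than fork, that call should sit under an `if __name__ == "__main__":` guard. The pickling overhead and the guard are part of what makes this approach feel ugly compared with threads in a language without a global interpreter lock.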


Does Swift have the libraries that Python/Matlab has? My wife is doing some kind of fMRI analysis/research for her PhD and she's using Matlab. I'm an iOS developer, so I'd love it if I can get her to use Swift!


Not sure about Swift libraries, but you should definitely try Julia. It's incredibly easy to move from MATLAB to Julia.

https://julialang.org/


Most fMRI toolboxes are written in Matlab or Python unfortunately. You don't really want to go down the path of trying to rewrite (most of them) too.


If you mean the matlab toolboxes (eg, image processing toolbox), which are proprietary libraries that you pay like $1k per year to use, no.

Honestly, as someone who wrote matlab professionally for a couple years, the price is a joke, and the performance is jokier. The only reason that matlab still exists is that they use the same marketing tactics as drug dealers- it's free to universities, and super easy to use.


I know companies like e.g. Airbus which replaced Matlab with Python.


Google just released TensorFlow support for JavaScript and Swift. I don't understand why somebody would go with Swift for this if JavaScript is the language of the web. With JavaScript you could make not just a web app but potentially an almost-native app using frameworks. So where is the use case for Swift? Probably running on IoT devices? Or is Swift faster than JavaScript?

JavaScript should just upgrade its syntax to be more like Swift's in the near future; that would be a game changer.


I haven't followed the news about Chris Lattner. For those who, like me, hadn't seen that he's at Google now:

http://nondot.org/sabre/

"I worked for Apple from July 2005 to January 2017, holding a number of different positions over the years" "This included managing the Developer Tools department, which was responsible for Swift Playgrounds for the iPad, Xcode, and Instruments, as well as compilers, debuggers, and related tools. In early 2017, I briefly ran the Tesla Autopilot team. We built a lot of great things, but Tesla wasn't the right fit for me."

Joined "Google Brain" in August 2017.

https://techcrunch.com/2017/08/14/swift-creator-chris-lattne...


The end to end application building aspect of python is not yet there with swift (swift for servers?). Also if folks keep sticking to tools well tuned for their jobs, maybe something like graalvm may provide enough interoperability and performance eventually ... in the "good enough is the competitor to the best" sense.


Swift runs on Linux and there are also plenty of web frameworks, so it’s definitely possible to run on a server


>> A clean, automated way of compiling code for specialized hardware from TPUs to mobile chips

If it is so good at that, why are we waiting so long for expeditiousness on run-of-the-mill industry-standard non-specialized non-Apple linux machines?


I hope Swift gets serious about serving the needs of data scientists too. It would not take that much work to improve Swift Playgrounds to the point of enabling a far better DX than can be had with Jupyter notebooks or Matlab ...


Agreed. I monitor this page [1].

So far, no Swift. :(

[1] https://github.com/jupyter/jupyter/wiki/Jupyter-kernels


Just found this. Not sure how good it is, but it’s a start....

https://repl.it/languages/swift


I don't know anything about Swift, but I really wish ML had settled on a language with higher performance. Interpreter speed doesn't bottleneck vision stuff but reinforcement learning really suffers.


Honest question: isn't Python just glue over the well-performing C++ TensorFlow library? I thought it was, but given this article, does Python do more than that? There wouldn't be much to gain by swapping the wrapper.


It may sound weird, but I believe if any data scientists switch to a ‘nontypical’ language for the domain, it should be JavaScript.

What’s required for data science is a healthy ecosystem of scientific computing tools. While js obviously isn’t as mature as python (anaconda stack + Jupyter, etc.) or R (tidyverse etc.) in this respect, it has made great strides recently: tensorflow.js, observable notebooks, mathjs, and simple-statistics / jstat.

Furthermore, with tools like d3 + leaflet, js has very little competition when it comes to data visualization.

A big thing holding js back is a mature library for data manipulation, hopefully this changes in the future (anybody know of any potential fills for this gap?).


Maybe you should use Kotlin instead. It runs natively on iOS, on the server, and on Android as well.


Honestly I don't even use Swift for serious iOS apps, just for 'throwaway' apps for lack of a better word. Swift would need to stabilize for at least 5-10 years before I would consider building anything with Swift as the foundation. I strongly believe in backwards compatibility, the Swift team does not.


Yes and no... They did a ton of work to be backwards compatible with ObjC, while making breaking changes to Swift syntax/libraries with each new release. It is stabilizing, though.


Swift? Um, okay... Good language, yes, but it's still very iOS-centric. I generally agree with the assessment of Python, but I'll stick with R for everything it can do. I use Python for everything else. If Swift breaks out of the iOS box then maybe I'll think about learning it.


Python seems a lot more "fun" than Swift IMO


How do you mean?

I'm a big fan of Python, so I'm just curious what you mean here. Swift has a lot of great language features that Python lacks, in my opinion. (My first favorite: native option types. Second favorite: internal and external function parameter names. There are more, but these are my top two.)


i like Swift, i think, but prefer something more dynamically typed i guess. Swift gives me that “i’m using C++” feeling.


Please stop balkanizing the scientific software development community. Python has excellent wrappers for many other excellent scientific libraries which in turn leverage C and Fortran for high performance computing.


When people say Python, I never know if they mean 2.7, 3.x, or both, or are unaware that there's a difference, or don't realize how much it matters in practice... so a language that has clear forward momentum, focuses on the latest version, and quickly deprecates old versions is pretty welcome.


It's a lot easier to "quickly deprecate" when the ecosystem is small and breakage is acceptable. Let's see how Swift handles the situation at a similar point in its life cycle. Bear in mind that most criticisms of the Python 2/3 situation came from people who wanted less breakage - not more.


The Python community focuses on stability and maintains each major and minor release for a long time, so applications built with a particular version of Python keep working, and keep receiving updates, for a long time.



