
An Introduction to Machine Learning in Julia - one-more-minute
http://juliacomputing.com/blog/2016/09/28/knn-char-recognition.html
======
shoyer
This is a cute example, but it misses the mark. The efficient way to do fast
nearest neighbor search is with a search tree (e.g., KDTree or BallTree),
which brings down query time from linear to logarithmic in the number of
items.
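
To make shoyer's point concrete, here is a toy pure-Python sketch (mine, not from the post) of a k-d tree: queries descend the tree and backtrack only when the splitting plane could hide a closer point, giving roughly O(log n) average query time versus O(n) for a linear scan. Real code would of course use a library such as Julia's NearestNeighbors.jl or scikit-learn's KDTree/BallTree.

```python
import random

def build_kdtree(points, depth=0):
    """Recursively split points on alternating coordinates."""
    if not points:
        return None
    axis = depth % len(points[0])
    points = sorted(points, key=lambda p: p[axis])
    mid = len(points) // 2
    return {
        "point": points[mid],
        "left": build_kdtree(points[:mid], depth + 1),
        "right": build_kdtree(points[mid + 1:], depth + 1),
    }

def sqdist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def nearest(node, query, depth=0, best=None):
    """Descend toward the query; backtrack only when the splitting
    plane is closer than the best match found so far."""
    if node is None:
        return best
    if best is None or sqdist(query, node["point"]) < sqdist(query, best):
        best = node["point"]
    axis = depth % len(query)
    diff = query[axis] - node["point"][axis]
    near, far = (node["left"], node["right"]) if diff < 0 else (node["right"], node["left"])
    best = nearest(near, query, depth + 1, best)
    if diff ** 2 < sqdist(query, best):  # far side could still hide a closer point
        best = nearest(far, query, depth + 1, best)
    return best

random.seed(0)
pts = [(random.random(), random.random()) for _ in range(1000)]
tree = build_kdtree(pts)
q = (0.5, 0.5)
brute = min(pts, key=lambda p: sqdist(p, q))
# the tree finds an equally near neighbour with far fewer comparisons
assert sqdist(nearest(tree, q), q) == sqdist(brute, q)
```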

~~~
staticfloat
Agreed, but since those two concerns are separate (algorithmic improvement vs.
taking advantage of parallel hardware) I'm not sure I'd categorize this as
"missing the mark" so much as that's a further improvement that could be made.
For a blog post that looks to be an attempt to tout methods for easily
exploiting data parallelism, I think focusing on algorithmic improvement would
be counterproductive.

~~~
adrianN
I tend to agree, but there is a tendency to throw hardware at things where
some algorithmic improvements would work much better. See for example this
blog post

http://www.frankmcsherry.org/graph/scalability/cost/2015/01/15/COST.html
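
That argument in miniature (a toy example of mine, not from McSherry's post): parallelising an O(n) loop across c cores still costs O(n/c), while a better algorithm can remove the work entirely.

```python
def sum_by_loop(n):
    total = 0
    for i in range(1, n + 1):  # O(n): the loop people reach for hardware to speed up
        total += i
    return total

def sum_closed_form(n):
    return n * (n + 1) // 2    # O(1): the algorithmic improvement

n = 10_000
assert sum_by_loop(n) == sum_closed_form(n)
```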

------
ovis
I was a bit disappointed that there wasn't much detail given about the problem
being attacked, and no information about the results other than some timings.
Sure, Julia's nice, but what are we looking at here?

Also, there's talk about how slick this is when using IJulia notebooks. It
would be cool to provide a link to an actual notebook.

------
MrQuincle
It would be great to see the difference with other languages. So why Julia and
not R, or Matlab, or Python? Is it more elegant, more concise, does it have
more libraries, can it be run in parallel better? That would be great to know!

~~~
Lxr
Julia is a lot (lot) faster than Python, Matlab or R. This doesn't matter so
much if you're gluing library calls together (i.e. when using well-known ML
algorithms backed by native BLAS/LAPACK and the like), but for custom stuff
Python et al. are just too slow. Julia is comparable to C++ in speed (not as
fast, but within an order of magnitude in my experience) and it's MUCH more
fun to write.

There are definitely not more libraries though, it's still a young project.

~~~
Derbasti
The problem is that plain Julia is only faster than plain CPython. Numba has
solved this problem for me by compiling my hot loops in place. Other people
report similar success with PyPy, Cython, or Pyston.
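
A sketch of the pattern being described: JIT-compile a numeric hot loop with Numba's `@njit` decorator while the rest of the program stays ordinary Python. The try/except fallback (my addition) keeps the sketch runnable even where Numba is not installed.

```python
try:
    from numba import njit          # compiles the decorated function to machine code
except ImportError:                 # fallback: run as plain interpreted Python
    def njit(f):
        return f

@njit
def sum_of_squares(n):
    total = 0.0
    for i in range(n):              # the "hot loop": pure numeric work
        total += i * i
    return total

result = sum_of_squares(1000)       # 332833500.0 either way
```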

Furthermore, libraries like scikit-learn, pandas, matplotlib, or scipy are
_incredibly_ powerful, and are usually implemented in fast languages. Python
is only there to glue them together.

For my applications, I just don't see any compelling reason to use Julia over
Python. In practice, Julia is (for the above reasons) not actually faster, and
the libraries are a lot less mature. I try Julia every few months though, and
there is progress. Maybe in a few years.

~~~
conjectures
Between ccall, PyCall and RCall you can access a huge range of existing code
from inside Julia. So the libraries issue is moot.

Having used both, I much prefer Julia to Python+Cython. Three examples:
multiprocessing is much less restrictive, there's no edit-compile loop so
development is quicker, and there's no awkward pyx/pxd system. If you have to
go deep with a Cython project, you basically end up writing C, and that's slow
going - not Cython's fault, it's a great tool, just a limitation of the
platform.

~~~
Derbasti
I don't like Cython very much. I much prefer CFFI or Numba, which have none of
your stated problems.

~~~
conjectures
A quick look at the numba docs suggests that it doesn't parallelise any better
than regular python - so there's one of my problems.

Numba also seems restricted in the functionality it offers: currently there
appear to be no user-defined types, so it's good for hot loops but not your
whole codebase. The docs also suggest it can't do full JIT compilation if you
touch the Python C API, which presumably rules out any C-extension library
code outside the range of supported numpy features. No strings?...

I'd love to be shown wrong about this.

