

Why Go for decision trees? (2013) - rgbrgb
http://datascience.systemsbiology.net/data-notebook/

======
micro_cam
Hah, I wrote this article ages ago when we were experimenting with a data
science blog. Surprised to see it pop up here.

I think most of it still holds true, but we did run some benchmarks on public
data, and the implementation is competitive with or faster than other
implementations, including scikit-learn's Cython implementation, which wins a
lot of benchmarks.

The code, with some benchmark results in the README, is here:
[https://github.com/ryanbressler/CloudForest](https://github.com/ryanbressler/CloudForest)

~~~
rgbrgb
I submitted it because I'm playing with CloudForest right now. I really
appreciate your work, it's fun to use.

I'm a co-founder of a digital real estate brokerage[1], and as a weekend hack
I'm using CloudForest to predict sale prices for residential real estate using
data from the MLS.

A lot of the predictions are really good, with a few that are terrible (like
double the list price, which is one of the input features). Is there any way
to get a certainty value for each prediction? Perhaps I could look at the
variance of the values from each tree? I'm kind of a noob with decision trees,
so I'm probably totally off base, but maybe that question makes sense. :)

[1]: [https://www.openlistings.co/](https://www.openlistings.co/) (YC W15)

~~~
pigscantfly
Yes, you should be able to estimate a confidence interval by looking at the
distribution of the individual trees' votes (for regression, their predicted
values); see [1].

[1] [http://arxiv.org/pdf/1311.4555.pdf](http://arxiv.org/pdf/1311.4555.pdf)

------
jerf
It sometimes raises eyebrows, but this sort of post is why I classify Go as a
modern take on a scripting language. With GC, and interfaces standing in for
duck typing, it has a lot of the good characteristics of the scripting
languages, but it also has just enough low-level performance features (real
structs, real arrays, the slices the blog post mentions) that, rather than
taking the 50-100x hit you can easily take with Python, you take merely a 2-3x
hit over C, and you'll write the code significantly more quickly than in C or
Rust too. The result is that when you don't need a language that allows you,
and correspondingly _requires_ you, to deal with all the memory-management or
ownership issues of C, C++, or something like Rust, but you also can't afford
Python's performance costs or the general difficulty scripting languages have
with true concurrency and multicore, Go is a viable choice.

Go's not alone... it seems to me there are many languages taking a swing at
this niche now. All the best to all of them. I love me the flexibility of
Python or Ruby sometimes, and I may not need total system-level control, but
I can't always afford the resource gluttony of the older-school scripting
languages.

~~~
Dewie3
This development/trend seems to show that the value of the "dynamicity" of
scripting languages has perhaps been overrated. There seems to be a set of
abstractions with reasonably good to very good mechanical sympathy that are
easy to understand and use, and that don't constrain you much with regard to
factorability and coding iteration.

~~~
pekk
If there was a time when dynamic typing was overrated, then it is certainly
possible that this time is a time when static typing is overrated.

~~~
jerf
"If there was a time when dynamic typing was overrated, then it is certainly
possible that this time is a time when static typing is overrated."

I think that's actually a very important aspect of understanding _where_
dynamic typing as a hugely popular thing came from. Today, I favor static
typing. However, given a choice of Python or 1990s-C++, I'll take Python. It
could be argued that the static typing world needed to be dethroned and be
forced to improve itself in the face of the dynamic competition to become what
it is today.

On the other hand, pretending that "static typing" is the same today as it was
20 years ago is merely the mirror image of the mistake made by people who,
20 years ago, were too bedazzled by static typing and dismissed the oncoming
dynamic typing wave.

------
chenzhekl
We need generics!

------
pkroll
Don't know if the link was at a different place earlier, but it's now at:
[http://datascience.systemsbiology.net/data-notebook/](http://datascience.systemsbiology.net/data-notebook/)

~~~
dang
Thanks, we've updated the URL to that.

------
mkoryak
Why Go?

Why is Go being featured so heavily on HN lately? It reminds me of that effort
to ask a ton of R questions on Stack Overflow.

We should have more stories about languages people actually get paid to use,
like Java!

jk (kind of)

~~~
sethammons
i'm paid to write go :)

~~~
mkoryak
I wonder how many here are in your boat

