

Growing a Language with Clojure and Instaparse - gigasquid
http://gigasquidsoftware.com/wordpress/?p=689

======
michaelochurch
One of my future blog posts is going to argue that what makes a problem
area "sexy" often has to do with how much of computer science you touch along
the way. I call it the Ingress Factor: a field's coolness is directly
proportional to how likely it is to expose you to other, seemingly
unrelated things. For example, an inordinate number of people want to get into
"data science"/machine learning right now. What makes ML attractive, I would
argue, is how much it brings you into interaction with all sorts of CS stuff
that the corporate world would otherwise block out as "too hard" for average
developers.

Clojure has this same aura, I think. What's great about it is how much of
computer science is accessible at the REPL. It's not about firing up some IDE
and not knowing how any of the magic works. Whatever it is, the community
makes sure that there's a way to dive in and explore it quickly, and that's
really admirable.

~~~
kyllo
_an inordinate number of people want to get into "data science"/machine
learning right now_

Guilty.

I think, though, that the reason for the data science boom is simply that
it's the next phase in using technology to automate business.

First, we had CRUD apps to allow data entry and storage, with database rows
replacing paper documents. Later, these apps further automated the workflow
through data interchange and web services (EDI, SOAP, REST, etc.) to make sure
the data only needed to be entered one time and could be shared and reused
across trading partners. In this phase, the primary concern was acquiring data
--there wasn't enough of it yet to really make sense of it.

The next phase was the "business intelligence" industry of the last decade,
with analysts slicing and dicing data to make reports and views and
visualization dashboards for managers, so they could use sums and averages and
pretty pictures of line/bar/pie charts to try to make sense of business
performance data and use it to inform (or simply justify) their decisions.
Lots of BS going on in this phase, as most of these analysts and managers have
little to no actual statistics training, don't deeply understand the
visualizations, and only know "enough to be dangerous."

The next phase, in which enterprises have much more data (and more data
sources and formats) than an analyst armed with a BI stack can handle, is
actually using parallel processing and rigorous statistical modeling to infer
trends from the data, and even applying machine learning techniques to
automate the decision-making itself. This automates away the managers' and
analysts' responsibilities, and now all you need is a team of data scientists
and a technically knowledgeable executive with authority to approve the
models. I see it as actually centralizing executive control by automating away
the lower levels of decision-making in the organizational hierarchy. That's
where the game is headed.

~~~
michaelochurch
I am with you on the attractiveness of the data scientist role. I know many
people with that title who work on distributed systems, compiler hacks for GPU
code, etc. It seems to mean "software engineer who is good at math and
therefore smart enough to attack the most interesting projects".

If you're a software engineer, there are a million jerks out there who don't
understand your job but think they could do it just as well given a couple "21
days" books. If you're a data scientist, you get a lot more autonomy and dibs
on the interesting work.

~~~
kyllo
_If you're a software engineer, there are a million jerks out there who don't
understand your job but think they could do it just as well given a couple "21
days" books. If you're a data scientist, you get a lot more autonomy and dibs
on the interesting work._

Well, to paraphrase something you've said in your blog, successful convex work
tends to become concave over time. The first business CRUD app was a huge
breakthrough. But today, making CRUD apps, which is still how a whole lot of
programmers make their living, is becoming an increasingly concave task, as
the technically difficult parts are being abstracted away. We won't really
need any more LAMP developers soon--Rails is really only one layer of
abstraction below the point where non-programmers will be able to generate
functioning CRUD apps with a few mouse clicks to collect their business data.
So, those "million jerks" are getting closer to being right every day, if your
job as a software engineer is just making CRUD apps.

So, now "data science" (aka massively parallel processing of large distributed
datasets with statistical modeling and machine learning techniques) is where
the convex work is. And of course you get a lot more control and autonomy when
you're doing convex work, because it's uncharted territory and no one really
understands it yet, so there are no standards or best practices to manage your
performance to.

------
airlocksoftware
gigasquid, do you happen to know if this is compatible with ClojureScript, or
does it have some Java dependencies? I ask because I started writing a
Markdown processor in ClojureScript, but I eventually realized the find-and-
replace-based approach I took (similar to John Gruber's version) wasn't going
to cut it.

~~~
gigasquid
After a quick look at the library, there do seem to be some Java classes
being used, like java.io.BufferedReader. But having a ClojureScript port is a
great idea!

