
Google Sawzall now open source - mokeefe
http://code.google.com/p/szl/
======
DrJosiah
I took a Sawzall class for a week at Google along with my team. Our manager
had requested it because there were a lot of things that we wanted to do with
logs in the long term, which were quickest to do with Sawzall. The people who
taught the class were awesome, but I can't help feeling that they were a bit
apologetic for 1) knowing so much about the language, 2) having to teach us,
and 3) not being able to change the language in any way. But maybe that's
hindsight and altered memories based on the dinner and drinks we shared after
the week of classes.

The opacity of the language led to a shared conclusion among those in the class
(some of whom were taking it as a refresher): that all of the unique Sawzall
code at Google had already been written in the first few months of its use,
and everyone else had just been copying and pasting snippets from everyone
else's scripts.

I can understand why Google released it: it's the start of a halfway decent
map-reduce implementation, with low-overhead startup, quick runtime, etc.
(compared to Python, which had been an initial logs processor, but which was
punted for logs-processing mapreduces thanks to its relatively high startup
costs compared to processing time). But with things like Hadoop (and its
support for arbitrary languages for operations), I can't help but feel like
this is a little late to the open source game.

Also, back at Google, I had started a project to translate a subset of
Python to Sawzall so that people would not have to suffer, and could
potentially write better logs processing code. I left before even getting
close to finishing it.

~~~
andrix
I'm wondering if this is the start of a series of open source launches by
Google of its core tools. AFAIK, Sawzall without mapreduce is like a car
without the engine, but anyway I'm very pleased to read the code and start
trying the language. Kudos to Google!

~~~
eof
Do you mean an engine without a car? It seems like you mean sawzall is only
useful for mapreduce.

~~~
DrJosiah
Sawzall as a language is quite a bit uglier than the vast majority of general
purpose languages. Couple this with the read/emit nature of the language, and
it's either useful as a stream processing language, or as a step in a
mapreduce chain.

Given how good other languages are at processing streams, tagging output,
etc., and that Sawzall doesn't really have an idea of shared state between
"records" (aside from data emitted), it's hard to find things that Sawzall is
good at _other_ than mapreduce.
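
As a rough illustration of that read/emit model, here is a minimal sketch in
Python rather than Sawzall. The names `process_record`, `emit`, `run`, and the
`requests_by_status` table are hypothetical, and the log format (a path
followed by a status code) is an assumption made up for the example. Each
record is handled in isolation, and the only way information leaves is through
`emit`; the summing happens outside the per-record code.

```python
from collections import defaultdict


def process_record(line, emit):
    """Handle one record in isolation; emit() is the only output channel."""
    fields = line.split()
    if len(fields) < 2:
        return  # skip malformed records
    status = fields[1]
    # No state survives this call except what is emitted.
    emit("requests_by_status", status, 1)


def run(records):
    """Stand-in for the aggregation step: sums emitted values per (table, key)."""
    tables = defaultdict(lambda: defaultdict(int))

    def emit(table, key, value):
        tables[table][key] += value

    for line in records:
        process_record(line, emit)
    # Convert nested defaultdicts to plain dicts for easy inspection.
    return {name: dict(rows) for name, rows in tables.items()}


# Assumed sample input: "<path> <status>" per line, plus one malformed line.
sample_log = ["/index.html 200", "/missing 404", "/about.html 200", "garbage"]
result = run(sample_log)
print(result)
```

Because `process_record` keeps nothing between calls, records can be handled
in any order or on any machine, which is why the model maps so directly onto a
mapreduce and so poorly onto anything that needs shared state.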

~~~
eagleal
You can always use it with Yahoo!'s s4 (released yesterday).
<http://wiki.s4.io/Manual/S4Overview>

~~~
DrJosiah
Or I could use any one of a dozen other languages that are more convenient to
use, already available on my system, already work with S4, and with a syntax
that doesn't make me want to cry :P

------
tav
Sawzall Language spec: <http://szl.googlecode.com/svn/doc/sawzall-spec.html>

You might also want to take a look at Cascalog. It offers a much nicer API —
just list the various predicates and it'll take care of the rest for you, e.g.
to find all the guys following Emily:

    (follows "emily" ?person) (gender ?person "m")

<http://nathanmarz.com/blog/introducing-cascalog-a-clojure-based-query-language-for-hado.html>

~~~
tedunangst
blech. html with content-type: text/plain == :(

~~~
mccutchen
That's because it's being served as raw text data from the Subversion repo.

~~~
mbreese
That's no excuse. You can always set the svn:mime-type property.

~~~
mccutchen
Haha, I think it's a great excuse. But I just don't see the value in telling
my version control software about the MIME type of every single file in my
project.

------
vomjom
The paper: <http://research.google.com/archive/sawzall.html>

On a related topic about another Google project:
<http://sergey.melnix.com/pub/melnik_VLDB10.pdf>

------
swah
If you want to compile on a Mac and you use Homebrew:

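    # binutils from Homebrew provides gobjdump; expose it under the name objdump
    brew install binutils
    # assumes the default Homebrew prefix /usr/local; subshell keeps the working directory unchanged
    (cd /usr/local/bin && ln -s gobjdump objdump)
    brew install icu4c
    brew link icu4c
    # run from the szl source directory
    ./configure
    make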

------
tav
It's worth noting that you can trace parts of Google Go's lineage to Sawzall.

~~~
supersillyus
Interesting. Can you elaborate?

~~~
cdavid
R. Pike was involved in both projects, if that's what is meant by lineage. I
am basing this on the author list in the Sawzall paper; I do not work at
Google.

------
chopsueyar
Why has Sawzall not experienced the Firebird moment yet?

<http://news.cnet.com/2100-7344-5156101.html>

~~~
DrJosiah
Before Google's Go, there was another language named Go. "Chrome" is the name
of the UI outside of the content in Mozilla's browser, email client, etc.
Google has a smaller-scale ad-hoc SQL-inspired language and system for log
queries called Dremel (which is awesome to use, I miss the hell out of it).

Sawzall as a physical tool falls into the same category as Dremel, which
doesn't seem to have been an issue. Never mind the "Go" and "Chrome" names.

~~~
chopsueyar
What happened to Dremel (software)?

~~~
DrJosiah
As far as I know, nothing. But since I no longer work for Google, I would need
to set up a Dremel-like system (which would minimally include needing to
inject logs, etc., into a new or existing hosted system), which increases the
barrier to entry until I have the time to do so.

------
nlake44
Anyone got an idea how hard it might be to port this to work on top of Hadoop?

