

Sizzle: A compiler + runtime for Google's Sawzall language, optimized for Hadoop - yarapavan
https://github.com/anthonyu/Sizzle

======
cheald
Wish they'd picked a different name. Sizzle is already the name of the widely
known JavaScript selector library used by jQuery.

------
ukdm
From the readme:

What is Sizzle?

Sizzle is an open source implementation of the Sawzall programming language
designed for interoperation with the Hadoop MapReduce and DFS stack. It is
implemented in pure Java, is easily extendible, and the programs produced by
it will run anywhere that has a recent Hadoop installed, even if Sizzle is not
also installed.

Why Sizzle?

Up until a few days ago, there was no publicly available implementation of
Sawzall.

About six months ago, I asked some of the authors of _Interpreting the Data:
Parallel Analysis with Sawzall_
[<http://code.google.com/p/szl/wiki/Interpreting_the_Data>] for more
specific details about how Sawzall worked than was explained in that
high-level document. Mr. Pike explained that he intended to open the source to
Sawzall; however, when I didn't hear from him for several months I started my
own implementation.

------
mahmud
Anyone know what the "pagerank" variable in here is?

[https://github.com/anthonyu/Sizzle/blob/master/src/proto/siz...](https://github.com/anthonyu/Sizzle/blob/master/src/proto/sizzle_document.proto)

~~~
anthonyu
That Document structure stores a URL and its pagerank for later processing.

In the Sawzall paper, it's used in a program that reports the top URL from a
domain, ranked by pagerank:

    proto "document.proto"
    max_pagerank_url:
        table maximum(1) [domain: string] of url: string
            weight pagerank: int;

    doc: Document = input;
    emit max_pagerank_url[domain(doc.url)] <- doc.url
        weight doc.pagerank;
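
For anyone who doesn't read Sawzall, here is a rough single-process Python sketch of what that `maximum(1)` table computes: for each domain, keep the one URL with the highest weight. It is only an illustration, not how Sizzle or Sawzall implement it; the real aggregation runs across many machines. The `domain` helper (which just takes the hostname) and the sample records are assumptions made up for the example.

```python
from urllib.parse import urlsplit


def domain(url: str) -> str:
    # Hypothetical stand-in for the domain() call in the Sawzall program:
    # here it simply returns the hostname of the URL.
    return urlsplit(url).hostname or ""


def max_pagerank_url(docs):
    # Emulates: table maximum(1) [domain: string] of url: string
    #               weight pagerank: int
    # Each (url, pagerank) pair plays the role of one emit statement.
    best = {}  # domain -> (pagerank, url)
    for url, pagerank in docs:
        key = domain(url)
        if key not in best or pagerank > best[key][0]:
            best[key] = (pagerank, url)
    return {d: url for d, (_, url) in best.items()}


# Made-up input records standing in for Document messages.
docs = [
    ("http://example.com/a", 3),
    ("http://example.com/b", 7),
    ("http://example.org/", 5),
]
result = max_pagerank_url(docs)
```

With these made-up records, `result` maps each domain to its single highest-weighted URL, which is the shape of output the Sawzall table declaration describes.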

