Hacker News
Fast data processing engine written in C (github.com/x86-64)
49 points by x86_64 on Nov 24, 2011 | 24 comments



Looks interesting, but I am struggling to figure out what it actually does. Some concrete example output in the tutorials would help.


Same here - I'd suggest moving the design justification to a separate page and having the project home page start with a less-than-10-line usage example of something useful you can do with it.


The thing I found interesting is its flexibility. A list of planned/implemented modules might be helpful. Once you understand that Frozen just chains together small programs (modules) that process streams of data you can start to think of all sorts of interesting applications.

Of course that sounds a lot like Unix. What can Frozen do that a series of chained programs can't? A self-documenting config file may be better than a bash script, for one.


I like the idea. Unix pipes are very limited though.

Something like this but using, say, ZeroMQ could be really interesting. Mongrel2 could be the HTTP server in such a system, etc.


The list of implemented modules is in the documentation, on the "Modules" page. Planned ones are in the TODO file.


There is one nice example for now: splitting a file into many files using a set of regular expressions. But I get it; I will write more examples, including a web application, caching and so on.


Interesting. This reminds me of a product I used to work on: a node design, where nodes could be plugged together to normalise and aggregate trade data.

I'd be interested in contributing to your project, starting with working on the docs. If interested, please contact me. I'm at gmail, username cratuki.


Feel free to fork it on GitHub and contribute. To avoid duplicated work, check the TODO file: the top items have first priority and will be done very soon.


To set up a web server we need to install many different components: a database, a server, maybe some cache, a load balancer, FastCGI handlers and so on. And every part must be connected somehow. But each connection introduces significant latency and limits the performance of the overall system. So, if there is only one physical machine, why bother with all that? Let's join all the components into one process. A simple in-process call is the best communication channel available: it is fast and simple.

So, what happens when you need to scale it to more than one machine?


It depends on what you are scaling. Actually, there are no hard borders in this approach. If your app is overloaded with template parsing, you could move that to another machine; if the problem is in the "database" part, the same solutions apply as for NoSQL. There is no limit on what to scale or how. You could keep everything in one process except the part that needs scaling, and that would be more efficient than dividing things into databases, CGI handlers and balancers.


What makes you think that IPC is the dominating cost? It's usually network or disk bandwidth, not IPC. And once you scale past one machine, you can't avoid network chatter by definition.


All this is not only about IPC. Removing some IPC gives a small latency speed-up: a nice bonus, nothing more.


After reading through the source code, I'm no closer to figuring out how this is different to plugging functions, classes or external libraries together.

Can someone describe what makes this library different to a standard utility library?


In the case of using it as a library: when you write an application, you use different APIs to make things work, for example an API call to write data to a file. But if you run into scaling or other problems, in most cases you need to rewrite your application's code, for example to set up a buffer before writing to the file. If you use Frozen, you define the format in which you dump data to it and then stop touching your application's code and binaries entirely. A quick change in the configuration file adds a buffer, writes into memory, or flushes the data over the network, and everything keeps working.


Is there a project you have used this for?


I plan to use it in a web project with huge data sets (domains, whois, DNS, traceroutes). Right now I am collecting the data with a crawler and writing this.


Hmm, given that this is a kind of toolkit... I would love to see a real-world example app to go with your product. That way I can figure out what purposes it's right for.


Sorry to be a bother, but the title of the section should be "Rationale" and not "Rational"

...just a pet peeve of mine. :p


Fixed, thanks. English is not my first language; I'm sure there are a lot more bugs besides this one.


I think you should fork the README


What does it offer compared to memcached, redis, leveldb, etc.?


Sorry, but for now: nothing. There is a lot of work to do, especially on caching and indexing. I'm trying to get it done as fast as I can.


Best of luck with your project then!


Seems like a half-assed scripting language. Another example of over-engineering gone awry.



