Hacker News
Fast data processing engine written in C (github.com/x86-64)
49 points by x86_64 on Nov 24, 2011 | 24 comments



Looks interesting, but I am struggling to figure out what it actually does. Some concrete example output in the tutorials would help.


Same here - I'd suggest moving the design justification to a separate page and having the project home page start with a less-than-10-line usage example of something useful you can do with it.


The thing I found interesting is its flexibility. A list of planned/implemented modules might be helpful. Once you understand that Frozen just chains together small programs (modules) that process streams of data you can start to think of all sorts of interesting applications.

Of course that sounds a lot like Unix. What can Frozen do that a series of chained programs can't? A self-documenting config file may be better than a bash script, for one.


I like the idea. Unix pipes are very limited though.

Something like this but using, say, ZeroMQ could be really interesting. Mongrel2 could be the HTTP server in such a system, etc.


The list of implemented modules is in the documentation, on the "Modules" page. Planned ones are in the TODO file.


There is one nice example for now: splitting a file into many files using a set of regular expressions. But I get it; I will write more examples, including a web application, caching and so on.


Interesting. This reminds me of a product I used to work on: a node design, where nodes could be plugged together to normalise and aggregate trade data.

I'd be interested in contributing to your project, starting with working on the docs. If interested, please contact me. I'm at gmail, username cratuki.


Feel free to fork it on GitHub and contribute. To avoid duplicated work, check the TODO file: the top items have first priority and will be done very soon.


To set up a web server we need to install many different components: a database, a server, maybe some cache, a load balancer, FastCGI handlers and so on. And every part must be connected somehow. But each connection introduces significant latency and limits the performance of the overall system. So, if there is only one physical machine, why bother with all that? Let's join all the components into one process. A simple in-process call is the best communication channel available: it is fast and simple.

So, what happens when you need to scale it to more than one machine?


It depends on what you are scaling. Actually, there are no hard borders in this approach. If your app is overloaded with template parsing, you could move that to another machine; if the problem is in the "database" part, the same solutions apply as for NoSQL. There is no limit on what to scale or how. You could keep everything in one process except the part that needs scaling, and that would be more efficient than dividing things into databases, CGI handlers and balancers.


What makes you think that IPC is the dominating cost? It's usually network or disk bandwidth, not IPC. And once you scale past one machine, you can't avoid network chatter by definition.


All this is not only about IPC. Removing some IPC gives a small latency speed-up: a nice bonus, nothing more.


After reading through the source code, I'm no closer to figuring out how this is different to plugging functions, classes or external libraries together.

Can someone describe what makes this library different to a standard utility library?


In the case of using it as a library: when you write an application, you use different APIs to make things work, for example an API call to write data to a file. But if you run into scaling or other problems, in most cases you need to rewrite your application's code, for example to set up a buffer before writing to the file. If you use Frozen, you define the format in which you dump data to it and then stop touching your application's code and binaries entirely. A quick change in the configuration file adds a buffer, writes into memory, or flushes the data over the network, and everything keeps working.


Is there a project you have used this for?


I plan to use it in a web project with huge data sets (domains, whois, DNS, traceroutes). Right now I am collecting the data with a crawler and writing this.


Hmm, given that this is a kind of toolkit... I would love to see a real-world example app to go with your product. That way I can figure out what purposes it's right for.


Sorry to be a bother, but the title of the section should be "Rationale" and not "Rational"

...just a pet peeve of mine. :p


Fixed, thanks. English is not my first language; I'm sure there are a lot more bugs besides this one.


I think you should fork the README


What does it offer compared to memcached, redis, leveldb, etc.?


Sorry, but for now: nothing. There is a lot of work to do, especially on caching and indexing. I'm trying to get it done as fast as I can.


Best of luck with your project then!


Seems like a half-assed scripting language. Another example of over-engineering gone awry.



