

I wrote a compression algorithm, hire me? - yoav

https://github.com/YoavGivati/Givati-Compression

A smart, lossless data compression algorithm with a focus on repeating data structures like JSON/JSONH, network and language portability, decompression without requiring a pre-agreed dictionary, and minimum size over speed. Compression ratio and speed are comparable to Lempel–Ziv–Welch (LZW). Implemented in JavaScript; PHP/Python coming soon.

I'm a software engineer/entrepreneur with design sense. I have 8 years of programming experience, 5+ years building cool interfaces and apps, and have consulted for some well-known people. Fast learner (I taught myself about data compression, then conceived of and wrote this in 3 days). I've been the technical co-founder at a stock-related startup and have managed a small team of developers.

html5, node.js, mongo, mysql, php, flex, python, jsp/jstl

I'm building [inkapp.co] as part of [chalkhq.com] but need funding, or a steady income for a few years at a loving company that wants me on their team.

Anyone hiring in the Toronto-ish, Ontario-ish, Canada-ish area? Willing to relocate anywhere for the right gig; 3- and 10-page CVs available on request.

Dear Hacker News, please help me get hired before the end of January.
======
pork
I respect the initiative you're showing, but I'm a little skeptical of your
choice of presenting your own eponymous compression algorithm as part of a
portfolio, especially without any theoretical guarantees or rigorous
experimental data on its performance. It gives the impression that you're
either not aware of the rich history of data compression algorithms and the
heavy burden of proof on any new algorithm, or that you like re-inventing the
wheel. Please take this as constructive feedback.

~~~
yoav
I appreciate it.

The first one is correct; I never thought about data compression at all until
3 days ago, when I had a lot of JSON to store. The best JavaScript compression
implementation I could find by Googling around was LZW, which is inherently
inefficient due to the way it steps through the data and builds the dictionary
as it goes. Never mind that it compresses to an array of codes which, as far
as I know, can only be stored as comma-delimited text, which tends to increase
size. So I wrote one that analyzes and prioritizes the dictionary before
applying compression, and that doesn't need a dictionary to be predefined the
way LZW's output assumes, which is necessary for one of my use cases.
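
For readers unfamiliar with the comparison being made: this is a textbook LZW sketch (not the Givati-Compression code), showing how the dictionary is built incrementally while stepping through the input, and that the raw output is an array of numeric codes rather than a string:

```javascript
// Minimal textbook LZW compressor (illustration only, not this project's
// algorithm). The dictionary starts with all single bytes and grows as the
// input is scanned; the output is an array of numeric codes.
function lzwCompress(input) {
  const dict = new Map();
  for (let i = 0; i < 256; i++) dict.set(String.fromCharCode(i), i);
  let phrase = "";
  const codes = [];
  for (const ch of input) {
    const next = phrase + ch;
    if (dict.has(next)) {
      phrase = next; // keep extending the current match
    } else {
      codes.push(dict.get(phrase)); // emit code for the longest match
      dict.set(next, dict.size);    // register the new phrase
      phrase = ch;
    }
  }
  if (phrase !== "") codes.push(dict.get(phrase));
  return codes;
}

// Stored as comma-delimited text, the codes can easily exceed the input size:
const json = '{"x":1,"x":1,"x":1}';
console.log(lzwCompress(json).join(",").length, "vs", json.length);
```

Serializing those codes as comma-delimited text is the size penalty mentioned above.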

I wasn't going for "I'm a compression expert", I was just going for "Look at
my latest weekend project" and "here's a code sample."

The decision to name it after me was a hard one, but I figured namespacing my
projects would prevent me from using up all the good ones.

I should also emphasize that I have no idea whether this particular algorithm
already exists somewhere; I've only looked at a few compression schemes. I
didn't mean to invent anything, just to solve my particular JSON problem with
client-side compression and get the data smaller than I could with existing
libraries.

~~~
pork
As I said, I appreciate the initiative. At the very least, consider posting
benchmarks comparing your scheme to the output of gzip.

~~~
MichaelGagnon
Agreed.

Also, be aware that many non-expert attempts to develop compression algorithms
turn out to be flawed. However, it sounds like you've developed a compression
pre-processor specifically targeting JSON-like structures, which seems like a
much more plausible accomplishment than a non-expert developing a brand new
general-purpose compression algorithm that can beat LZW.

All that said, if I were considering hiring you I would be much more
interested in your ability to rigorously analyze the strengths and limitations
of your algorithm (rather than in the algorithm itself). And without that
rigorous analysis, it's much more difficult for me to judge the quality of
your algorithm.

------
karanbhangui
contact me at karan@notewagon.com. We're hiring in downtown Toronto.

------
chrisacky
Looks great, I'd love to see an eventual port to PHP. Incidentally, I know
this was just a weekend project, but why not just rely on gzip/deflate
content encoding?

~~~
yoav
I wrote an html5/canvas chalkboard the weekend before that lets you replay
your drawings and get a link to send them to other people. I don't want to
link it here because my server will crash if a Hacker News-sized crowd starts
saving drawings. Anyway, it stores the path data as JSON on the server in a
text blob to optimize database queries; otherwise it would have to fetch
thousands of rows just to get a single drawing. I wanted to reduce storage,
but also bandwidth, specifically for uploads, which can take a while (a short
drawing produced around 1 MB of JSON). I also thought it would be cool to
embed the drawing data directly in the URL or send it to someone in an email
(hence the modifiable alphabet for the hashes), or to allow unrestricted
drawing and save it to html5 localStorage.

I figured I might as well put the ideas I had into code and on github so
others could help refine it, and if it turned out not to work that well was
planning on attempting to implement gzip in javascript.

Incidentally, in my own (admittedly unreliable, data-specific) benchmarks,
running my algorithm as a pre-pass before gzip produced smaller files than
gzip alone, but I'm not going to make that claim without more benchmarks.
Given that you can exploit HTTP gzip compression and store the gzipped data
on the server, it's hard to say whether the extra time spent running my
algorithm as a pre-compressor would be worth whatever byte reduction it
achieves for that use case.

Also, the current implementation is focused on compressing ASCII characters
into fewer ASCII characters. It should be possible to implement a version
that packs the output into raw bits and achieves better compression for use
cases where binary output is acceptable and ASCII isn't a requirement.
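
The ASCII-versus-binary trade-off can be made concrete: a printable alphabet of 64 symbols carries 6 bits per character, while a raw byte carries 8, so an ASCII-only target gives up roughly a quarter of the achievable density. A hypothetical encoder for a "modifiable alphabet" like the one mentioned above (none of this is the project's actual code):

```javascript
// Illustration of the ASCII-vs-binary point. A 64-symbol printable
// alphabet carries 6 bits per character vs. 8 bits per raw byte.
// The alphabet below is a URL-safe example, not the project's.
const ALPHABET =
  "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-_";

// Encode a non-negative integer code in the printable alphabet (base 64).
function toAlphabet(code) {
  let out = "";
  do {
    out = ALPHABET[code % 64] + out;
    code = Math.floor(code / 64);
  } while (code > 0);
  return out;
}

console.log(toAlphabet(4095)); // 2 ASCII chars for a 12-bit code (1.5 bytes in binary)
```

Swapping `ALPHABET` lets the output stay safe for URLs, emails, or localStorage, at the cost of that fixed density penalty relative to binary packing.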

------
dwwoelfel
You may want to put your contact info in your profile. The email field is only
visible to admins.

~~~
yoav
done, thanks for the heads up.

------
asider
We're hiring in Montreal. Would love to chat: andrew@urbanorca.com

