ot's comments | Hacker News

I also really liked this ICML paper [1], which is more theoretically principled than the two you reference (and, oddly, they do not cite it).

[1] http://www.eecs.tufts.edu/~dsculley/papers/round-model-icml....


This research was awarded the prestigious Ig Nobel Prize in 2006:

http://www.improbable.com/ig/winners/#ig2006

-----


Obligatory XKCD: https://xkcd.com/221/

-----


Even shorter: proofs without words http://mathoverflow.net/questions/8846/proofs-without-words

-----


Somewhat related: http://mathoverflow.net/questions/7330/which-math-paper-maxi...

-----


We should send those into space, so aliens can use them without having to learn our language first.

-----


> Every user and organization on GitHub.com with Git LFS enabled will begin with 1 GB of free file storage and a monthly bandwidth quota of 1 GB.

Does this mean that with the free tier I can upload a 1GB file that can then be downloaded at most once a month? Even a small 10MB file, which would fit comfortably in a git repo, could be downloaded only 100 times a month. Maybe they meant 1TB of bandwidth?

-----


I suspect the point is rather that you have a bunch of megabyte-range files that you rarely update and don't have to sync often. But for most of the workflows this feature seems targeted at, the free tier looks insufficient.

-----


I'm having trouble seeing how a 1GB/month quota meshes in any way with "large file" support. The free tier is basically "test out the API, don't even think about using it for real".

-----


Yes, that is the free tier. If you want to use it seriously, it will cost some money. Or you can use the tool and host your own file server, for free.

I don't think these facts are a problem. They've created an open-source tool, provide a place to try it out, and offer a paid service if you like it and don't want to host it yourself. Seems like a fair offer.

-----


GitLab.com offers 5GB per repo support for git-annex (unlimited repos).

-----


Interesting, can you mention them?

-----


If you use a fast hash function you still have to probe a hash table, so the memory access is still there. With a clever probing strategy you might be able to get away with only one cache-line access per lookup on average, while most perfect hashing constructions, which are based on the MWHC scheme, do 3 random accesses.

However, these 3 random accesses are independent, so the CPU can mostly pipeline them, and the perceived latency is that of a single random access.
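For illustration, here is a minimal sketch of an MWHC-style lookup in C++ (the g table, the seeds, and the hash function are all hypothetical, not taken from any particular library): the output is a combination of three table entries indexed by three independent hashes of the key, so none of the loads depends on the others and they can overlap in the pipeline.

    #include <cstdint>
    #include <string>
    #include <vector>

    // Hypothetical MWHC-style PHF lookup: the value is derived from three
    // entries of a precomputed table g, indexed by three independent hashes
    // of the key. None of the three loads depends on the others, so the CPU
    // can issue them in parallel and the observed latency is close to that
    // of a single random access.
    struct mwhc_phf {
        std::vector<uint64_t> g;   // assumption: filled during construction
        uint64_t seed0 = 1, seed1 = 2, seed2 = 3;

        // Placeholder seeded hash (FNV-1a); a real library would use a
        // stronger hash, this just keeps the sketch self-contained.
        static uint64_t hash(const std::string& key, uint64_t seed) {
            uint64_t h = seed ^ 14695981039346656037ULL;
            for (unsigned char c : key) { h ^= c; h *= 1099511628211ULL; }
            return h;
        }

        uint64_t lookup(const std::string& key) const {
            uint64_t m = g.size();
            uint64_t v0 = g[hash(key, seed0) % m];  // independent access #1
            uint64_t v1 = g[hash(key, seed1) % m];  // independent access #2
            uint64_t v2 = g[hash(key, seed2) % m];  // independent access #3
            return (v0 + v1 + v2) % m;
        }
    };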

-----


cmph is not the fastest library for MPHFs, mostly because it uses a slow ranking function to turn a PHF into an MPHF.
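
To make that bottleneck concrete, here is a hedged sketch (made-up names, not cmph's or emphf's actual API) of what the PHF-to-MPHF ranking step does: the PHF maps n keys injectively into [0, m) with m > n, and the minimal function is obtained by replacing each PHF value p with the number of occupied slots before p, i.e. a rank query on a bitvector. If that rank query is slow, every single MPHF lookup pays for it.

    #include <cstdint>
    #include <vector>

    // Hypothetical PHF -> MPHF wrapper: 'used' marks which PHF output slots
    // actually receive a key, and 'block_rank[b]' stores the number of used
    // slots before block b. mphf(x) = rank(phf(x)), which remaps the n used
    // slots in [0, m) onto the minimal range [0, n).
    struct ranked_mphf {
        std::vector<bool> used;           // size m, filled at construction
        std::vector<uint64_t> block_rank; // precomputed prefix counts
        static const uint64_t block = 64;

        uint64_t rank(uint64_t p) const {
            uint64_t r = block_rank[p / block];
            for (uint64_t i = p - p % block; i < p; ++i)
                r += used[i];             // finish the count inside the block
            return r;
        }
    };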

I wrote a small library some time ago that performs much faster, both for in-memory construction and for lookups [1; see the linked paper for benchmarks]. Unfortunately I have no time to maintain it, but I recently found out that some projects are using it, so it wasn't all wasted time :)

[1] https://github.com/ot/emphf

</shameless plug>

-----


Excellent. I'll take it :)

-----


Previous discussion: https://news.ycombinator.com/item?id=8888485

-----


Previous discussion: https://news.ycombinator.com/item?id=7947782

-----
