b52_'s comments

b52_ · 2025-08-25T21:40:18 1756158018

What do you mean? Modulo is not a perfect hash function... What if your hash table had size 11 and you hash two keys of 22 and 33?

I also don't understand your first point. We can run n^2 algorithms on massive inputs given its just a polynomial. Are you thinking of 2^n perhaps?

Panzer04 · 2025-08-26T01:36:49 1756172209

n^2 is probably the worst offender of these algorithms - it's fast enough to get into production and slow enough to blow up once you start using it.

Rockstar infamously wasted five minutes loading GTAV because they had an n^2 algorithm in the startup sequence.

svara · 2025-08-28T22:57:03 1756421823

> What do you mean? Modulo is not a perfect hash function

It's a perfect hash function for the case where you work with ints and know the maximal min-max range beforehand; then you can modulo by the size of the range as long as it's not too large. In your example 33-21 --> mod 12

This comes up for me surprisingly often but obviously depends a lot on what you work on. It is often tempting to reach for a hashtable, but it's a lot less efficient in this case.

LPisGood · 2025-08-25T22:40:05 1756161605

n^2 algorithms on _massive_ inputs seems a little far fetched, no?

Around one to one hundred billion things start getting difficult.

vlovich123 · 2025-08-26T04:26:09 1756182369

The challenge with big-O is you don’t know how many elements results in what kind of processing time because you don’t have a baseline of performance on 1 element. So if processing 1 element takes 10 seconds, then 10 elements would take 16 minutes.

In practice, n^2 sees surprising slowdowns way before that, in the 10k-100k range you could be spending minutes of processing time (10ms for an element would only need ~77 elements to take 1 minute).

b52_ · 2025-08-27T11:04:55 1756292695

I'd argue that one billion is fairly large for what most people work on. But yes, point taken.

b52_ · on June 27, 2024

Hey ot,

We worked on improvements to this following our reproducibility study including a synchronized iterative version and alternative estimation functions.

You can find the paper here if you're still interested!

https://jmmackenzie.io/publication/tkde22/

b52_ · on March 15, 2020

Where do they claim to be best in class? I don't seem to see that anywhere.

b52_ · on Aug 4, 2019

Nobody cares if matplotlib is outdated. It served its purpose here, which was to show how plotting data can lead to further understanding data. You don't need a fully featured tool to do so.

goerz · on Aug 4, 2019

As far as "fully featured" goes, there is nothing more fully featured than matplotlib. It just tends to take a lot more code to generate a plot, compared to some higher-level libraries, so for simple visualizations, matplotlib may not be worth the tradeoff of power vs ease of use.

enriquto · on Aug 4, 2019

Nothing really beats "plot sin(x)" in gnuplot.

b52_ · on July 15, 2019

> Each one of these needs to be scored (well, sorta - there are various tricks you can use to avoid scoring some docs, which again I'm not at liberty to discuss)

You mean like Wand or BMW?

b52_ · on Dec 27, 2017

I completely agree with this article, and the parallels I draw from it (and some of the comments here) are extremely close.

But why is it that I read this article, agree with it, yet then continue scrolling HN or FB 2 minutes later? How can I get out of this dopamine cycle?

b52_ · on Sept 20, 2017

I suspect they do indeed use such URLs to track which link the user clicks. It's not that they log the IP with the click, but rather, the click allows implicit relevance feedback with respect to the issued query. This can be used for machine learning (to improve ranking, for example).