Hacker News | bbirk's comments

>Traversing

With the reverse pagination solution here, the order of the items is inverted. For example, if you order by some incrementing value with a page size of 5, page 1 has items 1,2,3,4,5 in that order and page 2 has items 6,7,8,9,10, but when you press "previous page" you get items 5,4,3,2,1 instead of the original order. You can reverse them in the client whenever you paginate backwards, but this is inconvenient. Alternatively, you can handle it in SQL by wrapping the query in an outer ORDER BY that restores the correct order. The Wikipedia page on Select (SQL) covers this kind of keyset pagination for most relational databases[1].
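To make the wrapped ORDER BY concrete, here is a minimal sketch using Python's sqlite3 with a made-up `item` table: the inner query walks backwards from the cursor in descending order, and the outer query flips the page back into ascending order before it reaches the client.

```python
import sqlite3

# Hypothetical table: items with an incrementing sort key.
con = sqlite3.connect(":memory:")
con.execute("create table item (id integer primary key)")
con.executemany("insert into item (id) values (?)", [(i,) for i in range(1, 11)])

PAGE = 5

# Going backwards from the item with id=6 (the first item of page 2):
# the inner query fetches the previous page in descending order, and the
# outer ORDER BY re-sorts it ascending, so the client sees 1,2,3,4,5
# instead of 5,4,3,2,1.
rows = con.execute("""
    select id from (
        select id from item where id < ? order by id desc limit ?
    ) order by id asc
""", (6, PAGE)).fetchall()
print([r[0] for r in rows])
```

The same shape works in Postgres and most other relational databases; only the placeholder syntax differs.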

>Designing interfaces

You mentioned that there is no concept of a link to page 7 with cursor pagination, but you can always combine offset and cursor pagination, as in: "6 pages (as offset) after the item with lastupdate=2024-01-01". The performance downside of OFFSET is small for small offsets, and you can enforce a maximum offset in the backend to keep performance acceptable. This also lets you add a feature where the user jumps 5 pages ahead instead of one page at a time as in the GitHub example, without most of the overhead of pure offset pagination.
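The hybrid approach can be sketched as follows (again with sqlite3 and a hypothetical `item` table, using the id itself as the cursor): seek to the cursor with an indexed WHERE, then skip a small, capped number of rows with OFFSET.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("create table item (id integer primary key)")
con.executemany("insert into item (id) values (?)", [(i,) for i in range(1, 101)])

PAGE = 5
MAX_JUMP = 10  # backend-enforced cap so the worst-case OFFSET cost stays bounded

def jump_ahead(last_seen_id, pages_ahead):
    # Seek to the cursor first (a cheap index range scan), then skip a
    # bounded number of rows with OFFSET to land n pages further on.
    pages_ahead = min(pages_ahead, MAX_JUMP)
    rows = con.execute(
        "select id from item where id > ? order by id limit ? offset ?",
        (last_seen_id, PAGE, (pages_ahead - 1) * PAGE),
    ).fetchall()
    return [r[0] for r in rows]

print(jump_ahead(5, 3))  # the page starting 3 pages after the cursor at id=5
```

Since the database only has to scan `pages_ahead * PAGE` rows past the cursor, the cost is independent of how deep into the result set the cursor sits, which is the main win over pure OFFSET pagination.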

>So, I think it's reasonable to return cursors that traverse in just one direction

Being able to traverse backwards is genuinely useful in a lot of cases, and as a developer it is your job to find a technical solution to the user's needs. Again, have a look at the Wikipedia page above to see how to implement it in SQL.

[1] https://en.wikipedia.org/wiki/Select_(SQL)#Method_with_filte...


Sometimes when a benchmark comparing your own product (A) with a competitor's product (B) is posted on HN, someone working for the competitor refutes the claims made in the blog post and states that B was not given the same requirements as A. This particular thread comes to mind [1]. Maybe the blog authors misunderstood B, maybe the competitor misunderstands A, or maybe there is a better way to do the benchmarked task with B. A better approach would be to use a standard benchmark and publish the code used to run it.

[1] https://news.ycombinator.com/item?id=31767858


I would use a lateral join. Alternatively you can use a window function instead, but in my experience lateral joins are usually faster for this kind of top-N query in Postgres.

  select user_table.user_id, tmp.post_id, tmp.text, tmp.timestamp
  from user_table
  left outer join lateral (
    select post_id, text, timestamp
    from post
    where post.user_id = user_table.user_id
    order by timestamp
    limit 1
  ) tmp on true
  order by user_id
  limit 30;
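For comparison, the window-function alternative mentioned above can be sketched like this (Python's sqlite3, which has no LATERAL but does support window functions; schema and sample data are made up). One behavioural difference worth noting: the left outer lateral join keeps users with no posts, while this version drops them.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    create table user_table (user_id integer primary key);
    create table post (post_id integer primary key, user_id int,
                       text text, timestamp int);
""")
con.executemany("insert into user_table values (?)", [(1,), (2,)])
con.executemany("insert into post values (?,?,?,?)", [
    (10, 1, "a", 100), (11, 1, "b", 200), (20, 2, "c", 150),
])

# Number each user's posts by timestamp, then keep only row number 1,
# i.e. the top-1 post per user.
rows = con.execute("""
    select user_id, post_id, text, timestamp from (
        select p.*, row_number() over (
            partition by user_id order by timestamp
        ) as rn
        from post p
    ) where rn = 1
    order by user_id
    limit 30
""").fetchall()
print(rows)
```

In Postgres the window version has to number every post before filtering, whereas the lateral join can stop after one index probe per user, which is why lateral tends to win for top-N.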


Snowflake has been mentioned here, but there is another way to achieve this, although I've never seen it used in practice. Imagine that in the DBMS the identifier column is sequential, but whenever rows are queried, the identifiers are run through AES. The server would have a single AES key used throughout its lifetime. To the user the identifiers appear random, but on the server side they are sequential and thus highly compressible. Conversely, when a query puts a condition on the identifier column, AES decryption is applied first. For example, "select user_id, first_name from user where user_id = 6744073709551615;" would decrypt 6744073709551615 into the sequential number, say 103 (if it was the 103rd inserted row), and then use 103 to query the table. There would certainly be a performance overhead when returning large result sets where lots of identifiers need to be encrypted. On the other hand, modern CPUs have dedicated AES instructions which are really fast, and when the identifiers compress to something like 1.5 bits on average, you can fit far more of them in the CPU cache, which may offset some or even all of the performance loss.
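The sequential-inside, random-looking-outside mapping can be illustrated with a toy stand-in for AES. The Python stdlib has no AES, so this sketch uses a 4-round Feistel network over 64-bit integers keyed by a hypothetical server secret; it is NOT secure and only demonstrates the invertible mapping the paragraph describes (a real implementation would use AES, or format-preserving encryption for 64-bit ids, since AES's block is 128 bits).

```python
import hashlib

KEY = b"server-secret-key"  # hypothetical long-lived server key

def _round(half, i):
    # Round function: hash key, round number, and one 32-bit half.
    h = hashlib.sha256(KEY + bytes([i]) + half.to_bytes(4, "big")).digest()
    return int.from_bytes(h[:4], "big")

def encrypt_id(n):          # sequential internal id -> opaque external id
    left, right = n >> 32, n & 0xFFFFFFFF
    for i in range(4):
        left, right = right, left ^ _round(right, i)
    return (left << 32) | right

def decrypt_id(n):          # opaque external id -> sequential internal id
    left, right = n >> 32, n & 0xFFFFFFFF
    for i in reversed(range(4)):
        left, right = right ^ _round(left, i), left
    return (left << 32) | right

external = encrypt_id(103)          # what the client sees: looks random
assert decrypt_id(external) == 103  # the server recovers the compressible id
```

On disk and in indexes the server only ever stores the small sequential numbers; the cipher runs at the query boundary in both directions.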

If you have only a single master DBMS server, then 64-bit integers should be sufficient with this strategy. Otherwise, in a multi-master distributed environment, you could achieve lock- and communication-free identifier generation by instead using a 128-bit integer and, for example, including the server identifier inside those 128 bits, similarly to Snowflake, before encrypting it and exposing it to the user.

Side note: since the identifiers are sequential, there is another benefit in that, like Snowflake, ordering by (created_date, user_id) wouldn't incur any performance cost over just ordering by user_id.

If anyone knows of someone who uses this, or knows why it would be a bad idea, I would love to hear about it.


There's no need for ZooKeeper or any centralised/decentralised service. The article you link mentions why depending on something like ZooKeeper is suboptimal. Given that you have fewer than something like 2048 web server instances (I don't remember how many bits they give to worker_number, and the snowflake GitHub repo is basically unavailable), all you need to do is make sure every instance has a rank/worker_number (an infrastructure/devops problem), which the instance uses when it generates the snowflake ids. Side note: snowflake also has a timestamp rollover problem akin to the Unix epoch 2038 problem, but that can be solved simply by adding bits for an epoch number.
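A per-instance generator along these lines can be sketched as below. The bit widths, the extra epoch bits, and the worker-id plumbing are assumptions for illustration (the real snowflake uses 41 timestamp bits and a 10-bit worker id); the only coordination needed is handing each instance a unique worker_id at deploy time.

```python
import threading
import time

# Hypothetical bit layout: 2 epoch bits let the scheme roll past a single
# 41-bit millisecond range, 11 worker bits allow up to 2048 instances,
# 10 sequence bits allow 1024 ids per millisecond per worker.
EPOCH_BITS, TIME_BITS, WORKER_BITS, SEQ_BITS = 2, 41, 11, 10
CUSTOM_EPOCH_MS = 1_288_834_974_657  # Twitter's custom epoch (2010-11-04)

class SnowflakeGenerator:
    def __init__(self, worker_id, epoch_no=0):
        assert 0 <= worker_id < (1 << WORKER_BITS)
        self.worker_id, self.epoch_no = worker_id, epoch_no
        self.last_ms, self.seq = -1, 0
        self.lock = threading.Lock()  # per-process only; no network coordination

    def next_id(self):
        with self.lock:
            now = int(time.time() * 1000) - CUSTOM_EPOCH_MS
            if now == self.last_ms:
                self.seq = (self.seq + 1) % (1 << SEQ_BITS)
                if self.seq == 0:          # sequence exhausted this millisecond
                    while now <= self.last_ms:
                        now = int(time.time() * 1000) - CUSTOM_EPOCH_MS
            else:
                self.seq = 0
            self.last_ms = now
            return (((self.epoch_no << TIME_BITS | now)
                     << WORKER_BITS | self.worker_id)
                    << SEQ_BITS | self.seq)

gen = SnowflakeGenerator(worker_id=7)
a, b = gen.next_id(), gen.next_id()
assert b > a  # ids are monotonically increasing per worker
```

Because the worker id is baked into every id, two instances can never collide, which is what makes the scheme communication-free once worker numbers are assigned.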


I had the same problem, but after setting tracking protection to "strict" in Firefox I am now only nearly unique according to this test. This page [1] also shows some more details around fingerprinting, although I don't know if it works the same way as the page announced here.

[1] https://amiunique.org/fp


Ironically, amiunique.org is broken with JS disabled.


Umm, works for me - redirects me to amiunique.org/fpNoJS or something similar


Testing page works for me now too. Could have been a temporary fluke. Though, for example, https://amiunique.org/stats ends up being an empty page.

