SeanSullivan86's comments

SeanSullivan86 · 2026-03-06T07:00:29 1772780429

Hmm, can someone educate me here? Why don't bit flips ever seem to impact the results of calculations in settings like big-data analytics on AWS?

Is it a difference between server hardware managed by knowledgeable people and random hardware thrown together by home PC builders?

huhhuh · 2026-03-06T07:13:17 1772781197

In Belgium elections, a party received 4096 unaccounted votes likely due to a bit flip: https://en.wikipedia.org/wiki/Electronic_voting_in_Belgium#R....

zadikian · 2026-03-06T07:01:15 1772780475

Servers and pro workstations normally have ECC RAM.

matja · 2026-03-06T11:03:13 1772794993

You can only detect what you measure. Are these big-data analytics processes running multiple times to detect differences?

OkGoDoIt · 2026-03-06T07:03:05 1772780585

Presumably professional hardware uses ECC memory, which automatically corrects these kinds of errors.

SeanSullivan86 · 2026-02-07T21:22:43 1770499363

Why is it called a C Compiler if it's a subset of C?

userbinator · 2026-02-08T08:01:01 1770537661

[flagged]

perching_aix · 2026-02-08T11:56:06 1770551766

Why is your visceral reaction is to frame it as a quest for truth versus a great suppression of truth? Everything alright up there?

Literal second sentence in the article, in case it wasn't incredibly obvious to people anyways:

> It supports a subset of C that is large enough to write real and interesting programs.

I'm all for more boring headlines, but this characterization is ridiculous.

userbinator · 2026-02-08T20:42:09 1770583329

I've had enough of headlines that overpromise and underdeliver. It's essentially false advertising. It's not like the word "subset" would put it over the length limit.

SeanSullivan86 · 2025-12-19T23:28:53 1766186933

I'm aware of this to an extent. Do you know of any list of what degree of parallelization to expect out of various components? I know this whole napkin-math thing is mostly futile and the answer should mostly be "go test it", but just curious.

I was interviewing recently and was asked about implementing a web crawler and then were discussing bottlenecks (network fetching the pages, writing the content to disk, CPU usage for stuff like parsing the responses) and parallelism, and I wanted to just say "well, i'd test it to figure out what I was bottlenecked on and then iterate on my solution".

moab · 2025-12-20T02:46:41 1766198801

Napkin math is how you avoid spending several weeks of your life going down ultimately futile rabbit holes. Yes, it's approximations, often very coarse ones, but done right they do work.

Your question about what degree of parallelization is unfortunately too vague to really answer. SSDs offer some internal parallelism. Need more parallelism / IOPS? You can stick a lot more SSDs on your machine. Need many machines worth of SSDs? Disaggregate them, but now you need to think about your network bandwidth, NICs, cross-machine latency, and fault-tolerance.

The best engineers I've seen are usually excellent at napkin math.

SeanSullivan86 · 2025-08-20T17:40:13 1755711613

https://www.gs1.org/services/gdsn/global-data-model

As someone who has worked in e-retail, this catalog seems to have a lot of momentum.

glawrence13 · 2025-08-20T18:02:12 1755712932

Thanks for sharing. Always a fan of standardization of information online.

SeanSullivan86 · 2025-08-19T02:10:33 1755569433

Senior management doesn't like it, in my experience. But I agree it makes sense for many interactions.

b_e_n_t_o_n · 2025-08-19T04:52:40 1755579160

Not everyone will appreciate your authentic self, that's the inherent risk which requires bravery to overcome. It might also be simply not worth the risk, but either way you should act in a way that you can respect. There are things I wouldn't do because I consider them too risky, but I can still respect myself because I'm the one making the judgement call. It's when I act inauthentically because of the fear, not the risk, that I lose respect for myself and I do my best to avoid it.

mmaunder · 2025-08-19T02:29:20 1755570560

The legal fiction that is the corporation is the antithesis of authenticity, and the brand they invest so heavily in, its mask. Having an authentic employee risks removing the mask.

b_e_n_t_o_n · 2025-08-19T04:53:12 1755579192

The rise of corpo-culture is a big factor in the loss of authenticity in society.

ebcode · 2025-08-22T06:37:53 1755844673

Ding ding ding ding!

SeanSullivan86 · 2025-08-18T17:53:48 1755539628

What happens when something is put on the scale while it's sampling? Does the curve depend on properties of the scale, or just properties of the object and the manner in which it was put on the scale?

Scene_Cast2 · 2025-08-18T18:07:40 1755540460

It's the latter. The scale is meant for real-time monitoring of rapidly varying force (the primary application is about monitoring the force derivative and repeatable max force logging). It uses an aluminum load cell if you're familiar with those, there's a tad of a multi-kHz resonance that is typically overshadowed by the object properties.

SeanSullivan86 · 2025-08-12T03:16:30 1754968590

I guess everyone has their own preferences, I just find this opinion surprising given the wealth of other hikes in the country.

As someone who grew up hiking in the White Mountains before moving to Washington, the mountains in Washington (and many places in the West) are just on a whole different level.

etrautmann · 2025-08-12T05:15:26 1754975726

Franconia ridge is stellar, but so is angels landing in Zion or the Half Done cables or the Kaibob trail in the Grand Canyon or many others that are hard to compare and all worth the trip.

SeanSullivan86 · 2025-08-09T17:50:40 1754761840

It's higher than average/median in the US, but certainly not exceptional. Pretty normal for certain groups of people. There's a huge gap in miles driven between urban and rural area. US average is something like 13-15k miles per year (for all driving, not just commute).

20,000 miles solely for commuting would be about 43 miles each way (if you work 235 days per year), which is obviously more unusual than 20k total miles driven from all sources.

SeanSullivan86 · 2025-07-26T18:08:29 1753553309

I've sometimes been confused by the term "inverted index". The example in this post feels like what I would just call an "index"... i.e documents indexed by the words they contain. Feels about the same as the index in the back of a physical book.

Is the distinction that an index on a multi-valued attribute is called an inverted index?

atombender · 2025-07-26T18:22:18 1753554138

Inverted indexes are what databases call indexes. It's used in the IR field to differentiate from forward indexes, which are less common, so you're right that we could just say "index's.

But when we talk about inverted indexes, they are almost always term -> posting list, and most index data structures lay these out so that posting lists are sorted and compressed together. Traditional database indexes like B-trees are optimized for rapid insertion and deletion, while inverted indexes tend to be optimized for batch processing, because you typically deconstruct text into words for a large batch and then lazily integrate this batch into the main index.

Part of this is about scale; a row in a database typically has a single column or maybe 2-3 columns in a composite index; but a document text may tokenize into thousands, hundreds of thousands, or millions of words. At this scale, the fine-grained nature of words mean B-trees aren't as a good a fit.

Another part of it is that inverted indexes aren't for point queries, which is what B-trees are optimized for; you typically search for many words at a time in order to rank your search results by some function like cosine similarity. You rarely want a single posting; you want the union or intersection of many posting sorted by score.

modulovalue · 2025-07-26T18:33:08 1753554788

NIT: That's not quite correct if your first statement is meant to imply an equality rather than a subset relation.

The idea of an index is more general, as an index can be built for many different domains. For example, B-trees can index monoidal data and inverted indexes are just an instance of such a monoid that a B-tree can efficiently index.

Furthermore, metric spaces (e.g., levenshtein distance) can also be efficiently indexed using other trees: metric trees. So calling inverted indexes just indexes would be really confusing since string data is not the only kind of data that a database might want to support having efficient indexes for.

atombender · 2025-07-26T19:12:20 1753557140

My point is that all indexes are "inverted" in the sense that they map some searchable value to occurrences of said value. That is true even if method of comparison is not strict equality.

giovannibonetti · 2025-07-26T21:12:34 1753564354

Most indexes people hear about are like that. However, there are indexes that work the other way around, like Postgres' Block Range Indexes (BRIN). They are mostly useful as skip indexes - for a given block, they have a summary that tells whether some given data may be there.

The trade-off this kind of index makes is that it is more optimized for (batch) writes than the more popular B-Tree indexes, but it is less optimized for reads on the other hand. If the write throughout of a given table is very high, you might want to remove all B-Tree indexes that are not strongly correlated to the insert order and have BRIN indexes instead. Combine it with table partitioning, and you can add B-Tree indexes in the cold partitions, or even migrate them to columnar storage if available (with the Citus extension).

By the way, a few years ago a Bloom BRIN variant was added, not to be confused with Postgres' Bloom indexes which are something else.

atombender · 2025-07-26T21:16:38 1753564598

I wouldn't say BRIN indexes are "the other way around"; index structure is still one where data values are looked up to find the area where occurrences exist.

"Coarse" indexes like BRIN and ClickHouse's data-skipping indexes are still indexes in a broad sense of serving to narrow down a search.

___tom___ · 2025-07-26T19:54:01 1753559641

This drove me up the wall, until I researched it.

A document can be viewed as an object with a set of pointers to the words it contains.

The inverse of that, was a word object, with a list of pointers to the documents it is found it. This was referred to an an inverted DOCUMENT index. This is what people would normally just call an index.

At some point, people dropped the "DOCUMENT" part, and started just calling it an "inverted index". This makes no sense, grammatically, as it's the document that is inverted, not the index, but it is what it is.

So, an inverted index is just an index.

teiferer · 2025-07-26T22:30:42 1753569042

Wow, thanks for the explanation! That was driving me nuts too, as I was waiting for the point where they would invert the thing they built and what that would look like, though that point never came. But now I don't need to put in the time that you did!

In summary, they are not "inverted index" in the sense of "the inversion of what you'd normally think of as an index" but instead in the sense of "a map which provides the inversion of the map from documents to words in them, in other words, an index".

nzeid · 2025-07-26T21:29:11 1753565351

Love this take.

mrkeen · 2025-07-26T18:15:27 1753553727

No it's the same thing. With any book you have built-in mechanism to go to a page number see what words are there. An inverted index lets you do the inverse (words -> page numbers).

SeanSullivan86 · 2025-07-26T18:20:52 1753554052

People (non-tech) don't tend to refer to "go to page 106" as using an index. The pages at the back of the book providing the word->page numbers lookup are commonly known as the book's "index"

grg0 · 2025-07-26T18:26:11 1753554371

"commonly" is an understatement; that's literally what a book index is by definition.

The only thing "inverted" here is the context. The author even admits themselves that the word->doc mapping is an index:

"If user wants to search by words - then words should be keys in our "database" (index)"

It's a pointless debate of semantics. An inverted map is still a map.

teiferer · 2025-07-26T22:36:35 1753569395

It's pointless in the sense that the word "inverse" in the term is pointless, a mild way of saying that it's confusing or even unnecessary to the point of being incorrect.

The discussion about it is not pointless since it clears up confusion. It might not have been for you, but it's clearly for many others, so if you think that's pointless then allowing yourself to appreciate other perspectives could go a long way.

An inverted map is still a map, but if you are typically thinking of the map A->B and then suddenly somebody talks about an inverted map, then it's understandable that people start to assume that this is now about B->A and get confused if it somehow actually isn't really.

valiant55 · 2025-07-27T13:12:21 1753621941

If the documents where themselves stored in a database they have and id and the contents. The clustering key (an index) would be on the id. It's inverted because the contents are deconstructed into tokens with a list of ids that contain that token. Now the contents (tokens) server as the indexed value.