
Ask HN: When are large data structures used in practice? - rtheunissen
Can anyone share some practical examples where it is a requirement to maintain 1M+ elements in a data structure when streams and iterators are not a viable alternative?

Using a vector as an example, where O(n) operations are actually very fast in practice when n < X, when might X be big enough in practice to warrant the use of other specialized structures like skip lists and radix-balanced trees?
======
shoo
How about cases where the data structures need to support "interesting" query
operations. E.g. data structures to support spatial queries: "find me all the
neighbouring elements within a radius r of element x". Not super-specialised,
but probably need something like a quadtree or a hash table or other indices
to help partition the search space. There can easily be 1M+ elements if each
element is an address in a state or national address database, or a component of
some utility network, or a location of an app user. There are use cases where
you'd want to be able to execute these kinds of queries with low latency, so
jamming everything into RAM in an appropriate data structure could be a good
fit.
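
A minimal sketch of the "hash table to partition the search space" idea, assuming 2D points and a uniform grid. The `GridIndex` class and its method names are hypothetical, made up for illustration; a quadtree or a real spatial library would be the usual choice when point density varies a lot.

```python
from collections import defaultdict
from math import floor, hypot


class GridIndex:
    """Uniform-grid spatial hash (hypothetical helper, not a library API).

    Points are bucketed by grid cell, so a radius query only inspects the
    cells overlapping the query circle's bounding box instead of every point.
    """

    def __init__(self, cell_size):
        self.cell_size = cell_size
        self.cells = defaultdict(list)  # (cx, cy) -> [(x, y, item), ...]

    def _cell(self, x, y):
        # Map a coordinate to the integer cell that contains it.
        return (floor(x / self.cell_size), floor(y / self.cell_size))

    def insert(self, x, y, item):
        self.cells[self._cell(x, y)].append((x, y, item))

    def within_radius(self, x, y, r):
        # Range of cells covered by the bounding box of the query circle.
        cx0, cy0 = self._cell(x - r, y - r)
        cx1, cy1 = self._cell(x + r, y + r)
        found = []
        for cx in range(cx0, cx1 + 1):
            for cy in range(cy0, cy1 + 1):
                # .get avoids creating empty buckets in the defaultdict.
                for px, py, item in self.cells.get((cx, cy), ()):
                    if hypot(px - x, py - y) <= r:
                        found.append(item)
        return found


index = GridIndex(cell_size=1.0)
index.insert(0.5, 0.5, "a")
index.insert(1.2, 0.4, "b")
index.insert(5.0, 5.0, "c")
index.insert(-0.3, 0.5, "d")
nearby = sorted(index.within_radius(0.5, 0.5, 1.0))
```

With a cell size close to the typical query radius, each query touches a handful of buckets regardless of how many elements are stored, which is the property that matters at 1M+ elements.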

------
posnet
Any sort of DNA processing. There is a lot of research into concise data
structures for a very specific set of operations for DNA processing since our
ability to generate DNA data is accelerating much faster than processing or
memory speed.

~~~
ignoramous
> There is a lot of research into concise data structures for a very specific
> set of operations for DNA processing

See: [https://alexbowe.com/fm-index/](https://alexbowe.com/fm-index/)

Impl: [https://github.com/shibukawa/fm-index.jsx](https://github.com/shibukawa/fm-index.jsx)

More:
[https://news.ycombinator.com/item?id=22544718](https://news.ycombinator.com/item?id=22544718)
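
For a rough feel of what an FM-index does, here is a toy sketch of its counting query (backward search over the Burrows-Wheeler transform). The function names `bwt` and `count_occurrences` are hypothetical, and this is not how the linked implementation is written: it builds the transform naively from sorted rotations and keeps full, uncompressed occurrence tables, which real indexes replace with sampled or succinct structures.

```python
def bwt(text):
    """Burrows-Wheeler transform via sorted rotations (naive, for illustration)."""
    text += "$"  # sentinel; assumed to sort before every other symbol
    rotations = sorted(text[i:] + text[:i] for i in range(len(text)))
    return "".join(rotation[-1] for rotation in rotations)


def count_occurrences(bwt_text, pattern):
    """Count occurrences of pattern in the original text via backward search."""
    symbols = sorted(set(bwt_text))

    # first_row[c]: number of symbols in the text that sort strictly before c.
    first_row, total = {}, 0
    for c in symbols:
        first_row[c] = total
        total += bwt_text.count(c)

    # occ[c][i]: number of occurrences of c in bwt_text[:i].
    occ = {c: [0] * (len(bwt_text) + 1) for c in symbols}
    for i, ch in enumerate(bwt_text):
        for c in symbols:
            occ[c][i + 1] = occ[c][i] + (ch == c)

    # Narrow the half-open range [lo, hi) of sorted rotations that start
    # with the current pattern suffix, consuming the pattern right to left.
    lo, hi = 0, len(bwt_text)
    for c in reversed(pattern):
        if c not in first_row:
            return 0
        lo = first_row[c] + occ[c][lo]
        hi = first_row[c] + occ[c][hi]
        if lo >= hi:
            return 0
    return hi - lo


transformed = bwt("GATTACAGATTACA")
hits = count_occurrences(transformed, "ATTA")
```

The query cost depends on the pattern length rather than the text length, which is why this family of structures suits genome-sized inputs. A real index would build the tables once and reuse them across queries instead of rebuilding them per call as this sketch does.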

------
wsh
Here’s one example:

In the IPv4 Internet, there are about 830,000 prefixes (network addresses)
announced, and the number is growing [1]. Routers at ISPs and sophisticated
customers maintain full (“default-free”) routing tables with at least one
entry for each prefix.

These tables are used for packet forwarding decisions, so lookups have to be
fast. Traditionally, a radix tree is used, but some routers use other data
structures or specialized hardware.

[1] [https://www.cidr-report.org/as2.0/](https://www.cidr-report.org/as2.0/)
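
To illustrate the kind of lookup involved, here is a minimal sketch of longest-prefix matching, assuming IPv4 only. It uses a plain binary trie (one node per bit) rather than a path-compressed radix tree, and the `PrefixTrie` class and the next-hop labels are hypothetical; production routers use far more compact layouts.

```python
import ipaddress


class PrefixTrie:
    """Uncompressed binary trie over IPv4 address bits (illustrative only)."""

    def __init__(self):
        # Each node is [child_for_bit_0, child_for_bit_1, next_hop_or_None].
        self.root = [None, None, None]

    def insert(self, prefix, next_hop):
        net = ipaddress.ip_network(prefix)
        bits = int(net.network_address)
        node = self.root
        for i in range(net.prefixlen):
            bit = (bits >> (31 - i)) & 1  # walk from the most significant bit
            if node[bit] is None:
                node[bit] = [None, None, None]
            node = node[bit]
        node[2] = next_hop

    def lookup(self, address):
        """Return the next hop of the longest matching prefix, or None."""
        bits = int(ipaddress.ip_address(address))
        node = self.root
        best = node[2]  # a 0.0.0.0/0 default route lives at the root
        for i in range(32):
            node = node[(bits >> (31 - i)) & 1]
            if node is None:
                break
            if node[2] is not None:
                best = node[2]  # remember the deepest match seen so far
        return best


table = PrefixTrie()
table.insert("0.0.0.0/0", "default")
table.insert("10.0.0.0/8", "hop-A")
table.insert("10.1.0.0/16", "hop-B")
route = table.lookup("10.1.2.3")
```

A lookup walks at most 32 nodes no matter how many prefixes are stored, which is the point of using a trie here instead of scanning a list of 830,000 entries.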

~~~
ignoramous
> _Traditionally, a radix tree is used..._

Here's an excellent write-up on how the Linux kernel uses radix trees for IP
routing: [https://vincent.bernat.ch/en/blog/2017-ipv4-route-lookup-lin...](https://vincent.bernat.ch/en/blog/2017-ipv4-route-lookup-linux)

Also see:
[https://news.ycombinator.com/item?id=22467251](https://news.ycombinator.com/item?id=22467251)

------
ecesena
Not sure if this fits the example you have in mind:
[https://medium.com/pinterest-engineering/an-update-on-pixie-...](https://medium.com/pinterest-engineering/an-update-on-pixie-pinterests-recommendation-system-6f273f737e1b)

> The ultimate goal is to fit the entire Pinterest graph in memory. Obviously
> we can’t fit the entire 100 billion edges in RAM, but once we prune down to
> 20 billion edges, we end up with a graph that’s 150GB.

------
arduinomancer
Do you mean specifically 1M elements in memory?

Because something like a relational database could have 1M elements in a
B-tree but the whole data structure doesn't have to be in memory at the same
time.

Or operating system page tables as another example.

~~~
rtheunissen
No, just some arbitrary 80%-case "large" number.

