

STXXL: Standard Template Library for Extra Large Data Sets - wslh
http://stxxl.sourceforge.net/

======
wslh
This is a new discovery for me, found via a Stack Overflow answer.

It is interesting not just for the usefulness of the library but also for the
theoretical and practical aspects of the research behind it. The tutorial
states the goals in its first pages:
<http://algo2.iti.kit.edu/dementiev/files/stxxl_tutorial.pdf>

~~~
ExpiredLink
"The objectives of STXXL project (distinguishing it from other libraries):

• Make the library able to handle problems of real world size (up to dozens of
terabytes).

• Offer transparent support of parallel disks. This feature although announced
has not been implemented in any library.

• Implement parallel disk algorithms. LEDA-SM and TPIE libraries offer only
implementations of single disk EM algorithms.

• Use computer resources more efficiently. STXXL allows transparent
overlapping of I/O and computation in many algorithms and data structures.

• Care about constant factors in I/O volume. A unique library feature
“pipelining” can halve the number of I/Os performed by an algorithm.

• Care about the internal work, improve the in-memory algorithms. Having many
disks can hide the latency and increase the I/O bandwidth, s.t. internal work
becomes a bottleneck.

• Care about operating system overheads. Use unbuffered disk access to avoid
superfluous copying of data.

• Shorten development times providing well known interface for EM algorithms
and data structures. We provide STL-compatible interfaces for our
implementations."

------
alexeiz
<http://algo2.iti.kit.edu/stxxl/trunk/FAQ.html>

"STXXL container types like stxxl::vector can be parameterized only with a
value type that is a POD"

Unfortunately this is a significant constraint that limits the usefulness of
this library.
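
To make the constraint concrete, here is a rough sketch in plain C++ (no STXXL calls) of what "POD" implies for a value type. The `Edge` and `Named` structs and the `round_trip` helper are hypothetical names made up for illustration, and the byte buffer is only a stand-in for a disk block. The assumption is that an external-memory container moves elements as raw bytes, which is safe only for types with no owned heap memory or non-trivial copy semantics.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <string>
#include <type_traits>
#include <vector>

// Hypothetical record type: fixed size, no pointers, no constructors.
struct Edge {
    std::uint32_t src;
    std::uint32_t dst;
    double weight;
};

// Hypothetical counter-example: std::string owns heap memory, so copying
// the struct's bytes would copy a pointer, not the characters.
struct Named {
    std::string name;
    int id;
};

static_assert(std::is_trivially_copyable<Edge>::value &&
                  std::is_standard_layout<Edge>::value,
              "Edge can be moved around as raw bytes");
static_assert(!std::is_trivially_copyable<Named>::value,
              "Named cannot be moved around as raw bytes");

// Copy a value into a byte buffer (standing in for a disk block) and
// rebuild it from those bytes. Only compiles for trivially copyable T.
template <typename T>
T round_trip(const T& value) {
    static_assert(std::is_trivially_copyable<T>::value,
                  "raw byte I/O needs a trivially copyable type");
    std::vector<unsigned char> block(sizeof(T));
    std::memcpy(block.data(), &value, sizeof(T));
    assert(block.size() == sizeof(T));
    T out;
    std::memcpy(&out, block.data(), sizeof(T));
    return out;
}
```

Calling `round_trip(Edge{1, 2, 0.5})` gives back an equal `Edge`, while `round_trip(Named{...})` is rejected at compile time, which is roughly the kind of restriction the FAQ describes.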

~~~
wslh
But I think that if you are using Python, for example, you can serialize an
object and store it as a string.
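
On the C++ side, one way this idea might look is a fixed-size record that carries the serialized bytes inline, since a `std::string` itself is not a POD. This is only a sketch under assumptions: `Blob`, `pack`, `unpack` and the 64-byte `kCapacity` are hypothetical names and values chosen for illustration, not part of STXXL, and anything longer than the capacity is simply rejected.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>
#include <type_traits>

// Hypothetical fixed capacity for one serialized object.
constexpr std::size_t kCapacity = 64;

// Fixed-size record holding the serialized bytes inline, so the whole
// struct can be copied as raw bytes.
struct Blob {
    std::uint32_t length;
    std::array<char, kCapacity> bytes;
};

static_assert(std::is_trivially_copyable<Blob>::value &&
                  std::is_standard_layout<Blob>::value,
              "Blob is usable where a POD value type is required");

// Copy an already-serialized string into a Blob; throws if it does not fit.
Blob pack(const std::string& serialized) {
    if (serialized.size() > kCapacity) {
        throw std::length_error("serialized object exceeds Blob capacity");
    }
    Blob b{};  // zero-initialize so unused bytes are deterministic
    b.length = static_cast<std::uint32_t>(serialized.size());
    std::copy(serialized.begin(), serialized.end(), b.bytes.begin());
    return b;
}

// Recover the serialized string from a Blob.
std::string unpack(const Blob& b) {
    assert(b.length <= kCapacity);
    return std::string(b.bytes.data(), b.length);
}
```

The trade-off is wasted space for short objects and a hard upper bound on object size; variable-length data would need a different layout, such as storing offsets into a separate byte vector.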

------
vilya
This looks really interesting. Thanks for posting it!

