Moving away from HDF5 (rossant.net)
116 points by nippoo on Jan 7, 2016 | 63 comments

Alembic, a data transfer format for 3D graphics, especially high-end VFX, also initially started with HDF5 but found it to have low performance and to be a general bottleneck (especially in multithreaded contexts).

Luckily the authors of Alembic were smart and in their initial design abstracted out the HDF5 interface and were able to provide an alternative IO layer based on C++ STL streams. The C++ STL streams-based interface greatly outperformed the HDF5 layer.

Details on that transition here:


Reading all these comments that bash HDF5 makes me want to tell how HDF5 has really worked for my group.

Although the spec is huge, there is plenty of sample code online to get it working. You do actually have to read it, though, to understand slabs, hyperslabs, strides, etc. Once you get through it, it's really versatile.

As for speed, we used it to replace our proprietary data format. We would have to provide readers to all the scientists that use our data. It was a nightmare. Some people want stuff in R, some in Python 2.7, some in Python 3.4, some in Matlab, and the list goes on. HDF5 gets rid of all this.

When in the field and the system shits the bed, it's really easy to open an HDF5 file in HDFView and inspect the file contents. I don't always have Matlab available when I'm in the field, same with Python. Sometimes I just need to look at a time series and I can diagnose the problems with the system.

For me, it's silly in 2016 to have any kind of proprietary binary format when something like HDF5 exists.

Many of the complaints the author had make me think he's the stereotypical scientist: really smart in one area but can't program worth beans. I don't think that's HDF5's fault.

The author (my colleague, and probably the most talented developer I know) isn't replacing HDF5 with a 'proprietary binary format': in fact, the transition is as simple as replacing "HDF5 group" with "folder in a filesystem", "HDF5 dataset" with "binary file on the filesystem" (ie you store each array item sequentially on disk, exactly as HDF5 or any other format will store it, which you can memmap trivially with any programming language), and "HDF5 attribute" with "JSON / XML / INI / text file".
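
To make that mapping concrete, here is a minimal sketch in Python using only the standard library. The function names (`write_dataset`, `read_item`), the `.dat`/`.json` naming and the metadata keys are illustrative assumptions, not the layout the authors actually use; values are written in the machine's native byte order.

```python
import json
import mmap
import sys
import tempfile
from array import array
from pathlib import Path


def write_dataset(folder, name, values, typecode="h", **attrs):
    """Write `values` as a flat binary file plus a JSON sidecar for attributes."""
    folder = Path(folder)
    folder.mkdir(parents=True, exist_ok=True)              # "HDF5 group" -> folder
    data = array(typecode, values)                         # native byte order
    (folder / f"{name}.dat").write_bytes(data.tobytes())   # "HDF5 dataset" -> flat file
    meta = {"typecode": typecode, "count": len(data),
            "byteorder": sys.byteorder, **attrs}
    (folder / f"{name}.json").write_text(json.dumps(meta, indent=2))  # "HDF5 attribute"


def read_item(folder, name, index):
    """Memory-map the flat file and decode a single item without loading the rest."""
    folder = Path(folder)
    meta = json.loads((folder / f"{name}.json").read_text())
    item = array(meta["typecode"])
    start = index * item.itemsize
    with open(folder / f"{name}.dat", "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            item.frombytes(mm[start:start + item.itemsize])
    return item[0]


# Hypothetical dataset: 1000 int16 samples with one attribute.
with tempfile.TemporaryDirectory() as tmp:
    acq = Path(tmp) / "experiment" / "acquisition"
    write_dataset(acq, "oscilloscope", range(1000), sample_rate_hz=20000)
    sample = read_item(acq, "oscilloscope", 123)
    size_on_disk = (acq / "oscilloscope.dat").stat().st_size
    stored_attrs = json.loads((acq / "oscilloscope.json").read_text())
```

The file size is just the item size times the number of items, so anyone who knows the dtype can read the file with whatever tool they have to hand.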

"When in the field and the system shits the bed", to quote you... you can just open the dataset in Windows Explorer. Or Mac OS Finder. Or Nautilus. Or using 'cd' and 'ls' in Linux. Want to look at an array? Sure, open it in a hex editor or Python or, heck, FORTRAN88. You can .tar up your folder (or subfolder, or any arbitrary subset of the dataset) and send it to someone with no knowledge of the format whatsoever, and they'll be able to make sense of it in minutes. This isn't anything remotely complex - it's just using the filesystem rather than creating a filesystem-within-a-filesystem. Want to keep track of changes in your massive dataset? Sure, just back it up on Mac OS X Time Machine, or a simple rsync script, or even Git LFS.

Most researchers aren't computer scientists; they know how to use Dropbox and Notepad++ and open text files, and they don't want to have to install a Java-based HDFView when they could just use Windows Explorer.

It's not even that HDF5 is that bad, it's just that filesystems are, in many respects, so much better.

(If it wasn't clear from the article, we're not just misinformed - we're making this call after having developed an entire software suite around HDF5, spent about two years of firefighting HDF5 issues and wasted days of development time (so many horror stories) - this is actual feedback from several dozen users, thousands of datasets, and petabytes of data.)

Filesystems are the worst! Try telling a customer to tar up a directory and send it to you -- things get lost so easily! Most of our customers don't even know what "tar" means. I think you are asking for trouble going with a filesystem as a storage mechanism.

HDFView is not the only viewer in town. There are several viewers.

I've used HDF5 weekly for the last 5 years and so have my associates, and it's been wonderful. MATLAB even uses it to store its .mat files these days.

I think you are misinformed and I'm sticking to my story!

It depends who your users are and what needs they have. Our users are mostly MATLAB/Python users, pretty tech-savvy, and being able to edit individual bits of the dataset with other applications or write their own code to analyse them is an often-used feature.

It's rare we ever need the whole dataset - in fact, it's really great to be able to say to the user "don't send us your 100GB dataset: just go to the "acquisition" subfolder and send me the 10MB file called "oscilloscope.dat"". With HDF5 this is difficult enough that it's almost always easier to send 99.9% of useless data (i.e. the whole file) when all you want is a single array within it.

If your users will rarely need to do this, you could just store the entire folder hierarchy in a .zip and access it using standard tools that most programming languages have. It's worth noting that the new Microsoft Office formats do exactly this - in their case, a bunch of XML files inside a .ZIP. (Rename a .docx to .zip and you'll see!).
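
As a rough sketch of the zip variant in Python (standard library only): `pack` and `read_array` are hypothetical helper names, and the `.dat` + `.json` member naming is an assumption for illustration. An in-memory buffer stands in for a `.zip` file on disk.

```python
import io
import json
import zipfile
from array import array


def pack(zip_target, datasets):
    """Store each {path: (typecode, values, attrs)} entry as .dat + .json members."""
    with zipfile.ZipFile(zip_target, "w", compression=zipfile.ZIP_STORED) as zf:
        for path, (typecode, values, attrs) in datasets.items():
            zf.writestr(f"{path}.dat", array(typecode, values).tobytes())
            zf.writestr(f"{path}.json", json.dumps({"typecode": typecode, **attrs}))


def read_array(zip_source, path):
    """Read one array out of the archive without extracting anything else."""
    with zipfile.ZipFile(zip_source) as zf:
        meta = json.loads(zf.read(f"{path}.json"))
        out = array(meta["typecode"])
        out.frombytes(zf.read(f"{path}.dat"))
        return out


archive = io.BytesIO()  # stands in for "dataset.zip"
pack(archive, {"acquisition/oscilloscope": ("d", [0.5, 1.5, 2.5], {"units": "V"})})
voltages = read_array(archive, "acquisition/oscilloscope")
members = zipfile.ZipFile(archive).namelist()
```

`ZIP_STORED` keeps the members uncompressed, which trades file size for cheap reads; a compressed mode works the same way through this API.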

MATLAB has moved from their own custom binary format to HDF5, which is the lesser of two evils.

Your use case doesn't seem like one where HDF5 would be a good fit.

Yes, I do know about docx and zip.

I'm happy with what matlab has done. People send me .mat files and I happily process them in python. And my plots usually look much nicer also. :)

Funny enough, we started out with a file-based system like the one you described, and moved to an HDF5-based system.

Are you not concerned about i-node consumption with the file-system based approach?

It depends on whether you have many small files or a small number of large files. In the former case, one file approach (HDF5) makes sense. In the latter case, you don't have a problem with i-node consumption and you gain the ability to easily access only the data you need without bringing the entire data-set into memory.

I kind of like the approach suggested above - if you have many small files, store them in a zip archive and use some library to access the data directly.

I've done more than my fair share of fucking with FITS and ROOT files, HDF5, SQLite, proprietary struct-based things, etc...

It's easy to get a file format working on one system. It's Herculean getting it working on all systems bug free. It's nearly impossible to get something to work portably and performant across many systems.

As for simplicity, people start wanting metadata and headers and this and that, and before you know it you need HDF5 or ROOT again and it's no longer simple. Maybe if you're lucky you can stick with something that looks like FITS. If it's tabular, SQLite still can't be beat. Maybe Parquet would work fine too.

I'd vehemently oppose anyone in the projects I work on from trying to standardize on a new in-house format. I'd maybe be okay if they were just building on top of MessagePack or Cap'n Proto/thrift etc... but nearly every disadvantage the OP references about HDF5 will undoubtedly be in anything they cook up themselves. For example, a "simpler format" that works well on distributed architectures, well... now you're going to go back to the single implementation problem.

It all depends what your aims are. We have a well-defined set of data we need to keep, including intermediate processing steps. We don't need headers, structured arrays, or any weird esoteric object types. (The author is my colleague.)

We can get by just fine with: - N-dimensional arrays stored on disk - Key-value metadata associated with those arrays - A hierarchical data structure.

We've been very happy so far replacing HDF5 groups with folders (on the filesystem), HDF5 datasets with flat binary files stored on disk (just as HDF5/pretty much any other format stores them - each value takes up 1 or 2 or 4 bytes, and your filesize is just n_bytes_per_value * n_values), and attributes by JSON/XML/INI files. If I sent you one of our datasets, zipped up, you'd be able to make sense of it in a matter of minutes, even with no prior knowledge of how it was organised.

It is very tricky to build something that works reliably across all systems, but, thankfully, filesystem designers have done that job for us. And filesystems are now at a point where they're very good at storing arbitrary blobs of data (which wasn't the case when HDF was founded). Filesystem manipulation tools (Windows Explorer / Finder / cd/cat/ls/mkdir/mv/cp/[...]) are also very good and user-friendly.

There isn't really anything we miss about HDF5 at all. Perhaps if your project has spectacularly complex data storage requirements (as to your examples: metadata/headers are easily stored in JSON), but there's no other project I know of that actually relies on an HDF5-only feature and couldn't trivially use the filesystem instead.

I've done one format which was append-only (so a pretty easy problem to solve) on one homogeneous system with metadata that had above-average performance (compared to the commercial and open-source alternatives available at the time), but it sounds like you have me well beaten. These are the war stories that I love to hear. What was your problem domain, what were the recurring implementation problems, where were the bugs primarily?

I liked the post (well, as an HDF5 user, I found it depressing...).

My main qualm with it was the claim about 100x worse performance than just using numpy.memmap(). To the author's credit, he posted his benchmarking code so we could try it ourselves. (Much appreciated.) But as it turned out, there were problems with his benchmark. A fair comparison shows a mixed picture -- hdf5 is faster in some cases, and numpy.memmap is faster in other cases. (You can read my back-and-forth about the benchmarking code in the blog's comments.)

One minor complaint about presentation: Once the benchmarking claims were shown to be bogus, the author should have removed that section from the post, or added an inline "EDIT:" comment. Instead, he merely revised the text to remove any specific numbers, and he didn't add any inline text indicating that the post had been edited.

I think the rest of the post (without performance complaints) is strong enough to stand on its own. After all, performance isn't everything. In fact, I'd say it's a minor consideration compared to the other points.

When it comes to performance, I think the main issue is this: When you have to "roll your own" solution, you become intimately aware of the performance trade-offs you're making. HDF5 is so configurable and yet so opaque that it's tough to understand why you're not seeing the performance you expect.

And one last point. In the blog comments, I wrote this, which I think sums up my view of the performance discussion:

...it's worth noting that many of the tricky things about tuning hdf5 performance are not unique to HDF5. For storing ND data, there will always be decisions to make about when to load data into RAM vs. accessing it on demand, whether or not to store the data in "chunks", what the size of those chunks should be (based on your anticipated access patterns), whether/how to compress the data, etc. These are generally hard problems; we can't blame HDF5 for all of them.

You can also see h5py maintainer Andrew Collette's response here: https://gist.github.com/rossant/7b4704e8caeb8f173084#gistcom...

One reason to use binary formats like HDF5 is to avoid precision loss when storing floating-point values. I started using HDF5 once exactly for this reason and it was overkill. HDFView requires Java to be installed, and the HDF library, with its single implementation and complex API, is a problem too.

For simple uses I now use the '%a' printf specifier. It is specifically designed to avoid losing a single bit of information. And you can easily read floats stored this way in numpy by using genfromtxt with the optional converters= argument and the builtin float.fromhex function.
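
A small sketch of the same idea from the Python side, without numpy: `float.hex()` produces the same kind of hexadecimal text as `%a`, and `float.fromhex()` reads it back. The helper names `dump_hex` and `load_hex` are made up for this example.

```python
import io
import math


def dump_hex(values, fp):
    """Write one hexadecimal float per line (the text form that C's %a uses)."""
    for v in values:
        fp.write(float(v).hex() + "\n")


def load_hex(fp):
    """Parse the lines back; float.fromhex tolerates surrounding whitespace."""
    return [float.fromhex(line) for line in fp if line.strip()]


values = [0.1, 1 / 3, math.pi, 5e-324, -0.0]
buf = io.StringIO()  # stands in for a text file
dump_hex(values, buf)
buf.seek(0)
restored = load_hex(buf)

# A short decimal format, by contrast, does not round-trip.
lossy = float("%.6g" % (1 / 3))
```

Comparing `[v.hex() for v in restored]` against the originals checks every bit, including the sign of `-0.0`, which a plain `==` would miss.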

Man, if I had a dollar for every time data roundtripping to files has corrupted things I'd have many dollars.

The schema-based serialization libraries (Thrift, Protobufs) or MsgPack are a good way to avoid that too. They come with a lot less baggage than say HDF5. Also, if efficiency isn't paramount -- just use SQLite! Amazing tool when it's in its sweetspot.

Lots of tradeoffs when dealing w/ serialization and file formats, no easy answers.

It's also really useful if you have a lot of numerical data streaming in that you want to store and use at a later date. CERN results, I'm sure, use something similar to HDF5; nearly all of HFT algo trading uses HDF5 for securities they are going to explore down the road but don't want to waste KDB+ licenses on; and Google File System's chunking scheme seems to be somewhat similar to it as well. _"Third, most files are mutated by appending new data rather than overwriting existing data. ... Once written, the files are only read, and often only sequentially."_ [1] _That_ is the use case for HDF5. The problem is this guy tried to slam a circular peg into a square hole.

I'm in no way an apologist for HDF5, but his complaints are terribly vague.

"Limited support for parallel access": then you go read the source [2] and see GIL complaints abound. And again, this was meant for an append-only situation where you shouldn't even have to acquire a lock in the first place, since there's no contention possibility!

"Impossibility to explore datasets with standard Unix/Windows tools": right, but there are plenty of Java tools that perform quite well, even with a cold JVM.

"Opacity of the development and slow reactivity of the development team": AFAIK it's an open-source project. This complaint is valid if you're paying a vendor fees for a product and have a support plan with an SLA, and it's not valid in the least otherwise.

"High risks of data corruption": I've never once seen this happen when HDF5 was properly used, though I'd love to see a pdb dump of the state of his program when that occurred. Open offer - I'll fix that bug if it's a fault with the C lib you're FFI'ing with.

edit: oh, the Java tooling was already mentioned.

[1] http://static.googleusercontent.com/media/research.google.co...

[2] https://github.com/h5py/h5py/blob/master/h5py/_locks.pxi


Huh! Good to know. The spec, for anyone interested: ftp://root.cern.ch/root/doc/11InputOutput.pdf For comparison, set https://www.hdfgroup.org/projects/hdf5_aip/aip15.gif against page 6 of CERN's PDF.

ROOT has been making progress on one of the points addressed in the post. Regarding needing a special program to view an HDF5 file, ROOT has developed a JavaScript viewer which can display data and stored plots and graphs. It's pretty cool; HDF5 could do something similar if it doesn't already exist. https://root.cern.ch/js/

You can still use binary formats without HDF5: just write the memory buffer of floats to disk directly, skipping the text format. Any save/load system that uses printf/scanf will be brutally slow (at least 10x slower than just writing/reading the memory buffers), as well as space inefficient.

The problem with binary storage is endianness. If you want a portable format, you either have to use text, specify endianness in the format specification, or store endianness in the file. For highest speed you need to convert the data to your endianness before first use, update the information about endianness inside the file, and then work with a memory-mapped file.

Endianness generally is fast to convert, just pick a standard for the file format and detect on the platform. You can convert endianness with just shifts in C if you need to or with assembly instructions that I believe are one tick on most platforms. It is much much faster to convert endianness (again at least 10x faster) than to parse text.
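
A minimal sketch of "pick a standard and detect on the platform", written in Python for brevity rather than C. It assumes the file format standardizes on little-endian; `to_file_bytes`, `from_file_bytes` and `swap32` are hypothetical names. `swap32` shows the shift-and-mask version for a single 32-bit value.

```python
import sys
from array import array

FILE_BYTEORDER = "little"  # assumption: the format fixes little-endian on disk


def to_file_bytes(values, typecode="i"):
    """Serialize values in the file's byte order, swapping only if the host differs."""
    a = array(typecode, values)
    if sys.byteorder != FILE_BYTEORDER:
        a.byteswap()
    return a.tobytes()


def from_file_bytes(raw, typecode="i"):
    """Inverse of to_file_bytes: decode, then swap back to host order if needed."""
    a = array(typecode)
    a.frombytes(raw)
    if sys.byteorder != FILE_BYTEORDER:
        a.byteswap()
    return a


def swap32(x):
    """Byte-swap a 32-bit unsigned value using only shifts and masks."""
    return ((x & 0x000000FF) << 24 | (x & 0x0000FF00) << 8 |
            (x & 0x00FF0000) >> 8 | (x & 0xFF000000) >> 24)


raw = to_file_bytes([1, -2, 3])
decoded = from_file_bytes(raw).tolist()
swapped = swap32(0x12345678)
```

On a little-endian host both conversion functions are no-ops apart from the copy, which is the common case the comment is pointing at.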

It looks like you can just use these functions to convert between endianness conditionally based on the platform if you are in C/C++:


And boost has one here: http://www.boost.org/doc/libs/1_58_0/libs/endian/doc/index.h...

It is true that converting endianness is faster than reading text, but it still requires more code than calling printf and scanf. I am not really into big data, and my program spends most of its time computing instead of reading and writing, so a 10x speedup in I/O is just not worth it. As I said, HDF5 was overkill in my case.

The OP, and any other file format spec, needs to be able to read/write large numeric datasets. Printf is very poorly suited to this case.

Your comment is true for your use case, but it's not really responsive to the technical issue here.

  for (i=start; i<end; i++) arr[i] = ntohl(arr[i]);
That's not more code than calling printf/scanf.

What big endian platforms do you have to support? Got an SGI Indigo somewhere in your lab?

Exactly. All the major processors are little endian now. Just standardize on little endian. If you are working on some esoteric platform that is big endian, you will know it.

IEEE 754 floats don't have different endianness. And as the other comment says, swapping endianness is fast (if ever needed, since most platforms are little endian or hybrid).

> IEEE 754 floats don't have different endianness.

The standard does not specify endianness [1]. From the standard's point of view, a float is a sequence of bits. But if you memory-map the file and store a float on a little-endian machine, you get the bytes reversed compared to how it is written in the standard. On a big-endian machine the sign is stored in the first byte, and on a little-endian machine the sign is stored in the last byte.

[1] https://en.wikipedia.org/wiki/Endianness#Floating-point
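
This is easy to check with Python's `struct` module, which can pack the same single-precision value in either byte order regardless of the host. The variable names are just for illustration.

```python
import struct

# -1.0 as an IEEE 754 single: sign=1, exponent=0x7F, mantissa=0 -> 0xBF800000
big = struct.pack(">f", -1.0)     # big-endian layout
little = struct.pack("<f", -1.0)  # little-endian layout

sign_in_first_byte_big = bool(big[0] & 0x80)
sign_in_last_byte_little = bool(little[-1] & 0x80)

# Reading little-endian bytes as if they were big-endian gives a different number.
misread = struct.unpack(">f", little)[0]
```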

Damn, you're right... therefore I have no idea how a simple float quantization algorithm I did could even work.

I deal with binary data all the time and I just use fwrite in C and pack in Perl, plus a text readme written in vim explaining the (simple) format. I don't use Matlab but often need to exchange data with Matlab users; I simply search the help pages and send them the one or two lines of Matlab code along with the data. I'll do the same thing if my colleagues switch to numpy tomorrow (it is surprising that they would read an HDF5 tutorial but won't read my readme).

Research data are almost always specific. The idea of a general format serving all research just sounds silly to me.

Can't think of a common programming language which doesn't support saving bit-accurate copies of floats in a contiguous buffer. Even in JavaScript you can put your doubles in a Float64Array and convert that to or from a buffer.

If you want headers, you can define those as text and just use extents to embed the binary data. Tar is another alternative, pretty easy to implement.
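
One way the "text header plus extents" idea could look, as a sketch in Python. The layout here is an assumption made up for the example: a single line of JSON listing each array's typecode, offset and length, followed by the raw bytes, with offsets counted from the end of the header line. `write_container` and `read_extent` are hypothetical names.

```python
import io
import json
from array import array


def write_container(fp, arrays):
    """Write a one-line JSON header of extents, then the concatenated raw arrays."""
    extents, payload, offset = {}, [], 0
    for name, arr in arrays.items():
        raw = arr.tobytes()
        extents[name] = {"typecode": arr.typecode, "offset": offset, "length": len(raw)}
        payload.append(raw)
        offset += len(raw)
    fp.write(json.dumps(extents).encode("ascii") + b"\n")  # human-readable header
    for raw in payload:
        fp.write(raw)


def read_extent(fp, name):
    """Parse the header, then seek straight to the requested array's bytes."""
    fp.seek(0)
    extents = json.loads(fp.readline())
    data_start = fp.tell()  # payload begins right after the header line
    ext = extents[name]
    fp.seek(data_start + ext["offset"])
    out = array(ext["typecode"])
    out.frombytes(fp.read(ext["length"]))
    return out


container = io.BytesIO()  # stands in for a file opened in binary mode
write_container(container, {
    "time": array("d", [0.0, 0.5, 1.0]),
    "counts": array("i", [7, 8, 9, 10]),
})
counts = read_extent(container, "counts")
header_line = container.getvalue().split(b"\n", 1)[0]
```

The header can be inspected with `head -n 1`, and a reader in any language only needs a JSON parser and a seek.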

Hex-formatted floats have the further advantage of being extremely fast to print and parse (compared with decimals): I've seen 3x speedups for I/O operations. They really should be more widely used.

FYI, HDFView is not the only viewer in town.

It's also really easy to write a viewer in Python using matplotlib.

> You can't use standard Unix/Windows tools like awk, wc, grep, Windows Explorer, text editors, and so on, because the structure of HDF5 files is hidden in a binary blob that only the standard libhdf5 understands.

HDF provides command-line tools like h5dump and h5diff, so you can dump an HDF5 file to text and pipe it into standard and non-standard Unix tools [1].

[1] https://www.hdfgroup.org/products/hdf5_tools/index.html#h5di...

The submission talks about terabytes of data. Copying/transforming is not viable in such situations.

With h5dump you can specify which datasets you want to dump. I am sure nobody is going to use awk, grep and wc to process terabytes of data. As for sanity checks, like checking that probabilities stored in a dataset sum up to 1.0 and things like that, dumping one dataset and processing it with awk should be ok.

Actually, piping data between standard Unix tools can be extremely efficient. This example is working on gigabytes rather than terabytes of data, but memory utilization is basically limited to just buffers, so you could definitely scale that to terabytes without a problem.


The problem -- and I've been burned on both sides of this -- is that you need either a container file, or you need users to understand that a directory is essentially a file. Not only does this complicate users' lives when they want to move what they, quite reasonably, view as a single file between different computers or back it up, but they can and will remove individual pieces. Or they copy it around, have some of the pieces not show up, and get very confused that copy/move operations -- particularly to/from a NAS -- are now nothing like atomic.

Another thing that will happen is this: if you just use a directory full of files as a single logical file, you will end up writing code that does the equivalent of 'rm -rf ${somedir}' because when users choose to overwrite "files" (really, a directory), you need to clear out any previous data so experiment runs don't get mixed. You can easily see where this can go bad; you will have to take extraordinary care.

This is a double-edged sword. For our (reasonably savvy) users, being able to duplicate and easily modify individual datasets/files is a feature, not a bug: people can symlink the contents of an entire folder but modify a single array and easily run their analysis on this slightly different dataset, for example.

While it's true that you lose atomicity with this, it can both burn you and help you: you can track specific parts of your dataset in revision control, email subsets of it back and forth, combine datasets easily, or even store it across several servers and manage it with symlinks, for example.

Our users are aware of this and it isn't really a problem for our use-case. But if you're worried, there's always the option of having your whole dataset as a ZIP/TAR file (like all of Microsoft Office file formats are - XML files within a .ZIP); tools for modifying folders within ZIP are much more well-established than HDFView, and most modern programming languages provide libraries to read and modify files within archives without unzipping them; you could make your program agnostic to the files being within an archive (high portability, lower performance) or directly within the FS (loss of atomicity, easier/faster to use).

Depending on the setup, moves / exporting / snapshots can be helped by fancy-shmancy file systems that allow snapshots (well, LVM does too, but I hear it is slow). That generally means BTRFS or ZFS. The pattern is to create a read-only snapshot of the directory, then feed that snapshot to tar (maybe apply some mild compression to it, depending on your tradeoffs), then delete the snapshot.

Even better, directly use BTRFS (or ZFS) send / recv to move it to another system with that file system.

You can also pipe tar directly to something like nc. I remember benchmarking moving a large number of files on a LAN, and tar + nc came out as the winner over, say, rsync or scp. It saturated a 1Gbps connection pretty close to its maximum expected capacity.

You'd obviously not expect users to do that by hand and would write tools to do it.

In my previous job we were evaluating HDF5 for implementing a data-store a couple of years ago. We had some strict requirements about data corruption (e.g., if the program crashes amid a write operation in another thread, the old data must be left intact and readable), as well as multithreaded access. HDF5 supports parallelism from distinct programs, but its multithreaded story was (is?) very weak.

I ended up designing a transactional file format (a kind of log-structured file system with data versioning and garbage collection, all stored in a single file) from scratch to match our requirements. Works flawlessly with terabytes of data.

Might one take a peek at that? In other words, was it open sourced, or do plans to that effect exist?

No and no. Strictly proprietary technology which gives a real competitive advantage. Fun thing is, if you choose your data structures wisely, it's not even that hard to write; it ended up being under 2k lines of C++ code.

GC was offline though; it was performed at the time the container was "upgraded" from RO to RW access. I don't think it'd be difficult to make it online, but there was no need for that.

Am I the only one misreading this as HDFS?

Nope, I was very confused for a second. Silly homoglyphs (https://en.wikipedia.org/wiki/Homoglyph)!

ditto. :)

"we have a particular use-case where we have a large contiguous array with, say, 100,000 lines and 1000 columns"

This is where they lost me. This is NOT a lot of data. Should we be surprised that memory-mapping works well here?

Below about 100-200 GB you can do everything in memory. You simply don't need fancy file-systems. These systems are for actual big data sets where you have several terabytes to several petabytes.

Don't try to use a chainsaw to cut a piece of paper and then complain that scissors work better. Of course they do...

Unfortunately our users can't afford fancy computers with hundreds of GB of RAM. They often need to process entire datasets on laptops with 16GB of RAM but 1TB+ of disk space. Of course with 200 GB of RAM we'd have no problem at all...

Also, as I said, the 100,000 x 1000 example is a quite optimistic one, we do have cases now with 100,000,000 x 10,000 arrays, and this is only going to increase in the months to come with the new generation of devices.
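
For a sense of scale, and of what "processing on a laptop" can look like, here is a rough Python sketch. It assumes 2 bytes per value (the thread does not state the dtype) and a row-major flat binary file; `dataset_bytes` and `iter_row_chunks` are hypothetical helpers. Only one chunk of rows is held in memory at a time.

```python
import os
import tempfile
from array import array


def dataset_bytes(n_rows, n_cols, bytes_per_value):
    """Size of a dense array on disk: rows * columns * bytes per value."""
    return n_rows * n_cols * bytes_per_value


def iter_row_chunks(path, n_cols, typecode="h", rows_per_chunk=4096):
    """Yield successive blocks of whole rows from a flat row-major binary file."""
    itemsize = array(typecode).itemsize
    chunk_bytes = rows_per_chunk * n_cols * itemsize
    with open(path, "rb") as f:
        while True:
            raw = f.read(chunk_bytes)
            if not raw:
                break
            chunk = array(typecode)
            chunk.frombytes(raw)
            yield chunk


small = dataset_bytes(100_000, 1_000, 2)        # the "optimistic" example
large = dataset_bytes(100_000_000, 10_000, 2)   # the larger case mentioned above

# Tiny stand-in file: 10 rows x 3 columns of int16, read 4 rows at a time.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "raw.dat")
    with open(path, "wb") as f:
        f.write(array("h", range(30)).tobytes())
    chunk_lengths = [len(c) for c in iter_row_chunks(path, n_cols=3, rows_per_chunk=4)]
    peak = max(max(c) for c in iter_row_chunks(path, n_cols=3, rows_per_chunk=4))
```

Under the 2-byte assumption the first shape is 200 MB and the second is 2 TB, so the second cannot be held in 16GB of RAM and has to be streamed or memory-mapped.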

This is a very interesting article, thanks for sharing. I attempted several times to understand the HDF5 C API and create a custom format for storing connectivity data for neuroscience models, but each time I found the API exceedingly complex and bizarre. I am quite impressed that the author managed to write a substantial piece of software based around HDF and relieved to read the sections on the excessive complexity and fragility of the format.

High risks of data corruption - HDF is not a simple flat file. It's a complex file format with a lot of in-memory structures. A crash may result in corruption, but there is no high risk of corruption. Moreover, if your app crashed, what good is the data? How can you make sense of the partial file? If you just need a flat file which can be parsed and have its data recovered, then you didn't need HDF in the first place. So that's the wrong motivation to pick HDF. On the other hand, one could write a new file for every checkpoint / iteration / step, which is what most people do. If the app crashes, you just use the last checkpoint.

Bugs and crashes in the HDF5 library and in the wrappers - sure, every piece of software has bugs. But in over 15 years of using HDF, I have not seen a bug that stopped me from doing what I want. And the HDF team is very responsive in fixing issues and suggesting workarounds.

Poor performance in some situations - yes and no. A well-built library with a well-designed application should approach POSIX performance. But HDF is not a simple file format, so expect some overhead.

Limited support for parallel access - Parallel HDF is one of the most popular libraries, if not the most popular, for parallel I/O. Parallel HDF also uses MPI. If your app is not MPI, you can't use Parallel HDF. If "parallel access" refers to threading, HDF has a thread-safe feature that you need to enable when building the code. If "parallel access" refers to access from multiple processes, then HDF is not the right file format to use. You could do it for read-only purposes but not for writing. Again, not the right motivation to pick HDF.

Impossibility to explore datasets with standard Unix/Windows tools - again, HDF is not a flat file, so how can one expect standard tools to read it? It's like saying I would like to use standard tools to explore a custom binary file format I came up with. Wrong expectations.

Hard dependence on a single implementation of the library - AFAIK there is only one implementation of the spec. The author seems to have known this before deciding on HDF. Why is this an issue if it was already known?

High complexity of the specification and the implementation -

Opacity of the development and slow reactivity of the development team - slow reactivity to what? The HDF source is available, so one can go fix / modify whatever they want.

It seems the author picked HDF with the wrong assumptions.

HDF serves a huge community that has specific requirements, including preserving precision, portability, parallel access, being able to read/write datasets, querying an existing file for information about the data in it, multi-dimensional datasets, large amounts of data fitting in a single file, etc.


A common pattern (that my scientific software used) was this: an initial file is created from the raw data pulled from the sequencer. After sequencing, you could run all sorts of analyses. Sometimes the analyses themselves, and sometimes intermediate results, were very slow to compute and hence cached in the file.

I think it's reasonable to be very upset if you have a container file and adding new named chunks to the file has the possibility of causing the old data to become unreadable. It's fair in a crash before the file was saved that new chunks might be bad, but old chunks should be fine.

Good example. Definitely a problem. But that limitation exists now, so the programmer would have to work around it. Hopefully journaling support will appear soon.

Agreed, HDF could benefit from journaling. I have to ask though, why not just make a cached file of your data and, once it's done, integrate it into the final HDF file?

This is true, but anything can happen while futzing with a file.

Our method is to make a copy of the file with a .tmp extension, make your mods, then rename the file and delete the old one.
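
A sketch of that copy-modify-rename pattern in Python. `modify_atomically` is a hypothetical helper, and it uses `os.replace`, which swaps the new file in and drops the old one in a single step instead of a separate rename and delete. It does not call `fsync`, so it protects against a crashing program rather than against power loss.

```python
import os
import shutil
import tempfile


def modify_atomically(path, modify):
    """Apply `modify(tmp_path)` to a copy, then swap the copy in for the original."""
    path = os.fspath(path)
    tmp = path + ".tmp"
    shutil.copy2(path, tmp)       # work on a copy; the original stays untouched
    try:
        modify(tmp)
        os.replace(tmp, path)     # rename over the original in one step
    except BaseException:
        if os.path.exists(tmp):
            os.remove(tmp)        # discard the half-written copy
        raise


def append_new(p):
    with open(p, "ab") as f:
        f.write(b"+new")


def crash(p):
    with open(p, "ab") as f:
        f.write(b"garbage")
    raise RuntimeError("simulated crash mid-write")


with tempfile.TemporaryDirectory() as tmp_dir:
    target = os.path.join(tmp_dir, "data.h5")  # hypothetical file name
    with open(target, "wb") as f:
        f.write(b"old")

    modify_atomically(target, append_new)
    with open(target, "rb") as f:
        after_ok = f.read()

    try:
        modify_atomically(target, crash)
    except RuntimeError:
        pass
    with open(target, "rb") as f:
        after_crash = f.read()
    leftover_tmp = os.path.exists(target + ".tmp")
```

The cost is a full copy of the file per modification, which is fine for modest files and painful for very large ones.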

HDF5 was never intended to be a container used to stream data into -- it was meant for sharing data.

Good post!

I may not agree with Cyrille, but what about alternatives for storing binary data that might be structured and play well with newer tools like Spark? ASN.1 and Google Protocol Buffers both specify a binary file format and generate language-specific encoding and decoding. Is there a set of lightweight binary data tools we're missing?

How widely supported are the alternatives in the wider ecosystem? It is trivial to read and write HDF5 files in Python, Matlab, Mathematica, etc.

That's a good question. Both ASN.1 and Google's offering have more limited language coverage (ASN.1 is ancient, but venerable, now in the hands of NCBI), but maybe we should expand that list. These are tools that serialize buffers with razor-sharp binary specifications. I, too, use HDF5 for all of its features, but maybe someone who is rolling their own, for instance, under Spark, should have a solid binary specification.

In the old days, researchers measured the charge of the electron with oil drops and figured out what gravity is with pen and paper. I guess nowadays researchers have to spend a million dollars on an electron microscope to look at anything and have to depend on HDF5 to deal with any data.

We used HDF5 and NETCDF at NASA and it was a constant struggle. I remember when someone dropped the specification on my desk and said, "should be a good read. Enjoy!" Glad you found a more suitable alternative.

Leaving a few links here. From the discussion that has taken place it seems these two would be of interest.

http://cscads.rice.edu/workshops/summer-2012/slides/datavis/... Extreme IO scaling with HDF5

http://algoholic.eu/sec2j-journalling-for-hdf5/ HDF5 with a journal
