
Interplanetary telemetry compression - matagus
http://blog.klaehn.org/2018/06/10/efficient-telemetry-storage-on-ipfs/
======
zrav
Sounds like a use case for zstd with custom dictionary.

~~~
rklaehn
Yes, zstd is definitely on the todo list. The increased performance should be
quite welcome on low-power edge devices.

I tried to get it to work, but had some trouble with the existing JavaScript
bindings. I have only recently started developing in the JavaScript ecosystem.

PRs welcome...

~~~
seppel
It's not only increased performance. Since you can precompute a dictionary
with zstd, you'll (most likely) get much better compression, because the
shared structure lives in the dictionary instead of being repeated in every
block. Or you can at least stick to smaller block sizes.
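
For illustration, here's a minimal sketch of dictionary training with the
python-zstandard bindings (sample data and dictionary size are made up):

    import zstandard as zstd

    # Train a shared dictionary on past telemetry blocks (synthetic
    # samples here; real ones would be chunks of the archive).
    samples = [("t=%d,battery_temp=%.3f" % (i, 20 + i * 0.001)).encode()
               for i in range(1000)]
    dictionary = zstd.train_dictionary(16 * 1024, samples)

    # Compress each small block against the dictionary, so the shared
    # structure does not have to be repeated in every block.
    cctx = zstd.ZstdCompressor(dict_data=dictionary)
    compressed = cctx.compress(samples[0])

    # Decompression needs the same dictionary.
    dctx = zstd.ZstdDecompressor(dict_data=dictionary)
    assert dctx.decompress(compressed) == samples[0]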

------
danielvf
What IPFS software are you using?

I've seen some bugs and issues with the official Go server, and didn't feel
it was production-ready.

~~~
rklaehn
0.4.15 currently. There are some things that are not production-ready (in
particular IPNS), but the basic infrastructure (the distributed
content-addressed storage) is pretty solid. Discovery also works pretty well,
NAT hole punching is impressive, and resource usage is low enough that you can
run it on a Raspberry Pi with some room to spare.

For IPNS I am using the DNS TXT record workaround until IPNS gets more
stable. See http://blog.klaehn.org/2018/06/06/publish-blog-on-ipfs/ for how
that works.
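
The TXT record itself follows the DNSLink convention; roughly like this
(TTL and hash are placeholders):

    _dnslink.blog.klaehn.org. 300 IN TXT "dnslink=/ipfs/<current-root-hash>"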

I am publishing the blog and several static pages on IPFS. So far it works
really well.

------
planteen
Did the telemetry arrive in CCSDS format? Did you consider directly logging
all of the CCSDS headers?

~~~
rklaehn
We were storing telemetry that was preprocessed by the MCS. This is
preferable for analysis, since you cannot afford to pipe the CCSDS packets
through the rather slow mission control system every time you want to plot or
analyse the data. It also has the advantage that you store the data exactly as
seen in the control room.

The raw CCSDS packet stream also gets stored, but since making sense of it
requires the rather inflexible MCS systems, it is not as valuable for general
analysis.
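
For anyone curious what those headers contain: a CCSDS space packet starts
with a fixed 6-byte primary header. A minimal Python sketch of unpacking it
(field layout per the CCSDS 133.0-B Space Packet Protocol; the function and
field names are mine):

    import struct

    def parse_primary_header(buf):
        # The primary header is three big-endian 16-bit words.
        w0, w1, w2 = struct.unpack(">HHH", buf[:6])
        return {
            "version":      w0 >> 13,           # 3-bit version number
            "type":         (w0 >> 12) & 0x1,   # 0 = telemetry, 1 = telecommand
            "sec_hdr_flag": (w0 >> 11) & 0x1,
            "apid":         w0 & 0x7FF,         # 11-bit application process id
            "seq_flags":    w1 >> 14,
            "seq_count":    w1 & 0x3FFF,        # 14-bit sequence counter
            "data_length":  w2,                 # data field length minus one
        }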

------
theamk
Have you considered existing scientific columnar data storage formats, like
HDF5 or Parquet?

Their main advantage is that they have good, mature implementations in a
variety of languages, which would be handy if you ever find Javascript to be
too slow.
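
For example, with h5py it only takes a few lines to get chunked, compressed
columnar storage (synthetic data, gzip as the built-in filter):

    import h5py
    import numpy as np

    # Synthetic telemetry: one time column, one value column.
    t = np.arange(1_000_000, dtype=np.int64)
    v = 20.0 + 0.001 * np.sin(t / 1000.0)

    with h5py.File("telemetry.h5", "w") as f:
        # Chunking enables per-chunk compression and partial reads.
        f.create_dataset("time", data=t, chunks=True, compression="gzip")
        f.create_dataset("value", data=v, chunks=True, compression="gzip")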

~~~
marsokod
I developed an archive system for our spacecraft TM/TC using HDF5 with
blosc-lz compression for speed reasons (and I plan to move to zstd in the
future). One of the main issues I have seen in the industry is the difficulty
of upgrading the hardware, so we have to design something that works well on
regular hard drives and with minimal RAM requirements.

While HDF5 is good when you manage the whole system, it is always tricky for
sharing data with other people, so we are also using SQLite. We lose the
compression, but it is very easy to share and people are more familiar with
it.
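
If it helps anyone: from h5py, the Blosc filters are available through the
hdf5plugin package. A sketch, assuming hdf5plugin is installed:

    import h5py
    import hdf5plugin
    import numpy as np

    data = np.random.random(1_000_000)
    with h5py.File("archive.h5", "w") as f:
        # blosclz trades some ratio for very fast (de)compression.
        f.create_dataset("tm", data=data,
                         **hdf5plugin.Blosc(cname="blosclz", clevel=5))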

~~~
rklaehn
For the export at GSOC we had a streaming REST interface based on
akka-streams. It is used to get the data into other systems based on Spark for
more complex analysis.

Of course you also lose the compression, but the target system is typically
only interested in a small subset of the data.

------
userbinator
It's amazing just how much of a loaded word "telemetry" has become in the past
few years --- 10 years ago it'd just remind me of space missions and such,
like in the article, but now I associate the word more with pervasive
surveillance and privacy invasion.

------
cryo
I wonder if sqlite was also considered here?

~~~
rklaehn
We are using SQLite for some things. But for large quantities of simply
structured telemetry data it is not very good; size-wise it is about as good
as raw JSON. The reason it is not very space-efficient is that the on-disk
layout is optimised for fast random access, not for compactness.
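
One pattern that makes SQLite workable for this kind of data is to store
compressed chunks of samples as blobs instead of one row per sample. A
hypothetical sketch (table layout and names made up):

    import sqlite3, struct, zlib

    # One row per compressed chunk instead of one row per sample,
    # to avoid paying the per-row b-tree overhead for every value.
    con = sqlite3.connect("telemetry.db")
    con.execute(
        "CREATE TABLE IF NOT EXISTS chunks (param TEXT, t0 INTEGER, data BLOB)")

    samples = [(i, 20.0 + 0.001 * i) for i in range(4096)]
    packed = b"".join(struct.pack("<qd", t, v) for t, v in samples)
    con.execute("INSERT INTO chunks VALUES (?, ?, ?)",
                ("battery_temp", samples[0][0], zlib.compress(packed)))
    con.commit()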

