Datensparsamkeit (2013) (martinfowler.com)
67 points by _culy on Feb 18, 2018 | 17 comments



I wish more developers/companies practised this discipline. Not just for ethical reasons, but also just because I find it interesting to see clever ways of achieving tasks with minimal data.

One example that I love is Firefox Sync. It's end-to-end encrypted with your password as the encryption key. And there's a "Forgot password?" link on the login page. If you're thinking along at home, you'll notice that clearly this cannot work. If you forget the password to data that's encrypted with your password, then that's it. You can't just reset the password on it. If you could, anyone could. All you can do is erase the data.

Well, and that's exactly what Firefox Sync does. It just wipes your data off of Mozilla's servers. You typically have this data synchronized to your local client when you do the password reset, so it just encrypts this local data with your new password and uploads that to Mozilla's servers.

There are some corner cases where you might lose data, but I've never heard a complaint about it being less reliable than, for example, Chrome's implementation. And it's a hundred times more secure and private, in particular because it means the NSA/FBI/CIA can't look through your browsing history (it's also nice not to have Google looking through it, which is what they keep theirs in cleartext for).
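A rough Python sketch of that reset flow, for anyone who wants to see it spelled out. This is not Mozilla's actual protocol: `SyncServer`, `Client`, `derive_key` and `toy_cipher` are hypothetical names, and the XOR keystream is a stand-in for a real authenticated cipher so the example stays within the standard library. The only point is that a reset wipes the server copy and re-encrypts the local copy under a key derived from the new password.

```python
import hashlib
import secrets
from typing import Optional, Tuple


def derive_key(password: str, salt: bytes) -> bytes:
    # Stretch the password into a 32-byte key (PBKDF2-HMAC-SHA256).
    return hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, 100_000)


def toy_cipher(key: bytes, data: bytes) -> bytes:
    # ASSUMPTION / TOY ONLY: XOR with a SHA-256 counter keystream.
    # No authentication, no nonce handling; do not use for real data.
    stream = bytearray()
    counter = 0
    while len(stream) < len(data):
        stream += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(a ^ b for a, b in zip(data, stream))


class SyncServer:
    """Hypothetical server: it only ever sees a salt and ciphertext."""

    def __init__(self) -> None:
        self.blob: Optional[Tuple[bytes, bytes]] = None

    def upload(self, salt: bytes, ciphertext: bytes) -> None:
        self.blob = (salt, ciphertext)

    def wipe(self) -> None:
        self.blob = None


class Client:
    """Hypothetical client that still holds the plaintext locally."""

    def __init__(self, server: SyncServer, local_data: bytes) -> None:
        self.server = server
        self.local_data = local_data

    def push(self, password: str) -> None:
        salt = secrets.token_bytes(16)
        key = derive_key(password, salt)
        self.server.upload(salt, toy_cipher(key, self.local_data))

    def reset_password(self, new_password: str) -> None:
        # The old ciphertext cannot be recovered, so erase it and
        # upload the local copy encrypted under the new password.
        self.server.wipe()
        self.push(new_password)


def decrypt(server: SyncServer, password: str) -> bytes:
    salt, ciphertext = server.blob
    return toy_cipher(derive_key(password, salt), ciphertext)


server = SyncServer()
client = Client(server, b"bookmarks and history")
client.push("old password")
client.reset_password("new password")
restored = decrypt(server, "new password")
```

If the client had no local copy at reset time, there would be nothing to re-upload, which is the data-loss corner case.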


Excellent introductory article. My only gripe is that the following statement should have been qualified:

> Datensparsamkeit suggests that you shouldn't store the IP address directly, perhaps instead you should hash it and only store the hash.

Hashing an IP address with a well-known function such as SHA1 will be almost useless for keeping the data private from malicious actors, as there are fewer than 4.3 billion (2^32) IPv4 addresses to precompute hashes for. A keyed hash would be better, assuming that the key could be well protected.
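To make that concrete, here is a minimal Python sketch (mine, not from the article). The helper names `plain_hash`, `keyed_hash` and `brute_force`, and the example address from the 192.0.2.0/24 documentation range, are assumptions for illustration. It reverses an unkeyed SHA-1 by enumerating a small range, then shows an HMAC-SHA256 whose output depends on a secret key.

```python
import hashlib
import hmac
import ipaddress
import secrets
from typing import Optional


def plain_hash(ip: str) -> str:
    # Unkeyed hash: anyone can recompute it for every possible address.
    return hashlib.sha1(ip.encode("ascii")).hexdigest()


def keyed_hash(ip: str, key: bytes) -> str:
    # Keyed hash (HMAC-SHA256): useless to an attacker without the key.
    return hmac.new(key, ip.encode("ascii"), hashlib.sha256).hexdigest()


def brute_force(target: str, network: str) -> Optional[str]:
    # Enumerate every host address in the network and compare hashes.
    for addr in ipaddress.ip_network(network).hosts():
        if plain_hash(str(addr)) == target:
            return str(addr)
    return None


# A "leaked" unkeyed hash is reversed by trying each candidate address.
leaked = plain_hash("192.0.2.57")
recovered = brute_force(leaked, "192.0.2.0/24")

# With a secret key, the same enumeration no longer matches anything
# unless the attacker also obtains the key.
key = secrets.token_bytes(32)
stored = keyed_hash("192.0.2.57", key)
```

The sketch only walks a /24 to stay fast. The full IPv4 space is the same loop over about 4.3 billion candidates. The keyed variant is only as good as the protection of `key`.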


Well, the article sounds like it was written with IPv6 in mind


Fun fact: Sometime this year, the IPv6 spec will be as old as the IPv4 spec was when the IPv6 spec was published.


It's interesting to point out in this context how it's anchored in German law. Germany's Federal Data Protection Act states:

> Section 3a

> Data reduction and data economy

>

> Personal data are to be collected, processed and used, and processing systems are to be designed in accordance with the aim of collecting, processing and using as little personal data as possible. In particular, personal data are to be aliased or rendered anonymous as far as possible and the effort involved is reasonable in relation to the desired level of protection.

Source: https://www.gesetze-im-internet.de/englisch_bdsg/englisch_bd...


Privacy is only one dimension, of course. While space is cheap, time isn't, so having less data can also mean less data processing (for example, when possible, using online algorithms to compute what you really care about instead of logging all the things and then computing what you really want on demand). However...

"We might not have an immediate use [...] but we'll ask for it anyway in case it comes in useful later [...] so if we come up with some way to use that data later, we can."

That, at least in my experience, is the major motivator. Unless either the ethos changes or stiff penalties instill sufficient fear in the incorrigible opportunism of the internet tech sector, the fear of missing out will continue to be a major motivator of the lust for data.
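As a small illustration of the online-algorithm point, here is a Python sketch using Welford's method for a running mean and variance. The class name `RunningStats` and the sample values are made up for the example. It keeps three numbers no matter how many events arrive, so the raw events never need to be stored.

```python
class RunningStats:
    """Running mean and sample variance via Welford's online algorithm."""

    def __init__(self) -> None:
        self.count = 0
        self.mean = 0.0
        self._m2 = 0.0  # running sum of squared deviations from the mean

    def add(self, x: float) -> None:
        # Update the aggregates in O(1); the value x itself is discarded.
        self.count += 1
        delta = x - self.mean
        self.mean += delta / self.count
        self._m2 += delta * (x - self.mean)

    @property
    def variance(self) -> float:
        # Sample variance; defined as 0.0 here for fewer than two values.
        return self._m2 / (self.count - 1) if self.count > 1 else 0.0


stats = RunningStats()
for response_time in [2, 4, 4, 4, 5, 5, 7, 9]:  # hypothetical measurements
    stats.add(response_time)
```

The trade-off is the one described above: you can't go back later and compute something you didn't plan for, because the individual values are gone.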


TLDR: the principle of not storing data unless absolutely necessary

As opposed to, say, logging each and every thing, including credit card transaction details, to Splunk/Elasticsearch, or storing every user action "just in case".


And the German word for that is "Datenreichtum" (data-richness).


Which is also used ironically whenever the next "500 GB of personal healthcare data" gets leaked.


Because some business organization recently invented it to push the opposite of Datensparsamkeit. It seems to have taken off only in ironic use, except maybe in clueless politician circles...


Datensparsamkeit

data-spare-some-ness


I like data frugality from Martin Fowler's post best.

According to [1], it is called data economy in the official translation of the German Federal Data Protection Act (Bundesdatenschutzgesetz, BDSG).

[1]

§ 3a Datenvermeidung und Datensparsamkeit

Section 3a Data reduction and data economy

https://dict.leo.org/forum/viewUnsolvedquery.php?idThread=56...


Data frugality seems to express it best. Data economy is a bit unfortunate, since it could also mean an economy based on trading data, which is almost the opposite of the goals of Datensparsamkeit.


"spare" is not a German word. The closest is "sparen": to save.

"sparsam": economical, thrifty, frugal.

data - economic - ness.


"spare" and "sparen" have the same root and related meanings. And the -sam suffix is related to -some, as in handsome.


Exactly: "spare" is the imperative ("save!"), "sparen" the base form ("to save").


I meant "spare" in English. English "spare" and German "sparen" have the same root and related meanings: to spare versus to save. I just wanted to point out that the translation attempt data-spare-some-ness is not unreasonable.



