Fedora, UUIDs, and user tracking (lwn.net)
132 points by tlburke 39 days ago | 83 comments



Good on them for taking the time to think about how to do this properly. I use Fedora, I'm happy to let them know that I use Fedora, and even to make the check-in somewhat regular so that they can know if I stop using Fedora.

Provided that they figure out a way that absolutely nothing can be done with the information other than to say, "This non-identifiable machine reports that it uses Fedora," I'd be okay with that.


It's a slippery slope, first you want to know how many use your software, then you want to know what features they use, then you want to know other apps installed, then you want to know what web sites they visit and what they search for, and so forth.


I can understand the slippery slope threat in a proprietary system, but Fedora is an open system. Should data collection at some point go beyond whatever it is the devs announce, someone is going to pick it up and everyone is going to drop the distro like a hot potato. So I'm not sure this is a big issue.


The lack of serious consequences to Canonical regarding their spyware-like, opt-out search integrations in Ubuntu with Amazon way back when (and then doubling down with legal threats against fixubuntu.com) gives me doubt as to this being the case.

That goes double given Fedora's position as what boils down to RHEL upstream. Too much corporate support for any backlash to make a dent, even if it could get critical mass.


IMHO, there were serious consequences. 5 years ago I would not have considered using Fedora on my desktop because Ubuntu was trustworthy and I was considering Fedora as useless fragmentation. Now, I see Red Hat as a knight on my side to improve Linux adoption (I hope that IBM will not ruin this) and I see Fedora as more stable and up to date than Ubuntu.


Can I ask what was / is it about Debian that made you discount it, and embrace RH / Fedora instead?


Not the OP, but I made a similar transition. I got fed up with using APT to manage packages while Yum/DNF seem much more complete and elegant. I seem to get far fewer package conflicts with Yum, and the error messages when something does go wrong are more digestible, though this may be just due to my use case/package selection. I also dislike the use of the dash shell by default. To me, it just further muddies the water between compatibility of sh, bash, and dash. I'd rather just have bash and be done with it.

Fedora's packages are also more up to date while being as or more stable than Ubuntu. Debian is still probably king of stability, but when compared to CentOS, I prefer CentOS' default package selection and configuration (postfix vs exim, sudo installed by default, ssh installed and enabled by default), especially since they embrace systemd while Debian seems to use it grudgingly while also keeping around old methods of configuration that don't quite fit with systemd (network configuration being the big one here, don't even get me started on Ubuntu's adoption of friggin' Netplan).


Thanks for the reply. The past 9 months I've had to dive deep into RHEL again, and it's been a disappointing experience, but that may be due to a) more than two decades of comfort with Debian, and b) some number of third party add-ins to RHEL 7 systems, but conflicts have been much more painful on the RHEL side. I've never noticed or been bitten by the dash / bash arrangement. Package selection in both cases is defined by cfm (salt or ansible) so I've never felt pain on that front either. Systemd seems to be a full commitment by Debian, including network config, but perhaps the migration scripts weren't working for your system(s).


> especially since they embrace systemd

I will never forgive Red Hat for SystemD, and even more than that, I will never forgive Debian for adopting it. But that's off-topic.


To each their own, of course. I happen to like it a lot, but I'm still surprised Debian chose to use it as well.


Your question pre-supposes that the GP evaluated Debian and decided against it. While this may be true, he/she never said that.

In my case tho, I found Debian to be moving at a glacial pace. Fedora OTOH is always fresh and current. There's nothing wrong with Debian, and sure some people don't mind running Sid. Fedora is my favorite flavor of ice cream tho.


Wasn't my intent.

Parent omitted reference to Debian, while bemoaning Ubuntu quality degradation - I would expect most people disappointed with Debian derivatives, but familiar with the tooling, would consider moving further upstream rather than outright abandonment.


> I found Debian to be moving at a glacial pace.

I view this as a feature, not a bug!


For servers I agree! For desktop tho, it drove me nuts. I still use Debian as the base of most of my servers cause the stability is amazing.


I like it for my desktops as well.


As one other commentator has pointed out, there actually have been serious consequences to Canonical, at least as far as Ubuntu as a desktop end-user distro is concerned. Many other community distributions have risen in popularity specifically around this issue of corporate involvement.

But more importantly, I don't think there is anything wrong in principle with Ubuntu collecting user data if they openly communicate it to their users. I do not agree with this Stallman-esque usage of the term spyware, if the exchange of data is voluntary and the consequence of informed decision.

If customers disagree with Ubuntu on data collection they can switch to a different distribution, but as long as the information is out there, this is not an issue, spyware or a slippery slope.


> I don't think there is anything wrong in principle with Ubuntu collecting user data if they openly communicate it to their users.

As long as it's opt-in, I agree.

> I do not agree with this Stallman-esque usage of the term spyware, if the exchange of data is voluntary

Being truly voluntary is key. If that's the case, then I agree with this. However, if data is being collected about me or the hardware/software that I'm using without my affirmative consent, that completely qualifies as "spying".


Didn't know about the fixubuntu story, thanks for the share


Telemetry has a nice creeping scope like that. Although not necessarily a problem with Fedora, this scope often expands to "how can we monetize this info".


It's interesting to see how this discussion steers towards privacy. Some distributions, like Ubuntu for example, are far less conservative. Besides the NTP tracking mentioned in the article, there's the Amazon fiasco of the past, and Ubuntu 18.04 installs quite a bit of telemetry [1], including tracking packages. It ships with a "dynamic" MOTD that runs a script periodically which downloads updates from Canonical. While this may be useful for server administrators who wish to be notified of products and updates, it has at one point shown ads for an HBO show [2].

Annoyingly, while installing the Xubuntu flavor, there appeared to be no option to opt out nor was there even mention of any such telemetry in the live installer interface. I had to track down and disable manually post installation - something the average user is not going to bother with and what Canonical is surely betting on. I appreciate how Poettering brings up trust and "red flags", knowing full well the lower the transparency, the larger the reactionary incentive for users to opt out or disable such telemetry. Canonical could perhaps take note.

[1] https://askubuntu.com/questions/1027532/how-to-opt-out-of-sy... [2] https://bugs.launchpad.net/ubuntu/+source/base-files/+bug/17...


This post describes a bad way to track users, but the real utility of this post is in the email that describes a way to count Fedora users without tracking them:

https://lwn.net/ml/fedora-devel/20190108152239.GA24118@garde...


That actually sounds more reasonable, although it does run a tiny risk of being trivial to mess with if a malicious client wanted to skew numbers. But I don't think it's possible to defend against that without being horribly invasive from a privacy perspective.

I must say, it feels odd to support a Poettering proposal, but this actually does look like a good solution.


> although it does run tiny risk of being trivial to mess with if a malicious client wanted to skew numbers

Is that not also the case with the UUID solution? Generating the UUIDs in virtual machines, or just replacing the UUIDs in the requests, doesn't seem out of the question


> although it does run tiny risk of being trivial to mess with if a malicious client wanted to skew numbers.

I don't doubt it will happen if this becomes well-known. Activism takes many forms.


My first thought when I saw they just wanted to do counting was: could you not do something like send a

> This_is_a_first_install

request the first time, then

> get_updates

in future, and so I'm glad to see the proposed solution is something vaguely similar.


Yep. Quote from the Fedora Wiki:

Options for "true" values

Rather than a simple boolean, we'd like the "countme" variable to act as an increment-counter. That is, it would be "1" the first week, "2" the second week, "3" the third week, and so on. This will let us sort out short-lived test or CI infrastructure machines and get a better picture of how systems are used over time, without tracking individual systems. Optionally, we could have a cap on the maximum value to mitigate risk of uniqueness for systems which have been running for a very long time (it may be that there are only a few systems running for exactly 327 weeks, for example). As the supported lifetime of a Fedora release is about 13 months, a logical cutoff would be around 60 weeks; the counter could go from "59" to "old".
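
To make the capped counter in that quote concrete, here is a minimal Python sketch. It assumes the client knows its own install time; the function name `countme_value` and the week arithmetic are illustrative guesses, not how DNF actually does it.

```python
from datetime import datetime, timedelta, timezone

CAP_WEEKS = 60  # cutoff suggested in the wiki quote


def countme_value(installed_at: datetime, now: datetime) -> str:
    """Return "1" in the first week, "2" in the second, ... up to "59",
    then "old" from week 60 onwards (hypothetical helper)."""
    weeks_elapsed = (now - installed_at) // timedelta(weeks=1)
    week_number = weeks_elapsed + 1  # the first week counts as 1
    return str(week_number) if week_number < CAP_WEEKS else "old"


installed = datetime(2019, 1, 1, tzinfo=timezone.utc)
print(countme_value(installed, installed + timedelta(days=10)))  # second week
```

Because every long-lived machine collapses into the same "old" bucket, a very old install is no more distinctive than any other install past the cap.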


"Essentially, users would need to trust that the project isn't doing the tracking because it says it isn't."

The cynic in me is recalling that red hat just got bought by IBM and IBM is in the news for tracking people in a weather app in a sneaky way

I don't know any better though, maybe Fedora is quite independent of Red Hat/IBM and it's 100% legit to trust their promises. I'm not sure how it works tbh

Edit: added quote from article


(disclaimer: I work at Red Hat, not on any OS/distro)

This is a nitpick, but Red Hat did not get bought by IBM. What happened was IBM announcing the intention to buy Red Hat.

It's maybe a subtle, but possibly important distinction. Red Hat is still its own independent entity until the deal goes through (which means IIRC passing the board's approval, SEC and likely other stuff). This is expected to happen in late 2019 I believe, but it might still fall through.

This doesn't absolutely dispel any possibility of IBM's influence, but it should be very low/zero until the merger actually goes through. But I also don't know how all this works.


Thanks for clarifying for me, I had that mixed up


No worries! Pretty much everyone I talked to who was aware of the deal (online and off) thought the same.

These things are pretty complex.


I don't mind Fedora wanting to get counts of things, provided it is exclusively an opt-in feature. Debian's 'popcon' is an example of doing it right.


Why not just track downloads from the mirrors? If you post a new version of a package for Fedora 29, just track how many downloads of that specific file are made. Write some scripts for log processing and require official mirrors to submit the logs to give you the package counts.

That way user info never makes it past the mirror (which has their IP anyways) and you don't need anything complex like UUIDs, playing tricks with NTP, or calling home.

This would give a reasonably accurate number. Use bash for measuring linux installs (pretty rare to have linux installed without bash). Then more desktop apps like firefox, eog, and xpdf to measure desktop use. If interested in server side track mongodb, apache, mysql, and similar.

This would also help fedora decide which applications they should pay more attention to.
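
A rough Python sketch of the log-processing idea above, assuming mirrors write Common Log Format access logs and that RPM files are named `<name>-<version>-<release>.<arch>.rpm`. The function name and the sample log line are made up for illustration.

```python
import re
from collections import Counter

# Matches the request and status fields of a Common Log Format line.
REQUEST_RE = re.compile(r'"GET (?P<path>\S+) HTTP/[\d.]+" (?P<status>\d{3})')


def count_package_downloads(log_lines, package_names):
    """Count successful (HTTP 200) RPM downloads per package name."""
    counts = Counter()
    for line in log_lines:
        match = REQUEST_RE.search(line)
        if not match or match.group("status") != "200":
            continue
        filename = match.group("path").rsplit("/", 1)[-1]
        if not filename.endswith(".rpm"):
            continue
        for name in package_names:
            # Require a digit after "<name>-" so that "bash" does not
            # also count "bash-completion".
            if re.match(re.escape(name) + r"-\d", filename):
                counts[name] += 1
    return counts


sample = [
    '203.0.113.5 - - [08/Jan/2019:15:22:39 +0000] '
    '"GET /pub/fedora/29/bash-4.4.23-5.fc29.x86_64.rpm HTTP/1.1" 200 1234',
]
print(count_package_downloads(sample, ["bash", "firefox"]))
```

This only counts file fetches, so it inherits the usual caveats: private mirrors hide downloads and automated jobs inflate them.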


But this would be either over-counting if some CI scripts download the version every once in a while, or under-counting if some organization put the image on their own privately maintained mirror which is quite common.


Telemetry of any type usually fails to measure precisely the thing you want, but something adjacent that correlates strongly. What you mention are clear problems with inferring usage from downloads, but if you can infer the percentage of downloads that correlates to a machine running Fedora, you don't need much more precision.


Perhaps packages are reinstalled too frequently to give accurate numbers? Though I agree this sounds like a good solution.


Any distro that phones home with a unique identifier is a distro I won't touch with a ten foot pole. I don't care what they claim they will or won't use that identifier for.


Maxims that act on the symptom rather than the problem rarely help in the end, as the problem just evolves to support its needs through other means.

For example, sending a unique identifier is not the problem. Tracking people through a unique identifier is. So, depending on your goals, you can design a unique identifier system that does not allow tracking (or at least makes the tracking period so small as to be unuseful for purposes other than designed) as outlined in the article through changing the identifier on the client side weekly.

If all you want to do is get a good estimate of how many users use what types of configurations of your software (major and minor version), a UUID that rotates weekly on the client side is perfectly acceptable to use for those statistics to a fair degree of accuracy.
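
For illustration, a client-side rotating identifier could look something like this Python sketch. The class name `RotatingId` and the choice to rotate on ISO week boundaries are assumptions; the article does not specify an implementation.

```python
import uuid
from datetime import datetime, timezone


class RotatingId:
    """Client-side identifier that is replaced when the ISO week changes."""

    def __init__(self):
        self._week = None
        self._value = None

    def current(self, now: datetime) -> str:
        iso = now.isocalendar()
        week = (iso[0], iso[1])  # (ISO year, ISO week number)
        if week != self._week:
            # New week: discard the old identifier and draw a fresh random
            # one, so the new value is not derived from the old one.
            self._week = week
            self._value = str(uuid.uuid4())
        return self._value


ident = RotatingId()
print(ident.current(datetime(2019, 1, 8, tzinfo=timezone.utc)))
```

Nothing in the identifier itself links one week's value to the next, but the server can still correlate on other signals such as IP address, which is the trust question raised elsewhere in the thread.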

On the other end of the spectrum, people long ago started reducing their trackable footprint online, and the online tracking ecosystem just evolved to finding people through other, trickier methods, such as browser fingerprinting.


You're right in general, of course. But here's the reason for my hardline stance on that: history shows that trusting promises or assertions made about things like unique identifiers is unwise, and so I have to take a strong defensive stance.

> you can design a unique identifier system that does not allow tracking

You can (sortof), but we run against that trust issue again. If I'm giving a unique identifier to someone, I have no way of knowing if their assertions about its use are accurate. Even if they are, there's no guarantee that won't change in the future.

> If all you want to do is get a good estimate of how many users use what types of configurations of your software (major and minor version)

You're talking about the perspective of the publisher. I'm talking about my perspective as a user. A company's "need" to collect metrics is their problem, not mine. If their solution results in more information disclosure than I'm comfortable with (and a unique identifier absolutely is), then I will avoid their software or block communications to their home base.


> A company's "need" to collect metrics is their problem, not mine.

When it's couched in how to deliver software updates, it becomes your problem as well. That's a transaction, and they want to charge more for it now. You can decide it's too costly, as you indicate here, but it's not like they're giving nothing in return.

I think it's important to note the goals of those involved. In this case, it's the people that put together a free product for us to use and also supply free timely software updates looking for more information on who is using what so they can do a better job at delivering that free stuff to us.

And in this case, it's not adding tracking where it doesn't exist, it's making it better for the specific cases that are useful to them and that impact users the least (an accounting of software configurations). They already track through IP address, but that's inaccurate to a much larger degree for the information they want (but somewhat less so for the personal information you likely want to protect). Adding an additional system that allows better tracking of the useful information without increasing the personally identifying features of IP based tracking (which still exists) is laudable, in my eyes.


> When it's couched in how to deliver software updates, it becomes your problem as well.

I honestly don't see how. If/when I'm ready to take an update, I can come get it myself. If they want to charge me (or charge me more) for it, then they can do so at that time. No tracking needed except for that associated with payment.

> Adding an additional system that allows better tracking of the useful information without increasing the personally identifying features of IP based tracking (which still exists) is laudable, in my eyes.

Not as laudable as not engaging in tracking in the first place. However, I don't see how this doesn't increase personally identifying features. On the contrary, it's adding one: a unique identifier.


> If they want to charge me (or charge me more) for it, then they can do so at that time. No tracking needed except for that associated with payment.

That's what's proposed? An identifier sent along with the request to see the current list of updates available?

> I don't see how this doesn't increase personally identifying features. On the contrary, it's adding one: a unique identifier.

An identifier that changes every week or so. At that point it is useless for identifying an individual, but can still be used statistically to determine how many systems are running what versions of Fedora, even behind NAT gateways. The only difference from before is now instead of "there's one IP with more than average check-ins, or check-ins from two or more different configurations", it's "there's one IP with X number of unique identifiers that randomize weekly seen over the last 28 days, so we can approximate X/4 different systems behind that IP".
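
The X/4 arithmetic can be written out as a tiny Python sketch. It assumes every system behind the IP checks in at least once per rotation period; the function name is hypothetical.

```python
def estimate_systems(unique_ids_seen: int,
                     window_days: int = 28,
                     rotation_days: int = 7) -> float:
    """Estimate systems behind one IP from the number of distinct
    rotating identifiers seen during the observation window."""
    rotations_in_window = window_days / rotation_days  # 4 for 28 days / weekly
    return unique_ids_seen / rotations_in_window


# 12 distinct identifiers over 28 days with weekly rotation
print(estimate_systems(12))
```

Systems that check in less often than once a week would be under-counted by this estimate.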


> The only difference from before is [...]

Yes, I understand, but your explanation isn't reassuring to me. It's confirming that I actually do understand the mechanism and its ramifications.

Red Hat can do whatever it likes (although my take on it is that they're not likely to do this unique identifier thing). I'm not saying otherwise -- that's their right, after all.

All I am saying is that software that does this sort of thing is unacceptable to me and I will avoid it to the best of my ability. As is my right.


You said free so many times that I had to share some news I learned earlier today: the 'free as in freedom' podcast is releasing new episodes again after 2-3 years hiatus!


> A company's "need" to collect metrics is their problem, not mine.

And your need to run an OS on your computer is your problem, not theirs. What do you do if everyone on the sell side of the market uses telemetry? Just stop using computers?


> What do you do if everyone on the sell side of the market uses telemetry? Just stop using computers?

Well, that's not going to happen. I doubt Slackware would go down that road, for example.

But let's say that what you assert happens -- all that means is that I won't use distros. It doesn't mean that I won't use computers.

It's entirely possible to install Linux without using a distro or prebuilt binaries at all. It's also possible to keep using an older version of the operating system.

But, being essentially lazy, what I'd most likely do is an extension of what I do with most applications these days: firewall off the servers that the OS is trying to communicate with.


That's the beauty of foss, you can just remove the offensive bits.


Yeah... it's kind of amazing


There are reasons to draw a line in the sand, to say that even attempting to do some things is contrary to a strong norm that we will defend even if you promise that you're not using it for anything malicious, something which is hard to police.

Taking a strong stand against tracking and, therefore, in favor of privacy is perfectly reasonable for people who use Linux in part due to our hatred of the deep tracking closed-source OSes do.


The problem with drawing lines in the sand is that you trip up all the players that make an effort to act responsibly as well, thus reducing the incentive to act responsibly.

You're basically reducing market effectiveness by ignoring the details of available information and grouping unalike things together. The market will likely respond by reducing access to or the clarity of that information (e.g. they'll track you, but hide it even if it's innocuous and the vast majority would have no problem with what info is given up, because apparently people can't be bothered to make a decision on anything but the coarsest of details).


If you are opposed to tracking, then a company being aboveboard about it doesn't resolve the issue anyway.


You speak of "tracking" as if it's all the same thing. Every sale you make at a store is tracked, and for good reason to both the customer and the store (how else do you allow returns). Every time you visit a doctor, they add the info regarding your visit to a log. That's tracking. Tracking itself is not bad.

Tracking individuals and personal information about them while they are trying to remain anonymous or have no expectation anything personal has been revealed is bad.

Attacking anything with the word tracking in it because it's been conflated with this, even though it shares little or no resemblance and can't be used later for this purpose in its current form, is just FUD and an indicator of how broken human communication fundamentally is.


> Every time you visit a doctor, they add the info regarding your visit to a log.

JohnFen already said most of what I'd say about these examples, but I want to add one big thing:

The tracking the medical world does is controlled by law. Laws people take very, very seriously. It therefore can't be mixed with other data through being resold or in any other fashion to help form a more accurate picture of me.

That data re-use is part of why I want strong norms against data collection.


> You speak of "tracking" as if it's all the same thing.

True, and that's bad of me. I'm speaking in shorthand.

> Every sale you make at a store is tracked

But the store does not track me if I don't use a card. Returns are handled through the receipt that they give me during the transaction. That's a kind of tracking, but tracking the transaction itself, not me.

> Every time you visit a doctor, they add the info regarding your visit to a log. That's tracking. Tracking itself is not bad.

Indeed, and here's where I'll try to introduce the shades of gray I left out. I consent to the doctor tracking me to that extent (but I would object strongly if the doctor started keeping track of my whereabouts or what I was doing). The doctor even gives me a consent form affirming that. If I'm not OK with the tracking, I don't see that doctor. Software is no different in this sense.

I oppose tracking that I don't give affirmative consent for. In the case of Red Hat's purpose, I will not give such consent, as the cost/benefit ratio is not sufficiently weighted to the "benefit" side.

> is just FUD and an indicator of how broken human communication fundamentally is.

It's not FUD, as I'm not claiming that Red Hat is intending to do anything nefarious. And I don't see this as a human communication problem.

Speaking personally, this is a reaction to the trend in software and online to engage in massive amounts of user tracking and data collection, both disclosed and undisclosed, that has resulted in real harm (both intentional and unintentional).

Once bitten, twice shy and all of that. This is a problem that comes from real misbehavior of software companies, not from poor communications.


> But the store does not track me if I don't use a card. Returns are handled through the receipt that they give me during the transaction. That's a kind of tracking, but tracking the transaction itself, not me.

That's exactly analogous to what's happening here. The data being tracked isn't you, it's generic information about how many Fedora installs of what type there are. The countermeasures in place mean it cannot now, nor ever, be used to track you if implemented in the way I've outlined.

> The doctor even gives me a consent form affirming that. If I'm not OK with the tracking, I don't see that doctor. Software is no different in this sense.

Even if you don't affirm any documentation, you are still tracked by the doctor himself. If you visit the same doctor, even without a log of prior visits, he or she might remember you. This is implicit in all communication. That's why the discussion is not really track vs not track, but what data is tracked and how. Every time you request updates from any network based update system, you can bet your connection is tracked in some manner.

> I oppose tracking that I don't give affirmative consent for. In the case of Red Hat's purpose, I will not give such consent, as the cost/benefit ratio is not sufficiently weighted to the "benefit" side.

Why I'm so confused by your stance is that your reasoning for disliking "tracking" does not seem to follow (in my eyes) from the evidence you've presented for that reasoning.

I feel it's akin to looking at the ills that automobiles have brought about with pollution, and taking a stance against vehicles. When someone comes by to show you a bicycle, you say no-thanks, you've taken a hard line against all vehicles because of pollution. When they show you how it doesn't pollute, even can't pollute in that manner, you say that it's your right, which it is, and you've drawn a line you won't cross, which you have, but I can't help but think you've drawn that line in a rather odd spot.

You can obviously do what you want, but I'm not sure I can be blamed for trying to figure out how this reasoning works, because it makes no sense to me.

> It's not FUD, as I'm not claiming that Red Hat is intending to do anything nefarious.

You're equating tracking, as being discussed here, with identity tracking, which is not really on the table as an option at all.

> Speaking personally, this is a reaction to the trend in software and online to engage in massive amounts of user tracking and data collection

And I would classify it as an overreaction to that problem. Sure, the problem is bad, but does that mean we should attack real solutions which do not exhibit that problem just because it shares some easily identifiable similarities, such as a name?

What we have is an open source operating system offered for free with open source utilities that are used to check for remote updates for that operating system, also entirely free, with the ability to see who is asking for updates. That's what we already have, by nature of using IP transport.

All they are proposing is to get a finer grained view (but still not perfect) of how many systems there are and what version they are. None of that is personal to an individual, and the discussion is how to go about it in a way that it is not, and can not, be used later for those purposes. If that's not okay, you might as well just shut off your internet connection, because there's startlingly little you can do online that doesn't reveal massively more information about you than that at every interaction. Just loading a web page generally gives the host your IP address, browser of choice, a list of installed extensions, what the dimensions of your browser window are, what the dimensions of your desktop are, what the 3D capabilities of your video card are, what fonts you have available to use, and more.

Unless you are browsing HN through lynx, telnet, or some system that mails webpages to you after you submit the URL (a-la Stallman), I can't reconcile your hard line in one instance and apparent blasé attitude in the other.


> That's exactly analogous to what's happening here. The data being tracked isn't you

If we're talking about using a unique identifier, then I disagree. This isn't analogous to getting a store receipt at all. With a store receipt, there is nothing that connects me to the transaction described in the receipt except that I am in physical possession of the receipt.

> If you visit the same doctor, even without a log of prior visits, he or she might remember you.

Indeed, but that's in no way similar to what we're talking about.

> I feel it's akin to looking at the ills that automobiles have brought about with pollution, and taking a stance against vehicles. When someone comes by to show you a bicycle, you say no-thanks

I think this analogy also misses the mark. If tracking is like a car, then the UUID tracking we're talking about is like a compact car. Not at all like a bicycle (Poettering's suggestion, which I'm OK with, is more like that).

> You're equating tracking, as being discussed here, with identity tracking, which is not really on the table as an option at all.

I view this as effectively identity tracking. Much like the "advertising IDs" that Android uses.

> And I would classify it as an overreaction to that problem.

Perhaps it is, but if so, it's because as a user it's impossible to determine which tracking is OK and which isn't, therefore it's wise to avoid it all.

> but does that mean we should attack real solutions which do not exhibit that problem just because it shares some easily identifiable similarities, such as a name?

Of course not, but I'm not sure that this is an example of that. Also, it's important that a company prove (I'm not sure how that would be done, admittedly) that their representations of the tracking system are accurate, and that future business decisions couldn't change that.

> That's what we already have, by nature of using IP transport.

It's not, really. For instance, I run about a dozen Linux machines at home. Each of those machines does not go to the distro's repository for updates -- I have an update server that caches them and the other machines get their updates from that. So, if you're looking at the repository's logs, it looks like only one machine is getting updates. And, if I wanted to be even safer, my update server could get the updates using a VPN and thus completely disconnect my IP address from the IP address the repository is seeing.

Besides, as I said before, just because there's one data leak doesn't mean it's OK to introduce another one.

> All they are proposing is to get a finer grained view (but still not perfect) of how many systems there are and what version they are.

Yes, I understand.

> I can't reconcile your hard line in one instance and apparent blasé attitude in the other.

That might be because you're assuming I have a blasé attitude in an area where I don't.


I would think if it rotated on the first of each month, that would probably be sufficient... then you could get your counts for any given month (excluding first/last day) assuming most systems check every week or two at least, and it would be pretty consistent.


Rotating the identifier means you lose the information about attrition rate.

If you have some number of users leaving, but a similar number incoming, then it would look like you have a consistent usage. Losing the info about lost users means you don't improve in retention.


Could you regain this info by adding a static prefix to the rolled ID? So you know it was rolled, but not from which previous ID. Whereas new IDs would have no prefix, so you can count new users as new.


Could just be the date of install in UTC as a prefix, the other part randomized on the first of the month... they could still calculate relative drop off, and still get better stats more anonymously.


Later on in the article they describe a revised solution that doesn't do that:

> Poettering came up with a scheme that alleviated most of the problems that were identified. He proposed that a "countme" flag simply be added to a single mirror-list query each week. The sum of all such queries over a week's time should provide an accurate estimate of the number of Fedora systems. That way, UUIDs need not be stored, which removes much of the concern—data that is not stored cannot be misused.
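
A hedged Python sketch of how a client might attach the flag to only one mirror-list query per week. The parameter spelling `countme=1`, the rolling seven-day window, the example.org URL, and the function name are all assumptions for illustration; the quoted text only says the flag is added to a single query each week.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional, Tuple
from urllib.parse import urlencode

WEEK = timedelta(weeks=1)


def mirrorlist_url(base_url: str, params: dict,
                   last_sent: Optional[datetime],
                   now: datetime) -> Tuple[str, Optional[datetime]]:
    """Build a mirror-list URL, adding the flag at most once per week.

    Returns the URL and the updated "last sent" timestamp, which is the
    only state the client keeps; no identifier is generated or stored.
    """
    query = dict(params)
    if last_sent is None or now - last_sent >= WEEK:
        query["countme"] = "1"  # hypothetical parameter spelling
        last_sent = now
    return f"{base_url}?{urlencode(query)}", last_sent


now = datetime(2019, 1, 8, tzinfo=timezone.utc)
url, last = mirrorlist_url("https://mirrors.example.org/mirrorlist",
                           {"repo": "fedora-29"}, None, now)
print(url)
```

On the server side, the weekly estimate is then just the number of flagged requests received that week.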


Yes, that's much better.


>unique user ID (UUID) for each installed system that would be sent with DNF mirror-list requests. It explicitly calls out privacy concerns: "We don't want to track; just count."

If Fedora server is compromised they can serve different packages to different users.


Given that package servers serve packages over HTTP, you can already do this, identifying the user you want to serve different packages by their IP.

However, the packages need to be signed by Fedora for the package manager to accept them, so this has been considered a pretty weak excuse for an "attack" for a while now. "Getting access to code-signing keys allows you to attack the people consuming signed binaries"—wow, you don't say!


With control over the mirror list you can prevent certain users from getting updates, which is a security problem, but without being able to sign packages the danger is limited.


Looking at the wiki page [0], I can see the benefits of the move:

> Better metrics overall

> Public stats page updated automatically

> Better knowledge of relative use of different variants

> Insight into Fedora's use in short-lived test systems and temporary containers vs. longer-term installations

but nothing evaluating how and whether the proposed solutions will achieve those things.

With no method being perfect, I'm surprised that no one is calling for a quantitative evaluation of the various ID collection schemes, and that there is no defined "good enough" value, other than

> We need better data than that.

I'm not a Fedora maintainer, and I'm not maintaining any other software of such popularity, so I have to ask: why? I assume it's to allocate work better. At which point do the downsides outweigh that benefit?

[0] https://fedoraproject.org/wiki/Changes/DNF_Better_Counting


If it's totally anonymous there's nothing stopping someone trolling the statistics.


Disclaimer: I work for Red Hat on Fedora

True but we're already in that boat with the way that we gather statistics from mirror hits. I have a hard time seeing how a method like the one proposed would be any more vulnerable to tampering.

EDIT: spelling


The idea is that it isn't less vulnerable to tampering, but you pay a privacy and public image cost.


This change proposal can be tracked here: https://fedoraproject.org/wiki/Changes/DNF_Better_Counting

In short:

Add a new "countme" variable. This variable will:

- Start as a "true" value,

- Reset to a "false" value the first time the client successfully makes a request to Fedora mirror servers, and

- Be reset to a "true" value after seven days.

This way, rather than filtering by unique IP addresses, we can count only the "true" requests, so we count each machine once — but no more than once.
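
A rough client-side sketch of that flag, as I read the description above. The class and method names are hypothetical, and this is not how DNF actually implements it.

```python
import datetime

WEEK = datetime.timedelta(days=7)


class CountMe:
    """True at first, false after a counted request, true again after seven days."""

    def __init__(self):
        self.last_counted = None  # None means this machine was never counted

    def flag(self, now):
        """Should the next mirror request carry the countme flag?"""
        return self.last_counted is None or now - self.last_counted >= WEEK

    def record_success(self, now):
        """Call after a successful request; resets only if the flag was sent."""
        if self.flag(now):
            self.last_counted = now
```

The client only needs to remember one timestamp, and the server never sees it.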


I'm not sure what they want to count. It definitely isn't users, as they ignore multiple users per system. It seems to be something like "currently active and online machines". But then you should not ignore machines that will not be updated. Maybe they mean "machines that follow the weekly update schedule this week"?

That seems to be what Poettering's approach counts.


Disclaimer: I work for Red Hat on Fedora. Take that for what you will

As far as I know, the desire is to get better numbers on how much the parts of Fedora are being used. There is always more work to do than there are folks to do all of it; having better numbers on how much different bits are being used helps us make better decisions on what to focus on.

Granted, I'm not Matt but I've heard him talk about similar things and have run into the issue myself - "Is anyone even using this? Is it worth putting this level of effort into this particular thing?"

EDIT: Phrasing of the last sentence


But Fedora should remain wary of an over-reliance on telemetry. It's very, very easy to draw the wrong conclusions about things, leading to decisions that reduce the quality of the product.

As an example, there are very likely to be packages that aren't often needed, but are absolutely critical when they are.


Just count the number of bug reports. That seems like a more useful metric anyway. If the users aren't complaining, who cares?

(about 75% serious)


It's not a consistent metric. You'll get both spikes around new releases and changes that reflect the automated reporting/ease of reporting changes.


Also, since Fedora is primarily an integration project, many users report bugs upstream.


Here you go https://retrace.fedoraproject.org/faf/summary/

The same problem arises though as you can't track senders - there's no way of knowing how many reports were produced by a single machine.


What's funny is I just started using Fedora (and I have actually been really enjoying it), but to help me remember that it's not apt or rpm or even yum, I have been thinking to myself "Do Not Follow", for no reason other than that I first learned about it after installing a new machine and configuring Firefox etc. :)


[dead]


What project isn't interested in knowing how large their installed userbase is?


> Lennart Poettering ... did suggest using an application-specific machine ID, like those calculated by sd_id128_get_machine_app_specific().

Yes, I'm sure he did.
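
For anyone unfamiliar with it: as far as I understand the systemd documentation, that function derives a per-application ID by computing HMAC-SHA256 of the application ID, keyed by the machine ID, so the real machine ID is never exposed. A rough Python sketch of the idea follows; it is an approximation for illustration and is not claimed to reproduce libsystemd's exact output.

```python
import hashlib
import hmac
import uuid


def app_specific_id(machine_id, app_id):
    """Derive a per-application ID; the machine ID acts as the HMAC key."""
    digest = hmac.new(machine_id.bytes, app_id.bytes, hashlib.sha256).digest()
    # Truncate to 128 bits and stamp the result as a version 4 UUID.
    return uuid.UUID(bytes=digest[:16], version=4)
```

The same machine and application always produce the same ID, but two applications on one machine get unrelated IDs.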


Please don't use "UUID" for that, it's taken (and useful).


Well, they could just as easily use a "real" UUID [1] variant, and all the concerns of this topic would still remain the same.

[1]: https://en.wikipedia.org/wiki/Universally_unique_identifier


Wonderful. I guess I'll now have to find a way to regenerate this UUID or to spoof it every time Fedora tries to phone home.

If you want to count users, ask for permission during firstboot. If that's too much to ask, then I'll be in the market for a new OS. Maybe I'll finally go back to my first love: FreeBSD.


Read the whole article. They seem to have decided against that and for a simple ‘countme’ flag on update requests to mirrors. Possibly by only a random subset of machines.

No tracking, just simple numeric data fit for purpose.



