> Homebrew's analytics are accessible to Homebrew's current maintainers.
Yeah, I know, my tinfoil hat is quite elaborate.
EDIT: Like I said, my tinfoil hat is quite elaborate; I already have analytics and automated updating turned off. I like to retain some control over what information is reported back to Google or other projects.
Somehow, that doesn't boost my confidence in Google and volunteer-run open source projects to properly respect the privacy of their users.
I wasn't part of the creation of that particular page, but one thing we (the maintainers) commonly find ourselves doing is publicly referencing install statistics as justification for removing an unused formula or taking extra care during a version bump. Having public statistics for the top 1000 most popular packages makes those considerations a little bit more transparent.
Edit: I've created a PR to fix the language: https://github.com/Homebrew/brew/pull/3120
Why is telemetry collected through Google? I suppose that’s because they have some particularly convenient APIs, but please consider that many users are concerned about the amount of information Google has about their lives.
(I know, it’s possible to opt out, but good choices should be the default!)
No-one else provides anywhere near the same scale for free. We have over a million monthly active users and every other solution we tried (including FOSS ones) either fell over at that load or we couldn't find anyone willing to provide hosting for us.
We don't send any personally identifiable information to Google. You are tracked by a randomly generated UUID (which you can regenerate at any time) and we tell Google to not store IP addresses.
> No-one else provides anywhere near the same scale for free.
That's a bit naive. Google is not a charity and provides that service by making a profit out of user data.
> We don't send any personally identifiable information to Google. You are tracked by a randomly generated UUID (which you can regenerate at any time) and we tell Google to not store IP addresses.
Even with no IP, Google can easily cross-reference searches and the random UUID, since a typical use case is that a user installs something through Homebrew after Googling it.
Please rethink this bad default choice. And sorry if my reply sounds harsh, but I think that Google tracking by default is an extremely bad choice for your users.
Yep and that's an acceptable trade-off for the maintainers of Homebrew given that we need analytics to do our (volunteer, free-time) job adequately and we do not have financial resources for other alternatives. If you're willing to provide those financial resources indefinitely: get in touch.
> Even with no IP, Google can easily cross-reference searches and the random UUID, since a typical use case is that a user installs something through Homebrew after Googling it.
That may be technically possible but I see no evidence that it is the case.
> Please rethink this bad default choice. And sorry if my reply sounds harsh, but I think that Google tracking by default is an extremely bad choice for your users.
We disagree. Please consider another macOS package manager. MacPorts is a good alternative.
So what if they make a profit -- they provide a fantastic service that actually works well, and it allows Homebrew (and many other systems) to continue to provide its services for free.
Opt-in telemetry clearly stating that Google will record the first three bytes of your IP? I’m OK with that!
Yes, you do. While you use the "Anonymize IP" option, a packet is still sent from the user's IP. Google's business model includes gathering as much data as possible so it's foolish to think that they are throwing data away in this situation. You may disagree and trust Google to honor the "Anonymize IP" option, but trust is not transitive so you shouldn't ever assume users agree (use opt-in in every situation).
However, claiming you don't send PII to Google makes me wonder if you have actually read the documentation for GA? The "Anonymize IP" (aip=1) option is blatant doublespeak. From their own documentation:
> The IP anonymization feature in Analytics sets the last octet of IPv4 user IP addresses ...
They are only masking out the last 8 bits of the address, which are the least interesting bits. You can still discover the ASN from the remaining data. At worst, all that option does is add a 1-in-256 guess when correlating your analytics data to the rest of Google's tracking data. That is trivial to overcome with Google's massive databases of tracking data.
You even provide a unique additional per-install tracking number that lets Google track users when they move to a different IP address. Once a correlation exists between your analytics data and everything else at Google, your analytics events provide a reliable report that allows other tracking data to be correlated to the new IP address.
Why does that option exist? It's possible that it was designed to mislead developers into sending Google tracking information, but their own documentation suggests a different hypothesis:
> This feature is designed to help site owners comply with their own privacy policies or, in some countries, recommendations from local data protection authorities
This is a feature designed to check boxes on compliance requirements, not to provide any actual anonymity to users.
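To make the 1-in-256 point concrete, here is a minimal sketch of last-octet masking as the quoted documentation describes it. `anonymize_ipv4` is a hypothetical helper written for illustration, not anything Google or Homebrew ships, and the addresses are from a documentation-only range.

```shell
# Hypothetical helper: zero the last octet of an IPv4 address.
anonymize_ipv4() {
  # Drop the final ".NNN" and append ".0".
  printf '%s.0\n' "${1%.*}"
}

# Two different hosts on the same /24 collapse to the same masked value.
anonymize_ipv4 203.0.113.57    # prints 203.0.113.0
anonymize_ipv4 203.0.113.200   # prints 203.0.113.0
```

The first three octets survive untouched, so the network (and usually the ASN) remains visible; only the choice among 256 hosts is hidden.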
> Yes, you do
PII is a term of art which the GP is using in its standard sense and you are not. https://en.wikipedia.org/wiki/Personally_identifiable_inform...
(This is independent of the deontic status of your comment.)
It may well be that some relatively-little-used tool nonetheless has a very significant use. Popularity is one factor within the mix, but only one.
We generally only refer to popularity after a formula comes up on our radar due to failing other sniff tests. To give you an idea for some of them:
1. Multiple subsequent releases without anybody bothering to update the formula
2. Historical problems with the package (flaky builds, complex build systems)
3. Historical problems with the upstream (patches ignored, unwillingness to cooperate with packagers, unreliable download servers)
Since homebrew is aimed mostly at technical types and devs, build tools themselves are probably fairly highly featured, but those are easy things to lose in large general-public releases. (Android's abysmally poor shell tools come to mind.)
Sounds like a pretty good basic set. The flaky builds criterion in particular seems like a strong signal of a low-quality upstream.
This does not, in my mind, help the case at all. This culture of deletion makes no real sense to me; something not being used for a month or a year doesn't mean it's not being used at all. What's the real cost of just leaving those formulae around? A slight problem with it not working immediately when it is used? At least then it can be fixed, instead of seeing the "No formula found" text.
The culture of deletion surrounding the long tail of digital artifacts just doesn't make any sense. Yeah, yeah. Get off my lawn, too.
When unused and unmaintained projects and formulae pile up in Homebrew, we end up spending a tremendous amount of time and effort patching mostly unused software for a very small part of the userbase. That disproportionality hurts the 95% of users who expect timely and well-tested updates to major packages.
We used to provide a "boneyard" tap for unused/unmaintained formulae, but even that led to a lot of requests for support that we simply can't provide. If something is being removed from the core tap, our current recommendation is to put it in a personal tap.
> The culture of deletion surrounding the long tail of digital artifacts just doesn't make any sense.
Keep in mind that the "artifacts" in question are still available, since Homebrew and all Homebrew taps are just Git underneath. You might not be able to build an old formula for compatibility reasons, but all prior work is available for reference.
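For anyone wondering what "available for reference" looks like in practice, here is a rough sketch using plain Git, run from inside a clone of the tap. `recover_deleted` is a hypothetical helper, and the path you pass it depends on how the tap lays out its files; neither is part of Homebrew itself.

```shell
# Hypothetical helper: print the last version of a file that was deleted from the repo.
recover_deleted() {
  path=$1
  # Most recent commit that deleted the path.
  commit=$(git log --diff-filter=D --format=%H -n 1 -- "$path") || return 1
  [ -n "$commit" ] || return 1
  # The file as it existed in the parent of that commit.
  git show "${commit}^:${path}"
}
```

Something like `recover_deleted Formula/example.rb > example.rb` (an illustrative path) would then give you a file to drop into a personal tap. Whether the recovered formula still builds is a separate question.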
IOW, Homebrew seems to be big enough that it needs to start engaging in some PR.
(It's pretty easy to contribute your own formulas, too! The checklist is pretty small.)
That said, I've occasionally run into a broken `brew cask install X`, so I get that maintenance is a thing. But in that situation it seems best to let someone else notice that and contribute a patch rather than remove it entirely. I understand that might eventually lead to a large percentage of broken casks though.
In the best case, the community does pick up the work of patching and maintaining formulae (and casks). However, that's the best case, and low-volume formulae rarely fall into it.
In my experience, formulae tend to get abandoned by the upstream that originally submitted them, causing them to eventually break when the system or surrounding dependencies change. One potential solution is to have a chain of custody for formula maintenance, but that hasn't worked so well for MacPorts.
IMO that's self-inflicted by Homebrew's policies. From https://docs.brew.sh/Acceptable-Formulae.html :
> We frown on authors submitting their own work unless it is very popular.
I'd expect "very popular" software to have relatively large populations of users capable of maintaining the formulae (and authors less likely to bother with packaging), while in lesser-known software, the respective authors are most likely to maintain long-time interest.
You don't see that text but instead text indicating the formula has been deleted.
They will be preserved in the Git history for as long as we continue to use Git.
Both the above comments make it seem like you've not really done any research into your own objections, here.
Anyways. Having done my research:
> You don't see that text but instead text indicating the formula has been deleted.
If, and only if, you know the exact name of the package. If you do a search, you get the "no formula found". If you attempt to use the braumeister.org web site, it also will not show up unless you explicitly craft the recipe URL.
> They will be preserved in the Git history for as long as we continue to use Git.
How does one go back into git history and revive a deleted formula, given how automated the entire process is around ensuring the most recent version of brew is always in use? By your "deleted" help docs, I infer that it comes down to creating your own cask?
The one example I came across quickly is "abi_compliance_checker". It was deleted because it requires GCC 4.7. Yet Homebrew is still quite capable of installing and using gcc@4.7. Not that it was broken in a recent build. Not that it was a major maintenance burden - the updates for years consisted of version updates.
This isn't something I need, but it's a great (and quickly found) example of seemingly arbitrary deletion of an otherwise active project.
brew analytics off
Interesting; I don't see apt, dpkg, nix, or yum maintainers releasing top lists of packages. Even the package providers rarely provide a "top" list, especially one with compile-time flags.
Those flags being provided in this list are very explicit, which could be easily correlated with data provided (or harvested) by other parties.
> a good way to measure the health of the Homebrew ecosystem
What value does this really provide to the general public, other than a, ahem, "member" measuring contest? You already see this occurring in this very thread - the MySQL vs. PostgreSQL comments. I fully expect a "Node vs. X" one as well.
I can't remember whether the default is opt-out or in with the current installer.
It also doesn't send data to a privacy-abusing company like Google.
Actually the data is sent to Google.
# disable homebrew from tracking metrics
# see https://github.com/Homebrew/brew/blob/master/docs/Analytics.md
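The line those two comments introduce isn't shown here. Assuming it was the environment-variable opt-out described in the linked Analytics.md, the snippet for a shell profile would look roughly like this:

```shell
# Assumption: HOMEBREW_NO_ANALYTICS is the opt-out variable documented in Analytics.md;
# it is not quoted anywhere in this thread.
export HOMEBREW_NO_ANALYTICS=1
```

Unlike `brew analytics off`, which changes Homebrew's own setting, this only applies to shells that actually source the profile.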
If you don't trust us to respect your privacy why are you trusting us to download and run code from the internet on your machine?
We need analytics to effectively run the Homebrew project (as you've noted: we are volunteers). If you no longer trust our project: go use something else.
I can't on yours (or Google's).
EDIT: I'm sorry, was there something factually wrong with this statement to warrant downvoting it into oblivion?
Similar data could probably be compiled from Google Trends, but this clean and authoritative view is something that can be trusted, and I think the result makes the open-source ecosystem stronger.
I'll happily tell you my toolset, but I don't use Homebrew and I wouldn't allow it to send my usage to Google even if I did.
Just because someone on the internet had a particular setup doesn't mean I want to follow it. Or that I have time to track down several people's opinions.
Getting install stats directly from the Homebrew project, which I know because I use it, is infinitely more useful to me and much more easily discoverable. That's just my opinion though and you're entitled to your own.
That also gives you more context, because it's answering your actual question, rather than trying to answer your own question with a bunch of vaguely related data.
It also handles the dependency issue. Someone asked why imagemagick is so popular, but it's probably actually just a dependency for language-level bindings (e.g. php-imagick), not that people are using `convert` or `identify` directly on the CLI.
Heck, consider the case of front-end developers who have a toolset that depends on nodejs. They may never write any server side code, but if they follow recent trends they probably need nodejs for their css/js "toolchain" - the stats don't tell you that though. They just tell you that nodejs is installed a lot.
This list was added specifically because a bunch of people expressed concerns about this data not being public. You can please some of the people all of the time, etc.