
Homebrew's use of analytics still bothers me; specifically, making data public like this. It was supposed to only be used for development efforts, not showing off top-1000 lists. Also, I guess this following statement is - taking the charitable option - out of date.

> Homebrew's analytics are accessible to Homebrew's current maintainers.

https://github.com/Homebrew/brew/blob/master/docs/Analytics....

Yeah, I know, my tinfoil hat is quite elaborate.

EDIT: Like I said, my tinfoil hat is quite elaborate; I already have analytics and automated updating turned off. I like to retain some control over what information is reported back to Google or other projects.

That doesn't somehow boost my confidence in Google and volunteer-run open source projects to properly respect the privacy of their users.




Homebrew maintainer here. That language could probably be more precise - only current maintainers have access to detailed analytics (the details being specified on that same page).

I wasn't part of the creation of that particular page, but one thing we (the maintainers) commonly find ourselves doing is publicly referencing install statistics as justification for removing an unused formula or taking extra care during a version bump. Having public statistics for the top 1000 most popular packages makes those considerations a little bit more transparent.

Edit: I've created a PR to fix the language: https://github.com/Homebrew/brew/pull/3120


Mildly unrelated question.

Why is telemetry collected through Google? I suppose that’s because they have some particularly convenient APIs; however, please consider that many users are concerned about the amount of information Google has about their lives.

(I know, it’s possible to opt out, but good choices should be the default!)


> Why is telemetry collected through Google?

No-one else provides anywhere near the same scale for free. We have over a million monthly active users and every other solution we tried (including FOSS ones) either fell over at that load or we couldn't find anyone willing to provide hosting for us.

We don't send any personally identifiable information to Google. You are tracked by a randomly generated UUID (which you can regenerate at any time) and we tell Google to not store IP addresses.


First of all, let me say I appreciate your response and your work on Homebrew, a great piece of software, and quite a blessing for Mac users.

> No-one else provides anywhere near the same scale for free.

That's a bit naive. Google is not a charity and provides that service by making a profit out of user data.

> We don't send any personally identifiable information to Google. You are tracked by a randomly generated UUID (which you can regenerate at any time) and we tell Google to not store IP addresses.

Even with no IP, Google can easily cross-reference searches and the random UUID, since a typical use case is that a user installs something through Homebrew after Googling it.

Please rethink this bad default choice. And sorry if my reply sounds harsh, but I think that Google tracking by default is an extremely bad choice for your users.


> That's a bit naive. Google is not a charity and provides that service by making a profit out of user data.

Yep and that's an acceptable trade-off for the maintainers of Homebrew given that we need analytics to do our (volunteer, free-time) job adequately and we do not have financial resources for other alternatives. If you're willing to provide those financial resources indefinitely: get in touch.

> Even with no IP, Google can easily cross-reference searches and the random UUID, since a typical use case is that a user installs something through Homebrew after Googling it.

That may be technically possible but I see no evidence that it is the case.

> Please rethink this bad default choice. And sorry if my reply sounds harsh, but I think that Google tracking by default is an extremely bad choice for your users.

We disagree. Please consider another macOS package manager. MacPorts is a good alternative.


I also use Google Analytics for tracking usage in my desktop applications (opt out) and it's great. I think most just think about web stats (page views) but I just use the event system. This allows me to identify unused features or possible confusing UX. I also started using it in a project at work and the data has been extremely useful.


> That's a bit naive. Google is not a charity and provides that service by making a profit out of user data.

So what if they make a profit -- they provide a fantastic service that actually works well, and it allows Homebrew (and many other systems) to continue to provide its services for free.


If you are fine with a company profiting from your personal information and habits, I am not, and I think it is reasonable to assume that other people are not fine with that either.


Then those people won't use Homebrew.


Sure! But the telemetry is a default setting, and the implications for the user’s privacy are not clear at all.

Opt-in telemetry clearly stating that Google will record the first three bytes of your IP? I’m OK with that!


> We don't send any personally identifiable information to Google.

Yes, you do. While you use[1] the "Anonymize IP" option, a packet is still sent from the user's IP. Google's business model includes gathering as much data as possible so it's foolish to think that they are throwing data away in this situation. You may disagree and trust Google to honor the "Anonymize IP" option, but trust is not transitive so you shouldn't ever assume users agree (use opt-in in every situation).

However, claiming you don't send PII to Google makes me wonder if you have actually read the documentation for GA? The "Anonymize IP" (aip=1) option is blatant doublespeak. From their own documentation[2]:

> The IP anonymization feature in Analytics sets the last octet of IPv4 user IP addresses ...

They are only masking out the last 8 bits of the address, which are the least interesting bits. You can still discover the ASN from the remaining data. At worst, all that option did is add a 1-in-256 guess when correlating your analytics data to the rest of Google's tracking data. That is trivial to overcome with Google's massive databases of tracking data.
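To make the last-octet point concrete, here is a minimal Python sketch. It assumes the masking works as the quoted documentation describes (the last octet of the IPv4 address is set to zero); the function names and the example address (from a reserved documentation range) are illustrative only, not anything taken from Google or Homebrew.

```python
import ipaddress


def anonymize_ipv4(address: str) -> str:
    """Zero the last octet of an IPv4 address (assumed masking behaviour)."""
    # strict=False lets us pass a host address and get its enclosing /24.
    network = ipaddress.ip_network(f"{address}/24", strict=False)
    return str(network.network_address)


def candidate_count(masked: str) -> int:
    """Number of distinct addresses that collapse to the same masked value."""
    return ipaddress.ip_network(f"{masked}/24").num_addresses


masked = anonymize_ipv4("203.0.113.77")
print(masked)                   # the /24 prefix survives intact
print(candidate_count(masked))  # how many addresses share that prefix
```

The first three octets, which are enough to look up the network operator, are unchanged; only the position within a block of 256 addresses is hidden.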

You even provide a unique additional per-install tracking number that lets Google track users when they move to a different IP address. Once a correlation exists between your analytics data and everything else at Google, your analytics events provide a reliable report that allows other tracking data to be correlated to the new IP address.

Why does that option exist? It's possible that it was designed to mislead developers into sending Google tracking information, but their own documentation[2] suggests a different hypothesis:

> This feature is designed to help site owners comply with their own privacy policies or, in some countries, recommendations from local data protection authorities

This is a feature designed to check boxes on compliance requirements, not to provide any actual anonymity to users.

[1] https://github.com/Homebrew/brew/blob/fd4fe3b80cab9902437016...

[2] https://support.google.com/analytics/answer/2763052?hl=en


>> We don't send any personally identifiable information to Google.

> Yes, you do

PII is a term of art which the GP is using in its standard sense and you are not. https://en.wikipedia.org/wiki/Personally_identifiable_inform...

(This is independent of the deontic status of your comment.)


Redefining a phrase to omit much of its common meaning is what got us here. I appreciate you helping to bridge the gap by translating; I am just bemoaning its necessity.


Beware the Tyranny of the Minimum Viable User.

It may well be that some relatively little-used tool nonetheless has a very significant use. Popularity is one factor within the mix, but only one.

https://redd.it/69wk8y


You're right, which is why popularity isn't the sole factor in determining whether to remove a formula from the core repository. It's just one of the things we reference to justify our choice, when relevant.

We generally only refer to popularity after a formula comes up on our radar due to failing other sniff tests. To give you an idea for some of them:

1. Multiple subsequent releases without anybody bothering to update the formula

2. Historical problems with the package (flaky builds, complex build systems)

3. Historical problems with the upstream (patches ignored, unwillingness to cooperate with packagers, unreliable download servers)


Criticality in workflow might also be a consideration. I'd have to check how Homebrew does this, but Debian frequently has separate documentation and debug packages as supplements to core functionality. Those might be essential in some cases.

Since homebrew is aimed mostly at technical types and devs, build tools themselves are probably fairly highly featured, but those are easy things to lose in large general-public releases. (Android's abysmally poor shell tools come to mind.)

Sounds like a pretty good basic set. The flaky builds criterion in particular seems like a strong signal of a low-quality upstream.


> as justification for removing an unused formula

This does not, in my mind, help the case at all. This culture of deletion makes no real sense to me; something not being used for a month or a year doesn't mean it's not being used at all. What's the real cost of just leaving those formulae around? A slight problem with it not working immediately when it is used? At least then it can be fixed, instead of seeing the "No formula found" text.

The culture of deletion surrounding the long tail of digital artifacts just doesn't make any sense. Yeah, yeah. Get off my lawn, too.


> What's the real cost of just leaving those formulae around?

Maintenance.

When unused and unmaintained projects and formulae pile up in Homebrew, we end up spending a tremendous amount of time and effort patching mostly unused software for a very small part of the userbase. That disproportionality hurts the 95% of users who expect timely and well-tested updates to major packages.

We used to provide a "boneyard" tap for unused/unmaintained formulae, but even that led to a lot of requests for support that we simply can't provide. If something is being removed from the core tap, our current recommendation is to put it in a personal tap[1].

> The culture of deletion surrounding the long tail of digital artifacts just doesn't make any sense.

Keep in mind that the "artifacts" in question are still available, since Homebrew and all Homebrew taps are just Git underneath. You might not be able to build an old formula for compatibility reasons, but all prior work is available for reference.

[1]: https://github.com/Homebrew/brew/blob/master/docs/How-to-Cre...


Somewhat of a tangent, but a lot of these reasons seem to boil down to "we don't have the resources". The Homebrew project does appear to have a Patreon account, but to find it you have to go to the GitHub repo and scroll to the bottom of the readme. It might help to promote that a bit more on the homepage. It might also help to engage a bit more with the community other than "here's a new release".

IOW, Homebrew seems to be big enough that it needs to start engaging in some PR.


Maintenance burden. Both that and the fact that git/GitHub really doesn't like huge dirs.


I've been delighted to install casks using `brew cask install X` where X is some obscure program that I thought brew cask would never have bothered adding. Stuff like that dazzled me about brew cask, so I hope it never goes away even under the guise of maintenance burden.

(It's pretty easy to contribute your own formulas, too! The checklist is pretty small.)

That said, I've occasionally run into a broken `brew cask install X`, so I get that maintenance is a thing. But in that situation it seems best to let someone else notice that and contribute a patch rather than remove it entirely. I understand that might eventually lead to a large percentage of broken casks though.


As a nit, `brew install` and `brew cask install` use largely different codebases ;)

In the best case, the community does pick up the work of patching and maintaining formulae (and casks). However, that's the best case, and low-volume formulae rarely fall into it.

In my experience, formulae tend to get abandoned by the upstream that originally submitted them, causing them to eventually break when the system or surrounding dependencies change. One potential solution is to have a chain of custody for formula maintenance, but that hasn't worked so well for MacPorts.


> the community does pick up the work of patching and maintaining formulae (and casks). However, that's the best case, and low-volume formulae rarely fall into it.

IMO that's self-inflicted by Homebrew's policies. From https://docs.brew.sh/Acceptable-Formulae.html :

> We frown on authors submitting their own work unless it is very popular.

I'd expect "very popular" software to have relatively large populations of users capable of maintaining the formulae (and authors less likely to bother with packaging), while in lesser-known software, the respective authors are most likely to maintain long-time interest.


> At least then it can be fixed, instead of seeing the "No formula found" text.

You don't see that text but instead text indicating the formula has been deleted.

> The culture of deletion surrounding the long tail of digital artifacts just doesn't make any sense.

They will be preserved in the Git history for as long as we continue to use Git.

Both the above comments make it seem like you've not really done any research into your own objections, here.


The best defense is a good offense. I express concern about your project's use of Google Analytics and then releasing that data to the public, then about the culture of deletion; and I am criticized for my "lack of research". Sorry I touched a nerve, but all of these issues are decisions that are being made; to not expect critiques of your decisions is naive.

Anyways. Having done my research:

> You don't see that text but instead text indicating the formula has been deleted.

If, and only if, you know the exact name of the package. If you do a search, you get the "no formula found". If you attempt to use the braumeister.org web site, it also will not show up unless you explicitly craft the recipe URL.

> They will be preserved in the Git history for as long as we continue to use Git.

How does one go back into git history and revive a deleted formula, given how automated the entire process is around ensuring the most recent version of brew is always in use? By your "deleted" help docs, I infer that it comes down to creating your own cask?

The one example I came across quickly is "abi_compliance_checker". It was deleted because it requires GCC 4.7. Yet Homebrew is still quite capable of installing and using gcc@4.7. Not that it was broken in a recent build. Not that it was a major maintenance burden - the updates for years consisted of version updates.

This isn't something I need, but it's a great (and quickly found) example of seemingly arbitrary deletion of an otherwise active project.


Well, I happen to agree with your sentiments. I've been meaning to make a fork of Homebrew that doesn't have these problems. You should check out tigerbrew.


Reminder, you can opt out of this with:

    brew analytics off

You can check the current status with:

    brew analytics


I'm curious why releasing data in this fashion is troubling? I think this kind of public data sharing in an open-source project is good, since it builds transparency. A top-N list seems like a good way to measure the health of the Homebrew ecosystem, similar to other package managers. Without any other identifiable information like email addresses, it seems near impossible to de-anonymize statistical usage from this set.


> similar to other package managers

Interesting; I don't see apt, dpkg, nix, or yum maintainers releasing top lists of packages. Even the package providers rarely provide a "top" list, especially one with compile-time flags.

Those flags being provided in this list are very explicit, which could be easily correlated with data provided (or harvested) by other parties.

> a good way to measure the health of the Homebrew ecosystem

What value does this really provide to the general public, other than a, ahem, "member" measuring contest? You already see this occurring in this very thread - the MySQL vs. PostgreSQL comments. I fully expect a "Node vs. X" one as well.


Debian publishes statistics gathered by the optional "popularity-contest" package at http://popcon.debian.org

I can't remember whether the default is opt-out or in with the current installer.


It prompts if you want to install it.

It also doesn't send data to a privacy-abusing company like Google.


> It also doesn't send data to a privacy-abusing company like Google.

Actually the data is sent to Google.


Helps to find packages which I might have missed from other sources. LWN is good, but doesn't cover everything.


Let me lend you this handy piece of code, straight from my production codebase, written the day this feature was released.

    # disable homebrew from tracking metrics
    # see https://github.com/Homebrew/brew/blob/master/docs/Analytics.md
    export HOMEBREW_NO_ANALYTICS=1


You're running Homebrew (and presumably OSX Server, if it still is a thing) in production??


> volunteer-run open source projects to properly respect the privacy of their users

If you don't trust us to respect your privacy why are you trusting us to download and run code from the internet on your machine?

We need analytics to effectively run the Homebrew project (as you've noted: we are volunteers). If you no longer trust our project: go use something else.


In short, I can verify what's being done on my computer. I can inspect the formulas and verify that what is claimed to be executed is what is actually being executed.

I can't on yours (or Google's).

EDIT: I'm sorry, was there something factually wrong with this statement to warrant downvoting it into oblivion?


You can trivially verify all the information that's sent by us to Google; we have an environment variable to do exactly that.


There's significant value, when developing a product, in knowing which other products your customers are likely to have, so that you can prioritise your different integrations.

Similar data could probably be compiled from Google Trends, but this clean and authoritative view is something that can be trusted, and I think the result makes the open-source ecosystem stronger.


That only sets up the value proposition for the maintainers - not for making it public. What value do any of us, not on the Homebrew maintenance team, derive from this list (other than fuel for "X is better than Y" arguments)?


It informs me of software I didn't know about and yet is popular among presumably mostly developers and power users. I need to know about that.


There are literally dozens of "how I setup my machine/tools I use" lists and even setup scripts on GitHub. If you "need" to know what others are doing, refer to those. Or ask.

I'll happily tell you my toolset, but I don't use Homebrew and I wouldn't allow it to send my usage to Google even if I did.


Thanks for being so open to sharing. Do you have a blog? How do I find your blog? Before you posted this comment, how would I even know to look you up?

Just because someone on the internet had a particular setup doesn't mean I want to follow it. Or that I have time to track down several people's opinions.

Getting install stats directly from the Homebrew project, which I know because I use it, is infinitely more useful to me and much more easily discoverable. That's just my opinion though, and you're entitled to your own.


They have a section on this very website, called "Ask HN". It's not uncommon for people to ask for opinions/input on tooling.

That also gives you more context, because it's answering your actual question, rather than trying to answer your own question with a bunch of vaguely related data.

It also handles the dependency issue. Someone asked why imagemagick is so popular, but it's probably actually just a dependency for language-level bindings (e.g. php-imagick), not that people are using `convert` or `identify` directly on the CLI.

Heck, consider the case of front-end developers who have a toolset that depends on nodejs. They may never write any server side code, but if they follow recent trends they probably need nodejs for their css/js "toolchain" - the stats don't tell you that though. They just tell you that nodejs is installed a lot.
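A rough Python sketch of the dependency point, using entirely made-up data: the dependency map, package names like `css-toolchain`, and the request list are hypothetical and do not reflect how Homebrew actually records installs. It only shows how a raw install count can diverge from the number of times a package was explicitly asked for.

```python
from collections import Counter

# Hypothetical map: package -> packages it pulls in as direct dependencies.
DEPENDS_ON = {
    "php-imagick": ["imagemagick"],
    "css-toolchain": ["node"],
}

# Hypothetical list of what users explicitly asked to install.
requested = ["php-imagick", "php-imagick", "css-toolchain", "imagemagick"]


def count_installs(requests):
    """Return (raw, on_request) counters; raw also counts dependency installs."""
    raw, on_request = Counter(), Counter()
    for name in requests:
        on_request[name] += 1
        raw[name] += 1
        # Only direct dependencies are followed in this sketch.
        for dep in DEPENDS_ON.get(name, []):
            raw[dep] += 1
    return raw, on_request


raw, on_request = count_installs(requested)
for package in ("imagemagick", "node"):
    print(package, "raw:", raw[package], "on request:", on_request[package])
```

With this toy data, imagemagick and node both look more popular in the raw counts than they are as deliberate choices, which is the gap a plain top-N list cannot show.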


> What value do any of us, not on the Homebrew maintenance team, derive from this list

This list was added specifically because a bunch of people expressed concerns about this data not being public. You can please some of the people all of the time, etc.


It's interesting.


There is always macports: https://www.macports.org/



