HomeBrew Analytics – top 1000 packages installed over last year (brew.sh)
318 points by sairamkunala on Sept 3, 2017 | 165 comments



I'm surprised that ripgrep is so low, at #227.

I've been using it instead of grep the last few months and I could never go back. Check it out if you haven't! Here is the repo and a technical breakdown by the author:

https://github.com/BurntSushi/ripgrep

http://blog.burntsushi.net/ripgrep/


Probably in part because the_silver_searcher ('ag'), which is a very similar tool, is at #65.

It's the one I personally use simply because I discovered it before ripgrep came out. Plus the 'ag' commandline is super easy.


ripgrep's command line is `rg`, and it's much faster. When I first installed it I aliased `ag` to it to ease the transition.


How often do you handle files large enough to observe a difference between grep, ack, ag and rg?

I'm willing to bet (and happy to lose) that most people, even in the subset who use grep "a lot" (defining "a lot"...), wouldn't see a significant improvement. There are people (I'm betting fewer) who need speed above all other concerns, and those people already make it to the top 1000.


The popularity of ripgrep, ag, ack, etc., is an object lesson in "defaults matter." I don't say this in the prescriptive sense, i.e., "hey, you, you should care about defaults!", but rather, in the descriptive sense, i.e., "there are a lot of people out there that care about the defaults." The second lesson to learn is that people care about the difference between "results are instant" and "there is a bit of noticeable lag." I don't personally care all that much, but other people do. (AIUI, some people use ripgrep in their fuzzy searchers, and maybe "instant" matters there. A lot of engineering went into ripgrep to make its directory traversal fast.)

Before I wrote ripgrep, I was a grep user. I hadn't migrated to ack or the silver searcher because I didn't see the need. (Sound familiar? :P) In my $HOME/bin, I had grepf:

    #!/bin/sh
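    # Find files whose path matches the glob in $1, then grep them
    # for the pattern in $2 (line numbers, file names, skip binaries).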
    
    find ./ -type f -wholename "$1" -print0 | xargs -0 grep -nHI "$2"
and grepfr

    #!/bin/bash
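    # Recursive fixed-string search: first arg is the pattern;
    # any remaining args (paths, extra flags) are passed through to grep.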
    
    first=$1
    shift
    grep -nrHIF "$first" "$@"
And that was pretty much all I ever needed. If ack had never come along, I'm not sure I ever would have changed. The tools I had were good enough.

ripgrep didn't begin life as something that I intended to release as its own project. It began life as a way to test the performance of Rust's regex engine under workloads similar to those handled by the regex engine in GNU grep. In other words, it was a benchmark that I used. (In fact, I used it quite a bit to reduce per-match overhead inside the regex engine. The second commit in ripgrep says, "beating 'grep -E' on some things.") I didn't really start to convert it to a tool that other people could use until I realized that it was actually as fast as---or faster than---GNU grep. That, plus I was bored in an airport. :-)

A lot of people are happy with their tools that are good enough. I know I was. Has my life been dramatically changed by using ripgrep? No, not really. But I do like using it over my previous tools. It's a minor quality of life thing. It turns out, a lot of people care about minor quality of life things!

But yeah, I hear roughly the same sentiments that you say from a lot of people. All it really comes down to is different strokes and different common workloads that magnify the improvements in the tool.


Have you considered the problem of printing search results to the terminal? I saw it detailed a little bit in your blog, but one of the things that bothers me is: say I am searching for the string "foo" in a 2GB log file. There are the usual number of matches, nothing unusual.

But typically, I am not really looking for the string "foo". I guess most users who are grepping log files are also looking for strings/text that appear slightly before and slightly after the match. This mostly happens when I am searching for an error/exception that triggered "foo". I find the usability of grep frustrating when I need to search around something. It usually means I have to restart the search with `grep -C` or something like that, and even then the number of context lines I specified may not be enough.


Thoughts have crossed my mind, but it's a wickedly hard problem. My personal opinion is that once you start trying to solve the problem you're describing, you really start to venture away from "line oriented searcher" to "code aware searcher" in a way that invites a lot of trade-offs. The most important trade-off is probably maintenance or complexity of the code.

In particular, in order to better show results, I kind of feel like the search tool needs to know something about what it's showing. Right? How else do you intelligently pick the context window for each search result? For code, maybe it's the surrounding function, for example.

The grep -C crutch (which is also available in ripgrep) is kind of the best I've got for the moment for a strictly line oriented searcher. `git grep` has some interesting bits in it that will actually try to look for the enclosing function and emit that as context. I think it's the `-p/--show-function` flag. ... But that doesn't really help with your log files.

In any case, I am very interested in this path and even have an issue on the ripgrep tracker for it: https://github.com/BurntSushi/ripgrep/issues/95 --- I'm not sure if it really belongs in ripgrep proper, but I would really love to collect user stories. If you have any, examples of what you'd like the search results to look like would be great!


I use -C 20 combined with a pager.

rg foo -C 20 -p | less -R
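
(ripgrep's -p/--pretty flag keeps colors and headings even when the output is piped; less -R then renders those escape codes as colors instead of printing them literally.)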

Also, yuck, that command line. Glad I have those hidden behind shell scripts.


Hehe, yeah, I have `rgp` in my $HOME/bin:

    #!/bin/sh
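    # -p forces ripgrep's pretty output into the pipe; less renders
    # the colors (-R), quits if it fits one screen (-F), and leaves
    # the output on screen at exit (-X).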
    
    exec rg -p "$@" | less -RFX


First, thank you for your reply and your work.

Second, I include myself among the users of ripgrep (and the silver searcher before it); I also dislike the slight waiting time when I have a better alternative.

Third, I'd like one of those to be a default package in Debian.


:-)

> Third, I'd like one of those to be a default package in Debian.

Yeah, I'd love that too! I know there have been people pushing on this, but AFAIK, it's stalled on "how do we package Rust applications in Debian."

(I don't use Debian and I'm not terribly familiar with their policies, so I'm not really familiar with the details.)


Rustc and Cargo are packaged for Debian; both are Rust applications. That shouldn't be the holdup.


Oh, I thought Cargo hadn't made it into Debian. I'm way behind the times then. :-)


It's not in stable, but it is in sid and buster. Now that rustc builds with Cargo, you gotta get both. :)

https://crates.io/crates/debcargo is also a big help.


> How often do you handle files large enough to observe a difference between grep, ack, ag and rg ?

> I'm willing to bet that most people, even in the subset who use grep "a lot" (defining "a lot"...), wouldn't see a significant improvement.

Daily, but not for the reasons that I think you're thinking. I work in Python a lot, which means there is typically a virtualenv in the tree somewhere, sometimes more than one. Typically, I want to search the code base itself — not a virtualenv, not the .git directory, etc. ripgrep, by default, will ignore entries in the .gitignore (and the virtualenvs are listed there, as they're not source, and cannot be committed), and repository directories like .git, and will thus not even consider those files. For my use case (searching my own code base), this is exactly what I want, and culling out those entire subtrees makes rg considerably faster than grep.

Yes — I could exclude those directories with grep by passing the appropriate flag. But it's time consuming to do so: ripgrep wins out by doing — by default — exactly what I need.
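
To make that concrete, the two invocations look roughly like this for me (the directory names are just examples from my setup):

    # grep: cull the noise by hand, every time
    grep -rn --exclude-dir=.git --exclude-dir=.venv foo .

    # ripgrep: skips .git and everything in .gitignore (including
    # the virtualenv) by default
    rg foo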

I also greatly prefer ripgrep's output format; the layout and colors make it much easier to scan than grep's.

Most of the people I've recommended ripgrep to are using grep and passing flags to it to get it to do what rg essentially does faster and/or by default. Ripgrep is an excellent tool.

(I used `git grep`, which is also considerably faster for similar reasons, prior to rg. But `git grep` requires a repository — for obvious reasons — and thus fails in cases where you're not in one. I often need to search several codebases when doing cross repository refactors, and ripgrep has been quite useful there.)


Possibly redundant information, but still: ag has those same features. I see lots of reasons to choose rg/ag over grep, but none yet to choose rg over ag.


ripgrep's gitignore support is more complete. ripgrep also supports seamless UTF-16 search, and can be made to search files of any number of other encodings as well.

But yes, the feature sets are very similar.


Well, the main one would be speed, and possibly even stability (Rust vs. C), but if you're not after those, there's little reason to choose rg over ag. On the other hand, speed is also the primary reason to go with ag over ack, so the question is: why not go with the fastest alternative?


Ripgrep’s antipitch [0] lists some reasons to prefer ag over rg.

[0]: http://blog.burntsushi.net/ripgrep/#anti-pitch


I intended to mention this in my original post, but I appear to have forgotten: I'm only comparing rg/grep; I have no experience with ag, so I can't speak to it. rg was my first "better than grep" tool, and it's filled my needs quite well. (Enough so that I've not felt the need to investigate ag.)


Every time I search in VS Code, which now uses rg by default.

You'll easily notice the difference on any recursive search. grep is really slow.


Depends on the size of the codebase you're grepping. I routinely deal with larger ones that I have to dissect and the difference is more than noticeable.


ripgrep wasn't released until late September last year or so, so it's missing a couple months of events based on that. Plus, it needed time to become popular. :-)

Thanks for the kind words!


Wow, rg looks fantastic! Will try it today I think...

http://blog.burntsushi.net/ripgrep/


I imagine a lot of people install it via cargo.

It’s also only very recently that the crown has passed from ag (the silver searcher) to rg.


I hate that ripgrep won't let you specify an arbitrary list of filename extensions to search. You have to do some voodoo to get it to only search .foo files. With ag it's as simple as -G foo.

It's better than ag in a lot of ways, but there are little pain points like that which make me shift back and forth between tools.


It seems that ripgrep looks at .gitignore files. You could write such a file to exclude glob-patterns.


You can't accomplish that with the -g glob flag?


  $ rg -g l foo
  No files were searched, which means ripgrep probably
  applied a filter you didn't expect. Try running again
  with --debug.
Empirically not! :)

EDIT: I see, you have to do "rg -g '*.l' foo". Well, that's a bit silly. Why force people to put asterisks inside of single quotes on the command line? Asterisks have a specific meaning in a shell setting. It's five times longer than -G l, the ag equivalent.

EDIT: Thanks for all the explanations.


`ag -G l` and `rg -g '*.l'` are not equivalent. The former will match any file name that contains `l`, whereas the latter will match any file name that ends with `.l`. (ag's -G flag accepts a regex with match-anywhere semantics, whereas rg's -g flag accepts a glob that matches on the basename of a file, e.g., `rg -g Makefile` will search all files named `Makefile`.)

The asterisk is part of standard globbing. You can also write it as `rg -g \*.l foo`, if you find that nicer.

If you want to match a list of extensions, then you can fall back to standard glob syntax: `rg -g '*.{foo,bar,baz}' pattern`. Or, as others have mentioned, if you're searching for standard file types, you can use the `-t/--type` flag. e.g., to search HTML, CSS and Javascript: `rg -tjs -thtml -tcss foo`.

Basically, ripgrep's `-g` flag is supposed to match grep's `--include` flag, which also uses globs and requires the same type of ceremony. I'd like to add --include/--exclude to match grep's behavior more precisely (based on user complaints asking for those flags).


> I see, you have to do "rg -g '*.l' foo"

Actually it's just:

  rg -g '*.js' query
And if it's a known file type, like js, you can use the -t flag:

  rg -t js query
> Why force people to put asterisks inside of single quotes on the command line?

Because else the shell will auto-expand the asterisk before it even gets to rg, and rg will instead get the expanded list of files that match the pattern. E.g. if you have

  a.js b.js /foo
in a folder, then:

  rg -g *.js query
will be expanded by your shell to:

  rg -g a.js b.js query
and THEN run. rg will never see the asterisk in that case (and it also won't search inside /foo).


This is because of the shell, not the executable. The shell will translate `*` into a relevant list of files before passing them to the executable.


If you're using bash you don't need single quotes, as long as no file in the current directory happens to match the pattern (an unmatched glob is passed through literally):

  rg -g*.l foo


`git grep` works just fine and it's also pretty quick. i'm sure it isn't 100% the same use-case, but most of the time i'm looking inside a repo anyway, and for the other times i can just go get a coffee


The blog post linked in the GP contains benchmarks, including `git grep`. TL;DR is that for simple queries, `git grep` is going to perform about as well as ripgrep. For more complex queries, ripgrep can be faster.

If you're on Windows, ripgrep will also (seamlessly) search UTF-16.


i'm definitely not anti-ripgrep, but OP did say "I'm surprised that ripgrep is so low" - i'm not surprised at all, just trying to point out why.

it's just yet another tool which i'd have to learn, and yet another tool that isn't going to be installed on the remote machine.

is the productivity win worth installing it, let alone learning it compared to going from `grep` to `git grep`? for me probably not, diminishing returns. doesn't mean it isn't great software!

(also, for the sake of stats, homebrew is macOS only, although i appreciate the info)


Is it better than ack-grep?

I've been using ack-grep for years, seems fine.


It's a lot faster. See http://blog.burntsushi.net/ripgrep.

> Notably absent from this list is ack. We don’t benchmark it here because it is outrageously slow. Even on the simplest benchmark (a literal in the Linux kernel repository), ack is around two orders of magnitude slower than ripgrep. It’s just not worth it.


Speed is probably the main difference. I can't find direct comparisons, but ag is touted as faster than ack, and ripgrep as faster than ag.


The README contains a quick breakdown: https://github.com/BurntSushi/ripgrep#quick-examples-compari...

My blog post linked elsewhere goes into more detail, although I left out ack because it was too slow.


I forgot that I even had it on my system!


This information is similar to debian's popcon: http://popcon.debian.org/, but one advantage popcon has is that it tries to measure actual usage of a tool even after it's installed. This avoids over-counting people who install something and then never or rarely use it. Of course, that's more of a problem with Linux distributions (which tend to install a kitchen sink worth of stuff) than with homebrew (where people probably install a much smaller subset). In any case, it would probably be quite easy for homebrew to collect the same statistic (basically just look at the atimes of installed binaries).
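
A rough sketch of the atime idea on macOS (BSD stat prints timestamps as epoch seconds by default; illustrative only):

    # list brew-installed binaries by most recent access time
    stat -f '%a %N' "$(brew --prefix)"/bin/* | sort -rn | head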

[Disclosure: I'm the original author of popcon so I'm biased :)]


Homebrew's use of analytics still bothers me; specifically, making data public like this. It was supposed to be used only for development efforts, not for showing off top-1000 lists. Also, I guess the following statement is (taking the charitable option) out of date.

> Homebrew's analytics are accessible to Homebrew's current maintainers.

https://github.com/Homebrew/brew/blob/master/docs/Analytics....

Yeah, I know, my tinfoil hat is quite elaborate.

EDIT: Like I said, my tinfoil hat is quite elaborate; I already have analytics and automated updating turned off. I like to retain some control over what information is reported back to Google or other projects.

That doesn't somehow boost my confidence in Google and volunteer-run open source projects to properly respect the privacy of their users.


Homebrew maintainer here. That language could probably be more precise - only current maintainers have access to detailed analytics (the details being specified on that same page).

I wasn't part of the creation of that particular page, but one thing we (the maintainers) commonly find ourselves doing is publicly referencing install statistics as justification for removing an unused formula or taking extra care during a version bump. Having public statistics for the top 1000 most popular packages makes those considerations a little bit more transparent.

Edit: I've created a PR to fix the language: https://github.com/Homebrew/brew/pull/3120


Mildly unrelated question.

Why is telemetry collected through Google? I suppose that's because they have some particularly convenient APIs; however, please consider that many users are concerned by the amount of information Google has about their lives.

(I know, it’s possible to opt out, but good choices should be the default!)


> Why is telemetry collected through Google?

No-one else provides anywhere near the same scale for free. We have over a million monthly active users and every other solution we tried (including FOSS ones) either fell over at that load or we couldn't find anyone willing to provide hosting for us.

We don't send any personally identifiable information to Google. You are tracked by a randomly generated UUID (which you can regenerate at any time) and we tell Google to not store IP addresses.


First of all, let me say I appreciate your response and your work on Homebrew, a great piece of software, and quite a blessing for Mac users.

> No-one else provides anywhere near the same scale for free.

That's a bit naive. Google is not a charity and provides that service by making a profit out of user data.

> We don't send any personally identifiable information to Google. You are tracked by a randomly generated UUID (which you can regenerate at any time) and we tell Google to not store IP addresses.

Even with no IP, Google can easily cross-reference searches and the random UUID, since a typical use case is that a user installs something through Homebrew after Googling it.

Please rethink this bad default choice. And sorry if my reply sounds harsh, but I think that Google tracking by default is an extremely bad choice for your users.


> That's a bit naive. Google is not a charity and provides that service by making a profit out of user data.

Yep and that's an acceptable trade-off for the maintainers of Homebrew given that we need analytics to do our (volunteer, free-time) job adequately and we do not have financial resources for other alternatives. If you're willing to provide those financial resources indefinitely: get in touch.

> Even with no IP, Google can easily cross-references searches and the random UUID, since a typical use case is that a user installs something through Homebrew after Googling it.

That may be technically possible but I see no evidence that it is the case.

> Please rethink this bad default choice. And sorry if my reply sounds harsh, but I think that Google tracking by default is an extremely bad choice for your users.

We disagree. Please consider another macOS package manager. MacPorts is a good alternative.


I also use Google Analytics for tracking usage in my desktop applications (opt-out) and it's great. I think most people just think about web stats (page views), but I just use the event system. This allows me to identify unused features or possibly confusing UX. I also started using it in a project at work and the data has been extremely useful.


> That's a bit naive. Google is not a charity and provides that service by making a profit out of user data.

So what if they make a profit -- they provide a fantastic service that actually works well, and it allows homebrew (and many other systems) to continue to provide its services for free.


You may be fine with a company profiting from your personal information and habits; I am not, and I think it is reasonable to assume that other people are not fine with it either.


Then those people won't use Homebrew.


Sure! But the telemetry is a default setting, and the implications for the user’s privacy are not clear at all.

Opt-in telemetry clearly stating that Google will record the first three bytes of your IP? I’m OK with that!


> We don't send any personally identifiable information to Google.

Yes, you do. While you use[1] the "Anonymize IP" option, a packet is still sent from the user's IP. Google's business model includes gathering as much data as possible, so it's foolish to think that they are throwing data away in this situation. You may disagree and trust Google to honor the "Anonymize IP" option, but trust is not transitive, so you shouldn't ever assume users agree (use opt-in in every situation).

However, claiming you don't send PII to Google makes me wonder if you have actually read the documentation for GA. The "Anonymize IP" (aip=1) option is blatant doublespeak. From their own documentation[2]:

> The IP anonymization feature in Analytics sets the last octet of IPv4 user IP addresses ...

They are only masking out the last 8 bits of the address, which are the least interesting bits. You can still discover the ASN from the remaining data. At worst, all that option did is add a 1-in-256 guess when correlating your analytics data to the rest of Google's tracking data. That is trivial to overcome with Google's massive databases of tracking data.
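
(To make that concrete: 203.0.113.75 becomes 203.0.113.0, narrowing the original address down to one of 2^8 = 256 candidates, all on the same network.)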

You even provide a unique additional per-install tracking number that lets Google track users when they move to a different IP address. Once a correlation exists between your analytics data and everything else at Google, your analytics events provide a reliable report that allows other tracking data to be correlated with the new IP address.

Why does that option exist? It's possible that it was designed to mislead developers into sending Google tracking information, but their own documentation[2] suggests a different hypothesis:

> This feature is designed to help site owners comply with their own privacy policies or, in some countries, recommendations from local data protection authorities

This is a feature designed to check boxes on compliance requirements, not to provide any actual anonymity to users.

[1] https://github.com/Homebrew/brew/blob/fd4fe3b80cab9902437016...

[2] https://support.google.com/analytics/answer/2763052?hl=en


>> We don't send any personally identifiable information to Google.

> Yes, you do

PII is a term of art which the GP is using in its standard sense and you are not. https://en.wikipedia.org/wiki/Personally_identifiable_inform...

(This is independent of the deontic status of your comment.)


Redefining a phrase to omit much of its common meaning is what got us here. I appreciate you helping to bridge the gap by translating; I am just bemoaning its necessity.


Beware the Tyranny of the Minimum Viable User.

It may well be that some relatively-little-used tool nonetheless has a very significant use. Popularity is one factor within the mix, but only one.

https://redd.it/69wk8y


You're right, which is why popularity isn't the sole factor in determining whether to remove a formula from the core repository. It's just one of the things we reference to justify our choice, when relevant.

We generally only refer to popularity after a formula comes up on our radar due to failing other sniff tests. To give you an idea of some of them:

1. Multiple subsequent releases without anybody bothering to update the formula

2. Historical problems with the package (flaky builds, complex build systems)

3. Historical problems with the upstream (patches ignored, unwillingness to cooperate with packagers, unreliable download servers)


Criticality in workflow might also be a consideration. I'd have to check how homebrew does this, but Debian frequently has separate documentation and debug packages as supplements to core functionality. Those might be essential in some cases.

Since homebrew is aimed mostly at technical types and devs, the build tools themselves are probably fairly full-featured, but those are easy things to lose in large general-public releases. (Android's abysmally poor shell tools come to mind.)

Sounds like a pretty good basic set. The flaky builds criterion in particular seems like a strong signal of a low-quality upstream.


> as justification for removing an unused formula

This does not, in my mind, help the case at all. This culture of deletion makes no real sense to me; something not being used for a month or a year doesn't mean it's not being used at all. What's the real cost of just leaving those formulae around? A slight problem with them not working immediately when they are used? At least then they can be fixed, instead of users seeing the "No formula found" text.

The culture of deletion surrounding the long tail of digital artifacts just doesn't make any sense. Yeah, yeah. Get off my lawn, too.


> What's the real cost of just leaving those formulae around?

Maintenance.

When unused and unmaintained projects and formulae pile up in Homebrew, we end up spending a tremendous amount of time and effort patching mostly unused software for a very small part of the userbase. That disproportionality hurts the 95% of users who expect timely and well-tested updates to major packages.

We used to provide a "boneyard" tap for unused/unmaintained formulae, but even that led to a lot of requests for support that we simply can't provide. If something is being removed from the core tap, our current recommendation is to put it in a personal tap[1].

> The culture of deletion surrounding the long tail of digital artifacts just doesn't make any sense.

Keep in mind that the "artifacts" in question are still available, since Homebrew and all Homebrew taps are just Git underneath. You might not be able to build an old formula for compatibility reasons, but all prior work is available for reference.

[1]: https://github.com/Homebrew/brew/blob/master/docs/How-to-Cre...


Somewhat of a tangent, but a lot of these reasons seem to boil down to "we don't have the resources". The HomeBrew project does appear to have a Patreon account, but to find it you have to go to the GitHub repo and scroll to the bottom of the readme. It might help to promote that a bit more on the homepage. It might also help to engage a bit more with the community other than "here's a new release".

IOW, HomeBrew seems to be big enough that it needs to start engaging in some PR.


Maintenance burden. That, and the fact that git/GitHub really doesn't like huge dirs.


I've been delighted to install casks using `brew cask install X` where X is some obscure program that I thought brew cask would never have bothered adding. Stuff like that dazzled me about brew cask, so I hope it never goes away even under the guise of maintenance burden.

(It's pretty easy to contribute your own formulas, too! The checklist is pretty small.)

That said, I've occasionally run into a broken `brew cask install X`, so I get that maintenance is a thing. But in that situation it seems best to let someone else notice that and contribute a patch rather than remove it entirely. I understand that might eventually lead to a large percentage of broken casks though.


As a nit, `brew install` and `brew cask install` use largely different codebases ;)

In the best case, the community does pick up the work of patching and maintaining formulae (and casks). However, that's the best case, and low-volume formulae rarely fall into it.

In my experience, formulae tend to get abandoned by the upstream that originally submitted them, causing them to eventually break when the system or surrounding dependencies change. One potential solution is to have a chain of custody for formula maintenance, but that hasn't worked so well for MacPorts.


> the community does pick up the work of patching and maintaining formulae (and casks). However, that's the best case, and low-volume formulae rarely fall into it.

IMO that's self-inflicted by homebrew's policies. from https://docs.brew.sh/Acceptable-Formulae.html :

> We frown on authors submitting their own work unless it is very popular.

I'd expect "very popular" software to have relatively large populations of users capable of maintaining the formulae (and authors less likely to bother with packaging), while in lesser-known software, the respective authors are most likely to maintain long-time interest.


> At least then it can be fixed, instead of seeing the "No formula found" text.

You don't see that text but instead text indicating the formula has been deleted.

> The culture of deletion surrounding the long tail of digital artifacts just doesn't make any sense.

They will be preserved in the Git history for as long as we continue to use Git.

Both the above comments make it seem like you've not really done any research into your own objections, here.


The best defense is a good offense. I express concern about your project's use of Google Analytics and its release of that data to the public, then about the culture of deletion, and I am criticized for my "lack of research". Sorry I touched a nerve, but all of these issues are decisions that are being made; to not expect critiques of your decisions is naive.

Anyways. Having done my research:

> You don't see that text but instead text indicating the formula has been deleted.

If, and only if, you know the exact name of the package. If you do a search, you get the "no formula found". If you attempt to use the braumeister.org web site, it also will not show up unless you explicitly craft the recipe URL.

> They will be preserved in the Git history for as long as we continue to use Git.

How does one go back into git history and revive a deleted formula, given how automated the entire process is around ensuring the most recent version of brew is always in use? From your "deleted" help docs, I infer that it comes down to creating your own tap?

The one example I came across quickly is "abi_compliance_checker". It was deleted because it requires GCC 4.7. Yet Homebrew is still quite capable of installing and using gcc@4.7. It's not that it was broken in a recent build, nor that it was a major maintenance burden: the updates for years consisted of version bumps.

This isn't something I need, but it's a great (and quickly found) example of seemingly arbitrary deletion of an otherwise active project.
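
For reference, the closest thing to reviving one that I can figure out is digging it out of git by hand, something like this (commit hash and file name illustrative):

    cd "$(brew --repository homebrew/core)"
    # find the commit that deleted the formula
    git log --diff-filter=D --oneline -- Formula/abi-compliance-checker.rb
    # restore the file from that commit's parent, then install from the local file
    git checkout <commit>^ -- Formula/abi-compliance-checker.rb
    brew install ./Formula/abi-compliance-checker.rb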


Well, I happen to agree with your sentiments. I've been meaning to make a fork of homebrew that doesn't have these problems. You should check out tigerbrew.


Reminder, you can opt out of this with:

    brew analytics off
You can check the current status with:

    brew analytics


I'm curious why releasing data in this fashion is troubling? I think this kind of public data sharing in an open-source project is good, since it builds transparency. A top-N list seems like a good way to measure the health of the Homebrew ecosystem, similar to other package managers. Without any other identifiable information like email addresses, it seems near impossible to de-anonymize statistical usage from this set.


> similar to other package managers

Interesting; I don't see apt, dpkg, nix, or yum maintainers releasing top lists of packages. Even the package providers rarely provide a "top" list, especially one with compile-time flags.

Those flags being provided in this list are very explicit, which could be easily correlated with data provided (or harvested) by other parties.

> a good way to measure the health of the Homebrew ecosystem

What value does this really provide to the general public, other than a, ahem, "member" measuring contest? You already see this occurring in this very thread - the MySQL vs. PostgreSQL comments. I fully expect a "Node vs. X" one as well.


Debian publishes statistics gathered by the optional "popularity-contest" package at http://popcon.debian.org

I can't remember whether the default is opt-out or in with the current installer.


It prompts if you want to install it.

It also doesn't send data to a privacy-abusing company like Google.


> It also doesn't sent data to a privacy abusing company like google.

Actually the data is sent to Google.


Helps to find packages which I might have missed from other sources. LWN is good, but doesn't cover everything.


Let me lend you this handy piece of code, straight from my production codebase, written the day this feature was released.

    # disable homebrew from tracking metrics
    # see https://github.com/Homebrew/brew/blob/master/docs/Analytics.md
    export HOMEBREW_NO_ANALYTICS=1


You're running Homebrew (and presumably OSX Server, if it still is a thing) in production??


> volunteer-run open source projects to properly respect the privacy of their users

If you don't trust us to respect your privacy why are you trusting us to download and run code from the internet on your machine?

We need analytics to effectively run the Homebrew project (as you've noted: we are volunteers). If you no longer trust our project: go use something else.


In short, I can verify what's being done on my computer. I can inspect the formulas and verify that what is claimed to be executed is what is actually being executed.

I can't on yours (or Google's).

EDIT: I'm sorry, was there something factually wrong with this statement to warrant downvoting it into oblivion?


You can trivially verify all the information that's sent by us to Google; we have an environment variable to do exactly that.


There's significant value, when developing a product, in knowing which other products your customers are likely to have, so that you can prioritise your different integrations.

Similar data could probably be compiled from Google Trends, but this clean and authoritative view is something that can be trusted, and I think the result makes the open-source ecosystem stronger.


That only sets up the value proposition for the maintainers - not for making it public. What value do any of us, not on the Homebrew maintenance team, derive from this list (other than fuel for "X is better than Y" arguments)?


It informs me of software I didn't know about and yet is popular among presumably mostly developers and power users. I need to know about that.


There are literally dozens of "how I set up my machine / tools I use" lists and even setup scripts on GitHub. If you "need" to know what others are doing, refer to those. Or ask.

I'll happily tell you my toolset, but I don't use Homebrew, and I wouldn't allow it to send my usage to Google even if I did.


Thanks for being so open to sharing. Do you have a blog? How do I find your blog? Before you posted this comment, how would I even know to look you up?

Just because someone on the internet had a particular setup doesn't mean I want to follow it. Or that I have time to track down several people's opinions.

Getting install stats directly from the homebrew project, which I know because I use it, is infinitely more useful to me and much more easily discoverable. That's just my opinion, though, and you're entitled to your own.


They have a section on this very website, called "Ask HN". It's not uncommon for people to ask for opinions/input on tooling.

That also gives you more context, because it's answering your actual question, rather than trying to answer your own question with a bunch of vaguely related data.

It also handles the dependency issue. Someone asked why imagemagick is so popular, but it's probably just a dependency for language-level bindings (e.g. php-imagick), not that people are using `convert` or `identify` directly on the CLI.

Heck, consider the case of front-end developers who have a toolset that depends on nodejs. They may never write any server side code, but if they follow recent trends they probably need nodejs for their css/js "toolchain" - the stats don't tell you that though. They just tell you that nodejs is installed a lot.


> What value do any of us, not on the Homebrew maintenance team, derive from this list

This list was added specifically because a bunch of people expressed concerns about this data not being public. You can please some of the people all of the time, etc.


It's interesting.


There is always macports: https://www.macports.org/


It is weird how small our global tech tribe is.

I would estimate I download tmux via homebrew once a year on average. It is possible 112,000 represents a good estimate of all Mac-carrying tmux users in the world [*]. For comparison, this is roughly the same as the number of employees at Apple.

[*] Assuming these stats aren't opt-in.


For another view: I'm a Mac user, but I prefer to use all those tools via a Linux VM, whether local, on a remote server, or on a VPS. Also, not everyone doing development on macOS uses homebrew.


If you don't mind me asking, why do you prefer working in a linux VM or VPS?

And what are the pros and cons?


Don't know what half of these are? Here's some quick and dirty Javascript to make each formula clickable to a detail page:

    document.querySelectorAll("td > code").forEach(code => code.innerHTML = `<a target="_blank" href="http://brewformulas.org/${encodeURIComponent(code.innerText)}">${code.innerText}</a>`)
Cut and paste that in your Javascript Console (`View/Developer/Javascript Console` on Chrome) when you're on https://brew.sh/analytics/install-on-request/.


Youtube-dl at #18 is hilariously telling, although to be fair it's great for archiving clips (from all sorts of sites, not just YouTube) that may get deleted due to all sorts of reasons.


It also is updated very often (I guess to keep step with all the non-public APIs it probably uses). Almost every time I run `brew update`, youtube-dl has an update ready. This might inflate its download stats a bit.


I wish there was a decent package manager on Windows too. Chocolatey used to be OK, but I often find outdated packages, and they keep breaking the syntax of existing scripts (like by making almost all packages now require the flag --allowemptychecksum). I don't know why it never really took off. This is such a practical way to set up a machine.


Someone super needs to turn that list into an expanded version with a two-line summary of each entry.

I found myself tabbing out to google constantly like 'fdk-aac, wow, that's a thing? cool~'


I was querying locally with `brew info fdk-aac` for this.


oh, nice! I didn't realize that `brew info X` had a nice summary line in it. Thanks for the tip.


If you want only that little description you can use `brew desc <formula>` as well.


I found myself googling 'yarn' and being surprised to find a bunch of pages about... knitting.


Great. Caddy is #666. Hope that's not a sign. :P


I moved from NGINX to Caddy on my raspberry pi 3 for my home needs and couldn't be happier.

The automatic LetsEncrypt stuff is great and has removed crufty cron jobs to make sure the certs are up to date, and support for HTTP/2 out of the box (with 0 configuration) has brought a marked improvement in the performance of my site.


That's excellent to hear! Glad you are happy. Thank you to people like you in the community who test it in various environments to make sure it works well. ;)


I like Caddy on my Linux servers; very few use cases on the Mac.

Great software by the way.


Thank you!


    #575 	algol68g 	3,485 	0.02%
Now that's awesome.


Who the hell uses homebrew to install Perl? OS X ships with it, and perlbrew is a much more natural way to install it if you care about not using the system Perl. Maybe it's a dep for another package?


If I wanted a newer version of Perl (or Python, or anything else that ships with macOS), I'd use homebrew, if only to keep everything in the same place. By doing so, for example, brew upgrade would update everything. That's the whole point of package managers, having a discrete package manager for each package defeats the purpose. This is also a reason why I dislike npm and the weird practice of distributing non-js packages with it.


> Maybe it's a dep for another package?

Random pedantry: the list on that page is the stuff where people have typed `brew install perl` (or `brew upgrade perl`) and it hasn't been pulled in as a dependency. As to why: I have no idea :D


Based on how low it is on the rankings compared to other languages I'd say not many.

I have no data to back this up other than my anecdotal personal experience but I'd wager Perl hackers who use Homebrew never use it to install Perl.

Most of the classic Perl documentation walks you through installing from source, which is quite easy, and modern documentation/tutorials usually refer to perlbrew.

The Perl community often has established solutions within their own ecosystem, and this makes them seem like they have a smaller presence than they otherwise would have.


> Maybe it's a dep for another package?

This is likely the reason. I would imagine more people use things written in Perl than use Perl itself as a programming tool.

Also, perlbrew is not necessarily easier, simply because having one package manager manage your packages is better than having many package managers (cargo, npm, gem, etc.). It suggests that maybe there should be a package manager for all these package managers... I'd personally want to use the simplest possible way to install something unless I had a reason not to.


`brew install <thing>` is a natural thing to try first. If it works well then why use something else?


Works well for what? There's a recent Perl already installed on the machines that brew runs on.


The system perl on my work mac is v5.18.2. It's years out of date. If I want to run something that needs a newer version I can either update it without knowing what impact that has on the rest of the things running on my computer (seems like a bad plan...) or I can install a newer version alongside the system perl. My preferred method for that would be brew because I use brew for pretty much everything. I don't really want a second (or third, or fourth) package manager for every other thing.
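
A minimal sketch of that, assuming the formula links into the default /usr/local prefix:

    brew install perl
    export PATH="$(brew --prefix)/bin:$PATH"   # prefer the brewed perl over /usr/bin/perl
    perl -v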


I'm surprised (in I guess a good way =) that awscli is all the way up at #15. Definitely expected quite a few utilities and languages to rank higher.


I saw that and wondered if these count unique installations, or simply number of times downloaded. awscli updates a lot, so if we're counting downloads, that would give it a significant boost, I think.


A lot of people have yet to discover nvm (:


More likely due to issues rather than ignorance: "Homebrew installation is not supported. If you have issues with homebrew-installed nvm, please brew uninstall it, and install it using the instructions below, before filing an issue."

https://github.com/creationix/nvm#important-notes


nodenv is so much better than nvm https://github.com/nodenv/nodenv


YARN went from idea to #4.


Nice stats, though the update frequency of each piece of software should also be considered. For example, I installed both node and vim, but the latter only once.


Can someone explain the popularity of imagemagick on there? As in, are that many people tinkering with graphics via the command-line?


I think it's still pretty commonly used on the server side for image manipulation on uploads.


batch image resizing. batch image conversion. thumbnailing. making a montage of screenshots or graphs.

many, many use-cases. i use it all the time, it's just more convenient than pixelmator/gimp/photoshop for those tasks.


I always end up using my old standby GraphicConverter (been using it since 1996...) for that kind of stuff since with imagemagick there always seems to be some kind of voodoo you have to invoke once you have a PNG alpha channel, images with EXIF rotation tags, PDF files, color profiles, etc.


Imagemagick and graphicsmagick are being used for server-side image manipulation. E.g. when you upload your images to a server, two of the first things to be done are to delete the metadata and resize.


That makes sense, so it's to generate thumbnails/resize uploads.

Could it be some node package that has a dependency on imagemagick?

What I'm having trouble wrapping my head around is so many people being aware of imagemagick and reading through its documentation to learn how to use it.


It was a really popular gem requirement and lots of Ruby developers use Macs/Homebrew


Also pretty popular in the PHP world. Judging by the number of PHP installs on the list I'd say a lot of people are using it for development.


A lot of the stuff at the top of the list consists of requirements/prerequisites for other packages. For example, wget, in top place, is used by other packages to download stuff.


Or probably since the very first brew command on the homepage is `brew install wget`.

"Ah, wget. I guess I do need that."


i'm not sure this is true. erlang ranks lower than elixir, but erlang is a prereq for elixir


this list is `install on request`. Find raw metrics at https://brew.sh/analytics/install/


Personally, I opted to download it manually since I prefer it over curl for some things.


I am, personally. It's just the quickest way for me to manipulate images. I don't have Photoshop installed, and macOS' Preview only gets you so far.


I do resize on the command line all the time. Format conversion occasionally. Other things few times a year.


A lot of other programmable things also link to its functionality as a library.


While I really like 'yarn', I was surprised to find it at #4. I did not expect it to be that popular :D


Just curious: what does the brew analytics infra look like? Datastores, aggregations, rollups, etc.


They use Google Analytics. Here's their post about it: https://github.com/Homebrew/brew/blob/master/docs/Analytics....

There are also instructions on how to opt-out.


Interesting:

Python3 #5, Python #6, Pypy #439 ...

Go #22, Scala #78, Elixir #83, Rust #141, Typescript #584 ...

Groovy #150, Kotlin #195 ...

Go #22, Ruby #33 ...


#83 elixir

#85 fish

I am glad to see these 2 in top 100.


Aren't qt and qt5 the same package nowadays?


Why is yarn installed via HomeBrew and not npm?


Because npm can't install a signed app. That said, if you really want to you can install it using npm - https://yarnpkg.com/en/docs/install#alternatives-tab


autojump should be higher. It's insanely handy once it gets your directories indexed.


I have been using z[1] for a _long_ while now, lovely little thing

[1] https://github.com/rupa/z


Seconded, it's a great utility, without dependencies, and very simple.


have you compared autojump to z / fasd? Did you decide to stick with one for a particular reason?


MySQL. Still more popular than Postgres.


Many people on MacOS use Postgres.app for their PostgreSQL needs.

http://postgresapp.com


And I'm sure many people use the mysql.com installer for MySQL: https://dev.mysql.com/downloads/mysql/


Postico[1] too. It's really wonderful.

[1]https://eggerapps.at/postico/


With both numbering in the six figures and differing by ~50k, it's not like they're vying for a lot of market share.


People should install stuff like their db with docker or vagrant anyhow. It'd be annoying to keep your app's db version in sync with the homebrew package.


I actually do this, but annoyingly for quite a few (postgres/redis), the client tools are not separate packages so you still need to install the full db locally to use them.


People should install stuff like their db with homebrew anyhow. It'd be annoying to keep your app's db version in sync with the docker or vagrant package.


If you use vagrant or docker, the idea is to match the deployment server, which is probably Linux. So you'll be installing (using the OS package manager - something like apt or nix) whichever version is installed for your server OS.


MongoDB on the podium too...

Interesting to see MariaDB in the middle of that list though...


There are of course applications like WordPress where it's the default.


Maintaining projects with MySQL in them is still a thing, yes.


I have read an article claiming that the LAMP stack made MySQL preferred over Postgres.



