It wouldn't be a big stretch to say that 90% of quantitative hedge funds use NumPy in some fashion, whether it's directly, or via a library that sits on top of it like pandas or TensorFlow.
I can't think of a more ubiquitous library in the financial space, except maybe QuickFIX (http://www.quickfixengine.org/)...
Maybe numpy's problem is visibility?
Possibly it does its job so well that people don't know they are using it when they use libraries like scikit-learn and pandas?
It's just an assumed resource in quant finance, like air or water. You do realise you're using it, though. When you're using scikit or pandas it's very normal to do "import numpy as np". And you get the odd np.nan reminding you.
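To make the "odd np.nan reminding you" concrete, here's a minimal sketch of how NumPy surfaces even when you only ever touch pandas (the values and dtypes are made up for illustration):

```python
import numpy as np
import pandas as pd

# A pandas Series is backed by a NumPy array, and missing values
# show up as np.nan -- so NumPy is never far away even if you
# only typed "import pandas".
s = pd.Series([1.0, None, 3.0])

print(type(s.values))   # the underlying storage is a numpy.ndarray
print(s.isna().sum())   # the None became np.nan under the hood
```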
Do people still use QuickFix? I dropped it years ago, it was noticeably slower than alternatives when I tested it.
QuickFIX is open source too, so you can make it somewhat faster without abandoning it. I did.
That's maybe the biggest practical benefit of open source software. You don't have to keep track of who you owe what. A lot of these projects have had a few different critical creators and maintainers over the past decade or so. And we don't have to keep track of any of that. That's a huge efficiency boost.
(You should not re-distribute open-source software in a way that violates the license, but that's a separate issue from using it, and it scales a lot easier - everyone receives/uses many more different software works than they distribute.)
You would obviously know better than I. I guess I considered employing people who were working on NumPy as "funding", but possibly not on the scale or with the focus of a specific grant. So many of the folks who do scientific computing with Python have gone through Enthought, it seems kinda like everyone has drawn a salary or contract work from there at some point. But, I guess a lot of the work at Enthought was focused on making the tools palatable to industry rather than the actual science side of things, and much of the math they're packaging came from the academic world.
That being said, I do wonder if numpy is the most appropriate recipient. In my experience with data science, the tool that would benefit the most is not numpy, but pandas. While data scientists rarely use numpy directly, every data scientist I know who uses pandas says they are constantly having to google how to do things due to a somewhat confusing and inconsistent API. I use pandas at work every day and I'm always looking stuff up, particularly when it comes to confusing multi-indexes. In contrast, I rarely use R's dplyr at work, but the API is so natural that I hardly ever need to look things up. I would love it if pandas could make a full-throated commitment to a more dplyr-like API.
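As a concrete example of the kind of multi-index friction I mean (data and column names are invented for illustration):

```python
import pandas as pd

df = pd.DataFrame({
    "region": ["east", "east", "west", "west"],
    "year":   [2020, 2021, 2020, 2021],
    "sales":  [10, 12, 7, 9],
})

# Grouping on two keys produces a MultiIndex, which is where
# much of the "how do I select from this?" googling happens.
grouped = df.groupby(["region", "year"])["sales"].sum()

# Selecting one cell needs a tuple key...
east_2021 = grouped.loc[("east", 2021)]

# ...and many people just flatten the index back into columns instead.
flat = grouped.reset_index()
```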
Nothing against pandas -- I know the devs are selflessly working very hard. It's just that it seems there is more bang for the buck there.
Will have to check out dplyr :) I'd love to see how they master the magic that is multi-indexes.
The tooling to support nested dataframes (and maybe even lists) is simple to create; it could even be a third-party library. I find that though multi-indices may be an accurate conceptual way of thinking about certain data, in practice they tend to be more inconvenient than nesting the dataframes. In all the cases I have encountered, only a single level of nesting is required.
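A minimal sketch of what I mean by a single level of nesting, keeping one plain dataframe per group instead of stacking the key into a MultiIndex (names and data are illustrative):

```python
import pandas as pd

df = pd.DataFrame({
    "group": ["a", "a", "b"],
    "value": [1, 2, 3],
})

# Instead of df.set_index(["group", ...]) producing a MultiIndex,
# keep a dict of flat sub-frames -- one level of nesting, and each
# piece stays an ordinary, easy-to-index dataframe.
nested = {key: sub.reset_index(drop=True) for key, sub in df.groupby("group")}

print(nested["a"]["value"].tolist())
```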
By the way, dplyr doesn't use multi-indexes. I actually think this is one of the reasons (although not the biggest reason) dplyr is easier to use.
I could be wrong, but I'm pretty sure that these would be solved by pandas API design improvements, not with numpy improvements under the hood. (NB: As always, a big thanks to the developers for all their work.)
On the other hand if lots of libraries use numpy, making it more efficient and/or capable would seem to give quite a lot of bang for the buck. And it sounds like that's the kind of problem that money can actually solve.
There have been a few independent attempts to add dplyr-like functionality to pandas without being backwards incompatible (e.g. dplython). I'd be very happy if the core pandas team went down this path.
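For what it's worth, you can already approximate the dplyr verbs with built-in pandas method chaining; this is a rough sketch of the correspondence (column names invented, and the dplyr equivalents in the comments are my own mapping, not anything official):

```python
import pandas as pd

df = pd.DataFrame({"x": [1, 2, 3, 4], "grp": ["a", "b", "a", "b"]})

# A dplyr-ish pipeline: filter -> mutate -> group_by -> summarise
result = (
    df
    .query("x > 1")                    # dplyr: filter(x > 1)
    .assign(y=lambda d: d["x"] * 2)    # dplyr: mutate(y = x * 2)
    .groupby("grp", as_index=False)    # dplyr: group_by(grp)
    .agg(total=("y", "sum"))           # dplyr: summarise(total = sum(y))
)
```

Libraries like dplython go further and port the verb names themselves, but the chained style above works with stock pandas today.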
That being said, I don't have a good understanding of how strong the distinction is between "design issues" and "issues where money helps". There must be some overlap.
> That being said, I don't have a good understanding of how strong the distinction is between "design issues" and "issues where money helps". There must be some overlap.
That's true, but many projects have turned out badly no matter how much money was spent, compared to less expensive but better-run projects. See: design by committee. The design of an API obviously requires careful thought, which I suppose is work that could be paid. But the issue of getting everyone to agree on a design isn't one that money can solve, and then you need to make some hard decisions about backward incompatibility. Perhaps you'd fund a fork of the project, splitting it into an old legacy one and a new, fancy version with a new API, but then you're committed to maintaining two projects, which is its own headache.
These are the kinds of things I mean by design issues. Problems that aren't necessarily hard because they require many people to work for many billable hours to solve them, but because finding acceptable compromises is a very human issue quite irrespective of the programming effort involved.
Many a software project has recognized that serious, backwards-incompatible changes would improve the project, and often there is even a working implementation, but these human and legacy support issues prevent widespread adoption and then the new implementation dies a quiet death because nobody is using it, so nobody finds it worth their time to work on it.
Perhaps what you really want is a new library, rather than trying to contort a different project into the shape you want. Which is of course something money helps with, but then when the money dries up the question of adoption is going to determine whether it lives or dies as an open source project.
Again, those were some general thoughts, I don't know much about this particular project, so maybe I'm way off base. Just offering an alternative POV regarding what exactly constitutes "getting your money's worth" with respect to choosing which OS projects to fund.
I'm a regular user of pandas, would definitely say it's my favorite Python library by far... but it is very hard to do certain operations with it (as the OP said, anything involving multiple indexes, and things like plotting multiple plots after a groupby, etc.)
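As an example of the groupby-then-plot friction: the usual recipe is to reshape with `unstack` first, and that step is the non-obvious part (data below is invented; I stop short of actually calling `.plot()` so the sketch stays self-contained):

```python
import pandas as pd

df = pd.DataFrame({
    "city": ["nyc", "nyc", "sf", "sf"],
    "year": [2020, 2021, 2020, 2021],
    "temp": [55, 57, 60, 61],
})

# Group on two keys, then unstack the MultiIndex so each group
# becomes its own column; after that, wide.plot() would draw one
# line per city. Discovering this two-step dance is the hard part.
wide = df.groupby(["year", "city"])["temp"].mean().unstack("city")
# wide now has years as the index and one column per city.
```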
That's a design error, not necessarily something that money will fix for you. This is why you need to think really long and hard before deploying a public API, it is very hard to change those.
Personally, I believe the biggest blocker for me is having good visualization tools. What ultimately gets me paid is showing other people my work and getting them to give me money to continue it.
On the core science stack IMO there's numpy, scipy, sympy, matplotlib, pandas and xarray. Of those, sympy is probably the one I use next to least, but I really think it's the one that could benefit the most from some funding.
At the end of the day a lot of code uses NumPy and not Pandas.
Numpy is an amazing library, and it's basically Python's "killer app." The fact that you can seamlessly blend numerical/data science computing with more general web applications is what makes Python great.
Sounds a little schizophrenic to complain about the (lack of) money at the same time?
I personally think this may be connected to the "academic" origins and mindset of much OSS. Everything's "free" in that world, which can make a software's transition into the world with real economic constraints and sustainability challenges somewhat painful.
Consider: today I might be working on FOSS. But maybe tomorrow a friend asks me to help him with his (small) business, by e.g. adding a little bit of automation. Suddenly, all my knowledge and experience of GPL-ed libraries goes to waste, as I won't be able to use any of that to help my friend.
Given two equivalent libraries, one on GPL and one on MIT, I'll always go for the one on MIT. MIT, BSD, etc. seem to be the licenses that give you the most options (I'd say freedom, but that's not how the GPL sees freedom) while still maintaining the integrity of the library itself. Those are the licenses that best satisfy both the needs of developers who are not in it for the money and the needs of users who don't want to waste their brain cycles going through possible legal scenarios around all their actual and imagined use cases.
Help me understand this, please. In what legal way is the GPL obstructing you, as opposed to, say, an MIT or a BSD license?
The GPL has nothing against use in a proprietary setting; it's only if GPL'ed software is being sold/distributed that the source and the changes are required to be made available under the GPL. Google uses GPL'ed software all the time and is far from the only one.
Which one? 2 or 3?
But the discussion here is about FOSS sustainability from the dev perspective, not yours as a user. Dual licensing was one option, proposed by the OP.
Or do you think it's OK that critical libraries like NumPy, Django, etc end up with scraps (if lucky), and then we read odes to that on HackerNews? Long-term planning needs a certain reliable continuum (no pun intended) of people and resources.
You know, experts able to meaningfully contribute at this level (core NumPy, core Scikit-learn, core Django, whatever) have very real trade-offs to make, regarding the cost of their labour, free time, family time etc, once out of academia. The "I HAZ BUG PLS FIX NOW, GIMME GIMME FREE" users are only one piece of the open source puzzle.
I'm sure your friend that runs a business understands this very well (or he'll be out of business quickly).
The presented reasoning may indeed be a big part of the problem here. I'm just describing it, not encouraging it. I am definitely not happy to see so much open source being used as critical components of worldwide infrastructure and businesses, and yet receiving little support both in terms of money and professional effort. FWIW, I do donate the little spare money I have to open source projects.
Beginnings are a creative effort, undertaken by passionate pioneers (rarely in it for the money), with outcomes that are notoriously hard to predict in advance. The hallmark of academia.
That's why I said the later transition hurts -- it's a conceptual and cultural shift, not merely financial.
On the other hand, the GPL wouldn't make a difference here unless they were actually distributing it, and my understanding of the LGPL is that they could do whatever they wanted as long as they use NumPy without changing it.
* make it easier to implement and deploy custom dtypes, fix the time-related dtype
* support for ragged arrays
* consolidate internals, especially around ufuncs
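On the ragged-arrays bullet, a small sketch of the current gap: today NumPy can only hold rows of unequal length as an object array of Python lists, which gives up vectorization entirely (first-class support would fix exactly this):

```python
import numpy as np

# No first-class ragged arrays: unequal-length rows must be
# stored as dtype=object, i.e. an array of Python lists.
ragged = np.array([[1, 2], [3, 4, 5]], dtype=object)

print(ragged.dtype)   # object, not a numeric dtype
print(ragged.shape)   # (2,) -- the inner lengths are invisible to NumPy

# Arithmetic falls back to Python-level semantics: "+" on list
# elements concatenates them instead of adding numbers.
doubled = ragged + ragged
```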
It was blaze and bokeh. The description of blaze given in the article is pretty out of date at this point.
While we continue to develop those things, we are working on taking the core ideas in NumPy and creating a set of lower-level C and Python libraries that could be used by pandas, xarray, Numba, dask, arrow, and potentially NumPy: https://github.com/plures. This work is not directly related to the NumPy funding.
NumPy built on Numeric, which was primarily written by Jim Hugunin while he was a grad student at MIT. While I was a professor at BYU, I wrote the core of NumPy with a lot of community input -- outside of my regular job. I sold a book, "Guide to NumPy", for a while that I used to replace the grants I should have been writing and to pay for a grad student to help write iterative solvers for SciPy. Chuck Harris joined the NumPy effort early and has been steadily contributing ever since without direct funding. Many others have contributed volunteer time since then.
A major reason I started Continuum was to help create places where people could get paid to write open-source. I am happy to say we have been doing this for 5 years, though mostly outside core NumPy itself (Numba, Dask, Bokeh, conda, etc.). We are working to support many more open source projects more generally -- and our devs have now made additional contributions to NumPy itself. We have a thriving 40-person Community Innovation team at Continuum supporting many open source projects. I expect this funding to help bring more new people to the NumPy development ecosystem.
The community also started NumFOCUS at this same time to be a community-run foundation that could be a focal point for donations and support to projects including NumPy.
Nathaniel Smith wrote a great proposal and put the effort into securing this funding. It is real work to secure funding. I look forward to NumPy getting better for the benefit of all because of this work.