Let's say you are writing an API that works with some particular scientific file types on the back end, and you want to load that data into memory for fast querying and returns. That data is a multidimensional time series per file. You could spend the next few months writing libraries and bashing your head against the wall, or you could leverage the 30+ years of development in the stack that already enables you to read these.
Xarray to read, numba for calcs on the xarray data, pandas to leave it sitting in a dataframe, numpy as pandas' preferred math provider. You could write the API componentry from there, sure. Or you could use a library that has had the pants tested off it and already covers most of the bugs you are likely to accidentally create along the way.
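Very roughly that shape, as a sketch (the file name, variable, coordinates, and the numba calc below are made-up stand-ins, not the actual data or code):

    import numpy as np
    import pandas as pd
    import xarray as xr
    from numba import njit

    @njit
    def rolling_anomaly(values, window):
        # numba-compiled number crunching: deviation from a trailing mean.
        out = np.empty_like(values)
        for i in range(values.shape[0]):
            start = max(0, i - window + 1)
            out[i] = values[i] - values[start:i + 1].mean()
        return out

    # xarray reads the multidimensional, labelled time series from disk.
    ds = xr.open_dataset("observations.nc")  # e.g. dims (time, lat, lon)
    series = ds["temperature"].sel(lat=-35.0, lon=149.0, method="nearest")

    # numba/numpy do the maths on the raw arrays.
    anomaly = rolling_anomaly(series.values.astype(np.float64), window=24)

    # pandas leaves it sitting in a dataframe, ready for fast querying.
    df = pd.DataFrame(
        {"temperature": series.values, "anomaly": anomaly},
        index=pd.DatetimeIndex(series["time"].values, name="time"),
    )
    print(df.tail())

Every line of that is leaning on code that has been hammered on for years rather than anything bespoke.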
There's no compelling reason to write everything from scratch. If everyone took that approach there would be no reason to have an ecosystem of libraries, and development would grind to a halt because we, as a collective of people programming, would not be working efficiently.
I see no compelling reason to implement a multidimensional time series for multiple files as a component of any backend API that consumes user (defined) data.
In what circumstance could that be profitable? Even if you batched data, any number of concurrent users would gobble resources at an incredible rate.
Who said anything about profit? Not everything that exists to be solved, and for which there is a demand, is driven by profit. Think: regulation, environmental, NGO, citizen science, academia, government agency, public service. All places where systems can exist that are not for profit, but do grant significant capabilities to their user base.
Also, it's a particularly arrogant point of view to assume that because you cannot see a reason for something to exist that its development is invalid both now and into the future. You've also assumed the data is user defined.
I can also guarantee you that user concurrency is not an issue after some recent load testing, with load capacity surpassing expected user requests by several orders of magnitude whilst running on minimal hardware.
I probably should have said economically viable. Handling and manipulating data like that is intensive and thus expensive. If it's not user-provided data, why manipulate it with that approach?
Maybe it is arrogant. That entirely depends on whether or not a product or service uses this specific approach -- successfully. Do you have an example?
Edit: I also want to clarify that my comment doesn't suggest that the underlying technology is bad or without use cases; only that it isn't suited for remote (online) processing. It would be way cheaper to manipulate data like that locally.
That's the point. It's not user data, and the data cannot be manipulated on the user side without excessive hardware, software, and troubleshooting skills.
Taking that scientific data and making it available in report format for those who need it that way, when the underlying data changes at minimum once per day, is the more important aspect.
The API is currently returning queries in about 0.1 to 0.2 s, handled async right the way through. It's fast, efficient, and the end result, whilst very early in the piece, is looking nice. Early user engagement has been overwhelmingly positive.
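To give a feel for the shape of that (the framework, route, and the toy in-memory frame below are assumptions for illustration, not the actual service):

    import asyncio

    import pandas as pd
    from flask import Flask, jsonify, request

    app = Flask(__name__)  # async views need the flask[async] extra installed

    # Toy stand-in for the in-memory data, refreshed out of band whenever the
    # underlying files change.
    DF = pd.DataFrame({
        "time": pd.date_range("2024-01-01", periods=240, freq="D"),
        "site": ["A", "B"] * 120,
        "value": range(240),
    })

    @app.get("/series")
    async def series():
        site = request.args.get("site", "A")
        # Push the pandas filtering off the event loop so requests stay async.
        subset = await asyncio.to_thread(lambda: DF[DF["site"] == site].tail(24))
        return jsonify(subset.to_dict(orient="records"))

With the data already in memory, each request is essentially a filter plus a serialise, which is consistent with response times in that range.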
It's not a public endpoint, and the API is still under development, with the interface work largely yet to start. So, can't share / won't share. Sorry.
Where it will be shared is among those with an interest in the specific space. That includes government agencies, land managers, consultancies, etc. At no cost to them, because what the outputs can help offset in terms of environmental cost dwarfs the dev cost.
Ceres Imaging (Aerial and satellite imagery analytics for farming), Convoy (Trucking load auctions), etc. There are plenty of companies doing very real work that need this kind of heavy numeric lifting.
Very cool examples. Thank you for sharing. I'm going to read into them. I'm not familiar with any web companies using this technology so it'll be interesting to dig in.
Flask seems to be a very stable and feature-complete framework (I see about 3 commits per year for the last few years).
At this point, isn't it easier and just as safe to manually review the code, pin the hashes in a lockfile, and review the rare changes, rather than rewrite everything?
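For what it's worth, the hash-pinning part is a small amount of ceremony (pip-tools is just one way to generate the lockfile; the hashes below are placeholders):

    # requirements.in lists the unpinned name, e.g. "flask"; compile it with hashes:
    pip-compile --generate-hashes requirements.in

    # requirements.txt then pins the exact version plus artifact hashes, e.g.:
    #   flask==X.Y.Z \
    #       --hash=sha256:<wheel hash> \
    #       --hash=sha256:<sdist hash>

    # Installs refuse anything whose hash doesn't match the lockfile:
    pip install --require-hashes -r requirements.txt

Reviewing the diff of that file on the rare releases is a much smaller job than maintaining a rewrite.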
Why do you need numpy for web?
Edit: I will concede that there is no point in retooling ML. Web is an entirely different circumstance though.