A spreadsheet allows for semi-automation of data processing. Each cell can have a (rather simple) function defined, and its evaluation result is displayed in that cell. You can build up pretty complex workflows just by chaining cell evaluations.
To give a more concrete example, think about a loop. It is arguably the building block of any programming language and a necessary cornerstone in learning how to program. Notebooks and spreadsheets handle loops very differently. You can code a loop in a notebook, but the cell output will be difficult to interpret (think of it having to fit a linear model for 5 different outcomes). You would be better off splitting up the cell and running the models separately. That allows you to comment the code and explain the results, just as you would when writing a paper. In a spreadsheet, you would define a function, then copy/paste it into the cells you want it evaluated for. No programming required, just knowledge of how to link to cells from within a function and how to copy/paste in the spreadsheet. That's why spreadsheets are widely used by non-technical people with little knowledge of computer programming.
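To make the "five outcomes in one cell" example concrete, here is a minimal sketch (the dataset, predictor, and outcome names are all hypothetical):

```python
import pandas as pd
from sklearn.linear_model import LinearRegression

df = pd.read_csv('study.csv')        # hypothetical dataset
X = df[['age', 'dose']]              # hypothetical predictor columns

# One linear model per outcome, all crammed into a single cell:
# the printed output quickly becomes hard to read.
for outcome in ['bp', 'hr', 'bmi', 'glucose', 'ldl']:
    model = LinearRegression().fit(X, df[outcome])
    print(outcome, model.coef_, model.intercept_)
```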
If you have more data, notebooks can handle that better. However, I've noticed lots of colleagues skipping notebooks and using IDEs instead. They're much easier to work with and better for source control. I'm not a huge fan of notebooks any more.
Now that's quite a generalization... most tasks? If all you're doing is a=b+c then perhaps. I work in HFT and even for trivial data exploration I would never even consider touching Excel; why would I? Even if it's just 100 rows. No thanks. Once you're comfortable with Python / its scientific stack, the exploratory part of data analysis becomes fast and trivial.
What I would like to see is notebooks becoming more IDE-like. This is already happening gradually, e.g. with JupyterLab replacing Jupyter notebooks.
Analysts who used to work in Excel are moving their models into environments like these. Libraries for most common functionality are provided, and they allow someone with only a bit of VBA knowledge to feel comfortable enough to start working with Python.
And when you browse places like r/financialcareers, it's filled with finance students wondering which programming languages they should learn. The answer is always to learn Python using Jupyter notebooks.
Notebooks are cells of logic. You could conceivably change the idea of notebook cells to be an instance of a function that points to raw data and returns raw data.
Perhaps this is just Alteryx, though.
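As a sketch of what "cells as functions over raw data" might look like (the file name and pipeline steps are illustrative, not a real product):

```python
import pandas as pd

# Each "cell" is a pure function: raw data in, raw data out,
# with no hidden state shared between steps.
def load(path):
    return pd.read_csv(path)

def clean(df):
    return df.dropna()

def summarize(df):
    return df.describe()

result = summarize(clean(load('data.csv')))  # 'data.csv' is hypothetical
```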
I'm picturing the ability to write a Python function with the parameters being just like the parameters in an Excel function. You can drag the cell and have it duplicated throughout a row, updating the parameters to correspond to the rows next to it.
It would exponentially expand the power of Excel. I wouldn't be limited to horribly unmaintainable little Excel functions.
VBA can't be used to do that, can it? As far as I understand (and I haven't investigated VBA too much) VBA works on entire spreadsheets.
Essentially, replace the Excel formula `=B3-B4` with a Python function `subtract(b3, b4)`, where `subtract` is defined somewhere more convenient (in a worksheet-wide function definition list?).
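For what it's worth, something close to this already exists via xlwings, which can expose a Python function to Excel as a user-defined function; a minimal sketch using the toy `subtract` from above:

```python
import xlwings as xw

@xw.func
def subtract(a, b):
    """Callable from a cell as =subtract(B3, B4) once the add-in imports it."""
    return a - b
```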
The ubiquity of Excel is both a blessing and a curse in that everyone has it, so everyone uses it, regardless of whether or not it is the best tool for the job.
As of now, Jupyter/IPython would not recompute `subtract(b3, b4)` if you change b3 or b4. This has positive effects as well as negative ones (reliance on hidden state and on order of execution).
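A minimal illustration of that non-reactivity:

```python
# In[1]
b3, b4 = 10, 4

# In[2]
result = b3 - b4   # result == 6

# Re-running In[1] with b3 = 20 does NOT update `result`;
# it stays 6 until In[2] is executed again. A spreadsheet
# would recompute the dependent cell automatically.
```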
I too would really like something like this, but I think it is pretty far away from where Jupyter is now.
> Traitlets is a framework that lets Python classes have attributes with type checking, dynamically calculated default values, and ‘on change’ callbacks.
> Traitlet events. Widget properties are IPython traitlets and traitlets are eventful. To handle changes, the `observe` method of the widget can be used to register a callback.
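A minimal sketch of registering such a callback with ipywidgets:

```python
import ipywidgets as widgets

slider = widgets.IntSlider(value=5, min=0, max=10, description='x:')

def on_value_change(change):
    # `change` is a dict carrying 'old' and 'new' (among other keys).
    print(f"x: {change['old']} -> {change['new']}")

slider.observe(on_value_change, names='value')
slider  # display the widget in the notebook
```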
You can definitely build interactive notebooks with Jupyter Notebook and JupyterLab (with ipywidgets, Altair, HoloViews, Bokeh, or Plotly for interactive data visualization).
> Qgrid is a Jupyter notebook widget which uses SlickGrid to render pandas DataFrames within a Jupyter notebook. This allows you to explore your DataFrames with intuitive scrolling, sorting, and filtering controls, as well as edit your DataFrames by double clicking cells.
Qgrid's API includes event handler registration:
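A sketch following qgrid's `on()` pattern (the event payload keys follow its event docs; treat the details as an assumption):

```python
import pandas as pd
import qgrid

df = pd.DataFrame({'a': [1, 2, 3], 'b': [4.0, 5.0, 6.0]})
grid = qgrid.show_grid(df, show_toolbar=True)

def on_cell_edited(event, widget):
    # 'cell_edited' events carry the edited column plus old and new values.
    print(event['column'], event['old'], event['new'])

grid.on('cell_edited', on_cell_edited)
grid
```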
> neuron is a robust application that seamlessly combines the power of Visual Studio Code with the interactivity of Jupyter Notebook.
"Excel team considering Python as scripting language: asking for feedback" (2017)
OpenOffice Calc ships with Python 2.7 support:
Procedural scripts written in a general-purpose language with named variables (with no UI input except for chart design and persisted parameter changes) are reproducible.
What's a good way to review all of the formulas and VBA and/or Python and data ETL in a spreadsheet?
Is there a way to record a reproducible data transformation script from a sequence of GUI interactions in e.g. OpenRefine or similar?
"Within the Python context, a Python OpenRefine client allows a user to script interactions within a Jupyter notebook against an OpenRefine application instance, essentially as a headless service (although workflows are possible where both notebook-scripted and live interactions take place.
Are there data wrangling workflows that are supported by OpenRefine but not Pandas, Dask, or Vaex?
Not sure how they can converge.
This kind of thing exists at a larger scale for pipeline visualization. I could see it working for notebooks.
I wish I had these tools when I was a student (lectures laid out as notebooks that you can interact with to see how the graph changes).
Of course just reading through or listening to clear explanations is still key.
Of course, YMMV according to prof and institution.
I like the concept of notebooks a lot, but you have to be careful that students aren't just getting slightly flashier presentations that come bundled with confusing installation woes.
This gives me the sense, personally, that economists aren't interested in making accurate predictions about the world. Other fields would, I think, test their theories against observations.
There are a lot of structural econometric papers that do exactly what you ask, but you need graduate-level statistics and a deep understanding of discrete choice, identification, and simulation methods.
Structural econometrics is a field where PhD students, in their 5th year of study, usually produce only one complete study, if that.
I agree with the sentiment that if the work is messy, the teaching should be messy as well. But not when you're starting out with new tools.
But if you compared these notes to the notes for a college level physics course, you would find a similar level of abstraction, idealized models, and absence of real world data. Those things are not in themselves indicators that physicists (or economists) don't care about the real world. In any mature field, there is a body of knowledge and techniques to be learnt. There's a certain formalism to be picked up, rather than just staring at data.
There might be legitimate reasons for dismissing the general approach taken by mainstream economic theory, but what you seem to be saying ("hmmm, my intuition is that this stuff doesn't focus enough on accurately predicting the real world") is not a reasoned critique.
pandaSDMX can pull SDMX data from e.g. ECB, Eurostat, ILO, IMF, OECD, UNSD, UNESCO, World Bank; with requests-cache for caching data requests:
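A hedged sketch of pulling ECB exchange-rate data with pandaSDMX plus requests-cache (the dataflow key and parameters follow the ECB `EXR` example from the pandaSDMX docs; treat the specifics as assumptions):

```python
import pandasdmx as sdmx
import requests_cache

# Cache HTTP responses so repeated pulls don't re-hit the ECB API.
requests_cache.install_cache('sdmx_cache', expire_after=3600)

ecb = sdmx.Request('ECB')
resp = ecb.data('EXR',
                key={'CURRENCY': 'USD'},
                params={'startPeriod': '2019'})
series = sdmx.to_pandas(resp)
print(series.head())
```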
The scikit-learn estimator interface includes a .score() method. "3.3. Model evaluation: quantifying the quality of predictions" https://scikit-learn.org/stable/modules/model_evaluation.htm...
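For example, for a regressor `.score()` returns the coefficient of determination (R²) on held-out data:

```python
from sklearn.datasets import load_diabetes
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

X, y = load_diabetes(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = LinearRegression().fit(X_train, y_train)
print(model.score(X_test, y_test))  # R^2 on the test split
```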
statsmodels also has various functions for statistically testing models:
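e.g. an OLS fit whose results object exposes per-coefficient t-tests, an F-statistic, and linear-restriction tests (the data here is simulated for illustration):

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
X = sm.add_constant(rng.normal(size=(100, 2)))
y = X @ np.array([1.0, 2.0, -0.5]) + rng.normal(size=100)

results = sm.OLS(y, X).fit()
print(results.summary())          # t-tests, F-statistic, R^2, AIC/BIC
print(results.f_test('x1 = x2'))  # test the restriction beta_1 == beta_2
```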
"latex2sympy parses LaTeX math expressions and converts it into the equivalent SymPy form" and is now merged into SymPy master and callable with sympy.parsing.latex.parse_latex(). It requires antlr-python-runtime to be installed.
IDK what Julia has for economic data retrieval and model scoring / cost functions?
You say this as though using mock-up data to teach techniques isn't a universal practice in literally every other discipline.
Pretty much every course I took in undergrad physics had no real-world data. The intro-level courses were especially fun: we'd go into the lab and get such horrible data that we could never confirm what they were teaching in the theory classes. We wondered what the point of the lab even was.
The biggest offender is the friction model. Heck no, it's not proportional to the normal force. No one could successfully show that in the lab. And a quick Google search turns up a trivial experiment where just changing the orientation while keeping the normal force the same leads to wildly different friction forces.
Ever taken statistics courses? You're not doing multiple regression analysis on real world data on day 1. On day 1 you're learning odds using playing cards and coin flips.
Curiously enough, my undergrad statistics textbook was loaded with problems where the data was taken straight from a journal paper. The book has poor reviews on Amazon, but I think it's the best I've seen.
You could test your own theory against the observation that calculations with real-world data are very much a part of economics; they're just not part of this particular course.
Of course, it depends on who they work for. Effectively, the American field of economics is an exercise in decoupling private reality from public theory.
Thank you so much for sharing.
On the other hand, if you don't do any quantitative, empirical, or experimental economics -- i.e. you only do theory or political econ -- then you won't pick up these skills (as much).
You would see a difference in that these sort of models are used for causal inference and counterfactual analysis, whereas Machine Learning is mostly predictive.
That being said, Machine Learning is starting to apply methods developed in econometrics and/or stats, like GMM and Time Series methods.
For example, Long-Term Memory models are quite recent additions to Machine Learning, whereas the short-memory restriction of autoregressive processes has been worked on in econometrics since the early '80s.