Moreover, QuantEcon, an academic project dedicated to scientific programming in the social sciences, has a number of notebooks in Python and Julia.
Is there some way to play with these models? Or do you need lots of input data, which isn't freely available?
The reason is that purely statistical/predictive models of macro are subject to the Lucas Critique. It says, roughly: if you observe that an economic relationship holds, but you don't have an explanatory reason why, it's a bad idea to use it for policy prescription.
The NY Fed model has a whitepaper here, which should be accessible to the technical data scientist (technical, but not prohibitively so).
Their readme points to a few posts on using it; I think the open-source code comes with a CSV of example input data. There should be publicly accessible macroeconomic data in a few places for you to play with it, say at FRED or the World Bank. I think Julia has a Stata-style API package for FRED data, making the data processing easier.
The estimation, however, is usually based on Bayesian hierarchical models, incorporating the constraints imposed by the theoretical model, I believe.
Here is a good source for that
DSGE models have been criticized for their low predictive power, but it can be said that for mid- to long-term predictions they are more robust for monetary and fiscal policy use.
That's fine, but I prefer my models to be language-independent.
1. An incredibly powerful tool.
2. No bar of entry (cost aside, true in corporate environment).
3. Very gradual learning curve.
4. The efficiency gain vs time invested is exponential.
Power Excel users, much like their Vim/Emacs counterparts, don't use a mouse. It is just keyboard shortcuts. This makes them insanely productive.
Excel is something managers and executives can understand, so it became the default language for data analysis. Now technologists trapped using it have to create ex post facto justifications for why it's really "just misunderstood."
Excel is massively slow, makes it easy for beginners to make massive mistakes, coerces many types in very odd ways, gets floating-point operations wrong, and leads to spaghetti code that is a rat's nest of incomprehensible cross-references.
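Much of the floating-point complaint traces back to binary floating point itself (IEEE 754 doubles, which Excel also uses underneath, plus its own display-rounding quirks on top). A quick sketch of the underlying behavior, in Python rather than a spreadsheet:

```python
# The classic binary floating-point surprise: 0.1 and 0.2 have no exact
# binary representation, so their sum is not exactly 0.3.
a = 0.1 + 0.2
print(a)          # 0.30000000000000004
print(a == 0.3)   # False

# Working in decimal avoids this class of error entirely.
from decimal import Decimal
b = Decimal("0.1") + Decimal("0.2")
print(b == Decimal("0.3"))  # True
```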
Worst of all, the lack of code path visibility usually leads to a bus factor of 1.
Sure, one can learn to operate Excel for data analysis with a decent level of efficiency, in the same manner one can cross the Pacific in a canoe, but both are still terrible tools for the job.
It would be interesting to see if anyone could get some power Excel users together and construct a next-generation spreadsheet that encouraged better practices and worked to prevent huge messes. Spreadsheets are like SQL, where the initial release was so far ahead of its time that it managed to entrench itself into the very fabric of computing, even though it's long overdue for a reimagining.
Hotkey training built into Excel
Python as an optional language alongside VBA
Proper Data Tables with Types and Indices, or even SQL in Excel.
Regex Search over Columns
PowerPivot use case training
Web publishing of reports made stupid easy
As for SQL, you can use Data Connections from the GUI, or ADO with JET/ACE in VBA, to query Excel sheets, CSVs, etc. as you please.
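Outside VBA, the same idea (plain SQL over flat tabular data) is a few lines in most languages; here is a minimal Python sketch using the stdlib sqlite3 module, with made-up table and values:

```python
import csv, io, sqlite3

# A stand-in for a CSV export from a worksheet.
csv_text = "name,amount\nalice,10\nbob,25\nalice,5\n"

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (name TEXT, amount REAL)")
rows = list(csv.DictReader(io.StringIO(csv_text)))
conn.executemany("INSERT INTO sales VALUES (?, ?)",
                 [(r["name"], float(r["amount"])) for r in rows])

# An ordinary SQL aggregate over what started life as a flat file.
totals = conn.execute(
    "SELECT name, SUM(amount) FROM sales GROUP BY name ORDER BY name"
).fetchall()
print(totals)  # [('alice', 15.0), ('bob', 25.0)]
```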
Regex can be used with the VBScript.RegExp object, but it has a slightly funky (Perl-like) syntax and is not a great implementation.
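For comparison, the "regex search over columns" item from the wishlist above is close to a one-liner in most general-purpose languages; a Python sketch over a made-up column of cell values:

```python
import re

# A column of free-text cell values (made up for illustration).
column = ["INV-2024-001", "credit note", "INV-2023-917", "refund", "INV-2024-102"]

# Keep only cells matching an invoice-number pattern.
pattern = re.compile(r"^INV-\d{4}-\d{3}$")
invoices = [cell for cell in column if pattern.match(cell)]
print(invoices)  # ['INV-2024-001', 'INV-2023-917', 'INV-2024-102']
```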
There was a comment thread around here a week or two ago where someone pointed out it's kind of insane SQL has stuck around so long, and no one could point to any worthy potential replacements.
SQL is based on relational algebra -- so it's the model with the best theoretical justification out there, even if the syntax could be improved.
What's crazy is that the other ad-hoc solutions keep getting suggested. SQL/RDBMSs were invented because we had those (key stores, tree DBs, etc.) and they were crap.
That thought has made me wonder if logic programming has something to offer in the design of new APIs.
More controversially, I question the entire intent of making the core query language something that is putatively declarative, but then in practice often requires extensive engine-specific annotations to tell the engine how to actually do the query. (More on that https://news.ycombinator.com/item?id=3506345#3507281 ). I think RethinkDB's query language was much more imperative, because of the level of development resources they had, and I bet it actually worked out OK. However, even if I could not sell the development world on making SQL++/SQL-replacement non-declarative, we certainly could do a better job this time around of separating query strategy from query contents in some deliberate manner, rather than hacking crap up.
Imagine if, for instance, you could feed the query optimizer a query, get back a query plan that was actually manipulable and executable, tweak that to your tastes, and then send it back to the DB, rather than working via hints and circumlocutions and hopes and dreams.
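Most engines already expose the read-only half of this. SQLite, for instance, returns the plan as ordinary rows via EXPLAIN QUERY PLAN; what's still missing is the other direction, editing that plan and handing it back. A small Python illustration of the inspection side (table and index names are made up):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, x INTEGER)")
conn.execute("CREATE INDEX idx_x ON t (x)")

# The plan comes back as rows we can read programmatically --
# but not modify and resubmit, which is the gap described above.
plan = conn.execute("EXPLAIN QUERY PLAN SELECT id FROM t WHERE x = 42").fetchall()
for row in plan:
    print(row)
```

The exact row text varies by SQLite version, but it will report a search using the `idx_x` index.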
It would also be nice if SQL were more composable. With SQL's textual serialization, it is not practical to combine two queries into a larger query via string manipulation. Many languages have libraries that permit this, but they're always second-class citizens. If I were redesigning SQL I'd want something that handled this more cleanly. I'd seriously consider something RethinkDB-esque in the sense that it didn't have an "english" serialization, but was purely symbolic, leaving it to language authors to figure out how to best represent it in the local language.
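To make the "purely symbolic serialization" idea concrete, here is a toy sketch in Python. Every name in it is hypothetical; the point is only that when a query is plain data, composing two queries is ordinary data manipulation rather than string splicing:

```python
# A toy symbolic query form: plain data, so composition is data manipulation.
# (All names here are hypothetical, for illustration only.)

def select(columns, source, where=None):
    return {"cols": columns, "from": source, "where": where or []}

def and_where(query, condition):
    """Compose: add a filter to an existing query without parsing SQL text."""
    out = dict(query)
    out["where"] = query["where"] + [condition]
    return out

def to_sql(query):
    """Serialize the symbolic form; just one of many possible renderings."""
    sql = f"SELECT {', '.join(query['cols'])} FROM {query['from']}"
    if query["where"]:
        sql += " WHERE " + " AND ".join(query["where"])
    return sql

base = select(["id", "total"], "orders")
refined = and_where(and_where(base, "total > 100"), "region = 'EU'")
print(to_sql(refined))
# SELECT id, total FROM orders WHERE total > 100 AND region = 'EU'
```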
Also, bear in mind that most if not all features I describe in this post exist in databases already. (Not sure about that last one.) What I'm saying is that SQL integrates poorly with all that, not that the features don't exist. Recursive queries and common table expressions also seem ripe for some serious rethinking. Plus I think for a long time SQL really kinda limited the sort of DBs that would be produced because if a feature integrated poorly with SQL, it was a lot less likely to come out. (In particular, structured cells took IMHO forever to come out. Possibly the massive market failure of "object databases" also scared DB developers off from that feature too, though. They aren't the same thing but may be closely enough related.)
The barriers to moving beyond Excel can be overcome, but it will take some serious effort on many fronts. Both Excel and SQL embody genius concepts, but are such poor implementations that it is easy to conflate the cruft with the advantages.
Regarding your "stockholm syndrome" comment above: Someone in his car hears a PSA about "some guy wrong-way driving" on the very road he is on and thinks "one? hundreds!". Unless you can beef up your argumentation you are that guy.
That's fallacious too. I can be right, even if my argument is incorrect or unconvincing.
Warren Buffet and Nate Silver are both driving against traffic and both of them are righter than everyone else combined.
> SQL [...] cannot be compared to Excel
What Excel and SQL have in common is that they're both a first attempt at a solution to (different) problems, and they've been too successful to properly iterate on. That's why everyone uses some proprietary extensions to SQL and everyone extends Excel with VB or C#.
For clarification, is the GP referring to Microsoft SQL Server when they say 'SQL', or do they actually mean SQL?
Microsoft's product naming convention is confusing IMHO.
I can still take my ad hoc SQL query data and run decent analysis and produce graphical summaries in less time than it would take me to setup the boilerplate I'd need in C#.
Arguably something like Matlab or R would be similarly quick for a lot of things - but I'm not even slightly sold that they are safer based on my observations of their use. I've certainly seen plenty of formal code that was less readable than a decent spreadsheet.
I'm not really a fan of Excel's tools and tooling. VBA has made me want to actually smash my computer in the past. But to claim that it isn't incredibly powerful at working with a few megabytes of raw data is flat-out wrong.
I'm just a lowly DBA re-posting and summarizing comments  for karma.
All I see is the same old Martin Shkreli video that has been floated around before, and all you see him do is 'Vim' around as he explains his thoughts -- not on Excel, but on company financials.
Also, if you post a lopsided list of pros, it makes sense to the audience to see someone else post a lopsided list of cons. But then you reply with pettiness. Why?
It makes it slow, but usually for these sorts of things you want your data to be available on the scale of days, not nanoseconds, so it works out. We have a complicated grading sheet here that manages all of the students' information on the same sheet in a Google Drive: grades, attendance, recitation attendance, and, at the end of the class, homework.
It also verifies the test answers against the correct ones to make sure we score exams correctly.
Excel is great for fixed sets of data that need simple map/reductions & input verification because that's all we really use it for. After that move to something like Python&Numpy/R/Julia/Matlab.
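That "simple map/reductions & input verification" workload is small enough to sketch in a few lines; here is a Python version of the exam-scoring step described above, with made-up data:

```python
# Hypothetical answer key and one student's responses.
key     = ["A", "C", "C", "B", "D"]
answers = ["A", "C", "B", "B", "D"]

# Input verification: every response must be a valid choice.
assert all(a in {"A", "B", "C", "D"} for a in answers), "invalid response"

# Map/reduce: mark each question, then total the marks.
marks = [a == k for a, k in zip(answers, key)]
score = sum(marks)
print(f"{score}/{len(key)}")  # 4/5
```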
Would love to work with a replacement, even if it is some sort of Pandas/Python/Matplotlib derivative - but it takes too long to set things up with these tools, and it seems not all operations are as trivial as I want them to be.
Did it have bugs? No-one knew.
And there is the reason Excel (and spreadsheets in general) are dangerous.
Another two: Onboarding Process.
Even if they have a fancy tool, someone is using an Excel spreadsheet to figure out how to subvert it.
- Resolver One ( https://en.wikipedia.org/wiki/Resolver_One )
- Project Dirigible ( https://github.com/pythonanywhere/dirigible-spreadsheet )
However, it's not all sweetness and light. Excel even gets some basic calculations wrong - and those ignorant of its quirks happily propagate those errors. More problematically, you can easily push your modeling beyond the tool's or spreadsheet's design strengths without knowing it. And debugging is a pain in the ass. As a result, lots of erroneous outputs get presented as meaningful.
Does this actually mean anything?
Probably true for smallish values of competency, but it must be logarithmic after that.
OK, does this mean anything? How have you quantified efficiency? How have you quantified "learning curve"? What data do you have supporting that the relationship is exponential?
Nobody here has been able to elaborate on the initial statement "The efficiency gain vs learning curve is exponential". People are just rewording the sentence slightly and passing that off as an explanation. That seems to indicate that nobody knows what the statement means because the statement is vacuous.
This is true! Then you hit a pretty hard wall with the limitations of the tool.
Somehow, sloppily, "steep" has come to mean difficult to learn, rather than quick to learn.
In the original version:
A steep learning curve means quick learning at the beginning. A shallow curve means that it takes a long time to build up skill.
If all the data you're receiving also comes to you in an Excel format (CSV, XLS, XLSX), but with major differences in formatting, or wholly inconsistent formatting, then you have a multi-month project just to have a consistent import script. Replacing a 1-second task done 2-3 times a day with a 4-month project has an ROI on the scale of decades. Not worth it.
Then you add visualization. What is 3-4 keystrokes in Excel is a lot of back and forth: learning a new library, ensuring it works on your system, vetting the visualization, dealing with that weird bug on the triple-line double-axis line chart.
Then you have to validate integer handling and mathematics to ensure your newly written Python, Julia, etc. behaves the same as your well-vetted Excel spreadsheet.
Replacing that one slow, bloated spreadsheet is now nearly a year-long project which requires a new employee who will have comparable pay to the person who ALREADY operates Excel.
And now you have a scalable system. You can go from something one employee takes all day to look at 2x/day, to something anyone in the company can see in real time on a dashboard of some sort.
Is that worth it? Depends
Of course that raises the question: would any other software environment have a lower error rate?
Quick and very, very dirty.
You have a Turing-complete spreadsheet.
Excel is useful in one particular case only: when you don't want to build a GUI. It's great as a not-very-pretty interface for functionality written in DLLs.
For any process that's well thought out, you can write a Python script if it's not time critical. And it probably isn't if you were doing it in Excel.
The main problem with Excel is that it's too easy to write an ad-hoc fix. Sounds like a weird reason, but in finance they just pile up and up and up. Finance Excel users also tend to know just enough coding to dig a huge hole, and just little enough not to understand this. Soon you have an unauditable mess, and the business is almost never going to spend time paying down technical debt.
There's also the philosophical issue of ever more complex models. If you have some sane coding practices, you will tend to favour more elegant code. Balls of spaghetti are more obvious in something like Python. More elegant code is connected to more elegant models. Inelegant models, such as the ones often bragged about by M&A guys (let's be honest, they're sales tools, not predictions) when written into an ordinary language, will look like the balls of spaghetti that they are.
Ended up putting it all in a database and developing an Excel add-in to pull it from the database as array formulae. Used a great library called Excel-DNA to develop the add-in using C#, if anyone is interested.
So you could build a sheet that pulls in portfolio holdings for yesterday where yesterday updates each day and then compute performance and risk stats referencing the data cells in the sheet and it would all update.
In that context it was just an easy way to build reports pulling data from a database but same applies to quickly doing one-off analysis in Excel pulling dynamic data from the database - guys in finance tend to not be programmers but they're really good at Excel.
The add-in approach was really useful too, because you could create a function that returns the holdings of a portfolio to an array of cells (an array formula) and have a drop-down box with all portfolios feeding the input of the formula, so that when you change the combo box, it changes the portfolio data and then everything recalculates off the back of that :)
He worked for them until he got out on his own. All his backtesting software is written by him and is in C (nice GUI, graphing features, etc.). He uses it to find his edge.
His trading platform is Excel... Obviously he doesn't do HFT; his trades are measured in days.
I know - 1 data point, but if a software engineer who is better than me in both trading & coding is using Excel, I'm not going to knock it.
But pretty soon you are mired in spreadsheet hell. Nothing can be seen or understood, everything is invalid or valid - who knows and worst of all when something stops working you don't know why.
And you don't know when it will stop. Goodbye agility!
Any spreadsheet with more than 2 days of work to reproduce it should be counted as IT and put on a formal risk register until it is recoded and removed. But dream on..
Bonus points for the facilitation of any type of documentation, automated testing, or version control.
There can be no god.
It also feels like a smaller language in many ways. You don't have to keep as much in your head, which makes it more effective to keep up with.
The killer feature for me, automating a lot of tasks, is the awesome integration with the shell environment. Running and combining Unix commands is very elegant, and getting hold of the output is done really smoothly.
Secondly, what I think saves a lot of time is that it really has ALL BATTERIES included. With Python that feels more theoretical. With Julia you just start writing the name of a function you think should exist in the REPL and it pops up. With Python you have to start guessing which package it might be located in. And that goes for stuff you use all the time, like string manipulation, regular expressions, file reading and writing, process input and output, etc.
With Python functions often end up having unnatural names because you can't have naming conflicts. Julia handles functions with the same name but different argument types elegantly.
Also there is no schizophrenia wondering if something is a function or method on an object. Everything is a function so that is easy.
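The contrast is concrete: Julia dispatches on the types of all arguments, while the closest Python stdlib analogue, functools.singledispatch, chooses an implementation by the type of the first argument only. A small sketch of the Python side:

```python
from functools import singledispatch

# Python's stdlib analogue of dispatch: it picks an implementation
# by the type of the FIRST argument only (Julia considers all of them).
@singledispatch
def describe(x):
    return "something else"

@describe.register
def _(x: int):
    return "an integer"

@describe.register
def _(x: str):
    return "a string"

print(describe(3))     # an integer
print(describe("hi"))  # a string
print(describe(3.5))   # something else
```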
* Julia Con 2015 India - https://www.youtube.com/playlist?list=PL279M8GbNseuhXmZk9rWM...
Overview - http://julialang.org/blog/2016/09/juliacon2016
"Julia: to Lisp or not to Lisp?" from the European Lisp Symposium in May 2016. I found it was rather illuminating on Julia in particular and lisps and programming languages in general.
And nearly always your bottleneck is not the language itself..
Julia is just a small step forward. It still has performance problems where the JIT can't help much - like the DataFrame structure, which is effectively a black box and therefore hard to optimize.
The main reason I like the idea of Julia is that I don't see the problem of parallelism being addressed in python, at least not without cumbersome libraries (disclaimer: I haven't tried dask yet). If python had an equivalent to
#pragma omp parallel for
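Stock Python has no direct equivalent of that pragma. The closest stdlib route is an executor pool, and for CPU-bound pure-Python loops the GIL keeps the threads from actually running in parallel, which is much of the complaint; a small sketch:

```python
from concurrent.futures import ThreadPoolExecutor

def work(i):
    # Stand-in for one loop iteration's computation.
    return i * i

# Rough analogue of a parallel-for: map the iterations over a pool.
# For CPU-bound pure-Python code the GIL serializes these threads;
# real parallelism needs multiprocessing or a C extension.
with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(work, range(8)))

print(results)  # [0, 1, 4, 9, 16, 25, 36, 49]
```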
Sorry if I implied otherwise - what I meant is that the current DF structure is hard for the JIT to optimize, not that this is a fundamental limitation of Julia. Just that JIT's are limited. I shouldn't have said Julia is only a small step forward, I think. A JIT gets you a long way.
> There is a lot of work being done to develop a new API that has none of these problems and provides very high performance, which demonstrates that Julia's JIT is up to the task so long as you choose an appropriate API.
Can you link to information on this? I'd love to see information on the design around this.
This blog post is a good starting point on the current shortcomings and design solutions of dataframes in Julia: http://julialang.org/blog/2016/10/StructuredQueries
Here's to hoping it'll be put to good usage!
Everyone else has more reasonable performance constraints.
I can't find my source right now, but I remember seeing a credible estimate that the HFT industry in the US takes in $10 billion in revenue. That's a drop in the bucket of an industry that's measured in trillions just in the US.
Intuitively this makes a lot of sense because HFT strategies make a tiny amount of money per trade and have to do a ton of trades across a ton of different securities—which means the capacity of HFT strategies is relatively small.
Maybe not in terms of earnings, but in volume. Estimates range from 30-70%, depending on the market.
I'm digging up references now for an edit...
Here's a thought experiment: how much would things change if HFT firms traded half as much, with double the margin on each trade? Some securities would be a bit less liquid but otherwise almost nothing would change. It certainly wouldn't be a 2x difference from the status quo!
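The arithmetic behind the thought experiment is just that revenue is trades times margin, so halving one while doubling the other is a wash; with made-up numbers:

```python
# Made-up numbers: revenue = number of trades x margin per trade (in cents).
trades, margin_cents = 1_000_000, 20                          # status quo
alt_trades, alt_margin_cents = trades // 2, margin_cents * 2  # half the trades, double the margin

print(trades * margin_cents)          # 20000000
print(alt_trades * alt_margin_cents)  # 20000000 -- same revenue either way
```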
Furthermore - HFT is perhaps the worst case for Julia, since the implementation will always be in low level languages. Julia's speedup is only useful for research in HFT.
I'd expect most of the Julia usage in this space would be to replace any R or Matlab in the back testing engines if at all.
I didn't get the impression from the article that they were really talking about low latency anyway.
That's not to suggest that Julia doesn't have a place in Finance. I think it's a great language and there are a bunch of places I can see it being very valuable. Just doubt that low latency fast path is one of them in the near term.
Learning your second language is much easier than learning your first. You'll be more familiar with programming concepts and terms that make it easier to search for how to do X in that language.
Julia is still relatively new and not quite ready (in my opinion) for new programmers. Things change relatively quickly and I think that could add a lot of confusion that gets in the way of learning.
That being said, I think Julia is a pretty great language. I really love the way it has grown and I'm excited to see how it matures.
Data Smart: Using Data Science to Transform Information into Insight
All the work is done in Excel. I enjoyed the book quite a bit. Author is Chief Data Scientist for MailChimp.
There's a (very) quick intro here:
You can also look into quantopian for practice, but I don't recommend actually trading money.
Then learn Python; it is mature and has way more study material.
Not to mention Machine Learning libraries like TensorFlow, Scikit Learn & math libraries like Numpy, Pandas, Matplotlib?
Everything there is just set.
I once tried to learn Julia, but the community is just too small to get help at times. It is a work in progress, but I recommend you keep an eye on it.
Think of Julia as a future successor to Python, but not a current replacement.
Having to stick python scripts in front of all my Julia scripts everywhere that it would perform data access doesn't really seem viable. Thus, my beef with Julia.
Most string functions operate on AbstractString types. You can create whatever subtype you like implementing alternative string encodings.
In this case, it appears Julia 0.5 has a transcode method supporting UTF8/16/32. 
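For a sense of how mundane the transcoding step itself is, here is the same round-trip expressed in Python terms (Julia's transcode covers similar ground with different machinery):

```python
# Transcoding UTF-8 <-> UTF-16 is just an encode/decode round-trip.
text = "héllo"
utf8 = text.encode("utf-8")
utf16 = text.encode("utf-16-le")

print(utf8)  # b'h\xc3\xa9llo'
print(utf16.decode("utf-16-le") == utf8.decode("utf-8"))  # True
```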
If not, then it also raises the question: if it's just that simple, why is it not already there in the core alongside UTF-8?
Anyone new to the language working through data problems is going to pick it up and start at the data collection/wrangling stage, and immediately run into the 'you can implement that yourself/run it through a converter' mentality, and likely go back to whatever they were using before. I know we did.
"We've been spending a lot of time working with firms to replace K code."
Maybe the small percentage that uses K/Q/KDB+ are moving to something more readable?
https://www.youtube.com/watch?v=QRWZBWwBVR0 (I can't listen to the audio now so I'm not sure how much I misparaphrased.)
And, you're right, I shouldn't have conflated KDB+ with K.
> if you learn the language, you'll find it perfectly readable, elegant even.
Just comparing K to something like Python.. Though, personally, it only became (somewhat) readable after playing with APL and J for a year.
When competing for talent and desperate for maintainability, companies are unlikely to introduce tools that are not the industry standard.
Writing good code in R is an effort, it's possible but the language doesn't make it easy. Julia makes me write good code off the bat and facilitates writing excellent, elegant code.
Letting R be the first language I've gotten pretty good at is really spoiling me.
Anyone have experience with it?
I've heard plenty of buzz about it in the HPC community but haven't had a moment to play with it yet.
That said, it's less important than you make it out to be.
But Python is the modern way to go, so every shop stresses that they use it and they expect every new hire to master it.
Could anyone give a performance test of it?
The Julia docs mention that Fortran arrays can be indexed with various starting numbers. I've programmed and maintained millions of lines of Fortran over 20 years, and I've never seen anybody use anything other than 1 as the starting index.
Anyway, I'm not defending my decision as a globally correct one- it only matter to me.
However, in order to do so, they have to give up the generality of either language, and focus on a particular domain (numerical computation and web servers respectively). And because of this, the language never develops a big enough ecosystem. Because of this, I think Python combined with native C libraries is ultimately the better way forward and will be more successful.
If they are trying to express ideas - wouldn't any language do just fine?
They can 'compile' in an 'optimized manner' using something else ...
But in terms of algorithmic expression?
Unless they are writing true 'real time' stuff right for the chip ... why a new syntax?
(3) What is "new" syntax? All languages have variations in syntax. Wherever Julia differs in syntax from whatever you know, it isn't just for the hell of it. There are usually good reasons. Mostly Julia tries to look similar to programming languages familiar to its target group. Where it differs, I'd say it is usually to improve on deficiencies in said languages.
I am an iOS developer but I write smaller scripts to do custom processing of source code, assets etc. I wrote my scripts in Julia first, but decided to convert to Python afterwards to make it easier to distribute to other people, given that Julia does not come pre-installed.
I have to say, Julia is WAY WAY nicer to program in than Python for shell-script-like stuff. I kept hitting so many issues when converting to Python that I kept a long list of them. My takeaway from this experience is that a LOT has happened since Python, Ruby, etc. were conceived. Julia has been able to tap into all the advances that have happened since then. It has a much smoother REPL environment and a much more sensible set of built-in functions. More sane naming conventions. And the libraries are built to combine effectively in a much better way. You simply need to write much less code when writing Julia because things combine so well. While in Python I find myself having to write a lot of glue code to make pieces fit together.
So the question is, why stick to a language just because it is well established. I had many more years of experience with Python. But I get the job done much faster with Julia. So why waste extra time on Python just because it is established? That has no inherent value.
You answered the question yourself:
"but decided to convert to Python afterwards to make it easier to distribute to other people"
Languages are platforms. They have vast distribution, incumbency, developers, apis etc. - and that has a lot of value.
Imagine if you did dev on Julia - how do you hire devs?
Julia has a very, very small mind-share - in order for it to reach 'platform status' it really has to cross a threshold, and for that - it has to provide significantly more value.
I'm also curious to know 'how much faster' it is - couldn't those performance problems be solved with more horsepower?
Developers have a terrible habit of over-optimizing for performance, which usually comes at the cost of complexity.
Anyhow - I know little about Julia so I'll hold my tongue, maybe it's the bees-knees ...
It'd be great to see some data points.
Julia doesn't take long to learn if you've ever written any Python, or Matlab, or R. It's very good at calling into existing code written in other languages (the above, and C) so it's possible to transition gradually.
> I'm also curious to know 'how much faster' it is - couldn't those performance problems be solved with more horsepower?
Depends on the exact type of computation. If everything you're doing is straight BLAS/LAPACK or other library calls, then Python is just glue code and not taking much of the time. If you're implementing your own custom algorithms (or writing libraries) that are difficult to express in a vectorized way (where numpy just does the work in C) or you want more customized data structures and types than dense arrays of single or double precision floating point numbers, Julia will do well. There are various projects out there that try to make Python faster (PyPy, Numba, Pyston, Pyjion, the list seems to grow every few months) but it's a very difficult problem to do in full generality since you either have to restrict yourself to a subset of the language where the semantics permit optimizations, or lose compatibility with all the C-extension libraries you're used to. Julia has designed the language semantics and the conventions of the standard library to be more amenable to optimization. It's early days yet in terms of multithreading support, but Julia has a better story there as well.
For a representative performance comparison case study, see section 4 of https://arxiv.org/pdf/1312.1431v1.pdf. There have been similar experiences reported and published in other application areas since.
That's actually the issue Julia attempts to solve, as highlighted in the article. There are many languages that you can use to express ideas, but when you need the idea to run as fast as possible, you have to rewrite it in C/Fortran/etc. and then add bindings to the language.
Julia lets you call C inline (via ccall, with no glue code), which I believe means that you can still express algorithms simply, but with some parts (trivially) optimized. (Not a huge Julia user, but that's my take on it.)
For instance, the standard library of Julia  is written in Julia itself (and is very performant) and only calls into external C or Fortran libraries where there are well established code-bases (e.g. BLAS, FFTW). Compare this to, e.g., Python or R where much of the standard library is written in C.
And if you have to be able to write 'some parts' in C for speed ... then why not just use C? And have nice libs for whatever you are doing?
Or intelligently distribute it across processes?
I find it hard to believe that the 'financial industry' has performance requirements that have never been encountered in the entire rest of high tech before ...
Well compilers are tied to the language they need to compile so language choice does still matters. As an extreme example, GHC can be much more aggressive with inlining because it's pure, whereas most other languages don't provide the same guarantees.
> And if you have to be able to write 'some parts' in C for speed ... then why not just use C? And have nice libs for whatever you are doing?
Yes I think they could do that, but then they'd need people to write and maintain those libraries. My understanding is they want to find something which strikes a better balance between productivity and speed. Having to write your own libraries to do everything is likely considered a productivity loss.
I think this is what Blaze and PyPy try to do with Python: they compile it so that certain things are optimized. The harder part is that you generally have to add hints for the compiler to truly be at its best...
WRT just why not use C: Python/R/Julia are MUCH easier to prototype in than C (in my opinion, having used those languages). The "grail" is to be able to prototype quickly, and then identify the hotspots to speed up (which, in Julia's case, would be just inlining C code).
You can't get something for nothing, but it's a really good start.
Now it is open-sourced on GitHub: