A Look at How Traders and Economists Are Using the Julia Programming Language (waterstechnology.com)
399 points by ViralBShah on Nov 8, 2016 | 196 comments



It's always fun to point out that the NY Federal Reserve switched its full-scale model of the economy to Julia last year [1]. It's also open source, so data scientists interested in macroeconomics can feel free to play around with it.

Moreover, QuantEcon, an academic entity dedicated to scientific programming in the social sciences, has a few notebooks in Python and Julia [2].

[1] https://github.com/FRBNY-DSGE/DSGE.jl

[2] http://quantecon.org/notebooks.html


Interesting. What quantities do economic DSGE models measure, for example? And how accurate are these models?

Is there some way to play with these models? Or do you need lots of input data, which isn't freely available?


DSGE models aim to be explanatory models of an economy. They are not the best at prediction (Bayesian VARs forecast better; there is a tutorial on them in my QuantEcon link), but they turn out to forecast decently well, too.

The reason explanatory models are preferred anyway is that purely statistical/predictive models of macro are subject to the Lucas critique [1]. Roughly, it says that if you observe an economic relationship but have no explanatory reason for why it holds, it's a bad idea to use it for policy prescription.

The NY Fed model has a white paper [2] that should be accessible to the technical data scientist (technical, but not prohibitively so).

Their README points to a few posts on using it, and I think the open-source code comes with a CSV of example input data. Publicly accessible macroeconomic data is available in a few places for you to play with, such as FRED or the World Bank. I think Julia has a Stata-style API package for FRED data, which makes the data processing easier.

Have fun!

[1] https://en.m.wikipedia.org/wiki/Lucas_critique

[2] https://www.google.ca/url?sa=t&source=web&rct=j&url=https://...


The foundations of DSGE models are nothing like regular statistical or machine learning models. They are general equilibrium models, a type of mathematical model that falls under the microeconomics area of economics. DSGE models try to predict how an economic system will evolve over time given agents' preferences, technologies and institutions: how much will be produced, consumed and traded, what prices will be, and how all of these behave over time, taking into account stochastic shocks like oil price fluctuations. They are theoretically beautiful because they are built from microfoundations. Economists don't like to make predictions and infer policy impacts from aggregates alone because of something called the Lucas critique ( https://en.wikipedia.org/wiki/Lucas_critique ), which basically means that taking an inflation series and fitting an ARIMA(p,d,q) model to predict next week's CPI is theoretically invalid.
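For contrast, here is roughly what the "purely statistical" approach criticized above looks like. This is a minimal sketch, assuming an AR(1) model (an ARIMA(1,0,0)) fitted by ordinary least squares to a toy series and then used for a one-step-ahead forecast; the function names and data are illustrative. Nothing in the fitted coefficient explains *why* the relationship holds, which is exactly what the Lucas critique objects to when such a model is used to evaluate policy.

```python
def fit_ar1(series):
    """Fit y[t] = c + phi * y[t-1] + e[t] by ordinary least squares."""
    x = series[:-1]  # lagged values y[t-1]
    y = series[1:]   # current values y[t]
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    cov_xy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    var_x = sum((xi - x_mean) ** 2 for xi in x)
    phi = cov_xy / var_x
    c = y_mean - phi * x_mean
    return c, phi


def forecast_next(series):
    """One-step-ahead forecast from the fitted AR(1)."""
    c, phi = fit_ar1(series)
    return c + phi * series[-1]


# Toy series generated exactly by y[t] = 1 + 0.5 * y[t-1], starting at 0.
series = [0.0]
for _ in range(10):
    series.append(1.0 + 0.5 * series[-1])

c, phi = fit_ar1(series)
print(round(c, 6), round(phi, 6))  # recovers c = 1, phi = 0.5
print(forecast_next(series))
```

If a policy change altered how agents behave, the true `phi` would shift, and a model fitted on the old regime would keep forecasting with the stale value.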

The estimation, however, is usually based on Bayesian hierarchical models that take into account the constraints imposed by the theoretical model, I believe.

Here is a good source for that: https://www.newyorkfed.org/medialibrary/media/research/staff...
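As one illustration of what Bayesian estimation involves here: random-walk Metropolis-Hastings is a sampler commonly used for DSGE posteriors. Below is a minimal sketch applied to a deliberately tiny stand-in problem, the mean of normally distributed data with known variance and a normal prior. A real DSGE estimation would replace `log_posterior` with a likelihood computed from the solved model (typically via a Kalman filter); the function names, prior, data and step size are assumptions for illustration only.

```python
import math
import random


def log_posterior(mu, data, sigma=1.0, prior_mean=0.0, prior_sd=10.0):
    """Unnormalized log posterior for the mean of N(mu, sigma^2) data under a N(prior_mean, prior_sd^2) prior."""
    log_prior = -0.5 * ((mu - prior_mean) / prior_sd) ** 2
    log_lik = sum(-0.5 * ((x - mu) / sigma) ** 2 for x in data)
    return log_prior + log_lik


def rw_metropolis(data, n_draws=20000, step=0.5, burn_in=2000, seed=0):
    """Random-walk Metropolis-Hastings: propose mu + N(0, step^2), accept with probability min(1, ratio)."""
    rng = random.Random(seed)
    mu = 0.0
    current = log_posterior(mu, data)
    draws = []
    for i in range(n_draws):
        proposal = mu + rng.gauss(0.0, step)
        candidate = log_posterior(proposal, data)
        # Work with the log ratio; min(0, .) keeps exp() from overflowing.
        if rng.random() < math.exp(min(0.0, candidate - current)):
            mu, current = proposal, candidate
        if i >= burn_in:
            draws.append(mu)
    return draws


data = [1.0, 2.0, 3.0, 2.5, 1.5]
draws = rw_metropolis(data)
# Conjugate closed form for comparison: precision-weighted average of prior mean and data.
analytic_mean = (0.0 / 10.0**2 + sum(data) / 1.0**2) / (1 / 10.0**2 + len(data) / 1.0**2)
print(sum(draws) / len(draws), analytic_mean)
```

The toy problem has a closed-form answer precisely so the sampler can be checked against it; DSGE posteriors have no such shortcut, which is why sampling is needed at all.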

DSGE models have been criticized for their low predictive power, but it can be argued that for mid- to long-term predictions they are more robust for monetary and fiscal policy use.


Wasn't aware DSGEs were often formulated in a Bayesian framework. Learn something every day.


Models are just data structures and algorithms; presumably, you mean there is an implementation of those, and that the Fed uses that implementation as the 'official' one.

That's fine, but I prefer my models to be language-independent.


How do you have language-independent data structures and algorithms that you can run on a computer?


Sorry, I should have been clearer: I want them to publish the model in a file format that can be consumed across a wide range of languages, for example YAML, Protobuf, Thrift, etc.


That covers the data structures. It doesn't help with the algorithms.


Algebra serves pretty well for that purpose.
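One way to combine the two positions above (neutral file format for the data, algebra for the algorithm) is to store the equations themselves as data. This is a minimal sketch, assuming a model simple enough to be written as linear equations; it uses JSON from Python's standard library, though YAML or Protobuf would work the same way. The schema (`equations`, `intercept`, `coefficients`) and the numbers are invented for this example; it is not a format the NY Fed publishes.

```python
import json

# Hypothetical language-neutral model file: each equation is intercept + sum(coef * variable).
MODEL_JSON = """
{
  "name": "toy_linear_model",
  "equations": {
    "consumption": {"intercept": 10.0, "coefficients": {"income": 0.8}},
    "investment":  {"intercept": 5.0,  "coefficients": {"income": 0.1, "rate": -2.0}}
  }
}
"""


def evaluate(model, inputs):
    """Evaluate every linear equation in the model against a dict of input values."""
    results = {}
    for name, eq in model["equations"].items():
        value = eq["intercept"]
        for var, coef in eq["coefficients"].items():
            value += coef * inputs[var]
        results[name] = value
    return results


model = json.loads(MODEL_JSON)
print(evaluate(model, {"income": 100.0, "rate": 3.0}))
```

Any language with a JSON parser can reimplement `evaluate` in a few lines, which is the point; a real DSGE model would need a much richer schema (nonlinear equations, shocks, solution method).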


Most financial firms need their quants to worry less about the new programming language hotness, and more about moving entire systems off unbelievably complicated Excel spreadsheets.


The reason for this is simple. Excel is

1. An incredibly powerful tool.

2. No barrier to entry (cost aside, which is a non-issue in a corporate environment).

3. Very gradual learning curve.

4. The efficiency gain vs time invested is exponential.

Power Excel users, much like their Vim/Emacs counterparts, don't use a mouse. It's all keyboard shortcuts [1][2]. This makes them insanely productive.

[1] https://youtu.be/jFSf5YhYQbw

[2] https://youtu.be/0nbkaYsR94c


This is digital Stockhold syndrome.

Excel is something managers and executives can understand, so it became the default language for data analysis. Now technologists trapped using it have to create ex post facto justifications for why it's really "just misunderstood."

Excel is massively slow, makes it easy for beginners to make massive mistakes, handles lots of types in very odd ways, gets floating-point operations wrong, and leads to spaghetti code that is a rat's nest of incomprehensible cross-references.

Worst of all, the lack of code path visibility usually leads to a bus factor of 1.

Sure, one can learn to operate Excel for data analysis with a decent level of efficiency, in the same manner one can cross the Pacific in a canoe, but both are still terrible tools for the job.


Everybody's right. Excel is a powerful, flexible tool that also has almost no guard rails and all but begs people to make profound mistakes and huge messes. There are too many people who sneer at spreadsheets when they should be using them, and there are too many people who use them when they shouldn't.

No contradictions.

It would be interesting to see if anyone could get some power Excel users together and construct a next-generation spreadsheet that encouraged better practices and worked to prevent huge messes. Spreadsheets are like SQL, where the initial release was so far ahead of its time that it managed to entrench itself into the very fabric of computing, even though it's long overdue for a reimagining.


I'm an Excel power user. I think Microsoft is moving in the right direction, with the addition of Tables, PowerBI, PowerPivot and R in SQL Server. What I'd like to see in Excel is:

- Hotkey training built into Excel

- Python as an optional language alongside VBA

- Proper data tables with types and indices, or even SQL in Excel

- Regex search over columns

- PowerPivot use-case training

- Web publishing of reports made stupid easy


Both Python and R can be used in Excel via several addons.

Python:

http://www.python-excel.org/

https://www.xlwings.org/

https://datanitro.com/

R:

http://rcom.univie.ac.at/download.html

https://bert-toolkit.com/

However, Microsoft has officially chosen to move ahead with JavaScript for add-ins and VBA-like automation:

https://dev.office.com/docs/add-ins/develop/understanding-th...

http://rockthecode.io/blog/javascript-and-excel/

As for SQL, you can use Data Connections from the GUI, or ADO with JET/ACE in VBA, to query Excel sheets, CSVs, etc. as you please.
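Outside Excel, the same "SQL over a flat file" idea can be sketched with Python's standard library. This is an analogue of the ADO/JET approach, not that approach itself: a minimal example, assuming a small CSV of trades (column names and data made up) loaded into an in-memory SQLite table and queried with ordinary SQL.

```python
import csv
import io
import sqlite3

# Illustrative CSV content; in practice this would come from open("trades.csv", newline="").
CSV_TEXT = """desk,instrument,notional
rates,UST10Y,1000000
rates,UST2Y,500000
credit,CDX.IG,250000
"""

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE trades (desk TEXT, instrument TEXT, notional REAL)")

reader = csv.DictReader(io.StringIO(CSV_TEXT))
conn.executemany(
    "INSERT INTO trades VALUES (:desk, :instrument, :notional)",
    reader,  # each row is a dict keyed by the CSV header; REAL affinity converts the text numbers
)

rows = conn.execute(
    "SELECT desk, SUM(notional) FROM trades GROUP BY desk ORDER BY desk"
).fetchall()
print(rows)
```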

Regex can be used via the VBScript.RegExp object, but it has a slightly funky (Perl-like) syntax and isn't a great implementation.


Could you say a bit more about what you'd like to see in a "reimagined" SQL? Are there any serious efforts to replace it?

There was a comment thread around here a week or two ago where someone pointed out it's kind of insane SQL has stuck around so long, and no one could point to any worthy potential replacements.


>There was a comment thread around here a week or two ago where someone pointed out it's kind of insane SQL has stuck around so long, and no one could point to any worthy potential replacements

SQL is based on relational algebra, so it's the model with the best theoretical justification out there, even if the syntax could be improved.
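For readers who haven't seen it, the core relational-algebra operators SQL builds on can be sketched in a few lines: selection (SQL's WHERE), projection (the SELECT column list) and natural join. This is a minimal sketch, assuming relations are represented as Python lists of dicts; a real engine works with sets of tuples and a query planner, and the helper names are illustrative.

```python
def select(relation, predicate):
    """Selection (sigma): keep tuples satisfying the predicate, like SQL WHERE."""
    return [row for row in relation if predicate(row)]


def project(relation, attributes):
    """Projection (pi): keep only the named attributes, removing duplicate tuples."""
    seen = set()
    result = []
    for row in relation:
        key = tuple(row[a] for a in attributes)
        if key not in seen:
            seen.add(key)
            result.append(dict(zip(attributes, key)))
    return result


def natural_join(left, right):
    """Natural join: combine tuples that agree on all shared attribute names."""
    result = []
    for l in left:
        for r in right:
            shared = set(l) & set(r)
            if all(l[a] == r[a] for a in shared):
                result.append({**l, **r})
    return result


employees = [
    {"name": "Ann", "dept": "rates"},
    {"name": "Bob", "dept": "credit"},
    {"name": "Cy", "dept": "rates"},
]
depts = [{"dept": "rates", "floor": 3}, {"dept": "credit", "floor": 5}]

# Roughly: SELECT DISTINCT name FROM employees NATURAL JOIN depts WHERE floor = 3
print(project(select(natural_join(employees, depts), lambda r: r["floor"] == 3), ["name"]))
```

Because each operator takes relations and returns a relation, they compose freely, which is the property that gives SQL its theoretical footing.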

What's crazy is that the other ad hoc solutions keep getting suggested. SQL/RDBMSs were invented because we had those (key stores, tree DBs, etc.) and they were crap.


Visual query tools like Tableau don't seem to be going away. I'd love to see an effective open-source alternative to Tableau that doesn't require scripting your own D3 website.


Butler Lampson makes the point (in a recent set of slides) that relations are a good base for DSLs: they have enough complexity to model graphs, functions, sets etc.

That thought has made me wonder if logic programming has something to offer in the design of new APIs.


There's Tutorial D but it probably doesn't qualify as "serious".

https://en.wikipedia.org/wiki/D_(data_language_specification...


SQL in fact deviates from true relational theory, in which the "cells" of a table could themselves have additional structure rather than just being "a string" or "a number". Cells could also be truly absent. SQL's NULL, while something you can make your peace with, could use some tweaking with 21st century experience. SQL's syntax has acquired a lot of cruft over the years to deal with new features... in fact in that sense it reminds me of the evolution of OpenGL and the way it acquired extension after extension until finally it needed to be broken apart into Vulkan and CUDA pieces (to brutally summarize the situation to the point of inaccuracy; please try to see what I mean rather than pick nits with that).

More controversially, I question the entire intent of making the core query language something that is putatively declarative, but then in practice often requires extensive engine-specific annotations to tell the engine how to actually do the query. (More on that https://news.ycombinator.com/item?id=3506345#3507281 ). I think RethinkDB's query language was much more imperative, because of the level of development resources they had, and I bet it actually worked out OK. However, even if I could not sell the development world on making SQL++/SQL-replacement non-declarative, we certainly could do a better job this time around of separating query strategy from query contents in some deliberate manner, rather than hacking crap up.

Imagine if, for instance, you could feed the query optimizer a query, get back a query plan that was actually manipulable and executable, tweak that to your tastes, and then send it back to the DB, rather than working via hints and circumlocutions and hopes and dreams.
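Most engines already expose the plan read-only, which shows exactly what is missing. A minimal sketch using SQLite from Python: `EXPLAIN QUERY PLAN` returns the optimizer's chosen strategy as rows, but there is no standard way to edit that plan and send it back; you can only influence it indirectly, for example by adding an index as below. The table and index names are made up, and the exact wording of the plan text varies between SQLite versions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE trades (id INTEGER PRIMARY KEY, desk TEXT, notional REAL)")


def plan(sql):
    """Return the human-readable detail column of SQLite's query plan."""
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]


query = "SELECT * FROM trades WHERE desk = 'rates'"
before = plan(query)  # typically a full table SCAN
conn.execute("CREATE INDEX idx_trades_desk ON trades (desk)")
after = plan(query)   # typically a SEARCH using idx_trades_desk
print(before)
print(after)
```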

It would also be nice if SQL were more composable. With the serialized (text) form of SQL, it's not practical to use string manipulation to combine two queries into a larger query. Many languages have libraries that permit this, but they're always second-class citizens. If I were redesigning SQL I'd want something that handled this more cleanly. I'd seriously consider something RethinkDB-esque in the sense that it didn't have an "English" serialization, but was purely symbolic, leaving it to language authors to figure out how best to represent it in the local language.
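The composability point can be illustrated by treating queries as values rather than strings to be spliced. This is a minimal sketch, assuming a hypothetical `Query` wrapper (not a real library) that nests each query as a subquery of the next and carries its bound parameters along, so two pieces combine without editing each other's SQL text. Libraries such as SQLAlchemy offer a much fuller version of the same idea.

```python
import sqlite3
from dataclasses import dataclass


@dataclass(frozen=True)
class Query:
    """Hypothetical composable query: SQL text plus its bound parameters."""
    sql: str
    params: tuple = ()

    def where(self, condition, *params):
        """Wrap this query as a subquery and filter it."""
        return Query(f"SELECT * FROM ({self.sql}) WHERE {condition}", self.params + params)

    def select(self, columns):
        """Wrap this query as a subquery and project columns."""
        return Query(f"SELECT {columns} FROM ({self.sql})", self.params)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE trades (desk TEXT, notional REAL)")
conn.executemany("INSERT INTO trades VALUES (?, ?)",
                 [("rates", 100.0), ("rates", 50.0), ("credit", 75.0)])

base = Query("SELECT desk, notional FROM trades")
large = base.where("notional > ?", 60.0)        # one reusable piece
rates_large = large.where("desk = ?", "rates")  # composed with another
query = rates_large.select("SUM(notional) AS total")
print(query.sql)
print(conn.execute(query.sql, query.params).fetchall())
```

Parameters stay in the same order as their placeholders because each wrapper appends its condition after the inner query's text.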

Also, bear in mind that most if not all features I describe in this post exist in databases already. (Not sure about that last one.) What I'm saying is that SQL integrates poorly with all that, not that the features don't exist. Recursive queries and common table expressions also seem ripe for some serious rethinking. Plus I think for a long time SQL really kinda limited the sort of DBs that would be produced because if a feature integrated poorly with SQL, it was a lot less likely to come out. (In particular, structured cells took IMHO forever to come out. Possibly the massive market failure of "object databases" also scared DB developers off from that feature too, though. They aren't the same thing but may be closely enough related.)
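For reference, this is what a recursive common table expression looks like today, sketched in SQLite via Python by walking a hypothetical reporting hierarchy (table and column names made up). It works, but the recursion is expressed through a `UNION ALL` between an anchor query and a self-referencing query, which is the kind of syntax that seems ripe for rethinking.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE staff (name TEXT PRIMARY KEY, manager TEXT)")
conn.executemany(
    "INSERT INTO staff VALUES (?, ?)",
    [("ceo", None), ("cio", "ceo"), ("quant_lead", "cio"), ("quant", "quant_lead")],
)

# Everyone below "cio", with their depth in the hierarchy.
rows = conn.execute(
    """
    WITH RECURSIVE reports(name, depth) AS (
        SELECT name, 1 FROM staff WHERE manager = ?      -- anchor: direct reports
        UNION ALL
        SELECT s.name, r.depth + 1                        -- recursive step
        FROM staff AS s JOIN reports AS r ON s.manager = r.name
    )
    SELECT name, depth FROM reports ORDER BY depth
    """,
    ("cio",),
).fetchall()
print(rows)
```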


Most modern relational databases now allow the cells of a table to have additional structure through the use of SQL/XML. We can query into the contents of a cell using XQuery.


I fully concur. This is an excellent summary and suggestion for future progress.

The barriers to moving beyond Excel can be overcome, but it will take some serious effort on many fronts. Both Excel and SQL embody genius concepts, but are such poor implementations that it is easy to conflate the cruft with the advantages.


SQL is not an implementation but a specification and thus cannot be compared to Excel, a very specific implementation of non-monotonic dataflow programming.

Regarding your "stockholm syndrome" comment above: Someone in his car hears a PSA about "some guy wrong-way driving" on the very road he is on and thinks "one? hundreds!". Unless you can beef up your argumentation you are that guy.


> Unless you can beef up your argumentation you are that guy.

That's fallacious too. I can be right, even if my argument is incorrect or unconvincing.

Warren Buffett and Nate Silver are both driving against traffic, and both of them are righter than everyone else combined.

> SQL [...] cannot be compared to Excel

What Excel and SQL have in common is that they're both a first attempt at a solution to (different) problems, and they've been too successful to properly iterate on. That's why everyone uses some proprietary extensions to SQL and everyone extends Excel with VB or C#.


> SQL is not an implementation but a specification

For clarification, is the GP referring to Microsoft SQL Server when they say 'SQL', or do they actually mean SQL?

Microsoft's product naming convention is confusing IMHO.


Almost all of these are done or do-able, though. They're just not right there on the surface.


https://www.herculus.io/ was doing the rounds a few days ago - the idea seems to be a spreadsheet with a type system.


Excel is terrifying. Each employee has taken the same concept and written their own bespoke tooling around it which probably has at least one bug. These are "copy and pasted" around a bazillion network drives and then passed on to other people who will modify the undocumented process based on their best understanding of what they think it does (or what it was meant to do...?).

I can still take my ad hoc SQL query data and run decent analysis and produce graphical summaries in less time than it would take me to setup the boilerplate I'd need in C#.

Arguably something like Matlab or R would be similarly quick for a lot of things - but I'm not even slightly sold that they are safer based on my observations of their use. I've certainly seen plenty of formal code that was less readable than a decent spreadsheet.

I'm not really a fan of excel tools, and tooling. VBA has made me want to actually smash my computer in the past. But to claim that it isn't incredibly powerful at working with a few megabytes of raw data is flat out wrong.


You are projecting so hard I could show PowerPoints off your forehead. I take it you are rather unhappy in your line of work?

I'm just a lowly DBA re-posting and summarizing comments [1] for karma.

[1] https://news.ycombinator.com/item?id=12448545


Do these videos have anything to do with beginners making mistakes, floating points and other type conversions, bus factor of 1, spaghetti code, etc?

All I see is the same old Martin Shkreli video that has been floated around before, and all you see him do is 'Vim' around as he explains his thoughts -- not on Excel, but on company financials.

Also, if you post a lopsided list of pros, it makes sense to the audience to see someone else post a lopsided list of cons. But then you reply with pettiness. Why?


Lots of the tasks carried out in offices are not technical enough to suffer from the issues you correctly identify after a given hurdle. I work as an economist in a government department, and a lot of the analysis involves ad-hoc projects processing data from different sources and doing some basic plotting/elementary calculations. Excel is perfect for this, but if something is too technical/repetitive it becomes less suitable.


Not to mention excel errors have huge consequences.

http://www.bloomberg.com/news/articles/2013-04-18/faq-reinha...


you mean Stockholm?


I thought it was mildly witty, given the subject matter.


I can vouch for this. I thought spreadsheets were stupid and useless until my boss at the CS department here forced me to use them for grading. I didn't realize how handy it is to have your computation and data in the same place.

It makes it slow, but usually for these sorts of things you want your data to be available on the scale of days, not nanoseconds, so it works out. We have a complicated grading sheet here that manages all of the students' information on the same sheet in Google Drive: grades, attendance, recitation attendance, and, at the end of the class, homework.

It also verifies the test answers against the correct ones to make sure we score exams correctly.

Excel is great for fixed sets of data that need simple map/reductions and input verification, because that's all we really use it for. Beyond that, move to something like Python & NumPy/R/Julia/Matlab.
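The "map/reduce plus input verification" workflow described here is small enough to sketch outside a spreadsheet too. This is a minimal Python sketch, assuming a hypothetical four-question answer key and made-up student responses: validate each response sheet, map answers to per-question points, then reduce to a total.

```python
ANSWER_KEY = ["B", "D", "A", "C"]
VALID_CHOICES = {"A", "B", "C", "D"}


def score_exam(responses, key=ANSWER_KEY):
    """Validate a student's responses, then count matches against the key."""
    if len(responses) != len(key):
        raise ValueError(f"expected {len(key)} answers, got {len(responses)}")
    bad = [r for r in responses if r not in VALID_CHOICES]
    if bad:
        raise ValueError(f"invalid choices: {bad}")
    # map: 1 point per correct answer; reduce: sum
    return sum(1 for given, correct in zip(responses, key) if given == correct)


students = {"alice": ["B", "D", "A", "A"], "bob": ["A", "C", "A", "C"]}
print({name: score_exam(answers) for name, answers in students.items()})
```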


I used to work at an NLP company, and we made extensive use of Google Sheets for doing P/R/F calculations on the results of various tests. It was so useful.
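P/R/F here is precision, recall and F-score, and the spreadsheet formulas boil down to simple arithmetic on counts. A minimal Python sketch, assuming binary judgments and the balanced F1 score; the counts in the usage line are made up.

```python
def precision_recall_f1(tp, fp, fn):
    """Precision = TP/(TP+FP), recall = TP/(TP+FN), F1 = harmonic mean of the two."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)) if precision + recall else 0.0
    return precision, recall, f1


# e.g. 80 correct extractions, 20 spurious, 40 missed
print(precision_recall_f1(tp=80, fp=20, fn=40))
```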


The nice thing about Excel is that it does resemble a functional programming tool.

Would love to work with a replacement, even if it is some sort of Pandas/Python/Matplotlib derivative, but it takes too long to set things up with these tools, and it seems not all operations are as trivial as I want them to be.
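The "functional" resemblance is that each cell is a pure expression over other cells, and recalculation is evaluation of a dependency graph. A minimal sketch of that model in Python, assuming cells are either constants or functions of a lookup callback; the `Sheet` class and cell names are invented for illustration, and real spreadsheet engines add dependency tracking, incremental recalculation and cycle detection.

```python
class Sheet:
    """Toy spreadsheet: each cell is a constant or a function of other cells."""

    def __init__(self):
        self.cells = {}
        self._cache = {}

    def set(self, name, value_or_formula):
        self.cells[name] = value_or_formula
        self._cache.clear()  # naive: any edit invalidates every computed value

    def get(self, name):
        if name not in self._cache:
            cell = self.cells[name]
            # Formulas receive `self.get`, so references resolve lazily and recursively.
            self._cache[name] = cell(self.get) if callable(cell) else cell
        return self._cache[name]


sheet = Sheet()
sheet.set("A1", 100.0)                              # price
sheet.set("A2", 3)                                  # quantity
sheet.set("A3", lambda ref: ref("A1") * ref("A2"))  # =A1*A2
sheet.set("A4", lambda ref: ref("A3") * 1.2)        # =A3*1.2
print(sheet.get("A4"))
sheet.set("A2", 5)  # change an input; dependents recompute on next read
print(sheet.get("A4"))
```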


Yeah, for Excel you can use "=(function)" and you're done! In Google Sheets you can also add your own JavaScript code to run in the sheets.


I know of a retail bank using a huge and complex Excel spreadsheet for their entire loan approvals process.

Did it have bugs? No-one knew.


> Did it have bugs? No-one knew.

And there is the reason Excel (and spreadsheets in general) are dangerous.


I've seen software specifically made to tackle such complex processes and they were buggy as hell. And I've seen 100+ connected Excel spreadsheets managing $500M+ yearly transactions of the buying department of a white goods manufacturer, it was a work of art.


Why would anyone do that? There are much better tools for handling complexity on that level. That's insane.


Two words: Corporate IT.

Another two: Onboarding Process.


I didn't realise you could connect spreadsheets together. Thanks for new knowledge!


Could be the same bank I know of, where they use Excel for everything.


I can personally guarantee you it is literally every bank.

Even if they have a fancy tool, someone is using an Excel spreadsheet to figure out how to subvert it.


Couldn't agree more. In Indonesia, the Oil and Gas Upstream Special Task Force uses Excel for almost everything. We've tried to replace Excel with many good apps, but alas, when an app fails we just switch back to good old Excel. Someone even created a Monte Carlo simulation in Excel!


Indeed -- the most compelling reasons to move trading desks off Excel are non-technical. Model auditability and traceability are no longer 'nice-to-haves' but are compliance requirements.


Is there any open source framework that has an excel-like GUI and good integration with standard coding tools?


LibreOffice, but it's not as good as Excel for the upper-end stuff. Microsoft doesn't have an empire built on Office by accident.


There used to be a couple of them for Python, but it seems they've died out.

- Resolver One ( https://en.wikipedia.org/wiki/Resolver_One )

- Project Dirigible ( https://github.com/pythonanywhere/dirigible-spreadsheet )


The efficiency gain vs time invested is not exponential. It's linear at best and plateaus after about 5k hours. It is this plateauing of the curve that is the biggest reason power Excel users move to R or Python. That was certainly my experience after eking everything I could out of Excel in 15 years of trading-floor fixed income. A visual 2D paradigm is excellent for quick productivity but is severely limiting as complexity and data size rise. Even with VB.


People really love it because they can do business analysis without hiring programmers or becoming them. And spreadsheets are a great model for a constrained set of problems.

However, it's not all sweetness and light. Excel even gets some basic calculations wrong, and those ignorant of its quirks happily propagate those errors. More problematically, it can easily be pushed to the point where your modeling is really beyond the tool's or the spreadsheet's design strengths without your knowing it. And debugging is a pain in the ass. As a result, lots of erroneous outputs get presented as meaningful.


Can you version/diff it though? Basically can you have a semi-sane software engineering process when it becomes big?


You can track changes within Excel and with Office 2016, there is a comparison and merge tool included (that's been long overdue). But since the files are binary encoded, there is no external way to track changes unless someone wants to write up a parser for the XLSX format that can keep up with all the new features that MS adds every release.


For what it's worth, XLSX files aren't binary. They're just XML in a ZIP. They're not particularly nasty to diff once they're extracted - actually, I think they might do pretty well in Git.
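A minimal sketch of the "unzip and diff" approach with Python's standard library, assuming two workbooks whose first worksheet lives at the conventional `xl/worksheets/sheet1.xml` path inside the ZIP. To stay self-contained, the example builds two tiny fake archives in memory instead of reading real .xlsx files, so the XML is heavily simplified. Real Excel output often puts a whole sheet on one line, so in practice you would pretty-print the XML before diffing, and a line diff still knows nothing about row or column inserts.

```python
import difflib
import io
import zipfile

SHEET_PATH = "xl/worksheets/sheet1.xml"  # conventional location of the first worksheet


def make_fake_xlsx(sheet_xml):
    """Build an in-memory ZIP with one worksheet part (stand-in for a real .xlsx)."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr(SHEET_PATH, sheet_xml)
    return buf.getvalue()


def diff_sheet(old_bytes, new_bytes, path=SHEET_PATH):
    """Unified diff of one XML part from two workbook archives."""
    def lines(data):
        with zipfile.ZipFile(io.BytesIO(data)) as zf:
            return zf.read(path).decode("utf-8").splitlines()
    return list(difflib.unified_diff(lines(old_bytes), lines(new_bytes),
                                     "old/" + path, "new/" + path, lineterm=""))


old = make_fake_xlsx('<sheetData>\n<c r="A1"><v>100</v></c>\n<c r="B1"><v>2</v></c>\n</sheetData>')
new = make_fake_xlsx('<sheetData>\n<c r="A1"><v>105</v></c>\n<c r="B1"><v>2</v></c>\n</sheetData>')
print("\n".join(diff_sheet(old, new)))
```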


The diff is actually not really straightforward - a sheet is two-dimensional and you have to work out column/row inserts/deletes (which themselves are intertwined) before you can even start looking at cell changes. So it's quite non-standard stuff.


Ever tried editing the xml and rezipping? Also on examination you might notice the odd binary blob in the xml..


.docx and .xlsx are zipped xml, there are git plugins that produce reasonable diffs


> 4. The efficiency gain vs learning curve is exponential.

Does this actually mean anything?


Yes. He is saying as you become more competent using excel, the efficiency increases exponentially.

Probably true for smallish values of competency, but it must be logarithmic after that.


To paraphrase for Guar, he's asking whether someone means "exponential" or exponential. By "exponential", I mean my feelings.


probably a sigmoid function
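One way to reconcile the "exponential" and "plateaus" readings in this thread: a logistic (sigmoid) curve grows roughly exponentially early on and then saturates. As an illustrative form only, with efficiency E as a function of time invested t, a ceiling E_max, a rate k and a midpoint t_0 (all assumed symbols, not measured quantities):

```latex
E(t) = \frac{E_{\max}}{1 + e^{-k (t - t_0)}}, \qquad
E(t) \approx E_{\max}\, e^{k (t - t_0)} \ \text{for } t \ll t_0, \qquad
\lim_{t \to \infty} E(t) = E_{\max}.
```

The early-time approximation follows because the exponential term in the denominator dominates when t is well below t_0.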


> He is saying as you become more competent using excel, the efficiency increases exponentially.

OK, does this mean anything? How have you quantified efficiency? How have you quantified "learning curve"? What data do you have supporting that the relationship is exponential?


Being obtuse is not a desirable attribute, less so deliberately.


Asking people to say clearly and concretely what they mean is not obtuseness.

Nobody here has been able to elaborate on the initial statement "The efficiency gain vs learning curve is exponential". People are just rewording the sentence slightly and passing that off as an explanation. That seems to indicate that nobody knows what the statement means because the statement is vacuous.


It's mostly hand waving, but I think the OP was trying to point out that at the low end, a modest investment in training/learning gives great results in efficiency.

This is true! Then you hit a pretty hard wall with the limitations of the tool.


I think it means the efficiency gain is exponential (assuming the learning curve is anything less than exponential).


In the traditional, original sense of the term, learning curves are (presumably) asymptotic to a horizontal line representing total competence.

Somehow, sloppily, "steep" has come to mean difficult to learn, rather than quick to learn.

In the original version: A steep learning curve means quick learning at the beginning. A shallow curve means that it takes a long time to build up skill.


You're just rephrasing nonsense. What is "the efficiency gain is exponential" supposed to mean, in concrete terms? This is just empty manager-speak.


/s


There are also many downsides


Yes and No.

Suppose all the data you're receiving also comes to you in an Excel format (CSV, XLS, XLSX), but with major differences in formatting, or wholly inconsistent formatting. Now you have a multi-month project just to build a consistent import script. Replacing a one-second task done 2-3 times a day with a four-month project has an ROI on the scale of decades. Not worth it.

Then you add visualization. What takes 3-4 keystrokes in Excel becomes a lot of back and forth: learning a new library, ensuring it works on your system, vetting the visualizations, dealing with that weird bug on the triple-line, double-axis line chart.

Then you have to validate integer handling and mathematics to ensure your newly written Python, Julia, etc. behaves the same as your well-vetted Excel spreadsheet.
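Rounding is a concrete example of the kind of mismatch that validation has to catch: Python's built-in `round()` rounds halves to even ("banker's rounding"), while Excel's `ROUND` rounds halves away from zero. A minimal sketch of an Excel-style helper using the standard `decimal` module; the helper name is ours, and going through `str()` is an assumption that sidesteps binary artifacts (the Python docs note that `round(2.675, 2)` gives 2.67 because 2.675 is stored slightly below 2.675). Excel's own handling of such edge cases may differ.

```python
from decimal import Decimal, ROUND_HALF_UP


def excel_round(x, digits=0):
    """Round halves away from zero, in the style of Excel's ROUND (sketch)."""
    quantum = Decimal(1).scaleb(-digits)  # e.g. digits=2 -> Decimal('0.01')
    # str(x) gives the shortest repr, so 2.675 is treated as the decimal 2.675.
    return float(Decimal(str(x)).quantize(quantum, rounding=ROUND_HALF_UP))


for value in (0.5, 1.5, 2.5, -2.5):
    print(value, round(value), excel_round(value))  # Python: 0, 2, 2, -2; Excel-style: 1.0, 2.0, 3.0, -3.0
print(round(2.675, 2), excel_round(2.675, 2))
```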

Replacing that one slow bloated spread sheet is now nearly a year long project which requires a new employee who will have comparable pay to the person who ALREADY operates excel.


> spread sheet is now nearly a year long project which requires a new employee who will have comparable pay to the person who ALREADY operates excel.

And now you have a scalable system. You can go from something one employee takes all day to look at 2x/day, to something anyone in the company can see in real time on a dashboard of some sort.

Is that worth it? Depends


Don't know if this is the best study, but it contains an overview of many studies that show that excel calcs tend to have huge error rates.

http://panko.shidler.hawaii.edu/SSR/Mypapers/whatknow.htm

Of course that raises the question would any other software environment have a lower error rate?


You can say that about every technology, but the question is whether the good outweighs the bad in your specific use case.


All at the expense of reproducibility, testing, auditing.

Quick and very, very dirty.


Lol, seriously? Martin Shkreli? I had been wondering what hole he crawled into.


does it have regex search though?


It's pretty easy to get RegEx exposed in Excel via VBA/extensions


This is the big key. A few short VBA macros can give you regexes and cell swapping. Now you can conditionally swap the programs in cells based on the contents of other cells.

You have a Turing-complete spreadsheet.


At one point, VBA in Microsoft Word was my only available programming outlet. I was able, easily, to access DirectDraw and create a faux screen saver. DLL imports are available in VBA, so the entire win32 API is available (in addition to the normal Office automation stuff like sending email, modifying spreadsheets, etc)


Yeah, in a similar situation right now and I find the DLL thing is relatively unknown and incredibly powerful if used well. Recently this 'robotics' evangelist keeps trying to rope us into spending a few mil on his automation and I keep showing him up by automating the same stuff right out of Excel for little to nothing.


I've worked in financial firms my whole career, and I agree.

Excel is useful in one particular case only: when you don't want to build a GUI. It's great as a not-very-pretty interface for functionality written in DLLs.

For any process that's well thought out, you can write a Python script if it's not time critical. And it probably isn't if you were doing it in Excel.

The main problem with Excel is it's too easy to write an ad-hoc fix. Sounds like a weird reason, but in finance they just pile up and up and up. Finance Excel users also tend to know just enough coding to dig a huge hole, and just little enough to not understand this. Soon you have an unauditable mess, and the business is almost never going to spend time paying up technical debt.

There's also the philosophical issue of ever more complex models. If you have some sane coding practices, you will tend to favour more elegant code. Balls of spaghetti are more obvious in something like Python. More elegant code is connected to more elegant models. Inelegant models, such as the ones often bragged about by M&A guys (let's be honest, they're sales tools, not predictions) when written into an ordinary language, will look like the balls of spaghetti that they are.


Excel is incredibly useful and powerful. This type of comment screams "I've never used Excel a day in my life for anything other than creating a table." It can handle very complex formulas that are easy to follow, and its data manipulation and efficiency are amazing. Your argument for moving away from Excel is the same as that of people who don't develop and say everything should be done through a WYSIWYG editor.


I've been around since Lotus 1.0, and worked in financial and engineering firms. I've seen cases where spreadsheets have been the right answer for knowledgeable and relatively sophisticated users, either to build a quick model or as a front-end, and cases where the result is an unauditable mess. Lots of oopses when, say, accounting people don't understand the math of partial-period NPVs, or are so innumerate that an obviously wrong result looks fine to them. Without the review process that should go along with production code, sometimes you get lucky, sometimes you don't. It all comes down to who is using the tool, I guess.


Excel use should have an inverse relationship with complexity. Just because you can make Excel handle complex formulas doesn't mean you should, or that it's the right tool for the job.


Totally. I used to work as a quant for a boutique asset manager, and the whole business was running on spreadsheets.

Insane.

Ended up putting it all in a database and developing an excel add-in to pull it from the database as array formulae. Used a great library called Excel DNA to develop the add-in using C# if anyone is interested.


Just a question: did you ever consider using Jupyter notebooks? Or R Markdown notebooks?


Only jumped on the python train after I left finance but the reason we used an add-in is because you can build dynamic sheets with calculations that update when the inputs change (where the inputs were pulling from the database).

So you could build a sheet that pulls in portfolio holdings for yesterday where yesterday updates each day and then compute performance and risk stats referencing the data cells in the sheet and it would all update.

In that context it was just an easy way to build reports pulling data from a database but same applies to quickly doing one-off analysis in Excel pulling dynamic data from the database - guys in finance tend to not be programmers but they're really good at Excel.

The add-in approach was really useful too, because you could create a function that returns the holdings of a portfolio to an array of cells (an array formula) and have a drop-down box with all portfolios that fed the input of the formula, so that when you changed the combo box, it changed the portfolio data and then everything recalculated off the back of that :)
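The same pattern (a function that takes a portfolio and a date, returns a block of holdings, and drives downstream calculations) can be sketched outside Excel-DNA. This is a minimal Python sketch, assuming a hypothetical `holdings` table in SQLite with made-up data and a hypothetical `portfolio_holdings` function; the 2D list it returns plays the role of the array-formula output range, and the weights computed from it stand in for the performance and risk stats.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE holdings (portfolio TEXT, as_of TEXT, ticker TEXT, market_value REAL)")
conn.executemany(
    "INSERT INTO holdings VALUES (?, ?, ?, ?)",
    [
        ("GROWTH", "2016-11-07", "AAA", 600.0),
        ("GROWTH", "2016-11-07", "BBB", 400.0),
        ("INCOME", "2016-11-07", "CCC", 1000.0),
    ],
)


def portfolio_holdings(portfolio, as_of):
    """Return holdings as a 2D list (header row first), like an array formula's output range."""
    rows = conn.execute(
        "SELECT ticker, market_value FROM holdings "
        "WHERE portfolio = ? AND as_of = ? ORDER BY ticker",
        (portfolio, as_of),
    ).fetchall()
    return [["ticker", "market_value"]] + [list(r) for r in rows]


def weights(block):
    """Downstream 'cells': portfolio weights computed from the returned block."""
    data = block[1:]
    total = sum(mv for _, mv in data)
    return {ticker: mv / total for ticker, mv in data}


# Changing the selected portfolio (the drop-down in the comment) re-drives everything downstream.
for selected in ("GROWTH", "INCOME"):
    print(selected, weights(portfolio_holdings(selected, "2016-11-07")))
```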


I know a software engineer who worked for one of the big quant firms in the north east (forgot the name...it's big) before people even knew what quants were.

He worked for them until he went out on his own. He wrote all his backtesting software himself, in C (nice GUI, graphing features, etc.). He uses it to find his edge.

His trading platform is Excel...Obviously he doesn't do HFT...his trades are measured in days.

I know - 1 data point, but if a software engineer who is better than me in both trading & coding is using Excel, I'm not going to knock it.


It's all part of agility vs. efficiency, but not in the way people think! If people are heavily utilized they turn to Excel because they can get through a few simple things really fast. There's risk to using things like R or Julia because you can't see (literally) how to do it, or what you can do, and trying something different will earn you a rapid sacking at the hands of the super utilizers.

But pretty soon you are mired in spreadsheet hell. Nothing can be seen or understood, everything is invalid or valid - who knows and worst of all when something stops working you don't know why.

And you don't know when it will stop. Goodbye agility!

Any spreadsheet with more than 2 days of work to reproduce it should be counted as IT and put on a formal risk register until it is recoded and removed. But dream on...


There are many technical arguments on both sides, but the business argument I've been given is that Excel sheets are the most auditable, especially when they are self-contained. Auditors like this, especially after Sarbanes-Oxley. Excel sheets fall into a different audit classification compared to "systems" (a Python script might be considered a "system").


IIRC, Standard Chartered achieved this by initially adding Haskell interop to Excel[0] and then moving to a custom GUI solution to replace Excel altogether.

[0] https://www.youtube.com/watch?v=hgOzYZDrXL0


Does anyone know if the way complicated/advanced formulas are handled in LibreOffice would make it less suitable for these tasks? They seem to have ironed out most of the bugs, so I wonder if it would be worth pushing these quants who can program into the LibreOffice scene to get them to contribute back to projects. Of course, if the backend handling formulas in something like LibreOffice truly is subpar, that will never happen.


I work in aerospace engineering, and it's definitely the same here. In scientific research or engineering these days, there are a lot of potential steps up from excel spreadsheet hell or spaghettified MATLAB code.

Bonus points for the facilitation of any type of documentation, automated testing, or version control.


Spreadsheets where functions are entirely based on cell formulas publishing to bespoke internal data busses with in-house plugins that randomly stop working.

There can be no god.


I really love Julia and see a lot of potential outside of scientific computing. I used to do all sorts of automation, smaller throwaway programs, etc. in Python and Ruby, but I find Julia much more effective to work with.

It also feels like a smaller language in many ways. You don't have to keep as much in your head, which makes it more effective to keep up with.

The killer feature for me, since I automate a lot of tasks, is the awesome integration with the shell environment. Running and combining Unix commands is very elegant, and getting hold of the output is really smooth.
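For comparison, here is roughly what the same kind of task (feeding text through Unix commands and capturing the output) looks like in Python with the standard-library `subprocess` module. This is a minimal sketch assuming a POSIX system with `sort` and `wc` on the PATH; the helper name `pipe_through` is hypothetical.

```python
import subprocess

def pipe_through(cmd: list[str], text: str) -> str:
    """Run cmd, feed `text` to its stdin, and return its stdout as a string."""
    result = subprocess.run(cmd, input=text, capture_output=True, text=True, check=True)
    return result.stdout

# Chain two commands by passing one's output to the next,
# roughly like `printf 'b\na\nc\n' | sort | wc -l` in a shell.
sorted_lines = pipe_through(["sort"], "b\na\nc\n")
line_count = pipe_through(["wc", "-l"], sorted_lines).strip()

print(sorted_lines, end="")  # a, b, c on separate lines
print(line_count)            # 3
```

Every stage needs its own `subprocess.run` call and explicit text/bytes handling, which is the kind of ceremony the comment is contrasting with.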

Secondly, what I think saves a lot of time is that it really has ALL BATTERIES included. With Python that feels more theoretical. With Julia you just start typing the name of a function you think should exist in the REPL and it pops up. With Python you have to start guessing which package it might be located in. And that goes for stuff you use all the time, like string manipulation, regular expressions, file reading and writing, process input and output, etc.

With Python, functions often end up having unnatural names because you can't have naming conflicts. Julia handles functions with the same name but different argument types elegantly (multiple dispatch).
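Python's standard library does offer a limited form of dispatch on argument type, `functools.singledispatch`, which picks an implementation based on the type of the first argument only (Julia's multiple dispatch considers all arguments). A small sketch; the `describe` function is a made-up example.

```python
from functools import singledispatch

@singledispatch
def describe(x) -> str:
    # Fallback used when no more specific implementation is registered
    return f"object: {x!r}"

@describe.register
def _(x: int) -> str:
    return f"int: {x}"

@describe.register
def _(x: str) -> str:
    return f"str of length {len(x)}"

@describe.register
def _(x: list) -> str:
    # Dispatch on the container type, then recurse on each element
    return "list[" + ", ".join(describe(item) for item in x) + "]"

print(describe(42))         # int: 42
print(describe("abc"))      # str of length 3
print(describe([1, "xy"]))  # list[int: 1, str of length 2]
print(describe(3.5))        # object: 3.5
```

One name, several type-specific implementations; the restriction to the first argument is what keeps it from being a full substitute.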

Also there is no schizophrenia wondering if something is a function or method on an object. Everything is a function so that is easy.


I would also like to use Julia more for "throw away" scripts and general automation. Unfortunately, startup time (and sometimes compilation) make this a less fun endeavor than with Python etc.


If you want to learn more about Julia, here are JuliaCon videos / slides from past years:

* http://juliacon.org/2014/

* http://juliacon.org/2015/

* JuliaCon 2015 India - https://www.youtube.com/playlist?list=PL279M8GbNseuhXmZk9rWM...



Speaking of talks on Julia, I'll again recommend:

https://www.youtube.com/watch?v=dK3zRXhrFZY

"Julia: to Lisp or not to Lisp?" from the European Lisp Symposium in May 2016. I found it was rather illuminating on Julia in particular and lisps and programming languages in general.


Julia is a very interesting language, but big data processing is not a matter of the programming language itself but of how your platform and your architecture are put together.

And nearly always your bottleneck is not the language itself.


Depends on whether you're optimizing for latency or throughput - and of course, in some systems, that decision changes on an instant-to-instant basis. For throughput, indexing and shuffling is probably going to be the bottleneck, and that's dependent on network and filesystem I/O. But for the low-latency path, choice of language absolutely matters, so if you want to be able to share code between the two you end up needing both a fast language and fast architecture.


I think that if this were true, the data science libraries in Python wouldn't all be written in C under the hood, linking to Fortran, etc. Language can have a huge impact on performance with big data.

Julia is just a small step forward. It still has performance problems where the JIT can't help much, like the DataFrame structure, which is effectively a black box and therefore hard to optimize.


If you need performance on these kind of structures, you can use structured arrays in python.

The main reason I like the idea of Julia is that I don't see the problem of parallelism being addressed in python, at least not without cumbersome libraries (disclaimer: I haven't tried dask yet). If python had an equivalent to

    #pragma omp parallel for

I wouldn't be keeping Julia in mind for my next project that requires creating a high-performance algorithm.
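The nearest standard-library analog to an OpenMP `parallel for` in Python is probably mapping a function over a process pool with `concurrent.futures`; processes rather than threads are used so that CPU-bound work is not serialized by the GIL. A rough sketch; `simulate` is a hypothetical stand-in for an expensive loop body.

```python
from concurrent.futures import ProcessPoolExecutor

def simulate(seed: int) -> int:
    # Hypothetical CPU-bound loop body: iterate a linear congruential generator
    x = seed
    for _ in range(20_000):
        x = (1103515245 * x + 12345) % 2**31
    return x

def parallel_for(n: int) -> list[int]:
    # Each iteration i runs independently in a worker process, much like the
    # iterations of an OpenMP `parallel for`; map() returns results in order.
    with ProcessPoolExecutor() as pool:
        return list(pool.map(simulate, range(n)))

if __name__ == "__main__":
    # The __main__ guard is required on platforms that spawn worker processes
    serial = [simulate(i) for i in range(8)]
    assert parallel_for(8) == serial
    print("parallel results match the serial loop")
```

Unlike OpenMP threads, the workers do not share memory: arguments and results are pickled between processes, so this only pays off when each iteration does substantial work.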


This is not quite accurate as stated: the problem is the combination of a data structure that is not amenable to type inference with an API that would only be efficient if type inference had perfect knowledge of the contents of a DataFrame. There is a lot of work being done to develop a new API that has none of these problems and provides very high performance, which demonstrates that Julia's JIT is up to the task so long as you choose an appropriate API. Julia, like any language, has intrinsic limitations, but this is not a good example: it's an example of how good API design for Julia differs from good API design for other systems.


> This is not quite accurate as stated: the problem is the combination of a data structure that is not amenable to type inference with an API that would only be efficient if type inference had perfect knowledge of the contents of a DataFrame.

Sorry if I implied otherwise. What I meant is that the current DF structure is hard for the JIT to optimize, not that this is a fundamental limitation of Julia; just that JITs are limited. I think I shouldn't have said Julia is only a small step forward. A JIT gets you a long way.

> There is a lot of work being done to develop a new API that has none of these problems and provides very high performance, which demonstrates that Julia's JIT is up to the task so long as you choose an appropriate API.

Can you link to information on this? I'd love to see information on the design around this.


>Can you link to information on this?

This blog post is a good starting point on the current shortcomings and design solutions of dataframes in Julia: http://julialang.org/blog/2016/10/StructuredQueries


Thanks a lot.


JITs can only do so much by themselves, but the addition of generated functions in Julia gives library developers the ability to modify generated code just before execution. It's almost like having a fully programmable compiler chain with a fraction of the work or effort.

Here's to hoping it'll be put to good usage!


Python numerical libraries link to LAPACK/BLAS/etc. because that's how people did scientific computing before Python. Also, for scientific computing you're mostly CPU-bound, which means that no one would use Python if the libraries were slower than Fortran or C. But big data problems are different in the sense that you're typically IO-bound, so while it certainly helps to have fast code, it wouldn't be the first problem you'd want to solve.


I would say they link to LAPACK/BLAS because there's nothing to touch the combination of speed and robustness they provide.


It depends on what kind of data you're working with and what kind of operations you perform on it. For protein folding, which is computationally expensive, the language will matter; for mapreduce jobs, language matters little, as network overhead is going to be your bottleneck in a distributed environment.


I agree that this is generally true, but finance is sort of a special case. In a world where firms pay a premium for colocated servers and direct data feeds in order to shave off a few milliseconds of latency, there is also a strong push to use the very fastest language possible and have the very best possible architecture, all to shave off as many extra milliseconds as possible.


That's only true for a very niche sort of trading (only high-frequency/latency arbitrage type stuff) which makes up a tiny portion of the financial industry[1]—and that tiny portion is certainly not using Julia for production trading systems.

Everyone else has more reasonable performance constraints.

[1]: I can't find my source right now, but I remember seeing a credible estimate that the HFT industry in the US takes in $10 billion in revenue. That's a drop in the bucket of an industry that's measured in trillions just in the US.

Intuitively this makes a lot of sense because HFT strategies make a tiny amount of money per trade and have to do a ton of trades across a ton of different securities—which means the capacity of HFT strategies is relatively small.


Good point. Although there are some other areas of trading where speed matters that aren't HFT. An example is the algorithms used to split up large orders into smaller trades in order to hide the position and have a minimum impact on the market price. These are often optimized for very low latency as well. Especially the more sophisticated versions that do things like halt execution in the event that there is news on the stock, etc.
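To make the order-splitting idea concrete, here is a deliberately simplified sketch of the most basic scheme, time-weighted slicing (often called TWAP): a parent order is cut into roughly equal child orders spread evenly over a time window. Real execution algorithms add randomization, volume-participation limits, and news-driven halts like those mentioned above; the function name and parameters are illustrative assumptions.

```python
def twap_slices(total_qty: int, n_slices: int, start: float, end: float) -> list[tuple[float, int]]:
    """Split total_qty into n_slices child orders evenly spaced in [start, end).

    Returns (time, quantity) pairs. Any remainder from integer division is
    spread one share at a time over the earliest slices, so the quantities
    sum exactly to total_qty.
    """
    if n_slices <= 0:
        raise ValueError("n_slices must be positive")
    base, remainder = divmod(total_qty, n_slices)
    step = (end - start) / n_slices
    return [
        (start + i * step, base + (1 if i < remainder else 0))
        for i in range(n_slices)
    ]

# Example: 10,003 shares over one hour (times in minutes), 4 child orders
for t, qty in twap_slices(10_003, 4, 0.0, 60.0):
    print(f"t={t:5.1f} min  qty={qty}")
```

The schedule itself is trivial; the latency-sensitive part in practice is reacting to market data between slices, which this sketch does not model.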


HFT trading is very big.

Maybe not in terms of earnings, but in volume. Estimates range from 30-70%, depending on the market.

I'm digging up references now for an edit...


Volume isn't a particularly relevant metric here. Think of it as the resolution HFT firms trade at rather than a measure of the size of their trading activity. It's not a perfect analogy, but the point is that trade volume is not the best metric for understanding the scope and impact of HFT. It's definitely not an indicator that HFT makes up 30-70% of the finance industry in any meaningful capacity.

Here's a thought experiment: how much would things change if HFT firms traded half as much, with double the margin on each trade? Some securities would be a bit less liquid but otherwise almost nothing would change. It certainly wouldn't be a 2x difference from the status quo!


HFT having a lot of volume is irrelevant; what matters for Julia to become prevalent is for a lot of analysts to use the toolset. You should compare the number of analysts working in HFT with the number not working in HFT.

Furthermore, HFT is perhaps the worst case for Julia, since the implementation will always be in low-level languages. Julia's speedup is only useful for research in HFT.


Yeah but it's just not really relevant for the vast majority of finance businesses. None of the big banks do HFT for example.


Morgan Stanley and Goldman Sachs might be considered "big banks" and yes, they do.


When raw performance is that important, you end up implementing the solution on FPGAs. Switching to Julia does not really matter in such cases.


True. You're pretty much stuck with Verilog at that point.


I doubt anyone's currently using Julia for the fast path of low latency trading. Usually it goes from FPGAs for sub-milli work, C++ on GPUs then either C++ or Java for multi-milli work. In my experience.

I'd expect most of the Julia usage in this space would be to replace any R or Matlab in the back testing engines if at all.

I didn't get the impression from the article that they were really talking about low latency anyway.

That's not to suggest that Julia doesn't have a place in Finance. I think it's a great language and there are a bunch of places I can see it being very valuable. Just doubt that low latency fast path is one of them in the near term.


Well, interpreted languages like MATLAB and R can be annoyingly slow at times.


I'm really interested in learning more about data science. I'm currently taking MOOCs to learn R, but I don't have much free time and progress is slow. I currently work as an analyst for an environmental consulting firm, but I've always wanted to work in something related to quant finance. I know I'm probably being naive, but I want to change careers. Do you recommend that I keep learning R, or should I take a closer look at Julia? Or both? I don't have a programming background, but I do know statistics and Excel. Edit: fast, concise and useful comments in a matter of minutes, I love HN. Thanks!


Become fully confident & proficient in a single programming language before hopping around. If you get to the point where you can build a full application from scratch and answer other peoples' questions on SO or IRC with one, you'll be able to easily get up to speed with another should the need arise.


There are many more resources for learning R, given that the language is older. These resources have become much better in recent years and I think there has been no better time to learn R. Probably the best introduction to data science in R is Hadley Wickham's latest book: http://r4ds.had.co.nz

Learning your second language is much easier than learning your first. You'll be more familiar with programming concepts and terms that make it easier to search for how to do X in that language.

Julia is still relatively new and not quite ready (in my opinion) for new programmers. Things change relatively quickly and I think that could add a lot of confusion that gets in the way of learning.

That being said, I think Julia is a pretty great language. I really love the way it has grown and I'm excited to see how it matures.


Can't answer whether to learn R or Julia, but for a more general education on Data Science, this book might be a really good fit for you:

Data Smart: Using Data Science to Transform Information into Insight

https://www.amazon.com/Data-Smart-Science-Transform-Informat...

All the work is done in Excel. I enjoyed the book quite a bit. Author is Chief Data Scientist for MailChimp.


R is great for producing scientific papers, charts, and doing a vast array of "standard" statistical analyses. I have found python to be much nicer because it's more of a "real" programming language, but the numpy/scipy/pandas stack make it very easy to use for data science and financial analysis also.

There's a (very) quick intro here:

http://lectures.quantecon.org/py/pandas.html

You can also look into Quantopian for practice, but I don't recommend actually trading real money.

https://www.quantopian.com/


A basic introduction to Julia in Data Science

https://www.coursera.org/learn/julia-programming


>I always wanted to work in something related to quants

Then learn Python, it is mature and has way more study material.

Not to mention machine learning libraries like TensorFlow and scikit-learn, and math libraries like NumPy, pandas, and Matplotlib.

Everything there is just set.

I once tried to learn Julia, but the community is just too small to get help at times. It is a work in progress, but I recommend you keep an eye on it.

Conclusion: Think of Julia as a future successor to Python, but not a current replacement.


The programming side is only part of the picture - don't forget to learn some math and theory as well!


My beef with Julia when I last tried it out was the lack of flexibility in the 'data munging' sphere. The assumption appeared to be that all of your data would come in as UTF-8, whereas in my org we need strong support for Latin1/cp1252 and UTF-16LE with a BOM. Is this something that has changed recently, or are these traders and economists just working with more sane data?


AbstractStrings (the generic string abstraction) can have any encoding – only the standard built-in String type is UTF-8. If you've got data in another encoding, use the StringEncodings [1] package to read it.

[1] https://github.com/nalimilan/StringEncodings.jl


Pipe the data through a converter before feeding it into Julia.
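One way to read "pipe the data through a converter" is a tiny filter that re-encodes a byte stream to UTF-8 before Julia reads it. A sketch in Python, assuming the source encoding is known in advance (cp1252 by default here); the function name is hypothetical, and `iconv -f CP1252 -t UTF-8` does the same job from the shell.

```python
import io

def transcode_stream(src, dst, src_encoding: str = "cp1252", chunk_size: int = 64 * 1024) -> None:
    """Copy bytes from binary stream src to dst, re-encoding src_encoding -> UTF-8."""
    # newline="" disables newline translation, so \r\n survives unchanged
    reader = io.TextIOWrapper(src, encoding=src_encoding, newline="")
    while True:
        chunk = reader.read(chunk_size)  # reads up to chunk_size characters
        if not chunk:
            break
        dst.write(chunk.encode("utf-8"))
    reader.detach()  # hand src back to the caller without closing it

# In-memory demo; as a script you would pass sys.stdin.buffer and sys.stdout.buffer
src = io.BytesIO("naïve;café\r\n".encode("cp1252"))
dst = io.BytesIO()
transcode_stream(src, dst)
print(dst.getvalue().decode("utf-8"), end="")
```

The drawback raised in the replies still applies: every data source needs this extra step unless the reading library handles encodings itself.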


Well, that's my point. I do use a converter, as it's built into Python at every intersection where I need it: reading CSV, XLSX, database calls, etc.

Having to stick Python scripts in front of every Julia script that performs data access doesn't really seem viable. Thus, my beef with Julia.
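For comparison, Python exposes its codecs through `bytes.decode` and `open(path, encoding=...)`. Below is a minimal sketch of sniffing a byte-order mark with a cp1252 fallback; the helper name `decode_bytes` and the choice of cp1252 as the fallback are assumptions. Note that cp1252 decodes almost any byte sequence, so a wrong guess tends to produce mojibake rather than an error.

```python
import codecs

def decode_bytes(raw: bytes, fallback: str = "cp1252") -> str:
    """Decode raw bytes, honouring a UTF-8 or UTF-16 byte-order mark if present."""
    if raw.startswith(codecs.BOM_UTF8):
        return raw[len(codecs.BOM_UTF8):].decode("utf-8")
    if raw.startswith(codecs.BOM_UTF16_LE) or raw.startswith(codecs.BOM_UTF16_BE):
        # The "utf-16" codec reads the BOM to pick the byte order and strips it
        return raw.decode("utf-16")
    return raw.decode(fallback)

latin = "café, naïve".encode("cp1252")
utf16 = codecs.BOM_UTF16_LE + "café".encode("utf-16-le")
print(decode_bytes(latin))  # café, naïve
print(decode_bytes(utf16))  # café
```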


Not a language issue per se, then. It shouldn't be a problem to add support for this. Julia was even designed from the start to work with strings of different types, something not common in most other languages I've worked with.

Most string functions operate on AbstractString types. You can create whatever subtype you like implementing alternative string encodings.


Building on this comment, it's actually surprisingly straightforward to add your own encodings into the core language. Just add the appropriate types and extend the necessary base methods.

In this case, it appears Julia 0.5 has a transcode method supporting UTF8/16/32. [1]

1: http://docs.julialang.org/en/release-0.5/stdlib/strings/


I'd venture a guess that implementing any string encoding support in a language I'm not that familiar with would likely be quite an undertaking.

If not, then it also raises the question: if it's that simple, why is it not already in the core alongside UTF-8?

Anyone new to the language working through data problems is going to pick it up and start at the data collection/wrangling stage, and immediately run into the 'you can implement that yourself/run it through a converter' mentality, and likely go back to whatever they were using before. I know we did.


I am not a quant programmer, but last summer I did a ten day job for an old customer in Julia and I had to learn the language as I worked. I thought the expressiveness and conciseness of the language was good and I liked the way the package manager used github projects; easy learning curve and REPL based development.


As a quant, I can't think of a single quant I know who uses Julia. It could very well be promising, but firms will be very hesitant to switch from Matlab, R, Python, C++, and VBA.


I can't find the talk that Stefan Karpinski gave where he said this, but it was something like:

"We've been spending a lot of time working with firms to replace K code."

Maybe the small percentage that uses K/Q/KDB+ are moving to something more readable?


Do you have a link to the talk? It would not make business sense for Stefan to spend time replacing the small percentage, if it were indeed a small percentage. Actually, kdb+ is dominant everywhere on Wall St., and the company mentioned in that article is retiring an old, discontinued version of k, not kdb+; perhaps that is the firm Stefan refers to? The debate about the readability of k code has become boring: if you learn the language, you'll find it perfectly readable, elegant even.


> Do you have a link to the talk?

https://www.youtube.com/watch?v=QRWZBWwBVR0 (I can't listen to the audio now so I'm not sure how much I misparaphrased.) And, you're right, I shouldn't have conflated KDB+ with K.

> if you learn the language, you'll find it perfectly readable, elegant even.

I was just comparing K to something like Python... Though, personally, it only became (somewhat) readable to me after playing with APL and J for a year.


Thank you for the link. It's at 12:28 in the talk, where he mentions the firm is Conning, who used k for financial simulations for insurance. In this talk, he doesn't suggest he is working with other firms to replace k/q/kdb+.


Based on random sampling of those that contact me about such transitions, the latter is somewhat true. A couple have mentioned Julia but I wouldn't say it's the majority.


Definitely possible. Though I'm not sure why those firms wouldn't switch to a more mainstream stack. Most talented quants I've met prefer to code in Matlab, R, Python, and C++. VBA is of course inescapable in the life of a financier.

When competing for talent and desperate for maintainability, companies are unlikely to introduce tools that are not the industry standard.


When Julia has an environment as good as RStudio, a dashboard as good as Shiny, and a Spark interface as good as sparklyr and SparkR, then I'll be up for it.

Writing good code in R is an effort, it's possible but the language doesn't make it easy. Julia makes me write good code off the bat and facilitates writing excellent, elegant code.


This is how I tend to feel about all non-R things; I've been using R for long enough to dislike a lot about the language, but the environment and tooling is too good to give up.

Letting R be the first language I've gotten pretty good at is really spoiling me.


Yep. In my short time working on an R project, I really enjoyed how everything was right there. I'd love to see more REPLs include package managers.


Rstudio is nice, no doubts there; however, I find myself avoiding language-centric IDEs. If one uses R most of the time, it's fine, but as someone who usually writes in multiple languages in a day, I find myself appreciating emacs' modes more (of which the Julia one is fine). It's also why I like the fact that the semi-official Julia IDE, Juno, isn't a standalone application (it runs in Atom and is nice, if one doesn't mind the performance of Atom...)

edit: spelling


I liked the Julia mode in emacs, but dear god, plots, files, environment, help and viewer.. (what is viewer for?)


It is the libraries which really make R: easy decision trees, SEM, robust regression, etc.


Why not LuaJIT instead? If they both get their speed from a JIT, isn't LuaJIT faster than Julia?


I have used both in a scientific context and found Julia much better suited for mathematical problems. Matrix/vector support built into the language is a huge advantage. Also, running code in parallel environments is better supported by Julia (macros, scheduling, etc.). On top of that, Julia has an excellent Python interface, which makes it possible to use that huge ecosystem of libraries too. It makes it easy to use matplotlib out of the box.


I like Julia as well, but there is an interesting scientific library for LuaJIT:

http://torch.ch/

Anyone have experience with it?


Having worked with it, I can say Torch is amazing for neural network stuff (thanks to the nn rock); however, it raises the same problems there are with Python. Lua is basically a wrapper around C/C++, and whenever you need to do complex stuff, you find yourself writing C/C++ rather than Lua.


Isn't LuaJIT development stale?


Yes. Lua could have been the Julia of its time if they had decided to pursue a more advanced compilation system rather than trying to embed Lua everywhere in everything. LuaJIT is fast because of the nature of its optimizations on the table structure and single-table architecture. Julia is a better, faster Lua, and I don't find myself missing being able to embed my code when I can just call into what I need FROM Julia.


I've used both. Lua is powerful because of Torch and its GPU computing capabilities, but in terms of ease of use and modern syntax, Julia is light-years ahead.


I really liked Julia (and Juno) when I tried it last year, but it was too math-aware, as all [actually good] math-originated programming environments are. I hope it will break that circle and get some general-purpose attention in the near future. For me, it is very promising even without that HPC or whatever.


Anyone used Julia for something approaching HPC, e.g. in a distributed environment instead of MPI?

I've heard plenty of buzz about it in the HPC community but haven't had a moment to play with it yet.


Yes, there's been some scientific workloads that used Julia/MPI to great effect. I don't think the paper is out of review yet, so probably can't link to details, but expect news on that front very soon.


I have. I used it for all of the research I did during my PhD over the last few years. Some of my Julia code ran on Titan.


"The importance of data and speed to information can't be understated in the capital markets ..."

That says, it's less important than you could possibly state.


"Lovas adds that currently, the de facto standard numerical programming language when talking to banks and other finance firms is Python/NumPy."


I doubt that. It's certainly the trend and a lot of new code is produced in Python. But most models will run on R/Matlab/C++/C#. I'd guess that most quant departments have less than 10% of their code in Python.

But Python is the modern way to go, so every shop stresses that they use it and they expect every new hire to master it.


... master it instead of C++ or in addition to C++?


In addition, although that's often not possible. You need people who can write C++ to maintain code, but they're hard to find because few people with a finance background have any experience with C++. So companies often end up developing in Python and then passing the code to developers who port it to C++ for speed and to fit it into the existing framework. If that two-step process could be replaced with Julia, it would potentially save a lot of money.


So how come not golang instead of Julia? I don't have anything against Julia myself but it seems like go with its maturity, easy cross compilation, static compilation and easy IPC/multithreading it would fit the bill quite well. Granted the GC is a bit slow but should be improved with the next release.


Golang is far far inferior in many ways, for scientific programming. For starters, it doesn't have generics.


Given Julia's pre-1.0 status, general immaturity and good Python interop wouldn't a saner approach be to use Python as your primary language and Julia for whatever needs to be fast?


Just like pandas in Python, I wanted a quick implementation in Go for my backtesting strategy system. It is now open source on GitHub: https://github.com/qingtiandalaoye/GoDataframe

Could anyone run a performance test on it? Many thanks.


It's almost the perfect language. The deal-breaker for me was the choice to count from 1 (yes, I know there are arguments for counting from 1, but no other language I use does that).


Why limit yourself to zero or one? Julia now lets you use any start point you'd like. A fairly recent change, so you might have missed it.

http://docs.julialang.org/en/latest/devdocs/offset-arrays/
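For readers unfamiliar with the idea, here is a toy Python sketch of an array with an arbitrary first index, in the spirit of Julia's offset arrays. The `OffsetList` class is hypothetical, not a port of the Julia package, and only supports reads, writes, and iteration.

```python
class OffsetList:
    """A list wrapper whose valid indices run from `first` to `first + len - 1`."""

    def __init__(self, items, first: int = 1):
        self._items = list(items)
        self.first = first

    @property
    def last(self) -> int:
        return self.first + len(self._items) - 1

    def _pos(self, i: int) -> int:
        # Translate a user-facing index into a 0-based position, rejecting
        # anything outside [first, last] instead of wrapping like Python's -1.
        if not self.first <= i <= self.last:
            raise IndexError(f"index {i} outside {self.first}:{self.last}")
        return i - self.first

    def __getitem__(self, i: int):
        return self._items[self._pos(i)]

    def __setitem__(self, i: int, value) -> None:
        self._items[self._pos(i)] = value

    def __len__(self) -> int:
        return len(self._items)

    def __iter__(self):
        return iter(self._items)

    def indices(self) -> range:
        return range(self.first, self.last + 1)

# Monthly values indexed by month number 1..12 rather than 0..11
monthly = OffsetList([10, 20, 30] + [0] * 9, first=1)
print(monthly[1], monthly[12])  # 10 0
# A symmetric filter window indexed from -2 to 2
window = OffsetList([0.1, 0.2, 0.4, 0.2, 0.1], first=-2)
print(window[0], list(window.indices()))  # 0.4 [-2, -1, 0, 1, 2]
```

Code that loops over `indices()` instead of hard-coding 0 or 1 works for any offset; code that hard-codes a start index is where the bugs creep in.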


The problem is that I maintain other people's code, and I don't want the language to add all those extra things. It's a big cognitive load to maintain, and leads to bugs in production software that is hard to debug.

The julia docs mention that Fortran arrays can be indexed with various starting numbers. I've programmed and maintained millions of lines of Fortran over 20 years and I've never seen anybody use anything other than 0 as the starting index.


Fortran, Mathematica, Matlab, R, Lua - all start indexing from 1 (you can find an incomplete list in [1]). For users coming from these languages (which is a large part of the Julia community) the cognitive load would be in the opposite direction.

[1]: https://en.wikipedia.org/wiki/Comparison_of_programming_lang...


I suspect the number of C++/Python or Python/Fortran programmers (where users use 0-based indexing) is larger than Fortran + Mathematica + Matlab + R + Lua.

Anyway, I'm not defending my decision as a globally correct one; it only matters to me.


I've never used Julia but it reminds me of Go in that it attempts to replace Frankenstein systems that mix low level languages and high level languages/scripts, with a single modern, high level language with sufficient low level features and performance.

However, in order to do so, they have to give up the generality of either language, and focus on a particular domain (numerical computation and web servers respectively). And because of this, the language never develops a big enough ecosystem. Because of this, I think Python combined with native C libraries is ultimately the better way forward and will be more successful.


Someone explain to me why we need a new language here?

If they are trying to express ideas - wouldn't any language do just fine?

They can 'compile' in an 'optimized manner' using something else ...

But in terms of algorithmic expression?

Unless they are writing true 'real time' stuff right for the chip ... why a new syntax?


(1) If every programming language were equally good at everything, we would not have a multitude of programming languages. And there are always advances in programming language theory and concepts which have not been captured in earlier languages.

(2) The semantics of the language put restrictions on what sort of optimizations you can do or how easy they are to perform. Julia has been designed to make it very easy to optimize. E.g. something like JavaScript runs quite fast, but only through a herculean effort by the JIT makers. Its design is terrible with respect to making it easy to create high-performance compilers/JITs.

(3) What is "new" syntax? All languages have variations in syntax. Wherever Julia differs in syntax from whatever you know, it isn't just for the hell of it; there are usually good reasons. Mostly Julia tries to look similar to programming languages familiar to its target group. Where it differs, I'd say it is usually to improve on deficiencies in said languages.

I am an iOS developer but I write smaller scripts to do custom processing of source code, assets etc. I wrote my scripts in Julia first, but decided to convert to Python afterwards to make it easier to distribute to other people, given that Julia does not come pre-installed.

I've got to say Julia is WAY WAY nicer to program than Python for shell-script-like stuff. I hit so many issues when converting to Python that I kept a long list of them. My takeaway from this experience is that a LOT has happened since Python, Ruby, etc. were conceived. Julia has been able to tap into all the advances that have happened since then. It has a much smoother REPL environment, a much more sensible set of built-in functions, and saner naming conventions. And the libraries are built to combine in a much better way. You simply need to write much less code in Julia because things combine so well, while in Python I find myself having to write a lot of glue code to make pieces fit together.

So the question is, why stick to a language just because it is well established. I had many more years of experience with Python. But I get the job done much faster with Julia. So why waste extra time on Python just because it is established? That has no inherent value.


"So the question is, why stick to a language just because it is well established."

You answered the question yourself:

"but decided to convert to Python afterwards to make it easier to distribute to other people"

Languages are platforms. They have vast distribution, incumbency, developers, apis etc. - and that has a lot of value.

Imagine if you did dev on Julia - how do you hire devs?

Julia has a very, very small mind-share - in order for it to reach 'platform status' it really has to cross a threshold, and for that - it has to provide significantly more value.

I'm also curious to know 'how much faster' it is - couldn't those performance problems be solved with more horsepower?

Developers have a terrible habit of over-optimizing for performance, which usually comes at the cost of complexity.

Anyhow - I know little about Julia so I'll hold my tongue, maybe it's the bees-knees ...

It'd be great to see some data points.


> Imagine if you did dev on Julia - how do you hire devs?

Julia doesn't take long to learn if you've ever written any Python, or Matlab, or R. It's very good at calling into existing code written in other languages (the above, and C) so it's possible to transition gradually.

> I'm also curious to know 'how much faster' it is - couldn't those performance problems be solved with more horsepower?

Depends on the exact type of computation. If everything you're doing is straight BLAS/LAPACK or other library calls, then Python is just glue code and not taking much of the time. If you're implementing your own custom algorithms (or writing libraries) that are difficult to express in a vectorized way (where numpy just does the work in C) or you want more customized data structures and types than dense arrays of single or double precision floating point numbers, Julia will do well. There are various projects out there that try to make Python faster (PyPy, Numba, Pyston, Pyjion, the list seems to grow every few months) but it's a very difficult problem to do in full generality since you either have to restrict yourself to a subset of the language where the semantics permit optimizations, or lose compatibility with all the C-extension libraries you're used to. Julia has designed the language semantics and the conventions of the standard library to be more amenable to optimization. It's early days yet in terms of multithreading support, but Julia has a better story there as well.
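To make the vectorization point concrete, here is a minimal Python sketch (the function names and the EWMA example are illustrative, not taken from the comment). The first function treats each element independently, so with NumPy it collapses to a single array expression evaluated in C. The second is path-dependent: each output needs the previous output, which is the kind of custom loop that stays in the interpreter in plain Python.

```python
from typing import List, Optional, Sequence


def scale_prices(prices: Sequence[float], factor: float) -> List[float]:
    # Each output depends only on its own input, so with NumPy this is one
    # vectorized expression (np.asarray(prices) * factor) that runs in C.
    return [p * factor for p in prices]


def ewma(prices: Sequence[float], alpha: float) -> List[float]:
    # Exponentially weighted moving average:
    #   y[0] = x[0];  y[t] = alpha * x[t] + (1 - alpha) * y[t-1]
    # Each output depends on the previous output, so there is no single
    # element-wise NumPy expression for it. In plain Python this loop runs
    # step by step in the interpreter (pandas' `ewm` ships a compiled
    # routine for this particular case, but custom recurrences have none).
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must be in (0, 1]")
    out: List[float] = []
    prev: Optional[float] = None
    for p in prices:
        prev = p if prev is None else alpha * p + (1.0 - alpha) * prev
        out.append(prev)
    return out
```

For example, `ewma([1.0, 2.0, 3.0], 0.5)` returns `[1.0, 1.5, 2.25]`.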

For a representative performance comparison case study, see section 4 of https://arxiv.org/pdf/1312.1431v1.pdf. There have been similar experiences reported and published in other application areas since.


> They can 'compile' in an 'optimized manner' using something else ...

That's actually the issue Julia attempts to solve, as highlighted in the article. There are many languages you can use to express ideas, but when you need the idea to run as fast as possible, you have to rewrite it in C/Fortran/etc. and then add bindings to the language.

Julia lets you write inline C, which I believe means you can still express algorithms simply, with some parts (trivially) optimized. (Not a huge Julia user, but that's my take on it.)


Whilst Julia's foreign function interface is indeed good and it is really easy to call into C, the point is that Julia itself is as fast as C. So you don't need to write any C code to get performance, instead just tune the bottlenecks in Julia itself.

For instance, the standard library of Julia [1] is written in Julia itself (and is very performant) and only calls into external C or Fortran libraries where there are well established code-bases (e.g. BLAS, FFTW). Compare this to, e.g., Python or R where much of the standard library is written in C.

[1] https://github.com/JuliaLang/julia/tree/master/base


I understand that - but wouldn't a really smart compiler be able to do the same thing?

And if you have to be able to write 'some parts' in C for speed ... then why not just use C? And have nice libs for whatever you are doing?

Or intelligently distribute it across processes?

I find it hard to believe that the 'financial industry' has performance requirements that have never been encountered in the entire rest of high tech before ...


> wouldn't a really smart compiler be able to do the same thing?

Well, compilers are tied to the language they compile, so language choice still matters. As an extreme example, GHC can be much more aggressive with inlining because Haskell is pure, whereas most other languages don't provide the same guarantees.

> And if you have to be able to write 'some parts' in C for speed ... then why not just use C? And have nice libs for whatever you are doing?

Yes, I think they could do that, but then they'd need people to write and maintain those libraries. My understanding is they want to find something which strikes a better balance between productivity and speed. Having to write your own libraries to do everything is likely considered a productivity loss.


Ah, I see. I'm no expert, but I'd argue the response is "Maybe".

I think this is what Blaze and PyPy try to do with Python: they compile it so that certain things are optimized. The harder part is that you generally have to give the compiler hints for it to do its best...

As for why not just use C: Python/R/Julia are MUCH easier to prototype in than C (in my opinion, having used those languages). The "grail" is to be able to prototype quickly, then identify the hotspots and speed them up (which, in Julia's case, would just mean inlining C code).
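The prototype-then-find-hotspots workflow can be sketched with Python's standard-library profiler. This is a minimal illustration under assumed names: `prototype` and `slow_sum_of_squares` are hypothetical stand-ins for real code, and `cProfile`/`pstats` are used only to show where the time goes (Julia ships its own `Profile` module for the same job).

```python
import cProfile
import io
import pstats


def slow_sum_of_squares(n: int) -> int:
    # Hypothetical loop-heavy function standing in for a real hotspot.
    total = 0
    for i in range(n):
        total += i * i
    return total


def prototype(n: int) -> int:
    # Hypothetical prototype: some cheap setup plus one expensive call.
    data = list(range(10))
    return sum(data) + slow_sum_of_squares(n)


def profile_hotspots(n: int, top: int = 5) -> str:
    # Profile one run of the prototype and return the `top` entries,
    # sorted by cumulative time, as a text report.
    profiler = cProfile.Profile()
    profiler.enable()
    prototype(n)
    profiler.disable()
    buf = io.StringIO()
    stats = pstats.Stats(profiler, stream=buf)
    stats.sort_stats("cumulative").print_stats(top)
    return buf.getvalue()


if __name__ == "__main__":
    print(profile_hotspots(200_000))
```

Sorting by cumulative time puts `prototype` and the loop-heavy `slow_sum_of_squares` near the top of the report, which marks the function worth optimizing in place or rewriting in a faster language.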


One of the ideas behind Julia is to get a language that is as expressive as Python, with better functional programming capability, that runs at the same order of speed as C.

You can't get something for nothing, but it's a really good start.


[flagged]


We've asked you already to comment civilly and substantively or not at all. Please stop posting like this.


deja vu


Just like pandas in Python, I wanted a quick implementation in Go for my backtesting strategy system.

It is now open source on GitHub: https://github.com/qingtiandalaoye/GoDataframe



