Hacker News
Accelerate Excel – 100x faster spreadsheets (datanitro.com)
67 points by vj44 on July 31, 2014 | 46 comments



This is really cool, and I have to say I'm impressed, but is Excel speed a common problem for people?

Are you trying to monetize this? Who is your target market? I'm genuinely curious.

I work at $BIGCORP in analytics with a lot of large Excel spreadsheets every day. If I bump up against row limits or need functionality not available in Excel, I'll import the data into a corporate Oracle DB and manipulate it with SQL. A lot of the people I collaborate with can only use Excel, and it's often faster to just manipulate data there rather than put it into a real DB or parse it with another tool.

With modern computers, I can tell you it's a very rare occurrence that I cause Excel to choke. On the order of once a quarter.


Most people who do heavy analytics in Excel don't have the option of importing the data into a DB and using SQL, either because they aren't familiar with SQL or because they have to send the spreadsheet itself (and not just the result of the calculation) to someone who can only work in Excel.

Right now we're focusing on building the software over monetization - the beta is free and will be for a while. Our target market is Excel-based analysts. They're found in a lot of sectors, finance and ad-tech being two major ones.


> is Excel speed a common problem for people?

There are people who will pay for any amount of speed in a spreadsheet. These people buy the fastest PCs they can to run overnight spreadsheets that involve sloshing millions of dollars around. Large chunks of the world's financial industries run in Excel macros. (Be very afraid.)


Ah, yes, I could see various FIs being a fantastic market for this.

To really get entrenched, they could set serious enterprise pricing, build case studies, do enterprise sales, etc., and make some serious money with the right sales/marketing.


I make Excel choke regularly. For example, copy-pasting a complicated VLOOKUP down 1 million rows takes forever.


Combining =INDEX() and =MATCH() seems to be less computationally intensive than =VLOOKUP() -- give it a shot.
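For intuition on why that tip can matter, here's a toy Python sketch (purely illustrative, nothing to do with Excel's internals): an exact-match VLOOKUP behaves like a linear scan down the column, while MATCH over a sorted column can behave like a binary search. In real Excel the speedup depends on the data and the match mode; this just shows the scan-vs-search difference.

```python
import bisect

# Toy model of a sorted lookup column with 500,000 rows.
keys = list(range(0, 1_000_000, 2))

def vlookup_style(key):
    # Exact-match VLOOKUP analogue: linear scan, O(n).
    for i, k in enumerate(keys):
        if k == key:
            return i
    return None

def match_style(key):
    # MATCH-on-sorted-data analogue: binary search, O(log n).
    i = bisect.bisect_left(keys, key)
    if i < len(keys) and keys[i] == key:
        return i
    return None

# Both find the same row; the second does it in ~20 comparisons
# instead of up to 500,000.
assert vlookup_style(123456) == match_style(123456)
```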

I've spent entire days copying formulas down 5,000 rows (by 20-50 columns): copy, paste values, repeat. Can't wait to try this out; I just signed up for the beta.

Another thought -- as someone who runs Windows in a VM on my Mac almost purely for Excel, this could let me go 100% Mac. The biggest downfall of the Mac version of Excel seems to be a computation engine that's far behind the Windows version.


Why not just dump all table...?


> but is Excel speed a common problem for people?

Anyone who sets Excel to "manual calculation" mode to avoid pauses whenever they make a change in a big spreadsheet might be interested in this. If this engine removes the need for that, it could be a pretty big deal. It'll definitely make things easier for me when I'm working with material testing data.

But like you said, how do you market it? How do you reach the people who need it? I suspect that for most people Excel is just a tool, and they're not involved in Excel communities or Excel culture (if such things exist) in a way that would let them discover something like this.


I don't think I even work with very large Excel sheets, but our massive master sheet of 1M+ rows (with about 15 columns) regularly causes Excel to sputter and die.

I'd rather not use it, but anything to make it work better would be awesome.


> We've rewritten Excel's computational engine from scratch. Our hand-optimized code blows Excel's native engine out of the water.

Does the rewrite incorporate all of Excel's important native functions? (e.g. the ones that would most likely be used with 10^5 rows of data)


Yes, all native Excel functions are re-implemented.


I used to write spreadsheets that were 50mb and sometimes took a few minutes to compute (native computational stuff, not macros and not circular references).

I'd be very interested to know some generalised (non-IP) background on how you've achieved this speed increase.

Also, under what circumstances is this speed increase achievable? How have you measured the 100x?


We got the speed increase primarily by dropping to low-level code (NumPy, and sometimes C) in functions that cause bottlenecks. Multi-threading helps too.

The largest gains are in the scenario you mentioned - big spreadsheets with a lot of computation. Excel tends to choke on these.
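For the curious, the kind of gain they're describing can be sketched in plain NumPy. This is only an illustration of vectorizing a column formula, not DataNitro's actual code: evaluating a formula like =A1*1.05+B1 cell by cell in an interpreter versus in one vectorized pass.

```python
import time

import numpy as np

# A million-row column pair, like columns A and B in a big sheet.
rng = np.random.default_rng(0)
a = rng.random(1_000_000)
b = rng.random(1_000_000)

# Cell-by-cell, the way a naive engine might evaluate each formula:
t0 = time.perf_counter()
slow = [a[i] * 1.05 + b[i] for i in range(len(a))]
t_loop = time.perf_counter() - t0

# Vectorized: one pass through optimized C under the hood.
t0 = time.perf_counter()
fast = a * 1.05 + b
t_vec = time.perf_counter() - t0

# Same answers, very different running times.
assert np.allclose(slow, fast)
print(f"loop: {t_loop:.3f}s  vectorized: {t_vec:.4f}s")
```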


There's also a huge number of very, very badly built Excel-based "applications" that could benefit from even a modest 2x speed increase.


I see you guys have been exploring the spreadsheet world for a while. Have you had a look at financial consolidations? The de facto software for this is Hyperion, and it's terrible. Also, nothing I know of lets you build complex web reports. Another "consolidation" tool is BPC, which is another nightmare to maintain. Everything I use in my controlling job makes you waste literally days a month due to badly implemented UI, poor functionality, and overall slowness (even BusinessObjects/HANA is not that fast...).


Check out BlackLine Systems -- a fantastic SaaS solution for account reconciliation and other parts of the financial close process.


I am a heavy, heavy Excel user. I use it to help run a large startup.

100x faster would be a huge win. There's always a tradeoff between design/human cycles and speed. You could even think of it as Excel technical debt. Legacy spreadsheets never get rebuilt/re-engineered because of a lack of human time. That's a large cause of spreadsheets that become unmanageable.

Two questions around this:

- How do we know the data is secure? (The IT team will want to know this for sure.)

- How do we know the computations are correct?


The calculations run locally, so your data is secure since it doesn't go anywhere.

The easiest way to check correctness is to compare what Excel returns to what our engine returns. If there's no discrepancy, you can rely on our calculations from then on.

(If there is one, let us know and we'll fix it! We've tested it internally, and part of the goal of the beta is to make sure things hold up in wider use.)
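The comparison they suggest is easy to automate. Here's a hypothetical helper (not DataNitro's API; the function name and signature are made up for illustration) that checks two engines' outputs cell by cell, tolerating tiny floating-point differences:

```python
import math

def engines_agree(excel_vals, engine_vals, rel_tol=1e-9):
    """Compare two engines' outputs for the same cells.

    Returns True if every pair of values matches to within
    rel_tol, so harmless roundoff differences don't count
    as discrepancies.
    """
    if len(excel_vals) != len(engine_vals):
        return False
    return all(
        math.isclose(a, b, rel_tol=rel_tol)
        for a, b in zip(excel_vals, engine_vals)
    )

# A sub-rounding-error difference passes; a real one fails.
assert engines_agree([1.0, 2.5], [1.0, 2.5 + 1e-12])
assert not engines_agree([1.0, 2.5], [1.0, 2.6])
```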


Data held locally == secure: not so much. Do you carry your machine through customs? Do you leave it on your desk at night? Do you leave it in your home (not locked in a safe)? More nightmarish: violence and threats can be applied to individuals with relatively low risk.

Data held on individual responsibility is not secure. To secure it you need an auditable monitored secured database at an alarmed secured location.

Now, bad men may still be able to get at data held by organizations or communes (as sysadmins and security guards are just as worried about car bombs as you and me), but it becomes risky in a way that nabbing someone, extracting their passwords, copying their data, shooting them, and burning down their house is not. This is why individuals have long chosen to deposit valuables with trusted third parties rather than keeping them under their beds, despite the marginal cost of doing so.

Excel spreadsheets are fundamentally hard to debug, in the sense that understanding the organization of the code is very hard (see "GOTO considered harmful"). This normally doesn't affect people, because spreadsheets tend to be relatively simple; the problem is that as they become useful and important, they tend to become heavily used and more and more complex.

[[EDIT - I just reread this and realized I wrote it as a first person/personal set of statements, and I had no intention of aiming it at any person (especially not the parent author) and so I rewrote it to be impersonal and hopefully not inflammatory or unpleasant]].


Could you please post some benchmarks? I'm really curious.


If it's not proprietary, could you explain how you force Excel to use your computation engine rather than its own internal one?


We just turn off Excel's computations and then run our own.


Do you have any more information on the correctness of the calculations?


The underlying formulas are mathematically equivalent to the ones Excel uses.

They may not match Excel precisely when it comes to numerical issues (things like roundoff error may come out differently, for example).
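That roundoff point is easy to demonstrate: floating-point addition isn't associative, so two mathematically equivalent engines can legitimately differ in the last bits just by summing in a different order. A quick Python illustration (nothing to do with either engine's internals):

```python
# Summing the same three numbers in a different order changes
# the result, because each intermediate sum is rounded to the
# nearest representable double.
xs = [1e16, 1.0, -1e16]

left_to_right = (xs[0] + xs[1]) + xs[2]  # the 1.0 is absorbed: 0.0
reordered     = (xs[0] + xs[2]) + xs[1]  # cancellation first:  1.0

print(left_to_right, reordered)  # 0.0 1.0
```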


This looks really cool.

My mum (not me, unfortunately for you!) works in accountancy, and the entire industry is spreadsheet-driven. IT bods on HN might not realise this, but the accounting department of nearly every BigCo that isn't a development firm basically relies entirely on Excel.

I've already signed up (tom at cishub dot co dot uk), but the key point here for me is how easy it's going to be for her -- a non-technical accountant, but an Excel expert -- to integrate it with her existing spreadsheets.

Also, whether it'll require modifying existing spreadsheets and/or ways of working to take advantage of it.

Definitely one to watch, could be a ridiculous moneyspinner if you get it right.


How much creative potential do you have? Are you limited to Excel types or could you extend it arbitrarily?

VBA is a kludge, but I've always thought that spreadsheets might be an interesting programming environment if you were restricted to the native functionality with one small addition. Namely, add a new value type, "anonymous function": a set of parameters with an Excel-native body, which can be assigned to a cell. You could then call the function by referring to the cell it's stored in (or any other way you can get a reference to it) and passing parameters. Function naming would work through named ranges.
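The idea above is easy to prototype outside Excel. A toy Python sketch of a grid whose cells can hold either plain values or anonymous functions callable from other cells (all names here are invented for illustration, not a real spreadsheet API):

```python
# Toy grid: a dict from cell address to value. A cell's value may
# itself be a function, which other cells can look up and call.
cells = {}

cells["A1"] = 10
cells["A2"] = 32

# B1 holds an "anonymous function" value, as proposed above.
cells["B1"] = lambda x, y: x + y

# C1's formula calls the function stored in B1, passing other
# cells as arguments -- analogous to =B1(A1, A2) in the proposal.
cells["C1"] = lambda: cells["B1"](cells["A1"], cells["A2"])

print(cells["C1"]())  # 42
```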


Where does DataNitro run - is it an add-in, is it a desktop program that operates on external Excel files, or are the files uploaded to, and processed on your site? If it's an add-in, is it Windows-only (COM)?


It all runs locally; Excel files are not sent anywhere. No COM is used (god forbid!) for the engine. We have a small plugin for a nice UI in Excel, but the engine can be run outside of Excel as well.


Cool. Limited exit strategy, though -- you either sell to Microsoft, or else what? I guess if the price is reasonable, it makes sense.


I'm sitting here waiting on a spreadsheet to calculate so this interests me greatly.

However, I will say that if your spreadsheets won't finish calculating, you're probably not using the proper tool for the job. I've inherited the ones I'm using and hope to simplify them as soon as I can figure out exactly what they're doing.


You're right, but people in that situation don't always have the luxury of rewriting the models or moving to a different tool. Hopefully we can help in those cases.


I'd be more interested in seeing a CUDA implementation. If they were able to get 100x baseline on the CPU, I'd expect a GPU implementation to be at least 1000x baseline, as most tasks (row additions, multiplication, etc.) are "infinitely" parallelizable.


I want this; I've been wanting this for the past four releases of Excel. Excel is a great tool for engineers. It gets the subpar results fast, while code, MATLAB, and DBs get the fantastic results slowly -- and I hardly ever need the fantastic results.


Excel had major issues with statistical calculations prior to Excel 2010, and a few issues remain even post-2010. Have you taken this into account and corrected any of them, just duplicated the behaviour, or done something else?


We produce the correct result rather than duplicating buggy behavior.


Kudos on the speedup, but can you guarantee that the calculations are bug-free?


Once again, Excel users on the Mac are unable to get all of the cool toys.


Our engine can be run independently, outside of Excel, and it works on the Mac too (albeit without the nice UI in Excel).


This may mean I can finally ditch Windows in a VM and just use Mac Excel exclusively!


Hopefully this will be the nail in the coffin of "big data".


What's this written in?


One of the founders here -- it's written in a combination of C, Python, and NumPy. We had to go low-level to make it fast, but it's still portable.


Doesn't having it in Python mean that its memory requirements are moderately large in comparison to pure C++ code?


We kept that in mind -- the actual core is written in C; Python is used mainly for auxiliary jobs (parsing, etc.).


We took "Show HN" out of the title because "Show HN" is for things that people can try out now: https://news.ycombinator.com/showhn.html.


No source code? How are we supposed to make any use of this? What is the point of an email signup?



