
Accelerate Excel – 100x faster spreadsheets - vj44
http://engine.datanitro.com/
======
seestheday
This is really cool, and I have to say I'm impressed, but is excel speed a
common problem for people?

Are you trying to monetize this? Who is your target market? I'm genuinely
curious.

I work at $BIGCORP in analytics with a lot of large excel spreadsheets every
day. If I bump up against row limits or need functionality not available in
Excel, I'll import into a corporate Oracle DB and manipulate it with SQL. A lot
of the people that I collaborate with can only use Excel and it's often faster
to just manipulate data there rather than put it into a real db or parse it
with another tool.
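The workflow above (pushing sheet data into a real database and aggregating it with SQL) can be sketched with Python's built-in `sqlite3` module standing in for the corporate Oracle instance. This is a minimal sketch under assumptions: the `sales` table, its columns and the sample rows are hypothetical.

```python
import sqlite3

# Hypothetical rows as they might be exported from a sheet: (region, product, revenue)
rows = [
    ("East", "Widget", 120.0),
    ("West", "Widget", 80.0),
    ("East", "Gadget", 50.0),
    ("West", "Gadget", 70.0),
    ("East", "Widget", 30.0),
]

def revenue_by_region(rows):
    """Load rows into an in-memory SQLite table (a stand-in for a corporate DB)
    and aggregate with SQL instead of spreadsheet formulas."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE sales (region TEXT, product TEXT, revenue REAL)")
        conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)
        cur = conn.execute(
            "SELECT region, SUM(revenue) FROM sales GROUP BY region ORDER BY region"
        )
        return cur.fetchall()
    finally:
        conn.close()

print(revenue_by_region(rows))  # [('East', 200.0), ('West', 150.0)]
```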

With modern computers I can tell you it is a very rare occurrence that I cause
excel to choke. On the order of once a quarter.

~~~
mistermcgruff
I make Excel choke regularly. For example, copy-pasting a complicated vlookup
down 1 million rows takes forever.

~~~
bake
Combining =index() and =match() seems to be less computationally intensive
than =vlookup() -- give it a shot.
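One commonly given explanation for this tip: when several columns are pulled for the same key, a single MATCH can find the row once and be reused by several INDEX calls, whereas each separate VLOOKUP repeats the search. The Python sketch below models that difference with a linear exact-match scan; it is a simplified illustration, not how Excel actually implements either function, and the function names are hypothetical.

```python
def match(key, column):
    """Exact-match MATCH: linear scan returning the 0-based row, or None."""
    for i, value in enumerate(column):
        if value == key:
            return i
    return None

def vlookup_style(key, table, col_indices):
    """One search per requested column, as with separate VLOOKUP formulas."""
    key_col = [row[0] for row in table]
    results = []
    for c in col_indices:
        i = match(key, key_col)  # the same scan is repeated for every column
        results.append(None if i is None else table[i][c])
    return results

def index_match_style(key, table, col_indices):
    """One MATCH whose row number is reused by several INDEX lookups."""
    key_col = [row[0] for row in table]
    i = match(key, key_col)  # scan once...
    if i is None:
        return [None] * len(col_indices)
    return [table[i][c] for c in col_indices]  # ...then index directly

table = [("a", 1, 10), ("b", 2, 20), ("c", 3, 30)]
print(vlookup_style("b", table, [1, 2]))      # [2, 20]
print(index_match_style("b", table, [1, 2]))  # [2, 20]
```

Both return the same values; the difference is that the first performs one scan per column and the second performs one scan in total.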

I've spent entire days copying formulas down 5000 rows (x20-50 cols): copy,
paste-values, and repeat. Can't wait to try this out; just signed up for the
beta.

Another thought -- as someone who runs Windows in a VM on my Mac almost purely
for Excel, this could enable me to go 100% Mac. The biggest downside of Mac
Excel seems to be a computation engine that's far behind the Windows
version.

------
minimaxir
> _We've rewritten Excel's computational engine from scratch. Our
> hand-optimized code blows Excel's native engine out of the water._

Does the rewrite incorporate all of Excel's important native functions? (e.g.
the ones that would most likely be used with 10^5 rows of data)

~~~
vj44
Yes, all native Excel functions are re-implemented.

------
JonoBB
I used to write spreadsheets that were 50 MB and sometimes took a few minutes
to compute (native computational stuff, not macros and not circular
references).

I'd be very interested to know some generalised (non-IP) background on how
you've achieved this speed increase.

Also, under what circumstances is this speed increase achievable? How have you
measured the 100x?

~~~
karamazov
We got the speed increase primarily by dropping to low-level code (numpy, and
sometimes C) in functions that cause bottlenecks. Multi-threading helps too.
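The approach described here (move hot loops into array or C code, then split the work across threads) can be sketched as follows. This is a hypothetical illustration, not DataNitro's code: it evaluates one formula column such as `=A*B+C` in row chunks on a thread pool. For self-containment the per-chunk kernel is plain Python, which CPython's global interpreter lock (GIL) prevents from running in parallel; a real speedup needs a kernel that releases the GIL, as NumPy and C extensions can.

```python
from concurrent.futures import ThreadPoolExecutor

def kernel(a, b, c):
    """Per-chunk kernel for the formula =A*B+C.
    In a real engine this is where NumPy or C code would go."""
    return [x * y + z for x, y, z in zip(a, b, c)]

def eval_column(a, b, c, chunk_size=1000, workers=4):
    """Evaluate the formula for every row by splitting the rows into chunks
    and handing them to a thread pool; results come back in row order."""
    starts = range(0, len(a), chunk_size)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        parts = pool.map(
            lambda s: kernel(a[s:s + chunk_size], b[s:s + chunk_size], c[s:s + chunk_size]),
            starts,
        )
        # Executor.map yields results in input order, so chunks re-join correctly
        return [v for part in parts for v in part]

n = 2500
result = eval_column(list(range(n)), [2] * n, [1] * n)
print(result[:3], result[-1])  # [1, 3, 5] 4999
```

Chunking matters because a per-cell task would spend more time on scheduling than on arithmetic.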

The largest gains are in the scenario you mentioned - big spreadsheets with a
lot of computation. Excel tends to choke on these.

------
kfk
I see you guys have been exploring the spreadsheet world for a while. Have
you had a look at financial consolidation? The de facto software for this is
Hyperion and it is terrible. Also, nothing I know of lets you build complex
web reports. Another "consolidation" tool is BPC, which is another nightmare
to maintain. Everything I use in my controlling job makes you waste
literally days a month due to badly implemented UI, missing functionality and overall
slowness (even Business Objects/HANA is not that fast...).

~~~
bake
Check out BlackLine Systems -- a fantastic SaaS solution for account
reconciliation and other parts of the financial close process

------
joez
I am a heavy heavy Excel user. I use it to help run a large startup.

100x faster would be a huge win. There's always a tradeoff between
design/human cycles and speed. You could even think of it as Excel technical
debt. Legacy spreadsheets never get rebuilt/re-engineered because of a lack of
human time. That's a large cause of spreadsheets that become unmanageable.

2 questions around this:

- How do we know the data is secure? (the IT team will want to know this for sure)
- How do we know the computations are correct?

~~~
karamazov
The calculations run locally, so your data is secure since it doesn't go
anywhere.

The easiest way to check correctness is to compare what Excel returns to what
our engine returns. If there's no discrepancy, you can rely on our
calculations from then on.

(If there is one, let us know and we'll fix it! We've tested it internally,
and part of the goal of the beta is to make sure things hold up in wider use.)
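A minimal way to run the comparison described above, assuming both sets of results have already been exported as dictionaries mapping cell addresses to values (the export step and the `find_discrepancies` helper are hypothetical): compare cell by cell, with a tolerance for numbers, since floating-point results can legitimately differ in the last few digits.

```python
import math

def find_discrepancies(excel_values, engine_values, rel_tol=1e-9, abs_tol=1e-12):
    """Return (cell, excel, engine) for every cell whose values disagree.
    Numbers are compared with a tolerance; anything else must match exactly.
    A cell present in only one input is reported with None for the other."""
    problems = []
    for cell in sorted(set(excel_values) | set(engine_values)):
        x = excel_values.get(cell)
        y = engine_values.get(cell)
        if isinstance(x, (int, float)) and isinstance(y, (int, float)):
            ok = math.isclose(x, y, rel_tol=rel_tol, abs_tol=abs_tol)
        else:
            ok = x == y
        if not ok:
            problems.append((cell, x, y))
    return problems

excel = {"A1": 0.1 + 0.2, "A2": 42.0, "B1": "ok"}
engine = {"A1": 0.3, "A2": 41.0, "B1": "ok"}
print(find_discrepancies(excel, engine))  # [('A2', 42.0, 41.0)]
```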

~~~
sgt101
Data locally == secure : not so much. Do you carry your machine through
customs? Do you leave it on your desk at night? Do you leave it in your home
(not locked in a safe)? More nightmarish... violence and threats can be
applied to individuals with relatively low risk.

Data held on individual responsibility is not secure. To secure it you need an
auditable monitored secured database at an alarmed secured location.

Now, bad men may still be able to get at data held by organizations or
communes (as sysadmins and security guards are just as worried about car bombs
as you and I), but it becomes risky in the way that nabbing someone,
extracting their passwords, copying their data, shooting them and burning down
their house is not. This is why individuals have long chosen to deposit
valuables with trusted third parties rather than keeping them under their beds,
despite the marginal cost of doing so.

Excel spreadsheets are fundamentally hard to debug, in the sense that
understanding the organization of the code is very hard (see "GOTO considered
harmful"). This normally doesn't impact people because spreadsheets tend to be
relatively simple; the problem is that as they become useful and important,
they tend to become heavily used and more and more complex.

[[EDIT - I just reread this and realized I wrote it as a first person/personal
set of statements, and I had no intention of aiming it at any person
(especially not the parent author) and so I rewrote it to be impersonal and
hopefully not inflammatory or unpleasant]].

------
TheAlchemist
Could you please post some benchmarks? I'm really curious.

------
SeanDav
If it is not proprietary, please explain how you force Excel to use your
computation engine, rather than its own internal one?

~~~
karamazov
We just turn off Excel's computations and then run our own.
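Running your own calculations means re-evaluating formula cells in dependency order, so that every cell is computed after the cells it references. A toy sketch of that idea, using the standard library's `graphlib` (Python 3.9+) to topologically sort the cells; representing formulas as Python callables with explicit dependency lists is a hypothetical simplification, not how DataNitro parses real Excel formulas.

```python
from graphlib import TopologicalSorter

def recalc(constants, formulas):
    """Evaluate every formula cell after the cells it depends on.

    constants: {cell: value}
    formulas:  {cell: (list_of_dependency_cells, function_of_those_values)}
    A circular reference raises graphlib.CycleError.
    """
    # TopologicalSorter expects {node: predecessors}, i.e. {cell: dependencies}
    graph = {cell: deps for cell, (deps, _) in formulas.items()}
    values = dict(constants)
    for cell in TopologicalSorter(graph).static_order():
        if cell in formulas:  # constant cells appear in the order but need no work
            deps, fn = formulas[cell]
            values[cell] = fn(*(values[d] for d in deps))
    return values

constants = {"A1": 2, "A2": 3}
formulas = {
    "C1": (["B1"], lambda b: b * 10),          # =B1*10
    "B1": (["A1", "A2"], lambda a, b: a + b),  # =A1+A2
}
print(recalc(constants, formulas))  # B1 is 5 and C1 is 50
```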

~~~
Dwolb
Do you have any more information on the correctness of the calculations?

~~~
karamazov
The underlying formulas are mathematically equivalent to the ones Excel uses.

They may not match Excel precisely when it comes to numerical issues (things
like roundoff error may come out differently, for example).
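A concrete example of the roundoff point: two mathematically equivalent ways of adding the same numbers can disagree in the last bits, because floating-point addition is not associative. Python's built-in `sum` adds left to right, while `math.fsum` tracks the lost low-order bits and returns the correctly rounded total.

```python
import math

values = [0.1] * 10

naive = sum(values)        # left-to-right addition accumulates rounding error
exact = math.fsum(values)  # correctly rounded sum of the same inputs

print(naive)                       # 0.9999999999999999
print(exact)                       # 1.0
print(math.isclose(naive, exact))  # True: equal up to roundoff
```

This is why comparisons against Excel's output are best done with a tolerance rather than exact equality.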

------
kalleth
This looks really cool.

My mum (not me, unfortunately for you!) works in Accountancy and the entire
industry is spreadsheet-driven. IT bods on HN might not realise this, but the
accounting department of nearly every BigCo that isn't a development firm
basically relies entirely on Excel.

I've already signed up (tom at cishub dot co dot uk) but the key point here
for me is how easy it's going to be for _her_ -- a non-technical accountant,
but Excel expert -- to integrate it with her existing spreadsheets.

Also, whether it'll require modifying existing spreadsheets and/or ways of
working to take advantage.

Definitely one to watch; could be a ridiculous money-spinner if you get it
right.

------
infogulch
How much creative potential do you have? Are you limited to Excel types or
could you extend it arbitrarily?

VBA is a kludge, but I've always thought that spreadsheets might be an
interesting programming environment if you were restricted to the native
functionality with a small addition. Namely, add a new value type: "anonymous
function," which consists of parameters with an excel-native body, which can
be assigned to a cell. You could then call the function by referring to the
cell it's stored in (or any other way you can get a reference to it) and
passing parameters. Function naming would work by using named ranges.
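The proposal above can be modelled in a few lines of Python, purely as a thought experiment: a cell may hold an ordinary value or a function value, a named range is just a name bound to a cell address, and calling a function means resolving a reference to it and passing parameters. The `Sheet` class and its methods are hypothetical and are not an existing Excel or DataNitro API.

```python
class Sheet:
    """Toy sheet whose cells can hold plain values or function values."""

    def __init__(self):
        self.cells = {}  # address -> value (number, text, or callable)
        self.names = {}  # named range -> address

    def set(self, address, value):
        self.cells[address] = value

    def name(self, name, address):
        """Bind a name to a cell, like an Excel named range."""
        self.names[name] = address

    def ref(self, name_or_address):
        """Resolve a name or address to whatever the cell holds."""
        address = self.names.get(name_or_address, name_or_address)
        return self.cells[address]

    def call(self, name_or_address, *args):
        """Call the function stored in a cell, passing parameters."""
        fn = self.ref(name_or_address)
        if not callable(fn):
            raise TypeError(f"{name_or_address} does not hold a function")
        return fn(*args)

sheet = Sheet()
# A1 holds an "anonymous function" with two parameters; its body uses only
# basic arithmetic, standing in for an Excel-native formula body.
sheet.set("A1", lambda rate, amount: amount * (1 + rate))
sheet.name("GROSS_UP", "A1")
sheet.set("B1", 100)
print(sheet.call("GROSS_UP", 0.25, sheet.ref("B1")))  # 125.0
```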

------
nhebb
Where does DataNitro run - is it an add-in, is it a desktop program that
operates on external Excel files, or are the files uploaded to, and processed
on your site? If it's an add-in, is it Windows-only (COM)?

~~~
vj44
It all runs locally, Excel files are not sent remotely. No COM is used (god
forbid!) for the engine. We have a small plugin for nice UI in Excel, but the
engine can be run outside of Excel as well.

------
bhouston
Cool. Limited exit strategy though - you either sell to Microsoft or else
what? I guess if the price is reasonable it makes sense.

------
Nicholas_C
I'm sitting here waiting on a spreadsheet to calculate so this interests me
greatly.

However, I will say that if your spreadsheets aren't calculating then you're
probably not using the proper tool for the job. I've inherited the ones I'm
using and hope to simplify them as soon as I can figure out exactly what
they're doing.

~~~
karamazov
You're right, but people in that situation don't always have the luxury of
rewriting the models or moving to a different tool. Hopefully we can help in
those cases.

------
pmalynin
I'd be more interested in seeing a CUDA implementation. If they were able to get
100x baseline on the CPU, I'd expect the GPU implementation to be at least
1000x baseline as most tasks are "infinitely" parallelizable (row additions,
multiplication etc.)

------
dammitcoetzee
I want this; I've been wanting it for the past four releases of Excel. Excel
is a great tool for engineers. It gets the subpar results fast, while code,
MATLAB, and DBs get the fantastic results slowly, and I hardly ever need those.

------
SeanDav
Excel had major issues with statistical calculations prior to Excel 2010. Post
2010 there are still a few issues. Have you taken this into account and
corrected any, or just duplicated the behaviour, or some other method?

~~~
karamazov
We produce the correct result rather than duplicating buggy behavior.
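A textbook example of the kind of numerical problem at stake (not a claim about exactly which formula any given Excel version used): the one-pass "sum of squares" variance formula loses its precision when the values are large relative to their spread, while Welford's streaming algorithm stays accurate. The data below are made up for illustration.

```python
def naive_variance(xs):
    """Sample variance via (sum x^2 - (sum x)^2 / n) / (n - 1).
    Subtracting two huge, nearly equal numbers cancels the significant digits."""
    n = len(xs)
    s = sum(xs)
    sq = sum(x * x for x in xs)
    return (sq - s * s / n) / (n - 1)

def welford_variance(xs):
    """Sample variance via Welford's algorithm: update the running mean and
    the sum of squared deviations (m2) one value at a time."""
    mean = 0.0
    m2 = 0.0
    for k, x in enumerate(xs, start=1):
        delta = x - mean
        mean += delta / k
        m2 += delta * (x - mean)
    return m2 / (len(xs) - 1)

data = [1e9 + 4, 1e9 + 7, 1e9 + 13, 1e9 + 16]  # true sample variance is 30
print(welford_variance(data))  # 30.0
print(naive_variance(data))    # far from 30 because of cancellation
```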

------
SchizoDuckie
Kudos on the speed-up, but can you guarantee that the calculations are
bug-free?

------
jaxn
Once again, Excel users on the Mac are unable to get all of the cool toys.

~~~
vj44
Our engine can be run independently & outside of Excel, and it works on Mac
too (albeit without the nice UI in Excel).

~~~
bake
This may mean I can finally ditch Windows on a VM and just use Mac Excel
exclusively!

------
gaius
Hopefully this will be the nail in the coffin of "big data".

------
trapezoid
What's this written in?

~~~
vj44
One of the founders here - it's written in a combination of C, Python and
NumPy. We had to go low-level to make it fast, but it's still portable.

~~~
bhouston
Doesn't having it in Python mean that its memory requirements are moderately
large in comparison to pure C++ code?

~~~
vj44
We kept that in mind - the actual core is written in C, Python is used mainly
for auxiliary jobs (parsing etc.)
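As an illustration of the kind of auxiliary parsing job mentioned here, below is a hypothetical Python helper that turns an A1-style reference or range into zero-based (row, column) indices, the form an array-based core would consume. It is not DataNitro's parser: it ignores sheet names and simply strips `$` anchors.

```python
import re

# Column letters, then a row number starting at 1; optional $ anchors are ignored
_CELL = re.compile(r"^\$?([A-Z]+)\$?([1-9][0-9]*)$")

def parse_cell(ref):
    """'B3' -> (2, 1): zero-based (row, column)."""
    m = _CELL.match(ref.upper())
    if not m:
        raise ValueError(f"not an A1 reference: {ref!r}")
    letters, digits = m.groups()
    col = 0
    for ch in letters:  # bijective base 26: A=1 ... Z=26, AA=27
        col = col * 26 + (ord(ch) - ord("A") + 1)
    return int(digits) - 1, col - 1

def parse_range(ref):
    """'A1:C2' -> ((0, 0), (1, 2)); a single cell gives the same corner twice."""
    first, _, last = ref.partition(":")
    start = parse_cell(first)
    end = parse_cell(last) if last else start
    return start, end

print(parse_cell("AA10"))     # (9, 26)
print(parse_range("A1:C2"))   # ((0, 0), (1, 2))
```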

------
dang
We took "Show HN" out of the title because "Show HN" is for things that people
can try out now:
[https://news.ycombinator.com/showhn.html](https://news.ycombinator.com/showhn.html).

------
TD-Linux
No source code? How are we supposed to make any use of this? What is the point
of an email signup?

