

Ask HN: Beyond Excel, how might a non-technical person analyze large amounts of data? - areohbe

I have to do analysis on a large (in the Excel world) dataset fairly
regularly. Ideally, I would like to have some type of repository that I
could dump the data into on a monthly basis when I receive the reports and
then be able to query and sort the data as needed. Beyond Excel, I'm not
sure what the next step up in the world is, but I knew the kind folks here
would not be afraid to share an opinion or two.

The data consists of multiple performance measurements for multiple unique
entities on a per-month basis. So currently, my Excel workbook looks
something like this:

Sheet 1: Summary/analysis page that performs basic calculations off the
subsequent sheets

Sheet 2: Performance Metric 1
Column A: Unique identifier
Row 1: Months
Columns B and beyond: Data

Sheet 3: Performance Metric 2
Column A: Unique identifier
Row 1: Months
Columns B and beyond: Data

Sheet 4: Performance Metric 3
Column A: Unique identifier
Row 1: Months
Columns B and beyond: Data
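
For concreteness, most of the suggestions below work best once those three
wide sheets are flattened into a single long table. A minimal sketch in R
(the CSV file names and the reshape2 package are illustrative assumptions,
not part of the original workbook):

    library(reshape2)  # install.packages("reshape2") if needed

    # Hypothetical per-metric CSV exports of the sheets above
    metric_files <- c(metric1 = "metric1.csv",
                      metric2 = "metric2.csv",
                      metric3 = "metric3.csv")

    long <- do.call(rbind, lapply(names(metric_files), function(m) {
      wide <- read.csv(metric_files[[m]], check.names = FALSE)
      names(wide)[1] <- "id"              # Column A: unique identifier
      tall <- melt(wide, id.vars = "id",  # month headers become a column
                   variable.name = "month", value.name = "value")
      tall$metric <- m
      tall
    }))

    # 'long' now has one (id, month, metric, value) row per observation

One long table like that is much easier to query and sort than three wide
sheets, whichever tool below ends up doing the querying.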
======
rjempson
It depends on what you are trying to do exactly, but the Excel add-in
called PowerPivot could be useful if you are trying to aggregate data and
do ad hoc analysis.

It is free:

<http://www.microsoft.com/en-us/bi/powerpivot.aspx>

I have analysed lots of BI tools that could be useful in your scenario, but
most require a bit of technical/database knowledge to iron out the wrinkles.

~~~
ScottWhigham
+1 for PowerPivot. It kicks butt with large data sets, leverages your existing
Excel skills, and extends those skills with DAX.

------
rpackard
I worked R & D at a big consumer packaged goods company and ran into a lot of
these 'excel messes' especially dealing with raw materials specifications.
Here are my suggestions:

The 'R' programming language is pretty useful for statistics and could be
used to aggregate a lot of files (and it's free). Also, if you want to
generate dynamic reports from templates, you should look into Sweave, an R
package (<http://www.stat.uni-muenchen.de/~leisch/Sweave/>). If your reports
fit a template, you could build beautiful PDF reports that regenerate with a
single script run. That would also require learning LaTeX for creating
documents, something that is used heavily in the academic world but that I
have yet to see in the business world.
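
A Sweave file is just a LaTeX document with R code chunks that open with
<<>>= and close with @; a minimal sketch of what one might look like (the
input file name is made up):

    % report.Rnw -- in R, run Sweave("report.Rnw"), then pdflatex report.tex
    \documentclass{article}
    \begin{document}
    \section*{Monthly performance summary}

    <<summary, echo=FALSE>>=
    perf <- read.csv("performance.csv")  # hypothetical monthly export
    summary(perf)
    @

    \end{document}

Re-running that against each month's file regenerates the whole PDF.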

Another approach I would try is to use a programming language to dump the
data into a relational database and then use SQL to access it (SQL is
heavily used in business settings). This option also allows others to get
at the data with familiar business tools. My unconventional tools of choice
would be Ruby, using Rails to build an interface.
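
The same round trip is only a few lines in R as well; a minimal sketch using
the DBI and RSQLite packages (file, table, and column names are made up):

    library(DBI)  # install.packages(c("DBI", "RSQLite")) if needed

    con <- dbConnect(RSQLite::SQLite(), "performance.db")

    # Append each month's report to one running table
    monthly <- read.csv("2012-10-report.csv")  # hypothetical monthly file
    dbWriteTable(con, "measurements", monthly, append = TRUE)

    # Then query it with plain SQL
    dbGetQuery(con,
      "SELECT id, metric, AVG(value) AS avg_value
         FROM measurements
        GROUP BY id, metric")

    dbDisconnect(con)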

------
brudgers
Tabulus might be an option.

<http://www.tabulus.com/Home.aspx>

------
deckchain
I would use Talend for Data Integration to clean and load the data into
MySQL, then write some SQL for the reports so they are queryable, and then
build a Ruby on Rails front-end for the more complicated data views.
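
For the "write some SQL for the reports" step, one example of the kind of
query that pipeline would serve, sketched from R with the RMySQL driver
(connection details and the table layout are assumptions):

    library(DBI)  # with the RMySQL driver package installed

    con <- dbConnect(RMySQL::MySQL(), dbname = "perf",
                     host = "localhost", user = "report",
                     password = "...")  # made-up connection details

    # Per-entity monthly trend for each metric loaded by the Talend job
    trend <- dbGetQuery(con, "
      SELECT id, metric, month, AVG(value) AS avg_value
        FROM measurements
       GROUP BY id, metric, month
       ORDER BY id, metric, month")

    dbDisconnect(con)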

------
infinii
Try the R programming language.

~~~
roseleur
I second that. Actually, Coursera is currently hosting an R-language class
at <http://www.coursera.org/course/compdata>

------
lsiebert
R is where you want to go.

------
calculated_risk
MATLAB, Stata, C#. MATLAB is by far the best, in my opinion.

------
Mz
GIS software?

Most data has a physical location associated with it (street address,
lat-long, something mappable). "A picture is worth a thousand words":
mapping data can help you visualize and mentally process heavy data loads.
And Excel is one of the formats standard GIS software typically accepts, so
with a smidgeon of luck you can probably just plug and play.

