Hacker News new | comments | show | ask | jobs | submit login
Ask HN: Language compiled to Excel spreadsheets?
34 points by marekr 789 days ago | hide | past | web | 47 comments | favorite

Yesterday there was a thread about biggest problems with Excel. I’d like to ask you for an opinion on an idea - source code for Excel. The goal is to give people who create models and tools in Excel the advantages of having a source code to what they are building and maintaining. It is challenging and time-consuming to build spreadsheets that are really solid, minimize the chance of errors, that can be unit-tested and version controlled.

The idea is to create a declarative language which lets you describe fields, tables and worksheets in spreadsheet. You are encouraged to use names instead of cell coordinates. For formulas you just use Excel’s formula language. There’s a simple “compiler” that converts that into a spreadsheet.

Compared to regular Excel: - you can unit-test the source code - you can keep it in version control - it’s much easier to read and reason about (it’s auditable) - the outcome is a spreadsheet that satisfies the best practices (e.g. formula cells are blocked, input cells are validated).

What do you think about the idea? Who could make a good early adopter?

There might be some merit to this in very specific use cases but I suppose for those use cases you might be better off using a language / environment such as Python or R anyway.

I know quite a few heavy Excel users (mostly of the MBA variety). They sometimes accomplish astonishing feats with Excel. They're skilled and knowledgeable problem-solvers but they're not very likely to touch a programming environment other than Excel.

The reason for this presumably is that spreadsheets and Microsoft Excel in particular have hit a sweet spot of an IDE that sort of covers many common business use cases while still being easy enough to use and getting started with for an intelligent person who knows nothing about programming at first. Spreadsheets are also a very visual environment compared to a text-based programming language.

So, yes there's lots to improved about handling spreadsheets, making them more testable and scalable but the core - the spreadsheet development environment if you will - isn't going anywhere. At least I can't see too many people (especially the MBA kind) using a declarative language instead of Excel.

>> Spreadsheets are also a very visual environment compared to a text-based programming language.

I like to think of a spreadsheet to be a REPL for the masses.

A programming language that is as easy to write as Excel formulas exists already: It's called BASIC. Lots of people fell in love with it since the late 70's. Sure it's often ridiculed, mostly by people who somehow manage to slide an ill-fitting and unwarranted Djikstra quote in their argument, but the language has evolved with the times and once enabled millions of programmers to earn a respectable and honest living. Dialects like FreeBASIC, QB64, DARKBASIC, XOJO, FNX Basic, AutoIT and GAMBAS may seem very different to one another, but they remain BASIC and easy enough to learn within a week. Communities still flourish and are very tightly knit. Source code, Modules and Libraries are abundant, as are very capable IDEs, together with complete documentation.

Go forth(cough) and give FreeBASIC a try for a start. It already runs faster than Python ever will on Linux, DOS32 and Windows. QB64 is cross-platfrom too. If you're on the move give RFO BASIC a try, when you get back to your regular PC you can make APKs of the programs you've designed on your tablet. B4J enables you to run your programs on the Java VM. And if they are still not enough, you can still write your own dialect of BASIC... in BASIC!!!

I've also noticed the trend with various dialects of Basic producing tightly-knit communities but I wonder if you can credit that to the language itself. Is a modern, object-oriented Basic really friendlier than (a subset of) Python or Lua (edit: with a similar standard library)? I find it at least as likely that today Basic's reputation as an informal, hobbyist- and beginner-friendly language shapes the communities that use it in a certain way more than the language itself. I acknowledge this may not have been the case in the 1985 or even 1995. Blitz3D [1] was an exceptionally quick and easy way to do 3D in the year 2001 but at that point it could have been written to expose the same API, say, to a Pascal compiler.

What exactly is right about Basic has been a matter of great interest to me lately because I have been thinking about developing a bootable early home computer Basic-like interactive environment for UEFI. I quickly realized that one of the hardest design decision involved in this project would be whether the language of the environment should actually be a dialect Basic (rather than Logo, Lua, Scheme, JavaScript, Tcl, Python or something else entirely).

On a related note, I wish there were large empirical studies of how quickly new programmers learn to use each of the stereotypically friendly languages. Perhaps MOOC will eventually produce those.

[1] https://en.wikipedia.org/wiki/Blitz3D

> Is a modern, object-oriented Basic really friendlier than Python or Lua?

QuickBASIC had User Defined Types, FreeBASIC not only has Pointers (and therefore Function Pointers) but also adds methods to TYPES which are defined as Subroutines just like in C++. In Fact FreeBASIC can be thought of as a covert way to learn and understand C++ for QBasic programmers.

As for it being an informal or hobbyist language, if one begins by designing the proper data structures and adds the adequate functions to handle them, the resulting programs can be just as capable as those designed in any other language. Being able to think like a programmer makes all the difference, not the reputation of the language being used.

As for Blitz3D, it can access external DLLs. DLLs can be easily written and compiled with freeBASIC. No problem for any adventurous BASIC programmer.

As for your BASIC dialect you can take a look at [1] to learn how to write your own in FreeBASIC or QB64. This code can be easily understood and ported to any language.

[Edit] The standard library may not be as complete as Python's but FreeBASIC can work with most DLLs compiled with C and the documentation will mention ways to include C headers and work with various external libraries.

[1] http://www.qb64.net/forum/index.php?topic=6388.0

>As for it being an informal or hobbyist language, if one begins by designing the proper data structures and adds the adequate functions to handle them, the resulting programs can be just as capable as those designed in any other language.

Well, that is exactly my point. Basic has the reputation of being easy for beginners and hobbyists but properly written code in a recent OOP dialect of Basic doesn't look too different from code written in any object-oriented Algol-derived language. As such, it is probably about as easy/as hard to write. Which is what makes me suspect that Basic-the-language might not really be the secret sauce of environments like QB64 or Blitz3D. Other possible reasons they get the sort of following they do could be that

1. Those environments are easy to install and create your first project with.

2. They start you with an IDE, so compiling and running your first code is only a matter of pressing a single key or clicking a single button.

3. They ship a "fun-oriented" standard library. You don't have to install any addition packages or even import extra modules into your project to access subroutines for plotting, sprite graphics, sound, 3D, networking, etc.

4. They have high-quality, beginner-friendly documentation with plentiful examples.

5. They are fast, both the IDEs and the runtimes.

6. (A language-related one.) Somewhat surprisingly, they are more often that not statically typed and offer decent error messages. Perhaps compile-time errors are a better educational tool than runtime errors.

I am not sure how important it is to have all of the above points but 1-3 seem crucial.

The secret is indeed the traditional tight integration of IDEs + Documentation + Ease of compilation and deployment of code. I'd say the first 5 points are absolutely crucial to the success of good BASIC coding environments and more languages should offer the same.

I think a great example of a non-Basic language that succeeded because of 1-6 is Turbo Pascal. Its documentation was thorough enough that you could learn and use the language and its many libraries without external references (except, perhaps, to get you started).

Going with Basic instead of a different language for your IDE can be a good choice for another reason. It is, or at least was, good for marketing to people who write code but don't consider themselves developers. To the potential buyer of the IDE using Basic signals, "this is a friendly environment; we don't expect you to be a grizzled professional programmer". (This may not work today. The latest product from Blitz Research is called "Monkey" and REALbasic was renamed "Xojo" in 2010, both apparently for marketing reasons. The company behind Xojo described theirs at http://www.xojo.com/support/faq_xojonew.php)

I absolutely agree. My first experience was with QuickBASIC in the late 90s. I don't remember how I got my hands on it, but it was remarkably easy to pick up for a 10 year old with no programming experience and no one else in the house to answer questions. It was clearly designed with a quick feedback loop and small barriers to entry in mind.

And you remember the documentation that came with that? Absolutely amazing for the time. Although syntax highlighting had not become prevalent yet, the absolute bliss of just pressing F1 and seeing all the documentation you needed was and still is unmatched. Python integration with its various IDEs still doesn't compare to what an obscure dialect like QB64 offers today.

Yeah, it was fantastic. After programming in QuickBASIC ~1998-99, I didn't really do any programming for about 8 years. When I got back into it (in college) I was frustrated by how difficult it seems to get off the ground. The world of c++, python, command line, vim, shared libraries, package repositories, etc..., is very powerful, but it takes a lot of knowledge to get to the point of creating something of value.

At my job, we have been getting non-programmers (at least people who aren't 100% programmers) to learn some python. It has been going fairly well, but there is still a lot of ancillary knowledge, often about the programming environment, that they don't have which means a programmer needs to get involved. As much as I dislike Excel, it does a great job of letting people think about their business logic without getting side tracked dealing with dependencies, libraries, whatever. I feel like BASIC had similar strengths, but was code with all of the advantages that brings. More modern languages like python could learn something from it.

This is identical to my experience teaching non-programmers to write Python. We end up reverting to Excel so often.

You should both teach them FreeBASIC. After they've written their first Objects as Extended Types they will have no trouble at all moving on to C++ if they want to. They will have little trouble finding out how to create GUIs using GTK or how to compile for DOS, Windows and Linux and they won't lose time trying to figure out how to distribute their programs or wondering if the Python runtime is installed on their client's machines or not.

I think that the advantage of Excel for non programmers is that it's less abstract and you can see all the intermediate results instead of having a mental model in your head. Also it makes some bugs show faster, because you see the immediate result.

Perhaps an alternative is to make something between Excel and "text" programming. Usually a big spreadsheet has many almost independent blocks. Make some mini-spreadsheets with names and clear title rows/columns. Allow the mini-spreadsheets to be positioned easily and connected by formulas.

(Disclaimer: I use a lot of Excel instead of small scripts, the main problem is that you cant's nest 3 for loops.)

[In a related way, I saw some physicist program small calculations in Mathematica because it's more interactive than Fortran and it's easier to make graphs and find the errors.]

[There are some similar attempts. For example in Racket you can use #lang frtime to get a "reactive" program thatauto updates the values. http://docs.racket-lang.org/frtime/ It's a pity that the docs don't have an easy example.]

> the main problem is that you cant's nest 3 for loops

I'm curious what you mean by this. Care to explain?

The problem is that the spreadsheets are 2D, so you iterate easily over two variables, the row and the column.

If you need to "program" this in an Excel spreadsheet

  For i = 1 to 5
    For j = 1 to 5
      If 2 * i + j = i * j then
        Print i, j, "*"
      End If
You can "write" this in a 6x6 block with the values

    1  2  3  4  5
  1 =If(2*B$1+$A2=B$1*$A2, "*", "")
And copy the formula from B2 to all the cells, so you get

    1  2  3  4  5
  3       *
  4    *

Ah, that's what I thought you might mean. Can you talk about why you use spreadsheets in the first place, then? You're dealing with nontrivial problems if you hit this limitation, and you could write programs in a general-purpose language if you wanted to. So what do you get from Excel?

(I care about this, btw, because my real job is making a new spreadsheet.)

I don't know what the limit for `for` loops is, but Excel has some limits to nesting of flow control structures like `if`. For example, you cannot nest `if` statements more than seven deep. I suspect the parent poster is referring to a limitation similar to this, but for nesting `for` loops.

We did something like this in OfficeWriter [1]. We built a set of APIs to create, edit and generate Excel (and Word) files programmatically without the need for MS Office in the server. In addition, we had a "template mode" so a user could place datamarkers within an existing Excel spreadsheet. At run-time, our code would open the spreadsheet, populate data from a provider data source and then expand all associated cell references such as formulas, charts, etc.

so it was not fully declarative but would not have been difficult to map a high level language to our API. Some sophisticated customers did something to that effect with a web based wizard type interface.

Microsoft purchased the IP in 2007. The independent product is still sold and maintained for existing customers.

[1] www.officewriter.com

For non-technical people whose tasks have reached the limit of what can easily be maintained in Excel, it seems like you need the opposite as well--call it a spreadsheet decompiler, which outputs this hypothetical declarative DSL.

Alice in accounting maintains a set of spreadsheets for a number of years, and finally the task of adding some new feature, or porting it to a batch system in some production line-of-business system falls to Bob, a developer. Bob runs Alice's spreadsheet through the decompiler, spends some time refactoring it by adding descriptive names to some of the "variables," building tests, and generally structuring it for extensability and maintenance. Now he can compile it back to Excel and send it back to Alice and others that need it, or (and this is the really cool part), deploy as just another module in the ordinary business automation.

You've basically built a way to turn a spreadsheet hacked together by a domain expert and a small number of development man-hours into something much move valuable and maintainable.

I don't know much about enterprise-y reporting software, but it seems that if you built this, with the right integrations, you could probably sell quite a few copies of it, plus some high-buck support contracts.

This is a bit harder. An Excel spreadsheet has no self-evident order of evaluation -- the cell dependency graph can be arbitrarily complicated.

Of course, Excel itself (and other spreadsheet programs) already do "decompiling", but this works like an always-on loop that resolves the direct, algebraic dependencies but makes no distinction for business logic. This is probably true of most real decompilation tasks.

Excel's programming model is called "non monotonic data flow programming" [1]. The diagram referenced in [1] shows which other languages support that model Therefore I would suggest to start from Oz, Alice, Curry, AKL, FCP... as the platforms for Excel replacement

[1] https://www.info.ucl.ac.be/~pvr/paradigmsDIAGRAMeng108.pdf

Wouldn't existing excel bindings for languages accomplish this?

https://www.pyxll.com/ http://xlwings.org/

When we have semi-complicated models we try to factor out the harder part to run outside Excel -- and thus have programming utilities available -- and write "front-end" logic (currency conversions, inflation adjustments, charts, etc.) in Excel. I work a lot with Matlab, and this is really to use with the .xlam add-in provided. The trick is to figure out the division between the "back-end part" (that needs debugging tools, tests, etc.) and the "spreadsheet part" (that can take an understandable, but black-boxed/stored-as-values input and do spreadsheet logic that's flexible enough for the guys who need to mess with that).

I think the main reason people choose to use Excel the way they do (aside from lack of choice resulting from policy, etc., decisions) is the easy, no-coding onramp to building things (And tools that let you do coding aside from what is built into Excel would likely, for those policy constrained, fall afoul of the same constraints.) So, I suspect a compile-to-excel language helps anything; the people that need it won't use it, the people that would use it already have many ways to use traditional programming languages to produce apps that the end-user accesses through Excel.

This is a useful idea, IMO. I would use it for programmatically generating complex reports in Excel. Would be nice if it supported advanced stuff like charts and PivotTables too.

I've recently been messing with some Office files (mostly XLSX) using various libraries in Node, and I've been thinking that a transpiler type of tool would be great for this use case, given that an XLSX file is basically just a zip file with some XML files.

Historical precedent indicates that this will not be a high value story for the product owner. Variant data types are a feature, not a bug. If they don't ship in the box, someone will union up their own to make life easier.

People use Excel to reason about their business. Reasoning about the code is a second order problem and SourceForge is paved with the rotting corpses of projects focused on making a better spreadsheet.

Sounds like a mixture of the sc file format and BASIC:

    let A0 = 123 
    let A1 = 234 
    let A2 = 345 
    rightstring B3 = "asd"
    leftstring B5 = "foo"
    label B11 = "bar"
    goto B11 A0
Just a textual description of what your spreadsheet should look like. Gosling had it right back in the 80s ;)

Forgive me if I'm an idea killer, negative nancy, devil's advocate, or whatever. Although this may be functional for a few people, I don't see it gaining traction among users of Excel. I'm a grad student and my lab mates all use Excel proficiently. However, they steer away from programming or anything to do with "code." As an anecdote, I had a few terminal windows open on my laptop. One of them walked by and remarked, "Are you hacking?!"

Also, I doubt version control is something this community would begin using, simply because of the perceived effort required to get going (in reality, source control is not that complicated, but once you start talking about "repositories" you are going to lose these people). It's unfortunate, but most of society (even really smart folks!) prefer to remain ignorant to tech in general. :/

I did something like this when I was doing lots of data analysis for a research project: [1]

The problem space I was addressing was more restricted than a fully generic spreadsheet model, but I found everything you say about reproducibility and descriptiveness to be true. I kept my report/plot descriptions in git alongside the experiment setups / parameters.

FWIW, prior to this, I often wrote one-off Python scripts that would generate, e.g., 1000 rows of a particular set of formulae in CSV then import that into Excel. It probably wouldn't be hard to build a little tool in a few hundred lines of code to do this.

[1] https://github.com/cfallin/scomp

There is far too little done to bridge traditional programming expertise with spreadsheet expertise. In that respect your idea hits close to a sweet spot. However, I think you're not quite at the optimum center of opportunity if you care about this realm; instead of making a programming language that can compile to spreadsheets (few programmers truly care about spreadsheets; few spreadsheet masters care about programming), consider making a declarative markup for ETL processes based on visualizing results in spreadsheets, with bindings in all the major programming languages.

It's a useful idea, but it would only be useful for those programmers who are saddled with using Excel as their delivery mechanism ... which is a tiny minority of the people writing complex VBA, macros or formulas, or using Excel add-ins they customize. Those people are usually business users and are not interested or not willing to act like real programmers.

Imho / ymmv, of course, but I am coming from 15 years of enterprise experience where a lot of my team's dev work seems to have been replicating Excel spreadsheets as web apps. :-/

Link to thread from yesterday: https://news.ycombinator.com/item?id=9935357

thank you

There is an Excel dialect (DSL) for Rebol - http://www.robertmuench.ch/development/projects/excel/dialec...

Here is a cleaner example I posted on HN before - https://news.ycombinator.com/item?id=6668323

How about company that will turn your spreadsheets into apps. Start with specialized consulting and start building product to cover common use cases.

Like Gigster: http://techcrunch.com/2015/07/22/uber-for-developers/

Input: Send the existing spreadsheet and maybe some description.

Output: Based on spreadsheets and their history you build and host a webapp to do that.

A friend wrote this: a project that converts Excel sheets to C or Ruby.


This is precisely what we do at Cloudstitch. If anyone is interested in this model please reach out!

Do any of the open source office spreadsheet equivalents have a sanely parsable file format that can be extended for this? Ie, this language would basically be a more verbose file format?

I'm not sure this warrants a domain-specific language, but I can see the attraction of a really nice API for generating dynamic Excel reports with VB macros and whatnot.

Hmm, there may be merit in this. Though a counter case was when my brother just wrote a C# plugin for my dad to perform some of the more annoying functions he was dealing with.

Would it be beneficial if your dad was able to achieve the same on his own by using the language?

Really hard to say I think, anything is possible.

> You are encouraged to use names instead of cell coordinates.

You do realize that it is perfectly trivial to name cells and regions in Excel and use these names in formulas?

The https://datanitro.com/ product integrates Python into Excel quite nicely.

Like Sublime Text, but for Spreadsheets?

1) Create a contained python/node/ruby environment.

2) Bind an object to a spreadsheet ui

3) ????

4) Profit?

It's a great idea. I built a whole reporting package that basically rendered to Excel and it was a real time-saver vs developers creating one-off Excel books and writing custom code to fill them in.

Rendering to Excel isn't the difficult part though (unless you're trying to do it equally well on all platforms I guess). Creating a new language/DSL and all of the other features you mentioned are the riskiest parts of this venture.

So, can you do it? If so, do it quickly and then sell it to Microsoft for a couple hundred million :)

Applications are open for YC Winter 2018

Guidelines | FAQ | Support | API | Security | Lists | Bookmarklet | DMCA | Apply to YC | Contact