This was a nice surprise to see today as an Ex-Excel developer who worked on trying to bring Python to Excel (and, I guess, failing ;)).
7+ years ago I had the option of leaving the Excel team. My then boss’s boss knew I had an interest in bringing Python to Excel and offered me a chance to tackle it if I chose to stay. What was meant to be a 6 month project turned into a ~3 year project, the Python part faded away and we ended up enabling JavaScript Custom Functions in Excel instead.
For Python we were also running ‘in the cloud’ (AzureML v1), although there was some back-and-forth on if we should run locally. I think what made the Python part disappear was our partner AzureML team re-orged, re-released, re-hired, we lost a PM and our work caught the attention of another partner team who realised they could use our code to execute their JavaScript out-of-process. And so I spent a lot of time ensuring that feature was successfully shipped at, I guess, the detriment of Python.
I had a lot of help from some strong engineers and learnt a lot. The core of the work was modifying the calculation engine of Excel to allow functions to compute asynchronously, allowing the user to continue working on other parts of their spreadsheet while the remote endpoint (be it JavaScript, Python or something else) was computing. Previously the spreadsheet would lock up while calculations were running, and that wouldn’t be cool for long-running unbounded calculations. Have to wonder if any of the stuff we built made it into this new feature.
Super great to see this and look forward to trying it out.
Thanks for your work! The async JS functions are amazing and allowed me to ship rock-solid Custom functions with xlwings and Python. The problem with Office.js is that it’s just simply too hard to get started with all the npm bloat for the average Excel user (even more professional devs struggle). Well, I am pretty happy how I am saving users from having to use Node.js: https://docs.xlwings.org/en/latest/pro/server/officejs_custo...
Oh this must be Felix! I saw and played with xlwings many years ago and was impressed :)
Agreed with the friction around getting started. I would like to see it reduced. It’s cool to see you using them to wrap Python functions - I have done something similar before myself, and still need to get around to shimming Haskell into excel at some point.
Another thing I don’t like is how JavaScript Custom Function cell formulas have a horrible string after the _xldudf marker, which becomes noticeable if your workbook doesn’t/can’t load the custom functions addin (or you unload the addin)
It's me, yes, are you Michael then ;)? Yeah, I love the Office.js platform, I believe it's the most versatile extension platform there is, it just came out during an unfortunate time (when people believed that you can't use JS without React anymore ;) Anyhow, the magic of Office.js is that you can use it like a JS framework with total freedom what you use on the backend (it's just a fetch call away), so whether that is .NET or Node or Python/R/Julia doesn't matter at all, it's just that the official docs don't really bother to explain that or show an example other than Node & C#. What's a game changer (and only available on Office.js) is the integrated authentication via SSO/Azure AD, which is the first thing any IT department in a corporation is interested in.
> The problem with Office.js is that it’s just simply too hard to get started with all the npm bloat for the average Excel user (even more professional devs struggle)
IMO deployment is an even harder problem. The Office team makes it incredibly difficult to deploy web add-ins (presumably for business reasons) and accordingly most companies I'm familiar with are still using the ancient, barely-supported COM+VSTO add-in models.
The part that makes it a non-starter is that it has to be in some kind of a store (whether it's on prem via SharePoint) or the Internet. What is needed is that you can install something on the user's computer (xcopy install, msi, w/e), but without requiring any other Microsoft end-point. And then, the bits installed either run completely on-node (similar to COM/VSTO) or it would require that access to remote web-server be required. If a remote web-server is required (which is a bit of a limitation), it should be something that can be pointed to without Microsoft's involvement. Just like I can create a web-site, give you a URL to it, and you can go it using your WebBrowser. No app-stores.
That doesn't mean app-store shouldn't be an option. I would love to also publish my add-in in the Microsoft AppStore, but it should be my choice, not a hard-requirement.
That's why people are still using VSTO, not because we have a problem with JavaScript vs. C# or COM.
The app store / Office admin stuff. When I was dealing with this stuff 2 years ago, it was insanely difficult to deploy a web add-in internally (something involving the "Microsoft 365 admin center"), let alone deploy it to customers (gotta get approval to deploy it in the special Office app store, and hope+pray that your customers' IT departments don't block that).
Compare to VSTO, where IIRC installation just involves editing a registry value to let Office know where your add-in can be found on disk.
You just reminded me I tried to make an office addin a few years back and distribute it on the office store. Terrible experience trying to get Microsoft’s admin/business center (whatever it is/was called) to verify I was actually a real human/owner of the domain as a sole proprietor. Terrible.
By chance I built quite an interesting addin for excel using VSTO. It was great, nothing to complain about. In the corporate world Microsoft tech is widely common.
I don’t unfortunately. Given the more technical nature of the feature I’m inclined to agree Microsoft should make a bountiful supply of working examples.
> the Python part faded away and we ended up enabling JavaScript Custom Functions in Excel instead.
Nobody could have anticipated that GPT4 would author Python better than anything else.
Nonetheless, I kind of just want Google Drive except Excel, without adopting anything more than the Office ecosystem, that just works with simultaneous editing.
I wish it wasn't solely powered only by Microsoft Cloud and could support python running locally, but regardless I think this will still be huge and single-handedly modernizes Excel by a large margin.
This alone could eliminate the need for websites that just want this sort of data. I can think of a past project at a previous job I did building an analytics website used only by a handful of people internally that could have been just as well served with something like this, had it existed at the time.
> This alone could eliminate the need for websites that just want this sort of data
I catch flak for this all the time! Our users just want to be able to use Excel to free-form analyze their data. IT, of which I'm a part of, insists on building web apps to do that for them. They're never happy because all they wanted was Excel.
I keep explaining all we have to do is build a data mart built using SQL Server, and use our ETL tools to keep it populated with the data they need to analyze. Don't let them anywhere near the source data. They keep looking at me like I have ten heads. Instead they want to launch multi-million dollar projects to bring Tableau to our users - and they don't want it! It's asinine!
Our company uses Power BI for this very reason. Most of our users are content with what's available in the reports we've built in Power BI. However, a few power users directly connect to the datasets we've created (that also feed the Power BI reports) and can mix and match that data in Excel.
Excel will import datasets from the Power BI client, and you can refresh the data at any time. The nice part of this arrangement is that the whole model is imported. All the relationships between tables, all the added calculation columns, all the measures are there for use by the end user in Excel.
Power BI has been a great solution for our company (~1000 employees).
There’s also the Power Query add-in for Excel that’s super steezy.
It’s like a light Power BI.
Honestly even thought I can make interactive report webapps, that stuff needs to be built and maintained so I rather have people use Excel or a BI tool. Even I rather use Excel or a BI tool than write code.
To be fair, Tableau Desktop, Tableau Prep, and Tableau Cloud are great if expensive products. With Tableau Prep, you get a visual ETL tool. Being technical, I prefer SQL, but for business folks Tableau Prep is more of a REPL so they see what their actions are going to do. They can schedule these Prep flows in Tableau Cloud which will ETL and produce daily datasets which users can then interact via the dashboards on Tableau Cloud.
IMO, Excel is on the right path with out of the box connectors to Salesforce, Azure databases, etc. What it really needs is full blown SQL with REPL output so the users can see the effects of their SQL. (The way Tableau Prep does it is that it defaults to a sampling of rows to output immediate results). There used to be a Microsoft service called Excel online but it was another thing you had to buy and was super confusing. Excel needs to let users schedule their ETL/datasets in SharePoint online and then let other users subscribe to it. They can call it Excel Super+ and charge a small fortune. I am positive there are people with their wallets open.
That last part is almost exactly what we do with Power BI. We import tables from a SQL data warehouse into Power BI. We then build a data model (creating relationships between tables, adding calculated columns to those tables where needed, and creating measures to summarize the data). We then create reports based on those datasets. Most of our users are content with what they see in the Power BI web client.
This is the killer feature, however: power users can access those same datasets through Excel and import the entire model. All the relationships, all the calculated columns, and all the measures are there for the power users to mix and match and filter however they want in Excel. It's been a phenomenal success for our business.
>This is the killer feature, however: power users can access those same datasets through Excel and import the entire model.
I had this once. Then someone decided it was 'dangerous' and locked us out. I went back to exporting the data from the ERP and loading it into Excel...because I was allowed to do that
Non-software engineers looked me like I had ten heads for suggesting that people would need to learn a modicum T-SQL to make use of it. I even volunteered to write all the views for them! The lesson learned for me is that horrible expensive ERPs/ETLs tools are never going to be replaced with Excel + a database because even the (non-software) engineers at a large company are not interested to learn how to write basic SQL statements.
Google’s ecosystem makes this incredibly simple.. all you need is to populate BigQuery with the datasets of interest, and Sheets can pull it in via a connector that’s configureable in a GUI. I sincerely hope my org never migrates to MS for this reason alone.
> I keep explaining all we have to do is build a data mart built using SQL Server, and use our ETL tools to keep it populated with the data they need to analyze.
I can relate and have experienced this. The result was that they would end up exporting from a series of Tableau/Looker workspaces, and importing into excel and doing their analysis there.
Then continually making enhancement requests on the Tableau/Looker workspaces to get any additional data they need for their excel work.
Then of course, the excel document and its multitude of versions would get emailed around until it hit the attachment size limit, or if there were too many version of it to figure out which one was "actionable".
We finally figured out what they were doing when the enhancement requests for the reporting workspaces got too bizarre.
How do you lock down Excel - that is the issue we have where I work and the main reason excel is strongly discouraged.
There are two main issues my org has battled with:
1.People basically turn excel into their own mini databases which causes a support nightmare.
2. This is strongly related to 1) Data provenance is a nightmare because so many people were passing around excel spreadsheets it became very difficult to track down the source of the figures people were using. User A passed their spreadsheet to user B who modified it passed it to User C who further modified it, passed it to User d from there some figures ended up in front of a manager and now we have a problem...
It's not so much a problem with Power users those are the types of people using python (or R) it's the tier of users below that level that cause the damage with excel.
Are you using Power BI? If it is easy to find and use ground truth data I find that people will start using that. Encourage sharing reports, but make it "official" by sharing them through Power BI.
You're probably asking for something impossible. It's common for the raw data to not fit in excel. You have to do some sort of aggregation or filtering, which means having that app where you aggregate and filter.
Almost all these tools support exporting to Excel or CSV, so they can do whatever they want in Excel after that.
In multiple orgs now I've had raw access to data warehouses (DB2 and SQL Server based) that had aggregates of petabytes of data from ETL that I was free to access via Excel PowerQuery, direct SQL tools, Python or R, or even PowerBI.
I didn't mean aggregate as in summarized data, but aggregate as in a common repository of multiple different data sources as a single massive data warehouse. If I needed to group or summarize the data, I had to do it in my own queries.
My point wasn't that you should never give people access to raw data, but that the raw data won't fit in excel rows. You have to do aggregation or filtering somewhere to get the data within the row limit.
> I keep explaining all we have to do is build a data mart built using SQL Server, and use our ETL tools to keep it populated with the data they need to analyze.
I mean, that's like >90% of building a CRUD web app. If you truly only need the R in CRUD, your approach sounds reasonable. Otherwise, just go the extra distance and build a CRUD app with csv export / direct SQL access.
We use Excel for a bunch of "critical" tasks as defined by the financial regulator, which means any of their dependencies also have to be assessed to ensure they meet the required level of stability.
What this means is we'd need to go from just requiring a laptop and a copy of the spreadsheet to requiring redundant internet connections and validating Microsoft's backend meet the regulator's requirements. I'm pretty sure that would unfortunately rule out using it.
Having seen the quality of “critical” macros, excel spreadsheets, and “dev/prod” SMB drives anyone can mess with it’s absolutely clear no regulator gives a fuck.
I think it's similar to the current generative A.I. trend.
The companies I work for while consulting (mainly financial and health organizations) have been very cautious about any company or PII data going to these A.I. programs, and the cloud in general, and are looking to make their own local run generative A.I. LLMs for their employees to use.
By it's very nature these Excel documents will be filled with company data, so I can imagine these companies would also be a bit cautious sending those calculations to the cloud, especially if they can't control the flow of data from their I.T. departments via web applications.
But I'm just a software engineer at the end of the day, and don't professionally use Excel all that much right now (if anything I use it much more for my hobby of board game design). So I can't speak with too much authority on the matter. Just my observations.
> By it's very nature these Excel documents will be filled with company data, so I can imagine these companies would also be a bit cautious sending those calculations to the cloud, especially if they can't control the flow of data from their I.T. departments via web applications.
This is exactly what I see when talking to users about xlwings: it needs to run on their end, behind their firewall, in exactly the cloud they have picked (AWS, GCP, Azure).
I do a lot of work with US state governments, it's the same story mostly. It's possible to get the approvals required for data to be shipped off to some "cloud" but it has to be FedRAMP certified and no data can move outside the contiguous 48 states.
It won't work offline, for one. I use Office exclusively without internet as otherwise who knows what private stuff it's sending back to Microsoft, which then becomes eligible for warrantless surveillance by the federal police.
I know it's old fashioned to want local software to only change things on my local file system, but if I didn't want that paradigm, I'd just use Google Docs.
Yes, this is important where you're online but not on a high throughput connection or any part of the connection has intermittent faults.
The disruption from even a few outages is enough to prefer to run things locally. We had issues with people unable to sign into Excel and everyone went mad (rightly so, as Excel was running locally not 365 but it blocked usage)
And on top of that, there are scenarios where you'd want to process substantial amounts of data that would be prohibitive to upload, process and return the results, where locally handling it would take way less time.
I eventually gave up on it because of the locals vs cloud differences. The documentation for the whole thing was terrible as well. I think they’re in transition (assuming cloud only). In my case, I just went and used Calc from LibreOffice.
The Cloud, and therefore the integration of as much as possible in it, is core Microsoft strategy.
It's about the recurring payments, the lock-in, and ultimately literally having the knowledge and business processes of the world economy on their own computers. If this is true, the reason why it is important to run data and scripts locally should be clear, and so is the motivation of your PMs to actively combat any such things.
But if you get a summary in your inbox and show it to your PMs let us know how that goes down.
Yeah, but they also make money on the office subscriptions. So you also want current working data scientists who would not touch excel to give it a chance because they can use their existing python back-end. Eventually, there would be a service that is Microsoft exclusive, but they have to compete on services.
Idk, it's really up to Microsoft's business units. Office could be the loss leader to get cloud business, but it could be the other way too. Either way, I would hazard that they want people to actually use this integration feature.
Is there any technical reason excel couldn’t ship with Python? The license allows it [1] and it would probably solve the issue of all users being able to open any document like the cloud solution, but still run locally.
Can you just please fix the issue with date strings (such as "20230822090811") being converted into scientific notation? I've received literally hundreds of these documents over the past decade. And as long as this problem exists, I will keep receiving them.
Can you please just add a button - "unmess this document"? The number of man-hours I've wasted re-creating documents is countless. My job is to do other things not fix excel documents.
However, to my eyes, and probably most people, 20230822090811 does not look like a "date string", it's a number. It's correct to display it as scientific notation if it's too many digits to fit into the width of a cell. Please let's not encourage them to interpret more things as dates and randomly break people's workflows.
Excel should preserve all the digits you enter in the cell even if it doesn't display them, though. That has nothing to do with dates. It's a problem with things like product numbers or keys that can end up with many digits.
Are you saying that excel does preserve digits? I must be doing something very wrong (entirely possible), cause once the document is saved, the digits are gone.
The way you fix this is using the text import tool or better yet power query and specifying the column type to be text. With power query you can repeat this for as many files as you need to import.
If he's recieving excel files with this issue from other people, the text import tool isn’t a solution and Power Query is a bit circuitous to solve what can be addressed by formatting the range with this kind of data.
+1. I just discovered this, and it's happily eliminated at least half my use case for Excel, and does a better job at it too. That is, visualize CSV files without messing with the data.
The scientific notation ("2.02308E+13") is just how Excel displays a large enough numeric value like your example; it's not doing any conversion, as you can see if you paste that number into Excel, confirm that it's showing the scientific notation, and then click into the cell to confirm that the formula bar still shows the original number (20230822090811).
If you paste your date string in a more standard date format like 2023-08-220 9:08:11, then Excel will correctly infer that it is a date. However, you seem to be saying that Excel should handle ANY numerical strings that way if they are long enough to express a date or datetime, which, while it might make your particular workflow easier (or it might not), it would really mess with people's ability to use Excel with large numerical values...
Do you _really_ think I've spent the past 10 years getting files like this and I still don't understand the reason. I didn't ask for an explanation. I asked for a fix. What are you doing here??
I can provide instructions on how to do this until the cows come home. I will still get files with these issues. Is that not obvious?
Do you really think I want to examine _tens of thousands of rows_ in documents that are 100 cells wide looking for a single exponent value!?!?!?! Is that really what you're suggesting as a fix here?
My my, why so angry? You're expecting something that Excel doesn't support for good reason. The string you used as an example might be a date string to YOU, but to Excel and many of its users it's just a large integer. Lots of regular use cases would break if Excel would interpret that string as a datwle
If you want Excel to interpret strings as dates, use a more common date representation such as iso8601
It’s one of the most infuriating and dangerous behaviors I have ever encountered. Excel silently truncates data. The anger is warranted. The problem isn’t assuming a display format for data, it is that a poor assumption actually mangles data. Excel doesn’t fail safe.
Numbers in Excel only maintain 15 digits of precision. Leading zeros are also truncated. If you haven’t experienced this you are lucky (or just didn’t notice) but it is real.
> If you want Excel to interpret strings as dates, use a more common date representation such as iso8601
You don’t always control the upstream system. Your response can be considered rude because it implies you think you know more than the person expressing pain. This is especially insulting with such a widely known problem in Excel.
This behavior burned me with billing system IDs. Logically they are strings, we would never do math on them. For performance they were implemented as BIGINTs in the DB. Excel truncates everything after the 15th digit. Our IDs were all over 20. So accounting couldn’t reconcile our reports with the billing system.
> It’s one of the most infuriating and dangerous behaviors I have ever encountered. Excel silently truncates data.
That's a problem, sure, but not the one sixothree is complaining about.
> The problem isn’t assuming a display format for data
Your problem might not be, but that’s exactly the problem sixothree is complaining about that the post you are responding to addresses, so while your post references a valid complaint, it is a complete non-sequitur as a response.
> Do you _really_ think I've spent the past 10 years getting files like this and I still don't understand the reason. I didn't ask for an explanation. I asked for a fix.
I don’t believe you’ve been doing it that long and haven't found the fix.
Apply appropriate formatting if you have long strings of numeric digits [as long as they can’t have leading zeroes] that you want stored as numbers but displayed without scientific notation. (Not the best choice for what is semantically a date, in your case, but...that’s a whole bigger issue and, well, babysteps.) You don't have to look at anything, just do it for the ranges where that is the kind of data:
The text import wizard does work. The problem with it is that everyone has to use it and if anyone doesn’t then your data is silently mangled.
In the past I have added an apostrophe to the front of any integer that should be displayed as a string. Excel will not display that leading single apostrophe. It of course taints that data forever but I consider “Exported to Excel” a terminal state anyway.
I cannot remember the last time I had some long number and thought to myself, "You know what will be helpful, convert this number into scientific notation!"...
It's asinine Excel has this behavior with any large number. Try to work with a list of EAN/UPC codes... it'll wreck every single one every single time, unless you take significant care to guard against it via formatting, special characters that trick it into using a string, etc. Try working with a spreadsheet full of database id's... same thing. It's asinine.
By the very nature of Excel, large numbers belong as-is... ie. you're studying something and need the values. Approximations (what scientific notation gives you in the best case) are an exception to what people need nearly every time.
> I cannot remember the last time I had some long number and thought to myself, "You know what will be helpful, convert this number into scientific notation!"...
Okay, so set formats appropriate to what is useful for your data.
Defaults don't cover all cases, otherwise there would be no non-default options.
> It's asinine Excel has this behavior with any large number. Try to work with a list of EAN/UPC codes...
Codes aren’t numbers they are strings, even if the code is a sequence of digits. They should be formatted as text. For actual numbers that shouldn't be displayed in scientific missions, the appropriate numeric format does that.
> it'll wreck every single one every single time
If its any code that can be safely stored as a number, it won't wreck it. It’ll just look bad until its fixed. If it, say, has significant leading zerores, yes, it’ll wreck it, because excel defaults to dealing with numbers, not text, and there is a difference between text made up of digits and actual numbers.
> unless you take significant care to guard against it via formatting, special characters that trick it into using a string, etc
Knowing the shape of data you are entering and selecting a column and applying an appropriate format (including “text” as an option) before you enter any data (usually, to a whole column) isn’t that arduous.
The defaults should be sane - that's the point. What percentage of Excel users actually want data automatically converted into Scientific Notation? Close to zero I'd wager.
Your offered work-arounds only apply in a very narrow workflow. I work with Excel nearly every day, all kinds of files (CSV, TXT, XLS/X, etc). There's so many ways this issue can bite you unexpectedly - causing you to save and permanently destroy data.
If I type/paste/import/download/whatever `01234567890987654321` into a cell, 100% of the time I do not want `1.23+E18`. I don't want to even see that, I want to see the original value I entered.
It's asinine.
Excel may be the most used Office Suite app, but it's by far the most painful to use. The Excel Team has a long history of justifying downright perplexing behavior. Remember when Excel was the only Office app that didn't support Snap-To back in the Win Vista/7 days? Joel had a long-winded explanation that boiled down to "just cause"... it was perplexing even at the time. It really makes one wonder how bad the rat nest of a codebase Excel is.
"There's ways around our astonishing default behavior so just deal with it". That's not a great answer.
Are you sure? It seems to for me, which is annoying because by recognizing them it changes them to a localized date format which then breaks when sent overseas.
If I type "2023-08-22" into a cell in Excel I see "8/22/23" in the cell and "8/22/2023" as the value.
Definitely broken for me on Excel version 2302 (build 16130.20714) on an En/US locale.
None of the ISO8601 formats I've tried are recognized. "2023-08-22", "2023-08-22T01:38:22Z", "20230822T013822Z" are all valid ISO8601, but Excel treats them as unformatted text. Inputs with slashes are instantly recognized as a date.
If it works for you, it may be a locale thing. But getting away from that is the whole point of ISO8601.
Not OP but I would never want to use this if I was running a security minded business. Why would it be okay to send out my entire dataset to the Microsoft cloud to run a few simple queries. How long is the data stored on your servers? Since it’s operating on the data directly, it’s not possible to anonymize or redact the contents of the data.
Practically speaking, if you are running recent versions of Windows (I don't remember when exactly this trend started, with 2000? XP?), you have basically no control over what data you send to them and when. You can try to block traffic but it will interfere with normal Windows operations. You can try to investigate and play cat and mouse game with Microsoft, but they will always be one step ahead of you - unless you decide to turn off automatic updates and make your system less secure.
It makes little difference if it's a VM or bare metal - what matters is network connectivity. If it's on, you lose control over the data leaving your computer. In Linux, BSD and others, you can control it in a very precise way.
I can put this in an email if you want, but one good reason would be that my laptop is probably faster than the cloud compute that they are willing to give away to 365 subscribers. I can do much more inefficient things in a single cell if I know my laptop can pick up the slack :)
(Although, I'm guessing the reason they did it this way is so they could keep the execution model the same for Excel on the web? It's always running on the external service vs sometimes running locally and sometimes running externally.)
How about an actual API that lets you drive Excel from Python rather than the other way around?
With that API, it won't matter whether you want Python, Ruby, Lua, whatever, for your scripting language.
This is nice, but it welds a specific version of Python inside Excel and makes it a gigantic PITA to use your normal development tools on your Python code.
Customer verbatims are like, an incredibly valuable currency. It's all fine to say "I know the product in and out and I think this is important". But it's another thing entirely to say "I know this is important because _this customer right here asked for literally this_".
"It is difficult to get a man to implement something when his stock options depend on it remaining unimplemented." -- how a modern-day Sinclair Lewis might answer
There are few key reasons for running in a hypervisor isolated container on Azure with no access to the Internet:
- We can guarantee a consistent experience for all users. Imagine having to maintain your own local distribution of Python and guaranteeing that it works with Excel as versions diverge over time? Yikes.
- We make it possible to share your Excel workbook with other users and have the calculation just work. That wouldn't work with random local installs of Python and users would be super frustrated by this.
- Security. Imagine opening an Excel workbook that can execute Python code running locally as you.
Also, I totally agree with your second point. Trying to write an internal app in Python that integrates with all of your existing IT infrastructure is an exercise in frustration at best. Excel is already part of the IT infrastructure virtually everywhere and is a programmable reactive canvas.
Disclosure: I work on the design team for the feature.
I agree with point 3, but points one and two don't make any sense because games have been skipping with Python scripting for decades and this is never a problem. The python interpreter is embedded into your program as a library and it doesn't have any dependency on the whatever python version the user installed.
It's the dependency hell problem as packages are added to the sandbox environment. If they were just shipping python (standard library), it would be fine, but all of the useful libraries have large numbers of dependencies.
It's not hell because they carefully curate the set of libraries rather than having end-users attempt to have every combination of packages (and then blame microsoft for the dependency hell / lack of support for their exact configuration.).
You say this as a good thing... but as a data scientist that uses python every day I can assure you their "carefully curated set of third party libraries" definitely will not include most of the libraries I rely upon for my work.
> games have been skipping with Python scripting for decades
That only works if there is a forever fixed version of Python embedded in your game. The value of Python in this context is its ecosystem and folks will need to install additional packages and libraries into the execution environment. Now you're managing a local distribution of Python.
Yeah, but if the big value is supposed to be in the libraries you sort of have to accept dependency hell? Otherwise you're in the world where a small set of libraries can deliver all the functionality ever needed.
Of the currently available solutions, "cloud environment made by Microsoft" doesn't sound that different from "software update made by Microsoft" in terms of how it solves the dependency hell: basically, the critical library you need is either in or out, but there is nothing you can do about it.
Admittedly, here I'm assuming Microsoft won't let you install things into the environment yourself since that basically means hosting virtual machines running arbitrary code and putting the Excel brand in front.
Dependency hell may be an issue if you install arbitrary libraries into the container at runtime. Note that this feature isn't currently available in the preview.
On versioning: we freeze the container image that your Workbook was authored against. You need to manually accept (and validate things continue to work) updates to that container image as we roll forward.
Could you not achieve the same by shipping the required tools with Excel and running them inside a Windows Sandbox?
Isolation isn't as good, but it probably protects users against the threats you/they care about?
When you open a workbook that is using B show a warning before running anything. Then behind the scenes spin up a new venv that installs everything from requirements.txt and executes the code in workbook locally.
It running in Microsoft Cloud is precisely what I think will actually allow this to be useful functionality for my company in ways that macros aren't, because macros are seen here to present a vector for running malware on client machines, whose utility doesn't justify the risk.
Just wait for someone to run malware on your mystery cloud instance of Excel. Have you seen the cross-instance and cross-domain security holes Microsoft has been running into? The latest and greatest was on their Power Platform ("Excel-like"). Nothing like silently shipping your data off to a remote location for inspection by blackhats. Microsoft cloud security is torn apart and laying all over the floor right now...nevermind all the half baked features that let you get kneck deep in a project before you see the grande middle finger from project hellscape.
It sort of makes sense to limit it to the cloud, because making Python an integral part of the Excel file format would reduce its portability and introduce all sorts of compatibility headaches going forward.
If you've ever dealt with AWS Lambda you will know what a versioning and upgrading shit show this is going to be wherever you stick it, be that client or cloud.
I worked on a very early python integration used as a workflow DSL back in 1998 and they had to dump it and write their own DSL in the end.
Whether it runs in the cloud or locally, the included python source will be written for a specific runtime version and environment. If excel were to run python locally, it would presumably bundle an interpreter (+ any needed pip packages) rather than relying on whatever was installed locally.
The cloud execution model does still have an advantage in that it can retain old runtimes in perpetuity for backwards compatibility, though
Under the model you're proposing, would I potentially need to download an old copy of Excel because the sheet I'm trying to load relies on an older python runtime?
Computers are computers. Potentially new Excel would be incompatible with old cloud Python environments. Potentially new Excel would be compatible with old local Python environments.
Excel already has versioned file formats with significantly different affordances, and hasn't appeared to have all sorts of compatibility headaches as a result.
Integrating a whole separate language ecosystem is one or two orders of magnitude more complex though. It’s more like HTML5/WHATWG suddenly adding Python as a first-class scripting language besides JavaScript, and web browsers having to integrate that.
It’s different because WASM is much more strictly defined and has a more limited scope than Python. Thanks to WASM, browsers don’t have be able to understand Python/PyScript. The trade-off is that you can’t, for example, view/edit the Python code in the browser’s devtools.
A WASM-like approach wouldn’t work very well for Excel, because you’re supposed to be able to edit the source code/formulas within the Excel application.
> It sort of makes sense to limit it to the cloud, because making Python an integral part of the Excel file format
Moving it from the cloud to local doesn't make it an integral part of the Excel file format. Whatever python service is running in the cloud could just run locally.
Oh the cloud part is going to be a huge deal breaker in so many industries. This will automatically be blocked by default at my work place for certain.
What's the difference between Python and VBA in that respect? Do you open Excel files now? Pretty sure that wild Office macros were once one of the main sources for computer herpes.
I too would love this, but I can tel you, even in a sandbox python would be very easy to abuse from a security perspective. Consider how much havoc on humanity (no exaggeration!) VBA Macros have caused.
I have to see some red flags here. The very nature and power of Excel comes from the fact that workbooks can stand alone. This means that the process is eternally tied to the cloud now. So when you, 3 years down the line, have to open a workbook from your deceased colleague, you are at the whim of the vendor supporting the existence of this integration.
Step one of any workload will always be "work out how to make the software work again" and that's terrible.
I know this because I have been in that situation several times in the last couple of decades with random Excel, outlook and word add-ins. And those don't even do the processing remotely!
> We can't even own the things we're wageslaving away at our desk for.
But we never owned those things we're wageslaving away at our desk for, that's why its called wageslaving. Whether our employers own them or rent them seems pretty immaterial.
Yeah I didn't phrase it well. My point was even our work is becoming intangible now "for the greater good." Just funny to see the progression of things.
I think this is easier to manage to be honest. They are using Anaconda distributions so if you tie each workbook to a distribution with possibility to update/rollback it makes things a lot easier than trying to manage a local installation.
Try explaining that to the dude who inherited the workbook in 9 months who doesn't know what Anaconda is, barely knows what Python is and was told that his job was Excel which he's really good at because he did a Udemy course on it 3 years ago.
I'm not sure why that is a problem. He didn't have to know anything about Anaconda. And for stability, you can still install 10-year old python 2.7 packages from Anaconda.
I would assume that workbook would be tied to a version that it was created in and the if the user does not want to update they can just continue using it with the older Anaconda distribution. They don't really need to know what Anaconda is in this scenario. It only becomes necessary if you want to update a certain workbook to a newer version.
> This means that the process is eternally tied to the cloud now.
Not only the cloud, also the subscription. If you cancel the subscription (note that the announcement mentioned it might be a paid module), don't have the subscription (because you got the Excel from a supplier), etc. you already have a problem using this Excel sheet.
I haven't used a non-cloud spreadsheet in probably 7 years. I also wouldn't use excel for anything that has a combination of high importance and long lifetime though. It's mostly for some brainstorming or thrown in a presentation as a screenshot.
> I haven't used a non-cloud spreadsheet in probably 7 years.
Last time I use a cloud spreasheet was a month ago, to download the content into Excel
> I also wouldn't use excel for anything that has a combination of high importance and long lifetime though. It's mostly for some brainstorming or thrown in a presentation as a screenshot.
It's funny but that's exactly how I see online spreadsheet: for unimportant things that may not work at any point in the future if the account is closed. However, the collaborative features make them nice for brainstorming.
This will be true whether it uses the cloud or not. Excel worksheets can already have external links (both on the Web and to other files). If those web resources or other files aren't available, you're out of luck! And the older ones may only open on software that no longer runs on modern machines.
This is why it's important to restore your backups once in a while, or in this case, validate your critical files.
It's an exacerbation of an existing problem - who has not been sent an Excel file and opened it to discover some Sharepoint (or whatever) link is missing.
The new competition to this market have gained market share by making workbooks not stand alone, it's a whole growing product segment. So they either have to evolve or die.
I was so excited about Python in Excel, until it said "run in the cloud." There just doesn't seem to be any reason for this except to tie you into their cloud service. It feels so much like protecting me from unsafe printer ink.
I'm generally a fan of MS Office products and try to give them the benefit of the doubt.
Please help me think of a plausibly good reason for requiring Python to run in the cloud, that is not just about lock-in.
These are red flags. M$ is forcing people into their cloud. Their hijacking of your local file system with onedrive is a great example that many people already ran into.
I think we are about 5 years overdue from bailing from M$. They need to be avoided. Every product from Windows to Office already has anti-consumer features.
Not asking customers to manage a local python installation is one thing, although I totally understand how excel + python power users would likely be comfortable bringing and maintaining their own python runtime.
Anaconda distributions aren't exactly small. I'd also assume there's some sort of environment isolation, so you're getting multiple copies. Maybe one per workbook?
The embeddable package from python.org [1] is 20 MB unpacked. To install different sets of packages per workbook, just have Excel automatically set the module path as appropriate.
There are a lot, I mean a lot apps that support scripting languages and don't need you to manage the language's runtime yourself. Blender's UI is entirely in Python. Civilization games have most game logic in Lua.
Probably because most users of excel don't have any experience installing and running Python which is even a pain in the ass for people that know Python lol.
You also have people using Excel on iOS, and Android devices. This allows them to use and read Python in their spreadsheets.
>Probably because most users of excel don't have any experience installing and running Python which is even a pain in the ass for people that know Python lol.
An embedded Python is about 10 MiB. I just checked. Excel could easily have its own separate copy.
Based on the announcement it looks like the standard data-science libraries (numpy, pandas, matplotlib, seaborne) will be available. The covers a lot of use cases. However I would really appreciate the ability to set the python interpreter to whatever I want. Then I could access the niche libraries I need (shout-out to JSBSim for open source flight dynamic modelling). Or even better, hooking up my organizations custom libraries we built for our own specific data analysis applications. Huge value add. Shame this likely won't be possible. Unless I've misunderstood some part of this.
there's always Grist. its not a clone of excel but its a spreadsheet/hybrid database so it does some similar things, but most importantly let's you use python for formulas and also has a self hosted option if you want to run it locally
Back in 2005, some friends and I started a company called Resolver Systems to build a Python-enabled spreadsheet, which we called Resolver One. Sadly, it never took off in the marketplace, and we had to pivot; we would up creating PythonAnywhere, an online coding and hosting environment, which worked out pretty well and was acquired by Anaconda last year. And now, the circle is closed :-)
(Just for clarity: the team working on this inside Anaconda is entirely separate from the PythonAnywhere team. It would have made a perfect Hollywood-ready story if it had been the same people...)
It was a great product - I remember your pitch and indeed we met a few times at the London python for finance meetup (remember the "Enthought Python Distribution"?). The only issue is that Resolver was IronPython-based so there were a bunch of libs that kinda didn't work.
I have to second that, it seemed to be a great tool. I've often wondered why it couldn't get traction¹. I can recall people gushing about it at what may have been the same Python meetup in London, and also people being impressed enough by it at a Haskell meetup that they were lamenting an imagined lost opportunity to shove a "real language" in to a spreadsheet ;)
I'd always kind of assumed that the target audience would have appreciated the IronPython use, as the .net ecosystem would likely have been more valuable to them. Having just looked I see that numpy wasn't available on IronPython until 2010, and I'm sure that would have been useful to have a little earlier.
¹ Pretty sure I've referenced it here a few times too.
Right, the lack of support for C extensions in IronPython was a problem; we kicked off the IronClad project [1] to fix that -- looks like someone else picked it up later and updated it for Python 2.7 [2]. Not sure where it went after that, or whether MS fixed the underlying issue later on.
I built my first app on PythonAnywhere! I've had stuff running on there for almost a decade now. I loved being able to make small tweaks to the app directly in the PythonAnywhere web app. Such a great product. Really missed stuff like that when I started using GCP/AWS.
A colleague was joking, "I guess even Microsoft itself isn't able to manage local Python installations on Windows…"
But imagine if Microsoft actually managed to finally solve the "setup a local Python environment" situation once and for all and Excel became the standard Python package manager across operating systems!
$ excelpip install fastapi==9.11.23
Searching for fastapi version 11.09.2023…
> A colleague was joking, "I guess even Microsoft itself isn't able to manage local Python installations on Windows…"
I've been waiting for Microsoft to solve package management on Windows since I played around with debian's apt 20 years. That's not snark, it seemed like Microsoft's product complexity and dedication to backwards compatibility of software would make them the ones to move software management forward. Really dig into the problem space and reinvent the package manager beyond anything that's come out of the open source space.
Instead we've got windows update, windows features, microsoft store, software center and winget (you know, that Microsoft-built package manager for Windows you install from the other Microsoft-built package manager for Windows).
> It seemed like Microsoft's product complexity and dedication to backwards compatibility of software would make them the ones to move software management forward.
That's precisely why they _didn't_ go with the package manager model of software distribution.
Package managers in most distros assume that everything you download via apt/dnf is a part of the operating system and integrates with whatever dependencies the OS ships with. It's a great system if you're an operating system maintainer, but terrible if you're a third-party software developer who has no interest in dealing with 5-6 different OS's repo formats and dependency versions. On Windows and macOS, you vendor most everything, provide runtime installers for the rest, and that's usually enough to be resilient to bitrot.
Package formats like Flatpak, Snap and AppImage are the only projects who seem to be taking the problem of software distribution on Linux seriously, and of those only Flatpak seems to have the mature tooling and lack of myopia necessary to pull it off. For the times that I do miss apt on Windows, chocolatey is pretty good at installing software and vcpkg works great for linkable libraries.
The state space doesn't have the conveniences that linux and language tools leverage, that's why I wanted to see them try to advance the field beyond what we have now.
Which conveniences are you talking about? Chocolatey gives me the "automatically updating my programs" experience, and vendoring dependencies with vcpkg is about as turnkey as it gets - add a git submodule to your project, reference the vcpkg toolchain in your CMakeLists.txt file, and you're done.
My mind can't help think of the following problem... would Microsoft have to freeze an LTSC set of packages? How would $GOV_AGENCY using a version of Excel one generation behind collaborate with the world?
I wish Libreoffice would have smelled the coffee long time ago. In principle scripting using Python is possible but exceedingly cumbersome/ugly and there seems to be no roadmap to improve on this.
The same warning applies to the linux desktop and all its apps more generally. Increasingly proprietary platforms will be rolling out advanced "AI" functionality extensions to classic apps that are cloud based, many of them via a Python API.
Its a pity that the various strands of the open source universe are so siloed. The potential of the sum being more than the parts is squandered.
I think in a lot of cases its not even the advanced stuff that keeps people on Excel and other proprietary things. It's the seemingly little things.
My biggest beef with LibreOffice's spreadsheet (and with Apple's Numbers) is how they handle copy/paste of discontiguous selections.
Consider this 5 row by 3 column set of data:
1 2 3
4 5 6
7 8 9
a b c
d e f
Suppose you select the 1 2 3 row, the 7 8 9 row, and the a b c row, copy them, and paste just below the d e f row. In Excel you get this (first 4 rows omitted for brevity):
d e f
1 2 3
7 8 9
a b c
In LibreOffice and Numbers you get this (empty cells denoted by "-"):
d e f
1 2 3
- - -
7 8 9
a b c
I haven't found a setting to change this. Excel doesn't have a setting for this either as far as I know, so people who do want discontiguous copy/paste to preserve spacing would probably be as irked by Excel as I am by LibreOffice and Numbers. But I don't think I've ever actually wanted the spacing preserved so it is LibreOffice and Numbers that irk me.
The only thing I can think of is macOS clipboard manager somehow interfering with the copied data. Was testing on Linux kf5 UI.
Also, I can preserve the empty row, if I paste with Ctrl+Shift+Alt+V (Edit - Paste Special - Paste Unformatted Text). This opens the Text Import dialog.
I turned off the clipboard history app I use and now LibreOffice pastes like Excel.
The history app I use is 10 years old and I see that the company that made it apparently is no longer around (or at least their domain no longer works) so I guess I should get a newer clipboard history app.
To me Excel's behaviour is way more problematic here. I carefully select the data I want, and it silently includes irrelevant data as well. I don't want the spacing preserved, but I really don't want other random data included.
It doesn't include other random data. You select N rows, contiguous or discontiguous, and those are copied to N consecutive rows starting at the paste point.
It includes things that aren't selected, which is other random data. If I select, copy, and paste, the only desirable outcome is that I get a copy of my selection.
I don't understand what you mean. In my example I select the cells that contain 1, 2, 3, 7, 8, 9, a, b, c, then copy, then go somewhere else and paste, and Excel pastes 1, 2, 3, 7, 8, 9, a, b, c. I don't see an other data.
To be honest, I don't even fully understand what are the use-cases for integrating python with spreadsheets right now. I use both of them separately for different things but cannot see what it can be good at doing in an integrated environment.
I use spreadsheets for all kinds of bookkeeping, and very often there's some data you need to fetch once when a new entry is created, be it stock prices from a webpage or calendar entries from Outlook. LibreOffice's python scripts are great for those cases, because the tooling is just plain better than VB. Alas, VB is still necessary for formulae.
python is so easy to script that imo excel is nailing its own coffin. with an aging and retiring demographic of people that have never used a programming language, anyone who learns to incorporate python into excel will end up preferring python because its both more flexible and scalable.
i have no reason to use excel except for basic drawing board math, if someone demands a spreadsheet out me i automate its generation in python so i never have to deal with it again
I see spreedsheets more as an UI. An extremely intuitive (and reactive!) way of spatially decomposing computation. Yes, this creates lots of problems (you can not easily validate it etc.) but it is also very enabling. For sure more people are now exposed to programming, but how many are actually comfortable doing even simple stuff directly using code is a question mark.
A jupyter notebook is maybe somewhere in between. But even if other interfaces manage to bridge the gap further, I doubt the spreadsheet metaphor is going to go away in the visible future. Pivot tables and the like are like COBOL and mainframes :-) The corporate world would die a sudden death if you were to eliminate it (another reason to always make sure you can run things locally).
> with an aging and retiring demographic of people that have never used a programming language
This simply isn't true in any way and is completely divorced from modern businesses. Everybody I know who works in an office setting works with Excel. None of them know Python and none of them have touched a programming language. Almost all of them are younger than 30. And if anybody who's 20 starts at one of these companies, they'd better know Excel because that's what the business is built on. Not like that 20 year old is gonna know anything about programming either. 99% of people just aren't programmers and aren't going to learn programming when they just want to futz around with some numbers and the easiest (yet also the most powerful) tool they know for it is a spreadsheet.
If somebody in any business, anywhere, at any point wants to do something with numbers, they're going to use Excel.
Not a chance. There's nothing scary about VBA (except its inflexibility and lack of a lot of modern features), and people aren't learning that either. Of course, the last decade of vaguely threatening to remove it from Office completely (in favor of js and now python?) hasn't helped people think VBA would be a good investment of their time to learn. A required internet connection and having Python only work on Windows doesn't make it look like much better of an investment.
Also, imo, Python and whatever random environments and libraries that you'll have to use with it to get your work done, will be in almost every way worse than just using VBA. And if you use VBA, your work will be effortlessly portable. The only problem I have with it is that I have to come up with ways to get the code out of the embedded IDE into source control, but there are scripts that help.
> i have no reason to use excel except for basic drawing board math, if someone demands a spreadsheet out me i automate its generation in python so i never have to deal with it again
I, on the other hand, do it in VBA, and expose every important calculation within clearly labeled spreadsheets so programming-illiterate people can audit them and feel confident about the output.
This comment seems out of touch with why people use Excel. The fact is, Excel is popular because it manages to be pretty easy to use, while also being extraordinarily powerful and flexible. For most users, they'll ask "why would I program something in Python when Excel has a built in function to handle that?"
> i have no reason to use excel except for basic drawing board math
I think you underestimate the amount of work, globally, done in Excel. If Excel vanished it would be a business/economic extinction level event, It's not going anywhere.
> Python in Excel is currently available to users running Beta Channel on Windows. This feature will roll out to Excel for Windows first, starting with build 16.0.16818.2000, and then to the other platforms at a later date. [1]
> Python in Excel is available in Excel for Windows. The feature is not available in Excel for Mac, Excel on the web, Excel for iPad, Excel for iPhone, or Excel for Android. On unsupported platforms, workbooks containing Python can be viewed but Python cells display an error when recalculated. [2]
Probably because Excel runs in a web browser nowadays and they want to avoid creating a separate way of running Python locally for a locally installed copy of Excel.
Maybe they will in the future, but it makes sense to start with the approach that will work in both web and local installs.
Looking at what they're doing with Outlook I'd bet money that they intend to replace the desktop version of Excel with the web version sooner or later.
I agree but expect that Excel will be the very last app of the suite that they make web-only. Excel heads like my wife absolutely hate the web version.
That's got to be the death-knell for Excel. It might be better in every conceivable way but going web-only only makes Sheets look more attractive to the people doing the buying. And you can run JS in your Sheets right now.
I'm positive they've at least toyed with the idea of running a Python interpreter in a WebAssembly module, which should work in both environments _and_ provide some isolation. The telemetry settings, I guess, wouldn't need to be changed to call home about performance/errors.
I can't attest to the performance of such technique for now, though, or guess why they could have deemed it not worth pursuing at the time being.
A lot of people who use excel heavily are not really “programmers” and don’t have strong fundamentals or an appetite to manage their own software stack. This also simplifies development on MS end by only needing to validate a single python version.
This seems like the only solution. Do not copy all of Python’s warts, but offer a curated experience. Microsoft Experience (TM) with fixed APIs, no package management or dealing with ecosystem churn.
It will never work because Python is the ecosystem. Having Python without the ability to tap the huge catalogue of existing software makes the integration worthless. The first thing someone will want to do is use Pandas on a live spreadsheet.
The use case is someone in Excel who does not want to write VBA. You do not need to offer the entire ecosystem, just a slightly better language to handle more sophisticated modeling needs.
Sandboxing python enough that millions of users could safely pass around excel workbooks is a super hard problem, especially when you allow third party libraries.
Forcing it to run in an isolated environment they control simplifies the problem greatly.
Windows has it's own implementation of containers, and with a docker-like structure, and xlsx being a zip file, excel could simply have a dockerfile like structure "FROM excel-python:2023" "COPY xlsx:scripts/" etc. new versions of excel could keep the last few 'base' containers, or if they are in love with the cloud, the signed, verified base images could be pulled on request.
My guess is that it makes it easier to work on large files which is common without downloading the whole file and it makes CoPilot related efforts much easier.
I work on Excel but not this feature and am happy to ask around if people are interested.
Microsoft could ship its own Python distro within Excel. The biggest problem is that historically Guido / the Steering Council have avoided being involved in how py libs are distribuited, so there are many ways to do everything. But if you do have an opinion and means to enforce it, mantaining a Python environment can be a quite smooth experience.
A complication here is how large the Python distro this appears to be using is. I don't know if you've ever set up a container with all of anaconda, scikit, statsmodels, pandas, Matplotlib, seaborn, and more but it gets pretty big. A lot of people would complain about the bloat if Excel installed it by default.
Another complication is they claim the container isn't just "a docker container", but for increased security isolation (they don't want a repeat of VBA malware) it's a mini-VM focused to run on top of Hyper-V (the Windows Hypervisor) itself. That's a really complicated install process on the average machine (like installing WSL2) that sometimes involves flipping entire Windows Features on, so also something unlikely to be a smooth experience out of the box for Excel.
It might be neat if they made that an optional install and let users have offline support, but it sounded like they wanted to focus on online and collaborative UX first.
Yeah my day job involves deploying Python containers to K8s so I get where you are coming from. GB sized images are not uncommon. However:
1 - That could be an optional component, behind a "Install Python for Excel " button.
2 - You need to install Python to code in Python anyway, with or without Excel.
3 - Bloat is the norm nowadays and I'm not sure whether users care. A clean Visual Studio install takes 10 GB of disk or something. Office itself takes several GB as well.
4 - Not sure why Docker would be needed. Using Python in Windows is fine nowadays. There are caveats with libs that are very reliant on POSIX (Airflow comes to mind,) but, again, if you control your distro you can limit the libs users can install.
1) I agree, which is I why I did end my comment that I think it would be nice to see an optional install offered for offline usage.
4) I don't think they are using Docker, but they are using containers. (They mention running directly on top of Hyper-V so they might not even be considered by some developers as containers in the Docker sense but closer to "mini" VMs.) The container they are using for security/sandboxing reasons. The Excel team likely still has nightmares from the worst years of VBA macro malware and this Python system seems built specifically to sandbox all the running Python code to an execution environment (container/VM) that is easy to refresh/rebuild/recycle/tear-down. I agree that Python on Windows is usually great and most of the Python ecosystem is great at cross-platform support, but I also understand why Excel would want to sandbox any Python it runs to containers that are isolated from the host machine/user and it can easily reboot.
Python data science containers are on the order of GBs.
> And who said it must be installed by default?
I did end my comment with thoughts that I think it would be nice to have as an optional install. Mentioning why it isn't likely a default install is relevant because they want this Python feature to be accessible to everyone and they want people to collaborate and the UX for an optional install isn't what they were looking for and would complicate things. (That's not an argument against also supporting the optional install, but a reasoning for why the optional install is a harder feature to support and doesn't make the "version 1" cut.)
> Python data science containers are on the order of GBs.
Yes. And SSDs are on the order of 100s and 1000s of GBs.
> Mentioning why it isn't likely a default install is relevant because they want this Python feature to be accessible to everyone and they want people to collaborate and the UX for an optional install isn't what they were looking for and would complicate things.
How do you know these things?
It won't be accessible to everyone until it's installable. Collaboration is not cloud exclusive. The UX isn't what they were looking for is circular.
The answer is in the question. They chose this model so that Python calculations would run in the Microsoft cloud, operating on data that resides in the Microsoft cloud. When your code runs on the cloud, they get paid. When your data lives in their cloud, they get paid.
As soons as nonsense like this comes about you can tell straight away that it's not in user's favor.
It's probably to tie you into subscription, data mining, selling mined data to ad companies and squeez every single dollar they can out of you
Looking at all the packages available, I'm sure that's why. You make it a much simpler thing to just develop that runtime for a known environment that you control vs a browser/local that requires a lot more engineering effort to make sure it works correctly for everyone.
Wait just a moment. Libreoffice had python for years, but nobody cared and instead complained about needing 100% compatible VBA for "real work". Where are those people now?
Right here. It's also not about 100% compatible VBA, it's about learning an entirely new API in order to write VBA-backed Excel spreadsheets that won't run in Excel. If Libreoffice VBA were compatible with Office VBA, I'd have no reason to always keep an outdated Windows system in the closet somewhere.
The reason we're doing this is to make money, and nobody wants to buy Excel spreadsheets that don't run in Excel.
Not offended that Excel can do more, but potentially still inconvenienced by Libreoffice not having the thing they are familiar with or are forced to work with?
I would happily use Libreoffice if any of my clients did. But nobody does.
Compatibility would be nice so I can write stuff in Libreoffice Calc on Linux (Excel has 0 Linux support) and ship it to clients without issues, but this even close to be possible.
By having Python in Basic, it becomes available to cells too, since Basic macros are callable from cells.
Note that you may need to reduce the security level to 'medium' in the drop down Tools->Options->Libre Office->Security->Macro Security. When LibreOffice opens, it will ask you whether you want to enable macros or not.
Many comments are criticizing the fact that this only runs in the cloud. And if Microsoft instead announced this would only run locally, they would instead be deploring the massive remote code execution vulnerabilities that this opens.
For this feature, there's no question that providing it in the cloud is much, much better for security. Microsoft can now execute the Python code in Firecracker VMs which provide much better sandboxing than your local PC could.
that would be a different set of people though. Some people are fine with cloud everything some of the rest of us aren't and like to work local. Especially for something as basic as a spreadsheet or word doc.
xlwings creator and author of the O'Reilly book "Python for Excel" here! First of all, big congrats to the team! I've been in contact with the Excel team on and off over the years and I remember when an Excel project manager once described the task of adding Python to Excel as "turning a fully loaded ship". Well, I am happy that the ship has now turned and is ready to ship into more exciting waters! There are a few question marks I have given my decade long experience with the topic (although everything is still beta, so it will certainly change/improve on its way to GA):
(1) I have hardly seen any company that can do with an off-the-shelf Anaconda distro. Companies usually have an internal Python package that they will need to access. (2) When running Python on the backend, the first question is always "how can we authenticate the user"? Office.js is currently the only platform I know of that allows you to leverage Azure AD identities via SSO or use any other provider (as you have complete freedom to use any JS library/redirect the user to a login form). (3) IT policies: Usually, companies have made their cloud decision: “We're an AWS/GCP/Azure shop, so Python has to run on precisely AWS Lambda/GCP Cloud Run, etc.” Yes, many are on Azure, but even in Azure they may have preference of let's say Azure Container Apps or AKS instead of Azure functions. (4) The other thing that businesses are obsessed about is to securely protect their source code, again, not something that Python in Excel seems to support. (5) And finally, what I see users most excited about in the context of xlwings is being able to run standard User-defined functions (aka "Custom functions) on the server (like the ones we wrote in the good old VBA times or like the new Lambda functions), not sure if that's possible or on the roadmap for the official version.
With the modern xlwings Server, I have taken a different approach: Let users build a 100% standard Python web app using their favorite framework (Django, Flask, FastAPI, ...) including all the standard tools (logging, auth, etc.) while using Excel as the frontend. Users have complete freedom in choosing their tech stack, they can version-control the source code on GitHub, use GitHub actions to run unit tests and deploy the code automatically to their favorite cloud, etc.
So I am probably targeting more professional developers than data scientists, but in my experience, it’s often a professional developer who write the Excel add-in that is then deployed to business users/data scientists.
Interestingly enough, when Resolver One didn't work out in the marketplace, we pivoted to PythonAnywhere. That took off nicely, and was acquired by Anaconda last year. And now Anaconda is realising those old dreams :-)
BTW for clarity: the team working on this inside Anaconda is entirely separate from the PythonAnywhere team. It would have made a perfect Hollywood-ready story if it had been the same people...
Wow, thank you for replying. I'm so glad it worked out in the end.
I remember being blown away by the ResolverOne demonstration video, and if I recall correctly, some others you posted showing some of the other features and functionality.
The fact VisiData (my current 'spreadsheet' of choice) has acquired a sizeable following in that niche, and now Excel is following suit suggests you might have just been a little too far ahead of the curve.
One mystery we did not resolve (pun intended) was why it did not find its audience. Was it too early? Was data science not established enough as a field? Was it because it was not Excel (i.e. the dominant tool)?
I think it was a combination of those, plus terrible market timing -- we released it, thinking we could sell to finance types, in 2008. In our sales pipeline spreadsheet, the "notes" column started saying things like "customer has gone bankrupt".
This sounds like it provides a way for Excel to hand a bunch of data to Python code for processing and get back results, and vice versa.
That's probably extremely valuable to many people, but it does not seem to be what I hope for whenever I see something about bringing Python to Excel.
What I hope for is something that is like VBA but without the VB. I.e., a Python (or JavaScript) interpreter/JIT/compiler built into Excel exposing all the internal Excel objects and methods that VBA has access to. By "Python" I just mean the language and whatever subset of the standard library makes sense in Excel.
win32com.client uses Windows' COM machinery, which supports event-driven feedback from Excel to the script. Switching to PowerShell because I'm not familiar with that in Python and from a quick Google, Python needs some more complex multithreadding handling:
Now when you change a cell in the spreadsheet GUI and press enter, the PowerShell "write-host" runs and prints the message. Another comment mentioned "using Excel as a GUI for Python" and the COM interfaces have the machinery to do that. It looks like xlWings has something like this, but may need an add-in installed in Excel to trigger Python code from Excel? https://docs.xlwings.org/en/latest/quickstart.html and https://docs.xlwings.org/en/latest/addin.html#xlwings-addin
I am using Acceλerate for MS 365, which brings Visual Scheme for Applications (VSATM) to the MS 365 suite [1]. It can be used in the other MS 365 apps as well and ties in nicely to .NET. That and I unashamedly love Lisp over python even though I know python has the data science corner. I was very disappointed when MIT replaced Scheme with Python as their CS intro language, but such is life. For kicks, I am a big APL fan and found April (Array Programming Re-Imagined in Lisp) which has APL and Lisp united into a super PL! [2]. Maybe an APL for Excel?
I'm a developer. People have often come to me with Excel sheets that were really slow because it was just too much for Excel. Almost all of the times IMHO a specific solution should have been developed for it. I'm curious to see if this makes it better or actually worse.
Conway's law. Unless those people have devs or data engineers embedded in their group, they're going to keep on with less than ideal solutions. Management will accept those inefficient solutions because they don't come with dev pricetags.
I don't doubt that you have, but in my experience the reason they're slow is because updates haven't been turned off while the macro was running, or naïve algorithmic choices (that a query planner might fix in the realm of normal programming, but python wouldn't help with.)
Meh, as an advanced user on a trading floor we don't need Python for Excel to run on the cloud, we need it to run locally so we can tap into our in-house C++ library and KDB databases. Third-party solutions have so much overhead.
Most large organisations use an Excel front end to their enternal services and the uses uses VBA to put them together. - Basically Excel is the finaciers web browser.
Unfortunately they also have large Excel spread sheets that are passed around between users with no control. An examplke was in UK reporting of some COVID information, it was underreported as they were using Excel which only allows 64K (number chnages per version I think it was 256 originally) rows and so when exporting data rows were just lost.
My main dependence on Excel is being able to interact with pivot tables. I also have some VBA macros to help with formatting which I would welcome to be in python - alas it looks like what they're bringing is for the cloud only?
I haven't found any free, open source solutions around this including any for Jupyter notebooks (my preferred medium).
Beacuse anyone with a budget had Excel already - and had to use Excel to be able to use other peoples' spreadsheets. LibreOffice is not bug compatible with Excel.
Look what Excel had tro do to replace Lotus 1-2-3. Including making 1900 a leap year beacuse Lotus had the error.
I love free software as much as anyone, but when I'm forced to use a spreadsheet I'm glad we get office at work. The UI/UX of LibreOffice could use a lot of improvements
For someone working outside IT but who has coding skills, it's great. My previous work computers have been tightly controlled by IT and installing libreoffice on them wouldn't have been permitted.
It's interesting how aggresively enthusiastic HN are about Excel when it comes to discussions about Excel vs. open-source alternatives. Any mention of alternatives are quickly and aggresively shot down, ocassionally to the point of personal attacks (if you're satisfied with LibreOffice you don't have real job, etc.).
On the meanwhile, just a few threads ago there was a (quite fair) browser benchmark where Chrome comes up on top on almost every test. People in the comments instantly rushes in proclaiming how proud they are for using Firefox even if it is technically inferior, solely because it is not owned by Google.
I think it comes from users being deceived by the product UX. My impression is that using Firefox, I often don’t miss other browsers. Using Libre Office makes me miss Excel, Keynote et al. I can’t pinpoint it to a single thing, it just feels dated and inferior. Open source in general seems to work better for infrastructure than applications, even if I have my own favorites. It partly has to do with the lack of product teams like designers, but also that an app is a single endpoint that makes a hard business to sell multiple flavors.
I imagine this is a stage in a larger plan to incorporate an AI assistant to excel? Being able to talk to your spreadsheet in natural language and have it do a bunch of analysis and visualizations would be a huge productivity booster.
While I can see the advantages of Python over VBA, I feel like running the code in the cloud instead of locally will be a big no from a lot of people, especially enterprises running offline Office installations.
When I worked at a hedge fund being introduced to Python after years of working in spreadsheets was a revelation. ChatGPT/Copilot have dramatically reduced the cost of switching to the IDE for bankers/investors/consultants who make up Microsoft 365's core user base.
Surprising that Microsoft would give users this gateway drug, but I suppose the reasoning is that you can keep your users by making Excel a quasi-IDE.
I hope this is the start of a wholesale python replacement for VBA. Python has turned into what VBA always tried to be: A ubiquitous language for data and scripting (among other things).
Excel would be well served by replacing VBA with either JS or python because of their ecosystems and high levels of adoption. Python is the better choice because data analysts are more likely to be familiar with it.
Now if they could only fix the issue where dates are converted into scientific notation. For the past 10 years I've been getting a steady stream of excel files with sometimes thousands of dates that are formatted incorrectly. And there is literally no "easy" way to fix this.
I'm tired of it Microsoft. Just fix the darn issue please.
From my skim they are incorporating a new-ish release of python. I wonder how long until (and if) MS forks python to add some functionality (extends it) and what happens next. Is it that farfetched to picture corporates running MS python with dot net integration or something and the ecosystem getting all screwed up.
Maybe this is not so much about bringing Python to Excel, more about bringing Excel to Python. But only if you do it the way Microsoft wants you to.
As with BASIC morphing into VisualBasic and VBA you lure (beginning) python programmers into Excel. Ultimately those developers will understand and appreciate less and less of vanilla python REPL, jupyter, scripts, etc. And... Welcome to Microsoft's world.
JS has been one of the supported add-in languages for a while[0], but that's not quite the same thing as the new python offering (which sounds more like jupyter within excel).
Since excel is often (mostly?) used for working with financial calculations, I for one am glad they went with python for this feature instead of a language that insists that an IEEE 754 double is the only kind of number anybody actually needs
Financial calculations are also sometimes already used to doing math in integral pennies (or eighths of a penny) for performance/storage reasons. That works just fine with one BigInt. Or you can build arbitrary-precision rationals from just two BigInts. (There are some JS libraries out there already for that.)
"You can build arbitrary-precision rationals" is a far cry from saying, "this is a language designed for precision arithmetic." You can technically build arbitrary-precision rationals without BigInts using arrays, but it's not exactly ergonomic.
Even if we stipulate that users will be cognizant of floating point limitations and convert everything to integers, JS still requires all literals to be suffixed with n to avoid silently losing precision when integral values exceed 2*52.
Many languages that are considered "precision" languages have some form of suffix to denote large/long/arbitrary-sized literals. (C# there's a big useful difference if a literal ends with `m` or `l`.) Most languages you have to be aware of your data types and use the appropriate data types.
No one is saying that "JS is a language designed for precision arithmetic", everyone is saying "JS is a language with the tools today to do precision arithmetic (whether you like it or not)".
> Many languages that are considered "precision" languages have some form of suffix to denote large/long/arbitrary-sized literals. (C# there's a big useful difference if a literal ends with `m` or `l`.)
The failure mode is significantly different when comparing JS to other mainstream languages. Most languages have a "checked" mode (either opt-in or mandatory) so that expressions like `4611686018427387904 + 4611686018427387904` can raise an error, whereas JS will just give you the wrong answer. Python is the most ergonomic for beginners or non-developers because the runtime will convert an integral type to an arbitrary precision integer on overflow.
Yes, there are alternative primitives that can be used, but you need to know they exist. We're talking about a tool that is explicitly designed for people who are not professional programmers, and JS's behavior here is surprising unless you understand the underlying data model. I would personally much rather give beginners a tool incorporating as few footguns as possible.
> JS is a language with the tools today to do precision arithmetic (whether you like it or not)
Arbitrary precision arithmetic has been possible in JS since the language was invented, but it has always been a pain. The bignum NPM package[0] predates the BigInt numeric primitive by 9 years. The addition of `BigInt` to ES2020 was important for performance (since implementing an arithmetic primitive in JS makes calculations dog slow), but it didn't add any fundamental affordances to the language.
How will this work in practice? Will this just be python version 3.XX forever? How will packages be maintained? Do you get to specify any of this so when a future version of a package is introduced without backwards compatibility it doesn't break everything immediately?
I could swear I had seen this headline before, so much so that when I was recently asked to help out in some old vba code, I figured I’d port it over. Imagine my horror when I couldn’t find it. Imagine my horror seeing this now that I just finished.
It allowed python functions to be run on cells and ranges, and also allowed ranges to be filled with sql queries on databases.
I had managed to get also a matlab runner working (via matlab/java connector) and was just getting the R connector finished.
Basically, I made a website showing it working and gave a few people the plugin, and their feedback was "nah, my company won't pay for it, you're wasting your time" - so I shelved the project.
I have no idea what my emotions are doing right now :)
> “I’m excited that this excellent, tight integration of Python and Excel is now seeing the light of day,” says Guido van Rossum, Python’s creator and now a Microsoft distinguished engineer.
It's depressing seeing someone who has done so much work for the free software community now working to enrich the spyware-laden proprietary product offerings of a megacorp.
Nothing about this is "good for Python" or hackers - it's just good for Microsoft to sell more Excel (and Microsoft accounts). That makes the world worse.
Even installing Python seems to a hurdle for many employees. They need to get approval from IT, and it seems not to get done. They use Excel, and if they can use Python within Excel without installing anything extra, it will increase the user base of Python. After getting a taste of Python they may install Python locally and use it outside Excel. So I think this will be "good for Python". Certainly integration of VBA into Excel has been good for VBA.
For one, because people want more than 65,536 rows in their spreadsheets.
But, I'd largely agree that '97 was the pinnacle from a UI perspective. There are changes under the hood that are quite nice, though, especially with the move to xml persistence.
None's stopping you to use MSOffice '97. I still use it from time to time and current MSOffice is happily recognizing the format and converts it on the fly. Also there is a tool that you can install and see current formats to be able to open them in MSOffice '97.
As for why the rest of the world why would do that or not, is their choice, no?
Thank you. Yes, I could get ELT to accept a “RAD” Access app built and shipped during the time it takes to pitch to choose just the CRUD library or JS Framework alone.
It’s interesting but if you are good enough at Python, can’t you already write to and read from spreadsheets? Maybe moving up a Python excel library like xlrd and openpyxl and making it reactive would be more worthwhile for Microsoft. Writing Python code in that little formula bar as shown doesn’t seem attractive. Writing Python in VSCode and seeing the results appear in the spreadsheet in real time would be great though.
No, you use Excel when you need a data manipulation program quickly and you want to ship it to clients without any installation or training. You just send the workbook to your client and it's done!
It's very self-contained, what works on your machine is almost guaranteed to work on the client's computer, given that they have the same Excel version. It's also very reliable, VBA apps can work for years without any maintenance or support.
Openpyxl is fine if you want to generate a report and send the results but for anything dynamic you would need to ship an interpreter, plus generate an exe, plus ship a workbook with the data, plus you probably need to write a GUI (users freak out from the cmd). This is much less reliable and requires constant support to users, VBA/excel apps rarely do.
This Python integration is a bit disappointing though as the "killer feature" of Excel is that it's self-contained and now it will rely on MS's cloud. This is yet another integration that will fail to kill VBA (we had JS and .net already but had similar issues), it seems that MS doesn't really understand how people use Excel :)
Great points, since it’s in the cloud it’s no longer self contained. We are already in some hybrid works with extra dependencies.
And if that’s the case, why can’t the .py files live in the cloud too? The need for an interpreter didn’t go away, Microsoft is just hosting it for you.
A lot of people are taught Excel early on in school. I wonder if this would result in more people learning or being introduced to Python at an early age?
theres going to be multiple people tackling this problem but I see it as a worthy one, theres no reason we should be needing VBA in the future when we can run Python in browsers.
We're building Python (mainly python pandas) + Spreadsheets at op (https://opapp.io). Would love to know what you think. The code editor is more like a built in code notebook.
I look forward to seeing this get abused. Having worked with a psychometric analysis program entirely implemented in VBA, I genuinely wonder how Excel/Python will live alongside R in academia - at least for the people who are really conservative about the tools they use.
You can always count on out of touch Microsoft PMs to over salt a good idea.
Always one step forward, two steps back, and one to the left.
I remember interviewing in Redmond once and the PM couldn’t understand why Windows Vista wasn’t doing well because it seemed to run excellent on his overpowered PC.
If you're interested in something like this with a
bigger code notebook-style code editor + AI code generation, we're working on it at op (https://opapp.io).
I am the sole Python developer in a company where everyone uses spreadsheets. With this project, will my coworkers be able to run my Python functions? Or only functions that are built into Python and packages like pandas and NumPy?
If your coworker doesn't have the feature enabled, they will see the last cached values. If they try to run them without the feature enabled, they will get an error.
> Microsoft says Python in Excel will be included in a Microsoft 365 subscription during the preview, but “some functionality will be restricted without a paid license”
Microsoft 365 subscriptions are paid licenses. This sentence is nonsense.
Seems cool, and a bit awful. Writing elaborated python-code in that little text-field must be a nightmare of its own. Where can I connect a VS Code or vim to the sheet for coding in comfort?
My thoughts too. Not having to install outside software is treated as a feature, but it is a lost opportunity: Excel with tight VSCode integration sounds like a killer app for decades to come. Two best-of-class (respectively) programs, each in realms dominated by Microsoft. Combine the two and unique possibilities unlock, bridging the gap between business and development.
No one seems to know that an VBA IDE is part of Office. It's important to remember when you're working on an extremely locked-down system and you want to do something complicated.
Curious if we can use Pandas like framework to combine excel sheets into 1 (with common columns or specify rule for column merge) once this is launch in an easy way all in Excel?
Grist is a spreadsheet with Python support (I am a founder). Python does make some formulas far easier. Nice to see Excel has data science libraries included from the start, that's something we've had our eyes on for a while. On the other hand, Grist is open source and can be run locally.
For example:
As part of a monitoring system.
HR / crm Database.
Self serving vacation planner and approving system.
Tamperproof GMO(genetically modified organisms) database.
As my go to source as a quick and dirty data dump while hacking...
It's a spreadsheet, Excel was always our primary competition! :) The more Excel follows us, the better for everyone. Python is a start, but they have a way to go. Imagine proper relational data, self-hosting for cloud Excel, open-source, access rules....
This seems like pretty bad news for all the excel but programatic type startups I've seen floating around. Being able to work locally might be their only hope.
DataNitro! I knew very little about programming and was trying to create examination schedules for the faculty I was working at. I managed to convince the dean to buy a license and it pretty much changed my life (granted it was mostly Python that changed it but DataNitro made it even more accessible it to me back then).
clearly aimed at big business/enterprise customers, python's ecosystem of scientific & statistical libraries is massive
not so long ago they introduced Office Scripts and Power Automate, in essence coding macros in TypeScript (rather than VBA), again only for business/enterprise license users though.
I'd say they are doing this because of LLMs, so that normal users can tell Excel what it should be doing, and it then generates a script which can execute it.
Yeah, making python a first-class browser scripting language to displace JS. I think we all know it's time. Even if it takes a decade for ubiquity, it'll be worth it.
Definitely check out PyScript if you haven't already. It works pretty well and is being actively developed. (Full disclosure: I work for Anaconda, which is the primary developer.)
Now that Microsoft embraced Python, what are the odds it will go all the way to extend and extinguish it (or at least CPython as reference implementation)? There are much more excel users than programmers. If excel runs Python, it has the potential to become the "de facto" standard interpreter, used by most people worldwide, displacing CPython.
Between nil and zero. Just because excel is "bigger" by various metrics, does not give MS any more ability to quash Python. The software world is way larger than it was in the heyday of EEE.
Excel still has a fraction of the power Python does, it's not becoming "de facto" anything that it isn't already de facto for (which is highly interactive data wrangling and making reports by folks who cant/won't use a python env for that task).
7+ years ago I had the option of leaving the Excel team. My then boss’s boss knew I had an interest in bringing Python to Excel and offered me a chance to tackle it if I chose to stay. What was meant to be a 6 month project turned into a ~3 year project, the Python part faded away and we ended up enabling JavaScript Custom Functions in Excel instead.
For Python we were also running ‘in the cloud’ (AzureML v1), although there was some back-and-forth on if we should run locally. I think what made the Python part disappear was our partner AzureML team re-orged, re-released, re-hired, we lost a PM and our work caught the attention of another partner team who realised they could use our code to execute their JavaScript out-of-process. And so I spent a lot of time ensuring that feature was successfully shipped at, I guess, the detriment of Python.
I had a lot of help from some strong engineers and learnt a lot. The core of the work was modifying the calculation engine of Excel to allow functions to compute asynchronously, allowing the user to continue working on other parts of their spreadsheet while the remote endpoint (be it JavaScript, Python or something else) was computing. Previously the spreadsheet would lock up while calculations were running, and that wouldn’t be cool for long-running unbounded calculations. Have to wonder if any of the stuff we built made it into this new feature.
Super great to see this and look forward to trying it out.
Recalc or die