EDIT: I have no idea who downvoted my post because what I said is 100% true. We have to tell customers to stop opening CSVs in Excel and then uploading them to us, because what they upload could be missing critical data. Excel interprets the value as a number and formats it as a number, but in healthcare, 10 digit numbers are really strings. Unless your IT team has created an Excel extension and had it preloaded onto your local copy, Excel will remove leading zeros and there isn't a way to get them back if you save/overwrite.
We're biased to bash on Microsoft for being "too clever" but maybe we need a reality check by looking at the bigger picture.
Examples of other software not written by Microsoft that also drops the leading zeros and users asking questions on how to preserve them:
- Python Pandas import csv issue with leading zeros: https://stackoverflow.com/questions/13250046/how-to-keep-lea...
- R software import csv issue with leading zeros: https://stackoverflow.com/questions/31411119/r-reading-in-cs...
- Google Sheets issue with leading zeros: https://webapps.stackexchange.com/questions/120835/importdat...
Conclusion: For some compelling reason, we have a bunch of independent programmers who all want to remove leading zeros.
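To be fair, pandas at least lets you opt out if you know to ask. A minimal sketch (the column name and sample values are made up):

    import io
    import pandas as pd

    csv_data = "patient_id,name\n0012345678,Smith\n0098765432,Jones\n"

    # Default behaviour: the column is parsed as int64 and the leading zeros vanish.
    naive = pd.read_csv(io.StringIO(csv_data))
    print(naive["patient_id"].tolist())   # [12345678, 98765432]

    # Forcing the column (or every column) to be read as text preserves them.
    safe = pd.read_csv(io.StringIO(csv_data), dtype={"patient_id": str})
    print(safe["patient_id"].tolist())    # ['0012345678', '0098765432']

The point stands, though: the defaults guess, and most users never discover the escape hatch.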
Looking forward to the first time anyone tries to use your excel on a table of numbers and then immediately has to multiply everything by *1 (in a separate table) just to get it back into numbers...
At least you would know what's happening and be in control of it
"Hey, is that a date? I bet that's a date!" - Aaaargh Noooo!
It's the users' fault for using it in ways that it was never designed for.
Excel has always been about sticking numbers in boxes and calculating with them.
If you want unmodified string input, input strings into a tool intended to handle them.
Project specifications can be hard. Using 1) .xls files after they were superseded, 2) ANY data transfer method without considering capacity or truncation issues, speaks of incompetence.
People just double-click the CSV and then complain that it didn't do it correctly. It's the same situation with scientific research data: researchers don't bother to use escape markers, or blindly open the file without going through the proper import process. Then they blame Excel for that without understanding how Excel works.
Yes, Excel has its quirks. But there are ways around those quirks, and there are thousands upon thousands of guides out there about Excel. There is no excuse for complaining that Excel didn't do what you wanted without looking up the information.
Double clicking the CSV should open the data import dialog.
And that's what you have in Excel. What gets displayed is a separate issue.
And no, you don't want to see exactly what you typed in, not in the general case.
And no, I can't believe I am defending Excel!
I believe that's the point, it certainly does NOT need to.
I think it would have been far quicker to just manually write a new column interpreting the dates based on previous/next etc. Instead I spent God knows how long trying to be clever, failing, and being embarrassed that I could not solve this obviously trivial problem.
You can either send xlsx with format or csv without format. If this were disabled, we'd have another group of people complaining that their dates from CSV are not parsed.
I would guess that most modern actors in the book business have been primarily using ISBN13 for at least the last decade.
Of course there is, in your case. If patient identifiers have a fixed, known length, then you can pad with leading zeros to recover them.
You only have a problem if 012345 and 12345 are distinct patient identifiers.
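A minimal sketch of that recovery (the fixed width of 10 is an assumption):

    def restore_leading_zeros(value, width=10):
        """Re-pad an identifier that Excel turned into a number.

        Only safe if identifiers have a fixed width, i.e. '0012345678'
        and '12345678' can never both be valid IDs.
        """
        digits = str(value).strip()
        if not digits.isdigit() or len(digits) > width:
            raise ValueError(f"not a plausible identifier: {value!r}")
        return digits.zfill(width)

    print(restore_leading_zeros(12345678))  # '0012345678'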
It is bone-headed in the first place to use numeric-looking identifiers (such as containing digits only) which are really strings, and then allow leading zeros. Identifiers which are really strings should start with a letter (which could be a common prefix). E.g. a patient ID could be a P00123.
This is useful for more than just protecting the 00. Anywhere in the system, including on any printed form, if you see P00123, you have a clue that it's a patient identifier. An input dialog can reject an input that is supposed to be a patient identifier if it is missing the leading P, or else include a fixed P in the UI to remind the user to look for P-something in whatever window or piece of paper they are copying from.
The main point in my comment is that instead of shaking your fist at the behavior of other users in the system, such as those who choose Excel because it's the only data munging thing they know how to use, you can look for ways that your own conventions and procedures are contributing to the issue.
If people are going to "Excel" your data, and then loop it back to you, maybe your system should be "Excel proofed".
There are already enough potential issues with CSV interpretation on wrapping strings, escaping characters and so on, but changing the content when a delimiter is found should not be added to that list.
Your bold point is the most important: the default behaviour of Excel when opening a plain-text CSV file is to alter the content for display, applying magic and often-unwanted formatting rules. That should be optional.
It should be possible to open a text CSV file in Excel, view the contents in columns but the same textual form, save the file in CSV format and open it in another viewer/editor and still see the same content as the original file.
I recently had to show a 20+ year Excel-using fanatic how to import data from a CSV file so that they could select as type Text columns that contain leading zeros. The ability exists, but I have found scant few people who know how to actually use the product properly.
Oh, and I also work in healthcare.
Any other use of Excel is bending it into a role it wasn't intended for, and user beware.
And it is all too easy to just go there, since there are soooo many convenience features for those who don't want to learn how to do the tasks well.
You are taking the application as it existed 35 years ago and saying it must still be that thing, yet it has had 35 years to evolve far beyond that. Microsoft itself, when it talks about Excel, talks about using it to organize "data", not just numerics data. It has become a more general purpose tool.
You have to remember, Excel is an extremely powerful beast. It has many specialized features that will handle whatever data it encounters. I've used Excel for 15 years and I am still finding features that make the process quicker. Of course, Excel has its limits and I am well aware of that.
I am not trying to be an elitist about this. It is just that the misuse of Excel (because people do not know how to use it) causes massive issues on a daily basis.
You are fully correct, I have seen plenty of stuff like that in life sciences projects.
Unless you instruct it to interpret the field as a string.
> but in healthcare, 10 digit numbers are really strings.
I'm wondering, if you expect 10 digits and you get less than that, how difficult is it to add some padding zeroes?
I had a package get seriously delayed one time because it kept being sent to a city whose 5 digit zip code was equal to [my 5 digit zip code, less its leading 0, and the first digit of my +4]. Fun times.
Except double clicking a CSV does not give you the option to do this. At this point Excel already decided to corrupt your data. And guess how most users open CSV files? That's right, they double click.
> I'm wondering, if you expect 10 digits and you get less than that, how difficult is it to add some padding zeroes?
If you expect 10 digits and get less than that, you have corrupted input data. Trying to "fix" this is exactly the sin excel is committing. Don't do that.
I also work in healthcare.
Or prepend a letter when producing the CSV to avoid EXCEL doing what it does.
I guess it depends on which country and which specific part of healthcare you are active in.
In the 'care' domain in the Netherlands I also see that some of our integrations are one-off, but the most important ones do have industry wide standardization that receives updates based on law.
Nothing in software development is as straightforward as we might hope.
That's a failure of the system if it can't be told to not interpret the data. However you're saying the world is as it is; can't argue.
And fucking with dates.
There are many more that don't have a clear spec, or even if they do, have every possible variation of data corruption / keying errors / user misunderstanding / total 'don't give a shit, I'll enter it how I want, that's what the computer is supposed to handle' problems in that source data, that make parsing an absolute nightmare.
I've had to manually review a list of 10k+ data readings monthly, from an automated recorder, because the guy who was supposed to copy the collected data files as is and upload them, instead opened each one and fixed what he (badly mistakenly) thought had been recorded wrong. Different changes in a dozen different ways based on no particular logic beyond "that doesn't look right". And un-fireable, of course.
Validating the data would at least prevent getting invalid data into database (and presumably this is happening already), but it doesn't actually "fix" the problem, you still then need the original provider of the data to fix what's missing.
I wonder how human operators figure out if the value is correct, or Excel messed it up, or the input was invalid in the first place? If it's even possible to do it reliably, then probably there is some set of patterns and methods that could be turned into an algorithm... just thinking out loud here, but it seems like an interesting problem to tackle...
Then there's additional code for dealing with all the inconvenient ways people format things, or want to add text labels, or do things closer to numerical/financial analysis, or all the other extras wrapped around the core "put numbers in boxes and do math".
That misunderstanding is at the core of Excel misuse.
That would be the Excel devs working at Microsoft. They read HN. I can feel it.
CSV is text. If you mean in Excel, if you opened it in Excel (rather than importing and choosing non-default options), you've already lost the data so formatting doesn't help you.
Thanks for bothering to respond instead of downvoting.
First, the upthread commenter said "healthcare", not "government agency".
Second, as someone who has worked in public sector healthcare: HA HA HA!
I mean, sure, we have the resources to pay for data scientists (of which we have quite a few) and could conceivably afford to develop custom scripts for any CSV subformat that we decided we needed one for (though if it's a regular workflow, we're probably acquiring it and importing it into a database without nontechnical staff even touching it, and providing a reporting solution and/or native Excel exports for people who need it in Excel.)
The problem is that when people who aren't technical staff or data scientists encounter and try to use CSVs (often without realizing that's what they are) and produce problems, it's usually well before the kind of analysis which would go into that. If it's a regular workflow that's been analyzed and planned for, we probably have either built specialized tools or at least the relevant unit has desk procedures. But the aggregate of the stuff outside of regularized workflows is...large.
1. They see a file (they have file extensions turned off, which is the default, so they probably don't even know what a CSV is)
2. They double click it
Excel now corrupted the data. That is the problem. Good luck teaching all end-users how to use Excel properly.
And if, also by default, Excel is set up with an association with CSVs, the CSV file will, in addition to not having an extension to identify it, have an icon which identifies it with Excel.
Bold strategy there, let's see how that plays out.
Having been in and around military / DoD usages for a long time, I can tell you it's always an uphill battle to get processes to work well, instead of defaulting to whatever the original spec happened to get included as a result of some incompetent who wasn't even aware of good practice.
Furthermore, losing leading zeroes in number-typed values is not unique to Excel; it happens in essentially any typed language once the value is parsed as a number.
Of course, but the problem isn't that the person who posted the comment doesn't know this - it's that many users of their systems don't know it. Most people are just going to accept whatever defaults Excel suggests and not know any better, causing problems down the line.
It can be avoided if you go through the Data | Import tools. The complaint is that few Excel users know that the import engine is available, or use it, instead of just opening the file and getting all the default interpretations, which can't be avoided in the usual Open code path.
The confusion comes from two different ways of getting CSV files into Excel:
- you, saying "can't turn this off": double-clicking a "csv" in File Explorer, or MS Excel "File->Open" on the csv.
- others, saying "you can preserve leading zeros": click on the Excel 2019 Data tab and import via the "From Text/CSV" button on the ribbon menu, and a dialog pops up that provides the option "Do not detect data types" (earlier versions of Excel have different verbiage for interpreting numbers as text).
Excel may predate the RFC but AFAIK MS didn't invent or coin the term CSV, so you can't just say whatever Excel does is correct. The RFC is loose because of nonsense like this, it doesn't mean it was ever a good idea.
But none of that is really the point. Because CSV files aren't just for importing into Excel. One of their main benefits is their portability. In other situations column types might be specified out of band, but even if not, putting equals signs before values is unconventional, so more likely to hurt than help. And in the cases it might help, i.e. when you only care about loading into Excel, then you have options other than CSV, rather than contorting CSV files for Excel's sake.
> What ever happened to process and a sense of responsibility and craft in your work?
I actually have no idea what you are on about. I'm talking about the "responsibility and craft" of not producing screwed up CSV files. Why do some people find that so offensive? Yes, it is not inconceivable that there could be some situation working with legacy systems where putting `="..."` in CSVs is, unfortunately, your best option. Sometimes you do have to put in a hack to get something done. But don't go around telling people (or yourself) that it is "the correct way".
If you want Excel to understand your output, perhaps use a library which can write xlsx files.
As for the actual RFC, it's worth taking a read. Any sort of value interpretation is left up to the implementation, to the extent that Excel's behavior in interpreting formulae is 100% in compliance with the spec.
Anyway the RFC doesn't mandate any value interpretation IIRC.
If Excel were the only intended consumer, .xlsx would be a preferable file format. At least it's mostly unambiguous.
I actually wanted a CSV file – preferably without having to resort to sed to strip out excel formulae.
For all the other uses in the world, that's a breaking change.
The first thing would be to write them as: entry1,"0123456789",entry2 rather than entry1,0123456789,entry2. This has worked for me in some instances in Excel where I have to escape certain things inside a string, but I would not be surprised if Excel still messes this up. For example, showing the triangle-exclamation-mark box and then helpfully suggesting to convert to number.
If you want to go further, you can write a routine that alters the CSV, such as entry1,hospitalString(0123456789),entry2. Sure, there are problems with this too, but Excel can break a lot of things, and I do use the above examples in practice (in the first example I put the double quotes in to escape single quotes in foreign-language Unicode).
Another thing Excel can do is break your dates, by switching months (usually only for dates < 13th of the month, but often a partial conversion in your data for < 13th and >= 13th) or by converting dates to integers.
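A rough sketch of the quoting approach with Python's csv module (column names are made up); note that plain quoting only helps consumers that respect it, and Excel's default double-click open will still strip the zeros, which is why some people fall back on the ="..." hack mentioned elsewhere in the thread:

    import csv

    rows = [{"entry1": "a", "patient_id": "0123456789", "entry2": "b"}]

    with open("out.csv", "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(
            f,
            fieldnames=["entry1", "patient_id", "entry2"],
            quoting=csv.QUOTE_ALL,  # wrap every field in double quotes
        )
        writer.writeheader()
        writer.writerows(rows)

    # The Excel-only hack: ="0123456789" survives a double-click open,
    # but it corrupts the file for every other consumer of the CSV.
    # hacked_value = '="0123456789"'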
5 – London Olympics oversells swimming event by 10,000 tickets.
4 – Banking powerhouse Barclays accidentally bought 179 more contracts than they intended in their purchase of Lehman Brothers assets in 2008. Someone hid cells containing the unwanted contracts instead of deleting them.
3 – Outsourcing specialists Mouchel had to endure a £4.3 million profits write-down due to a spreadsheet error in a pension fund deficit caused by an outside firm of actuaries.
2 – Canadian power generator TransAlta suffered losses of $24 million as the result of a simple clerical error which meant they bought US contracts at higher prices than they should have.
And the biggest one is:
1 – Basic Excel flaws and incorrect testing led to JP Morgan Chase losing more than $6 billion in their London Whale disaster.
At the same time, the business world runs on Excel. How much money is Excel making?
I've done my share of cursing at Excel at various jobs. At the same time, I am grateful for the quick and easy way it allows me and many others to manipulate data. It's unfair to just cite the costs of using Excel without acknowledging the benefits it brings.
Excel's ease of use is its downfall. It is the world's most popular database, despite not actually being a database. I have wasted countless hours dealing with Excel where something else should have been used. I built a database for a friend recently, and I think 75% of the work was cleaning the existing data from the Excel file to get it into the database.
We blame excel, but excel is really just being used for prototyping and nobody takes a decision at a certain point to move on from that prototype.
I disagree that lack of db knowledge is the primary reason. I'm a programmer and I usually use MS Excel because it's easier than relational databases. I prefer Excel even though my skillset includes:
+ Oracle DBA certification and working as a real db administrator for 2 years
+ MySQL and MS SQL Server programming with raw "INSERT/UPDATE/DELETE" or with ORMs
+ SQLite and programming with its API in C/C++/C#
+ MS Access databases and writing VB for enterprises
The problem is that none of the above databases (except for MS Access) come with a GUI datagridview for easily inputting data, sorting columns, coloring cells, printing reports, etc.
Yes, there are some GUI tools such as SQLyog, Jetbrains DataGrip, Navicat, etc... but none of those have the flexibility and power of Excel.
Yes, a GUI frontend to interact with backend databases can be built and to that point, I also have in my skillset: Qt with C++ and Windows Forms with C#.
But my GUI programming skills also don't matter because for most data analysis tasks, I just use Excel if it's less than a million rows. Databases have a higher level of friction, and all of my advanced skills don't really change that. Starting MS Excel with a blank worksheet and typing immediately into cell A1 is always faster than spinning up a db instance and entering SQL "CREATE TABLE xyz (...);" commands.
Of course, if it's a mission-critical enterprise program, I'll recommend and code a "real" app with a relational database. However, the threshold for that has to be really high. This is why no corporate IT department can develop "real database apps" as fast as Excel users can create adhoc spreadsheets. (My previous comment about that phenomenon: https://news.ycombinator.com/item?id=15756400)
It's often only after years of a business using what has become sacred & business critical Excels, that somebody suggests formalizing it into software. In a business with an IT function, or a consultancy looking for business, it should always be somebody's job to find these Excels and replace them with something more robust.
Honestly, I miss the days of writing VBA macros which save hours of work a week and being sneered at by the 'official IT'.
I worked in a team in a large commercial bank handling reconciliations with various funds, some of which had to be contacted by phone to confirm the current holdings. We had a system which would import our current positions and take imports in various formats from funds. Somewhere around 40% of the differences were due to trades which had been executed over the reconciliation date. I wrote a VBA script which pulled in all of the differences, identified trades which were open over the period, and automatically closed the discrepancy with a reference to the trade IDs.
Another time I wrote a VBA script which would take a case ID and look it up in a diary system (at the time the only way I found to do this was to use the Win32 APIs and manually parse the fields in the HTML from the system), it would then enter this at the top of a spreadsheet which had to be completed. People liked it so much I had to rewrite it so that it would work on a list of case IDs and automatically print out the checklist.
Much more fun than figuring out why Kubernetes is doing something weird for the 3rd time this week.
Plus in a database you can’t do the same sort of real-time analysis and also pass the document around for other non-technical folk to add to and modify.
In the real world in big companies, people often don’t want to talk to IT because they over-spec and quote what are perceived to be giant sums of money for something that can be created in an hour in excel.
"A database is an organized collection of data, generally stored and accessed electronically from a computer system."
Excel is an organized collection of data, stored and accessed electronically from a computer system. So I would call it a database.
This link will explain the difference between two of them. https://365datascience.com/explainer-video/database-vs-sprea...
"Scientists rename human genes to stop Microsoft Excel from misreading them as dates. Sometimes it’s easier to rewrite genetics than update Excel"
To be fair, I'm not sure if any of the proponents actually believed (or had read) the study, but it was definitely wheeled out in debates against Keynesians.
Or even an algorithm that can detect that you are using a gene name, based on the cells around march1 and sept7.
The entire Japanese stock market went down last week, and it's not like there was a flood of people on HN bemoaning that. At least with Reinhart and Rogoff et al. you have a responsible party.
As opposed to 'nameless machine failed, and nameless backup machine also failed, and now it's in JIRA so don't worry about it'.
The glitch stemmed from a problem in the hardware that powers the exchange, said the Japan Exchange Group, the exchange’s operator, during a news conference. The system failed to switch to a backup in response to the problem, a representative said.
I guess you still have to understand locking (especially on distributed filesystems.) I've certainly seen people mess that up with spreadsheets.
Throwing up a database and integrating it into a workflow/system isn't something anyone can just get up and do. I have to imagine you know that.
And if it was manual I am surprised that Excel did not complain about adding more than 65000 rows (or saving more than 65000 rows as XLS). If a user gets a warning about possible data loss they should investigate more.
Of course, there's a bit of a gap between "it's there on the machine" and "we can rely on it for useful work", but baby steps...
MS Access used to come as standard with Office and is actually the perfect solution to many of the problems that businesses use Excel for. It was very rare that people actually used Access, as Excel was far more intuitive and good enough for many projects, especially in the early stages.
When people want a "database", they fire up Excel, start punching in numbers with solar calculators next to the keyboard, and use their eyeballs to search for strings.
We are able to teach almost everyone how to use complex software like Word and Excel. Why can't we teach people how to use a terminal, SQLite, or how to create a very simple Python script?
And because Excel is "good" for a vast number of use-cases then people use it for everything.
Excel is great for many use cases, especially if you need people to enter data somewhere. Its UI is unparalleled in terms of quickly giving something to users that they can understand, mess around with, and verify. It's a very common use case to then need to pull in data from a bunch of Excel files, into one main repository of data (a data warehouse). That can be stored in an Excel file, although more commonly would be stored in a database.
But there are always problems with this process! There can be missing data, there can be weird data conversions because the program/language you're using to parse the data and get it into the database reads things differently than how Excel intended, there can be weird database issues that causes data loss, etc.
Complex systems always, always have bugs.
It is the job of a data engineering team to, among other things, test the systems thoroughly, and put in place systems to test against data loss, etc. It is pretty common, for example, to count the data going into a pipeline, and the data that you end up with, and make sure nothing was lost on the way.
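A toy version of that in/out count check (the names and the truncating "pipeline" are illustrative only):

    def run_with_count_check(records, pipeline):
        """Run `pipeline` over the records and verify nothing was silently dropped."""
        output = pipeline(records)
        if len(output) != len(records):
            # Fail loudly instead of quietly shipping a truncated result.
            raise RuntimeError(f"pipeline lost rows: {len(records)} in, {len(output)} out")
        return output

    # e.g. a step that silently truncates at 65,536 rows, like the old .xls format
    rows = [{"id": i} for i in range(70_000)]
    try:
        run_with_count_check(rows, lambda rs: rs[:65_536])
    except RuntimeError as e:
        print(e)  # pipeline lost rows: 70000 in, 65536 out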
Anyone can make a mistake. Any team, no matter how good, especially when they're rushed, can use shortcuts, use the wrong technologies because it's expedient, or simply have bugs. It is the job of the team, and of project management in general, to do all the manual and automatic testing necessary to make sure that mistakes are caught.
The real lesson isn't "Excel is bad". It's not. It's an amazing tool. The real lesson is "Data Engineering is hard, requires a lot of resources", and "all systems have bugs - testing is mandatory for any critical system".
Also, recognized good practices for software development (which any Excel sheet that does more than just a sum() will be) like commenting and versioning are quite hard to impossible in Excel. So even if you endeavour to do it "right" but with Excel, it is just the wrong tool for the job.
There might be, for example, a formula going through 20k lines in column F. But the one on line 1138 has a typo and the formula references an incorrect cell. No human will ever go through all the lines to check. Excel itself doesn't check stuff like that. And there are no tools for it either.
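You can approximate such a check outside Excel, though; a rough sketch with openpyxl (the file name and column are assumptions) that flags any formula in column F whose shape differs from the row above once row numbers are ignored:

    import re
    from openpyxl import load_workbook

    wb = load_workbook("report.xlsx", data_only=False)  # keep formulas, not cached values
    ws = wb.active

    def shape(formula):
        # Blank out row numbers, so '=D1138*E1137' and '=D1139*E1139' both become '=Dn*En'.
        return re.sub(r"(?<=[A-Z])\d+", "n", str(formula or ""))

    previous = None
    for row in range(2, ws.max_row + 1):
        current = shape(ws.cell(row=row, column=6).value)  # column F
        if previous is not None and current != previous:
            print(f"row {row}: formula shape changed: {ws.cell(row=row, column=6).value}")
        previous = current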
All you need to do is scroll through the worksheet you have just made.
Anything below 500k rows is a 'small table' still.
We have computers to do that sort of work.
After all, it's people that make computers do this work in the first place.
Anyone who comes at a problem with the mindset 'this is going to be hard' probably lacks experience and will throw big-data frameworks at it, really screwing things up. The most significant, and valuable, resource needed is thought first, and knowledge+experience second.
All IMO anyway.
You can write a simple test to check a function is working correctly, but how do you make sure your 100,000 item database doesn't have corrupted or missing data caused by the latest pipeline update, especially if the corrupted parts are rare?
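One crude answer is to fingerprint every row and diff the fingerprints across pipeline runs; a sketch (the field names are made up):

    import hashlib, json

    def fingerprints(rows, key="id"):
        """Map each row's key to a hash of its full contents."""
        return {
            r[key]: hashlib.sha256(json.dumps(r, sort_keys=True, default=str).encode()).hexdigest()
            for r in rows
        }

    def diff_runs(before, after):
        """Report rows that went missing or changed after a pipeline update."""
        old, new = fingerprints(before), fingerprints(after)
        missing = old.keys() - new.keys()
        changed = {k for k in old.keys() & new.keys() if old[k] != new[k]}
        return missing, changed

It won't tell you which side is wrong, but it does surface the rare corrupted rows you'd never spot by eye.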
The only correct usages of Excel (and Google Sheets) are for user input to other processes, and visualization of data (which should definitely be stored in raw format in some database). And always assuming frequent backups of all sheets and, of course, manageable datasets. Anything else that includes external processes/scripts which append/overwrite data in sheets is terrible practice and very error-prone.
Sometimes you get a number, sometimes you get a string containing a number. Sometimes the string contains whitespace that you don't want. And then there are dates, which can have similar problems multiplied ten times.
Again, it's hard to cry too many tears for Microsoft, but it does seem a bit off-target to blame "Excel" for this...
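For what it's worth, the usual defence against that mess is to stop guessing and parse explicitly; an illustrative pandas sketch (the sample values are invented):

    import pandas as pd

    # A column that mixes ints, padded strings, thousands separators, blanks and dates.
    raw = pd.Series([42, " 007 ", "1,234", None, "12/05/2020"])

    as_text = raw.astype("string").str.strip()
    as_number = pd.to_numeric(as_text.str.replace(",", "", regex=False), errors="coerce")
    as_date = pd.to_datetime(as_text, format="%d/%m/%Y", errors="coerce")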
Excel is a wonderfully powerful tool that’s very bad at handling errors clearly.
The story I mentioned was because I wanted to look at the data before I started parsing it. I had full expectations to use either sqlite or Pandas.
JS would like to have a word with you.
It seems that warning has just been ignored by the user.
Ultimately tools are built for particular things, and if you choose to use a tool for something it's not built for, and it breaks catastrophically, that's on you.
Throwing away data without warning is almost certainly never what the user wanted.
My bet is the biggest problem here is subcontracting this work to the lowest bidder, presumably from some developing country.
Knowing a little about how things (don't) work in the UK, it's likely subcontracted, but to a company belonging to a mate of the director in charge of the whole thing.
> > And it appears that Public Health England (PHE) was to blame, rather than a third-party contractor.
This is a problem of bad coding, and using the wrong tool for the job.
A defensive coding practice would have prevented this from going unseen. Using a database to store data would have prevented such arbitrary limits.
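Even without a database, the defensive version is a one-line guard before writing to the template; a sketch (the limit constant reflects the legacy .xls format, and write_template is a hypothetical writer):

    XLS_MAX_ROWS = 65_536  # hard row limit of the legacy .xls format

    def export_cases(rows):
        # Refuse to produce a silently truncated file; surface the problem instead.
        if len(rows) > XLS_MAX_ROWS:
            raise ValueError(f"{len(rows)} rows will not fit in an .xls sheet (max {XLS_MAX_ROWS})")
        write_template(rows)  # hypothetical writer for the Excel template

Dropping the overflow on the floor should never be the default failure mode.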
> And it appears that Public Health England (PHE) was to blame, rather than a third-party contractor.
As far as bugs go, it doesn’t sound that bad. They didn’t lose data - they just processed it late? And they spotted it within days/weeks, and have a workaround/correction already? And it’s only the reporting that was wrong, not the more important part where they inform people of results?
I’d rather have this system now than be waiting for the requirements analysis to conclude on the perfect system.
A delay in the test results means the contact tracing could not happen, and people will have been going around spreading the virus they caught off the original, known cases.
Also, the whole test and trace system has been in the news a lot recently here for various failings, and things like this will just further knock people's confidence in it.
A recent piece in The Atlantic argues that we got it all wrong, we should test backwards (who infected the current patient) instead of forwards.
This particularity is caused by the dispersion factor K, a metric that gets hidden behind (conflated with) the R0, meaning that the same R0 can call for different handling depending on the K factor.
A related suggestion was that we should focus more on super-spreaders as COVID doesn't spread uniformly. That's why the probability of finding a cluster is higher looking back than forward - most people don't get to actually spread the virus.
That’s the scandal: it’s the type of basic error that it screams “they’re not handling this well.”
You wouldn’t tolerate your new SWE colleague asking “What’s an array again?”
If publishing the whole set of data was delayed by excel crashing - fine. But silent failure because they ran out of columns? Cmon...
I could see that it could be a rush-job problem, but in this case they're not gaining any time.
Partly I think the Excel bit is news because Excel (and mistakes made with Excel) is easily relatable to people. But bugs always come up and most other bugs are just as stupid. If it had been an off-by-one loop error in some C code somewhere it would be just as dumb but you'd get none of the facepalm memes all over Twitter.
Not when it turns phone numbers into integers and strips off the leading zero.
I guess it's easier to understand since everyone uses Excel; however, it does end up giving a halo of blame to Excel, as opposed to human processes.
In the grand scheme of things, having problems because of inexperienced people is much better than not having enough people like during April and May.
Wow. Thank you for this gem of human culture.
Why is that a broken assumption? Can you name a legitimate reason for HTTP and HTTPS sites to serve separate contents and audiences? I would rather not connect over HTTP to _anything_ nowadays.
And for sites with noncritical static content, https ranges from superfluous to dangerous. ESNI isn't implemented yet, IP addresses are still visible to prying eyes, and content sizes and timing are a dead giveaway for the things you are looking at. HTTPS for everything is just a simulation of privacy at best, and misleading and dangerous at worst, because there IS NO PRIVACY in the aforementioned cases.
« PHE had set up an automatic process to pull this data together into Excel templates [...] When [the row limit] was reached, further cases were simply left off. »
The terrible thing here is dropping data rather than reporting an error and refusing to run.
It isn't clear what piece of software was behind this "automatic process".
Clearly the responsible humans are to blame.
If the software that dropped data rather than failing has that as its default behaviour, or is easily configured to do that, then I think that software (and its authors) are also to blame.
Is there anything in Excel itself that behaves like that?
I'd imagine the default error-handling behaviour of 9/10 Excel macros is to throw away data
I wonder how many bleeding edge master branches of GitHub repos, pulled in blindly by someone cobbling something together to meet a deadline, are running in places they probably shouldn't be.
It's not the fault of the original designer if he was clearly targeting a different purpose.
Lots of love for it in Rx
They probably mean a semi-automated process within Excel, where each tab is a day's extract or something similar, and they are using external references to other sheets. In any vaguely up-to-date version of Excel the way you would do this is via 'Get and Transform', which does not have any of these limitations (including the 1m record limit that the news article suggests).
The funny thing is that the latest versions of Excel are brilliant at aggregating and analyzing data and are more than suitable for this task if used correctly (i.e. using Power Query). It's just that way less than 1% of users are aware of this functionality - I would assume that even most Hacker News readers probably don't know about Power Query/Power Pivot, writing M in Excel, setting up data relationships, etc.