One in five genetics papers contains errors thanks to Excel (2016) (science.org)





Don't get me started on the automatic conversion of strings that it thinks are in scientific notation into numbers - which you can't switch off!

We have large data exports from systems that include things like unique location codes. You happen to notice that a block of these look weird, and it isn't just the display of them that has changed: the contents of the cell were changed by Excel automatically, without asking, and you cannot disable it.
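For the curious, here is roughly what that coercion does to identifier-like codes. This is just a Python illustration (the codes are made up), not Excel's actual logic:

    codes = ["7E3", "12B7", "05E1", "MAR1", "2E10"]

    for code in codes:
        try:
            # Anything float() accepts is the kind of string Excel silently
            # rewrites as a number (7E3 -> 7000.0, 2E10 -> 20000000000.0, ...).
            print(f"{code!r} -> {float(code)}")
        except ValueError:
            print(f"{code!r} -> left alone")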

Absolute BS after all these years. I hate that they won't fix these niggling issues that keep tripping people up over the years and just make excuses. Microsoft's usual response is: "We only work on things that affect a large number of customers". Yeah Microsoft, if you keep closing these bug reports, then each time someone reports it, you can just say that it only affects one person and close it again.

Or... you could show how amazing your company is by doing what most of us have to do: fix it, add more debugging for the next time it happens if you can't recreate it, or have a properly tracked reason to say, "only a very few people have asked for this but changing it might break these other areas/backwards compatibility" or something.


Yes, it's that last reason. Backwards bug compatibility. There are zillions of recorded macros and VBA scripts and other workflows by non-technical users that inadvertently depend on these behaviors in subtle ways. It's like all Javascript's weird warts, you can't change these behaviors without breaking old stuff somehow. It's true that Microsoft could say that more directly, but of course they'd rather just avoid that argument.

Is there a reason you don't specify the data type when importing, or, if it is a frequently used data file format, automate the import using VBA and specify the correct data types?

If you open a CSV file without going through a specific import process, you don't even get the option to specify a data type. And once it's open it's too late to fix it, the original data is already gone.

The specific import process isn't some sort of esoteric process. It is the data import wizard. Also, as I said, if you are regularly importing data from a file with the same format, writing some VBA to do so is a pretty quick and simple task.

Also your data isn't gone. It is still in the CSV file you imported it from. Re-import it.
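If you are on the programmatic side anyway, the same idea in Python is a one-liner (a sketch only; pandas is assumed, and "codes.csv" plus the column names are hypothetical): declare the types up front so identifier-like columns stay text.

    import pandas as pd

    # Force the identifier columns to stay strings instead of letting the
    # reader guess numeric/scientific-notation types.
    df = pd.read_csv("codes.csv", dtype={"location_code": str, "sample_id": str})
    print(df.dtypes)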


I would wager more people open a CSV by double-clicking on it rather than using the import data wizard. And even if you use the wizard it takes extra work to specify the type of each column, which most people won't bother with.

Writing some VBA is a simple process if you're a programmer. I wonder how many genetic researchers fit that description?

P.S. when I said "too late to fix it", I meant by some process within Excel. Of course you can re-import the original file, but maybe you only notice the problem after you've done a lot of work with it?


Expecting you to learn the basics about the tools you're using is not expecting too much. And if you are too lazy to spend a few seconds specifying data types then you get what you deserve.

> Expecting you to learn the basics about the tools you're using is not expecting too much.

Do you/have you worked in a corporate environment? You seem to have an idealistic view about how end users are expected to use Excel.


I worked in corporate environments for decades.

The original comment I responded to said they regularly imported large data sets, and in the case of the geneticists they are also regularly importing data into Excel. In other words, Excel is a regularly used and fundamental tool in their work. In this case I would expect someone to learn the basics of using it. Just as I would expect a developer to learn their editor, build system, version control system, etc.


Excel chews up CSVs that it opens. I know this because an accountant checked each file our code produced using Excel before trying to import it into another program. We proofread our code before realizing the problem was somewhere else. Shoulder-surfed the process, found the giant bug with a green X on it.

It's no better at exporting to CSV. I wrote a CSV parser a few years ago that had one set of logic for Excel CSVs and a completely different set for everything else.
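For what it's worth, even Python's standard library treats Excel's flavour as its own dialect; a quick way to see the assumptions that separate code path has to make (the "export.csv" filename is made up):

    import csv

    # csv.excel: comma-separated, CRLF line endings, minimal quoting.
    print(csv.excel.delimiter, repr(csv.excel.lineterminator),
          csv.excel.quoting == csv.QUOTE_MINIMAL)

    # Parsing a file Excel wrote:
    # with open("export.csv", newline="") as f:
    #     rows = list(csv.reader(f, dialect="excel"))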

Excel doesn't change CSV files when it imports them. If the imported file was being changed then the user was saving back to the same file they imported from.

This information doesn't help anyone.

The fact is the person was double-clicking a file in a list to view its contents and Excel was trampling it. Nobody in their right mind will waste time to open Excel first, use import feature, re-navigate to the file they were already looking at, and go through the import dialog just to see what's inside.


Trampling it to me implies that Excel was somehow modifying the contents of the file. Which it doesn't do by double clicking on the file and just viewing it. Do you mean that the data shown in Excel wasn't what was expected because of the auto data conversion?

Believe me, I was blown away just the same. And it's not like the accountant clicked a save button on an on-close dialog, no. Opening a CSV file was enough.

I first used Excel on Windows 3.1 and over the decades since have opened countless CSV files with it. I have never observed this behavior.

> Don't get me started on the automatic conversion of strings that it thinks are in scientific notation into numbers - which you can't switch off!

Every week it bites me once or twice. Drives me bananas.


Why still use MS Office when LibreOffice is freely available?

Because that is what your work gives you and what all of your coworkers use.

Does it fix the issues in question?

Previous popular thread, Aug 2020: https://news.ycombinator.com/item?id=24070385

> Scientists rename human genes to stop MS Excel from misreading them as dates (theverge.com)

Related details:

2023:

https://www.pcmag.com/news/microsoft-finally-fixes-excel-gli...

> Years after introducing Excel's automatic conversion features, Microsoft rolls out an update to prevent it from changing gene symbols to dates.

https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9325790/

> Gene Updater: a web tool that autocorrects and updates for Excel misidentified gene names


Know your tools. When I see such headlines, I feel sad about the state of this world and obstacles to progress, angry about the stupidity of designing all the tools for the most lazy of users, and at the same time I also laugh about the failures of proprietary software and its users, who could easily have invested a little bit of time to learn alternatives, but apparently chose not to.

Yes the tooling they use might be terrible. It is your responsibility to either deal with the terrible tooling yourself, learn better tooling, or get a capable computer person in the room, who can navigate the tooling landscape and get you the results.

And of course, that is not even addressing checking your results yet. It is a sad state of the research landscape: often financed by public money, and then throwing money at MS for using a proprietary tool and messing up.


Working in this field, I think a lot of the issue is publishers often require small datasets in xls format for paper submission. So someone in the chain (corresponding author, editor, journal staff member etc.) opens the file in excel and saves it again.

Bioinformaticians are not doing their analyses in excel.


Yes, tools are not made for users but for the tool-builders. Everyone else must adapt!

I think you’re reframing “when all you have is a hammer, every problem is a nail” into “when all you have is a screw and a hammer, it’s outrageous the hammer wasn’t designed to drive screws”.

Should a single tool work for every use case, or should the tool be picked based on the use case? Tool X doesn't work for use case Y. Regardless of where you shift the blame, "tool X doesn't work for use case Y" still holds.

Tools are designed for a particular use case/userbase. Excel was not designed for gene scientists. It wouldn't make sense for me, a home owner who uses a jigsaw to cut molding, to complain to a chainsaw maker that their chainsaw isn't right for my job.

Excel is made for spreadsheets, not databases. Microsoft already has a database application in Access.

An annoyance with spreadsheets that deterred me from ever using them in teaching is that they've perpetuated an arithmetic order-of-precedence bug. ("Bug" in the sense that it contradicts long-standing mathematical convention.) If you type

    -3^2
in a cell and press ENTER, the spreadsheet tells you it's "9". It should be "-9"; in math, exponentiation has precedence over unary minus, so you square 3, then negate the result. For instance, if you tell students to graph "y = -x^2", they should draw a parabola opening downward.

I don't have a recent copy of Excel to check this in, but this was the case in the '97 version. I just tried it in the current LibreOffice calc, and it returns "9". My guess is that one of the early spreadsheets messed up the order of precedence, and everybody after copied it for compatibility.

On the other hand, I just tried maxima and python and they both give "-9".

I wonder if this particular problem afflicts people who copy formulas from (say) math books into spreadsheets.
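Since Python came up: a quick check of the convention in a language that follows the math ordering (illustration only):

    print(-3**2)      # -9: parsed as -(3**2), exponentiation binds tighter
    print((-3)**2)    #  9: the reading Excel effectively applies
    print(0 - 3**2)   # -9: binary minus gives the same answer either way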


This issue seems like partly an artifact of the invented binary operator ^. In math, exponents are superscripts and there's no binary operator - it's part of the term. But for text on computers, binary operators ended up being fabricated for the things you couldn't represent directly. The caret is common and was the first one to appear, but doubled multiplication signs are another. Traditionally, binary operators have lower precedence than unary operators since the unary minus is considered part of the term, so the exponent turning into an operator mucks things up if it's still implemented to adhere to that logic.

You sort of see the same issue with division. The forward slash is a completely invented binary operator since the actual division symbol was often not present - and let's be real, nobody uses the binary division operator when writing formulae. It's supposed to represent the dividing line in a fraction, similar to how division is usually represented in a formula as a fraction of two other expressions. It's got lower precedence than anything in either term - but if you just replace the dividing line with a forward slash to input the formula into a computer, you'll get incorrect results, because it's replacing what is part of a complete term (the division line) with a new binary operator inserted between sets of terms, which is now subject to precedence rules.


I suspect it's more likely an artifact of how the number is lexed. `-3^2` probably gets lexed into Number(-3), Operator(^), Number(2), which results in the aforementioned precedence issues. The reason for parsing the operator with the number is that it makes it easier to handle the case where you just write a negative number as a literal value into a cell.

Source: I've written an Excel clone before. I don't believe it has the same bug, but if it does, that will be how it's crept in.

EDIT: looking at some of the descriptions of the bug, it seems like it happens when handling variables (i.e. cell references) as well, which makes it seem like a pure precedence issue and not a parsing issue. So I've got no idea, presumably someone simply messed up the precedence order.
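A toy sketch of the two token streams being described, in Python. This is not Excel's lexer, just the two shapes of the parse and the values they lead to:

    def folded_literal():
        # Number(-3), Operator(^), Number(2): the sign was lexed into the literal.
        return (-3) ** 2          # 9

    def separate_unary_minus():
        # Unary(-), Number(3), Operator(^), Number(2), with ^ binding tighter
        # than the unary minus, as in math convention.
        return -(3 ** 2)          # -9

    print(folded_literal(), separate_unary_minus())  # 9 -9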


It's almost certainly a precedence issue. It's much easier to consider that unary operators have precedence over binary operators, and just learn the precedence rules for each class, rather than a global precedence rule. Plus, there's no conflict with math notation, as there is no exponentiation operator in math.

In my country we teach the same rules in math class. Blaming it on “unary vs binary” was a stretch. Next you’re going to blame it on the lexer for producing the -3 as a single term instead of two. (Which WOULD explain something, but… fix it?!)

In my country we use a horizontal line with a dot above and below to indicate in-line division in lower grades. Exactly like the computer /.

It’s not like there was no precedent here.


I also learned the line with a dot above and below in my country, the USA. But that was a very long time ago, math teaching has changed immeasurably since my time.

In Unicode it's U+00F7: https://www.compart.com/en/unicode/U+00F7


In the C language, -a*b parses as (-a)*b, but c-a*b parses as c-(a*b).

Unary minus has a higher precedence than binary operators.

You don't notice because the semantics allows the sign to move around, unlike with exponentiation.

But when we throw in edge cases involving undefined behavior, oops!

  0 - INT_MIN/2 // fine: parses as 0 - (INT_MIN / 2)

  -INT_MIN/2    // not okay: parses as (-INT_MIN) / 2
The INT_MIN value need not have an additive inverse because of a quirk in two's complement.

> The INT_MIN value need not have an additive inverse because of a quirk in two's complement.

For comparison, in Java, these 3 expressions each yield Integer.MIN_VALUE (i.e. -2147483648):

    -Integer.MIN_VALUE
    Integer.MIN_VALUE * -1
    Integer.MIN_VALUE / -1
I have to admit I expected all 3 to throw.

edit On reflection I shouldn't have expected that, I recall reading John Regehr's blog post on the downsides of how Java defaults to wrapping behaviour: https://blog.regehr.org/archives/1401


This is probably the most overly pedantic, nitpicky reason for not using a program I've ever heard.

I'm aware that there are two different conventions on this issue, so I just use parentheses to get the behavior I want.

But, growing up, as the top math student in my class, it never occurred to me that somebody out there wants -3^2 to equal -9, I thought it was just a weird quirk in some calculators/programs. How would you read that expression aloud? I think of it as "negative three squared" so that's why (-3)^2 makes sense to me. Do you say "the negative of three squared"?

In 8th grade, I remember being instructed to type such an expression into the calculator to observe how it does something contrary to what we expect it to. From that moment on, I thought, "Huh, guess you have to use parentheses." It certainly wasn't cause enough to throw out my calculator, let alone tell others not to use it, just because I prefer a slightly different precedence convention.


> This is probably the most overly pedantic, nitpicky reason for not using a program I've ever heard.

If you found learning math easy, you're fortunate. But lots of people find learning math difficult and frustrating, and things which might not have bothered you can be big deals for those folks. If I used a program in teaching which has a convention about basic arithmetic operations that is the opposite of the convention that mathematicians use, it is one more source of confusion and frustration for people.

Student: "You said that -3² was -9, but Excel says it's 9."

Me: "Well, mathematicians use a different convention than spreadsheets."

Student: "So which one should I use on a test? Can we use both?"

Me: "Since this is a math class, you should use -9, not 9."

Student: "How am I supposed to remember that? This is why I hate math ..."

Everyone will weigh costs and benefits differently. There is plenty of good math software out there like Mathematica, R, Geogebra, or maxima. Spreadsheets didn't seem to offer much, and there was this arithmetic convention thing that I knew would be an issue.

I'm sorry if you find it pedantic and nitpicky. I always tried to minimize unnecessary causes for upset, because there were difficulties enough learning math without my adding to them. If you saw people getting extremely angry or in tears because they "didn't get it", I think you'd understand. Math is really hard for some people.


I think we should agree that standard notation is too ambigious and switch to reverse polish notation:

  3 2 ^ -     -9
  3 - 2 ^      9
No way to misinterpret that!

Except 3 - 2 ^ makes it hard to express 3 2 - ^ without accidentally subtracting, so in this case a unary negation sign needs to be a different symbol!

I dug out my HP 50g over the weekend to play with, and conveniently…

Sigh. I understand why we commonly enter math on basically a teletype-with-ASCII, and I don’t have an urge to go all APL, but for a while we were so close to a future where we could’ve had separate negation or multiplication or exponentiation symbols that might’ve removed so much room for error. I mean, that little calculator and its predecessors were popular and widely used by the same people who brought us things like Unicode and the space cadet keyboard. If only one of them had said, gee, it sure would be handy to have a +- key on the keyboard the person in the next cubicle is designing as I have on the calculator on my desk!

But nope, Everything Is ASCII won and here we are. At least programming languages are starting to support Unicode identifier names, which has its own issues but is excellent for non-Latin alphabet users who want to write code in their own tongue. It seems like a reasonably short hop from there to giving operators their own unambiguous names. I can imagine a near-distant future where a linter says “you typed -3. Did you mean ⁻3?”, to the chagrin of programmers still entering their code on teletypes.


It would be nice if OSs defaulted numeric keypad / * and - to Unicode ÷, ×, −. I never use them even when I do use the digits. That would solve the more glaring typewriter legacies. Then you'd just have the apostrophe/single-quote as the last remaining unification.

I'm not convinced. I was brought up with the middle dot for multiplication (and × reserved for cross products, I suppose?) and according to Wikipedia, the

> ISO 80000-2 standard for mathematical notation recommends only the solidus / or "fraction bar" for division, or the "colon" : for ratios; it says that the ÷ sign "should not be used" for division

I think these things are way less standardised even on paper than you believe.


That was kind of my point. We somewhat settled on an ASCII representation of many symbols, and that space is so small that lots of them have multiple meanings you have to infer. There was a brief window where we could’ve taken a different path and used different symbols for different things. Alas, we didn’t.

I don’t contend we should change things today. I do think if I were personally writing a new programming language from scratch today, I’d likely use different Unicode symbols for different operators, and the accompanying language server would nudge people to use them.


Everyone learnt -3^2=-9 in middle school… it is a very clear implementation mistake

"minus (pause) three squared"

I had a discussion on one of the Nim boards because Nim does the same, i.e. writing -2^2=4 but 0-2^2=-4, because Nim treats the first as the unary - which takes precedence over exponentiation. I realise that you can argue how it is technically correct (and interestingly several of the people I was talking with couldn't even get my point), but I still argue it's incredibly unintuitive for anyone coming from a regular engineering/science background who wants to do regular work. I would almost argue this alone will keep many of those people from using it.

You're right that violating established mathematical convention can be a deal-killer for some kinds of adoption.

A few years ago I was reading the docs for a new programming language, thinking it might be useful in teaching. The docs were well-written and in a beautifully produced book. I got to the chapter on trig functions and discovered that they'd decided to make angles increase clockwise. And there was a graph of the sine function, with the graph below the x-axis from 0 to 180 degrees. And I sadly put the well-written beautifully-produced book on the shelf and haven't looked at it since.

I can see why people would think "angles increase clockwise" is more natural than the existing convention ("angles increase counterclockwise") - it's the way clocks do it, right? Yeah, it's just convention, but when the established convention has been around for at least a century or two and there are libraries full of books and papers which use it, "more natural" still isn't good enough reason to break with it. And it really wasn't necessary to do that for their project.

I'm sure people who do software can think of lots of conventions which may even suck but will never be replaced.


The Godot game engine has its 2d graphics origin at the top-left of the screen, with the positive Y axis pointed down. Having clockwise trig functions is a natural consequence of that.

I think it's a fairly common setup for all 2D graphics software.
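A minimal illustration (Python, not Godot) of why that follows: with y pointing down, the standard counterclockwise rotation comes out looking clockwise on screen.

    import math

    def rotate(x, y, theta):
        # Standard "counterclockwise" rotation matrix.
        c, s = math.cos(theta), math.sin(theta)
        return (c * x - s * y, s * x + c * y)

    x, y = rotate(1.0, 0.0, math.pi / 2)   # rotate (1, 0) by +90 degrees
    print(round(x, 6), round(y, 6))        # (0.0, 1.0): y grows, which is *down* on screen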


I don't think I've ever seen a graphical system of any kind that didn't have 0,0 at the top-left corner of the monitor or viewport either, with positive x going right and positive y going down.

I actually didn't even think about it until now. Now it's going to bug me. God damnit. :V


Somewhat related, bitmap files (.bmp) store the image lines in bottom-to-top order, effectively putting (0,0) at the bottom-left.

It's the only image format I've ever seen that does that -- everyone else stores lines in top-to-bottom order, consistent with putting (0,0) at the top-left.


What do bitmap and excel have in common? :D

E.g. OpenGL normalized device coordinates have x=0,y=0 at screen center, with top left being x=-1,y=1. It's easier to reason with in many cases.

> I don't think I've ever seen a graphical system of any kind that didn't have 0,0 at the top-left corner of the monitor or viewport either, with positive x going right and positive y going down.

Is that because the electron beam in cathode ray tubes scanned from top left to bottom right?


OS/2's Presentation Manager had 0,0 at the bottom-left corner of the monitor.

I just checked and Adobe Illustrator does the same.

I think it’s interesting that they actually didn't change the rotation definition (from X+ toward Y+), but because it’s a visible change resulting from their inversion of the plane, people believe they did.


"more natural" is a good reason any time since otherwise you get to perpetuate these bad conventions for all the future generations to suffer.

What is "more natural" is a matter of opinion, and opinion will differ.

But even if there's a lot of agreement that an existing convention could stand improvement, that doesn't by itself make it "a good reason any time" for throwing out the existing convention.

What is a "convention"? It's something followed by a large "installed base". So changing a convention means a large cost will be incurred in changing up.

Who should decide whether the benefits of changing outweigh the costs? Someone has to pay for it, and simple fairness suggests that the people who will bear the costs of changing up should have the largest say.

The point is that just because someone thinks something new is better doesn't mean that old should be thrown out. And if you ignore that installed base, the change just doesn't happen.

We tried in the U.S. to switch over to metric years ago. Many of us think it would have made sense, but many more people didn't agree and it didn't happen.

It would be easier computationally if there were 100 degrees in a circle rather than 360. But the 360-installed-base is too large and the costs of changing are judged to be too great, so we're stuck with 360.

You're absolutely right, though, that suggestions for change should always get a fair hearing, and people who believe in them should go ahead and see if enough other people will sign on.


> I got to the chapter on trig functions and discovered that they'd decided to make angles increase clockwise.

That is an established mathematical convention, called "bearing". https://en.wikipedia.org/wiki/Bearing_(navigation)

> And there was a graph of the sine function, with the graph below the x-axis from 0 to 180 degrees.

But that definitely isn't a convention anywhere; bearing 0 has sine 1.

There isn't really one mathematical convention on "angles". There's a fairly strong one on angles that are named theta, but in a math class it's normal to orient phi in whatever way makes sense to you. As you trace a sphere, do you want phi to represent the angle between (1) the radius ending in your point and (2) the xy plane, as that angle varies from negative pi/2 to pi/2? Do you want it to represent the angle between (1) the radius ending in your point and (2) the positive z axis, as that angle varies from 0 to pi? That's your call. An increase in the angle just means it's getting wider; what direction that requires the angle to grow in depends on how you defined the angle and which of its sides is moving.


Regardless of relative directions around circles, the sine and cosine between 0° and 90° can be described unambiguously in terms of ratios between side lengths of right triangles. You could decide to define the functions differently, but then they'd no longer be the sine and cosine, they'd be something else. The whole point of having these two named functions is that they're a common ground, independent of whatever idiosyncratic angle measurements are useful for a given problem.

> Regardless of relative directions around circles, the sine and cosine between 0° and 90° can be described unambiguously in terms of ratios between side lengths of right triangles.

That's true, if there are no angles greater than 90° or less than 0°, as is the case in a non-pathological right triangle. In this case, as ratios of nonnegative lengths, all trig functions are always nonnegative.

If you want to include angles outside those bounds, then you care about what exactly occurs where, and while you can unambiguously define angles between 0 and 90 to have all positive trig functions, you can also unambiguously define them to have negative sines and tangents. Fundamentally what's happening is that you're defining certain line segments to have negative length instead of positive length. Which line segments should have negative length isn't a question about angles.

> You can decide to define the functions differently, but then they'd no longer be the sine and cosine, they'd be something else.

Only in a sense much stricter than what people generally use. Sine and cosine themselves are hard to distinguish - you can also call them sine (x) and sine (x - 270). Some people might argue that the sine of (x - 270) is still a sine.

> In general, the two functions can be described by their differential equations

If you do that, you'll completely lose the information about where sine is positive and where it's negative. You can apply any phase shift you want (as long as you apply it to both functions) and their differential equations will look exactly the same.


> If you want to include angles outside those bounds, then you care about what exactly occurs where, and while you can unambiguously define angles between 0 and 90 to have all positive trig functions, you can also unambiguously define them to have negative sines and tangents.

You could define trig functions differently, but then you'd need a separate pair of unnamed functions to express "the ratios of unsigned side lengths of a right triangle in terms of its unsigned interior angles". It's the same reason we don't count "-1 apple, -2 apples, -3 apples, ...". Or why horizontal and vertical lines usually fall on the x-axis and y-axis instead of the (1/√2,1/√2)-axis and (-1/√2,1/√2)-axis. We optimize for the common case.

> If you do that, you'll completely lose the information about where sine is positive and where it's negative. You can apply any phase shift you want (as long as you apply it to both functions) and their differential equations will look exactly the same.

What do you mean? "sin(0) = 0, cos(0) = 1, and for all x, sin'(x) = cos(x), cos'(x) = -sin(x)" is perfectly unambiguous. If you changed the initial conditions, you'd get another pair of functions, but then they'd no longer be the sine and cosine, they'd be some other linear combination. And for that, refer to what I said about the x-axis and y-axis: better to take the stupid simple (0,1) solution and build more complex ones from there.
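A crude numerical check of that claim, using nothing beyond the standard library (Euler steps, so only approximately right):

    import math

    # y' = z, z' = -y with y(0) = 0, z(0) = 1 should reproduce sin and cos.
    def integrate(t_end, n=200000):
        y, z = 0.0, 1.0
        h = t_end / n
        for _ in range(n):
            y, z = y + h * z, z - h * y
        return y, z

    y, z = integrate(1.0)
    print(y, math.sin(1.0))   # close to sin(1) ~ 0.8414
    print(z, math.cos(1.0))   # close to cos(1) ~ 0.5403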


> What do you mean? "sin(0) = 0, cos(0) = 1, and for all x, sin'(x) = cos(x), cos'(x) = -sin(x)" is perfectly unambiguous.

It's pretty straightforward. "sin(0) = 0" is not a differential equation. Any phase shift applied to sine and cosine will produce exactly the same set of differential equations that apply to sine and cosine; you can rename the shifted functions "sine" and "cosine" and you'll be fine.


> That is an established mathematical convention, called "bearing".

Bearing is a nautical convention not a mathematical one.

I have worked on boat computer systems and can assure you that all the angles were in radians going in the proper direction, while bearings were separate, always shown in degrees and clockwise.


As someone else noted, "bearing" isn't a mathematical convention. To be clear, when I say "mathematical convention" I mean what mathematicians do and teach. But I didn't know about "bearing" in navigation so thanks for mentioning that.

> There isn't really one mathematical convention on "angles".

There is for angles in the plane, which are the angles I was discussing. In every math course from trig where people first encounter angles in the plane they increase as you go counterclockwise. This is true in trig, precalc, calculus, ... You will not find a math textbook in which plane angles increase clockwise. I think that counts as a convention.

That convention determines the graph of the sine function, because sin theta is defined in trig courses as the y-coordinate of the point where the ray from the origin determining the angle intersects the unit circle. So (e.g.) if 45 degrees means 45 degrees clockwise, that ray is below the x-axis, and the y-coordinate of the intersection is negative -- and hence, sin 45 degrees would be negative.

If angles increase clockwise from the positive x-axis, then sin 45 degrees will be negative. And if the sine of 45 degrees is negative, then angles are increasing clockwise from the positive x-axis. And any mathematician would tell you that the sine of 45 degrees is 1/sqrt(2), not -1/sqrt(2).
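A quick numerical version of that point (Python, illustration only):

    import math

    # Counterclockwise convention: +45 degrees is above the x-axis, sine positive.
    # Measuring the same angle clockwise is -45 degrees, sine negative.
    print(math.sin(math.radians(45)))    #  0.7071...
    print(math.sin(math.radians(-45)))   # -0.7071...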

> ... in a math class it's normal to orient phi in whatever way makes sense to you.

You're correct that there are two prevailing conventions for the angle phi in spherical coordinates. Mathematicians measure phi downward from the positive z-axis, so it takes values from 0 to 180 degrees. (Actually, it's sort of like "bearing" that you mentioned.) Physicists measure phi upward from the x-y plane, so it can take values from -90 to 90 degrees. It does cause some confusion in teaching Calc 3, because students also taking a physics or astronomy course may be seeing two conventions for phi. However, in 3 dimensions (spherical coordinates) there's no natural "clockwise" or "counterclockwise".

But there is a convention for measuring phi in math classes -- it's the one I described above. Check any calculus book. Our colleagues in physics don't like it, but oh well. :-)


There's no regular engineering convention about the ^ exponentiation operator, because there is no exponentiation operator in regular or engineering math notation. The superscript used for exponentiation doesn't need a precedence level, it has a natural grouping behavior.

This is most visible in the fact that you never need parentheses around an exponent expression in math notation, but you need them a lot in programming notation. They are just different notations.

Consider in math notation:

   2+2
  3    + 5
Programming notation:

  3^(2+2)+5
Completely different notations in a much more fundamental way than how they treat unary minus.

It's insane to me that you'd throw away a whole language, toolchain, ecosystem, and community, just because you disagree with the caret operator not taking precedence over a unary minus. Do they not have parentheses on the keyboards where you're from?

The exponent operator is just short for the POWER function in Excel. So from that perspective it is the correct behaviour, I think, even if it is mathematically incorrect.

I recall reading that this was easier to implement in memory constrained machines running Excel back then, and now can't be changed for compatibility.

It has bitten me when I computed the pdf of a standard normal in Excel, invoking exp(-A1^2), say.

Someone made a website (in 2003, it's a bit out of date) tracking this issue:

http://www.macnauchtan.com/pub/precedence.html


But of course it can be changed, you can make it configurable even if you decide the default should continue to be this silly mistake "for compatibility"

There's no "mistake", the current way makes much more sense in programming notation. It's far better that -3^2 is consistent with -3+2. The caret notation for exponentiation is anyway completely different, and works very differently in terms of precedence from using super scripts in math.

That's a mistake in every statement: Excel isn't programming, so programming notation isn't even relevant

There is also no difference in the caret notation vs superscript, its upward-pointing form is literally meant to signify SUPERscript

It's far better that

0-3^2

-3^2

are consistent; consistency between exponent and addition makes little sense since by universal convention they have different priorities, so you wouldn't expect any "consistency" there. Also your -3+2 example is meaningless since its output is the same as

-((+3)+2)

so there is no inconsistency with

-9

And no, ^ doesn't universally work differently vs superscript, just in some poorly designed apps


Excel formulae are absolutely a form of computation, or programming.

While the caret is meant to symbolize superscript, it is nevertheless a completely different notation for exponentiation.

I don't see why 0-3^2 and -3^2 need to be consistent necessarily. Sign change and subtraction are different operations, so they can have different relationships with other operators.

If + worked like you want ^ to work, then -3+2 would equal -5, instead of the more common -1.

^ does work differently from superscript in all apps. The way you write "three to the power two plus two" is completely different.


Abacus is a form of computation, doesn't make it programming.

> While the caret is meant to symbolize superscript, it is nevertheless a completely different notation for exponentiation.

Wait, do you believe slash / to be a completely different notation with different rules for division?

Also, it's not completely different, I've already explained that its form points to the same participle - RAISing base to the power, exactly the same as superscript. It's just that input/typesetting on computers is very primitive, so you can't really use superscript conveniently, otherwise it's semantically the same, so having different rules for the same meaning makes no sense

> I don't see why 0-3^2 and -3^2 need to be consistent necessarily

ok, if you fail to see this basic similarity but somehow think -3+2 is identical, I don't have anything else to say here

> If + worked like you want ^ to work, then -3+2 would equal -5, instead of the more common -1.

Why would I ever want addition to work the same as exponentiation??? That's your weird wish for them to behave the same, I respect the math precedence of operators.

> ^ does work differently from superscript in all apps

that's not true, https://www.wolframalpha.com/input?i=-3%5E2

Many calculators / calculator apps also behave the same


In Wolfram alpha as well, ^ is a different notation than superscripts: https://www.wolframalpha.com/input?i=-3%5E2%2B2

And of course / is completely different from fractions too. Math notation is two-dimensional, and requires relatively few parentheses. Computer notation is uni-dimensional and requires parentheses all over the place.

This is what math notation looks like, try to write this in C/Excel/Wolfram Alpha without parens:

      2 + 2
    -3      + 4
   _____________ = 17
      2 + 3

> try to write this ... without parens:

Why? This extra condition doesn't help you, which is why your link shows nothing: it behaves exactly as I'd expect. ^ is identical to superscript; you're just making the implicit mistake of thinking +2 is somehow covered by ^ and would be part of the superscript, but it wouldn't be. That's a different source of ambiguity

What would help is an example where parens aren't needed, but nonetheless slash would mean something else vs horizontal line, like in the original example

That's how you show semantic "completely different"


The math notation doesn't need parens. The computer notation needs parens. This alone is simple obvious proof that they are different notations, at least in my understanding of what a notation is.

-3^2 in the math notation doesn't need parens only because of operator precedence.

The fact that the computer ^ requires parens in more cases, like -3^(2+2), is irrelevant here and doesn't let you justify different precedence rules (and your downgrading from "completely different" to "different" isn't a proof, just "tautology". Hey, they also look different, so they are different!)


I don't have a horse in this seemingly extremely important race, but Excel absolutely is programming

It is configurable - with brackets.

You confuse configuration with data entry

But it's 9!

Check it: https://www.mathplanet.com/education/pre-algebra/explore-and....

"You also have to pay attention to the signs when you multiply and divide. There are two simple rules to remember: When you multiply a negative number by a positive number then the product is always negative. When you multiply two negative numbers or two positive numbers then the product is always positive."

So basically you have -3x-3 and result is 9.


No - in a subtle way, you're assuming that the unary minus has precedence, when the point is that it doesn't.

You're right that "thing^2" means "thing times thing", but in "-3^2", what is it that is being squared? To write it, as you did, as "(-3) x (-3)", assumes that in "-3^2" the thing being squared is "-3". But that in turn assumes that the unary minus is done before the square. By the standard mathematical convention, in "-3^2" the thing being squared is "3". So you do "3 * 3", then you negate the result and get "-9".


Interesting. In our high school math class, when -3^2 was handwritten it was OK to interpret it as (-3)^2. But maybe that was because spacing was used as informal parentheses.

Leaving "-3²" to be "interpreted" according to context is not a good idea. Mathematical notation should be unambiguous and context-free as much as possible, not "do what I mean". A computation involving "-3²" will be right or wrong depending on what it means; that shouldn't depend on who is reading it or under what circumstances.

Not to unduly slight your teacher, but it could be they weren't sure about what "-3²" means. Everyone who teaches has gaps in their knowledge -- I sure did. :-)


Precedence rules are mostly a formalization of the underlying basic convention that the operations whose terms are written closer together are done first.

Mostly to make polynomials easy.

But if we’re doing math mostly on computers, we should adopt rules that make writing on computers easy — not pedantically insist typing code follow the rules of handwriting polynomials.


It was the same for me. The question seems to be whether "-3" here represents the operation "negate" or the number "negative 3". If it's the latter, then -3^2 is (-3)^2.

Thanks. I had no idea. Seems counterintuitive at first, but it makes sense, since ’-3’ is '0 - 3'.

This is a good way to think of it. It would be very confusing if “0 - 3^2” and “0 + (-3^2)” should mean different things.

But what about (0-3)^2?

You said that “-3” = “0-3”.

So we have “-3^2” is “(0-3)^2” is 9. Agreeing with -3^2 = 9.

You’re performing a sleight of hand when you define “-3” to be “0-3”, but move the parenthesis to get your second equation. You have to insert your definition as a single term inside parenthesis — you can’t simply remove them to change association (as you have done). That’s against the rules.

So if you think “-3” is “0-3”, then you should agree the answer is 9.


> (0-3)^2?

It’s entirely unambiguous due to the parenthesis.

I don’t think the rest of your argument actually makes sense.

There is no sleight of hand required. The original argument is entirely related to having unary minus and binary minus which are different operators conceptually have similar precedence as being less surprising.


My point is that you can’t define “-3” as “0-3” to make it work: you’re assuming exactly what’s being debated (via sleight of hand) when you insert the terms with brackets in the way you do rather than the way I do.

When you try to swap in the unary operator without that to make it “less surprising”, you get 9.

Precisely what you said was wrong about the unary operator (in Excel).


But no one is defining -3 as (0-3). You are entirely missing the point. I am going to quote myself again:

> The original argument is entirely related to having unary minus and binary minus which are different operators conceptually have similar precedence as being less surprising.

And no, you don’t get 9 when you swap the unary operator. That’s the whole point and why it’s surprising that Excel did reverse the precedence for implementation easiness.


> You said that “-3” = “0-3”.

No, I didn’t.


Isn't this treating 3 as something that can be analyzed out of -3? Like -3 is the union of 3 and the (-), and not something unto itself. Ought it to be the case that negative is more than just a sign? Like, I think 3 should be ontologically distinct from -3, even if the addition of 3 and -3 = 0. Idk, I don't have a reason for this, it feels right for some reason.

You are mistaking the notation for the object.

There are indeed two ontologically different elements, 3 and (-3), in Z. The question, however, is purely about the meaning of the representation -3^2, which is ambiguous without precedence rules.

Note that it gets more complicated quickly if you want to keep thinking about it that way: mathematicians often consider ontologically different but equivalent operations as the same when it's irrelevant to what they are doing or the results trivially extend to both cases. See for example 3-3 and 3+(-3).


I mean, all equations are true if all terms are set to equal zero, so the term itself must have some sort of ontological distinction.

-3^2 is not a standard mathematical convention. -3² is, but that is completely different in many other ways.

https://www.wolframalpha.com/input?i=-3%5E2

https://en.wikipedia.org/wiki/Order_of_operations

Parentheses, Exponentiation, Multiplication, Division, Addition, Subtraction

-3^2 would then be correctly parsed as -(3^2) which is -9.

Parsing it as (-3)^2 would require the addition of parentheses.

This gets to the special case of the unary minus sign... which the Wikipedia article specifically calls out.

    Special cases

    Unary minus sign

    There are differing conventions concerning the unary operation '−' (usually pronounced "minus"). In written or printed mathematics, the expression −3² is interpreted to mean −(3²) = −9.

    In some applications and programming languages, notably Microsoft Excel, PlanMaker (and other spreadsheet applications) and the programming language bc, unary operations have a higher priority than binary operations, that is, the unary minus has higher precedence than exponentiation, so in those languages −3² will be interpreted as (−3)² = 9. This does not apply to the binary minus operation '−'; for example in Microsoft Excel while the formulas =-2^2, =-(2)^2 and =0+-2^2 return 4, the formulas =0-2^2 and =-(2^2) return −4.
(edit)

Digging into this a little bit more...

https://www.gnu.org/software/bc/manual/html_mono/bc.html#TOC...

    The expression precedence is as follows: (lowest to highest)

    || operator, left associative
    && operator, left associative
    ! operator, nonassociative
    Relational operators, left associative
    Assignment operator, right associative
    + and - operators, left associative
    *, / and % operators, left associative
    ^ operator, right associative
    unary - operator, nonassociative
    ++ and -- operators, nonassociative

    This precedence was chosen so that POSIX compliant bc programs will run correctly. This will cause the use of the relational and logical operators to have some unusual behavior when used with assignment expressions. Consider the expression:
    ...
This brings us to the POSIX specification for bc https://pubs.opengroup.org/onlinepubs/9699919799.2018edition...

This also shows the unary - having higher precedence than ^.

https://github.com/gavinhoward/bc/blob/master/manuals/develo...

    This document is meant for the day when I (Gavin D. Howard) get hit by a bus. In other words, it's meant to make the bus factor a non-issue.

    This document is supposed to contain all of the knowledge necessary to develop bc and dc.

    In addition, this document is meant to add to the oral tradition of software engineering, as described by Bryan Cantrill.
... now, it would be interesting if gavinhoward could clarify some of the design thoughts there (and I absolutely love the oral tradition talk).

Precedence is a property of notation and operators, not a property of operations. The ^ notation for exponentiation is fundamentally different from the common superscript notation, and so it is only natural that it has different precedence rules. The biggest difference is in how exponent expressions are handled, of course: in math notation, the entire expression is written in superscript and no parentheses are needed; in computer notation, you always need parentheses around the exponent expression.

Gavin Howard here.

There really isn't a design choice to be made. POSIX requires unary negation to have higher precedence.

The only precedence change (I can remember) from GNU bc is that I changed the not operator to have the same precedence as negation. This was so all unary operators had the same precedence, which leads to more predictable parsing and behavior.


The question that I'm curious about is "why does POSIX require unary negation to have a higher precedence?"

Was it a "this is the way that bc worked in the 70s because it was easier to write a parser for it?" or was there some more underlying reason for the "this problem gets really icky if unary negation has lower precedence than the binary operators and makes for other expressions that become less reasonable?"

It's like the Logical XOR issue ( https://youtu.be/4PaWFYm0kEw?t=2236&si=Wi0gwV-XctLGN98I ) ... and I'm of the opinion that there's a real reason why this design choice was made.

(Aside: Some other historical "why things work that way" touching on dc's place in history: Ken Thompson interviewed by Brian Kernighan at VCF East 2019 https://youtu.be/EY6q5dv_B-o?si=YKr4j_FAEp-OihiX&t=1784 - it goes on to pipes and dc makes an appearance there again)


Unfortunately, I can't really say much about historical design.

Fortunately these rules are natural and easy to remember, especially the details of how different languages choose different orders, relative to the confusing and abhorrent prefix notation (- (^ 3 2)) or (^ (- 3) 2).

As a lover of prefix and postfix notation, there is an unambiguous parsing of each of those that does not depend on any order of operations. Neither lisp nor forth have the question at all - you can only write it exactly as you mean it.

    (- (expt 3 2))
is always -9 without needing to ask which has higher precedence.

    (expt -3 2)
is likewise always 9. There is no question if - is a binary or unary operator in prefix notation and what its order of operation should be.

Likewise, in dc

    3 _ 2 ^ p
    _3 2 ^ p
and

    3 2 ^ _ p
where '_' is the negation operator (it can be used for writing -3 directly as _3, but _ 3 is a parse error) return their results without any question of order of operations.

When you start touching infix, you get into https://en.wikipedia.org/wiki/Shunting_yard_algorithm which was not a fun part of my compiler class.

(And yes, I do recognize your credentials ... I still think that lisp and forth (above examples for dc) are better notational systems for working with computers even if it takes a bit of head wrapping for humans).


Notation is a tool of thought and of communication. The clearest example of infix botching both is probably perl style regex.

Postfix is interesting in forth. It makes the stack manipulations very easy to reason about, and the stack is very important there so this looks like a win. The cost is in coherently expressing complex functions, hence the advice to keep words simple. The forth programmers are doing register allocation interwoven with the domain logic.

Lisp makes semantics very easy to write down and obfuscates the memory management implied. No thought goes on register allocation but neither can you easily talk about it.

Discarding the lever of syntactic representation is helpful for communication and obstructive to cognition. See also macros.


Maybe something's wrong with my terminal. In dc:

    3 _ 2 ^ p   gives 0

    _3 2 ^ p    gives 9

    3 2 ^ _ p   gives 0

    5 _ p       gives 0

    _5 p        gives -5
You didn't intend that I should get those zeros, right?

     ~ % dc -v
     dc 6.5.0
     Copyright (c) 2018-2023 Gavin D. Howard and contributors
     Report bugs at: https://git.gavinhoward.com/gavin/bc

     This is free software with ABSOLUTELY NO WARRANTY.
     ~ % dc   
     3 _ 2 ^ p
     9
     _3 2 ^ p
     9
     3 2 ^ _ p
     -9
     5 _ p
     -5
     _5 p
     -5
     (control-D)
The version that I have appears to have _ parsed as an operator in addition to the negation of a numeric constant.

I am the author of that dc.

You are correct about its behavior.

See https://git.gavinhoward.com/gavin/bc/src/branch/master/manua... (scroll down to the underscore command).


I'm using the version from Debian testing:

    dc -V
    dc (GNU bc 1.07.1) 1.4.1
In fact, in my version "-v" as opposed to "-V" isn't recognized as a valid option.

    3 _ 2 ^ p
    0

I am Gavin Howard, the author of the other dc.

My dc does have a few differences from the GNU dc. I added the extension of using _ as a negative sign.

That is why you are both seeing behavior differences.


Thanks for looking all this up, particularly the stuff about spreadsheets. I wonder what real early spreadsheets like Lotus 1-2-3 or even Visicalc would do with "-3^2". I have to feel one of the early spreadsheet programmers just made a mistake, because I can't think of a practical reason for going against the mathematical convention if they were already aware of it. Or maybe it comes from bc? ... does bc predate modern spreadsheets?

(Well, I just tried "=-3^2" in an org-mode table and it gives "-9".)


You've got to go even earlier... which is where the bc link becomes interesting. It's from 1975.

Lotus 1-2-3 dates from 1983... I can't find a copy of it that is runnable.

VisiCalc would be a good one to look at at 1979. It also presents 9 https://archive.org/details/VisiCalc_1979_SoftwareArts

You've also got sc https://en.wikipedia.org/wiki/Sc_(spreadsheet_calculator) from 1981.

    docker run -it ubuntu:latest
    # apt-get update
    # apt-get install sc
    # sc
    = -3^2
And you'll see 9.00 (screen shots of those two https://imgur.com/a/L0ZvJlP and the one from VisiCalc )

This is the way it's worked for a long time.

---

(edit / further thoughts)

I believe that the underlying issue is that unary - (negation) and binary - (subtraction) use the same operator and you need the unary one to have a very high precedence to avoid other problems from happening.

Consider the expression: 2^-2

Is that 0.25 or a parse error?


Unary and binary - can be given different precedence, because there is never ambiguity as to whether you're in front of a unary minus or a binary minus. A binary operator is never encountered at the beginning, or after a parenthesis, or after another binary operator, therefore those are the cases where you'd have a unary - or +.

Rather, the problem is whether -2 is parsed as a numeric literal, or a unary minus followed by a numeric literal (which would only include positive numbers).
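The usual disambiguation rule is purely positional; a little Python sketch of it (the token names here are made up for illustration):

    UNARY_CONTEXT = {"+", "-", "*", "/", "^", "(", None}

    def classify_minus(prev_token):
        # '-' is unary at the start of an expression (prev_token is None),
        # after '(' or after another operator; otherwise it is subtraction.
        return "unary" if prev_token in UNARY_CONTEXT else "binary"

    print(classify_minus(None))   # start of expression -> unary
    print(classify_minus("^"))    # as in 2^-2          -> unary
    print(classify_minus("3"))    # as in 3-2           -> binary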


Why would the expression 2^-2 pose a problem? There is no ambiguity there that requires operator precedence to resolve.

Correct, but irrelevant to the question. The usual rules of math require it to be parsed like -(3²), so there are only positive numbers being multiplied.

How we write negative literals in expressions is the issue. Nobody would seriously think -x^2 = -4 has no real solutions. But if we swap out x for a literal number people start interpreting it ambiguously as maybe a negative number exponentiated, but fundamentally it's the same syntax as before.

The problem with Excel is that it tries to do almost everything in one software tool: front-end (editing, presentation), analysis, semantic modeling, data storage/database.

Almost all reasonable engineers see that there is something wrong with such an approach. But almost all everyday computer users think that this is the way computing has to be.

Sometimes I wonder why even I voluntarily open it for certain tasks - anyway, despite all the criticism, Excel has reached the Lindy[1] threshold for me and is here to stay.

[1] https://en.wikipedia.org/wiki/Lindy_effect


The biggest problem is that the development cost for a solution 'a bit better' than a complex Excel spreadsheet is often stratospheric.

Many tech start-ups are 'replace this thing people do in Excel with a purpose built tool'


Sometimes people need to be saved from themselves. E.g. spreadsheets can accumulate mistakes very quickly when people treat them as databases and start copying columns between spreadsheets, assuming that the primary keys (e.g. in one column) are identical while they happily add and move rows around.

To be fair, there are no good, accessible database tools around for your average non programmer user.

Access tried to be this a decade ago, until MS started to let it die. So now, your only option is basically Excel. There's a reason it's the main thing people gravitate into.


For really simple databases, I've had limited success getting people to use a UI in front of SQLite:

https://sqlitebrowser.org/


> Access tried to be this a decade ago

three decades ago :)


Claris FileMaker was that tool. Easy to understand and provided clear separation between data and the rest. Bundled with Mac OS so everyone used it. It is now an expensive enclave.

My problem with excel is that I love it too much, even though I know that I maybe shouldn’t. It hits some perfect point between structured and unstructured data that is conducive to just filling in cells and seeing what happens.

I say “I shouldn’t” because the off-ramp from a working solution to a proper productized code-based approach can be very painful.


Spreadsheet are wonderful and powerful, Excel is ass. Love the concept, hate the implementation.

Unfortunately, it’s also industry standard ass.

Excel can also send mail through VBA scripts, thus satisfying Zawinski's Law[1].

[1] http://www.catb.org/jargon/html/Z/Zawinskis-Law.html


Except that it's proprietary and will eventually be un-maintained and stop working. But spreadsheets, in general, fall under the Lindy effect and open source software will continue it for centuries to come.

Although excel spreadsheets have had more longevity than lots of other formats…

You might have had a point in the days of .xls, but Excel by default uses .xlsx now which is just XML packaged inside a .zip archive if I recall.

And yes, Excel still fully supports .xls too.

I fear whatever format LibreOffice uses will die first, case in point I don't even remember what it's called even though I should as a computer nerd.


MS-OOXML is barely an open format. Have you tried implementing it? ECMA-376 part 1 is over 5000 pages, and there are four parts to it! (Part 1 contains an extra bit about SpreadsheetML, but by that point we were two zips deep. I turned back ere I got lost.)

The OpenDocument formats, meanwhile, are older, simpler and better than their MS-OOXML equivalents. (The ODF spec is 1041 pages altogether – 215 pages of that are the spreadsheet formula language.) LibreOffice's implementation is a little janky, sure, but I can edit OpenDocument files by hand. Try doing that to a MS-OOXML file. (Good luck.)


It's easier for me to imagine a world in which AI makes spreadsheets invisible to the modern person than it is to imagine a world in which Excel isn't the de facto spreadsheet.

We have a saying in German: A bad worker always blames their tools.

Excel has many quirks, but I'm still very grateful that it exists, for quickly putting together some numbers and still being able to change the inputs to my formulas.


But at the same time, there is a saying in user experience design: don't blame the user.

At least at the time of the article, there was no way to disable the auto-conversion of certain strings (like "SEPT2") into dates. A setting to disable this would have prevented many of the errors made by researchers working late at night or rushing to meet a deadline.

It's true that there has to be some point where the users of the tool need to put in the effort to learn how to best use it. But effort poured in from the other end by the developers, too, can go a long way to prevent common errors and save users time.


A poor worker blames their tools, but no-one can write the Bible on a stamp with a carrot...

Again, this saying has a nuance people often miss.

You don’t blame your tools.

Not all tools, that is. Not tools forced on you by some archaic industry standard or habit. Not stupid tools you'd never use otherwise but have no choice about.

> Excel has many quirks, but I'm still very grateful that it exists, for quickly putting together some numbers and still being able to change the inputs to my formulas.

That's nice, but Excel didn't invent spreadsheets. It invented adding BS to them, and if it didn't exist, you'd still have some WhateverCalc successor available at the moment.


I think what you mean here is that you are grateful that spreadsheet applications exist.

https://simple.wikipedia.org/wiki/Spreadsheet


In Polish we have: a bad ballerina is hindered by the hem of her skirt

In Arabic, “the one who can’t dance says the floor is crooked”

A bad dancer finds even his ballsack annoying.

That's interesting. In English, the idiom is: "It's a poor artist who blames his tools."

As a native English speaker, I've actually only heard 'a bad workman blames his tools', not the artist form. Now that I google it, did you mean craftsman? I don't see any references to artist.

But why oh why can't you disable automatic date reading? Put something in settings with a checkbox so that I, the user, can tell Excel to F off with that nonsense.

Is there anybody who can argue the 'for' case for having this on all the time without recourse?


Introducing checkboxes somewhere in the bowels of settings adds more ways for things to go wrong.

What is the default? Do the defaults differ across versions? How do you keep it consistent across computers and installations? What if you actually need the function ad hoc?

This reminds me of CSV export. I haven't used Windows for a decade but I remember that if you wanted to change how decimal numbers were exported you had to change the locale and reboot the computer. To change a setting in Excel. That is insane. Sprinkling checkbox patches isn't too far from this.


> What is the default?

The old behavior

> Do the defaults differ across versions?

No

> How do you keep it consistent across computers and installations?

It should be per-cell, so it's document specific.

> What if you actually need the function ad hoc?

Every option in the formatting pane is ad-hoc, this wouldn't be any different.


Aside from setting all your cells to text, they have added checkboxes for this now.

https://insider.microsoft365.com/en-us/blog/control-data-con...


I just tried this out. I'm using Office 2021, and when I first opened Excel, this setting was there just like in the link, and it seemed to work when I entered a few strings. 'Fantastic!' I thought.

Then I closed it and thought of a few other date-like strings to try, and this time the option had disappeared! Every date-looking string was instantly turned into a date! I tried a few more times and the setting is gone. WTF is that about?


Have you opened an older file? Maybe it only works with files created with the feature. I’ve no idea why that would be the case, but it’s all I can think of.

Both times (and a few times after that), I was going into Excel's settings before opening any file or starting a new one. I've also tried after opening files, same thing. The setting is just not there like it was the first time.

I had a similar issue just two weeks ago at $DAYJOB. I was scraping a log file and I had written an awk script to convert the HH:MM:SS.uuuuuu timestamps to nanoseconds since epoch for ingestion by another tool. Little did I know that awk uses floating point for numbers internally, and so my conversion function was incorrectly rounding the result, which messed up my later analysis in arcane ways. To add insult to injury, I had the same problem again when paging through the data in Visidata, but there was an easy workaround I don't exactly remember.
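
The root cause is that a 64-bit float has only a 53-bit mantissa, while nanoseconds since the epoch are around 1.7e18 these days, so float arithmetic has to round them. A rough Python illustration (the timestamp is made up):

    ns = 1_700_000_000_123_456_789   # made-up nanoseconds-since-epoch value
    print(int(float(ns)) == ns)      # False: beyond 2**53, doubles lose the low digits
    print(int(float(ns)) - ns)       # the rounding error, in nanoseconds

    # Doing the conversion in integer arithmetic keeps it exact,
    # e.g. for a "13:45:07.123456" timestamp:
    h, m, s, us = 13, 45, 7, 123456
    print(((h * 3600 + m * 60 + s) * 1_000_000 + us) * 1_000)   # exact ns since midnight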

At work we have had problems multiple times with Excel changing the values of numerical entries due to different locale formatting, often causing hours of lost time until the culprit (Excel) was found. Other types of values that are completely inconvenient for Excel include social security numbers and phone numbers, which one would assume are more general purpose, and thus closer to Excel's standard use cases, than genes. Still, Excel messes these up. Opening a CSV file is also a non-trivial, overly complicated endeavour for Excel in the year 2024, when it should be a double click (somehow other spreadsheet programs manage to handle CSV files much more easily). The worst part is that Excel is usually used by non-technical people, who misunderstand these idiosyncrasies, and time is wasted over and over.

My work frustration with Excel is not so much Excel itself, but that people will destroy / delete / not understand formulas (or think they're improving them…) and then the end result of their edits is garbage. However, this is usually my fault for not using the protect / lock features well enough. I do know that notes and explanations get overlooked. I've mitigated this slightly by almost always keeping, in our shared folder, a blank starting copy that's just a backup to start from, alongside the actual one to be used.

Here's another interesting article on the same topic (2016) entitled "Gene name errors are widespread in the scientific literature" : https://genomebiology.biomedcentral.com/articles/10.1186/s13...

From the article:

"The problem of Excel software inadvertently converting gene symbols to dates and floating-point numbers was originally described in 2004. For example, gene symbols such as SEPT2 (Septin 2) and MARCH1 [Membrane-Associated Ring Finger (C3HC4) 1, E3 Ubiquitin Protein Ligase] are converted by default to ‘2-Sep’ and ‘1-Mar’, respectively."


I studied bioinformatics in university back in 2005, and the first subject we were trained on was not Python or Java, but using Excel. Our teachers explicitly told us about excel's conversions to dates and other pitfalls. I find it mind-boggling that some researchers still made these mistakes 10 years later.

Excel is a wonderful tool. But you need to learn your tools and find out about possible footguns.


Haranguing your users not to make mistakes is utterly worthless. The only way to move the needle is to design systems that are easier to use correctly.

Excel is not a genetics tool. It has millions of use cases, many more important than genetics. Excel's designers didn't have this one in mind.

It's like people complaining because sugar gets misused. Or that murderers stab people with knives. The solution isn't to "fix" knives.


It's not like you have to be in genetics for excel to bite you. I once had a “business” column in a CSV with one cell set to July 11. Why? Because the business was 7/11. (I assume in a different locale it would've become November 7?)

The simple solution is to do what every CSV → DataFrame library does, which is to ensure columns have a homogeneous type. In this case a single non-date entry in a column would be enough to treat the whole column as strings.
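
pandas, for example, types whole columns rather than individual cells, and won't guess at dates at all unless asked; a minimal sketch with made-up data:

    import io
    import pandas as pd

    csv = "business,revenue\n7/11,100\nAcme Corp,250\n"
    df = pd.read_csv(io.StringIO(csv))

    print(df["business"].dtype)      # object: the whole column stays text
    print(df["business"].tolist())   # ['7/11', 'Acme Corp'], nothing becomes a date
    print(df["revenue"].dtype)       # int64: a uniformly numeric column is typed as numbers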


Excel is a tool for general-purpose data processing. What the genetics people are doing is exactly that.

General purpose means you're trying to meet general needs, not every specialised purpose.

Also, banning knives in certain places, like clubs, is a somewhat workable solution, so people don't become drunk murderers in the first place.

This is an awful mindset, and I'm tired of people acting like tools only ever have a pre-defined purpose and you need a special-purpose tool to do anything. Also, that's a dumb analogy. Here's a better one: a knife that breaks 20% of the time when you try to cut pork, because you didn't buy a separate pork-cutting knife.

I'm not doing anything nearly as special and always have dates import as numbers for whatever reason. Thanks microsoft.


How about a knife that breaks 0.00001% of the time because it's used a billion times a day?

If excel broke 20% of the time, I’d agree. But it rarely breaks. It’s just widely used.

I’ve used Excel for decades. I just set the data types on my columns. The reason Excel does that is because the vast majority of people like it and rely on it. And changing it now will break millions of workflows.

People assume their workflow is super important and worthy of software making special exceptions just for them. There’s an easy solution that people can follow now. Let’s focus on that rather than introducing a “fix” that breaks it for other people.

The Excel team has thought about this, and there's no simple fix. Nobody is forced to use Excel.


What are you talking about? You responded to the wrong person because this has nothing to do with what I said.

> Also that's a dumb analogy. Here's a better one: A knife that breaks 20% of the time when you try to cut pork, because you didn't buy a separate pork-cutting knife.

Excel doesn’t break 20% of the time. It rarely breaks. I think you’re assuming that genetics is more worthwhile than the millions of other uses. Think about how widely it’s used and your analogy doesn’t work very well.


Perhaps the first time you start Excel there could be a short setup tutorial that sets the default parameters you want for every spreadsheet you create.

I don’t think it would have much of an impact. And I hate those forced wizards and tutorials as I use excel in lots of environments.

I remember the Excel team writing about why they didn't add advanced settings to turn it off. I don't remember the rationale, but I'd rather have some switches I can set for the situations where I don't want it.

Although I do want it on, and I just check my data types. And for the most part I solved this by opening and never editing in Excel. The one hack I've gotten coworkers to stick with is "don't click save" when opening large files in Excel.


This isn't Microsoft lecturing them.

It's entirely valid for the school to tell students to avoid easily avoidable pitfalls.


Of course they can lecture them — it just isn't going to work. The error rates will barely budge.

There is one effect: it allows smug gloating about how stupid, lazy, and irresponsible these users are.

Blaming the user is the last refuge of the incompetent.


> Blaming the user is the last refuge of the incompetent.

Why then is it the dominating mindset in software design today, and advertised as being the opposite to the mindset that gives you Excel?


> The only way to move the needle is to design systems that are easier to use correctly.

If you do it like most software vendors do, by simplifying and removing functionality, you're moving the needle in the wrong direction.


Depends I think.

Ideally, the UI shows only those things that are relevant in the current context.

And the context of beginners is very small, so they don't need to see advanced tools they will never use anyway. But it is certainly not right to also remove the tools that advanced users do need.

It is possible, though, to make UIs that can be customized.


Do you think the comment is written by Microsoft??? The sloppy researchers are not his users. Another way to move the needle is to place higher value on technical competence.


One wonders how much better the world would be if these scientists took a week to learn enough Python and pandas.

(Certainly possible, I teach "I don't want to be a programmer"-types all the time. Taught a class this week in fact.)


I work with this type of scientist, supporting them in an IT capacity. Many of them do try to learn Python, and I've been quite impressed by it, but realistically Excel isn't going away any time soon regardless. Sometimes it's the right tool for the job.

Missing checkbox. "Assume Everything Is A String"

We used to hit all kinds of Excel weirdness with inventory etc.

It was our fault that our part numbers could look like this:

00010190-95.020


That feature already exists. It's technically not a "checkbox", but you can set a cell (or all of them) as string type in like 3 clicks.

The problem was that you could not import the data without it being destroyed. By the time you could see it in Excel, the conversion had already happened, with no way to get the original text back.

Still, an easy footgun to stumble into if you weren't mindful when setting up your spreadsheet

I know that the "anything goes" nature of Excel is part of what makes it such a universally reached-for tool, but every time I see something like this, I think that there would be a clear use case for a more restrictive spreadsheet.

9/10 spreadsheets are tables, but because they're spreadsheets they inherit the behaviour of "no consistent behaviour in columns, everything is independent and different".


You can set the format of the column when making a table. Much more than that and you've ended up at Access. The problem here with Excel is its wonky default until you've made that explicit choice.

Even when this paper was published, errors due to Excel-mangled gene names had been well known for at least a decade.

Somewhat recently, some of the more error prone genes were renamed to accommodate Excel. (Ex: SEPT7 -> SEPTIN7)


Only one in five? Evidence of egregious overengineering by the Excel team.

This seems to be a problem quite similar to dynamic versus static typing.

The users expect that Excel will figure out what to do with the input correctly.

But you can manually tell Excel what to do with data in a column.


I use Excel a fair bit at work, mostly for looking at data, but also to fairly easily generate SQL statements based on such data, often after some formulas have been applied.

Though I think what I'd really want is some tool which has the same grid-like visualization, filtering and direct data entry, but is code-based under the hood and without magic conversions.

So, take Excel, and when you enter a formula in a cell, it actually writes a line of code for you, which you can inspect and edit. Including adding your own functions and such.

When importing delimited text data, you'd have to specify what the data is in each column. It should still save the original text data so you can change your mind, but yeah, no automagic stuff.


The Data Wrangler extension for VS Code might be of interest. As you apply operations to your dataset, it generates Pandas code.

https://code.visualstudio.com/docs/datascience/data-wrangler

https://www.youtube.com/watch?v=5tWJVLF6PuA
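
For a sense of the workflow, the generated code is ordinary pandas. This is only a sketch of the general shape such a tool emits, not Data Wrangler's actual output; the data and column names are made up.

    import io
    import pandas as pd

    raw = "sample_id,conc,notes\n0007,1.25,ok\n0042,0.80,check\n"

    # Load with an explicit string dtype so IDs keep their leading zeros,
    # then drop, rename and retype columns as separate, inspectable steps.
    df = pd.read_csv(io.StringIO(raw), dtype={"sample_id": str})
    df = df.drop(columns=["notes"])
    df = df.rename(columns={"conc": "concentration"})
    df["concentration"] = df["concentration"].astype(float)
    print(df)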


Yes, something like this has been on my mind for quite some years as well.

It would be limited to programmers, though.



Excel will convert many CUSIPs (strings) into numbers (scientific notation). For a relatively comprehensive list, see below.

https://www.empirasign.com/cusip-excel-rosetta/

Barring some types of corporate actions, CUSIP numbers cannot change, and I doubt the ABA is aware of this issue.
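
The failure mode is that an all-digit identifier with an embedded "E" parses cleanly as scientific notation, so any tool that eagerly converts numeric-looking text will silently rewrite it. A tiny illustration (the identifier below is made up, merely CUSIP-shaped):

    cusip_like = "123456E78"   # 9 characters, embedded "E", all the rest digits
    print(float(cusip_like))   # 1.23456e+83: the original identifier is gone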


So, the question is: Would there be fewer or more errors if something other than Excel was used? Has much rigorous research been done on this?

One should probably also ask how much research would not have gotten done, or what other issues would crop up, if less tech-savvy researchers didn't have a beginner-friendly tool like Excel available.

Unfortunately researchers can't fork the universe with a different set of defaults for the rigorous study you appear to want.

Fixing the headline...

One in five genetics papers have errors caused by mistakes in the use of Excel

As if longhand calculations never have errors?


The difference here is that the excel errors are systematic so should be more easily avoidable, whereas long hand calcs are more varied and harder to correct.

I use Excel a lot. These things don't trip me up in my field. Could it be that this field is a rather niche use case, and therefore the users should learn to watch out for them?

I always felt that there is a hierarchy for working with numbers that goes something like this

Ease of use:

1. Excel

2. SQL

3. Functional programming (e.g. Scala, Python to some measure e.g. Pandas)

4. Imperative programming (C/C++/Java)

But then there's another hierarchy that (roughly) goes in the other direction, which is about quality, repeatability and tooling.

If you are at 1 or 2, your responsibility will not be about writing tests and verifying your code using traditional engineering methods.

However! You are responsible for cross-checking your results against the input. This may be a manual process. But actually looking at the numbers from several different angles can give higher quality than writing contrived test cases (in 3 or 4).


2 s,SQL,j

https://www.jsoftware.com/indexno.html

Also:

https://www.jsoftware.com/help/dictionary/intro.htm

EDIT: The help section has 6 books. If you want, you can teach yourself advanced math stuff with very few lines. I suggest installing Gnuplot as a dependency, for plots.


Very good! I believe you.

Unfortunately I feel like it would be irresponsible to transition our stack to working with J because of available competence and relearning.

I can see it being used in research though.


At least we got rid of Access?

I've seen things created in MS Access that can't be unseen.


Let me reword this. "One in five genetics papers contains errors because the authors were careless when using Excel".

I dislike Excel for what it does (EUCs, nearly impossible to track changes, etc.) But on the other hand it is an amazing tool.


> One in five genetics papers contains errors because the authors were careless when using Excel

Or Excel is a remarkably easy tool to mishandle because of generally unexpected transformations it makes 'for you' automatically in an easy to miss way.


You think Excel should make special accommodations because genetics authors misuse it?

Excel has been acting this way since before bioinformatics existed. Authors need to use their tools properly.


Excel (and GSheets) are tools where I think a stripped down "trust my input, I know what I'm doing" mode that doesn't try to intuit user intent would be useful.

As I type this comment and my phone miscorrects "intuit" to "Intuit," I think also Google keyboard could benefit from such a mode that only handles spelling mistakes but doesn't replace uncommon words with common brands, etc.


But if you actually do know what you are doing, you know to set the cell type appropriately.

Yes, of course. Every cell as type "Text" so it doesn't screw up everything I paste...

The real problem is the behavior of the default "General" type, which actually means "guess at every value and mangle all my data."

I frequently have to paste in strings which consist of zero-prefixed numeric IDs. I know very well to make sure the column is text before pasting, but other users don't always remember and frequently get their data messed up by the behavior of "General", which assumes that what you wanted was an integer and thus "helpfully" strips all the leading zeros.
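
The same trap exists in any tool that infers numeric types by default; in pandas, for instance, you have to opt out explicitly (the IDs below are made up):

    import io
    import pandas as pd

    csv = "id\n00123\n00456\n"

    print(pd.read_csv(io.StringIO(csv))["id"].tolist())
    # [123, 456]: inferred as integers, leading zeros silently stripped

    print(pd.read_csv(io.StringIO(csv), dtype={"id": str})["id"].tolist())
    # ['00123', '00456']: forcing text, much like Excel's "Text" cell format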


I get all the points made (and the many more not included).

The point is that Excel works great for 99% of the people and for 99% of the use-cases. I am a heavy excel user, for my financial planning, work, etc. And it pisses me off when I see a column that should be "networkdays" (working days) becoming $ or getting decimals, but hey, you take the bad with the good.


They already did in 2023.

Excel did not have an option for turning automatic conversions off.

You can now, FINALLY, disable automatic conversion. Honestly, that "feature" has been the bane of my existence, and I don't work with genes.


I had no idea. Found https://insider.microsoft365.com/en-us/blog/control-data-con..., thanks.

I’d love to know the percentage of corrupted data across all Excel workbooks.


There are three options:

1. Make the software better (very hard on complex systems with GUIs)

2. Ensure everybody knows all footguns (impossible)

3. Don't care about those people (easy)

If we opt for #3, we might as well not even be in software as a profession/hobby. Having such low standards indicates that we don't really care.


> EUCs

Had to search for that one. It seems that EUC here means "end-user computing", and I was shocked to discover it's a pejorative term used by vendors to describe what they consider a problem that needs solving.


Careless as in they type in "MARCH1" or "SEPT2" in a cell and these get turned automagically into 1-Mar and 2-Sep dates after pressing Enter?

Oh, come on. The Excel user interface encourages mistakes, as surely as a bicycle with handlebars that steer backwards.

https://www.youtube.com/watch?v=MFzDaBzBlL0

It's not all on the users.


At least it's a bicycle.

Remember the whole "computer as bicycle for the mind" thing? That didn't happen, the world went in the opposite direction. Software like Excel are the last surviving remnants of the idea of empowering end users to improve their work and lives.


> Software like Excel are the last surviving remnants of the idea of empowering end users to improve their work and lives.

I agree. I love Excel. But this attitude only makes sense if we assume one can only fix easy-to-make mistakes by dumbing the software down, which is not true.


I had a different interpretation of the video you linked. It seems to me even a backwards bicycle is perfectly rideable as long as you adjust your mental model.

In that sense the Excel UI only fails to make sense to those not used to it. Which might be a nice analogy.


Reminds me of YAML's handling of booleans (`no` -> false) and git commit SHAs (treated as numbers in scientific notation)
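
The boolean half is easy to reproduce with PyYAML, which implements YAML 1.1 scalar resolution; whether a short commit SHA such as 1234e5 gets read as a float depends on the particular parser, so quoting the value is the safe fix either way.

    import yaml  # PyYAML, which implements YAML 1.1 scalar resolution

    # Bare no/yes/on/off (and NO, the classic "Norway problem") become booleans.
    doc = "deployed: no\ncountry: NO\n"
    print(yaml.safe_load(doc))   # {'deployed': False, 'country': False}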

Younger me would've said that WYSIWYG editors were a mistake and that researchers should've used LaTeX.

Now I think these errors are a small price to pay for convenience. One could waste a lifetime fighting small things like this and still lose. It's just the world we live in.


US ZIP codes are what I always have issue with during imports. They start with 0 in New England and the zero always gets removed.

> The researchers […] say the issue can be fixed by formatting Excel columns as text and remaining vigilant—or switching to Google Sheets, where gene names are stored exactly as they're entered.

I think the safest fix is to avoid spreadsheets altogether, as far as scientific research is concerned.


A HN thread similar to this one about how using Microsoft Excel corrupts data occurs about once a month. Nothing ever comes of it. Those emotionally invested in using Excel actively ignore criticism of it, or themselves for that matter.

Actually Microsoft have recently added a warning dialog about certain automatic csv conversions. Perhaps they finally read the thread. https://insider.microsoft365.com/en-us/blog/control-data-con...

Yes I'm aware. It would be good to have a group policy or some such registry setting to disable them all by default for every computer on the network.

Don't we ever learn. We've known since VisiCalc in the '80s not to trust spreadsheets for important data without crosschecking the results.

The question is why do we still use substandard tools for processing important data like this.


"The question is why do we still use substandard tools for processing important data like this."

Because there is no better alternative (yet)?

A better alternative needs to be really better to justify the effort of people relearning how to do things in it.


I think another factor is that the spreadsheet model has basically zero barrier to entry from a user's perspective. You can have elementary school students punch data from a classroom experiment into an Excel table, which is great!

Any replacement system which, for example, enforced a strong separation between operations, input reference data and output result data would require users to learn the model before attempting to use the software. This is a pretty big ask, especially since lots of small-scale users wouldn't see an immediate benefit. I think of it like the tradeoff between dynamic and static typing when programming - it's the same "upfront mental overhead versus long term maintainability" question IMO.


There isn't much relearning needed, since errors like the one described in the article illustrate that people don't know much about this tool either.

And Sheets isn't "really better", yet it has gained a noticeable share.


Does Excel have some reasonable way of unit testing of formulas?

No, as far as I know. And I have been thinking of ways to do that for the better part of the last decade. I think bringing modern programming practices into Excel could make a huge difference. But I don't know how to do it in an Excel-user-friendly way.
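
One stopgap I've sketched (not a real unit-testing framework for Excel, just an idea) is to read the workbook's cached results with openpyxl and assert them against an independent calculation in Python. The file name, sheet name and cell layout below are made-up assumptions, and the workbook must have been saved by Excel so the cached values exist.

    import openpyxl

    # data_only=True returns the values Excel last calculated and saved,
    # not the formula strings themselves.
    wb = openpyxl.load_workbook("model.xlsx", data_only=True)
    ws = wb["Summary"]

    inputs = [ws.cell(row=r, column=1).value for r in range(2, 12)]
    expected_total = sum(inputs)   # recompute independently in Python

    assert abs(ws["B1"].value - expected_total) < 1e-9, "B1 formula drifted"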

Kinda like automatic fallback or other automatic help/fix features.

They sound like something that would be helpful, but in practice they just end up being a massive violation of the principle of least surprise.


What do you guys recommend for viewing and scrolling around tabular data? Most of my excel usage is literally just double clicking to have a look at a file.

No, it is because incentives are misaligned and people care about getting cited. It doesn’t matter whether it’s correct or not.

Thank god we don't have a monopoly in office software and can easily switch to a competitor.

It's not really Excel, is it? But users not reviewing what's happening, not paying attention.

This attitude is at the root of why software is so much worse than it could be. "It's the users' fault" as a real root cause is borderline non-existent, because it can always be obviated in the software, even though the kind of talent that can do that for complex systems is exceedingly rare.

I disagree. The dominating attitude today is that users are idiots and too dumb to use anything they can't master in 30 seconds from first exposure, which has a nice side effect of cutting out a lot of functionality from the scope. The result is, for example, 20 startups going after any given idea, each spending years refining a slightly different take on a useless set of simple features.

(Actually, many of such products can be made better by replacing them with an Excel sheet, which is a big part of the reason why people who actually need to get shit done end up using Excel.)


I would say that the problem with Excel is that it makes two contradictory assumptions, which is a red flag for bad design: "it is the user's fault for not knowing how to use the tool" and "the user is a lazy idiot who won't set up the cell types themselves". It's a little bit of convenience not having to set cells as numbers in a program that is used mainly for crunching numbers, but basic text/number auto-detection would have covered 90% of uses; the remaining 10% is unique to each user, and trying to guess it is as likely to make things worse as to help.

This is users using software in a way the designers never expected, decades after it was designed. Those users then chose to ignore warnings about using it.

Newer programs like Google Sheets have better default behaviors. They have free access to it.


Yes, the problem is using Excel, not Excel by itself.

Quit using Excel in research!

Didn’t the same thing happen in economics because of a bug in Stata in the 90s?

This is 100% on the scientists who continue to use a tool that doesn't work for their use case!

Maybe they are locked into the tool for legacy reasons, and they don't have a developer on their team responsible for upgrading their tool / software workflow. This must be a non-trivial problem for a lot of workers all around the world.

This is why Statisticians do not use Excel for scientific work. They use SPSS, R, Minitab, MATLAB, SAS, etc.

Yeah I was about to say that the news has to be that Excel has extensive use in biomedical research.

Sounds like user-error, not a software error. Learn to use the software and its idiosyncrasies. RTFM.

God I hate programmers so much.

"Hurdurdur it's not my fault. The end users are wrong. We shouldn't have to spend time creating an interface and UX that actually works as expected"

Absolute fucking retards. We make tools for these people. If these tools do not work as end users would expect to that is our mistake. Stop coping about 'training end users'.


That's not programmers, that's business analysts, product owners, quality assurance; programmers only code what is asked for.

With the score set straight: not everything can be made to "just do the UX that actually works", because there are more users, and more versions of "what actually works", than you can implement.

Not everything can be "just simple". Excel, for instance, is a powerful beast, but it is powerful because it is complex and you can do really complex stuff with it. I could make simple spreadsheet software, but no one would use it because it wouldn't allow doing really complex stuff.


This. There is absolutely no excuse for Excel's behavior here. It's been DECADES and they still haven't added a goddamn dialog that says, upon importing data, "OH, it looks like your data contains stuff that looks like dates, are they dates, or are they NOT DATES? Thanks, have a nice day"

This does indeed exist in the latest version I have running. When opening a CSV, all automatic conversion can be disabled.

lol. What a solution. Maybe learn how to use a proper DB. Research science is like 14 years behind the times. I was amazed at the SciPy24 conference how almost no data scientists use DBs. Wild. Pandas is your friend here and can at least help you with json stuff. Using excel though, yeesh. Yeah the problem isn’t excel, it’s bad training on data handling from schools.
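
For what it's worth, flattening nested JSON into a table with pandas is a one-liner (the records below are made up):

    import pandas as pd

    records = [
        {"gene": "SEPTIN7", "stats": {"mean": 1.2, "n": 30}},
        {"gene": "MARCHF1", "stats": {"mean": 0.8, "n": 28}},
    ]

    df = pd.json_normalize(records)   # nested dicts become dotted columns
    print(df.columns.tolist())        # ['gene', 'stats.mean', 'stats.n']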

lol

The problem is scientists insisting on using tools like Excel and Python instead of a decent type-safe language to do their calculations in.

The problem with most software for complex tasks, including Excel and Python, is the hidden and/or non-intuitive mechanics. That's why an AI companion should accompany them for non-expert users (the 99%), one that can check what the user is doing for intention-versus-result errors and warn them.


