Finally! Every. Single. Time. I have been forced to use VLOOKUPs or INDEX/MATCH I wonder why nobody on the Excel team had still not built a new easier interface. I wish I could only use Pandas but excel is the lingua franca of business.
It's almost impressive how long it's taken for Excel to comfortably mimic "SELECT Prefix FROM Whatever WHERE Country = 'Brazil'" I have no doubts why it's the de facto data tool, but this is laughably overdue.
pandas is inconsistent and verbose. Isn't SQL a better alternative? Or even dplyr? Heck, I would learn one of those array languages such as J or APL rather than use pandas.
The syntax and grammar of SQL commands do not have an elegant symmetry. Examples:
The ordering is not always name of table and then names of columns...
Write data:
UPDATE customer SET membership_plan='PLATINUM'
Read data (incorrect syntax if specifying table name then column name):
SELECT customer GET membership_plan -- incorrect
Read data (correct syntax of columns then table):
SELECT membership_plan FROM customer
For UPDATE, the column names are adjacent to the value with the '=' equals sign. However the INSERT statement splits column names away from the values; all column names as one delimited list in parentheses and then all the values are another delimited list:
UPDATE customer SET membership_plan='PLATINUM', timezone='UTC'
INSERT INTO customer (membership_plan,timezone) values ('PLATINUM','UTC')
Prepositions...
Prepositions before table name: SELECT FROM t, INSERT INTO t, DELETE FROM t
No preposition before table name: UPDATE t
For syntax consistency, the UPDATE would also have had a preposition such as "UPDATE TO t", or "UPDATE ON t", or "UPDATE ONTO t" ... or ... the other 3 commands would have removed the need for "FROM/INTO". Either way, all 4 SUDI commands could have looked more alike.
Yes, one eventually gets used to the cosmetic inconsistencies but it's annoying for beginners because memorizing 1 of the 4 commands doesn't really reward you with the ability to predict the grammar of the other 3 commands. You have to learn the different quirks of the 4 commands.
Maybe we should be grateful that SELECT/UPDATE/DELETE at least all happen to share the same "WHERE" clause. It seems like SQL's inconsistent syntax design should have tortured us with memorizing "WHERE" for one command and totally different synonyms such as "FILTER" and "CONDITION" for the others!
[To downvoters, I'm willing to be persuaded that I'm wrong and that SQL syntax is actually "consistent" so please reply with your counterargument.]
I agree with your overall theme: SQL is not composable. On the UPDATE front, some implementations (e.g., PostgreSQL) do allow tuple/row assignment, which does make things a bit nicer. (Small conveniences can make things better, while not perfect.)
UPDATE customer
SET (membership_plan, timezone) = ('PLATINUM', 'UTC") -- ...
I tend to use row comparison and assignment where possible (in JOIN clauses, UPDATE statements, and WHERE clauses) to reinforce thinking in tuples, where a single column is the degenerate case (though I do then omit the surrounding parens).
Mysql certainly has some syntactic warts, but you picked what I consider rather odd complaints.
> The ordering is not always name of table and then names of columns
I would point out that only SELECT (not INSERT, UPDATE, DELETE) can be run without providing a table reference. Also, the select doesn't take a simple list of column names, but is a more complicated tool for organizing and formatting data. (This section would not by syntactically or semantically similar to anything in the INSERT or UPDATE that is not itself part of a SELECT statement)
> For UPDATE, the column names are adjacent to the value with the '=' equals sign. However the INSERT statement splits column names away from the values; all column names as one delimited list in parentheses and then all the values are another delimited list:
UPDATE and INSERT both support the "assignment_list" syntax. So they are perfectly consistent. UPDATE also supports other formats (such as VALUES or SELECT).
The real inconsistency I would have called our here is that while INSERT can be used with subqueries (via VALUES or with a SELECT clause), UPDATE must use joins to achieve a similar functionality.
> For syntax consistency, the UPDATE would also have had a preposition such as "UPDATE TO t", or "UPDATE ON t", or "UPDATE ONTO t" ... or ... the other 3 commands would have removed the need for "FROM/INTO". Either way, all 4 SUDI commands could have looked more alike.
Well "INTO" is an odd case, it is completely optional and serves no syntactic purpose. The FROM portion of SELECT statements does serve a syntactic purp0se but isn't actually required if it is not needed (E.G `SELECT "hello world"`). The inconsistent thing here is that the FROM portion of the DELETE syntax (which serves no syntactic purpose) is mandatory and not optional.
> Maybe we should be grateful that SELECT/UPDATE/DELETE at least all happen to share the same "WHERE" clause. It seems like SQL's inconsistent syntax design should have tortured us with memorizing "WHERE" for one command and totally different synonyms such as "FILTER" and "CONDITION" for the others!
If you look at the spec, SQL is pretty good at reusing syntactic units. You can see this in the documentation for both MySQL and Postgres.
The worst thing in terms of SQL consistency is the syntactic and semantic differences between different flavors of SQL. This greatly reduces the value that SQL could have as a plug and play query format.
>Mysql certainly has some syntactic warts, [...] UPDATE and INSERT both support the "assignment_list" syntax. So they are perfectly consistent.
(I was making generic comments about SQL so not sure why you interpreted my post as criticizing MySQL.)
Anyways an FYI for any readers not aware... MySQL has non-standard extensions[0] to the reference SQL language. Other databases like Oracle, MS SQL Server, and SQLite do not have INSERT with optional "assignment_list" syntax.
>Well "INTO" is an odd case, it is completely optional
Another example where MySQL is a special case because its proprietary extensions makes "INTO" optional. In other databases like SQLite, the "INTO" is absolutely required because omitting it in "INSERT customer" returns a syntax error.
>I would point out that only SELECT (not INSERT, UPDATE, DELETE) can be run without providing a table reference.
Yes, the ability for SELECT to leave out the table name is handy for functions like "SELECT date();" but its optionality would not have violated consistency with table-then-column ordering of INSERT & UPDATE. In cases where SELECT needs to specify table(s), it would have been more consistent grammar by putting the table names before the columns like INSERT & UPDATE. (This would have also greatly simplified editors with intellisense since putting the table first would let the parser know which columns to list as autocomplete suggestions. E.g. in Navicat SQL text editor, to get a list of autocomplete column names, I have to first type out "SELECT FROM customer" then move the cursor backwards to the left of "FROM" and then start typing "mem" to see "membership_plan" as an autocomplete suggestion.)
No, they just don't use it for your specific scenario.
Disclaimer: Work at MS. We use Excel plenty, we just tend to do things differently than many other businesses as things that shouldn't be done in Excel we don't do in Excel (since we have other tooling available).
The challenge is that in reality, people are diverse and will use Excel as long as they can as they don't want to buy other software, lack skills to implement it differently, or for some other reason need to keep data in Excel.
Software like Excel has been around for so long and has so many edge cases in how it is used that the dev teams could spend 100% of their time adding features and still probably not touch some potentially large user base ones.
I'd like to believe you but some of the beefs I have are not edge case at all.
Top of my head, a few examples (in addition of the index/match):
1. the fx "Insert function" window, with a search feature that just doesn't work, a listbox that shows only 7 results and is not resizable (when the built-in functions alone is a huge list, plus all custom udf). Same thing for the Function Arguments window.
2. excel incapable of preserving the default for calculation mode (automatic/manual) between sessions
3. excel recalculating the whole sessions when opening a csv file even when calcs are on manual
4. terrible default formats (serious people don't want thick lines, 3d charts and all that crap)
5. Recent regression: can't copy a table from excel to powerpoint while preserving the format, formatting a table in powerpoint is a pain in the butt, default format are just bad
6. No way to link a powerpoint to a spreadsheet. Think all the bankers, consultants, engineers, who must present the result of some analysis, the data is in excel, the deliverable is a powerpoint deck, how do you link the two. Only ways I know are awefully manual and/or unstable and/or result in tables displayed as images in powerpoint
7. VBA not developped or improved since the late 90s (there was an interesting attempt at one point, VSTA, but it was canned)
I could go into more edge case things but these are mass usage features.
For MS, an edge case is something that probably has under 100k users unless someone on the dev / feature team decides to champion that item.
So no - some of your examples are very much edge cases, preference, or lack of understanding how MS works.
Example(s) off your list: item #4 - that's subjective and I am certain there are users that do want that feature. This same feedback probably applies to #3 also. Or not enough people care / complain about #2 and #3 to get them prioritized.
#5 and 6 is probably because people tend to think that MS is one unified company; it's more like little ones that group together to produce a product. The Excel and PowerPoint teams pre-Satya wouldn't have even talked on features outside of integration testing. That item also might be a PowerPoint issue, a Windows OS issue (for how the formatting gets preserved on copy) and not related to Excel at all.
The VBA item someone else has already addressed.
MS dogfoods heavily; it's just that MS doesn't represent their user base (hence the push for insiders - as insiders may represent the user base). I think most large companies (Google, Amazon, etc) that make software have that same issue (lack of user base representation internally).
> VBA not developped or improved since the late 90s (there was an interesting attempt at one point, VSTA, but it was canned)
VSTA was not canned [0], but core Office apps never supported it. IIRC, the only Office app that ever used (uses?) VSTA instead of VBA is InfoPath. VSTA was more of a replacement for VBA as an offering for other people (VBA, as well as being used in office, was available for others to use in their apps), who didn't have Office’s backward compatibility constraints.
I believe at that time there were talks about VSTA being made available in other parts of office, living alongside VBA. It never happened but I thought that was a good way forward to phase out VBA.
Why does LINEST in Excel return regression coefficients in the OPPOSITE order of the columns of data? Could you at least add an optional boolean argument that returns them in the same order as the columns?
Agree. Looks like I can finally get more people using the functionality provided by INDEX([returninfo],MATCH()). Previously, folks would get too confused initially to commit it to memory. MrExcel has this exact comparison: Youtube.com/watch?v=8dwmvTka3gs
So, instead of using SUMIF, I always use SUM or SUMPRODUCT using the trick that in Excel True and False are 1 and 0 when used in arithmetical operations. This is unbelievably flexible. For example, if you want to sum all the elements of column A if the element in B is between 1 and 10 then you have SUM(A1:A100 * AND(B1:B100 >=1, B1:B100 <=10)). You need to press Ctrl+Shift+Enter, as this is an array formula, but otherwise it works like a charm.
A small but important difference between this new function and SUMIF: SUMIF will sum all values in the array that match your search criteria, but this function will only return a single value.
A decade ago as an intern at Microsoft, I attended a talk by <distinguished engineer whose name I forget> about data systems. The thing I remember from it was: "In terms of total data stored, what data system do you think is the biggest in the world today? ... Probably Excel. Probably Excel by a lot."
Excel has a simple mental model combined with powerful tooling. It's the most beautiful means to get non-technical people to start to think like a programmer. An acquaintance of mine tells a great story of how Excel and VBA got him into software: https://news.ycombinator.com/item?id=7950736.
This is amazing. I also found a love for programming after i learned some spaghetti VBA to create automated sales reporting at my finance job. Finally starting up CS degree in Jan 2020.
For a lot of individuals and small businesses true databases have a tough learning curve. I commend the Excel developers for recognizing this and including things like Power Query and Power Pivot.
They aren't there to completely replace databases, but they are a decent middle-ground for meeting people where they are.
If you're using Google Sheets, the QUERY function [1] can do all kinds of lookups. For complex logic, I find its SQLish syntax a lot more readable than mixtures of VLOOKUP and IF and other functions (but that might just be me). So, um, anyone here on the Office team who can get QUERY into Excel?
My office is split between Excel and Sheets users. I was once trying to mash-up my Sheets knowledge with an Excel colleague. I told him it might be easier to accomplish his task with Sheets because query() uses a SQL-like syntax. I told him Excel doesn't have that. He responded, "lmao you can do that with vb". I looked up instructions on how to that and realized he had no idea what he was missing.
I know the IDE has tooltips, but why don’t they ‘invent’ enums instead of those magic constants for match_mode and search_mode? Does that complicate the parser that much?
For VLOOKUP and HLOOKUP, they can’t for backwards compatibility, but this breaks new ground.
Oh man, I would have killed for this when I was in structured products trading at Goldman, the spreadsheets that we were using to keep tabs of positions were beautiful and insane in their complexity and in their raw, insane abuse of VLOOKUP/INDEX/MATCH.
I moved from Technology into Trading and all my code access was cut off, and the switch from coding in Slang/SecDb to only being able to use Excel was absolutely brutal. If someone can figure out how to break the Microsoft stranglehold on "business" then there will be a boom in productivity unseen since... Microsoft's suite of business software.
I wonder when we can start using this function without running the risk that our bosses/clients Excel doesn't support this function. Lots of people still use Office 2007 and 2010, so It'll be a while I guess.
I was thinking about that too. Although any form of delivering automatically executing code is just another attack vector. It's macro's all over again.
I almost cried when I saw the word return_array. Then I realized my mistake. Return_array is the name of an input parameter.
How I long for the day I can return multiple values from an excel lookup function. Few users know of such a thing. But it is amazing. So far I have gotten it to work using the most complex formula string I ever made.
Excel has allowed array formulas for literally decades. It’s not as complicated as the link you posted makes it seem. You just need:
1. A formula that supports array returns (like MMULT)
2. Know how big the returned array dimensions are
3. Highlight that array size, type in the formula, and press ctrl+shift+enter
Returning multiple matches from a single criteria isn’t what VLOOKUP was designed for, so rather than complain that a hammer does a bad job at screwing in nails, search for a screwdriver. Excel has tons of data filtering and table features, and allows DB connections for SQL queries.
Never knew of mmult. Will try to look at that technique to simplify the formula.
Aside from that, another user posted excel campus example of filter function. That also returns an array. And pretty much does what I need. However, it is only for office 365 insider and not to be released anytime soon.
vlookup: (value to lookup, column to search, numbered column from the left, yes/no use range)
xlookup: (value to lookup, array to search, value to return array)
You can also replace hlookup with xlookup. Two optional arguments are 4) exact match or not and 5) go up or down or use binary search. I guess binary search would require a sorted array and improve speed?
xlookup seems obviously better in retrospect. I wonder if they had any reason not to implement it earlier.
Nope. New features appear only in Office 365 (the subscription version). If you have the one-time purchase version (Office 2019), you'll need to wait for and buy Office 2022.
No joke: I spent the better part of an hour a few weeks ago trying to remember how to use VLOOKUP properly to join two lists in Excel before a coworker chatted me that he had done it in 10 seconds by running the Linux `join` command.