It frustrates me to see people in the smash community treat measures like Elo as "the truth" because they "don't have any human input". This is simply factually incorrect - these so-called objective measures have as much human input as anything else, codified into the constants and design choices of their algorithms. Designing these things is as much an art as it is a science, and the choices on how to weigh placements, upsets, losses, consistency, peaks, and the like are all just that - choices, made by a human sitting in a chair with Sublime Text 3 open.
I feel like this is applicable nigh everywhere. From social media timeline sorting, to industrial processes, to Melee rankings. Using an algorithm doesn't eliminate the human element from a system, it only abstracts it away.
People have been advocating radical empiricism in some increasingly uncomfortable contexts recently, and I hope it's just that it's the only thing they were taught and the only way they know how to think about their craft. The alternative is that an increasing number of people really do want machines to triumph over human judgment and morality.
> Rump’s example is to compute the expression
> ƒ = 333.75 * b^6 + a^2 * (11 * a^2 * b^2 − b^6 − 121 * b^4 − 2) + 5.5 * b^8 + a/(2b)
> with a = 77617 and b = 33096. On an IBM S/370 main frame he computed ƒ in (1.1) using single, double, and extended-precision arithmetic, to produce the results:
> Single precision: ƒ = 1.172603...
> Double precision: ƒ = 1.1726039400531...
> Extended precision: ƒ = 1.172603940053178...
> This suggests a reliable result of approximately 1.172603 or even 1.1726039400532. In fact, however, the correct result (within one unit of the last digit) is
> ƒ = −0.827396059946821368141165095479816... (1.2)
> Even the sign was wrong.
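For anyone who wants to poke at this themselves, here is a minimal sketch in Python. It evaluates the quoted expression once with ordinary floats and once with exact rationals. The function name `rump` and the way the constants are passed in are my own choices for illustration. IEEE doubles will not necessarily reproduce the S/370 digits quoted above, but they still miss the true value badly.

```python
from fractions import Fraction

def rump(a, b, c1, c2):
    # Rump's expression as quoted above; c1 stands for 333.75, c2 for 5.5.
    return (c1 * b**6
            + a**2 * (11 * a**2 * b**2 - b**6 - 121 * b**4 - 2)
            + c2 * b**8
            + a / (2 * b))

# Ordinary double-precision floats.
float_result = rump(77617.0, 33096.0, 333.75, 5.5)

# Exact rational arithmetic: 333.75 = 1335/4 and 5.5 = 11/2.
exact_result = rump(Fraction(77617), Fraction(33096),
                    Fraction(1335, 4), Fraction(11, 2))

print("float:", float_result)
print("exact:", float(exact_result))
```

The exact answer works out to a/(2b) − 2, so all of the enormous polynomial terms cancel down to −2, and that cancellation is exactly what fixed-precision arithmetic loses.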
It's worth remembering that computers are very fast, but also very, very stupid. They do exactly what you tell them (or teach them, if we're talking about MCL), which isn't always what you want. Just because you've got a very scientific looking algorithm doesn't mean it's correct. Just because you're aggregating all your data doesn't mean the results are truly meaningful.
Ignoring the parenthetical "e.g. match outcomes", this is a correct description of Arrow's impossibility theorem. I don't see how match outcomes could possibly be an example of a set of rankings in the sense of the theorem, though.
It especially doesn't mean that all voting systems are equally bad, and yet the most common reason for people to cite Arrow's theorem is to try and dispute the idea that we can do better.
Voting systems which obey the assumptions of the theorem frequently break down in ways that cannot be described as "rock-paper-scissors between the three most popular candidates", as when strategic voting caused the papal conclave of 1334 to unanimously elect the least popular candidate to the papacy.
Were you perhaps thinking of Condorcet's work when you referred to rock-paper-scissors? He, not Arrow, famously wrote about how voter preferences were not necessarily transitive.
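To make the Condorcet point concrete, here is a small sketch with three hypothetical voters whose individual rankings are each perfectly transitive, yet whose pairwise majorities form a cycle. The ballots and helper names are made up for illustration.

```python
def prefers(ballot, x, y):
    # A ballot is a list ordered from most to least preferred.
    return ballot.index(x) < ballot.index(y)

def majority_winner(ballots, x, y):
    # Head-to-head majority between x and y (ties go to y; none occur here).
    x_votes = sum(prefers(b, x, y) for b in ballots)
    return x if 2 * x_votes > len(ballots) else y

ballots = [
    ["A", "B", "C"],
    ["B", "C", "A"],
    ["C", "A", "B"],
]

for x, y in [("A", "B"), ("B", "C"), ("C", "A")]:
    print(x, "vs", y, "->", majority_winner(ballots, x, y))
```

A beats B, B beats C, and C beats A, each by 2 votes to 1, so the group preference is a rock-paper-scissors cycle even though no single voter's preference is.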
One of Arrow's assumptions, "Independence of Irrelevant Alternatives", is far too strong. If you relax it to "Local Independence of Irrelevant Alternatives", then Arrow's theorem no longer applies. It's just not a relevant theorem.
Claiming that allowing a voting system to recognize a rock-paper-scissors cycle among the top candidates is the same as allowing dictators is obvious rubbish. But if you interpret Arrow's theorem naively, you might be tempted to make such a claim.
Not exactly sure how relevant it is though.
And even when AI becomes practical, no more than 1000 people can realistically be involved in its design and implementation. Our supposed 'objective machines' will in fact be designed to the ideals of those designing them: a generally non-diverse group of people. Food for thought!
The benefit of an algorithm isn't that it's infallible (it may well be fallible), but rather that it's consistent. It's accurate, even if not correct. Considering how much of human judgement is inconsistent, there's value in quantifying it in a standard way.
Reasonable minds can argue about whether that quantification is correct or fair.
That said, the terms more often used here are "precise, but not accurate" (http://blog.minitab.com/blog/real-world-quality-improvement/...) but note that this uses a different meaning for accuracy than parent used. For example, say you use 22/7 to derive the value of pi. You can easily calculate it to 60 or more decimal places. You'll be very precise. However, you won't be accurate because your methodology of using 22/7 is flawed. It's also very easy to create more precision: just keep calculating. On the other hand, it can be very difficult to create more accuracy: How do we even measure pi to be able to confirm the ratio is correct?
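As a quick illustration of the 22/7 example, here is a sketch using Python's `decimal` module. The 60-decimal-place target comes from the comment above; the variable names are mine.

```python
import math
from decimal import Decimal, localcontext

with localcontext() as ctx:
    ctx.prec = 61  # one digit before the point plus 60 decimal places
    approx = Decimal(22) / Decimal(7)

# Very precise: 60 decimal places, all of them "correct" for 22/7.
print(approx)

# Not accurate: it already disagrees with pi in the third decimal place.
error = abs(float(approx) - math.pi)
print(error)
```

Raising `ctx.prec` buys as many more digits as you like, and none of them move the result any closer to pi.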
In Melee there are people that end up as local kings that don't do well in nationals. There are also people that are exceptionally good on a national level but simply don't travel (aka "Hidden Bosses").
Nintendo is very hands-off with Melee so tournament organization remains in the hands of the community. There is no single major overseer of Melee tournaments. Anyone can hold a tournament and throw the bracket onto Challonge or Smash.gg. I imagine if ELO was implemented as part of seeding, people would start gaming the system.
> The way seeding gets done is that players get placed into broad tiers, and then those tiers are then fed into pools, attempting to avoid region conflicts or repeat matches from recent tournaments.
This is where the human-in-the-loop part of seeding shines. Mid-tier players are entering national tournaments for the experience. They will not win, and their reg fee is essentially donating money to the winner's pot. But what they gain from the experience is tournament matches with players that they are not familiar with. Many of them will only get two games in-bracket, so it's a huge waste for them if they end up playing against buddies from their own region.
The community actively polices good seeding. There is often an outcry if say too many Nor Cal players get shoved on the same side of a bracket.
There are huge differences between the Swiss system used in chess (which works great for ELO, since seeding is done by rating and players are not eliminated) and the double elimination system used in Melee tournaments. I don't think it's possible to have an objective ranking system in Melee because of the intricacies of this issue (seeding influences final placement, low-seeded players will hit a wall where they lose to high-seeded players earlier, etc).
(With Wizard (https://en.wikipedia.org/wiki/Wizard_(board_game)) and The Fantasy Trip (https://en.wikipedia.org/wiki/The_Fantasy_Trip) (Yay, 1970s!), Melee made up the best fantasy role playing game. The only competition is the Hero system; GURPS is definitely a victim of the second-system effect.)
Edit: Yes, I'm apparently old. I'll return you now to your regularly scheduled discussion.
The TFT wiki page says, "A revival of TFT and associated MicroQuest adventures is underway at http://www.darkcitygames.com.*" The "Legends" rules (http://www.darkcitygames.com/docs/Legends.pdf) [PDF] there look a lot like the basic mechanics of TFT.
I managed to miss Rolemaster, although I liked the titles, particularly "Claw Law." :-) But I know what you mean about complexity; too much "realism" leads to things like Ben Sergeant's Car Wars cartoon (lower left, here https://i.ebayimg.com/images/g/HSYAAOSwTglYlP-b/s-l300.jpg): "My goodness! 08:00:06, already?"
You can do much better just by actually running the logistic regression over the games. In this framework, incorporating any per-game bias such as the characters chosen is a trivial variable to add to the model and fit jointly.
Our ranking systems are holdovers from a time when the calculations had to be done by hand. If the whole set of games fits in ram, there's no need to use ancient optimization methods.
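A rough sketch of what that could look like, under assumptions of my own: a Bradley-Terry style logistic model with one skill per player plus shared per-game feature weights, fitted by plain gradient ascent with a small L2 penalty. The function name, the feature encoding, and the toy data are all hypothetical, and a real analysis would more likely use an off-the-shelf solver.

```python
import math

def fit_logistic_ratings(games, n_players, n_features=0,
                         lr=0.05, epochs=2000, l2=0.01):
    """Jointly fit player skills and per-game bias weights.

    games: list of (winner, loser, features). Each features list has
    n_features numbers describing the game from the winner's side,
    e.g. +1 if the winner had the favourable character matchup, -1 if not.
    """
    skill = [0.0] * n_players
    bias = [0.0] * n_features
    for _ in range(epochs):
        # Each gradient starts from the L2 penalty term.
        g_skill = [-l2 * s for s in skill]
        g_bias = [-l2 * w for w in bias]
        for winner, loser, feats in games:
            z = skill[winner] - skill[loser]
            z += sum(w * x for w, x in zip(bias, feats))
            p_win = 1.0 / (1.0 + math.exp(-z))
            err = 1.0 - p_win  # d(log-likelihood)/dz for an observed win
            g_skill[winner] += err
            g_skill[loser] -= err
            for k, x in enumerate(feats):
                g_bias[k] += err * x
        skill = [s + lr * g for s, g in zip(skill, g_skill)]
        bias = [w + lr * g for w, g in zip(bias, g_bias)]
    return skill, bias

# Toy data: three players, one made-up matchup feature.
games = (
    [(0, 1, [1.0])] * 3 + [(1, 0, [-1.0])]
    + [(1, 2, [1.0])] * 3 + [(2, 1, [-1.0])]
    + [(0, 2, [0.0])] * 2
)
skill, bias = fit_logistic_ratings(games, n_players=3, n_features=1)
print(skill, bias)
```

Adding another per-game variable is just one more entry in each features list, and it gets fitted together with the skills rather than bolted on afterwards.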
Right like there are A rank fencers, and then there are A rank fencers who actually have a shot at placing on the points table.
I'm not sure why.
- A high rank player can consistently execute a strategy that wins against the majority of players most of the time ("beats the meta")
- The above has a counter strategy, but this strategy often fails against the majority of the players ("loses to the meta")
When these two players meet, they go 50-50, but have very different results in tournaments. Alternatively, one player is generally bad but exploits a particularly hard to observe weakness in the first.
I know nothing about fencing, but I suspect something similar is going on here.
Basically players want rating systems to be reward loops; they hate systems where their rating can change randomly, and they want the system to be very volatile in response to their own results. If they go on a statistically insignificant winning streak, they want their ratings to shoot up. They don't want a rating system that goes "meh, it's probably just random chance".
I can't follow this argument; the point of doing this match-by-match and by win percentage is exactly so that the number of games and placement do not matter. You won a round against someone with higher Elo? Your Elo increases, theirs decreases. Doesn't matter if this was one game out of 20, or three.
So if a player wants to optimize for ranking, it's actually in their best interest to throw round one of a tournament, play more games, and have their skill update more times.
The number of games matters because with more games you have more chances to win and update your score.
That's only the case if you believe your current Elo underestimates your real ability relative to the opponents you'll meet in the lower bracket.
Also if you lose your first game against a low-ranked player, you'll immediately lose a lot of points; then the wins against other low-rank players will not give you many points back.
If you're within a calibrated ELO system, your expected change in rating should be 0 for a match, and then having more matches doesn't actually help you.
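Here is a small sketch of that claim using the textbook Elo formulas. The 400-point scale and K = 32 are the usual conventions, not anything specific to Melee, and the function names are mine.

```python
def expected_score(r_a, r_b):
    # Standard Elo expectation for player A against player B.
    return 1 / (1 + 10 ** ((r_b - r_a) / 400))

def update(r_a, r_b, score_a, k=32):
    # New rating for A after scoring score_a (1 = win, 0 = loss).
    return r_a + k * (score_a - expected_score(r_a, r_b))

def expected_rating_change(r_a, r_b, true_p, k=32):
    # Average rating change for A if A's real win probability is true_p.
    gain = update(r_a, r_b, 1, k) - r_a
    loss = update(r_a, r_b, 0, k) - r_a
    return true_p * gain + (1 - true_p) * loss

calibrated_p = expected_score(1600, 1500)
print(expected_rating_change(1600, 1500, calibrated_p))  # ~0: rating is calibrated
print(expected_rating_change(1600, 1500, 0.9))           # > 0: player is underrated
```

So extra matches only help on average when your real win probability is higher than the one your rating implies.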
(1) Figure out a matchup discrepancy matrix
e.g. Peach vs Puff winrate is 0.43
(2) Use an Elo head-to-head variant where the Elo update function takes matchup discrepancy into account
- A vs B has an expected 0.9 winrate
- A is Peach and B is Puff
- Elo update is done expecting A to have a winrate of 1 - (1 - 0.9) * (0.5 / 0.43) = 0.884
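The adjustment in the last bullet can be written as a one-liner. This sketch follows the formula exactly as given, with one addition of my own: the result is clamped to [0, 1], because the raw formula can fall outside that range when a heavy underdog also has the bad matchup. The function name is hypothetical.

```python
def matchup_adjusted_expectation(base_expectation, matchup_winrate):
    # base_expectation: A's expected winrate from ratings alone (e.g. 0.9).
    # matchup_winrate: winrate of A's character against B's (e.g. 0.43).
    raw = 1 - (1 - base_expectation) * (0.5 / matchup_winrate)
    # Clamping is an added assumption, not part of the original formula.
    return min(1.0, max(0.0, raw))

# Peach (A) vs Puff (B) with a 0.43 matchup winrate.
print(round(matchup_adjusted_expectation(0.9, 0.43), 3))
```

An even matchup (0.5) leaves the rating-based expectation unchanged, which is a useful sanity check on the formula.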
The next best Puff player is #38, and doesn't have any wins against top 10 foxes. Is HBox just the best player ever, consistently winning a "bad" matchup, or is Puff a better character than people commonly believe? Who's to say?
Well, TFA made no bones about calculating one.
> Is HBox just the best player ever
The current data says pretty definitively, yes.
If other players can learn how to get his winrates vs Fox, then the matchup matrix would end up reflecting that. The matchup matrix doesn't need to reflect the perfect ("objective") state of the matchup, just the current one.
(The system I'm talking about would look more suspicious if HBox wasn't considered the best, because it would probably put him at #1 anyway.)
I didn't do a good job of clarifying what I meant. Hbox is obviously the #1 player right now. The question is whether he's just on another level from every other player, or whether we're underestimating Puff as a character.
Note that this is a really deep question. There are strong arguments (parry) that in the "20XX" Yoshi would be the most viable character right after Fox. Given that, is Amsa overrated because he's underperforming how his character should, or underrated since he's overperforming the "average" Yoshi player?
The system you describe basically just ends up rewarding above average players who use unusual characters. Should Abate be ranked top 20? Probably not, but considering how much he outperforms the "average" Luigi (same thing for Amsa: does he deserve to be, say, top 10?), he probably would be.
It really depends on what you want the ranking to mean.
If you want it to mean: "If all the players in the world played in a tournament, what would the expected result be", then a normal Elo-like rating system (e.g. Glicko-2) should be fine, because all the data available is from real tournaments, and it's not really feasible for players to strategically dodge bad matchups to pad their ratings.
But one criticism TFA has of this method is matchup discrepancy. I'm not sure that's actually important (players choose their mains freely), but if it is, can't you just correct for it?
I think you're right that this correction would create an undesirable result. That just means that the matchup discrepancy criticism isn't good.
The fact that anyone can host a tournament makes this very tricky. You could assign a points breakdown for the top 64/128 based on number of entrants and prize money, but that could inflate people's rankings for doing well in an easy region.
For example, there are very few top 100 ranked players in Europe. Under this system, the 4th-8th best players in Europe could get a huge rankings boost over American counterparts that perform worse in American tournaments where there are many more skilled players. Tennis benefits from the fact that top 50-100 players are usually required to play in most major tournaments. There's not enough money in Melee for that to even be a possible requirement for players. (Another example: small, strong regions like Florida or SoCal would be treated the same as weaker regions like Texas/Arizona for local events.)
Invitationals would also throw things off, as they often have a large prize pool, but only 16 players invited. With Melee, you would either need to treat these as exhibitions (worth no points), which would probably lower the stakes and seriousness for players, or sanction only certain well-known invitationals, which might reduce outside investment in Melee.
Another common complaint to this is how it favors seeded players. Although this would have some impact initially, I think this would level off over time once an official ranking was adopted by all tournaments and individual tournament organizers lose seeding powers. In fact, I would expect this to be even less of a factor than in tennis, since in tennis being a top 100 player gets you auto invited to most major tournaments. In smash, anyone can compete at any major tournament, regardless of rank.
This is making me reconsider, although one thing of note is that you choose to play who you want in our setup.
Overall I think this leads to fair rankings, since 'worse' players lose to 'better' players most of the time. As such, the people we think should be in the top and bottom spots have them at the end of the season.