>> Using data from drafts carried out by humans, I trained a neural network to
predict what card the humans would take out of each pack. It reached 60%
accuracy at this task.
Going by what's in the linked notebook, the model was evaluated on its ability
to match the decks in its training set card-for-card.
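To be concrete about what is being measured: as far as I can tell from the notebook, the accuracy figure is plain top-1 agreement with the recorded human pick. A minimal sketch (the function name and toy data are mine, not the author's):

```python
def pick_accuracy(predicted_picks, human_picks):
    """Fraction of pack decisions where the model's pick matched the human's."""
    matches = sum(p == h for p, h in zip(predicted_picks, human_picks))
    return matches / len(human_picks)

# Toy example: the model agrees with the human on 3 of 5 picks -> 0.6
print(pick_accuracy(["a", "b", "c", "d", "e"],
                    ["a", "b", "c", "x", "y"]))  # 0.6
```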
Without any attempt to represent game semantics in the model, the fact that
the model sometimes "predicts" different picks than the actual picks in the
dataset tells us nothing. It probably means the model has some variance that
causes it to make "mistakes" in its attempt to exactly reproduce its dataset.
It certainly doesn't show that the model can draft a good M:tG deck, let
alone in any set other than Guilds of Ravnica.
>> The model definitely understands the concept of color. In MTG there are 5
colors, and any given draft deck will likely only play cards from 2 or 3 of
those colors. So if you’ve already taken a blue card, you should be more
likely to take blue cards in future picks. We didn’t tell the model about
this, and we also didn’t tell it which cards were which color. But it learned
anyway, by observing which cards were often drafted in combination with each other.
This is a breathtakingly brash misinterpretation of the evidence. The model's
representation of a M:tG card is its index in the Guilds of Ravnica card set.
It has no representation of any card characteristic, including colour. If it
had learned to represent "the concept of colour" in M:tG in this way, it
wouldn't be a neural net, it would be a magick spell.
The author suggests that the model "understands" colour because it drafts
decks of specific colours. Well, its dataset consists of decks with cards of
specific colours. It learned to reproduce those decks. It didn't learn
anything about why those decks pick particular cards, or what particular
cards are. All it has is a list of numbers that it has to learn to put
together in specific ways.
This is as far from "understanding the concept of colour", or anything, as can be.
There are many more "holes" in the article's logic, that just go to show that
you can train a neural net, but you can't do much with it unless you
understand what you're doing.
Apologies to the author for the harsh critique, if he's reading this.
>> Using data from drafts carried out by humans, I trained a neural network to predict what card the humans would take out of each pack. It reached 60% accuracy at this task. And in the 40% when the human and the model differ, the model’s predicted choice is often better than what the human picked.
How the model's pick is "better than what the human picked" is never made clear, but since accuracy is measured by the model's ability to match its training set, I assume that's also what is meant by "better": the model was better than a human in memorising and reproducing the decks it saw during training.
Well, you'd never evaluate a human's deckbuilding skills by how well they can reproduce a deck they've seen before. Given the same deck archetype, 10 humans will probably make 10 different card choices, for reasons of their own. It's like trying to evaluate how people style their hair by measuring how similar their hair looks to some examples of particular hair styles. It's a concrete measure, but it's also entirely meaningless.
This effort really suffers in terms of evaluation, and so we have learned nothing about how good the model is, which is a shame.
Maybe it's too different of an idea, but in a draft I absolutely evaluate (part of) my skill by how well I've reproduced important components of a good deck. Did I find my seat? Good curve? Enough removal? Then there are format-specific things - did I include enough 1/3s for 2, knowing that I'm likely to lose to fast decks with 2/1s if I don't?
Hi. From your write-up and a quick look at your notebook that's what your model is doing. And you measure its accuracy as its ability to do so. Is that incorrect?
> This is as far from "understanding the concept of colour", or anything, as can be.
It is very arguably bad feature engineering - if you have the information readily available, don't make the network infer it - but I think the description is fair.
Word2vec uses a similar model. It starts out knowing nothing about each word except an arbitrary numeric index, and learns everything else by predicting words that appear next to each other. By the end of the training it clearly has internal representations of concepts like "color", "verb", "gender", etc.
The same concept should apply here - by observing what cards are used in similar decks, with enough training data it should eventually associate concepts like card type, color and mana costs to each card.
In this case there isn't enough training data for that kind of resolution, but it has learned that blue cards go with blue cards, and red cards with red cards, and there are no hard lines from there to the concept of color.
Sure this isn't going to "solve" MtG, and I don't think it is a particularly good approach for the problem statement, but I think the idea is workable, and the network could already contain a proto-concept of "color" that would be refined with more training.
A card is blue (resp. red, etc) because it has a blue mana symbol in its
casting cost. Not because it is found in the company of other blue cards. That
is the concept of colour that a model must represent before you can say with
any conviction that it "understands" the concept of colour. In terms of "hard lines"- that's the hard line you must cross.
The kind of model you're talking about then would be a classifier able to
label individual cards with their colours, or an end-to-end model with an internal representation of cards' characteristics. That is not what was shown here.
A blue card is found in the company of other blue cards, because humans picked them, because of the blue mana symbol in its casting cost.
With proper training, you end up with exactly the "end-to-end model with an internal representation of cards' characteristics".
Since it can't see the cards, it can't say anything useful about a card it hasn't seen during training, but if you added some new cards and started training again, a pre-trained net might learn the new cards faster than one you train from scratch. That would be evidence that the network has learnt a meaningful embedding.
There is no proof that this network has done so, but I think word2vec shows that it's a feasible approach.
You're assuming way too much capability that is not present. Just because a human can make this inference, it doesn't mean that a neural net can. Neural networks are notoriously incapable of inference, or anything that requires reasoning.
>> There is no proof that this network has done so, but I think word2vec shows that it's a feasible approach.
Word2vec (and word embeddings in general) is actually a good example of why this kind of thing doesn't work the way you think it does. A word embedding model represents information about the context in which tokens (words, sentences, etc.) are found, but it does not, in and of itself, represent the meaning of words. The only reason why we know that words it places in the general vicinity of each other have similar meaning is because we already understand meaning and we can interpret the results. But the model itself does not have anything like "understanding". It only models collocations.
Same thing here. You seem pretty certain that with more data (perhaps with a deeper model) you can represent something that the model doesn't have an internal representation for. But just because the behaviour of the model partially matches the behaviour of a system that does have an internal representation for such a thing, in other words, a human, that doesn't mean that the model also behaves the way it behaves because it models the world in the same way that the human does.
And you can see that very clearly if you try to use a model like the one in the article, or one trained on all the magic drafts ever, to draft a set of cards it hasn't seen before. It should be obvious that such a model would be entirely incapable of doing so. That's because it doesn't represent anything about the characteristics of cards it hasn't seen and so can't handle new cards. A human understands what the cards' characteristics means and so can just pick up and play a new card with little trouble.
As to what I mean by "internal representation"; machine learning models that are trained end-to-end and that are claimed to
learn constituent concepts in the process of learning a target concept
actually have concrete representations of those constituent concepts as part of their structure. For
example, CNNs have internal representations of each layer of features they
learn in the process of classifying an image. Without such an internal representation all you have is some observed behaviour and some vague claims about understanding this or learning that, at which point you can claim anything you like.
This is a mostly meaningless semantic distinction. I can ask you to give a synonym for "king" and you might suggest ruler, lord, or monarch. I can ask a word2vec model for a synonym for "king" and it will provide similar suggestions. What "understanding" of the words' meanings do you have that the model lacks? Be specific!
Definitions are abstract concepts, so the fact that you can pick similar words, and so can the model, makes the two equivalent. To put it differently:
>The only reason why we know that words it places in the general vicinity of each other have similar meaning is because we already understand meaning and we can interpret the results.
Is not correct. The only reason why we know that the words it places in the general vicinity of each other have similar meanings is because our mental models put the same words in the same vicinities.
>Same thing here. You seem pretty certain that with more data (perhaps with a deeper model) you can represent something that the model doesn't have an internal representation for. But just because the behaviour of the model partially matches the behaviour of a system that does have an internal representation for such a thing, in other words, a human, that doesn't mean that the model also behaves the way it behaves because it models the world in the same way that the human does.
This doesn't matter. Just because the model's internal representation of a concept doesn't map obviously to the way you understand it doesn't mean that the model doesn't have a representation of that concept. Word2vec models do represent concepts. We can interpolate along conceptual axes in word2vec spaces. That's as close to an internal representation of an isolated concept as you're gonna get. Like, I can ask a word2vec model how "male" or "female" a particular term is, and get a (meaningful!) answer. We never explicitly told the word2vec model to monitor gender, but it can still provide answers because that information is encoded.
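The "how male or female is this term" query is just a projection onto an axis between two anchor vectors. A toy sketch with invented 3-d vectors (a real word2vec space has hundreds of dimensions, but the arithmetic is the same):

```python
import numpy as np

# Hand-made toy vectors, NOT real word2vec output -- for illustration only.
vecs = {
    "man":   np.array([ 1.0, 0.2, 0.1]),
    "woman": np.array([-1.0, 0.2, 0.1]),
    "king":  np.array([ 0.9, 0.8, 0.3]),
    "queen": np.array([-0.9, 0.8, 0.3]),
}

# The "gender axis" points from the "male" end toward the "female" end.
gender_axis = vecs["woman"] - vecs["man"]
gender_axis = gender_axis / np.linalg.norm(gender_axis)

def genderedness(word):
    """Signed projection onto the gender axis: positive = 'female' end."""
    return float(vecs[word] @ gender_axis)

print(genderedness("queen"))  # positive ("female" end)
print(genderedness("king"))   # negative ("male" end)
```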
>Without such an internal representation all you have is some observed behaviour and some vague claims about understanding this or learning that, at which point you can claim anything you like.
Again, who cares? If it passes a relevant "turing test", what does your quibble about the internal representation not being meaningful enough to you matter? Clearly there's an internal representation that's powerful enough to be useful. Just because you can't understand it at first glance doesn't make it not real.
To address another one of your comments:
> Hi. From your write-up and a quick look at your notebook that's what your model is doing. And you measure its accuracy as its ability to do so. Is that incorrect?
Neither I nor the person you responded to is the author. But yes, this understanding is incorrect. It is indeed trained on historic picks, but this is not the same thing as reproducing a deck that it has seen before. To illustrate, imagine that the training set of ~2000 datapoints had 1999 identical situations, and 1 unique one.
The unique one is "given options A and history A', pick card a". The other 1999 identical ones are "given options A and history B', pick b" (yes this is as intended). A model trained to exactly reproduce a deck it had seen previously would pick "a". The model in question would (likely, depending on the exact tunings and choices) pick "b".
This bias towards the mean is intentional, and is completely different than "trying to recreate an exact deck it's seen before", which isn't a thing you normally do outside of autoencoders and as others have mentioned, doesn't make much sense.
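The illustration above can be sketched as the crudest possible frequency model (all names and data are hypothetical, following the 2000-datapoint example):

```python
from collections import Counter

# Hypothetical training set: 1999 identical situations and 1 unique one,
# as in the example above. Each entry is (options, history, human pick).
training = [("A", "B'", "b")] * 1999 + [("A", "A'", "a")]

def predict(options, history, data):
    """Pick the most common choice ever seen for these options, ignoring
    history entirely -- a caricature of 'bias toward the mean'."""
    picks = Counter(choice for opts, hist, choice in data if opts == options)
    return picks.most_common(1)[0][0]

# Even for the unique situation (options A, history A'), a model that
# generalises over the training distribution answers "b", not the
# memorised "a" -- it is not reproducing any single remembered deck.
print(predict("A", "A'", training))  # "b"
```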
Why do you need to admonish me to be specific?
word2vec can only represent meaning by mapping words to other words. I have
a human understanding of language that goes well beyond that. For example, I
don't need to limit myself to synonyms of king- I can use circumlocution: "a
king is the hereditary monarch leading a monarchist nation". word2vec can tell
you which of those words are close to king, in its model, but it can't put
together this simple sentence that describes their relation.
Not to mention I can generate and recognise who knows how many more representations
of the concept "king" than word2vec can. I can draw you a cartoon of a king,
or rather, an unlimited number of them, each different than the other. I can
sing you a song about kings. I can write you a poem. I can dance you an
interpretive dance about kings.
I don't know if you really think that word2vec is really as good as a human at representing meaning, but, just in case: it's not even close.
>> Again, who cares? If it passes a relevant "turing test", what does your quibble about the internal representation not being meaningful enough to you matter? Clearly there's an internal representation that's powerful enough to be useful. Just because you can't understand it at first glance doesn't make it not real.
What is that internal representation?
And I never claimed any such thing.
>Why do you need to admonish me to be specific?
Because I'm confident that for any particular definition of "understanding", the difference won't be relevant. Case in point, the one you provided. You're now claiming that a word2vec model doesn't have some "understanding" based on it being unable to demonstrate a specific skill (circumlocution/definition). All of your other objections follow the same general format. Because the word2vec model can't perform a skill that you can, its "intuitive" understanding of a concept must be lesser.
Following such an argument to its logical conclusion, you'd have to agree that you have a better intuitive understanding of language than a paralyzed person, because you can dance the word while they cannot. I doubt you actually hold such a belief.
So if the demonstration of an arbitrary skill isn't the marker of understanding, since that would be unfair to our quadriplegic linguist friends, perhaps performance on specifically relevant skills is how we should measure whether or not some model has the "understanding" you want. To be less abstract, given some embedding that we think has some "understanding" of some concept, we need to get the I/O right. If the same embedding can be placed in models that are wired up to interface with the world differently, but still perform well, perhaps the "understanding" is more than surface level.
Word2vec models clearly "understand" synonyms and antonyms and similar word relations. Word2vec/word-embedding-based models are also, I believe, still SoTA in automatic summarization and language translation tasks, although the machinery is fairly distinct from the original paper.
So what we have is representation that can
1. Show you which words are similar to which other words
2. Use that knowledge to summarize text
3. Use that knowledge to translate text to a different language
4. Be poked at by humans where we can find semantically meaningful clusters and patterns via tools like t-SNE.
>What is that internal representation?
For word2vec, for example, it's that the vector space the words are embedded in clusters similar words together. For this model, it's that the vector space clusters similarly colored cards.
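To make that concrete: in a model like the one in the article, the learned embedding matrix is the data structure in question; each card index maps to a row, and "clusters similar colored cards" means nearby rows share a color. The matrix below is invented (2-d, 6 cards) purely to show what that looks like:

```python
import numpy as np

# Hypothetical learned embedding: rows are cards, columns are learned
# features. The "blue"/"red" labels exist only in the comments.
embedding = np.array([
    [0.9, 0.1],   # card 0 ("blue" cluster)
    [1.0, 0.0],   # card 1
    [0.8, 0.2],   # card 2
    [0.1, 0.9],   # card 3 ("red" cluster)
    [0.0, 1.0],   # card 4
    [0.2, 0.8],   # card 5
])

def nearest(card):
    """Index of the closest other card in embedding space."""
    dists = np.linalg.norm(embedding - embedding[card], axis=1)
    dists[card] = np.inf  # exclude the card itself
    return int(np.argmin(dists))

# Each card's nearest neighbour stays inside its own colour cluster.
print(nearest(0) in {1, 2})  # True
print(nearest(3) in {4, 5})  # True
```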
For complex neural models, who knows. On the one hand, it would probably be very useful if we could glean useful structure from the internal representation, and indeed people are working on that. But on the other hand, they're demonstrably useful even if we don't have a perfect understanding of the structure. And given that we don't understand how and why we humans understand concepts, that's fine for now.
Of course, all of this assumes that "understanding" is even the right word to use. There's a good argument to be made that a neural network can and never will "understand" anything, because that's only something that self-aware entities can do. But again, that's mostly a semantic distinction. If we're discussing the efficacy of word-embedding models and whether or not the representation of concepts in those embeddings is real or just...happenstance, I'm not really sure what you're going for there, the entire question of things like self-awareness is irrelevant.
(I apologize for over-anthropomorphizing an ML model here, but it's the best way of putting this I can think of.)
>> For this model, it's that the vector space clusters similarly colored cards.
I mean- what is the representation you speak of in the previous comment. What data structure holds the model's understanding of M:tG colour? The source code is available online.
The network's hidden layers. I can elaborate, but looking at your profile, you've implemented an LSTM before, so I shouldn't need to delve deeply into how that works. I'm honestly not sure where your confusion about, or aversion to, the idea that such models can learn to encode semantically meaningful concepts comes from.
The concept of "color" is never explicitly encoded anywhere by a human. It infers clusterings based on the correlations between which cards are chosen. Unsurprisingly, given reasonable training data, those clusters form along useful boundaries in the game world, one of which is color. If you similarly passed a pack containing every card into the model, you'd likely get out what the model's opinion on the best limited card is. No one ever told it that, but based on the training data, the model "figures it out".
My comments on understanding can be summarized as such: either
1. You're of the opinion that nothing that isn't "strong AI" can have understanding, because understanding is some concept unique to conscious entities (or some reasonably similar opinion). This is an almost completely semantic argument, and isn't particularly interesting. It's an argument about definitions that avoids any actual useful academic questions.
2. You think that non-conscious entities can "understand" concepts, but deny that implicit understanding based on learned clusterings is "understanding". This is marginally more interesting, but wrong: if an implicit understanding can pass a "turing test", by which I mean that the statistical/learned model can perform as well as whatever you're comparing against (whether a human or an expert system) at some task, then the two things have the same understanding when confined to that domain.
In other words, sure, saying a model doesn't "understand language" might be reasonable, because language is multifaceted. But suggesting that a model that outperforms humans on the synonym portion of the LSAT doesn't understand synonyms is silly. Of course it does. Better than humans. Sure, it can't express its understanding of synonyms as music or dance, but that's not because it lacks understanding of synonyms; it's because it lacks other basic faculties that we take for granted.
The question of whether or not you or I can introspect the model to see how its understanding is structured doesn't matter. I can't look inside your head to see how your understanding of language is structured. There's no ArrayList<WordDefinition> I can see in your mind. But I think anyone would agree that you and I both "understand" synonyms despite that lack of transparency. Why would you expect anything different from a statistical model?
Please don't do this. Too many assumptions about what and how I think leave a bad taste.
Yes, a model that can identify synonyms accurately lacks human faculties,
including understanding. That's what modern machine learning boils down to.
There are many tasks we thought would require human intelligence or reasoning,
that can, after all, be reduced to dumb classification. In other words, there
is no need to claim "understanding" to explain the output of a classification
model, just because a human can perform the same task _and_ can understand it.
As to the representation- that is the only thing that matters. If you want to
claim a model represents a concept, you have to be able to show where in the
model's structure that concept is represented. If there is a representation-
where is it?
> If there is a representation- where is it?
Where is your understanding of language?
But, I'm a human being. Why do I need to show you my representation to
convince you I possess human understanding of language?
Conversely, to claim that a statistical model possesses understanding is a
very strong claim that requires equally strong evidence. And since we can
inspect a statistical model's representation- that is where the evidence
should be sought.
According to your own metric, yes! You, not I, are the one who claimed
> If you want to claim a model represents a concept, you have to be able to show where in the model's structure that concept is represented.
So by your own metric, I must question whether or not you understand language, since you have yet to point out to me where you represent that understanding.
(To be clear, I think you absolutely do understand language, but I think the whole idea that the structure matters is absurd. It clearly doesn't to you or I, so why hold a statistical model to a higher standard of "understanding" than you hold yourself to?)
> But, I'm a human being. Why do I need to show you my representation to convince you I possess human understanding of language?
> Conversely, to claim that a statistical model possesses understanding is a very strong claim that requires equally strong evidence. And since we can inspect a statistical model's representation- that is where the evidence should be sought.
Why not? What makes you special? You're giving humans a double standard here. It's not like we have a particularly clear understanding of the human model of cognition.
And since we're venturing further into the philosophical, I feel safe making the assertion that no, I, a priori, have no reason to believe that you have more understanding of language than a model. I'm only interacting with you online. For all I know, you are a computer program. Why should I trust you and hold you to a different standard than I would any other model?
There is no double standard and the matter is not philosophical. We know humans to be able to understand language- we do not need to prove it in any way, including by examining our representation of meaning in language.
We don't know that statistical language models have an understanding of language, it's a very strong assumption to make that they do, and it must be justified with equally strong evidence. Their internal structure is available for inspection, therefore if they are capable of understanding language this understanding should have a concrete representation that we can identify. If we can't, then they don't have anything like "understanding".
>> Why should I trust you and hold you to a different standard than I would any other model?
Assuming that I'm not a model is a reasonable assumption with a very high probability of being true and the simplest explanation for my participation in this thread.
There is also more nuance in draft. For example, if you see that a particular color combination is closed (i.e. other people are drafting it), you might grab a particularly powerful card in those colors just to prevent them from obtaining it.
Thinking about your jellybean-guessing (wisdom-of-the-crowds) analogy, I don't want an unweighted average of a Ben Stark and a random player. Years ago MTGO Elo ratings were public - seems like that could be useful. With something like the Deckmaster Twitch extension you could (theoretically) grab drafts from stream replays of great players. Especially with the MPL players required to stream, there must be some great data on video.
At the end of the day, these are games sufficiently complex that no single strategy will be optimal. Dominion has something on the order of 10^15 possible game configurations. Simulations are just a tool available to prepare and test potential strategies. I suspect MtG would land in a similar space if an "optimal deck builder" surfaced. You still have to play the game and beat your opponent.
Constructed players already (for a very long time) use the collective data of other players — this has lots of names, it is commonly called “net-decking”. It turns out, that the best players and the best deck builders have a lot of overlap, but not completely so, and it’s common for multiple high-ranking players to play the same decks. A key skill to winning tournaments is to figure out what decks are going to be played by the other top players, and how to give yourself the best chance against them.
In other words, MtG is a game where deck-construction and deck-playing are both important and distinct skills, and high-level tournaments generally assume a high level of both, so a helper bot for the former is unlikely to give an advantage (because it’s compared to “the best efforts of a lot of people over a lot of time”).
That is no way to evaluate the quality of an M:tG deck. For instance, it can never tell us anything about "sleeper" decks, or about the value to an existing deck of new cards that are added to a format as sets rotate and so on.
All that the accuracy metric used by the author can do is tell us how good the model is at representing the past. I am of the firm belief that WotC will be laughing into their teacups at the thought of banning something as pointless as this. In M:tG the past is about as valuable as a hat made of ice in the tropics.
Edit: For some added context. The way M:tG metagames work is that at the start of a season, there are some "decks to beat" that are usually the most obvious ones in the format. As the format progresses, players often find strategies to beat the decks to beat, initially known as "rogue" decks or sleepers. These can't be predicted by representing the current decks to beat. Some level of understanding of the game, and of what's important in a format in terms of strategy etc., is required.
A famous example: "The Solution" by Zvi Mowshowitz, which dominated Pro Tour Tokyo 2001 (Invasion Block Constructed). Mowshowitz noticed that the dominant aggro decks' clocks (sources of damage) were predominantly red, so he stuffed a deck with anti-red cards, shutting down the dominant aggro decks.
That requires way, way more than modelling the current metagame at any point in time.
There is a vast amount of interesting space (both as AI research and as a fun toy) between "fully groks the current and predictable near-future meta on a professional level" and "drafts playable decks". As I read it, the OP is saying "Hey, I built something surprisingly simple that can draft playable decks. When I compare it to an available corpus of draft data from random human players, I think it does as well or better." On the other hand, I don't see anything like "I built AlphaGo for Magic drafts."
This seems to me to be similar to poker, which is considered a game of luck. But I am not familiar enough.
I also found this
Yes, routinely. https://magic.wizards.com/en/articles/mythic-invitational although the prize money amounts can vary wildly, most are nowhere near that high.
> This seems to me to be similar to poker, and considered a game of luck
It's like half-luck, half-skill. Deck construction is largely skill-based (both directly and metatextually), but decks are randomized during shuffling, which is obviously luck-based.
But of course the whole concept of MTG is that of a collectible card game; buying and owning cards is a core part of it.
It's like saying, "why do movie theaters pay so much for films? Why don't they just download the film from a BitTorrent like everyone else?"
I mean "copyright" as in the act of paying for the game you are playing, rather than reproducing it yourself.
For another example, what stops you from playing Monopoly by just writing "Boardwalk" and "Park Place" on a piece of paper, and drawing up your own fake money and cards? Nothing really, but generally it's considered a good idea to pay for games people make, as a way to reward them for making the game.