
Kelly Criterion (2007) - kwikiel
http://r6.ca/blog/20070816T193609Z.html
======
freddie_mercury
The Kelly Criterion was the subject of an incomprehensibly bitter argument in
the 1970s/1980s. Paul Samuelson, considered by many to be the greatest
economist of the 20th century, believed the Kelly Criterion was wrong. And not
just wrong but SO WRONG that anyone who believed it was an idiot.

The kind of idiot who could only understand single-syllable words. So he wrote
a paper in the Journal of Banking and Finance, in words of only one syllable,
saying why no one should use the Kelly Criterion.

[http://www-stat.wharton.upenn.edu/~steele/Courses/434/434Con...](http://www-
stat.wharton.upenn.edu/~steele/Courses/434/434Context/Kelly%20Resources/Samuelson1979.pdf)

~~~
joker3
Samuelson's argument basically comes down to the fact that the investments
that maximize growth don't maximize your utility unless your utility is
log(wealth), and to him it was completely obvious that people should act to
maximize their own utilities.

However, there have been a couple recent developments that somewhat undercut
his line of thinking. First of all, to the extent that people's behavior can
be described by utility theory, we all roughly have isoelastic utilities
([https://en.wikipedia.org/wiki/Isoelastic_utility](https://en.wikipedia.org/wiki/Isoelastic_utility))
with a parameter not too much greater than one. As a result, the ideal
investments for most of us are near Kelly.
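
As a rough numerical illustration (using the simple 50/50 bet from the article, which returns 1 + 1.1f of wealth on a win and 1 - f on a loss; the numbers are illustrative, not a model of real markets), the optimal fraction under isoelastic (CRRA) utility with parameter gamma sits close to Kelly for gamma near one:

```python
import math

def optimal_fraction(gamma, p=0.5, win=1.1, lose=1.0, grid=20000):
    """Grid-search the betting fraction f that maximizes expected CRRA
    utility for a 50/50 bet: wealth becomes 1 + win*f on a win and
    1 - lose*f on a loss. gamma = 1 is log utility, i.e. Kelly."""
    def u(x):
        return math.log(x) if gamma == 1 else (x ** (1 - gamma) - 1) / (1 - gamma)
    best_f, best_u = 0.0, u(1.0)
    for i in range(1, grid):
        f = i / grid
        if 1 - lose * f <= 0:
            break
        eu = p * u(1 + win * f) + (1 - p) * u(1 - lose * f)
        if eu > best_u:
            best_f, best_u = f, eu
    return best_f

print(optimal_fraction(1.0))   # Kelly: 1/22, about 0.045
print(optimal_fraction(1.5))   # about 0.030: still close to Kelly
print(optimal_fraction(2.0))   # about 0.023, i.e. roughly half Kelly
```

With gamma a bit above one, as the isoelastic-utility evidence suggests for most people, the optimal fraction stays in the same neighborhood as Kelly, which is the point above.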

Second, there's a body of theory that asks what the optimal strategy for
investment is given a certain number of years, and it turns out that the ideal
strategy is to follow the Kelly approach until near the deadline and then to
switch to the investments that maximize your own utility as the time to act
grows short. This work is known as turnpike theory for reasons that never
really made sense to me.

There are still concerns about Kelly investments, but Samuelson's argument
isn't really one of them.

~~~
evrydayhustling
> investments that maximize growth don't maximize your utility unless your
> utility is log(wealth)

This needs a slight adjustment to be _exactly_ true:

> investments that maximize growth don't maximize your utility _at a specific
> time horizon_ unless your utility is log(wealth)

The reason is that in situations of repeated investment, Kelly's policy
actually does maximize many time-indefinite notions consistent with linear
utility -- for example, if you have a fixed wealth goal (retirement) that is
enough successful wagers away, Kelly will minimize the expected time to get
there [1].

For me, the important limitation for Kelly is that it is designed around
timescales that involve many many repeated bets. Like most asymptotic results,
Kelly can win marathons and lose sprints -- so it's important to consider
which situation you're in.

[1]
[https://www.stat.berkeley.edu/~aldous/157/Papers/Good_Bad_Ke...](https://www.stat.berkeley.edu/~aldous/157/Papers/Good_Bad_Kelly.pdf)

------
soVeryTired
The main flaw of the Kelly criterion (along with a number of other results in
investment theory, like Markowitz allocation) is that in practice it's
extremely difficult to know the distribution of the outcome you're betting on.

The mean and variance of a prospective investment are not observable. But more
to the point, if you try to use some sort of proxy like a sample mean or
standard deviation, you'll get inconsistent results over time. We're a long
way from the clean, simple, i.i.d world that theorists like to play in.
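
As a back-of-the-envelope illustration of how badly sample estimates behave (a hypothetical even-odds bet with a true 52% win probability, estimated from 100 past outcomes):

```python
import math

# Hypothetical even-odds bet with a true win probability of 52%.
# The Kelly fraction is f* = 2p - 1; we estimate p from n past outcomes.
p, n = 0.52, 100

f_true = 2 * p - 1                        # ~0.04: bet 4% of bankroll
se_f = 2 * math.sqrt(p * (1 - p) / n)     # std error of the estimated fraction

print(f_true)  # ~0.04
print(se_f)    # ~0.10: sampling noise is 2.5x the size of the entire edge
```

The standard error of the estimated fraction dwarfs the true edge, so a plug-in Kelly estimate will swing wildly from sample to sample, and will often have the wrong sign.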

~~~
miketery
In such cases what are models that are more effective?

~~~
iopq
You could probably build a model that uses the sampled variance less
aggressively.

~~~
sjg007
Something bayesian

------
evrydayhustling
One thing I find really interesting about the Kelly Criterion is that it
exposes a very stealthy and fundamental "rich get richer" phenomenon.

Most real-life risks have minimum and maximum investment amounts, meaning
that you can't just size the bet exactly as Kelly says. So if your wealth is
low, you cannot rationally participate in many risky but positive-expected-
value investments.

Simply put, the poor can't take many worthwhile risks (think college!) without
risking ruin (and sub-optimal growth). Conversely, the rich can come closer to
maximizing EV in many risky markets at once, increasing income and growth
while even decreasing variance.
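
A small sketch of this effect, with hypothetical numbers (an even-odds bet with a 55% win probability, so the Kelly fraction is 10%):

```python
import math

def log_growth(f, p=0.55):
    """Expected log growth per round of an even-odds bet with win
    probability p, staking a fraction f of current wealth."""
    return p * math.log(1 + f) + (1 - p) * math.log(1 - f)

# Kelly fraction for this bet: f* = 2p - 1 = 0.10.
print(log_growth(0.10))   # positive: the growth-optimal stake
print(log_growth(0.20))   # roughly zero: 2x Kelly cancels the edge
print(log_growth(0.40))   # negative: e.g. a $100 minimum stake on a $250
                          # bankroll forces 4x Kelly and long-run decay
```

The same positive-EV bet is growth-positive for a bettor who can size it at 10% of wealth and wealth-destroying for one forced to stake 40%, which is the "rich get richer" effect in miniature.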

------
lordnacho
Ex hedge fund guy here. You can extend this into continuous space and use that
to tell you how much leverage you should have, given some Sharpe ratio.

Results may surprise you (it's a lot, even for a modest Sharpe). But most
practitioners won't use the full number: if you've overestimated your edge,
being above the true Kelly leverage always hurts you more than being below
it.
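
For reference, the continuous-time version of this calculation (assuming normally distributed excess returns and continuous rebalancing; the Sharpe and vol numbers below are illustrative):

```python
def kelly_leverage(sharpe, vol):
    """Continuous-time Kelly leverage: mu / sigma^2, i.e. sharpe / vol.
    Assumes normal excess returns and continuous rebalancing."""
    return sharpe / vol

print(kelly_leverage(0.5, 0.15))   # ~3.3x for a modest Sharpe of 0.5
print(kelly_leverage(1.0, 0.15))   # ~6.7x for Sharpe 1.0
```

Since the cost of overestimating is asymmetric (too much leverage destroys log growth faster than too little forgoes it), running at a fraction of this number is the common practice.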

~~~
joosters
It's common to use some percentage of the Kelly-calculated stakes in betting;
Kelly may give the _optimal_ bet size, but it is extremely aggressive. IIRC,
even if you are calculating your % edge correctly, Kelly staking means you've
got a 50% chance of losing 50% of your existing bank at some point in the
future. Not many investors/gamblers can stomach that!
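
That drawdown number can be checked with a quick seeded simulation (a hypothetical even-odds bet with p = 0.55 and a full-Kelly stake of 10%; illustrative numbers only):

```python
import math
import random

def ever_halves(f=0.1, p=0.55, steps=2000, rng=random):
    """One path of repeated even-odds bets at full Kelly (f = 2p - 1).
    Returns True if wealth ever drops below half its starting value."""
    logw, barrier = 0.0, math.log(0.5)
    up, down = math.log(1 + f), math.log(1 - f)
    for _ in range(steps):
        logw += up if rng.random() < p else down
        if logw < barrier:
            return True
    return False

random.seed(1)
trials = 1000
frac = sum(ever_halves() for _ in range(trials)) / trials
print(frac)  # close to 0.5, in line with the drawdown claim above
```

More generally, under full Kelly the chance of ever dipping to a fraction x of your current bankroll is roughly x itself, so deep drawdowns are a feature, not a bug.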

~~~
iopq
I mean, it's not acceptable to use it if there are no stakes below a certain
point and no way to borrow more money, or if the stakes below a certain point
have worse returns (a higher house take).

------
haliax
The Kelly Criterion is the subject of an absolutely incredible book by William
Poundstone called "Fortune's Formula".

In the course of discussing the formula, the book takes you through the birth
of the MIT blackjack team, the genesis of statistical arbitrage, and mini
biographies of people like Claude Shannon and Ed Thorp. I can't recommend it
highly enough.

~~~
r00fus
Don't forget the organized crime connection. It was an absolutely fascinating
read.

My dad was a daytrader (read: armchair gambler) but this helped him curb his
trading - he wasn't ready to do all the statistical analyses to keep
rigorously investing.

------
aidenn0
The most interesting thing to me about the Kelly criterion is that it
demonstrates that the martingale system[1] is a bad strategy even if the odds
are in your favor!

While it's immediately obvious that the martingale is bad if the odds are in
the house's favor, it's less obvious that you are likely to go bankrupt with
the martingale even if the odds are slightly in your favor (assuming the
house's bankroll is much greater than yours).

1:
[https://en.wikipedia.org/wiki/Martingale_(betting_system)](https://en.wikipedia.org/wiki/Martingale_\(betting_system\))
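
A seeded Monte Carlo sketch of this, under a deliberately simplified model (each martingale "cycle" either nets one base unit or is treated as ruin when the affordable losing streak is exhausted; the bankroll and edge are illustrative):

```python
import math
import random

def martingale_goes_broke(bankroll=100, base=1, p_win=0.51,
                          max_cycles=20000, rng=random):
    """Simplified martingale: each 'cycle' doubles the stake until a win,
    netting +base. If the longest affordable losing streak k (the largest
    k with base * (2**k - 1) <= bankroll) is exhausted -- probability
    (1 - p_win)**k -- we treat the gambler as ruined."""
    for _ in range(max_cycles):
        k = int(math.log2(bankroll / base + 1))
        if rng.random() < (1 - p_win) ** k:
            return True
        bankroll += base
    return False

random.seed(0)
trials = 300
broke = sum(martingale_goes_broke() for _ in range(trials))
print(broke / trials)  # nearly all gamblers go broke despite the 51% edge
```

The intuition: each cycle only wins one base unit, so the bankroll (and the survivable streak) grows far too slowly to outrun the exponentially rare but exponentially expensive losing runs.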

------
avvt4avaw
This massively oversells the usefulness of the Kelly Criterion. The opening
lines are

> One should buy stock when it is undervalued. What I have always wondered
> about is how much stock one should buy. A few months ago I stumbled upon the
> answer which is given by the Kelly criterion.

But the rest of the post analyses a mathematical game which has nothing to do
with buying stocks, and is in fact only useful in theoretical situations where
you know the precise distribution of outcomes.

~~~
phkahler
>> But the rest of the post analyses a mathematical game which has nothing to
do with buying stocks, and is in fact only useful in theoretical situations
where you know the precise distribution of outcomes.

You might want to think about that. In the real world you don't know anything
with precision which means you probably can't do better. One of the
unrealistic aspects of it was the idea that you can play as many times as you
like, but that's not possible in the real world. But we can move toward
playing many times through diversification. That's a good lesson in itself.

~~~
np_tedious
I don't think diversification is much like playing multiple games at all.

One expands on the axis of time, with the game constant. The other expands on
the number of unique games all played simultaneously

------
anonu
Always a good topic to discuss. Many applications to high-frequency trading
due to the probabilistic nature of outcomes.

Here are 2 previous HN discussions:

[https://news.ycombinator.com/item?id=13143821](https://news.ycombinator.com/item?id=13143821)

[https://news.ycombinator.com/item?id=2504222](https://news.ycombinator.com/item?id=2504222)

------
krackers
I've only barely looked into the Kelly Criterion, but can someone explain the
intuition behind maximizing the expected value of the _log_ of your wealth?
Trying the same derivation mentioned in the article but without the logarithm:
the expected value comes out to 0.5×(1 + 1.1×f) + 0.5×(1 − f) = 1 + 0.05f
which would make it seem that betting the entire fraction always maximizes
your expected value. But why does this reasoning break down in the long term,
and why does maximizing the log seem to make it work?

~~~
JoshuaDavid
Betting everything maximizes your mean wealth. Using the Kelly Criterion
maximizes your median wealth.

Let's say there are 1024 (2^10) people with $1 each, and a bet which is 50 /
50 of returning either 3x or 0x (i.e. you bet $1 and either get back $3 or
$0). Let's further specify that there will be 10 rounds of betting.

Using the "bet everything" strategy, you end up with the following
distribution of outcomes:

10 wins, 0 losses -- 1 person: $59,049
1 or more losses -- 1023 people: $0

The average ending balance is $57.67, while the median ending balance is $0.

Using the Kelly Criterion, one will bet 25% of the bankroll each time. You
will end up with the following distribution of outcomes.

10 wins, 0 losses -- 1 person: $57.67
9 wins, 1 loss -- 10 people: $28.83
8 wins, 2 losses -- 45 people: $14.41
7 wins, 3 losses -- 120 people: $7.21
6 wins, 4 losses -- 210 people: $3.60
5 wins, 5 losses -- 252 people: $1.80
4 wins, 6 losses -- 210 people: $0.90
3 wins, 7 losses -- 120 people: $0.45
2 wins, 8 losses -- 45 people: $0.22
1 win, 9 losses -- 10 people: $0.11
0 wins, 10 losses -- 1 person: $0.06

The average ending balance is $3.25, while the median ending balance is $1.80.

Incidentally, the choice of maximizing the median is somewhat arbitrary. You
can also use a similar approach to maximize, say, the 25th percentile outcome,
at the expense of the average and median outcomes (you would bet 10% of the
bankroll each time, yielding a 25th percentile of about $1.10, a median of
about $1.47, and a mean of about $1.63).
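
The distribution above can be checked exactly in a few lines (same game: a 50/50 bet returning 3x the stake or nothing, ten rounds, starting from $1):

```python
from math import comb
from statistics import median

def outcomes(f, rounds=10, win_mult=3.0):
    """Exact wealth distribution over the 2^rounds equally likely histories
    of a 50/50 bet returning win_mult * stake (or nothing), staking a
    fraction f of current wealth each round. Starting wealth is $1."""
    dist = []
    for wins in range(rounds + 1):
        w = (1 - f + win_mult * f) ** wins * (1 - f) ** (rounds - wins)
        dist.extend([w] * comb(rounds, wins))     # binomial multiplicity
    return dist

all_in = outcomes(1.0)   # bet everything each round
kelly = outcomes(0.25)   # Kelly stake for this game

print(sum(all_in) / len(all_in), median(all_in))   # mean ~57.67, median 0
print(sum(kelly) / len(kelly), median(kelly))      # mean ~3.25, median ~1.80
```

Betting everything maximizes the mean by concentrating all the money on the single all-wins history; Kelly gives up most of that mean to pull the median (and the typical outcome) up.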

~~~
OscarCunningham
If you're allowed to bet a different proportion of your money in each turn
then you can push the median even higher than the Kelly criterion. For example
consider the strategy that bets nothing if it has at least $2.98, bets
whatever is needed to reach $2.98 on the next bet if possible, or bets
everything if it can't reach $2.98 on the next bet. This strategy has a
greater than 50% chance of ending up with $2.98 since it has a 50% chance of
success on the first bet and a nonzero probability of success after that. So
its median outcome is $2.98.

I don't know which strategy actually maximises the median, but I suspect it
involves being conservative after you've been lucky and aggressive after
you've been unlucky.

~~~
szemet
I guess you are right. A minor issue is that you have to beat the OP's median
of $1.80 in ten rounds or fewer, because with infinitely many rounds the
median can grow without bound (am I right?). It is not clearly visible in
your solution that it really happens within ten rounds.

But it is easy to extend the strategy to show this: first get half of the
people above $1.80 by betting $0.400000000...1. Now half of the people have
$1.80...1. They are finished.

The other half have $0.5999. They can go all-in twice; then 25% of them will
be above $1.80. That is 3 rounds in total.

edit: the optimum of this exact 3-round strategy is betting $8/11 ≈ $0.73 in
the first round; then in round 3 we will have 62.5% of people at $2.45. edit
2: got the numbers wrong in the previous calculation, fixed

(Now someone should chime in, and present the actual optimal solution. ;)

~~~
OscarCunningham
The reason I chose $2.98 was that if you follow my strategy then that will
lead to you betting $0.99 in the first round. If you win then you end up with
exactly $2.98 and stop betting. If you lose then you end up with $0.01. If you
were to bet all of it for the next 9 rounds then there is a 1/2^9 chance of
you ending up with $0.01×3^9 = $196.83. Since this is more than $2.98 my
strategy must have at least a 1/512 chance of achieving $2.98 even if it loses
the first round. Since the chance of winning the first round is 1/2 the
overall chance of ending up with $2.98 is greater than 50% and hence $2.98 is
the median.

I think I can run a computer search to find the actual optimal strategy. I'll
report back with the results.

~~~
szemet
Ok.

An additional thing: if you find the optimal strategy in the form "in n
rounds the best achievable median is x, betting this and this way", you
should check the strategy as n approaches infinity. I would not be surprised
if it turned out to be the Kelly Criterion after all - and the OP would be
right in this sense (though they presented the fact _somewhat_ badly by
limiting the number of rounds).

~~~
OscarCunningham
Okay so I ran my program and I didn't manage to find the exact optimal median
but I narrowed it down a lot. I'm assuming each bet is a whole number of
cents.

I found that there was a strategy that achieved $6.84 with probability 1/2 +
1/2^10. So the median is at least $6.84 (which is 3.8 times the median
achieved by Kelly criterion!).

I found that there was no strategy that achieved $6.87 with probability 1/2 or
greater, so the median is less than $6.87.

I found that $6.85 and $6.86 could both be achieved with probability 1/2 but
no greater.

So the median is somewhere in the range [$6.84, $6.855].

The optimal starting bets of these strategies show some interesting behaviour.
For example if you want to have a greater than 1/2 chance of achieving $6.84
then your starting bet can be $0.22, $0.26, $0.31, $0.36, $0.40 or $0.50, but
no other value. Make of that what you will.
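
For anyone who wants to reproduce the search, here is a dynamic-programming sketch (assuming, as above, whole-cent bets, ten rounds of the 50/50 triple-or-nothing game starting from $1.00, and capping wealth at the target since you stop betting once you reach it):

```python
def max_prob(target_cents, start=100, rounds=10):
    """Max probability of finishing with >= target_cents after `rounds`
    bets that pay 2:1 (bet b, win +2b or lose b with prob 1/2 each),
    betting whole cents. Wealth is capped at the target since any strategy
    stops betting once it gets there."""
    f = [0.0] * target_cents + [1.0]       # f[w]: success prob, 0 rounds left
    for _ in range(rounds):
        g = [1.0] * (target_cents + 1)
        for w in range(target_cents):
            best = f[w]                    # option: don't bet
            # betting more than reaches the target on a win is dominated
            for b in range(1, min(w, (target_cents - w + 1) // 2) + 1):
                prob = 0.5 * f[min(w + 2 * b, target_cents)] + 0.5 * f[w - b]
                if prob > best:
                    best = prob
            g[w] = best
        f = g
    return f[start]

print(max_prob(684))  # > 1/2: $6.84 is achievable as a median
print(max_prob(687))  # < 1/2: $6.87 is not
```

The recursion works backwards from the final round: with no rounds left you succeed iff you are already at the target, and with k rounds left you choose the bet maximizing the average of the win and loss continuations.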

------
n4r9
It's the expected growth rate, so yes it would vary in a real-world instance.
I wondered for a while why they've taken the logarithm, but I think it's just
because growth models are normally defined exponentially (with the rate being
a parameter inside the exponential) - it shouldn't make a difference to the
result.

As for varying, that Samuelson article says:

> For N as large as one likes, your growth rate can well (and at times must)
> turn out to be less than mine - and turn out so much less that my tastes for
> risk will force me to shun your mode of play.

~~~
btilly
The real reason to take logarithms is that investment strategies repeatedly
multiply our net worth by random factors. Taking the logarithm turns
multiplication into addition, and we know a lot about the statistics of adding
lots of random things together. (Thanks to the Weak Law of Large Numbers, the
Strong Law of Large Numbers and the Central Limit Theorem.)

~~~
arnioxux
I think the log is just the utility function. You could substitute any other
non-linear utility function and it would have given you a different answer
that makes sense for that utility.

The main takeaway is that a linear expected utility doesn't make sense.

It would've told you to bet all your wealth every game, which does result in a
higher linear expected value, where you win (1+1.1)^N with probability 1/2^N
at time N, but 0 otherwise. But no real human would take the bet of extreme
high payoff at extremely rare chances with ruin otherwise.

Also see St. Petersburg paradox for a similar "paradox" resolved with expected
utility theory.

~~~
btilly
I'm sorry, but you are plain wrong. The log has nothing to do with utility.
And there is no chance of really understanding the result if you're confusing
yourself with that bad idea.

To start, EVERY utility function that is both increasing and sublinear will
agree that Kelly is the best strategy. Whether square root, log, or bounded -
it doesn't matter. The details of your utility function are unimportant.

What matters is that each iteration of an investment strategy multiplies your
net worth by a random factor. But log turns multiplication into addition. And
statistics has very strong results about sums of independent variables.

The result is that with 100% odds, a player following Kelly will eventually
wind up ahead of any other static strategy that you could choose. Both wind up
ahead and eventually remain ahead. Which is why a wide variety of utility
functions will conclude that Kelly is the optimal strategy.

~~~
OscarCunningham
> To start, EVERY utility function that is both increasing and sublinear will
> agree that Kelly is the best strategy. Whether square root, log, or bounded
> - it doesn't matter. The details of your utility function are unimportant.

This is simply false. This is easy to check for the sqrt utility case. You can
calculate the optimal proportion for a single bet and note that it's different
than for Kelly, and then you can calculate the utility-function-given-that-
you're-about-to-make-a-bet and check that it's still proportional to sqrt. So
by induction you are always going to bet the same proportion no matter how
many bets you have to make, and this proportion is different from Kelly.

> The result is that with 100% odds, a player following Kelly will eventually
> wind up ahead of any other static strategy that you could choose.

This is true in the sense that the probability tends to 100% as the number of
bets tends to infinity. But this doesn't make Kelly optimal, because in the
event that the Kelly isn't ahead the expected utility of the other strategy
could be much higher than Kelly.

~~~
btilly
For one iteration? Sure, you can get any answer. However attempting to apply
induction to that is wrong because as the number of iterations increases, the
range of likely rates of return for each strategy converges, and Kelly is the
one that converges to the highest rate.

As for the 100% odds answer, what I said was true is true in the sense that it
is actually true. No ands, ifs, or buts. With 100% odds, Kelly eventually wins
over any other strategy. Period.

The question of whether this makes Kelly optimal is not the question that the
theorem was trying to answer. And therefore is irrelevant. Now in fact this
does make Kelly optimal for a wide range of utility functions. But far from
all possible ones.

The point being that it is important to separate a mathematical point from our
interpretation of what that point implies. When you confuse the two then you
get yourself into an unnecessary muddle. Kelly is a statement about the
probability of one strategy beating another. It isn't a statement about how
you should bet.

~~~
arnioxux
Using ln as our utility, at time N we get:

    
    
      E[ln(X_1*...*X_N)] = sum E[ln(X)]
    

So kelly happens to maximize this expected utility by construction since it
was derived by maximizing expected log of one round of betting (E[ln(X)]).

But if we use square root as our utility instead:

    
    
      E[sqrt(X_1*...*X_N)] = prod E[sqrt(X)]
    

We would maximize this expected utility by maximizing E[sqrt(X)]. Going
through the same calculus we can see that we don't arrive at kelly.

Where did I go wrong?
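
A quick numeric check, using the 50/50 bet discussed elsewhere in the thread (wealth becomes 1 + 1.1f on a win and 1 - f on a loss; illustrative numbers), confirms that the one-round sqrt-optimal fraction really does differ from Kelly:

```python
# One round of the 50/50 bet: wealth becomes 1 + 1.1f (win) or 1 - f (loss).
# Log utility:  maximize 0.5*ln(1 + 1.1f) + 0.5*ln(1 - f)
#   FOC: 0.55/(1 + 1.1f) = 0.5/(1 - f)          =>  f = 1/22
# Sqrt utility: maximize 0.5*sqrt(1 + 1.1f) + 0.5*sqrt(1 - f)
#   FOC: 0.55/sqrt(1 + 1.1f) = 0.5/sqrt(1 - f); squaring both sides:
#   0.3025*(1 - f) = 0.25*(1 + 1.1f)            =>  f = 1/11
f_log = 0.1 / 2.2
f_sqrt = (0.55 ** 2 - 0.5 ** 2) / (0.5 ** 2 * 1.1 + 0.55 ** 2)

print(f_log, f_sqrt)  # ~0.045 vs ~0.091: sqrt bets twice the Kelly fraction
```

Since sqrt is less concave than log, the sqrt-utility bettor stakes more than Kelly each round, and by the induction argument upthread keeps doing so for any number of rounds.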

------
bedhead
There was a hedge fund manager named Mark Sellers who blew up his fund in 2008
by following the Kelly Formula exactly, which had told him to put 90% of his
fund in a single stock, a small offshore oil/gas driller. It was
real money too, over $200 million.

~~~
ignoranceprior
Nominative determinism? Although perhaps Mark Buyers would have been even more
appropriate.

------
pyrex41
The reasoning behind the Kelly Criterion was explored recently in a more broad
context, showing that the logarithmic utility is not required:
[https://aip.scitation.org/doi/10.1063/1.4940236](https://aip.scitation.org/doi/10.1063/1.4940236)

Taleb has a good discussion here: [https://medium.com/incerto/the-logic-of-
risk-taking-107bf410...](https://medium.com/incerto/the-logic-of-risk-
taking-107bf41029d3)

------
pyrex41
Most of the examples of Kelly criterion application are either concrete bets
with discrete payoff/loss odds and values, or assumed to be normally
distributed. This paper discusses how extremely skewed outcomes (eg, stock
options) should affect the Kelly calculation:
[https://papers.ssrn.com/sol3/papers.cfm?abstract_id=2956161](https://papers.ssrn.com/sol3/papers.cfm?abstract_id=2956161)

------
OscarCunningham
I think the Kelly criterion doesn't apply as widely as people think it does.
Its derivation is based on maximising the growth rate of your fortune. But
this is equivalent to assuming that money has logarithmic utility for you. If
you don't value money in a logarithmic way then you shouldn't use the Kelly
criterion.

Personally I feel that my utility function is sublogarithmic. If I'm just
spending on myself then beyond a certain point additional money makes me
absolutely no happier. Note that the usual justification of progressive
taxation also assumes sublogarithmic utility. So based on this we should be
more conservative than Kelly.

On the other hand, if I plan to give money to charity then my utility function
is almost linear. Big charities can absorb a lot of money without becoming less
effective. So in this case you should be maximally aggressive, betting
everything at every opportunity.

Sometimes people say that because the Kelly criterion maximises growth rate it
will be the best "in the long run" even if your utility function isn't
logarithmic. But I've never seen any evidence of this. Does anybody know of a
toy model where you can prove the Kelly criterion is optimal even if your
utility is linear?

~~~
imh
>Sometimes people say that because the Kelly criterion maximises growth rate
it will be the best "in the long run" even if your utility function isn't
logarithmic. But I've never seen any evidence of this. Does anybody know of a
toy model where you can prove the Kelly criterion is optimal even if your
utility is linear?

Here's one
[https://greek0.net/blog/2018/04/18/kelly_criterion3/](https://greek0.net/blog/2018/04/18/kelly_criterion3/)

Basically it says that if you are making bets where money_(i+1) = f_i(money_i,
x_i), such that your money always remains above zero, then you can apply the
product form of the law of large numbers
[http://www.jams.or.jp/scm/contents/e-2006-6/2006-60.pdf](http://www.jams.or.jp/scm/contents/e-2006-6/2006-60.pdf)

That means that over a long enough period any betting strategy that maximizes
the geometric mean of the rates will beat any other bet with probability
approaching 1. p(money(optimal strategy) > money(other strategy)) --> 1.

If your utility is monotonic (x > y implies that u(x) > u(y)) then I think
this also implies that p(utility(money(optimal strategy)) >
utility(money(other strategy))) --> 1.

Basically, you are eventually almost sure to have more money with this
strategy than any other. If more money implies more utility, then you are
eventually almost sure to have more utility with this strategy than any other.

~~~
OscarCunningham
> this also implies that p(utility(money(optimal strategy)) >
> utility(money(other strategy))) --> 1

That's true (at least as long as the other strategy also bets a constant
proportion of wealth each turn). But it doesn't mean that Kelly is optimal. It
could be that in the (increasingly unlikely) cases when the other strategy
beats Kelly, the utility produced by the other strategy is _much_ greater than
that produced by Kelly. Then the other strategy could still be better overall.

In other words, even though we have

    
    
        p(utility(money(Kelly strategy)) > utility(money(other strategy))) --> 1
    

we also have

    
    
        Expectation[utility(money(Kelly strategy)) - utility(money(other strategy))] < 0
    

Here's a simplified example of what's going on: consider the bet where you get
a penny with probability 1-1/n, and otherwise you lose $(2^n). I think this
bet gets worse and worse as n gets larger, but the probability of having
higher utility if you take the bet tends to 1.

~~~
imh
>That's true (at least as long as the other strategy also bets a constant
proportion of wealth each turn).

Does it require that assumption? I don't think it even requires identically
distributed returns on each component bet. It just needs money_(i+1) =
f(money_i, x_i) to have strictly positive support right? Then you can just
push it into log space and apply the law of large numbers, telling you to
maximize the expected log of f(money_i, x_i) wrt x_i. Nothing about fractions
or constant fractions shows up in that derivation.

The usual statement of the Kelly criterion is about fractions, but the more
general question is whether maximizing expected log is (eventually) optimal,
which seems to only require being able to apply the law of large numbers in
log space.

~~~
OscarCunningham
Imagine we started with $100 and were betting on a fair coin with fair odds.
There's no edge so Kelly says to bet $0, and hence the Kelly strategy stays at
$100 forever. You can give yourself a very high chance of beating the Kelly
strategy if you use a Martingale strategy. Bet $0.01. If you win then you have
$100.01 and you should not bet again. If you lose then bet $0.02 next time,
and then $0.04 and then $0.08 doubling each time until you win. When you win
you will go up to $100.01, which beats the Kelly strategy. So Martingale beats
Kelly unless you go bankrupt by losing too many times in a row before getting
a win, which only happens with a very small probability. So there are
strategies that have a high probability of beating the Kelly criterion.

For simplicity I gave the above example in the degenerate case where the edge
is zero. But I think the analogous strategy works in all cases. If you aim for
just a penny over the Kelly strategy then you have a high probability of
success.
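
The success probability of this penny-martingale can be computed exactly (working in cents, with the $100 bankroll above):

```python
def martingale_success(bankroll_cents=10000, base=1):
    """Doubling martingale on a fair coin: bet `base` cents, double after
    each loss, stop after the first win (net +base). Returns the longest
    affordable losing streak and the probability of netting the win."""
    spent, bet, streak = 0, base, 0
    while spent + bet <= bankroll_cents:
        spent += bet
        bet *= 2
        streak += 1
    return streak, 1 - 0.5 ** streak

streak, p = martingale_success()
print(streak, p)  # 13 affordable doublings -> success prob 8191/8192
```

So with a $100 bankroll the martingale beats the stand-pat Kelly strategy about 99.99% of the time, at the price of total ruin in the remaining 1/8192 of cases, which is exactly the mean-versus-median tension running through this thread.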

~~~
imh
Fair point. And while it's too late to edit my post, I think I found the flaw
in the math.

Just because X converges in probability to x doesn't mean that E[f(X)]
converges to f(x). If it did, then maximizing expected log would be sound for
every increasing utility, but it isn't.

------
dafty4
"If all the economists in the world were placed end to end they would not
reach a conclusion" -Isaac Marcoson (attrib. 1933 by O.O. McIntyre)

[http://www.systemicrisk.ac.uk/sites/default/files/downloads/...](http://www.systemicrisk.ac.uk/sites/default/files/downloads/publications/dp-52.pdf)

------
praptak
The correct number of individually picked stocks you should buy is most
probably zero: http://edmarkovich.blogspot.com/2013/12/why-i-dont-trade-stocks-and-probably.html?m=1

~~~
iopq
If you plan on tax loss harvesting through direct indexing, you can buy 500
stocks directly and sell one stock to buy a highly correlated one to decrease
your tax payments in taxable accounts.

------
bigpicture
Wow, this is the 2nd time in two weeks that something has appeared on my
probability homework and then made the front page of HN almost immediately.

Coincidence?

Who else is taking Stat 110?

------
rlander
A similar (but more useful) position sizing strategy is the Optimal F formula,
described by Ralph Vince in his book Portfolio Management Formulas. But its
real value is in showing you your 'cliff of death' curve: how close you can
get to bankruptcy given your position sizing.

In my opinion, position sizing is way more important (and less understood)
than market timing.

