The Kelly Criterion— Maximizing a Gambler's or Investor's Most-Likely Final Amount of Wealth

A Case for Kelly

There is, for example, such a thing as a “listed stock option”, of which there are two types. For present purposes we could say that a stock option is a bet that the price of a particular stock will be either above or below a stated price, the “strike price”, by a given expiration date. If the option pays off when the stock finishes above the strike price then it's a “call” option; “put” options are bearish bets in the other direction. If at expiration the option is a winner, if it finishes “in the money”, it returns 100 times the absolute value of the difference between the strike price and the price of the stock on that day (customarily each option contract pertains to 100 shares of stock, hence the multiplier of 100). The options marketplace sets the price of each option, the amount bet, which is called the “premium”.

Now there are some options that don't expire for, say, two years. And “warrants” are very similar to call options and at issuance they may be set to expire more than a decade hence. But the most liquid options contracts are those that expire within about three months, and most of the action in the options marketplace is with options whose strike prices are not very different from the current market price. That means that the odds of winning at least something or of getting nothing back for the premium on any given bet are usually something like 50-50, most often not outside of, say, 70-30 either way. Given the short times until expiration and those odds, an options “investor” could hypothetically make many, many such bets in a career, each of them posing a very substantial risk of getting back nothing for the premium.

We immediately see the problem. If some night at the casino you want to guarantee that you'll be retiring early you can just put everything on black at the roulette table and let it ride. You'll have lost all in at most a few spins of the wheel. The stock option investor can't very well “let it ride”, put up everything on the option contract each time and hope to survive. So how much should the investor be willing to pay out as premium each time? What fraction of his capital? Well, the famous “Kelly criterion” determines a formula for the optimum size of each bet in a given set of gaming circumstances with the goal being to maximize the growth rate of the accumulated wealth over the long run. The purpose of this article isn't to develop some scheme for options trading; it is to explain the Kelly criterion.

And to figure out what the Kelly criterion is all about is to understand that if the most-likely rate of growth of an investor's equity is to be maximized then the fraction of his equity that should be risked at any one time is often, especially in the circumstances of retail investors, considerably less than 1.0 . That is not what, say, mutual fund managers typically do. They are usually obliged to remain as close to 100% invested as possible and, as we shall see, there is some virtue in that.

The concern about what fraction of equity to risk on a risky asset arises when simply holding stocks and even with the now-popular approach of investing in Exchange Traded Funds (ETFs). True, it's a much milder concern than is the case with options, and the concern is particularly mild with the ETFs because with such securities there is generally no chance of ever finishing absolutely out of the money, of suffering anything like a total loss of the amount put up. But the fact remains that the compounding of returns on investments that are not risk-free does not proceed in quite the same way as the compounding of risk-free investments and that untoward outcomes might be ameliorated by paying some attention to the mathematics of compounding.

In part due to its pure emphasis and applicability only to outcomes after many, many trials, which generally means to long-term outcomes, the Kelly criterion is hardly in use by investment advisors and portfolio managers who allocate money to stocks, bonds, ETFs and the like. Not only are they held accountable for their performance annually, not over the long term, and not only do their clients have limited time horizons, but we'll also see that in order to effectively apply the criterion some statistics on the future performance of the securities must be rather well known in advance. And those statistics are never well known (possibly they are if the game is Blackjack, but not if it's stock market investing). Most such advisors and managers therefore rightfully disregard Kelly's observation entirely. In all, although professionals who conduct many, many transactions through the years might benefit by paying some heed to the Kelly criterion, the truth is that the criterion will not bring you fortune— not if you are a retail investor. However, what the Kelly mathematics has to say about whether or not everyone should always be fully invested, holding essentially nothing in cash, is of practical importance. Please read on!

Kelly's Criterion

It's not too much of a reach to refer to the purchasing of a stock option as a “bet”. Kelly was not a gambler, and he developed his formula while working on information theory for the improvement of electronic communications, but his published article did in fact demonstrate an application to gambling: parimutuel betting on horse races. So the Kelly criterion has also been applied to gambling and the greatest need for knowing about it is with regard to all such risky endeavors.

The Kelly criterion requires the computation of an “expectation value”. Where a quantity has a set of possible values the expectation value of that quantity is the arithmetic average of those possible outcomes, weighted in proportion to the theoretically-known likelihood of occurrence of each. In general, some particular value might be vastly more likely to occur than any others yet not be so much as close to, let alone equal to, the expectation value.
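As a small illustration of that last point, the sketch below computes an expectation value for a made-up bet and compares it with the single most likely outcome. The outcomes and probabilities are assumptions chosen only to keep the arithmetic simple; they are not taken from any wheel in this article.

```python
# Hypothetical bet: outcomes and probabilities are illustrative assumptions.
outcomes = [0.0, 10.0]        # possible payouts per $1 staked
probabilities = [0.9, 0.1]    # theoretically known likelihoods, summing to 1

# Expectation value: probability-weighted arithmetic average of the outcomes.
expectation = sum(p * x for p, x in zip(probabilities, outcomes))

# Most likely single outcome: the one carrying the highest probability.
most_likely = max(zip(probabilities, outcomes))[1]

print(f"expectation value = {expectation:.2f}")
print(f"most likely value = {most_likely:.2f}")
```

Here the most likely payout is nothing at all, yet the expectation value is 1.00, a value that never actually occurs on any single trial.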

Let us consider that an investor begins with a given starting wealth and does nothing else with it but use it to repeatedly assume a position of some size in a given security. For example, the investor could at regular intervals— e.g., every week, month or year— adjust the amount committed to the security with the rest being held as cash. Or, a gambler could repeatedly bet on a particular game of chance. And let us further assume that the individual persists through such a large number of “trials”— that's what the statisticians call them— as to, in effect, fully encounter the entire distribution of possible outcomes for each trial, possibly many times over.

And for each trial we can compute an overall return ratio: the total amount of wealth at the end of the trial divided by the total at the beginning, which definition does not preclude the funds put at risk each time being by choice only a fraction of the available wealth. The Kelly criterion determines the fraction of the wealth at the beginning of each trial that must be committed each time in order to maximize the average rate of growth toward the final wealth amount over many trials. We will see that this maximization amounts to maximizing the expectation value of the logarithm of the final wealth amount (not the final wealth amount)— with the expectation value being taken using the probabilities of occurrence of the theoretically-expected distribution of the return ratios. Again, the rationale is that under the specified imagined circumstance of an ultimately large number of trials the distribution of the return ratios that would thereby hypothetically be realized would thoroughly exhaust and therefore replicate the entire theoretically-expected distribution. “Taking the expectation value” amounts to substituting the latter distribution for the former, the latter being the one that is assumed to be known (though in reality it may be poorly known). In the United States we used to have an idiomatic expression, a colloquialism, for that: “buying the average”. That's what it meant. In academic circles it's called the “law of large numbers” and it's attributed to Gerolamo Cardano.

Yes! The logarithm. The Kelly purpose is to come as close as possible to maximizing the rate of growth of the starting capital— should that be what you are determined to do, the attendant risk notwithstanding. The Kelly criterion is really a natural thing, no voodoo about it, just math; the logarithms are not there as a contrivance, not there in the guise of a “utility function”, the favorite tool of the welfare economist Paul Samuelson, but are demonstrably mathematically necessary to maximize the rate of growth of the investor's wealth. But we're getting ahead of ourselves. The logarithms and the rest of the mathematics are derived on page 3 of this article under “A Bit of the Mathematics”.

I am aware of this article by Samuelson and Merton. The latter had been Samuelson's student and later became a promoter and director of Long-Term Capital Management, which he helped “blow up”. The article is in part an attack on the use of the Kelly criterion as a potential cornerstone of portfolio management. Perhaps the authors' real concern is about guidance for retirement plan portfolios and the like; the authors are probably not talking about whether or not it could be OK for some venturous hedge fund to commit some portions of the assets of their “accredited investors” to strategies that are in some way modulated by the use of the Kelly criterion. That is to say that I have only skimmed the article and do not intend to finish reading it. Early on the authors seem to commit to imposing utility functions on investors so as to compel them to assert their own risk tolerance in particular ways. It would not be surprising to find that these authors merging the Kelly math with their own particular utility function math might produce untoward outcomes. I might have gotten further in the article had I not encountered, on the fourth page, a “thoughtful person” invoked as a component of the argument. But the authors do, commendably, in their second paragraph, admit to the gross failures of “mean-variance” models, which are today known as modern portfolio theory (MPT) and which are still foisted off on investors by many firms and advisors.

William T. Ziemba, academician and respectful colleague of Samuelson, has responded to Samuelson's various concerns at length here. The article is also generally informative about the use of the Kelly criterion and the list of references is extensive.

With schemes that fully implement the criterion come levels of risk that can be formidable, especially in the early going. That does not refer to the early going of your efforts to understand and correctly apply Kelly. Rather, with Kelly perfectly applied, account equity can dive towards zero before recovering. But you can moderate the risk by being less aggressive and accepting sub-optimal rates of growth. And we'll soon see how the mathematics of Kelly helps us with that decision.

When it comes to developing expectations for real-life circumstances such as actually trading in stocks or stock options, nothing can be done that is very accurate and so great care must be taken to assess whether or not the resultant trading scheme is likely to have any reliability to it at all. You have to do proper hypothesis testing, which happens to be the business of Retail Backtest. “Wheels of Fortune” are depicted on this page. A truly random wheel of fortune game doesn't present any of the complications of securities and the theoretically-expected distribution that we need to know in order to implement the Kelly criterion is printed on its face. That's what we'll consider next.

Mike O'Connor is a physicist who now develops and tests computerized systems for optimizing portfolio performance.

[Interactive charts: “A Poundstone Wheel History” and “Geometric Mean & Deviation Therefrom v. Betting Fraction”. Interactive widget: “Poundstone's Wheels of Fortune” (click a wheel to spin).]

Note: The wheels are used with the kind permission of Mr. Poundstone.


The Book “Fortune's Formula”

It's by William Poundstone and it's about the Kelly criterion and the characters who gave it life. The title is from an article by Edward O. Thorp, a renowned mathematician and hedge fund manager who used the Kelly criterion in both gambling and investing with great success. It's a worthwhile book overall, a lively one, one for which this web article is no substitute (owing in part to its utter failure to reference gangsters and ponies). But I'd have to say that the book isn't quite going to suffice if you want to learn the mathematics of the Kelly criterion so as to be able to apply it to anything: the book is written so as to be readable by the general public; for full understanding integral and differential calculus is needed, albeit mainly just calculus of a single variable. Thorp's articles are the main place to go for the mathematics but you will find an introduction to the math, one that avoids most of the difficulties, on page 3 of this article under “A Bit of the Mathematics”.

This article is related in part to a particularly important section of the book, one in which the Kelly criterion is discussed in relation to three wheels of fortune— “The Trouble with Markowitz” section in Part Three, “Arbitrage”. There's one wheel for each of three penny stocks, each with its own possible outcomes. A spin of a wheel is taken to simulate the outcome of a $1 investment in a penny stock over a holding period of a year.

The wheels are shown on the first page of this article and it may be helpful if you open that page in another tab or window of your browser for access as you read this page. The numbers on each wheel are the possible dollar values of your initial dollar investment in the penny stock at the end of the year. In the book the idea is to see which wheel is the best one, from the point of view of a Kelly investor versus one who does not heed the effects of compounding the returns of risky investments.

Those wheels of fortune fairly cry out for the JavaScript-powered widgets that I have provided. The widgets allow you to spin a wheel of your choice yourself, very rapidly and many times in succession— take that, Vanna White. Of course JavaScript must be at least temporarily enabled on your browser for any of it to work. All of the calculations are done on your own computer.

While preparing the JavaScript I became puzzled by one key paragraph in that section of the book. It is in order to be able to effectively clarify the meaning that I have taken the liberty of re-using the very same wheels that Mr. Poundstone used (actually I have his kind permission). My understanding comes about in part from having actually applied the Kelly mathematics to the given wheels. Here's the paragraph in question:

The worst wheel by the Kelly philosophy is the second. That's because it has a zero as one of its outcomes. With each spin, you risk losing everything. Any long-term “investor” who keeps letting money ride on the second wheel must eventually go bust. The second wheel's geometric mean is zero.

Whether the player repeatedly bets an optimal amount as determined by the Kelly criterion or simply uses the let-it-ride approach, the geometric mean after n spins of the wheel is the non-negative real number whose n-th power equals the ratio of the player's final wealth to his starting wealth. So a geometric mean of zero would mean that the player lost everything; the bigger the geometric mean the better; should a geometric mean of 1.00 ever happen that would mean that in the final analysis there was no change in the player's wealth notwithstanding the ups and downs along the way. If we seek to determine the geometric mean that is characteristic of a particular wheel by experiments on it rather than by reading the numbers that can come up off of the face and making certain simple theoretical assumptions, then n has to be a very large number in order for the experimentally-determined geometric mean to nearly equal the theoretical value. If furthermore a let-it-ride policy is assumed then the experimentally-determined geometric mean will converge to the value for each wheel that is shown in the book.
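The definition can be sketched in a few lines of Python. The function name `geometric_mean` and the sample history are my own illustrative choices, not anything from the book or from the widgets.

```python
import math

def geometric_mean(ratios):
    """Return the n-th root of the product of n per-spin wealth ratios."""
    if not ratios:
        raise ValueError("need at least one ratio")
    if any(r == 0 for r in ratios):
        return 0.0  # a single zero wipes out the wealth, so the mean is zero
    # Average the logarithms, then exponentiate; avoids overflow for large n.
    return math.exp(sum(math.log(r) for r in ratios) / len(ratios))

# Hypothetical let-it-ride history (illustrative numbers, not from the book).
history = [2.0, 0.5, 1.5, 1.5]
final_over_start = math.prod(history)   # ratio of final to starting wealth
g = geometric_mean(history)

# Raising the geometric mean to the n-th power recovers that same ratio.
print(g, g ** len(history), final_over_start)
```

Working with the average of the logarithms rather than with the raw product is the same step that leads to the logarithm in the Kelly criterion itself.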

The quoted paragraph of the book is basically true. It seems that the author had in mind the usual practice of stock market investors, which is in effect to let it ride, and is simply saying that if that is to be the policy then the general theory behind the Kelly principle immediately leads to the understanding that wheel #2, with its let-it-ride geometric mean of zero, should be utterly avoided.

However, when the Kelly criterion is actually employed so as to adopt an optimal bet size the second wheel performs for the Kelly investor about as well as the third, with the return ratios having a decidedly non-zero geometric mean thanks to his having bet only a fraction of his wealth each time. Certainly the second is not an utterly bad wheel notwithstanding zero being one of the outcomes and we could easily make it better than the third by tweaking the non-zero returns upward while its let-it-ride geometric mean would remain zero. It's all because the Kelly criterion compels the investor to not let it ride but to instead hold back some cash each year. In that way the Kelly criterion naturally avoids utter ruin even if sometimes the amount that is bet is entirely lost.

We Spin the Poundstone Wheels

How do we see all of that about the second wheel? When the first page of this article loaded all three graphs on the right above the Poundstone wheels (or at the bottom of that page if your screen is not of sizable width) were initiated using the possible payouts of wheel #2; otherwise you can simply click on the image of any wheel to initiate the graphs with the distribution of that particular wheel. The first graph shows a single possible history of trading using the distribution of the chosen wheel. The “Spin the Same Wheel Again n Times” button does what it says and you should press it numerous times and whenever you please as that will allow you to see how wildly the outcomes can vary from one history to another. The option of a ridiculously long trading period of 300 years is offered that we might get a glimpse of the long-term trend which is otherwise almost indiscernible within the 30-year view due to the volatility of the outcomes and the fact that the frequency of the trials is only once per year. Click-dragging within any of the graphs so as to zoom in is sometimes very helpful; just double-click to zoom back out.

The key thing to understand is the value, in real-world circumstances, of the basic Kelly idea of committing only a fixed fraction of the wealth each time, holding the rest as cash or as a cash-equivalent. However, wheel #1 is rigged as a non-real-world can't-ever-lose wheel and so the best policy for it would be to instead simply borrow all of the money that you could and let it all ride. For it I've simply accepted the fact that the Kelly criterion does not establish a preferred fraction of wealth to commit and I only plot the let-it-ride option, with 100% committed each time.

We'll instead focus on wheels #2 and #3 on which we see numbers less than 1 that present losses. The Kelly approach comes into play only when losses are possible. For such wheels the conjoined second and third charts inform us that we should consider adopting a “Betting Fraction” from the horizontal axes of those charts, “f” in the common notation, that's greater than zero but less than or equal to the f that has the highest “Annual Geometric Mean” as shown on the second chart. Why confine ourselves to that range of f values? Because outside of that range the geometric mean of the return is less while the risk as represented by the standard deviation is greater. The optimal betting fraction that maximizes the geometric mean is usually denoted in Kelly literature by f*.
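To make the search for f* concrete, here is a minimal sketch that scans betting fractions for a wheel with six equally likely outcomes. The payouts in `PAYOUTS` are an assumption for illustration and are not the numbers on Poundstone's wheels; the function name `geometric_mean_at` and the grid step are likewise my own choices.

```python
import math

# Hypothetical six-outcome wheel with a zero on it. These payouts are an
# assumption for illustration, NOT the numbers on Poundstone's wheels.
PAYOUTS = [0.0, 0.5, 1.0, 2.0, 3.0, 4.0]

def geometric_mean_at(f, payouts):
    """Theoretical per-spin geometric mean when a fraction f is bet each time."""
    ratios = [1.0 - f + f * r for r in payouts]   # wealth ratio per outcome
    if min(ratios) <= 0.0:
        return 0.0                                # ruin is possible
    # Equally likely outcomes: exponentiate the average log of the ratios.
    return math.exp(sum(math.log(x) for x in ratios) / len(ratios))

# Scan f from 0.000 to 1.000 in steps of 0.001 and keep the best fraction.
grid = [i / 1000 for i in range(1001)]
f_star = max(grid, key=lambda f: geometric_mean_at(f, PAYOUTS))

print(f"f* is about {f_star:.3f}")
print(f"geometric mean at f*: {geometric_mean_at(f_star, PAYOUTS):.4f}")
print(f"geometric mean at f=1 (let it ride): {geometric_mean_at(1.0, PAYOUTS):.4f}")
```

A grid scan is crude but transparent; for this made-up wheel it lands on an interior maximum while the let-it-ride value at f = 1 is zero, which is the qualitative shape described for the second chart.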

If you're not following the f business, if f is 0.5 then we keep half of our money as cash and bet the rest. Per se, fixed-fractional betting, always using the same f, was no invention of Kelly; it's old hat. However the basic Kelly idea does employ fixed fractions and there are theorems that support the use of fixed fractions in conjunction with awareness of the Kelly criterion.

Back to our wheels #2 and #3. With other wheels, or stocks, fractions below or above the zero-to-one range of f might be feasible; they would represent, respectively, selling the stock short or borrowing money to buy an excess of it.

Given the “Average Payouts” of the Poundstone wheels, all of which exceed 1.00, none of them would show a long-term profit with short selling. And for wheels #2 and #3 it turns out that boosting your bet with borrowed money would be either ill-advised or catastrophic, but the story could be different with some other wheel such as #1 or even with a wheel that would occasionally present a loss.

Let's look at the second chart in detail, with wheel #2 selected. We see a sort of inverted, lopsided horseshoe curve having a maximum at a betting fraction f of about f*=0.63, which yields a maximum geometric mean of 1.24— it helps to zoom in, even twice if you wish, in order to pick off the utter maximum. So to get the fastest rate of growth of our wealth we would bet 63% of our wealth on wheel #2 each time.

Had you previously understood that there are investments that pay off when investing only a fraction of the funds that you have available but are sure losers if you simply commit nearly 100%? Read on!

Wheel #2 is like that. If we go off to the right, settling on a higher betting fraction f > f*, not only does our geometric mean deteriorate— at about f=0.96 it goes below 1.00 which means that beyond that we would be losing— but the risk would also be increasing as is represented by the “Standard Deviation” on the conjoined chart (which is, more exactly, Euler's number raised to the standard deviation from the logarithm of the geometric mean). And we are absolutely barred from adopting a betting fraction of f=1.0, or higher which would mean investing with borrowed money, because if we ever get so much as fully invested with f=1.00 then that would mean that when the number zero on the wheel comes up it would cause us utter ruin.

Now if we go off to the left of the maximum geometric mean with f < f* then things are qualitatively different. True we also have to settle for a reduced geometric mean, but the risk decreases. So we can pick any risk-return combination that we like, any f that is between zero and up to and including the point of maximum geometric mean f*=0.63, and we should have nothing much to regret. But we might well prefer to get closer to f* than to zero due to the fact that the mean does not roll off to the left from its maximum as rapidly as the deviation drops. Accepting a sub-optimal betting fraction is called “fractional Kelly”. It may be advisable to have a general policy of always betting only a fixed fraction of the optimal Kelly fraction f*, to systematically quell the risk.
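The trade-off on either side of f* can be tabulated with a short sketch. It uses a hypothetical wheel (not Poundstone's numbers) and takes the deviation to be e raised to the standard deviation of the logarithms of the per-spin wealth ratios, which is my reading of the chart's definition; the names `PAYOUTS` and `mean_and_deviation` are illustrative.

```python
import math

# Hypothetical wheel (illustrative assumption, not Poundstone's numbers).
PAYOUTS = [0.0, 0.5, 1.0, 2.0, 3.0, 4.0]

def mean_and_deviation(f, payouts):
    """Return (geometric mean, deviation factor) for betting fraction f < 1.

    The deviation factor is e raised to the standard deviation of the
    logarithms of the per-spin wealth ratios (an assumed interpretation).
    """
    logs = [math.log(1.0 - f + f * r) for r in payouts]
    mean_log = sum(logs) / len(logs)
    sd_log = math.sqrt(sum((x - mean_log) ** 2 for x in logs) / len(logs))
    return math.exp(mean_log), math.exp(sd_log)

# Locate f* on a grid that stops short of f = 1, where a zero payout is ruin.
grid = [i / 1000 for i in range(1000)]
f_star = max(grid, key=lambda f: mean_and_deviation(f, PAYOUTS)[0])

for label, f in [("quarter Kelly", 0.25 * f_star),
                 ("half Kelly", 0.50 * f_star),
                 ("full Kelly", f_star),
                 ("beyond Kelly", min(0.99, 1.5 * f_star))]:
    g, dev = mean_and_deviation(f, PAYOUTS)
    print(f"{label:13s} f={f:.3f}  geometric mean={g:.4f}  deviation={dev:.4f}")
```

For this made-up wheel the rows below f* give up some geometric mean in exchange for a smaller deviation, while the row beyond f* has both a lower geometric mean and a larger deviation.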

Still on wheel #2, let's examine the top chart, which is based on a single history of a succession of wheel outcomes over either 30 years or 300 years, your choice. On that chart are plotted two wealth histories that share that single wheel-outcome history— one for let-it-ride trading and one for Kelly-optimal-betting-fraction trading. If you spin the wheel several times you'll see that, oddly and rather inappropriately, the red plot for let-it-ride often stops abruptly at some year short of 30. About one out of every six times there's only a dot at the beginning. The cause of that is the fact that with wheel #2 the wealth of the let-it-ride investor often goes to zero but the logarithm of zero is minus infinity which can't be plotted on that chart because the vertical scale is logarithmic. Hence the chart usually fails to show a complete let-it-ride history.

I much prefer to plot the wealth histories on a logarithmic scale to better show that they look somewhat like straight lines, which they should, at least over the 300-year span notwithstanding the volatility. Furthermore, changes of a given percentage are represented by the same vertical distance on a logarithmic chart, anywhere on the chart; not so on a linear scale. But we still need a fix.

So, to get the fix you simply double-click on the label “A History for Poundstone Wheel #2”. The chart will then change because instead of that wheel there is substituted a very similar wheel that only differs from #2 in that where #2 has a payoff of zero the modified wheel has a payoff of 0.01— meaning that if that number comes up you will lose only 99% of what you put up. It's a have-our-cake-and-eat-it-too thing: we get to keep the logarithmic scale but still get to see all of the dismal results of let-it-ride.

And so now the label will say “A History for Modified Poundstone Wheel #2” and the fun of it is to click the “Spin the Same Wheel Again n Times” button numerous times and particularly with the 300-years election. You'll see the dramatic riches-instead-of-rags difference that the Kelly principle can make. Note that it's not that the tiny change from 0 to 0.01 improved the performance using the optimum Kelly fraction; it didn't, not noticeably.
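For readers without the widgets at hand, a rough offline stand-in might look like the following. The wheel is hypothetical (only the 0.01 payout mirrors the modification described above), the betting fraction of 0.5 is an assumed value rather than a computed optimum, and the seed is arbitrary.

```python
import random

# Hypothetical stand-in for a "modified" wheel: the would-be zero is 0.01.
# The other payouts are illustrative assumptions, not Poundstone's numbers.
PAYOUTS = [0.01, 0.5, 1.0, 2.0, 3.0, 4.0]

def wealth_history(spins, f, start=1.0):
    """Wealth after each spin when a fixed fraction f is bet every time."""
    wealth = [start]
    for r in spins:
        wealth.append(wealth[-1] * (1.0 - f + f * r))
    return wealth

rng = random.Random(2024)                 # fixed seed for a repeatable history
spins = [rng.choice(PAYOUTS) for _ in range(300)]   # one spin per "year"

# Both policies face the very same sequence of wheel outcomes.
let_it_ride = wealth_history(spins, f=1.0)
fixed_fraction = wealth_history(spins, f=0.5)   # 0.5 is an assumed fraction

print(f"let-it-ride final wealth:    {let_it_ride[-1]:.3e}")
print(f"fixed-fraction final wealth: {fixed_fraction[-1]:.3e}")
```

Changing the seed plays the role of pressing the spin-again button: each seed gives one more possible history, and plotting the two lists on a logarithmic vertical axis would resemble the top chart.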

And before we leave wheel #2, we can ask what amount should accumulate from the geometric mean of 1.24 with f at the Kelly optimal value f* that we previously found. The answer should be \(1.24^{300}\) times the starting wealth if we're on the 300-year scale. That comes to roughly \(10^{28}\) (type “=1.24^300”, sans the quotation marks, in Google's search engine). The other way of writing that would be 1.0e+28. And sure enough, if you hit the spin-again button a number of times on the 300-years scale there are substantial fluctuations but the final value averages roughly that.
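The same arithmetic can be checked without a search engine. This is just the compounding calculation from the paragraph above, with variable names of my own choosing.

```python
growth_per_year = 1.24     # geometric mean quoted above for wheel #2 at f*
years = 300

# Compounding the same growth factor every year for 300 years.
final_over_start = growth_per_year ** years
print(f"{final_over_start:.2e}")
```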

We can now quickly go over wheel #3 as it produces qualitatively similar results when used with the optimal Kelly betting fraction, which for it is f*=0.75— only a bit bigger than the optimal fraction for wheel #2. But this time, since there is no chance of losing utterly everything that is put up on a single spin it would be at least possible to use borrowed money— all the way up to about f=1.5, at which point the geometric mean has declined to about 1.00, beyond which there would be losses. But as with wheel #2, fractional Kelly or full Kelly with 0 < f <= f* is the preferred range of bet sizes with nothing beyond f* ever being advisable. And especially note that the top chart confirms that use of the Kelly optimum generally beats let-it-ride and at less risk, with let-it-ride this time showing a profit. That you can easily see with repeated spins of wheel #3 on the 300-years scale.

And finally, if we consult our second chart to see what the geometric means are for f=1.00, the let-it-ride case, for each of the Poundstone wheels, then we see that they all agree with the values that are given in the book.

The book compares the Kelly emphasis on the geometric mean with the reliance of “mean-variance” analysis upon the arithmetic mean, with regard to assessing the relative attractiveness of the wheels. “Modern Portfolio Theory” (MPT) and specifically the “Capital Asset Pricing Model” (CAPM) are theories that are based upon mean-variance analysis. Inasmuch as they involve schemes that use diversification to maximize returns at given levels of risk they are intended to be applied to portfolios and not to single issues, and in a way that is very dependent upon correlations among the price performance histories of the individual issues. But no such correlations exist among the three wheels so that an uncompromised application of mean-variance analysis to them is not possible. Since MPT/CAPM practitioners manage portfolios none would ever plan to put all of the assets into a single security. Hence if there were a single security in one of their portfolios that had the possibility of becoming worthless that would not lead to the ruination of the portfolio. And if a security has a multi-period “average payout” substantially greater than 1, as with wheel #2, then it might actually be reasonable to include such a security in an MPT- or CAPM-managed portfolio in spite of it presenting the possibility of a total loss.

The reasonability would follow from the fact that, were the security the likes of wheel #2 or not, surely only a certain small fraction of the assets would be assigned to it; portfolios are generally policy-limited to a small fixed range of permissible position sizes to guard against the risk of any one issue going belly-up. So the circumstances of any one issue in such a portfolio differ little from what we have called fixed-fractional betting with the use of a very small fraction. The Annual Geometric Mean plot, the second chart on page 1, passes through 1.00 (no gain, no loss) at f = 0, and if you work it out the calculus shows that the slope there is the arithmetic average payout (the “mean” of mean-variance) minus 1, which is not influenced by the geometric mean at f = 1. Thus any such minimal successive exposures to the risks and rewards of securities that performed like wheel #2, whose average payout exceeds 1, would ultimately be profitable notwithstanding the zero geometric mean at f = 1.
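For those who want the calculus spelled out, here is a sketch. It assumes outcomes \(R_k\) occurring with probabilities \(p_k\) (the \(p_k\) and the symbol \(G\) are notation introduced only for this aside) and uses the fixed-fractional wealth ratio \(1-\text{f}+\text{f}\cdot R_k\) that is explained on page 3.

\[
\begin{aligned}
G(\text{f}) &= \exp\Big(\sum_k p_k \ln\big(1-\text{f}+\text{f}\cdot R_k\big)\Big), \qquad G(0)=1,\\
\frac{dG}{d\text{f}} &= G(\text{f})\sum_k \frac{p_k\,(R_k-1)}{1-\text{f}+\text{f}\cdot R_k},\\
\left.\frac{dG}{d\text{f}}\right|_{\text{f}=0} &= \sum_k p_k R_k \;-\; 1 .
\end{aligned}
\]

The last line is the arithmetic average payout minus 1, so near f = 0 the geometric mean rises whenever the average payout exceeds 1, whatever happens at f = 1.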

The chief distinction then is that none of the mean-variance models make any allowance whatsoever for the mathematics of the subsequent and inevitable compounding. It's a dimension that they do not incorporate. Doesn't the theory of the Kelly criterion then however suffer in comparison with mean-variance analysis for its neglect of correlations within portfolios? Well, no, not really. For example if there are p non-risk-free issues in a portfolio we could assign “betting fractions” \(\scriptstyle\text{f}_1,\, \text{f}_2\ldots\,\text{f}_p\), one to each security, where \(\scriptstyle\text{f}_1+ \text{f}_2\ldots\,+\,\text{f}_p =\,\)f and with the fraction 1 - f being committed to a risk-free security or cash. And then we could vary the \(\scriptstyle\text{f}_k\) so as to find the values that maximize the logarithm of the final wealth, just as we do for single issues with just one f.
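A minimal sketch of that idea for p = 2 follows. The joint outcomes are made-up numbers, each row taken as equally likely, and the coarse grid search with no short selling and no borrowing is a simplifying assumption of mine rather than a recommended method.

```python
import math

# Hypothetical joint outcomes for two risky securities over one period, each
# row equally likely. These numbers are illustrative assumptions only.
JOINT_OUTCOMES = [(0.5, 1.4), (0.8, 0.9), (1.0, 1.1), (1.5, 0.7), (2.0, 1.3)]

def expected_log_growth(f1, f2, outcomes):
    """Average log of the one-period wealth ratio with 1 - f1 - f2 in cash."""
    total = 0.0
    for r1, r2 in outcomes:
        ratio = 1.0 - f1 - f2 + f1 * r1 + f2 * r2
        if ratio <= 0.0:
            return float("-inf")          # ruin possible: reject this mix
        total += math.log(ratio)
    return total / len(outcomes)

# Coarse grid over non-negative fractions with f1 + f2 <= 1 (no borrowing).
steps = 100
best = max(
    ((i / steps, j / steps) for i in range(steps + 1)
     for j in range(steps + 1 - i)),
    key=lambda pair: expected_log_growth(pair[0], pair[1], JOINT_OUTCOMES),
)
f1_star, f2_star = best
print(f"f1={f1_star:.2f}  f2={f2_star:.2f}  cash={1 - f1_star - f2_star:.2f}")
print(f"growth factor per period: "
      f"{math.exp(expected_log_growth(f1_star, f2_star, JOINT_OUTCOMES)):.4f}")
```

Because the rows list what the two securities do together, any correlation between them is carried by the data itself; no separate correlation input is needed.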

Note that if we were talking about investing in a single security then the let-it-ride mode that we have discussed would actually be the same as “buy and hold” with 100% invested. Does that sound more familiar? Let's now go on to understand how we calculate the dependence of our final wealth upon the betting fraction f.


A Bit of the Mathematics

If you didn't immediately comprehend the expectation-value-of-the-logarithm business on page 1 of this article... you could be normal. It can be hard to find in readily available Kelly literature anything much that is properly instructive as to how the logarithm actually comes about. Various authors insist on bringing up utility functions and fail to make it clear that you're not entitled to a choice of utility functions, not if you want to maximize the rate of growth of your wealth; it's the logarithm, nothing else. Here I'll try to fully explain the mathematics behind the Kelly criterion because it is, at base, rather simple. And it helps that the wheel-of-fortune setup with which we started is really rather generally applicable, such as to stocks or stock options or even to funds that hold them. Where Mr. Poundstone talked about penny stocks that had six equally-likely outcomes he also pointed out that for realism we could simply add more outcomes and repeat, as he did, the more likely outcomes.


By “\(\equiv\)” in the equation immediately below is meant “is defined to be”; \(X_i\) is the wealth of the investor after i spins of the wheel; \(X_0\) is the starting wealth; \(X_n\) is the final wealth if there are n spins in all. The numerator of each fraction is canceled by the denominator of the next fraction, but no numerator can be zero else we must terminate the sequence right then and there with the investor utterly broke.

\begin{aligned} (\text{geometric sample mean})^n&\equiv\frac{X_n}{X_0}\\ &=\frac{X_1}{X_0}\cdot\frac{X_2}{X_1}\ldots\,\cdot\, \frac{X_n}{X_{n-1}} \end{aligned}

To explain the Kelly criterion we won't have to immediately focus on the geometric mean; we're mainly concerned with the composition of the ratio \(\frac{X_n}{X_0}\). We'll get back to it a bit later as it's fairly often mentioned in Kelly literature, such as in the Poundstone book.
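The cancellation of numerators and denominators can be checked numerically. The following is only a minimal sketch: the wealth values in `wealth` are made up for illustration and do not come from any wheel in this article.

```python
import math

# Hypothetical wealth sequence X_0, X_1, ..., X_4 (illustrative numbers only).
wealth = [100.0, 150.0, 75.0, 187.5, 187.5]

# The per-spin ratios X_i / X_{i-1}.
ratios = [wealth[i] / wealth[i - 1] for i in range(1, len(wealth))]

# Their product telescopes to X_n / X_0.
product = math.prod(ratios)
n = len(ratios)

# The geometric sample mean is the n-th root of X_n / X_0.
geometric_sample_mean = (wealth[-1] / wealth[0]) ** (1.0 / n)

print(product, wealth[-1] / wealth[0], geometric_sample_mean)
```

The first two printed numbers agree, which is all that the telescoping product says.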

Letting it Ride

Given the sequence of payout numbers \(r_1, r_2\ldots , r_n\) that are the result of n sequential spins of the wheel and are therefore random choices of the numbers \(R_1, R_2\ldots , R_6\) that are printed on the wheel, then in the equation above with let-it-ride betting we must set \(\frac{X_i}{X_{i-1}}=r_i\,\). If any of the \(r_i\)'s turns up zero then the sequence ends and the investor is broke.

Although the discussion here continues to refer to the wheels-of-fortune examples, the simple mathematics of this page has much broader applicability. We could just as well take those various \(r_i\)'s to be, say, the annual return ratios of some huge fund that contains various kinds of securities— with the task at hand being to decide what fraction of our wealth we should commit to the fund.

Fixed-Fractional Betting

Given the same sequence of payout numbers \(r_1, r_2\ldots , r_n\) from the face of the spun wheel then with fixed-fractional betting we would not compute the same \(\frac{X_i}{X_{i-1}}\) ratios. Instead, if f is the betting fraction then \(\frac{X_i}{X_{i-1}}=1-\text{f} + \text{f}\cdot r_i\).

We see immediately that if f = 1 then the ratios for fractional betting reduce, as they should, to the ratios for let-it-ride betting. But if f < 1 then if any \(r_i\) is zero \(\frac{X_i}{X_{i-1}}\) will simply be 1 - f, which will be greater than zero. In that way the investor can be prevented from ever going entirely broke.
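A short sketch may make the difference between the two betting modes concrete. The function name `wealth_ratio` and the sequence of payouts in `spins` are assumptions made for this illustration; only the per-spin ratio 1 - f + f·r comes from the text above.

```python
def wealth_ratio(payouts, f):
    """Return X_n / X_0 for payout ratios r_1..r_n and betting fraction f."""
    ratio = 1.0
    for r in payouts:
        ratio *= 1.0 - f + f * r  # X_i / X_{i-1} for fixed-fractional betting
    return ratio

# Hypothetical sequence of spins that includes one zero payout.
spins = [2.0, 0.0, 4.0, 1.0]

print(wealth_ratio(spins, 1.0))  # let-it-ride: the zero payout wipes out everything
print(wealth_ratio(spins, 0.5))  # betting half each time: the investor survives the zero
```

With f = 1 the single zero payout takes the product to zero, while with f = 0.5 that same spin only halves the wealth.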

Of course we are not dealing here with any real-world annoyances such as transaction costs, taxes or dividends, much less policies affecting the use of margin that are in effect at brokerages. But we can see that if f is negative then, but for those complications, the expression for \(\frac{X_i}{X_{i-1}}\) would also represent short selling correctly: the first term 1 represents 100% put up to avoid “going on margin” and it returns the starting wealth for the \(i^{\text{th}}\) trial because if f were zero then \(X_i\) would be the starting wealth \(X_{i-1}\); the second term -f is positive and it's what you would get for short-selling the stock (per dollar of starting wealth).

Fixed-fractional betting is not Kelly betting per se. Of course everyone always knew, before Kelly came along, that you could bet only a fraction of your wealth if you wished and avoid sudden utter ruin that way.

The Kelly Optimum Betting Fraction

Here we are actually going to avoid integral and differential calculus and just use some rules involving exponentiation and natural logarithms. So if you have some mathematical inclinations you should be able to follow even if you don't know calculus— we'll just apply the rules.

If \(y\) is a positive number then \(\text{log}(y)\) increases as \(y\) increases but not as fast. In fact it has a downwardly concave appearance when plotted as the vertical coordinate with \(y\) the horizontal coordinate, and that concave aspect is crucial for the fulfillment of the Kelly criterion. The logarithm is defined only for positive \(y\), and its value plunges towards negative infinity as \(y\) approaches zero from above; the logarithm of one is zero; \(\text{log}(y)\) is the power to which Euler's number \(e=2.718\ldots\,\) must be raised to yield \(y\). So \(y = e^{\text{log}(y)}\).

Now if we have two positive numbers \(y_1\) and \(y_2\) and multiply them together then the logarithm of the product must be the sum of the logarithms of each because after we multiply \(e\) by itself \(\text{log}(y_1)\) times to get \(y_1\) we must multiply the result by \(e\) multiplied by itself an additional \(\text{log}(y_2)\) times in order to form the product. Hence we find that \( \text{log}(y_1\cdot y_2)=\text{log}(y_1) + \text{log}(y_2) \). So the logarithm of a product is just the sum of the logarithms, and that's true for however many terms that form the product.

With that definition and the rule about products we go to work on our first equation above, the one for the all-important ratio of final wealth to starting wealth. We find the following:

\begin{align} \frac{X_n}{X_0}&= e^{ \text{log}\left(\frac{X_n}{X_0}\right) }\\ \text{log}\left(\frac{X_n}{X_0}\right) &= \text{log}\left(\frac{X_1}{X_0}\right)+\text{log}\left(\frac{X_2}{X_1}\right)\ldots +\text{log}\left(\frac{X_n}{X_{n-1}}\right) \end{align}

We now focus on that expansion, on the sum of logarithms. Each term can take on only one of six values, each based on a random choice \(R_j\) of the six \(R\)'s from the face of the wheel:

$$\text{log}\left(\frac{X_i}{X_{i-1}}\right)= \text{log}\left(1-\text{f} + \text{f}\cdot R_j\right)$$

And now comes the easy but profound step... how many terms are there in the expansion for each of the \(R_j\) values? We know. Oh, we don't really know, because any and all sequences are possible. But we have a very good idea concerning the likely number of appearances of each \(R_j\) value. That would be \(\frac{n}{6}\) of course, or the integer closest to that number. Yes, since each segment of the wheel is equally likely to be selected and since there are six segments we expect the following approximation to hold:

\begin{align} \text{log}\left(\frac{X_n}{X_0}\right) &\approx \text{log}\left(\frac{Xp_n}{X_0}\right)\\ &= n\cdot \biggl[\frac{1}{6}\cdot\text{log}\left(1-\text{f} + \text{f}\cdot R_1\right)\\ &+\frac{1}{6}\cdot\text{log}\left(1-\text{f} + \text{f}\cdot R_2\right)\ldots\\ &+\frac{1}{6}\cdot\text{log}\left(1-\text{f} + \text{f}\cdot R_6\right) \biggr] \end{align}

The notation \(\frac{Xp_n}{X_0}\) with the \(p\) added to the \(X\) has been used to indicate the use of the probability distribution that is defined by the numbers on the face of the wheel and the n has been factored out on the right-hand side. And we recognize the quantity in the square brackets []. It's the expectation value of the log terms, taken over the distribution of the face of the wheel, the “theoretically-expected” distribution.
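The quantity in the square brackets is easy to compute directly. In the sketch below the six payouts in `R` are hypothetical, chosen only so that one of them is zero, and the name `expected_log_growth` is likewise an assumption of this example rather than anything from the Kelly literature.

```python
import math

# Hypothetical wheel: six equally likely payout ratios (not from any real wheel).
R = [0.0, 0.5, 1.0, 1.0, 2.0, 4.0]

def expected_log_growth(f, payouts=R):
    """Expectation value of log(1 - f + f*R_j) over equally likely payouts."""
    return sum(math.log(1.0 - f + f * r) for r in payouts) / len(payouts)

# n times this value approximates log(X_n / X_0) for large n.
print(expected_log_growth(0.0))   # betting nothing: zero growth
print(expected_log_growth(0.25))  # a modest fraction of wealth on each spin
```

Note that with this wheel f = 1 is not allowed: the zero payout makes one of the log arguments zero and `math.log` raises an error, which is the arithmetic counterpart of going broke.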

And if we take the logarithm of both sides of the very first equation of this section, the one that defines the geometric sample mean, then the rule about the logarithm of a product being the sum of the logarithms of each term yields the following:

\begin{aligned} \text{n}\cdot\text{log}(\text{geometric sample mean})=\text{log}\textstyle\left(\frac{X_n}{X_0}\right)\displaystyle\\ \text{n}\cdot\text{log}(\text{geometric mean})=\text{log}\textstyle\left(\frac{Xp_n}{X_0}\right)\displaystyle \end{aligned}

Then comparing that with the equation above it we see that the quantity in square brackets, the expectation value of the logarithm of the return ratio, is also our best estimate of the logarithm of the geometric mean of the distribution of \(\frac{X_n}{X_0}\). We're supposing here that as n goes to infinity the value of \(\text{log}\left(\frac{X_n}{X_0}\right)\) approaches n times the expectation value of those six log terms.

That \(\text{log}\left(\frac{X_n}{X_0}\right)\) is, for very large n, approximately n times the expectation value, the sum of the terms inside the square brackets, means that the square-bracketed terms represent the rate of growth of \(\text{log}\left(\frac{X_n}{X_0}\right)\) with respect to the number of trials n, which is to say the exponential rate of growth of \(\frac{X_n}{X_0}\). So as we maximize it by a suitable choice of f we are maximizing the rate of growth of \(\frac{X_n}{X_0}\).

We have to stop right here to celebrate the fact that we're essentially done. We have the answer. We only need to compute and sum the terms inside the square brackets and find the f that yields the maximum value for that sum. That's the Kelly fraction f*. We could find it using a computer, just varying f over a wide range and finding the f that produces the maximum value for the square-brackets sum.
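The computer search just described might look like the following sketch. The payouts in `R` are hypothetical, and the grid of 1000 trial values of f between 0 and 0.999 is an arbitrary choice for this example; f = 1 is left out because the zero payout would make the logarithm undefined.

```python
import math

# Hypothetical wheel: six equally likely payout ratios (illustrative only).
R = [0.0, 0.5, 1.0, 1.0, 2.0, 4.0]

def expected_log_growth(f, payouts):
    """The square-brackets sum: the average of log(1 - f + f*R_j)."""
    return sum(math.log(1.0 - f + f * r) for r in payouts) / len(payouts)

# Try f = 0.000, 0.001, ..., 0.999 and keep the f with the largest sum.
grid = [k / 1000 for k in range(1000)]
f_star = max(grid, key=lambda f: expected_log_growth(f, R))

print(f_star, expected_log_growth(f_star, R))
```

For this particular made-up wheel the search settles on a fraction of roughly 0.36. A finer grid or a proper one-dimensional optimizer would refine the answer, but a brute-force scan is enough to show the idea.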

The Central Limit Theorem

We now need to discuss a most important theorem. We are only using the simple, classical version of it. It doesn't matter how the log terms of the theoretically-expected distribution are distributed as numbers. They could be skewed to one side or the other of their average. The theorem says in part, along with the law of large numbers, that the expectation value that we have computed (the sum inside the square brackets, which is the logarithm of the geometric mean), when multiplied by n as above, is the best estimator of the mode (most probable), the median (mid-percentile) and the mean (average) of the distribution of all of the \(\text{log}\left(\frac{X_n}{X_0}\right)\) values that might actually happen. That is to say that in the large-n limit the mode, median and mean are the same number. And the bigger the n, the better the estimate. In our particular context, this is true whatever the value of f that we choose to use, whether it be the f* value that maximizes the mode/median/mean or not.

We should be clear here that while our results and the theorem only pertain to large-n circumstances, and so there is a subtext concerning the greater precision that happens as n is further increased, “the distribution that consists of all of the \(\text{log}\left(\frac{X_n}{X_0}\right)\) values that might actually happen” does not refer to a distribution derived from a series of increasing n values. No. Think of n as being utterly fixed. We spin the wheel n times and record the resultant \(\text{log}\left(\frac{X_n}{X_0}\right)\). Then we spin it again n more times and record another outcome, and again n more times, and again... repeating the n spins many times. It's that distribution of outcomes that we want to know about. We want to know such things as the most likely value for \(\text{log}\left(\frac{X_n}{X_0}\right)\)— that, we've already figured out how to calculate— or perhaps instead we'd like to know the most likely value for \(\frac{X_n}{X_0}\).
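That thought experiment, with n held fixed and the block of n spins repeated many times, can be simulated. Everything specific in this sketch is an assumption made for illustration: the payouts in `R`, the betting fraction `F`, the 200 spins per block, the 2000 repetitions and the random seed.

```python
import math
import random
import statistics

# Hypothetical wheel and settings (illustrative only).
R = [0.0, 0.5, 1.0, 1.0, 2.0, 4.0]
F = 0.3           # betting fraction, not necessarily the Kelly fraction
N_SPINS = 200     # n, held fixed throughout
N_REPEATS = 2000  # how many times the block of n spins is repeated

def log_wealth_ratio(rng, n, f, payouts):
    """Spin the wheel n times and return log(X_n / X_0)."""
    return sum(math.log(1.0 - f + f * rng.choice(payouts)) for _ in range(n))

rng = random.Random(0)  # fixed seed so that the run is repeatable
outcomes = [log_wealth_ratio(rng, N_SPINS, F, R) for _ in range(N_REPEATS)]

# n times the expectation value of the log terms.
predicted = N_SPINS * sum(math.log(1.0 - F + F * r) for r in R) / len(R)

sample_mean = statistics.mean(outcomes)
sample_median = statistics.median(outcomes)
print(predicted, sample_mean, sample_median)
```

The sample mean and sample median of the recorded outcomes should both land near the predicted value, and a histogram of `outcomes` would show the bell shape.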

The equivalence of the mean, mode and median of the \(\text{log}\left(\frac{X_n}{X_0}\right)\) values is guaranteed because the theorem also states that in the large-n limit the distribution of the \(\text{log}\left(\frac{X_n}{X_0}\right)\) values becomes the famous “bell curve”. When the probability density, the likelihood of particular outcomes, is plotted against \(\text{log}\left(\frac{X_n}{X_0}\right)\) the shape is like that of a bell that is utterly symmetric and centered on the value of the \(\text{log}\left(\frac{Xp_n}{X_0}\right)\) expression that we have just computed, which is, because of that symmetry, at once the mean, the mode and the median.

Now we might indeed prefer, instead of the mode/median/mean of the distribution of the \(\text{log}\left(\frac{X_n}{X_0}\right)\) values, to get the mode of the \(\frac{X_n}{X_0}\) values. Of course: we want our most likely final dollar amount, not some statistic on the logarithms of the possible dollar amounts. To get the mode of the \(\frac{X_n}{X_0}\) values you have to subtract the variance of the \(\text{log}\left(\frac{X_n}{X_0}\right)\) values from their mode/median/mean and then exponentiate the result. So the most likely final \(\frac{X_n}{X_0}\) value is less than the \(\frac{X_n}{X_0}\) of the most likely \(\text{log}\left(\frac{X_n}{X_0}\right)\) value.
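As a worked statement of that step, suppose that the large-n bell curve holds exactly, and write \(\mu\) and \(\sigma^2\) (symbols introduced only for this illustration) for the mean and the variance of the \(\text{log}\left(\frac{X_n}{X_0}\right)\) values. Then the \(\frac{X_n}{X_0}\) values follow a lognormal distribution, for which the standard results are:

$$\begin{aligned} \text{median of }\frac{X_n}{X_0} &= e^{\mu}\\ \text{mode of }\frac{X_n}{X_0} &= e^{\mu-\sigma^2}\\ \text{mean of }\frac{X_n}{X_0} &= e^{\mu+\sigma^2/2} \end{aligned}$$

Since \(\sigma^2\) is positive the mode lies below the median, which in turn lies below the mean.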

Kelly on Kelly

It's simple to say what Kelly did but you may prefer to read his article. Unless you're already profoundly committed to horse racing you'll find the section “The Gambler With a Private Wire” to be of first importance because it does not involve the more complicated case of parimutuel betting. (Where he wrote “Gmax = 1 + ...” he meant “Gmax = log(2) + ...”.)

Other Reading

There is a blog post on the subject of the degree to which the most-likely outcome, the mode of the \(\frac{X_n}{X_0}\) values, can fall short of the most likely \(\text{log}\left(\frac{X_n}{X_0}\right)\) value which the Kelly criterion maximizes. The simple formula for computing the mode of the \(\frac{X_n}{X_0}\) values is given here and also here. This matter is really unrelated to the Kelly criterion. That is, the facts concerning the lognormal distribution are applicable whatever the choice of the betting fraction f.

— Mike O'Connor

Comments or Questions: write to Mike. Your comment will not be made public unless you give permission. Corrections are appreciated.

Update Frequency: Infrequent, as this article is about the principle of the Kelly criterion and not about the current state of the market.