Recently MachineGhost (who joined in 2009 but appears to have come out of hibernation!) pointed out a 2002 article by Wilson & Jones, published in the Journal of Business, which claims there are several flaws in the commonly used "S&P 500" data (as used by Ibbotson, Shiller, Siegel, Damodaran, and others) and provides a fix.
The full paper is here: http://www.jstor.org/stable/10.1086/339903
siamond is inquiring whether it is permissible to include their data in the Bogleheads/simba backtesting spreadsheet. The question remains: does their methodology make sense?
Prior work on Bogleheads.
Long time readers may remember that in 2014, nisiprius started a thread where he raised a number of questions about the validity of the returns data before 1939: viewtopic.php?t=144112
More recent threads have also wondered about the validity of data before 1957: viewtopic.php?t=199104
This paper addresses some (but not all) of the concerns raised in those posts.
The differences in practice.
I'll start by illustrating that we're not exactly talking about trivial, small differences.
This shows the year-by-year difference between the new data (which I'll call W&J after the authors, Wilson & Jones) and Shiller/simba's data. You can see that it can differ by as much as 10 percentage points in a single year. For instance, in 1928, Shiller says the "S&P 500" gained 46.41% but W&J say it only gained 39%. In 1934, Shiller says it lost 8.03% but W&J say it gained 3.02%.
A telltale chart, which shows the cumulative returns of the two series expressed as a ratio, makes visible how this difference would have shown up for an investor starting in 1871.
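For anyone who wants to reproduce this kind of chart from the simba spreadsheet (or any two annual return series), here is a minimal sketch of how a telltale chart is built. The return numbers below are made-up placeholders, not the actual Shiller or W&J figures.

```python
import numpy as np
import matplotlib.pyplot as plt

years = np.arange(1871, 1876)                                  # placeholder span
shiller_returns = np.array([0.10, 0.05, -0.02, 0.08, 0.03])    # hypothetical
wj_returns      = np.array([0.09, 0.06, -0.04, 0.07, 0.02])    # hypothetical

growth_shiller = np.cumprod(1 + shiller_returns)   # growth of $1, Shiller series
growth_wj      = np.cumprod(1 + wj_returns)        # growth of $1, W&J series

plt.plot(years, growth_wj / growth_shiller)        # the telltale ratio
plt.axhline(1.0, color="gray", linestyle="--")     # equal-performance line
plt.ylabel("W&J growth / Shiller growth")
plt.show()
```

When the ratio drifts below 1.0, the W&J series has lagged the Shiller series cumulatively up to that point; when it rises, it has led.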
The differences show up in other measures, such as P/E. The historical mean P/E of the S&P 500 is 15.63, but W&J say it should be 14.28. In Table 4, W&J provide a variety of metrics showing how their index differs from others. Overall, they find their index has lower returns and higher volatility:
Total percentage change, 1871-1999:
S&P Historical: 32,077.47
W&J: 18,517.6
Of particular note, they find that the Great Depression was worse than previously reported (with a P/E of 136, due to earnings being destroyed), while the late 1960s/early 1970s were better. (In fact, their data results in a slight increase in the "Safe Withdrawal Rate" of about 0.1%.)
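For anyone curious about the mechanics behind that last number, here is a rough sketch of how a "safe withdrawal rate" can be computed for a single retirement sequence. The real-return sequence is hypothetical, and the 30-year horizon and fixed real withdrawal are my simplifying assumptions, not anything from the paper.

```python
def survives(real_returns, rate, years=30):
    """Does a fixed real withdrawal of `rate` (of the starting balance) survive?"""
    balance = 1.0
    withdrawal = rate                       # fixed inflation-adjusted withdrawal
    for r in real_returns[:years]:
        balance = (balance - withdrawal) * (1 + r)
        if balance <= 0:
            return False
    return True

def safe_withdrawal_rate(real_returns, years=30, step=0.0005):
    """Highest withdrawal rate (to the nearest step) that survives the sequence."""
    rate = 0.0
    while survives(real_returns, rate + step, years):
        rate += step
    return rate

# Hypothetical 30-year sequence of real returns, just to show the call.
sample = [0.07, -0.10, 0.04, 0.12, 0.01] * 6
print(f"SWR for this sequence: {safe_withdrawal_rate(sample):.2%}")
```

The SWR usually reported is the minimum of this number across all historical starting years, which is why small changes to the worst stretches of the data (like the Depression) can nudge it up or down.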
What did they do?
They identify a number of claimed flaws in the original work.
Cowles originally published his first edition in 1938. Errors were discovered and revised editions were published in 1939 and 1940. W&J claim that other indices use only the uncorrected first edition; they use the last version Cowles published.
Until 1957, S&P published a daily index based on the prices of around 90 stocks. This "S&P 90" was expanded in 1957 and became the S&P 500. Thus, from 1919 to 1956 the daily index included only 90 stocks. However, S&P also published a weekly index of around 400 stocks. (Calculating a daily index of 400 stocks was apparently too challenging before calculators and computers.) This "S&P 400" was discontinued in 1957 and eventually lost to history. W&J base their index on this weekly data covering a broader set of companies, arguing that it is clearly the better data set.
Cowles' data is "averaged". He took the average of prices across a month instead of reporting the price at the start or end of the month. This reduces volatility.
Cowles used monthly arithmetic means of the S&P weekly indexes from 1918 through 1940 and the midpoint of the monthly high and low prices of stocks in constructing monthly indexes back to January 1871.
The authors provide a way to "de-average" Cowles' data by using DJIA data to calculate an adjustment: they apply Cowles' averaging method to the historical DJIA data to see the difference between the averaged and un-averaged prices, then assume the same adjustment is good enough to un-average the Cowles data. This produces a data series with (approximately) the same average returns but higher volatility.
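To make the mechanics concrete, here is a minimal sketch of the de-averaging idea as I read it, using pandas. The names djia_daily and cowles_monthly_avg (and the toy data) are placeholders of mine; this is my reading of the adjustment, not the authors' actual code.

```python
import numpy as np
import pandas as pd

def deaverage(cowles_monthly_avg: pd.Series, djia_daily: pd.Series) -> pd.Series:
    """Un-average a month-averaged index, using the DJIA as a proxy."""
    # Recreate Cowles' averaging on the DJIA: the mean price within each month.
    djia_avg = djia_daily.resample("M").mean()
    # The un-averaged alternative: the DJIA's month-end close.
    djia_end = djia_daily.resample("M").last()
    # How much averaging shifted each month's level...
    adjustment = djia_end / djia_avg
    # ...assumed to apply equally to Cowles' averaged index
    # (both series must share the same month-end dates).
    return cowles_monthly_avg * adjustment

# Tiny made-up example: three months of fake daily DJIA closes and a fake
# month-averaged Cowles series, just to show the call.
days = pd.date_range("1930-01-01", "1930-03-31", freq="B")
djia_daily = pd.Series(np.linspace(240, 270, len(days)), index=days)
cowles_avg = pd.Series([9.8, 10.1, 10.4],
                       index=pd.date_range("1930-01-31", periods=3, freq="M"))
print(deaverage(cowles_avg, djia_daily))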
They estimate monthly dividends (not just annual dividends), which allows dividends to be reinvested throughout the year and not just at year's end.
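Here is a small, self-contained illustration of why that matters: the same price path and dividends, with dividends reinvested monthly versus held as cash until year-end. The prices and dividends are hypothetical, not taken from the paper.

```python
prices    = [100, 101, 99, 102, 104, 103, 105, 107, 106, 108, 110, 112]
dividends = [0.4] * 12          # hypothetical dividend paid each month, per share

# Monthly reinvestment: buy more shares every month with that month's dividend.
shares = 1.0
for p, d in zip(prices, dividends):
    shares += shares * d / p
monthly_tr = shares * prices[-1] / prices[0] - 1

# Annual reinvestment: hold the dividends as cash until year-end
# (ignoring any return on that cash, as a simplification).
cash = sum(dividends)
annual_tr = (prices[-1] + cash) / prices[0] - 1

print(f"monthly reinvestment: {monthly_tr:.2%}, annual: {annual_tr:.2%}")
```

With monthly reinvestment the dividends themselves earn the market's return for the rest of the year, so the reported total return comes out a bit higher in rising markets and a bit lower in falling ones.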
Why not just use this?
Well, their arguments sound reasonable enough. But of course they would; otherwise they wouldn't have published their paper. The bigger question is: why does Googling turn up virtually no references to or follow-up on this paper? Why do so many smart, respected authors not follow this paper's suggestions?
Does the paper make sense, or is it all a bunch of mumbo jumbo, best ignored?