Simba's backtesting spreadsheet [a Bogleheads community project]

Discuss all general (i.e. non-personal) investing questions and issues, investing news, and theory.
User avatar
LadyGeek
Site Admin
Posts: 95696
Joined: Sat Dec 20, 2008 4:34 pm
Location: Philadelphia
Contact:

Re: Simba's backtesting spreadsheet [a Bogleheads community project]

Post by LadyGeek »

The table is also in our Canadian sister site's wiki: Periodic table of annual returns - finiki, the Canadian financial wiki
Wiki To some, the glass is half full. To others, the glass is half empty. To an engineer, it's twice the size it needs to be.
robertdc
Posts: 2
Joined: Tue Feb 14, 2023 11:34 pm

Re: Simba's backtesting spreadsheet [a Bogleheads community project]

Post by robertdc »

siamond wrote: Thu Feb 16, 2023 8:01 am
robertdc wrote: Tue Feb 14, 2023 11:46 pm Canadian Investor, been looking for months to find something almost exactly like this. As someone much less literate in excel though, I'm finding it terribly difficult to get it to do what I want it to, namely show all returns in CAD, and include Canadian asset classes for home bias. I can see the data in the raw data sheet, but cant use them in the sheet's built in calculators? Is there a setting for this? Or are they included for reference only? I can't for the life of me find any information or instruction on how to use Canadian Data in the document anywhere, in the document nor elsewhere.
Hi there. Yes, the intent of Canadian series is mostly for reference. With a bit of care, one can set up the spreadsheet to use all the tools with those series though. This would be limitative as there are not that many CAD series and the spreadsheet makes an implicit assumption that portfolios are made of a single/consistent currency, but you could certainly run backtests for 3-funds/4-funds portfolios constructs.

Let me assemble a slightly custom version of the spreadsheet with such CAD-centric set-up and I'll share (probably later today). If you find it useful, then maybe I'll take a stab at documenting the process.
Wow, that would be amazing, thank you so much. I wasn't expecting anyone to make any adjustments, was just more or less asking if it was capable of doing what I'm asking, and if so, how to do it. But hey, I'll definitely take it, and I'll get back to you on how it's working out for me. thanks again, and Please don't feel rushed by any means though.
User avatar
siamond
Posts: 6010
Joined: Mon May 28, 2012 5:50 am

Re: Simba's backtesting spreadsheet [a Bogleheads community project]

Post by siamond »

robertdc wrote: Thu Feb 16, 2023 8:29 pm
siamond wrote: Thu Feb 16, 2023 8:01 am Let me assemble a slightly custom version of the spreadsheet with such CAD-centric set-up and I'll share (probably later today). If you find it useful, then maybe I'll take a stab at documenting the process.
Wow, that would be amazing, thank you so much. I wasn't expecting anyone to make any adjustments, was just more or less asking if it was capable of doing what I'm asking, and if so, how to do it. But hey, I'll definitely take it, and I'll get back to you on how it's working out for me. thanks again, and Please don't feel rushed by any means though.
Ok, try this one, click here to download.

What I did is fairly simple (once you know!). In the Data_Series tab, I removed all the references to US-centric series in the first row (just clearing cells content). Then I entered the name of the Canadian series we have in Raw_Data. That's it! A couple of things to remember though:
1. don't forget to update the inflation series at the very end
2. keep the order of the first four series unchanged (e.g. domestic, international, bonds, bills)
3. to make things prettier, it's also desirable to redefine the portfolios set up by default in Analyze_Portfolio, Compare_Portfolios, Lazy_Portfolios

Let me know how this works for you...
User avatar
siamond
Posts: 6010
Joined: Mon May 28, 2012 5:50 am

Re: Simba's backtesting spreadsheet [a Bogleheads community project]

Post by siamond »

Another Simba update with 2022 returns and a few minor improvements.

You can download the spreadsheet here: https://bit.ly/3xOGxsi

(copy the link to a separate browser tab/window if it doesn't download from here)

Rev22c:
1a. Fixed bug in Analyze_Series: Sharpe ratio and Int'l correlation formulas were referencing incorrect data series.
1b. Implemented the 1994 version of the Sharpe ratio math in Analyze_Series, to replace the older type of math
2. Added instructions in the README tab to customize the spreadsheet for Canadian data series
Last edited by siamond on Mon Jun 05, 2023 8:04 pm, edited 1 time in total.
TheContrarian
Posts: 165
Joined: Wed May 03, 2023 6:42 pm
Location: Greatest snow on earth

Re: Simba's backtesting spreadsheet [a Bogleheads community project]

Post by TheContrarian »

siamond wrote: Fri Feb 24, 2023 2:56 pm Another Simba update with 2022 returns and a few minor improvements.

You can download the spreadsheet here: http://bit.ly/3xOGxsi

(copy the link to a separate browser tab/window if it doesn't download from here)

Rev22c:
1a. Fixed bug in Analyze_Series: Sharpe ratio and Int'l correlation formulas were referencing incorrect data series.
1b. Implemented the 1994 version of the Sharpe ratio math in Analyze_Series, to replace the older type of math
2. Added instructions in the README tab to customize the spreadsheet for Canadian data series
This download link doesn't seem to work.
User avatar
Eagle33
Posts: 2392
Joined: Wed Aug 30, 2017 3:20 pm

Re: Simba's backtesting spreadsheet [a Bogleheads community project]

Post by Eagle33 »

isaachemingway wrote: Mon Jun 05, 2023 10:59 am
siamond wrote: Fri Feb 24, 2023 2:56 pm Another Simba update with 2022 returns and a few minor improvements.

You can download the spreadsheet here: http://bit.ly/3xOGxsi

(copy the link to a separate browser tab/window if it doesn't download from here)

Rev22c:
1a. Fixed bug in Analyze_Series: Sharpe ratio and Int'l correlation formulas were referencing incorrect data series.
1b. Implemented the 1994 version of the Sharpe ratio math in Analyze_Series, to replace the older type of math
2. Added instructions in the README tab to customize the spreadsheet for Canadian data series
This download link doesn't seem to work.
The link downloaded the file for me.
User avatar
siamond
Posts: 6010
Joined: Mon May 28, 2012 5:50 am

Re: Simba's backtesting spreadsheet [a Bogleheads community project]

Post by siamond »

isaachemingway wrote: Mon Jun 05, 2023 10:59 am
siamond wrote: Fri Feb 24, 2023 2:56 pm Another Simba update with 2022 returns and a few minor improvements.

You can download the spreadsheet here: http://bit.ly/3xOGxsi

(copy the link to a separate browser tab/window if it doesn't download from here)

Rev22c:
1a. Fixed bug in Analyze_Series: Sharpe ratio and Int'l correlation formulas were referencing incorrect data series.
1b. Implemented the 1994 version of the Sharpe ratio math in Analyze_Series, to replace the older type of math
2. Added instructions in the README tab to customize the spreadsheet for Canadian data series
This download link doesn't seem to work.
Hm. This is weird. More than 600 folks clicked on the link and I didn't get any complaint so far. And yet, I just had the same issue as you... :shock:

I changed the shortened link to use https and this seems to work better... Please give it a try and let me know?
https://bit.ly/3xOGxsi
McQ
Posts: 1425
Joined: Fri Jun 18, 2021 12:21 am
Location: California

Re: Simba's backtesting spreadsheet [a Bogleheads community project]

Post by McQ »

Tyler9000 wrote: Fri May 27, 2022 3:16 pm
McQ wrote: Thu May 26, 2022 10:15 pm I'll share results once achieved (not next week or next month :-)
I just wanted to second Siamond's interest in your work on this. I look forward to seeing what you're able to share.
It took longer than anticipated, but here is the paper: https://papers.ssrn.com/sol3/papers.cfm ... id=4457203

Table A.1 has the annual returns free for anyone to download and reuse as they see fit, with attribution.

I started a thread on the paper, so as not to burden the SIMBA thread: viewtopic.php?t=405951.

There's a fair amount of history on mutual funds, expense ratios long ago, problems with the intermediate Treasury index in the SBBI, and more.
You can take the academic out of the classroom by retirement, but you can't ever take the classroom out of his tone, style, and manner of approach.
User avatar
siamond
Posts: 6010
Joined: Mon May 28, 2012 5:50 am

Re: Simba's backtesting spreadsheet [a Bogleheads community project]

Post by siamond »

From another thread, a discussion about the relevance of fund costs (expenses)...
McQ wrote: Wed Jun 07, 2023 6:16 pm
siamond wrote: Wed Jun 07, 2023 12:12 am PS. Amusingly enough, skimming through this paper rekindled a line of thinking which was getting stronger and stronger in my mind during my last forays of historical data analysis. Something possibly shockingly opposite to what most people might get out of McQ's research.

Instead of using real-life index funds returns for known years, why not use index numbers ALL ALONG? And then apply a constant and more modern expense ratio for the entire historical record. It seems to me that this would be much more consistent when it comes to analyzing the past to try to determine a strategy for the future (as opposed to establishing a more factual past for the sake of history). And well, it seems to me that only scholars do the latter while most of us personal investors do the former...

Fact is past fund costs have varied a lot (and this new research from McQ makes the point even more glaring), to the point of really obfuscating what a large index fund could have achieved, had it existed by then. Maybe removing such cost factors 'noise' would actually help to isolate the real 'signal'...

I am not 100% convincing myself, it is certainly uncomfortable to ignore real-life data in favor of more theoretical constructs, but fact is modern index funds technology got pretty darn close to 'theoretical' index numbers...
Thanks for taking a look, Siamond. Your stewardship of the Simba spreadsheet makes you part of the core audience when I write papers of this kind. Speaking of which, I have been getting great value from the Simba spreadsheet of late, not least because it is timely updated each year. Wanted you to know how valuable I have found it to be.

Now to your point about the relevance of costs and / or the best way to apply costs to long ago index returns.

No question but that academics prefer to ignore costs. It simplifies things. The justification will be that costs are hard to estimate, may vary across investors, and may not be stable over time, thus obscuring the phenomenon of interest, which is typically the return on a risk asset versus the return on a risk-free or riskless asset, or the return on risk assets under one macroeconomic circumstance rather than another, or the return on one risk asset versus another (small, large). In this context, costs are like a variable that occurs on both sides of the equals sign, hence can be removed without loss of information.

Ignoring costs leads to certain absurdities, of course. Consider a small stock trading on the NYSE near the bottom in 1932; quite a few of these were quoted at 1/8 bid, 1/4 ask, while not trading very often. No problem if you are an academic: you can always buy and sell arbitrary amounts at the midpoint of the bid-ask spread, even if the stock hasn’t traded for weeks. To paraphrase an old Mel Brooks movie, it’s good to be the professor.

In this example, an academic might record a purchase at 3/16, say 100 shares, for a cost of $18.75; next month, if the market bounced and the quote moved to 1/4 bid, 3/8 asked, you’d value the position at $31.25 (5/16), showing a 67% gain on that part of your, ahem, small value portfolio.

The actual investor would have had to pay the ask quote, opening the position at $25 + $7.50 commission, or $32.50; then had to sell at the bid, realizing $25 – $7.50, or $17.50, booking a loss of 46%. Costs matter, as Mr. Bogle used to say.

The other absurdity comes about when trading costs are different across assets. For the paper I estimated intermediate Treasury returns assuming purchase at the ask and sale at the bid; that reduced the returns in Bengen/SBBI, but not by much compared to that 158 bp reduction for purchasing a stock mutual fund rather than assuming stock index returns.

Returning to your dilemma as steward of the SIMBA spreadsheets, currently I believe you apply the most recent expense ratio on an index mutual fund to the index, for each year prior to the availability of the fund. Admiral fund expenses have stabilized at 4-5 bp, so results don’t change much on the yearly update.

If not an onerous burden, I would do it differently:
1. Actual with-expense fund return each year the fund exists (as now)
2. Before the fund exists (1977 for the V S&P 500 fund), a drop down list allows the user to see returns with 5, 10, 20, or 50 bp deducted.* The default in the downloaded copy would be 5 bp, close to current values. The user can then apply a larger expense to older data, copy for pasting into their own spreadsheet, and restore the default.

Worth your time? Probably not, but worth my time to mention😊

*50 bp was a a typical management fee for decades. The 158 bp shortfall reflects in addition trading costs, failed market timing efforts, failed stock selection, drag of holding cash, and everything else that differentiates a fund in the world from an index in the lab.
I fully agree the default settings about applying fund costs (expenses) to past index/model returns in Simba aren't too good. This is a methodology which has been in place since this spreadsheet was created (way before my time). I never liked it, but I never changed it, all I did was adding some flexibility with more advanced settings I'm pretty sure nobody uses...

I would also assert that using past expense ratios for long-lived (real-life) funds isn't too good either. For similar reasons, such old cost structures are things of the past, not of the present and very unlikely to re-occur in the future (low-cost index funds are there to say, I believe).

I do agree we need to find a better way, and preferably something applicable to the default settings. It isn't straightforward though and likely to be controversial. I'll share more thoughts later on. PS. inputs and suggestions welcome.
McQ
Posts: 1425
Joined: Fri Jun 18, 2021 12:21 am
Location: California

Re: Simba's backtesting spreadsheet [a Bogleheads community project]

Post by McQ »

siamond wrote: Thu Jun 08, 2023 3:00 pm ...
I fully agree the default settings about applying fund costs (expenses) to past index/model returns in Simba aren't too good. This is a methodology which has been in place since this spreadsheet was created (way before my time). I never liked it, but I never changed it, all I did was adding some flexibility with more advanced settings I'm pretty sure nobody uses...

I would also assert that using past expense ratios for long-lived (real-life) funds isn't too good either. For similar reasons, such old cost structures are things of the past, not of the present and very unlikely to re-occur in the future (low-cost index funds are there to say, I believe).

I do agree we need to find a better way, and preferably something applicable to the default settings. It isn't straightforward though and likely to be controversial. I'll share more thoughts later on. PS. inputs and suggestions welcome.
Here is another possibility (and I do emphasize these are all suggestions for you to judge).

1. Currently once an index fund becomes available (VFIAX, say) it’s returns are entered in the highlighted column for its section as ‘the’ returns for this category (e.g., from 1977).

2. An alternative would be to place the cost-free index itself in the highlighted column as ‘the’ return for this category, same as for the years before the fund existed (but without the back application of the current expense ratio).

3. And then express VFIAX returns in its column as the *difference* from the index. So, for 2022, enter not ‘-18.15’, but just ‘-.04’, with the index value of -18.11 in the main column.

4. Sometimes the difference will be positive, as in the chart I posted upthread. Whatever. The possibility of positive tracking error is useful to know.

This is one solution if you are serious about scrubbing out the misleading effect of changing costs. It has the additional advantage of showing at a glance how fund costs have changed over time, without obscuring how the index behaved over time.

Can only do it for index funds, of course. And when you get to some cases (Total Market) the dispersion across indexes will give you fits (subject of a future thread).
You can take the academic out of the classroom by retirement, but you can't ever take the classroom out of his tone, style, and manner of approach.
longinvest
Posts: 5682
Joined: Sat Aug 11, 2012 8:44 am

Re: Simba's backtesting spreadsheet [a Bogleheads community project]

Post by longinvest »

In Rev22c, cell $Raw_Data.X159 contains the value -19.525165 as VTSAX 2022 return. Vanguard's VTSAX investor web page reports a return of -19.53. Where does the spreadsheet value, -19.525165, come from?
Variable Percentage Withdrawal (bogleheads.org/wiki/VPW) | One-Fund Portfolio (bogleheads.org/forum/viewtopic.php?t=287967)
User avatar
siamond
Posts: 6010
Joined: Mon May 28, 2012 5:50 am

Re: Simba's backtesting spreadsheet [a Bogleheads community project]

Post by siamond »

longinvest wrote: Thu Jun 08, 2023 7:09 pm In Rev22c, cell $Raw_Data.X159 contains the value -19.525165 as VTSAX 2022 return. Vanguard's VTSAX investor web page reports a return of -19.53. Where does the spreadsheet value, -19.525165, come from?
Well, obviously, -19.53 is the rounded version of the same number. But to answer your question, the quasi-majority of real-life funds and securities data comes from Morningstar, which provides a total return index. This approach strongly simplifies my life for annual returns, allowing me to automate most of the data retrieval.

Then the annual return is index (N+1) / index (N) minus 1, hence a bunch of digits (which I arbitrarily limited to 6). Then I never needlessly eliminate digits for calculations, only displays of the various returns, metrics and charts are rounded. Doesn't make much of a difference in practice, of course.

EDIT: you got me curious, I checked the total return index numbers, and they are provided with 5 digits. I thought I had used the same number of digits for the result of "index (N+1) / index (N) minus 1", but... it appears that I often used 6. Yeah, I probably should clean this up and settle on 5 digits. I mean, this has zero practical consequence, but it is a tad inconsistent. I took a note of this.
Last edited by siamond on Thu Jun 08, 2023 10:13 pm, edited 1 time in total.
User avatar
siamond
Posts: 6010
Joined: Mon May 28, 2012 5:50 am

Re: Simba's backtesting spreadsheet [a Bogleheads community project]

Post by siamond »

McQ wrote: Thu Jun 08, 2023 5:39 pmHere is another possibility (and I do emphasize these are all suggestions for you to judge).

1. Currently once an index fund becomes available (VFIAX, say) it’s returns are entered in the highlighted column for its section as ‘the’ returns for this category (e.g., from 1977).

2. An alternative would be to place the cost-free index itself in the highlighted column as ‘the’ return for this category, same as for the years before the fund existed (but without the back application of the current expense ratio).

3. And then express VFIAX returns in its column as the *difference* from the index. So, for 2022, enter not ‘-18.15’, but just ‘-.04’, with the index value of -18.11 in the main column.

4. Sometimes the difference will be positive, as in the chart I posted upthread. Whatever. The possibility of positive tracking error is useful to know.

This is one solution if you are serious about scrubbing out the misleading effect of changing costs. It has the additional advantage of showing at a glance how fund costs have changed over time, without obscuring how the index behaved over time.

Can only do it for index funds, of course. And when you get to some cases (Total Market) the dispersion across indexes will give you fits (subject of a future thread).
My inclination would be to have a default approach using cost-free index returns till now (and whatever index and/or synthetic model numbers for older returns in the same series), combined with a fixed ER equal to the real-life fund latest ER (expense ratio). Which is essentially how the existing spreadsheet works if we were to (shockingly to some people?) ignore all real-life fund returns. We can actually already do it, but through a few advanced settings nobody pays attention to... And then the actual use of real-life funds returns (i.e. the existing default approach) would become an option disabled by default, and something which could be restored by a single toggle. Such (default) approach would eliminate a lot of the historical vagaries of expenses & costs.

And then yes, I like your idea of showing tracking errors in an explicit manner. Although we might then field questions about odd tracking errors which could be due to simple reasons like the index fund changing its index of reference mid-year (why would a fund manager do that is beyond me, but fact is this happened quite often), but that's ok, data is data and if this can make people think about such data, then... that's good.

The trouble with this index-centric approach (as I think you hinted at) is that we also have various series in there which provide historical data about active funds and/or funds without an index of reference. I thought for a while that the answer would be to use 'GR' values (i.e. the fund cost-free return, before the ER is applied). Ideally, we would aim at either finding such historical GR values or historical ER values, but such history is unfortunately limited (plus M* recently made it harder to retrieve). And... I got a little stuck when I went down this path.

Another approach I considered in the past would be to think more in historical terms and make a deliberate effort to find the ER that was used the first full year of existence of a given fund (passive or active). And then use this number when combining with past index/model numbers. This certainly would be more consistent than the current approach, although it wouldn't solve the 'noise' of varying historical expenses. It might also introduce unfortunate side-effects (operating expenses of new/small funds might be high by nature, albeit transient). Trouble is such old ER data appears to be hard to find. I may have given up on this approach a tad too easily though, a one-time brute-force manual research might be effective... And maybe we could come up with an arbitrary default value for tough cases, as you suggested in your first response, although this seems rather arbitrary and not reflective of reality (e.g. Int'l funds are harder / more costly to manage, etc). If (and that's a big if) we could make this work, we could then package the index-only approach in a more user-friendly manner (but not make it the default), while displaying some big red warning that it just won't work with active funds.

Maybe the first step would be to agree on a high-level goal. What is more important? True historical data (very factual, but kind of contaminated by the noise of historical costs)? Or cost-free data (less contaminated by noise, maybe idealistically so), more relevant to strategy definition? I would like to have both in truth, but agreeing on what should be the default approach would help guiding priorities. Practicalities would likely remain in the way though...
FactualFran
Posts: 2777
Joined: Sat Feb 21, 2015 1:29 pm

Re: Simba's backtesting spreadsheet [a Bogleheads community project]

Post by FactualFran »

siamond wrote: Thu Jun 08, 2023 8:48 pm
longinvest wrote: Thu Jun 08, 2023 7:09 pm In Rev22c, cell $Raw_Data.X159 contains the value -19.525165 as VTSAX 2022 return. Vanguard's VTSAX investor web page reports a return of -19.53. Where does the spreadsheet value, -19.525165, come from?
Well, obviously, -19.53 is the rounded version of the same number. But to answer your question, the quasi-majority of real-life funds and securities data comes from Morningstar, which provides a total return index. This approach strongly simplifies my life for annual returns, allowing me to automate most of the data retrieval.

Then the annual return is index (N+1) / index (N) minus 1, hence a bunch of digits (which I arbitrarily limited to 6). Then I never needlessly eliminate digits for calculations, only displays of the various returns, metrics and charts are rounded. Doesn't make much of a difference in practice, of course.
The total return index from Morningstar likely starts with the initial NAV and adjusts for each distribution. The NAV was 117.56 at the end of 2021 and 93.1 at the and of 2022. The fund had the following distributions and reinvestment NAVs during 2022.

Code: Select all

Dist     NAV
0.3422  110.44
0.3621   91.10
0.3847   91.29
0.4507   94.00
The total return for the year is given by the formula

=93.1/117.56*(0.3422/110.44+1)*(0.3621/91.1+1)*(0.3847/91.29+1)*(0.4507/94+1)-1

which is -19.525164%, using the number of digits used in the other posts. Because the values used in the formula have four or more significant digits, the result is meaningful up to four significant digits, a rounded value of -19.53%.
TheContrarian
Posts: 165
Joined: Wed May 03, 2023 6:42 pm
Location: Greatest snow on earth

Re: Simba's backtesting spreadsheet [a Bogleheads community project]

Post by TheContrarian »

siamond wrote: Mon Jun 05, 2023 8:07 pm
isaachemingway wrote: Mon Jun 05, 2023 10:59 am
siamond wrote: Fri Feb 24, 2023 2:56 pm Another Simba update with 2022 returns and a few minor improvements.

You can download the spreadsheet here: http://bit.ly/3xOGxsi

(copy the link to a separate browser tab/window if it doesn't download from here)

Rev22c:
1a. Fixed bug in Analyze_Series: Sharpe ratio and Int'l correlation formulas were referencing incorrect data series.
1b. Implemented the 1994 version of the Sharpe ratio math in Analyze_Series, to replace the older type of math
2. Added instructions in the README tab to customize the spreadsheet for Canadian data series
This download link doesn't seem to work.
Hm. This is weird. More than 600 folks clicked on the link and I didn't get any complaint so far. And yet, I just had the same issue as you... :shock:

I changed the shortened link to use https and this seems to work better... Please give it a try and let me know?
https://bit.ly/3xOGxsi
That worked. Thank you so much!

- @isaachemingway
TheContrarian
Posts: 165
Joined: Wed May 03, 2023 6:42 pm
Location: Greatest snow on earth

Re: Simba's backtesting spreadsheet [a Bogleheads community project]

Post by TheContrarian »

siamond wrote: Mon Jun 05, 2023 8:07 pm
isaachemingway wrote: Mon Jun 05, 2023 10:59 am
siamond wrote: Fri Feb 24, 2023 2:56 pm Another Simba update with 2022 returns and a few minor improvements.

You can download the spreadsheet here: http://bit.ly/3xOGxsi

(copy the link to a separate browser tab/window if it doesn't download from here)

Rev22c:
1a. Fixed bug in Analyze_Series: Sharpe ratio and Int'l correlation formulas were referencing incorrect data series.
1b. Implemented the 1994 version of the Sharpe ratio math in Analyze_Series, to replace the older type of math
2. Added instructions in the README tab to customize the spreadsheet for Canadian data series
This download link doesn't seem to work.
Hm. This is weird. More than 600 folks clicked on the link and I didn't get any complaint so far. And yet, I just had the same issue as you... :shock:

I changed the shortened link to use https and this seems to work better... Please give it a try and let me know?
https://bit.ly/3xOGxsi
Also, in the portfolio results section of the spreadsheet, do the real and nominal CAGR results factor in rebalancing? If so, how often does the rebalancing occur?

Thank you so much!
User avatar
siamond
Posts: 6010
Joined: Mon May 28, 2012 5:50 am

Re: Simba's backtesting spreadsheet [a Bogleheads community project]

Post by siamond »

isaachemingway wrote: Sun Jun 11, 2023 2:15 pmAlso, in the portfolio results section of the spreadsheet, do the real and nominal CAGR results factor in rebalancing? If so, how often does the rebalancing occur?
Yes and yes. Annual rebalancing.
TheContrarian
Posts: 165
Joined: Wed May 03, 2023 6:42 pm
Location: Greatest snow on earth

Re: Simba's backtesting spreadsheet [a Bogleheads community project]

Post by TheContrarian »

siamond wrote: Sun Jun 11, 2023 4:33 pm
isaachemingway wrote: Sun Jun 11, 2023 2:15 pmAlso, in the portfolio results section of the spreadsheet, do the real and nominal CAGR results factor in rebalancing? If so, how often does the rebalancing occur?
Yes and yes. Annual rebalancing.
Thanks! - @isaachemingway
McQ
Posts: 1425
Joined: Fri Jun 18, 2021 12:21 am
Location: California

Re: Simba's backtesting spreadsheet [a Bogleheads community project]

Post by McQ »

Calculation of Shiller S&P return

Siamond, this is likely the nth iteration for you regarding Shiller, with martincmartin most recently, but there have probably been more. Apologies if the answer to my query has already been posted. And double apologies for dragging you back into what has probably become an onerous burden (Simba maintenance and updates).

I understand that Shiller calculates the S&P return differently than, say, Standard & Poor’s itself (averaged prices rather than month end prices, yada yada). That’s fine, no problem.

My concern is how to derive the Simba spreadsheet value for the Shiller S&P return from the Shiller spreadsheet posted on his web site (and if Shiller’s web site is not the source for Simba, please let me know).

Here is a snippet of the Simba values for Shiller and selected other large cap indices, from the LCB section (columns AG and following in Simba). Only the most recent six years are shown.

Image

The official S&P numbers are highlighted. As we would all hope, VFIAX the mutual fund tracks these S&P values to within a few basis points each year.

Next, compare the Russell 1000 results to the highlighted S&P column. This comparison shows what might be called permissible or expected deviation from using two different good faith estimates of how large cap stocks performed: the one an edited selection of 500 stocks with a quality tilt (S&P), the other a strict capitalization cut, covering twice as many stocks, with most of the additions midcaps (Russell).

The Russell-S&P difference jumps around through a wide range. The difference was only 6 basis points in 2019, but over 200 bp difference in each of the next two years.

All the entries discussed thus far are copy and paste, in my understanding. No calculation by the Simba steward.

Now look at the Shiller column compared to the highlighted S&P column. Shiller entries are not copy and paste, in my understanding, but calculated somehow during the annual Simba update (more below). The gap between Shiller values and the highlighted S&P column is sometimes huge—about 600 bp in each of the last two years. That seems more than could be accounted for by averaging prices over the month.

Is there an error in Simba? Or is Shiller’s own approach to S&P calculations weirder than I understand?

Next, here is the current Shiller spreadsheet downloaded a few days ago from: http://www.econ.yale.edu/~shiller/data.htm
Highlighted values are calculations by me.

Image

I could not reproduce the Simba value of a 11.16% decline in 2022. I tried two approaches:

1. Take the December 2022 value of the composite index. Add the trailing 12 month dividends (next column over, December 2022 row). Divide by the prior year’s December index value. This gives a 2022 return of -14.88%. That’s internally consistent, about 150 bp greater than price change alone, corresponding to the S&P yield in early 2022.

2. Alternatively, compound the dividends monthly. Take January’s index value, add 1/12 of January’s trailing 12 month dividend, and divide that by the prior month’s index value. This gives a decline of about 2.05%. Compound those monthly returns, and this produces a decline of 14.99% for the calendar year.

Still a long way from the Simba value of minus 11.16%

Simba must use different inputs or a different calculation procedure. Can you advise?
You can take the academic out of the classroom by retirement, but you can't ever take the classroom out of his tone, style, and manner of approach.
User avatar
siamond
Posts: 6010
Joined: Mon May 28, 2012 5:50 am

Re: Simba's backtesting spreadsheet [a Bogleheads community project]

Post by siamond »

McQ wrote: Tue Jun 13, 2023 4:13 pm Now look at the Shiller column compared to the highlighted S&P column. Shiller entries are not copy and paste, in my understanding, but calculated somehow during the annual Simba update (more below). The gap between Shiller values and the highlighted S&P column is sometimes huge—about 600 bp in each of the last two years. That seems more than could be accounted for by averaging prices over the month.

Is there an error in Simba? Or is Shiller’s own approach to S&P calculations weirder than I understand?
Shiller's approach tries to enforce some form of consistency between early years and recent years. This doesn't make a lot of sense to me, but this is his way... My memory might be a little fuzzy (it's hard to find a proper description of Prof. Shiller's methodology), but here is my recollection of the key points:
* Yes, the price averaging thing is one big factor (and I remember being quite surprised by the impact this single factor can have)
* The dividends provided by Prof. Shiller appear to be a TTM (trailing twelve months) value, recalculated on a monthly basis

In the S&P series, price is January 1st, while dividends are reinvested as soon as they are distributed (e.g. daily when historical data started to be recorded more precisely).

Finally, Prof. Shiller appears to do the math himself, possibly with the help of a student or another. At some point I spotted a few minor errors in his data entry against some S&P values (I was trying to reconstruct elements of his methodology), I contacted him, he very kindly answered, he acknowledged the mistakes and discreetly fixed the numbers! Well, most of them, he left a couple of rounding errors and I didn't push it. So... I don't think there is any peer review in the process and well, there might be a few other 'oddities' I didn't spot by then...

To be honest, I find it quite impressive that when we compute the CAGR of Prof. Shiller series and the S&P (1936) series over 90-ish years, differences turn out to be very small. Yes, if you compare one single data point to another, it appears troubling, but it's mostly small quantities being shifted around between year N and year N+1...
McQ wrote: Tue Jun 13, 2023 4:13 pm [...] Simba must use different inputs or a different calculation procedure. Can you advise?
To try to approximate dividends reinvestment the best I could, I ended up building a monthly TR index (idea inspired by an article from Philosophical Economics, and something consistent with the way I extract more regular data series). The TR index is derived from the good professor's monthly series, increasing the index according to the variation of price (Jan to Jan) and 1/12th of the latest known dividends. And then I simply compute the annual return by comparing the index from Jan of the current year and Jan of the previous year. I can send you a spreadsheet extract if you want to dig a bit further, just PM me.
McQ
Posts: 1425
Joined: Fri Jun 18, 2021 12:21 am
Location: California

Re: Simba's backtesting spreadsheet [a Bogleheads community project]

Post by McQ »

[All: Siamond graciously provided his calculations on the Shiller data offline]

Ah, now I see what happened: For the Shiller data only, "2022 return" is the return from January 2022 through January of 2023, and so also for other years: January to January. Which recalculation gives me the Simba value, no error.

My calculations above assumed a December to December return, understood as year-end to year-end return, as with the other stock indexes, and in keeping with ordinary mutual fund reporting, with performance measured over a calendar year.

Philosophically, a case can be made that the way Siamond does it in Simba is correct, for the following reasons:

1. Cowles and Macaulay, Shiller's sources, don't have a December 1870 anchor point. Their very first index value is January 1871. For Siamond/Simba to maintain consistency all the way to the present day, returns must be calculated January to January; else, an 11 month year has to be inserted somewhere.

2. Since Shiller prices are an average of the month's prices, not month-end prices, a case can be made that January-to-January averages are no worse an approximation to the calendar year returns than December to December prices. Both are "off" by half a month, in a manner of speaking.

The gap will be greatest when there is a big movement in either January--as in 2023, with a big bounce, thus the 600 bp difference between Shiller and the other S&P estimates for 2022 performance.

Siamond, with that cleared up, I would only ask for a clear labelling in the spreadsheet, whether by font color or asterisk, to indicate that the Shiller series, unlike virtually every other one in Simba, is *NOT* a calendar year series.
You can take the academic out of the classroom by retirement, but you can't ever take the classroom out of his tone, style, and manner of approach.
User avatar
martincmartin
Posts: 900
Joined: Wed Jul 02, 2014 3:04 pm
Location: Boston, MA USA

Re: Simba's backtesting spreadsheet [a Bogleheads community project]

Post by martincmartin »

Thank you very much, McQ and Siamond!

To confirm my understanding, since you've been looking at this recently and have it in your head, is this the correct interpretation of Shiller's data, to get total monthly returns for stocks and bonds?

Use the columns P (stock price), D (dividends), CPI and RateGS10 (10-year bonds).

Here, a prev_ prefix means the value of that column from the previous month.

Total nominal stock returns: (P + D / 12) / prev_P

Total nominal bond returns: prev_RateGS10 / RateGS10 + prev_RateGS10 / 1200 + ((1 + RateGS10 / 1200) ^ -119) * (1 - prev_RateGS10 / RateGS10)
That's the formula in the "Monthly Total Bond Returns" column, column T, in Shiller's ie_data.xls, tab "Data".

To adjust either stocks or bonds for inflation: value * prev_CPI / CPI

This seems to be the calculation in Shiller's ie_data.xls . It would be great if all four of us (Shiller, Siamond, McQ and I) could agree on the same formula.
GAAP
Posts: 2556
Joined: Fri Apr 08, 2016 12:41 pm

Re: Simba's backtesting spreadsheet [a Bogleheads community project]

Post by GAAP »

It would be great if one of y'all could put a separate summary of how to interpret Shiller as an independent post that can be bookmarked for later reference.
“Adapt what is useful, reject what is useless, and add what is specifically your own.” ― Bruce Lee
User avatar
siamond
Posts: 6010
Joined: Mon May 28, 2012 5:50 am

Re: Simba's backtesting spreadsheet [a Bogleheads community project]

Post by siamond »

McQ wrote: Wed Jun 14, 2023 4:31 pm [All: Siamond graciously provided his calculations on the Shiller data offline]

Ah, now I see what happened: For the Shiller data only, "2022 return" is the return from January 2022 through January of 2023, and so also for other years: January to January. Which recalculation gives me the Simba value, no error.

My calculations above assumed a December to December return, understood as year-end to year-end return, as with the other stock indexes, and in keeping with ordinary mutual fund reporting, with performance measured over a calendar year.

Philosophically, a case can be made that the way Siamond does it in Simba is correct, for the following reasons:

1. Cowles and Macaulay, Shiller's sources, don't have a December 1870 anchor point. Their very first index value is January 1871. For Siamond/Simba to maintain consistency all the way to the present day, returns must be calculated January to January; else, an 11 month year has to be inserted somewhere.

2. Since Shiller prices are an average of the month's prices, not month-end prices, a case can be made that January-to-January averages are no worse an approximation to the calendar year returns than December to December prices. Both are "off" by half a month, in a manner of speaking.

The gap will be greatest when there is a big movement in either January--as in 2023, with a big bounce, thus the 600 bp difference between Shiller and the other S&P estimates for 2022 performance.

Siamond, with that cleared up, I would only ask for a clear labelling in the spreadsheet, whether by font color or asterisk, to indicate that the Shiller series, unlike virtually every other one in Simba, is *NOT* a calendar year series.
Yes, points 1 and 2 were indeed part of my (admittedly somewhat arbitrary) reasoning. I was also uncomfortable with Dec to Dec math because the end of year dividend distributions would not have been accounted for (or more generally any end-of-year dynamics).

I saw other publications make the same Jan to Jan assumption. Case in point, the Shiller PE (CAPE) math from Multpl.com reports values derived from the Jan price from the good professor, while (inaccurately) labeling the CAPE value as Jan 1st (hence implying end of previous calendar year).

The drawback of doing Jan-to-Jan is that Prof. Shiller tends to publish an update mid-Jan with a price *snapshot* (e.g. Jan 13th) and without dividends. Which I use to compute the corresponding last year return (using the previous dividends entry as a first approximation). But then when he updates the January numbers in February, those numbers change (e.g. price becomes a full month average)... And I have to do the math again... And case in point, I forgot to do it this year!! :shock:

Ok, I'll add a comment/warning somewhere. I need to issue a Simba update in any case to account for the latest ie_data.
martincmartin wrote: Thu Jun 15, 2023 6:10 am[...] This seems to be the calculation in Shiller's ie_data.xls . It would be great if all four of us (Shiller, Siamond, McQ and I) could agree on the same formula.
Let me send you a link to the same little spreadsheet I shared with McQ and you can double-check we're in sync.
GAAP wrote: Thu Jun 15, 2023 1:06 pm It would be great if one of y'all could put a separate summary of how to interpret Shiller as an independent post that can be bookmarked for later reference.
Did you mean a new thread or just a post? Because if you meant the latter, I really don't have much more to say than what I said here:
viewtopic.php?p=7310522#p7310522

I can try to edit the post to be more articulate if you don't find it clear enough, but I did include all the salient points... Let me know.
GAAP
Posts: 2556
Joined: Fri Apr 08, 2016 12:41 pm

Re: Simba's backtesting spreadsheet [a Bogleheads community project]

Post by GAAP »

siamond wrote: Fri Jun 16, 2023 9:32 am
GAAP wrote: Thu Jun 15, 2023 1:06 pm It would be great if one of y'all could put a separate summary of how to interpret Shiller as an independent post that can be bookmarked for later reference.
Did you mean a new thread or just a post? Because if you meant the latter, I really don't have much more to say than what I said here:
viewtopic.php?p=7310522#p7310522

I can try to edit the post to be more articulate if you don't find it clear enough, but I did include all the salient points... Let me know.
I can certainly bookmark the discussion starting with that post, but it is a non-optimal way to find the information -- especially a few years from now when additional questions may have arisen. A separate thread would be better.

Actually, Shiller is important enough that a wiki article might make sense. The topic comes up frequently enough that a dedicated place would be easier to direct people to. You even noted that "it's hard to find a proper description of Prof. Shiller's methodology". The wiki has all of this to say about the PE10 calculation he does (https://www.bogleheads.org/wiki/P/E#Shiller_PE10) :
Shiller PE10

Robert Shiller has developed a stock valuation metric known as "PE10"; alternatively called CAPE (Cyclically Adjusted Price Earnings) ratio, or Shiller PE ratio. It is a variation of P/E, but with EPS (Earnings Per Share) averaged over the prior 10 years. Both terms (Price and Earnings) have to be expressed in real terms, i.e. adjusted for inflation.[8]

PE10 = Stock Price / average EPS over prior 10 years

Note that Prof. Shiller uses the same methodology as Standard & Poor's (S&P) to compute the 'Price' and 'Earnings' components of the equation (i.e. simple aggregate).
Clearly, that is oversimplified.
“Adapt what is useful, reject what is useless, and add what is specifically your own.” ― Bruce Lee
McQ
Posts: 1425
Joined: Fri Jun 18, 2021 12:21 am
Location: California

Re: Simba's backtesting spreadsheet [a Bogleheads community project]

Post by McQ »

GAAP wrote: Fri Jun 16, 2023 5:03 pm
siamond wrote: Fri Jun 16, 2023 9:32 am
GAAP wrote: Thu Jun 15, 2023 1:06 pm It would be great if one of y'all could put a separate summary of how to interpret Shiller as an independent post that can be bookmarked for later reference.
Did you mean a new thread or just a post? Because if you meant the latter, I really don't have much more to say than what I said here:
viewtopic.php?p=7310522#p7310522

I can try to edit the post to be more articulate if you don't find it clear enough, but I did include all the salient points... Let me know.
I can certainly bookmark the discussion starting with that post, but it is a non-optimal way to find the information -- especially a few years from now when additional questions may have arisen. A separate thread would be better.

Actually, Shiller is important enough that a wiki article might make sense. The topic comes up frequently enough that a dedicated place would be easier to direct people to. You even noted that "it's hard to find a proper description of Prof. Shiller's methodology". The wiki has all of this to say about the PE10 calculation he does (https://www.bogleheads.org/wiki/P/E#Shiller_PE10) :
Shiller PE10

Robert Shiller has developed a stock valuation metric known as "PE10"; alternatively called CAPE (Cyclically Adjusted Price Earnings) ratio, or Shiller PE ratio. It is a variation of P/E, but with EPS (Earnings Per Share) averaged over the prior 10 years. Both terms (Price and Earnings) have to be expressed in real terms, i.e. adjusted for inflation.[8]

PE10 = Stock Price / average EPS over prior 10 years

Note that Prof. Shiller uses the same methodology as Standard & Poor's (S&P) to compute the 'Price' and 'Earnings' components of the equation (i.e. simple aggregate).
Clearly, that is oversimplified.
I tend to agree with GAAP that the wiki is the place to get out some of the idiosyncrasies of the Shiller data; but that requires a wiki editor to take an interest. The quoted version misses the fact that CAPE is not a concern of many BH who consult the Shiller website and data series; rather, they are looking for the longest possible monthly return historical series for drawdown or other studies. And that's where the idiosyncrasies documented in the current exchange become important.

I have a paper in progress titled something like "Can Market Returns Be Known to Four Decimal Places?" The Shiller data is more grist for my mill, but it is far from the only case where the precision the, ahem, engineer-minded might expect from historical finance data just isn't there.
You can take the academic out of the classroom by retirement, but you can't ever take the classroom out of his tone, style, and manner of approach.
McQ
Posts: 1425
Joined: Fri Jun 18, 2021 12:21 am
Location: California

Re: Simba's backtesting spreadsheet [a Bogleheads community project]

Post by McQ »

martincmartin wrote: Thu Jun 15, 2023 6:10 am Thank you very much, McQ and Siamond!

To confirm my understanding, since you've been looking at this recently and have it in your head, is this the correct interpretation of Shiller's data, to get total monthly returns for stocks and bonds?

Use the columns P (stock price), D (dividends), CPI and RateGS10 (10-year bonds).

Here, a prev_ prefix means the value of that column from the previous month.

Total nominal stock returns: (P + D / 12) / prev_P

Total nominal bond returns: prev_RateGS10 / RateGS10 + prev_RateGS10 / 1200 + ((1 + RateGS10 / 1200) ^ -119) * (1 - prev_RateGS10 / RateGS10)
That's the formula in the "Monthly Total Bond Returns" column, column T, in Shiller's ie_data.xls, tab "Data".

To adjust either stocks or bonds for inflation: value * prev_CPI / CPI

This seems to be the calculation in Shiller's ie_data.xls . It would be great if all four of us (Shiller, Siamond, McQ and I) could agree on the
same formula.
I use your formula for stocks in the spreadsheet above. Hover over cells in Shiller's Column J and you'll see a somewhat different formulation that I believe is equivalent.

Shiller's bond returns are too much of a confection for me to want to go there.

I may not have mentioned that Shiller's stock price series is based on prices averaged over the month; the SBBI uses month-end prices. At shorter time frames (ten years and less) the differences can be quite remarkable in terms of estimated price appreciation.

There is also more than one way to skin the dividend cat ...

Note also that Shiller calibrates things backwards from today rather than forward from 1871
You can take the academic out of the classroom by retirement, but you can't ever take the classroom out of his tone, style, and manner of approach.
Alpha4
Posts: 176
Joined: Tue Apr 17, 2012 8:47 pm

Re: Simba's backtesting spreadsheet [a Bogleheads community project]

Post by Alpha4 »

McQ wrote: Fri Jun 16, 2023 5:29 pm
GAAP wrote: Fri Jun 16, 2023 5:03 pm
siamond wrote: Fri Jun 16, 2023 9:32 am
GAAP wrote: Thu Jun 15, 2023 1:06 pm It would be great if one of y'all could put a separate summary of how to interpret Shiller as an independent post that can be bookmarked for later reference.
Did you mean a new thread or just a post? Because if you meant the latter, I really don't have much more to say than what I said here:
viewtopic.php?p=7310522#p7310522

I can try to edit the post to be more articulate if you don't find it clear enough, but I did include all the salient points... Let me know.
I can certainly bookmark the discussion starting with that post, but it is a non-optimal way to find the information -- especially a few years from now when additional questions may have arisen. A separate thread would be better.

Actually, Shiller is important enough that a wiki article might make sense. The topic comes up frequently enough that a dedicated place would be easier to direct people to. You even noted that "it's hard to find a proper description of Prof. Shiller's methodology". The wiki has all of this to say about the PE10 calculation he does (https://www.bogleheads.org/wiki/P/E#Shiller_PE10) :
Shiller PE10

Robert Shiller has developed a stock valuation metric known as "PE10"; alternatively called CAPE (Cyclically Adjusted Price Earnings) ratio, or Shiller PE ratio. It is a variation of P/E, but with EPS (Earnings Per Share) averaged over the prior 10 years. Both terms (Price and Earnings) have to be expressed in real terms, i.e. adjusted for inflation.[8]

PE10 = Stock Price / average EPS over prior 10 years

Note that Prof. Shiller uses the same methodology as Standard & Poor's (S&P) to compute the 'Price' and 'Earnings' components of the equation (i.e. simple aggregate).
Clearly, that is oversimplified.
I tend to agree with GAAP that the wiki is the place to get out some of the idiosyncrasies of the Shiller data; but that requires a wiki editor to take an interest. The quoted version misses the fact that CAPE is not a concern of many BH who consult the Shiller website and data series; rather, they are looking for the longest possible monthly return historical series for drawdown or other studies. And that's where the idiosyncrasies documented in the current exchange become important.

I have a paper in progress titled something like "Can Market Returns Be Known to Four Decimal Places?" The Shiller data is more grist for my mill, but it is far from the only case where the precision the, ahem, engineer-minded might expect from historical finance data just isn't there.
What about the Stooq data, then (please see https://stooq.com/q/d/?s=^spx )? It is--AFAICT--supposedly actual end-of-day data for US stock prices back to 1885 (and end of month before that....but what they call "end of month" may in fact just be monthly average before 1885; I can't tell 100% for certain).

Granted, I have no idea how accurate it is. I am almost certain that it is price-only, though, and that from 1928 or 1950 onward it is just the regular price-only S&P 500.
GAAP
Posts: 2556
Joined: Fri Apr 08, 2016 12:41 pm

Re: Simba's backtesting spreadsheet [a Bogleheads community project]

Post by GAAP »

McQ wrote: Fri Jun 16, 2023 5:36 pm
martincmartin wrote: Thu Jun 15, 2023 6:10 am Thank you very much, McQ and Siamond!

To confirm my understanding, since you've been looking at this recently and have it in your head, is this the correct interpretation of Shiller's data, to get total monthly returns for stocks and bonds?

Use the columns P (stock price), D (dividends), CPI and RateGS10 (10-year bonds).

Here, a prev_ prefix means the value of that column from the previous month.

Total nominal stock returns: (P + D / 12) / prev_P

Total nominal bond returns: prev_RateGS10 / RateGS10 + prev_RateGS10 / 1200 + ((1 + RateGS10 / 1200) ^ -119) * (1 - prev_RateGS10 / RateGS10)
That's the formula in the "Monthly Total Bond Returns" column, column T, in Shiller's ie_data.xls, tab "Data".

To adjust either stocks or bonds for inflation: value * prev_CPI / CPI

This seems to be the calculation in Shiller's ie_data.xls . It would be great if all four of us (Shiller, Siamond, McQ and I) could agree on the
same formula.
I use your formula for stocks in the spreadsheet above. Hover over cells in Shiller's Column J and you'll see a somewhat different formulation that I believe is equivalent.

Shiller's bond returns are too much of a confection for me to want to go there.

I may not have mentioned that Shiller's stock price series is based on prices averaged over the month; the SBBI uses month-end prices. At shorter time frames (ten years and less) the differences can be quite remarkable in terms of estimated price appreciation.

There is also more than one way to skin the dividend cat ...

Note also that Shiller calibrates things backwards from today rather than forward from 1871
I have to think that for Shiller's original purpose -- determining whether or not the long term was related to the short term -- minor variations may not matter. He was just looking for a degree of correlation, not accurate prediction.
“Adapt what is useful, reject what is useless, and add what is specifically your own.” ― Bruce Lee
User avatar
siamond
Posts: 6010
Joined: Mon May 28, 2012 5:50 am

Re: Simba's backtesting spreadsheet [a Bogleheads community project]

Post by siamond »

Alpha4 wrote: Fri Jun 16, 2023 7:59 pmWhat about the Stooq data, then (please see https://stooq.com/q/d/?s=^spx )? It is--AFAICT--supposedly actual end-of-day data for US stock prices back to 1885 (and end of month before that....but what they call "end of month" may in fact just be monthly average before 1885; I can't tell 100% for certain).

Granted, I have no idea how accurate it is. I am almost certain that it is price-only, though, and that from 1928 or 1950 onward it is just the regular price-only S&P 500.
It is indeed the regular price-only S&P 500 from 1928 onward, I double-checked. We already have this 1928+ data series in Simba (column NS in the v22c version in raw_data). Simba starts in 1871 and for now, the Shiller (January average) price series fills the gap of the first 50-ish years.

I don't know what the Stooq data actually represents in the early years, the fact that it includes annual data starting from 1789 (!!) seems rather suspicious to me.

In any case, I am not too hot about replacing the Shiller price (PR) data for those 50 years (1871+) by the Stooq series. We also use the Shiller data for the first 50-ish years of the S&P 500 total return (TR) series. I'd rather keep some kind of consistency in this respect.

Still, I included a link to the Stooq annual data in the "See also" section of the S&P 500 PR series. This will show up in the next update. Thanks for the pointer.
User avatar
siamond
Posts: 6010
Joined: Mon May 28, 2012 5:50 am

Re: Simba's backtesting spreadsheet [a Bogleheads community project]

Post by siamond »

GAAP wrote: Thu Jun 15, 2023 1:06 pm It would be great if one of y'all could put a separate summary of how to interpret Shiller as an independent post that can be bookmarked for later reference.
Here you are, please check this new thread.
User avatar
siamond
Posts: 6010
Joined: Mon May 28, 2012 5:50 am

Re: Simba's backtesting spreadsheet [a Bogleheads community project]

Post by siamond »

Sorry for the delay, I got a bit hung up finalizing a last-minute feature request. Here is the latest version, with 2023 annual returns. The corresponding wiki update will be done in a couple of days, to allow for feedback in case I messed up something!

Do pay attention to item 3b. The ability to assemble a growth chart with annual extra investments (or withdrawals) is useful, but not terribly original, while the ability to visualize the same data in a telltale chart format is definitely less common AND quite informative. All credits to dcabler for this idea.

You can download the spreadsheet here: https://bit.ly/3TUduQl

(copy the link to a separate browser tab/window if it doesn't download from here)

Rev23a
1a. Added 2023 annual returns: funds and index returns updated
1b. Expense Ratios (ER) updated, too many changes to list (numerous ERs increased!)
1c. Inflation is a rough estimate for now (Nov to Nov); updates for a few synthetic models are pending
2. Added RMSE computation to the Portfolio Cycles Comparison charts in Analyze_Portfolio
3a. Added an explicit toggle to choose between growth charts (nominal or inflation-adjusted) and telltale charts in Compare_Portfolios. This should make things clearer for new users.
3b. Added a new feature to the growth/telltale charts in Compare_Portfolios: ability to make an extra (inflation-adjusted) investment or withdrawal on an annual basis
4. Normalized annual returns raw data to numbers rounded to 5 decimal places (ongoing process, five most recent years were normalized while introducing 2023 returns)
boglesmkcents
Posts: 94
Joined: Tue Jul 24, 2012 4:57 pm

Re: Simba's backtesting spreadsheet [a Bogleheads community project]

Post by boglesmkcents »

Thank you, Siamond ... as always, this spreadsheet along with the regular and timely updates is super-helpful and very much appreciated!
McQ
Posts: 1425
Joined: Fri Jun 18, 2021 12:21 am
Location: California

Re: Simba's backtesting spreadsheet [a Bogleheads community project]

Post by McQ »

Likewise, my thanks to you, Siamond. Simba is a tremendous community resource from which I've gotten a great deal of value.

The penalty for making such a great contribution, of course, is people give you well-meaning suggestions for improvement which only make more work for you. Here's mine:

Over on the wiki there is a page with historical expense ratios for some but not all of the Vanguard funds that appear in Simba. One can get tracking error in Simba simply by subtracting the Vanguard fund return from the index return for a year; but without a source for the expense ratio, there's no way to know how much of tracking error is expense per se, and how much trading costs etc.

Perhaps at least a link at the bottom of the column for Simba funds back to the corresponding wiki page? And encouragement to the wiki editors to have historical expenses for every fund that does appear in Simba? (not true today)

Regardless, thank you again.
You can take the academic out of the classroom by retirement, but you can't ever take the classroom out of his tone, style, and manner of approach.
User avatar
siamond
Posts: 6010
Joined: Mon May 28, 2012 5:50 am

Re: Simba's backtesting spreadsheet [a Bogleheads community project]

Post by siamond »

McQ wrote: Wed Jan 10, 2024 5:43 pmThe penalty for making such a great contribution, of course, is people give you well-meaning suggestions for improvement which only make more work for you.
Honestly, I really wouldn't mind getting more constructive feedback about this spreadsheet! It is usually downloaded several thousand times per year, and I am always scratching my head as to why I don't get much feedback...
McQ wrote: Wed Jan 10, 2024 5:43 pmOver on the wiki there is a page with historical expense ratios for some but not all of the Vanguard funds that appear in Simba. One can get tracking error in Simba simply by subtracting the Vanguard fund return from the index return for a year; but without a source for the expense ratio, there's no way to know how much of tracking error is expense per se, and how much trading costs etc.
I had no idea we have wiki pages with historical ERs! Now that you pointed me in the right direction, a quick search sent me to this page:
https://www.bogleheads.org/wiki/Categor ... s:expenses
McQ wrote: Wed Jan 10, 2024 5:43 pm Perhaps at least a link at the bottom of the column for Simba funds back to the corresponding wiki page? And encouragement to the wiki editors to have historical expenses for every fund that does appear in Simba? (not true today)
I am not too eager to include a bunch of links in Simba on a per fund basis. But I would definitely propose to include the link above (from where people can easily dive as they see fit) in the Data_Sources tab. Now as to wiki editors, I can't speak on their behalf, but I can easily provide copies of the archived Simba annual updates, from which some of the recent ER history gaps can be filled.

All this being said... I'm sure you have in mind our past discussion about the not-quite-satisfying ER processing in Simba. For which I was a bit stumped on how to proceed. Well, those wiki pages provide the ER of the first year where a given fund existed. This is incomplete data (many funds don't have such a wiki page), but still this could be very useful information... Let me chew on this for a little while!
User avatar
siamond
Posts: 6010
Joined: Mon May 28, 2012 5:50 am

Re: Simba's backtesting spreadsheet [a Bogleheads community project]

Post by siamond »

I am starting to wonder if we shouldn't archive the known history of expense ratios in the Simba spreadsheet for the various funds being tracked. Since the current ER is updated every year anyway, most of the annual process is kind of already there. The question is how to bootstrap it.

We actually have several starting points:
- Those Wiki pages documenting the history of various Vanguard funds
- A snapshot I archived in 2016 of the entire ER history of the funds tracked by Simba at that time (I had a way by then to easily get such historical info; unfortunately I don't have it anymore...); unfortunately we were tracking investor funds by then
- The archived versions of Simba year after year (going back to 2012) with the current ERs for the funds being tracked

Also, for newer funds, Vanguard now documents the last 5 years of ER values in their current SEC annual reports, which would speed up the process of looking in such individual reports. Fidelity appears to do the same. Not iShares though.

I mean, it's going to be a good chunk of work, but well, we don't have to do it all at once, it can be an incremental process...
GAAP
Posts: 2556
Joined: Fri Apr 08, 2016 12:41 pm

Re: Simba's backtesting spreadsheet [a Bogleheads community project]

Post by GAAP »

Wayback Machine -- it has a lot of snapshots of Vanguard, probably has a lot of the others.
“Adapt what is useful, reject what is useless, and add what is specifically your own.” ― Bruce Lee
McQ
Posts: 1425
Joined: Fri Jun 18, 2021 12:21 am
Location: California

Re: Simba's backtesting spreadsheet [a Bogleheads community project]

Post by McQ »

Also EDGAR. Most funds report the trailing five years. So, 2022 reports give 2017-2021 values, 2017 reports give 2012 to 2016 values, etc
You can take the academic out of the classroom by retirement, but you can't ever take the classroom out of his tone, style, and manner of approach.
User avatar
siamond
Posts: 6010
Joined: Mon May 28, 2012 5:50 am

Re: Simba's backtesting spreadsheet [a Bogleheads community project]

Post by siamond »

Side-tracking from expense ratios for a minute...

I am planning to add a new data series for International Small Value:
- passive series based on iShares ISVL, its FTSE index and a fairly similar S&P index
- active series based on DFA DISVX and the same S&P index as above

Relevant indices for this market are:
1995+: MSCI World Ex USA Small Value GR USD
1990+: S&P Developed Ex US Small Value TR USD
2004+: FTSE Developed ex US ex Korea Small Cap Focused Value USD

What changed recently compared to the last time I looked at this market segment (credits to dcabler for the friendly push!):
- a passive index fund was launched (iShared ISVL, with annual returns starting in 2022)
- the DFA fund is now much more accessible to regular investors through its new ETF counterpart (DISV, annual returns starting in 2023)

I drafted all the historical returns for this change, it will be part of the Simba February update (which will come with official 2023 inflation numbers, etc). Feedback welcome until then!
User avatar
siamond
Posts: 6010
Joined: Mon May 28, 2012 5:50 am

Re: Simba's backtesting spreadsheet [a Bogleheads community project]

Post by siamond »

Also, I am pondering some changes to the regular TIPS and short-term TIPS series. Again triggered by dcabler.

Those series used to track Fidelity FIPDX and Vanguard VTAPX (both starting annual returns in 2013) and to be complemented by the relevant Barclays index.

The question is to possibly make good use of iShares TIP and iShares STIP. TIP has solid history (2004+) while STIP has a little more than VTAPX (2011+).

I see three options, plus a variation:
1) not bother because the index returns are already good enough (and the Fidelity and Vanguard funds are popular and provide very low ERs)
2) replace FPIDX by TIP and VTAPX by STIP (there is no question that iShares is an excellent provider of passive index funds)
3) assemble more complex series, splicing index, then iShares, then Fidelity/Vanguard

The variation would be the same as 3), but disable the use of the iShares funds by default, which would result in 1) by default, while either 2) or 3) would be very easy to activate by just toggling a boolean in the spreadsheet. I'm not very fond of this one, to be honest, it's too complicated.

I drafted the third approach, if only to have all data handy before making a decision. I am kind of inclined to pick option 2, if only for its simplicity and consistency, but I don't feel that strongly about it. Thoughts?
longinvest
Posts: 5682
Joined: Sat Aug 11, 2012 8:44 am

Re: Simba's backtesting spreadsheet [a Bogleheads community project]

Post by longinvest »

siamond wrote: Mon Jan 15, 2024 12:11 am I am kind of inclined to pick option 2, if only for its simplicity and consistency, but I don't feel that strongly about it.
I like the simplicity of option 2, so it would be my first choice. Option 3 would be my second choice.
Variable Percentage Withdrawal (bogleheads.org/wiki/VPW) | One-Fund Portfolio (bogleheads.org/forum/viewtopic.php?t=287967)
dcabler
Posts: 4544
Joined: Wed Feb 19, 2014 10:30 am
Location: TX

Re: Simba's backtesting spreadsheet [a Bogleheads community project]

Post by dcabler »

siamond wrote: Mon Jan 15, 2024 12:11 am Also, I am pondering some changes to the regular TIPS and short-term TIPS series. Again triggered by dcabler.

Those series used to track Fidelity FIPDX and Vanguard VTAPX (both starting annual returns in 2013) and to be complemented by the relevant Barclays index.

The question is to possibly make good use of iShares TIP and iShares STIP. TIP has solid history (2004+) while STIP has a little more than VTAPX (2011+).

I see three options, plus a variation:
1) not bother because the index returns are already good enough (and the Fidelity and Vanguard funds are popular and provide very low ERs)
2) replace FPIDX by TIP and VTAPX by STIP (there is no question that iShares is an excellent provider of passive index funds)
3) assemble more complex series, splicing index, then iShares, then Fidelity/Vanguard

The variation would be the same as 3), but disable the use of the iShares funds by default, which would result in 1) by default, while either 2) or 3) would be very easy to activate by just toggling a boolean in the spreadsheet. I'm not very fond of this one, to be honest, it's too complicated.

I drafted the third approach, if only to have all data handy before making a decision. I am kind of inclined to pick option 2, if only for its simplicity and consistency, but I don't feel that strongly about it. Thoughts?
Thank you Siamond.
As an aside, in case folks didn't know, iShares changed the index provider for its TIPS funds early in 2023. These are TIP & STIP. They are now tracking indexes from The ICE. As far as I can tell, the construction rules for these indexes from both the original index provider and the new index provider are identical.

And still, TIP from ishares has an unusually high e/r compared to its competitors. For example TIP is 0.19% vs. SCHP at 0.03%

Cheers.
GraySwan
Posts: 2
Joined: Mon Jan 15, 2024 1:24 pm

Re: Simba's backtesting spreadsheet [a Bogleheads community project]

Post by GraySwan »

Siamond, thank you for the continuation of the annual updates and improvements.
It is much appreciated by me (and probably of many other silent users).

Here is some feedback / inspiration which may or may not be helpful 😉:

I have been adapting the excel file each year based on my personal preferences.
Mainly focuses on the “Compare_Portfolios” sheet and the second portfolio comparison (because I don’t use the first one):

- Changed color formatting to have more contrast for better visibility
Image

- More consistent colors between portfolio numbers and diagrams:
Image

- Pulled annual portfolio %-change values from sheet “Portfolio_Math” to “Compare_Portfolios” cells AK269 ff for easier comparison between them and to show differences vs. benchmark
Image

- Added some specialized and more “exotic” fund returns (shaded yellow in “raw data”)
Image

Caution: not all source data here is complete and beyond doubt.
I did some creative extrapolation and substitution for missing data which may not be up to the quality standard of the other data.

- Added previous portfolio compositions in columns Z onward with (static) values for comparison and to allow the recalculate same compositions in future years (needs to be of same time frame)
Image

- The most powerful usage however can imo be the excel solver tool.
It allows to define a set of constraints for variables (cell values) such as percentages (min. to max.) of allocations or limit max. drawdown to a certain value.
Then you can have the solver optimize for another value based on these constraints.
For example, you can have the solver try to find the allocation with the best sharpe ratio or highest alpha.
Now while this is not rocket science, its usage probably not easy to understand if one has never previously used excel solver tool.
For ease of use I have put in a sheet “solver” where constraints can be set up outside of solver to be stored and afterwards imported into solver.

First the value to be optimized for (min. or max.) is to be chosen:
Image

Then (some) of the funds are displayed where you can more easily put in min. values (upper row) and max. values (lower row)
Image

Afterwards the values from column J need to be copied (must not be a formula to be imported) and inserted as "values" into column G (from purple cell downward)

Then within the solver tool the values from the purple cell downward can be imported (replaced).

Note: performance of the solver tool itself as well as during solver calculation depends a lot on the number of constraints. The more constraints you define, to slower it will be (also depending on hardware)

The usage of the sheet “solver” is entirely optional – all constraints can also be set-up in the solver tool directly.

For solver to work properly, it is advisable to have starting values that are roughly equal weight.
Otherwise, the starting values may lead to “biased” results.

To see the effects on portfolio 1, you can hit “solve” to try it with the constraints already set up.
Disclaimer: Since I use a non-English version of Excel, I don't know if that will impact functionality for different languages.

Anyway, a lot of my changes are my personal flavor and may not suit everybody.
What changes, if any, find their way into a future “canon” version is up to debate 😊

Link to edited Excel (based on rev22b).
//edit - Link deleted: the file host service seems to cause issues
Re-uploaded on Google Drive by Siamond - refer to post of Thu Jan 18
Last edited by GraySwan on Fri Jan 19, 2024 11:51 am, edited 2 times in total.
User avatar
siamond
Posts: 6010
Joined: Mon May 28, 2012 5:50 am

Re: Simba's backtesting spreadsheet [a Bogleheads community project]

Post by siamond »

GraySwan wrote: Mon Jan 15, 2024 4:35 pmHere is some feedback / inspiration which may or may not be helpful 😉:

I have been adapting the excel file each year based on my personal preferences. [...]
Hi there! It seems that that you're a true 'power user' of this spreadsheet, this is cool.

Let me download your custom version and ponder about it. The use of brighter and consistent colors seems like an idea which could become mainstream. The rest seems more like your own custom work, but let me take a good look and think a bit more.

There are some cool features in Analyze_Portfolio, I would notably encourage you to take a good look at the Portfolio Cycles charts if you didn't do so already. Some people put simple formulas in the portfolio definition part of Analyze_Portfolio, formulas referring to P1+benchmark portfolios of Compare_Portfolios, to easily switch from one tab to another while looking at things.

PS. something at the edge of my mind is to add the ability to look in depth at 'delta portfolios' (i.e. custom portfolio minus benchmark). You kind of started it with your extra columns, but it would be cool to be able to use the various charts and metrics to illustrate such deltas. One might argue that a Telltale chart already provides such information, but it's only one type of chart... Anyhoo, just an idea in the air!
User avatar
LadyGeek
Site Admin
Posts: 95696
Joined: Sat Dec 20, 2008 4:34 pm
Location: Philadelphia
Contact:

Re: Simba's backtesting spreadsheet [a Bogleheads community project]

Post by LadyGeek »

Don't go too crazy with the colors. Consider that they're perceived differently in different devices and monitors. Also consider accessibility - readers may have some color blindness. I recommend to stick with what you have now.
Wiki To some, the glass is half full. To others, the glass is half empty. To an engineer, it's twice the size it needs to be.
User avatar
siamond
Posts: 6010
Joined: Mon May 28, 2012 5:50 am

Re: Simba's backtesting spreadsheet [a Bogleheads community project]

Post by siamond »

The December US Bureau of Labor Statistics (BLS) numbers are now available, CPI is 306.746, which means the official inflation number for 2023 is 3.35%.

I'll wait for more pending 2023 stats to issue a new update, but power users can update the US inflation number in Raw_Data, if so inclined.
Hydromod
Posts: 1052
Joined: Tue Mar 26, 2019 10:21 pm

Re: Simba's backtesting spreadsheet [a Bogleheads community project]

Post by Hydromod »

LadyGeek wrote: Tue Jan 16, 2024 10:11 am Don't go too crazy with the colors. Consider that they're perceived differently in different devices and monitors. Also consider accessibility - readers may have some color blindness. I recommend to stick with what you have now.
I usually use the red-white-blue scale to avoid the most common color-blindness issue, where red and green are both seen as gray.
GAAP
Posts: 2556
Joined: Fri Apr 08, 2016 12:41 pm

Re: Simba's backtesting spreadsheet [a Bogleheads community project]

Post by GAAP »

As long as you're adding features, you could add a choice to select a color pallet. More fun in your spare time. :wink:
“Adapt what is useful, reject what is useless, and add what is specifically your own.” ― Bruce Lee
User avatar
siamond
Posts: 6010
Joined: Mon May 28, 2012 5:50 am

Re: Simba's backtesting spreadsheet [a Bogleheads community project]

Post by siamond »

GAAP wrote: Tue Jan 16, 2024 1:38 pm As long as you're adding features, you could add a choice to select a color pallet. More fun in your spare time. :wink:
After spending 2 hours getting rid of white snow on my (stupid large) driveway, I am a little tired and I like Ladygeek's comment better... :D

Feel free to implement it though! :beer
GraySwan
Posts: 2
Joined: Mon Jan 15, 2024 1:24 pm

Re: Simba's backtesting spreadsheet [a Bogleheads community project]

Post by GraySwan »

Color choice is obviously subjective and color blindness is a valid concern (is however already an issue with currently used red/green combo). The suggested red/white/blue would be an improvement for the more common types of color blindness though.
Anyway, since conditional formatting can be copied across files, it takes me barely 30 seconds once a year to change it 😉
Some people put simple formulas in the portfolio definition part of Analyze_Portfolio, formulas referring to P1+benchmark portfolios of Compare_Portfolios, to easily switch from one tab to another while looking at things.
Oh yes, I actually did that as well, just forgot to mention it, like a couple of other small changes. 🤷‍♂️
E.g. I color coded different asset classes (equity = blue, fixed income = green, alternatives = red)
Image
…and applied a small formula to quickly show these high-level % allocations across portfolios.
It allows to quickly detect if one is making apples to oranges comparisons on asset allocation level.
Image

Added some custom ratios (mainly as solver target values) and also Drawdown (unbalanced) since I want to know the “worst case”, and some people are probably unaware that the normal drawdown value assumes rebalancing.
Image

Of course this is mostly flavor, but in my experience these small things can help with clarity and understanding which may lead to broader adoption by new users (smaller hurdles for those that may be overwhelmed by spreadsheets with lots of numbers). :happy
Post Reply