Statistics question
Is it possible to work out what the return would be at the 1st and 5th percentiles for a portfolio with an expected return of 5.8% and a standard deviation of 11.4%? I'd like to calculate those returns over 5 and 10 years.
Thank you
Kelly
Re: Statistics question
If I understand correctly, you want to know what the worst 1% and 5% returns could be given a mean return of 5.8% and St Dev 11.4%?
If so, you could grab a normal distribution calculator like
http://onlinestatbook.com/2/calculators ... _dist.html
And calculate the "below" area for different bad returns (say -18) and see how much of the area under the curve lies below each one. You want to find the negative return that leaves an area of 0.01 and 0.05 below it. It looks like roughly -20.7% and -13% do that, respectively.
Hopefully that's what you wanted to know.
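The same lookup can be done in a few lines of code instead of trial and error with the online calculator. A minimal sketch, assuming a single year's return is normally distributed with the question's inputs (mean 5.8%, SD 11.4%); `one_year_percentile` is a hypothetical helper name, and the inverse lookup is done by `NormalDist.inv_cdf` from Python's standard library:

```python
from statistics import NormalDist

# Assumption: one year's return (in percent) is normal with mean 5.8 and SD 11.4.
annual = NormalDist(mu=5.8, sigma=11.4)

def one_year_percentile(p: float) -> float:
    """Return r such that P(annual return <= r) == p (hypothetical helper)."""
    return annual.inv_cdf(p)

p1 = one_year_percentile(0.01)  # 1st percentile
p5 = one_year_percentile(0.05)  # 5th percentile
print(f"1st percentile: {p1:.1f}%  5th percentile: {p5:.1f}%")
# prints: 1st percentile: -20.7%  5th percentile: -13.0%
```

This only describes a single year; it says nothing yet about 5 or 10 years.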

Re: Statistics question
Assuming that the expected return and standard deviation are for annual returns (rather than the whole period), I think you would want to use a parameterized Monte Carlo calculator to do this. Portfolio Visualizer has a highly customizable calculator. Here is what I would do to answer your question: https://www.portfoliovisualizer.com/mon ... nt=1000000
Re: Statistics question
Thank you. No surprise an MIT alum would know of this website. I was on the other side of the river at Northeastern where I did well in stats but that was 30 yrs ago.
305pelusa wrote: ↑Tue Oct 08, 2019 1:06 pm
If I understand correctly, you want to know what the worst 1% and 5% returns could be given a mean return of 5.8% and St Dev 11.4%?
If I enter below -9.5, the calculator returns an "Area (probability) = .05". Is that the return expected at the 5th percentile?
Thank you!
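A quick check of that -9.5 lookup, assuming normally distributed annual returns. With the question's inputs (mean 5.8, SD 11.4) the area below -9.5 comes out near 0.09, in line with the 0.089 given in the reply to this post; with a mean of 5.3 and SD of 9 (the inputs used in the Mathcracker post later in the thread) it comes out at about 0.05. Whether a mix-up of inputs explains the .05 is only a guess:

```python
from statistics import NormalDist

THRESHOLD = -9.5  # annual return in percent, as entered in the online calculator

# Inputs from the original question (mean 5.8%, SD 11.4%).
area_thread = NormalDist(mu=5.8, sigma=11.4).cdf(THRESHOLD)
# Inputs used in the later Mathcracker post (mean 5.3%, SD 9%).
area_alt = NormalDist(mu=5.3, sigma=9.0).cdf(THRESHOLD)

print(f"area below {THRESHOLD} with 5.8/11.4: {area_thread:.3f}")
print(f"area below {THRESHOLD} with 5.3/9.0:  {area_alt:.3f}")
```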
Re: Statistics question
Oh, if it's over 5 or 10 years, you could also use a normal distribution calculator that takes a sample size as input. You could then figure out the lowest mean return over that sample of 5 or 10 years in a 1% or 5% worst-case scenario. Then you'd convert that mean sample return into a CAGR ($\text{CAGR} \approx \text{Mean} - \tfrac{1}{2}\,\text{StDev}^2$, or something like that).
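A rough sketch of that recipe, under the thread's simplifying assumptions (independent, normally distributed annual returns with mean 5.8% and SD 11.4%). The average of N annual returns then has SD 11.4%/sqrt(N); take its 1st or 5th percentile, then subtract half the annual variance, using the common approximation CAGR ≈ mean - SD²/2. The function name `worst_case_cagr` is hypothetical:

```python
from math import sqrt
from statistics import NormalDist

def worst_case_cagr(mean, sd, years, p):
    """Approximate p-th percentile CAGR over `years` years (hypothetical helper).

    Assumes independent, normally distributed annual returns (as decimals).
    Step 1: the average of `years` annual returns has SD sd / sqrt(years).
    Step 2: convert that arithmetic average to a CAGR via CAGR ~= mean - sd**2 / 2.
    """
    avg = NormalDist(mean, sd / sqrt(years)).inv_cdf(p)
    return avg - sd ** 2 / 2

for years in (5, 10):
    for p in (0.01, 0.05):
        cagr = worst_case_cagr(0.058, 0.114, years, p)
        total = (1 + cagr) ** years - 1
        print(f"{years:>2} yrs, {p:.0%} percentile: CAGR {cagr:+.1%}, total {total:+.0%}")
```

Under these assumptions the 10-year totals come out to losses of about 28% (1st percentile) and 7.5% (5th), and the 5-year totals to about 29% and 15%, in the same neighborhood as the Monte Carlo results posted further down the thread.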
Re: Statistics question
For a mean of 5.8, St Dev of 11.4, I'm finding -9.5 gives me an area of 0.089. Are you sure you put in the right mean and St Dev?
Kelly wrote: ↑Tue Oct 08, 2019 1:18 pm
If I enter below -9.5, the calculator returns an "Area (probability) = .05". Is that the return expected at the 5th percentile?
Remember this will only tell you the likelihood for 1 year. If you're looking for 5 or 10 years, you'll have to use a calculator that takes into account sample size.
https://mathcracker.com/normalprobabil ... tributions
You should find that a given bad mean return (and CAGR) becomes much less likely as you increase the sample size.
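To see that sample-size effect, here is a sketch (same normal, independent-years assumptions, mean 5.8% and SD 11.4%) of the probability that the average annual return over N years comes in below 0%. Note that this is the arithmetic average, not the CAGR; `prob_average_below` is a hypothetical helper:

```python
from math import sqrt
from statistics import NormalDist

MEAN, SD = 5.8, 11.4  # annual return assumptions from the question, in percent

def prob_average_below(threshold, years):
    """P(average annual return over `years` years <= threshold), assuming iid normal years."""
    return NormalDist(MEAN, SD / sqrt(years)).cdf(threshold)

for years in (1, 5, 10):
    print(years, round(prob_average_below(0.0, years), 3))
```

The probability drops from roughly 0.31 for one year to about 0.13 for five years and about 0.05 for ten.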
Re: Statistics question
You could have done worse. You could have been at the other end of Mass Ave.
A scientist looks for THE answer to a problem, an engineer looks for AN answer and lawyers ONLY have opinions. Investing is not a science.
Re: Statistics question
NB: monthly equity returns are not normally distributed and, in fact, looking backwards, do not appear to have followed any known distribution; the PV Monte Carlo simulation has some knobs to turn to account for some aspects of the non-normality observed in historical US equity returns, as do some others.
If you look at yearly returns, things look more normal but the sample size isn't large enough to have confidence that the tails aren't really fatter than a normal distribution with appropriate parameters would predict.
Re: Statistics question
Mathcracker: with a mean of 5.3, std dev of 9, sample of 5, I get a Pr of 0.089 of having a return equal to or less than -0.11.
Portfolio visualizer shows a 10th percentile, 5 year return of 0.11.
Mathcracker reports Pr 0.01 for a return less than -4 (1st percentile)
Pr 0.058 for a return less than -1 (5th percentile)
So mathcracker seems to have what I'm after.
Thanks very much!
Re: Statistics question
Agreed. I use financial planning software that uses a normal distribution but doesn't report below the 10th percentile. My question was how its MC returns compare to historical worst returns. Its 10th percentile return for five years works out to a 0.10 NOMINAL CAGR. The five-year REAL return for 60/40 from Dec '68 through Nov '73 was 3.37. That period starts the worst 30-year retirement period observed.
cheezit wrote: ↑Tue Oct 08, 2019 1:42 pm
NB: monthly equity returns are not normally distributed and, in fact, looking backwards, do not appear to have followed any known distribution; the PV Monte Carlo simulation has some knobs to turn to account for some aspects of the non-normality observed in historical US equity returns, as do some others.
Once I add inflation is it fair to say that the MC worst case is as bad as the historic?
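On the inflation question, one standard way to turn a nominal return into a real one is (1 + nominal) / (1 + inflation) - 1. A minimal sketch; the 3% inflation rate is a hypothetical example, and reading the software's "0.10" as 0.10% per year is an assumption:

```python
def real_return(nominal, inflation):
    """Convert a nominal annual return to a real one (decimals): (1 + n) / (1 + i) - 1."""
    return (1 + nominal) / (1 + inflation) - 1

# Hypothetical example: 0.10% nominal CAGR with 3% annual inflation.
print(f"{real_return(0.0010, 0.03):.2%}")  # about -2.82%
```

At rates this small, simply subtracting inflation (0.10% - 3% = -2.9%) gives nearly the same answer.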
Re: Statistics question
You're OK. I didn't think that NU was even in the same city, let alone on the same street. I was thinking of a little school called Havad or something like that.
 patrick013
Re: Statistics question
Wondering what the frequency distribution of returns looks like graphically. The confidence interval on the larger side would be more reliable than the confidence interval on the smaller side. It's not a normal distribution, so subjectively 1.5 x SD for the smaller side? Those big drawdowns will happen, but with small frequency. That's a different stat in itself.
age in bonds, buy-and-hold, 10-year business cycle
 Taylor Larimore
 Advisory Board
Re: Statistics question
Kelly:
During my investing life (69 years), I have learned that statistics (nearly always based on past performance) are almost useless to forecast "expected return" for stocks and bonds. Statistics are not going to tell you the stock and bond returns tomorrow, next week, next month or next year. Statistics are useful for many things, but not "expected return."
"There are three kinds of lies: lies, damned lies, and statistics." - Benjamin Disraeli, British Prime Minister
Best wishes.
Taylor
Jack Bogle's Words of Wisdom (about forecasts): "Nobody knows nothing."
"Simplicity is the master key to financial success." - Jack Bogle
 willthrill81
Re: Statistics question
MC analyses are known to produce significantly worse 'worst' cases than occurred in history because they fail to incorporate any type of mean reversion. Derek Tharp noted this a couple of years ago in this post on Michael Kitces' website. For instance, a 'standard' MC analysis assumes that a 50% drop in stocks is just as likely to occur right after stocks drop 50% as before, which is extremely illogical; it's widely agreed that the expected return of stocks after a 50% drop is much higher than the expected return before the 50% drop. Consequently, many of the MC-generated scenarios look downright apocalyptic (e.g. Great Depression part 2 followed by 1970s stagflation followed by the popping of the tech bubble, etc.), something that could very well make the entire financial system implode.
Kelly wrote: ↑Tue Oct 08, 2019 2:07 pm
Once I add inflation is it fair to say that the MC worst case is as bad as the historic?
Annual returns in stocks may not come from a normal distribution, and they are definitely not independent of each other, both of which are assumptions for most statistical tools. Hence, using only the mean and std. dev. in a predictive manner, at least with tools that have these assumptions, is likely to lead to very erroneous results, never mind the fact that we don't know a priori what the mean and std. dev. will be in the future.
There are ways to address this shortfall, but it's difficult to do with 'synthetic' data, as you apparently wish to do.
“It's a dangerous business, Frodo, going out your door. You step onto the road, and if you don't keep your feet, there's no knowing where you might be swept off to.” - J.R.R. Tolkien, The Lord of the Rings

Re: Statistics question
I was given Bs in all my statistics courses.
 willthrill81
Re: Statistics question
Were you 'given' Bs or is that what you earned?

Re: Statistics question
Not quite. One is in the Hub and the other one is in Our fair city.

Re: Statistics question
The crux of a normal distribution is the central limit theorem. Since you take a rather small number of samples from a set which does not have a normal distribution, how can you get other estimates with a high degree of confidence?
 JupiterJones
Re: Statistics question
Well I went to school (briefly) on Mass Ave between MIT and NU. I don't think stats was even offered as an elective, so what do I know?
That said... I seem to recall reading somewhere that stock returns are not normally distributed. If so, I would caution against reading too much into standard deviation here as anything other than a general "relative risk" metric to be used when comparing funds.
Stay on target...
Re: Statistics question
This is true, but it's important to ask which returns are not normally distributed.
MathIsMyWayr wrote: ↑Wed Oct 09, 2019 1:45 pm
The crux of a normal distribution is the central limit theorem. Since you take a rather small number of samples from a set which does not have a normal distribution, how can you get other estimates with a high degree of confidence?
It is daily stock market returns that have been shown to be (strongly) non-normal. But that means that if you're looking at 10 years of returns, it's more like 2500 samples (roughly 250 trading days a year). So the CLT can be applied to say, approximately, that stock market decade returns are normally distributed.
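A toy illustration of that CLT argument. The daily returns below come from a made-up fat-tailed mix (95% calm days, 5% wild days), not from real market data, and they are drawn independently, which is exactly the assumption questioned in the replies. The helper names are hypothetical. The excess kurtosis of single days is large, while that of 2500-day averages sits close to the normal value of zero:

```python
import random
from statistics import fmean, pstdev

def excess_kurtosis(xs):
    """Sample excess kurtosis: about 0 for a normal distribution, > 0 for fat tails."""
    m = fmean(xs)
    s = pstdev(xs, m)
    return fmean(((x - m) / s) ** 4 for x in xs) - 3.0

def fake_daily_return(rng):
    """Hypothetical fat-tailed daily return: mostly calm days, occasionally wild ones."""
    if rng.random() < 0.05:
        return rng.gauss(0.0, 0.03)    # turbulent day
    return rng.gauss(0.0004, 0.008)    # ordinary day

rng = random.Random(42)
daily = [fake_daily_return(rng) for _ in range(100_000)]
# Average of 2500 independent daily returns (about ten years of trading days), repeated 800 times.
decade_means = [fmean(fake_daily_return(rng) for _ in range(2500)) for _ in range(800)]

print("daily excess kurtosis:", round(excess_kurtosis(daily), 2))
print("2500-day average excess kurtosis:", round(excess_kurtosis(decade_means), 2))
```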
Re: Statistics question
[OT comment removed by admin LadyGeek] Care to control the overlapping problem? Cheers
"whenever there is a randomized way of doing something, then there is a nonrandomized way that delivers better performance but requires more thought" - E. T. Jaynes
Re: Statistics question
There's no overlap; I'm not talking about rolling returns.
I'm saying the average decade returns of the following decade will be equal to 2500 times the average of the following 2500 daily returns. The daily returns aren't normal, but their average is approximately normal, and so is 2500 times that average. So the upcoming decade return should be approximately normally distributed (to the extent that 2500 samples is enough for the CLT).
If you wanted to show the above in practice with past data, you'd want to use independent decades, as you said, of course.

Re: Statistics question
“the average decade returns of the following decade will be equal to 2500 times the average of the following 2500 daily return”: yes, if geometric mean, not algebraic mean.
305pelusa wrote: ↑Wed Oct 09, 2019 2:50 pm
There's no overlap, I'm not talking about rolling returns.
A problem with "2500" samples is that the effective number is not quite as large as 2500, and may be much smaller, because of the strong correlation among them. They are not really independent random draws; there is strong correlation, and the problem is that the correlation is not predictable.
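To put a number on that point, suppose (purely for illustration) that daily returns followed an AR(1) process with lag-one correlation rho. For large n, the mean of n such observations is then about as informative as n(1 - rho)/(1 + rho) independent ones. The value rho = 0.5 below is arbitrary, not an estimate for real stock returns, the helper names are hypothetical, and the simulation only checks the formula:

```python
import random
from statistics import pvariance

def effective_sample_size(n, rho):
    """Large-n effective sample size of the mean of an AR(1) series with lag-1 correlation rho."""
    return n * (1 - rho) / (1 + rho)

def variance_of_mean_ar1(n, rho, trials, rng):
    """Monte Carlo variance of the sample mean of a stationary AR(1) series with unit variance."""
    noise_sd = (1 - rho ** 2) ** 0.5   # keeps Var(x_t) = 1
    means = []
    for _ in range(trials):
        x = rng.gauss(0.0, 1.0)        # start in the stationary distribution
        total = x
        for _ in range(n - 1):
            x = rho * x + rng.gauss(0.0, noise_sd)
            total += x
        means.append(total / n)
    return pvariance(means)

rng = random.Random(7)
n, rho = 250, 0.5
simulated = variance_of_mean_ar1(n, rho, trials=2000, rng=rng)
print("effective n (formula):", round(effective_sample_size(n, rho), 1))  # 83.3
print("effective n (simulated):", round(1.0 / simulated, 1))
```

With n = 250 and rho = 0.5, the 250 correlated observations carry roughly as much information about the mean as 83 independent ones.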
Re: Statistics question
^I'm talking exclusively about the arithmetic mean above. One can then convert to the geometric mean via an approximation using the St Dev. I'm not sure what an algebraic mean is.
MathIsMyWayr wrote: ↑Wed Oct 09, 2019 3:46 pm
“the average decade returns of the following decade will be equal to 2500 times the average of the following 2500 daily return”: yes, if geometric mean, not algebraic mean.
Your point about correlation and dependency is a good one, as are the many other shortcomings of the approach. So don't take the above as gospel; treat it purely as a very rough, crude approximation. Then again, I think that's something we ALL know, so I don't think it bears repeating.

Re: Statistics question
I tried simulating this in R, with 100,000 trials. I’m getting a 28% loss after 10 years in the 1st percentile and a 7% loss in the 5th percentile. I know there are people on this forum who code for a living, so would welcome anybody who would like to check my work. Here is the code:
# Inputs
n <- 100000
years <- 10
meany <- 0.058
standy <- 0.114
x <- rep(1,n) # Vector of length n
# do n trials of duration years
for (i in 1:years){
  x <- x * (1 + rnorm(n,meany,standy))
}
# check
mean(x)
(1+meany)^years
ordered <- sort(x)
# 1st percentile
ordered[n/100]
# 5th percentile
ordered[n/20]
 abuss368
Re: Statistics question
I have learned much over the years from investing. One of which is to not rely on past performance (which statistics is usually based on).
John C. Bogle: "Simplicity is the master key to financial success."
Re: Statistics question
A 28% loss is approximately an average yearly return of -2.6%. That sounds about right. I've never used R, but your code seems OK.
Small Savanna wrote: ↑Wed Oct 09, 2019 7:23 pm
I tried simulating this in R, with 100,000 trials. I’m getting a 28% loss after 10 years in the 1st percentile and a 7% loss in the 5th percentile.

Re: Statistics question
The giant assumption is that portfolio returns act as if they are random variables whose distribution is a Normal distribution. The second giant assumption is that each year's investment result is independent of every other year's result. Since neither of these is true, the following mathematical discussion can almost surely be taken with a grain of salt.
But, here goes:
So if the mean of the ROI is 5.8 percent, then to get next year's balance you would multiply by 1.058; you have to add one to the return (written as a decimal, 0.058) to get the multiplication factor.
And if the standard deviation is 11.4 percent, then the variance is the square of 0.114 which is 0.012996.
If this process repeats for N years, it is like we are multiplying N random numbers together, each with the same mean and s.d.
See the Wikipedia page on the distribution of the product of random variables.
For the mean: "When two random variables are statistically independent, the expectation of their product is the product of their expectations."
So the expected value for the mean is 1.058^N. And to get the ROI from that, just subtract 1.
For the standard deviation, the relevant section is "Variance of the product of independent random variables"
To get the variance of the product of several random variables, for each variable, add the variance to the square of the mean. Now multiply all those together. That is the first term of the equation. Next take just the square of the mean for each variable; and multiply all those together. That is the second term. Now subtract the second term from the first term. Now you have the variance of the product of those random variables. Next take the square root: and you have the standard deviation.
In our case:
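$\text{sd} = \sqrt{(0.012996 + 1.119364)^N - 1.119364^N}$, where $0.012996 = 0.114^2$ is the variance and $1.119364 = 1.058^2$ is the squared mean.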
N (years)   mean    var     sd      mean ROI   sd (%)
1           1.058   0.013   0.114    5.8%      11.4%
5           1.326   0.104   0.323   32.6%      32.3%
10          1.757   0.378   0.615   75.7%      61.5%
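These formulas are easy to check in code. A minimal sketch that computes the mean, variance and SD of a product of N independent yearly growth factors, each with mean 1.058 and SD 0.114, using E[product] = 1.058^N and Var[product] = (0.114² + 1.058²)^N - (1.058²)^N; `product_moments` is a hypothetical helper name:

```python
from math import sqrt

def product_moments(mean_growth, sd, years):
    """Mean, variance and SD of the product of `years` independent growth factors.

    Each factor has mean `mean_growth` (e.g. 1.058) and SD `sd` (e.g. 0.114):
    E[prod] = m**N and Var[prod] = (sd**2 + m**2)**N - (m**2)**N.
    """
    m2 = mean_growth ** 2
    mean = mean_growth ** years
    var = (sd ** 2 + m2) ** years - m2 ** years
    return mean, var, sqrt(var)

for years in (1, 5, 10):
    mean, var, sd = product_moments(1.058, 0.114, years)
    print(f"{years:>2}  {mean:.3f}  {var:.3f}  {sd:.3f}  {mean - 1:.1%}  {sd:.1%}")
```

It reproduces the table's rows to rounding, under the same normality-free but independence-dependent assumptions.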
Just because you're paranoid doesn't mean they're NOT out to get you.

Re: Statistics question
I did the same thing in MATLAB and got the same results. For a 5-year simulation, the 1st percentile is -29%, and the 5th percentile is -15%. See code below:
Small Savanna wrote: ↑Wed Oct 09, 2019 7:23 pm
I tried simulating this in R, with 100,000 trials. I’m getting a 28% loss after 10 years in the 1st percentile and a 7% loss in the 5th percentile.
% Define normal distribution parameters
mu = 0.058;
sigma = 0.114;
% Define sample size for Monte Carlo simulation
sample_size = 100000;
% Define simulation length in years
years = 10;
% Define initial portfolio value
initial_value = 1;
% Define initial portfolio value for all samples in the simulation
portfolio_value = initial_value*ones(sample_size, 1);
% Loop for each year
for i = 1:years
% Randomly sample annual return from normal distribution for each
% sample in the Monte Carlo simulation
annual_return = normrnd(mu, sigma, [sample_size, 1]);
% Calculate portfolio value
portfolio_value = (1+annual_return).*portfolio_value;
end
% Calculate total return as a percentage
total_return = 100*(portfolio_value - initial_value)/initial_value;
% Calculate 1st and 5th percentiles of total return
percentile_1 = prctile(total_return, 1);
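percentile_5 = prctile(total_return, 5);
% Display the results
fprintf('1st percentile: %.1f%%\n5th percentile: %.1f%%\n', percentile_1, percentile_5);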
Re: Statistics question
Maybe (s)he meant to capitalize the "S"
Re: Statistics question
At first I was like "I don't think statistics is capitalized?".
an_asker wrote: ↑Thu Oct 10, 2019 10:30 am
Maybe (s)he meant to capitalize the "S"
I surprise myself with my own stupidity sometimes
Re: Statistics question
I don't think it is you; I have to explain my PJ multiple times ... such as the one I made about folks not recognizing climate change as being "person non Greta"!
305pelusa wrote: ↑Thu Oct 10, 2019 11:26 am
I surprise myself with my own stupidity sometimes
 JupiterJones
Re: Statistics question
I think that's a spot-on simulation of the problem at hand, given the assumptions we're using at least.
Small Savanna wrote: ↑Wed Oct 09, 2019 7:23 pm
I know there are people on this forum who code for a living, so would welcome anybody who would like to check my work.
I would only make two small modifications, neither of which change the gist of it (they're just niceties). I've marked them with "JJ" comments:
# Inputs
n <- 100000
years <- 10
meany <- 0.058
standy <- 0.114
x <- rep(1,n) # Vector of length n
# JJ: Set the PRNG seed, for reproducibility
set.seed(8675309) # Actual constant used doesn't really matter
# do n trials of duration years
for (i in 1:years){
  x <- x * (1 + rnorm(n,meany,standy))
}
# check
mean(x)
(1+meany)^years
# JJ: A little easier than fooling with ordering, etc. :)
1 - quantile(x, c(0.01, 0.05))