The question, however, is this:

**do these "leading" indicators actually "lead" in a meaningful way**, that is, do they give advance notice sufficient to take action?

I decided to take a closer look into this question.

A few weeks ago I came across this article from CNBC titled "There's a 70% chance of recession in the next six months, new study from MIT and State Street finds". This article introduced me to the **Mahalanobis Distance** (MD for short), a technique for measuring how far a candidate point lies from a multivariate data set, taking into account how the variables vary together. The MD determines a "center point" of the multivariate sample population, and then calculates the distance from that center point to a candidate point. The MD is indexless, except that the closer it is to zero (the center point), the more likely the candidate is to be a part of the set. If the variables don't relate, the result is noise.

I applied the MD technique to my data model based on the linked articles in the OP, to see if it can really identify a recession from the set of leading indicators in the articles. (See this website for instructions on how to build this in Excel: Example of Calculating the Mahalanobis Distance).
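For anyone who would rather see the calculation as code than as a spreadsheet, here is a minimal sketch in plain Python. The function names and the toy numbers are invented for illustration and are not taken from the linked Excel example. It assumes the usual definition: the square root of d' S^-1 d, where d is the candidate minus the sample mean and S is the sample covariance matrix.

```python
from math import sqrt

def mean_vector(rows):
    """Column means of the sample population (the "center point")."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]

def covariance_matrix(rows):
    """Sample covariance matrix (divides by n - 1)."""
    n = len(rows)
    mu = mean_vector(rows)
    k = len(mu)
    cov = [[0.0] * k for _ in range(k)]
    for r in rows:
        d = [x - m for x, m in zip(r, mu)]
        for i in range(k):
            for j in range(k):
                cov[i][j] += d[i] * d[j]
    return [[c / (n - 1) for c in row] for row in cov]

def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("covariance matrix is singular")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x

def mahalanobis(candidate, sample_rows):
    """Distance of one candidate row from the center of the sample."""
    mu = mean_vector(sample_rows)
    cov = covariance_matrix(sample_rows)
    d = [x - m for x, m in zip(candidate, mu)]
    y = solve(cov, d)  # y = S^-1 d, without forming the inverse
    return sqrt(sum(di * yi for di, yi in zip(d, y)))

# Toy sample with two made-up variables.
sample = [(0.0, 0.0), (2.0, 0.0), (0.0, 2.0), (2.0, 2.0)]
print(mahalanobis((1.0, 1.0), sample))  # the center point itself
print(mahalanobis((3.0, 1.0), sample))  # a point away from the center
```

Solving the linear system instead of inverting the covariance matrix is a design choice for brevity; a spreadsheet would typically use MINVERSE and MMULT to the same effect.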

**The Variables**

The variables that I used were those listed in the OP articles, with some adjustments described below.

S&P 500

I am using a 9-month moving average as a proxy for a 200-day moving average. The set is the S&P 500 month-end closing price (Shiller data) relative to its 9-month moving average (the percent difference).

Unemployment

The set is the unemployment rate (from FRED) relative to its trailing twelve month (TTM) average (the percent difference).
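Both the S&P 500 and unemployment series are "value relative to its own trailing average" calculations. A minimal Python sketch of that transformation follows. The function name is hypothetical, and it assumes the trailing window includes the current month, which the descriptions above do not specify.

```python
def pct_diff_from_trailing_mean(values, window):
    """Percent difference of each value from its trailing average.

    Assumption: the window includes the current month. Months without a
    full window of history are returned as None.
    """
    out = []
    for i, v in enumerate(values):
        if i + 1 < window:
            out.append(None)
            continue
        avg = sum(values[i + 1 - window : i + 1]) / window
        out.append(100.0 * (v - avg) / avg)
    return out

# Made-up month-end prices; window=9 would mimic the S&P 500 series,
# window=12 the unemployment (TTM) series.
print(pct_diff_from_trailing_mean([100.0, 100.0, 100.0, 130.0], window=3))
```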

YoY Job Growth

The set is simply the YoY percent change in jobs as reported by FRED.

YoY Industrial Production Growth

The set is simply the YoY percent change in industrial production (FRED).

YoY Retail Sales Growth

The set is simply the YoY percent change in retail sales (FRED).

YoY Personal Income Growth

The set is simply the YoY percent change in personal income (FRED).

YoY New Housing Starts

The set is simply the YoY percent change in new housing starts (FRED).
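The five YoY series above (jobs, industrial production, retail sales, personal income, housing starts) all use the same transformation. A minimal sketch in Python, assuming monthly observations and a 12-month lag; the function name is made up for illustration:

```python
def yoy_pct_change(monthly_values, lag=12):
    """Year-over-year percent change for a monthly series.

    The first `lag` months have no year-ago value and are returned as None.
    """
    out = [None] * min(lag, len(monthly_values))
    for i in range(lag, len(monthly_values)):
        prev = monthly_values[i - lag]
        out.append(100.0 * (monthly_values[i] - prev) / prev)
    return out

# Thirteen made-up monthly readings: 100, 101, ..., 112.
series = [100.0 + i for i in range(13)]
print(yoy_pct_change(series)[-1])  # change from month 1 to month 13
```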

S&P 500 Total Return Earnings Per Share (TR EPS) Growth

The set is described in the OP linked article Introducing the Total Return EPS Index: A New Tool for Analyzing Fundamental Equity Market Trends, based on Shiller data. For this analysis, the set was converted to log10 and then relative to its linear trend (percent difference).
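One way to read "converted to log10 and then relative to its linear trend" is sketched below in Python. This is an interpretation, not necessarily the exact spreadsheet steps: it takes log10 of the index, fits an ordinary least-squares line against the month number, and reports the percent difference of each log value from the fitted line. The function name is hypothetical.

```python
from math import log10

def pct_diff_from_log_trend(values):
    """Percent difference of log10(values) from its least-squares line.

    Assumption: the trend is fitted over the whole series with the month
    index 0, 1, 2, ... as the x variable. The result is undefined where
    the fitted trend value is zero.
    """
    y = [log10(v) for v in values]
    n = len(y)
    t_mean = (n - 1) / 2.0
    y_mean = sum(y) / n
    sxx = sum((t - t_mean) ** 2 for t in range(n))
    sxy = sum((t - t_mean) * (yi - y_mean) for t, yi in enumerate(y))
    slope = sxy / sxx
    intercept = y_mean - slope * t_mean
    trend = [intercept + slope * t for t in range(n)]
    return [100.0 * (yi - ti) / ti for yi, ti in zip(y, trend)]

# Made-up index levels, not actual TR EPS data.
print(pct_diff_from_log_trend([10.0, 100.0, 10000.0]))
```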

New Jobless Claims

The set is simply the four-week average of the weekly jobless claims (FRED). This is NOT a part of the OP article leading indicators, but I added it to my model.

**The Historical Data**

The S&P 500 data in my model is sourced from the monthly Shiller data set. The accounting oddities in 2003-2009 as indicated in the TR EPS article above have been corrected as described in the article. (Source for correction data: https://us.spindices.com/documents/addi ... s-est.xlsx.)

Periods of recession (start/end dates) were taken from Wikipedia. I separately indicate "bear market" recessions as those with significant S&P 500 price drops, and extend the dates (before and after) to reflect the beginning and end of the price drops.

The Shiller S&P 500 data set goes back to 1871. United States industrial production data begins in 1920. Unemployment data begins in 1929. Retail sales data begins in 1948. Jobs data begins in 1949. Housing starts and personal income growth data begin in 1960. Jobless claims data begins in 1967.

The sample population was built from all of the above indicators from 1967 onward, filtered ONLY for the months that were indicated as "bear market" recessions.

**The Process**

The "candidates" are all the months from January 1967 to the present. The MD is the distance of the candidate month from the "center point" of the sample population. Because this is an indexless number, I normalized the MD to a 0-100 scale and then inverted it. Therefore, closer to 0 means the candidate is NOT likely to be in the "bear market" recession population, and closer to 100 means that the candidate IS likely to be in the "bear market" recession population.
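The normalize-and-invert step can be sketched as follows in Python. The exact normalization is not spelled out above, so this assumes simple min-max scaling across all candidate months; the function name is made up for illustration.

```python
def invert_to_score(distances):
    """Map Mahalanobis distances to a 0-100 score, inverted.

    Assumption: min-max scaling over all candidates. The smallest distance
    (closest to the center point) scores 100, the largest scores 0.
    """
    lo, hi = min(distances), max(distances)
    if hi == lo:
        raise ValueError("all distances are identical; nothing to scale")
    return [100.0 * (hi - d) / (hi - lo) for d in distances]

# Made-up distances for three candidate months.
print(invert_to_score([0.0, 5.0, 10.0]))
```

Note that with min-max scaling the scores depend on the most extreme month in the candidate range, so adding new months can shift older scores.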

I ran the analysis using different combinations of the leading indicators to see how they behaved in the model. I won't recount that initial analysis here, but it was sufficient to give me confidence that the above indicators do work using the MD technique.

**First Pass Results**

The chart below shows the MD "score" from 1967 to present.

As you can see, the MD index is at or greater than 90 during the bear market recession periods (dark grey areas), and then drops off after recessions.

Note the spike at the end of 2018. This is the whipsaw at the end of 2018 which is not in the bear market recession sample population, but was identified by MD. Also note how low the MD index goes in non-recessionary periods, down to the 20s and low teens.

Now take a look at this next chart, which calculates the MD index using ONLY the S&P 500 and Unemployment, as the OP is using in his strategy.

The indicators still identified the bear market recessions, but the non-recessionary months don't fall as low. There is a floor of about 50 when only using these two indicators. This also identified the light recession of 1990-1991, and gave it the highest score (closest to the MD center). The fuller indicator analysis did not call out the 1990-1991 recession.

**Conclusion 1: The MD technique can identify bear market recessionary periods.**

The obvious follow-up question from these two charts is: while they both identified the recessions, did they "lead" enough to act? Based on these charts, I'd say no.

**The Second Pass Results**

It occurred to me that if the MD technique was successful in identifying the bear market recessions, it might also be able to identify the months leading up to the recession. Let's find out.

I added a new flag to my data to identify the period leading up to the bear market recessions. I played around with different durations and finally settled on defining this new sample population as the six months BEFORE the start of the bear market recession. I figured this would give sufficient time to forewarn, review the results, and handle some of the lag time in obtaining the data.
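Building that flag from an existing recession flag can be sketched like this in Python. It assumes one boolean per month in date order; the function name and the toy data are hypothetical.

```python
def flag_lead_up(in_recession, lead=6):
    """Flag the `lead` months immediately before each recession start.

    `in_recession` holds one boolean per month, in date order. Months that
    are themselves inside a recession are never flagged as lead-up.
    """
    n = len(in_recession)
    flags = [False] * n
    for i in range(n):
        starts = in_recession[i] and (i == 0 or not in_recession[i - 1])
        if starts:
            for j in range(max(0, i - lead), i):
                if not in_recession[j]:
                    flags[j] = True
    return flags

# Toy timeline: 8 normal months, a 3-month recession, 2 normal months.
recession = [False] * 8 + [True] * 3 + [False] * 2
print(flag_lead_up(recession))
```

The flagged months would then serve as the sample population in place of the recession months themselves.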

Here's the chart.

The periods leading up to the recession can be clearly seen with scores exceeding 95. They drop off during the recession, because they are identifying the months before the recession. The scores do get high in the months further back from the recession, so a certain amount of judgment is necessary to determine the threshold, but 90+ is good (maybe even 95+).

Let's see how the unemployment-only set fared.

The unemployment-only approach could not identify the period leading up to the recessions. It is only noise. However, it clearly identified the recessions as NOT leading up to the recessions.

**Conclusion 2: The MD technique can identify the months leading up to bear market recessionary periods.** This should give enough lead time to exit the equity market prior to a deep correction.

This then raises the question... if MD can find the months leading up to a recession, can it also find the final months of the recession? If it can, then it would be nice to know when to move back into the equity market near the bottom, before the recovery begins.

**Third Pass Results**

I added another flag to my data to identify the last four months of the bear market recessions to define my sample population. I went with four months to account for data gathering lag.
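A minimal Python sketch of an end-of-recession flag, again assuming one boolean per month in date order. The function name is made up for illustration, and leaving a still-ongoing recession unflagged is an assumption of this sketch.

```python
def flag_final_months(in_recession, tail=4):
    """Flag the last `tail` months of each completed recession.

    A recession still running at the end of the data is not flagged,
    because its final month is not yet known.
    """
    n = len(in_recession)
    flags = [False] * n
    for i in range(n - 1):
        ends = in_recession[i] and not in_recession[i + 1]
        if ends:
            j, count = i, 0
            while j >= 0 and in_recession[j] and count < tail:
                flags[j] = True
                j -= 1
                count += 1
    return flags

# Toy timeline: 2 normal months, a 6-month recession, 2 normal months.
recession = [False] * 2 + [True] * 6 + [False] * 2
print(flag_final_months(recession))
```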

Once again, the MD technique seems to have identified the periods at the end of the bear market recessions.

Now let's look at the unemployment-only set.

It appears that unemployment-only was better at finding the last months of a recession than the months before one, likely because it was still in a recession. However, it isn't as clear at finding the last months.

**Conclusion 3: The MD technique can identify the last months of a bear market recessionary period.** This should give enough lead time to buy back into the equity market near the bottom in advance of the recovery.

**Final Thoughts**

I think the first question is, is this just data fitting? I don't know. The MD technique works off of mean and variance, and set covariance. I don't know if that's precise enough to be data fitting, or general enough to define a population. I do know that using the same data with different combinations of indicator variables leads to different results. This suggests that it is not data fitting.

I can use more data by dropping some of the indicators (which I tested). By limiting the sample population to the S&P 500, unemployment, industrial production growth, and TR EPS, the data goes back to 1929. The question is whether trading indicators for more data is informative.

Here is the recession chart.

The recessionary periods are not as clearly defined.

Now the months prior to the recession.

And the last months of the recession.

In this case, the extra data did not make up for the loss of additional indicators. I will leave it to the discussion to see if this is germane.

Also, I looked at calculating the means and variances for each variable separately based on available data, letting Excel handle the missing data. This proved to be error-prone, as the covariances and following matrix multiplications led to square roots of negative numbers. I discarded this analysis and settled for the set of data where all variables had the same number of data points.
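A likely explanation for the negative square roots: when each covariance is computed from a different subset of months, the resulting matrix is not guaranteed to be a valid (positive definite) covariance matrix, so the quantity under the square root can go negative. The Python sketch below shows the two pieces involved, under that assumption: dropping incomplete rows, and checking a matrix with an attempted Cholesky factorization. The function names and the example matrix are invented for illustration.

```python
from math import sqrt

def complete_rows(rows):
    """Keep only the months where every indicator has a value."""
    return [r for r in rows if all(v is not None for v in r)]

def is_positive_definite(a, tol=1e-12):
    """Attempt a Cholesky factorization; it fails if `a` is not
    positive definite."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = a[i][j] - sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                if s <= tol:
                    return False
                low[i][j] = sqrt(s)
            else:
                low[i][j] = s / low[j][j]
    return True

# A made-up "covariance" matrix whose pairwise entries contradict each
# other, the kind of thing pairwise handling of missing data can produce.
inconsistent = [[1.0, 0.9, -0.9],
                [0.9, 1.0, 0.9],
                [-0.9, 0.9, 1.0]]
print(is_positive_definite(inconsistent))
print(complete_rows([(1.0, 2.0), (None, 3.0), (4.0, 5.0)]))
```

Restricting the analysis to complete rows, as described above, avoids the problem because every covariance is then computed from the same set of months.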

I look forward to any comments or questions.

-B