The data endpoint choices in the "original" 1964 CRSP paper

Topic Author
nisiprius
Advisory Board
Posts: 51228
Joined: Thu Jul 26, 2007 9:33 am
Location: The terrestrial, globular, planetary hunk of matter, flattened at the poles, is my abode.--O. Henry

Post by nisiprius »

I've been looking at the first paper published by CRSP, in hope of refreshing my memory on why 1926 is the start of all the "traditional" CRSP-based data. The paper is L. Fisher and J. H. Lorie (1964), "Rates of Return on Investments in Common Stocks," The Journal of Business, Vol. 37, No. 1, pp. 1-21; available here.

It turns out my memory is faulty. The starting point of 1926 is never actually explained.

The paper is candid about saying that "Merrill Lynch, Pierce, Fenner & Smith Inc. provided the funds to establish the Center." It does not mention that Merrill Lynch's motivation was to run an ad saying that stocks were a prudent investment. At that time Merrill Lynch liked to run full-page, fine-print, book-chapter-length ads "educating" people on stock investing, and this paper became one of them.

Some interesting features of this pioneering paper include:
  • All return calculations included the effect of taxation, and all results were in three columns for investors who were "tax exempt," made "$10,000 in 1960," and made "$50,000 in 1960."
  • All calculations included brokerage commissions on round-lot transactions.
  • It only included the New York Stock Exchange, although a monumental earlier study by the Cowles commission, "Common-Stock Indexes 1871-1938," had included data from about twelve city exchanges and the Curb Exchange, which by 1964 had matured into the American Stock Exchange.
  • Returns are for an equal-weighted portfolio:
    Table 1 shows the results of investing an equal sum of money in each company having one or more issues of common stock listed on the New York Stock Exchange...The decision to invest equal amounts in each company with common stock listed on the Exchange was based on the desire to calculate rates of return that would on the average be available to the individual investor who selected stocks at random with equal probabilities of selection--that is, exercised no judgment. A policy of allocating funds in proportion to shares outstanding or according to any other criterion implies less neutrality of judgment in making investments.
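To make the equal-weighting idea concrete, here is a minimal sketch of a one-period, buy-and-hold comparison. The three returns and market values are hypothetical numbers chosen only for illustration; they are not from the paper. Weighting by market value is shown as one example of the "any other criterion" the authors mention, not as a method they used.

```python
def equal_weighted_return(returns):
    """One-period return when the same sum is invested in every company."""
    return sum(returns) / len(returns)


def value_weighted_return(returns, market_values):
    """One-period return when funds are allocated in proportion to market value."""
    total = sum(market_values)
    return sum(r * v for r, v in zip(returns, market_values)) / total


# Hypothetical data: one large, one medium, and one small company.
returns = [0.10, -0.05, 0.30]
market_values = [800.0, 150.0, 50.0]

if __name__ == "__main__":
    print(f"equal-weighted: {equal_weighted_return(returns):.4f}")
    print(f"value-weighted: {value_weighted_return(returns, market_values):.4f}")
```

With these made-up inputs the small company's large gain counts for a third of the equal-weighted result but only a twentieth of the value-weighted one, which is why the weighting choice matters when reading the paper's tables.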
Although the paper has quite a lot of detail about how the data values were calculated, it doesn't really explain "1926." It says only:
monthly closing prices of all common stocks on the New York Stock Exchange from January, 1926, through December, 1960, have been placed on tape. Their accuracy has been appraised in several ways, and, after three years of checking and rechecking, it is estimated that the incidence of error is extremely low and that remaining errors do not bias the results.
So it is not clear whether 1926 is really the oldest data they had, or whether they simply chose to begin transcribing there.

The authors say "earlier studies have dealt only with one or two brief time periods in contrast to the twenty-two time periods within a thirty-five-year span covered here." This is where things get interesting! Those twenty-two time periods were:
1/26-12/60
1/26- 9/29
1/26- 6/32
1/26-12/40
1/26-12/50
9/29- 6/32
9/29-12/40
9/29-12/50
9/29-12/60
6/32-12/40
6/32-12/50
6/32-12/60
12/50-12/52
12/50-12/54
12/50-12/56
12/50-12/58
12/50-12/60
12/55-12/56
12/55-12/57
12/55-12/58
12/55-12/59
12/55-12/60
I charted these in graphic form:

[Image: chart of the twenty-two time periods]

The authors say
The periods were chosen for obvious reasons.
Not to me, they weren't! But they go on to say:
The period from 1926 to 1960 is a long span with booms and depressions--prime examples of each!--and war and peace. The periods beginning in September, 1929, were included to indicate the experience of those who invested at the height of the stock-market boom of the 1920's. The periods beginning in June, 1932, were included to show the results of investing at the nadir of this country's worst depression. The numerous brief, recent periods were included to bring details of postwar experience into sharp focus. Aside from most periods ending in 1932 or 1940, the rates of return are surprisingly high.
What they did was to pick five starting points and eleven endpoints and calculate returns over twenty-two of the possible combinations. I really don't know what to say about this. They give an explicit explanation for the starting points of 9/29 and 6/32, but they don't really give any reason for 1/26, 12/50, or 12/55. And the fact that they've picked actual months within the year, rather than the start of the year, heightens my feeling that they really should have explained those choices.

Because only three of the endpoints (9/29, 6/32, and 12/50) coincide with starting points, their numbers can be spliced together to get returns for only a limited set of periods beyond those twenty-two.
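A quick way to check the counts and the splicing question is to tabulate the list above. The sketch below is only an illustration: the period labels are copied from the list in this post, and the `summarize` and `splice` helpers are names invented here, not anything from the paper. `splice` is the textbook formula for chaining cumulative (not annualized) returns over back-to-back periods; since the paper's figures include taxes and commissions, splicing its published numbers would be approximate at best.

```python
# The twenty-two periods as listed above, written as (start, end) labels.
PERIODS = [
    ("1/26", "12/60"), ("1/26", "9/29"), ("1/26", "6/32"), ("1/26", "12/40"),
    ("1/26", "12/50"), ("9/29", "6/32"), ("9/29", "12/40"), ("9/29", "12/50"),
    ("9/29", "12/60"), ("6/32", "12/40"), ("6/32", "12/50"), ("6/32", "12/60"),
    ("12/50", "12/52"), ("12/50", "12/54"), ("12/50", "12/56"),
    ("12/50", "12/58"), ("12/50", "12/60"), ("12/55", "12/56"),
    ("12/55", "12/57"), ("12/55", "12/58"), ("12/55", "12/59"),
    ("12/55", "12/60"),
]


def summarize(periods):
    """Return distinct starts, distinct ends, dates that are both, and new splices."""
    starts = {s for s, _ in periods}
    ends = {e for _, e in periods}
    shared = starts & ends
    listed = set(periods)
    # A splice joins a->b with b->c; keep only results a->c not already listed.
    new_splices = {
        (a, c)
        for a, b in periods
        for b2, c in periods
        if b == b2 and (a, c) not in listed
    }
    return starts, ends, shared, new_splices


def splice(r1, r2):
    """Chain two cumulative returns over back-to-back periods."""
    return (1 + r1) * (1 + r2) - 1


if __name__ == "__main__":
    starts, ends, shared, new_splices = summarize(PERIODS)
    print(len(starts), "starting points;", len(ends), "endpoints")
    print("dates that are both:", sorted(shared))
    print("spliceable periods not already listed:", sorted(new_splices))
```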

I don't think Fisher and Lorie inserted any bias in support of Merrill Lynch's marketing effort. I'm just saying that they didn't explain their methodology for choosing endpoints anywhere near as well as their methodology for calculating returns, and that they don't explain the "1926" endpoint at all.
Annual income twenty pounds, annual expenditure nineteen nineteen and six, result happiness; Annual income twenty pounds, annual expenditure twenty pounds ought and six, result misery.
Geologist
Posts: 2570
Joined: Fri Jan 02, 2009 6:35 pm

Re: The data endpoint choices in the "original" 1964 CRSP paper

Post by Geologist »

If you go to the source:

https://www.crsp.org/about-us/

It says here “In March 1960, with the initial grant of $50,000 from Merrill Lynch, the Center for Research in Security Prices was established. James Lorie and Professor Lawrence Fisher collaborate on collecting and researching NYSE common stock returns between 1926 and 1960” with a photo of (I assume) Lorie and Fisher with punch cards and in front of computers (with reel-to-reel magnetic tapes).

This also doesn’t say why 1926, but they had to start somewhere, had to deal with the computing technology they had (including punch cards as input) and no doubt had to research all the data from newspaper files. That’s a lot of manual work for 35 years (and only $50,000 to do it with).

Therefore, I think they had to start somewhere and picked a date a couple years before the 1929 crash. Don’t think it was easy. It just looks that way now, when data are digital to begin with. (I don't work on financial data but I can tell you that getting older, pre-digital age, geological data is a big effort.)

Lots of people are using this. William Bernstein has used these data in his books, for example.

Re: The data endpoint choices in the "original" 1964 CRSP paper

Post by nisiprius »

Geologist wrote: Sun Nov 19, 2023 2:52 pm If you go to the source:

https://www.crsp.org/about-us/

It says here “In March 1960, with the initial grant of $50,000 from Merrill Lynch, the Center for Research in Security Prices was established. James Lorie and Professor Lawrence Fisher collaborate on collecting and researching NYSE common stock returns between 1926 and 1960” with a photo of (I assume) Lorie and Fisher with punch cards and in front of computers (with reel-to-reel magnetic tapes).

This also doesn’t say why 1926, but they had to start somewhere, had to deal with the computing technology they had (including punch cards as input) and no doubt had to research all the data from newspaper files. That’s a lot of manual work for 35 years (and only $50,000 to do it with).

Therefore, I think they had to start somewhere and picked a date a couple years before the 1929 crash. Don’t think it was easy. It just looks that way now, when data are digital to begin with. (I don't work on financial data but I can tell you that getting older, pre-digital age, geological data is a big effort.)

Lots of people are using this. William Bernstein has used these data in his books, for example.
That's my point. It doesn't explain "why 1926." Specifically, was it all the data they had? Or was it (as you guess) a choice based on resource limitations and intuition? I wouldn't blame them for choosing a cutoff to limit their workload, just as I don't blame them for not including the AMEX and the Curb Exchange. But, come to think of it, they should have said something about that, too. Something simple like "Our calculations include only the NYSE, not the AMEX, the Curb Exchange, or other US stock exchanges, because our data source covered only the NYSE."

I'm well aware that everybody uses this data. As nearly as I can tell, there are two major sources for older data. If it started in 1871, it's the Cowles Commission's 1939 book. If it's 1926, it's CRSP, often by way of the Ibbotson "SBBI" (Stocks, Bonds, Bills and Inflation) data series.
Last edited by nisiprius on Sun Nov 19, 2023 4:10 pm, edited 1 time in total.
McQ
Posts: 1239
Joined: Fri Jun 18, 2021 12:21 am
Location: California

Re: The data endpoint choices in the "original" 1964 CRSP paper

Post by McQ »

nisiprius wrote: Sun Nov 19, 2023 11:46 am I've been looking at the first paper published by CRSP, in hope of refreshing my memory on why 1926 is the start of all the "traditional" CRSP-based data. The paper is L. Fisher and J. H. Lorie (1964), "Rates of Return on Investments in Common Stocks," The Journal of Business, Vol. 37, No. 1, pp. 1-21; available here.

It turns out my memory is faulty. The starting point of 1926 is never actually explained.

...
Hello Nisiprius; I can provide some clarification, although it may not be the complete answer you are seeking.

In 2009, when I was just embarking on my second career, here in financial market history, I wrote a paper titled "The Myth of 1926" (search engines will surface a variety of takes on it). I had gotten tired of reading "since 1926 stocks have ..." when the evidence was so plentiful that 1926 was no true point of beginning.

So why does CRSP start (or if you will, end) with an anchor on the last day of December 1925?

Keep in mind: Fisher and Lorie didn't know about Cowles. He had sunk out of sight in the 20 years since he published his 1871 - 1939 history of returns (even though some of the work had been done at Chicago). I sometimes joke that folks at Chicago have taken the "markets have no memory" thing a bit too far; Jensen (1968), who pioneered the analysis of fund manager's alpha in 1968, had no memory of Moffit (1952), an earlier analysis of mutual fund performance, published in the Chicago house journal.

I have heard, but can't source, that Fisher and Lorie worked backward in time, and that time and money were running out by the time they got to 1929; 1926 was a feasible stopping point, a few years before the crest of the boom, good enough.

They would also have been familiar with the fact that when Standard Statistics launched the S&P 90 in 1928, they backdated the index data to ... December 1925.

Why F&L ignored the S&P weekly, anchored December 1917, I do not know...

In any case, they used the same data source as Cowles (see F&L footnote 13), a Commercial & Financial Chronicle publication (Bank and Quotation Record). All the CFC issues from 1871 to 1963 are online at FRASER now (https://fraser.stlouisfed.org/blog/2017 ... 1865-1928/). You can download the 1926 and 1925 ... 19xx CFC issues and confirm that there is no difference in data availability beginning 1926; it truly is an arbitrary starting point.

Nothing in the real world of financial markets started or ended December 31st 1925.

If you want to delve deeper, see the paper by Wilson and Jones (2002): https://www.jstor.org/stable/10.1086/339903, who go into considerable detail about the relation between the CRSP, S&P, and Cowles indexes.
You can take the academic out of the classroom by retirement, but you can't ever take the classroom out of his tone, style, and manner of approach.
pizzy
Posts: 4149
Joined: Tue Jun 02, 2020 6:59 pm

Re: The data endpoint choices in the "original" 1964 CRSP paper

Post by pizzy »

It seems to me they chose 1/26 to have an “even” 35 full years.

If they chose 1/30 to have an even “30 years” they would have missed the 9/29 they wanted.

They had an end point of 1960.

Five years back.

10 years back.

They added two “random” points of 9/29 and 6/32 for the reasons stated.

And then 35 years back.

And then once they had the 35 years back as starting point.

They added +15 years, +25 years.
Vanguard/Fidelity | 76% US Stock | 16% Int'l Stock | 8% Cash

Re: The data endpoint choices in the "original" 1964 CRSP paper

Post by Geologist »

Aside from McQ’s interesting response:

You need to think about whether you are trying to apply modern approaches/conventions in description of data analysis to what was done 60+ years ago. You can’t do that. It is probable that authors of any finance paper in the early 1960's didn’t describe what you think should be there during that era; it wasn’t the convention.

(There are a lot of publication conventions that have changed in the natural sciences in just the last 40 years, let alone the last 60, so natural scientists can't complain that 1980's publications aren't done the way we would do them now.)

(Furthermore, I can see that Lorie and Fisher would have been reluctant to write that “we quit at 1926 because we ran out of money” if McQ’s story is correct.)

Re: The data endpoint choices in the "original" 1964 CRSP paper

Post by nisiprius »

McQ wrote: Sun Nov 19, 2023 4:04 pm...Keep in mind: Fisher and Lorie didn't know about Cowles....
:!: :!: :!:

Do you happen to know how well the CRSP data concord with the Cowles data over the period of overlap, 1926-1938? That would be a really good touchstone on the general accuracy of data from that era. You would expect some discrepancies due to differences in the universe of stocks being indexed.

Re: The data endpoint choices in the "original" 1964 CRSP paper

Post by nisiprius »

Geologist wrote: Sun Nov 19, 2023 4:20 pm ...You need to think about whether you are trying to apply modern approaches/conventions in description of data analysis to what was done 60+ years ago. You can’t do that. It is probable that authors of any finance paper in the early 1960's didn’t describe what you think should be there during that era; it wasn’t the convention...
I perceive, perhaps incorrectly, that you're reading my post as a critique of CRSP. It isn't. It is more bemusement and surprise at the culture drift. Also, I fully understand that what they are doing was not easy, and I'm particularly impressed that they included the effects of taxes and brokerage commissions. That's a convention that is better than what we do today, and it's a pity that it is not the current norm.

If we acknowledge that it was difficult for Fisher and Lorie in 1960-1964, with a Univac I and IBM 7090, one can only be stunned at what Cowles achieved in 1938, without even punched cards (although the technology existed), I assume transcribing by hand from newspapers... even collecting the newspapers from a dozen cities would have been difficult... and a team of around thirty student volunteers using desk calculators. It was said to represent 25,000 human work hours. If I recall correctly, Alfred Cowles was independently wealthy and financed the work personally? Is it possible that he had more resources at his disposal than Fisher and Lorie?

Re: The data endpoint choices in the "original" 1964 CRSP paper

Post by nisiprius »

McQ wrote: Sun Nov 19, 2023 4:04 pm Keep in mind: Fisher and Lorie didn't know about Cowles.... Jensen (1968), who pioneered the analysis of fund manager's alpha in 1968, had no memory of Moffit (1952), an earlier analysis of mutual fund performance
One does forget what it was like before Internet search engines. It was not easy to locate relevant work if it wasn't famous and widely cited. "Literature reviews" were important services. Do you happen to know when and how Cowles was rediscovered?

Re: The data endpoint choices in the "original" 1964 CRSP paper

Post by McQ »

nisiprius wrote: Sun Nov 19, 2023 4:25 pm
McQ wrote: Sun Nov 19, 2023 4:04 pm...Keep in mind: Fisher and Lorie didn't know about Cowles....
:!: :!: :!:

Do you happen to know how well the CRSP data concord with the Cowles data over the period of overlap, 1926-1938? That would be a really good touchstone on the general accuracy of data from that era. You would expect some discrepancies due to differences in the universe of stocks being indexed.
Wilson and Jones go into gory detail on that score. Given your interests, worth your while to get an ILL copy from your local library (I don't see a free copy online).

Short version: accuracy is not the issue. It's the difference in coverage and the arithmetic applied.
1. Strongest performer: the S&P 90 (what you see in the SBBI).
2. Middle: Cowles = S&P weekly (~220 stocks) = the pre-1926 data on Shiller's web site (he switches to the S&P 90 in 1926).
3. Weakest returns: CRSP (but they've revised their data multiple times, even in the last 20 years, much less the 60 years since F&L wrote). About 500 stocks in 1926.
All are NYSE only. Diverse sorts of averaging applied.

Again, accuracy is not the issue, it's coverage.

- No banks in CRSP or Cowles or S&P during the 1930s (banks hadn't been listed on the NYSE since the 1840s). Yep, the stock returns we all know from the 1930s do not include any bank stocks...

- No truly small stocks. These traded on the Amex or OTC. Grab a New York Times from, say, January 1926 and you'll see just how much was left out based on the restriction to NYSE. Only the biggest stocks listed there.

Re: The data endpoint choices in the "original" 1964 CRSP paper

Post by McQ »

nisiprius wrote: Sun Nov 19, 2023 4:45 pm
McQ wrote: Sun Nov 19, 2023 4:04 pm Keep in mind: Fisher and Lorie didn't know about Cowles.... Jensen (1968), who pioneered the analysis of fund manager's alpha in 1968, had no memory of Moffit (1952), an earlier analysis of mutual fund performance
One does forget what it was like before Internet search engines. It was not easy to locate relevant work if it wasn't famous and widely cited. "Literature reviews" were important services. Do you happen to know when and how Cowles was rediscovered?
I believe it was by Robert Shiller in the early 1980s...but I don't have any details and would love to learn the backstory.