15 October 2010

Human performance, psychometry, and baseball statistics III

(Part 3 of 3. Part 1 is here. Part 2 is here.)

Distribution of performance across the sample and the replacement-level player

The second biggest fallacy among baseball personnel managers, according to Bill James, is that they do not understand how ability is distributed among professional baseball players. He defines the concept of the replacement-level player and insists that the vast majority of the fellows working in the Major Leagues are easily and quickly replaceable. His reasoning is simple.

If you take a random selection of humans and measure nearly any trait (height, weight, speed, strength, reaction time), the frequency plot will be the familiar bell-shaped Gaussian curve. People who play baseball professionally are an extremely non-random sample. The left-hand 98% of the curve is gone, because none of those people have the physical gifts required to be employed playing baseball. What remains is a truncated Gaussian distribution: a few players at the highest levels, and the vast majority bunched just above the cutoff, of nearly indistinguishable quality. When talent is skimmed at stage after stage after stage, from Little League to high school to college to the minor leagues to the majors, almost all the remaining players are excellent and interchangeable.
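To make the truncation argument concrete, here is a minimal Python sketch. It assumes, purely for illustration, that the underlying trait is standard normal and that only the top 2% of the population survives selection (taking the 98% figure above at face value); the function names are hypothetical. The first function computes exactly what share of the survivors sit within a given distance of the cutoff; the second draws a simulated population and prints a rough histogram of how far above the cutoff each survivor lies.

```python
import random
from statistics import NormalDist


def share_near_cutoff(keep_fraction: float, band_sd: float) -> float:
    """Among the top `keep_fraction` of a standard normal population, return the
    share of survivors whose score lies less than `band_sd` standard deviations
    above the selection cutoff (computed exactly, no simulation)."""
    z = NormalDist()  # standard normal: mean 0, standard deviation 1
    cutoff = z.inv_cdf(1.0 - keep_fraction)
    survivors_mass = 1.0 - z.cdf(cutoff)  # equals keep_fraction up to rounding
    return (z.cdf(cutoff + band_sd) - z.cdf(cutoff)) / survivors_mass


def simulate_truncation(n: int = 100_000, keep_fraction: float = 0.02,
                        bin_width: float = 0.25, n_bins: int = 6, seed: int = 7):
    """Draw n standard normal scores, keep the top keep_fraction of them, and
    count survivors in bins of bin_width (in SD units) above the lowest
    survivor. The last bin also collects everything beyond it."""
    rng = random.Random(seed)
    scores = sorted(rng.gauss(0.0, 1.0) for _ in range(n))
    k = round(n * keep_fraction)  # number of survivors
    survivors = scores[n - k:]
    cutoff = survivors[0]  # lowest surviving score
    counts = [0] * n_bins
    for s in survivors:
        i = min(int((s - cutoff) / bin_width), n_bins - 1)
        counts[i] += 1
    return cutoff, counts


if __name__ == "__main__":
    for band in (0.25, 0.5, 1.0):
        share = share_near_cutoff(0.02, band)
        print(f"within {band:.2f} SD of the cutoff: {share:.1%} of survivors")

    width = 0.25
    cutoff, counts = simulate_truncation(bin_width=width)
    print(f"simulated cutoff: {cutoff:.2f} SD above the population mean")
    for i, c in enumerate(counts):
        suffix = " and up" if i == len(counts) - 1 else ""
        # one '#' per 20 survivors in this bin
        print(f"+{i * width:.2f} SD{suffix:<7} {'#' * (c // 20)}")
```

Under these assumptions, roughly three quarters of the survivors sit within half a standard deviation of the cutoff and more than 90% within one, and the histogram thins out quickly to the right: a crowd just over the bar and a thin tail of stars.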

If you manage a corporation and hire only candidates with golden resumes, you have a truncated Gaussian distribution of talent. If your evaluation process forces those people into a Gaussian distribution, Bill James says you are doing it totally wrong. Another common mistake is that managers think there is something magical about "major league" talent: that some guys have it (Tom Wolfe's "right stuff") and some do not. They then mislabel players who could help them win baseball games as lacking it, merely because of the circumstances of where those players have happened to be employed up until now. Organizations that hire top talent and pay high salaries have far more options than they generally assume. Nearly every single person working for your company is easily replaceable.

There is a story, possibly apocryphal, about Benoit Mandelbrot and his early preoccupation with financial market data. A questioner thought finance was a fuzzy science and that hard scientific data ought to be far more attractive to someone of Mandelbrot's scientific temperament. Mandelbrot explained that the great feature of financial data was that there was so much of it, which made it endlessly fascinating. Many statisticians have a similar fondness for baseball statistics: they are reliably recorded, unambiguously defined, and, again, abundant. Many subtle statistical results are best explained in the context of baseball, and there may be undiscovered statistical theorems sitting in the archives waiting to be extracted by clever statisticians. The Wikipedia page on Stein's paradox (first published by Charles Stein in 1956) cites a well-known (well-known to baseball statisticians, anyway) article by Bradley Efron and Carl Morris in the May 1977 issue of Scientific American that uses baseball statistics to illustrate Stein's paradox.
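For readers who have not met Stein's paradox, here is a minimal Python sketch of the positive-part James-Stein estimator in its "shrink toward the grand mean" form. It assumes each player's observed average is an independent, roughly normal estimate of his true ability with a common, known sampling variance; that is a simplification, and the function name and the numbers in the usage example are hypothetical, not data from the Scientific American article.

```python
from statistics import fmean


def james_stein(observed: list[float], sigma2: float) -> list[float]:
    """Positive-part James-Stein estimate of k >= 4 means, shrinking each
    observation toward the grand mean.

    observed: one noisy estimate per unit (e.g. each player's early batting average)
    sigma2:   the common, known sampling variance of each observation
    """
    k = len(observed)
    if k < 4:
        raise ValueError("shrinking toward the grand mean needs at least 4 means")
    grand_mean = fmean(observed)
    spread = sum((y - grand_mean) ** 2 for y in observed)
    # Shrinkage factor c = 1 - (k - 3) * sigma2 / spread, clipped at 0 so that
    # no estimate is pushed past the grand mean to the other side.
    c = 1.0 - (k - 3) * sigma2 / spread if spread > 0 else 0.0
    c = max(0.0, c)
    return [grand_mean + c * (y - grand_mean) for y in observed]


if __name__ == "__main__":
    # Hypothetical early-season averages for five players, 45 at-bats each.
    averages = [0.400, 0.350, 0.300, 0.250, 0.200]
    at_bats = 45
    p = fmean(averages)
    sigma2 = p * (1 - p) / at_bats  # binomial variance of an average, using the pooled rate
    for raw, shrunk in zip(averages, james_stein(averages, sigma2)):
        print(f"{raw:.3f} -> {shrunk:.3f}")
```

Each raw average is pulled toward the group mean by the same factor (about 0.63 in this toy example). The paradox is that, under the normal-model assumptions, these shrunk estimates have lower expected total squared error than the raw averages, even though each player's raw average, taken alone, looks like the natural estimate of his own ability.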

When this article was nearly finished, I stumbled upon this "news" in the Sports section of the New York Times:

Sniffing .300, Hitters Hunker Down on Last Chances. (The article presents research by two economists at the University of Pennsylvania's Wharton School. The academic paper is here.)

All of the preceding should interest anybody who cares about human achievement, psychometry, and baseball statistics. My own interest is narrower, and the lesson I personally draw is a hybrid of the lessons in this series. I have an ambitious scope for the company I am building. Ten thousand hours is close to the limit I am setting for myself: the point at which I will write off these lessons and losses (if losses they be) and rejoin the American corporate job market.

