Comparing the Grading Services: Paul Richards' Coin World Article

What did you folks think of Paul Richards' statistical analysis in the November 17th Coin World, of Coin World's "blind test" of 11 coin submissions to all the major services six months ago?
For those who don't know/remember, Coin World secretly submitted 11 coins to all of the major grading services in sequence, cracking them out and submitting them raw. They published the results, which were interesting for the patterns of consistency/inconsistency and variability that emerged among the services. The effort was tantalizing but, of course, was inconclusive because of the small number of coins involved.
Richards is a statistician, and applied an analysis of variance approach to the data Coin World secured. For purposes of comparison, Richards also converted the Sheldon scale (whose 70 points don't actually reflect 70 exact increments) into a 35-point scale for his study (e.g. equating the VF35-VF40 jump to the MS63-MS64 jump). By these means, he demonstrated the possibility that there could be such a thing as a "true" (i.e. consensus, or "mean") grade for a particular coin, from which a particular grading company's grade would vary in a fairly predictable way. He also was able to statistically demonstrate the consistency of a given company's grading.
On the basis of the small Coin World sample, for example, Accugrade actually proved to be very consistent in their grading (at least of raw coins anonymously submitted), but on average 0.2 grading points "looser" than the consensus ("mean") grade for the coin. NTC was measurably less consistent, but also much looser, by 1.4 grading points. SEGS proved conservative in their grading, but quite consistent. Not surprising to the folks here, PCGS was the most conservative, grading coins on average a little more than one grading point below the average grade. More surprising, though, was the finding that PCGS was the least CONSISTENT in their grading as compared with the other companies.
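For anyone who wants to see the two ideas (how "loose" a service is versus how consistent it is) side by side, here is a rough sketch in Python. It is not Richards' analysis of variance, just a simpler descriptive calculation, and the service names and grades are made up for illustration; they are NOT the Coin World results. The function name `summarize` is also just something chosen for this example.

```python
from statistics import mean, stdev

# Hypothetical grades on a compressed scale (NOT the Coin World data).
# Each list holds one service's grades for the same five coins, in order.
grades = {
    "Service A": [20, 25, 28, 30, 31],
    "Service B": [21, 27, 29, 32, 32],
    "Service C": [19, 24, 28, 28, 31],
}

def summarize(grades):
    """Return each service's average deviation and spread around the consensus grade."""
    services = list(grades)
    n_coins = len(next(iter(grades.values())))
    # Consensus ("mean") grade for each coin across all services.
    consensus = [mean(grades[s][i] for s in services) for i in range(n_coins)]
    summary = {}
    for s in services:
        deviations = [grades[s][i] - consensus[i] for i in range(n_coins)]
        summary[s] = {
            "bias": mean(deviations),     # above 0 = looser, below 0 = more conservative
            "spread": stdev(deviations),  # smaller = more consistent
        }
    return summary

for service, stats in summarize(grades).items():
    print(f"{service}: bias {stats['bias']:+.2f}, spread {stats['spread']:.2f}")
```

Note that because the consensus here is just the average of the services themselves, the biases always sum to zero; a service can only look "loose" relative to the others in the test, not relative to some outside standard.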
Again, the samples are small. If the stats held true with larger numbers, however, it would raise the possibility (among others) that PCGS has graders who routinely differ substantially from one another, and that your grade is more a luck of the draw than we'd hoped. While PCGS coins would be justified to command more $$$ in the market because you are demonstrably less likely to have an overgraded coin in a PCGS holder than any other, such a finding would justifiably raise concerns about the accuracy of the actual grade.
There was a thread here a week or so ago, in which one of our number proposed that collectors establish a mechanism to routinely submit coins to the services under the conditions of a sound research design, for gathering objective data and monitoring, and to get accurate information back to the numismatic community.
Such detailed analysis as Richards offered in his article would make such data massively more useful and informative.
What do you folks think?
Comments
The one problem I see is with ACG. From what I've read on the Boards, no one has suggested that Alan Hager can't grade, but that he deliberately overgrades (or, as was indicated in the ANA testimony, undergrades when his grading fee will be paid in coins from that submission - actually that suggests that he knows how to grade very well).
In any event, I presume that ACG usually only receives submissions from known submitters and that when a submission comes in from an unknown source, it might be graded accurately in case it is some sort of consumer test. That would really skew the results for ACG and might actually give them credibility in the marketplace!
Check out the Southern Gold Society
The Ludlow Brilliant Collection (1938-64)
I am an engineer with an MSEE degree, specializing in communications theory. Therefore, I have had a lot of probability and statistics education. I also use some of this in my current job. Although I use probability and statistics, I also use something called COMMON SENSE! Common sense is not so common anymore. The guy who published this article was obviously working in a vacuum and did not seek the advice of experts in the field.
Edited to Add: There may be another reason for these results that I just thought of. This guy could have been somehow biased by the companies that were more consistent.
Check out a Vanguard Roth IRA.
Robert
The point was a demo of what MIGHT be accomplished with a careful study undertaken with an ADEQUATE sample size. This is what RYK was exploring with us (Sorry, RYK. I was too lazy to find and link your thread).
Everybody take a deep breath.
I wasn't asking about what you all thought of the prospect that Accugrade wasn't crap. I was asking what you thought of taking a systematic approach to the grading services to get good data and sample sizes, and what you thought statistical analyses of various kinds with such data might offer the hobby.
Here's a warning parable for coin collectors...
<< <i>What do I think? I think there are a lot of dumbasses in the world with big mouths!
The guy who published this article was obviously working in a vacuum and did not seek the advice of experts in the field.
Edited to Add: There may be another reason for these results that I just thought of. This guy could have been somehow biased by the companies that were more consistent. >>
Dumbasses? what was dumbassed about that article/analysis? why should he seek the "advice of experts" when he's dealing with the data that was produced by the "study."
i thought it was a pretty good read - including the reduction of the 70 point grading system to the effective 32 points that are actually used by graders. he took the data available, actually applied some scientific "common sense" and came up with reasonable results taking into account his admitted problem with the small sample.
he pretty much did as much as i think one could with such a ridiculously small sample.
it seems to me that his results are purely scientific and not based on any biases whatsoever.
z
As for the reduction of the 70 point system to the 32-35 grades we actually use, I think that assumption should be incorporated into the submission samples, with a minimum of 20 "approximate" grade submissions per numerical grade.
I'm sure that with the experienced members we have on this forum, and their grading abilities, this little project would be simple enough to pull off, though it would require a significant amount of time to gather all the results.
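To put rough numbers on that, here is a quick back-of-the-envelope sketch. The 20 coins per grade and the 32 to 35 grades come from the suggestion above; the count of four services is purely an assumption for illustration, as is the function name `submissions_needed`.

```python
def submissions_needed(n_grades, coins_per_grade=20, n_services=1):
    """Return (number of coins, total submissions) for the proposed design."""
    coins = n_grades * coins_per_grade
    # Every coin is sent once to every service.
    return coins, coins * n_services

for n_grades in (32, 35):
    # n_services=4 is an assumed figure, not one taken from the thread.
    coins, total = submissions_needed(n_grades, n_services=4)
    print(f"{n_grades} grades: {coins} coins, {total} submissions")
```

Even at the low end that is several hundred coins, each cracked out and resubmitted once per service, which is where the "significant amount of time" comes in.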
Mojo
-Jim Morrison-
Mr. Mojorizn
my blog:www.numistories.com
<< <i>This WAS a dumbass because he obviously did his study in a vacuum without the advice of experts. His hypothesis that there are really not 70 unique grades is probably the only thing that is accurate. I think his scientific method was flawed, so the results are meaningless... >>
apparently the word "dumbass" holds a special allure for you, so we'll just have to live with it.
the problem, "ddude," is that the author of the most recent article had nothing to do with the original study - he simply used the data that was generated and did his own analysis. Thus it was not "his scientific method" that was flawed, or that you need to be taking issue with; his analysis is in fact far more satisfying/interesting than the study that initiated it.
I'm sure no one would have a problem with you funding a blind test with a big enough sample to satisfy you/everyone, but until that happens we will look forward to more "dumbass" threads....
z
This back and forth grading separated a lot of coin collectors from their money, in my opinion.
<< <i>OK you Malleable Knickelhead, please tell us why you dug up this old thread. >>
Agreed. The thread is ten years old.....
...let's just please leave the politicians out of it all!
<< <i>In my opinion, Accugrade was very clever back in the day in sometimes grading very conservative then buying the coins back from the customers at very cheap prices then regrading them very high to then sell them at very high prices.
This back and forth grading separated a lot of coin collectors from their money, in my opinion. >>
Careful Oreville. You might get sued. The last lawsuit around here was touch and go for a while.