
Thoughts on Automated Coin Grading – MS66, or MS67, or AU?


Comments

  • cameonut2011 Posts: 10,181 ✭✭✭✭✭

    @rmpsrpms said:
    However, if there were a service that did technical grading by computer, don't you think it would be competitive? I really don't know.

    I wouldn't publicly disclose it if it were me. I would have a computer do the initial grading, then a human finalizer, and use vague language about the grading process (e.g., "every coin is graded by at least two graders" - nothing requires that they be human).

  • northcoin Posts: 4,987 ✭✭✭✭✭
    edited July 17, 2018 9:37PM

    @RogerB said:
    A thought - Based on several commercial systems I've seen within the past decade, everything, every nuance and subtlety collectors mention about factual descriptions of a coin, can be defined within an automated identification and sorting system. (This can also include tarnish patterns and colors.)

    While errors still occur in such systems, they are nearly always attributable to causes outside the system - usually from humans doing something inadvertent or intentionally incorrect. (This would occur if proof ASEs were being graded and someone added a Morgan dollar to the line.)

    As the Colonel and others mention, cost effectiveness of this would be a deciding factor. Hucksters can push only a certain quantity of slabbed ASEs per year. Value might be achieved if any coin could be examined and "graded" but the software development cost for circulated (and easily human graded) coins might prevent reaching a profit point.

    Nikon has advanced photo parameters to the point that the chip in a camera can match a scene to a multitude of alternatives to determine the best settings. There's no reason a computer can't match a given coin against a database of millions of others to "grade" it as being most similar to one of them (a sketch of this matching idea follows at the end of this post).

    We are not talking about just "technical" photos as the camera is already performing as an artist based upon a myriad of programmed subjective factors. Similarly, the potential is there for computer grading of coins that leaves the technical limitations that we equate to "technical grading" in the dust.
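    A minimal sketch of the database-matching idea above, assuming feature vectors ("embeddings") have already been computed for the coin being graded and for a library of already-graded reference coins; the function and variable names are placeholders, not any TPG's actual method:

```python
import numpy as np

def nearest_neighbor_grade(query_vec, ref_vecs, ref_grades, k=5):
    """Grade a coin by finding the k most similar already-graded coins.

    query_vec  : 1-D feature vector for the coin being graded
    ref_vecs   : 2-D array, one row per reference coin in the database
    ref_grades : numeric grades aligned with the rows of ref_vecs
    """
    # Cosine similarity between the query coin and every reference coin
    q = query_vec / np.linalg.norm(query_vec)
    r = ref_vecs / np.linalg.norm(ref_vecs, axis=1, keepdims=True)
    sims = r @ q

    # Let the k closest matches vote; report their median grade
    top = np.argsort(sims)[::-1][:k]
    return float(np.median([ref_grades[i] for i in top]))
```

    How the feature vectors are produced, and how representative the reference library is, carries all of the difficulty; that is where the rest of the thread goes.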

  • ColonelJessup Posts: 6,442 ✭✭✭✭✭
    edited July 17, 2018 10:23PM

    @RogerB -

    What's your estimate on the number of images needed to grade the 68/69/70 line on some random year issue of ASE's?

    What would your estimate be of the number of TPG-grader-curated images of 1884-O S$1 obverses needed for computer grading at the MS63, 64, and 65 levels?

    Do these images exist at this time?

    Bear in mind MS, PL and DMPL libraries must be filled.

    How much of what our clever deus ex machina learns from the 84-O group will be useful in teaching the 81-S group?

    Bear in mind MS, MS/PL, and MS/DMPL grades must be generated.

    I don't know either, but I'm likely not getting funding without some idea.

    "People sleep peaceably in their beds at night only because rough men stand ready to do violence on their behalf." - Geo. Orwell
  • messydesk Posts: 20,475 ✭✭✭✭✭

    I'm not Roger, but I'll answer these in light of what I've already said about the subject.

    What's your estimate on the number of images needed to grade the 68/69/70 line on some random year issue of ASE's?

    Bulk ASEs in 68-70 are different from other MS coins, since they are graded purely based on surface defects (i.e., contact marks). Strike, luster, and eye appeal don't come into play for over 99% of these. Accordingly, I estimate it would only take tens of images to train software to be able to localize and then quantify defects on the surface. The simple "defect map" concept was used by Compugrade in 1991.
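    A minimal sketch of that defect-map idea, assuming a registered, evenly lit grayscale image and a mark-free reference image of the same design; the thresholds and per-grade defect budgets are made-up numbers for illustration, not Compugrade's or any TPG's actual criteria:

```python
import numpy as np
from scipy import ndimage

def defect_map_grade(coin_img, reference_img, diff_thresh=30,
                     budgets=((70, 0), (69, 2), (68, 10))):
    """Grade a bulk ASE purely from surface defects (no strike, luster, eye appeal).

    coin_img      : 8-bit grayscale image of the coin, registered to the reference
    reference_img : mark-free image of the same design under the same lighting
    budgets       : (grade, max defect count) pairs -- illustrative numbers only
    """
    # Pixels that differ strongly from the reference are candidate contact marks
    diff = np.abs(coin_img.astype(int) - reference_img.astype(int))
    defects = diff > diff_thresh

    # Build the defect map: label connected regions, ignore single-pixel noise
    labels, n = ndimage.label(defects)
    sizes = np.asarray(ndimage.sum(defects, labels, range(1, n + 1)))
    significant = int(np.sum(sizes > 4))

    for grade, budget in budgets:
        if significant <= budget:
            return grade, labels
    return None, labels   # falls into the "anything else" bucket
```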

    What would your estimate be of the number of TPG-grader-curated images of 1884-O S$1 obverses needed for computer grading at the MS63, 64, and 65 levels?

    Now we're talking about projecting the 4-dimensional space of strike, surfaces, luster, and eye appeal onto a single number, so we're going to need far more datasets. I'm going to avoid the word "image," since that implies a static, 2D view of a coin. A dataset representing a single coin must contain views that represent what a human grader looks at when grading a coin. Let's say tens of static views per side. Oh, and oversaturated shadows and highlights aren't acceptable for either training or grading. I would estimate that a few hundred coins would be sufficient to train software to distinguish 84-O in 63, 64, and 65. Again, the coins have to be selected to avoid bias toward a single grade.
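    A minimal sketch of one of the data-quality checks implied here (rejecting views with blown-out highlights or crushed shadows); the cutoff values are arbitrary placeholders:

```python
import numpy as np

def view_is_usable(view, lo=5, hi=250, max_clipped=0.01):
    """Reject a static view whose shadows or highlights are clipped.

    view        : 8-bit grayscale image of one view of one side of a coin
    max_clipped : largest tolerated fraction of clipped pixels (a guess)
    """
    n = view.size
    crushed = np.count_nonzero(view <= lo) / n   # shadow detail lost
    blown = np.count_nonzero(view >= hi) / n     # highlight detail lost
    return crushed <= max_clipped and blown <= max_clipped
```

    A per-coin dataset would then be the tens of views per side, as described above, that pass this kind of check.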

    Do these images exist at this time?

    Do these datasets exist at this time? No.

    Bear in mind MS, PL and DMPL libraries must be filled.
    How much of what our clever deus ex machina learns from the 84-O group will be useful in teaching the 81-S group?

    Some, but you still need the input data for 81-S. You're basically saving some computational time using "transfer learning" to train 81-S software after training 84-O software, but you need the data.
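    A minimal sketch of that transfer-learning step, assuming the 1884-O grader already exists as a saved PyTorch network; the file name and the three-grade output head are hypothetical:

```python
import torch
import torch.nn as nn
from torchvision import models

# Network already trained to separate 1884-O in 63/64/65 (hypothetical file name)
model = models.resnet18(weights=None)
model.fc = nn.Linear(model.fc.in_features, 3)
model.load_state_dict(torch.load("grader_1884o.pt"))

# Freeze the learned feature extractor; only a fresh head learns from 81-S data
for p in model.parameters():
    p.requires_grad = False
model.fc = nn.Linear(model.fc.in_features, 3)   # new 63/64/65 head for 1881-S

optimizer = torch.optim.Adam(model.fc.parameters(), lr=1e-4)
# ...the usual training loop over the (still required) 1881-S dataset goes here...
```

    The saving is in training time, not in data collection.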

    The common dates and grades don't illustrate the real problem, though. How about differentiating 1901 in 61-63, 1884-S in 58-62, 1896-O in 63-65, and others where the grade spread starts making a huge difference in money? You have to have a sufficient and representative sample chosen to avoid bias, but the populations are very low. You can use similarly appearing dates as proxies for each other, and you'd probably have to in order to avoid overfitting (having the software memorize a specific coin).

    To make matters worse with these low-pop coins, we all know that the same coin often shows up in population reports multiple times, and at multiple grade levels. If this happens, the ground truth grade for that coin has been corrupted, making it less usable when training the software.
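    A minimal sketch of one way to flag probable resubmissions before training, using a tiny average-hash comparison of catalog images; the bit-difference threshold is a guess, and flagged pairs would still need a human to decide which grade, if any, survives as ground truth:

```python
import numpy as np
from PIL import Image

def average_hash(path, size=16):
    """Tiny perceptual hash: downsample to size x size, threshold at the mean."""
    img = Image.open(path).convert("L").resize((size, size))
    pixels = np.asarray(img, dtype=float)
    return (pixels > pixels.mean()).flatten()

def likely_resubmission(path_a, path_b, max_differing_bits=20):
    """Flag two images as probably showing the same coin."""
    differing = np.count_nonzero(average_hash(path_a) != average_hash(path_b))
    return int(differing) <= max_differing_bits
```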

    Again, the challenges aren't so much the technology as they are the data.

    This topic would make for an interesting "staged debate" at a major show.

  • Insider2 Posts: 14,452 ✭✭✭✭✭

    @messydesk said:

    Bulk ASEs in 68-70 are different from other MS coins, since they are graded purely based on surface defects (i.e., contact marks). [...]

    Don't forget spots. They do not cause an impression INTO the surface but definitely affect the eye appeal/grade.

  • northcoin Posts: 4,987 ✭✭✭✭✭
    edited July 18, 2018 9:16AM

    @northcoin said:

    [...] We are not talking about just "technical" photos, as the camera is already performing as an artist based upon a myriad of programmed subjective factors. Similarly, the potential is there for computer grading of coins that leaves the technical limitations that we equate to "technical grading" in the dust.

    Actually, there is no reason that a computer-generated grade could not be twofold: one grade purely "technical" and a secondary grade "artistic." Heck, maybe even a third, "market acceptable." :)
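    A minimal sketch of what such a split grade could look like as output; the field names are made up for illustration:

```python
from dataclasses import dataclass

@dataclass
class ComputedGrade:
    technical: int           # from measurable strike/surface data, e.g. 65
    artistic: int            # eye-appeal score from learned "subjective" factors
    market_acceptable: bool  # would the market accept it at the technical grade?

grade = ComputedGrade(technical=65, artistic=64, market_acceptable=True)
```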

    As for source databases to feed the software, one could begin with the already existing TrueView library.

  • ColonelJessup Posts: 6,442 ✭✭✭✭✭

    @messydesk said:

    [...] Again, the challenges aren't so much the technology as they are the data.

    Excellent analysis.

    I could quibble over some minor points (1,000 ASEs would be better, but also trivially easy), but I think you've generally nailed it.

    In the non-Modern domain, it would be interesting to know how deep and broad the database of imagery was and how much tweaking was involved to "educate" the computer-grading system developed previously, but I'd keep that information very proprietary. It has potential for monetization.

    My dagger in the heart for this project is exemplified by the 1893-S $1.
    The combined populations in AU-55 and above:

    Grade: 55 - 58 - 60 - 61 - 62 - 63 - 64 - 65 - 66 - 67
    Pop:   67 - 30 - 02 - 16 - 13 - 13 - 13 - 07 - 01 - 01

    That is nearly 100 high-end circs (97) and 66 uncs (about 2.0 per year of TPG grading) that the database will not see.

    The CAC pop is 8 in 55 plus a sole 65: 9 coins out of 163. While CAC has only been around for the last 10 years, they've likely seen half or more of these coins. Not saying that's the ultimate test, but it's surely meaningful.

    Where/what are the inputs for an 1893-S at this level?
    Having made the first PCGS 1892-S MS67 and a couple of 65s, and having fondled the Eliasberg MS67 three or four times, I will claim sufficient knowledge to tell you they are not texturally the same, so there's a limited learning curve even if those more common ;) examples were available.

    "People sleep peaceably in their beds at night only because rough men stand ready to do violence on their behalf." - Geo. Orwell
  • messydesk Posts: 20,475 ✭✭✭✭✭
    edited July 18, 2018 10:28AM

    @Insider2 said:
    Don't forget spots. They do not cause an impression INTO the surface and definitely affect the eye appeal/grade.

    Spots would be learned by the software being trained, provided there were some coins with spots on them used in the training dataset. Distinguishing spots from schmutz is a different story.

  • Elcontador Posts: 7,720 ✭✭✭✭✭

    As others have said, I can see it for high-end moderns. But I've seen all sorts of, say, 1861-O halves in MS65 that I think are all correctly graded yet appear very different to the eye. Setting up an algorithm for this would, I think, be prohibitively expensive, if it could be done at all.

    "Vou invadir o Nordeste,
    "Seu cabra da peste,
    "Sou Mangueira......."
  • lkenefic Posts: 8,797 ✭✭✭✭✭

    IDK... in the end, I'm buying a coin. I don't care if a human from a TPG graded it, AI from outer space, or if ...oh my gawd... it's raw! If I don't agree with the grade at the price point given, I'll pass. Is automated grading somehow going to separate me from my money?

  • RogerB Posts: 8,852 ✭✭✭✭✭

    Initial suggestion is based on items that are purchased in bulk and sent to TPGs.

    In reply to ColonelJessup's interesting questions -
    The only potentially profitable approach, at least initially, would be applied to proof and unc ASE (American Silver Eagle) and AGE (American Gold Eagle) coins.

    Potential volume would be equal to the quantities of these items sent to TPGs by bulk purchasers. These are commodities with no collector value until they are slabbed.

    I do not know the volumes the major TPGs receive, but it is likely all of the bulk packaging shipments plus some strays. For 2017, suitable coins would total about 20 million sold by the Mint, with maybe 1/3 of that sent to TPGs - let's say 7 million. Only the TPGs know the real quantities.

    Automation is straightforward - the coins are examined in about 1 second each plus transport time and separated into MS/PR-70, 69, and anything else. 70s and 69s go to label printing and packaging lines, then fulfillment to submitters. All the coins are anonymous until they meet label and slab, at which point they receive their ID number and the data file is stored under that number. (Expansion to other coins would have to be done slowly and carefully.)
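    A minimal sketch of that flow, with the ~1-second grading exam left as a stub and all names hypothetical:

```python
import itertools

def run_line(coins, grade_coin, archive):
    """Route bulk ASEs/AGEs through the automated 70 / 69 / anything-else sort.

    coins      : iterable of per-coin image sets (coins are anonymous here)
    grade_coin : the ~1-second automated exam; returns 70, 69, or None
    archive    : dict of cert number -> stored data file
    """
    serials = itertools.count(1)
    rejects = []
    for image_set in coins:
        grade = grade_coin(image_set)
        if grade in (70, 69):
            cert = next(serials)   # ID assigned when the coin meets label and slab
            archive[cert] = {"grade": grade, "images": image_set}
        else:
            rejects.append(image_set)   # "anything else" handled separately
    return rejects
```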

    System cost to own depends to a large extent on the quality of expertise a TPG is willing to invest. Better quality requirements, software development, and error detection and management will produce a system of higher quality and lower long-term cost. (If they have a Harvard or Wharton MBA in charge, their initial cost will likely be lower and the long-term expense much higher. People from such programs are trained to think short-term and get out before something fails.)

    All of the above require data only the TPGs have, so we mortals cannot do more than present some ideas. :)
