U.S. Coin Forum

PCGS Cert data policy

I'm a data science-y guy, and the PCGS cert data appeals to me as a rich source of information. I'm wondering what issues I'd run into if I wanted to collect and analyze this dataset.

Over on one of the Bass Collection threads, someone (it appears manually) went through consecutive cert numbers looking for newly graded Bass coins. My thought was to programmatically do something similar.
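
A minimal sketch of walking consecutive cert numbers, in Python. It assumes a hypothetical `fetch(cert)` callable that returns a dict for a cert (or `None` if no record exists) and a hypothetical `is_match` test, such as checking a pedigree field. How `fetch` gets its data and the 2-second pause are assumptions, not anything PCGS documents.

```python
import time
from typing import Callable, Iterator, Optional


def walk_certs(
    start: int,
    count: int,
    fetch: Callable[[int], Optional[dict]],
    is_match: Callable[[dict], bool],
    min_interval: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Iterator[dict]:
    """Yield records for certs start .. start+count-1 that satisfy is_match.

    fetch is hypothetical: it returns a dict for a cert, or None if no record exists.
    """
    for i, cert in enumerate(range(start, start + count)):
        if i > 0:
            sleep(min_interval)  # fixed pause between lookups so the site is never hammered
        record = fetch(cert)
        if record is not None and is_match(record):
            yield record
```

Passing `fetch` and `sleep` in as parameters keeps the walking logic testable without touching the live site.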

I know that, broadly speaking, there is legal support for the kind of "web scraping" I'm describing. I looked for an FAQ on the PCGS websites that addressed policy related to this, but couldn't find anything.

I'm not planning to do anything commercial with the data, just poke around to look for interesting stuff. I love to find connections in datasets. I've also wondered if anyone has ever tried to run TrueView images through a machine learning process to match grades to images (it would be hard to turn that into a useful predictive process, since you always get a grade with a TV, but I'm interested from a theoretical standpoint).

So I don't want to do anything that would, for example, interfere with the cert webpage performance. And I don't want to step on any virtual PCGS toes. But I'd love to explore the cert data just out of personal interest.

If anyone can point me to guidelines I should abide by, or provide me with direct advice, I'd appreciate it. If anyone has any suggestions on what connections or details in the data I might look for, I'm happy to take those. And if in the end, there's a way to share what I find in a way that's consistent with any PCGS data policies (always linking to their source data maybe?), I'd love to do that if I can.

Any feedback appreciated,
Best,
Bill

Comments

  • scotty4449 Posts: 684 ✭✭✭✭✭

    This is old, but could be useful.

    https://github.com/brg8/pcgs

    You could try to email them and ask if they have an API they would grant you access to.

  • yosclimber Posts: 4,594 ✭✭✭✭✭
    edited August 7, 2022 10:18PM

    @waisaacs said:
    Over on one of the Bass Collection threads, someone (it appears manually) went through consecutive cert numbers looking for newly graded Bass coins. My thought was to programmatically do something similar.

    That sounds familiar! And yes, I did it by hand. It was not very hard. I could have tried doing it with code, but the number of coins was small. Incidentally, the hbcf.org page has a Captcha that triggers if I traverse the coins faster than a certain pace. So code is not guaranteed to be successful.

    A few years ago, I used web scraping techniques to download all the PCGS CoinFacts photos of seated half dimes. I had been doing it manually, but that was a slow process and I wanted to make sure I got all the years, because it was such an important photo source for attribution. This was back when all the TrueViews were shown on the index page, not just the top 3 plus Registry sets like it is now. I did one year at a time, and I ran my programs in the middle of the night, so I could be sure it would not impact other users.
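
    Restricting a batch job to a quiet overnight window, as described above, can be sketched with a simple clock check. The 1:00 to 5:00 window is an assumed example, and the check uses whatever time zone `now` is in, which is not necessarily the server's.

```python
from datetime import datetime, time as dtime


def in_quiet_window(now: datetime, start: dtime = dtime(1, 0), end: dtime = dtime(5, 0)) -> bool:
    """True if now's clock time is in [start, end); the window may wrap past midnight."""
    t = now.time()
    if start <= end:
        return start <= t < end
    # Wrapping window such as 23:00-04:00: late evening or early morning both count.
    return t >= start or t < end
```

    A batch driver could check this before starting each year's run and wait until it returns True.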

    I also did a similar thing for Heritage seated half dime photos.

    Since I have permission from both sites to use their photos in half dime attribution guides (citing them as the source on each photo), I figured it did not really matter how I downloaded them, as long as I did not negatively impact any users.

    Several people have tried to use computer vision (CV) to automate grading from photos, without much success. I have thought of using it for die pair attribution, but I have not gotten past simple stuff like cropping auction photos.
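
    As a rough sketch of the "simple stuff" of cropping, this finds the bounding box of pixels that differ from an assumed uniform light background. The grayscale image is held as a plain list of rows, a stand-in for whatever image library you use; the background value and tolerance are hypothetical. If a photo shows obverse and reverse side by side, this returns one box around both.

```python
from typing import List, Optional, Tuple

Image = List[List[int]]  # grayscale rows, values 0-255


def coin_bbox(img: Image, background: int = 255, tol: int = 20) -> Optional[Tuple[int, int, int, int]]:
    """Return (top, left, bottom, right) of non-background pixels; bottom/right are exclusive."""
    rows = [r for r, row in enumerate(img) if any(abs(p - background) > tol for p in row)]
    cols = [c for c in range(len(img[0])) if any(abs(row[c] - background) > tol for row in img)] if img else []
    if not rows or not cols:
        return None  # nothing but background
    return rows[0], cols[0], rows[-1] + 1, cols[-1] + 1


def crop(img: Image, box: Tuple[int, int, int, int]) -> Image:
    """Slice the image down to the box returned by coin_bbox."""
    top, left, bottom, right = box
    return [row[left:right] for row in img[top:bottom]]
```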

  • waisaacs Posts: 88 ✭✭

    @scotty4449 said:
    This is old, but could be useful.

    https://github.com/brg8/pcgs

    You could try to email them and ask if they have an API they would grant you access to.

    Thanks. I had searched for a PCGS API and ran across that link, but found nothing about an actual official API. And good idea; I will email the general info address just to see if I can get an authoritative response.

  • waisaacs Posts: 88 ✭✭

    @yosclimber said:

    That sounds familiar! And yes, I did it by hand. It was not very hard. I could have tried doing it with code, but the number of photos was small. Incidentally, the hbcf.org page has a Captcha that triggers if I traverse the coins faster than a certain pace. So code is not guaranteed to be successful.

    Thanks for sharing your experience. I didn't know whether I should have explicitly tagged you when I referred to it.

    I am worried about triggering rate-limiting measures or, worse, a fail2ban ban or similar IP block (which can sometimes affect more people than the intended target, e.g. everyone sharing the same IP address).
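
    One common way to back off politely is exponential backoff with random jitter. In this sketch, `fetch` is a hypothetical callable that returns `None` when the server signals throttling (for example an HTTP 429 or 503, or a Captcha page). The delays and retry count are assumptions, and the function gives up rather than retrying forever.

```python
import random
import time
from typing import Callable, Optional


def fetch_with_backoff(
    fetch: Callable[[], Optional[str]],
    max_tries: int = 5,
    base: float = 5.0,
    cap: float = 300.0,
    sleep: Callable[[float], None] = time.sleep,
    rng: Optional[random.Random] = None,
) -> Optional[str]:
    """Call fetch until it returns a result; None from fetch means "throttled, try later"."""
    rng = rng or random.Random()
    for attempt in range(max_tries):
        result = fetch()
        if result is not None:
            return result
        if attempt < max_tries - 1:
            # "Full jitter": wait a random time up to a ceiling that doubles each attempt.
            sleep(rng.uniform(0, min(cap, base * 2 ** attempt)))
    return None  # give up instead of continuing to hit a server that is pushing back
```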

    As I said, my intention is not to be disruptive in any way, so maybe there is a way to get permission to do what I want, which, beyond data exploration, I'm still figuring out.

  • yosclimber Posts: 4,594 ✭✭✭✭✭
    edited October 7, 2022 11:54AM

    The Bass Collection is a rather special case, too.
    All the coins are listed on the [edited:] hbrf.org site,
    and the PCGS Cert numbers are pretty much in the same order,
    although in blocks of 20 to 40 or so.
    So trying consecutive Cert numbers works fine.
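
    Given Bass cert numbers you have already confirmed, collapsing them into runs shows where the blocks are, so new lookups can stay close to known blocks. This is a sketch; `max_gap` is a hypothetical tolerance for small holes inside a block.

```python
from typing import Iterable, List, Tuple


def cert_blocks(certs: Iterable[int], max_gap: int = 1) -> List[Tuple[int, int]]:
    """Collapse cert numbers into (first, last) runs; a cert within max_gap of the previous one extends its run."""
    blocks: List[Tuple[int, int]] = []
    for c in sorted(set(certs)):
        if blocks and c - blocks[-1][1] <= max_gap:
            blocks[-1] = (blocks[-1][0], c)  # extend the current run
        else:
            blocks.append((c, c))  # start a new run
    return blocks
```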

    If you were trying to find all graded coins for a certain denomination and date,
    that is a much harder problem, because CoinFacts only identifies about 3 cert numbers,
    and you are not likely to find many consecutive Cert numbers for the same denom/date
    (unless it is a modern issue with bulk submissions).
    I would not try random Cert numbers to see if the PCGS Coin number matches.
    That is way too many queries per yielded result.
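
    The cost is easy to quantify: if each random lookup hits the target denomination and date with probability p, the wait for each hit is geometric with mean 1/p, so k hits take about k/p lookups. The hit rates used here are hypothetical.

```python
def expected_lookups(hits_wanted: int, hit_rate: float) -> float:
    """Mean number of random lookups to collect hits_wanted matches (each hit costs 1/p lookups on average)."""
    if not 0 < hit_rate <= 1:
        raise ValueError("hit_rate must be in (0, 1]")
    return hits_wanted / hit_rate
```

    With a hypothetical hit rate of 1 in 10,000, finding 50 coins would take about 500,000 lookups.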

    Also, the submission date is not disclosed, and PCGS considers that sensitive data.
    So you would not be able to look at things like time trends in submissions.

    So I am not sure what interesting questions could be explored using data that is available at low impact.

    In my work on half dimes, I create rosters for auction appearances of the top coins in each die pair / die state. Some of the coins appear more than once, so time effects are observable on those. Sometimes they change grade or grading company; that is natural.

    One productive project would be population data that does not double count multiple submissions of the same coin (or multiple auction appearances of the same coin). It requires matching new photos to old ones, so it is probably becoming feasible internally at PCGS.
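
    Once some matcher (photos, laser scans, or manual roster work) has decided which certs are the same physical coin, the deduplicated count itself is straightforward. This sketch runs a union-find over hypothetical `same_coin` pairs and counts each coin once, under the grade of its highest-numbered cert. Treating the highest cert number as the most recent grading is an assumption.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def dedup_population(certs: Dict[int, str], same_coin: Iterable[Tuple[int, int]]) -> Counter:
    """Count distinct coins per grade; certs maps cert -> grade, same_coin pairs must use certs in that map."""
    parent = {c: c for c in certs}

    def find(c: int) -> int:
        while parent[c] != c:
            parent[c] = parent[parent[c]]  # path halving keeps the trees shallow
            c = parent[c]
        return c

    for a, b in same_coin:
        parent[find(a)] = find(b)  # merge the two groups

    # For each group (one physical coin), keep its highest cert number, assumed to be the latest grading.
    latest: Dict[int, int] = {}
    for c in certs:
        root = find(c)
        latest[root] = max(latest.get(root, c), c)
    return Counter(certs[c] for c in latest.values())
```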

    Actually, PCGS Secure Plus does this matching, but it uses laser scan data that is beyond what is in a photo.
    https://www.pcgs.com/news/pcgs-announces-pcgs-secure-plustrade-service-for-increased-consumer-protection
    So it is an extra step in the certification process, and it is optional, so it will not detect all resubmissions.

    Automated photo matching is much easier than trying to do grading from photos, but it is still difficult. I can do it for my half dime rosters, but I do the matching by hand. And it's fair to say that matching is difficult on high-grade untoned proofs.
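
    For the automated side, a much simpler stand-in than a real coin matcher is a perceptual "difference hash" compared by Hamming distance. This sketch assumes both photos are already cropped, aligned, converted to grayscale, and resized to the same small grid (for example 8 rows by 9 columns, giving 64 bits); the 10-bit threshold is arbitrary. It would likely struggle on exactly the hard cases, such as high-grade untoned proofs.

```python
from typing import List

Gray = List[List[int]]  # small, pre-resized grayscale grid


def dhash(gray: Gray) -> int:
    """Difference hash: one bit per horizontally adjacent pixel pair, set when the left pixel is brighter."""
    bits = 0
    for row in gray:
        for left, right in zip(row, row[1:]):
            bits = (bits << 1) | (1 if left > right else 0)
    return bits


def hamming(a: int, b: int) -> int:
    """Number of differing bits between two hashes."""
    return bin(a ^ b).count("1")


def probably_same(a: Gray, b: Gray, max_bits: int = 10) -> bool:
    """Treat two photos as the same coin if their hashes differ in at most max_bits bits."""
    return hamming(dhash(a), dhash(b)) <= max_bits
```

    Because the hash records only brightness gradients, a uniformly brighter or darker copy of the same photo hashes identically.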

  • MetroD Posts: 1,935 ✭✭✭✭✭

    @waisaacs said:
    Thanks. I had searched for a PCGS API and ran across that link, but found nothing about an actual official API. And good idea; I will email the general info address just to see if I can get an authoritative response.

    You have probably already seen it, but there is a section on "use of website content" in the "terms of use".

    It includes a legal contact should you desire to request permission to use any "materials".

  • waisaacs Posts: 88 ✭✭

    I received a response from PCGS customer service, and was pointed to https://www.pcgs.com/publicapi

    Apparently the API was released fairly recently, so I don't know whether it is stable, whether they will continue to support it, etc. But I was pleasantly surprised to get a quick and positive response to my question.

    I was able to get through the process of signing up, generating an access token, and successfully making a test API call.

    I'll note that there are (reasonable) licensing and attribution things you have to agree to, and that there is a 1000 API calls/day default limit. The latter is going to prevent me from doing the kind of collection and analysis that I had in mind.
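
    Staying under a daily call limit can be handled client-side with a small counter. In this sketch the 1000-call default comes from the post above, but resetting at local midnight is an assumption, as is sending the access token in a `Bearer` header. `base_url` and `path` are placeholders, not documented PCGS endpoints; check the API documentation for the real ones.

```python
import datetime as dt
import urllib.request
from typing import Callable, Optional


class DailyQuota:
    """Client-side counter that refuses calls once `limit` is reached for the current day."""

    def __init__(self, limit: int = 1000, today: Callable[[], dt.date] = dt.date.today) -> None:
        self.limit = limit
        self.today = today
        self.day: Optional[dt.date] = None
        self.used = 0

    def take(self) -> bool:
        """Reserve one call; return False if today's budget is spent."""
        d = self.today()
        if d != self.day:  # assumption: the budget resets at local midnight
            self.day, self.used = d, 0
        if self.used >= self.limit:
            return False
        self.used += 1
        return True


def build_request(base_url: str, path: str, token: str) -> urllib.request.Request:
    """Build (but do not send) a GET request; the Bearer header format is an assumption."""
    url = f"{base_url.rstrip('/')}/{path.lstrip('/')}"
    return urllib.request.Request(url, headers={"Authorization": f"Bearer {token}"})
```

    In use, you would call `quota.take()` before each `urllib.request.urlopen(...)` and stop for the day when it returns False.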

    But I'll probably still develop a little bit of code and play with it. If it gets mature enough I'll share it on GitHub or similar.

    Tagging @yosclimber to make sure you see it.

  • yosclimber Posts: 4,594 ✭✭✭✭✭

    Wow, thanks!
    The "Find with PCGS number and grade" will be helpful to me.
    And the 1000 queries/day is not a problem for my usage.

  • waisaacs Posts: 88 ✭✭

    @yosclimber said:
    Wow, thanks!
    The "Find with PCGS number and grade" will be helpful to me.
    And the 1000 queries/day is not a problem for my usage.

    I've still only glanced at the specs. Let me know what you find useful, here or by PM.
    I'm also curious whether that number+grade call returns all the matching cert numbers, or just a single (maybe random?) example.

  • Bikergeek Posts: 206 ✭✭✭✭

    This is an interesting thread! I'm a former developer who long ago transitioned to less technical roles, but I still like to find easy ways to access information and I've learned some things today! One question for @yosclimber is about "hbcf.org," which piqued my interest as I'd not heard of it. But maybe there's a typo in there, as it resolves to a non-coin site? If you could clarify, it would be appreciated! (I'm not nitpicking, by the way - I know you are a half dime expert and I'm working on the capped bust half dime set so when you talk, I listen!)

    New website: Groovycoins.com
    Capped Bust Half Dime registry set: Bikergeek CBHD LM Set

  • yosclimber Posts: 4,594 ✭✭✭✭✭
    edited October 7, 2022 11:57AM

    Sorry about the typo (kind of embarrassing; I should have checked it). The actual URL is:
    https://hbrf.org/coin-collection/

    And it's always nice to find another half dime fan!
