PCGS Cert data policy
I'm a data science-y guy, and the PCGS cert data appeals to me as a rich source of information. I'm wondering what issues I'd run into if I wanted to collect and analyze this dataset.
Over on one of the Bass Collection threads, someone went through consecutive cert numbers (manually, it appears) looking for newly graded Bass coins. My thought was to do something similar programmatically.
I know that, broadly speaking, there is legal support for the kind of "web scraping" I'm describing. I looked for an FAQ on the PCGS websites that addresses policy on this, but couldn't find anything.
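To make the idea concrete, here is a rough sketch of the loop I have in mind. Everything specific in it is an assumption on my part: `scan_certs`, the `fetch` callable, and the `pedigree` field are hypothetical names, and the sample data is made up. The actual lookup is passed in as a function, so nothing here touches any PCGS page.

```python
from typing import Callable, Iterator, Optional, Tuple


def scan_certs(
    start: int,
    count: int,
    fetch: Callable[[int], Optional[dict]],
    keyword: str,
) -> Iterator[Tuple[int, dict]]:
    """Walk consecutive cert numbers and yield records whose
    (hypothetical) 'pedigree' field mentions the keyword."""
    for cert in range(start, start + count):
        record = fetch(cert)  # caller decides how a cert is looked up
        if record is None:
            continue  # no record for this cert number
        if keyword.lower() in record.get("pedigree", "").lower():
            yield cert, record


# Made-up sample records standing in for real lookups.
SAMPLE = {
    1001: {"pedigree": "Bass"},
    1003: {"pedigree": "Harry W. Bass Jr."},
    1004: {"pedigree": ""},
}

hits = list(scan_certs(1000, 6, SAMPLE.get, "bass"))
print([cert for cert, _ in hits])
```

Keeping the lookup separate from the loop means the same code could run against saved pages, a sanctioned API if one is available, or test data like the above.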
I'm not planning to do anything commercial with the data, just poke around to look for interesting stuff. I love to find connections in datasets. I've also wondered if anyone has ever tried running TrueView images through a machine learning model to match grades to images (it would be hard to turn that into a useful predictive process, since a TrueView always comes with a grade, but I'm interested from a theoretical standpoint).
So I don't want to do anything that would, for example, interfere with the cert webpage performance. And I don't want to step on any virtual PCGS toes. But I'd love to explore the cert data just out of personal interest.
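For what it's worth, this is roughly how I would try to stay polite: check a site's robots.txt rules before requesting a path, and enforce a minimum gap between requests. It is only a sketch under my own assumptions. The robots.txt lines, the `example.com` URLs, the user agent string, and the two-second interval are placeholders, not anything PCGS has published. `RobotFileParser` is from the Python standard library; `allowed_by_robots` and `Throttle` are names I made up.

```python
import time
from urllib.robotparser import RobotFileParser


def allowed_by_robots(robots_lines, user_agent, url):
    """Return True if the given robots.txt lines permit user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_lines)
    return parser.can_fetch(user_agent, url)


class Throttle:
    """Enforce a minimum number of seconds between consecutive requests."""

    def __init__(self, min_interval, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval
        self.clock = clock
        self.sleep = sleep
        self._last = None  # time of the previous request, if any

    def wait(self):
        now = self.clock()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                self.sleep(remaining)  # pause until the interval has passed
        self._last = self.clock()


# Placeholder rules, not taken from any real site.
RULES = ["User-agent: *", "Disallow: /private/"]
print(allowed_by_robots(RULES, "hobby-bot", "https://example.com/private/x"))
print(allowed_by_robots(RULES, "hobby-bot", "https://example.com/public/y"))
```

In a real run I would call `Throttle(2.0).wait()` before each request and skip any URL the robots rules disallow.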
If anyone can point me to guidelines I should abide by, or provide me with direct advice, I'd appreciate it. If anyone has any suggestions on what connections or details in the data I might look for, I'm happy to take those. And if in the end, there's a way to share what I find in a way that's consistent with any PCGS data policies (always linking to their source data maybe?), I'd love to do that if I can.
Any feedback appreciated,