
Using ChatGPT to analyze auction catalogs Part 1

Boosibri Posts: 12,328 ✭✭✭✭✭
edited May 1, 2025 5:15AM in U.S. Coin Forum

As some of you may know, I really enjoy tracing the provenances of my collection and building updated censuses for the rarest pieces. While my collection focuses primarily on Latin America, I figure this topic is of interest to all.

I set out with some free time to see if I could train ChatGPT (using the premium tier and the deep research function) to automatically scan available auction catalogs, identify specific coins in each sale that I was looking for, and then match the images against my collection to confirm provenances. A HUGE undertaking, and one I expect to fail for now.

To start, I had to break the training into two parts. Part 1: accurately scan the auction catalogs and correctly identify the right coins. Part 2: leverage the AI to match images of varying quality by identifying unique characteristics.

Part 1: 90% Success.
I started with a coin that I own and have been tracing the provenance of, a 1914 El Salvador proof peso. These came originally in sets and were minted in Philadelphia. I asked ChatGPT to “Search archive.org for listings of a 1914 proof peso. It may be listed either as a coin or as a set. Double-check all examples found to ensure accuracy and report them in a table with the auctioneer, the date, the lot number, and the description.”

The result I received contained some accurate listings, but also false positives. I asked ChatGPT to review the false positives and suggest causes for the error.

It was doing two things that needed to be corrected:
1. Over-reliance on OCR to scan the catalogs for the search terms
2. Use of sources outside the archive.org catalogs that mention a piece being in a sale but are themselves in error.

I updated the prompt to:

•   Open each INSERT COMPANY auction catalog individually (available on Archive.org and Newman Numismatic Portal).
•   Manually read each catalog — no reliance on keyword search — focusing carefully on:
•   Latin America sections
•   El Salvador, year 1914
•   Proof pesos listed individually or as part of a set
•   If the 1914 El Salvador proof peso (or set) is found:
•   Record auction sale name/date
•   Lot number and page
•   Exact text of the catalog description
•   If not found, clearly mark as “No listing”.
•   Build a full structured table.
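
As an aside, the same refined prompt can be expressed as a reusable template if you want to run it outside the ChatGPT web interface. A minimal sketch, assuming the OpenAI Python SDK; the model name is an assumption, and the company is a parameter just like the INSERT COMPANY placeholder above:

from openai import OpenAI  # pip install openai; assumes OPENAI_API_KEY is set

client = OpenAI()

PROMPT_TEMPLATE = """\
Open each {company} auction catalog individually (available on Archive.org
and Newman Numismatic Portal). Manually read each catalog, with no reliance
on keyword search, focusing carefully on Latin America sections and on
El Salvador, year 1914: proof pesos listed individually or as part of a set.
If the 1914 El Salvador proof peso (or set) is found, record the auction
sale name/date, the lot number and page, and the exact text of the catalog
description. If not found, clearly mark it as "No listing". Build a full
structured table."""

def search_catalogs(company: str) -> str:
    # Send the filled-in template as a single user message.
    response = client.chat.completions.create(
        model="gpt-4o",  # assumption: substitute whatever model you use
        messages=[{"role": "user", "content": PROMPT_TEMPLATE.format(company=company)}],
    )
    return response.choices[0].message.content

print(search_catalogs("Superior"))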

I asked ChatGPT to update its search method to rely on manually scanning the catalogs, as well as to ignore outside references. The quality improved. I asked ChatGPT to scan the 1975 Superior ANA sale (which contains a specimen). No positive result.

I asked ChatGPT to diagnose the issue.

Issues:
1. Without defining what “manual” scanning meant, the model was assuming where in the catalog the Latin American lots were. The Superior ANA sale was not laid out in a logical flow and bounced around different genres, so ChatGPT searched what it thought was the Latin America section but missed another one.

I asked it to diagnose and determine options to correct the error:

How I Define and Identify Latin American Lots (During Manual Review):
1. By Country Mentioned in the Lot Header or Description
I will flag any lot that references:
• Mexico
• Central America: El Salvador, Guatemala, Honduras, Nicaragua, Costa Rica, Panama
• South America: Colombia, Venezuela, Peru, Bolivia, Chile, Argentina, Paraguay, Uruguay, Brazil
• Caribbean/West Indies: Cuba, Dominican Republic, Haiti, Puerto Rico, Jamaica, etc.
2. By Manual Reading of Every Lot in the Catalog
I’m not relying on section headers or country groupings alone.
Many older catalogs (like Superior 1975 ANA) scatter Latin American coins throughout, especially Mexico and Caribbean types, mixed with U.S. or World.
3. By Not Relying on Country Dividers or OCR
I will turn every catalog page and read every lot number from the beginning of the coin section to the end of the sale.
This avoids any assumptions about where country sections begin or end.
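
That country list also translates directly into a deterministic check you can run over catalog text yourself, as a sanity filter alongside the model's manual read. A minimal sketch (the sample lot text is hypothetical):

# Countries the model is told to treat as "Latin America".
LATIN_AMERICA = {
    "Mexico",
    # Central America
    "El Salvador", "Guatemala", "Honduras", "Nicaragua", "Costa Rica", "Panama",
    # South America
    "Colombia", "Venezuela", "Peru", "Bolivia", "Chile", "Argentina",
    "Paraguay", "Uruguay", "Brazil",
    # Caribbean / West Indies
    "Cuba", "Dominican Republic", "Haiti", "Puerto Rico", "Jamaica",
}

def is_latin_american(lot_text: str) -> bool:
    # Flag any lot whose header or description mentions a listed country.
    text = lot_text.lower()
    return any(country.lower() in text for country in LATIN_AMERICA)

print(is_latin_american("Lot 1234. EL SALVADOR. 1914 Peso, Brilliant Proof."))  # True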

I updated the prompt and changed the focus for the next test:

Open the 1975 Superior ANA auction catalog (available on Archive.org and Newman Numismatic Portal).
• Manually read the catalog — no reliance on keyword search — focusing carefully on:
• Define Latin America sections (per the previously defined listing) based on the lot index in the catalog. If no lot index exists, manually read each lot in the catalog.
• Identify all dollar-sized proof coins
• If a dollar-sized proof coin is found:
• Record auction sale name/date
• Lot number and page
• Exact text of the catalog description
• If not found, clearly mark as “No listing”.
• Build a full structured table

The output seems accurate, aside from still failing to fully grasp the dimensions of a dollar (26mm–40mm was the definition).
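
Since the model kept stumbling over what counts as dollar-sized, it is worth noting that this is the one rule that is trivial to pin down deterministically once a diameter is known. A minimal sketch of the 26mm–40mm definition used in the prompt:

def is_dollar_sized(diameter_mm: float) -> bool:
    # "Dollar-sized" per the definition used in the prompt: 26mm to 40mm.
    return 26.0 <= diameter_mm <= 40.0

print(is_dollar_sized(37.0))  # True  (a typical crown/peso diameter)
print(is_dollar_sized(19.0))  # False (a minor coin)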

More on image recognition later

Comments

  • jshaulis Posts: 866 ✭✭✭✭

    Following and want to know more.

    Successful transactions with forum members commoncents05, dmarks, Coinscratch, Bullsitter, DCW, TwoSides2aCoin, Namvet69 (facilitated for 3rd party), Tetromibi, ProfLizMay, MASSU2, MWallace, Bruce7789, Twobitcollector, 78saen, U1chicago, Rob41281

  • calgolddiver Posts: 1,538 ✭✭✭✭✭

    great progress and iterative learning !!!!

    Top 20 Type Set 1792 to present

    Top 10 Cal Fractional Type Set

    successful BST with Ankurj, BigAl, Bullsitter, CommemKing, DCW(7), Downtown1974, Elmerfusterpuck, Joelewis, Mach1ne, Minuteman810430, Modcrewman, Nankraut, Nederveit2, Philographer(5), Realgator, Silverpop, SurfinxHI, TomB and Yorkshireman(3)

  • MsMorrisine Posts: 34,463 ✭✭✭✭✭

    did you fact check it?

    Current maintainer of Stone's Master List of Favorite Websites // My BST transactions
  • scubafuel Posts: 1,920 ✭✭✭✭✭

    I’ll be curious to know how it goes for you, in the end. When I tried to have GPT parse through auction catalogues about 6 months ago, my results were summed up as “close, but I end up doing most of the work myself”

  • Boosibri Posts: 12,328 ✭✭✭✭✭

    @scubafuel said:
    I’ll be curious to know how it goes for you, in the end. When I tried to have GPT parse through auction catalogues about 6 months ago, my results were summed up as “close, but I end up doing most of the work myself”

    The paid ChatGPT is meaningfully better. Iterating and weeding out the assumptions it is making are the key learnings.

  • Boosibri Posts: 12,328 ✭✭✭✭✭

    @MsMorrisine said:
    did you fact check it?

    Yes

  • jmlanzaf Posts: 35,509 ✭✭✭✭✭

    @Boosibri said:

    @scubafuel said:
    I’ll be curious to know how it goes for you, in the end. When I tried to have GPT parse through auction catalogues about 6 months ago, my results were summed up as “close, but I end up doing most of the work myself”

    The paid ChatGPT is meaningfully better. Iterating and weeding out the assumptions it is making are the key learnings.

    The deep thinking is really amazing. It is interesting to follow its "thinking". I was using it today.

    This would work even better if it were used as a front end on an existing archive like Heritage or NNP. I imagine they will build some RAG system at some point.

    An excellent example of the value of prompt engineering.

    Love this!

  • Boosibri Posts: 12,328 ✭✭✭✭✭
    edited May 1, 2025 5:41AM

    One thing I wanted to test was whether OCR with stronger follow-up prompt language could be as effective as reading the catalog lot by lot. The lot-by-lot method can take well over an hour (2.5-3 hrs by ChatGPT’s own assessment, though that is exaggerated, as its research tool was down for maintenance when I probed the topic). Using OCR would cut the time to read the catalog to 30 minutes or so.

    So I decided to test the two methods:
    Prompt:
    -Rescan the catalog using OCR, searching for coins noted as proof.
    -For positive matches, assess whether they are from Latin America.
    -If from Latin America, determine whether they are dollar-sized (26-40mm).
    -Compare the resulting list with the previously created table and identify any differences.
    -For any differences, analyze why the two methodologies produced different output.

    Results matched between the two methods. Explicitly stating the parameters for dollar-sized caused the model to exclude one coin which was in the previous list while continuing to include the other minors.

    So for now, I am going to use the OCR-based approach to allow for a faster assessment of catalogs, and be very explicit in the language after the OCR hit to weed out false positives.

    I am now going to give it a big task… scan all of the Schulman sales on NNP and create a list of all proof dollar-sized coins which appear in the sales.
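
    For comparison, the OCR-first approach maps onto a conventional local pipeline too. A minimal sketch, assuming a downloaded catalog PDF plus the pdf2image and pytesseract libraries (both need their system dependencies installed); the filename is hypothetical:

    from pdf2image import convert_from_path  # pip install pdf2image (needs poppler)
    import pytesseract                       # pip install pytesseract (needs tesseract)

    def find_proof_lots(pdf_path: str) -> list[tuple[int, str]]:
        """OCR each catalog page; return (page number, line) pairs mentioning 'proof'."""
        hits = []
        for page_num, page_image in enumerate(convert_from_path(pdf_path), start=1):
            text = pytesseract.image_to_string(page_image)
            for line in text.splitlines():
                if "proof" in line.lower():
                    hits.append((page_num, line.strip()))
        return hits

    # Positive matches would then be screened for Latin American countries
    # and the 26-40mm dollar-sized rule, per the prompt above.
    for page, line in find_proof_lots("schulman_catalog.pdf"):
        print(page, line)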

  • MsMorrisine Posts: 34,463 ✭✭✭✭✭

    exciting

    Current maintainer of Stone's Master List of Favorite Websites // My BST transactions
  • jmlanzaf Posts: 35,509 ✭✭✭✭✭

    @Boosibri said:
    One thing I wanted to test was whether OCR with stronger follow-up prompt language could be as effective as reading the catalog lot by lot. The lot-by-lot method can take well over an hour (2.5-3 hrs by ChatGPT’s own assessment, though that is exaggerated, as its research tool was down for maintenance when I probed the topic). Using OCR would cut the time to read the catalog to 30 minutes or so.

    So I decided to test the two methods:
    Prompt:
    -Rescan the catalog using OCR, searching for coins noted as proof.
    -For positive matches, assess whether they are from Latin America.
    -If from Latin America, determine whether they are dollar-sized (26-40mm).
    -Compare the resulting list with the previously created table and identify any differences.
    -For any differences, analyze why the two methodologies produced different output.

    Results matched between the two methods. Explicitly stating the parameters for dollar-sized caused the model to exclude one coin which was in the previous list while continuing to include the other minors.

    So for now, I am going to use the OCR-based approach to allow for a faster assessment of catalogs, and be very explicit in the language after the OCR hit to weed out false positives.

    I am now going to give it a big task… scan all of the Schulman sales on NNP and create a list of all proof dollar-sized coins which appear in the sales.

    Fantastic

  • Nic Posts: 3,400 ✭✭✭✭✭
  • coinkat Posts: 23,630 ✭✭✭✭✭

    Good luck with the project

    Experience the World through Numismatics...it's more than you can imagine.

  • Boosibri Posts: 12,328 ✭✭✭✭✭
    edited May 2, 2025 7:03AM

    A step backward…yesterday’s attempt to run the prompts against a broad range of sales was a total failure. But in the failure there is learning…

    The prompts should have been simple and replicable after the success in analyzing a single catalog. I ran the prompt to search all of the Schulman auctions for proof, dollar-sized coins from Latin America. ChatGPT was instructed to manually scan all catalogs, ignore all external or outside sources, and, after identifying a positive match, do a second review to ensure accuracy, then report back in a table.

    To note, I had used the Schulman sales, which I am very familiar with, to run the earlier tests that built the prompts that ultimately worked on the Superior 1975 ANA sale.

    In the output of the Schulman search, I received multiple false positives again, and then omissions. It blamed the omissions on OCR, which I had previously instructed it NOT to use. I asked it to reassess, read, and add the missed lots, and it added incorrect information for those lots. I asked it to diagnose…

    What appears to be occurring is that past testing with the unrefined prompts generated false positives which I never corrected, and these have become institutional knowledge that supersedes the actual prompt language of the current query.

    I am now clearing all past assumptions and memory and starting over. The learning: the prompts are superseded by past work which has become institutionalized, despite being in error.

    After clearing and resetting all saved information, the query worked again.
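
    In API terms, the lesson amounts to never carrying prior conversation state into a new query. A minimal sketch of the two patterns, assuming the OpenAI Python SDK (the ChatGPT web UI's memory feature has no exact equivalent here):

    from openai import OpenAI  # pip install openai; assumes OPENAI_API_KEY is set

    client = OpenAI()
    MODEL = "gpt-4o"  # assumption: substitute whatever model you use

    # Contaminated pattern: one ever-growing history, so earlier (possibly
    # erroneous) exchanges keep influencing every later answer.
    history = []

    def stateful_query(prompt: str) -> str:
        history.append({"role": "user", "content": prompt})
        reply = client.chat.completions.create(model=MODEL, messages=history)
        answer = reply.choices[0].message.content
        history.append({"role": "assistant", "content": answer})
        return answer

    # Clean pattern: a brand-new messages list per query, so nothing from
    # past work can supersede the language of the current prompt.
    def fresh_query(prompt: str) -> str:
        reply = client.chat.completions.create(
            model=MODEL,
            messages=[{"role": "user", "content": prompt}],
        )
        return reply.choices[0].message.content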

  • MsMorrisine Posts: 34,463 ✭✭✭✭✭
    edited May 2, 2025 10:44AM

    @Boosibri said:
    What appears to be occurring is that past testing with the unrefined prompts generated false positives which I never corrected, and these have become institutional knowledge that supersedes the actual prompt language of the current query.

    I am now clearing all past assumptions and memory and starting over. The learning: the prompts are superseded by past work which has become institutionalized, despite being in error.

    This has been a problem since ChatGPT 3. I simply started a new discussion each time. I suggest it even for massively simple prompts.

    Current maintainer of Stone's Master List of Favorite Websites // My BST transactions
  • semikeycollector Posts: 1,125 ✭✭✭✭✭
    edited May 2, 2025 2:39PM

    I'm starting to get my teeth into these references. Not sure if you have gone through them. I'm very interested in what you are doing! The PDF is from a Kaggle course.

    https://www.promptingguide.ai/

  • MsMorrisine Posts: 34,463 ✭✭✭✭✭

    looking forward to part 2

    Current maintainer of Stone's Master List of Favorite Websites // My BST transactions
