Digitizing US Mint documents - Sample photos added

RogerB · August 4, 2019 12:13PM

Several collectors have asked about how to digitize documents from the archives. Here is my basic approach. Others might use different equipment or procedures.

Bound volumes.
The fastest way is to use a specialized book scanner (Book-Eye is one brand). This holds a book in a V-shaped cradle, and scans facing pages into one image. NARA College Park has one of these but the cost per scan, sensitivity of the equipment to poor scans, and requirement for considerable post-scan adjustment make this an unattractive option.

Another way is to use a portable book scanner (CZUR 18 Pro, ScanSnap, etc.) to make an image of the open book on a flat surface, and then make extensive software corrections for page curvature within the scanner. These are not usable for book scanning at NARA because volumes cannot be placed flat with both covers on a table. (Preservation/conservation rule.) Tests showed that page correction/flattening was good, but deteriorated as the front or back of the book was approached. Some of these will not give good images under ordinary bright office light (ScanSnap is especially poor). [CZUR is not really a scanner – it is a copying camera that does the same thing but includes page correction software.]

I tried several book scanners to copy only one page of a large volume at a time. Results were good, but color and exposures were often inconsistent and only the CZUR could image pages larger than 8x11-in.

My primary method is to use one of NARA copy stands with LED lights and photograph each page using fixed exposure and color balance. I do all the odd number pages first, then turn the book around and do all even pages. Post-processing consists of rotating all pages to correct orientation, adjusting objectionable skew and color, numbering images to match the original page numbers. Everything is then checked, assembled, checked again, converted to PDF and given a NARA locator ID.

Flat Documents.
These can be scanned using one of the book scanners mentioned above. The software will automatically rotate and deskew images, and crop out a black background. Quality can be very good to excellent.

I continue to use a camera, then post-process to crop and deskew images.

Warped Flat Documents.
If the warp, such as from being rolled for a century, is mild, the original can be put on a large flatbed scanner. The lid flattens the document while an image is made. NARA does not permit more than one page at a time, so some documents cannot presently be scanned, others have a curl the is so strong a page will not flatten without weights that would damage the equipment.

OK. I’m sure that’s more than anyone wanted to know…..

Nysoto · August 4, 2019 3:23PM

Good information, thanks. I have mostly used a camera at the Philadelphia and Seattle NARA's.

What are the copyright statutes with National Archive documents?

When reproducing NARA documents in a book or article, how should they be referenced? What about posting a NARA document on the internet?

RogerB · August 4, 2019 4:15PM

Almost all NARA documents are government work products and thus public property. You only run into difficulties when someone enhances an original or combines it with copyright material. For example, an article on 1895 dollars that includes images of coinage reports is copyright by the author. But the report images themselves are public domain, if unaltered.

when in doubt consult the NARA copyright disclosures accompanying some files, or check with an intellectual property attorney.

RogerB · August 4, 2019 4:24PM

As for references, my opinion is that a reader needs enough information to take them back to the original, without being overly brief. I use something like this:

NARA, RG104, entry 235, vol. 75, p.18. Letter dated June 29, 1895 to Kretz from Preston.

Originally I designated the NARA facility by an abbreviation, but dropped that a few years ago since files are occasionally moved from one place to another.

My presumption is that references need to be intelligible by both academic and casual readers. (For the same reason, I avoid the initials and last name only format used in much of academia. How many "J. C. Smith" people are out there, compared to "Josiah Caperton Smith?") A few extra words or letters are more than worth the tiny amount of space they require.

RogerB · August 4, 2019 5:40PM

Given its Sunday night and there’s no pro football yet, I’ll extend the boredom on this subject to …

Post Processing.
Once the images are made at NARA, LoC, etc. they usually need some sort of digital processing to produce easy-to-read products. Several things might require attention. The most common of these are rotation and/or deskew, color adjustment, sharpness, perspective and localized enhancement.

Most of these are commonly understood but perspective correction and localized enhancement require more explanation. Original journal or letter pages are usually rectangular, but warping of paper and journal binders can turn rectangular pages into trapezoidal sheets. Errors can also be introduced during photography if the camera and subject are not precisely aligned, or if a lens of very short focal length is used. To avoid distracting the reader, this should be corrected in Photoshop or a similar image modification program. This is a somewhat touchy process - it is easy to create more problems than are cured.

Localized enhancement is something avoided in good coin photos, but it often necessary when copying documents. Text of uneven density, or various ink colors, or pencil notations; or a page might have portions that are normally unreadable but where enhancement can reveal letters and words. Color photos – usually R-G-B – can be split into separate color or hsl channels and these manipulated and combined to bring out nearly invisible details.

There are two key points to post processing: start with the best possible image; and, make the minimum changes required to produce a clear, readable image – aesthetically pleasing if possible.

KindaNewish · August 4, 2019 6:02PM

Just keep doing what you are doing. This is an invaluable service that you are providing.
Just worry about getting it up to NNP, I have no doubt that sometime some future technology will allow this to be cataloged and organized with a click or two.

ricko · August 5, 2019 4:12AM

This was quite informative. While I was not seeking such information, it is interesting and I certainly learned a bit. Thanks Roger...Cheers, RickO

RogerB · August 5, 2019 2:29PM

KindaNewish -
The final PDF files are sent to NNP via file transfer protocol (ftp) and NNP handles the rest through their search engine and Internet Archive.

I've tested adding searchable indices with volunteer transcribers, but the work is just TOO BORING and numbing to keep the interest of anyone. I've gone through some of the Entry 1 journals and made lists of items of interest to me - but it's slow going and might completely miss things important to others.

Automated manuscript recognition would be a big help, but the minimum $100k needed to possibly get a prototype - with no guarantees of scale or portability - frightens potential backers. Corporate numismatics does not care a cracked coconut - they are consumers not originators of information.

RogerB · August 8, 2019 4:16PM

Optical character recognition (OCR) works well on modern electric typed or computer generated documents. Older manual typewriter text is unevenly spaced and can be confusing to OCR engines. It takes only a small amount of fuzzy text, irregular spacing, or presscopy bleeding to totally confuse OCR. The vast body of US government typed documents made before about 1970 (except typeset) result in text that takes as much or more time to correct than to retype.

A large part of this is because the technology and OCR engines are 20 to 30 years old and no one has bothered to update them. There are some simple ways to improve OCR output that have not been implemented in commercial products. Anyone interested can contact me.

RogerB · August 9, 2019 8:00AM

A collector asked: “You go through a lot of steps. What does that mean for the photos?”

Here’s a direct comparison between a typical photo and one I made of the same page. The image at left was made by NARA for a customer several years ago. Other than color, notice that parts of the image are cut off. The image at right was made to ensure that all readable information was captured, and that color and image density were consistent with the original.

[E-160 18690317 Catalog of Mint collection]

All the “extra” steps beyond making a basic image take considerable time to implement. But I feel they also help users – especially those unaccustomed to the subject.

RogerB · August 9, 2019 8:35AM

It’s also common practice to have to “go with what you’re given.” The originals might be warped or an image supplied by a source is of low quality. In this example, the image on the left was provided by NARA about 6 months ago. Resolution was excellent, but appearance and readability left a lot to be desired. At right is the relevant page after correction of basic geometry, tonal scale and color. While further flattening of the curvature at right could be performed, this version is acceptable for general use.

[RG104 E-63]

Digitizing US Mint documents - Sample photos added

Comments

Leave a Comment