@RogerB said:
Lance - I will send you some presscopy images in the next batch.
I mentioned "100 per day" but there are many days when only a half-dozen images can be made --- nothing interesting shows up. I've had several days in a row with nothing of interest. That's part of the problem created when agencies simply dumped records in boxes.
I can knock out the boring ones unless not worthy from your end considering how cumbersome it sounds.
How about doing video of a mass amount of docs somewhat rapid-fire and just snap-shotting the vid with your feet up at home via pc?
This method can be very efficient in some instances, especially for animals/wildlife, where precise moments are easy to miss and the best moments usually occur between images. grrr.
I use the 2 spaces between sentences. It won't affect searchability and does improve readability.
I have also found a lot of one-sentence, run-on paragraphs. After 100 years it's still an issue.
I left a "You" and "Your" -- Honor did not follow -- capitalized in one doc as the referenced person was a judge.
I have also run across either some kind of artifacts in the documents or a lot of unnecessary commas in the documents. I try to do my best with the correct comma placement.
I did run across "centre" in a document. I searched and found that during the particular era, the aforementioned spelling was in much wider use than "center." I did not change the spelling as written.
I've gone with the assumption that a transcriber's duty is to recreate the document with any "strange" attributes retained. I've not changed comma usage, spelling, sentence structure, etc. Maybe I'm wrong in that assumption?
@TommyType said:
I've gone with the assumption that a transcriber's duty is to recreate the document with any "strange" attributes retained. I've not changed comma usage, spelling, sentence structure, etc. Maybe I'm wrong in that assumption?
you're probably right. it is a hard thing for me to resist given the qty of such divergences. i think mainly i've leaned towards parroting but have for sure updated/removed seemingly harmless things. extra commas, obvious? misspellings - although i concede those things are debatable.
I do try to make a balance with readability and preservation.
For me, the true preservation is in the original picture. The usefulness is in the text.
I keep overall sentence structure for preservation.
It does not bother me to change the odd comma, although some appear to be artifacts, given their strange placement. Odd capitalization is sometimes changed.
Since these docs will be searched and read, I fixed a couple of spelling errors. I do note now that Roger mentioned adding (sic) in the docs.
Perhaps clearance for a standard can be received, Roger.
RE: "I'm hitting "(period)(space)(space)(New Sentence)" Are you removing those double spaces?
I remove all double spaces, or spaces used in lieu of tabs. Both seem to make it more difficult for character recognition, although there's not a big difference. In some transcriptions, using rows of spaces can amount to 300 - 400 spaces. Of course, removal is a simple matter, so it's simpler for transcribers to do what is comfortable for them, rather than trying to "fit everyone into the identical pattern." Double spaces are also removed after colons and semicolons, and in embedded tables. However, in plain text multiple spaces might be used to align the decimal points in numbers. This is just for easier reading.
The US Government internal standard, and the one used for submitting reports to most states, is to allow the word processing software to space letters and words correctly. So the "double space after a period" rule from touch typing is considered obsolete.
Actually, the double space rule creates a lot of problems when doing OCR on original typed letters up to about 1960 (IBM Selectric era). The exact length of spaces, and even the gaps between letters, depends on the speed and consistency of the typist. Uneven typing creates irregular spacing and letters that are above the line base. OCR can't interpret these accurately, and we end up with odd characters added, micro-spaces, and sometimes new words invented by the software. [The direct mechanical link between typist and typeface carrier made consistent spacing very difficult. The Selectric eliminated that link and turned character spacing over to the machine. This produced regular spacing which allows modern OCR software to adapt to the output of electric typewriters, regardless of the typist's skill.]
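For anyone who wants to automate the double-space cleanup described above, here is a minimal Python sketch. The function name `collapse_spaces` is made up for illustration and is not part of any project tooling. It assumes plain-text input, and it would also flatten runs of spaces that were deliberately used to align decimal points, so aligned number columns would need to be handled by hand.

```python
import re


def collapse_spaces(text: str) -> str:
    """Replace every run of two or more spaces with a single space.

    Hypothetical helper for illustration only. Line breaks are left
    alone; trailing spaces at the end of each line are dropped.
    """
    cleaned = []
    for line in text.split("\n"):
        line = re.sub(r" {2,}", " ", line)  # "x.  Y" becomes "x. Y"
        cleaned.append(line.rstrip(" "))
    return "\n".join(cleaned)


if __name__ == "__main__":
    sample = "Respectfully yours,  \nR. E. Preston.  Director:  Mint"
    print(collapse_spaces(sample))
```

Since removal is this simple, it fits the point made above: transcribers can type whichever way is comfortable and the spacing can be normalized afterward.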
@RogerB said:
RE: "I'm hitting "(period)(space)(space)(New Sentence)" Are you removing those double spaces?
Of course, removal is a simple matter, so it's simpler for transcribers to do what is comfortable for them, rather than trying to "fit everyone into the identical pattern."
As for some basic NARA rules: Nothing goes in, nothing goes out.
Exceptions - pre-approved note paper issued inside NARA work rooms; small outside documents that are stapled together, pre-inspected, and signed by NARA staff; cameras and flatbed scanners (no auto feeders, no lights, no flash); small tripods and copy stands.
All locations have lockers for storage of coats, bags, etc. Everything is subject to inspection at any time by any staff. Photography and scanning have to be pre-approved based on the documents you are working with.
The text including punctuation should remain as in the original. I occasionally add [sic] when there is a flagrantly misspelled word, and sometimes add the correct word after [sic: ] so that a user is not confused. I also add an explanation if there is something really strange in a document - usually a very obscure reference.
Obvious errors - such as not closing a quotation - are routinely fixed. The same for new year boundary date errors. But, mathematical errors are not changed, although corrections might be included in an editor's note.
The final review/editing is also affected by judgement built up in reading all of these in their original form, and in reading thousands of pages from the principal authors. Each author has his/her style, and sometimes it is frustrating to see long paragraphs consisting of a single sentence with massive quantities of punctuation scattered everywhere....Charles Barber was a master of this.
Sometimes it helps to read the document through before starting a transcription. Many letters were dictated, and the clerk simply wrote to match the speaker's phrasing and emphasis. Nineteenth century documents use copious quantities of subordinate phrases, recursions, and asides -- something we don't do much in today's writing. (Think of Hemingway or Joyce but without the artistry.)
@RogerB said:
The text including punctuation should remain as in the original. I occasionally add [sic] when there is a flagrantly misspelled word, and sometimes add the correct word after [sic: ] so that a user is not confused. I also add an explanation if there is something really strange in a document - usually a very obscure reference.
Obvious errors - such as not closing a quotation - are routinely fixed. The same for new year boundary date errors. But, mathematical errors are not changed, although corrections might be included in an editor's note.
The final review/editing is also affected by judgement built up in reading all of these in their original form, and in reading thousands of pages from the principal authors. Each author has his/her style, and sometimes it is frustrating to see long paragraphs consisting of a single sentence with massive quantities of punctuation scattered everywhere....Charles Barber was a master of this.
I don't fix those commas. I just fix the horribly misplaced ones.
In fact, for all of those one-sentence paragraphs, the commas are necessary for readability.
@RogerB said:
RE: "I'm hitting "(period)(space)(space)(New Sentence)" Are you removing those double spaces?
I remove all double spaces, or spaces used in lieu of tabs. Both seem to make it more difficult for character recognition, although there's not a big difference. In some transcriptions, using rows of spaces can amount to 300 - 400 spaces. Of course, removal is a simple matter, so it's simpler for transcribers to do what is comfortable for them, rather than trying to "fit everyone into the identical pattern." Double spaces are also removed after colons and semicolons, and in embedded tables. However, in plain text multiple spaces might be used to align the decimal points in numbers. This is just for easier reading.
The US Government internal standard, and the one used for submitting reports to most states, is to allow the word processing software to space letters and words correctly. So the "double space after a period" rule from touch typing is considered obsolete.
Actually, the double space rule creates a lot of problems when doing OCR on original typed letters up to about 1960 (IBM Selectric era). The exact length of spaces, and even the gaps between letters, depends on the speed and consistency of the typist. Uneven typing creates irregular spacing and letters that are above the line base. OCR can't interpret these accurately, and we end up with odd characters added, micro-spaces, and sometimes new words invented by the software. [The direct mechanical link between typist and typeface carrier made consistent spacing very difficult. The Selectric eliminated that link and turned character spacing over to the machine. This produced regular spacing which allows modern OCR software to adapt to the output of electric typewriters, regardless of the typist's skill.]
As someone who has repeatedly worked in high throughput jobs I say it is necessary to install standards.
At the worker level, the time spent on seemingly quick and easy judgment calls adds up and can slow down a project.
Second, at your level, there is more than just the time saved. When the last line of defense is the only one doing a certain operation, there will also be a higher error rate.
At the project level, uniformity increases the ease and speed of accessing and using the data, more so with fewer errors.
Structure and removing or greatly reducing independent decision making is imperative to overall project speed and quality.
I urge you to provide your standards, if not to all then at least to me.
fwiw, i keep a notepad file with his initial guidelines and a few extra notes based on doing several docs. also, from his notes after doing several more docs.
i think it is his way of showing appreciation to our generosity of time/effort and keeping a nice light tone and enjoyment from this project.
i'm willing to bet we have a very high accuracy rate anyway and catching/fixing errors is minimal on his end. more edits probably come from his final assessment of last minute changes based on his experience of these types of research/transcriptions. i am confident he gets our trans docs with minimal errors/changes needed, since he sends me his final drafts for comparison to my rough drafts and i don't recall offhand any errors really, just maybe formatting things which is his final call.
most of what i've done is simple letter format transcription. easy peasy. the hardest part to me is deciphering some of the cursive words and some i don't spend too much time on any stumpers as i kinda know which ones i'm gonna figure out and which ones i'm not. lol - i figure Roger has seen enough he can do it in 1/10th the time/effort. i focus on accuracy as well as qty.
don't be too hard on yourself or us Ms. i'm confident in Roger and the rest of us doing these trans. i'm confident our results are quite acceptable, especially for gratis wages.
there really is not much opportunity for mistakes from what i can tell. a good proofread of the letter beforehand, the actual trans and a proof of the draft prior to sending is pretty thorough for mostly simple, albeit peculiar communications.
i'm looking forward to many more and find it unfortunate so much effort is required for the procurement of the source material. what can ya do though?
i'm glad there were/are a nice group of volunt. for this as i bet we knocked out a good amount of material in a short period of time. once the prog. is set up for the basic format requirements, the effort is not too dissimilar from all the posts we make on this forum and/or lengthy emails, just a century and a half ago. lol
i'd have more accuracy/grammar/formatting in my posts here but i do it on my tablet and most of the time using a keyboard is overkill for such short posts but too much effort for pecking at a digital one and/or using a mouse to click the letters which is beyond slow and mentally painful.
as well as my tablet gets A LOT of miles and usage and has fallen victim to the "ghost touches" and will need to be disassembled, thoroughly to fix these (the actual fix being stupendously easy compared to the effort to dis and then re-assemble) and i have (mostly) disabled touch for the time being via device manager. it is a hardware problem of the manufacturers using glue in lieu of tape for preventing some thin flat cable(s) from touching the metal frame causing minor grounding issues and they usually don't exhibit until well after the warranty and for light users, probably will never experience these.
@RogerB said:
As for some basic NARA rules: Nothing goes in, nothing goes out.
Exceptions - pre-approved note paper issued inside NARA work rooms; small outside documents that are stapled together, pre-inspected, and signed by NARA staff; cameras and flatbed scanners (no auto feeders, no lights, no flash); small tripods and copy stands.
All locations have lockers for storage of coats, bags, etc. Everything is subject to inspection at any time by any staff. Photography and scanning have to be pre-approved based on the documents you are working with.
They let flatbed scanners in but not independent lighting?
NARA permits laptops and ipad-type devices, but not lights or outside paper weights or clear pressure covers. As a public facility, NARA is actually fairly liberal in allowing access to documents. The Federal Reserve Board charges nothing and is often very happy to have visitors to its archives; same for Library of Congress, especially the photo and Congressional document divisions. Some other archives are very different and some have onerous (my opinion) conditions -- The Connecticut State Archives refuses to allow anyone to photograph or copy their materials - except their own staff, who make fuzzy photocopies at 50-cents per page. Some other places charge $1 or more per page, and others have a flat-fee of $10 or more plus $1 per page plus a reproduction fee of $35 to $50 per image for even non-profit publication. The French National Archive charged $300 for a digital copy of the Janvier reducing lathe patent that is reproduced in the first issue of JNR.
The accuracy rate among all transcribers overall is better than 99.99% on a character basis. This includes everyone's initial work where more minor errors are expected.
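The post does not say how the character-basis figure is calculated. One common way, shown here only as an assumption, is to compare a volunteer's draft against the final reviewed text and count the single-character edits needed to turn one into the other. The function names below are invented for this sketch. On this measure, 99.99% works out to no more than one wrong character in every 10,000.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: fewest single-character inserts,
    deletes, or substitutions needed to turn `a` into `b`."""
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # delete ch_a
                current[j - 1] + 1,      # insert ch_b
                previous[j - 1] + cost,  # substitute, or keep a match
            ))
        previous = current
    return previous[-1]


def char_accuracy(reference: str, draft: str) -> float:
    """Fraction of the reference text that the draft got right (0.0 to 1.0).

    Assumed definition: 1 - edit_distance / length of the reference.
    """
    if not reference:
        return 1.0 if not draft else 0.0
    return max(0.0, 1.0 - edit_distance(reference, draft) / len(reference))


if __name__ == "__main__":
    final_text = "no half eagles have been struck"
    rough_draft = "no half eagies have been struck"
    print(f"{char_accuracy(final_text, rough_draft):.4%}")
```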
OCR is not directly used with transcriptions - they are all, essentially, ASCII characters. But, some documents contain newspaper clippings, typed letters and irregularly spaced typeset pages that end up being automatically OCR'd by some data storage systems. Thus, I try to plan for future situations so that work does not have to be revised.
As for standards - everyone making transcriptions pretty much abides by the very simple guidelines. The goal is to get useful product, without burdening volunteers with too many requirements. The comments I've made in earlier posts can be incorporated by individuals if they wish, but that is a personal choice. Have fun with this and enjoy the odd and interesting things you'll read.
Lastly, know that the efforts of every volunteer are appreciated now, and will continue to add value in the future when these are eventually turned over to the NNP.
@RogerB said:
Lastly, know that the efforts of every volunteer are appreciated now, and will continue to add value in the future when these are eventually turned over to the NNP.
NNP = "Newman Numismatic Portal." This is an educational project supported by the Eric P. Newman Educational Foundation and Washington University in St. Louis. The goal is a permanent repository for all American numismatic publications, with open access to anyone without charge. (However, there are restrictions on copyright material.)
[I was one of the "founding fathers" so to speak.]
This afternoon the 21 transcription volunteers were sent several hundred documents. These are in larger batches than in December and some of the documents include large tables, or complex text. There are also a few typed documents that failed OCR.
The number of files sent to each volunteer depends largely on total file sizes, since many email systems do not accept attachments larger than 10 meg total.
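The size constraint described above can be sketched as a simple grouping routine. This is only an illustration, not how the batches were actually built: the function name, the file names, and the reading of "10 meg" as 10 x 1024 x 1024 bytes are all assumptions. Files are taken in order and a new batch is started whenever the next file would push the running total over the limit.

```python
from typing import List, Tuple

# Assumed interpretation of the "10 meg" attachment ceiling.
LIMIT_BYTES = 10 * 1024 * 1024


def batch_files(files: List[Tuple[str, int]],
                limit: int = LIMIT_BYTES) -> List[List[str]]:
    """Group (name, size_in_bytes) pairs into batches whose total size
    stays at or under `limit`.

    A single file larger than the limit still gets a batch of its own;
    it would have to be sent some other way.
    """
    batches: List[List[str]] = []
    current: List[str] = []
    current_size = 0
    for name, size in files:
        # Close the current batch if this file would overflow it.
        if current and current_size + size > limit:
            batches.append(current)
            current, current_size = [], 0
        current.append(name)
        current_size += size
    if current:
        batches.append(current)
    return batches


if __name__ == "__main__":
    # Hypothetical file names and sizes, for demonstration only.
    demo = [("letter-001.pdf", 4_000_000),
            ("letter-002.pdf", 5_000_000),
            ("table-003.pdf", 3_000_000)]
    for number, batch in enumerate(batch_files(demo), start=1):
        print(number, batch)
```

In practice the working limit would probably need to be set lower, since email attachments grow by roughly a third when encoded for sending.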
Document dates range from 1790 to 1897, and handwriting from clear to moderately crabby. (I'll save the true chicken scratching for later.)
Apparently I was voted off the island and nobody told me. Just laughing at me as I wander around aimlessly trying to figure what is going on and what to do next.
The example is clearest as a table. (An Excel or Word table makes the data spacing regular and easy to read. From a strict data search perspective, it doesn't matter.)
Lance -- you have mail -- lots of it. You have all of E-235-vol-082. No one has been "voted off the island." (Maybe I should be voted off -- I sent the files to the wrong email address. Someone is in for a surprise!)
@RogerB said:
The example is clearest as a table. (An Excel or Word table makes the data spacing regular and easy to read. From a strict data search perspective, it doesn't matter.)
Lance -- you have mail -- lots of it. You have all of E-235-vol-082. No one has been "voted off the island." (Maybe I should be voted off -- I sent the files to the wrong email address. Someone is in for a surprise!)
It is you and the other volunteers who should be thanked!
BTW - I mentioned "chicken scratching" --- Here's an example pulled from one of Rep. Alexander Stephens' clearer letters regarding the Goloid alloy tests.
@RogerB said:
BTW - I mentioned "chicken scratching" --- Here's an example pulled from one of Rep. Alexander Stephens' clearer letters regarding the Goloid alloy tests.
It clearly says: "Silver + copper to which his busy beautiful as a nueui until Stephan Euliod - which consists of this following of the following verbalism of couple of the action follows...."
Most of the docs I received were pretty good....except for this one. Thankfully not much text, but I can't make heads or tails of the first two words!
I've got:
"______???_____ of Coinage Dies for the calendar year 1894, the obverse of which were this day destroyed by the Coiner of the Mint, in presence of the Superintendent and Assayer, and together with the reverses thereof, sent to the Philadelphia Mint."
Can anyone squint, and divine any meaning for those first few words?
Also, considering NARA's approach for scanning or photographing documents, should a document be marked "declassified," would it also be necessary to include the full text of the "declassified" statement on the page?
I usually violate the "no changes" rule when the original has only "D.E." or "D. Eagle" in it, and make at least one the full "Double Eagle." This makes it easier for search programs to find the phrase. (The archivists are likely issuing "condemnation bulls" against me for this...) The goal is clarity of content so researchers can find what they want.
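The "make at least one the full phrase" habit described above could be automated along these lines. This is a rough sketch under stated assumptions: the function name and the pattern are invented here, only the two spellings mentioned ("D.E." and "D. Eagle") are handled, and only the first occurrence is expanded so the rest of the text stays as written.

```python
import re

# Matches "D.E.", "D. E." and "D. Eagle"; an assumed pattern, not a project rule.
_DE_PATTERN = re.compile(r"\bD\.\s?(?:E\.|Eagle\b)")


def expand_first_double_eagle(text: str) -> str:
    """Spell out the first abbreviated "Double Eagle" and leave the rest.

    Caveat: if "D.E." ends a sentence, its final period is consumed
    along with the abbreviation and would need restoring by hand.
    """
    return _DE_PATTERN.sub("Double Eagle", text, count=1)


if __name__ == "__main__":
    print(expand_first_double_eagle("Sent 40 D.E. and 10 D.E. today"))
    print(expand_first_double_eagle("One D. Eagle received"))
```

Expanding only one instance keeps the change to the original wording small while still giving search programs the full phrase to match.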
The "declassified" and other NARA-related phrases are not needed in transcriptions. Their rules apply to photocopies or photographs. However, when the word "Private" or something similar is written within the document, then it should be included at the top left, since this was part of the original communication.
I don't condemn on this. As mentioned earlier, the original document is included above the text in the PDF for preservation. The value in the text portion is searchability. Minor words, eh. It could be argued it does not matter either way.
Agreed. The academically important factors are: 1) unadulterated original image, and 2) ability to locate desired information.
The only changes made to the image are corrections of initial acquisition artifacts (geometry, etc.), and addition of the locator code so the item can be found in the archives.
Conceptually, digitization of archival documents is less than half the solution to access. The much greater problem is inability to locate desired data. OCR is fine for modern materials and so-so for post-1880 typescript. But earlier materials have such a low OCR success rate that they can confuse more than explicate.
My opinion is that in addition to supporting scanning of documents, the Newman Numismatic Portal (NNP) should be supporting basic research and development into handwriting recognition. (I'm doing some work on this, but my real programming days were long ago and I can't devote full-time to producing a good product.) The NNP comment, of course, emphasizes how little ANA, ANS and the business end of numismatics do on this subject.
Have coin collectors always been annoying? Yes...yes they have:
January 14, 1897.
Horace Morey, Esq.
Bay City, Michigan.
In reply to your letter of the 7th instant, you are respectfully informed that the Mints have no half eagles or five dollar pieces on hand of the coinage of 1847. The only way in which you could obtain a piece of this date would probably be from some coin-dealer.
As yet no half eagles have been struck at the Mints this year, but it is probable that some of these pieces will be coined before the close of the year, when you can obtain one.
Respectfully yours,
R. E. Preston,
Director of the Mint.
@TommyType said:
Have coin collectors always been annoying? Yes...yes they have:
well i see how it is. we are sharing transcriptions now eh? well, i will see your "annoying" and raise you "insulting."
May 10, 1897.
Overton Cade?, Esq.,
Superintendent, U. S. Mint,
New Orleans, La.
I enclose herewith four quarter dollars, coined at your mint, and I desire particularly to invite your attention to the manner in which these coins are milled and reeded.
From inspection it is perfectly clear that the imperfect milling and reeding is all due to the failure on the part of the coiner or his assistant to exercise proper supervision over the coining room as well as to the carelessness on the part of the foreman and the machinist.
I will thank you to call the attention of the coiner to this matter without delay, that he may take steps hereafter to see that no coins that are imperfectly milled and reeded such as these enclosed, be permitted to leave the mint. You will please return one dollar in currency for the coins enclosed.
Respectfully yours,
R. E. Preston,
Director of the Mint.
LOL...There is entertainment in this seemingly mundane correspondence.
I've gone through a few letters to a House of Representatives committee justifying the closure of the New Orleans mint....so they are about to find out how angry Director Preston REALLY is.
@TommyType said:
LOL...There is entertainment in this seemingly mundane correspondence.
I've gone through a few letters to a House of Representatives committee justifying the closure of the New Orleans mint....so they are about to find out how angry Director Preston REALLY is.
November 26, 1895.
Herman Kretz, Esq.,
Superintendent, U.S. Mint,
Philadelphia, Pa.
In answer to your communication of the 25th instant, relative to the purchase of “toilet paper”, you are authorized to purchase the same as per request, as there seems to be no contract for the purchase of this article in your list of awards for the current fiscal year.
Very respectfully,
B.F. Butler,
Acting Director of the Mint.
@TommyType said:
Have coin collectors always been annoying? Yes...yes they have:
January 14, 1897.
Horace Morey, Esq.
Bay City, Michigan.
In reply to your letter of the 7th instant, you are respectfully informed that the Mints have no half eagles or five dollar pieces on hand of the coinage of 1847. The only way in which you could obtain a piece of this date would probably be from some coin-dealer.
As yet no half eagles have been struck at the Mints this year, but it is probable that some of these pieces will be coined before the close of the year, when you can obtain one.
Respectfully yours,
R. E. Preston,
Director of the Mint.
obviously will be there to steal some with the cashier's help the first chance they both get ( <- sarcasm)
@TommyType said:
Have coin collectors always been annoying? Yes...yes they have:
well i see how it is. we are sharing transcriptions now eh? well, i will see your "annoying" and raise you "insulting."
May 10, 1897.
Overton Cade?, Esq.,
Superintendent, U. S. Mint,
New Orleans, La.
I enclose herewith four quarter dollars, coined at your mint, and I desire particularly to invite your attention to the manner in which these coins are milled and reeded.
From inspection it is perfectly clear that the imperfect milling and reeding is all due to the failure on the part of the coiner or his assistant to exercise proper supervision over the coining room as well as to the carelessness on the part of the foreman and the machinist.
I will thank you to call the attention of the coiner to this matter without delay, that he may take steps hereafter to see that no coins that are imperfectly milled and reeded such as these enclosed, be permitted to leave the mint. You will please return one dollar in currency for the coins enclosed.
Respectfully yours,
R. E. Preston,
Director of the Mint.
@TLeverage said:
My personal favorite thus far:
In answer to your communication of the 25th instant, relative to the purchase of “toilet paper”, you are
that got me wondering, so i looked it up.
"Even though Queen Elizabeth I's godson invented one of the first flush toilets in 1596, commercially produced toilet paper didn't begin circulating until 1857. Quilted Northern, formerly Northern Tissue, advertised as late as 1935 that their toilet paper was “Splinter-Free!”"
I enclose herewith four quarter dollars, coined at your mint, and I desire particularly to invite your
warranted if there was an issue.
what i wouldn't give for those, probably unc, O mint quarters, probable major and rare errors. my heart sped up during transcription.
if i read about copper reeded edge coins from 1795 being returned due to being unusual and possibly being counterfeit (from the collector's view), i'm done with this whole stinkin' project!
Transcriptions not only allow a peek at the routine and unusual things happening at the mints, but they present the opportunity to connect events occurring over weeks, months or years of time.
The New Orleans complaint is but a small part of a larger group of problems affecting this mint and which ultimately led to its closure as a coinage mint on June 30, 1909. It is possible that key materials relating to the famous 1900-O/CC silver dollar varieties will turn up. Documents presently being transcribed tell of furloughs and suspension of coinage at New Orleans in the mid-1890s.
Headquarters and individual mint files contain many requests for old coins, or for values of old coins, or for information about California fractional gold. It appears that most were answered with an individually prepared "form letter" reply.
tyvm
MsMorrisine --
Sometimes it helps to read the document through before starting a transcription. Many letters were dictated, and the clerk simply wrote to match the speaker's phrasing and emphasis. Nineteenth century documents use copious quantities of subordinate phrases, recursions, and asides -- something we don't do much in today's writing. (Think of Hemingway or Joyce but without the artistry.)
What purpose is optical character recognition on a text based file?
As text in a PDF, the text could simply be copied and pasted out. OCR is an unnecessary task.
Ok. What are you not telling me?
I don't fix those commas. I just fix the horribly misplaced ones.
In fact, for all of those one sentence paragraphs, the commas are necessary for readability.
As someone who has repeatedly worked in high throughput jobs I say it is necessary to install standards.
At the worker level the time involved with decision making over seemingly quick and easy judgments adds up quickly and can slow down a project quickly.
Second, at your level, there is more than just the time saved. When the last line of defense is the only one doing a certain operation there will also be a higher error rate.
At the project level, uniformity increases the ease and speed of accessing and using the data, more so with fewer errors.
Structure and removing or greatly reducing independent decision making is imperative to overall project speed and quality.
I urge you to provide your standards, if not to all then at least to me.
fwiw, i keep a notepad file with his initial guidelines and a few extra notes based on doing several docs. also, from his notes after doing several more docs.
i think it is his way of showing appreciation to our generosity of time/effort and keeping a nice light tone and enjoyment from this project.
i'm willing to bet we have a very high accuracy rate anyway and catching/fixing errors is minimal on his end. more edits probably come from his final assessment of last minute changes based on his experience of these types of research/transcriptions. i am confident about his getting our trans docs with minimal errors/changes needed because he sends me his final drafts for comparison to my rough drafts, and i don't recall offhand any errors really, just maybe formatting things which is his final call.
most of what i've done is simple letter format transcription. easy peasy. the hardest part to me is deciphering some of the cursive words and some i don't spend too much time on any stumpers as i kinda know which ones i'm gonna figure out and which ones i'm not. lol - i figure Roger has seen enough he can do it in 1/10th the time/effort. i focus on accuracy as well as qty.
don't be too hard on yourself or us Ms. i'm confident in Roger and the rest of us doing these trans. i'm confident our results are quite acceptable, especially for gratis wages.
there really is not much opportunity for mistakes from what i can tell. a good proofread of the letter beforehand, the actual trans and a proof of the draft prior to sending is pretty thorough for mostly simple, albeit peculiar communications.
i'm looking forward to many more and find it unfortunate so much effort is required for the procurement of the source material. what can ya do though?
i'm glad there were/are a nice group of volunt. for this as i bet we knocked out a good amount of material in a short period of time. once the prog. is set up for the basic format requirements, the effort is not too dissimilar from all the posts we make on this forum and/or lengthy emails, just a century and a half ago. lol
i'd have more accuracy/grammar/formatting in my posts here but i do it on my tablet and most of the time using a keyboard is overkill for such short posts but too much effort for pecking at a digital one and/or using a mouse to click the letters which is beyond slow and mentally painful.
as well as my tablet gets A LOT of miles and usage and has fallen victim to the "ghost touches" and will need to be disassembled, thoroughly to fix these (the actual fix being stupendously easy compared to the effort to dis and then re-assemble) and i have (mostly) disabled touch for the time being via device manager. it is a hardware problem of the manufacturers using glue in lieu of tape for preventing some thin flat cable(s) from touching the metal frame causing minor grounding issues and they usually don't exhibit until well after the warranty and for light users, probably will never experience these.
It's standards instead of being hard on myself.
The interesting reads counter the implementation of standards.
I'll just take the email.
They let flatbed scanners in but not independent lighting?
Some thoughts on the above discussions --
NARA permits laptops and iPad-type devices, but not lights or outside paper weights or clear pressure covers. As a public facility, NARA is actually fairly liberal in allowing access to documents. The Federal Reserve Board charges nothing and is often very happy to have visitors to its archives; same for Library of Congress, especially the photo and Congressional document divisions. Some other archives are very different and some have onerous (my opinion) conditions -- The Connecticut State Archives refuses to allow anyone to photograph or copy their materials - except their own staff, who make fuzzy photocopies at 50 cents per page. Some other places charge $1 or more per page, and others have a flat fee of $10 or more plus $1 per page plus a reproduction fee of $35 to $50 per image for even non-profit publication. The French National Archive charged $300 for a digital copy of the Janvier reducing lathe patent that is reproduced in the first issue of JNR.
The accuracy rate among all transcribers overall is better than 99.99% on a character basis. This includes everyone's initial work where more minor errors are expected.
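How that figure was measured isn't stated here. As a rough sketch only, one common way to put a number on character-level accuracy is to count the single-character edits (insertions, deletions, substitutions) needed to turn a transcription into the reviewed text, then divide by the length of the reviewed text. The function names below are made up for the example.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum single-character edits to turn a into b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(
                prev[j] + 1,               # delete ca
                cur[j - 1] + 1,            # insert cb
                prev[j - 1] + (ca != cb),  # substitute, or free if equal
            ))
        prev = cur
    return prev[-1]


def character_accuracy(reference: str, transcription: str) -> float:
    """Fraction of the reference text reproduced correctly (assumed metric)."""
    if not reference:
        return 1.0 if not transcription else 0.0
    errors = edit_distance(reference, transcription)
    return max(0.0, 1.0 - errors / len(reference))


if __name__ == "__main__":
    final = "no half eagles have been struck at the Mints this year"
    draft = "no half eagels have been struck at the Mints this year"
    print(f"{character_accuracy(final, draft):.4f}")
```

On that basis, 99.99% would mean roughly one wrong character in every 10,000.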
OCR is not directly used with transcriptions - they are all, essentially, ASCII characters. But, some documents contain newspaper clippings, typed letters and irregularly spaced typeset pages that end up being automatically OCR'd by some data storage systems. Thus, I try to plan for future situations so that work does not have to be revised.
As for standards - everyone making transcriptions pretty much abides by the very simple guidelines. The goal is to get useful product, without burdening volunteers with too many requirements. The comments I've made in earlier posts can be incorporated by individuals if they wish, but that is a personal choice. Have fun with this and enjoy the odd and interesting things you'll read.
Lastly, know that the efforts of every volunteer are appreciated now, and will continue to add value in the future when these are eventually turned over to the NNP.
What is the NNP?
NNP = "Newman Numismatic Portal." This is an educational project supported by the Eric P. Newman Educational Foundation and Washington University in St. Louis. The goal is a permanent repository for all American numismatic publications, with open access to anyone without charge. (However, there are restrictions on copyrighted material.)
[I was one of the "founding fathers" so to speak.]
This afternoon the 21 transcription volunteers were sent several hundred documents. These are in larger batches than in December and some of the documents include large tables, or complex text. There are also a few typed documents that failed OCR.
The number of files sent to each volunteer depends largely on total file sizes, since many email systems do not accept attachments larger than 10 meg total.
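To illustrate the size constraint, here is a small Python sketch of one way to split a set of files into emails that each stay under a limit. It is an assumption about how the splitting could be done, not a description of how the batches were actually put together; `batch_by_size` and the file names are hypothetical.

```python
def batch_by_size(files, limit):
    """Group (name, size) pairs, in order, so each batch totals at most `limit`.

    A single file larger than the limit still gets a batch of its own.
    """
    batches, current, total = [], [], 0
    for name, size in files:
        # Start a new batch when adding this file would exceed the limit.
        if current and total + size > limit:
            batches.append(current)
            current, total = [], 0
        current.append(name)
        total += size
    if current:
        batches.append(current)
    return batches


if __name__ == "__main__":
    MB = 1024 * 1024
    scans = [("letter-001.pdf", 4 * MB), ("letter-002.pdf", 5 * MB),
             ("table-003.pdf", 3 * MB)]
    for batch in batch_by_size(scans, 10 * MB):
        print(batch)
```

The sketch keeps files in their original order, which keeps related documents together at the cost of sometimes using more emails than strictly necessary.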
Document dates range from 1790 to 1897, and handwriting from clear to moderately crabby. (I'll save the true chicken scratching for later.)
Apparently I was voted off the island and nobody told me. Just laughing at me as I wander around aimlessly trying to figure out what is going on and what to do next.
Thanks Roger and Happy New Year.
"Inspiration exists, but it has to find you working" Pablo Picasso
Read previous posts, not sure, is this a table or a list?
It's in rows and columns. Is this what needs Word table format, or will entering text without Word format be OK?
I had a table, and was asked to put it in a formal table copied and pasted from Excel.
The example is clearest as a table. (An Excel or Word table makes the data spacing regular and easy to read. From a strict data search perspective, it doesn't matter.)
Lance -- you have mail -- lots of it. You have all of E-235-vol-082. No one has been "voted off the island."
(Maybe I should be voted off -- I sent the files to the wrong email address. Someone is in for a surprise!)
mail received; lots of it.
gtratzi señor
"phew! mail received; lots of it.
gtratzi señor"
It is you and the other volunteers who should be thanked!
BTW - I mentioned "chicken scratching" --- Here's an example pulled from one of Rep. Alexander Stephens' clearer letters regarding the Goloid alloy tests.
PS: There are worse....
It clearly says, "Silver + copper to which his busy beautiful as a nueui until Stephan Euliod - which consists of this following of the following verbalism of couple of the action follows.... "
for the chicken scratch, I'd take an attempt at sending a single large file. perhaps more resolution will help.
Most of the docs I received were pretty good....except for this one. Thankfully not much text, but I can't make heads or tails of the first two words!
I've got:
"______???_____ of Coinage Dies for the calendar year 1894, the obverse of which were this day destroyed by the Coiner of the Mint, in presence of the Superintendent and Assayer, and together with the reverses thereof, sent to the Philadelphia Mint."
Can anyone squint, and divine any meaning for those first few words?
"Process verbal of Coinage Dies....."
It's old mint-speak for "Process according to verbal instructions..."
Interestingly, that's how a lot of the patterns from the 1870s-80s were created: verbal instructions.
Son of a gun....that's exactly what I had, but figured it couldn't be correct!
on searchable terms, like double eagle, do you expand abbreviations from D. Eagle to [sic: ] Double Eagle ?
also, considering NARA's approach for scanning or photographing documents, should a document be marked "declassified," would it also be necessary to include the full text of the "declassified" statement on the page?
Excellent questions!
I usually violate the "no changes" rule when the original has only "D.E." or "D. Eagle" in it, and make at least one the full "Double Eagle." This makes it easier for search programs to find the phrase. (The archivists are likely issuing "condemnation bulls" against me for this...) The goal is clarity of content so researchers can find what they want.
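For volunteers who like to script things, here is a minimal Python sketch of that "expand at least one" idea. It is an illustration under assumptions, not a project rule: the function name and the two patterns are made up, and it simply spells out the first occurrence of each abbreviated form, leaving later ones as written.

```python
import re

# Hypothetical patterns; only the two forms mentioned above are covered.
ABBREVIATIONS = {
    r"\bD\.\s?E\.": "Double Eagle",
    r"\bD\.\s?Eagle": "Double Eagle",
}


def expand_first(text: str) -> str:
    """Spell out the first occurrence of each abbreviated form."""
    for pattern, full in ABBREVIATIONS.items():
        # count=1 leaves every later occurrence exactly as in the original.
        text = re.sub(pattern, full, text, count=1)
    return text


if __name__ == "__main__":
    print(expand_first("Sent 500 D.E. and 20 D.E. today."))
```

One expansion per document is enough for a search on the full phrase to land on the page, while the rest of the text stays faithful to the original.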
The "declassified" and other NARA-related phrases are not needed in transcriptions. Their rules apply to photocopies or photographs. However, when the word "Private" or something similar is written within the document, then it should be included at the top left, since this was part of the original communication.
I don't condemn on this. As mentioned earlier, the original document is included above the text in the PDF for preservation. The value in the text portion is searchability. Minor words, eh. It could be argued it does not matter either way.
Agreed. The academically important factors are: 1) unadulterated original image, and 2) ability to locate desired information.
The only changes made to the image are corrections of initial acquisition artifacts (geometry, etc.), and addition of the locator code so the item can be found in the archives.
Conceptually, digitization of archival documents is less than half the solution to access. The much greater problem is inability to locate desired data. OCR is fine for modern materials and so-so for post-1880 typescript. But earlier materials have such a low OCR success rate that they can confuse more than explicate.
My opinion is that in addition to supporting scanning of documents, the Newman Numismatic Portal (NNP) should be supporting basic research and development into handwriting recognition. (I'm doing some work on this, but my real programming days were long ago and I can't devote full-time to producing a good product.) The NNP comment, of course, emphasizes how little ANA, ANS and the business end of numismatics do on this subject.
Have coin collectors always been annoying? Yes...yes they have:
January 14, 1897.
Horace Morey, Esq.
Bay City, Michigan.
In reply to your letter of the 7th instant, you are respectfully informed that the Mints have no half eagles or five dollar pieces on hand of the coinage of 1847. The only way in which you could obtain a piece of this date would probably be from some coin-dealer.
As yet no half eagles have been struck at the Mints this year, but it is probable that some of these pieces will be coined before the close of the year, when you can obtain one.
Respectfully yours,
R. E. Preston,
Director of the Mint.
well i see how it is. we are sharing transcriptions now eh? well, i will see your "annoying" and raise you "insulting."
May 10, 1897.
Overton Cade?, Esq.,
Superintendent, U. S. Mint,
New Orleans, La.
I enclose herewith four quarter dollars, coined at your mint, and I desire particularly to invite your attention to the manner in which these coins are milled and reeded.
From inspection it is perfectly clear that the imperfect milling and reeding is all due to the failure on the part of the coiner or his assistant to exercise proper supervision over the coining room as well as to the carelessness on the part of the foreman and the machinist.
I will thank you to call the attention of the coiner to this matter without delay, that he may take steps hereafter to see that no coins that are imperfectly milled and reeded such as these enclosed, be permitted to leave the mint. You will please return one dollar in currency for the coins enclosed.
Respectfully yours,
R. E. Preston,
Director of the Mint.
what you got to say about that!?
LOL...There is entertainment in this seemingly mundane correspondence.
I've gone through a few letters to a House of Representatives committee justifying the closure of the New Orleans mint....so they are about to find out how angry Director Preston REALLY is.
My personal favorite thus far:
November 26, 1895.
Herman Kretz, Esq.,
Superintendent, U.S. Mint,
Philadelphia, Pa.
In answer to your communication of the 25th instant, relative to the purchase of “toilet paper”, you are authorized to purchase the same as per request, as there seems to be no contract for the purchase of this article in your list of awards for the current fiscal year.
Very respectfully,
B.F. Butler,
Acting Director of the Mint.
obviously will be there to steal some with the cashier's help the first chance they both get ( <- sarcasm)
From The Man to the Local Man.
warranted if there was an issue.
that got me wondering, so i looked it up.
"Even though Queen Elizabeth I's godson invented one of the first flush toilets in 1596, commercially produced toilet paper didn't begin circulating until 1857. Quilted Northern, formerly Northern Tissue, advertised as late as 1935 that their toilet paper was “Splinter-Free!”"
say what!
what i wouldn't give for those, probably unc, O mint quarters, probable major and rare errors. my heart sped up during transcription.
if i read about copper reeded edge coins from 1795 being returned due to being unusual and possibly being counterfeit (from the collector's view), i'm done with this whole stinkin' project!
Transcriptions not only allow a peek at the routine and unusual things happening at the mints, but they present the opportunity to connect events occurring over weeks, months or years of time.
The New Orleans complaint is but a small part of a larger group of problems affecting this mint and which ultimately led to its closure as a coinage mint on June 30, 1909. It is possible that key materials relating to the famous 1900-O/CC silver dollar varieties will turn up. Documents presently being transcribed tell of furloughs and suspension of coinage at New Orleans in the mid-1890s.
Headquarters and individual mint files contain many requests for old coins, or for values of old coins, or for information about California fractional gold. It appears that most were answered with an individually prepared "form letter" reply.
2 separate pdfs for a single 2 page document.
make 1 text file or 2 text files?