Evaluating Digital Texts

Quality of the scanning

Is the complete page captured? Is the image skewed or distorted? Is the image of sufficient resolution?

Quality of the OCR/text conversion

Is full text provided? What method was used to produce the text: double-keying or OCR? How accurate is the text? Are the texts marked up in TEI (Text Encoding Initiative)? Are words joined across line breaks? Are running heads preserved?
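
Two of these questions can be spot-checked with a few lines of code. The following is a minimal sketch, not an established accuracy measure: it assumes you have a short passage of the OCR output and a carefully proofread transcription of the same passage. The function names are hypothetical, and the score is Python's `difflib` similarity ratio rather than a formal character error rate.

```python
import difflib
import re


def character_accuracy(ocr_text: str, reference_text: str) -> float:
    """Return a similarity ratio between 0.0 and 1.0 (1.0 means identical).

    This is difflib's ratio, used here as a rough proxy for OCR accuracy.
    """
    matcher = difflib.SequenceMatcher(None, ocr_text, reference_text, autojunk=False)
    return matcher.ratio()


def rejoin_hyphenated(text: str) -> str:
    """Join words that were split by a hyphen at the end of a line."""
    # Assumption: every line-final hyphen marks a split word. A genuine
    # compound that happens to break at the hyphen will be joined as well.
    return re.sub(r"(\w)-\n(\w)", r"\1\2", text)


if __name__ == "__main__":
    print(character_accuracy("tbe cat", "the cat"))
    print(rejoin_hyphenated("digi-\ntal text"))
```

A low ratio on a small sample suggests the full text may need correction before analysis. The de-hyphenation step is deliberately naive, so its output should be checked against the page images.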

Quality of the metadata

Is the bibliographic information accurate? Is it clear what edition you are looking at? If there are multiple volumes, do you know which volume you are getting and how to locate the other volume(s)?

Terms of use

What are you legally able to do with the digitized work? Can you download the full text and use tools to analyze it? Is the content freely and openly available, or do you have to pay for use?

Convenience

Can you easily download the text and store it in your own collection? How much work do you have to do to convert the text into a format appropriate for use with text analysis tools? How hard is it to find the electronic text in the first place? Is there a Zotero translator for the collection?
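
As a rough gauge of the conversion work involved, here is a minimal sketch of turning a TEI-encoded file into plain text for analysis tools. It assumes the file declares the TEI namespace `http://www.tei-c.org/ns/1.0`; the function name is hypothetical, and a file without that namespace would yield an empty string here.

```python
import xml.etree.ElementTree as ET

# Assumption: the document uses the TEI namespace shown here.
TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}


def tei_to_plain_text(tei_xml: str) -> str:
    """Return the contents of the TEI <text> element as one plain-text string."""
    root = ET.fromstring(tei_xml)
    text_element = root.find(".//tei:text", TEI_NS)
    if text_element is None:
        return ""
    # Concatenate all text nodes, dropping the markup, then collapse whitespace.
    raw = "".join(text_element.itertext())
    return " ".join(raw.split())
```

This discards the header metadata and all structural markup, and it collapses line and paragraph breaks into single spaces. Whether that loss is acceptable depends on the analysis you have in mind.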

Reputation

Is the digital archive well-regarded in the scholarly community? If you cited the archive in your bibliography, would fellow researchers question your decision? Does the archive provide clear information about its process for selecting, digitizing, and preserving texts?

From Lisa Spiro, "Evaluating the Quality of Electronic Texts," Digital Scholarship in the Humanities, May 2008.

Unless otherwise stated, the content of this page is licensed under the Creative Commons Attribution-ShareAlike 3.0 License.