Decapod Training Module 1 - Digitization Workflows

1. Introduction

Before starting on a digitization project, it is important to consider how to manage the books being digitized and how to structure the work.

"Books" or "Materials"

In this guide the terms "books" and "materials" refer to any item to be digitized.

  • Personnel
    • Who will be working on the project?
    • Who will manage and oversee the process?
  • Physical storage and retrieval of books
    • Where will the books be located?
    • How will books be transferred to the digitization equipment?
    • Is preservation a priority?
    • Are the books fragile?
  • Equipment used for digitization
    • What method of digitization will be used?
    • What are the budget and requirements?
  • Cataloguing and tracking books
    • How will books be identified so it's clear what is being digitized?
    • How will work be tracked?
    • How should books be described so that the description is useful to both administration and users?
  • Quality assurance
    • What is the goal of the digitization effort? Preservation, archiving, web distribution, print? Do the results match this goal?
  • Digital storage and distributing digital copies of books
    • Where will the digital files be stored and served to users?
    • Is local backup a necessity?
    • How much storage is needed?

Organizing all of this into a clear statement will help improve quality, efficiency, and usefulness of the digitization effort. To do this we should create a "Digitization Workflow".

For the section above, the text will be the narration and each bullet point will be an illustration or image.

Example: "Are the books fragile?" will show a fragile book.

2. Basic digitization workflow

This section has dialog and text. The dialog is spoken while the text is shown or illustrated visually.

A basic digitization workflow consists of multiple steps with a start and an end.

State a goal:

  • State clearly which books are going to be digitized and what the output will be. This will help focus the workflow.

Dialog: "Stating a goal of a digitization project will help keep the work focused. The workflow's objective is to help accomplish this goal."

Step 1: Preparation of books

  • Roles: books/subject expert, project manager
  • Work: Determine which books to digitize.

Dialog: "Before any work begins, the books to be digitized must be identified. Knowing the quantity of books will help generate estimates for time and resources required to complete the work."

Step 2: Cataloguing and Indexing (Metadata)

  • Roles: books/subject expert
  • Equipment: a system capable of managing metadata
  • Work: Generate metadata for materials. Ascertain copyright, permissions, and ownership.

Dialog: "A complete catalogue and inventory of the content to digitize will help organize the work and identify any copyright issues. This can be managed using anything from a simple spreadsheet to something more robust, like a digital content management system."

Step 3: Digitization

  • Roles: Digitization experts
  • Equipment: digitization software and/or hardware.
  • Work: Digitize the material using the equipment best suited for the job.

Dialog: "The equipment selected and used will be influenced by the goals of the project and the materials to be digitized. For example, are the materials sturdy enough for a flatbed scanner or will cameras be used to reduce handling?"

Step 4: Quality Assurance

  • Roles: project manager, materials expert
  • Equipment: Workstation for viewing produced digital copies
  • Work: determine quality of work done. Digitize again if required.

Dialog: "Quality assurance is where the output is matched with the goals and expectations of the project. If any images do not meet the requirements, they should be sent back to be re-digitized. Recurring quality issues could be a symptom of a larger problem and should be addressed."

Step 5: Storage

  • Roles: Systems manager / IT
  • Equipment: Storage facility for digital files
  • Work: Store digital copies and link back to metadata database.

Dialog: "Digital images can quickly consume a lot of storage space on a computer. With files ranging from 10 MB to 70 MB depending on the resolution and file format, it's possible that a digitized book will occupy a few gigabytes. Managing and safely storing these valuable files is critical and an IT or systems specialist can help develop an effective strategy.

Storing files is only part of the issue - the images also need to be associated with, or linked back to, their metadata or catalogue entry. This will be covered later in this video."
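The storage arithmetic in the dialog above can be sketched in a few lines of Python. This is only a rough estimate: the 10 MB to 70 MB range comes from the narration, while the 200-page book, the one-image-per-page assumption, and the function name are illustrative assumptions.

```python
def estimate_storage_gb(pages, mb_per_image, copies=1):
    """Rough storage estimate in gigabytes for one digitized book.

    pages        -- number of captured images (assumes one image per page)
    mb_per_image -- average file size in megabytes (10 to 70 per the narration)
    copies       -- number of copies kept, e.g. 2 for a master plus one backup
    """
    total_mb = pages * mb_per_image * copies
    return total_mb / 1000  # decimal units: 1 GB = 1000 MB


# Hypothetical 200-page book at both ends of the file-size range.
low = estimate_storage_gb(200, 10)
high = estimate_storage_gb(200, 70)
print(f"200 pages: {low:.1f} GB to {high:.1f} GB")
```

Multiplying the result by the number of books in the project, and by the number of backup copies, gives a first figure to discuss with the IT or systems specialist.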

Step 6: Distribution

  • Roles: Systems manager / IT
  • Work: Make digital copies accessible.

Dialog: "By this step, the quality of the images should satisfy the needs of the project and the files should be stored safely. Now it is time to consider how the files will be made available to those who want them. Will the files be accessed locally by staff, or will they be accessible to a broader audience through the Internet? Will printing of the images be allowed? If so, will there be multiple versions of the images to satisfy printing and web-based viewing?"

3. Example scenario: Workflow from start to finish

Introduction and Goal:

A university library is looking to start a large-scale digitization effort, but would like to run a small pilot project first. The materials to be digitized are old, fragile journals that have been damaged by flooding.

The goal is to digitize these damaged materials to preserve them for internal use. The library would also like to make the journals available online to the public eventually, but this isn't a requirement.

Step 1: Preparation of materials and storage
Due to the small scale of the operation, the librarian assigned to the project finds a secure cabinet to store all the material. All the content is organized according to the author's name.

Before any work can begin, everything needs to be indexed and recorded.

Step 2: Cataloguing and Indexing
The project manager and the librarian agree that a simple spreadsheet is sufficient to keep track of the work. In the future, it may be worth considering a larger scale database but that is beyond what is needed right now.

The librarian inputs entries into the spreadsheet for each work to be digitized.
Using the Dublin Core metadata schema, all the critical details that describe each particular work are recorded. In addition to the Dublin Core fields, they also append fields specific to their project:

  • Name of digitizer
  • Date digitization started
  • Date digitization is completed
  • Name of person doing QA
  • QA Date
  • QA Status
  • Digitization notes
  • File location
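As a rough illustration, the spreadsheet layout described above could be set up as a CSV file. The sketch below assumes the 15 elements of the basic Dublin Core element set as columns, followed by the project-specific fields from the list. The function name and the sample values (such as the identifier "J-0001") are hypothetical.

```python
import csv
import io

# The 15 elements of the basic Dublin Core element set.
DUBLIN_CORE_FIELDS = [
    "title", "creator", "subject", "description", "publisher",
    "contributor", "date", "type", "format", "identifier",
    "source", "language", "relation", "coverage", "rights",
]

# Project-specific columns, named as in the scenario.
PROJECT_FIELDS = [
    "Name of digitizer", "Date digitization started",
    "Date digitization is completed", "Name of person doing QA",
    "QA Date", "QA Status", "Digitization notes", "File location",
]

ALL_FIELDS = DUBLIN_CORE_FIELDS + PROJECT_FIELDS


def new_tracking_sheet(rows):
    """Return CSV text with one header row and one row per work.

    Columns missing from a row are left empty so they can be filled
    in as the work moves through the workflow.
    """
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=ALL_FIELDS, restval="")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Made-up sample entry for illustration only.
sheet = new_tracking_sheet([
    {"identifier": "J-0001", "title": "Example journal", "QA Status": "pending"},
])
print(sheet.splitlines()[0])
```

The resulting text can be saved as a .csv file and opened in any spreadsheet program.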

In the process of entering the data, they come across a few journals which were published more recently than the others. This content will need to be checked for copyright issues first, so these items are removed from the project until permission to digitize them is given.

Step 3: Digitization
Now that the cataloguing work has been completed, attention can turn to digitizing the material.

Since the goal is to create preservation quality digital copies, the digitization process will aim to create faithful renditions of the material.

Whether to post-process the images to remove blemishes and non-original markings was the subject of serious debate. On one hand, non-original blemishes and markings aren't part of the author's original work. On the other hand, removing them means altering the images, which breaks from the "faithful renditions" criterion. In the end, the librarian and project manager decided that it would be more efficient not to make any modifications to the digital copies at this time, and to revisit the issue when technology and resources become available.

The library already has a number of flatbed scanners which can be used. Since some of the material is large or fragile, cameras were purchased so that the material won't get damaged on a flatbed scanner (by flattening or breaking the spine). The cameras are high resolution and mounted on tripods. Extra training will be required to ensure good results, since there are more factors to consider when using cameras (see Module 3 for techniques on using cameras).

To do the actual digitization, the project manager has hired two students. The students are expected to follow a set of guidelines:

  • Each student is only allowed to work on 1 item at a time.
  • They are to handle the material carefully.
  • Document any missing pages, rips, or other artifacts that affect the content area.
  • For small to medium sized sturdy material, digitize with a flatbed scanner.
  • For medium to larger sized material, and fragile material, use a camera.

To begin, one of the students goes to the librarian for a journal to digitize. The librarian chooses a journal, looks up the item's ID in the spreadsheet, and puts the student's name down as the person doing the digitization.

The student then goes to a workstation with a scanner or camera and starts to capture each page from cover to cover. As the student works, they take care not to damage the material and to make notes of existing damage or blemishes that affect the original content.

Once all the pages have been captured, the files are transferred over the network to the library's shared storage. In the process, the files are renamed to match the Dublin Core ID of the original book so that they can be easily associated with it. Also, all files belonging to a single work are put into a compressed archive file so that nothing is accidentally lost. The archive is also named to match the Dublin Core ID.
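One possible way to carry out the renaming and archiving step is sketched below in Python. It is not the library's actual procedure: the function name, the page-numbering pattern, the ZIP format, and the identifier "J-0001" are all assumptions. The sketch also assumes that the capture software names files so that sorting them by name gives page order, and that the archive is written to a different folder than the scans.

```python
import tempfile
import zipfile
from pathlib import Path


def package_work(scan_dir, work_id, dest_dir):
    """Zip all captured pages of one work, naming each after the work's ID.

    Pages are numbered in the sorted order of their original file names.
    The original files on disk are left untouched; the new names are
    applied inside the archive.
    """
    scan_dir, dest_dir = Path(scan_dir), Path(dest_dir)
    dest_dir.mkdir(parents=True, exist_ok=True)
    archive_path = dest_dir / f"{work_id}.zip"

    pages = sorted(p for p in scan_dir.iterdir() if p.is_file())
    with zipfile.ZipFile(archive_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for number, page in enumerate(pages, start=1):
            new_name = f"{work_id}_{number:04d}{page.suffix.lower()}"
            archive.write(page, arcname=new_name)
    return archive_path


# Demonstration on throwaway placeholder files.
with tempfile.TemporaryDirectory() as tmp:
    scans = Path(tmp) / "scans"
    scans.mkdir()
    for name in ("scan_002.tif", "scan_001.tif"):
        (scans / name).write_bytes(b"placeholder image data")
    archive_path = package_work(scans, "J-0001", Path(tmp) / "outgoing")
    with zipfile.ZipFile(archive_path) as archive:
        archived_names = archive.namelist()

print(archived_names)
```

Keeping the ID in both the archive name and every file name means a stray file can still be traced back to its spreadsheet entry.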

The student is now done, returns the material to the librarian, and then proceeds to acquire another book to digitize.

Step 4: Quality Assurance

Since the goal of the project is to preserve the material, the success of the project is dependent upon the quality of the work. As each journal is completed, the project manager checks the results to ensure that they meet the standards set out at the start of the project.

If any of the images do not meet the requirements, the individual image is flagged and is re-digitized.

Once all the images appear to be satisfactory, the project manager approves the results and the work for that book is complete.
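The bookkeeping side of this step can be sketched as follows. The sketch only records the reviewer's verdicts; how the project manager judges each image is outside its scope. The function name, the status labels, and the file names are hypothetical.

```python
def review_work(image_checks):
    """Summarize QA for one work.

    image_checks maps each image file name to True (meets the project's
    requirements) or False (does not). Returns a status for the
    spreadsheet's "QA Status" column and the images to re-digitize.
    """
    flagged = sorted(name for name, ok in image_checks.items() if not ok)
    status = "approved" if not flagged else "re-digitize"
    return status, flagged


# Made-up verdicts for a three-page work.
status, flagged = review_work({
    "J-0001_0001.tif": True,
    "J-0001_0002.tif": False,
    "J-0001_0003.tif": True,
})
print(status, flagged)
```

Only the flagged images go back to the digitizer. The work is approved once a later review returns an empty list.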

Step 5: Storage & Distribution

Once the archive has been transferred to the central storage, the files are extracted from the archive and backed up, and the associated database entry (in this case the master spreadsheet) is updated to indicate their location on the network.
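A minimal sketch of this step, assuming the archive is a ZIP file named after the work's ID and that the spreadsheet row is available as a Python dictionary. The folder layout, the function name, and the sample identifier are assumptions. A real backup would go to separate hardware rather than a neighbouring folder.

```python
import shutil
import tempfile
import zipfile
from pathlib import Path


def store_work(archive_path, storage_root, backup_root, record):
    """Extract an archive into central storage, back it up, update its record.

    record is the work's row from the tracking spreadsheet, held as a dict.
    """
    archive_path = Path(archive_path)
    work_id = archive_path.stem  # archive is assumed to be named after the ID
    work_dir = Path(storage_root) / work_id
    work_dir.mkdir(parents=True, exist_ok=True)

    with zipfile.ZipFile(archive_path) as archive:
        archive.extractall(work_dir)

    # Second copy of the extracted files (requires Python 3.8 or later).
    shutil.copytree(work_dir, Path(backup_root) / work_id, dirs_exist_ok=True)

    record["File location"] = str(work_dir)
    return record


# Demonstration on a throwaway archive with one placeholder file.
with tempfile.TemporaryDirectory() as tmp:
    tmp = Path(tmp)
    demo_archive = tmp / "J-0001.zip"
    with zipfile.ZipFile(demo_archive, "w") as archive:
        archive.writestr("J-0001_0001.tif", b"placeholder image data")
    record = store_work(demo_archive, tmp / "storage", tmp / "backup",
                        {"identifier": "J-0001", "File location": ""})
    stored = sorted(p.name for p in (tmp / "storage" / "J-0001").iterdir())
    backed_up = sorted(p.name for p in (tmp / "backup" / "J-0001").iterdir())

print(stored, backed_up)
```

Updating the record last means the spreadsheet only points at a location once the files and their backup are actually in place.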

With the files secure and accessible, the digitization work for this particular journal is considered complete.

Printed Resources:

  1. A guide referencing additional materials related to workflows
  2. A guide referencing software tools
  3. A guide referencing hardware / digitization equipment
  4. A guide for metadata
  5. A guide for librarians (copyrights, metadata, cataloging, etc.)