45,000 pages per day, 10 source libraries, 1 scanning centre.
The UK Medical Heritage Library is one of the largest digitisation projects on the Internet Archive’s plate right now, producing an online resource for the study of the history of medicine and related sciences based on the 19th century book collections of 10 UK libraries.
The digitisation process for this ambitious project is an international collaboration, which starts with the hard copy books in the individual donor libraries across the UK and ends with the digitised books freely available on the Internet Archive website thanks to several teams of staff from Internet Archive based in London and North America.
In summer 2014 the Wellcome Library took over part of the top floor of 183 Euston Road (the Wellcome Collection building in the heart of London), demolishing several walls and creating a large open-plan room capable of housing over a dozen scanning units and thousands of books on shelves, trolleys and crates.
This space is now the centre of operations for the Internet Archive, with 14 staff members on-site unpacking, assessing, logging and digitising medical history books and pamphlets from all the UK Medical Heritage Library partners. Chris Booth, the Internet Archive’s Regional Digitization Manager, based at the Euston Centre, says “The Internet Archive digitises books at dozens of locations worldwide but the Euston Scan Center is indeed the largest such operation so far.”
They currently have collections from four partners on the go – University College London, London School of Hygiene & Tropical Medicine, Glasgow University Library, and the Wellcome Library. A further six partners will start sending books over the next few months, and digitisation will continue into Spring 2016.
The work begins with the hard copy books in their home libraries. Partner libraries select the books and check condition, size, and suitability for digitisation. They provide accurate inventories and information on special handling requirements where necessary, and carry out any necessary repairs or preparations, such as splitting any pesky uncut pages and marking the start and end of books or pamphlets that are bound together. They carefully pack the books into large crates for shipment – and may prepare anything from 4 to 15 crates in a single shipment.
The crates are sent to the Euston Scan Centre, where Internet Archive, who have decades of experience in dealing with high-throughput digitisation from multiple partners, take over. Once the IA team receives a shipment, they check everything against the inventory and make a record of the packing methodology so they can replicate it for the return trip. They also assess condition to be absolutely sure the books can withstand the rigours of digitisation. Foldouts are noted and marked so they can be digitised separately.
According to Chris Booth, “many Internet Archive staff here have a personal interest in the book as an object – some have completed book binding courses, others have spent a large portion of their education or career working directly with unique and historic manuscripts – so there is a strong focus on correct book handling and preservation”.
The London-based scanning staff are then allocated books to digitise. They digitise the book from cover-to-cover and load the images to the Internet Archive site. The books are scanned at an impressive rate; Chris Booth says: “Our Book Scanners aspire to digitise around 800 images per hour and although that sounds intense it really isn’t as bad as it might seem. Team Skype chats are used to circulate ‘Interesting Finds’ so as we’re performing the photography we all get to embrace the weird and wonderful of the UK Medical Heritage Library. This helps to bind the team together and maintain interest in what we are working on.” Some of these images are showcased on the Internet Archive’s Instagram account.
Once scanning is complete, quality control and post-processing are carried out almost entirely off-site by Internet Archive staff in other locations, ranging from Toronto to San Francisco. “Internet Archive projects certainly are an international collaboration with staff in three different time zones pulling together to ensure that the books we digitise are finished to a high standard. Each physical item might pass through four people’s hands in London and the digital files will be worked on by at least another two staff based in North America. Although we might never meet our colleagues across the Atlantic we treat each other as though we were all in the same room, using Skype chats to share information within the Euston Scan Center and abroad,” Chris told us.
The images are subjected to optical character recognition (OCR), which creates electronic text versions of each book, and a range of formats is created, including PDFs, ebooks, DAISY talking books, and formats for viewing with the Internet Archive’s book viewer. Within days, assuming a book is correct, the images are available for the public to view online.
The portable gymnasium : a manual of exercises, arranged for self instruction in the use of the portable gymnasium by Fr. Gustav Ernst, 1861.
Author: Dr Christy Henshaw is Digitisation Programme Manager at the Wellcome Library.