This summer we waved goodbye to the last of the nine partners who travelled with us on a short but intensive 20-month journey to digitise 15 million pages of 19th-century health and medical culture. These 40,000 books and 30,000 pamphlets make up the UK Medical Heritage Library (UKMHL) and add a wealth of UK-based material to the constantly growing Medical Heritage Library, covering a huge range of topics related to human and animal health and well-being.
The project was a collaboration between a diverse set of partners including:
- 2 funders: £3.8 million split between Jisc/HEFCE (£2.4m) and Wellcome (£1.4m).
- 1 co-design partner organisation: Research Libraries UK
- Many expert advisors: drawn from an extensive list of medical historians and information professionals
- 10 content owners: 6 university libraries, 3 medical society libraries, and Wellcome Library
- 4 service partners: Internet Archive’s scanning service, Intranda GmbH, Sero Consulting, and Gooii Ltd
- 4 content hosts: Internet Archive, Wellcome Library, Jisc Historical Texts* and the Digital Public Library of America (which picks up the content hosted on the Internet Archive)
- 1 international partner: the North American-based Medical Heritage Library consortium
We didn’t simply aim to make a lot of material available online. We also sought to develop efficient workflows for creating, managing, and providing access to large numbers of digital items – far larger than anything we had attempted before. The logistics alone illustrate the scale of the effort needed to get through some 3,600 titles per month:
- 16 workstations for Internet Archive’s scanning unit on-site at Wellcome
- 17 book trolleys for storing and moving books around the scanning centre
- 120 metres of shelving to hold books being digitised
- 75 bookends to do what bookends do
- 210 crates for shipping, each holding around 50 volumes
- 42 round-trip shipments to move the books to and from the Wellcome
After the physical items were digitised, processed and quality controlled, the files resided on the Internet Archive’s servers (alongside millions of other digitised texts). In order to add the digitised content – JPEG2000 images, OCR text and metadata – to the Wellcome digital library, we needed a way to harvest the files and ingest them into our repository, all at a rate of several thousand titles each month. Otherwise we could (and once or twice did) end up with a huge backlog of content.
Our development partners at Intranda helped us to fully automate this process. The system routinely picks up newly digitised titles, downloads the files we need, processes them and ingests them into the digital library, keeping us in step with the pace of digitisation.
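To make the shape of such a harvest loop concrete, here is a minimal sketch in Python. It is not Intranda's implementation. It assumes the Internet Archive's public Metadata API (`https://archive.org/metadata/<identifier>`), which returns JSON with a `files` list, and it assumes hypothetical file-name suffixes for the JPEG2000, OCR and metadata derivatives. The list of candidate identifiers (for example from a search of the scanning centre's output) and the `ingest` step are placeholders passed in as arguments, which also lets the loop run without network access.

```python
import json
import urllib.request
from urllib.parse import quote

# Assumption: illustrative suffixes for the derivatives a harvester would want
# (JPEG2000 page images, OCR output, item metadata). Adjust to the real files.
WANTED_SUFFIXES = ("_jp2.zip", "_abbyy.gz", "_meta.xml")


def fetch_metadata(identifier: str) -> dict:
    """Fetch item metadata from the Internet Archive Metadata API."""
    url = f"https://archive.org/metadata/{quote(identifier)}"
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)


def download_urls(identifier: str, metadata: dict,
                  suffixes: tuple[str, ...] = WANTED_SUFFIXES) -> list[str]:
    """Return download URLs for the files of one item that match `suffixes`."""
    urls = []
    for entry in metadata.get("files", []):
        name = entry.get("name", "")
        if name.endswith(suffixes):
            urls.append(f"https://archive.org/download/{quote(identifier)}/{quote(name)}")
    return sorted(urls)


def new_identifiers(candidates: list[str], already_ingested: set[str]) -> list[str]:
    """Pick out titles that have been digitised but not yet ingested."""
    return [i for i in candidates if i not in already_ingested]


def harvest_batch(candidates, already_ingested, fetch, ingest):
    """Run one pass of a hypothetical harvest loop.

    fetch(identifier) -> dict and ingest(identifier, urls) are injected, so
    the same loop can use fetch_metadata above or a stub in tests.
    """
    done = []
    for identifier in new_identifiers(candidates, already_ingested):
        urls = download_urls(identifier, fetch(identifier))
        if urls:  # skip items whose derivatives are not ready yet
            ingest(identifier, urls)
            already_ingested.add(identifier)
            done.append(identifier)
    return done
```

Running `harvest_batch` on a schedule, and recording what has already been ingested, is what keeps such a pipeline from accumulating a backlog. Items whose files are not yet available are simply retried on the next pass.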
Once the content is in the digital library, we can make the full text available for searching in our Library catalogue, provide deep-zoom viewing via the Universal Viewer, and offer a wide range of download and embed options using the full suite of IIIF API standards.
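Deep-zoom viewing rests on the IIIF Image API, whose request URLs follow the pattern `{base}/{identifier}/{region}/{size}/{rotation}/{quality}.{format}`. The Python sketch below builds such URLs and enumerates the tile requests a deep-zoom viewer might make, following the tile calculation described in the IIIF Image API 2.x implementation notes. The base URL and identifiers are hypothetical. In practice, a viewer reads the tile size and scale factors from the image's `info.json` rather than using the defaults assumed here.

```python
import math
from urllib.parse import quote


def image_url(base: str, identifier: str, region: str = "full", size: str = "full",
              rotation: str = "0", quality: str = "default", fmt: str = "jpg") -> str:
    """Build an IIIF Image API request URL.

    The identifier is percent-encoded, including any slashes, as the spec requires.
    """
    ident = quote(identifier, safe="")
    return f"{base.rstrip('/')}/{ident}/{region}/{size}/{rotation}/{quality}.{fmt}"


def tile_urls(base: str, identifier: str, width: int, height: int,
              tile: int = 256, scale_factors=(1, 2, 4)):
    """Yield one tile request per tile at every scale factor.

    At scale factor s, each tile covers a (tile * s)-pixel square of the full
    image (clipped at the edges) and is delivered ceil(w / s) pixels wide.
    """
    for s in scale_factors:
        step = tile * s  # full-resolution pixels covered by one tile
        for y in range(0, height, step):
            for x in range(0, width, step):
                w = min(step, width - x)
                h = min(step, height - y)
                size = f"{math.ceil(w / s)},"  # 2.x "w," form: height follows aspect ratio
                yield image_url(base, identifier, f"{x},{y},{w},{h}", size)
```

For example, `image_url("https://iiif.example.org/image", "b12345/0001.jp2", size="!200,200")` asks for a thumbnail that fits within 200×200 pixels, while `tile_urls` shows why a viewer can zoom smoothly: it only ever fetches the small tiles visible at the current scale.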
Jisc (a project co-funder) also harvested and downloaded the digitised content for inclusion in Jisc Historical Texts (JHT). This collection will be the first open access collection available in JHT when it launches on 27 October. In addition, the site will offer user-friendly data visualisations that let users browse the huge range of topics in the corpus using the full-text index.
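Full-text search and topic browsing both rest on an index from words to the documents that contain them. The following is a minimal sketch of an inverted index in Python, with an AND search and a ranking of terms by how many documents mention them. It illustrates the general technique only and makes no claim about how JHT or the Wellcome catalogue index their content. The tokeniser and stopword list are simplifying assumptions.

```python
import re
from collections import Counter, defaultdict

TOKEN = re.compile(r"[a-z]+")  # assumption: plain lower-case alphabetic tokens


def build_index(docs: dict[str, str]) -> dict[str, set[str]]:
    """Map each lower-cased word to the set of document ids containing it."""
    index: dict[str, set[str]] = defaultdict(set)
    for doc_id, text in docs.items():
        for term in TOKEN.findall(text.lower()):
            index[term].add(doc_id)
    return index


def search(index: dict[str, set[str]], query: str) -> set[str]:
    """Return ids of documents containing every term in the query (AND search)."""
    terms = TOKEN.findall(query.lower())
    if not terms:
        return set()
    results = set(index.get(terms[0], set()))  # copy so the index is not mutated
    for term in terms[1:]:
        results &= index.get(term, set())
    return results


def top_terms(index: dict[str, set[str]], n: int = 10,
              stopwords=frozenset()) -> list[tuple[str, int]]:
    """Rank terms by document frequency, a simple starting point for a topic browser."""
    counts = Counter({t: len(ids) for t, ids in index.items() if t not in stopwords})
    return counts.most_common(n)
```

For example, with three short titles indexed, `search(index, "cholera london")` returns only the documents that mention both words. OCR text from 19th-century print would need more normalisation (historical spellings, OCR errors) than this sketch attempts.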
As we neared completion of the project, we knew we couldn’t stop just yet. With thousands of books still on the shelves in the Wellcome Library, we have continued digitising our printed collections, if a bit more sedately. The Internet Archive now digitises around 800 books each month, so you can still find new content such as The Ideal Home and its Problems, published around 1911. (If you love seeing new things every day, visit New Things to Look At for the latest digitised items in the Wellcome Library.)