The Wellcome Digital Library pilot has been underway for 18 months with 6 months to go before we launch the new Library website. This will provide access to a wide range of digital content related to the Foundations of Genetics theme. All of the work done so far has been behind the scenes: digitising content, procuring and developing our digital library systems, and designing a new website. We are looking forward to displaying the product of all this work to the public – but we’re not quite there yet!
So where are we now and what will we be doing in 2012? Here is a snapshot of progress so far. Further details on some of these projects can be found on this blog, and we will continue to explain our activities in more detail in future posts.
Archives: With our in-house team of two photographers, we have digitised around 380,000 pages from the collections of Crick, Mourant, Medawar, Sanger, Wyatt, Grueneberg, and the Blood Group Unit. We have just started the Eugenics Society collection, which will carry on throughout the spring and summer.
Genetics Books: This project has just begun, with up to 2,000 books to be digitised this spring by an external supplier, Bespoke Archive Digitisation, working on-site.
MOH reports: A successful JISC funding bid meant we could add the Greater London Medical Officer of Health reports to the pilot project. Conservation is underway, and digitisation will begin in a few months’ time.
ProQuest: We have partnered with ProQuest to digitise our pre-1700 printed books for Early European Books online, with over 1,000 books now digitised and around 13,000 to go. Those with subscriptions and anyone in the UK can view our first 400 books on the EEB website with more to come shortly (search for “Wellcome”).
External content: We have had the first delivery from one of our external partners, Cold Spring Harbor Laboratory, including correspondence from the James Watson archive. This adds around 50,000 images to our digital archive collections, with more to come throughout 2012 and early 2013 from all partners.
Copyright and sensitivity: Hand in hand with digitisation, we are assessing our content for sensitivity and copyright issues where necessary. Sensitive items (containing certain types of private information as defined by the Data Protection Act) are identified and flagged as unsuitable for online dissemination. Copyright clearance of in-copyright works is underway with the help of the Authors’ Licensing and Collecting Society, and the Publishers’ Licensing Society.
Digital Asset Management & Storage: Safety Deposit Box 4.1, our digital asset management system, was extended to provide extra functionality for large sets of digital assets in 2011. This system is now in production. Our storage system, Pillar, now includes a Write Once Read Many (WORM) backup drive to ensure that our files are secure in the long term.
Workflow system: We procured Goobi (Intranda Version) with bespoke modifications in 2011 to act as a workflow system, enabling us to track project progress and to automate a number of activities (including ingest of content into Safety Deposit Box). It has recently gone into production use, particularly for the Genetics Books digitisation project. Soon we will be using Goobi for all digitisation projects, and to ingest our backlog of images.
JPEG 2000: We now archive all our images in the JPEG 2000 (Part 1) format, and have an automated batching process set up with LuraWave. Soon, we will be implementing JPEG 2000 validation as part of this process to ensure all JPEG 2000s meet the correct standards before ingest.
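To give a flavour of what a pre-ingest check can look at, here is a minimal Python sketch that tests only the first two boxes of a file: the fixed 12-byte JP2 signature box and the File Type box with the brand "jp2 ". It is an illustration under our own assumptions, not the validation we will actually deploy; the function name looks_like_jp2 is made up for this example, and a real validator would go much further (image header, codestream, profile settings).

```python
import struct

# The fixed 12-byte signature box that opens every JP2 file.
JP2_SIGNATURE = bytes.fromhex("0000000C6A5020200D0A870A")


def looks_like_jp2(data: bytes) -> bool:
    """Shallow header check (hypothetical helper, not a full validator).

    Returns True when the data starts with the JP2 signature box and is
    immediately followed by a File Type ('ftyp') box whose brand is 'jp2 '.
    """
    if len(data) < 24 or data[:12] != JP2_SIGNATURE:
        return False
    # Box header: 4-byte big-endian length, then a 4-byte type code.
    _box_length, box_type = struct.unpack(">I4s", data[12:20])
    if box_type != b"ftyp":
        return False
    # The brand field is the first 4 bytes of the File Type box contents.
    return data[20:24] == b"jp2 "
```

In a batch process, a function like this could be run over the first 24 bytes of each file to weed out obviously wrong formats before a fuller validation step.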
Digital delivery: A new digital delivery system is currently under development that will interoperate with Safety Deposit Box and our new website content management system, Alterian CM7. We have commissioned CM7 developers Digirati to carry out this development, which will be completed at the end of the summer. So far they have produced a proof-of-concept system that demonstrates an end-to-end sequence from retrieval of images from Safety Deposit Box using METS files created by Goobi, to displaying images online. They are adapting Seadragon, the Microsoft viewer used by several other digital libraries, to meet our specific needs and design criteria.
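For readers unfamiliar with METS, the sketch below shows roughly how a delivery system could read a METS file to find out which image files make up an item. It uses only the Python standard library and the standard METS and XLink namespaces. Everything else is assumed for illustration: the function name, the file IDs and the paths in the sample are invented, and the METS that Goobi produces for us will be richer than this.

```python
import xml.etree.ElementTree as ET

# Standard namespaces for METS and XLink.
NS = {
    "mets": "http://www.loc.gov/METS/",
    "xlink": "http://www.w3.org/1999/xlink",
}
XLINK_HREF = "{http://www.w3.org/1999/xlink}href"


def list_image_files(mets_xml: str) -> list[tuple[str, str]]:
    """Return (file ID, location) pairs from the METS file section.

    Hypothetical helper: it reads only mets:file / mets:FLocat entries
    and ignores structural maps and descriptive metadata.
    """
    root = ET.fromstring(mets_xml)
    results = []
    for file_el in root.iterfind(".//mets:fileSec//mets:file", NS):
        flocat = file_el.find("mets:FLocat", NS)
        if flocat is None:
            continue
        href = flocat.get(XLINK_HREF)
        if href:
            results.append((file_el.get("ID", ""), href))
    return results


# Invented sample document; IDs and paths are placeholders.
SAMPLE = """<mets:mets xmlns:mets="http://www.loc.gov/METS/"
                       xmlns:xlink="http://www.w3.org/1999/xlink">
  <mets:fileSec>
    <mets:fileGrp USE="OBJECTS">
      <mets:file ID="FILE_0001" MIMETYPE="image/jp2">
        <mets:FLocat LOCTYPE="URL" xlink:href="objects/page_0001.jp2"/>
      </mets:file>
      <mets:file ID="FILE_0002" MIMETYPE="image/jp2">
        <mets:FLocat LOCTYPE="URL" xlink:href="objects/page_0002.jp2"/>
      </mets:file>
    </mets:fileGrp>
  </mets:fileSec>
</mets:mets>"""

for file_id, location in list_image_files(SAMPLE):
    print(file_id, location)
```

The list of locations is the piece a viewer needs in order to request each page image in sequence.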
Search and discovery: We are also making changes to our single search system, Encore. This work is looking at providing better representation of archival metadata in Encore, and also options for incorporating a full-text index. The purpose is to provide access to all Library content – the catalogues as well as the digitised materials – via a single interface.
New website and user experience
User experience-led design: Last year the Library brought on board external suppliers Clearleft – user experience and web design experts – to help redesign the information architecture and visual appearance of the new website. New designs are already visible on the internal web development environment, so further user testing of a real website can soon be done.
Transferring content: The Library has carried out a full content audit of the current website, and prioritised content to carry across to the new site. The current site contains over 2,000 pages; this will be considerably reduced. The content carried across to the new site will be thoroughly edited to ensure it is up to date and consistent with the new site “style”.
Creating new content: New content will also be created once the content management system is in action, with a focus on the Foundation of Modern Genetics. This is a major part of the Library’s aim to provide interpretative content to both researchers and the “curious public”.