How do you digitise an entire library? I’ve been asking members of our Digitisation team to tell me about their part in the digitisation process.
Name: Rioghnach Ahern
What is your job title?
Digital Ingest Coordinator
And what does that involve?
In one sentence, it involves coordinating and overseeing the ingest of digital content from all digitisation projects and making sure that content is ready for delivery to the Wellcome Library website.
Once items have been digitised, I monitor the throughput of each digital collection. For material digitised by the Internet Archive scanning unit, such as the UK Medical Heritage Library (UKMHL) collection, I mainly monitor automated processes.
We harvest metadata from the Internet Archive for the UKMHL project a month after the images have been uploaded to their site. This gives us a little wiggle room, so that we don’t inherit errors, imaging issues and the like.
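To make the one-month delay concrete, here is a minimal Python sketch of the selection rule. It is not Goobi’s actual logic: it assumes a mapping of (hypothetical) Internet Archive identifiers to upload dates, and it treats “a month” as 30 days.

```python
from datetime import date, timedelta

# Assumption: "a month" is modelled as a fixed 30-day grace period.
HARVEST_DELAY = timedelta(days=30)


def ready_to_harvest(uploads, today):
    """Return identifiers of items uploaded at least HARVEST_DELAY ago.

    `uploads` is a hypothetical mapping of Internet Archive identifier
    to the date the images were uploaded.
    """
    cutoff = today - HARVEST_DELAY
    return sorted(ident for ident, uploaded in uploads.items() if uploaded <= cutoff)


uploads = {
    "book_a": date(2015, 1, 5),   # uploaded well over a month ago
    "book_b": date(2015, 2, 20),  # still inside the grace period
}
print(ready_to_harvest(uploads, today=date(2015, 3, 1)))  # ['book_a']
```

Anything still inside the grace period is simply picked up on a later run, which is what leaves time for imaging problems to be fixed at source.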
Once the metadata is harvested, each book goes through an automatic import step in Goobi, our workflow tracking system. We download the metadata (a MARC XML file), the Scandata (which provides pagination and structural information for the book, enabling better navigation for the user), the OCR for full text (a gzipped ABBYY file) and the image files (JPEG2000 images).
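The four downloads for each book could be listed along these lines. This is only a sketch: `archive.org/download/<identifier>/<filename>` is the Internet Archive’s public download path, but the file-name suffixes below are assumptions about how scanned-book files are named, and `download_urls` is a hypothetical helper, not part of Goobi.

```python
# Assumed Internet Archive file-name suffixes for a scanned book;
# check them against the item's actual file listing.
FILE_SUFFIXES = {
    "metadata": "_marc.xml",     # MARC XML record
    "scandata": "_scandata.xml", # pagination and structure
    "ocr": "_abbyy.gz",          # gzipped ABBYY OCR output
    "images": "_jp2.zip",        # JPEG2000 page images
}


def download_urls(identifier):
    """Map each file type to its download URL for one (hypothetical) item."""
    base = f"https://archive.org/download/{identifier}/"
    return {kind: base + identifier + suffix for kind, suffix in FILE_SUFFIXES.items()}


print(download_urls("b12345678")["ocr"])
# https://archive.org/download/b12345678/b12345678_abbyy.gz
```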
We then unpack this data and create ALTO files so that our users can search within a book. Each book then goes through automatic METS (Metadata Encoding & Transmission Standard) editing.
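As an illustration of what an ALTO file holds, the sketch below turns a list of OCR words with pixel coordinates into a minimal ALTO document (v2 namespace) using Python’s standard library. It assumes the words have already been extracted from the ABBYY output, uses a hypothetical tuple format for them, and has not been validated against the ALTO schema. Our real conversion may work quite differently.

```python
import xml.etree.ElementTree as ET

ALTO_NS = "http://www.loc.gov/standards/alto/ns-v2#"
ET.register_namespace("", ALTO_NS)  # serialise ALTO as the default namespace


def q(tag):
    """Qualify a tag name with the ALTO namespace."""
    return f"{{{ALTO_NS}}}{tag}"


def bbox_attrs(x0, y0, x1, y1):
    """ALTO position attributes for a rectangle given by two corners."""
    return {"HPOS": str(x0), "VPOS": str(y0),
            "WIDTH": str(x1 - x0), "HEIGHT": str(y1 - y0)}


def build_alto(words, page_width, page_height):
    """Build a one-line ALTO page.

    `words` is a non-empty list of (content, hpos, vpos, width, height)
    tuples: a hypothetical stand-in for parsed ABBYY OCR.
    """
    # Bounding box of the whole line, derived from its words.
    x0 = min(x for _, x, _, _, _ in words)
    y0 = min(y for _, _, y, _, _ in words)
    x1 = max(x + w for _, x, _, w, _ in words)
    y1 = max(y + h for _, _, y, _, h in words)

    alto = ET.Element(q("alto"))
    layout = ET.SubElement(alto, q("Layout"))
    page = ET.SubElement(layout, q("Page"), ID="P1", PHYSICAL_IMG_NR="1",
                         WIDTH=str(page_width), HEIGHT=str(page_height))
    space = ET.SubElement(page, q("PrintSpace"),
                          **bbox_attrs(0, 0, page_width, page_height))
    block = ET.SubElement(space, q("TextBlock"), ID="B1",
                          **bbox_attrs(x0, y0, x1, y1))
    line = ET.SubElement(block, q("TextLine"), ID="L1",
                         **bbox_attrs(x0, y0, x1, y1))
    # One String element per word, carrying its text and position.
    for i, (content, x, y, w, h) in enumerate(words, start=1):
        ET.SubElement(line, q("String"), ID=f"S{i}", CONTENT=content,
                      **bbox_attrs(x, y, x + w, y + h))
    return ET.tostring(alto, encoding="unicode")


print(build_alto([("Haldane", 10, 20, 80, 15), ("cats", 100, 20, 40, 15)], 1000, 1500))
```

Each `String` element records a word together with its position on the page image, which is what allows a search hit to be highlighted in the right place on the page.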
The human tasks are monitoring throughput, troubleshooting issues, stopping plugins for maintenance, liaising with staff at the Internet Archive about imaging issues and coordinating with the metadata coordinator on metadata queries.
Although it’s automated, we’re able to switch the tap on and off, as it were, so that helps! It also means we can prioritise other work coming through from other projects or partners.
For other collections, such as the Mental Healthcare archives, the process can be more complicated:
- Overseeing the ingest of in-house archives, manuscripts, etc. and troubleshooting where required.
- The partner institutions can now transfer their images to us via FTP. This has streamlined our work considerably, as image delivery used to be a very time-consuming task.
- METS editing for this workstream is manual, as we need to restrict some content for data protection or copyright reasons.
It sounds complicated…
Not really, it’s quite mechanical. Attention to detail is a must for this kind of work. It helps that I’m very pedantic!
I always wondered why you have two screens on your desk…
I’m in Goobi working on a specific item or workflow on one screen and I have the Intranda task manager on the other to give me a bird’s eye view of all the items processing through our systems.
What’s the best / most interesting thing about your work?
Detective work figuring out when things get stuck in the workflow. I like figuring out what’s gone wrong and finding solutions to the problems.
When you complete a collection and everything is finally available online – I find that very satisfying.
What’s the most frustrating thing about the job?
It’s the usual perils of relying on several complex IT systems. There are days when it feels like nothing is working! There may be unexpected backlogs of tasks because we’ve inherited issues to do with incomplete metadata or validation errors from other systems.
Finally, what’s your favourite discovery in the digitised collections?
Haldane’s cats, which I came across in the pilot phase of the Codebreakers: Makers of Modern Genetics archive digitisation.
I came across some letters written to J B S Haldane asking if people could adopt the cats that he bred. He wrote an advertisement in the Daily Worker looking for people to take them. This is my favourite letter, written by Fiona Bradbury, aged 4 and a half, asking for a black tom kitten:
It’s so sweet and totally unexpected in the middle of dry scientific articles!
Thanks Rioghnach – now I know what a digital workflow looks like!