The successful long-term management of digital assets is a key concern for us as we build our digital resource. Until recently, we did not have a dedicated system in use for storing and managing master files for any digitised content, and our file backup system was not ideal for dealing with large sets of data. Files were managed via simple filesharing on dedicated storage servers. Backups were only created for some content, and even then the backups were not permanent.
When the Wellcome Digital Library was initiated it quickly became clear that we needed a dedicated system to manage our digital masters – something that was scalable, robust, and could handle all the digital formats we create or procure – including born digital material. We also needed a secure storage system with offsite, permanent backup capability. This blog post describes our digital asset management system; a future blog post will provide details of our secure storage solution.
We already had a digital asset management system in place to manage born digital archives: Preservica (formerly Safety Deposit Box, or SDB), developed by Tessella. This system incorporates a suite of tools designed to manage and preserve digital files. It provides a context in which administrative and descriptive metadata is associated with all ingested content. SDB can be combined with tools that can “migrate” files from one format to another to counter format obsolescence and therefore ensure the longevity of the data. In other words, when Word 2010 is no longer supported by Microsoft, SDB can help migrate these files to a format that is supported by the software available at that time.
However, in order to manage preservation of large sets of digitised content, SDB “out of the box” was not entirely able to meet the needs of the Wellcome Digital Library. We carried out a Feasibility Study in 2010 to determine whether it would be possible to use a modified version of the system. The research and prototype system we commissioned proved that it was indeed feasible to use SDB with certain software extensions.
In the spring of 2011, we commissioned Preservica to extend and install the newest version of the software (SDB4). This work is now complete and in the testing phase.
The key preservation functionality that SDB4 provides is listed here (some of it was available “out of the box”, and some was developed as extensions to the core system or as modules):
Automated ingest: automated SDB workflow to create and ingest a “submission information package” (SIP) – a bundle of content files and metadata forming a complete “object”
Multiple manifestations: ability to associate all the different manifestations of an object together (e.g. master video file, broadband versions, narrowband versions, transcripts, etc.)
SQL Server database: stores and indexes administrative and descriptive metadata describing objects stored in SDB
Characterisation: use of the JHOVE, DROID and PRONOM characterisation tools to extract essential technical information about digital files to be stored in the database
Format Migration: SDB4 builds on the PLANETS framework to support preservation planning for mitigation of obsolescence
Integrity checking: creates and stores a unique SHA-1 hash code for each file that can be used to test validity of the file over time
Provide access to content: delivers content to external systems using an API
Automatic export of administrative metadata: allows us to store metadata such as unique SDB identifiers in our workflow system in order to deliver files to the user
Administrative interface: allows administrators access to the content and the database, and to generate reports
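To illustrate the integrity checking item in the list above, here is a minimal sketch of a fixity check in Python. It is not SDB's own code: the function names sha1_of_file and is_unchanged are hypothetical, and we are assuming only that a SHA-1 digest was recorded for each file at ingest and can be compared with a freshly computed one later.

```python
import hashlib
from pathlib import Path


def sha1_of_file(path: Path, chunk_size: int = 1024 * 1024) -> str:
    """Return the SHA-1 hex digest of a file, reading it in chunks.

    Chunked reading keeps memory use flat for large master files.
    """
    digest = hashlib.sha1()
    with path.open("rb") as handle:
        # iter() with a sentinel stops when read() returns empty bytes (end of file).
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_unchanged(path: Path, stored_digest: str) -> bool:
    """Compare a freshly computed digest with the one recorded at ingest.

    Hypothetical helper: the stored digest would come from the metadata
    database; here it is simply passed in as a hex string.
    """
    return sha1_of_file(path) == stored_digest.strip().lower()
```

A scheduled job could call is_unchanged for each stored file and report any mismatch, which would indicate that the file has been altered or corrupted since it was ingested.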
This is by no means a complete description of SDB’s functionality, but the above provides a flavour of the most important features for long-term asset management and access to master files. Much of the development work that was done to extend SDB4 is likely to have an application for other users, not just the Wellcome Digital Library. Where possible, extensions are generic – although system requests/commands and metadata mappings may be highly specific to us.
We hope that our efforts in specifying our own needs will benefit other SDB users who see the value in long-term preservation for digitised materials, and the efficiencies of combining born digital and digitised content into a single system strategy.
In future, SDB will interface with our workflow system and the image/media servers employed by our digital delivery system. These latter two systems have not yet been implemented.