Digital preservation and backup at OLH

Posted by Martin Paul Eve on 24 September 2020

An article in Science last week reported that "dozens of scientific journals have vanished from the internet, and no one preserved them". These titles were open-access journals and, obviously, press coverage like this does no good for the spread of OA. It is, of course, also possible for non-OA journals to vanish without having been adequately preserved. Subscription journals have a revenue stream, however, so I suppose they are more likely to be members of digital preservation schemes, whereas the barriers to establishing OA journals are lower, so there may be more risk of their disappearance.

At OLH we have several procedures in place to ensure the long-term preservation and short-term backup of our journals. Digital preservation procedures ensure that, if OLH ceases to operate, our titles remain available. Backup procedures mitigate local data loss and ensure continuity of operation. Every article is assigned a DOI which, if a preservation service is activated, should be redirected to the preserved copy held by that service, so that the material remains persistently addressable.

  1. We are members of CLOCKSS (Controlled Lots of Copies Keeps Stuff Safe). This means that the CLOCKSS archive ingests our material and, in the event of OLH going offline or folding, the archive is authorized to make the content available. However, CLOCKSS is still in the process of writing a plugin that can ingest from our platform, Janeway, so we have only partial coverage in this archive at present. In the near future, CLOCKSS will have total coverage of OLH titles and articles. This is a long-term preservation facility.
  2. Our articles are ingested by the Internet Archive Scholar service, where the full text is available. This is a long-term preservation facility.
  3. We encourage authors to deposit the version of record in institutional and subject repositories, thereby ensuring additional redundancy. This acts as a long-term preservation facility and provides at least some assurance of a failsafe copy, but we do not know the full extent of the coverage it gives, and that coverage is outside our control.
  4. Janeway databases are preserved with hot backups of their live state at our server host, DigitalOcean. Our third-party providers, such as Ubiquity Press and Liverpool University Press, are responsible for backups of the databases and journals that they maintain. This is a backup facility.
  5. We currently keep two offsite copies of the database and filesystem that we control ourselves. These are held at separate sites in the United Kingdom. We are about to expand this to a third site, giving triple redundancy of all data. At each site, the backups are packed into dated archive files so that we have a record of changes on a daily basis. Each offsite backup location has 10TB of storage, ensuring that we can keep a large number of backups. Each backup server is protected by an uninterruptible power supply that safely shuts down the box if a power outage is detected. This is a backup facility.
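
For readers curious about what "dated archive files" might look like in practice, here is a minimal sketch in Python. It is not our actual tooling: the function names, the `backup-YYYY-MM-DD.tar.gz` naming scheme and the `keep_days` retention window are all assumptions made purely for illustration.

```python
import tarfile
from datetime import date, timedelta
from pathlib import Path
from typing import List

SUFFIX = ".tar.gz"


def archive_name(day: date, prefix: str = "backup") -> str:
    """Build a dated file name, e.g. backup-2020-09-24.tar.gz (hypothetical scheme)."""
    return f"{prefix}-{day.isoformat()}{SUFFIX}"


def create_dated_archive(source: Path, dest_dir: Path, day: date) -> Path:
    """Pack the `source` directory into a gzip-compressed tar file named after `day`."""
    dest_dir.mkdir(parents=True, exist_ok=True)
    target = dest_dir / archive_name(day)
    with tarfile.open(target, "w:gz") as tar:
        # Store paths relative to the source directory's own name.
        tar.add(source, arcname=source.name)
    return target


def prune_archives(dest_dir: Path, today: date, keep_days: int,
                   prefix: str = "backup") -> List[Path]:
    """Delete archives dated more than `keep_days` days before `today`.

    The retention window is an assumption for this sketch; with plenty of
    storage, one might simply never prune.
    """
    cutoff = today - timedelta(days=keep_days)
    removed = []
    for path in sorted(dest_dir.glob(f"{prefix}-*{SUFFIX}")):
        stamp = path.name[len(prefix) + 1:-len(SUFFIX)]
        try:
            day = date.fromisoformat(stamp)
        except ValueError:
            continue  # not one of our dated archives, so leave it alone
        if day < cutoff:
            path.unlink()
            removed.append(path)
    return removed
```

Run once a day (from cron, say), `create_dated_archive(Path("/srv/data"), Path("/backups"), date.today())` would leave one archive per day, so any given day's state can be recovered by name. The paths here are placeholders.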


And just for fun, here's a picture of one of our storage facilities!



I hope this gives some additional clarity over the measures that we have taken, at OLH, to ensure our permanence in the record.

Martin Paul Eve