PURR Preservation Strategic Plan

1. Objectives

The primary focus of PURR’s preservation activities is:

  • The intellectual content of the data published and deposited into the repository (including gallery and supplementary materials). PURR prioritizes preservation of the content of the data as opposed to the look and feel or functionality of the deposited dataset.
  • The producer-supplied metadata as well as any metadata extracted or added during the preservation process. PURR does not commit to preserving data within the project space or any publication in “draft” status.

2. Preservation Strategies

The following preservation levels support the PURR Preservation Strategic Plan’s goal of preserving access to digital content so that it remains readable, meaningful, and understandable. Each object that enters the repository will be assigned one of the preservation levels defined here:

Bit-level Preservation: This is the most basic level of preservation in PURR. Datasets designated for Bit-level Preservation will undergo file backup, fixity checking, and file format recognition. Datasets will be maintained in a state that allows for long-term care and accessibility. Datasets will receive Bit-level Preservation if the file format is unrecognized or unsustainable, or if Full or Limited Preservation is not possible at the time of ingest. When possible, representation information will be included within object metadata.

Limited Preservation: Datasets at this level will receive Bit-level Preservation activities with representation information, and migration actions when appropriate during the life cycle. Migration will prioritize preserving content and, when possible, format and style. Functionality of the dataset may not be preserved at this level, because datasets in proprietary formats may fall into this level.

Full Preservation: This is the highest preservation level in PURR. Datasets at this level receive Bit-level Preservation activities, with representation information, as well as transformation/normalization and migration actions when appropriate during the life cycle.  Full Preservation includes continual monitoring by PURR managers for obsolescence and changes in file formats and/or technology.
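
The three levels above can be read as a simple decision made at ingest. The following is a minimal sketch of that reading, not PURR's actual procedure: the FormatProfile fields and the rule that every proprietary format maps to Limited Preservation are assumptions made for illustration (the plan says only that proprietary formats may fall into that level).

```python
from dataclasses import dataclass
from enum import Enum


class Level(Enum):
    BIT_LEVEL = "Bit-level Preservation"
    LIMITED = "Limited Preservation"
    FULL = "Full Preservation"


@dataclass
class FormatProfile:
    # Hypothetical inputs; the plan does not define these fields.
    recognized: bool   # format identification succeeded at ingest
    sustainable: bool  # format is judged suitable for long-term care
    proprietary: bool  # format depends on proprietary software


def assign_level(profile: FormatProfile) -> Level:
    """Pick a preservation level from a format profile (illustrative rule)."""
    # Unrecognized or unsustainable formats receive Bit-level Preservation.
    if not profile.recognized or not profile.sustainable:
        return Level.BIT_LEVEL
    # Assumption: proprietary formats are treated as Limited Preservation.
    if profile.proprietary:
        return Level.LIMITED
    return Level.FULL
```

For example, assign_level(FormatProfile(recognized=True, sustainable=True, proprietary=False)) returns Level.FULL under these assumed rules.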

PURR reserves the right not to accept or preserve objects found to be corrupted or dangerous to the repository.

As stated in the PURR Digital Preservation Policy, Purdue Libraries is committed to preserving and maintaining all PURR content for a period of ten years after it is published in PURR. Long-term preservation of PURR content beyond the ten-year retention period is subject to the Libraries' selection criteria for long-term retention, pending budget approval for staffing and related resources needed to accomplish this goal.

3. Specific Preservation Actions

The primary goal of the PURR preservation actions is to preserve the intellectual content of the objects deposited into the repository, in accordance with the PURR Preservation Strategic Plan. Each object placed within PURR will be subject to established preservation techniques in order to maintain its integrity and our ability to reproduce the object as necessary to ensure continuous access over time.

The following actions are undertaken within the preservation process:

Robust Preservation Metadata: Each dataset submitted to PURR will receive a full implementation of preservation metadata. PURR weaves together multiple metadata standards in order to fully describe the unique nature of many datasets. The four schemas used are Dublin Core for discoverability and citation; METS for structural and hierarchical representation of dataset files; MODS to note the creator and access rights of the dataset; and PREMIS to record the preservation events each dataset may undergo as well as the legal rights assigned to each dataset.

File Format Recognition: At ingest, each dataset will be analyzed to determine its file format. PURR uses the technical registry PRONOM and the format identification tool DROID to verify the format of each dataset; this information is documented for use in potential transformation, migration, and fixity checks during the entire life cycle of the object.
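
To illustrate the general idea behind signature-based format identification, here is a minimal sketch that matches a file's leading bytes against a handful of well-known signatures. It does not call DROID or PRONOM and is far simpler than either; the SIGNATURES table and the identify_format function are hypothetical names introduced for this example.

```python
from typing import Optional

# A few widely documented leading-byte signatures ("magic numbers").
# Real registries describe many more formats and more complex patterns.
SIGNATURES = [
    (b"%PDF-", "PDF"),
    (b"\x89PNG\r\n\x1a\n", "PNG"),
    (b"GIF87a", "GIF"),
    (b"GIF89a", "GIF"),
    (b"PK\x03\x04", "ZIP container"),
]


def identify_format(data: bytes) -> Optional[str]:
    """Return a format name if the leading bytes match a known signature."""
    for magic, name in SIGNATURES:
        if data.startswith(magic):
            return name
    # No match: the format is unrecognized by this sketch.
    return None
```

A result of None corresponds to the "unrecognized" case, which in this plan would point toward Bit-level Preservation.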

Secure Storage and Backup: All data within PURR is fully duplicated on a regular basis to prevent catastrophic loss of information. Information is backed up and mirrored at another site to provide a means of recovery in case of disaster. This duplication also protects against data loss when regular fixity checks detect corruption. Purdue University Libraries is a member of the MetaArchive Cooperative, a Private LOCKSS Network. Data within PURR is stored redundantly at geographically distributed sites within the network, in ways that facilitate easy recovery in the case of catastrophic loss or server failure.

Fixity: All data within PURR will undergo regularly scheduled fixity checks designed to ensure that no loss of data has occurred. Fixity checking will be done by comparing a newly computed hash of each file against the hash generated at the time of ingest. If file degradation is discovered, the corrupted data will be removed and replaced with its uncorrupted counterpart from a mirror site.
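
A fixity check of this kind can be sketched as follows. This is an illustration under stated assumptions, not PURR's implementation: the plan does not name a hash algorithm, so SHA-256 is assumed, and the function names and the single local mirror path are hypothetical.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large datasets need not fit in memory."""
    digest = hashlib.sha256()  # assumed algorithm; not specified by the plan
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def fixity_ok(path: Path, ingest_hash: str) -> bool:
    """Compare the current hash with the hash recorded at ingest."""
    return sha256_of(path) == ingest_hash


def repair_from_mirror(path: Path, mirror: Path, ingest_hash: str) -> bool:
    """Replace a corrupted copy with the mirror copy, if the mirror is intact.

    Returns True when the primary copy passes fixity afterwards.
    """
    if fixity_ok(path, ingest_hash):
        return True  # nothing to repair
    if not fixity_ok(mirror, ingest_hash):
        return False  # mirror is also damaged; needs manual attention
    shutil.copyfile(mirror, path)
    return fixity_ok(path, ingest_hash)
```

Checking the mirror before copying matters: replacing a damaged file with an equally damaged mirror copy would hide the problem rather than fix it.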

Transformation/Normalization: As each dataset enters PURR, items not stored in established file formats will be transformed and normalized into an analogous long-term preservation format. When possible, items will be presented in native formats to preserve original appearance; however, when this is not feasible, the content of the file takes priority over its appearance. PURR is committed to maintaining the fullest possible accessibility to datasets following transformative actions. Transformation events will be recorded in metadata associated with the dataset throughout its life cycle.

Migration: The intellectual content of the datasets must be preserved; as such, PURR will continue to monitor content for potential file format obsolescence. If data within PURR is found to be at risk of obsolescence, the content will be transformed to a new file format more conducive to its preservation. This will be done to keep PURR in line with evolving archival best practices and to ensure long-term preservation and access. Migration may include "upgrading" datasets to a newer version of the same format or transformation into a completely new file structure. Migration events will be recorded in the preservation metadata associated with the dataset.
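
As a rough picture of what recording a transformation or migration event might involve, the sketch below builds a simple event record. The field names are modeled loosely on PREMIS event semantic units, but the result is a plain dictionary, not a schema-valid PREMIS document; the make_event function, the UUID identifier scheme, and the example values are assumptions made for illustration.

```python
import uuid
from datetime import datetime, timezone


def make_event(event_type: str, object_id: str, detail: str, outcome: str) -> dict:
    """Build an illustrative preservation event record for one dataset object."""
    return {
        "eventIdentifier": {
            "eventIdentifierType": "UUID",  # assumed identifier scheme
            "eventIdentifierValue": str(uuid.uuid4()),
        },
        "eventType": event_type,
        # Timestamp in UTC, ISO 8601 format.
        "eventDateTime": datetime.now(timezone.utc).isoformat(),
        "eventDetail": detail,
        "eventOutcome": outcome,
        "linkingObjectIdentifier": object_id,
    }


# Hypothetical usage: note a format migration for a placeholder object ID.
event = make_event(
    event_type="migration",
    object_id="example-object-001",
    detail="Converted to a newer version of the same format",
    outcome="success",
)
```

Appending one such record per action would give each dataset a chronological history of the preservation events it has undergone.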

The Purdue University Research Repository (PURR) is a university core research facility provided by the Purdue University Libraries and the Office of the Executive Vice President for Research and Partnerships, with support from additional campus partners.