
Research Data Services: Archiving & Preservation

Information about how to organize, describe, preserve and share your research data

Data Archiving & Preservation

The difference between backing up & archiving

The terms "backup" and "archiving" are often used interchangeably because both involve saving a specific version of a file, but they are very different processes. A backup is a copy made with the knowledge that the original files may change; backups are kept for a set amount of time and can be discarded once that time has passed. An archive preserves a file as-is, often at the end of a project, and acts as a static (and usually final) record. [source - DataONE education module]
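To make the distinction concrete, here is a minimal sketch, assuming backups are stored as dated snapshots with a fixed retention window in days. The function names `expired_backups` and `freeze_archive`, the snapshot names, and the 30-day window are hypothetical illustrations, not an OSU or DataONE tool. Backups age out and get pruned; an archived file is instead locked against further edits.

```python
import os
import stat
from datetime import date, timedelta


def expired_backups(snapshots, today, retention_days):
    """Return names of backup snapshots older than the retention window.

    snapshots: mapping of snapshot name -> date the backup was taken.
    """
    cutoff = today - timedelta(days=retention_days)
    return sorted(name for name, taken in snapshots.items() if taken < cutoff)


def freeze_archive(path):
    """Remove write permission so an archived file stays a static record."""
    mode = os.stat(path).st_mode
    os.chmod(path, mode & ~(stat.S_IWUSR | stat.S_IWGRP | stat.S_IWOTH))


# Hypothetical snapshots kept under an assumed 30-day retention window.
snapshots = {
    "backup-2024-01-01": date(2024, 1, 1),
    "backup-2024-03-01": date(2024, 3, 1),
    "backup-2024-03-20": date(2024, 3, 20),
}
print(expired_backups(snapshots, today=date(2024, 3, 31), retention_days=30))
# prints ['backup-2024-01-01']
```

Removing write permission only discourages accidental edits; it is not true immutability, which is one reason a managed repository is a better home for a final archive.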
 

Plan ahead to preserve your data

In addition to planning for local archive storage options (a local server, network storage, or OSU’s digital repository), we recommend that you investigate public data repositories within your subject area or discipline. A searchable list of repositories can be found at www.re3data.org, and a list of repositories by discipline is here. See Data Repositories for more information on that option.

In many cases, OSU’s digital repository (or “institutional repository”) ScholarsArchive@OSU (SA@OSU) can be a suitable archive and sharing mechanism for your data. All items deposited into SA@OSU receive a persistent identifier (DOI), are freely available to anyone, and are full-text searchable, making them discoverable through Google, Google Scholar and other large search engines. If you are interested in depositing data into SA@OSU, or have further questions, please contact us here.

Things to consider when archiving your data

  • File formats for long-term access: The format in which you keep your data is a primary factor in whether you, or anyone else, can use the data in the future. Plan for both hardware and software obsolescence. See the section Organizing Files and File Formats for details on preferable long-term storage file formats.
  • Don’t forget the documentation: Document your research and data so others can interpret the data. Begin documenting your data at the very start of your research project and continue throughout the project. This template may be helpful.
  • OSU data retention policy: Oregon State University's Records Retention policy states that OSU records must be retained for no less than the minimum retention period set forth in the General Schedule. The General Schedule states that:
    • Records that document the review of research projects that involve the use of human subjects should be retained a minimum of 3 years after project completion, except for FDA-regulated drug or device research.
    • An investigator or sponsor shall maintain the records that document the review of research projects that involve the use of human subjects for FDA-regulated device research for a period of 2 years after the investigation is completed.
    • Records that document the review of research projects that involve the care and use of animal subjects should be retained a minimum of 3 years.

Additional data sharing and/or archiving requirements may be imposed by the sponsoring agency; the PI is responsible for complying with such requirements.

  • Ownership and privacy: Make sure that you have considered the implications of sharing data in terms of copyright and IP ownership, and ethical requirements such as privacy and confidentiality. Data generated by research projects at or under the auspices of Oregon State University are owned by the University. However, the principal investigator (PI) is responsible for retention, preservation, distribution, and control of the data.
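The retention minimums above are simple date arithmetic. The sketch below computes when each minimum period ends, assuming the periods listed in the General Schedule summary: 3 years after completion for human subjects research, 2 years after the investigation ends for FDA-regulated device research, and 3 years for animal subjects research (measured here from project completion, which is an assumption, since the schedule summary does not say). The category keys and function names are hypothetical. These are minimums only; sponsor requirements may be longer, and FDA-regulated drug research is deliberately left out because no period is listed for it above.

```python
from datetime import date

# Minimum retention periods in years, from the General Schedule summary above.
# Category keys are hypothetical labels for this example.
MIN_RETENTION_YEARS = {
    "human_subjects": 3,   # 3 years after project completion
    "fda_device": 2,       # 2 years after the investigation is completed
    "animal_subjects": 3,  # 3 years; assumed here to run from completion
}


def add_years(d, years):
    """Add whole years to a date, mapping Feb 29 to Feb 28 in non-leap years."""
    try:
        return d.replace(year=d.year + years)
    except ValueError:
        return d.replace(year=d.year + years, day=28)


def retention_ends(category, completed):
    """Return the date on which the minimum retention period ends."""
    if category not in MIN_RETENTION_YEARS:
        raise ValueError(
            f"no retention period listed for {category!r}; check the General Schedule"
        )
    return add_years(completed, MIN_RETENTION_YEARS[category])


print(retention_ends("human_subjects", date(2024, 2, 29)))  # prints 2027-02-28
```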

Maintaining the integrity of your data

Digital data are fragile, regardless of which storage medium you choose (DVD, hard disk, tape, etc.). Data on any of these media are susceptible to bit rot, the gradual degradation or decay of stored data over time. The recommended methods for combating bit rot are refreshment and replication.

Refreshment: Periodically copy your data onto a new drive or disk (every 2-5 years).
Replication: Maintain your original copy, an external copy, and an external remote copy. Use at least two forms of storage in two different locations.
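Refreshment and replication only help if you can confirm that each new copy still matches the original. One common way to check this, not described in the guide itself, is fixity checking: record a checksum (here SHA-256, via Python's standard `hashlib`) for every file, then recompute and compare after each copy. This is a minimal sketch; the function names and the demo directory layout are hypothetical.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Compute a file's SHA-256 digest, reading in chunks to limit memory use."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def build_manifest(root):
    """Map each file's path (relative to root) to its SHA-256 digest."""
    root = Path(root)
    return {
        p.relative_to(root).as_posix(): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def verify_copy(manifest, copy_root):
    """List files that are missing from a copy or whose digest has changed."""
    copy_root = Path(copy_root)
    problems = []
    for rel, digest in manifest.items():
        target = copy_root / rel
        if not target.is_file():
            problems.append(f"missing: {rel}")
        elif sha256_of(target) != digest:
            problems.append(f"changed: {rel}")
    return problems


# Demo on throwaway directories: refresh a copy, verify it, then simulate damage.
with tempfile.TemporaryDirectory() as tmp:
    original = Path(tmp) / "original"
    original.mkdir()
    (original / "data.csv").write_text("site,temp_c\nA,12.5\n")
    manifest = build_manifest(original)

    refreshed = Path(tmp) / "refreshed"
    shutil.copytree(original, refreshed)
    clean_report = verify_copy(manifest, refreshed)  # [] means the copy matches

    (refreshed / "data.csv").write_text("site,temp_c\nA,12.6\n")  # simulated corruption
    damaged_report = verify_copy(manifest, refreshed)  # ['changed: data.csv']
```

In practice you would save the manifest (for example as a JSON file) alongside each replica and rerun `verify_copy` after every refresh and on a regular schedule, so silent corruption in any one location is caught while the other copies are still good.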

For long-term archiving of finalized data, personal computers and external storage devices are NOT recommended. Networked file servers managed by the information services group in your college or department, or by OSU’s centralized computing group, are the best choice.

 

Software Obsolescence

Does anyone remember Quattro Pro or Lotus 1-2-3? Exactly. When you archive the final version of your dataset(s), consider using an open, non-proprietary format to ensure that you will be able to fully access your data in the future. Common open formats include plain text (ASCII) for text-based data, and HDF and NetCDF for large scientific datasets. Open multimedia formats include JPEG 2000, MNG and PNG. For a list of many other open formats, see here.
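Before archiving, it can help to inventory which files are not yet in an open format. The sketch below flags files by extension against an allowlist built from the formats named above; the allowlist contents, the inclusion of `.csv` as a plain-text format, and the function name are assumptions to adapt for your discipline. File extensions are only a heuristic, since a file can be misnamed, so treat the output as a review list rather than a verdict.

```python
from pathlib import Path

# Hypothetical allowlist built from the open formats named above; extend it
# for your discipline. CSV is included as a plain-text format (an assumption).
OPEN_FORMAT_EXTENSIONS = {
    ".txt", ".csv",          # plain text
    ".h5", ".hdf5", ".hdf",  # HDF
    ".nc",                   # NetCDF
    ".jp2",                  # JPEG 2000
    ".mng", ".png",          # multimedia
}


def files_needing_review(paths):
    """Return paths whose extension is not on the open-format allowlist."""
    return [p for p in paths if Path(p).suffix.lower() not in OPEN_FORMAT_EXTENSIONS]


inventory = [
    "survey/results.csv",
    "model/output.nc",
    "notes/budget.wk1",  # a Lotus 1-2-3 worksheet
    "figures/plot.PNG",
]
print(files_needing_review(inventory))  # prints ['notes/budget.wk1']
```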

If you prefer to keep your data in a proprietary format, there are a couple of ways to ensure continued access to older datasets. When new software versions are released and become established, migrate your older datasets to the newer version or package. If the software becomes obsolete, you may be able to run it in an emulator or virtual machine. The recommended best practice, however, is to convert your data to an open format, which facilitates both preservation and sharing.
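A conversion is only safe if nothing is lost on the way. As a minimal sketch, assuming your tabular data has already been exported from the proprietary program into Python records (reading the proprietary file itself is out of scope here), the standard `csv` module can write it as plain text and read it back to confirm the values survive. The function names and sample rows are hypothetical.

```python
import csv
import io


def to_csv_text(rows, fieldnames):
    """Serialize a list of dict rows to CSV text, an open plain-text format."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


def round_trips(rows, fieldnames, csv_text):
    """Check that reading the CSV back reproduces the original values as strings."""
    parsed = list(csv.DictReader(io.StringIO(csv_text)))
    expected = [{k: str(r[k]) for k in fieldnames} for r in rows]
    return parsed == expected


rows = [{"site": "A", "temp_c": 12.5}, {"site": "B", "temp_c": 9.0}]
text = to_csv_text(rows, ["site", "temp_c"])
print(round_trips(rows, ["site", "temp_c"], text))  # prints True
```

When writing to disk, open the file with `newline=""` and an explicit `encoding="utf-8"`. Because CSV stores every value as text, record column types and units in your documentation so future users can interpret the data.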

Adapted from: University of Oregon | University of Virginia