Archiving & Preservation
The terms "backup" and "archiving" are often used interchangeably because both involve saving a specific version of a file, but they are very different processes. "Backup" refers specifically to making copies of files with the knowledge that the files may change. Backups are kept for a set period and can be discarded once that period has passed. Archiving is used when a file is to be preserved as-is, often at the end of a project; the archived file acts as a static (and usually final) record. [source - DataONE education module]
In addition to planning for local archive storage options (local server, network or OSU’s digital repository), we recommend that you investigate public data repositories within your subject area or discipline. A searchable list of repositories can be found here, and a list of repositories by discipline is here. See Data Repositories for more information on that option.
In many cases, OSU’s digital repository (or “institutional repository”) ScholarsArchive@OSU (SA@OSU) can be a suitable archive and sharing mechanism for your data. All items deposited into SA@OSU receive a persistent identifier (DOI or ARK), are freely available to anyone, and are full-text searchable, making them discoverable through Google, Google Scholar and other large search engines. If you are interested in depositing data into SA@OSU, or have further questions, please contact us (link).
Digital data are fragile, regardless of which storage medium you choose (DVD, hard disk, tapes, etc.). They are susceptible to bit rot, the gradual corruption of stored bits, and are likely to degrade over time. The recommended methods for combating bit rot are refreshment and replication.
Refreshment: Periodically copy your data onto a new drive or disk (every 2-5 years).
Replication: Maintain your original copy, an external copy, and an external remote copy. Use at least two forms of storage in two different locations.
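Refreshment and replication only help if the new copy is identical to the old one. One common way to check this is to compare checksums of the files in each copy. The Python sketch below is an illustration under stated assumptions, not a tool provided by OSU: the function names (sha256_of, build_manifest, compare_copies) and the choice of SHA-256 are ours, and it assumes both copies are reachable as ordinary directories.

```python
from __future__ import annotations

import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: Path) -> dict[str, str]:
    """Map each file path (relative to root) to its checksum."""
    return {
        p.relative_to(root).as_posix(): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def compare_copies(original: Path, replica: Path) -> dict[str, list[str]]:
    """Report files that are missing from, or differ in, the replica."""
    source = build_manifest(original)
    copy = build_manifest(replica)
    return {
        "missing": sorted(set(source) - set(copy)),
        "changed": sorted(
            name
            for name in source.keys() & copy.keys()
            if source[name] != copy[name]
        ),
    }
```

Running compare_copies on the original and each replica after every refresh gives a short list of files to re-copy. Saving the manifest alongside the data would also let you check a single copy against its own earlier state.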
For long-term archiving of finalized data, personal computers and external storage devices are NOT recommended. Networked file servers managed by the information services group in your college or department, or by OSU’s centralized computing group (link), are the best choice. See the OSU Community Network (CN) services and pricing for more details.
Does anyone remember Quattro Pro or Lotus 1-2-3? Exactly. When you archive the final version of your dataset(s), consider using an open, non-proprietary format to ensure that you will be able to fully access your data in the future. Common open formats include plain text (ASCII) for text-based data, and HDF and NetCDF for structured scientific data. Multimedia formats include JPEG 2000, MNG and PNG. For a list of many other open formats, see here.
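Before archiving, it can help to take stock of which files are still in proprietary or legacy formats. The Python sketch below walks a dataset folder and flags files by extension. The extension table is a small illustrative sample that we chose for this example, not an authoritative or complete list, and the function name flag_proprietary_files is hypothetical. An extension is also only a hint about a file's real format.

```python
from __future__ import annotations

from pathlib import Path

# Illustrative sample only (an assumption for this sketch): extensions
# commonly associated with proprietary or legacy formats.
PROPRIETARY_EXTENSIONS = {
    ".wk1": "Lotus 1-2-3",
    ".wb2": "Quattro Pro",
    ".qpw": "Quattro Pro",
    ".xls": "Microsoft Excel (binary)",
    ".doc": "Microsoft Word (binary)",
}


def flag_proprietary_files(root: Path) -> list[tuple[str, str]]:
    """Return (relative path, format name) for each flagged file under root."""
    flagged = []
    for path in sorted(root.rglob("*")):
        if not path.is_file():
            continue
        # Compare case-insensitively so BUDGET.WK1 and budget.wk1 both match.
        label = PROPRIETARY_EXTENSIONS.get(path.suffix.lower())
        if label is not None:
            flagged.append((path.relative_to(root).as_posix(), label))
    return flagged
```

Each flagged file is a candidate for conversion to an open format before the dataset is deposited.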
If you prefer to keep your data in a proprietary format, there are two ways to ensure continued access to older datasets. When new software versions are released and become established, migrate your older datasets to the newer version or package. If the software becomes obsolete, you may be able to emulate it using a virtual machine. The recommended best practice, however, is to convert your data to an open format, which facilitates both preservation and sharing.