Research Data Services

Information about how to organize, describe, preserve and share your research data

Data Types & File Formats

Once you have defined what you mean by data, it is helpful to consider which types of data you create or work with, and what formats those data take. Both the type and the format of your data shape your data stewardship practices.

Data Types

Data types generally fall into five categories:

Observational
- Captured in situ
- Can’t be recaptured, recreated or replaced
- Examples: Sensor readings, sensory (human) observations, survey results

Experimental
- Data collected under controlled conditions, in situ or laboratory-based
- Should be reproducible, but reproducing them can be expensive
- Examples: gene sequences, chromatograms, spectroscopy, microscopy

Derived or compiled
- Reproducible, but reproducing them can be very expensive
- Examples: text and data mining, derived variables, compiled database, 3D models

Simulation
- Results from using a model to study the behavior and performance of an actual or theoretical system
- Models and metadata, where the inputs can be more important than the output data
- Examples: climate models, economic models, biogeochemical models

Reference or canonical
- Static or organic collections of (peer-reviewed) datasets, most likely published and/or curated
- Examples: gene sequence databanks, chemical structures, census data, spatial data portals.
 

Data Formats 

Research data come in many formats: text, numeric, multimedia, models, software languages, discipline-specific formats (e.g., the crystallographic information file (CIF) in chemistry), and instrument-specific formats.

Formats more likely to be accessible in the future are:
- Non-proprietary
- Open, documented standards
- In common usage by the research community
- Using standard character encodings (ASCII, UTF-8)
- Uncompressed (desirable, space permitting)
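
As a rough illustration, some of the criteria above can be checked automatically before depositing a file. The sketch below is a minimal example in Python, not an official tool: the function name check_file and both extension lists are hypothetical and chosen only for illustration, so they should be replaced with the formats your repository actually recommends. It flags files that look compressed, files whose extension is not on the illustrative open-format list, and text files that do not decode as UTF-8 (plain ASCII is a subset of UTF-8, so it passes the same test).

```python
from pathlib import Path

# Hypothetical lists for illustration only; they are NOT taken from this guide.
# Replace them with the formats recommended by your own repository.
OPEN_TEXT_EXTENSIONS = {".csv", ".txt", ".json", ".xml"}
COMPRESSED_EXTENSIONS = {".zip", ".gz", ".bz2", ".xz", ".7z"}


def check_file(path):
    """Return a list of warnings for one file (empty list = no issues found)."""
    path = Path(path)
    suffix = path.suffix.lower()

    # Compressed archives are harder to keep accessible over time.
    if suffix in COMPRESSED_EXTENSIONS:
        return ["file is compressed"]

    # Anything outside the illustrative list needs a manual look.
    if suffix not in OPEN_TEXT_EXTENSIONS:
        return ["extension is not on the open-format list"]

    # ASCII is a subset of UTF-8, so one decode attempt covers both encodings.
    try:
        path.read_bytes().decode("utf-8")
    except UnicodeDecodeError:
        return ["not valid UTF-8 (or ASCII) text"]

    return []
```

Calling check_file("results.csv") on a hypothetical file returns an empty list when none of these checks fail. A check like this cannot tell whether a format is an open, documented standard or in common use in your field; those criteria still need human judgment.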

A table of recommended formats for preserving and sharing research data over the long term can be found in the ScholarsArchive@OSU user guide.

Sources: University of Edinburgh Information Services, University of Oregon Libraries, California Digital Library

Other resources

The ETDplus project has published a File Formats guidance brief. It is a short "how to" document written for a student audience, designed to assist students with data management issues related to their theses and dissertations. 

You can access the six Guidance Briefs from ETDplus through the Tools and Resources page.