Research Data Services

Information about how to organize, describe, preserve and share your research data

What Should I Focus on When Organizing Data?

There are some fundamental decisions that you need to make when you start your research, and how to organize your data should be one of them. The choices that you make will vary based on the type of research that you do, but everyone must address the same issues. Consider the following things as you organize your data:

  • File version control (see tools below)
  • Directory structure and file naming conventions (see below)
  • File naming conventions for specific disciplines (see below)
  • File structure
  • Use the same structure for backups

File Naming Best Practices

File names should provide context for the files that they name, and distinguish them from files that may be similar. Many files are used independently of their file or directory structure, so provide sufficient description in the file name.

1. Be consistent

  • Have conventions for naming
    • Directory structure
    • Folder names
    • File names
  • Always include the same information (e.g. date and time)
  • Retain the order of information (e.g. YYYYMMDD, not MMDDYY)

2. Be descriptive so others can understand your meaning.

Try to keep file and folder names under 32 characters.

Within reason, include relevant information such as:

  • Unique identifier (e.g. Project Name or Grant number in folder name)
  • Project or research data name
  • Conditions (Lab instrument, Solvent, Temperature, etc.)
  • Run of experiment (sequential)
  • Date (in file properties too)
  • Use the application-specific 3-letter file extension, in lowercase: mov, tif, wrl
  • When using sequential numbering, use leading zeros to allow for multi-digit versions. For example, a sequence of 1-10 should be numbered 01-10; a sequence of 1-100 should be numbered 001-100 (001, 010, 100).
  • Avoid special characters: &, *%#;()!@$^~'{}[]?<>-
  • Use only one period, placed before the file extension (e.g. name_paper.doc NOT name.paper.doc OR name_paper..doc)

example: Project_instrument_location_YYYYMMDD[hh][mm][ss][_extra].ext
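As a rough illustration, the pattern above could be assembled programmatically so that every file gets the same fields in the same order. This is a minimal sketch, not a prescribed tool: the function names build_file_name and run_label, the sample field values, and the choice to allow only letters and digits inside each field (stricter than the list above) are all assumptions made for this example.

```python
import re
from datetime import datetime

# Assumption: each field may contain only letters and digits, so the
# underscore is reserved as the separator and no special characters,
# spaces, or extra periods can slip into the name.
_DISALLOWED = re.compile(r"[^A-Za-z0-9]")


def run_label(run, total):
    """Zero-pad a run number so names sort correctly (7 of 100 -> '007')."""
    return str(run).zfill(len(str(total)))


def build_file_name(project, instrument, location, when, ext,
                    extra=None, with_time=False):
    """Build Project_instrument_location_YYYYMMDD[hhmmss][_extra].ext."""
    stamp = when.strftime("%Y%m%d%H%M%S" if with_time else "%Y%m%d")
    parts = [project, instrument, location, stamp]
    if extra:
        parts.append(extra)
    for part in parts:
        if _DISALLOWED.search(part):
            raise ValueError(f"disallowed character in {part!r}")
    # Lowercase extension, with exactly one period before it.
    return "_".join(parts) + "." + ext.lower().lstrip(".")


if __name__ == "__main__":
    taken = datetime(2024, 3, 5, 14, 7, 9)
    name = build_file_name("Soil", "GCMS", "LabB", taken, "TIF",
                           extra="run" + run_label(7, 100))
    print(name)  # Soil_GCMS_LabB_20240305_run007.tif
```

The sample name is 34 characters long including the extension, so in practice the field values may need to be abbreviated to stay under the suggested 32-character limit.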


Directory Structure Naming Conventions

The structure of directories/folders for organizing the files should also have a clear, documented naming convention.

The top-level folder or directory should include the project title, unique identifier, and date (year).

Directories/folders within the substructure should be divided by a common theme. For example, each folder may contain a run of an experiment or a different version of each dataset.
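One possible way to set up such a structure consistently is to generate it with a short script. The sketch below assumes a top-level folder named title_identifier_year and one zero-padded subfolder per experimental run; the function name make_project_tree, the "run" prefix, and the sample values are hypothetical choices for illustration only.

```python
from pathlib import Path


def make_project_tree(root, title, identifier, year, runs):
    """Create <title>_<identifier>_<year>/ with one subfolder per run."""
    top = Path(root) / f"{title}_{identifier}_{year}"
    top.mkdir(parents=True, exist_ok=True)
    width = len(str(runs))  # leading zeros so folders sort in run order
    for run in range(1, runs + 1):
        (top / f"run{str(run).zfill(width)}").mkdir(exist_ok=True)
    return top


if __name__ == "__main__":
    # Hypothetical project values; creates Soil_G1234_2024/run01 ... run10
    # under the current working directory.
    print(make_project_tree(".", "Soil", "G1234", 2024, 10))
```

Recording the convention in a short README file in the top-level folder is one way to keep it documented alongside the data.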

 

Adapted from: GeorgiaTech and University of Oregon

Data Organization Tools

Data Identifiers

Dataset identifiers allow your data to be referenced and shared. Data identifiers must be globally unique and persistent: they must not be repeated elsewhere and they must not change over time.

Identifier schemes:

  • URI: Uniform Resource Identifier
  • PURL: Persistent Uniform Resource Locator
  • DOI: Digital Object Identifier
  • HDL: The Handle System
  • InChI: IUPAC International Chemical Identifier


File Naming Conventions for Specific Disciplines

Many communities of practice have standard recommendations, for example:


File Renaming


Version Control


Workflow Tools


Bibliographic Management

(Adapted from GeorgiaTech)

Other resources

The ETDplus project has published guidance briefs on Data Organization and on Version Control. These are short "how to" documents written for a student audience, designed to help students with data management issues related to their theses and dissertations.

You can access the six Guidance Briefs from ETDplus through the Tools and Resources page.