
ScholarsArchive@OSU User Guide

This guide helps ScholarsArchive users deposit and manage their content.

Preferred File Formats

To maximize the ability to share, preserve and re-use digital files, carefully consider the format you use for them. A well-chosen file format reduces the risk that your data becomes unreadable when a proprietary format is no longer supported or available.

 Formats more likely to be accessible in the future are: 

  • Non-proprietary 

  • Open, documented standards 

  • In common usage by the research community 

  • Encoded with standard character encodings (ASCII, UTF-8) 

  • Uncompressed (desirable, space permitting) 
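
One way to check the character-encoding criterion before deposit is to test whether a text file decodes cleanly as ASCII or UTF-8. The Python sketch below is illustrative only: the function names `detect_text_encoding` and `convert_to_utf8` are hypothetical helpers, not part of any ScholarsArchive@OSU tooling, and the source encoding passed to the converter is assumed to be known in advance, since it cannot be detected reliably from the bytes alone.

```python
from pathlib import Path


def detect_text_encoding(path):
    """Return 'ascii' or 'utf-8' if the file decodes cleanly, else None.

    ASCII is tried first because every ASCII file is also valid UTF-8.
    """
    data = Path(path).read_bytes()
    for encoding in ("ascii", "utf-8"):
        try:
            data.decode(encoding)
            return encoding
        except UnicodeDecodeError:
            continue
    return None


def convert_to_utf8(src, dst, source_encoding):
    """Re-encode src as UTF-8 and write it to dst.

    source_encoding (for example 'iso-8859-1') must be supplied by the
    caller. Bytes are decoded and re-encoded directly so that line
    endings are left unchanged.
    """
    text = Path(src).read_bytes().decode(source_encoding)
    Path(dst).write_bytes(text.encode("utf-8"))
```

A file for which `detect_text_encoding` returns `None` is probably in a legacy encoding such as ISO 8859-x and could be converted with `convert_to_utf8` once the original encoding has been confirmed.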


Most content deposited to ScholarsArchive@OSU is textual in nature: theses and dissertations, research articles, presentations, technical reports, conference proceedings, posters, etc. The PDF file format is required for this content. PDF/A-1 -- ISO 19005-1 (.pdf) with fonts embedded is preferred. PDF without fonts embedded is also acceptable but not recommended. To save a Microsoft Word document as a PDF with fonts embedded, follow these instructions: https://www.bc.edu/content/dam/files/libraries/pdf/embed-fonts.pdf.

For other content types--such as quantitative and statistical data, spreadsheets, databases, graphics, audio, and video (among others)--use the table below to find an appropriate and recommended format for preserving and sharing your digital files in ScholarsArchive@OSU over the long term.

 

Format

Highest Confidence

Medium Confidence

Lowest Confidence

Text

Plain text -- US-ASCII, UTF-8, UTF-16 with BOM (.txt)

SGML with included DTD (.sgm, .sgml)

XML with included schema (.xml)

PDF/A-1 --  ISO 19005-1 (.pdf)

Plain text -- ISO 8859-x (.txt)

Rich Text Format 1.x (.rtf)

Cascading Style Sheets (.css)

HTML (.html, .htm)

LaTeX with referenced files (.latex, .tex)

OpenDocument Text (.odt, .sxw)

MS Word 2007+ (OOXML) (.docx)

PDF with fonts embedded (.pdf)

Microsoft Word (.doc)

WordPerfect (.wpd)

all others

Digitized Books, Maps, Paper etc.

JPEG2000 -- lossless (.jp2)

TIFF -- uncompressed (.tiff)

PDF/A-1 --  ISO 19005-1 (.pdf)

n/a

All others

Raster Graphics


 

TIFF -- uncompressed or CCITT 4 compressed (.tiff)

JPEG2000 -- lossless compression (.jp2)

PNG (.png) -- 24-bit true color

TIFF -- compressed (.tiff)

JPEG (.jpg)

JPEG2000 -- lossy compression (.jp2)

GIF (.gif)

Digital Negative DNG (.dng)

BMP (.bmp)

PNG (.png) -- 8-bit indexed

Photoshop (.psd)

MrSID (.sid)

RAW files

all others

Vector Graphics

SVG -- no JavaScript binding (.svg)

PDF/A-1 --  ISO 19005-1 (.pdf)

Computer Graphics Metafile (.cgm)

Encapsulated Postscript (.eps)

Macromedia Flash (.swf)

all others

Digitized Audio

BWAV LPCM (.bwav, .wav), 24-bit, 96 kHz

n/a

all others

Born Digital Audio

AIFF -- PCM (.aif, .aiff), LPCM codec

WAV -- PCM (.wav), LPCM codec

SUN audio -- uncompressed (.au, .snd)

Standard MIDI (.mid)

Free Lossless Audio Codec (.flac)

Apple Lossless Audio Codec (ALAC) (.m4a)

MP3 (.mp3)

Advanced Audio Coding (AAC) (.mp4)

AIFC -- compressed AIFF (.aifc)

RealAudio (.rm, .ra)

Windows Media Audio (.wma)

WAV -- compressed (.wav)

Ogg Vorbis (.ogg) (LOSSY)

all others

Digitized Video

FFV1/Matroska (.mkv)

AVI -- uncompressed (.avi)

QuickTime -- uncompressed, motion JPEG (.mov)

Uncompressed .mxf

Motion JPEG 2000 (.jp2)

ProRes  (.mov)

Born Digital Video

FFV1/Matroska (.mkv)

AVI -- uncompressed (.avi)

QuickTime -- uncompressed, motion JPEG (.mov)

Uncompressed .mxf

MPEG-4 (.mp4) H.264

MPEG-1, MPEG-2 (.mp1, .mp2)

Ogg Theora (.ogv, .ogg)

ProRes  (.mov)

Motion JPEG 2000 (.jp2)

Windows Media Video (.wmv)

RealVideo (.rm, .rv)

all others

Spreadsheet or Database

Comma- or tab-separated Values (.csv, .tsv, .txt)

Delimited text

SIARD: Software Independent Archiving of Relational Databases (.siard)

dBASE (.dbf)

OpenDocument Spreadsheet (.ods)

MS Excel 2007+ (OOXML) (.xlsx)

Excel (.xls)

all others

Computer Programs

 

Computer program source code

Compiled / Executable files

Presentation

PDF/A-1 --  ISO 19005-1 (.pdf)

OpenDocument Presentation (.odp)

MS PowerPoint 2007+ (OOXML) (.pptx)

PowerPoint (.ppt)

all others

Geospatial

GeoTIFF (.tif)

GeoJSON (.json, .geojson)

ESRI Shapefile (making sure all component files are present) (.shp, .shx, .dbf)

ESRI Geodatabase (.gdb) (prefer Shapefiles)

ESRI Export Format (.e00)

Geography Markup Language (GML) (.gml)

Keyhole Markup Language (KML) (.kml, .kmz)

Other ESRI files

Containers

Zip -- no compression

.tar

Zip -- compressed

All others

Quantitative and Statistical Data

 

(See also: Spreadsheet or Database)

Comma- or tab-separated Values (.csv, .tsv, .txt)

Structured text or markup file containing metadata information: Data Documentation Initiative (.ddi), XML (.xml), JSON (.json)

SIARD: Software Independent Archiving of Relational Databases (.siard)

HDF5 (.hdf)

SPSS (.sav, .sps, .spv, .spo)

SAS (.sas, .sas7bdat)

R (.R)

HDF4 (.hdf)

Excel (.xls)

Other proprietary formats

CAD

 

(See also: Vector Graphics)

Industry Foundation Class (.ifc)

Standard for the Exchange of Product Model Data (.step, .stp, .p21)

Initial Graphics Exchange Specification (.igs)

Autodesk's Drawing Interchange File Format/Data eXchange Format (.dxf)

AutoCAD (.dwg)

Extensible 3D (.x3D)

Universal 3D (.u3D)

Portable Document Format/Engineering or PDF3D (.pdf)

Other proprietary CAD formats

Email

MBOX

EML

MSG

PST

 

 

Table reused courtesy of University of Washington Libraries: Preferred File Formats—UW Libraries. (n.d.). Retrieved January 28, 2021, from https://www.lib.washington.edu/preservation/preservation_services/digitization-and-digital-preservation/preferred-file-formats
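
Two rows of the table lend themselves to a quick check before deposit: an ESRI Shapefile needs all of its component files present, and Zip with no compression is the preferred container. The Python sketch below illustrates both under stated assumptions. The helper names `missing_shapefile_parts` and `pack_uncompressed` are hypothetical, only the three extensions named in the table (.shp, .shx, .dbf) are checked, and extension matching is assumed to be lowercase.

```python
import zipfile
from pathlib import Path

# Component files the table lists for an ESRI Shapefile.
REQUIRED_SHAPEFILE_PARTS = (".shp", ".shx", ".dbf")


def missing_shapefile_parts(shp_path):
    """Return the required extensions that are absent next to shp_path."""
    shp = Path(shp_path)
    return [
        ext
        for ext in REQUIRED_SHAPEFILE_PARTS
        if not shp.with_suffix(ext).exists()
    ]


def pack_uncompressed(files, archive_path):
    """Write files into a Zip archive without compression (ZIP_STORED).

    Each file is stored under its bare file name, without directories.
    """
    with zipfile.ZipFile(archive_path, "w", compression=zipfile.ZIP_STORED) as zf:
        for f in files:
            f = Path(f)
            zf.write(f, arcname=f.name)
```

For a hypothetical shapefile named `roads.shp`, calling `missing_shapefile_parts("roads.shp")` first and packing only when it returns an empty list keeps an incomplete dataset from being deposited.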