
ScholarsArchive@OSU User Guide

This guide helps ScholarsArchive users deposit and manage their content.

Preferred File Formats

To maximize the ability to share, preserve and re-use digital files, carefully consider the format you use for digital files. Selection of a file format can help you in the future by limiting the chances of your data becoming obsolete when a proprietary format is no longer supported or available. 

 Formats more likely to be accessible in the future are: 

  • Non-proprietary 

  • Open, documented standards 

  • In common usage by the research community 

  • Use standard character encodings (ASCII, UTF-8) 

  • Uncompressed (desirable, space permitting) 
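As an illustration of the character-encoding point above, the following is a minimal sketch (not part of the repository's own tooling) that checks whether a text file decodes as ASCII or UTF-8 and, if not, rewrites it as UTF-8. The function names and the default source encoding of ISO 8859-1 are assumptions made for this example; the true source encoding of a legacy file has to be known or determined separately.

```python
from pathlib import Path


def detect_text_encoding(path):
    """Return 'ascii' or 'utf-8' if the file decodes cleanly, else None."""
    data = Path(path).read_bytes()
    for encoding in ("ascii", "utf-8"):
        try:
            data.decode(encoding)
            return encoding
        except UnicodeDecodeError:
            continue
    return None


def transcode_to_utf8(src, dst, source_encoding="iso-8859-1"):
    """Rewrite a text file as UTF-8.

    source_encoding is an assumption supplied by the caller; this
    function does not try to guess it.
    """
    text = Path(src).read_bytes().decode(source_encoding)
    Path(dst).write_bytes(text.encode("utf-8"))
```

A file for which `detect_text_encoding` returns `None` is a candidate for `transcode_to_utf8` before deposit.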

Use the tables below to find an appropriate and recommended format for preserving and sharing your digital files over the long term. The tables indicate the level of confidence that file formats will continue to be accessible over time, based on these characteristics. Formats that are not rated highest confidence may still have features that are desirable for near-term use. In some instances, it may be appropriate to submit proprietary or other lower-confidence formats alongside a highest-confidence format in order to facilitate near-term use (for instance, an Excel/XLSX file for fully featured near-term use, submitted with a CSV file for long-term preservation).
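A companion CSV file can be produced with very little code. The sketch below uses only Python's standard `csv` module and assumes the worksheet contents have already been read into a header list and a list of rows (reading an XLSX file itself requires a third-party library, which is not shown). The function name `write_csv` is hypothetical.

```python
import csv


def write_csv(path, header, rows):
    """Write tabular data as comma-separated values encoded in UTF-8."""
    # newline="" lets the csv module control line endings itself.
    with open(path, "w", encoding="utf-8", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow(header)
        writer.writerows(rows)
```

For example, `write_csv("measurements.csv", ["site", "temperature_c"], [["A1", 14.2], ["B7", 15.8]])` would write a three-line file that can be deposited next to the original spreadsheet. Formulas, formatting, and multiple sheets are not carried over, which is why the spreadsheet may still be worth submitting alongside it.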

Most content deposited to ScholarsArchive@OSU is textual in nature: theses and dissertations, research articles, presentations, technical reports, conference proceedings, posters, etc. The PDF file format is required for this content. PDF/A-1 (ISO 19005-1) with fonts embedded (.pdf) is preferred. PDF without fonts embedded is also acceptable but not recommended. To save a Microsoft Word document as a PDF with fonts embedded, follow these instructions:

For other content types--such as quantitative and statistical data, spreadsheets, databases, graphics, audio, and video (among others)--use the tables below to find an appropriate and recommended format for preserving and sharing your digital files in ScholarsArchive@OSU over the long term. Please note that, per the ScholarsArchive@OSU Preservation Policy, repository staff commit to performing format migration for any files submitted in recommended Highest Confidence formats, should those formats become obsolete. For all other file formats, only bit-level preservation is guaranteed.
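Bit-level preservation means the stored bytes stay identical, whether or not future software can still open the file. This guide does not describe how the repository verifies that, so the following is only an illustrative sketch of one common approach: recording a checksum before deposit and comparing it later. The function name and the choice of SHA-256 are assumptions for this example.

```python
import hashlib


def sha256_of_file(path, chunk_size=1024 * 1024):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so large files do not need to fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Two copies of a file with the same digest are, for practical purposes, bit-for-bit identical.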


Document Formats

  Highest Confidence Medium Confidence Lowest Confidence
Word Processing PDF/A-1 (ISO 19005-1)

PDF/UA (ISO 14289-1)
Portable Document Format / PDF (All other types)

OpenDocument Text (Open Office)
(.sxw, .odt)

MS Word 2007+ (OOXML)

Rich Text Format
Microsoft Word

Google Docs

Plain Text Plain text (US-ASCII, UTF-8, UTF-16 with BOM)
Plain text (ISO 8859-x)
Structured Text SGML – with included DTD
(.sgm, .sgml)

XML – with included schema

HTML
(.htm, .html)

Cascading Style Sheets

LaTeX with referenced files
(.latex, .tex)

Presentations PDF/A-1 (ISO 19005-1)
Portable Document Format / PDF

OpenDocument Presentation (Open Office)
(.sxi, .odp)

MS PowerPoint 2007+ (OOXML)
(.pptx, .ppsx)
Microsoft PowerPoint
(.ppt, .pps)

MS PowerPoint 2007+ with macros enabled
Scanned Documents PDF/A-1 (ISO 19005-1)

TIFF – uncompressed
(.tif, .tiff)

JPEG2000 – lossless compression
Portable Document Format / PDF
eBooks Open eBook File
Portable Document Format / PDF

Structured Data Formats

  Highest Confidence Medium Confidence Lowest Confidence
Tabular Data Comma-Separated Values

Tab-Separated Values

Delimited Text
OpenDocument Spreadsheet (Open Office)
(.sxc, .ods)

MS Excel 2007+ (OOXML)
Microsoft Excel

MS Excel 2007+ with macros enabled
Databases SQLite
(.sqlite, various)

Software Independent Archiving of Relational Databases (SIARD)
Statistical Data Comma-Separated Values

Delimited Text

R
(.R, .rdata)

SPSS
(.sav, .sps, .spv, .spo)

SAS
(.sas, .sas7bdat)

Other proprietary formats
Geospatial Data GeoTIFF
(.tif, .tiff)

GeoJSON
(.json, .geojson)

ESRI Shapefile, with component files
(.shp, .shx, .dbf)

ESRI Geodatabase

ESRI Export Format

Geography Markup Language

Keyhole Markup Language
(.kml, .kmz)
Other ESRI files

Other proprietary formats
Metadata and Markup XML – with included schema

JSON – with included metadata

Data Documentation Initiative

Audio-Visual Material Formats

  Highest Confidence Medium Confidence Lowest Confidence
Raster Graphics
TIFF – uncompressed or CCITT 4 compressed
(.tif, .tiff)

JPEG2000 – lossless compression

PNG – 24-bit true color
TIFF – compressed
(.tif, .tiff)

JPEG2000 – lossy compression

PNG – 8-bit indexed

JPEG
(.jpg, .jpeg)



DNG Digital Negative


Proprietary RAW files
Vector Graphics
Scalable Vector Graphics

PDF/A-1 (ISO 19005-1)
Computer Graphics Metafile
Encapsulated PostScript

Macromedia Flash
Audio Broadcast WAVE (BWAV) – LPCM codec

WAVE – PCM, LPCM codec

AIFF – PCM, LPCM codec
(.aif, .aiff)
MPEG Audio Layer III

Advanced Audio Coding
(.mp4, .aac)

Apple Lossless Audio Codec / ALAC

Free Lossless Audio Codec

Standard MIDI

SUN Audio – uncompressed
(.au, .snd)
WAVE – compressed

AIFC – compressed AIFF

RealAudio
(.rm, .ra)

Windows Media Audio

Ogg Vorbis
Video FFV1 / Matroska

AVI – uncompressed

QuickTime – uncompressed, motion JPEG

MXF – uncompressed

MPEG-4 – H.264
(.mp4, .m4v)

SubRip (subtitle file)
MPEG-2
(.mp2, .mpg, .vob)

MPEG-1
(.mp1, .mpg)

Ogg Theora
(.ogg, .ogv)

Apple ProRes

Motion JPEG2000
Windows Media Video

RealVideo
(.rm, .rv)

3D/CAD Formats

  Highest Confidence Medium Confidence Lowest Confidence
3D/CAD Industry Foundation Classes

Standard for the Exchange of Product Model Data
(.step, .stp, .p21)

Initial Graphics Exchange Specification
Autodesk’s Drawing Interchange File Format / Data eXchange Format


Extensible 3D

Universal 3D

Portable Document Format / Engineering or PDF3D
Other proprietary formats

Container Formats

  Highest Confidence Medium Confidence Lowest Confidence
Containers ZIP – uncompressed

TAR Tape Archive

BitTorrent files (.torrent) are required for depositing files larger than 5 GB. See the BitTorrent Guide for more information.
ZIP – compressed

GNU Zip / GZip – compressed

GZip compressed tarballs

Remember that container files are only as good as their contents! The files inside the ZIP or TAR wrapper should still adhere as closely as possible to the best practices and recommendations elsewhere in this guide.
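For the highest-confidence container option above (ZIP, uncompressed), a minimal sketch using Python's standard `zipfile` module is shown below. The function name `make_stored_zip` is hypothetical; `zipfile.ZIP_STORED` is the standard-library constant for storing members without compression.

```python
import zipfile
from pathlib import Path


def make_stored_zip(archive_path, files):
    """Bundle files into a ZIP archive without compressing them."""
    with zipfile.ZipFile(archive_path, "w", compression=zipfile.ZIP_STORED) as archive:
        for file_path in files:
            # arcname keeps only the file name, not the full local path.
            archive.write(file_path, arcname=Path(file_path).name)
```

Because only file names are kept, files that share a name in different folders would collide in the archive; a real packaging script would need to preserve relative paths instead.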

Email Formats

  Highest Confidence Medium Confidence Lowest Confidence
Email MBOX Email Format

EML Internet Message Format
(.eml, .mht, .mhtml)
MSG Microsoft Outlook Item Message Format

PST Microsoft Personal Folders Format

Software/Computer Code Formats

  Highest Confidence Medium Confidence Lowest Confidence
Software or Computer Code   Computer program source code
Compiled or executable files