To maximize the ability to share, preserve and re-use digital files, carefully consider the format you use for them. Choosing an open, widely supported format reduces the risk that your data becomes unreadable when a proprietary format is no longer supported or available. While the information in this guide is intended to support deposits to ScholarsArchive@OSU, the general recommendations apply to any digital files intended for long-term storage and use.
Formats more likely to be accessible in the future are:
- Non-proprietary
- Open, documented standards
- In common usage by the research community
- Use standard character encodings (ASCII, UTF-8)
- Uncompressed (desirable, space permitting)
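One way to act on the character-encoding point is to check a text file before deposit. The following is a minimal sketch, not a ScholarsArchive@OSU tool: the function names `detect_text_encoding` and `convert_to_utf8` are hypothetical, and the conversion assumes you already know the file's current encoding, since that cannot be reliably guessed.

```python
from pathlib import Path


def detect_text_encoding(path):
    """Return 'ascii' or 'utf-8' if the file decodes cleanly, else None."""
    data = Path(path).read_bytes()
    # Try the stricter encoding first: every ASCII file is also valid UTF-8.
    for encoding in ("ascii", "utf-8"):
        try:
            data.decode(encoding)
            return encoding
        except UnicodeDecodeError:
            continue
    return None


def convert_to_utf8(src, dst, source_encoding):
    """Re-encode a text file as UTF-8; the caller supplies the source encoding."""
    # Working on bytes keeps the original line endings untouched.
    text = Path(src).read_bytes().decode(source_encoding)
    Path(dst).write_bytes(text.encode("utf-8"))
```

For example, `detect_text_encoding("notes.txt")` returning `None` suggests the file uses some other encoding (such as ISO 8859-1) and may be worth converting with `convert_to_utf8("notes.txt", "notes_utf8.txt", "iso-8859-1")`.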
Use the table below to find an appropriate and recommended format for preserving and sharing your digital files over the long term. The table indicates the level of confidence that file formats will continue to be accessible over time, based on these characteristics. Formats below the highest confidence level may still have features that are desirable for near-term use. In some instances, it may be appropriate to submit proprietary or other lower-confidence formats alongside a highest-confidence format in order to facilitate near-term use (for instance, an Excel/XLSX file for fully featured near-term use, submitted with a CSV file for long-term preservation).
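As an illustration of producing the CSV companion file mentioned above, here is a minimal sketch using Python's standard `csv` module. It assumes the worksheet's rows are already in memory; reading them out of an XLSX file needs a third-party library, which is not shown. The function name `write_csv`, the file name, and the sample data are all hypothetical.

```python
import csv


def write_csv(path, header, rows):
    """Write a header row and data rows to a UTF-8, comma-separated file."""
    # newline="" lets the csv module control line endings itself.
    with open(path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        writer.writerow(header)
        writer.writerows(rows)


if __name__ == "__main__":
    # Hypothetical sample data standing in for one worksheet.
    sample_header = ["site", "date", "temp_c"]
    sample_rows = [
        ["Corvallis", "2020-01-15", 4.5],
        ["Newport, OR", "2020-01-15", 8.1],
    ]
    write_csv("measurements.csv", sample_header, sample_rows)
```

Values that contain commas, such as `Newport, OR`, are quoted automatically by the writer. Formulas, formatting, and multiple sheets do not survive in CSV, which is one reason to deposit the XLSX file alongside it.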
Most content deposited to ScholarsArchive@OSU is textual in nature: theses and dissertations, research articles, presentations, technical reports, conference proceedings, posters, etc. The PDF file format is required for this content. PDF/A-1 (ISO 19005-1) with fonts embedded (.pdf) is preferred. PDF without fonts embedded is also acceptable but not recommended. To save a Microsoft Word document as a PDF with fonts embedded, follow these instructions: https://www.bc.edu/content/dam/files/libraries/pdf/embed-fonts.pdf.
For other content types, such as quantitative and statistical data, spreadsheets, databases, graphics, audio, and video (among others), use the table below to find an appropriate and recommended format for preserving and sharing your digital files in ScholarsArchive@OSU over the long term. Please note that per the ScholarsArchive@OSU Preservation Policy, repository staff commit to performing format migration for any files submitted using recommended Highest Confidence formats, should those formats become obsolete. For all other file formats, only bit-level preservation is guaranteed.

| Content Type | Highest Confidence | Medium Confidence | Lowest Confidence |
|---|---|---|---|
| Word Processing | PDF/A-1 (ISO 19005-1) (.pdf); PDF/UA (ISO 14289-1) (.pdf) | Portable Document Format / PDF (All other types) (.pdf); OpenDocument Text (Open Office) (.sxw, .odt); MS Word 2007+ (OOXML) (.docx); Rich Text Format (.rtf) | Microsoft Word (.doc); Google Docs (.gdoc); WordPerfect (.wpd) |
| Plain Text | Plain text (US-ASCII, UTF-8, UTF-16 with BOM) (.txt) | Plain text (ISO 8859-x) (.txt) | |
| Structured Text | SGML – with included DTD (.sgm, .sgml); XML – with included schema (.xml); XSL (.xsl) | HTML (.htm, .html); Cascading Style Sheets (.css); LaTeX with referenced files (.latex, .tex); Markdown (.md) | |
| Presentations | PDF/A-1 (ISO 19005-1) (.pdf) | Portable Document Format / PDF (.pdf); OpenDocument Presentation (Open Office) (.sxi, .odp); MS PowerPoint 2007+ (OOXML) (.pptx, .ppsx) | Microsoft PowerPoint (.ppt, .pps); MS PowerPoint 2007+ with macros enabled (.pptm) |
| Scanned Documents | PDF/A-1 (ISO 19005-1) (.pdf); TIFF – uncompressed (.tif, .tiff); JPEG2000 – lossless compression (.jp2) | Portable Document Format / PDF (.pdf) | |
| eBooks | Open eBook File (.epub) | Portable Document Format / PDF (.pdf) | |

| Content Type | Highest Confidence | Medium Confidence | Lowest Confidence |
|---|---|---|---|
| Tabular Data | Comma-Separated Values (.csv); Tab-Separated Values (.tsv); Delimited Text (.txt) | OpenDocument Spreadsheet (Open Office) (.sxc, .ods); MS Excel 2007+ (OOXML) (.xlsx) | Microsoft Excel (.xls); MS Excel 2007+ with macros enabled (.xlsm) |
| Databases | SQLite (.sqlite, various); Software Independent Archiving of Relational Databases (SIARD) (.siard) | dBASE / DBF (.dbf) | |
| Statistical Data | Comma-Separated Values (.csv); Delimited Text (.txt); HDF5 (.hdf) | R (.R, .rdata); SPSS (.sav, .sps, .spv, .spo); SAS (.sas, .sas7bdat); HDF4 (.hdf) | Other proprietary formats |
| Geospatial Data | GeoTIFF (.tif, .tiff); GeoJSON (.json, .geojson); netCDF (.nc) | ESRI Shapefile, with component files (.shp, .shx, .dbf); ESRI Geodatabase (.gdb); ESRI Export Format (.e00); Geography Markup Language (.gml); Keyhole Markup Language (.kml, .kmz) | Other ESRI files; Other proprietary formats |
| Metadata and Markup | XML – with included schema (.xml); JSON – with included metadata (.json); Data Documentation Initiative (.ddi) | | |

| Content Type | Highest Confidence | Medium Confidence | Lowest Confidence |
|---|---|---|---|
| Images: Raster Graphics | TIFF – uncompressed or CCITT 4 compressed (.tif, .tiff); JPEG2000 – lossless compression (.jp2); PNG – 24-bit true color (.png) | TIFF – compressed (.tif, .tiff); JPEG2000 – lossy compression (.jp2); PNG – 8-bit indexed (.png); JPEG (.jpg, .jpeg); GIF (.gif); BMP (.bmp); DNG Digital Negative (.dng) | Photoshop (.psd); MrSID (.sid); Proprietary RAW files (various) |
| Images: Vector Graphics | Scalable Vector Graphics (.svg); PDF/A-1 (ISO 19005-1) (.pdf) | Computer Graphics Metafile (.cgm) | Encapsulated PostScript (.eps); Macromedia Flash (.swf) |
| Audio | Broadcast WAVE (BWAV) – LPCM codec (.bwf); WAVE – PCM, LPCM codec (.wav); AIFF – PCM, LPCM codec (.aif, .aiff) | MPEG Audio Layer III (.mp3); Advanced Audio Coding (.mp4, .aac); Apple Lossless Audio Codec / ALAC (.m4a); Free Lossless Audio Codec (.flac); Standard MIDI (.mid); SUN Audio – uncompressed (.au, .snd) | WAVE – compressed (.wav); AIFC – compressed AIFF (.aifc); RealAudio (.rm, .ra); Windows Media Audio (.wma); Ogg Vorbis (.ogg) |
| Video | FFV1 / Matroska (.mkv); AVI – uncompressed (.avi); QuickTime – uncompressed, motion JPEG (.mov); MXF – uncompressed (.mxf); MPEG-4 – H.264 (.mp4, .m4v); SubRip (subtitle file) (.srt) | MPEG-2 (.mp2, .mpg, .vob); MPEG-1 (.mp1, .mpg); Ogg Theora (.ogg, .ogv); Apple ProRes (.mov); Motion JPEG2000 (.jp2) | Windows Media Video (.wmv); RealVideo (.rm, .rv) |

| Content Type | Highest Confidence | Medium Confidence | Lowest Confidence |
|---|---|---|---|
| 3D/CAD | Industry Foundation Classes (.ifc); Standard for the Exchange of Product Model Data (.step, .stp, .p21); Initial Graphics Exchange Specification (.igs) | Autodesk's Drawing Interchange File Format / Data eXchange Format (.dxf); AutoCAD (.dwg); Extensible 3D (.x3D); Universal 3D (.u3D); Portable Document Format / Engineering (PDF/E) or PDF3D (.pdf) | Other proprietary formats |

| Content Type | Highest Confidence | Medium Confidence | Lowest Confidence |
|---|---|---|---|
| Containers | ZIP – uncompressed (.zip); TAR Tape Archive (.tar); BitTorrent files (.torrent) are required for depositing files larger than 5GB. See the BitTorrent Guide for more information. | ZIP – compressed (.zip); GNU Zip / GZip – compressed (.gz); GZip compressed tarballs (.tar.gz) | |

Remember that container files are only as good as their contents! The files inside the ZIP or TAR wrapper should still adhere as closely as possible to the best practices and recommendations elsewhere in this guide.
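To show what an uncompressed ZIP looks like in practice, here is a minimal sketch using Python's standard `zipfile` module with the `ZIP_STORED` method, which packs files without compressing them. The function name `make_stored_zip` and the folder names in the usage note are hypothetical.

```python
import zipfile
from pathlib import Path


def make_stored_zip(source_dir, zip_path):
    """Bundle every file under source_dir into a ZIP without compression."""
    source_dir = Path(source_dir)
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_STORED) as archive:
        for file_path in sorted(source_dir.rglob("*")):
            if file_path.is_file():
                # Store paths relative to the folder so the archive unpacks cleanly.
                relative_name = file_path.relative_to(source_dir).as_posix()
                archive.write(file_path, arcname=relative_name)
    return zip_path
```

A call such as `make_stored_zip("dataset", "dataset.zip")` would bundle the contents of a `dataset` folder. Write the archive outside the folder being packed so that it does not try to include itself.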

| Content Type | Highest Confidence | Medium Confidence | Lowest Confidence |
|---|---|---|---|
| Email | MBOX Email Format (.mbox); EML Internet Message Format (.eml, .mht, .mhtml) | MSG Microsoft Outlook Item Message Format (.msg); PST Microsoft Personal Folders Format (.pst) | |

| Content Type | Highest Confidence | Medium Confidence | Lowest Confidence |
|---|---|---|---|
| Software or Computer Code | Computer program source code; Notebook documents combining text and executable code, e.g. Jupyter Notebook files (.ipynb)* | Compiled or executable files (various) | |

*As a JSON-based (and thus Highest Confidence) format, a Jupyter Notebook file may be readily preserved as a structured text document. However, the functionality of notebook-type documents is not guaranteed long term. Depositors of Jupyter Notebooks and similar formats to ScholarsArchive@OSU are encouraged to include with their .ipynb file(s) a PDF of the notebook with the code compiled, along with a Readme describing the environment, filesystem, dependencies, etc.
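Part of the environment description for such a Readme can be generated rather than typed by hand. The sketch below is one possible approach for a Python-based notebook, using only the standard library (Python 3.8 or later); the function name `describe_environment` is hypothetical, and details such as the filesystem layout or non-Python dependencies would still need to be written up manually.

```python
import platform
import sys
from importlib import metadata


def describe_environment():
    """Return Readme-ready text listing the interpreter, OS, and installed packages."""
    # Collect "name==version" for every installed distribution that has a name.
    packages = sorted(
        {
            f"{dist.metadata['Name']}=={dist.version}"
            for dist in metadata.distributions()
            if dist.metadata["Name"]
        },
        key=str.lower,
    )
    lines = [
        f"Python version: {sys.version.split()[0]}",
        f"Platform: {platform.platform()}",
        "Installed packages:",
    ]
    lines.extend(f"  {package}" for package in packages)
    return "\n".join(lines) + "\n"
```

Running `print(describe_environment())` in the same environment as the notebook, and pasting the result into the Readme, records the versions that were in use at deposit time.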
Over time, new formats emerge and recommended formats fall out of favor. This table is reviewed annually, and is subject to change. Still have questions about file formats for ScholarsArchive@OSU deposits? Contact us.