Collection Development and Maintenance

Datasets

Dataset Acquisitions

OSU Libraries acquire datasets to improve discovery and access for learners and researchers who might not otherwise be able to retrieve the content.

Datasets are recorded, factual information and their documentation, regardless of the form of the media on which they are recorded, that are used for research or educational purposes. Examples include, but are not limited to, quantitative and qualitative data, tabular data, textual data, images, spatial data, and statistics.

Scope:

  • Quantitative and qualitative data, tabular data, textual data, images, spatial data, and statistical data will all be considered.
  • To be included in our collections, data can be purchased, licensed, or open.
    • Purchased data are data that the OSU Libraries own outright (although there may be terms of use in a contract).
    • Licensed data are data that the OSU Libraries essentially rent for a period of time, which may be predetermined or ongoing.
    • Open data are data that are openly available to the public.
  • Data should be relevant to research and curricular needs of OSU.
  • Any time period, geography, or language will be considered.
  • Not in scope for this policy are:
    • Any data resources published in print or as an ebook, unpublished print lists, or any other collection of organized data in analog form.
    • Data visualizations collected as supplementary materials for dataset acquisition.
    • Datasets generated by OSU researchers in the course of their research. These datasets can be deposited, shared and preserved through OSU’s institutional repository, ScholarsArchive@OSU.

Considerations:

  • To maximize impact, we strive to build a data collection that is useful to and accessible by the largest number of OSU constituents. Datasets purchased by OSU Libraries should be of broad curricular and research interest, and preference will be given to requests that meet this consideration. Access to data cannot be restricted to individual schools, departments, buildings, or specific groups.
  • The OSU Libraries cannot acquire “single user” datasets for individuals. That is, the OSU Libraries cannot acquire datasets with restrictive license terms or technical requirements that limit use to one individual or group, or to one specific project or purpose. Perpetual access datasets are preferred.
  • The OSU Libraries are not responsible for downloading or making available any specific software that might be required to read or analyze the data.
  • Acquisition decisions will be considered by the Data Management Specialist, the relevant Liaison Librarian(s), Discovery Services, Collections Management, and representatives of the Emerging Technologies and Services department as necessary.
  • Size will be a consideration for acquiring datasets hosted by OSU Libraries.
  • Data will not require frequent or costly updates.
  • The vendor is a reliable business partner that supports patron privacy and current accessibility standards.
  • Data quality is guided by:
    • The FAIR principles: the framework used by the academic community to create data that can be reused. To be FAIR, data must be Findable, Accessible, Interoperable, and Reusable. More information on the FAIR principles is linked below. OSU Libraries will give priority to datasets that are documented, recorded, and distributed in a way that follows the guidance of the FAIR principles, to maximize the usability of the data by the OSU community.
      • Data recorded in formats that are preferably:
        • Actionable and machine readable with contemporary computing infrastructure.
        • Platform independent, accessible in all operating systems.
        • Character-based rather than native or binary.
        • Non-proprietary, or usable without specialized software.
        • Not encrypted.
      • Datasets accompanied by comprehensive and accurate documentation and metadata. This includes the information necessary to catalog, administer, and accurately cite the dataset (for example, creator, title, publisher, and abstract), and the information necessary to understand and interpret the dataset accurately (this may include data dictionaries, codebooks, methodological protocols, or the code or software used to create the dataset).
    • Data from a credible and reliable source.

 

Policy Information

This policy was last reviewed October 2022.