Research Data Services

Information about how to organize, describe, preserve and share your research data

Intellectual Property Rights & Licensing Data

Introduction

Intellectual property rights (IPR) management is an important part of any data management program. As the builder of a database or other data resource, you have an interest in who owns that resource and how it may be used. As someone who may populate that resource with data provided in part by others, you will want to make sure that you meet all legal, ethical, and professional obligations to the providers of that data. And because the benefits of data sharing are well known and well documented, you may wish to share your database, its content, or both with others. Others can make full use of your work only if they know what terms (if any) you have placed on the use of the data. This fact sheet provides a brief overview of some of the issues associated with managing IPR in data projects.


Layers of copyright protection in a dataset

In any data project, there are likely to be two components. The first is the data that you have collected/assembled/generated. Think of it as the raw content in the system. It could be hourly temperature readings from a sensor, the age of individuals in a survey, recordings of individual voices, or photographs of plant specimens. The second component is the data system in which the data is stored and managed.

We usually don’t think of data content separate from the system in which it is stored, but the distinction is important in terms of intellectual property rights. The question is what, if anything, is protected by copyright. Data that is factual has no copyright protection under U.S. law; facts can’t be copyrighted. But not all data is in the public domain. A project might, for example, be built around copyrighted photographs; the photographs are part of the project’s “data.” But in many cases, the data in a data management system as well as the metadata describing that data will be factual, and hence not protected by copyright.

The organization of the data in a database, on the other hand, can have a thin layer of copyright protection. Deciding what data needs to be included in a database, how to organize the data, and how to relate different data elements are all creative decisions that may receive copyright protection.

Datasets can include other creative ways of documenting and explaining the data, such as annotations or visualizations. Charts and figures, if they are sufficiently original, are protected by copyright.  

Datasets can also include different subsets of data, some of which are covered by copyright and some of which aren't. For example, a dataset might combine a collection of CSV files, which are factual and not protected by copyright, with a collection of software programs that creatively combine, operate on, and visualize the data.

Because the layers of a dataset differ in copyright status, different mechanisms are required to manage each. Copyright can govern the use of databases and of data content that is itself original. Contract law, trademarks, and other mechanisms are required to regulate factual data. To protect a dataset made up of factual information, you need to plan before making the data publicly available. Otherwise, the data will be in the public domain and may be hard to protect.
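As an illustration only, the layers described above could be tracked in a small inventory that records, for each component of a project, which mechanism is likely to apply. The following Python sketch is not legal advice; the category names, file names, and the `Component` class are hypothetical and exist only for this example.

```python
from dataclasses import dataclass
from typing import Dict, List

# Mapping based on the discussion above. The category labels are
# hypothetical names chosen for this sketch, not legal terms of art.
MECHANISM_BY_KIND: Dict[str, str] = {
    "factual_data": "contract or other mechanisms (facts are not copyrightable under U.S. law)",
    "creative_content": "copyright",
    "database_structure": "copyright (thin protection)",
    "visualization": "copyright (if sufficiently original)",
    "software": "copyright",
}


@dataclass
class Component:
    """One part of a data project, tagged with the layer it belongs to."""
    name: str
    kind: str


def mechanisms(components: List[Component]) -> Dict[str, str]:
    """Return the likely protection mechanism for each component name."""
    return {c.name: MECHANISM_BY_KIND[c.kind] for c in components}


# Hypothetical project contents, for illustration only.
example = [
    Component("hourly_temperatures.csv", "factual_data"),
    Component("specimen_photos/", "creative_content"),
    Component("schema.sql", "database_structure"),
    Component("plots.py", "software"),
]

for name, mechanism in mechanisms(example).items():
    print(f"{name}: {mechanism}")
```

An inventory like this makes it easier to see which parts of a project can be licensed under copyright and which need a different approach before the data is released.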


Licensing your data for reuse by others

In order to facilitate the reuse of data, it is imperative that others know the terms on which you are making both the database and the data content available. Fortunately, the Open Data Commons group (http://opendatacommons.org/) has been developing legally binding tools to govern the use of data sets. Using a combination of copyright and contractual standards, they have created three standard licenses that can be used in conjunction with data projects. In addition, it is possible to articulate a set of “community norms” that complement the use of formal licenses. While not having the force of law, norms can express the shared beliefs of a community vis-à-vis data sharing and reuse.

The three ODC licenses are:

  1. Public Domain Dedication and License (PDDL): This dedicates the database and its content to the public domain, free for everyone to use as they see fit.
  2. Attribution License (ODC-By): Users are free to use the database and its content in new and different ways, provided they provide attribution to the source of the data and/or the database.
  3. Open Database License (ODC-ODbL): ODbL stipulates that any subsequent use of the database must provide attribution, an unrestricted version of the new product must always be accessible, and any new products made using ODbL material must be distributed using the same terms. It is the most restrictive of all ODC licenses.

Creative Commons (http://www.creativecommons.org) also has a library of standardized licenses, and some of them can be applied to data and databases. The ODC-By license, for example, is the equivalent of a Creative Commons Attribution license (CC BY). CC BY licenses, however, are based on copyright ownership of the underlying work, whereas the ODC-By license can apply to works that are not protected by copyright (such as factual data).

The three CC licenses and tools that are of greatest relevance to data management are:

  1. CC0 (i.e., "CC Zero"): When an owner wishes to waive their copyright and/or database rights, CC0 can be used. It effectively places the database and data into the public domain. It is the functional equivalent of an ODC PDDL license.
  2. Public Domain mark (PDM): It is used to mark works that are in the public domain, and for which there are no known copyright or database restrictions. Factual data in a database, for example, might be flagged as PDM in order to make it clear it is free to use.
  3. CC-BY: It is used when an owner wishes to allow their copyrighted work to be reused and shared with the condition that appropriate credit is given. 
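One way to make the chosen terms unambiguous for both people and software is to record them as standard identifiers in the dataset's metadata. The sketch below assumes the SPDX license identifiers, which this guide does not itself mention; the record layout and the `license_record` function are hypothetical. The Public Domain Mark is not a license, so it is left without an identifier here.

```python
import json
from typing import Dict, Optional

# Short names used in this guide mapped to SPDX license identifiers.
SPDX_IDS: Dict[str, Optional[str]] = {
    "PDDL": "PDDL-1.0",
    "ODC-By": "ODC-By-1.0",
    "ODC-ODbL": "ODbL-1.0",
    "CC0": "CC0-1.0",
    "CC-BY": "CC-BY-4.0",  # version 4.0 is an assumption; the guide names no version
    "PDM": None,           # a mark for public-domain works, not a license
}


def license_record(title: str, database_license: str, content_license: str) -> str:
    """Build a JSON metadata record stating terms for the database and its content.

    The field names are hypothetical; adapt them to your repository's schema.
    """
    record = {
        "title": title,
        "database_license": SPDX_IDS[database_license],
        "content_license": SPDX_IDS[content_license],
    }
    return json.dumps(record, indent=2)


print(license_record("Example sensor database", "ODC-By", "CC0"))
```

Keeping separate fields for the database and for its content reflects the point above that the two layers may be offered on different terms.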


Which license to select?

There is no single right answer as to which license to assign to a database or its content. Note, however, that anything other than an ODC PDDL or CC0 license may cause serious problems for subsequent scientists and other users because of attribution stacking. It may be possible to extract data from a single data set, use it in your own research project, and still maintain information about the source of that data. Data could conceivably come from hundreds of sources, however, with each source wishing to be acknowledged. Furthermore, the data in those source databases may not have originated there, but may have been extracted from still other databases that also demand attribution. Rather than legally requiring everyone to provide attribution, it might be enough to have a community norm that says “if you make extensive use of data from this data set, please credit the authors.”
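The way attribution requirements accumulate can be sketched with a small provenance graph. The Python example below is a minimal illustration, assuming every source in the chain demands attribution; the dataset names and the `SOURCES` structure are hypothetical.

```python
from typing import Dict, List, Set

# Hypothetical provenance graph: each dataset lists the datasets it drew data from.
SOURCES: Dict[str, List[str]] = {
    "my_study": ["regional_db", "sensor_archive"],
    "regional_db": ["county_survey_a", "county_survey_b"],
    "sensor_archive": ["station_1", "station_2", "station_3"],
}


def required_attributions(dataset: str, sources: Dict[str, List[str]]) -> Set[str]:
    """Collect every direct and indirect source that would need to be credited."""
    seen: Set[str] = set()
    stack = list(sources.get(dataset, []))
    while stack:
        current = stack.pop()
        if current in seen:
            continue  # already counted; also guards against circular references
        seen.add(current)
        stack.extend(sources.get(current, []))
    return seen


credits = required_attributions("my_study", SOURCES)
print(f"{len(credits)} sources to credit: {sorted(credits)}")
```

Even in this toy case, a project that draws on two databases ends up owing credit to seven sources, and the list grows with every additional layer of reuse.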

CC-BY licenses are often applied to works that are not covered by copyright in order to encourage attribution. This practice is not recommended because it creates confusion about how a dataset may be reused. If you are trying to decide between CC0 and CC-BY, consider that:

  • Attaching a license to something that is in the public domain (like factual data that has been publicly shared) does not give you any extra rights. 
  • In academia, it is expected that if somebody uses your work or builds upon it, you will be given attribution. Not doing that, even if the work is in the public domain, can be considered plagiarism, which is a serious form of research misconduct. You should not need to license something as CC-BY to encourage attribution in academic circles.

 

This license wizard, created by the Institute of Formal and Applied Linguistics, may help you decide which license to use for your dataset or software: https://ufal.github.io/public-license-selector/


Who is the owner of factual data and copyrighted data sets?

The ownership of works produced by OSU faculty, students, and non-academic staff is governed by the OUS Internal Management Directive 6.215 - Rights to Inventions, Technological Improvements, Educational and Professional Materials, and especially the OSU Patent Policy. The precise answer will depend on whether the project was created as part of sponsored research; the employment status of the creator; whether the work has “been developed in the course of employment” or “during conduct of normal activities”; and whether substantial university resources were used in the creation of the work.

 

Further reading

“How to License Research Data” at http://www.dcc.ac.uk/resources/how-guides/license-research-data. Written with British law in mind, but it has a good discussion of the pros and cons of the ODC licenses.

Jordan S. Hatcher: "Why we can't use the same open licensing approach for databases as we do for content and software." https://www.semantic-web.at/news/jordan-s-hatcher-x22-why-we-can-x27-t-use-the-same-open-licensing-approach-for-databases-a

“Open Data” Wikipedia, http://en.wikipedia.org/wiki/Open_data

Naomi Korn and Professor Charles Oppenheim, “Licensing Open Data: A Practical Guide” http://discovery.ac.uk/files/pdf/Licensing_Open_Data_A_Practical_Guide.pdf. Another guide written with UK law in mind, but with a helpful comparison of CC and ODC licensing options.

Guide to Open data licensing by opendefinition.org: http://opendefinition.org/guide/data/

Carroll, M. W. (2015). Sharing Research Data and Intellectual Property Law: A Primer. PLOS Biology, 13(8), e1002235. https://doi.org/10.1371/journal.pbio.1002235

 

 

(Original content gratefully adapted from Peter Hirtle, Senior Policy Advisor, Cornell University Library)
