Why we should publish our data under Creative Commons Zero (CC0)

With the first datasets being published and more coming soon, the question arises of which license we, the Canadensys community and the individual collections, should publish our data under. Dealing with the legal stuff can be tedious, which is why we looked into this issue with the Canadensys Steering Committee and the Science and Technology Advisory Board before opening the discussion to the whole community.

By data we mean specimen, observation or checklist datasets published as a Darwin Core Archive and any derivatives. To keep the discussion focused, this does not include pictures or software code.

2012.01.30 – Update to post: technically CC0 is not a license, but a waiver (see comment below).

What we hope to achieve

  1. One license for the whole Canadensys community, which is easier for aggregation and sends a strong message as one community.
  2. An existing license, because we don’t want to write our own legal documents.
  3. An open license, allowing our data to be really used.
  4. A clear license, so users can focus on doing great research with the data, instead of figuring out the fine print.
  5. Giving credit where credit is due.

Our recommendation

We recommend that Canadensys participants publish their data under Creative Commons Zero (CC0). With CC0 you waive any copyright you might have over the data(set) and dedicate it to the public domain. Users can copy, use, modify and distribute the data without asking your permission. You cannot be held liable for any (mis)use of the data either.

CC0 is recommended for data and databases and is used by hundreds of organizations. It is especially recommended for scientific data and thus encouraged by Pensoft (see their guidelines for biodiversity data papers) and Nature (see this opinion piece). Although CC0 doesn’t legally require users of the data to cite the source, it does not take away the moral responsibility to give attribution, as is common in scientific research (more about that below).

Why would I waive my copyright?

For starters, there’s very little copyright to be had in our data, datasets and databases. Copyright only applies to creative content and 99% of our data are facts, which cannot be copyrighted. We do hold copyright over some text in remarks fields, the data format or database model we chose/created, and pictures. If we consider a Darwin Core Archive (which is how we are publishing our data) the creative content is even further reduced: the data format is a standard and we only provide a link to pictures, not the pictures themselves.

Figuring out where the facts stop and where the (copyrightable) creative content begins can already be difficult for the content owner, so imagine what a legal nightmare it can become for the user. On top of that, different rules apply in different countries. Publishing our data under CC0 removes any ambiguity and red tape. We waive any copyright we might have had over the creative content and our data gets the legal status of public domain. It can no longer be copyrighted by anyone.

Can’t we use another license?

Let’s go over the options. Keep in mind that these licenses only apply to the creative aspect of the dataset, not the facts. But as pointed out above, figuring this out can be difficult or impossible for the user. So much so in fact, that the user may decide not to use the data at all, especially if they think they might not meet the conditions of the license.

All rights reserved

The user cannot use the data(set) without the permission of the owner.

Conclusion: Not good.

Open Data Commons Public Domain Dedication and License (PDDL)

There are no restrictions on how to use the data. This license is very similar to CC0.

Conclusion: Perfect, in fact this license was a precursor of CC0, but… it is less well known and maybe not as legally thorough as CC0. CC0 made a huge effort to cover legislation in almost all countries and the Creative Commons community is working hard to improve this even further. Therefore, if you have to choose, CC0 is probably better.

Creative Commons Attribution-NoDerivs (CC BY-ND)

The user cannot build upon the data(set), which is what most data use involves.

Conclusion: Not good, and sadly used by theplantlist.org. Roderic Page pointed this out by showing what cool things he can NOT do with the data.

Creative Commons Attribution-NonCommercial (CC BY-NC)

The user cannot use the data(set) for commercial purposes. This seems fine from an academic viewpoint, but the license is a lot more restrictive than one would intuitively think. See: Hagedorn, G. et al. (2011). Creative Commons licenses and the non-commercial condition: Implications for the re-use of biodiversity information. ZooKeys 150.

Conclusion: Not good.

Creative Commons Attribution-ShareAlike (CC BY-SA) or Open Data Commons Open Database License (ODbL)

The user has to share any work based upon the data(set) under a license that is identical or similar to the one used.

Conclusion: Good, but… this can lead to some problems for an aggregator like Canadensys or GBIF: if they are mixing and merging data with different SA licenses, which one do they choose? They might be incompatible.

Creative Commons Attribution (CC BY) or Open Data Commons Attribution License (ODC-By)

The user has to attribute the data(set) in the manner specified by the owner. This condition is also present in the three licenses above.

Conclusion: Good, but… this can lead to impractical “attribution stacking”. If an aggregator or a user of that aggregator is using and integrating different datasets provided under a BY license, they legally have to cite the owner for each and every one of those in the manner specified by these owners (again, for the potential creative content in the data). See point 5.3 at the bottom of this Creative Commons page for a better explanation and this blog post for an example.
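To make attribution stacking concrete, here is a minimal Python sketch. Everything in it is hypothetical: the dataset names, the attribution wordings and the `Record` structure are made up for illustration and do not come from any real Canadensys or GBIF dataset. It only shows how the number of distinct attribution statements grows with the number of source datasets in a merged download.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    """Hypothetical occurrence record tagged with its source dataset."""
    record_id: str
    source_dataset: str


# Hypothetical attribution statements, each worded differently by its owner.
REQUIRED_ATTRIBUTION = {
    "dataset-a": "Data provided by Collection A (cite as: Collection A, 2012).",
    "dataset-b": "Please acknowledge Collection B in all derived products.",
    "dataset-c": "Credit: Collection C, with a link back to its portal.",
}


def attributions_for(records: list[Record]) -> list[str]:
    """Return every distinct attribution statement the merged records carry."""
    sources = sorted({r.source_dataset for r in records})
    return [REQUIRED_ATTRIBUTION[s] for s in sources]


# Four merged records drawn from three differently licensed sources.
merged = [
    Record("1", "dataset-a"),
    Record("2", "dataset-b"),
    Record("3", "dataset-c"),
    Record("4", "dataset-a"),
]
stack = attributions_for(merged)
for statement in stack:
    print(statement)
```

With three sources this is manageable, but an aggregated download can draw on far more datasets, and every derivative product inherits the whole stack.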

But giving credit is a good thing!

Absolutely, but legally enforcing it can lead to the opposite effect: a user may decide not to use the data out of fear of not completely complying with the license (see paragraph above). As hinted at the beginning of this post, CC0 removes the drastic legally enforceable requirement to give attribution, but it does not remove the moral obligation to give attribution. In fact, this has been the common practice in scientific research for many decades: legally, you don’t have to cite the research/data you’re using, but not doing so could be considered plagiarism, which would compromise your reputation and the credibility of your work.

To encourage users to give credit where credit is due, we propose to create Canadensys norms. Norms are not a legal document (see an example here), but a “code of conduct” where we declare how we would like users to use, share and cite our data, and how they can participate. We can explain how one could cite an individual specimen, a collection, a dataset or an aggregated “Canadensys” download. We can point out that our data are constantly being corrected or added to, so it is useful to keep coming back to the original repository and not to a secondary repository that may not have been updated. In addition to that, we can build tools to monitor downloads or automatically create an adequate citation. And with the arrival of data papers, drafts of which can now be automatically generated from the IPT, data(sets) are really brought into the realm of traditional publishing and the associated scientific recognition.
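A tool that automatically creates a citation could be as small as the sketch below. This is only an illustration: the `Dataset` fields, the citation format and the example.org URL are assumptions made up for this example, not an agreed Canadensys norm.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Dataset:
    """Hypothetical minimal metadata for a published dataset."""
    publisher: str
    title: str
    url: str


def format_citation(dataset: Dataset, accessed: date) -> str:
    """Build a citation string in an illustrative, made-up format."""
    return (
        f"{dataset.publisher}. {dataset.title}. "
        f"{dataset.url} (accessed on {accessed.isoformat()})"
    )


# Placeholder dataset; none of these values refer to a real collection.
example = Dataset(
    publisher="Example Herbarium",
    title="Example Herbarium specimen dataset",
    url="http://example.org/dataset/herbarium",
)
print(format_citation(example, date(2012, 1, 30)))
```

Including the access date in the citation reflects the point above that the data keep changing, so a reader can tell which state of the repository was used.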

Conclusion

All this to say that there are mechanisms where both users and data owners can benefit, without the legal burden. CC0 + norms guarantees that our data can be used now and in the future. I for one will update the license for our Université de Montréal Biodiversity Centre datasets. We hope you will join us!

Thanks to Gregor Hagedorn for his valuable advice on all the intricacies of data licensing.

  • Felix Sperling

    I think this is an excellent summary, and the section on “But giving credit is a good thing” is especially good. I’m ‘on board’ with CC0.

  • timothy vollmer

    This is a fantastic post! One nit is that CC0 is technically not a license, but a waiver. Thanks. 

    timothy
    Creative Commons

    • Peter Desmet

      Thanks a lot! I was actually wondering today why CC0 is not listed under licenses on http://www.creativecommons.org. This explains why.
      Guess I’ll have to update my post and title. :-)

      • Peter Desmet

        And now updated. This kind of feedback is really useful.

  • Pingback: Därför ska du använda CC0 för din data | Dead Session

  • MArk Markcost

    One of best summaries of the pros and cons of licences I have read

  • Pingback: Can’t I just say “data available for educational and research use”? « Research Remix

  • Pingback: Around the Web: Some resources on the Panton Principles & open data : Confessions of a Science Librarian

  • Pingback: Open Data & The Panton Principles: Thoughts on a presentation to librarians : Confessions of a Science Librarian

  • Pingback: Around the Web: Some resources on the Panton Principles & open data – Confessions of a Science Librarian

  • Pingback: Resources on Open Access in Canada – Confessions of a Science Librarian

  • Gustavo Olivares

    I see the argument for CC-0 but I do prefer CC-BY, particularly for “datasets”. It is more than “attachment” to the data, if you’re doing a meta-analysis and don’t cite it (and presumably don’t give ways of getting it) then how are the reviewers (or the readers) supposed to check that?

    In my view, attribution is more about traceability as it allows the reader of a derivative work to go back to the source and check that the data says what the paper claims it says. Who hasn’t encountered a statement in an article that points to a reference, only to find that it was not in fact the original reference and that actually the original author meant something different?

    I agree that the “BY” part puts some costs on the user but they are not “unreasonable” costs, particularly when weighed against the traceability of the data.

    • David Shorthouse

      Thanks for the comment, Gustavo. I agree that the “BY” at first seems like an attractive option for the data producer. But does it necessarily allow for traceability/verification for the consumer of a data product in a manner similar to cited literature? And, what if the consumer is building a dynamic, web-based service that gleans portions of a dataset into a new, value-added product?

      Although DataCite, http://www.datacite.org/ is making great strides toward standard ways to cite data, I haven’t yet seen widespread, cross-domain uptake. The social/administrative/technical infrastructure for data has not yet matured as it has with the publications industry. All this to say that leaving it up to the producer to specify how they require their data be cited (the “BY” in a very real legal sense) doesn’t necessarily confer traceability.

      If the data product is a static reconstitution of other static data outputs, I suppose the legal requirement for the consumer to cite his/her sources is not very difficult to manage. But, data need not be static of course. Much of it is born digital these days and remains digital throughout its lifecycle. And, quite often a consumer needs to update fields in a dataset with fields from another dataset (e.g. georeferencing locality information, disambiguating scientific names, etc.). The resultant stacking of citations could very well force a consumer into the difficult position of abandoning a perfectly good dataset; the burden of managing record-level citations to fulfill the legal requirement is very challenging.

      • Gustavo Olivares

        Thanks for responding! (I only saw the date of the post after I submitted the comment … I just saw a link to this from a page on G+)

        You’re right in that traceability is not automatic just because I set citation requirements and you’re right in pointing out the work of datacite but the fact that a standard “data citation” is not here doesn’t mean that we should not cite data. To me, what it highlights is the fact that “data” is not well covered by any of the existing copyright/licensing standards and that’s why there are discussions around CC-0 or CC-BY (and others).

        Maybe I am biased by my background in earth and engineering sciences because I always deal with static (ish) data that get merged/analysed/updated within the scientific literature and therefore the citation/attribution of the work is paramount because if my conclusions depend on others’ data, I need to point to that otherwise my conclusions can’t be tested/challenged. Which is as much traceability as it is reproducibility.

        In any case, what I don’t agree with is the “nuisance” argument that citation stacking is difficult. I grant that it may not be simple to manage complex data aggregations but what was unmanageable 50 years ago I carry around in my pocket fully indexed! So my recommendation is not to remove the citation requirement but to work towards making those attributions easier to work with by supporting initiatives like datacite, orcid and creative commons to find the best framework for data that promotes knowledge development and sharing without risking the integrity of that knowledge.
