Link Love: DOIs for Darwin Core Archives

Canadensys now assigns digital object identifiers (DOIs) to the Darwin Core Archives (DwC-A) it hosts and serves via its Integrated Publishing Toolkit (IPT) repository of checklist and occurrence data. For example, the DOI for the Royal Ontario Museum’s Green Plant Herbarium (TRT) is doi:10.5886/g7j6gct1 and that for the University of Montréal Biodiversity Centre’s Marie-Victorin Herbarium (MT) is doi:10.5072/rzav8bu2. We adjusted the recommended citation format for data packages that have DOIs. Other packages are waiting in our queue and will receive similar treatment.

Digital object identifiers are the de facto identifiers for scholarly works, thanks in large part to CrossRef, one of the first DOI Registration Agencies. The primary purpose of DOIs is to prevent link rot; clients take responsibility for maintaining the redirect link associated with a DOI. CrossRef has built valuable services on its store of metadata, such as CrossCheck and Cited-by Linking, which help prevent scholarly and professional plagiarism and reveal how publications are being cited, respectively. Through CrossRef’s services, you can discover who cited your research paper.

DataCite is the DOI Registration Agency responsible for developing the requisite metadata schema and services for scholarly data that may one day be comparable to CrossRef’s CrossCheck and Cited-by Linking. Data are distinctly different from published works; accurate provenance is a very difficult proposition when data are reused, mixed, merged, and repartitioned. While DataCite and its community determine how best to create such services for scholars, creating metadata documents and submitting them to DataCite still has advantages for data archives and end users. Search and discovery are easier for consumers when metadata are centralized. For example, searching for “Royal Ontario Museum” on DataCite’s website produces a collection of accessible data packages, and the metadata associated with these packages can be examined for suitability. The metadata for the Green Plant Herbarium includes an array of downloads for incorporation into reference management software or RDF knowledge bases. DataCite now needs essential, value-added services that integrate its store of metadata with other scholarly works, perhaps in a manner similar to Dryad.

DataCite Canada services are offered in cooperation with DataCite International and are being coordinated by the National Research Council of Canada Institute for Scientific and Technical Information (NRC-CISTI). According to their website, DataCite Canada “…makes it easy for Canadian research organizations to obtain and manage DOIs for their research data.” We at Canadensys can attest to that. From first communications via email, to signing of agreements, to submitting our first sets of metadata and assigning DOIs, we were impressed by their professionalism, useful documentation, and support. Best of all, DOIs through DataCite Canada are free!

Once an agreement with DataCite Canada is signed and an account created, assigning a DOI to a data package requires the submission of metadata according to a well-defined schema. There are a number of required and optional elements. Thankfully, the metadata auto-generated in each Darwin Core Archive by our Integrated Publishing Toolkit has elements that closely align with those required by DataCite. Christian Gendreau wrote a translation tool that converts one metadata file to the other, released his work on GitHub, and rolled it into our data management workflow. We hope this can be incorporated into future versions of GBIF’s Integrated Publishing Toolkit so that others may have an out-of-the-box experience.
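To give a feel for what such a translation involves, here is a minimal Python sketch. It is not Christian's tool and makes several simplifying assumptions: the function name eml_to_datacite is hypothetical, only a handful of EML dataset fields (title, creator, pubDate) are read, the publisher is passed in by the caller, and the output omits the DataCite schema namespace and every optional element that a real submission may need.

```python
import xml.etree.ElementTree as ET


def eml_to_datacite(eml_xml: str, doi: str, publisher: str) -> str:
    """Map a few EML dataset fields onto DataCite-style elements.

    Simplified illustration only: no schema namespace, no optional elements.
    """
    # In EML the root element is namespaced, but <dataset> and its children are not.
    dataset = ET.fromstring(eml_xml).find("dataset")
    if dataset is None:
        raise ValueError("no <dataset> element found in the EML document")

    resource = ET.Element("resource")
    identifier = ET.SubElement(resource, "identifier", identifierType="DOI")
    identifier.text = doi

    creators = ET.SubElement(resource, "creators")
    for creator in dataset.findall("creator"):
        given = creator.findtext("individualName/givenName", default="").strip()
        sur = creator.findtext("individualName/surName", default="").strip()
        org = creator.findtext("organizationName", default="").strip()
        # Prefer "Surname, Given"; fall back to surname alone, then organization.
        name = f"{sur}, {given}" if sur and given else (sur or org)
        if name:
            entry = ET.SubElement(creators, "creator")
            ET.SubElement(entry, "creatorName").text = name

    titles = ET.SubElement(resource, "titles")
    ET.SubElement(titles, "title").text = dataset.findtext("title", default="").strip()

    ET.SubElement(resource, "publisher").text = publisher

    # Assumes pubDate starts with a four-digit year, e.g. "2012-05-01".
    pub_date = dataset.findtext("pubDate", default="").strip()
    ET.SubElement(resource, "publicationYear").text = pub_date[:4]

    return ET.tostring(resource, encoding="unicode")
```

Calling eml_to_datacite with the contents of an archive's eml.xml, a DOI, and a publisher name returns an XML string that would still need to be checked against the DataCite schema before submission.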

We recognize that assigning a DOI to a package of species occurrence data is not ideal. Individual specimens are the units that should receive DOIs because they are cited in the primary literature. Nonetheless, Canadensys has started walking down a path alongside international data repositories. We identified significant hurdles at the level of the collection in this process. Our hope is that solutions to some of these problems can be equally useful for specimen-level identifiers. We are encouraged by the work done by the BiSciCol team and will learn from their efforts. We can say without doubt that identifiers for checklist files will be immensely valuable. Most of the datasets served from the Canadensys Integrated Publishing Toolkit have received a CC0 (Creative Commons, Public Domain) waiver, and we have data norms to encourage best practices when there is a need to cite data. A DOI will eventually help us get closer to facilitating these best practices and to automating provenance.

If Canadensys members one day decide to host their own Integrated Publishing Toolkit at their institution, we can change the metadata associated with their DOI(s). The DOI(s) will remain the same, but will magically point to their new landing page. If there are any automated statistics about how the data are being cited, these will not be lost.
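The indirection behind that "magic" can be pictured with a toy model in Python. This is only an illustration, not how DataCite or the DOI resolver is implemented: the DoiRegistry class, its method names, the placeholder DOI, and the example.org URLs are all hypothetical. The point it shows is that citations are recorded against the DOI rather than the URL, so moving the landing page leaves them untouched.

```python
class DoiRegistry:
    """Toy stand-in for a DOI resolver: each DOI maps to one landing-page URL."""

    def __init__(self):
        self._targets = {}    # DOI -> current landing-page URL
        self._citations = {}  # DOI -> number of recorded citations

    def register(self, doi, url):
        """Mint a new DOI pointing at its first landing page."""
        if doi in self._targets:
            raise ValueError(f"{doi} is already registered")
        self._targets[doi] = url
        self._citations[doi] = 0

    def update_url(self, doi, new_url):
        """Repoint an existing DOI; the identifier itself never changes."""
        if doi not in self._targets:
            raise KeyError(doi)
        self._targets[doi] = new_url

    def resolve(self, doi):
        """Return the landing page a reader is redirected to."""
        return self._targets[doi]

    def record_citation(self, doi):
        """Count a citation against the DOI, not against the URL."""
        self._citations[doi] += 1

    def citation_count(self, doi):
        return self._citations[doi]


registry = DoiRegistry()
registry.register("10.1234/example", "https://old-host.example.org/dataset")
registry.record_citation("10.1234/example")

# The dataset moves to the institution's own server; only the target changes.
registry.update_url("10.1234/example", "https://new-host.example.org/dataset")
print(registry.resolve("10.1234/example"))
print(registry.citation_count("10.1234/example"))
```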

If you have any requirements or suggestions about how Canadensys can best coordinate data linking and citation, please send us your comments or questions. We expect to work with DataCite Canada, which will act as our voice in numerous DataCite International committees and working groups.