Metadata
Introduction
Metadata is data about data, and can be used to define, structure, manage and discover information. In the context of a specimen dataset, metadata includes the address of the collection, the number of specimens, the taxonomic scope, and the names and definitions of the dataset fields.
Metadata is no different from ‘regular data’: one person’s data is often another person’s metadata. For example, the address of a collection is metadata for a specimen dataset, but data for a registry of collections. Good metadata enables data users to discover data and assess its appropriateness for their particular needs.
Metadata standards
Metadata standards are used to exchange metadata, primarily for machine-to-machine interaction. In the biodiversity informatics community, the standards used are:
- Ecological Metadata Language (EML)
- Resource Description Framework (RDF)
- Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH)
All three can be expressed as XML. Datasets published via the Integrated Publishing Toolkit (IPT) automatically express their metadata as EML.
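To make the idea of machine-readable metadata concrete, here is a minimal sketch of reading a few fields from an EML document with Python's standard library. The sample document is hypothetical and heavily trimmed (it is not schema-valid EML, and every value in it is invented); the element names `dataset`, `title`, `creator`, `organizationName`, `address` and `city` are taken from EML, while the function name `summarize_eml` is our own.

```python
import xml.etree.ElementTree as ET

# Hypothetical, trimmed EML document. All values are invented for illustration.
EML_SAMPLE = """<eml:eml xmlns:eml="eml://ecoinformatics.org/eml-2.1.1"
         packageId="example-dataset-1" system="http://example.org">
  <dataset>
    <title>Example Herbarium specimen dataset</title>
    <creator>
      <organizationName>Example Herbarium</organizationName>
      <address>
        <city>Montreal</city>
        <country>Canada</country>
      </address>
    </creator>
  </dataset>
</eml:eml>"""


def summarize_eml(xml_text):
    """Return a small dict with a few metadata fields from an EML string."""
    root = ET.fromstring(xml_text)
    # Only the root element is namespaced; its children are unqualified.
    dataset = root.find("dataset")
    return {
        "title": dataset.findtext("title"),
        "organization": dataset.findtext("creator/organizationName"),
        "city": dataset.findtext("creator/address/city"),
    }


summary = summarize_eml(EML_SAMPLE)
print(summary)
```

The same address that is metadata here could be a regular data record in a registry of collections.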
Registries
In order to allow the discovery of data, a dataset/collection not only needs metadata, but also needs to be registered somewhere. For collections, such indexes/registries include:
Registered collections can choose a unique code (e.g. MT), which can be referenced in literature. Unfortunately, some codes are not unique across disciplines or continents. This is one of the reasons why the Global Biodiversity Information Facility (GBIF), the Biodiversity Information Standards (TDWG) and the Royal Botanic Garden Edinburgh developed the Biodiversity Collection Index (BCI).
Biodiversity Collection Index (BCI)
The Biodiversity Collection Index (BCI) is a worldwide index of biological collections. It assigns a globally unique Life Science Identifier (LSID) to each collection, while still keeping the original code. Information (metadata) about each collection is harvested from existing registries, and expressed via several standards (including RDF and OAI-PMH). Users can update or add information on the BCI website.
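An LSID is a URN of the general form `urn:lsid:<authority>:<namespace>:<object>`, optionally followed by `:<revision>`. As a rough sketch of how such an identifier can be taken apart, the snippet below splits an LSID into its components. The identifier used in the example is hypothetical and is not a real BCI record; the names `Lsid` and `parse_lsid` are our own.

```python
from typing import NamedTuple, Optional


class Lsid(NamedTuple):
    authority: str
    namespace: str
    object_id: str
    revision: Optional[str]


def parse_lsid(text):
    """Split an LSID string into its components, or raise ValueError."""
    parts = text.strip().split(":")
    # Expect 5 parts (no revision) or 6 parts (with revision).
    if len(parts) not in (5, 6):
        raise ValueError("not an LSID: %r" % text)
    # The "urn" and "lsid" prefixes are matched case-insensitively.
    if parts[0].lower() != "urn" or parts[1].lower() != "lsid":
        raise ValueError("not an LSID: %r" % text)
    authority, namespace, object_id = parts[2:5]
    if not (authority and namespace and object_id):
        raise ValueError("LSID has an empty component: %r" % text)
    revision = parts[5] if len(parts) == 6 else None
    return Lsid(authority, namespace, object_id, revision)


# Hypothetical identifier, for illustration only.
example = parse_lsid("urn:lsid:example.org:col:12345")
print(example.authority, example.namespace, example.object_id)
```

Because the authority is part of the identifier, two collections that share the same short code can still be told apart.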
Global Biodiversity Resources Discovery System (GBRDS)
BCI information will be used in GBIF’s more ambitious attempt to create an index of not only collections, but any biodiversity information in the world. This Global Biodiversity Resources Discovery System (GBRDS) is currently under development. Datasets published via the IPT will be registered automatically.
Metadata and Canadensys
Since most Canadensys collections are already registered in the Biodiversity Collection Index (BCI) (via Index Herbariorum and the Insect and Spider Collections of the World), we will use the BCI’s services as a unified way to harvest collection metadata. Curators should review their collection information in BCI to ensure it is up to date.
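As an illustration of what harvesting over OAI-PMH involves, the sketch below builds the request URLs for the protocol's `ListRecords` verb. The `verb`, `metadataPrefix` and `resumptionToken` arguments and the `oai_dc` prefix come from the OAI-PMH specification; the base URL is a placeholder and not the actual BCI endpoint, and the function name is our own. No network request is made.

```python
from urllib.parse import urlencode

# Placeholder endpoint: an assumption for illustration, not the real BCI service.
BASE_URL = "http://example.org/oai"


def list_records_url(base_url, metadata_prefix="oai_dc", resumption_token=None):
    """Build an OAI-PMH ListRecords request URL.

    When a resumption token is supplied it is an exclusive argument:
    only the verb and the token are sent, as the protocol requires.
    """
    if resumption_token is not None:
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    return base_url + "?" + urlencode(params)


first_page = list_records_url(BASE_URL)
next_page = list_records_url(BASE_URL, resumption_token="abc123")
print(first_page)
print(next_page)
```

A harvester would fetch the first URL, parse the XML response, and keep requesting follow-up pages for as long as the response contains a non-empty resumption token.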
- Peter Desmet