Digitization
Introduction
Digitization is the process of recording biodiversity information in a digital form. The digitization of biological specimens allows curators to manage their collections more efficiently and to publish the information, which enables it to be used and studied in different ways.
There are two kinds of digitization:
- Text digitization: digitally recording the label data associated with a specimen, as text. When we use the word digitization, we generally mean text digitization.
- Imaging: creating a digital image of a specimen. See the imaging page for more information.
Digitization can be very time-consuming, so it is important to make the process as efficient and error-proof as possible. See the Documents section for guidelines.
Tools
- List of software for biological collection management
- North Carolina State University: Chirographum Historicum, handwriting samples of 75+ botanical collectors.
- Conservatoire et Jardin botaniques de la Ville de Genève: Auxilium Ad Botanicorum Graphicem, handwriting samples of botanists, as published in Candollea between 1972 and 1979 (in French).
Documents
- Larry Speers, 2009. From ink to electrons: Issues to be considered
- GBIF, 2008. GBIF Training Manual 1: Digitization of natural history collections data
- Frazier et al., 2008. Initiating a collection digitization project
- Morris, 2005. Relational database design and implementation for biodiversity informatics.
- Willemse & Mols, 2007. Data guidelines: Collection data registration at the Nationaal Herbarium Nederland
- Utah Valley University Herbarium. How to build your own virtual herbarium
- Lampe & Striebing, 2005. How to digitize large insect collections? – Preliminary results of the dig project
Data priorities for Canadensys
Only part of the Canadensys collections can be digitized with the current funding. To maximize the value of the resulting data for research, we need to prioritize what to digitize first.
Text information captured on the specimen label is sufficient for most research using biological collection information. Typically this information is the “what, when and where” of a specimen, but often much richer information can be found and used. Since it is hard to predict all the possible data uses (see Uses of primary species-occurrence data), it is better to capture as much information as possible.
Since all our information is published using the Darwin Core standard, it is useful to consult the Darwin Core terms as an indication of what information to capture: these are the most commonly used elements of biodiversity information and the result of several use case analyses.
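As a rough illustration of using Darwin Core terms as a capture checklist, the sketch below stores one transcribed label as a dictionary keyed by Darwin Core term names and reports which "what, when and where" fields are still empty. The term names (scientificName, eventDate, country, stateProvince, locality, catalogNumber, recordedBy) are real Darwin Core terms. Everything else is an assumption made for the example, not a Canadensys requirement or tool: the grouping in CORE_TERMS, the helper functions and the sample record are all invented.

```python
import csv
import io

# Hypothetical grouping of a few Darwin Core terms by the question they answer.
# This selection is an assumption for illustration, not an official minimum set.
CORE_TERMS = {
    "what": ["scientificName"],
    "when": ["eventDate"],
    "where": ["country", "stateProvince", "locality"],
}


def missing_core_terms(record):
    """Return the terms listed in CORE_TERMS that are absent or blank in record."""
    missing = []
    for terms in CORE_TERMS.values():
        for term in terms:
            if not str(record.get(term, "")).strip():
                missing.append(term)
    return missing


def to_csv(records, fieldnames):
    """Serialize records to CSV text, one column per requested Darwin Core term."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, extrasaction="ignore")
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


# Invented sample record: the values do not describe a real specimen.
record = {
    "catalogNumber": "EX-0001",
    "scientificName": "Carex aurea",
    "recordedBy": "A. Collector",
    "eventDate": "1968-07-14",
    "country": "Canada",
    "stateProvince": "Quebec",
    "locality": "",  # not yet transcribed from the label
}

print(missing_core_terms(record))
print(to_csv([record], ["catalogNumber", "scientificName", "eventDate"]))
```

A check like this could run as each label is transcribed, so that gaps are caught while the specimen is still at hand. Which terms to treat as essential remains a decision for each collection.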
Collections should also prioritize the digitization of taxonomic and geographic groups that are most valuable for research. This includes the groups that are unique or well-represented in a collection, groups that receive research attention from the collection researchers or the Canadensys community (e.g. the genus Carex), rare and invasive species, and groups that can contribute to climate studies.