Text digitization
Introduction
Digitization is the process of recording specimen information in a digital form. This allows the collection to be managed more efficiently and the data to be published, which enables it to be used and studied in different ways. There are two kinds of digitization:
- Text digitization: digitally recording the label data associated with a specimen, as text. When we use the word digitization, we generally mean text digitization.
- Imaging: creating a digital image of a specimen. See the imaging page for more information.
Digitization can be very time-consuming, so it is important to make the process as efficient and error-proof as possible. See the Documents section below for guidelines.
Tools
- North Carolina State University Chirographum Historicum, handwriting samples of 75+ botanical collectors.
- Conservatoire et Jardin botaniques de la Ville de Genève Auxilium Ad Botanicorum Graphicem, handwriting samples of botanists, as published in Candollea between 1972 and 1979 (in French).
- List of Software for biological collection management
- Specify, an open source collection management tool.
Documents
- Larry Speers From ink to electrons : Issues to be considered
- GBIF, 2008. GBIF Training Manual 1 : Digitization of natural history collections data
- Frazier, C.K., J. Wall & S. Grant. GBIF, 2008. Initiating a collection digitization project
- Morris, P.J. PhyloInformatics, 2005. Relational database design and implementation for biodiversity informatics.
- Willemse, L.P.M. & J.B. Mols. NHN, 2007. Data guidelines : Collection data registration at the Nationaal Herbarium Nederland
- Utah Valley University Herbarium How to build your own virtual herbarium
- Lampe, K.-H. & D. Striebing. African Biodiversity, 2005. How to digitize large insect collections? – Preliminary results of the dig project
Data priorities for Canadensys
Only part of the Canadensys collections can be digitized with the current funding. To maximize the research value of the generated data, we need to prioritize what to digitize.
Text information captured on the specimen label is sufficient for most research using biological collection information. Typically this information is the “what, when and where” of a specimen, but often much richer information can be found and used. Since it is hard to anticipate all possible data uses (see Uses of primary species-occurrence data), it is better to capture as much information as possible. A useful first guideline on what to capture is the set of Darwin Core terms, since they are the most commonly used elements of biodiversity information and the result of several use case analyses.
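As an illustration of the “what, when and where” idea, the minimal Python sketch below stores one transcribed label as a record keyed by Darwin Core term names and reports which core terms are still blank. The term names (scientificName, eventDate, country, locality, and so on) are Darwin Core terms. The grouping in CORE_TERMS, the function missing_core_terms, and all record values are assumptions made up for this example. They are not a Canadensys requirement.

```python
# Hypothetical grouping of Darwin Core terms by the question they answer.
# This selection is an assumption for illustration, not an official minimum.
CORE_TERMS = {
    "what": ["scientificName"],
    "when": ["eventDate"],
    "where": ["country", "locality"],
}


def missing_core_terms(record):
    """Return the core terms that are absent or blank, grouped by question."""
    missing = {}
    for question, terms in CORE_TERMS.items():
        blank = [t for t in terms if not str(record.get(t, "")).strip()]
        if blank:
            missing[question] = blank
    return missing


# Invented example record; every value here is illustrative only.
record = {
    "catalogNumber": "EX-0001",
    "scientificName": "Carex aurea",
    "recordedBy": "A. Collector",
    "eventDate": "1968-07-14",
    "country": "Canada",
    "stateProvince": "Quebec",
    "locality": "",  # not yet transcribed from the label
}

print(missing_core_terms(record))  # {'where': ['locality']}
```

A check like this could run during data entry to flag incomplete records, while any richer label information is still captured in additional Darwin Core terms.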
Collections should also prioritize the digitization of taxonomic and geographic groups that are most valuable for research. This includes the groups that are unique or well-represented in a collection, groups that receive research attention from the Canadensys community (e.g. the genus Carex), rare and invasive species, and groups that can contribute to climate studies.