Text digitization

Introduction

Digitization is the process of recording specimen information in a digital form. This allows the collection to be managed more efficiently and the data to be published, which enables it to be used and studied in different ways. There are two kinds of digitization:

  • Text digitization: digitally recording the label data associated with a specimen, as text. When we use the word digitization, we generally mean text digitization.
  • Imaging: creating a digital image of a specimen. See the imaging page for more information.

Digitization can be a very time-consuming process and it is therefore important to make it as efficient and error-proof as possible. See the documents for guidelines.

Tools

Documents

Data priorities for Canadensys

Only part of the Canadensys collections can be digitized with the current funding. To maximize the use of the generated data for research, we need to prioritize what to digitize.

Text information captured on the specimen label is sufficient for most research using biological collection information. Typically this information is the “what, when and where” of a specimen, but often much richer information can be found and used. Since it is hard to anticipate all possible uses of the data (see Uses of primary species-occurrence data), it is better to capture as much information as possible. The Darwin Core terms are a useful first guideline on what to capture, since they are the most commonly used elements of biodiversity information and the result of several use case analyses.
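As a rough illustration of the "what, when and where" idea, the sketch below stores one transcribed label as a record keyed by Darwin Core term names and reports which of a small set of core terms are still blank. The term names (scientificName, eventDate, country, stateProvince, locality, catalogNumber, recordedBy) are real Darwin Core terms, but the grouping into what/when/where, the function name missing_terms, and the sample label data are assumptions made for this example, not a Canadensys requirement.

```python
# Hypothetical grouping of a few Darwin Core terms by the question they answer.
# The grouping itself is an assumption for illustration, not part of the standard.
CORE_TERMS = {
    "what": ["scientificName"],
    "when": ["eventDate"],
    "where": ["country", "stateProvince", "locality"],
}


def missing_terms(record):
    """Return the core terms that are absent or blank in a record."""
    missing = []
    for terms in CORE_TERMS.values():
        for term in terms:
            # Treat a missing key and a whitespace-only value the same way.
            if not str(record.get(term, "")).strip():
                missing.append(term)
    return missing


# Invented label data, for illustration only.
record = {
    "catalogNumber": "EX-0001",
    "scientificName": "Carex",
    "recordedBy": "A. Collector",
    "eventDate": "1975-06-14",
    "country": "Canada",
    "stateProvince": "Quebec",
    "locality": "",
}

print(missing_terms(record))  # prints ['locality']
```

A check like this could be run over a batch of transcribed records to flag labels that need a second look before publication; richer label information would simply be added as further Darwin Core terms in the same record.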

Collections should also prioritize the digitization of taxonomic and geographic groups that are most valuable for research. This includes the groups that are unique or well-represented in a collection, groups that receive research attention from the Canadensys community (e.g. the genus Carex), rare and invasive species, and groups that can contribute to climate studies.