CHAH Recommendation

Memorandum of Understanding

Herbarium Exchange Protocol 2020
Data Delivery Guide

A Darwin Core Archive (DwC-A)¹ is a file format for exchanging biodiversity data. As of 2020, it is the leading solution for data exchange between collections, and between collections and GBIF.

There are several ways to deliver exchange data to another institution: using GBIF IPT, using Specify, or generating the archive manually.

GBIF IPT

One of the simplest ways is to install a copy of GBIF IPT, then prepare your data as follows:

generate an occurrence data file from your local specimen database
import this data into IPT
map the data to Darwin Core’s Occurrence data class
fill in the metadata for the dataset
publish and register your dataset.

Specify

Specify is capable of generating Darwin Core Archives. Version 7 also supports extensions.

Manual Generation

In the simplest case, an archive can be represented by a single CSV file if it contains only Darwin Core-compatible data and the header row uses only the names of Darwin Core terms. In the more common case, the file is a Zip archive containing:

One core file usually representing occurrence, taxon or event data.
One or more extension files representing other data that relates to the core data.
A metadata file that specifies the relationships between the core and extension files.
A resource metadata file that describes the archive’s content.

Each data file can be a CSV or other delimited data file, while the metadata and resource metadata files are small XML files defined by DwC-A and Ecological Metadata Language (EML) respectively. For general use, any terms can be used in an archive so long as the sender and the recipient agree on the archive’s content, but for full GBIF support, the core file must contain occurrence, taxon or event data and extension files must contain data types that are available in GBIF’s Extension Repository.

In the examples below, values such as abf543g are used in place of a UUID or URI to aid readability.

Export the data

The first step in preparing an archive is to generate a set of delimited text files that contain the data to be sent. Herbarium exchange is usually focussed on specimens, so the core file should contain occurrence data.

The column names in a file’s header row do not need to match the names of terms in Darwin Core, HISPID 6 or the other standards used in the Herbarium Exchange Protocol 2020. If they do not match, you must use a metadata file to specify the term that each column represents.

Occurrences

Export a set of occurrence data fields from your institutional specimen database. If the export can include a header row, add it at this stage, as it can be difficult to add later if the data file is large.

occurrenceID,basisOfRecord,recordedBy,recordedByID,typeOfType,typeStatus
abf543g,PreservedSpecimen,"Mueller, F.",https://www.wikidata.org/entity/Q708002,lectotype,"lectotype of Banksia ornata Meisn. 1854. Plantae Muellerianae: Proteaceae. Linnaea: ein Journal für die Botanik in ihrem ganzen Umfange, oder Beiträge zur Pflanzenkunde 26"
nqy908e,PreservedSpecimen,"Thiele, K.R.",https://orcid.org/0000-0002-6658-6636,,
jjn449z,PreservedSpecimen,"Nicolle, D.",https://viaf.org/viaf/92803422,,
qao951r,PreservedSpecimen,"Maslin, B.R.|Reid, J.E.",https://viaf.org/viaf/94978723|,,

Figure 1. Example occurrence data for the core file. Note that this includes the recommended fields for agent unique identifiers and type status.
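
As a minimal sketch of this export step, the following Python writes the header row together with the data using the standard csv module. The rows list stands in for the result of a query on your specimen database, and the function name write_occurrences is hypothetical.

```python
import csv
import io

# Column order for the core file; these names match the header in Figure 1.
FIELDS = ["occurrenceID", "basisOfRecord", "recordedBy",
          "recordedByID", "typeOfType", "typeStatus"]


def write_occurrences(rows, out):
    """Write a header row followed by one line per occurrence record."""
    # "\r\n" is the csv module's default line ending; it is stated here so
    # that it visibly matches linesTerminatedBy in the metadata file.
    writer = csv.DictWriter(out, fieldnames=FIELDS, restval="",
                            lineterminator="\r\n")
    writer.writeheader()
    for row in rows:
        # Keys missing from a row are written as empty fields (restval).
        writer.writerow(row)


# Hypothetical rows standing in for a database query result.
rows = [
    {"occurrenceID": "nqy908e", "basisOfRecord": "PreservedSpecimen",
     "recordedBy": "Thiele, K.R.",
     "recordedByID": "https://orcid.org/0000-0002-6658-6636"},
]

buffer = io.StringIO()
write_occurrences(rows, buffer)
csv_text = buffer.getvalue()
print(csv_text)
```

When writing to disk instead of a string buffer, open the file with newline="" and encoding="utf-8" so that the csv module controls the line endings and the encoding matches the metadata file.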

For each additional data type that you want to exchange, export the fields of that data to a separate extension file. A separate file is only required when several rows of that data can point to a single row in the core file.

Resource Relationships

resourceID,relationshipOfResource,relatedResourceID
neo292a,duplicate of,abf543g
sar404e,duplicate of,nqy908e
cpz098p,duplicate of,qao951r

Figure 2. Example resource relationship data for specimen duplicate provenance.

Permits

occurrenceID,permitStatus,permitType,permitURI
abf543g,Permit not required,Other,
nqy908e,Permit available,Collecting Permit,3ff78acc
nqy908e,Permit available,Import Permit,8c583b91
jjn449z,Permit available,Collecting Permit,2779127f
qao951r,Permit available,Collecting Permit,bffd544b

Figure 3. Example permit data. Note that two of the entries point to the same occurrence record.
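
To see whether a data type really has several rows per core record, and so needs its own extension file, you can count the rows for each occurrenceID. The sketch below does this for the permit example above; the function name rows_per_core_record is hypothetical.

```python
import csv
import io
from collections import Counter

# The permit data from the example above.
PERMITS_CSV = """occurrenceID,permitStatus,permitType,permitURI
abf543g,Permit not required,Other,
nqy908e,Permit available,Collecting Permit,3ff78acc
nqy908e,Permit available,Import Permit,8c583b91
jjn449z,Permit available,Collecting Permit,2779127f
qao951r,Permit available,Collecting Permit,bffd544b
"""


def rows_per_core_record(csv_text, id_column):
    """Count how many extension rows refer to each core record."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return Counter(row[id_column] for row in reader)


counts = rows_per_core_record(PERMITS_CSV, "occurrenceID")
# Core records that are referenced by more than one extension row.
repeated = sorted(key for key, n in counts.items() if n > 1)
print(repeated)  # ['nqy908e']
```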

In each file, confirm that the data in each column follows the requirements of the term’s vocabulary and modify the data if necessary. The section on validating the archive, below, describes how to check the finished archive using GBIF’s data validator.
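
Some of these checks can be scripted before the archive is assembled. The following sketch looks for extension rows that point to no core record and for values that fall outside a vocabulary. The function names are hypothetical, the identifier zzz000x is invented for the demonstration, and the set of permit status values is an assumption for illustration only; take the authoritative list from the vocabulary itself.

```python
import csv
import io


def read_rows(csv_text):
    """Parse delimited text with a header row into a list of dicts."""
    return list(csv.DictReader(io.StringIO(csv_text)))


def orphaned_ids(core_rows, extension_rows, id_column="occurrenceID"):
    """Return extension identifiers that match no row in the core file."""
    core_ids = {row[id_column] for row in core_rows}
    return sorted({row[id_column] for row in extension_rows} - core_ids)


def invalid_values(rows, column, allowed):
    """Return (line number, value) pairs for non-empty values not in allowed."""
    return [(n, row[column])
            for n, row in enumerate(rows, start=2)  # line 1 is the header
            if row[column] and row[column] not in allowed]


# Assumed values, for illustration only.
ASSUMED_PERMIT_STATUS = {"Permit available", "Permit not available",
                         "Permit not required", "Unknown"}

core = read_rows("occurrenceID,basisOfRecord\n"
                 "abf543g,PreservedSpecimen\n"
                 "nqy908e,PreservedSpecimen\n")
permits = read_rows("occurrenceID,permitStatus\n"
                    "abf543g,Permit not required\n"
                    "nqy908e,permit available\n"   # wrong capitalisation
                    "zzz000x,Permit available\n")  # no matching core record

print(orphaned_ids(core, permits))  # ['zzz000x']
print(invalid_values(permits, "permitStatus", ASSUMED_PERMIT_STATUS))
# [(3, 'permit available')]
```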

Create the Metadata file

Open a text editor and create the metadata XML file by pasting in the following XML and editing the files and field entries to match your data. Save the file as meta.xml. The Metafile documentation explains the elements in the XML structure. An example of a metadata file is available at: https://github.com/rbgvictoria/dehispidator/blob/master/archive/meta.xml.

<?xml version="1.0" encoding="UTF-8"?>
    <archive xmlns="http://rs.tdwg.org/dwc/text/"
             xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
             xsi:schemaLocation="http://rs.tdwg.org/dwc/text/
                                 http://rs.tdwg.org/dwc/text/tdwg_dwc_text.xsd">
        <core rowType="http://rs.tdwg.org/dwc/xsd/simpledarwincore/SimpleDarwinRecord"
              ignoreHeaderLines="1" linesTerminatedBy="\r\n" encoding="UTF-8">
            <files><location>occurrences.csv</location></files>
            <id index="0"/>
            <field index="0" term="http://rs.tdwg.org/dwc/terms/occurrenceID"/>
            <field index="1" term="http://rs.tdwg.org/dwc/terms/basisOfRecord"/>
            <field index="2" term="http://rs.tdwg.org/dwc/terms/recordedBy"/>
            <field index="3" term="http://rs.tdwg.org/dwc/iri/recordedBy"/>
            <field index="4" term="http://hiscom.chah.org.au/hispid/terms/typeOfType"
                   vocabulary="http://hiscom.chah.org.au/hispid/vocabulary/type_of_type.xml"/>
            <field index="5" term="http://rs.tdwg.org/dwc/terms/typeStatus"/>
        </core>
        <extension rowType="http://rs.tdwg.org/dwc/terms/ResourceRelationship"
                   ignoreHeaderLines="1" linesTerminatedBy="\r\n" encoding="UTF-8">
            <files><location>resource-relationships.csv</location></files>
            <coreId index="2"/>
            <field index="0" term="http://rs.tdwg.org/dwc/terms/resourceID"/>
            <field index="1" term="http://rs.tdwg.org/dwc/terms/relationshipOfResource"/>
            <field index="2" term="http://rs.tdwg.org/dwc/terms/relatedResourceID"/>
        </extension>
        <extension rowType="http://data.ggbn.org/schemas/ggbn/terms/Permit"
                   ignoreHeaderLines="1" linesTerminatedBy="\r\n" encoding="UTF-8">
            <files><location>permits.csv</location></files>
            <coreId index="0"/>
            <field index="1" term="http://data.ggbn.org/schemas/ggbn/terms/permitStatus"
                   vocabulary="http://rs.gbif.org/vocabulary/ggbn/permit_status.xml"/>
            <field index="2" term="http://data.ggbn.org/schemas/ggbn/terms/permitType"
                   vocabulary="http://rs.gbif.org/vocabulary/ggbn/permit_type.xml"/>
            <field index="3" term="http://data.ggbn.org/schemas/ggbn/terms/permitURI"/>
        </extension>
    </archive>

Figure 4. Example metadata file content.
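
Because each field entry refers to a column by position, it is easy for the metadata file and a data file to drift apart. The sketch below reads the field indexes with Python's xml.etree.ElementTree and reports any that point past the last column of a header row. It uses a cut-down copy of the example metadata; the function names field_map and out_of_range are hypothetical.

```python
import xml.etree.ElementTree as ET

# Namespace of the Darwin Core text metadata file.
NS = {"dwc": "http://rs.tdwg.org/dwc/text/"}

# Cut-down version of the example metadata file.
META_XML = """<archive xmlns="http://rs.tdwg.org/dwc/text/">
  <core rowType="http://rs.tdwg.org/dwc/xsd/simpledarwincore/SimpleDarwinRecord"
        ignoreHeaderLines="1">
    <files><location>occurrences.csv</location></files>
    <id index="0"/>
    <field index="0" term="http://rs.tdwg.org/dwc/terms/occurrenceID"/>
    <field index="1" term="http://rs.tdwg.org/dwc/terms/basisOfRecord"/>
    <field index="2" term="http://rs.tdwg.org/dwc/terms/recordedBy"/>
  </core>
</archive>"""


def field_map(meta_xml):
    """Return {file location: {column index: term URI}} for core and extensions."""
    root = ET.fromstring(meta_xml)
    tables = root.findall("dwc:core", NS) + root.findall("dwc:extension", NS)
    mapping = {}
    for table in tables:
        location = table.find("dwc:files/dwc:location", NS).text
        mapping[location] = {
            int(field.get("index")): field.get("term")
            for field in table.findall("dwc:field", NS)
            if field.get("index") is not None  # skip fields with no column
        }
    return mapping


def out_of_range(fields, header):
    """Return field indexes that point past the last column of the header."""
    return sorted(index for index in fields if index >= len(header))


fields = field_map(META_XML)
# A header with only two columns: index 2 has no column to map to.
print(out_of_range(fields["occurrences.csv"], ["occurrenceID", "basisOfRecord"]))
# [2]
```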

Create the resource metadata file

Open a text editor and create a file called “resource.xml”. Paste in the following XML and edit the field entries to match your details.

<?xml version="1.0"?>
<eml:eml
        xmlns:eml="https://eml.ecoinformatics.org/eml-2.2.0"
        xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
        xsi:schemaLocation="https://eml.ecoinformatics.org/eml-2.2.0 xsd/eml.xsd">

    <dataset>
        <title>Herbarium exchange for ...</title>
        <creator id="https://orcid.org/0000-0000-0000-0000">
            <individualName>
                <givenName>Staff</givenName>
                <surName>Member</surName>
            </individualName>
            <electronicMailAddress>staff.member@example.com</electronicMailAddress>
            <userId directory="https://orcid.org">https://orcid.org/0000-0000-0000-0000</userId>
        </creator>
        <keywordSet>
            <keyword>herbarium</keyword>
            <keyword>exchange</keyword>
        </keywordSet>
        <contact>
            <references>https://orcid.org/0000-0000-0000-0000</references>
        </contact>
    </dataset>
</eml:eml>

Figure 5. Example resource metadata file content.

Create the archive

Add all of the above files to a single Zip file and you’ve successfully created a Darwin Core Text Archive.

On Windows, you can create a zip file containing the files using 7-Zip or WinZip: either open your data files from within the program, or drag them into the 7-Zip or WinZip window, and save.
On Macintosh, you can select all the files, Control-click on them and choose “Compress .. items”. The resulting zip archive will be called “Archive.zip” in the same directory.
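
The same step can be scripted. The sketch below uses Python's standard zipfile module and places every file at the top level of the archive. The file names follow the examples in this guide, with meta.xml assumed as the name of the metadata file, and the function name build_archive is hypothetical. The demonstration writes placeholder files to a temporary directory; in practice, point source_dir at the directory that holds your real files.

```python
import tempfile
import zipfile
from pathlib import Path

# File names used in this guide's examples; meta.xml is an assumed name.
FILES = ["occurrences.csv", "resource-relationships.csv", "permits.csv",
         "meta.xml", "resource.xml"]


def build_archive(source_dir, archive_path, names):
    """Zip the named files from source_dir, placing each at the archive root."""
    source_dir = Path(source_dir)
    with zipfile.ZipFile(archive_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for name in names:
            # arcname drops the directory part so files sit at the top level.
            archive.write(source_dir / name, arcname=name)
    return archive_path


# Demonstration with placeholder files in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    for name in FILES:
        (Path(tmp) / name).write_text("placeholder", encoding="utf-8")
    target = build_archive(tmp, Path(tmp) / "exchange.zip", FILES)
    with zipfile.ZipFile(target) as archive:
        print(archive.namelist())
```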

Validate the archive

To confirm that the archive’s structure and data are correctly formed, check it using GBIF’s data validator.
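
Before uploading, a quick local check can catch a missing file. The sketch below is not a substitute for the validator: it only confirms that every data file named in the metadata file is present in the Zip file. It assumes the metadata file is named meta.xml and sits at the top level of the archive; the function names missing_files and demo_archive are hypothetical.

```python
import io
import xml.etree.ElementTree as ET
import zipfile

# Namespace of the Darwin Core text metadata file.
NS = {"dwc": "http://rs.tdwg.org/dwc/text/"}


def missing_files(archive_bytes):
    """Return files named in meta.xml that are absent from the Zip file."""
    with zipfile.ZipFile(io.BytesIO(archive_bytes)) as archive:
        names = set(archive.namelist())
        if "meta.xml" not in names:
            return ["meta.xml"]
        root = ET.fromstring(archive.read("meta.xml"))
    locations = [loc.text for loc in root.iterfind(".//dwc:location", NS)]
    return [name for name in locations if name not in names]


def demo_archive():
    """Build a small in-memory archive that lacks its permits.csv file."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr(
            "meta.xml",
            '<archive xmlns="http://rs.tdwg.org/dwc/text/">'
            "<core><files><location>occurrences.csv</location></files></core>"
            "<extension><files><location>permits.csv</location></files></extension>"
            "</archive>")
        archive.writestr("occurrences.csv", "occurrenceID\r\nabf543g\r\n")
    return buffer.getvalue()


print(missing_files(demo_archive()))  # ['permits.csv']
```

To check a file on disk, read it with open(path, "rb").read() and pass the bytes to missing_files.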

Send it

This file can now be:

sent by email (if it is small enough),
placed on a file sharing site such as Dropbox,
uploaded to AVH/ALA manually, or
placed on an externally-visible web server for download by others, such as the ALA.

References

GBIF (2017). Darwin Core Archives – How-to Guide, version 2.0, released on 9 May 2017 (contributed by Remsen D, Braak K, Döring M, Robertson T), Copenhagen: Global Biodiversity Information Facility, accessible online at: https://github.com/gbif/ipt/wiki/DwCAHowToGuide.