dataverse-external-vocab-support

Scripts and material related to using external vocabulary services with Dataverse


Dataverse External Vocabulary Management

Dataverse supports third-party vocabulary and persistent identifier (PID) services through a generic external vocabulary support mechanism. The mechanism combines service-specific scripts with field-specific JSON configurations, added via a Dataverse setting, that specify how fields in Dataverse metadatablocks are associated with specific services and vocabularies.

For example, instead of typing plain text into a field, one could select a term from multiple vocabularies:

Select a vocabulary

Input2

and then a term

Input3

and have them displayed as a link to the remote site defining the term:

Display2

Or, with additions planned for Dataverse 6.4, one could replace the four author-related fields with selectors for ORCID (people) and ROR (organizations)

Input1

which would display as entries with icons that link to the definition pages.

Display1

and can still support entering info for people/organizations who do not have ORCID or ROR entries.

Display can also be graphical, as with Local Contexts Notices and Labels:

image

Repository Contents

This repository contains scripts and example materials that demonstrate how to configure Dataverse to use them. They are a mixture of initial proofs-of-concept, demonstrations of alternative approaches, and some that are potentially mature enough for production use (although the latter often require later versions of Dataverse that include extensions or bug fixes for the underlying mechanism). Documentation in the /examples subdirectory provides additional details on specific scripts and on configuration for specific fields.

It also contains a JSON Schema that can be used to validate configuration files.

Scripts in Production

The following scripts/config files are being used in production (or testing) at one or more Dataverse Sites

Deployment

In general there are four steps to add interaction with a vocabulary or PID service to Dataverse:

To deploy scripts to multiple fields, you need to add one section (JSON Object) per field/script combo to the array in your config file.
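As a rough illustration of the one-object-per-field layout, the sketch below builds a config array for two fields. The key names (`field-name`, `protocol`, `js-url`, `retrieval-uri`, `allow-free-text`), the helper `makeSection`, and the example.org URLs are assumptions made for illustration; check the actual keys against the JSON Schema in this repository and the documentation in /examples.

```javascript
// Hypothetical helper: builds one config section (JSON object) for a
// single field/script combination. Key names are illustrative assumptions.
function makeSection(fieldName, protocol, jsUrl, retrievalUri) {
  return {
    "field-name": fieldName,
    "protocol": protocol,
    "js-url": jsUrl,
    "retrieval-uri": retrievalUri,
    "allow-free-text": true
  };
}

// One section per field/script combo, collected in a single array.
const config = [
  makeSection(
    "authorName",
    "orcid",
    "https://example.org/scripts/people.js",
    "https://example.org/people/{0}"
  ),
  makeSection(
    "authorAffiliation",
    "ror",
    "https://example.org/scripts/orgs.js",
    "https://example.org/orgs/{0}"
  )
];

// The serialized array is what would go into the config file.
const configJson = JSON.stringify(config, null, 2);
console.log(configJson);
```

Adding a third field would mean appending a third object to the same array rather than creating a second config file.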

How It All Works

The basic idea of the Dataverse External Vocabulary mechanism is to simplify adding and displaying controlled terms and PIDs as metadata. As far as Dataverse is concerned, all that is happening is that a term or PID URI is being entered into a text field and Dataverse then stores and displays the term/PID URI. The interesting part is that a JavaScript script takes over Dataverse’s text input and text display. On the input side it provides support such as a type-ahead lookup from a vocabulary/PID service; on the display side it shows the human-readable name associated with the term/PID, and potentially additional metadata about the term/PID, rather than the raw URI.
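The display half of this idea can be sketched in a few lines. This is a minimal sketch, not code from the repository: `renderTerm` and `escapeHtml` are hypothetical names, and the `meta` object with a `name` property stands in for whatever a real vocabulary/PID service returns for a stored URI.

```javascript
// Escape text so it can be placed safely inside HTML.
function escapeHtml(s) {
  return String(s)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// Hypothetical display-side step: the stored value is always the URI;
// if metadata with a human-readable name was retrieved, show the name
// as the link text, otherwise fall back to the raw URI.
function renderTerm(uri, meta) {
  const label = meta && meta.name ? meta.name : uri;
  return (
    '<a href="' + escapeHtml(uri) + '" target="_blank" rel="noopener">' +
    escapeHtml(label) +
    "</a>"
  );
}

// Example with made-up data: a name was found for the first URI only.
console.log(renderTerm("https://example.org/term/1", { name: "Example term" }));
console.log(renderTerm("https://example.org/term/2", null));
```

Falling back to the URI keeps the page usable when the remote service is unreachable, which matches the point that Dataverse itself only ever stores the URI.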

The scripts know which fields to manage based on some invisible data-cvoc-* attributes Dataverse adds to the page’s HTML. Dataverse has a flexible configuration mechanism to allow admins to specify which fields should be associated with which scripts, but, in other repositories, these associations could be static. For example, this simple static example page shows the ORCID and ROR scripts associated with two input and two display fields. You can look at the page source to see the additional attributes in the HTML that make this work.
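To make the attribute-based association concrete, here is a minimal sketch of how a script might pick out the elements it should manage. The attribute name `data-cvoc-protocol` and its values are assumptions for illustration (the page source of the static example shows the real attributes), and `managedElements` and `fakeElement` are hypothetical helpers.

```javascript
// Hypothetical sketch: keep only the elements whose data-cvoc-* attribute
// says they belong to the given script. The attribute name is an assumption.
function managedElements(elements, protocol) {
  return Array.from(elements).filter(
    (el) => el.getAttribute("data-cvoc-protocol") === protocol
  );
}

// Minimal stand-in for DOM elements so the sketch runs outside a browser.
function fakeElement(id, attrs) {
  return {
    id: id,
    getAttribute: (name) => (name in attrs ? attrs[name] : null)
  };
}

const page = [
  fakeElement("author", { "data-cvoc-protocol": "orcid" }),
  fakeElement("affiliation", { "data-cvoc-protocol": "ror" }),
  fakeElement("title", {})
];

// Only the "author" element is tagged for the (assumed) "orcid" protocol.
console.log(managedElements(page, "orcid").map((el) => el.id));
```

In a browser the same function could be given the result of `document.querySelectorAll("input, span")`, since those elements also expose `getAttribute`.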

There’s more, of course. When a repository already has separate subfields for names and identifiers, scripts can be written to fill in both. If the underlying vocabulary/PID service supports multiple vocabularies, or has an advanced search mechanism, the scripts can be written to let you select which vocabulary to use or to provide an advanced search interface. If there’s a field where you want to be able to handle free text as well as controlled terms/PIDs, scripts can support that as well. Dataverse also includes a mechanism to allow metadata about the terms/PIDs to be captured, making it possible to provide internationalization for search (i.e. allowing search in your language for a term), include organization acronyms in exported metadata formats, etc. Fortunately, most of this complexity is handled by the developers of the scripts and example configs; Dataverse admins just need to select which ones to install.
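The free-text case can be illustrated with a small sketch. It assumes, as a simplification, that any http(s) URL counts as a term/PID URI and anything else is free text; `classifyValue` and its return labels are hypothetical and not taken from the repository's scripts.

```javascript
// Hypothetical sketch: decide how a field value should be treated.
// Assumption: any http(s) URL is a term/PID URI; everything else is
// free text, which is accepted only when the field allows it.
function classifyValue(value, allowFreeText) {
  const trimmed = String(value).trim();
  let isUri = false;
  try {
    const parsed = new URL(trimmed);
    isUri = parsed.protocol === "http:" || parsed.protocol === "https:";
  } catch (err) {
    // Not parseable as an absolute URL, so treat it as plain text.
    isUri = false;
  }
  if (isUri) return "term";
  if (trimmed.length > 0 && allowFreeText) return "free-text";
  return "invalid";
}

console.log(classifyValue("https://example.org/term/1", false));
console.log(classifyValue("Jane Doe", true));
console.log(classifyValue("Jane Doe", false));
```

A script built this way could render "term" values as links and "free-text" values as plain text, so people or organizations without an identifier can still be entered.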

For further details, see James D. Myers, & Vyacheslav Tykhonov. (2023). A Plug-in Approach to Controlled Vocabulary Support in Dataverse. DOI

Packages

The packages directory contains complete working sets of metadatablock .tsv, cvoc config, and .js files.