Interface ExportDataProvider
- All Superinterfaces:
CoreProvider
Exporter to create
new metadata export formats.
This interface offers multiple methods for retrieving dataset metadata in various formats and levels of detail. Exporters should choose the method that best fits their needs, considering the completeness of metadata and performance implications.
Implementation Guide
Implementers must override the context-accepting versions of all data retrieval methods. No-argument convenience methods are provided as default implementations for backward compatibility but are deprecated and will be removed in a future version.
Context Handling
Implementations should respect context options where applicable. Not all methods support all context options; see individual method documentation for details. All methods require a non-null DatasetExportQuery or FileExportQuery.
Passing null will result in a NullPointerException.
Callers should use DatasetExportQuery.defaults() or FileExportQuery.defaults(), respectively, instead of passing null.
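The null-handling contract and the relationship between the deprecated no-argument methods and their query-accepting counterparts can be sketched as follows. This is only an illustration: the nested DatasetExportQuery record, the reduced Provider interface (returning String instead of JSON), and the assumption that the no-argument method simply delegates with defaults() are all stand-ins, not the actual types or behavior of the library.

```java
import java.util.Objects;

public class QueryDefaultsSketch {

    // Hypothetical stand-in for DatasetExportQuery; only defaults() and the
    // datasetMetadataOnly flag are named in the documentation.
    record DatasetExportQuery(boolean datasetMetadataOnly) {
        static DatasetExportQuery defaults() {
            return new DatasetExportQuery(false); // assumed default value
        }
    }

    // Reduced stand-in for the provider interface (String instead of JSON).
    interface Provider {
        String getDatasetJson(DatasetExportQuery query);

        // Assumed shape of a deprecated no-argument convenience method:
        // it delegates to the query-accepting method with default options.
        @Deprecated
        default String getDatasetJson() {
            return getDatasetJson(DatasetExportQuery.defaults());
        }
    }

    static class DemoProvider implements Provider {
        @Override
        public String getDatasetJson(DatasetExportQuery query) {
            // Reject null early, matching the documented NullPointerException.
            Objects.requireNonNull(query, "query must not be null");
            return query.datasetMetadataOnly() ? "dataset-only" : "dataset+files";
        }
    }

    public static void main(String[] args) {
        Provider provider = new DemoProvider();
        // Preferred: pass explicit defaults rather than null.
        System.out.println(provider.getDatasetJson(DatasetExportQuery.defaults()));
        try {
            provider.getDatasetJson(null);
        } catch (NullPointerException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

Implementers only need to write the query-accepting method in this sketch; the no-argument variant comes for free and disappears cleanly once it is removed.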
Field Summary
Fields -
Method Summary
Modifier and Type | Method | Description
 | getDataCiteXml() | Deprecated, for removal: since 2.1.0, for removal in 3.0.0.
 | getDataCiteXml(DatasetExportQuery query) | Returns dataset metadata conforming to the DataCite standard as XML.
jakarta.json.JsonArray | getDatasetFileDetails() | Deprecated, for removal: since 2.1.0, for removal in 3.0.0.
Stream<jakarta.json.JsonObject> | getDatasetFileDetails(FileExportQuery query) | Returns detailed metadata for files in the dataset.
Stream<jakarta.json.JsonObject> | getDatasetFileDetails(FileExportQuery query, PageRequest request) | Returns detailed metadata for files in the dataset.
default jakarta.json.JsonObject | getDatasetJson() | Deprecated, for removal: since 2.1.0, for removal in 3.0.0.
jakarta.json.JsonObject | getDatasetJson(DatasetExportQuery query) | Returns complete dataset metadata in Dataverse's standard JSON format.
default jakarta.json.JsonObject | getDatasetORE() | Deprecated, for removal: since 2.1.0, for removal in 3.0.0.
jakarta.json.JsonObject | getDatasetORE(DatasetExportQuery query) | Returns dataset metadata in JSON-LD-based OAI-ORE format.
default jakarta.json.JsonObject | getDatasetSchemaDotOrg() | Deprecated, for removal: since 2.1.0, for removal in 3.0.0.
jakarta.json.JsonObject | getDatasetSchemaDotOrg(DatasetExportQuery query) | Returns dataset metadata conforming to the schema.org standard.
default Optional<InputStream> | getPrerequisiteInputStream() | Deprecated, for removal: since 2.1.0, for removal in 3.0.0.
default Optional<InputStream> | getPrerequisiteInputStream(DatasetExportQuery query) | Returns metadata in the format specified by an Exporter's prerequisite.
-
Field Details
-
API_LEVEL
static final int API_LEVEL
-
Method Details
-
getDatasetJson
Returns complete dataset metadata in Dataverse's standard JSON format.
This format includes comprehensive dataset-level metadata along with basic metadata for each file in the dataset. It is the same JSON format used in the Dataverse API and available as a metadata export option in the UI.
- Parameters:
query - specification for data retrieval
- Returns:
- dataset metadata in Dataverse JSON format
- Throws:
ExportException - if metadata retrieval fails
NullPointerException - if the query is null
- Since:
- 2.1.0
- API Note:
- While no formal JSON schema exists for this format, it is well-documented in the Dataverse guides. Along with OAI_ORE, this is one of only two export formats that provide complete dataset and file metadata.
- Implementation Note:
- Implementations must respect the datasetMetadataOnly flag. When true, file-level metadata should be excluded to optimize performance for datasets with large numbers of files. Other context options (publicFilesOnly, offset, length) do not apply and should be ignored.
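A minimal sketch of how an implementation might honor the datasetMetadataOnly flag is shown below. It is an assumption-laden illustration: a Map stands in for jakarta.json.JsonObject so the example runs on the JDK alone, the DatasetExportQuery record is a hypothetical stand-in, and the "files" key is chosen for illustration rather than taken from the real Dataverse JSON format.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;

public class DatasetMetadataOnlySketch {

    // Hypothetical stand-in for DatasetExportQuery with the one flag this method honors.
    record DatasetExportQuery(boolean datasetMetadataOnly) {}

    // Builds a simplified export document; a Map stands in for jakarta.json.JsonObject.
    static Map<String, Object> buildDatasetJson(DatasetExportQuery query,
                                                Map<String, Object> datasetFields,
                                                List<Map<String, Object>> files) {
        Objects.requireNonNull(query, "query must not be null");
        Map<String, Object> result = new LinkedHashMap<>(datasetFields);
        if (!query.datasetMetadataOnly()) {
            // File-level metadata is attached only when the flag is false.
            // "files" is an illustrative key, not the documented format.
            result.put("files", files);
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, Object> dataset = Map.of("title", "Example dataset");
        List<Map<String, Object>> files = List.of(Map.of("label", "data.tab"));
        System.out.println(
                buildDatasetJson(new DatasetExportQuery(true), dataset, files).containsKey("files"));
        System.out.println(
                buildDatasetJson(new DatasetExportQuery(false), dataset, files).containsKey("files"));
    }
}
```

In a real provider the saving comes from never loading the file metadata at all when the flag is set, not merely from omitting it from the output.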
-
getDatasetJson
Deprecated, for removal: This API element is subject to removal in a future version.
Deprecated since 2.1.0, for removal in 3.0.0. Use getDatasetJson(DatasetExportQuery) instead.
Returns complete dataset metadata using default options.
- Returns:
- dataset metadata in Dataverse JSON format
- Throws:
ExportException - if metadata retrieval fails
- Since:
- 1.0.0
-
getDatasetORE
Returns dataset metadata in JSON-LD-based OAI-ORE format.
OAI-ORE (Open Archives Initiative Object Reuse and Exchange) provides a structured way to describe aggregations of web resources. This format is used in Dataverse's archival bag export mechanism and is available via UI and API.
- Parameters:
query - specification for data retrieval
- Returns:
- dataset metadata in OAI-ORE format
- Throws:
ExportException - if metadata retrieval fails
NullPointerException - if the query is null
- Since:
- 2.1.0
- API Note:
- Along with the standard JSON format, this is one of only two export formats that provide complete dataset-level metadata along with basic file metadata for each file in the dataset.
- Implementation Note:
- Implementations must respect the datasetMetadataOnly flag. Other context options do not apply and should be ignored.
-
getDatasetORE
Deprecated, for removal: This API element is subject to removal in a future version.
Deprecated since 2.1.0, for removal in 3.0.0. Use getDatasetORE(DatasetExportQuery) instead.
Returns dataset metadata in OAI-ORE format using default options.
- Returns:
- dataset metadata in OAI-ORE format
- Throws:
ExportException - if metadata retrieval fails
- Since:
- 1.0.0
-
getDatasetFileDetails
Returns detailed metadata for files in the dataset.
For tabular files that have been successfully ingested, this may include DDI-centric metadata extracted during the ingest process. This detailed metadata is not available through other methods in this interface.
The query may specify filters that skip certain files and control how much metadata detail is included. The resulting stream contains only a limited number of elements, as specified by a PageRequest, which avoids huge memory allocations in the provider.
- Parameters:
query - specification for file data retrieval
request - the page request containing pagination information such as page offset and page size
- Returns:
- stream with one JSON object per dataset file in the requested page (both tabular and non-tabular)
- Throws:
ExportException - if metadata retrieval fails
NullPointerException - if the query or request is null
- Since:
- 2.1.0
- API Note:
- No formal JSON schema is available for this output. The format is not extensively documented; implementers may wish to examine the DDIExporter and JSONPrinter classes in the Dataverse codebase for usage examples.
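Paged retrieval keeps memory bounded because an exporter handles one page at a time. The sketch below illustrates that loop under stated assumptions: the PageRequest record (with an element offset, a size, and a next() helper) is hypothetical and not the real class, strings stand in for JsonObject entries, and fileDetails is a stand-in for the provider call.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class PagedFileDetailsSketch {

    // Hypothetical stand-in for PageRequest: zero-based element offset plus page size.
    record PageRequest(int offset, int size) {
        PageRequest next() {
            return new PageRequest(offset + size, size);
        }
    }

    // Stand-in for getDatasetFileDetails(query, request): at most request.size() entries.
    static Stream<String> fileDetails(List<String> allFiles, PageRequest request) {
        return allFiles.stream().skip(request.offset()).limit(request.size());
    }

    // Walks all pages, handing each page to the sink before the next one is fetched.
    // Returns the number of non-empty pages processed.
    static int exportAll(List<String> allFiles, int pageSize, Consumer<List<String>> sink) {
        int pages = 0;
        PageRequest request = new PageRequest(0, pageSize);
        while (true) {
            List<String> page;
            // Close each stream promptly; a real provider may hold resources behind it.
            try (Stream<String> stream = fileDetails(allFiles, request)) {
                page = stream.collect(Collectors.toList());
            }
            if (page.isEmpty()) {
                break;
            }
            sink.accept(page);
            pages++;
            if (page.size() < pageSize) {
                break; // a short page means there is nothing left
            }
            request = request.next();
        }
        return pages;
    }

    public static void main(String[] args) {
        List<String> seen = new ArrayList<>();
        int pages = exportAll(List.of("a.tab", "b.csv", "c.txt", "d.pdf", "e.png"), 2, seen::addAll);
        System.out.println(pages + " pages, " + seen.size() + " files");
    }
}
```

Only one page of entries is held in memory at any time, which is the property the paged overload is meant to provide.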
-
getDatasetFileDetails
Returns detailed metadata for files in the dataset.
For tabular files that have been successfully ingested, this may include DDI-centric metadata extracted during the ingest process. This detailed metadata is not available through other methods in this interface.
The query may specify filters that skip certain files and control how much metadata detail is included. The resulting stream will contain all matching files for consumption. For datasets with large amounts of metadata, use getDatasetFileDetails(FileExportQuery, PageRequest) to obtain a stream containing only a limited number of elements, which avoids huge memory allocations in the provider.
- Parameters:
query - specification for file data retrieval
- Returns:
- stream with one JSON object per dataset file (both tabular and non-tabular)
- Throws:
ExportException - if metadata retrieval fails
NullPointerException - if the query is null
- Since:
- 2.1.0
- API Note:
- No formal JSON schema is available for this output. The format is not extensively documented; implementers may wish to examine the DDIExporter and JSONPrinter classes in the Dataverse codebase for usage examples.
-
getDatasetFileDetails
Deprecated, for removal: This API element is subject to removal in a future version.
Deprecated since 2.1.0, for removal in 3.0.0. Use getDatasetFileDetails(FileExportQuery) or getDatasetFileDetails(FileExportQuery, PageRequest) instead.
Returns detailed metadata for all files using default options.
Note that this method will serialize all file metadata into one large JSON array. This can be memory-intensive for large datasets and should be used judiciously. There have been reports of unexportable large datasets in production installations. Using getDatasetFileDetails(FileExportQuery) instead is advised.
- Returns:
- JSON array with one JSON object entry per dataset file
- Throws:
ExportException - if metadata retrieval fails
- Since:
- 1.0.0
-
getDatasetSchemaDotOrg
Returns dataset metadata conforming to the schema.org standard.
This metadata subset is used in dataset page headers to improve discoverability by search engines. It provides structured data markup (JSON-LD) following the schema.org vocabulary.
- Parameters:
query - specification for data retrieval
- Returns:
- dataset metadata in schema.org format
- Throws:
ExportException - if metadata retrieval fails
NullPointerException - if the query is null
- Since:
- 2.1.0
- API Note:
- This metadata export is not complete. It should only be used as a starting point for an Exporter if it simplifies implementation compared to using the complete JSON or OAI_ORE exports.
- Implementation Note:
- All context options are ignored by this method.
-
getDatasetSchemaDotOrg
@Deprecated(since="2.1.0", forRemoval=true) default jakarta.json.JsonObject getDatasetSchemaDotOrg()
Deprecated, for removal: This API element is subject to removal in a future version.
Deprecated since 2.1.0, for removal in 3.0.0. Use getDatasetSchemaDotOrg(DatasetExportQuery) instead.
Returns dataset metadata in schema.org format using default options.
- Returns:
- dataset metadata in schema.org format
- Throws:
ExportException - if metadata retrieval fails
- Since:
- 1.0.0
-
getDataCiteXml
Returns dataset metadata conforming to the DataCite standard as XML.
This is the same metadata format sent to DataCite when DataCite DOIs are used. It provides citation metadata following the DataCite Metadata Schema.
Note: the returned XML document can easily be queried using XPath and other techniques.
- Parameters:
query - specification for data retrieval
- Returns:
- dataset metadata as DataCite XML string
- Throws:
ExportException - if metadata retrieval fails
NullPointerException - if the query is null
- Since:
- 2.1.0
- API Note:
- This metadata export is not complete. It should only be used as a starting point for an Exporter if it simplifies implementation compared to using the complete JSON or OAI_ORE exports.
- Implementation Note:
- All context options are ignored by this method.
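As an illustration of querying the returned DataCite XML string with XPath, the following sketch uses only the JDK's built-in XML APIs. The sample document is hand-written for the example and far smaller than a real export; matching on local-name() is one simple way to sidestep namespace prefixes and is a choice made here, not something the interface prescribes.

```java
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class DataCiteXPathSketch {

    // Extracts the first title from a DataCite-style document.
    // local-name() is used so the query works regardless of the namespace prefix.
    static String firstTitle(String dataCiteXml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            factory.setFeature(XMLConstants.FEATURE_SECURE_PROCESSING, true);
            Document doc = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(dataCiteXml)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            return xpath.evaluate(
                    "/*[local-name()='resource']/*[local-name()='titles']/*[local-name()='title'][1]",
                    doc);
        } catch (Exception e) {
            // Wrapped so callers do not have to handle several checked XML exceptions.
            throw new IllegalStateException("could not query DataCite XML", e);
        }
    }

    public static void main(String[] args) {
        // Hand-written sample; a real export carries many more elements.
        String sample = "<resource xmlns=\"http://datacite.org/schema/kernel-4\">"
                + "<titles><title>Example dataset</title></titles></resource>";
        System.out.println(firstTitle(sample));
    }
}
```

An exporter that needs many fields would typically parse the document once and run several XPath expressions against the same Document.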
-
getDataCiteXml
Deprecated, for removal: This API element is subject to removal in a future version.
Deprecated since 2.1.0, for removal in 3.0.0. Use getDataCiteXml(DatasetExportQuery) instead.
Returns dataset metadata in DataCite XML format using default options.
- Returns:
- dataset metadata as DataCite XML string
- Throws:
ExportException - if metadata retrieval fails
- Since:
- 1.0.0
-
getPrerequisiteInputStream
Returns metadata in the format specified by an Exporter's prerequisite.
Some Exporters transform metadata from one standard format to another (e.g., DDI XML to DDI HTML). Such Exporters declare a prerequisite format via Exporter.getPrerequisiteFormatName(), and this method provides access to that prerequisite metadata.
- Parameters:
query - specification passed to the prerequisite exporter
- Returns:
- metadata in the prerequisite format, or empty if no prerequisite is configured
- Throws:
ExportException - if metadata retrieval fails
NullPointerException - if the query is null
- Since:
- 2.1.0
- API Note:
- This is useful for creating alternate representations of the same metadata (e.g., XML, HTML, PDF versions of a standard like DDI), especially when conversion libraries exist. Note that if a third-party Exporter replaces the internal exporter you depend on, this method may return unexpected results.
- Implementation Note:
- The default implementation returns empty. Override only if your provider supports prerequisite format chaining. The prerequisite exporter receives the same context as specified in this call.
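Because the prerequisite metadata arrives as an Optional<InputStream>, a consuming Exporter has to handle both the empty case and closing the stream. A minimal sketch of that handling follows; the readPrerequisite helper and its fallback parameter are hypothetical names introduced for this example, and the in-memory stream stands in for whatever the provider would return.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Optional;

public class PrerequisiteSketch {

    // Reads the prerequisite stream fully as UTF-8 text,
    // or returns the fallback if no prerequisite is configured.
    static String readPrerequisite(Optional<InputStream> prerequisite, String fallback) {
        if (prerequisite.isEmpty()) {
            return fallback;
        }
        // try-with-resources closes the stream even if reading fails.
        try (InputStream in = prerequisite.get()) {
            return new String(in.readAllBytes(), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Stand-in for the stream a provider would return for a DDI XML prerequisite.
        InputStream ddi = new ByteArrayInputStream(
                "<codeBook/>".getBytes(StandardCharsets.UTF_8));
        System.out.println(readPrerequisite(Optional.of(ddi), "no prerequisite"));
        System.out.println(readPrerequisite(Optional.empty(), "no prerequisite"));
    }
}
```

For large prerequisite documents a streaming transformation would be preferable to readAllBytes(); the full read is used here only to keep the example short.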
-
getPrerequisiteInputStream
@Deprecated(since="2.1.0", forRemoval=true) default Optional<InputStream> getPrerequisiteInputStream()
Deprecated, for removal: This API element is subject to removal in a future version.
Deprecated since 2.1.0, for removal in 3.0.0. Use getPrerequisiteInputStream(DatasetExportQuery) instead.
Returns metadata in the prerequisite format using default options.
- Returns:
- metadata in the prerequisite format, or empty if no prerequisite is configured
- Throws:
ExportException - if metadata retrieval fails
- Since:
- 1.0.0
-