Interface ExportDataProvider

All Superinterfaces:
CoreProvider

public interface ExportDataProvider extends CoreProvider
Provides dataset metadata that can be used by an Exporter to create new metadata export formats.

This interface offers multiple methods for retrieving dataset metadata in various formats and levels of detail. Exporters should choose the method that best fits their needs, considering the completeness of metadata and performance implications.

Implementation Guide

Implementers must override the context-accepting versions of all data retrieval methods. No-argument convenience methods are provided as default implementations for backward compatibility but are deprecated and will be removed in a future version.
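The relationship between the two method families can be sketched as follows. This is a simplified illustration, not the actual SPI source: `QueryStub`, `ProviderSketch`, and `DelegationSketch` are hypothetical stand-ins, `String` replaces `jakarta.json.JsonObject`, and the delegation to `defaults()` inside the deprecated method is an assumption about how such a default implementation would typically be written.

```java
// Hypothetical stand-in for DatasetExportQuery.
record QueryStub(boolean datasetMetadataOnly) {
    static QueryStub defaults() {
        return new QueryStub(false); // assumed default value
    }
}

// Simplified stand-in for the provider interface; String replaces JsonObject.
interface ProviderSketch {
    // Context-accepting method: the one implementers must override.
    String getDatasetJson(QueryStub query);

    // Deprecated no-argument variant, assumed here to delegate with default options.
    @Deprecated(since = "2.1.0", forRemoval = true)
    default String getDatasetJson() {
        return getDatasetJson(QueryStub.defaults());
    }
}

class DelegationSketch {
    // An implementer only supplies the context-accepting method.
    static ProviderSketch provider() {
        return query -> "datasetMetadataOnly=" + query.datasetMetadataOnly();
    }

    // Legacy call path, kept only to show what the deprecated method resolves to.
    @SuppressWarnings("removal")
    static String legacyCall(ProviderSketch p) {
        return p.getDatasetJson();
    }

    public static void main(String[] args) {
        System.out.println(provider().getDatasetJson(QueryStub.defaults()));
        System.out.println(legacyCall(provider()));
    }
}
```

Under this assumption, existing callers of the no-argument methods keep working while implementers write a single code path.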

Context Handling

Implementations should respect context options where applicable. Not all methods support all context options; see the individual method documentation for details. All methods require a non-null DatasetExportQuery or FileExportQuery, and passing null results in a NullPointerException. Callers should pass DatasetExportQuery.defaults() or FileExportQuery.defaults(), respectively, instead of null.
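The null contract can be illustrated with a minimal sketch. The types below are hypothetical stand-ins rather than the real SPI classes: `QuerySketch` plays the role of DatasetExportQuery, and the default value returned by its `defaults()` factory is an assumption.

```java
import java.util.Objects;

// Hypothetical stand-in for DatasetExportQuery; the real class may carry more options.
record QuerySketch(boolean datasetMetadataOnly) {
    // Mirrors the documented defaults() factory; the default value is an assumption.
    static QuerySketch defaults() {
        return new QuerySketch(false);
    }
}

class NullContractSketch {
    // Provider side: reject null up front, as the contract requires.
    static String describe(QuerySketch query) {
        Objects.requireNonNull(query, "query must not be null");
        return query.datasetMetadataOnly() ? "dataset-only" : "dataset+files";
    }

    public static void main(String[] args) {
        // Caller side: pass defaults() rather than null.
        System.out.println(describe(QuerySketch.defaults()));
    }
}
```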

  • Method Details

    • getDatasetJson

      jakarta.json.JsonObject getDatasetJson(DatasetExportQuery query)
      Returns complete dataset metadata in Dataverse's standard JSON format.

      This format includes comprehensive dataset-level metadata along with basic metadata for each file in the dataset. It is the same JSON format used in the Dataverse API and available as a metadata export option in the UI.

      Parameters:
      query - specification for data retrieval
      Returns:
      dataset metadata in Dataverse JSON format
      Throws:
      ExportException - if metadata retrieval fails
      NullPointerException - if the query is null
      Since:
      2.1.0
      API Note:
      While no formal JSON schema exists for this format, it is well-documented in the Dataverse guides. Along with OAI_ORE, this is one of only two export formats that provide complete dataset and file metadata.
      Implementation Note:
      Implementations must respect the datasetMetadataOnly flag. When true, file-level metadata should be excluded to optimize performance for datasets with large numbers of files. Other context options (publicFilesOnly, offset, length) do not apply and should be ignored.
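A minimal sketch of honoring the datasetMetadataOnly flag is shown below. It is illustrative only: a `Map` stands in for `jakarta.json.JsonObject` to keep the example dependency-free, the nested `Query` record is a hypothetical stand-in for DatasetExportQuery, and the keys `title` and `files` are placeholders, not the actual Dataverse JSON layout.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;

class DatasetJsonSketch {
    // Hypothetical stand-in for DatasetExportQuery with only the flag this method honors.
    record Query(boolean datasetMetadataOnly) {}

    // Builds dataset-level metadata and adds file entries only when the flag is false.
    static Map<String, Object> buildDatasetJson(Query query, String title, List<String> fileNames) {
        Objects.requireNonNull(query, "query must not be null");
        Map<String, Object> json = new LinkedHashMap<>();
        json.put("title", title); // placeholder key, not the real schema
        if (!query.datasetMetadataOnly()) {
            // Skipped entirely for datasetMetadataOnly, so no per-file work is done.
            json.put("files", List.copyOf(fileNames));
        }
        return json;
    }

    public static void main(String[] args) {
        List<String> files = List.of("data.tab", "readme.txt");
        System.out.println(buildDatasetJson(new Query(false), "Sample", files));
        System.out.println(buildDatasetJson(new Query(true), "Sample", files));
    }
}
```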
    • getDatasetJson

      @Deprecated(since="2.1.0", forRemoval=true) default jakarta.json.JsonObject getDatasetJson()
      Deprecated, for removal: This API element is subject to removal in a future version.
      since 2.1.0, for removal in 3.0.0. Use getDatasetJson(DatasetExportQuery) instead.
      Returns complete dataset metadata using default options.
      Returns:
      dataset metadata in Dataverse JSON format
      Throws:
      ExportException - if metadata retrieval fails
      Since:
      1.0.0
    • getDatasetORE

      jakarta.json.JsonObject getDatasetORE(DatasetExportQuery query)
      Returns dataset metadata in JSON-LD-based OAI-ORE format.

      OAI-ORE (Open Archives Initiative Object Reuse and Exchange) provides a structured way to describe aggregations of web resources. This format is used in Dataverse's archival bag export mechanism and is available via the UI and API.

      Parameters:
      query - specification for data retrieval
      Returns:
      dataset metadata in OAI-ORE format
      Throws:
      ExportException - if metadata retrieval fails
      NullPointerException - if the query is null
      Since:
      2.1.0
      API Note:
      Along with the standard JSON format, this is one of only two export formats that provide complete dataset-level metadata along with basic file metadata for each file in the dataset.
      Implementation Note:
      Implementations must respect the datasetMetadataOnly flag. Other context options do not apply and should be ignored.
    • getDatasetORE

      @Deprecated(since="2.1.0", forRemoval=true) default jakarta.json.JsonObject getDatasetORE()
      Deprecated, for removal: This API element is subject to removal in a future version.
      since 2.1.0, for removal in 3.0.0. Use getDatasetORE(DatasetExportQuery) instead.
      Returns dataset metadata in OAI-ORE format using default options.
      Returns:
      dataset metadata in OAI-ORE format
      Throws:
      ExportException - if metadata retrieval fails
      Since:
      1.0.0
    • getDatasetFileDetails

      Stream<jakarta.json.JsonObject> getDatasetFileDetails(FileExportQuery query, PageRequest request)
      Returns detailed metadata for files in the dataset.

      For tabular files that have been successfully ingested, this may include DDI-centric metadata extracted during the ingest process. This detailed metadata is not available through other methods in this interface.

      The query may specify filters to skip certain files and how much metadata detail to include. The resulting stream contains only a limited number of elements, as specified by the PageRequest, which avoids large memory allocations in the provider.

      Parameters:
      query - specification for file data retrieval
      request - the page request containing pagination information such as page offset and page size
      Returns:
      a stream of JSON objects, one per dataset file in the requested page (both tabular and non-tabular)
      Throws:
      ExportException - if metadata retrieval fails
      NullPointerException - if the query or request is null
      Since:
      2.1.0
      API Note:
      No formal JSON schema is available for this output. The format is not extensively documented; implementers may wish to examine the DDIExporter and JSONPrinter classes in the Dataverse codebase for usage examples.
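One way a consumer might walk through all pages is sketched below. The sketch rests on several assumptions: the nested `Page` record is a hypothetical stand-in for PageRequest that interprets the "page offset" as a zero-based page index, strings replace `jakarta.json.JsonObject`, and the stand-in provider simply slices an in-memory list.

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class PagedFileDetailsSketch {
    // Hypothetical stand-in for PageRequest: zero-based page index plus page size.
    record Page(int index, int size) {}

    // Stand-in provider: yields only the entries that belong to the requested page.
    static Stream<String> fileDetails(List<String> allFiles, Page page) {
        return allFiles.stream()
                .skip((long) page.index() * page.size())
                .limit(page.size());
    }

    // Consumer: request page after page until one comes back shorter than the page size.
    static int exportAll(List<String> allFiles, int pageSize) {
        if (pageSize <= 0) {
            throw new IllegalArgumentException("pageSize must be positive");
        }
        int exported = 0;
        for (int index = 0; ; index++) {
            List<String> batch;
            try (Stream<String> entries = fileDetails(allFiles, new Page(index, pageSize))) {
                batch = entries.collect(Collectors.toList());
            }
            exported += batch.size(); // a real exporter would write the batch out here
            if (batch.size() < pageSize) {
                break;
            }
        }
        return exported;
    }

    public static void main(String[] args) {
        List<String> files = List.of("f1", "f2", "f3", "f4", "f5", "f6", "f7");
        System.out.println(exportAll(files, 3) + " entries exported");
    }
}
```

At most one page of entries is held in memory at a time, which is the point of the paged variant.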
    • getDatasetFileDetails

      Stream<jakarta.json.JsonObject> getDatasetFileDetails(FileExportQuery query)
      Returns detailed metadata for files in the dataset.

      For tabular files that have been successfully ingested, this may include DDI-centric metadata extracted during the ingest process. This detailed metadata is not available through other methods in this interface.

      The query may specify filters to skip certain files and how much metadata detail to include. The resulting stream contains all matching files. For datasets with large amounts of file metadata, use getDatasetFileDetails(FileExportQuery,PageRequest) instead; it returns a stream with a limited number of elements and avoids large memory allocations in the provider.

      Parameters:
      query - specification for file data retrieval
      Returns:
      a stream of JSON objects, one per matching dataset file (both tabular and non-tabular)
      Throws:
      ExportException - if metadata retrieval fails
      NullPointerException - if the query is null
      Since:
      2.1.0
      API Note:
      No formal JSON schema is available for this output. The format is not extensively documented; implementers may wish to examine the DDIExporter and JSONPrinter classes in the Dataverse codebase for usage examples.
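A consumer can process the returned stream element by element instead of collecting it first. The following is a minimal sketch under stated assumptions: strings stand in for `jakarta.json.JsonObject`, and `StreamingExportSketch` with its one-line-per-entry output is a hypothetical exporter, not a Dataverse format.

```java
import java.io.PrintWriter;
import java.io.StringWriter;
import java.util.Iterator;
import java.util.stream.Stream;

class StreamingExportSketch {
    // Writes one line per entry and returns the number of entries written.
    // The stream is closed when done, in case the provider holds resources behind it.
    static long writeEntries(Stream<String> fileDetails, PrintWriter out) {
        long count = 0;
        try (Stream<String> entries = fileDetails) {
            Iterator<String> it = entries.iterator();
            while (it.hasNext()) {
                out.print(it.next());
                out.print('\n');
                count++;
            }
        }
        out.flush();
        return count;
    }

    public static void main(String[] args) {
        StringWriter buffer = new StringWriter();
        long written = writeEntries(Stream.of("{\"id\":1}", "{\"id\":2}"), new PrintWriter(buffer));
        System.out.println(written + " entries written");
        System.out.print(buffer);
    }
}
```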
    • getDatasetFileDetails

      @Deprecated(since="2.1.0", forRemoval=true) jakarta.json.JsonArray getDatasetFileDetails()
      Deprecated, for removal: This API element is subject to removal in a future version.
      Returns detailed metadata for all files using default options.

      Note that this method serializes all file metadata into one large JSON array, which can be memory-intensive for large datasets. Production installations have reported large datasets that could not be exported. Use getDatasetFileDetails(FileExportQuery) instead.

      Returns:
      JSON array with one JSON object entry per dataset file
      Throws:
      ExportException - if metadata retrieval fails
      Since:
      1.0.0
    • getDatasetSchemaDotOrg

      jakarta.json.JsonObject getDatasetSchemaDotOrg(DatasetExportQuery query)
      Returns dataset metadata conforming to the schema.org standard.

      This metadata subset is used in dataset page headers to improve discoverability by search engines. It provides structured data markup (JSON-LD) following the schema.org vocabulary.

      Parameters:
      query - specification for data retrieval
      Returns:
      dataset metadata in schema.org format
      Throws:
      ExportException - if metadata retrieval fails
      NullPointerException - if the query is null
      Since:
      2.1.0
      API Note:
      This metadata export is not complete. It should only be used as a starting point for an Exporter if it simplifies implementation compared to using the complete JSON or OAI_ORE exports.
      Implementation Note:
      All context options are ignored by this method.
    • getDatasetSchemaDotOrg

      @Deprecated(since="2.1.0", forRemoval=true) default jakarta.json.JsonObject getDatasetSchemaDotOrg()
      Deprecated, for removal: This API element is subject to removal in a future version.
      since 2.1.0, for removal in 3.0.0. Use getDatasetSchemaDotOrg(DatasetExportQuery) instead.
      Returns dataset metadata in schema.org format using default options.
      Returns:
      dataset metadata in schema.org format
      Throws:
      ExportException - if metadata retrieval fails
      Since:
      1.0.0
    • getDataCiteXml

      Document getDataCiteXml(DatasetExportQuery query)
      Returns dataset metadata conforming to the DataCite standard as XML.

      This is the same metadata format sent to DataCite when DataCite DOIs are used. It provides citation metadata following the DataCite Metadata Schema.

      Note: the returned XML document can be queried using XPath and similar techniques.

      Parameters:
      query - specification for data retrieval
      Returns:
      dataset metadata as a DataCite XML document
      Throws:
      ExportException - if metadata retrieval fails
      NullPointerException - if the query is null
      Since:
      2.1.0
      API Note:
      This metadata export is not complete. It should only be used as a starting point for an Exporter if it simplifies implementation compared to using the complete JSON or OAI_ORE exports.
      Implementation Note:
      All context options are ignored by this method.
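Querying the returned Document with XPath might look like the sketch below. It uses only JDK classes. The sample XML is hand-written for illustration and is far smaller than a real DataCite record; a real caller would obtain the Document from the provider rather than parsing a string. Matching on `local-name()` is a design choice here to sidestep namespace-context setup.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class DataCiteXPathSketch {
    // Parses XML text into a namespace-aware DOM Document (stand-in for the provider call).
    static Document parse(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            return factory.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("could not parse sample XML", e);
        }
    }

    // Returns the text of the first title element, or an empty string if none exists.
    static String firstTitle(Document doc) {
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            return xpath.evaluate("//*[local-name()='title']", doc);
        } catch (Exception e) {
            throw new IllegalStateException("XPath evaluation failed", e);
        }
    }

    public static void main(String[] args) {
        String sample = "<resource xmlns=\"http://datacite.org/schema/kernel-4\">"
                + "<titles><title>Sample Dataset</title></titles></resource>";
        System.out.println(firstTitle(parse(sample)));
    }
}
```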
    • getDataCiteXml

      @Deprecated(since="2.1.0", forRemoval=true) String getDataCiteXml()
      Deprecated, for removal: This API element is subject to removal in a future version.
      since 2.1.0, for removal in 3.0.0. Use getDataCiteXml(DatasetExportQuery) instead.
      Returns dataset metadata in DataCite XML format using default options.
      Returns:
      dataset metadata as DataCite XML string
      Throws:
      ExportException - if metadata retrieval fails
      Since:
      1.0.0
    • getPrerequisiteInputStream

      default Optional<InputStream> getPrerequisiteInputStream(DatasetExportQuery query)
      Returns metadata in the format specified by an Exporter's prerequisite.

      Some Exporters transform metadata from one standard format to another (e.g., DDI XML to DDI HTML). Such Exporters declare a prerequisite format via Exporter.getPrerequisiteFormatName(), and this method provides access to that prerequisite metadata.

      Parameters:
      query - specification passed to the prerequisite exporter
      Returns:
      metadata in the prerequisite format, or empty if no prerequisite is configured
      Throws:
      ExportException - if metadata retrieval fails
      NullPointerException - if the query is null
      Since:
      2.1.0
      API Note:
      This is useful for creating alternate representations of the same metadata (e.g., XML, HTML, PDF versions of a standard like DDI), especially when conversion libraries exist. Note that if a third-party Exporter replaces the internal exporter you depend on, this method may return unexpected results.
      Implementation Note:
      The default implementation returns an empty Optional. Override only if your provider supports prerequisite format chaining. The prerequisite exporter receives the same context as specified in this call.
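An Exporter consuming the prerequisite has to handle both the empty case and stream cleanup. A minimal sketch, assuming the prerequisite content is UTF-8 text (the interface does not state an encoding) and using the hypothetical helper `readOrFallback`:

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Optional;

class PrerequisiteSketch {
    // Reads the prerequisite fully, or returns the fallback when none is configured.
    static String readOrFallback(Optional<InputStream> prerequisite, String fallback) {
        if (prerequisite.isEmpty()) {
            return fallback;
        }
        // try-with-resources closes the stream even if reading fails.
        try (InputStream in = prerequisite.get()) {
            return new String(in.readAllBytes(), StandardCharsets.UTF_8); // UTF-8 is an assumption
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        InputStream ddi = new ByteArrayInputStream("<codeBook/>".getBytes(StandardCharsets.UTF_8));
        System.out.println(readOrFallback(Optional.of(ddi), "no prerequisite"));
        System.out.println(readOrFallback(Optional.empty(), "no prerequisite"));
    }
}
```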
    • getPrerequisiteInputStream

      @Deprecated(since="2.1.0", forRemoval=true) default Optional<InputStream> getPrerequisiteInputStream()
      Deprecated, for removal: This API element is subject to removal in a future version.
      since 2.1.0, for removal in 3.0.0. Use getPrerequisiteInputStream(DatasetExportQuery) instead.
      Returns metadata in the prerequisite format using default options.
      Returns:
      metadata in the prerequisite format, or empty if no prerequisite is configured
      Throws:
      ExportException - if metadata retrieval fails
      Since:
      1.0.0