Dear all, as demonstrated in the last working group meeting, I have set up a repository that evaluates several providers for generating code from 5.14 OpenAPI specifications. You can find the repository here:
https://github.com/JR-1991/pyDataverse-generation-analysis
There is still much to be done, but most of it can be handled by designing a comprehensive OpenAPI specification. To demonstrate how the generated code can be used, I have started providing Jupyter Notebooks that test a couple of endpoints. The first example, which uses the SpeakEasy API, can be found here:
Jupyter Notebook
I will post new examples here once ready :raised_hands:
I highly recommend watching the recording of yesterday's call to get oriented.
These days we could also try pointing these generators at https://hub.dataverse.org/openapi
I gave it a shot with the Pydantic generator, but I am not sure what to make of the result. A lot of the fields look like they are only used internally. I think the output is technically fine, but it could be confusing for users. Here is an example for DataFile:
class DataFile(BaseModel):
    mergeable: Optional[bool] = None
    id: Optional[int] = None
    publicationDate: Optional[Dict[str, Any]] = None
    releaseUser: Optional[AuthenticatedUser] = None
    createDate: Optional[Dict[str, Any]] = None
    modificationTime: Optional[Dict[str, Any]] = None
    indexTime: Optional[Dict[str, Any]] = None
    permissionModificationTime: Optional[Dict[str, Any]] = None
    permissionIndexTime: Optional[Dict[str, Any]] = None
    storageIdentifier: Optional[str] = None
    dtype: Optional[str] = None
    protocol: Optional[str] = None
    authority: Optional[str] = None
    separator: Optional[str] = None
    globalIdCreateTime: Optional[date] = Field(None, example='2022-03-10')
    identifier: Optional[str] = None
    identifierRegistered: Optional[bool] = None
    alternativePersistentIndentifiers: Optional[
        List[AlternativePersistentIdentifier]
    ] = Field(None, unique_items=True)
    previewImageAvailable: Optional[bool] = None
    storageQuota: Optional[StorageQuota] = None
    previewImageFail: Optional[bool] = None
    creator: Optional[AuthenticatedUser] = None
    roleAssignments: Optional[List[RoleAssignment]] = None
    released: Optional[bool] = None
    instanceofDataverse: Optional[bool] = None
    instanceofDataset: Optional[bool] = None
    instanceofDataFile: Optional[bool] = None
    dataverseContext: Optional[Dataverse] = None
    authorString: Optional[str] = None
    yearPublishedCreated: Optional[str] = None
    contentType: constr(regex=r'^.*/.*$')
    checksumType: Optional[ChecksumType] = None
    checksumValue: Optional[str] = None
    rootDataFileId: Optional[int] = None
    previousDataFileId: Optional[int] = None
    filesize: Optional[int] = None
    restricted: Optional[bool] = None
    provEntityName: Optional[str] = None
    dataTables: Optional[List[DataTable]] = None
    auxiliaryFiles: Optional[List[AuxiliaryFile]] = None
    ingestReports: Optional[List[IngestReport]] = None
    ingestRequest: Optional[IngestRequest] = None
    dataFileTags: Optional[List[DataFileTag]] = None
    fileMetadatas: Optional[List[FileMetadata]] = None
    guestbookResponses: Optional[List[GuestbookResponse]] = None
    fileAccessRequests: Optional[List[FileAccessRequest]] = None
    fileAccessRequesters: Optional[List[AuthenticatedUser]] = None
    ingestStatus: Optional[str] = None
    thumbnailForDataset: Optional[Dataset] = None
    embargo: Optional[Embargo] = None
    retention: Optional[Retention] = None
    deleted: Optional[bool] = None
    markedAsDuplicate: Optional[bool] = None
    duplicateFilename: Optional[str] = None
    effectivelyPermissionRoot: Optional[bool] = None
    dataTable: Optional[DataTable] = None
    tags: Optional[List[DataFileTag]] = None
    tagLabels: Optional[List[str]] = None
    tagLabelsAsJsonArrayBuilder: Optional[Dict[str, Any]] = None
    ingestReport: Optional[IngestReport] = None
    ingestReportMessage: Optional[str] = None
    tabularData: Optional[bool] = None
    originalFileFormat: Optional[str] = None
    originalFileSize: Optional[int] = None
    originalFileName: Optional[str] = None
    derivedOriginalFileName: Optional[str] = None
    originalFormatLabel: Optional[str] = None
    friendlyType: Optional[str] = None
    owner: Optional[Dataset] = None
    description: Optional[str] = None
    draftFileMetadata: Optional[FileMetadata] = None
    fileMetadata: Optional[FileMetadata] = None
    latestFileMetadata: Optional[FileMetadata] = None
    latestPublishedFileMetadata: Optional[FileMetadata] = None
    friendlySize: Optional[str] = None
    originalChecksumType: Optional[str] = None
    storageIO: Optional[StorageIODataFile] = None
    shapefileType: Optional[bool] = None
    image: Optional[bool] = None
    filePackage: Optional[bool] = None
    ingestScheduled: Optional[bool] = None
    ingestInProgress: Optional[bool] = None
    ingestProblem: Optional[bool] = None
    asThumbnailForDataset: Optional[Dataset] = None
    unf: Optional[str] = None
    harvested: Optional[bool] = None
    remoteArchiveURL: Optional[str] = None
    harvestingDescription: Optional[str] = None
    displayName: Optional[str] = None
    directoryLabel: Optional[str] = None
    currentName: Optional[str] = None
    publicationDateFormattedYYYYMMDD: Optional[str] = None
    createDateFormattedYYYYMMDD: Optional[str] = None
    targetUrl: Optional[str] = None
    deaccessioned: Optional[bool] = None
I cannot speak for Java, but in Python and Rust there are ways to decide which fields flow into the OpenAPI specs. Here is an example from Python:
https://drf-spectacular.readthedocs.io/en/latest/customization.html
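To make the idea concrete without tying it to a particular framework, here is a rough standard-library sketch of "deciding which fields flow into the spec". It is not how drf-spectacular or Dataverse does it. The `DataFileView` class, the `internal` metadata flag, and the `to_schema` helper are all made up for illustration, and the field selection is only a guess based on the DataFile listing above.

```python
from dataclasses import dataclass, field, fields
from typing import Optional, get_args

# Hypothetical mapping from Python types to OpenAPI type names.
_OPENAPI_TYPES = {int: "integer", str: "string", bool: "boolean"}


@dataclass
class DataFileView:
    # Fields an API user is likely to care about (assumption).
    id: Optional[int] = None
    contentType: str = "application/octet-stream"
    filesize: Optional[int] = None
    # Flagged as internal: kept on the model, hidden from the schema.
    indexTime: Optional[str] = field(default=None, metadata={"internal": True})
    permissionIndexTime: Optional[str] = field(
        default=None, metadata={"internal": True}
    )


def _openapi_type(annotation) -> str:
    """Unwrap Optional[X] and map X to an OpenAPI type name."""
    args = [a for a in get_args(annotation) if a is not type(None)]
    base = args[0] if args else annotation
    return _OPENAPI_TYPES.get(base, "object")


def to_schema(cls) -> dict:
    """Build an OpenAPI-style object schema, skipping fields flagged internal."""
    properties = {}
    for f in fields(cls):
        if f.metadata.get("internal"):
            continue  # this field never reaches the spec
        properties[f.name] = {"type": _openapi_type(f.type)}
    return {"type": "object", "properties": properties}


schema = to_schema(DataFileView)
print(sorted(schema["properties"]))
```

The point is that the filtering happens where the spec is produced, so every generator downstream only ever sees the public fields.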
Yeah, I don't think indexTime, for example, is particularly interesting to end users.
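If the spec cannot be trimmed at the source, one stopgap would be to prune it before handing it to a generator. The sketch below assumes the schemas live under `components/schemas` (the usual OpenAPI 3 layout; I have not checked that the Dataverse spec is shaped this way). The `INTERNAL_FIELDS` set and the `prune_internal_fields` helper are hypothetical, and the tiny `example_spec` stands in for the real document.

```python
import copy

# Assumption: names picked from this thread, not an official list.
INTERNAL_FIELDS = {"indexTime", "permissionIndexTime", "mergeable"}


def prune_internal_fields(spec: dict, internal: set) -> dict:
    """Return a copy of an OpenAPI document without the named properties."""
    pruned = copy.deepcopy(spec)  # leave the caller's spec untouched
    schemas = pruned.get("components", {}).get("schemas", {})
    for schema in schemas.values():
        props = schema.get("properties", {})
        for name in internal.intersection(props):
            del props[name]
        if "required" in schema:
            # Keep "required" consistent with the remaining properties.
            schema["required"] = [
                r for r in schema["required"] if r not in internal
            ]
    return pruned


# Stand-in for the real spec, reduced to a single schema.
example_spec = {
    "openapi": "3.0.3",
    "components": {
        "schemas": {
            "DataFile": {
                "type": "object",
                "required": ["contentType"],
                "properties": {
                    "id": {"type": "integer"},
                    "contentType": {"type": "string"},
                    "indexTime": {"type": "object"},
                },
            }
        }
    },
}

slim = prune_internal_fields(example_spec, INTERNAL_FIELDS)
print(sorted(slim["components"]["schemas"]["DataFile"]["properties"]))
```

This only handles top-level properties of each schema. Nested objects and schemas that become unreferenced after pruning would need extra handling.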
Last updated: Nov 01 2025 at 14:11 UTC