Hi everyone,
I am trying to add datasets that have been deposited in other repositories to our collections (running Dataverse 6.2), but I'm not sure how to configure the JSON file to attach the data files. I am following the instructions here:
https://guides.dataverse.org/en/latest/api/native-api.html#id59
Do I just put the data files to be uploaded in the same folder as the JSON file? How should I name the data files, and how do I configure the following part of the JSON file? I tried different things and kept getting an error message that says "Validation Failed: Please specify a file name. (Invalid value:).java.util.stream.ReferencePipeline$3@71302796".
"files": [
  {
    "description": "",
    "label": "pub",
    "restricted": false,
    "version": 1,
    "datasetVersionId": 1,
    "dataFile": {
      "id": 4,
      "filename": "pub",
      "contentType": "application/vnd.dataverse.file-package",
      "filesize": 1698795873,
      "description": "",
      "storageIdentifier": "162017e5ad5-ee2a2b17fee9",
      "originalFormatLabel": "UNKNOWN",
      "rootDataFileId": -1,
      "checksum": {
        "type": "SHA-1",
        "value": "54bc7ddb096a490474bd8cc90cbed1c96730f350"
      }
    }
  }
]
Hi! First, I'm turning your guides link into something more permanent (since "latest" is always changing): https://guides.dataverse.org/en/6.2/api/native-api.html#import-a-dataset-into-a-dataverse-collection
I was just playing with the import API yesterday. One sec.
This is what I did: https://github.com/IQSS/dataverse/pull/10694/commits/4f055b6d89e790ff4df6af490f7152e88e1c431d
Oh, but I wasn't testing files. Let me try with the example from the guides.
It seems to import ok on my machine. Please check this out:
Screenshot-2024-07-18-at-2.21.56PM.png
Thanks, Philip, for testing this on your machine. I'm thinking I probably didn't configure the JSON file correctly. Where do I get the value to put in "storageIdentifier"?
Well, first you might want to see if scripts/api/data/dataset-package-files.json works for you.
That's the file I tried that's used in the guides.
Here's what the guides say about storage identifiers:
Before calling the API, make sure the data files referenced by the POSTed JSON are placed in the dataset directory with filenames matching their specified storage identifiers. In installations using POSIX storage, these files must be made readable by the app server user.
Which sounds right to me.
But your current problem, I think, is that your JSON isn't working.
So I would try scripts/api/data/dataset-package-files.json
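To make that placement step concrete, here's a rough sketch. The files directory, DOI path, and storage identifier are all made up (I'm using /tmp just so it runs anywhere); substitute your installation's real files directory:

```shell
# Assumed layout: <files dir>/<authority>/<identifier>/<storageIdentifier>
FILES_DIR=/tmp/dataverse-files/10.5072/FK2/ZTIQ5D
STORAGE_ID=162017e5ad5-ee2a2b17fee9

printf 'demo' > /tmp/pub.bin               # stand-in for your real data file
mkdir -p "$FILES_DIR"
cp /tmp/pub.bin "$FILES_DIR/$STORAGE_ID"   # on-disk name must match the storageIdentifier
chmod 644 "$FILES_DIR/$STORAGE_ID"         # readable by the app server user (POSIX storage)
```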
Thanks. Let me try dataset-package-files.json first.
Is there a place to download the file named "pub" for testing, or can I just use any file?
I checked my JSON file. It was based on dataset-package-files.json. The part I don't know how to set up is the "dataFile" section. In "storageIdentifier": "162017e5ad5-ee2a2b17fee9", do I give storageIdentifier a random value, or is it machine-generated? The instructions say "make sure the data files referenced by the POSTed JSON are placed in the dataset directory with filenames matching their specified storage identifiers". Do I rename my data file to a name like 162017e5ad5-ee2a2b17fee9?
This is the error message after running the curl command: {"status":"ERROR","message":"Validation Failed: Please specify a file name. (Invalid value:).java.util.stream.ReferencePipeline$3@55ac8b8e"}
In the JSON file you are telling Dataverse the name of the file on disk.
So you should use the storage identifier as the name (on disk).
I don't know where you keep your data but the filename could be something like this:
/usr/local/dataverse/data/10.5072/FK2/ZTIQ5D/18b23995b07-aee7a8fd551d
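If you need to mint a name like that yourself, my understanding (an assumption on my part, not something the guides spell out) is that the value mainly needs to be unique within the dataset; the examples just happen to look like a hex millisecond timestamp, a dash, and some random hex:

```shell
# Mint an identifier in the same shape as 18b23995b07-aee7a8fd551d
TIMESTAMP_HEX=$(printf '%x' $(( $(date +%s) * 1000 )))        # hex milliseconds
RANDOM_HEX=$(od -An -N6 -tx1 /dev/urandom | tr -d ' \n')      # 12 random hex chars
STORAGE_ID="$TIMESTAMP_HEX-$RANDOM_HEX"
echo "$STORAGE_ID"
```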
Thanks for the detailed explanation! I named the xlsx data file "19d611809e9-rumbolda6e05" and put it in this folder: /home/hansen/Documents/2Dataset_Migration/10.7266/N7XP72WT/. Below is the "files" section of my JSON. I am still getting the same error message.
"files": [
  {
    "description": "",
    "label": "",
    "restricted": false,
    "version": 1,
    "datasetVersionId": 1,
    "dataFile": {
      "id": 1001,
      "filename": "19d611809e9-rumbolda6e05",
      "contentType": "application/vnd.dataverse.file-package",
      "filesize": 239590,
      "description": "",
      "storageIdentifier": "/home/hansen/Documents/2Dataset_Migration/10.7266/N7XP72WT/19d611809e9-rumbolda6e05",
      "originalFormatLabel": "UNKNOWN",
      "rootDataFileId": -1,
      "checksum": {
        "type": "SHA-1",
        "value": "7e543d4cd13715aec38ba485267568442a7818fa"
      }
    }
  }
]
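For reference, I computed the checksum and filesize values for the JSON roughly like this (sha1sum is the GNU coreutils name; on macOS it's shasum -a 1; the sample file below is just a stand-in):

```shell
printf 'abc' > /tmp/sample.bin               # stand-in for the real xlsx file
sha1sum /tmp/sample.bin | awk '{print $1}'   # "checksum" value -> a9993e364706816aba3e25717850c26c9cd0d89d
wc -c < /tmp/sample.bin                      # "filesize" in bytes -> 3
```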
Why would it work for me and not for you? You did try scripts/api/data/dataset-package-files.json, right? And it didn't work?
I would like to try, but I don't have the pub file.
I tried using the JSON file to add metadata for datasets with existing DOIs. They all work fine and can be successfully published without the data files. But after adding the data files in the app, the datasets cannot be published. An error message says there were problems registering the datasets. Does it mean that we need to upload the metadata and data files together using one JSON file to be able to publish?
You don't need the pub file.
You can just test with that JSON.
Just to see if the JSON works.
It works.
Oh! Great! So now we need to figure out what's different between the two JSON files?
Using the same JSON, I can successfully add metadata. The error message showed up after adding the "files" section. Does curl automatically upload the data file from the local disk to the server, or do we need to upload the files first?
You need to upload the files first.
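So the order I'd expect is: first place each data file under the dataset's storage directory (named after its storageIdentifier), then POST the JSON with the import API. A sketch, with the token, server, collection alias, and PID all as placeholders; the command is echoed rather than run so it has no side effects:

```shell
API_TOKEN='xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx'   # placeholder
SERVER_URL='https://demo.dataverse.org'            # placeholder
PID='doi:10.7266/N7XP72WT'
IMPORT_URL="$SERVER_URL/api/dataverses/root/datasets/:import?pid=$PID&release=yes"

# Echoed, not executed -- drop the "echo" to actually run the import:
echo curl -H "X-Dataverse-key:$API_TOKEN" -X POST "$IMPORT_URL" --upload-file dataset.json
```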
But I'm confused. scripts/api/data/dataset-package-files.json does have a "files" section. You're saying it worked, right? You can see a file in Dataverse? Like in the screenshot I put above?
scripts/api/data/dataset-package-files.json worked for me. I can see a file in Dataverse.
Before doing this, I tried to add the metadata to Dataverse first, and then upload the data files in the app before publishing. But I kept getting the "unable to register" error in server.log.
Ok, that's a different error, related to publishing.
Have we moved on to that error, the publishing error? The first problem is resolved?
Based on what you explained, I think I misunderstood how dataset-package-files.json works. I thought it can upload files to the server.
Are we supposed to be able to upload data files in the app after adding the metadata using the JSON? I only see some cache files in the corresponding folder on the server after doing that.
I'm not sure, but I think people usually use the import API in one shot, with the complete dataset.
That is, I'm not sure if it works to keep the imported dataset in draft to add files or make metadata edits. It might work, but I'm not sure.
I will try and see if the registration error still shows up after uploading the data files to the server first and then using import API to publish.
I greatly appreciate your time and kind help!
Awesome. Good luck!
And feel free to open issues about the API not working the way you expect it to!
Got it. Thanks!
Hi Philip, I copied the file to the server and successfully published the dataset. But the file doesn't seem to be downloadable. Is there anything else that I am missing? Here is the link to the dataset: https://dataverse.fgcu.edu/dataset.xhtml?persistentId=doi:10.7266/N7XP72WT
{"status":"ERROR","code":404,"message":"Datafile 182: Failed to locate and/or open physical file."}
Hmm, it seems like the file is not in the right place on disk.
What does server.log say?
Let me check if it's a permission issue.
Also you're using this:
"contentType": "application/vnd.dataverse.file-package",
Instead you should use the right contentType for Excel.
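For an .xlsx file that would be the standard OOXML spreadsheet MIME type, i.e. something like:

```json
"contentType": "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet",
```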
Yes, that's something I was going to ask. Where do I find the contentType for different file types?
Here you go: https://github.com/IQSS/dataverse/blob/v6.3/src/main/java/propertyFiles/MimeTypeDisplay.properties
Thank you so much!
You saved my life again :grinning_face_with_smiling_eyes:
Ha. Sure! :blush:
Hi Philip, I tried to add a folder containing 200 tif images to a dataset. I have tried zipping it and double-zipping it, but the folder does not show up in Dataverse. The other xlsx file, added at the same time, showed up without any problems. Is there anything I am missing about how to handle folders in the JSON file?
Hmm, you could try tar.gz format instead of zip.
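Something like this (the folder and file names below are made up):

```shell
cd "$(mktemp -d)"                        # scratch directory so the sketch is self-contained
mkdir tif_images
printf 'demo' > tif_images/img001.tif    # stand-in for one of the 200 tifs
tar -czf tif_images.tar.gz tif_images    # package the folder as tar.gz instead of zip
tar -tzf tif_images.tar.gz               # list the archive contents to verify
```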
Okay. Thanks!
Last updated: Oct 30 2025 at 06:21 UTC