Stream: python

Topic: Changes in the files API endpoint for replacing files?


view this post on Zulip Adina Wagner (Oct 24 2024 at 07:06):

Hi all! :) Sorry if this is in the wrong channel - feel free to point to a better fitting one!
I noticed that two of datalad-dataverse's tests against demo.dataverse.org started to fail 24 days ago, i.e., when Dataverse 6.4 was released. There were no changes in pyDataverse or DataLad-dataverse, and the failing request (now a 400 Bad Request) previously succeeded, so I suspect there was an API change that we or pyDataverse need to account for:

E       httpx.HTTPStatusError: Client error '400 Bad Request' for url 'https://demo.dataverse.org/api/v1/files/2422933/replace'

I read through the changelog, diff'ed the Native API's "Replacing files" user guide between the current and previous version, and browsed the milestone PRs, but did not spot anything that looked like a change in this API. Does someone know where I'd need to look?

Many thanks in advance!

view this post on Zulip Sebastian Höffner (Oct 24 2024 at 08:06):

You have come to the right place, the only other place I can think of is the issue tracker ;-)

There were no significant changes in recent years except a seemingly innocent merge from remote 8 months ago, which changed the response status enum but apparently nothing behavior-wise. There was also a commit 4 months ago that added OpenAPI specs, which I would not expect to change the behavior either. Apart from those, the changes are more than a year old (mostly from 8 years ago), and if I diff 6.3 against 6.4, there seem to be no changes to Files.java.

Do you have a status message for the error so we could narrow down the cause? Otherwise we would need to figure out which string might not be set in the bundle, if another handler already returns the error, or if the AddReplaceFileHelper throws for some reason.

Is it possible to show us which request (including request headers minus auth) is failing or create a failing example?

view this post on Zulip Philip Durbin 🚀 (Oct 24 2024 at 11:04):

Hmm, yes, 24 days ago we released Dataverse 6.4 and deployed it to https://demo.dataverse.org

view this post on Zulip Philip Durbin 🚀 (Oct 24 2024 at 11:08):

What is file id 2422933? I'm also getting a 404 at https://demo.dataverse.org/file.xhtml?fileId=2422933

Was the file simply deleted? :thinking:

view this post on Zulip Philip Durbin 🚀 (Oct 24 2024 at 11:13):

Like we say on the demo homepage, "Datasets older than 30 days will be deleted at 5am EST on the first day of each month." That would have been the day after the 6.4 release.

view this post on Zulip Adina Wagner (Oct 28 2024 at 10:27):

Apologies for the late response, I missed your responses (I probably disabled email notifications or something - will check)!

Just very briefly, here's what gets sent in the request of a minimal test that started to fail (taken from a Python debugger within pyDataverse, sorry for the odd formatting):

(Pdb) p kwargs
{'url': 'https://demo.dataverse.org/api/v1/files/2431409/replace',
'headers': {'User-Agent': 'pydataverse'},
'params': None, 'files': {'file': <_io.BufferedReader name='/tmp/pytest-of-adina/pytest-3/test_file_handling0/replace_source.txt'>},
'data': {'jsonData': '{\n  "label": "replace_source.txt",\n  "directoryLabel": "downstairs",\n  "pid": "doi:10.70122/FK2/KJM2V2",\n  "filename": "replace_source.txt"\n}'}}

The message is "Failed to add file to dataset"

(Pdb) res = method(**kwargs, auth=self.auth, follow_redirects=True, timeout=None)
(Pdb) p res
<Response [400 Bad Request]>
(Pdb) p res.json()
{'status': 'ERROR', 'message': 'Failed to add file to dataset.'}

The tests set up datasets in demo.dataverse.org, but they also clean up after themselves, so a file ID not being on demo.dataverse.org is expected after the test ends.

If it helps, I'll check if I can show this error with curl only?
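
For anyone who wants to reproduce this outside the test suite, the request shown above can be sketched roughly as follows. This is only an illustration: `build_replace_request` and `send_replace_request` are hypothetical helper names, passing the token in an `X-Dataverse-key` header is an assumption (the debugger output shows pyDataverse passing an `auth` object instead), and the metadata values are copied from the debugger output.

```python
import json


def build_replace_request(base_url, file_id, metadata):
    """Assemble the URL and form fields for a Native API file replacement."""
    return {
        "url": f"{base_url.rstrip('/')}/api/v1/files/{file_id}/replace",
        # The file metadata travels as a JSON string in the 'jsonData'
        # form field, next to the multipart 'file' part.
        "data": {"jsonData": json.dumps(metadata, indent=2)},
    }


def send_replace_request(request, path, api_token):
    """Send the request with httpx (imported lazily; needs network access)."""
    import httpx

    with open(path, "rb") as fh:
        return httpx.post(
            request["url"],
            data=request["data"],
            files={"file": fh},
            # Assumption: token passed as a header rather than via `auth`.
            headers={"X-Dataverse-key": api_token},
            follow_redirects=True,
            timeout=None,
        )


request = build_replace_request(
    "https://demo.dataverse.org",
    2431409,
    {
        "label": "replace_source.txt",
        "directoryLabel": "downstairs",
        "pid": "doi:10.70122/FK2/KJM2V2",
        "filename": "replace_source.txt",
    },
)
print(request["url"])
```

Calling `send_replace_request(request, "replace_source.txt", token)` and printing `response.json()` should then surface the same `status` and `message` fields as in the debugger session, provided the file ID still exists.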

view this post on Zulip Philip Durbin 🚀 (Oct 28 2024 at 12:17):

Ok, so from looking at https://demo.dataverse.org/file.xhtml?fileId=2431409 (I had to log in as a superuser since it's unpublished), you're trying to replace a text file called "mykey" that has "some_content" in it. That should work. :thinking:

view this post on Zulip Philip Durbin 🚀 (Oct 28 2024 at 12:18):

"replace_source.txt" is also a text file with a little bit of text in it?

view this post on Zulip Philip Durbin 🚀 (Oct 28 2024 at 12:19):

"Failed to add file to dataset" is not a very helpful error. :doh: Why? Why?

view this post on Zulip Philip Durbin 🚀 (Oct 28 2024 at 12:22):

I was hoping our server.log file on the demo server would have more details but I don't see the file id (2431409) anywhere.

view this post on Zulip Don Sizemore (Oct 28 2024 at 12:49):

the error is:

Local Exception Stack:
Exception [EclipseLink-4002] (Eclipse Persistence Services - 4.0.1.payara-p2.v202310250827): org.eclipse.persistence.exceptions.DatabaseException
Internal Exception: org.postgresql.util.PSQLException: ERROR: duplicate key value violates unique constraint "unq_dvobject_0"
  Detail: Key (authority, protocol, identifier)=(10.70122, doi, FK2/KJM2V2/VNI8PM) already exists.
Error Code: 0
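
To illustrate the class of error in that stack trace, here is a minimal sketch of a unique constraint over (authority, protocol, identifier) rejecting a second row with the same PID. It uses an in-memory SQLite database and a toy `dvobject` table as stand-ins; neither is Dataverse's actual schema or code, and the sketch says nothing about why Dataverse attempted the second insert.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE dvobject (
        id INTEGER PRIMARY KEY,
        authority TEXT,
        protocol TEXT,
        identifier TEXT,
        UNIQUE (authority, protocol, identifier)
    )
    """
)

# PID components taken from the error detail in the log above.
pid = ("10.70122", "doi", "FK2/KJM2V2/VNI8PM")
insert = "INSERT INTO dvobject (authority, protocol, identifier) VALUES (?, ?, ?)"

conn.execute(insert, pid)  # the first row with this PID is accepted

try:
    conn.execute(insert, pid)  # a second row with the same PID is rejected
    duplicate_rejected = False
except sqlite3.IntegrityError as exc:
    duplicate_rejected = True
    print(f"rejected: {exc}")

row_count = conn.execute("SELECT COUNT(*) FROM dvobject").fetchone()[0]
```

The point is only that the database, not the API layer, raises the failure, which would be consistent with the API returning a generic message.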

view this post on Zulip Philip Durbin 🚀 (Oct 28 2024 at 13:04):

Ah, thanks, @Don Sizemore. Interesting.

view this post on Zulip Philip Durbin 🚀 (Oct 28 2024 at 13:06):

"Pre-Publish File DOI Reservation" is new as of 6.4: https://github.com/IQSS/dataverse/releases/tag/v6.4

view this post on Zulip Philip Durbin 🚀 (Oct 28 2024 at 13:09):

This PR: Reserve File Pids #7334

view this post on Zulip Philip Durbin 🚀 (Oct 28 2024 at 13:55):

@Adina Wagner since you offered, I'm interested if you can reproduce it with curl on https://demo.dataverse.org . Here are the docs to get you started: https://guides.dataverse.org/en/6.4/api/native-api.html#replacing-files

If it fails with curl, it's definitely a bug and you'd be very welcome to create an issue about this at https://github.com/IQSS/dataverse/issues

view this post on Zulip Don Sizemore (Oct 28 2024 at 14:24):

@Adina Wagner I've reproduced this on demo.dataverse.org - would you like for me to open an issue, or would you prefer to?

view this post on Zulip Adina Wagner (Oct 28 2024 at 14:33):

done: https://github.com/IQSS/dataverse/issues/10975

view this post on Zulip Adina Wagner (Oct 28 2024 at 14:34):

please add anything I've missed :)

view this post on Zulip Philip Durbin 🚀 (Oct 28 2024 at 14:40):

@Adina Wagner excellent bug report! Thank you!

view this post on Zulip Philip Durbin 🚀 (Oct 28 2024 at 14:55):

@Adina Wagner here's a thought as a potential workaround for you.

What if you use https://guides.dataverse.org/en/6.4/api/native-api.html#change-collection-attributes to change filePIDsEnabled to false for the collection you're using. This one, currently: https://demo.dataverse.org/dataverse/dv-7194db59-950f-11ef-8832-a002a54c8dc6
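
A rough sketch of what that call could look like, assuming the endpoint has the shape `PUT /api/dataverses/{alias}/attribute/{attribute}?value={value}`; please check the linked guide before relying on it. `attribute_url` is a hypothetical helper that only builds the URL, and the alias is taken from the collection link above.

```python
from urllib.parse import quote, urlencode


def attribute_url(server_url, collection_alias, attribute, value):
    """Build the URL for changing one collection attribute (assumed shape)."""
    return (
        f"{server_url.rstrip('/')}/api/dataverses/{quote(collection_alias, safe='')}"
        f"/attribute/{quote(attribute, safe='')}?{urlencode({'value': value})}"
    )


url = attribute_url(
    "https://demo.dataverse.org",
    "dv-7194db59-950f-11ef-8832-a002a54c8dc6",
    "filePIDsEnabled",
    "false",
)
print(url)
```

The URL would then be sent as a PUT request with an API token that is allowed to change this attribute.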

view this post on Zulip Adina Wagner (Oct 28 2024 at 15:22):

Thanks for the thought! I don't think that the workaround is doable for us, though: users of datalad-dataverse will very likely not have superuser privileges. Even if they did, I would be hesitant to apply such a setting internally instead of having users do it explicitly themselves.

view this post on Zulip Philip Durbin 🚀 (Oct 28 2024 at 15:37):

That makes sense. I was thinking about any automated testing that might be failing on our end, if you want to get it passing again with a workaround. I agree it's a different story with end users.

view this post on Zulip Kai König (Dec 14 2024 at 11:21):

Adina Wagner wrote:

(Pdb) res = method(**kwargs, auth=self.auth, follow_redirects=True, timeout=None)
(Pdb) p res
<Response [400 Bad Request]>
(Pdb) p res.json()
{'status': 'ERROR', 'message': 'Failed to add file to dataset.'}

I had the same error message when trying to upload a tar.gz: #community > uploading .tar.gz files via API

Insights into the server logs would help me out a lot

view this post on Zulip Jan Range (Dec 16 2024 at 15:53):

@Kai König would you mind posting the code here?

view this post on Zulip Philip Durbin 🚀 (Feb 10 2025 at 20:40):

@Adina Wagner last week we merged #10979 to fix the issue you opened: #10975

view this post on Zulip Philip Durbin 🚀 (Feb 10 2025 at 20:40):

It'll be part of Dataverse 6.6. For details on the timeline, please see #community > Release 6.6 Timeline


Last updated: Nov 01 2025 at 14:11 UTC