Hi all,
Most of our datasets have no access restrictions; however, we have set up Terms of Use, and end users must accept them before downloading any files.
This works well, but if someone downloads a file using the Dataverse API, they can do so without agreeing to the Terms of Use.
Is there any way to restrict downloads through the Dataverse API under these conditions, i.e., without using Restricted Files or access conditions? I know it sounds strange, but it is a requirement we have.
Would it be possible, for example, for the API to only allow downloads for requests coming from a whitelist?
I don't find it strange at all. Please see this issue: File API download bypasses terms of use #2911
@Alejandra Tenorio if that issue expresses your concern, please feel free to leave a comment or at least give it a :thumbs_up:
Hi @Philip Durbin, this is exactly our requirement. I see that this would be a new feature.
Can we (CIMMYT) help in any way?
@Alejandra Tenorio sure! Would you like to make a pull request?
I had a great time at CIMMYT by the way. :heart:
I will discuss it with Jesus; we could probably develop it and make a PR.
Obviously, we would follow all your recommendations.
Yes, before a PR, can we have a description of how it might work?
yes, sure
Awesome. At minimum, the person using the API needs to have a chance to read the terms. I assume (hope!) this is already available via API.
Hi @Philip Durbin
we have worked on a proposal.
We think that file download could work as follows.
These are our assumptions:
- You can create an API token only if you have a user account on a Dataverse instance. At a minimum, we have each user's last name, first name and email address, and ideally their affiliation.
- Using the API, anyone can download files with no access restrictions.
- If someone uses an API token, we can identify the user associated with that token, right?
- As a user, when you request access to a restricted datafile you must accept the Terms of Access for Restricted Files.
File download:
Terms of use:
- If the dataset has no Terms of Use and the datafile has no access restrictions:
  - No changes.
- If the dataset has no Terms of Use and the datafile has access restrictions:
  - No changes; an API token is required.
- If the dataset has Terms of Use and the datafile has no access restrictions:
  - The API would download the file together with its Terms of Use as a .txt file. Could they be compressed into a zip?
- If the dataset has Terms of Use and the datafile has access restrictions:
  - An API token is required.
  - The API would download the file together with its Terms of Use as a .txt file. Could they be compressed into a zip?
Guestbook:
- If the dataset has no guestbook and the datafile has no access restrictions:
  - No changes.
- If the dataset has no guestbook and the datafile has access restrictions:
  - No changes; an API token is required.
- If the dataset has a guestbook and the datafile has no access restrictions:
  - A token will always be required, and the API would create a GuestbookResponse row with the user's first name, last name and email.
- If the dataset has a guestbook and the datafile has access restrictions:
  - A token will always be required, and the API would create a GuestbookResponse row with the user's first name, last name and email.
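To check the Terms of use and Guestbook rules above for gaps, here is a minimal Python sketch that collapses both tables into one rule. It is a hypothetical model, not Dataverse code: `User`, `DownloadPlan`, `plan_download` and `download` are made-up names, and it does not model whether the token's user is actually permitted to access a restricted file.

```python
from dataclasses import dataclass
from typing import Any, Dict, Optional


@dataclass(frozen=True)
class User:
    # What the proposal assumes is known about every API token holder.
    first_name: str
    last_name: str
    email: str
    affiliation: Optional[str] = None  # "desirable" but not guaranteed


@dataclass(frozen=True)
class DownloadPlan:
    token_required: bool  # the caller must send a valid API token
    include_terms: bool   # ship the Terms of Use (e.g. a .txt) with the file
    guestbook_row: bool   # create a GuestbookResponse for the token's user


def plan_download(has_terms: bool, restricted: bool, has_guestbook: bool) -> DownloadPlan:
    """Both proposal tables (Terms of Use and Guestbook) as a single rule."""
    return DownloadPlan(
        # Restricted files already need a token; with a guestbook the token is
        # also needed, because the GuestbookResponse is filled from its user.
        token_required=restricted or has_guestbook,
        include_terms=has_terms,
        guestbook_row=has_guestbook,
    )


def download(plan: DownloadPlan, user: Optional[User]) -> Dict[str, Any]:
    """Hypothetical API response; access permissions are not modeled."""
    if plan.token_required and user is None:
        return {"status": 401, "error": "API token required"}
    response: Dict[str, Any] = {"status": 200, "files": ["datafile"]}
    if plan.include_terms:
        # The proposal asks whether the file and terms could be zipped together.
        response["files"].append("terms_of_use.txt")
    if plan.guestbook_row and user is not None:
        response["guestbook_response"] = {
            "firstName": user.first_name,
            "lastName": user.last_name,
            "email": user.email,
        }
    return response


if __name__ == "__main__":
    # Dataset with Terms of Use, unrestricted file, no guestbook, no token.
    print(download(plan_download(True, False, False), None))
    # {'status': 200, 'files': ['datafile', 'terms_of_use.txt']}
```

Written this way, the two tables turn out to be independent: a token is needed whenever the file is restricted or the dataset has a guestbook, and the terms travel with the file whenever they exist. In the "terms, no restriction, no guestbook" case nobody is identified, so nothing records who received the terms.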
What do you think? Could these API changes work well?
Hi! I'm packing for a trip (#community > Distribits 2024 ) so I can't engage very deeply in this, but I appreciate the proposal!
I'd like to get more eyes on it. :eyes: What do you think? What if you sent it to https://groups.google.com/g/dataverse-community or added it as a comment to #2911? Also, we'll need this for https://github.com/IQSS/dataverse-frontend some day so we could even open a new issue there.
I will say I'm left wondering a bit about the terms of use side. Sure, I could receive the terms as a text file (zipped or not), but then what? How do I agree to the terms?
Hi @Philip Durbin, sure, I'm going to comment on #2911. You gave me a new idea about the terms.
Thanks for adding that comment!
Good morning everyone! :wave: I was just looking at this and wondering: what would you do if the Terms of Use change? Would you revoke access to the files until the new terms are accepted? I am not sure how this works in the current workflow :thinking:
Well, one thing we should be clear on is that there is no concept of saving a user's acceptance of the terms.
So @Alejandra Tenorio this applies as well to this part of your comment:
"Would the API download the file with its terms of use as a txt file? If the user has already accepted the terms of use, is it necessary?"
Instead, the terms are presented in a popup and the user clicks to agree. If they download the same file an hour later, they get the same thing: another popup.
Not a legal expert here, but... Could there be a disclaimer on token generation that says something along the lines of "By generating this token you accept the terms of use associated with any files that you try to download with it"?
I doubt it. People even want Private URL users to accept terms of use: #8199
Juan Pablo Tosca Villanueva said:
Not a legal expert here, but... Could there be a disclaimer on token generation that says something along the lines of "By generating this token you accept the terms of use associated with any files that you try to download with it"?
This is a very important point: some datasets have special terms of use. If possible, users should accept the terms of use of each dataset.
Philip Durbin said:
I doubt it. People even want Private URL users to accept terms of use: #8199
Reviewing an unpublished dataset is the purpose of a Private URL. Should reviewers accept the terms? :thinking:
I'm at a talk ( #community > Distribits 2024 ) about a system (OpenNeuro) that uses short-term JSON Web Tokens (JWTs) to allow access to private data.
Hi @Philip Durbin, how are you doing? I am back on Zulip :sweat_smile: These are my comments:
Can we talk more about 2 first? If an API user is receiving the file, perhaps that file should be in a machine-parseable format like JSON, YAML, or XML.
Yes, it could be. We were thinking of a .txt file to be read by humans, because we have users sharing the API download URL. But yes, no problem; we suggest JSON.
We could map it to the existing JSON metadata file.
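Mapping the terms from the existing JSON metadata might look roughly like this. It is a sketch that assumes the dataset's native JSON carries `termsOfUse` and `license` under `data.latestVersion`; those field names should be checked against a real export before relying on them, and `terms_from_dataset_json` and `terms_file` are hypothetical helpers.

```python
import json
from typing import Any, Dict


def terms_from_dataset_json(dataset_json: Dict[str, Any]) -> Dict[str, Any]:
    """Extract the terms from a dataset's native JSON (field names assumed)."""
    version = dataset_json.get("data", {}).get("latestVersion", {})
    license_info = version.get("license") or {}
    return {
        # Custom terms, when the dataset uses them instead of a standard license.
        "termsOfUse": version.get("termsOfUse"),
        # Name of a standard license (e.g. CC0 1.0), when one is set.
        "license": license_info.get("name"),
    }


def terms_file(dataset_json: Dict[str, Any]) -> str:
    """The machine-readable file an API user would receive instead of a .txt."""
    return json.dumps(terms_from_dataset_json(dataset_json), indent=2)


if __name__ == "__main__":
    sample = {"data": {"latestVersion": {"termsOfUse": "Cite the dataset."}}}
    print(terms_file(sample))
```

Reusing the metadata JSON rather than inventing a new format means the terms an API user receives are the same ones shown in the web popup.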
Hmm, maybe we should switch to talking about URLs. :grinning:
I'd like to make this concrete. Can we use the file at https://data.cimmyt.org/file.xhtml?persistentId=hdl:11529/10549036/3 as an example? It's just the first file I found and it has terms of use. When I click "accept" I see the URL https://data.cimmyt.org/api/access/datafile/58697?gbrecs=true being used to download the file.
The web interface uses this URL. We want to preserve the same experience in the web interface, right? So maybe the web interface will (in the future, after your pull request) use a new URL?
And an API user will use a different URL to access the file? One that is protected and will give them a JSON file with a link to the actual file?
Can we make a sequence diagram like below? But less complicated! :sweat_smile: I can help! I usually use PlantUML for this.
Yes, let's use that example. We intend to keep the experience in the web interface. Moreover, an API user will use the same URL to access a file. When a user clicks "accept", the file will be downloaded from this URL: https://data.cimmyt.org/api/access/datafile/58697?gbrecs=true, but the web UI will send a token to the API to approve the download.
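A minimal sketch of the "same URL" idea, assuming the token travels in the `X-Dataverse-key` header that Dataverse API tokens already use. The handler name `access_datafile`, the `valid_tokens` set and the JSON keys are hypothetical, restricted-file permissions are not modeled, and a real implementation would live in Dataverse's Java access API rather than in Python.

```python
import json
from typing import Dict, Optional, Set, Tuple


def access_datafile(
    file_id: int,
    terms_of_use: Optional[str],
    headers: Dict[str, str],
    valid_tokens: Set[str],
) -> Tuple[str, str]:
    """Hypothetical handler for GET /api/access/datafile/{file_id}.

    The web UI and API users hit the same URL; only the token differs.
    Returns (content_type, body).
    """
    # A real server would match header names case-insensitively.
    token = headers.get("X-Dataverse-key")
    if terms_of_use is None or (token is not None and token in valid_tokens):
        # No terms to accept, or a valid token was sent (by the web UI after
        # the user clicks "accept", or by an API user): return the file itself.
        return "application/octet-stream", f"<contents of datafile {file_id}>"
    # Otherwise return the terms plus the URL to repeat with a token.
    body = json.dumps({
        "termsOfUse": terms_of_use,
        "download": f"/api/access/datafile/{file_id}",
        "hint": "Repeat the request with an X-Dataverse-key header.",
    })
    return "application/json", body


if __name__ == "__main__":
    print(access_datafile(58697, "Cite the dataset.", {}, {"abc123"})[0])
    # application/json
    print(access_datafile(58697, "Cite the dataset.", {"X-Dataverse-key": "abc123"}, {"abc123"})[0])
    # application/octet-stream
```

Note that this treats "sent a valid token" as "accepted the terms", which is exactly the open question raised earlier in the thread about how an API user actually agrees.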
Sure, we can work with a diagram
Ok, I think it's making sense. Have you used PlantUML before? Here's the source of that diagram: https://github.com/IQSS/dataverse/blob/v6.2/doc/sphinx-guides/source/admin/img/make-data-count.uml
I use something like this to turn it into a PNG:
```
java -jar ~/bin/plantuml.jar -graphvizdot ~/.homebrew/bin/dot -tpng make-data-count.uml
```
By the way, Dataverse developer Michael Bar-Sinai gave a very good talk about drawing before coding: https://www.mbarsinai.com/blog/2014/01/12/draw-more-work-less/ :grinning:
Ha, he's also the guy who put "burrito" in Dataverse:
Philip Durbin said:
Ok, I think it's making sense. Have you used PlantUML before? Here's the source of that diagram: https://github.com/IQSS/dataverse/blob/v6.2/doc/sphinx-guides/source/admin/img/make-data-count.uml
No, it will be the first time
Ok, if you have any trouble, please let me know!
Philip Durbin said:
Ok, if you have any trouble, please let me know!
Thank you very much, we'll be in touch.
I haven't used it in years, but the latest version of PlantUML should work with that make-data-count.uml file, I hope!
Hi @Philip Durbin, how are you doing? Here is the diagram:
terms_of_use.png
Wow! Amazing! :heart:
Do you think it is time to share it on the Google Group?
Hmm, can you please share the source of the UML first? I might want to hack on it.
I also sent the pic to the IQSS Slack. Let's wait for some feedback.
Overall, I think it makes sense. As an API user, right now I get the actual file. In the future I would get a JSON file unless I send a valid token.
Thank you, let's wait for comments :)
Hi @Philip Durbin, how are you doing?
Just wanted to ask about the comments on IQSS Slack.
@Alejandra Tenorio hi! I just DM'ed you some feedback from Slack.
@Alejandra Tenorio thanks for posting your proposal! https://groups.google.com/g/dataverse-community/c/pu6190IQwGo/m/8TC-DQQTBAAJ
I just sent a link to our internal Slack as well.
@Alejandra Tenorio here's an idea. We have "tech hours" every Tuesday at 3pm Boston time. Would you want to join tomorrow to discuss your proposal with the core dev team?
Hi @Philip Durbin, yes, I would like to. Unfortunately, we have a system upgrade scheduled today; if I finish on time I will join, because I would like to talk about the proposal.
@Alejandra Tenorio no problem! Is next week ok? This week we are talking about file uploads with React.
Sorry for my late reply, @Philip Durbin. Yes, next Tuesday is OK.
@Alejandra Tenorio great! Before then, I'll send you a Zoom link.
@Alejandra Tenorio is it ok if I just DM you the link?
Hi @Philip Durbin, yes, no problem.
Great, I just sent you a DM.
@Alejandra Tenorio I hope that meeting yesterday was helpful.
#2911 was just closed by a script (see #community > issue backlog )
Please feel free to open a fresh issue.
Last updated: Nov 01 2025 at 14:11 UTC