Stream: community

Topic: No Restricted Files / Access conditions


Alejandra Tenorio (Mar 20 2024 at 15:26):

Hi all,
Most of our datasets have no access restrictions; however, we have set up Terms of Use, and end users must accept them before they can download any files.
This works well, but if someone tries to download any file using the Dataverse API, he/she/it will be able to do so without agreeing to the Terms of Use.
Is there any way to restrict downloads through the Dataverse API under these conditions, i.e. no restricted files, only access conditions? I know it sounds strange, but it is a requirement we have.

Alejandra Tenorio (Mar 20 2024 at 15:28):

Maybe the API could allow downloads only for requests coming from a whitelist. Is that possible?

Philip Durbin 🚀 (Mar 20 2024 at 15:31):

I don't find it strange at all. Please see this issue: File API download bypasses terms of use #2911

Philip Durbin 🚀 (Mar 20 2024 at 15:31):

@Alejandra Tenorio if that issue expresses your concern, please feel free to leave a comment or at least give it a :thumbs_up:

Alejandra Tenorio (Mar 20 2024 at 16:07):

Hi @Philip Durbin, this is just our requirement. I see that this would be a new feature.

Alejandra Tenorio (Mar 20 2024 at 16:07):

Can we (CIMMYT) help in any way?

Philip Durbin 🚀 (Mar 20 2024 at 17:09):

@Alejandra Tenorio sure! Would you like to make a pull request?

Philip Durbin 🚀 (Mar 20 2024 at 17:09):

I had a great time at CIMMYT by the way. :heart:

Alejandra Tenorio (Mar 20 2024 at 18:02):

I will discuss it with Jesus; we could probably develop it and make a PR.

Alejandra Tenorio (Mar 20 2024 at 18:02):

Obviously, we would follow all your recommendations

Philip Durbin 🚀 (Mar 20 2024 at 18:52):

Yes, before a PR, can we have a description of how it might work?

Alejandra Tenorio (Mar 20 2024 at 19:09):

yes, sure

Philip Durbin 🚀 (Mar 20 2024 at 19:13):

Awesome. At minimum, the person using the API needs to have a chance to read the terms. I assume (hope!) this is already available via API.
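A minimal sketch of how an API user might read a dataset's terms before downloading. It assumes the native API lookup `GET /api/datasets/:persistentId/?persistentId=...` and assumes that custom terms appear under `data.latestVersion.termsOfUse`; datasets that use a standard license may expose a license object instead, so check the key against your Dataverse version. The helper names are hypothetical.

```python
import json
import urllib.parse
import urllib.request
from typing import Optional


def dataset_json_url(server: str, persistent_id: str) -> str:
    """Build the native API URL for a dataset looked up by persistent ID."""
    query = urllib.parse.urlencode({"persistentId": persistent_id})
    return f"{server.rstrip('/')}/api/datasets/:persistentId/?{query}"


def extract_terms(dataset_response: dict) -> Optional[str]:
    """Return custom terms of use from a dataset JSON response, if present.

    ASSUMPTION: custom terms live under data.latestVersion.termsOfUse.
    """
    version = dataset_response.get("data", {}).get("latestVersion", {})
    return version.get("termsOfUse")


def fetch_terms(server: str, persistent_id: str) -> Optional[str]:
    """Network call: download the dataset JSON and pull out the terms."""
    with urllib.request.urlopen(dataset_json_url(server, persistent_id)) as resp:
        return extract_terms(json.load(resp))
```

Keeping the parsing (`extract_terms`) separate from the HTTP call makes it easy to adapt if the terms turn out to live under a different key.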

Alejandra Tenorio (Apr 01 2024 at 22:20):

Hi @Philip Durbin

Alejandra Tenorio (Apr 01 2024 at 22:21):

We have worked on a proposal.

Alejandra Tenorio (Apr 01 2024 at 22:22):

We think that the file download could work as follows

Alejandra Tenorio (Apr 01 2024 at 22:23):

These are our assumptions:
- You can create an API token only if you have a user account on a Dataverse instance. At a minimum, we have each user's last name, first name, and email address, and ideally their affiliation.
- Using the API, anyone can download files with no access restrictions.
- If someone uses an API token, we can know which user is associated with that token, right?
- As a user, when you request access to a restricted datafile you must accept the Terms of Access for Restricted Files.
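To illustrate the third assumption: Dataverse's native API has `GET /api/users/:me`, which returns the account behind the token sent in the `X-Dataverse-key` header. The sketch below only builds that request and picks out the fields mentioned above. The helper names are hypothetical, and the JSON field names (`lastName`, `firstName`, `email`, `affiliation`) are assumptions to verify against your installation.

```python
import json
import urllib.request


def whoami_request(server: str, api_token: str) -> urllib.request.Request:
    """Build a request asking Dataverse which user owns `api_token`."""
    return urllib.request.Request(
        f"{server.rstrip('/')}/api/users/:me",
        headers={"X-Dataverse-key": api_token},
    )


def user_summary(me_response: dict) -> dict:
    """Keep only the user fields the proposal relies on.

    ASSUMPTION: these key names match the user JSON of your Dataverse version.
    """
    data = me_response.get("data", {})
    return {key: data.get(key) for key in ("lastName", "firstName", "email", "affiliation")}


def fetch_user(server: str, api_token: str) -> dict:
    """Network call: resolve the token to a user summary."""
    with urllib.request.urlopen(whoami_request(server, api_token)) as resp:
        return user_summary(json.load(resp))
```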

Alejandra Tenorio (Apr 01 2024 at 22:23):

File download:

Alejandra Tenorio (Apr 01 2024 at 22:25):

Terms of use:

Alejandra Tenorio (Apr 01 2024 at 22:26):

Guestbook:

Alejandra Tenorio (Apr 01 2024 at 22:27):

What do you think? Could these API changes work well?

Philip Durbin 🚀 (Apr 02 2024 at 14:02):

Hi! I'm packing for a trip (#community > Distribits 2024 ) so I can't engage very deeply in this, but I appreciate the proposal!

I'd like to get more eyes on it. :eyes: What do you think? What if you sent it to https://groups.google.com/g/dataverse-community or added it as a comment to #2911? Also, we'll need this for https://github.com/IQSS/dataverse-frontend some day so we could even open a new issue there.

I will say I'm left wondering a bit about the terms of use side. Sure, I could receive the terms as a text file (zipped or not), but then what? How do I agree to the terms?

Alejandra Tenorio (Apr 02 2024 at 16:05):

Hi @Philip Durbin, sure, I'm going to comment on #2911. You gave me a new idea about the terms.

Philip Durbin 🚀 (Apr 02 2024 at 17:18):

Thanks for adding that comment!

Juan Pablo Tosca Villanueva (Apr 03 2024 at 12:49):

Good morning everyone! 👋🏼 Just looking at this, I was wondering: what would you do if the Terms of Use change? Would you revoke access to the files until the terms are agreed to again? I am not sure how this works in the current workflow :thinking:

Philip Durbin 🚀 (Apr 03 2024 at 13:32):

Well, one thing we should be clear on is that there is no concept of saving a user's acceptance of the terms.

So @Alejandra Tenorio this applies as well to this part of your comment:

"Would the API download the file with its terms of use as a txt file? If the user has already accepted the terms of use, is it necessary?"

Instead, the terms are presented in a popup and the user clicks to agree. If they go to download the same file an hour later, they get the same thing: another popup.

Juan Pablo Tosca Villanueva (Apr 03 2024 at 13:50):

Not a legal expert here, but... Could it be a disclaimer on the token generation that says something along the lines of "By generating this token you accept the terms of use associated with any files that you try to download with it"?

Philip Durbin 🚀 (Apr 03 2024 at 13:52):

I doubt it. People even want private url users to accept terms of use: #8199

Alejandra Tenorio (Apr 03 2024 at 19:57):

Juan Pablo Tosca Villanueva said:

Not a legal expert here, but... Could it be a disclaimer on the token generation that says something along the lines of "By generating this token you accept the terms of use associated with any files that you try to download with it"?

This is a very important point: some datasets have special terms of use. If possible, users should accept the terms of use of each such dataset.

Alejandra Tenorio (Apr 03 2024 at 20:01):

Philip Durbin said:

I doubt it. People even want private url users to accept terms of use: #8199

Reviewing an unpublished dataset is the purpose of a Private URL. Should reviewers accept the terms? :thinking:

Philip Durbin 🚀 (Apr 04 2024 at 09:46):

I'm at a talk ( #community > Distribits 2024 ) about a system (OpenNeuro) that uses short-term JSON Web Tokens (JWTs) to allow access to private data.

Alejandra Tenorio (Apr 12 2024 at 18:13):

Hi @Philip Durbin , how are you doing? I am back to Zulip :sweat_smile: These are my comments:

  1. API user tries to download a file with terms. They get a text file instead. Yes!
  2. The text file has the URL to download the file. I guess my question is, do they have to parse the text to find the URL? Will this be easy to do? To keep it simple, it could be a short text with only the file title, its URL, and a note that its terms of use must be accepted before downloading.
  3. What happens to the existing download URL? It stops working? Now the user gets a text file instead? You mean the user interface? Nothing would change there.

Philip Durbin 🚀 (Apr 12 2024 at 18:15):

Can we talk more about 2 first? If an API user is receiving the file, perhaps that file should be in a machine-parseable format like JSON, YAML, or XML.

Alejandra Tenorio (Apr 12 2024 at 18:38):

Yes, it could be. We were thinking of a TXT file to be read by humans, because we have users sharing the API download URL. But yes, no problem, we suggest JSON.
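A sketch of what a machine-parseable notice could look like. Every field name here (`fileId`, `filename`, `termsOfUse`, `downloadUrl`, `message`) is hypothetical and not part of any existing Dataverse response; the point is that a client reads one key instead of scraping the URL out of free text.

```python
import json


def build_terms_notice(file_id: int, filename: str, terms: str, download_url: str) -> str:
    """Serialize a hypothetical 'terms must be accepted first' notice as JSON."""
    notice = {
        "fileId": file_id,
        "filename": filename,
        "termsOfUse": terms,
        "downloadUrl": download_url,
        "message": "The terms of use must be accepted before this file can be downloaded.",
    }
    # indent=2 keeps the notice readable for humans who open a shared URL
    return json.dumps(notice, indent=2)


def download_url_from_notice(body: str) -> str:
    """Client side: one key lookup instead of parsing free text."""
    return json.loads(body)["downloadUrl"]
```

Indented JSON stays readable for people who open a shared download URL in a browser, so the human-readable goal of the TXT idea is not lost.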

Alejandra Tenorio (Apr 12 2024 at 18:41):

We could map it to the existing JSON metadata file.

Philip Durbin 🚀 (Apr 12 2024 at 18:49):

Hmm, maybe we should switch to talking about URLs. :grinning:

I'd like to make this concrete. Can we use the file at https://data.cimmyt.org/file.xhtml?persistentId=hdl:11529/10549036/3 as an example? It's just the first file I found and it has terms of use. When I click "accept" I see the URL https://data.cimmyt.org/api/access/datafile/58697?gbrecs=true being used to download the file.

The web interface uses this URL. We want to preserve the same experience in the web interface, right? So maybe the web interface will (in the future, after your pull request) use a new URL?

And an API user will use a different URL to access the file? One that is protected and will give them a JSON file with a link to the actual file?

Can we make a sequence diagram like below? But less complicated! :sweat_smile: I can help! I usually use PlantUML for this.

make-data-count.png

Alejandra Tenorio (Apr 12 2024 at 19:15):

Yes, let's use that example. We intend to keep the same experience in the web interface. Also, an API user will use the same URL to access a file. When a user clicks "accept", the file will be downloaded from this URL: https://data.cimmyt.org/api/access/datafile/58697?gbrecs=true, but the web UI will send a token to the API to approve the download.
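One way the "web UI sends a token" step could work, loosely in the spirit of the short-lived JWTs mentioned earlier: the server signs the file id plus an expiry time, and the download endpoint checks the signature and the expiry. This sketch uses Python's standard `hmac` module rather than a real JWT library; the token format, the `SECRET` value, and the 5-minute lifetime are all assumptions.

```python
import base64
import hashlib
import hmac
import time
from typing import Optional

SECRET = b"change-me"  # hypothetical server-side secret; never sent to clients


def _sign(payload: str) -> str:
    """HMAC-SHA256 over the payload, URL-safe base64 encoded (no ':' chars)."""
    digest = hmac.new(SECRET, payload.encode(), hashlib.sha256).digest()
    return base64.urlsafe_b64encode(digest).decode()


def issue_token(file_id: int, ttl_seconds: int = 300, now: Optional[float] = None) -> str:
    """Issued after the user clicks "accept" in the web UI."""
    issued_at = time.time() if now is None else now
    payload = f"{file_id}:{int(issued_at + ttl_seconds)}"
    return f"{payload}:{_sign(payload)}"


def verify_token(token: str, file_id: int, now: Optional[float] = None) -> bool:
    """Checked by the download endpoint before serving the file."""
    parts = token.split(":")
    if len(parts) != 3:
        return False
    token_file, expires, signature = parts
    expected = _sign(f"{token_file}:{expires}")
    if not hmac.compare_digest(signature.encode(), expected.encode()):
        return False  # forged or tampered token
    current = time.time() if now is None else now
    return token_file == str(file_id) and current < int(expires)
```

Because nothing is stored server-side, this fits the current behaviour where acceptance is not saved: each accepted download simply gets a fresh, expiring token.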

Alejandra Tenorio (Apr 12 2024 at 19:16):

Sure, we can work with a diagram

Philip Durbin 🚀 (Apr 12 2024 at 19:19):

Ok, I think it's making sense. Have you used PlantUML before? Here's the source of that diagram: https://github.com/IQSS/dataverse/blob/v6.2/doc/sphinx-guides/source/admin/img/make-data-count.uml

Philip Durbin 🚀 (Apr 12 2024 at 19:20):

I use something like this to turn it into a PNG:

java -jar ~/bin/plantuml.jar -graphvizdot ~/.homebrew/bin/dot -tpng make-data-count.uml

Philip Durbin 🚀 (Apr 12 2024 at 19:38):

By the way, Dataverse developer Michael Bar-Sinai gave a very good talk about drawing before coding: https://www.mbarsinai.com/blog/2014/01/12/draw-more-work-less/ :grinning:

Philip Durbin 🚀 (Apr 12 2024 at 19:40):

Ha, he's also the guy who put "burrito" in Dataverse:

burrito.png

Alejandra Tenorio (Apr 12 2024 at 19:44):

Philip Durbin said:

Ok, I think it's making sense. Have you used PlantUML before? Here's the source of that diagram: https://github.com/IQSS/dataverse/blob/v6.2/doc/sphinx-guides/source/admin/img/make-data-count.uml

No, it will be the first time

Philip Durbin 🚀 (Apr 12 2024 at 19:44):

Ok, if you have any trouble, please let me know!

Alejandra Tenorio (Apr 12 2024 at 19:45):

Philip Durbin said:

Ok, if you have any trouble, please let me know!

Thank you very much, we are in touch

Philip Durbin 🚀 (Apr 12 2024 at 19:46):

I haven't used it in years but the latest version of PlantUML should work with that mdc.uml file, I hope!

Alejandra Tenorio (Apr 17 2024 at 19:55):

Hi @Philip Durbin , how are you doing? Here is the diagram
terms_of_use.png

Philip Durbin 🚀 (Apr 17 2024 at 19:55):

Wow! Amazing! :heart:

Alejandra Tenorio (Apr 17 2024 at 19:57):

Do you think it is time to share it in Google groups?

Philip Durbin 🚀 (Apr 17 2024 at 19:59):

Hmm, can you please share the source of the UML first? I might want to hack on it.

And I sent the pic to IQSS Slack. Let's wait for some feedback.

Philip Durbin 🚀 (Apr 17 2024 at 20:05):

Overall, I think it makes sense. As an API user, right now I get the actual file. In the future I would get a JSON file unless I send a valid token.
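The behaviour summarised above (the actual file with a valid token, a JSON notice otherwise) reduces to a small branch in the download endpoint. This is a hypothetical sketch, not Dataverse code: `handle_download`, the injected `verify` callable, and the notice fields are illustrative, and files without terms keep today's behaviour.

```python
import json
from typing import Callable, Optional, Tuple


def handle_download(
    file_id: int,
    token: Optional[str],
    has_terms: bool,
    verify: Callable[[str, int], bool],
    download_url: str,
) -> Tuple[str, str]:
    """Return (content_type, body) for a hypothetical protected download endpoint."""
    if not has_terms or (token is not None and verify(token, file_id)):
        # No terms, or terms accepted via a valid token: serve the file as today.
        # The string below is a placeholder for streaming the real file bytes.
        return ("application/octet-stream", f"<bytes of file {file_id}>")
    # Terms exist and no valid token was sent: explain instead of serving.
    notice = {
        "fileId": file_id,
        "downloadUrl": download_url,
        "message": "Accept the terms of use to obtain a download token.",
    }
    return ("application/json", json.dumps(notice))
```

Injecting `verify` keeps the branch independent of how tokens are actually issued, so it works with whichever token scheme the community settles on.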

Alejandra Tenorio (Apr 17 2024 at 20:07):

terms_of_use.uml

Alejandra Tenorio (Apr 17 2024 at 20:08):

Thank you, let's wait for comments :)

Alejandra Tenorio (Apr 24 2024 at 16:58):

Hi @Philip Durbin, how are you doing?
Just checking in about the comments from IQSS Slack.

Philip Durbin 🚀 (Apr 24 2024 at 18:29):

@Alejandra Tenorio hi! I just DM'ed you some feedback from Slack.

Philip Durbin 🚀 (Apr 25 2024 at 19:20):

@Alejandra Tenorio thanks for posting your proposal! https://groups.google.com/g/dataverse-community/c/pu6190IQwGo/m/8TC-DQQTBAAJ

Philip Durbin 🚀 (Apr 25 2024 at 19:23):

I just sent a link to our internal Slack as well.

Philip Durbin 🚀 (May 06 2024 at 15:18):

@Alejandra Tenorio here's an idea. We have "tech hours" every Tuesday at 3pm Boston time. Would you want to join tomorrow to discuss your proposal with the core dev team?

Alejandra Tenorio (May 07 2024 at 15:57):

Hi @Philip Durbin, yes, I would like to. Unfortunately, we have a system upgrade scheduled today; if I finish on time I will join, because I would like to talk about the proposal.

Philip Durbin 🚀 (May 07 2024 at 16:07):

@Alejandra Tenorio no problem! Is next week ok? This week we are talking about file uploads with React.

Alejandra Tenorio (May 09 2024 at 23:11):

Sorry for my late reply, @Philip Durbin. Yes, next Tuesday is OK.

Philip Durbin 🚀 (May 10 2024 at 00:05):

@Alejandra Tenorio great! Before then, I'll send you a Zoom link.

Philip Durbin 🚀 (May 13 2024 at 15:50):

@Alejandra Tenorio is it ok if I just DM you the link?

Alejandra Tenorio (May 13 2024 at 16:58):

Hi @Philip Durbin , yes, no problem

Philip Durbin 🚀 (May 13 2024 at 19:17):

Great, I just sent you a DM.

Philip Durbin 🚀 (May 15 2024 at 14:23):

@Alejandra Tenorio I hope that meeting yesterday was helpful.

Philip Durbin 🚀 (Aug 21 2024 at 17:56):

#2911 was just closed by a script (see #community > issue backlog )

Please feel free to open a fresh issue.


Last updated: Nov 01 2025 at 14:11 UTC