Stream: containers

Topic: testing S3


view this post on Zulip Philip Durbin 🚀 (Apr 26 2023 at 14:16):

As part of https://github.com/IQSS/dataverse/pull/9541 I need to test S3.

I've always tested S3 with Payara running directly on my Mac. :grimacing:

It should be possible to supply all the necessary S3 config via MPCONFIG, right? That way I can continue my streak of doing all development with containers. :happy: It's been over a month! :tada:

I'd rather not switch back to Payara running directly on my Mac if I can avoid it!

view this post on Zulip Oliver Bertuch (Apr 26 2023 at 14:28):

Yes, it is possible, but only in a dirty way, because the files part does not support MPCONFIG yet. So you need to use the "dataverse_files_xxx_option=..." env var hack. Mind the casing.

view this post on Zulip Philip Durbin 🚀 (Apr 26 2023 at 14:30):

Interesting. Do you have a list of env vars handy I can start with? If not, I'll look at the guides.

view this post on Zulip Oliver Bertuch (Apr 26 2023 at 14:32):

Just take the example from the startup script as a template :smile:

view this post on Zulip Philip Durbin 🚀 (Apr 26 2023 at 14:33):

Sorry, which startup script?

view this post on Zulip Oliver Bertuch (Apr 26 2023 at 14:34):

https://github.com/IQSS/dataverse/blob/4903e9f0277105ea6a8c59a2f962dac8bcf715f2/src/main/docker/scripts/init_2_configure.sh#L18-L22

view this post on Zulip Philip Durbin 🚀 (Apr 26 2023 at 14:35):

Nice. Thanks. I'll probably add some docs for this!

view this post on Zulip Oliver Bertuch (Apr 26 2023 at 14:36):

Sure. Don't put too much effort into it - should go away anyway with making the config of storage via MPCONFIG available. Someone needs to code that...

view this post on Zulip Philip Durbin 🚀 (Apr 26 2023 at 14:36):

Right. There's a comment about https://github.com/IQSS/dataverse/issues/7000 but no PR yet, right?

view this post on Zulip Philip Durbin 🚀 (Apr 26 2023 at 14:37):

Am I right that we don't plan to create a PR until milestone G? "MPCONFIG for remaining dozen JVM settings (Size: 80)"

view this post on Zulip Oliver Bertuch (Apr 26 2023 at 14:38):

No, that's not correct.

view this post on Zulip Philip Durbin 🚀 (Apr 26 2023 at 14:38):

Oh, wait. Milestone D. "Configurability: Make storage configuration use MPCONFIG (Size: 33)"

view this post on Zulip Oliver Bertuch (Apr 26 2023 at 14:38):

There are some config things in milestones along the way

view this post on Zulip Oliver Bertuch (Apr 26 2023 at 14:38):

So all I have for now is a local branch and a stash of changes on my laptop for the work on the file storage thing

view this post on Zulip Philip Durbin 🚀 (Apr 26 2023 at 14:39):

Awesome. Want me to create an issue?

view this post on Zulip Oliver Bertuch (Apr 26 2023 at 14:39):

There was some discussion a while ago during some tech hour about if it would be OK to require a list of storage names which IIRC was eventually put on hold with "hmm yeah dunno maybe fine lets see"

view this post on Zulip Oliver Bertuch (Apr 26 2023 at 14:40):

Well there is 7000, but maybe we can close that and create issues for the remaining parts

view this post on Zulip Philip Durbin 🚀 (Apr 26 2023 at 14:40):

Yeah, smaller chunks, I'm thinking.

view this post on Zulip Oliver Bertuch (Apr 26 2023 at 14:40):

(That would be mail, which has an issue, storage and "the leftovers")

view this post on Zulip Philip Durbin 🚀 (Apr 26 2023 at 14:40):

yum

view this post on Zulip Oliver Bertuch (Apr 26 2023 at 14:41):

I've been going on with this for a while... :innocent:

view this post on Zulip Oliver Bertuch (Apr 26 2023 at 14:42):

Glad that it seems to be appreciated now among the devs... That lookup thing seems to be intriguing... :grinning_face_with_smiling_eyes:

view this post on Zulip Philip Durbin 🚀 (Apr 26 2023 at 14:43):

Well, now I need it. The ability to configure S3 in containers. :happy:

view this post on Zulip Oliver Bertuch (Apr 26 2023 at 14:44):

Hrhr... I remember we talked about this stuff a long time ago and one of my arguments was "you want this for containers" :yum:

view this post on Zulip Oliver Bertuch (Apr 26 2023 at 14:45):

Ah BTW do you want to test localstack for S3?

view this post on Zulip Oliver Bertuch (Apr 26 2023 at 14:45):

Or are you going to use Minio?

view this post on Zulip Oliver Bertuch (Apr 26 2023 at 14:45):

(Or Seaweedfs)

view this post on Zulip Oliver Bertuch (Apr 26 2023 at 14:46):

IIRC this is part of another milestone, would it make sense to take a look into how to spin up the storage container, too?

view this post on Zulip Philip Durbin 🚀 (Apr 26 2023 at 14:46):

I'm going to use real S3. It's possible to configure it with the hacks above, right?

view this post on Zulip Oliver Bertuch (Apr 26 2023 at 14:46):

Yes, that's completely possible

view this post on Zulip Oliver Bertuch (Apr 26 2023 at 14:47):

Oh wait: one thing... What I already had a chance to bring in for MPCONFIG and S3 is the keys thing

view this post on Zulip Oliver Bertuch (Apr 26 2023 at 14:47):

So you don't need to think about how to deal with the AWS profile and config files in ~/.aws

view this post on Zulip Oliver Bertuch (Apr 26 2023 at 14:47):

That is documented in the main section, it's been sitting there for a while and made @Don Sizemore happy before

view this post on Zulip Philip Durbin 🚀 (May 01 2023 at 19:07):

I started editing my .env file today but got so confused I gave up and just ran Payara locally on my Mac. These are the (surely incorrect) values I was editing when I gave up:

DATAVERSE_FILES_STORAGE__DRIVER__ID=s3
DATAVERSE_FILES_LOCAL_TYPE=s3
DATAVERSE_FILES_LOCAL_LABEL=S3
DATAVERSE.FILES.<ID>.ACCESS-KEY=
DATAVERSE.FILES.<ID>.SECRET-KEY=

I'm happy to try again but I'd need some hand holding. Or more time (than I want to spend today) studying the code. Or I can just wait for MPCONFIG. :sweat_smile:

view this post on Zulip Oliver Bertuch (May 02 2023 at 07:16):

:leftwards_hand: Here you go

view this post on Zulip Oliver Bertuch (May 02 2023 at 07:18):

May I point you towards http://preview.guides.gdcc.io/en/develop/container/app-image.html#tunables and the description of the "dataverse_*" vars? Maybe we should add a note that this is case sensitive.

view this post on Zulip Philip Durbin 🚀 (May 02 2023 at 11:03):

Ah, I didn't know about these magic trick rules.

view this post on Zulip Philip Durbin 🚀 (May 02 2023 at 11:03):

Replace - with __ for example.

view this post on Zulip Philip Durbin 🚀 (May 02 2023 at 11:03):

I wasn't sure what to do with SECRET-KEY and gave up.

view this post on Zulip Oliver Bertuch (May 02 2023 at 11:29):

Both Secret and Access Key are already using MPCONFIG, so you may use either way
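
The renaming rule mentioned above ("Replace - with __") can be sketched in a few lines of shell. This is only an illustration of the rule as described in this thread, assuming that "__" stands for "-" and every remaining single "_" stands for "."; the function name `to_property` is hypothetical and this is not the code from the startup script.

```shell
# Hypothetical helper illustrating the naming rule from this thread:
# a double underscore becomes a hyphen, then each remaining single
# underscore becomes a dot. Not taken from the real startup script.
to_property() {
  printf '%s\n' "$1" | sed -e 's/__/-/g' -e 's/_/./g'
}

to_property dataverse_files_storage__driver__id
to_property dataverse_files_s3_bucket__name
to_property dataverse_files_s3_custom__endpoint__url
```

Under that assumption, `dataverse_files_s3_bucket__name` would end up as the JVM option `dataverse.files.s3.bucket-name`.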

view this post on Zulip Philip Durbin 🚀 (May 17 2023 at 12:14):

Configuring S3 via MPCONFIG is possible, right? At least in containers?

view this post on Zulip Oliver Bertuch (May 17 2023 at 12:15):

No. Needs a hack.

view this post on Zulip Philip Durbin 🚀 (May 17 2023 at 12:19):

Ok, but the hack should work. Great. I'll try again some day.

view this post on Zulip Philip Durbin 🚀 (May 19 2023 at 15:23):

I was able to configure containerized Dataverse to use S3!

view this post on Zulip Philip Durbin 🚀 (May 19 2023 at 15:24):

diff --git a/docker-compose-dev.yml b/docker-compose-dev.yml
index 30c55661a2..298a20b6ff 100644
--- a/docker-compose-dev.yml
+++ b/docker-compose-dev.yml
@@ -13,6 +13,14 @@ services:
       - DATAVERSE_DB_PASSWORD=secret
       - DATAVERSE_DB_USER=${DATAVERSE_DB_USER}
       - DATAVERSE_FEATURE_API_BEARER_AUTH=1
+      - dataverse_files_storage__driver__id=s3
+      - dataverse_files_s3_type=s3
+      - dataverse_files_s3_label=S3
+      - dataverse_files_s3_custom__endpoint__url=s3.us-east-2.amazonaws.com
+      - dataverse_files_s3_custom__endpoint__region=us-east-2
+      - dataverse_files_s3_bucket__name=pdurbin
+      - dataverse_files_s3_access__key=REDACTED
+      - dataverse_files_s3_secret__key=REDACTED
     ports:
       - "8080:8080" # HTTP (Dataverse Application)
       - "4848:4848" # HTTP (Payara Admin Console)

view this post on Zulip Philip Durbin 🚀 (May 19 2023 at 15:25):

As you can see earlier in this thread, I was using UPPER CASE before, like any good sysadmin. But it seems like one must use lower case.
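
A quick way to sanity-check variable names before starting the container might look like the sketch below. It assumes, based only on what was observed in this thread, that the hack handles names starting with a lower-case `dataverse_` prefix and nothing else; the helper name `is_handled_by_hack` is hypothetical.

```shell
# Hypothetical check: would the env var hack look at this name at all?
# Assumption from this thread: only the lower-case "dataverse_" prefix counts.
is_handled_by_hack() {
  case "$1" in
    dataverse_*) echo "handled" ;;
    *)           echo "not handled" ;;
  esac
}

is_handled_by_hack dataverse_files_s3_type
is_handled_by_hack DATAVERSE_FILES_LOCAL_TYPE
```

Settings that already support MPCONFIG, such as the access and secret keys mentioned earlier, are a separate matter and are not covered by this check.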

view this post on Zulip Oliver Bertuch (May 19 2023 at 19:09):

Yeah like I said (did I?) before in this thread, this is a limitation because we have a hack in place for now until the storage subsystem is MPCONFIGified

view this post on Zulip Oliver Bertuch (May 19 2023 at 19:09):

Sorry 'bout that!

view this post on Zulip Philip Durbin 🚀 (May 19 2023 at 19:34):

Now I see you said "case sensitive" above.

view this post on Zulip Philip Durbin 🚀 (May 19 2023 at 19:34):

And now I get it. :sweat_smile:

view this post on Zulip Oliver Bertuch (May 19 2023 at 19:35):

Sorry, I should have EMPHASIZED_IT_MORE. :yum:

view this post on Zulip Philip Durbin 🚀 (May 19 2023 at 19:39):

Well, now I can document it: #containers > docs for milestone A #9540

view this post on Zulip Philip Durbin 🚀 (Aug 11 2023 at 15:18):

I just thought I'd point out that this issue made it to the board. Currently it's in "sprint ready": Add S3 tests to the regular integration test suite #6783

view this post on Zulip Philip Durbin 🚀 (Sep 07 2023 at 11:02):

Any discussion or prep for this S3 testing work? We could discuss it today or a future ct meeting if people want to.

view this post on Zulip Philip Durbin 🚀 (Sep 19 2023 at 19:48):

Before working on #6783 should we merge #9273 or does it not matter?

view this post on Zulip Oliver Bertuch (Sep 19 2023 at 19:52):

Yes yes yes please merge first

view this post on Zulip Oliver Bertuch (Sep 19 2023 at 19:53):

Oh I see it was added to the 6.1 milestone!

view this post on Zulip Philip Durbin 🚀 (Sep 19 2023 at 19:56):

Yes, both are flagged for 6.1.

view this post on Zulip Philip Durbin 🚀 (Sep 19 2023 at 19:59):

Great, thanks, I left a comment: https://github.com/IQSS/dataverse/issues/6783#issuecomment-1726389019

view this post on Zulip Philip Durbin 🚀 (Oct 04 2023 at 18:26):

Should I pick up this issue? Add S3 tests to the regular integration test suite #6783

view this post on Zulip Philip Durbin 🚀 (Oct 04 2023 at 18:27):

@Oliver Bertuch are we in a better spot now that #9273 has been merged? Do you have ideas on how to proceed with S3 testing?

view this post on Zulip Oliver Bertuch (Oct 04 2023 at 18:27):

You bet we are!

view this post on Zulip Oliver Bertuch (Oct 04 2023 at 18:28):

Do you want to work on this now?

view this post on Zulip Philip Durbin 🚀 (Oct 04 2023 at 18:30):

That's what I'm saying. It's in "this sprint". I could pick it up next.

view this post on Zulip Oliver Bertuch (Oct 04 2023 at 18:30):

Oh! Wasn't aware!

view this post on Zulip Oliver Bertuch (Oct 04 2023 at 18:31):

Want me to give you some thoughts and spill ideas?

view this post on Zulip Philip Durbin 🚀 (Oct 04 2023 at 18:31):

That's my job. To stare at the project board. :grinning:

view this post on Zulip Philip Durbin 🚀 (Oct 04 2023 at 18:31):

Yes, please hit me with ideas.

view this post on Zulip Oliver Bertuch (Oct 04 2023 at 18:32):

The way I see it, we should definitely take a look at adding integration tests using Testcontainers

view this post on Zulip Oliver Bertuch (Oct 04 2023 at 18:32):

There are basically 2 ways to do this: go for LocalStack or use some other S3 compatible thing

view this post on Zulip Oliver Bertuch (Oct 04 2023 at 18:33):

https://java.testcontainers.org/modules/minio/

view this post on Zulip Oliver Bertuch (Oct 04 2023 at 18:33):

https://java.testcontainers.org/modules/localstack/

view this post on Zulip Oliver Bertuch (Oct 04 2023 at 18:34):

As we are using the official AWS Java SDK, it might be worth going for LocalStack

view this post on Zulip Philip Durbin 🚀 (Oct 04 2023 at 18:34):

No objection to LocalStack, though I've only heard of it.

view this post on Zulip Philip Durbin 🚀 (Oct 04 2023 at 18:34):

I think @Don Sizemore is a fan.

view this post on Zulip Oliver Bertuch (Oct 04 2023 at 18:37):

You will have to do some ugly hacks with configuration... Sorry, but most dataverse.files S3 options ain't MPCONFIG enabled yet, so no nice simple wrapper with @JvmSetting

view this post on Zulip Philip Durbin 🚀 (Oct 04 2023 at 18:37):

Oh, I know, I worked around them. I know the hacks.

view this post on Zulip Oliver Bertuch (Oct 04 2023 at 18:37):

BeforeAll and AfterAll are your friends :smile_cat:

view this post on Zulip Philip Durbin 🚀 (Oct 04 2023 at 18:38):

So I need to add LocalStack to the docker compose file?

view this post on Zulip Oliver Bertuch (Oct 04 2023 at 18:38):

No - Testcontainers makes containers start from within the test class

view this post on Zulip Oliver Bertuch (Oct 04 2023 at 18:39):

No edit of the compose file necessary - it's much more self-contained this way

view this post on Zulip Oliver Bertuch (Oct 04 2023 at 18:39):

Remember https://github.com/IQSS/dataverse/blob/develop/src/test/java/edu/harvard/iq/dataverse/authorization/providers/oauth2/oidc/OIDCAuthenticationProviderFactoryIT.java ?

view this post on Zulip Oliver Bertuch (Oct 04 2023 at 18:40):

It's basically the same thing

view this post on Zulip Oliver Bertuch (Oct 04 2023 at 18:40):

There is a code example on the LocalStack TC docs page

view this post on Zulip Oliver Bertuch (Oct 04 2023 at 18:40):

If you want, we can hack on this later

view this post on Zulip Oliver Bertuch (Oct 04 2023 at 18:40):

(Meeting of DV Sustainability Group in 20)

view this post on Zulip Philip Durbin 🚀 (Oct 04 2023 at 18:42):

Ok, so something like this:

@Container
static KeycloakContainer keycloakContainer = new KeycloakContainer("quay.io/keycloak/keycloak:22.0")

view this post on Zulip Philip Durbin 🚀 (Oct 04 2023 at 19:17):

Ok, I picked it up.

view this post on Zulip Oliver Bertuch (Oct 10 2023 at 11:25):

@Philip Durbin you might be interested in https://github.com/IQSS/dataverse/pull/9939/files#diff-7f214d6bbc0cb42b01ab03fb4b3d26fc18ca74c93e12ce998fbb7371f90f80ba and/or https://github.com/IQSS/dataverse/pull/9939/files#diff-cec3a75a2f2de0e5eed7be824f8305ec3101ec2a86d4ebc4e4939ce51e315997 for some more examples :smile:

view this post on Zulip Oliver Bertuch (Oct 10 2023 at 11:26):

Would it be in scope to refactor the dataverse.files.<id> thing for this? It would make writing these tests much easier...

view this post on Zulip Philip Durbin 🚀 (Oct 10 2023 at 11:47):

Thanks for the examples!

view this post on Zulip Philip Durbin 🚀 (Oct 10 2023 at 11:48):

I don't know what to say about the refactoring. I'm buried in HR paperwork again and we have a visitor all week (which I'm looking forward to). I haven't been able to work on S3 testing at all. :disappointed:

view this post on Zulip Oliver Bertuch (Oct 10 2023 at 11:54):

It's your first day of the week, so no worries. And you did cleanse out loads of stuff over the weekend! So much dust came off!

view this post on Zulip Oliver Bertuch (Oct 10 2023 at 12:24):

In case it helps: there's also https://github.com/gaul/s3proxy which might be interesting for testing (transient memory storage), but might also add support to drop more storage backends and rely on a different component to create a bridge between Dataverse -> S3 -> sth else

view this post on Zulip Oliver Bertuch (Oct 10 2023 at 12:27):

If you are looking for sth. more lightweight not involving testcontainers, maybe https://github.com/findify/s3mock would be worth a look

view this post on Zulip Philip Durbin 🚀 (Oct 16 2023 at 13:30):

#6783 starts by talking about S3AccessIT, how we should add it to our Jenkins runs. But if we switch to S3 storage we won't be testing the filesystem anymore.

view this post on Zulip Philip Durbin 🚀 (Oct 16 2023 at 13:30):

Can we test both?

view this post on Zulip Oliver Bertuch (Oct 16 2023 at 15:02):

Maybe. Should be doable.

view this post on Zulip Oliver Bertuch (Oct 16 2023 at 15:02):

Might need API extension though

view this post on Zulip Oliver Bertuch (Oct 16 2023 at 15:03):

But why only an e2e test?

view this post on Zulip Oliver Bertuch (Oct 16 2023 at 15:03):

No UT/IT?

view this post on Zulip Philip Durbin 🚀 (Oct 16 2023 at 15:18):

Ha. Well, I thought I'd start with what we requested in the issue. I don't want to lose sight of that.

view this post on Zulip Philip Durbin 🚀 (Oct 16 2023 at 15:19):

But yes, integration tests too, if I can figure it out.

view this post on Zulip Oliver Bertuch (Oct 16 2023 at 15:25):

But couldn't we just move that one test to a non e2e test?

view this post on Zulip Oliver Bertuch (Oct 16 2023 at 15:26):

(Haven't looked yet)

view this post on Zulip Oliver Bertuch (Oct 16 2023 at 15:26):

Wouldn't that save us the trouble of jumping through the E2E hoops with config and all?

view this post on Zulip Philip Durbin 🚀 (Oct 16 2023 at 15:27):

Here's a handy link: https://github.com/IQSS/dataverse/blob/v6.0/src/test/java/edu/harvard/iq/dataverse/api/S3AccessIT.java

view this post on Zulip Philip Durbin 🚀 (Oct 16 2023 at 15:29):

I'm not sure how to move it. Happy to talk about it.

view this post on Zulip Oliver Bertuch (Oct 16 2023 at 18:46):

To me this test looks like it's simply testing an upload and a deletion, making sure along the way the prefix is correct

view this post on Zulip Oliver Bertuch (Oct 16 2023 at 18:46):

IMHO there is no need to have a full fledged deployment around for that...

view this post on Zulip Philip Durbin 🚀 (Oct 16 2023 at 19:30):

You're saying that using Testcontainers we could create a dataset? Let me go look at the test you added.

view this post on Zulip Philip Durbin 🚀 (Oct 16 2023 at 19:41):

OIDCAuthenticationProviderFactoryIT, that is

view this post on Zulip Oliver Bertuch (Oct 16 2023 at 19:44):

I could do a quiet, quick Zoom if it helps

view this post on Zulip Philip Durbin 🚀 (Oct 16 2023 at 19:46):

Sure! https://harvard.zoom.us/j/98729082963?pwd=OUVmWnFwSTlibS9wOWI0aTRZOW45dz09

view this post on Zulip Oliver Bertuch (Oct 16 2023 at 20:35):

That was fun hacking! Thanks for doing all the typing @Philip Durbin :smile:

view this post on Zulip Philip Durbin 🚀 (Oct 16 2023 at 20:37):

Ha! I'm the hands. You're the brain. It was fun!

view this post on Zulip Philip Durbin 🚀 (Oct 17 2023 at 18:48):

I'm sending "hello" into S3AccessIO and pulling it out again, making sure it's still "hello". Code coverage of that class has jumped from 0% to 7.88%!

view this post on Zulip Philip Durbin 🚀 (Oct 17 2023 at 20:34):

Oh, wait. I'm not sure code coverage is being reported properly in Netbeans. :thinking:

view this post on Zulip Philip Durbin 🚀 (Oct 17 2023 at 20:37):

For OIDCAuthProvider it's only showing 1.01%. I don't think that's right.

view this post on Zulip Philip Durbin 🚀 (Oct 17 2023 at 20:38):

Ok, 67% when I look at target/site/jacoco-integration-test-coverage-report/edu.harvard.iq.dataverse.authorization.providers.oauth2.oidc/OIDCAuthProvider.html. Phew. :sweat_smile:

view this post on Zulip Philip Durbin 🚀 (Oct 17 2023 at 21:16):

I wonder if there's a way to skip MOST tests except the one I'm working on. Something like this: mvn verify -DskipTests=true -Dtest=S3AccessIOIT.

view this post on Zulip Philip Durbin 🚀 (Oct 17 2023 at 21:19):

Ok, actually, this seems to work: mvn verify -Dtest=S3AccessIOIT

view this post on Zulip Philip Durbin 🚀 (Oct 17 2023 at 21:42):

Hmm, I'm still seeing Keycloak output though, from the other test, I guess.

view this post on Zulip Oliver Bertuch (Oct 17 2023 at 22:08):

https://maven.apache.org/surefire/maven-failsafe-plugin/integration-test-mojo.html#test
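
The linked Failsafe parameter is set through the `it.test` user property, so selecting a single integration test class could look roughly like the sketch below. The snippet only builds and prints the command line rather than running Maven; whether this also keeps the Keycloak container from starting depends on the project's Maven setup and was not verified here.

```shell
# Build (but do not run) a Maven command line that selects one integration
# test class via the Failsafe "it.test" user property.
it_class="S3AccessIOIT"
mvn_cmd="mvn verify -Dit.test=${it_class}"
echo "${mvn_cmd}"
```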

view this post on Zulip Philip Durbin 🚀 (Oct 18 2023 at 19:57):

Thanks, I'll circle back to excluding Keycloak when testing S3 but right now I'm trying to configure multiple stores. I must be doing it wrong.

view this post on Zulip Philip Durbin 🚀 (Oct 18 2023 at 19:58):

This is what I have:

- dataverse_files_storage__driver__id=file-1
- dataverse_files_file-1_type=file
- dataverse_files_file-1_label=Filesystem
- dataverse_files_file-1_directory=${STORAGE_DIR}/store
- dataverse_files_s3-3reals3_type=s3
- dataverse_files_s3-3reals3_label=Real S3
- dataverse_files_s3-3reals3_custom__endpoint__url=s3.us-east-2.amazonaws.com
- dataverse_files_s3-3reals3_custom__endpoint__region=us-east-2
- dataverse_files_s3-3reals3_bucket__name=pdurbin
- dataverse_files_s3-3reals3_upload__redirect=true
- dataverse_files_s3-3reals3_access__key=REDACTED
- dataverse_files_s3-3reals3_secret__key=REDACTED

view this post on Zulip Philip Durbin 🚀 (Oct 18 2023 at 19:59):

I'm trying to have a default "file" store called "file-1".

view this post on Zulip Philip Durbin 🚀 (Oct 18 2023 at 19:59):

And an "s3" store called "s3-3reals3".

view this post on Zulip Philip Durbin 🚀 (Oct 18 2023 at 20:01):

(I plan to add two more s3 stores: "s3-1localstackdirect" and "s3-2localstackvanilla" or something. Similar to what I wrote about at https://github.com/GlobalDataverseCommunityConsortium/dataverse-ansible/issues/327 )

view this post on Zulip Philip Durbin 🚀 (Oct 18 2023 at 20:02):

When I list the stores I only get the file-1 one:

{
    "status": "OK",
    "data": {
        "Filesystem": "file-1"
    }
}

view this post on Zulip Philip Durbin 🚀 (Oct 18 2023 at 20:03):

@Oliver Bertuch do you see anything obvious I'm missing?

view this post on Zulip Philip Durbin 🚀 (Oct 18 2023 at 20:06):

I've only ever configured a single store in containerized Dataverse. I'm not sure if it's supported.

view this post on Zulip Eryk Kulikowski (Oct 19 2023 at 08:17):

I am testing with two stores: one s3 and one file store. I use JVM_ARGS in my docker-compose like this:

    environment:
      JVM_ARGS:
        -Ddataverse.timerServer=true
        -Xmx24g
        -Xms4g
        -XX:+HeapDumpOnOutOfMemoryError
        -XX:MaxMetaspaceSize=2g
        -XX:MetaspaceSize=256m
        -XX:+UseG1GC
        -XX:+UseStringDeduplication
        -XX:+DisableExplicitGC
        -Ddataverse.files.s3.upload-out-of-band=true
        -Ddataverse.api.allow-incomplete-metadata=true
        -Ddataverse.ui.allow-review-for-incomplete=false
        -Ddataverse.ui.show-validity-filter=true
        -Ddataverse.files.directory=/data
        -Ddataverse.files.uploads=/uploads
        -Ddataverse.files.s3.type=s3
        -Ddataverse.files.s3.label=s3
        -Ddataverse.files.s3.bucket-name=dataverse-local
        -Ddataverse.files.s3.download-redirect=true
        -Ddataverse.files.s3.upload-redirect=true
        -Ddataverse.files.s3.path-style-access=true
        -Ddataverse.files.s3.custom-endpoint-url=https://rdmo.icts.kuleuven.be
        -Ddataverse.files.s3.custom-endpoint-region=us-east-1
        -Ddataverse.files.s3.disable-tagging=true
        -Ddataverse.files.file.type=file
        -Ddataverse.files.file.label=file
        -Ddataverse.files.file.directory=/data

Notice the space at the end of each line (otherwise it will not work).
This is the output from listing the storage:

{
   "status": "OK",
   "data":    {
      "s3": "s3",
      "file": "file"
   }
}

view this post on Zulip Eryk Kulikowski (Oct 19 2023 at 08:22):

It looks like Zulip strips the spaces, but there is space at the end of each line, before the end-of-line character, it functions as a separator for the jvm args.

view this post on Zulip Eryk Kulikowski (Oct 19 2023 at 08:27):

I also mount the aws secrets and config like this:

    volumes:
      - /path/to/aws/config/.aws:/opt/payara/.aws
      -...

view this post on Zulip Eryk Kulikowski (Oct 19 2023 at 08:51):

If you are interested, this is my entire image description:

  dataverse:
    image: ${DATAVERSE_IMAGE_TAG}
    hostname: dataverse
    user: payara
    networks:
      - app_net
      - db_net
    ports:
      - "7005:4848"
      - "7008:8080"
    environment:
      TZ: "Europe/Brussels"
      DATAVERSE_DB_HOST: db
      DATAVERSE_DB_PORT: 5432
      DATAVERSE_DB_USER: ${DB_USER}
      DATAVERSE_DB_NAME: ${DB_NAME}
      ENABLE_JDWP: 0
      DATAVERSE_MAIL_HOST: ${MAIL_SERVICE_HOST}
      DATAVERSE_MAIL_FROM: ${MAIL_SENDER}
      dataverse_fqdn: ${FQDN}
      dataverse_siteUrl: ${DATAVERSE_URL}
      dataverse_files_storage__driver__id: ${DRIVER_ID}
      db_SystemEmail: ${MAIL_SENDER_NAME}
      CUSTOM_INSTALL: "/opt/payara/custominstall"
      API_DEBUG: "false"
      LANG_DIR: "/languages"
      STORAGE_DIR: "/data"
      SECRETS_DIR: "/run/secrets"
      DUMPS_DIR: "/dumps"
      UPLOAD_DIR: "/uploads"
      MP_CONFIG_PROFILE: ${RDM_STAGE}
      JVM_ARGS:
        -Ddataverse.timerServer=true
        -Xmx24g
        -Xms4g
        -XX:+HeapDumpOnOutOfMemoryError
        -XX:MaxMetaspaceSize=2g
        -XX:MetaspaceSize=256m
        -XX:+UseG1GC
        -XX:+UseStringDeduplication
        -XX:+DisableExplicitGC
        -Ddataverse.files.s3.upload-out-of-band=true
        -Ddataverse.api.allow-incomplete-metadata=true
        -Ddataverse.ui.allow-review-for-incomplete=false
        -Ddataverse.ui.show-validity-filter=true
        -Ddataverse.files.directory=/data
        -Ddataverse.files.uploads=/uploads
        -Ddataverse.files.s3.type=s3
        -Ddataverse.files.s3.label=s3
        -Ddataverse.files.s3.bucket-name=dataverse-local
        -Ddataverse.files.s3.download-redirect=true
        -Ddataverse.files.s3.upload-redirect=true
        -Ddataverse.files.s3.path-style-access=true
        -Ddataverse.files.s3.custom-endpoint-url=https://rdmo.icts.kuleuven.be
        -Ddataverse.files.s3.custom-endpoint-region=us-east-1
        -Ddataverse.files.s3.disable-tagging=true
        -Ddataverse.files.file.type=file
        -Ddataverse.files.file.label=file
        -Ddataverse.files.file.directory=/data
        -Ddataverse.solr.host=solr
        -Ddataverse.mail.cc-support-on-contact-email=rdr@kuleuven.be
    volumes:
      - /etc/localtime:/etc/localtime:ro
      - ./config/dataverse:/opt/payara/custominstall
      - ./config/dataverse/set_language.sh:/opt/payara/scripts/init.d/set_language.sh
      - ./config/dataverse/change-admin-password.sh:/opt/payara/scripts/init.d/change-admin-password.sh
      - ${DV_FILES}:/data
      - ${DV_DUMPS}:/dumps
      - ${DV_UPLOADS}:/uploads
      - ${DV_LANG_DIR}:/languages
      - ${DOCROOT}:/opt/payara/appserver/glassfish/domains/domain1/docroot
      - ${SECRETS}:/run/secrets
      - ${SECRETS}/microprofile:/secrets
      - ${DV_PATH}/aws:/opt/payara/.aws
    depends_on:
      - proxy
      - db
      - index
    restart: unless-stopped
    privileged: false

I use a standard image built with maven. I think that the interesting script is the one that changes the admin password (you see it mounted in volumes):

#!/usr/bin/env bash

# Check and load secrets
if [ ! -s "${SECRETS_DIR}/admin/password" ]; then
  echo "No admin password present. Failing."
  exit 126
fi
ADMIN_PASSWORD=$(cat "${SECRETS_DIR}/admin/password")

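# Temporary password file for asadmin: current (default) password, then the new one
echo "AS_ADMIN_PASSWORD=admin" > /tmp/password-change-file.txt
echo "AS_ADMIN_NEWPASSWORD=${ADMIN_PASSWORD}" >> /tmp/password-change-file.txt
# Append the new password to the regular password file
echo "AS_ADMIN_PASSWORD=${ADMIN_PASSWORD}" >> "${PASSWORD_FILE}"
# "|| true" keeps the script from failing if the password change itself fails
asadmin --user="${ADMIN_USER}" --passwordfile=/tmp/password-change-file.txt change-admin-password --domain_name="${DOMAIN_NAME}" || true
rm /tmp/password-change-file.txt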

view this post on Zulip Philip Durbin 🚀 (Oct 19 2023 at 11:08):

@Eryk Kulikowski that actually works?!? JVM_ARGS?!? @Oliver Bertuch have you seen this?

view this post on Zulip Philip Durbin 🚀 (Oct 19 2023 at 11:08):

@Eryk Kulikowski what is the value of ${DATAVERSE_IMAGE_TAG} please?

view this post on Zulip Eryk Kulikowski (Oct 19 2023 at 12:10):

It works, you can pass any JVM/microprofile settings like this, just remember to add a space ( " ") after each argument. Otherwise, it will be sticking together. You can check if it worked in your server log when using it:

[Entrypoint] running /opt/payara/scripts/startInForeground.sh in foreground
Executing Payara Server with the following command line:
/opt/java/openjdk/bin/java
-cp
/opt/payara/appserver/glassfish/domains/domain1/lib/ext/*:/opt/payara/appserver/glassfish/modules/glassfish.jar
-Ddataverse.timerServer=true
-Xmx24g
-Xms4g
-XX:+HeapDumpOnOutOfMemoryError
-XX:MaxMetaspaceSize=2g
-XX:MetaspaceSize=256m
-XX:+UseG1GC
-XX:+UseStringDeduplication
-XX:+DisableExplicitGC
-Ddataverse.files.s3.upload-out-of-band=true
-Ddataverse.api.allow-incomplete-metadata=true
-Ddataverse.ui.allow-review-for-incomplete=false
-Ddataverse.ui.show-validity-filter=true
-Ddataverse.files.directory=/data
-Ddataverse.files.uploads=/uploads
-Ddataverse.files.s3.type=s3
-Ddataverse.files.s3.label=s3
-Ddataverse.files.s3.bucket-name=dataverse-local
-Ddataverse.files.s3.download-redirect=true
-Ddataverse.files.s3.upload-redirect=true
-Ddataverse.files.s3.path-style-access=true
-Ddataverse.files.s3.custom-endpoint-url=https://rdmo.icts.kuleuven.be
-Ddataverse.files.s3.custom-endpoint-region=us-east-1
-Ddataverse.files.s3.disable-tagging=true
-Ddataverse.files.file.type=file
-Ddataverse.files.file.label=file
-Ddataverse.files.file.directory=/data
-Ddataverse.solr.host=solr
-Ddataverse.mail.cc-support-on-contact-email=rdr@kuleuven.be
-Ddataverse.lang.directory=/languages

-XX:+UnlockDiagnosticVMOptions
...
-Dfelix.fileinstall.bundles.startTransient=true

-Dcom.sun.enterprise.config.config_environment_factory_class=com.sun.enterprise.config.serverbeans.AppserverConfigEnvironmentFactory
...
-Djava.library.path=/opt/payara/appserver/glassfish/lib:/usr/java/packages/lib:/usr/lib64:/lib64:/lib:/usr/lib
com.sun.enterprise.glassfish.bootstrap.ASMain
/opt/payara/config/pre-boot-commands.asadmin
-upgrade
false
-read-stdin
true
-postbootcommandfile
/opt/payara/config/post-boot-commands.asadmin
-domainname
domain1
-domaindir
/opt/payara/appserver/glassfish/domains/domain1
-asadmin-args
--host,,,localhost,,,--port,,,4848,,,--user,,,admin,,,--passwordfile,,,/opt/payara/passwordFile,,,--secure=false,,,--terse=false,,,--extraterse=false,,,--echo=false,,,--interactive=false,,,--autoname=false,,,start-domain,,,--verbose=false,,,--watchdog=false,,,--debug=false,,,--domaindir,,,/opt/payara/appserver/glassfish/domains,,,domain1
-instancename
server
-type
DAS
-verbose
false
-asadmin-classpath
/opt/payara/appserver/glassfish/lib/client/appserver-cli.jar
-debug
false
-asadmin-classname
com.sun.enterprise.admin.cli.AdminMain
-watchdog
false

Launching Payara Server on Felix platform
Oct 18, 2023 2:02:14 PM com.sun.enterprise.glassfish.bootstrap.osgi.BundleProvisioner createBundleProvisioner
INFO: Create bundle provisioner class = class com.sun.enterprise.glassfish.bootstrap.osgi.BundleProvisioner.
Registered com.sun.enterprise.glassfish.bootstrap.osgi.EmbeddedOSGiGlassFishRuntime@7c581736 in service registry.
Reading in commandments from /opt/payara/config/pre-boot-commands.asadmin

view this post on Zulip Eryk Kulikowski (Oct 19 2023 at 12:17):

@Philip Durbin
We use our own LIBIS registry for docker images. We build them with rdm-build https://github.com/libis/rdm-build
Check this script https://github.com/libis/rdm-build/blob/main/images/dataverse/build_dv.sh
It uses env variables from here: https://github.com/libis/rdm-build/blob/main/env.dev
Resulting image tag is: docker.io/rdm/dataverse-stock:1.5
I add ruby to that in https://github.com/libis/rdm-build/blob/main/images/dataverse/Dockerfile
The final image tag is docker.io/rdm/dataverse:1.5
Where docker.io becomes our own LIBIS registry when building for production (STAGE=prod).
This particular image is still in test, not yet in pilot or production.

view this post on Zulip Eryk Kulikowski (Oct 19 2023 at 12:21):

My specific branch build script: https://github.com/libis/rdm-build/blob/main/images/dataverse/build_dev_dv.sh
The other one uses the tagged version it checks out (notice that you download specific maven and java in the script, but it uses your default maven registry): https://github.com/libis/rdm-build/blob/main/images/dataverse/build_dv.sh

view this post on Zulip Philip Durbin 🚀 (Oct 19 2023 at 12:29):

@Eryk Kulikowski what are you doing in an hour? Would you like to join the Containerization Working Group meeting? :grinning: https://ct.gdcc.io

view this post on Zulip Oliver Bertuch (Oct 19 2023 at 12:32):

Since a compose file is YAML, you might try a folded scalar, i.e. JVM_ARGS: >. See also https://yaml.org/spec/1.2.2/#line-folding

view this post on Zulip Oliver Bertuch (Oct 19 2023 at 12:33):

(So you should not need to pay attention to the spaces at the end)
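To illustrate the folding idea, here is a minimal Python sketch. It is a simplified stand-in for YAML's `>` folding, not a YAML parser: it assumes equally indented lines with no blank lines and no trailing whitespace, and it ignores the final newline that a folded scalar normally keeps. The function name `fold_simple` and the sample options are for illustration only.

```python
def fold_simple(block: str) -> str:
    """Simplified stand-in for YAML '>' folding.

    Assumes equally indented, non-empty lines; joins them with one space.
    """
    return " ".join(line.strip() for line in block.splitlines() if line.strip())


# One JVM option per line, as it might be written under "JVM_ARGS: >".
jvm_args_block = """
    -Ddataverse.files.storage-driver-id=file1
    -Ddataverse.files.file1.type=file
    -Ddataverse.files.file1.label=Filesystem
"""

# The line breaks become single spaces, giving one long argument string.
print(fold_simple(jvm_args_block))
```

The point is that each option can sit on its own line in the compose file while the container still receives a single space-separated string.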

view this post on Zulip Oliver Bertuch (Oct 19 2023 at 12:38):

@Eryk Kulikowski I'm glad to see you are basing your work on the WG outputs! Glad it's being reused. Looks like we made it customizable enough.

view this post on Zulip Oliver Bertuch (Oct 19 2023 at 12:41):

FWIW @Eryk Kulikowski, using Xmx, Xms etc. in a container context is not considered good practice. Did you see the env vars for memory management in the tunables table? http://preview.guides.gdcc.io/en/develop/container/base-image.html#tunables You can easily assign the 24g of RAM by defining the container limits in the compose definition; by default, 70% of that will be allocated as heap.
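As a quick worked example of that arithmetic, the sketch below computes the heap size for a 24 GiB container limit. The 70% figure comes from the message above; the function name `default_heap_bytes` is hypothetical and not part of the base image.

```python
GIB = 1024 ** 3


def default_heap_bytes(container_limit_bytes: int, heap_percent: int = 70) -> int:
    """Heap size as an integer percentage of the container memory limit."""
    return container_limit_bytes * heap_percent // 100


limit = 24 * GIB
heap = default_heap_bytes(limit)
print(f"{heap / GIB:.1f} GiB of heap for a {limit // GIB} GiB container limit")
```

So a 24 GiB limit would leave roughly 16.8 GiB for the heap and the rest for non-heap memory, assuming the default percentage is unchanged.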

view this post on Zulip Eryk Kulikowski (Oct 19 2023 at 12:50):

@Oliver Bertuch Thanks! I will check it out!

view this post on Zulip Don Sizemore (Oct 19 2023 at 14:56):

@Philip Durbin where I'm at: I have the basics in place, but next I need to read in both buckets' settings and tell Ansible to provision them within the container: https://github.com/GlobalDataverseCommunityConsortium/dataverse-ansible/blob/327_multiple_s3_stores/tests/group_vars/jenkins.yml#L336
One of my questions was whether you want the S3 testing to provision said LocalStack buckets, but that would break testing on S3 proper. Any further direction/thoughts from you and/or @Oliver Bertuch at this point are most welcome.

view this post on Zulip Philip Durbin 🚀 (Oct 19 2023 at 16:12):

@Don Sizemore hmm, would it be a permanent break on testing S3 proper? Or more that if we someday want to test both S3 and LocalStack that we'd need to do further refactoring on the Ansible side?

view this post on Zulip Philip Durbin 🚀 (Oct 19 2023 at 16:12):

@Deirdre Kirmis if your ears were burning a couple hours ago, we were trying to remember what flavor of S3 you use. Something on Dell, I think.

view this post on Zulip Deirdre Kirmis (Oct 19 2023 at 16:26):

we use aws s3 .. we tried for about a year to use Dell Isilon S3 for one of our stores but never got it working =)

view this post on Zulip Philip Durbin 🚀 (Oct 19 2023 at 16:27):

Oh. Bummer!

view this post on Zulip Oliver Bertuch (Oct 19 2023 at 16:28):

Deirdre Kirmis said:

we use aws s3 .. we tried for about a year to use Dell Isilon S3 for one of our stores but never got it working =)

Have you ever tried to front the Isilon with Minio? I will probably do sth like that with our internal S3 to enable direct up/download. https://min.io/docs/minio/linux/administration/object-management/transition-objects-to-s3.html

view this post on Zulip Deirdre Kirmis (Oct 19 2023 at 16:28):

i think there is actually a fix to dataverse now (it was a bug in apache on the isilon server) .. but we decided to go all in on aws

view this post on Zulip Deirdre Kirmis (Oct 19 2023 at 16:28):

i did try fronting it with minio .. i need to look back at my notes to see how that went .. i think i ran into some problems there too

view this post on Zulip Deirdre Kirmis (Oct 19 2023 at 16:29):

but i also don't exactly always know what i'm doing :grinning_face_with_smiling_eyes:

view this post on Zulip Oliver Bertuch (Oct 19 2023 at 16:30):

Welcome to the club, here's your member id :identification_card: :see_no_evil:

view this post on Zulip Deirdre Kirmis (Oct 19 2023 at 16:30):

:sweat_smile: always fun to test things out!

view this post on Zulip Don Sizemore (Oct 19 2023 at 17:06):

@Philip Durbin Ansible can create and configure the buckets in LocalStack, but if for some/any reason Dataverse wanted to, it could? I don't think you want this to be the case, but just checking.

view this post on Zulip Philip Durbin 🚀 (Oct 19 2023 at 17:57):

@Don Sizemore I think I'm a little slow today but I think we're on the same page. All good. :grinning:

view this post on Zulip Philip Durbin 🚀 (Oct 19 2023 at 19:09):

After today's meeting I was all excited to try the magic trick with store names that don't have dashes in them.

view this post on Zulip Philip Durbin 🚀 (Oct 19 2023 at 19:09):

I tried this:

- dataverse_files_storage__driver__id=file1
- dataverse_files_file1_type=file
- dataverse_files_file1_label=Filesystem
- dataverse_files_file1_directory=${STORAGE_DIR}/store
- dataverse_files_s3c_type=s3
- dataverse_files_s3c_label=Real S3
- dataverse_files_s3c_custom__endpoint__url=s3.us-east-2.amazonaws.com
- dataverse_files_s3c_custom__endpoint__region=us-east-2
- dataverse_files_s3c_bucket__name=pdurbin
- dataverse_files_s3c_upload__redirect=true
- dataverse_files_s3c_access__key=REDACTED
- dataverse_files_s3c_secret__key=REDACTED
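The environment variable names above appear to follow a simple rule relative to the JVM options used elsewhere in this thread: a single underscore stands for a dot and a double underscore stands for a dash. The Python sketch below encodes that rule. It is inferred from the examples in this thread rather than taken from the Dataverse source, and the function name `env_to_jvm_option` is hypothetical.

```python
def env_to_jvm_option(name: str, value: str) -> str:
    """Translate an env var name into the matching -D JVM option.

    Assumed rule (inferred from the thread): "__" -> "-", "_" -> ".".
    """
    # Replace double underscores first so they are not turned into two dots.
    key = name.replace("__", "-").replace("_", ".")
    return f"-D{key}={value}"


print(env_to_jvm_option("dataverse_files_storage__driver__id", "file1"))
print(env_to_jvm_option("dataverse_files_s3c_custom__endpoint__region", "us-east-2"))
```

Read in reverse, the same rule explains why option names such as custom-endpoint-url need the doubled underscore when written as environment variables.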

view this post on Zulip Philip Durbin 🚀 (Oct 19 2023 at 19:10):

However, when I list the stores, I still only see the one:

{
    "status": "OK",
    "data": {
        "Filesystem": "file1"
    }
}

view this post on Zulip Philip Durbin 🚀 (Oct 19 2023 at 19:10):

@Oliver Bertuch ^^

view this post on Zulip Philip Durbin 🚀 (Oct 19 2023 at 19:10):

I guess I'll try JVM_ARGS next.

view this post on Zulip Philip Durbin 🚀 (Oct 19 2023 at 19:38):

[ERROR] Failed to execute goal io.fabric8:docker-maven-plugin:0.43.4:run (default-cli) on project dataverse: Execution default-cli of goal io.fabric8:docker-maven-plugin:0.43.4:run failed: while scanning a simple key
[ERROR]  in 'reader', line 23, column 9:
[ERROR]             -Ddataverse.files.storage-driver ...
[ERROR]             ^
[ERROR] could not find expected ':'
[ERROR]  in 'reader', line 24, column 9:
[ERROR]             -Ddataverse.files.file1.type=file
[ERROR]             ^

view this post on Zulip Philip Durbin 🚀 (Oct 19 2023 at 19:39):

... when using this file:

docker-compose-dev.yml

view this post on Zulip Philip Durbin 🚀 (Oct 19 2023 at 20:26):

From https://www.yamllint.com

Screenshot-2023-10-19-at-4.25.49-PM.png

view this post on Zulip Philip Durbin 🚀 (Oct 19 2023 at 20:33):

Ah ha! I think (I hope) I got it.

view this post on Zulip Philip Durbin 🚀 (Oct 19 2023 at 20:35):

Nope. :very_angry:

view this post on Zulip Philip Durbin 🚀 (Oct 19 2023 at 21:00):

Weird! I sort of got it working?!? But I have three stores when I expected two:

{
    "status": "OK",
    "data": {
        "RealS3": "s3c",
        "Local": "local",
        "Filesystem": "file1"
    }
}

view this post on Zulip Philip Durbin 🚀 (Oct 19 2023 at 21:00):

Here's my file:

docker-compose-dev.yml

view this post on Zulip Philip Durbin 🚀 (Oct 19 2023 at 21:06):

file1 and s3c I expect. I'm not sure where local is coming from. :thinking:

view this post on Zulip Philip Durbin 🚀 (Oct 23 2023 at 20:04):

Just capturing some commands and output that @Don Sizemore helped me figure out to prove to myself that we can upload a test file from the Dataverse/Payara container to the Localstack container:

payara@dataverse:~$ echo -e '[default]\nregion = us-east-2' > ~/.aws/config
payara@dataverse:~$ cat ~/.aws/config
[default]
region = us-east-2
payara@dataverse:~$ echo -e '[default]\naws_access_key_id = default\naws_secret_access_key = default' > ~/.aws/credentials
payara@dataverse:~$ cat ~/.aws/credentials
[default]
aws_access_key_id = default
aws_secret_access_key = default
payara@dataverse:~$
payara@dataverse:~$ aws --endpoint-url=http://localstack:4566 s3 ls
2023-10-23 18:02:38 mybucket
payara@dataverse:~$
payara@dataverse:~$ echo foo > test.txt
payara@dataverse:~$ cat test.txt
foo
payara@dataverse:~$ aws --endpoint-url=http://localstack:4566 s3 cp test.txt s3://mybucket/
upload: ./test.txt to s3://mybucket/test.txt
payara@dataverse:~$ aws --endpoint-url=http://localstack:4566 s3 ls s3://mybucket/
2023-10-23 18:13:23          4 test.txt
payara@dataverse:~$

view this post on Zulip Philip Durbin 🚀 (Oct 23 2023 at 20:06):

But we still can't upload files from Dataverse itself to LocalStack. Here's the config I'm using:

-Ddataverse.files.localstack1.type=s3
-Ddataverse.files.localstack1.label=LocalStack
-Ddataverse.files.localstack1.custom-endpoint-url=http://localstack:4566
-Ddataverse.files.localstack1.custom-endpoint-region=us-east-2
-Ddataverse.files.localstack1.bucket-name=mybucket
-Ddataverse.files.localstack1.upload-redirect=false
-Ddataverse.files.localstack1.access-key=default
-Ddataverse.files.localstack1.secret-key=default

view this post on Zulip Philip Durbin 🚀 (Oct 23 2023 at 20:06):

@Oliver Bertuch any ideas for me? I think I might try MinIO instead.

view this post on Zulip Philip Durbin 🚀 (Oct 23 2023 at 21:20):

Ah ha! For Minio, at least, I needed this:

-Ddataverse.files.minio1.path-style-access=true
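For context on what this setting changes, S3 clients can address a bucket either in the URL path (path-style) or as a subdomain of the endpoint host (virtual-hosted style). The Python sketch below builds both forms for the endpoint and bucket used earlier in this thread. It illustrates general S3 addressing only; that a non-resolvable bucket subdomain was the cause of the failed uploads here is an assumption, and the function name `object_url` is hypothetical.

```python
from urllib.parse import urlsplit, urlunsplit


def object_url(endpoint: str, bucket: str, key: str, path_style: bool) -> str:
    """Build the URL an S3 client would request for bucket/key."""
    parts = urlsplit(endpoint)
    if path_style:
        # Path-style: the bucket is the first path segment.
        return urlunsplit((parts.scheme, parts.netloc, f"/{bucket}/{key}", "", ""))
    # Virtual-hosted style: the bucket becomes part of the host name.
    return urlunsplit((parts.scheme, f"{bucket}.{parts.netloc}", f"/{key}", "", ""))


print(object_url("http://localstack:4566", "mybucket", "test.txt", path_style=True))
print(object_url("http://localstack:4566", "mybucket", "test.txt", path_style=False))
```

With virtual-hosted addressing the client has to resolve a host such as mybucket.localstack, which a plain Docker Compose network would typically not know about, whereas the path-style URL only needs the service name.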

view this post on Zulip Philip Durbin 🚀 (Oct 23 2023 at 21:20):

As mentioned in our docs: https://guides.dataverse.org/en/6.0/installation/config.html#reported-working-s3-compatible-storage

view this post on Zulip Philip Durbin 🚀 (Oct 24 2023 at 18:28):

@Don Sizemore please remind me, do you have a preference for LocalStack or MinIO? I'm certainly happy to see if the trick above works with LocalStack.

view this post on Zulip Don Sizemore (Oct 24 2023 at 18:59):

Philip Durbin said:

Don Sizemore please remind me, do you have a preference for LocalStack or MinIO? I'm certainly happy to see if the trick above works with LocalStack.

I really don't. LocalStack sounds better for testing overall; MinIO sounds better for sites like ASU who use Dell MinIO. Also because I have to keep correcting myself when I type Minion.

view this post on Zulip Philip Durbin 🚀 (Oct 24 2023 at 19:03):

Well, the path-style-access=true trick does seem to work for LocalStack.

view this post on Zulip Philip Durbin 🚀 (Oct 24 2023 at 21:16):

I'm testing both.

view this post on Zulip Philip Durbin 🚀 (Oct 25 2023 at 14:06):

LocalStack is finickier about bucket creation. I've only had luck creating the bucket with awslocal s3 mb s3://mybucket (rather than Java).

view this post on Zulip Philip Durbin 🚀 (Oct 25 2023 at 15:18):

I made a draft pull request! add S3 tests, LocalStack, MinIO #6783 #10044

view this post on Zulip Philip Durbin 🚀 (Oct 25 2023 at 15:21):

Feedback very welcome, of course!

view this post on Zulip Guido Schmutz (Oct 27 2023 at 08:41):

@Philip Durbin sorry a bit late, as you have figured it out with the S3 properties.

I have used the following way to specify the properties (just as environment variables on the container):

  dataverse:
    image: gdcc/dataverse:unstable
    container_name: dataverse
    hostname: dataverse
    user: payara
    labels:
      com.platys.name: dataverse
      com.platys.webui.title: Dataverse UI
      com.platys.webui.url: http://dataplatform:28294
      com.platys.restapi.title: Dataverse UI
      com.platys.restapi.url: http://dataplatform:28294/api/info/metrics/dataverses
    ports:
      - 28294:8080
      - 4848:4848
      - 9009:9009   # JDWP
      - 8686:8686   # JMX
    environment:
      - DATAVERSE_FQDN=${PUBLIC_IP}
      - _CT_DATAVERSE_SITEURL=http://${PUBLIC_IP}:28294
      - DATAVERSE_DB_HOST=dataverse-postgresql
      - DATAVERSE_DB_USER=dataverse
      - DATAVERSE_DB_PASSWORD=abc123!
      - DATAVERSE_MAIL_HOST=mailpit
      - DATAVERSE_MAIL_FROM=dataverse@localhost
      - _CT_DATAVERSE_SOLR_HOST=dataverse-solr
      - DATAVERSE_SOLR_PORT=8983
      - ENABLE_JMX=0
      - ENABLE_RELOAD=0
      - ENABLE_JDWP=1
      - dataverse_files_s3_type=s3
      - dataverse_files_s3_label="Object Storage"
      - dataverse_files_s3_bucket__name="dataverse-bucket"
      - dataverse_files_s3_custom__endpoint__url=http://${EADP_IP}:9000
      - dataverse_files_s3_custom__endpoint__region='us-east-1'
      - dataverse_files_s3_path__style__access=True
      - dataverse_files_s3_access__key=${PLATYS_AWS_ACCESS_KEY:?PLATYS_AWS_ACCESS_KEY must be set either in .env or as an environment variable}
      - dataverse_files_s3_secret__key=${PLATYS_AWS_SECRET_ACCESS_KEY:?PLATYS_AWS_SECRET_ACCESS_KEY must be set either in .env or as an environment variable}

view this post on Zulip Philip Durbin 🚀 (Oct 27 2023 at 11:32):

@Guido Schmutz right, I'm able to configure a single file store like that, with environment variables, but when I tried to configure multiple stores, it didn't work. If you get a chance to try this, please let me know! Otherwise, I'll wait for a fix for #9998.

view this post on Zulip Guido Schmutz (Oct 27 2023 at 12:57):

ok, I see, you configure multiple stores, challenge accepted ;-) ... took me a while to figure it out, but now it works

  dataverse:
    image: gdcc/dataverse:unstable
    container_name: dataverse
    hostname: dataverse
    user: payara
    labels:
      com.platys.name: dataverse
      com.platys.webui.title: Dataverse UI
      com.platys.webui.url: http://dataplatform:28294
      com.platys.restapi.title: Dataverse UI
      com.platys.restapi.url: http://dataplatform:28294/api/info/metrics/dataverses
    ports:
      - 28294:8080
      - 4848:4848
      - 9009:9009   # JDWP
      - 8686:8686   # JMX
    environment:
      - DATAVERSE_FQDN=${PUBLIC_IP}
      - _CT_DATAVERSE_SITEURL=http://${PUBLIC_IP}:28294
      - DATAVERSE_DB_HOST=dataverse-postgresql
      - DATAVERSE_DB_USER=dataverse
      - DATAVERSE_DB_PASSWORD=abc123!
      - DATAVERSE_MAIL_HOST=mailpit
      - DATAVERSE_MAIL_FROM=dataverse@localhost
      - _CT_DATAVERSE_SOLR_HOST=dataverse-solr
      - DATAVERSE_SOLR_PORT=8983
      - ENABLE_JMX=0
      - ENABLE_RELOAD=0
      - ENABLE_JDWP=1
      - dataverse_files_storage__driver__id=s3
      - dataverse_files_file1_type=file
      - dataverse_files_file1_label="Local Filesystem"
      - dataverse_files_file1_directory=${STORAGE_DIR}/store
      - dataverse_files_s3_type=s3
      - dataverse_files_s3_label="Object Storage"
      - dataverse_files_s3_bucket__name="dataverse-bucket"
      - dataverse_files_s3_custom__endpoint__url=http://${EADP_IP}:9000
      - dataverse_files_s3_custom__endpoint__region='us-east-1'
      - dataverse_files_s3_path__style__access=True
      - dataverse_files_s3_access__key=${PLATYS_AWS_ACCESS_KEY:?PLATYS_AWS_ACCESS_KEY must be set either in .env or as an environment variable}
      - dataverse_files_s3_secret__key=${PLATYS_AWS_SECRET_ACCESS_KEY:?PLATYS_AWS_SECRET_ACCESS_KEY must be set either in .env or as an environment variable}
    volumes:
      - ./data-transfer:/data-transfer
    restart: unless-stopped
$ curl -s -X GET -H 'X-Dataverse-key: REDACTED' 'http://192.168.116.137:28294/api/admin/dataverse/storageDrivers' | jq
{
  "status": "OK",
  "data": {
    "Object Storage": "s3",
    "Local Filesystem": "file1"
  }
}

Guess it was the space in the label (- dataverse_files_s3c_label=Real S3) in your case, which caused an error.
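The effect of an unquoted space can be shown with Python's `shlex`, which follows POSIX shell word-splitting rules. This is only an illustration: how the container's startup scripts actually tokenize the environment variables or JVM_ARGS is an assumption here, and the option strings are taken from earlier messages in this thread.

```python
import shlex

# Unquoted: the label value is cut at the space and "S3" becomes a stray token.
unquoted = "-Ddataverse.files.s3c.type=s3 -Ddataverse.files.s3c.label=Real S3"
print(shlex.split(unquoted))

# Quoted: the label survives as a single argument.
quoted = '-Ddataverse.files.s3c.type=s3 -Ddataverse.files.s3c.label="Real S3"'
print(shlex.split(quoted))
```

Either quoting the value or choosing a label without spaces avoids the stray token.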

view this post on Zulip Philip Durbin 🚀 (Oct 27 2023 at 12:59):

Ah, right. I caught that later, the space problem, in the context of the JVM_ARGS. I'll try it again. Thanks!

view this post on Zulip Philip Durbin 🚀 (Nov 07 2023 at 12:53):

I set up S3 testing at https://github.com/gdcc/api-test-runner/actions/workflows/s3.yml

view this post on Zulip Philip Durbin 🚀 (Nov 07 2023 at 12:53):

After some fiddling, it seems to work fine.

view this post on Zulip Philip Durbin 🚀 (Nov 07 2023 at 12:54):

So my thought is that once we merge #10044, we'll copy the config I added over to the "develop" and "manual" jobs. (And eventually the "alpha" job, once we make a release.)

view this post on Zulip Philip Durbin 🚀 (Nov 08 2023 at 15:27):

@Don Sizemore heads up that I just added a test for direct download (and non-direct download): https://github.com/IQSS/dataverse/pull/10044/commits/c2d8ae54a98b850f6ac9f94fa799b36bf8acaa6c

view this post on Zulip Philip Durbin 🚀 (Nov 08 2023 at 15:27):

Here's the change I made on the API test runner side: https://github.com/gdcc/api-test-runner/commit/02c815908480e571d28d8d47ccbc4456979b377e

view this post on Zulip Philip Durbin 🚀 (Nov 08 2023 at 15:28):

I just kicked off https://github.com/gdcc/api-test-runner/actions/runs/6800385363 to test it.

view this post on Zulip Don Sizemore (Dec 05 2023 at 15:51):

Philip Durbin said:

Here's the change I made on the API test runner side: https://github.com/gdcc/api-test-runner/commit/02c815908480e571d28d8d47ccbc4456979b377e

I can tell Dataverse-Ansible not to template ~dataverse/.aws/credentials (and config) though it gets severely ticked when I rename it. Is Dataverse preferring global AWS creds to datastore-defined credentials expected, and do we want to open an issue to change this order?

view this post on Zulip Philip Durbin 🚀 (Dec 05 2023 at 15:51):

Good questions. I'm not sure. :sweat_smile:

view this post on Zulip Don Sizemore (Dec 05 2023 at 15:52):

maybe a question for tech hours this afternoon?

view this post on Zulip Philip Durbin 🚀 (Dec 05 2023 at 15:52):

yes, excellent idea

view this post on Zulip Don Sizemore (Dec 05 2023 at 15:53):

I mean, I _can_ nuke that file just for testing runs, but that won't address any installation with multiple datastores who expect / need a global credential file as well.

view this post on Zulip Philip Durbin 🚀 (Dec 05 2023 at 15:53):

right, it sure smells like a bug

view this post on Zulip Don Sizemore (Dec 05 2023 at 15:55):

ah. I remember now. Jim's PR corrects credentials preference when RBAC is involved. Won't affect conf file vs. jvm-option precedence.

view this post on Zulip Philip Durbin 🚀 (Dec 05 2023 at 15:56):

Meanwhile, following the tweaks I just made to the tests (changing the access and secret keys to match Jenkins), the first manual run on the API test runner failed but I'm hoping it's transient. I kicked off a second run.

view this post on Zulip Philip Durbin 🚀 (Dec 05 2023 at 17:22):

Bah, the second run failed with the same error:

Error: S3AccessIT.setUp:66 » SdkClient Unable to execute HTTP request: Connect to s3.localhost.localstack.cloud:4566 [s3.localhost.localstack.cloud/127.0.0.1] failed: Connection refused

view this post on Zulip Don Sizemore (Dec 05 2023 at 17:50):

I left my instance up from this morning if you want to test anything further. If not, I'll kill it to save y'all $$$

view this post on Zulip Don Sizemore (Dec 05 2023 at 18:38):

sounds like we want Jim's PR merged and then everything should start to work (at least WRT S3 creds). Ima kill this morning's EC2 instance.

view this post on Zulip Philip Durbin 🚀 (Dec 05 2023 at 18:54):

That's fine. When do we merge the Ansible branch?

view this post on Zulip Don Sizemore (Dec 05 2023 at 20:00):

Philip Durbin said:

That's fine. When do we merge the Ansible branch?

I'd say we merge Jim's PR, then merge that into your S3 testing branch, let me test manually once more, and we'll see if things run cleanly? (they should)

view this post on Zulip Juan Pablo Tosca Villanueva (Dec 05 2023 at 20:00):

image.png

view this post on Zulip Philip Durbin 🚀 (Dec 05 2023 at 20:10):

Sounds good, and meanwhile, I'll try to figure out why I'm getting that "s3.localhost.localstack.cloud:4566... Connection refused" error, which JP points out is the same one he saw when not on VPN. Makes no sense to me. :sweat_smile:

view this post on Zulip Philip Durbin 🚀 (Dec 06 2023 at 15:52):

S3 priorities has been merged! :tada: #10004

view this post on Zulip Philip Durbin 🚀 (Dec 06 2023 at 15:53):

Meanwhile, I'm still looking at the connection refused error. It's when I list the buckets, which is optional, so I commented that out to see if/when LocalStack fails later. Here's the run: https://github.com/gdcc/api-test-runner/actions/runs/7116920717

view this post on Zulip Philip Durbin 🚀 (Dec 06 2023 at 19:15):

Ok, I think I figured out the problem. I hope so. Re-running tests now.

view this post on Zulip Philip Durbin 🚀 (Dec 06 2023 at 19:20):

Phew, yes, passing now: https://github.com/gdcc/api-test-runner/actions/runs/7118949044

view this post on Zulip Philip Durbin 🚀 (Dec 06 2023 at 19:21):

next up: Jenkins!

view this post on Zulip Philip Durbin 🚀 (Dec 06 2023 at 20:41):

I had confused myself with the api-test-runner. For now, until we merge, I set up a dedicated S3 job. Once we merge I'll put that extra config in the "develop" job. And once we release, into the "alpha" job.

view this post on Zulip Philip Durbin 🚀 (Dec 06 2023 at 20:42):

The config basically spins up LocalStack and MinIO and tells Dataverse to use them.

view this post on Zulip Philip Durbin 🚀 (Dec 06 2023 at 20:42):

docker compose stuff

view this post on Zulip Philip Durbin 🚀 (Dec 06 2023 at 20:42):

like I'm using locally, a slight variation on that

view this post on Zulip Philip Durbin 🚀 (Dec 06 2023 at 21:03):

@Don Sizemore I'm looking now at https://github.com/GlobalDataverseCommunityConsortium/dataverse-ansible/pull/337 . Thanks!! :heart:

view this post on Zulip Philip Durbin 🚀 (Dec 06 2023 at 21:04):

This is very exciting. So Docker-y!

view this post on Zulip Juan Pablo Tosca Villanueva (Dec 06 2023 at 21:08):

image.png

view this post on Zulip Juan Pablo Tosca Villanueva (Dec 06 2023 at 21:09):

Speaking about Docker, I broke their website this morning and I had to share their error page

view this post on Zulip Juan Pablo Tosca Villanueva (Dec 06 2023 at 21:09):

We need that icon :laughter_tears:

view this post on Zulip Philip Durbin 🚀 (Dec 06 2023 at 21:11):

@Don Sizemore approved! Thank you!!

view this post on Zulip Philip Durbin 🚀 (Dec 07 2023 at 14:16):

All tests are passing! Including the new S3 tests! https://jenkins.dataverse.org/job/IQSS-Dataverse-Develop-PR/job/PR-10044/17/testReport/edu.harvard.iq.dataverse.api/S3AccessIT/

view this post on Zulip Philip Durbin 🚀 (Dec 07 2023 at 14:16):

And now I can see code coverage! 35% for S3AccessIO! https://jenkins.dataverse.org/job/IQSS-Dataverse-Develop-PR/job/PR-10044/17/execution/node/3/ws/target/coverage-it/edu.harvard.iq.dataverse.dataaccess/S3AccessIO.html

view this post on Zulip Philip Durbin 🚀 (Dec 07 2023 at 14:17):

Could be better but I'm happy it's higher than what we had with only unit tests, 7%.


Last updated: Oct 30 2025 at 05:14 UTC