As part of https://github.com/IQSS/dataverse/pull/9541 I need to test S3.
I've always tested S3 with Payara running directly on my Mac. :grimacing:
It should be possible to supply all the necessary S3 config via MPCONFIG, right? That way I can continue my streak of doing all development with containers. :happy: It's been over a month! :tada:
I'd rather not switch back to Payara running directly on my Mac if I can avoid it!
Yes, it's possible, but only in a dirty way: the files part doesn't support MPCONFIG yet, so you need to use the "dataverse_files_xxx_option=..." env var hack. Mind the casing.
Interesting. Do you have a list of env vars handy I can start with? If not, I'll look at the guides.
Just take the example from the startup script as a template :smile:
Sorry, which startup script?
Nice. Thanks. I'll probably add some docs for this!
Sure. Don't put too much effort into it - should go away anyway with making the config of storage via MPCONFIG available. Someone needs to code that...
Right. There's a comment about https://github.com/IQSS/dataverse/issues/7000 but no PR yet, right?
Am I right that we don't plan to create a PR until milestone G? "MPCONFIG for remaining dozen JVM settings (Size: 80)"
No, that's not correct.
Oh, wait. Milestone D. "Configurability: Make storage configuration use MPCONFIG (Size: 33)"
There are some config things in milestones along the way
So all I have for now is a local branch and a stash of changes on my laptop for the work on the file storage thing
Awesome. Want me to create an issue?
There was some discussion a while ago during some tech hour about whether it would be OK to require a list of storage names, which IIRC was eventually put on hold with "hmm yeah dunno maybe fine lets see"
Well there is 7000, but maybe we can close that and create issues for the remaining parts
Yeah, smaller chunks, I'm thinking.
(That would be mail, which has an issue, storage and "the leftovers")
yum
I've been going on about this for a while... :innocent:
Glad that it seems to be appreciated now among the devs... That lookup thing seems to be intriguing... :grinning_face_with_smiling_eyes:
Well, now I need it. The ability to configure S3 in containers. :happy:
Hrhr... I remember we talked about this stuff a long time ago and one of my arguments was "you want this for containers" :yum:
Ah BTW do you want to test localstack for S3?
Or are you going to use Minio?
(Or Seaweedfs)
IIRC this is part of another milestone; would it make sense to take a look into how to spin up the storage container, too?
I'm going to use real S3. It's possible to configure it with the hacks above, right?
Yes, that's completely possible
Oh wait: one thing... What I already had a chance to bring in for MPCONFIG and S3 is the keys thing
So you don't need to think about how to deal with the AWS profile and config files in ~/.aws
That is documented in the main section, it's been sitting there for a while and made @Don Sizemore happy before
I started editing my .env file today but got so confused I gave up and just ran Payara locally on my Mac. These are the (surely incorrect) values I was editing when I gave up:
DATAVERSE_FILES_STORAGE__DRIVER__ID=s3
DATAVERSE_FILES_LOCAL_TYPE=s3
DATAVERSE_FILES_LOCAL_LABEL=S3
DATAVERSE.FILES.<ID>.ACCESS-KEY=
DATAVERSE.FILES.<ID>.SECRET-KEY=
I'm happy to try again but I'd need some hand holding. Or more time (than I want to spend today) studying the code. Or I can just wait for MPCONFIG. :sweat_smile:
:leftwards_hand: Here you go
May I point you towards http://preview.guides.gdcc.io/en/develop/container/app-image.html#tunables and the description of the "dataverse_*" vars? Maybe we should add a note that this is case sensitive.
Ah, I didn't know about these magic trick rules.
Replace - with __ for example.
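So, for example, dataverse.files.s3.custom-endpoint-url becomes dataverse_files_s3_custom__endpoint__url: dots turn into single underscores, dashes into double underscores, and everything is lower case.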
I wasn't sure what to do with SECRET-KEY and gave up.
Both Secret and Access Key are already using MPCONFIG, so you can set them either way
Configuring S3 via MPCONFIG is possible, right? At least in containers?
No. Needs a hack.
Ok, but the hack should work. Great. I'll try again some day.
I was able to configure containerized Dataverse to use S3!
diff --git a/docker-compose-dev.yml b/docker-compose-dev.yml
index 30c55661a2..298a20b6ff 100644
--- a/docker-compose-dev.yml
+++ b/docker-compose-dev.yml
@@ -13,6 +13,14 @@ services:
- DATAVERSE_DB_PASSWORD=secret
- DATAVERSE_DB_USER=${DATAVERSE_DB_USER}
- DATAVERSE_FEATURE_API_BEARER_AUTH=1
+ - dataverse_files_storage__driver__id=s3
+ - dataverse_files_s3_type=s3
+ - dataverse_files_s3_label=S3
+ - dataverse_files_s3_custom__endpoint__url=s3.us-east-2.amazonaws.com
+ - dataverse_files_s3_custom__endpoint__region=us-east-2
+ - dataverse_files_s3_bucket__name=pdurbin
+ - dataverse_files_s3_access__key=REDACTED
+ - dataverse_files_s3_secret__key=REDACTED
ports:
- "8080:8080" # HTTP (Dataverse Application)
- "4848:4848" # HTTP (Payara Admin Console)
As you can see earlier in this thread, I was using UPPER CASE before, like any good sysadmin. But it seems like one must use lower case.
Yeah like I said (did I?) before in this thread, this is a limitation because we have a hack in place for now until the storage subsystem is MPCONFIGified
Sorry 'bout that!
Now I see you said "case sensitive" above.
And now I get it. :sweat_smile:
Sorry, I should have EMPHASIZED_IT_MORE. :yum:
Well, now I can document it: #containers > docs for milestone A #9540
I just thought I'd point out that this issue made it to the board. Currently it's in "sprint ready": Add S3 tests to the regular integration test suite #6783
Any discussion or prep for this S3 testing work? We could discuss it today or a future ct meeting if people want to.
Before working on #6783 should we merge #9273 or does it not matter?
Yes yes yes please merge first
Oh I see it was added to the 6.1 milestone!
Yes, both are flagged for 6.1.
Great, thanks, I left a comment: https://github.com/IQSS/dataverse/issues/6783#issuecomment-1726389019
Should I pick up this issue? Add S3 tests to the regular integration test suite #6783
@Oliver Bertuch are we in a better spot now that #9273 has been merged? Do you have ideas on how to proceed with S3 testing?
You bet we are!
Do you want to work on this now?
That's what I'm saying. It's in "this sprint". I could pick it up next.
Oh! Wasn't aware!
Want me to give you some thoughts and spill ideas?
That's my job. To stare at the project board. :grinning:
Yes, please hit me with ideas.
The way I see it, we should definitely take a look at adding integration tests using Testcontainers
There are basically 2 ways to do this: go for LocalStack or use some other S3 compatible thing
https://java.testcontainers.org/modules/minio/
https://java.testcontainers.org/modules/localstack/
As we are using the official AWS Java SDK, it might be worth going for LocalStack
No objection to LocalStack, though I've only heard of it.
I think @Don Sizemore is a fan.
You will have to do some ugly hacks with configuration... Sorry, but most dataverse.files S3 options ain't MPCONFIG enabled yet, so no nice simple wrapper with @JvmSetting
Oh, I know, I worked around them. I know the hacks.
BeforeAll and AfterAll are your friends :smile_cat:
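E.g. (just a sketch; setting the options as system properties only helps for tests where the code under test runs in the same JVM as JUnit, and the "localstack1" store id is made up):
import org.junit.jupiter.api.AfterAll;
import org.junit.jupiter.api.BeforeAll;

@BeforeAll
static void setUpStore() {
    // the ugly hack: set the non-MPCONFIG storage options as plain system properties
    System.setProperty("dataverse.files.localstack1.type", "s3");
    System.setProperty("dataverse.files.localstack1.label", "LocalStack");
}

@AfterAll
static void tearDownStore() {
    // clean up so later test classes don't inherit the store
    System.clearProperty("dataverse.files.localstack1.type");
    System.clearProperty("dataverse.files.localstack1.label");
}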
So I need to add LocalStack to the docker compose file?
No - Testcontainers makes containers start from within the test class
No edit of the compose file necessary - it's much more self-contained this way
It's basically the same thing
There is a code example on the LocalStack TC docs page
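Sth in this direction (a minimal sketch, assuming the Testcontainers LocalStack module and the AWS SDK v1 are on the test classpath; the image tag, bucket, and class name are just examples):
import com.amazonaws.auth.AWSStaticCredentialsProvider;
import com.amazonaws.auth.BasicAWSCredentials;
import com.amazonaws.client.builder.AwsClientBuilder;
import com.amazonaws.services.s3.AmazonS3;
import com.amazonaws.services.s3.AmazonS3ClientBuilder;
import org.junit.jupiter.api.Test;
import org.testcontainers.containers.localstack.LocalStackContainer;
import org.testcontainers.junit.jupiter.Container;
import org.testcontainers.junit.jupiter.Testcontainers;
import org.testcontainers.utility.DockerImageName;
import static org.junit.jupiter.api.Assertions.assertEquals;

@Testcontainers
class LocalStackSmokeIT {

    // Testcontainers starts this before the tests and stops it afterwards
    @Container
    static LocalStackContainer localstack =
            new LocalStackContainer(DockerImageName.parse("localstack/localstack:2.3"))
                    .withServices(LocalStackContainer.Service.S3);

    @Test
    void roundTrip() {
        // build an S3 client pointed at the container instead of AWS
        AmazonS3 s3 = AmazonS3ClientBuilder.standard()
                .withEndpointConfiguration(new AwsClientBuilder.EndpointConfiguration(
                        localstack.getEndpointOverride(LocalStackContainer.Service.S3).toString(),
                        localstack.getRegion()))
                .withCredentials(new AWSStaticCredentialsProvider(
                        new BasicAWSCredentials(localstack.getAccessKey(), localstack.getSecretKey())))
                .build();
        s3.createBucket("mybucket");
        s3.putObject("mybucket", "hello.txt", "hello");
        assertEquals("hello", s3.getObjectAsString("mybucket", "hello.txt"));
    }
}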
If you want, we can hack on this later
(Meeting of DV Sustainability Group in 20)
Ok, so something like this:
@Container
static KeycloakContainer keycloakContainer = new KeycloakContainer("quay.io/keycloak/keycloak:22.0");
Ok, I picked it up.
@Philip Durbin you might be interested in https://github.com/IQSS/dataverse/pull/9939/files#diff-7f214d6bbc0cb42b01ab03fb4b3d26fc18ca74c93e12ce998fbb7371f90f80ba and/or https://github.com/IQSS/dataverse/pull/9939/files#diff-cec3a75a2f2de0e5eed7be824f8305ec3101ec2a86d4ebc4e4939ce51e315997 for some more examples :smile:
Would it be in scope to refactor the dataverse.files.<id> thing for this? It would make writing these tests much easier...
Thanks for the examples!
I don't know what to say about the refactoring. I'm buried in HR paperwork again and we have a visitor all week (which I'm looking forward to). I haven't been able to work on S3 testing at all. :disappointed:
It's your first day of the week, so no worries. And you did clear out loads of stuff over the weekend! So much dust came off!
In case it helps: there's also https://github.com/gaul/s3proxy which might be interesting for testing (transient memory storage), but might also let us support more storage backends by relying on a different component as a bridge: Dataverse -> S3 -> sth else
If you are looking for sth. more lightweight that doesn't involve Testcontainers, maybe https://github.com/findify/s3mock would be worth a look
#6783 starts by talking about S3AccessIT and how we should add it to our Jenkins runs. But if we switch to S3 storage we won't be testing the filesystem anymore.
Can we test both?
Maybe. Should be doable.
Might need API extension though
But why only an e2e test?
No UT/IT?
Ha. Well, I thought I'd start with what we requested in the issue. I don't want to lose sight of that.
But yes, integration tests too, if I can figure it out.
But couldn't we just move that one test to a non e2e test?
(Haven't looked yet)
Wouldn't that save the trouble of jumping through the E2E hoops with config and all?
Here's a handy link: https://github.com/IQSS/dataverse/blob/v6.0/src/test/java/edu/harvard/iq/dataverse/api/S3AccessIT.java
I'm not sure how to move it. Happy to talk about it.
To me this test looks like it's simply testing an upload and a deletion, making sure along the way the prefix is correct
IMHO there is no need to have a full fledged deployment around for that...
You're saying that using Testcontainers we could create a dataset? Let me go look at the test you added.
OIDCAuthenticationProviderFactoryIT, that is
I could do a quiet, quick Zoom if it helps
Sure! https://harvard.zoom.us/j/98729082963?pwd=OUVmWnFwSTlibS9wOWI0aTRZOW45dz09
That was fun hacking! Thanks for doing all the typing @Philip Durbin :smile:
Ha! I'm the hands. You're the brain. It was fun!
I'm sending "hello" into S3AccessIO and pulling it out again, making sure it's still "hello". Code coverage of that class has jumped from 0% to 7.88%!
Oh, wait. I'm not sure code coverage is being reported properly in Netbeans. :thinking:
For OIDCAuthProvider it's only showing 1.01%. I don't think that's right.
Ok, 67% when I look at target/site/jacoco-integration-test-coverage-report/edu.harvard.iq.dataverse.authorization.providers.oauth2.oidc/OIDCAuthProvider.html. Phew. :sweat_smile:
I wonder if there's a way to skip MOST tests except the one I'm working on. Something like this: mvn verify -DskipTests=true -Dtest=S3AccessIOIT.
Ok, actually, this seems to work: mvn verify -Dtest=S3AccessIOIT
Hmm, I'm still seeing Keycloak output though, from the other test, I guess.
https://maven.apache.org/surefire/maven-failsafe-plugin/integration-test-mojo.html#test
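(Per that page, the failsafe equivalent is the it.test property, so something like mvn verify -Dit.test=S3AccessIOIT should select just the one integration test.)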
Thanks, I'll circle back to excluding Keycloak when testing S3 but right now I'm trying to configure multiple stores. I must be doing it wrong.
This what I have:
- dataverse_files_storage__driver__id=file-1
- dataverse_files_file-1_type=file
- dataverse_files_file-1_label=Filesystem
- dataverse_files_file-1_directory=${STORAGE_DIR}/store
- dataverse_files_s3-3reals3_type=s3
- dataverse_files_s3-3reals3_label=Real S3
- dataverse_files_s3-3reals3_custom__endpoint__url=s3.us-east-2.amazonaws.com
- dataverse_files_s3-3reals3_custom__endpoint__region=us-east-2
- dataverse_files_s3-3reals3_bucket__name=pdurbin
- dataverse_files_s3-3reals3_upload__redirect=true
- dataverse_files_s3-3reals3_access__key=REDACTED
- dataverse_files_s3-3reals3_secret__key=REDACTED
I'm trying to have a default "file" store called "file-1".
And an "s3" store called "s3-3reals3".
(I plan to add two more s3 stores: "s3-1localstackdirect" and "s3-2localstackvanilla" or something. Similar to what I wrote about at https://github.com/GlobalDataverseCommunityConsortium/dataverse-ansible/issues/327 )
When I list the stores I only get the file-1 one:
{
"status": "OK",
"data": {
"Filesystem": "file-1"
}
}
@Oliver Bertuch do you see anything obvious I'm missing?
I've only ever configured a single store in containerized Dataverse. I'm not sure if it's supported.
I am testing with two stores: one s3 and one file store. I use JVM_ARGS in my docker-compose like this:
environment:
JVM_ARGS:
-Ddataverse.timerServer=true
-Xmx24g
-Xms4g
-XX:+HeapDumpOnOutOfMemoryError
-XX:MaxMetaspaceSize=2g
-XX:MetaspaceSize=256m
-XX:+UseG1GC
-XX:+UseStringDeduplication
-XX:+DisableExplicitGC
-Ddataverse.files.s3.upload-out-of-band=true
-Ddataverse.api.allow-incomplete-metadata=true
-Ddataverse.ui.allow-review-for-incomplete=false
-Ddataverse.ui.show-validity-filter=true
-Ddataverse.files.directory=/data
-Ddataverse.files.uploads=/uploads
-Ddataverse.files.s3.type=s3
-Ddataverse.files.s3.label=s3
-Ddataverse.files.s3.bucket-name=dataverse-local
-Ddataverse.files.s3.download-redirect=true
-Ddataverse.files.s3.upload-redirect=true
-Ddataverse.files.s3.path-style-access=true
-Ddataverse.files.s3.custom-endpoint-url=https://rdmo.icts.kuleuven.be
-Ddataverse.files.s3.custom-endpoint-region=us-east-1
-Ddataverse.files.s3.disable-tagging=true
-Ddataverse.files.file.type=file
-Ddataverse.files.file.label=file
-Ddataverse.files.file.directory=/data
Notice the space at the end of each line (otherwise it will not work).
This is the output from listing the storage:
{
"status": "OK",
"data": {
"s3": "s3",
"file": "file"
}
}
It looks like Zulip strips the spaces, but there is a space at the end of each line, before the end-of-line character; it functions as a separator for the JVM args.
I also mount the aws secrets and config like this:
volumes:
- /path/to/aws/config/.aws:/opt/payara/.aws
-...
If you are interested, this is my entire image description:
dataverse:
image: ${DATAVERSE_IMAGE_TAG}
hostname: dataverse
user: payara
networks:
- app_net
- db_net
ports:
- "7005:4848"
- "7008:8080"
environment:
TZ: "Europe/Brussels"
DATAVERSE_DB_HOST: db
DATAVERSE_DB_PORT: 5432
DATAVERSE_DB_USER: ${DB_USER}
DATAVERSE_DB_NAME: ${DB_NAME}
ENABLE_JDWP: 0
DATAVERSE_MAIL_HOST: ${MAIL_SERVICE_HOST}
DATAVERSE_MAIL_FROM: ${MAIL_SENDER}
dataverse_fqdn: ${FQDN}
dataverse_siteUrl: ${DATAVERSE_URL}
dataverse_files_storage__driver__id: ${DRIVER_ID}
db_SystemEmail: ${MAIL_SENDER_NAME}
CUSTOM_INSTALL: "/opt/payara/custominstall"
API_DEBUG: "false"
LANG_DIR: "/languages"
STORAGE_DIR: "/data"
SECRETS_DIR: "/run/secrets"
DUMPS_DIR: "/dumps"
UPLOAD_DIR: "/uploads"
MP_CONFIG_PROFILE: ${RDM_STAGE}
JVM_ARGS:
-Ddataverse.timerServer=true
-Xmx24g
-Xms4g
-XX:+HeapDumpOnOutOfMemoryError
-XX:MaxMetaspaceSize=2g
-XX:MetaspaceSize=256m
-XX:+UseG1GC
-XX:+UseStringDeduplication
-XX:+DisableExplicitGC
-Ddataverse.files.s3.upload-out-of-band=true
-Ddataverse.api.allow-incomplete-metadata=true
-Ddataverse.ui.allow-review-for-incomplete=false
-Ddataverse.ui.show-validity-filter=true
-Ddataverse.files.directory=/data
-Ddataverse.files.uploads=/uploads
-Ddataverse.files.s3.type=s3
-Ddataverse.files.s3.label=s3
-Ddataverse.files.s3.bucket-name=dataverse-local
-Ddataverse.files.s3.download-redirect=true
-Ddataverse.files.s3.upload-redirect=true
-Ddataverse.files.s3.path-style-access=true
-Ddataverse.files.s3.custom-endpoint-url=https://rdmo.icts.kuleuven.be
-Ddataverse.files.s3.custom-endpoint-region=us-east-1
-Ddataverse.files.s3.disable-tagging=true
-Ddataverse.files.file.type=file
-Ddataverse.files.file.label=file
-Ddataverse.files.file.directory=/data
-Ddataverse.solr.host=solr
-Ddataverse.mail.cc-support-on-contact-email=rdr@kuleuven.be
volumes:
- /etc/localtime:/etc/localtime:ro
- ./config/dataverse:/opt/payara/custominstall
- ./config/dataverse/set_language.sh:/opt/payara/scripts/init.d/set_language.sh
- ./config/dataverse/change-admin-password.sh:/opt/payara/scripts/init.d/change-admin-password.sh
- ${DV_FILES}:/data
- ${DV_DUMPS}:/dumps
- ${DV_UPLOADS}:/uploads
- ${DV_LANG_DIR}:/languages
- ${DOCROOT}:/opt/payara/appserver/glassfish/domains/domain1/docroot
- ${SECRETS}:/run/secrets
- ${SECRETS}/microprofile:/secrets
- ${DV_PATH}/aws:/opt/payara/.aws
depends_on:
- proxy
- db
- index
restart: unless-stopped
privileged: false
I use a standard image built with maven. I think that the interesting script is the one that changes the admin password (you see it mounted in volumes):
#!/usr/bin/env bash
# Check and load secrets
if [ ! -s "${SECRETS_DIR}/admin/password" ]; then
echo "No admin password present. Failing."
exit 126
fi
ADMIN_PASSWORD=$(cat "${SECRETS_DIR}/admin/password")
echo "AS_ADMIN_PASSWORD=admin" > /tmp/password-change-file.txt
echo "AS_ADMIN_NEWPASSWORD=${ADMIN_PASSWORD}" >> /tmp/password-change-file.txt
echo "AS_ADMIN_PASSWORD=${ADMIN_PASSWORD}" >> ${PASSWORD_FILE}
asadmin --user=${ADMIN_USER} --passwordfile=/tmp/password-change-file.txt change-admin-password --domain_name=${DOMAIN_NAME} || true
rm /tmp/password-change-file.txt
@Eryk Kulikowski that actually works?!? JVM_ARGS?!? @Oliver Bertuch have you seen this?
@Eryk Kulikowski what is the value of ${DATAVERSE_IMAGE_TAG} please?
It works; you can pass any JVM/MicroProfile settings like this. Just remember to add a space (" ") after each argument, otherwise they will stick together. You can check whether it worked in your server log:
[Entrypoint] running /opt/payara/scripts/startInForeground.sh in foreground
Executing Payara Server with the following command line:
/opt/java/openjdk/bin/java
-cp
/opt/payara/appserver/glassfish/domains/domain1/lib/ext/*:/opt/payara/appserver/glassfish/modules/glassfish.jar
-Ddataverse.timerServer=true
-Xmx24g
-Xms4g
-XX:+HeapDumpOnOutOfMemoryError
-XX:MaxMetaspaceSize=2g
-XX:MetaspaceSize=256m
-XX:+UseG1GC
-XX:+UseStringDeduplication
-XX:+DisableExplicitGC
-Ddataverse.files.s3.upload-out-of-band=true
-Ddataverse.api.allow-incomplete-metadata=true
-Ddataverse.ui.allow-review-for-incomplete=false
-Ddataverse.ui.show-validity-filter=true
-Ddataverse.files.directory=/data
-Ddataverse.files.uploads=/uploads
-Ddataverse.files.s3.type=s3
-Ddataverse.files.s3.label=s3
-Ddataverse.files.s3.bucket-name=dataverse-local
-Ddataverse.files.s3.download-redirect=true
-Ddataverse.files.s3.upload-redirect=true
-Ddataverse.files.s3.path-style-access=true
-Ddataverse.files.s3.custom-endpoint-url=https://rdmo.icts.kuleuven.be
-Ddataverse.files.s3.custom-endpoint-region=us-east-1
-Ddataverse.files.s3.disable-tagging=true
-Ddataverse.files.file.type=file
-Ddataverse.files.file.label=file
-Ddataverse.files.file.directory=/data
-Ddataverse.solr.host=solr
-Ddataverse.mail.cc-support-on-contact-email=rdr@kuleuven.be
-Ddataverse.lang.directory=/languages
-XX:+UnlockDiagnosticVMOptions
...
-Dfelix.fileinstall.bundles.startTransient=true
-Dcom.sun.enterprise.config.config_environment_factory_class=com.sun.enterprise.config.serverbeans.AppserverConfigEnvironmentFactory
...
-Djava.library.path=/opt/payara/appserver/glassfish/lib:/usr/java/packages/lib:/usr/lib64:/lib64:/lib:/usr/lib
com.sun.enterprise.glassfish.bootstrap.ASMain
/opt/payara/config/pre-boot-commands.asadmin
-upgrade
false
-read-stdin
true
-postbootcommandfile
/opt/payara/config/post-boot-commands.asadmin
-domainname
domain1
-domaindir
/opt/payara/appserver/glassfish/domains/domain1
-asadmin-args
--host,,,localhost,,,--port,,,4848,,,--user,,,admin,,,--passwordfile,,,/opt/payara/passwordFile,,,--secure=false,,,--terse=false,,,--extraterse=false,,,--echo=false,,,--interactive=false,,,--autoname=false,,,start-domain,,,--verbose=false,,,--watchdog=false,,,--debug=false,,,--domaindir,,,/opt/payara/appserver/glassfish/domains,,,domain1
-instancename
server
-type
DAS
-verbose
false
-asadmin-classpath
/opt/payara/appserver/glassfish/lib/client/appserver-cli.jar
-debug
false
-asadmin-classname
com.sun.enterprise.admin.cli.AdminMain
-watchdog
false
Launching Payara Server on Felix platform
Oct 18, 2023 2:02:14 PM com.sun.enterprise.glassfish.bootstrap.osgi.BundleProvisioner createBundleProvisioner
INFO: Create bundle provisioner class = class com.sun.enterprise.glassfish.bootstrap.osgi.BundleProvisioner.
Registered com.sun.enterprise.glassfish.bootstrap.osgi.EmbeddedOSGiGlassFishRuntime@7c581736 in service registry.
Reading in commandments from /opt/payara/config/pre-boot-commands.asadmin
@Philip Durbin
We use our own LIBIS registry for docker images. We build them with rdm-build https://github.com/libis/rdm-build
Check this script https://github.com/libis/rdm-build/blob/main/images/dataverse/build_dv.sh
It uses env variables from here: https://github.com/libis/rdm-build/blob/main/env.dev
Resulting image tag is: docker.io/rdm/dataverse-stock:1.5
I add ruby to that in https://github.com/libis/rdm-build/blob/main/images/dataverse/Dockerfile
The final image tag is docker.io/rdm/dataverse:1.5
Where docker.io becomes our own LIBIS registry when building for production (STAGE=prod).
This particular image is still in test, not yet in pilot or production.
My specific branch build script: https://github.com/libis/rdm-build/blob/main/images/dataverse/build_dev_dv.sh
The other one uses the tagged version it checks out (notice that you download specific maven and java in the script, but it uses your default maven registry): https://github.com/libis/rdm-build/blob/main/images/dataverse/build_dv.sh
@Eryk Kulikowski what are you doing in an hour? Would you like to join the Containerization Working Group meeting? :grinning: https://ct.gdcc.io
As a compose file is YAML, you might try using JVM_ARGS: >. See also https://yaml.org/spec/1.2.2/#line-folding
(So you should not need to pay attention to the spaces at the end)
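Untested sketch, but something like this should do it, since YAML folds the newlines of a ">" block into single spaces:
environment:
  JVM_ARGS: >
    -Ddataverse.files.s3.type=s3
    -Ddataverse.files.s3.label=s3
    -Ddataverse.files.file.type=file
    -Ddataverse.files.file.label=file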
@Eryk Kulikowski I'm glad to see you are basing your work on the WG outputs! Glad it's being reused. Looks like we made it customizable enough.
FWIW @Eryk Kulikowski, using Xmx, Xms etc. in a container context is not considered good practice. Did you see the env vars for memory management in the tunables table? http://preview.guides.gdcc.io/en/develop/container/base-image.html#tunables You can easily assign the 24g of RAM by defining the container limits in the compose definition; by default 70% of that will be allocated as heap.
@Oliver Bertuch Thanks! I will check it out!
@Philip Durbin where I'm at: I have the basics in place, but next I need to read in both buckets' settings and tell Ansible to provision them within the container: https://github.com/GlobalDataverseCommunityConsortium/dataverse-ansible/blob/327_multiple_s3_stores/tests/group_vars/jenkins.yml#L336
One of my questions was whether you want the S3 testing to provision said LocalStack buckets, but that would break testing on S3 proper. Any further direction/thoughts from you and/or @Oliver Bertuch at this point are most welcome.
@Don Sizemore hmm, would it be a permanent break on testing S3 proper? Or more that if we someday want to test both S3 and LocalStack that we'd need to do further refactoring on the Ansible side?
@Deirdre Kirmis if your ears were burning a couple hours ago, we were trying to remember what flavor of S3 you use. Something on Dell, I think.
we use aws s3 .. we tried for about a year to use Dell Isilon S3 for one of our stores but never got it working =)
Oh. Bummer!
Deirdre Kirmis said:
we use aws s3 .. we tried for about a year to use Dell Isilon S3 for one of our stores but never got it working =)
Have you ever tried to front the Isilon with Minio? I will probably do sth like that with our internal S3 to enable direct up/download. https://min.io/docs/minio/linux/administration/object-management/transition-objects-to-s3.html
i think there is actually a fix to dataverse now (it was a bug in apache on the isilon server) .. but we decided to go all in on aws
i did try fronting it with minio .. i need to look back at my notes to see how that went .. i think i ran into some problems there too
but i also don't exactly always know what i'm doing :grinning_face_with_smiling_eyes:
Welcome to the club, here's your member id :identification_card: :see_no_evil:
:sweat_smile: always fun to test things out!
@Philip Durbin Ansible can create and configure the buckets in LocalStack, but if for some/any reason you wanted Dataverse to do it itself, it could? I don't think you want this to be the case, but just checking.
@Don Sizemore I think I'm a little slow today but I think we're on the same page. All good. :grinning:
After today's meeting I was all excited to try the magic trick with store names that don't have dashes in them.
I tried this:
- dataverse_files_storage__driver__id=file1
- dataverse_files_file1_type=file
- dataverse_files_file1_label=Filesystem
- dataverse_files_file1_directory=${STORAGE_DIR}/store
- dataverse_files_s3c_type=s3
- dataverse_files_s3c_label=Real S3
- dataverse_files_s3c_custom__endpoint__url=s3.us-east-2.amazonaws.com
- dataverse_files_s3c_custom__endpoint__region=us-east-2
- dataverse_files_s3c_bucket__name=pdurbin
- dataverse_files_s3c_upload__redirect=true
- dataverse_files_s3c_access__key=REDACTED
- dataverse_files_s3c_secret__key=REDACTED
However, when I list the stores, I still only see the one:
{
"status": "OK",
"data": {
"Filesystem": "file1"
}
}
@Oliver Bertuch ^^
I guess I'll try JVM_ARGS next.
[ERROR] Failed to execute goal io.fabric8:docker-maven-plugin:0.43.4:run (default-cli) on project dataverse: Execution default-cli of goal io.fabric8:docker-maven-plugin:0.43.4:run failed: while scanning a simple key
[ERROR] in 'reader', line 23, column 9:
[ERROR] -Ddataverse.files.storage-driver ...
[ERROR] ^
[ERROR] could not find expected ':'
[ERROR] in 'reader', line 24, column 9:
[ERROR] -Ddataverse.files.file1.type=file
[ERROR] ^
... when using this file:
Screenshot-2023-10-19-at-4.25.49-PM.png
Ah ha! I think (I hope) I got it.
Nope. :very_angry:
Weird! I sort of got it working?!? But I have three stores when I expected two:
{
"status": "OK",
"data": {
"RealS3": "s3c",
"Local": "local",
"Filesystem": "file1"
}
}
Here's my file:
file1 and s3c I expect. I'm not sure where local is coming from. :thinking:
Just capturing some commands and output that @Don Sizemore helped me figure out to prove to myself that we can upload a test file from the Dataverse/Payara container to the LocalStack container:
payara@dataverse:~$ echo -e '[default]\nregion = us-east-2' > ~/.aws/config
payara@dataverse:~$ cat ~/.aws/config
[default]
region = us-east-2
payara@dataverse:~$ echo -e '[default]\naws_access_key_id = default\naws_secret_access_key = default' > ~/.aws/credentials
payara@dataverse:~$ cat ~/.aws/credentials
[default]
aws_access_key_id = default
aws_secret_access_key = default
payara@dataverse:~$
payara@dataverse:~$ aws --endpoint-url=http://localstack:4566 s3 ls
2023-10-23 18:02:38 mybucket
payara@dataverse:~$
payara@dataverse:~$ echo foo > test.txt
payara@dataverse:~$ cat test.txt
foo
payara@dataverse:~$ aws --endpoint-url=http://localstack:4566 s3 cp test.txt s3://mybucket/
upload: ./test.txt to s3://mybucket/test.txt
payara@dataverse:~$ aws --endpoint-url=http://localstack:4566 s3 ls s3://mybucket/
2023-10-23 18:13:23 4 test.txt
payara@dataverse:~$
But we still can't upload files from Dataverse itself to LocalStack. Here's the config I'm using:
-Ddataverse.files.localstack1.type=s3
-Ddataverse.files.localstack1.label=LocalStack
-Ddataverse.files.localstack1.custom-endpoint-url=http://localstack:4566
-Ddataverse.files.localstack1.custom-endpoint-region=us-east-2
-Ddataverse.files.localstack1.bucket-name=mybucket
-Ddataverse.files.localstack1.upload-redirect=false
-Ddataverse.files.localstack1.access-key=default
-Ddataverse.files.localstack1.secret-key=default
@Oliver Bertuch any ideas for me? I think I might try MinIO instead.
Ah ha! For Minio, at least, I needed this:
-Ddataverse.files.minio1.path-style-access=true
As mentioned in our docs: https://guides.dataverse.org/en/6.0/installation/config.html#reported-working-s3-compatible-storage
@Don Sizemore please remind me, do you have a preference for LocalStack or MinIO? I'm certainly happy to see if the trick above works with LocalStack.
Philip Durbin said:
Don Sizemore please remind me, do you have a preference for LocalStack or MinIO? I'm certainly happy to see if the trick above works with LocalStack.
I really don't. LocalStack sounds better for testing overall; MinIO sounds better for sites like ASU who use Dell MinIO. Also because I have to keep correcting myself when I type Minion.
Well, the path-style-access=true trick does seem to work for LocalStack.
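(For the record, that's the localstack1 config I posted above plus this one line: -Ddataverse.files.localstack1.path-style-access=true)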
I'm testing both.
LocalStack is finickier about bucket creation. I've only had luck creating the bucket with awslocal s3 mb s3://mybucket (rather than Java).
I made a draft pull request! add S3 tests, LocalStack, MinIO #6783 #10044
Feedback very welcome, of course!
@Philip Durbin sorry, a bit late, as you have already figured out the S3 properties.
I have used the following way to specify the properties (just as environment variables on the container):
dataverse:
image: gdcc/dataverse:unstable
container_name: dataverse
hostname: dataverse
user: payara
labels:
com.platys.name: dataverse
com.platys.webui.title: Dataverse UI
com.platys.webui.url: http://dataplatform:28294
com.platys.restapi.title: Dataverse UI
com.platys.restapi.url: http://dataplatform:28294/api/info/metrics/dataverses
ports:
- 28294:8080
- 4848:4848
- 9009:9009 # JDWP
- 8686:8686 # JMX
environment:
- DATAVERSE_FQDN=${PUBLIC_IP}
- _CT_DATAVERSE_SITEURL=http://${PUBLIC_IP}:28294
- DATAVERSE_DB_HOST=dataverse-postgresql
- DATAVERSE_DB_USER=dataverse
- DATAVERSE_DB_PASSWORD=abc123!
- DATAVERSE_MAIL_HOST=mailpit
- DATAVERSE_MAIL_FROM=dataverse@localhost
- _CT_DATAVERSE_SOLR_HOST=dataverse-solr
- DATAVERSE_SOLR_PORT=8983
- ENABLE_JMX=0
- ENABLE_RELOAD=0
- ENABLE_JDWP=1
- dataverse_files_s3_type=s3
- dataverse_files_s3_label="Object Storage"
- dataverse_files_s3_bucket__name="dataverse-bucket"
- dataverse_files_s3_custom__endpoint__url=http://${EADP_IP}:9000
- dataverse_files_s3_custom__endpoint__region='us-east-1'
- dataverse_files_s3_path__style__access=True
- dataverse_files_s3_access__key=${PLATYS_AWS_ACCESS_KEY:?PLATYS_AWS_ACCESS_KEY must be set either in .env or as an environment variable}
- dataverse_files_s3_secret__key=${PLATYS_AWS_SECRET_ACCESS_KEY:?PLATYS_AWS_SECRET_ACCESS_KEY must be set either in .env or as an environment variable}
@Guido Schmutz right, I'm able to configure a single file store like that, with environment variables, but when I tried to configure multiple stores, it didn't work. If you get a chance to try this, please let me know! Otherwise, I'll wait for a fix for #9998.
ok, I see, you configure multiple stores, challenge accepted ;-) ... took me a while to figure it out, but now it works
dataverse:
image: gdcc/dataverse:unstable
container_name: dataverse
hostname: dataverse
user: payara
labels:
com.platys.name: dataverse
com.platys.webui.title: Dataverse UI
com.platys.webui.url: http://dataplatform:28294
com.platys.restapi.title: Dataverse UI
com.platys.restapi.url: http://dataplatform:28294/api/info/metrics/dataverses
ports:
- 28294:8080
- 4848:4848
- 9009:9009 # JDWP
- 8686:8686 # JMX
environment:
- DATAVERSE_FQDN=${PUBLIC_IP}
- _CT_DATAVERSE_SITEURL=http://${PUBLIC_IP}:28294
- DATAVERSE_DB_HOST=dataverse-postgresql
- DATAVERSE_DB_USER=dataverse
- DATAVERSE_DB_PASSWORD=abc123!
- DATAVERSE_MAIL_HOST=mailpit
- DATAVERSE_MAIL_FROM=dataverse@localhost
- _CT_DATAVERSE_SOLR_HOST=dataverse-solr
- DATAVERSE_SOLR_PORT=8983
- ENABLE_JMX=0
- ENABLE_RELOAD=0
- ENABLE_JDWP=1
- dataverse_files_storage__driver__id=s3
- dataverse_files_file1_type=file
- dataverse_files_file1_label="Local Filesystem"
- dataverse_files_file1_directory=${STORAGE_DIR}/store
- dataverse_files_s3_type=s3
- dataverse_files_s3_label="Object Storage"
- dataverse_files_s3_bucket__name="dataverse-bucket"
- dataverse_files_s3_custom__endpoint__url=http://${EADP_IP}:9000
- dataverse_files_s3_custom__endpoint__region='us-east-1'
- dataverse_files_s3_path__style__access=True
- dataverse_files_s3_access__key=${PLATYS_AWS_ACCESS_KEY:?PLATYS_AWS_ACCESS_KEY must be set either in .env or as an environment variable}
- dataverse_files_s3_secret__key=${PLATYS_AWS_SECRET_ACCESS_KEY:?PLATYS_AWS_SECRET_ACCESS_KEY must be set either in .env or as an environment variable}
volumes:
- ./data-transfer:/data-transfer
restart: unless-stopped
$ curl -s -X GET -H 'X-Dataverse-key: REDACTED' 'http://192.168.116.137:28294/api/admin/dataverse/storageDrivers' | jq
{
"status": "OK",
"data": {
"Object Storage": "s3",
"Local Filesystem": "file1"
}
}
Guess it was the space in the label (- dataverse_files_s3c_label=Real S3) in your case, which caused an error.
Ah, right. I caught that later, the space problem, in the context of the JVM_ARGS. I'll try it again. Thanks!
I set up S3 testing at https://github.com/gdcc/api-test-runner/actions/workflows/s3.yml
After some fiddling, it seems to work fine.
So my thought is that once we merge #10044, we'll copy the config I added over to the "develop" and "manual" jobs. (And eventually the "alpha" job, once we make a release.)
@Don Sizemore heads up that I just added test for direct download (and non-direct download): https://github.com/IQSS/dataverse/pull/10044/commits/c2d8ae54a98b850f6ac9f94fa799b36bf8acaa6c
Here's the change I made on the API test runner side: https://github.com/gdcc/api-test-runner/commit/02c815908480e571d28d8d47ccbc4456979b377e
I just kicked off https://github.com/gdcc/api-test-runner/actions/runs/6800385363 to test it.
Philip Durbin said:
Here's the change I made on the API test runner side: https://github.com/gdcc/api-test-runner/commit/02c815908480e571d28d8d47ccbc4456979b377e
I can tell Dataverse-Ansible not to template ~dataverse/.aws/credentials (and config), though Dataverse gets severely ticked off when I rename it. Is Dataverse preferring global AWS creds to datastore-defined credentials expected, and do we want to open an issue to change this order?
Good questions. I'm not sure. :sweat_smile:
maybe a question for tech hours this afternoon?
yes, excellent idea
I mean, I _can_ nuke that file just for testing runs, but that won't address any installation with multiple datastores who expect / need a global credential file as well.
right, it sure smells like a bug
ah. I remember now. Jim's PR corrects credentials preference when RBAC is involved. Won't affect conf file vs. jvm-option precedence.
Meanwhile, following the tweaks I just made to the tests (changing the access and secret keys to match Jenkins), the first manual run on the API test runner failed but I'm hoping it's transient. I kicked off a second run.
Bah, the second run failed with the same error:
Error: S3AccessIT.setUp:66 » SdkClient Unable to execute HTTP request: Connect to s3.localhost.localstack.cloud:4566 [s3.localhost.localstack.cloud/127.0.0.1] failed: Connection refused
I left my instance up from this morning if you want to test anything further. If not, I'll kill it to save y'all $$$
sounds like we want Jim's PR merged and then everything should start to work (at least WRT S3 creds). Ima kill this morning's EC2 instance.
That's fine. When do we merge the Ansible branch?
Philip Durbin said:
That's fine. When do we merge the Ansible branch?
I'd say we merge Jim's PR, then merge that into your S3 testing branch, let me test manually once more, and we'll see if things run cleanly? (they should)
Sounds good. Meanwhile, I'll try to figure out why I'm getting that "s3.localhost.localstack.cloud:4566... Connection refused" error, which JP points out is the same one he saw when not on VPN. Makes no sense to me. :sweat_smile:
The S3 priorities PR has been merged! :tada: #10004
Meanwhile, I'm still looking at the connection refused error. It's when I list the buckets, which is optional, so I commented that out to see if/when LocalStack fails later. Here's the run: https://github.com/gdcc/api-test-runner/actions/runs/7116920717
Ok, I think I figured out the problem. I hope so. Re-running tests now.
Phew, yes, passing now: https://github.com/gdcc/api-test-runner/actions/runs/7118949044
next up: Jenkins!
I had confused myself with the api-test-runner. For now, until we merge, I set up a dedicated S3 job. Once we merge I'll put that extra config in the "develop" job. And once we release, into the "alpha" job.
The config basically spins up LocalStack and MinIO and tells Dataverse to use them.
docker compose stuff
like I'm using locally, a slight variation on that
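Roughly this shape, from memory (illustrative only, not the exact Jenkins config; image tags and ports are assumptions):
  localstack:
    image: localstack/localstack:2.3
    ports:
      - "4566:4566"
    environment:
      - SERVICES=s3
  minio:
    image: minio/minio
    command: server /data
    ports:
      - "9000:9000"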
@Don Sizemore I'm looking now at https://github.com/GlobalDataverseCommunityConsortium/dataverse-ansible/pull/337 . Thanks!! :heart:
This is very exciting. So Docker-y!
Speaking of Docker, I broke their website this morning and had to share their error page
We need that icon :laughter_tears:
@Don Sizemore approved! Thank you!!
All tests are passing! Including the new S3 tests! https://jenkins.dataverse.org/job/IQSS-Dataverse-Develop-PR/job/PR-10044/17/testReport/edu.harvard.iq.dataverse.api/S3AccessIT/
And now I can see code coverage! 35% for S3AccessIO! https://jenkins.dataverse.org/job/IQSS-Dataverse-Develop-PR/job/PR-10044/17/execution/node/3/ws/target/coverage-it/edu.harvard.iq.dataverse.dataaccess/S3AccessIO.html
Could be better but I'm happy it's higher than what we had with only unit tests, 7%.