Stream: community

Topic: Globus and S3 Glacier


view this post on Zulip Philip Durbin 🚀 (Nov 05 2025 at 14:56):

Just announced: https://www.globus.org/blog/globus-announces-support-for-amazon-s3-glacier-storage-classes

view this post on Zulip Philip Durbin 🚀 (Nov 05 2025 at 14:57):

@Deirdre Kirmis have you seen this? How does it grab you?

view this post on Zulip Deirdre Kirmis (Nov 05 2025 at 16:31):

Oh man, super cool. I’m curious how this could fit into the current Dataverse–Globus setup and what tweaks might be needed to make it work smoothly. I think the main thing to consider would be the retrieval costs and wait times when files are restored from Glacier: there would be a tradeoff between saving money on storage and paying more (and waiting longer) when you need the data back. It’d take some planning to figure out which datasets can live in colder storage, but if it’s set up right, it could lead to some pretty big cost savings over time.

We currently have our installation configured to upload to Standard storage and then move to Intelligent Tiering. We were transitioning to Glacier but found that we were incurring a lot of retrieval costs that we weren't expecting. Just checking whether a file has changed incurs a retrieval cost, and we didn't realize that. We really want to figure out how to use Glacier for our file storage somewhere, but we need to fully understand how the fees work and when to use it.
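
(Editorial sketch: the storage-versus-retrieval tradeoff described above can be put into rough numbers. The function names are hypothetical and the per-GB prices are made-up placeholders, not actual AWS rates. The model also ignores per-request fees, minimum storage durations, and egress charges, so treat it only as a starting point for planning.)

```python
def monthly_cost(stored_gb, retrieved_gb, storage_price, retrieval_price):
    """Monthly cost of a tier: storage charge plus retrieval charge (per-GB prices)."""
    return stored_gb * storage_price + retrieved_gb * retrieval_price


def break_even_retrieved_gb(stored_gb, warm_storage_price,
                            cold_storage_price, cold_retrieval_price):
    """GB retrieved per month at which the cold tier costs the same as the warm tier.

    Assumes retrieval from the warm tier is free, which is a simplification.
    """
    storage_saving = stored_gb * (warm_storage_price - cold_storage_price)
    return storage_saving / cold_retrieval_price


if __name__ == "__main__":
    # Placeholder prices in USD per GB per month; NOT real AWS pricing.
    WARM_STORAGE = 0.02
    COLD_STORAGE = 0.004
    COLD_RETRIEVAL = 0.01

    stored = 1000  # GB held in the dataset store
    limit = break_even_retrieved_gb(stored, WARM_STORAGE, COLD_STORAGE, COLD_RETRIEVAL)
    print(f"Cold storage is cheaper while monthly retrievals stay under {limit:.0f} GB")
```

Under a model like this, datasets whose expected monthly retrievals sit well below the break-even volume are the natural candidates for colder storage.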

I would be totally interested in having more discussions about this if anyone else is interested. We are in the process of working with AWS to analyze our workflows on this and determining how we can better utilize the storage classes and this fits right in with that thought process.

view this post on Zulip Leo Andreev (Nov 10 2025 at 23:36):

Hi Deirdre,
I was away Thu.-Fri., so didn't get to chime in sooner.

I’m curious how this could fit into the current Dataverse–Globus setup

My understanding is that this would give you an extra storage option very similar to what we here at HDV have been offering to users with Globus-accessible tape storage at NESE (https://nese.mghpcc.org/). Specifically, a storage tier that, on the one hand, comes with all sorts of limitations - Globus only, no "classic" API access (unlike your current Globus use case where the files end up in the same S3 buckets that Dataverse uses otherwise, and are accessible via /api/access/), the data may not be instantly available, and the workflows are less user-friendly overall. On the other hand, it is dirt cheap compared to AWS S3, and therefore makes handling truly "large" data - terabytes - possible.
From a quick look, the cost of Glacier is roughly the same as what we pay for the tape storage here.

view this post on Zulip Leo Andreev (Nov 10 2025 at 23:57):

(What you mentioned about incurring retrieval costs is significant though, and may indeed make the final costs less appealing than the deal we got with NESE, since that comes with no egress fees - at least as of now)

view this post on Zulip Leo Andreev (Nov 10 2025 at 23:59):

... and what tweaks might be needed to make it work smoothly

In theory at least, no special tweaks should be needed on the Dataverse side. As long as this storage is made Globus-accessible, as a Globus collection, you should be able to just create a globus-type storage volume in Dataverse, and it'll just work, like any other Globus-accessible storage. It will need to be configured as a "not-accessible-by-dataverse" volume, with something like

<jvm-options>-Ddataverse.files.globusGlacier.type=globus</jvm-options>
<jvm-options>-Ddataverse.files.globusGlacier.managed=true</jvm-options>
<jvm-options>-Ddataverse.files.globusGlacier.files-not-accessible-by-dataverse=true</jvm-options>

How to create and set up this Glacier volume on the S3 Connector/Globus side, I don't know (unlike you, we do not use Globus with S3 at all here). It is entirely possible that something unexpected will come up when attempting to set this up in real life, and that it may in fact require some tweaking in the Dataverse code. One way to find out ...

view this post on Zulip Leo Andreev (Nov 12 2025 at 17:48):

One thing I should've noticed sooner in the blog announcement is that their language is focused on "downloading" and "restoring archived S3 objects". In other words, they are most likely not offering support for transferring data directly to Glacier via Globus, bypassing placing it into S3 (as I initially assumed). That would still be very useful potentially, as it would keep objects Globus-accessible even after they are moved to tiered storage or Glacier.

view this post on Zulip Deirdre Kirmis (Nov 13 2025 at 18:56):

Hi Leo!
Thanks so much for all of the info! We are examining our storage/stores currently and trying to start working on a Large Data Support plan similar to Harvard's. But we don't have access to the NESE or tape storage (maybe AWS) so are looking at other alternatives and I was hoping this could help, as you mentioned, or be an option (after considering retrievals). We are using Globus and I have stores configured to use it, so I am making a plan to try to test this out, if possible. I think we would need to identify datasets that are not really utilized very often, or maybe this would just be for datasets that we are "archiving"?

However, I see that you pointed out that this new option likely won't be for transferring directly to Globus, but for downloading or restoring to standard. So then how would that work with configured globus stores? haha sorry for my confusion!

view this post on Zulip Leo Andreev (Nov 17 2025 at 22:17):

There's a good chance I'm even more confused.
But I believe one scenario would be a Dataverse Globus volume configured exactly the same as your current one(s). But on the S3 side the bucket would be configured to move everything to Glacier automatically. Once moved, these files will still be available via Globus, because it can now handle it transparently.
For the kind of data you mentioned, something that's not expected to be utilized/downloaded very often, this could be a low cost solution, even with the retrieval fees.
The S3 space in this setup is only used for temporary staging.
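
(Editorial sketch: the "bucket configured to move everything to Glacier automatically" part of the scenario above would typically be an S3 lifecycle rule. The snippet below only builds the configuration document in the shape S3 expects. The function name, rule ID, and one-day delay are illustrative assumptions, and nothing here has been tested against a Globus S3 connector or a Dataverse Globus store.)

```python
import json


def glacier_lifecycle_config(days_before_transition=1, prefix="",
                             storage_class="GLACIER"):
    """Build an S3 lifecycle configuration that transitions objects to a colder class.

    An empty prefix makes the rule apply to every object in the bucket.
    The rule ID and the default delay are placeholders chosen for illustration.
    """
    return {
        "Rules": [
            {
                "ID": "move-to-glacier",  # hypothetical rule name
                "Status": "Enabled",
                "Filter": {"Prefix": prefix},
                "Transitions": [
                    {"Days": days_before_transition, "StorageClass": storage_class}
                ],
            }
        ]
    }


if __name__ == "__main__":
    # Print the document so it can be reviewed before being applied to a bucket.
    print(json.dumps(glacier_lifecycle_config(), indent=2))
```

With boto3, a dictionary of this shape can be passed as the `LifecycleConfiguration` argument of the S3 client's `put_bucket_lifecycle_configuration` call. The delay before transition determines how long objects remain in the standard staging tier.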


Last updated: Jan 09 2026 at 14:18 UTC