Stream: community

Topic: XR Dataverse


view this post on Zulip Charity Everett (May 08 2025 at 16:45):

Hello!

We (@robert christie and I) were able to get an initial brainstorming session done on the Dataverse XR project, and we are so psyched to work on this. The more the merrier of course, but I wanted to share what we have so far. Please offer feedback, questions, comments, concerns, emojis (in a pinch lol). Hope to see you in the Dataverse!
4.png
1.png
2.png
3.png

view this post on Zulip robert christie (May 08 2025 at 22:00):

I think this will be a fun project!
My understanding of the plan so far is to create an immersive, space-themed 3D visualization of all of the datasets in the Dataverse, organized into ‘galaxies’ of similar datasets. Hopefully we can then use this visualization to build stories and experiences that let people embark on a voyage through the Dataverse!
We have just started trying to make a layout for the visualization by embedding text describing each dataset and using dimensionality reduction techniques to create 2D & 3D scatterplots.
scatterplot.png
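
To make the "embed, then reduce to 3D" step concrete, here is a minimal sketch. It assumes each dataset already has an embedding vector, and it swaps in plain PCA (via power iteration, standard library only) as a simple stand-in; it is not necessarily the reduction technique behind the scatterplot. `pca_project` and the toy vectors are illustrative names and data.

```python
import math


def pca_project(vectors, k=3, iters=200):
    """Project row vectors onto their top-k principal components."""
    n, d = len(vectors), len(vectors[0])
    mean = [sum(v[j] for v in vectors) / n for j in range(d)]
    centered = [[v[j] - mean[j] for j in range(d)] for v in vectors]
    # Covariance matrix of the centered data (d x d).
    cov = [[sum(r[a] * r[b] for r in centered) / n for b in range(d)]
           for a in range(d)]
    components = []
    for c in range(k):
        vec = [1.0 / (j + 1 + c) for j in range(d)]  # deterministic start
        for _ in range(iters):
            nxt = [sum(cov[a][b] * vec[b] for b in range(d)) for a in range(d)]
            # Remove the directions already found so we converge to the next one.
            for comp in components:
                dot = sum(x * y for x, y in zip(nxt, comp))
                nxt = [x - dot * y for x, y in zip(nxt, comp)]
            norm = math.sqrt(sum(x * x for x in nxt))
            if norm < 1e-12:  # no variance left in any remaining direction
                vec = [0.0] * d
                break
            vec = [x / norm for x in nxt]
        components.append(vec)
    # Coordinates of each point along each component.
    return [[sum(r[j] * comp[j] for j in range(d)) for comp in components]
            for r in centered]


# Toy "embeddings": ten 5-dimensional points that lie along one line.
toy = [[i, 2 * i, 0, 0, 1] for i in range(10)]
coords = pca_project(toy, k=3)
```

Each row of `coords` is an (x, y, z) position that could be plotted as one point. A nonlinear method would generally separate clusters better than PCA on real embeddings.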

view this post on Zulip Charity Everett (May 08 2025 at 22:17):

I think that's a pretty great explanation of where we are so far! Thank you Robert! I would like to add that we also considered there being an onboarding process with an avatar (could be AI enabled) to help guide you through the Dataverse interface, and even possibly your searching and exploration. If anybody has any questions, we're all ears. :ear:

view this post on Zulip Charity Everett (May 12 2025 at 16:58):

:milky_way: Dataverse in WebXR: Help Us Build the Galaxy! :milky_way:
XRDataverseGIF.gif

We’re thrilled to share a first look at our experimental WebXR “space” version of the Dataverse: a new way to explore datasets as if navigating a galaxy of research. Check it out here: http://dataverse-viewer.s3-website-us-east-1.amazonaws.com/ .

This early prototype is the result of a fantastic collaboration with [@Robert Christie], who has done an incredible job translating this vision into an immersive, intuitive experience. Imagine each star as a dataset, ready to be discovered!

We’d love for more of you to get involved!
Whether you’re interested in UX, data visualization, metadata, or just want to brainstorm wild ideas, your expertise can help shape where this project goes next.

Drop a comment if you’re curious, want to test, or have ideas. We’re just getting started!

view this post on Zulip Charity Everett (May 14 2025 at 23:08):

I hope you are well!

ai-assistant-xr-dataverse (1).gif

We've been working on the AI Assistant (tentatively named MILKY), and we want to see if we can get it hooked up with chat.dataverse.org as the knowledgebase to start out. How would we go about that? I don't see how I can chat with it right now. Was it taken offline for some reason?

Fun fact: MILKY looks wherever you point your cursor.

We're having so much fun with this! Can't wait to show off where we're headed next! Want to join us? The more the merrier!

@Philip Durbin ☀️

view this post on Zulip robert christie (May 14 2025 at 23:11):

Is there a way to get a bulk database extract? Currently I have been using the dataverse API to slowly collect metadata for the datasets.
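
For context, a rough sketch of what the slow API route can look like. It assumes the Dataverse Search API (`/api/search`) with `start`/`per_page` paging and the `total_count` and `items` fields in its response. `BASE_URL`, `fetch_page`, and `harvest` are illustrative names, not code from this project, and the page size of 1000 is an assumption about what the server allows.

```python
import json
import urllib.parse
import urllib.request

BASE_URL = "https://dataverse.harvard.edu"  # assumption: target installation


def fetch_page(start, per_page=1000, base_url=BASE_URL):
    """Fetch one page of dataset hits from the Search API (network call)."""
    query = urllib.parse.urlencode(
        {"q": "*", "type": "dataset", "start": start, "per_page": per_page}
    )
    url = f"{base_url}/api/search?{query}"
    with urllib.request.urlopen(url, timeout=60) as resp:
        return json.load(resp)["data"]


def harvest(fetch=fetch_page, per_page=1000, limit=None):
    """Yield search hits page by page until total_count (or limit) is reached."""
    start, seen = 0, 0
    while True:
        page = fetch(start, per_page)
        items = page.get("items", [])
        if not items:
            return
        for item in items:
            if limit is not None and seen >= limit:
                return
            yield item
            seen += 1
        start += len(items)
        if start >= page.get("total_count", 0):
            return
```

Usage would be something like `for hit in harvest(limit=2000): ...`. Passing a different `fetch` callable makes it possible to try the paging logic without touching the network.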

view this post on Zulip Charity Everett (May 14 2025 at 23:11):

robert christie said:

Is there a way to get a bulk database extract? Currently I have been using the dataverse API to slowly collect metadata for the datasets.

@Philip Durbin ☀️

view this post on Zulip Philip Durbin 🚀 (May 15 2025 at 11:01):

@Slava Tykhonov has asked about this as well. It's from October but he published metadata from the Harvard and DANS installations of Dataverse. Does that help?

view this post on Zulip Charity Everett (May 15 2025 at 11:39):

Is the chat available for use? I don't see how from the public page.

view this post on Zulip Philip Durbin 🚀 (May 15 2025 at 13:06):

There is some archived chat here:

view this post on Zulip Philip Durbin 🚀 (May 15 2025 at 13:07):

For Zulip, I think we need to download messages from the API. I've also considered setting up https://github.com/zulip/zulip-archive which would automate this.
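
A minimal sketch of what downloading messages from the API could look like, assuming Zulip's `GET /api/v1/messages` endpoint with its `anchor`, `num_before`, `num_after`, and `narrow` parameters and the `messages` and `found_newest` response fields. `fetch` is a hypothetical callable that performs the authenticated request and returns the parsed JSON; `messages_query` and `download` are illustrative names. Older servers may expect the `stream` operator instead of `channel`.

```python
import json
import urllib.parse


def messages_query(channel, topic=None, anchor="oldest", batch=1000):
    """Build the query string for one page of GET /api/v1/messages."""
    narrow = [{"operator": "channel", "operand": channel}]
    if topic:
        narrow.append({"operator": "topic", "operand": topic})
    return urllib.parse.urlencode({
        "anchor": anchor,
        "num_before": 0,
        "num_after": batch,
        "narrow": json.dumps(narrow),
    })


def download(fetch, channel, topic=None, batch=1000):
    """Page forward from the oldest message until the newest one is reached."""
    messages, anchor = [], "oldest"
    while True:
        result = fetch(messages_query(channel, topic, anchor, batch))
        last_id = messages[-1]["id"] if messages else None
        # The anchor message itself comes back again, so skip ids already stored.
        new = [m for m in result["messages"]
               if last_id is None or m["id"] > last_id]
        messages.extend(new)
        if result.get("found_newest", True) or not new:
            return messages
        anchor = messages[-1]["id"]
```

The resulting list of message dicts could then be written to disk and indexed as a knowledge base.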

view this post on Zulip Philip Durbin 🚀 (May 15 2025 at 13:19):

I just asked about it here https://chat.zulip.org/#narrow/channel/137-feedback/topic/Zulip.20as.20a.20knowledge.20base.20for.20AI/near/2174469

view this post on Zulip Philip Durbin 🚀 (May 15 2025 at 15:34):

If it helps, here's how Onyx (formerly Danswer) pulls in messages from Zulip: https://github.com/onyx-dot-app/onyx/pull/247

view this post on Zulip Charity Everett (May 15 2025 at 22:32):

@Philip Durbin 🐼 Ok, I understand. I thought it was a chatbot.

view this post on Zulip Philip Durbin 🚀 (May 16 2025 at 02:16):

I'm probably just confusing things. :sweat_smile:

https://chat.dataverse.org is just a static page that we use to direct people here (to Zulip, to https://dataverse.zulipchat.com )

We do have an AI prompt at https://ask.dataverse.org built by @Slava Tykhonov

Maybe I'm not sure what you mean by chatbot. :thinking: :sweat_smile:

view this post on Zulip robert christie (May 16 2025 at 15:34):

Ah, of course! I should have checked the dataverse for dataverse data!

view this post on Zulip Charity Everett (May 16 2025 at 17:20):

ask.dataverse.org is the closest thing to a chatbot that you have, but based on a couple of chats I had with it, I don't think it is trained to answer a very wide range of questions. @Slava Tykhonov I could be wrong on this, so please correct me if so. I think we're probably going to have to use the dataverse itself as a knowledgebase for an intelligent assistant. Just curious, is the ask.dataverse repo public? Also, if I remember correctly it utilizes a Hugging Face LLM, correct?

view this post on Zulip Slava Tykhonov (May 16 2025 at 18:38):

No, Ask service is own thing, you can watch my presentation to know how it works: https://harvard.zoom.us/rec/play/7BL5Rclaf31VsCLM1u9uI7PacYelA2ZKWMhtOBgvrHUTBomJSWu8y2MaeDjTm3OGmu0QArgtnRV5pkf5.KwK27HL6N8n9pCxp?accessLevel=meeting&canPlayFromShare=true&from=share_recording_detail&continueMode=true&componentName=rec-play&originRequestUrl=https%3A%2F%2Fharvard.zoom.us%2Frec%2Fshare%2FbOizatNdMdxINRCnqpt87fPITPvsDWTv3ysvA8kIaEE4wnmZPSeSUkdmpKYP1ooA.rKoNMqED_L8KtHOi

view this post on Zulip Philip Durbin 🚀 (May 16 2025 at 18:39):

And the code isn't public, right?

view this post on Zulip Slava Tykhonov (May 16 2025 at 18:40):

I have the whole Harvard Dataverse knowledge base in the triple store https://triples.now.museum/default

view this post on Zulip Slava Tykhonov (May 16 2025 at 18:40):

Source code isn't public yet but should be soon. Complex negotiations

view this post on Zulip Slava Tykhonov (May 16 2025 at 18:47):

And yes, it does semantic indexing on demand when someone adds a new page. So no "training" in the traditional sense; vectors go to the vector store.

view this post on Zulip robert christie (May 16 2025 at 21:54):

The chat is very cool! The point cloud we are working on also uses embeddings that represent each dataset. The embeddings are reduced to a 3-dimensional representation with UMAP for viewing purposes. I would be curious to hear your strategy for generating the embeddings for each dataset. One issue I have been running into is the inconsistent contents of dataverse metadata: some datasets have nice descriptions, while others just name papers or have links to other documents. We have been creating a text description from the metadata, which is then embedded with nomic embed text v2.
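
As an illustration of the "text description from the metadata" step, here is a minimal sketch. It assumes the dataset JSON shape returned by the Dataverse native API (the `data` object, with `latestVersion.metadataBlocks.citation.fields`) and the `title`, `dsDescription`, `subject`, and `keyword` citation fields. `field_value` and `describe` are hypothetical helpers, not the project's actual code, and missing fields are simply skipped.

```python
def field_value(fields, type_name):
    """Return the value of the first citation field with this typeName."""
    for field in fields:
        if field.get("typeName") == type_name:
            return field.get("value")
    return None


def describe(dataset_json):
    """Build one text blob per dataset from whatever metadata is present."""
    fields = (
        dataset_json.get("latestVersion", {})
        .get("metadataBlocks", {})
        .get("citation", {})
        .get("fields", [])
    )
    title = field_value(fields, "title") or ""
    # dsDescription and keyword are compound fields: lists of nested dicts.
    descriptions = [
        d["dsDescriptionValue"]["value"]
        for d in field_value(fields, "dsDescription") or []
        if "dsDescriptionValue" in d
    ]
    keywords = [
        k["keywordValue"]["value"]
        for k in field_value(fields, "keyword") or []
        if "keywordValue" in k
    ]
    subjects = field_value(fields, "subject") or []
    parts = [title] + descriptions
    if subjects:
        parts.append("Subjects: " + ", ".join(subjects))
    if keywords:
        parts.append("Keywords: " + ", ".join(keywords))
    return "\n".join(p.strip() for p in parts if p and p.strip())
```

The returned string is what would be handed to the embedding model. Datasets whose description only names a paper or links elsewhere still get a title, subjects, and keywords to work with.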

view this post on Zulip Slava Tykhonov (May 16 2025 at 22:29):

Thanks, we did another presentation on this recently in Berlin, it's also online https://www.youtube.com/watch?v=5Rxt09tSJv8

view this post on Zulip Philip Durbin 🚀 (May 16 2025 at 23:03):

@robert christie @Charity Everett out of curiosity, are you looking at datasets in just Harvard Dataverse or other installations as well? There's a list at https://iqss.github.io/dataverse-installations/data/data.json

view this post on Zulip Charity Everett (May 17 2025 at 01:53):

@Philip Durbin 🐼 I would say the plan is to do the entire Dataverse at some point, but we may start with a more scaled down version in the immediate term depending upon the feasibility.

view this post on Zulip Slava Tykhonov (May 17 2025 at 08:31):

@Charity Everett Please also check my reactograph connected to Dataverse https://graph.muse-it.eu - you can click on nodes and open datasets linked to them. Built on SPARQL integration with Qlever triple store with metadata ingest from October. Looks quite close to what you want with XR experience.

view this post on Zulip Charity Everett (May 17 2025 at 13:58):

Thank you for sharing, @Slava Tykhonov. It's a great talk and great work. I was able to get our Avatar set up with an AI listener/talk: https://my.spline.design/aiassistantxrdataverse-nxMctzOk6Dl77kB6gXZg7WP6/ . It's just using OpenAI at the moment. We haven't hooked up RAG or a knowledgebase or anything yet, but you can talk to it. As for the dataverse itself, we were able to get some version of that up and running with React XR: http://dataverse-viewer.s3-website-us-east-1.amazonaws.com/ . We are working from that right now. If you move your cursor over any of the points, you get data on them, and this is really just the beginning. There seems to be quite a bit of overlap between our interests. :-) Any feedback would be greatly appreciated.

view this post on Zulip Philip Durbin 🚀 (May 29 2025 at 23:31):

Wow, check out these glasses: https://holdtherobot.com/blog/2025/05/11/linux-on-android-with-ar-glasses/

view this post on Zulip Charity Everett (May 30 2025 at 16:59):

@Philip Durbin Those are cool. Are you thinking of getting a pair?

view this post on Zulip Philip Durbin 🚀 (May 30 2025 at 17:32):

Ha, not really but it's a neat idea. I like being outside. :smile:

view this post on Zulip Charity Everett (Jun 24 2025 at 08:42):

Hey all. I have run into some pretty nasty rate limits when it comes to getting the datasets that are capping me at 105,000 (which is 85,000 short). I've been stuck here for 1 week now, even when I increase the time between calls. Is there a way to get around this? @Philip Durbin
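
For reference, the kind of retry wrapper in question looks roughly like the sketch below: exponential backoff with jitter around each call. Whether this helps depends on what is actually causing the cap, which isn't established here. `with_backoff` and its parameters are illustrative, and treating `URLError` (which includes HTTP errors) and timeouts as retryable is an assumption.

```python
import random
import time
import urllib.error


def with_backoff(call, max_tries=6, base_delay=1.0, max_delay=60.0,
                 sleep=time.sleep,
                 retry_on=(urllib.error.URLError, TimeoutError)):
    """Run call(); on a retryable error wait 1s, 2s, 4s, ... (plus jitter)."""
    for attempt in range(max_tries):
        try:
            return call()
        except retry_on:
            if attempt == max_tries - 1:
                raise  # out of attempts: surface the last error
            delay = min(max_delay, base_delay * 2 ** attempt)
            # Jitter spreads retries out so they don't arrive in lockstep.
            sleep(delay + random.uniform(0, delay / 2))
```

Usage would be along the lines of `page = with_backoff(lambda: fetch_one_page(start))`, where `fetch_one_page` stands for whatever function makes the request.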

view this post on Zulip Philip Durbin 🚀 (Jun 24 2025 at 11:43):

Hmm, maybe? Can you please send an email to support@dataverse.harvard.edu and let us know what the ticket number is?


Last updated: Nov 01 2025 at 14:11 UTC