Hello!
We (@robert christie and I) were able to get an initial brainstorming session done on the Dataverse XR project, and we are so psyched to work on this. The more the merrier of course, but I wanted to share what we have so far. Please offer feedback, questions, comments, concerns, emojis (in a pinch lol). Hope to see you in the Dataverse!
[Images: 4.png, 1.png, 2.png, 3.png]
I think this will be a fun project!
My understanding of the plan so far is to create an immersive, space-themed 3D visualization of all of the datasets in the Dataverse that organizes datasets into ‘galaxies’ of similar datasets. Hopefully we can then use this visualization to build some stories and experiences that allow people to embark on a voyage through the Dataverse!
We have just started working on a layout for the visualization: we embed a text description of each dataset and apply dimensionality reduction techniques to produce 2D and 3D scatterplots.
[Image: scatterplot.png]
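For anyone curious about the layout step, here's a minimal sketch that reduces dataset embeddings to 3D coordinates for a scatterplot. It swaps in PCA (via NumPy's SVD) as a simple, deterministic stand-in for the dimensionality reduction; the random embeddings, the 768-dimension size, and the function name `project_embeddings` are all hypothetical.

```python
import numpy as np

def project_embeddings(embeddings: np.ndarray, n_components: int = 3) -> np.ndarray:
    """Project an (n_datasets, dim) embedding matrix onto its top principal components.

    PCA is used here only as a simple stand-in for a method like UMAP.
    """
    # Center each embedding dimension so the principal axes pass through the mean
    centered = embeddings - embeddings.mean(axis=0)
    # Rows of vt are the principal directions, sorted by decreasing singular value
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:n_components].T

# Hypothetical data: 200 datasets with 768-dimensional text embeddings
rng = np.random.default_rng(0)
embeddings = rng.normal(size=(200, 768))
coords = project_embeddings(embeddings)      # 3D layout
flat = project_embeddings(embeddings, 2)     # 2D layout
print(coords.shape)  # (200, 3)
```

PCA keeps the directions of largest global variance; a neighborhood-preserving method such as UMAP usually does a better job of keeping clusters of similar datasets (the "galaxies") together, so PCA is mostly handy as a quick baseline.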
I think that's a pretty great explanation of where we are so far! Thank you, Robert! I would like to add that we also considered an onboarding process with an avatar (which could be AI-enabled) to help guide you through the Dataverse interface, and possibly even your searching and exploration. If anybody has any questions, we're all ears. :ear:
:milky_way: Dataverse in WebXR: Help Us Build the Galaxy! :milky_way:
[Image: XRDataverseGIF.gif]
We’re thrilled to share a first look at our experimental WebXR “space” version of the Dataverse, a new way to explore datasets as if navigating a galaxy of research. Check it out here: http://dataverse-viewer.s3-website-us-east-1.amazonaws.com/
This early prototype is the result of a fantastic collaboration with @Robert Christie, who has done an incredible job translating this vision into an immersive, intuitive experience. Imagine each star as a dataset, ready to be discovered!
We’d love for more of you to get involved!
Whether you’re interested in UX, data visualization, metadata, or just want to brainstorm wild ideas, your expertise can help shape where this project goes next.
Drop a comment if you’re curious, want to test, or have ideas. We’re just getting started!
I hope you are well!
[Image: ai-assistant-xr-dataverse (1).gif]
We've been working on the AI Assistant (tentatively named MILKY), and we want to see if we can hook it up to chat.dataverse.org as its initial knowledge base. How would we go about that? I don't see how I can chat with it right now. Was it taken offline for some reason?
Fun fact: MILKY looks wherever you point your cursor.
We're having so much fun with this! Can't wait to show off where we're headed next! Want to join us? The more the merrier!
@Philip Durbin ☀️
Is there a way to get a bulk database extract? Currently I have been using the Dataverse API to slowly collect metadata for the datasets.
robert christie said:
Is there a way to get a bulk database extract? Currently I have been using the Dataverse API to slowly collect metadata for the datasets.
@Philip Durbin ☀️
@Slava Tykhonov has asked about this as well. It's from October, but he published metadata from the Harvard and DANS installations of Dataverse. Does that help?
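For paging through metadata with the API in the meantime, here's a rough sketch using the Dataverse Search API (`/api/search` with the `q`, `type`, `start`, and `per_page` parameters). It assumes the standard JSON response containing `data.items` and `data.total_count`, and a page size of 1000, believed to be the Search API's maximum; the helper names are hypothetical.

```python
import json
import urllib.parse
import urllib.request

def search_page_url(base_url: str, start: int, per_page: int = 1000) -> str:
    """Build a Search API URL for one page of dataset results."""
    params = {"q": "*", "type": "dataset", "start": start, "per_page": per_page}
    return f"{base_url.rstrip('/')}/api/search?{urllib.parse.urlencode(params)}"

def iter_datasets(base_url: str, per_page: int = 1000):
    """Yield dataset items page by page until total_count is reached."""
    start = 0
    while True:
        with urllib.request.urlopen(search_page_url(base_url, start, per_page)) as resp:
            data = json.load(resp)["data"]
        items = data["items"]
        yield from items
        start += len(items)
        # Stop on an empty page (defensive) or once every result has been seen
        if not items or start >= data["total_count"]:
            break
```

Usage would look like `for item in iter_datasets("https://dataverse.harvard.edu"): ...`, with each `item` being one dataset's search-result metadata.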
Is the chat available for use? I don't see how to access it from the public page.
There is some archived chat here:
For Zulip, I think we need to download messages from the API. I've also considered setting up https://github.com/zulip/zulip-archive which would automate this.
I just asked about it here https://chat.zulip.org/#narrow/channel/137-feedback/topic/Zulip.20as.20a.20knowledge.20base.20for.20AI/near/2174469
If it helps, here's how Onyx (formerly Danswer) pulls in messages from Zulip: https://github.com/onyx-dot-app/onyx/pull/247
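To make the "download messages from the API" idea concrete, here's a minimal sketch against Zulip's `GET /api/v1/messages` endpoint, authenticated with a bot's email and API key over HTTP basic auth. The channel name in any real call would be your choice, and the `channel` narrow operator assumes a recent Zulip server (older servers use `stream`).

```python
import base64
import json
import urllib.parse
import urllib.request

def messages_url(site: str, channel: str, anchor="oldest", batch: int = 1000) -> str:
    """Build a GET /api/v1/messages URL for one batch of a channel's history."""
    params = {
        "anchor": anchor,  # "oldest", "newest", or a message ID
        "num_before": 0,
        "num_after": batch,
        # Assumes a recent server; older Zulip versions use "stream" here
        "narrow": json.dumps([{"operator": "channel", "operand": channel}]),
        "apply_markdown": "false",  # raw Markdown is easier to index than rendered HTML
    }
    return f"{site.rstrip('/')}/api/v1/messages?{urllib.parse.urlencode(params)}"

def fetch_batch(site: str, channel: str, email: str, api_key: str, anchor="oldest") -> dict:
    """Fetch one batch of messages, authenticating with HTTP basic auth (email:API key)."""
    token = base64.b64encode(f"{email}:{api_key}".encode()).decode()
    req = urllib.request.Request(
        messages_url(site, channel, anchor),
        headers={"Authorization": f"Basic {token}"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```

To walk the full history, pass the last message ID from each batch as the next `anchor` (skipping the repeated anchor message) and stop once the response's `found_newest` is true.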
@Philip Durbin 🐼 Ok, I understand. I thought it was a chatbot.
I'm probably just confusing things. :sweat_smile:
https://chat.dataverse.org is just a static page that we use to direct people here (to Zulip, to https://dataverse.zulipchat.com )
We do have an AI prompt at https://ask.dataverse.org built by @Slava Tykhonov
Maybe I'm not sure what you mean by chatbot. :thinking: :sweat_smile:
Ah, of course! I should have checked the Dataverse for Dataverse data!
ask.dataverse.org is the closest thing to a chatbot that you have, but based on a couple of chats I had with it, I don't think it is trained to answer a very wide range of questions. @Slava Tykhonov, I could be wrong on this, so please correct me if so. I think we're probably going to have to use the Dataverse itself as a knowledge base for an intelligent assistant. Just curious: is the ask.dataverse repo public? Also, if I remember correctly, it uses a Hugging Face LLM, correct?
No, the Ask service is its own thing. You can watch my presentation to see how it works: https://harvard.zoom.us/rec/play/7BL5Rclaf31VsCLM1u9uI7PacYelA2ZKWMhtOBgvrHUTBomJSWu8y2MaeDjTm3OGmu0QArgtnRV5pkf5.KwK27HL6N8n9pCxp?accessLevel=meeting&canPlayFromShare=true&from=share_recording_detail&continueMode=true&componentName=rec-play&originRequestUrl=https%3A%2F%2Fharvard.zoom.us%2Frec%2Fshare%2FbOizatNdMdxINRCnqpt87fPITPvsDWTv3ysvA8kIaEE4wnmZPSeSUkdmpKYP1ooA.rKoNMqED_L8KtHOi
And the code isn't public, right?
I have the whole Harvard Dataverse knowledge base in the triple store https://triples.now.museum/default
The source code isn't public yet, but it should be soon. Complex negotiations.
And yes, it does semantic indexing on demand whenever someone adds a new page. So there's no "training" in the traditional sense; the vectors go to the vector store.
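The Ask service's code isn't public, so this is not its implementation; it's only a generic sketch of "index on demand, no training": each new page is embedded when it's added, its vector is stored, and queries are answered by cosine similarity. The hashed bag-of-words `toy_embed` function is a hypothetical stand-in for a real embedding model, and `VectorStore` is an in-memory placeholder for an actual vector database.

```python
import hashlib
import math
import re

DIM = 256  # hypothetical embedding size for the toy embedder

def toy_embed(text: str) -> list[float]:
    """Hypothetical stand-in for a real embedding model: hashed bag of words, L2-normalized."""
    vec = [0.0] * DIM
    for token in re.findall(r"[a-z0-9]+", text.lower()):
        idx = int(hashlib.md5(token.encode()).hexdigest(), 16) % DIM
        vec[idx] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0  # avoid dividing by zero on empty text
    return [v / norm for v in vec]

class VectorStore:
    """Minimal in-memory vector store: pages are embedded when added, nothing is retrained."""

    def __init__(self) -> None:
        self.ids: list[str] = []
        self.vectors: list[list[float]] = []

    def add(self, doc_id: str, text: str) -> None:
        # Index on demand: embed the new page and append its vector
        self.ids.append(doc_id)
        self.vectors.append(toy_embed(text))

    def search(self, query: str, k: int = 3) -> list[tuple[str, float]]:
        q = toy_embed(query)
        # Vectors are unit length, so the dot product equals cosine similarity
        scored = [(doc_id, sum(a * b for a, b in zip(q, v)))
                  for doc_id, v in zip(self.ids, self.vectors)]
        return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]
```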
The chat is very cool! The point cloud we are working on also uses embeddings that represent each dataset. For viewing purposes, the embeddings are reduced to a 3-dimensional representation with UMAP. I would be curious to hear your strategy for generating the embeddings for each dataset. One issue I have been running into is the inconsistent content of Dataverse metadata: some datasets have nice descriptions, while others just name papers or link to other documents. We have been creating a text description from the metadata, which we then embed with nomic embed text v2.
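One hypothetical way to cope with uneven metadata is to build the text to embed from whichever fields exist, and to borrow context from the parent collection when the description is missing or very short. This sketch assumes items shaped like Dataverse Search API results (`name`, `description`, `subjects`, `keywords`, `name_of_dataverse`); the 40-character threshold is an arbitrary assumption, not what the project uses.

```python
def describe_dataset(item: dict, short_description: int = 40) -> str:
    """Build the text to embed for one dataset, skipping any fields that are missing."""
    parts = []
    title = (item.get("name") or "").strip()
    if title:
        parts.append(f"Title: {title}.")
    # Collapse runs of whitespace/newlines left over from free-text descriptions
    description = " ".join((item.get("description") or "").split())
    if description:
        parts.append(f"Description: {description}")
    for label, key in (("Subjects", "subjects"), ("Keywords", "keywords")):
        values = [v.strip() for v in (item.get(key) or []) if v and v.strip()]
        if values:
            parts.append(f"{label}: {', '.join(values)}.")
    # Sparse records (no description, or just a paper reference) borrow their collection's name
    collection = (item.get("name_of_dataverse") or "").strip()
    if collection and len(description) < short_description:
        parts.append(f"Collection: {collection}.")
    return " ".join(parts)
```

For example, a record whose only description is "See paper." would still pick up its subjects and the name of the collection it lives in, which gives the embedding model something to work with.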
Thanks, we did another presentation on this recently in Berlin, it's also online https://www.youtube.com/watch?v=5Rxt09tSJv8
@robert christie @Charity Everett out of curiosity, are you looking at datasets in just Harvard Dataverse or other installations as well? There's a list at https://iqss.github.io/dataverse-installations/data/data.json
@Philip Durbin 🐼 I would say the plan is to do the entire Dataverse at some point, but we may start with a scaled-down version in the immediate term, depending on feasibility.
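If the scope grows beyond Harvard, the installations list could seed per-installation API base URLs. This sketch assumes `data.json` has a top-level `installations` array whose entries carry `name` and `hostname` keys, and that every installation serves its API over HTTPS at `/api`; both are assumptions to check against the actual file.

```python
import json

def api_base_urls(data_json: str) -> dict[str, str]:
    """Map installation names to API base URLs.

    Assumes a top-level "installations" list with "name" and "hostname" keys;
    entries missing either key are skipped.
    """
    installations = json.loads(data_json).get("installations", [])
    return {
        inst["name"]: f"https://{inst['hostname']}/api"
        for inst in installations
        if inst.get("name") and inst.get("hostname")
    }
```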
@Charity Everett Please also check out my reactograph connected to Dataverse: https://graph.muse-it.eu. You can click on nodes and open the datasets linked to them. It's built on a SPARQL integration with a QLever triple store, using the metadata ingest from October. It looks quite close to what you want for the XR experience.
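For pulling data out of a SPARQL-backed store like this one, here's a minimal sketch using the standard SPARQL protocol (a `query` parameter on a GET request) and the SPARQL 1.1 JSON results format. Whether `https://triples.now.museum/default` accepts requests in exactly this form is an assumption, and the example query is deliberately generic because the store's vocabulary isn't described in this thread.

```python
import json
import urllib.parse
import urllib.request

# Assumption: this endpoint (mentioned earlier in the thread) accepts standard SPARQL protocol GETs
ENDPOINT = "https://triples.now.museum/default"

# Generic query; the store's actual predicates aren't described here
EXAMPLE_QUERY = "SELECT ?s ?p ?o WHERE { ?s ?p ?o } LIMIT 10"

def sparql_url(endpoint: str, query: str) -> str:
    """Encode a query as the `query` parameter of a SPARQL protocol GET request."""
    return f"{endpoint}?{urllib.parse.urlencode({'query': query})}"

def bindings_to_rows(results: dict) -> list[dict[str, str]]:
    """Flatten SPARQL 1.1 JSON results into plain {variable: value} dicts (unbound vars omitted)."""
    variables = results["head"]["vars"]
    return [
        {var: binding[var]["value"] for var in variables if var in binding}
        for binding in results["results"]["bindings"]
    ]

def run_query(query: str, endpoint: str = ENDPOINT) -> list[dict[str, str]]:
    req = urllib.request.Request(
        sparql_url(endpoint, query),
        headers={"Accept": "application/sparql-results+json"},
    )
    with urllib.request.urlopen(req) as resp:
        return bindings_to_rows(json.load(resp))
```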
Thank you for sharing, @Slava Tykhonov. It's a great talk and great work. I was able to get our avatar set up with an AI listener/talker: https://my.spline.design/aiassistantxrdataverse-nxMctzOk6Dl77kB6gXZg7WP6/ (just using OpenAI at the moment; we haven't hooked up RAG or a knowledge base yet, but you can talk to it). As for the Dataverse itself, we were able to get a version of that up and running with React XR: http://dataverse-viewer.s3-website-us-east-1.amazonaws.com/. We are working from that right now. If you move your cursor over any of the points, you get data on them, and this is really just the beginning. There seems to be quite a bit of overlap between our interests. :-) Any feedback would be greatly appreciated.
Wow, check out these glasses: https://holdtherobot.com/blog/2025/05/11/linux-on-android-with-ar-glasses/
@Philip Durbin Those are cool. Are you thinking of getting a pair?
Ha, not really but it's a neat idea. I like being outside. :smile:
Hey all. I have run into some pretty nasty rate limits when fetching the datasets; they are capping me at 105,000 datasets, which is 85,000 short of the full set. I've been stuck here for a week now, even when I increase the time between calls. Is there a way to get around this? @Philip Durbin
Hmm, maybe? Can you please send an email to support@dataverse.harvard.edu and let us know what the ticket number is?
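While the ticket is pending, a client-side retry policy can at least avoid hammering the server. This is a minimal sketch that retries on HTTP 429/503, honors a numeric `Retry-After` header when present, and otherwise backs off exponentially with jitter; the status codes and delay values are assumptions, since the thread doesn't say how the limit is enforced. If the 105,000 ceiling is a fixed cap rather than a per-minute limit, backoff alone won't get past it.

```python
import random
import time
import urllib.error
import urllib.request

def backoff_delay(attempt: int, base: float = 2.0, cap: float = 300.0) -> float:
    """Exponential backoff: base * 2**attempt seconds, capped at `cap`."""
    return min(cap, base * (2 ** attempt))

def fetch_with_backoff(url: str, max_attempts: int = 8) -> bytes:
    """GET `url`, retrying on assumed rate-limit responses (429/503)."""
    for attempt in range(max_attempts):
        try:
            with urllib.request.urlopen(url) as resp:
                return resp.read()
        except urllib.error.HTTPError as err:
            if err.code not in (429, 503):
                raise  # other errors are not rate limits; surface them
            retry_after = err.headers.get("Retry-After")
            # Retry-After may also be an HTTP date; fall back to exponential backoff then
            if retry_after and retry_after.isdigit():
                delay = float(retry_after)
            else:
                delay = backoff_delay(attempt)
            # Random jitter keeps parallel workers from retrying in lockstep
            time.sleep(delay + random.uniform(0, 1))
    raise RuntimeError(f"gave up after {max_attempts} attempts: {url}")
```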
Last updated: Nov 01 2025 at 14:11 UTC