Stream: troubleshooting

Topic: Term URI in the Controlled Vocab URL field


view this post on Zulip Deirdre Kirmis (Dec 02 2024 at 18:35):

hi there .. this is somewhat urgent in that .. new datasets have the new Term UI field and if we add anything to that field, fixes later to migrate the data will wipe out newly entered data.

So .. after upgrading to Dataverse v6.3 (and v6.4) we noticed that we have some datasets where the Term URI is appearing in the controlled vacab URL field .. I know there were instructions in the v6.3 upgrade to migrate that data to the correct field .. however, when upgrading to v6.3, when running the supplied query to find records affected it did not find any (and still does not), so I assumed there was nothing to move (my bad) .. is there a way to fix this?

view this post on Zulip Deirdre Kirmis (Dec 02 2024 at 19:05):

trying to figure out why the query didn't find anything as most of our datasets have keywords and URIs included

view this post on Zulip Deirdre Kirmis (Dec 02 2024 at 19:16):

I can see the keywords in the datasetfieldvalue field if I search on a specific keyword value .. and i can see mulitple URIs if I search for anything like 'http%'

view this post on Zulip Deirdre Kirmis (Dec 02 2024 at 19:26):

.. and the query
'SELECT value FROM datasetfieldvalue dfv
INNER JOIN datasetfield df ON df.id = dfv.datasetfield_id
WHERE df.datasetfieldtype_id = (SELECT id FROM datasetfieldtype WHERE name = 'keywordValue');'
..
finds 357 rows of keywords

view this post on Zulip Deirdre Kirmis (Dec 02 2024 at 19:26):

.. and they query
'SELECT value FROM datasetfieldvalue dfv INNER JOIN datasetfield df ON df.id = dfv.datasetfield_id WHERE value ILIKE 'http%';'

.. finds 628 rows

view this post on Zulip Deirdre Kirmis (Dec 02 2024 at 19:28):

... it's just the 2 where clauses together don't find anything

view this post on Zulip Deirdre Kirmis (Dec 02 2024 at 20:10):

Any help that I could get in this matter is greatly appreciated :face_with_spiral_eyes:

view this post on Zulip Deirdre Kirmis (Dec 02 2024 at 23:04):

Note: I also ran the query to migrate the fields in our QA site and nothing was updated.

view this post on Zulip Dimitri Szabo (Dec 03 2024 at 08:18):

Hi @Deirdre Kirmis ,
I'm pinging @luddaniel who contributed on the addition of the field and might help you :)

view this post on Zulip Deirdre Kirmis (Dec 03 2024 at 15:14):

Thank you!

view this post on Zulip luddaniel (Dec 03 2024 at 15:34):

Hey @Deirdre Kirmis, I'm not sure I understand well what is happening to you.

SELECT value FROM datasetfieldvalue dfv
INNER JOIN datasetfield df ON df.id = dfv.datasetfield_id
WHERE df.datasetfieldtype_id = (SELECT id FROM datasetfieldtype WHERE name = 'keywordValue')
AND value ILIKE 'http%';

This query finds the value for the keywordValue when the value starts with http so if you get nothing it means you have no url in this field.

view this post on Zulip luddaniel (Dec 03 2024 at 15:35):

But you are saying that you urls are maybe in a controlled vocabulary URL field, do you have a customized citation.tsv ?

view this post on Zulip luddaniel (Dec 03 2024 at 15:37):

Are you talking about keywordVocabularyURI ?

view this post on Zulip luddaniel (Dec 03 2024 at 15:40):

Can you show me one example ? one dataset with the metadata that are not like you want. (dataset json export for example)

view this post on Zulip luddaniel (Dec 03 2024 at 15:43):

Deirdre Kirmis said:

hi there .. this is somewhat urgent in that .. new datasets have the new Term UI field and if we add anything to that field, fixes later to migrate the data will wipe out newly entered data.

Unless you apply a manual delete in database, Dataverse mechanism does not delete data

view this post on Zulip Deirdre Kirmis (Dec 03 2024 at 16:14):

Thanks for your response! The problem is that, all the values that are currently in the "Controlled Vocabulary URL" field should be in the "Term URI" field. When we originally were entering them, we were putting them into the "Vocabulary URL" field.
cv-dataverse.jpg

view this post on Zulip Deirdre Kirmis (Dec 03 2024 at 16:15):

when we upgraded and the new fields appeared, the value that we were putting in the "vocabulary url" field appeared in the "Controlled Vocabulary field"

view this post on Zulip Deirdre Kirmis (Dec 03 2024 at 16:16):

we do not have a custom citation.tsv

view this post on Zulip Deirdre Kirmis (Dec 03 2024 at 16:18):

we don't have any custom metadata blocks, other than we added the "computational workflow" metadata block accidentally when we upgraded to v6.3

view this post on Zulip luddaniel (Dec 03 2024 at 16:37):

I see now why I was confused ! :sweat_smile:
First, I didn't know the default label for keywordVocabularyURI was Controlled Vocabulary URL (I'm french)
2nd, We assumed for a lot of reason that the Term field could contain either a text value or the term Uri.

I guess you undestand that keywordVocabularyURI is the URL for the vocabulary (MeSH -> http://id.nlm.nih.gov/mesh/D004493 or https://www.ncbi.nlm.nih.gov/mesh)
and keywordTermURI is the URL for the term (Physical education and training -> http://id.nlm.nih.gov/mesh/D010806)

In your case, if you used in the past keywordVocabularyURI as a way to add the keywordTermURI you can adjust the sql queries like this :

SELECT value FROM datasetfieldvalue dfv
INNER JOIN datasetfield df ON df.id = dfv.datasetfield_id
WHERE df.datasetfieldtype_id = (SELECT id FROM datasetfieldtype WHERE name = 'keywordVocabularyURI')
AND value ILIKE 'http%';
UPDATE datasetfield df
SET datasetfieldtype_id  = (SELECT id FROM datasetfieldtype WHERE name = 'keywordTermURI')
FROM datasetfieldvalue dfv
WHERE dfv.datasetfield_id  = df.id
AND df.datasetfieldtype_id = (SELECT id FROM datasetfieldtype WHERE name = 'keywordVocabularyURI')
AND dfv.value ILIKE 'http%';

This will transfert all url from keywordVocabularyURI value to keywordTermURI which is empty because it's new.
This is not what you want to do if keywordVocabularyURI contains correct URLs for the vocabulary.

view this post on Zulip Deirdre Kirmis (Dec 03 2024 at 16:45):

Sorry I was just about to post the json .. i'll post it anyway just in case ..

.. this looks like what we need! Thank you! I'm sorry I'm not familiar enough with the tables to know exactly where we needed to change that, but I did see the "keywordVocabularyURI" field and was wondering about it .. :smile:

So, it seems we were adding the Term URI when that should have been Controlled Vocabulary URL? (keywordVocabularyURI)?

view this post on Zulip Deirdre Kirmis (Dec 03 2024 at 16:45):

VXJNBO_export.json

view this post on Zulip Deirdre Kirmis (Dec 03 2024 at 16:50):

so, I ran that query in our QA site, and it changed the dataset field names so that now I have 2 "Term URI" fields .. this is for records that already had something in the Term URI field
migrated-cv-dataverse.jpg

view this post on Zulip Deirdre Kirmis (Dec 03 2024 at 16:52):

for records where that field was empty .. it did it correctly
migrated-cv2-dataverse.jpg

view this post on Zulip luddaniel (Dec 03 2024 at 16:53):

Deirdre Kirmis said:

so, I ran that query in our QA site, and it changed the dataset field names so that now I have 2 "Term URI" fields .. this is for records that already had something in the Term URI field
migrated-cv-dataverse.jpg

this one is weird

view this post on Zulip luddaniel (Dec 03 2024 at 16:53):

Deirdre Kirmis said:

for records where that field was empty .. it did it correctly
migrated-cv2-dataverse.jpg

this one worked but this is the case where you didn't want it :D

view this post on Zulip Deirdre Kirmis (Dec 03 2024 at 16:54):

oh yea, that was just a test .. i may have put the wrong URL .. but it was in the controlled vocab field .. now is in the term uri field

view this post on Zulip luddaniel (Dec 03 2024 at 16:54):

Deirdre Kirmis said:

So, it seems we were adding the Term URI when that should have been Controlled Vocabulary URL? (keywordVocabularyURI)?

Yes but you only had 1 field so you did your best :p keywordTermURI is by far more important.

view this post on Zulip Deirdre Kirmis (Dec 03 2024 at 16:55):

in production, we likely don't have anything in the Term URI fields .. so since the first query in the instructions didn't find anything, we should be okay?

view this post on Zulip Deirdre Kirmis (Dec 03 2024 at 17:04):

oh nm .. i see now it just kept what was there and added another one .. that is fine .. I think we are good! Thank you!!!

view this post on Zulip luddaniel (Dec 03 2024 at 18:12):

Queries are only here to help you to move url in the correct field if you want it, feel free to not use it.

view this post on Zulip luddaniel (Dec 03 2024 at 18:14):

I don't undertand why you have 5 fields, maybe the cache (need to restart payara) or you altered the data. But I don't think the queries I gave you can do that.

view this post on Zulip Deirdre Kirmis (Dec 03 2024 at 18:25):

all we really need to do at this point is move whatever is in the "Controlled Vocabulary URL" field to the "Term URI" field .. we were apparently using the "Vocabulary URL" field for that.

Will the second query above that you added do that? (I tried it on QA and it seemed to do so successfully, although it added the second Term URI field to the dataset that already had one). We will make sure we don't have anything in that field.

view this post on Zulip luddaniel (Dec 04 2024 at 09:20):

Deirdre Kirmis said:

Will the second query above that you added do that?

Yes

view this post on Zulip luddaniel (Dec 04 2024 at 09:21):

Deirdre Kirmis said:

We will make sure we don't have anything in that field.

As the field is created by curl http://localhost:8080/api/admin/datasetfield/load -H "Content-type: text/tab-separated-values" -X POST --upload-file citation.tsv while upgrading to v6.3, it will be empty. No worries.

view this post on Zulip luddaniel (Dec 04 2024 at 09:26):

Deirdre Kirmis said:

although it added the second Term URI field to the dataset that already had one

I don't see how the sql query could do that, as it is a replacement of a field type, not an addition of a field value.
Something else did that. Please, restart payara, and do run some other tests, because fixing this afterward it really complicated in term of SQL.

view this post on Zulip Deirdre Kirmis (Dec 04 2024 at 15:22):

Running the query resolved the issue on our production site. Thank you!!
As for QA, I restarted payara but the double Term URI field is still there. This is a site that had already been upgraded to v6.3 and someone had added a record that included data in the Term URI field. When I ran the query, it moved the value of "Controlled Vocabulary URL" to a new "Term URI" field but kept the existing value in that field, so that now there are two Term URIs.

view this post on Zulip Deirdre Kirmis (Dec 04 2024 at 15:25):

we clear that site of all datasets once a month, so it doesn't matter to us that it is like that, unless it will cause an issue with the DB later for some reason

view this post on Zulip luddaniel (Dec 04 2024 at 15:55):

Good news :)

view this post on Zulip Notification Bot (Dec 04 2024 at 15:56):

luddaniel has marked this topic as resolved.

view this post on Zulip Oliver Bertuch (May 23 2025 at 15:18):

I'm just looking at the DV 6.3 upgrade instructions and see them SQL queries. Was there a particular reason why we didn't add a Flyway migration for this? (maybe an afterMigrate style one, potentially using Java for it to make it configurable to be applied)

view this post on Zulip Notification Bot (May 23 2025 at 15:22):

Philip Durbin has marked this topic as unresolved.

view this post on Zulip Philip Durbin πŸš€ (May 23 2025 at 15:23):

It looks like it came from this PR:

Add keywordTermURI metadata in keyword block #10371

view this post on Zulip Philip Durbin πŸš€ (May 23 2025 at 15:26):

From @Steven Ferey I see "Yes absolutely, I delete the script and update the PR description" at https://github.com/IQSS/dataverse/pull/10371#discussion_r1577941808

I'm not sure why. :thinking:

view this post on Zulip Steven Ferey (May 26 2025 at 14:14):

Hello, The two topics are different:

-> The SQL script mentioned by @Philip Durbin has been deleted because the modification to this metadata has been reverted: https://github.com/IQSS/dataverse/pull/10371/commits/9f728453eff6041a3becee2a093fb3479af2d317

-> The SQL queries mentioned by @Oliver Bertuch , and present in the update instructions, are current, and the update is optional. The select query allows the Dataverse administrator to evaluate the data in question. The administrator can then decide whether to migrate these values ​​from the keywordValue metadata to the new keywordTermURI metadata, depending on the proposed pattern.

view this post on Zulip luddaniel (May 26 2025 at 14:28):

In addition, the reason of this topic was that a specific Dataverse intance was using keywordVocabularyURI to store a Term URI (new keywordTermURI) that for many instances is in keywordValue (default case is covered by https://github.com/IQSS/dataverse/releases/tag/v6.3#upgrade-instructions-6-3.

You can now see why a flyway script was avoided, it may not match the administrator's wishes.

luddaniel said:

```sql
UPDATE datasetfield df
SET datasetfieldtype_id  = (SELECT id FROM datasetfieldtype WHERE name = 'keywordTermURI')
FROM datasetfieldvalue dfv
WHERE dfv.datasetfield_id  = df.id
AND df.datasetfieldtype_id = (SELECT id FROM datasetfieldtype WHERE name = 'keywordVocabularyURI')
AND dfv.value ILIKE 'http%';

This will transfert all url from keywordVocabularyURI value to keywordTermURI which is empty because it's new.
This is not what you want to do if keywordVocabularyURI contains correct URLs for the vocabulary.

view this post on Zulip Philip Durbin πŸš€ (May 27 2025 at 13:23):

Thanks! @Oliver Bertuch is this making sense, that it's optional?

view this post on Zulip Oliver Bertuch (May 27 2025 at 13:40):

Uhm yeah sure. I'll still add a migration to the extra migrations that can be switched on/off and will tell you if there are any such elements.


Last updated: Oct 30 2025 at 06:21 UTC