hi there .. this is somewhat urgent in that .. new datasets have the new Term UI field and if we add anything to that field, fixes later to migrate the data will wipe out newly entered data.
So .. after upgrading to Dataverse v6.3 (and v6.4) we noticed that we have some datasets where the Term URI is appearing in the controlled vacab URL field .. I know there were instructions in the v6.3 upgrade to migrate that data to the correct field .. however, when upgrading to v6.3, when running the supplied query to find records affected it did not find any (and still does not), so I assumed there was nothing to move (my bad) .. is there a way to fix this?
trying to figure out why the query didn't find anything as most of our datasets have keywords and URIs included
I can see the keywords in the datasetfieldvalue field if I search on a specific keyword value .. and i can see mulitple URIs if I search for anything like 'http%'
.. and the query
'SELECT value FROM datasetfieldvalue dfv
INNER JOIN datasetfield df ON df.id = dfv.datasetfield_id
WHERE df.datasetfieldtype_id = (SELECT id FROM datasetfieldtype WHERE name = 'keywordValue');'
..
finds 357 rows of keywords
.. and they query
'SELECT value FROM datasetfieldvalue dfv INNER JOIN datasetfield df ON df.id = dfv.datasetfield_id WHERE value ILIKE 'http%';'
.. finds 628 rows
... it's just the 2 where clauses together don't find anything
Any help that I could get in this matter is greatly appreciated :face_with_spiral_eyes:
Note: I also ran the query to migrate the fields in our QA site and nothing was updated.
Hi @Deirdre Kirmis ,
I'm pinging @luddaniel who contributed on the addition of the field and might help you :)
Thank you!
Hey @Deirdre Kirmis, I'm not sure I understand well what is happening to you.
SELECT value FROM datasetfieldvalue dfv
INNER JOIN datasetfield df ON df.id = dfv.datasetfield_id
WHERE df.datasetfieldtype_id = (SELECT id FROM datasetfieldtype WHERE name = 'keywordValue')
AND value ILIKE 'http%';
This query finds the value for the keywordValue when the value starts with http so if you get nothing it means you have no url in this field.
But you are saying that you urls are maybe in a controlled vocabulary URL field, do you have a customized citation.tsv ?
Are you talking about keywordVocabularyURI ?
Can you show me one example ? one dataset with the metadata that are not like you want. (dataset json export for example)
Deirdre Kirmis said:
hi there .. this is somewhat urgent in that .. new datasets have the new Term UI field and if we add anything to that field, fixes later to migrate the data will wipe out newly entered data.
Unless you apply a manual delete in database, Dataverse mechanism does not delete data
Thanks for your response! The problem is that, all the values that are currently in the "Controlled Vocabulary URL" field should be in the "Term URI" field. When we originally were entering them, we were putting them into the "Vocabulary URL" field.
cv-dataverse.jpg
when we upgraded and the new fields appeared, the value that we were putting in the "vocabulary url" field appeared in the "Controlled Vocabulary field"
we do not have a custom citation.tsv
we don't have any custom metadata blocks, other than we added the "computational workflow" metadata block accidentally when we upgraded to v6.3
I see now why I was confused ! :sweat_smile:
First, I didn't know the default label for keywordVocabularyURI was Controlled Vocabulary URL (I'm french)
2nd, We assumed for a lot of reason that the Term field could contain either a text value or the term Uri.
I guess you undestand that keywordVocabularyURI is the URL for the vocabulary (MeSH -> http://id.nlm.nih.gov/mesh/D004493 or https://www.ncbi.nlm.nih.gov/mesh)
and keywordTermURI is the URL for the term (Physical education and training -> http://id.nlm.nih.gov/mesh/D010806)
In your case, if you used in the past keywordVocabularyURI as a way to add the keywordTermURI you can adjust the sql queries like this :
SELECT value FROM datasetfieldvalue dfv
INNER JOIN datasetfield df ON df.id = dfv.datasetfield_id
WHERE df.datasetfieldtype_id = (SELECT id FROM datasetfieldtype WHERE name = 'keywordVocabularyURI')
AND value ILIKE 'http%';
UPDATE datasetfield df
SET datasetfieldtype_id = (SELECT id FROM datasetfieldtype WHERE name = 'keywordTermURI')
FROM datasetfieldvalue dfv
WHERE dfv.datasetfield_id = df.id
AND df.datasetfieldtype_id = (SELECT id FROM datasetfieldtype WHERE name = 'keywordVocabularyURI')
AND dfv.value ILIKE 'http%';
This will transfert all url from keywordVocabularyURI value to keywordTermURI which is empty because it's new.
This is not what you want to do if keywordVocabularyURI contains correct URLs for the vocabulary.
Sorry I was just about to post the json .. i'll post it anyway just in case ..
.. this looks like what we need! Thank you! I'm sorry I'm not familiar enough with the tables to know exactly where we needed to change that, but I did see the "keywordVocabularyURI" field and was wondering about it .. :smile:
So, it seems we were adding the Term URI when that should have been Controlled Vocabulary URL? (keywordVocabularyURI)?
so, I ran that query in our QA site, and it changed the dataset field names so that now I have 2 "Term URI" fields .. this is for records that already had something in the Term URI field
migrated-cv-dataverse.jpg
for records where that field was empty .. it did it correctly
migrated-cv2-dataverse.jpg
Deirdre Kirmis said:
so, I ran that query in our QA site, and it changed the dataset field names so that now I have 2 "Term URI" fields .. this is for records that already had something in the Term URI field
migrated-cv-dataverse.jpg
this one is weird
Deirdre Kirmis said:
for records where that field was empty .. it did it correctly
migrated-cv2-dataverse.jpg
this one worked but this is the case where you didn't want it :D
oh yea, that was just a test .. i may have put the wrong URL .. but it was in the controlled vocab field .. now is in the term uri field
Deirdre Kirmis said:
So, it seems we were adding the Term URI when that should have been Controlled Vocabulary URL? (keywordVocabularyURI)?
Yes but you only had 1 field so you did your best :p keywordTermURI is by far more important.
in production, we likely don't have anything in the Term URI fields .. so since the first query in the instructions didn't find anything, we should be okay?
oh nm .. i see now it just kept what was there and added another one .. that is fine .. I think we are good! Thank you!!!
Queries are only here to help you to move url in the correct field if you want it, feel free to not use it.
I don't undertand why you have 5 fields, maybe the cache (need to restart payara) or you altered the data. But I don't think the queries I gave you can do that.
all we really need to do at this point is move whatever is in the "Controlled Vocabulary URL" field to the "Term URI" field .. we were apparently using the "Vocabulary URL" field for that.
Will the second query above that you added do that? (I tried it on QA and it seemed to do so successfully, although it added the second Term URI field to the dataset that already had one). We will make sure we don't have anything in that field.
Deirdre Kirmis said:
Will the second query above that you added do that?
Yes
Deirdre Kirmis said:
We will make sure we don't have anything in that field.
As the field is created by curl http://localhost:8080/api/admin/datasetfield/load -H "Content-type: text/tab-separated-values" -X POST --upload-file citation.tsv while upgrading to v6.3, it will be empty. No worries.
Deirdre Kirmis said:
although it added the second Term URI field to the dataset that already had one
I don't see how the sql query could do that, as it is a replacement of a field type, not an addition of a field value.
Something else did that. Please, restart payara, and do run some other tests, because fixing this afterward it really complicated in term of SQL.
Running the query resolved the issue on our production site. Thank you!!
As for QA, I restarted payara but the double Term URI field is still there. This is a site that had already been upgraded to v6.3 and someone had added a record that included data in the Term URI field. When I ran the query, it moved the value of "Controlled Vocabulary URL" to a new "Term URI" field but kept the existing value in that field, so that now there are two Term URIs.
we clear that site of all datasets once a month, so it doesn't matter to us that it is like that, unless it will cause an issue with the DB later for some reason
Good news :)
luddaniel has marked this topic as resolved.
I'm just looking at the DV 6.3 upgrade instructions and see them SQL queries. Was there a particular reason why we didn't add a Flyway migration for this? (maybe an afterMigrate style one, potentially using Java for it to make it configurable to be applied)
Philip Durbin has marked this topic as unresolved.
It looks like it came from this PR:
Add keywordTermURI metadata in keyword block #10371
From @Steven Ferey I see "Yes absolutely, I delete the script and update the PR description" at https://github.com/IQSS/dataverse/pull/10371#discussion_r1577941808
I'm not sure why. :thinking:
Hello, The two topics are different:
-> The SQL script mentioned by @Philip Durbin has been deleted because the modification to this metadata has been reverted: https://github.com/IQSS/dataverse/pull/10371/commits/9f728453eff6041a3becee2a093fb3479af2d317
-> The SQL queries mentioned by @Oliver Bertuch , and present in the update instructions, are current, and the update is optional. The select query allows the Dataverse administrator to evaluate the data in question. The administrator can then decide whether to migrate these values ββfrom the keywordValue metadata to the new keywordTermURI metadata, depending on the proposed pattern.
In addition, the reason of this topic was that a specific Dataverse intance was using keywordVocabularyURI to store a Term URI (new keywordTermURI) that for many instances is in keywordValue (default case is covered by https://github.com/IQSS/dataverse/releases/tag/v6.3#upgrade-instructions-6-3.
You can now see why a flyway script was avoided, it may not match the administrator's wishes.
luddaniel said:
```sql UPDATE datasetfield df SET datasetfieldtype_id = (SELECT id FROM datasetfieldtype WHERE name = 'keywordTermURI') FROM datasetfieldvalue dfv WHERE dfv.datasetfield_id = df.id AND df.datasetfieldtype_id = (SELECT id FROM datasetfieldtype WHERE name = 'keywordVocabularyURI') AND dfv.value ILIKE 'http%';This will transfert all url from
keywordVocabularyURIvalue tokeywordTermURIwhich is empty because it's new.
This is not what you want to do ifkeywordVocabularyURIcontains correct URLs for the vocabulary.
Thanks! @Oliver Bertuch is this making sense, that it's optional?
Uhm yeah sure. I'll still add a migration to the extra migrations that can be switched on/off and will tell you if there are any such elements.
Last updated: Oct 30 2025 at 06:21 UTC