Stream: troubleshooting

Topic: iso639-2 vs iso639-5 (dataverse citation version 6.5)


Alosh (Jun 10 2025 at 11:56):

Hi @Philip Durbin @Leo Andreev @Steven Winship ,

I'm Ali from DANS (Netherlands). I have a question about "Support the full ISO 639-3 list of languages" (#10578, #10762).
There are a lot of new languages in the new Dataverse 6.5. There is also another change whose reason we don't understand: for some languages, the order of the alternate names has changed compared to ISO 639-2.

ISO639-2 (DANS)                                            | Dataverse citation.tsv file (v 6.5)
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------
bod,"Tibetan Standard, Tibetan, Central"                   | Tibetan, Tibetan Standard, Central bod
tib,"Tibetan Standard, Tibetan, Central"                   | Tibetan  Tibetan Central   Tibetan Standard    tib
div,"Divehi, Dhivehi, Maldivian"                           | Maldivian, Dhivehi, Divehi div
ful,"Fula, Fulah, Pulaar, Pular"                           | Fula, Fulah        ful
                                                           | Pulaar fuc 5671    fuc
                                                           | Pulabu pup 5672    pup
                                                           | Pular  fuf 5673    fuf
gla,"Scottish Gaelic, Gaelic"                              | Gaelic, Scottish Gaelic    gla
hat,"Haitian, Haitian Creole"                              | Haitian Creole, Haitian    hat
kal,"Kalaallisut, Greenlandic"                             | Greenlandic, Kalaallisut   kal
kua,"Kwanyama, Kuanyama"                                   | Kuanyama, Kwanyama kua
lim,"Limburgish, Limburgan, Limburger"                     | Limburgan, Limburger, Limburgish   lim
oci,Occitan                                                |
oji,"Ojibwe, Ojibwa"                                       | Ojibwe, Ojibwa oji
uig,"Uyghur, Uighur"                                       | Uighur, Uyghur uig

Leo Andreev (Jun 16 2025 at 20:02):

Just to clarify, the support for the extended iso639-3 list of languages was added in the citation block update in Dataverse 6.4. No changes were made in 6.5, to the best of my knowledge.

@Steven Winship may be able to give a better answer, as the author of PR #10762. But my best answer would be that the goal was to support the full iso639-3 list (there were multiple requests from the community to that effect). The main, preferred names of a few languages, such as the ones listed above, changed between iso639-2 and iso639-3. All such cases were reviewed manually, and we made our best effort to make sure that all the previously known designations of the affected languages were still supported (this was achieved by supplying all such names as "alternates").
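
To illustrate why a changed name order should be harmless when every designation is kept, here is a minimal Python sketch. It is not Dataverse code: the function names are hypothetical, the names come from the comparison table earlier in this thread, and which name counts as "preferred" versus "alternate" is an assumption made for the example.

```python
def same_names(left, right):
    """Compare two comma-separated name lists while ignoring order and case."""
    def as_set(text):
        return {name.strip().casefold() for name in text.split(",")}
    return as_set(left) == as_set(right)


def build_language_index(entries):
    """Map every known name (preferred or alternate) to its language code."""
    index = {}
    for code, preferred, alternates in entries:
        for name in (preferred, *alternates):
            index[name.casefold()] = code
    return index


# Hypothetical split into preferred name and alternates, for illustration only.
entries = [
    ("uig", "Uighur", ["Uyghur"]),
    ("kua", "Kuanyama", ["Kwanyama"]),
    ("gla", "Gaelic", ["Scottish Gaelic"]),
]
index = build_language_index(entries)

# Either spelling resolves to the same code, whichever one is listed first.
print(index["uyghur"], index["uighur"])
print(same_names("Uyghur, Uighur", "Uighur, Uyghur"))
```

Under these assumptions, a lookup by any previously used name still lands on the same code, so only the display order differs between the two lists.
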
One super important part is to make sure you didn't skip the citation block update prior to that one. The test is to confirm that all your existing language controlled vocabulary values have non-NULL identifiers; for example:

SELECT cvv.strvalue, cvv.identifier FROM controlledvocabularyvalue cvv, datasetfieldtype ft WHERE cvv.datasetfieldtype_id = ft.id AND ft.name='language';

If not, make sure to apply the citation.tsv from Dataverse 6.3 first (wget https://raw.githubusercontent.com/IQSS/dataverse/v6.3/scripts/api/data/metadatablocks/citation.tsv). As long as your existing language controlled vocab. values have the identifiers defined, extending the CV to the full iso639 list should be painless and should not result in the loss of any previously supported language codes.
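
As a sketch of how that check could be narrowed to just the problem rows, the Python snippet below runs a variant of the query with an added `IS NULL` filter. The filter, the function names, and the toy rows are assumptions made for illustration; an in-memory SQLite database stands in for the real Dataverse database and contains only the columns the query touches.

```python
import sqlite3

# Variant of the query above; the "identifier IS NULL" filter is an addition
# so that only the values lacking an identifier are returned.
NULL_IDENTIFIER_QUERY = """
SELECT cvv.strvalue
FROM controlledvocabularyvalue cvv, datasetfieldtype ft
WHERE cvv.datasetfieldtype_id = ft.id
  AND ft.name = 'language'
  AND cvv.identifier IS NULL
ORDER BY cvv.strvalue
"""


def languages_missing_identifier(conn):
    """Return the language values whose identifier is NULL."""
    return [row[0] for row in conn.execute(NULL_IDENTIFIER_QUERY)]


def build_toy_database():
    """Create an in-memory stand-in with hypothetical rows for the demo."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE datasetfieldtype (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE controlledvocabularyvalue (
            strvalue TEXT, identifier TEXT, datasetfieldtype_id INTEGER);
        INSERT INTO datasetfieldtype VALUES (1, 'language'), (2, 'subject');
        INSERT INTO controlledvocabularyvalue VALUES
            ('Uighur', 'uig', 1),
            ('Kuanyama', NULL, 1),
            ('Other', NULL, 2);
    """)
    return conn


toy = build_toy_database()
missing = languages_missing_identifier(toy)
print(missing)
```

An empty result would suggest that every language value already has an identifier; any rows returned are the ones to fix before extending the vocabulary.
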

One real drawback of having this full list in citation.tsv is that citation block updates take much longer now.

Alosh (Jun 18 2025 at 09:18):

Hi @Leo Andreev, thank you very much for your clear reply. :folded_hands:


Last updated: Oct 30 2025 at 06:21 UTC