We have a custom PID provider called perma1 set as the default. (We don't have DataCite integration yet, nor will we anytime soon, so we want to use PermaLink for now.) Our Payara JVM options include:
-Ddataverse.pid.providers=perma1
-Ddataverse.pid.default-provider=perma1
-Ddataverse.pid.perma1.type=perma
-Ddataverse.pid.perma1.label=PermaLink
-Ddataverse.pid.perma1.authority=TEST
-Ddataverse.pid.perma1.shoulder=x/
-Ddataverse.pid.perma1.permalink.base-url=https://dataverse-test.bsc.es/dataset.xhtml?persistentId=perma:
-Ddataverse.pid.perma1.permalink.separator=/
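One way to double-check that the running server actually picked this configuration up (a sketch; the /api/pids/providers endpoint is assumed from the Dataverse 6.x native API guide, so verify it for your version):

```shell
# List the pid-related JVM options Payara actually has (run on the server).
$PAYARA/bin/asadmin list-jvm-options | grep dataverse.pid

# Ask the running application which PID providers it recognizes.
# (Endpoint assumed from the Dataverse 6.x native API; check the API guide.)
curl -s http://localhost:8080/api/pids/providers
```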
If we create a new dataset, its row in the Dataverse database (dvobject) shows:
id=501
protocol=perma
authority=TEST
identifier=x/GEGNYK
We can access the dataset page directly at https://dataverse-test.bsc.es/dataset.xhtml?persistentId=perma:TEST/x/GEGNYK
However, it never appears in Solr (numFound=0), and thus not in search results or dataset listings. Forced reindexing completes without errors, but the dataset is still missing from Solr. Logs show the custom PermaLinkPidProvider gets invoked (“Parsing in Perma...”), yet no dataset doc is created in Solr.
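Querying Solr directly can confirm whether the doc is truly absent rather than hidden by search filtering. A sketch, assuming the default collection1 core and the dsPersistentId and dvObjectType fields from Dataverse's Solr schema:

```shell
# Look for the specific dataset doc; numFound=0 means it was never created,
# as opposed to existing but being filtered out of search results.
curl -s "http://localhost:8983/solr/collection1/select?q=dsPersistentId:%22perma:TEST/x/GEGNYK%22&rows=1&wt=json"

# Compare the total number of dataset docs Solr holds with the database count.
curl -s "http://localhost:8983/solr/collection1/select?q=dvObjectType:datasets&rows=0&wt=json"
```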
We suspect the custom perma protocol isn’t being recognized by the indexing pipeline, possibly due to incomplete integration between PermaLinkPidProvider and Dataverse’s GlobalIdServiceBean or indexing logic. We removed :DoiProvider="Builtin" and confirmed the PID config is set to perma1 only. Still, the dataset never appears in Solr.
We are looking for guidance on how to properly integrate a custom PID provider (“perma1”) so new “perma:…” datasets are fully indexed in Dataverse.
Strange. Can you please show us your server.log?
Thanks!
Interesting. Here's the error:
Exception Description: Syntax error parsing [SELECT fm.id FROM FileMetadata fm, DvObject dvo WHERE fm.datasetVersion.id=(SELECT dv.id FROM DatasetVersion dv WHERE dv.dataset.id=dvo.owner.id and dv.versionState=edu.harvard.iq.dataverse.DatasetVersion.VersionState.RELEASED ORDER BY dv.versionNumber DESC, dv.minorVersionNumber DESC LIMIT 1) AND dvo.id=fm.dataFile.id AND fm.dataFile.id=:fid].
[232, 232] The right parenthesis is missing from the sub-expression.
[291, 292] The ORDER BY clause has 'dv.minorVersionNumber DESC ' and 'LIMIT 1' that are not separated by a comma.
[292, 299] The order by item is not a valid expression.
[299, 350] The query contains a malformed ending.
Which @Omer M Fahim saw in this issue: https://github.com/IQSS/dataverse/issues/11069
https://github.com/IQSS/dataverse/pull/11072 is the fix
@Simon Carroll which version of Dataverse are you running?
Philip Durbin 🚀 said:
Simon Carroll which version of Dataverse are you running?
6.4
Ok, the fix above is in 6.5 (#community > Dataverse 6.5 is here! )
Hi Philip
I can now create datasets and the permalink seems to work, but nothing is indexed. I don't really see any relevant errors. Any ideas what could be happening?
Current config below :
-Ddataverse.files.archive.type=trs
-Ddataverse.files.archive.base-url=https://transfer1.bsc.es/gpfs/archive/
-Ddataverse.files.archive.ingestsizelimit=0
-Ddataverse.files.archive.label=archive
-Ddataverse.pid.datacite.username=testaccount
-Ddataverse.pid.datacite.password=${ALIAS=doi_password_alias}
-Ddataverse.pid.perma1.type=perma
-Ddataverse.pid.perma1.label=PermaLink
-Ddataverse.pid.perma1.authority=TEST
-Ddataverse.pid.perma1.permalink.separator=/
-Ddataverse.pid.default-provider=perma1
-Ddataverse.pid.providers=perma1
-Ddataverse.spi.pidproviders.directory=/usr/local/payara6/glassfish/domains/domain1/config/pidproviders
-Ddataverse.pid.perma1.permalink.base-url=https://dataverse-test.bsc.es/dataset.xhtml?persistentId=perma:
server.log
I attach the relevant part of the log
even doing this
[rocky@dataverse-test ~]$ curl http://localhost:8080/api/admin/index/dataset?persistentId=perma:TEST/JEZUF6
{"status":"OK","data":{"message":"Reindexed dataset perma:TEST/JEZUF6","id":517,"persistentId":"perma:TEST/JEZUF6","versions":[{"semanticVersion":"DRAFT","id":104}]}}
I still don't see it :thinking:
Hmm, what do you have in /usr/local/payara6/glassfish/domains/domain1/config/pidproviders?
Hi Philip. It is empty.
Ok, could that line be causing you trouble? What if you remove it? This one: -Ddataverse.spi.pidproviders.directory=/usr/local/payara6/glassfish/domains/domain1/config/pidproviders
Should I duplicate some of the config from domain.xml to there?
Oh, wait, you started by saying you have a custom PID provider.
In what way is it custom?
It depends on the definition. We just want to link the PID to the URL of the dataset in Dataverse.
Ok, the regular perma provider should work fine for that.
It's funny, right now for development and demos we use a weird FAKE DOI provider but we've talked about switching to the perma provider. Maybe we should go ahead and switch. Then you'd have a working config!
Just deleted that option and I still don't have new datasets indexed.
Here's what I'm thinking. Maybe I could convert the compose.yml in our tutorial at https://guides.dataverse.org/en/6.5/container/running/demo.html to use perma instead of FAKE. If it works, if datasets are indexed properly, maybe you can try it and compare to your config. How does that sound?
It sounds great for us!
Please, if we can help let us know.
@Simon Carroll please take a look at this: https://github.com/IQSS/dataverse/pull/11108
Good morning! Thanks a lot for the support.
The config works nicely for us. However, when I create a dataset like this, https://dataverse-test.bsc.es/dataset.xhtml?persistentId=perma:BSCTEST/VPJRN7, I don't see the new dataset indexed.
image.png
Even if I do this :
curl http://localhost:8080/api/admin/index/dataset?persistentId=perma:BSCTEST/VPJRN7
{"status":"OK","data":{"message":"Reindexed dataset perma:BSCTEST/VPJRN7","id":534,"persistentId":"perma:BSCTEST/VPJRN7","versions":[{"semanticVersion":"DRAFT","id":110}]}}[rocky@dataverse-test ~]$
Hmm, could you please show us a fresh copy of server.log?
server.adddata.log
Good morning! I attach a log of creating a dataset: https://dataverse-test.bsc.es/citation?persistentId=perma:BSCTEST/0DBSEH
server.index.log Here's a couple of lines after manually indexing it
and the full log in case it's useful
full.log
Unfortunately, I can't find anything relevant in the logs. Should we do a "diff" between our two configs? I'm using the compose.yml from https://github.com/IQSS/dataverse/pull/11108 . What other settings are you using?
jvmoptions.out
I am not using Docker in test at the moment, but here is the JVM options output.
Very strange one. I do see something in the logs with the shoulder "x", but we removed it. The subsequent indexing should fix it anyway, though?
Well, if a dataset was saved with a certain PID due to previous configurations it will still have that PID. Reconfiguring Dataverse won't change the dataset's PID.
I wonder if we can have you start with a "known good" config that indexes datasets properly. Then add more config that you need. Does that make sense?
Because it seems like you are starting with a config that isn't working well, at least for indexing datasets.
Yeah, well, it worked before switching to perma1.
What can I do, change the default provider?
-Ddataverse.pid.providers=fake
-Ddataverse.pid.fake.type=FAKE
-Ddataverse.pid.fake.label=Fake DOI Provider
-Ddataverse.pid.fake.authority=10.5072
-Ddataverse.pid.fake.shoulder=FK2/
Interesting. So perma1 caused all the problems. :grimacing:
this worked before
This is how I'm setting the default provider in the PR above:
-Ddataverse.pid.default-provider=perma1
we only differ on authority in the pid settings
and of course the fact we switched from one to the other
[root@dataverse-test rocky]# $PAYARA/bin/asadmin list-jvm-options | grep pid
Picked up JAVA_TOOL_OPTIONS: -Djdk.util.zip.disableZip64ExtraFieldValidation=true --add-opens=java.base/java.io=ALL-UNNAMED
-Ddataverse.pid.datacite.mds-api-url=https://mds.test.datacite.org/
-Ddataverse.pid.datacite.rest-api-url=https://api.test.datacite.org
-Ddataverse.pid.perma1.type=perma
-Ddataverse.pid.perma1.label=Perma1
-Ddataverse.pid.perma1.authority=BSCTEST
-Ddataverse.pid.perma1.permalink.separator=/
-Ddataverse.pid.default-provider=perma1
-Ddataverse.pid.providers=perma1
Right.
Could you please run this for me?
curl http://localhost:8080/api/admin/index/status
[root@dataverse-test rocky]# curl http://localhost:8080/api/admin/index/status
{"status":"OK","data":{"message":"Index Status Batch Job initiated, check log for job status."}}
Great. Now can you please look at server.log? You should see output that looks something like this:
dev_dataverse>   Beginning indexStatus()|#]
dev_dataverse>
dev_dataverse> [#|2024-12-20T15:15:22.856+0000|INFO|Payara 6.2024.6|edu.harvard.iq.dataverse.search.IndexBatchServiceBean|_ThreadID=493;_ThreadName=__ejb-thread-pool6;_TimeMillis=1734707722856;_LevelValue=800;|
dev_dataverse>   checking for stale or missing dataverses|#]
dev_dataverse>
dev_dataverse> [#|2024-12-20T15:15:22.868+0000|INFO|Payara 6.2024.6|edu.harvard.iq.dataverse.search.IndexBatchServiceBean|_ThreadID=493;_ThreadName=__ejb-thread-pool6;_TimeMillis=1734707722868;_LevelValue=800;|
dev_dataverse>   checking for stale or missing datasets|#]
dev_dataverse>
dev_dataverse> [#|2024-12-20T15:15:22.870+0000|INFO|Payara 6.2024.6|edu.harvard.iq.dataverse.search.IndexBatchServiceBean|_ThreadID=493;_ThreadName=__ejb-thread-pool6;_TimeMillis=1734707722870;_LevelValue=800;|
dev_dataverse>   completed check for stale or missing content.|#]
dev_dataverse>
dev_dataverse> [#|2024-12-20T15:15:22.870+0000|INFO|Payara 6.2024.6|edu.harvard.iq.dataverse.search.IndexBatchServiceBean|_ThreadID=493;_ThreadName=__ejb-thread-pool6;_TimeMillis=1734707722870;_LevelValue=800;|
dev_dataverse>   checking for dataverses in Solr only|#]
dev_dataverse>
dev_solr> 2024-12-20 15:15:22.875 INFO  (qtp1331270134-57) [c: s: r: x:collection1 t:null-5] o.a.s.c.S.Request webapp=/solr path=/select params={q=*&cursorMark=*&sort=id+asc&fq=dvObjectType:dataverses&rows=100&wt=javabin&version=2} hits=0 status=0 QTime=2
dev_dataverse> [#|2024-12-20T15:15:22.878+0000|INFO|Payara 6.2024.6|edu.harvard.iq.dataverse.search.IndexBatchServiceBean|_ThreadID=493;_ThreadName=__ejb-thread-pool6;_TimeMillis=1734707722878;_LevelValue=800;|
dev_dataverse>   checking for datasets in Solr only|#]
dev_dataverse>
dev_solr> 2024-12-20 15:15:22.882 INFO  (qtp1331270134-28) [c: s: r: x:collection1 t:null-6] o.a.s.c.S.Request webapp=/solr path=/select params={q=*&cursorMark=*&sort=id+asc&fq=dvObjectType:datasets&rows=100&wt=javabin&version=2} hits=0 status=0 QTime=1
dev_dataverse> [#|2024-12-20T15:15:22.883+0000|INFO|Payara 6.2024.6|edu.harvard.iq.dataverse.search.IndexBatchServiceBean|_ThreadID=493;_ThreadName=__ejb-thread-pool6;_TimeMillis=1734707722883;_LevelValue=800;|
dev_dataverse>   checking for files in Solr only|#]
dev_dataverse>
dev_solr> 2024-12-20 15:15:22.884 INFO  (qtp1331270134-56) [c: s: r: x:collection1 t:null-7] o.a.s.c.S.Request webapp=/solr path=/select params={q=*&cursorMark=*&sort=id+asc&fq=dvObjectType:files&rows=100&wt=javabin&version=2} hits=0 status=0 QTime=0
dev_dataverse> [#|2024-12-20T15:15:22.885+0000|INFO|Payara 6.2024.6|edu.harvard.iq.dataverse.search.IndexBatchServiceBean|_ThreadID=493;_ThreadName=__ejb-thread-pool6;_TimeMillis=1734707722885;_LevelValue=800;|
dev_dataverse>   completed check for content in Solr but not database|#]
dev_dataverse>
dev_dataverse> [#|2024-12-20T15:15:22.886+0000|INFO|Payara 6.2024.6|edu.harvard.iq.dataverse.search.IndexServiceBean|_ThreadID=493;_ThreadName=__ejb-thread-pool6;_TimeMillis=1734707722886;_LevelValue=800;|
dev_dataverse>   Checking for solr-only permissions|#]
dev_dataverse>
dev_solr> 2024-12-20 15:15:22.887 INFO  (qtp1331270134-57) [c: s: r: x:collection1 t:null-8] o.a.s.c.S.Request webapp=/solr path=/select params={q=definitionPointDvObjectId:*&cursorMark=*&sort=id+asc&rows=1000&wt=javabin&version=2} hits=0 status=0 QTime=0
dev_dataverse> [#|2024-12-20T15:15:22.888+0000|INFO|Payara 6.2024.6|edu.harvard.iq.dataverse.search.IndexBatchServiceBean|_ThreadID=493;_ThreadName=__ejb-thread-pool6;_TimeMillis=1734707722888;_LevelValue=800;|
dev_dataverse>   checking for permissions in database but stale or missing from Solr|#]
dev_dataverse>
dev_dataverse> [#|2024-12-20T15:15:22.890+0000|INFO|Payara 6.2024.6|edu.harvard.iq.dataverse.search.IndexBatchServiceBean|_ThreadID=493;_ThreadName=__ejb-thread-pool6;_TimeMillis=1734707722890;_LevelValue=800;|
dev_dataverse>   completed checking for permissions in database but stale or missing from Solr|#]
dev_dataverse>
dev_dataverse> [#|2024-12-20T15:15:22.890+0000|INFO|Payara 6.2024.6|edu.harvard.iq.dataverse.search.IndexBatchServiceBean|_ThreadID=493;_ThreadName=__ejb-thread-pool6;_TimeMillis=1734707722890;_LevelValue=800;|
dev_dataverse>   contentInDatabaseButStaleInOrMissingFromIndex: {"dataverses":[],"datasets":[]}|#]
dev_dataverse>
dev_dataverse> [#|2024-12-20T15:15:22.890+0000|INFO|Payara 6.2024.6|edu.harvard.iq.dataverse.search.IndexBatchServiceBean|_ThreadID=493;_ThreadName=__ejb-thread-pool6;_TimeMillis=1734707722890;_LevelValue=800;|
dev_dataverse>   contentInIndexButNotDatabase: {"dataverses":[],"datasets":[],"files":[]}|#]
dev_dataverse>
dev_dataverse> [#|2024-12-20T15:15:22.891+0000|INFO|Payara 6.2024.6|edu.harvard.iq.dataverse.search.IndexBatchServiceBean|_ThreadID=493;_ThreadName=__ejb-thread-pool6;_TimeMillis=1734707722891;_LevelValue=800;|
dev_dataverse>   permissionsInDatabaseButStaleInOrMissingFromIndex: {"dvobjects":[]}|#]
dev_dataverse>
dev_dataverse> [#|2024-12-20T15:15:22.891+0000|INFO|Payara 6.2024.6|edu.harvard.iq.dataverse.search.IndexBatchServiceBean|_ThreadID=493;_ThreadName=__ejb-thread-pool6;_TimeMillis=1734707722891;_LevelValue=800;|
dev_dataverse>   permissionsInIndexButNotDatabase: {"permissions":[]}|#]
dev_dataverse>
Thanks. It's interesting. It's claiming that all the datasets in your database are in your Solr index:
contentInDatabaseButStaleInOrMissingFromIndex: {"dataverses":[],"datasets":[]}
Full log in a second. I need to head out, sorry, I will get back to you later.
Thanks a lot!
No worries. Next I would see what you can see from the API: http://localhost:8080/api/search?q=*
Or even http://localhost:8080/api/search?q=*&per_page=1000 (1000 is the max)
That is, can you see more datasets etc from the Search API than you can see from the web interface? :thinking:
curl  http://localhost:8080/api/search?q=*
{"status":"OK","data":{"q":"*","total_count":83,"start":0,"spelling_alternatives":{},"items":[{"name":"flgomezc Sandbox Dataverse","type":"dataverse","url":"https://dataverse-test.bsc.es/dataverse/flgomezcsandbox","identifier":"flgomezcsandbox","description":"This dataverse is created with testing purposes","published_at":"2024-09-10T13:59:55Z","publicationStatuses":["Published"],"affiliation":"BSC","parentDataverseName":"BSC","parentDataverseIdentifier":"BSC"},{"name":"Simon Carroll Dataverse","type":"dataverse","url":"https://dataverse-test.bsc.es/dataverse/test1","identifier":"test1","published_at":"2024-09-10T08:04:42Z","publicationStatuses":["Published"],"parentDataverseName":"BSC","parentDataverseIdentifier":"BSC"},{"name":"Dataverse Admin Dataverse","type":"dataverse","url":"https://dataverse-test.bsc.es/dataverse/subverse","identifier":"subverse","published_at":"2024-09-10T08:05:38Z","publicationStatuses":["Published"],"affiliation":"Dataverse.org","parentDataverseName":"BSC","parentDataverseIdentifier":"BSC"},{"name":"Life Sciences","type":"dataverse","url":"https://dataverse-test.bsc.es/dataverse/lifesciences","identifier":"lifesciences","description":"Computational Biology Life Sciences Group Dataverse","published_at":"2024-11-05T12:04:33Z","publicationStatuses":["Published"],"affiliation":"BSC","parentDataverseName":"BSC","parentDataverseIdentifier":"BSC"},{"name":"cyCLONE","type":"dataverse","url":"https://dataverse-test.bsc.es/dataverse/cyclone","identifier":"cyclone","description":"cyCLONE: CLimate change impacts on the Onset of diseases and Eco-anxiety in Catalonia Climate change is long known to have an impact on different aspects of human health. CLIMATICAT addresses the impacts of climate on the health of the Catalan population, focusing on mental health. We address this topic by integrating different kinds of data: temperature, air pollution, geographic movement, and health (diagnoses, exacerbations, deaths). 
Our analyses will lead to the identification of which groups of individuals are most vulnerable to climate change impacts, including anxiety caused by climate change existential threats (an estimation of the prevalence of eco-anxiety will be made), what diseases are the most expected to co-occur with mental health diseases, and how likely these co-occurring diseases are to be exacerbated by climate change events. All the results and data used in the analyses will be available through an interactive dashboard.","published_at":"2024-11-05T12:05:01Z","publicationStatuses":["Published"],"affiliation":"BSC","parentDataverseName":"Life Sciences","parentDataverseIdentifier":"lifesciences"},{"name":"Computational Social Sciences & Humanities","type":"dataverse","url":"https://dataverse-test.bsc.es/dataverse/CSSH","image_url":"https://dataverse-test.bsc.es/api/access/dvCardImage/344","identifier":"CSSH","description":"The BSC's newly created Computational Social Sciences programme a pioneering initiative in an European supercomputing centre, works to find new applications of data and supercomputing in social science research, including the fields of economics, political science, sociology, anthropology, geography, psychology and cognitive science, and in the digital humanities (an area of research where computing and the humanities converge, applied to fields such as history, literature, linguistics, archaeology and cultural heritage).","published_at":"2024-11-05T12:21:31Z","publicationStatuses":["Published"],"affiliation":"BSC","parentDataverseName":"BSC","parentDataverseIdentifier":"BSC"},{"name":"Earth Sciences","type":"dataverse","url":"https://dataverse-test.bsc.es/dataverse/earth_sciences","identifier":"earth_sciences","description":"Collection for the Earth Sciences department","published_at":"2024-11-19T10:54:15Z","publicationStatuses":["Published"],"affiliation":"BSC","parentDataverseName":"BSC","parentDataverseIdentifier":"BSC"},{"name":"UncertAIR 
","type":"dataverse","url":"https://dataverse-test.bsc.es/dataverse/uncertair","identifier":"uncertair","published_at":"2024-11-21T14:59:16Z","publicationStatuses":["Published"],"affiliation":"BSC ","parentDataverseName":"Earth Sciences","parentDataverseIdentifier":"earth_sciences"},{"name":"AJC-Dataverse","type":"dataverse","url":"https://dataverse-test.bsc.es/dataverse/ajc-dv","image_url":"https://dataverse-test.bsc.es/api/access/dvCardImage/396","identifier":"ajc-dv","description":"Lorem ipsum odor amet, consectetuer adipiscing elit. Porttitor quam curae mauris fames leo phasellus curae. Semper potenti ad feugiat primis montes dictumst augue magna. Pulvinar dolor eleifend ullamcorper adipiscing fringilla eu orci. Pellentesque praesent auctor netus sodales ultrices elit enim eget. Maximus nisi maecenas; duis amet ligula vel mattis. Est potenti cursus ac a sociosqu. Natoque hendrerit suspendisse ex cursus etiam dictumst. Conubia placerat ultrices fames lobortis nibh. Lorem nibh aptent tincidunt maecenas sed. Proin morbi viverra ullamcorper purus dui vitae risus. Urna consequat sagittis risus litora lacinia proin. Malesuada dui dictum vitae viverra est bibendum. Primis posuere curabitur duis auctor sociosqu platea porttitor leo suscipit. Pharetra iaculis senectus himenaeos aenean luctus ultrices. Risus malesuada non nisl praesent egestas nisl. Nibh per phasellus rutrum, suscipit duis mi? 
Velit congue nam montes elit cras venenatis sapien.","published_at":"2024-11-21T10:34:38Z","publicationStatuses":["Published"],"affiliation":"Barcelona Supercomputing Center","parentDataverseName":"BSC","parentDataverseIdentifier":"BSC"},{"name":"Paula Checchia Adell Dataverse TEST ","type":"dataverse","url":"https://dataverse-test.bsc.es/dataverse/test_paula","identifier":"test_paula","published_at":"2024-11-21T10:59:11Z","publicationStatuses":["Published"],"affiliation":"Barcelona Supercomputing Center ","parentDataverseName":"BSC","parentDataverseIdentifier":"BSC"}],"count_in_response":10}}
I'm quite sure we are seeing fewer with the API, if I am understanding correctly.
Good morning! Merry Christmas and Happy New Year!
I am checking what will be our production instance, and I am seeing the same:
image.png
0 datasets but here in the admin panel :
Could we be missing something fundamental? :thinking:
Strange. If we restart Solr, everything indexes correctly.
Happy New Year! Restarting Solr fixed it?!? :mind_blown:
It clears the backlog but new entries are not indexed until it is restarted.
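That symptom (new docs visible only after a restart) usually points at commit/searcher behavior rather than the indexing pipeline itself. A diagnostic sketch using Solr's standard update endpoint:

```shell
# Force a hard commit that opens a new searcher. If the missing datasets
# become visible after this (without restarting Solr), the indexing pipeline
# is fine and the problem is commit/searcher configuration in solrconfig.xml.
curl -s "http://localhost:8983/solr/collection1/update?commit=true&openSearcher=true"
```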
We're seeing something similar in both our instances. I wonder if it relates to the upgrade process, seeing as the pod configuration is completely different.
Oh, I was hoping it's fixed. Drat. Sounds like progress toward understanding the problem, at least. :thinking:
Any ideas how we can debug it a bit ? 
solr.log
Here is the Solr log during and immediately after uploading a dataset.
solr-restart.log
and here is the restart which indeed means the dataset is now visible in the service
I'm really not sure. How would I reproduce the problem on my laptop?
I guess it is difficult. I suppose the issue is coming from the upgrade, as it was working well before. Could it help to provide the JVM options? I suppose we have the same Solr version.
[rocky@dataverse-test ~]$ curl http://localhost:8983/solr/admin/info/system?wt=json
{
  "responseHeader":{
    "status":0,
    "QTime":47
  },
  "mode":"std",
  "solr_home":"/usr/local/solr/server/solr",
  "core_root":"/usr/local/solr/server/solr",
  "lucene":{
    "solr-spec-version":"9.3.0",
    "solr-impl-version":"9.3.0 de33f50ce79ec1d156faf204553012037e2bc1cb - houston - 2023-07-17 17:13:17",
    "lucene-spec-version":"9.7.0",
    "lucene-impl-version":"9.7.0 ccf4b198ec328095d45d2746189dc8ca633e8bcf - 2023-06-21 11:48:16"
  },
Initially we deployed 6.2 via the Ansible playbook at https://github.com/gdcc/dataverse-ansible and worked our way up.
Ansible. So this is a "classic" installation? Not Docker?
Yes! Exactly. We have a test and prod environment installed with Ansible.
[rocky@dataverse-test ~]$ /usr/local/solr/bin/solr -v
9.3.0
[rocky@dataverse-test ~]$ /usr/local/payara6/bin/asadmin version
Picked up JAVA_TOOL_OPTIONS: -Djdk.util.zip.disableZip64ExtraFieldValidation=true --add-opens=java.base/java.io=ALL-UNNAMED
Version = Payara Server 6.2023.8 #badassfish (build 920)
Command version executed successfully.
[rocky@dataverse-test ~]$ psql --version
psql (PostgreSQL) 13.16
[rocky@dataverse-test ~]$ httpd -v
Server version: Apache/2.4.37 (Rocky Linux)
Server built:   Aug 12 2024 08:13:30
[rocky@dataverse-test ~]$
You started on Dataverse 6.2, you said. What version are you on now?
6.5
Thanks. You started this thread on Dec 17 and 6.5 was released on Dec 12. Did you start seeing this problem on 6.5? 6.4? 6.3?
On Dec 17 we were running 6.4. I'm quite sure the dataset indexing was working.
We thought maybe https://github.com/IQSS/dataverse/issues/11069 was stopping our permalink from working so we moved to 6.5
Gotcha. Well, there's no harm in upgrading to 6.5.
Do you want to go ahead and open an issue at https://github.com/IQSS/dataverse/issues ?
Sure, tomorrow I'll try to write it up. Thanks!
Thanks. At least it will help us track it.
Did you try completely reinstalling Solr?
I just tried it
[rocky@dataverse-test ~]$ curl http://localhost:8080/api/admin/index/status
{"status":"OK","data":{"message":"Index Status Batch Job initiated, check log for job status."}}[rocky@dataverse-test ~]$
[rocky@dataverse-test ~]$ curl http://localhost:8080/api/admin/index/clear-orphans
{"status":"OK","data":{"message":"Clear Orphans Batch Job initiated, check log for job status."}}[rocky@dataverse-test ~]$
[rocky@dataverse-test ~]$ curl http://localhost:8080/api/admin/index/clear
{"status":"OK","data":{"numRowsClearedByClearAllIndexTimes":238,"message":"Solr index and database index timestamps cleared."}}[rocky@dataverse-test ~]$
[rocky@dataverse-test ~]$ curl http://localhost:8080/api/admin/index
{"status":"OK","data":{"availablePartitionIds":[0],"args":{"numPartitions":1,"partitionIdToProcess":0},"message":"indexAllOrSubset has begun of 43 dataverses and 67 datasets."}}
After running that, it is still required to restart Solr to see the new records :mind_blown:
https://github.com/IQSS/dataverse/issues/11146 done!
Thanks for opening that issue.
And sorry, when I said completely reinstalling Solr I meant stopping it, doing an rm -rf of the Solr directory (or moving it aside), and then installing fresh according to https://guides.dataverse.org/en/6.5/installation/prerequisites.html#installing-solr
Hey yes. I'll update the issue. I moved it aside and did a comprehensive reinstall.
And the problem is still there. D'oh! 
Yes. I am trying to think what the restart of Solr can trigger that is not triggered in the dataset creation flow.
You wrote "deployed 6.2 via ansible and upgraded? I think this will be hard to reproduce easily."
I imagine a number of installations deployed initially with Ansible. It's very popular.
Maybe not so many with 6.2 specifically.
Sure. Do you know of many installations already using 6.4/6.5?
We're running 6.5 at https://dataverse.harvard.edu
and I guess you have upgraded in, more or less, the same way
https://dataverse.org/metrics shows three on 6.5
Yes, I imagine the upgrade was similar.
We still have "perma" in this Zulip topic. Do we suspect it might be related to the perma PID provider? We don't use the perma provider for Harvard Dataverse.
I'm just trying to think of what could be different in your setup.
Yes, maybe "indexing problem" is better.
Philip Durbin ☃️ said:
We still have "perma" in this Zulip topic. Do we suspect it might be related to the perma PID provider? We don't use the perma provider for Harvard Dataverse.
Although it indexes fine if you restart Solr.
Good idea. I renamed it to "indexing problem". Thanks.
After you restart Solr, do you have to manually reindex the dataset for it to be visible in the GUI?
Nope, not required.
Hmm, well, these days there is a delay when you create a dataset before it's indexed. But it should only be a delay of maybe 5 seconds? 10? Have you tried simply waiting a bit and then refreshing the page?
This change was introduced in 6.3: IQSS/10559-2 Drop COMMIT_WITHIN which breaks autoSoftCommit by maxTime in solrconfig #10654
And this PR: Solr: Try Soft Commit on Indexing #10547
Yes, indeed we were testing for some days without seeing anything in the GUI until we restarted Solr :hurt:
Maybe we should ask on the Solr mailing list. Do you think it could be a problem with Solr?
What if we have you push some data into Solr outside of Dataverse. With curl or whatever?
To see if it shows up right away or not.
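A minimal sketch of that test (the field set here is illustrative; the add will be rejected unless the fields match the collection1 schema):

```shell
# Add a throwaway doc with an explicit commit...
curl -s -X POST "http://localhost:8983/solr/collection1/update?commit=true" \
  -H 'Content-Type: application/json' \
  -d '[{"id":"smoke-test-1"}]'

# ...then query it back immediately. If this doc shows up right away while
# Dataverse's docs do not, the difference is in how commits are issued.
curl -s "http://localhost:8983/solr/collection1/select?q=id:smoke-test-1"

# Clean up afterwards.
curl -s -X POST "http://localhost:8983/solr/collection1/update?commit=true" \
  -H 'Content-Type: application/json' \
  -d '{"delete":{"id":"smoke-test-1"}}'
```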
Philip Durbin ☃️ said:
What if we have you push some data into Solr outside of Dataverse. With curl or whatever?
good idea
Do you need help with this?
2025-01-10 15:09:49.733 INFO  (searcherExecutor-12-thread-1-processing-collection1) [ x:collection1] o.a.s.c.SolrCore Registered new searcher autowarm time: 0 ms
2025-01-10 15:10:15.291 INFO  (qtp240630125-27) [ x:collection1] o.a.s.c.S.Request webapp=/solr path=/select params={facet.field=dvObjectType&facet.field=metadata_type_ss&facet.field=datasetType&facet.field=dvCategory&facet.field=metadataSource&facet.field=publicationDate&facet.field=publicationStatus&facet.field=license&facet.field=authorName_ss&facet.field=subject_ss&facet.field=keywordValue_ss&facet.field=dateOfDeposit_s&facet.field=fileTypeGroupFacet&facet.field=fileTag&facet.field=fileAccess&qt=/select&hl=true&fl=*,score&start=0&sort=dateSort+desc&fq=dvObjectType:(dataverses+OR+datasets)&rows=10&version=2&hl.simple.pre=<span+class%3D"search-term-match">&facet.query=*&hl.snippets=1&q=*&hl.simple.post=</span>&hl.fl=dateOfCollectionStart&hl.fl=software&hl.fl=resolution.Temporal&hl.fl=responseRate&hl.fl=distributionDate&hl.fl=geographicBoundingBox&hl.fl=coverage.Redshift.MaximumValue&hl.fl=contributorName&hl.fl=affiliation_ss&hl.fl=datasetContactName&hl.fl=studyAssayOtherPlatform&hl.fl=dsDescriptionDate&hl.fl=coverage.Temporal&hl.fl=socialScienceNotesText&hl.fl=datasetContactAffiliation&hl.fl=journalArticleType&hl.fl=producerLogoURL&hl.fl=datasetLevelErrorNotes&hl.fl=state&hl.fl=timePeriodCoveredStart&hl.fl=otherIdValue&hl.fl=dateOfCollectionEnd&hl.fl=studyAssayOrganism&hl.fl=socialScienceNotesType&hl.fl=publicationIDType&hl.fl=seriesName&hl.fl=alternativeURL&hl.fl=coverage.Spectral.Bandpass&hl.fl=fileTypeDisplay&hl.fl=producerURL&hl.fl=authorAffiliation&hl.fl=astroType&hl.fl=redshiftType&hl.fl=astroFacility&hl.fl=producerAffiliation&hl.fl=coverage.Depth&hl.fl=authorName&hl.fl=subtitle&hl.fl=timeMethod&hl.fl=depositor&hl.fl=otherDataAppraisal&hl.fl=authorIdentifierScheme&hl.fl=coverage.Spectral.Wavelength&hl.fl=softwareVersion&hl.fl=coverage.SkyFraction&hl.fl=actionsToMinimizeLoss&hl.fl=city&hl.fl=dsDescriptionValue&hl.fl=characteristicOfSources&hl.fl=coverage.Temporal.StopTime&hl.fl=studyAssayOtherOrganism&hl.fl=alternativeTitle&hl.fl=topicClassVocabURI&hl.fl=jour
nalVolumeIssue&hl.fl=distributorURL&hl.fl=literalQuestion&hl.fl=distributorName&hl.fl=coverage.Spatial&hl.fl=journalPubDate&hl.fl=distributorAffiliation&hl.fl=fileNameWithoutExtension&hl.fl=topicClassVocab&hl.fl=journalVolume&hl.fl=topicClassValue&hl.fl=distributorAbbreviation&hl.fl=publicationRelationType&hl.fl=originOfSources&hl.fl=resolution.Spectral&hl.fl=variableNotes&hl.fl=westLongitude&hl.fl=coverage.Polarization&hl.fl=studyFactorType&hl.fl=coverage.Spectral.CentralWavelength&hl.fl=keywordTermURI&hl.fl=dsPersistentId&hl.fl=distributor&hl.fl=dsPublicationDate&hl.fl=topicClassification&hl.fl=studyOtherDesignType&hl.fl=series&hl.fl=dsDescription&hl.fl=dateOfDeposit&hl.fl=coverage.Temporal.StartTime&hl.fl=frequencyOfDataCollection&hl.fl=socialScienceNotes&hl.fl=astroObject&hl.fl=coverage.Spectral.MinimumWavelength&hl.fl=variableLabel&hl.fl=timePeriodCovered&hl.fl=studyAssayOtherTechnologyType&hl.fl=country&hl.fl=studyAssayPlatform&hl.fl=subject&hl.fl=studyAssayOtherMeasurmentType&hl.fl=interviewInstructions&hl.fl=accessToSources&hl.fl=language&hl.fl=controlOperations&hl.fl=samplingProcedure&hl.fl=keywordVocabulary&hl.fl=coverage.ObjectCount&hl.fl=otherReferences&hl.fl=coverage.Redshift.MinimumValue&hl.fl=eastLongitude&hl.fl=producerAbbreviation&hl.fl=publication&hl.fl=unitOfAnalysis&hl.fl=postQuestion&hl.fl=keyword&hl.fl=fileTags&hl.fl=dataSources&hl.fl=distributorLogoURL&hl.fl=relatedMaterial&hl.fl=filePersistentId&hl.fl=variableUniverse&hl.fl=otherIdAgency&hl.fl=author&hl.fl=targetSampleActualSize&hl.fl=journalIssue&hl.fl=publicationIDNumber&hl.fl=coverage.ObjectDensity&hl.fl=publicationCitation&hl.fl=dataCollector&hl.fl=productionPlace&hl.fl=weighting&hl.fl=softwareName&hl.fl=grantNumberAgency&hl.fl=name&hl.fl=producer&hl.fl=contributorType&hl.fl=notesText&hl.fl=publicationURL&hl.fl=studyDesignType&hl.fl=keywordValue&hl.fl=geographicCoverage&hl.fl=description&hl.fl=resolution.Spatial&hl.fl=title&hl.fl=keywordVocabularyURI&hl.fl=studyOtherFactorType&hl.fl=produ
ctionDate&hl.fl=contributor&hl.fl=otherGeographicCoverage&hl.fl=northLatitude&hl.fl=collectorTraining&hl.fl=kindOfData&hl.fl=studyAssayCellType&hl.fl=coverage.RedshiftValue&hl.fl=timePeriodCoveredEnd&hl.fl=southLatitude&hl.fl=socialScienceNotesSubject&hl.fl=variableName&hl.fl=dataCollectionSituation&hl.fl=otherId&hl.fl=dateOfCollection&hl.fl=collectionMode&hl.fl=studyAssayTechnologyType&hl.fl=deviationsFromSampleDesign&hl.fl=producerName&hl.fl=targetSampleSizeFormula&hl.fl=coverage.Spectral.MaximumWavelength&hl.fl=authorIdentifier&hl.fl=relatedDatasets&hl.fl=datasetContact&hl.fl=researchInstrument&hl.fl=samplingErrorEstimates&hl.fl=astroInstrument&hl.fl=seriesInformation&hl.fl=datasetContactEmail&hl.fl=studyAssayMeasurementType&hl.fl=targetSampleSize&hl.fl=geographicUnit&hl.fl=universe&hl.fl=resolution.Redshift&hl.fl=grantNumberValue&hl.fl=grantNumber&hl.fl=cleaningOperations&hl.fl=fileType&facet=true&wt=javabin} hits=6 status=0 QTime=181
2025-01-10 15:10:15.344 INFO  (qtp240630125-19) [ x:collection1] o.a.s.c.S.Request webapp=/solr path=/select params={facet.query=*&q=*&facet.field=dvObjectType&qt=/select&fl=*,score&start=0&fq=dvObjectType:(files)&rows=1&facet=true&wt=javabin&version=2} hits=1 status=0 QTime=9
This appears after restarting Solr. I need to look a bit at how Solr works.
It's a lot of output, but it might be normal. I'm not sure.
Here, please try this:
bin/solr post -commit techproducts example/exampledocs/* -url "http://localhost:8983/solr/collection1/update?commit=true"
It's adding Solr docs like this:
{
  "id": "viewsonic",
  "compName_s": "ViewSonic Corp",
  "address_s": "381 Brea Canyon Road Walnut, CA 91789-0708",
  "_version_": 1820876012009816073
}
I'm sort of following https://solr.apache.org/guide/solr/9_6/getting-started/tutorial-techproducts.html
-c isn't working but -commit does :shrug:
Maybe try it without -c as well.
solrCreate.log
Good morning! It seems to work both in a new collection I created and in collection1 of Dataverse
solrCreateBinSolr.log
(in line with your suggested way).
Great! But what is "it"? Are you talking about bin/solr post -commit techproducts example/exampledocs/* -url "http://localhost:8983/solr/collection1/update?commit=true"?
Well, just to post something and see it in a collection (without having to restart Solr). I didn't use the example docs as they don't seem to match the schema for collection1. Does it make sense, or have I missed something?
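To post something that does match collection1, here's a minimal sketch of what I mean (assumes Solr at localhost:8983; `id` is the uniqueKey in Dataverse's stock schema and `dvObjectType` is one of its fields, but verify both against your own schema — the doc values are made up for illustration):

```shell
# Write a tiny hand-made doc using fields from the collection1 schema
cat > /tmp/mindoc.json <<'EOF'
[{"id": "test_doc_1", "dvObjectType": "datasets"}]
EOF

# Post it with an explicit commit so it is visible immediately
curl -s 'http://localhost:8983/solr/collection1/update?commit=true' \
  -H 'Content-Type: application/json' --data-binary @/tmp/mindoc.json

# Query it back and check numFound / the returned docs
curl -s 'http://localhost:8983/solr/collection1/select?q=id:test_doc_1'
```

Afterwards the probe doc can be removed by posting `{"delete":{"id":"test_doc_1"}}` to the same `/update` endpoint.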
I am going to add a Solr debug-level log capture just after adding a dataset. I'm not able to make sense of it yet, but perhaps it helps.
solr_capture_20250114_101226.log
I have changed these settings in /var/solr/data/collection1/conf/solrconfig.xml and now it works!
    <!--
    <autoCommit>
      <maxTime>${solr.autoCommit.maxTime:15000}</maxTime>
      <openSearcher>false</openSearcher>
    </autoCommit>
    -->
    <autoCommit>
      <maxTime>15000</maxTime> <!-- 15 seconds -->
      <openSearcher>true</openSearcher> <!-- ensure the searcher is refreshed -->
    </autoCommit>
    <!-- softAutoCommit is like autoCommit except it causes a
         'soft' commit which only ensures that changes are visible
         but does not ensure that data is synced to disk.  This is
         faster and more near-realtime friendly than a hard commit.
      -->
<!--
    <autoSoftCommit>
      <maxTime>${solr.autoSoftCommit.maxTime:-1}</maxTime>
    </autoSoftCommit>
-->
  <autoSoftCommit>
    <maxTime>2000</maxTime> <!-- 2 seconds for soft commits -->
  </autoSoftCommit>
Great! So you changed it to 2 seconds?
yes and openSearcher true
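A quick way to confirm the soft-commit window is doing the work, as a sketch (assumes the 2-second autoSoftCommit above, Solr at localhost:8983, and Dataverse's stock collection1 schema; the `softcommit_probe` id is made up for illustration):

```shell
# Probe doc; "id" is the uniqueKey in Dataverse's stock schema
cat > /tmp/probe.json <<'EOF'
[{"id": "softcommit_probe"}]
EOF

# Post WITHOUT commit=true, relying on autoSoftCommit for visibility
curl -s 'http://localhost:8983/solr/collection1/update' \
  -H 'Content-Type: application/json' --data-binary @/tmp/probe.json

sleep 3  # longer than the 2-second soft-commit window

# The doc should now be searchable without any explicit commit
curl -s 'http://localhost:8983/solr/collection1/select?q=id:softcommit_probe'
```

If the last query still shows nothing, the autoSoftCommit setting isn't taking effect.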
Is this a bug in Dataverse? Again, we changed these values in this pull request: Solr: Try Soft Commit on Indexing #10547
Well, it seems it doesn't work with the old Solr configuration, so maybe just an advisory in the upgrade notes? Then again, if nobody else has seen the issue with 6.4 :thinking:
You've been good about updating https://github.com/IQSS/dataverse/issues/11146
Can you please leave another comment about your latest discoveries?
Sure, just added it! I guess we can resolve this one? If someone needs to, they can address it there. Many thanks for all the help!
Thanks. Let me try to understand the situation. Your initial installation was Dataverse 6.2, deployed with Ansible. Then you went through some upgrades. As part of those upgrades you should have been told to update your Solr config, but this information wasn't in the release notes? Is it something like that?
Hi! Yes, unless I have missed something. I think if someone has an installation deployed via Ansible from anything pre-6.4, they would have a Solr configuration like ours that required changing?
Let's see. The autocommit change was introduced in #10547 which landed in Dataverse 6.3. The release notes at https://github.com/IQSS/dataverse/releases/tag/v6.3 say to stop the Solr 9.3.0 installation and install Solr 9.4.1 fresh. And it says to use certain config files with Solr 9.4.1 that include the autocommit changes.
Also, under the first bullet of the release notes it says, "Dataverse now relies on the autoCommit and autoSoftCommit settings in the Solr configuration instead of explicitly committing documents to the Solr index. This improves indexing speed."
So I think the release notes are in decent shape. Do you happen to recall switching from Solr 9.3.0 to 9.4.1?
Ah ha! I didn't handle that upgrade myself (only 6.4 to 6.5), so I completely missed that. We are in fact still running 9.3.0, but it runs fine with the new config. I also tested 9.7.0 out of frustration and it works!
Easy to miss.
So, again, many thanks! At least it is here in case someone else makes the same mistake :).
Yes, exactly. Do you want to go ahead and close https://github.com/IQSS/dataverse/issues/11146 and point to this Zulip thread?
Simon Carroll has marked this topic as resolved.