Stream: troubleshooting

Topic: issue with solr indexing messages


Deirdre Kirmis (Apr 07 2025 at 19:28):

Hello .. I've noticed lately that when I initiate a reindex in all three of our Dataverse installations, the following is occurring:

When I type "curl http://localhost:8080/api/admin/index/clear" .. the log shows "attempting to delete all Solr documents before a complete re-index" and the console shows "{"status":"OK","data":{"numRowsClearedByClearAllIndexTimes":3,"message":"Solr index and database index timestamps cleared."}}"

.. however, the log never comes back with an "All cleared" message ..

.. then, if I type "curl http://localhost:8080/api/admin/index" .. the console shows "{"status":"OK","data":{"availablePartitionIds":[0],"args":{"numPartitions":1,"partitionIdToProcess":0},"message":"indexAllOrSubset has begun of 1 dataverses and 1 datasets."}}" .... and the log goes through the indexing of all the dataverses and datasets (in this case there is only one) .. and then shows this message "1 dataverses and 1 datasets indexed. index all took 249 milliseconds. Solr index was not cleared before indexing."

.. the console never comes back with a "completed" message

.. the datasets all appear on the UI and seem to be indexed .. ?

I did find a few issues related to this, but they were from 7 years ago. There were some similar ones more recently, but in those cases there was an exception error and some of the datasets failed to index. I did reinstall Solr and this particular installation is brand new. And then created one dataset. =)

We are running v6.6 on our dev site (the one above with 1 dataset), which was a complete new install. QA and prod are running v6.5 and both experience the same thing, QA has 0 datasets currently and prod has almost 100.

.. all of the "checks" messages in the logs seem normal when i check the status or timestamps ..

Appreciate any thoughts or ideas!

Philip Durbin 🚀 (Apr 07 2025 at 19:31):

Hmm, any errors in server logs?

Deirdre Kirmis (Apr 07 2025 at 19:32):

no nothing .. the "index was not cleared" message is the last one .. until i check the status and then i get all the normal status messages

Deirdre Kirmis (Apr 07 2025 at 19:36):

maybe i need to set the log level higher .. i think it is just at the default

Philip Durbin 🚀 (Apr 07 2025 at 19:38):

This is what I see (I'm using Docker and only have the root collection and running the latest in the "develop" branch):

dev_dataverse>   indexing dataverse 1 of 1 (id=1, persistentId=root)|#]
dev_dataverse>
dev_dataverse> [#|2025-04-07T19:37:53.336+0000|INFO|Payara 6.2025.2|edu.harvard.iq.dataverse.search.IndexBatchServiceBean|_ThreadID=254;_ThreadName=__ejb-thread-pool2;_TimeMillis=1744054673336;_LevelValue=800;|
dev_dataverse>   done iterating through all datasets|#]
dev_dataverse>
dev_dataverse> [#|2025-04-07T19:37:53.336+0000|INFO|Payara 6.2025.2|edu.harvard.iq.dataverse.search.IndexBatchServiceBean|_ThreadID=254;_ThreadName=__ejb-thread-pool2;_TimeMillis=1744054673336;_LevelValue=800;|
dev_dataverse>   index all took 5 milliseconds|#]
dev_dataverse>
dev_dataverse> [#|2025-04-07T19:37:53.337+0000|INFO|Payara 6.2025.2|edu.harvard.iq.dataverse.search.IndexBatchServiceBean|_ThreadID=254;_ThreadName=__ejb-thread-pool2;_TimeMillis=1744054673337;_LevelValue=800;|
dev_dataverse>   1 dataverses and 0 datasets indexed. index all took 5 milliseconds. Solr index was not cleared before indexing.
dev_dataverse> |#]

Philip Durbin 🚀 (Apr 07 2025 at 19:39):

Can you find "milliseconds" in your log? Or "index all"?
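
A rough sketch of that search from a shell. The function name `find_index_summary` is made up for illustration, and the default log path is an assumption based on a typical Payara 6 layout; pass your own path if yours differs.

```shell
# Hypothetical helper: print log lines that mention "milliseconds" or "index all".
# The default path below is an assumption, not taken from this thread.
find_index_summary() {
  log_file="${1:-/usr/local/payara6/glassfish/domains/domain1/logs/server.log}"
  grep -E 'milliseconds|index all' "$log_file"
}

# Example call (path is a placeholder):
# find_index_summary /path/to/server.log
```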

Deirdre Kirmis (Apr 07 2025 at 19:39):

mine looks like this:

[#|2025-04-07T19:37:48.763+0000|INFO|Payara 6.2025.2|edu.harvard.iq.dataverse.search.IndexBatchServiceBean|_ThreadID=247;_ThreadName=__ejb-thread-pool4;_TimeMillis=1744054668763;_LevelValue=800;|
done iterating through all datasets|#]

[#|2025-04-07T19:37:48.764+0000|INFO|Payara 6.2025.2|edu.harvard.iq.dataverse.search.IndexBatchServiceBean|_ThreadID=247;_ThreadName=__ejb-thread-pool4;_TimeMillis=1744054668764;_LevelValue=800;|
index all took 290 milliseconds|#]

[#|2025-04-07T19:37:48.764+0000|INFO|Payara 6.2025.2|edu.harvard.iq.dataverse.search.IndexBatchServiceBean|_ThreadID=247;_ThreadName=__ejb-thread-pool4;_TimeMillis=1744054668764;_LevelValue=800;|
1 dataverses and 1 datasets indexed. index all took 290 milliseconds. Solr index was not cleared before indexing.
|#]

.. lol well increasing logging level gave a LOT of detail for each indexed dataset, but same result on the "not cleared" message .. that is the last message

Philip Durbin 🚀 (Apr 07 2025 at 19:40):

I guess we're saying the same thing, sorry. :sweat_smile:

Deirdre Kirmis (Apr 07 2025 at 19:40):

oh sorry .. i looked at yours and for some reason thought it said it was cleared

Deirdre Kirmis (Apr 07 2025 at 19:41):

so yea, maybe it's normal?

Philip Durbin 🚀 (Apr 07 2025 at 19:41):

When you say the console doesn't come back... you mean it hangs? It should spit out some JSON and give you your prompt back.

Deirdre Kirmis (Apr 07 2025 at 19:41):

yes, on the console it gives the "index has begun" message, but never comes back and says it was complete

Deirdre Kirmis (Apr 07 2025 at 19:42):

curl http://localhost:8080/api/admin/index
{"status":"OK","data":{"availablePartitionIds":[0],"args":{"numPartitions":1,"partitionIdToProcess":0},"message":"indexAllOrSubset has begun of 1 dataverses and 1 datasets."}}
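
Since the console response above only says indexing "has begun" and the summary shows up in the server log, one way to check for completion is to watch the log for the summary line. A minimal sketch, assuming the summary always contains the text "indexed. index all took" as in the excerpts in this thread; the function name `wait_for_index_summary` is hypothetical, and it does not tell a fresh summary apart from one left by an earlier run.

```shell
# Hypothetical helper: poll a log file until the indexing summary line appears.
# Arguments: log file path, timeout in seconds (default 60).
# Returns 0 and prints the most recent summary line, or returns 1 on timeout.
wait_for_index_summary() {
  log_file="$1"
  timeout="${2:-60}"
  elapsed=0
  while [ "$elapsed" -lt "$timeout" ]; do
    # -F treats the pattern as a fixed string, so the "." is literal.
    if grep -qF 'indexed. index all took' "$log_file" 2>/dev/null; then
      grep -F 'indexed. index all took' "$log_file" | tail -n 1
      return 0
    fi
    sleep 1
    elapsed=$((elapsed + 1))
  done
  return 1
}

# Example usage (log path is a placeholder):
# curl http://localhost:8080/api/admin/index
# wait_for_index_summary /path/to/server.log 120
```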

Deirdre Kirmis (Apr 07 2025 at 19:44):

maybe it doesn't

Deirdre Kirmis (Apr 07 2025 at 19:44):

i just need constant verification :sweat_smile:

Philip Durbin 🚀 (Apr 07 2025 at 19:45):

ha


Last updated: Oct 30 2025 at 05:14 UTC