Stream: troubleshooting

Topic: Incomplete HTML tags prevent loading of dataset view


view this post on Zulip Alexander Minges (Jun 26 2025 at 07:56):

I recently stumbled upon a dataset on our instance that failed to load, throwing a server-side 500 error:

HTTP Status 500 - Internal Server Error
type Exception report

message Internal Server Error

description The server encountered an internal error that prevented it from fulfilling this request.

exception

jakarta.servlet.ServletException: Unexpected char -1 at (line no=1, column no=33321, offset=33320)
root cause

jakarta.json.stream.JsonParsingException: Unexpected char -1 at (line no=1, column no=33321, offset=33320)
note The full stack traces of the exception and its root causes are available in the Payara Server 6.2024.6 #badassfish logs.

Payara Server 6.2024.6 #badassfish

Payara log

[#|2025-06-25T13:52:41.637+0200|SEVERE|Payara 6.2024.6|jakarta.enterprise.resource.webcontainer.faces.application|_ThreadID=92;_ThreadName=http-thread-pool::jk-connector(3);_TimeMillis=1750852361637;_LevelValue=1000;|
  Error Rendering View[/dataset.xhtml]
jakarta.el.ELException: /dataset.xhtml @76,78 value="#{DatasetPage.jsonLd}": jakarta.json.stream.JsonParsingException: Unexpected char -1 at (line no=1, column no=33321, offset=33320)
    at com.sun.faces.facelets.el.TagValueExpression.getValue(TagValueExpression.java:77)
    at jakarta.faces.component.ComponentStateHelper.eval(ComponentStateHelper.java:207)
    at jakarta.faces.component.ComponentStateHelper.eval(ComponentStateHelper.java:176)
    at jakarta.faces.component.UIOutput.getValue(UIOutput.java:134)
    at com.sun.faces.renderkit.html_basic.HtmlBasicInputRenderer.getValue(HtmlBasicInputRenderer.java:163)
    at com.sun.faces.renderkit.html_basic.HtmlBasicRenderer.getCurrentValue(HtmlBasicRenderer.java:303)
    at com.sun.faces.renderkit.html_basic.HtmlBasicRenderer.encodeEnd(HtmlBasicRenderer.java:135)
    at jakarta.faces.component.UIComponentBase.encodeEnd(UIComponentBase.java:586)
    at jakarta.faces.component.UIComponent.encodeAll(UIComponent.java:1442)
    at jakarta.faces.component.UIComponent.encodeAll(UIComponent.java:1438)
    at jakarta.faces.component.UIComponent.encodeAll(UIComponent.java:1438)
    at org.primefaces.renderkit.HeadRenderer.encodeBegin(HeadRenderer.java:83)
    at jakarta.faces.component.UIComponentBase.encodeBegin(UIComponentBase.java:531)
    at jakarta.faces.component.UIComponent.encodeAll(UIComponent.java:1432)
    at jakarta.faces.component.UIComponent.encodeAll(UIComponent.java:1438)
    at com.sun.faces.application.view.FaceletViewHandlingStrategy.renderView(FaceletViewHandlingStrategy.java:442)
    at com.sun.faces.application.view.MultiViewHandler.renderView(MultiViewHandler.java:162)
    at jakarta.faces.application.ViewHandlerWrapper.renderView(ViewHandlerWrapper.java:125)
    at jakarta.faces.application.ViewHandlerWrapper.renderView(ViewHandlerWrapper.java:125)
    at jakarta.faces.application.ViewHandlerWrapper.renderView(ViewHandlerWrapper.java:125)
    at org.omnifaces.viewhandler.OmniViewHandler.renderView(OmniViewHandler.java:151)
    at com.sun.faces.lifecycle.RenderResponsePhase.execute(RenderResponsePhase.java:93)
    at com.sun.faces.lifecycle.Phase.doPhase(Phase.java:72)
    at com.sun.faces.lifecycle.LifecycleImpl.render(LifecycleImpl.java:178)
    at jakarta.faces.webapp.FacesServlet.executeLifecyle(FacesServlet.java:692)
    at jakarta.faces.webapp.FacesServlet.service(FacesServlet.java:449)
    at org.apache.catalina.core.StandardWrapper.service(StandardWrapper.java:1554)
    at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:331)
    at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:211)
    at org.glassfish.tyrus.servlet.TyrusServletFilter.doFilter(TyrusServletFilter.java:83)
    at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:253)
    at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:211)
    at org.ocpsoft.rewrite.servlet.RewriteFilter.doFilter(RewriteFilter.java:226)
    at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:253)
    at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:211)
    at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:257)
    at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:166)
    at org.apache.catalina.core.StandardPipeline.doInvoke(StandardPipeline.java:757)
    at org.apache.catalina.core.StandardPipeline.invoke(StandardPipeline.java:577)
    at com.sun.enterprise.web.WebPipeline.invoke(WebPipeline.java:99)
    at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:158)
    at org.apache.catalina.connector.CoyoteAdapter.doService(CoyoteAdapter.java:372)
    at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:239)
    at com.sun.enterprise.v3.services.impl.ContainerMapper$HttpHandlerCallable.call(ContainerMapper.java:520)
    at com.sun.enterprise.v3.services.impl.ContainerMapper.service(ContainerMapper.java:217)
    at org.glassfish.grizzly.http.server.HttpHandler.runService(HttpHandler.java:174)
    at org.glassfish.grizzly.http.server.HttpHandler.doHandle(HttpHandler.java:153)
    at org.glassfish.grizzly.http.server.HttpServerFilter.handleRead(HttpServerFilter.java:196)
    at org.glassfish.grizzly.filterchain.ExecutorResolver$9.execute(ExecutorResolver.java:88)
    at org.glassfish.grizzly.filterchain.DefaultFilterChain.executeFilter(DefaultFilterChain.java:246)
    at org.glassfish.grizzly.filterchain.DefaultFilterChain.executeChainPart(DefaultFilterChain.java:178)
    at org.glassfish.grizzly.filterchain.DefaultFilterChain.execute(DefaultFilterChain.java:118)
    at org.glassfish.grizzly.filterchain.DefaultFilterChain.process(DefaultFilterChain.java:96)
    at org.glassfish.grizzly.ProcessorExecutor.execute(ProcessorExecutor.java:51)
    at org.glassfish.grizzly.nio.transport.TCPNIOTransport.fireIOEvent(TCPNIOTransport.java:510)
    at org.glassfish.grizzly.strategies.AbstractIOStrategy.fireIOEvent(AbstractIOStrategy.java:82)
    at org.glassfish.grizzly.strategies.WorkerThreadIOStrategy.run0(WorkerThreadIOStrategy.java:83)
    at org.glassfish.grizzly.strategies.WorkerThreadIOStrategy$WorkerThreadRunnable.run(WorkerThreadIOStrategy.java:101)
    at org.glassfish.grizzly.threadpool.AbstractThreadPool$Worker.doWork(AbstractThreadPool.java:535)
    at org.glassfish.grizzly.threadpool.AbstractThreadPool$Worker.run(AbstractThreadPool.java:515)
    at java.base/java.lang.Thread.run(Thread.java:840)
Caused by: jakarta.el.ELException: jakarta.json.stream.JsonParsingException: Unexpected char -1 at (line no=1, column no=33321, offset=33320)
    at jakarta.el.BeanELResolver.getValue(BeanELResolver.java:351)
    at com.sun.faces.el.DemuxCompositeELResolver._getValue(DemuxCompositeELResolver.java:139)
    at com.sun.faces.el.DemuxCompositeELResolver.getValue(DemuxCompositeELResolver.java:164)
    at org.glassfish.expressly.parser.AstValue.getValue(AstValue.java:302)
    at org.glassfish.expressly.parser.AstValue.getValue(AstValue.java:144)
    at org.glassfish.expressly.ValueExpressionImpl.getValue(ValueExpressionImpl.java:138)
    at org.jboss.weld.module.web.el.WeldValueExpression.getValue(WeldValueExpression.java:50)
    at com.sun.faces.facelets.el.TagValueExpression.getValue(TagValueExpression.java:73)
    ... 60 more
Caused by: jakarta.json.stream.JsonParsingException: Unexpected char -1 at (line no=1, column no=33321, offset=33320)
    at org.eclipse.parsson.JsonTokenizer.unexpectedChar(JsonTokenizer.java:593)
    at org.eclipse.parsson.JsonTokenizer.readString(JsonTokenizer.java:166)
    at org.eclipse.parsson.JsonTokenizer.nextToken(JsonTokenizer.java:356)
    at org.eclipse.parsson.JsonParserImpl$ObjectContext.getNextEvent(JsonParserImpl.java:486)
    at org.eclipse.parsson.JsonParserImpl.next(JsonParserImpl.java:363)
    at org.eclipse.parsson.JsonParserImpl.getObject(JsonParserImpl.java:327)
    at org.eclipse.parsson.JsonParserImpl.getValue(JsonParserImpl.java:161)
    at org.eclipse.parsson.JsonParserImpl.getArray(JsonParserImpl.java:307)
    at org.eclipse.parsson.JsonParserImpl.getValue(JsonParserImpl.java:159)
    at org.eclipse.parsson.JsonParserImpl.getObject(JsonParserImpl.java:328)
    at org.eclipse.parsson.JsonParserImpl.getObject(JsonParserImpl.java:152)
    at org.eclipse.parsson.JsonReaderImpl.readObject(JsonReaderImpl.java:85)
    at edu.harvard.iq.dataverse.util.json.JsonUtil.getJsonObject(JsonUtil.java:79)
    at edu.harvard.iq.dataverse.export.InternalExportDataProvider.getDatasetSchemaDotOrg(InternalExportDataProvider.java:55)
    at edu.harvard.iq.dataverse.export.SchemaDotOrgExporter.exportDataset(SchemaDotOrgExporter.java:79)
    at edu.harvard.iq.dataverse.export.ExportService.cacheExport(ExportService.java:387)
    at edu.harvard.iq.dataverse.export.ExportService.exportFormat(ExportService.java:324)
    at edu.harvard.iq.dataverse.export.ExportService.getExport(ExportService.java:195)
    at edu.harvard.iq.dataverse.export.ExportService.getExportAsString(ExportService.java:214)
    at edu.harvard.iq.dataverse.DatasetPage.getJsonLd(DatasetPage.java:5974)
    at jdk.internal.reflect.GeneratedMethodAccessor1509.invoke(Unknown Source)
    at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.base/java.lang.reflect.Method.invoke(Method.java:569)
    at jakarta.el.BeanELResolver.getValue(BeanELResolver.java:346)
    ... 67 more
|#]

I could, however, access the complete dataset using API calls and verify its integrity; only the web-based view was affected. A dump of the metadata JSON revealed that it contained over 60000 characters, while the error message reported an unexpected EOF already at offset 33320.

The metadata contained some very elaborate file descriptions which I modified to be more concise. After that, the dataset view could be successfully loaded again.

My question now is: Is there a limit on the size of a JSON stream that Payara can ingest successfully? And if so, is it configurable?

view this post on Zulip Philip Durbin 🚀 (Jun 26 2025 at 11:11):

Hmm, great question. I'm not aware of a size limit.

I'm not seeing an EOF. Rather, I see JsonParsingException: Unexpected char

And it's being thrown from InternalExportDataProvider.getDatasetSchemaDotOrg(InternalExportDataProvider.java:55)

I wonder if we can reproduce this somehow? :thinking: What version of Dataverse are you on, please?

view this post on Zulip Oliver Bertuch (Jun 26 2025 at 11:15):

I already dug into the Eclipse Parsson code to see if there is a buffer size limit we might be hitting here.

view this post on Zulip Alexander Minges (Jun 26 2025 at 11:15):

I'm on v6.5. Yeah, it's not strictly an EOF, but the Unexpected char -1 imho hints at a stream truncation. What I forgot to mention is that the metadata, when dumped via API, passes linting as valid JSON. I also do not see any suspicious character at the given position.

I can reliably reproduce the issue once I create a dataset with metadata exceeding this character count.

view this post on Zulip Oliver Bertuch (Jun 26 2025 at 11:16):

@Alexander Minges do you have the old descriptions around? It might be very interesting to try to manually find what the chars at position 33270 are. This might be a bad UTF-8 char thing.

view this post on Zulip Oliver Bertuch (Jun 26 2025 at 11:16):

Oh so this is not related to one specific string, but with any string that size?

view this post on Zulip Oliver Bertuch (Jun 26 2025 at 11:16):

So we should be able to create a unit test that blows up, right?

view this post on Zulip Alexander Minges (Jun 26 2025 at 11:17):

Oliver Bertuch said:

Alexander Minges do you have the old descriptions around? It might be very interesting to try to manually find what the chars at position 33270 are. This might be a bad UTF-8 char thing.

Oh so this is not related to one specific string, but with any string that size?

Seems so. However I do have the original JSON if you are interested to have a look.

view this post on Zulip Oliver Bertuch (Jun 26 2025 at 11:18):

I also want to highlight that you are talking about two different things here. Retrieving the JSON via the API is different to using the API to retrieve the Schema.org JSON-LD, which does JSON-P parsing first!

view this post on Zulip Oliver Bertuch (Jun 26 2025 at 11:19):

Does it blow up if you're using the Export API as well? https://guides.dataverse.org/en/latest/api/native-api.html#export-metadata-of-a-dataset-in-various-formats

So you'd use sth like https://demo.dataverse.org/api/datasets/export?exporter=schema.org&persistentId=doi:10.70122/FK2/WRYRT4
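
For reference, the export URL above can also be assembled programmatically. This is a minimal Python sketch: the helper name `export_url` is made up for illustration, while the endpoint path and the `exporter` and `persistentId` parameters are taken from the link above.

```python
from urllib.parse import urlencode


def export_url(base_url: str, persistent_id: str, exporter: str = "schema.org") -> str:
    """Build a Dataverse metadata export URL (hypothetical helper)."""
    # Keep ':' and '/' unescaped so the DOI reads the same as in the thread.
    query = urlencode({"exporter": exporter, "persistentId": persistent_id}, safe=":/")
    return f"{base_url.rstrip('/')}/api/datasets/export?{query}"


url = export_url("https://demo.dataverse.org", "doi:10.70122/FK2/WRYRT4")
print(url)
```

Fetching that URL for the affected dataset shows whether the schema.org export itself fails, independently of the dataset page.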

view this post on Zulip Alexander Minges (Jun 26 2025 at 11:21):

I'll check that.

view this post on Zulip Oliver Bertuch (Jun 26 2025 at 11:26):

@Alexander Minges did you try to blow things up with some other random, very long thing to see if the culprit is indeed the specific dataset or if we are in bigger trouble?

view this post on Zulip Oliver Bertuch (Jun 26 2025 at 11:28):

BTW @Philip Durbin @Leo Andreev I did not expect that the schema.org exporter is called every time a dataset page is loaded - shouldn't it use the cached export?

view this post on Zulip Alexander Minges (Jun 26 2025 at 11:28):

Not yet, but I more or less randomly removed metadata to shrink the size. But I can check that as well with some very long random stuff in a fresh dataset.

view this post on Zulip Philip Durbin 🚀 (Jun 26 2025 at 11:30):

@Oliver Bertuch was the schema.org export ever successfully cached?

view this post on Zulip Oliver Bertuch (Jun 26 2025 at 11:30):

I wasn't aware it would not be cached!

view this post on Zulip Philip Durbin 🚀 (Jun 26 2025 at 11:31):

If no exception is thrown, it should be cached and re-used.

view this post on Zulip Philip Durbin 🚀 (Jun 26 2025 at 11:32):

That's how the rest of the exports work, at least.

view this post on Zulip Alexander Minges (Jun 26 2025 at 11:32):

Philip Durbin said:

If no exception is thrown, it should be cached and re-used.

But I would only see that in the logs, once a dataset is published, right?

view this post on Zulip Philip Durbin 🚀 (Jun 26 2025 at 11:33):

Maybe? I'm not sure if you'd see "export successful" or whatever.

view this post on Zulip Philip Durbin 🚀 (Jun 26 2025 at 11:34):

I agree that having this case in a unit test would be nice. For now, maybe we could throw it in https://github.com/IQSS/dataverse-sample-data

view this post on Zulip Oliver Bertuch (Jun 26 2025 at 11:34):

But at least the initial caching would happen only if you hit publish. (Except for re-exports triggered by new versions, the API, etc.)

view this post on Zulip Oliver Bertuch (Jun 26 2025 at 11:34):

There is a unit test for a very very long description, Jim did that.

view this post on Zulip Oliver Bertuch (Jun 26 2025 at 11:35):

Digging through the code to see where some field may be truncated.

view this post on Zulip Philip Durbin 🚀 (Jun 26 2025 at 11:35):

@Alexander Minges I'd be curious to know if switching to Croissant helps. We put it in the <head> if that exporter is enabled: https://guides.dataverse.org/en/6.6/admin/discoverability.html#schema-org-json-ld-croissant-metadata

view this post on Zulip Oliver Bertuch (Jun 26 2025 at 11:38):

@Alexander Minges could you upload the original JSON here as a file? Only the "description" of the dataset gets truncated, so maybe we need to extend that.

view this post on Zulip Alexander Minges (Jun 26 2025 at 11:38):

@Philip Durbin I'll try that as well, but first I'll check whether something in this specific dataset blows up the schema.org export and I just deleted that part by chance when trying to lower the size.

view this post on Zulip Alexander Minges (Jun 26 2025 at 11:39):

Oliver Bertuch said:

Alexander Minges could you upload the original JSON here as a file? Only the "description" of the dataset gets truncated, so maybe we need to extend that.

You mean the schema.org JSON-LD or the Dataverse JSON?

view this post on Zulip Oliver Bertuch (Jun 26 2025 at 11:42):

Alexander Minges said:

You mean the schema.org JSON-LD or the Dataverse JSON?

The Dataverse JSON

view this post on Zulip Oliver Bertuch (Jun 26 2025 at 11:59):

So I extended the unit test for SchemaDotOrgExporter to test with a file description using 2^18 chars (testExportVeryLongFileDescription, see attached). Didn't blow up. Would be good to see your metadata :slight_smile:
SchemaDotOrgExporterTest.java

view this post on Zulip Oliver Bertuch (Jun 26 2025 at 12:11):

Wow even if I smuggle in a "-1" character, I can't break it! Something very very very weird must be going on here!

view this post on Zulip Alexander Minges (Jun 26 2025 at 12:15):

@Oliver Bertuch I've sent you the JSON as direct message. Btw the schema.org export blows up as well using the API. When I remove the descriptions of the attached files the export is generated without issues.

view this post on Zulip Oliver Bertuch (Jun 26 2025 at 12:17):

I will try to rope the JSON into the unit test and see if I can reproduce it locally.

view this post on Zulip Oliver Bertuch (Jun 26 2025 at 13:01):

Man, have I ever mentioned what a bloody mess it is when you try to parse a dataset JSON into a test case? Our testing infrastructure is way too complicated. Too many implicits, etc...

view this post on Zulip Philip Durbin 🚀 (Jun 26 2025 at 13:07):

@Oliver Bertuch maybe you should warm up this thread with @Gustavo Durand about automated testing: https://groups.google.com/g/dataverse-dev/c/k6nZ_icK9fw/m/Qf6f2-NJEgAJ (or start a new topic in #dev, or both :smile: )

view this post on Zulip Oliver Bertuch (Jun 26 2025 at 13:15):

Wow. Our JSON Parser does not parse the releasedDate if presented. Man not using JSON-B for all of this sux

view this post on Zulip Oliver Bertuch (Jun 26 2025 at 13:19):

I just discovered a long lasting bug in Dataverse. And another occasion of "why use JSON-B if doing it all manually is so much more fun!!!!!!!!!!11111111"

view this post on Zulip Oliver Bertuch (Jun 26 2025 at 13:20):

In JsonPrinter, the release time of a dataset version is set as "releaseTime".

view this post on Zulip Oliver Bertuch (Jun 26 2025 at 13:21):

Guess what it is in JsonParser...?

view this post on Zulip Oliver Bertuch (Jun 26 2025 at 13:21):

YES YOU'RE RIGHT IT'S FRICKIN' releaseDate!!!11!11122345
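
The key mismatch described above can be caught by a round-trip check. The following Python sketch uses hypothetical stand-ins for a printer/parser pair (it is not Dataverse code); only the two key names, `releaseTime` and `releaseDate`, come from the thread.

```python
# Hypothetical stand-ins for a JSON printer/parser pair, not Dataverse code.
PRINTER_KEY = "releaseTime"  # key the printer writes (per the thread)
PARSER_KEY = "releaseDate"   # key the parser looks for (per the thread)


def print_version(release_time: str) -> dict:
    """Serialize a version's release time under the printer's key."""
    return {PRINTER_KEY: release_time}


def parse_version(data: dict) -> dict:
    """Read the release time back; a missing key silently becomes None."""
    return {"release_time": data.get(PARSER_KEY)}


original = "2025-06-25T13:52:41Z"  # example timestamp, made up for the sketch
round_tripped = parse_version(print_version(original))["release_time"]
print(round_tripped)  # None: the value is lost on the way back in
```

A test asserting `round_tripped == original` would fail here, which is the kind of check that surfaces such a mismatch early.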

view this post on Zulip Oliver Bertuch (Jun 26 2025 at 13:25):

So, I finally :trumpet: am able to verify the issue locally!!!

view this post on Zulip Oliver Bertuch (Jun 26 2025 at 13:26):

I see jakarta.json.stream.JsonParsingException: Unexpected char -1 at (line no=1, column no=33239, offset=33238) in the test output!

view this post on Zulip Alexander Minges (Jun 26 2025 at 13:29):

So the file description length is really the culprit?

view this post on Zulip Oliver Bertuch (Jun 26 2025 at 13:30):

Not sure yet - I tested the length with a 2^18 char string and it was fine. I still suspect a bad char and now I can finally check out where exactly the problem is!

view this post on Zulip Oliver Bertuch (Jun 26 2025 at 13:43):

When I extract the text surrounding the failing location, I see nothing special...

$ TEXT=$(cat src/test/resources/json/dataset-breaking-desc-pretty.json)
$
$ printf '%q\n' "${TEXT[33000,34000]}"
\ their\ designated\ target\ LC3\ using\ a\ diversity\ of\ biophysical\ methods.\ Intrigued\ by\ the\ idea\ of\ developing\ ATTECs,\ we\ evaluated\ the\ ligandability\ of\ LC3/GABARAP\ by\ in\ silico\ docking\ and\ large-scale\ crystallographic\ fragment\ screening.\ Data\ based\ on\ approximately\ 1000\ crystal\ structures\ revealed\ that\ most\ fragments\ bound\ to\ the\ HP2\ but\ not\ to\ the\ HP1\ pocket\ within\ the\ LIR\ docking\ site,\ suggesting\ a\ favorable\ ligandability\ of\ HP2.\ Through\ this\ study,\ we\ identified\ diverse\ validated\ LC3/GABARAP\ ligands\ and\ fragments\ as\ starting\ points\ for\ chemical\ probe\ and\ ATTEC\ development.\</p\>\"$'\n'\ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \}$'\n'\ \ \ \ \ \ \ \ \ \ \ \ \ \ \}$'\n'\ \ \ \ \ \ \ \ \ \ \ \ \]$'\n'\ \ \ \ \ \ \ \ \ \ \},$'\n'\ \ \ \ \ \ \ \ \ \ \{$'\n'\ \ \ \ \ \ \ \ \ \ \ \ \"typeName\":\ \"subject\",$'\n'\ \ \ \ \ \ \ \ \ \ \ \ \"multiple\":\ true,$'\n'\ \ \ \ \ \ \ \ \ \ \ \ \"typeClass\":\ \"controlledVocabulary\",$'\n'\ \ \ \ \ \ \ \ \ \ \ \ \"value\":\ \[$'\n'\ \ \ \ \ \ \ \ \ \ \ \ \ \ \"Medicine,\ Health\ and\ Life\ Sciences\"$'\n'\ \ \ \ \ \ \ \ \ \ \ \ \]$'\n'\ \ \ \ \ \ \ \ \ \ \},$'\n'\ \ \ \ \ \ \ \ \ \ \{$'\n'\ \ \ \ \ \ \ \ \ \ \ \ \"typeName\":\ \"keyword\",$'\n'\ \ \ \ \ \ \ \ \ \ \ \ \"multiple\":\ true,$'\n'\ \ \ \ \ \ \ \ \ \ \ \ \"typeClass\":\ \"compound\",$'\n'\ \ \ \ \ \ \ \ \ \ \ \ \"v
$
$ printf '%q\n' "${TEXT[33230,33250]}"
ing.\ Data\ based\ on\ ap
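
The same kind of inspection can be done outside the shell. A small Python sketch, assuming a 0-based offset like the one in the exception message; the helper name `context_around` and the sample string are illustrative only. Using `repr()` plays a similar role to `printf '%q'`: it makes control characters and escapes visible.

```python
def context_around(text: str, offset: int, radius: int = 10) -> str:
    """Return the characters within `radius` of a 0-based `offset`."""
    start = max(0, offset - radius)
    end = min(len(text), offset + radius + 1)
    return text[start:end]


sample = "Data based on approximately 1000 crystal structures"
# repr() exposes invisible or escaped characters in the slice.
print(repr(context_around(sample, offset=14, radius=5)))
```

Note that the slice must be taken from the string the parser actually received; slicing a different serialization of the same dataset shows unrelated text.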

view this post on Zulip Oliver Bertuch (Jun 26 2025 at 13:52):

So the problem must be within the transformed JSON the exporter uses. Let me extract that...

view this post on Zulip Philip Durbin 🚀 (Jun 26 2025 at 13:52):

go go go!

view this post on Zulip Oliver Bertuch (Jun 26 2025 at 14:02):

So. Figure 4 is the culprit... Let's see if I can find any bad chars in the source file...

view this post on Zulip Alexander Minges (Jun 26 2025 at 14:05):

I also dug a bit deeper and found some Unicode chars depicting alternate dashes (such as U+2013). But replacing all of them doesn't fix the issue. I also tried adding the file descriptions one by one and for me it blows up at Fig 2 already...

view this post on Zulip Oliver Bertuch (Jun 26 2025 at 14:10):

I don't see any particularly problematic things in the source file at the position where things blow up...

view this post on Zulip Philip Durbin 🚀 (Jun 26 2025 at 14:11):

Can you reproduce the original bug if you create a dataset via API based on the JSON?

view this post on Zulip Oliver Bertuch (Jun 26 2025 at 14:12):

Wait a minute!

view this post on Zulip Alexander Minges (Jun 26 2025 at 14:12):

But it's definitely not directly related to the length of the JSON. Adding a description for Fig. 1 is fine... Fig 1 + Fig 2 blows up, but descriptions for Fig 1 + Fig 3 are fine, even though the resulting JSON is much larger than the 33200 chars we hit in the errors

Forget it, I opened the old version by accident. 1 + 3 also does not work.

view this post on Zulip Oliver Bertuch (Jun 26 2025 at 14:12):

I think I just made progress. The JSON-LD string simply ENDS at this place!

view this post on Zulip Oliver Bertuch (Jun 26 2025 at 14:12):

So the -1 is indeed the EOF marker!

view this post on Zulip Oliver Bertuch (Jun 26 2025 at 14:13):

So there must be a problem during the generation of the JSON-LD!

view this post on Zulip Oliver Bertuch (Jun 26 2025 at 14:14):

(line no=1, column no=33239, offset=33238)
(length=33239)
ing of the CSP values. Residues with small (CSP < SD), intermediate (SD < CSP <2xSD) or strong (2xS

jakarta.json.stream.JsonParsingException: Invalid token=EOF at (line no=1, column no=0, offset=-1). Expected tokens are: [CURLYOPEN, SQUAREOPEN, STRING, NUMBER, TRUE, FALSE, NULL]
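
The numbers above (offset 33238, length 33239) show the parser failing on the very last character of its input. A minimal Python sketch of that check follows; the function name is made up. In Java, `Reader.read()` returns -1 at end of stream, which is presumably where the "char -1" in the message comes from.

```python
def failed_at_end_of_input(text_length: int, error_offset: int) -> bool:
    """True if a parser's error offset points at or past the last character."""
    return error_offset >= text_length - 1


# Values reported in the thread for the locally reproduced failure.
print(failed_at_end_of_input(text_length=33239, error_offset=33238))  # True
```

The comparison only makes sense against the string handed to the parser. Comparing the original offset 33320 with the 60000+ character Dataverse JSON dump returns False, because that dump is a different serialization from the JSON-LD being parsed.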

view this post on Zulip Oliver Bertuch (Jun 26 2025 at 14:14):

I'm not sure what upsets it, as I tested it with way longer strings and all was fine.

view this post on Zulip Alexander Minges (Jun 26 2025 at 14:15):

And the next char there would be a D again, so nothing spectacular: or strong (2xSD <CSP) CSP

view this post on Zulip Alexander Minges (Jun 26 2025 at 14:18):

Oliver Bertuch said:

So there must be a problem during the generation of the JSON-LD!

And because this fails during initial publication, it is tried again and again when trying to view the dataset because no cached version exists. Once I get it into a non-broken state (e.g. by just deleting the files, thus creating a new DRAFT version), I can access all the older (published) and presumably broken versions from the version history just fine. What still fails for them is exporting to JSON-LD from the frontend:

{"status":"ERROR","message":"Export Failed"}
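
The retry behaviour described above follows from caching only successful exports. Here is a toy Python sketch of that pattern; `ExportCache`, `broken_exporter`, and the DOI are hypothetical and do not mirror Dataverse's ExportService.

```python
class ExportCache:
    """Toy cache that stores an export only when the exporter succeeds."""

    def __init__(self, exporter):
        self._exporter = exporter
        self._cache = {}
        self.attempts = 0

    def get_export(self, dataset_id: str) -> str:
        if dataset_id in self._cache:
            return self._cache[dataset_id]
        self.attempts += 1
        result = self._exporter(dataset_id)  # may raise; then nothing is cached
        self._cache[dataset_id] = result
        return result


def broken_exporter(dataset_id: str) -> str:
    raise ValueError("Unexpected char -1")


cache = ExportCache(broken_exporter)
for _ in range(3):  # three simulated page views
    try:
        cache.get_export("doi:10.5072/FK2/EXAMPLE")  # made-up identifier
    except ValueError:
        pass
print(cache.attempts)  # 3: every page view re-runs the failing export
```

With a working exporter the counter stays at 1 after the first call, which matches the "cached and re-used" behaviour mentioned earlier in the thread.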

view this post on Zulip Oliver Bertuch (Jun 26 2025 at 14:20):

So, it fails to add all the bits for the files. We know where that is in the code.

view this post on Zulip Philip Durbin 🚀 (Jun 26 2025 at 14:20):

So are we leaning toward certain characters being a problem? Not the length?

view this post on Zulip Oliver Bertuch (Jun 26 2025 at 14:20):

I'm not sure yet

view this post on Zulip Oliver Bertuch (Jun 26 2025 at 14:21):

Smells like we might be opening a can of worms here.

view this post on Zulip Oliver Bertuch (Jun 26 2025 at 14:30):

Yep. The MarkupChecker.stripAllTags(jsonLd) strips too much of the JSON-LD.
Before:
image.png
After:
image.png

view this post on Zulip Oliver Bertuch (Jun 26 2025 at 14:31):

The test executed successfully if I remove the markupchecker from the equation

view this post on Zulip Oliver Bertuch (Jun 26 2025 at 14:31):

@Alexander Minges would you mind creating an issue?

view this post on Zulip Oliver Bertuch (Jun 26 2025 at 14:32):

This is where things go wrong: https://github.com/poikilotherm/dataverse/blob/cce40c43cb082b98f9ee62d74f22b01e23d27a19/src/main/java/edu/harvard/iq/dataverse/DatasetVersion.java#L2150-L2150

view this post on Zulip Oliver Bertuch (Jun 26 2025 at 14:32):

I need to attend to kids logistics now.

view this post on Zulip Oliver Bertuch (Jun 26 2025 at 14:32):

It's not you, it's us :smile:

view this post on Zulip Alexander Minges (Jun 26 2025 at 14:37):

Oliver Bertuch said:

Alexander Minges would you mind creating an issue?

Sure, I'll create one. Not sure if I'll get to do it today (leaving the office right now), but tomorrow morning at the latest. @Oliver Bertuch Thanks for digging through this mess of mine :innocent:

view this post on Zulip Alexander Minges (Jun 26 2025 at 14:40):

So I guess something like <CSP in (2xSD <CSP) might be mistaken for an open HTML tag and due to a "missing" closing bracket, everything afterwards gets cut off?
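
The hypothesis above can be illustrated with a deliberately naive tag stripper. This Python sketch is not the MarkupChecker implementation and makes no claim about how the real library decides what counts as a tag; it only shows how an unterminated "tag" inside a serialized JSON string can cut off everything after it. The sample text is taken from the description fragment quoted in the thread.

```python
import json


def naive_strip_tags(text: str) -> str:
    """Drop '<' + letter (or '/') up to the next '>'.

    If no closing '>' exists, the rest of the input is silently dropped.
    """
    out = []
    i = 0
    while i < len(text):
        ch = text[i]
        starts_tag = (
            ch == "<"
            and i + 1 < len(text)
            and (text[i + 1].isalpha() or text[i + 1] == "/")
        )
        if starts_tag:
            end = text.find(">", i)
            if end == -1:
                break  # unterminated "tag": everything after it is lost
            i = end + 1
        else:
            out.append(ch)
            i += 1
    return "".join(out)


doc = json.dumps({"description": "or strong (2xSD <CSP) CSP", "name": "Fig 2"})
stripped = naive_strip_tags(doc)
print(stripped)
try:
    json.loads(stripped)
except json.JSONDecodeError as err:
    print("no longer valid JSON:", err.msg)
```

Because the stripper runs on the serialized document rather than on individual field values, the loss is not limited to the description: the closing quote, the remaining keys, and the closing brace disappear as well, so the parser hits end of input inside a string.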

view this post on Zulip Philip Durbin 🚀 (Jun 26 2025 at 14:55):

Gotcha. I hope we can find a small case that exercises the bug.

view this post on Zulip Oliver Bertuch (Jun 26 2025 at 15:03):

I will look into creating a minimal test case as a reproducer to also verify the root cause is mitigated with the fix.

view this post on Zulip Oliver Bertuch (Jun 26 2025 at 15:03):

But not today :smiley:

view this post on Zulip Oliver Bertuch (Jun 26 2025 at 15:04):

Can't even create a branch yet - no issue number :see_no_evil:

view this post on Zulip Oliver Bertuch (Jun 26 2025 at 15:41):

I did create an issue just now for the releaseTime vs releaseDate snag. https://github.com/IQSS/dataverse/issues/11594

view this post on Zulip Alexander Minges (Jun 27 2025 at 07:15):

@Oliver Bertuch I'm currently creating the issue. To trigger the bug, it's sufficient to add <CSP to the file description. Something arbitrary like <test did not blow up though.

view this post on Zulip Alexander Minges (Jun 27 2025 at 07:27):

Issue is created: https://github.com/IQSS/dataverse/issues/11597

view this post on Zulip Philip Durbin 🚀 (Jun 27 2025 at 11:42):

@Alexander Minges thanks! :heart:

view this post on Zulip Philip Durbin 🚀 (Jul 02 2025 at 16:27):

@Alexander Minges thanks for PR #11600! I just left a comment there.

view this post on Zulip Alexander Minges (Jul 04 2025 at 07:08):

@Philip Durbin 🚀 You're welcome! If I can help in any way to add a test for this, please let me know.

view this post on Zulip Philip Durbin 🚀 (Jul 07 2025 at 11:22):

@Alexander Minges sure! Have you run any API tests yet? Maybe you could try running mvn test -Dtest=UsersIT from https://guides.dataverse.org/en/6.6/developers/testing.html#writing-api-tests-with-rest-assured

view this post on Zulip Alexander Minges (Jul 14 2025 at 08:37):

@Philip Durbin 🚀 Not yet. I'll probably have time to do this at the end of this week.

view this post on Zulip Philip Durbin 🚀 (Jul 14 2025 at 13:47):

No rush, thanks.

view this post on Zulip Alexander Minges (Jul 24 2025 at 07:48):

@Philip Durbin 🚀 Test is available, see my comment in the PR: https://github.com/IQSS/dataverse/pull/11600#issuecomment-3112181778

view this post on Zulip Philip Durbin 🚀 (Jul 24 2025 at 12:06):

Fantastic! Thanks! I'll take a look.

view this post on Zulip Philip Durbin 🚀 (Jul 24 2025 at 12:06):

@Alexander Minges oh, it's in a different branch. Please go ahead and put it in the branch for your PR.

view this post on Zulip Alexander Minges (Jul 29 2025 at 10:58):

Philip Durbin 🚀 said:

Alexander Minges oh, it's in a different branch. Please go ahead and put it in the branch for your PR.

Done. No idea what Jenkins is complaining about though, as I cannot access the respective CI details.

view this post on Zulip Philip Durbin 🚀 (Jul 29 2025 at 11:50):

Yeah, that's what this is about, sadly:

Let contributors know why tests are failing #9916

view this post on Zulip Philip Durbin 🚀 (Jul 29 2025 at 11:51):

I've been a bit heads down on a new notifications API (#dev > Notifications API usage and future) but I'll try to look at your PR soon. Thanks for adding tests!

view this post on Zulip Philip Durbin 🚀 (Jul 29 2025 at 12:55):

@Alexander Minges here's the complaint:

Error
1 expectation failed.
Expected status code <200> but was <403>.
Stacktrace
java.lang.AssertionError:
1 expectation failed.
Expected status code <200> but was <403>.
    at java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:77)
    at java.base/jdk.internal.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
    at java.base/java.lang.reflect.Constructor.newInstanceWithCaller(Constructor.java:500)
    at java.base/java.lang.reflect.Constructor.newInstance(Constructor.java:481)
    at org.codehaus.groovy.reflection.CachedConstructor.invoke(CachedConstructor.java:73)
    at org.codehaus.groovy.runtime.callsite.ConstructorSite$ConstructorSiteNoUnwrapNoCoerce.callConstructor(ConstructorSite.java:108)
    at org.codehaus.groovy.runtime.callsite.CallSiteArray.defaultCallConstructor(CallSiteArray.java:57)
    at org.codehaus.groovy.runtime.callsite.AbstractCallSite.callConstructor(AbstractCallSite.java:263)
    at org.codehaus.groovy.runtime.callsite.AbstractCallSite.callConstructor(AbstractCallSite.java:277)
    at io.restassured.internal.ResponseSpecificationImpl$HamcrestAssertionClosure.validate(ResponseSpecificationImpl.groovy:512)
    at io.restassured.internal.ResponseSpecificationImpl$HamcrestAssertionClosure$validate$1.call(Unknown Source)
    at io.restassured.internal.ResponseSpecificationImpl.validateResponseIfRequired(ResponseSpecificationImpl.groovy:696)
    at io.restassured.internal.ResponseSpecificationImpl.this$2$validateResponseIfRequired(ResponseSpecificationImpl.groovy)
    at jdk.internal.reflect.GeneratedMethodAccessor144.invoke(Unknown Source)
    at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.base/java.lang.reflect.Method.invoke(Method.java:569)
    at org.codehaus.groovy.runtime.callsite.PlainObjectMetaMethodSite.doInvoke(PlainObjectMetaMethodSite.java:43)
    at org.codehaus.groovy.runtime.callsite.PogoMetaMethodSite$PogoCachedMethodSiteNoUnwrapNoCoerce.invoke(PogoMetaMethodSite.java:198)
    at org.codehaus.groovy.runtime.callsite.PogoMetaMethodSite.callCurrent(PogoMetaMethodSite.java:62)
    at org.codehaus.groovy.runtime.callsite.AbstractCallSite.callCurrent(AbstractCallSite.java:185)
    at io.restassured.internal.ResponseSpecificationImpl.statusCode(ResponseSpecificationImpl.groovy:135)
    at io.restassured.specification.ResponseSpecification$statusCode$0.callCurrent(Unknown Source)
    at io.restassured.internal.ResponseSpecificationImpl.statusCode(ResponseSpecificationImpl.groovy:143)
    at io.restassured.internal.ValidatableResponseOptionsImpl.statusCode(ValidatableResponseOptionsImpl.java:89)
    at edu.harvard.iq.dataverse.api.JsonLDExportIT.testJsonLDExportWithIncompleteHtmlTagsInFileDescription(JsonLDExportIT.java:88)
    at java.base/java.lang.reflect.Method.invoke(Method.java:569)
    at java.base/java.util.ArrayList.forEach(ArrayList.java:1511)
    at java.base/java.util.ArrayList.forEach(ArrayList.java:1511)
Standard Error
Jul 28, 2025 1:02:23 PM edu.harvard.iq.dataverse.api.UtilIT getRestAssuredBaseUri
INFO: Base URL for tests: http://localhost:8080
Jul 28, 2025 1:02:23 PM edu.harvard.iq.dataverse.api.UtilIT createRandomUser
INFO: Creating random test user user3614cf35
Jul 28, 2025 1:02:24 PM edu.harvard.iq.dataverse.api.UtilIT getApiTokenFromResponse
INFO: API token found in create user response: 1dc8f8a1-73c8-4dda-9e86-25e9dd6d01e0
Jul 28, 2025 1:02:24 PM edu.harvard.iq.dataverse.api.UtilIT getUsernameFromResponse
INFO: Username found in create user response: user3614cf35
Jul 28, 2025 1:02:24 PM edu.harvard.iq.dataverse.api.UtilIT getAliasFromResponse
INFO: Alias found in create dataverse response: dv1edbd9e9
Jul 28, 2025 1:02:24 PM edu.harvard.iq.dataverse.api.UtilIT getDatasetIdFromResponse
INFO: Id found in create dataset response: 317
Jul 28, 2025 1:02:24 PM edu.harvard.iq.dataverse.api.UtilIT getDatasetPersistentIdFromResponse
INFO: Persistent id found in create dataset response: doi:10.5072/FK2/YFGBI7

view this post on Zulip Philip Durbin 🚀 (Jul 29 2025 at 18:15):

Ah, this is why:

"This dataset is locked. Reason: Ingest. Please try publishing later."
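
A 403 caused by an ingest lock is a timing issue: the test tried to publish before ingest had finished. One common way to handle this in tests is to poll until the lock is gone. The Python sketch below shows that idea under stated assumptions: `wait_until_unlocked` and the injected `get_locks` callable are hypothetical, and this is not necessarily what the fix in the PR does.

```python
import time
from typing import Callable, List


def wait_until_unlocked(get_locks: Callable[[], List[str]],
                        timeout_s: float = 10.0,
                        interval_s: float = 0.5) -> bool:
    """Poll `get_locks` until it returns no locks or the timeout expires."""
    deadline = time.monotonic() + timeout_s
    while True:
        if not get_locks():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval_s)


# Fake lock source: reports an ingest lock twice, then none.
responses = iter([["Ingest"], ["Ingest"], []])
print(wait_until_unlocked(lambda: next(responses), timeout_s=5, interval_s=0.01))  # True
```

In a real test, `get_locks` would query the server for the dataset's current locks, and the publish call would only run once the helper returns True.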

view this post on Zulip Philip Durbin 🚀 (Jul 29 2025 at 18:28):

@Alexander Minges ok, I pushed a fix: https://github.com/IQSS/dataverse/pull/11600/commits/767a191455e22550944c23f616eb6302da0a5fa5

view this post on Zulip Philip Durbin 🚀 (Jul 29 2025 at 18:29):

I'll wait for the API tests to pass before moving it to QA.

view this post on Zulip Philip Durbin 🚀 (Jul 29 2025 at 18:49):

Crap, I was trying to get fancy by following https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/reviewing-changes-in-pull-requests/checking-out-pull-requests-locally instead of adding a remote with git remote add Athemis git@github.com:Athemis/dataverse.git like I usually do. But now I'm worried I screwed up your branch, @Alexander Minges

https://github.com/IQSS/dataverse/actions/runs/16604285701/job/46972337687?pr=11600 says "Error: fatal: couldn't find remote ref refs/heads/fix-json-ld" :scream:

Sorry! Do you think we can fix it? Or should you create a fresh PR?

view this post on Zulip Philip Durbin 🚀 (Jul 29 2025 at 18:54):

Ok, I force pushed to your branch. Fingers crossed! :grimacing:

view this post on Zulip Philip Durbin 🚀 (Jul 29 2025 at 18:58):

Ok, good, it's getting farther.

view this post on Zulip Alexander Minges (Aug 18 2025 at 11:52):

Anything else I can do to help with this PR? I've seen that it's been carried over several sprints now.

view this post on Zulip Philip Durbin 🚀 (Aug 18 2025 at 13:02):

Let's see, #11600 is in QA. @Omer M Fahim do you have any questions about it?

view this post on Zulip Omer M Fahim (Aug 18 2025 at 13:27):

This PR's good to go, going to merge shortly.


Last updated: Oct 30 2025 at 05:14 UTC