Sorry, but I was in a couple of workshops back-to-back today. I wanted to create the thread as @Philip Durbin suggested on #9905. @Oliver Bertuch, this is my last response on that thread:
I think that _* text=auto_ just defaults to the default behavior :rolling_on_the_floor_laughing: where people have to set the configuration on their machine with _git config --global core.autocrlf input_
Both solutions work, but the idea of this change was that otherwise every person who pulls the project needs to change their configuration to input or false to turn off the conversion. This is an additional step that can be omitted. Also, if someone pulls the project before reading the documentation (:rolling_eyes:) and changes the configuration afterwards, the files will already have been converted.
This doesn't affect Linux users since the conversion only happens on Windows, and Docker runs a Linux instance. So on Linux it is linux (project) -> linux (dev) -> linux (docker), and on Windows it is linux (project) -> windows (dev) -> linux (docker).
Let me know your thoughts. As for additional files, we can add them as we find them conflicting. Mostly .sh files are the ones that give trouble, since they are executed in Docker (same logic: linux (project) -> windows (dev) -> linux (docker)). I could imagine that if we had more resources executed by the server, for example a JS file executed by Node.js, then those would need this too.
If I or someone else find more resources that need to be added to this file, we can add them to the .gitattributes
What do you guys think about it?
Ideally the solution would "just work". :big_smile:
With no extra config.
The maintainer of libgit2 left a related comment back in 2018. I just copied it into the pull request: https://github.com/IQSS/dataverse/pull/9905#issuecomment-1718309113
text=auto actually enables autocrlf for all text based files. The setting is switched off by default, setting it this way is IMHO the way to go.
I don't see a necessity right now where we need to explicitly define a different behavior for some file/type.
Quoting from https://www.git-scm.com/docs/gitattributes on * text=auto:
If you want to ensure that text files that any contributor introduces to the repository have their line endings normalized, you can set the text attribute to "auto" for all files.
I am probably getting a bit confused, and you may have a better understanding. From what I see, normalization is the process of transforming from LF (Linux) to CRLF (Windows), which would be OK if this were an application running on the host (traditional developer setup).
With * text=auto we "ensure that text files that any contributor introduces to the repository have their line endings normalized", which would cause the normalization from LF to CRLF and would cause issues when the normalized files try to execute on Docker's Linux.
I will do some testing tomorrow morning (or maybe tonight) with the * text=auto feature and post the results, but I think this is the opposite of what the application needs to work in the container.
You are right @Juan Pablo Tosca Villanueva - I did not think as far, the CRLF will be a problem when building the Linux containers on Windows! Good catch! :heart:
I dug some more into this and here's a thought: how about we go for this (stolen from here):
* text=auto eol=lf
# Windows batch scripts need CRLF
*.{cmd,[cC][mM][dD]} text eol=crlf
*.{bat,[bB][aA][tT]} text eol=crlf
This is BTW also recommended by VSCode here.
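One way to confirm what a .gitattributes like the one above actually resolves to for a given path is git check-attr. The following is a minimal sketch, assuming it is run inside a clone; the function name show_eol_attrs and the example paths are made up for illustration. As far as I know, Git's attribute patterns do not expand braces the way a shell does, so the two batch-file lines in particular seem worth checking this way rather than trusting by eye.

```shell
# show_eol_attrs is a hypothetical helper, not part of Git or Dataverse.
# It prints the text and eol attributes Git resolves for each given path,
# based on the .gitattributes files visible from the current repository.
show_eol_attrs() {
    git check-attr text eol -- "$@"
}

# Example (run from the root of a clone; the paths are placeholders and
# do not need to exist on disk):
#   show_eol_attrs scripts/example.sh windows/example.bat
```

Each output line has the form "path: attribute: value", so a shell script that is covered by the first rule should report text as auto and eol as lf.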
We probably should make our container build more bulletproof, too. Someone creating a new script or other file to be included in the container on Windows would create it with CRLF, as it's the Windows default. (Not sure IDEs would resolve that in some magical way for you. IntelliJ has "system dependent" as its default.)
So maybe we need to run a dos2unix just to be sure during container builds (base, app, configbaker). Got the inspiration from here.
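For illustration, here is a rough sketch of what such a safety step could look like when dos2unix itself is not available in the image. The helper name strip_cr_in_tree and the directory argument are assumptions, not anything taken from the Dataverse container builds. It removes every carriage return byte from each *.sh file under a directory, which for plain CRLF scripts should give the same result as dos2unix.

```shell
# strip_cr_in_tree is a hypothetical helper; the directory passed in is an
# assumption, not a path taken from the Dataverse container images.
# It rewrites every *.sh file under the given directory with all carriage
# return bytes removed.
strip_cr_in_tree() {
    dir="$1"
    find "$dir" -type f -name '*.sh' | while IFS= read -r script; do
        stripped="$script.nocr.$$"
        # Writing back with cat keeps the original file's mode
        # (for example the executable bit) intact.
        tr -d '\r' < "$script" > "$stripped" && cat "$stripped" > "$script"
        rm -f "$stripped"
    done
}

# Example (the path is a placeholder):
#   strip_cr_in_tree ./scripts
```

Note that this sketch does not handle file names containing newlines, and it touches only files ending in .sh.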
On a related note: @Philip Durbin we probably should clean up CRLF files in the repo. Otherwise, they will be autoconverted with the lines above and show up as edited files in the checkout, probably causing great confusion. Yes, we could avoid that by doing the conversion for shell scripts only, but why would it be a good idea to have such an inconsistency in a codebase? (We simply screwed up by not taking care of it before.)
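A possible way to do that cleanup, sketched with two small wrappers around real Git commands: git ls-files --eol reports the line endings stored in the index, and git add --renormalize re-applies the current .gitattributes to tracked files (as far as I know this flag needs Git 2.16 or newer). The function names are made up for illustration, and both are assumed to be run from the root of a clone.

```shell
# Both helpers are hypothetical wrappers and are meant to be run from the
# root of a clone.

# List tracked files whose committed (index) content has CRLF or mixed
# line endings. Prints nothing if there are none.
list_crlf_in_index() {
    git ls-files --eol | grep -E '^i/(crlf|mixed)' || true
}

# Re-apply the current .gitattributes rules to all tracked files and stage
# the result, so the line-ending cleanup can land in one reviewable commit.
renormalize_tracked_files() {
    git add --renormalize .
}
```

Running list_crlf_in_index before and after renormalize_tracked_files gives a quick check that nothing with CRLF is left staged.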
You know, the diff is huge but it's easy enough to hide all the whitespace changes with https://github.com/IQSS/dataverse/pull/9905/files?diff=unified&w=1
If you say we need it, that's fine with me.
One quick thought: should we put this best practice into place in a smaller repo first? One where we're free to merge at will?
should we put this best practice into place in a smaller repo first
Do you want to try sth first @Philip Durbin ? Something specific?
Buh, maybe https://github.com/IQSS/dataverse-installations or https://github.com/IQSS/dataverse-metrics ? I'm not picky.
The problem is that I don't have a way to test it. I don't have a Windows box.
There's a reformat-all.sh script in the installations repo, if that helps.
If you want, we can have a Zoom and do it on my windows box together. If that helps
This is what Zelig uses: https://github.com/IQSS/Zelig/blob/master/.gitattributes
I gotta walk the dog and get to work. Container meeting in 90 minutes. Maybe we can touch on this then. https://ct.gdcc.io
@Juan Pablo Tosca Villanueva you're very welcome to join us and hang on Zoom! 1 precious hour of container nerds talking shop! :party_ball:
I am very happy to help @Oliver Bertuch ! That second script starts to make sense: DOS files (cmd, bat) should still be normalized. Even if Dataverse probably doesn't have any Windows-native files (as far as I know), I don't think it would hurt. It seems we are going in a good direction.
I am really looking forward to the Zoom call tonight! :smile:
Oh no you just missed us! It's in UTC, so in the morning for ET/EST
:upside_down:
AM, PM, LF, CRLF, too many acronyms man... :rolling_on_the_floor_laughing: I apologize :face_exhaling: , I was really looking forward to it but will be on next week then. Also, if there are any tests on Windows, I can help, just let me know. As I commented to Philip, I am on the 73 line or I can connect over Zoom.
Hi all, I tried running the mvn -Pct package docker:run command on my windows machine but in the container I'm getting the following error message:
[Entrypoint] running /opt/payara/scripts/init_2_configure.sh
2023-10-17T14:35:35.974842400Z /opt/payara/scripts/init_2_configure.sh: line 10: $'\r': command not found
can anyone help me out?
2 messages were moved here from #dev > Error when running "mvn -Pct package docker:run" by Philip Durbin.
@Sakshi Jain hi! I hope you don't mind, but because you're on Windows, I moved your question to this topic where we are discussing pull request #9905 by @Juan Pablo Tosca Villanueva (also on Windows) that I'm hoping will help you!
Hmm, something happened to that PR so I'd actually recommend not looking at the files right now. :sweat_smile:
At a high level, as a workaround, I believe you can either set up a global .gitconfig file or pass an argument when you clone the repo. I'm not sure about the details though.
@Sakshi Jain do you use the git command line? If so, what do you get when you run this?
git config --global --get core.autocrlf
Not intentionally
I don't have my windows host ATM, had to use it for a temporary other purpose
Philip Durbin said:
Sakshi Jain do you use the git command line? If so, what do you get when you run this?
git config --global --get core.autocrlf
Hi @Philip Durbin I tried running this command but didn't get anything.
Ok, that's normal, I think.
I'm looking at https://docs.github.com/en/get-started/getting-started-with-git/configuring-git-to-handle-line-endings (the Windows tab)
@Sakshi Jain can you please try:
- running git config --global core.autocrlf true
- running git config --global --get core.autocrlf again to see if it changed
- delete your previous dataverse directory
- clone the dataverse git repo again (fresh)
- try spinning up Dataverse with Docker again
Philip Durbin said:
Sakshi Jain can you please try:
- running git config --global core.autocrlf true
- running git config --global --get core.autocrlf again to see if it changed
- delete your previous dataverse directory
- clone the dataverse git repo again (fresh)
- try spinning up Dataverse with Docker again
@Philip Durbin Still getting the same error
bah!
Should we create a branch with @Juan Pablo Tosca Villanueva or @Oliver Bertuch 's fix for you to checkout? Maybe two branches? (The fixes are a little different.)
@Sakshi Jain ok I just pushed a new branch where I cherry-picked @Juan Pablo Tosca Villanueva 's fix. Can you please try:
- delete your previous dataverse directory
- git clone -b 9894-gitattributes-eol-lf git@github.com:IQSS/dataverse.git (or the https version if you prefer)
I think the issue is somewhere else because the changes to the local git config should have the same effect as the proposed change (adding the .gitattributes file). @Philip Durbin
@Sakshi Jain be sure to delete all the previous content before checking out again or you could also check out in a different directory. I will try to set this up, but I am not home until next week but if somehow, I figure it out, I will let you know.
@Juan Pablo Tosca Villanueva sure no worries
@Philip Durbin I'll try with this branch once
@Sakshi Jain awesome. Thanks.
@Oliver Bertuch as I mentioned above, PR #9905 is... not in a good state. If you'd like @Sakshi Jain to try your solution, please push a new branch.
@Juan Pablo Tosca Villanueva yeah, it's weird. I was hoping the global config would be a good workaround. Oh well.
Hi @Philip Durbin the cloning thing didn't work.
But I fixed the issue. It's a hacky solution but it worked :D
I opened the /init_2_configure.sh file in notepad++ and from there I was able to remove all the \r in the file
After that when I ran the docker the error didn't come again
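A quick command-line check along the same lines can confirm whether a script still contains carriage returns before it goes into the container. This is only a sketch; count_cr_lines is a made-up helper name and the example path is a placeholder.

```shell
# count_cr_lines is a hypothetical helper: it prints how many lines of a
# file contain a carriage return, so 0 means no CRLF endings are left.
count_cr_lines() {
    cr=$(printf '\r')
    # grep -c exits non-zero when the count is 0, hence the "|| true".
    grep -c "$cr" "$1" || true
}

# Example (placeholder path):
#   count_cr_lines path/to/init_2_configure.sh
```

A non-zero count for a shell script is consistent with the $'\r': command not found error shown earlier in the thread.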
Hmm, that's frustrating. Want me to push a new branch with the other fix? @Oliver Bertuch 's suggestion, I mean.
It would be great to figure this out for Windows developers once and for all.
true
@Sakshi Jain ok, I just pushed @Oliver Bertuch 's fix to a new branch. If you'd like to try it:
git clone -b 9894-gitattributes-text-auto git@github.com:IQSS/dataverse.git
I really appreciate you helping out with this. I want to better support developers on Windows.
Philip Durbin said:
Sakshi Jain ok, I just pushed Oliver Bertuch 's fix to a new branch. If you'd like to try it:
git clone -b 9894-gitattributes-text-auto git@github.com:IQSS/dataverse.git
I really appreciate you helping out with this. I want to better support developers on Windows.
Sure I'll check it out :+1:
Thanks!
Are you using git from the command line?
well normally I clone repositories from within IntelliJ or Visual Studio itself. But for this issue I tried cloning through the command line and even through GitHub Desktop, but the same issue came up every time.
and I can see that the autocrlf value is set as true
Do you see a .gitattributes file in the root of the repo? It should have different content based on the two branches.
trying to clone the particular branches you mentioned is giving me the following error:
` git@github.com: Permission denied (publickey).
fatal: Could not read from remote repository.
Please make sure you have the correct access rights
and the repository exists. `
Ah, can you please try the https version instead of the git version? Do you know what I mean?
I was thinking about adding the file manually in my git branch and cloning that
that way I'll be easily able to continue working in it as well if it works out
Yes! Please try that!
cloned the repo with .gitattributes file having the text auto but still getting the same error
I found a stackoverflow for this same issue : https://stackoverflow.com/questions/2517190/how-do-i-force-git-to-use-lf-instead-of-crlf-under-windows
trying out a solution mentioned on this
this resolved the issue!
orly?
I did the following:
after this I cloned the branch on my system and ran the maven package docker command and the containers started without any errors :smiley:
Wow. Good job. So do we need something else in .gitattributes, I wonder. @Juan Pablo Tosca Villanueva and @Oliver Bertuch please take note!
@Sakshi Jain for now, do you want to document what you did? Make a PR for this, I mean.
well I can add the gitattributes file with the text auto change
can I add the steps I followed in the readme file so that if anyone works on windows can follow them?
I see though that https://github.com/IQSS/dataverse/blob/9894-gitattributes-eol-lf/.gitattributes is having the eol lf command in it. I didn't try adding that in the gitattributes file in my branch so not sure if it works on my system or not.
Maybe I can try with that file once tomorrow on another windows system to see if that works. If it does then we could directly merge the 9894-gitattributes-eol-lf branch into develop
Yes, it would be good to confirm the best approach. I think we're close! Thanks again!
Philip Durbin said:
Sakshi Jain can you please try:
- running git config --global core.autocrlf true
- running git config --global --get core.autocrlf again to see if it changed
- delete your previous dataverse directory
- clone the dataverse git repo again (fresh)
- try spinning up Dataverse with Docker again
I was reading this again after looking at the Stack Overflow question that @Sakshi Jain posted, and I just realized that you suggested setting autocrlf to true, which is exactly the behavior that we want to avoid on Windows. These two settings should work:
git config --global core.autocrlf false
git config --global core.autocrlf input (the one mentioned in the official dataverse documentation)
Here is the difference between false and input in the git documentation. Basically, both avoid auto-conversion at checkout, but input will still convert CRLF to LF on commit. Probably @Sakshi Jain can help us with a test by changing it from false to input.
What is weird to me is that this behavior should be overridden by setting up the .gitattributes, which is why I introduced this request in the first place.
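The commit-side half of that difference can be observed directly in a throwaway repository. The sketch below is an experiment helper only: index_eol_under is a made-up name, it assumes git and mktemp are available, and it uses git -c so the global configuration is left alone. It stages a small CRLF file under a given core.autocrlf value and prints how Git stored it.

```shell
# index_eol_under is a hypothetical helper for experimenting, not a
# Dataverse tool. It creates a scratch repository, stages a CRLF file with
# the given core.autocrlf value, and prints the line ending Git recorded
# in the index (i/lf or i/crlf).
index_eol_under() {
    setting="$1"
    scratch=$(mktemp -d) || return 1
    (
        cd "$scratch" || exit 1
        git init -q .
        printf 'echo hi\r\n' > probe.sh
        # -c applies the settings to this one command only.
        # core.safecrlf=false just silences the conversion warning.
        git -c core.autocrlf="$setting" -c core.safecrlf=false add probe.sh
        git ls-files --eol probe.sh | awk '{print $1}'
    )
    rm -rf "$scratch"
}

# index_eol_under false   # prints i/crlf (stored as-is)
# index_eol_under input   # prints i/lf   (CRLF converted when staging)
# index_eol_under true    # prints i/lf   (and LF becomes CRLF again on checkout)
```

This only shows what happens on the way into the repository; the checkout side, where true differs from input, is what matters for scripts that end up inside the Linux container.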
Philip Durbin said:
Do you see a .gitattributes file in the root of the repo? It should have different content based on the two branches.
I think the .gitattributes file is not checked out, but I have to test this again.
Sorry, I wanted to get back to both of you earlier, but we are in a hotel right now and the project took forever to download :rolling_on_the_floor_laughing: then we had to leave and we just got back to the room.
Hi all, I just tested with changes from https://github.com/IQSS/dataverse/blob/9894-gitattributes-eol-lf/.gitattributes on another Windows system and that solution seems to be working fine. I didn't need to make any changes other than adding this same .gitattributes file.
9894-gitattributes-eol-lf branch can be merged into develop now as it resolves the windows crlf issue :)
@Sakshi Jain great news. Would you also be willing to test with https://github.com/IQSS/dataverse/blob/9894-gitattributes-text-auto/.gitattributes ?
I tested with text auto attribute earlier but it didn't work
Ok. Thanks. @Oliver Bertuch are you following this?
@Sakshi Jain if you run git config --global --get core.autocrlf what do you get?
as of now I've set it to false so that's what I get
Ok. It's late for Oliver but maybe we can check in with him tomorrow on what he thinks. I appreciate all the testing!
I'm sorry folks, but it's been crazy over here all day today. I might have a chance to look into this next week on my Windows host, but no promises. Open Science Week keeps me pretty busy.
No worries, Oliver! Have a good weekend!
As a Windows user I have battled with Windows over end of line characters. Windows is especially problematic with regard to the VSCode editor. I have been trying for ages to configure VSCode to STOP USING CRLF, but no matter what Git settings or VSCode settings I set, if I use the Clone Git Repository... option that VSCode prominently displays on the start up screen, IT ALWAYS clones with CRLF. DO NOT USE THIS OPTION... Instead clone a Git repo using the Git command git clone --config core.autocrlf=false [repo clone URL] (for example git clone --config core.autocrlf=false https://github.com/IQSS/dataverse.git). This ensures that Git will clone the repository with LF end of line characters.
Hi @kuhlaid do you still have these issues? We recently added a .gitattributes file that should skip the conversion on *.sh files by default for everyone cloning the repo; all the other files should follow whatever your configuration is set to. Do you have any idea what other files should be converted? Right now the project works for me on Windows 10-11 with some exceptions using file storage, but I haven't had a chance to investigate these.
Hi @Juan Pablo Tosca Villanueva , this is just a problem with VSCode and not the Dataverse itself.
Oh I see! Yeah, you can suggest that projects add this file so that you and everyone else don't need to do additional configuration. I suggested it to other repos. Alternatively, you could also set up your Git installation with git config --global core.autocrlf input; this way Git converts CRLF to LF on commit but not the other way around.
Thanks for sharing this option!
For Windows users who want a custom VsCode task to clone Git repositories with LF end of line characters (which is what you want if you are working with Dataverse code), I created some instructions that can be found at https://stackoverflow.com/a/77586955/10027828.
Last updated: Oct 30 2025 at 05:14 UTC