export LANG=C
git-annex adjust --unlock
-What seems to be happening is that catCommit gets:
-
- commitName = Just "F\56515\56489lix"
-
-Which I think is ok; those are UTF-8 bytes escaped as surrogates by the filesystem encoding.
-Then that's passed into commitWithMetaData, which sets the environment
-variable to its content. And apparently it fails to be converted back to
-the right bytes.
-
-One fix would be to keep it a ByteString all the way through, using
-`System.Posix.Env.ByteString`. I tried converting all environment handling in
-git-annex to use that, but CreateProcess uses String for env, so that is
-not really possible. Also it's pretty intrusive, and is problematic for
-Windows since it would have to decode the ByteString back to String.
-So while this would be best -- it would ensure that any environment
-variable that for some reason needs to get set by git-annex would
-not incur mojibake -- it doesn't seem possible with the current library
-ecosystem.
-
-I tried making commitWithMetaData set the env var to a String that
-had the filesystem encoding applied. Eg `w82s (S.unpack (encodeBS v))`.
-Interestingly, that failed:
-
-git-annex: git: recoverEncode: invalid argument (cannot encode character '\195')
-
-Which looks like the filesystem encoding is being applied after all?
-And in System.Process.Posix it does look like it is:
-withCEnvironment uses withFilePath on the contents of env.
-
-So huh, why then does the value not roundtrip?
+Err... I thought I had reproduced this with something like the above,
+but now that is not working for me. I get:
+
+ commit 50fedeefa3ece65ed4866fe7a1e0c1fe9cc90d78 (HEAD -> adjusted/master(unlocked))
+ Author: Félix <joeyh@joeyh.name>
+ Date: Fri Sep 22 15:23:18 2023 -0400
+
+ git-annex adjusted branch
+
+I've tried several other combinations of locale settings, LANG=C from the
+beginning, etc, and all seem to work ok. I also looked at the values coming
+into git-annex with LANG=C and going out, and it roundtrips unicode fine
+even in non-unicode locales.
"""]]
+++ /dev/null
-[[!comment format=mdwn
- username="joey"
- subject="""comment 3"""
- date="2023-09-22T19:13:32Z"
- content="""
- joey@darkstar:~>cat f
- Félix
- joey@darkstar:~>cat foo.hs
- import System.Process
- import qualified GHC.IO.Encoding as Encoding
-
- main = do
-     e <- Encoding.getFileSystemEncoding
-     Encoding.setLocaleEncoding e
-     v <- readFile "f"
-     print v
-     (_, _, _, p) <- createProcess (proc "sh" ["-c", "echo test $V"])
-         { env = Just [("V", v)] }
-     waitForProcess p
-     return ()
- joey@darkstar:~>LANG=C runghc foo.hs
- "F\56515\56489lix\n"
- test Félix
-
-Interesting! This confirms that "F\56515\56489lix" is the correctly
-encoded value. And yet here, the environment variable gets set correctly
-as well, and it round-trips.
-"""]]