Emacs should no longer unexpectedly alter the byte order mark
author Eli Zaretskii <eliz@gnu.org>
Fri, 15 Dec 2017 09:06:07 +0000 (11:06 +0200)
committer Peter Michael Green <plugwash@raspbian.org>
Fri, 14 Sep 2018 11:26:52 +0000 (12:26 +0100)
This upstream patch has been incorporated to fix the problem:

  Better support utf-8-with-signature and utf-8-hfs in XML/HTML

  * lisp/international/mule.el (sgml-xml-auto-coding-function):
  Support UTF-8 with BOM and utf-8-hfs as variants of UTF-8, and
  obey the buffer's encoding if it is one of these variants, instead
  of re-encoding in UTF-8 proper.  (Bug#20623)

Origin: backport, commit: 889f07c352f7e0deccf59353a60a45f2716551d8
Bug: https://bugs.gnu.org/20623
Bug-Debian: http://bugs.debian.org/883434
Forwarded: not-needed

Gbp-Pq: Name 0014-Emacs-should-no-longer-unexpectedly-alter-the-byte-o.patch
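
The rule described in the commit message can be sketched in isolation. This is a minimal illustration, not the code from the patch: the helper name `my/xml-pick-coding` is hypothetical, and it compares coding-system types with `eq` where the patch itself calls `coding-system-equal`. It assumes that `coding-system-type` returns the symbol `utf-8` for every UTF-8 variant.

```elisp
;; Hypothetical helper for illustration only; not part of mule.el.
(defun my/xml-pick-coding (declared buffer-coding)
  "Choose between DECLARED (from the XML tag) and BUFFER-CODING.
Return BUFFER-CODING when both are variants of UTF-8, so that a
variant such as UTF-8 with BOM survives a save.  Otherwise
return DECLARED."
  (if (and (eq (coding-system-type declared) 'utf-8)
           (eq (coding-system-type buffer-coding) 'utf-8))
      buffer-coding
    declared))

;; Tag says UTF-8, buffer is UTF-8 with BOM: the buffer's coding is kept.
(my/xml-pick-coding 'utf-8 'utf-8-with-signature)

;; Tag says UTF-8, buffer is Latin-1: the declared coding is used.
(my/xml-pick-coding 'utf-8 'iso-8859-1)
```

Evaluating the two calls in `M-x ielm` or a scratch buffer should show the buffer's coding system being kept only in the first case.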

lisp/international/mule.el

index 3da722d9f1bac18c82f2f39240c9b5b92de45621..ade76004d4469caa407670f5e3eb3099b305e2b1 100644
@@ -2493,7 +2493,17 @@ This function is intended to be added to `auto-coding-functions'."
            (let* ((match (match-string 1))
                   (sym (intern (downcase match))))
              (if (coding-system-p sym)
-                 sym
+                  ;; If the encoding tag is UTF-8 and the buffer's
+                  ;; encoding is one of the variants of UTF-8, use the
+                  ;; buffer's encoding.  This allows, e.g., saving an
+                  ;; XML file as UTF-8 with BOM when the tag says UTF-8.
+                  (let ((sym-type (coding-system-type sym))
+                        (bfcs-type
+                         (coding-system-type buffer-file-coding-system)))
+                    (if (and (coding-system-equal 'utf-8 sym-type)
+                             (coding-system-equal 'utf-8 bfcs-type))
+                        buffer-file-coding-system
+                     sym))
                (message "Warning: unknown coding system \"%s\"" match)
                nil))
           ;; Files without an encoding tag should be UTF-8. But users
@@ -2506,7 +2516,8 @@ This function is intended to be added to `auto-coding-functions'."
                    (coding-system-base
                     (detect-coding-region (point-min) size t)))))
             ;; Pure ASCII always comes back as undecided.
-            (if (memq detected '(utf-8 undecided))
+            (if (memq detected
+                      '(utf-8 utf-8-with-signature utf-8-hfs undecided))
                 'utf-8
               (warn "File contents detected as %s.
   Consider adding an encoding attribute to the xml declaration,