I had to do some Windows Mobile development recently (Visual Studio 2005, C# and the .NET Compact Framework) and needed to create an XML file containing text with some Scottish Gaelic characters in it. My understanding of XML files isn’t what it should be, so I soon ran into a few funnies. Whenever I tried to load the document using the XmlDocument.Load() method, the application threw an exception with no explanatory message.
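Since the amount of text was quite small, I simply pasted the paragraphs I needed to display within my application (from Word) into vi and saved the file with a .xml extension from there. I figured vi would do a decent job of stripping out all of Word’s formatting. However, doing this doesn’t save the file as proper XML with the appropriate encoding. The accented characters ended up as single bytes in the Windows code page (Windows-1252), whereas an XML parser assumes UTF-8 unless the file declares otherwise, so those bytes were invalid as far as the parser was concerned.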
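To see why the parser objects, here is a small sketch. It is in Python rather than C#, so it's easy to try on a desktop machine, and it isn't the .NET code from my app. It assumes the file held Windows-1252 bytes with no XML declaration. With neither a declaration nor a byte order mark, the parser falls back to UTF-8. In that encoding the byte 0xE0 (à in Windows-1252) is the start of a multi-byte sequence, and the letter that follows can't complete it. The sample text and the `<text>` element are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: Scottish Gaelic text with a grave-accented vowel.
SAMPLE = "<text>Fàilte gu Alba</text>"

# Roughly what the vi-saved file contained: Windows-1252 bytes, no declaration.
cp1252_bytes = SAMPLE.encode("cp1252")
utf8_bytes = SAMPLE.encode("utf-8")

def try_parse(data: bytes) -> str:
    """Return the element's text, or the parser's error message."""
    try:
        return ET.fromstring(data).text
    except ET.ParseError as err:
        return f"ParseError: {err}"

# No declaration, so the parser assumes UTF-8 and rejects the 0xE0 byte.
print(try_parse(cp1252_bytes))
# The same text encoded as UTF-8 parses cleanly.
print(try_parse(utf8_bytes))
```

The exact error wording depends on the parser, but the failure and success cases should match what I saw on the device.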
Fortunately, Visual Studio provided a tidy solution. All I had to do was open my XML file in Visual Studio and then open the document’s Properties (View, Properties). Visual Studio recognised that my file was an XML document (of sorts) and guessed the encoding it was using (Western European – Windows). I then changed the encoding to UTF-8 and re-saved the file under a different name (File, Save As), after which XmlDocument.Load() worked just fine.
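If you'd rather not rely on the Properties window, the same conversion can be done in a few lines of code. This is a minimal Python sketch, not what Visual Studio does internally. It assumes the source file really is Windows-1252, as Visual Studio guessed, and that it has no encoding declaration. If the file does declare an encoding, that declaration would need updating too. The function and file names are hypothetical.

```python
import tempfile
from pathlib import Path

def convert_to_utf8(src: Path, dst: Path, source_encoding: str = "cp1252") -> None:
    """Decode src using source_encoding and write the same text to dst as UTF-8.

    Assumes src carries no encoding declaration; one would need editing by hand.
    """
    text = src.read_bytes().decode(source_encoding)
    dst.write_bytes(text.encode("utf-8"))

# Demo using hypothetical file names in a throwaway directory.
with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp) / "gaelic.xml"
    dst = Path(tmp) / "gaelic-utf8.xml"
    src.write_bytes("<text>Fàilte gu Alba</text>".encode("cp1252"))
    convert_to_utf8(src, dst)
    original = src.read_bytes()
    converted = dst.read_bytes()

print(len(original), len(converted))
```

Writing to a new file, rather than overwriting the original, mirrors the Save As step and leaves the old copy around in case the guessed source encoding turns out to be wrong.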
It looks like the re-saved file now uses two bytes for each of the special characters, whereas before it used only one. That is UTF-8’s variable-width encoding at work: ordinary ASCII characters still take a single byte, but accented letters such as à need two. My Windows Mobile application worked a treat after that.
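The byte counts are easy to check. A quick Python sketch follows; the sample word is just an illustration. In Windows-1252 every character, accented or not, is a single byte. In UTF-8, plain ASCII letters stay at one byte, while the grave-accented vowels used in Gaelic take two each.

```python
# Compare how many bytes each grave-accented vowel needs in each encoding.
for ch in "àèìòù":
    print(ch, len(ch.encode("cp1252")), len(ch.encode("utf-8")), ch.encode("utf-8").hex(" "))

word = "Fàilte"  # hypothetical sample word: five ASCII letters plus one à
cp1252_len = len(word.encode("cp1252"))
utf8_len = len(word.encode("utf-8"))
print(cp1252_len, utf8_len)  # 6 7
```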