static MCD_STR CMarkup::GetDeclaredEncoding( MCD_CSTR szDoc );
This method obtains the encoding name from the XML Declaration at the beginning of an XML string without parsing the whole string. Update December 17, 2008: with release 10.1, GetDeclaredEncoding returns the charset name from the HTML Content-Type meta tag if the document is enclosed in an html element. GetDeclaredEncoding parses only the beginning of the document. If no encoding name is found, an empty string is returned.
The reason GetDeclaredEncoding is a separate static function, and not a method that operates on the object's document, is that you often need to determine the encoding before parsing, so that you can convert the text encoding before initializing the CMarkup object.
Here is an example of an XML declaration containing the encoding name:
<?xml version="1.0" encoding="Windows-1252"?>
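To make the idea of reading only the declaration concrete, here is a minimal standalone sketch of how an encoding name can be pulled from a leading XML declaration. It is not CMarkup's implementation: the function name SketchXmlDeclaredEncoding is hypothetical, and the sketch assumes a plain byte string with no byte order mark and does not validate the declaration.

```cpp
#include <string>

// Hypothetical helper for illustration only; this is NOT part of CMarkup.
// Returns the value of the encoding pseudo-attribute in a leading XML
// declaration, or an empty string if there is none.
std::string SketchXmlDeclaredEncoding(const std::string& doc)
{
    // Assumption: the declaration, if present, is the very first thing in the text.
    if (doc.compare(0, 5, "<?xml") != 0)
        return "";

    // Stop at the end of the declaration so the rest of the document is never scanned.
    std::size_t end = doc.find("?>");
    if (end == std::string::npos)
        return "";

    std::size_t pos = doc.find("encoding");
    if (pos == std::string::npos || pos > end)
        return "";

    // The value may be delimited by single or double quotes.
    pos = doc.find_first_of("\"'", pos);
    if (pos == std::string::npos || pos > end)
        return "";
    char quote = doc[pos];
    std::size_t close = doc.find(quote, pos + 1);
    if (close == std::string::npos || close > end)
        return "";

    return doc.substr(pos + 1, close - pos - 1);
}
```

Like GetDeclaredEncoding as described above, the sketch returns an empty string when no encoding name is found, so a caller can fall back to a default encoding.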
The HTML document is parsed using case-insensitive matching, so the case of the tag names, attribute names and attribute values does not matter. Here is an example of an HTML document with the encoding name specified in the Content-Type meta tag:
<html>
<head>
<title>Hello World</title>
<meta http-equiv="Content-Type" content="text/html;charset=iso-8859-1">
</head>
<body>
<p>Hi</p>
</body>
</html>
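The case-insensitive lookup described above can be sketched in standalone C++ as follows. This is an illustration under stated assumptions, not CMarkup's actual code: the function name SketchHtmlDeclaredCharset is hypothetical, "enclosed in an html element" is approximated by simply requiring an html start tag somewhere in the text, and the search stops at the end of the head.

```cpp
#include <algorithm>
#include <cctype>
#include <string>

// Hypothetical helper for illustration only; this is NOT part of CMarkup.
// Returns the charset named in a Content-Type meta tag, or "" if none is found.
std::string SketchHtmlDeclaredCharset(const std::string& doc)
{
    // Lowercase a copy for matching; byte offsets stay aligned with the original.
    std::string lower(doc);
    std::transform(lower.begin(), lower.end(), lower.begin(),
                   [](unsigned char c) { return static_cast<char>(std::tolower(c)); });

    // Assumption: only look for a meta tag when an html element is present.
    if (lower.find("<html") == std::string::npos)
        return "";

    // Limit the search to the head so the body is not examined.
    std::size_t limit = lower.find("</head");
    if (limit == std::string::npos)
        limit = lower.size();

    for (std::size_t tag = lower.find("<meta");
         tag != std::string::npos && tag < limit;
         tag = lower.find("<meta", tag + 1))
    {
        std::size_t tagEnd = lower.find('>', tag);
        if (tagEnd == std::string::npos)
            break;
        std::string meta = lower.substr(tag, tagEnd - tag);
        if (meta.find("content-type") == std::string::npos)
            continue;
        std::size_t cs = meta.find("charset=");
        if (cs == std::string::npos)
            continue;

        // Take the value from the original text so its case is preserved.
        std::size_t valueStart = tag + cs + 8; // 8 is the length of "charset="
        std::size_t valueEnd = doc.find_first_of("\"' ;>", valueStart);
        if (valueEnd == std::string::npos)
            valueEnd = doc.size();
        return doc.substr(valueStart, valueEnd - valueStart);
    }
    return "";
}
```

Matching against a lowercased copy while extracting from the original keeps the comparison case-insensitive without altering the returned name.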
Update December 17, 2008: with release 10.1, GetDeclaredEncoding is used internally by CMarkup in ReadTextFile, WriteTextFile, Load and Save to determine what encoding to use.
Encoding is a large topic that applies to all text files, and XML/HTML files are text files. See also ANSI and Unicode files and C++ strings. Remember that when an XML file is not stored in a Unicode encoding, the encoding is supposed to be specified in the XML Declaration.