Release 10.1 Date: December 17, 2008
I put a lot of heart into this release, but you can ignore all the engineering because it is wrapped in the same clear and tight C++ class with simple methods: CMarkup :). The new source code in this release carefully encapsulates what I've learned about text file encoding from countless projects and hundreds of hours of research. In addition, the 70+ CMarkup regression tests are now cross-platform and have been tested on Ubuntu Linux, Mac OS X, VC++ 6.0, VC++ 2008 Express, and even Cygwin.
First of all, if you're using CMarkup 10.0 with MFC, you must get this release for the performance fix unless you have already fixed it yourself. This release also includes a lot of file handling improvements, including ANSI encoding conversion that was previously available only in the firstobject XML editor. Now ReadTextFile and WriteTextFile (and Load and Save too) will:
- convert non-Unicode/ANSI charset files that do not match the system locale
- handle big endian (UTF-16BE) and little endian (UTF-16LE) files
- properly read/write files into OS X and Linux "double wide" wchar_t strings
- recover Microsoft ADO.NET generated files containing nulls
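As a rough illustration of the UTF-16 handling listed above, the following sketch detects a byte order mark and assembles 16-bit code units from raw file bytes. It is not CMarkup's code: the names `Utf16Order`, `DetectUtf16Bom`, and `DecodeUtf16Units` are hypothetical, and the sketch assumes the whole file has already been read into a `std::string`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Byte order indicated by a UTF-16 byte order mark, if any (hypothetical type).
enum class Utf16Order { None, LittleEndian, BigEndian };

// Inspect the first two bytes for a UTF-16 BOM: FF FE = little endian,
// FE FF = big endian.
Utf16Order DetectUtf16Bom(const std::string& bytes)
{
    if (bytes.size() >= 2) {
        const unsigned char b0 = static_cast<unsigned char>(bytes[0]);
        const unsigned char b1 = static_cast<unsigned char>(bytes[1]);
        if (b0 == 0xFF && b1 == 0xFE) return Utf16Order::LittleEndian;
        if (b0 == 0xFE && b1 == 0xFF) return Utf16Order::BigEndian;
    }
    return Utf16Order::None;
}

// Assemble 16-bit code units from raw bytes in the given byte order,
// skipping a leading BOM if one is present. Each unit is built
// arithmetically, so the result is correct on any host without an
// explicit byte swap. A trailing odd byte is ignored.
std::vector<std::uint16_t> DecodeUtf16Units(const std::string& bytes,
                                            Utf16Order order)
{
    assert(order != Utf16Order::None);  // caller must supply a byte order
    const std::size_t start =
        (DetectUtf16Bom(bytes) != Utf16Order::None) ? 2 : 0;
    std::vector<std::uint16_t> units;
    for (std::size_t i = start; i + 1 < bytes.size(); i += 2) {
        const unsigned first = static_cast<unsigned char>(bytes[i]);
        const unsigned second = static_cast<unsigned char>(bytes[i + 1]);
        const unsigned unit = (order == Utf16Order::BigEndian)
            ? (first << 8) | second
            : (second << 8) | first;
        units.push_back(static_cast<std::uint16_t>(unit));
    }
    return units;
}
```

A caller would typically run `DetectUtf16Bom` first and pass its result to `DecodeUtf16Units`, supplying a default order only for files without a BOM.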
These text file handling features make this one of the biggest CMarkup releases ever, even if it is not obvious. More details are in the complete list of 10.1 enhancements:
- cross-platform CMarkupTest class allows you to run regression tests and walk through all the test examples on many platforms and compilers. It also has a new performance readout and new tests. Previously it was only for Visual C++ with MFC
- Handle text files in many non-Unicode and ANSI encodings. The ReadTextFile and WriteTextFile functions now let you determine and specify encodings by name, taking care of the conversion to and from your native string format. This feature uses Windows APIs on Windows and iconv on OS X and Linux (see non-Unicode text handling in CMarkup)
- GetDeclaredEncoding, which returns the encoding name from the XML declaration in XML documents, now also returns the encoding from the HTML Content-Type meta tag if it is an HTML document
- ReadTextFile performs UTF-8 auto-detection on text files which might be either UTF-8 or a non-Unicode ANSI or double-byte character set
- UTF-16BE files are now supported, together with big and little endian swapping based on your platform. See UTF-16 Files and the Byte Order Mark (BOM)
- For wide string builds on Linux and OS X (define MARKUP_WCHAR), this release properly converts text files to and from 4 byte UTF-32 wchar_t strings (on Windows wchar_t is generally 2 byte UTF-16)
- Read text files containing nulls. In rare cases, Microsoft components unintentionally generate XML files containing nulls (code point zero) at the end of some data values
- CMarkupMSXML now utilizes MSXML 6.0 and dynamically attempts previous versions of MSXML
- New optional argument in ReadTextFile and WriteTextFile to allow you to specify an ANSI encoding name
- UTF8To16 and UTF16To8 arguments use unsigned short* for UTF-16 because on Linux and OS X wchar_t* is for UTF-32
- All file I/O, ANSI conversion, UTF-8 BOM and UTF-16 file features, including UTF-8 auto-detection, are now available in the Evaluation version of CMarkup
- fix: doc creation bug, and new document creation performance regression test
- fix: UnescapeText for converting Unicode numeric references above the basic multilingual plane (i.e. greater than 0xffff)
- fix: VC++ 2005 safe string implementation had the wrong function names for swprintf_s and sprintf_s (*thanks Petteri Salo)
- fix: some minor compiler warnings found on certain compilers on Solaris and IBM AIX (*thanks Eric)
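The UTF-8 auto-detection item above relies on the fact that text in an ANSI or double-byte character set rarely happens to form well-formed UTF-8. One common way to make that decision is to check whether the whole buffer is valid UTF-8. The sketch below illustrates that general technique only. It is not taken from CMarkup, `LooksLikeUtf8` is a hypothetical name, and a real detector may apply additional heuristics.

```cpp
#include <cstddef>
#include <string>

// Returns true if every byte sequence in the buffer is well-formed UTF-8.
// Pure ASCII also returns true, since ASCII is a subset of UTF-8.
bool LooksLikeUtf8(const std::string& bytes)
{
    const std::size_t n = bytes.size();
    std::size_t i = 0;
    while (i < n) {
        const unsigned char c = static_cast<unsigned char>(bytes[i]);
        if (c < 0x80) { ++i; continue; }  // single-byte ASCII

        std::size_t extra = 0;     // number of continuation bytes expected
        unsigned long cp = 0;      // code point being assembled
        unsigned long min_cp = 0;  // smallest value allowed for this length
        if ((c & 0xE0) == 0xC0)      { extra = 1; cp = c & 0x1F; min_cp = 0x80; }
        else if ((c & 0xF0) == 0xE0) { extra = 2; cp = c & 0x0F; min_cp = 0x800; }
        else if ((c & 0xF8) == 0xF0) { extra = 3; cp = c & 0x07; min_cp = 0x10000; }
        else return false;  // stray continuation byte or invalid lead byte

        if (n - i <= extra) return false;  // sequence is cut off at the end
        for (std::size_t k = 1; k <= extra; ++k) {
            const unsigned char cc = static_cast<unsigned char>(bytes[i + k]);
            if ((cc & 0xC0) != 0x80) return false;  // not a continuation byte
            cp = (cp << 6) | (cc & 0x3F);
        }
        // Reject overlong forms, UTF-16 surrogates, and values past U+10FFFF.
        if (cp < min_cp || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
            return false;
        i += extra + 1;
    }
    return true;
}
```

With a check like this, a file that fails validation can be treated as a candidate for conversion from the system or a named ANSI encoding instead.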
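Two items above touch on code points beyond 0xffff: the UnescapeText fix for numeric references, and the use of unsigned short for UTF-16 code units. In UTF-16, such code points are stored as a surrogate pair. The sketch below shows the standard arithmetic for that encoding. `EncodeUtf16` is a hypothetical helper for illustration, not a CMarkup function.

```cpp
#include <cassert>
#include <vector>

// Encode one Unicode code point as UTF-16 code units, stored as
// unsigned short so the element width does not depend on wchar_t.
std::vector<unsigned short> EncodeUtf16(unsigned long cp)
{
    // Valid scalar values exclude the surrogate range and stop at U+10FFFF.
    assert(cp <= 0x10FFFF && !(cp >= 0xD800 && cp <= 0xDFFF));
    std::vector<unsigned short> units;
    if (cp < 0x10000) {
        // Basic multilingual plane: a single code unit.
        units.push_back(static_cast<unsigned short>(cp));
    } else {
        // Supplementary planes: split the 20-bit offset into two halves.
        const unsigned long v = cp - 0x10000;
        units.push_back(static_cast<unsigned short>(0xD800 + (v >> 10)));
        units.push_back(static_cast<unsigned short>(0xDC00 + (v & 0x3FF)));
    }
    return units;
}
```

For example, a numeric reference such as `&#x1D11E;` names code point 0x1D11E, which this helper encodes as the two units 0xD834 and 0xDD1E.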
See also:
CMarkup 10.0 Release Notes
Archived CMarkup Release Notes