static int CMarkup::UTF16To8( char *pszUTF8, const unsigned short* pwszUTF16, int nUTF8Count );
UTF16To8 converts the UTF-16 string in pwszUTF16 to UTF-8 in the pszUTF8 string buffer. It takes the same arguments as the ANSI C wcstombs function, but converts to UTF-8 instead of to the locale charset.
Update December 17, 2008: With CMarkup release 10.1, the UTF-16 string type in the UTF16To8 and UTF8To16 functions changed from wchar_t* to unsigned short*, because wchar_t is a 32-bit UTF-32 type on Linux and OS X.
The pwszUTF16 source must be a null-terminated UTF-16 string. If pszUTF8 is NULL, the number of bytes required is returned and nUTF8Count is ignored. Otherwise pszUTF8 is filled with the result string. nUTF8Count is the byte size of the pszUTF8 buffer, and it must include room for a null-terminator if you want pszUTF8 to be null-terminated. The return value is the number of bytes in the result, excluding the null-terminator.
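A minimal, self-contained C++ sketch of the contract described above follows. Utf16To8Sketch and Utf16To8Alloc are hypothetical names, and this is not CMarkup's implementation. Two behaviors are assumptions of the sketch rather than documented CMarkup behavior: an unpaired surrogate is replaced with U+FFFD, and a UTF-8 sequence that does not fit in nUTF8Count bytes is not written at all (wcstombs likewise never writes a partial multibyte character).

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical sketch of the UTF16To8 contract (not the CMarkup source).
// pszUTF8 == nullptr: nUTF8Count is ignored and the required byte count is returned.
// Otherwise: at most nUTF8Count bytes are written, and a null-terminator is added
// only if a byte of room remains. Returns the UTF-8 byte count, excluding the terminator.
int Utf16To8Sketch(char* pszUTF8, const unsigned short* pwszUTF16, int nUTF8Count)
{
    assert(pwszUTF16 != nullptr);
    int nOut = 0;
    for (const unsigned short* p = pwszUTF16; *p != 0; ++p)
    {
        unsigned int cp = *p;
        if (cp >= 0xD800 && cp <= 0xDBFF && p[1] >= 0xDC00 && p[1] <= 0xDFFF)
        {
            // High + low surrogate: combine into one code point above U+FFFF.
            cp = 0x10000 + ((cp - 0xD800) << 10) + (p[1] - 0xDC00u);
            ++p;
        }
        else if (cp >= 0xD800 && cp <= 0xDFFF)
        {
            cp = 0xFFFD; // assumption: an unpaired surrogate becomes U+FFFD
        }

        // Encode cp as 1 to 4 UTF-8 bytes.
        unsigned char seq[4];
        int n;
        if (cp < 0x80) {
            seq[0] = static_cast<unsigned char>(cp);
            n = 1;
        } else if (cp < 0x800) {
            seq[0] = static_cast<unsigned char>(0xC0 | (cp >> 6));
            seq[1] = static_cast<unsigned char>(0x80 | (cp & 0x3F));
            n = 2;
        } else if (cp < 0x10000) {
            seq[0] = static_cast<unsigned char>(0xE0 | (cp >> 12));
            seq[1] = static_cast<unsigned char>(0x80 | ((cp >> 6) & 0x3F));
            seq[2] = static_cast<unsigned char>(0x80 | (cp & 0x3F));
            n = 3;
        } else {
            seq[0] = static_cast<unsigned char>(0xF0 | (cp >> 18));
            seq[1] = static_cast<unsigned char>(0x80 | ((cp >> 12) & 0x3F));
            seq[2] = static_cast<unsigned char>(0x80 | ((cp >> 6) & 0x3F));
            seq[3] = static_cast<unsigned char>(0x80 | (cp & 0x3F));
            n = 4;
        }

        if (pszUTF8 != nullptr)
        {
            if (nOut + n > nUTF8Count)
                break; // sequence does not fit: stop without writing part of it
            for (int i = 0; i < n; ++i)
                pszUTF8[nOut + i] = static_cast<char>(seq[i]);
        }
        nOut += n;
    }
    if (pszUTF8 != nullptr && nOut < nUTF8Count)
        pszUTF8[nOut] = '\0';
    return nOut;
}

// Two-pass pattern: query the size with a null buffer, then convert into a
// buffer with one extra byte so the null-terminator fits.
std::string Utf16To8Alloc(const unsigned short* pwszUTF16)
{
    int nLen = Utf16To8Sketch(nullptr, pwszUTF16, 0);
    std::string strUTF8(static_cast<std::size_t>(nLen) + 1, '\0');
    int nWritten = Utf16To8Sketch(&strUTF8[0], pwszUTF16, nLen + 1);
    assert(nWritten == nLen); // both passes agree on the length
    (void)nWritten;           // avoid an unused-variable warning under NDEBUG
    strUTF8.resize(static_cast<std::size_t>(nLen)); // std::string tracks its own length
    return strUTF8;
}
```

Utf16To8Alloc shows the usual two-pass idiom, the same one used with wcstombs(NULL, src, 0): query the length with a NULL buffer, then convert into length + 1 bytes so the null-terminator fits.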
The following example converts the supplementary-plane code point U+64321 from UTF-16 to UTF-8, and then back to UTF-16. Code points above U+FFFF such as this one are rare; they require a surrogate pair in UTF-16 (see UTF-16 Files and the Byte Order Mark (BOM)) and 4 bytes in UTF-8. Note that the 5 passed into UTF16To8 leaves room for the null-terminator. The terminator is needed for the strcmp check, and because UTF8To16 is given nUTFLen+1 bytes, it also produces the null-terminator in the UTF-16 result.
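unsigned short szUTF16[3] = { 0xD950, 0xDF21, 0 }; // U+64321 as a surrogate pair
char szUTF8[5]; // 4 UTF-8 bytes + null-terminator
int nUTFLen = CMarkup::UTF16To8( szUTF8, szUTF16, 5 );
Check( nUTFLen == 4 ); // byte count excludes the null-terminator
Check( strcmp(szUTF8,"\xF1\xA4\x8C\xA1") == 0 );
unsigned short szUTF16Result[3];
// nUTFLen+1 includes the null byte, so the UTF-16 result is null-terminated too
nUTFLen = CMarkup::UTF8To16( szUTF16Result, szUTF8, nUTFLen+1 );
Check( szUTF16Result[0] == szUTF16[0] );
Check( szUTF16Result[1] == szUTF16[1] );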
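The constants in the example can be verified by hand. Decoding the surrogate pair 0xD950, 0xDF21 gives the code point, and the standard 4-byte UTF-8 bit layout then yields exactly the bytes compared with strcmp:

```latex
\begin{aligned}
\text{cp} &= \mathtt{0x10000} + (\mathtt{0xD950} - \mathtt{0xD800}) \times \mathtt{0x400} + (\mathtt{0xDF21} - \mathtt{0xDC00}) \\
          &= \mathtt{0x10000} + \mathtt{0x54000} + \mathtt{0x321} = \mathtt{0x64321} \\[4pt]
b_0 &= \mathtt{0xF0} \mid (\text{cp} \gg 18) = \mathtt{0xF1} \\
b_1 &= \mathtt{0x80} \mid ((\text{cp} \gg 12) \mathbin{\&} \mathtt{0x3F}) = \mathtt{0xA4} \\
b_2 &= \mathtt{0x80} \mid ((\text{cp} \gg 6) \mathbin{\&} \mathtt{0x3F}) = \mathtt{0x8C} \\
b_3 &= \mathtt{0x80} \mid (\text{cp} \mathbin{\&} \mathtt{0x3F}) = \mathtt{0xA1}
\end{aligned}
```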
UTF16To8 and UTF8To16 have no dependencies and can be used in place of the Win32 APIs WideCharToMultiByte and MultiByteToWideChar, which do not support UTF-8 on Windows 9x, NT 3.5 and some versions of Windows CE, and are not available on other platforms.
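The reverse conversion that UTF8To16 performs in the example can be sketched in the same dependency-free style. Utf8To16Sketch is a hypothetical name and this is not CMarkup's code. Based on the nUTFLen+1 argument in the example, the sketch assumes the third parameter counts UTF-8 bytes to read, so including the terminator byte also produces a terminator in the UTF-16 output. Like the call in the example, it takes no output size, so the caller must supply enough room; one UTF-16 unit per UTF-8 byte is always sufficient. Malformed UTF-8 is not validated.

```cpp
#include <cassert>

// Hypothetical sketch of the UTF8To16 direction (not the CMarkup source).
// Reads exactly nUTF8Count bytes of UTF-8; pass the length + 1 to convert the
// null-terminator as well. Returns the number of UTF-16 units produced
// (including any terminator read). If pwszUTF16 is nullptr, it only counts.
// Assumption: the input is well-formed UTF-8; no validation is done.
int Utf8To16Sketch(unsigned short* pwszUTF16, const char* pszUTF8, int nUTF8Count)
{
    assert(pszUTF8 != nullptr);
    const unsigned char* p = reinterpret_cast<const unsigned char*>(pszUTF8);
    int i = 0;
    int nOut = 0;
    while (i < nUTF8Count)
    {
        // The lead byte gives the sequence length and the top payload bits.
        unsigned int cp;
        int n;
        if (p[i] < 0x80)                { cp = p[i];        n = 1; }
        else if ((p[i] & 0xE0) == 0xC0) { cp = p[i] & 0x1F; n = 2; }
        else if ((p[i] & 0xF0) == 0xE0) { cp = p[i] & 0x0F; n = 3; }
        else                            { cp = p[i] & 0x07; n = 4; }
        if (i + n > nUTF8Count)
            break; // truncated sequence: stop
        // Each continuation byte contributes 6 more bits.
        for (int k = 1; k < n; ++k)
            cp = (cp << 6) | (p[i + k] & 0x3Fu);
        i += n;

        if (cp >= 0x10000)
        {
            // Above U+FFFF: emit a high/low surrogate pair.
            cp -= 0x10000;
            if (pwszUTF16 != nullptr)
            {
                pwszUTF16[nOut] = static_cast<unsigned short>(0xD800 + (cp >> 10));
                pwszUTF16[nOut + 1] = static_cast<unsigned short>(0xDC00 + (cp & 0x3FF));
            }
            nOut += 2;
        }
        else
        {
            if (pwszUTF16 != nullptr)
                pwszUTF16[nOut] = static_cast<unsigned short>(cp);
            nOut += 1;
        }
    }
    return nOut;
}
```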