static int CMarkup::UTF8To16( unsigned short* pwszUTF16, const char* pszUTF8, int nUTF8Count );
UTF8To16 converts the UTF-8 string in pszUTF8 to UTF-16 in the pwszUTF16 string buffer. It takes the same arguments as the ANSI C mbstowcs function, but it converts from UTF-8 instead of from the locale charset.
Update December 17, 2008: With CMarkup release 10.1, the UTF-16 string type in the UTF8To16 and UTF16To8 functions changed from wchar_t* to unsigned short*, since wchar_t is 32 bits wide (UTF-32) on Linux and OS X.
The pszUTF8 source must be a UTF-8 string; it is processed up to its null terminator or until nUTF8Count bytes have been read, whichever comes first. nUTF8Count is the maximum number of UTF-8 bytes to convert and should include the null terminator if a null terminator is desired in the result. If pwszUTF16 is NULL, nothing is written and the number of UTF-16 units required (i.e. the UTF-16 length) is returned. If pwszUTF16 is not NULL, it is filled with the result string and it must be large enough, because the function has no parameter for the size of the result buffer. The result is null-terminated only if a null terminator is encountered in pszUTF8 before nUTF8Count bytes have been read. When pwszUTF16 is not NULL, the number of UTF-8 bytes converted is returned rather than the UTF-16 size.
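The two return modes described above can be sketched with a stand-in function. Utf8To16Sketch and UsageSketch are hypothetical names used only for illustration; this is not CMarkup's implementation. The sketch assumes well-formed UTF-8 input, and it assumes that the null terminator is not counted in either return value.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in that mirrors the contract described above.
// NOT CMarkup's implementation. Assumes well-formed UTF-8 input.
int Utf8To16Sketch(unsigned short* pwszUTF16, const char* pszUTF8, int nUTF8Count)
{
    const unsigned char* p = reinterpret_cast<const unsigned char*>(pszUTF8);
    int nIn = 0;   // UTF-8 bytes consumed so far
    int nOut = 0;  // UTF-16 units produced so far
    while (nIn < nUTF8Count)
    {
        unsigned long cp = p[nIn];
        if (cp == 0)
        {
            // Terminator found before nUTF8Count: terminate the result and stop.
            // Assumption: the terminator is not counted in either return value.
            if (pwszUTF16)
                pwszUTF16[nOut] = 0;
            break;
        }
        // Work out the sequence length from the lead byte.
        int nSeq = 1;
        if (cp >= 0xF0)      { cp &= 0x07; nSeq = 4; }
        else if (cp >= 0xE0) { cp &= 0x0F; nSeq = 3; }
        else if (cp >= 0xC0) { cp &= 0x1F; nSeq = 2; }
        if (nIn + nSeq > nUTF8Count)
            break; // never read past the byte limit
        for (int i = 1; i < nSeq; ++i)
            cp = (cp << 6) | (p[nIn + i] & 0x3F);
        nIn += nSeq;
        if (cp >= 0x10000)
        {
            // Code points above the BMP need a surrogate pair (two units).
            cp -= 0x10000;
            if (pwszUTF16)
            {
                pwszUTF16[nOut]     = static_cast<unsigned short>(0xD800 | (cp >> 10));
                pwszUTF16[nOut + 1] = static_cast<unsigned short>(0xDC00 | (cp & 0x3FF));
            }
            nOut += 2;
        }
        else
        {
            if (pwszUTF16)
                pwszUTF16[nOut] = static_cast<unsigned short>(cp);
            ++nOut;
        }
    }
    // NULL buffer: UTF-16 units required. Otherwise: UTF-8 bytes converted.
    return pwszUTF16 ? nIn : nOut;
}

void UsageSketch()
{
    const char* pszSrc = "z";
    int nUnits = Utf8To16Sketch(NULL, pszSrc, 1);     // length query
    unsigned short wsz[2];
    int nBytes = Utf8To16Sketch(wsz, pszSrc, 1 + 1);  // +1 so the terminator is copied
    assert(nUnits == 1 && nBytes == 1);
    assert(wsz[0] == 0x7A && wsz[1] == 0);
}
```

Note that the same call returns a count in different units depending on whether a buffer is passed, so the two values should not be mixed up when sizing buffers.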
The following example converts the letter z from UTF-16 to UTF-8, and then back to UTF-16. In the UTF16To8 call, we pass L"\x007A", which is a way of expressing the UTF-16 character z. In the UTF8To16 call, we pass the wszUTF16 buffer to receive the result, "z", and specify the length of the UTF-8 source + 1 to include the null terminator.
    // Note: the wide literals and wcscmp assume a compiler where
    // wchar_t is the same type as unsigned short
    char szUTF8[5];
    unsigned short wszUTF16[3];
    int nUTFLen;
    nUTFLen = CMarkup::UTF16To8(szUTF8,L"\x007A",5); // z
    Check( strcmp(szUTF8,"z") == 0 );
    nUTFLen = CMarkup::UTF8To16(wszUTF16,szUTF8,nUTFLen+1);
    Check( wcscmp(wszUTF16,L"z") == 0 );
Here is an example of the common two-call technique: pass a NULL result buffer so that the function returns the required result length, then allocate the result buffer and call the function again.
    const char* pszTest = "hello";
    unsigned short* pwszBuffer;
    int nLen = (int)strlen( pszTest );
    // First call: NULL buffer returns the UTF-16 length required
    int nUTF16Len = CMarkup::UTF8To16(NULL,pszTest,nLen);
    pwszBuffer = new unsigned short[nUTF16Len+1];
    // Second call: nLen+1 so the null terminator is converted too
    CMarkup::UTF8To16(pwszBuffer,pszTest,nLen+1);
    // Same technique in the other direction
    nLen = CMarkup::UTF16To8(NULL,pwszBuffer,0);
    CString csTest;
    CMarkup::UTF16To8(csTest.GetBuffer(nLen),pwszBuffer,nLen);
    csTest.ReleaseBuffer(nLen);
    delete [] pwszBuffer;
    Check( strcmp(csTest,pszTest) == 0 );
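The two-call technique can also be wrapped in a small helper that owns the buffer, so no delete [] is needed. The following is a minimal sketch, not part of CMarkup: ToUTF16Vector, AsciiOnlyStub and TwoPassUsageSketch are hypothetical names. AsciiOnlyStub is only a placeholder with the same signature as UTF8To16 so that the sketch compiles on its own; it handles ASCII input only.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Placeholder converter with the same signature as UTF8To16 (ASCII only).
// Hypothetical: it exists only so this sketch compiles without CMarkup.
int AsciiOnlyStub(unsigned short* pwszUTF16, const char* pszUTF8, int nUTF8Count)
{
    int n = 0;
    while (n < nUTF8Count && pszUTF8[n] != '\0')
    {
        if (pwszUTF16)
            pwszUTF16[n] = static_cast<unsigned char>(pszUTF8[n]);
        ++n;
    }
    if (pwszUTF16 && n < nUTF8Count)
        pwszUTF16[n] = 0; // terminator reached before the byte limit
    return n; // for ASCII, bytes converted equals units required
}

// Hypothetical helper: runs the length query, allocates, then converts.
template <typename Converter>
std::vector<unsigned short> ToUTF16Vector(Converter convert, const char* pszUTF8)
{
    int nLen = static_cast<int>(std::strlen(pszUTF8));
    // First call: NULL buffer, returns the number of UTF-16 units required
    int nUTF16Len = convert(static_cast<unsigned short*>(0), pszUTF8, nLen);
    // One extra unit for the null terminator; the vector zero-fills itself
    std::vector<unsigned short> buf(static_cast<std::size_t>(nUTF16Len) + 1);
    // Second call: nLen + 1 so the terminator is converted as well
    convert(&buf[0], pszUTF8, nLen + 1);
    return buf;
}

void TwoPassUsageSketch()
{
    std::vector<unsigned short> wsz = ToUTF16Vector(&AsciiOnlyStub, "hello");
    assert(wsz.size() == 6); // 5 units plus terminator
    assert(wsz[0] == 'h' && wsz[5] == 0);
}
```

Since UTF8To16 is declared static, passing &CMarkup::UTF8To16 in place of the stub should fit the same template parameter. The vector releases its memory when it goes out of scope, including on early return.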
UTF8To16 and UTF16To8 have no dependencies and can be used in place of the MultiByteToWideChar and WideCharToMultiByte Win32 APIs, which do not support UTF-8 on Windows 9X, NT 3.5 and versions of CE, and are not available on other platforms.