To convert text that is neither Unicode nor ASCII, CMarkup can use the Windows APIs, or iconv on Linux and OS X. If you don't need this, just define MARKUP_STDCONV.
It has taken a long time to develop the simplest design for this cross-platform issue in CMarkup, but I think I've gotten closer.
Update December 17, 2008: As of release 10.1, CMarkup has 3 compile-time modes for character set conversions and multibyte functions:
MARKUP_WINCONV (Windows API mode)
MARKUP_ICONV (iconv API mode)
MARKUP_STDCONV (standard C mode), which turns the other modes off
In VC++ or g++ you can add MARKUP_STDCONV to your preprocessor definitions to force standard C mode.
Windows API mode MARKUP_WINCONV is implemented using WideCharToMultiByte and MultiByteToWideChar, where CP_ACP represents the Windows system locale code page (the same as GetACP). In MBCS builds, MARKUP_WINCONV mode uses _mbclen to determine character length. _mbclen is a Visual C++ function that uses the same Windows system locale code page as CP_ACP.
The MARKUP_WINCONV mode is selected automatically in VC++; otherwise, if you are on Windows, you should add MARKUP_WINCONV to your project preprocessor definitions. For example, compile the CMarkup test program with g++ in Cygwin as follows:
g++ main.cpp Markup.cpp MarkupTest.cpp -DMARKUP_WINCONV
The iconv API mode MARKUP_ICONV on Linux and OS X sometimes requires an extra step to link iconv into your program. On OS X with g++ I had to specify -liconv on the command line. I compiled the test program as follows:
g++ main.cpp Markup.cpp MarkupTest.cpp -liconv
On some systems you may need to choose between libiconv and iconv. CMarkup uses only the most basic functionality of the iconv API to try to avoid inconsistencies between implementations.
The MARKUP_ICONV mode is currently selected automatically in g++ based on the __GNUC__ predefined macro. Again, you can turn off iconv usage by adding MARKUP_STDCONV to your project preprocessor definitions, or on the command line with -DMARKUP_STDCONV, as follows:
g++ main.cpp Markup.cpp MarkupTest.cpp -DMARKUP_STDCONV
With MARKUP_STDCONV you exclude iconv. It is usually fine to do without a full conversion API: as explained below, in standard C mode you can use setlocale even when you have Far Eastern or other ANSI files that are not in the system locale charset. But if you need MARKUP_ICONV mode in your program, you might have to download and install libiconv before compiling.
Standard C mode MARKUP_STDCONV supports ANSI conversion to and from Unicode if you call setlocale to initialize your character set. On Windows, CMarkup is sensitive to the setlocale charset if and only if it is in standard C (MARKUP_STDCONV) mode.
You can use setlocale to select a charset other than the system locale code page, and the UTF8ToA and AToUTF8 functions will then work for that charset.
Note that because it is a process-wide setting, setlocale has potential disadvantages for your program as a whole. If another part of your program depends upon or uses setlocale, there could be unintended conflicts. In a multi-threaded program there are additional implications, since a setlocale call can race with other threads that are using locale-dependent functions. You have to decide whether setlocale is appropriate in your case. The firstobject XML editor used setlocale successfully until release 2.3.1. At the time of writing (12/2008), the editor is being switched over to CMarkup 10.1 and the Windows API conversion mode.
So if you are using standard C mode and converting a file or string to or from a single-byte or double-byte encoding, or if you are using an MBCS build, you must call setlocale. If you are using just the system locale charset, call setlocale in your program initialization to prime the C multibyte functions for the system/user locale charset:
#include <locale.h>
setlocale(LC_ALL, "");
Note that on Windows, setlocale defaults to the *user* locale charset, which is usually but not always the same as the *system* locale charset (Regional Settings for non-Unicode programs). To make sure you are setting it to the system locale code page on Windows, you can do something like this:
char szACP[10];
sprintf( szACP, ".%u", GetACP() ); // e.g. ".1252"; GetACP returns a UINT, hence %u
setlocale( LC_ALL, szACP );
Standard C conversions use mbtowc and wctomb, and in MBCS builds CMarkup uses mblen to determine character length. All of these standard C functions use the code page specified with setlocale, which is not necessarily the same as CP_ACP.
See Also:
CMarkup 10.1 compiler problem
Eric 23-Dec-2008
My OS is HP-UX 11iv3, and my C++ compiler is gcc 4.2.3. I could compile CMarkup 10.0 with no errors. When I compiled my program with CMarkup 10.1, I got these error messages:
When I link with iconv like this:
it shows:
This compiles without error: