Here are the best (official and most commonly accepted) character set names (labels/identifiers) to use in your XML declaration encoding or HTML content-type charset, along with their aliases, Windows code pages, and descriptive titles.
Looking around the Internet, I could not find a comprehensive character set name reference, so I combined several online resources into this one. The primary reference for web technologies like XML is the IANA list of character sets, but it:
It might be reasonable to list duplicate and ambiguous aliases if someone is trying to interpret the intention of an obscure alias in their data, but here I chose to remove all duplicates. I also made some choices (which will remain controversial) as follows:
Also, where the Windows code page was available I have used it as a basis for associating aliases that identify the same charset. Hopefully this has not led to grouping any unequal charsets together.
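As a rough illustration of how a de-duplicated alias list like this one can be used, here is a minimal Python sketch that maps any alias to one name per charset. The `ALIAS_GROUPS` table and the `build_lookup` and `preferred_name` helpers are hypothetical names made up for this example; the rows are a small sample copied from the reference table, and the sketch assumes the first name in each row is the preferred one. Labels are compared case-insensitively, as IANA charset names are not case sensitive.

```python
# Hypothetical sample: a few rows from the reference table, keyed by the
# first name in each row (assumed here to be the preferred name).
ALIAS_GROUPS = {
    "Windows-1252": ["x-ansi", "CP1252", "MS-ANSI"],
    "iso-8859-1": ["cp819", "Latin1", "ibm819", "iso_8859-1", "l1", "csISOLatin1"],
    "utf-8": ["unicode-1-1-utf-8", "CP65001", "UTF8"],
    "shift_jis": ["csShiftJIS", "x-sjis", "sjis", "CP932", "MS932"],
}


def build_lookup(groups):
    """Build a lowercase alias -> preferred name dictionary."""
    lookup = {}
    for preferred, aliases in groups.items():
        for name in [preferred, *aliases]:
            key = name.lower()
            # An alias claimed by two different charsets would be ambiguous.
            if key in lookup and lookup[key] != preferred:
                raise ValueError(f"ambiguous alias: {name}")
            lookup[key] = preferred
    return lookup


LOOKUP = build_lookup(ALIAS_GROUPS)


def preferred_name(label):
    """Return the preferred name for a charset label, or None if unknown."""
    return LOOKUP.get(label.strip().lower())


print(preferred_name("LATIN1"))
print(preferred_name("ms-ansi"))
```

Because duplicate aliases were removed from the list, a flat dictionary like this is enough; a list that kept ambiguous aliases would need to return several candidates instead.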
| Descriptive Title | Windows Code Page |
|---|---|
| Charset names (preferred name in bold) | |
| Adobe-Standard-Encoding | |
| Adobe-Standard-Encoding, csAdobeStandardEncoding | |
| Adobe-Symbol-Encoding | |
| Adobe-Symbol-Encoding, csHPPSMath | |
| Amiga-1251 | |
| Amiga-1251, Ami1251, Amiga1251, Ami-1251 | |
| ANSI_X3.110-1983 | |
| ANSI_X3.110-1983, iso-ir-99, CSA_T500-1983, NAPLPS, csISO99NAPLPS | |
| Arabic (864) | 864 |
| IBM864, cp864, csIBM864 | |
| Arabic (ASMO 708) | 708 |
| ASMO-708 | |
| Arabic (DOS) | 720 |
| DOS-720 | |
| Arabic (ISO) | 28596 |
| iso-8859-6, arabic, csISOLatinArabic, ECMA-114, ISO_8859-6, ISO_8859-6:1987, iso-ir-127, iso8859-6 | |
| Arabic (Mac) | 10004 |
| x-mac-arabic | |
| Arabic (Windows) | 1256 |
| windows-1256, cp1256, MS-ARAB | |
| ASMO_449 | |
| ASMO_449, ISO_9036, arabic7, iso-ir-89, csISO89ASMO449 | |
| Baltic (DOS) | 775 |
| ibm775, cp775, csPC775Baltic | |
| Baltic (ISO) | 28594 |
| iso-8859-4, csISOLatin4, ISO_8859-4, ISO_8859-4:1988, iso-ir-110, l4, latin4, iso8859-4 | |
| Baltic (Windows) | 1257 |
| windows-1257, CP1257, WINBALTRIM | |
| BOCU-1 | |
| BOCU-1, csBOCU-1 | |
| BRF | |
| BRF, csBRF | |
| BS_4730 | |
| BS_4730, iso-ir-4, ISO646-GB, gb, uk, csISO4UnitedKingdom | |
| BS_viewdata | |
| BS_viewdata, iso-ir-47, csISO47BSViewdata | |
| Central European (DOS) | 852 |
| ibm852, cp852, 852, csPCp852 | |
| Central European (ISO) | 28592 |
| iso-8859-2, csISOLatin2, iso_8859-2, iso_8859-2:1987, iso8859-2, iso-ir-101, l2, latin2 | |
| Central European (Mac) | 10029 |
| x-mac-ce | |
| Central European (Windows) | 1250 |
| windows-1250, x-cp1250, CP1250, MS-EE | |
| CESU-8 | |
| CESU-8, csCESU-8 | |
| Chinese National Standard (GB18030) | 54936 |
| GB18030, ISO-4873:1986 | |
| Chinese Simplified (EUC) | 51936 |
| EUC-CN, x-euc-cn | |
| Chinese Simplified (GB2312) | 936 |
| gb2312, chinese, CN-GB, csGB2312, csGB231280, csISO58GB231280, GB_2312-80, GB231280, GB2312-80, GBK, iso-ir-58, CP936, MS936, windows-936 | |
| Chinese Simplified (GB2312-80) | 20936 |
| x-cp20936 | |
| Chinese Simplified (HZ) | 52936 |
| hz-gb-2312 | |
| Chinese Simplified (ISO 2022) | 50227 |
| x-cp50227 | |
| Chinese Simplified (Mac) | 10008 |
| x-mac-chinesesimp | |
| Chinese Traditional (Big5) | 950 |
| big5, cn-big5, csbig5, x-x-big5, CP950, Big5-HKSCS | |
| Chinese Traditional (CNS) | 20000 |
| x-Chinese-CNS, x-Chinese_CNS | |
| Chinese Traditional (Eten) | 20002 |
| x-Chinese-Eten, x_Chinese-Eten | |
| Chinese Traditional (Mac) | 10002 |
| x-mac-chinesetrad | |
| CP1125 | 1125 |
| CP1125 | |
| CP1133 | 1133 |
| CP1133, IBM-CP1133 | |
| CP853 | 853 |
| CP853 | |
| Croatian (Mac) | 10082 |
| x-mac-croatian | |
| CSA_Z243.4-1985-1 | |
| CSA_Z243.4-1985-1, iso-ir-121, ISO646-CA, csa7-1, ca, csISO121Canadian1 | |
| CSA_Z243.4-1985-2 | |
| CSA_Z243.4-1985-2, iso-ir-122, ISO646-CA2, csa7-2, csISO122Canadian2 | |
| CSA_Z243.4-1985-gr | |
| CSA_Z243.4-1985-gr, iso-ir-123, csISO123CSAZ24341985gr | |
| CSN_369103 | |
| CSN_369103, iso-ir-139, csISO139CSN369103 | |
| Cyrillic (DOS) | 866 |
| cp866, ibm866, 866, csIBM866 | |
| Cyrillic (ISO) | 28595 |
| iso-8859-5, csISOLatin5, csISOLatinCyrillic, cyrillic, ISO_8859-5, ISO_8859-5:1988, iso-ir-144, iso8859-5 | |
| Cyrillic (KOI8-R) | 20866 |
| koi8-r, csKOI8R, koi, koi8, koi8r | |
| Cyrillic (KOI8-U) | 21866 |
| koi8-u, koi8-ru | |
| Cyrillic (Mac) | 10007 |
| x-mac-cyrillic | |
| Cyrillic (Windows) | 1251 |
| windows-1251, x-cp1251, CP1251, MS-CYRL | |
| DEC-MCS | |
| DEC-MCS, dec, csDECMCS | |
| DIN_66003 | |
| DIN_66003, iso-ir-21, de, ISO646-DE, csISO21German | |
| dk-us | |
| dk-us, csDKUS | |
| DS_2089 | |
| DS_2089, DS2089, ISO646-DK, dk, csISO646Danish | |
| EBCDIC-AT-DE | |
| EBCDIC-AT-DE, csIBMEBCDICATDE | |
| EBCDIC-AT-DE-A | |
| EBCDIC-AT-DE-A, csEBCDICATDEA | |
| EBCDIC-CA-FR | |
| EBCDIC-CA-FR, csEBCDICCAFR | |
| EBCDIC-DK-NO | |
| EBCDIC-DK-NO, csEBCDICDKNO | |
| EBCDIC-DK-NO-A | |
| EBCDIC-DK-NO-A, csEBCDICDKNOA | |
| EBCDIC-ES | |
| EBCDIC-ES, csEBCDICES | |
| EBCDIC-ES-A | |
| EBCDIC-ES-A, csEBCDICESA | |
| EBCDIC-ES-S | |
| EBCDIC-ES-S, csEBCDICESS | |
| EBCDIC-FI-SE | |
| EBCDIC-FI-SE, csEBCDICFISE | |
| EBCDIC-FI-SE-A | |
| EBCDIC-FI-SE-A, csEBCDICFISEA | |
| EBCDIC-FR | |
| EBCDIC-FR, csEBCDICFR | |
| EBCDIC-IT | |
| EBCDIC-IT, csEBCDICIT | |
| EBCDIC-PT | |
| EBCDIC-PT, csEBCDICPT | |
| EBCDIC-UK | |
| EBCDIC-UK, csEBCDICUK | |
| EBCDIC-US | |
| EBCDIC-US, csEBCDICUS | |
| ECMA-cyrillic | |
| ECMA-cyrillic, iso-ir-111, KOI8-E, csISO111ECMACyrillic | |
| ES | |
| ES, iso-ir-17, ISO646-ES, csISO17Spanish | |
| ES2 | |
| ES2, iso-ir-85, ISO646-ES2, csISO85Spanish2 | |
| Europa | 29001 |
| x-Europa | |
| Extended_UNIX_Code_Fixed_Width_for_Japanese | |
| Extended_UNIX_Code_Fixed_Width_for_Japanese, csEUCFixWidJapanese | |
| French Canadian (DOS) | 863 |
| IBM863, cp863, 863, csIBM863 | |
| GB_1988-80 | |
| GB_1988-80, iso-ir-57, cn, ISO646-CN, csISO57GB1988 | |
| German (IA5) | 20106 |
| x-IA5-German | |
| GOST_19768-74 | |
| GOST_19768-74, ST_SEV_358-88, iso-ir-153, csISO153GOST1976874 | |
| Greek (DOS) | 737 |
| ibm737, CP737 | |
| Greek (ISO) | 28597 |
| iso-8859-7, csISOLatinGreek, ECMA-118, ELOT_928, greek, greek8, ISO_8859-7, ISO_8859-7:1987, iso-ir-126, iso8859-7 | |
| Greek (Mac) | 10006 |
| x-mac-greek | |
| Greek (Windows) | 1253 |
| windows-1253, CP1253, MS-GREEK | |
| Greek, Modern (DOS) | 869 |
| ibm869, cp869, 869, cp-gr, csIBM869 | |
| greek-ccitt | |
| greek-ccitt, iso-ir-150, csISO150, csISO150GreekCCITT | |
| greek7 | |
| greek7, iso-ir-88, csISO88Greek7 | |
| greek7-old | |
| greek7-old, iso-ir-18, csISO18Greek7Old | |
| Hebrew (DOS) | 862 |
| DOS-862, IBM862, cp862, 862, csPC862LatinHebrew | |
| Hebrew (ISO-Logical) | 38598 |
| iso-8859-8-i, logical, iso8859-8-i | |
| Hebrew (ISO-Visual) | 28598 |
| iso-8859-8, csISOLatinHebrew, hebrew, ISO_8859-8, ISO_8859-8:1988, iso-ir-138, visual, iso8859-8 | |
| Hebrew (Mac) | 10005 |
| x-mac-hebrew | |
| Hebrew (Windows) | 1255 |
| windows-1255, CP1255, MS-HEBR | |
| HP-DeskTop | |
| HP-DeskTop, csHPDesktop | |
| HP-Legal | |
| HP-Legal, csHPLegal | |
| HP-Math8 | |
| HP-Math8, csHPMath8 | |
| HP-Pi-font | |
| HP-Pi-font, csHPPiFont | |
| hp-roman8 | |
| hp-roman8, roman8, r8, csHPRoman8 | |
| IBM EBCDIC (Arabic) | 420 |
| x-EBCDIC-Arabic | |
| IBM EBCDIC (Cyrillic Serbian-Bulgarian) | 21025 |
| x-EBCDIC-CyrillicSerbianBulgarian, cp1025 | |
| IBM EBCDIC (Denmark-Norway-Euro) | 1142 |
| x-ebcdic-denmarknorway-euro, IBM01142, CCSID01142, CP01142, ebcdic-dk-277+euro, ebcdic-no-277+euro | |
| IBM EBCDIC (Finland-Sweden-Euro) | 1143 |
| x-ebcdic-finlandsweden-euro, IBM01143, CCSID01143, CP01143, ebcdic-fi-278+euro, ebcdic-se-278+euro | |
| IBM EBCDIC (France-Euro) | 1147 |
| x-ebcdic-france-euro, IBM01147, CCSID01147, CP01147, ebcdic-fr-297+euro | |
| IBM EBCDIC (Germany-Euro) | 1141 |
| x-ebcdic-germany-euro, IBM01141, CCSID01141, CP01141, ebcdic-de-273+euro | |
| IBM EBCDIC (Greek Modern) | 875 |
| x-EBCDIC-GreekModern, cp875 | |
| IBM EBCDIC (Icelandic-Euro) | 1149 |
| x-ebcdic-icelandic-euro, IBM01149, CCSID01149, CP01149, ebcdic-is-871+euro | |
| IBM EBCDIC (International-Euro) | 1148 |
| x-ebcdic-international-euro, IBM01148, CCSID01148, CP01148, ebcdic-international-500+euro | |
| IBM EBCDIC (Italy-Euro) | 1144 |
| x-ebcdic-italy-euro, IBM01144, CCSID01144, CP01144, ebcdic-it-280+euro | |
| IBM EBCDIC (Japanese and Japanese Katakana) | 50930 |
| x-EBCDIC-JapaneseAndKana | |
| IBM EBCDIC (Japanese and Japanese-Latin) | 50939 |
| x-EBCDIC-JapaneseAndJapaneseLatin | |
| IBM EBCDIC (Japanese and US-Canada) | 50931 |
| x-EBCDIC-JapaneseAndUSCanada | |
| IBM EBCDIC (Korean and Korean Extended) | 50933 |
| x-EBCDIC-KoreanAndKoreanExtended | |
| IBM EBCDIC (Korean Extended) | 20833 |
| x-EBCDIC-KoreanExtended | |
| IBM EBCDIC (Multilingual Latin-2) | 870 |
| CP870, ebcdic-cp-roece, ebcdic-cp-yu, csIBM870, IBM870 | |
| IBM EBCDIC (Simplified Chinese) | 50935 |
| x-EBCDIC-SimplifiedChinese | |
| IBM EBCDIC (Spain-Euro) | 1145 |
| x-ebcdic-spain-euro, IBM01145, CCSID01145, CP01145, ebcdic-es-284+euro | |
| IBM EBCDIC (Traditional Chinese) | 50937 |
| x-EBCDIC-TraditionalChinese | |
| IBM EBCDIC (Turkish Latin-5) | 1026 |
| CP1026, csIBM1026, IBM1026 | |
| IBM EBCDIC (UK-Euro) | 1146 |
| x-ebcdic-uk-euro, IBM01146, CCSID01146, CP01146, ebcdic-gb-285+euro | |
| IBM EBCDIC (US-Canada) | 37 |
| ebcdic-cp-us, ebcdic-cp-ca, ebcdic-cp-wt, ebcdic-cp-nl, csIBM037, IBM037, cp037 | |
| IBM EBCDIC (US-Canada-Euro) | 1140 |
| x-ebcdic-cp-us-euro, IBM01140, CCSID01140, CP01140, ebcdic-us-37+euro | |
| IBM EBCDIC Arabic | 20420 |
| IBM420, cp420, ebcdic-cp-ar1, csIBM420 | |
| IBM EBCDIC Cyrillic Russian | 20880 |
| x-EBCDIC-CyrillicRussian, IBM880, cp880, EBCDIC-Cyrillic, csIBM880 | |
| IBM EBCDIC Denmark-Norway | 20277 |
| x-EBCDIC-DenmarkNorway, IBM277, EBCDIC-CP-DK, EBCDIC-CP-NO, csIBM277 | |
| IBM EBCDIC Finland-Sweden | 20278 |
| x-EBCDIC-FinlandSweden, IBM278, CP278, ebcdic-cp-fi, ebcdic-cp-se, csIBM278 | |
| IBM EBCDIC France | 20297 |
| x-EBCDIC-France, IBM297, cp297, ebcdic-cp-fr, csIBM297 | |
| IBM EBCDIC Germany | 20273 |
| x-EBCDIC-Germany, IBM273, CP273, csIBM273 | |
| IBM EBCDIC Greek | 20423 |
| x-EBCDIC-Greek, IBM423, cp423, ebcdic-cp-gr, csIBM423 | |
| IBM EBCDIC Hebrew | 20424 |
| x-EBCDIC-Hebrew, IBM424, cp424, ebcdic-cp-he, csIBM424 | |
| IBM EBCDIC Icelandic | 20871 |
| x-EBCDIC-Icelandic, IBM871, CP871, ebcdic-cp-is, csIBM871 | |
| IBM EBCDIC International | 500 |
| IBM500, CP500, ebcdic-cp-be, ebcdic-cp-ch, csIBM500 | |
| IBM EBCDIC Italy | 20280 |
| x-EBCDIC-Italy, IBM280, CP280, ebcdic-cp-it, csIBM280 | |
| IBM EBCDIC Japanese Katakana Extended | 20290 |
| x-EBCDIC-JapaneseKatakana, IBM290, cp290, EBCDIC-JP-kana, csIBM290 | |
| IBM EBCDIC Latin 1/Open System | 1047 |
| IBM01047 | |
| IBM EBCDIC Latin 1/Open System (1047 + Euro symbol) | 20924 |
| IBM00924, CCSID00924, CP00924, ebcdic-Latin9--euro | |
| IBM EBCDIC Latin America-Spain | 20284 |
| X-EBCDIC-Spain, IBM284, CP284, ebcdic-cp-es, csIBM284 | |
| IBM EBCDIC Thai | 20838 |
| x-EBCDIC-Thai, IBM-Thai, csIBMThai | |
| IBM EBCDIC Turkish | 20905 |
| x-EBCDIC-Turkish, IBM905, CP905, ebcdic-cp-tr, csIBM905 | |
| IBM EBCDIC United Kingdom | 20285 |
| x-EBCDIC-UK, IBM285, CP285, ebcdic-cp-gb, csIBM285 | |
| IBM-Symbols | |
| IBM-Symbols, csIBMSymbols | |
| IBM038 | |
| IBM038, EBCDIC-INT, cp038, csIBM038 | |
| IBM1047 | |
| IBM1047, IBM-1047 | |
| IBM274 | |
| IBM274, EBCDIC-BE, CP274, csIBM274 | |
| IBM275 | |
| IBM275, EBCDIC-BR, cp275, csIBM275 | |
| IBM281 | |
| IBM281, EBCDIC-JP-E, cp281, csIBM281 | |
| IBM5550 Taiwan | 20003 |
| x-cp20003 | |
| IBM851 | |
| IBM851, cp851, 851, csIBM851 | |
| IBM868 | |
| IBM868, CP868, cp-ar, csIBM868 | |
| IBM891 | |
| IBM891, cp891, csIBM891 | |
| IBM903 | |
| IBM903, cp903, csIBM903 | |
| IBM904 | |
| IBM904, cp904, 904, csIBBM904 | |
| IBM918 | |
| IBM918, CP918, ebcdic-cp-ar2, csIBM918 | |
| Icelandic (DOS) | 861 |
| ibm861, cp861, 861, cp-is, csIBM861 | |
| Icelandic (Mac) | 10079 |
| x-mac-icelandic | |
| IEC_P27-1 | |
| IEC_P27-1, iso-ir-143, csISO143IECP271 | |
| INIS | |
| INIS, iso-ir-49, csISO49INIS | |
| INIS-8 | |
| INIS-8, iso-ir-50, csISO50INIS8 | |
| INIS-cyrillic | |
| INIS-cyrillic, iso-ir-51, csISO51INISCyrillic | |
| INVARIANT | |
| INVARIANT, csINVARIANT | |
| ISCII Assamese | 57006 |
| x-iscii-as | |
| ISCII Bengali | 57003 |
| x-iscii-be | |
| ISCII Devanagari | 57002 |
| x-iscii-de | |
| ISCII Gujarathi | 57010 |
| x-iscii-gu | |
| ISCII Kannada | 57008 |
| x-iscii-ka | |
| ISCII Malayalam | 57009 |
| x-iscii-ma | |
| ISCII Oriya | 57007 |
| x-iscii-or | |
| ISCII Punjabi | 57011 |
| x-iscii-pa | |
| ISCII Tamil | 57004 |
| x-iscii-ta | |
| ISCII Telugu | 57005 |
| x-iscii-te | |
| ISO 6937 Non-Spacing Accent | 20269 |
| x-cp20269 | |
| ISO 8859-13 Estonian | 28603 |
| ISO-8859-13, iso8859-13 | |
| ISO-10646-J-1 | |
| ISO-10646-J-1 | |
| ISO-10646-UCS-2 | |
| ISO-10646-UCS-2, csUnicode | |
| ISO-10646-UCS-4 | |
| ISO-10646-UCS-4, csUCS4 | |
| ISO-10646-UCS-Basic | |
| ISO-10646-UCS-Basic, csUnicodeASCII | |
| ISO-10646-Unicode-Latin1 | |
| ISO-10646-Unicode-Latin1, csUnicodeLatin1, ISO-10646 | |
| ISO-10646-UTF-1 | |
| ISO-10646-UTF-1, csISO10646UTF1 | |
| ISO-11548-1 | |
| ISO-11548-1, ISO_11548-1, ISO_TR_11548-1, csISO115481 | |
| ISO-2022-CN | |
| ISO-2022-CN | |
| ISO-2022-CN-EXT | |
| ISO-2022-CN-EXT | |
| ISO-2022-JP-2 | |
| ISO-2022-JP-2, csISO2022JP2 | |
| ISO-8859-1-Windows-3.0-Latin-1 | |
| ISO-8859-1-Windows-3.0-Latin-1, csWindows30Latin1 | |
| ISO-8859-1-Windows-3.1-Latin-1 | |
| ISO-8859-1-Windows-3.1-Latin-1, csWindows31Latin1 | |
| ISO-8859-10 | |
| ISO-8859-10, iso-ir-157, l6, ISO_8859-10:1992, csISOLatin6, latin6 | |
| ISO-8859-14 | |
| ISO-8859-14, iso-ir-199, ISO_8859-14:1998, ISO_8859-14, latin8, iso-celtic, l8 | |
| ISO-8859-16 | |
| ISO-8859-16, iso-ir-226, ISO_8859-16:2001, ISO_8859-16, latin10, l10 | |
| ISO-8859-2-Windows-Latin-2 | |
| ISO-8859-2-Windows-Latin-2, csWindows31Latin2 | |
| ISO-8859-6-E | |
| ISO-8859-6-E, ISO_8859-6-E, csISO88596E | |
| ISO-8859-6-I | |
| ISO-8859-6-I, ISO_8859-6-I, csISO88596I | |
| ISO-8859-8-E | |
| ISO-8859-8-E, ISO_8859-8-E, csISO88598E | |
| ISO-8859-9-Windows-Latin-5 | |
| ISO-8859-9-Windows-Latin-5, csWindows31Latin5 | |
| iso-ir-90 | |
| iso-ir-90, csISO90 | |
| ISO-Unicode-IBM-1261 | |
| ISO-Unicode-IBM-1261, csUnicodeIBM1261 | |
| ISO-Unicode-IBM-1264 | |
| ISO-Unicode-IBM-1264, csUnicodeIBM1264 | |
| ISO-Unicode-IBM-1265 | |
| ISO-Unicode-IBM-1265, csUnicodeIBM1265 | |
| ISO-Unicode-IBM-1268 | |
| ISO-Unicode-IBM-1268, csUnicodeIBM1268 | |
| ISO-Unicode-IBM-1276 | |
| ISO-Unicode-IBM-1276, csUnicodeIBM1276 | |
| ISO_10367-box | |
| ISO_10367-box, iso-ir-155, csISO10367Box | |
| ISO_2033-1983 | |
| ISO_2033-1983, iso-ir-98, e13b, csISO2033 | |
| ISO_5427 | |
| ISO_5427, iso-ir-37, csISO5427Cyrillic | |
| ISO_5427:1981 | |
| ISO_5427:1981, iso-ir-54, ISO5427Cyrillic1981 | |
| ISO_5428:1980 | |
| ISO_5428:1980, iso-ir-55, csISO5428Greek | |
| ISO_646.basic:1983 | |
| ISO_646.basic:1983, ref, csISO646basic1983 | |
| ISO_646.irv:1983 | |
| ISO_646.irv:1983, iso-ir-2, irv, csISO2IntlRefVersion | |
| ISO_6937-2-25 | |
| ISO_6937-2-25, iso-ir-152, csISO6937Add | |
| ISO_6937-2-add | |
| ISO_6937-2-add, iso-ir-142, csISOTextComm | |
| ISO_8859-8-I | |
| ISO_8859-8-I, csISO88598I | |
| ISO_8859-supp | |
| ISO_8859-supp, iso-ir-154, latin1-2-5, csISO8859Supp | |
| IT | |
| IT, iso-ir-15, ISO646-IT, csISO15Italian | |
| Japanese (EUC) | 51932 |
| x-euc, x-euc-jp, CP51932, MS51932, WINDOWS-51932 | |
| Japanese (JIS 0208-1990 and 0212-1990) | 20932 |
| EUC-JP, Extended_UNIX_Code_Packed_Format_for_Japanese, csEUCPkdFmtJapanese | |
| Japanese (JIS) | 50220 |
| iso-2022-jp | |
| Japanese (JIS-Allow 1 byte Kana - SO/SI) | 50222 |
| _iso-2022-jp$SIO | |
| Japanese (JIS-Allow 1 byte Kana) | 50221 |
| csISO2022JP, _iso-2022-jp, CP50221, ISO-2022-JP-MS, ISO2022-JP-MS, MS50221, WINDOWS-50221 | |
| Japanese (Mac) | 10001 |
| x-mac-japanese | |
| Japanese (Shift-JIS) | 932 |
| shift_jis, csShiftJIS, csWindows31J, ms_Kanji, shift-jis, x-ms-cp932, x-sjis, sjis, CP932, MS932, SHIFFT_JIS, SHIFFT_JIS-MS, SJIS-MS, SJIS-OPEN, SJIS-WIN, WINDOWS-932, Windows-31J | |
| JIS_C6220-1969-jp | |
| JIS_C6220-1969-jp, JIS_C6220-1969, iso-ir-13, katakana, x0201-7, csISO13JISC6220jp | |
| JIS_C6220-1969-ro | |
| JIS_C6220-1969-ro, iso-ir-14, jp, ISO646-JP, csISO14JISC6220ro | |
| JIS_C6226-1978 | |
| JIS_C6226-1978, iso-ir-42, csISO42JISC62261978 | |
| JIS_C6226-1983 | |
| JIS_C6226-1983, iso-ir-87, x0208, JIS_X0208-1983, csISO87JISX0208 | |
| JIS_C6229-1984-a | |
| JIS_C6229-1984-a, iso-ir-91, jp-ocr-a, csISO91JISC62291984a | |
| JIS_C6229-1984-b | |
| JIS_C6229-1984-b, iso-ir-92, ISO646-JP-OCR-B, jp-ocr-b, csISO92JISC62991984b | |
| JIS_C6229-1984-b-add | |
| JIS_C6229-1984-b-add, iso-ir-93, jp-ocr-b-add, csISO93JIS62291984badd | |
| JIS_C6229-1984-hand | |
| JIS_C6229-1984-hand, iso-ir-94, jp-ocr-hand, csISO94JIS62291984hand | |
| JIS_C6229-1984-hand-add | |
| JIS_C6229-1984-hand-add, iso-ir-95, jp-ocr-hand-add, csISO95JIS62291984handadd | |
| JIS_C6229-1984-kana | |
| JIS_C6229-1984-kana, iso-ir-96, csISO96JISC62291984kana | |
| JIS_Encoding | |
| JIS_Encoding, csJISEncoding | |
| JIS_X0201 | |
| JIS_X0201, X0201, csHalfWidthKatakana | |
| JIS_X0212-1990 | |
| JIS_X0212-1990, x0212, iso-ir-159, csISO159JISX02121990 | |
| JUS_I.B1.002 | |
| JUS_I.B1.002, iso-ir-141, ISO646-YU, js, yu, csISO141JUSIB1002 | |
| JUS_I.B1.003-mac | |
| JUS_I.B1.003-mac, macedonian, iso-ir-147, csISO147Macedonian | |
| JUS_I.B1.003-serb | |
| JUS_I.B1.003-serb, iso-ir-146, serbian, csISO146Serbian | |
| KOI7-switched | |
| KOI7-switched | |
| Korean | 949 |
| ks_c_5601-1987, csKSC56011987, iso-ir-149, korean, ks_c_5601, ks_c_5601_1987, ks_c_5601-1989, KSC_5601, KSC5601, ks-c-5601, ks-c5601, CP949, UHC | |
| Korean (EUC) | 51949 |
| euc-kr, csEUCKR | |
| Korean (ISO) | 50225 |
| iso-2022-kr, csISO2022KR, iso2022-kr | |
| Korean (Johab) | 1361 |
| Johab, CP1361 | |
| Korean (Mac) | 10003 |
| x-mac-korean | |
| Korean Wansung | 20949 |
| x-cp20949 | |
| KSC5636 | |
| KSC5636, ISO646-KR, csKSC5636 | |
| KZ-1048 | |
| KZ-1048, STRK1048-2002, RK1048, csKZ1048 | |
| Latin 3 (ISO) | 28593 |
| iso-8859-3, Latin3, ISO_8859-3, ISO_8859-3:1988, iso-ir-109, l3, csISOLatin3, iso8859-3 | |
| Latin 9 (ISO) | 28605 |
| iso-8859-15, Latin9, ISO_8859-15, l9, Latin-9, iso8859-15 | |
| latin-greek | |
| latin-greek, iso-ir-19, csISO19LatinGreek | |
| Latin-greek-1 | |
| Latin-greek-1, iso-ir-27, csISO27LatinGreek1 | |
| latin-lap | |
| latin-lap, lap, iso-ir-158, csISO158Lap | |
| Microsoft-Publishing | |
| Microsoft-Publishing, csMicrosoftPublishing | |
| MNEM | |
| MNEM, csMnem | |
| MNEMONIC | |
| MNEMONIC, csMnemonic | |
| MSZ_7795.3 | |
| MSZ_7795.3, iso-ir-86, ISO646-HU, hu, csISO86Hungarian | |
| NATS-DANO | |
| NATS-DANO, iso-ir-9-1, csNATSDANO | |
| NATS-DANO-ADD | |
| NATS-DANO-ADD, iso-ir-9-2, csNATSDANOADD | |
| NATS-SEFI | |
| NATS-SEFI, iso-ir-8-1, csNATSSEFI | |
| NATS-SEFI-ADD | |
| NATS-SEFI-ADD, iso-ir-8-2, csNATSSEFIADD | |
| NC_NC00-10:81 | |
| NC_NC00-10:81, cuba, iso-ir-151, ISO646-CU, csISO151Cuba | |
| NF_Z_62-010 | |
| NF_Z_62-010, iso-ir-69, ISO646-FR, fr, csISO69French | |
| NF_Z_62-010_ | |
| NF_Z_62-010_, iso-ir-25, ISO646-FR1, csISO25French | |
| Nordic (DOS) | 865 |
| IBM865, cp865, 865, csIBM865 | |
| Norwegian (IA5) | 20108 |
| x-IA5-Norwegian | |
| NS_4551-1 | |
| NS_4551-1, iso-ir-60, ISO646-NO, no, csISO60DanishNorwegian, csISO60Norwegian1 | |
| NS_4551-2 | |
| NS_4551-2, ISO646-NO2, iso-ir-61, no2, csISO61Norwegian2 | |
| OEM Cyrillic (primarily Russian) | 855 |
| IBM855, cp855, 855, csIBM855 | |
| OEM Multilingual Latin 1 + Euro symbol | 858 |
| IBM00858, CCSID00858, CP00858, PC-Multilingual-850+euro, CP858 | |
| OEM United States | 437 |
| IBM437, 437, cp437, csPC8, CodePage437, csPC8CodePage437 | |
| OSD_EBCDIC_DF03_IRV | |
| OSD_EBCDIC_DF03_IRV | |
| OSD_EBCDIC_DF04_1 | |
| OSD_EBCDIC_DF04_1 | |
| OSD_EBCDIC_DF04_15 | |
| OSD_EBCDIC_DF04_15 | |
| PC8-Danish-Norwegian | |
| PC8-Danish-Norwegian, csPC8DanishNorwegian | |
| PC8-Turkish | |
| PC8-Turkish, csPC8Turkish | |
| Portuguese (DOS) | 860 |
| IBM860, cp860, 860, csIBM860 | |
| PT | |
| PT, iso-ir-16, ISO646-PT, csISO16Portuguese | |
| PT2 | |
| PT2, iso-ir-84, ISO646-PT2, csISO84Portuguese2 | |
| PTCP154 | 154 |
| PTCP154, csPTCP154, PT154, CP154, Cyrillic-Asian | |
| Romanian (Mac) | 10010 |
| x-mac-romanian | |
| SCSU | |
| SCSU | |
| SEN_850200_B | |
| SEN_850200_B, iso-ir-10, FI, ISO646-FI, ISO646-SE, se, csISO10Swedish | |
| SEN_850200_C | |
| SEN_850200_C, iso-ir-11, ISO646-SE2, se2, csISO11SwedishForNames | |
| Swedish (IA5) | 20107 |
| x-IA5-Swedish | |
| T.101-G2 | |
| T.101-G2, iso-ir-128, csISO128T101G2 | |
| T.61 | 20261 |
| x-cp20261 | |
| T.61-7bit | |
| T.61-7bit, iso-ir-102, csISO102T617bit | |
| T.61-8bit | |
| T.61-8bit, T.61, iso-ir-103, csISO103T618bit | |
| TCA Taiwan | 20001 |
| x-cp20001 | |
| TeleText Taiwan | 20004 |
| x-cp20004 | |
| Thai (Mac) | 10021 |
| x-mac-thai | |
| Thai (Windows) | 874 |
| windows-874, DOS-874, iso-8859-11, TIS-620, CP874 | |
| TSCII | |
| TSCII, csTSCII | |
| Turkish (DOS) | 857 |
| ibm857, cp857, 857, csIBM857 | |
| Turkish (ISO) | 28599 |
| iso-8859-9, Latin5, ISO_8859-9, ISO_8859-9:1989, iso-ir-148, l5, iso8859-9 | |
| Turkish (Mac) | 10081 |
| x-mac-turkish | |
| Turkish (Windows) | 1254 |
| windows-1254, CP1254, MS-TURK | |
| Ukrainian (Mac) | 10017 |
| x-mac-ukrainian | |
| Unicode | 1200 |
| unicode, utf-16, CP1200, UTF16LE, UCS-2LE, UTF16, UCS-2, UTF-16LE | |
| Unicode (Big-Endian) | 1201 |
| unicodeFFFE, CP1201, UTF16BE, UCS-2BE, UTF-16BE | |
| Unicode (UTF-7) | 65000 |
| utf-7, csUnicode11UTF7, unicode-1-1-utf-7, x-unicode-2-0-utf-7 | |
| Unicode (UTF-8) | 65001 |
| utf-8, unicode-1-1-utf-8, unicode-2-0-utf-8, x-unicode-2-0-utf-8, CP65001, UTF8 | |
| UNICODE-1-1 | |
| UNICODE-1-1, csUnicode11 | |
| UNKNOWN-8BIT | |
| UNKNOWN-8BIT, csUnknown8BiT | |
| US-ASCII | 20127 |
| us-ascii, ANSI_X3.4-1968, ANSI_X3.4-1986, ascii, cp367, csASCII, IBM367, ISO_646.irv:1991, ISO646-US, iso-ir-6us, iso-ir-6, us | |
| us-dk | |
| us-dk, csUSDK | |
| UTF-32 | 12000 |
| UTF-32, UTF-32LE, CP12000, UTF32LE, UTF32 | |
| UTF-32BE | 12001 |
| UTF-32BE, CP12001, UTF32BE | |
| Ventura-International | |
| Ventura-International, csVenturaInternational | |
| Ventura-Math | |
| Ventura-Math, csVenturaMath | |
| Ventura-US | |
| Ventura-US, csVenturaUS | |
| videotex-suppl | |
| videotex-suppl, iso-ir-70, csISO70VideotexSupp1 | |
| Vietnamese (Windows) | 1258 |
| windows-1258, CP1258 | |
| VIQR | |
| VIQR, csVIQR | |
| VISCII | |
| VISCII, csVISCII | |
| Wang Taiwan | 20005 |
| x-cp20005 | |
| Western European (DOS) | 850 |
| ibm850, cp850, 850, csPC850Multilingual | |
| Western European (IA5) | 20105 |
| x-IA5 | |
| Western European (ISO) | 28591 |
| iso-8859-1, cp819, Latin1, ibm819, iso_8859-1, iso_8859-1:1987, iso8859-1, iso-ir-100, l1, csISOLatin1 | |
| Western European (Mac) | 10000 |
| macintosh, mac, csMacintosh | |
| Western European (Windows) | 1252 |
| Windows-1252, x-ansi, CP1252, MS-ANSI | |
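
The code page column can also be used in the other direction, to go from a Windows code page number to a charset name. The Python sketch below is only an illustration: `CODE_PAGE_NAMES` and `decode_with_code_page` are hypothetical names, the dictionary holds a handful of rows copied from the table above, and it assumes the chosen names are ones Python's codec registry recognizes (many names in the table are not).

```python
import codecs

# Hypothetical sample of the table: Windows code page -> charset name.
CODE_PAGE_NAMES = {
    1250: "windows-1250",
    1251: "windows-1251",
    1252: "Windows-1252",
    28591: "iso-8859-1",
    65001: "utf-8",
}


def decode_with_code_page(data: bytes, code_page: int) -> str:
    """Decode bytes using the charset name listed for a Windows code page."""
    name = CODE_PAGE_NAMES[code_page]  # KeyError if the page is not in the sample
    codecs.lookup(name)                # LookupError if Python has no such codec
    return data.decode(name)


# Byte 0x80 is the euro sign in Windows-1252.
print(decode_with_code_page(b"\x80", 1252))
```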
Everyone is encouraged to use Unicode (especially UTF-8); however, the reality is that many of these non-Unicode encodings are in broad use, and we still need to standardize the way we identify them.
I will be posting the full XML data set for this list, along with the firstobject XML editor FOAL script that built it, at a later time.
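For documents that are still in one of the non-Unicode encodings, moving to UTF-8 means both re-encoding the bytes and updating the name in the XML declaration. The following Python sketch shows one way this might look; `xml_to_utf8` is a hypothetical helper written for this example, and it assumes the declared encoding is ASCII-compatible and stateless (so not UTF-16, EBCDIC, or ISO-2022) and is a name that Python's codec registry knows. It is not a substitute for a real XML parser.

```python
import re

# Matches an XML declaration at the start of the data, up to and including
# the encoding name. Groups: (prefix + opening quote)(name)(closing quote).
_DECL = re.compile(
    rb'^(<\?xml[^>]*?\bencoding\s*=\s*["\'])([A-Za-z][A-Za-z0-9._-]*)(["\'])'
)


def xml_to_utf8(data: bytes) -> bytes:
    """Re-encode an XML document as UTF-8 and rewrite its declared encoding."""
    match = _DECL.match(data)
    if match is None:
        # No encoding declared: leave the data alone (assumed already UTF-8).
        return data
    declared = match.group(2).decode("ascii")
    # The declaration itself is ASCII, so only the remainder needs converting.
    head = match.group(1) + b"UTF-8" + match.group(3)
    body = data[match.end():].decode(declared).encode("utf-8")
    return head + body


legacy = '<?xml version="1.0" encoding="windows-1252"?><p>caf\u00e9</p>'.encode("cp1252")
print(xml_to_utf8(legacy).decode("utf-8"))
```

Decoding raises `LookupError` when the declared name has no matching Python codec, which is a reasonable point to fall back on an alias table such as the one in this reference.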
See also:
Convert ANSI file to Unicode
ANSI and Unicode files and C++ strings
UTF-8 Files and the Preamble
Setting the XML Declaration With CMarkup
CMarkup GetDeclaredEncoding Method
UTF-16 Files and the Byte Order Mark (BOM)