Answer» In general, the parser supports all IANA encodings and aliases (see http://www.iana.org/assignments/character-sets) that have clear mappings to Java encodings (see here for details).
Some of the more common encodings are:
- UTF-8
- UTF-16 Big Endian, UTF-16 Little Endian
- IBM-1208
- ISO Latin-1 (ISO-8859-1)
- ISO Latin-2 (ISO-8859-2) [Bosnian, Croatian, Czech, Hungarian, Polish, Romanian, Serbian (in Latin transcription), Serbo-Croatian, Slovak, Slovenian, Upper and Lower Sorbian]
- ISO Latin-3 (ISO-8859-3) [Maltese, Esperanto]
- ISO Latin-4 (ISO-8859-4)
- ISO Latin Cyrillic (ISO-8859-5)
- ISO Latin Arabic (ISO-8859-6)
- ISO Latin Greek (ISO-8859-7)
- ISO Latin Hebrew (ISO-8859-8)
- ISO Latin-5 (ISO-8859-9) [Turkish]
- Extended Unix Code, packed for Japanese (euc-jp, eucjis)
- Japanese Shift JIS (shift-jis)
- Chinese (big5)
- Chinese for PRC (mixed 1/2 byte) (gb2312)
- Japanese ISO-2022-JP (iso-2022-jp)
- Cyrillic (koi8-r)
- Extended Unix Code, packed for Korean (euc-kr)
- Russian Unix, Cyrillic (koi8-r)
- Windows Thai (cp874)
- Latin 1 Windows (cp1252)
- cp858
- EBCDIC encodings:
  o EBCDIC US (ebcdic-cp-us)
  o EBCDIC Canada (ebcdic-cp-ca)
  o EBCDIC Netherlands (ebcdic-cp-nl)
  o EBCDIC Denmark (ebcdic-cp-dk)
  o EBCDIC Norway (ebcdic-cp-no)
  o EBCDIC Finland (ebcdic-cp-fi)
  o EBCDIC Sweden (ebcdic-cp-se)
  o EBCDIC Italy (ebcdic-cp-it)
  o EBCDIC Spain, Latin America (ebcdic-cp-es)
  o EBCDIC Great Britain (ebcdic-cp-gb)
  o EBCDIC France (ebcdic-cp-fr)
  o EBCDIC Hebrew (ebcdic-cp-he)
  o EBCDIC Switzerland (ebcdic-cp-ch)
  o EBCDIC Roece (ebcdic-cp-roece)
  o EBCDIC Yugoslavia (ebcdic-cp-yu)
  o EBCDIC Iceland (ebcdic-cp-is)
  o EBCDIC Urdu (ebcdic-cp-ar2)
  o Latin 0 EBCDIC
  o EBCDIC Arabic (ebcdic-cp-ar1)
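Since support depends on a mapping to a Java encoding, it can help to check whether a given name resolves on the JVM in use. The sketch below is only an illustration: the class `EncodingCheck` and its methods are hypothetical names, and it queries the standard `java.nio.charset.Charset` API rather than the parser itself. The parser may keep its own alias table, so a name that resolves here is a good sign but not a guarantee, and vice versa.

```java
import java.nio.charset.Charset;
import java.nio.charset.IllegalCharsetNameException;

// Hypothetical helper: asks the JVM (not the parser) whether an
// encoding name or alias maps to an available Java charset.
public class EncodingCheck {

    /** Returns true if this JVM can resolve the given encoding name or alias. */
    public static boolean isSupported(String name) {
        try {
            return Charset.isSupported(name);
        } catch (IllegalCharsetNameException e) {
            // The name contains characters that are not legal in a charset name.
            return false;
        }
    }

    /** Returns the JVM's canonical name for a resolvable encoding name or alias. */
    public static String canonicalName(String name) {
        return Charset.forName(name).name();
    }

    public static void main(String[] args) {
        // A few names from the list above, plus one that should not resolve.
        String[] names = {"UTF-8", "ISO-8859-1", "latin1", "koi8-r", "no-such-encoding"};
        for (String n : names) {
            if (isSupported(n)) {
                System.out.println(n + " -> " + canonicalName(n));
            } else {
                System.out.println(n + " -> not available on this JVM");
            }
        }
    }
}
```

Which of the less common names resolve (for example, the EBCDIC code pages) varies with the JVM and the charsets it ships, so the output is best treated as a per-installation check.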