1.

What International Encodings Are Supported By Xerces-J?

Answer:

In general, the parser supports all IANA encodings and aliases (see http://www.iana.org/assignments/character-sets) that have clear mappings to Java encodings.
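Support depends on whether an IANA name maps to an encoding the Java runtime provides. The sketch below checks the JVM side of that mapping with `java.nio.charset.Charset.isSupported`. It does not consult Xerces-J's own internal alias table. The class and method names (`EncodingCheck`, `jvmSupports`) are hypothetical and exist only for this illustration.

```java
import java.nio.charset.Charset;
import java.util.List;

// Hypothetical helper class for illustration; not part of Xerces-J.
public class EncodingCheck {

    /**
     * Returns true if the running JVM recognizes the given name, either as a
     * charset's canonical name or as one of its aliases.
     */
    static boolean jvmSupports(String encodingName) {
        try {
            return Charset.isSupported(encodingName);
        } catch (IllegalArgumentException e) {
            // Thrown for null or syntactically illegal names
            // (IllegalCharsetNameException is a subclass).
            return false;
        }
    }

    public static void main(String[] args) {
        // A few names taken from the list of common encodings.
        List<String> names = List.of("UTF-8", "ISO-8859-2", "shift-jis", "koi8-r", "ebcdic-cp-us");
        for (String name : names) {
            System.out.printf("%-14s %s%n", name, jvmSupports(name) ? "supported" : "not supported");
        }
    }
}
```

Output varies by JVM. UTF-8, UTF-16BE, UTF-16LE and ISO-8859-1 are guaranteed on every Java platform. EBCDIC code pages belong to the extended charset set, which may be missing from trimmed runtimes. By default Xerces-J also rejects Java-only encoding names that have no IANA mapping. The `http://apache.org/xml/features/allow-java-encodings` parser feature relaxes that restriction.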

Some of the more common encodings are:

  • UTF-8
  • UTF-16 Big Endian, UTF-16 Little Endian
  • IBM-1208
  • ISO Latin-1 (ISO-8859-1)
  • ISO Latin-2 (ISO-8859-2) [Bosnian, Croatian, Czech, Hungarian, Polish, Romanian, Serbian (in Latin transcription), Serbo-Croatian, Slovak, Slovenian, Upper and Lower Sorbian]
  • ISO Latin-3 (ISO-8859-3) [Maltese, Esperanto]
  • ISO Latin-4 (ISO-8859-4)
  • ISO Latin Cyrillic (ISO-8859-5)
  • ISO Latin Arabic (ISO-8859-6)
  • ISO Latin Greek (ISO-8859-7)
  • ISO Latin Hebrew (ISO-8859-8)
  • ISO Latin-5 (ISO-8859-9) [Turkish]
  • Extended Unix Code, packed for Japanese (euc-jp, eucjis)
  • Japanese Shift JIS (shift-jis)
  • Chinese (big5)
  • Chinese for PRC (mixed 1/2 byte) (gb2312)
  • Japanese ISO-2022-JP (iso-2022-jp)
  • Cyrillic, Russian Unix (koi8-r)
  • Extended Unix Code, packed for Korean (euc-kr)
  • Windows Thai (cp874)
  • Latin 1 Windows (cp1252)
  • IBM PC Multilingual Latin-1 with euro sign (cp858)
  • EBCDIC encodings:
    o EBCDIC US (ebcdic-cp-us)
    o EBCDIC Canada (ebcdic-cp-ca)
    o EBCDIC Netherlands (ebcdic-cp-nl)
    o EBCDIC Denmark (ebcdic-cp-dk)
    o EBCDIC Norway (ebcdic-cp-no)
    o EBCDIC Finland (ebcdic-cp-fi)
    o EBCDIC Sweden (ebcdic-cp-se)
    o EBCDIC Italy (ebcdic-cp-it)
    o EBCDIC Spain, Latin America (ebcdic-cp-es)
    o EBCDIC Great Britain (ebcdic-cp-gb)
    o EBCDIC France (ebcdic-cp-fr)
    o EBCDIC Hebrew (ebcdic-cp-he)
    o EBCDIC Switzerland (ebcdic-cp-ch)
    o EBCDIC Roece (ebcdic-cp-roece)
    o EBCDIC Yugoslavia (ebcdic-cp-yu)
    o EBCDIC Iceland (ebcdic-cp-is)
    o EBCDIC Urdu (ebcdic-cp-ar2)
    o Latin 0 EBCDIC
    o EBCDIC Arabic (ebcdic-cp-ar1)
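The sketch below encodes a small document in one of the listed encodings and parses it back from raw bytes. Because the parser receives bytes, it has to act on the `encoding` attribute of the XML declaration. It assumes a JAXP DOM parser. The JDK ships an internal parser derived from Xerces, and `DocumentBuilderFactory.newInstance()` typically selects Apache Xerces-J instead when its jar is on the classpath. `EncodedParse` and `parseRootText` are hypothetical names used only for illustration.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.Charset;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

// Hypothetical helper class for illustration; not part of Xerces-J.
public class EncodedParse {

    /**
     * Builds <msg>text</msg> with an XML declaration naming `encoding`,
     * encodes it to bytes in that charset, parses the bytes back, and
     * returns the root element's text content.
     */
    static String parseRootText(String encoding, String text) {
        String xml = "<?xml version=\"1.0\" encoding=\"" + encoding + "\"?>"
                + "<msg>" + text + "</msg>";
        byte[] bytes = xml.getBytes(Charset.forName(encoding));
        try {
            DocumentBuilder builder =
                    DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // An InputStream (not a Reader) leaves byte decoding to the parser,
            // which takes the encoding name from the XML declaration.
            Document doc = builder.parse(new ByteArrayInputStream(bytes));
            return doc.getDocumentElement().getTextContent();
        } catch (Exception e) {
            throw new IllegalStateException("Parsing failed for " + encoding, e);
        }
    }

    public static void main(String[] args) {
        // Polish text representable in ISO Latin-2, written as Unicode escapes
        // so the source compiles regardless of the compiler's source encoding.
        System.out.println(parseRootText("ISO-8859-2", "Za\u017c\u00f3\u0142\u0107"));
        // Russian "Privet" encoded in KOI8-R.
        System.out.println(parseRootText("KOI8-R", "\u041f\u0440\u0438\u0432\u0435\u0442"));
    }
}
```

The key design choice is passing a byte stream rather than a `Reader`. A `Reader` has already decoded the text, so the parser would ignore the declared encoding. If a document lacks a correct declaration, `org.xml.sax.InputSource#setEncoding` can supply the encoding explicitly.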
