What Does "Unicode Build" Really Mean?

Unicode in Python: There are a few aspects of unicode-ness to keep track of here. First, in Python there are unicode objects and there are string objects. String objects are essentially sequences of 8-bit characters, and unicode objects are sequences of "wide" characters (either 16-bit or 32-bit, depending on the platform and the options used when building Python). The two are related in that a unicode object can be encoded into a string object using a specific "codec" (a matched enCODer and DECoder pair). You can think of a codec as being like the "magic decoder ring" that came in the box of cereal when you were a kid (or perhaps when your dad was a kid...). String objects can be decoded into a unicode object using the decoder part of the same codec.
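A minimal sketch of the encode/decode round trip described above. It uses Python 3, where the unicode objects of this answer are `str` and the 8-bit string objects are `bytes`; the sample word and the choice of the UTF-8 and Latin-1 codecs are only illustrations.

```python
# Python 3: "unicode objects" are str, 8-bit "string objects" are bytes.
text = "Gr\u00fc\u00dfe"  # "Grüße" as a unicode (str) object

# Encoding: unicode -> 8-bit string, using the encoder half of a codec.
utf8_bytes = text.encode("utf-8")
latin1_bytes = text.encode("latin-1")

# The same text yields different byte sequences under different codecs.
print(utf8_bytes)    # b'Gr\xc3\xbc\xc3\x9fe'
print(latin1_bytes)  # b'Gr\xfc\xdfe'

# Decoding: 8-bit string -> unicode, using the decoder half of the same codec.
assert utf8_bytes.decode("utf-8") == text
assert latin1_bytes.decode("latin-1") == text
```

Decoding with a different codec than the one used for encoding does not restore the original text, which is why the encoder and decoder must come from a matched pair.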

Unicode in wxWidgets: On the other side of the fence is wxWidgets and how it can use Unicode. In the C++ code there is a class named wxString, and all string-type parameters and return values in the wxWidgets library use wxString. The wxWidgets library has a Unicode compile switch that makes wxString either an array of 8-bit characters (the C char data type) or an array of wide characters (C's wchar_t data type). In other words, you can have one wxWidgets build where wxStrings are Unicode and another build where wxStrings are ANSI strings.

Unicode in wxPython: So what does all that mean for wxPython? Since Python knows about both string and unicode objects, and you can have both in the same program, the wxPython wrappers need to do something intelligent depending on whether the wxWidgets being used is a Unicode build or an ANSI build.
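To know which set of conversion rules applies, you first need to know which kind of build you have. A commonly used check in classic wxPython is to look for the marker string 'unicode' (or 'ansi') in the `wx.PlatformInfo` tuple. The helper `wx_build_kind` below is a hypothetical name, and the exact contents of `wx.PlatformInfo` are an assumption to verify against your wxPython version; the sketch is tested offline with made-up tuples.

```python
def wx_build_kind(platform_info):
    """Classify a wxPython build from a wx.PlatformInfo-style tuple.

    Assumption: as in classic wxPython, the tuple contains the marker
    string 'unicode' for Unicode builds or 'ansi' for ANSI builds.
    """
    if "unicode" in platform_info:
        return "unicode"
    if "ansi" in platform_info:
        return "ansi"
    return "unknown"  # neither marker found; inspect the tuple by hand


# With a real installation (requires wxPython):
#   import wx
#   print(wx_build_kind(wx.PlatformInfo))
# Offline check with made-up tuples:
print(wx_build_kind(("__WXMSW__", "wxMSW", "unicode")))  # unicode
print(wx_build_kind(("__WXGTK__", "wxGTK", "ansi")))     # ansi
```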

So, if wxPython is using an ANSI build of wxWidgets, then:

  • Return values from wx functions and methods will be string objects.
  • Strings passed as parameters to wx functions and methods will be used without conversion.
  • Unicode objects passed as parameters to wx functions and methods will first be converted to a string object using the default encoding. You can use sys.getdefaultencoding() to find out what the default encoding is. You may get a UnicodeEncodeError if the unicode object contains characters that have no matching ordinal in the default encoding. To work around this, you can convert the unicode object to a string yourself using a codec that can represent those characters.
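The rules for an ANSI build can be mimicked in a few lines to see where the UnicodeEncodeError comes from and how the workaround avoids it. This is a minimal sketch, not wxPython code: `to_ansi_arg` is a hypothetical stand-in for the wrapper's conversion, and because Python 2's default encoding was normally 'ascii' (Python 3 reports 'utf-8'), 'ascii' is assumed as the implicit encoding. cp1252 is just one example of a codec that can represent 'é'.

```python
import sys

# Hypothetical stand-in for an ANSI build's argument conversion (NOT a
# wxPython API). Python 2's usual default encoding, 'ascii', is assumed.
DEFAULT_ENCODING = "ascii"


def to_ansi_arg(value, encoding=DEFAULT_ENCODING):
    """Return the 8-bit string (bytes) an ANSI build would receive."""
    if isinstance(value, bytes):
        return value               # 8-bit strings pass through unchanged
    return value.encode(encoding)  # unicode is encoded; may raise UnicodeEncodeError


print(sys.getdefaultencoding())    # this interpreter's actual default encoding

label = "caf\u00e9"                # 'é' has no ordinal in ASCII
try:
    to_ansi_arg(label)
except UnicodeEncodeError as exc:
    print("implicit conversion failed:", exc.reason)

# Workaround: encode explicitly with a codec that can represent the
# character, then pass the resulting 8-bit string.
explicit = to_ansi_arg(label.encode("cp1252"))
print(explicit)                    # b'caf\xe9'
```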

And if wxPython is using a Unicode build of wxWidgets, then:

  • Return values from wx functions and methods will be unicode objects.
  • Strings passed as parameters to wx functions and methods will first be converted to a unicode object using the default encoding. You may get a UnicodeDecodeError if the string contains characters that the default encoding does not map to unicode ordinals. To work around this, you can convert the string to unicode yourself using the codec the string was actually encoded with.
  • Unicode objects passed as parameters to wx functions and methods will be used without conversion.
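The mirror-image rules for a Unicode build can be sketched the same way. Again this is not wxPython code: `to_unicode_arg` is a hypothetical stand-in for the wrapper's conversion, and Python 2's usual default encoding, 'ascii', is assumed for the implicit decode. The UTF-8 bytes stand in for text read from a file or socket.

```python
# Hypothetical stand-in for a Unicode build's argument conversion (NOT a
# wxPython API). Python 2's usual default encoding, 'ascii', is assumed.
DEFAULT_ENCODING = "ascii"


def to_unicode_arg(value, encoding=DEFAULT_ENCODING):
    """Return the unicode object (str) a Unicode build would receive."""
    if isinstance(value, str):
        return value               # unicode passes through unchanged
    return value.decode(encoding)  # 8-bit strings are decoded; may raise UnicodeDecodeError


raw = b"caf\xc3\xa9"               # 'café' encoded as UTF-8
try:
    to_unicode_arg(raw)
except UnicodeDecodeError as exc:
    print("implicit conversion failed:", exc.reason)

# Workaround: decode explicitly with the codec the bytes were written in,
# then pass the resulting unicode object.
explicit = to_unicode_arg(raw.decode("utf-8"))
print(explicit)                    # café
```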
