
jython-users.lists.sourceforge.net



Jython 2.5.1 and various encodings support - LookupError: unknown encoding

Chris Clark, Wed, 24 Feb 2010 00:01:31 +0000 (UTC)
I've seen a few email trails and (old) bugs on encoding support in Jython:

http://sourceforge.net/mailarchive/forum.php?...
http://sourceforge.net/mailarchive/forum.php?...

http://bugs.jython.org/issue1410
http://bugs.jython.org/issue1066

but I'm still confused about what is and is not supported.

Sample session:

C:\jython2.5.1>dir C:\jython2.5.1\Lib\encodings\shift*py
 Volume in drive C has no label.
 Volume Serial Number is 547C-9409

 Directory of C:\jython2.5.1\Lib\encodings

09/26/2009  12:48 PM             1,039 shift_jis.py
09/26/2009  12:48 PM             1,059 shift_jisx0213.py
09/26/2009  12:48 PM             1,059 shift_jis_2004.py
               3 File(s)          3,157 bytes
               0 Dir(s)   3,572,412,416 bytes free

C:\jython2.5.1>jython
Jython 2.5.1 (Release_2_5_1:6813, Sep 26 2009, 13:47:54)
[Java HotSpot(TM) Client VM (Sun Microsystems Inc.)] on java1.6.0_02
Type "help", "copyright", "credits" or "license" for more information.
>>> x=''
>>> x.decode('shift_jis')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
LookupError: unknown encoding 'shift_jis'

The encoding does exist, however there may be something going on with
_codecs_jp (or I guess not going on) - here is what happens if the
encoding is explicitly imported:

C:\jython2.5.1\Lib\encodings>C:\jython2.5.1\jython
Jython 2.5.1 (Release_2_5_1:6813, Sep 26 2009, 13:47:54)
[Java HotSpot(TM) Client VM (Sun Microsystems Inc.)] on java1.6.0_02
Type "help", "copyright", "credits" or "license" for more information.
>>> import shift_jis
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "shift_jis.py", line 7, in <module>
    import _codecs_jp, codecs
ImportError: No module named _codecs_jp

Here is what I was expecting :-)

C:\jython2.5.1>c:\Python25\python
Python 2.5.4 (r254:67916, Dec 23 2008, 15:10:54) [MSC v.1310 32 bit (Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> x=''
>>> x.decode('shift_jis')
u''

Any tips/workarounds I can use? Is there something wrong with my install?

Thanks in advance,

Chris

Philip Jenvey, Fri, 26 Feb 2010 20:21:26 +0000 (UTC)
On Feb 23, 2010, at 3:49 PM, Chris Clark wrote:

> I seen a few email trails and (old) bugs on encoding support in Jython:
>
> http://sourceforge.net/mailarchive/forum.php?...
> http://sourceforge.net/mailarchive/forum.php?...
>
> http://bugs.jython.org/issue1410
> http://bugs.jython.org/issue1066
>
> but I'm still confused about what is and is not supported.
>
> Sample session:
>
> C:\jython2.5.1>dir C:\jython2.5.1\Lib\encodings\shift*py
>  Volume in drive C has no label.
>  Volume Serial Number is 547C-9409
>
>  Directory of C:\jython2.5.1\Lib\encodings
>
> 09/26/2009  12:48 PM             1,039 shift_jis.py
> 09/26/2009  12:48 PM             1,059 shift_jisx0213.py
> 09/26/2009  12:48 PM             1,059 shift_jis_2004.py
>                3 File(s)          3,157 bytes
>                0 Dir(s)   3,572,412,416 bytes free
>
> C:\jython2.5.1>jython
> Jython 2.5.1 (Release_2_5_1:6813, Sep 26 2009, 13:47:54)
> [Java HotSpot(TM) Client VM (Sun Microsystems Inc.)] on java1.6.0_02
> Type "help", "copyright", "credits" or "license" for more information.
> >>> x=''
> >>> x.decode('shift_jis')
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
> LookupError: unknown encoding 'shift_jis'
>
> The encoding does exist, however there may be something going on with
> _codecs_jp (or I guess not going on) - here is what happens if the
> encoding is explicitly imported:
>
> C:\jython2.5.1\Lib\encodings>C:\jython2.5.1\jython
> Jython 2.5.1 (Release_2_5_1:6813, Sep 26 2009, 13:47:54)
> [Java HotSpot(TM) Client VM (Sun Microsystems Inc.)] on java1.6.0_02
> Type "help", "copyright", "credits" or "license" for more information.
> >>> import shift_jis
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
>   File "shift_jis.py", line 7, in <module>
>     import _codecs_jp, codecs
> ImportError: No module named _codecs_jp
>
>
> Here is what I was expecting :-)
>
> C:\jython2.5.1>c:\Python25\python
> Python 2.5.4 (r254:67916, Dec 23 2008, 15:10:54) [MSC v.1310 32 bit
> (Intel)] on
> win32
> Type "help", "copyright", "credits" or "license" for more information.
> >>> x=''
> >>> x.decode('shift_jis')
> u''
>
>
> Any tips/workarounds I can use? Is there something wrong with my install?
>
> Thanks in advance,

#1066 is the main bug for this issue -- we just currently lack support for the Asian codecs like shiftjis. The ImportError in sample #2 is a symptom of that. The same ImportError happens when you attempt to use the codec but it's masked as a LookupError.

Supporting these via the JVM's nio codecs is definitely doable but nobody's gotten around to it yet.

--
Philip Jenvey

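The masking described above can be reproduced in isolation. The following is a minimal sketch, not Jython's actual lookup code: it assumes a codec search function that swallows ImportError and returns None, which is how CPython's `encodings` package behaves. The codec name `demo_missing_codec` and the module name `_demo_missing_backend` are hypothetical, chosen so that the import fails. It is written for current CPython 3 rather than Jython 2.5.

```python
import codecs
import importlib


def demo_search_function(name):
    """Hypothetical search function that hides a failed backend import."""
    if name != "demo_missing_codec":
        return None
    try:
        # Hypothetical backend module; it does not exist, so this raises.
        backend = importlib.import_module("_demo_missing_backend")
    except ImportError:
        # The real cause is dropped here. Returning None only tells the
        # codec registry "this search function does not know that name".
        return None
    return backend.getregentry()


codecs.register(demo_search_function)


def lookup_error_message(name):
    """Return the LookupError text for name, or None if the lookup works."""
    try:
        codecs.lookup(name)
    except LookupError as exc:
        return str(exc)
    return None


message = lookup_error_message("demo_missing_codec")
print(message)
```

The caller only ever sees the LookupError; the ImportError becomes visible only when the backend module is imported directly, as in the second sample session.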
Chris Clark, Fri, 26 Feb 2010 20:11:18 +0000 (UTC)
Philip Jenvey wrote:
> On Feb 23, 2010, at 3:49 PM, Chris Clark wrote:
>
>> ....
>> C:\jython2.5.1\Lib\encodings>C:\jython2.5.1\jython
>> Jython 2.5.1 (Release_2_5_1:6813, Sep 26 2009, 13:47:54)
>> [Java HotSpot(TM) Client VM (Sun Microsystems Inc.)] on java1.6.0_02
>> Type "help", "copyright", "credits" or "license" for more information.
>> >>> import shift_jis
>> Traceback (most recent call last):
>>   File "<stdin>", line 1, in <module>
>>   File "shift_jis.py", line 7, in <module>
>>     import _codecs_jp, codecs
>> ImportError: No module named _codecs_jp
>
> #1066 is the main bug for this issue -- we just currently lack support for the asian codecs like shiftjis. The ImportError in sample #2 is a symptom of that. The same ImportError happens when you attempt to use the codec but it's masked as a LookupError.
>
> Supporting these via the JVM's nio codecs is definitely doable but nobody's gotten around to it yet.

Thanks for the heads up.

Is http://java.sun.com/j2se/1.4.2/docs/guide/nio... the package you are referring to? I'm not a big Java guy but I may start hacking on a Python layer on top of this as an experiment/proof-of-concept. Presumably http://java.sun.com/j2se/1.4.2/docs/api/java/... is what needs wrapping?

Chris

Chris Clark, Sat, 27 Feb 2010 00:57:12 +0000 (UTC)
Chris Clark wrote:
> Philip Jenvey wrote:
>
>> #1066 is the main bug for this issue -- we just currently lack support for the asian codecs like shiftjis. The ImportError in sample #2 is a symptom of that. The same ImportError happens when you attempt to use the codec but it's masked as a LookupError.
>>
>> Supporting these via the JVM's nio codecs is definitely doable but nobody's gotten around to it yet.
>
> Is http://java.sun.com/j2se/1.4.2/docs/guide/nio... the package you are
> referring to? I'm not a big Java guy but I may start hacking on a Python
> layer on top of this as an experiment/proof-of-concept. Presumably
> http://java.sun.com/j2se/1.4.2/docs/api/java/...
> is what needs wrapping?

I had some time this afternoon whilst waiting for some builds to complete... So I started experimenting on using nio from Python along with a quick attempt at a shift_jis codec. I'm seeking feedback on a very INCOMPLETE demo that is attached.

Sample session:

C:\users\clach04\python\jython_character_encoding>c:\jython2.5.1\jython.bat
Jython 2.5.1 (Release_2_5_1:6813, Sep 26 2009, 13:47:54)
[Java HotSpot(TM) Client VM (Sun Microsystems Inc.)] on java1.6.0_02
Type "help", "copyright", "credits" or "license" for more information.
>>> x=''
>>> x.decode('shift_jis') # at this point there is a shift_jis.py in curdir
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
LookupError: unknown encoding 'shift_jis'
>>> import shift_jis # register the local module/encoding
>>> x.decode('shift_jis')
u''
>>>

There is no support for errors (or less strict conversion options), there are imports in the middle of the script and you have to import the encoding you need (and right now there is only one but it is easy to do multiple with a template).

I'm beginning to wonder if it would simply be cleaner to use the CPython gencodec.py script and generate input to it by using the CPython encodings. I've done this for some Windows (single byte) encodings that are not supported by Python by auto-generating tables from Windows codepages like cp708. The tables would be pretty big though :-)

I'm really looking for "yes nio from Python approach is worth pursuing" or "this is stupid, you should stop now" comments. I'm pretty sure performance wise this approach is not a good idea but it is infinitely faster than "doesn't work at all" :-)

Here is a slightly more real example:

C:\users\clach04\python\jython_character_encoding>c:\jython2.5.1\jython.bat
Jython 2.5.1 (Release_2_5_1:6813, Sep 26 2009, 13:47:54)
[Java HotSpot(TM) Client VM (Sun Microsystems Inc.)] on java1.6.0_02
Type "help", "copyright", "credits" or "license" for more information.
>>> import shift_jis # register the local module/encoding
>>> x = u"\u3042" # '3042 HIRAGANA LETTER A'
>>> x.encode('shift_jis')
'\x82\xa0'
>>> # hey! Looks like it matches http://demo.icu-project.org/icu-bin/convexp?c...

Finally, does anyone know how IronPython handles CJK (or do they simply make use of .NET strings)?

Chris

# java imports
import java.nio.charset
import java.nio.CharBuffer
import java.nio.ByteBuffer

# python imports
import array


class MyBaseException(Exception):
    pass


def nio_unicode_to_bytes(nio_charset_name, unicode_string_data):
    """Take Python Unicode string and return python str type (byte) encoded in nio_charset_name
    nio_charset_name is a java.nio.charset name
    """
    assert isinstance(unicode_string_data, unicode)
    nio_charset = java.nio.charset.Charset.forName(nio_charset_name)  # TODO lookup could fail
    nio_charset_encoder = nio_charset.newEncoder()
    try:
        bbuf = nio_charset_encoder.encode(java.nio.CharBuffer.wrap(unicode_string_data))
    except java.nio.charset.UnmappableCharacterException:
        # not possible to represent one or more Unicode character(s) in this encoding
        raise MyBaseException('nio encoding failure - not implemented support yet')
    tmp_byte_array = array.array('b', bbuf.array())
    return tmp_byte_array.tostring()


def nio_bytes_to_unicode(nio_charset_name, byte_string_data):
    """Take Python str (byte) string and return python Unicode string type decoded using nio_charset_name
    nio_charset_name is a java.nio.charset name
    """
    assert isinstance(byte_string_data, str)
    nio_charset = java.nio.charset.Charset.forName(nio_charset_name)  # TODO lookup could fail
    nio_charset_decoder = nio_charset.newDecoder()
    tmp_byte_buffer = java.nio.ByteBuffer.wrap(byte_string_data)
    try:
        cbuf = nio_charset_decoder.decode(tmp_byte_buffer)
    except java.nio.charset.MalformedInputException:
        raise MyBaseException('nio decoding failure - not implemented support yet')
    tmp_unicode_str = cbuf.toString()
    return tmp_unicode_str


## could probably use a decorator here....
def decode_Shift_JIS(input):
    return nio_bytes_to_unicode('Shift_JIS', input)


def encode_Shift_JIS(input):
    return nio_unicode_to_bytes('Shift_JIS', input)


#### Pretty much boiler plate codec
import codecs


def decode(input, errors='strict'):
    return decode_Shift_JIS(input), len(input)


def encode(input, errors='strict'):
    return encode_Shift_JIS(input), len(input)


class Codec(codecs.Codec):
    def decode(self, input, errors='strict'):
        return decode(input, errors)

    def encode(self, input, errors='strict'):
        return encode(input, errors)


class StreamReader(codecs.Codec, codecs.StreamReader):
    pass


class StreamWriter(codecs.Codec, codecs.StreamWriter):
    pass


# entry point
def getregentry():
    return (encode, decode, StreamReader, StreamWriter)


##### not so boiler plate.....
def shift_jis_search_function(name):
    if name == 'shift_jis':
        import shift_jis
        codec = shift_jis.Codec()
        return (codec.encode, codec.decode, shift_jis.StreamReader, shift_jis.StreamWriter)
    else:
        return None


## works for 2.4 and 2.5
codecs.register(shift_jis_search_function)

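For comparison, the table-driven alternative mentioned in the message (the kind of module gencodec.py generates for single-byte encodings) can be sketched as follows. This is an illustration only, written for current CPython 3 rather than Jython 2.5, and it does not cover multi-byte encodings such as shift_jis. The codec name `demo_charmap` and its table are hypothetical: ASCII for 0x00-0x7F, EURO SIGN at 0x80, and every other byte marked undefined with U+FFFE, the placeholder that gencodec-style tables use.

```python
import codecs

# Hypothetical 256-entry decoding table (byte value -> character).
DECODING_TABLE = (
    "".join(chr(i) for i in range(128))  # 0x00-0x7F: plain ASCII
    + "\u20ac"                           # 0x80: EURO SIGN
    + "\ufffe" * 127                     # 0x81-0xFF: undefined
)
# Reverse mapping (character -> byte value) derived from the table above.
ENCODING_TABLE = codecs.charmap_build(DECODING_TABLE)


def demo_encode(text, errors="strict"):
    # Returns (bytes, number of characters consumed).
    return codecs.charmap_encode(text, errors, ENCODING_TABLE)


def demo_decode(data, errors="strict"):
    # Returns (str, number of bytes consumed).
    return codecs.charmap_decode(data, errors, DECODING_TABLE)


def demo_search_function(name):
    """Hypothetical search function serving only the demo codec."""
    if name == "demo_charmap":
        return codecs.CodecInfo(demo_encode, demo_decode, name="demo_charmap")
    return None


codecs.register(demo_search_function)

encoded = "a\u20ac".encode("demo_charmap")
decoded = b"a\x80".decode("demo_charmap")
# HIRAGANA LETTER A is not in the table, so "replace" substitutes "?".
replaced = "\u3042".encode("demo_charmap", "replace")
print(encoded, decoded, replaced)
```

Because the charmap helpers take the `errors` argument directly, this approach gets the less strict conversion options without extra code, at the cost of shipping the tables.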