The Haskell Cafe


haskell-cafe.haskell.org

Encoding issues with LDAP package

Vincent Ambo, Tue, 22 May 2012 14:31:53 +0000 (UTC)
Hi,

I'm using the LDAP package by John Goerzen to retrieve some information from an Active Directory database, including the full names of my company's employees.

Many of these names contain characters outside the standard ASCII set, such as ä, å, ü and ê. When I retrieve those names from the directory (the LDAP package returns them as Strings), the encoding breaks and I get results like "R\195\188diger" instead of "Rüdiger".
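To make the symptom concrete: if the server sends UTF-8, "ü" (U+00FC) travels as the two bytes 195 and 188, and "R\195\188diger" is exactly what you get when each of those bytes becomes a separate Char instead of being decoded together. The sketch below reproduces that using only base, assuming UTF-8 on the wire; encodeCharUtf8 and mangle are hypothetical helpers written for illustration, not part of the LDAP package.

```haskell
import Data.Bits (shiftR, (.&.), (.|.))
import Data.Char (chr, ord)
import Data.Word (Word8)

-- Encode one Unicode code point as its UTF-8 byte sequence.
-- Hand-rolled for illustration only (no surrogate validation).
encodeCharUtf8 :: Char -> [Word8]
encodeCharUtf8 c
  | n < 0x80    = [fromIntegral n]
  | n < 0x800   = [0xC0 .|. lead 6, cont 0]
  | n < 0x10000 = [0xE0 .|. lead 12, cont 6, cont 0]
  | otherwise   = [0xF0 .|. lead 18, cont 12, cont 6, cont 0]
  where
    n = ord c
    -- leading byte payload: the bits above position k
    lead k = fromIntegral (n `shiftR` k)
    -- continuation byte: 10xxxxxx carrying 6 bits starting at position k
    cont k = 0x80 .|. fromIntegral ((n `shiftR` k) .&. 0x3F)

-- Reproduce the suspected bug: encode to UTF-8, then turn every
-- byte into its own Char instead of decoding the sequence.
mangle :: String -> String
mangle = map (chr . fromIntegral) . concatMap encodeCharUtf8
```

In GHCi, mangle "Rüdiger" shows as "R\195\188diger". A correctly decoded name would show as "R\252diger", because show escapes every non-ASCII Char as its decimal code point; the telltale sign is the pair \195\188 where a single \252 belongs.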

The Active Directory server supports LDAP v2 and v3. I assume that the OpenLDAP C library, which the LDAP package binds to, automatically uses v3 when it is available (this is speculation; correct me if I'm wrong).

Since LDAP v3 encodes its strings as UTF-8 (of which ASCII is a subset), I also assume that the server returns UTF-8.

Is this a known problem in the LDAP package? Or is this related to the OpenLDAP C API? Or even something on the server side?

Any information would be helpful!

Best regards,
Vincent
wren ng thornton, Wed, 23 May 2012 03:34:23 +0000 (UTC)
On 5/22/12 10:30 AM, Vincent Ambo wrote:
> Is this a known problem in the LDAP package? Or is this related to the OpenLDAP C API? Or even something on the server side?

I haven't used the LDAP package, though I did a good deal of LDAP
hackery back in the day. Without looking at any of the code involved, it
sounds like the LDAP server is handing off UTF-8 encoded C-style char[]
but that the Haskell code is interpreting it byte by byte (à la
Data.ByteString.Char8 or similar) rather than properly decoding it into
a list of Char (i.e., Unicode code points).
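A sketch of that hypothesis using only base (Foreign.C.String and GHC.Foreign): it puts the UTF-8 bytes of a name into a C buffer, roughly as a C library would hand them over, and reads them back two ways, one Char per byte (which reproduces the broken output) and decoded as UTF-8. Which marshalling function the LDAP package's binding really uses is not checked here; withUtf8Buffer, peekBytewise, peekUtf8 and redecodeUtf8 are hypothetical names.

```haskell
import qualified Foreign.C.String as C
import qualified GHC.Foreign as GHC
import System.IO (char8, utf8)

-- Simulate a C library: place the UTF-8 bytes of a String in a
-- temporary C buffer (pointer + length) and run an action on it.
withUtf8Buffer :: String -> (C.CStringLen -> IO a) -> IO a
withUtf8Buffer = GHC.withCStringLen utf8

-- Byte-per-Char marshalling: each byte becomes one Char (0..255).
-- This is the behaviour that yields "R\195\188diger".
peekBytewise :: C.CStringLen -> IO String
peekBytewise = C.peekCAStringLen

-- UTF-8-aware marshalling: each multi-byte sequence becomes one Char.
peekUtf8 :: C.CStringLen -> IO String
peekUtf8 = GHC.peekCStringLen utf8

-- Stop-gap for Strings that were already mangled: write each Char back
-- out as a single byte (char8 keeps the code point modulo 256) and
-- decode those bytes as UTF-8. Only apply it to known-mangled input.
redecodeUtf8 :: String -> IO String
redecodeUtf8 s = GHC.withCStringLen char8 s (GHC.peekCStringLen utf8)
```

Decoding at the FFI boundary (the peekUtf8 style) is the cleaner fix. redecodeUtf8 is only a workaround: it assumes every Char really is a UTF-8 byte, so it would corrupt a string that was already decoded correctly, and with the plain utf8 encoding an invalid byte sequence raises a decoding error.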

If you're familiar with the LDAP package and the FFI, it should be easy
to poke into the code and see if that's actually what's going on.

-- 
Live well,
~wren