Thursday, March 17, 2005

OAB sort orders and character handling

Borrowed from Neil Shipp's blog...

The statistic most often repeated to me was that in Exchange 2000 and earlier, 10% of Exchange users used the Offline Address Book. With Exchange 2003 and Office 2003, that number has gone way up because of Outlook's Cached Exchange Mode. Everyone who uses Outlook in cached mode now downloads the Offline Address Book (OAB), and they may get a slightly different experience than they were used to.

Before Cached Exchange Mode, clients got their address books directly from the Active Directory. As long as the global catalogs they were using had the client language installed (see KB;en-us;301314), the Active Directory would sort the GAL for browsing using the correct sorting locale and return it to the client rendered in the right code page. When the client actively searched for entries, the AD would again use the client's sort locale to compute the result. The OAB works differently: when the OAB server generates the OAB files, it reads the entries from the AD sorted in the OAB server's default sort order and builds the OAB search files using that same sort order. In cached mode, since the OAB is sorted according to the OAB server and not the client, it may not match the client's language locale. This means that users may not be able to find entries in the GAL as easily, because the entries are not sorted the way the user expects.

By default, Exchange Server creates a single default OAB with the Global Address List on the first server installed in the organization. This works well if your users all have the same language set as default on their client computers. For organizations that must support different client languages, it gets more complex. The OAB generation process in Exchange Server 2003 can build two different versions of the OAB: a Unicode version and an ANSI version. The Unicode version reads the Active Directory recipients in Unicode and keeps the data as Unicode when stored on the client. It's up to the client to render the characters in the right code page. For the ANSI files, however, the OAB server renders the Unicode AD characters using the default code page of the server. So if your OAB server is running on a US English Windows server and you have some Active Directory recipients with non-western characters in their properties, you're going to get question marks for those characters in the ANSI version of the OAB.
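To see why the ANSI files end up with question marks, here is a minimal Python sketch. It is not the OAB generator's code; it only assumes that a US English server's default code page behaves like Windows-1252 (`cp1252`), and the `render_ansi` helper and the sample names are made up for illustration.

```python
def render_ansi(text: str, code_page: str = "cp1252") -> str:
    """Round-trip Unicode text through a single-byte code page.

    Any character the code page cannot represent is replaced with '?',
    which is the effect described above for the ANSI OAB files.
    The cp1252 default is an assumption standing in for a US English server.
    """
    return text.encode(code_page, errors="replace").decode(code_page)


# Hypothetical display names: two Western European, one Cyrillic.
names = ["Smith, Ann", "Müller, Hans", "Иванов, Олег"]

for name in names:
    print(render_ansi(name))
# Smith, Ann
# Müller, Hans
# ??????, ????
```

The umlaut survives because it exists in the Western European code page, while every Cyrillic letter is lost. Only the punctuation and the space make it through.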

For display names that can't be rendered in ANSI there is a workaround. If the AD or the OAB finds that characters in the display name can't be rendered correctly in the current code page, it substitutes the "simple display name" attribute for the display name. If you have filled out this attribute, that text shows up instead of the original display name in the address book and on the property pages. If it hasn't been set, the alias is used. However, the records are not re-sorted: the simple display name shows up where the original display name would have sorted. Since the simple display name or alias may not have any relation to the display name, the address book may not appear to be in the correct order for those entries.
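The substitution and the "no re-sort" behavior can be sketched as follows. This is an illustration only, not Exchange code: the `Recipient` class, the helper names, the sample data, the `cp1252` code page, and the plain code-point sort are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Recipient:
    display_name: str
    alias: str
    simple_display_name: Optional[str] = None


def can_render(text: str, code_page: str) -> bool:
    """True if every character of text exists in the given code page."""
    try:
        text.encode(code_page)
        return True
    except UnicodeEncodeError:
        return False


def shown_name(r: Recipient, code_page: str = "cp1252") -> str:
    """Display name if renderable, else simple display name, else alias."""
    if can_render(r.display_name, code_page):
        return r.display_name
    return r.simple_display_name or r.alias


def build_listing(recipients: List[Recipient], code_page: str = "cp1252") -> List[str]:
    # Sort on the ORIGINAL display name (code-point order is a stand-in for
    # the server's sort order), then substitute. There is no second sort.
    ordered = sorted(recipients, key=lambda r: r.display_name)
    return [shown_name(r, code_page) for r in ordered]


recipients = [
    Recipient("Яковлев, Олег", "yoleg", simple_display_name="Yakovlev, Oleg"),
    Recipient("Zimmer, Al", "azimmer"),
    Recipient("Иванов, Олег", "oivanov"),  # no simple display name set
    Recipient("Adams, Bo", "badams"),
]

listing = build_listing(recipients)
print(listing)
# ['Adams, Bo', 'Zimmer, Al', 'oivanov', 'Yakovlev, Oleg']
```

The last two entries sit where their Cyrillic display names sorted, so a user browsing the list sees "oivanov" and "Yakovlev, Oleg" after "Zimmer, Al".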

This leads us into sort order problems. Alphabetical English is not the only way that words are sorted. Many languages use different sorting rules. Far Eastern languages are the furthest removed from English, but even Western European languages use slightly different sorting rules. Letters with diacritical marks may sort differently: 'A' with an umlaut may come after 'Z', double 'ss' may sort the same as a single 's', or 'ch' may be treated as a separate letter that comes between 'c' and 'd'.
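As a rough illustration of how two sets of rules order the same names differently, here is a small Python sketch. The two key functions are simplified stand-ins invented for this example (one folds umlauts onto the base letter, the other places them after 'z'); they are not real locale collation tables.

```python
# Rule 1 (assumed, simplified): umlauted vowels sort like the base vowel.
FOLD_TABLE = str.maketrans({"ä": "a", "ö": "o", "ü": "u", "ß": "ss"})


def fold_key(word: str) -> str:
    return word.lower().translate(FOLD_TABLE)


# Rule 2 (assumed, simplified): å, ä, ö are separate letters after 'z'.
AFTER_Z_ALPHABET = "abcdefghijklmnopqrstuvwxyzåäö"


def after_z_key(word: str) -> list:
    # Characters outside the alphabet sort last.
    return [
        AFTER_Z_ALPHABET.index(ch) if ch in AFTER_Z_ALPHABET else len(AFTER_Z_ALPHABET)
        for ch in word.lower()
    ]


names = ["Zorn", "Ärlig", "Adams", "Berg"]

print(sorted(names, key=fold_key))
# ['Adams', 'Ärlig', 'Berg', 'Zorn']
print(sorted(names, key=after_z_key))
# ['Adams', 'Berg', 'Zorn', 'Ärlig']
```

Both results are "correctly sorted"; they just follow different rules, which is exactly the situation an OAB server and a client in another locale can end up in.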

When you have users who are geographically diverse, they may require that the OAB files they use be sorted differently. Otherwise, Ambiguous Name Resolution may not be able to find entries, and type-down in the Address Book pane may not work. This is because the user's client is using one set of rules to search a list of words sorted using an entirely different set of rules. The user could enter 'A' with an umlaut and be taken to the end of the address book when the entries they wanted are at the beginning. Or a user could enter a display name in Outlook, try to resolve it, and get no matches because the OAB was sorted with a different sort order. To improve this situation, in Exchange Server 2003 the OAB server stamps the OAB with the sort order that was used to create the files, so the client knows which search rules to use when searching the OAB. However, this only works for Outlook 2003. Older clients may still use different search rules.
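The failure mode is easy to reproduce with a binary search. The sketch below is not how Outlook searches the OAB; it only assumes that type-down behaves like a binary search over a pre-sorted list. The key functions and names are invented for the example: the list is sorted with a rule that folds 'ä' onto 'a', and it is then searched once with that same rule and once with a rule that puts 'ä' after 'z'.

```python
import bisect
from typing import Callable, List, Optional

FOLD_TABLE = str.maketrans({"ä": "a", "ö": "o", "ü": "u"})
AFTER_Z_ALPHABET = "abcdefghijklmnopqrstuvwxyzåäö"


def fold_key(word: str) -> str:
    """Assumed server rule: umlauted vowels sort like the base vowel."""
    return word.lower().translate(FOLD_TABLE)


def after_z_key(word: str) -> list:
    """Assumed client rule: å, ä, ö are separate letters after 'z'."""
    return [
        AFTER_Z_ALPHABET.index(ch) if ch in AFTER_Z_ALPHABET else len(AFTER_Z_ALPHABET)
        for ch in word.lower()
    ]


def type_down(sorted_names: List[str], typed: str, key: Callable) -> Optional[str]:
    """Binary-search for the first entry at or after what the user typed."""
    keys = [key(n) for n in sorted_names]
    i = bisect.bisect_left(keys, key(typed))
    return sorted_names[i] if i < len(sorted_names) else None


# The "OAB" as generated on the server, sorted with the server's rule.
oab = sorted(["Zorn", "Ärlig", "Adams", "Berg"], key=fold_key)

print(type_down(oab, "Är", fold_key))     # Ärlig (client uses the stamped rule)
print(type_down(oab, "Är", after_z_key))  # None  (client lands past the end)
```

Searching with the same rule that sorted the list is the idea behind stamping the sort order on the OAB: the client no longer has to guess which rule applies.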

To completely solve this issue you have one option: install a separate Exchange OAB server with its own OAB for each client locale you want to support. Each server would have its system locale set to the locale of the client language you want to support. Once you've done this, you need to change the OAB settings for each user who requires a different sort locale. If you can move all the users who require a specific language to one Mailbox Store, you can change the default OAB setting for that Mailbox Store to the OAB that is sorted in the correct locale. If you can't put all the users who require the same sort order on the same Mailbox Store, you can set the OAB on the user object itself. See KB;en-us;275203.
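The lookup order implied here can be sketched in a few lines of Python. The precedence (user setting first, then Mailbox Store default, then the organization's default OAB) is my reading of the paragraph above, not something taken from the KB article, and the function name and OAB names are hypothetical.

```python
from typing import Optional


def effective_oab(
    user_oab: Optional[str],
    store_oab: Optional[str],
    default_oab: str = "Default OAB",
) -> str:
    """Pick the OAB a user downloads (assumed precedence, see lead-in)."""
    return user_oab or store_oab or default_oab


# Hypothetical OAB names, one per sort locale.
print(effective_oab(None, None))                     # Default OAB
print(effective_oab(None, "OAB-Swedish"))            # OAB-Swedish
print(effective_oab("OAB-Japanese", "OAB-Swedish"))  # OAB-Japanese
```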

Since the OAB is now used by a larger part of the customer base, Microsoft is putting more resources into documenting it. Microsoft has a good white paper on Offline Address Book best practices that covers OAB issues, including sort orders. It can be found here:
