This is a fantastic article posted by Mike Lee on EHLO, the Microsoft Exchange Team blog. It describes exactly the issues with kernel memory and token bloat. This is a MUST READ!
This is the third CXP flash about Windows 2003 kernel memory issues and Exchange 2003.
The first flash provided technical background about the demands Exchange makes on kernel resources.
The second flash discussed hardware configurations that can restrict the kernel memory available for applications.
This flash explains the effect of large user security tokens on Exchange's kernel memory usage.
Each client connection to a Windows server uses some paged pool kernel memory. The amount of kernel memory used per connection can vary widely. The size of the client's security token is the most important factor here.
Security tokens increase in size in proportion to the number of security groups to which a user belongs. This increase is generally linear, but there are sharp jumps in memory usage at certain thresholds.
A growing number of Microsoft and third-party clients and services can connect to an Exchange server to provide expanded client, search and mobile functionality. Each of these clients may make multiple connections to the server, and each connection has token information associated with it. By reducing average token sizes, you can greatly increase the number of simultaneous connections that the server can manage.
CALL TO ACTION
- Exchange administrators should monitor the amount of paged pool memory in use on Exchange servers during peak connection times, and reconfigure servers that are close to running out of pool memory.
- Exchange administrators should calculate the effects on pool memory of adding new mailboxes, clients or services to an Exchange server.
- Exchange administrators should coordinate with Active Directory administrators to manage the number of security groups to which Exchange users are assigned. This is especially critical if the number of security group memberships is near a threshold where token size could suddenly jump.
Environmental changes, such as deploying a new client or increasing the number of security group memberships, can abruptly push kernel memory consumption to a critical level and cause the Exchange server to become slow or unstable. This can happen even though nothing has changed on the Exchange server itself and no mailboxes have been added to it.
FREQUENTLY ASKED QUESTIONS (FAQ)
What is a user token?
A security token is the bundle of information that identifies the user and the security groups to which the user belongs. Each time the user tries to connect to a secured resource, the token must be presented to the resource so that it can determine whether access should be granted or denied. For detailed information about security tokens, please refer to the Access Tokens Technical Reference in the Windows Server 2003 Technical Reference at:
How big is a user token?
It depends. The most important factor is the number of security groups to which a user account belongs. Making a user a member of an additional security group can add up to 68 bytes to the token.
Various factors affect total token size and how many bytes are added per group. There is no simple mathematical formula for calculating token size based on the number of group memberships.
Nonetheless, the token for a user who belongs to 60 security groups is very likely to be less than 4K in size. The token for a user who belongs to 80 security groups is likely to be slightly more than 4K in size.
With regard to kernel memory consumption, there is a critical difference between a token that is "slightly less" than 4K and one that is "slightly more" than 4K.
As soon as the size of a token goes above 4K, even by a few bytes, it will take 8K of kernel memory to hold the token. If the token grows to slightly larger than 8K, its memory requirement will suddenly jump to 12K.
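The step behavior described above is a round-up to the next 4K boundary. This is a minimal sketch, assuming that kernel memory for a token is consumed in whole 4K (4,096-byte) units exactly as the text describes; `kernel_bytes_for_token` is a hypothetical helper, and real pool allocator headers and overhead are not modeled.

```python
PAGE = 4096  # assumption: token memory is consumed in whole 4K units, per the text


def kernel_bytes_for_token(token_bytes: int) -> int:
    """Round a token size up to the next multiple of 4K (hypothetical helper)."""
    if token_bytes <= 0:
        raise ValueError("token size must be positive")
    # Ceiling division: a single byte past a 4K boundary costs another full 4K.
    return -(-token_bytes // PAGE) * PAGE


for size in (4000, 4096, 4100, 8192, 8200):
    print(f"{size:>5}-byte token -> {kernel_bytes_for_token(size) // 1024}K of kernel memory")
```

A 4,100-byte token costs twice as much as a 4,096-byte one, which is why a few extra group memberships near a boundary matter far more than the 68-byte increments suggest.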
This increase may not seem significant until you multiply it by thousands of clients, each of which may make several connections to the server.
How much memory is taken up by user tokens on a typical server?
Consider an Exchange server where 1000 clients are logged on using Outlook 2003 in cached mode. Cached mode is a new Outlook feature that greatly improves the end user experience across slow or unreliable connections. Cached mode clients typically will not notice server disconnections lasting several minutes while online mode clients would experience errors in the same circumstance.
The typical Outlook 2003 cached mode client holds open 3 to 5 connections to the Exchange server. Older versions of Outlook (and Outlook 2003 in online mode) will make 1 or 2 connections to the server. Multiple connections allow Outlook to perform multiple server tasks in parallel. As a general rule, more connections per client means a better user experience. But, more connections per client also means more kernel memory consumption.
Each client connection to the Exchange store operates independently and needs a copy of the user token. There is also some RPC overhead for managing these connections, and busy clients may have more than one RPC connection to the server. Each RPC connection to the server also needs a copy of the user's token. Thus, each cached mode client has multiple copies of its token in memory on the server.
For the purposes of this example, assume that each client has a total of 5 connections and a token that is just under 4K in size. Each client then needs 20K of kernel memory.
This means that a thousand cached mode users will require about 20 megabytes of paged pool memory to support their connections. On a correctly tuned Exchange server with the /3GB switch set, there is a maximum of about 250 megabytes of paged pool memory available to all applications on the server.
NOTE: Please refer back to the first flash in this series for more information about the trade-offs involved in setting the /3GB switch. You can follow this link to read that flash on the Microsoft Exchange Team blog:
If additional security groups push the token size above 4K, the amount of kernel memory used for tokens will suddenly be 40 megabytes instead of 20 megabytes. If each user also starts using Microsoft Communicator and MSN Desktop Search, these clients can make additional connections to the Exchange server. If each of them adds a single connection per user, connections per user rise from 5 to 7, which increases paged pool demand by another 40%.
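The arithmetic in this example can be reproduced with a small calculation. This is a sketch under the text's own assumptions (1,000 cached mode clients, one token copy per connection, token memory rounded to whole 4K units, roughly 250 MB of paged pool on a /3GB server); `token_pool_mb` is a hypothetical helper, and it ignores RPC overhead and the other pool consumers a real server also has.

```python
BUDGET_MB = 250  # approximate paged pool on a /3GB-tuned server, from the text


def token_pool_mb(clients: int, connections_per_client: int, token_kb: int) -> float:
    """Paged pool (MB) used by token copies, assuming one copy per connection."""
    # token_kb is the 4K-rounded kernel memory per copy (4, 8, 12, ...).
    return clients * connections_per_client * token_kb / 1024


scenarios = [
    ("baseline: 5 connections, token just under 4K", 5, 4),
    ("extra groups push the token just over 4K", 5, 8),
    ("one more connection each from Communicator and Desktop Search", 7, 4),
]
for label, connections, token_kb in scenarios:
    mb = token_pool_mb(1000, connections, token_kb)
    print(f"{label}: {mb:.1f} MB, {mb / BUDGET_MB:.0%} of ~{BUDGET_MB} MB")
```

The first two scenarios print 19.5 MB and 39.1 MB, which are the article's "about 20" and "about 40" megabytes expressed in 1,024-based units. The third is 40% above the baseline because two connections are added to five. Trying combined cases, such as 7 connections with an 8K token, shows how quickly the effects compound.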
There may also be clients on your network who have Outlook add-ons that make additional connections, clients who connect to the server from multiple computers at once, and delegates who open multiple calendars or mailboxes simultaneously. Each of these puts additional pressure on paged pool memory by making additional connections.
What happens if I run out of paged pool memory?
The server will become slow or refuse additional requests and connections. Applications may fail suddenly. In extreme cases, the server can even "blue screen" and stop entirely.
If the paged pool shortage is transient, the server will likely recover. Applications can tolerate temporary memory shortages to some degree, but no application can run forever if critical resource requests are not satisfied. If the shortage persists, it is likely to trigger cascading bottlenecks. In such a case, the server will probably have to be rebooted to make it functional again.
How much paged pool memory should be free in order for me to feel safe?
Under peak load, there should be approximately 50 megabytes of available paged pool. If you have less than 30 megabytes free, you should take immediate steps to reduce load on the server.
Paged pool is allocated statically at Windows boot time. The pool cannot be increased without reconfiguring and rebooting the server. The amount of paged pool memory available depends on a number of factors, including boot switches (such as /USERVA and /3GB), registry settings and physical RAM.
You can use a kernel debugger to view the size of initial paged pool and other kernel memory allocations. Setting up a traditional kernel debugging session can be a daunting task, typically requiring an extra computer, specialized cabling and a server reboot. Alternatively, the LiveKD utility from Sysinternals can be used to start a kernel debugging session from the server console. LiveKD does not require you to reboot the server. For more information, please see this article in the Microsoft Knowledge Base:
The Performance tool does not accurately show the available Free System Page Table entries in Windows Server 2003
IMPORTANT NOTE: Commands that can be used during a kernel debugging session can cause the system to become unstable or to stop. Microsoft recommends that you stop all Exchange services before initiating a kernel debugging session, and that you reboot the server after the session.
Without running a kernel debugger, you can still estimate how much paged pool is available on your server. A typical Exchange mailbox server with 1 gigabyte or more of RAM should have an initial paged pool allocation of slightly under 250 megabytes. (This assumes that the server has been configured with the /3GB switch and other recommended optimizations. Without the /3GB switch, initial paged pool on the same server would be about 350 megabytes.)
It is easy to check how much paged pool is currently in use. Windows Task Manager displays paged pool usage on the Performance page under Kernel Memory\Paged. You can also monitor paged pool usage over time with Windows System Monitor through the Memory\Pool Paged Bytes counter.
As a general rule then, paged pool usage in excess of 200 megabytes is cause for concern, and paged pool usage in excess of 220 megabytes requires immediate attention. If you are within these limits, and the server is still running out of paged pool, then the problem is likely that the initial paged pool allocation is insufficient. You can use a kernel debugger to verify whether this is the problem.
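The free-memory guidance and the usage limits above describe the same thresholds when initial paged pool is about 250 megabytes (250 - 200 = 50 and 250 - 220 = 30). A minimal sketch, assuming that 250 MB default and a raw Pool Paged Bytes counter value as input; `paged_pool_status` is a hypothetical name, and the counter value itself would come from System Monitor or Task Manager as described above.

```python
MB = 1024 * 1024


def paged_pool_status(pool_paged_bytes: int, initial_mb: float = 250.0) -> str:
    """Classify a Pool Paged Bytes reading as "ok", "warning" or "critical".

    Rules of thumb from the text: keep about 50 MB free at peak load and act
    immediately below 30 MB free. initial_mb defaults to the approximate /3GB
    allocation; pass a measured value if you have verified it.
    """
    free_mb = initial_mb - pool_paged_bytes / MB
    if free_mb < 30:
        return "critical"  # with 250 MB initial: more than 220 MB in use
    if free_mb < 50:
        return "warning"   # with 250 MB initial: more than 200 MB in use
    return "ok"


print(paged_pool_status(215 * MB))  # about 35 MB free
```

A reading of 215 MB leaves about 35 MB free and is reported as "warning". Because the checks are expressed as free memory, the same function applies if you pass a different initial allocation, such as the roughly 350 MB the text gives for a server without /3GB.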
How can I find out how much of a server's pool usage is for user tokens?
You can find out how much kernel memory is being used for tokens at any given moment by using the Poolmon or Memsnap utilities. These utilities can be run without interrupting the server. They are available with the Support Tools for Windows 2000 and 2003.
Both utilities rely on the fact that each allocation of pool memory has a tag associated with it. The tag for tokens is called TOKE. Therefore, you can look through the output of either utility and find the TOKE line to see how much paged pool memory is in use at the moment for tokens.
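Once a snapshot has been saved to a text file, finding the token line can be scripted. The sketch below assumes a simplified, hypothetical layout: a header row naming `Tag`, `Type` and `Bytes` columns, followed by whitespace-separated rows. Real Poolmon and Memsnap output has more columns and may differ between versions, so the column lookup would need adjusting. The tag comparison is case-insensitive because a tool may print the tag as `Toke` rather than `TOKE`; `toke_paged_bytes` and the sample values are illustrative only.

```python
def toke_paged_bytes(snapshot: str) -> int:
    """Sum paged pool bytes for the TOKE tag in a simplified snapshot.

    Assumed (hypothetical) layout: a header row containing Tag, Type and Bytes
    columns, then one whitespace-separated row per tag and pool type.
    """
    rows = [line.split() for line in snapshot.splitlines() if line.strip()]
    header = [col.lower() for col in rows[0]]
    tag_i, type_i, bytes_i = (header.index(c) for c in ("tag", "type", "bytes"))
    total = 0
    for row in rows[1:]:
        if len(row) <= max(tag_i, type_i, bytes_i):
            continue  # skip rows that do not have every expected column
        if row[tag_i].upper() == "TOKE" and row[type_i].lower() == "paged":
            total += int(row[bytes_i])
    return total


# Illustrative values only, not measurements from a real server.
sample = """
Tag   Type   Allocs  Frees  Bytes
Toke  Paged  52000   48000  81920000
MmSt  Paged  9000    8000   12000000
"""
print(f"{toke_paged_bytes(sample) / (1024 * 1024):.1f} MB of paged pool in use for tokens")
```

Taking snapshots at peak times and comparing the TOKE total against overall paged pool usage shows how much of the pressure comes from tokens rather than other kernel consumers.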
NOTE: For Windows 2000, display of the pool tags is not enabled by default. You must enable tag display with the Gflags utility and then reboot the server before you can use Poolmon or Memsnap.
More information about monitoring and tuning kernel memory for Exchange is available here:
- How to Use Memory Pool Monitor (Poolmon.exe) to Troubleshoot Kernel Memory Leaks
- The "Ruling Out Memory-Bound Problems" section of the Troubleshooting Exchange Server 2003 Performance white paper
- How to optimize memory usage in Exchange Server 2003
What can I do to reduce the size of user tokens?
There are three strategies you can follow:
- Reduce the number of security groups to which each user belongs.
Nesting security groups will not help here, because the SID of each nested group is also stored in the user token. In fact, if there seems to be a discrepancy between the number of groups to which users belong and the size of their tokens, nested security groups may be the explanation. You can simplify group administration by following the best practice of not nesting groups beyond two levels.
If you are migrating user accounts from Windows NT 4 domains or between Active Directory domains, users may have sIDHistory entries for previous domains. If users belong to a large number of groups in those domains, this can greatly increase token size. Completing the migration and decommissioning the previous domains may therefore reduce token size.
- Host Exchange servers in a different domain than the users who connect to them.
This can reduce the size of user tokens because a token generated on a server in a different domain does not include domain local groups from the user account's domain. Those groups are therefore stripped from the token presented to the Exchange server.
- Where possible, convert security groups to distribution groups.
Token size is increased by membership in security groups, not distribution groups. Users can belong to thousands of distribution groups with no effect on token size. If a group is not actually being used to deny or grant access to resources, it should be a distribution group, not a security group.
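All three strategies act on the set of group entries that ends up in a token. The sketch below models that set under simplifying assumptions that are not from the article: group names stand in for SIDs, only security groups are expanded (nested memberships are followed transitively), domain local groups are kept only when they belong to the server's domain, and sIDHistory entries on the user or its groups are added. The `Group` class, `token_group_sids` function, and the domain and group names are hypothetical; Windows applies more rules than this when it builds a real token, so treat the counts as relative, not exact.

```python
from dataclasses import dataclass, field


@dataclass
class Group:
    name: str                # stands in for the group's SID in this sketch
    domain: str
    scope: str               # "domain_local", "global" or "universal"
    security: bool           # False for a distribution group
    member_of: list[str] = field(default_factory=list)    # groups this group is nested in
    sid_history: list[str] = field(default_factory=list)  # SIDs carried over from a migration


def token_group_sids(direct: list[str], groups: dict[str, Group], server_domain: str,
                     user_sid_history: tuple[str, ...] = ()) -> set[str]:
    """Approximate the group entries a server in server_domain would put in a token."""
    result = set(user_sid_history)
    seen: set[str] = set()
    stack = list(direct)
    while stack:
        name = stack.pop()
        if name in seen:
            continue
        seen.add(name)
        group = groups[name]
        if not group.security:
            continue  # distribution groups add nothing and are not expanded
        if group.scope == "domain_local" and group.domain != server_domain:
            continue  # another domain's domain local groups are left out
        result.add(group.name)
        result.update(group.sid_history)
        stack.extend(group.member_of)  # follow nesting transitively
    return result


groups = {
    "Sales": Group("Sales", "CORP", "global", True,
                   member_of=["AllStaff", "ShareRW"], sid_history=["NT4-Sales"]),
    "AllStaff": Group("AllStaff", "CORP", "universal", True),
    "ShareRW": Group("ShareRW", "CORP", "domain_local", True),
    "Newsletter": Group("Newsletter", "CORP", "universal", False),
}
direct = ["Sales", "Newsletter"]
print(sorted(token_group_sids(direct, groups, "CORP")))
print(sorted(token_group_sids(direct, groups, "EXCH")))
```

Although the user is a direct member of only two groups, a server in the user's own CORP domain sees four entries: nesting adds two, sIDHistory adds one, and the distribution group adds none. A server in a separate EXCH domain sees three, because the CORP domain local group is dropped.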
What else can I do to reduce the total TOKE size on my server?
Once you have reduced the typical token size to the practical minimum, the next step is to manage the number of simultaneous connections made to the server. As mentioned above, each client may make multiple connections to the server, and different clients make different numbers of connections based on a wide variety of factors. You may not even have a full list of all the clients that connect to your server.
Users may install Outlook add-ons that make additional connections. Developers may run applications that make large numbers of connections or that don't shut down connections when they are done. Therefore, the first thing to do is to analyze what kind of clients connect to your server.
Exchange System Manager (ESM) will help you do this analysis. Each database displayed in ESM has a Logons page that lists the number of connections per logon, along with a great deal of other useful information.
You can get even more detailed information, including the name of each client process, by running the Exchange Server User Monitor (EXMon), which can be downloaded here:
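Once the logon information has been saved to a file, totaling connections by client application and by user makes heavy consumers stand out. This is a sketch assuming a CSV export with one row per logon and hypothetical column names `User`, `Client` and `Connections`; the actual columns available from ESM or EXMon may be named differently, and the sample rows, including the `ArchiveAgent` application, are made up for illustration.

```python
import csv
import io
from collections import Counter


def summarize_connections(csv_text: str) -> tuple[Counter, Counter]:
    """Total connections per client application and per user (hypothetical columns)."""
    by_client, by_user = Counter(), Counter()
    for row in csv.DictReader(io.StringIO(csv_text)):
        count = int(row["Connections"])
        by_client[row["Client"]] += count
        by_user[row["User"]] += count
    return by_client, by_user


# Illustrative rows only.
export = """User,Client,Connections
alice,OUTLOOK.EXE,5
alice,Communicator,1
bob,OUTLOOK.EXE,2
svc-archive,ArchiveAgent,40
"""
by_client, by_user = summarize_connections(export)
print(by_client.most_common())
print(by_user.most_common(2))
```

In this made-up data a single service account holds more connections than all the interactive users combined, which is the kind of finding that leads to the actions below.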
After you have inventoried the connections being made, you are ready to consider the following actions to reduce the number of connections:
- Restrict unauthorized clients and applications.
- Remove the Public Information Store from the server and direct clients to public folders on a different server. This eliminates public folder connections made by clients.
- Remove specific public folders that account for large numbers of client connections. Good candidates here are the Schedule+ Free/Busy folder and the Offline Address Book. Clients must make additional connections to these folders when scheduling appointments or downloading the address book.
- Add replicas of heavily accessed public folders to distribute the number of clients who connect to them between multiple servers.
- Install dedicated public folder servers to eliminate all public folder connections from mailbox servers.
- Distribute heavy-connection users evenly across multiple servers. Heavy-connection users are likely to include users with multiple computers or devices, as well as field and mobile users.
- Distribute users with large security tokens across multiple servers.
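The last two actions amount to a load-balancing problem. As a sketch, and as a swapped-in technique rather than anything the article prescribes, the longest-processing-time-first greedy heuristic (sort by cost, always place on the least-loaded server) can spread users by estimated token memory, taken here as connections multiplied by the 4K-rounded token size. `assign_users`, the user names, server names and numbers are hypothetical; connection counts could come from an inventory and token sizes from group-count estimates.

```python
import heapq


def assign_users(users: dict[str, tuple[int, int]], servers: list[str]) -> dict[str, list[str]]:
    """Greedy balancing of estimated token memory across servers.

    users maps a user name to (connections, token_kb), where token_kb is the
    4K-rounded kernel memory per token copy; cost = connections * token_kb.
    """
    heap = [(0, server) for server in servers]  # (assigned cost in KB, server)
    heapq.heapify(heap)
    placement: dict[str, list[str]] = {server: [] for server in servers}
    # Place the most expensive users first; each goes to the least-loaded server.
    ranked = sorted(users.items(), key=lambda item: item[1][0] * item[1][1], reverse=True)
    for name, (connections, token_kb) in ranked:
        load, server = heapq.heappop(heap)
        placement[server].append(name)
        heapq.heappush(heap, (load + connections * token_kb, server))
    return placement


users = {  # illustrative values: (connections, token size in KB)
    "field1": (8, 8), "field2": (7, 8), "exec": (6, 12),
    "clerk1": (3, 4), "clerk2": (3, 4), "clerk3": (2, 4),
}
print(assign_users(users, ["EX1", "EX2"]))
```

Here EX1 receives the executive and the three clerks (104K of estimated token memory) and EX2 the two field users (120K), rather than all three heavy users landing on one server. The greedy approach is simple and usually close to balanced, but it is not guaranteed to be optimal.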
What is Microsoft doing to help with this problem?
Microsoft has made several tools and scripts available to help administrators manage and monitor kernel memory usage. A new sample script will soon be available that can scan Active Directory user objects and report how many groups each user account belongs to. When published, this script will be available in the following Microsoft Knowledge Base article. We will also publish the script on this blog site within the next few days.
By knowing the number of groups each user belongs to, you can estimate how large user tokens are likely to be, and judge how close you are to a line where token size will suddenly jump by another 4K.
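Given per-user group counts, a rough screen can flag users near the first jump. This is a heuristic sketch, not a token size formula: the 60 and 80 cutoffs come from the article's guidance that a user in 60 security groups very likely has a token under 4K, while one in 80 groups likely has a token slightly over 4K. Since the article stresses that there is no simple formula, treat the middle band as "check these users", and remember that nested groups and sIDHistory can make real tokens larger than a direct-membership count suggests. `token_risk` and the sample counts are hypothetical.

```python
def token_risk(group_count: int) -> str:
    """Rough band based on the article's 60- and 80-group data points."""
    if group_count <= 60:
        return "likely under 4K"
    if group_count < 80:
        return "near the 4K threshold"
    return "likely over 4K"


counts = {"alice": 42, "bob": 71, "carol": 95}  # illustrative output of a group-count scan
for user, n in sorted(counts.items()):
    print(f"{user}: {n} groups, {token_risk(n)}")
```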
The configuration file for the Exchange Best Practices Analyzer (ExBPA) has been updated recently and now detects many problematic hardware configurations. You can download ExBPA here.
ExBPA is updated frequently to detect new issues and environments and to make recommendations to Exchange administrators for dealing with them. Each time you run ExBPA, it can check for new configurations and versions, and automatically update itself. Running ExBPA on a regular schedule gives you immediate and automatic access to the latest best practices for Exchange configuration.
The ultimate solution for these issues is to move to a 64-bit platform. The situation is similar to the one that existed when the 16-bit DOS operating system was nearing the end of its life, when even marginal improvements in memory management became important. While careful memory optimization on the 32-bit platform can improve scalability significantly, it cannot entirely overcome the fundamental problem of the 32-bit limit.
The next version of Exchange will be 64-bit. Until that version is available, Microsoft will continue to provide guidance and optimizations for maximizing the scalability of Exchange running in a 32-bit memory space.