Windows Memory Management Reading Notes

Windows Memory Management

  • Abstracts the concept of memory to the applications
  • All applications use Virtual memory
  • Only a few core kernel processes access physical memory directly
  • Virtual memory is not backed by physical storage until it is “committed”
  • Each process has a Virtual Address Space (VAS)
  • Size of VAS varies by architecture:
    • 32-bit Native: 4GB (2GB user/2GB kernel)
    • 64-bit Compatibility (WoW): 4GB (kernel runs outside WoW)
    • 64-bit Native: 16TB (8TB user/8TB kernel)
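
As a quick sanity check on the sizes listed above, the following sketch works out how many address bits each layout implies. The names `VAS_LAYOUT` and `address_bits` are made up for this illustration, and the sizes are taken straight from the bullets rather than queried from Windows.

```python
# Sizes in bytes, using binary units (1 GB = 2**30, 1 TB = 2**40).
GB = 2**30
TB = 2**40

# (total VAS, user-mode share, kernel-mode share) as listed in the notes.
VAS_LAYOUT = {
    "32-bit native": (4 * GB, 2 * GB, 2 * GB),
    "64-bit native": (16 * TB, 8 * TB, 8 * TB),
}


def address_bits(size_in_bytes: int) -> int:
    """Return how many address bits are needed to span size_in_bytes."""
    return (size_in_bytes - 1).bit_length()


for name, (total, user, kernel) in VAS_LAYOUT.items():
    # The user and kernel shares should add up to the whole address space.
    assert user + kernel == total
    print(f"{name}: {address_bits(total)} address bits")
```

The 4GB layout needs 32 bits and the 16TB layout needs 44 bits, which is why the 64-bit figure is far below the theoretical 2^64 limit.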

Windows Memory Access

Windows uses page tables to store the mapping of virtual addresses to physical addresses. Each mapping occupies a page table entry (PTE). Even if memory is available, no further allocations can be made once there is no room left for new PTEs. With modern systems there is likely no need to monitor PTEs, but on 32-bit systems with /3GB and /PAE enabled, running out of PTEs was more common.
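
To make the idea concrete, here is a toy model of a page table, assuming 4 KB pages and a single flat table. Real Windows page tables are multi-level structures managed by the kernel; the class `ToyPageTable` and its methods are hypothetical and only show why running out of entries blocks new mappings even when memory itself is free.

```python
PAGE_SIZE = 4096  # 4 KB pages (assumption for this sketch)


class ToyPageTable:
    """Toy single-level page table with a fixed number of entries."""

    def __init__(self, max_entries: int):
        self.max_entries = max_entries
        self.entries = {}  # virtual page number -> physical frame number

    def map_page(self, virtual_page: int, physical_frame: int) -> None:
        """Add a mapping, failing if no free entries remain."""
        is_new = virtual_page not in self.entries
        if is_new and len(self.entries) >= self.max_entries:
            raise MemoryError("no free page table entries")
        self.entries[virtual_page] = physical_frame

    def translate(self, virtual_address: int) -> int:
        """Translate a virtual address to a physical address."""
        page, offset = divmod(virtual_address, PAGE_SIZE)
        frame = self.entries[page]  # a KeyError stands in for a page fault
        return frame * PAGE_SIZE + offset


table = ToyPageTable(max_entries=2)
table.map_page(0x10, 0x2A)
print(hex(table.translate(0x10123)))  # 0x2a123
```

Mapping a third distinct page into this two-entry table raises `MemoryError`, regardless of how many physical frames are unused.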

Paging File

  • Each process gets the same amount of VAS regardless of the amount of physical RAM installed
  • Total VAS across all processes may exceed total physical RAM
  • Windows moves pages from RAM to the page file when physical RAM is exceeded (called trimming)
  • Systems with large amounts of RAM may not use the page file at all
  • No need for a large page file in most systems

Since all processes running on a server get the same size VAS, and the size of the VAS is larger than the maximum amount of physical RAM supported in Windows, it is possible to allocate more virtual memory than the physical memory installed in the server. To accommodate this memory, Windows uses a paging file (sometimes loosely referred to as virtual memory) to store data that does not fit in physical RAM.

Since accessing the page file is much slower than accessing RAM, Windows will move data that hasn’t been accessed recently out to the page file when needed. This is called trimming.
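
The following sketch illustrates the general idea of moving the least recently accessed pages out to a page file when RAM is full. It uses a strict least-recently-used policy, which is a simplification and not a description of the actual Windows memory manager; `ToyMemory` and its fields are hypothetical names.

```python
from collections import OrderedDict


class ToyMemory:
    """Toy RAM with a fixed number of frames and an unbounded page file."""

    def __init__(self, ram_frames: int):
        self.ram_frames = ram_frames
        self.ram = OrderedDict()  # resident pages, least recently used first
        self.page_file = set()    # pages that were trimmed out of RAM

    def access(self, page: str) -> str:
        """Touch a page and return 'hit', 'new', or 'paged in'."""
        if page in self.ram:
            self.ram.move_to_end(page)  # mark as most recently used
            return "hit"
        result = "paged in" if page in self.page_file else "new"
        self.page_file.discard(page)
        if len(self.ram) >= self.ram_frames:
            # RAM is full: trim the least recently used page to the page file.
            victim, _ = self.ram.popitem(last=False)
            self.page_file.add(victim)
        self.ram[page] = None
        return result


memory = ToyMemory(ram_frames=2)
for page in ["A", "B", "A", "C"]:
    print(page, memory.access(page))
print(sorted(memory.page_file))  # ['B']
```

Page B ends up in the page file because A was touched more recently; accessing B again would report "paged in" and push another page out.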

On modern systems with large amounts of RAM, there is typically plenty of physical memory to support the needs of the applications on the server. Often a large page file is unnecessary and just wastes disk space. Certainly the old recommendation of 1 ½ or 2 times the size of RAM is outdated.

Memory considerations for 32-bit systems

  • 32-bit Architecture can only address 4GB of memory (virtual or physical)
  • Physical RAM can be extended beyond 4GB by using /PAE
  • User-mode VAS can be extended using /3GB
    • Converts 1GB of kernel-mode VAS to user-mode, leaving only 1GB for kernel-mode
    • Consider /userva if kernel requires more memory
  • Some applications such as SQL Server 2008 support Address Windowing Extensions (AWE) to go beyond standard VAS
    • AWE is no longer supported in SQL Server 2012

By default, the 32-bit Windows memory manager uses 32-bit addresses to map memory. This amounts to 2^32 bytes of memory, which is 4,294,967,296 bytes, or 4 GB. The /PAE switch enables Physical Address Extension, which lets Windows use wider physical addresses and, consequently, address more physical memory; virtual addresses remain 32 bits wide. You set the /PAE switch in the C:\boot.ini file, and you should turn it on whenever the server has more than 4 GB of physical memory installed. If this switch is not enabled, the 4 GB limit applies even though more memory is installed.

The 32-bit Windows operating system also uses a virtual memory system based on a 32-bit address space. Even though the 32-bit Windows systems have access to 4 GB of virtual memory per process, this virtual memory is partitioned between user mode and kernel mode. In many cases, Windows does not require a full 2 GB worth of virtual memory for kernel mode. As of Microsoft Windows NT 4.0 Enterprise Edition, Service Pack 3 (SP3), the Windows kernel changed to support a 3 GB user-mode address space and a 1 GB kernel-mode address space. You can enable this feature by using the /3GB switch in the boot.ini file. Programs will not use this extra 1 GB of user-mode address space unless they are explicitly set to do so. SQL Server will take advantage of this extra memory.
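
The effect of the split can be sketched as a simple address check. This is an illustration only: it treats the boundary as exactly 2 GB or 3 GB and ignores the small reserved regions that real Windows keeps around the boundary. The function names are hypothetical.

```python
GB = 2**30


def user_mode_limit(three_gb_enabled: bool) -> int:
    """Return the first address that belongs to kernel mode."""
    return 3 * GB if three_gb_enabled else 2 * GB


def is_user_mode_address(address: int, three_gb_enabled: bool = False) -> bool:
    """Classify a 32-bit virtual address as user mode (True) or kernel mode."""
    if not 0 <= address < 4 * GB:
        raise ValueError("not a 32-bit address")
    return address < user_mode_limit(three_gb_enabled)


# 0x90000000 lies between 2 GB and 3 GB, so its owner depends on the switch.
print(is_user_mode_address(0x90000000))                         # False
print(is_user_mode_address(0x90000000, three_gb_enabled=True))  # True
```

Addresses at or above 0xC0000000 stay in kernel mode in both configurations, which is the 1 GB the kernel is left with under /3GB.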

When using the /3GB switch, you must consider the following:

  • Processes that use a large number of handles may use up the paged pool memory.
  • Processes that add users to a large number of security groups will cause the security token for these users to bloat. This may cause the paged pool to deplete.
  • The kernel uses up the Free System page table entries (PTEs) on heavily loaded servers that are configured with the /3GB switch. This results in server instability such as random network problems (the server drops packets or can no longer be reached), which might require a system restart.

Non-Uniform Memory Access (NUMA)

  • Current trend in processor hardware design
  • Increases scalability by reducing front-side bus contention
  • CPUs and memory are divided into NUMA nodes
  • A foreign memory access is slower than a local one (NUMA ratio)

One major problem with the symmetric multiprocessing (SMP) architecture in high CPU count systems is contention on the system bus, sometimes referred to as the memory bus or front-side bus. This is because with SMP, all CPUs use the same bus to access memory. The more CPUs that use the bus, the greater the contention. Increasing the speed of the system bus has only somewhat alleviated the problem. NUMA was designed to bypass the single bus architecture.

NUMA architecture groups CPU and Memory into nodes. Nodes include their own CPUs and memory, and in some cases, their own I/O channels. Each node has its own system bus for the CPUs to reference the node memory. There are interconnects between the nodes to allow the memory to be referenced by other nodes.

Memory local to a node is called local memory. Memory located in another node is referred to as foreign or remote memory. Access to foreign memory is often more expensive in time than access to local memory. The ratio of the cost of foreign memory access to the cost of local memory access is called the NUMA ratio. A higher NUMA ratio implies a higher performance impact.
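
A small worked example may help here. The latency figures below are invented for illustration and do not describe any particular hardware; the function names are hypothetical.

```python
def numa_ratio(local_latency_ns: float, foreign_latency_ns: float) -> float:
    """Cost of a foreign access relative to a local access."""
    return foreign_latency_ns / local_latency_ns


def average_latency_ns(local_latency_ns: float,
                       foreign_latency_ns: float,
                       foreign_fraction: float) -> float:
    """Average access latency when a given fraction of accesses is foreign."""
    local_fraction = 1 - foreign_fraction
    return (local_fraction * local_latency_ns
            + foreign_fraction * foreign_latency_ns)


# Hypothetical numbers: 100 ns local, 140 ns foreign.
print(numa_ratio(100, 140))                # 1.4
print(average_latency_ns(100, 140, 0.25))  # 110.0
```

With these made-up figures, sending a quarter of all accesses to a foreign node raises the average latency by 10 percent, and the penalty grows with both the NUMA ratio and the foreign fraction.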

Because foreign memory access is slower than accessing memory local to the node, it is important for resource-intensive applications to be NUMA aware. Relying heavily on foreign memory access will make an application run noticeably slower on NUMA hardware. SQL Server is fully NUMA aware.

Currently most servers on the market have a NUMA hardware configuration. Typically each socket represents a NUMA node, although some of the newer hardware hosts 2 nodes per socket.

You can detect if your SQL Server is running on NUMA hardware by examining the startup messages in the SQL Server error log. You will see messages similar to the following:

Node configuration: node 0: CPU mask: 0x000000000000000f:0 Active CPU mask: 0x000000000000000f:0. This message provides a description of the NUMA configuration for this computer. This is an informational message only. No user action is required.

If the log contains messages for nodes other than node 0, the hardware has a NUMA configuration.
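
This check can be sketched as a short script that scans error log text for the node configuration messages. It assumes the messages follow the wording shown above; the second sample line (node 1) is made up for illustration, and the function names are hypothetical.

```python
import re

# Matches the start of the startup message quoted above.
NODE_PATTERN = re.compile(r"Node configuration: node (\d+):")


def numa_node_ids(error_log_text: str) -> set:
    """Return the set of node numbers mentioned in the log text."""
    return {int(m.group(1)) for m in NODE_PATTERN.finditer(error_log_text)}


def looks_like_numa(error_log_text: str) -> bool:
    """True if the log reports more than one node."""
    return len(numa_node_ids(error_log_text)) > 1


# The node 1 line below is a hypothetical example, not real log output.
sample_log = (
    "Node configuration: node 0: CPU mask: 0x000000000000000f:0 "
    "Active CPU mask: 0x000000000000000f:0.\n"
    "Node configuration: node 1: CPU mask: 0x00000000000000f0:0 "
    "Active CPU mask: 0x00000000000000f0:0.\n"
)
print(sorted(numa_node_ids(sample_log)))  # [0, 1]
print(looks_like_numa(sample_log))        # True
```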
