Keystone LDAP Pool Overview

Author: John Dennis <jdennis@redhat.com>
Version: 1.0

Executive Summary

There are many misconceptions about the LDAP pool functionality in Keystone and the advantages it offers. In general, the LDAP pool does not increase performance via parallelism. Instead, the performance gains come from the fact that connections to the LDAP server are "kept alive" across multiple LDAP requests, which eliminates the overhead of setting up and tearing down a connection to the LDAP server for each request. It is unlikely the LDAP pool size will grow beyond 1 (see why pool size of 1).

What is an LDAP pool?

An LDAP pool is a collection of long-lived [1] LDAP connections available to the threads executing in a single Python interpreter process. Each LDAP connection in the pool is identified by the (user,password) 2-tuple used to bind the LDAP connection. All members of the LDAP pool connect to the same LDAP server, identified by the Keystone LDAP url configuration parameter. [2]

How is the LDAP pool managed?

When a Python thread needs an LDAP connection, the LDAP pool is locked and searched for the first inactive connection whose (user,password) 2-tuple matches. Inactive pool members whose pool_connection_lifetime has expired when the search is performed are ejected from the pool and hence from further consideration. If no inactive connection with the requisite (user,password) 2-tuple is found, a new pool member is added. Whether a candidate pool member was found in the pool or a new pool member was created, the selected member is immediately marked as active and returned for use. If no candidate pool member can be located and the pool is at its pool_size maximum, a MaxConnectionReachedError is raised.

After the connection pool member is returned, it is used to perform a synchronous LDAP request. During this time the Python thread blocks until the result is returned. When the LDAP request completes, the pool member is marked as inactive and thus becomes available to other threads in the same Python interpreter process. At this point any thread in the Python interpreter process can use the already open connection belonging to the newly inactive pool member to perform its next LDAP request without having to perform a costly connect operation (i.e. keep-alive).
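The acquire and release steps described above can be sketched in a few lines of Python. This is a simplified illustration, not the code Keystone actually runs: SketchPool, PooledConnection, acquire and release are hypothetical names, no real LDAP connection is opened, and any retry behavior the real pool may apply before raising is left out.

```python
import threading
import time


class MaxConnectionReachedError(Exception):
    """Raised when no matching connection is free and the pool is full."""


class PooledConnection:
    """Hypothetical stand-in for an open, bound LDAP connection."""

    def __init__(self, user, password):
        self.bind_key = (user, password)   # the (user,password) 2-tuple
        self.created = time.monotonic()
        self.active = False


class SketchPool:
    """Illustrative pool shared by the threads of one interpreter process."""

    def __init__(self, pool_size, connection_lifetime):
        self.pool_size = pool_size
        self.connection_lifetime = connection_lifetime
        self._lock = threading.Lock()
        self._members = []

    def acquire(self, user, password):
        now = time.monotonic()
        with self._lock:
            # Eject inactive members whose lifetime has expired.
            self._members = [
                m for m in self._members
                if m.active or now - m.created < self.connection_lifetime
            ]
            # Reuse the first inactive member bound with the same credentials.
            for member in self._members:
                if not member.active and member.bind_key == (user, password):
                    member.active = True
                    return member
            # Nothing reusable: grow the pool unless it is already full.
            if len(self._members) >= self.pool_size:
                raise MaxConnectionReachedError(self.pool_size)
            member = PooledConnection(user, password)  # the costly connect + bind
            member.active = True
            self._members.append(member)
            return member

    def release(self, member):
        # Mark the member inactive so any thread in this process can reuse it.
        with self._lock:
            member.active = False
```

With pool_size=1, a second acquire() made while the first connection is still active raises MaxConnectionReachedError, whereas an acquire() made after release() hands back the same, already open connection.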

HTTPD Servers, WSGI Daemons, Python Interpreter processes and threads

In the discussion on pool management we learned that the LDAP pool is shared by the threads in one single Python interpreter process. In a typical Keystone deployment there are many Python interpreter processes available to service Keystone requests, hence there are many LDAP pools. This is in contrast to the common but erroneous assumption that members of the LDAP pool are shared between processes, forming one large aggregate pool.

Keystone is run inside an HTTPD server (e.g. Apache). Because Keystone is a Python application, it is serviced by HTTPD via the mod_wsgi module, which implements WSGI (Web Server Gateway Interface).

Each HTTPD server may fork multiple copies of itself to simultaneously service incoming HTTP requests. Within an HTTPD server there may be one or more Python interpreter processes running due to the configuration of mod_wsgi. Typically mod_wsgi is run in daemon mode via the mod_wsgi WSGIDaemonProcess configuration parameter. The parent HTTPD server will fork a number of Python interpreter processes each of which may run one or more threads. This is controlled by this HTTPD configuration parameter:

WSGIDaemonProcess name processes=num threads=num

Exactly how many Python interpreter processes are spawned depends on a somewhat complicated set of relationships between HTTPD configuration directives and mod_wsgi configuration parameters. This is explained in detail in the WSGI Process & Thread Documentation.

In addition to the multiple processes and threads servicing Keystone requests in a single HTTPD instance, there may be multiple HTTPD instances if a load balancer is present in a High Availability (HA) deployment.

The important point to remember is that in most Keystone deployments there will be many Python interpreter processes, each of which will have its own LDAP pool. Requests to the Keystone server will be serviced in round-robin fashion across these Python interpreter processes.
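As a rough illustration of how the number of pools scales, the arithmetic can be written down as a small Python function. This is only a sketch under stated assumptions: mod_wsgi runs in daemon mode, every daemon process has created its pool, a single set of bind credentials is in use, and a thread uses at most one pooled connection at a time. The function name and its parameters are hypothetical and are not Keystone or mod_wsgi settings.

```python
def ldap_pool_footprint(httpd_instances, wsgi_processes, wsgi_threads, pool_size):
    """Return (number of LDAP pools, upper bound on open LDAP connections)."""
    # One pool per Python interpreter process, on every HTTPD instance.
    pools = httpd_instances * wsgi_processes
    # A pool only grows when threads in the same process compete for a
    # connection, so it cannot exceed the thread count or pool_size.
    per_pool = min(pool_size, wsgi_threads)
    return pools, pools * per_pool


# Two load-balanced HTTPD instances, 4 processes each, threads=1, pool_size=10.
print(ldap_pool_footprint(2, 4, 1, 10))  # → (8, 8)
```

Under these assumptions, raising pool_size changes nothing when threads=1; the number of open connections follows the number of interpreter processes.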

Recalling that the LDAP pool is available for threads within a single Python interpreter process we note the following consequences:

A single Keystone request typically induces multiple LDAP requests. The Keystone request executes within only one Python thread, so the multiple LDAP operations needed to satisfy it execute sequentially within that thread. This is further enforced by the fact that the LDAP pool demands each LDAP request be synchronous. Hence there is no parallel overlap when executing LDAP operations for a given Keystone request.

Note

Most Keystone deployments set WSGIDaemonProcess threads=1, enforcing a one-to-one relationship between Python interpreter processes and threads, with the consequence that the LDAP pool size will never exceed 1 (see When WSGIDaemonProcess threads=1).

What is the advantage of using LDAP pools?

The primary advantage of using the Keystone LDAP pool is the pool's ability to keep an LDAP connection continuously open for use by the threads in a single Python interpreter process. A Keystone request is serviced by one thread, and while servicing that request multiple LDAP operations are executed in a sequential, synchronous fashion. Access to the LDAP server is therefore much faster with the pool, because the same connection is left open and reused, avoiding the overhead of connection setup and teardown for every operation.
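The difference can be made concrete by counting connection setups in a small simulation. This is a sketch only: FakeLdapServer, serve_without_pool and serve_with_pool are hypothetical names, no LDAP traffic takes place, and the counter merely stands in for the cost of each connect and bind.

```python
class FakeLdapServer:
    """Hypothetical stand-in for an LDAP server; counts connection setups."""

    def __init__(self):
        self.connects = 0

    def connect(self):
        self.connects += 1      # each call represents one costly connect + bind
        return object()         # opaque handle standing in for the connection


def serve_without_pool(server, ldap_ops):
    """One Keystone request without a pool: connect for every LDAP operation."""
    for _ in range(ldap_ops):
        server.connect()        # set up, use once, then tear down


def serve_with_pool(server, ldap_ops, kept_alive=None):
    """One Keystone request with a pool: reuse the kept-alive connection."""
    conn = kept_alive if kept_alive is not None else server.connect()
    for _ in range(ldap_ops):
        pass                    # each operation runs on the same open connection
    return conn                 # left open for the next request in this process
```

In this model, two Keystone requests of five LDAP operations each cost ten connection setups without the pool and a single setup with it, provided the connection returned by the first call is passed to the second and has not outlived pool_connection_lifetime.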

Footnotes

[1]

The duration of a connection to an LDAP server in the LDAP pool is controlled by the pool_connection_lifetime configuration variable.

pool_connection_lifetime: The maximum connection lifetime to the LDAP server in seconds. When this lifetime is exceeded, the connection will be unbound and removed from the connection pool. This option has no effect unless [ldap] use_pool is also enabled.

[2]

The LDAP url configuration parameter permits you to specify a list of LDAP server URLs. This feature has a confusing interaction with pools and automatic reconnect behavior. Internally, Keystone treats the LDAP url parameter as a simple string; it does not break it apart into independent LDAP URLs that are each managed separately. The comma-separated list of LDAP URLs is passed directly into the OpenLDAP client C library, which then breaks the list into independent LDAP URLs. When a connection to an LDAP server is attempted, the OpenLDAP client C library iterates over the members of the list, attempting to connect to each in turn. Thus the Keystone code does not know which LDAP server you are actually connected to if you specify multiple servers in the url configuration parameter.

url: URL(s) for connecting to the LDAP server. Multiple LDAP URLs may be specified as a comma separated string. The first URL to successfully bind is used for the connection.

[3]

The pool size would also grow if distinct (user,password) 2-tuples were used to bind connections, because the pool keeps connections segregated by their bind parameters. However, the current Keystone LDAP configuration prevents multiple distinct bind credentials, hence the pool can only grow when threads in a single Python interpreter process simultaneously compete for a connection.