Module dogtag

Overview of interacting with CMS:

CMS stands for "Certificate Management System". It has been released under a variety of names; the open source version is called "dogtag".

CMS consists of a number of servlets which in rough terms can be thought of as RPC commands. A servlet is invoked by making an HTTP request to a specific URL and passing URL arguments. Normally CMS responds with an HTTP response consisting of HTML to be rendered by a web browser. This HTML response has both Javascript SCRIPT components and HTML rendering code. One of the Javascript SCRIPT blocks holds the data for the result. The rest of the response is derived from templates associated with the servlet, which may be customized. The templates pull the result data from Javascript variables.

One way to get the result data is to parse the HTML looking for the Javascript variable initializations. Simple string searches are not a robust method. First of all, one must be sure the string is only found in a Javascript SCRIPT block and not somewhere else in the HTML document. Some of the Javascript variable initializations are rather complex (e.g. lists of structures), and it would be hard to correctly parse such complex and diverse Javascript. Existing Javascript parsers are not generally available. Finally, it's important to know the character encoding for strings. There is a somewhat complex set of precedence rules for determining the current character encoding from the HTTP header, meta-equiv tags, mime Content-Type and charset attributes on HTML elements. All of this means that reading the result data from a CMS HTML response is difficult to do robustly.

However, CMS also supports returning the result data as an XML document (distinct from an XHTML document, which would be essentially the same as described above). There is a wide variety of tools to robustly parse XML. Because XML is so well defined, things like escapes and character encodings are handled automatically by the tools.

Thus we never try to parse Javascript. Instead we always ask CMS to return an XML document by passing the URL argument xml="true". The body of the HTTP response is then an XML document rather than HTML with embedded Javascript.
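As a rough illustration of the request side, the sketch below assembles a servlet URL that always carries xml="true". It is written for Python 3 and uses only the standard library. The host, port, servlet path, the serialNumber argument and the build_cms_url helper are hypothetical and only serve the example; only the xml="true" argument comes from the text above.

```python
from urllib.parse import urlencode, urlunsplit


def build_cms_url(host, port, servlet_path, **args):
    """Build a CMS servlet URL that asks for an XML response body.

    Hypothetical helper: only the xml="true" argument is taken from the
    documentation, the URL layout and names are assumptions.
    """
    # Always force xml="true" so the response body is XML, not HTML.
    query = dict(args, xml="true")
    netloc = "%s:%d" % (host, port)
    # Sort the arguments so the resulting URL is deterministic.
    query_string = urlencode(sorted(query.items()))
    return urlunsplit(("https", netloc, servlet_path, query_string, ""))


url = build_cms_url("ca.example.com", 8443, "/ca/ee/ca/displayBySerial",
                    serialNumber="0x1f")
```

Forcing the argument inside one helper means no caller can forget it and accidentally receive the HTML form of the response.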

To parse the XML documents we use the Python lxml package, which is a Python binding around the libxml2 implementation. libxml2 is a very fast, standards-compliant, full-featured XML implementation and is the XML library of choice for many projects. One of the features in lxml and libxml2 that is particularly valuable to us is the XPath implementation. We make heavy use of XPath to find data in the XML documents we're parsing.

Parse Results vs. IPA command results:

CMS results can be parsed from either HTML or XML. Unfortunately CMS is not consistent in how it names items or how it uses data types, while IPA has strict rules about data types and would like a more consistent view of CMS data. Therefore we split the task of parsing CMS results out from the IPA command code. The parse functions normalize the result data by using a consistent set of names and data types, and the IPA command only deals with the normalized parse results. This also allows us to use different parsers if need be (e.g. if we had to parse Javascript for some reason). The parse functions attempt to parse as much information from the CMS result as possible and put it into a dict whose normalized key/value pairs are easy to access. An IPA command does not need to return all the parsed results; it can pick and choose what it wants to return in the IPA command result, and it can rest assured that the values in the parse result have the correct data type. Thus the general sequence of steps for an IPA command talking to CMS is:

  1. Receive IPA arguments from IPA command
  2. Formulate URL with arguments for CMS
  3. Make request to CMS server
  4. Extract XML document from HTTP body returned by CMS
  5. Parse XML document using matching parse routine which returns response dict
  6. Extract relevant items from parse result and insert into command result
  7. Return command result
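The split between parsing and command code in steps 4 to 7 can be sketched as follows. This is only an illustration under stated assumptions: the XML layout, the element names inside it, and the functions parse_example_xml and example_command are hypothetical, the network request of step 3 is replaced by a canned body, and the standard library xml.etree.ElementTree stands in for lxml so the sketch runs without extra packages (Python 3).

```python
import xml.etree.ElementTree as ET

# Canned stand-in for the HTTP response body of step 3 (hypothetical layout).
RESPONSE_BODY = b"""<?xml version="1.0" encoding="UTF-8"?>
<result>
  <header>
    <serialNumber>1f</serialNumber>
    <emailCert>true</emailCert>
  </header>
</result>"""


def parse_example_xml(doc):
    """Step 5: normalize CMS names and types into a response dict."""
    response = {}
    node = doc.find("header/serialNumber")
    if node is not None:
        # CMS returns serial numbers as hexadecimal without a prefix.
        response["serial_number"] = int(node.text.strip(), 16)
    node = doc.find("header/emailCert")
    if node is not None:
        response["email_cert"] = node.text.strip().lower() == "true"
    return response


def example_command():
    doc = ET.fromstring(RESPONSE_BODY)       # step 4: XML from the HTTP body
    parse_result = parse_example_xml(doc)    # step 5: normalized parse result
    # Steps 6 and 7: pick only what the command wants to return.
    return {"serial_number": str(parse_result["serial_number"])}
```

The command returns only one of the two parsed items, which mirrors the point that the parse result may hold more than the command chooses to expose.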

Serial Numbers:

Serial numbers are integral values of any magnitude because they are based on ASN.1 integers. CMS uses the Java BigInteger to represent these. Fortunately Python also has support for big integers via the Python long() object. Any BigIntegers we receive from CMS as a string can be parsed into a Python long without loss of information.

However Python has a neat trick. It normally represents integers via the int object which internally uses the native C long type. If you create an int object by passing the int constructor a string it will check the magnitude of the value. If it would fit in a C long then it returns you an int object. However if the value is too big for a C long type then it returns you a Python long object instead. This is a very nice property because it's much more efficient to use C long types when possible (e.g. Python int), but when necessary you'll get a Python long() object to handle large magnitude values. Python also nicely handles type promotion transparently between int and long objects. For example if you multiply two int objects you may get back a long object if necessary. In general Python int and long objects may be freely mixed without the programmer needing to be aware of which type of integral object is being operated on.

This leads to the following rule: always parse a string representing an integral value using the int() constructor even if it might have large magnitude, because Python will return either an int or a long automatically. By the same token don't test for the type of an object being int exclusively, because it could be either an int or a long object.

Internally we should always be using int or long objects to hold integral values. This is because we should be able to compare them correctly, be free from concerns about having to know the radix of the string, perform arithmetic operations, and convert to string representation (with correct radix) when necessary. In other words internally we should never handle integral values as strings.
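A small sketch of these rules follows. It is written for Python 3, where int and long are already unified into a single int type; the is_integral helper is hypothetical. Testing against numbers.Integral is one way to avoid checking for int exclusively, and in Python 2 it would accept both int and long.

```python
import numbers

HEX_SERIAL = "fffffffffffffffffffffffffffffff1"

# int() accepts values of any magnitude without loss of information.
small = int("2a", 16)
large = int(HEX_SERIAL, 16)


def is_integral(value):
    """True for any integral object (hypothetical helper)."""
    # bool is a subclass of int, so it is excluded explicitly.
    return isinstance(value, numbers.Integral) and not isinstance(value, bool)


# Arithmetic and comparison work regardless of magnitude.
product = small * large
# Converting back to a string with an explicit radix recovers the input.
round_trip = format(large, "x")
```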

However, the XMLRPC transport cannot properly handle a Python long object. The XMLRPC encoder, upon seeing a Python long, will test whether the value fits within the range of a 32-bit integer. If so it passes the integer parameter, otherwise it raises an OverflowError exception. The XMLRPC specification does permit 64-bit integers (e.g. i8) and the Python XMLRPC module could allow long values within the 64-bit range to be passed if it were patched. However this only moves the problem; it does not solve passing big integers through XMLRPC. Thus we must always pass big integers as strings through the XMLRPC interface, and upon receiving such a value from XMLRPC we should convert it back into an int or long object. Recall also that Python will automatically perform a conversion to string if you output the int or long object in a string context.
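The overflow and the string workaround can be reproduced with the standard library marshaller. The sketch assumes Python 3, where the module is named xmlrpc.client (xmlrpclib in Python 2); the serial number value is made up for the example.

```python
import xmlrpc.client

serial_number = int("fffffffffffffffffffffffffffffff1", 16)

# Marshalling the big integer directly exceeds the 32-bit XML-RPC limit.
try:
    xmlrpc.client.dumps((serial_number,))
    overflowed = False
except OverflowError:
    overflowed = True

# Passing the value as a decimal string works for any magnitude.
wire = xmlrpc.client.dumps((str(serial_number),))

# On the receiving side convert the string back into an integer.
params, _method = xmlrpc.client.loads(wire)
received = int(params[0])
```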

Radix Issues:

CMS uses the following conventions: Serial numbers are always returned as hexadecimal strings without a radix prefix. When CMS takes a serial number as input it accepts the value in either decimal or hexadecimal utilizing the radix prefix (e.g. 0x) to determine how to parse the value.

IPA has adopted the convention that all integral values in the user interface will use base 10 decimal radix.

Basic rules on handling these values:

  1. A serial number read from CMS is a hexadecimal string. Convert it into a Python int or long object with the int constructor:

    >>> serial_number = int(serial_number, 16)

  2. Big integers passed to XMLRPC must be decimal unicode strings:

    >>> unicode(serial_number)

  3. Big integers received from XMLRPC must be converted back to int or long objects from the decimal string representation:

    >>> serial_number = int(serial_number)
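The three rules, plus the input convention for CMS, can be collected into a few small conversion helpers. The function names are hypothetical and the sketch targets Python 3, where str() plays the role of unicode().

```python
def serial_from_cms(text):
    """Rule 1: CMS returns hexadecimal without a radix prefix."""
    return int(text.strip(), 16)


def serial_to_cms(serial_number):
    """CMS input: the 0x prefix tells CMS to parse the value as hex."""
    return "0x%x" % serial_number


def serial_to_xmlrpc(serial_number):
    """Rule 2: big integers cross XMLRPC as decimal strings."""
    return str(serial_number)  # unicode(serial_number) in Python 2


def serial_from_xmlrpc(text):
    """Rule 3: convert the decimal string back into an integer."""
    return int(text)


serial = serial_from_cms("ff")
on_the_wire = serial_to_xmlrpc(serial)
```

Keeping each radix decision in exactly one function makes it harder to apply the wrong base at one of the boundaries.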

Xpath pattern matching on node names:

There are many excellent tutorials on how to use xpath to find items in an XML document, so there is no need to repeat that information here. However, most xpath tutorials assume the node names you're searching for are fixed. For example:

doc.xpath('//book/chapter[*]/section[2]')

Selects the second section of every chapter of the book. In this example the node names 'book', 'chapter', 'section' are fixed. But what if the XML document embedded the chapter number in the node name, for example 'chapter1', 'chapter2', etc.? (If you're thinking this would be incredibly lame, you're right, but sadly people do things like this). Thus in this case you can't use the node name 'chapter' in the xpath location step because it's not fixed and hence won't match 'chapter1', 'chapter2', etc. The solution to this seems obvious, use some type of pattern matching on the node name. Unfortunately this advanced use of xpath is seldom discussed in tutorials and it's not obvious how to do it. Here are some hints.

Use the built-in xpath string functions. Most of the examples illustrate the string function being passed the text contents of the node via '.' or string(.). However we don't want to pass the contents of the node, instead we want to pass the node name. To do this use the name() function. One way we could solve the chapter problem above is by using a predicate which says if the node name begins with 'chapter' it's a match. Here is how you can do that.

>>> doc.xpath("//book/*[starts-with(name(), 'chapter')]/section[2]")

The built-in starts-with() returns true if its first argument starts with its second argument. Thus the example above says if the node name of the second location step begins with 'chapter' consider it a match and the search proceeds to the next location step, which in this example is any node named 'section'.

But what if we would like to utilize the power of regular expressions to perform the test against the node name? In this case we can use the EXSLT regular expression extension. EXSLT extensions are accessed by using XML namespaces. The regular expression namespace identifier is 're:'. In lxml we need to pass a set of namespaces to the XPath object constructor in order to allow it to bind to those namespaces during its evaluation. Then we just use the EXSLT regular expression match() function on the node name. Here is how this is done:

>>> from lxml import etree
>>> regexpNS = "http://exslt.org/regular-expressions"
>>> find = etree.XPath(r"//book/*[re:match(name(), '^chapter\d+$')]/section[2]",
...                    namespaces={'re':regexpNS})
>>> find(doc)

What is happening here is that etree.XPath() has returned us an evaluator function which we bind to the name 'find'. We've passed it a set of namespaces as a dict via the 'namespaces' keyword parameter of etree.XPath(). The predicate for the second location step uses the 're:' namespace to find the function name 'match'. The re:match() function takes a string to search as its first argument and a regular expression pattern as its second argument. In this example the string to search is the node name of the location step because we called the built-in name() function of XPath. The regular expression pattern we've passed says it's a match if the string begins with 'chapter', is followed by one or more digits, and nothing else follows.
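To make the difference between the two predicates concrete, the following sketch runs both against a small made-up document. It assumes the lxml package is installed. The document content is invented for the example, and the regular expression variant uses re:test(), the boolean EXSLT counterpart of re:match() that lxml also provides, rather than re:match() itself.

```python
from lxml import etree

# Made-up document with the chapter number embedded in the node name.
XML = b"""<book>
  <chapter1><section>1.1</section><section>1.2</section></chapter1>
  <chapter2><section>2.1</section><section>2.2</section></chapter2>
  <chapters><section>X.1</section><section>X.2</section></chapters>
  <appendix><section>A.1</section><section>A.2</section></appendix>
</book>"""
doc = etree.fromstring(XML)

# Prefix test on the node name: also matches the 'chapters' node.
by_prefix = doc.xpath("//book/*[starts-with(name(), 'chapter')]/section[2]")
prefix_texts = [node.text for node in by_prefix]

# Regular expression test on the node name: 'chapter' plus digits only.
regexp_ns = "http://exslt.org/regular-expressions"
find = etree.XPath(r"//book/*[re:test(name(), '^chapter\d+$')]/section[2]",
                   namespaces={"re": regexp_ns})
regex_texts = [node.text for node in find(doc)]
```

The prefix predicate is simpler but over-matches, which is the practical reason to reach for the regular expression form.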

Classes
  ra
    Request Authority backend plugin.

Functions
  cms_request_status_to_string(request_status)
    Returns: String name of request status
  cms_error_code_to_string(error_code)
    Returns: String name of the error code
  parse_and_set_boolean_xml(node, response, response_name)
    Read the value out of an XML text node and interpret it as a boolean value.
  get_error_code_xml(doc)
    Returns the error code when the servlet replied with CMSServlet.outputError()
  get_request_status_xml(doc)
    Returns the request status from a CMS operation.
  parse_error_template_xml(doc)
    CMS currently returns errors via XML as either a "template" document (generated by CMSServlet.outputXML()) or a "response" document (generated by CMSServlet.outputError()).
  parse_error_response_xml(doc)
    CMS currently returns errors via XML as either a "template" document (generated by CMSServlet.outputXML()) or a "response" document (generated by CMSServlet.outputError()).
  parse_profile_submit_result_xml(doc)
    CMS returns an error code and an array of request records.
  parse_check_request_result_xml(doc)
    After parsing the results are returned in a result dict.
  parse_display_cert_xml(doc)
    After parsing the results are returned in a result dict.
  parse_revoke_cert_xml(doc)
    After parsing the results are returned in a result dict.
  parse_unrevoke_cert_xml(doc)
    After parsing the results are returned in a result dict.

Variables
  CMS_SUCCESS = 0
  CMS_FAILURE = 1
  CMS_AUTH_FAILURE = 2
  CMS_STATUS_UNAUTHORIZED = 1
  CMS_STATUS_SUCCESS = 2
  CMS_STATUS_PENDING = 3
  CMS_STATUS_SVC_PENDING = 4
  CMS_STATUS_REJECTED = 5
  CMS_STATUS_ERROR = 6
  CMS_STATUS_EXCEPTION = 7
Function Details

cms_request_status_to_string(request_status)

Parameters:
  • request_status - The integral request status value
Returns:
String name of request status
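One plausible way to implement such a lookup is a table keyed by the status constants. This is a sketch, not the module's actual code: the returned names, the fallback text and the function name request_status_name are assumptions, and only the constant values come from the Variables list.

```python
CMS_STATUS_UNAUTHORIZED = 1
CMS_STATUS_SUCCESS = 2
CMS_STATUS_PENDING = 3
CMS_STATUS_SVC_PENDING = 4
CMS_STATUS_REJECTED = 5
CMS_STATUS_ERROR = 6
CMS_STATUS_EXCEPTION = 7

# Assumed names: the strings the real function returns are not documented.
_REQUEST_STATUS_NAMES = {
    CMS_STATUS_UNAUTHORIZED: "UNAUTHORIZED",
    CMS_STATUS_SUCCESS: "SUCCESS",
    CMS_STATUS_PENDING: "PENDING",
    CMS_STATUS_SVC_PENDING: "SVC_PENDING",
    CMS_STATUS_REJECTED: "REJECTED",
    CMS_STATUS_ERROR: "ERROR",
    CMS_STATUS_EXCEPTION: "EXCEPTION",
}


def request_status_name(request_status):
    """Map an integral request status to a name (hypothetical helper)."""
    return _REQUEST_STATUS_NAMES.get(request_status,
                                     "unknown(%d)" % request_status)
```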

cms_error_code_to_string(error_code)

Parameters:
  • error_code - The integral error code value
Returns:
String name of the error code

parse_and_set_boolean_xml(node, response, response_name)

Read the value out of an XML text node and interpret it as a boolean value. The text values are stripped of whitespace and converted to lower case prior to interpretation.

If the value is recognized the response dict is updated using the response_name as the key and the value is set to the bool value of either True or False depending on the interpretation of the text value. If the text value is not recognized a ValueError exception is raised.

Text values which result in True:

  • true
  • yes
  • on

Text values which result in False:

  • false
  • no
  • off
Parameters:
  • node - xml node object containing value to parse for boolean result
  • response - response dict to set boolean result in
  • response_name - name of the response value to set
Raises:
  • ValueError
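The described behaviour can be sketched as below. This is an illustration rather than the module's implementation: the function name parse_and_set_boolean and the error message are made up, and the standard library ElementTree element stands in for an lxml node (Python 3).

```python
import xml.etree.ElementTree as ET

TRUE_VALUES = frozenset(["true", "yes", "on"])
FALSE_VALUES = frozenset(["false", "no", "off"])


def parse_and_set_boolean(node, response, response_name):
    """Interpret the node text as a boolean and store it in response."""
    # Strip whitespace and lower-case the text before interpreting it.
    value = (node.text or "").strip().lower()
    if value in TRUE_VALUES:
        response[response_name] = True
    elif value in FALSE_VALUES:
        response[response_name] = False
    else:
        raise ValueError('unable to convert "%s" to boolean' % value)


response = {}
parse_and_set_boolean(ET.fromstring("<dirEnabled> Yes </dirEnabled>"),
                      response, "dir_enabled")
```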

get_error_code_xml(doc)

Returns the error code when the servlet replied with CMSServlet.outputError()

The possible error code values are:

  • CMS_SUCCESS = 0
  • CMS_FAILURE = 1
  • CMS_AUTH_FAILURE = 2

However, profileSubmit sometimes also returns these values:

  • EXCEPTION = 1
  • DEFERRED = 2
  • REJECTED = 3
Parameters:
  • doc - The root node of the xml document to parse
Returns:
error code as an integer or None if not found

get_request_status_xml(doc)

Returns the request status from a CMS operation. May be one of:

  • CMS_STATUS_UNAUTHORIZED = 1
  • CMS_STATUS_SUCCESS = 2
  • CMS_STATUS_PENDING = 3
  • CMS_STATUS_SVC_PENDING = 4
  • CMS_STATUS_REJECTED = 5
  • CMS_STATUS_ERROR = 6
  • CMS_STATUS_EXCEPTION = 7

CMS will often fail to return requestStatus when the status is SUCCESS. Therefore if we fail to find a requestStatus field we default the result to CMS_STATUS_SUCCESS.

Parameters:
  • doc - The root node of the xml document to parse
Returns:
request status as an integer
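The defaulting rule can be sketched as follows. The location of the requestStatus element inside the document and the function name get_request_status are assumptions, and the standard library ElementTree is used in place of lxml so the example is self-contained (Python 3).

```python
import xml.etree.ElementTree as ET

CMS_STATUS_SUCCESS = 2


def get_request_status(doc):
    """Return the request status, defaulting to SUCCESS when absent."""
    # Assumed layout: requestStatus may appear anywhere below the root.
    node = doc.find(".//requestStatus")
    if node is None or node.text is None:
        # CMS often omits requestStatus when the operation succeeded.
        return CMS_STATUS_SUCCESS
    return int(node.text.strip())


with_status = ET.fromstring(
    "<result><fixed><requestStatus>6</requestStatus></fixed></result>")
without_status = ET.fromstring("<result><fixed/></result>")
```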

parse_error_template_xml(doc)

CMS currently returns errors via XML as either a "template" document (generated by CMSServlet.outputXML()) or a "response" document (generated by CMSServlet.outputError()).

This routine is used to parse a "template" style error or exception document.

This routine should be used when the CMS requestStatus is ERROR or EXCEPTION. It is capable of parsing both. A CMS ERROR occurs when a known anticipated error condition occurs (e.g. asking for an item which does not exist). A CMS EXCEPTION occurs when an exception is thrown in the CMS server and it's not caught and converted into an ERROR. Think of EXCEPTIONs as the "catch all" error situation.

ERRORs and EXCEPTIONs both have error message strings associated with them. For an ERROR it's errorDetails, for an EXCEPTION it's unexpectedError. In addition an EXCEPTION may include an array of additional error strings in its errorDescription field.

After parsing the results are returned in a result dict. The following table illustrates the mapping from the CMS data item to what may be found in the result dict. If a CMS data item is absent it will also be absent in the result dict.

cms name          cms type  result name         result type
----------------  --------  ------------------  -----------
requestStatus     int       request_status      int
errorDetails      string    error_string [1]    unicode
unexpectedError   string    error_string [1]    unicode
errorDescription  [string]  error_descriptions  [unicode]
authority         string    authority           unicode

[1] errorDetails is the error message string when the requestStatus is ERROR. unexpectedError is the error message string when the requestStatus is EXCEPTION. This routine recognizes both ERRORs and EXCEPTIONs and, depending on which is found, folds the error message into the error_string result value.
Parameters:
  • doc - The root node of the xml document to parse
Returns:
result dict
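The folding of errorDetails and unexpectedError into a single error_string key might look like the sketch below. The XML layout, in particular how the errorDescription array is represented, is an assumption, as is the function name parse_error_template; the standard library ElementTree stands in for lxml (Python 3).

```python
import xml.etree.ElementTree as ET


def parse_error_template(doc):
    """Normalize a template style error document into a result dict."""
    response = {}
    node = doc.find(".//requestStatus")
    if node is not None:
        response["request_status"] = int(node.text)
    # ERROR carries errorDetails, EXCEPTION carries unexpectedError.
    # Whichever is present is folded into the same result key.
    for cms_name in ("errorDetails", "unexpectedError"):
        node = doc.find(".//" + cms_name)
        if node is not None:
            response["error_string"] = node.text or ""
            break
    # Assumed layout: one errorDescription element per additional string.
    descriptions = [n.text or "" for n in doc.iter("errorDescription")]
    if descriptions:
        response["error_descriptions"] = descriptions
    return response


exception_doc = ET.fromstring(
    "<result><requestStatus>7</requestStatus>"
    "<unexpectedError>boom</unexpectedError>"
    "<errorDescription>first</errorDescription>"
    "<errorDescription>second</errorDescription></result>")
result = parse_error_template(exception_doc)
```

Absent CMS items simply never reach the dict, which matches the rule that a missing data item is also missing from the result.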

parse_error_response_xml(doc)

CMS currently returns errors via XML as either a "template" document (generated by CMSServlet.outputXML()) or a "response" document (generated by CMSServlet.outputError()).

This routine is used to parse a "response" style error document.

cms name   cms type  result name   result type
---------  --------  ------------  -----------
Status     int       error_code    int [1]
Error      string    error_string  unicode
RequestID  string    request_id    string

[1] error code may be one of:

  • CMS_SUCCESS = 0
  • CMS_FAILURE = 1
  • CMS_AUTH_FAILURE = 2

However, profileSubmit sometimes also returns these values:

  • EXCEPTION = 1
  • DEFERRED = 2
  • REJECTED = 3
Parameters:
  • doc - The root node of the xml document to parse
Returns:
result dict

parse_profile_submit_result_xml(doc)

CMS returns an error code and an array of request records.

This function returns a response dict with the following format: {'error_code' : int, 'requests' : [{}]}

The mapping of fields and data types is illustrated in the following table.

If the error_code is not SUCCESS then the response dict will have the contents described in parse_error_response_xml.

cms name              cms type    result name               result type
--------------------  ----------  ------------------------  -----------
Status                int         error_code                int
Requests[].Id         string      requests[].request_id     unicode
Requests[].SubjectDN  string      requests[].subject        unicode
Requests[].serialno   BigInteger  requests[].serial_number  int|long
Requests[].b64        string      requests[].certificate    unicode [1]
Requests[].pkcs7      string

[1] Base64 encoded
Parameters:
  • doc - The root node of the xml document to parse
Returns:
result dict
Raises:
  • ValueError

parse_check_request_result_xml(doc)

After parsing the results are returned in a result dict. The following table illustrates the mapping from the CMS data item to what may be found in the result dict. If a CMS data item is absent it will also be absent in the result dict.

If the requestStatus is not SUCCESS then the response dict will have the contents described in parse_error_template_xml.

cms name                   cms type         result name          result type
-------------------------  ---------------  -------------------  -----------------
authority                  string           authority            unicode
requestId                  string           request_id           string
status                     string           cert_request_status  unicode [1]
createdOn                  long, timestamp  created_on           datetime.datetime
updatedOn                  long, timestamp  updated_on           datetime.datetime
requestNotes               string           request_notes        unicode
pkcs7ChainBase64           string           pkcs7_chain          unicode [2]
cmcFullEnrollmentResponse  string           full_response        unicode [2]
records[].serialNumber     BigInteger       serial_numbers       [int|long]

[1] cert_request_status may be one of:

  • "begin"
  • "pending"
  • "approved"
  • "svc_pending"
  • "canceled"
  • "rejected"
  • "complete"

[2] Base64 encoded
Parameters:
  • doc - The root node of the xml document to parse
Returns:
result dict
Raises:
  • ValueError

parse_display_cert_xml(doc)

After parsing the results are returned in a result dict. The following table illustrates the mapping from the CMS data item to what may be found in the result dict. If a CMS data item is absent it will also be absent in the result dict.

If the requestStatus is not SUCCESS then the response dict will have the contents described in parse_error_template_xml.

cms name          cms type  result name        result type
----------------  --------  -----------------  -----------
emailCert         Boolean   email_cert         bool
noCertImport      Boolean   no_cert_import     bool
revocationReason  int       revocation_reason  int [1]
certPrettyPrint   string    cert_pretty        unicode
authorityid       string    authority          unicode
certFingerprint   string    fingerprint        unicode
certChainBase64   string    certificate        unicode [2]
serialNumber      string    serial_number      int|long
pkcs7ChainBase64  string    pkcs7_chain        unicode [2]

[1] revocation reason may be one of:

  • 0 = UNSPECIFIED
  • 1 = KEY_COMPROMISE
  • 2 = CA_COMPROMISE
  • 3 = AFFILIATION_CHANGED
  • 4 = SUPERSEDED
  • 5 = CESSATION_OF_OPERATION
  • 6 = CERTIFICATE_HOLD
  • 8 = REMOVE_FROM_CRL
  • 9 = PRIVILEGE_WITHDRAWN
  • 10 = AA_COMPROMISE

[2] Base64 encoded
Parameters:
  • doc - The root node of the xml document to parse
Returns:
result dict
Raises:
  • ValueError
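Because revocation_reason is returned as a plain int, a caller may want symbolic names for it. The sketch below wraps the values listed above in an IntEnum (Python 3). The class, the helper function and the fallback text are hypothetical and not part of the module. Note that the value 7 is not in the list.

```python
from enum import IntEnum


class RevocationReason(IntEnum):
    """Symbolic names for the revocation reason values listed above."""
    UNSPECIFIED = 0
    KEY_COMPROMISE = 1
    CA_COMPROMISE = 2
    AFFILIATION_CHANGED = 3
    SUPERSEDED = 4
    CESSATION_OF_OPERATION = 5
    CERTIFICATE_HOLD = 6
    REMOVE_FROM_CRL = 8
    PRIVILEGE_WITHDRAWN = 9
    AA_COMPROMISE = 10


def revocation_reason_name(code):
    """Return the symbolic name for a code (hypothetical helper)."""
    try:
        return RevocationReason(code).name
    except ValueError:
        # Codes outside the list, such as 7, have no name.
        return "UNKNOWN(%d)" % code
```

Since IntEnum members compare equal to plain integers, the parse result can keep its int type while callers use the names.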

parse_revoke_cert_xml(doc)

After parsing the results are returned in a result dict. The following table illustrates the mapping from the CMS data item to what may be found in the result dict. If a CMS data item is absent it will also be absent in the result dict.

If the requestStatus is not SUCCESS then the response dict will have the contents described in parse_error_template_xml.

cms name                cms type        result name              result type
----------------------  --------------  -----------------------  -----------
dirEnabled              string [1]      dir_enabled              bool
certsUpdated            int             certs_updated            int
certsToUpdate           int             certs_to_update          int
error                   string [2]      error_string             unicode
revoked                 string [3]      revoked                  unicode
totalRecordCount        int             total_record_count       int
updateCRL               string [1] [4]  update_crl               bool
updateCRLSuccess        string [1] [4]  update_crl_success       bool
updateCRLError          string [4]      update_crl_error         unicode
publishCRLSuccess       string [1] [4]  publish_crl_success      bool
publishCRLError         string [4]      publish_crl_error        unicode
crlUpdateStatus         string [1] [5]  crl_update_status        bool
crlUpdateError          string [5]      crl_update_error         unicode
crlPublishStatus        string [1] [5]  crl_publish_status       bool
crlPublishError         string [5]      crl_publish_error        unicode
records[].serialNumber  BigInteger      records[].serial_number  int|long
records[].error         string [2]      records[].error_string   unicode

[1] String value is either "yes" or "no"

[2] Sometimes the error string is empty (null)

[3] revoked may be one of:

  • "yes"
  • "no"
  • "begin"
  • "pending"
  • "approved"
  • "svc_pending"
  • "canceled"
  • "rejected"
  • "complete"

[4] Only sent if CRL update information is available. If sent its only value is "yes". If sent then the following values may also be sent, otherwise they will be absent:

  • updateCRLSuccess
  • updateCRLError
  • publishCRLSuccess
  • publishCRLError

[5] The cms name varies depending on whether the issuing point is MasterCRL or not. If the issuing point is not the MasterCRL then the cms name will be appended with an underscore and the issuing point name. Thus for example the cms name crlUpdateStatus will be crlUpdateStatus if the issuing point is the MasterCRL. However if the issuing point is "foobar" then crlUpdateStatus will be crlUpdateStatus_foobar. When we return the response dict the key will always be the "base" name without the _issuing_point suffix. Thus crlUpdateStatus_foobar will appear in the response dict under the key 'crl_update_status'.
Parameters:
  • doc - The root node of the xml document to parse
Returns:
result dict
Raises:
  • ValueError
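The handling of the issuing point suffix can be sketched with a regular expression over the four affected cms names. This is an illustration of the naming rule only: the function name normalize_crl_name and the choice to return None for unrelated names are assumptions, not the module's actual behaviour.

```python
import re

# Base cms names that may carry an _<issuing point> suffix.
BASE_NAMES = {
    "crlUpdateStatus": "crl_update_status",
    "crlUpdateError": "crl_update_error",
    "crlPublishStatus": "crl_publish_status",
    "crlPublishError": "crl_publish_error",
}

# A base name, optionally followed by an underscore and the issuing point.
_CRL_NAME = re.compile(r"^(%s)(?:_(.+))?$" % "|".join(BASE_NAMES))


def normalize_crl_name(cms_name):
    """Return the response dict key for a cms name, or None if unrelated."""
    match = _CRL_NAME.match(cms_name)
    if match is None:
        return None
    # The suffix is dropped: the key is always the "base" name.
    return BASE_NAMES[match.group(1)]
```

For example, both crlUpdateStatus and crlUpdateStatus_foobar map to the key 'crl_update_status'.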

parse_unrevoke_cert_xml(doc)

After parsing the results are returned in a result dict. The following table illustrates the mapping from the CMS data item to what may be found in the result dict. If a CMS data item is absent it will also be absent in the result dict.

If the requestStatus is not SUCCESS then the response dict will have the contents described in parse_error_template_xml.

cms name           cms type        result name          result type
-----------------  --------------  -------------------  -----------
dirEnabled         string [1]      dir_enabled          bool
dirUpdated         string [1]      dir_updated          bool
error              string          error_string         unicode
unrevoked          string [3]      unrevoked            unicode
updateCRL          string [1] [4]  update_crl           bool
updateCRLSuccess   string [1] [4]  update_crl_success   bool
updateCRLError     string [4]      update_crl_error     unicode
publishCRLSuccess  string [1] [4]  publish_crl_success  bool
publishCRLError    string [4]      publish_crl_error    unicode
crlUpdateStatus    string [1] [5]  crl_update_status    bool
crlUpdateError     string [5]      crl_update_error     unicode
crlPublishStatus   string [1] [5]  crl_publish_status   bool
crlPublishError    string [5]      crl_publish_error    unicode
serialNumber       BigInteger      serial_number        int|long

[1] String value is either "yes" or "no"

[3] unrevoked may be one of:

  • "yes"
  • "no"
  • "pending"

[4] Only sent if CRL update information is available. If sent its only value is "yes". If sent then the following values may also be sent, otherwise they will be absent:

  • updateCRLSuccess
  • updateCRLError
  • publishCRLSuccess
  • publishCRLError

[5] The cms name varies depending on whether the issuing point is MasterCRL or not. If the issuing point is not the MasterCRL then the cms name will be appended with an underscore and the issuing point name. Thus for example the cms name crlUpdateStatus will be crlUpdateStatus if the issuing point is the MasterCRL. However if the issuing point is "foobar" then crlUpdateStatus will be crlUpdateStatus_foobar. When we return the response dict the key will always be the "base" name without the _issuing_point suffix. Thus crlUpdateStatus_foobar will appear in the response dict under the key 'crl_update_status'.
Parameters:
  • doc - The root node of the xml document to parse
Returns:
result dict
Raises:
  • ValueError