Overview of interacting with CMS:

CMS stands for "Certificate Management System". It has been released under a variety of names; the open source version is called "dogtag".

CMS consists of a number of servlets which in rough terms can be thought of as RPC commands. A servlet is invoked by making an HTTP request to a specific URL and passing URL arguments. Normally CMS responds with an HTTP response consisting of HTML to be rendered by a web browser. This HTML response has both Javascript SCRIPT components and HTML rendering code. One of the Javascript SCRIPT blocks holds the data for the result. The rest of the response is derived from templates associated with the servlet, which may be customized. The templates pull the result data from Javascript variables.

One way to get the result data is to parse the HTML looking for the Javascript variable initializations. Simple string searches are not a robust method. First of all, one must be sure the string is only found in a Javascript SCRIPT block and not somewhere else in the HTML document. Some of the Javascript variable initializations are rather complex (e.g. lists of structures), and it would be hard to correctly parse such complex and diverse Javascript. Existing Javascript parsers are not generally available. Finally, it's important to know the character encoding for strings. There is a somewhat complex set of precedence rules for determining the current character encoding from the HTTP header, http-equiv meta tags, MIME Content-Type and charset attributes on HTML elements. All of this means trying to read the result data from a CMS HTML response is difficult to do robustly.

However, CMS also supports returning the result data as an XML document (distinct from an XHTML document, which would be essentially the same as described above). There is a wide variety of tools to robustly parse XML. Because XML is so well defined, things like escapes and character encodings are handled automatically by the tools.

Thus we never try to parse Javascript. Instead we always ask CMS to return an XML document by passing the URL argument xml="true". The body of the HTTP response is then an XML document rather than HTML with embedded Javascript.
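
As a minimal sketch of forming such a request URL, the example below appends xml=true to whatever other arguments the servlet takes. The host name, port, servlet path and the serialNumber argument are hypothetical placeholders chosen for illustration, not values taken from CMS, and the sketch uses the Python 3 standard library.

```python
from urllib.parse import urlencode


def build_cms_url(host, port, servlet_path, **params):
    """Return a servlet URL that always asks CMS for an XML response."""
    query = dict(params)
    query["xml"] = "true"  # request XML instead of HTML with embedded Javascript
    # Sort the arguments so the resulting URL is deterministic.
    return "https://%s:%d%s?%s" % (
        host, port, servlet_path, urlencode(sorted(query.items())))


# Hypothetical host, port, servlet path and argument name.
url = build_cms_url("ca.example.com", 9443, "/ca/example/servlet",
                    serialNumber="0x1a")
print(url)
```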

To parse the XML documents we use the Python lxml package, which is a Python binding around the libxml2 implementation. libxml2 is a very fast, standards-compliant, full-featured XML implementation and is the XML library of choice for many projects. One of the features in lxml and libxml2 that is particularly valuable to us is the XPath implementation. We make heavy use of XPath to find data in the XML documents we're parsing.

Parse Results vs. IPA command results:

CMS results can be parsed from either HTML or XML. Unfortunately CMS is not consistent in how it names items or how it uses data types. IPA has strict rules about data types, and IPA would also like to see a more consistent view of CMS data. Therefore we split the task of parsing CMS results out from the IPA command code. The parse functions normalize the result data by using a consistent set of names and data types. The IPA command only deals with the normalized parse results. This also allows us to use different parsers if need be (e.g. if we had to parse Javascript for some reason). The parse functions attempt to parse as much information from the CMS result as possible and put the parse result into a dict whose normalized key/value pairs are easy to access. An IPA command does not need to return all the parsed results; it can pick and choose what it wants to return in the IPA command result from the parse result. It can also rest assured the values in the parse result will be the correct data type. Thus the general sequence of steps for an IPA command talking to CMS is:

Receive IPA arguments from IPA command
Formulate URL with arguments for CMS
Make request to CMS server
Extract XML document from HTTP body returned by CMS
Parse XML document using matching parse routine which returns response dict
Extract relevant items from parse result and insert into command result
Return command result
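
The sequence above can be sketched roughly as follows. This is an illustration only: the XML layout, the element names inside it, the URL and every function name are hypothetical, the HTTP request is replaced by a canned response, and the standard library ElementTree parser stands in for lxml so the sketch runs without extra packages.

```python
import xml.etree.ElementTree as ET

# Canned body standing in for what a CMS servlet might return (layout assumed).
SAMPLE_BODY = (b"<result><fixed>"
               b"<requestStatus>2</requestStatus>"
               b"<serialNumber>1f</serialNumber>"
               b"</fixed></result>")


def fake_cms_request(url):
    """Stand-in for the HTTP request to the CMS server."""
    return SAMPLE_BODY


def parse_example_xml(doc):
    """Parse routine: normalize names and data types into a response dict."""
    response = {}
    status = doc.find("./fixed/requestStatus")
    if status is not None:
        response["request_status"] = int(status.text)
    serial = doc.find("./fixed/serialNumber")
    if serial is not None:
        # CMS returns serial numbers as hexadecimal without a radix prefix.
        response["serial_number"] = int(serial.text, 16)
    return response


def example_command(serial_number):
    """Hypothetical IPA command following the steps listed above."""
    url = ("https://ca.example.com/ca/example?xml=true&serialNumber=%d"
           % serial_number)                      # formulate URL
    body = fake_cms_request(url)                 # make request
    doc = ET.fromstring(body)                    # extract XML document
    parse_result = parse_example_xml(doc)        # parse into response dict
    # Pick only the items the command wants to return.
    return {"serial_number": str(parse_result["serial_number"])}


print(example_command(31))
```

The command never touches raw CMS names or string-typed numbers; it only sees the normalized parse result.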

Serial Numbers:

Serial numbers are integral values of any magnitude because they are based on ASN.1 integers. CMS uses the Java BigInteger to represent these. Fortunately Python also has support for big integers via the Python long() object. Any BigIntegers we receive from CMS as a string can be parsed into a Python long without loss of information.

However, Python has a neat trick. It normally represents integers via the int object, which internally uses the native C long type. If you create an int object by passing the int constructor a string, it will check the magnitude of the value. If the value fits in a C long, it returns an int object. If the value is too big for a C long, it returns a Python long object instead. This is a very nice property because it's much more efficient to use C long types when possible (i.e. Python int), but when necessary you'll get a Python long object to handle large magnitude values. Python also handles type promotion transparently between int and long objects. For example, if you multiply two int objects you may get back a long object if necessary. In general Python int and long objects may be freely mixed without the programmer needing to be aware of which type of integral object is being operated on.

This leads to the following rule: always parse a string representing an integral value using the int() constructor, even if it might have large magnitude, because Python will return either an int or a long automatically. By the same token, don't test for the type of an object being int exclusively because it could be either an int or a long object.
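
A small sketch of this rule follows. The helper names are ours, not part of the module. The type test uses numbers.Integral from the standard library, which accepts both int and long on Python 2; on Python 3 there is only int, which is itself arbitrary precision, so the same code applies unchanged.

```python
import numbers


def parse_integral(text, radix=10):
    """Always use int(); Python picks a representation that fits the value."""
    return int(text, radix)


def is_integral(value):
    """Type test that does not single out one concrete integer type."""
    return isinstance(value, numbers.Integral)


small = parse_integral("42")
big = parse_integral("340282366920938463463374607431768211456")  # 2**128

# Small and large values mix freely in arithmetic.
product = small * big
print(is_integral(small), is_integral(big), is_integral("42"))
```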

Internally we should always use int or long objects to hold integral values. This lets us compare them correctly, frees us from having to know the radix of a string, lets us perform arithmetic operations, and lets us convert to a string representation (with the correct radix) when necessary. In other words, internally we should never handle integral values as strings.

However, the XMLRPC transport cannot properly handle a Python long object. The XMLRPC encoder, upon seeing a Python long, will test whether the value fits within the range of a 32-bit integer. If so it passes the integer parameter, otherwise it raises an OverflowError exception. The XMLRPC specification does permit 64-bit integers (e.g. i8) and the Python XMLRPC module could allow long values within the 64-bit range to be passed if it were patched. However, this only moves the problem; it does not solve passing big integers through XMLRPC. Thus we must always pass big integers as strings through the XMLRPC interface. Upon receiving such a value from XMLRPC we should convert it back into an int or long object. Recall also that Python will automatically perform a conversion to string if you output the int or long object in a string context.
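
The behaviour can be observed with the standard library marshaller. The sketch below uses the Python 3 module xmlrpc.client (the counterpart of the Python 2 xmlrpclib module); the serial number value is arbitrary.

```python
import xmlrpc.client

big_serial = 2 ** 70  # arbitrary value well beyond the 32-bit range

# Marshalling the integer directly is refused by the encoder.
try:
    xmlrpc.client.dumps((big_serial,))
    overflowed = False
except OverflowError:
    overflowed = True

# Passing the decimal string instead survives the round trip.
wire = xmlrpc.client.dumps((str(big_serial),))
(received,), _method = xmlrpc.client.loads(wire)
restored = int(received)  # convert back to an integer on receipt

print(overflowed, restored == big_serial)
```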

Radix Issues:

CMS uses the following conventions: Serial numbers are always returned as hexadecimal strings without a radix prefix. When CMS takes a serial number as input it accepts the value in either decimal or hexadecimal utilizing the radix prefix (e.g. 0x) to determine how to parse the value.

IPA has adopted the convention that all integral values in the user interface will use base 10 decimal radix.

Basic rules on handling these values:

A serial number read from CMS is a hexadecimal string. Convert it into a Python int or long object using the int constructor with radix 16:
```
>>> serial_number = int(serial_number, 16)
```
Big integers passed to XMLRPC must be decimal unicode strings:
```
>>> unicode(serial_number)
```
Big integers received from XMLRPC must be converted back to int or long objects from the decimal string representation.
```
>>> serial_number = int(serial_number)
```
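
Putting the three rules together, a round trip might look like the sketch below. The helper names are hypothetical. The conversion back to CMS uses an explicit 0x prefix, following the convention that CMS relies on the radix prefix to recognize hexadecimal input. On Python 3, str() plays the role of unicode().

```python
def serial_from_cms(hex_text):
    """CMS returns hexadecimal without a radix prefix."""
    return int(hex_text, 16)


def serial_to_cms(serial_number):
    """CMS accepts hexadecimal input when it carries the 0x prefix."""
    return "0x%x" % serial_number


def serial_to_xmlrpc(serial_number):
    """Big integers cross XMLRPC as decimal strings."""
    return str(serial_number)


def serial_from_xmlrpc(decimal_text):
    """Convert the decimal string back to an integer on receipt."""
    return int(decimal_text)


serial = serial_from_cms("ff")
print(serial, serial_to_cms(serial), serial_to_xmlrpc(serial))
```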

Xpath pattern matching on node names:

There are many excellent tutorials on how to use xpath to find items in an XML document, so there is no need to repeat that information here. However, most xpath tutorials assume the node names you're searching for are fixed. For example:

doc.xpath('//book/chapter[*]/section[2]')

Selects the second section of every chapter of the book. In this example the node names 'book', 'chapter', 'section' are fixed. But what if the XML document embedded the chapter number in the node name, for example 'chapter1', 'chapter2', etc.? (If you're thinking this would be incredibly lame, you're right, but sadly people do things like this). Thus in this case you can't use the node name 'chapter' in the xpath location step because it's not fixed and hence won't match 'chapter1', 'chapter2', etc. The solution to this seems obvious, use some type of pattern matching on the node name. Unfortunately this advanced use of xpath is seldom discussed in tutorials and it's not obvious how to do it. Here are some hints.

Use the built-in xpath string functions. Most of the examples illustrate the string function being passed the text contents of the node via '.' or string(.). However we don't want to pass the contents of the node, instead we want to pass the node name. To do this use the name() function. One way we could solve the chapter problem above is by using a predicate which says if the node name begins with 'chapter' it's a match. Here is how you can do that.

>>> doc.xpath("//book/*[starts-with(name(), 'chapter')]/section[2]")

The built-in starts-with() returns true if its first argument starts with its second argument. Thus the example above says if the node name of the second location step begins with 'chapter' consider it a match and the search proceeds to the next location step, which in this example is any node named 'section'.

But what if we would like to utilize the power of regular expressions to perform the test against the node name? In this case we can use the EXSLT regular expression extension. EXSLT extensions are accessed by using XML namespaces. The regular expression namespace prefix is 're:'. In lxml we need to pass a set of namespaces to the XPath object constructor in order to allow it to bind to those namespaces during its evaluation. Then we just use the EXSLT regular expression match() function on the node name. Here is how this is done:

>>> regexpNS = "http://exslt.org/regular-expressions"
>>> find = etree.XPath("//book/*[re:match(name(), '^chapter\d+$')]/section[2]",
...                    namespaces={'re':regexpNS})
>>> find(doc)

What is happening here is that etree.XPath() has returned us an evaluator function which we bind to the name 'find'. We've passed it a set of namespaces as a dict via the 'namespaces' keyword parameter of etree.XPath(). The predicate for the second location step uses the 're:' namespace to find the function name 'match'. The re:match() function takes a string to search as its first argument and a regular expression pattern as its second argument. In this example the string to search is the node name of the location step because we called the built-in name() function of XPath. The regular expression pattern we've passed says it's a match if the string begins with 'chapter', is followed by one or more digits, and nothing else follows.
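
For a self-contained illustration of both techniques, the sketch below builds a small made-up document whose chapter nodes carry their number in the node name, then selects the second section of each chapter first with starts-with() and then with the EXSLT regular expression function. It assumes the lxml package is installed; the document content is invented for the example.

```python
from lxml import etree

# Invented document: chapter numbers are embedded in the node names.
doc = etree.fromstring(
    b"<book>"
    b"<chapter1><section>a</section><section>b</section></chapter1>"
    b"<chapter2><section>c</section><section>d</section></chapter2>"
    b"<appendix><section>x</section><section>y</section></appendix>"
    b"</book>"
)

# Match on the node name with a built-in XPath string function.
by_prefix = doc.xpath("//book/*[starts-with(name(), 'chapter')]/section[2]")

# Match on the node name with the EXSLT regular expression extension.
regexp_ns = "http://exslt.org/regular-expressions"
find = etree.XPath(
    r"//book/*[re:match(name(), '^chapter\d+$')]/section[2]",
    namespaces={"re": regexp_ns},
)
by_regex = find(doc)

print([node.text for node in by_prefix])
print([node.text for node in by_regex])
```

Both queries skip the appendix node because its name does not begin with 'chapter'.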

parse_and_set_boolean_xml(node, response, response_name)

Read the value out of an XML text node and interpret it as a boolean value. The text value is stripped of whitespace and converted to lower case prior to interpretation.

If the value is recognized, the response dict is updated using response_name as the key, and the value is set to the bool True or False depending on the interpretation of the text value. If the text value is not recognized a ValueError exception is raised.

Text values which result in True:

true
yes
on

Text values which result in False:

false
no
off

Parameters:

node - xml node object containing value to parse for boolean result
response - response dict to set boolean result in
response_name - name of the response value to set

Raises:

ValueError
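
The behaviour described above can be sketched as follows. This is an illustrative reimplementation written from the description, not the module's source; the function name is deliberately different, the error message wording is an assumption, and ElementTree elements stand in for lxml nodes (both expose the node text as .text).

```python
import xml.etree.ElementTree as ET

TRUE_VALUES = ("true", "yes", "on")
FALSE_VALUES = ("false", "no", "off")


def set_boolean_from_node(node, response, response_name):
    """Interpret the node text as a bool and store it in the response dict."""
    value = (node.text or "").strip().lower()
    if value in TRUE_VALUES:
        response[response_name] = True
    elif value in FALSE_VALUES:
        response[response_name] = False
    else:
        raise ValueError('unable to convert "%s" to boolean' % value)


response = {}
set_boolean_from_node(ET.fromstring("<dirEnabled> Yes </dirEnabled>"),
                      response, "dir_enabled")
set_boolean_from_node(ET.fromstring("<updateCRL>off</updateCRL>"),
                      response, "update_crl")
print(response)
```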

parse_error_template_xml(doc)

CMS currently returns errors via XML as either a "template" document (generated by CMSServlet.outputXML()) or a "response" document (generated by CMSServlet.outputError()).

This routine is used to parse a "template" style error or exception document.

This routine should be used when the CMS requestStatus is ERROR or EXCEPTION. It is capable of parsing both. A CMS ERROR occurs when a known, anticipated error condition occurs (e.g. asking for an item which does not exist). A CMS EXCEPTION occurs when an exception is thrown in the CMS server and is not caught and converted into an ERROR. Think of an EXCEPTION as the "catch all" error situation.

ERRORs and EXCEPTIONs both have error message strings associated with them. For an ERROR it's errorDetails, for an EXCEPTION it's unexpectedError. In addition an EXCEPTION may include an array of additional error strings in its errorDescription field.

After parsing the results are returned in a result dict. The following table illustrates the mapping from the CMS data item to what may be found in the result dict. If a CMS data item is absent it will also be absent in the result dict.

cms name	cms type	result name	result type
requestStatus	int	request_status	int
errorDetails	string	error_string [1]	unicode
unexpectedError	string	error_string [1]	unicode
errorDescription	[string]	error_descriptions	[unicode]
authority	string	authority	unicode

[1]	errorDetails is the error message string when the requestStatus is ERROR. unexpectedError is the error message string when the requestStatus is EXCEPTION. This routine recognizes both ERRORs and EXCEPTIONs and, depending on which is found, folds the error message into the error_string result value.

Parameters:

doc - The root node of the xml document to parse

Returns:

result dict
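
The folding of errorDetails and unexpectedError into a single error_string key can be sketched like this. The sketch starts from a plain dict of values already pulled out of the document, so it makes no claim about the XML layout; the function name, the sample values and the requestStatus number are hypothetical.

```python
def fold_error_fields(cms_values):
    """Map raw CMS error items onto the normalized result names.

    cms_values is a dict keyed by cms name; absent items stay absent
    in the result, as in the table above.
    """
    result = {}
    if "requestStatus" in cms_values:
        result["request_status"] = int(cms_values["requestStatus"])
    # An ERROR carries errorDetails, an EXCEPTION carries unexpectedError;
    # whichever is present ends up under the same result name.
    for cms_name in ("errorDetails", "unexpectedError"):
        if cms_name in cms_values:
            result["error_string"] = cms_values[cms_name]
    if "errorDescription" in cms_values:
        result["error_descriptions"] = list(cms_values["errorDescription"])
    if "authority" in cms_values:
        result["authority"] = cms_values["authority"]
    return result


# Placeholder values for illustration only.
print(fold_error_fields({"requestStatus": "6",
                         "errorDetails": "Record not found"}))
```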

parse_error_response_xml(doc)

CMS currently returns errors via XML as either a "template" document (generated by CMSServlet.outputXML()) or a "response" document (generated by CMSServlet.outputError()).

This routine is used to parse a "response" style error document.

cms name	cms type	result name	result type
Status	int	error_code	int [1]
Error	string	error_string	unicode
RequestID	string	request_id	string

[1]	error code may be one of: CMS_SUCCESS = 0, CMS_FAILURE = 1, CMS_AUTH_FAILURE = 2. However, profileSubmit sometimes also returns these values: EXCEPTION = 1, DEFERRED = 2, REJECTED = 3.

Parameters:

doc - The root node of the xml document to parse

Returns:

result dict
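
Because the same numeric code can mean different things depending on the servlet, a caller turning error_code into a readable name has to know which table applies. The sketch below only illustrates that point using the values listed in the note above; the function name and the from_profile_submit flag are hypothetical.

```python
# Values taken from the note above.
CMS_SUCCESS = 0
CMS_FAILURE = 1
CMS_AUTH_FAILURE = 2

GENERAL_CODE_NAMES = {
    CMS_SUCCESS: "SUCCESS",
    CMS_FAILURE: "FAILURE",
    CMS_AUTH_FAILURE: "AUTH_FAILURE",
}

# profileSubmit sometimes reuses the small integers with other meanings.
PROFILE_SUBMIT_CODE_NAMES = {
    0: "SUCCESS",
    1: "EXCEPTION",
    2: "DEFERRED",
    3: "REJECTED",
}


def error_code_name(error_code, from_profile_submit=False):
    """Return a readable name for an error code, or a fallback string."""
    table = PROFILE_SUBMIT_CODE_NAMES if from_profile_submit else GENERAL_CODE_NAMES
    return table.get(error_code, "unknown(%d)" % error_code)


print(error_code_name(2), error_code_name(2, from_profile_submit=True))
```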

parse_profile_submit_result_xml(doc)

CMS returns an error code and an array of request records.

This function returns a response dict with the following format: {'error_code' : int, 'requests' : [{}]}

The mapping of fields and data types is illustrated in the following table.

If the error_code is not SUCCESS then the response dict will have the contents described in parse_error_response_xml.

cms name	cms type	result name	result type
Status	int	error_code	int
Requests[].Id	string	requests[].request_id	unicode
Requests[].SubjectDN	string	requests[].subject	unicode
Requests[].serialno	BigInteger	requests[].serial_number	int|long
Requests[].b64	string	requests[].certificate	unicode [1]
Requests[].pkcs7	string

[1]	Base64 encoded

Parameters:

doc - The root node of the xml document to parse

Returns:

result dict

Raises:

ValueError
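
As an illustration of the response dict shape described above, the sketch below builds one by hand and shows a caller picking out the serial numbers as decimal strings. All values are placeholders, the certificate is not real Base64 data, and the helper name is hypothetical.

```python
# Hand-built response dict in the documented shape (placeholder values).
response = {
    "error_code": 0,
    "requests": [
        {
            "request_id": "7",
            "subject": "CN=host.example.com",
            "serial_number": int("1a", 16),   # already an integer, not hex text
            "certificate": "<base64 data>",   # placeholder, not a real certificate
        },
    ],
}


def issued_serial_numbers(response):
    """Return the serial numbers of all request records as decimal strings."""
    return [str(request["serial_number"])
            for request in response.get("requests", [])
            if "serial_number" in request]


print(issued_serial_numbers(response))
```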

parse_check_request_result_xml(doc)

If the requestStatus is not SUCCESS then the response dict will have the contents described in parse_error_template_xml.

cms name	cms type	result name	result type
authority	string	authority	unicode
requestId	string	request_id	string
status	string	cert_request_status	unicode [1]
createdOn	long, timestamp	created_on	datetime.datetime
updatedOn	long, timestamp	updated_on	datetime.datetime
requestNotes	string	request_notes	unicode
pkcs7ChainBase64	string	pkcs7_chain	unicode [2]
cmcFullEnrollmentResponse	string	full_response	unicode [2]
records[].serialNumber	BigInteger	serial_numbers	[int|long]

[1]	cert_request_status may be one of: "begin", "pending", "approved", "svc_pending", "canceled", "rejected", "complete"

[2]	Base64 encoded

Parameters:

doc - The root node of the xml document to parse

Returns:

result dict

Raises:

ValueError

parse_display_cert_xml(doc)

If the requestStatus is not SUCCESS then the response dict will have the contents described in parse_error_template_xml.

cms name	cms type	result name	result type
emailCert	Boolean	email_cert	bool
noCertImport	Boolean	no_cert_import	bool
revocationReason	int	revocation_reason	int [1]
certPrettyPrint	string	cert_pretty	unicode
authorityid	string	authority	unicode
certFingerprint	string	fingerprint	unicode
certChainBase64	string	certificate	unicode [2]
serialNumber	string	serial_number	int|long
pkcs7ChainBase64	string	pkcs7_chain	unicode [2]

[1]	revocation reason may be one of: 0 = UNSPECIFIED, 1 = KEY_COMPROMISE, 2 = CA_COMPROMISE, 3 = AFFILIATION_CHANGED, 4 = SUPERSEDED, 5 = CESSATION_OF_OPERATION, 6 = CERTIFICATE_HOLD, 8 = REMOVE_FROM_CRL, 9 = PRIVILEGE_WITHDRAWN, 10 = AA_COMPROMISE

[2]	Base64 encoded

Parameters:

doc - The root node of the xml document to parse

Returns:

result dict

Raises:

ValueError
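
A caller that wants a readable name for the integer revocation_reason could keep the values from note [1] in an enumeration. The class and helper names below are ours; only the numeric values and names come from the note. The value 7 is not listed there, so the helper returns None for it.

```python
from enum import IntEnum


class RevocationReason(IntEnum):
    """Revocation reason codes as listed in note [1]."""
    UNSPECIFIED = 0
    KEY_COMPROMISE = 1
    CA_COMPROMISE = 2
    AFFILIATION_CHANGED = 3
    SUPERSEDED = 4
    CESSATION_OF_OPERATION = 5
    CERTIFICATE_HOLD = 6
    REMOVE_FROM_CRL = 8
    PRIVILEGE_WITHDRAWN = 9
    AA_COMPROMISE = 10


def revocation_reason_name(code):
    """Return the reason name, or None for a code that is not listed."""
    try:
        return RevocationReason(code).name
    except ValueError:
        return None


print(revocation_reason_name(6), revocation_reason_name(7))
```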

parse_revoke_cert_xml(doc)

If the requestStatus is not SUCCESS then the response dict will have the contents described in parse_error_template_xml.

cms name	cms type	result name	result type
dirEnabled	string [1]	dir_enabled	bool
certsUpdated	int	certs_updated	int
certsToUpdate	int	certs_to_update	int
error	string [2]	error_string	unicode
revoked	string [3]	revoked	unicode
totalRecordCount	int	total_record_count	int
updateCRL	string [1] [4]	update_crl	bool
updateCRLSuccess	string [1] [4]	update_crl_success	bool
updateCRLError	string [4]	update_crl_error	unicode
publishCRLSuccess	string [1] [4]	publish_crl_success	bool
publishCRLError	string [4]	publish_crl_error	unicode
crlUpdateStatus	string [1] [5]	crl_update_status	bool
crlUpdateError	string [5]	crl_update_error	unicode
crlPublishStatus	string [1] [5]	crl_publish_status	bool
crlPublishError	string [5]	crl_publish_error	unicode
records[].serialNumber	BigInteger	records[].serial_number	int|long
records[].error	string [2]	records[].error_string	unicode

[1]	String value is either "yes" or "no"

[2]	Sometimes the error string is empty (null)

[3]	revoked may be one of: "yes", "no", "begin", "pending", "approved", "svc_pending", "canceled", "rejected", "complete"

[4]	Only sent if CRL update information is available. If sent, its only value is "yes". If sent, then the following values may also be sent, otherwise they will be absent: updateCRLSuccess, updateCRLError, publishCRLSuccess, publishCRLError

[5]	The cms name varies depending on whether the issuing point is MasterCRL or not. If the issuing point is not the MasterCRL then the cms name will be appended with an underscore and the issuing point name. Thus, for example, the cms name crlUpdateStatus will be crlUpdateStatus if the issuing point is the MasterCRL. However, if the issuing point is "foobar" then crlUpdateStatus will be crlUpdateStatus_foobar. When we return the response dict the key will always be the "base" name without the _issuing_point suffix. Thus crlUpdateStatus_foobar will appear in the response dict under the key 'crl_update_status'.

Parameters:

doc - The root node of the xml document to parse

Returns:

result dict

Raises:

ValueError
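
The issuing point rule in note [5] can be sketched as a small name-normalizing helper. This is written from the description only and is not the module's implementation; the function name is hypothetical, and the mapping covers just the four names marked with [5] in the table.

```python
# The four cms names that may carry an _issuing_point suffix (note [5]).
BASE_NAMES = {
    "crlUpdateStatus": "crl_update_status",
    "crlUpdateError": "crl_update_error",
    "crlPublishStatus": "crl_publish_status",
    "crlPublishError": "crl_publish_error",
}


def normalize_crl_name(cms_name):
    """Return the response dict key for a cms name, ignoring any
    _issuing_point suffix, or None if the name is not one of the four."""
    for base, result_name in BASE_NAMES.items():
        if cms_name == base or cms_name.startswith(base + "_"):
            return result_name
    return None


print(normalize_crl_name("crlUpdateStatus"))
print(normalize_crl_name("crlUpdateStatus_foobar"))
```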

parse_unrevoke_cert_xml(doc)

If the requestStatus is not SUCCESS then the response dict will have the contents described in parse_error_template_xml.

cms name	cms type	result name	result type
dirEnabled	string [1]	dir_enabled	bool
dirUpdated	string [1]	dir_updated	bool
error	string	error_string	unicode
unrevoked	string [3]	unrevoked	unicode
updateCRL	string [1] [4]	update_crl	bool
updateCRLSuccess	string [1] [4]	update_crl_success	bool
updateCRLError	string [4]	update_crl_error	unicode
publishCRLSuccess	string [1] [4]	publish_crl_success	bool
publishCRLError	string [4]	publish_crl_error	unicode
crlUpdateStatus	string [1] [5]	crl_update_status	bool
crlUpdateError	string [5]	crl_update_error	unicode
crlPublishStatus	string [1] [5]	crl_publish_status	bool
crlPublishError	string [5]	crl_publish_error	unicode
serialNumber	BigInteger	serial_number	int|long

[1]	String value is either "yes" or "no"

[3]	unrevoked may be one of: "yes", "no", "pending"

[4]	Only sent if CRL update information is available. If sent, its only value is "yes". If sent, then the following values may also be sent, otherwise they will be absent: updateCRLSuccess, updateCRLError, publishCRLSuccess, publishCRLError

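[5]	The cms name varies depending on whether the issuing point is MasterCRL or not; see note [5] of parse_revoke_cert_xml.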

Parameters:

doc - The root node of the xml document to parse

Returns:

result dict

Raises:

ValueError

Module dogtag

Overview of interacting with CMS:

Parse Results vs. IPA command results:

Serial Numbers:

Radix Issues:

Xpath pattern matching on node names:

cms_request_status_to_string(request_status)

cms_error_code_to_string(error_code)

parse_and_set_boolean_xml(node, response, response_name)

get_error_code_xml(doc)

get_request_status_xml(doc)

parse_error_template_xml(doc)

parse_error_response_xml(doc)

parse_profile_submit_result_xml(doc)

parse_check_request_result_xml(doc)

parse_display_cert_xml(doc)

parse_revoke_cert_xml(doc)

parse_unrevoke_cert_xml(doc)