Federation Mapping

Introduction

With federated authentication, a trusted external IdP (Identity Provider) authenticates an entity and provides attributes associated with the authenticated entity, such as full name, organization, location, group membership and roles, to the local authorization system. The remote identity attributes usually do not correspond directly to the local identity attributes, so they must be mapped or transformed to be consistent with them.

Problems with existing contributed OpenStack federation mapping

A federated mapping implementation based on static rules was contributed to OpenStack. However, that mapping system lacks basic features required in real world deployments. Examples of the deficiencies are:

Replacement substitutions are specified by an index, e.g. {0}, making them difficult to use.
- If you edit a rule, all the indices shift and you have to carefully adjust multiple locations. This is tedious and error prone.
- Numeric replacements are not friendly; it's difficult to remember which index maps to which value. Likewise, it's difficult to read a rule with indexed substitutions and understand what is being replaced, because a number provides no context for the reader.
Impossible to specify how strings containing lists of values are to be split.
Any string containing a colon is automatically split regardless of the semantics of the string.
Impossible to perform tests and perform conditional logic. For example you cannot test if the user is a member of a group and if so add a role based on the group membership. You cannot test if a list is empty, has a certain number of members, if a value starts with a prefix or ends with a suffix, etc.
No mechanism to handle case sensitivity.
Cannot split direct map value into multiple values, e.g. user@domain cannot be returned as ['user', 'domain'], i.e. {0}, {1}. This requires more robust regular expression support than is provided.
Cannot operate on substrings or do replacements. Common operations such as replacing hyphens with underscores or stripping off prefixes or suffixes are impossible.
Direct map values cannot be reassigned by later rules. For example, email might be present in the assertion and so be direct mapped, but if it is absent one cannot synthesize it from other values in the assertion, e.g. user + @ + domain.
Difficult to build values by string interpolation or concatenation.
The local array elements are dicts whose keys can contain 'user' or 'group' or both, but you can't have more than one user or group in an element, and elements that define a user more than once produce an error.
Logical OR requires cut-n-paste copies of many rules with only a minor difference in each rule. Rules quickly become unreadable and their logic difficult to ascertain.
not_any_of does not work when regex is True; enabling regex effectively changes the condition to any_one_of. (This is a bug that can be fixed.)
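
Several of the gaps above (positional indices, no way to split a direct mapped value, no way to synthesize a missing value) are easy to express with named regular expression groups. The following Python sketch only illustrates the desired behavior; it is not part of the existing OpenStack mapping code. The assertion keys 'principal' and 'email', the function name and the pattern are all assumptions made for the example.

```python
import re

# Hypothetical pattern: named groups take the place of positional
# indices such as {0} and {1}.
PRINCIPAL_RE = re.compile(r"^(?P<user>[^@]+)@(?P<domain>[^@]+)$")


def split_principal(assertion):
    """Return 'user', 'domain' and 'email' derived from the assertion.

    The keys 'principal' and 'email' are assumed names for this example.
    """
    match = PRINCIPAL_RE.match(assertion.get("principal", ""))
    if match is None:
        raise ValueError("principal is not of the form user@domain")
    result = match.groupdict()  # keys are the group names: user, domain
    # Use the asserted email if present, otherwise synthesize it.
    result["email"] = assertion.get("email") or "{user}@{domain}".format(**result)
    return result
```

Because the substitutions are named, reordering or adding groups in the pattern does not force any renumbering elsewhere.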

Examples of real world tasks the current mapping cannot handle

I've worked with RADIUS for years. RADIUS is often configured to operate in a federated mode where the attributes supplied to the RADIUS server have to be manipulated to match local conventions and policies. Below are some of the most common issues I've seen admins have to tackle.

Split the realm from the username. Assign the username and realm independently in the result. This is by far the most common issue admins raise.
Strip prefixes and suffixes and/or take a prefix or suffix and map it to a new independent value. Believe it or not many organizations embed group/role information in their usernames. Usernames such as "johndoe_staff" where the username is "johndoe" and role is "staff" are depressingly common.
Behave differently depending on the realm, the IdP, a DNS name or a network address.
Test for membership in a collection. Examples are whitelists, blacklists, groups, etc. The result of the membership test modifies how the user is ultimately mapped and the privileges they receive.
Search for and/or extract substrings, which usually demands regular expression support. The result of the regular expression search may alter the flow of control.
Filter certain characters.
Convert to lower case or upper case.
Test for empty or absent values.
Build compound values from a series of conditions.
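
To make the list above concrete, here is a minimal Python sketch that combines several of these tasks in one function: testing for an empty value, splitting the realm, lower-casing, a membership test, turning a suffix into a role, and filtering characters. Every name in it (the 'username' key, the suffix table, the allowed realms) is hypothetical and chosen only for illustration.

```python
# All names below (attribute keys, the suffix-to-role table, the allow
# list) are hypothetical and exist only to illustrate the tasks above.
ROLE_SUFFIXES = {"_staff": "staff", "_student": "student"}
ALLOWED_REALMS = {"example.com", "example.org"}


def normalize(assertion):
    """Map a raw username such as 'JohnDoe_staff@Example.COM' to local values."""
    raw = assertion.get("username", "").strip()
    if not raw:
        raise ValueError("username is empty or absent")

    # Split the realm from the username and lower-case both parts.
    user, _, realm = raw.lower().partition("@")

    # Membership test that changes the outcome of the mapping.
    if realm not in ALLOWED_REALMS:
        raise ValueError("realm %r is not allowed" % realm)

    # Take a known suffix and turn it into an independent role value.
    roles = []
    for suffix, role in ROLE_SUFFIXES.items():
        if user.endswith(suffix):
            user = user[: -len(suffix)]
            roles.append(role)
            break

    # Filter characters: replace hyphens with underscores.
    user = user.replace("-", "_")
    return {"user": user, "realm": realm, "roles": roles}
```

Each step is a test, an assignment to an intermediate variable, or a basic string function, which is exactly the kind of operation a pure lookup-and-copy rule cannot express.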

Mapping requirements

The types of data transformations required by real world exchanges with foreign identity systems demand the ability to make comparisons, perform tests, assign values to variables, and call basic transformation and regular expression functions.

Simple lookups on source values which are then copied over to destination values are not sufficient.

Ideal Federation Mapping

Federation mapping is simply transforming one set of attributes into another set of attributes. JSON notation provides a rich yet simple way to express a wide range of attributes and values.

Federation mapping should be based on a simple input/output filter model. The mapper receives a JSON document containing the validated assertions from an external IdP. The mapper examines the contents of the JSON assertion and returns a JSON document of values mapped to the local authorization environment or an error indicator if the mapping cannot be performed.
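
The input/output filter model can be sketched in Python as a single function that takes a JSON document and either returns a JSON document or signals an error. This is only one possible shape for such an interface; the exception type, the function name, and the input and output keys are assumptions, not a defined schema.

```python
import json


class MappingError(Exception):
    """Raised when an assertion cannot be mapped (hypothetical error type)."""


def map_assertion(assertion_json):
    """Accept a JSON assertion document and return a JSON mapping document."""
    try:
        assertion = json.loads(assertion_json)
    except ValueError as exc:  # json.JSONDecodeError is a ValueError
        raise MappingError("assertion is not valid JSON: %s" % exc)
    if not isinstance(assertion, dict):
        raise MappingError("assertion must be a JSON object")

    user = assertion.get("username")
    if not user:
        raise MappingError("assertion has no username")

    # The output keys are assumptions for this example, not a defined schema.
    mapped = {
        "user": {"name": user},
        "groups": sorted(assertion.get("groups", [])),
    }
    return json.dumps(mapped, sort_keys=True)
```

The caller never sees the mapper's internals: it hands over one document and receives another, or an error.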

Suggested solutions

The two most viable ways to address the current deficiencies are:

Embedded Scripting Language
Enhanced Rule Mapping

Embedded Scripting Language

Transforming one JSON document into another JSON document is trivial for modern scripting languages. Rather than defining the mapping via a set of rules, a script is executed whose input is a JSON document of assertion values and whose output is a JSON document of mapped values. The script has full access to all the features of a modern programming language, so there are virtually no limitations on what the script can do, and the implementation can be clear and straightforward (easy to read and understand). Usually a script will be much simpler than enumerating a complex set of static rules. Scripts are also easier to debug than static rules. Most administrators have the ability to program in a scripting language. Learning how to use a rule based mapping system is often just as difficult as learning script fundamentals, but unlike script languages, rule systems are not general purpose. There is more value in learning a script language than a one-off rule system.

Script based transformations could be implemented in one of three ways:

1. Embed the script interpreter into the running process.
2. Fork a script interpreter for each script evaluation.
3. Run a separate service implemented in the scripting language and pass the JSON documents via inter-process communication.

Option 1 (embedded) offers performance and ease of implementation. Many modern script interpreters can easily be embedded into programs, with varying degrees of support for data exchange. ECMAScript (i.e. JavaScript), Python and Perl are obvious candidates [1].
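
When both the host process and the mapping script are Python, embedding can be as small as the sketch below. It is a minimal illustration rather than a proposed implementation: the convention that the script reads a variable named `assertion` and sets a variable named `mapped` is an assumption, as are the script contents and the function name.

```python
# A mapping script as an administrator might supply it (hypothetical content).
# By the convention assumed here, it reads `assertion` and sets `mapped`.
MAPPING_SCRIPT = """
user, _, realm = assertion["username"].partition("@")
mapped = {"user": user, "realm": realm}
if "admins" in assertion.get("groups", []):
    mapped["roles"] = ["admin"]
"""


def run_embedded(script_source, assertion):
    """Execute the script in the current process and return its `mapped` value."""
    code = compile(script_source, "<mapping-script>", "exec")
    namespace = {"assertion": dict(assertion)}  # shallow copy of the input
    # NOTE: exec() is not a sandbox; the script runs with the full privileges
    # of this process, so it must come from a trusted administrator.
    exec(code, namespace)
    if "mapped" not in namespace:
        raise RuntimeError("mapping script did not set 'mapped'")
    return namespace["mapped"]
```

The data exchange is a plain dictionary in and a plain dictionary out, with no serialization step and no extra process.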

However, the embedded option has one potential problem which deserves serious consideration. For security reasons (and general robustness reasons) you do not want the script to be able to access internal data or execute operations in the context of a privileged process. Running scripts provided by general users should always be prohibited lest a security breach occur. In this context, however, the mapping scripts are provided by trusted administrators with high levels of privilege. We should be able to assume these scripts are safe and not nefarious. The same administrators usually have enough permission to do real damage regardless of the script loading issue, so rejecting embedded scripts provided by administrators on security grounds does not have merit. Note that options 2 (fork) and 3 (service) do not expose the primary process to the same potential security breaches.

Option 2 (forking) is too heavyweight and severely hampers throughput.
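
For comparison, a fork-per-evaluation mapper might look like the Python sketch below, which starts a fresh interpreter for every assertion and exchanges JSON over stdin and stdout. The child script, the function name and the timeout value are assumptions made for illustration. The point to notice is that each call pays the full cost of process creation and interpreter startup.

```python
import json
import subprocess
import sys

# Hypothetical mapping script: reads a JSON assertion on stdin and writes
# the JSON mapping to stdout.
CHILD_SCRIPT = """
import json, sys
assertion = json.load(sys.stdin)
user, _, realm = assertion["username"].partition("@")
json.dump({"user": user, "realm": realm}, sys.stdout)
"""


def run_forked(assertion, timeout=5.0):
    """Map one assertion by starting a new interpreter process for it."""
    proc = subprocess.run(
        [sys.executable, "-c", CHILD_SCRIPT],
        input=json.dumps(assertion),
        capture_output=True,
        text=True,
        timeout=timeout,
    )
    if proc.returncode != 0:
        raise RuntimeError("mapping script failed: %s" % proc.stderr.strip())
    return json.loads(proc.stdout)
```

The script cannot touch the parent's memory, which is the isolation benefit of this option, but the per-request process is what limits throughput.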

Option 3 (service) raises deployment issues, for instance the need for a mechanism that starts and stops the service, monitors its health, and locks down both the communication and resources such that nothing is leaked and nothing can be executed which shouldn't be. Some deployments are also concerned with the number of processes and services which are run.

Option 1 (embedded) seems the most viable.

Enhanced Rule Mapping

We've established that real world attribute mapping requires the ability to invoke basic transformations, perform tests, execute conditional logic and build compound values from discrete values. Such operations require the use of intermediate variables.

All of these requirements come for free in a scripting language. But what if we don't want to use a script to perform a transformation and prefer the rule based approach? Our goals in this case are:

Provide the minimal set of functions that cover the maximum number of real world use cases.
Keep it simple! Simplicity aids users and makes the implementation easier to produce.
Do not design a language. Any design that wanders off towards being a language is better replaced by an actual embedded language which is already fully implemented, debugged and familiar to users.
Make everything self-consistent, with no special case exceptions. Consistency aids both users and implementors.
Work as in the ideal case: accept a JSON assertion and emit a JSON mapping.

Prototype Rule Mapping and Request For Comments

A prototype implementation which tries to address the above goals has been completed. It is currently implemented in Python and may be re-implemented in Java for another project. It is documented in Enhanced Federation Rule Mapping.

The approach needs to be evaluated and reviewed. Then a judgment call needs to be made: is a static rule system (even one with the advanced functionality in the prototype) still a better alternative than writing a script and having it executed by an interpreter?

[1]

This federation mapping is being considered for other implementations besides OpenStack (which is Python based). For example, it is under consideration for a Java based project. The best supported embedded interpreters in Java today are ECMAScript (JavaScript) and Python (via Jython). Python also has support for loading an ECMAScript interpreter. Alternatively, the script language could be Python itself, in which case the script would simply be evaluated in the context of the running process (however, there is no effective sandboxing when you eval a Python script inside Python).