Creating large AD test environments from the Linux command line

I’ve recently been working on making SSSD perform better in large environments that contain thousands of users or groups. Because two of the setups I wanted to test were SSSD directly enrolled in an Active Directory domain and SSSD as a client of an IPA server that trusts an Active Directory domain, I needed to create a large Active Directory environment for testing.

Of course I didn’t want to create the users one by one in the AD Users and Computers GUI snap-in. And I preferred to create the users and groups from the Linux command line, because that’s what I’m familiar with. After some tinkering, I realized the adcli command already does exactly what I need with its “create-user”, “create-group” and “add-member” commands.

For example, creating an Active Directory user is as simple as:

 $ adcli create-user --domain=your.ad.domain adusername

To create a group, you’d call:

 $ adcli create-group --domain=your.ad.domain adgroupname

And finally add a user to the group:

 $ adcli add-member --domain=your.ad.domain adgroupname adusername

By default, adcli asks for the AD Administrator’s password, but using an existing Kerberos ccache instead is as simple as adding the “-c” option.

Using these three commands, I wrote three simple shell scripts that helped me create a thousand users, a thousand groups, and the group memberships in one go. The scripts are available at: https://github.com/jhrozek/adcli_scripts
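The linked scripts essentially boil down to a loop over these three commands. Here is a minimal sketch that prints the adcli invocations so they can be reviewed before running them; the domain name and the testuserN/testgroupN naming scheme are illustrative assumptions, not the exact contents of the repository:

```shell
#!/bin/sh
# Emit the adcli commands that create N users, N groups and one
# membership each; review the output, then pipe it to sh to execute.
# DOMAIN and the testuserN/testgroupN names are assumptions.
N=${N:-1000}
DOMAIN=${DOMAIN:-your.ad.domain}

for i in $(seq 1 "$N"); do
    echo "adcli create-user --domain=$DOMAIN testuser$i"
    echo "adcli create-group --domain=$DOMAIN testgroup$i"
    echo "adcli add-member --domain=$DOMAIN testgroup$i testuser$i"
done
```

Piping the output to sh then performs the actual operations, which makes it easy to test a smaller batch first by lowering N.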

Of course, adcli does much more than just create user or group objects in Active Directory. It is a powerful command-line tool for performing actions on an Active Directory domain. You have probably used it already without realizing it: if you ever joined an Active Directory client with realmd, realmd used adcli internally, in particular its “join” command.

On RHEL-6, adcli is already available in EPEL and will be available in RHEL-6.8 as a regular RHEL package. On RHEL-7 and supported Fedora releases, both realmd and adcli are already available in the repositories.

Performance tuning SSSD for large IPA-AD trust deployments

Written by Alexander Bokovoy and Jakub Hrozek

This blog post describes several sssd.conf options that are available for performance tuning of SSSD, focusing especially on a deployment of an IPA server with a trust established with an AD server. Some of the options are useful for other scenarios as well, but it should be noted that deviating from the defaults requires understanding the implications of the change. This post is also written with SSSD versions 1.12 and 1.13 in mind – later versions might have different tunables available or might not need them at all, as we continuously improve performance.

The anatomy of a trusted identity lookup

The typical use case this post addresses is “logging in to my IPA client as an AD user with AD credentials is way too slow”. Later in the article, we’ll be setting several options in the IPA server’s sssd.conf. That might be confusing if you’re not familiar with the architecture of the IPA-AD trust lookups and the authentication data flow – so let’s illustrate it briefly.

The two basic request types the client can perform are identity lookups and authentication. For IPA users, identity lookups are normal LDAP searches against the IPA server, and authentication is performed using an internal kinit-like tool that also connects to the IPA server. Trusted AD users work differently — identity lookups from the client machines reach the IPA server using an LDAP extended operation. The IPA server’s extdom plugin queries the SSSD instance running on the server for the AD user information, that SSSD instance runs a search against the Active Directory server using the LDAP protocol, and the data is passed back to the IPA client. For password-based authentication requests, the IPA clients connect directly to the AD servers using Kerberos.

Keep in mind that because the correct set of groups must be established at authentication time, the authentication request must also perform an identity lookup, typically the initgroups operation, before the authentication. The following diagram illustrates the data flow:

[Diagram: ipa-passwordless-auth – data flow of a passwordless AD user login to an IPA client]

The diagram illustrates the data flow when a Windows user logs in to an IPA client using Kerberos credentials (passwordless):

  1. The Windows user opens an SSH client (such as PuTTY) and logs in to an IPA client
  2. The SSH daemon initiates processing of the Kerberos ticket
  3. The Kerberos tickets of Active Directory users have a blob attached to them called the MS-PAC (Privilege Attribute Certificate, see https://msdn.microsoft.com/en-us/library/cc237917.aspx for the specification). The MS-PAC (or just PAC going forward) contains the complete and precise list of group memberships at the time the user logged in to the Active Directory domain, signed by the domain controller. It is actually the mechanism Windows clients use to set the list of groups the user is a member of. If a PAC is present, a special libkrb5 plugin invokes the sssd_pac responder.
  4. The sssd_pac responder parses the group memberships from the PAC blob during login. This processing needs to resolve the internal identifiers of groups in Active Directory (SIDs) to their names and POSIX IDs, because Linux cannot yet deal with SIDs natively. For each SID that is not stored in the cache yet, the sssd_pac process asks the sssd_be process to translate the SID into a name and/or GID.
  5. The sssd_be process calls an LDAP extended operation towards the IPA server, asking for information about a particular SID.
  6. The extended operation is received by the IPA server’s Directory Server plugin, which in turn queries the SSSD instance running on the IPA server through the standard NSS interface, contacting the sssd_nss process on the IPA server.
  7. If the information about this object’s SID is not present in the cache on the IPA server either, the sssd_be process is asked to resolve the SID and store the corresponding object in the SSSD cache on the server
  8. The sssd_be process on the server searches the Active Directory server’s LDAP store for the corresponding LDAP objects and stores them into the cache. When the object is stored, the flow reverses – the sssd_be process on the server tells the sssd_nss process the cache is up-to-date. The sssd_nss process returns data to the DS plugin on the server, which in turn returns data in the extdom-extop operation reply to the client.
  9. Finally, all the groups from the PAC object are processed. When the SSH daemon on the client opens the session for the user, it calls the initgroups() function to set up the group memberships. By the time this call reaches the sssd_nss responder, all the data has already been cached while processing the PAC blob, so it is returned from the SSSD cache.

Please note that for simplicity, this diagram omits additional actions that might happen during login, such as HBAC access control or setting the SELinux label for the user.

If password-based authentication is used, the data flow is quite similar, except that the sssd_pac responder is not involved until later in the process, when the ticket is acquired on the IPA client itself. With identity lookups (for example, id aduser@ad_domain), the PAC responder is not invoked at all; the lookups are driven entirely by the sssd_nss responder. But even here, the general flow between the IPA client and the IPA server is the same.

Also keep in mind that retrieving members of trusted groups, or retrieving a user’s groups prior to login, requires RHEL-7.1 or RHEL-6.7 packages on the client side. Additionally, the IPA server must be running RHEL-7.1 or newer.

Not all IPA masters are capable of resolving AD users this way. In FreeIPA < 4.2, only IPA masters where the ipa-adtrust-install command was executed can retrieve information from Active Directory domain controllers. Such IPA masters were also used for establishing the actual cross-forest trust to Active Directory, and AD domain controllers were talking back to the IPA masters as part of that flow. FreeIPA 4.2 introduced the concept of ‘AD trust agents’, which allows other IPA masters to resolve AD users and groups without being configured as full domain controllers.

The data flow also has implications for debugging. If it’s not possible to fetch data about a trusted user on the IPA client (i.e. getent passwd aduser@ad_domain doesn’t display the user), first make sure it’s possible to run the same lookup on the IPA server. The error messages on the client typically come from functions that contain s2n in the name, such as:

[ipa_s2n_exop_done] (0x0040): ldap_extended_operation result: Operations error(1), Failed to handle the request.

or:

[ipa_s2n_get_user_done] (0x0040): s2n exop request failed.

The most data-intensive operation is usually retrieving the information about a user’s groups on the server side. If the group objects are not cached yet, that operation can amount to downloading a large amount of data, storing it in the server’s SSSD cache, transferring the data to the client and storing it again in the client’s SSSD cache. Subsequent lookups for the same LDAP objects, even from another client, would then just read the objects from the server-side SSSD cache, eliminating the server-side cost, but the very first search might be costly in a huge environment.

Available tunables

Now that we know how the flow works, we can start optimizing it. Because the heavy lifting of the lookups is done on the server side, changes in the server’s sssd.conf also have the biggest impact on lookup performance.

Server options

The following options should be added to the /etc/sssd/sssd.conf file on the IPA masters.

ignore_group_members
Normally, the most data-intensive operation is downloading the groups, including their members. Usually we are interested in what groups a user is a member of (id aduser@ad_domain) as the initial step, rather than which members a particular group contains (getent group adgroup@ad_domain). Setting the ignore_group_members option to True makes all groups appear empty, thus downloading only information about the group objects themselves and not their members, providing a significant performance boost. Please note that id aduser@ad_domain still returns all the correct groups!

  • Minimal version: For users from the IPA domain, this option was introduced in sssd-1.10. However, for trusted domain users it can only be applied via the subdomain_inherit option (see below).
  • Recommended value: ignore_group_members=True unless your environment requires the output of getent group to also contain members
  • Pros: getgrnam/getgrgid calls are significantly faster
  • Cons: getgrnam/getgrgid calls only return the group information, not the members
ldap_purge_cache_timeout
SSSD 1.12.x runs a cache cleanup operation on startup and then periodically. The cache cleanup operation removes cached entries that have not been used for a while, to make sure the cache doesn’t grow too large. But since SSSD can also operate in offline mode, we must be very conservative about what is removed, otherwise we might lock a user out while the server is unreachable. At the same time, searching the cache and examining the entries can take a lot of CPU time. Therefore, the cache cleanup operation was of limited use and was disabled by default in 1.13. You can disable it in previous versions with a configuration change as well.

  • Minimal SSSD version: All supported SSSD versions have this option.
  • Recommended value: ldap_purge_cache_timeout = 0
  • Pros: the cache cleanup operation was not particularly useful as a periodic task, but took a long time to execute
  • Cons: the cache can grow in size.
subdomain_inherit
The options above can also be applied to the trusted AD domains’ configuration. At the moment, the only supported method is the subdomain_inherit option in the domain section of sssd.conf. Either of the two option names above can be listed as a value of subdomain_inherit, and the option then applies to both the main (IPA) domain and the AD subdomains. In the future we would prefer to add support for sub-sections in sssd.conf, but we’re not there yet. The complete list of options that can be inherited by a subdomain is in the sssd.conf manual page.

  • Minimal version: upstream 1.12.5. However, this option has been backported to both the RHEL-6.7 and RHEL-7.1 updates.
  • Pros/Cons: N/A; this is just a control option that extends the influence of the options above to domains from a trusted AD forest.

Mount the cache in tmpfs

Quite a bit of the time spent processing a request goes into writing LDAP objects to the cache. Because the cache maintains full ACID properties, it syncs to disk with every internal SSSD transaction. On the plus side, this makes sure the cache is available in case of a network outage and remains usable after a machine crash; on the other hand, writing the data takes time. It’s possible to mount the cache into a ramdisk, eliminating the disk I/O cost, by adding the following to /etc/fstab as a single line:

tmpfs /var/lib/sss/db/ tmpfs size=300M,mode=0700,rootcontext=system_u:object_r:sssd_var_lib_t:s0 0 0

Then mount the directory and restart the sssd afterwards:

# mount /var/lib/sss/db/
# systemctl restart sssd

Please tune the size parameter according to your IPA and AD directory sizes. As a rule of thumb, you can use 100MB per 10,000 LDAP entries.
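To turn the rule of thumb into a concrete size= value for the fstab line, a quick back-of-the-envelope computation helps; ENTRIES below is an assumed combined IPA and AD object count for your deployment, not something SSSD reports:

```shell
# Rule of thumb: ~100MB of tmpfs per 10,000 cached LDAP entries,
# rounded up to the next multiple of 100MB.
ENTRIES=${ENTRIES:-30000}
SIZE_MB=$(( (ENTRIES + 9999) / 10000 * 100 ))
echo "size=${SIZE_MB}M"   # prints: size=300M for the default 30000 entries
```

The printed value can be dropped into the tmpfs mount options shown above.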

Making this change on the IPA server is a bit safer than on the IPA clients, because the SSSD instance on the server never loses connectivity to the IPA server, so the cache can always be rebuilt. But if the cache is lost after a reboot and the AD server is unreachable due to a network error or a similar condition, the node will not be able to fall back to cached data about AD users.

  • Pros: I/O operations on the cache are much faster.
  • Cons: The cache does not persist across reboots. In practice, this means the cache must be rebuilt after a machine reboot, and also that cached passwords are lost after a reboot.

All in all, we’re looking at adding the following changes to the server-side sssd.conf:

[domain/domname]
subdomain_inherit = ignore_group_members, ldap_purge_cache_timeout
ignore_group_members = True
ldap_purge_cache_timeout = 0

and optionally an fstab change to remount the database directory in a
ramdisk.

Client options

The following options should be added to /etc/sssd/sssd.conf on the IPA clients.

pam_id_timeout
On a Linux system, a user’s group membership is set for their processes at login time. Therefore, during the PAM conversation, SSSD has to prefer precision over speed and contact the server for accurate information. However, a single login can span multiple PAM requests, as PAM processing is split into several stages – for example, there might be a request for authentication and a separate request for the account check (HBAC). It’s not beneficial to contact the server separately for both requests, so we set a very short timeout for the PAM responder during which requests are answered from the in-memory cache. The default value of 5 seconds might not be enough in cases where complex group memberships are populated on the server and client side. The recommended value of this option is roughly as long as a single un-cached login takes.

  • Recommended value: pam_id_timeout = login_time_in_seconds
  • Pros: The remote server would only be contacted once during a login session
  • Cons: If the group memberships change rapidly on the server side, SSSD might still only use the cached values
krb5_auth_timeout
If the Kerberos ticket has a PAC blob attached to it (see above) and password authentication is used, the krb5_child process parses the PAC blob to help establish the group memberships. This processing might take longer than the time the krb5_child process is allowed to run. Therefore, in environments where users are members of a large number of groups, the krb5_auth_timeout value should be increased to allow the groups to be processed. In future SSSD versions, we aim to make the processing much faster so that the default 6-second timeout suffices. Contacting the PAC responder can also be avoided completely by disabling the krb5_validate option; however, disabling it has security implications, as we can no longer verify that the TGT has not been spoofed in a MITM attack.

  • Recommended value: krb5_auth_timeout = login_time_in_seconds
  • Pros: The PAC responder has enough time to process user’s group memberships
  • Cons: Detecting legitimate offline situations might take too long
Mount the cache to tmpfs
Please see the server-side section. On the client side there is an additional caveat – the risk that a client reboots and then loses connectivity to the server is higher. Please do not use this setup on roaming clients that rely on offline logins!

To sum up the client side login changes, we’re looking at these additions to the config file:

[pam]
pam_id_timeout = N

[domain/domname]
krb5_auth_timeout = N
# Disabling the validation is dangerous!!
# krb5_validate = false

Don’t forget to restart the SSSD instance for the new settings to take effect!

Get rid of manually calling kinit with SSSD’s help

For quite some time, my laptop hasn’t had a UNIX /etc/passwd user for the account I normally use at all. Instead, I’ve been using SSSD exclusively, logging in with my Red Hat corporate account. That allows me to use some cool SSSD features, most notably the krb5_store_password_if_offline option.

When that option is enabled and you log in while SSSD is offline (typically when a laptop has just started and is not connected to the corporate VPN yet) using your cached password, SSSD stores the password you used in the kernel keyring. Then, when SSSD figures out that the servers are reachable, it attempts to kinit with that password on your behalf.

Because SSSD can detect when you connect to the VPN, by listening to libnl events or watching resolv.conf for changes, the routine of connecting to Kerberos simplifies to:

  • open laptop
  • login using Kerberos password
  • connect to VPN

That’s it. kinit happens in the background and the Kerberos ticket is acquired without any user intervention. Now you can just start Firefox and access Kerberos-protected pages, or start Thunderbird and read your mail. No more starting an extra terminal and typing kinit, or a separate GNOME dialog.

However, until now this setup required that you use an account from a centralized store that either contains the Kerberos principal as one of its attributes or at least has the same user name as the first part of the Kerberos principal (called the primary, the part up to the ‘@’ sign). And that’s a deal breaker for many people who have been used to a certain local user name for ages, irrespective of their company’s Kerberos principal.

The recently released SSSD 1.12.5 includes a very nice new feature that allows you to work around that and map a local UNIX user to a particular Kerberos principal. The setup requires only a fairly simple sssd.conf file:

 [sssd]
 domains = example.com
 config_file_version = 2
 services = nss,pam

 [domain/example.com]
 id_provider = proxy
 proxy_lib_name = files

 auth_provider = krb5
 krb5_server = kdc.example.com
 krb5_realm = EXAMPLE.COM
 krb5_map_user = jakub:jhrozek
 krb5_store_password_if_offline = True

 cache_credentials = True

All the interesting configuration is in the [domain] section. The identities are read from the good old UNIX files by proxying nss_files through SSSD’s proxy provider. Authentication is performed by the krb5 provider, with the KDC server set to kdc.example.com and the Kerberos realm EXAMPLE.COM.

The krb5_map_user parameter represents the new functionality. The parameter takes the form username:primary; in this particular case it maps the UNIX user jakub to a Kerberos principal that would be jhrozek@EXAMPLE.COM.

On Fedora 22, care must be taken that the pam_sss.so module is called in the PAM stack even if pam_unix.so fails, because the default behaviour for a failing pam_unix.so is default=die – ending the PAM conversation when the user logging in exists in /etc/passwd but the password is incorrect. That is exactly our case: we’re logging in as a UNIX user, we just want the authentication to be done by pam_sss.so.

Thanks to Sumit Bose, who helped me figure out the PAM config; I used the following auth stack successfully:

auth     required pam_env.so
# if the user is not a local /etc/passwd user, skip the next two modules
auth     [default=2 success=ok] pam_localuser.so
# local users may authenticate with their UNIX password first
auth     sufficient pam_unix.so nullok try_first_pass
# if the UNIX password fails, try Kerberos via SSSD; stop on failure
auth     [success=done ignore=ignore default=die] pam_sss.so use_first_pass
# stop here for system accounts (uid < 1000)
auth     requisite pam_succeed_if.so uid >= 1000 quiet_success
# non-local users authenticate via SSSD
auth     sufficient pam_sss.so forward_pass
auth     required pam_deny.so

I also recorded a video showing the feature in action. The first login is with the UNIX password, so the ticket is not automatically acquired and the user must run kinit manually. The next login happens with the Kerberos password while the machine is connected to the network, so the ticket is acquired on login. Finally, the last login simulates the case where the machine starts disconnected from the network, then connects. In this case, SSSD acquires the ticket in the background when it detects that the networking parameters have changed and the KDC is available.

Nice, right? You can also get the video in a nicer resolution.

The Kerberos authentication provider has more features that beat using kinit from the command line – automated ticket renewal is one of them. Please refer to the sssd-krb5 manual page or share your tips in the comments section!
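For instance, automatic renewal of renewable TGTs can be switched on with a couple of krb5 provider options; the values below are illustrative, and the exact semantics are described in the sssd-krb5 manual page:

```ini
[domain/example.com]
# request a TGT that stays renewable for a week
krb5_renewable_lifetime = 7d
# check every half hour whether the TGT is due for renewal
krb5_renew_interval = 1800
```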

Authenticating a docker container against host’s UNIX accounts

Recently, with the advent of Docker and similar technologies, there’s been an effort to containerize setups that previously ran on a single machine or a set of tightly coupled machines. When the components communicate over a network connection, containerization might not be that hard, but what about cases where the components talk over a strictly local interface, such as a UNIX socket?

A colleague of mine recently asked me to help set up a container that authenticates against the host’s UNIX accounts using the PAM API. It turned out to be doable, but maybe not obvious to anyone unfamiliar with some of the more esoteric SSSD options, so I’d like to write down the instructions in a blog post.

In our setup, there was a PAM service, sss_test, that previously ran on the host and authenticated against accounts stored locally on the host in /etc/passwd and /etc/shadow. The same setup could in principle be used to authenticate against a remote database using SSSD; only the host’s SSSD settings would differ.

So how does this work with containers? One possibility, especially with a remote database such as IPA or AD, would be to run an SSSD instance in every container and authenticate to the remote store. But that doesn’t help us with cases where we’d like to authenticate against the local database stored on the host, since it’s not exposed outside the host. Moreover, putting SSSD into each container might also mean we’d need to put a keytab in each container, and then each container would open its own separate connection to the remote server… it just gets tedious. But let’s focus on the local accounts.

The trick we’ll use is bind-mounting. It’s possible to mount part of the host’s filesystem into the container’s filesystem, and we’ll leverage this to bind-mount the UNIX sockets SSSD communicates over into the container. This allows the SSSD client-side libraries to authenticate against the SSSD instance running on the host. Then we’ll set up SSSD on the host to authenticate against the host’s UNIX files with the proxy back end.

This is the traditional schema:
application -> libpam -> pam_authenticate -> pam_unix.so -> /etc/passwd

If we add SSSD to the mix, it becomes:
application -> libpam -> pam_authenticate -> pam_sss.so -> SSSD -> pam_unix.so -> /etc/passwd

Let’s configure the host and the container, step by step.

  1. Host config
    1. Install the packages:

      # yum -y install sssd-common sssd-proxy

    2. Create a PAM service for the container. The name doesn’t matter, it just needs to be referenced from sssd.conf later:

      # cat /etc/pam.d/sss_proxy
      auth required pam_unix.so
      account required pam_unix.so
      password required pam_unix.so
      session required pam_unix.so

    3. Create the SSSD config file, /etc/sssd/sssd.conf. Please note that the permissions must be 0600 and the file must be owned by root.root:

      # cat /etc/sssd/sssd.conf
      [sssd]
      services = nss, pam
      config_file_version = 2
      domains = proxy
      [nss]
      [pam]
      [domain/proxy]
      id_provider = proxy
      # The proxy provider will look into /etc/passwd for user info
      proxy_lib_name = files
      # The proxy provider will authenticate against /etc/pam.d/sss_proxy
      proxy_pam_target = sss_proxy

    4. Start sssd:

      # systemctl start sssd

    5. Verify that a user can be retrieved with sssd:

      $ getent passwd -s sss localuser

  2. Container setup
    It’s important to bind-mount the /var/lib/sss/pipes directory from the host to the container, since the SSSD UNIX sockets are located there:

    # docker run -i -t -v=/var/lib/sss/pipes/:/var/lib/sss/pipes/:rw --name sssd-cli fedora:latest /bin/sh

  3. Container config
    The container runs the PAM-aware application that authenticates against the host. I used a simple program in C that comes from the SSSD source; it pretty much just runs pam_authenticate() against a service called sss_test. Your application might use a different service name – then you just need to adjust the file name in /etc/pam.d/ in the container. All the steps below should be executed in the container itself.

    1. Install only the SSSD client libraries:

      # yum -y install sssd-client

    2. Make sure sss is configured for the passwd and group databases in /etc/nsswitch.conf:

      # grep sss /etc/nsswitch.conf

    3. Configure the PAM service that the application uses to call into SSSD:

      # cat /etc/pam.d/sss_test
      auth required pam_sss.so
      account required pam_sss.so
      password required pam_sss.so
      session required pam_sss.so

    4. Authenticate using your PAM application. In my case that was:

      # ./pam_test_client auth localtest
      action: auth
      user: localtest
      testing pam_authenticate
      Password:
      pam_authenticate: Success

  4. Profit!

It would be possible to authenticate against any database just by changing what the SSSD instance on the host authenticates against. There are several gotchas, though, especially if you require that only certain containers be allowed to retrieve users from certain domains. Multitenancy doesn’t really work well, because we don’t have a good mechanism to retrieve the identity of the container.