CGroups (Control Groups)

From Bauman National Library
This page was last modified on 20 December 2015, at 13:08.

CGroups (abbreviated from Control Groups) is a Linux kernel feature that limits, accounts for, and isolates the resource usage (CPU, memory, disk I/O, network, etc.) of a collection of processes.

Control groups can be used in multiple ways:

  • create and manage them on the fly using tools such as cgcreate, cgexec and cgclassify
  • use the "rules engine daemon" to automatically move certain users, groups or commands into groups (/etc/cgrules.conf and /usr/lib/systemd/system/cgconfig.service)
  • indirectly, through other software such as Linux Containers (LXC) virtualization

History

Engineers at Google (primarily Paul Menage and Rohit Seth) started the work on this feature in 2006, under the name "process containers". In late 2007 the nomenclature changed to "control groups" due to the confusion caused by multiple meanings of the term "container" in the Linux kernel context, and the control-group functionality was merged into kernel version 2.6.24. Since then, developers have added many new features and controllers, such as support for kernfs, firewalling, and unified hierarchy.

Capabilities

While not technically part of the cgroups work, a related feature of the Linux kernel is namespace isolation, where groups of processes are separated such that they cannot "see" resources in other groups. For example, a PID namespace provides a separate enumeration of process identifiers within each namespace. Also available are mount, UTS, network and SysV IPC namespaces.

  • The PID namespace provides isolation for the allocation of process identifiers (PIDs), lists of processes and their details. While the new namespace is isolated from other siblings, processes in its "parent" namespace still see all processes in child namespaces—albeit with different PID numbers.
  • Network namespace isolates the network interface controllers (physical or virtual), iptables firewall rules, routing tables etc. Network namespaces can be connected with each other using the "veth" virtual Ethernet device.
  • "UTS" namespace allows changing the hostname.
  • Mount namespace allows creating a different file system layout, or making certain mount points read-only.
  • IPC namespace isolates the System V inter-process communication between namespaces.
  • User namespace isolates the user IDs between namespaces.

Namespaces are created with the "unshare" command or syscall, or as new flags in a "clone" syscall.
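
One way to see which namespace a process belongs to is the set of symbolic links under /proc/<pid>/ns, whose targets look like uts:[4026531838]. The following is only a sketch, not part of the original text: ns_inode and ns_of are hypothetical helper names, and it assumes a Linux system with /proc mounted.

```shell
#!/bin/sh
# Hypothetical helpers; the names are illustrative, not part of any tool.

# ns_inode TARGET
# Extract the numeric identifier from a namespace link target
# such as "uts:[4026531838]".
ns_inode() {
    printf '%s\n' "$1" | sed 's/^.*\[\([0-9]*\)]$/\1/'
}

# ns_of PID TYPE
# Print the namespace identifier of process PID for namespace TYPE
# (for example pid, net, uts, mnt, ipc or user). Assumes /proc is mounted.
ns_of() {
    ns_inode "$(readlink "/proc/$1/ns/$2")"
}
```

Two processes share a namespace when ns_of prints the same number for both. On a system that permits unprivileged user namespaces, running unshare --uts --map-root-user sh -c 'readlink /proc/self/ns/uts' should show a different identifier than the same readlink in the parent shell.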

The "ns" subsystem was added early in cgroups development to integrate namespaces and control groups. If the "ns" cgroup was mounted, each namespace would also create a new group in the cgroup hierarchy. This was an experiment that was later judged to be a poor fit for the cgroups API, and removed from the kernel.

Linux namespaces were inspired by the more general namespace functionality used heavily throughout Plan 9 from Bell Labs.

When designing software, an engineer seeks solutions that best balance the requirements of stability, security and performance, as well as maintainability, programmability (API) and usability (ABI). By their nature, these requirements trade off against each other: for example, a powerful API to user space that does not even offer much functionality, but carelessly exposes some key inner workings, might seriously compromise stability and security. That is especially true if the software is part of the Linux kernel.

Tejun Heo decided to alter cgroups to prevent these scenarios, designing and implementing a unified hierarchy with only one user space entity that has exclusive access to the facilities offered by cgroups.

Kernfs was introduced into the Linux kernel with version 3.14, the main author being Tejun Heo. One of the main motivators for a separate kernfs is the cgroups file system. Kernfs is basically created by splitting off some of the sysfs logic into an independent entity so that other kernel subsystems can more easily implement their own virtual file system with handling for device connect and disconnect, dynamic creation and removal as needed or unneeded, and other attributes. Redesign continued into version 3.15 of the Linux kernel.

Kernel memory control groups (kmemcg) were merged into version 3.8 of the Linux kernel mainline. The kmemcg controller can limit the amount of memory that the kernel can utilize to manage its own internal processes.

Installing

First, install the utilities for managing cgroups: the libcgroup package from the AUR, and cgmanager. If you wish to use the client script cgm, you will also need to start the cgmanager daemon. This can be done with a systemd unit like the following:

[Unit]
Description=Control Group manager

[Service]
ExecStart=/usr/bin/cgmanager

[Install]
WantedBy=sysinit.target

Examples

As an example of indirect usage, systemd assumes exclusive access to the cgroups facility.

To start a new job that is to be contained within a cgroup, using the "cpuset" cgroup subsystem, the steps are something like:

  1. mount -t tmpfs cgroup_root /sys/fs/cgroup
  2. mkdir /sys/fs/cgroup/cpuset
  3. mount -t cgroup -ocpuset cpuset /sys/fs/cgroup/cpuset
  4. Create the new cgroup by doing mkdir's and write's (or echo's) in the /sys/fs/cgroup/cpuset virtual file system.
  5. Start a task that will be the "founding father" of the new job.
  6. Attach that task to the new cgroup by writing its PID to the /sys/fs/cgroup/cpuset tasks file for that cgroup.
  7. fork, exec or clone the job tasks from this founding father task.

For example, the following sequence of commands will set up a cgroup named "Charlie", containing just CPUs 2 and 3, and Memory Node 1, and then start a subshell 'sh' in that cgroup:

  mount -t tmpfs cgroup_root /sys/fs/cgroup
  mkdir /sys/fs/cgroup/cpuset
  mount -t cgroup cpuset -ocpuset /sys/fs/cgroup/cpuset
  cd /sys/fs/cgroup/cpuset
  mkdir Charlie
  cd Charlie
  /bin/echo 2-3 > cpuset.cpus
  /bin/echo 1 > cpuset.mems
  /bin/echo $$ > tasks
  sh
  # The subshell 'sh' is now running in cgroup Charlie
  # The next line should display '/Charlie'
  cat /proc/self/cgroup
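
Each line of /proc/<pid>/cgroup has the form hierarchy-ID:controller-list:path, so the last command prints a line ending in :/Charlie rather than the bare path. The sketch below pulls out just the path for one controller. The function name cgroup_path_for is hypothetical, and the optional second argument exists only so the function can be tried against a saved copy of the file.

```shell
#!/bin/sh
# cgroup_path_for CONTROLLER [FILE]
# Print the cgroup path of the hierarchy that has CONTROLLER attached.
# FILE defaults to /proc/self/cgroup. Hypothetical helper for illustration.
cgroup_path_for() {
    awk -F: -v ctl="$1" '
        {
            # Field 2 is a comma-separated controller list, e.g. "cpu,cpuacct".
            n = split($2, names, ",")
            for (i = 1; i <= n; i++) {
                if (names[i] == ctl) {
                    # The path may itself contain ":", so rejoin fields 3..NF.
                    path = $3
                    for (j = 4; j <= NF; j++) path = path ":" $j
                    print path
                    exit
                }
            }
        }
    ' "${2:-/proc/self/cgroup}"
}
```

Inside the subshell started above, cgroup_path_for cpuset would be expected to print /Charlie. It prints nothing if no mounted hierarchy carries that controller.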

Basic Usage

Creating, modifying, and using cgroups can all be done through the cgroup virtual filesystem.

To mount a cgroup hierarchy with all available subsystems, type:

# mount -t cgroup xxx /sys/fs/cgroup

The "xxx" is not interpreted by the cgroup code, but will appear in /proc/mounts so may be any useful identifying string that you like.

To mount a cgroup hierarchy with just the cpuset and memory subsystems, type:

# mount -t cgroup -o cpuset,memory hier1 /sys/fs/cgroup/rg1
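
Before choosing which subsystems to pass to -o, it can help to check which ones the running kernel offers. The kernel lists them in /proc/cgroups with the columns subsys_name, hierarchy, num_cgroups and enabled. The following sketch prints the enabled ones. The function name list_subsystems is hypothetical, and the optional file argument is there only so it can be run against a saved copy.

```shell
#!/bin/sh
# list_subsystems [FILE]
# Print the name of every enabled cgroup subsystem, one per line.
# FILE defaults to /proc/cgroups. Hypothetical helper for illustration.
list_subsystems() {
    # Skip the "#subsys_name ..." header; column 4 is the "enabled" flag.
    awk '!/^#/ && $4 == 1 { print $1 }' "${1:-/proc/cgroups}"
}
```

A subsystem that does not appear in this list cannot be named in the mount options.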

While remounting cgroups is currently supported, it is not recommended. Remounting allows changing the bound subsystems and release_agent. Rebinding is hardly useful, as it only works when the hierarchy is empty, and release_agent itself should be replaced with conventional fsnotify. Support for remounting will be removed in the future.

To specify a hierarchy's release_agent:

# mount -t cgroup -o cpuset,release_agent="/sbin/cpuset_release_agent" \
  xxx /sys/fs/cgroup/rg1

If you want to change the value of release_agent:

# echo "/sbin/new_release_agent" > /sys/fs/cgroup/rg1/release_agent

It can also be changed via remount.

If you want to create a new cgroup under /sys/fs/cgroup/rg1:

# cd /sys/fs/cgroup/rg1
# mkdir my_cgroup

Now you want to do something with this cgroup.

# cd my_cgroup

In this directory you can find several files:

# ls
cgroup.procs notify_on_release tasks

(plus whatever files are added by the attached subsystems)

Now attach your shell to this cgroup:

# /bin/echo $$ > tasks

You can also create cgroups inside your cgroup by using mkdir in this directory.

# mkdir my_sub_cs

To remove a cgroup, just use rmdir:

# rmdir my_sub_cs

This will fail if the cgroup is in use (it has cgroups inside, has processes attached, or is held alive by another subsystem-specific reference).
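
The create, attach and remove steps above can be wrapped in small functions. This is a minimal sketch under stated assumptions: cg_create, cg_attach and cg_remove are hypothetical names, and CG_ROOT is an illustrative variable that defaults to the /sys/fs/cgroup/rg1 mount point used above. Pointing CG_ROOT at a scratch directory exercises only the file operations; no resource control is applied there.

```shell
#!/bin/sh
# Root of the mounted hierarchy; override it to experiment elsewhere.
CG_ROOT=${CG_ROOT:-/sys/fs/cgroup/rg1}

# cg_create NAME: create a cgroup (mkdir inside the hierarchy).
cg_create() {
    mkdir -p "$CG_ROOT/$1"
}

# cg_attach NAME PID: move one task into the cgroup via its tasks file.
cg_attach() {
    printf '%s\n' "$2" > "$CG_ROOT/$1/tasks"
}

# cg_remove NAME: remove a cgroup; report failure if it is still in use.
cg_remove() {
    if ! rmdir "$CG_ROOT/$1" 2>/dev/null; then
        echo "cannot remove $1: still in use?" >&2
        return 1
    fi
}
```

For example, cg_create my_cgroup followed by cg_attach my_cgroup $$ mirrors the mkdir and echo commands shown earlier. On a real hierarchy these calls need the same privileges as the raw commands.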

Matlab

Matlab does not have any protection against taking all your machine's memory or CPU. Launching a large calculation can thus trash your system. You could put the following in /etc/cgconfig.conf to protect from this (where $USER is your username):

/etc/cgconfig.conf 
# Prevent Matlab from taking all memory
group matlab {
    perm {
        admin {
            uid = $USER;
        }
        task {
            uid = $USER;
        }
    }

    cpuset {
        cpuset.mems="0";
        cpuset.cpus="0-5";
    }
    memory {
        # 5 GiB limit
        memory.limit_in_bytes = 5368709120;
    }
}

This cgroup will bind Matlab to cores 0 to 5 (e.g., if you have 8 cores, Matlab will only see 6) and cap its memory usage at 5 GiB. The "cpu" resource constraint can also be defined to limit CPU usage, but you may find the "cpuset" constraint to be sufficient. Launch Matlab like this:

$ cgexec -g memory,cpuset:matlab /opt/MATLAB/2012b/bin/matlab -desktop

Make sure to use the right path to the executable.

Limiting Resources

Starting the Service

The cgconfig (control group config) service is used to create cgroups and manage subsystems. It can be configured to start up at boot time and reestablish your predefined cgroups, thus making them persistent across reboots. The cgconfig service is not started by default on CentOS 6, so let us start it:

$ sudo service cgconfig start

Starting the cgconfig service creates a virtual filesystem mounted at /cgroup with all the subsystems. Let us verify this:

$ sudo ls /cgroup

This command should show the following subsystems:

blkio  cpu  cpuacct  cpuset  devices  freezer  memory  net_cls

You could also run the lscgroup command to verify:

$ sudo lscgroup

Configuration

In this section, we will create example cgroups and set some resource limits for those cgroups. The cgroup configuration file is /etc/cgconfig.conf. Depending on the contents of the configuration file, cgconfig can create hierarchies, mount necessary file systems, create cgroups, and set subsystem parameters (resource limits) for each cgroup.

A hierarchy is a set of cgroups arranged in a tree, such that every task in the system is in exactly one of the cgroups in the hierarchy. In a default CentOS 6 configuration, each subsystem is put into its own hierarchy.

Let us first create a few cgroups named limitcpu, limitmem, limitio, and browsers. The /etc/cgconfig.conf file contains two major types of entries — mount and group. Lines that start with group create cgroups and set subsystem parameters. Edit the file /etc/cgconfig.conf and add the following cgroup entries at the bottom:

group limitcpu{
        cpu {
                cpu.shares = 400;
        }
}

group limitmem{
        memory {
                memory.limit_in_bytes = 512m;
        }
}

group limitio{
        blkio {
                blkio.throttle.read_bps_device = "252:0         2097152";
        }
}

group browsers{
        cpu {
                cpu.shares = 200;
        }
        memory {
                memory.limit_in_bytes = 128m;
        }
}

  • In the limitcpu cgroup, we set cpu.shares to 400. cpu.shares specifies the relative share of CPU time available to the tasks in the cgroup; it is a weight compared against other cgroups when the CPU is contended, not an absolute cap.
  • In the limitmem cgroup, we are limiting the memory available to the cgroup processes to 512 MiB.
  • In the limitio cgroup, we are limiting the disk read throughput to 2 MiB/s. Here we are limiting read I/O to the primary disk, /dev/vda, which has major:minor number 252:0; 2 MiB/s is given in bytes per second (2 x 1024 x 1024 = 2097152).
  • In the browsers cgroup, we are setting cpu.shares to 200 and limiting available memory to 128 MiB.
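
The byte values used in these limits can be computed in the shell rather than by hand. The following is a small sketch: to_bytes is a hypothetical helper name, and it assumes the k, m and g suffixes denote powers of 1024, as in the 2 x 1024 x 1024 arithmetic above.

```shell
#!/bin/sh
# to_bytes SIZE
# Convert a size such as 2m, 512m or 5g to bytes, treating the suffixes
# k, m and g as powers of 1024. A plain number is printed unchanged.
# Hypothetical helper for illustration; no input validation is done.
to_bytes() {
    num=${1%[kKmMgG]}            # strip one trailing unit letter, if any
    case $1 in
        *[kK]) echo $(( $num * 1024 )) ;;
        *[mM]) echo $(( $num * 1024 * 1024 )) ;;
        *[gG]) echo $(( $num * 1024 * 1024 * 1024 )) ;;
        *)     echo "$1" ;;
    esac
}
```

For instance, to_bytes 2m gives the 2097152 used for blkio.throttle.read_bps_device, and to_bytes 5g gives the 5368709120 used for the Matlab memory limit.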

We need to restart the cgconfig service for the changes in the /etc/cgconfig.conf file to take effect:

$ sudo service cgconfig restart

Let us enable cgconfig to start on system boot. Once the service is enabled with chkconfig, it reads the cgroup configuration file /etc/cgconfig.conf at boot time, so the cgroups are recreated on every boot and persist across reboots.

$ sudo chkconfig cgconfig on

Our next goal is to add the processes (tasks) for which we wish to limit resources to the cgroups we created earlier.

Cgred (control group rules engine daemon) is a service that moves tasks into cgroups according to parameters set in the /etc/cgrules.conf file. Entries in the /etc/cgrules.conf file can take one of two forms:

user subsystems control_group
user:command subsystems control_group

user refers to a username or a groupname prefixed with the "@" character. subsystems refer to a comma-separated list of subsystem names. control_group represents a path to the cgroup, and command stands for a process name or a full command path of a process.

Now let us add the programs/processes we wish to limit. Edit /etc/cgrules.conf and add the following at the bottom:

*:firefox       cpu,memory      browsers/
*:hdparm        blkio   limitio/
sammy   blkio   limitio/
@admin:memhog  memory  limitmem/
*:cpuhog        cpu     limitcpu/

In the above lines, we are setting the following rules:

  • firefox processes run by any user will be automatically added to the browsers cgroup and limited in cpu and memory subsystems.
  • hdparm processes run by any user will be added to the limitio cgroup and will be limited in blkio subsystem according to the parameter values specified in that cgroup.
  • All processes run by user sammy will be added to the limitio cgroup and limited in blkio subsystem.
  • memhog processes run by anyone in the admin group will be added to the cgroup limitmem and limited in memory subsystem.
  • cpuhog processes run by any user will be added to the cgroup limitcpu and limited in cpu subsystem.
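
To check how a rules file would classify a given process, the matching can be approximated in a few lines of awk. This is only a rough sketch and not how cgred itself is implemented: match_rule is a hypothetical helper, it assumes the first matching rule wins, and it skips @group entries because resolving them needs a group membership lookup.

```shell
#!/bin/sh
# match_rule USER COMMAND RULES_FILE
# Print "subsystems control_group" for the first rule matching USER and
# COMMAND, or nothing if no rule matches. Hypothetical helper; assumes
# first match wins and ignores @group rules.
match_rule() {
    awk -v user="$1" -v cmd="$2" '
        NF == 0 || $1 ~ /^#/ { next }          # skip blank lines and comments
        {
            n = split($1, who, ":")            # who[1] = user, who[2] = command
            if (who[1] ~ /^@/) next            # group rules: not handled here
            if (who[1] != "*" && who[1] != user) next
            if (n > 1 && who[2] != cmd) next
            print $2, $3
            exit
        }
    ' "$3"
}
```

With the rules above saved to a file, match_rule alice firefox FILE prints cpu,memory browsers/, while match_rule sammy ls FILE prints blkio limitio/ because the sammy rule applies to every command that user runs.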

Links