CGroups (Control Groups)
This page was last modified on 20 December 2015, at 13:08.
CGroups (abbreviated from Control Groups) is a Linux kernel feature that limits, accounts for, and isolates the resource usage (CPU, memory, disk I/O, network, etc.) of a collection of processes.
Control groups can be used in multiple ways:
- by creating and managing them on the fly with tools such as cgcreate, cgexec, and cgclassify
- through the "rules engine daemon", which automatically moves certain users, groups, or commands into groups (/etc/cgrules.conf and /usr/lib/systemd/system/cgconfig.service)
- through other software such as Linux Containers (LXC) virtualization
History
Engineers at Google (primarily Paul Menage and Rohit Seth) started the work on this feature in 2006, under the name "process containers". In late 2007 the nomenclature changed to "control groups" due to the confusion caused by multiple meanings of the term "container" in the Linux kernel context, and control-group functionality merged into kernel version 2.6.24. Since then, developers have added many new features and controllers, such as support for kernfs, firewalling, and unified hierarchy.
Capabilities
While not technically part of the cgroups work, a related feature of the Linux kernel is namespace isolation, where groups of processes are separated such that they cannot "see" resources in other groups. For example, a PID namespace provides a separate enumeration of process identifiers within each namespace. Also available are mount, UTS, network and SysV IPC namespaces.
- The PID namespace provides isolation for the allocation of process identifiers (PIDs), lists of processes and their details. While the new namespace is isolated from other siblings, processes in its "parent" namespace still see all processes in child namespaces—albeit with different PID numbers.
- Network namespace isolates the network interface controllers (physical or virtual), iptables firewall rules, routing tables etc. Network namespaces can be connected with each other using the "veth" virtual Ethernet device.
- "UTS" namespace allows changing the hostname.
- Mount namespace allows creating a different file system layout, or making certain mount points read-only.
- IPC namespace isolates the System V inter-process communication between namespaces.
- User namespace isolates the user IDs between namespaces.
Namespaces are created with the "unshare" command or syscall, or as new flags in a "clone" syscall.
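One way to see which namespaces a process belongs to is to read the symbolic links under /proc/PID/ns; two processes share a namespace when the link targets are identical. The following is a minimal sketch rather than an established tool: the helper names ns_id and same_ns and the PROC_ROOT variable are made up for illustration.

```shell
# Hypothetical helpers (names are illustrative, not from any standard tool).
# PROC_ROOT defaults to /proc; overriding it is only useful for testing.
PROC_ROOT=${PROC_ROOT:-/proc}

# ns_id PID TYPE: print the namespace identifier, e.g. "pid:[4026531836]".
ns_id() {
    readlink "$PROC_ROOT/$1/ns/$2"
}

# same_ns PID1 PID2 TYPE: succeed if both processes are in the same namespace.
same_ns() {
    [ -n "$(ns_id "$1" "$3")" ] && [ "$(ns_id "$1" "$3")" = "$(ns_id "$2" "$3")" ]
}
```

For example, same_ns $$ 1 net would report whether the current shell shares a network namespace with PID 1; reading the links of another user's process may require root.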
The "ns" subsystem was added early in cgroups development to integrate namespaces and control groups. If the "ns" cgroup was mounted, each namespace would also create a new group in the cgroup hierarchy. This was an experiment that was later judged to be a poor fit for the cgroups API, and removed from the kernel.
Linux namespaces were inspired by the more general namespace functionality used heavily throughout Plan 9 from Bell Labs.
When designing software, an engineer looks for the solution that best balances stability, security, and performance with maintainability, programmability (API), and usability (ABI). These requirements pull against each other: for example, a powerful user-space API that carelessly exposes key inner workings can seriously compromise stability and security. This is especially true when the software is part of the Linux kernel.
Tejun Heo decided to alter cgroups to prevent these scenarios, designing and implementing a unified hierarchy with only one user space entity that has exclusive access to the facilities offered by cgroups.
Kernfs was introduced into the Linux kernel with version 3.14, the main author being Tejun Heo. One of the main motivators for a separate kernfs is the cgroups file system. Kernfs is basically created by splitting off some of the sysfs logic into an independent entity so that other kernel subsystems can more easily implement their own virtual file system with handling for device connect and disconnect, dynamic creation and removal as needed or unneeded, and other attributes. Redesign continued into version 3.15 of the Linux kernel.
Kernel memory control groups (kmemcg) were merged into version 3.8 of the Linux kernel mainline. The kmemcg controller can limit the amount of memory that the kernel can utilize to manage its own internal processes.
Installing
First, install the utilities for managing cgroups; you need to install the libcgroup package from the AUR and cgmanager. If you wish to use the client script cgm, you will need to start the cgmanager daemon. This can be done with a systemd unit like the following:
[Unit]
Description=Control Group manager
[Service]
ExecStart=/usr/bin/cgmanager
[Install]
WantedBy=sysinit.target
Examples
To start a new job that is to be contained within a cgroup, using the "cpuset" cgroup subsystem, the steps are something like:
- mount -t tmpfs cgroup_root /sys/fs/cgroup
- mkdir /sys/fs/cgroup/cpuset
- mount -t cgroup -ocpuset cpuset /sys/fs/cgroup/cpuset
- Create the new cgroup by doing mkdir's and write's (or echo's) in the /sys/fs/cgroup/cpuset virtual file system.
- Start a task that will be the "founding father" of the new job.
- Attach that task to the new cgroup by writing its PID to the /sys/fs/cgroup/cpuset tasks file for that cgroup.
- fork, exec or clone the job tasks from this founding father task.
For example, the following sequence of commands will set up a cgroup named "Charlie", containing just CPUs 2 and 3 and Memory Node 1, and then start a subshell 'sh' in that cgroup:
mount -t tmpfs cgroup_root /sys/fs/cgroup
mkdir /sys/fs/cgroup/cpuset
mount -t cgroup -ocpuset cpuset /sys/fs/cgroup/cpuset
cd /sys/fs/cgroup/cpuset
mkdir Charlie
cd Charlie
/bin/echo 2-3 > cpuset.cpus
/bin/echo 1 > cpuset.mems
/bin/echo $$ > tasks
sh
# The subshell 'sh' is now running in cgroup Charlie
# The next line should display an entry for cpuset ending in ':/Charlie'
cat /proc/self/cgroup
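On cgroup v1, each line of /proc/PID/cgroup has the form hierarchy-ID:controller-list:cgroup-path, so the path for a single controller can be picked out with a small filter. The sketch below assumes that format; the function name cgroup_path_of is hypothetical.

```shell
# cgroup_path_of CONTROLLER [FILE]
# Print the cgroup path for CONTROLLER as recorded in FILE
# (default: /proc/self/cgroup). Each line is expected to look like
#   hierarchy-ID:controller-list:cgroup-path
cgroup_path_of() {
    awk -F: -v ctrl="$1" '
        {
            n = split($2, list, ",")       # controllers mounted on this hierarchy
            for (i = 1; i <= n; i++)
                if (list[i] == ctrl) {
                    sub(/^[^:]*:[^:]*:/, "")   # drop the first two fields, keep the path
                    print
                    exit
                }
        }
    ' "${2:-/proc/self/cgroup}"
}
```

Run inside the subshell from the example, cgroup_path_of cpuset should print /Charlie.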
Basic Usage
Creating, modifying, and using cgroups can be done through the cgroup virtual filesystem.
To mount a cgroup hierarchy with all available subsystems, type:
# mount -t cgroup xxx /sys/fs/cgroup
The "xxx" is not interpreted by the cgroup code, but it will appear in /proc/mounts, so it may be any useful identifying string that you like.
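To check which cgroup hierarchies are currently mounted, and under which identifying string, /proc/mounts can be filtered by filesystem type. This is a small sketch; the function name list_cgroup_mounts is an assumption, not an existing command.

```shell
# list_cgroup_mounts [FILE]
# Print "name mountpoint options" for every cgroup (v1) mount in FILE
# (default: /proc/mounts). Fields in that file are:
#   device mountpoint fstype options dump pass
list_cgroup_mounts() {
    awk '$3 == "cgroup" { print $1, $2, $4 }' "${1:-/proc/mounts}"
}
```

The first column of the output is the identifying string given at mount time, and the options column shows which subsystems are bound to the hierarchy.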
To mount a cgroup hierarchy with just the cpuset and memory subsystems, type:
# mount -t cgroup -o cpuset,memory hier1 /sys/fs/cgroup/rg1
While remounting cgroups is currently supported, it is not recommended. Remounting allows changing bound subsystems and release_agent. Rebinding is hardly useful, as it only works when the hierarchy is empty, and release_agent itself should be replaced with conventional fsnotify. Support for remounting will be removed in the future.
To specify a hierarchy's release_agent:
# mount -t cgroup -o cpuset,release_agent="/sbin/cpuset_release_agent" \
xxx /sys/fs/cgroup/rg1
If you want to change the value of release_agent:
# echo "/sbin/new_release_agent" > /sys/fs/cgroup/rg1/release_agent
It can also be changed via remount.
If you want to create a new cgroup under /sys/fs/cgroup/rg1:
# cd /sys/fs/cgroup/rg1
# mkdir my_cgroup
Now you want to do something with this cgroup.
# cd my_cgroup
In this directory you can find several files:
# ls
cgroup.procs notify_on_release tasks
(plus whatever files are added by the attached subsystems)
Now attach your shell to this cgroup:
# /bin/echo $$ > tasks
You can also create cgroups inside your cgroup by using mkdir in this directory.
# mkdir my_sub_cs
To remove a cgroup, just use rmdir:
# rmdir my_sub_cs
This will fail if the cgroup is in use (has cgroups inside, or has processes attached, or is held alive by other subsystem-specific reference).
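Before calling rmdir, the first two conditions can be checked from the shell. The helper below is a sketch with a made-up name (cg_in_use); it cannot detect subsystem-specific references, so rmdir may still fail after it reports the cgroup as unused.

```shell
# cg_in_use DIR
# Succeed (exit 0) if the cgroup directory DIR has child cgroups or attached
# tasks, either of which makes rmdir fail.
cg_in_use() {
    # Any subdirectory is a child cgroup.
    for entry in "$1"/*/; do
        [ -d "$entry" ] && return 0
    done
    # cgroup files report a size of 0, so read the file instead of testing its size.
    [ -n "$(head -n 1 "$1/tasks" 2>/dev/null)" ]
}
```

A possible use is cg_in_use my_sub_cs || rmdir my_sub_cs.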
Matlab
Matlab has no built-in protection against consuming all of your machine's memory or CPU, so launching a large calculation can make your system unresponsive. You could put the following in /etc/cgconfig.conf to protect against this (where $USER is your username):
/etc/cgconfig.conf
# Prevent Matlab from taking all memory
group matlab {
    perm {
        admin {
            uid = $USER;
        }
        task {
            uid = $USER;
        }
    }
    cpuset {
        cpuset.mems="0";
        cpuset.cpus="0-5";
    }
    memory {
        # 5 GiB limit
        memory.limit_in_bytes = 5368709120;
    }
}
This cgroup binds Matlab to cores 0 to 5 (e.g., if you have 8 cores, Matlab will only see 6) and caps its memory usage at 5 GiB. The "cpu" resource constraint can also be defined to limit CPU usage, but you may find the "cpuset" constraint to be sufficient. Launch Matlab like this:
$ cgexec -g memory,cpuset:matlab /opt/MATLAB/2012b/bin/matlab -desktop
Make sure to use the right path to the executable.
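The memory.limit_in_bytes value in the configuration is simply 5 GiB written out in bytes. To derive the number for a different limit, shell arithmetic is enough; the function name gib_to_bytes below is only illustrative.

```shell
# gib_to_bytes N: print N GiB expressed in bytes (1 GiB = 1024^3 bytes).
gib_to_bytes() {
    echo $(( $1 * 1024 * 1024 * 1024 ))
}

gib_to_bytes 5    # prints 5368709120, the value used for memory.limit_in_bytes
```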
Limiting Resources
Starting the Service
The cgconfig (control group config) service is used to create cgroups and manage subsystems. It can be configured to start at boot time and re-establish your predefined cgroups, thus making them persistent across reboots. The cgconfig service is not started by default on CentOS 6, so let us start it:
$ sudo service cgconfig start
Starting the cgconfig service creates a virtual filesystem mounted at /cgroup with all the subsystems. Let us verify this:
$ sudo ls /cgroup
This command should show the following subsystems:
blkio cpu cpuacct cpuset devices freezer memory net_cls
You could also run the lscgroup command to verify:
$ sudo lscgroup
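Independently of the cgconfig service, the kernel lists the controllers it knows about in /proc/cgroups, with one row per subsystem and an "enabled" flag in the fourth column. The following sketch prints the enabled ones; the function name enabled_subsystems is an assumption.

```shell
# enabled_subsystems [FILE]
# Print the name of every enabled controller listed in FILE (default:
# /proc/cgroups). Columns are: subsys_name hierarchy num_cgroups enabled.
enabled_subsystems() {
    awk '!/^#/ && $4 == 1 { print $1 }' "${1:-/proc/cgroups}"
}
```

A subsystem that is missing from this output cannot be mounted, whatever /etc/cgconfig.conf requests.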
Configuration
In this section, we will create example cgroups and set some resource limits for those cgroups. The cgroup configuration file is /etc/cgconfig.conf. Depending on the contents of the configuration file, cgconfig can create hierarchies, mount necessary file systems, create cgroups, and set subsystem parameters (resource limits) for each cgroup.
A hierarchy is a set of cgroups arranged in a tree, such that every task in the system is in exactly one of the cgroups in the hierarchy. In a default CentOS 6 configuration, each subsystem is put into its own hierarchy.
Let us first create a few cgroups named limitcpu, limitmem, limitio, and browsers. The /etc/cgconfig.conf file contains two major types of entries — mount and group. Lines that start with group create cgroups and set subsystem parameters. Edit the file /etc/cgconfig.conf and add the following cgroup entries at the bottom:
group limitcpu {
    cpu {
        cpu.shares = 400;
    }
}
group limitmem {
    memory {
        memory.limit_in_bytes = 512m;
    }
}
group limitio {
    blkio {
        blkio.throttle.read_bps_device = "252:0 2097152";
    }
}
group browsers {
    cpu {
        cpu.shares = 200;
    }
    memory {
        memory.limit_in_bytes = 128m;
    }
}
- In the limitcpu cgroup, we are limiting the cpu shares available to processes in this cgroup to 400. cpu.shares specifies the relative share of CPU time available to the tasks in the cgroup.
- In the limitmem cgroup, we are limiting memory available to the cgroup processes to 512MB.
- In the limitio cgroup, we are limiting the disk read throughput to 2MiB/s. Here we are limiting read I/O to the primary disk, /dev/vda, with major:minor number 252:0 and 2MiB/s is converted to bytes per second (2x1024x1024=2097152).
- In the browsers cgroup, we are limiting cpu shares to 200 and available memory to 128MB.
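Because cpu.shares is a relative weight, the CPU time a cgroup actually gets depends on which other cgroups are busy at the same time. The sketch below works through that arithmetic under a simplifying assumption: only the listed cgroups compete, and all of them are fully busy. The function name cpu_share_percent is hypothetical.

```shell
# cpu_share_percent MINE ALL...
# Print the integer percentage of CPU time a cgroup with cpu.shares=MINE
# would receive if every cgroup listed in ALL (MINE included) were busy.
cpu_share_percent() {
    mine=$1
    shift
    total=0
    for s in "$@"; do
        total=$(( total + s ))
    done
    echo $(( mine * 100 / total ))
}

cpu_share_percent 400 400 200   # limitcpu competing with browsers: prints 66
cpu_share_percent 200 400 200   # browsers competing with limitcpu: prints 33
```

When the other cgroups are idle, a cgroup may use more than this share; the weights only matter under contention.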
We need to restart the cgconfig service for the changes in the /etc/cgconfig.conf file to take effect:
$ sudo service cgconfig restart
Let us enable cgconfig to start at system boot. Once the service is enabled with chkconfig, it reads the cgroup configuration file /etc/cgconfig.conf at boot time and recreates the cgroups, so they persist across reboots.
$ sudo chkconfig cgconfig on
Our next goal is to add the processes (tasks) for which we wish to limit resources to the cgroups we created earlier.
Cgred (control group rules engine daemon) is a service that moves tasks into cgroups according to the rules set in the /etc/cgrules.conf file. Entries in the /etc/cgrules.conf file can take one of two forms:
user subsystems control_group
user:command subsystems control_group
user refers to a username or a group name prefixed with the "@" character. subsystems is a comma-separated list of subsystem names. control_group is a path to the cgroup, and command stands for a process name or the full command path of a process.
Now let us add the programs/processes we wish to limit. Edit /etc/cgrules.conf and add the following at the bottom:
*:firefox cpu,memory browsers/
*:hdparm blkio limitio/
sammy blkio limitio/
@admin:memhog memory limitmem/
*:cpuhog cpu limitcpu/
In the above lines, we are setting the following rules:
- firefox processes run by any user will be automatically added to the browsers cgroup and limited in cpu and memory subsystems.
- hdparm processes run by any user will be added to the limitio cgroup and will be limited in blkio subsystem according to the parameter values specified in that cgroup.
- All processes run by user sammy will be added to the limitio cgroup and limited in blkio subsystem.
- memhog processes run by anyone in the admin group will be added to the cgroup limitmem and limited in memory subsystem.
- cpuhog processes run by any user will be added to the cgroup limitcpu and limited in cpu subsystem.
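To preview which rule would apply to a given user and command, the file can be scanned from top to bottom. The following is a simplified sketch and not how cgred itself is implemented: it assumes the first matching rule wins, skips "@group" rules because they need a group membership lookup, and compares command names literally. The function name match_rule is made up.

```shell
# match_rule USER COMMAND [RULES_FILE]
# Print "subsystems control_group" for the first rule in RULES_FILE
# (default: /etc/cgrules.conf) that matches USER and COMMAND.
match_rule() {
    awk -v user="$1" -v cmd="$2" '
        /^[ \t]*#/ || NF < 3 { next }      # skip comments, blank and malformed lines
        {
            n = split($1, who, ":")        # "user" or "user:command"
            rule_user = who[1]
            rule_cmd = (n > 1) ? who[2] : ""
            if (rule_user ~ /^@/) next     # group rules are not handled in this sketch
            if ((rule_user == "*" || rule_user == user) &&
                (rule_cmd == "" || rule_cmd == cmd)) {
                print $2, $3               # subsystems and destination cgroup
                exit
            }
        }
    ' "${3:-/etc/cgrules.conf}"
}
```

With the rules above, match_rule sammy ls would print blkio limitio/, while a user other than sammy running ls would match nothing.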
License
Except as otherwise noted, the content of this page is licensed under the Creative Commons «Attribution-NonCommercial-NoDerivatives» 4.0 License, and code samples are licensed under the Apache 2.0 License. See Terms of Use for details.