PolarSPARC

Linux Capabilities Unraveled


Bhaskar S 06/26/2021


Introduction

Ever wondered why the Linux command passwd allows a regular non-privileged user (not the privileged user root) to change their password (and thereby update system-owned files) ???

Let us list the file details by executing the following command:

$ ls -l /usr/bin/passwd

The following illustration shows the typical output:


passwd command
Figure.1

Notice the pointers (red arrows) highlighting the important aspects in the illustration above. Because the special Linux set uid bit (s) is enabled and the file is owned by root, the command runs with the effective user ID of its owner (root), no matter which regular user executes it. That is why a regular user is able to change their password (and update system owned file(s)).

What about the Linux command ping that sends out ICMP packets to the desired destination using a raw socket (which needs system privileges) ???

Once again, let us list the file details by executing the following command:

$ ls -l /usr/bin/ping

The following illustration shows the typical output:


ping command
Figure.2

STRANGE !!! As is evident from the illustration above, there is *NO* special Linux set uid bit set, even though the command file is owned by the privileged user root.

One can look at the source code for ping and infer that it does use a raw socket. Let us write a simple C program that opens a raw socket to determine if it is permitted.


simple_test.c
/*
 * Name:   simple_test
 * Author: Bhaskar S
 * Date:   06/26/2021
 * Blog:   https://www.polarsparc.com
 */

#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <sys/socket.h>
#include <netinet/ip.h>

int main(void)
{
    /* Try to open a raw socket */
    int sd = socket(PF_INET, SOCK_RAW, IPPROTO_TCP);
    if (sd < 0) {
        printf("%s: ERROR - %s\n", __FILE__, strerror(errno));
        exit(1);
    }

    /* Release the socket descriptor */
    close(sd);

    printf("%s: SUCCESS !!!\n", __FILE__);
    return 0;
}

To compile the above C code, open a terminal window and execute the following command:

$ gcc -o simple_test simple_test.c

To run the compiled binary, execute the following command in the terminal:

$ ./simple_test

The following would be the typical output:

Output.1

simple_test.c: ERROR - Operation not permitted

WHAT ??? This seems to prove the point that one needs system privileges to open raw socket connections.

Let us change the ownership of the compiled binary to root by executing the following command in the terminal:

$ sudo chown root:root simple_test

Re-running the compiled binary (with the root ownership) will still produce the same result as shown in Output.1 above.

Now, let us enable the suid bit of the compiled binary by executing the following command in the terminal:

$ sudo chmod u+s simple_test

Re-running the compiled binary (with the suid bit enabled and with the root ownership), the following would be the typical output:

Output.2

simple_test.c: SUCCESS !!!

How come the ping command does not have the suid bit enabled and is still able to open a raw socket to send the ICMP packets ???

MAGIC !!! This is where the Linux Capabilities come into play !!!

In the traditional unix world, the typical way to grant a command (executable file) the superuser (root) privileges is to enable the suid bit and have it owned by root. This gives the command full unrestricted access to the system. Any security vulnerability in such a command (executable file) could then allow a bad actor (hacker) to compromise the entire system. This all-or-nothing approach is a problem today, as every suid root executable widens the attack surface of the system.

Linux capabilities break the all-or-nothing model into distinct privileges, which allow a command (executable running as a process) to perform only those privileged actions it has been granted, irrespective of the user. For example, to open a raw socket connection, one needs to have the cap_net_raw capability enabled in the capability set.

To determine the capability set of the ping command, execute the following command in the terminal:

$ getcap /usr/bin/ping

The following would be the typical output:

Output.3

/usr/bin/ping = cap_net_raw+ep

Now, let us undo the suid bit and root ownership from our compiled binary and instead enable the cap_net_raw capability by executing the following command in the terminal:

$ sudo setcap cap_net_raw+ep simple_test

Re-running the compiled binary (WITHOUT the suid bit and root ownership) will produce the same result as shown in Output.2 above.

To determine the capability set of our compiled binary, execute the following command in the terminal:

$ getcap simple_test

The following would be the typical output:

Output.4

simple_test = cap_net_raw+ep

Capabilities in Depth

There is a list of capabilities defined in Linux, but we will only describe a handful, which are as follows:


Linux Capability        Description
CAP_CHOWN               Change a file's user ID (owner) or group ID
CAP_NET_BIND_SERVICE    Bind a socket to a network port that is less than 1024
CAP_NET_RAW             Allow the use of raw and packet sockets
CAP_SYS_NICE            Lower the nice value (increase the priority) of any process
CAP_SYS_TIME            Modify the system clock

In the following paragraph(s), we will use the term thread to be synonymous with either a process or a thread.

A capability set is a 64-bit number where each bit position represents a certain capability.

Each thread has 5 capability sets associated with it, which are as follows:


Thread Capability Set    Description
Bounding Set             Referred to as CapBnd, it is the upper limit on the capabilities that a thread may ever acquire
Inheritable Set          Referred to as CapInh, it is the set of capabilities that is preserved across an exec. A capability in this set is added to the permitted set of the exec'ed program only if the inheritable set of the executable file also contains it
Permitted Set            Referred to as CapPrm, it is the set of all the capabilities that a thread is allowed to use
Effective Set            Referred to as CapEff, it is the set of all the capabilities that is in effect for a thread, which the kernel uses for its permission checks
Ambient Set              Referred to as CapAmb, it is the set of all the capabilities that is preserved across an exec of a non-suid executable that has no file capabilities. One *VERY* important requirement is that no capability can ever be in this set if it is not in *BOTH* the permitted and the inheritable sets

Let us write a simple C program that tries to open a raw socket, displays a message, and then goes to sleep in a loop.


simple_wait_loop.c
/*
 * Name:   simple_wait_loop
 * Author: Bhaskar S
 * Date:   06/26/2021
 * Blog:   https://www.polarsparc.com
 */

#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <sys/socket.h>
#include <netinet/ip.h>

#define MAX_LOOP_COUNT 1000
#define MAX_SLEEP_SECS 10

int main(void)
{
    printf("%s: Ready to open a raw socket !!!\n", __FILE__);

    /* Try to open a raw socket; report the outcome but keep running */
    int sd = socket(PF_INET, SOCK_RAW, IPPROTO_TCP);
    if (sd < 0) {
        printf("%s: ERROR - %s\n", __FILE__, strerror(errno));
    }
    else {
        /* Release the socket descriptor */
        close(sd);

        printf("%s: SUCCESS !!!\n", __FILE__);
    }

    printf("%s: Ready to perform wait-loop !!!\n", __FILE__);

    /* Stay alive so that the process can be inspected from another terminal */
    for (int i = 0; i < MAX_LOOP_COUNT; i++) {
        printf("%s: [%03d] Just woke up to say Hello !!!\n", __FILE__, i+1);
        sleep(MAX_SLEEP_SECS);
    }

    return 0;
}

To compile the above C code, open a terminal window and execute the following command:

$ gcc -o simple_wait_loop simple_wait_loop.c

To run the compiled binary, execute the following command in the terminal:

$ ./simple_wait_loop

The following would be the typical output:

Output.5

simple_wait_loop.c: Ready to open a raw socket !!!
simple_wait_loop.c: ERROR - Operation not permitted
simple_wait_loop.c: Ready to perform wait-loop !!!
simple_wait_loop.c: [001] Just woke up to say Hello !!!
simple_wait_loop.c: [002] Just woke up to say Hello !!!
...
...

How do we determine the 5 capability sets associated with this process ???

We first need to determine the process ID (pid) of the above process. To determine that, open a terminal window and execute the following command:

$ ps -fu$USER | grep simple | grep -v grep

The following would be the typical output:

Output.6

polarsparc    8908    8549  0 16:50 pts/0    00:00:00 ./simple_wait_loop

From the Output.6 above, we can infer the pid as 8908.

Next, to determine the capability sets of the process with pid 8908, execute the following command in the terminal:

$ cat /proc/8908/status | grep Cap

The following would be the typical output:

Output.7

CapInh: 0000000000000000
CapPrm: 0000000000000000
CapEff: 0000000000000000
CapBnd: 0000003fffffffff
CapAmb: 0000000000000000

From the Output.7 above, it is clear that all the capability sets are empty except for the bounding set (CapBnd). The results are correct as our executable has no capabilities enabled. The bounding set is assigned by the kernel to a default value. Now, the question that may arise is how do we interpret the hexadecimal capability value in the bounding set ???

To decode the hexadecimal capability value, execute the following command in the terminal:

$ capsh --decode=0000003fffffffff

The following would be the typical output:

Output.8

0x0000003fffffffff=cap_chown,cap_dac_override,cap_dac_read_search,cap_fowner,cap_fsetid,cap_kill,cap_setgid,cap_setuid,cap_setpcap,cap_linux_immutable,cap_net_bind_service,cap_net_broadcast,cap_net_admin,cap_net_raw,cap_ipc_lock,cap_ipc_owner,cap_sys_module,cap_sys_rawio,cap_sys_chroot,cap_sys_ptrace,cap_sys_pacct,cap_sys_admin,cap_sys_boot,cap_sys_nice,cap_sys_resource,cap_sys_time,cap_sys_tty_config,cap_mknod,cap_lease,cap_audit_write,cap_audit_control,cap_setfcap,cap_mac_override,cap_mac_admin,cap_syslog,cap_wake_alarm,cap_block_suspend,cap_audit_read

Just to be clear - it is the terminal shell (parent) that is fork'ing and exec'ing our simple_wait_loop (child) process.

Now that we know there are 5 capability sets associated with any thread, let us look at them purely from the context of a thread (ignoring the file capability sets, which we will cover next). The following illustration shows how the thread capability sets propagate from the parent thread to the child thread after the clone (fork) and exec:


Parent-Child Capabilities
Figure.3

Just like a thread can have capability sets, an executable can have capability sets as well. Each file can have 3 capability sets associated with it, which are as follows:


File Capability Set    Description
Permitted Set          These capabilities are automatically added to the permitted set (CapPrm) of a thread on exec, provided they are also in the bounding set (CapBnd) of the thread. This is the p flag in the setcap command
Inheritable Set        These capabilities are AND'ed with the inheritable set (CapInh) of a thread and the result is added to the permitted set (CapPrm) of a thread. This is the i flag in the setcap command
Effective Set          It is really *NOT* a set, but just a single bit (flag), which if enabled (set), will make the effective set (CapEff) of a thread equal to the permitted set (CapPrm) of a thread after exec. If this flag is *NOT* set (disabled), it will EMPTY the effective set (CapEff) of a thread. This is the e flag in the setcap command

Now that we understand both the thread and file capability sets, the following illustration shows how the capability sets propagate from the parent thread to the child thread after the clone (fork) and exec (taking into account the file capability sets as well):


Capabilities on exec
Figure.4

Now, we are in a much better position to understand the command setcap cap_net_raw+ep we used on the simple_test executable above. This command added the cap_net_raw capability to the permitted set of the executable file (the operator '+' and the flag 'p') and enabled the effective flag on the executable file (the flag 'e').

We now know that non-privileged user thread(s) need the appropriate Linux capabilities to perform the desired action. Rather than grant the Linux capabilities to every user-developed utility, it would be more prudent to have a single utility with the desired Linux capabilities and have it exec the user-developed utility, propagating the Linux capabilities to it.

Let us write a simple C program that will be granted a small set of Linux capabilities and will exec a child process. The child executable file will NOT have any capabilities assigned to it.


simple_exec.c
/*
 * Name:   simple_exec
 * Author: Bhaskar S
 * Date:   06/26/2021
 * Blog:   https://www.polarsparc.com
 */

#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

#define CHILD "./simple_wait_loop"

int main()
{
    char *p_args[] = { NULL };
    char *p_env[] = { NULL };

    printf("%s: Parent process started !!!\n", __FILE__);

    printf("Press ENTER to exec child ...\n");
    getchar();

    printf("%s: Process [%d] ready to exec child\n", __FILE__, getpid());

    int rc = execve(CHILD, p_args, p_env);
    if (rc < 0) {
        printf("%s: [2] ERROR - %s\n", __FILE__, strerror(errno));
        exit(1);
    }
}

To compile the above C code, open a terminal window and execute the following command:

$ gcc -o simple_exec simple_exec.c

Now, let us enable the capabilities of cap_chown, cap_net_raw, and cap_sys_nice on our compiled binary by executing the following command in the terminal:

$ sudo setcap 'cap_chown,cap_net_raw,cap_sys_nice+eip' ./simple_exec

To verify the capability set of our compiled binary, execute the following command in the terminal:

$ getcap ./simple_exec

The following would be the typical output:

Output.9

./simple_exec = cap_chown,cap_net_raw,cap_sys_nice+eip

Now, to run the compiled binary, execute the following command in the terminal:

$ ./simple_exec

The following would be the typical output:

Output.10

simple_exec.c: Parent process started !!!
Press ENTER to exec child ...

The binary simple_exec is waiting for the user to press the ENTER key, allowing us to determine the current capabilities of the parent thread.

To determine the process ID (pid) of the above process, open a terminal window and execute the following command:

$ ps -fu$USER | grep simple | grep -v grep

The following would be the typical output:

Output.11

polarsparc    9517    8549  0 16:50 pts/0    00:00:00 ./simple_exec

From the Output.11 above, we can infer the pid as 9517.

Next, to determine the capability sets of the process with pid 9517, execute the following command in the terminal:

$ cat /proc/9517/status | grep Cap

The following would be the typical output:

Output.12

CapInh: 0000000000000000
CapPrm:	0000000000802001
CapEff:	0000000000802001
CapBnd:	0000003fffffffff
CapAmb:	0000000000000000

From the Output.12 above, it is clear that the permitted and effective sets contain the capabilities we granted, while the inheritable and ambient sets are empty.

To decode the hexadecimal capability value of the effective set, execute the following command in the terminal:

$ capsh --decode=0000000000802001

The following would be the typical output:

Output.13

0x0000000000802001=cap_chown,cap_net_raw,cap_sys_nice

Now, it is time to see what the capabilities look like once the child is exec'ed. For that, press the ENTER key on the terminal where the thread is waiting.

The following would be the typical output:

Output.14

simple_exec.c: Process [9517] ready to exec child
simple_wait_loop.c: Ready to open a raw socket !!!
simple_wait_loop.c: ERROR - Operation not permitted
simple_wait_loop.c: Ready to perform wait-loop !!!
simple_wait_loop.c: [001] Just woke up to say Hello !!!
simple_wait_loop.c: [002] Just woke up to say Hello !!!
...
...

Right off the bat, we can see from Output.14 above that the raw socket operation failed, which suggests that the child thread (process) did not inherit any capability. Let us verify that is the case.

To determine the process ID (pid) of the child process, execute the following command in the terminal:

$ ps -fu$USER | grep simple | grep -v grep

The following would be the typical output:

Output.15

polarsparc    9517    8549  0 16:50 pts/0    00:00:00 [simple_wait_loo]

From the Output.15 above, we can infer the pid as 9517. It has not changed because exec replaces the program running in the existing process rather than creating a new one.

Next, to determine the capability sets of the process with pid 9517, execute the following command in the terminal:

$ cat /proc/9517/status | grep Cap

The following would be the typical output:

Output.16

CapInh: 0000000000000000
CapPrm:	0000000000000000
CapEff:	0000000000000000
CapBnd:	0000003fffffffff
CapAmb:	0000000000000000

From the Output.16 above, it is clear that the child process has *NO* capabilities and it is the correct behavior (which can be inferred from the capabilities flow in Figure.4 above).

From the capabilities flow in Figure.4 above, it is clear that the only way a child thread (process), whose executable file has no file capabilities, can acquire capabilities is through the ambient set. One important constraint for the ambient set is that a capability has to be in both the inheritable and the permitted sets of the thread before it can be added.

The ambient set can only be set via system calls. Ensure that the package libcap-dev is installed on the system. To do just that, execute the following command in the terminal:

$ sudo apt install libcap-dev -y

Let us write a simple C program that will be granted a small set of Linux capabilities via the inheritable and the permitted sets, will add them to the ambient set (via system calls), and will then exec a child process to propagate the required capabilities.


ambient_exec.c
/*
 * Name:   ambient_exec
 * Author: Bhaskar S
 * Date:   06/26/2021
 * Blog:   https://www.polarsparc.com
 */

#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/prctl.h>
#include <sys/capability.h>

#define CHILD "./simple_wait_loop"
#define MAX_LOOP_COUNT 3

int main()
{
    pid_t pid;

    int amb_cap[] = { CAP_CHOWN, CAP_NET_RAW, CAP_SYS_NICE };

    char *p_args[] = { NULL };
    char *p_env[] = { NULL };

    /* Set the ambient capabilities */
    for (int i = 0; i < MAX_LOOP_COUNT; i++) {
        printf("%s: Ready to set capability bit for %s\n", __FILE__, cap_to_name(amb_cap[i]));

        cap_t caps = cap_get_proc();
        if (caps == NULL) {
            printf("%s: ERROR in cap_get_proc() - %s\n", __FILE__, strerror(errno));
            exit(1);
        }

        cap_value_t cap_value[1];
        cap_value[0] = amb_cap[i];

        int rc = cap_set_flag(caps, CAP_INHERITABLE, 1, cap_value, CAP_SET);
        if (rc < 0) {
            printf("%s: ERROR in cap_set_flag() - %s\n", __FILE__, strerror(errno));
            cap_free(caps);
            exit(1);
        }

        rc = cap_set_proc(caps);
        if (rc < 0) {
            printf("%s: ERROR in cap_set_proc() - %s\n", __FILE__, strerror(errno));
            cap_free(caps);
            exit(1);
        }

        rc = prctl(PR_CAP_AMBIENT, PR_CAP_AMBIENT_RAISE, amb_cap[i], 0, 0);
        if (rc < 0) {
            printf("%s: ERROR in prctl() - %s\n", __FILE__, strerror(errno));
            cap_free(caps);
            exit(1);
        }

        cap_free(caps);
    }

    printf("Press ENTER to exec child ...\n");
    getchar();

    printf("%s: Process [%d] ready to exec child\n", __FILE__, getpid());

    int rc = execve(CHILD, p_args, p_env);
    if (rc < 0) {
        printf("%s: ERROR in execve() - %s\n", __FILE__, strerror(errno));
        exit(1);
    }
}

To compile the above C code, open a terminal window and execute the following command:

$ gcc -o ambient_exec ambient_exec.c -lcap

Now, let us enable the capabilities of cap_chown, cap_net_raw, and cap_sys_nice on our compiled binary by executing the following command in the terminal:

$ sudo setcap 'cap_chown,cap_net_raw,cap_sys_nice+ip' ./ambient_exec

Notice that the effective flag (e) is *NOT* needed here. Raising a capability in the ambient set *ONLY* requires it to be in both the inheritable and the permitted sets of the thread - hence the use of the flag +ip in the command above.

Now, to run the compiled binary, execute the following command in the terminal:

$ ./ambient_exec

The following would be the typical output:

Output.17

ambient_exec.c: Ready to set capability bit for cap_chown
ambient_exec.c: Ready to set capability bit for cap_net_raw
ambient_exec.c: Ready to set capability bit for cap_sys_nice
Press ENTER to exec child ...

The binary ambient_exec is waiting for the user to press the ENTER key, allowing us to determine the current capabilities of the parent thread.

To determine the process ID (pid) of the above process, open a terminal window and execute the following command:

$ ps -fu$USER | grep ambient | grep -v grep

The following would be the typical output:

Output.18

polarsparc    10048    8549  0 16:50 pts/0    00:00:00 ./ambient_exec

From the Output.18 above, we can infer the pid as 10048.

Next, to determine the capability sets of the process with pid 10048, execute the following command in the terminal:

$ cat /proc/10048/status | grep Cap

The following would be the typical output:

Output.19

CapInh: 0000000000802001
CapPrm:	0000000000802001
CapEff:	0000000000000000
CapBnd:	0000003fffffffff
CapAmb:	0000000000802001

From the Output.19 above, it is clear that the ambient set is initialized to the required value.

Now, it is time to see what the capabilities look like once the child is exec'ed. For that, press the ENTER key on the terminal where the thread is waiting.

The following would be the typical output:

Output.20

ambient_exec.c: Process [10048] ready to exec child
simple_wait_loop.c: Ready to open a raw socket !!!
simple_wait_loop.c: SUCCESS !!!
simple_wait_loop.c: Ready to perform wait-loop !!!
simple_wait_loop.c: [001] Just woke up to say Hello !!!
simple_wait_loop.c: [002] Just woke up to say Hello !!!

Clearly we can see that the child thread (process) has succeeded as can be inferred from Output.20 above. Let us verify that is the case.

To determine the process ID (pid) of the child process, execute the following command in the terminal:

$ ps -fu$USER | grep simple | grep -v grep

The following would be the typical output:

Output.21

polarsparc    10048    8549  0 16:50 pts/0    00:00:00 [simple_wait_loo]

From the Output.21 above, we can infer the pid as 10048 and it has not changed.

Next, to determine the capability sets of the process with pid 10048, execute the following command in the terminal:

$ cat /proc/10048/status | grep Cap

The following would be the typical output:

Output.22

CapInh: 0000000000802001
CapPrm:	0000000000802001
CapEff:	0000000000802001
CapBnd:	0000003fffffffff
CapAmb:	0000000000802001

BINGO !!! We have successfully developed a single utility with the desired Linux capabilities and used it to exec a child thread (process) to propagate the Linux capabilities.

References

Capabilities Man Page



© PolarSPARC