PolarSPARC

Building a Linux Container using Namespaces :: Part - 2


Bhaskar S 03/15/2020


Overview

In Part - 1 of this series, we demonstrated isolation of the Host name, User/Group IDs, and Process IDs using namespaces UTS, User, PID, and Mount.

In this article, we continue the journey with Mount and Network namespaces. We will not explore the IPC namespace.

Hands-on with Namespaces

Mount Namespace

Next, we will mimic the UTS, User, PID, and Mount namespace isolation using the following Go program:

Listing.1
package main

import (
	"log"
	"os"
	"os/exec"
	"syscall"
)

func createTxtFile() {
	f, err := os.Create("/tmp/leopard.txt")
	if err != nil {
		panic(err)
	}

	defer f.Close()

	_, err = f.WriteString("leopard")
	if err != nil {
		panic(err)
	}
}

func execContainerShell() {
	log.Printf("Ready to exec container shell ...\n")

	if err := syscall.Sethostname([]byte("leopard")); err != nil {
		panic(err)
	}

	log.Printf("Chaning to /tmp directory ...\n")

	if err := os.Chdir("/tmp"); err != nil {
		panic(err)
	}

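	log.Printf("Mounting / as private ...\n")

	// Make every mount under / private so that the mount changes below
	// do not propagate back to the parent mount namespace
	mf := uintptr(syscall.MS_PRIVATE | syscall.MS_REC)
	if err := syscall.Mount("", "/", "", mf, ""); err != nil {
		panic(err)
	}

	log.Printf("Binding rootfs/ to rootfs/ ...\n")

	// pivot_root requires the new root to be a mount point, so bind mount
	// rootfs/ (relative to /tmp) onto itself
	mf = uintptr(syscall.MS_BIND | syscall.MS_REC)
	if err := syscall.Mount("rootfs/", "rootfs/", "", mf, ""); err != nil {
		panic(err)
	}

	log.Printf("Pivot new root at rootfs/ ...\n")

	// Swap the root filesystem: rootfs/ becomes / and the previous root is
	// attached at rootfs/.old_root (this directory must already exist)
	if err := syscall.PivotRoot("rootfs/", "rootfs/.old_root"); err != nil {
		panic(err)
	}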

	log.Printf("Changing to / directory ...\n")

	if err := os.Chdir("/"); err != nil {
		panic(err)
	}

	log.Printf("Mounting /tmp as tmpfs ...\n")

	mf = uintptr(syscall.MS_NODEV)
	if err := syscall.Mount("tmpfs", "/tmp", "tmpfs", mf, ""); err != nil {
		panic(err)
	}

	log.Printf("Mounting /proc filesystem ...\n")

	mf = uintptr(syscall.MS_NODEV)
	if err := syscall.Mount("proc", "/proc", "proc", mf, ""); err != nil {
		panic(err)
	}

	createTxtFile()

	log.Printf("Mounting /.old_root as private ...\n")

	mf = uintptr(syscall.MS_PRIVATE | syscall.MS_REC)
	if err := syscall.Mount("", "/.old_root", "", mf, ""); err != nil {
		panic(err)
	}

	log.Printf("Unmount parent rootfs from /.old_root ...\n")

	if err := syscall.Unmount("/.old_root", syscall.MNT_DETACH); err != nil {
		panic(err)
	}

	const sh = "/bin/sh"

	env := os.Environ()
	env = append(env, "PS1=-> ")

	if err := syscall.Exec(sh, []string{""}, env); err != nil {
		panic(err)
	}
}

func main() {
	log.Printf("Starting process %s with args: %v\n", os.Args[0], os.Args)

	const clone = "CLONE"

	if len(os.Args) > 1 && os.Args[1] == clone {
		execContainerShell()
		os.Exit(0)
	}

	log.Printf("Ready to run command ...\n")

	cmd := exec.Command(os.Args[0], []string{clone}...)
	cmd.Stdin = os.Stdin
	cmd.Stdout = os.Stdout
	cmd.Stderr = os.Stderr
	cmd.SysProcAttr = &syscall.SysProcAttr{
		Cloneflags: syscall.CLONE_NEWUTS | syscall.CLONE_NEWUSER | syscall.CLONE_NEWNS | syscall.CLONE_NEWPID,
		UidMappings: []syscall.SysProcIDMap{
			{ContainerID: 0, HostID: 0, Size: 1},
		},
		GidMappings: []syscall.SysProcIDMap{
			{ContainerID: 0, HostID: 0, Size: 1},
		},
	}

	if err := cmd.Run(); err != nil {
		panic(err)
	}
}

Create and change to the directory $GOPATH/mount by executing the following commands in TB:

$ mkdir -p $GOPATH/mount

$ cd $GOPATH/mount

Copy the above code into the program file main.go in the current directory.

To compile the program file main.go, execute the following command in TB:

$ go build main.go

To run program main, execute the following command in TB:

$ sudo ./main

The following would be a typical output:

Output.1

2020/03/14 22:05:46 Starting process ./main with args: [./main]
2020/03/14 22:05:46 Ready to run command ...
2020/03/14 22:05:46 Starting process ./main with args: [./main CLONE]
2020/03/14 22:05:46 Ready to exec container shell ...
2020/03/14 22:05:46 Chaning to /tmp directory ...
2020/03/14 22:05:46 Mounting / as private ...
2020/03/14 22:05:46 Binding rootfs/ to rootfs/ ...
2020/03/14 22:05:46 Pivot new root at rootfs/ ...
2020/03/14 22:05:46 Changing to / directory ...
2020/03/14 22:05:46 Mounting /tmp as tmpfs ...
2020/03/14 22:05:46 Mounting /proc filesystem ...
2020/03/14 22:05:46 Mounting /.old_root as private ...
2020/03/14 22:05:46 Unmount parent rootfs from /.old_root ...
->

The command prompt will change to a ->.

To display the host name of the simple container, execute the following command in TB:

-> hostname

The following would be a typical output:

Output.2

leopard

To display the user ID and group ID in the new namespace, execute the following command in TB:

-> id

The following would be a typical output:

Output.3

uid=0(root) gid=0(root) groups=0(root)
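
The root identity shown above follows from the UidMappings and GidMappings entries in Listing.1 (ContainerID 0, HostID 0, Size 1). The kernel exposes such mappings in /proc/<pid>/uid_map and /proc/<pid>/gid_map as lines of three numbers (ID inside the namespace, ID outside, range length). The following is a minimal sketch of how such a mapping could be interpreted; the names idMap, parseIDMap, and toHostID are our own and not part of any library:

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// idMap mirrors one line of a uid_map or gid_map file:
// ID inside the namespace, ID outside the namespace, length of the range.
type idMap struct {
	ContainerID, HostID, Size int
}

// parseIDMap parses the text content of a uid_map or gid_map file.
func parseIDMap(text string) ([]idMap, error) {
	var maps []idMap
	for _, line := range strings.Split(text, "\n") {
		fields := strings.Fields(line)
		if len(fields) == 0 {
			continue // skip blank lines
		}
		if len(fields) != 3 {
			return nil, fmt.Errorf("unexpected line %q", line)
		}
		var n [3]int
		for i, f := range fields {
			v, err := strconv.Atoi(f)
			if err != nil {
				return nil, err
			}
			n[i] = v
		}
		maps = append(maps, idMap{n[0], n[1], n[2]})
	}
	return maps, nil
}

// toHostID translates an ID seen inside the namespace to the host ID.
// The second result is false when the ID has no mapping.
func toHostID(maps []idMap, id int) (int, bool) {
	for _, m := range maps {
		if id >= m.ContainerID && id < m.ContainerID+m.Size {
			return m.HostID + (id - m.ContainerID), true
		}
	}
	return 0, false
}

func main() {
	// Sample content equivalent to the mapping used in Listing.1
	maps, err := parseIDMap("0 0 1\n")
	if err != nil {
		panic(err)
	}
	host, ok := toHostID(maps, 0)
	fmt.Println(host, ok)
	_, ok = toHostID(maps, 1000)
	fmt.Println(ok)
}
```

With a single-entry mapping, only ID 0 is translated; any other ID is unmapped, which is the likely reason files owned by other host users show up as nobody/nogroup inside the container.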

To display all the processes in the simple container, execute the following command in TB:

-> ps -fu

The following would be a typical output:

Output.4

USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
root         1  0.0  0.0   4628   824 pts/1    S    22:05   0:00 
root         8  0.0  0.0  37368  3368 pts/1    R+   22:05   0:00 ps -fu

To list all the mount points in the new namespace, execute the following command in TB:

-> cat /proc/mounts | sort

The following would be a typical output:

Output.5

/dev/sda1 / ext4 rw,relatime,errors=remount-ro,data=ordered 0 0
proc /proc proc rw,relatime 0 0
tmpfs /tmp tmpfs rw,relatime 0 0
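
Each line of /proc/mounts lists the mount source, the mount point, the filesystem type, and the mount options, followed by two numeric fields. The following is a minimal sketch of reading that format from Go, using the lines of Output.5 as sample input; the names mountEntry and parseMounts are our own:

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// mountEntry holds the first four fields of a /proc/mounts line.
type mountEntry struct {
	Source, Target, FSType, Options string
}

// parseMounts parses /proc/mounts style text and returns the entries
// sorted by mount point. Octal escapes such as \040 are not decoded.
func parseMounts(text string) []mountEntry {
	var entries []mountEntry
	for _, line := range strings.Split(text, "\n") {
		f := strings.Fields(line)
		if len(f) < 4 {
			continue // skip blank or malformed lines
		}
		entries = append(entries, mountEntry{f[0], f[1], f[2], f[3]})
	}
	sort.Slice(entries, func(i, j int) bool {
		return entries[i].Target < entries[j].Target
	})
	return entries
}

func main() {
	// Sample input copied from Output.5
	sample := `/dev/sda1 / ext4 rw,relatime,errors=remount-ro,data=ordered 0 0
proc /proc proc rw,relatime 0 0
tmpfs /tmp tmpfs rw,relatime 0 0`

	for _, e := range parseMounts(sample) {
		fmt.Printf("%-6s %-6s %s\n", e.Target, e.FSType, e.Source)
	}
}
```

Inside the container, the same function could be fed the real content of /proc/mounts (for example, via os.ReadFile) to confirm that only the three expected mount points remain.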

To list all the file(s) under / in the new namespace, execute the following command in TB:

-> ls -l /

The following would be a typical output:

Output.6

total 68
drwxr-xr-x   2 nobody nogroup 4096 Feb  3 20:24 bin
drwxr-xr-x   2 nobody nogroup 4096 Apr 24  2018 boot
drwxr-xr-x   2 nobody nogroup 4096 Feb  3 20:24 dev
drwxr-xr-x  29 nobody nogroup 4096 Feb  3 20:24 etc
drwxr-xr-x   2 nobody nogroup 4096 Apr 24  2018 home
drwxr-xr-x   8 nobody nogroup 4096 May 23  2017 lib
drwxr-xr-x   2 nobody nogroup 4096 Feb  3 20:23 lib64
drwxr-xr-x   2 nobody nogroup 4096 Feb  3 20:23 media
drwxr-xr-x   2 nobody nogroup 4096 Feb  3 20:23 mnt
drwxr-xr-x   2 nobody nogroup 4096 Feb  3 20:23 opt
dr-xr-xr-x 329 root   root       0 Mar 21 17:32 proc
drwx------   2 nobody nogroup 4096 Feb  3 20:24 root
drwxr-xr-x   4 nobody nogroup 4096 Feb  3 20:23 run
drwxr-xr-x   2 nobody nogroup 4096 Feb  3 20:24 sbin
drwxr-xr-x   2 nobody nogroup 4096 Feb  3 20:23 srv
drwxr-xr-x   2 nobody nogroup 4096 Apr 24  2018 sys
drwxrwxrwt   2 root   root      60 Mar 21 17:32 tmp
drwxr-xr-x  10 nobody nogroup 4096 Feb  3 20:23 usr
drwxr-xr-x  11 nobody nogroup 4096 Feb  3 20:24 var

To list the properties of the file /tmp/leopard.txt in the simple container, execute the following command in TB:

-> ls -l /tmp/leopard.txt

The following would be a typical output:

Output.7

-rw-r--r-- 1 root root 7 Mar 14 22:05 /tmp/leopard.txt

To list all the namespaces associated with the simple container, execute the following command in TB:

-> ls -l /proc/$$/ns

The following would be a typical output:

Output.8

total 0
lrwxrwxrwx 1 root root 0 Mar 14 22:07 cgroup -> 'cgroup:[4026531835]'
lrwxrwxrwx 1 root root 0 Mar 14 22:07 ipc -> 'ipc:[4026531839]'
lrwxrwxrwx 1 root root 0 Mar 14 22:07 mnt -> 'mnt:[4026532609]'
lrwxrwxrwx 1 root root 0 Mar 14 22:07 net -> 'net:[4026531993]'
lrwxrwxrwx 1 root root 0 Mar 14 22:07 pid -> 'pid:[4026532611]'
lrwxrwxrwx 1 root root 0 Mar 14 22:07 pid_for_children -> 'pid:[4026532611]'
lrwxrwxrwx 1 root root 0 Mar 14 22:07 user -> 'user:[4026532608]'
lrwxrwxrwx 1 root root 0 Mar 14 22:07 uts -> 'uts:[4026532610]'
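
Each link target above has the form type:[inode]. For the ipc, uts, user, pid, and cgroup types, the kernel reserves fixed inode numbers for the initial namespaces (declared in include/linux/proc_ns.h), so one can tell from the number alone whether a process is still in the initial namespace. The following is a minimal sketch of that check applied to the values from Output.8; the names initialNs, parseNsLink, and isInitial are our own:

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// Inode numbers reserved by the kernel for the initial namespaces.
// The mnt and net namespaces get dynamically allocated numbers and are omitted.
var initialNs = map[string]uint64{
	"ipc":    0xEFFFFFFF,
	"uts":    0xEFFFFFFE,
	"user":   0xEFFFFFFD,
	"pid":    0xEFFFFFFC,
	"cgroup": 0xEFFFFFFB,
}

// parseNsLink splits a link target such as "uts:[4026532610]"
// into the namespace type and the inode number.
func parseNsLink(target string) (string, uint64, error) {
	i := strings.Index(target, ":[")
	if i < 0 || !strings.HasSuffix(target, "]") {
		return "", 0, fmt.Errorf("malformed namespace link %q", target)
	}
	inode, err := strconv.ParseUint(target[i+2:len(target)-1], 10, 64)
	if err != nil {
		return "", 0, err
	}
	return target[:i], inode, nil
}

// isInitial reports whether the inode is the reserved initial one for the
// given type. The second result is false for types without a reserved number.
func isInitial(kind string, inode uint64) (bool, bool) {
	want, known := initialNs[kind]
	return known && inode == want, known
}

func main() {
	// Link targets copied from Output.8
	links := []string{
		"cgroup:[4026531835]",
		"ipc:[4026531839]",
		"pid:[4026532611]",
		"user:[4026532608]",
		"uts:[4026532610]",
	}
	for _, l := range links {
		kind, inode, err := parseNsLink(l)
		if err != nil {
			panic(err)
		}
		initial, _ := isInitial(kind, inode)
		fmt.Printf("%-6s %d initial=%v\n", kind, inode, initial)
	}
}
```

For the values in Output.8, the cgroup and ipc entries match the reserved numbers while pid, user, and uts do not, which is consistent with the clone flags used in Listing.1.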

To exit the simple container, execute the following command in TB:

-> exit

SUCCESS !!! We have demonstrated the combined UTS, User, PID, and Mount namespaces using both the unshare command and a simple Go program.

Network Namespace

Finally, let us now layer the Network namespace on top of the UTS, the User, the PID, and the Mount namespaces.

To launch a simple container whose networking as well as the mount points, the process IDs, the user/group IDs, and the host name are isolated from the parent namespace, execute the following command in TB:

$ sudo unshare -uUrpfmn --mount-proc /bin/sh

The -n option enables the Network namespace.
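
To relate the unshare options to the Cloneflags used in the Go listings, the following minimal sketch maps the short options to the corresponding clone flags. The flag values are taken from linux/sched.h and declared locally (on Linux they equal the syscall.CLONE_NEW* constants); the names nsOption and cloneFlags are our own:

```go
package main

import "fmt"

// Namespace clone flags with values from linux/sched.h, declared locally
// so that the sketch builds on any platform.
const (
	cloneNewNS   = 0x00020000 // Mount namespace
	cloneNewUTS  = 0x04000000 // UTS namespace
	cloneNewIPC  = 0x08000000 // IPC namespace
	cloneNewUser = 0x10000000 // User namespace
	cloneNewPID  = 0x20000000 // PID namespace
	cloneNewNet  = 0x40000000 // Network namespace
)

// nsOption maps an unshare short option to its namespace flag.
var nsOption = map[rune]uintptr{
	'm': cloneNewNS,
	'u': cloneNewUTS,
	'i': cloneNewIPC,
	'U': cloneNewUser,
	'p': cloneNewPID,
	'n': cloneNewNet,
}

// cloneFlags ORs together the flags for the namespace options found in a
// short option string such as "uUrpfmn". Options that do not select a
// namespace (such as r and f) contribute nothing.
func cloneFlags(opts string) uintptr {
	var flags uintptr
	for _, o := range opts {
		flags |= nsOption[o]
	}
	return flags
}

func main() {
	fmt.Printf("%#x\n", cloneFlags("uUrpfmn")) // prints 0x74020000
}
```

The result is the same set of flags that a Go program would pass as CLONE_NEWUTS | CLONE_NEWUSER | CLONE_NEWPID | CLONE_NEWNS | CLONE_NEWNET.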

The command prompt will change to a #.

To list all the network interfaces in the new namespace, execute the following command in TB:

# ip link

The following would be a typical output:

Output.9

1: lo: <LOOPBACK> mtu 65536 qdisc noop state DOWN mode DEFAULT group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00

From Output.9 above, we see only the loopback (127.0.0.1) interface and it is DOWN.
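
Two parts of an ip link line tell us whether an interface is usable: the flag list in angle brackets (which includes UP once the interface is enabled) and the value after the word state. The following is a minimal sketch that extracts both from a line such as the one in Output.9; the names link, parseLinkLine, and hasFlag are our own:

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// link holds the fields of interest from the first line of an ip link entry.
type link struct {
	Index int
	Name  string
	Flags []string
	State string
}

// parseLinkLine parses a line such as
// "1: lo: <LOOPBACK> mtu 65536 qdisc noop state DOWN ...".
func parseLinkLine(line string) (link, error) {
	f := strings.Fields(line)
	if len(f) < 3 {
		return link{}, fmt.Errorf("short line %q", line)
	}
	idx, err := strconv.Atoi(strings.TrimSuffix(f[0], ":"))
	if err != nil {
		return link{}, err
	}
	l := link{Index: idx, Name: strings.TrimSuffix(f[1], ":")}
	l.Flags = strings.Split(strings.Trim(f[2], "<>"), ",")
	// The operational state follows the keyword "state"
	for i := 3; i+1 < len(f); i++ {
		if f[i] == "state" {
			l.State = f[i+1]
		}
	}
	return l, nil
}

// hasFlag reports whether the named flag appears in the flag list.
func (l link) hasFlag(name string) bool {
	for _, fl := range l.Flags {
		if fl == name {
			return true
		}
	}
	return false
}

func main() {
	// Line copied from Output.9
	line := "1: lo: <LOOPBACK> mtu 65536 qdisc noop state DOWN mode DEFAULT group default qlen 1000"
	l, err := parseLinkLine(line)
	if err != nil {
		panic(err)
	}
	fmt.Println(l.Name, l.hasFlag("UP"), l.State)
}
```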

To bring up the loopback interface in the new namespace, execute the following command in TB:

# ip link set dev lo up

To test the loopback interface in the new namespace, execute the following command in TB:

# ping 127.0.0.1 -c3

The following would be a typical output:

Output.10

PING 127.0.0.1 (127.0.0.1) 56(84) bytes of data.
64 bytes from 127.0.0.1: icmp_seq=1 ttl=64 time=0.022 ms
64 bytes from 127.0.0.1: icmp_seq=2 ttl=64 time=0.024 ms
64 bytes from 127.0.0.1: icmp_seq=3 ttl=64 time=0.020 ms

--- 127.0.0.1 ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 2040ms
rtt min/avg/max/mdev = 0.020/0.022/0.024/0.001 ms

We need to create a bridge network interface in the parent namespace. A bridge is a virtual network switch used to connect two or more network devices.

To create a bridge interface called br0 in the parent namespace, execute the following command in TA:

$ sudo brctl addbr br0

To list all the bridge interfaces in the parent namespace, execute the following command in TA:

$ sudo brctl show

The following would be a typical output:

Output.11

bridge name bridge id           STP enabled interfaces
br0         8000.000000000000   no

Let us assign br0 the address 172.20.1.2. To assign an ip address to the bridge interface br0 in the parent namespace, execute the following command in TA:

$ sudo ip addr add 172.20.1.2/24 dev br0

To bring up the bridge interface br0 in the parent namespace, execute the following command in TA:

$ sudo ip link set br0 up

To list all the network interfaces in the parent namespace, execute the following command in TA:

$ ip link

The following would be a typical output:

Output.12

1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
2: enp5s0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc fq_codel state DOWN mode DEFAULT group default qlen 1000
    link/ether 18:18:18:05:05:05 brd ff:ff:ff:ff:ff:ff
3: br0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000
    link/ether 0a:ae:d0:65:21:bb brd ff:ff:ff:ff:ff:ff

One can add a virtual ethernet device (veth) to a Network namespace. The veth devices are always created in pairs and can act as a tunnel between Network namespaces: packets transmitted on one device in the pair are immediately received on the other device. One end of the pair would be in the parent namespace and the other end of the pair would be in the new namespace.

The following diagram illustrates the bridge network with the virtual ethernet pairs:

[Figure: Bridge Network]

To create a veth interface pair called veth0 and veth1 in the parent namespace, execute the following command in TA:

$ sudo ip link add veth0 type veth peer name veth1

To list all the network interfaces in the parent namespace, execute the following command in TA:

$ ip link

The following would be a typical output:

Output.13

1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
2: enp5s0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc fq_codel state DOWN mode DEFAULT group default qlen 1000
    link/ether 18:18:18:05:05:05 brd ff:ff:ff:ff:ff:ff
3: br0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000
    link/ether 0a:ae:d0:65:21:bb brd ff:ff:ff:ff:ff:ff
4: veth1@veth0: <BROADCAST,MULTICAST,M-DOWN> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
    link/ether c6:46:7c:18:1c:ef brd ff:ff:ff:ff:ff:ff
5: veth0@veth1: <BROADCAST,MULTICAST,M-DOWN> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
    link/ether 76:3e:78:4e:9d:28 brd ff:ff:ff:ff:ff:ff

The end veth0 should be in the parent namespace, while the end veth1 should be in the new namespace.

To place the end veth1 in the new namespace, we need to identify the process ID of the command unshare.

To find and store the pid of unshare in an environment variable UPID, execute the following command in TA:

$ export UPID=$(pidof unshare)

To place the end veth1 in the new namespace, execute the following command in TA:

$ sudo ip link set veth1 netns $UPID

To list all the network interfaces in the parent namespace, execute the following command in TA:

$ ip link

The following would be a typical output:

Output.14

1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
2: enp5s0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc fq_codel state DOWN mode DEFAULT group default qlen 1000
    link/ether 18:18:18:05:05:05 brd ff:ff:ff:ff:ff:ff
3: br0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000
    link/ether 0a:ae:d0:65:21:bb brd ff:ff:ff:ff:ff:ff
5: veth0@if6: <BROADCAST,MULTICAST,M-DOWN> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
    link/ether 76:3e:78:4e:9d:28 brd ff:ff:ff:ff:ff:ff

To list all the network interfaces in the new namespace, execute the following command in TB:

# ip link

The following would be a typical output:

Output.15

1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
2: veth1@if3: <BROADCAST,MULTICAST,M-DOWN> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
    link/ether c6:46:7c:18:1c:ef brd ff:ff:ff:ff:ff:ff

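Comparing Output.15 and Output.14, we see that veth1 no longer appears in the parent namespace and is now visible only in the new namespace, which in turn sees none of the parent interfaces (enp5s0, br0, veth0).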

To connect the end veth0 to the bridge br0 in the parent namespace, execute the following command in TA:

$ sudo ip link set veth0 master br0 up

Let us assign veth0 the address 172.20.1.3. To assign an ip address to the network interface veth0 in the parent namespace, execute the following command in TA:

$ sudo ip addr add 172.20.1.3/24 dev veth0

To bring up the network interface veth0 in the parent namespace, execute the following command in TA:

$ sudo ip link set veth0 up

Let us assign veth1 the address 172.20.1.4. To assign an ip address to the network interface veth1 in the new namespace, execute the following command in TB:

# ip addr add 172.20.1.4/24 dev veth1

To bring up the network interface veth1 in the new namespace, execute the following command in TB:

# ip link set veth1 up
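
The three addresses assigned so far (172.20.1.2 for br0, 172.20.1.3 for veth0, and 172.20.1.4 for veth1) all use the /24 prefix and so belong to the same subnet, which is why no additional routes are configured before testing connectivity. The following minimal sketch verifies that with the standard net package; the function name sameSubnet is our own:

```go
package main

import (
	"fmt"
	"net"
)

// sameSubnet reports whether two addresses in CIDR notation
// each fall inside the network of the other.
func sameSubnet(a, b string) (bool, error) {
	ipA, netA, err := net.ParseCIDR(a)
	if err != nil {
		return false, err
	}
	ipB, netB, err := net.ParseCIDR(b)
	if err != nil {
		return false, err
	}
	return netA.Contains(ipB) && netB.Contains(ipA), nil
}

func main() {
	// Addresses of br0, veth0, and veth1 from this article
	addrs := []string{"172.20.1.2/24", "172.20.1.3/24", "172.20.1.4/24"}
	for _, a := range addrs[1:] {
		ok, err := sameSubnet(addrs[0], a)
		if err != nil {
			panic(err)
		}
		fmt.Printf("%s and %s on the same subnet: %v\n", addrs[0], a, ok)
	}
}
```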

To test the ip address 172.20.1.4 (of the container) in the parent namespace, execute the following command in TA:

$ ping 172.20.1.4 -c3

The following would be a typical output:

Output.16

PING 172.20.1.4 (172.20.1.4) 56(84) bytes of data.
64 bytes from 172.20.1.4: icmp_seq=1 ttl=64 time=0.079 ms
64 bytes from 172.20.1.4: icmp_seq=2 ttl=64 time=0.038 ms
64 bytes from 172.20.1.4: icmp_seq=3 ttl=64 time=0.040 ms

--- 172.20.1.4 ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 2036ms
rtt min/avg/max/mdev = 0.038/0.052/0.079/0.019 ms

Similarly, to test the ip address 172.20.1.3 (of the host) in the new namespace, execute the following command in TB:

# ping 172.20.1.3 -c3

The following would be a typical output:

Output.17

PING 172.20.1.3 (172.20.1.3) 56(84) bytes of data.
64 bytes from 172.20.1.3: icmp_seq=1 ttl=64 time=0.072 ms
64 bytes from 172.20.1.3: icmp_seq=2 ttl=64 time=0.039 ms
64 bytes from 172.20.1.3: icmp_seq=3 ttl=64 time=0.044 ms

--- 172.20.1.3 ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 2044ms
rtt min/avg/max/mdev = 0.039/0.051/0.072/0.016 ms

YAY !!! We have successfully demonstrated a simple container by combining UTS, User, PID, Mount, and Network namespaces using the unshare command.

To clean up the bridge interface we created earlier, we need to first bring it down and then delete it.

To bring down the bridge interface br0 in the parent namespace, execute the following command in TA:

$ sudo ip link set br0 down

To delete the bridge interface br0 in the parent namespace, execute the following command in TA:

$ sudo brctl delbr br0

Next, we will mimic the above UTS, User, PID, Mount, and Network namespace isolation using the following Go program:

Listing.2
package main

import (
    "fmt"
    "github.com/vishvananda/netlink"
    "log"
    "net"
    "os"
    "os/exec"
    "syscall"
)

const (
    Bridge   = "br0"
    BridgeIp = "172.20.1.2/24"
    Lo       = "lo"
    Peer0    = "veth0"
    Peer0Ip  = "172.20.1.3/24"
    Peer1    = "veth1"
    Peer1Ip  = "172.20.1.4/24"
)

func createTxtFile() {
    f, err := os.Create("/tmp/leopard.txt")
    if err != nil {
        panic(err)
    }

    _, err = f.WriteString("leopard")
    if err != nil {
        panic(err)
    }

    _ = f.Close()
}

func checkBridge() (*netlink.Bridge, error) {
    la := netlink.NewLinkAttrs()
    la.Name = Bridge

    br := &netlink.Bridge{LinkAttrs: la}

    if _, err := net.InterfaceByName(Bridge); err != nil {
        return br, err
    }

    return br, nil
}

func setupBridge() error {
    br, err := checkBridge()
    if err != nil {
        log.Printf("Bridge %s does not exists ...\n", Bridge)
        log.Printf("Creating the Bridge %s ...\n", Bridge)

        if err = netlink.LinkAdd(br); err != nil {
            fmt.Println(err)
            return err
        }
    } else {
        log.Printf("Bridge %s already exists ...\n", Bridge)
    }

    addr, err := netlink.ParseAddr(BridgeIp)
    if err != nil {
        fmt.Println(err)
        return err
    }

    log.Printf("Attaching address %s to the Bridge %s ...\n", BridgeIp, Bridge)

    if err = netlink.AddrAdd(br, addr); err != nil {
        fmt.Println(err)
        return err
    }

    log.Printf("Activating the Bridge %s ...\n", Bridge)

    if err = netlink.LinkSetUp(br); err != nil {
        fmt.Println(err)
        return err
    }

    return nil
}

func deleteBridge() error {
    br, err := checkBridge()
    if err != nil {
        fmt.Println(err)
        return err
    }

    log.Printf("Deactivating the Bridge %s ...\n", Bridge)

    if err := netlink.LinkSetDown(br); err != nil {
        fmt.Println(err)
        return err
    }

    log.Printf("Deleting the Bridge %s ...\n", Bridge)

    if err := netlink.LinkDel(br); err != nil {
        fmt.Println(err)
        return err
    }

    return nil
}

func setupVethPeers() error {
    br, err := checkBridge()
    if err != nil {
        fmt.Println(err)
        return err
    }

    la := netlink.NewLinkAttrs()
    la.Name = Peer0
    la.MasterIndex = br.Attrs().Index

    log.Printf("Creating the pairs %s and %s ...\n", Peer0, Peer1)

    // ip link add veth0 type veth peer name veth1
    veth := &netlink.Veth{LinkAttrs: la, PeerName: Peer1}
    if err := netlink.LinkAdd(veth); err != nil {
        fmt.Println(err)
        return err
    }

    log.Printf("Link %s as master of %s ...\n", Bridge, Peer0)

    // ip link set veth0 master br0
    if err = netlink.LinkSetMaster(veth, br); err != nil {
        fmt.Println(err)
        return err
    }

    log.Printf("Activating the pairs %s & %s ...\n", Peer0, Peer1)

    if err = netlink.LinkSetUp(veth); err != nil {
        fmt.Println(err)
        return err
    }

    return nil
}

func namespaceVethPeer(pid int) error {
    log.Printf("Getting the link for pair %s ...\n", Peer1)

    veth1, err := netlink.LinkByName(Peer1)
    if err != nil {
        fmt.Println(err)
        return err
    }

    log.Printf("Namespacing the pair %s with pid %d ...\n", Peer1, pid)

    // ip link set veth1 netns $UPID
    if err := netlink.LinkSetNsPid(veth1, pid); err != nil {
        fmt.Println(err)
        return err
    }

    return nil
}

func activateLo() error {
    log.Printf("Getting the link for pair %s ...\n", Lo)

    loIf, err := netlink.LinkByName(Lo)
    if err != nil {
        fmt.Println(err)
        return err
    }

    log.Printf("Activating %s ...\n", Lo)

    // ip link set dev lo up
    if err = netlink.LinkSetUp(loIf); err != nil {
        fmt.Println(err)
        return err
    }

    return nil
}

func activateVethPair(name, ip string) error {
    log.Printf("Getting the link for pair %s ...\n", name)

    veth, err := netlink.LinkByName(name)
    if err != nil {
        fmt.Println(err)
        return err
    }

    addr, err := netlink.ParseAddr(ip)
    if err != nil {
        fmt.Println(err)
        return err
    }

    log.Printf("Attaching address %s to the pair %s ...\n", ip, name)

    // ip addr add ip dev vethX
    if err = netlink.AddrAdd(veth, addr); err != nil {
        fmt.Println(err)
        return err
    }

    log.Printf("Activating the pair %s ...\n", name)

    // ip link set dev vethX up
    if err = netlink.LinkSetUp(veth); err != nil {
        fmt.Println(err)
        return err
    }

    return nil
}

func execContainerShell() {
    log.Printf("Ready to exec container shell ...\n")

    if err := syscall.Sethostname([]byte("leopard")); err != nil {
        panic(err)
    }

    log.Printf("Chaning to /tmp directory ...\n")

    if err := os.Chdir("/tmp"); err != nil {
        panic(err)
    }

    log.Printf("Mounting / as private ...\n")

    mf := uintptr(syscall.MS_PRIVATE | syscall.MS_REC)
    if err := syscall.Mount("", "/", "", mf, ""); err != nil {
        panic(err)
    }

    log.Printf("Binding rootfs/ to rootfs/ ...\n")

    mf = uintptr(syscall.MS_BIND | syscall.MS_REC)
    if err := syscall.Mount("rootfs/", "rootfs/", "", mf, ""); err != nil {
        panic(err)
    }

    log.Printf("Pivot new root at rootfs/ ...\n")

    if err := syscall.PivotRoot("rootfs/", "rootfs/.old_root"); err != nil {
        panic(err)
    }

    log.Printf("Changing to / directory ...\n")

    if err := os.Chdir("/"); err != nil {
        panic(err)
    }

    log.Printf("Mounting /tmp as tmpfs ...\n")

    mf = uintptr(syscall.MS_NODEV)
    if err := syscall.Mount("tmpfs", "/tmp", "tmpfs", mf, ""); err != nil {
        panic(err)
    }

    log.Printf("Mounting /proc filesystem ...\n")

    mf = uintptr(syscall.MS_NODEV)
    if err := syscall.Mount("proc", "/proc", "proc", mf, ""); err != nil {
        panic(err)
    }

    createTxtFile()

    log.Printf("Mounting /.old_root as private ...\n")

    mf = uintptr(syscall.MS_PRIVATE | syscall.MS_REC)
    if err := syscall.Mount("", "/.old_root", "", mf, ""); err != nil {
        panic(err)
    }

    log.Printf("Unmount parent rootfs from /.old_root ...\n")

    if err := syscall.Unmount("/.old_root", syscall.MNT_DETACH); err != nil {
        panic(err)
    }

    if err := activateLo(); err != nil {
        panic(err)
    }

    if err := activateVethPair(Peer1, Peer1Ip); err != nil {
        panic(err)
    }

    const sh = "/bin/sh"

    env := os.Environ()
    env = append(env, "PS1=-> ")

    if err := syscall.Exec(sh, []string{""}, env); err != nil {
        panic(err)
    }
}

func main() {
    log.Printf("Starting process %s with args: %v\n", os.Args[0], os.Args)

    const clone = "CLONE"

    if len(os.Args) > 1 && os.Args[1] == clone {
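        // Clone: execContainerShell ends in syscall.Exec, which replaces this
        // process and does not return on success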
        execContainerShell()
    } else {
        // Parent
        if err := setupBridge(); err != nil {
            panic(err)
        }

        if err := setupVethPeers(); err != nil {
            panic(err)
        }

        if err := activateVethPair(Peer0, Peer0Ip); err != nil {
            panic(err)
        }
    }

    log.Printf("Ready to run command ...\n")

    cmd := exec.Command(os.Args[0], []string{clone}...)
    cmd.Stdin = os.Stdin
    cmd.Stdout = os.Stdout
    cmd.Stderr = os.Stderr
    cmd.SysProcAttr = &syscall.SysProcAttr{
        Cloneflags: syscall.CLONE_NEWUTS |
                    syscall.CLONE_NEWUSER |
                    syscall.CLONE_NEWNS |
                    syscall.CLONE_NEWPID |
                    syscall.CLONE_NEWNET,
        UidMappings: []syscall.SysProcIDMap{
            {ContainerID: 0, HostID: 0, Size: 1},
        },
        GidMappings: []syscall.SysProcIDMap{
            {ContainerID: 0, HostID: 0, Size: 1},
        },
    }

    if err := cmd.Start(); err != nil {
        panic(err)
    }

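    // Move veth1 into the network namespace of the child process; no explicit
    // synchronization is done, the child looks up veth1 only after its mount setup
    if err := namespaceVethPeer(cmd.Process.Pid); err != nil {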
        panic(err)
    }

    _ = cmd.Wait()

    _ = deleteBridge()
}

Create and change to the directory $GOPATH/network by executing the following commands in TB:

$ mkdir -p $GOPATH/network

$ cd $GOPATH/network

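Copy the above code into the program file main.go in the current directory. The program depends on the third-party package github.com/vishvananda/netlink (see References), which must be available before compiling; in a GOPATH-style setup it can be fetched with go get github.com/vishvananda/netlink.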

To compile the program file main.go, execute the following command in TB:

$ go build main.go

To run program main, execute the following command in TB:

$ sudo ./main

The following would be a typical output:

Output.18

2020/03/14 22:17:52 Starting process ./main with args: [./main]
2020/03/14 22:17:52 Bridge br0 does not exists ...
2020/03/14 22:17:52 Creating the Bridge br0 ...
2020/03/14 22:17:52 Attaching address 172.20.1.2/24 to the Bridge br0 ...
2020/03/14 22:17:52 Activating the Bridge br0 ...
2020/03/14 22:17:52 Creating the pairs veth0 and veth1 ...
2020/03/14 22:17:52 Link br0 as master of veth0 ...
2020/03/14 22:17:52 Activating the pairs veth0 & veth1 ...
2020/03/14 22:17:52 Getting the link for pair veth0 ...
2020/03/14 22:17:52 Attaching address 172.20.1.3/24 to the pair veth0 ...
2020/03/14 22:17:52 Activating the pair veth0 ...
2020/03/14 22:17:52 Ready to run command ...
2020/03/14 22:17:52 Getting the link for pair veth1 ...
2020/03/14 22:17:52 Namespacing the pair veth1 with pid 20367 ...
2020/03/14 22:17:52 Starting process ./main with args: [./main CLONE]
2020/03/14 22:17:52 Ready to exec container shell ...
2020/03/14 22:17:52 Chaning to /tmp directory ...
2020/03/14 22:17:52 Mounting / as private ...
2020/03/14 22:17:52 Binding rootfs/ to rootfs/ ...
2020/03/14 22:17:52 Pivot new root at rootfs/ ...
2020/03/14 22:17:52 Changing to / directory ...
2020/03/14 22:17:52 Mounting /tmp as tmpfs ...
2020/03/14 22:17:52 Mounting /proc filesystem ...
2020/03/14 22:17:52 Mounting /.old_root as private ...
2020/03/14 22:17:52 Unmount parent rootfs from /.old_root ...
2020/03/14 22:17:52 Getting the link for pair lo ...
2020/03/14 22:17:52 Activating lo ...
2020/03/14 22:17:52 Getting the link for pair veth1 ...
2020/03/14 22:17:52 Attaching address 172.20.1.4/24 to the pair veth1 ...
2020/03/14 22:17:52 Activating the pair veth1 ...
->

The command prompt will change to a ->.

To list all the network interfaces in the parent namespace, execute the following command in TA:

$ cat /proc/self/net/dev

The following would be a typical output:

Output.19

Inter-|   Receive                                                |  Transmit
 face |bytes    packets errs drop fifo frame compressed multicast|bytes    packets errs drop fifo colls carrier compressed
enp5s0:       0       0    0    0    0     0          0         0        0       0    0    0    0     0       0          0
docker0:       0       0    0    0    0     0          0         0        0       0    0    0    0     0       0          0
    lo:  471708    4702    0    0    0     0          0         0   471708    4702    0    0    0     0       0          0
 veth0:     936      12    0    0    0     0          0         0    27370     162    0    0    0     0       0          0
   br0:     768      12    0    0    0     0          0         0    17220     106    0    0    0     0       0          0

To list all the network interfaces in the new namespace, execute the following command in TB:

-> cat /proc/self/net/dev

The following would be a typical output:

Output.20

Inter-|   Receive                                                |  Transmit
 face |bytes    packets errs drop fifo frame compressed multicast|bytes    packets errs drop fifo colls carrier compressed
    lo:       0       0    0    0    0     0          0         0        0       0    0    0    0     0       0          0
 veth1:   20994     126    0    0    0     0          0         0      796      10    0    0    0     0       0          0

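Comparing Output.20 and Output.19, we see that the new namespace has only its own lo and veth1, while enp5s0, docker0, br0, and veth0 are visible only in the parent namespace.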
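
The comparison can also be done programmatically. The following is a minimal sketch that extracts the interface names from /proc/net/dev style text and reports the names found in only one of the two namespaces. The sample inputs keep the interface names from Output.19 and Output.20 with the counters abbreviated; the names interfaceNames and onlyIn are our own:

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// interfaceNames extracts the sorted interface names from /proc/net/dev
// style text. The two header lines contain no colon and are skipped.
func interfaceNames(text string) []string {
	var names []string
	for _, line := range strings.Split(text, "\n") {
		i := strings.Index(line, ":")
		if i < 0 {
			continue
		}
		names = append(names, strings.TrimSpace(line[:i]))
	}
	sort.Strings(names)
	return names
}

// onlyIn returns the names present in a but not in b.
func onlyIn(a, b []string) []string {
	seen := make(map[string]bool, len(b))
	for _, n := range b {
		seen[n] = true
	}
	var out []string
	for _, n := range a {
		if !seen[n] {
			out = append(out, n)
		}
	}
	return out
}

func main() {
	header := "Inter-|   Receive |  Transmit\n face |bytes packets|bytes packets\n"

	// Interface names from Output.19 and Output.20 (counters abbreviated)
	parent := header + "enp5s0: 0 0\ndocker0: 0 0\n    lo: 471708 4702\n veth0: 936 12\n   br0: 768 12\n"
	child := header + "    lo: 0 0\n veth1: 20994 126\n"

	p, c := interfaceNames(parent), interfaceNames(child)
	fmt.Println("only in parent:", onlyIn(p, c))
	fmt.Println("only in container:", onlyIn(c, p))
}
```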

To test the ip address 172.20.1.4 (of the container) in the parent namespace, execute the following command in TA:

$ ping 172.20.1.4 -c3

The following would be a typical output:

Output.21

PING 172.20.1.4 (172.20.1.4) 56(84) bytes of data.
64 bytes from 172.20.1.4: icmp_seq=1 ttl=64 time=0.101 ms
64 bytes from 172.20.1.4: icmp_seq=2 ttl=64 time=0.044 ms
64 bytes from 172.20.1.4: icmp_seq=3 ttl=64 time=0.052 ms

--- 172.20.1.4 ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 2041ms
rtt min/avg/max/mdev = 0.044/0.065/0.101/0.026 ms

Note that the new namespace is running a minimalistic Ubuntu Base image, which has no ping command, so we cannot check the connectivity back to the parent namespace from within the container.

To exit the simple container, execute the following command in TB:

-> exit

VOILA !!! We have successfully demonstrated a simple container by combining UTS, User, PID, Mount, and Network namespaces using a simple Go program.

References

Overview of Linux Mount Namespace

Overview of Linux Network Namespace

Go Package - Netlink

Building a Linux Container using Namespaces :: Part - 1


