What Linux bind mounts are really doing

Lots of Unixes have some form of 'loopback' mounts, where you can mount a bit of an existing filesystem somewhere else; they're called loopback mounts by analogy with the loopback network interface.

The general idea behind them is that they are a more efficient (and easier to use) version of doing an NFS mount from localhost.

Linux's bind mounts (so called because they are done with mount --bind, or by giving bind as a mount option in an /etc/fstab entry) look like any other sort of loopback mounting. However, they actually operate in a way quite different from the usual idea of loopback mounting, and the difference has some important consequences.

What bind mounts are really doing is more or less mounting the filesystem again with a different inode as the root inode. Thus, if you do:

mount /dev/md1 /foo
mount --bind /foo/bar /bar

what you really have is /dev/md1 mounted twice, once with the root inode of the filesystem on md1 as the root of the mount point, and once with the inode for 'bar' in the root of the filesystem on md1 as the root of the mount point.
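One way to see this from userspace is that /foo/bar and /bar (from the example above) should report the same device number and the same inode number, because they are the same directory in the same filesystem. The following is a minimal sketch assuming GNU coreutils stat (where %d is the device number and %i the inode number); same_object is a hypothetical helper name, not a standard command.

```shell
# same_object PATH1 PATH2: hypothetical helper that succeeds if both paths
# name the same inode on the same device (GNU stat: %d = device, %i = inode).
same_object() {
    [ "$(stat -c '%d:%i' -- "$1")" = "$(stat -c '%d:%i' -- "$2")" ]
}

# With the mounts from the example in place, this should print the message:
# same_object /foo/bar /bar && echo "same inode on the same filesystem"
```

Comparing device and inode together matters, since inode numbers are only unique within a single filesystem.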

The mount command makes this hard to see by being misleading in its output, reporting things like:

/data/home on /home type none (rw,bind)

Because df and similar tools read /etc/mtab, which mount maintains, they report things the same way. More of the real state of affairs is visible in /proc/mounts, where the kernel itself reports:

/dev/md5 /data ext3 rw,data=ordered 0 0
/dev/md5 /home ext3 rw,data=ordered 0 0
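Since the giveaway in /proc/mounts is the same device showing up more than once, a small filter can list such devices and where they are mounted. This is a sketch, not a complete bind-mount detector: shared_devices is a hypothetical name, and pseudo-filesystems whose 'device' field is something like tmpfs or none will also show up as repeated.

```shell
# shared_devices: hypothetical filter that reads /proc/mounts-format lines on
# stdin and prints each device that appears more than once, with its mount
# points. Field 1 is the device, field 2 the mount point (spaces in mount
# points are escaped as \040 by the kernel, so whitespace splitting is safe).
shared_devices() {
    awk '
        { mounts[$1] = mounts[$1] " " $2; count[$1]++ }
        END { for (dev in count) if (count[dev] > 1) print dev ":" mounts[dev] }
    '
}

# Usage on a live system:
# shared_devices < /proc/mounts
```

For the two lines above, this should print /dev/md5: /data /home. When several devices repeat, the order of the output lines is unspecified.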

Unfortunately the kernel doesn't report what root inode /home is mounted with, which generally makes mount's output more useful once you know what is really going on.
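Newer Linux kernels than the one described here also export /proc/self/mountinfo, whose fourth field is the root of the mount within its filesystem, which is the information /proc/mounts leaves out. The sketch below assumes that file format; mount_roots is a hypothetical helper name. The mount source is located by finding the "-" separator, because the optional fields before it vary in number.

```shell
# mount_roots: hypothetical filter that reads /proc/self/mountinfo-format lines
# on stdin and prints "source root mountpoint" for each mount.
# Field 4 = root of the mount within its filesystem, field 5 = mount point;
# after the lone "-" separator come the filesystem type and then the source.
mount_roots() {
    awk '{
        for (i = 7; i <= NF; i++) if ($i == "-") break
        print $(i + 2), $4, $5
    }'
}

# Usage on a live system:
# mount_roots < /proc/self/mountinfo
```

For the /data/home bind mount shown earlier, the /home entry should come out with /dev/md5 as its source and /home (the directory's path inside the filesystem on md5) as its root, while the /data entry should have a root of /.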

One consequence of this is that once you've set up your bind mounts, you can unmount the original mount point, something which I believe is not true of things like Solaris's loopback mounts (and which definitely wouldn't be true of NFS mounts from localhost). There might be a use for this in obscure situations.

Sidebar: Deeper under the hood

Disclaimer: I am not sure I understand this correctly.

Under the hood, there are two things: actual mounts of filesystems from devices (or the network), and namespace-based views of such filesystems. Rather than create new copies of both, bind mounts create new views ('mounts' or 'vfsmounts') of the same underlying mounted filesystem.

This explains one limitation of bind mounts, which is that you can't change mount flags when you do a bind mount (so you can't have a bind mount that is a read-only version of part of a read-write filesystem). Currently, all mount flags are associated with the filesystem, not with the view, so all views have to have the same mount flags.
