Thursday, September 07, 2006

Inodes

Much of this is drawn from the excellent Linux Tutorials page (http://www.linux-tutorial.info/modules.php?name=Tutorial&pageid=224), with a few other bits from Wikipedia.

Inodes are used in most Unix filesystems, although the implementations differ. Some filesystems (such as ReiserFS) do not keep a classic on-disk inode table, though they still present inode numbers to the rest of the system. In BSD, the kernel's filesystem abstraction layer (VFS) works with 'vnodes', the 'v' standing for 'virtual'. A vnode is the in-memory, filesystem-independent object; on a UFS filesystem each vnode is backed by an ordinary on-disk inode. Whether inode or vnode, they play the same sort of role. The term 'inode' dates from the earliest days of Unix; Unix pioneer Dennis Ritchie hazards a guess that it may once have stood for 'index node'.

The number of inodes on a disk is set when the filesystem is created (e.g., with mke2fs), and cannot be adjusted afterwards (at least for ext2). The inodes are stored in a table called the inode table. In ext2 the disk is divided into block groups, each with its own slice of the inode table near the start of the group. The superblock holds the total and free inode counts, and the block group descriptors that follow it record where each group's inode table starts. Backup copies of the superblock and group descriptors are kept in other block groups, but the inode tables themselves are not duplicated. One of the purposes of inodes is to store information about files, which in POSIX implementations includes:

* The length of the file in bytes.
* Device ID (this identifies the device containing the file).
* The User ID of the file's owner.
* The Group ID of the file.
* An inode number that identifies the file within the filesystem.
* The file mode, which determines which users can read, write, and execute the file.
* Timestamps recording when the inode itself was last changed (ctime), when the file content was last modified (mtime), and when the file was last accessed (atime).
* A reference count telling how many hard links point to the inode.
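
As a rough illustration, most of the fields listed above can be read from Python with os.stat, which wraps the stat system call. The helper name describe_inode and the dictionary keys are made up for this sketch; the st_* attributes are the standard ones.

```python
import os
import stat
import tempfile
import time


def describe_inode(path):
    """Return the inode metadata reported by stat(2) for path as a dict."""
    st = os.stat(path)
    return {
        "size_bytes": st.st_size,            # length of the file in bytes
        "device_id": st.st_dev,              # device containing the file
        "uid": st.st_uid,                    # owner's user ID
        "gid": st.st_gid,                    # group ID of the file
        "inode": st.st_ino,                  # inode number within the filesystem
        "mode": stat.filemode(st.st_mode),   # e.g. '-rw-------'
        "ctime": time.ctime(st.st_ctime),    # inode last changed
        "mtime": time.ctime(st.st_mtime),    # content last modified
        "atime": time.ctime(st.st_atime),    # last accessed
        "nlink": st.st_nlink,                # number of hard links
    }


if __name__ == "__main__":
    # Use a throwaway file so the example does not depend on any real path.
    with tempfile.NamedTemporaryFile() as tmp:
        tmp.write(b"hello")
        tmp.flush()
        for key, value in describe_inode(tmp.name).items():
            print(f"{key:>10}: {value}")
```

Note that the file name appears nowhere in the result; it is only the argument used to find the inode.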

The 'stat' system call retrieves a file's inode number and some of the information in the inode.

As users, we mostly deal with files by name. However, filenames just point to inodes, which are referred to by number. The inode number of a file can be shown in a variety of ways, including the '-i' switch to 'ls'. Inodes are not files themselves but fixed-size data structures (128 bytes each in the original ext2 layout). They do take up blocks on disk, however.

Inodes act as pointers to the actual data blocks that files occupy on the disk. They can do this either directly or indirectly. Each ext2 inode stores a total of 15 block references (presumably the number is kept low to keep things efficient and stop the inode getting too large). The first 12 references point directly at the first 12 data blocks of the file. Assuming a data block size of 4096 bytes, this covers a file size of 12 x 4096 bytes = 48k. If all 15 references were direct, the maximum file size would be only 60k, so ext2 uses a system of indirect referencing. The 13th reference points to a block that holds no file data, only further block addresses; at 4 bytes per address, a 4096-byte block holds 1024 of them (covering another 4MB). The 14th and 15th are double indirect and triple indirect references, which add one and two more levels of such address blocks respectively. With a 4k block size this scheme can address files of just over 4TB. In practice other limits apply first: a 32-bit size field can only count to 4GB (2GB if it is treated as signed), and this was the limit for ext2 on older 32-bit systems without large file support.
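
The arithmetic behind the direct and indirect references can be checked with a few lines of Python. This is only a sketch of the addressing scheme, assuming 12 direct references, 4-byte block addresses and a 4096-byte block size; the function name and its parameters are invented for the example and it does not read a real filesystem.

```python
def ext2_block_map_capacity(block_size=4096, pointer_size=4, direct=12):
    """Bytes addressable through each kind of block reference in one inode."""
    # How many block addresses fit into a single indirect block.
    per_block = block_size // pointer_size
    blocks = {
        "direct": direct,                  # references 1-12
        "single_indirect": per_block,      # reference 13
        "double_indirect": per_block ** 2, # reference 14
        "triple_indirect": per_block ** 3, # reference 15
    }
    return {level: count * block_size for level, count in blocks.items()}


if __name__ == "__main__":
    capacity = ext2_block_map_capacity()
    for level, size in capacity.items():
        print(f"{level:>16}: {size:>16,} bytes")
    print(f"{'total':>16}: {sum(capacity.values()):>16,} bytes")
```

With the defaults, the direct references cover 48k, the single indirect block 4MB, the double indirect level 4GB and the triple indirect level 4TB, so the total comes to a little over 4TB. Passing block_size=1024 shows how much smaller the limit is with 1k blocks.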

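Each file has exactly one inode, however many names point to it. There can also be inodes without files: free entries in the inode table that have not been allocated yet, or an inode whose last name has been removed while a process still holds the file open.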
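
One way to see an inode with no name attached is to remove a file's only directory entry while it is still open. The sketch below assumes a local POSIX filesystem (on Linux the link count then reads 0; network filesystems may behave differently), and the function name is made up for the example.

```python
import os
import tempfile


def unlinked_link_count():
    """Remove an open file's only name; return (link count, data read back)."""
    fd, path = tempfile.mkstemp()
    try:
        os.write(fd, b"still here")
        os.unlink(path)               # remove the only directory entry
        st = os.fstat(fd)             # the inode is still reachable via the descriptor
        os.lseek(fd, 0, os.SEEK_SET)  # rewind to read the contents back
        data = os.read(fd, 100)
        return st.st_nlink, data
    finally:
        os.close(fd)                  # only now can the inode and its blocks be freed


if __name__ == "__main__":
    nlink, data = unlinked_link_count()
    print("link count after unlink:", nlink)
    print("data still readable:", data)
```

The data stays readable because the blocks belong to the inode, not to the name; the filesystem releases them once the last open descriptor is closed.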

BSD (the UFS on-disk inode behind a vnode):
* the permissions of the file
* the file link count
* old user and group ids
* the inode number
* the size of the file in bytes
* the last time the file was accessed (atime)
* the last time the file was modified (mtime)
* the last inode change time for the file (ctime)
* direct disk blocks
* indirect disk blocks
* status flags (chflags)
* blocks actually held
* file generation number
* the owner of the file
* the group of the file

Notice that the name of the file is not part of the inode's metadata. Names are kept in directory entries, each of which maps a name to an inode number. The inode itself does not record what the file is called; the filesystem only needs to know which inode number is associated with that name.
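
A small Python experiment makes the separation between names and inodes visible. It assumes a filesystem that supports hard links; the file names and the function name are arbitrary choices for this sketch.

```python
import os
import tempfile


def names_for_one_inode():
    """Give one inode two names, rename one, and report what stat sees."""
    with tempfile.TemporaryDirectory() as d:
        first = os.path.join(d, "first.txt")
        second = os.path.join(d, "second.txt")
        renamed = os.path.join(d, "renamed.txt")

        with open(first, "w") as f:
            f.write("same data\n")
        original = os.stat(first)

        os.link(first, second)        # second directory entry, same inode
        linked = os.stat(second)

        os.rename(first, renamed)     # changes a directory entry, not the inode
        after_rename = os.stat(renamed)

        return {
            "same_inode_via_link": original.st_ino == linked.st_ino,
            "link_count": linked.st_nlink,
            "same_inode_after_rename": original.st_ino == after_rename.st_ino,
        }


if __name__ == "__main__":
    for key, value in names_for_one_inode().items():
        print(f"{key}: {value}")
```

Both names resolve to the same inode number, the link count rises to 2, and renaming leaves the inode number unchanged, which is what one would expect if names live only in directories.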
