How Does the Unix File System Work?

The Linux file system is a tree-like structure. All partitions live under the root directory, /. Directories that are only one level below the root are written with a leading slash (for example, /home) to indicate their position.
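As a small illustration (assuming a Unix-like system with Python available), you can list the directories that sit directly under the root:

```python
import os

# List the entries directly under the root directory "/".
# On a typical Linux system this includes bin, boot, dev, etc, home, usr, var.
top_level = sorted(os.listdir("/"))
print(top_level)
```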

The root file system is generally small, and should not change often, as changes to it can interfere with booting. The root directory itself usually does not hold critical files; instead, subdirectories are created for them. The files there are usually the ones installed with the Linux distribution itself. This makes it possible to update the system to a new version of the distribution, or even a completely new distribution, without having to reinstall all programs.

The /var directory is called var because the data in it is variable: it keeps changing. Note: systemd has now replaced init on most Linux distributions. It solves a few problems with init, and is overall more stable. A new child process is created by cloning the existing parent process, via the fork system call. This new child process calls exec to replace the cloned image with the program the child actually wants to run. When it is done, the child process calls exit to terminate itself.

exit only passes an exit code out. The parent process needs to call the wait system call to get access to this exit code. What if the parent dies before the child process? An orphan process is adopted by the init process (the special root parent), which then waits on the child process to finish.
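The fork/exec/exit/wait dance can be sketched in a few lines of Python, whose os module exposes these system calls directly (the /bin/echo path is an assumption about the system):

```python
import os

pid = os.fork()                    # clone the current process
if pid == 0:
    # Child: replace this process image with /bin/echo via exec.
    os.execv("/bin/echo", ["echo", "hello from the child"])
else:
    # Parent: wait for the child and collect its exit code.
    _, status = os.waitpid(pid, 0)
    print("child exited with", os.waitstatus_to_exitcode(status))
```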

How can the parent get access to more information from the child? The exit code alone can't carry much, but there are other ways to do inter-process communication. We will go into more detail about how things work with an example. Before that, we need a bit more information. Remember how the OS provides three open files to every running process? They are stdin (0), stdout (1), and stderr (2). The file redirection happens before the command runs. Tip: you can ensure none of your redirects clobber an existing file by setting the noclobber option in the shell.
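Here is a minimal sketch of what the shell does for `command > out.txt`: it points file descriptor 1 (stdout) at the file before the command runs (out.txt is just a scratch file for this demo):

```python
import os

# Sketch of shell output redirection: repoint fd 1 at a file.
fd = os.open("out.txt", os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
saved = os.dup(1)              # keep a copy of the original stdout
os.dup2(fd, 1)                 # fd 1 now refers to out.txt
os.write(1, b"redirected\n")   # this "print" lands in the file
os.dup2(saved, 1)              # restore the original stdout
os.close(fd)
os.close(saved)
```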

Read more about redirection. We can think of Unix like an onion. At the core is the hardware; one layer out is the kernel. The kernel is the core component, responsible for interaction with the file system and devices. It also handles process scheduling, task execution, memory management, and access control. The kernel exposes API calls for anything built on top to leverage; the most popular ones are exec, fork, and wait.

Another layer up are the Unix utilities. These are super helpful programs that help us interact with the kernel. They do this via system calls like exec and fork, which the kernel provides. The utilities include python, gcc, vi, sh, ls, cp, mv, cat, and awk. You can invoke most of them from the shell, or from another program; either way, they do the same thing.

Another utility that people find daunting is the text editor Vim. One more layer out is the shell, which covers the kernel in a protective … shell. Remember how the shell is a process? When run from the terminal, its stdin is connected to the keyboard input. What you write is passed to the shell via a file called the teletypewriter, or tty. You can find out which file your terminal is attached to via the tty command. Now you can do something funky: since the shell reads from this file, you can get another shell to write to this file too, or garble the two shells together.

Remember how to redirect files from the process section above? Try echoing ls, the command to list files, this time. Remember: only input coming in via stdin is passed as input to the shell; everything else is just displayed on the screen. The natural extension of the above, then, is that when you redirect stdin, the commands should run.

This is an undefined state, but on my Mac, one character went to one terminal, the next character went to the second, and this continued. It was funny, because to exit the new shell I had to type eexxiitt.

And then I lost both shells. We never specified the output stream, only the input stream. This happens because processes inherit open files from their parent process.

Every time you write a command in the terminal, the shell creates a duplicate process via fork. These descriptors reference the same underlying objects, so that, for instance, file pointers in file objects are shared between the child and the parent, so that an lseek(2) on a descriptor in the child process can affect a subsequent read or write by the parent.
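That sharing is easy to observe: after a fork, an lseek in the child moves the file offset the parent sees too (a minimal sketch; shared.txt is a scratch file created here):

```python
import os

f = open("shared.txt", "w+b")
f.write(b"0123456789")
f.flush()
os.lseek(f.fileno(), 0, os.SEEK_SET)

pid = os.fork()
if pid == 0:
    # Child: move the shared file offset to byte 5, then exit.
    os.lseek(f.fileno(), 5, os.SEEK_SET)
    os._exit(0)
else:
    os.waitpid(pid, 0)
    # Parent: this read starts at offset 5, which the child set.
    data = os.read(f.fileno(), 5)
```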

This descriptor copying is also used by the shell to establish standard input and output for newly created processes, as well as to set up pipes. Once forked, this new child process inherits the file descriptors from the parent, and then calls exec (specifically execve) to execute the command.

This replaces the process image. From man 2 execve: file descriptors open in the calling process image remain open in the new process image, except for those for which the close-on-exec flag is set. Thus, our file descriptors are the same as those of the original bash process, unless we change them via redirection. While this child process is executing, the parent waits for the child to finish. When this happens, control is returned to the parent process.

With ls, the process returns as soon as it has output the list of files to stdout. Note: not all commands on the shell result in a fork and exec; shell builtins, like cd, run inside the shell process itself. Have you ever thought how weird it is that while something is running and outputting stuff to the terminal, you can write your next commands and have them work as soon as the existing process finishes?

I used sleep 10; to demonstrate, because the other commands finish too quickly for me to type anything before control returns to the parent bash process. Now is a good time to try out the exec builtin command (it replaces the current process, so it will kill your shell session). Armed with the knowledge of how the shell works, we can venture into the world of the pipe.

Remember the philosophy we began with? Do one thing, and do it well. Now that all our utilities work well, how do we make them work together? This is where the pipe, |, pipes in.

It represents the pipe system call, and all it does is redirect stdin and stdout between processes. Since things have been designed so well, this otherwise complex function reduces to a single character. This is a bit of a simplification of the pipe redirection. You know how the shell works now, so you know that the top bash forks another bash connected to the tty, which produces the output of ls. You also know that the top bash was forked from the lower one, which is why it inherited the lower one's file descriptors.
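Here is a minimal sketch of the plumbing the shell sets up for the left side of a pipeline: the child's stdout is pointed at a pipe's write end before exec, and the reader consumes the other end (here the parent plays the role of the right-hand command):

```python
import os

r, w = os.pipe()                  # bytes written to w can be read from r

pid = os.fork()
if pid == 0:
    # Child (left side of the pipe): stdout -> write end, then exec ls.
    os.dup2(w, 1)
    os.close(r)
    os.close(w)
    os.execvp("ls", ["ls", "/"])
else:
    # Parent (right side of the pipe): read what the child wrote.
    os.close(w)
    chunks = []
    while True:
        chunk = os.read(r, 4096)
        if not chunk:
            break
        chunks.append(chunk)
    os.close(r)
    os.waitpid(pid, 0)
    listing = b"".join(chunks).decode()
```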

This pipeline figures out the largest file in the current directory and outputs its size. Who knew this was built into ls already. Notice how stderr is always routed directly to the tty? What if you wanted to redirect stderr instead of stdout to the pipe? You can switch the streams before the pipe.

A printer's menu, to take one example, is the interface of its firmware.

The mission of the firmware, among other things, is to boot up the computer, run the operating system, and pass it control of the whole system. Firmware can also run pre-OS environments (with network support), like recovery or diagnostic tools, or even a special shell to run text-based commands.

The first few screens you see before your operating system's logo appears are the output of your computer's firmware, verifying the health of hardware components and the memory. On MBR-partitioned disks, the first sector of the storage device contains the essential data to boot up the system. Once the system is powered on, the BIOS firmware starts, loads the content of the MBR into memory, and runs the boot loader inside it.

Having the boot loader and the partition table in a predefined location like the MBR enables the BIOS to boot up the system without having to deal with any files. That said, sophisticated boot loaders like GRUB 2 on Linux split their functionality into pieces, or stages. The gap between the MBR and the first partition can be used to place another piece of the boot loader, if needed; GRUB calls this piece stage 1.5, and it contains the file system drivers needed to read ordinary files. The second-stage boot loader, which is now file-system-aware, can load the operating system's boot loader file to boot up the operating system.

MBR supports at most four primary partitions. A common workaround is to make one of them an extended partition, which can hold many logical partitions inside it. GPT has no such limit on the number of partitions, and every partition can be the size of the biggest storage device available on the market (actually a lot more). The first sector of a GPT disk still holds an MBR-like structure, so that old MBR-only tools don't mistake the disk for being unpartitioned. This sector is called the Protective MBR.

This is where the first-stage boot loader would reside on an MBR-partitioned disk. As a backup, the GPT entries and the GPT header are also stored at the end of the storage device, so they can be recovered if the main copy gets corrupted.

This backup is called the Secondary GPT. On Linux, you can check for the /sys/firmware/efi directory; if this path cannot be found on your system, your firmware is probably BIOS-based. Once the EFI partition is found, the firmware looks for the configured boot loader, which is normally a file ending with .efi. NVRAM contains the boot settings, as well as the paths to the operating systems' boot loaders. You can use the parted command on Linux to see what partitioning scheme is used for a storage device. In the example output, the storage device is partitioned based on GPT, and has three partitions.

Formatting involves the creation of various data structures and metadata used to manage files within a partition. A file system is a set of data structures, interfaces, abstractions and APIs that work together to manage any type of file on any type of storage device, in a consistent manner.

Starting from Windows NT 3.1, the default file system on Windows is NTFS. So basically, if you have a removable disk you want to use on Windows, Mac, and Linux, you need to format it to exFAT. The Extended File System (ext) family of file systems was specifically created for the Linux kernel, the core of the Linux operating system.

The first version of ext was released in 1992, but it was soon replaced by the second extended file system (ext2) in 1993. In the 2000s, the third extended file system (ext3) and fourth extended file system (ext4) were developed for Linux, adding journaling capability. A file system is commonly described in terms of logical, virtual, and physical layers. Although these layers differ across operating systems, the concept is pretty much the same. The physical layer is the concrete implementation of a file system. It is responsible for data storage and retrieval, as well as space management on the storage device.

The physical file system interacts with the actual storage hardware, via device drivers. The virtual file system provides a consistent view of various file systems mounted on the same operating system. It's common for a removable storage medium to have a different file system than that of the computer.

For instance, when you open up your file explorer program, you can copy an image from an EXT4 file system, and paste it over to your exFAT-formatted flash memory - without having to know that files are managed differently under the hood. This convenient layer between the user you and the underlying file systems is provided by the VFS. A VFS defines a contract that all physical file systems must implement to be supported by the operating system. However, this compliance isn't built into the file system core, meaning the source code of a file system doesn't include support for every operating system.

A driver is a special program that enables one piece of software to communicate with another piece of software or with hardware. The VFS, on the other hand, provides a bridge between the logical layer (which programs interact with) and the set of physical file systems. The kernel then creates a virtual directory tree, and puts the content of each device under that directory tree as separate directories.

The act of assigning a directory to a storage device under the root directory tree is called mounting, and the assigned directory is called a mount point. As a result, on a Unix-like operating system, all partitions and removable storage devices appear as if they are directories under the root directory. In Unix-like systems, the metadata is stored in special data structures called inodes.
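On Linux (an assumption here; the /proc/mounts file is Linux-specific), you can see every mount point the VFS has assembled into the directory tree:

```python
# Each line of /proc/mounts is: device, mount point, fs type, options, ...
with open("/proc/mounts") as f:
    mount_points = [line.split()[1] for line in f]

print(mount_points[:5])
```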

Each file on the storage device has an inode, which contains information about the file, including the address of the blocks allocated to the file.

In an ext4 inode, the addresses of the allocated blocks are stored as a set of data structures called extents, within the inode. Each extent contains the address of the first data block allocated to the file, and the number of contiguous blocks that the file occupies. This is different from ext3's pointer system, which points to individual data blocks via indirect pointers.

Using an extent data structure enables the file system to point to large files without taking up too much space. Once the inode is fetched, the file system starts to compose the file from the data blocks whose addresses are stored in the inode.

You can use the df command with the -i parameter on Linux to see the inodes (total, used, and free) in your partitions. Additionally, to see the inodes associated with files in a directory, you can use the ls command with the -il parameters.
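From code, the same per-file inode metadata that ls -il shows is available via stat (a minimal sketch; demo.txt is just a scratch file created here):

```python
import os

with open("demo.txt", "w") as f:
    f.write("hello")

st = os.stat("demo.txt")        # reads the file's inode
print("inode number:", st.st_ino)
print("size in bytes:", st.st_size)
print("hard links:", st.st_nlink)
```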

The number of inodes on a partition is decided when the partition is formatted. So as long as there's free space and there are unused inodes, files can be stored on the storage device. It's unlikely that a personal Linux system would run out of inodes. However, enterprise services that deal with a large number of files (like mail servers) have to manage their inode quota smartly. NTFS takes a similar approach: every file has at least one entry in the Master File Table (MFT), which contains everything about the respective file, including its location on the storage device, similar to an inode.
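You can query a partition's inode totals programmatically too, via statvfs, which reports the same numbers df -i shows:

```python
import os

vfs = os.statvfs(".")
total_inodes = vfs.f_files    # inodes the file system was formatted with
free_inodes = vfs.f_ffree     # inodes still unused
print(total_inodes, free_inodes)
```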

On most operating systems, general file metadata can be accessed from the graphical user interface as well. For instance, when you right-click on a file on Mac OS and select Get Info (Properties in Windows), a window appears with information about the file.

A sector is the minimum storage unit of a storage device, and is between 512 bytes and 4096 bytes (Advanced Format) in size. However, file systems use a higher-level unit of storage, called blocks, where each block usually spans several contiguous sectors.

The contiguous blocks are in turn grouped into block groups for easier management. Ext4 file systems even take this one step further compared to ext3, and organise block groups into a bigger unit called flex block groups. The data structures of each block group, including the block bitmap, inode bitmap, and inode table, are concatenated and stored in the first block group within the respective flex block group.

Having all the data structures concatenated in one block group (the first one) frees up more contiguous data blocks in the other block groups within each flex block group. When a file is written to disk, it is written to one or more blocks within a certain block group.

Have you ever noticed that your file explorer displays two different sizes for each file: size, and size on disk? A block is the minimum space that can be allocated to a file, which means the remaining space of a partially-filled block cannot be used by another file. Since the size of a file is rarely an integer multiple of the block size, the last block might be partially used, and the remaining space stays unused (or is filled with zeros).
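You can observe the difference between the two sizes with stat: st_size is the byte count of the content, while st_blocks counts 512-byte units actually allocated on disk (a sketch; the exact on-disk figure depends on the file system and its block size):

```python
import os

with open("tiny.txt", "w") as f:
    f.write("a")                    # 1 byte of content

st = os.stat("tiny.txt")
size = st.st_size                   # "size": 1 byte
size_on_disk = st.st_blocks * 512   # "size on disk": whole blocks allocated
print(size, size_on_disk)
```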

Over time, new files are written to the disk, and existing files grow, shrink, or are deleted. File fragmentation occurs when a file is stored as fragments on the storage device, because the file system cannot find enough contiguous blocks to store the whole file in a row.

Now, if you add more content to myfile.txt, it needs more blocks. If the blocks adjacent to myfile.txt are already taken by another file, the new content has to be written somewhere else on the disk, and the file becomes fragmented. File fragmentation puts a burden on the file system, because every time a fragmented file is requested by a user program, the file system needs to collect the pieces of the file from various locations on the disk.

Fragmentation might also occur when a file is written to the disk for the first time, usually because the file is huge and not much contiguous space is left on the storage device.

Modern file systems use smart algorithms to avoid fragmentation, or to detect it early, as much as possible. Ext4 also does some preallocation, which involves reserving blocks for a file before they are actually needed, making sure the file won't get fragmented if it gets bigger over time. The number of preallocated blocks is defined in the length field of the file's extent in its inode object.

Ext4 also uses delayed allocation. The idea is that instead of allocating data blocks one at a time during a write, the allocation requests are accumulated in a buffer; finally, the allocation is done in one go and the data is written to the disk. Not having to call the block allocator on every write request helps the file system make better choices with distributing the available space, for instance by placing large files apart from small files. Imagine that a small file is located between two large files.
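This is not ext4's real allocator, but a toy sketch of the delayed-allocation idea: buffer the write requests, then allocate one contiguous run of blocks at flush time instead of allocating on every write:

```python
class DelayedAllocator:
    """Toy model: accumulate block requests, allocate contiguously on flush."""

    def __init__(self):
        self.pending = []       # buffered requests, in blocks
        self.next_free = 0      # next free block number on the "disk"

    def write(self, nblocks):
        self.pending.append(nblocks)   # no allocation happens yet

    def flush(self):
        total = sum(self.pending)      # one allocation for all buffered writes
        start = self.next_free
        self.next_free += total
        self.pending.clear()
        return list(range(start, start + total))  # contiguous run of blocks


alloc = DelayedAllocator()
alloc.write(2)
alloc.write(3)
blocks = alloc.flush()
```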

Now, if the small file is deleted, it leaves a small gap between the two large files. If big files and small files are kept in different areas of the storage device, deleting small files won't leave such gaps all over the storage device.

Spreading the files out in this manner leaves enough gaps between data blocks, which helps the filesystem manage and avoid fragmentation more easily.


