Storage Devices and Linux ch 8.1
Does not provide fault tolerance
A failure of one disk in the set means all data is lost
The file system is not managed by the kernel, but rather opened by the kernel
Because of this we can easily port or move the file system between machines, or make it available to resources such as containers or virtual machines.
One of the more common external storage methods is Network Attached Storage or NAS
As the name implies, the storage is connected to the network and is accessed using network protocols. Your system may access the data via the data LAN or have dedicated network cards that are connected to a storage network. Most often, NAS uses shares similar to an MS Windows or a Linux share and provides access to users. Another method is the Storage Area Network or SAN. A SAN provides all the connectivity, storage, and control to systems. Generally, a SAN provides external storage to servers, including diskless servers. There's a direct connection, usually fiber optic, from the server to the storage array.
Although the mount command is used by users occasionally, most file systems are mounted automatically at boot or mounted when they are added to the system
As with other operating systems, Linux can mount both locally connected and remotely connected devices
Linux builds software RAIDs with the multiple device administration tool, or mdadm
Let's say that we have five hard drives. Each drive will automatically be assigned a drive name, beginning with sda. To create a software RAID, Linux builds a new disk device that begins with md, which is short for multiple device. Then it gets assigned a device number, usually starting with zero. Now, if we want to use this RAID as a boot device for Linux, we need to set aside partitions on one of the drives to be used as the /boot directory.
In this lesson, we're going to look at how Linux builds and manages RAID
Linux typically creates software RAIDs, and these are installed on the Linux kernel and loaded by the boot loader. Let's take a closer look at how this works.
When adding storage to a local system, there may be a requirement to manage large data stores
Many systems have the ability to manage multiple storage devices, and Linux provides a method for this management called the Logical Volume Manager or LVM. Suppose you had a system with four physical storage devices. Via Linux, you create a partition on each device that spans the entire device, which becomes the physical volume. Via LVM management tools, you create a volume group or VG by combining the space available on the physical volumes into a pool of storage space. With the VG in place, you can now create logical volumes or LVs and format them with a Linux file system for general use. You simply configure an LV with the amount of space you wish to consume from the space available on the VG, define the Linux mount point, and format the LV with Btrfs, ext4, or any other available Linux file system. Once formatted, the LV is ready to use.
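To make the pooling idea concrete, here's a minimal sketch in Python. It's a toy model only, not an LVM interface: the VolumeGroup class, its method names, and the 500 GB device sizes are all hypothetical, and it ignores the metadata overhead and extent rounding that real LVM applies. On a real system, this work is typically done with the pvcreate, vgcreate, and lvcreate tools.

```python
class VolumeGroup:
    """Toy model of an LVM volume group: a pool built from physical volumes."""

    def __init__(self, pv_sizes_gb):
        # The pool size is simply the sum of the physical volume sizes.
        self.total_gb = sum(pv_sizes_gb)
        self.logical_volumes = {}

    @property
    def free_gb(self):
        # Space in the pool not yet consumed by any logical volume.
        return self.total_gb - sum(self.logical_volumes.values())

    def create_lv(self, name, size_gb):
        # A logical volume can only take space that is still free in the pool.
        if size_gb > self.free_gb:
            raise ValueError(f"only {self.free_gb} GB free in the volume group")
        self.logical_volumes[name] = size_gb


# Four hypothetical 500 GB devices, each used whole as a physical volume.
vg = VolumeGroup([500, 500, 500, 500])
vg.create_lv("data", 1200)   # larger than any single device
print(vg.total_gb, vg.free_gb)
```

Notice that the 1200 GB logical volume is larger than any one device. That's the point of the pool: an LV draws on the combined space rather than on a single disk.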
The system switches immediately from the failed disk to a functioning disk
Mirroring: Provides fault tolerance for a single disk failure
Depending on whom you talk to, you may get a different definition of RAID
Most will say RAID is an acronym for Redundant Array of Independent Disks. There are several different array types designated by a number. Each number designates a different type of array that performs a different function. Every RAID level provides a storage model that behaves differently than a single storage device would. Some add capacity, and others add redundancy. The following is a partial list of common RAID levels:
So far, we've discussed internal storage
Now, we'll discuss external storage. Simply put, external storage is managed storage available using networking and network protocols. It often has a higher capacity than is found in most internal systems and is available to devices connected to an internal network. The storage may be in the same room, building, or halfway around the world.
This section helps you prepare for the following certification exam objectives: Exam
Objective TestOut Linux Pro 2
RAID 0 is striping
The array's storage capacity is the sum of all storage devices in the array. This means there's no additional cost or storage loss in using this RAID level. Data is written and read across all drives in the array, making it the fastest array in RAID. The problem is that there's no redundancy, so if a single drive fails, the entire array fails. Backups are very important. RAID 1 is mirroring. This is a costly level since the array's storage capacity is halved. It requires two drives, but only provides the capacity of a single drive. This level provides redundancy. Data is written to both drives simultaneously, causing a small performance delay. This delay is often imperceptible, and reads often see a performance increase. The benefit is redundancy. RAID 1 often provides fault tolerance to a server's boot disk. Should one of the two drives fail, the other takes over without skipping a beat.
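The difference between striping and mirroring can be sketched in a few lines of Python. This is only an illustration of where the data lands, not how a real RAID driver is written. The stripe and mirror function names and the 4-byte chunk size are assumptions made for the example.

```python
def stripe(data: bytes, disks: int, chunk: int = 4) -> list:
    """RAID 0 style: deal fixed-size chunks round-robin across the disks."""
    layout = [[] for _ in range(disks)]
    for offset in range(0, len(data), chunk):
        disk_index = (offset // chunk) % disks
        layout[disk_index].append(data[offset:offset + chunk])
    return layout


def mirror(data: bytes, disks: int = 2) -> list:
    """RAID 1 style: every disk holds a complete copy of the data."""
    return [data for _ in range(disks)]


payload = b"ABCDEFGHIJKLMNOP"
striped = stripe(payload, disks=2)
mirrored = mirror(payload)

print(striped)    # each disk holds only half of the chunks
print(mirrored)   # each disk holds everything
```

With striping, losing either disk loses half of the chunks, so the payload can't be recovered. With mirroring, either disk alone still holds the full payload, but two disks only store one disk's worth of data.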
There's another method for connecting servers to a SAN
iSCSI provides another connectivity option using a standard Ethernet infrastructure. iSCSI sends commands to the SAN using Ethernet for transport rather than a direct fiber optic connection. The iSCSI initiator sends SCSI commands to the iSCSI target, and the target provides the requested data. Often a separate storage Ethernet network is created for iSCSI communications using higher-speed Ethernet devices such as 10 Gbps or faster. Linux can operate as an iSCSI target and iSCSI initiator. Depending on your distribution, you may have to add the iSCSI components.
Lastly, the fuse.ko kernel module is used to access the FUSE system, but doesn't contain the actual file system. Bugs in the VFS won't affect the kernel, which provides a stable system environment.
Depending on the configuration, a RAID array can improve performance, provide fault tolerance, or both
The following table describes common RAID levels
1 Manage storage devices Create and manage disk partitions CompTIA Linux+ XK0-005
1
Hard drives are large unallocated disks used for storing data
Alone they are not very useful. When using file systems to organize and save our data, the computer saves our files, folders, and configurations. Also, users can move and save files. File systems are very useful but come with a few issues.
—comes from the Unix world
An optical device that has a data DVD inserted by a user will typically be found under /run/media/<username>
These storage types provide the ability to write and read data
Another type of data storage is optical. Optical storage can be thought of as write-once-read-many or WORM. While there are media available that provide for erasing and reusing optical storage, WORM drives are much more common. DVD and BluRay movies are examples of optical media. Optical drives were once the primary method for transporting data. Now, thumb drives or USB drives are more popular since they're much less expensive, very easy to use, and are available in much higher capacities than optical.
FUSE gives users more control over file operations
It provides a way for non-privileged users to create and mount file systems, restrict access, and give services or programs full permissions to entire directories or files.
Copyright © 2023 TestOut Corp
Copyright © CompTIA, Inc. All rights reserved.
Object storage - The newest method for storing data, object storage makes data available to clients in its original form, usually accessed in the form of a URL
FUSE
The Filesystem in USErspace (FUSE) project was built as a way for regular, non-privileged users to create file systems without affecting the kernel
In this lesson, we're going to look at creating a virtual file system, or VFS, in the user space
File systems built in the user space are known as FUSE. Let's quickly review the importance of file systems.
Suppose you have three 8 TB drives in your system
For striping, we combine the space of all the drives, giving us a 24 TB capacity. We then have to add parity, which takes away from the total capacity. Each drive's capacity is reduced by a fraction equal to 1 divided by the number of drives in the array. In this example, we have three drives, so we must reduce the capacity by 1 divided by 3, or one-third. Another way to measure the lost capacity is to remove the capacity of a single drive from the array. Either calculation gives the total available capacity of the RAID 5 array: 24 TB minus 8 TB, or 16 TB.
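Both ways of working out the RAID 5 capacity can be checked with a short Python sketch. The function names are made up for this example, and the calculation assumes all drives are the same size.

```python
def raid5_usable_by_fraction(drive_count: int, drive_tb: float) -> float:
    """Raw capacity reduced by 1/n, where n is the number of drives."""
    if drive_count < 3:
        raise ValueError("RAID 5 needs at least three drives")
    raw_tb = drive_count * drive_tb
    return raw_tb - raw_tb / drive_count


def raid5_usable_minus_one_drive(drive_count: int, drive_tb: float) -> float:
    """Raw capacity with one whole drive's worth of space removed."""
    if drive_count < 3:
        raise ValueError("RAID 5 needs at least three drives")
    return (drive_count - 1) * drive_tb


# The example from the lesson: three 8 TB drives.
by_fraction = raid5_usable_by_fraction(3, 8)
minus_one = raid5_usable_minus_one_drive(3, 8)
print(by_fraction, minus_one)
```

The two methods always agree, because one-nth of the raw capacity of n equal drives is exactly one drive.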
Requires a minimum of two disks
Has no overhead because all disk space is available for storing data
So let's look at our first hard disk device, sda
It'll be split into three partitions. Each one will receive the device name, sda, followed by which partition number it is. First, we need to set aside 1 megabyte to be a BIOS grub spacer, and we need to create a partition to house the /boot directory. The remaining space will be used as a RAID component.
RAID 1 (mirroring): A mirrored volume stores data to two (or more) duplicate disks simultaneously
If one disk fails, data is present on another disk
Internal Storage
Inside most computer systems, there's some kind of internal storage with at least the system's operating system installed and configured. The first type is the magnetic or rotational drive. Magnetic hard drives were the first type of mass storage available for microcomputers and have remained in use ever since. While they're still used, they're being replaced with flash-based solid state drives, or SSDs. SSDs are more common today than magnetic drives due to their speed for storing and retrieving data. They're more expensive than magnetic storage and don't offer the capacity that magnetic drives do. However, they're still a better choice for most PCs and notebooks. Solid state storage also comes in a different form factor known as M.2, which is often paired with the faster Non-Volatile Memory Express, or NVMe, interface.
There are two basic categories for storage: internal and external.
Internal storage is inside the computer case. External means the storage is elsewhere, usually accessed via a network connection. Most systems have some sort of internal storage that covers local requirements, such as an operating system, applications, and local data. External storage may contain common applications and shared data. Internal storage includes magnetic, optical, and solid state devices. External storage may consist of the same types of storage devices. However, it's managed separately and usually has a much higher capacity than local storage, often in the hundreds of terabytes. This storage is accessed via networked devices such as SAN or NAS.
There are two primary network file systems that are used in Linux: Network File System (NFS) - NFS is a protocol used by servers and clients to share storage on a network
It comes from the Unix world and has been in use since 1984
One of the types of storage that is used but won't be covered in detail is Fibre Channel
It is used in high-speed storage environments such as Storage Area Networks (SANs), and the fcstat command is used to gather information about Fibre Channel configurations
Has overhead
Overhead is 1 / n where n is the number of disks
RAID 5 (striping with distributed parity): A RAID 5 volume combines disk striping across multiple disks with parity for data redundancy
Parity information is stored on each disk
If data is written twice, half of the disk space is used to store the second copy of the data
RAID 1 is the most expensive fault tolerant system
There are other RAID levels that combine the ones already discussed
RAID 1+0 or RAID 10 is a stripe of mirrors, and RAID 0+1 or RAID 01 is a mirror of stripes.
One of the most popular RAID levels is RAID 5
RAID 5 is striping with parity. Data is striped, just like RAID 0, across all of the drives in the array. The difference between RAID 0 and RAID 5 is parity. This means each drive reserves a portion of its capacity to store information about the other drives. Parity reduces overall capacity but adds redundancy. If a single drive in the array fails, the surviving drives use their parity to reconstruct its data.
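Here's a small Python sketch of how parity lets surviving drives stand in for a failed one. The lesson doesn't say how parity is calculated, so this example assumes the commonly used XOR approach. The xor_blocks helper and the sample byte values are made up for illustration, and a real array rotates parity across all of the drives rather than keeping it in one place.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte blocks together, one byte position at a time."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# Two data blocks that would sit on two different drives.
block_a = b"\x10\x20\x30"
block_b = b"\x0f\x0e\x0d"

# The parity block is stored on a third drive.
parity = xor_blocks([block_a, block_b])

# Suppose the drive holding block_b fails. XOR-ing what survives
# (block_a and the parity block) yields the missing block again.
rebuilt_b = xor_blocks([block_a, parity])
print(rebuilt_b == block_b)
```

This only works for one missing block per stripe, which is why a single-parity array like this can survive only one failed drive.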
Linux uses the multipath daemon called multipathd to manage the behavior of the data being written to the storage array(s) in such a way as to provide redundancy in the case of a failure along one path
RAID on Linux
Redundant Array of Independent Disks (RAID), also called Redundant Array of Inexpensive Disks, is a disk subsystem that combines multiple physical disks into a single logical storage unit
There are a few steps needed to establish connectivity to an iSCSI target
Remember, we're using Ethernet, so we have to define the device we're connecting to. The first step is ensuring you have the iSCSI initiator tools for your distribution. One example is shown here. After the tools are installed, consult the tools manual for the correct usage of the tools to connect to the iSCSI target. The method shown here is for a specific distribution. Your method may differ. Once you have the tools, you need to find your initiator's name. Next, we need to find the iSCSI-qualified name or IQN for the iSCSI target. We need to know its IP address and send it a query to find its name. Here we use the iSCSI administration tool to send a discovery of the send targets type to the IP address listed. With the IQN, we can now connect to the target. We need to look in the system messages to find the connected iSCSI device name. Now, we have to format the device. Once the device is formatted, it can be mounted to the local file system.
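The discovery and connection steps can be sketched as follows. The lesson doesn't name the tool, so this assumes the iscsiadm utility from the open-iscsi package, and your distribution may differ. The IP address and IQN below are hypothetical placeholders. The Python code only builds and prints the commands. It doesn't run them, since they need root privileges and a reachable target.

```python
import shlex

# Hypothetical values for illustration only.
TARGET_IP = "192.0.2.10"
TARGET_IQN = "iqn.2023-01.com.example:storage.target1"


def discovery_command(portal_ip):
    """Ask the target at portal_ip which IQNs it offers (sendtargets discovery)."""
    return ["iscsiadm", "--mode", "discovery",
            "--type", "sendtargets", "--portal", portal_ip]


def login_command(iqn, portal_ip):
    """Log in to a discovered target so it appears as a local block device."""
    return ["iscsiadm", "--mode", "node",
            "--targetname", iqn, "--portal", portal_ip, "--login"]


discover = discovery_command(TARGET_IP)
login = login_command(TARGET_IQN, TARGET_IP)

print(shlex.join(discover))
print(shlex.join(login))
```

After a successful login, the new disk shows up like any other block device, and it can then be formatted and mounted as described above.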
Does not increase performance
Requires a minimum of two disks
The RAID levels available to you are defined by the RAID controller in your system
Several vendors manufacture RAID controllers, and some are proprietary. Consult your RAID controller's implementation guide to find out which RAID levels are supported by your system. Additionally, Linux LVM provides the ability to utilize software to create RAID levels, such as mirroring or striping with parity.
Should a single drive fail, the others will utilize their parity to keep the array running
Should this happen, the array will be in a degraded state until the failed drive is replaced and the RAID rebuilds the drive.
So, to access a NAS device on my Linux system after the device has been mounted, I just use the cd command to change into the mount directory, and the files are visible there even though they are physically located on another system, perhaps even many miles away
Since this lesson is not about all of the storage types or the various ways of mounting or configuring the storage, we'll keep our focus on two of the most commonly used types of storage: FUSE and RAID
The primary use for FUSE is for creating virtual file systems for applications
Specifically, sandboxed applications such as AppImages use FUSE to create a restricted file system that is disconnected from the kernel
Summarize Linux fundamentals
Storage concepts: File storage; Block storage; Object storage; Partition type; FUSE; RAID (Striping, Mirroring, Parity)
Configure and manage storage using appropriate tools: Storage area network (SAN) / network-attached storage (NAS): multipathd; Network filesystems: Network File Systems (NFS), Server Message Block/Common Internet File System (CIFS)
Block storage
Storage used by Linux to store traditional data in blocks or chunks of space (also called a block device)
RAID Level: Description
RAID 0 (striping): A stripe set breaks data into units and stores the units across a series of disks by reading and writing to all disks simultaneously
Striping: Provides an increase in performance
Server Message Block (SMB)/Common Internet File System (CIFS) - These protocols describe how to share storage across the network, much like NFS
The core protocols are used by Microsoft Windows for storage sharing in a Windows environment, which Linux can participate in with some limitations
RAID 6 is similar to RAID 5 as it's striping with parity
The difference is that RAID 6 uses double parity and can withstand a loss of 2 drives from the array. The net capacity of the RAID 6 array is the total capacity minus the capacity of 2 drives.
FUSE stands for file system in user space
The idea is that we set aside portions of the file system in use by users to create a virtual file system, or VFS. Once this portion of the file system is set aside, we create a FUSE kernel module named fuse.ko and insert it into the kernel.
Globally Unique Identifier (GUID) Partition Table (GPT)
The successor to MBR partition tables, it provides much more storage capability and partition flexibility
Master Boot Record (MBR)
The traditional partition type used for storage devices
A USB storage device, such as a thumb drive, will also be found under /run/media/<username> when inserted by a user
There is a mount command that is used to do two primary things: list all file systems that are currently mounted and allow a privileged user to mount a file system on a storage device somewhere in the root file system tree
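As a companion to the listing side of mount, here's a hedged Python sketch that reads the same kind of information from text in the format of /proc/mounts, where each line starts with the device, the mount point, and the file system type. The parse_mounts function and the sample lines, including the user name alice, are made up for this example.

```python
def parse_mounts(text):
    """Return (device, mount_point, fs_type) for each line of mount table text."""
    entries = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        device, mount_point, fs_type = fields[:3]
        entries.append((device, mount_point, fs_type))
    return entries


# Hypothetical sample in /proc/mounts format.
sample = (
    "/dev/sda2 / ext4 rw,relatime 0 0\n"
    "/dev/sdb1 /run/media/alice/USB vfat rw,nosuid,nodev 0 0\n"
)

mounted = parse_mounts(sample)
for device, mount_point, fs_type in mounted:
    print(f"{device} on {mount_point} type {fs_type}")
```

On a live Linux system, you could pass in the contents of /proc/mounts instead of the sample text to see what's currently mounted.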
Storage Types
There are several different ways in which data is stored on a Linux system
These are described by the manner in which the data is organized on the devices: File storage - This method is used by services such as NFS and SMB/CIFS for storage of files over the network, although locally attached storage devices also can use this storage type
In order to use FUSE, you'll need three elements installed on your Linux system
These elements typically need to be installed by an Administrator.
First, typically, only users with administrative access are allowed to make changes to the file system and protected portions of the hard drive
They are also the only users that can mount and unmount different hard drives or partitions.
Partition 3, which is /dev/sda3 on Disk 1, and the remaining 4 disks—sdb, sdc, sdd, and sde—are the ones we'll use
They'll be marked as components of the new RAID, which is md0, and be used to create the new device, /dev/md0.
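To tie the device names together, here's a sketch of what the mdadm command for this layout might look like. It assumes a RAID 5 array, which this passage doesn't state, and it uses mdadm's --create, --level, and --raid-devices options. The Python code only assembles and prints the command. It doesn't run it, because creating an array needs root privileges and destroys existing data on the member devices.

```python
import shlex


def mdadm_create_command(array, level, members):
    """Build (but don't run) an mdadm command that creates a software RAID."""
    return [
        "mdadm", "--create", array,
        f"--level={level}",
        f"--raid-devices={len(members)}",
        *members,
    ]


# Partition 3 on the first disk plus the four remaining whole disks.
members = ["/dev/sda3", "/dev/sdb", "/dev/sdc", "/dev/sdd", "/dev/sde"]
command = mdadm_create_command("/dev/md0", 5, members)  # level 5 is an assumption

print(shlex.join(command))
```

The --raid-devices count has to match the number of member devices listed, which is why the sketch derives it from the list instead of hard-coding it.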
Last, file systems live in the kernel space for operating systems, meaning the OS is responsible for managing the file system
This can make debugging file system issues difficult. And while debugging, we have a greater chance of crashing the machine.
On Linux systems, all mounted storage devices are attached to the same file system somewhere below the / location
This idea of a single tree, instead of a number of trees as found under Windows such as C:\, D:\, etc., comes from the Unix world
The mdadm utility can be used to create most of the RAID types you'll need
This includes RAID 0, which splits the data across two or more hard drives, and RAID 1, which copies the data from one drive to another. There's also RAID 5, which stripes data across three or more drives while providing redundancy with parity. And RAID 10 combines RAID 1 and RAID 0 by striping data across mirrored pairs of drives.
Block storage - This is the oldest and most common type of storage, where data is placed in fixed length blocks of data
This is commonly used for hosting the operating system, applications and databases, and local data storage
Second, a user space library to interact with the FUSE VFS
This is typically one of the libfuse packages.
For example, the /home directory is where a standard user keeps their personal files
This standard also includes definitions where types of storage are generally located
The layout of the files and folders on Linux is, depending on the distribution, determined loosely or tightly by the Filesystem Hierarchy Standard (FHS)
This standard defines where files and folders are stored, based on their function
The process of attaching storage devices in Linux is called mounting
Thus, directly attached storage that is used as the root of the file system is found in the / directory, which is called the root directory
Here, we have a sample of how a new RAID 5 will look on an Ubuntu server
You can see the md0 device that was created and how each disk is marked as a component of the software RAID. You can also see our BIOS grub spacer, the /boot directory, and the partition that'll be used as a component of the RAID.
Multipathing Storage
One of the common ways redundancy for storage is created is using multipathing
Using multiple physical connections between a server and a storage array, such as a Storage Area Network (SAN) or Network Attached Storage (NAS), data can still be written to the target storage device when one of the paths becomes unavailable, such as after a hardware failure
Network File Systems
In addition to storage attached directly to a server, storage devices can be located on another host on the network that shares its storage space with other network hosts
Using network file systems, data can be written to the remote location as if attached locally, at least from the user's perspective
The remotely connected devices are usually some type of Storage Area Network device (SAN) or Network Attached Storage (NAS) device
Using networking protocols, these storage devices, usually managed by other systems, provide storage to the local system through the mount point in the root tree
That's all for this lesson
We learned about creating software RAIDs in Linux. We reviewed the mdadm utility and looked at how to configure a boot disk within a RAID. We also briefly reviewed common RAID types.
In this lesson, we talked about the file system in user space or FUSE
We looked at the requirements in order to run FUSE and the purpose of creating FUSE virtual file systems.
This lesson covers the following topics: Linux storage concepts, FUSE, RAID on Linux
Linux Storage Concepts
Linux is an operating system with roots in many historical computing environments
When discussing storage on Linux, we need to understand the ancestry of some of the concepts in order to make sense of how they are implemented on Linux
These isolated file systems leave kernel access to the FUSE kernel module, keeping applications from compromising system security even if they have vulnerabilities in them
While this approach is not always effective, it does provide another layer of security
Second, file systems can be large and complex to navigate
With larger hard drives available, file systems continue to grow.