Disk Storage
The spinning disk is inevitably going to
be the choke point for application performance in many server environments, as
information is exchanged between the server bus and magnetized areas on a
spinning platter. Typical storage deployments have changed radically over the
past 30 years, moving from consolidated storage deployments (mainframe
centric) to distributed computing with captive storage (the onset of open
systems computing) and back to a consolidated storage deployment model (open
systems with storage networking).
During the rapid proliferation of
distributed servers with captive storage, each node had its own directly
attached storage capacity, also known as direct-attached storage (DAS). With the
onset of storage area networks (SANs), which enable consolidation of storage
capacity onto a storage array device that can be logically provisioned and
shared through a networking fabric, many organizations find themselves
collapsing storage infrastructure once again to simplify deployment, control
costs, and improve utilization and efficiency.
DAS can be as simple as a single disk
within the server chassis or multiple disks within a dedicated unintelligent
external chassis (just a bunch of disks, or JBOD), or as complex as a directly
attached dedicated array using Redundant Array of Independent Disks (RAID)
technology. With DAS, storage is completely captive to the server it is
connected to, and the connection can use
a number of storage interconnects, including
Small Computer Systems Interface (SCSI), Serial Attached SCSI (SAS), serial
advanced technology attachment (SATA), or Fibre Channel.
SANs provide logically the same
function as DAS with the exception that the disk (physical or virtual) being
accessed is attached to a network and potentially behind a controller that is
managing disks that are accessible to other nodes on that network. With
SAN-attached storage, disk capacity appears to the server as DAS but is managed
as a provisioned amount of capacity on a shared array, accessible through a
high-speed network such as Fibre Channel.
Network-attached storage (NAS) is similar
to SAN in that storage capacity is made accessible via a network. NAS, however,
does not provide a server with a physical or logical disk to perform direct
operations against. With NAS, a file system is made accessible via the network,
and not a physical or logical disk. This means that information on the
shared file system must be accessed through a network file system protocol
such as CIFS or NFS.
Each of these storage
models has strengths and weaknesses. For instance, with DAS, the upfront
cost is minimal, but longer-term management costs are extremely high, especially
in dynamic environments where capacity may need to be repurposed. SAN, on the
other hand, has a very high upfront cost but far lower long-term costs in terms
of management and data protection. NAS generally has a lower initial cost than
SAN but is commonly more expensive than DAS. Because clients reach NAS capacity
through network file system protocols such as CIFS and NFS,
NAS is a poor fit for some application environments. For
instance, in database application environments where the server attempts to
leverage its own file system or make direct calls against a physical volume, NAS
may not be the best fit due to the abstraction of the file system protocol that
must be used to access the capacity.
So how does this impact application
performance? Simple. The slowest component in the path of an application is
typically the spinning disk on the client and the server. Even if the storage
subsystem is adequately sized and configured, application performance will still
depend on the speed of the rotating disk, but the way
data is accessed may be changed to improve performance. For instance, some
levels of RAID provide higher levels of data throughput than others, while some
merely provide an added level of data redundancy without providing much of an
improvement to overall throughput.
RAID implementations remain one of the
most popular options for in-server and direct-attached storage because of their
performance, price point, and simplicity to implement and support. In arrays
attached to SANs, RAID is commonly used behind the disk controller to provide
performance and high availability. Table
2-4 shows the characteristics of commonly used RAID levels.
Table 2-4. Commonly Used RAID Levels
| Hardware Disk Implementation | Pro | Con | Throughput | Storage Capacity | Performance Limitation |
|---|---|---|---|---|---|
| JBOD | 1:1 storage capacity, no wasted space | No redundancy, low performance | Low | Equal to sum of all disks | Without additional software configuration, operations are not spread across spindles as they would be with RAID. |
| RAID0 | Speed | No redundancy | Excellent | Equal to sum of all disks | Data is striped across all spindles, which provides very high levels of performance. |
| RAID1 | Full 1:1 redundancy | Limited capacity | Good | Equal to half of overall disk capacity | Data must be written to two disks at the same time to provide redundancy. |
| RAID5 | Speed and redundancy | Write penalty associated with parity calculation | Good | Equal to sum of all disks minus one disk | Parity information and data are both striped across all spindles, and each write operation requires parity calculation. |
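The Storage Capacity column of Table 2-4 can be expressed as simple arithmetic. The following is a minimal sketch, not a sizing tool. The function name `usable_capacity` is hypothetical, and two assumptions go beyond the table: RAID1 is modeled as a two-disk mirror, and when disks differ in size each RAID member is assumed to contribute only as much as the smallest disk.

```python
from typing import Sequence


def usable_capacity(level: str, disk_sizes_gb: Sequence[float]) -> float:
    """Return usable capacity in GB for the layouts listed in Table 2-4.

    Hypothetical helper for illustration only.
    """
    if not disk_sizes_gb:
        raise ValueError("at least one disk is required")

    # Accept spellings such as "RAID5", "RAID-5", and "raid 5".
    level = level.upper().replace("-", "").replace(" ", "")
    n = len(disk_sizes_gb)
    # Assumption: every RAID member is limited to the smallest disk's size.
    smallest = min(disk_sizes_gb)

    if level == "JBOD":
        # No striping or redundancy: every gigabyte is usable.
        return float(sum(disk_sizes_gb))
    if level == "RAID0":
        # Striping only: sum of all (equal-sized) members.
        return float(n * smallest)
    if level == "RAID1":
        # Assumption: a simple two-disk mirror, so half of the raw capacity.
        if n != 2:
            raise ValueError("this sketch models RAID1 as a two-disk mirror")
        return float(smallest)
    if level == "RAID5":
        # One disk's worth of capacity is consumed by parity.
        if n < 3:
            raise ValueError("RAID5 needs at least three disks")
        return float((n - 1) * smallest)
    raise ValueError(f"unsupported level: {level}")
```

For example, `usable_capacity("RAID5", [500, 500, 500, 500])` yields 1500 GB from 2000 GB of raw capacity, while the same four disks as RAID0 or JBOD yield the full 2000 GB.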
In many enterprise environments, RAID
levels are abstracted from servers because storage capacity is deployed in a
SAN. It is important to note, however, that some applications prefer to use
spindles that are configured for a certain RAID level. For instance, an
application that needs a volume with extremely high availability characteristics
would most likely prefer to use a RAID-1 protected volume. An application that
is performing a large number of reads from disk would prefer RAID-5, because
multiple spindles could be used concurrently for read operations. An application
that is constantly writing data may prefer RAID-0, because of its ability to
stripe data across spindles without performance penalty.
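To make the RAID-5 write penalty concrete, the sketch below illustrates parity with a bytewise XOR, which is the usual textbook description of RAID-5 parity. It is an illustration under simplifying assumptions, not a model of any particular controller: the helper name `xor_blocks` and the four-byte blocks are invented for the example, and parity rotation across spindles is ignored.

```python
from functools import reduce
from typing import Sequence


def xor_blocks(blocks: Sequence[bytes]) -> bytes:
    """XOR equal-length byte blocks together, byte by byte."""
    if not blocks:
        raise ValueError("at least one block is required")
    length = len(blocks[0])
    if any(len(b) != length for b in blocks):
        raise ValueError("all blocks must have the same length")
    # zip(*blocks) yields one tuple of byte values per byte position.
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# One stripe: three data blocks (one per spindle) plus a parity block.
data = [b"AAAA", b"BBBB", b"CCCC"]
parity = xor_blocks(data)

# If the spindle holding data[1] fails, XOR of the survivors rebuilds it.
rebuilt = xor_blocks([data[0], data[2], parity])

# A small write must also update parity: read the old data and old parity,
# then write the new data and new parity. These extra operations are the
# write penalty that RAID-0 avoids.
new_block = b"ZZZZ"
new_parity = xor_blocks([parity, data[0], new_block])
```

Reads need no parity work while all spindles are healthy, which is why a read-heavy workload can use the spindles concurrently without paying this cost.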
With some
subsystems, RAID levels can even be mixed to provide the best of both worlds.
For instance, RAID 10 provides mirroring across equal-sized stripe sets. In this
way, it provides 1:1 redundancy of the entire stripe set, and stripes read and
write data across the spindles within the stripe set.
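The layout described above can be sketched as a mapping from a logical block number to its two physical locations. This is a simplified illustration that assumes exactly two mirrored stripe sets and a stripe unit of one block. The function name `raid10_locations` and the tuple layout are hypothetical.

```python
from typing import List, Tuple


def raid10_locations(block: int, disks_per_stripe_set: int) -> List[Tuple[int, int, int]]:
    """Return (stripe_set, disk, offset) for both copies of a logical block.

    Assumes two mirrored stripe sets and a stripe unit of one block.
    """
    if block < 0:
        raise ValueError("block number must be non-negative")
    if disks_per_stripe_set < 1:
        raise ValueError("a stripe set needs at least one disk")

    # Striping: consecutive blocks rotate across the disks in a stripe set.
    disk = block % disks_per_stripe_set
    offset = block // disks_per_stripe_set

    # Mirroring: the same position is written in both stripe sets.
    return [(0, disk, offset), (1, disk, offset)]
```

With three disks per stripe set, logical block 5 lands on disk 2 at offset 1 in both stripe sets, so a read can be served by either copy while a write must reach both.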
When choosing a storage
interconnect, it is important to examine the performance characteristics. Many
servers deployed today are still using legacy SCSI technology that limits
maximum disk performance to 20 MBps or less. SAS and Fibre Channel are the more
commonly used storage interconnects available today, as described in the next
sections.