File System Considerations
A file system facilitates the
storage needs of an application, housing any data that the application or server
itself cannot hold in physical RAM. A file system is a creation of its parent,
the operating system. File systems maintain many unique characteristics,
including physical block size requirements, file and partition size limitations,
and, occasionally, specific configurations defined by a given application. When
installing an application onto a new server, the administrator must take the
application's needs into consideration; not all default file system values will
support optimal application performance.
Solaris
Solaris allocates disk space logically,
using consecutive blocks. Applications such as
Solaris Volume Manager can be used to configure and optimize a server's file
system. There are two common types of file system: a raw I/O file system and the
UNIX File System (UFS). Solaris 8 and later releases allow for database
applications to be run directly from the UFS partition, with very good
performance.
Solaris block sizes are 4 KB and 8 KB,
allowing for data to be written to blocks in smaller incremental sizes. For 4-KB
block sizes, data may be written in 512-B, 1-KB, 2-KB, and 4-KB fragment sizes.
For 8-KB block sizes, data may be written in 1-KB, 2-KB, 4-KB, and 8-KB fragment
sizes. To check the current block size configuration of a server, execute the
command given in Example 2-1 as the super user or administrator of the server.
Example 2-1. UNIX Block Size Command Output
# df -g
/ (/dev/dsk/c0t0d0s0 ): 8192 block size 1024 fragment size
/proc (/proc ): 512 block size 512 fragment size
/etc/mnttab (mnttab ): 512 block size 512 fragment size
/dev/fd (fd ): 1024 block size 1024 fragment size
/tmp (swap ): 8192 block size 8192 fragment size
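As a rough illustration, output in the format of Example 2-1 can be parsed so that block and fragment sizes are compared per mount point. The sketch below is Python; the function name `parse_df_g` and the `SAMPLE` text are assumptions made for illustration, and the pattern only covers lines shaped like those in the example.

```python
import re

# Matches one line shaped like those in Example 2-1, for instance:
#   "/ (/dev/dsk/c0t0d0s0 ): 8192 block size 1024 fragment size"
LINE_RE = re.compile(
    r"^(?P<mount>\S+)\s+\((?P<device>[^)]*?)\s*\):\s+"
    r"(?P<block>\d+)\s+block size\s+(?P<frag>\d+)\s+fragment size"
)


def parse_df_g(text):
    """Return {mount_point: (block_size, fragment_size)} for matching lines."""
    sizes = {}
    for line in text.splitlines():
        match = LINE_RE.match(line.strip())
        if match:
            sizes[match.group("mount")] = (
                int(match.group("block")),
                int(match.group("frag")),
            )
    return sizes


# Hypothetical sample copied from the shape of Example 2-1.
SAMPLE = """\
/ (/dev/dsk/c0t0d0s0 ): 8192 block size 1024 fragment size
/proc (/proc ): 512 block size 512 fragment size
/tmp (swap ): 8192 block size 8192 fragment size
"""

if __name__ == "__main__":
    for mount, (block, frag) in parse_df_g(SAMPLE).items():
        note = "matched" if block == frag else "fragment smaller than block"
        print(f"{mount}: block={block} fragment={frag} ({note})")
```

On a live server the text would come from running the command itself rather than from a hard-coded sample.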
Database
application vendors offer guidelines for configuring an application server's
file systems to operate optimally with their application. If the disk block
sizes are 4 KB, and the fragment sizes are configured to utilize 4 KB each, then
buffering and memory utilization will be improved, because the amount of data
that needs to be passed to disk is identical to the size of the block being
written. In this way, data does not need to be manipulated or otherwise shaped
to match the block size of the file system. The same rule applies to 8-KB blocks
and 8-KB fragment sizes, allowing for fewer read operations prior to writing
data to a specific block on the disk.
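The benefit of matching write size to block size can be sketched with a little arithmetic. The Python function below is a simplified model, not a description of how UFS is implemented: it counts how many allocation units a write only partly covers, on the assumption that each partly covered unit must be read before it can be rewritten. The name `partial_units` is hypothetical.

```python
def partial_units(offset, length, unit):
    """Count allocation units that a write covers only partially.

    In this simplified model, a partly covered unit must be read, merged
    with the new bytes, and written back; a fully covered unit can simply
    be overwritten.
    """
    if length <= 0:
        return 0
    end = offset + length
    first_unit = offset // unit
    last_unit = (end - 1) // unit
    partial = 0
    # The write starts in the middle of a unit.
    if offset % unit != 0:
        partial += 1
    # The write ends in the middle of a unit that was not already counted.
    if end % unit != 0 and (last_unit != first_unit or offset % unit == 0):
        partial += 1
    return partial


if __name__ == "__main__":
    # A 4-KB write into a 4-KB unit needs no read first.
    print(partial_units(0, 4096, 4096))
    # The same write into an 8-KB unit covers only half of it.
    print(partial_units(0, 4096, 8192))
```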
Using the forcedirectio option allows for data writing that spans several blocks
to occur as a single I/O event. This allows database applications to bundle
together events scheduled to be written to disk.
Caution
Although using the forcedirectio option does
allow writes to occur faster to disk, it does bypass caching options for the
file system. Some database applications are dependent on file system caching to
improve overall database application performance. Implementing the forcedirectio option should
not be done without consulting the application vendor or author.
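Before discussing the option with a vendor, it can help to know which file systems already use it. The Python sketch below scans mount-table text for a given option. It assumes a tab-separated layout of device, mount point, file system type, options, and time, which is how this sketch reads /etc/mnttab; the sample entries and the function name `mounts_with_option` are hypothetical.

```python
def mounts_with_option(mnttab_text, option):
    """Return mount points whose option field contains `option`.

    Assumes tab-separated fields in this order:
    device, mount point, file system type, options, time.
    """
    found = []
    for line in mnttab_text.splitlines():
        fields = line.split("\t")
        if len(fields) < 4:
            continue  # Skip lines that do not fit the assumed layout.
        if option in fields[3].split(","):
            found.append(fields[1])
    return found


# Hypothetical entries, for illustration only.
SAMPLE_MNTTAB = (
    "/dev/dsk/c0t0d0s0\t/\tufs\trw,logging\t1100000000\n"
    "/dev/dsk/c0t1d0s6\t/u01\tufs\trw,forcedirectio\t1100000000\n"
)

if __name__ == "__main__":
    print(mounts_with_option(SAMPLE_MNTTAB, "forcedirectio"))
```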
Microsoft Windows Server 2003
Microsoft Windows Server 2003 defaults to
4-KB allocation units within the NTFS file system. Several functions of the NTFS
are dependent on the 4-KB allocation unit, such as disk compression. Any other
size will limit the ability to enable compression, although compression may not improve
overall server performance on an application server.
Disk partitions of 2 GB to 2 TB
utilize the 4-KB allocation unit by default. To change the default allocation
unit size of Windows Server 2003, the Disk Management snap-in is required to
change a disk's partition properties. Once an application is installed on a
server and the partition housing the database is created, you should not change
the allocation unit size. Any time an administrator attempts to change the
allocation unit size, the disk partition must be reformatted. Reformatting will
eliminate all data within the altered partition. Windows Server 2003 supports
allocation unit sizes up to 64 KB in size.
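The constraints above can be captured in a small planning check that is run before a partition is formatted. This Python sketch makes two assumptions that go beyond the text: that allocation units are powers of two from 512 bytes to 64 KB, and that compression is available only for units of 4 KB or smaller. The names are hypothetical and nothing here calls a Windows API.

```python
KB = 1024

# Assumption: allocation units are powers of two from 512 bytes up to the
# 64-KB maximum stated in the text.
VALID_UNITS = [512 * 2**i for i in range(8)]

# Assumption: compression is available only up to the 4-KB default.
COMPRESSION_LIMIT = 4 * KB


def describe_allocation_unit(size_bytes):
    """Summarize the planning consequences of an allocation unit size."""
    if size_bytes not in VALID_UNITS:
        raise ValueError(f"unsupported allocation unit: {size_bytes} bytes")
    return {
        "size_bytes": size_bytes,
        "is_default": size_bytes == 4 * KB,
        "compression_available": size_bytes <= COMPRESSION_LIMIT,
    }


if __name__ == "__main__":
    for size in (4 * KB, 64 * KB):
        print(describe_allocation_unit(size))
```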
HP-UX
For HP-UX, there are three common file
systems, VxFS (Veritas File System), UFS (UNIX File System), and HFS
(Hierarchical File System). All three of these file systems have different
characteristics for their block size allocation parameters. For example, VxFS,
which has a default block allocation unit size of 1 KB, supports a maximum of 8
KB per allocation unit. HFS defaults to 8 KB per block, allowing for several
different options at the time the disk partition is created, ranging from 4 KB
to 64 KB in size.
It is common for database applications
running under an HP-UX operating system to leverage 8-KB block sizes for their
file systems. 8 KB is a recommended block size for applications such as Oracle
11i and SAP. 8-KB block sizes are ideal for large transactions on the server,
whereas smaller block sizes are most appropriate for database applications that
call for a significant amount of random access. In the case of Oracle and other
database applications, the application itself must be configured to match the
disk block size for optimal and efficient application performance.
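The trade-off between large and small blocks can be made concrete with a short calculation. The Python sketch below is a simplified model that assumes every request starts on a block boundary and that whole blocks are always transferred; the function names are hypothetical.

```python
def blocks_touched(request_bytes, block_size):
    """Number of whole blocks needed to cover an aligned request."""
    return -(-request_bytes // block_size)  # Integer ceiling division.


def bytes_transferred(request_bytes, block_size):
    """Bytes moved when whole blocks are always transferred."""
    return blocks_touched(request_bytes, block_size) * block_size


if __name__ == "__main__":
    one_mb = 1024 * 1024
    # Large sequential transfer: fewer block operations with 8-KB blocks.
    print(blocks_touched(one_mb, 8192), blocks_touched(one_mb, 1024))
    # Small random read of 512 bytes: less excess data with 1-KB blocks.
    print(bytes_transferred(512, 8192), bytes_transferred(512, 1024))
```

Under these assumptions, a 1-MB transfer needs 128 operations with 8-KB blocks and 1024 with 1-KB blocks, while a 512-byte random read moves 8 KB in the first case and only 1 KB in the second.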
Many view a 64-KB block size
as the optimal disk allocation size for both HFS and JFS, with HFS supporting
8-KB fragment sizes, and JFS having no fragment support. These sizes may not be
optimal for the application's database on the server, so the application
vendor or author should be consulted to validate which file system format and block
size to implement at the time of the server installation.
Linux
Red Hat Linux supports three
commonly deployed file systems: ext2, ext3, and ReiserFS.
Each file system has characteristics
that are critical to database application performance. Areas such as block size,
maximum file size supported, journaling, and maximum supported partition size
all become factors when selecting the proper Linux file system.
The second extended file system (ext2) is
considered the benchmark of Linux file systems. Although ext2 is supported by
many database applications, ext2 is not considered to be suitable by some
vendors due to a lack of journaling. Journaling is a process in which a segment
of the actual disk is used to store a temporary copy of the data prior to being
written to the actual data partition specific to the disk. In the event of a
power or system failure, any pending disk operations are read from the journaling section of
the disk, allowing the system to recover after power is restored to the
server.
ext2 supports a maximum partition size
of 4 TB, allowing for a maximum file size within the 4-TB partition of 2 TB. The
ext2 file system also supports block sizes ranging from 1 KB to 4 KB. The ext2
file system is considered to be more efficient in terms of disk utilization than
ext3, nearly eliminating the need for the use of any defragmentation tools.
The third extended file system (ext3)
shares many commonalities with the ext2 file system. The primary differences can
be classified as the support of journaling, as well as a method to change the
file system and partition sizes of existing partitions. Journaling has been
added to ext3, using a section of disk that is separate from the general data
storage section of the disk.
ext3 supports three different levels of journaling:
- Journal: Writes both the metadata and the file content to the journaling space before writing them to the usable data space of the disk
- Writeback: Differs from journal in that only the metadata is written to the journaling space of the disk
- Ordered: The ext3 default; writes the data to the disk before writing the metadata to the journaling segment of the disk
Each level of journaling
provides the application server increased storage reliability. Although a
journaling file system may be a requirement, journaling does have a minor
impact on the performance of the storage, because some data must be written to
the disk twice. ext3 file systems do not require utilities to defragment storage
to optimize disk block usage.
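The cost of writing some data twice can be estimated for each journaling level. The Python sketch below is a simplified accounting model built from the descriptions above, not a measurement of ext3 itself: it assumes journaled bytes are written once to the journal and once to their final location. The names `JOURNALED` and `bytes_written` are hypothetical.

```python
# What each journaling level writes to the journal, per the list above.
JOURNALED = {
    "journal": {"data", "metadata"},
    "ordered": {"metadata"},
    "writeback": {"metadata"},
}


def bytes_written(mode, data_bytes, metadata_bytes):
    """Total bytes written to disk for one update in this simplified model."""
    journaled = JOURNALED[mode]
    to_journal = 0
    if "data" in journaled:
        to_journal += data_bytes
    if "metadata" in journaled:
        to_journal += metadata_bytes
    # Everything is eventually written to its final location as well.
    to_final_location = data_bytes + metadata_bytes
    return to_journal + to_final_location


if __name__ == "__main__":
    for mode in ("journal", "ordered", "writeback"):
        print(mode, bytes_written(mode, data_bytes=4096, metadata_bytes=256))
```

In this model, ordered and writeback write the same number of bytes; they differ only in the order in which data and metadata reach the disk.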
ReiserFS is the latest file system
supported by Red Hat Linux. ReiserFS is a journaling file system, much like ext3
but supports additional features. The ReiserFS file system supports a maximum
partition size of 16 TB and a maximum file size of 8 TB per file. For block
sizes, the ReiserFS file system supports only a 4-KB block size, which is
optimal for applications that leverage files of 4 KB or smaller. Some database
applications do not support the ReiserFS file system, which may be a concern
when large file support must exceed the 2-TB limit of the ext2 and ext3 file
systems.
The differences in file system
limitations across several operating system vendors are illustrated in Table 2-5.
Table 2-5. File System Differences
| File System | Maximum File Size | Maximum Volume Size | Block Journaling | Meta-Data Journaling |
| ext2 (1-KB block) | 16 GB | 2048 GB | No | No |
| ext2 (2-KB block) | 256 GB | 2048 GB | No | No |
| ext2 (4-KB block) | 2048 GB | 2048 GB | No | No |
| ext2 (8-KB block) | 2048 GB | 2048 GB | No | No |
| ext3 (1-KB block) | 16 GB | 2048 GB | Yes | Yes |
| ext3 (2-KB block) | 256 GB | 2048 GB | Yes | Yes |
| ext3 (4-KB block) | 2048 GB | 2048 GB | Yes | Yes |
| ext3 (8-KB block) | 2048 GB | 2048 GB | Yes | Yes |
| ReiserFS 3.6 | 17 TB | 17 TB | Yes | Yes |
| NTFS (4-KB block) | 16 TB | 16 TB | No | Yes |
| NTFS (64-KB block) | 16 TB | 256 TB | No | Yes |
| NTFS (dynamic volume) | 16 TB | 64 TB | No | Yes |
| JFS (512-B block) | 8 EB | 512 TB | Yes | No |
| JFS (4-KB block) | 8 EB | 4 PB | Yes | No |
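The ext2 and ext3 maximum file sizes in Table 2-5 can be reproduced with a short calculation. The Python sketch below assumes the classic inode layout, which the text does not describe: 12 direct block pointers plus single, double, and triple indirect blocks holding 4-byte pointers, with an overall cap of 2^32 sectors of 512 bytes each. The function name is hypothetical.

```python
GB = 2**30


def ext2_max_file_bytes(block_size):
    """Estimate the largest ext2/ext3 file for a given block size.

    Assumes 12 direct pointers plus single, double, and triple indirect
    blocks of 4-byte pointers, capped by a 32-bit count of 512-byte sectors.
    """
    pointers_per_block = block_size // 4
    addressable_blocks = (
        12
        + pointers_per_block
        + pointers_per_block**2
        + pointers_per_block**3
    )
    limit_by_pointers = addressable_blocks * block_size
    limit_by_sector_count = 2**32 * 512
    return min(limit_by_pointers, limit_by_sector_count)


if __name__ == "__main__":
    for block_size in (1024, 2048, 4096, 8192):
        print(block_size, ext2_max_file_bytes(block_size) // GB, "GB")
```

Under these assumptions the results round down to 16 GB, 256 GB, 2048 GB, and 2048 GB, matching the table rows for 1-KB, 2-KB, 4-KB, and 8-KB blocks.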
The Reiser4 file
system, which has not yet been merged into the Linux kernel, has added
some unique enhancements to the Linux file system. Disk block sharing allows a
block that may not have been completely filled to share space with other files.
Block sharing will allow for significantly better disk space utilization.
Reiser4 also introduces a new concept of journaling, termed the wandering log. Wandering logs
change the method of the initial write of file data during the journaling
process, allowing data to be written anywhere in the data portion of the disk.
If data is written throughout the disk, data defragmentation may be required to
optimize the Reiser4 file system. The Reiser4 file system is too new to be
supported by major application vendors and will require support of the Linux
kernel prior to becoming supported by application vendors.
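The space saving from block sharing can be approximated with a simple count. The Python sketch below is an idealized model, not Reiser4's actual on-disk layout: it assumes the partial final blocks of all files can be packed together perfectly and ignores metadata. The function names and file sizes are hypothetical.

```python
def blocks_without_sharing(file_sizes, block_size):
    """Each file rounds up to whole blocks of its own."""
    return sum(-(-size // block_size) for size in file_sizes)


def blocks_with_sharing(file_sizes, block_size):
    """Full blocks stay per file; partial tails are packed into shared blocks."""
    full_blocks = sum(size // block_size for size in file_sizes)
    tail_bytes = sum(size % block_size for size in file_sizes)
    return full_blocks + -(-tail_bytes // block_size)


if __name__ == "__main__":
    sizes = [100, 5000, 300]  # Hypothetical file sizes in bytes.
    print(blocks_without_sharing(sizes, 4096))
    print(blocks_with_sharing(sizes, 4096))
```

For these three files and 4-KB blocks, the model uses four blocks without sharing and two with it.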
Some database applications require
an alternative to traditional disk partitions that use a formatted file system.
These applications use raw I/O disks or raw partitions. Bypassing the
file system layer, raw I/O is written directly to a specific disk,
array, or host. A raw I/O partition does not allow for the same
type of disk administration as a traditional operating system managed partition,
and is commonly managed by the database application. Raw I/O partitions are
commonly faster than operating system formatted partitions, because there is
less operating system overhead managing the raw partitions or disks. Raw I/O
partitions are typically specified as a requirement by the application, and not
the operating system. Management of the data that resides in the raw I/O
partition is commonly a function of the database application.