TCP Optimization
TCP is particularly challenged in WAN
environments due to the connection-oriented, guaranteed-delivery behavior of the
protocol. Furthermore, TCP generally has only a limited amount of memory
capacity assigned to each connection, meaning only a small amount of data can be
in flight at a given time. Many of the limitations of TCP are self-imposed by
the nodes that are exchanging data; that is, off-the-shelf operating systems
have limited TCP stacks that do not help facilitate high-performance
transmission of data over a WAN.
Figure 3-15 shows how TCP can be latency sensitive and inefficient in
terms of retransmission when packet loss is encountered. As the figure shows,
TCP ramps up throughput exponentially from a small initial window. In
environments with high latency, it can take a considerable length of time before
a large amount of data can be transmitted per network round trip. When the loss
of a packet is detected, TCP can be forced to retransmit the entire set of data
contained within the window that experienced the loss, leading to inefficient
network utilization and poor performance.
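The ramp-up behavior described above can be approximated with a short
calculation. The following Python sketch is illustrative only. It assumes an
idealized ramp-up that starts at one segment and doubles every round trip, with
no loss and no receive-window limit. The function names, the 1460-byte segment
size, and the starting window are assumptions chosen for the example, not values
taken from the figure.

```python
def round_trips_to_send(total_bytes, mss=1460, initial_segments=1):
    """Count round trips needed under an idealized exponential ramp-up.

    Assumes the window doubles every round trip, with no packet loss
    and no cap from the receiver's advertised window.
    """
    window_segments = initial_segments
    bytes_sent = 0
    round_trips = 0
    while bytes_sent < total_bytes:
        bytes_sent += window_segments * mss  # one window per round trip
        window_segments *= 2                 # exponential growth
        round_trips += 1
    return round_trips


def ramp_up_time_seconds(total_bytes, rtt_seconds, **kwargs):
    """Time spent waiting on round trips alone, ignoring serialization delay."""
    return round_trips_to_send(total_bytes, **kwargs) * rtt_seconds
```

Under these assumptions, a 1,000,000-byte transfer needs 10 round trips, so on
a WAN with 100 ms of round-trip latency the ramp-up alone costs about one
second, regardless of how much bandwidth is available.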
TCP optimization capabilities help
TCP applications better utilize the existing network by overcoming limitations
imposed by network latency, bandwidth, packet loss, and the TCP stack itself.
Many of these services are implemented as part of a TCP proxy, which allows the
accelerator to temporarily buffer data on behalf of clients and servers so that
high throughput and reliability are achieved over the WAN.
TCP optimization commonly consists of the following
optimizations:
- Virtual window scaling: Allows the WAN bandwidth to be more effectively
  utilized by the connection by increasing the window size. This allows a much
  larger amount of data to be outstanding and unacknowledged in the network at
  any given time.
- Loss mitigation: Enables more intelligent retransmission and
  error-correction algorithms to ensure that the impact of packet loss is
  minimized.
- Advanced congestion avoidance: Changes the behavior of transport
  protocols to enable better packet-loss recovery handling and bandwidth
  scalability.
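The value of a larger window can be seen with some simple arithmetic. The
sketch below is a hedged illustration, not a description of how any particular
accelerator implements virtual window scaling. It uses the standard TCP
relationships: a connection can move at most one window of data per round trip,
the unscaled TCP window field is limited to 65,535 bytes, and the TCP window
scale option allows a shift of up to 14 bits. The function names are
hypothetical.

```python
MAX_UNSCALED_WINDOW = 65535  # largest value of the 16-bit TCP window field
MAX_WINDOW_SHIFT = 14        # largest shift allowed by the window scale option


def bandwidth_delay_product_bytes(bandwidth_bps, rtt_ms):
    """Bytes that must be in flight to keep a link of this size full."""
    return bandwidth_bps * rtt_ms // 8000


def max_throughput_bps(window_bytes, rtt_ms):
    """Upper bound on throughput when one window is sent per round trip."""
    return window_bytes * 8 * 1000 / rtt_ms


def window_scale_shift(required_window_bytes):
    """Smallest shift whose scaled window covers the required size."""
    shift = 0
    while (MAX_UNSCALED_WINDOW << shift) < required_window_bytes \
            and shift < MAX_WINDOW_SHIFT:
        shift += 1
    return shift
```

For a 45-Mbps link with 100 ms of round-trip latency, about 562,500 bytes must
be in flight to fill the link, whereas an unscaled 65,535-byte window caps a
single connection at roughly 5.2 Mbps.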
Figure 3-16 illustrates an accelerator architecture where a TCP proxy is
implemented. Notice that each of the accelerator devices terminates the adjacent
TCP connection. In this way, the accelerator provides localized handling of TCP
data with the adjacent client or server. By buffering TCP data locally and
managing optimized connections between accelerator peers, unruly WAN conditions
can be handled by the accelerator on behalf of the adjacent client or server,
thereby shielding them.
In contrast, Figure 3-17 illustrates an accelerator architecture that does
not use a TCP proxy. As shown in the figure, the accelerator is unable to
locally buffer TCP data and manage WAN events such as packet loss on behalf of
the adjacent client or server. As a result, the client and server do not
experience the LAN-like TCP behavior that they would experience with
accelerators that implement a TCP proxy.
Data Suppression
Data
suppression is a function of WAN optimization that allows accelerators to
eliminate the transfer of redundant data across the network, thereby providing
significant levels of throughput and bandwidth savings. Data suppression is a
means by which accelerator devices can keep a repository of previously seen
patterns of data. When a redundant pattern of data is identified, the redundant
pattern can be replaced by a unique identifier. This unique identifier is a
representation of the original pattern of data and references a block of data
found in the distant accelerator's memory or disk repository. This unique
identifier, when seen by the distant accelerator, is used as an instruction to
locate the original block of data, which is subsequently added to the message in
place of the unique identifier that was received. In this way, a unique
identifier that is very small in size can be used to replace an arbitrarily
large amount of data in flight.
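The exchange described above can be sketched in a few lines of Python. This is
a minimal illustration under stated assumptions, not the algorithm used by any
specific accelerator: it splits data into fixed-size chunks and uses a truncated
SHA-256 digest as the unique identifier. The chunk size, identifier length, and
function names are all hypothetical, and production implementations typically
choose chunk boundaries and identifiers quite differently.

```python
import hashlib

CHUNK_SIZE = 256  # hypothetical fixed chunk size, in bytes
ID_BYTES = 16     # hypothetical identifier length, in bytes


def _identifier(chunk):
    """Derive a short identifier from the chunk contents."""
    return hashlib.sha256(chunk).digest()[:ID_BYTES]


def encode(data, codebook):
    """Sending side: replace previously seen chunks with identifiers."""
    tokens = []
    for start in range(0, len(data), CHUNK_SIZE):
        chunk = data[start:start + CHUNK_SIZE]
        ident = _identifier(chunk)
        if ident in codebook:
            tokens.append(("ref", ident))   # redundant: send identifier only
        else:
            codebook[ident] = chunk         # first sighting: learn the chunk
            tokens.append(("raw", chunk))
    return tokens


def decode(tokens, codebook):
    """Receiving side: swap identifiers back for the original chunks."""
    out = bytearray()
    for kind, value in tokens:
        if kind == "ref":
            out += codebook[value]
        else:
            codebook[_identifier(value)] = value  # learn for next time
            out += value
    return bytes(out)


def wire_size(tokens):
    """Payload bytes that would cross the WAN for these tokens."""
    return sum(len(value) for _, value in tokens)
```

Each side keeps its own dictionary, and the two stay synchronized because the
receiver learns every raw chunk it is sent. The second transfer of the same
1024-byte message crosses the WAN as four 16-byte identifiers.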
Data suppression is commonly called codebook compression because each of the two accelerators maintains a
compression history (unique identifiers and previously seen patterns of data)
called a codebook that allows them to mitigate transmission
of redundant data patterns. In many cases, this codebook can be implemented
using capacity in both memory (high performance for the most frequently seen
patterns and identifiers) and disk (lower performance, but allows the
accelerator to maintain a very long compression history). These codebooks can
also be implemented in a hierarchical fashion; that is, a single, unique
identifier can be used as an instruction to reference multiple disparate blocks
of data, providing even higher levels of compression. Data suppression is
discussed in more detail in Chapter 6, "Overcoming Transport and Link Capacity
Limitations."
Compression
Compression is similar to data
suppression in that it minimizes the amount of data that must traverse the
network. Whereas data suppression uses a codebook to minimize the transmission
of redundant data, traditional compression employs algorithms that scour data
within a window (that is, a packet, or within a connection for session-based
compression) to find areas for consolidation (see Figure 3-18).
Compression is very helpful in that the
first transfer of a data pattern may not be seen as redundant by the data
suppression library but may be compressible. In those cases, compression will
help minimize the amount of bandwidth consumption for the first
transmission of a given data set. In many cases, the unique identifiers
generated by data suppression technologies for redundant data patterns are
compressible, so the transmission of redundant data patterns will not only be
redundancy-eliminated, but also be compressed to provide additional bandwidth
savings and throughput improvements.
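The difference between per-packet and session-based compression can be shown
with Python's standard zlib module. This is only a sketch: DEFLATE is assumed
here as a stand-in for whatever algorithm an accelerator actually uses, and the
function names are hypothetical. In the per-packet case, each payload is
compressed with no memory of earlier payloads. In the session case, one
compressor is kept for the life of the connection, so later payloads can
reference earlier ones.

```python
import zlib


def compress_per_packet(packets):
    """Compress each payload independently (no shared history)."""
    return [zlib.compress(p) for p in packets]


def compress_per_session(packets):
    """Compress payloads with one compressor that persists for the session."""
    compressor = zlib.compressobj()
    chunks = []
    for p in packets:
        # Z_SYNC_FLUSH emits everything so far so the peer can decode it now,
        # while keeping the compression history for later payloads.
        chunks.append(compressor.compress(p) + compressor.flush(zlib.Z_SYNC_FLUSH))
    return chunks


def decompress_session(chunks):
    """Receiving side of compress_per_session."""
    decompressor = zlib.decompressobj()
    return [decompressor.decompress(c) for c in chunks]
```

When the same small request is sent several times over one connection, the
session-based output shrinks sharply after the first payload, whereas the
per-packet output stays roughly the same size each time.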
Application Acceleration Functions
WAN optimization is designed to help make
the network a better place for applications to live but does little to nothing
in terms of actually changing the behavior of applications to make them perform
better over a network. Application acceleration complements WAN optimization in
that application protocol-specific optimizations are applied to overcome the
performance-limiting behavior associated with the application protocol itself.
When application acceleration and WAN optimization are used together, the
network becomes a better place for applications to live (because transmission of
redundant data is minimized, data is compressed in flight, and transport
protocol behavior is improved and made more efficient), and applications perform
better on the network.
Application acceleration commonly employs a variety of functions, including the
following, which are designed to improve performance in some way:
- Object caching
- Read-ahead
- Write-behind
- Message prediction
- Wide Area File Services (WAFS)
The following sections describe each function.
Object Caching
Accelerators that provide
application acceleration may provide object caching to help minimize bandwidth
consumption, reduce the impact of latency, and improve user performance. The
intent of an object cache is to allow an accelerator (also known as a cache) to retain a local copy
of objects that have been requested using a specific application or protocol
when it is safe to do so. If the object is requested again and is verified to
be identical to the copy on the server (also called an origin server), it can be
safely served to the requesting user, assuming the application requirements for
freshness and state have been met. These application requirements may include
that the user has successfully authenticated to the server (or domain, for
instance), the user is authorized to use the object (permissions are configured
to allow the user to open the object), and the appropriate state is applied to
the object on the origin server (for instance, a file lock for files being
accessed on a mapped network drive).
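The serve-if-valid logic described above can be sketched as follows. This is a
simplified model, not any product's implementation: the OriginServer class is a
stand-in for a real file or web server, a version counter stands in for whatever
freshness check the application protocol provides, and a single boolean stands
in for authentication and authorization. All names are hypothetical.

```python
class OriginServer:
    """Stand-in for the origin server; tracks a version per object."""

    def __init__(self):
        self.objects = {}      # name -> (version, data)
        self.full_fetches = 0  # counts full object transfers over the WAN

    def put(self, name, data):
        version = self.objects.get(name, (0, b""))[0] + 1
        self.objects[name] = (version, data)

    def current_version(self, name):
        """Cheap validation request: returns metadata only."""
        return self.objects[name][0]

    def fetch(self, name):
        """Expensive request: returns the whole object."""
        self.full_fetches += 1
        return self.objects[name]


class ObjectCache:
    """Edge accelerator that serves a local copy only when it is safe."""

    def __init__(self, origin):
        self.origin = origin
        self.store = {}  # name -> (version, data)

    def get(self, name, user_authorized):
        if not user_authorized:
            raise PermissionError(f"user may not open {name}")
        cached = self.store.get(name)
        # Serve locally only if the copy matches the origin's current version.
        if cached and cached[0] == self.origin.current_version(name):
            return cached[1]
        version, data = self.origin.fetch(name)
        self.store[name] = (version, data)
        return data
```

Note that even a cache hit involves a small validation exchange with the origin
server in this sketch. What is avoided is the transfer of the object itself.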
Object caching is
similar to data suppression in that the redundant transfer of an object is
mitigated. The key difference between the two, however, is that object caching
removes the need for the object to be transferred across the network in any form
(redundancy eliminated or otherwise), whereas data suppression minimizes the
bandwidth consumption when the object is transferred across the network.
When coupled together, the two provide
a high-performance solution for objects that are both read and written. For
instance, in an environment where a large object that is cached has been
accessed, the performance when opening the object is accelerated significantly.
Should the object be changed and written back to the server, the data
suppression capabilities will be employed to provide high levels of compression
for the transfer of the object back to the origin server.
Caching can also be coupled with content
delivery networking capabilities (also known as prepositioning), which allows an
accelerator's cache to be proactively populated with objects that the user may
need to access. This is particularly helpful for environments with large object
requirements, such as software distribution, patch management, CAD/CAM, medical
imaging, and engineering, as it helps to improve performance for the first user
access of that object. Prepositioning and content delivery networking are
discussed at length in Chapter 4, "Content Delivery Networks."
Accelerators that provide application
acceleration and caching also provide an additional benefit: offloading the
origin server from having to manage user requests and transmission of
information. By allowing the accelerators to become object-aware and respond to
object data requests when safe to do so, a smaller quantity of requests must
traverse the WAN through the core accelerator and to the origin server. This
means that the origin server sees fewer requests, providing higher levels of
scalability in existing server and application infrastructure (see Figure 3-19).
Additionally, accelerators that provide object caching for
application protocols in addition to data suppression internally isolate object
data from compression history. Architecturally, this allows the capacity of the
two storage repositories to be managed separately rather than together. This
provides significant value in that large objects, such as service pack files,
hotfixes, CAD/CAM objects, medical images, and more, can be prepositioned to the
edge of the network. With an isolated storage repository for these objects, any
network traffic burst that causes the compression history to be overwhelmed with
new data will not have an impact on the objects that are cached in the object
cache. In this way, when a user begins a large backup of a home movie and
picture archive across the WAN to a data center NAS device, the service pack
files that were prepositioned to the accelerator will remain uncompromised in
the object cache for future use.
Read-Ahead
While object caching provides
improved performance for objects that are accessed multiple times, many objects
are accessed only one time or are accessed in such a way that prohibits caching.
Accelerators commonly implement application-specific read-ahead algorithms as a
complement to caching to improve performance in scenarios where caching is not
possible or otherwise cannot be employed due to state.
Read-ahead allows the
accelerator to examine application requests and incrementally request additional
segments within the object from the origin server on behalf of the user. This
allows the accelerator to stage data that the user may request in the future.
Data that is "read ahead" by the accelerator may be retained temporarily to be
used should the user actually request that data. Read-ahead can also be used to
more aggressively populate an object cache when an object is accessed for the
first time or if the object in the cache is deemed out of date when compared to
the object on the origin server.
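A minimal read-ahead sketch follows. It assumes a simple block-oriented object
and a synchronous prefetch for clarity, whereas a real accelerator would issue
read-ahead requests asynchronously while the user is still processing the
previous response. The class name, block size, and read-ahead depth are
hypothetical.

```python
class ReadAheadAccelerator:
    """Stages blocks beyond the one requested, anticipating sequential reads."""

    def __init__(self, origin_data, block_size=4, depth=3):
        self.origin_data = origin_data  # stand-in for the object on the origin
        self.block_size = block_size
        self.depth = depth              # how many blocks to read ahead
        self.staged = {}                # block index -> data held temporarily
        self.wan_waits = 0              # requests the user had to wait on
        self.staged_hits = 0            # requests answered from staged data

    def _fetch_from_origin(self, index):
        start = index * self.block_size
        return self.origin_data[start:start + self.block_size]

    def read_block(self, index):
        if index in self.staged:
            data = self.staged.pop(index)  # already staged: answer locally
            self.staged_hits += 1
        else:
            data = self._fetch_from_origin(index)
            self.wan_waits += 1
        # Read ahead: request the next few blocks on behalf of the user.
        for nxt in range(index + 1, index + 1 + self.depth):
            if nxt not in self.staged:
                block = self._fetch_from_origin(nxt)
                if block:                  # stop staging past end of object
                    self.staged[nxt] = block
        return data
```

For a sequential read of a ten-block object, only the first request waits on
the WAN in this model. The remaining nine are answered from staged data.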
Although read-ahead provides value in
terms of improving user experience, it can create additional workloads on the
origin server if not coupled with an edge-side object cache. When accelerators
do not provide caching at the edge, every request is treated the way a cache
miss (object not in cache) would be treated by an accelerator that does provide
an object cache. Each request is accompanied by a large number of read-ahead
requests, thereby creating incremental workload on the server itself.
Accelerators that provide
object caching along with other application acceleration techniques such as
read-ahead provide the best balance between server offload and performance
improvement.
Figure 3-20 shows how read-ahead can be used to prefetch data on behalf
of the user. This figure shows the read-ahead capabilities of an accelerator
in three cases: object caching is not being employed on the accelerator, object
caching is being employed but the object is not in the cache (cache miss), or
the object cannot be cached safely.
Write-Behind
Write-behind is a function that an accelerator applies
within an application protocol to locally handle and acknowledge a write request
coming from a user. This helps to mitigate the transmission latency of the data
contained within the write operation, because the accelerator makes the client
believe that the data has been received. When using write-behind, the
accelerator must also ensure that the data is actually written to the origin
server. In many cases, accelerators only implement write-behind optimizations
that are safe and recoverable should network connectivity be lost or connections
be destroyed.
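The local-acknowledge, deferred-write pattern can be sketched as shown here.
This is an illustrative model only: the class name and the send_to_origin
callback are hypothetical, and the recovery policy (keep every unconfirmed write
queued until the origin accepts it) is one simple assumption about what "safe
and recoverable" might mean in practice.

```python
from collections import deque


class WriteBehindAccelerator:
    """Acknowledges writes locally and forwards them to the origin later."""

    def __init__(self, send_to_origin):
        # send_to_origin(offset, data) is assumed to raise ConnectionError
        # if the WAN is unavailable.
        self.send_to_origin = send_to_origin
        self.pending = deque()

    def write(self, offset, data):
        """Queue the write and acknowledge immediately, hiding WAN latency."""
        self.pending.append((offset, data))
        return "ACK"

    def flush(self):
        """Push queued writes to the origin in order.

        A write is removed from the queue only after the origin accepts it,
        so a failure partway through leaves the remainder queued for retry.
        """
        while self.pending:
            offset, data = self.pending[0]
            self.send_to_origin(offset, data)
            self.pending.popleft()
```

Because the client has already been told that the write succeeded, the queue
must survive a failed flush. The sketch keeps it in memory, which would not be
sufficient if the accelerator itself could fail.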
Many application protocols provide a
built-in write-behind mechanism that is granted to a user under certain
circumstances. For instance, with the Common Internet File System (CIFS)
protocol, certain opportunistic locks permit the user to perform write-behind
operations locally. In such a case, the user is able to respond to their own
write requests and flush the data periodically to the server. Many accelerators
leverage protocol mechanisms such as this to ensure a safe implementation of
optimization.
Message Prediction
Prediction is
a function employed by accelerators that allows them to determine how to handle
a specific message that has been received from a user or a server. The
accelerator may handle certain application messages in a static fashion based on
a preconfigured understanding of the order and sequence of operations that are
expected. The accelerator can also handle the messages dynamically, based on
what is learned from interactive user and server exchanges. For instance, static
message prediction allows an accelerator to programmatically issue an additional
set of operations on behalf of the user when a particular user operation is
encountered.
Static message prediction is based
on the protocol handling that is built into the accelerator logic and has little
to do with what the user is actually doing. An example of static message
prediction includes having the accelerator proactively apply object locks
against an object of a specific type when the first object lock in a sequence is
seen.
When working with dynamic message
prediction, the accelerator maintains a history of previous user operations and
calculates the probability of which messages the user may submit next. The result of
this probability assessment is that the accelerator issues a set of operations
on behalf of the user based on previously seen behavior rather than on
programmatic understanding. In either case, message prediction allows the
accelerator to issue requests on behalf of the user in an attempt to mitigate
the transmission latency of the predicted messages when the user actually
initiates such a request.
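Dynamic message prediction can be approximated with a simple frequency model.
The sketch below is one possible approach under stated assumptions, not the
method used by any particular accelerator: it counts which message has followed
each message in the past and predicts the most common follower when its observed
probability meets a threshold. The class name, the threshold value, and the
message names in the usage example are hypothetical.

```python
from collections import Counter, defaultdict


class MessagePredictor:
    """Predicts the next message from a history of observed sequences."""

    def __init__(self, threshold=0.6):
        self.transitions = defaultdict(Counter)  # message -> follower counts
        self.previous = None
        self.threshold = threshold  # minimum probability needed to act

    def observe(self, message):
        """Record a message the user actually sent."""
        if self.previous is not None:
            self.transitions[self.previous][message] += 1
        self.previous = message

    def predict(self, message):
        """Return the likely next message, or None if not confident enough."""
        followers = self.transitions.get(message)
        if not followers:
            return None
        candidate, count = followers.most_common(1)[0]
        if count / sum(followers.values()) >= self.threshold:
            return candidate
        return None
```

If a user has followed an open with a lock three times out of four, the
predictor returns the lock as the next likely message, and an accelerator could
issue it ahead of time. A static predictor would instead hard-code such
sequences for each protocol.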
Wide Area File Services
Many accelerators that provide
caching as a component of application acceleration can also be used to implement
a form of disconnected mode operation for certain application protocols.
Disconnected mode allows the accelerator to act on behalf of the server that the
user is attempting to access during periods of time when the network between the
user and the server is severed and the resource is otherwise not accessible. For
file server shares, this is commonly referred to as Wide Area File Services
(WAFS).
Although WAFS generally
refers to the application acceleration components that help improve performance
over the WAN for interactive file server access, WAFS also refers to the ability
to provide file services to the enterprise edge, even when the WAN is down. In
disconnected mode of operation, the accelerator acts as a full proxy for the
origin server and provides some level of access to the cached objects based on
previously seen access control entries or statically defined security
parameters.
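The disconnected-mode behavior can be modeled in a few lines. This sketch makes
several assumptions that are not stated in the text: access is read-only while
the WAN is down, the only access control information retained is the set of
users previously seen opening each object while connected, and the
fetch_from_origin callback enforces permissions when the WAN is up. All names
are hypothetical.

```python
class DisconnectedModeAccelerator:
    """Serves cached objects when the WAN is down, using previously seen access."""

    def __init__(self, fetch_from_origin):
        # fetch_from_origin(user, path) is assumed to enforce permissions
        # on the origin server and return the object data.
        self.fetch_from_origin = fetch_from_origin
        self.cached_objects = {}  # path -> data
        self.seen_access = {}     # path -> users seen opening it while connected
        self.wan_up = True

    def read(self, user, path):
        if self.wan_up:
            data = self.fetch_from_origin(user, path)
            self.cached_objects[path] = data
            self.seen_access.setdefault(path, set()).add(user)
            return data
        # Disconnected: act on behalf of the origin server.
        if path not in self.cached_objects:
            raise FileNotFoundError(path)
        if user not in self.seen_access.get(path, set()):
            raise PermissionError(f"{user} has no recorded access to {path}")
        return self.cached_objects[path]
```

In this model, a user who opened a file before the outage can still read it
during the outage, whereas a user with no recorded access is refused even though
the object is in the cache.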
For some accelerator solutions, WAFS
refers to the broader set of services that needs to be provided to the
enterprise edge beyond file services, including print server capabilities,
authentication and login, and other infrastructure services. For the purposes of
this book, WAFS is considered part of the larger set of application acceleration
capabilities that is provided by accelerators.