Queuing and Scheduling
Once traffic
has been classified (identified) and pre-queuing operators have been applied
(policing, dropping), the router then queues traffic for service onto the
next-hop link. Queuing is defined as the way a node temporarily stores data
while waiting for system resources to become available to act upon that data.
The queues within the network device are serviced based on the configuration of
the router and link speed. This layer of the QoS behavioral model ensures that
the router services packets according to the application demand and business
priority. Furthermore, this layer ensures that packets are forwarded in such a
way that network capacity is more efficiently used.
As packets enter a queue, the router
must schedule the service of those packets. Queuing and scheduling go hand in
hand: one function temporarily holds data while waiting for resources (queuing),
and the other determines when the data is to be serviced and how (scheduling).
Other functions are also applied during queuing and scheduling, including
shaping, congestion management, and congestion avoidance.
Not all packets are created equal, and
neither are the applications that cause packets to be exchanged between end
nodes. Some applications are more sensitive to network conditions and QoS
metrics than others. Applications use different means of exchanging data, and
the architecture of process-to-process data exchange may cause applications to
behave differently under different network conditions. For instance, interactive
voice and video traffic, which is typically carried over an unreliable transport
such as UDP, is sensitive to link quality and to changes in network service
level. The quality of interactive voice and video can be compromised when
packets are not forwarded in a timely fashion, or when these flows are delayed
by congestion caused by other flows on the network.
Furthermore, the quality of the data
being received (for example, the sound of the other person's voice on a phone
call, or the stream of video and audio being viewed) might be compromised if too
many packets are lost in transit. In many
cases, voice and video encoding can accommodate a reasonable amount of loss
through predictive algorithms; however, these types of data are commonly
classified and marked as requiring high-priority and low-latency handling to
ensure that the sound of the other person's voice or the image being viewed is
not compromised during congestion scenarios.
Other applications that are transactional in
nature commonly use TCP as a transport (for reliability) and might not be as
sensitive to packet loss or jitter but can be impacted by delays in servicing
the packets within the network. This type of data will likely need to be
classified as requiring low-latency handling as well. Less-sensitive
applications that commonly use bulk movement of data over reliable transport
protocols, such as file transfers or e-mail, may need to be handled with lower
priority and serviced in such a way that the performance of transactional
applications and the quality of interactive communications protocols are not
compromised.
The following levels of
queuing are available, each providing a different type of service to packets
that are queued: first in, first out (FIFO) queuing, priority queuing, and
weighted fair queuing (WFQ).
The following sections describe each
queuing mechanism as well as traffic shaping.
FIFO Queuing
Most networking devices implement
a basic FIFO queuing mechanism. FIFO queuing places packets from all flows
entering a common interface into a common queue. As packets enter interfaces in
a serial fashion, the first packet received is also the first packet that is
serviced.
FIFO queuing provides no
means of QoS, because it does not differentiate between packets or flows. With
FIFO queuing, a large bulk data transfer that shares a router interface with
other application flows may quickly fill the interface's queue, causing
service starvation for other applications on the network. For instance, an
interactive voice call between two users across the network could be impacted if
another user is transferring a large file to a file server through the same router
interface. FIFO generally is implemented only on high-speed interfaces, because
the likelihood of congestion is far less than on a low-speed interface.
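The starvation effect described above can be sketched with a toy calculation. The packet sizes, the 1-Mbps link rate, and the function name `fifo_departure_times` are illustrative assumptions, not values taken from any particular device.

```python
from collections import deque

def fifo_departure_times(packets, link_bps):
    """Return (flow, departure time in seconds) for packets served in arrival order.

    packets: iterable of (flow_name, size_bytes), all assumed to be queued at time 0.
    """
    queue = deque(packets)
    clock = 0.0
    departures = []
    while queue:
        flow, size_bytes = queue.popleft()   # first in, first out
        clock += size_bytes * 8 / link_bps   # time to serialize the packet onto the link
        departures.append((flow, clock))
    return departures

# Hypothetical mix: ten 1500-byte file-transfer packets arrive just ahead of
# one 200-byte voice packet on a 1-Mbps link.
packets = [("bulk", 1500)] * 10 + [("voice", 200)]
departures = fifo_departure_times(packets, link_bps=1_000_000)
voice_delay = departures[-1][1]
print(f"voice packet leaves after {voice_delay * 1000:.1f} ms")  # 121.6 ms
```

Under these assumptions the voice packet waits behind every bulk packet that arrived before it, which is the behavior that the queuing mechanisms in the following sections are designed to avoid.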
Priority Queuing
Priority queuing is a technique
that allows multiple queues to be used and assigns each queue a relative
priority. Priority queuing ensures that traffic entering a high-priority queue
is always serviced before traffic waiting in a lower-priority queue, and
provides the level of service needed to ensure that high-priority traffic (such
as internetwork control and voice conversations) is serviced first.
The drawback with priority queuing
is that higher-priority traffic always takes precedence and is serviced first,
which could lead to starvation, or blocking, of flows waiting in lower-priority
queues should a large number of high-priority conversations exist. This can
create performance challenges for applications that are assigned to
lower-priority queues, because they simply do not receive an adequate level of
service from the network. In some cases, priority queuing is automatically
configured with a bandwidth maximum to ensure that other queues are not
starved.
Figure 3-12 shows an example of priority queuing.
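A strict priority scheduler can be sketched in a few lines. The class name `PriorityScheduler` and the packet labels are hypothetical; the sketch omits the bandwidth maximum mentioned above, so it shows the starvation risk in its plainest form.

```python
from collections import deque

class PriorityScheduler:
    """Strict priority: always serve the highest-priority non-empty queue."""

    def __init__(self, levels):
        # Index 0 is the highest priority.
        self.queues = [deque() for _ in range(levels)]

    def enqueue(self, priority, packet):
        self.queues[priority].append(packet)

    def dequeue(self):
        for queue in self.queues:
            if queue:
                return queue.popleft()
        return None  # nothing is waiting

sched = PriorityScheduler(levels=2)
sched.enqueue(1, "email-1")          # arrives first, but in the low-priority queue
for i in range(3):
    sched.enqueue(0, f"voice-{i}")   # arrives later, in the high-priority queue
order = [sched.dequeue() for _ in range(4)]
print(order)  # ['voice-0', 'voice-1', 'voice-2', 'email-1']
```

As long as high-priority packets keep arriving, the e-mail packet in this sketch is never serviced.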
Weighted Fair Queuing
Weighted fair queuing (WFQ) overcomes the issue of
lower-priority queue starvation by allowing each of the queues to be assigned a
weight. This weight identifies the amount of service that the queue can consume.
Configuring a queue to consume only a portion of the available service allows
the scheduler to provide some level of service to packets in lower-weight queues
even if there are still packets waiting in the higher-weight queues.
Figure 3-13 illustrates an example of WFQ. Comparing Figure 3-12 and Figure
3-13 shows how WFQ can overcome the starvation problem of priority queuing and
ensure some level of fairness among packets that are queued.
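The idea of weights can be illustrated with weighted round robin, a simpler relative of WFQ that serves a fixed number of packets per queue on each pass. This is a stand-in chosen for brevity, not the WFQ algorithm itself; the queue names and weights are assumptions.

```python
from collections import deque

def weighted_round_robin(queues, weights):
    """Yield packets by visiting each queue in turn, serving up to `weight` packets per visit."""
    while any(queues.values()):
        for name, queue in queues.items():
            for _ in range(weights[name]):
                if not queue:
                    break
                yield queue.popleft()

queues = {
    "high": deque(f"high-{i}" for i in range(6)),
    "low": deque(f"low-{i}" for i in range(2)),
}
weights = {"high": 3, "low": 1}   # hypothetical 3:1 service ratio
order = list(weighted_round_robin(queues, weights))
print(order)
# ['high-0', 'high-1', 'high-2', 'low-0', 'high-3', 'high-4', 'high-5', 'low-1']
```

Unlike strict priority queuing, the low-weight queue receives service on every pass even though packets are still waiting in the high-weight queue.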
WFQ can be extended using
"classes." By using match criteria such as identified protocol, input interface,
or granular ACLs, traffic can be queued in a weighted fair fashion based on the
class that the packets are matched with.
Bandwidth can be assigned to each of
the classes to ensure that flows consume network capacity only up to the
specified limit. These queues can also be limited in terms of the number of
packets that are held in queue. When a queue defined for a particular class of
traffic becomes full, packets arriving at the tail of the queue are dropped
(the dropped data may require retransmission by the end node). Alternatively,
weighted random early detection (WRED) can be used to drop packets as the queue
fills, based on the configured drop policy and the markings that exist within
the ToS byte of the packet. This allows the router to selectively control how
much data is held in the queue, in an effort to mitigate queue congestion and
network congestion by throttling the transmission source by means of packet
loss.
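The drop policy used by WRED can be sketched as a drop probability that rises with queue depth, with a separate profile per packet marking. The thresholds, probabilities, and marking names below are hypothetical, and the sketch takes the average queue depth as an input rather than computing the moving average that real implementations maintain.

```python
def wred_drop_probability(avg_depth, min_th, max_th, max_p):
    """Return the drop probability for a packet given the average queue depth.

    Below min_th nothing is dropped; at or above max_th everything is dropped;
    in between, the probability rises linearly from 0 to max_p.
    """
    if avg_depth < min_th:
        return 0.0
    if avg_depth >= max_th:
        return 1.0
    return max_p * (avg_depth - min_th) / (max_th - min_th)

# Hypothetical per-marking profiles: (min threshold, max threshold, max drop probability).
profiles = {
    "low-priority": (20, 40, 0.10),
    "high-priority": (30, 40, 0.10),
}
for marking, (min_th, max_th, max_p) in profiles.items():
    p = wred_drop_probability(35, min_th, max_th, max_p)
    print(marking, round(p, 3))   # low-priority 0.075, high-priority 0.05
```

At the same queue depth, packets with the lower-priority marking are more likely to be dropped, which is what makes the early detection "weighted."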
As discussed in Chapter
2, "Barriers to Application Performance," when packet
loss is detected by a node using a connection-oriented, reliable transport
protocol such as TCP, the node decreases its transmission throughput.
Likewise, with the router selectively dropping packets before and during periods
of congestion, end-node applications using protocols such as TCP as a transport
are proactively throttled to mitigate larger-scale congestion problems. In this
way, the transmission protocol is said to be normalizing around the available
network throughput.
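The throttling effect can be sketched with a toy additive-increase, multiplicative-decrease loop. This is a deliberately simplified model, not an implementation of TCP; the function name `aimd`, the capacity value, and the rule that any window above capacity counts as a loss are all assumptions.

```python
def aimd(rounds, capacity, cwnd=1.0):
    """Toy additive-increase/multiplicative-decrease loop.

    Each round, a window larger than `capacity` is treated as a loss event.
    """
    history = []
    for _ in range(rounds):
        if cwnd > capacity:
            cwnd = max(1.0, cwnd / 2)   # loss detected: cut the window in half
        else:
            cwnd += 1.0                 # no loss: probe for more bandwidth
        history.append(cwnd)
    return history

history = aimd(rounds=40, capacity=20)
print(history[18:24])   # [20.0, 21.0, 10.5, 11.5, 12.5, 13.5]
```

After the initial ramp, the window in this model oscillates between roughly half of the capacity and the capacity itself, which is one way to picture a sender normalizing around the available throughput.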
Other forms of queuing exist, and many
network devices allow multiple levels of queuing to be intermixed. This allows
network administrators to provide very configurable service policies for
converged networks that carry voice, video, transaction applications, and other
types of data through a common and shared infrastructure. For instance,
hierarchical queuing employs a combination of queuing architectures
concurrently. An example of hierarchical queuing is where low-latency queuing
mechanisms such as priority queuing are used for latency- and loss-sensitive
applications such as voice and video, and class-based WFQ is used for
interactive applications.
With hierarchical queuing, the network
device can use the right form of queuing based on application requirements while
providing the flexibility necessary to ensure adequate levels of service for all
applications. Such a balance can be struck only with hierarchical queuing
architectures.
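A hierarchical scheduler of this kind can be sketched by placing a strict priority queue in front of a set of weighted classes. The class name `HierarchicalScheduler`, the reserved class name `"priority"`, and the weights are hypothetical, and weighted round robin again stands in for class-based WFQ.

```python
from collections import deque

class HierarchicalScheduler:
    """Strict priority queue first, then weighted round robin among the other classes."""

    def __init__(self, weights):
        self.priority = deque()
        self.classes = {name: deque() for name in weights}
        # Expand weights into a repeating service pattern, e.g. {"a": 2, "b": 1} -> a, a, b.
        self.pattern = [name for name, w in weights.items() for _ in range(w)]
        self.slot = 0

    def enqueue(self, cls, packet):
        target = self.priority if cls == "priority" else self.classes[cls]
        target.append(packet)

    def dequeue(self):
        if self.priority:                       # latency-sensitive traffic always goes first
            return self.priority.popleft()
        for _ in range(len(self.pattern)):      # otherwise take the next weighted slot
            name = self.pattern[self.slot]
            self.slot = (self.slot + 1) % len(self.pattern)
            if self.classes[name]:
                return self.classes[name].popleft()
        return None                             # nothing is waiting

sched = HierarchicalScheduler({"transactional": 2, "bulk": 1})
for cls, pkt in [("bulk", "bulk-0"), ("bulk", "bulk-1"), ("transactional", "txn-0"),
                 ("transactional", "txn-1"), ("transactional", "txn-2"), ("priority", "voice-0")]:
    sched.enqueue(cls, pkt)
order = [sched.dequeue() for _ in range(6)]
print(order)  # ['voice-0', 'txn-0', 'txn-1', 'bulk-0', 'txn-2', 'bulk-1']
```

The voice packet is serviced first even though it arrived last, while the remaining classes share the link in proportion to their weights.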
Traffic Shaping
Traffic shaping, also known as rate
limiting, is applied as part of the queuing and scheduling subsystem of a
network device. Traffic shaping is used to smooth the output flow of packets
leaving a queue in order to control the amount of bandwidth that is consumed by
a particular flow or class. As described in the previous section, traffic
policing is used to drop incoming packets when a specified rate has been
exceeded. Traffic shaping, however, allows the packets exceeding the specified
rate to be received and queued, assuming the queue has capacity. With traffic
shaping, the queue is serviced at the rate set by the configured policy or at
the rate of the next-hop link. In this way, traffic shaping does not drop packets unless queue
capacity is exceeded; rather, it throttles the servicing of the queue according
to policy or network capacity.
Figure 3-14 provides
an example of traffic shaping. Notice that, compared to Figure 3-11, traffic shaping allows
packets exceeding the configured rate to be queued and serviced, whereas
policing simply drops the packets.
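The difference between the two can be sketched with a simple per-interval model. The functions `police` and `shape`, the arrival pattern, the rate of two packets per interval, and the queue limit are all illustrative assumptions; real devices meter traffic in bytes and allow configurable bursts.

```python
def police(arrivals, rate):
    """Forward at most `rate` packets per interval and drop the excess immediately."""
    sent, dropped = [], 0
    for n in arrivals:
        sent.append(min(n, rate))
        dropped += max(0, n - rate)
    return sent, dropped

def shape(arrivals, rate, queue_limit):
    """Queue the excess and service the queue at `rate` packets per interval."""
    sent, dropped, backlog = [], 0, 0
    for n in arrivals:
        backlog += n
        if backlog > queue_limit:          # drops occur only when the queue overflows
            dropped += backlog - queue_limit
            backlog = queue_limit
        out = min(backlog, rate)
        backlog -= out
        sent.append(out)
    while backlog:                         # keep draining after arrivals stop
        out = min(backlog, rate)
        backlog -= out
        sent.append(out)
    return sent, dropped

arrivals = [5, 0, 0, 1, 0]                 # a burst of five packets, then a trickle
policed = police(arrivals, rate=2)
shaped = shape(arrivals, rate=2, queue_limit=10)
print(policed)  # ([2, 0, 0, 1, 0], 3)
print(shaped)   # ([2, 2, 1, 1, 0], 0)
```

In this model the policer discards three packets from the burst, whereas the shaper delivers all six packets by spreading the burst across later intervals.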