CCIE Routing and Switching: April 2010

Thursday, April 8, 2010

MPLS Label Assignment and Distribution

Label Distribution Protocol (LDP) and Tag Distribution Protocol (TDP) exchange labels and store the information in the label information base (LIB).

A label is added to the IP forwarding table (forwarding information base, or FIB) to map an IP prefix to a next-hop label.

A locally generated label is added to the label forwarding information base (LFIB) and mapped to a next-hop label.

A label-switched path (LSP) is a sequence of label switch routers (LSRs) that forward labeled packets for a particular forwarding equivalence class (FEC). Each LSR swaps the top label in a packet traversing the LSP. An LSP is similar to a Frame Relay or ATM virtual circuit. In cell-mode MPLS, an LSP is a virtual circuit.
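The per-hop label swap can be sketched as a lookup in each router's LFIB. This is only an illustration: the router names, label values, and the `forward` helper are hypothetical and do not come from any real device.

```python
# Hypothetical LFIB per router: incoming label -> (outgoing label, next hop).
# An outgoing label of None means the label is removed and the packet
# leaves the LSP as plain IP.
LFIB = {
    "R1": {17: (22, "R2")},
    "R2": {22: (35, "R3")},
    "R3": {35: (None, "egress")},
}

def forward(router, label):
    """Follow an LSP hop by hop, returning the (router, label) pairs seen."""
    path = []
    while label is not None:
        path.append((router, label))
        label, router = LFIB[router][label]
    return path

print(forward("R1", 17))
```

Each router only needs its own table; the end-to-end path emerges from the chain of swaps.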

Impacts of IP Aggregation

Aggregation (or summarization) should not be used on ATM LSRs because it breaks LSPs in two, which means that ATM switches would have to perform Layer 3 lookups.

Aggregation should also not be used where an end-to-end LSP is required. Typical examples of networks that require end-to-end LSPs are the following:

A transit BGP autonomous system (AS) where core routers are not running BGP
An MPLS VPN backbone
An MPLS-enabled ATM network
A network that uses MPLS TE

Frame-Mode Loop Detection

The TTL functionality in MPLS is equivalent to that of traditional IP forwarding. Furthermore, when an IP packet is labeled, the TTL value from the IP header is copied into the TTL field in the label. This is called “TTL propagation.”

TTL propagation can be disabled to hide the core routers from the end users. When TTL propagation is disabled, routers set the TTL field of the label to 255 when an IP packet is labeled.

If TTL propagation is disabled, it must be disabled on all routers in an MPLS domain to prevent unexpected behavior.

TTL propagation can optionally be disabled for forwarded traffic only, which still allows administrators to use traceroute from the routers themselves to troubleshoot problems in the network.
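The effect of TTL propagation on a traceroute probe can be sketched roughly as follows. The function name and the simplification that every core LSR decrements the label TTL by one are assumptions made for illustration.

```python
def label_ttl_after_core(ip_ttl, core_hops, propagate=True):
    """Return the label TTL after core_hops LSRs, or 0 if it expired in the core.

    ip_ttl is assumed to be the IP TTL at the moment the label is imposed.
    With propagation the IP TTL is copied into the label; without it the
    label TTL starts at 255.
    """
    ttl = ip_ttl if propagate else 255
    for _ in range(core_hops):
        ttl -= 1
        if ttl == 0:
            return 0  # expired inside the core, so a core router answers
    return ttl

print(label_ttl_after_core(2, 3, propagate=True))   # expires in the core
print(label_ttl_after_core(2, 3, propagate=False))  # core stays hidden
```

With propagation, a probe with a low TTL expires on a core router and exposes it; without propagation, the label TTL is far too large to expire in the core.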

Penultimate Hop Popping

PHP optimizes MPLS performance by reducing the number of table lookups on the egress router.

PHP is not supported on ATM devices because a label is part of the ATM cell payload and cannot be removed by the ATM switching hardware.

Per-Platform Label Allocation

There are two possible approaches for assigning labels to networks:

* Per-platform label allocation: One label is assigned to a destination network and announced to all neighbors. The label must be locally unique and valid on all incoming interfaces. This is the default operation in frame-mode MPLS.

* Per-interface label allocation: Local labels are assigned to IP destination prefixes on a per-interface basis. These labels must be unique on a per-interface basis.
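The difference between the two label spaces can be sketched with two lookup tables. The interface names, prefixes, and label values below are hypothetical.

```python
# Per-platform label space: one table for the whole router, keyed by label only.
PER_PLATFORM = {17: "10.1.0.0/16", 18: "10.2.0.0/16"}

# Per-interface label space: one table per incoming interface, so the same
# label value may map to different prefixes on different interfaces.
PER_INTERFACE = {
    "ATM0/0": {17: "10.1.0.0/16"},
    "ATM0/1": {17: "10.2.0.0/16"},
}

def lookup_per_platform(label):
    """The incoming interface is irrelevant; the label alone identifies the prefix."""
    return PER_PLATFORM[label]

def lookup_per_interface(in_interface, label):
    """The label is only meaningful together with the incoming interface."""
    return PER_INTERFACE[in_interface][label]

print(lookup_per_platform(17))
print(lookup_per_interface("ATM0/1", 17))
```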

MPLS Convergence

The overall convergence in an MPLS network is not affected by LDP convergence when there is a link failure.

Frame-mode MPLS uses liberal label retention mode, which enables routers to store all received labels, even if they are not being used.

These labels can be used, after the network convergence, to enable immediate establishment of an alternative LSP tunnel.
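A rough sketch of why liberal retention helps: the LIB already holds a label from every neighbor, so once the IGP selects a new next hop, the outgoing label is available without waiting for LDP. The neighbor names, prefix, and labels are hypothetical.

```python
# LIB entry for one prefix: labels received from every LDP neighbor,
# kept even when the neighbor is not the current next hop.
LIB = {"10.1.0.0/16": {"R2": 22, "R3": 41}}

def outgoing_label(prefix, next_hop):
    """Return the stored label advertised by next_hop, or None if none is held."""
    return LIB[prefix].get(next_hop)

print(outgoing_label("10.1.0.0/16", "R2"))  # label in use before the failure
print(outgoing_label("10.1.0.0/16", "R3"))  # label ready after the IGP reconverges
```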

Cell-Mode Issues

Cell-mode MPLS is significantly different from frame-mode MPLS because of some ATM-specific requirements:

* ATM uses cells and not frames. A single packet may be encapsulated into multiple cells. Cells are a fixed length, which means that normal labels cannot be used because they would increase the size of a cell. The virtual path identifier/virtual channel identifier (VPI/VCI) field in the ATM header is used as the MPLS label. An LSP tunnel is therefore called a virtual circuit in ATM terminology.
* ATM switches and routers usually have a limited number of virtual circuits that they can use. MPLS establishes a full mesh of LSP tunnels (virtual circuits), which can result in an extremely large number of tunnels.

Because ATM switches cannot forward IP packets, labels cannot be asynchronously assigned and distributed.

Instead, the router initiates an ordered sequence of requests on the upstream side of the ATM network.

It is not until the request is answered with the label and assigned to destinations in the IP routing table that the forwarding table is populated.

An ordered sequence of downstream requests is followed by an ordered sequence of upstream replies. This type of operation is called downstream-on-demand allocation of labels.

Two virtual circuits can merge into one. Standard ATM virtual switching hardware does not support this situation, and as a result, segmented packets from the two sources may become interleaved.

There are two possible solutions to this problem:

* Allocate a new downstream label for each request. This solution would result in a greater number of labels.
* Buffer the cells of the second packet until all cells of the first packet are forwarded. This solution results in an increased delay of packets because of buffering.
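The interleaving problem and the buffering solution can be sketched as follows. Cells are tagged with a packet identifier purely for illustration; real ATM cells on a merged virtual circuit carry no such tag, which is why interleaved cells cannot be reassembled.

```python
from itertools import chain, zip_longest

def segment(packet_id, n_cells):
    """Split a packet into n_cells cells, tagged with the packet id for illustration."""
    return [(packet_id, i) for i in range(n_cells)]

def merge_interleaved(a, b):
    """Cells forwarded as they arrive: the two packets mix on the shared VC."""
    return [c for pair in zip_longest(a, b) for c in pair if c is not None]

def merge_buffered(a, b):
    """Buffering: hold the second packet until the first is fully forwarded."""
    return list(chain(a, b))

a, b = segment("A", 2), segment("B", 2)
print(merge_interleaved(a, b))
print(merge_buffered(a, b))
```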

The major benefit of VC merge is that it minimizes the number of labels (VPI/VCI values) needed in the ATM part of the network.

The major drawbacks to VC merge are as follows:

* Buffering requirements increase on the ATM LSR.
* There is an increase in delay and jitter in the ATM network.
* ATM networks under heavy load become more like frame-based networks.

Loop Detection in Cell-Mode MPLS Networks

Cell-mode MPLS uses the VPI/VCI fields in the ATM header to encode labels. These two fields do not include a TTL field. Therefore, cell-mode MPLS must use other ways of preventing routing loops.

LDP uses a hop-count TLV (type, length, value) attribute to count hops in the ATM part of the MPLS domain.

This hop count can be used to provide correct TTL handling on ATM edge LSRs on behalf of ATM LSRs that cannot process IP packets.

A maximum limit on the number of hops can also be set.

Per-Interface Label Allocation

Cell-mode MPLS defaults to using per-interface label space because ATM switches support per-interface VPI/VCI values to encode labels.

Therefore, if a single router has two parallel links to the same ATM switch, two LDP sessions are established and two separate labels are requested.

Label Distribution Parameters

The two label space options are:

* Per-interface label space, where labels must be unique for a specific input interface
* Per-platform label space, where labels must be unique for the entire platform (router)

The two options for label generation and distribution are as follows:

* Unsolicited downstream distribution of labels is used in frame-mode MPLS, where all routers can asynchronously generate local labels and propagate them to adjacent routers.
* Downstream-on-demand distribution of labels is used in cell-mode MPLS, where ATM LSRs have to request a label for destinations found in the IP routing table.

Another aspect of label distribution focuses on how labels are allocated:

* Frame-mode MPLS uses independent control mode, where all routers can start propagating labels independently of one another.
* Cell-mode MPLS requires LSRs to already have the next-hop label if they are to generate and propagate their own local labels. This option is called ordered control mode.

The last aspect of label distribution looks at labels that are received but not used:

* Frame-mode MPLS may result in multiple labels being received but only one being used. Unused labels are kept, and this mode is usually referred to as liberal label retention mode.
* Cell-mode MPLS keeps only labels that it previously requested. This mode is called conservative label retention mode.

LDP Session Establishment

LDP is a standard protocol used to exchange labels between adjacent routers. Tag Distribution Protocol (TDP) is a Cisco proprietary protocol that has the same functionality as LDP.

LDP periodically sends hello messages. The hello messages use UDP packets with a multicast destination address of 224.0.0.2 (“all routers on a subnet”) and destination port number of 646 (711 for TDP).

If another router is enabled for LDP (or TDP), it will respond by opening a TCP session with the same destination port number (646 or 711).

ATM LSRs establish the IP adjacency across the MPLS control virtual circuit, which by default has a VPI/VCI value of 0/32.

An IP routing protocol and LDP (or TDP) use this control virtual circuit to exchange IP routing information and labels.

Some Cisco devices use the Virtual Switch Interface (VSI) protocol to create entries in the LFIB table (ATM switching matrix of the data plane) based on the information in the LIB table (control plane). This protocol is used to dynamically create virtual circuits for each IP network.

MPLS Concepts

The two major elements of MPLS architecture are the control plane and the data plane.

* The control plane exchanges routing information (with routing protocols such as OSPF) and labels with protocols such as LDP or TDP
* The data plane is the forwarding engine

MPLS labels carry the information used to forward packets. They are encoded differently depending on whether MPLS is operating in frame mode or cell mode.

* In frame-mode MPLS, labels are 32-bit fields inserted between the Layer 2 and Layer 3 headers. Each label is divided into the following fields:
  o 20-bit label
  o 3-bit experimental field
  o 1-bit bottom-of-stack indicator
  o 8-bit TTL field
* In cell-mode the ATM header is the label
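The 32-bit frame-mode label layout can be sketched with a pair of pack and unpack helpers. The function names are hypothetical; the bit positions follow the field order listed above (label, experimental, bottom-of-stack, TTL, from most to least significant).

```python
def pack_label(label, exp, bottom, ttl):
    """Build a 32-bit frame-mode label entry from its four fields."""
    if not (0 <= label < 2**20 and 0 <= exp < 8 and 0 <= ttl < 256):
        raise ValueError("field out of range")
    return (label << 12) | (exp << 9) | (int(bool(bottom)) << 8) | ttl

def unpack_label(word):
    """Split a 32-bit label entry into (label, exp, bottom, ttl)."""
    return (
        (word >> 12) & 0xFFFFF,   # 20-bit label
        (word >> 9) & 0x7,        # 3-bit experimental field
        bool((word >> 8) & 0x1),  # 1-bit bottom-of-stack indicator
        word & 0xFF,              # 8-bit TTL
    )

word = pack_label(17, 0, True, 255)
print(hex(word))
print(unpack_label(word))
```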

A label switch router (LSR) is a device that forwards based on labels.

An edge LSR labels and removes labels from packets.

LSRs that perform cell-mode MPLS are divided into the following categories:

* ATM LSRs if they are ATM switches. All interfaces are enabled for MPLS, and forwarding is done based only on labels.
* ATM edge LSRs if they are routers connected to an MPLS-enabled ATM network.

Forwarding equivalence class (FEC) describes the forwarding characteristic of a packet, such as the destination IP.

MPLS is used for the following applications:

* Unicast IP routing
* Multicast IP routing
* MPLS traffic engineering provides more efficient link use
* Differentiated Quality of Service
* MPLS VPNs - Separate customer routing information across the MPLS backbone
* Any Transport over MPLS - Transport Layer 2 packets over an MPLS backbone

Thursday, April 1, 2010

Optimizing BGP Scalability

This is due not only to the number of parameters and attributes you can modify for the protocol, but also to the sheer size of the routing table(s) you could deal with! This final chapter of BGP focuses on four ways of optimizing BGP operation when dealing with these enormous routing tables:

* Reducing BGP convergence time
* Limiting the number of BGP prefixes from a neighbor
* Using BGP peer groups
* Configuring route dampening

Reducing BGP Convergence Time

The creators of the BGP routing protocol designed it for slow convergence. Although this seems illogical, it becomes clear when you realize the sheer size of a BGP network. If BGP propagated routes quickly, a single, flapping network could cause an instant worldwide routing table recalculation. Considering the number of flapping routes that exist on a daily basis, this would be disastrous.

Using a variety of BGP configuration commands, you are able to lower the convergence time of BGP. If you are dealing with Internet-sized routing tables, Cisco recommends that you do NOT adjust the following timers. However, if you are using BGP to manage an enterprise-sized routing table, modifying the following timers can improve network performance and reduce convergence time.

There are two timers you can adjust to lower the convergence time of BGP: the scanner interval and the advertisement interval.

The scanner interval is how often the BGP routing process “walks through” the BGP routing table and ensures all routes are still reachable. By default, this occurs once every 60 seconds. By lowering this interval you allow BGP to modify the table more quickly in the event that a next-hop address becomes unreachable. Keep in mind that decreasing this interval increases the CPU load on your router. Use the following syntax to modify the scanner interval:

Router(config-router)# bgp scan-time seconds

The advertisement interval is the minimum time BGP waits between sending routing updates to a neighboring router. By default, this interval is 30 seconds for EBGP neighbors and 5 seconds for IBGP neighbors. By decreasing this interval, the BGP routing process can propagate changes to a neighbor sooner, resulting in faster convergence. Use the following syntax to modify the advertisement interval:

Router(config-router)# neighbor ip_address advertisement-interval seconds

Limiting the number of BGP prefixes from a neighbor

This feature allows you to limit the number of route advertisements you receive from a particular neighbor. This is necessary to protect yourself from a misconfigured neighbor who could send multiple copies of the Internet routing table to your router. This would quickly result in a memory overflow and potentially cause the router to crash. Use the following syntax to limit the number of prefixes you can receive from a neighbor:

Router(config-router)# neighbor ip_address maximum-prefix number_of_prefixes [threshold] [warning-only] [restart minutes]

Following is a description of the optional arguments for the maximum-prefix syntax:

threshold – This is a number from 1 to 100 representing a percentage. When a router reaches this percentage of prefixes (in relation to the maximum number of prefixes), it will begin generating warning messages.

warning-only – This causes the BGP router process to ONLY send warning messages when the neighbor exceeds the maximum number of prefixes. The default behavior is to drop the neighbor connection.

restart minutes – This instructs the router to try to re-establish the session after the specified interval in minutes.
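The interaction between the maximum, the threshold, and warning-only can be sketched as a small decision function. This is a simplified model, not router code: the function name and return strings are hypothetical, and the default threshold of 75 percent is an assumption based on the documented Cisco IOS default.

```python
def check_prefix_limit(received, maximum, threshold_pct=75, warning_only=False):
    """Return the action taken for a given number of received prefixes.

    threshold_pct=75 is an assumed default; adjust to match your platform.
    """
    if received > maximum:
        # Over the limit: warn only, or tear the session down (default).
        return "warn" if warning_only else "drop-session"
    if received >= maximum * threshold_pct / 100:
        return "warn"
    return "ok"

for count in (700, 800, 1001):
    print(count, check_prefix_limit(count, 1000))
```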

Using BGP peer groups

BGP peer groups are primarily designed to ease BGP neighbor configuration. However, peer groups also provide a slight performance boost. Peer groups allow you to group common neighbor parameters under a peer group name. This is useful if you have many BGP neighbors with similar parameters. You can then assign all the neighbors to a common peer group rather than assigning all the neighbor parameters individually. The syntax to create a peer group is as follows:

Peer group creation

Router(config-router)# neighbor peer_group_name peer-group

Router(config-router)# neighbor peer_group_name (assign parameters to the peer group such as remote-as, route-map, filter-list, etc…)

Assigning peer groups

Router(config-router)# neighbor ip_address peer-group peer_group_name

Configuring Route Dampening

Because the Internet is such a large entity, the probability for routing table changes is extremely high. At any given time of day or night, there are routes being added and removed from the BGP routing table. When a router connected to the Internet is failing, a common symptom is the connection going up and dropping continuously. Administrators commonly refer to this as route flapping. Uninhibited route flapping can cause constant, worldwide BGP routing table changes, thus decreasing Internet performance.

Route dampening is a method that allows a service provider to detect flapping routes and suppress them. This keeps a route that could potentially flap for hours or even days from propagating across the Internet. The architecture of route dampening is fairly easy to understand. When a route flaps (goes down and back up), the service provider assigns that route a penalty. After a route has been assigned too many penalties, the service provider suppresses the route and no longer advertises it for a certain amount of time.

Before you can understand the configuration of route dampening, you must understand the terminology:

Suppress Limit – The penalty limit at which a route is suppressed. Once a route reaches this limit, it is no longer advertised.

Reuse Limit – The point at which the route is re-advertised to the Internet. Once the penalty assigned to a suppressed route decays below this amount, the service provider will re-advertise the route. (In addition, the service provider erases all penalties assigned to a route once the penalty drops below half of the reuse limit.)

Maximum Suppress Limit – The maximum amount of time the service provider will suppress a route.

Now that you understand the foundation terms, here is the syntax to configure route dampening:

Router(config-router)# bgp dampening [half-life reuse suppress max-suppress-time]

half-life – How long before the service provider reduces the penalty of a route by half

reuse – The penalty value at which a route is reused

suppress – The penalty value at which a route is suppressed

max-suppress-time – The maximum amount of time a route can be suppressed
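The way the half-life and reuse limit interact can be sketched with a simple exponential decay model. The function names are hypothetical, and the default values used here (half-life of 15 minutes, reuse limit of 750) are assumptions based on the commonly documented Cisco IOS defaults; check your platform before relying on them.

```python
import math

def decayed_penalty(penalty, minutes, half_life=15):
    """Exponential decay: the penalty halves every half_life minutes."""
    return penalty * 0.5 ** (minutes / half_life)

def minutes_until_reuse(penalty, reuse=750, half_life=15):
    """Minutes until the penalty decays to the reuse limit (0 if already there)."""
    if penalty <= reuse:
        return 0.0
    return half_life * math.log2(penalty / reuse)

print(decayed_penalty(2000, 15))   # one half-life later
print(minutes_until_reuse(3000))   # time for a penalty of 3000 to reach 750
```

In this model a route is never suppressed longer than max-suppress-time, so the real suppression time is the smaller of the two values.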

Scaling Service Provider Networks

Here are some guidelines for scaling service provider networks:

* BGP carries customer and provider routes
* IGPs carry only internal routes used to supply routers with an understanding of the next-hop-IP. This may include loopback IPs for IBGP neighborships.
* Do not redistribute BGP into your IGP
* IBGP does not scale well as a full mesh, and creates too much update traffic
* Use route-summarization whenever possible

Route Reflectors overcome the full mesh requirement of IBGP neighborship.
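The scaling difference can be made concrete by counting sessions. This sketch assumes every client peers with every route reflector and that the reflectors are fully meshed among themselves; the function names are hypothetical.

```python
def full_mesh_sessions(n):
    """Number of iBGP sessions in a full mesh of n routers: n(n-1)/2."""
    return n * (n - 1) // 2

def route_reflector_sessions(n, reflectors=1):
    """Sessions when n routers include `reflectors` route reflectors.

    Assumes each client peers with every reflector and the reflectors
    are fully meshed with each other.
    """
    clients = n - reflectors
    return clients * reflectors + full_mesh_sessions(reflectors)

print(full_mesh_sessions(100))
print(route_reflector_sessions(100, reflectors=2))
```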

Here is how a route reflector will behave.

* When a router receives an update from an external peer, it will propagate that advertisement to all peers (eBGP and iBGP).
* When a router receives an update from a non-client internal peer, if it is a route reflector, it will propagate that advertisement to all clients and eBGP peers.
* When a route reflector receives an update from a client, it will be reflected to all iBGP peers.
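The three rules can be summarized as a small lookup function. The peer-type strings and the function name are hypothetical. Including eBGP peers in the client case is an assumption based on normal BGP behavior, since the rule above only mentions iBGP peers.

```python
ALL_PEERS = {"ebgp", "client", "non-client"}

def reflect_to(source):
    """Return the peer types a route reflector sends an update to.

    source is where the update was learned: "ebgp", "client", or "non-client".
    """
    if source == "ebgp":
        return set(ALL_PEERS)        # external routes go to everyone
    if source == "non-client":
        return {"ebgp", "client"}    # not passed on to other non-clients
    if source == "client":
        return set(ALL_PEERS)        # reflected to clients and non-clients
    raise ValueError(f"unknown source: {source}")

print(sorted(reflect_to("non-client")))
```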

Route-reflectors may be single points of failure unless clusters are used. Clusters allow for redundancy without problems such as routing loops.

A hierarchy of route-reflectors may be used to overcome scaling limits in very large autonomous systems.

Confederations allow a large autonomous system to be carved up into smaller member autonomous systems, each with its own AS number. To the outside world, the autonomous systems participating in the BGP confederation are seen as a single AS. This can improve scalability by reducing the number of iBGP peerings required.

An iBGP full mesh is needed within each member autonomous system. eBGP neighborships can be used in any manner to provide connectivity between all participating member autonomous systems.

Important Commands:

bgp cluster-id cluster-id – Configures the route reflector cluster ID

neighbor ip-address route-reflector-client – Informs a route reflector of its clients

router bgp member-as-number – Configures the member-AS of a router within a confederation

bgp confederation identifier external-as-number – Configures the external AS

bgp confederation peers list-of-intra-confederation-as – Informs an intermember EBGP speaker in a confederation of the other member-autonomous systems participating in the confederation
