BFD Basics Explained or: How I Learned the BFD with BFD

Bidirectional Forwarding Detection (BFD) is a standards-based protocol defined in RFC 5880 whose sole purpose is to detect communication failures between two devices quickly and efficiently. BFD only detects the failure; any BFD-capable protocol (IS-IS, OSPF, BGP, etc.) can consume this information and make its own decisions independently.

Since BFD is responsible for speedy failure detection, the routing protocols themselves do not have to use aggressive hello/keepalive timers, which can put a considerable burden on a device’s CPU.

BFD’s Purpose

BFD’s purpose and function are deliberately narrow, which allows it to be lightweight.

Things BFD does:

It performs fast failure detection in a lightweight manner
If possible, it does not interrupt the CPU for detection
It forwards this information to its client protocols for them to react

Things BFD does not do:

Does not discover neighbors
- It depends on the routing protocols for this data
- The routing protocol requesting BFD is responsible for providing this data
Does not work in point-to-multipoint/multipoint-to-multipoint fashion
- This flows from the second sub-bullet above
- Since the routing protocol provides neighbor unicast IP information to BFD, it simply reduces a P2MP/MP2MP topology to a collection of P2P links
- Consequently, BFD packets always have unicast destinations
Does not run unless requested by another protocol
- BFD has no purpose without a requesting protocol

Once a failure is detected, BFD informs its clients (usually routing protocols) so that they can respond to the event.
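The division of labor described above can be pictured as a small registration-and-callback arrangement. The sketch below is only an illustration: `BfdRegistry`, `request_session`, `report_failure` and the 192.0.2.2 address are hypothetical names chosen for this example, not any vendor's or library's API.

```python
from typing import Callable, Dict, List

# Hypothetical names throughout; this is not a real BFD implementation.
Callback = Callable[[str], None]


class BfdRegistry:
    """Tracks which client protocols asked BFD to monitor which peer."""

    def __init__(self) -> None:
        self._clients: Dict[str, List[Callback]] = {}

    def request_session(self, peer_ip: str, on_down: Callback) -> None:
        # A session exists only because a client supplied the peer address;
        # BFD does no neighbor discovery of its own.
        self._clients.setdefault(peer_ip, []).append(on_down)

    def has_session(self, peer_ip: str) -> bool:
        return peer_ip in self._clients

    def report_failure(self, peer_ip: str) -> None:
        # BFD only reports the failure; each client decides how to react.
        for on_down in self._clients.get(peer_ip, []):
            on_down(peer_ip)


events: List[str] = []
registry = BfdRegistry()
registry.request_session("192.0.2.2", lambda ip: events.append(f"OSPF: neighbor {ip} down"))
registry.request_session("192.0.2.2", lambda ip: events.append(f"BGP: peer {ip} down"))
registry.report_failure("192.0.2.2")
print(events)
```

Both clients hear about the same failure, and a peer that no protocol has asked about never gets a session at all.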

BFD Modes

RFC 5880 describes two BFD modes:

Asynchronous mode
Demand mode (not supported by most major vendors)

Note: This post discusses only Asynchronous mode, which is by far the most widely deployed mode.

In addition, either of the above modes can be further optimized with the Echo function. It is important to note that the Echo function is not a mode but, as the RFC describes it, an “adjunct to both modes”. In other words, the Echo function can be implemented on top of a base BFD mode but cannot function independently.

Asynchronous Mode

asynchronous
adjective asyn·chro·nous, (ˌ)ā-ˈsiŋ-krə-nəs , -ˈsin-

1 not simultaneous or concurrent in time; not synchronous
2 of, used in, or being digital communication (as between computers) in which there is no timing requirement for transmission and in which the start of each character is individually signaled by the transmitting device

In asynchronous mode, true to the word’s definition, the BFD peers simply send packets to each other at an agreed-upon interval (details follow), and if enough consecutive packets go missing, the neighbor is declared down. Each peer makes this decision independently. Furthermore, each peer can use different values for the timers associated with the BFD peering.
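A minimal sketch of that receive-side logic, assuming the peer's transmit interval and multiplier are already known: every received packet pushes a deadline forward, and the peer is declared down if the deadline passes in silence. The function name `declared_down_at` and its parameters are made up for this illustration.

```python
from typing import List, Optional


def declared_down_at(arrivals_ms: List[float], now_ms: float,
                     interval_ms: float, detect_mult: int) -> Optional[float]:
    """Return the time the peer would be declared down, or None if still up.

    arrivals_ms: sorted receive timestamps of the peer's BFD packets.
    interval_ms: interval at which the peer is expected to transmit.
    detect_mult: number of intervals of silence that is tolerated.
    """
    detection_time_ms = interval_ms * detect_mult
    deadline: Optional[float] = None
    for arrival in arrivals_ms:
        if deadline is not None and arrival > deadline:
            return deadline  # the timer expired before this packet arrived
        deadline = arrival + detection_time_ms  # each packet restarts the timer
    if deadline is not None and now_ms > deadline:
        return deadline
    return None


# One lost packet (the one due at 100 ms) is tolerated.
print(declared_down_at([0, 50, 150, 200], now_ms=250, interval_ms=50, detect_mult=3))
# Silence after 150 ms: the timer expires at 150 + 3 * 50 = 300 ms.
print(declared_down_at([0, 50, 100, 150], now_ms=400, interval_ms=50, detect_mult=3))
```

The first call prints `None` and the second prints `300`. Nothing here requires the two peers to share a clock or to use the same values, which is what makes the mode asynchronous.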

To understand Asynchronous mode, one must understand what information the peers share with each other and how it is exchanged. The peers exchange information with each other within BFD control packets.

BFD Control Packet

         0                   1                   2                   3
         0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
        +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
        |Vers |  Diag   |Sta|P|F|C|A|D|M|  Detect Mult  |    Length     |
        +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
        |                       My Discriminator                        |
        +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
        |                      Your Discriminator                       |
        +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
        |                    Desired Min TX Interval                    |
        +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
        |                   Required Min RX Interval                    |
        +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
        |                 Required Min Echo RX Interval                 |
        +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

Fields of the control packet (only the important ones, described briefly; refer to RFC 5880 for further details):

Discriminators – Differentiate between multiple BFD sessions that may exist between the two neighbors
Desired Min TX Interval – The minimum interval at which the sender would like to transmit BFD packets (encoded in microseconds in the packet; this post quotes values in milliseconds)
Required Min RX Interval – The minimum interval between received BFD packets that the sender can support as a receiver (also encoded in microseconds)
- A peer sending BFD packets to this device must wait at least this long between transmissions
- This value essentially tells the peer the maximum rate at which the local device can accept packets
Required Min Echo RX Interval – Discussed in the Echo function post (second in this series)
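To make the layout concrete, here is a short sketch that unpacks the 24-byte mandatory section shown in the diagram using Python's standard `struct` module. The `BfdControl` tuple and `parse_bfd_control` function are names invented for this example, and the optional authentication section is ignored.

```python
import struct
from typing import NamedTuple


class BfdControl(NamedTuple):
    version: int
    diag: int
    state: int          # 0 AdminDown, 1 Down, 2 Init, 3 Up
    flags: int          # the P, F, C, A, D, M bits, left as a 6-bit value
    detect_mult: int
    length: int
    my_discriminator: int
    your_discriminator: int
    desired_min_tx_us: int        # RFC 5880 encodes intervals in microseconds
    required_min_rx_us: int
    required_min_echo_rx_us: int


def parse_bfd_control(data: bytes) -> BfdControl:
    if len(data) < 24:
        raise ValueError("the mandatory section of a BFD control packet is 24 bytes")
    # Network byte order: four single bytes followed by five 32-bit words.
    b0, b1, mult, length, my_d, your_d, tx, rx, echo = struct.unpack("!BBBBIIIII", data[:24])
    return BfdControl(
        version=b0 >> 5,     # top 3 bits of the first byte
        diag=b0 & 0x1F,      # low 5 bits of the first byte
        state=b1 >> 6,       # top 2 bits of the second byte
        flags=b1 & 0x3F,     # remaining 6 bits
        detect_mult=mult,
        length=length,
        my_discriminator=my_d,
        your_discriminator=your_d,
        desired_min_tx_us=tx,
        required_min_rx_us=rx,
        required_min_echo_rx_us=echo,
    )


# A hand-built sample: version 1, state Up, multiplier 3, 50 ms timers.
sample = struct.pack("!BBBBIIIII", 0x20, 0xC0, 3, 24, 1, 2, 50_000, 50_000, 0)
packet = parse_bfd_control(sample)
print(packet.state, packet.detect_mult, packet.desired_min_tx_us // 1000, "ms")
```

The sample bytes are fabricated for the example rather than captured from a real device.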

The two peers can examine each other’s packets to deduce how frequently they can send BFD packets to their neighbor. In principle, each device sends packets as fast as both its own and its peer’s capabilities allow.

Although it may not be apparent at first sight, there is a reason for each timer field. This post illustrates this with three examples that use actual numbers to walk through the timer negotiation. (Note: For those familiar with BFD, the following examples do not use the Echo function.)

BFD Timers Example 1

Consider two routers – R1 is connected to R2. They are routing peers and negotiating BFD. Currently, the timer values they advertise in their packets are as follows:

Device	Interval	Min_Rx	Multiplier
R1	50 ms	50 ms	3
R2	50 ms	50 ms	3

Since they are using the same timer settings, explaining R1’s timers explains R2’s timers as well. R1 essentially reports that:

It would like to send a BFD packet every 50 ms
It can receive a BFD packet every 50 ms
If a receiver misses 3 consecutive BFD packets from R1, the peering should be considered down

R2 receives this packet and compares R1’s Min_Rx with its own locally configured interval. Remember that BFD is designed to operate as fast as the peering will allow. R2 can send a packet every 50 ms and R1 can receive a packet every 50 ms. Thus, it is a simple decision for R2: it sends packets every 50 ms.

Note: By the same logic, R1 will also send packets every 50 ms.

Finally, if either R1 or R2 misses 3 consecutive packets from its peer, it can independently declare the session down and reflect that in the Sta and Diag fields of the packets it sends (shown above).
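The arithmetic of this example fits in two one-line functions. This is a sketch with made-up function names; the rule it encodes (send at the slower of your own desired interval and the peer's Min_Rx, and time out after the multiplier's worth of intervals) follows the description above.

```python
def negotiated_tx_ms(local_desired_tx_ms: int, remote_min_rx_ms: int) -> int:
    # A sender may not transmit faster than the receiver is willing to accept.
    return max(local_desired_tx_ms, remote_min_rx_ms)


def detection_time_ms(sender_tx_ms: int, sender_multiplier: int) -> int:
    # The receiver gives up after this much silence from the sender.
    return sender_tx_ms * sender_multiplier


# Example 1: both routers advertise Interval 50 ms, Min_Rx 50 ms, Multiplier 3.
r2_tx = negotiated_tx_ms(local_desired_tx_ms=50, remote_min_rx_ms=50)
print("R2 sends every", r2_tx, "ms")
print("R1 declares R2 down after", detection_time_ms(r2_tx, 3), "ms of silence")
```

With these values R2 sends every 50 ms, and R1 waits 150 ms before declaring the session down.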

BFD Timers Example 2

Moving beyond that simple example, here is one where the Interval and Min_Rx values differ from each other but both peers use the same settings:

Device	Interval	Min_Rx	Multiplier
R1	50 ms	150 ms	3
R2	50 ms	150 ms	3

This time, R2 is still capable of sending a packet every 50 ms, but R1 is unwilling to receive packets that fast. It can only receive a packet every 150 ms. R2 has no choice but to use the higher value and transmits a packet only every 150 ms. Notice how BFD once again manages to run as fast as the slower peer allows, but no slower. R1 performs exactly the same calculation as R2 and arrives at the same result, 150 ms.

BFD Timers Example 3

One final example, where the Interval and Min_Rx values differ and the two peers use different settings:

Device	Interval	Min_Rx	Multiplier
R1	50 ms	200 ms	3
R2	500 ms	150 ms	3

This time the calculation is different for R1 and R2. R1 compares its preferred interval of 50 ms to R2’s Min_Rx, which is 150 ms. The calculation is the same as in the last example, for the same reasons, so R1 transmits every 150 ms.

R2 is more interesting and involved. First, R2 compares its preferred interval of 500 ms to R1’s Min_Rx, which is 200 ms. R1 can accept packets faster than R2 wants to send them, so R2 simply uses its preferred interval and sends packets every 500 ms. The reason this interval is advertised to the peer in the first place is to allow R1 to perform the same calculation as R2 and arrive at the value R2 will use as its interval: 500 ms. Without this data, R1 would have no way of knowing R2’s chosen interval. Once again, the final piece of information is the multiplier, which tells each device how many consecutive missed packets bring the peering down.
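The asymmetric case is easier to see with both directions computed side by side. The sketch below models each router's advertised values as a small `Timers` record (a name invented for this illustration) and derives the send interval and the resulting detection time for each direction.

```python
from dataclasses import dataclass


@dataclass
class Timers:
    desired_tx_ms: int   # the "Interval" column
    required_rx_ms: int  # the "Min_Rx" column
    detect_mult: int     # the "Multiplier" column


def tx_interval_ms(sender: Timers, receiver: Timers) -> int:
    # The sender may not go faster than the receiver's Min_Rx.
    return max(sender.desired_tx_ms, receiver.required_rx_ms)


def detection_time_ms(sender: Timers, receiver: Timers) -> int:
    # The receiver works out the sender's interval from the advertised values
    # and multiplies it by the sender's multiplier.
    return sender.detect_mult * tx_interval_ms(sender, receiver)


# Example 3 values.
r1 = Timers(desired_tx_ms=50, required_rx_ms=200, detect_mult=3)
r2 = Timers(desired_tx_ms=500, required_rx_ms=150, detect_mult=3)

print("R1 sends every", tx_interval_ms(r1, r2), "ms")
print("R2 sends every", tx_interval_ms(r2, r1), "ms")
print("R2 declares R1 down after", detection_time_ms(r1, r2), "ms of silence")
print("R1 declares R2 down after", detection_time_ms(r2, r1), "ms of silence")
```

R1 ends up sending every 150 ms and R2 every 500 ms, so the two directions also time out differently: R2 gives up on R1 after 450 ms, while R1 waits 1500 ms for R2.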

The next post in this series will tackle the Echo function.