In this fifth article on Anycast DNS, we provide some examples of deploying Anycast using the Border Gateway Protocol (BGP), the core routing protocol of the Internet. While BGP is mostly used by Internet Service Providers (ISPs), it is also used in some larger enterprise environments that must interconnect networks spanning geographical and/or administrative boundaries. Because BGP is a very complex routing protocol, we provide only a basic recipe using Cisco routers and the Quagga host-based routing software. A detailed discussion of the BGP protocol is beyond the scope of this article.

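BGP is an Exterior Gateway Protocol (EGP), which means that it exchanges routing information between Autonomous Systems (AS). This makes BGP quite different from Interior Gateway Protocols (IGPs) such as RIP and OSPF, which route traffic within a single AS. Instead of the distance-vector (RIP) or link-state (OSPF) algorithms those protocols use, BGP uses a path-vector algorithm: each route carries the list of every AS it has passed through (the AS_PATH), which BGP uses both to choose among paths and to detect routing loops.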
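
The loop-prevention side of the path-vector idea can be shown in a few lines. The following is an illustrative sketch only: it models an AS_PATH as a space-separated string of AS numbers (real BGP encodes it as binary path segments), and the function names path_has_loop and prepend_as are hypothetical.

```bash
# Illustrative sketch of path-vector loop prevention (not Quagga or IOS code).
# An AS_PATH is modelled as a space-separated list of AS numbers.

# path_has_loop LOCAL_AS "AS_PATH": succeed if LOCAL_AS already appears in
# the path, in which case a BGP speaker would reject the route.
path_has_loop() {
  local local_as="$1" asn
  for asn in $2; do
    [ "$asn" = "$local_as" ] && return 0
  done
  return 1
}

# prepend_as LOCAL_AS "AS_PATH": print the path as advertised to an EBGP
# peer, with the local AS added to the front.
prepend_as() {
  if [ -n "$2" ]; then
    echo "$1 $2"
  else
    echo "$1"
  fi
}
```

For example, a route originated in AS 64555 and advertised onward by an EBGP speaker in AS 65500 carries the path "65500 64555"; a router in AS 64555 receiving it would find its own AS in the path and reject it.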

Our recipe demonstrates how to configure Quagga to peer with a Cisco router using BGP. Suppose our Anycast design consists of two Autonomous Systems, AS 65500 and AS 64555. AS 64555 will contain our Anycast DNS servers, and we will establish BGP peering between the two ASes, as shown below:

 

[Figure: anycast-dns-bgp-1]

The recipe calls for configuring each Anycast DNS server with two physical network connections on different subnets or VLANs. Two upstream routers, R1-A and R1-B, run BGP and peer with the Anycast DNS server. The Anycast DNS servers in turn run BGP to originate our two Anycast VIPs, 192.168.0.1/32 and 192.168.1.1/32. The configuration is shown in the graphic below:

[Figure: anycast-dns-bgp-2]

We could instead advertise two Anycast VIPs from within the same netblock, 192.168.0.0/24, such as 192.168.0.1/32 and 192.168.0.2/32. This would conserve address space; we use VIPs from different netblocks here simply to illustrate the technique.

Recipe - Multihomed Anycast DNS using BGP

Step 1 - Configure Anycast VIPs on "Server A"

Add the two Anycast VIPs to the host's loopback interface as virtual loopback devices (sub-interfaces) using the following commands:

ifconfig lo:0 192.168.0.1 netmask 255.255.255.255
ifconfig lo:1 192.168.1.1 netmask 255.255.255.255

NOTE: The commands above use Linux syntax. Loopback devices are named differently on Sun Solaris, where the logical loopback interfaces for the two VIPs would be lo0:1 and lo0:2; lo0:0 refers to the primary lo0 interface itself, which carries 127.0.0.1.
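
On many current Linux distributions, ifconfig has been superseded by the iproute2 ip command. The sketch below is a hypothetical helper, vip_commands, that only prints the equivalent ip commands for a list of VIPs, labelling them lo:0, lo:1 and so on to match the aliases above; it does not change any interface itself.

```bash
# Hypothetical helper: print (but do not execute) the iproute2 equivalents
# of the ifconfig commands above, one loopback alias per Anycast VIP.
# iproute2 requires the label to start with the device name, hence "lo:N".
vip_commands() {
  local i=0 vip
  for vip in "$@"; do
    echo "ip addr add ${vip}/32 dev lo label lo:${i}"
    i=$((i + 1))
  done
}
```

After reviewing the output of vip_commands 192.168.0.1 192.168.1.1, it could be piped to sh as root. Addresses added this way, like those added with ifconfig, do not survive a reboot, so they also belong in the distribution's persistent network configuration.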

Step 2 - Configure Zebra (component of Quagga) on "Server A"

The typical location of the zebra configuration file is /etc/quagga/zebra.conf, unless you have built Quagga with non-default file locations. Create the /etc/quagga/zebra.conf file as follows:

!
! Zebra configuration saved from vty
! 2009/06/07 09:49:00
!
hostname server_a
!
password zebra
enable password zebra
!
interface eth0
ip address 10.0.1.10/24
!
interface eth1
ip address 10.0.2.10/24
!
interface lo
!
line vty
!

Once the zebra.conf file is created, start the zebra process and configure it to start automatically at boot time. With zebra running, we can access the running configuration interactively using its vty (telnet port 2601) or vtysh. For usage details, consult the Quagga online documentation at http://www.quagga.net.

Step 3 - Configure BGP on "Server A"

To configure BGP routing on Server A, we run Quagga's BGP routing daemon, bgpd, which is configured through the /etc/quagga/bgpd.conf file as follows:

!
! bgpd configuration saved from vty
!2009/06/13 11:21:42
!
hostname server_a
password zebra
log stdout
!
router bgp 64555
bgp router-id 10.0.3.10
network 192.168.0.1/32
network 192.168.1.1/32
timers bgp 4 16
neighbor 10.0.1.1 remote-as 65500
neighbor 10.0.1.1 next-hop-self
neighbor 10.0.1.1 prefix-list DEFAULT in
neighbor 10.0.1.1 prefix-list ANYCAST out
neighbor 10.0.2.1 remote-as 65500
neighbor 10.0.2.1 next-hop-self
neighbor 10.0.2.1 prefix-list DEFAULT in
neighbor 10.0.2.1 prefix-list ANYCAST out
!
ip prefix-list ANYCAST seq 5 permit 192.168.0.1/32
ip prefix-list ANYCAST seq 10 permit 192.168.1.1/32
ip prefix-list DEFAULT seq 5 permit 0.0.0.0/0
line vty
!

Start the bgpd routing daemon and enable it to start automatically at boot time. As with zebra, the BGP process can be maintained and configured using its vty (telnet port 2605) or vtysh. Only eth0 and eth1 actively participate in BGP in our configuration, and each peers with its upstream BGP neighbor: eth0 with router R1-A (10.0.1.1) and eth1 with router R1-B (10.0.2.1).
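
Before bringing the sessions up, it can be worth confirming that each neighbor address lies in the subnet of the interface that peers with it, since both Cisco IOS and Quagga expect an EBGP neighbor to be directly connected unless ebgp-multihop is configured. The sketch below is a hypothetical pre-flight check (ip_to_int and same_subnet are illustrative names, not Quagga commands) that applies the /24 masks used on eth0 and eth1.

```bash
# Hypothetical pre-flight check, not part of Quagga: are two IPv4 addresses
# in the same subnet for a given prefix length?

# ip_to_int A.B.C.D: print the address as a 32-bit integer
ip_to_int() {
  local old_ifs="$IFS"
  IFS=.
  set -- $1
  IFS="$old_ifs"
  echo $(( ($1 << 24) | ($2 << 16) | ($3 << 8) | $4 ))
}

# same_subnet ADDR1 ADDR2 PREFIXLEN: succeed if both share the network bits
same_subnet() {
  local mask=$(( (0xFFFFFFFF << (32 - $3)) & 0xFFFFFFFF ))
  [ $(( $(ip_to_int "$1") & mask )) -eq $(( $(ip_to_int "$2") & mask )) ]
}
```

For example, same_subnet 10.0.1.10 10.0.1.1 24 succeeds for eth0 and R1-A, while pairing eth0 (10.0.1.10) with R1-B (10.0.2.1) fails.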

In our configuration above, we used some of the more advanced BGP configuration directives. Here is a summary of what some of them do:

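  • "timers bgp 4 16" - sets the BGP keepalive and hold timers. On Cisco routers these default to 60 and 180 seconds, respectively. With this setting, a keepalive is sent every 4 seconds, and the peer is declared dead if nothing is heard from it for 16 seconds. The two peers use the lower of their configured hold times, so both ends should be set consistently.
  • "neighbor 10.0.1.1 next-hop-self" - makes this router advertise its own address as the next hop for routes sent to this neighbor. On an EBGP session such as this one, the next hop is normally set to the local address anyway, so the directive matters mostly for IBGP peers.
  • "neighbor 10.0.1.1 prefix-list DEFAULT in" - filters routes received from this neighbor through the prefix-list "DEFAULT", so that only the default route (0.0.0.0/0) is accepted; every other prefix is dropped by the prefix-list's implicit deny.
  • "neighbor 10.0.1.1 prefix-list ANYCAST out" - filters routes sent to this neighbor through the prefix-list "ANYCAST", so that only our two Anycast VIPs are advertised to the upstream peer.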
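
The prefix-list filtering described above can be modelled in a few lines. This sketch is illustrative, not Quagga's implementation: the helper names are hypothetical, and it covers only permit entries without ge/le, which is all this recipe uses. Such an entry matches a route only if the prefix length is identical and the network bits agree; anything that matches no entry falls through to the implicit deny.

```bash
# Illustrative model of "ip prefix-list" matching (not Quagga source).
# An entry without ge/le matches only a route with the same prefix length
# and the same network bits; a route that matches no entry is denied.
# Only permit entries are modelled, which is all this recipe uses.

ip_to_int() {
  local old_ifs="$IFS"
  IFS=.
  set -- $1
  IFS="$old_ifs"
  echo $(( ($1 << 24) | ($2 << 16) | ($3 << 8) | $4 ))
}

# prefix_matches ROUTE ENTRY, both written as a.b.c.d/len
prefix_matches() {
  local r_net="${1%/*}" r_len="${1#*/}" e_net="${2%/*}" e_len="${2#*/}" mask
  [ "$r_len" -eq "$e_len" ] || return 1
  mask=$(( (0xFFFFFFFF << (32 - e_len)) & 0xFFFFFFFF ))
  [ $(( $(ip_to_int "$r_net") & mask )) -eq $(( $(ip_to_int "$e_net") & mask )) ]
}

# prefix_list_permits ROUTE ENTRY...: succeed if any permit entry matches
prefix_list_permits() {
  local route="$1" entry
  shift
  for entry in "$@"; do
    prefix_matches "$route" "$entry" && return 0
  done
  return 1   # implicit deny at the end of every prefix-list
}
```

With the DEFAULT list, 0.0.0.0/0 is accepted but a route such as 10.0.0.0/8 is not. An entry such as permit 0.0.0.0/0 le 32 would, by contrast, match every IPv4 route.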

Step 4 - Configure the "Server A" upstream routers R1-A and R1-B with BGP

The following Cisco configuration was applied to the upstream router R1-A:

interface FastEthernet0/0
description link to BGP AS 65500
ip address 192.168.2.31 255.255.255.0
!
interface FastEthernet0/1
description link to BGP AS 64555
ip address 10.0.1.1 255.255.255.0
!
router bgp 65500
bgp log-neighbor-changes
network 10.0.1.0 mask 255.255.255.0
network 192.168.2.0
network 0.0.0.0
timers bgp 4 16
neighbor 10.0.1.10 remote-as 64555
neighbor 10.0.1.10 next-hop-self
maximum-paths 4

Apply a similar configuration to router R1-B:

interface FastEthernet0/0
description link to BGP AS 65500
ip address 192.168.2.32 255.255.255.0
!
interface FastEthernet0/1
description link to BGP AS 64555
ip address 10.0.2.1 255.255.255.0
!
router bgp 65500
bgp log-neighbor-changes
network 10.0.2.0 mask 255.255.255.0
network 192.168.2.0
network 0.0.0.0
timers bgp 4 16
neighbor 10.0.2.10 remote-as 64555
neighbor 10.0.2.10 next-hop-self
maximum-paths 4

At this point, BGP routing should be operational, and our Anycast VIPs should be advertised.

Step 5 - Create Failover Mechanism

If the DNS server process on "Server A" or "Server B" fails, the Anycast VIPs should be removed from the routing table. To do that, we must stop advertising the routes at their point of origin. A small script can do this by performing cursory checks on the health of the DNS server and its ability to answer queries: as soon as a query fails, the script shuts down the routing daemon(s), which withdraws the routes. The following is an example of what such a script might look like:

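#!/bin/bash
# Query the local name server through the Anycast VIP. This assumes named
# serves a "localhost." zone that resolves to 127.0.0.1.
# +time=2 +tries=1 make dig give up after about 2 seconds instead of retrying.
DNSUP=$(/usr/sbin/dig @192.168.0.1 localhost. A +short +time=2 +tries=1)

if [ "$DNSUP" != "127.0.0.1" ]; then
    echo "Stopping Anycast...."
    # Stopping bgpd tears down the BGP sessions, so the upstream routers
    # withdraw both Anycast VIPs.
    /etc/init.d/bgpd stop
    /etc/init.d/zebra stop
    /etc/init.d/named stop
else
    echo "Everything's good... Do nothing..."
fi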

Schedule the script to run periodically, for example from cron, so that a failed DNS server is detected and its routes are withdrawn quickly.
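
cron cannot run a job more often than once per minute, so detecting a failure within seconds needs a small loop. The wrapper below is a hypothetical sketch (run_checks is an illustrative name): cron would start it once a minute, and it would run the health-check script COUNT times, INTERVAL seconds apart.

```bash
# Hypothetical wrapper for sub-minute health checks. cron starts this once
# per minute; it runs the given command COUNT times, INTERVAL seconds apart.
# run_checks COUNT INTERVAL COMMAND [ARGS...]
run_checks() {
  local count="$1" interval="$2" i=1
  shift 2
  while [ "$i" -le "$count" ]; do
    "$@"
    # Sleep between runs, but not after the last one.
    [ "$i" -lt "$count" ] && sleep "$interval"
    i=$((i + 1))
  done
  return 0
}
```

For example, a script started by a "* * * * *" crontab entry could call run_checks 5 10 followed by the path to the health-check script. Keep COUNT times INTERVAL, plus the worst-case duration of each check, below 60 seconds so that consecutive runs do not overlap.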

Step 6 - Repeat Steps 1-5 for all other Anycast servers in this Anycast group, substituting each server's own interface addresses, neighbor addresses, and BGP router ID.

Key BGP Troubleshooting Commands

BGP is a complex routing protocol to deploy and maintain, especially in larger enterprise network environments. Considerable planning is needed to achieve an efficient routing architecture that provides high availability and fast convergence. As you work with BGP, you will rely on a range of tools for troubleshooting and validating your BGP-routed network. Here are some Cisco IOS commands used in configuring and troubleshooting BGP:

show ip bgp summary - shows BGP neighbors in summary mode

R1-A# show ip bgp summary
BGP router identifier 192.168.2.31, local AS number 65500
BGP table version is 1, main routing table version 1
6 network entries using 582 bytes of memory
6 path entries using 216 bytes of memory
2 BGP path attribute entries using 120 bytes of memory
1 BGP AS-PATH entries using 24 bytes of memory
0 BGP route-map cache entries using 0 bytes of memory
0 BGP filter-list cache entries using 0 bytes of memory
BGP using 942 total bytes of memory
BGP activity 6/0 prefixes, 6/0 paths, scan interval 60 secs

Neighbor        V    AS MsgRcvd MsgSent   TblVer  InQ OutQ Up/Down  State/PfxRcd
10.0.1.10       4 64555       4       3        0    0    0 00:00:02        2
10.0.2.10       4 64555       3       3        0    0    0 00:00:00        0

The output shown above by show ip bgp summary displays a lot of useful information, including R1-A's local router identifier (192.168.2.31), the local AS (65500), and the BGP table version (1). An increasing version number indicates that the network is changing; if no changes occur, the number stays the same. The output also shows six network entries on R1-A, using 582 bytes of memory. Memory matters in BGP because in a large network, such as the Internet, it can be a limiting factor: the more BGP entries populate the routing table, the more memory is required. Finally, the output lists two configured remote peers. Both are EBGP peers, because their AS (64555) differs from the local AS (65500). Once a session is established, the State/PfxRcd column shows how many prefixes have been received from that peer: 10.0.1.10 has sent two, while the session with 10.0.2.10 has only just come up (Up/Down 00:00:00) and has not yet delivered any.
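
Reading the neighbor table can be automated when many peers are involved. The following is an illustrative sketch, not a Cisco tool: classify_neighbors is a hypothetical name, and it assumes the IOS column layout shown above, with the remote AS in the third column and State/PfxRcd in the last.

```bash
# Illustrative parser for the neighbor table printed by "show ip bgp summary"
# (assumes the IOS column layout shown above; not a Cisco tool).
# Reads the output on stdin; prints: neighbor, eBGP/iBGP, state [prefixes].
classify_neighbors() {
  awk -v local_as="$1" '
    # Neighbor rows start with a dotted-quad IPv4 address.
    $1 ~ /^[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+$/ {
      type = ($3 == local_as) ? "iBGP" : "eBGP"
      # A number in State/PfxRcd means the session is Established and
      # counts received prefixes; otherwise the column holds the state.
      if ($NF ~ /^[0-9]+$/) print $1, type, "Established", $NF
      else print $1, type, $NF
    }
  '
}
```

Fed the summary above with classify_neighbors 65500, this reports both neighbors as eBGP and Established. A multi-word state such as "Idle (Admin)" would be reported by its last word only.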

show ip bgp - displays the BGP topology table

R1-A# show ip bgp
BGP table version is 7, local router ID is 192.168.2.31
Status codes: s suppressed, d damped, h history, * valid, > best, i - internal
Origin codes: i - IGP, e - EGP, ? - incomplete

   Network          Next Hop            Metric LocPrf Weight Path
*> 0.0.0.0          192.168.2.1              0         32768 i
*> 10.0.1.0/24      0.0.0.0                  0         32768 i
*> 10.0.2.0/24      0.0.0.0                  0         32768 i
*> 192.168.0.1/32   10.0.2.10                0             0 64555 i
*                   10.0.1.10                0             0 64555 i
*> 192.168.1.1/32   10.0.2.10                0             0 64555 i
*                   10.0.1.10                0             0 64555 i
*> 192.168.2.0      0.0.0.0                  0         32768 i

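The BGP table version is now 7, and the local router ID is 192.168.2.31. Each network is listed with its next-hop address, metric (MED), local preference (LocPrf), weight, and AS path. The status codes on the left mark every entry as valid (*) and flag the best path for each prefix with >; no entry carries the i (internal) status code, because every route was either originated locally or learned from an EBGP peer. The i at the far right of each line is the origin code (i for IGP). Each Anycast VIP appears twice, once via 10.0.2.10 and once via 10.0.1.10, both with the AS path 64555, while the locally originated networks carry the local weight of 32768.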
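
When a prefix has several paths, IOS prints the prefix only on the first line and leaves the Network column blank on the following ones. The sketch below (count_paths is a hypothetical name, not an IOS command) counts the paths per prefix from saved show ip bgp output, assuming the 3-character status column shown above; run against that output, it should report two paths for each Anycast VIP.

```bash
# Illustrative parser (not a Cisco tool): count the paths listed for each
# prefix in "show ip bgp" output read from stdin. Assumes a 3-character
# status column, so the Network column starts at character 4 and is blank
# on lines that list another path to the prefix on the line before.
count_paths() {
  awk '
    /^\*/ {                              # route lines start with "*" (valid)
      if (substr($0, 4, 1) != " ") {     # Network column filled: new prefix
        split(substr($0, 4), f, " ")
        prefix = f[1]
      }
      paths[prefix]++
    }
    END { for (p in paths) print p, paths[p] }
  ' | sort
}
```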

show ip bgp neighbors - displays BGP neighbors in detail

R1-A# show ip bgp neighbors
BGP neighbor is 10.0.1.10,  remote AS 64555, external link
  BGP version 4, remote router ID 10.0.1.10
  BGP state = Established, up for 00:05:07
  Last read 00:00:02, hold time is 16, keepalive interval is 4 seconds
  Configured hold time is 16, keepalive interval is 4 seconds
  Neighbor capabilities:
    Route refresh: advertised and received(old & new)
    Address family IPv4 Unicast: advertised and received
  Message statistics:
    InQ depth is 0
    OutQ depth is 0
                         Sent       Rcvd
    Opens:                  1          1
    Notifications:          0          0
    Updates:                2          1
    Keepalives:            79         63
    Route Refresh:          0          0
    Total:                 82         65
  Default minimum time between advertisement runs is 30 seconds

 For address family: IPv4 Unicast
  BGP table version 7, neighbor version 7
  Index 3, Offset 0, Mask 0x8
                                 Sent       Rcvd
  Prefix activity:               ----       ----
    Prefixes Current:               6          2 (Consumes 72 bytes)
    Prefixes Total:                 6          2
    Implicit Withdraw:              0          0
    Explicit Withdraw:              0          0
    Used as bestpath:             n/a          0
    Used as multipath:            n/a          2

                                   Outbound    Inbound
  Local Policy Denied Prefixes:    --------    -------
    Total:                                0          0
  Number of NLRIs in the update sent: max 4, min 0

  Connections established 1; dropped 0
  Last reset never
Connection state is ESTAB, I/O status: 1, unread input bytes: 0
Local host: 10.0.1.1, Local port: 179
Foreign host: 10.0.1.10, Foreign port: 48101

Enqueued packets for retransmit: 0, input: 0  mis-ordered: 0 (0 bytes)

Event Timers (current time is 0x5E1F8):
Timer          Starts    Wakeups            Next
Retrans            84          0             0x0
TimeWait            0          0             0x0
AckHold            67         64             0x0
SendWnd             0          0             0x0
KeepAlive           0          0             0x0
GiveUp              0          0             0x0
PmtuAger            0          0             0x0
DeadWait            0          0             0x0

iss:  915421219  snduna:  915422937  sndnxt:  915422937     sndwnd:   5840
irs: 4113695520  rcvnxt: 4113696868  rcvwnd:      15037  delrcvwnd:   1347

SRTT: 300 ms, RTTO: 303 ms, RTV: 3 ms, KRTT: 0 ms
minRTT: 0 ms, maxRTT: 300 ms, ACK hold: 200 ms
Flags: passive open, nagle, gen tcbs

Datagrams (max data segment is 1460 bytes):
Rcvd: 152 (out of order: 0), with data: 67, total data bytes: 1347
Sent: 148 (retransmit: 0, fastretransmit: 0), with data: 83, total data bytes: 1717

BGP neighbor is 10.0.2.10,  remote AS 64555, external link
  BGP version 4, remote router ID 10.0.1.10
  BGP state = Established, up for 00:05:19
  Last read 00:00:04, hold time is 16, keepalive interval is 4 seconds
  Configured hold time is 16, keepalive interval is 4 seconds
  Neighbor capabilities:
    Route refresh: advertised and received(old & new)
    Address family IPv4 Unicast: advertised and received
  Message statistics:
    InQ depth is 0
    OutQ depth is 0
                         Sent       Rcvd
    Opens:                  1          1
    Notifications:          0          0
    Updates:                1          1
    Keepalives:            82         65
    Route Refresh:          0          0
    Total:                 84         67
  Default minimum time between advertisement runs is 30 seconds

 For address family: IPv4 Unicast
  BGP table version 7, neighbor version 7
  Index 4, Offset 0, Mask 0x10
                                 Sent       Rcvd
  Prefix activity:               ----       ----
    Prefixes Current:               4          2 (Consumes 72 bytes)
    Prefixes Total:                 4          2
    Implicit Withdraw:              0          0
    Explicit Withdraw:              0          0
    Used as bestpath:             n/a          2
    Used as multipath:            n/a          2

                                   Outbound    Inbound
  Local Policy Denied Prefixes:    --------    -------
    Bestpath from this peer:              2        n/a
    Total:                                2          0
  Number of NLRIs in the update sent: max 4, min 0

  Connections established 1; dropped 0
  Last reset never
Connection state is ESTAB, I/O status: 1, unread input bytes: 0
Local host: 10.0.2.1, Local port: 179
Foreign host: 10.0.2.10, Foreign port: 39231

Enqueued packets for retransmit: 0, input: 0  mis-ordered: 0 (0 bytes)

Event Timers (current time is 0x60E88):
Timer          Starts    Wakeups            Next
Retrans            88          0             0x0
TimeWait            0          0             0x0
AckHold            69         51             0x0
SendWnd             0          0             0x0
KeepAlive           0          0             0x0
GiveUp              0          0             0x0
PmtuAger            0          0             0x0
DeadWait            0          0             0x0

iss: 2991828195  snduna: 2991829917  sndnxt: 2991829917     sndwnd:   5840
irs: 4144867550  rcvnxt: 4144868936  rcvwnd:      14999  delrcvwnd:   1385

SRTT: 300 ms, RTTO: 303 ms, RTV: 3 ms, KRTT: 0 ms
minRTT: 0 ms, maxRTT: 300 ms, ACK hold: 200 ms
Flags: passive open, nagle, gen tcbs

Datagrams (max data segment is 1460 bytes):
Rcvd: 157 (out of order: 0), with data: 69, total data bytes: 1385
Sent: 139 (retransmit: 0, fastretransmit: 0), with data: 87, total data bytes: 1721

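The output above shows each BGP neighbor in greater detail, including the session state (Established), the hold time and keepalive interval in use (16 and 4 seconds, matching timers bgp 4 16), message and prefix counters, and the underlying TCP connection on port 179.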
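
The hold time shown for each neighbor is the result of a negotiation: per RFC 4271, each side proposes a hold time in its OPEN message and the session uses the smaller value, which must be either 0 (keepalives disabled) or at least 3 seconds. Because both ends here are configured with timers bgp 4 16, the result is 16 seconds; had one side kept the Cisco default of 180, the session would still use 16. The following is an illustrative sketch of that rule, not Quagga or IOS code (the function names are hypothetical).

```bash
# Illustrative sketch of BGP hold-time negotiation (RFC 4271 rules),
# not Quagga or IOS source.

# negotiate_hold_time OURS THEIRS: print the hold time the session uses
negotiate_hold_time() {
  local ours="$1" theirs="$2" hold
  if [ "$ours" -lt "$theirs" ]; then hold="$ours"; else hold="$theirs"; fi
  if [ "$hold" -ne 0 ] && [ "$hold" -lt 3 ]; then
    return 1          # hold times of 1 or 2 seconds must be rejected
  fi
  echo "$hold"
}

# hold_timer_expired HOLD SECONDS_SINCE_LAST_KEEPALIVE_OR_UPDATE
# A hold time of 0 disables the timer, so the session never expires.
hold_timer_expired() {
  [ "$1" -ne 0 ] && [ "$2" -ge "$1" ]
}
```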

This concludes our high-level recipe on using BGP to configure Anycast DNS services. It also marks the final article in the Anycast DNS Recipe Series.

Comments   

#14 Matt 2016-01-15 08:23
How would this work with 1 public /24 for global anycast?
#13 Patrick Piper 2015-08-25 17:34
Quoting DRH:
Quick (hopefully) and naive question - where do prefix-lists like DEFAULT, ANY and ANYCAST come from? Are they automagically set up by the software? Are there any other such special lists that exist? I've seen them used, but not defined...

(BTW, looks like there's a small typo in bgpd.conf above. Shouldn't that be "hostname server_a"?)

Thanks... no, those are ACL names that one must come up with. You supply the prefix ACL name(s). Sorry... Thanks for pointing out the type-o. I think i got it. Cheers
#12 DRH 2015-07-17 15:20
Quick (hopefully) and naive question - where do prefix-lists like DEFAULT, ANY and ANYCAST come from? Are they automagically set up by the software? Are there any other such special lists that exist? I've seen them used, but not defined...

(BTW, looks like there's a small typo in bgpd.conf above. Shouldn't that be "hostname server_a"?)
#11 Patrick 2014-02-06 17:01
In general nice article, however..
In your first diagram you show 2 clouds of AS64555 connecting back to your main network - AS65500, but further in the article you are in fact "deploying" single instance of AS64555 that is dual-homed to AS65500. To this point it all makes sense, but now when you consider "deploying" second instance of AS64555 that connects back to AS65500 somewhere else you are likely to hit a problem, or in fact intended behaviour of the eBGP setup you used - BGP path to only one instance of AS64555 will be marked as "best" and used within's whole AS65500's iBGP cloud.
When I first looked at DNS anycast for my network I have lab'ed it above way and faced this problem. Haven't found solution other then redesigning the setup in one of the two ways:
-make DNS anycast BGP part of iBGP
-use IGP (OSPF/ISIS) for connecting servers and then redistribute to BGP
I was wondering if I'm missing some nice and simple solution?
Rgds
Sergiusz
#10 Patrick Piper 2012-07-24 18:22
Karsen,

First, thanks for such a great question. I must say when i set out to write this series of articles, i really never gave it much thought. I remain convinced by the use of loopbacks for anycast. F-ROOT was done that way, all the reference materials i've ever used, do it that way as well. The IPv6 Anycast RFC even explicitly mandates the use of Loopback interfaces.

The major DNS/DHCP/IPAM product vendors do it this way.

I don't see any PROs in using an alias to a physical interface only CONs as opposed to using Loopback interfaces.

- issues with ARP when you have multiple systems on the same VLAN advertising the same VIP (cannot do this as easily on physical interface w/o disabling ARP)
- administratively it's more complex and has more risk
- cannot segregate the traffic from the physical interface where as loopback offers complete isolation
- situations where routes cannot be withdrawn b/c routing process detects that the interface is still up and running.
#9 Karsen 2012-07-24 18:02
Pros and cons. Well, lets conclude that the loopback solution is kinda less fallible. I shall bookmark your site!

Thank you for your time.

P.S. And hey, please have someone fix that damn name field... :)
#8 Patrick Piper 2012-07-24 09:56
Quoting Karsen:
I won't ifdown the physical interface but the
physical interface alias. In other words, I
will remove the second subnet of the physical
interface, then the dynamic routing protocol
will stop the announcements to the router,
skipping to the next closest anycast domain.

The point is that I can have multiple subnets
to one physical interface and I can just remove
the given subnet. No need to bring down the
whole interface.

({R1}1.1.1.1) -BGP- ({server}eth0-1.1.1.2|second subnet-2.2.2.0/24)

When I remove the second subnet on the server,
the BGP will stop the announcements because
it won't know where to route the packets to.

Got it?

I still maintain it's easier, safer, and more flexible to use loopback interfaces. Suppose you wish to have three physical Anycast DNS servers on the same VLAN segment and be able to advertise the same VIP across all three systems. You would not want these sub-interfaces to perform ARP requests on the network - Duplicate IP Addresses would exist if you used sub-interfaces of a physical. By using loopback interfaces you isolate broadcasts. Yes, you can squelch ARP, but with Loopbacks you don't need to.

I do understand what you are suggesting re: BGP to remove the route. So, the scenario i painted has more to do with other dynamic routing situations. There are known side affects of routes persisting when using physical interface aliases.
#7 Karsen 2012-07-24 09:32
I won't ifdown the physical interface but the
physical interface alias. In other words, I
will remove the second subnet of the physical
interface, then the dynamic routing protocol
will stop the announcements to the router,
skipping to the next closest anycast domain.

The point is that I can have multiple subnets
to one physical interface and I can just remove
the given subnet. No need to bring down the
whole interface.

({R1}1.1.1.1) -BGP- ({server}eth0-1.1.1.2|second subnet-2.2.2.0/24)

When I remove the second subnet on the server,
the BGP will stop the announcements because
it won't know where to route the packets to.

Got it?
#6 Patrick Piper 2012-07-24 09:05
Quoting Karsen:
Patrick,

Thank you for your prompt response, I'm impressed!

Correct. But the point is about the subnet or the ip
address you wan't to connect to, no the interface
itself, right? It sounds to me more like precaution
measure. So, lets see the following brief example
for clarification:

client - {router[2.2.2.2/24 via .1.2/32]} - [eth0-1.1.1.2/32|eth0:0-2.2.2.2/24]server

Remove the route to 2.2.2.0/24 you won't be able to reach
the services listening on 2.2.2.2. Isn't that the same
effect?


When you are using a dynamic routing protocol to remove that route, you won't be able to effectively remove that route as long as the interface is still up and connected. It is much simpler to perform ifdown on a loopback alias than downing a physical interface. So, in your scenario, you would require at least two (2) physical connections to the switch - one that MUST always stay up and another that you could turn down administratively.

Switch ports do cost some money.

But that's the real reason folks use loopback alias - it's easy to down when there's a DNS service failure and you want route withdrawal.
#5 Karsen 2012-07-24 08:12
Patrick,

Thank you for your prompt response, I'm impressed!

Correct. But the point is about the subnet or the ip
address you wan't to connect to, no the interface
itself, right? It sounds to me more like precaution
measure. So, lets see the following brief example
for clarification:

client - {router[2.2.2.2/24 via .1.2/32]} - [eth0-1.1.1.2/32|eth0:0-2.2.2.2/24]server

Remove the route to 2.2.2.0/24 you won't be able to reach
the services listening on 2.2.2.2. Isn't that the same
effect?