Active-Active DCs Networking Problem


Problem

One interesting problem I was asked to solve from a networking perspective was:

How to build an Active-Active DCs (Data Centers)?

How do we ensure that the ingress and egress traffic flows use the most efficient path, especially around host mobility?

Background

Why would an organisation want Active-Active DCs?

From a business point of view, we want to run everything hot and maximise our return on investment. Why should we have resources sitting idle on the off-chance that our primary DC has a catastrophic failure?

From a technical point of view, we want to give our developers the flexibility to host their apps anywhere and not hinder them due to network restrictions. We also want to be able to dynamically migrate our services to provide a better end-user experience.

Solution

Stage 1

So let’s imagine a simple DC topology and cut it down to the bare minimum.

In this scenario, there are two DCs and an HQ (Headquarters) connected together. There is a PC (Personal Computer) in the HQ that needs to talk to two VMs (Virtual Machines) in Sydney DC.

There is routing between all 3 GWs (Gateways) via OSPF. The PC’s subnet is advertised by Brisbane HQ GW and the VMs’ subnet is advertised by Sydney DC GW. Finally, a VLAN provides a bridge domain for Sydney DC GW, VM1 and VM2 via the Sydney DC Switch. Under steady state, everyone is reachable.
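
To make the steady state concrete, here is a minimal Python sketch of the picture above. All device names and addresses are hypothetical, invented purely for illustration.

    import ipaddress

    # Hypothetical addressing for the topology described above
    PC  = ipaddress.ip_address("10.1.0.10")   # Brisbane HQ
    VM1 = ipaddress.ip_address("10.2.0.11")   # Sydney DC, on the shared VLAN
    VM2 = ipaddress.ip_address("10.2.0.12")   # Sydney DC, on the shared VLAN

    # Subnets each gateway advertises into OSPF
    OSPF_ADVERTISEMENTS = {
        "brisbane-hq-gw": ipaddress.ip_network("10.1.0.0/24"),
        "sydney-dc-gw":   ipaddress.ip_network("10.2.0.0/24"),
    }

    def gateway_for(host):
        """Which gateway attracts traffic for this host, according to OSPF?"""
        for gw, subnet in OSPF_ADVERTISEMENTS.items():
            if host in subnet:
                return gw
        return None

    # Under steady state, traffic from the PC to either VM is routed to
    # Sydney DC GW, which then bridges it onto the VLAN towards the VMs.
    print(gateway_for(VM1))  # sydney-dc-gw
    print(gateway_for(VM2))  # sydney-dc-gw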

Stage 2

The company decides it wants to move VM2 to Melbourne DC but keep the IP address the same.

How can we achieve communications between PC and both VMs now?

I feel there are three pieces of this puzzle that need to be solved:

  • Stretching the bridge domain
  • Egress routing/traffic flow
  • Ingress routing/traffic flow

1. Stretch The Bridge Domain

To enable the migration without changing the IP address, we need to stretch the VLAN to Melbourne DC. This can be solved in a number of ways via a DCI (Data Center Interconnect). The chosen solution will depend on complexity, features, cost and personnel skill sets; the overlay option is sketched after the list below.

Some of the popular options are:

  • Dark fibre
  • L2VPN service (AToM or VPLS)
  • Overlay technology (EVPN VXLAN, OTV via Cisco or Logical Switch via NSX)
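
As a flavour of the overlay option, here is a rough Python sketch of the VXLAN idea: the original Ethernet frame is wrapped in an outer IP/UDP header plus a 24-bit VNI, so one bridge domain can ride over any routed DCI. The VTEP addresses and VNI are hypothetical, and this is a conceptual model rather than a wire-accurate encoder.

    VXLAN_UDP_PORT = 4789  # IANA-assigned UDP destination port for VXLAN

    def vxlan_encapsulate(l2_frame: bytes, vni: int,
                          src_vtep: str, dst_vtep: str) -> dict:
        """Wrap an Ethernet frame for transport between two VTEPs."""
        return {
            "outer_ip":  {"src": src_vtep, "dst": dst_vtep},  # routed underlay
            "outer_udp": {"dst_port": VXLAN_UDP_PORT},
            "vni": vni,            # identifies the stretched bridge domain
            "payload": l2_frame,   # the original frame, untouched
        }

    # A frame bridged on the stretched VLAN in Sydney is tunnelled to the
    # Melbourne VTEP inside the same VNI: one L2 segment, two DCs.
    pkt = vxlan_encapsulate(b"<ethernet frame>", vni=10100,
                            src_vtep="192.0.2.1",   # Sydney DC VTEP
                            dst_vtep="192.0.2.2")   # Melbourne DC VTEP
    print(pkt["vni"], pkt["outer_ip"])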

2. Egress Routing/Traffic Flow

At this point, there is communication, but traffic from the PC to VM2 is tromboning via Sydney DC. The egress traffic from VM2 can be influenced to exit out of Melbourne DC in a number of ways; the anycast FHRP option is sketched after the list below.

Some of the popular options are:

  • Filtered FHRP
  • Anycast FHRP
  • Local egress via ESG on NSX
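
To illustrate the anycast FHRP option, here is a minimal sketch: both DC gateways own the same virtual gateway IP and MAC, so a VM's ARP for its default gateway always resolves to the gateway in its own DC, and egress traffic never trombones. Names and addresses below are hypothetical.

    # Both gateways answer for the same virtual gateway address
    VIRTUAL_GW_IP  = "10.2.0.1"
    VIRTUAL_GW_MAC = "00:00:0c:9f:f0:64"

    GATEWAYS = {
        "sydney-dc-gw":    "SYD",
        "melbourne-dc-gw": "MEL",
    }

    def resolve_default_gateway(vm_site: str) -> str:
        """A VM's ARP for 10.2.0.1 is answered by the gateway in its own DC."""
        for name, site in GATEWAYS.items():
            if site == vm_site:
                return name
        raise LookupError(f"no gateway in site {vm_site}")

    print(resolve_default_gateway("SYD"))  # sydney-dc-gw: VM1 egresses locally
    print(resolve_default_gateway("MEL"))  # melbourne-dc-gw: VM2 egresses locally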

3. Ingress Routing/Traffic Flow

This is the difficult aspect that took me a while to solve. How do we influence HQ to send traffic to the closer GW, given that the same subnet will be advertised by both GWs?

At this point, I was considering manipulating the routing table. There are three methods to manipulate a routing table:

  • Manipulate the Administrative Distance

Inject a more believable route via a different routing protocol, such as EIGRP, or a static route. However, this will just move the subnet to Melbourne DC and introduce the complexity of another routing protocol. It will also require a manual process, so it won’t scale.

  • Manipulate the cost in OSPF

Change the cost of the OSPF links. Again, this will just move the subnet to Melbourne DC and requires a manual process, so it won’t scale.

  • Inject a more specific route

Inject a /32 static route on Melbourne DC GW and leave the original subnet on Sydney DC. This is the most plausible option, because the longest matching prefix always wins (see the sketch below); however, it also has scaling issues without the assistance of an orchestrator.
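
Here is a short sketch of why the more specific route works, assuming some hypothetical addressing: route selection prefers the longest matching prefix before administrative distance or metric is even consulted, so the /32 pulls VM2's ingress traffic to Melbourne while the /24 keeps attracting everything else to Sydney.

    import ipaddress

    # (prefix, next_hop, administrative_distance)
    RIB = [
        (ipaddress.ip_network("10.2.0.0/24"),  "sydney-dc-gw",    110),  # OSPF
        (ipaddress.ip_network("10.2.0.12/32"), "melbourne-dc-gw",   1),  # static
    ]

    def best_route(dst: str) -> str:
        """Longest prefix match first, then lowest administrative distance."""
        ip = ipaddress.ip_address(dst)
        matches = [r for r in RIB if ip in r[0]]
        prefix, next_hop, _ = max(matches, key=lambda r: (r[0].prefixlen, -r[2]))
        return next_hop

    print(best_route("10.2.0.12"))  # melbourne-dc-gw: VM2 caught by the /32
    print(best_route("10.2.0.11"))  # sydney-dc-gw: VM1 still via the /24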

Well, how about we use a new protocol?

LISP (Locator/ID Separation Protocol) is a great solution for host mobility. Instead of a host just having an IP address, it also has the concept of an RLOC (Routing Locator), which identifies the gateway the host is attached to. Using this new address space and a mapping system (similar to DNS), gateways are able to query where hosts are and build direct communication with each other.
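
As a toy illustration of the mapping idea (with hypothetical EIDs and RLOCs), the host's identity stays fixed while the mapping system, queried much like DNS, tracks which gateway it currently sits behind:

    # Map server: EID prefix -> RLOC of the gateway it is registered behind
    MAPPING_SYSTEM = {
        "10.2.0.11/32": "203.0.113.1",  # VM1 behind Sydney DC GW
        "10.2.0.12/32": "203.0.113.1",  # VM2 behind Sydney DC GW
    }

    def map_request(eid: str) -> str:
        """An ingress tunnel router asks where an EID currently lives."""
        return MAPPING_SYSTEM[eid]

    # VM2 migrates: Melbourne DC GW re-registers the EID. Nothing changes
    # in the routing protocol, only the mapping entry.
    MAPPING_SYSTEM["10.2.0.12/32"] = "203.0.113.2"  # Melbourne DC GW RLOC

    # HQ's gateway now tunnels PC -> VM2 traffic straight to Melbourne,
    # while PC -> VM1 traffic still lands in Sydney.
    print(map_request("10.2.0.12/32"))  # 203.0.113.2
    print(map_request("10.2.0.11/32"))  # 203.0.113.1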

Conclusion

From a networking point of view, it is possible to provide Active-Active DCs. Using a mixture of LISP, VXLAN and Anycast FHRP, we are able to provide the most efficient connectivity between the PC and VMs.
