Wednesday, July 6, 2022

AWS Gateway Load Balancer (GWLB)

The goal of this post is to test another AWS Cloud Network product: GWLB (Gateway Load Balancer).

AWS GWLB will be tested in a pure AWS scenario, but at the end of this post we will look at how AWS GWLB integrates with Aviatrix Firenet.

NOTE: later in the summer, a specific post on Aviatrix Firenet with AWS GWLB integration will be released. You will then be able to see the magnitude of AWS GWLB & Aviatrix Firenet combined.


WHAT is AWS GWLB?

AWS GWLB enables you to deploy, scale & manage virtual appliances (FW, IDS, IPS & DPI (Deep Packet Inspection) systems). It combines a transparent network GW with a load balancer that distributes traffic & scales your virtual appliances with demand.

AWS GWLB operates at layers 3-4 of the OSI model. It listens for all IP packets across all ports & forwards traffic to the target group specified in the listener rule. It maintains stickiness of flows to a specific target appliance using a 5-tuple (for TCP / UDP flows) or a 3-tuple (for non-TCP / UDP flows). AWS GWLB & its registered virtual appliance instances exchange application traffic using the GENEVE protocol on port UDP/6081.

NOTE: GENEVE is a network encapsulation protocol created by the IETF in order to unify the efforts made by other initiatives like VXLAN & NVGRE.
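To make the encapsulation concrete, here is a minimal Python sketch of the Geneve base header layout as defined in RFC 8926 (the VNI & protocol values below are arbitrary examples, not anything AWS-specific):

```python
import struct

GENEVE_PORT = 6081  # GWLB exchanges traffic with appliances on UDP/6081

def build_geneve_header(vni: int, proto: int = 0x0800, options: bytes = b"") -> bytes:
    """Build an RFC 8926 Geneve base header (illustrative sketch)."""
    assert len(options) % 4 == 0, "options are padded to 4-byte multiples"
    opt_len = len(options) // 4          # option length in 4-byte words
    ver_optlen = (0 << 6) | opt_len      # version 0 + option length
    flags = 0                            # O (control) & C (critical) bits clear
    return struct.pack("!BBH", ver_optlen, flags, proto) + \
           vni.to_bytes(3, "big") + b"\x00" + options

def parse_geneve_header(pkt: bytes):
    """Split a Geneve packet into (protocol, VNI, options, inner packet)."""
    ver_optlen, flags, proto = struct.unpack("!BBH", pkt[:4])
    vni = int.from_bytes(pkt[4:7], "big")
    opt_len = (ver_optlen & 0x3F) * 4
    return proto, vni, pkt[8:8 + opt_len], pkt[8 + opt_len:]

hdr = build_geneve_header(vni=0x1234, proto=0x0800)
proto, vni, options, inner = parse_geneve_header(hdr + b"...inner IPv4 packet...")
print(hex(proto), hex(vni))  # 0x800 0x1234
```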


AWS GWLB High Level




AWS GWLB acts both as an L3 GW (no packet rewrite) and as an L4 Load Balancer (elasticity of the appliances, stickiness of flows in both directions).

It also provides Health Checks: if 1 of the appliances goes down, AWS GWLB reroutes the flows to the remaining healthy appliances.

AWS GWLB Components & Behaviour

Components



  • GWLB is a managed service allowing customers to deploy & manage a fleet of horizontally scalable inline network virtual appliances in a transparent manner. This is the control plane, where configurations & rules reside.
  • GWLBE (GWLB Endpoint) is the data plane component of GWLB, through which the packets are pushed. GWLBE is built on AWS PrivateLink, allowing you to place your service across accounts & VPCs without losing centralized control & administration.
GWLB has 2 sides. The side connecting to the GWLBEs is called the frontend. The side connecting to the appliances is called the backend.

In the backend, GWLB acts as a Load Balancer for routing traffic flows through target appliances. GWLB ensures stickiness of the flows in both directions to target appliances & also reroutes flows if 1 appliance becomes unhealthy. 

Packets sent from source to destination do not contain the GWLB IP as destination IP address but are routed to the GWLB by the Route Tables configuration. To achieve this transparent forwarding behaviour, GWLB encapsulates the original packet using Geneve encapsulation. Appliances then need to decapsulate the Geneve header & parse its TLV (Type-Length-Value) options to process the original packet.
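As an illustration, here is a sketch of walking a Geneve TLV option list in Python. Note that the option class/type numbers below are invented purely for the example; they are not the values AWS actually uses:

```python
import struct

def parse_geneve_options(options: bytes):
    """Walk a Geneve TLV option list into (class, type, value) triples."""
    out = []
    i = 0
    while i < len(options):
        opt_class, opt_type, rsvd_len = struct.unpack("!HBB", options[i:i + 4])
        length = (rsvd_len & 0x1F) * 4   # value length in 4-byte words
        out.append((opt_class, opt_type & 0x7F, options[i + 4:i + 4 + length]))
        i += 4 + length
    return out

def make_option(opt_class: int, opt_type: int, value: bytes) -> bytes:
    """Build one TLV option (value padded to 4-byte multiples by the caller)."""
    assert len(value) % 4 == 0
    return struct.pack("!HBB", opt_class, opt_type, len(value) // 4) + value

# Synthetic option list: GWLB attaches vpce-id, attach-id & a flow cookie as
# TLVs; the 0x0100 class & type numbers here are made up for illustration.
opts = make_option(0x0100, 1, (12345).to_bytes(8, "big")) + \
       make_option(0x0100, 3, (0xCAFE).to_bytes(4, "big"))
for opt_class, opt_type, value in parse_geneve_options(opts):
    print(hex(opt_class), opt_type, value.hex())
```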

Appliance selection by GWLB for a specific flow

  • If TCP/UDP flow, GWLB selects a healthy appliance from a target group based on 5-tuple hash (Source/Destination IP, Protocol, Source/Destination Port)
  • If non TCP/UDP flow, GWLB uses 3-tuple (Source/Destination IP, Protocol)
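The selection logic above can be sketched as follows (a simplified illustration of hash-based stickiness, not AWS's actual hash function):

```python
import hashlib

def select_appliance(appliances, src_ip, dst_ip, proto, src_port=None, dst_port=None):
    """Pick a sticky target: hash the 5-tuple for TCP/UDP, the 3-tuple otherwise."""
    if proto in (6, 17):  # TCP / UDP -> 5-tuple
        key = f"{src_ip}|{dst_ip}|{proto}|{src_port}|{dst_port}"
    else:                 # e.g. ICMP -> 3-tuple
        key = f"{src_ip}|{dst_ip}|{proto}"
    digest = int.from_bytes(hashlib.sha256(key.encode()).digest()[:8], "big")
    return appliances[digest % len(appliances)]

fleet = ["appliance-az1", "appliance-az2"]
a = select_appliance(fleet, "10.0.1.10", "93.184.216.34", 6, 40001, 443)
b = select_appliance(fleet, "10.0.1.10", "93.184.216.34", 6, 40001, 443)
print(a == b)  # True: the same 5-tuple always maps to the same appliance
```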

GWLB & Health Checks

  • TCP, HTTP or HTTPS

Packet walk



Step1: GWLBE receives packets from the source & sends them to GWLB using AWS PrivateLink

Step2: GWLB uses the 3/5-tuple to select an appliance. GWLB then encapsulates the original packet in a Geneve header (outer IPv4 header with Src IP = GWLB IP & Dst IP = Appliance IP) & adds Geneve TLVs (vpce-id, attach-id & flow cookie).

Step3: GWLB forwards the encapsulated packet to the selected appliance.

Step4: The appliance:
    - Decapsulates the packet
    - Reads the TLVs & saves the flow cookie
    - Drops or allows the packet (depending on the appliance's rules)
    - Re-encapsulates the packet & attaches the same TLVs

Step5: Appliance sends the packet back to GWLB

Step6: GWLB removes the Geneve encapsulation, validates the 3/5-tuple & flow cookie & forwards the packet to the GWLBE.

Step7: GWLBE sends the packet back to the source using AWS PrivateLink.
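To make the walk concrete, here is a toy Python simulation of steps 2-6. The Geneve header is reduced to a made-up 8-byte prefix carrying only a flow cookie; this is a sketch of the control flow, not the real encapsulation:

```python
import struct

FLOW_COOKIE = 0xC00C1E  # arbitrary value standing in for GWLB's flow cookie

def gwlb_encapsulate(inner: bytes) -> bytes:
    # Step2: GWLB wraps the original packet & attaches the flow cookie
    return struct.pack("!II", 0x6081, FLOW_COOKIE) + inner

def appliance_process(pkt: bytes, allow):
    # Step4: decapsulate, save the cookie, drop or allow, re-encapsulate
    magic, cookie = struct.unpack("!II", pkt[:8])
    inner = pkt[8:]
    if not allow(inner):
        return None                          # dropped by the appliance rules
    return struct.pack("!II", magic, cookie) + inner

def gwlb_decapsulate(pkt: bytes) -> bytes:
    # Step6: validate the cookie & hand the original packet back to the GWLBE
    _, cookie = struct.unpack("!II", pkt[:8])
    assert cookie == FLOW_COOKIE, "flow cookie mismatch"
    return pkt[8:]

original = b"GET / HTTP/1.1"
forwarded = appliance_process(gwlb_encapsulate(original), allow=lambda p: True)
print(gwlb_decapsulate(forwarded) == original)  # True
```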

GWLB architecture patterns

Please see this link for more details.

  • Single VPC with North / South connectivity: represents a VPC requiring a FW or inline function to be placed between resources inside the VPC & outside the VPC (such as the Internet). This is the scenario tested later, as the focus is on testing GWLB itself rather than complex scenarios.
  • Many VPCs with centralized North / South connectivity across AWS TGW
  • Many VPCs with centralized East / West connectivity across AWS TGW
NOTE: a single VPC requiring East / West inspection between subnets is not supported.

NOTE: VPC Ingress is also supported (see link above).

Best practices for deploying AWS GWLB

Tune TCP-keepalive or timeout values to support long-lived TCP flows

GWLB has a fixed idle timeout of 350 seconds for TCP flows & 120 seconds for non-TCP flows. Once the idle timeout is reached for a flow, it is removed from the GWLB connection state table. Subsequent packets for that flow are then treated as a new flow & may be sent to another healthy instance, which can result in the flow timing out on the client side.

It is recommended to configure the TCP keepalive of the client / server / FW to less than the idle timeout (350 sec for TCP flows, 120 sec for non-TCP flows).
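For example, keepalives can be tuned per socket on a Linux client or appliance (Linux-specific socket options; the values below are illustrative, chosen to stay well under the 350-second timeout):

```python
import socket

# Keep the keepalive interval below GWLB's idle timeouts (350 s TCP,
# 120 s non-TCP) so long-lived flows are never aged out of the state table.
s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
# Linux-specific knobs; values in seconds
s.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE, 60)   # idle time before probing
s.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPINTVL, 60)  # interval between probes
s.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPCNT, 3)     # failed probes before reset
print(s.getsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE))
```

The same values can be set system-wide via the `net.ipv4.tcp_keepalive_*` sysctls instead of per socket.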

Enable 'Appliance mode' on AWS TGW to maintain flow symmetry for E/W inter-VPC traffic inspection

This ensures that the forward & reverse flows of a connection traverse the same appliance.

Understand when to use Cross-Zone Load Balancing



When enabled, GWLB distributes traffic across all registered & healthy targets regardless of which AZs these targets are in.


Choose 1-arm or 2-arm deployment for egress traffic inspection

1-arm mode is the most common deployment method & eliminates the dependency on the appliance supporting NAT. It also increases performance by offloading NAT.

In 2-arm mode, 1 ENI is in a private subnet & the other in a public subnet. This requires support from the vendor.

AWS GWLB vs NLB vs ALB

The table below is a very high-level comparison of the 3 types of Elastic Load Balancers that AWS provides & their specific use cases.


Single Spoke VPC with N/S connectivity (to Internet) configuration



Architecture build

1. Create Security VPC & its 2 associated subnets (in 2 different AZs)

2. Create the appliances in the Security VPC (1 per AZ). Pay special attention to the Security Group: Health Checks (here, HTTP) & Geneve (UDP/6081) must be allowed.


3. Create the Spoke VPC & its 4 associated subnets in 2 different AZs (2 for EC2 & 2 for GWLBE, 1 of each per AZ)

4. Create 5 RTs for the Spoke VPC:
    - 1 Ingress RT, with an 'Edge Association' to the IGW
    - 2 Public GWLBE RTs (1 per GWLBE), each associated with the corresponding subnet
    - 2 Private RTs (1 per Spoke EC2), each associated with the corresponding subnet


5. Create GWLB (EC2 - Load Balancer - GWLB)


6. Basic Network Configuration (mapping with the 2 AZs in Security VPC)


7. Create Target Group (Step1 / Security VPC; Health Check in HTTP)


8. Create Target Group (Step2 / Register Targets, i.e. the 2 appliances created in the Security VPC)


9. Associate Target Group to GWLB

10. Configure the Security appliances to allow GENEVE encapsulation, listen on HTTP (for health checks) & pass the traffic through without inspection, as per link (good luck & Thanks Mihai :-) )

NOTE: we have to do steps 10.1 & 10.2 because our EC2 is not a 3rd-party Security appliance: it does not speak GENEVE, respond on HTTP/80 or inspect traffic.

10.1 Downloading & installing necessary tools & programs


10.2 Geneve encapsulation, Traffic pass-through & Health Checks
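The actual scripts are behind the link in step 10; purely as an illustration, a minimal HTTP responder that would satisfy the health check could look like this (a hypothetical stand-in, not the script used in the lab):

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

class HealthCheckHandler(BaseHTTPRequestHandler):
    """Answer GWLB HTTP health checks with a 200."""

    def do_GET(self):
        # Any 200 response satisfies an HTTP health check with default settings
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.end_headers()
        self.wfile.write(b"healthy\n")

    def log_message(self, *args):
        pass  # keep the appliance console quiet

def serve(port=80):
    """Bind & serve forever; port 80 needs root (or a port redirect)."""
    HTTPServer(("0.0.0.0", port), HealthCheckHandler).serve_forever()
```

On each appliance you would run this (e.g. `sudo python3 healthcheck.py` calling `serve()`), alongside the Geneve pass-through configuration.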


Check the Target appliances answer the Health Checks


11. Create Endpoint Service (no acceptance required for the sake of this test)


12. Deploy Endpoints in the Spoke VPC (search for the Service Name as specified in the screenshot below): 1 GWLBE per AZ


13. Update the Spoke VPC Route Tables (as below)

13.1 Ingress RT to return traffic to Spoke EC2 via GWLBE


13.2 Each Private RT to send traffic to Internet via GWLBE


13.3 Each Public GWLBE RT to send traffic to Internet via IGW


14. Enable Cross Zone Load Balancing (Edit Attributes of GWLB)


Testing

1. Testing in initial mode (Spoke EC2 in AZ1 will talk with 1 of the 2 appliances, here the one in AZ1)


TCPDUMP on Appliance AZ1


2. Testing in failure mode & validate 'cross zone load balancing' - Stop Appliance in AZ1


TCPDUMP on Appliance AZ2 (validating it takes over the traffic & validating then 'cross zone load balancing')


AVIATRIX & AWS GWLB



Definition


Aviatrix Firenet allows you to seamlessly deploy, scale, service-chain & operate NG Firewalls & Security appliances (Check Point, F5, Fortinet, Palo Alto, etc.). Aviatrix Firenet connects virtual appliances to the Cloud Transit Network. Traffic is routed through these security appliances for inspection to enforce corporate & regulatory security policies.

Aviatrix Firenet leverages AWS GWLB & GWLBE (endpoints) to scale & manage these appliances that support GENEVE encapsulation.

Advantages


  • GWLB handles symmetric hashing, connection draining & failover
  • High performance connection to virtual appliances
  • No need for SNAT
  • Allows customers to remove / add appliances for scaling or in response to health checks without impacting existing sessions
  • Aviatrix Controller automates attachment of AWS GWLB / GWLBE & all connected appliances
  • Automation of the propagation of routes to apps (i.e. updates of VPC RTs to direct traffic through GWLB & on to the appropriate connected FW appliance)

NOTE: if you can't wait for my post about Aviatrix implementation with AWS GWLB & Aviatrix Firenet, please check out this link.
