Un brin de réseau: L2 over EVPN/MPLS

L2 over EVPN/MPLS Lab

Here we test the implementation of L2 services over EVPN/MPLS. This is not an in-depth conceptual study, just a simple lab. Using configs and captures, we will nonetheless tie what we observe to the relevant passages of the RFCs.

Article outline

The lab topology
Backbone configuration
Configuring a VPWS
Configuring a VPLS
- VLAN-Based
- VLAN-Aware Bundle
Configuring an E-Tree VPLS
- With a set of RTs
- Without a set of RTs (Leaf Indication)
- Ingress vs egress filtering

The lab topology

It is a miniature service provider network (an IP/MPLS backbone in this case):

evpn-mpls-arch: the topology used in GNS3 (with an `RR` to avoid a full iBGP mesh between the PEs).

We will configure:

A VLAN-Based VPWS between CE1 and CE2 in VLAN 10
A VLAN-Based VPLS between CE1, CE2 and CE3 in VLAN 20
A VLAN-Aware Bundle VPLS between CE1, CE2 and CE3 in VLANs 20 and 30
An E-Tree topology on both of the VPLS above

In MEF terminology, a VPWS is called E-Line, and a VPLS is called E-LAN or E-Tree depending on the topology. See the article on Carrier Ethernet, since EVPN relies heavily on MEF terminology to specify its services.

The Arista image is available on their website with a free account (at the time of writing). The same goes for the MikroTik image, without an account.

Backbone configuration

Below are the IP, OSPF, MPLS (LDP) and MP-BGP configurations.


    # Arista vEOS 4.32.2F

    !
    hostname PE1
    !
    interface Ethernet1
       no switchport
       ip address 10.0.0.0/31
       ip ospf network point-to-point
       ip ospf area 0.0.0.0
    !
    interface Loopback0
       ip address 1.1.1.1/32
       ip ospf area 0.0.0.0
    !
    ip routing
    !
    mpls ip
    !
    mpls ldp
       router-id interface Loopback0
       no shutdown
    !
    router ospf 1
       router-id 1.1.1.1
    !
    router bgp 100
       router-id 1.1.1.1
       neighbor 9.9.9.9 remote-as 100
       neighbor 9.9.9.9 update-source Loopback0
       neighbor 9.9.9.9 send-community extended
       !
       address-family evpn
          neighbor 9.9.9.9 activate
          neighbor 9.9.9.9 encapsulation mpls next-hop-self source-interface Loopback0
       !
       address-family ipv4
          no neighbor 9.9.9.9 activate
    !


    # Arista vEOS 4.32.2F

    !
    hostname PE2
    !
    interface Ethernet1
       no switchport
       ip address 10.0.0.2/31
       ip ospf network point-to-point
       ip ospf area 0.0.0.0
    !
    interface Loopback0
       ip address 2.2.2.2/32
       ip ospf area 0.0.0.0
    !
    ip routing
    !
    mpls ip
    !
    mpls ldp
       router-id interface Loopback0
       no shutdown
    !
    router ospf 1
       router-id 2.2.2.2
    !
    router bgp 100
       router-id 2.2.2.2
       neighbor 9.9.9.9 remote-as 100
       neighbor 9.9.9.9 update-source Loopback0
       neighbor 9.9.9.9 send-community extended
       !
       address-family evpn
          neighbor 9.9.9.9 activate
          neighbor 9.9.9.9 encapsulation mpls next-hop-self source-interface Loopback0
       !
       address-family ipv4
          no neighbor 9.9.9.9 activate
    !


    # Arista vEOS 4.32.2F

    !
    hostname PE3
    !
    interface Ethernet1
       no switchport
       ip address 10.0.0.4/31
       ip ospf network point-to-point
       ip ospf area 0.0.0.0
    !
    interface Loopback0
       ip address 3.3.3.3/32
       ip ospf area 0.0.0.0
    !
    ip routing
    !
    mpls ip
    !
    mpls ldp
       router-id interface Loopback0
       no shutdown
    !
    router ospf 1
       router-id 3.3.3.3
    !
    router bgp 100
       router-id 3.3.3.3
       neighbor 9.9.9.9 remote-as 100
       neighbor 9.9.9.9 update-source Loopback0
       neighbor 9.9.9.9 send-community extended
       !
       address-family evpn
          neighbor 9.9.9.9 activate
          neighbor 9.9.9.9 encapsulation mpls next-hop-self source-interface Loopback0
       !
       address-family ipv4
          no neighbor 9.9.9.9 activate
    !


    # Arista vEOS 4.32.2F

    !
    hostname RR
    !
    interface Ethernet1
       no switchport
       ip address 10.0.0.6/31
       ip ospf network point-to-point
       ip ospf area 0.0.0.0
    !
    interface Loopback0
       ip address 9.9.9.9/32
       ip ospf area 0.0.0.0
    !
    ip routing
    !
    mpls ip
    !
    mpls ldp
       router-id interface Loopback0
       no shutdown
    !
    router ospf 1
       router-id 9.9.9.9
    !
    router bgp 100
       router-id 9.9.9.9
       neighbor 1.1.1.1 peer group RR-CLIENTS
       neighbor 2.2.2.2 peer group RR-CLIENTS
       neighbor 3.3.3.3 peer group RR-CLIENTS
       !
       neighbor RR-CLIENTS peer group
       neighbor RR-CLIENTS remote-as 100
       neighbor RR-CLIENTS update-source Loopback0
       neighbor RR-CLIENTS route-reflector-client
       neighbor RR-CLIENTS send-community extended
       !
       address-family evpn
          neighbor 1.1.1.1 activate
          neighbor 2.2.2.2 activate
          neighbor 3.3.3.3 activate
          !
          neighbor RR-CLIENTS encapsulation mpls next-hop-self source-interface Loopback0
       !
       address-family ipv4
          no neighbor RR-CLIENTS activate
    !


    # Cisco IOSv 15.9(3)M3

    !
    hostname P
    !
    interface Loopback0
     ip address 8.8.8.8 255.255.255.255
     ip ospf 1 area 0
    !
    mpls label protocol ldp
    mpls ldp router-id Loopback0
    !
    router ospf 1
     router-id 8.8.8.8
    !
    interface GigabitEthernet0/0
     ip address 10.0.0.1 255.255.255.254
     ip ospf network point-to-point
     ip ospf 1 area 0
     mpls ip
    !
    interface GigabitEthernet0/1
     ip address 10.0.0.3 255.255.255.254
     ip ospf network point-to-point
     ip ospf 1 area 0
     mpls ip
    !
    interface GigabitEthernet0/2
     ip address 10.0.0.5 255.255.255.254
     ip ospf network point-to-point
     ip ospf 1 area 0
     mpls ip
    !
    interface GigabitEthernet0/3
     ip address 10.0.0.7 255.255.255.254
     ip ospf network point-to-point
     ip ospf 1 area 0
     mpls ip
    !

By MP-BGP, we more precisely mean BGP with the multiprotocol capability. To quote RFC 7432:


    In order for two BGP speakers to exchange labeled EVPN NLRI, they
    must use BGP Capabilities Advertisements to ensure that they both are
    capable of properly processing such NLRI.  This is done as specified
    in [RFC4760], by using capability code 1 (multiprotocol BGP) with an
    AFI of 25 (L2VPN) and a SAFI of 70 (EVPN).

In a capture:

cap-evpn-mpls-bgp-open: the BGP `OPEN` message sent by `PE1` to the `RR`.
Note the AFI/SAFI pair `25/70` for EVPN.
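To make the AFI/SAFI pair concrete, here is a minimal Python sketch that encodes the Multiprotocol Extensions capability (capability code 1) described in RFC 4760: a 2-octet AFI, a reserved octet, then a 1-octet SAFI. The function name is illustrative, and the sketch builds only this capability TLV, not the surrounding optional parameter or the full `OPEN` message.

```python
import struct

# Values quoted from RFC 7432 / RFC 4760 above
CAP_MULTIPROTOCOL = 1
AFI_L2VPN = 25
SAFI_EVPN = 70


def mp_capability(afi: int, safi: int) -> bytes:
    """Encode a Multiprotocol Extensions capability TLV (RFC 4760)."""
    # Capability value: AFI (2 octets), reserved (1 octet, zero), SAFI (1 octet)
    value = struct.pack("!HBB", afi, 0, safi)
    # Capability TLV: code (1 octet), length (1 octet), then the value
    return struct.pack("!BB", CAP_MULTIPROTOCOL, len(value)) + value


if __name__ == "__main__":
    print(mp_capability(AFI_L2VPN, SAFI_EVPN).hex())  # 010400190046
```

With AFI 25 (0x19) and SAFI 70 (0x46), the six octets `01 04 00 19 00 46` are what to look for inside the capabilities of the `OPEN` message.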

The IP packet in the capture above, like every BGP message in the captures later in this article, should have been carried over MPLS, since the backbone is configured as IP/MPLS. This is either a bug in the version used or Arista-specific behavior. Either way, it does not prevent the EVPN services from working, as we will see.

Configuring a VPWS

Below, we configure a VLAN-Based VPWS between PE1 and PE2 in VLAN 10:


    # Arista vEOS 4.32.2F

    !
    interface Ethernet12
       no switchport
    !
    interface Ethernet12.10
       encapsulation vlan
          client dot1q 10 network client
    !
    patch panel
       patch VPWS-1
          connector 1 interface Ethernet12.10
          connector 2 pseudowire bgp vpws EVI-1 pseudowire PW-1
    !
    router bgp 100
       vpws EVI-1
          rd 1.1.1.1:1
          route-target import export evpn 100:1
          mpls control-word
          !
          pseudowire PW-1
             evpn vpws id local 1001 remote 1002
    !


    # Arista vEOS 4.32.2F

    !
    interface Ethernet12
       no switchport
    !
    interface Ethernet12.10
       encapsulation vlan
          client dot1q 10 network client
    !
    patch panel
       patch VPWS-1
          connector 1 interface Ethernet12.10
          connector 2 pseudowire bgp vpws EVI-1 pseudowire PW-1
    !
    router bgp 100
       vpws EVI-1
          rd 2.2.2.2:1
          route-target import export evpn 100:1
          mpls control-word
          !
          pseudowire PW-1
             evpn vpws id local 1002 remote 1001
    !


    # MikroTik CHR 7.15.3

    /system identity
    set name=CE1
    /interface vlan
    add interface=ether1 name=vlan10 vlan-id=10
    /ip address
    add address=192.168.10.1/24 interface=vlan10


    # MikroTik CHR 7.15.3

    /system identity
    set name=CE2
    /interface vlan
    add interface=ether1 name=vlan10 vlan-id=10
    /ip address
    add address=192.168.10.2/24 interface=vlan10


    # Arista vEOS 4.32.2F

    PE1#sho bgp evpn detail
    BGP routing table information for VRF default
    Router identifier 1.1.1.1, local AS number 100
    BGP routing table entry for auto-discovery 1001 0000:0000:0000:0000:0000, Route Distinguisher: 1.1.1.1:1
     Paths: 1 available
      Local
        - from - (0.0.0.0)
          Origin IGP, metric -, localpref -, weight 0, tag 0, valid, local, best
          Extended Community: Route-Target-AS:100:1 TunnelEncap:tunnelTypeMpls L2 Attributes: control word
          MPLS label: 143788
    BGP routing table entry for auto-discovery 1002 0000:0000:0000:0000:0000, Route Distinguisher: 2.2.2.2:1
     Paths: 1 available
      Local
        2.2.2.2 from 9.9.9.9 (9.9.9.9)
          Origin IGP, metric -, localpref 100, weight 0, tag 0, valid, internal, best
          Originator: 2.2.2.2, Cluster list: 9.9.9.9
          Extended Community: Route-Target-AS:100:1 TunnelEncap:tunnelTypeMpls L2 Attributes: control word
          MPLS label: 256893


    # Arista vEOS 4.32.2F

    PE2#sho bgp evpn detail
    BGP routing table information for VRF default
    Router identifier 2.2.2.2, local AS number 100
    BGP routing table entry for auto-discovery 1001 0000:0000:0000:0000:0000, Route Distinguisher: 1.1.1.1:1
     Paths: 1 available
      Local
        1.1.1.1 from 9.9.9.9 (9.9.9.9)
          Origin IGP, metric -, localpref 100, weight 0, tag 0, valid, internal, best
          Originator: 1.1.1.1, Cluster list: 9.9.9.9
          Extended Community: Route-Target-AS:100:1 TunnelEncap:tunnelTypeMpls L2 Attributes: control word
          MPLS label: 143788
    BGP routing table entry for auto-discovery 1002 0000:0000:0000:0000:0000, Route Distinguisher: 2.2.2.2:1
     Paths: 1 available
      Local
        - from - (0.0.0.0)
          Origin IGP, metric -, localpref -, weight 0, tag 0, valid, local, best
          Extended Community: Route-Target-AS:100:1 TunnelEncap:tunnelTypeMpls L2 Attributes: control word
          MPLS label: 256893


    # Arista vEOS 4.32.2F

    RR#sho bgp evpn detail
    BGP routing table information for VRF default
    Router identifier 9.9.9.9, local AS number 100
    BGP routing table entry for auto-discovery 1001 0000:0000:0000:0000:0000, Route Distinguisher: 1.1.1.1:1
     Paths: 1 available
      Local (Received from a RR-client)
        1.1.1.1 from 1.1.1.1 (1.1.1.1)
          Origin IGP, metric -, localpref 100, weight 0, tag 0, valid, internal, best
          Extended Community: Route-Target-AS:100:1 TunnelEncap:tunnelTypeMpls L2 Attributes: control word
          MPLS label: 143788
    BGP routing table entry for auto-discovery 1002 0000:0000:0000:0000:0000, Route Distinguisher: 2.2.2.2:1
     Paths: 1 available
      Local (Received from a RR-client)
        2.2.2.2 from 2.2.2.2 (2.2.2.2)
          Origin IGP, metric -, localpref 100, weight 0, tag 0, valid, internal, best
          Extended Community: Route-Target-AS:100:1 TunnelEncap:tunnelTypeMpls L2 Attributes: control word
          MPLS label: 256893

A capture of the control plane, with an Ethernet Auto-Discovery EVPN route:

cap-evpn-mpls-vpws-control-plane: `PE1` advertises an Ethernet A-D route to the `RR`.
It shows the RD, the RT, the VPWS ID and the allocated label.

A capture of the data plane, with a ping from CE1 to CE2:

cap-evpn-mpls-vpws-data-plane: `CE2` replies to the ping from `CE1`.
It shows the label stack: `46` for the LSP and `143788` for the VPWS.
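To show what these two labels look like on the wire, here is a minimal sketch encoding MPLS label stack entries as defined in RFC 3032 (20-bit label, 3-bit traffic class, 1-bit bottom-of-stack flag, 8-bit TTL). The TTL of 255 and traffic class of 0 are assumptions, not values read from the capture, and the 4-octet control word enabled by `mpls control-word`, which follows the stack, is not modeled.

```python
import struct


def encode_label_stack(labels: list, ttl: int = 255, tc: int = 0) -> bytes:
    """Encode MPLS label stack entries (RFC 3032); only the last label gets S=1."""
    out = b""
    for i, label in enumerate(labels):
        if not 0 <= label < 2**20:
            raise ValueError(f"label {label} does not fit in 20 bits")
        bottom = 1 if i == len(labels) - 1 else 0
        entry = (label << 12) | (tc << 9) | (bottom << 8) | ttl
        out += struct.pack("!I", entry)
    return out


def decode_label_stack(data: bytes) -> list:
    """Decode entries up to the bottom of the stack; return (label, S bit) pairs."""
    result = []
    for offset in range(0, len(data), 4):
        (entry,) = struct.unpack_from("!I", data, offset)
        label, bottom = entry >> 12, (entry >> 8) & 1
        result.append((label, bottom))
        if bottom:
            break
    return result


if __name__ == "__main__":
    # Outer label 46 (LSP), inner label 143788 (VPWS), as in the capture
    stack = encode_label_stack([46, 143788])
    print(stack.hex())                # 0002e0ff231ac1ff
    print(decode_label_stack(stack))  # [(46, 0), (143788, 1)]
```

Only the inner label carries the bottom-of-stack bit, which is how the egress PE knows that the VPWS label is the last one before the payload.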

Details and concepts

Configuration items well known from L3VPNs and BGP-signaled VPLS appear in the configurations: the RD and the RT. To quote RFC 7432:


    The policy attributes of EVPN are very similar to those of IP-VPN.
    An EVPN instance requires a Route Distinguisher (RD) that is unique
    per MAC-VRF and one or more globally unique Route Targets (RTs).
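As with IP-VPNs, the RD only keeps routes distinct in the BGP table, while the RTs decide which instance imports a route. A minimal sketch of that import decision, assuming a simplified route object: the `Route` and `Evi` types and the `import_routes` helper are hypothetical, and real PEs also apply import policies.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Route:
    """Hypothetical, simplified EVPN route as held in the BGP table."""
    rd: str                   # keeps routes distinct in the table
    route_targets: frozenset  # drives the import decision
    label: int


@dataclass
class Evi:
    name: str
    import_rts: set = field(default_factory=set)


def import_routes(evi: Evi, routes: list) -> list:
    """Import the routes sharing at least one RT with the EVI's import RTs."""
    return [r for r in routes if evi.import_rts & r.route_targets]


# Two routes from PE2 in this lab: RT 100:1 for the VPWS, 100:2 for the VPLS
ROUTES = [
    Route("2.2.2.2:1", frozenset({"100:1"}), 256893),
    Route("2.2.2.2:2", frozenset({"100:2"}), 423686),
]

if __name__ == "__main__":
    evi_1 = Evi("EVI-1", {"100:1"})
    print([r.rd for r in import_routes(evi_1, ROUTES)])  # ['2.2.2.2:1']
```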

Other items appear on the PEs: VPWS IDs qualified as local and remote. They can have different values, as is the case here, or, contrary to what many articles say, the same value, as RFC 8214 points out:


    It should be noted that the same VPWS service instance
    identifier may be configured on both PEs.

This is probably simpler for the operator, whose inventory then boils down to a single positive integer incremented for each new VPWS. Note that Cisco IOS XR offers a shortcut to configure the same ID value on both sides:


    cisco-xr(config-l2vpn-xc-p2p)# neighbor evpn evi 1 ?
      service  Specify service ID (used as local and remote ac-id)
      target   Specify remote attachment circuit identifier

In fact, the ID is scoped to the EVI. In other words, the same ID can be reused in several EVIs. Juniper has written a post on this: "Due to each EVI having a unique route distinguisher and one or more route targets, duplicate vpws-service-ids will not affect the EVPN-VPWS service".

In any case, for a given VPWS instance, the values must match on the two PEs participating in it:


                          […]  For both EPL and EVPL services using a
    given VPWS service instance, the pair of PEs instantiating that VPWS
    service instance will each advertise a per-EVI Ethernet A-D route
    with its VPWS service instance identifier and will each be configured
    with the other PE's VPWS service instance identifier. When each PE
    has received the other PE's per-EVI Ethernet A-D route, the VPWS
    service instance is instantiated.
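The instantiation rule quoted above can be sketched as a lookup keyed by both the EVI and the service ID. The service ID is carried in the Ethernet Tag ID field of the per-EVI Ethernet A-D route (it appears as `auto-discovery 1001` and `auto-discovery 1002` in the outputs above). This is a simplified model: the types and the function are hypothetical, RT import is assumed to have already mapped each route to its EVI, and the EVI-2 route is an invented example showing that the same ID can be reused in another EVI.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class EthernetAdRoute:
    """Simplified per-EVI Ethernet A-D route, after RT import."""
    evi: str          # EVI the route was imported into
    service_id: int   # carried in the Ethernet Tag ID field
    next_hop: str     # originating PE
    label: int        # label to push towards that PE


@dataclass(frozen=True)
class PseudowireConfig:
    evi: str
    local_id: int
    remote_id: int


def resolve_pseudowire(pw: PseudowireConfig, received: list) -> Optional[EthernetAdRoute]:
    """Return the remote A-D route matching (EVI, remote ID), or None if not received yet."""
    for route in received:
        # The ID only has meaning inside its EVI, hence the two-part match
        if route.evi == pw.evi and route.service_id == pw.remote_id:
            return route
    return None


RECEIVED = [
    EthernetAdRoute("EVI-1", 1002, "2.2.2.2", 256893),  # from PE2 in this lab
    EthernetAdRoute("EVI-2", 1002, "3.3.3.3", 300001),  # hypothetical: same ID, other EVI
]

if __name__ == "__main__":
    # PE1's side of the VPWS: local 1001, remote 1002
    print(resolve_pseudowire(PseudowireConfig("EVI-1", 1001, 1002), RECEIVED))
```

The pseudowire stays down (the function returns `None`) until the matching remote route arrives, which mirrors "when each PE has received the other PE's per-EVI Ethernet A-D route".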

But what are these VPWS IDs for, given that the RTs already act as connectors between PEs? I suspect they allow several VPWSs within the same EVI, although I do not see a use case for it (so far). This Arista config snippet illustrates this "multiplexing" of VPWSs within an EVI:


    !
    router bgp 1
       vpws evi-1
          rd 10.2.2.2:2
          route-target import export evpn 0.0.0.0:1
          mpls control-word
          !
          pseudowire pw1
             evpn vpws id local 2001 remote 1001
          !
          pseudowire pw2
             evpn vpws id local 2002 remote 1002
    !

Configuring a VPLS

VLAN-Based

Below, we configure a VLAN-Based service:


    6.1.  VLAN-Based Service Interface

       With this service interface, an EVPN instance consists of only a
       single broadcast domain (e.g., a single VLAN).  Therefore, there is a
       one-to-one mapping between a VID on this interface and a MAC-VRF.
       Since a MAC-VRF corresponds to a single VLAN, it consists of a single
       bridge table corresponding to that VLAN.

The EVI configured between PE1, PE2 and PE3 consists of a single VLAN, VLAN 20 in this case.


    # Arista vEOS 4.32.2F

    !
    vlan 20
    !
    interface Ethernet11
       switchport trunk allowed vlan 20
       switchport mode trunk
    !
    router bgp 100
       vlan 20
          rd 1.1.1.1:2
          route-target both 100:2
          redistribute learned
    !


    # Arista vEOS 4.32.2F

    !
    vlan 20
    !
    interface Ethernet11
       switchport trunk allowed vlan 20
       switchport mode trunk
    !
    router bgp 100
       vlan 20
          rd 2.2.2.2:2
          route-target both 100:2
          redistribute learned
    !


    # Arista vEOS 4.32.2F

    !
    vlan 20
    !
    interface Ethernet11
       switchport trunk allowed vlan 20
       switchport mode trunk
    !
    router bgp 100
       vlan 20
          rd 3.3.3.3:2
          route-target both 100:2
          redistribute learned
    !


    # MikroTik CHR 7.15.3

    /system identity
    set name=CE1
    /interface vlan
    add interface=ether2 name=vlan20 vlan-id=20
    /ip address
    add address=192.168.20.1/24 interface=vlan20


    # MikroTik CHR 7.15.3

    /system identity
    set name=CE2
    /interface vlan
    add interface=ether2 name=vlan20 vlan-id=20
    /ip address
    add address=192.168.20.2/24 interface=vlan20


    # MikroTik CHR 7.15.3

    /system identity
    set name=CE3
    /interface vlan
    add interface=ether2 name=vlan20 vlan-id=20
    /ip address
    add address=192.168.20.3/24 interface=vlan20

Here, VLAN 20 is created globally: no other customer (the operator included) can therefore reuse this VLAN on this PE. On a real production PE, the VPLS would not be configured this way but rather through a subinterface, which Arista's vEOS does not seem to support; this does not get in the way of a test, though.

Even before the CEs send any traffic, each PE sends the RR an EVPN IMET (Inclusive Multicast Ethernet Tag) route:

cap-evpn-mpls-vpls1-imet: `PE2` advertises an IMET route to the `RR`.
It shows the label the other PEs will push to send BUM traffic to `PE2`.

Among other things, the message contains the MPLS label, 423686 here, that the other PEs must push to send BUM (Broadcast, Unknown Unicast, Multicast) traffic to PE2, such as ARP. EVPN is often remembered as exchanging learned MAC addresses over BGP, but this BUM traffic must also be carried. This is made possible by the notion of P-Tunnel (Provider Tunnel), conveyed by an attribute of the IMET route:


    In order to identify the P-tunnel used for sending broadcast, unknown
    unicast, or multicast traffic, the Inclusive Multicast Ethernet Tag
    route MUST carry a Provider Multicast Service Interface (PMSI) Tunnel
    attribute as specified in [RFC6514].

The RFC also mentions several techniques for sending BUM traffic:


    The PEs in a particular EVPN instance may use ingress replication,
    P2MP LSPs, or MP2MP LSPs to send unknown unicast, broadcast, or
    multicast traffic to other PEs.

Here, the PEs use ingress replication by default: a copy of the BUM frame is sent to each PE of the EVI, as in classic VPLS. Let us look at the IMET routes the RR received from each PE:


    # Arista vEOS 4.32.2F

    RR#sho bgp evpn rd 1.1.1.1:2 detail
    BGP routing table information for VRF default
    Router identifier 9.9.9.9, local AS number 100
    BGP routing table entry for imet 1.1.1.1, Route Distinguisher: 1.1.1.1:2
     Paths: 1 available
      Local (Received from a RR-client)
        1.1.1.1 from 1.1.1.1 (1.1.1.1)
          Origin IGP, metric -, localpref 100, weight 0, tag 0, valid, internal, best
          Extended Community: Route-Target-AS:100:2 TunnelEncap:tunnelTypeMpls
          MPLS label: 534827
          PMSI Tunnel: Ingress Replication, MPLS Label: 8557232, Leaf Information Required: false, Tunnel ID: 1.1.1.1


    # Arista vEOS 4.32.2F

    RR#sho bgp evpn rd 2.2.2.2:2 detail
    BGP routing table information for VRF default
    Router identifier 9.9.9.9, local AS number 100
    BGP routing table entry for imet 2.2.2.2, Route Distinguisher: 2.2.2.2:2
     Paths: 1 available
      Local (Received from a RR-client)
        2.2.2.2 from 2.2.2.2 (2.2.2.2)
          Origin IGP, metric -, localpref 100, weight 0, tag 0, valid, internal, best
          Extended Community: Route-Target-AS:100:2 TunnelEncap:tunnelTypeMpls
          MPLS label: 423686
          PMSI Tunnel: Ingress Replication, MPLS Label: 6778976, Leaf Information Required: false, Tunnel ID: 2.2.2.2


    # Arista vEOS 4.32.2F

    RR#sho bgp evpn rd 3.3.3.3:2 detail
    BGP routing table information for VRF default
    Router identifier 9.9.9.9, local AS number 100
    BGP routing table entry for imet 3.3.3.3, Route Distinguisher: 3.3.3.3:2
     Paths: 1 available
      Local (Received from a RR-client)
        3.3.3.3 from 3.3.3.3 (3.3.3.3)
          Origin IGP, metric -, localpref 100, weight 0, tag 0, valid, internal, best
          Extended Community: Route-Target-AS:100:2 TunnelEncap:tunnelTypeMpls
          MPLS label: 659891
          PMSI Tunnel: Ingress Replication, MPLS Label: 10558256, Leaf Information Required: false, Tunnel ID: 3.3.3.3
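A side note on these outputs: the `MPLS Label` values on the `PMSI Tunnel` lines (8557232, 6778976, 10558256) exceed the 20-bit label range. They look like the raw 3-octet field of the PMSI Tunnel attribute, whose high-order 20 bits carry the label (RFC 6514): 8557232 = 534827 × 16, i.e. the label shown on the line above shifted left by 4 bits. The quick check below rests on that reading of the display, which is an assumption about how vEOS prints the field.

```python
def pmsi_label(raw_field: int) -> int:
    """Extract the 20-bit label from a 3-octet PMSI MPLS Label field (high-order 20 bits)."""
    if not 0 <= raw_field < 2**24:
        raise ValueError("the PMSI MPLS Label field is 3 octets long")
    return raw_field >> 4


# (raw PMSI value, "MPLS label" line) pairs from the RR outputs above
OBSERVED = [(8557232, 534827), (6778976, 423686), (10558256, 659891)]

if __name__ == "__main__":
    for raw, label in OBSERVED:
        print(raw, "->", pmsi_label(raw), pmsi_label(raw) == label)
```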

Given these routes, if PE1 receives a BUM frame from CE1, it replicates the frame to PE2 and PE3, pushing labels 423686 and 659891 respectively:

cap-evpn-mpls-vpls1-bum: `PE1` replicates the BUM frame received from `CE1` (an ARP request) to each PE participating in the EVI, pushing the respective label.
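Ingress replication, as performed by PE1 in this capture, can be sketched as follows: one copy of the BUM frame per remote PE found in the IMET routes of the EVI, each with that PE's label. The labels come from the RR outputs above; the function name and the returned (PE, label, frame) tuples are illustrative, and the transport label towards each PE is left out.

```python
def ingress_replicate(frame: bytes, local_pe: str, imet_labels: dict) -> list:
    """Return one (remote PE, label, frame) copy per remote PE of the EVI."""
    return [
        (pe, label, frame)
        for pe, label in sorted(imet_labels.items())
        if pe != local_pe  # never send a copy back to the ingress PE itself
    ]


# IMET labels advertised in the VLAN-Based EVI (from the RR outputs above)
IMET_LABELS = {"1.1.1.1": 534827, "2.2.2.2": 423686, "3.3.3.3": 659891}

if __name__ == "__main__":
    arp_request = b"\xff" * 6  # placeholder for a broadcast frame
    for pe, label, _ in ingress_replicate(arp_request, "1.1.1.1", IMET_LABELS):
        print(pe, label)
```

The cost of this approach grows with the number of PEs in the EVI, since the ingress PE sends as many copies as there are remote PEs; the P2MP and MP2MP LSP options quoted above move that replication into the core.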

Next in the capture, CE2's reply carries label 534835 because, in the meantime, PE1 has learned CE1's MAC address and advertised it to the RR with an EVPN MAC/IP route:

cap-evpn-mpls-vpls1-mac: `PE1` advertises the MAC address of `CE1`, with the allocated label, via a MAC/IP route.

Such a route can also carry the IP address associated with the MAC (not the case here, as I did not manage to test it) in order to reduce ARP/ND traffic:


    The IP Address field in the MAC/IP Advertisement route may optionally
    carry one of the IP addresses associated with the MAC address.  This
    provides an option that can be used to minimize the flooding of ARP
    or Neighbor Discovery (ND) messages over the MPLS network and to
    remote CEs.

The PEs then act as ARP/ND proxies: they maintain an ARP/ND cache and reply on behalf of the CEs.
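This proxy behavior can be sketched as follows, assuming MAC/IP routes that do carry an IP address (which, as noted above, was not the case in this lab). The `ArpProxy` class and its methods are hypothetical, and the CE2 MAC/IP pairing in the usage is illustrative; the returned tuples stand for "answer locally" and "flood as BUM traffic".

```python
from typing import Optional


class ArpProxy:
    """Hypothetical per-EVI ARP cache fed by EVPN MAC/IP routes."""

    def __init__(self) -> None:
        self._ip_to_mac: dict = {}

    def learn_from_mac_ip_route(self, mac: str, ip: Optional[str]) -> None:
        # Only MAC/IP routes that actually carry an IP address are useful here
        if ip is not None:
            self._ip_to_mac[ip] = mac

    def handle_arp_request(self, target_ip: str) -> tuple:
        """Answer locally if the target IP is known, otherwise flood the request."""
        mac = self._ip_to_mac.get(target_ip)
        if mac is not None:
            return ("reply", mac)
        return ("flood", None)


if __name__ == "__main__":
    proxy = ArpProxy()
    proxy.learn_from_mac_ip_route("0cb5.466e.0001", "192.168.20.2")  # illustrative
    print(proxy.handle_arp_request("192.168.20.2"))  # answered by the PE
    print(proxy.handle_arp_request("192.168.20.3"))  # unknown: flooded
```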

Reducing ARP/ND traffic, or even eliminating it, is a key point of EVPN. DE-CIX wrote up its experience of migrating to EVPN (Peering LAN 2.0 Introduction of EVPN at DE-CIX) and actively contributed to RFC 9161, which details the ARP/ND proxy function on a PE.

Note that the Ethernet Tag ID is set to 0, in line with §6.1 on the VLAN-Based service:


                         […]  In such scenarios, the Ethernet frames
    transported over an MPLS/IP network SHOULD remain tagged with the
    originating VID, and a VID translation MUST be supported in the data
    path and MUST be performed on the disposition PE.  The Ethernet Tag
    ID in all EVPN routes MUST be set to 0.
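To locate this field, here is a Python sketch encoding the IMET route NLRI (route type 3, RFC 7432 §7.3): an RD of type 1 (IPv4 address plus 2-octet number, as in `2.2.2.2:2`), the 4-octet Ethernet Tag ID, the IP address length in bits, then the originating router's IPv4 address. The function names are illustrative, and only RD type 1 and IPv4 originators are handled.

```python
import ipaddress
import struct

ROUTE_TYPE_IMET = 3


def encode_rd_type1(rd: str) -> bytes:
    """Encode an 'a.b.c.d:n' Route Distinguisher as RD type 1 (RFC 4364)."""
    addr, number = rd.split(":")
    return (struct.pack("!H", 1)
            + ipaddress.IPv4Address(addr).packed
            + struct.pack("!H", int(number)))


def encode_imet_nlri(rd: str, ethernet_tag: int, originator: str) -> bytes:
    """Encode an Inclusive Multicast Ethernet Tag route (IPv4 originator only)."""
    ip = ipaddress.IPv4Address(originator).packed
    # RD (8) + Ethernet Tag ID (4) + IP length in bits (1) + IPv4 address (4)
    body = encode_rd_type1(rd) + struct.pack("!IB", ethernet_tag, 32) + ip
    return struct.pack("!BB", ROUTE_TYPE_IMET, len(body)) + body


if __name__ == "__main__":
    # PE2's route in the VLAN-Based EVI: Ethernet Tag ID 0, as required by section 6.1
    print(encode_imet_nlri("2.2.2.2:2", 0, "2.2.2.2").hex())
```

For PE2's VLAN-Based route, octets 10 to 13 (the Ethernet Tag ID) are all zero. In the VLAN-Aware Bundle section, the outputs instead display a VLAN in front of the route (for example `imet 20 1.1.1.1`), which appears to correspond to a non-zero value in this field.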

The excerpt above also indicates that, with this service, keeping the customer VLAN tag during transport is a SHOULD rather than a MUST. In the capture below of a ping from CE1 to CE2 (this was also visible in the ARP capture), the tag is absent, probably an Arista implementation choice:

cap-evpn-mpls-vpls1-ping: ping from `CE1` to `CE2` (note that the customer VLAN `20` is not kept).
`PE1` pushes the label allocated by `PE2` for the MAC address of `CE2`, `423687` here.
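The decision PE1 makes for this ping can be sketched as a MAC-VRF lookup: if the destination MAC was learned through a MAC/IP route, push that route's label; otherwise, fall back to flooding with the IMET labels. The MAC addresses and labels come from the RR outputs just below; the dictionaries and the `forward` function are illustrative, and the transport (LSP) label is left out.

```python
# MAC/IP routes received by PE1 in the VLAN-Based EVI: MAC -> (next-hop PE, label)
MAC_ROUTES = {
    "0cb5.466e.0001": ("2.2.2.2", 423687),  # CE2, behind PE2
    "0cfe.6f39.0000": ("3.3.3.3", 659821),  # CE3, behind PE3
}
# IMET routes received by PE1 in the same EVI: remote PE -> BUM label
IMET_LABELS = {"2.2.2.2": 423686, "3.3.3.3": 659891}


def forward(dst_mac: str) -> list:
    """Return the (next-hop PE, label) pairs PE1 would use for a frame to dst_mac."""
    if dst_mac in MAC_ROUTES:
        return [MAC_ROUTES[dst_mac]]  # known unicast: a single copy, MAC/IP label
    # Broadcast or unknown unicast: ingress replication with the IMET labels
    return sorted(IMET_LABELS.items())


if __name__ == "__main__":
    print(forward("0cb5.466e.0001"))  # the ICMP echo request to CE2
    print(forward("ffff.ffff.ffff"))  # a broadcast frame
```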

Here are the MAC addresses the RR received from each PE:


    # Arista vEOS 4.32.2F

    RR#sho bgp evpn rd 1.1.1.1:2 detail
    BGP routing table information for VRF default
    Router identifier 9.9.9.9, local AS number 100
    BGP routing table entry for mac-ip 0c83.7cdc.0001, Route Distinguisher: 1.1.1.1:2
     Paths: 1 available
      Local (Received from a RR-client)
        1.1.1.1 from 1.1.1.1 (1.1.1.1)
          Origin IGP, metric -, localpref 100, weight 0, tag 0, valid, internal, best
          Extended Community: Route-Target-AS:100:2 TunnelEncap:tunnelTypeMpls
          MPLS label: 534835 ESI: 0000:0000:0000:0000:0000
    BGP routing table entry for imet 1.1.1.1, Route Distinguisher: 1.1.1.1:2
     Paths: 1 available
      Local (Received from a RR-client)
        1.1.1.1 from 1.1.1.1 (1.1.1.1)
          Origin IGP, metric -, localpref 100, weight 0, tag 0, valid, internal, best
          Extended Community: Route-Target-AS:100:2 TunnelEncap:tunnelTypeMpls
          MPLS label: 534827
          PMSI Tunnel: Ingress Replication, MPLS Label: 8557232, Leaf Information Required: false, Tunnel ID: 1.1.1.1


    # Arista vEOS 4.32.2F

    RR#sho bgp evpn rd 2.2.2.2:2 detail
    BGP routing table information for VRF default
    Router identifier 9.9.9.9, local AS number 100
    BGP routing table entry for mac-ip 0cb5.466e.0001, Route Distinguisher: 2.2.2.2:2
     Paths: 1 available
      Local (Received from a RR-client)
        2.2.2.2 from 2.2.2.2 (2.2.2.2)
          Origin IGP, metric -, localpref 100, weight 0, tag 0, valid, internal, best
          Extended Community: Route-Target-AS:100:2 TunnelEncap:tunnelTypeMpls
          MPLS label: 423687 ESI: 0000:0000:0000:0000:0000
    BGP routing table entry for imet 2.2.2.2, Route Distinguisher: 2.2.2.2:2
     Paths: 1 available
      Local (Received from a RR-client)
        2.2.2.2 from 2.2.2.2 (2.2.2.2)
          Origin IGP, metric -, localpref 100, weight 0, tag 0, valid, internal, best
          Extended Community: Route-Target-AS:100:2 TunnelEncap:tunnelTypeMpls
          MPLS label: 423686
          PMSI Tunnel: Ingress Replication, MPLS Label: 6778976, Leaf Information Required: false, Tunnel ID: 2.2.2.2


    # Arista vEOS 4.32.2F

    RR#sho bgp evpn rd 3.3.3.3:2 detail
    BGP routing table information for VRF default
    Router identifier 9.9.9.9, local AS number 100
    BGP routing table entry for mac-ip 0cfe.6f39.0000, Route Distinguisher: 3.3.3.3:2
     Paths: 1 available
      Local (Received from a RR-client)
        3.3.3.3 from 3.3.3.3 (3.3.3.3)
          Origin IGP, metric -, localpref 100, weight 0, tag 0, valid, internal, best
          Extended Community: Route-Target-AS:100:2 TunnelEncap:tunnelTypeMpls
          MPLS label: 659821 ESI: 0000:0000:0000:0000:0000
    BGP routing table entry for imet 3.3.3.3, Route Distinguisher: 3.3.3.3:2
     Paths: 1 available
      Local (Received from a RR-client)
        3.3.3.3 from 3.3.3.3 (3.3.3.3)
          Origin IGP, metric -, localpref 100, weight 0, tag 0, valid, internal, best
          Extended Community: Route-Target-AS:100:2 TunnelEncap:tunnelTypeMpls
          MPLS label: 659891
          PMSI Tunnel: Ingress Replication, MPLS Label: 10558256, Leaf Information Required: false, Tunnel ID: 3.3.3.3

Finally, note that, as with BGP/MPLS IP VPNs, several label allocation modes exist:


    A PE may advertise the same single EVPN label for all MAC addresses
    in a given MAC-VRF.  This label assignment is referred to as a per
    MAC-VRF label assignment.  Alternatively, a PE may advertise a unique
    EVPN label per <MAC-VRF, Ethernet tag> combination.  This label
    assignment is referred to as a per <MAC-VRF, Ethernet tag> label
    assignment.  As a third option, a PE may advertise a unique EVPN
    label per <ESI, Ethernet tag> combination.  This label assignment is
    referred to as a per <ESI, Ethernet tag> label assignment.  As a
    fourth option, a PE may advertise a unique EVPN label per MAC
    address.  This label assignment is referred to as a per MAC label
    assignment.  All of these label assignment methods have their
    trade-offs.  The choice of a particular label assignment methodology
    is purely local to the PE that originates the route.
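The difference between the first and fourth options can be sketched with a small allocator. The class, the mode names and the starting label are hypothetical (real PEs draw labels from a platform-specific range), and keying the per-MAC mode on (MAC-VRF, VLAN, MAC) is a modeling choice.

```python
import itertools


class LabelAllocator:
    """Hypothetical EVPN label allocator supporting two RFC 7432 modes."""

    def __init__(self, mode: str, first_label: int = 100000) -> None:
        if mode not in ("per-mac-vrf", "per-mac"):
            raise ValueError(f"unsupported mode: {mode}")
        self.mode = mode
        self._next = itertools.count(first_label)
        self._labels: dict = {}

    def label_for(self, mac_vrf: str, vlan: int, mac: str) -> int:
        # Per MAC-VRF: one label for the whole MAC-VRF, whatever the VLAN or MAC
        # Per MAC: one label per (MAC-VRF, VLAN, MAC) entry
        key = mac_vrf if self.mode == "per-mac-vrf" else (mac_vrf, vlan, mac)
        if key not in self._labels:
            self._labels[key] = next(self._next)
        return self._labels[key]


if __name__ == "__main__":
    for mode in ("per-mac-vrf", "per-mac"):
        alloc = LabelAllocator(mode)
        print(mode,
              alloc.label_for("MY-BUNDLE", 20, "0c83.7cdc.0001"),
              alloc.label_for("MY-BUNDLE", 30, "0c83.7cdc.0099"))
```

Per MAC-VRF allocation consumes few labels but forces the egress PE to perform a MAC lookup after popping the label; per-MAC allocation consumes one label per MAC but lets the label alone identify the destination.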

In this lab, the PEs implement the first option by default, namely a single MPLS label for the whole EVI (per MAC-VRF), as the next section highlights.

VLAN-Aware Bundle

Below, we configure a VLAN-Aware Bundle service:


    6.3.  VLAN-Aware Bundle Service Interface

       With this service interface, an EVPN instance consists of multiple
       broadcast domains (e.g., multiple VLANs) with each VLAN having its
       own bridge table -- i.e., multiple bridge tables (one per VLAN) are
       maintained by a single MAC-VRF corresponding to the EVPN instance.

In the same EVI, which here involves PE1, PE2 and PE3, we add VLAN 30 alongside VLAN 20, each VLAN having its own bridge table. This contrasts with the VLAN Bundle service, where several VLANs share a single bridge table, which requires MAC addresses to be unique across all VLANs.
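The distinction between per-VLAN bridge tables and a single shared table can be sketched as follows. The `MacVrf` class is hypothetical, ignores MAC aging and mobility, and the MAC address in the usage is an invented example reused in two VLANs.

```python
from typing import Optional


class MacVrf:
    """Hypothetical MAC-VRF with either one bridge table per VLAN or one shared table."""

    def __init__(self, vlan_aware: bool) -> None:
        self.vlan_aware = vlan_aware
        self._tables: dict = {}  # table key -> {MAC: next-hop PE}

    def _table(self, vlan: int) -> dict:
        # VLAN-Aware Bundle: the VLAN selects the bridge table
        # VLAN Bundle: every VLAN lands in the same table
        key = vlan if self.vlan_aware else "shared"
        return self._tables.setdefault(key, {})

    def learn(self, vlan: int, mac: str, next_hop: str) -> None:
        self._table(vlan)[mac] = next_hop

    def lookup(self, vlan: int, mac: str) -> Optional[str]:
        return self._table(vlan).get(mac)


if __name__ == "__main__":
    mac = "aaaa.bbbb.0001"  # hypothetical MAC reused in two VLANs
    for vrf in (MacVrf(vlan_aware=True), MacVrf(vlan_aware=False)):
        vrf.learn(20, mac, "2.2.2.2")
        vrf.learn(30, mac, "3.3.3.3")
        print("VLAN-aware" if vrf.vlan_aware else "VLAN Bundle", vrf.lookup(20, mac))
```

In the shared-table case, learning the MAC in VLAN 30 silently overwrites the VLAN 20 entry, which is why the VLAN Bundle service requires MAC addresses to be unique across VLANs.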


    # Arista vEOS 4.32.2F

    !
    vlan 20,30
    !
    interface Ethernet11
       switchport trunk allowed vlan 20,30
       switchport mode trunk
    !
    router bgp 100
       vlan-aware-bundle MY-BUNDLE
          rd 1.1.1.1:2
          route-target both 100:2
          redistribute learned
          vlan 20,30
    !


    # Arista vEOS 4.32.2F

    !
    vlan 20,30
    !
    interface Ethernet11
       switchport trunk allowed vlan 20,30
       switchport mode trunk
    !
    router bgp 100
       vlan-aware-bundle MY-BUNDLE
          rd 2.2.2.2:2
          route-target both 100:2
          redistribute learned
          vlan 20,30
    !


    # Arista vEOS 4.32.2F

    !
    vlan 20,30
    !
    interface Ethernet11
       switchport trunk allowed vlan 20,30
       switchport mode trunk
    !
    router bgp 100
       vlan-aware-bundle MY-BUNDLE
          rd 3.3.3.3:2
          route-target both 100:2
          redistribute learned
          vlan 20,30
    !


    # MikroTik CHR 7.15.3

    /system identity
    set name=CE1
    /interface vlan
    add interface=ether2 name=vlan20 vlan-id=20
    /ip address
    add address=192.168.20.1/24 interface=vlan20

    # Workaround: use a bridge interface to have a vlan30 interface with a different MAC address

    /interface vlan
    add interface=ether2 name=vlan30 vlan-id=30
    /interface bridge
    add admin-mac=0C:83:7C:DC:00:99 auto-mac=no name=br30
    /interface bridge port
    add bridge=br30 interface=vlan30
    /ip address
    add address=192.168.30.1/24 interface=br30


    # MikroTik CHR 7.15.3

    /system identity
    set name=CE2
    /interface vlan
    add interface=ether2 name=vlan20 vlan-id=20
    /ip address
    add address=192.168.20.2/24 interface=vlan20

    # Workaround: use a bridge interface to have a vlan30 interface with a different MAC address

    /interface vlan
    add interface=ether2 name=vlan30 vlan-id=30
    /interface bridge
    add admin-mac=0C:B5:46:6E:00:99 auto-mac=no name=br30
    /interface bridge port
    add bridge=br30 interface=vlan30
    /ip address
    add address=192.168.30.2/24 interface=br30


    # MikroTik CHR 7.15.3

    /system identity
    set name=CE3
    /interface vlan
    add interface=ether2 name=vlan20 vlan-id=20
    /ip address
    add address=192.168.20.3/24 interface=vlan20

    # Workaround: use a bridge interface to have a vlan30 interface with a different MAC address

    /interface vlan
    add interface=ether2 name=vlan30 vlan-id=30
    /interface bridge
    add admin-mac=0C:FE:6F:39:00:99 auto-mac=no name=br30
    /interface bridge port
    add bridge=br30 interface=vlan30
    /ip address
    add address=192.168.30.3/24 interface=br30

As before, even before the CEs send any traffic, each PE sends the RR an EVPN IMET route, but this time one per VLAN:

cap-evpn-mpls-vpls2-imet20: `PE2` advertises an IMET route to the `RR` for VLAN `20`.
It shows the label the other PEs will push to send BUM traffic to `PE2` in this VLAN.

cap-evpn-mpls-vpls2-imet30: `PE2` advertises an IMET route to the `RR` for VLAN `30`.
It shows the label the other PEs will push to send BUM traffic to `PE2` in this VLAN.

PE2 has allocated the same MPLS label, 423686 here, for BUM traffic in both VLAN 20 and VLAN 30, a behavior the RFC allows:


    Broadcast, unknown unicast, or multicast (BUM) traffic is sent only
    to the CEs in a given broadcast domain; however, the broadcast
    domains within an EVI either MAY each have their own P-Tunnel or MAY
    share P-Tunnels -- e.g., all of the broadcast domains in an EVI MAY
    share a single P-Tunnel.

Several VLANs can therefore share the same P-Tunnel, as the capture below of a ping from CE1 to CE3 in each VLAN clearly shows:

cap-evpn-mpls-vpls2-bum — `PE1` replicates the BUM frame received from `CE1` to each PE participating in the EVI, pushing the respective label.
The same P-Tunnel label is used for both VLANs (for example `423686` for replication to `PE2`).

Next we see PE3's reply, carried with label 534835. In the meantime, PE1 has learned CE1's MAC address and advertised it to the RR in an EVPN MAC/IP route:

cap-evpn-mpls-vpls2-mac20 — In a MAC/IP route, `PE1` advertises to the `RR` the MAC address of `CE1` in VLAN `20`, together with the allocated label.

cap-evpn-mpls-vpls2-mac30 — In a MAC/IP route, `PE1` advertises to the `RR` the MAC address of `CE1` in VLAN `30`, together with the allocated label.

Once again, PE1 allocated the same MPLS label, 534835 here, for two different MAC addresses in two different VLANs. This is because the Arista PE uses per-MAC-VRF label assignment by default. The same information naturally appears on PE3:


    # Arista vEOS 4.32.2F

    PE3#sho bgp evpn rd 1.1.1.1:2 detail

    # Same label allocated by PE1 for the MAC/IP routes (regardless of VLAN or MAC)

    BGP routing table entry for mac-ip 20 0c83.7cdc.0001, Route Distinguisher: 1.1.1.1:2
          Extended Community: Route-Target-AS:100:2 TunnelEncap:tunnelTypeMpls
          MPLS label: 534835 ESI: 0000:0000:0000:0000:0000

    BGP routing table entry for mac-ip 30 0c83.7cdc.0099, Route Distinguisher: 1.1.1.1:2
          Extended Community: Route-Target-AS:100:2 TunnelEncap:tunnelTypeMpls
          MPLS label: 534835 ESI: 0000:0000:0000:0000:0000

    # Same label allocated by PE1 for the IMET routes (regardless of VLAN)

    BGP routing table entry for imet 20 1.1.1.1, Route Distinguisher: 1.1.1.1:2
          Extended Community: Route-Target-AS:100:2 TunnelEncap:tunnelTypeMpls
          MPLS label: 534827

    BGP routing table entry for imet 30 1.1.1.1, Route Distinguisher: 1.1.1.1:2
          Extended Community: Route-Target-AS:100:2 TunnelEncap:tunnelTypeMpls
          MPLS label: 534827

Note that the Ethernet Tag ID is now set to the respective VLAN (20 and 30), in line with §6.3 on the VLAN-Aware Bundle service:


    In the case where a single VLAN is represented by a single VID and
    thus no VID translation is required, an MPLS-encapsulated packet MUST
    carry that VID.  The Ethernet Tag ID in all EVPN routes MUST be set
    to that VID.

The excerpt also states that, for this type of service, the customer's VLAN tag must be kept when the frame is carried across the backbone. The previous capture showing the ARP exchange makes this visible.
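The shared label and the Ethernet Tag ID can be tied together in a minimal Python sketch of a VLAN-Aware Bundle PE that uses per-MAC-VRF label assignment. This is a simplified model and not Arista's implementation. The `MacIpRoute` class, the helper functions and the `"bundle"` MAC-VRF name are hypothetical. The RD, label and MAC values come from the outputs above. The egress lookup shows why one label per MAC-VRF is enough: the label identifies the MAC-VRF, and the VID carried in the frame selects the VLAN.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MacIpRoute:
    """Simplified EVPN MAC/IP route (hypothetical model, not the BGP NLRI encoding)."""

    rd: str
    ethernet_tag_id: int
    mac: str
    label: int


def build_mac_ip_routes(rd, mac_vrf_label, learned):
    """Per-MAC-VRF label assignment: every MAC of the MAC-VRF gets the same label.

    `learned` is a list of (vid, mac) pairs. In a VLAN-Aware Bundle without
    VID translation, the Ethernet Tag ID is set to the VID (§6.3 quoted above).
    """
    return [MacIpRoute(rd, vid, mac, mac_vrf_label) for vid, mac in learned]


def egress_lookup(label_to_mac_vrf, label, vid):
    """Egress PE: the label selects the MAC-VRF, and the VID carried in the
    frame selects the VLAN (bridge domain) inside it."""
    return label_to_mac_vrf[label], vid


# PE1 learned CE1's MACs in VLANs 20 and 30 (values taken from the outputs above)
routes = build_mac_ip_routes(
    "1.1.1.1:2",
    534835,
    [(20, "0c83.7cdc.0001"), (30, "0c83.7cdc.0099")],
)
for route in routes:
    print(route.ethernet_tag_id, route.mac, route.label)

# A frame sent by PE3 with label 534835 and tagged VLAN 30 lands in VLAN 30 on PE1
print(egress_lookup({534835: "bundle"}, 534835, 30))
```

The model does not need a second label to separate the VLANs. That separation relies entirely on the VID being carried in the frame, which is exactly what the excerpt requires.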

Configuring a VPLS E-Tree

The VPLS instances configured so far used a full-mesh topology, so every CE could communicate with every other CE. The EVPN requirements listed in RFC 7209 also cover the hub-and-spoke topology, also known as E-Tree:


    One example of this is E-Tree topology, where one or more sites in
    the VPN are roots and the others are leaves.  The roots are allowed
    to send traffic to other roots and to leaves, while leaves can
    communicate only with the roots.  The solution MUST provide the
    ability to support E-Tree topology.

E-Tree fits well, for example, in an Ethernet access network. There the BNG (root) can talk to the CPEs (leaves) and the CPEs can talk to the BNG, but the CPEs cannot talk to each other. This section of the article on IPv6 applied to the BNG describes that case in detail.

Few documents mention it, but a classic VPLS can perfectly well be E-Tree! I demonstrate this in the article VPLS LDP vs BGP. In fact, the first solution below is implemented the same way. EVPN allows finer granularity, which the second solution illustrates.

RFC 8317, which covers the E-Tree topology in EVPN, identifies three scenarios with different granularities. We illustrate the first two below:


    3.  E-Tree Scenarios  . . . . . . . . . . . . . . . . . . . . . .   6
      3.1.  Scenario 1: Leaf or Root Site(s) per PE . . . . . . . . .   6
      3.2.  Scenario 2: Leaf or Root Site(s) per AC . . . . . . . . .   7
      3.3.  Scenario 3: Leaf or Root Site(s) per MAC Address  . . . .   8

An AC (Attachment Circuit) is a termination point in the EVI: a physical port, or one or more VLANs (as in the previous sections).

With an RT import/export scheme

In the first scenario, the EVI on a given PE is either exclusively root or exclusively leaf. A simple RT import/export scheme is then enough to implement the E-Tree topology:


    In this scenario, tailored BGP Route Target (RT) import/export
    policies among the PEs belonging to the same EVI can be used to
    prevent communication among Leaf PEs.  To prevent communication among
    Leaf ACs connected to the same PE and belonging to the same EVI,
    split-horizon filtering is used to block traffic from one Leaf AC to
    another Leaf AC on a MAC-VRF for a given E-Tree EVI.

The excerpt also explains that a PE can isolate the leaf ACs attached to it with a form of filtering called split-horizon. Huawei, for example, provides it through the isolate spoken command, which is enabled per EVI. Cisco enables it automatically:


    Split-horizon group between the ACs (leaf) on same EVI is enabled automatically.
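The split-horizon rule quoted above amounts to a local forwarding decision on the PE. The Python sketch below is a hypothetical model, not a vendor implementation, and the AC names are invented for illustration. Between two ACs of the same E-Tree EVI on the same PE, traffic is blocked only when both ACs are leaves.

```python
from enum import Enum


class Role(Enum):
    ROOT = "root"
    LEAF = "leaf"


def local_forwarding_allowed(src_role: Role, dst_role: Role) -> bool:
    """Split-horizon between ACs of the same E-Tree EVI on the same PE:
    leaf-to-leaf is blocked, any pair involving a root is allowed."""
    return not (src_role is Role.LEAF and dst_role is Role.LEAF)


# Hypothetical ACs attached to a single PE
acs = {"leaf-ac-1": Role.LEAF, "leaf-ac-2": Role.LEAF, "root-ac": Role.ROOT}

for src, src_role in acs.items():
    for dst, dst_role in acs.items():
        if src != dst:
            allowed = local_forwarding_allowed(src_role, dst_role)
            print(f"{src} -> {dst}: {'forward' if allowed else 'block'}")
```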

RT import/export configuration on PE1 and PE2 (leaves), then on PE3 (root):


    # Arista vEOS 4.32.2F

    !
    vlan 20
    !
    interface Ethernet11
       switchport trunk allowed vlan 20
       switchport mode trunk
    !
    router bgp 100
       vlan 20
          rd 1.1.1.1:2
          route-target import 100:2
          route-target export 100:3
          redistribute learned
    !


    # Arista vEOS 4.32.2F

    !
    vlan 20
    !
    interface Ethernet11
       switchport trunk allowed vlan 20
       switchport mode trunk
    !
    router bgp 100
       vlan 20
          rd 2.2.2.2:2
          route-target import 100:2
          route-target export 100:3
          redistribute learned
    !


    # Arista vEOS 4.32.2F

    !
    vlan 20
    !
    interface Ethernet11
       switchport trunk allowed vlan 20
       switchport mode trunk
    !
    router bgp 100
       vlan 20
          rd 3.3.3.3:2
          route-target import 100:3
          route-target export 100:2
          redistribute learned
    !

As a diagram:

evpn-mpls-rt — The leaves import the root RT `100:2`.
The root(s) import the leaf RT `100:3`.

The control plane (BGP UPDATE messages) carries only the exported RT. The imported RT stays local to the PE, which uses it to decide whether to install the routes it receives.
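The import/export logic can be checked with a small Python model. The PE names and RT values come from the configurations above. The `pes` dictionary and the `installed_routes` helper are hypothetical, and they model only the RT match, not BGP itself.

```python
# Scenario 1 of RFC 8317: one E-Tree role per PE, enforced only by RT import/export.
# Root RT 100:2 and leaf RT 100:3, as in the configurations above.
pes = {
    "PE1": {"import": {"100:2"}, "export": {"100:3"}},  # leaf
    "PE2": {"import": {"100:2"}, "export": {"100:3"}},  # leaf
    "PE3": {"import": {"100:3"}, "export": {"100:2"}},  # root
}


def installed_routes(receiver: str) -> set:
    """Return the PEs whose routes `receiver` installs in its MAC-VRF.

    Only the exported RTs travel in the BGP UPDATE. The receiver keeps a
    route if at least one of the route's RTs matches one of its import RTs.
    """
    imports = pes[receiver]["import"]
    return {
        sender
        for sender, cfg in pes.items()
        if sender != receiver and cfg["export"] & imports
    }


for pe in pes:
    print(pe, "installs routes from", sorted(installed_routes(pe)))
```

The leaves only see the root, and the root sees both leaves. This is what the bridge tables below show.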

Because PE1 and PE2 do not import RT 100:3, the MAC addresses carrying this RT are missing from their bridge tables (they are still present in their BGP tables):


    # Arista vEOS 4.32.2F

    # CE2's MAC address (carrying RT 100:3) was not imported into the bridge table
    #   only CE3's (carrying RT 100:2) was imported

    PE1#show mac address-table vlan 20

    Vlan    Mac Address       Type        Ports      Moves   Last Move
    ----    -----------       ----        -----      -----   ---------
      20    0c83.7cdc.0001    DYNAMIC     Et11       1       0:12:06 ago
      20    0cfe.6f39.0000    DYNAMIC     Mt1        1       0:08:00 ago

    # The EVPN routes (IMET and MAC/IP) advertised by PE2 carry RT 100:3, which PE1 does not import

    PE1#sho bgp evpn rd 2.2.2.2:2 detail

    BGP routing table entry for mac-ip 0cb5.466e.0001, Route Distinguisher: 2.2.2.2:2
          Extended Community: Route-Target-AS:100:3 TunnelEncap:tunnelTypeMpls
          MPLS label: 423720 ESI: 0000:0000:0000:0000:0000

    BGP routing table entry for imet 2.2.2.2, Route Distinguisher: 2.2.2.2:2
          Extended Community: Route-Target-AS:100:3 TunnelEncap:tunnelTypeMpls
          MPLS label: 423739

    # The EVPN routes (IMET and MAC/IP) advertised by PE3 carry RT 100:2, which PE1 imports

    PE1#sho bgp evpn rd 3.3.3.3:2 detail

    BGP routing table entry for mac-ip 0cfe.6f39.0000, Route Distinguisher: 3.3.3.3:2
          Extended Community: Route-Target-AS:100:2 TunnelEncap:tunnelTypeMpls
          MPLS label: 659821 ESI: 0000:0000:0000:0000:0000

    BGP routing table entry for imet 3.3.3.3, Route Distinguisher: 3.3.3.3:2
          Extended Community: Route-Target-AS:100:2 TunnelEncap:tunnelTypeMpls
          MPLS label: 659891


    # MikroTik CHR 7.15.3

    [admin@CE1] > ip/arp/print
    Flags: D - DYNAMIC; C - COMPLETE
    Columns: ADDRESS, MAC-ADDRESS, INTERFACE, STATUS
    #    ADDRESS       MAC-ADDRESS        INTERFACE  STATUS
    0 D  192.168.20.2                     vlan20     failed
    1 DC 192.168.20.3  0C:FE:6F:39:00:00  vlan20     reachable

    # CE1 (leaf) cannot ping CE2 (leaf)

    [admin@CE1] > ping 192.168.20.2 count=3
      SEQ HOST                                     SIZE TTL TIME       STATUS
        0 192.168.20.2                                                 timeout
        1 192.168.20.2                                                 timeout
        2 192.168.20.2                                                 timeout
        sent=3 received=0 packet-loss=100%

    # CE1 (leaf) can ping CE3 (root)

    [admin@CE1] > ping 192.168.20.3 count=3
      SEQ HOST                                     SIZE TTL TIME       STATUS
        0 192.168.20.3                               56  64 3ms908us
        1 192.168.20.3                               56  64 13ms311us
        2 192.168.20.3                               56  64 3ms901us
        sent=3 received=3 packet-loss=0% min-rtt=3ms901us avg-rtt=7ms40us max-rtt=13ms311us

Some operators, however, prefer a single RT per EVI to keep the configuration simple. The solution in the next section is mainly motivated by the need for finer granularity, but it also applies in this case:


    For this scenario, if it is desired to use only a single RT per EVI
    (just like E-LAN services in [RFC7432]), then approach B in Scenario
    2 (described below) needs to be used.

Without an RT scheme (Leaf-Indication)

Here Arista implements a solution that differs from RFC 8317. It is described in draft-bamberger-bess-imet-filter-evpn-etree-vxlan, to which Google also contributed.

In my view, the draft's solution is more logical and more efficient: unlike the RFC's solution, it filters BUM traffic on the ingress PE rather than on the egress PE. The RFC's less efficient approach probably comes from its aim to cover every E-Tree scenario it identifies. Yet the last scenario (MAC-address granularity) is, as far as I know, not that common. It is a pity to generalize a suboptimal approach for it, especially one that does not work with VXLAN (the draft's main motivation). More details at the end of the article.

Per-PE granularity is sometimes not enough, especially when the same EVI on the same PE contains both root and leaf ACs. For that case, the RFC describes a solution that "colors" the BGP advertisements with a Leaf-Indication carried in an extended community:


    In order to recognize the association of a destination MAC address to
    a Leaf or Root AC and, thus, support ingress filtering on the ingress
    PE with both Leaf and Root ACs, MAC addresses need to be colored with
    a Root or Leaf-Indication before advertising to other PEs.

As a diagram:

evpn-mpls-color — The IMET and MAC/IP routes advertised by `PE1` and `PE2` carry the Leaf-Indication.
Those advertised by `PE3` do not carry it (root role by default).

We will illustrate the Leaf-Indication with two cases:

The previous VLAN-Based VPLS, but with a single RT, to show that the RT import/export scheme is no longer needed
The previous VLAN-Aware Bundle VPLS with extra constraints, to show root and leaf ACs in the same EVI on the same PE

According to the RFC, the RT scheme described earlier could in fact be enough on its own (without the Leaf-Indication), but its implementation is heavier, so the RFC does not recommend it (see §3.2 and Appendix A). The RT scheme can also be combined with the Leaf-Indication, at the cost of extra complexity.

Case 1

A single RT, with the role set on the VLANs (the ACs):


    # Arista vEOS 4.32.2F

    !
    vlan 20
       e-tree role leaf
    !
    interface Ethernet11
       switchport trunk allowed vlan 20
       switchport mode trunk
    !
    router bgp 100
       vlan 20
          rd 1.1.1.1:2
          route-target both 100:2
          redistribute learned
    !


    # Arista vEOS 4.32.2F

    !
    vlan 20
       e-tree role leaf
    !
    interface Ethernet11
       switchport trunk allowed vlan 20
       switchport mode trunk
    !
    router bgp 100
       vlan 20
          rd 2.2.2.2:2
          route-target both 100:2
          redistribute learned
    !


    # Arista vEOS 4.32.2F

    !
    vlan 20
       # root role by default
    !
    interface Ethernet11
       switchport trunk allowed vlan 20
       switchport mode trunk
    !
    router bgp 100
       vlan 20
          rd 3.3.3.3:2
          route-target both 100:2
          redistribute learned
    !

The EVPN IMET and MAC/IP routes from PE1 and PE2 now carry a Leaf-Indication:

cap-evpn-mpls-etree2-imet — `PE2` advertises an IMET route to the `RR`, with the Leaf-Indication.
It shows the label that the other PEs (with the root role) will push to send BUM traffic to `PE2`.

cap-evpn-mpls-etree2-mac — `PE2` advertises `CE2`'s MAC address to the `RR` in a MAC/IP route, with the Leaf-Indication.

With this indication, PE1, which has the leaf role in this EVI, knows that it must not send CE1's BUM traffic to PE2. Likewise, since CE2's MAC address is marked leaf, PE1 knows that it must not send known unicast traffic destined to this MAC address.


    # Arista vEOS 4.32.2F

    # The EVPN routes (IMET and MAC/IP) advertised by PE2 carry the Leaf-Indication

    PE1#sho bgp evpn rd 2.2.2.2:2 detail

    BGP routing table entry for mac-ip 0cb5.466e.0001, Route Distinguisher: 2.2.2.2:2
          Extended Community: Route-Target-AS:100:2 TunnelEncap:tunnelTypeMpls EvpnEtree:L:0
          MPLS label: 423737 ESI: 0000:0000:0000:0000:0000

    BGP routing table entry for imet 2.2.2.2, Route Distinguisher: 2.2.2.2:2
          Extended Community: Route-Target-AS:100:2 TunnelEncap:tunnelTypeMpls EvpnEtree:L:0
          MPLS label: 423729

    # The EVPN routes (IMET and MAC/IP) advertised by PE3 do not carry the Leaf-Indication (⇒ root by default)

    PE1#sho bgp evpn rd 3.3.3.3:2 detail

    BGP routing table entry for mac-ip 0cfe.6f39.0000, Route Distinguisher: 3.3.3.3:2
          Extended Community: Route-Target-AS:100:2 TunnelEncap:tunnelTypeMpls
          MPLS label: 659854 ESI: 0000:0000:0000:0000:0000

    BGP routing table entry for imet 3.3.3.3, Route Distinguisher: 3.3.3.3:2
          Extended Community: Route-Target-AS:100:2 TunnelEncap:tunnelTypeMpls
          MPLS label: 659873


    # Arista vEOS 4.32.2F

    # The EVPN routes (IMET and MAC/IP) advertised by PE1 carry the Leaf-Indication

    PE2#sho bgp evpn rd 1.1.1.1:2 detail

    BGP routing table entry for mac-ip 0c83.7cdc.0001, Route Distinguisher: 1.1.1.1:2
          Extended Community: Route-Target-AS:100:2 TunnelEncap:tunnelTypeMpls EvpnEtree:L:0
          MPLS label: 534765 ESI: 0000:0000:0000:0000:0000

    BGP routing table entry for imet 1.1.1.1, Route Distinguisher: 1.1.1.1:2
          Extended Community: Route-Target-AS:100:2 TunnelEncap:tunnelTypeMpls EvpnEtree:L:0
          MPLS label: 534784

    # The EVPN routes (IMET and MAC/IP) advertised by PE3 do not carry the Leaf-Indication (⇒ root by default)

    PE2#sho bgp evpn rd 3.3.3.3:2 detail

    BGP routing table entry for mac-ip 0cfe.6f39.0000, Route Distinguisher: 3.3.3.3:2
          Extended Community: Route-Target-AS:100:2 TunnelEncap:tunnelTypeMpls
          MPLS label: 659854 ESI: 0000:0000:0000:0000:0000

    BGP routing table entry for imet 3.3.3.3, Route Distinguisher: 3.3.3.3:2
          Extended Community: Route-Target-AS:100:2 TunnelEncap:tunnelTypeMpls
          MPLS label: 659873


    # Arista vEOS 4.32.2F

    # The EVPN routes (IMET and MAC/IP) advertised by PE1 carry the Leaf-Indication

    PE3#sho bgp evpn rd 1.1.1.1:2 detail

    BGP routing table entry for mac-ip 0c83.7cdc.0001, Route Distinguisher: 1.1.1.1:2
          Extended Community: Route-Target-AS:100:2 TunnelEncap:tunnelTypeMpls EvpnEtree:L:0
          MPLS label: 534765 ESI: 0000:0000:0000:0000:0000

    BGP routing table entry for imet 1.1.1.1, Route Distinguisher: 1.1.1.1:2
          Extended Community: Route-Target-AS:100:2 TunnelEncap:tunnelTypeMpls EvpnEtree:L:0
          MPLS label: 534784

    # The EVPN routes (IMET and MAC/IP) advertised by PE2 carry the Leaf-Indication

    PE3#sho bgp evpn rd 2.2.2.2:2 detail

    BGP routing table entry for mac-ip 0cb5.466e.0001, Route Distinguisher: 2.2.2.2:2
          Extended Community: Route-Target-AS:100:2 TunnelEncap:tunnelTypeMpls EvpnEtree:L:0
          MPLS label: 423737 ESI: 0000:0000:0000:0000:0000

    BGP routing table entry for imet 2.2.2.2, Route Distinguisher: 2.2.2.2:2
          Extended Community: Route-Target-AS:100:2 TunnelEncap:tunnelTypeMpls EvpnEtree:L:0
          MPLS label: 423729


    # MikroTik CHR 7.15.3

    [admin@CE1] > ip/arp/print
    Flags: D - DYNAMIC; C - COMPLETE
    Columns: ADDRESS, MAC-ADDRESS, INTERFACE, STATUS
    #    ADDRESS       MAC-ADDRESS        INTERFACE  STATUS
    0 D  192.168.20.2                     vlan20     failed
    1 DC 192.168.20.3  0C:FE:6F:39:00:00  vlan20     reachable

    # CE1 (leaf) cannot ping CE2 (leaf)

    [admin@CE1] > ping 192.168.20.2 count=3
      SEQ HOST                                     SIZE TTL TIME       STATUS
        0 192.168.20.2                                                 timeout
        1 192.168.20.2                                                 timeout
        2 192.168.20.2                                                 timeout
        sent=3 received=0 packet-loss=100%

    # CE1 (leaf) can ping CE3 (root)

    [admin@CE1] > ping 192.168.20.3 count=3
      SEQ HOST                                     SIZE TTL TIME       STATUS
        0 192.168.20.3                               56  64 30ms984us
        1 192.168.20.3                               56  64 3ms627us
        2 192.168.20.3                               56  64 3ms645us
        sent=3 received=3 packet-loss=0% min-rtt=3ms627us avg-rtt=12ms752us max-rtt=30ms984us


    # MikroTik CHR 7.15.3

    [admin@CE2] > ip/arp/print
    Flags: D - DYNAMIC; C - COMPLETE
    Columns: ADDRESS, MAC-ADDRESS, INTERFACE, STATUS
    #    ADDRESS       MAC-ADDRESS        INTERFACE  STATUS
    0 D  192.168.20.1                     vlan20     failed
    1 DC 192.168.20.3  0C:FE:6F:39:00:00  vlan20     reachable

    # CE2 (leaf) cannot ping CE1 (leaf)

    [admin@CE2] > ping 192.168.20.1 count=3
      SEQ HOST                                     SIZE TTL TIME       STATUS
        0 192.168.20.1                                                 timeout
        1 192.168.20.1                                                 timeout
        2 192.168.20.1                                                 timeout
        sent=3 received=0 packet-loss=100%

    # CE2 (leaf) can ping CE3 (root)

    [admin@CE2] > ping 192.168.20.3 count=3
      SEQ HOST                                     SIZE TTL TIME       STATUS
        0 192.168.20.3                               56  64 5ms941us
        1 192.168.20.3                               56  64 3ms903us
        2 192.168.20.3                               56  64 3ms412us
        sent=3 received=3 packet-loss=0% min-rtt=3ms412us avg-rtt=4ms418us max-rtt=5ms941us


    # MikroTik CHR 7.15.3

    [admin@CE3] > ip/arp/print
    Flags: D - DYNAMIC; C - COMPLETE
    Columns: ADDRESS, MAC-ADDRESS, INTERFACE, STATUS
    #    ADDRESS       MAC-ADDRESS        INTERFACE  STATUS
    0 DC 192.168.20.1  0C:83:7C:DC:00:01  vlan20     reachable
    1 DC 192.168.20.2  0C:B5:46:6E:00:01  vlan20     reachable

In the capture below, PE1 (leaf) replicates CE1's BUM traffic to PE3 (root) only, not to PE2 (leaf):

cap-evpn-mpls-etree2-bum-pe1 — `PE1` replicates `CE1`'s BUM traffic to `PE3` only (label `659873`). The traffic consists of three ARP attempts that try, in vain, to resolve `CE2`'s IP address.

In the capture below, PE3 (root) replicates CE3's BUM traffic to PE1 and PE2 (both leaves):

cap-evpn-mpls-etree2-bum-pe3 — `PE3` replicates `CE3`'s BUM traffic (ARP to resolve `CE1`'s IP address) to `PE1` (label `534784`) and `PE2` (label `423729`).
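The two captures can be reproduced with a minimal Python model of the ingress filtering decisions described above. In this model the IMET routes also carry the Leaf-Indication, as observed in this lab. It is a sketch of the observed behaviour, not Arista's code. The `EvpnRoute` class and both helpers are hypothetical. The labels are the ones shown in the BGP tables of PE1 and PE3 above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EvpnRoute:
    """Simplified IMET or MAC/IP route as seen in a PE's BGP table (hypothetical model)."""

    originator: str
    label: int
    leaf: bool  # True if the route carries the Leaf-Indication (EvpnEtree:L)


def bum_targets(local_is_leaf, imet_routes):
    """Ingress filtering of BUM: a leaf PE skips the remote PEs whose IMET
    route carries the Leaf-Indication; a root PE replicates to every remote PE."""
    return [
        (r.originator, r.label)
        for r in imet_routes
        if not (local_is_leaf and r.leaf)
    ]


def known_unicast_allowed(local_is_leaf, mac_route):
    """Ingress filtering of known unicast: leaf-to-leaf destinations are dropped."""
    return not (local_is_leaf and mac_route.leaf)


# IMET routes in PE1's BGP table (PE1 is leaf)
pe1_imets = [EvpnRoute("PE2", 423729, True), EvpnRoute("PE3", 659873, False)]
# IMET routes in PE3's BGP table (PE3 is root)
pe3_imets = [EvpnRoute("PE1", 534784, True), EvpnRoute("PE2", 423729, True)]
# MAC/IP route for CE2's MAC address, advertised by PE2
ce2_mac = EvpnRoute("PE2", 423737, True)

print(bum_targets(True, pe1_imets))   # PE1 replicates to PE3 only
print(bum_targets(False, pe3_imets))  # PE3 replicates to PE1 and PE2
print(known_unicast_allowed(True, ce2_mac))
```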

Case 2

Here we take the previous VLAN-Aware Bundle service and add the following constraints:

VLAN 20: all CEs can communicate with each other (full-mesh topology)
VLAN 30: E-Tree topology, with CE1 and CE2 as leaves and CE3 as root

This gives us ACs with different roles in the same EVI on the same PE.


    # Arista vEOS 4.32.2F

    !
    vlan 20
    !
    vlan 30
       e-tree role leaf
    !
    interface Ethernet11
       switchport trunk allowed vlan 20,30
       switchport mode trunk
    !
    router bgp 100
      vlan-aware-bundle MY-BUNDLE
         rd 1.1.1.1:2
         route-target both 100:2
         redistribute learned
         vlan 20,30
    !


    # Arista vEOS 4.32.2F

    !
    vlan 20
    !
    vlan 30
       e-tree role leaf
    !
    interface Ethernet11
       switchport trunk allowed vlan 20,30
       switchport mode trunk
    !
    router bgp 100
      vlan-aware-bundle MY-BUNDLE
         rd 2.2.2.2:2
         route-target both 100:2
         redistribute learned
         vlan 20,30
    !


    # Arista vEOS 4.32.2F

    !
    vlan 20
    !
    vlan 30
       # root role by default
    !
    interface Ethernet11
       switchport trunk allowed vlan 20,30
       switchport mode trunk
    !
    router bgp 100
      vlan-aware-bundle MY-BUNDLE
         rd 3.3.3.3:2
         route-target both 100:2
         redistribute learned
         vlan 20,30
    !

As before, even before the CEs send any traffic, each PE sends the RR an EVPN IMET route. This time there is one route per VLAN, and the VLAN 30 route carries the Leaf-Indication:

cap-evpn-mpls-etree3-imet20 — `PE2` advertises an IMET route to the `RR`, without the Leaf-Indication.
It shows the label that the other PEs will push to send BUM traffic to `PE2` in this VLAN.

cap-evpn-mpls-etree3-imet30 — `PE2` advertises an IMET route to the `RR`, with the Leaf-Indication.
It shows the label that the other PEs (with the root role) will push to send BUM traffic to `PE2` in this VLAN.

PE2 allocated the same MPLS label, 423670 here, for BUM traffic in VLANs 20 and 30, even though the two VLANs have different roles (root for the former, leaf for the latter). An example capture:

cap-evpn-mpls-etree3-bum-pe1 — `PE1` replicates `CE1`'s BUM traffic in VLAN `20` to `PE2` and `PE3` (full-mesh topology).
`PE1` replicates `CE1`'s BUM traffic in VLAN `30` to `PE3` only (E-Tree topology).

These are the EVPN MAC/IP routes advertised by PE1 to the RR. The VLAN 30 route carries the Leaf-Indication and the VLAN 20 route does not:

cap-evpn-mpls-etree3-mac20 — In a MAC/IP route, `PE1` advertises to the `RR` the MAC address of `CE1` in VLAN `20`, without the Leaf-Indication.

cap-evpn-mpls-etree3-mac30 — In a MAC/IP route, `PE1` advertises to the `RR` the MAC address of `CE1` in VLAN `30`, with the Leaf-Indication.

The corresponding outputs:


    # Arista vEOS 4.32.2F

    # CE2's MAC address in VLAN 20 (root) was imported into the bridge table
    #    along with CE3's in VLAN 20 (also root)

    PE1#show mac address-table vlan 20

    Vlan    Mac Address       Type        Ports      Moves   Last Move
    ----    -----------       ----        -----      -----   ---------
      20    0c83.7cdc.0001    DYNAMIC     Et11       1       0:06:05 ago
      20    0cb5.466e.0001    DYNAMIC     Mt1        1       0:05:41 ago
      20    0cfe.6f39.0000    DYNAMIC     Mt1        1       0:05:39 ago

    # CE2's MAC address in VLAN 30 (leaf) was not imported into the bridge table
    #   only CE3's in VLAN 30 (root) was imported

    PE1#show mac address-table vlan 30

    Vlan    Mac Address       Type        Ports      Moves   Last Move
    ----    -----------       ----        -----      -----   ---------
      30    0c83.7cdc.0099    DYNAMIC     Et11       1       0:04:25 ago
      30    0cfe.6f39.0099    DYNAMIC     Mt1        1       0:04:25 ago

    # The EVPN routes (IMET and MAC/IP) advertised by PE2 (VLAN 30 tagged leaf)

    PE1#sho bgp evpn rd 2.2.2.2:2 detail

    BGP routing table entry for mac-ip 20 0cb5.466e.0001, Route Distinguisher: 2.2.2.2:2
          Extended Community: Route-Target-AS:100:2 TunnelEncap:tunnelTypeMpls
          MPLS label: 423716 ESI: 0000:0000:0000:0000:0000

    BGP routing table entry for mac-ip 30 0cb5.466e.0099, Route Distinguisher: 2.2.2.2:2
          Extended Community: Route-Target-AS:100:2 TunnelEncap:tunnelTypeMpls EvpnEtree:L:0
          MPLS label: 423716 ESI: 0000:0000:0000:0000:0000

    BGP routing table entry for imet 20 2.2.2.2, Route Distinguisher: 2.2.2.2:2
          Extended Community: Route-Target-AS:100:2 TunnelEncap:tunnelTypeMpls
          MPLS label: 423670

    BGP routing table entry for imet 30 2.2.2.2, Route Distinguisher: 2.2.2.2:2
          Extended Community: Route-Target-AS:100:2 TunnelEncap:tunnelTypeMpls EvpnEtree:L:0
          MPLS label: 423670

    # The EVPN routes (IMET and MAC/IP) advertised by PE3

    PE1#sho bgp evpn rd 3.3.3.3:2 detail

    BGP routing table entry for mac-ip 20 0cfe.6f39.0000, Route Distinguisher: 3.3.3.3:2
          Extended Community: Route-Target-AS:100:2 TunnelEncap:tunnelTypeMpls
          MPLS label: 659821 ESI: 0000:0000:0000:0000:0000

    BGP routing table entry for mac-ip 30 0cfe.6f39.0099, Route Distinguisher: 3.3.3.3:2
          Extended Community: Route-Target-AS:100:2 TunnelEncap:tunnelTypeMpls
          MPLS label: 659821 ESI: 0000:0000:0000:0000:0000

    BGP routing table entry for imet 20 3.3.3.3, Route Distinguisher: 3.3.3.3:2
          Extended Community: Route-Target-AS:100:2 TunnelEncap:tunnelTypeMpls
          MPLS label: 659820

    BGP routing table entry for imet 30 3.3.3.3, Route Distinguisher: 3.3.3.3:2
          Extended Community: Route-Target-AS:100:2 TunnelEncap:tunnelTypeMpls
          MPLS label: 659820


    # MikroTik CHR 7.15.3

    [admin@CE1] > ip/arp/print
    Flags: D - DYNAMIC; C - COMPLETE
    Columns: ADDRESS, MAC-ADDRESS, INTERFACE, STATUS
    #    ADDRESS       MAC-ADDRESS        INTERFACE  STATUS
    0 DC 192.168.20.2  0C:B5:46:6E:00:01  vlan20     reachable
    2 DC 192.168.20.3  0C:FE:6F:39:00:00  vlan20     reachable
    1 D  192.168.30.2                     br30       failed
    3 DC 192.168.30.3  0C:FE:6F:39:00:99  br30       reachable

    # CE1 can ping CE2 and CE3 in VLAN 20

    [admin@CE1] > ping 192.168.20.2 count=3
      SEQ HOST                                     SIZE TTL TIME       STATUS
        0 192.168.20.2                               56  64 4ms202us
        1 192.168.20.2                               56  64 3ms775us
        2 192.168.20.2                               56  64 3ms746us
        sent=3 received=3 packet-loss=0% min-rtt=3ms746us avg-rtt=3ms907us max-rtt=4ms202us

    [admin@CE1] > ping 192.168.20.3 count=3
      SEQ HOST                                     SIZE TTL TIME       STATUS
        0 192.168.20.3                               56  64 3ms288us
        1 192.168.20.3                               56  64 3ms563us
        2 192.168.20.3                               56  64 3ms845us
        sent=3 received=3 packet-loss=0% min-rtt=3ms288us avg-rtt=3ms565us max-rtt=3ms845us

    # CE1 (leaf) can only ping CE3 (root) in VLAN 30

    [admin@CE1] > ping 192.168.30.2 count=3
      SEQ HOST                                     SIZE TTL TIME       STATUS
        0 192.168.30.2                                                 timeout
        1 192.168.30.2                                                 timeout
        2 192.168.30.2                                                 timeout
        sent=3 received=0 packet-loss=100%

    [admin@CE1] > ping 192.168.30.3 count=3
      SEQ HOST                                     SIZE TTL TIME       STATUS
        0 192.168.30.3                               56  64 11ms328us
        1 192.168.30.3                               56  64 3ms920us
        2 192.168.30.3                               56  64 3ms323us
        sent=3 received=3 packet-loss=0% min-rtt=3ms323us avg-rtt=6ms190us max-rtt=11ms328us

Ingress vs egress filtering

In the Leaf-Indication solution, RFC 8317 distinguishes two cases. Ingress filtering applies to known unicast traffic, and egress filtering applies to BUM traffic. draft-bamberger-bess-imet-filter-evpn-etree-vxlan, published in March 2023, sums it up well:


    [RFC8317] defines two different filtering
    methods to achieve the required segmentation:

    1.  Ingress filtering, applicable to known unicast traffic
    2.  Egress filtering, applicable to broadcast, unknown
        unicast, and multicast (BUM) traffic

Arista nevertheless implements a different solution, the one described in this draft. As the captures showed, the IMET routes carry the Leaf-Indication, which lets the PEs filter BUM traffic at ingress. Let us see why Arista (and Google, co-author of the draft) made this choice.

Solution proposed by RFC 8317

How filtering on the ingress PE works. A PE advertises the leaf MAC addresses of its EVI in BGP MAC/IP routes that carry the Leaf-Indication in an extended community:


         […]  For known unicast traffic, additional extensions to
    [RFC7432] are needed (i.e., a new BGP extended community for Leaf-
    Indication described in Section 6.1) in order to enable ingress
    filtering as described in detail in the following sections.

When the other PEs receive these BGP advertisements, they know which MACs are leaves and can filter at ingress: "I receive a frame destined to a MAC advertised as leaf, and I am a leaf myself ⇒ drop".

How filtering on the egress PE works. But what about BUM traffic? How could a PE signal that the MAC addresses it will learn later in an EVI are leaves? The RFC says it cannot:


    This specification does not provide support for filtering Broadcast,
    Unknown Unicast, and Multicast (BUM) traffic on the ingress PE; due
    to the multidestination nature of BUM traffic, it is not possible to
    perform filtering of the same on the ingress PE.  As such, the
    solution relies on egress filtering.  In order to apply the proper
    egress filtering, which varies based on whether a packet is sent from
    a Leaf AC or a Root AC, the MPLS-encapsulated frames MUST be tagged
    with an indication of when they originated from a Leaf AC (i.e., to
    be tagged with a Leaf label as specified in Section 6.1).

The proposed solution is for the leaf PE to push a specific MPLS label on the BUM frames it originates, which means the marking happens in the data plane. The other PEs know what this label means (see §4.2), so they can filter at egress: "the received BUM frame carries a Leaf label, and I am a leaf myself ⇒ drop".

But…

Egress-PE filtering is suboptimal, since frames are replicated across the core only to be dropped by the egress PE. Above all, it is incompatible with VXLAN, whose data plane carries no MPLS label:


               […]  However, egress filtering for BUM traffic relies on
    specific features of MPLS encapsulation, specifically, the ability to
    attach multiple labels to each data packet.  Therefore, egress
    filtering of BUM traffic as defined by [RFC8317] doesn't work
    unmodified for networks using VXLAN encapsulation.
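The replication cost can be made concrete with a deliberately simplified Python model that follows the description above. It assumes one role per PE (PE1 and PE2 leaf, PE3 root, as in the lab) and ingress replication. It only counts copies and does not model the Leaf label encoding. The function names are hypothetical. The model counts the copies of one BUM frame from CE1 that cross the core under each approach.

```python
# Roles per PE for the E-Tree VLAN in the lab (simplified: one role per PE)
ROLES = {"PE1": "leaf", "PE2": "leaf", "PE3": "root"}


def egress_filtering(src_pe):
    """RFC 8317 as summarised above: the ingress PE replicates to every remote
    PE and marks frames from a leaf AC with a Leaf label. Each egress PE then
    drops the marked frames towards its leaf ACs."""
    sent = [pe for pe in ROLES if pe != src_pe]
    marked_leaf = ROLES[src_pe] == "leaf"
    delivered = [pe for pe in sent if not (marked_leaf and ROLES[pe] == "leaf")]
    return sent, delivered


def ingress_filtering(src_pe):
    """Draft approach: IMET routes carry the Leaf-Indication, so a leaf ingress
    PE never replicates towards leaf PEs in the first place."""
    src_is_leaf = ROLES[src_pe] == "leaf"
    sent = [
        pe
        for pe in ROLES
        if pe != src_pe and not (src_is_leaf and ROLES[pe] == "leaf")
    ]
    return sent, sent


for name, approach in (
    ("egress (RFC 8317)", egress_filtering),
    ("ingress (draft)", ingress_filtering),
):
    sent, delivered = approach("PE1")
    print(f"{name}: sent to {sent}, delivered by {delivered}")
```

Both approaches deliver to the same PEs. The egress variant also sends a copy to PE2, which PE2 then discards.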

The draft therefore proposes ingress filtering for all traffic types (known unicast and BUM) by tagging the IMET routes as leaf, as seen in the earlier captures. However, the solution only supports the first two E-Tree scenarios:


    [RFC8317] defines 3 primary methods for classifying hosts as roots
    and leafs:

    1.  Each PE site (VTEP) contains only root or only leaf hosts
    2.  Each attachment circuit (VLAN) contains only root or only leaf hosts
    3.  Each host (MAC) can be individually classified as a root or a leaf

    This document will define an approach for performing ingress
    filtering for BUM traffic, in addition to known unicast traffic, for
    networks using VXLAN encapsulation and E-Tree role classification
    using either method 1 or method 2 defined above.  E-Tree filtering of
    BUM traffic for VXLAN networks using method 3 for E-Tree role
    classification (which is also not covered in [RFC8317]) is outside
    the scope of this document.

In other words, the solution does not cover the last scenario (MAC-address granularity). As far as I know, the first two scenarios (EVI and AC granularity) are the most common. The draft's solution also seems more natural and efficient to me, so I think it holds up. We will see whether it becomes a standard!