
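OpenStack Networking: VM Communication Experiment 1

VM traffic in OpenStack is usually divided into east-west and north-south traffic. A test environment can run into all kinds of traffic problems, and knowing the path each kind of traffic takes makes it much faster to locate a fault when one shows up.

The plan is to break this down over several posts and analyze the various traffic paths in OpenStack in detail.

This post takes the simplest case, a single node, and walks through the path traffic takes between VMs on the same subnet.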

Environment

  • OpenStack: Stein (all-in-one)
  • Host: Ubuntu 18.04
  • Network driver: openvswitch

Prerequisites

  1. External network: MyEx
  2. Image: cirros
  3. Flavor: small

Experiment

VMs on the same subnet

Create the network

$ openstack network create net1
$ openstack subnet create --network net1 --subnet-range 200.0.0.0/24 sub1

Create the VMs

$ openstack server create --flavor small --image cirros --network net1 --min 2 --max 2 vm
$ openstack server list
+--------------------------------------+------+--------+------------------+--------+--------+
| ID | Name | Status | Networks | Image | Flavor |
+--------------------------------------+------+--------+------------------+--------+--------+
| ba3d4162-6a4d-45dc-9b41-0aa2e2ae0c88 | vm-1 | ACTIVE | net1=200.0.0.16 | cirros | small |
| d349a3a1-fdae-473c-83be-ac02ab5997f3 | vm-2 | ACTIVE | net1=200.0.0.224 | cirros | small |
+--------------------------------------+------+--------+------------------+--------+--------+

Two VMs were created:

vm-1: 200.0.0.16

vm-2: 200.0.0.224

First, confirm that the two VMs can reach each other.

What happened?

linux bridge

$ brctl show
bridge name     bridge id           STP enabled  interfaces
brq459c374c-d3  8000.da9618bd3c8d   no           vxlan-6
qbra95f52ba-a2  8000.c26ea2427d47   no           qvba95f52ba-a2
                                                 tapa95f52ba-a2
qbrebeae637-5c  8000.e2ef56b21ffb   no           qvbebeae637-5c
                                                 tapebeae637-5c

Ignore the brq459c374c-d3 bridge for now.

OpenStack

First, record the port information on the OpenStack side.

$ openstack port list
+--------------------------------------+------+-------------------+----------------------------------------------------------------------------+--------+
| ID | Name | MAC Address | Fixed IP Addresses | Status |
+--------------------------------------+------+-------------------+----------------------------------------------------------------------------+--------+
| a333499b-e5ea-4658-9ec4-cc9b1942d660 | | fa:16:3e:c7:8c:10 | ip_address='192.168.0.2', subnet_id='a5b25645-c5df-486b-9a49-9412eebc2e59' | ACTIVE |
| a95f52ba-a255-46a7-b3ca-2c870de30e2d | | fa:16:3e:53:d5:50 | ip_address='200.0.0.224', subnet_id='690d8c09-5f55-4b0c-b673-aff967fb0765' | ACTIVE |
| e013286e-b707-4992-b9e2-4c1f77d465b1 | | fa:16:3e:45:00:dc | ip_address='200.0.0.2', subnet_id='690d8c09-5f55-4b0c-b673-aff967fb0765' | ACTIVE |
| ebeae637-5c92-4a5d-943c-27a7c3abfe1f | | fa:16:3e:17:60:c2 | ip_address='200.0.0.16', subnet_id='690d8c09-5f55-4b0c-b673-aff967fb0765' | ACTIVE |
+--------------------------------------+------+-------------------+----------------------------------------------------------------------------+--------+

ID           Purpose                             IP
a333499b-e5  DHCP port of external network MyEx  192.168.0.2
a95f52ba-a2  vm-2                                200.0.0.224
e013286e-b7  DHCP port of tenant network net1    200.0.0.2
ebeae637-5c  vm-1                                200.0.0.16

Network interfaces

$ ip link show
...
45: qbra95f52ba-a2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue state UP mode DEFAULT group default qlen 1000
link/ether c2:6e:a2:42:7d:47 brd ff:ff:ff:ff:ff:ff
# qbra95f52ba-a2: Linux bridge, belongs to vm-2
46: qvoa95f52ba-a2@qvba95f52ba-a2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue master ovs-system state UP mode DEFAULT group default qlen 1000
link/ether 72:63:10:8c:b6:b4 brd ff:ff:ff:ff:ff:ff
47: qvba95f52ba-a2@qvoa95f52ba-a2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue master qbra95f52ba-a2 state UP mode DEFAULT group default qlen 1000
link/ether c2:6e:a2:42:7d:47 brd ff:ff:ff:ff:ff:ff
# qvoa95f52ba-a2 and qvba95f52ba-a2: veth pair
48: qbrebeae637-5c: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue state UP mode DEFAULT group default qlen 1000
link/ether e2:ef:56:b2:1f:fb brd ff:ff:ff:ff:ff:ff
# qbrebeae637-5c: Linux bridge, belongs to vm-1
49: qvoebeae637-5c@qvbebeae637-5c: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue master ovs-system state UP mode DEFAULT group default qlen 1000
link/ether 92:19:d7:5f:b7:0c brd ff:ff:ff:ff:ff:ff
50: qvbebeae637-5c@qvoebeae637-5c: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue master qbrebeae637-5c state UP mode DEFAULT group default qlen 1000
link/ether e2:ef:56:b2:1f:fb brd ff:ff:ff:ff:ff:ff
# qvoebeae637-5c and qvbebeae637-5c: veth pair
51: tapa95f52ba-a2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc fq_codel master qbra95f52ba-a2 state UNKNOWN mode DEFAULT group default qlen 1000
link/ether fe:16:3e:53:d5:50 brd ff:ff:ff:ff:ff:ff
# tapa95f52ba-a2: tap interface, attached to VM vm-2 as its vnet device
52: tapebeae637-5c: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc fq_codel master qbrebeae637-5c state UNKNOWN mode DEFAULT group default qlen 1000
link/ether fe:16:3e:17:60:c2 brd ff:ff:ff:ff:ff:ff
# tapebeae637-5c: tap interface, attached to VM vm-1 as its vnet device
53: vxlan-6: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue master brq459c374c-d3 state UNKNOWN mode DEFAULT group default qlen 1000
link/ether da:96:18:bd:3c:8d brd ff:ff:ff:ff:ff:ff
# overlay interface
54: brq459c374c-d3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue state UP mode DEFAULT group default qlen 1000
link/ether da:96:18:bd:3c:8d brd ff:ff:ff:ff:ff:ff
# overlay bridge

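Two IDs appear in the output above, a95f52ba-a2 and ebeae637-5c. They correspond to the IDs of the two OpenStack ports (each is the first 11 characters of the port ID).

For each ID there are four prefixes: qbr, qvb, qvo and tap:

Prefix  Description
qbr     Linux bridge
qvb     Forms a veth pair with qvo; qvb is attached to the qbr bridge
qvo     Forms a veth pair with qvb; qvo is attached to the OVS bridge br-int
tap     tap interface; the host-side end of the NIC inside the VM (it acts as the peer of the guest NIC, although it is not a kernel veth pair)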
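
The naming rule in the table can be sketched in a few lines of shell. This is only an illustration: the helper name port_devs is made up here, and it assumes what the outputs above suggest, namely that each device name is one of the four prefixes followed by the first 11 characters of the port ID.

```shell
# Hypothetical helper: list the host devices expected for one Neutron port ID.
# Assumption: device name = prefix + first 11 characters of the port ID.
port_devs() {
    short=$(printf '%.11s' "$1")      # e.g. ebeae637-5c
    for prefix in tap qbr qvb qvo; do
        printf '%s%s\n' "$prefix" "$short"
    done
}

# Port ID of vm-1, taken from the port list above.
port_devs ebeae637-5c92-4a5d-943c-27a7c3abfe1f
```

For vm-1's port this prints tapebeae637-5c, qbrebeae637-5c, qvbebeae637-5c and qvoebeae637-5c, which are the names seen in the ip link output.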

namespace

$ ip netns list
qdhcp-459c374c-d347-4c3d-8dca-7dbb6b403f4f (id: 0)
qdhcp-04d375a6-7600-4a62-87ff-9d66a41d15b1 (id: 1)

The IDs in the two namespace names correspond to the IDs of the two networks in OpenStack:

$ openstack network list
+--------------------------------------+------+--------------------------------------+
| ID | Name | Subnets |
+--------------------------------------+------+--------------------------------------+
| 04d375a6-7600-4a62-87ff-9d66a41d15b1 | MyEx | a5b25645-c5df-486b-9a49-9412eebc2e59 |
| 459c374c-d347-4c3d-8dca-7dbb6b403f4f | net1 | 690d8c09-5f55-4b0c-b673-aff967fb0765 |
+--------------------------------------+------+--------------------------------------+
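
The mapping between a network and its DHCP namespace can be written down as a one-line helper. This is a sketch based only on the two names observed above (qdhcp- followed by the full network ID); the function name dhcp_netns is hypothetical.

```shell
# Hypothetical helper: build the DHCP namespace name for a network ID.
# Assumption: the namespace is named "qdhcp-" + the full network ID,
# as seen in the `ip netns list` output above.
dhcp_netns() {
    printf 'qdhcp-%s\n' "$1"
}

# Network ID of net1, taken from the network list above.
dhcp_netns 459c374c-d347-4c3d-8dca-7dbb6b403f4f
```

On a live node the ID could be looked up with `openstack network show net1 -f value -c id` and the result passed to `ip netns exec`.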

Inside the namespaces:

$ ip netns exec qdhcp-459c374c-d347-4c3d-8dca-7dbb6b403f4f ifconfig
...
tape013286e-b7: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1450
inet 169.254.169.254 netmask 255.255.0.0 broadcast 169.254.255.255
inet6 fe80::f816:3eff:fe45:dc prefixlen 64 scopeid 0x20<link>
ether fa:16:3e:45:00:dc txqueuelen 1000 (Ethernet)
RX packets 254 bytes 18301 (18.3 KB)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 161 bytes 17139 (17.1 KB)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
$ ip netns exec qdhcp-04d375a6-7600-4a62-87ff-9d66a41d15b1 ifconfig
...
tapa333499b-e5: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500
inet 169.254.169.254 netmask 255.255.0.0 broadcast 169.254.255.255
inet6 fe80::f816:3eff:fec7:8c10 prefixlen 64 scopeid 0x20<link>
ether fa:16:3e:c7:8c:10 txqueuelen 1000 (Ethernet)
RX packets 153628 bytes 12984859 (12.9 MB)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 151174 bytes 14352784 (14.3 MB)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
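
  • Each namespace has one interface: tape013286e-b7 and tapa333499b-e5 respectively.

  • Because these two interfaces live inside namespaces, they do not show up in ip link show on the host.

  • Their IDs also correspond to port IDs in OpenStack. Both interfaces appear again later, on the OVS bridge br-int.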

openvswitch

$ ovs-vsctl show
479f3788-7afb-48a4-accd-eb173f318715
...
    Bridge br-int
        Controller "tcp:127.0.0.1:6633"
            is_connected: true
        fail_mode: secure
        Port int-br-ext
            Interface int-br-ext
                type: patch
                options: {peer=phy-br-ext}
        Port "tapa333499b-e5"
            tag: 2
            Interface "tapa333499b-e5"
                type: internal
        Port "tape013286e-b7"
            tag: 3
            Interface "tape013286e-b7"
                type: internal
        Port "qvoebeae637-5c"
            tag: 3
            Interface "qvoebeae637-5c"
        Port "qvoa95f52ba-a2"
            tag: 3
            Interface "qvoa95f52ba-a2"
        Port patch-tun
            Interface patch-tun
                type: patch
                options: {peer=patch-int}
        Port int-br-provider
            Interface int-br-provider
                type: patch
                options: {peer=phy-br-provider}
        Port br-int
            Interface br-int
                type: internal
    ovs_version: "2.11.0"

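Four ports on br-int matter here: tapa333499b-e5, tape013286e-b7, qvoebeae637-5c and qvoa95f52ba-a2. All four have already come up above.

Port            Description
tapa333499b-e5  DHCP for MyEx; lives in the qdhcp-04d375a6-76... namespace
tape013286e-b7  DHCP for net1; lives in the qdhcp-459c374c-d3... namespace
qvoebeae637-5c  Connected to vm-1 through its veth peer qvbebeae637-5c
qvoa95f52ba-a2  Connected to vm-2 through its veth peer qvba95f52ba-a2

OVS flow tables

To make the output easier to read, the cookie and duration fields were removed by hand.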

$ ovs-ofctl dump-flows br-int
table=0, n_packets=0, n_bytes=0, priority=65535,vlan_tci=0x0fff/0x1fff actions=drop
table=0, n_packets=0, n_bytes=0, priority=10,icmp6,in_port="qvoebeae637-5c",icmp_type=136 actions=resubmit(,24)
table=0, n_packets=0, n_bytes=0, priority=10,icmp6,in_port="qvoa95f52ba-a2",icmp_type=136 actions=resubmit(,24)
table=0, n_packets=1073, n_bytes=45066, priority=10,arp,in_port="qvoebeae637-5c" actions=resubmit(,24)
table=0, n_packets=1072, n_bytes=45024, priority=10,arp,in_port="qvoa95f52ba-a2" actions=resubmit(,24)
table=0, n_packets=3103, n_bytes=926990, priority=2,in_port="int-br-provider" actions=drop
table=0, n_packets=21, n_bytes=5096, priority=2,in_port="int-br-ext" actions=drop
table=0, n_packets=18494, n_bytes=1811307, priority=9,in_port="qvoebeae637-5c" actions=resubmit(,25)
table=0, n_packets=18499, n_bytes=1811728, priority=9,in_port="qvoa95f52ba-a2" actions=resubmit(,25)
table=0, n_packets=2654, n_bytes=849079, priority=3,in_port="int-br-ext",vlan_tci=0x0000/0x1fff actions=mod_vlan_vid:2,resubmit(,60)
table=0, n_packets=763992, n_bytes=72539656, priority=0 actions=resubmit(,60)
table=23, n_packets=0, n_bytes=0, priority=0 actions=drop
table=24, n_packets=0, n_bytes=0, priority=2,icmp6,in_port="qvoebeae637-5c",icmp_type=136,nd_target=fe80::f816:3eff:fe17:60c2 actions=resubmit(,60)
table=24, n_packets=0, n_bytes=0, priority=2,icmp6,in_port="qvoa95f52ba-a2",icmp_type=136,nd_target=fe80::f816:3eff:fe53:d550 actions=resubmit(,60)
table=24, n_packets=1073, n_bytes=45066, priority=2,arp,in_port="qvoebeae637-5c",arp_spa=200.0.0.16 actions=resubmit(,25)
table=24, n_packets=1072, n_bytes=45024, priority=2,arp,in_port="qvoa95f52ba-a2",arp_spa=200.0.0.224 actions=resubmit(,25)
table=24, n_packets=0, n_bytes=0, priority=0 actions=drop
table=25, n_packets=19555, n_bytes=1855533, priority=2,in_port="qvoebeae637-5c",dl_src=fa:16:3e:17:60:c2 actions=resubmit(,60)
table=25, n_packets=19560, n_bytes=1855982, priority=2,in_port="qvoa95f52ba-a2",dl_src=fa:16:3e:53:d5:50 actions=resubmit(,60)
table=60, n_packets=963419, n_bytes=92070282, priority=3 actions=NORMAL
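
The cookie and duration fields were stripped by hand for the dump above. A small filter along the following lines could do the same job. It is only a sketch: the function name strip_flow_noise is made up, and it assumes each flow line starts with the usual "cookie=..., duration=..., " prefix that ovs-ofctl prints.

```shell
# Hypothetical helper: drop the cookie= and duration= fields from flow lines on stdin.
strip_flow_noise() {
    sed -e 's/cookie=[^,]*, *//' \
        -e 's/duration=[^,]*, *//' \
        -e 's/^ *//'
}

# Sample line in the assumed ovs-ofctl format (the values are made up).
printf '%s\n' ' cookie=0x1, duration=12.5s, table=60, n_packets=1, n_bytes=42, priority=3 actions=NORMAL' |
    strip_flow_noise
```

On the node itself the filter would sit at the end of a pipe, for example `ovs-ofctl dump-flows br-int | strip_flow_noise`.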

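Ignore the icmp6 flows for now.

table=0

  • ARP packets whose in_port is qvoebeae637-5c or qvoa95f52ba-a2 are sent to table=24
  • All other packets whose in_port is qvoebeae637-5c or qvoa95f52ba-a2 are sent to table=25
  • The remaining packets are sent to table=60, apart from the traffic dropped by the int-br-provider and int-br-ext rules

table=24

  • ARP packets from qvoebeae637-5c with arp_spa 200.0.0.16, and from qvoa95f52ba-a2 with arp_spa 200.0.0.224, are sent to table=25; any other ARP packet is dropped

table=25

  • Packets from qvoebeae637-5c with source MAC (dl_src) fa:16:3e:17:60:c2, and from qvoa95f52ba-a2 with source MAC fa:16:3e:53:d5:50, are sent to table=60

table=60

  • Normal forwarding (actions=NORMAL)

Taken together, these flows mean that packets entering on qvoebeae637-5c or qvoa95f52ba-a2 are, as long as they carry the port's own IP and MAC, forwarded normally on br-int.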
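
The decision the flows make for the two VM ports can be mimicked with a small shell function. This is a simplified model written for illustration, not something OVS provides: the name classify is made up, only the two qvo ports are modeled, icmp6 is ignored, and a packet that misses table 25 is assumed to be dropped.

```shell
# Hypothetical model of tables 0/24/25/60 for the two VM ports.
# Usage: classify IN_PORT PROTO ARP_SPA DL_SRC   (use "-" where a field does not apply)
classify() {
    in_port=$1; proto=$2; arp_spa=$3; dl_src=$4
    case $in_port in
        qvoebeae637-5c) ip=200.0.0.16;  mac=fa:16:3e:17:60:c2 ;;
        qvoa95f52ba-a2) ip=200.0.0.224; mac=fa:16:3e:53:d5:50 ;;
        *) echo unmodeled; return ;;
    esac
    # table 0 sends ARP to table 24, which only accepts the port's own IP as arp_spa.
    if [ "$proto" = arp ] && [ "$arp_spa" != "$ip" ]; then
        echo drop; return
    fi
    # table 25 only accepts the port's own MAC as source (assumed: a miss means drop).
    if [ "$dl_src" != "$mac" ]; then
        echo drop; return
    fi
    # table 60: actions=NORMAL
    echo NORMAL
}

classify qvoebeae637-5c arp 200.0.0.16 fa:16:3e:17:60:c2   # prints NORMAL
classify qvoebeae637-5c arp 200.0.0.99 fa:16:3e:17:60:c2   # prints drop
```

The second call shows the point of tables 24 and 25: a VM that claims an IP or MAC other than the one on its port does not get past br-int.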

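To recap

  1. Each VM is connected to a tap interface through its vnet device
  2. The tap interface and a qvbxxx interface sit in the same Linux bridge
  3. The veth peer of qvbxxx sits on the OVS bridge br-int
  4. The DHCP service runs inside a Linux namespace and uses a tap interface, which is also a port on the OVS bridge br-int

Packet tracing

Traffic between VMs

Analysis

From the topology above, traffic from vm-1 to vm-2 takes the following path: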

  1. tapebeae637-5c
  2. qbrebeae637-5c
  3. qvbebeae637-5c
  4. qvoebeae637-5c
  5. br-int
  6. qvoa95f52ba-a2
  7. qvba95f52ba-a2
  8. qbra95f52ba-a2
  9. tapa95f52ba-a2
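
The same hop list can be generated for any pair of ports on this node. The sketch below assumes the naming rule seen earlier (prefix plus the first 11 characters of the port ID); the function name vm_path is made up for illustration.

```shell
# Hypothetical helper: print the expected hops between two VM ports on one node.
# Assumption: device name = prefix + first 11 characters of the port ID.
vm_path() {
    src=$(printf '%.11s' "$1")
    dst=$(printf '%.11s' "$2")
    printf '%s\n' "tap$src" "qbr$src" "qvb$src" "qvo$src" br-int \
                  "qvo$dst" "qvb$dst" "qbr$dst" "tap$dst"
}

# vm-1's port first, then vm-2's port (IDs from the port list above).
vm_path ebeae637-5c92-4a5d-943c-27a7c3abfe1f a95f52ba-a255-46a7-b3ca-2c870de30e2d
```

Each printed name other than br-int is a regular host interface, so it could be handed to `tcpdump -n -i <name> icmp` to watch the ping at that hop.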

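Packet capture

This capture is not very exciting: the packets look the same on every interface along the path, so only one of the results is shown here.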

$ tcpdump -i qvoebeae637-5c
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on qvoebeae637-5c, link-type EN10MB (Ethernet), capture size 262144 bytes
15:42:19.320342 IP 200.0.0.224 > 200.0.0.16: ICMP echo request, id 48897, seq 20449, length 64
15:42:19.321219 IP 200.0.0.16 > 200.0.0.224: ICMP echo reply, id 48897, seq 20449, length 64
15:42:20.321525 IP 200.0.0.224 > 200.0.0.16: ICMP echo request, id 48897, seq 20450, length 64
15:42:20.322566 IP 200.0.0.16 > 200.0.0.224: ICMP echo reply, id 48897, seq 20450, length 64

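vm-1 accessing the DHCP port

Analysis

The path to the DHCP port is the same as before up to br-int. From br-int, the packets are delivered into the corresponding namespace.

1. On vm-1, ping 200.0.0.2

2. Capture on OVS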

$ ovs-tcpdump -i tape013286e-b7
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on ovsmi644603, link-type EN10MB (Ethernet), capture size 262144 bytes
16:00:35.469101 IP6 :: > ff02::16: HBH ICMP6, multicast listener report v2, 1 group record(s), length 28
16:00:35.552469 IP 200.0.0.16 > 200.0.0.2: ICMP echo request, id 48129, seq 480, length 64
16:00:35.552507 IP 200.0.0.2 > 200.0.0.16: ICMP echo reply, id 48129, seq 480, length 64

3. Capture inside the namespace

Note: when capturing inside the namespace, the packets may not be printed to the screen right away.

$ ip netns exec qdhcp-459c374c-d347-4c3d-8dca-7dbb6b403f4f tcpdump -i tape013286e-b7
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on tape013286e-b7, link-type EN10MB (Ethernet), capture size 262144 bytes
16:02:00.643387 IP 200.0.0.16 > 200.0.0.2: ICMP echo request, id 48129, seq 565, length 64
16:02:00.643461 IP 200.0.0.2 > 200.0.0.16: ICMP echo reply, id 48129, seq 565, length 64

More

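You may have noticed that the interface inside the namespace has the IP 169.254.169.254. What is this address for? And where is 200.0.0.2?

169.254.169.254

This IP address shows up often in OpenStack: it is the address of the metadata service.

Most cloud OS instances send requests to this address at boot to fetch some information. In the instance boot log below, for example, the instance tries to fetch public-keys and user-data: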

Starting network...
udhcpc (v1.23.2) started
Sending discover...
Sending select for 200.0.0.224...
Lease of 200.0.0.224 obtained, lease time 86400
route: SIOCADDRT: File exists
WARN: failed: route add -net "0.0.0.0/0" gw "200.0.0.1"
checking http://169.254.169.254/2009-04-04/instance-id
failed 1/20: up 24.66. request failed
successful after 2/20 tries: up 37.29. iid=i-00000021
failed to get http://169.254.169.254/2009-04-04/meta-data/public-keys
warning: no ec2 metadata for public-keys
failed to get http://169.254.169.254/2009-04-04/user-data
warning: no ec2 metadata for user-data

Where is 200.0.0.2?

$ ps -aux | grep dnsmasq
...
nobody 3449 0.0 0.0 53332 2588 ? S 09:49 0:00 dnsmasq --no-hosts --no-resolv --pid-file=/var/lib/neutron/dhcp/459c374c-d347-4c3d-8dca-7dbb6b403f4f/pid --dhcp-hostsfile=/var/lib/neutron/dhcp/459c374c-d347-4c3d-8dca-7dbb6b403f4f/host --addn-hosts=/var/lib/neutron/dhcp/459c374c-d347-4c3d-8dca-7dbb6b403f4f/addn_hosts --dhcp-optsfile=/var/lib/neutron/dhcp/459c374c-d347-4c3d-8dca-7dbb6b403f4f/opts --dhcp-leasefile=/var/lib/neutron/dhcp/459c374c-d347-4c3d-8dca-7dbb6b403f4f/leases --dhcp-match=set:ipxe,175 --dhcp-userclass=set:ipxe6,iPXE --local-service --bind-interfaces --dhcp-range=set:tag0,200.0.0.0,static,255.255.255.0,86400s --dhcp-option-force=option:mtu,1450 --dhcp-lease-max=256 --conf-file= --domain=openstacklocal
...

--dhcp-leasefile points to /var/lib/neutron/dhcp/459c374c-d347-4c3d-8dca-7dbb6b403f4f/leases, the file that records the IP addresses handed out by DHCP:

$ cat /var/lib/neutron/dhcp/459c374c-d347-4c3d-8dca-7dbb6b403f4f/leases
1567498384 fa:16:3e:17:60:c2 200.0.0.16 host-200-0-0-16 01:fa:16:3e:17:60:c2
1567475691 fa:16:3e:53:d5:50 200.0.0.224 host-200-0-0-224 01:fa:16:3e:53:d5:50
1567475394 fa:16:3e:45:00:dc 200.0.0.2 host-200-0-0-2 *
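
To look up which MAC holds a given IP in that file, a one-line awk filter is enough. This is a sketch that assumes the five-column layout visible above (expiry, MAC, IP, hostname, client ID); the function name lease_mac is made up.

```shell
# Hypothetical helper: print the MAC recorded for an IP in a dnsmasq lease file.
# Assumed line layout: <expiry> <mac> <ip> <hostname> <client-id>
lease_mac() {
    awk -v ip="$1" '$3 == ip { print $2 }' "$2"
}

# Self-contained demo using the three lease lines shown above.
tmp=$(mktemp)
cat > "$tmp" <<'EOF'
1567498384 fa:16:3e:17:60:c2 200.0.0.16 host-200-0-0-16 01:fa:16:3e:17:60:c2
1567475691 fa:16:3e:53:d5:50 200.0.0.224 host-200-0-0-224 01:fa:16:3e:53:d5:50
1567475394 fa:16:3e:45:00:dc 200.0.0.2 host-200-0-0-2 *
EOF
lease_mac 200.0.0.2 "$tmp"    # prints fa:16:3e:45:00:dc
rm -f "$tmp"
```

The comparison is an exact string match on the third column, so 200.0.0.2 does not accidentally match 200.0.0.224.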

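The MAC address of 200.0.0.2 is fa:16:3e:45:00:dc, which is exactly the hardware address of tape013286e-b7. In other words, 200.0.0.2 sits on the same tap interface as 169.254.169.254. ifconfig lists only one IPv4 address per interface, which would explain why 200.0.0.2 does not appear in its output.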

$ ip netns exec qdhcp-459c374c-d347-4c3d-8dca-7dbb6b403f4f ifconfig
...
tape013286e-b7: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1450
inet 169.254.169.254 netmask 255.255.0.0 broadcast 169.254.255.255
inet6 fe80::f816:3eff:fe45:dc prefixlen 64 scopeid 0x20<link>
ether fa:16:3e:45:00:dc txqueuelen 1000 (Ethernet)
RX packets 1639 bytes 130817 (130.8 KB)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 1546 bytes 149118 (149.1 KB)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0

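Summary

On a single node, traffic between VMs on the same subnet follows a very simple path.

The path is built from tap interfaces, veth pairs, Linux bridges, the OVS bridge and the OVS flow tables.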