Support multiple APN's per GTP endpoint and as an additional benefit support multiple GTP sockets per GTP entity.
Use case multiple APN's: ------------------------
In 3GPP a APN is control path construct. When mappend into the data path, it mean that UE IP's can be source from independended IP networks with overlaping IP ranges.
3GPP, TS 29.061 version 13.6.0 Release 13, Section 11.3 describes this as:
- each private network manages its own addressing. In general this will result in different private networks having overlapping address ranges. A logically separate connection (e.g. an IP in IP tunnel or layer 2 virtual circuit) is used between the GGSN/P-GW and each private network. In this case the IP address alone is not necessarily unique. The pair of values, Access Point Name (APN) and IPv4 address and/or IPv6 prefixes, is unique.
To support such a setup, each APN is mapped to a Linux network device. VRF-Lite, network namespaces or other mechanismns can the be used to realize the full separation of the per APN IP networks.
Use case multiple GTP sockets per GTP entity: ---------------------------------------------
A GTP entity like a PGW can use multiple GTP sockets for:
* separate IPv4 and IPv6 transport endpoints * support multiple reference points in separated IP networks, e.g. have Gn/Gp/S5/S8 in a GRX attaches network and S2a/S2b in another private network
Especially the S2a/S2b separation is an important scenario. The networks use for roaming and non roaming attachment (Gn/Gp/S5/S8 reference points) are usually different from the connection for trusted and untrusted WiFi access (S2a/S2b). Will the GTP transport networks are separated, it is still desirable to terminated the tunnels in the same GTP entity to ensure uninterrupted IP connectivity during 3G/LTE to/from WiFi handover.
Implementation: ---------------
APN's are a control path construct, the identification of the associated network device need therefore to be bound to be tunnel endpoint identifier.
This series moves the hash for the incoming tunnel endpoint identifiers into the socket to support multiple network devices per GTP socket. It the adds a method of enabling the GTP encapsulation on a socket without having to bound the socket to a network device and finally allows to specify a GTP socket per PDP context.
API impact: -----------
This is probably the most problematic part of this series...
The removeal of the TEID form the netdevice also means that the gtp genl API for retriving tunnel information and removing tunnels needs to be adjusted.
Before this change it was possible to change a GTP tunnel using the gtp netdevice id and the teid. The teid is no longer unique per gtp netdevice. After this change it has to be either the netdevice and MS IP or the GTP socket and teid.
Fortunatly, libgtpnl has always populated the Link Id, TEID, GSN Peer IP and MS IP. The library interface has ensured that all information that is mandatory after this change is guaranteed to be present.
The only project that doesn't use libgtpnl (OpenAir-CN) is also populating all of those values.
The API change will therefore not break any existing userspace applications.
-- Andreas Schultz (4): gtp: move TEID hash to per socket structure gtp: add genl cmd to enable GTP encapsulation on UDP socket gtp: add support to select a GTP socket during PDP context creation Extend Kernel GTP-U tunneling documentation
Documentation/networking/gtp.txt | 103 ++++++++++++++- drivers/net/gtp.c | 263 ++++++++++++++++++++++++++++----------- include/uapi/linux/gtp.h | 4 + 3 files changed, 292 insertions(+), 78 deletions(-)
Untangele the TEID information from the network device and move it into a per socket structure.
The removeal of the TEID form the netdevice also means that the gtp genl API for retriving tunnel information and removing tunnels needs to be adjusted. Before this change it was possible to change a GTP tunnel using the gtp netdevice id and the teid. The teid is no longer unique per gtp netdevice. So after this change it has to be either the netdevice and MS IP or the GTP socket and teid.
Fortunatly, libgtpnl has always populated the Link Id, TEID, GSN Peer IP and MS IP. So, the library interface has ensured that all information that is mandatory after this change is guranteed to be present. The only project that doesn't use libgtpnl (OpenAir-CN) is also populating all of those values.
Signed-off-by: Andreas Schultz aschultz@tpip.net --- drivers/net/gtp.c | 145 +++++++++++++++++++++++++++++------------------------- 1 file changed, 78 insertions(+), 67 deletions(-)
diff --git a/drivers/net/gtp.c b/drivers/net/gtp.c index 3e1854f..66616f7 100644 --- a/drivers/net/gtp.c +++ b/drivers/net/gtp.c @@ -75,10 +75,15 @@ struct gtp_dev { struct net_device *dev;
unsigned int hash_size; - struct hlist_head *tid_hash; struct hlist_head *addr_hash; };
+/* One instance of the GTP socket. */ +struct gtp_sock { + unsigned int hash_size; + struct hlist_head tid_hash[]; +}; + static unsigned int gtp_net_id __read_mostly;
struct gtp_net { @@ -106,12 +111,12 @@ static inline u32 ipv4_hashfn(__be32 ip) }
/* Resolve a PDP context structure based on the 64bit TID. */ -static struct pdp_ctx *gtp0_pdp_find(struct gtp_dev *gtp, u64 tid) +static struct pdp_ctx *gtp0_pdp_find(struct gtp_sock *gsk, u64 tid) { struct hlist_head *head; struct pdp_ctx *pdp;
- head = >p->tid_hash[gtp0_hashfn(tid) % gtp->hash_size]; + head = &gsk->tid_hash[gtp0_hashfn(tid) % gsk->hash_size];
hlist_for_each_entry_rcu(pdp, head, hlist_tid) { if (pdp->gtp_version == GTP_V0 && @@ -122,12 +127,12 @@ static struct pdp_ctx *gtp0_pdp_find(struct gtp_dev *gtp, u64 tid) }
/* Resolve a PDP context structure based on the 32bit TEI. */ -static struct pdp_ctx *gtp1_pdp_find(struct gtp_dev *gtp, u32 tid) +static struct pdp_ctx *gtp1_pdp_find(struct gtp_sock *gsk, u32 tid) { struct hlist_head *head; struct pdp_ctx *pdp;
- head = >p->tid_hash[gtp1u_hashfn(tid) % gtp->hash_size]; + head = &gsk->tid_hash[gtp1u_hashfn(tid) % gsk->hash_size];
hlist_for_each_entry_rcu(pdp, head, hlist_tid) { if (pdp->gtp_version == GTP_V1 && @@ -215,7 +220,7 @@ static int gtp_rx(struct pdp_ctx *pctx, struct sk_buff *skb, unsigned int hdrlen }
/* 1 means pass up to the stack, -1 means drop and 0 means decapsulated. */ -static int gtp0_udp_encap_recv(struct gtp_dev *gtp, struct sk_buff *skb) +static int gtp0_udp_encap_recv(struct gtp_sock *gsk, struct sk_buff *skb) { unsigned int hdrlen = sizeof(struct udphdr) + sizeof(struct gtp0_header); @@ -233,16 +238,16 @@ static int gtp0_udp_encap_recv(struct gtp_dev *gtp, struct sk_buff *skb) if (gtp0->type != GTP_TPDU) return 1;
- pctx = gtp0_pdp_find(gtp, be64_to_cpu(gtp0->tid)); + pctx = gtp0_pdp_find(gsk, be64_to_cpu(gtp0->tid)); if (!pctx) { - netdev_dbg(gtp->dev, "No PDP ctx to decap skb=%p\n", skb); + pr_debug("No PDP ctx to decap skb=%p\n", skb); return 1; }
return gtp_rx(pctx, skb, hdrlen); }
-static int gtp1u_udp_encap_recv(struct gtp_dev *gtp, struct sk_buff *skb) +static int gtp1u_udp_encap_recv(struct gtp_sock *gsk, struct sk_buff *skb) { unsigned int hdrlen = sizeof(struct udphdr) + sizeof(struct gtp1_header); @@ -275,9 +280,9 @@ static int gtp1u_udp_encap_recv(struct gtp_dev *gtp, struct sk_buff *skb)
gtp1 = (struct gtp1_header *)(skb->data + sizeof(struct udphdr));
- pctx = gtp1_pdp_find(gtp, ntohl(gtp1->tid)); + pctx = gtp1_pdp_find(gsk, ntohl(gtp1->tid)); if (!pctx) { - netdev_dbg(gtp->dev, "No PDP ctx to decap skb=%p\n", skb); + pr_debug("No PDP ctx to decap skb=%p\n", skb); return 1; }
@@ -286,13 +291,21 @@ static int gtp1u_udp_encap_recv(struct gtp_dev *gtp, struct sk_buff *skb)
static void gtp_encap_destroy(struct sock *sk) { - struct gtp_dev *gtp; + struct gtp_sock *gsk; + struct pdp_ctx *pctx; + int i;
- gtp = rcu_dereference_sk_user_data(sk); - if (gtp) { + gsk = rcu_dereference_sk_user_data(sk); + if (gsk) { udp_sk(sk)->encap_type = 0; rcu_assign_sk_user_data(sk, NULL); - sock_put(sk); + + for (i = 0; i < gsk->hash_size; i++) + hlist_for_each_entry_rcu(pctx, &gsk->tid_hash[i], hlist_tid) + pdp_context_delete(pctx); + + synchronize_rcu(); + kfree(gsk); } }
@@ -315,23 +328,23 @@ static void gtp_encap_disable(struct gtp_dev *gtp) */ static int gtp_encap_recv(struct sock *sk, struct sk_buff *skb) { - struct gtp_dev *gtp; + struct gtp_sock *gsk; int ret = 0;
- gtp = rcu_dereference_sk_user_data(sk); - if (!gtp) + gsk = rcu_dereference_sk_user_data(sk); + if (!gsk) return 1;
- netdev_dbg(gtp->dev, "encap_recv sk=%p\n", sk); + pr_debug("encap_recv sk=%p\n", sk);
switch (udp_sk(sk)->encap_type) { case UDP_ENCAP_GTP0: - netdev_dbg(gtp->dev, "received GTP0 packet\n"); - ret = gtp0_udp_encap_recv(gtp, skb); + pr_debug("received GTP0 packet\n"); + ret = gtp0_udp_encap_recv(gsk, skb); break; case UDP_ENCAP_GTP1U: - netdev_dbg(gtp->dev, "received GTP1U packet\n"); - ret = gtp1u_udp_encap_recv(gtp, skb); + pr_debug("received GTP1U packet\n"); + ret = gtp1u_udp_encap_recv(gsk, skb); break; default: ret = -1; /* Shouldn't happen. */ @@ -339,12 +352,12 @@ static int gtp_encap_recv(struct sock *sk, struct sk_buff *skb)
switch (ret) { case 1: - netdev_dbg(gtp->dev, "pass up to the process\n"); + pr_debug("pass up to the process\n"); break; case 0: break; case -1: - netdev_dbg(gtp->dev, "GTP packet has been dropped\n"); + pr_debug("GTP packet has been dropped\n"); kfree_skb(skb); ret = 0; break; @@ -624,7 +637,8 @@ static void gtp_link_setup(struct net_device *dev)
static int gtp_hashtable_new(struct gtp_dev *gtp, int hsize); static void gtp_hashtable_free(struct gtp_dev *gtp); -static int gtp_encap_enable(struct gtp_dev *gtp, struct nlattr *data[]); +static int gtp_encap_enable(struct gtp_dev *gtp, int hsize, + struct nlattr *data[]);
static int gtp_newlink(struct net *src_net, struct net_device *dev, struct nlattr *tb[], struct nlattr *data[]) @@ -638,15 +652,15 @@ static int gtp_newlink(struct net *src_net, struct net_device *dev,
gtp = netdev_priv(dev);
- err = gtp_encap_enable(gtp, data); - if (err < 0) - return err; - if (!data[IFLA_GTP_PDP_HASHSIZE]) hashsize = 1024; else hashsize = nla_get_u32(data[IFLA_GTP_PDP_HASHSIZE]);
+ err = gtp_encap_enable(gtp, hashsize, data); + if (err < 0) + return err; + err = gtp_hashtable_new(gtp, hashsize); if (err < 0) goto out_encap; @@ -734,20 +748,12 @@ static int gtp_hashtable_new(struct gtp_dev *gtp, int hsize) if (gtp->addr_hash == NULL) return -ENOMEM;
- gtp->tid_hash = kmalloc(sizeof(struct hlist_head) * hsize, GFP_KERNEL); - if (gtp->tid_hash == NULL) - goto err1; - gtp->hash_size = hsize;
- for (i = 0; i < hsize; i++) { + for (i = 0; i < hsize; i++) INIT_HLIST_HEAD(>p->addr_hash[i]); - INIT_HLIST_HEAD(>p->tid_hash[i]); - } + return 0; -err1: - kfree(gtp->addr_hash); - return -ENOMEM; }
static void gtp_hashtable_free(struct gtp_dev *gtp) @@ -756,21 +762,20 @@ static void gtp_hashtable_free(struct gtp_dev *gtp) int i;
for (i = 0; i < gtp->hash_size; i++) - hlist_for_each_entry_rcu(pctx, >p->tid_hash[i], hlist_tid) + hlist_for_each_entry_rcu(pctx, >p->addr_hash[i], hlist_addr) pdp_context_delete(pctx);
synchronize_rcu(); kfree(gtp->addr_hash); - kfree(gtp->tid_hash); }
-static struct sock *gtp_encap_enable_socket(int fd, int type, - struct gtp_dev *gtp) +static struct sock *gtp_encap_enable_socket(int fd, int type, int hsize) { struct udp_tunnel_sock_cfg tuncfg = {NULL}; + struct gtp_sock *gsk; struct socket *sock; struct sock *sk; - int err; + int err, i;
pr_debug("enable gtp on %d, %d\n", fd, type);
@@ -791,10 +796,20 @@ static struct sock *gtp_encap_enable_socket(int fd, int type, goto out_sock; }
+ gsk = kzalloc(sizeof(*gsk) + sizeof(struct hlist_head) * hsize, GFP_KERNEL); + if (!gsk) { + sk = ERR_PTR(-ENOMEM); + goto out_sock; + } + + gsk->hash_size = hsize; + for (i = 0; i < hsize; i++) + INIT_HLIST_HEAD(&gsk->tid_hash[i]); + sk = sock->sk; sock_hold(sk);
- tuncfg.sk_user_data = gtp; + tuncfg.sk_user_data = gsk; tuncfg.encap_type = type; tuncfg.encap_rcv = gtp_encap_recv; tuncfg.encap_destroy = gtp_encap_destroy; @@ -806,7 +821,8 @@ static struct sock *gtp_encap_enable_socket(int fd, int type, return sk; }
-static int gtp_encap_enable(struct gtp_dev *gtp, struct nlattr *data[]) +static int gtp_encap_enable(struct gtp_dev *gtp, int hsize, + struct nlattr *data[]) { struct sock *sk1u = NULL; struct sock *sk0 = NULL; @@ -814,7 +830,7 @@ static int gtp_encap_enable(struct gtp_dev *gtp, struct nlattr *data[]) if (data[IFLA_GTP_FD0]) { u32 fd0 = nla_get_u32(data[IFLA_GTP_FD0]);
- sk0 = gtp_encap_enable_socket(fd0, UDP_ENCAP_GTP0, gtp); + sk0 = gtp_encap_enable_socket(fd0, UDP_ENCAP_GTP0, hsize); if (IS_ERR(sk0)) return PTR_ERR(sk0); } @@ -822,7 +838,7 @@ static int gtp_encap_enable(struct gtp_dev *gtp, struct nlattr *data[]) if (data[IFLA_GTP_FD1]) { u32 fd1 = nla_get_u32(data[IFLA_GTP_FD1]);
- sk1u = gtp_encap_enable_socket(fd1, UDP_ENCAP_GTP1U, gtp); + sk1u = gtp_encap_enable_socket(fd1, UDP_ENCAP_GTP1U, hsize); if (IS_ERR(sk1u)) { if (sk0) gtp_encap_disable_sock(sk0); @@ -895,9 +911,14 @@ static int ipv4_pdp_add(struct gtp_dev *gtp, struct sock *sk, struct net_device *dev = gtp->dev; u32 hash_ms, hash_tid = 0; struct pdp_ctx *pctx; + struct gtp_sock *gsk; bool found = false; __be32 ms_addr;
+ gsk = rcu_dereference_sk_user_data(sk); + if (!gsk) + return -ENODEV; + ms_addr = nla_get_be32(info->attrs[GTPA_MS_ADDRESS]); hash_ms = ipv4_hashfn(ms_addr) % gtp->hash_size;
@@ -944,15 +965,15 @@ static int ipv4_pdp_add(struct gtp_dev *gtp, struct sock *sk, * situation in which this doesn't unambiguosly identify the * PDP context. */ - hash_tid = gtp0_hashfn(pctx->u.v0.tid) % gtp->hash_size; + hash_tid = gtp0_hashfn(pctx->u.v0.tid) % gsk->hash_size; break; case GTP_V1: - hash_tid = gtp1u_hashfn(pctx->u.v1.i_tei) % gtp->hash_size; + hash_tid = gtp1u_hashfn(pctx->u.v1.i_tei) % gsk->hash_size; break; }
hlist_add_head_rcu(&pctx->hlist_addr, >p->addr_hash[hash_ms]); - hlist_add_head_rcu(&pctx->hlist_tid, >p->tid_hash[hash_tid]); + hlist_add_head_rcu(&pctx->hlist_tid, &gsk->tid_hash[hash_tid]);
switch (pctx->gtp_version) { case GTP_V0: @@ -1048,24 +1069,14 @@ static struct pdp_ctx *gtp_find_pdp_by_link(struct net *net, { struct gtp_dev *gtp;
+ if (!nla[GTPA_MS_ADDRESS]) + return ERR_PTR(-EINVAL); + gtp = gtp_find_dev(net, nla); if (!gtp) return ERR_PTR(-ENODEV);
- if (nla[GTPA_MS_ADDRESS]) { - __be32 ip = nla_get_be32(nla[GTPA_MS_ADDRESS]); - - return ipv4_pdp_find(gtp, ip); - } else if (nla[GTPA_VERSION]) { - u32 gtp_version = nla_get_u32(nla[GTPA_VERSION]); - - if (gtp_version == GTP_V0 && nla[GTPA_TID]) - return gtp0_pdp_find(gtp, nla_get_u64(nla[GTPA_TID])); - else if (gtp_version == GTP_V1 && nla[GTPA_I_TEI]) - return gtp1_pdp_find(gtp, nla_get_u32(nla[GTPA_I_TEI])); - } - - return ERR_PTR(-EINVAL); + return ipv4_pdp_find(gtp, nla_get_be32(nla[GTPA_MS_ADDRESS])); }
static struct pdp_ctx *gtp_find_pdp(struct net *net, struct nlattr *nla[]) @@ -1209,7 +1220,7 @@ static int gtp_genl_dump_pdp(struct sk_buff *skb, last_gtp = NULL;
for (i = k; i < gtp->hash_size; i++) { - hlist_for_each_entry_rcu(pctx, >p->tid_hash[i], hlist_tid) { + hlist_for_each_entry_rcu(pctx, >p->addr_hash[i], hlist_addr) { if (tid && tid != pctx->u.tid) continue; else
On Tue, Mar 14, 2017 at 12:25:45PM +0100, Andreas Schultz wrote:
@@ -275,9 +280,9 @@ static int gtp1u_udp_encap_recv(struct gtp_dev *gtp, struct sk_buff *skb)
gtp1 = (struct gtp1_header *)(skb->data + sizeof(struct udphdr));
- pctx = gtp1_pdp_find(gtp, ntohl(gtp1->tid));
- pctx = gtp1_pdp_find(gsk, ntohl(gtp1->tid)); if (!pctx) {
netdev_dbg(gtp->dev, "No PDP ctx to decap skb=%p\n", skb);
return 1;pr_debug("No PDP ctx to decap skb=%p\n", skb);
Again the pr_debug() change has resurrected.
I already told you: If we are going to have more than one gtp device, then this doesn't make sense. I have to repeat things over and over again, just because you don't want to rebase your patchset for some reason. I don't find any other explaination for this.
So please remove this debugging rather than rendering this completely useful.
Moreover this change has nothing to this patch, so this doesn't break the one logical change per patch.
----- On Mar 14, 2017, at 12:33 PM, pablo pablo@netfilter.org wrote:
On Tue, Mar 14, 2017 at 12:25:45PM +0100, Andreas Schultz wrote:
@@ -275,9 +280,9 @@ static int gtp1u_udp_encap_recv(struct gtp_dev *gtp, struct sk_buff *skb)
gtp1 = (struct gtp1_header *)(skb->data + sizeof(struct udphdr));
- pctx = gtp1_pdp_find(gtp, ntohl(gtp1->tid));
- pctx = gtp1_pdp_find(gsk, ntohl(gtp1->tid)); if (!pctx) {
netdev_dbg(gtp->dev, "No PDP ctx to decap skb=%p\n", skb);
return 1;pr_debug("No PDP ctx to decap skb=%p\n", skb);Again the pr_debug() change has resurrected.
Yes, at that point in the code, there is now ways to resolve the network device. Therefore the netdev_dbg has to go.
I already told you: If we are going to have more than one gtp device, then this doesn't make sense. I have to repeat things over and over again, just because you don't want to rebase your patchset for some reason. I don't find any other explaination for this.
Without a PDP context, there is no network device, so netdev_dbg.
So please remove this debugging rather than rendering this completely useful.
ACK
Moreover this change has nothing to this patch, so this doesn't break the one logical change per patch.
This patch moves the incoming teid has from the network device to the socket. This means that gtp1_pdp_find needs to change. So this related. For the debug change, see above why it's related.
Andreas
Signed-off-by: Andreas Schultz aschultz@tpip.net --- drivers/net/gtp.c | 47 +++++++++++++++++++++++++++++++++++++++++++++++ include/uapi/linux/gtp.h | 4 ++++ 2 files changed, 51 insertions(+)
diff --git a/drivers/net/gtp.c b/drivers/net/gtp.c index 66616f7..c4cf1b9 100644 --- a/drivers/net/gtp.c +++ b/drivers/net/gtp.c @@ -1244,6 +1244,45 @@ static int gtp_genl_dump_pdp(struct sk_buff *skb, return skb->len; }
+static int gtp_genl_enable_socket(struct sk_buff *skb, struct genl_info *info) +{ + u32 version, fd, hashsize; + struct sock *sk; + + if (!info->attrs[GTPA_VERSION] || + !info->attrs[GTPA_FD]) + return -EINVAL; + + if (!info->attrs[GTPA_PDP_HASHSIZE]) + hashsize = 1024; + else + hashsize = nla_get_u32(info->attrs[IFLA_GTP_PDP_HASHSIZE]); + + version = nla_get_u32(info->attrs[GTPA_VERSION]); + fd = nla_get_u32(info->attrs[GTPA_FD]); + + switch (version) { + case GTP_V0: + sk = gtp_encap_enable_socket(fd, UDP_ENCAP_GTP0, hashsize); + break; + + case GTP_V1: + sk = gtp_encap_enable_socket(fd, UDP_ENCAP_GTP1U, hashsize); + break; + + default: + return -EINVAL; + } + + if (!sk) + return -EINVAL; + + if (IS_ERR(sk)) + return PTR_ERR(sk); + + return 0; +} + static struct nla_policy gtp_genl_policy[GTPA_MAX + 1] = { [GTPA_LINK] = { .type = NLA_U32, }, [GTPA_VERSION] = { .type = NLA_U32, }, @@ -1254,6 +1293,8 @@ static struct nla_policy gtp_genl_policy[GTPA_MAX + 1] = { [GTPA_NET_NS_FD] = { .type = NLA_U32, }, [GTPA_I_TEI] = { .type = NLA_U32, }, [GTPA_O_TEI] = { .type = NLA_U32, }, + [GTPA_PDP_HASHSIZE] = { .type = NLA_U32, }, + [GTPA_FD] = { .type = NLA_U32, }, };
static const struct genl_ops gtp_genl_ops[] = { @@ -1276,6 +1317,12 @@ static const struct genl_ops gtp_genl_ops[] = { .policy = gtp_genl_policy, .flags = GENL_ADMIN_PERM, }, + { + .cmd = GTP_CMD_ENABLE_SOCKET, + .doit = gtp_genl_enable_socket, + .policy = gtp_genl_policy, + .flags = GENL_ADMIN_PERM, + }, };
static struct genl_family gtp_genl_family __ro_after_init = { diff --git a/include/uapi/linux/gtp.h b/include/uapi/linux/gtp.h index 72a04a0..a9e9fe0 100644 --- a/include/uapi/linux/gtp.h +++ b/include/uapi/linux/gtp.h @@ -6,6 +6,8 @@ enum gtp_genl_cmds { GTP_CMD_DELPDP, GTP_CMD_GETPDP,
+ GTP_CMD_ENABLE_SOCKET, + GTP_CMD_MAX, };
@@ -26,6 +28,8 @@ enum gtp_attrs { GTPA_I_TEI, /* for GTPv1 only */ GTPA_O_TEI, /* for GTPv1 only */ GTPA_PAD, + GTPA_PDP_HASHSIZE, + GTPA_FD, __GTPA_MAX, }; #define GTPA_MAX (__GTPA_MAX + 1)
On Tue, Mar 14, 2017 at 12:25:46PM +0100, Andreas Schultz wrote:
@@ -1254,6 +1293,8 @@ static struct nla_policy gtp_genl_policy[GTPA_MAX + 1] = { [GTPA_NET_NS_FD] = { .type = NLA_U32, }, [GTPA_I_TEI] = { .type = NLA_U32, }, [GTPA_O_TEI] = { .type = NLA_U32, },
- [GTPA_PDP_HASHSIZE] = { .type = NLA_U32, },
This per PDP hashsize attribute clearly doesn't belong here.
Moreover, we now have a rhashtable implementation, so we hopefully we can get rid of this. It should be very easy to convert this to use rhashtable, and it is very much desiderable.
- [GTPA_FD] = { .type = NLA_U32, },
This new atttribute has nothing to do with the PDP context. And enum gtp_attrs *only* describe a PDP context. Adding more attributes there to mix semantics is not the way to go.
You likely have to inaugurate a new enum. This gtp_attrs enum only related to the PDP description.
Why not add some interface to attach more sockets to the gtp device globally? So still the gtp device is the top-level structure. Then add a netlink attribute to specify to what VRF this tunnel belongs to, instead of implicitly using the socket to achieve this.
Another possibility is to explicitly have an interface to add new/delete VRFs, attach sockets to them.
In general, I'm still not convinced this is the right design for this.
----- On Mar 14, 2017, at 12:43 PM, pablo pablo@netfilter.org wrote:
On Tue, Mar 14, 2017 at 12:25:46PM +0100, Andreas Schultz wrote:
@@ -1254,6 +1293,8 @@ static struct nla_policy gtp_genl_policy[GTPA_MAX + 1] = { [GTPA_NET_NS_FD] = { .type = NLA_U32, }, [GTPA_I_TEI] = { .type = NLA_U32, }, [GTPA_O_TEI] = { .type = NLA_U32, },
- [GTPA_PDP_HASHSIZE] = { .type = NLA_U32, },
This per PDP hashsize attribute clearly doesn't belong here.
Moreover, we now have a rhashtable implementation, so we hopefully we can get rid of this. It should be very easy to convert this to use rhashtable, and it is very much desiderable.
This would mean I have to mix the unrelated rhashtable change with moving the hash into the socket. This certainly is not desirable either.
So, I'm going to have a look at the rhashtable thing and send a patch first to convert the hashes to it.
- [GTPA_FD] = { .type = NLA_U32, },
This new atttribute has nothing to do with the PDP context. And enum gtp_attrs *only* describe a PDP context. Adding more attributes there to mix semantics is not the way to go.
You seem to assume that the network device or the APN/VRF is the root entity for the GTP tunnels. That is IMHO wrong. The 3GPP specification clearly defines a GTP entity that is completely Independent from an APN or the local IP endpoint.
A GTP entity serves multiple local IP endpoints, It manages outgoing tunnels by the local APN/VRF, source IP, destination IP and remote tunnel id, incoming tunnels are managed by the local destination IP, local tunnel id and VRF/APN.
Therefor a PDP context needs the following attributes:
* local source/destination IP (and port - but that's for different series) * remote destination IP * local and remote TEID * VRF/APN
The local source and destination IP is implicitly contained in the socket, therefor the socket needs to part of the context. The VRF/APN is contained in the network device reference. So this also needs to part of the PDP context.
Having either the socket or the network device as the sole root container for a GTP entity is wrong since the PDP context always refer both.
You likely have to inaugurate a new enum. This gtp_attrs enum only related to the PDP description.
Why not add some interface to attach more sockets to the gtp device globally? So still the gtp device is the top-level structure.
That is IMHO the wrong model. In a real live setup it likely to have a few GTP sockets and possibly hundreds if not thousands of network device attached to them (we already had the discussion why this kind of sharing makes sense). So from a resource perspective alone, having the network device as root makes no sense.
Then add a netlink attribute to specify to what VRF this tunnel belongs to, instead of implicitly using the socket to achieve this.
You got that the wrong way arround. The VRF is already in the PDP context through the network device reference. The socket is added to the PDP context to select the outgoing source IP of the GTP tunnel in order to support multiple GTP source IP's per GTP entity.
Another possibility is to explicitly have an interface to add new/delete VRFs, attach sockets to them.
We already have that interface. It's the create a GTP network interface genl API. I explained a few lines above why I think that adding sockets to GTP network devices is wrong.
In general, I'm still not convinced this is the right design for this.
Following your "add VRF" idea, I would end up with a pseudo network device that represents a GTP entity. This would be the root instance for all the VRF's and GTP sockets. Although being a network device, it would not behave in any way like a network device, it would not handle traffic or have IP(v6) addresses attached to it. I would then further have GTP VRF network devices. Those would be "real" network device that handle traffic and have IP addresses/route attached to them.
I'm not sure if this pseudo GTP entity root device fits well with other networking concepts. And more over, I can't really see the need for such an construct.
This need for an in-kernel root entity seem to come the concept that the kernel *owns* the tunnels and that tunnel a static and independent from the user space control instance.
I think that the user space control instance should own the tunnels and only use the kernel facility to manage them. When the user space instance goes away, so should the tunnels.
From that perspective, I want to keep the kernel facilities to the absolute
needed minimum.
Regards Andreas
On Tue, Mar 14, 2017 at 01:28:25PM +0100, Andreas Schultz wrote:
----- On Mar 14, 2017, at 12:43 PM, pablo pablo@netfilter.org wrote:
[...]
A GTP entity serves multiple local IP endpoints, It manages outgoing tunnels by the local APN/VRF, source IP, destination IP and remote tunnel id, incoming tunnels are managed by the local destination IP, local tunnel id and VRF/APN.
Therefor a PDP context needs the following attributes:
- local source/destination IP (and port - but that's for different series)
- remote destination IP
- local and remote TEID
- VRF/APN
[...]
I'm not sure if this pseudo GTP entity root device fits well with other networking concepts. And more over, I can't really see the need for such an construct.
Some sort of top-level structure that wraps all these objects is needed, and that can be a new VRF object itself.
You can add a netlink interface to add/dump/delete VRFs, this VRF database would be *global* to the GSN. At VRF creation, you attach the socket and the GTP device. You can share sockets between VRFs. PDP context objects would be added to the corresponding VRF *not to the socket*, but actually this will result in inserting this PDP context into the socket hashtable and the GTP device hashtable.
We need to introduce a default VRF that is assumed to always exist, and that userspace cannot remove, so things don't break backward. If no VRF is specified, then we attach things to this default VRF. Actually, look at this from a different angle: the existing driver is just supporting *one single VRF* at this moment so we just have to represent this explicitly. Breaking existing API is a no-go.
This explicit VRF concept would also allow us to dump PDP contexts that belong to a given VRF, by simply indicating the VRF unique id. Jamal already requested that we extend iproute2 to have command to inspect the gtp driver we cannot escape this, we should allow standalone tools to inspect the gtp datapath as we do with other existing tunnel drivers, no matter what daemon in userspace implements the control plane.
[...]
I think that the user space control instance should own the tunnels and only use the kernel facility to manage them. When the user space instance goes away, so should the tunnels.
This doesn't allow daemon hot restart for whatever administrative reason without affecting existing traffic. The kernel owns the datapath indeed, that include tunnels.
From that perspective, I want to keep the kernel facilities to the absolute needed minimum.
If some simple abstraction that we can insert makes this whole thing more maintainable, then it makes sense to consider it.
This is all about not exposing the internal layout of the representation you use for a very specific reason: The more you expose internal details to userspace, the more problems we'll have to extend things later on in the future. And don't try to be smart and say: "Hey, I already know every usecase we will have in the future" because that is not true.
For each new PDP a separate socket can be selected. The per netdevice default sockets are no longer mandatory.
This means also that multiple gtp netdevices can share the same default socket and that therefore the destruction of a gtp netdevice can no longer automatically disable the gtp encapsulation on it's sockets.
Signed-off-by: Andreas Schultz aschultz@tpip.net --- drivers/net/gtp.c | 79 +++++++++++++++++++++++++++++++++++++++++++++++-------- 1 file changed, 68 insertions(+), 11 deletions(-)
diff --git a/drivers/net/gtp.c b/drivers/net/gtp.c index c4cf1b9..afa043d 100644 --- a/drivers/net/gtp.c +++ b/drivers/net/gtp.c @@ -381,9 +381,6 @@ static int gtp_dev_init(struct net_device *dev)
static void gtp_dev_uninit(struct net_device *dev) { - struct gtp_dev *gtp = netdev_priv(dev); - - gtp_encap_disable(gtp); free_percpu(dev->tstats); }
@@ -647,9 +644,6 @@ static int gtp_newlink(struct net *src_net, struct net_device *dev, struct gtp_net *gn; int hashsize, err;
- if (!data[IFLA_GTP_FD0] && !data[IFLA_GTP_FD1]) - return -EINVAL; - gtp = netdev_priv(dev);
if (!data[IFLA_GTP_PDP_HASHSIZE]) @@ -689,8 +683,11 @@ static void gtp_dellink(struct net_device *dev, struct list_head *head) { struct gtp_dev *gtp = netdev_priv(dev);
- gtp_encap_disable(gtp); gtp_hashtable_free(gtp); + if (gtp->sk0) + sock_put(gtp->sk0); + if (gtp->sk1u) + sock_put(gtp->sk1u); list_del_rcu(>p->list); unregister_netdevice_queue(dev, head); } @@ -1008,9 +1005,10 @@ static void pdp_context_delete(struct pdp_ctx *pctx)
static int gtp_genl_new_pdp(struct sk_buff *skb, struct genl_info *info) { + struct socket *sock = NULL; + struct sock *sk = NULL; unsigned int version; struct gtp_dev *gtp; - struct sock *sk; int err;
if (!info->attrs[GTPA_VERSION] || @@ -1045,12 +1043,14 @@ static int gtp_genl_new_pdp(struct sk_buff *skb, struct genl_info *info) goto out_unlock; }
- if (version == GTP_V0) + if (info->attrs[GTPA_FD]) { + sock = sockfd_lookup(nla_get_u32(info->attrs[GTPA_FD]), &err); + if (sock) + sk = sock->sk; + } else if (version == GTP_V0) sk = gtp->sk0; else if (version == GTP_V1) sk = gtp->sk1u; - else - sk = NULL;
if (!sk) { err = -ENODEV; @@ -1059,6 +1059,9 @@ static int gtp_genl_new_pdp(struct sk_buff *skb, struct genl_info *info)
err = ipv4_pdp_add(gtp, sk, info);
+ if (sock) + sockfd_put(sock); + out_unlock: rcu_read_unlock(); return err; @@ -1079,12 +1082,66 @@ static struct pdp_ctx *gtp_find_pdp_by_link(struct net *net, return ipv4_pdp_find(gtp, nla_get_be32(nla[GTPA_MS_ADDRESS])); }
+static struct pdp_ctx *gtp_genl_find_pdp_by_socket(struct net *net, + struct nlattr *nla[]) +{ + struct socket *sock; + struct gtp_sock *gsk; + struct pdp_ctx *pctx; + int fd, err = 0; + + if (!nla[GTPA_FD]) + return ERR_PTR(-EINVAL); + + fd = nla_get_u32(nla[GTPA_FD]); + sock = sockfd_lookup(fd, &err); + if (!sock) { + pr_debug("gtp socket fd=%d not found\n", fd); + return ERR_PTR(-EBADF); + } + + gsk = rcu_dereference_sk_user_data(sock->sk); + if (!gsk) { + pctx = ERR_PTR(-EINVAL); + goto out_sock; + } + + switch (nla_get_u32(nla[GTPA_VERSION])) { + case GTP_V0: + if (!nla[GTPA_TID]) { + pctx = ERR_PTR(-EINVAL); + break; + } + pctx = gtp0_pdp_find(gsk, nla_get_u64(nla[GTPA_TID])); + break; + + case GTP_V1: + if (!nla[GTPA_I_TEI]) { + pctx = ERR_PTR(-EINVAL); + break; + } + pctx = gtp1_pdp_find(gsk, nla_get_u64(nla[GTPA_I_TEI])); + break; + + default: + pctx = ERR_PTR(-EINVAL); + break; + } + +out_sock: + sockfd_put(sock); + return pctx; +} + static struct pdp_ctx *gtp_find_pdp(struct net *net, struct nlattr *nla[]) { struct pdp_ctx *pctx;
if (nla[GTPA_LINK]) pctx = gtp_find_pdp_by_link(net, nla); + else if (nla[GTPA_FD]) + pctx = gtp_genl_find_pdp_by_socket(net, nla); + else pctx = ERR_PTR(-EINVAL);
* clarify specification references for v0/v1 * add section "APN vs. Network device" * add section "Local GTP-U entity and tunnel identification"
Signed-off-by: Andreas Schultz aschultz@tpip.net Signed-off-by: Harald Welte laforge@gnumonks.org --- Documentation/networking/gtp.txt | 103 +++++++++++++++++++++++++++++++++++++-- 1 file changed, 99 insertions(+), 4 deletions(-)
diff --git a/Documentation/networking/gtp.txt b/Documentation/networking/gtp.txt index 93e9675..0d9c18f 100644 --- a/Documentation/networking/gtp.txt +++ b/Documentation/networking/gtp.txt @@ -1,6 +1,7 @@ The Linux kernel GTP tunneling module ====================================================================== -Documentation by Harald Welte laforge@gnumonks.org +Documentation by Harald Welte laforge@gnumonks.org and + Andreas Schultz aschultz@tpip.net
In 'drivers/net/gtp.c' you are finding a kernel-level implementation of a GTP tunnel endpoint. @@ -91,9 +92,13 @@ http://git.osmocom.org/libgtpnl/
== Protocol Versions ==
-There are two different versions of GTP-U: v0 and v1. Both are -implemented in the Kernel GTP module. Version 0 is a legacy version, -and deprecated from recent 3GPP specifications. +There are two different versions of GTP-U: v0 [GSM TS 09.60] and v1 +[3GPP TS 29.281]. Both are implemented in the Kernel GTP module. +Version 0 is a legacy version, and deprecated from recent 3GPP +specifications. + +GTP-U uses UDP for transporting PDUs. The receiving UDP port is 2151 +for GTPv1-U and 3386 for GTPv0-U.
There are three versions of GTP-C: v0, v1, and v2. As the kernel doesn't implement GTP-C, we don't have to worry about this. It's the @@ -133,3 +138,93 @@ doe to a lack of user interest, it never got merged. In 2015, Andreas Schultz came to the rescue and fixed lots more bugs, extended it with new features and finally pushed all of us to get it mainline, where it was merged in 4.7.0. + +== Architectural Details == + +=== Local GTP-U entity and tunnel identification === + +GTP-U uses UDP for transporting PDU's. The receiving UDP port is 2152 +for GTPv1-U and 3386 for GTPv0-U. + +There is only one GTP-U entity (and therefor SGSN/GGSN/S-GW/PDN-GW +instance) per IP address. Tunnel Endpoint Identifier (TEID) are unique +per GTP-U entity. + +A specific tunnel is only defined by the destination entity. Since the +destination port is constant, only the destination IP and TEID define +a tunnel. The source IP and Port have no meaning for the tunnel. + +Therefore: + + * when sending, the remote entity is defined by the remote IP and + the tunnel endpoint id. The source IP and port have no meaning and + can be changed at any time. + + * when receiving the local entity is defined by the local + destination IP and the tunnel endpoint id. The source IP and port + have no meaning and can change at any time. + +[3GPP TS 29.281] Section 4.3.0 defines this so: + +> The TEID in the GTP-U header is used to de-multiplex traffic +> incoming from remote tunnel endpoints so that it is delivered to the +> User plane entities in a way that allows multiplexing of different +> users, different packet protocols and different QoS levels. +> Therefore no two remote GTP-U endpoints shall send traffic to a +> GTP-U protocol entity using the same TEID value except +> for data forwarding as part of mobility procedures. + +The definition above only defines that two remote GTP-U endpoints +*should not* send to the same TEID, it *does not* forbid or exclude +such a scenario. In fact, the mentioned mobility procedures make it +necessary that the GTP-U entity accepts traffic for TEIDs from +multiple or unknown peers. + +Therefore, the receiving side identifies tunnels exclusively based on +TEIDs, not based on the source IP! + +== APN vs. Network Device == + +The GTP-U driver creates a Linux network device for each Gi/SGi +interface. + +[3GPP TS 29.281] calls the Gi/SGi reference point an interface. This +may lead to the impression that the GGSN/P-GW can have only one such +interface. + +Correct is that the Gi/SGi reference point defines the interworking +between +the 3GPP packet domain (PDN) based on GTP-U tunnel and IP +based networks. + +There is no provision in any of the 3GPP documents that limits the +number of Gi/SGi interfaces implemented by a GGSN/P-GW. + +[3GPP TS 29.061] Section 11.3 makes it clear that the selection of a +specific Gi/SGi interfaces is made through the Access Point Name +(APN): + +> 2. each private network manages its own addressing. In general this +> will result in different private networks having overlapping +> address ranges. A logically separate connection (e.g. an IP in IP +> tunnel or layer 2 virtual circuit) is used between the GGSN/P-GW +> and each private network. +> +> In this case the IP address alone is not necessarily unique. The +> pair of values, Access Point Name (APN) and IPv4 address and/or +> IPv6 prefixes, is unique. + +In order to support the overlapping address range use case, each APN +is mapped to a separate Gi/SGi interface (network device). + +NOTE: The Access Point Name is purely a control plane (GTP-C) concept. +At the GTP-U level, only Tunnel Endpoint Identifiers are present in +GTP-U packets and network devices are known + +Therefore for a given UE the mapping in IP to PDN network is: + * network device + MS IP -> Peer IP + Peer TEID, + +and from PDN to IP network: + * local GTP-U IP + TEID -> network device + +Furthermore, before a received T-PDU is injected into the network +device the MS IP is checked against the IP recorded in PDP context.
On Tue, Mar 14, 2017 at 12:25:44PM +0100, Andreas Schultz wrote: [...]
API impact:
This is probably the most problematic part of this series...
The removeal of the TEID form the netdevice also means that the gtp genl API for retriving tunnel information and removing tunnels needs to be adjusted.
Before this change it was possible to change a GTP tunnel using the gtp netdevice id and the teid. The teid is no longer unique per gtp netdevice. After this change it has to be either the netdevice and MS IP or the GTP socket and teid.
Then we have to introduce some explicit VRF concept or such to sort out this.
It is definitely not acceptable to break the existing API.
----- On Mar 14, 2017, at 12:45 PM, pablo pablo@netfilter.org wrote:
On Tue, Mar 14, 2017 at 12:25:44PM +0100, Andreas Schultz wrote: [...]
API impact:
This is probably the most problematic part of this series...
The removeal of the TEID form the netdevice also means that the gtp genl API for retriving tunnel information and removing tunnels needs to be adjusted.
Before this change it was possible to change a GTP tunnel using the gtp netdevice id and the teid. The teid is no longer unique per gtp netdevice. After this change it has to be either the netdevice and MS IP or the GTP socket and teid.
Then we have to introduce some explicit VRF concept or such to sort out this.
It is definitely not acceptable to break the existing API.
The specific use case of the API that is no longer supported was never used by anyone. The only supported and documented API for the GTP module is libgtpnl. libgtpnl has always required the now mandatory fields. Therefor the externally supported API does not change.
Regards Andreas
On Tue, Mar 14, 2017 at 01:42:44PM +0100, Andreas Schultz wrote:
It is definitely not acceptable to break the existing API.
The specific use case of the API that is no longer supported was never used by anyone. [...]
Yes, this was used openggsn and I tested this with a full blown FOSS setup. Yes, a toy thing compared to the proprietary equipment you deal with, but we always started with things like this.
The only supported and documented API for the GTP module is libgtpnl.
No, the netlink interface itself if the API.
Stopping trying to find a reason to break API, that is a no-go.
Hi Andreas,
On Tue, Mar 14, 2017 at 02:42:16PM +0100, Pablo Neira Ayuso wrote:
On Tue, Mar 14, 2017 at 01:42:44PM +0100, Andreas Schultz wrote:
The only supported and documented API for the GTP module is libgtpnl.
No, the netlink interface itself if the API.
Stopping trying to find a reason to break API, that is a no-go.
As much as one might dislike it as a developer in this particular case, the Linux kernel has the very well communicated rule: All userspace visible interfaces must not change in an incompatible way. This includes of course all the syscalls, the ioctl() parameters but also the netlink interfaces of the networking stack.
The statement "nobody ever used it" is a statement you can never make in FOSS software, as you don't know of 99.9999999% of all the users of your software. The fact that none of the FOSS projects that any of us was involved in may not have used a certain feature doesn't mean nobody else has been using it privately, quietly. Keep in mind that several Linux distributions have already been shipping the gtp module as part of their stable releases meanwhile.
Also, no matter what Pablo or I may think about, there are general rules about how Linux kernel development is done (from coding style to merge windows, and also userspace compatibility), and we all have to obey them. There's little point in discussing about them, we all just have to live with them.
Regards, Harald
From: Andreas Schultz aschultz@tpip.net Date: Tue, 14 Mar 2017 13:42:44 +0100 (CET)
The specific use case of the API that is no longer supported was never used by anyone. The only supported and documented API for the GTP module is libgtpnl. libgtpnl has always required the now mandatory fields. Therefor the externally supported API does not change.
That's not how kernel development works, sorry.
Any user visible interface the kernel exports is not to be broken, even if you think you control the one library that makes use of it.
This is especially important for netlink because netlink is more like a networking protocol, that arbitrary programs can listen to for events.
osmocom-net-gprs@lists.osmocom.org