Hi all,
There is a tradition of operating Osmocom-based 2G, and more recently also 3G, networks at the Chaos Communication Camp. The 2019 edition (an international hacker camp with about 5500 participants) took place last week.
This time I operated a nextepc-based 4G/LTE network next to the camp 2G/3G networks. In order to share one subscriber database, I have implemented osmo_dia2gsup, which translates S6a/S6d Diameter into the Osmocom GSUP protocol, so nextepc can be used without nextepc-hssd but with osmo-hlr instead.
The network consisted of six Ericsson RBS6402 operating in Band 7 (2600 MHz).
Some more details can be found at
Regarding the nextepc side:
* 2439 unique IMSIs were seen
** 147 unique IMSIs of CCC SIM cards (26242)
** 2292 non-CCC IMSIs
** 75 unique MCC-MNC tuples
** 34 unique MCCs
** The usual suspects (Europe + North America), but also...
*** Malaysia, Indonesia, Australia, New Zealand, South Africa
* 560 Attach accept (CCC SIM cards)
* 46590 Attach reject (commercial operator SIM cards)
* 629 PDN context (APN) activations
* 235 handovers between cells (X2)
* 64 crashes + restarts of nextepc-mme
* 9 crashes + restarts of nextepc-pgw
* 0 crashes + restarts of nextepc-sgw
* 10 crashes + restarts of nextepc-pcrf
In general, it worked quite nicely, and I have to congratulate Sukchan on his work on nextepc.
I investigated some of the crashes, reported them to the issue tracker and attempted to fix some of them on-site. The actual codebase that was running can be found at https://github.com/laf0rge/nextepc/commits/laforge/cccamp19
From my experience with operating such a "large" nextepc network for the first time, I have the following overall feedback, which basically boils down to three major areas:
== the use of assert() ==
ASSERT should never be triggered by anything that is received from another network entity. So if an eNB sends an unknown S1AP-ID, or if an SGW sends an unknown TEID, or if the NAS MAC validation fails, or an EMM message cannot be decoded - all of those must be handled gracefully without terminating the program. The 'fail fast' way of programming works when writing code in C++ (exceptions that are caught) or in Erlang (one process per message, so crashing that one doesn't bring the entire MME down).
I've tried my best to review all ogs_assert() in the MME and came up with the following patch: https://github.com/laf0rge/nextepc/commit/3b528af8fd51c85769123338eb57a4635c... which requires https://github.com/laf0rge/ogslib/commit/dc36ccbb080038306666931bdc97f6204fd... which introduces ogs_expect() and ogs_expect_or_return() macros that can be used in many places instead of ogs_assert().
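As a rough illustration of the idea (a sketch only, not the code from the commits above): the macros below log a failed expectation and then either continue or leave the current function, instead of aborting. The names my_expect(), my_expect_or_return(), the failure counter and the toy UE lookup are all made up for this example.

```c
#include <stdio.h>

/* Hypothetical stand-ins for illustration; NOT the actual ogslib macros. */
static int expect_failures; /* number of failed expectations so far */

/* Log a failed expectation, then simply carry on. */
#define my_expect(expr) \
    do { \
        if (!(expr)) { \
            expect_failures++; \
            fprintf(stderr, "%s:%d: expectation `%s' failed\n", \
                    __FILE__, __LINE__, #expr); \
        } \
    } while (0)

/* Log a failed expectation, then leave the enclosing void function. */
#define my_expect_or_return(expr) \
    do { \
        if (!(expr)) { \
            expect_failures++; \
            fprintf(stderr, "%s:%d: expectation `%s' failed\n", \
                    __FILE__, __LINE__, #expr); \
            return; \
        } \
    } while (0)

/* Toy UE context; only eNB S1AP-ID 1 is known in this example. */
struct ue { unsigned enb_s1ap_id; int msgs_handled; };
static struct ue known_ue = { 1, 0 };

static struct ue *find_ue_by_s1ap_id(unsigned id)
{
    return id == known_ue.enb_s1ap_id ? &known_ue : NULL;
}

/* An unknown S1AP-ID from the peer is logged and the message dropped. */
static void handle_enb_message(unsigned s1ap_id, int payload_len)
{
    struct ue *ue = find_ue_by_s1ap_id(s1ap_id);

    my_expect_or_return(ue != NULL);
    my_expect(payload_len > 0); /* odd, but not fatal: log and continue */
    ue->msgs_handled++;
}
```

Calling handle_enb_message(42, 10) here only produces a log line and returns, whereas an assert-style check at the same place would take the whole process down.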
It would also be possible to use this kind of 'fail fast' approach in C programs, but then one would have to use longjmp() from the 'assert', and you would have to use some kind of hierarchical memory allocator so that in the 'exception handler' you can release any dynamic allocations that were made before.
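To make the longjmp() variant more concrete, here is a minimal sketch under stated assumptions (C11, all names hypothetical, not taken from nextepc): a per-message context holds the jump target and a chain of allocations, so the "exception handler" can release everything allocated before the failure. A real hierarchical allocator such as talloc would replace the simple linked list used here.

```c
#include <setjmp.h>
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>

/* All names are hypothetical; this is a sketch, not nextepc code. */

/* Header placed in front of each allocation to chain it to its context. */
union alloc_hdr {
    union alloc_hdr *next;
    max_align_t align; /* keeps the payload behind the header aligned */
};

/* Per-message context: the "exception handler" target plus all
 * allocations made while handling the message. */
struct msg_ctx {
    jmp_buf on_fail;
    union alloc_hdr *allocs;
};

/* Log the failed expectation, then jump back to the handler. */
#define ctx_expect(ctx, expr) \
    do { \
        if (!(expr)) { \
            fprintf(stderr, "expectation `%s' failed\n", #expr); \
            longjmp((ctx)->on_fail, 1); \
        } \
    } while (0)

static void *ctx_alloc(struct msg_ctx *ctx, size_t size)
{
    union alloc_hdr *hdr = malloc(sizeof(*hdr) + size);

    ctx_expect(ctx, hdr != NULL);
    hdr->next = ctx->allocs;
    ctx->allocs = hdr;
    return hdr + 1;
}

static void ctx_free_all(struct msg_ctx *ctx)
{
    while (ctx->allocs) {
        union alloc_hdr *next = ctx->allocs->next;
        free(ctx->allocs);
        ctx->allocs = next;
    }
}

/* Pretend decoder: copies the message, then validates it. */
static void decode_message(struct msg_ctx *ctx,
                           const unsigned char *buf, size_t len)
{
    unsigned char *copy = ctx_alloc(ctx, len + 1);

    for (size_t i = 0; i < len; i++)
        copy[i] = buf[i];
    ctx_expect(ctx, len >= 2);        /* too short to be valid */
    ctx_expect(ctx, copy[0] == 0x07); /* made-up message discriminator */
}

/* Returns 0 if the message was handled, -1 if it was dropped.
 * Either way, every allocation made on the way is released. */
static int handle_message(const unsigned char *buf, size_t len)
{
    /* On the heap, so its contents stay well-defined across longjmp(). */
    struct msg_ctx *ctx = malloc(sizeof(*ctx));
    int rc;

    if (!ctx)
        return -1;
    ctx->allocs = NULL;
    if (setjmp(ctx->on_fail) == 0) {
        decode_message(ctx, buf, len);
        rc = 0;
    } else {
        rc = -1;
    }
    ctx_free_all(ctx);
    free(ctx);
    return rc;
}
```

The context lives on the heap because automatic variables of the function calling setjmp() that change before the longjmp() have indeterminate values afterwards.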
== the lack of introspection ==
When you operate a network, it is vital to have some visibility. For the MME you want to inspect how many subscribers are currently attached, where they are attached (TAC), whether they currently have a UE context (and at which eNB), which TMSI/GUTI was allocated, etc.
Likewise, for both SGW and PGW you want to see which PDN contexts exist, from which peer IP addresses, which APN was used, what IP addresses have been allocated, etc.
In the Osmocom world, we implement this introspection in two ways:
* by means of the VTY interface (for the human user)
* by means of the CTRL interface (for other programs)
If I hadn't been busy with debugging various other issues, I would have actually attempted to add a basic VTY interface to nextepc-mmed.
For sure there may be better ways to expose this state (ideally with the same piece of code providing access to both human users as well as external programs), but I'm not aware of any nice C language implementation in FOSS that one could use right away.
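One way to picture "the same piece of code" serving both audiences is a single walk over the state with two output formats. The following is only a sketch: struct ue_info, dump_ues() and both line formats are invented for this example and correspond to neither the Osmocom VTY/CTRL interfaces nor anything in nextepc.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical, simplified view of what an MME knows about a subscriber. */
struct ue_info {
    const char *imsi;
    unsigned tac;
    int attached;
};

enum dump_format { DUMP_HUMAN, DUMP_MACHINE };

/* Render the UE table into buf, one line per UE.
 * Returns the number of bytes written, or -1 if buf is too small. */
static int dump_ues(char *buf, size_t size, const struct ue_info *ues,
                    size_t count, enum dump_format fmt)
{
    size_t off = 0;

    if (size == 0)
        return -1;
    buf[0] = '\0';
    for (size_t i = 0; i < count; i++) {
        int len;

        if (fmt == DUMP_HUMAN)
            len = snprintf(buf + off, size - off, "IMSI %s  TAC %u  %s\n",
                           ues[i].imsi, ues[i].tac,
                           ues[i].attached ? "attached" : "detached");
        else
            len = snprintf(buf + off, size - off,
                           "imsi=%s,tac=%u,attached=%d\n",
                           ues[i].imsi, ues[i].tac, ues[i].attached);
        if (len < 0 || (size_t)len >= size - off)
            return -1; /* output would be truncated */
        off += (size_t)len;
    }
    return (int)off;
}
```

A human-facing shell would call this with DUMP_HUMAN, while a monitoring script would request DUMP_MACHINE; the table walk itself exists only once.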
== logging without context ==
When looking at log file output, it is very important that the output always carries sufficient context. If there are many subscribers acting in parallel, you need to know which subscriber / PDN context / ... a given log message relates to, otherwise the log message is rather useless.
For example, if you get
    [mme] DEBUG: [MME] Authentication-Information-Answer (mme-fd-path.c:211)
then even at DEBUG level you have no indication whatsoever for which particular subscriber this AIA was received. I would normally expect that the UE is resolved from the DIAMETER session-id, and then the UE's identity (IMSI) can be printed.
I also find it suboptimal that log messages often span multiple lines, which means you cannot simply 'grep' for something, as you always need to check some lines before and/or after. But I guess, contrary to the lack of context, this is a matter of taste and one can have different opinions about it.
I'll try to contribute as much as I can regarding bug fixes and enhancements. Thanks again for all the great work so far!
Regards, Harald
Hi all,
On Tue, Aug 27, 2019 at 12:59:52PM +0200, Harald Welte wrote:
> Some more details can be found at
I forgot to insert the link here: http://people.osmocom.org/laforge/slides/cccamp2019-how_the_camp_lte_works.p...
Regards, Harald
Hi Harald,
I'm really surprised that nextepc can support so many people. I'd like to share my thoughts on the three good points.
1. ogs_expect()
I think that ogs_expect() is a really good starting point for removing unnecessary ogs_assert() calls. So I added my version to the ogslib repository as below.
    #define ogs_expect(expr, fallback) \
        do { \
            if (ogs_likely(expr)) ; \
            else { \
                ogs_error("%s: Expectation `%s' failed.", OGS_FUNC, #expr); \
                fallback; \
            } \
        } while (0)
Let me know if the above interface has a problem.
2. VTY/CTRL
I'd be happy if nextepc could have such a good VTY/CTRL. Indeed, this is actually needed if someone would like to test nextepc on a large scale.
3. Logging
I agree that IMSI and APN context is needed when logging. So I think mme_log()/sgw_log()/... need to be introduced. Also, I saw a fix for the freeDiameter logging. How about merging it into the master branch? Please open a GitHub pull request if you agree.
Your great work is much appreciated!
Best Regards, Sukchan
Hi Sukchan,
On Tue, Aug 27, 2019 at 11:20:37PM +0900, Sukchan Lee wrote:
> I'm really surprised that nextepc can support so many people.
What I'm personally quite happy about is that I couldn't see any clear memory leaks. Knowing from experience, avoiding them is one of the hardest tasks in the development of complex C-language software.
> 1. ogs_expect()
> I think that ogs_expect() is a really good starting point for removing unnecessary ogs_assert() calls. So I added my version to the ogslib repository as below.
>
> #define ogs_expect(expr, fallback) \
>     do { \
>         if (ogs_likely(expr)) ; \
>         else { \
>             ogs_error("%s: Expectation `%s' failed.", OGS_FUNC, #expr); \
>             fallback; \
>         } \
>     } while (0)
>
> Let me know if the above interface has a problem.
I don't see a problem, other than the code looking "ugly" from my point of view. Having return statements inside a macro argument is highly unusual, AFAICT.
Also, in your version, what would a "print a message but do nothing else" look like? I think it would be "ogs_expect(ret == OGS_OK, );" where the empty second argument doesn't look like C code at all...
I understand your motivation and I was also thinking about something like this originally, but I decided against it. At least at the moment most of your related functions return 'void' anyway, so my ogs_expect_or_return() worked fine.
I think as soon as you want to do something else but either nothing or return, then you need to introduce a proper if-clause with error handling. That's why I introduced only those two versions and not a flexible variant.
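For readers who want to see both call styles of the two-argument form side by side, here is a small sketch. It is modelled on the macro quoted above, but expect_fb(), the failure counter and the two functions are made up, and ogs_likely()/ogs_error()/OGS_FUNC are replaced by plain C stand-ins. The empty second argument is accepted by C99 and later compilers.

```c
#include <stdio.h>

/* Hypothetical stand-in modelled on the quoted macro; NOT ogslib code. */
static int failures; /* number of failed expectations so far */

#define expect_fb(expr, fallback) \
    do { \
        if (!(expr)) { \
            failures++; \
            fprintf(stderr, "%s: Expectation `%s' failed.\n", \
                    __func__, #expr); \
            fallback; \
        } \
    } while (0)

/* Non-void function: the fallback argument carries the return statement. */
static int parse_len(int len)
{
    expect_fb(len >= 0, return -1);
    return len;
}

/* "Print a message but do nothing else": the second argument is empty. */
static int pass_through(int ret)
{
    expect_fb(ret == 0, );
    return ret;
}
```

Here parse_len(-3) logs and returns -1, while pass_through(7) logs and still returns 7.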
> 2. VTY/CTRL
> I'd be happy if nextepc could have such a good VTY/CTRL. Indeed, this is actually needed if someone would like to test nextepc on a large scale.
As indicated, the question is whether I should simply start adding Osmocom VTY using libosmocore, or if we should spend some more time looking for alternatives or even designing + writing some new, better system and then introduce this to nextepc. I'm a bit undecided here at this point. Take the quick route or the long path...
> 3. Logging
> I agree that IMSI and APN context is needed when logging. So I think mme_log()/sgw_log()/... need to be introduced. Also, I saw a fix for the freeDiameter logging. How about merging it into the master branch? Please open a GitHub pull request if you agree.
It's probably even more like a mme_ue_log() or something like that, because you may have different objects/structs which provide logging context.
In Osmocom we have introduced log macros like LOG_MSC_A(), LOG_MNCC_CALL(), LOG_MSUB() where the first argument is typically the 'object' providing context. Those macros then expand to a call to the generic logging macro/function.
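The pattern described above could look roughly like the sketch below. It is not the Osmocom implementation: LOG_MME_UE(), LOG_PDN(), log_with_context() and both structs are hypothetical names, and the "generic logging function" is reduced to formatting one prefixed line and writing it to stderr.

```c
#include <stdarg.h>
#include <stdio.h>

/* Hypothetical context objects; real ones would carry much more state. */
struct mme_ue { const char *imsi; };
struct pdn_ctx { const char *apn; };

/* Last formatted line, kept only so the sketch is easy to inspect. */
static char last_line[256];

/* Stand-in for the generic logging function: emits one prefixed line. */
static void log_with_context(const char *kind, const char *id,
                             const char *fmt, ...)
{
    va_list ap;
    int off = snprintf(last_line, sizeof(last_line), "%s(%s) ", kind, id);

    if (off < 0 || (size_t)off >= sizeof(last_line))
        return;
    va_start(ap, fmt);
    vsnprintf(last_line + off, sizeof(last_line) - (size_t)off, fmt, ap);
    va_end(ap);
    fputs(last_line, stderr);
}

/* The first argument is the object that provides the context. */
#define LOG_MME_UE(ue, ...) log_with_context("UE", (ue)->imsi, __VA_ARGS__)
#define LOG_PDN(pdn, ...)   log_with_context("PDN", (pdn)->apn, __VA_ARGS__)
```

With this in place, a call such as LOG_MME_UE(ue, "Authentication-Information-Answer\n") yields a single line starting with the IMSI, so grepping for one subscriber finds every message that relates to it.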
See inline!
On Tue, Aug 27, 2019 at 11:40 PM Harald Welte laforge@gnumonks.org wrote:
- ogs_expect()
Ah! Now, I understand your intention. So I did rollback ogs_expect_or_return(). Thank you for introducing a good interface.
- VTY/CTRL
Basically, I planned to implement HTTP/JSON for this part (e.g. GET /ue, POST /logging). I'd also like to monitor the status in the WebUI. However, to do so, we need to create a separate process to handle it. IMHO, it will take a long time to implement. I also don't know which one is better for us.
- Logging
I saw your work! A LOG macro with context adds a prefix to the log output. I will refer to your work when I improve the logging.
Thanks Harald for the detailed writeup and the work to get a large NextEPC network up and running.
Regarding point 2, I think it might be good to consider starting with only a programmatic interface (CTRL or possibly some kind of RPC framework) integrated directly into the EPC code, and then providing the end UI to users as a client of the programmatic interface. This would allow separating the client UI code from the core EPC components more easily, and make it more scalable to add additional interfaces as needed (i.e. a web GUI and a telnet CLI existing in parallel, with access to the same underlying functionality).
-Matt J.