So, we've been discussing IRL the osmo-hnbgw implementation of interacting with nftables from a separate thread.
The idea now is that a separate thread gets the results from nftables, caches them, and enqueues them, so that the main thread can update the hnbgw counters in sync. A second separate thread does nft add/del rule maintenance.
There is another complication, and I want to make sure we agree on the solution. Problem: to delete a rule, I need to retrieve its 'handle' first. With the amount of threading we want, this becomes a bit complex.
So: we 'add rule' and 'delete rule' as hNodeBs attach and detach. But to be able to 'delete rule', we need to retrieve an id that nftables assigned to this rule when it was added, the 'handle':
  add rule inet osmo-hnbgw gtpu-ul ip saddr 1.2.3.4 counter comment "ul:1-2-3-4";
  ...
  list table inet osmo-hnbgw
  --> table inet osmo-hnbgw {
          chain gtpu-ul {
              ip saddr 1.2.3.4 counter packets 5 bytes 100 comment "ul:1-2-3-4" # handle 23
                                                                                  ^^^^^^^^^
          }
      }
  ...
  delete rule inet osmo-hnbgw gtpu-ul handle 23;
                                      ^^^^^^^^^
AFAICT I cannot set a handle right from the start. I can also not delete a rule by just stating it again as it is, nor by using the 'comment' as a handle. nft wants a 'handle', period. Does anyone know better?
Every time we read counters, osmo-hnbgw also retrieves the handles for each rule, and correlates via the 'comment'. Similarly to counters, we can only get all handles at once, via a complete table dump (possibly inefficient).
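To illustrate the comment-to-handle correlation, here is a rough Python sketch. This is not osmo-hnbgw code; the function name and the exact dump format are assumptions based on the example above:

```python
# Hypothetical sketch: correlate nft rule handles to hNodeB rules via the
# rule 'comment', from the text of a full 'nft list table' dump.
import re

# One match per rule line: capture the comment string and the trailing handle.
RULE_RE = re.compile(r'comment "(?P<comment>[^"]+)" # handle (?P<handle>\d+)')

def handles_by_comment(table_dump: str) -> dict[str, int]:
    """Map each rule's comment (e.g. "ul:1-2-3-4") to its nft handle."""
    return {m.group('comment'): int(m.group('handle'))
            for m in RULE_RE.finditer(table_dump)}

dump = '''
table inet osmo-hnbgw {
    chain gtpu-ul {
        ip saddr 1.2.3.4 counter packets 5 bytes 100 comment "ul:1-2-3-4" # handle 23
    }
}
'''
print(handles_by_comment(dump))  # {'ul:1-2-3-4': 23}
```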
It works out fine in most practical cases. But the complex corner case is when an hNodeB detaches quickly, before the counter read had a chance to run and update the hNodeB's state, including the handle.
With the implementation we currently agreed on (two threads for nft, with queues back to the main thread), avoiding races would go something like this:
  main()                  nft_maintenance_thread()    nft_read_counters_thread()
    |                       |                           |
    |---add hNodeB--------->|                           |
    |<--ok------------------|                           |
    X                       |                           |
    |---del hNodeB--------->|                           |
    |             [rule handle unknown!]                |
  ..|<--EAGAIN--------------|                           |
  . |                       |                           |
  . |---do counters----------------------------------->|
  . |                       |                        [work]
  . |                       |                        [work]
  . |                       |                        [work]
  . |<--update-counters-and-handles--------------------|
  . |                       |                           |
  ..|---del hNodeB--------->|                           |
    |               [delete rule]                       |
    |<--done----------------|                           |
Simplifying a bit...
  main()                  nft_maintenance_thread()    nft_read_counters_thread()
    |                       |                           |
    |---add hNodeB--------->|                           |
    |<--ok------------------|                           |
    X                       |                           |
    [want: del hNodeB]      |                           |
    [rule handle unknown!]  |                           |
    [set some "del" flag]   |                           |
  . |                       |                           |
  . |---do counters----------------------------------->|
  . |                       |                        [work]
  . |                       |                        [work]
  . |                       |                        [work]
  . |<--update-counters-and-handles--------------------|
  . |                       |                           |
    [del flag!]             |                           |
    |---del hNodeB--------->|                           |
    |               [delete rule]                       |
    |<--done----------------|                           |
I don't much like the amount of complexity spawning here, but I guess this last approach is ok. Do you agree?
i.e. when the handle is not set yet, set a flag on the hNodeB, and trigger the 'delete' the next time the counter thread reports back a handle for the hNodeB.
~N
more hnbgw counter implementation details:
I just realized that there is a race condition when we have two separate threads for adding/removing nft rules and for retrieving counters:
When adding a new rule, the counters will read back starting from zero. The rate_ctrs in hnbgw should persistently grow across reconnects, so we need to cache the last seen counter value and accumulate the difference.
- This cache needs to be reset to zero at the time a new rule is set up. (The hnbgw cache needs to tightly correlate to the nftables state.)
- The counters being retrieved in a separate thread are then ambiguous: they could reflect either a count of the new rule, or of a previously existing rule.
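The prev_ctr bookkeeping described above could be sketched like this (illustrative Python, names invented). Note that it only stays correct if the reset on rule add and the counter reads happen in a well-defined order:

```python
# Hypothetical sketch: the hnbgw-side rate_ctr grows monotonically, while nft
# restarts from zero whenever the rule is re-added. prev_ctr caches the last
# counter value seen from nft and must be reset when a new rule is set up.
class HnbCounter:
    def __init__(self):
        self.total = 0       # the persistent rate_ctr value
        self.prev_ctr = 0    # last counter value read from nft

    def on_rule_added(self):
        # A fresh rule counts from zero again; reset the cache to match.
        self.prev_ctr = 0

    def on_counters_read(self, ctr: int):
        # Accumulate only the difference since the last read.
        self.total += ctr - self.prev_ctr
        self.prev_ctr = ctr

c = HnbCounter()
c.on_rule_added()
c.on_counters_read(12345)   # traffic on the first rule
c.on_rule_added()           # quick detach + re-attach: counter restarts at 0
c.on_counters_read(100)     # new rule's own traffic
print(c.total)  # 12445
```

If a stale read from the old rule arrives after the reset, the same arithmetic adds ghost counts, which is exactly the race shown below.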
Example of a rapid disconnect and re-attach, which would add 12345 ghost counts that need to be avoided:
  main()               nft-maintenance-thread
    |*                      |
    |--add-hnb------------->|
    |<-done-----------------|
    [prev_ctr=0]            |
    |                       |
    |  ...lots of traffic...|
    [prev_ctr=12345]        |
    |                       |          counters-thread
    |                       |                 |
    |-----get-counters----------------------->|
    |                       |              [work]
    |                       |                 |
    |X    [reading: ctr=12345]                |
    |--del-hnb------------->|                 |
    |<-done-----------------|                 |
    |                       |   correct:
    |                       |   ctr - prev_ctr = 0
    |*                      |   (no additional counts)
    |--add-hnb------------->|                 |
    |<-done-----------------|                 |
    [prev_ctr=0]            |                 |
    |                       |                 |
    |<----ctr=12345--------------------------|
    [update counter state]
    [add ctr - prev_ctr = 12345]   incorrect!
    [prev_ctr=12345]
    |                       |                 |
    |  ...                  |                 |
    |-----get-counters----------------------->|
    |<----ctr=0------------------------------|
    [update counter state]
    [add ctr - prev_ctr = -12345]  rate_ctr must not go backwards!
The thread reading the counters has no way of telling which reference state nft is using for its report. AFAICT this can only be fixed by doing hnb rule maintenance strictly in sequence with retrieving counters.
So I am going to reduce the implementation to a single nft thread that does both hnb maintenance as well as counter retrieval.
Agree?
~N
IMHO yet another reason to give named counters a try, as we discussed. They might have other problems, but if we don't try, we never know.
On Mon, May 13, 2024 at 09:05:26AM +0200, Harald Welte wrote:
> IMHO yet another reason to give named counters a try, as we discussed.
> They might have other problems, but if we don't try, we never know.

Is your idea to use a named counter that remains persistent even if the hNodeB disconnects, in order to avoid the race?
On Tue, May 14, 2024 at 01:20:14AM +0200, Neels Hofmeyr wrote:
> On Mon, May 13, 2024 at 09:05:26AM +0200, Harald Welte wrote:
> > IMHO yet another reason to give named counters a try, as we discussed.
> > They might have other problems, but if we don't try, we never know.
>
> Is your idea to use a named counter that remains persistent even if the
> hNodeB disconnects, in order to avoid the race?
The idea was to have a named counter whose name doesn't get recycled, so that there's no chance of the race you described. A persistent one might also be an alternative, but of course only in scenarios where the hNBs are a static set. I believe we have seen cases at other customers where the hNB comes back with a different identity every time it connects, so one would be leaking counters if they were persistent, as in reality every reconnect would allocate a new one and all old ones would go stale.
> I believe we have seen cases at other customers where the hNB comes back
> with a different identity every time it connects, so one would be leaking
> counters if they were persistent, as in reality every reconnect would
> allocate a new one and all old ones would go stale.
oh, i didn't expect that. Cleaning up on disconnect seems necessary then.
I did not consider this, because osmo-hnbgw creates a struct hnb_persistent for each cell id it sees connecting, and adds rate_ctrs for each. This was implemented only recently, so I adopted that premise.
This means we already have such a "leak" situation: rate_ctrs being added and never removed. I think I'd add a configurable idle timeout for hnb_persistent?
It does make sense to me to let the nft named counters exist for the same time that the struct hnb_persistent's cell id and rate_ctrs exist. So I would delete the named counters upon hnb_persistent_free().
There is a hnb_persistent_free() function, but it is so far only invoked from vty command 'no hnb IDENTITY_INFO' -- I'd add that idle timeout. We so far don't rate_ctr_group_free(ctrs) in hnb_persistent_free() -- I'd add it now.
Do you agree with these ideas?
~N