more hnbgw counter implementation details:
I just realized that there is a race condition when we have two separate threads for adding/removing nft rules and for retrieving counters:
When adding a new rule, the counters will read back starting from zero. The rate_ctrs in hnbgw should persistently grow across reconnects, so we need to cache the last seen counter value and accumulate the difference.
- This cache needs to be reset to zero at the time a new rule is set up. (hnbgw cache needs to tightly correllate to the nftables state.)
- The counters being retrieved in a separate thread are then ambiguous: they could reflect either a count of the new rule, or of a previously existing rule.
Example of rapid disconnect and re-attach, which would add 12345 ghost counts that need to be avoided:
main() nft-maintenance-thread |* | |--add-hnb----->| |<-done---------| [prev_ctr=0] | | ...lots of traffic... | [prev_ctr=12345] | counters-thread | | |-----get-counters-------------------------->| | [work] | | |X [reading: ctr=12345] |--del-hnb----->| | |<-done---------| | correct: | | ctr - prev_ctr = 0 |* | (no additional counts) |--add-hnb----->| | |<-done---------| | [prev_ctr=0] | | | |<----ctr=12345------------------------------| [update counter state] [add ctr - prev_ctr = 12345] incorrect! [prev_ctr=12345] | | ... | |-----get-counters-------------------------->| |<----ctr=0----------------------------------| [update counter state] [add ctr - prev_ctr = -12345] rate_ctr must not go backwards!
The thread reading the counters has no way of telling which reference state nft is using for its report. AFAICT this can only be fixed by doing hnb rule maintenance strictly sequentially to retrieving counters.
So I am going to reduce the implementation to a single nft thread that does both hnb maintenance as well as counter retrieval.
Agree?
~N