more hnbgw counter implementation details:
I just realized that there is a race condition when we have two separate
threads for adding/removing nft rules and for retrieving counters:
When adding a new rule, the counters will read back starting from zero. The
rate_ctrs in hnbgw should persistently grow across reconnects, so we need to
cache the last seen counter value and accumulate the difference.
- This cache needs to be reset to zero at the time a new rule is set up.
(hnbgw cache needs to tightly correllate to the nftables state.)
- The counters being retrieved in a separate thread are then ambiguous: they
could reflect either a count of the new rule, or of a previously existing
rule.
Example of rapid disconnect and re-attach,
which would add 12345 ghost counts that need to be avoided:
main() nft-maintenance-thread
|* |
|--add-hnb----->|
|<-done---------|
[prev_ctr=0]
|
| ...lots of traffic...
|
[prev_ctr=12345]
| counters-thread
| |
|-----get-counters-------------------------->|
| [work]
| |
|X [reading: ctr=12345]
|--del-hnb----->| |
|<-done---------| | correct:
| | ctr - prev_ctr = 0
|* | (no additional counts)
|--add-hnb----->| |
|<-done---------| |
[prev_ctr=0] |
| |
|<----ctr=12345------------------------------|
[update counter state]
[add ctr - prev_ctr = 12345] incorrect!
[prev_ctr=12345]
|
| ...
|
|-----get-counters-------------------------->|
|<----ctr=0----------------------------------|
[update counter state]
[add ctr - prev_ctr = -12345] rate_ctr must not go backwards!
The thread reading the counters has no way of telling which reference state nft
is using for its report. AFAICT this can only be fixed by doing hnb rule
maintenance strictly sequentially to retrieving counters.
So I am going to reduce the implementation to a single nft thread that does
both hnb maintenance as well as counter retrieval.
Agree?
~N