Hello,
On 2017-10-06 16:57, Max wrote:
On 06.10.2017 17:43, ringsignature(a)riseup.net wrote:
Is there any reason that it isn't just called a single time on system
startup? It should never again block after that point in time.
Small clarification (more a note to myself reading this later on):
that's the case for the default urandom source (used in the current
patch), not for GRND_RANDOM (discussed elsewhere in the ML thread).
A measurement that might be worth considering is whether it ever
blocks in practice.
We get that measurement in the end anyway: even in GRND_NONBLOCK mode,
errno is set in the cases where the call would otherwise have blocked.
The error is propagated to the application by osmo_get_rand_id(), and
the application will log it.
OK. That seems reasonable as a way of measuring whether this is
practically a problem on any production systems.
If such a log message occurs, what would it mean to an end user? One
might read it as meaning those random bytes are less than ideal, but
beyond that I'm not sure what it would mean in an actionable sense.
Does it mean one shouldn't use an IMSI or Ki generated while that log
message is being emitted?
Might it also be possible to call the wrapper for getrandom() at
library init time as well as later when the random bytes are needed?
Sure, but it seems more complex.
If there is a library setup, a possible, but not certain, blocking call
seems reasonable in exchange for the later promise of never blocking.
It could even call the same function with a single argument changed. I
agree that it is slightly more complex, though I think the trade-off
might be worth considering. It sounds like you have considered it, and
your conclusions are all very reasonable.
What do we gain by using it that way? The underlying getrandom() call
might still fail, so we have to handle/propagate errors anyway.
The documentation explicitly claims that it never fails for requests
of 1 to 256 bytes:

    If the urandom source has been initialized, reads of up to 256
    bytes will always return as many bytes as requested and will not
    be interrupted by signals. No such guarantees apply for larger
    buffer sizes. For example, if the call is interrupted by a signal
    handler, it may return a partially filled buffer, or fail with the
    error EINTR.
What do we lose by using it that way? Seems like nothing; see my note
about error propagation above.
I think the failure on an embedded system is that the output is
predictable. That's the Mining Your Ps and Qs issue all over again with
GSM and related protocols. I'm curious about analysis of GSM and related
protocols, specifically how they fail when there isn't sufficient
entropy. Are there any useful survey papers on the topic that anyone
could recommend?
Isn't it more error-prone to handle errors and unfilled buffers than
to block a single time?
We don't have to deal with unfilled buffers in application code: a call
to osmo_get_rand_id() either succeeds or fails as a whole.
Ah, of course: for requests of 1 to 256 bytes, as in your patch, I
agree. The suggestion of using GRND_RANDOM does produce an occasional
outcome where the buffer is smaller than the requested size, which
does need to be handled. For simplicity, it seems wise to avoid
GRND_RANDOM, regardless of the discussion around GRND_NONBLOCK.
Also, it depends on what kind of error handling we choose. To me a
single "if (rc < 0) { log("doh!"); exit(1); }" looks less error-prone
than potentially blocking code.
In the case of GRND_NONBLOCK, I generally agree.
Seems tricky, though I agree that consistent behavior might be worth
the trade-off. If GRND_NONBLOCK ensures that no buffer is ever
underfilled, that might be the middle ground that makes the most sense.
GRND_NONBLOCK doesn't guarantee that; using OSMO_MAX_RAND_ID_LEN = 16
and the default urandom source does.
Understood. I agree that with GRND_NONBLOCK alone, getrandom() should
both never block and never underfill a buffer of 256 bytes or less.
I think there might be some confusion, so let me stress the following
point: the GRND_* flags are independent. There are 2 flags currently
available, so we have 4 flag combinations.
GRND_RANDOM lets us select the randomness source.
GRND_NONBLOCK lets us control blocking behavior.
So in the absence of explicit guarantees (provided by _NONBLOCK) we can
implicitly control blocking (to block at most once) by selecting the
random source and buffer size. I prefer explicit control unless there
are strong arguments against it.
I agree with the strategy of explicit control. Where we slightly
diverge is that I would simply want to ensure that the RNG has been
initialized. My intuition is that on most modern systems this will
always be the case, but on some embedded systems it seems to be
tempting fate. In those perhaps rare but perhaps serious cases, I
think an initializer function that blocks once might save a lot of
cryptographic headaches later.
That is my understanding as well. The key difference is that you were
spot on about the pool being drained very seriously, to the point of
underflowing, and thus potentially encountering serious error states.
Such an error state might be a 512-byte buffer with only one random
byte, for example.
There are two possible causes for an underfilled buffer after
getrandom(), according to the man page:
1) insufficient entropy
2) signal interruption
Both are impossible in the current implementation under review in
https://gerrit.osmocom.org/#/c/1526/
1) impossible because we use the default urandom source
2) impossible because we always request fewer than 256 bytes
Yes, I agree.
I still check for it out of paranoia.
I'm not sure if this is portable to FreeBSD with the same guarantees, so
checking seems prudent.
However, from the application's PoV it should not matter anyway: if a
call to some function might fail, then we should handle it.
Agreed.
There are basically 2 things we can do after logging the error:
- terminate the application
- fall back to insecure random numbers
So far we have used the latter. If I understood the summary of the
ongoing discussion right, then we should opt for the former.
It is certainly better to terminate than to use insecure random numbers
in a security-sensitive context, especially if any of those bytes are
used for anything long-lived.
The only other option is the one we've discussed and perhaps isn't
worthwhile: initialize once in a blocking manner such that the above
termination state should never happen in theory. Of course, you'll
catch it when it happens in practice, caused by some previously
unconsidered factor. It might make sense to simply state that the OS
handles this case, but [0] suggests that such an approach fails badly.
The situation is probably worse than the public papers show, as those
cover only the systems that attract open research and attention.
Essentially, I think your conclusions are clear and reasonable. I agree
with you, and I still worry that I don't have a way to measure the
(probably obscure) failure case. I think this is actually a failure of
the /dev/urandom interface: if any option is unsafe, why does it still
exist after so many years of failures caused by unsafe options?
Shall I make it configurable via the application vty/config
(OsmoBSC/OsmoMSC/OsmoSGSN)?
If the default is reasonably secure, it doesn't seem necessary to add a
foot cannon.
Thank you for all your work on this patch set and for discussing it in
incredible detail.
Happy Hacking,
RS
[0]
https://factorable.net/weakkeys12.extended.pdf