prng change feedback

historical

Hello,

On 2017-10-06 16:57, Max wrote:
> On 06.10.2017 17:43, ringsignature at riseup.net wrote:
>> Is there any reason that it isn't just called a single time on system
>> startup? It should never again block after that point in time.
> 
> Small clarification (more like a note to myself reading this later on):
> that's the case for the default urandom source (used in the current
> patch), not for GRND_RANDOM (discussed elsewhere in the ML thread).
> 
>
>> A measurement that might be worth considering is whether it ever blocks
>> in practice.
> 
> We'll sort of get that in the end: even in GRND_NONBLOCK mode, errno is
> set in the cases where the call would otherwise have blocked. It's
> propagated to the application by osmo_get_rand_id() and the application
> will log it.

OK. That seems reasonable as a way of measuring if this is practically a
problem on any production systems.
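A minimal sketch of that propagation path, assuming Linux with glibc >= 2.25 (<sys/random.h>). The function name and the SKETCH_MAX_RAND_ID_LEN constant are hypothetical stand-ins, not the libosmocore API: it reads from the default urandom source with GRND_NONBLOCK, returns a negative errno (e.g. -EAGAIN when the call would have blocked) for the caller to log, and still checks for a short read out of paranoia.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/random.h>
#include <sys/types.h>

/* Hypothetical bound mirroring OSMO_MAX_RAND_ID_LEN = 16 from this thread. */
#define SKETCH_MAX_RAND_ID_LEN 16

/* Hypothetical wrapper (not the real osmo_get_rand_id()): fill out[0..len)
 * from the default urandom source without ever blocking.
 * Returns 0 on success or a negative errno value the caller can log. */
int sketch_get_rand_id(unsigned char *out, size_t len)
{
	ssize_t rc;

	if (len == 0 || len > SKETCH_MAX_RAND_ID_LEN)
		return -EINVAL;

	rc = getrandom(out, len, GRND_NONBLOCK);
	if (rc < 0)
		return -errno; /* EAGAIN: pool not initialized, call would have blocked */
	if ((size_t)rc != len)
		return -EIO; /* short read: not expected for len <= 256, checked anyway */
	return 0;
}
```

A log line carrying -EAGAIN from such a wrapper would be the measurable signal that a production system hit the "would have blocked" case.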

If such a log message occurs, what would it mean to an end user? One
might read it as saying that those random bytes are less than ideal, but
beyond that, I'm not sure what it would mean in an actionable sense. Does
it mean one shouldn't use a Ki generated for an IMSI when that log
message is emitted?

>> Might it also be possible to call the
>> wrapper for getrandom() at library init time as well as later when the
>> random bytes are needed?
> 
> Sure, but it seems more complex.
> 

If there is a library setup step, a possibly (but not certainly) blocking
call there seems a reasonable price for the later promise of never
blocking. It could even call the same function with a single argument
changed. I agree that it is slightly more complex, though I think the
trade-off might be worth considering. That said, it sounds like you have
considered it, and your conclusions are all very reasonable.
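A rough sketch of that "block once at init, never block later" idea, assuming Linux with glibc >= 2.25. Both function names and the readiness flag are hypothetical, not part of libosmocore. The init hook calls getrandom() with flags = 0 (urandom source, blocking allowed), which waits only until the kernel pool is initialized; later calls differ just in the flags argument (GRND_NONBLOCK).

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/random.h>
#include <sys/types.h>

/* Hypothetical flag: set once the urandom pool is known to be initialized. */
static bool sketch_rng_ready;

/* Hypothetical library-init hook: may block, at most once, until the
 * kernel's urandom pool is initialized. Returns 0 or a negative errno. */
int sketch_rng_init(void)
{
	unsigned char probe;
	ssize_t rc;

	do {
		rc = getrandom(&probe, 1, 0); /* flags = 0: urandom, may block */
	} while (rc < 0 && errno == EINTR); /* a signal arrived while blocked: retry */

	if (rc != 1)
		return rc < 0 ? -errno : -EIO;
	sketch_rng_ready = true;
	return 0;
}

/* Later calls: same syscall, only the flags argument changes. */
int sketch_rng_fill(unsigned char *out, size_t len)
{
	ssize_t rc;

	if (!sketch_rng_ready)
		return -EAGAIN; /* init was never run or did not succeed */
	rc = getrandom(out, len, GRND_NONBLOCK);
	if (rc < 0)
		return -errno;
	return (size_t)rc == len ? 0 : -EIO;
}
```

The error handling in sketch_rng_fill() does not go away, which is Max's point below; what changes is that after a successful init the error path should be unreachable in practice.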

> What do we gain by using it that way? The underlying getrandom() call
> might still fail, so we have to handle/propagate errors anyway.

The documentation explicitly claims to never fail for 1 to 256 byte
requests:

       If the urandom source has been initialized, reads of up to 256
       bytes will always return as many bytes as requested and will not
       be interrupted by signals.  No such guarantees apply for larger
       buffer sizes.  For example, if the call is interrupted by a signal
       handler, it may return a partially filled buffer, or fail with the
       error EINTR.

> What do we lose by using it that way? Seems like nothing; see my note
> about error propagation above.

I think the failure on an embedded system is that the output is
predictable. That's the Mining Your Ps and Qs issue all over again with
GSM and related protocols. I'm curious about analysis of GSM and related
protocols, specifically how they fail when there isn't sufficient
entropy. Are there any useful survey papers on the topic that anyone
could recommend?

> 
>> Isn't it more error prone to handle errors and unfilled buffers than to
>> block a single time?
> 
> We don't have to deal with unfilled buffers in application code: either
> the call to osmo_get_rand_id() succeeds or it fails.

Ah, of course: in your patch, for requests of 1 to 256 bytes, I agree.
Using GRND_RANDOM, as suggested, can occasionally return fewer bytes than
requested, which does need to be handled. For simplicity, it seems wise
to avoid GRND_RANDOM, regardless of the discussion around GRND_NONBLOCK.
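For comparison, this is roughly the extra handling that GRND_RANDOM (or any request larger than 256 bytes) would force on callers: a loop that retries on EINTR and asks again for the remainder after a short read. A sketch only, assuming Linux with glibc >= 2.25; sketch_fill_all() is a hypothetical helper.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/random.h>
#include <sys/types.h>

/* Hypothetical helper: call getrandom() until len bytes have been written.
 * With GRND_RANDOM, or requests > 256 bytes, one call may return fewer
 * bytes than requested or fail with EINTR.
 * Returns 0 on success or a negative errno value. */
int sketch_fill_all(unsigned char *buf, size_t len, unsigned int flags)
{
	size_t done = 0;

	while (done < len) {
		ssize_t rc = getrandom(buf + done, len - done, flags);

		if (rc < 0) {
			if (errno == EINTR)
				continue;   /* interrupted by a signal: retry */
			return -errno;      /* e.g. EAGAIN with GRND_NONBLOCK */
		}
		done += (size_t)rc;         /* short read: request the remainder */
	}
	return 0;
}
```

With the default urandom source and at most 256 bytes per call, none of this looping is needed, which is the simplicity argument for avoiding GRND_RANDOM.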

> Also, it depends on what kind of error handling we choose. To me, a single
> "if (rc < 0) { log("doh!"); exit(1); }" looks less error-prone than
> potentially blocking code.

In the case of GRND_NONBLOCK, I generally agree.

>>  Seems tricky, though I agree that consistent
>> behavior might be worth the trade off. If GRND_NONBLOCK ensures that no
>> buffer is ever underfilled, that might be the middle ground that makes
>> the most sense.
> 
> GRND_NONBLOCK doesn't guarantee that; using OSMO_MAX_RAND_ID_LEN = 16
> and the default urandom source does.

Understood. I agree that with GRND_NONBLOCK alone (i.e. without
GRND_RANDOM), getrandom() should never block and never underfill a buffer
of 256 bytes or less; at worst it fails with EAGAIN before the pool is
initialized.

> I think there might be some confusion, so let me stress the following
> point: the GRND_* flags are independent. There are 2 flags currently
> available, so we have 4 flag combinations.
> 
> GRND_RANDOM lets us select the randomness source.
> GRND_NONBLOCK lets us control blocking behavior.
> 
> So in the absence of explicit guarantees (provided by _NONBLOCK), we can
> implicitly control blocking (to block at most once) by selecting the
> random source and buffer size.
> 
> I prefer explicit control unless there are strong arguments against it.
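The four combinations Max lists can be written down as a small lookup table. This is a summary sketch, not library code: the struct and array names are hypothetical, and the descriptions follow getrandom(2) as documented at the time of this thread for requests of up to 256 bytes. Linux 5.6 and later dropped the separate blocking pool, so GRND_RANDOM behaves differently on newer kernels; check the current man page.

```c
#include <sys/random.h>

/* Hypothetical summary table: GRND_RANDOM and GRND_NONBLOCK are independent
 * bits, so there are four combinations (requests of up to 256 bytes,
 * pre-5.6 kernel semantics). */
struct sketch_flag_combo {
	unsigned int flags;
	const char *source;
	const char *behavior;
};

static const struct sketch_flag_combo sketch_combos[4] = {
	{ 0,                           "urandom", "blocks only until the pool is initialized" },
	{ GRND_NONBLOCK,               "urandom", "never blocks; EAGAIN if not yet initialized" },
	{ GRND_RANDOM,                 "random",  "may block; may return fewer bytes than requested" },
	{ GRND_RANDOM | GRND_NONBLOCK, "random",  "never blocks; EAGAIN if no bytes are available" },
};
```

Only the second row gives the explicit "never block, never underfill" behavior without relying on implicit properties of the source and buffer size.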

I agree with the strategy of explicit control. Where we slightly diverge
is that I would simply want to ensure that the RNG was initialized. My
intuition is that on most modern systems, it will always be the case but
for some embedded systems, it seems to be tempting fate. In those
perhaps rare but perhaps serious cases, I think an initializer function
blocking once might save a lot of cryptographic headaches later.

>> That is my understanding as well. The key difference is that you were
>> spot on about the pool being drained very seriously - to the point of
>> underflowing, thus potentially encountering serious error states. Those
>> error states might be a 512 byte buffer with only one random byte, for
>> example.
> 
> There are 2 possible causes for an underfilled buffer after getrandom(),
> according to the man page:
> 1) insufficient entropy
> 
> 2) signal interrupt
> 
> Both are impossible in the current implementation under review in
> https://gerrit.osmocom.org/#/c/1526/
> 
> 1) Impossible because we use default urandom source
> 
> 2) Impossible because we always request less than 256 bytes
> 

Yes, I agree.   

> I still check for it out of paranoia.

I'm not sure if this is portable to FreeBSD with the same guarantees, so
checking seems prudent.

> However, from the application's PoV it should not matter anyway: if a
> call to some function might fail, then we should handle it.

Agreed.

> There are basically 2 things we can do after logging the error:
> 
> - terminate the application
> 
> - fallback to insecure random numbers
> 
> So far we used the latter. If I understood the summary of the ongoing
> discussion right, then we should opt for the former.

It is certainly better to terminate than to use insecure random numbers
in the security sensitive context, especially if any of those bytes are
used for anything that is long lived.
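A minimal fail-closed sketch of the "terminate" option, modeled on Max's `if (rc < 0) { log("doh!"); exit(1); }` and assuming Linux with glibc >= 2.25. sketch_rand_or_die() is a hypothetical name, and stderr stands in for the application's real logging. The important property is that there is deliberately no fallback to rand() or any other insecure generator.

```c
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/random.h>
#include <sys/types.h>

/* Hypothetical fail-closed helper: obtain len random bytes (intended for
 * len <= 256) or terminate the process. No insecure fallback exists. */
void sketch_rand_or_die(unsigned char *out, size_t len)
{
	ssize_t rc = getrandom(out, len, GRND_NONBLOCK);

	if (rc < 0) {
		fprintf(stderr, "getrandom() failed: %s\n", strerror(errno));
		exit(1);
	}
	if ((size_t)rc != len) {
		fprintf(stderr, "getrandom() returned %zd of %zu bytes\n", rc, len);
		exit(1);
	}
}
```

Because the function either fills the buffer or never returns, callers that generate long-lived secrets cannot accidentally continue with predictable bytes.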

The only other option is the one we've discussed, which perhaps isn't
worthwhile: initialize once in a blocking manner so that the above
termination state should, in theory, never happen. Of course, you'll
still catch it if it does happen in practice because of some previously
unconsidered factor. It might make sense to simply state that the OS
handles this case, but [0] suggests that such an approach fails badly.
The real situation is probably worse than the published papers show, as
those only cover the systems that attract open research and attention.

Essentially, I think your conclusions are clear and reasonable. I agree
with you, yet I still worry that I don't have a way to measure the
(probably obscure) failure case. I think this is actually a failure of
the /dev/urandom interface: if any option is unsafe, why does it still
exist after so many years of failures caused by unsafe options?

> Shall I make it configurable via the application vty/config
> (OsmoBSC/OsmoMSC/OsmoSGSN)?

If the default is reasonably secure, it doesn't seem necessary to add a
foot cannon.

Thank you for all your work on this patch set and for discussing it in
incredible detail.

Happy Hacking,
RS

[0] https://factorable.net/weakkeys12.extended.pdf