Hello,
Apologies in advance for not responding to the relevant thread/message IDs. First time poster, long time reader.
Reviewing the relevant email thread about "randomness of identifiers" and the design in the proposed changes [0] by Max, it seems that the patch is reasonable with one or two exceptions.
Using OpenSSL is neither necessary nor sufficient. My understanding is that it brings other restrictions on the use of the RNG and on the state of the process, as well as a history of serious security concerns. There are a number of OpenSSL forks, such as LibreSSL, that seem to be addressing memory safety and might be a better choice.
Many user space PRNGs now use getrandom() internally, adding complexity in user space that does not seem strictly necessary. If the system RNG is properly seeded at boot, getrandom() should not block at any later point in the system's lifetime. After Nadia Heninger's work [1] attacking the Linux PRNG, the kernel team greatly improved the state of the kernel's RNG/RNG interfaces. If the system RNG is not properly seeded, it is game over. Systems running OpenBSC need to have a cryptographically secure source of random numbers. If a user space cryptographic library really is required, it would probably be better to use NaCl [2], which is a well-designed library for cryptography. It considers many cryptographic concerns, such as various side channel attacks, and handles each concern in a systematic manner. OpenSSL and its family do not seem to have been so carefully designed.
There was recently an interesting design posted [3] by djb about the design of "Fast-key-erasure random-number generators" which sounds exactly like what is desired for this use case. It should be extremely easy to implement. Still, the use of getrandom() without re-seeding is probably necessary and should also be sufficient even for non GNU/Linux systems.
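To make that concrete, here is a rough, untested sketch of the fast-key-erasure idea, using NaCl/libsodium's crypto_stream (xsalsa20) as the expander and the getrandom() syscall for the one-time seed. The fke_* names, FKE_OUT and the error handling are mine, for illustration only, not from any Osmocom code; link with -lsodium.

#define _GNU_SOURCE
#include <string.h>
#include <sys/syscall.h>
#include <unistd.h>
#include <sodium.h>

#define FKE_OUT 736 /* bytes handed out per refill; any convenient size */

static unsigned char fke_key[crypto_stream_KEYBYTES];            /* current key */
static const unsigned char fke_nonce[crypto_stream_NONCEBYTES];  /* all zero: the key is never reused */

/* Seed the key exactly once, e.g. at program start. */
int fke_init(void)
{
    return syscall(SYS_getrandom, fke_key, sizeof(fke_key), 0)
        == (long)sizeof(fke_key) ? 0 : -1;
}

/* Produce outlen (<= FKE_OUT) bytes; the key that produced them is erased before returning. */
int fke_random(unsigned char *out, size_t outlen)
{
    unsigned char buf[crypto_stream_KEYBYTES + FKE_OUT];

    if (outlen > FKE_OUT)
        return -1;
    /* Expand the current key into: new key || output bytes. */
    crypto_stream(buf, sizeof(buf), fke_nonce, fke_key);
    memcpy(fke_key, buf, sizeof(fke_key));   /* the old key is now gone */
    memcpy(out, buf + sizeof(fke_key), outlen);
    sodium_memzero(buf, sizeof(buf));        /* erase the rest */
    return 0;
}

The point is simply that the key used to produce a batch of output is overwritten before the output is released, so a later compromise cannot reveal earlier outputs.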
Returning to Max's patch:
[4] - A lack of a good random number generator should be a hard build/runtime failure.
[5] - GnuTLS is suggested there, and OpenSSL on the mailing list; neither seems to be needed.
Using the glibc interface is less complicated than a full user space RNG. Has anyone reviewed it? Can it fail? If so, how does it fail? It is simple enough that intuitively it should be safe. Using the system call directly should ensure the guarantees of the API are met without any issues. The issue of re-seeding [6] is also worth re-considering.
One lesson learned from various problems with TLS is worth considering here as well: it should be secure to expose random outputs from the (P)RNG, and still, it is better to _never_ expose random outputs directly to the network or to an attacker. A simple way to do that is to hash or encrypt the random bytes, as djb suggests in his design [3]. Raw PRNG bytes may allow an attacker to break your PRNG, as was the case with Extended Random and Dual EC [7]. The Debian OpenSSL case [8] is also informative, though of a completely different class of worst case failures, one hopes.
Regarding the conclusion of the reviewer: merging Max's code while removing the rand() failure mode would be a net improvement for OpenBSC. The build should fail without getrandom(). Is there any system where this is an expected failure mode? Anything more complex probably deserves a design discussion with a well defined threat model for a given system.
Happy Hacking, RS
As an aside: If for some reason there is no cryptographically secure hardware RNG on the OpenBSC system, one wonders if it might be of interest to use the available RF interfaces as part of a design for such an RNG. There would be concerns about adversarial control of inputs, of course. It is also reasonable to try to have multiple sources of entropy, especially if the only hardware RNG device is one that isn't verifiable by end users such as RDRAND [9].
[0] https://gerrit.osmocom.org/#/c/1526 [1] https://crypto.stanford.edu/RealWorldCrypto/slides/nadia.pdf [2] https://nacl.cr.yp.to/features.html [3] https://blog.cr.yp.to/20170723-random.html [4] https://gerrit.osmocom.org/#/c/1526/8/utils/osmo-auc-gen.c [5] https://gerrit.osmocom.org/#/c/1526/8//COMMIT_MSG [6] https://blog.cr.yp.to/20140205-entropy.html [7] https://en.wikipedia.org/wiki/Bullrun_%28decryption_program%29 [8] https://freedom-to-tinker.com/2013/09/20/software-transparency-debian-openss... [9] https://en.wikipedia.org/wiki/RdRand#Reception
Hey RS,
thanks for your excellent input on P/RNG!
There's a lot in it, let me first off sprinkle a few remarks...
On Thu, Oct 05, 2017 at 09:56:51AM +0000, ringsignature@riseup.net wrote:
system RNG is properly seeded at boot, getrandom() should not block at any later point in the system's lifetime.
What about an attacker firing endless events that cause us to generate new TMSIs and ends up exhausting the entropy pool? AFAIK that's a real possibility?
As an aside: If for some reason there is no cryptographically secure hardware RNG on the OpenBSC system, one wonders if it might be of interest to use the available RF interfaces as part of a design for such an RNG. There would be concerns about adversarial control of inputs, of course.
It might well be possible to use RF noise as entropy, but the HLR that is generating the RAND bytes is layers away from the BTS and its antenna. If all run on the same small integrated box, somehow getting RF noise from the DSP and feeding entropy to the system which in turn gets used on the other end by the HLR is a possibility, but not really when the HLR is, as by the usual GSM design, in a central location serving numerous cells.
Would be interesting to run a few tests on how quickly we can end up in entropy exhaustion. Using getrandom() always would be the easiest and safest. Saying that we prefer to be quick rather than secure sounds wrong indeed. But if we practically block because of too little entropy, then we annoy lab / testing setups that don't care about security, and we provide a DoS attack vector.
~N
I feel an urge to re-iterate a few points - see below. N.B.: I'm talking about the concrete code available in gerrit.
On 05.10.2017 13:09, Neels Hofmeyr wrote:
On Thu, Oct 05, 2017 at 09:56:51AM +0000, ringsignature@riseup.net wrote:
system RNG is properly seeded at boot, getrandom() should not block at any later point in the system's lifetime.
What about an attacker firing endless events that cause us to generate new TMSIs and ends up exhausting the entropy pool?
It still won't block. See https://gerrit.osmocom.org/#/c/1526/
AFAIK that's a real possibility?
Not sure either way: the attacker would have to deplete entropy faster than the kernel gathers it.
Also, it's irrelevant for RAND_bytes() vs getrandom() discussion: both use the same entropy pool, both will generate "not good enough" random bytes if out of entropy.
So the context for this part of the discussion is in: https://gerrit.osmocom.org/#/c/3819/ https://gerrit.osmocom.org/#/c/3820/ https://gerrit.osmocom.org/#/c/3821/ Meaning: what do we do if we don't get "random enough" data?
So far we've just logged it and carried on with insecure rand(). Could/should we do better? Should we fail? Should we let user decide via config option?
Would be interesting to run a few tests on how quickly we can end up in entropy exhaustion.
I think it's more of an academic exercise at this point: even if our tests would show that we can't deplete it, it doesn't mean that attacker couldn't come up with better ways. So we should decide what should be done when this happens. So far the decision was to "log and forget". Should we change it?
But if we practically block because of too little entropy
No, we don't. See https://gerrit.osmocom.org/#/c/1526/
The patch was initiated by the need to fix licensing issue, so it preserves all the properties of original code: - it does not block - it uses insecure random
Since we're touching this part of the code anyway, we might also change the way we treat entropy depletion.
But, it's somewhat orthogonal to the use of getrandom(): we can change the way we deal with RAND_bytes() failure in another unrelated patch series.
Using getrandom() is not introducing any problems with the random data which are not already there. It also does not fix any of them. It's good that discussion around it attracted our attention to those problems but I think we should keep in mind that those are related but different issues.
On 2017-10-05 11:37, Max wrote:
I feel an urge to re-iterate few points - see below. N. B: I'm talking about concrete code available in gerrit.
On 05.10.2017 13:09, Neels Hofmeyr wrote:
On Thu, Oct 05, 2017 at 09:56:51AM +0000, ringsignature@riseup.net wrote:
system RNG is properly seeded at boot, getrandom() should not block at any later point in the system's lifetime.
What about an attacker firing endless events that cause us to generate new TMSIs and ends up exhausting the entropy pool?
It still won't block. See https://gerrit.osmocom.org/#/c/1526/
As I understand the getrandom() interface in modern Linux systems - it is documented to block until it is initialized and then never block again - is that an incorrect understanding of that interface?
AFAIK that's a real possibility?
Not sure either way: the attacker should deplete entropy faster than kernel gathers it.
Gathering entropy is orthogonal and also important. The core issue as I understand it is *expanding* available entropy with a PRNG construction from an original seed at boot time. Is an attacker really able to deplete the PRNG's theoretical output limit? If we want to be extremely conservative, let's say the limit is 2^64 outputs; would the system really not receive another 128 bits of entropy in some manner before that output limit is reached? On the network card interrupts and timers from fetching the TMSI alone, I would expect the /dev/random interface to be reseeding the internal PRNG pool. An attacker could easily gather 128 bits of TMSI data, but that expenditure does not directly correspond to the number of input bits at seed time. That's one nice property of the PRNG over a direct RNG interface.
Also, it's irrelevant for RAND_bytes() vs getrandom() discussion: both use the same entropy pool, both will generate "not good enough" random bytes if out of entropy.
Agreed. The main reason to use getrandom() is that it is simpler and ultimately what most projects need to use unless they directly read from a device such as /dev/random, /dev/urandom, or another hardware device of some kind.
So the context for this part of the discussion is in: https://gerrit.osmocom.org/#/c/3819/ https://gerrit.osmocom.org/#/c/3820/ https://gerrit.osmocom.org/#/c/3821/ Meaning: what do we do if we don't get "random enough" data?
So far we've just logged it and carried on with insecure rand(). Could/should we do better? Should we fail? Should we let user decide via config option?
Yes, I think getrandom() is a better default and in fact, the only safe interface. I suggest failing the build absent a getrandom() system call/glibc interface. Additionally, it would be good to ensure that any system running OpenBSC has some source of entropy beyond interrupts and timing - is that already the case?
Would be interesting to run a few tests on how quickly we can end up in entropy exhaustion.
I think it's more of an academic exercise at this point: even if our tests would show that we can't deplete it, it doesn't mean that attacker couldn't come up with better ways. So we should decide what should be done when this happens. So far the decision was to "log and forget". Should we change it?
It would be good to know the theoretical limits of /dev/urandom from a given random seed absent any other influences. I was not able to find a clear explanation and used a simple rule of thumb to come up with 2^64 (n bits / 2) outputs. It seems reasonable to log a low entropy situation - but what exactly are the conditions for that situation?
But if we practically block because of too little entropy
No, we don't. See https://gerrit.osmocom.org/#/c/1526/
The patch was initiated by the need to fix licensing issue, so it preserves all the properties of original code:
- it does not block
 - it uses insecure random
 
Understood.
Since we're touching this part of the code anyway, we might also change the way we treat entropy depletion.
Is there a system wide entropy depletion monitor in place?
But, it's somewhat orthogonal to the use of getrandom(): we can change the way we deal with RAND_bytes() failure in another unrelated patch series.
Using getrandom() is not introducing any problems with the random data which are not already there. It also does not fix any of them. It's good that discussion around it attracted our attention to those problems but I think we should keep in mind that those are related but different issues.
Using getrandom() is strictly better based on my understanding of the interface.
Happy Hacking, RS
On 05.10.2017 14:40, ringsignature@riseup.net wrote:
As I understand the getrandom() interface in modern Linux systems - it is documented to block until it is initialized and then never block again - is that an incorrect understanding of that interface?
More like incomplete. We use it with the GRND_NONBLOCK parameter to make sure it never blocks.
Additionally, it would be good to ensure that any system running OpenBSC has some source of entropy beyond interrupts and timing - is that already the case?
Out of curiosity - is there a way to check for this programmatically?
Is there a system wide entropy depletion monitor in place?
Not that I know of. Is it some sort of a program or some kernel sysctl knob?
On 2017-10-05 13:17, Max wrote:
On 05.10.2017 14:40, ringsignature@riseup.net wrote:
As I understand the getrandom() interface in modern Linux systems - it is documented to block until it is initialized and then never block again - is that an incorrect understanding of that interface?
More like incomplete. We use it with the GRND_NONBLOCK parameter to make sure it never blocks.
OK. If it is called at startup a single time without that parameter, all subsequent calls should be completely non-blocking and the interface promises to provide CSPRNG quality random bits. This is strictly an improvement over rand() and when getrandom() isn't available on a system, extra work will be required. Are there any such systems?
Additionally, it would be good to ensure that any system running OpenBSC has some source of entropy beyond interrupts and timing - is that already the case?
Out of curiosity - is there a way to check for this programmatically?
It is possible to check for the presence of some devices by checking CPU flags (RDRAND, etc).
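For example, on x86 with gcc/clang, something like this (illustrative only, not Osmocom code) reports whether the CPU advertises RDRAND:

#include <cpuid.h>
#include <stdio.h>

int main(void)
{
    unsigned int eax, ebx, ecx, edx;

    if (__get_cpuid(1, &eax, &ebx, &ecx, &edx) && ((ecx >> 30) & 1))
        printf("RDRAND present\n");   /* CPUID.01H:ECX bit 30 */
    else
        printf("RDRAND absent or CPUID unavailable\n");
    return 0;
}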
In theory, a call to getrandom() with no flags set should also tell you whether the system itself thinks it has entropy. If the call doesn't block, there is something generating data that the system considers to be entropy.
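A tiny, untested probe along those lines, using the raw syscall as in the small benchmark further down the thread (names and error handling are mine):

#define _GNU_SOURCE
#include <errno.h>
#include <stdio.h>
#include <sys/syscall.h>
#include <unistd.h>

#ifndef GRND_NONBLOCK
#define GRND_NONBLOCK 0x0001
#endif

int main(void)
{
    unsigned char b;
    long rc = syscall(SYS_getrandom, &b, sizeof(b), GRND_NONBLOCK);

    if (rc == 1)
        printf("pool initialized\n");
    else if (rc == -1 && errno == EAGAIN)
        printf("pool NOT yet initialized\n");
    else
        printf("getrandom unavailable or failed (errno=%d)\n", errno); /* e.g. ENOSYS before Linux 3.17 */
    return 0;
}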
Is there a system wide entropy depletion monitor in place?
Not that I know of. Is it some sort of a program or some kernel sysctl knob?
There is an entry in /proc which provides an estimate: /proc/sys/kernel/random/entropy_avail
I wrote a small benchmark in C to use getrandom() to fetch random bytes using the SYS_getrandom system call. The entropy_avail value fluctuated between 3820 and 3830 consistently. It doesn't seem that fetching many hundreds of megabytes of random data impacted the system's PRNG entropy pool estimation. This seems to be what is expected: the original entropy is expanded by the PRNG and reading the output of the PRNG does not exhaust the pool with a 1:1 ratio.
Happy Hacking, RS
Hi RS,
On Thu, Oct 05, 2017 at 12:40:11PM +0000, ringsignature@riseup.net wrote:
Yes, I think getrandom() is a better default and in fact, the only safe interface. I suggest failing the build absent a getrandom() system call/glibc interface. Additionally, it would be good to ensure that any system running OpenBSC has some source of entropy beyond interrupts and timing - is that already the case?
We of course have no idea on what systems people are using the related osmocom components on (such as OsmoNITB, OsmoMSC, OsmoSGSN). For some of the smaller / deeper embedded devices (like e.g. the sysmoBTS 1002) for sure there is no hardware random number generator and interrupts are the only source of randomness.
However, in most realistic scenarios you would have more than one BTS and run the NITB/MSC/SGSN on some kind of (embedded?) x86 or ARM board, and most systems have had hardware random number generators for quite a long time. Yes, the question is whether you trust those, but that's completely off-topic here in this thread.
Regards, Harald
Hello,
On 2017-10-06 01:03, Harald Welte wrote:
Hi RS,
On Thu, Oct 05, 2017 at 12:40:11PM +0000, ringsignature@riseup.net wrote:
Yes, I think getrandom() is a better default and in fact, the only safe interface. I suggest failing the build absent a getrandom() system call/glibc interface. Additionally, it would be good to ensure that any system running OpenBSC has some source of entropy beyond interrupts and timing - is that already the case?
We of course have no idea on what systems people are using the related osmocom components on (such as OsmoNITB, OsmoMSC, OsmoSGSN). For some of the smaller / deeper embedded devices (like e.g. the sysmoBTS 1002) for sure there is no hardware random number generator and interrupts are the only source of randomness.
Might those devices be interesting as a research target for generating entropy from a radio interface? The specification suggests that the device does indeed have a radio interface. If so, perhaps it would be a useful experiment for someone to attempt to create an OsmoEntropy subproject? I would be interested in undertaking such a project, if it would be useful and especially if it would be used.
However, in most realistic scenarios you would have more than one BTS and run the NITB/MSC/SGSN on some kind of (embedded?) x86 or ARM board, and most systems have had hardware random number generators for quite a long time. Yes, the question is whether you trust those, but that's completely off-topic here in this thread.
Understood.
Happy Hacking, RS
Hi RS,
On Thu, Oct 05, 2017 at 11:16:45PM -0700, ringsignature@riseup.net wrote:
Might those devices be interesting as a research target for generating entropy from a radio interface? The specification suggests that the device does indeed have a radio interface. If so, perhaps it would be a useful experiment for someone to attempt to create an OsmoEntropy subproject?
Technically, it would of course make sense, and conceptually it's a great idea. However, specifically on the sysmoBTS (as some other devices we support), the proprietary PHY (and hence any part that directly obtains baseband samples) runs on a separate DSP core.
On osmo-bts using that PHY we only have access to figures like BER, RSSI, clock drift, burst arrival timing, ... It might be possible to use some of those for generating entropy, too - if proper care is taken to avoid situations where all of those parameters are controlled by an attacker, of course.
For devices using osmo-trx (the SDR based implementation of a GSM PHY), the situation is different: OsmoTRX receives the baseband samples and performs the radiomodem function on them. osmo-bts-trx then performs the burst demodulation/decoding. One could hence possibly add some module to either of the two (probably osmo-trx).
However, the much higher CPU requirements of an osmo-trx + osmo-bts-trx setup require a larger/higher-end system (like an embedded PC) to run the related code, and hence the probability of having a hardware randomness source is much higher than on the deeply embedded sysmoBTS or osmo-bts-octphy / osmo-bts-litecell15 devices, which all run a proprietary PHY in a DSP.
I would be interested in undertaking such a project, if it would be useful and especially if it would be used.
I think if it existed for osmo-bts-sysmo, we'd for sure use it. I'm still not sure if it's really worth the effort, given that most non-trivial setups typically have an external computer as BSC/NITB anyway, as stated below:
However, in most realistic scenarios you would have more than one BTS and run the NITB/MSC/SGSN on some kind of (embedded?) x86 or ARM board, and most systems have had hardware random number generators for quite a long time.
Regards, Harald
On 2017-10-05 11:09, Neels Hofmeyr wrote:
Hey RS,
thanks for your excellent input on P/RNG!
I'm glad it may be useful.
There's a lot in it, let me first off sprinkle a few remarks...
On Thu, Oct 05, 2017 at 09:56:51AM +0000, ringsignature@riseup.net wrote:
system RNG is properly seeded at boot, getrandom() should not block at any later point in the system's lifetime.
What about an attacker firing endless events that cause us to generate new TMSIs and ends up exhausting the entropy pool? AFAIK that's a real possibility?
Could you quantify that? What is the process by which an attacker would be able to cause new TMSI generation? Does it require interactivity from them or can they simply flood OpenBSC with a packet that triggers the creation of a TMSI? If such a flood is possible, that suggests a security problem at the least and a possible entropy problem.
On the topic of exhausting the entropy pool: as long as a modern GNU/Linux system is properly seeded at boot, /dev/urandom is guaranteed to not block and to continue to emit cryptographically secure random numbers. The entropy post by djb discusses the trade offs of seed-once versus seed-many-times.
I think it is reasonable to seed-once at system boot and to simply always use getrandom() with the assumption that you cannot exhaust the entropy pool. The Linux kernel version 3.16 appears to be when the kernel greatly improved with regard to the random interface. I think on many modern GNU/Linux systems (2017-03-13 RANDOM(4) says so), getrandom() will block until it is initialized:
The /dev/random interface is considered a legacy interface, and /dev/urandom is preferred and sufficient in all use cases, with the exception of applications which require randomness during early boot time; for these applications, getrandom(2) must be used instead, because it will block until the entropy pool is initialized.
In fact, internally in the kernel, it was once and should still be the case that the entropy pool is constantly updated with timer and interrupt information, in addition to other sources. In a sense, there is no seed-once option - only seed once with a CSPRNG source, then seed many times with other sources, including possible CSPRNG sources. Some of those inputs may be controlled or influenced by an attacker, but such an attack would be against the entire /dev/random interface. That is definitely a good research project and I'm sure Professor Heninger is already working on it.

With regard to the entire PRNG being broken: this is also why I suggested hashing or transforming the output bytes, just in case something is broken, as a defense in depth strategy. If an attacker can break the PRNG and also predict the preimage of H(RandomBytes), then they probably don't need the RandomBytes in the first place and the RNG is almost certainly beyond salvation. This would have stopped the *practical* exploitation of Dual EC in TLS, but it almost certainly wouldn't stop another Debian OpenSSL fiasco.
If one is not convinced by the notion that one 128bit or 256bit seed at boot is good for say, 2^64 outputs even with these internal reseeding operations, and you expect to use that many outputs, reseeding could be done with a userspace rng daemon to hardware bridge like rng-tools.
There are a few free and open hardware designs such as the Gnuk (in the NeuG configuration) that can be re-purposed as a hardware RNG. I recall that the device specifically uses a trick with an ADC that can fail very badly. It seems to me that a properly designed RNG that is cheap to build and easy to buy would be very nice as a project too.
Now down to the brass tacks:
How much random data does OpenBSC currently use in a given period of time, say one day? Is it really larger than the theoretical limits of the outputs for /dev/urandom?
What are the system requirements for OpenBSC with regard to a hardware rng or manual seeding?
Does OpenBSC only run on Linux or BSD systems with the getrandom() interface?
Does OpenBSC store random seed data between boots?
Could someone share the value of /proc/sys/kernel/random/entropy_avail from a typical OpenBSC machine?
As an aside: If for some reason there is no cryptographically secure hardware RNG on the OpenBSC system, one wonders if it might be of interest to use the available RF interfaces as part of a design for such an RNG. There would be concerns about adversarial control of inputs, of course.
It might well be possible to use RF noise as entropy, but the HLR that is generating the RAND bytes is layers away from the BTS and its antenna. If all run on the same small integrated box, somehow getting RF noise from the DSP and feeding entropy to the system which in turn gets used on the other end by the HLR is a possibility, but not really when the HLR is, as by the usual GSM design, in a central location serving numerous cells.
Understood.
As a slight digression: There was once some kind of Ubuntu cloud project [1] called Pollinate to solve this for virtual machines and other systems with no direct hardware entropy source. While that seems excessive for this situation, it's a reasonably good overview of how another project shares entropy over the network. Over HTTP there are raw bytes on the network, while over HTTPS it has the problem of needing entropy to start a secure TLS session. One option might be to ensure that the devices are seeded at install time, another might be to extend the already existing protocols to share entropy data from a system which has some, and another might just be to raise the default system requirements to include one or two RNG devices. There are many strategies that seem promising and all of them require around 128 or 256 bits of entropy at boot time, every boot.
Would be interesting to run a few tests on how quickly we can end up in entropy exhaustion. Using getrandom() always would be the easiest and safest. Saying that we prefer to be quick rather than secure sounds wrong indeed. But if we practically block because of too little entropy, then we annoy lab / testing setups that don't care about security, and we provide a DoS attack vector.
It would be surprising if one was able to exhaust it. I would expect an RF denial of service for local clients before I would expect the Linux /dev/urandom PRNG to output predictable sequences based on asking for random TMSI data.
Happy Hacking, RS
[0] https://www.gniibe.org/memo/development/gnuk/rng/neug.html [1] http://people.canonical.com/~kirkland/Random%20Seeds%20in%20Ubuntu%2014.04%2... [2] https://launchpad.net/pollen
Hi RS,
On Thu, Oct 05, 2017 at 05:20:21AM -0700, ringsignature@riseup.net wrote:
On 2017-10-05 11:09, Neels Hofmeyr wrote:
What about an attacker firing endless events that cause us to generate new TMSIs and ends up exhausting the entropy pool? AFAIK that's a real possibility?
Could you quantify that? What is the process by which an attacker would be able to cause new TMSI generation? Does it require interactivity from them or can they simply flood OpenBSC with a packet that triggers a the creation of a TMSI? If such a flood is possible, that suggests a security problem at the least and a possible entropy problem.
Well, it probably depends on the interface the attacker has access to (Radio/Backhaul/...) and hence the threat model.
A TMSI is allocated at many possible transactions, but all of them [should, in a sane implementation!] require authentication first, so it requires interactivity and is not just a simple flood.
How much random data does OpenBSC currently use in a given period of time, say one day?
I think for all practical installations outside of an attack scenario that we know of, it's very little.
What are the system requirements for OpenBSC with regard to a hardware rng or manual seeding?
We don't publish hardware requirements.
Does OpenBSC only run on Linux or BSD systems with the getrandom() interface?
We don't officially support or build test on anything but Ubuntu >= 16.04, Debian >= 8 and FreeBSD >= 10.3. We of course don't know what people decide to run the code on, in the end.
Does OpenBSC store random seed data between boots?
Not our code. That's a task of the OS, no?
Regards, Harald
Hello,
On 2017-10-06 01:11, Harald Welte wrote:
Hi RS,
On Thu, Oct 05, 2017 at 05:20:21AM -0700, ringsignature@riseup.net wrote:
On 2017-10-05 11:09, Neels Hofmeyr wrote:
What about an attacker firing endless events that cause us to generate new TMSIs and ends up exhausting the entropy pool? AFAIK that's a real possibility?
Could you quantify that? What is the process by which an attacker would be able to cause new TMSI generation? Does it require interactivity from them or can they simply flood OpenBSC with a packet that triggers a the creation of a TMSI? If such a flood is possible, that suggests a security problem at the least and a possible entropy problem.
Well, it probably depends on the interface the attacker has access to (Radio/Backhaul/...) and hence the threat model.
A TMSI is allocated at many possible transactions, but all of them [should, in a sane implementation!] require authentication first, so it requires interactivity and is not just a simple flood.
That is reasonable. I wonder how much entropy that entire process, start to finish, uses in any given authentication? That's, I suppose, a GSM (and other related protocols) research question rather than an OpenBSC specific question.
How much random data does OpenBSC currently use in a given period of time, say one day?
I think for all practical installations outside of an attack scenario that we know of, it's very little.
That is what I'd expect - I'd also expect in the cases where it is used, it is probably a contributory process with inputs from all communicating peers.
What are the system requirements for OpenBSC with regard to a hardware rng or manual seeding?
We don't publish hardware requirements.
If I may humbly suggest, it may be worth documenting that it is the user's responsibility to ensure a good RNG source for the Linux kernel. There are probably a few situations where entropy failure is catastrophic. The generation of K_i seems to be an obvious failure, though there are probably other interesting failures where a badly behaving handset might be able to force the network side to compute over a value that is unsafe in some manner. Again, some research which is out of scope but hopefully helpful to someone reading along...
Does OpenBSC only run on Linux or BSD systems with the getrandom() interface?
We don't officially support or build test on anything but Ubuntu >= 16.04, Debian >= 8 and FreeBSD >= 10.3. We of course don't know what people decide to run the code on, in the end.
There is a python issue [0] which explains in detail many choices that the python project has made, and why, with regard to supporting os.urandom. There is also an excellent "Entropy-supplying system calls" overview page on Wikipedia [1] that is worth reviewing.
On my Debian stable machine, my glibc does not appear to have getrandom() while my kernel does have the syscall for getrandom(). On FreeBSD to test if the machine is properly seeded one may check the kern.random.sys.seeded sysctl and then subsequent calls to [2] read_random() or read_random_uio() [3] may be functionally similar. FreeBSD appears to define those functions such that if there isn't a DEV_RANDOM, the functions simply return 0. It appears OpenBSD supplies a getentropy() function on their system.
getentropy() is documented on Debian as a wrapper around getrandom():
DESCRIPTION
    The getentropy() function writes length bytes of high-quality random data to the buffer starting at the location pointed to by buffer. The maximum permitted value for the length argument is 256.
A successful call to getentropy() always returns the requested number of bytes of entropy.
It may make sense, as a matter of portability, to first look for the getrandom() syscall, then the libc getrandom() wrapper, then getentropy(), and finally fail the compile, in that order? I'm not sure, though whatever is selected should be logged.
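A rough sketch of that fallback chain; the HAVE_* macros are hypothetical configure-time results (not existing Osmocom or glibc symbols) and the order follows the suggestion above:

#include <stddef.h>
#include <unistd.h>

#if defined(HAVE_GETRANDOM_SYSCALL)
#include <sys/syscall.h>
static long fill_random(void *buf, size_t len)
{
    /* raw syscall, available since Linux 3.17 even without libc support */
    return syscall(SYS_getrandom, buf, len, 0) == (long)len ? 0 : -1;
}
#elif defined(HAVE_GETRANDOM)
#include <sys/random.h>
static long fill_random(void *buf, size_t len)
{
    /* glibc >= 2.25 wrapper */
    return getrandom(buf, len, 0) == (ssize_t)len ? 0 : -1;
}
#elif defined(HAVE_GETENTROPY)
static long fill_random(void *buf, size_t len)
{
    /* OpenBSD-style interface; len must be <= 256 */
    return getentropy(buf, len);
}
#else
#error "No secure random interface found - refusing to fall back to rand()"
#endif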
Does OpenBSC store random seed data between boots?
Not our code. That's a task of the OS, no?
I know that modern systems are supposed to handle this functionality. It should be a task taken care of by the OS, of course. I believe both officially supported platforms do handle this exact issue which is reasonable.
Happy Hacking, RS
[0] https://bugs.python.org/issue27266 [1] https://en.wikipedia.org/wiki/Entropy-supplying_system_calls [2] https://svnweb.freebsd.org/base?view=revision&revision=286839 [3] https://svnweb.freebsd.org/base/head/sys/sys/random.h?view=markup&pathre...
Hello,
Would be interesting to run a few tests on how quickly we can end up in entropy exhaustion. Using getrandom() always would be the easiest and safest. Saying that we prefer to be quick rather than secure sounds wrong indeed. But if we practically block because of too little entropy, then we annoy lab / testing setups that don't care about security, and we provide a DoS attack vector.
I conducted some simple experiments with a small C program running on a quad core Intel(R) Core(TM) i5-2520M CPU. The program is in the body of this email. It utilized only a single core at full capacity. The test runs syscall(SYS_getrandom, &buf, bufsize, flags) in a while(1) loop. The bufsize is 512 (bytes) for each call to SYS_getrandom. The program measures the system entropy as SYS_getrandom is repeatedly called. As the program uses syscall() without #defines for getrandom(), I #defined the flags manually. The name GRND_BLOCKONCE seemed appropriate; though non-standard, it meaningfully describes the defined behavior. The two flags I tested were:
#define GRND_BLOCKONCE 0
#define GRND_RANDOM 0x0002
The first method of calling allowed the syscall to block until the system entropy pool is initialized. It did not block on my machine, as the entropy pool was initialized around two weeks ago at the last system boot. During a ~fifteen minute run with 412100000 iterations fetching 512 bytes per iteration, the entropy pool increased from ~3700 to ~3867. In another run over ~eleven minutes with 277200000 iterations, I found that the pool went from ~3920 to ~3727. During the second run, I suspect other software on my machine used the /dev/random device directly. In both cases, the output matches my expectation of a very large ratio of PRNG output bits to estimated entropy bits consumed. The core of the iteration was effectively this single line:
actual_bytes_fetched = syscall(SYS_getrandom, &buf, bufsize, GRND_BLOCKONCE)
The value of actual_bytes_fetched was always equal to the size of bufsize. It never underflowed. With GRND_BLOCKONCE, any concerns about intermittent blocking or denial of service vectors through getrandom() should be mitigated. GRND_BLOCKONCE seems ideal for TMSI generation even if an adversary were requesting (tens of) thousands of TMSIs a second.
The second method of calling getrandom() was configured to use the actual bits of the random device pool as clarified in the documentation for getrandom():
GRND_RANDOM If this bit is set, then random bytes are drawn from the random source (i.e., the same source as the /dev/random device) instead of the urandom source. The random source is limited based on the entropy that can be obtained from environmental noise. If the number of available bytes in the random source is less than requested in buflen, the call returns just the available random bytes. If no random bytes are available, the behavior depends on the presence of GRND_NONBLOCK in the flags argument.
The core of the iteration was the same with only a different flag set:
actual_bytes_fetched = syscall(SYS_getrandom, &buf, bufsize, GRND_RANDOM)
The value of actual_bytes_fetched varied at every iteration and did often underflow. The ratio of entropy pool bits to output bits appears to be closer to one to one. It does not seem ideal to use GRND_RANDOM for TMSI generation, especially if a possible adversary is the requesting party - they would seem to be able to drain the device entropy pool very quickly. If this mode was used and it is configured to be non-blocking, it seems that it could fail very badly.
After running the above tests and reading the related documentation, my conclusion is that it would be reasonable to use syscall(SYS_getrandom, &buf, bufsize, 0) as a suitable replacement for rand() in all cases, without any concrete security or performance concerns. The overhead is also significantly less than importing or otherwise depending on OpenSSL, GnuTLS, NaCl, or probably even glibc. It may make sense to use the platform's libc interface if it is available. It may also be worthwhile to try to ensure that the buffer is indeed changed. The small program below could also easily be modified to test that the buffer is indeed completely filled with some new data and to additionally hash the buffer before use in any cryptographic application.
Happy Hacking, RS
/*
 * This program is Free Software under the GPLv3.
 * Copyleft ringsignature 2017.
 *
 * Compile with:
 * gcc -Wall -std=c11 -o getrandom-exhaust getrandom-exhaust.c
 */

#define _GNU_SOURCE 1
#include <sys/types.h>
#include <sys/syscall.h>
#include <unistd.h>
#include <stdio.h>
#include <string.h>

#define GRND_BLOCKONCE 0x0000
#define GRND_NONBLOCK 0x0001
#define GRND_RANDOM 0x0002

/* Print a buffer as hex. */
void pp(unsigned char *buf, uint s)
{
    uint z = 0;
    for (z = 0; z < s; z++) {
        printf("%x", buf[z]);
    }
    printf("\n");
}

/* Fill a buffer with a known pattern. */
void ms(unsigned char *buf, uint s, unsigned char f)
{
    memset(buf, f, s);
}

/* Read the kernel's entropy estimate from /proc. */
uint ea(void)
{
    FILE *f;
    uint i = 0;
    uint val = 0;
    f = fopen("/proc/sys/kernel/random/entropy_avail", "r");
    if (f == NULL) {
        return -1;
    }
    i = fscanf(f, "%u", &val);
    if (i == EOF) {
        fclose(f);
        return -1;
    }
    fclose(f);
    return val;
}

/* Print the current entropy estimate. */
void ppea(void)
{
    uint e = -1;
    e = ea();
    if (e != -1) {
        printf("Current entropy available: %u\n", e);
    } else {
        fprintf(stderr, "Error fetching currently available entropy!\n");
    }
}

/* Clear the terminal and print the current status. */
void pps(uint b, uint c)
{
    printf("\e[1;1H\e[2J");
    printf("getrandom status - byte size requested: %i iteration: %i\n", b, c);
    ppea();
}

int main(void)
{
    uint bufsize = 512;
    unsigned char buf[bufsize];
    int actual_bytes = 0;
    int count = 0;
    int es = ea();
    ms(buf, bufsize, 0x0);
    pps(actual_bytes, count);
    while (count != -1) {
        ms(buf, bufsize, 0x42);
        actual_bytes = syscall(SYS_getrandom, &buf, bufsize, GRND_BLOCKONCE);
        if (actual_bytes != bufsize) {
            pps(actual_bytes, count);
            printf("getrandom underflow!\n");
            pp(buf, bufsize);
        }
        count++;
        if ((count % 100000) == 0) {
            pps(actual_bytes, count);
            ms(buf, bufsize, 0x47);
        }
    }
    pps(actual_bytes, count);
    printf("Original entropy estimate: %i\n", es);
    return 0;
}
Hi RS,
After running the above tests and reading the related documentation, my conclusion is that it would be reasonable to use syscall(SYS_getrandom, &buf, bufsize, 0) as a suitable replacement for rand() in all cases without any concrete security or performance concerns. The overhead is also significantly less than importing or otherwise depending on OpenSSL, GnuTLS, NaCL, or probably even glibc.
Thanks again for your investigation.
So the conclusion is probably, for now:
* use getrandom() with zero flags for TMSIs and other random identifiers
* use getrandom() with zero flags for RAND challenges
* use getrandom() with GRND_RANDOM flag for K/OP/OPc/Ki generation
It may make sense to use the platform's libc interface if it is available.
I would go for that, and I believe the current patch under discussion is doing exactly that: use getrandom() if available, and fall back to the raw syscall, if not.
Further falling back on rand() is probably a bad idea; we could simply make this a fatal runtime error ending the program and see if anyone ever reports that error to the mailing list. If at all, the fall-back should not happen automatically, but should be explicitly requested at compile-time ("--enable-unsafe-rand"), or depend on an environment variable, or the like. Basically to make sure the default configuration will be safe, and any unsafe operation requires explicit user intervention.
It may also be worthwhile to try to ensure that buffer is indeed changed.
good point, I like that.
The small program below could also easily be modified to test that the buffer is indeed completely filled with some new data and to additionally hash the buffer before use in any cryptographic application.
Yes, we could improve on that by using some hashing function on the result, rather than using the (cs)prng output directly. But let's keep that as a possible future improvement for now. Out of curiosity: What would you recommend?
Regards, Harald
On 2017-10-06 01:23, Harald Welte wrote:
Hi RS,
After running the above tests and reading the related documentation, my conclusion is that it would be reasonable to use syscall(SYS_getrandom, &buf, bufsize, 0) as a suitable replacement for rand() in all cases without any concrete security or performance concerns. The overhead is also significantly less than importing or otherwise depending on OpenSSL, GnuTLS, NaCL, or probably even glibc.
Thanks again for your investigation.
You're welcome. Thanks for writing Free Software! I really appreciate that your project is open to feedback from complete strangers!
My program is not optimized in any sense, and I re-ran it from 0 to -1 iterations (i.e. until the int counter wrapped around to -1) with the following outcome:
time ./getrandom-exhaust
getrandom status - byte size requested: 512 iteration: -1
Current entropy available: 2575
Original entropy estimate: 3338

real    170m44.814s
user    3m50.836s
sys     166m52.948s
I think it would be interesting to run this on any supported system and see if the output is similar. All of the conclusions that I've reached are based on the notion that any decreases to the entropy pool are negligible, and that the entropy pool might even increase during the attempted exhaustion process, which is probably an unrelated system side effect. In either case, it will never underflow and it will always return the requested number of bytes.
So the conclusion is probably, for now:
- use getrandom() with zero flags for TMSIs and other random identifiers
 - use getrandom() with zero flags for RAND challenges
 - use getrandom() with GRND_RANDOM flag for K/OP/OPc/Ki generation
 
I don't entirely disagree, however the third option creates a more complex case that I'm not sure is worth the trouble. I would very strongly resist the urge to believe that the raw device bits are _better_ than the output of the PRNG. The GRND_RANDOM flag will result in underflows of the entire system's entropy pool, which especially on an embedded system is a halt-and-catch-fire situation. The complexity around that failure mode might cause failures that are otherwise avoided entirely.
It may make sense to use the platform's libc interface if it is available.
I would go for that, and I believe the current patch under discussion is doing exactly that: use getrandom() if available, and fall back to the raw syscall, if not.
I'd suggest someone examine the failure modes for glibc before making this decision. I expect it to be a completely sane interface which fails nicely - though, if it doesn't, we need to consider how it fails. Apparently it was a "long road" [0] [1] to supporting getrandom() in glibc. Version 2.25 appears to be the first release to support it. Probably the most interesting thing of note for that version is "* The getentropy and getrandom functions, and the <sys/random.h> header file have been added." This almost suggests to me that getentropy() and a maximum buffer of 256 bytes is a reasonable interface to settle on.
As further reading - the people implementing Rust's random [2] have the following view of the matter:
Unix-like systems (Linux, Android, Mac OSX): read directly from /dev/urandom, or from getrandom(2) system call if available.
OpenBSD: calls getentropy(2)
FreeBSD: uses the kern.arandom sysctl(2) mib
Windows: calls RtlGenRandom, exported from advapi32.dll as SystemFunction036.
iOS: calls SecRandomCopyBytes as /dev/(u)random is sandboxed.
PNaCl: calls into the nacl-irt-random-0.1 IRT interface.
Further falling back on rand() is probably a bad idea, we could simply make this a fatal runtime error ending the program and see if anyone ever reports that error to the mailing list. If at all, the fall-back should not happen automatically, but should be explicitly requested at compile-time ("--enable-unsafe-rand"), or depend on an environment variable, or the like. Basically to make sure the default configuration will be safe, and any unsafe operation requires explicit user intervention.
It may also be worthwhile to try to ensure that buffer is indeed changed.
good point, I like that.
I'm not entirely sure of the best method for checking but I would expect something like this to at least fail safely in the event of a kernel failure:
- allocate a fixed size buffer of 256 bytes which seems to be the most portable size among all the OS provided interfaces
- memset the buffer with a known value such as #define THEMEASUREOFABYTE 0x47
- memcmp each byte of the buffer with THEMEASUREOFABYTE to ensure that each byte compared is the same
- request that the OS or glibc or libc fills that same buffer with random bytes for the size of the buffer
- ensure that the number of bytes written is equal to the requested size
- memcmp each byte of the buffer with THEMEASUREOFABYTE to ensure that each byte compared is *not* the same
- if anything has gone wrong, return -1 and 0 (is this correct?)
- in one possible future, hash the buffer in place to conceal the raw outputs from the PRNG
- return the number of bytes written to the buffer, return the buffer full of random bytes
That may make a better unit test than an actual wrapper function. It seems that for portability, it's worth considering such a wrapper as it doesn't appear that every API has the same failure mode. There is of course the unfortunate side effect of removing all 0x47s from your buffer with such an approach, which leads to a suggestion of generating a single random byte, which well, it's 🐢🐢🐢 all the way down... :-)
Such an implementation should be constant time and take into consideration other things that might leak the random data. I'm not sure if memcmp is ideal for that reason. I'll do some additional research/experimentation and share the results.
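For illustration, an untested sketch of such a wrapper along the lines of the list above; checked_getrandom and RAND_BUF_MAX are names I made up, and rather than rejecting every 0x47 byte it only rejects a buffer that is still entirely the fill pattern, to sidestep the "no 0x47s left" side effect mentioned:

#define _GNU_SOURCE
#include <stdint.h>
#include <string.h>
#include <sys/syscall.h>
#include <unistd.h>

#define THEMEASUREOFABYTE 0x47
#define RAND_BUF_MAX 256 /* portable upper bound across the interfaces above */

/* Fills buf with len (<= 256) random bytes; returns len on success, -1 on failure. */
static ssize_t checked_getrandom(uint8_t *buf, size_t len)
{
    uint8_t fill[RAND_BUF_MAX];
    long rc;

    if (len == 0 || len > RAND_BUF_MAX)
        return -1;
    memset(fill, THEMEASUREOFABYTE, sizeof(fill));
    memset(buf, THEMEASUREOFABYTE, len);
    if (memcmp(buf, fill, len) != 0)
        return -1;                      /* sanity check on the fill itself */
    rc = syscall(SYS_getrandom, buf, len, 0);
    if (rc != (long)len)
        return -1;                      /* short read or error, errno is set */
    if (memcmp(buf, fill, len) == 0) {  /* the buffer must have changed */
        memset(buf, 0, len);
        return -1;
    }
    /* a future version could hash buf in place here before returning */
    return (ssize_t)len;
}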
The small program below could also easily be modified to test that the buffer is indeed completely filled with some new data and to additionally hash the buffer before use in any cryptographic application.
Yes, we could improve on that by using some hashing function on the result, rather than using the (cs)prng output directly. But let's keep that as a possible future improvement for now. Out of curiosity: What would you recommend?
For a 512 *bit* buffer, I think hashing with SHA512 would be a fine transformation. If the attacker can guess the input for an output of SHA512, there is a serious problem. In which case, the hashing might mitigate practical, casual exploitation of the problem. It might not, of course.
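For what it's worth, the transformation itself is tiny with the NaCl API mentioned earlier (crypto_hash there is SHA-512); whiten64 is just an illustrative name, not anything from Osmocom:

#include <sodium.h>

/* Turn 64 raw PRNG bytes into 64 output bytes without exposing the raw PRNG output. */
static void whiten64(unsigned char out[crypto_hash_BYTES],
                     const unsigned char raw[crypto_hash_BYTES])
{
    crypto_hash(out, raw, crypto_hash_BYTES); /* out = SHA-512(raw), 64 bytes */
}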
For buffers that are 256 or 512 bytes, I think it could be hashed in two or four chunks, but someone smarter than me should think about that strategy before drawing a final conclusion. There are strategies using block ciphers rather than hashing functions that might make sense for larger buffers. Such a block cipher would essentially become a user space RNG of a similar design to the one from djb that I shared earlier in the thread. Once we're in block cipher territory, we're using additional memory, we increase complexity, we may also have concerns about internal key schedules, side channels, and so on. I suppose this is another open research question for me. I'm sure someone else knows an answer off hand but I'm not confident that I know how to do it correctly.
Happy Hacking, RS
[0] https://lwn.net/Articles/711013/ [1] https://sourceware.org/bugzilla/show_bug.cgi?id=17252 [2] https://doc.rust-lang.org/rand/rand/os/struct.OsRng.html
On 06.10.2017 03:23, Harald Welte wrote:
So the conclusion is probably, for now:
- use getrandom() with zero flags for TMSIs and other random identifiers
 - use getrandom() with zero flags for RAND challenges
 
I don't think it's a good idea. It's fine for exploratory programming while experimenting, but in a library meant for production use, behavior should be as predictable as possible.
Using "zero flags" means that the function might or might not block, which pretty much guarantees headaches later on when troubleshooting the code which will use it.
I think we should always opt for "least surprise" path and use GRND_NONBLOCK (as in current patch). That way we'll never block and let the caller handle errors (if any).
- use getrandom() with GRND_RANDOM flag for K/OP/OPc/Ki generation
 
I don't have a strong opinion on this one. For GNU/Linux kernel >= 4.8 both /dev/random and /dev/urandom are going through the same CSPRNG so I'm not sure we gain anything by requiring random instead of urandom.
Also, are we talking about utils/osmo-auc-gen or smth else?
I would go for that, and I believe the current patch under discussion is doing exactly that: use getrandom() if available, and fall back to the raw syscall, if not.
Yepp, that's how it's done.
Further falling back on rand() is probably a bad idea
Removed: it was there only because that's how OpenBSC and other apps have always used it. But if we do not plan to use insecure random anymore then it's not needed.
On 2017-10-06 12:50, Max wrote:
On 06.10.2017 03:23, Harald Welte wrote:
So the conclusion is probably, for now:
- use getrandom() with zero flags for TMSIs and other random identifiers
 - use getrandom() with zero flags for RAND challenges
 I don't think it's a good idea. It's fine for exploratory programming while experimenting but in the library which is meant for production use behavior should be as predictable as possible.
Using "zero flags" means that the function might or might not block which pretty-much guarantees headaches later on when troubleshooting the code which will use it.
Is there any reason that it isn't just called a single time on system startup? It should never again block after that point in time. A measurement that might be worth taking is whether it ever blocks in practice. At least one of my Debian systems appears to ensure that the RNG is seeded before it has reached the run level where services run. Might that be the case here? Might it also be possible to call the wrapper for getrandom() at library init time as well as later when the random bytes are needed?
I think we should always opt for "least surprise" path and use GRND_NONBLOCK (as in current patch). That way we'll never block and let the caller handle errors (if any).
Isn't it more error prone to handle errors and unfilled buffers than to block a single time? Seems tricky, though I agree that consistent behavior might be worth the trade off. If GRND_NONBLOCK ensures that no buffer is ever underfilled, that might be the middle ground that makes the most sense.
- use getrandom() with GRND_RANDOM flag for K/OP/OPc/Ki generation
 I don't have a strong opinion on this one. For GNU/Linux kernel >= 4.8 both /dev/random and /dev/urandom are going through the same CSPRNG so I'm not sure we gain anything by requiring random instead of urandom.
That is my understanding as well. The key difference is that you were spot on about the pool being drained very seriously - to the point of underflowing, thus potentially encountering serious error states. Those error states might be a 512 byte buffer with only one random byte, for example.
Happy Hacking, RS
On 06.10.2017 17:43, ringsignature@riseup.net wrote:
Is there any reason that it isn't just called a single time on system startup? It should never again block after that point in time.
Small clarification (more like not to myself reading this later on): that's the case for default urandom source (used in current patch), not the GRND_RANDOM (discussed elsewhere in the ML thread).
A measurement that might be worth considering is if it blocks, ever, in practice?
We'll sort of make that in the end: even in GRND_NONBLOCK mode the errno is set for the cases where it would have been blocked otherwise. It's propagated to application by osmo_get_rand_id() and the application will log it.
Might it also be possible to call the wrapper for getrandom() at library init time as well as later when the random bytes are needed?
Sure, but it seems more complex.
What do we gain by using it that way? The underlying getrandom() call might still fail so we have to handle/propagate errors anyway. What do we loose by using it that way? Seems like nothing - see my note about error propagation above.
Isn't it more error prone to handle errors and unfilled buffers than to block a single time?
We don't have to deal with unfilled buffers in application code: either call to osmo_get_rand_id() succeeds or not.
Also it depends on what kind of error handling we choose. To me single "if (rc < 0) { log("doh!"); exit(1); }" looks less error-prone than potentially blocking code.
Seems tricky, though I agree that consistent behavior might be worth the trade off. If GRND_NONBLOCK ensures that no buffer is ever underfilled, that might be the middle ground that makes the most sense.
GRND_NONBLOCK doesn't guarantee that, using OSMO_MAX_RAND_ID_LEN = 16 and default urandom source does.
I think there might be some confusion so let me stress the following point: GRND_* flags are independent. There are 2 flags currently available. So we have 4 flag combinations.
GRND_RANDOM let us select randomness source. GRND_NONBLOCK let us control blocking behavior.
So in the absence of explicit guarantees (provided by _NONBLOCK) we can implicitly control blocking (to block at most once) by selecting random source and buffer size.
I prefer explicit control unless there are strong arguments against it.
That is my understanding as well. The key difference is that you were spot on about the pool being drained very seriously - to the point of underflowing, thus potentially encountering serious error states. Those error states might be a 512 byte buffer with only one random byte, for example.
There're 2 possible causes for underfilled buffer after getrandom() according to man: 1) insufficient entropy
2) signal interrupt
Both are impossible in the current implementation under review in https://gerrit.osmocom.org/#/c/1526/
1) Impossible because we use default urandom source
2) Impossible because we always request less than 256 bytes
I still check for it out of paranoia.
However, from the application PoV it should not matter anyway: if a call to some function might fail then we should handle it. There are basically 2 things we can do after logging the error:
- terminate the application
- fallback to insecure random numbers
So far we used the latter. If I understood the summary of the ongoing discussion right, then we should opt for the former. Shall I make it configurable via application vty/config (OsmoBSC/OsmoMSC/OsmoSGSN)?
Hello,
On 2017-10-06 16:57, Max wrote:
On 06.10.2017 17:43, ringsignature@riseup.net wrote:
Is there any reason that it isn't just called a single time on system startup? It should never again block after that point in time.
Small clarification (more like not to myself reading this later on): that's the case for default urandom source (used in current patch), not the GRND_RANDOM (discussed elsewhere in the ML thread).
A measurement that might be worth considering is if it blocks, ever, in practice?
We'll sort of make that in the end: even in GRND_NONBLOCK mode the errno is set for the cases where it would have been blocked otherwise. It's propagated to application by osmo_get_rand_id() and the application will log it.
OK. That seems reasonable as a way of measuring if this is practically a problem on any production systems.
If such a log message occurs, what would it mean to an end user? One might read it as meaning that those random bytes are less than ideal, but beyond that, I'm not sure what it would mean in an actionable sense. Does it mean one shouldn't use a Ki generated when that log message is emitted?
Might it also be possible to call the wrapper for getrandom() at library init time as well as later when the random bytes are needed?
Sure, but it seems more complex.
If there is a library setup, a possible, but not certain, blocking call seems reasonable for the later promise of never blocking. It could even call the same function with a single argument change? I agree that it is slightly more complex, though I think the trade off might be worth considering. Though it sounds like you have considered it and your conclusions are all very reasonable.
What do we gain by using it that way? The underlying getrandom() call might still fail so we have to handle/propagate errors anyway.
The documentation explicitly claims that, once the urandom source has been initialized, requests of 1 to 256 bytes never fail:
If the urandom source has been initialized, reads of up to 256 bytes will always return as many bytes as requested and will not be interrupted by signals. No such guarantees apply for larger buffer sizes. For example, if the call is interrupted by a signal handler, it may return a partially filled buffer, or fail with the error EINTR.
What do we lose by using it that way? Seems like nothing - see my note about error propagation above.
I think the failure mode on an embedded system is that the output is predictable. That's the Mining Your Ps and Qs issue all over again, this time with GSM and related protocols. I'm curious about analysis of GSM and related protocols, specifically how they fail when there isn't sufficient entropy. Are there any useful survey papers on the topic that anyone could recommend?
Isn't it more error-prone to handle errors and unfilled buffers than to block a single time?
We don't have to deal with unfilled buffers in application code: either call to osmo_get_rand_id() succeeds or not.
Ah - of course, in your patch for values of 1 to 256 bytes, I agree. The suggestion of using GRND_RANDOM does produce an occasional outcome where the buffer is filled with fewer bytes than requested, which does need to be handled. I think that, for simplicity, it seems wise to avoid using GRND_RANDOM, regardless of the discussion around GRND_NONBLOCK.
Also it depends on what kind of error handling we choose. To me, a single "if (rc < 0) { log("doh!"); exit(1); }" looks less error-prone than potentially blocking code.
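Spelled out a little more fully, and assuming the osmo_get_rand_id() prototype from the patch returns 0 on success or a negative errno value (a real application would use the Osmocom logging framework rather than stderr), that call-site policy might look like:

#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int osmo_get_rand_id(uint8_t *out, size_t len);  /* prototype assumed from the patch */

static void fill_id_or_die(uint8_t *id, size_t len)
{
        int rc = osmo_get_rand_id(id, len);

        if (rc < 0) {
                fprintf(stderr, "cannot obtain a secure random identifier: %s\n",
                        strerror(-rc));
                exit(EXIT_FAILURE);     /* terminating beats emitting predictable identifiers */
        }
}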
In the case of GRND_NONBLOCK, I generally agree.
Seems tricky, though I agree that consistent behavior might be worth the trade-off. If GRND_NONBLOCK ensures that no buffer is ever underfilled, that might be the middle ground that makes the most sense.
GRND_NONBLOCK doesn't guarantee that; using OSMO_MAX_RAND_ID_LEN = 16 and the default urandom source does.
Understood. I agree that with GRND_NONBLOCK alone, getrandom() should both never block and never underfill a buffer of 256 bytes or less.
I think there might be some confusion, so let me stress the following point: the GRND_* flags are independent. There are 2 flags currently available, so we have 4 flag combinations.
GRND_RANDOM lets us select the randomness source. GRND_NONBLOCK lets us control blocking behavior.
So in the absence of the explicit guarantee provided by GRND_NONBLOCK, we can implicitly control blocking (to block at most once) by selecting the randomness source and buffer size.
I prefer explicit control unless there are strong arguments against it.
I agree with the strategy of explicit control. Where we slightly diverge is that I would simply want to ensure that the RNG was initialized. My intuition is that on most modern systems it will always be the case, but on some embedded systems it seems to be tempting fate. In those perhaps rare but potentially serious cases, I think an initializer function blocking once might save a lot of cryptographic headaches later.
That is my understanding as well. The key difference is that you were spot on about the pool being drained very seriously - to the point of underflowing, thus potentially encountering serious error states. Those error states might be a 512 byte buffer with only one random byte, for example.
There are 2 possible causes for an underfilled buffer after getrandom() according to the man page:
1) insufficient entropy
2) signal interrupt
Both are impossible in the current implementation under review in https://gerrit.osmocom.org/#/c/1526/
1) Impossible because we use the default urandom source
2) Impossible because we always request less than 256 bytes
Yes, I agree.
I still check for it out of paranoia.
I'm not sure if this is portable to FreeBSD with the same guarantees, so checking seems prudent.
However, from the application PoV it should not matter anyway: if a call to some function might fail, then we should handle it.
Agreed.
There are basically 2 things we can do after logging the error:
- terminate the application
- fallback to insecure random numbers
So far we used the latter. If I understood the summary of the ongoing discussion right, then we should opt for the former.
It is certainly better to terminate than to use insecure random numbers in a security-sensitive context, especially if any of those bytes are used for anything long-lived.
The only other option is the one we've discussed and perhaps isn't worthwhile: initialize once in a blocking manner so that the above termination state should never happen in theory. Of course, you'll catch it when it happens in practice, caused by some previously unconsidered factor. It might make sense to simply state that the OS handles this case, but [0] suggests that such an approach fails badly. The situation is probably worse than the published papers suggest, since those cover only the systems that attract open research and attention.
Essentially, I think your conclusions are clear and reasonable. I agree with you, and I still worry that I don't have a way to measure the (probably obscure) failure case. I think this is actually a failure of the /dev/urandom interface - is any option unsafe? If so, why does it exist after so many years of failures caused by unsafe options?
Shall I make it configurable via application vty/config (OsmoBSC/OsmoMSC/OsmoSGSN)?
If the default is reasonably secure, it doesn't seem necessary to add a foot cannon.
Thank you for all your work on this patch set and for discussing it in incredible detail.
Happy Hacking, RS
Hi Max,
On Fri, Oct 06, 2017 at 06:57:03PM +0200, Max wrote:
However, from the application PoV it should not matter anyway: if a call to some function might fail, then we should handle it. There are basically 2 things we can do after logging the error:
- terminate the application
- fallback to insecure random numbers
So far we used the latter. If I understood the summary of the ongoing discussion right, then we should opt for the former. Shall I make it configurable via application vty/config (OsmoBSC/OsmoMSC/OsmoSGSN)?
I think it should be a compile time decision for now, and the default should be "no fallback". So basically the entire fallback code is #ifdef'd out unless somebody builds libosmocore with a possibly dangerous compile option and has a good reason to do so.
If the user does that, there should be a related warning at the end of the ./configure step, and we should also print runtime WARNING level messages once we actually start to fall back to insecure rand().
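A minimal sketch of that compile-time gate, with USE_INSECURE_RANDOM_FALLBACK as a stand-in for whatever the configure option would actually define:

#include <errno.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/random.h>

static int fill_random(uint8_t *out, size_t len)
{
        ssize_t rc = getrandom(out, len, GRND_NONBLOCK);

        if (rc == (ssize_t)len)
                return 0;
#ifdef USE_INSECURE_RANDOM_FALLBACK
        /* only compiled in when the user explicitly chose the dangerous configure option */
        fprintf(stderr, "WARNING: falling back to insecure rand()\n");
        for (size_t i = 0; i < len; i++)
                out[i] = (uint8_t)rand();
        return 0;
#else
        return rc < 0 ? -errno : -EIO;  /* default build: hard failure, no fallback */
#endif
}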
I'm somewhat confused by the implementation details: placing the fallback into the library would mean we effectively duplicate the fallback logic: the library might or might not fall back, and then the application will have to decide if it's OK with the fallback.
I'd prefer to use only secure random numbers in the library code and make the insecure fallback a compile-time option in the application code. That way we can manage it on a per-application or even case-by-case basis later on if we decide to drop it altogether.
Although I might be missing something, so I'm looking forward to your feedback.
On 07.10.2017 08:34, Harald Welte wrote:
I think it should be a compile time decision for now, and the default should be "no fallback". So basically the entire fallback code is #ifdef'd out unless somebody builds libosmocore with a possibly dangerous compile option and has a good reason to do so.
If the user does that, there should be a related warning at the end of the ./configure step, and we should also print runtime WARNING level messages once we actually start to fall back to insecure rand().
On Mon, Oct 09, 2017 at 12:16:50PM +0200, Max wrote:
I'm somewhat confused by the implementation details: placing the fallback into the library would mean we effectively duplicate the fallback logic: the library might or might not fall back, and then the application will have to decide if it's OK with the fallback.
I'd prefer to use only secure random numbers in the library code and make the insecure fallback a compile-time option in the application code. That way we can manage it on a per-application or even case-by-case basis later on if we decide to drop it altogether.
I think we should have the related code only once, and that means it should be in the library. I don't want per-application specific fallback.
In any case, to conserve our limited development resources, let's not have any fallback for the time being and wait to see if it ever turns out to be an issue for any of our users.
On 2017-10-06 16:57, Max wrote:
GRND_RANDOM let us select randomness source. GRND_NONBLOCK let us control blocking behavior.
After rereading the relevant documentation, I realize that I misunderstood one critical point regarding GRND_NONBLOCK. GRND_NONBLOCK controls not only blocking behavior: if the system has not been properly seeded, it also ensures the call fails closed by immediately returning -1 without outputting any random bytes. I apologize for completely missing this detail.
You are of course correct: if blocking is a problem, it is logical to set GRND_NONBLOCK and to handle the error case, especially by terminating. That is indeed a critical error, and there should be no output.
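In code, the distinction is simply which errno value comes back; a short sketch under that assumption:

#include <errno.h>
#include <stdint.h>
#include <sys/types.h>
#include <sys/random.h>

/* With GRND_NONBLOCK and the default urandom source, EAGAIN has exactly one
 * meaning: the kernel pool has never been seeded, and nothing was written. */
static int seeded_random(uint8_t *out, size_t len)
{
        ssize_t rc = getrandom(out, len, GRND_NONBLOCK);

        if (rc == (ssize_t)len)
                return 0;
        if (rc < 0 && errno == EAGAIN)
                return -EAGAIN;         /* fail closed: refuse to hand out anything */
        return rc < 0 ? -errno : -EIO;  /* any other error is unexpected */
}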
Happy Hacking, RS
Hi RS,
On Fri, Oct 06, 2017 at 08:43:22AM -0700, ringsignature@riseup.net wrote:
- use getrandom() with GRND_RANDOM flag for K/OP/OPc/Ki generation
 I don't have a strong opinion on this one. For GNU/Linux kernel >= 4.8 both /dev/random and /dev/urandom are going through the same CSPRNG so I'm not sure we gain anything by requiring random instead of urandom.
That is my understanding as well. The key difference is that you were spot on about the pool being drained very seriously - to the point of underflowing, thus potentially encountering serious error states. Those error states might be a 512 byte buffer with only one random byte, for example.
I don't think this is an issue. We can actually do blocking reads for key generation, if we ever do that. The speed of programming SIM cards is almost certainly slower than the rate at which we can obtain randomness. That's a one-time operation at the time SIM cards are programmed.
On 2017-10-07 06:30, Harald Welte wrote:
Hi RS,
On Fri, Oct 06, 2017 at 08:43:22AM -0700, ringsignature@riseup.net wrote:
- use getrandom() with GRND_RANDOM flag for K/OP/OPc/Ki generation
 I don't have a strong opinion on this one. For GNU/Linux kernel >= 4.8 both /dev/random and /dev/urandom are going through the same CSPRNG so I'm not sure we gain anything by requiring random instead of urandom.
That is my understanding as well. The key difference is that you were spot on about the pool being drained very seriously - to the point of underflowing, thus potentially encountering serious error states. Those error states might be a 512 byte buffer with only one random byte, for example.
I don't think this is an issue. We can actually do blocking reads for key generation, if we ever do that. The speed of programming SIM cards is almost certainly slower than the rate at which we can obtain randomness. That's a one-time operation at the time SIM cards are programmed.
That suggests setting GRND_NONBLOCK for all calls except the generation of SIM card cryptographic keys, I think?
In the latter case, passing a flags value of 0 may be appropriate. After reading one of the latest papers [0] on the robustness of /dev/random, I think it would be wise to leave GRND_RANDOM out and only use urandom, in a properly seeded state, for such keys. That paper does not inspire confidence in even that choice, but sadly I don't see a better one.
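As a sketch of that split (helper names are hypothetical): frequently generated protocol identifiers use GRND_NONBLOCK and treat failure as fatal, while one-off K/OP/OPc/Ki generation at SIM-programming time blocks on the urandom source with flags = 0, avoiding GRND_RANDOM entirely.

#include <stdint.h>
#include <sys/types.h>
#include <sys/random.h>

static ssize_t rand_for_identifier(uint8_t *out, size_t len)
{
        /* run-time protocol identifiers: never block; the caller terminates on error */
        return getrandom(out, len, GRND_NONBLOCK);
}

static ssize_t rand_for_sim_key(uint8_t *out, size_t len)
{
        /* K/OP/OPc/Ki generation: blocking once during card programming is acceptable */
        return getrandom(out, len, 0);
}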
Happy Hacking, RS
[0] https://eprint.iacr.org/2013/338.pdf