Hi Thomas,
I've made some progress in making osmo-trx able to power off/on without restart. At this moment it's almost there - the only big remaining issue I see is some bug with UHD device start/stop. It seems we do not re-initialize it properly and assert (num_rd == OUTCHUNK) fails. I hope you could fix that.
I completely re-wrote Thread class to make starting/stopping threads a synchronous reliable operation. I also introduced shutdown() function to FIFOServiceLoopThread, ControlServiceLoopThread and TransmitPriorityQueueServiceLoopThread. This method takes care of un-blocking a thread from a blocking read from a queue or a socket. Few bugs in the other code is fixed as well. Details are below, please review the patches.
Patch set is checked into https://github.com/fairwaves/osmo-trx under achemeris/stable_threads and achemeris/poweroff_hack branches.
commit c5da6607b4d17706380de72e0814e28f03624c51 Author: Alexander Chemeris Alexander.Chemeris@gmail.com Date: Sun Jul 14 14:57:14 2013 +0400
Transceiver52M: Fix crash in uhd_device destructor due to deleting statically allocated memory.
This is a genuine bug, should be fixed upstream.
commit dbd27a60b6ed99fd6fd2339ffafccc0d759fa2fc Author: Alexander Chemeris Alexander.Chemeris@gmail.com Date: Sat Jul 13 21:26:39 2013 +0400
CommonLibs: Fix compile time warnings.
Fixes annoying warning.
commit 8eaa40dd6c2a783306d809aec98c2937e15068f3 Author: Alexander Chemeris Alexander.Chemeris@gmail.com Date: Sun Jul 14 16:23:32 2013 +0400
Transceiver52M: Remove unused thread mAlignRadioServiceLoopThread;
This actually leads to crash when Thread class is abstract.
commit b8646946526903d2bbfa657690936b9bcba80ca5 Author: Alexander Chemeris Alexander.Chemeris@gmail.com Date: Sat Jul 13 22:29:12 2013 +0400
CommonLibs: DEBUG: Mark interthread classes which are not used in OsmoTRX.
I've added this comments just to simplify debugging. I.e. I don't want to debug interthread primitives which we do not use.
commit 69b6a6dfcd882df180e9d5e9856225f4b1d2eeed Author: Alexander Chemeris Alexander.Chemeris@gmail.com Date: Sun Jul 14 01:43:04 2013 +0400
CommonLibs: Allow NULLs to be retrieved from InterthreadQueue. We need a way to stop InterthreadQueue blocking read to be able to shutdown a thread. The easiest way to do that is to push NULL to the queue, but the original implementation will just ignore that and continue blocking. After the change the blocking read() will exit with NULL result which is perfectly fine with us. Ideally we should change all methods of InterthreadQueue to return a status value to indicate normal exits, timeouts, etc. Right now the only way to indicate an error is returning NULL, which could be a valid operation.commit d4a8c1360e5d54188b648766dde199fa37f2ed27 Author: Alexander Chemeris Alexander.Chemeris@gmail.com Date: Sun Jul 14 02:00:00 2013 +0400
CommonLibs: Add shutdown() method to the DatagramSocket class. This method is useful to abort a blocking operations on a socket during shutdown.commit 302a9198df85e4f771368f24d8445dec7c8b2203 Author: Alexander Chemeris Alexander.Chemeris@gmail.com Date: Sun Jul 14 12:04:44 2013 +0400
CommonLibs: Signal::wait() should return the value which pthread_cond_timedwait() returns.commit e279124660a2add686e6c9266d18a9bb0cf43258 Author: Alexander Chemeris Alexander.Chemeris@gmail.com Date: Sun Jul 14 12:38:25 2013 +0400
CommonLibs: Rewrite Threads class to be predictable during start/stop. Current implementation strictly synchronize thread startup and shutdown to make sure a thread is actually started on startThread() exit and that it's stopped on stopThread() exit (or an error is returned). I also changed the Thread class to be used as a parent class for an actual thread. This way we could provide much better logical separation between variables used by different threads. Which will (hopefully) reduce number of issues with inter-thread communications.commit de202e343564913e0591de10850e617d7de55b6b Author: Alexander Chemeris Alexander.Chemeris@gmail.com Date: Sun Jul 14 12:42:52 2013 +0400
Transceiver52M: Port all threads to the new Thread interface. Also note that we introduced shutdown() method for Transceiver threads which implement proper shutdown of threads when they are in blocking read state. This involves using shutdown() on sockets and pushing NULL to queues. With this change we should be able to start/stop transceiver channels at arbitrary moments.commit 129ad76b15f1fdf0588d5a24fe2d0f7ef3df274e Author: Alexander Chemeris Alexander.Chemeris@gmail.com Date: Sun Jul 14 12:48:37 2013 +0400
Transceiver52M: Delay is no longer needed at the transceiver exit, threads should shut down cleanly.commit 604f65e69f2317dd5dc1aeb8faf3c107cd8a2231 Author: Alexander Chemeris Alexander.Chemeris@gmail.com Date: Sun Jul 14 13:22:31 2013 +0400
CommonLibs: Port InterthreadTest and SocketsTest to the new Thread API.commit c4038ef54c806c8c00f7fa28bc2b8b200bc3356f Author: Alexander Chemeris Alexander.Chemeris@gmail.com Date: Sun Jul 14 13:24:16 2013 +0400
CommonLibs: Adding a new ThreadsTest testsuite. It's very basic at this moment. We should add a stress-test for thread start/stop at least.
This is the meat of the branch. At this stage everything works fine, threads are shutdown properly, but power off command still doesn't power off the transceiver.
commit c0f0da4ec8e0068a562f427a89a899bf190bc74a Author: Alexander Chemeris Alexander.Chemeris@gmail.com Date: Sun Jul 14 23:07:31 2013 +0400
Transceiver52M: HACK: Exit transceiver on POWEROFF command. This hack allows OsmoBTS to correctly restart itself with different parameters. We assume that transceiver is run as a service and will be restarted on exit.commit 806a64b1df40871543e9ae34386463d08a564147 Author: Alexander Chemeris Alexander.Chemeris@gmail.com Date: Sun Jul 14 22:58:55 2013 +0400
Transceiver52M: Start and stop all threads on POWERON/POWEROFF. This change reveals some issue with the UHD start/stop code. Specifically, we get a failed assert in RadioInterface::pullBuffer() (num_rd == OUTCHUNK with num_rd=0) with the POWERON/POWEROFF/POWERON sequence.
These patches implements actual power off command. They are split into a separate branch, because they lead to assert failure I mentioned above. In my setup it's 100% reproducible. I hope you could fix that.
commit 4c39dd4b492b439742911aa70623d41ad634d7f4 Author: Alexander Chemeris Alexander.Chemeris@gmail.com Date: Sun Jul 14 22:45:40 2013 +0400
Transceiver52M: Introduce usage counting for DriveLoop and RadioInterface. The reason is to allow them to be started from any TRX. I.e. they are started when the first TRX starts and stopped when the last TRX stops.
This hack is a workaround for the assert failure above - it shuts down the complete transceiver in a hope that it will be restarted by external means. Meant as to be used in production while we don't have the assert failure fixed.
-- Regards, Alexander Chemeris. CEO, Fairwaves LLC / ООО УмРадио http://fairwaves.ru
On Sun, Jul 14, 2013 at 10:01 PM, Alexander Chemeris alexander.chemeris@gmail.com wrote:
I've made some progress in making osmo-trx able to power off/on without restart. At this moment it's almost there - the only big remaining issue I see is some bug with UHD device start/stop. It seems we do not re-initialize it properly and assert (num_rd == OUTCHUNK) fails. I hope you could fix that.
This is a very positive patch set. As you very well know, the power on/off handling was in a ugly state. One of the remaining issue with UHD start/stop is that the device clock is reset to zero at restart, but not all of the timestamp counters are reset in radioInterface. The radioInterface is not stopped at all until deallocation since 'POWEROFF' is an empty command.
I've only looked at the outer thread interface changes, but the ability to tear down individual threads (instead of uncontrolled stack unwinding) makes the remaining issues much more straightforward. There's a fair amount of old code that was probably never written with restart in mind, but I don't see any issues with cleaning that up.
Thomas
On Mon, Jul 15, 2013 at 2:35 PM, Thomas Tsou ttsou@vt.edu wrote:
On Sun, Jul 14, 2013 at 10:01 PM, Alexander Chemeris alexander.chemeris@gmail.com wrote:
I've made some progress in making osmo-trx able to power off/on without restart. At this moment it's almost there - the only big remaining issue I see is some bug with UHD device start/stop. It seems we do not re-initialize it properly and assert (num_rd == OUTCHUNK) fails. I hope you could fix that.
This is a very positive patch set. As you very well know, the power on/off handling was in a ugly state. One of the remaining issue with UHD start/stop is that the device clock is reset to zero at restart, but not all of the timestamp counters are reset in radioInterface. The radioInterface is not stopped at all until deallocation since 'POWEROFF' is an empty command.
I made an attempt to implement it in the 806a64b1 patch. If you check out this commit, you should get the assert during TURN ON/OFF/ON sequence. To start/stop transceiver I use scripts attached to this e-mail.
I've only looked at the outer thread interface changes, but the ability to tear down individual threads (instead of uncontrolled stack unwinding) makes the remaining issues much more straightforward. There's a fair amount of old code that was probably never written with restart in mind, but I don't see any issues with cleaning that up.
For the old code we could use "compatibility" mode where we don't shutdown or exit on shutdown.
-- Regards, Alexander Chemeris. CEO, Fairwaves LLC / ООО УмРадио http://fairwaves.ru
On Mon, Jul 15, 2013 at 1:19 PM, Alexander Chemeris < alexander.chemeris@gmail.com> wrote:
On Mon, Jul 15, 2013 at 2:35 PM, Thomas Tsou ttsou@vt.edu wrote:
This is a very positive patch set. As you very well know, the power on/off handling was in a ugly state. One of the remaining issue with UHD start/stop is that the device clock is reset to zero at restart, but not all of the timestamp counters are reset in radioInterface. The radioInterface is not stopped at all until deallocation since 'POWEROFF' is an empty command.
I made an attempt to implement it in the 806a64b1 patch. If you check out this commit, you should get the assert during TURN ON/OFF/ON sequence. To start/stop transceiver I use scripts attached to this e-mail.
There does seem to be something strange going on with UHD. After restart and clock reset the device is reporting time correctly, but subsequent packets do not reflect the change. We've had similar issues before due to lingering packets in the socket buffers, but the same fix (flushing buffers after issuing STREAM_MODE_STOP) doesn't fix the issue.
Rx timestamp 10.3679 Rx timestamp 10.3698 ----POWEROFF---- ----POWERON---- UHD device time 0.00440694 Rx timestamp 10.3732
This may be a bug in UHD, or some oddity in the UHD streamer interface. I'm going to try it with an outside test case.
Thomas
On Mon, Jul 15, 2013 at 10:54 PM, Thomas Tsou tom@tsou.cc wrote:
On Mon, Jul 15, 2013 at 1:19 PM, Alexander Chemeris alexander.chemeris@gmail.com wrote:
On Mon, Jul 15, 2013 at 2:35 PM, Thomas Tsou ttsou@vt.edu wrote:
This is a very positive patch set. As you very well know, the power on/off handling was in a ugly state. One of the remaining issue with UHD start/stop is that the device clock is reset to zero at restart, but not all of the timestamp counters are reset in radioInterface. The radioInterface is not stopped at all until deallocation since 'POWEROFF' is an empty command.
I made an attempt to implement it in the 806a64b1 patch. If you check out this commit, you should get the assert during TURN ON/OFF/ON sequence. To start/stop transceiver I use scripts attached to this e-mail.
There does seem to be something strange going on with UHD. After restart and clock reset the device is reporting time correctly, but subsequent packets do not reflect the change. We've had similar issues before due to lingering packets in the socket buffers, but the same fix (flushing buffers after issuing STREAM_MODE_STOP) doesn't fix the issue.
Rx timestamp 10.3679 Rx timestamp 10.3698 ----POWEROFF---- ----POWERON---- UHD device time 0.00440694 Rx timestamp 10.3732
This may be a bug in UHD, or some oddity in the UHD streamer interface. I'm going to try it with an outside test case.
Worst case we should be able to align the transceiver clock with UHD instead of resetting UHD clock.
-- Regards, Alexander Chemeris. CEO, Fairwaves LLC / ООО УмРадио http://fairwaves.ru