Hi Thomas,
I've made some progress in making osmo-trx able to power off/on
without restart. At this moment it's almost there - the only big
remaining issue I see is some bug with UHD device start/stop. It seems
we do not re-initialize it properly and assert (num_rd == OUTCHUNK)
fails. I hope you could fix that.
I completely re-wrote Thread class to make starting/stopping threads a
synchronous reliable operation. I also introduced shutdown() function
to FIFOServiceLoopThread, ControlServiceLoopThread and
TransmitPriorityQueueServiceLoopThread. This method takes care of
un-blocking a thread from a blocking read from a queue or a socket.
Few bugs in the other code is fixed as well. Details are below, please
review the patches.
Patch set is checked into
https://github.com/fairwaves/osmo-trx under
achemeris/stable_threads and achemeris/poweroff_hack branches.
commit c5da6607b4d17706380de72e0814e28f03624c51
Author: Alexander Chemeris <Alexander.Chemeris(a)gmail.com>
Date: Sun Jul 14 14:57:14 2013 +0400
Transceiver52M: Fix crash in uhd_device destructor due to deleting statically
allocated memory.
This is a genuine bug, should be fixed upstream.
commit dbd27a60b6ed99fd6fd2339ffafccc0d759fa2fc
Author: Alexander Chemeris <Alexander.Chemeris(a)gmail.com>
Date: Sat Jul 13 21:26:39 2013 +0400
CommonLibs: Fix compile time warnings.
Fixes annoying warning.
commit 8eaa40dd6c2a783306d809aec98c2937e15068f3
Author: Alexander Chemeris <Alexander.Chemeris(a)gmail.com>
Date: Sun Jul 14 16:23:32 2013 +0400
Transceiver52M: Remove unused thread mAlignRadioServiceLoopThread;
This actually leads to crash when Thread class is abstract.
commit b8646946526903d2bbfa657690936b9bcba80ca5
Author: Alexander Chemeris <Alexander.Chemeris(a)gmail.com>
Date: Sat Jul 13 22:29:12 2013 +0400
CommonLibs: DEBUG: Mark interthread classes which are not used in OsmoTRX.
I've added this comments just to simplify debugging. I.e. I don't want
to debug interthread primitives which we do not use.
commit 69b6a6dfcd882df180e9d5e9856225f4b1d2eeed
Author: Alexander Chemeris <Alexander.Chemeris(a)gmail.com>
Date: Sun Jul 14 01:43:04 2013 +0400
CommonLibs: Allow NULLs to be retrieved from InterthreadQueue.
We need a way to stop InterthreadQueue blocking read to be able to shutdown
a thread. The easiest way to do that is to push NULL to the queue, but the
original implementation will just ignore that and continue blocking. After
the change the blocking read() will exit with NULL result which is perfectly
fine with us.
Ideally we should change all methods of InterthreadQueue to return a status
value to indicate normal exits, timeouts, etc. Right now the only way to
indicate an error is returning NULL, which could be a valid operation.
commit d4a8c1360e5d54188b648766dde199fa37f2ed27
Author: Alexander Chemeris <Alexander.Chemeris(a)gmail.com>
Date: Sun Jul 14 02:00:00 2013 +0400
CommonLibs: Add shutdown() method to the DatagramSocket class.
This method is useful to abort a blocking operations on a socket during shutdown.
commit 302a9198df85e4f771368f24d8445dec7c8b2203
Author: Alexander Chemeris <Alexander.Chemeris(a)gmail.com>
Date: Sun Jul 14 12:04:44 2013 +0400
CommonLibs: Signal::wait() should return the value which pthread_cond_timedwait()
returns.
commit e279124660a2add686e6c9266d18a9bb0cf43258
Author: Alexander Chemeris <Alexander.Chemeris(a)gmail.com>
Date: Sun Jul 14 12:38:25 2013 +0400
CommonLibs: Rewrite Threads class to be predictable during start/stop.
Current implementation strictly synchronize thread startup and shutdown
to make sure a thread is actually started on startThread() exit and
that it's stopped on stopThread() exit (or an error is returned).
I also changed the Thread class to be used as a parent class for
an actual thread. This way we could provide much better logical
separation between variables used by different threads. Which will
(hopefully) reduce number of issues with inter-thread communications.
commit de202e343564913e0591de10850e617d7de55b6b
Author: Alexander Chemeris <Alexander.Chemeris(a)gmail.com>
Date: Sun Jul 14 12:42:52 2013 +0400
Transceiver52M: Port all threads to the new Thread interface.
Also note that we introduced shutdown() method for Transceiver threads which
implement proper shutdown of threads when they are in blocking read state.
This involves using shutdown() on sockets and pushing NULL to queues.
With this change we should be able to start/stop transceiver channels at
arbitrary moments.
commit 129ad76b15f1fdf0588d5a24fe2d0f7ef3df274e
Author: Alexander Chemeris <Alexander.Chemeris(a)gmail.com>
Date: Sun Jul 14 12:48:37 2013 +0400
Transceiver52M: Delay is no longer needed at the transceiver exit, threads should
shut down cleanly.
commit 604f65e69f2317dd5dc1aeb8faf3c107cd8a2231
Author: Alexander Chemeris <Alexander.Chemeris(a)gmail.com>
Date: Sun Jul 14 13:22:31 2013 +0400
CommonLibs: Port InterthreadTest and SocketsTest to the new Thread API.
commit c4038ef54c806c8c00f7fa28bc2b8b200bc3356f
Author: Alexander Chemeris <Alexander.Chemeris(a)gmail.com>
Date: Sun Jul 14 13:24:16 2013 +0400
CommonLibs: Adding a new ThreadsTest testsuite.
It's very basic at this moment. We should add a stress-test for thread
start/stop at least.
This is the meat of the branch.
At this stage everything works fine, threads are shutdown properly,
but power off command still doesn't power off the transceiver.
commit c0f0da4ec8e0068a562f427a89a899bf190bc74a
Author: Alexander Chemeris <Alexander.Chemeris(a)gmail.com>
Date: Sun Jul 14 23:07:31 2013 +0400
Transceiver52M: HACK: Exit transceiver on POWEROFF command.
This hack allows OsmoBTS to correctly restart itself with different parameters.
We assume that transceiver is run as a service and will be restarted on exit.
commit 806a64b1df40871543e9ae34386463d08a564147
Author: Alexander Chemeris <Alexander.Chemeris(a)gmail.com>
Date: Sun Jul 14 22:58:55 2013 +0400
Transceiver52M: Start and stop all threads on POWERON/POWEROFF.
This change reveals some issue with the UHD start/stop code. Specifically,
we get a failed assert in RadioInterface::pullBuffer() (num_rd == OUTCHUNK
with num_rd=0) with the POWERON/POWEROFF/POWERON sequence.
These patches implements actual power off command. They are split into
a separate branch, because they lead to assert failure I mentioned
above. In my setup it's 100% reproducible. I hope you could fix that.
commit 4c39dd4b492b439742911aa70623d41ad634d7f4
Author: Alexander Chemeris <Alexander.Chemeris(a)gmail.com>
Date: Sun Jul 14 22:45:40 2013 +0400
Transceiver52M: Introduce usage counting for DriveLoop and RadioInterface.
The reason is to allow them to be started from any TRX. I.e. they are started
when the first TRX starts and stopped when the last TRX stops.
This hack is a workaround for the assert failure above - it shuts down
the complete transceiver in a hope that it will be restarted by
external means. Meant as to be used in production while we don't have
the assert failure fixed.
--
Regards,
Alexander Chemeris.
CEO, Fairwaves LLC / ООО УмРадио
http://fairwaves.ru