So, does the dsp_api.ndb->a_du_0 member hold the audio data to be sent to the other peer, or is that something else? How can I capture the audio data (after the codec), make a slight modification to it, and send it to the other mobile?
Mmm, I'm not sure you can do that. You can 'peek' at the audio, but by the time you see the data there it may have been processed into TCH frames already.
When in 'PLAY_MODE', the DSP will take the data to send from a_du_1 (in TCH/F). So it's possible that the data from the microphone would still be in a_du_0, and so you could put it in play mode and do your modification while copying data from a_du_0 to a_du_1 ...
As I said: no documentation, so it's all been done by trial and error, you'd have to try :)
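A minimal sketch of that copy-and-modify idea, untested on real hardware. Everything here is an assumption for illustration: `struct mock_ndb` is a hypothetical stand-in for the real DSP API memory, and the buffer length (22 words), the header size (3 words), and the `copy_du_modified` / `flip_lsb` helpers are not taken from the firmware. Whether a_du_0 really still holds the microphone data at that point is exactly the part you'd have to find out by experiment.

```c
#include <stdint.h>
#include <stddef.h>

#define DU_WORDS     22 /* ASSUMPTION: buffer length in 16-bit words */
#define DU_HDR_WORDS 3  /* ASSUMPTION: leading header words, left untouched */

/* Hypothetical stand-in for the NDB area; the real one lives in DSP API RAM. */
struct mock_ndb {
	uint16_t a_du_0[DU_WORDS];
	uint16_t a_du_1[DU_WORDS];
};

/* Callback that returns the modified value of one payload word. */
typedef uint16_t (*du_modify_fn)(uint16_t word, size_t index);

/* Copy a_du_0 into a_du_1, passing each payload word through 'modify'.
 * Header words are copied unchanged. */
static void copy_du_modified(struct mock_ndb *ndb, du_modify_fn modify)
{
	size_t i;

	for (i = 0; i < DU_WORDS; i++) {
		uint16_t w = ndb->a_du_0[i];

		if (i >= DU_HDR_WORDS && modify)
			w = modify(w, i);
		ndb->a_du_1[i] = w;
	}
}

/* Example modification: flip the least significant bit of every payload word. */
static uint16_t flip_lsb(uint16_t word, size_t index)
{
	(void)index;
	return word ^ 0x0001;
}
```

On the phone you would call something like `copy_du_modified(ndb, flip_lsb)` once per frame while in play mode. Keep in mind that the data is already codec output, so flipping bits changes encoded speech parameters rather than audio samples.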
I can see that the AUDIO_TX_TRAFFIC_REQ/AUDIO_RX_TRAFFIC_IND audio modes were created for that purpose, but I'm not sure how to use them. Do I need to make my modifications in gsm_recv_voice and gsm_send_voice (voice.c)?
Those are used to RX and TX frames from the host (rather than the phone directly). The only way to use them "as is" is to interface osmocom-bb with LCR (not sure if there is a howto ...). But as you saw in voice.c you could modify the code yourself to feed data if need be ...
Cheers,
Sylvain
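For the host-side route (feeding data yourself around gsm_recv_voice / gsm_send_voice), the general shape could look like the sketch below. It is only an illustration under stated assumptions: it does not use the real voice.c functions or their signatures, `forward_modified_frame`, `frame_tx_fn` and `capture_frame` are hypothetical names, and the 33-byte frame length assumes full-rate speech frames in the common 33-byte format.

```c
#include <stdint.h>
#include <stddef.h>
#include <string.h>

#define FR_FRAME_LEN 33 /* ASSUMPTION: GSM full-rate frame, 33-byte format */

/* Hypothetical transmit callback; in a real build this would wrap
 * whatever function actually sends the frame back to the phone. */
typedef int (*frame_tx_fn)(const uint8_t *frame, size_t len, void *ctx);

/* Take a received frame, modify a private copy, and hand it to 'tx'.
 * Returns -1 on bad arguments, otherwise whatever 'tx' returns. */
static int forward_modified_frame(const uint8_t *rx, size_t len,
				  frame_tx_fn tx, void *ctx)
{
	uint8_t buf[FR_FRAME_LEN];

	if (!rx || !tx || len != FR_FRAME_LEN)
		return -1;

	memcpy(buf, rx, len);
	/* Placeholder modification: invert the last byte of the frame. */
	buf[FR_FRAME_LEN - 1] ^= 0xFF;

	return tx(buf, len, ctx);
}

/* Example callback for testing: copies the frame into the buffer in 'ctx'. */
static int capture_frame(const uint8_t *frame, size_t len, void *ctx)
{
	memcpy(ctx, frame, len);
	return 0;
}
```

Working on a copy keeps the received buffer intact, which matters if the same frame is also passed on elsewhere (for example to a call-control application).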