As Dan nicely puts it, they are using the available bandwidth in a G.711 audio stream and embedding a G.729 stream in it. That looks good for security purposes. But here is what I perceive would be the problem in real-world scenarios: there are valid applications that need these details, for example lawful intercept, loggers, etc.
The current IP loggers (voice recorders used to record RTP packets) would need to know about the embedded stream, and for that to happen the underlying signalling protocol has to carry that information (maybe through SDP). If that information is passed, your security issues start all over again.
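To make the point concrete, here is a rough sketch of what a logger would have to do if the embedded stream were announced in SDP. The attribute name `x-embedded-codec` is purely hypothetical (I am not aware of any standard attribute for this), and the SDP body is made up for illustration. The takeaway is that anything a logger can parse out of the signalling, an eavesdropper on the signalling path can parse out too.

```python
from typing import List

# Hypothetical attribute name, invented for this example.
# It is NOT part of any SDP standard.
HYPOTHETICAL_ATTR = "x-embedded-codec"


def embedded_codecs(sdp: str) -> List[str]:
    """Return the values of the hypothetical attribute found in an SDP body."""
    prefix = f"a={HYPOTHETICAL_ATTR}:"
    found = []
    for line in sdp.splitlines():
        line = line.strip()
        if line.startswith(prefix):
            # Everything after the first ':' is the attribute value.
            found.append(line.split(":", 1)[1].strip())
    return found


# Made-up SDP offer: a normal G.711 (PCMU) audio stream plus the
# hypothetical attribute advertising the embedded G.729 stream.
example_sdp = "\r\n".join([
    "v=0",
    "o=- 0 0 IN IP4 192.0.2.1",
    "s=-",
    "c=IN IP4 192.0.2.1",
    "t=0 0",
    "m=audio 49170 RTP/AVP 0",
    "a=rtpmap:0 PCMU/8000",
    "a=x-embedded-codec:G729",
])

print(embedded_codecs(example_sdp))
```

A recorder could use a check like this to decide whether to look for a second stream inside the G.711 payload, but so could anyone else who sees the SDP.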
My verdict is that it is a very smart research paper, but to actually implement it would throw the existing systems into a spin.