In this blog article we continue to analyze RTP and RTCP, and we will see why a jitter buffer is important and how it affects call quality.
As we saw in the previous article, SDP is not able to transfer media; this task is delegated to protocols such as RTP or RTSP.
RTCP (RTP Control Protocol) provides different levels of feedback about the ongoing RTP stream.
The goal of RTCP is to provide information to the remote endpoint about the quality of service of the ongoing communication.
This is done by providing regular statistics about the number of packets received, jitter, and packets lost (either in the network or discarded by the jitter buffer).
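To make the jitter statistic more concrete, here is a minimal sketch of the interarrival jitter estimate described in RFC 3550 (the RTP specification). The function names and the use of millisecond timestamps are assumptions made for illustration; a real receiver works in RTP timestamp units.

```python
def update_jitter(jitter, prev_send, prev_recv, send, recv):
    """Return the updated interarrival jitter estimate (same unit as the inputs)."""
    # D: how much the spacing between two packets changed in transit
    d = (recv - prev_recv) - (send - prev_send)
    # Smooth the estimate with a gain of 1/16, as RFC 3550 does
    return jitter + (abs(d) - jitter) / 16.0


def jitter_for_stream(packets):
    """packets: list of (send_ts, recv_ts) tuples, in arrival order."""
    jitter = 0.0
    for (prev_send, prev_recv), (send, recv) in zip(packets, packets[1:]):
        jitter = update_jitter(jitter, prev_send, prev_recv, send, recv)
    return jitter
```

If every packet spends the same time in the network, the estimate stays at zero; it grows only when the spacing between packets changes along the way.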
This information can be used by the application to provide call quality info to users.
Jitter Buffer and Call Quality
Even if RTP packets are generated by devices at regular intervals (typical packetization intervals are 20, 30, 40, or 60 ms), transport over the Internet can introduce both packet loss and variation in the interval between consecutive packets on the receiving side. An RTP packet can even arrive later than subsequent RTP packets in the stream.
If the RTP packets are received and handled without any buffer (for example, by immediately playing back the audio), every packet that arrives late or out of order is effectively lost, so the percentage of lost packets increases, resulting in many more audio / video artifacts. For this reason, a buffer is necessary.
A jitter buffer works by first reconstructing the original ordering of packets on the receiving side, and then generating an even audio / video stream.
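The reordering step can be sketched with a small priority queue keyed by RTP sequence number. This is only an illustration: the class name and the `depth` parameter are hypothetical, and the sketch ignores the 16-bit sequence number wraparound that a real implementation must handle.

```python
import heapq


class ReorderBuffer:
    """Toy jitter buffer: holds packets back and releases them in sequence order."""

    def __init__(self, depth):
        self.depth = depth  # hypothetical: how many packets to hold back
        self.heap = []      # min-heap ordered by sequence number

    def push(self, seq, payload):
        """Store a packet and return the packets that are now ready to play."""
        heapq.heappush(self.heap, (seq, payload))
        ready = []
        while len(self.heap) > self.depth:
            ready.append(heapq.heappop(self.heap))
        return ready

    def flush(self):
        """Release everything that is left, still in sequence order."""
        ready = []
        while self.heap:
            ready.append(heapq.heappop(self.heap))
        return ready
```

Pushing packets in the order 1, 3, 2, 4 with a depth of 2 releases them as 1, 2, 3, 4. The packets held back are exactly the playback delay discussed next.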
The problem introduced by a jitter buffer is a small delay in the playback of incoming media (typically between 100 and 500 ms). This delay adds to network latency, making conversations (audio or video) less immediate.
For example, in the presence of high latency (400-500 ms) and a big jitter buffer (400 ms), the two participants in a conversation can start talking at the same time but realize it only after 800 to 900 ms. At that point, they will both stop talking and wait for the other to speak next, and so on.
A static jitter buffer waits a predefined amount of time before considering a packet lost. The main disadvantage of a static jitter buffer is that the latency it adds is constant. If jitter decreases, the delay in playback remains the same. If jitter increases beyond the jitter buffer size, packets will be discarded.
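The static behavior can be sketched as a fixed playout deadline per packet. The function name, the 20 ms frame interval, and the 60 ms buffer size below are assumptions chosen only for illustration.

```python
def classify_packets(packets, frame_ms=20, buffer_ms=60):
    """packets: list of (seq, arrival_ms); the first entry anchors the timeline.

    Returns (played, discarded) lists of sequence numbers.
    """
    base_seq, base_arrival = packets[0]
    played, discarded = [], []
    for seq, arrival in packets:
        # Fixed deadline: first arrival + buffer size + position in the stream
        deadline = base_arrival + buffer_ms + (seq - base_seq) * frame_ms
        if arrival <= deadline:
            played.append(seq)
        else:
            discarded.append(seq)  # arrived after its playout time
    return played, discarded
```

With these values, a packet with sequence number 2 that arrives 130 ms after the first packet misses its 100 ms deadline and is discarded, even though it did reach the receiver.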
A dynamic jitter buffer is a great improvement over the static mode we just analyzed. A dynamic jitter buffer uses statistics of received packets (rate and interval) to predict how long it should wait for packets before considering them lost. It can adapt quickly to changing conditions, improving audio quality by keeping both packet loss and added latency as low as conditions allow.
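One possible way to adapt the buffer size is to track a smoothed jitter estimate and wait a multiple of it. The sketch below is an assumption, not the algorithm of any particular product: the class name, the 1/16 gain, the multiplier of 4, and the 20 to 500 ms limits are all illustrative values.

```python
class AdaptiveDelay:
    """Suggest a playout delay that follows the observed jitter."""

    def __init__(self, min_ms=20.0, max_ms=500.0, gain=1 / 16, multiplier=4.0):
        self.min_ms = min_ms
        self.max_ms = max_ms
        self.gain = gain
        self.multiplier = multiplier
        self.jitter = 0.0          # smoothed variation of the transit time
        self.prev_transit = None

    def observe(self, send_ms, recv_ms):
        """Record one packet and return the new suggested delay in ms."""
        transit = recv_ms - send_ms
        if self.prev_transit is not None:
            change = abs(transit - self.prev_transit)
            self.jitter += (change - self.jitter) * self.gain
        self.prev_transit = transit
        return self.target_ms()

    def target_ms(self):
        # Wait a multiple of the jitter, clamped to the configured limits
        return max(self.min_ms, min(self.max_ms, self.multiplier * self.jitter))
```

On a stable network the suggested delay stays at the minimum; when transit times start to vary it grows, and it shrinks again once the network settles.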
RTCP with Feedback
In order to further improve the quality of communications, new codecs are constantly being developed.
For example, video codecs (such as H.264, VP8, and VP9) and audio codecs (such as Opus) can dynamically increase or decrease the bitrate they use by adjusting the frame rate, resolution, or audio quality of the streams. This behavior is needed to allow users to communicate when network conditions are variable and not optimal.
To determine the stream properties, it is necessary for the endpoint to have regular, up-to-date information about packet loss and jitter from the other endpoint in the communication.
The IETF published RFC 4585, “Extended RTP Profile for Real-time Transport Control Protocol (RTCP)-Based Feedback (RTP/AVPF)”, to provide specifications for this RTCP mode with improved and more timely feedback.
Streams supporting this feature are marked with the RTP/AVPF profile (RTP/SAVPF for the encrypted version) in the SDP media description.
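As an illustration, the profile appears as the third field of each SDP media line (m=). The sketch below checks which media lines advertise a feedback profile; the function name, the example SDP fragment, its ports, and its payload types are made up for this example.

```python
EXAMPLE_SDP = """\
v=0
m=audio 49170 RTP/AVP 0
m=video 51372 RTP/SAVPF 96
"""


def feedback_profiles(sdp):
    """Return (media, profile, uses_feedback) for every m= line in an SDP body."""
    found = []
    for line in sdp.splitlines():
        line = line.strip()
        if not line.startswith("m="):
            continue
        # m=<media> <port> <proto> <fmt> ...
        fields = line[2:].split()
        if len(fields) < 3:
            continue
        media, proto = fields[0], fields[2]
        found.append((media, proto, proto.upper().endswith("AVPF")))
    return found
```

For the fragment above, the audio stream uses the plain RTP/AVP profile while the video stream advertises RTP/SAVPF, so only the video stream supports the extended feedback.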
Once such feedback is put in place, the endpoint can make real-time decisions such as resending video frames that were lost during transport or, if the number of lost frames is too high, sending a full I-frame (an I-frame is a complete picture, whereas a P-frame carries only updates relative to previous video frames). When packet loss on the receiving side is high, the codec can decide to reduce the bitrate.
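The decisions described above can be sketched as two small functions. All names and thresholds here (5 lost frames, 2% and 10% loss, the size of the bitrate steps) are hypothetical values picked for illustration; real senders tune them for each codec and network.

```python
def choose_repair(lost_frames, keyframe_threshold=5):
    """Decide how the sender reacts to the loss reported by the receiver."""
    if lost_frames == 0:
        return "none"
    if lost_frames < keyframe_threshold:
        return "retransmit"   # resend only the frames that were lost
    return "keyframe"         # too much was lost: send a full I-frame


def next_bitrate(current_bps, loss_fraction, high_loss=0.10, low_loss=0.02):
    """Lower the bitrate under heavy loss, raise it slowly when loss is low."""
    if loss_fraction > high_loss:
        return current_bps * (1.0 - 0.5 * loss_fraction)
    if loss_fraction < low_loss:
        return current_bps * 1.05
    return current_bps
```

For example, with 20% reported loss a 1 Mbit/s stream would be reduced to about 900 kbit/s under these assumed values.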
When using audio, retransmitting lost frames is not useful, so the codec will usually just decrease / increase the bitrate and quality depending on network conditions.
Symmetric RTP / RTCP
In simple point-to-point RTP communication, it is possible that only one of the endpoints can receive packets. For example, consider a communication between a server and a client (a hardware VoIP phone or a software SIP phone) located behind a router or firewall implementing NAT / PAT. In this scenario, the phone will be able to reach the server, but the server cannot use the address and RTP ports received in the SDP message from the phone to reply, since they are private and not reachable from outside the NAT.
The easiest way to handle such a scenario is to implement Symmetric RTP. The server in this scenario (Callee IP B in the image below) will reply to the source address and port of the incoming RTP stream.
Courtesy of www.ietf.org
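The reply-to-source behavior described above can be sketched with plain UDP sockets. This is a loopback demonstration only: the function names and payloads are hypothetical, and no real RTP packets are built or parsed.

```python
import socket


def serve_once(sock, reply=b"rtp-from-server"):
    """Wait for one datagram and answer to the address it actually came from."""
    data, source = sock.recvfrom(2048)  # source = (ip, port) seen on the wire
    sock.sendto(reply, source)          # the address advertised in SDP is ignored
    return data, source


def demo():
    """Run the exchange on the loopback interface and return the phone's view."""
    server = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    phone = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        server.bind(("127.0.0.1", 0))   # let the OS pick a free port
        server.settimeout(2)
        phone.settimeout(2)
        phone.sendto(b"rtp-from-phone", server.getsockname())
        data, source = serve_once(server)
        reply, _ = phone.recvfrom(2048)
        return data, source, reply, phone.getsockname()
    finally:
        server.close()
        phone.close()
```

Because the server answers to the source address of the incoming packet, the reply travels back through the mapping that the NAT created when the phone sent its first packet.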
As we can see, SDP itself is not able to actually transfer media, but together with SIP it can be used to create media sessions.
Media is exchanged by means of RTP (Real-time Transport Protocol) packets and, in turn, the goal of RTCP is to provide information about the quality of service of the ongoing communication.