Which Microsoft codec should I be using?
The Skype for Business client leverages three codecs:
SILK – Used in peer-to-peer media transmission
RTA (Real Time Audio) – Used with legacy clients for peer-to-peer calls, and by Skype clients when making calls to the PSTN
G.711 – Used by the Skype (Lync 2013) client when making calls to the PSTN
Session Description Protocol
Before we get into the codecs that are used, let’s take a look at the Session Description Protocol (SDP) as seen from a snooper trace in figure 1 below. The SDP capture shows which codecs the client has available for the communication. It lists every codec the client is capable of using and the order in which they will be leveraged.
Note: The term codec is a combination of the words “coder” and “decoder”. A codec converts an analog voice signal into a digital version of that signal and back again.
Figure 1: Session Description Protocol
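To make the ordering idea concrete, here is a minimal Python sketch that reads the `m=audio` line and the `a=rtpmap` attributes of an SDP body and lists the offered codecs in the order of the media line. The sample SDP is illustrative and not copied from a real trace: the port and the SILK payload numbers are assumptions, while 0 (PCMU) and 8 (PCMA) are the standard static payload types. `offered_codecs` is a hypothetical helper name.

```python
from typing import Dict, List, Tuple

# Illustrative SDP fragment (assumed values, not taken from a real trace).
SAMPLE_SDP = """\
m=audio 50000 RTP/AVP 104 114 0 8 103 115
a=rtpmap:104 SILK/16000
a=rtpmap:114 x-msrta/16000
a=rtpmap:0 PCMU/8000
a=rtpmap:8 PCMA/8000
a=rtpmap:103 SILK/8000
a=rtpmap:115 x-msrta/8000
"""


def offered_codecs(sdp: str) -> List[Tuple[int, str, int]]:
    """Return (payload type, codec name, clock rate) in m=audio line order."""
    order: List[int] = []
    rtpmap: Dict[int, Tuple[str, int]] = {}
    for raw in sdp.splitlines():
        line = raw.strip()
        if line.startswith("m=audio"):
            # Layout: m=audio <port> <transport> <payload types...>
            order = [int(pt) for pt in line.split()[3:]]
        elif line.startswith("a=rtpmap:"):
            pt, encoding = line[len("a=rtpmap:"):].split(" ", 1)
            name, rate = encoding.split("/")[:2]
            rtpmap[int(pt)] = (name, int(rate))
    # Keep only payload types that have a matching rtpmap entry.
    return [(pt, rtpmap[pt][0], rtpmap[pt][1]) for pt in order if pt in rtpmap]


for pt, name, rate in offered_codecs(SAMPLE_SDP):
    print(pt, name, rate)
```

The first entry printed is the codec the client prefers most. In a real snooper trace you would paste the SDP body of the INVITE in place of `SAMPLE_SDP`.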
SILK
One of the first things that comes to mind if you have not heard of this codec before is, “When did we start talking about clothing material?” Yes, the two are spelled the same, but they mean different things. Introduced in the November 2015 Cumulative Update for Lync 2013, SILK is the codec Microsoft S4BS leverages for peer-to-peer (wideband) conversations. The plan is to have SILK eventually replace the older Microsoft audio codecs used in the Lync 2013 and S4BS platforms.
Below in figure 2 is an SDP capture of an egress Skype client call out to the PSTN. The point of the picture is not to show the actual call but rather how to identify the SILK codec in the log trace.
Figure 2: SILK Codec seen in a snooper trace
The SILK codec, which will eventually replace the Real Time Audio (RTA) codec, comes in two distinct variants: wideband and narrowband.
a=fmtp:103 useinbandfec=1; usedtx=0
a=fmtp:104 useinbandfec=1; usedtx=0
Both the narrowband and wideband variants can be used for Lync 2013 and Skype audio calling, depending on the type of call and whether it is ingress or egress. SILK supports in-band Forward Error Correction (FEC), denoted by the ‘useinbandfec=1’ parameter. Whenever a call is made peer-to-peer, Skype will try to leverage the wideband codec; when the call is placed to the PSTN, it will try to leverage the narrowband codec.
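If you want to check the FEC setting programmatically rather than by eye, the `a=fmtp` lines shown above can be split into a payload type and a set of key/value parameters. This is a minimal sketch; `parse_fmtp` is a hypothetical helper name, and it assumes the simple `key=value; key=value` layout seen in these two lines.

```python
from typing import Dict, Tuple


def parse_fmtp(line: str) -> Tuple[int, Dict[str, str]]:
    """Split 'a=fmtp:<payload type> key=value; key=value' into its parts."""
    prefix = "a=fmtp:"
    if not line.startswith(prefix):
        raise ValueError("not an a=fmtp line: %r" % line)
    payload, _, params = line[len(prefix):].partition(" ")
    pairs: Dict[str, str] = {}
    for item in params.split(";"):
        key, sep, value = item.strip().partition("=")
        if sep:  # skip empty or malformed entries
            pairs[key] = value
    return int(payload), pairs


payload_type, params = parse_fmtp("a=fmtp:104 useinbandfec=1; usedtx=0")
fec_enabled = params.get("useinbandfec") == "1"
print(payload_type, fec_enabled)
```

Here `fec_enabled` is true whenever the line carries `useinbandfec=1`, which is the parameter the trace uses to signal in-band FEC.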
Real Time Audio (RTA)
Microsoft created its own proprietary audio codec back in the days of Office Communications Server (OCS) 2007, and it too comes in both narrowband (8 kHz) and wideband (16 kHz). From OCS 2007 onward, RTA was the primary codec for peer-to-peer calls until the November 2015 Cumulative Update for Lync 2013, when SILK became the default codec for peer-to-peer conversations with the Skype client. RTA is now the fallback for peer-to-peer calls and for calls to the PSTN from Lync and Skype clients. As with SILK, peer-to-peer RTA calls leverage the wideband codec, and calls that go to the PSTN use the narrowband codec.
Narrowband a=rtpmap:115 x-msrta/8000
Wideband a=rtpmap:114 x-msrta/16000
Figure 3: RTA Codec seen in a snooper trace
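The narrowband and wideband variants differ only in the clock rate at the end of the `a=rtpmap` line, so a trace can be classified mechanically. The sketch below assumes the two rates quoted above (8000 Hz and 16000 Hz); `classify_rtpmap` is a hypothetical helper name.

```python
from typing import Tuple


def classify_rtpmap(line: str) -> Tuple[int, str, str]:
    """Return (payload type, codec name, band) for an 'a=rtpmap' line."""
    prefix = "a=rtpmap:"
    if not line.startswith(prefix):
        raise ValueError("not an a=rtpmap line: %r" % line)
    payload, encoding = line[len(prefix):].split(" ", 1)
    name, rate = encoding.split("/")[:2]
    # 8000 Hz is narrowband and 16000 Hz is wideband; anything else is left unclassified.
    band = {8000: "narrowband", 16000: "wideband"}.get(int(rate), "unknown")
    return int(payload), name, band


print(classify_rtpmap("a=rtpmap:115 x-msrta/8000"))
print(classify_rtpmap("a=rtpmap:114 x-msrta/16000"))
```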
G.711
The industry standard G.711 audio codec is used throughout the PSTN world. When a phone call crosses the PSTN, G.711 is typically what carries the media. Whenever we make calls to the PSTN world from our Lync \ Skype client, this is where G.711 comes in. Now you may ask, what happened to RTA and SILK? Those codecs do not carry over to the PSTN world, and this is where the Mediation Server comes into the picture: it is responsible for transcoding the media from RTA or SILK to G.711 when we make an outbound call to the PSTN.
Figure 6: G.711 Codec seen in a snooper trace
These two codecs represent different variants of G.711: PCMU is G.711 µ-Law, used primarily in North America and Japan, and PCMA is G.711 A-Law, used throughout most of the rest of the world. When the Lync \ Skype client decides to send calls to the Mediation Server in G.711 format, the Mediation Server does not have to do any media transcoding because the media is already in the format that the intended party accepts. When media is sent to the Mediation Server from the Lync \ Skype client in RTA or SILK format, media transcoding takes place, which costs additional CPU cycles on the Mediation Server for that particular call.
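The transcoding rule described above boils down to a single check, sketched here in Python. The function name and the codec strings are illustrative assumptions (they mirror the names seen in `a=rtpmap` lines); this is not how the Mediation Server is actually implemented.

```python
# G.711 appears in SDP as PCMU (µ-Law) or PCMA (A-Law).
G711_NAMES = {"PCMU", "PCMA"}


def needs_transcoding(client_codec: str) -> bool:
    """True if the Mediation Server must transcode this codec to G.711 for the PSTN."""
    return client_codec.upper() not in G711_NAMES


for codec in ("PCMU", "PCMA", "SILK", "x-msrta"):
    print(codec, needs_transcoding(codec))
```

Only the last two codecs in the loop cost the Mediation Server extra CPU cycles, since they have to be decoded and re-encoded as G.711.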
Codec Inner workings with the Skype client
We have established that RTA is Microsoft’s proprietary audio codec and comes in both wideband (16000 Hz) and narrowband (8000 Hz) variants. If there is plenty of bandwidth available, the Skype for Business 2015 (Lync 2013) client will typically send G.711 directly to the Mediation Server so that the server does not have to perform any audio transcoding.
In the event that available network bandwidth is limited, instead of sending G.711 directly to a Mediation Server for outbound media sessions destined for the PSTN, the Lync client can utilize RTA. Although this provides better quality at a lower bit rate over a poor network, it requires the Mediation Server to decode the media session and re-encode it into G.711 for the PSTN side. In scenarios with plenty of local bandwidth, the Lync client will typically send G.711 to the Mediation Server (freeing the Mediation Server from transcoding).
These codecs can be used in numerous calling scenarios but are most commonly seen in calls with PSTN callers. The most common scenario is a Skype audio call to the PSTN where there is plenty of available bandwidth on the network between the client and the Mediation Server. The goal here is for the client to simply encode the audio in G.711 so that the Mediation Server is not taxed with any transcoding; it simply sends the media on to its next hop. If local bandwidth is limited and the Skype client is aware of this, it may instead opt to encode the audio in Real Time Audio (RTA) so that the transmission over the network is more efficient. The Mediation Server then needs to decode the RTA session and re-encode it into G.711 for delivery to the PSTN.
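As a rough illustration of the decision described above, the sketch below picks a codec for the client-to-Mediation-Server leg from an estimate of available bandwidth. The threshold parameter and the function name are hypothetical: the real client's selection logic is internal and is not documented here, so treat this only as a way to reason about the trade-off.

```python
def choose_pstn_codec(available_kbps: float, g711_threshold_kbps: float) -> str:
    """Pick a codec for the client-to-Mediation-Server leg of a PSTN call.

    g711_threshold_kbps is a hypothetical cut-off, not a value the client exposes.
    """
    if available_kbps >= g711_threshold_kbps:
        # Plenty of bandwidth: send G.711 so the Mediation Server does not transcode.
        return "G.711"
    # Constrained bandwidth: send RTA and let the Mediation Server transcode to G.711.
    return "RTA narrowband"


print(choose_pstn_codec(available_kbps=500, g711_threshold_kbps=100))
print(choose_pstn_codec(available_kbps=50, g711_threshold_kbps=100))
```

The first call returns G.711 and the second returns RTA narrowband, mirroring the two scenarios in the text.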
The chart below shows the bandwidth (BW) needed, in Kbps, for RTA and G.711.
|Audio codec||Scenarios||Maximum bandwidth (Kbps) w/o FEC||Maximum bandwidth (Kbps) with FEC||Typical bandwidth (Kbps)|
|RTAudio Wideband||Peer-to-peer, default codec||62||91||39.8|
|RTAudio Narrowband||Peer-to-peer, PSTN||44.8||56.6||30.9|
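The figures in the chart lend themselves to some quick capacity arithmetic. The sketch below copies the RTA values from the chart and assumes the second bandwidth column is the with-FEC figure (its values are the higher ones). The 1000 Kbps budget and the helper names are hypothetical, chosen only for illustration.

```python
import math

# Values copied from the chart above (Kbps).
RTA_BANDWIDTH = {
    "RTAudio Wideband": {"max_no_fec": 62.0, "max_with_fec": 91.0, "typical": 39.8},
    "RTAudio Narrowband": {"max_no_fec": 44.8, "max_with_fec": 56.6, "typical": 30.9},
}


def max_concurrent_streams(budget_kbps: float, codec: str, column: str = "max_with_fec") -> int:
    """How many audio streams of this codec fit into the given bandwidth budget."""
    return math.floor(budget_kbps / RTA_BANDWIDTH[codec][column])


def fec_overhead_kbps(codec: str) -> float:
    """Extra bandwidth that FEC adds on top of the maximum without FEC."""
    row = RTA_BANDWIDTH[codec]
    return round(row["max_with_fec"] - row["max_no_fec"], 1)


budget = 1000  # hypothetical budget in Kbps
for codec in RTA_BANDWIDTH:
    print(codec, max_concurrent_streams(budget, codec), fec_overhead_kbps(codec))
```

Sizing against the with-FEC maximum is the conservative choice; passing `column="typical"` shows how many streams fit under average conditions instead.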
When the Skype client selects RTA, it is only trying to give the best media experience for the call. RTA is not a bad thing; under the current conditions of the call it is simply the more optimal choice for the end user. Leveraging RTA instead of G.711 for a call from the client to the Mediation Server saves BW on the transmission, and using less BW helps deliver the best call experience for the user.
The Codec knows what the Codec wants
Some things are not adjustable, and the Skype client's codec choice is one of them. The goal of the Microsoft codec is to be adaptable to our environments and to vary its use based on the network conditions at that time. Certainly we can “right size” our environment, but at the end of the day the codec is dynamic in nature: it is going to pick what it should use at that exact moment based on the conditions it sees around it. So the codec knows what the codec wants.