Live Smooth Streaming – Design Thoughts

Last week we released the first Beta of IIS Live Smooth Streaming. If you had a chance to watch Scott Guthrie’s demo in the MIX keynote, or our program manager John Bocharov’s MIX session, you have probably seen the slick player UI and gotten a feel for how the technology works. I hope you like the experience and the new functionality. Today I’d like to dig a little deeper and talk about some of the interesting design thinking that went into this product.

Before we get started, I’d like to call out the crucial distinction between “traditional streaming”, which uses stateful protocols such as RTSP, MMS, or Adobe’s RTMP, and “IIS Live Smooth Streaming”, which uses HTTP, a stateless protocol. Let’s take a look at the differences between these approaches in detail.

1. Push vs. Pull

First let’s take a look at how traditional live streaming works. A client first establishes a control channel to the server to do some initial setup. Then the server uses the data channel (which could be the same channel as the control channel for protocols such as HTTP) to deliver a long-running live stream to the client. While streaming, the client can use the control channel to send control messages to the server to re-set up streams, pause, seek (if supported), stop, or shut down. Although the initial request and the subsequent commands always come from the client, if we focus just on the data delivery part, it is actually a push model. The server keeps pushing the latest data packets to the client and the client just passively receives everything.

[Diagram: traditional live streaming (push model)]

On the other hand, IIS Live Smooth Streaming is actually a pull model. The client initiates a manifest request to set up the streaming. Then the client issues fragment requests, one for each fragment, to build up its sample buffer for streaming. In fact, all these requests look just like normal Web requests. The server needs to either reply with data immediately or fail the request. The fragment responses can be cached by any web cache/proxy and later be returned directly from the cache server to other clients. This pull model is how standard Web requests work, and that is why IIS Live Smooth Streaming is able to take advantage of the massive existing Web infrastructure to scale it out for you.
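To make the pull model concrete, here is a minimal sketch in Python of what the request pattern looks like from the client’s side: one manifest request, followed by ordinary HTTP GETs for individual fragments that any web cache along the way can serve. The host name, publishing point, and exact URL template below are illustrative assumptions, not the definitive wire format.

```python
# Sketch of the pull model: plain, cacheable HTTP GETs for a manifest and
# for individual fragments. The server URL and the parameter values are
# hypothetical; a real client derives them from the parsed manifest.
import urllib.request

SERVER = "http://example.com/live/event.isml"

def get(url):
    """Issue an ordinary, cacheable HTTP GET and return the response body."""
    with urllib.request.urlopen(url) as resp:
        return resp.read()

# 1. One manifest request sets up the streaming session on the client side.
manifest = get(SERVER + "/Manifest")

# 2. Each fragment is its own stateless web request; the bit rate and the
#    fragment start time are encoded in the URL itself, so any server or
#    cache that knows about the live event can answer it.
bitrate = 300000     # bits per second (example value)
start_time = 0       # fragment start time on the stream timeline (example)
fragment = get(f"{SERVER}/QualityLevels({bitrate})/Fragments(video={start_time})")
```

Because each of these requests is self-describing, the responses can be cached and reused for other viewers, which is exactly what lets the existing web infrastructure do the heavy lifting.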

[Diagram: IIS Live Smooth Streaming (pull model)]

So other than being more web-friendly, is there anything else that has changed as a result of converting from push to pull? Read on if you’re interested.

2. State Control

In traditional live streaming, the client state is managed by both the client and the server. The server keeps a record of each client for things such as playback state, streaming position, selected bit rate (if multiple bit rates are supported), and so on. While this gives the streaming server more control, it also adds overhead to the server. More importantly, each client has to maintain server affinity throughout the streaming session, limiting scalability and creating a single point of failure. If a client request is rerouted by a load balancer to another server in the middle of a streaming session, there is a good chance that the request will fail. This limitation creates big challenges in server scalability and management for CDNs and server farms.

With the pull model in Live Smooth Streaming, the client is solely responsible for maintaining its own state. In turn, the server is stateless. Any client request (fragment or manifest) can be satisfied by any server that is configured for the same live event, so the network topology can freely reroute client requests to whichever server is best for the client. From the server’s perspective, all client requests are equal. It doesn’t matter whether they come from the same client or from multiple clients, whether the clients are in live mode or DVR mode, which bit rate they’re trying to play, or whether they’re in the middle of a bit rate switch. They’re all just fragment requests to the server, and the server’s job is to manage and deliver the fragments in the most efficient way. Unlike some other implementations, the Live Smooth Streaming server’s job is simply to keep all the content readily available to support the client’s decisions, and to make sure it presents the client with a semantically consistent picture. This has two benefits: (1) the feedback loop is much smaller because the client makes all the decisions, resulting in much faster responses (e.g. bit rate switching), and (2) it makes the server very lean and fast. The IIS Live Smooth Streaming Beta server module is a mere 230 KB DLL (32-bit version)!
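One way to picture what “stateless” means on the server side is the sketch below, which assumes a simplified in-memory fragment store keyed by bit rate and start time (the real server module obviously works on archived MP4 fragments, not a Python dictionary). The point is that every request carries everything the server needs, so there is no per-client session object anywhere.

```python
# Illustrative only: a stateless fragment lookup. There is no session
# table and no notion of "which client is this"; live, DVR, and
# bit-rate-switching requests all reduce to the same (bitrate, time) key.
from typing import Optional

# Hypothetical in-memory store: {(bitrate, start_time): fragment_bytes}
fragment_store: dict[tuple[int, int], bytes] = {}

def handle_fragment_request(bitrate: int, start_time: int) -> Optional[bytes]:
    """Return the requested fragment, or None if it does not exist.

    The result depends only on the arguments and the shared fragment
    store, which is what lets any server (or cache) configured for the
    same live event satisfy any client's request.
    """
    return fragment_store.get((bitrate, start_time))
```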

You can see that the division of responsibilities between the server and the client has changed in the pull model. The server focuses on delivering and managing fragments with the best possible performance and scalability, while the client is all about ensuring a smooth streaming and playback experience, which, in my view, is a much better solution for large-scale online video.

3. Latencies

We usually talk about two kinds of latencies in live streaming. One is startup latency, which is how fast a client can start playback after an initial open or seek. The other is end-to-end latency, which is the time delay between a real world live event and its appearance in video playback on the client.

For startup latency, the basic approach is still the same, in that the client needs to fill up its buffer as fast as possible. What’s different with IIS Live Smooth Streaming is that the client can issue multiple requests simultaneously for the initial data chunks. Those requests could be served by nearby HTTP cache nodes instead of by streaming servers that could potentially be much farther away. Similarly, in a cache-miss scenario, multiple streaming servers could be used to provide more throughput than a single server can. The IIS Live Smooth Streaming client also tries to start with fragments from a lower bit rate to further expedite the startup process.
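As a sketch of that startup path, and reusing the hypothetical fragment URL template from the earlier example, the client can fetch its first few fragments concurrently, starting at a lower bit rate so the buffer fills quickly. The bit rate, fragment duration, and count below are example values only.

```python
# Sketch of parallel start-up: request the first few fragments at once
# (possibly served by nearby caches) instead of waiting for them one by
# one. URL template, bit rate, and fragment duration are example values.
from concurrent.futures import ThreadPoolExecutor
import urllib.request

SERVER = "http://example.com/live/event.isml"
START_BITRATE = 300000          # begin at a lower bit rate to start faster
FRAGMENT_DURATION = 20_000_000  # 2 seconds expressed in 100-ns ticks (example)

def fetch_fragment(start_time):
    url = f"{SERVER}/QualityLevels({START_BITRATE})/Fragments(video={start_time})"
    with urllib.request.urlopen(url) as resp:
        return resp.read()

# Issue the first three fragment requests simultaneously to fill the
# client buffer as fast as the network (or a nearby cache) will allow.
start_times = [0, FRAGMENT_DURATION, 2 * FRAGMENT_DURATION]
with ThreadPoolExecutor(max_workers=3) as pool:
    initial_fragments = list(pool.map(fetch_fragment, start_times))
```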

End-to-end latency is essentially the sum of all the delays that media samples encounter as they travel through the entire media processing and delivery pipeline. Other than encoder delay and network delay, the biggest factors affecting end-to-end latency are the local buffers on both the server and the client. Media data has to go through these FIFO buffers before it can be decoded and rendered. Lower end-to-end latency is always good, especially for time-critical events, but it also comes at a cost. In order to lower end-to-end latency, the server and client buffers need to be tuned down to smaller sizes, which can result in longer startup latency (there is less data in the server FIFO buffer to blast down to fill the client buffer) and more vulnerability to network jitter.
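As a back-of-the-envelope illustration (all of the numbers below are made up), end-to-end latency is just the sum of the per-stage delays, and the two buffer sizes are the knobs you trade against startup time and jitter resilience:

```python
# Hypothetical per-stage delays, in seconds.
delays = {
    "encoder":       2.0,   # encoding / packaging delay
    "network":       0.5,   # ingest and delivery network delay
    "server_buffer": 4.0,   # server-side FIFO (tunable)
    "client_buffer": 5.0,   # client-side FIFO (tunable)
}

end_to_end_latency = sum(delays.values())   # 11.5 s with these example numbers

# Shrinking the two buffers lowers end-to-end latency, but leaves less
# data to blast down at startup and less headroom against network jitter.
```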

With IIS Live Smooth Streaming, the server automatically archives all the fragments that it has received since the beginning of the event. When the client first connects, the server publishes all the information about the archived fragments in the stream manifest. Based on the stream manifest, the client logic can determine where to start the playback, striking the best balance between end-to-end latency and other considerations. The startup position is no longer mandated by the server. A position closer to the “live edge” means smaller end-to-end delay with the corresponding tradeoffs, and vice versa.
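A minimal sketch of that client-side decision is below, assuming the manifest has already been parsed into a list of fragment start times (the function and variable names are invented for illustration). The client picks how far behind the live edge to start, trading end-to-end latency against robustness.

```python
# Illustrative only: pick a startup position from the fragment list that
# the server published in the stream manifest.
def choose_start_position(fragment_start_times, fragments_behind_live=3):
    """Return the start time of the fragment to begin playback from.

    fragments_behind_live = 0 means "play right at the live edge"
    (lowest end-to-end delay, least protection against jitter); a larger
    value starts further back, behaving more like a DVR client.
    """
    index = max(0, len(fragment_start_times) - 1 - fragments_behind_live)
    return fragment_start_times[index]

# Example: the manifest advertised fragments starting every 2 seconds
# (times in 100-ns ticks).
start_times = [0, 20_000_000, 40_000_000, 60_000_000, 80_000_000]
playback_start = choose_start_position(start_times)   # -> 20_000_000
```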

4. Timing Control

Timing control in traditional live streaming is a given. The client passively accepts what the server pushes out, so it always knows how the live stream is progressing. With the pull model, things become more interesting. The client is the one initiating all the requests, and it needs the right timing information in order to do the right scheduling. Given that the server is stateless in this pull model and the client could talk to any server for the same streaming session, this becomes more challenging. The solution is to always rely on the encoder’s clock for computing timing information, and to design a timing protocol that is stateless and cacheable.
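One way to picture what “stateless and cacheable” timing means is the sketch below; it is an illustration of the idea, not the actual wire format. Every request time is computed on the encoder’s timeline, here simply by adding the current fragment’s duration, so the resulting URL is identical no matter which client asks or which server answers, and therefore cache-friendly.

```python
# Sketch: the next request time is derived purely from encoder-timeline
# values carried with the media, never from any per-client server state.
def next_request_time(current_start, current_duration):
    """Start time of the next fragment on the encoder's timeline.

    Because the value lives on the encoder's timeline, every client
    computes the same number and every server interprets it the same
    way, which keeps the fragment URLs cacheable.
    """
    return current_start + current_duration

# Example: 2-second fragments expressed in 100-ns ticks.
t0 = 0
t1 = next_request_time(t0, 20_000_000)   # -> 20_000_000
t2 = next_request_time(t1, 20_000_000)   # -> 40_000_000
```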

5. Clock Drift

One common issue with live streaming, especially in 24x7 scenarios, is clock drift. The encoder, server, and client are all running their own clocks for scheduling, and in most cases these clocks are not synchronized. Even slight discrepancies between them can add up to non-trivial drift over time, causing buffer overflow or underflow. Let’s take a look at both cases in more detail:

5.1 Buffer Overflow

This can happen if the client’s clock is running slower than the encoder’s clock, meaning that the client is not consuming samples as fast as the encoder is producing them. As samples get pushed to the client, more and more of them get buffered, and the buffer keeps growing. Over time this can cause the client machine to slow down and eventually run out of memory. However, with the pull model we have for IIS Live Smooth Streaming, this problem is solved automatically. The client drives all the requests and will only request the chunks that it needs and can handle. In other words, the client’s buffer is always synchronized to the client’s clock and never gets out of control. The only side effect of this type of clock drift is that the client could slowly fall behind, transitioning from a “live” client to a DVR client (playing something in the past).
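The sketch below shows why the pull model bounds the buffer by construction (the buffer, fragment source, and durations are simplified stand-ins): the client only asks for another fragment when its own buffer has room, so a slow client simply drifts behind the live edge instead of accumulating data.

```python
# Illustrative pull loop: the client requests a fragment only when its
# buffer (measured in seconds of media) is below target. Numbers are
# examples, not the player's real configuration.
import collections

TARGET_BUFFER_SECONDS = 10.0
FRAGMENT_SECONDS = 2.0

buffer = collections.deque()          # fragments waiting to be decoded

def buffered_seconds():
    return len(buffer) * FRAGMENT_SECONDS

def pull_if_needed(fetch_next_fragment):
    """Request the next fragment only if the buffer has room.

    A client whose clock runs slow simply calls this less often per unit
    of encoder time; the buffer never grows past the target, and the
    client just falls behind the live edge (a DVR-style viewer).
    """
    if buffered_seconds() < TARGET_BUFFER_SECONDS:
        buffer.append(fetch_next_fragment())
```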

5.2 Buffer Underflow

Buffer underflow can be the result of the client’s clock running faster than the encoder’s clock; the client is simply consuming samples a little too fast. In this case, the client has to either keep re-buffering or tune down its renderer clock. To detect this, the client needs to distinguish clock drift from other conditions that can also cause buffer underflow, such as network congestion, and making that determination reliably is difficult. The client would need to gather statistics over an extended period of time to detect a pattern that is most likely caused by clock drift rather than something else, and even then, false positives can still happen.

In IIS Live Smooth Streaming’s pull model, the server has more opportunities to communicate the timing status to the client. If the client has a faster clock, it might end up requesting a chunk that is not available on the server yet. When the server sees this request, it knows that it is for a future chunk that will be available shortly, and can therefore return a special error code or status to indicate a “temporarily” not-found condition. This is different from requesting a chunk that is missing on the server, which is a permanent error. When the client gets the “temporary error” response, it knows immediately that it is running too far ahead. The server can also tell the client how far ahead it is, to further help the client tune its clock. This seems to be an easier and more reliable way to detect and correct clock drift.
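Here is a sketch of how a client might react to the two kinds of “not found”. The 412 status code and the “Ahead-By” header are made-up stand-ins for whatever the server actually returns; the point is only the split between a temporary condition (back off, the fragment will exist soon) and a permanent one (the fragment is really missing).

```python
# Illustrative client-side handling of "future fragment" vs. "missing
# fragment". Status code 412 and the "Ahead-By" header are invented
# placeholders, not the Beta's real responses.
import time
import urllib.error
import urllib.request

def fetch_fragment(url):
    try:
        with urllib.request.urlopen(url) as resp:
            return resp.read()
    except urllib.error.HTTPError as err:
        if err.code == 412:
            # Temporary: the fragment will exist shortly. The client now
            # knows its clock is running ahead of the encoder's and can
            # back off; a header could even say by how much.
            ahead_by = err.headers.get("Ahead-By", "0")
            time.sleep(float(ahead_by) or 1.0)
            return fetch_fragment(url)
        if err.code == 404:
            # Permanent: this fragment is simply not on the server.
            raise
        raise
```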

To make the story complete, I need to point out that there is a more advanced mechanism in traditional streaming to help solve the clock drift problem. That is done by having the client constantly report back its buffer status via a separate feedback control channel using protocols like RTCP. Even with that, the feedback loop is quite lengthy, and the client is still at the mercy of the server to throttle the send rate up or down accordingly. Also, it usually involves UDP communication rather than TCP, which could be an issue in some server and client environments.

6. Summary

In summary, the pull model we designed for IIS Live Smooth Streaming is very different from how the traditional live streaming push model works. With this model, the server becomes a very lean, stateless server with better performance, scalability, and manageability. The client has more control over the different aspects of streaming and playback and is better equipped to handle all the challenges of offering a great user experience. We’d like to hear your feedback on what you like and what you think we could do better.

Thank you.

2 Comments

  • Windows Media Services HTTP live streaming is also a push model; most existing live streaming protocols push data from the server to the client. Just to be clear, the push/pull distinction here applies only to the media data transfer. It’s not about who initiates the first request; by that measure, any unicast protocol would be a pull model, since the client always sends the first request.
    The typical buffer size varies depending on the server type and configuration. In general, if you don’t care much about end-to-end latency, you can use a bigger server-side buffer for a fast start. A smaller server-side buffer means shorter end-to-end latency but longer startup latency. It’s a trade-off.

  • Good overview, thanks.
