Encoding Big Buck Bunny for SmoothHD.com

Monday, February 23, 2009

So far, Alex Zambelli has had most of the SmoothHD.com fun while I was off on some other projects, but this time he was off on some other projects and I got to do an encode all by myself.

I figured this was as good a time as any to go through a hands-on project for Smooth Streaming.

The Source

We’ve had a few years of good fun with the Blender Foundation’s Elephant’s Dream clip (as I used here, here, and here). They have a new Creative Commons licensed title up called Big Buck Bunny, which is a lot brighter, and with a lot of nice chewy grass for high frequency detail. Elephant’s Dream is a great test clip for handling gradients and dark detail (it’s all about differential quantization), but Big Buck Bunny is arguably a better test for general compression quality, and it has some nice sky gradients as well.

I downloaded the stereo audio as a FLAC file, and converted that to a 44.1 kHz 16-bit WAV file with Audacity. I then downloaded the source as a series of PNG files, and used After Effects to combine the PNG sequence and the WAV into a 1920x1080 Lagarith RGB AVI file. Since PNG and Lagarith are both lossless, and I didn’t do any scaling or color space correction, I didn’t do any preprocessing, just assembling the content into a file Expression Encoder 2 can consume. I’m a big fan of the quality of EEv2’s Super Sampling scaling mode, so I prefer to use that instead of other tools where I have less control over scaling quality.

Setting up the Encode

Since we had other encodes up at SmoothHD.com, I wanted to follow the same data rate/frame size/GOP length parameters we’d already used for the other encodes in the series. Specifically, the below, which give us seven bands from 2500 Kbps to 364 Kbps:

Tweaking

Smooth Streaming as a feature came in very hot for EEv2 SP1, and there wasn’t a lot of time to tune the presets.

There are a few safe tweaks I always recommend, as I’ve mentioned before:

Adaptive Dead Zone=Conservative
Overlap=On
Scene Change Detection=On
Search Range=Adaptive

Those help Smooth Streaming (and web-rate VC-1 in general) quality with minimal impact on encoding time.

For quality-over-speed projects, I also like to use

Complexity=4
Chroma Search=Full True Chroma
Match Method=Adaptive

All together, that’ll increase encode time about 2-3x for a nice but not overwhelming quality difference.

And because this is animation content, I’ll also use

B-Frame Number=2

In general, the less noise in the content and the closer the matches between frames, the more likely 2 B-Frames will be more efficient than 1. I’ve gone up to 4 B-frames for Camtasia screen recordings.

Charts!

One cool thing about Smooth Streaming encoding is that the independent chunks make analysis and graphing of the content a lot easier. We can analyze each chunk as an independent unit, and compare quality between bitrates and versions quite directly. Do note that there is some allowed variability in chunk size; we target for a 5 second buffer, so that any three chunks in a row should average around the same amount, but if a particular chunk is harder to encode than its neighbors, it can have a somewhat higher bitrate.
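As a rough sketch of that kind of check, assuming the 2-second chunks used for this encode, here is how per-chunk bitrate and a three-chunk rolling average could be computed. The function names and the chunk sizes are made up for illustration; they are not taken from the spreadsheet or from any Expression Encoder tool.

```python
CHUNK_SECONDS = 2  # chunk duration used for this encode


def chunk_kbps(size_bytes, seconds=CHUNK_SECONDS):
    """Average bitrate of a single chunk, in kilobits per second."""
    return size_bytes * 8 / 1000 / seconds


def rolling_kbps(sizes_bytes, window=3, seconds=CHUNK_SECONDS):
    """Average bitrate over each run of `window` consecutive chunks."""
    rates = []
    for i in range(len(sizes_bytes) - window + 1):
        total_bytes = sum(sizes_bytes[i:i + window])
        rates.append(total_bytes * 8 / 1000 / (window * seconds))
    return rates


# Hypothetical chunk sizes in bytes (illustrative only).
sizes = [250_000, 500_000, 375_000, 250_000]
per_chunk = [chunk_kbps(s) for s in sizes]  # swings from chunk to chunk
smoothed = rolling_kbps(sizes)              # much flatter over three chunks
```

With these made-up sizes the individual chunks range from 1000 to 2000 Kbps, while every three-chunk window averages 1500 Kbps, which is the sort of behavior the buffer model allows.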

Here’s the .XLSX if you want to check out the source data:

BBB_encodemodes.xlsx

I’ve encoded four versions of the content

Default: the stock EEv2 SP1 Adaptive Streaming settings
Fast: Adding the “safe tweaks” above
HQ: Adding the quality-over-speed settings
Insane: Taking HQ and adding “Insane” modes that add a lot more in encoding time than quality
- Single-thread encoding
- Complexity 5
- Hadamard Motion Match

Insane is there to show what happens when you burn as much CPU as possible.

To show the impact of these different encoding settings on the video, I used an in-house command-line tool to dump a log file of the size, type, etcetera of each frame, and then made the Excel spreadsheet linked above to see what interesting data I could find.

First off, let’s take a look at just a bunch of frames. Specifically, the frames of Chunk 214, corresponding to the scene where the flying squirrel is flying over the forest floor right before the spikes pop up.

I’ve used contextual shading here to show the quantization parameter (QP) for each frame. In VC-1, a QP of 1 is the least compressed and 31 is the most compressed. In general, QP is a decent proxy for the quality of a given frame, with values below 8 generally being pretty good and values above 8 getting increasingly blocky.

Since this is a 2-second chunk at 24 frames a second, we have exactly 48 frames. And since it’s a Closed GOP, it starts with an I-frame (keyframe).

The Insane, HQ, and Fast settings offer pretty similar results, due to using the 2 B-Frame pattern and Lookahead to detect scene and other changes. However, QP generally goes down as encode time goes up. The Default encode starts out with much higher QP since it had used more of its buffer in the previous scene, and so needed to catch up a bit. By the end of the GOP, its QPs are going down while the “2 B” QPs are going up for a similar reason.

That’s interesting in itself, but it’s definitely a trees-over-forest view.

For 95% of the 10+ minute clip, all the frames look fine, as is typical with a CBR encode. It’s in the hard parts where the differences show up.

For the below, I’m going to focus on chunks 209-221 (6:56 to 7:22), which includes the very challenging section where the squirrel is flying over all the spikes jumping up.
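The chunk numbers and timestamps line up if chunks are 2 seconds long and numbered starting from 1. That numbering is my assumption, but it matches the times quoted above. A small sketch with hypothetical helper names:

```python
CHUNK_SECONDS = 2  # chunk duration used for this encode
FPS = 24           # frame rate of the source


def chunk_span(chunk_number, seconds=CHUNK_SECONDS):
    """Start and end time in seconds, assuming chunks are numbered from 1."""
    start = (chunk_number - 1) * seconds
    return start, start + seconds


def mmss(total_seconds):
    """Format a whole number of seconds as m:ss."""
    minutes, secs = divmod(total_seconds, 60)
    return f"{minutes}:{secs:02d}"


frames_per_chunk = CHUNK_SECONDS * FPS
start_209, _ = chunk_span(209)
_, end_221 = chunk_span(221)
print(frames_per_chunk, mmss(start_209), mmss(end_221))  # 48 6:56 7:22
```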

We’re looking at a few metrics here:

Mean QP per chunk: The average QP of all frames in the chunk, as a proxy for overall quality.
Max QP per chunk: The highest QP in the chunk. Since it’s the ugliest parts of the video that stand out, that’s also an important flag for quality issues.
Chunk size: How many bytes in the chunk. Since Smooth Streaming is all about delivering video chunks, chunk size is the primary measure of data rate.
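For anyone who wants to compute the same three metrics from their own per-frame log, here is a minimal sketch. The in-house tool’s log format isn’t public, so the `(chunk_number, qp, size_bytes)` tuples, the `chunk_metrics` name, and the sample values are all hypothetical.

```python
from collections import defaultdict


def chunk_metrics(frames):
    """Summarize per-frame records into per-chunk metrics.

    `frames` is an iterable of (chunk_number, qp, size_bytes) tuples,
    one per frame. This layout is an assumption for illustration.
    """
    qps = defaultdict(list)
    sizes = defaultdict(int)
    for chunk, qp, size in frames:
        qps[chunk].append(qp)
        sizes[chunk] += size
    return {
        chunk: {
            "mean_qp": sum(values) / len(values),  # proxy for overall quality
            "max_qp": max(values),                 # flags the ugliest frame
            "bytes": sizes[chunk],                 # chunk size
        }
        for chunk, values in qps.items()
    }


# Made-up frames: chunk 1 is easy, chunk 2 contains one QP 31 frame.
frames = [
    (1, 4, 30_000), (1, 6, 5_000), (1, 8, 4_000),
    (2, 10, 28_000), (2, 31, 3_000),
]
metrics = chunk_metrics(frames)
```

In this toy data, chunk 2 has a moderate mean QP but a Max QP of 31, which is why it’s worth tracking the maximum separately from the mean.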

Here’s the data:

You can see there’s quite a bit of variability in chunk size; almost 3:1. That’s fine; we’d expect to see that in content that alternates between easy and hard shots. The swing between Mean and Max QP is worst for the Default settings; Lookahead is handy to help the codec know what’s coming down the pike. For Chunk 217, we see Default has a lower Mean QP but a higher Max QP compared to the others. In general, the Mean and Max QP are highest in the Default, as we can see in the below graphs:

Overall, we see that Default is the weakest, and Insane the strongest. But the gap from Default to Fast is the biggest change; in most chunks, Insane and HQ offer the same Max QP, and HQ has a better mean QP in a few cases. However, there are some chunks where Default does better than any of the other modes; since it’s a 1-pass CBR, one mode may have used more of its buffer by a given point than another, leaving fewer bits for the next part.

For reference, here’s that QP 31 frame in our four versions (click on the thumbnails to open at full resolution).

Default:

Fast:

HQ:

Insane:

So, what went wrong with that particular frame? It was a rate control failure of some sort. We can see from the below that the surrounding frames got more bits and lower QP in Default; it would have been better with a more consistent compression level. It’s also interesting to note the big improvement from Fast to HQ for this sequence of frames; it’s when the content is the most challenging that the slower, higher quality encoding modes make the most significant difference.

When you watch the video at full speed, the QP 31 frame isn’t really that noticeable as it’s right at a scene change.

Given the structure of Smooth Streaming, the optimum way to allocate bits is quite a bit different from the “classic” web encoding that the VC-1 Encoder SDK used in Expression Encoder was tuned for. Any compression engineer reading this has probably already thought of a dozen ways to improve encoders for chunked video. More about that later.

Lastly, here’s how long each version took to encode.

Default: 59 minutes
Fast: 59 minutes (1.0x Default)
HQ: 158 minutes (2.68x Default)
Insane: 268 minutes (4.54x Default)

As you can see, you rapidly get to the point of diminishing returns with slower encoding modes; the biggest improvement was from Default to Fast, which had no impact on encoding time at all.
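The multipliers are just each encode time divided by the Default time, so they can be rechecked directly from the minute counts listed above:

```python
# Encode times in minutes, as listed above.
encode_minutes = {"Default": 59, "Fast": 59, "HQ": 158, "Insane": 268}

baseline = encode_minutes["Default"]
ratios = {
    name: round(minutes / baseline, 2)
    for name, minutes in encode_minutes.items()
}

for name, ratio in ratios.items():
    print(f"{name}: {encode_minutes[name]} minutes ({ratio}x Default)")
```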