Expression Encoder 2 Service Pack 1 – Intro and Multibitrate Encoding

Saturday, January 17, 2009

Oh, I’m a bad, bad blogger. The Expression Encoder Service Pack 1 has been out for MONTHS, and it’s cool enough that I haven’t had the time to do it full justice. Oh well, better short than nothing; I’ll just start off with the biggest new feature: multibitrate encoding, including support for Smooth Streaming.

First, install it already! It can be downloaded here, and probably most of you picked it up via Microsoft Update by now. James Clarke had a great overview blog post about it which I recommend.

In this first post, I’m going to focus on Smooth Streaming and multibitrate streaming in general.

“Adaptive Streaming” and Smooth Streaming

Here’s the big one: Expression Encoder can now do multibitrate encoding. Alex Zambelli has a good history of the technology.

When doing multibitrate encoding, EE does multiple simultaneous encodes from the same source file. While the big driver for that feature is Smooth Streaming (another overdue blog topic), we also support encoding a single multiplexed “Intelligent Streaming” WMV file or individual WMV files for each bitrate.

SmoothHD.com

And if you haven’t seen it yet, head on over to the SmoothHD.com demo site we do with Akamai and check out the technology in practice. And yes, all those files were encoded with SP1.

SmoothHD.com in action. Those bars in the lower right show which data rate band you’re getting; mouse over for more details. Click on it to bring up a diagnostic menu and play around with scaling. And boy, that 20 Mbps DSL upgrade I’ve got on order can’t come soon enough!

Previewing Smooth Streaming with Expression Encoder

So, you want to play around with Smooth Streaming, but don’t want to have to upload your files to Akamai every time you want to do a quality check? Fair enough, but we don’t have media players that support the Fragmented MPEG-4 format used in .ismv files yet. Fortunately, EEv2 SP1 also includes a little localhost Smooth Streaming web server built into it. If you encode to a Silverlight 2 template and make sure that “Preview in Browser” is checked, you’ll get Smooth Streaming working, heuristics and all! It’s only on the local machine, and only when EE is running, but it’s a start.

To preview the video quality of Smooth Streaming, you can also make individual WMV files and watch them as normal. The output bitstreams are identical no matter what the file is wrapped in.

WMSnoop remains a great (and free) way to check out the GOP structure of WMV files. Below you can see how Smooth Streaming produces a rock-solid cadence of a keyframe (in yellow) every 60 frames, while still being able to insert extra keyframes as needed for high quality scene changes.
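If you have a list of keyframe positions for a stream (read off a tool like WMSnoop, say), a few lines of Python can confirm that cadence. This is only a sketch: `has_fixed_cadence` and the sample frame numbers are hypothetical, and the 60-frame interval is simply the figure mentioned above.

```python
def has_fixed_cadence(keyframes, total_frames, interval=60):
    """Return True if a keyframe lands on every multiple of `interval`.

    Extra keyframes (for example at scene changes) are allowed; they just
    must not replace the regularly scheduled ones.
    """
    present = set(keyframes)
    return all(frame in present for frame in range(0, total_frames, interval))


# Hypothetical keyframe positions for a 300-frame clip.
regular_plus_scene_cut = [0, 60, 97, 120, 180, 240]  # extra keyframe at 97
drifting = [0, 60, 97, 157, 217, 277]                # cadence restarts at 97

print(has_fixed_cadence(regular_plus_scene_cut, 300))  # True
print(has_fixed_cadence(drifting, 300))                # False
```

The second list is what a "reset on scene change" encoder might produce: every GOP is still at most 60 frames long, but the keyframes no longer sit on a shared grid.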

Encoding for Multibitrate

Unlike the old FSDK-based MBR encoder, each stream is encoded in its own thread, which means a whole lot of cores can be saturated; my 8-core typically is running at nearly 100%, and if HD is involved, a 16-core would as well. The limiting factor can wind up being the speed of the source decoder past a certain point. Here’s a pretty typical load I’ll see when running from a Lagarith source file.

Multibitrate Video Options

Most of the parameters must be identical across streams; only frame size and bitrate can vary from stream to stream. Other parameters like frame rate, audio, and codec settings are fixed. This is to facilitate Smooth Streaming by making sure that I-frames appear at the same point in each output stream. For technical reasons with the current VC-1 SDK, this also means only 1-pass CBR is supported for multibitrate encoding.
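As a rough illustration of that rule, here is a Python sketch that flags any setting other than frame size and bitrate that differs between streams. The `StreamSettings` class and its field names are made up for this example; they are not part of Expression Encoder or its SDK.

```python
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class StreamSettings:
    # Hypothetical field names, chosen only for this illustration.
    width: int
    height: int
    bitrate_kbps: int
    frame_rate: float
    key_frame_seconds: float
    rate_control: str


# Only these fields are allowed to differ from stream to stream.
MAY_VARY = {"width", "height", "bitrate_kbps"}


def mismatched_fields(streams):
    """Return the sorted names of fixed fields that differ across streams."""
    if not streams:
        return []
    first = asdict(streams[0])
    bad = set()
    for stream in streams[1:]:
        for name, value in asdict(stream).items():
            if name not in MAY_VARY and value != first[name]:
                bad.add(name)
    return sorted(bad)


streams = [
    StreamSettings(1280, 720, 3000, 29.97, 2.0, "1-pass CBR"),
    StreamSettings(848, 480, 1500, 29.97, 2.0, "1-pass CBR"),
    StreamSettings(424, 240, 500, 15.0, 2.0, "1-pass CBR"),  # frame rate differs
]
print(mismatched_fields(streams))  # ['frame_rate']
```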

Multibitrate Audio Options

Only one audio option can be selected when doing Adaptive Streaming encoding. This isn’t the limitation it would have been in the WMA days, as WMA 10 Pro’s low bitrate audio quality is so good. We find 64 Kbps provides great quality for most content, and fits into most web delivery bitrates.

Multibitrate Container Options

There are three container choices:

ASF Single File. This gives a single WMV file containing all copies of the video and one copy of the audio, à la Intelligent Streaming. This provides bitrate switching when used on Windows Media Services.
ASF Multiple File. This gives you one WMV file for each video data rate, each including its own copy of the audio. This is an easy way to make multiple versions of the same source file.
IIS Smooth Streaming. This is for use with the forthcoming IIS Media Pack version that enables Smooth Streaming. To use this today, you’ll need to be part of Akamai’s beta program. That gives one output file per stream. There’s no way at the moment to play these files easily on the desktop, as they use the Fragmented MPEG-4 file format, which isn’t broadly supported in media player software; it’s mainly used inside of set top boxes like a TiVo.

Multibitrate Advanced Codec Settings

There are a few required settings to get optimal Smooth Streaming compatible content in EEv2 SP1. Smooth Streaming requires that all bitrates have Closed GOPs (Group of Pictures – a keyframe and all the frames that reference it) starting on the same frame every few (typically 2) seconds. This means that the decoder gets a continuous sequence of frames to decode without any overlaps or missing frames, and without having to run two simultaneous decoders. These settings are compatible with (but not needed for) ASF, and using them gives the option to remux to Smooth Streaming at a future date.

Maximum QP=31. This is a new feature in SP1, which specifies the maximum allowed Quantization Parameter that can be used. This is similar to the old WME “Smoothness” control which determines the tradeoff between preserving image quality versus frame rate. We’re trying to avoid a frame getting dropped in one data rate but not another, which could throw off GOP alignment. QP 31 is the most compressed a frame can be in VC-1, thus minimizing the chance of a skipped frame.
Adaptive GOP=Off. Adaptive GOP tells the codec that it can “reset” the Key Frame Interval, making it into really “keyframe at least every.” That improves efficiency, but makes it possible that the different output bitrates might wind up starting new GOPs at different times, without any mechanism to resync them. When Adaptive GOP=Off, it’ll still insert an I-frame where appropriate to optimize quality, mainly when there’s a video cut in the middle of a GOP.
Closed GOP=On. In a Closed GOP, the first frame of the GOP is always an I-frame (keyframe), and no frame in the GOP references any other GOP. That makes each GOP fully self-contained, as required for Smooth Streaming. If Closed GOP were off (Open GOP), B-frames could reference frames in other GOPs. This helps efficiency a little bit, but means you always need to have decoded most of the previous GOP to play the first frame of the current one. Even if Closed GOP=On, any additional I-frames inserted due to scene changes may be an Open GOP for improved efficiency.
Output Mode: Elementary Stream Sequence Header. With an Elementary Stream Sequence Header, the VC-1 bitstream for each chunk includes all the data the decoder needs to play back the video in that chunk. It’s required (and it’s a tiny bit of data; there’s no downside to using it).
Insert Skipped Frames=On. With Skipped Frames, a frame that doesn’t get encoded, either due to exceeding the Max QP or because it’s just a static frame without motion, consists of just a flag indicating “I’m the same as the frame before.” If Skipped Frames=Off, the frame before the dropped frame has its duration extended to cover the missing frame(s). The actual video playback is identical in either case, but this ensures dropped frames don’t cause a GOP misalignment. There’s no real downside to using this for all files, Smooth Streaming or not.
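For quick reference, the five required settings above can be written down as a small checklist. The Python sketch below is just that, a checklist in code: the dictionary keys mirror the setting names in this post, but the dictionary format and the `smooth_streaming_problems` helper are hypothetical and do not correspond to any real EE preset format or API.

```python
# Required values as listed above (On/Off written as True/False).
REQUIRED = {
    "Maximum QP": 31,
    "Adaptive GOP": False,
    "Closed GOP": True,
    "Output Mode": "Elementary Stream Sequence Header",
    "Insert Skipped Frames": True,
}


def smooth_streaming_problems(settings):
    """List every required setting that is missing or has the wrong value."""
    problems = []
    for name, expected in REQUIRED.items():
        actual = settings.get(name)
        if actual != expected:
            problems.append(f"{name}: expected {expected!r}, got {actual!r}")
    return problems


# A hypothetical profile where Adaptive GOP was left on by mistake.
profile = dict(REQUIRED, **{"Adaptive GOP": True})
for problem in smooth_streaming_problems(profile):
    print(problem)
```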

Tips for multibitrate encoding

General Codec tweaks

These are the changes I make from the Adaptive Streaming default. The typical “make it look good” settings I recommend for EE in general also apply to multibitrate compression. Just don’t mess with the settings up above!

Always On recommendations

These settings won’t have much impact on encode time, but can help quality significantly, particularly with

Adaptive Dead Zone=Conservative. This tells the codec to bias toward softness over blockiness as a frame gets more compressed.
Overlap=On. The Overlap filter softens the edges of blocks, reducing blocking and artifacts at higher compression rates. Web delivery is almost always at rates where the Overlap filter helps quality.
Scene Change Detection=On. Scene Change Detection tells the codec to buffer ahead 16 frames to look for fades, flashes, and scene changes, and dynamically change the frame type based on that. This particularly helps with fades to/from black, and with flash frames (think raves and gunfights). While it can insert an I-frame, it won’t disrupt the pattern locked in by having Adaptive GOP=Off. Scene Change Detection never hurts quality, and often helps. Scene Change Detection maps to the Lookahead parameter used in the Format SDK.
Search Range=Adaptive. Lets the codec dynamically adjust the search range based on the amount of motion in the content. Helps quality a lot when there’s more than 64 pixels horizontally or 32 pixels vertically of motion between two P-frames, without spending CPU cycles on frames with more sedate motion.

Quality over Speed recommendations

These are settings I use when I’m more worried about optimum quality than the speed of encode.

Complexity=4. Complexity controls how hard the codec works on each frame, particularly the precision and thoroughness of looking at motion from frame to frame. Complexity 3 is a great default, since it provides most of the value of 4 and 5 at a higher speed; quality drops off quickly as you go to 2 on down. But 4 is a little better yet, particularly with complex motion like particle effects. Complexity 5 is overkill most of the time, but I wind up using it sometimes when I’m feeling more fussy than rushed, “just in case.”
Chroma Search=Full True Chroma. Chroma Search looks for where the chroma (color) of the image changes differently than its luma (brightness). This is particularly helpful with motion graphics and animation. Chroma is only a third of the data in the encode, so a full precision chroma search doesn’t add all that much to the encoding time, and with the right content can really help squeeze things down.
Match Method: Adaptive. This switches between using Sum of Absolute Differences (SAD) and Hadamard motion matching on a per-macroblock basis. Don’t sweat what that means too much; just know it’s slower and a little better than the default SAD-only. Full Hadamard is slower yet and sometimes lower quality than Adaptive.

Stretch, not Letterbox

Expression Encoder defaults to letterboxing when the output frame size doesn’t match the source aspect ratio. However, even a slight variation in output aspect ratio from the source’s (typically to get the encoded height and width divisible by 16, the most efficient option) can result in a thin black line of letterboxing (top and bottom) or pillarboxing (left and right). If Resize Mode=Stretch and Mode=Profile Adaptive, Profile Adaptive will get the output frame size in the right ballpark while Stretch adjusts the video slightly to avoid those darn black lines. To make sure the output aspect ratio really is perfect, you can set the Video Aspect to that of the source, and on playback it’ll stretch the image back to the exact original frame size and shape. Leaving it at the default of Square Pixel generally looks fine as well.

Note that you may need to reapply the Profile Adaptive setting if you apply a job preset to multiple files at once. That can reset the Mode to Custom.
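To see where those thin black lines come from, here is a small Python sketch of the arithmetic. It assumes the output size is picked by rounding both dimensions to the nearest multiple of 16; how Profile Adaptive actually chooses sizes isn't described here, and `round_to_16` and `fit_frame` are hypothetical helpers.

```python
from fractions import Fraction


def round_to_16(value):
    """Round a positive number to the nearest multiple of 16 (minimum 16)."""
    return max(16, int(value / 16 + 0.5) * 16)


def fit_frame(src_w, src_h, target_w):
    """Pick an output size near target_w with both dimensions divisible by 16.

    Returns (out_w, out_h, exact_h), where exact_h is the height that would
    preserve the source aspect ratio exactly at out_w.
    """
    out_w = round_to_16(target_w)
    exact_h = Fraction(out_w * src_h, src_w)
    out_h = round_to_16(exact_h)
    return out_w, out_h, exact_h


# A 1280x720 (16:9) source scaled down to roughly 848 pixels wide.
out_w, out_h, exact_h = fit_frame(1280, 720, 848)
print(out_w, out_h)              # 848 480
print(float(out_h - exact_h))    # 3.0 rows of black if letterboxed
print(float((Fraction(out_h) / exact_h - 1) * 100))  # percent stretch instead
```

In this example the exact 16:9 height at 848 wide is 477, so a 480-tall frame leaves 3 rows to fill. Letterbox fills them with black; Stretch scales the picture by well under 1%, which is hard to notice.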

Integer Framerates

By default, Adaptive Streaming profiles give a keyframe every 2 seconds. However, that’s a precise 2.0000 seconds. So, if your source is a typical NTSC frame rate like 29.97 or 23.976, you’re 0.1% short of 30 and 24 frames a second, and so wind up getting a keyframe every 59 or 47 frames, not the 60 or 48 you might expect. If you set the frame rate to the whole number equivalent, the encode will actually maintain the source frame rate correctly (the value serves as an upper bound, but EE won’t insert duplicate frames when set higher than the source). But it will round up the keyframe cadence to the whole number, so you’ll get the keyframe every 60 or 48 frames like you’d expect.

Not really a big deal in practice I suppose, but I’ve long since internalized the multiples of 48 and 60, which makes it easier for me to figure out where my keyframes are.
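The arithmetic is easy to check in Python. This sketch assumes the keyframe distance is the frame rate times 2 seconds, truncated to a whole frame. That truncation rule is an assumption that happens to match the frame counts above, not a documented EE behavior, and `frames_per_keyframe` is a hypothetical helper.

```python
from fractions import Fraction
import math


def frames_per_keyframe(frame_rate, interval_seconds=2):
    """Whole frames that fit in one keyframe interval (assumes truncation)."""
    return math.floor(frame_rate * interval_seconds)


# Exact NTSC rates: "29.97" is 30000/1001 and "23.976" is 24000/1001.
NTSC_30 = Fraction(30000, 1001)
NTSC_24 = Fraction(24000, 1001)

print(frames_per_keyframe(NTSC_30))  # 59
print(frames_per_keyframe(NTSC_24))  # 47
print(frames_per_keyframe(30))       # 60
print(frames_per_keyframe(24))       # 48
```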

Determining bitrates

Picking the optimum single frame size and data rate combination for a particular piece of content can already be a challenge. Having to determine optimal sizes for a bunch of different data rates is harder still. Fortunately, Alex Zambelli is as always ready with a handy utility for us – the Smooth Streaming Multi-Bitrate Calculator!

All you need to know is:

What bitrate works for the highest frame size for your content
What minimum bitrate you want to go to
How many steps to take

There’s actually some pretty deep math in there that knows how VC-1 bit-per-pixel requirements change as frame size changes, etcetera.
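To give a feel for the three inputs, here is a deliberately naive Python sketch that spaces bitrates geometrically between the top and bottom rates. This is not the calculator's formula, which isn't described here and also picks frame sizes; `geometric_ladder` is a hypothetical helper, and it treats "steps" as the total number of bitrates, which is an assumption.

```python
def geometric_ladder(top_kbps, min_kbps, steps):
    """Return `steps` bitrates from top_kbps down to min_kbps.

    Each rung is the same fraction of the one above it. This is a simple
    stand-in for illustration, not the calculator's actual method.
    """
    if steps < 2:
        raise ValueError("need at least two steps")
    if not 0 < min_kbps < top_kbps:
        raise ValueError("need 0 < min_kbps < top_kbps")
    ratio = (min_kbps / top_kbps) ** (1 / (steps - 1))
    return [round(top_kbps * ratio ** i) for i in range(steps)]


# Hypothetical inputs: 3000 Kbps at the top, 375 Kbps at the bottom, 4 rungs.
print(geometric_ladder(3000, 375, 4))  # [3000, 1500, 750, 375]
```

Equal ratios rather than equal gaps keep the jump between neighboring rungs proportionally the same at both ends of the ladder.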