Divergent Media: Video Compression 06/13/12
Every month, a daily progression of fundamentals on a topic.
We’ll be tweeting links to these tips out daily, so make sure to follow us if you don’t already.
TABLE OF CONTENTS:
WEEK 1: Image / Audio Basics
WEEK 2: Compression Speak
WEEK 3: Initial Questions
WEEK 4: Typical Workflows
WEEK 1: Image / Audio Basics
Monday – 06/04/12
The world of video can make even a seemingly simple concept complex, with branch upon branch of “if this then that” caveats. So let’s start with the basics.
What is video? If we were talking about film, we could answer that question pretty easily – it’s a series of still pictures displayed sequentially, like a flipbook. Display those pictures fast enough and the brain perceives them as continuous. Each of those individual pictures is what we call a “frame,” and displaying entire frames sequentially is what we call “progressive.”
In an ideal world, video would be just an electronic version of film – a series of pictures, shown one after another, fast enough to seem fluid. However, video (television) dates back to a time when folks in lab coats were forging equipment out of iron, coal and smallpox. Various limitations imposed by the technology available at the time (transmitters, timing circuits, etc.) meant that engineers were forced to look for ways to be more efficient. Their solution was interlacing.
If a progressive signal is a series of complete frames, an interlaced signal is a series of half frames. But – and here’s the clever bit – it’s not just the top half of the video, followed by the bottom half of the video (that’s another topic).
A frame of video is made up of pixels arranged in rows and columns. An interlaced signal deals with two sets of rows – the odd rows (1, 3, 5, etc.) and the even rows. Each set is called a field.
So, interlaced video is two different fields, displayed sequentially. First we show one group of rows, then the other. Now, here’s the mind-bending bit: these two fields represent two different moments in time. Again, this goes back to some decisions made in the time of prohibition and awesome mustaches, before we had digital framestores and accumulators. If you’ve ever seen two fields of video combined together and displayed at once, you’ve seen the “comb” effect which results. Any object in motion ends up broken apart in alternating rows, and it can look pretty unpleasant.
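To make the comb effect concrete, here is a small Python sketch. It is only an illustration: frames are lists of text rows, and the helper names (split_fields, weave, make_frame) are made up for this example rather than taken from any video library.

```python
def split_fields(frame):
    """Split a frame (a list of rows) into its two fields."""
    # The text numbers rows from 1, so the "odd" rows are indexes 0, 2, 4...
    odd_field = frame[0::2]
    even_field = frame[1::2]
    return odd_field, even_field


def weave(odd_field, even_field):
    """Interleave two fields back into one full-height frame."""
    frame = []
    for odd_row, even_row in zip(odd_field, even_field):
        frame.append(odd_row)
        frame.append(even_row)
    return frame


def make_frame(bar_x, width=8, height=4):
    """Draw a one-pixel-wide vertical bar at column bar_x."""
    row = "".join("#" if x == bar_x else "." for x in range(width))
    return [row for _ in range(height)]


# Two moments in time: the bar moves from column 2 to column 5.
moment_a = make_frame(2)
moment_b = make_frame(5)

# Keep the odd rows of the first moment and the even rows of the second,
# which is what an interlaced signal carries.
odd_field, _ = split_fields(moment_a)
_, even_field = split_fields(moment_b)

# Showing both fields at once produces the comb: the bar is torn apart
# on alternating rows.
combed = weave(odd_field, even_field)
for row in combed:
    print(row)
```

If both fields come from the same moment, weaving them simply gives back the original frame. The comb only appears when something moved between the two fields.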
So, if interlacing is essentially a “hack” to get around old technical limitations, why are we still dealing with it today? It’s due to a mix of need for backward compatibility and the fact that, even with modern technology, interlacing can provide benefits in terms of efficiency.
As we dive deeper, we’ll get into the specifics of dealing with interlaced and progressive signals. But for now, the important thing to remember is that a frame is a whole picture – one moment in time, captured just like a photograph. A field is half a picture, again representing a specific moment in time. Both are perfectly acceptable ways of storing and transmitting video, provided that they’re dealt with appropriately.
Tuesday – 06/05/12
THE DIFFERENCE BETWEEN ANALOG AND DIGITAL
Analog and digital are concepts that can be as simple or complex as you’d like to make them. Let’s just start with some basics.
We want to draw a curve on a piece of paper. No problem; put the pencil on the paper, move the pencil a bit, lift the pencil up – you’ve made a curve. An analog curve.
Now, do the same thing, but don’t move the pencil while it’s on the paper. You can only make little dots. You can still draw that curve, you just need to make lots of little dots, moving a tiny bit in between each dot. If you get your nose right up against the paper, you might be able to tell that the curve is actually lots of dots, but once you stand back a bit, it just looks like a continuous curve. That’s digital.
Another way to think of digital is to think about the word “digits” – numeric data. Computers like digital. Ones and zeros. Individual, discrete values. Analog is much more akin to the organic world that humans live in.
Just a few short years ago, most things in the video world were analog – VHS tapes, RCA cables, etc. Today, we almost never encounter analog. We capture digitally, we edit digitally, and we move our signals around digitally.
So, what does a digital video signal look like? Let’s talk about a hypothetical, simplified signal. We’ll assume this signal is progressive (rather than interlaced) and it’s going to send colors as RGB.
In order to construct this digital signal, we need to send a stream of pictures, just like a series of film frames. Each picture is made up of lots of individual pixels (picture elements), which are just like our dots on the paper. Each pixel needs to have a number representing how bright red, green and blue should be. So, an all-black pixel might be “0,0,0” and an all-white pixel might be “255, 255, 255.” As long as both ends know that “255” represents “full brightness,” we’ll get a white pixel.
All we need to do is send a long stream of numbers, and, assuming both ends have agreed on what those numbers mean, our picture will be reconstructed with mathematical perfection at the receiving end.
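Here is a minimal Python sketch of that hypothetical signal, assuming 8-bit RGB values and a frame size both ends already know. The function names are invented for the example.

```python
def frame_to_stream(frame):
    """Flatten a frame of (r, g, b) pixels into one long list of numbers."""
    stream = []
    for row in frame:
        for r, g, b in row:
            stream.extend([r, g, b])
    return stream


def stream_to_frame(stream, width, height):
    """Rebuild the frame. Both ends must agree on width and height."""
    pixels = [tuple(stream[i:i + 3]) for i in range(0, len(stream), 3)]
    return [pixels[y * width:(y + 1) * width] for y in range(height)]


BLACK = (0, 0, 0)
WHITE = (255, 255, 255)

# A tiny 2x2 checkerboard frame.
frame = [[BLACK, WHITE],
         [WHITE, BLACK]]

stream = frame_to_stream(frame)
rebuilt = stream_to_frame(stream, width=2, height=2)

print(stream)
print(rebuilt == frame)
```

The stream itself carries no hint about where one row ends and the next begins. That agreement about size and meaning is exactly the part both ends have to settle in advance.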
In its most elemental form, that’s how digital video works. A stream of numbers encoding information about the individual pixels within a frame of video.
Wednesday – 06/06/12
RESOLUTIONS AND ASPECT RATIOS
We’ve already discussed the concept of a digital signal as a set of discrete elements – pixels – which make up a frame of video. Now we need to talk about how those pixels actually go into a frame, and how we arrive at those values.
When we talk about resolution in the digital space, we’re talking about how many pixels are packed into a given frame of video. When we say “low resolution video” we mean that each frame is made up of a low number of pixels. This means there’s less detail, and when displayed on a large monitor, the video will be blocky. Think of an old postage-stamp-sized video on a CDROM.
What does this actually mean in terms of numbers? Well, a typical “low res” video on the web might have frames that are made up of approximately 76,000 pixels each. A high definition video, on the other hand, will have frames made up of more than 2 million pixels each.
Now, just having a bucket full of 2 million pixels doesn’t do us much good. We need to arrange them into our frames. While we could make a video that was a single row, two million pixels long, it wouldn’t be very much fun to watch. That’s where aspect ratio comes into play.
The aspect ratio simply defines the ratio of the width versus the height of a video. A completely square video frame would have an aspect ratio of 1:1. That would mean that if it was 100 pixels high, it would also be 100 pixels wide.
Traditional, standard definition video was always 4:3. So if a video was 400 pixels wide, it would be 300 pixels high. In the modern, high-definition world, we rarely deal with 4:3. Instead, most video is 16:9 or widescreen. The 16:9 aspect ratio more closely mirrors the way we experience the world around us – because of the layout of our eyes in our head, our field of view is wider horizontally than it is vertically.
So, back to our bucket of 2 million pixels. If we say we want a 16:9 aspect ratio, what we end up with is an image that has 1920 pixels horizontally and 1080 pixels vertically. If you’ve dealt with HD video before, those numbers should look pretty familiar.
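The arithmetic behind those numbers is short enough to sketch in Python. The function names are ours, and the 320×240 reading of the "low res" example is an assumption on our part (it happens to be 76,800 pixels at 4:3).

```python
import math


def dimensions_for_pixel_count(pixel_count, aspect_w, aspect_h):
    """Given a pixel budget and an aspect ratio, return (width, height)."""
    # width * height = pixel_count and width / height = aspect_w / aspect_h,
    # so width squared = pixel_count * aspect_w / aspect_h.
    width = math.sqrt(pixel_count * aspect_w / aspect_h)
    height = width * aspect_h / aspect_w
    return round(width), round(height)


def height_for_width(width, aspect_w, aspect_h):
    """Given a width and an aspect ratio, return the matching height."""
    return round(width * aspect_h / aspect_w)


print(dimensions_for_pixel_count(2_073_600, 16, 9))  # the HD bucket
print(dimensions_for_pixel_count(76_800, 4, 3))      # assumed low-res example
print(height_for_width(400, 4, 3))                   # the 4:3 example above
```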
There are a variety of other aspect ratios used in motion picture production – 21:9, etc. If you know some basic algebra and geometry, you can use aspect ratios to calculate lots of useful information about screen sizes, etc.
Thursday – 06/07/12
In a perfect world, without resource or budget constraints, a digital video signal would be exactly like what we’ve already described. A series of numbers representing the red (R), green (G), and blue (B) values for individual pixels, arranged into a defined frame. We’d send the first picture, then the second, and so on.
Of course, in a perfect world, clients would never ask for last minute re-shoots and planes would never fly overhead right in the middle of a dialogue take. In our real, slightly messier world, we’re constantly constrained by the limits of resources – computing power, storage space, internet bandwidth, etc. And these limitations mean we can’t just send our hypothetical stream of numbers. We need to be a bit more efficient. We need compression.
Let’s look at some simple ways to make our hypothetical video stream more efficient. We’ve already said that a white pixel would be represented as three values for red, green and blue: 255, 255, 255. What if both ends (the transmitter and the receiver) agree that white can be represented instead as one value – 256? When the receiver sees a 256, it knows to draw that pixel as white, and we’ve saved ourselves some bits. This is called dictionary-based compression.
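As a rough Python sketch of that agreement, here is a one-entry dictionary where the code 256 stands for a white pixel. This is a toy: the names are made up, and real dictionary coders typically build much larger dictionaries on the fly rather than agreeing on a single entry up front.

```python
# 256 is outside the 0-255 range of a color value, so the receiver can
# never confuse it with ordinary pixel data.
WHITE_CODE = 256
DICTIONARY = {(255, 255, 255): WHITE_CODE}


def encode(pixels):
    """Replace any pixel found in the dictionary with its one-number code."""
    stream = []
    for pixel in pixels:
        if pixel in DICTIONARY:
            stream.append(DICTIONARY[pixel])
        else:
            stream.extend(pixel)
    return stream


def decode(stream):
    """Expand codes back into pixels; read everything else three at a time."""
    reverse = {code: pixel for pixel, code in DICTIONARY.items()}
    pixels = []
    i = 0
    while i < len(stream):
        if stream[i] in reverse:
            pixels.append(reverse[stream[i]])
            i += 1
        else:
            pixels.append(tuple(stream[i:i + 3]))
            i += 3
    return pixels


pixels = [(255, 255, 255), (10, 20, 30), (255, 255, 255)]
encoded = encode(pixels)
print(encoded)                    # 5 numbers instead of 9
print(decode(encoded) == pixels)
```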
Similarly, what if our video has big chunks of black – long streams of all zeros. Rather than sending [0,0,0] [0,0,0] [0,0,0] [0,0,0] etc., the transmitter might say [0,0,0]*4. The receiver knows what that means and draws it appropriately. This is what’s called run-length encoding.
In both of these cases, the final, reconstructed signal is numerically identical to the original signal. We sent fewer numbers from the transmitter to the receiver, but both ends ended up with the same values. This is what we call lossless compression. No data was thrown away, it was just moved around in a more efficient manner.
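Run-length encoding is simple enough to sketch in a few lines of Python. The function names are ours, and the (pixel, count) pairs are just one possible way to write down a run. The round trip at the end is what "lossless" means in practice.

```python
def rle_encode(pixels):
    """Collapse consecutive identical pixels into (pixel, count) pairs."""
    runs = []
    for pixel in pixels:
        if runs and runs[-1][0] == pixel:
            runs[-1][1] += 1          # same as the previous pixel: extend the run
        else:
            runs.append([pixel, 1])   # different pixel: start a new run
    return [(pixel, count) for pixel, count in runs]


def rle_decode(runs):
    """Expand (pixel, count) pairs back into the original pixel list."""
    pixels = []
    for pixel, count in runs:
        pixels.extend([pixel] * count)
    return pixels


original = [(0, 0, 0)] * 4 + [(255, 255, 255)]
encoded = rle_encode(original)

print(encoded)
print(rle_decode(encoded) == original)  # lossless: we get back exactly what we sent
```

Notice that the lone white pixel actually got bigger, since it now carries a count of 1. Run-length encoding only pays off when the runs are long.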
We encounter lossless compression in a variety of ways each day. ZIP files are a common type of lossless compression; when you zip a file and then unzip it, your file is unchanged.
Lossless compression is most effective for dealing with signals that have lots of repeating patterns or other common elements. Unfortunately, video often consists of nearly random data – grass blowing in the wind, rippling water, etc. We need another approach – one that can deliver consistent space savings, regardless of the visual makeup of the images.
Think about a video of rippling water. Each frame of that video is only on the screen for a fraction of a second. We don’t have time to notice each and every tiny detail of those ripples. What if we could remove the parts of the video that viewers were unlikely to notice? That’s essentially what lossy compression is. The goal is to remove data in such a way that the viewer won’t notice it’s gone. There’s all kinds of complicated math behind the scenes here, but, done well, lossy compression can be nearly unnoticeable while providing huge benefits.
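One small piece of that math can be sketched in Python: quantization, which means deliberately storing values less precisely. This is only the core idea under simplified assumptions. Real codecs generally transform the image first and quantize the result rather than rounding raw pixel values, and the step size of 8 here is arbitrary.

```python
def quantize(value, step):
    """Snap a 0-255 value to the nearest multiple of step (information is lost)."""
    snapped = (value + step // 2) // step * step
    return min(255, snapped)


# Five neighboring brightness values from a slightly noisy patch of image.
row = [100, 101, 103, 102, 104]
quantized = [quantize(v, 8) for v in row]

print(quantized)

# The worst-case error is half a step, which the viewer is unlikely to see...
print(max(abs(a - b) for a, b in zip(row, quantized)))

# ...and the row is now one long run of identical values, which lossless
# tricks like run-length encoding can shrink dramatically.
print(len(set(quantized)))
```

Unlike the lossless examples, there is no way back: once 100 and 103 have both become 104, the receiver cannot tell which was which.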
Compression sometimes gets a bad rap – we’ve all watched web videos that were chunky, blurry blocks of indistinguishable colors. The reality is that almost all video we see is compressed with lossy compression – BluRay disks, broadcast and cable TV, even digital, projected movies in theatres. It’s all about finding the balance between what we throw out and what we preserve.
Friday – 06/08/12
INTERFRAME VERSUS INTRAFRAME
So far, we’ve talked about video as a discrete series of pictures. Basically, a flipbook in digital form. When we talked about compression, we talked about reducing detail within an image, or reducing repetition within an image. This basic type of compression is called intraframe compression. The “intra” part means that all of the compression is happening within each individual frame. If you’re familiar with compressed JPEG images, you can think of intraframe compression as akin to viewing a series of JPEGs, one after another.
In the real world, most video signals don’t change dramatically from frame to frame. If you’ve got a talking head in front of a black backdrop, there may only be a little bit of movement of a person’s mouth and eyes, with almost everything else unchanged. Compression technology has evolved to take advantage of this fact, with something called interframe compression.
Interframe compression is based on the idea of transmitting a mix of complete images, partial images, and information about what has changed from image to image. This technology has gotten incredibly sophisticated; interframe compression technology can transmit information about movement, scale changes, color changes and a variety of other things.
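Here is a deliberately naive Python sketch of the simplest version of that idea: send one complete frame, then only the pixels that changed. Frames are flat lists of brightness values, and the names and the ("key", ...) / ("delta", ...) layout are invented for illustration. Real codecs describe motion and other changes far more cleverly than this.

```python
def encode_delta(previous, current):
    """List only the pixels that changed, as (index, new_value) pairs."""
    return [(i, new)
            for i, (old, new) in enumerate(zip(previous, current))
            if old != new]


def apply_delta(previous, delta):
    """Rebuild a frame from the previous frame plus a list of changes."""
    frame = list(previous)
    for index, value in delta:
        frame[index] = value
    return frame


def encode_sequence(frames):
    """Send the first frame whole, then only the changes for each later frame."""
    encoded = [("key", list(frames[0]))]
    for previous, current in zip(frames, frames[1:]):
        encoded.append(("delta", encode_delta(previous, current)))
    return encoded


def decode_sequence(encoded):
    """Replay the complete frame and each set of changes in order."""
    frames = []
    for kind, payload in encoded:
        if kind == "key":
            frames.append(list(payload))
        else:
            frames.append(apply_delta(frames[-1], payload))
    return frames


# A "talking head": six pixels, only one of which ever changes.
frames = [
    [0, 0, 0, 0, 9, 0],
    [0, 0, 0, 0, 8, 0],
    [0, 0, 0, 0, 8, 0],
]
encoded = encode_sequence(frames)
for item in encoded:
    print(item)
print(decode_sequence(encoded) == frames)
```

The second frame costs one pair instead of six values, and the third costs nothing at all. The catch is that a frame can no longer be decoded on its own, since it depends on everything back to the last complete image.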
Interframe compression was originally used primarily as a delivery technology. Almost all video shown on the web, on DVDs, on BluRays and in digital broadcast uses interframe compression. That’s because it’s incredibly efficient and well-suited to linear playback.
In the last few years, interframe has become a realistic technology for acquisition and editing as well. HDV and AVCHD cameras, along with most DSLRs, capture video with interframe compression. Editing software has dramatically improved to support these formats in a way that hides all of the behind-the-scenes complexity from the user.
So which is better? Intraframe is still considered the gold standard for compressed acquisition and editing. Think of formats like ProRes and DNxHD. As frame sizes and processing power increase, interframe compression will likely eat away further and further at the dominance of intraframe on the high end.
WEEK 2: Compression Speak
Monday – 06/11/12
“How do you want the video compressed?” “QuickTime is fine.” “Right, but what format?” “QuickTime.” “…”
Ever had that conversation? If so, you’ve entered the wild and confusing world of file wrappers and video codecs, which creates mass confusion and slows workflows every day.
If you asked someone to name the video formats they’re familiar with, it’s very likely that they’d list QuickTime (.mov), AVI, or perhaps MKV. All of these are actually what are called wrappers. They’re containers that hold video, audio, and perhaps images, animations, text or just about anything else.
Perhaps an analogy would help. If you were handed a book, regardless of the language the book was written in, you could probably pick out the table of contents at the front and the index at the back. They generally fall in the same places, and have similar structures.
File wrappers are very similar. They define where in a file video content is stored, where audio content is stored, whether the time is measured in frames or milliseconds, etc. But like our book in an unknown language, wrappers don’t define the “language” or compression scheme of the actual content. A QuickTime file might contain video compressed in one of hundreds of different formats. And, confusingly, just because a hardware device or application says it can play “QuickTime” files, it may or may not be able to play a given file, depending on whether it speaks that specific video language.
So why do we have wrappers? Why not just have .h264 files and .prores files and a variety of others? In reality, having standardized wrappers saves a huge amount of time. When creating a new format, developers don’t have to reinvent the wheel and come up with a new way of describing how large the file is, or how many audio tracks it has. Instead, they can focus on the part they care about – the compression scheme.
Standardized wrappers also allow us to have a variety of tools for manipulating these files without having to know about the contents. For example, there are command line tools which can extract video tracks from QuickTime files, inject new tracks, or add caption data, all without having to know anything about the actual video and audio data within the files. It also means applications can leverage external plugins to decode compressed data, reducing duplicated effort.
The key takeaway for video professionals is to understand that just asking for a QuickTime file (or an AVI or similar) rarely conveys enough information about what you actually need. And just because you have a QuickTime file – and a QuickTime player – doesn’t guarantee that that file will play. The wrapper is a start, but it’s what’s inside the package that matters.
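If it helps to see the distinction as data, here is a toy Python model. Everything in it is hypothetical: the class names, the example files, and the player's list of supported codecs are invented for illustration and don't describe any real player.

```python
from dataclasses import dataclass


@dataclass
class Track:
    kind: str   # "video" or "audio"
    codec: str  # the "language" the track's content is written in


@dataclass
class MediaFile:
    wrapper: str  # the container, e.g. "QuickTime"
    tracks: list


def can_play(media, supported_wrappers, supported_codecs):
    """A player needs to understand the wrapper AND every codec inside it."""
    if media.wrapper not in supported_wrappers:
        return False
    return all(track.codec in supported_codecs for track in media.tracks)


# Two files that are both "QuickTime" but hold very different content.
web_clip = MediaFile("QuickTime", [Track("video", "H.264"), Track("audio", "AAC")])
edit_master = MediaFile("QuickTime", [Track("video", "ProRes"), Track("audio", "PCM")])

# A hypothetical player that "plays QuickTime" but only knows two codecs.
player_wrappers = {"QuickTime"}
player_codecs = {"H.264", "AAC"}

print(can_play(web_clip, player_wrappers, player_codecs))
print(can_play(edit_master, player_wrappers, player_codecs))
```

Both files would be described as "QuickTime" in that opening conversation, which is why the follow-up question about the codec matters.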
Tuesday – 06/12/12
When covering file wrappers, we alluded to the fact that the actual video and audio content inside those files can be in a variety of different formats. So, what are these formats?
By the time you’ve dipped a toe in the video world, you’ve likely come across the term “codec” – variously said to refer to “coder/decoder” or “compressor/decompressor.” While used in a variety of ways in different fields, when talking about media, we generally use codec to refer to a means for storing video or audio data in a space-efficient manner.
As we mentioned when discussing interframe versus intraframe compression, there are different methods for storing media more efficiently. Different situations call for different solutions, and thus we have a variety of codecs. We’ll briefly summarize the common categories of codecs here, and then go further in depth in future articles.
Traditionally, codecs used during the acquisition process – whether inside a camera or in a dedicated VTR (video tape recorder) – were optimized for performance and consistency, rather than maximal space savings. The DV format, for example, takes standard definition video, which runs natively at approximately 25 megabytes per second, and shrinks it to a fixed 3.3 megabytes per second. DV, which is an intraframe codec, uses relatively simple compression technology. This allowed it to be implemented on low-cost hardware. Because it needed to be recorded to a physical tape, moving at a fixed speed, having a fixed bitrate was important. These are the types of design constraints that influenced the development of this codec.
As processing power has increased, and capture has moved to solid state or hard disk mechanisms, acquisition codecs have had fewer constraints, and thus have increased in sophistication. The AVCHD format (which uses the H.264 codec) is relatively compute-intensive during capture and, due to its interframe nature, has a bitrate which varies per frame. This more sophisticated codec means that HD video can fit in the same 3.3 megabytes per second rate of DV (or sometimes even less).
Another set of codecs has evolved to address the needs of post-production. In post-production, there are fewer constraints on data rates and computing power, but quality preservation is extremely important. Formats like ProRes and DNxHD are optimized for these constraints. They generally have high demands for encoding, lower demands for decoding, and aim to be visually lossless.
A note here about the concept of a “visually lossless” codec. Earlier, when discussing lossless versus lossy compression, we were talking about mathematically lossless compression. That meant that the exact same bits you sent into the compression routine emerged on the other end after being decompressed. A visually lossless codec aims to remove data that the human visual system is unlikely or unable to perceive. The video should look identical to the human eye. However, data has been completely removed – it is not reconstructed during the decompression phase. For this reason, some very high-end productions still prefer to work with completely uncompressed video, rather than risk trusting even a very good codec.
The last set of codecs that we deal with on a routine basis are delivery codecs. These are the codecs used to move media to the actual viewer, whether that’s over the airwaves, over the internet, or on a physical disc.
These codecs are generally designed to maximize space savings, even at the expense of some quality. Moving bits across the internet is comparatively expensive. If you’re YouTube, delivering content to your viewers for free (or near-free), a 10% bit savings at the expense of a 5% quality drop is a tradeoff that’s happily accepted. Similarly, fitting a long movie on one DVD instead of two, even if it means reducing the quality a bit, is generally considered well worth it.
Almost all modern delivery is via a codec we’ve already encountered – H.264. It’s currently the most advanced collection of compression and decompression techniques in widespread use, and scales from cell phones all the way to theatrical playback. Other codecs you may encounter on the delivery side include MPEG-2, WebM and Flash (VP6).
There are literally hundreds, or perhaps thousands, of other codecs in the world. Some are purely relics at this point – there’s no good reason to use Cinepak or Intel Indeo in 2012. Some, like Motion JPEG, make sense for special use cases, but not in everyday workflows. And, without naming names, some never made sense at all!
Wednesday – 06/13/12
We’ve briefly touched on the concept of an acquisition codec. Now, let’s dig in a bit deeper.
Traditionally, shooters had limited control over the acquisition codec they were using. Codecs were generally segmented by price (and sometimes brand), so your budget generally dictated the format you were using. As we’ll see later, that’s not necessarily the case anymore.
We’ll begin by breaking down some of the common acquisition codecs in broad use. In the consumer, prosumer, and parts of the professional space, the AVCHD codec is the current “800-pound gorilla” – with very few exceptions, cameras in the sub $5,000 range use AVCHD.
AVCHD is an umbrella term that covers audio and video encoding, as well as file structure on disk. AVCHD cameras always shoot to files, whether stored on SD cards, internal hard disks or optical media. The actual media elements of a shot are stored within an MTS file – a very simple type of wrapper. Inside the MTS files are H.264 video, as well as (generally) AC-3 audio. The H.264 video is encoded with interframe compression at a variety of bitrates.
AVCHD is incredibly efficient – 1920×1080 video can be stored at a low bitrate (24 megabits per second) with surprisingly good quality. The downside is that the media is very CPU intensive, so slower computers have trouble playing AVCHD content. Additionally, the complexity of the H.264 codec used means that editing the footage often bogs down a computer. For that reason, many editing applications convert AVCHD content into another (less intensive) codec during the import process.
Shooters generally have limited control over the encoding process used by an AVCHD codec. At best, cameras may provide the choice of a target bitrate. Given the low cost of storage media, it’s recommended that you always select the highest bitrate available.
There are a handful of acquisition codecs that trace their roots to AVCHD. One format, AVC-Intra, uses higher bitrates and intraframe-only compression to capture higher quality footage which is easier to edit. AVC-Lite is a lower-bitrate, lower-resolution version of AVCHD used by many pocket “still” cameras with video features.
Sony has a variety of acquisition formats in use within their XDCam family. These include XDCamHD, XDCamEX and XDCamHD422. These are all based around the older MPEG-2 video codec, generally at bitrates of 35 megabits per second or higher. While MPEG-2 is less sophisticated than H.264, these cameras generally possess very highly tuned compression engines, which give surprisingly good quality. The MPEG-2 codec is also much easier to edit directly. Again, as a shooter, it’s generally advisable to simply select the highest bitrate available.
At the higher end of the market, there are two primary acquisition choices. Some high-end cameras are shooting directly to formats traditionally considered “editing” formats (Apple ProRes, Avid DNxHD). These cameras provide direct-to-edit capabilities, meaning you’re ensured that your footage will be immediately editable in your NLE of choice.
Direct-to-edit cameras generally operate at high bitrates, which means that the cost of the recording media can become significant. For example, ProResHQ will generally consume more than a gigabyte each minute. ProRes and DNxHD both provide a variety of recording profiles at significantly different bitrates – ProRes LT, for example, consumes as little as half the storage of normal ProRes, while retaining almost the same quality. If you’re shooting primarily for web delivery – and have a constrained budget for acquisition media – using one of these lower bitrate formats is often a good tradeoff.
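Turning a bitrate into a media budget is simple arithmetic, sketched here in Python. The 24 and 35 megabit figures are the AVCHD and XDCam bitrates mentioned earlier. The 220 megabits per second used for ProRes HQ is our assumption (a commonly quoted target for 1080 material at about 30 frames per second), and actual rates vary with frame size and frame rate.

```python
def megabytes_per_minute(megabits_per_second):
    """Convert a video bitrate into storage consumed per minute of footage."""
    megabytes_per_second = megabits_per_second / 8  # 8 bits in a byte
    return megabytes_per_second * 60


def gigabytes_per_hour(megabits_per_second):
    """Same conversion, scaled up to an hour (using 1 GB = 1000 MB)."""
    return megabytes_per_minute(megabits_per_second) * 60 / 1000


formats = {
    "AVCHD (24 Mbit/s)": 24,
    "XDCam (35 Mbit/s)": 35,
    "ProRes HQ (assumed 220 Mbit/s)": 220,
}

for name, bitrate in formats.items():
    print(f"{name}: {megabytes_per_minute(bitrate):.1f} MB/min, "
          f"{gigabytes_per_hour(bitrate):.1f} GB/hour")
```

Under that assumption, ProRes HQ lands at roughly 1.65 GB per minute, in line with the "more than a gigabyte each minute" figure, and around nine times what the AVCHD bitrate needs.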
Because most cameras – even in the prosumer space – now offer pristine live digital outputs via HDMI or HD-SDI, there are also a variety of external recording options which can provide direct-to-edit capture capabilities. There are a variety of recording devices which will capture ProRes or DNxHD. Additionally, with a Thunderbolt or PCI-e capture interface, any Mac can be turned into a powerful DTE (direct-to-edit) capture station using an application like (shameless plug) ScopeBox.
The other option at the very high end of the acquisition market is raw capture. These formats, like Redcode Raw and ARRIRAW, capture the data being generated directly from the camera’s sensor. No processing is done in the camera to adjust white balance, or to map the video into a given color space.
Raw acquisition provides for a huge amount of leeway in post-production – exposure can be adjusted after the fact, color mapping adjusted, etc. The downside is that these formats consume a lot of disk space, and require a lot of post-processing. Much like developing film in the old days, raw formats must be “developed” in post-production. Because of the expertise required, using one of these formats requires a lot of planning – you should have a clear sense of your entire production, post-production, and delivery pipeline before beginning a shoot with a raw format.
Thursday – 06/14/12
Until relatively recently, your choice of editing software severely limited your options for post-production format. Each NLE supported a restricted set of options, requiring lengthy renders of any content in another format.
While the current generation of editing applications has largely lifted these restrictions, or at least made them less onerous, there are still some choices to be made.
First, the basics. In the Apple ecosystem, ProRes is the format of choice – whether you’re working in Final Cut Pro 7, Final Cut Pro X, Motion, or any of the old “Studio” applications, they’re designed to work best with ProRes. ProRes is also broadly supported across a range of other hardware and software, and Apple has become more liberal about licensing it to third parties.
Just like with ProRes acquisition, when editing ProRes you’ll have a variety of choices for quality and bitrate. If you’re converting footage from a highly compressed format like AVCHD or XDCam, ProRes422 (rather than ProRes422 HQ) is generally good enough. In many of these cases, ProRes LT is fine as well – it’s worth doing a little testing with your footage. ProRes Proxy should be used only for editing in situations where space is extremely limited – for example, if you want to do a rough cut using only the internal storage on your MacBook Air. Note: if you edit your movie on a tiny 11” laptop while crossing an ocean in an airplane, you’re well within your rights to say things like, “Whoa, the future is awesome.”
The equivalent to ProRes in the Avid ecosystem is DNxHD. It uses very similar technology to accomplish very similar goals, just with a different logo. Again, all things being equal, if you’re cutting in Avid, use DNxHD.
Adobe Premiere, an increasingly popular editing application, doesn’t have its own editing codec. While it has historically been associated with the Cineform codec, Premiere fully supports footage in both the ProRes and DNxHD formats without a substantial performance penalty.
Some editors choose to skip this whole issue and edit uncompressed, raw video data. Traditionally, this has imposed onerous requirements in terms of storage performance. But fast drives, Thunderbolt connectivity and the passage of time mean that even a relatively small and affordable RAID array is now capable of working with uncompressed footage. For shorter projects, and particularly effects-heavy projects, uncompressed is a realistic option.
Friday – 06/15/12
Okay, maybe we need to dig in a bit deeper. But in 2012 almost all delivery, whether on disc, via the web, or streaming, is via the H.264 codec. As we’ve seen, however, H.264 has a rich set of options and capabilities, some of which make sense for delivery and some of which don’t.
Delivery using a codec like H.264 is always via interframe compression. Because delivered video is generally watched linearly (from beginning to end) this interframe compression can be relatively aggressive. As we’ve discussed, interframe compression involves storing not just complete images, but also partial images and data about what changes from frame to frame. The group of related frames is called a GOP or Group Of Pictures. The longer the GOP, the more efficient the compression – but also the more complex the decoding. For acquisition formats like AVCHD, GOPs are generally under 16 frames. For delivery, GOP lengths of up to 150 frames are not unheard of.
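A rough Python sketch shows why GOP length matters. It assumes the simplest possible GOP: one complete image (marked "I") followed by frames that each depend on the one before (marked "P"). Real H.264 streams also use other frame types and reference patterns, so treat the numbers as an illustration rather than a description of any particular encoder.

```python
def frame_types(total_frames, gop_length):
    """Lay out a stream as repeating GOPs: a complete frame, then partial ones."""
    return "".join("I" if i % gop_length == 0 else "P"
                   for i in range(total_frames))


def frames_to_decode(frame_index, gop_length):
    """How many frames must be decoded to show one arbitrary frame.

    The decoder has to start at the complete frame that opens the GOP
    and work forward to the frame it actually wants.
    """
    return frame_index % gop_length + 1


print(frame_types(10, 4))

# Jumping to the last frame of a GOP is the worst case.
print(frames_to_decode(15, 16))    # short acquisition-style GOP
print(frames_to_decode(149, 150))  # long delivery-style GOP
```

In this simplified model, playing from start to finish costs the same either way, since each frame is decoded once. Jumping to an arbitrary frame, which editors do constantly, can mean decoding up to a full GOP first, which is one reason long GOPs suit delivery better than editing.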
Bitrates are often determined by the delivery medium – if you need to stream your content over a 3G cellphone connection, you’ll need to ensure your bitrate is well under the speed of the 3G network. High definition content can look very good at bitrates as low as 3 or 4 megabits per second.
The best recommendation when compressing content for distribution is to use quality compression software. Specialized compression apps, like Telestream’s Episode, Apple’s Compressor, or Adobe’s Media Encoder contain tuned encoders and a variety of presets. Unless you’re completely familiar with the ins and outs of the H.264 format, deviating from these presets will generally result in worse-looking video at higher bitrates.
Finally, keep in mind that some delivery platforms – whether they be TV networks or web applications – have very specialized requirements. Because these requirements may include limitations on signal levels or other elements relating to your actual content, it’s advisable that you clarify workflow questions early in the process.
WEEK 3: Initial Questions
Monday – 06/18/12
SHORT PROGRAM VERSUS LONG PROGRAM
There are all sorts of adages about the importance of pre-production. Good pre-production skills can make up for a multitude of sins later in the production process. And there tend to be a lot of sins.
This week, we’re going to cover some of the basic questions that need to get answered before a frame of video is shot.
We’ll start with what might seem completely obvious.
Is this short form content or long form content?
The answer to this question has ramifications throughout the production, including in the choice of camera and codec.
Again, this might seem obvious. If you’re shooting a 30-second commercial, it’s obviously short form content, right? Except when (halfway through the shoot) the client says, “Oh also, let’s get some extra footage because we’re actually thinking of making a half-hour documentary.”
One of the keys to successful production is common goals and shared vision. Many productions crash and burn because of assumptions about a shared goal, which turned out to not be quite so shared.
So, long form or short form? Why does it matter? As we discussed when covering acquisition codecs, some formats have very different storage profiles. Shooting with a very high bitrate camera makes sense if you’ll only be shooting a few minutes of footage. If you’re shooting a feature-length documentary, which often means shooting hundreds of hours of footage, you’re quickly looking at a “Quick! Run to the store and buy all the hard disks they have!” scenario.
Some cameras also require more babysitting than others – a fancy digital cinema camera like the RED makes a lot of sense for a shoot on a sound stage with a dedicated digital image technician (DIT). It makes less sense on a three-week trek through the Amazon with an underpaid PA.
Your camera and shooting ratio also affect how long you’ll spend ingesting video into your edit suite to begin post-production. RAW formats like those used by RED and ARRI (when optionally shooting ARRIRAW), and even H.264 footage from DSLRs and AVCHD cameras, can have long post-processing requirements. For many workflows you’ll want or need to convert your footage before you can begin editing, which can significantly extend your post-production timeline.
Different content lengths also allow for different on-set decisions. For example, say you’re doing a shoot in a commercial space, and there’s some signage in the background that you don’t want included. On a 30-second shoot, “fixing it in post” may be the most economical solution – some quick roto work on a clip that’s only a fraction of your overall commercial may actually take less time than negotiating to change the signage and then sending a grip to do the work.
Once you’ve actually made the decision about the length of the content, and you’re sure everyone is on the same page, you can start making choices based on that knowledge. But if you don’t have the conversation during pre-production, you’ll bear the consequences for the rest of the shoot.
Tuesday – 06/19/12
HOW WILL THE VIDEO BE DELIVERED?
Modern content is often viewed on everything from a 3-inch cell phone screen to a 65-inch LCD television. Choices made during the production process directly influence those viewing experiences.
While you rarely have complete control over the environment in which your content is being viewed, you can develop a simple priority list to optimize your shoot. If you’re shooting a webisode which will primarily be delivered via YouTube, you know an awful lot about the viewing environment. You know that the content will be viewed in a relatively small format, with a lot of compression, and likely in a less-than-pristine viewing environment.
How can you use that knowledge? Let’s break it down a bit further.
Size. If the primary viewing experience will be a small video window on a webpage, you shouldn’t compose shots that require viewers to notice minute elements. Don’t expect the viewer to read the handwritten note that a character is writing – find a different way to tell that part of the story.
Compression. While web video is a lot better than it was even a few years ago, it’s still more compressed than a typical BluRay or broadcast experience. You can make simple choices throughout the production process to account for that. For example, avoid highly random images – images which look like “noise” or “snow.” That gorgeous shot of rippling water in the sunset is going to become a flickering compression artifact, which doesn’t quite convey the same emotion. Someone viewing your content over a cellphone connection will likely see an even more heavily compressed version – will they be able to understand the core message you’re trying to convey, even if they’re losing lots of fine detail? Remember that you often don’t control the compression process for web video, so you need to make choices based on worst-case scenarios.
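If you want to see why “noise” is so expensive, here is a rough analogy. It uses Python’s zlib, a general-purpose lossless compressor, not a video codec, so the numbers only illustrate the principle: predictable data shrinks to almost nothing, while random data barely shrinks at all. The buffer size and seed are arbitrary.

```python
import random
import zlib

random.seed(0)        # fixed seed so the run is repeatable
size = 100_000        # arbitrary number of "pixels"

flat = bytes([128]) * size                                   # a uniform grey image
noise = bytes(random.randrange(256) for _ in range(size))    # pure "snow"

flat_size = len(zlib.compress(flat))
noise_size = len(zlib.compress(noise))

print(f"flat image:  {size} bytes -> {flat_size} bytes")
print(f"noisy image: {size} bytes -> {noise_size} bytes")
```

A video encoder with a fixed bit budget can’t let the file grow like that, so it throws detail away instead, and that is the flicker you see in the rippling water.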
Viewing environment. Chances are pretty good that your webisode will be viewed in an office environment. Because the viewer is supposed to be Maximizing Shareholder Value rather than watching your video, they won’t want to turn their speakers up too loud to hear your audio. So minimize the amount of dynamic range – the difference between the loudest elements and the softest elements. By controlling your dynamic range, you ensure that your viewer isn’t constantly turning the volume up to hear a whisper, and then turning it down to avoid startling the whole office during an explosion.
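Dynamic range is usually quoted in decibels. As a small sketch, assuming you have linear amplitude levels for your loudest and softest passages (the sample values below are made up), the conversion looks like this:

```python
import math

def dynamic_range_db(loudest: float, softest: float) -> float:
    """Ratio between two linear amplitude levels, expressed in decibels."""
    return 20 * math.log10(loudest / softest)

# Made-up amplitude levels on a 0.0 to 1.0 scale, for illustration only.
wide = dynamic_range_db(1.0, 0.001)    # explosion versus whisper
narrow = dynamic_range_db(1.0, 0.1)    # the same mix after taming the extremes

print(f"wide mix:   {wide:.0f} dB")
print(f"narrow mix: {narrow:.0f} dB")
```

The smaller the number, the less your viewer has to ride the volume knob.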
Other deliverables have their own requirements throughout the production process. Make sure you’ve got as many details as possible early on, so that you can deliver the best possible experience for your most important viewers.
Wednesday – 06/20/12
HOW MUCH CONTENT ARE YOU DEALING WITH?
In the days of film- or tape-based acquisition, it was standard practice to keep a tape log during the shoot. Start and end times for every take would be recorded, along with notes about the shot, performance, and whether or not it was a “keeper.” File-based acquisition has, for many shooters, made logging a thing of the past. Since you don’t have to scrub back and forth during ingest, marking in- and out-points and waiting for lengthy batch captures, the logging process often ends up taking place inside the NLE, once all the clips from the day have been imported.
This is generally pretty stupid.
While there certainly are “run and gun” sorts of shoots in which logging isn’t an option, this is rarely the case. The multitude of apps for tablets and smartphones to simplify the logging process removes a lot of the excuses. And the reality is that good logs pay dividends for the rest of the shoot.
Using your log, you can make sure that you’re only importing potentially usable shots into your editing application. Why waste the disk space, screen space, and mental bandwidth storing the shot in which the gaffer knocked a light over in the background? Logs also allow you to quickly make changes, even late in the production process – if a shot isn’t quite working, you’ll be able to quickly find similar takes or related setups.
There’s a temptation, throughout the production pipeline, to assume that because storage is so affordable, it can be treated as if it were infinite, so there’s no harm in storing more than you need. While the cost of those bits on disk may be low, the cost of the editor’s time sorting through a folder with 15 different Photoshop composites of the title graphic is not. Keep your post-production storage organized, clean, and minimalist.
One caveat though – many editors prefer to have all the footage loaded. What may be a disaster of a take could have that one glance they need for a quick cutaway. Regardless, having clear logs and good organization of the footage is a must. As with everything else, ask your editor how they like things organized early in the planning stages.
Thursday – 06/21/12
HIGH QUALITY, FAST OR CHEAP? PICK TWO. OR MAYBE JUST ONE.
As we’ve discussed, shared expectations, agreed upon during pre-production, are a huge determinant in the overall success of a shoot. If your budget is all the money in the world, you can afford to get a lot of very high quality content produced very quickly. If your budget is very limited, you may still be able to produce high quality content, but it’s going to take a lot more time, hustling and scrounging. Or perhaps a quick turnaround is the most important thing.
Video professionals tend to be snobs. It’s okay, we can admit it. We don’t want to make bad things. But sometimes “perfect” is a lot less important than “now.” A commercial for local television advertising an upcoming 4th of July sale is much more useful if it’s slightly unpolished and ready to go on June 20th, rather than pristine and artful, but not ready until mid-August.
It’s often difficult for us to accept that – and obviously having your name or business attached to a truly awful piece of content can have consequences. By understanding the goals of the production, and setting clear expectations, you’ll avoid unpleasant conflict late in the process.
This fast/cheap/gorgeous trichotomy expresses itself in smaller ways throughout production. Compression software, file distribution, etc. all require that you make choices based on which components are most valuable in light of the goals of the production.
Friday – 06/22/12
HOW KNOWLEDGEABLE IS YOUR CLIENT?
It’s easy for creative professionals to forget that their clients see the work of the creative professional as a means to an end – advertising a business, sharing a story, distributing training materials, etc. Their core knowledge and passions rarely overlap with those of the creative professional. Understanding your client’s level of knowledge about the production process, as well as their interest in being familiar with the process, helps you make better decisions and reduces friction.
Some clients are interested in knowing about the inner workings of the production process, while others simply want a quality deliverable. Whatever the case, your client should be the one guiding that relationship. If your client watches an occasional movie, but has never shot a frame of video, asking them whether they want the video shot interlaced or progressive isn’t helpful to anyone. Talking high level about whether they want more of a “filmic” look or a “tv” look provides another way to address the same issues.
One of the most poisonous elements of a bad relationship with a client is when the creative professional intentionally attempts to overwhelm the client with technical information that they’re not equipped to handle. Sometimes this is an attempt to get the client to “get out of the way” – if you ask them about which gamma curve they want and whether they want any knee compensation, they’ll feel stupid, go away and let you make the choices. In reality, this makes the relationship toxic and hurts the industry as a whole.
WEEK 4: Typical Workflows
Monday – 06/25/12
DVD AND BLURAY ENCODING
This week, we’ll get into some specifics for delivering in different formats and technologies. We’ll start with one of the last bastions of physical distribution, the optical disc.
While DVDs are on their way out, there’s a long tail, and they still make a lot of sense for certain types of content. DVDs use MPEG-2 video encoding, with a variety of audio options. Although it is an older format, and not as sophisticated as H.264, MPEG-2 still performs well. Because the DVD format has been in common use for well over a decade, encoding technology has had time to evolve to make the most of the format.
The difference between a bad DVD encoder and a good one can be pretty startling. Your choice of encoder will be impacted by a variety of factors, including platform, budget, and the software you’ll be using to do the rest of your DVD mastering (creating menus, burning the actual discs, etc.). For example, if you’re planning on using Apple’s iDVD, you’ll need to rely on their encoder, as iDVD automatically compresses all media. Conversely, apps like DVD Studio Pro or Adobe Encore can work with either raw MPEG-2 streams, or media in other formats like ProRes.
For mastering content for delivery (rather than just putting a rough cut on a disc for a quick screening), the investment in a quality encoder is well worth it. Applications like BitVice and Scenarist are able to squeeze startling quality out of a standard definition signal. Particularly when your content originated in HD, having a smart encoder to handle the downconversion and compression is very important. Higher-end encoders also automatically detect scene changes, to more efficiently use the limited bits available. General purpose commercial compression solutions like Telestream Episode or Sorenson Squeeze include DVD encoding, along with a variety of other formats, and are a great addition to any content producer’s toolbox.
BluRay encoding is fundamentally very similar to DVD, though the BluRay format is far more advanced. BluRay discs can use a variety of compression technologies, though H.264 is the most common. BluRay discs also have enhanced menu and interaction options available to content producers. Just like DVD, purchasing a quality encoding application is key to quality output.
In the case of both BluRay and DVD, high-end productions are best served by outsourcing the encoding process. Encoding is as much art as it is science, so leaving it to specialists is well worth the cost.
Tuesday – 06/26/12
Web delivery has dramatically simplified over the last year or two. Whereas web delivery had previously consisted of encoding in two or three formats (perhaps at different bitrates) – then switched primarily to encoding in the technically-inferior but widely-available Flash video format – modern web distribution is almost exclusively via H.264. H.264 content can be played in the browser via a variety of mechanisms: the Flash plugin, third party H.264 plugins, or browser-native HTML5 video players.
When encoding H.264 content for the web, there are a few important tips to keep in mind. As with all of the delivery methods we’ll cover in this series, using a quality encoder is the first step. Web delivery is all about delivering the most quality in the least bits; a quality, well-tuned encoder is key. Many of the encoders included with NLEs or operating systems cover up their technical deficiencies by using lots of extra bits. Unless you’re encoding large volumes of content (on an industrial scale), encoding speed should be a relatively low priority.
Devices viewing web content are essentially always natively progressive, so delivering progressive content is important. If your content originated interlaced (older SD content, 1080i60 content, etc.) the deinterlacing process can be “make or break” – bad deinterlacing will simply blur together the two fields of the video signal. This means throwing away a huge amount of resolution. Smart deinterlacing analyzes the actual content of the signal to better tune the delivery. While very time-intensive, the difference in results is well worth it. If the deinterlacer in your encoding software doesn’t at least double the encode time, it probably isn’t very good.
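Here is a toy sketch of the difference, using tiny frames stored as lists of rows of brightness values. The function names, the threshold, and the motion test are all our own simplifications. Real deinterlacers are far more sophisticated, but the shape of the tradeoff is the same: the blend is cheap and destroys vertical detail, while the adaptive version has to look at every pixel and compare it against the previous frame.

```python
def blend_deinterlace(frame):
    """Naive approach: average each pair of adjacent rows (one from each
    field) and repeat the result, which halves the vertical resolution."""
    out = []
    for y in range(0, len(frame) - 1, 2):
        avg = [(a + b) // 2 for a, b in zip(frame[y], frame[y + 1])]
        out.extend([avg, list(avg)])
    return out

def adaptive_deinterlace(frame, previous, threshold=16):
    """Toy motion-adaptive approach: keep pixels that have not changed since
    the previous frame, and rebuild changed bottom-field pixels from the
    top-field rows above and below them."""
    out = [list(row) for row in frame]
    last = len(frame) - 1
    for y in range(1, len(frame), 2):                # bottom-field rows
        below = y + 1 if y + 1 <= last else y - 1    # stay inside the frame
        for x in range(len(frame[y])):
            if abs(frame[y][x] - previous[y][x]) > threshold:
                out[y][x] = (frame[y - 1][x] + frame[below][x]) // 2
    return out

# A static shot with fine horizontal stripes (alternating bright and dark rows).
static = [[255, 255], [0, 0], [255, 255], [0, 0]]

blended = blend_deinterlace(static)
adaptive = adaptive_deinterlace(static, previous=static)

print("blend:   ", blended)
print("adaptive:", adaptive)
```

On the static shot, the blend turns the stripes into flat grey, while the adaptive version leaves every row alone because nothing moved.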
Multipass encoding is another area where the time/quality tradeoff is well worth making. Multipass encoding is, as the clever reader might guess, an encoding process that involves looping over the content multiple times. During the first pass, the encoder makes a “map” of the content, figuring out which sections are very complicated or noisy, and which are less complicated. It can then distribute bits throughout the file more efficiently.
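As a rough sketch of that idea (not how any particular encoder does it), the first pass below scores each segment with a made-up complexity metric, and the second pass shares a fixed bit budget in proportion to those scores. The segment data, metric, and function names are all hypothetical.

```python
def measure_complexity(segments):
    """First pass: score each segment. This toy metric sums the differences
    between neighbouring samples, so busy or noisy segments score high.
    The +1 keeps perfectly flat segments from getting zero bits."""
    return [sum(abs(b - a) for a, b in zip(seg, seg[1:])) + 1 for seg in segments]

def allocate_bits(complexities, total_bits):
    """Second pass: share the bit budget in proportion to complexity."""
    total = sum(complexities)
    return [total_bits * c / total for c in complexities]

# Two hypothetical segments: a locked-off flat shot and a busy one.
segments = [[10, 10, 10, 10], [0, 50, 5, 60]]

complexities = measure_complexity(segments)
budget = allocate_bits(complexities, total_bits=1520)

print("complexity:", complexities)
print("bits:      ", budget)
```

A single-pass encoder has to guess at this split as it goes. With the map in hand, the encoder can starve the easy shot and spend the savings where they show.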
Good encoders will also provide you a variety of filtering options to remove noise and other artifacts from the video. These settings are more subjective, depending on the content, but slightly smoothing noise before encoding generally allows the encoder to be more efficient.
As we’ve previously mentioned, unless you’re comfortable with all of the ins and outs of compression technology, stick with the presets your encoder provides.
Wednesday – 06/27/12
Modern mobile delivery is very similar to web delivery, with a few extra caveats.
Mobile devices, whether they be iPhones, Android devices or others, generally play H.264. However, these devices have a few constraints. First, when playing video over the cellular network, bandwidth is often substantially lower than that available on a broadband connection. In many areas, you’re lucky to see 1 megabit per second – whereas home broadband connections are often in the 8-12 megabit range. That said, 1 megabit is still more than enough to deliver quality results.
Mobile devices also have smaller screens and slower CPUs than traditional computers. Well-encoded SD-resolution video tends to look great on a mobile device. Due to CPU and power constraints, some mobile devices are also limited to certain subsets of H.264. For example, the iPhone 3GS – which is still one of the most popular smartphones – is limited to “baseline” profile content, which is a less sophisticated subset of the format.
Most mobile devices have a monthly allotment of bandwidth on the cellular network. If you’re delivering long form content that you expect users will be viewing over their 3G connections (field training material for example), sacrificing a bit of quality in exchange for lower bitrates is a nice courtesy to your users.
There are a variety of specialized delivery mechanisms in use for mobile content as well. Apple has created a format (proposed as an open standard) called HTTP Live Streaming, which aims to make streaming H.264 (either live or pre-recorded) more robust. Streams can automatically switch between bitrates as network conditions change. Seeking is more fluid, and providers can easily insert advertisements and other content in preexisting streams. HTTP Live Streaming is a relatively complicated standard to implement well, so working with a specialist is important.
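To give a feel for the bitrate switching, here is a minimal sketch. A master playlist lists the same content at several bitrates, and the player picks among them as conditions change. The #EXTM3U and #EXT-X-STREAM-INF tags are part of the format; the bitrates, resolutions, file paths, function names, and the 80% headroom rule are our own assumptions. In practice the player’s own logic makes this decision, not your code.

```python
# (bandwidth in bits per second, resolution, playlist path); all hypothetical.
VARIANTS = [
    (400_000, "480x270", "low/index.m3u8"),
    (800_000, "640x360", "mid/index.m3u8"),
    (1_500_000, "1280x720", "high/index.m3u8"),
]

def master_playlist(variants):
    """Build the text of a master playlist that lists every variant."""
    lines = ["#EXTM3U"]
    for bandwidth, resolution, uri in variants:
        lines.append(f"#EXT-X-STREAM-INF:BANDWIDTH={bandwidth},RESOLUTION={resolution}")
        lines.append(uri)
    return "\n".join(lines) + "\n"

def pick_variant(variants, measured_bps, headroom=0.8):
    """Pick the highest-bitrate variant that fits within a fraction of the
    measured throughput, falling back to the lowest one if nothing fits."""
    usable = measured_bps * headroom
    fitting = [v for v in variants if v[0] <= usable]
    if fitting:
        return max(fitting, key=lambda v: v[0])
    return min(variants, key=lambda v: v[0])

print(master_playlist(VARIANTS))
for bps in (300_000, 1_100_000, 5_000_000):
    print(bps, "->", pick_variant(VARIANTS, bps)[2])
```

This only covers the top-level playlist. Segmenting the video, keeping the variants aligned, and hosting it all correctly is where the specialist earns their fee.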
A quality encoder with tuned presets is key.
Thursday – 06/28/12
It may come as a surprise, but broadcast delivery is often relatively straightforward and simple. Generally, the content producer will be beholden to the requirements of the broadcaster, but fortunately these requirements are generally clearly defined and map well to the production process. For example, many broadcasters will accept file-based delivery in a format like ProRes or DNxHD. For the content producer, it can be as simple as a straight export from the NLE.
While it is less common than it has been in the past, tape-based delivery is still fairly prevalent. This is generally via a format like HDCam. Getting your content onto an HDCam tape usually involves renting a deck and connecting it via HD-SDI to your capture card. There are also facilities that will take care of the process for you – you supply a file, they return a tape. If you’re not already equipped for HD-SDI output from your NLE (and you don’t expect to be doing these types of layoffs frequently), relying on a third party often makes the most sense.
Some broadcasters also have requirements about how content is acquired. For example, the BBC has very specific requirements about cameras and formats that are acceptable for acquisition. These standards may be more or less rigidly enforced depending on the type of content and the broadcaster.
Broadcast delivery standards are generally very specific about all the elements of your content, including video levels, audio levels, durations, etc. Read the specification carefully, and be prepared to fight for your content – standards often have some room for flexibility.
Keep in mind that your content will almost always be encoded in some way during the broadcast process. Generally it’ll be encoded a few times – compressed to send to a satellite, then recompressed for digital broadcast, for example. Many producers have had the experience of delivering a pristine master to a broadcast facility, then tuning in to watch only to find an overly compressed, oversaturated bastardization being sent to the world. These distribution steps are often aggravated by improper signal levels (black point, white levels, gamut excursions) – so even if the broadcaster has lax QC requirements, finishing your project to standard levels will result in a better end product. Learning how to color correct to a set of video scopes (we’re partial to our application ScopeBox, for obvious reasons) will pay dividends on your project’s image quality when passing through broadcast equipment you can’t control.
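As a simple illustration of what a level check does, the sketch below flags 8-bit luma samples that fall outside the conventional studio range of 16 to 235. The sample values and function name are made up, and your broadcaster’s delivery specification (not this snippet) is the authority on what counts as legal. A scope shows you the same thing graphically, across the whole frame, in real time.

```python
LEGAL_MIN, LEGAL_MAX = 16, 235   # conventional 8-bit studio-range luma limits

def out_of_range(luma_samples):
    """Return (index, value) pairs for samples below black or above white."""
    return [
        (i, v) for i, v in enumerate(luma_samples)
        if v < LEGAL_MIN or v > LEGAL_MAX
    ]

# Made-up luma values from one line of video, for illustration only.
line = [16, 8, 128, 240, 235]

excursions = out_of_range(line)
print(f"{len(excursions)} of {len(line)} samples out of range: {excursions}")
```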
Friday – 06/29/12
Digital theatrical distribution takes on a variety of forms. Many smaller theatres (indie cinemas, etc.) utilize systems very similar to a normal home theatre – an HD projector attached to a BluRay player. Delivery in these cases is pretty straightforward.
Larger theatres are generally based around a set of standards published by Digital Cinema Initiatives (DCI). These theatres work with content in a Digital Cinema Package – a set of files with video and audio content. The format of choice for these is JPEG2000 – an intraframe format that delivers pristine quality at the expense of relatively high bitrates. JPEG2000 also has a wide range of colorspace and sampling depth options. Applications like easyDCP and FinalDCP can help with creating these packages, but seeking expert advice is recommended. Theatrical playback will quickly expose any deficiencies in your content, so having an experienced set of eyes and ears to help with the mastering process is important.