Google I/O 2013 – WebM and the New VP9 Open Video Codec


MATT FROST: Welcome to the first
session on WebM and the New VP9 Open Video Codec. We figured that there’s no better way to add a little excitement to a presentation than to change it at the last minute, and so what we’ve spent
this morning doing is encoding some VP9 video and
H.264 video and putting together a side by side
demonstration just to give you a taste of what we’re
working on. So what you’re going
to see is a video. The video is going to be the
same on either side. It’s going to be VP9, the
new codec on the left, H.264 on the right. And H.264, we used the X264 open
video encoder, which is commonly regarded as the
best encoder out there. We used the highest
possible settings. So we’ve done everything we
can to favor H.264 here. All of this is at the same
data rate, so both of the videos are going to be at
the same data rate. And the bit rate varies. In some cases, we’re
using 500K. In other cases, we’ve dropped
the bit rate down to bit rates that are actually banned by
certain UN conventions for the compression of HD video. And so with that, I think that’s
everything, Ronald? RONALD BULTJE: Yes. So like Matt said, what you’re
looking at here is shots that we just took this morning. We’ve encoded those in just a
couple of hours and basically, what you’re looking at here,
on the left, VP9 and on the right, H.264, is what an amazing
job we can actually do at video compression if we’re
using the very latest technologies. MATT FROST: So you can see the
blockiness on the right. On some of this, it’s a lot more
evident than others, and especially evident, if you want
afterwards to come up and take a look at this running
on the screen, we can freeze frames. But you see there on the right
especially, all this blockiness and how much
it clears up as it moves into VP9 territory. RONALD BULTJE: And a point here
really is that for high definition video, H.264 can do
a reasonable job, but we can do a lot better than that. And so having said that, let’s
actually get started on the presentation. MATT FROST: So the way that
we’re going to handle this presentation is I’m going to do
a quick introduction on why we care about open video,
both why Google– which has historically been involved with developing applications around video– has gotten so deeply into
actually helping work on these next generation compression
technologies. After we talk about that and
why, in general, improving video compression is good for
everybody, I’m going to turn it over to Ronald for really the
meat of this presentation, which will be to show you some
more demonstrations, to talk a little bit about how we measure
video quality, talk about some of the techniques
that we’re exploiting to really make this dramatic
improvement in compression. And then finally, after you’ve
seen this, and I hope that you’ve started to get a little
excited about what this technology can do for you, we’ll
go and talk about the last stages, how we’re going
to wrap up this project and how we’re going to get these
tools into your hands as quickly as possible. So to start off with, just
taking a quick look at how Google got into video. Video at Google started in the
same way that so many big projects at Google start,
as an experiment. And we launched these efforts
with just a single full time engineer and a number of
engineers working 20% of their time on video, really focusing
on video-related data. And then over the last 10 years,
obviously, video at Google has exploded, not only
with YouTube but with Google Talk, Hangouts, lots of
applications where you wouldn’t necessarily think of
video as playing a core role, like Chromoting, which is Chrome
Remote Desktopping. But if you look at the really
motivating factors for getting into video compression, there
are a couple that are really of note. One, of course, is the
acquisition of YouTube. And with the acquisition of
YouTube, we all of a sudden started to focus very heavily
on both improving the experience for users, improving
video quality, but also about the costs associated
with all aspects of running a service
like YouTube. There are costs associated
with ingest, transcode of video formats, storage of
multiple different formats, and then distribution of the
video, both to caches and to the edge, and ultimately
to users. The second was the move from
HTML4 to HTML5, which came at the same time, pretty much,
as our launch of Chrome. And of course, in HTML4,
although to the user, it appeared that video could be
supported in a browser, in fact, video was supported
through runtimes and plug-ins. With HTML5, video becomes a
native part of the browser. And so with the move towards
HTML5, we see it filtering through the addition of the
video tag in Chrome and the launch of HTML5 video
for YouTube. So these are the two factors– the focus on quality and
reducing cost with YouTube, the need to build a high quality
codec into Chrome and other browsers for
the video tag– that sparked the acquisition in
2010 of On2 Technologies, the company that I came from and
many members of the WebM team came from, and the launch
of the WebM project. The WebM project is an effort
to develop a high quality, open alternative
for web video. We’re very focused on web
video, not on video for Blu-ray discs, not on video
for cable television, but about solving the problems that
we find in web video. In addition, we’re very focused
on having an open standard because we believe that
the web has evolved as quickly as it has because it is
based on open technologies. And clearly, multimedia
communication has become such a core part of how we
communicate on the web that we need open technologies that are
rapidly evolving to allow us to keep pace and to make sure
that we can develop the next generation of killer
video applications. We wanted something
simple as well. So we used the VP8 Open Codec,
the Vorbis Open Audio Codec, which was a long existing open
audio codec, and then the Matroska File Wrapper. With the launch of VP9 in a
matter of months, we’re going to be adding the VP9 Video Codec
as well as the brand new Opus Audio Codec, which is
another open audio codec, very performant and high quality. So since our launch, obviously,
web video has continued to grow. And if we just look at what we
know very well, which is YouTube, YouTube has grown to
be a global scale video platform capable of serving
video across the globe to these myriad connected
video enabled devices that we’re all using. It supports a billion monthly
users, and those users are looking at video four billion
times a day for a total of six billion plus hours of video
viewed monthly. Just to think about that number,
that is an hour of video for every person on the
planet consumed on YouTube. And on the creation
side, we’re seeing exactly the same trends. 72 hours of video is uploaded
per minute, and that video is increasingly becoming
HD video. So if you look at the graph
on the right, blue is 360p standard definition video, which
is slowly declining, but quickly being matched by
uploads of HD video. And the key here of great
importance is that HD video is obviously more complex. There’s more data for a given
HD video than there is for– unless, of course, you’re
encoding it in VP9– than there is for a standard
resolution video. In addition, I think we can all
agree that the better the video is, the higher the
resolution, the more watchable it is. And then finally, the other
trend that’s driving both creation and consumption is the
increase in mobile devices and the move towards
4G networks. So even this morning, there was
an article when I woke up and was checking my email saying
that YouTube video accounts for 25% of
all downstream web traffic in Europe. And I think BitTorrent
accounted for 13%. So there alone, between just two
web video services, we’re looking at close to 40% of all
web data in Europe being video related data. And that accords with what we
see from the latest Cisco forecasts, for instance, which
is that consumer web video is going to be close to 90% of all
consumer data on the web within the next three years. So it’s remarkably encouraging
to see the growth in video, but it also represents
a real challenge. Of course, the good news is that
we have a technology that is up to this challenge,
and that is VP9. With next generation video
codecs, with the codecs as good as VP9, we can effectively
significantly increase the size of the
internet and we can significantly increase the
speed of the internet. So obviously, if you’re
taking VP9– which, as Ronald will say,
halves the bit rate you need for the very best H.264
to deliver a given quality video– you’re going to be able to
speed the download of a download-and-play video,
you’re going to be able to speed, obviously, the buffering
of these videos. So we have the tools to
effectively dramatically increase the size
of the internet. But of course in doing that,
in improving the video experience, in improving the
ability to upload video quickly, we’re going to just
create the conditions for even more consumption of video. And so it’s not going to be
enough for us to rest on our laurels with VP9. We’re going to have to keep at it, keep on pushing the boundaries of
what we’re capable of with video compression. So with that, I’m going to turn
it over to Ronald to show you some really remarkable
demonstrations of this new technology. RONALD BULTJE: Thank you. So to get started, I just
briefly want to say some words about video quality. So how do we measure quality? Well, the most typical way to
measure quality is to just look at it, because at the end
of the day, the only thing that we care about is that the
video that you’re looking at looks great to your eyes. But that’s, of course, not all
there is to it because as we’re developing a new video
codec, we cannot spend our whole day just watching YouTube
videos over and over and over again. That would be fun, though. So in addition to visually
analyzing and inspecting video, we’re also
using metrics. The most popular metric in the field for measuring video quality is called PSNR. It stands for Peak Signal-to-Noise Ratio. And the graph that you’re looking at here on the left is a typical representation, with PSNR on the vertical axis and video bit rate on the horizontal axis, to give you some sort of feeling for how those two relate.
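For intuition, PSNR is derived from the mean squared error between the original and the compressed frame. The tiny sketch below uses a made-up flat test frame; real tools compute this per plane over actual decoded video.

```python
import math

def psnr(reference, distorted, peak=255.0):
    """Peak Signal-to-Noise Ratio, in dB, between two equal-size frames."""
    mse = sum((r - d) ** 2 for r, d in zip(reference, distorted)) / len(reference)
    if mse == 0:
        return math.inf  # identical frames
    return 10.0 * math.log10(peak ** 2 / mse)

ref = [128.0] * 256                # a flat 16x16 gray frame, flattened
noisy = [p + 1.0 for p in ref]     # a uniform error of 1 -> MSE = 1
print(round(psnr(ref, noisy), 2))  # -> 48.13
```

Higher is better; a 6 dB gain corresponds to halving the root-mean-square error.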
So the obvious thing to note here is that as you increase the bit rate, the video quality, as measured by this metric, increases. So at the end of the day, what that means is that it doesn’t really matter what codec you use: as long as you have infinite bandwidth, you can accomplish any quality. However, our goal is to make it easier, faster, and simpler to stream video. So how does PSNR actually
compare to visual quality? So for that, there’s
a sample clip. So what you’re looking at here is a highly detailed shot of the New York skyline. I believe that this is the
Empire State Building. And this clip has a lot of
detailed textures all across. So what we’ve done here is
that we’ve encoded it at various bit rates, and then
every couple of seconds, we’re dropping the bit rate and the
metric quality of the video will slowly decrease. So this is 45 dB, and what you see as we drop to 30 dB is that some of the detail, or actually a lot of the detail, in the backgrounds of the buildings just completely disappears; that was already the case at 35 dB. As you go to 25 dB, you can
see– we can go really low in quality, but you do not
want to watch this. Here’s a different scene. Same thing, we start with
the original 45 dB. 40 dB looks pretty good. 35 dB starts having a lot of
artifacts, and then 30 and 25 are essentially unwatchable. So what does that mean
for video quality? Well, the typical target quality
for high definition video on the internet
lies around 40 dB. You were just looking at the video, and at 40 dB it looked really quite good. So if you go to YouTube and
you try to stream a 720p video, that’s actually about the
quality that you will get. In terms of bit rate, what you
should expect to get is a couple of megabits a second. For this particular clip, that’s
one to two megabits a second, but that’s very source
material dependent. So what we’ve done, then, is we
have taken, I think, about 1,000 Creative Commons licensed YouTube
uploads, just randomly selected from whatever users
give us, and we’ve then taken out particular material that
we’re not really interested in, such as stills or video
clips that contain garbage video content. And then we were left with, I
think, about 700 Creative Commons licensed YouTube uploads, and we’ve
encoded those at various bit rates– so at various
quality settings– with our VP9 Video Codec or
with H.264 using the X264 encoder at the very best
settings that we are aware of. Then for each of these clips,
we’ve taken the left half of the resulting compressed file
and the right half of the 264 one and we’ve stitched those
back together, and then you essentially get what you’re
looking at here. So left here is VP9, right is
264, and those are at about the same bit rate. You will see graphs here on the
left and on the right, and those are actually the effective
bit rate for this particular video clip. And as you can see, it starts
being about equal. Now, you saw it just jumping up,
and that’s because we’re gradually increasing the bit
rate to allow the 264 encoder to catch up in quality. And as you can see, it slowly,
slowly starts looking a little bit better. And at this point, I would say
that it looks about equal on the left and on the right. But if you look at the bit rate
graphs, you can basically see that we’re spending about
two and a half times the bit rate on a 264 file versus
the VP9 file. So those are the compression
savings that you can get if you do same quality encodings
but you use VP9 instead of 264. So what you’re looking at here
is a comparative graph for the clip that you were
just looking at. The blue line is the 264 encoded
version and the red line is the VP9 encoded
version. And as I said in the beginning,
vertical axis is PSNR as a metric of
quality, and the horizontal axis is bit rate. So the way that you compare
these is that you can pick any point from the red line– or from the blue line,
for that matter– and then you can
do two things. Either you can draw a vertical
line and find the matching point on a blue line that
matches the points on the red line that you’re looking for
and look at what the difference in quality is. But what we usually do is the other way around: we draw a horizontal line from the point on the red graph, and we find the point on the blue graph that matches it.
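That horizontal-line comparison can be sketched in code: given two rate-distortion curves as (bitrate, PSNR) points, interpolate the bitrate at which each curve reaches a target quality. The curve data below is hypothetical, loosely modeled on the numbers quoted in this talk, not actual measurements.

```python
def bitrate_at_quality(curve, target_db):
    """Linearly interpolate the bitrate (kbps) at which a rate-distortion
    curve -- sorted (bitrate_kbps, psnr_db) points -- reaches target_db."""
    for (r0, q0), (r1, q1) in zip(curve, curve[1:]):
        if q0 <= target_db <= q1:
            t = (target_db - q0) / (q1 - q0)
            return r0 + t * (r1 - r0)
    raise ValueError("target quality is outside the measured curve")

# Hypothetical RD points for one clip:
vp9  = [(100, 33.0), (200, 36.0), (328, 37.1), (600, 39.0)]
h264 = [(200, 32.0), (400, 34.5), (700, 36.5), (900, 37.7)]

ratio = bitrate_at_quality(h264, 37.1) / bitrate_at_quality(vp9, 37.1)
print(round(ratio, 2))  # -> 2.44: roughly the 2.5x savings described here
```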
And what you’re looking at here is that for the point that we were just looking at– a quality metric point of about 37.1 dB– the VP9 version takes an average of 328 kilobits a second to reach that quality, and for H.264, you need to go up to essentially 800 kilobits a second to get exactly the same quality. So what that means is, again,
the metrics tell us you can get a two and a half times lower
bit rate and effectively get the same quality by using
VP9 instead of 264. If you look to the higher end
of the graph, you will see that the differences in quality
for the same bit rates might go slightly down, but
that’s basically just because at the higher end, there are diminishing returns for bit rate. So if you look at the high ends
of both of those graphs and you do the horizontal line
comparison, so what is the different bit rate that
accomplishes the same quality? You will see that it about
comes down to 2x over the whole graph. So let’s look at the difference
video because I could just be cheating you with
this one video and we could have optimized our codec
for this one video. So what you’re looking at here
is, again, the same thing, VP9 on the left, 264 on the right,
live bit rate graphs and we start at the same bit rate. Then as we do that, we’re slowly
increasing the bit rate for the 264 portion video so
that it can actually catch up in quality. And what you’re looking at is
that on the right, the floor is pulsing a lot. You can actually see, if you
focus on the pants of little boy here or on the plastic box,
that it’s very noisy. But eventually, it catches
up in quality. Guess what happened
to the bit rate? It’s almost 3x for this
particular video. So here is the [INAUDIBLE] graph
for the material that we were just looking at. The red line is VP9, the
blue line is H.264. And if we do the same quality
different bit rate comparison at the point that we were just
looking at, which is about 38.6 dB, for VP9, you arrive
at about 200 kilobits a second, and for H.264, you need
to interpolate between two points because we don’t have
an exact match, and it ends up being around 550
kilobits a second. So it’s almost 3x more bit rate to accomplish the same quality; bit rate that you can save just by using VP9. So we’ve done this over
many, many clips. I told you we had about 700
clips that we tested this on at various bit rates and various
quality settings, and overall, you can save 50%
bandwidth by encoding your videos in VP9 instead of H.264
at the very best settings that we are aware of. So how did we do this? So let’s look a little bit at
the techniques that we’re using to actually get to this
kind of compression efficiency. So a typical video sequence
consists of a series of video frames, and then each of
these video frames consists of square blocks. So for current generation video
codecs, like H.264, these blocks have a size of
a maximum 16 by 16 pixels. We’ve blown this up a lot. We have currently gone up to
64 by 64 pixels for each block, and then at that
point, we introduce a partitioning step. And in this partitioning step,
we allow you to do a vertical or horizontal partitioning,
a four-way split, or no partitioning at all, resulting
in different size sub-blocks. If you do a four-way split and
you have four 32 by 32 blocks, then for each of these blocks,
you go through the same process again of horizontal,
vertical split, four-way split, or no split at all. If you do the four-way split,
you get down to 16 by 16 pixels, do the same thing
again to get to eight by eight, and eventually
four by four pixels. So what this partitioning step
allows you to do is to break up the video in such a way that
it’s optimized for your particular content. Material that has a very stable motion field can use very large blocks, whereas for content where things are moving all over the place all the time, you can go down to very small blocks.
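The recursion just described can be sketched as follows. The decision function here is a stand-in for the encoder’s actual rate-distortion search, and real VP9 has extra rules (for example, rectangular 4x8/8x4 splits at the 8x8 level) that this sketch omits.

```python
def partition(x, y, size, decide):
    """Yield (x, y, width, height) leaf blocks of a superblock at (x, y).

    Each block is left whole, split horizontally, split vertically, or
    split four ways; only the four-way split recurses, down to 4x4.
    """
    choice = decide(x, y, size)
    if choice == "none" or size == 4:
        yield (x, y, size, size)
    elif choice == "horz":
        yield (x, y, size, size // 2)
        yield (x, y + size // 2, size, size // 2)
    elif choice == "vert":
        yield (x, y, size // 2, size)
        yield (x + size // 2, y, size // 2, size)
    else:  # "split": recurse into the four quadrants
        half = size // 2
        for dy in (0, half):
            for dx in (0, half):
                yield from partition(x + dx, y + dy, half, decide)

# Toy policy: split four ways down to 16x16, then stop.
leaves = list(partition(0, 0, 64, lambda x, y, s: "split" if s > 16 else "none"))
print(len(leaves))  # -> 16 blocks of 16x16
```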
So what do we do after that? After this partitioning step, we’re usually doing motion vector coding, and
basically what that does is that you pick a reference frame and a motion vector, and then the block of the particular size that you selected in your partitioning step will be coded using a motion vector pointing into one of the previously coded reference frames.
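In miniature, motion-compensated prediction looks like this: copy a block out of a reference frame at the offset given by the motion vector, and the residual is the per-pixel difference that still has to be coded. The frames and numbers below are made up for illustration.

```python
def predict_block(ref, x, y, size, mv):
    """Motion-compensated prediction: the block at (x+dx, y+dy) in the reference."""
    dx, dy = mv
    return [[ref[y + dy + r][x + dx + c] for c in range(size)] for r in range(size)]

def residual(actual, predicted):
    """Per-pixel difference between the actual block and its prediction."""
    return [[a - p for a, p in zip(ar, pr)] for ar, pr in zip(actual, predicted)]

# An 8x8 "reference frame" holding a diagonal gradient:
ref = [[r + c for c in range(8)] for r in range(8)]
# The current 4x4 block at (2, 2) happens to be the reference shifted by (1, 1):
cur = [[ref[r + 1][c + 1] for c in range(2, 6)] for r in range(2, 6)]

pred = predict_block(ref, 2, 2, 4, mv=(1, 1))
res = residual(cur, pred)
print(sum(abs(v) for row in res for v in row))  # -> 0: a perfect prediction
```

When the prediction is this good, the residual costs almost nothing to code; the motion search is about making that happen as often as possible.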
In VP8, these reference frames were usually frames that had previously been encoded, and were therefore temporally before the current frame. What we’ve added in VP9 is that
we have multi-level alt reference frames, and what that allows you to do is encode the video sequence in any frame order, and then you can use any future frame as a reference frame for a frame that you decide to encode after that. So for this series of frames on the left, this is six frames. I could, for example, choose to first encode frame one, then frame six, and then frame three using both a future as well as a past reference. And then, now that I have encoded three, I can encode two and four really efficiently, because they have a very proximate future and past reference. After I’ve encoded two and four, I go to five, which has four and six as close neighbors. And so that allows very temporally close reference frames to be used as predictors of the content in the current block.
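One way to sketch that out-of-order schedule: encode the two endpoint frames first, then recursively encode the midpoint of each interval, so every remaining frame gets a close past and a close future reference. This scheduling is illustrative, not libvpx’s actual logic.

```python
def encode_order(first, last):
    """Return (frame, past_ref, future_ref) triples in encode order."""
    schedule = [(first, None, None), (last, None, None)]

    def recurse(lo, hi):
        if hi - lo < 2:
            return
        mid = (lo + hi) // 2
        schedule.append((mid, lo, hi))  # mid is predicted from both ends
        recurse(lo, mid)
        recurse(mid, hi)

    recurse(first, last)
    return schedule

print([frame for frame, _, _ in encode_order(1, 6)])  # -> [1, 6, 3, 2, 4, 5]
```

Here frame three leans on one and six, and frame five on four and six, as in the sequence the talk walks through.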
So once you have a motion vector, you can use subpixel filtering, and subpixel
filtering allows you to basically pick a point in
between two full pixels and this point in between is then
interpolated using a subpixel interpolation filter. In VP8, we had only a single
subpixel interpolation filter. Most codecs use just a single
subpixel interpolation filter. We’ve actually added three in
VP9, and those are optimized for different types
of material. We have a sharp subpixel
interpolation filter, which is really great for material where
there’s a very sharp edge somewhere in the middle. For example, that city clip that
we were looking at in the beginning, if you’re thinking of
a block that happens to be somewhere on the border
between the sky and a building, we consider that a
sharp edge, and so using an optimized filter for sharp edges
actually maintains a lot of that detail. On the other hand, sometimes
there’s very sharp edges but those are not consistent across
video frames across different temporal points in
the sequence that you’re looking at. At that point, this will cause
a very high frequency residual artifact, and
so for those, we’ve added a low pass filter. And what the low pass filter
does is that it basically removes sharp edges, and it does
exactly the opposite of the sharp filter. Lastly, we have a regular filter, which is similar to the one that VP8 had.
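A toy look at what these filters do: each half-pixel sample is a small FIR filter applied to the neighboring full pixels. The 2- and 4-tap kernels below are illustrative stand-ins; VP9’s real interpolation filters are 8-tap.

```python
def half_pel(pixels, i, taps):
    """Interpolate the position halfway between pixels[i] and pixels[i + 1]."""
    span = len(taps) // 2
    acc = sum(t * pixels[i - span + 1 + k] for k, t in enumerate(taps))
    return max(0, min(255, round(acc / sum(taps))))

row = [0, 0, 255, 0, 0]  # an isolated bright detail on a dark background

print(half_pel(row, 1, (1, 1)))          # plain bilinear average       -> 128
print(half_pel(row, 1, (-1, 5, 5, -1)))  # "sharp": negative side lobes -> 159
print(half_pel(row, 1, (1, 3, 3, 1)))    # "smooth": low-pass kernel    -> 96
```

The sharp kernel’s negative lobes preserve more of the peak, while the low-pass kernel spreads it out, mirroring the trade-off just described.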
After this prediction step, you have the predicted block contents and you have the actual block that you’re trying to get as close as possible to, and the difference between these two is the residual signal that you’re going to encode. So in current generation video
codecs, we usually use four by four or eight by eight cosine
based transforms called DCTs to encode this residual
signal. What we’ve added in VP9 is much
higher resolution DCT transforms all the way up to
32 by 32 pixels, and in addition to using the DCT, we’ve
also added an asymmetric sine based transform
called the ADST. And the sine based transform is optimized for a residual signal that has a near zero value at the edge of the predicted region, whereas the cosine is optimized for a residual signal that goes to zero in the middle of the predicted region. So those are optimized for different conditions, and together, they give good gains when used properly.
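For intuition, here is a minimal orthonormal 1-D DCT-II pair of the kind applied, separably and at sizes from 4 up to 32, to the residual. VP9’s actual transforms are integer approximations, and its ADST is a distinct sine-based variant; this sketch only shows the forward/inverse round trip on a made-up residual row.

```python
import math

def dct(x):
    """Orthonormal 1-D DCT-II."""
    n = len(x)
    return [math.sqrt((1 if k == 0 else 2) / n)
            * sum(x[i] * math.cos(math.pi * k * (2 * i + 1) / (2 * n))
                  for i in range(n))
            for k in range(n)]

def idct(c):
    """Inverse of dct() above (DCT-III with matching scaling)."""
    n = len(c)
    return [sum(math.sqrt((1 if k == 0 else 2) / n)
                * c[k] * math.cos(math.pi * k * (2 * i + 1) / (2 * n))
                for k in range(n))
            for i in range(n)]

residual = [0, 1, 1, 2, 3, 5, 8, 13]  # a made-up residual row
coeffs = dct(residual)
# A smooth residual compacts into the low-frequency coefficients, which is
# what makes the transform worth applying before quantization.
print([round(v, 6) for v in idct(coeffs)] == [float(v) for v in residual])  # -> True
```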
Basically, the take home message from all of this is that we’ve added big resolution increases to our video codec, and what that leads to is a codec that is highly, highly optimized for
high definition video coding. But at the same time, because
it is very configurable, it still performs really well at
low resolution content, for example, SIF-based 320
by 240 video as well. So I’ll hand it back to Matt
now, who will take over. MATT FROST: Thanks, Ronald. So I just want to give you a
quick recap of what we’ve discussed and sort of the
highlights of this technology, and then to tell you about the
last steps that we’re going through to get VP9
in your hands. As Ronald said, we’re talking
about technology here that is 50% better than literally
everything that everybody else out there is using. And actually, we made a point to
say we were using the very best encoder out there at the
very best settings, settings which I really think you’re not
seeing very often in the real world because they’re very
difficult to use in a real world encoding
environment. So I hope that there are a
number of people in this audience now who are out there,
either with existing products with video or products
to which you’re looking to add video, or just
you’re thinking about how you can use these tools to launch a
new product and to come out with a start-up. This technology has not been
used by anyone right now. YouTube is testing it and we’ll
talk about that in a little bit, but if you adopt
VP9, as you can very quickly, you will have a tremendous
advantage over anybody out there with their current
offering based on 264 or even VP8. It’s currently available in
Chrome, and the libvpx library on the WebM project is out there
for you to download, compile, and test. It’s open source. You will have access
to source code. The terms of the open source
license are incredibly liberal so that you can take the code,
improve it, optimize it, modify it, integrate it with
your proprietary technology, and you’re not going to have to
give back a line of code to the project. You’re not going to have to
be concerned that you will inadvertently open source your
own proprietary code. And then finally, it’s
royalty free. And obviously, this is something
that was of great importance to us as we sought
to open source a video technology for use in HTML5
and the video tag. We believe that the best is
still to come in terms of video products on the web, and
that in order to make sure that people are free to innovate
and that start-ups are free to launch great new
video products, we have to make sure that they’re not
writing $5 or $6 million checks a year to standards
bodies. We’re working very hard on
putting this technology into your hands as soon
as possible. We did a semi-freeze of the bit
stream just a couple of weeks ago, and at that time,
we said that we were taking comments on the bit stream
for 45 more days. Specifically, we’re looking for
comments from a lot of our hardware partners on some of the software techniques that we’re using, just to make sure
that we’re not doing anything that’s incredibly difficult
to implement in hardware. At the end of the 45 day period
on June 17, we’re going to be bit stream frozen, which
means that after June 17, any VP9 encoder that you use is
going to be compliant with any VP9 decoder, and that if you’re
encoding content with an encoder that’s out after June
17, it’s going to be able to play back in a decoder after
the bit stream freeze. Obviously, getting VP9
in Chrome is very important to us. The beta VP9 which you’ve
been seeing today is already in Chrome. If you download the latest
development version of Chrome and enable the VP9 experiment,
you’ll be able to play back VP9 content immediately. As soon as we’ve frozen the
bit stream as of June 17, we’re going to roll it into the
Dev Channel of Chrome as well with this final version of
VP9, and then that’s going to work through the
beta channel and through the stable channel. And by the end of the summer,
we are going to have VP9 in the stable version of Chrome rolling
out to the hundreds of millions of users. I think [INAUDIBLE] today said that there are
750 million users of Chrome right now. VP9 is going to be deployed
on a massive scale by the end of summer. In terms of final development
activities that we’re going to be working on, after the bit
stream is finalized in the middle of June, we’re going to
be focusing on optimizations both for performance
and for platform. So what that means is we’ll be
working on making sure that the encoder is optimized for
a production environment. It’s also obviously very important to YouTube, as YouTube moves to supporting VP9, that the decoder is sufficiently fast to play
back on many of the PCs that are out there. We’re also going to be working
on platform optimizations that will be important to Android
developers, for instance, and to people who want to support
VP9 on embedded devices. These are ARM optimizations and optimizations for other DSPs. We have hardware designs
coming out. For those of you who may work
with semiconductor companies or are thinking about a
technology like this for use in something like an action
camera, these are hardware designs that get integrated
into a larger design for a semiconductor and allow for
a fully accelerated VP9 experience. Real time optimizations are
obviously incredibly important for video conferencing, Skype
style applications, and also for new applications that are
coming out like screencasting and screen sharing. By the end of Q3, we should have
real time optimizations which allow for a very good
real time performance. Those optimizations should
then allow VP9 to be integrated into the WebRTC
project, which is a sister project to the WebM project and
basically takes the entire real time communication stack
and builds it into Chrome, and more broadly into HTML5
capable browsers. And so what this means is that
when VP9 is integrated into WebRTC, you will have tools that
are open source, free for implementation that used to,
even four years ago, require license fees of hundreds of
thousands of dollars. And you, with a few hundred
lines of JavaScript, should be able to build the same sort of
rich video conferencing style applications and screencasting
applications that you’re seeing with products
like Hangouts. And finally, in the end of this
year moving into Q1 2014, we’re going to see,
again, hardware designs for the encoder. So just to give you an idea of
how usable these technologies are, we have a VP9 demonstration
in YouTube. If you download the Development
Version of Chrome and flip the VP9 flag, you can
play back YouTube VP9 videos. And one thing this should
drive home is this was a project that was done over the
course of two weeks, that VP9 was built into YouTube. Obviously, we have very
capable teams. Obviously we have people on the
WebM team and people on the YouTube team who know a
lot about these tools, but this demonstration is VP9
in the YouTube operating environment. There’s nothing canned here. This is VP9 being encoded and
transmitted in the same way that any other video is. So this, I hope, again, will make you say, god, we could do this as well. We could come out very quickly
with a VP9 based service that will be remarkably better
than anything that’s out there right now. So I just want to leave you with
some thoughts about what I hope that you’re thinking
about coming away from this presentation. The WebM project is a true
community-based open source project, and obviously, these
sorts of projects thrive on contributions from
the community. We are coming out of a period
where we’ve been very intensively focused on algorithm
development. Some of this work is certainly
very complicated stuff that not every– even incredibly seasoned– software engineer can work on. But we’re moving into a point
where we’re focusing on application development, we’re
focusing on optimization, we’re focusing on bug fixes and
patches, and that’s the sort of thing that people in
this room certainly can do. So we encourage you to
contribute and we encourage you to advocate for use
of these technologies. We build open source
technologies, and yet simply because we build them, that
doesn’t mean that people adopt them. It takes work to get communities
to focus on adopting these sorts of
open technologies. So advocate within your project, advocate within your company for use of open technologies, and advocate within the web
community as a whole. We think that with VP9, we’ve
shown the power of a rapidly developing, open technology, and
we hope that people are as excited about this as we are and
that you go out and help spread the word about
this technology. But most important, we’d like
you to use the technology. We’re building this with a
purpose, and that is for people to go out, take advantage
of these dramatic steps forward that we’ve
made with VP9. And so we hope you will go out,
that you’ll be charged up from this presentation, and
that you’ll immediately download the Development Version
of Chrome and start playing around with this and
start seeing what you can do with this tool that we’ve
been building for you. So there are just a couple of
other things I’d like to say. There are a couple of
other presentations related to this project. There’s a presentation on
Demystifying Video Encoding, Encoding for WebM VP8– and this is certainly
relevant to VP9– and then another on the
WebRTC project. And again, if you’re
considering a video conferencing style application,
screensharing, remote desktopping, this is
something that you should be very interested in. Sorry. I shouldn’t be using
PowerPoint. So with that, we can open
it up to questions. Can we switch to just the
Developers Screen, guys? Do I do that? AUDIENCE: Hey there. VP8, VP9 on mobile, do you have
any plans releasing for iOS and integrating with
my iOS applications– Native, Objective
C, and stuff? Do you have any plans
for that? MATT FROST: He’s asking
if VP8 is in iOS? AUDIENCE: VP9 on iOS running
on top of Objective C. RONALD BULTJE: So I think as
for Android, it’s obvious Android supports VP8 and
Android will eventually support VP9 as well. For iOS– MATT FROST: When I was talking
about optimizations, platform optimizations, talking about
VP9, that’s the sort of work we’re focusing on, ARM
optimizations that should apply across all of these ARM
SOCs that are prevalent in Android devices and
iOS devices. There aren’t hardware
accelerators in iOS platforms right now. Obviously, that’s something
we’d like to change, but presently, if you’re going to
try to support VP8 in iOS, you’re going to have to do
it through software. AUDIENCE: Thank you. RONALD BULTJE: Yep? AUDIENCE: Bruce Lawson
from Opera. I’ve been advocating WebM
for a couple of years. One question. I expect your answer is yes. Is it your assumption that the
agreement that you came to with MPEG LA about VP8 equally
applies to VP9? MATT FROST: It does apply to
VP9 in a slightly different way than it does with VP8. The agreement with MPEG LA and
the 11 licensors with respect to VP9 covers techniques that
are common with VP8. So obviously, we’ve added back
some techniques we were using in earlier versions, we’ve added
in some new techniques, so there are some techniques
that aren’t subject to the license in VP9. But yes, the core techniques
which are used in VP8 are covered by the MPEG LA license,
and there will be a VP9 license that will be
available for developers and manufacturers to take
advantage of. AUDIENCE: Super. Follow up question. About 18 months ago, the Chrome
team announced they were going to drop H.264 from being bundled in the browser, and that subsequently
didn’t happen. Can you comment further on
whether Chrome will drop H.264 and concentrate only on VP9? MATT FROST: I can’t
really comment on plans going forward. What I can say is that having
built H.264 in, it’s very difficult to remove
a technology. I think when you look at the
difference between VP9 and H.264, there’s not
going to be any competition between the two. So I think with respect to VP9,
H.264 is slightly less relevant because there
was nothing– we didn’t have our finger
on the scale for this presentation. And especially, we were hoping
to drive home with that initial demonstration which we
put together over the last few hours that we’re not looking
for the best videos. We’re just out there
recording stuff. So even if 264 remains
in Chrome– which I think is probably
likely– I don’t think it’s going to be
relevant for a next gen codec because there’s just such
a difference in quality. AUDIENCE: Thanks for
your answers. AUDIENCE: Hi there. I have a question about
performance. Besides the obvious difference
in royalty and licensing and all that, can you comment on
VP9 versus HEVC, and do you hope to achieve the same
performance or proof of [INAUDIBLE]? RONALD BULTJE: So the question
is in terms of quality, how do VP9 and HEVC compare? AUDIENCE: Yeah, and bit rate
performance, yeah. RONALD BULTJE: Right. So testing HEVC is difficult. I’ll answer your question
in a second. Testing HEVC is difficult
because there’s currently no open source or commercial software available that can actually encode HEVC unless it’s highly developmental in nature or it is the reference model. The problem with the alpha and
beta versions that are currently on the market for
commercial products is that we’re not allowed to use them
in comparative settings like we’re doing. Their license doesn’t
allow us to do that. Then the problem with the
reference model is it is a really good encoder, it gives
good quality, but it is so enormously slow. It can do about 10 frames
an hour for a high definition video. That’s just not something that
we can really use in YouTube. But yes, we’ve done
those tests. In terms of quality, they’re
currently about equal. There’s some videos where HEVC,
the reference model, is actually about 10%,
20% better. There’s also a couple of videos
where VP9 is about 10%, 20% better. If you take the average over,
for example, all of those CC-licensed YouTube clips that we
looked at, it’s about a 1% difference. I think that 1% is in favor of
HEVC if you so wish, but 1% is so small that really, we don’t
think that plays a role. What does that mean
going forward? Well, we’re really more
interested in commercial software that will be out there
that actually encodes HEVC at reasonable
speed settings. And like I said, there’s
currently nothing on the market but we’re really
interested in such products, so once they are on the market
and we can use them, we certainly will. AUDIENCE: Follow-up question
about the performance. Is this any reason to not expect
this to scale up to 4K video or [INAUDIBLE]? RONALD BULTJE: We think that
the current high definition trend is mostly going towards
720p and 1080p. So if you look at YouTube
uploads, there is basically no 4K material there, so it’s
just really hard to find testing materials, and that’s
why we mostly use 720p and 1080p material. MATT FROST: But certainly when
we designed the codec, we designed it with 4K in mind. There aren’t any limitations
which are going to prevent it from doing 4K. RONALD BULTJE: Right. You can use this all the way up
to 16K video if that’s what you were asking. MATT FROST: Sir? AUDIENCE: Yeah. Have you been talking to the
WebRTC team, and do you know when they’re going to integrate
VP9 into their current products? MATT FROST: We talk with the
WebRTC team regularly. As I said, we’ve got to finish
our real time enhancements in order to actually have a codec
that works well in a real time environment before we can expect
it to be integrated into WebRTC. But I think we’re looking
at Q4 2013. AUDIENCE: Great, thanks. MATT FROST: We’re
in 2013, right? RONALD BULTJE: Yeah. AUDIENCE: Hi. I just wanted to talk
about the rate of change in video codecs. I think maybe we can see like
VP8, VP9, we’re talking about an accelerating rate
of change. And that’s great, and I really
wanted to applaud the efforts to get this out in Chrome
Dev quickly, or Chrome Stable quickly. I just wanted to ask about
maybe some of your relationships with other
software vendors that are going to be relevant, like we’re
talking Mozilla, IE, iOS was, I think, previously
mentioned. As this kind of rate of
innovation in codecs increases, how are we going to
make sure that we can have as few transcode targets
as possible? My company is working
on a video product. We don’t want to have eight
different codecs. And if we can imagine, let’s
say, that Version 10 comes out relatively soon, sometime
down the road. How can we make sure that
devices stick with a relatively small subset of
compatible decodings? MATT FROST: I guess I’m
a little unsure of what you’re asking. In terms of how we get support
on devices as quickly as possible, or how we solve
the transcoding problem? AUDIENCE: And just keeping the
number of transcoded formats as small as possible. If IE only supports
H.264, I have to have an H.264 encoding. So I was just wondering what
kind of relationships you guys are working on to make sure
that as many devices and platforms as possible can
support something like VP9. MATT FROST: We’re certainly
working very hard on that, and as I said in the slide on next
steps showing the timeline, our focus on having hardware
designs out there as quickly as possible is an effort to try
to make sure that there’s hardware that supports VP9 more
rapidly than hardware has ever been out to support
a new format. We had a VP9 summit two weeks
ago, which was largely attended by semiconductor
companies. Actually, some other very
encouraging companies were there with great interest in
these new technologies. But we’re working very hard with
our hardware partners and with OEMs to make sure that this
is supported as quickly as possible. I think internally, what we’re
looking at is probably relying on VP8 to the extent that we
need hardware now and we don’t have it in VP9. So I think what we’ve talked
about is always falling back to an earlier version of an open
technology that has very broad hardware support. But we’re trying to think very
creatively about things like transcoding and things that we
can do to ensure backwards compatibility or enhancement
layers. So part of the focus of this
open development cycle and process that we have is to
really try to think in very new ways about how we support
new technologies while maintaining the benefits of
hardware support or device support for older
technologies. AUDIENCE: Excellent. Thank you. AUDIENCE: So a key point in any
solution is going to be performance. Hardware acceleration really
solves that, and that was one of the challenges with the
adoption of VP8 in timing versus H.264, which has broad
spectrum hardware acceleration. I understand the timing, the
delays, and the efforts you guys are doing to really
achieve that hardware accelerated support for VP9. But until then, what’s the
software performance in comparison to H.264, for
either software versus software, or software versus hardware? RONALD BULTJE: So we’ve only done software versus software comparisons for that. Let me start with VP8 versus H.264. Currently, VP8 decoding is about
twice as fast as 264 decoding using fully
optimized decoders. VP9 decoding is currently about twice as slow as VP8’s, and that basically means that it’s at exactly the same speed as H.264 decoding. That’s not what we’re targeting
as a final product. We haven’t finished fully
optimizing the decoder. Eventually, what we hope to get
is about a 40% slowdown from VP8 decoding, and that will
put it well ahead of the fastest 264 decoders that are
out there in software. AUDIENCE: Great. Thank you. AUDIENCE: Hello. I was just wanting to get some
background on the comparison between H.264 and VP9. For H.264, what were
you using– CBR, VBR, and what QP values? RONALD BULTJE: This is two-pass encoding at the target bit rate. So it’s preset veryslow. Since we’re doing visual comparison, there is no tune set. It’s passes one and two, and then just a target bit rate. We tend to choose target bit
rates that are somewhere between 100 and 1,000 kilobits
a second, and then we just pick the same point for the VP9
one as well to start with. AUDIENCE: So in both of the
comparisons, you were trying to be very generic so you
weren’t tuning the encoder in any way to make it a better
quality at that bit rate. You were just giving it two passes to try to figure it out. RONALD BULTJE: So you mean
visual quality, or– AUDIENCE: Yes. RONALD BULTJE: So we haven’t
tuned either one of them for any specific setting. For 264, the default is that
it optimizes for visual experience, and so that’s how we ran x264. So it’s not optimized for SSIM or PSNR in the visual displays that we did here. VP9 encoding does not have any such tunes, so we’re not setting any either, of course. AUDIENCE: So you just used
the default settings of [INAUDIBLE]? RONALD BULTJE: We’re using the
default settings, and we’ve actually discussed this
extensively with the 264 developers. They agree. They support this kind of
testing methodology, and as far as I’m aware, they
agree with it. They fully expect the
kind of results that we’re getting here. AUDIENCE: Right. OK, thanks. AUDIENCE: Hi. One more question about
performance. I think you mentioned a little
bit about the real time. So do you think in the future,
you can manage to bring an application like remote desktop into the web? I mean like putting three,
four windows in the same browser, high definition,
things like that? RONALD BULTJE: In terms of
decoding or encoding? AUDIENCE: Both. RONALD BULTJE: So for
encoding, yes. So there will be real time
settings for this codec eventually. For no codec will that get you
exactly the types of bit rate quality ratios that you’re
seeing here. These are really using very slow
settings, and that is by far not real time. But if you set the VP9 codec
to real time settings, then yes, eventually it will
encode in real time. It will be able to do four full
desktops all at once, and it will be able to decode
all of those also. You’ll probably need a multicore
machine for this, obviously, but it will be
able to do it, yes. AUDIENCE: And you’re using
the graphics card and other things like that. You didn’t mention about the
hardware, OpenGL or– RONALD BULTJE: It’s
pure software. There’s no hardware involved. AUDIENCE: No using the hardware,
the card hardware. RONALD BULTJE: We’re not using
GPU or anything like that at this point. AUDIENCE: Thank you. AUDIENCE: Hi. I just want to know, how does
VP9, now or later, compare to VP8 and H.264 when we’re talking about single-pass CBR, low bit rate, real
time encoding? Little background is we are
part of the screen sharing utility that currently uses
VP8, and we’ve been successfully using it for a
year, but the biggest gripe with VP8 is that it doesn’t
respect bit rate, especially on low bit rates, unless you
enable frame dropping, which is unacceptable. So we have to do a bunch of
hacks to actually produce quality and it doesn’t
behave like H.264 would in that situation. So how will VP9 address that
problem, or is that even on the roadmap? RONALD BULTJE: So in general,
desktop sharing and applications like this, also
real time communications, yes, they’re on the roadmap,
and yes, they will all be supported. In terms of your specific
problem, I guess the best thing to do is why don’t you
come and see us afterwards in the Chrome [INAUDIBLE], and we
can actually look at that. AUDIENCE: OK, awesome. RONALD BULTJE: As for VP9,
VP9 currently does not have a one pass mode. We’ve removed that to just speed
up development, but it will eventually be re-added, and
it will be as fast as the VP8 one but with a 50% reduction
in bit rate. AUDIENCE: Do you have
a timeline for that? Is it going to this year,
or next year? RONALD BULTJE: Like Matt said,
that will happen– MATT FROST: Late Q3. RONALD BULTJE: Q3 2013,
around then. We’re currently focusing on
YouTube, and those kind of things will come after that. AUDIENCE: Awesome. Thank you. AUDIENCE: I have two questions,
unrelated questions to that. What is the latency performance
of VP8 compared to VP9 in terms of decoding
and encoding? And the second question is, how
does VP9 compare to H.265? RONALD BULTJE: So I think H.265,
I addressed earlier. So do you want me to go into
that further, or was that OK? AUDIENCE: More in terms of the
real time performance. RONALD BULTJE: So in terms of
real time performance, I think for both, that’s really, really
hard to say because there is no real time HEVC
encoder and there is no real time VP9 encoder. So I can sort of guess, but
this is something that the future will have to tell us. We will put a lot of effort
into writing real time encoders or adapting our
encoder to be real time capable because that is
very important for us. MATT FROST: But in terms of
raw latency, it should be faster than VP8. You can decode the first
frame, right? RONALD BULTJE: I think it
will be the same as VP8. So VP8 allows one frame in, one
frame out, and VP9 will allow exactly that same
frame control model. AUDIENCE: So you mentioned that
you’ve asked hardware manufacturers for any concerns
or comments. Have you gotten any yet? MATT FROST: Sorry. Are you asking whether they’re considering supporting it? AUDIENCE: Well, in terms of
the algorithms and how you would actually– MATT FROST: They’re working
on it quickly. AUDIENCE: But there’s no
concerns or comments or anything yet? MATT FROST: No concerns. AUDIENCE: You said you opened
up for comments. MATT FROST: No. We have received comments. We have a hardware team
internally that took a first pass at comments. We’ve received a couple of
comments additionally just saying, here’s some stuff you’re
doing in software that doesn’t implement well in hardware. I don’t foresee a lot of
additional comments from the hardware manufacturers. The other work that we’re doing
over the next 45 days is we had a bunch of experiments
that we had to close out, and so we’re doing some closing
out as well and just finishing the code. Absent an act of God, this is bitstream final on June 17.
actually received comments from some hardware
manufacturers, and we are actively addressing the ones
that we’re getting. AUDIENCE: OK, thanks. AUDIENCE: Hi. I might have missed this, but
when did you say the ARM optimizations for VP9 are
going to come out? MATT FROST: Actually starting
now really, we’re focusing on doing some optimizations by
ourselves and with partners. So I would say that’s going to
be coming out second half of the year, and it’ll probably be
sort of incremental where you may get an initial pass of
ARM optimizations and then some final optimization. It’s obviously very important
for us for Android to be able to get VP9 working as well as
possible, and obviously, ARM is incredibly important for
the Android ecosystem, so that’s an area of significant
focus. AUDIENCE: And in terms of real
time encoding, so in order to blend into WebRTC,
you’re going to have to get that working. So is this going to coincide
with the assimilation of VP9 into WebRTC? MATT FROST: It’ll be real time
optimizations, which I think we were sort of thinking about
end of Q3, beginning of Q4, and then integration into WebRTC
will follow on that. Obviously, the one thing
I’d say, it’s an open source project. If you guys think that you see
an opportunity, you can go out and do the optimizations
yourselves. There are contractors
who can do it. So I encourage you guys to think
about that, that you can take the code and you can start
working on some of this stuff yourselves. Obviously, we’d love it if you’d
contribute it back but we’re not going to
force you to. Yeah, I guess last question. AUDIENCE: This is a question
about how VP9 relates to what the Android team talked about
with the Google proxy and the SPDY proxy. You alluded to transcoding
real time for backwards compatible device support. Do you see Google doing the same
thing they’re going to do with images in this proxy and
doing video transcoding to adapt this and use this
for compression mode in the Google proxy? RONALD BULTJE: That’s a really
interesting application, and that’s something that we’ll have
to look into in the future. It’s not as easy as it sounds
because video transcoding actually takes some time. So that would mean that you
would actually have to wait a minute while the video is
transcoding until you can visit that website, and that
might not be quite what you’re looking for. But it’s an interesting
application and we might look into that in the future. MATT FROST: I think that’s it. I think we’re out of time. Sorry, but we’re happy to
talk to you afterwards. [APPLAUSE]
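The comparison methodology and the performance figures quoted in the Q&A above can be sketched in a few lines. This is a hedged illustration, not the team's actual tooling: the flag spellings follow the public x264 and vpxenc command-line tools circa 2013, and the input/output file names are made-up placeholders.

```python
# Sketch of the same-bitrate comparison methodology described in the Q&A:
# two-pass encodes at a fixed target bit rate, x264 at preset veryslow with
# no --tune, the VP9 encoder at its defaults. File names are illustrative.

def x264_two_pass(src, bitrate_kbps):
    """Two x264 invocations: pass 1, then pass 2; preset veryslow, no --tune."""
    common = ["x264", "--preset", "veryslow", "--bitrate", str(bitrate_kbps)]
    return [
        common + ["--pass", "1", "-o", "/dev/null", src],
        common + ["--pass", "2", "-o", "out_h264.mkv", src],
    ]

def vpxenc_two_pass(src, bitrate_kbps):
    """One vpxenc invocation covers both passes via --passes=2."""
    return [["vpxenc", "--passes=2", "--good",
             "--target-bitrate=%d" % bitrate_kbps, "-o", "out_vp9.webm", src]]

def encoded_size_mb(bitrate_kbps, seconds):
    """At a fixed target bit rate the file size is the same for both codecs,
    which is what makes a same-bitrate visual comparison meaningful."""
    return bitrate_kbps * 1000 * seconds / 8 / 1e6

# The talk quotes targets between 100 and 1,000 kbit/s; a 2-minute clip at
# 500 kbit/s comes out around 7.5 MB in either codec.
clip_mb = encoded_size_mb(500, 120)

# Why the HEVC reference encoder's quoted ~10 frames/hour rules out
# YouTube-scale use: a 2-minute 30 fps clip is 3,600 frames.
hm_hours = 30 * 120 / 10  # 360 hours of encoding for one short clip

# Relative software decode speeds quoted in the Q&A (x264 decode = 1.0):
vp8_decode = 2.0                     # VP8 decodes about twice as fast as 264
vp9_decode_now = vp8_decode / 2      # currently ~2x slower than VP8: parity
vp9_decode_goal = vp8_decode * 0.6   # target: only a 40% slowdown from VP8
```

Under those assumptions, the VP9 decode goal of 1.2 relative to x264's 1.0 is what "well ahead of the fastest 264 decoders" refers to.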

60 thoughts on “Google I/O 2013 – WebM and the New VP9 Open Video Codec”

  1. Good presentation, but you should apply dynamic range compression to the audio of this YouTube video. As it is, it's hard to hear the questions and one of the presenters.

  2. It's possible that the audio engineers were hesitant to do such a thing due to its audio quality-reducing implications. Perhaps a real-time BS 1770-based gain adjuster would be better.

  3. They have to…. Google uses Macs more than Windows PCs. Actually, I think most Google employees also have a Chromebook as well.

  4. If what they say about the differences between HEVC and VP9 is true, then the ever-so-slight advantage of HEVC encode performance is largely outweighed by the royalty-free and open-source benefits of VP9 and its respective library.

  5. Unfair to compare h264 with vp9. Should have been vp8 vs h264 or vp9 vs h265

    here are links for vp8 vs h264
    ietf.org/mail-archive/web/rtcweb/current/msg07028.html
    ietf.org/proceedings/86/slides/slides-86-rtcweb-11.pdf

  6. When will the new Google Video Codec be released to the public? This is a great step forward for video codecs.

  7. Compression would cause the already botched audio to sound worse (at a louder volume). Yes, the mistake is "fixable" but only to a certain extent. They should have adjusted their gain structure correctly from the beginning.

  8. How do you know that? I don't think a top-of-the-range processor would ever be the minimum for average encoding. Sure, it's going to be more intensive than the older H264. But not by that much.

  9. This video does not play in HTML5 mode in Chrome 29 (which uses VP9 here). And the same with the few samples I've found. What is wrong?

  10. So the PSNR from h.264 was invalid. At least tune for PSNR and tune for same speed or cpu complexity.

  11. I wouldn't care if it is equal to H.264, I'm using it. I'm tired of MpegLa blocking new innovation, and deciding who can make a new h.264/mp4/dvd/etc device or program because they hold the license. Want to know why your new windows 8 won't play dvd when your win 7 did? Can thank MpegLa license fees to M$ for that. Just downloaded VLC instead, live in the US? You just violated DMCA anti-circumvention law. Seriously, you just committed a federal offence. It is time, only use open codecs!

  13. Usually the h264 doesn't look that bad. I'm actually seeing this video with an HTML5 video player that plays h.264 and it's kind of ironic, because only your sample looks bad.

  14. Some things are not absolutely right in your comment, because M$ is one of the MPEG LA licensors. Apart from that… there's a lot of BS corporation that you're right.

  15. Indeed, M$ does have licensing agreements with MPEG LA, however, MLA licensing agreements are complex and per product line. i.e. A license for Winblows 7 to decode mpeg2 (dvd codec) grants only that, not mpeg2 for Win8, MP3 decode for Win7-8, etc. Microsoft contracted a mpeg2 licence for Win7, hence native DVD playback, they didn't for Win8, so too bad user. The reason all comes down to money.

    If DVDs used a free codec, every device/software that wanted to could provide native decoding.

  16. Right now(may change), from the perspective of an average consumer/encoder, they are essentially identical. IF you disregard the financial & innovation burdens that licensed codecs cause, and compare them solely on their technical merits. It's another vhs/beta or hddvd/bluray type deal then. With a properly tweaked encode, your average consumer wouldn't be able to tell you which was which side by side. That said, both have yet to release an optimized version, so neither is practical for use atm.

  17. How can we push VP9 and Daala as the standard? I suppose the best way would be to introduce it by the scene. Though, they are still using XviD in some instances and are slow to adopt new codecs because of legacy hardware. Maybe having the capabilities embedded on mobile and set-top devices (phones, PS5, Xbox 720) of the future would help.

  18. To help people like netflix see this as a good move you should give some help to firefox, opera and any other project that will cover many people and are already working on it.

    I have been looking at libvpx implementation into firefox but am confused about what version (1.2, 1.3, 1.4?) is the one that will start to support vp9. I am a curious amateur but who else is confused about when this will be out?

    Also drink more tea. Oolong is a good tea. Always drink the best tea you can for life is too short to waste time on bad tea.

  19. "In other cases we've dropped the bitrate down to bitrates that are actually banned by certain UN conventions for the compression of HD video"

    What on earth is he talking about?

  20. hey google, I have an idea for a codec, I am a developer but have no idea about video codecs, where should I submit my idea to?

  21. So how much less CPU is this going to use? I am extremely doubtful of this kind of shit when it ends up using more resources instead of less.

    Fuck you.

  22. Interesting skeptical article from an analyst – points out a serious Nokia Patent lawsuit, and that VP9 is much lower quality to HEVC and many other interesting points.  http://www.streamingmedia.com/Articles/Editorial/Featured-Articles/YouTube-and-VP9-A-Made-for-Press-Release-Event-94067.aspx

  23. I was really excited about this, but it's just not cutting it. The problem with VP9 is that the encoder is slow as hell. I mean, even x265 is very slow, but with -preset fast it gives better results than VP9's -cpu-used 7 and encodes faster. You should really give us multicore encoding. Also there should be more optimized default settings. x264/x265 doesn't require brain surgery to create good results. I understand that YouTube is Google's primary target for VP9 but this will not really earn trust from the community doing HTML5 video outside Google. Even decoding has to be GPU accelerated by now. Why are there no 2K and 4K VP9 videos on YouTube otherwise?

  24. You guys are destroying my videos.
    Those videos are child's play. I've got a challenge for you guys. Fix my videos.

  25. All I can say is that VP9 causes my CPU to increase temperatures by 33% so I switched back to h264. I value my CPU over Youtube saving some money for less bandwidth.

  26. VP9 may be "good" for bit-starved low-quality videos, but really sucks for high quality video. A 5000kbps video encoded in VP9 looks significantly worse than a 2500kbps video encoded with x264.

    And HEVC/H.265 is not better than AVC/H.264, actually. It just destroys the details.

  27. The first speaker's sound is OK; for the second speaker I can't hear a thing. Maybe Google should spend some more time on sound enhancement.
