The basic solution that streaming video provides is that an entire video file does not need to be downloaded before it is viewed (yay). Little chunks of media are grabbed for playback in order to achieve the same effect. The one caveat is that the media must be received as fast as the player consumes it, and the higher the quality of a particular piece of content, the more space it occupies on disk. Therefore, to stream higher quality video, a faster network connection is required. When it comes to network speeds, there is going to be some variability.
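To put rough numbers on that caveat, here is a minimal Python sketch. The function names, the 4-second chunk duration, and the throughput figures are all made up for illustration; the only idea taken from the text is that a chunk must download at least as fast as it plays.

```python
def chunk_bytes(bitrate_kbps, duration_s):
    """Approximate size in bytes of a chunk encoded at the given bitrate."""
    return bitrate_kbps * 1000 * duration_s / 8


def download_seconds(bitrate_kbps, duration_s, throughput_kbps):
    """Time needed to fetch one chunk over a link with the given throughput."""
    bits = chunk_bytes(bitrate_kbps, duration_s) * 8
    return bits / (throughput_kbps * 1000)


def keeps_up(bitrate_kbps, duration_s, throughput_kbps):
    """True when a chunk downloads at least as fast as it plays back."""
    return download_seconds(bitrate_kbps, duration_s, throughput_kbps) <= duration_s


# A 4-second chunk of 4500 kbps video (hypothetical values).
print(chunk_bytes(4500, 4))             # 2250000.0 bytes
print(download_seconds(4500, 4, 9000))  # 2.0 seconds on a 9 Mbps link
print(keeps_up(4500, 4, 3000))          # False: a 3 Mbps link falls behind
```

The arithmetic ignores protocol overhead and the fact that real encodes vary in bitrate from chunk to chunk, so treat it as a back-of-the-envelope estimate.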
The introduction of Adaptive Bitrate (ABR) streaming technology has made a considerable paradigm shift in the delivery of streaming media. There are two fundamental ideas behind ABR:
- Multiple copies of the content at various quality levels are stored on the server.
- The client device detects its current network conditions and requests lower quality content when network speeds are slower and higher quality content when network speeds are faster.
While these principles are fairly simple, there are technical challenges involved in producing a functional design for an ABR system. Above all else, the media segments on the server must be created in such a way that the client application can switch between higher and lower quality versions at any time without a disruptive change in the presentation, such as frame jitter or poor audio continuity. Also, there needs to be a mechanism for the client to discover the characteristics of the ABR content so that it knows what sorts of quality choices are available. The client itself then needs to be able to detect network speed changes and switch to a different quality stream as needed.
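The client-side decision can be sketched in a few lines of Python. This is one simple throughput-based heuristic, not the algorithm of any particular player: the function name, the 0.8 safety factor, and the idea of passing in a measured throughput are my assumptions. The bitrate ladder borrows the video bitrates that appear in the sample file names later in this article.

```python
# Video bitrates in kbps, taken from the sample file names in this article.
LADDER_KBPS = [360, 620, 1340, 2500, 4500]


def select_bitrate(ladder_kbps, throughput_kbps, safety=0.8):
    """Pick the highest bitrate that fits within a fraction of the measured throughput.

    `safety` (an assumed tuning knob) leaves headroom for throughput dips.
    Falls back to the lowest bitrate when nothing fits.
    """
    budget = throughput_kbps * safety
    affordable = [b for b in ladder_kbps if b <= budget]
    return max(affordable) if affordable else min(ladder_kbps)


print(select_bitrate(LADDER_KBPS, 2000))   # 1340
print(select_bitrate(LADDER_KBPS, 100))    # 360
print(select_bitrate(LADDER_KBPS, 10000))  # 4500
```

Real players usually combine a throughput estimate like this with buffer occupancy and other signals, so the sketch only shows the basic shape of the choice.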
Many streaming services use one of several ABR formats. Apple’s HTTP Live Streaming (HLS) and MPEG’s Dynamic Adaptive Streaming over HTTP (DASH) are the predominant technologies. Adobe’s HTTP Dynamic Streaming (HDS) and Microsoft’s Smooth Streaming (MSS) have dropped in popularity in favor of open standards. Most ABR technologies rely on HTTP as the network protocol for serving and accessing data due to its near-ubiquitous support in servers and clients. All ABR technologies specify some sort of descriptive file or “manifest” which describes the locations, quality, and types of content available to the client.
Of all the ABR formats, only MPEG-DASH was developed through an open, standards-based process in an effort to incorporate input from all industries and organizations that plan to deploy it.
The MPEG-DASH specification was first published in early 2012 and has undergone several updates since. As in other formats, media segments are stored on a standard web server and downloaded using HTTP or HTTPS. While DASH is audio/video codec agnostic, there are profiles in the specification that indicate how media is to be segmented on the server for the ISOBMFF and MPEG-2 Transport Stream container formats. Additionally, both live and on-demand media types have been given special consideration.
The DASH manifest file is known as a Media Presentation Description, or MPD. It is XML-based and contains all the information necessary for the client to download and present a given piece of content.
The root element in the manifest is named MPD. This contains high-level information about the content as a whole. MPDs can be “static” or “dynamic”. A static MPD is what would be used for a typical on-demand movie. The client can parse the manifest once and expect to have all the information it needs to present the content in its entirety. A “dynamic” MPD indicates that the contents of the manifest may change over time, such as would be expected for live or interactive content. For dynamic manifests, the MPD node indicates the maximum time the client should wait before it requests a new copy.
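As a rough illustration, the Python sketch below reads the root-level attributes from two stripped-down manifests using the standard library XML parser. The manifests are hypothetical and far from complete, and the function name is mine. The `type` and `minimumUpdatePeriod` attribute names and the MPD namespace come from the DASH schema; treating `minimumUpdatePeriod` as the refresh hint described above is how I am reading the text here.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily trimmed manifests (no Periods), just to show the root attributes.
STATIC_MPD = (
    '<MPD xmlns="urn:mpeg:dash:schema:mpd:2011" type="static" '
    'mediaPresentationDuration="PT120S" minBufferTime="PT2S" '
    'profiles="urn:mpeg:dash:profile:isoff-on-demand:2011"/>'
)
DYNAMIC_MPD = (
    '<MPD xmlns="urn:mpeg:dash:schema:mpd:2011" type="dynamic" '
    'minimumUpdatePeriod="PT2S" availabilityStartTime="2024-01-01T00:00:00Z" '
    'minBufferTime="PT2S" profiles="urn:mpeg:dash:profile:isoff-live:2011"/>'
)


def describe_mpd(xml_text):
    """Return (type, minimumUpdatePeriod) read from the root MPD element."""
    root = ET.fromstring(xml_text)
    # The schema treats a missing type attribute as "static".
    mpd_type = root.get("type", "static")
    # Left as the raw ISO 8601 duration string; None when the attribute is absent.
    refresh = root.get("minimumUpdatePeriod")
    return mpd_type, refresh


print(describe_mpd(STATIC_MPD))   # ('static', None)
print(describe_mpd(DYNAMIC_MPD))  # ('dynamic', 'PT2S')
```

A client following this pattern would parse the static manifest once, and re-fetch the dynamic one on a schedule derived from the refresh value.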
Within the root MPD element are one or more Period elements. A Period represents a window of time in which media is expected to be presented. A Period can reference an absolute point in time, as would be the case for live media. Alternatively, it can simply indicate a duration for the media items contained within it. When multiple Periods are present in an MPD, it is not necessary to specify a start time for each Period in order for them to be played in the sequence in which they appear. A Period may even appear in a manifest before its associated media segments have been installed on the server. This allows clients to prepare themselves for upcoming presentations.
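Here is a small Python sketch of how a client might derive a start time for each Period when only durations are given. The manifest, the Period ids, and the helper names are hypothetical, and the duration parser deliberately handles only the simple `PT<seconds>S` form rather than full ISO 8601. It also assumes every Period carries a `duration` attribute.

```python
import re
import xml.etree.ElementTree as ET

NS = {"dash": "urn:mpeg:dash:schema:mpd:2011"}

# Hypothetical manifest: two Periods with durations but no explicit start times.
MPD_XML = """<MPD xmlns="urn:mpeg:dash:schema:mpd:2011" type="static"
     mediaPresentationDuration="PT150S" minBufferTime="PT2S"
     profiles="urn:mpeg:dash:profile:isoff-on-demand:2011">
  <Period id="preroll" duration="PT30S"/>
  <Period id="feature" duration="PT120S"/>
</MPD>"""


def seconds(duration):
    """Parse a duration of the form PT<seconds>S (a small subset of ISO 8601)."""
    match = re.fullmatch(r"PT(\d+(?:\.\d+)?)S", duration)
    if match is None:
        raise ValueError(f"unsupported duration: {duration!r}")
    return float(match.group(1))


def period_timeline(xml_text):
    """Return (id, start_seconds) per Period, chaining durations when start is absent."""
    root = ET.fromstring(xml_text)
    timeline = []
    clock = 0.0
    for period in root.findall("dash:Period", NS):
        start = period.get("start")
        if start is not None:
            clock = seconds(start)  # an explicit start overrides the running clock
        timeline.append((period.get("id"), clock))
        clock += seconds(period.get("duration"))
    return timeline


print(period_timeline(MPD_XML))  # [('preroll', 0.0), ('feature', 30.0)]
```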
Each Period contains one or more AdaptationSet elements. An AdaptationSet describes a single media element available for selection in the presentation. For example, there may be one AdaptationSet for HD video and another for SD video. Another reason to use multiple AdaptationSets is when there are multiple copies of the media that use different video codecs, which enables playback on clients that only support one codec or the other. Additionally, clients may want to be able to select between audio tracks in multiple languages or between multiple video viewpoints. Each AdaptationSet contains attributes that allow the client to identify the type and format of media available in the set so that it can make appropriate choices regarding which to present.
audio_aac-lc_128k_dashinit.mp4
audio_aac-lc_192k_dashinit.mp4
video_1280x720_h264-2500k_dashinit.mp4
video_1920x1080_h264-4500k_dashinit.mp4
video_512x288_h264-360k_dashinit.mp4
video_704x396_h264-620k_dashinit.mp4
video_896x504_h264-1340k_dashinit.mp4
At the next level of the MPD is the Representation. Every AdaptationSet contains a Representation element for each quality level (bandwidth value) of media available for selection. Different video resolutions and bitrates may be available for selection, and the Representation element tells the client exactly how to find media segments for that quality level. There exist several different mechanisms to describe the exact duration and name of each media file in the Representation (SegmentTemplate, SegmentTimeline, etc.), but we won’t dive into that level of detail in this article.
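To make the hierarchy concrete, the Python sketch below walks a hypothetical manifest and groups Representations by the media type of their AdaptationSet. The manifest is an assumption on my part: I have guessed at how the sample files listed above might map to Representation ids and `bandwidth` values (in bits per second), and I have left out all segment addressing. The function name is also mine.

```python
import xml.etree.ElementTree as ET

NS = {"dash": "urn:mpeg:dash:schema:mpd:2011"}

# Hypothetical manifest; ids and bandwidths are guessed from the sample file names.
MPD_XML = """<MPD xmlns="urn:mpeg:dash:schema:mpd:2011" type="static"
     mediaPresentationDuration="PT120S" minBufferTime="PT2S"
     profiles="urn:mpeg:dash:profile:isoff-on-demand:2011">
  <Period>
    <AdaptationSet mimeType="video/mp4">
      <Representation id="video_1920x1080" bandwidth="4500000" width="1920" height="1080"/>
      <Representation id="video_1280x720" bandwidth="2500000" width="1280" height="720"/>
      <Representation id="video_896x504" bandwidth="1340000" width="896" height="504"/>
      <Representation id="video_704x396" bandwidth="620000" width="704" height="396"/>
      <Representation id="video_512x288" bandwidth="360000" width="512" height="288"/>
    </AdaptationSet>
    <AdaptationSet mimeType="audio/mp4" codecs="mp4a.40.2" lang="en">
      <Representation id="audio_192k" bandwidth="192000"/>
      <Representation id="audio_128k" bandwidth="128000"/>
    </AdaptationSet>
  </Period>
</MPD>"""


def representations_by_type(xml_text):
    """Map each AdaptationSet mimeType to its (id, bandwidth) pairs, lowest bandwidth first."""
    root = ET.fromstring(xml_text)
    result = {}
    for aset in root.findall("dash:Period/dash:AdaptationSet", NS):
        reps = [
            (rep.get("id"), int(rep.get("bandwidth")))
            for rep in aset.findall("dash:Representation", NS)
        ]
        result.setdefault(aset.get("mimeType"), []).extend(reps)
    # Sort each list by bandwidth so the lowest quality level comes first.
    return {mime: sorted(reps, key=lambda r: r[1]) for mime, reps in result.items()}


for mime, reps in representations_by_type(MPD_XML).items():
    print(mime, reps)
```

A player would feed a sorted list like this into its bitrate selection logic, then use the chosen Representation to locate segments.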
I want to briefly discuss the features incorporated in the DASH specifications that describe support for encrypted media.
There are actually four separate documents that make up the DASH specification set. Part 1 (ISO/IEC 23009-1) is the base DASH specification. Part 2 (ISO/IEC 23009-2) describes the requirements for conformance software to validate the specification. Part 3 (ISO/IEC 23009-3) provides guidelines for implementing the DASH spec. Finally, Part 4 (ISO/IEC 23009-4) describes content protection for segment encryption and authentication. In segment encryption, the entire segment file is encrypted and hashed so that its contents can be protected and its integrity validated. This is different from the “sample encryption” that is the focus of this blog series. For sample encryption, only audio and video sample information is encrypted, leaving container and codec metadata “in the clear”.
The ContentProtection element indicates that one or more media components are protected in some way. ContentProtection elements can be specified at either the Representation or AdaptationSet level in the MPD. The schemeIdUri attribute of the ContentProtection element uniquely identifies the particular system used to protect the content.
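As a quick sketch, the Python below collects every ContentProtection element in a hypothetical manifest, regardless of whether it sits at the AdaptationSet or Representation level. The manifest and function name are mine. The two `schemeIdUri` values are ones I know to be in common use: `urn:mpeg:dash:mp4protection:2011` signals Common Encryption in MP4 files, and the `urn:uuid:` value is the identifier used for Widevine. Which schemes a given piece of content actually declares depends entirely on how it was packaged.

```python
import xml.etree.ElementTree as ET

MPD_NS = "urn:mpeg:dash:schema:mpd:2011"

# Hypothetical manifest with protection declared at the AdaptationSet level.
MPD_XML = """<MPD xmlns="urn:mpeg:dash:schema:mpd:2011" type="static"
     mediaPresentationDuration="PT120S" minBufferTime="PT2S"
     profiles="urn:mpeg:dash:profile:isoff-on-demand:2011">
  <Period>
    <AdaptationSet mimeType="video/mp4">
      <ContentProtection schemeIdUri="urn:mpeg:dash:mp4protection:2011" value="cenc"/>
      <ContentProtection schemeIdUri="urn:uuid:edef8ba9-79d6-4ace-a3c8-27dcd51d21ed"/>
      <Representation id="video_1280x720" bandwidth="2500000"/>
    </AdaptationSet>
  </Period>
</MPD>"""


def protection_schemes(xml_text):
    """Return (schemeIdUri, value) for every ContentProtection element, at any depth."""
    root = ET.fromstring(xml_text)
    return [
        (el.get("schemeIdUri"), el.get("value"))
        for el in root.iter(f"{{{MPD_NS}}}ContentProtection")
    ]


for scheme, value in protection_schemes(MPD_XML):
    print(scheme, value)
```

A client would compare the returned scheme identifiers against the protection systems it supports before attempting playback.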