HTML5 Tutorial – Captioning video tracks

The <track> element represents a timed text file to provide users with multiple languages or commentary for videos. You can use multiple tracks and set one as default to be used when the video starts.

You can provide a transcript of the video.

This article introduces how you can use WebVTT (Web Video Text Tracks) and the Media Multiple Text Tracks API as part of your video.

Internet Explorer 10, Chrome, and Windows Store apps using JavaScript support the track element within the video element. You can use the track element to add timed text tracks, such as:

  • Subtitles: what you might expect to see while watching a foreign-language film. They’re a transcription or translation of the video’s dialogue.
  • Captions: intended for viewers who can’t hear the audio of the video, so they also include descriptions of non-dialogue sound. For example, if a character in a video slams a door off-camera, the captions would include something like [door slams]. Both subtitles and captions are displayed by the browser as text overlays on top of the playing video.
  • Descriptions are not displayed visually, but are rather spoken out loud by a screen reader, benefitting viewers who can’t see the video. Not surprisingly, descriptions describe what’s happening visually in the scene.

The text is displayed in the lower portion of the video player. At this time the position and color can’t be controlled, but you can retrieve text through script and display it in your own way.
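As a rough sketch of retrieving the text through script, the following listens for cue changes and copies the active cue text into an element of your own. The helper name activeCueText and the element ids mainvideo and captionBox are assumptions made for illustration. Setting the track mode to "hidden" keeps cues firing while suppressing the built-in overlay.

```javascript
// Join the text of every cue that is active right now.
// `track` is a TextTrack, or any object with an array-like `activeCues`.
function activeCueText(track) {
  var cues = track.activeCues || [];
  var lines = [];
  for (var i = 0; i < cues.length; i++) {
    lines.push(cues[i].text);
  }
  return lines.join("\n");
}

// Browser-only wiring; skipped when no DOM is available.
if (typeof document !== "undefined") {
  var video = document.getElementById("mainvideo"); // assumed video id
  var box = document.getElementById("captionBox");  // hypothetical display element
  if (video && box && video.textTracks.length > 0) {
    var track = video.textTracks[0];
    track.mode = "hidden"; // cues stay active, but the browser draws nothing
    track.addEventListener("cuechange", function () {
      box.textContent = activeCueText(track);
    });
  }
}
```

Because the text now lives in an ordinary element, its position and color can be set with regular CSS.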

For a hands-on demo of the track element in action with HTML5 video, see IE10 Video Captioning and HTML5 Video Caption Maker on the IE Test Drive.

Text tracks use a simplified version of the Web Video Text Tracks (WebVTT) or Timed Text Markup Language (TTML) timed text file formats.

You define the track inside the video element.

<video id="mainvideo" controls loop>
<track src="en_track.vtt" srclang="en" label="English" kind="captions" default>
</video>

The attributes are:

  • kind. Defines the type of text content. Possible values are: subtitles, captions, descriptions, chapters, metadata.
  • src. URL of the timed text file.
  • srclang. Language of the timed text file. For information purposes; not used in the player.
  • label. Provides a label that can be used to identify the timed text. Each track must have a unique label.
  • default. Specifies the default track element. If not specified, no track is displayed.
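When a video carries several tracks, script can choose which one is shown instead of relying on the default attribute. The sketch below assumes the tracks differ by their srclang value. The function name showTrackForLanguage is hypothetical, and for simplicity it disables every non-matching track regardless of kind.

```javascript
// Show the first track whose language matches `lang` and disable the rest.
// `textTracks` is a TextTrackList such as video.textTracks (array-like).
// Returns true if a matching track was found.
function showTrackForLanguage(textTracks, lang) {
  var found = false;
  for (var i = 0; i < textTracks.length; i++) {
    var track = textTracks[i];
    if (!found && track.language === lang) {
      track.mode = "showing";
      found = true;
    } else {
      track.mode = "disabled";
    }
  }
  return found;
}

// Example wiring in a browser (assumed id "mainvideo"):
// showTrackForLanguage(document.getElementById("mainvideo").textTracks, "en");
```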

About WebVTT

WebVTT files are text files encoded as UTF-8 (8-bit Unicode Transformation Format) that look like the following.

WEBVTT

00:00.000 --> 00:10.000
This text is related to the first ten seconds of the video

00:10.000 --> 00:20.000
This text is related to the next ten seconds of the video

The file starts with the tag WEBVTT on the first line, followed by a blank line. Cue timings use the format HH:MM:SS.sss; the hours component may be omitted, as in the sample above. The start and end times are separated by a space, two hyphens and a greater-than sign ( --> ), and another space. Each timing line sits on a line by itself. The caption text follows on the next line and can span one or more lines. The only restriction is that there must be no blank lines between lines of text, because a blank line ends the cue.
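To make the layout concrete, here is a minimal sketch of a parser for the basic structure described above. It is not a full WebVTT implementation: it assumes cues are separated by blank lines and it ignores cue settings, styles, and notes. The names parseTimestamp and parseSimpleVtt are made up for this example.

```javascript
// Convert "HH:MM:SS.sss" or "MM:SS.sss" to a number of seconds.
// Anything after the first whitespace (such as cue settings) is ignored.
function parseTimestamp(stamp) {
  var parts = stamp.trim().split(/\s+/)[0].split(":");
  var seconds = parseFloat(parts.pop());
  var minutes = parseInt(parts.pop(), 10);
  var hours = parts.length ? parseInt(parts.pop(), 10) : 0;
  return hours * 3600 + minutes * 60 + seconds;
}

// Parse a simple WebVTT string into an array of { start, end, text }.
function parseSimpleVtt(source) {
  // Normalize line endings, then split into blocks on blank lines.
  var blocks = source.replace(/\r\n?/g, "\n").split(/\n{2,}/);
  var cues = [];
  for (var i = 0; i < blocks.length; i++) {
    var lines = blocks[i].split("\n").filter(function (line) {
      return line !== "";
    });
    // Find the timing line; an optional cue identifier may precede it.
    var t = -1;
    for (var j = 0; j < lines.length; j++) {
      if (lines[j].indexOf("-->") !== -1) { t = j; break; }
    }
    if (t === -1) continue; // the WEBVTT header or an unsupported block
    var times = lines[t].split("-->");
    cues.push({
      start: parseTimestamp(times[0]),
      end: parseTimestamp(times[1]),
      text: lines.slice(t + 1).join("\n")
    });
  }
  return cues;
}
```

In a browser you would normally let the track element do this parsing. A hand-rolled parser like this is mainly useful for checking a file or building a transcript outside the player.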

The MIME type for WebVTT files is text/vtt, which you will need to set in IIS.

Getting the Captions

Here is a sample that reads the captions in the WebVTT file and puts them on the page. It gets the cues from the track element's text track and appends the text of each cue to the display paragraph.

<!DOCTYPE html>
<title>Popeye Script</title>
<script type="text/javascript">
// after elements are loaded
window.addEventListener("load", function () {
  var trackElem = document.getElementById('enTrack');
  var display = document.getElementById('display');
  // cues is null when the track is disabled
  var myCues = trackElem.track.cues;
  if (!myCues) { return; }
  for (var i = 0; i < myCues.length; i++) {
    // get the text of the cue
    display.innerHTML += myCues[i].text + "<br>";
  }
}, false);
</script>
<video id="video1" width="320" height="240" controls>
  <source src="video/PopeyeForPresident_qtp.mp4" type="video/mp4">
  <source src="video/PopeyeForPresident_qtp.ogv" type="video/ogg">
  <track id="enTrack" src="video/engtrack.vtt" label="Popeye"
         kind="captions" srclang="en" default>
  <div>Your browser does not support HTML5 MP4 video.</div>
</video>
<h3>Partial Script</h3>
<p id="display"></p>

Note: The track must be loaded to read the cues.
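One way to respect that requirement is to check the track element's readyState and otherwise wait for its load event. The helper below is a sketch under that assumption, and the name whenTrackLoaded is hypothetical. Note also that browsers generally do not fetch a track whose mode is "disabled", so the track needs the default attribute or a mode of "hidden" or "showing".

```javascript
// Run `callback(cues)` once the <track> element has finished loading.
// A readyState of 2 corresponds to HTMLTrackElement.LOADED.
function whenTrackLoaded(trackElem, callback) {
  var LOADED = 2;
  if (trackElem.readyState === LOADED) {
    callback(trackElem.track.cues);
    return;
  }
  trackElem.addEventListener("load", function onLoad() {
    // Remove the listener so the callback runs only once.
    trackElem.removeEventListener("load", onLoad);
    callback(trackElem.track.cues);
  });
}

// Example wiring in a browser, using the id from the sample above:
// whenTrackLoaded(document.getElementById("enTrack"), function (cues) {
//   console.log(cues.length + " cues available");
// });
```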

For more samples, including one where you can interact with the transcript to take your user to a particular part of the video, see Video (Windows).
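The interactive transcript idea can be sketched roughly as follows: each cue becomes a paragraph, and clicking it sets the video's currentTime to the cue's start time. This is an illustration rather than the code from the linked samples. The function names and the container element are assumptions, and the cues are expected to be loaded already.

```javascript
// Build transcript entries from a cue list (array-like of { startTime, text }).
function transcriptEntries(cues) {
  var entries = [];
  for (var i = 0; i < cues.length; i++) {
    entries.push({ time: cues[i].startTime, text: cues[i].text });
  }
  return entries;
}

// Render entries into `container`; clicking one seeks `video` to that cue.
// Requires a browser DOM.
function renderTranscript(video, container, entries) {
  entries.forEach(function (entry) {
    var p = document.createElement("p");
    p.textContent = entry.text;
    p.addEventListener("click", function () {
      video.currentTime = entry.time; // jump to the start of the cue
    });
    container.appendChild(p);
  });
}

// Example wiring in a browser, using ids from the sample above:
// var video = document.getElementById("video1");
// var cues = document.getElementById("enTrack").track.cues;
// renderTranscript(video, document.getElementById("display"), transcriptEntries(cues));
```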

Browser Support

Not all browsers support WebVTT directly. IE10 and Windows Store applications do support WebVTT.

You can select from a number of JavaScript libraries available that help you use the WebVTT file format and provide subtitles for your videos, including:

All of these solutions support video subtitles, and some offer additional features.

Server Support

If the MIME type for the video is not set correctly on the server, the video may not appear at all, or may appear as a gray box containing an X (if JavaScript is enabled).

IIS supports serving Ogg, WebM, and MP4 video, but you will need to set the MIME type for the file used by your track element.

For more information about setting MIME type, see Add a MIME Type (IIS 7).


Apache support is explained in <video> on Mozilla.


Sample Code

Sample code is available in the DevDays GitHub repository. See