Adding Media Overlays to a Children's Ebook Title

December 28, 2014

Media overlays are a great feature for children's ebooks, because they highlight words as they are narrated. The process is fairly straightforward and involves three types of files:

  • SMIL
  • X/HTML
  • OPF

For an overview of media overlays, read EPUBZone’s EPUB 3 Media Overlays.

The SMIL files are the key, because they break up the audio narration times, signaling when to highlight which words. But before working on the SMIL files, it’s best to get the audio narration times. The easiest way to do this is by using the free audio editor and recorder, Audacity. After downloading the program, you can either record your own audio or import a previously recorded audio file.

For this post, I’m using a children’s book I wrote and narrated as an example, called Apple’s Adventures. The ebook already has all the images, text, fonts, and styles in place.

The first step in Audacity is to label each word. Using the Label Track in Audacity is the quickest way to get the begin and end times for each word in the story. To add the Label Track, click Tracks > Add New > Label Track.

The Label Track will appear below the audio. Press play, and add a label at the beginning of every word. You only need to worry about the beginning of each word, because you can use the begin time of a new word for the end time of the previous word. One keyboard shortcut you can use to add labels is Ctrl+B (for PC) or Cmd+B (for Mac).

Label each word. To keep things simple, I used p#w#, where p is the page number and w is the word number.

After you finish adding labels to all the words, you will want to export the Label Track. 

Audacity will export all the start and end times associated with the labels to a txt file. You may notice that the start and end times are the same, because each label marks a single point in time rather than a range. For the SMIL files, you will want to use the start time for p1w1 to begin the audio for the first word and the start time for p1w2 to end the audio for the first word. Then use the start time for p1w2 to begin the audio for the second word and the start time for p1w3 to end the audio for the second word, and so on.
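The pairing described above can be done by hand, but it is easy to script. Here is a minimal Python sketch, assuming the exported file has one tab-separated row per label (start, end, label name). The function names, the third sample label, and the audio_end value are made up for illustration; only the first two times come from this book.

```python
from typing import List, Tuple


def read_labels(text: str) -> List[Tuple[float, str]]:
    """Parse an Audacity label export: one 'start<TAB>end<TAB>label' row per line."""
    labels = []
    for line in text.splitlines():
        # Skip blank rows, and rows starting with a backslash, which some
        # Audacity versions appear to add for frequency ranges (assumption).
        if not line.strip() or line.startswith("\\"):
            continue
        start, _end, name = line.split("\t")[:3]
        labels.append((float(start), name.strip()))
    return labels


def to_clips(labels: List[Tuple[float, str]], audio_end: float):
    """Use each label's start as clipBegin and the next label's start as clipEnd.

    The last word has no following label, so it ends at audio_end.
    """
    clips = []
    for i, (begin, name) in enumerate(labels):
        end = labels[i + 1][0] if i + 1 < len(labels) else audio_end
        clips.append((name, begin, end))
    return clips


# Hypothetical export: the third row and audio_end are invented values.
sample = (
    "7.358141\t7.358141\tp1w1\n"
    "7.905533\t7.905533\tp1w2\n"
    "8.400000\t8.400000\tp1w3\n"
)
clips = to_clips(read_labels(sample), audio_end=9.0)
for name, begin, end in clips:
    print(f"{name}: clipBegin={begin:.6f} clipEnd={end:.6f}")
```

The last word needs an explicit end time, so one option is to add a final label where the narration for the page stops and pass its time as audio_end.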


Now that all the prep work is done, you will need to open a text editor, such as TextWrangler, to work on the SMIL, X/HTML, and OPF files.


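There should be a SMIL file for each page with narration. Each SMIL file opens with a smil root element (declaring the SMIL namespace, http://www.w3.org/ns/SMIL, and version="3.0") followed by an opening <body> tag, and closes with the </body> and </smil> tags.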

In the body tags, insert the following for each word on the page:

<par id="id1"><text src="../page01.xhtml#p1w1"/><audio clipBegin="7.358141" clipEnd="7.905533" src="../Audio/Apples_Adventures.mp3"/></par>

Note that audio files can also be MP4 (AAC) files.

The id, the file names, the fragment identifier, and the clip times will need to be changed to convey the information in your book. For a whole page, repeat the par element once for every narrated word, in reading order.
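To show how a whole page fits together, here is a rough Python sketch that assembles one SMIL file from a list of clips. The build_smil function and the second clip's end time are hypothetical. The namespace and version attribute follow the EPUB 3 Media Overlays specification, and the file paths are the ones used in this book.

```python
from xml.sax.saxutils import quoteattr

# Opening and closing lines shared by every SMIL file in the book.
SMIL_HEADER = (
    '<?xml version="1.0" encoding="UTF-8"?>\n'
    '<smil xmlns="http://www.w3.org/ns/SMIL" version="3.0">\n'
    '<body>\n'
)
SMIL_FOOTER = '</body>\n</smil>\n'


def build_smil(text_href, audio_href, clips):
    """Return one SMIL document as a string.

    clips is a list of (fragment_id, clip_begin, clip_end), times in seconds.
    """
    pars = []
    for n, (frag, begin, end) in enumerate(clips, start=1):
        pars.append(
            f'<par id="id{n}">'
            f'<text src={quoteattr(text_href + "#" + frag)}/>'
            f'<audio clipBegin="{begin:.6f}" clipEnd="{end:.6f}" '
            f'src={quoteattr(audio_href)}/>'
            f'</par>\n'
        )
    return SMIL_HEADER + "".join(pars) + SMIL_FOOTER


# The 8.4 end time is an invented value for illustration.
smil = build_smil(
    "../page01.xhtml",
    "../Audio/Apples_Adventures.mp3",
    [("p1w1", 7.358141, 7.905533), ("p1w2", 7.905533, 8.4)],
)
print(smil)
```

Generating the file this way keeps the par ids numbered consistently, but writing the same markup by hand in a text editor works just as well.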

The reading system will use the 'oebps' folder as the root folder and move forward to subfolders from there (note that in the general case, the root folder does not have to be named 'oebps'). Therefore, to make sure all files are connected and will work smoothly in the EPUB, it’s easiest to have all the XHTML or HTML files loose in the OEBPS folder. The audio file can be in a subfolder of the OEBPS folder. In this case, I’ve put it in a folder called Audio. File paths are case-sensitive, so I made sure to keep the capitalization when referencing it in my SMIL files.
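Because a single wrong capital letter breaks the link, it can help to check the paths before packaging. The following is a small Python sketch, not a full validator. The "smil" folder name and both function names are assumptions made for the example.

```python
import posixpath


def resolve(smil_path, src):
    """Resolve a src value relative to the SMIL file that contains it."""
    return posixpath.normpath(posixpath.join(posixpath.dirname(smil_path), src))


def missing_targets(smil_path, srcs, package_files):
    """Return the src values that do not match a packaged file exactly.

    The comparison is case-sensitive, and any #fragment is ignored.
    """
    files = set(package_files)
    return [s for s in srcs if resolve(smil_path, s.split("#")[0]) not in files]


# Assumed layout: SMIL files live in a hypothetical OEBPS/smil folder.
package_files = [
    "OEBPS/page01.xhtml",
    "OEBPS/Audio/Apples_Adventures.mp3",
    "OEBPS/smil/page01.smil",
]
srcs = [
    "../page01.xhtml#p1w1",
    "../Audio/Apples_Adventures.mp3",
    "../audio/Apples_Adventures.mp3",  # wrong capitalization on purpose
]
missing = missing_targets("OEBPS/smil/page01.smil", srcs, package_files)
print(missing)  # ['../audio/Apples_Adventures.mp3']
```

For a finished EPUB, the list of packaged files could come from zipfile.ZipFile("book.epub").namelist() in the standard library.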

For longer works, where you want to highlight full paragraphs, it’s helpful to also use the seq element to help represent structures such as sections or lists. The seq element has media that is rendered sequentially. But since this is a children’s book, I’ve decided to only use the par element, which has media that is rendered in parallel. 

The same par id can be used in multiple SMIL documents, but it must be unique within each document. The fragment identifier (after the # in the text src line), however, must be unique within the XHTML file it points to. In this case, I used the same p#w# format for the fragment identifiers, which keeps them unique across the whole book.
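Duplicate ids are easy to introduce when copying and pasting par elements. Here is a quick Python sketch, using only the standard library, that reports repeated par ids and repeated text targets in one SMIL document. The function name and the sample markup are hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import Counter

SMIL_NS = "{http://www.w3.org/ns/SMIL}"


def _repeated(counter):
    """Return the keys that occur more than once, ignoring missing values."""
    return sorted(k for k, n in counter.items() if k is not None and n > 1)


def duplicate_ids(smil_text):
    """Return (repeated par ids, repeated text src targets) for one SMIL document."""
    root = ET.fromstring(smil_text)
    par_ids = Counter(par.get("id") for par in root.iter(SMIL_NS + "par"))
    targets = Counter(text.get("src") for text in root.iter(SMIL_NS + "text"))
    return _repeated(par_ids), _repeated(targets)


# Invented sample with a par id that was copied without being renumbered.
sample = (
    '<smil xmlns="http://www.w3.org/ns/SMIL" version="3.0"><body>'
    '<par id="id1"><text src="../page01.xhtml#p1w1"/><audio src="a.mp3"/></par>'
    '<par id="id1"><text src="../page01.xhtml#p1w2"/><audio src="a.mp3"/></par>'
    '</body></smil>'
)
result = duplicate_ids(sample)
print(result)  # (['id1'], [])
```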

The clipBegin and clipEnd times come from the Audacity Label Track. Again, these times signal the time it takes to read each word. Breaking it up this way allows the highlighting of individual words. 


Next, you will need to add some code to each XHTML or HTML file that displays words. The code around each word should look like this (the id value and the word itself will need to be changed):

<span id="p1w1">Word</span>

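The id on each span is what the text src in the SMIL file points to, and that is how a word is tied to its specific times. This is why it’s important that every id is unique and matches the fragment identifier used in the SMIL file.

On a complete XHTML page, every narrated word is wrapped in its own span in this way.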
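Typing a span around every word gets tedious, so here is a small Python sketch that does the wrapping for one page of text using the p#w# scheme. The wrap_words function and the sample sentence are made up for the example.

```python
from html import escape


def wrap_words(page, sentence):
    """Wrap each whitespace-separated word in <span id="p{page}w{n}">.

    Punctuation attached to a word stays inside that word's span.
    """
    spans = [
        f'<span id="p{page}w{n}">{escape(word)}</span>'
        for n, word in enumerate(sentence.split(), start=1)
    ]
    return " ".join(spans)


# Invented sentence, not taken from the book.
markup = wrap_words(1, "Apple loves adventures.")
print(markup)
```

The output can be pasted into the paragraph element of the page. The word numbers follow reading order, so they line up with labels added in the same order in Audacity.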


Last, you will need to add all this information to your OPF file, or else none of this will work. The OPF states which files are included. If you’d like, you can add extra information in the metadata section:

<meta property="media:duration">0:04:42</meta>

<meta property="media:narrator">Sabrina Ricci</meta>

<meta property="media:active-class">-epub-media-overlay-active</meta>

The first line refers to the length of the entire audio narration. The second line is the name of the narrator, and the third defines the name of the class that dictates the color of the highlighted word. You could also include more granular metadata by specifying the media:duration of each audio segment. To do that, use 

<meta property="media:duration" refines="#audio1">0:00:08</meta>
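The duration values use an h:mm:ss clock format, and the total should equal the sum of the per-file durations. As a rough Python sketch, the lines can be generated from durations in seconds. The clock_value function, the "audio2" id, and the sample durations are assumptions for illustration.

```python
def clock_value(seconds):
    """Format a duration in seconds as h:mm:ss, rounded to a whole second."""
    total = int(round(seconds))
    hours, rest = divmod(total, 3600)
    minutes, secs = divmod(rest, 60)
    return f"{hours}:{minutes:02d}:{secs:02d}"


# Hypothetical per-SMIL durations in seconds, keyed by manifest id.
page_durations = {"audio1": 8.0, "audio2": 12.4}

lines = [
    f'<meta property="media:duration" refines="#{smil_id}">{clock_value(secs)}</meta>'
    for smil_id, secs in page_durations.items()
]
# The unrefined entry describes the whole publication.
lines.append(
    f'<meta property="media:duration">{clock_value(sum(page_durations.values()))}</meta>'
)
print("\n".join(lines))
```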

You will also need to add media-overlay="audio1" to the end of each manifest reference to an X/HTML page that contains narration. You can change "audio1" to any name, but make sure it matches the id of the manifest item for that page's SMIL file, which is declared with the media type application/smil+xml. Also make sure all the syntax is correct (capitalizations match, etc.).
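A mismatched name here is one of the easier mistakes to make. The Python sketch below checks a manifest for media-overlay values that do not point at a SMIL item. The check_overlays function is hypothetical, and the sample OPF is trimmed to the manifest only, with invented ids and file names apart from the ones used earlier in this post.

```python
import xml.etree.ElementTree as ET

OPF_NS = "{http://www.idpf.org/2007/opf}"


def check_overlays(opf_text):
    """Return a list of problems with media-overlay references in the manifest."""
    root = ET.fromstring(opf_text)
    items = {item.get("id"): item for item in root.iter(OPF_NS + "item")}
    problems = []
    for item in items.values():
        ref = item.get("media-overlay")
        if ref is None:
            continue  # page without narration
        target = items.get(ref)  # exact, case-sensitive match on id
        if target is None:
            problems.append(f'{item.get("href")}: no item with id "{ref}"')
        elif target.get("media-type") != "application/smil+xml":
            problems.append(f'{item.get("href")}: "{ref}" is not a SMIL item')
    return problems


# Trimmed, invented manifest; page02 refers to "Audio2" with the wrong case.
sample = """<package xmlns="http://www.idpf.org/2007/opf" version="3.0">
<manifest>
<item id="page01" href="page01.xhtml" media-type="application/xhtml+xml" media-overlay="audio1"/>
<item id="page02" href="page02.xhtml" media-type="application/xhtml+xml" media-overlay="Audio2"/>
<item id="audio1" href="smil/page01.smil" media-type="application/smil+xml"/>
<item id="audio2" href="smil/page02.smil" media-type="application/smil+xml"/>
<item id="narration" href="Audio/Apples_Adventures.mp3" media-type="audio/mpeg"/>
</manifest>
</package>"""
problems = check_overlays(sample)
print(problems)
```

In this sample only page02.xhtml is reported, because "Audio2" does not match the "audio2" id exactly.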


CSS Bonus

As a fun bonus, you can choose which color the text should change to as it’s highlighted. Reading systems use a default highlighting color (e.g. blue or yellow), but you can change it to whatever you'd like. To change the highlighted words to red, add a rule to your stylesheet for the class named in media:active-class, for example: .-epub-media-overlay-active { color: red; }




You can change the red text to another HTML color. Note that visual properties other than foreground color can also be changed using CSS, such as background color or bold or italic formatting.


The current version of epubcheck is 3.0.1. You can also use the EPUB Validator at IDPF if your EPUB is under 10 MB.


If, after building your EPUB, you run it through epubcheck and find a long list of errors, don't worry. epubcheck has some support for Media Overlays, but you may still see some errors for fallbacks.