EPUB 3 Validation

February 24, 2014

Adapted with permission from EPUB 3 Best Practices (O'Reilly Media)

To paraphrase a common expression, there are three things you need to know about a finished EPUB 3 file: it must adhere to rules, rules, and damned rules.

Which is only to say that there is nothing unique about EPUB 3 as a document publishing format. The rules are there to ensure that your content can be opened and rendered by any reading system. They can’t tell you how your content will look on any given reading system, but they can alert you to bugs that are the result of bad markup. If you skip the validation stage on the assumption that your file must be valid because it looked fine when you tested it in a reading system, or because a program exported it, you run the risk of a lot of wasted time and effort later.

Some vendors will prevent your file from being distributed if it doesn’t validate (which is a good thing), in which case you’ll be forced back to this step right away should you try to avoid it. Others may not, or you might distribute the file yourself, in which case you might learn what you did wrong only when angry emails from customers start flooding in. Once your reputation is tarnished, even if just in a comments section on a product page, it can be hard to get back. No one appreciates someone who hasn’t bothered to do basic validation to ensure their content renders, after all.

This article diverges slightly from the best practices pattern of other pieces on EPUB 3. The best practice is simply to validate your content. Instead, this article looks at how to get up and running with the epubcheck validation tool and then spends some time looking at some of the most common error messages you’re likely to encounter, including breaking down where they come from in the validation process.

epubcheck

The epubcheck tool is the gold standard as far as EPUB 3 validation goes. The tool has been around since the early EPUB 2 days. It was originally developed by Adobe but is now maintained as an open source tool by the IDPF. It’s free to use and modify as you need.

It has also improved significantly, both in the scope of what it checks for and in the comprehensibility of the error messages it returns, as part of a major upgrade it has undergone in conjunction with the release of EPUB 3. There’s still work to be done to add CSS and scripting support, but it’s come a long way from where it was.

Installing

Before downloading epubcheck, you will need to verify that you have Java installed on your computer. Any version of the Java Runtime Environment will do, including the one that gets installed when you install Java for your browser (available from the Java website).

Note

To simplify calling Java from the command line, you will need to add the path to the Java executable to the PATH environment variable. On Windows machines, for example, this path is typically c:\program files\java\jre7\bin. Instructions on how to add this variable are operating system dependent, but plenty of resources exist on the Web. You can omit this step, but it means manually entering the full path to Java every time you want to run epubcheck.
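As a quick way to confirm that the PATH change took effect, a short script can look up the Java launcher the same way the command shell would. This is only a minimal sketch: the function name find_java is made up for illustration, and the script only reports whether an executable called java can be found, not which Java version it is.

```python
import shutil


def find_java(executable="java"):
    """Return the full path to the Java launcher if it is on PATH, else None."""
    # shutil.which searches the directories listed in the PATH variable.
    return shutil.which(executable)


path = find_java()
if path is None:
    print("java was not found on PATH; add its directory or call it by its full path")
else:
    print("java found at", path)
```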

epubcheck is currently hosted on a Google projects site under the same name. The latest stable build is typically linked to from the main page, but can also be found by clicking on the Downloads tab.

epubcheck does not have an installer, but instead comes as a ZIP file containing the necessary libraries to run. After downloading, simply unzip the contents to a directory in your operating system. The folder will contain the epubcheck .jar file and a directory called lib, which contains additional libraries that epubcheck depends on to run, as shown in Figure 11-1.

That’s it. You now have epubcheck installed on your computer.

 Directory listing
Figure 11-1. epubcheck distribution files

Running

epubcheck is a command-line tool, meaning that you’re going to have to become familiar with your operating system’s command shell. If you’re already familiar with the command line and how to run Java, you can skim this section to get the command line call. If not, the first task is bringing up the command shell:

  • Windows users, click the Start menu button and type cmd in the Run box (XP users) or the new search box at the bottom of the Start menu (Vista and Windows 7).
  • Mac users need to go to the Applications/Utilities directory and double-click on Terminal.
  • Linux users may find the shell in a number of different places, and under a number of different names, depending on the flavor and version of Linux they are running.

One of the nuisances of a command-line tool like epubcheck is entering all the necessary paths in order to get it to run. Adding the java executable location to the PATH variable allows you to call it without having to type the full directory path, but what directory you invoke epubcheck from will affect the other paths you have to specify.

If you try to run epubcheck from the default directory your command shell opens in, you’ll need to add the full path to both the epubcheck .jar file and your EPUB:

$ java -jar c:\epubcheck\epubcheck.jar c:\books\mybook\xyz.epub

If you change directories in the command shell to the epubcheck directory, you can avoid having to specify the full path to the .jar file:

$ java -jar epubcheck.jar c:/books/mybook/xyz.epub

Conversely, if you navigate to your book directory, you just have to specify the path to the .jar file.

Note

The actual epubcheck .jar file typically has a build number appended to the end of it (e.g., epubcheck-3.0-RC-1.jar). This build number will be omitted from the examples in this article, because it is subject to change.

Either way is a nuisance, but you can use a couple of tricks to speed things up. The simplest is to use the autocomplete feature that most command shells provide. If you start typing the name of a file or directory, you can press the Tab key to fill the name in automatically. For example, to quickly insert the epubcheck .jar file, you could start by typing this:

$ java -jar c:\ep

Pressing the Tab key should expand the directory to epubcheck (if you had another directory in your root drive starting with ep, simply press the Tab key again to rotate through the possible options). You can then repeat this shortcut to add the .jar file. Because only one file in the epubcheck folder starts with the letter e, you only need to type that one letter:

$ java -jar c:\epubcheck\e

Then press the Tab key to expand to the full .jar file name.

If you don’t like typing at all, another option is to open both the epubcheck and book directories first (e.g., in a My Computer or Windows Explorer window on Windows, or a Finder window on Macs). You can then drag and drop the files into the command shell. For example, first type the Java commands:

$ java -jar

Then drag the epubcheck .jar file onto the Terminal window and drop it. The full path to the file will be automatically inserted:

$ java -jar C:\epubcheck\epubcheck.jar

You could then do the same to add the EPUB file to validate.

A final option is to create a script to automatically run epubcheck for you. On Windows, create a new text file containing the following command:

java -jar c:\epubcheck\epubcheck.jar %*

Save this file as epubcheck.bat in the epubcheck directory. On Linux and Macs, an equivalent shell script might be:

#!/bin/sh
java -jar ~/epubcheck/epubcheck.jar "$@"

Save this file as epubcheck.sh and mark it as executable (e.g., by running chmod +x epubcheck.sh).

You can now add the epubcheck folder to the PATH environment variable, as you did earlier for the Java executable. If you close and re-open your terminal window after making this change, you can now invoke epubcheck from any directory simply by typing the name of the file you just created, as shown in Figure 11-2.

epubcheck.bat file invoked with default output that at least one argument is expected
Figure 11-2. Invoking the batch file on Windows

To validate a file, all you need to do now is specify its path after the script filename, regardless of what directory your terminal window initializes in:

$ epubcheck.bat c:/books/mybook.epub

Again, you could drag and drop the EPUB file if that’s simpler.

One last trick you can use to improve the command-line experience is to redirect the output to a file for easier reading. Command shells are awfully little windows to try to read error messages in, and flipping between the window and your content to find and understand the problems quickly becomes a headache. Depending on how the command shell is configured, and how many errors and warnings your book has, you may not even be able to scroll back to the beginning of the report, meaning the most critical error might no longer be discoverable.

You aren’t restricted to working in the command shell, though. To redirect errors to a file, add the number 2 followed by a right angle bracket (>) to the end of the command that invokes epubcheck, and then include the path and name of the file to write to. The 2 identifies the standard error stream, which is where epubcheck writes its error and warning messages.

For example, to send errors to the file c:/books/error.txt, invoke epubcheck like this:

$ java -jar epubcheck.jar c:/books/mybook.epub 2> c:/books/error.txt

As long as you are working with a text editor that automatically updates open files, you should be able to run the command over and over and immediately see the new results. The command shell window in Figure 11-3 shows only the information written to standard output. Errors are listed in the specified text file.

Error messages are written to a text file, while general information remains in the command window
Figure 11-3. Redirecting epubcheck errors to a text file

Options

This section quickly reviews the different ways you can call epubcheck to validate EPUBs.

Validating EPUB archives

The typical use for epubcheck is to validate a completed EPUB archive. To do so, simply include the path to your EPUB after invoking the epubcheck .jar file:

$ java -jar epubcheck.jar c:/books/mybook.epub

Make sure there are no spaces in the directory path to your EPUB or in the filename itself. If there are, you must enclose the entire path in quotes:

$ java -jar epubcheck.jar "c:/Users/matt/My Documents/EPUBs/My Book.epub"

or URI-escape the spaces as %20:

$ java -jar epubcheck.jar c:/Users/matt/My%20Documents/EPUBs/My%20Book.epub

If you forget to do this, the following cryptic error is generated:

java.lang.RuntimeException: For files other than epubs, mode must be
specified!

epubcheck will interpret the path as three separate arguments because of the spaces: c:/Users/matt/My, Documents/EPUBs/My, and Book.epub. Because the first part of the path does not appear to be an EPUB, since it has no extension, epubcheck will report that error and stop processing. The next couple of sections demonstrate what the mode argument does.
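If you call epubcheck from a script instead of typing the command by hand, one way to sidestep the quoting problem is to pass the arguments as a list, so that each list item reaches Java as exactly one argument, spaces included. The following is a minimal sketch in Python; the helper name build_command and the paths are illustrative assumptions, not part of epubcheck.

```python
def build_command(jar_path, target, extra_args=()):
    """Assemble the argument list for a `java -jar` call to epubcheck.

    Each list item is handed to the program as a single argument, so a
    path containing spaces needs no quotes or %20 escapes.
    """
    return ["java", "-jar", str(jar_path), str(target), *extra_args]


cmd = build_command(
    "c:/epubcheck/epubcheck.jar",
    "c:/Users/matt/My Documents/EPUBs/My Book.epub",
)
print(cmd)

# To actually run the check (requires Java and epubcheck to be installed):
#   import subprocess
#   result = subprocess.run(cmd, capture_output=True, text=True)
#   print(result.stderr)  # errors and warnings are written to standard error
```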

Validating unpacked EPUBs

Although most people reach the validation stage only at the very end of a project, when they have an archive file for distribution, it’s not the only workflow that epubcheck can handle. Being able to work on the unzipped files is extremely helpful, and if you have a folder containing the full structure of your EPUB (mimetype file, META-INF directory and content), you can run epubcheck on it using the mode argument as follows:

$ java -jar epubcheck.jar c:/path/to/book -mode exp

The exp value is short for expanded, which doesn’t mean that epubcheck will run more tests, just that the input is an unpacked EPUB. This feature saves you from having to zip up your content each time you fix an error in order to see whether your publication will successfully validate.

A related, and largely unknown but extremely useful, feature of epubcheck is the ability to generate an EPUB archive after successful validation of an unpacked directory. If epubcheck returns a successful report (no errors, only warnings), you can request that it also zip up the directory contents by adding a save argument to the command:

$ java -jar epubcheck.jar c:/path/to/book -mode exp -save

If all goes well, you’ll find a finished .epub file in the directory where you ran the command. epubcheck will use the folder name containing your publication for the finished file.

Note that if you get the following error message, it means that you’re working in a directory where you can’t write the finished archive file:

java.lang.NullPointerException
   at com.adobe.epubcheck.util.Archive.createArchive(Archive.java:102)
   at com.adobe.epubcheck.tool.Checker.run(Checker.java:188)
   at com.adobe.epubcheck.tool.Checker.main(Checker.java:177)

The Windows command shell initializes by default in the write-protected Windows\system32 folder, for example. If you change the current directory to one where you have write permissions, the process will run smoothly. Linux and Macs typically start in the user’s home directory, so this error should be less common, but if you can’t find the file after epubcheck builds it, always check from the directory in which you ran the command.

Validating EPUB component files

You also have the option to validate individual component files using epubcheck (e.g., to validate content before going through the process of zipping your content up into a distribution archive).

To invoke epubcheck on individual files, you need to add the following two arguments to the command line:

mode

The type of file that is being validated. The value must be one of the following:

  • mo: Media overlays
  • nav: Navigation document
  • opf: Package document
  • svg: SVG content document
  • xhtml: XHTML content document

version

The version of EPUB that the file conforms to. The value can be either 2.0 or 3.0.

To validate a navigation document, for example, you’d invoke the following command:

$ java -jar epubcheck.jar nav.xhtml -mode nav -version 3.0

Although you can use any mode for EPUB 3 validation, only opf, svg and xhtml can be used to validate EPUB 2 content.
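If you script component validation, it can help to reject unsupported mode and version combinations before Java is ever launched. The sketch below encodes the rule just described in a lookup table; the names SUPPORTED and component_args are hypothetical, and the table reflects only the modes listed in this article, so it may not match other epubcheck releases.

```python
# Mode/version pairs as described in this article (assumed, not read from epubcheck).
SUPPORTED = {
    "opf": {"2.0", "3.0"},
    "svg": {"2.0", "3.0"},
    "xhtml": {"2.0", "3.0"},
    "nav": {"3.0"},  # EPUB 3 only
    "mo": {"3.0"},   # EPUB 3 only
}


def component_args(mode, version="3.0"):
    """Return the extra command-line arguments for validating a single file."""
    if mode not in SUPPORTED:
        raise ValueError("unknown mode: %r" % mode)
    if version not in SUPPORTED[mode]:
        raise ValueError("mode %r cannot be used with EPUB %s" % (mode, version))
    return ["-mode", mode, "-version", version]


print(component_args("nav"))
```

The returned list can be appended after the file path in the java -jar epubcheck.jar call.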

Assessment reports

A new experimental option has been included in the latest version of epubcheck: the ability to generate an assessment report. These reports are XML files that not only contain the errors and warnings generated by epubcheck, but also provide various metadata about the EPUB, such as the Dublin Core metadata properties that have been set, the language of the publication, and what properties are known about its content (e.g., that it contains audio, video, MathML, script, etc.).

To generate a report, you must use the -out argument followed by the file to write the assessment to:

$ java -jar epubcheck.jar c:/path/to/book -mode exp -out c:/reports/book.xml

At the time of this writing, the report format was not documented on the epubcheck site, but it is described as an extension of the documentMD format. Each report contains a root doc element, which always contains a child document element. This element lists the extracted information:

  • The documentInformation element lists the filename (fileName) of the EPUB followed by all Dublin Core properties found (each listed in an element corresponding to its local name).
  • The formatDesignation element lists the EPUB mime type and version number (formatName and formatVersion, respectively).
  • The assessmentInformation element indicates whether the validation run was successful or not (outcome). If warnings or errors are reported, each message will be included in an outcomeDetailNote element (the type of message is not identified in the markup, but can be determined by the presence of WARNING or ERROR at the start of the element’s text).
  • The CharacterCount element provides the total character count of all text data.
  • The Language element provides the language of the publication as set in the package document.
  • Zero or more Font elements list all embedded fonts.
  • Zero or more Reference elements list all the external links and references.
  • Zero or more Features elements list all the unique properties of the content, as defined in the properties attributes on manifest entries.

Here’s an example of a condensed assessment report:

<doc>
 <document creationDateTime="2012-03-01T20:55:42-05:00">
  <documentInformation>
   <fileName>accessible_epub_3-20121006.epub</fileName>
   <identifier>urn:isbn:9781449328030</identifier>
   <title>Accessible EPUB 3</title>
   <creator>Matt Garrish</creator>
  </documentInformation>
  <formatDesignation>
   <formatName>application/epub+zip</formatName>
   <formatVersion>3.0</formatVersion>
  </formatDesignation>
  <assessmentInformation agentName="epubcheck" agentVersion="3.0-RC-1">
   <outcome>Not valid</outcome>
   <outcomeDetailNote>ERROR: FreeSerif.otf: resource missing</outcomeDetailNote>
  </assessmentInformation>
  <CharacterCount>208463</CharacterCount>
  <Language>en</Language>
  <Font FontName="Free Serif" isEmbeded="true" />
  <Reference>http://shop.oreilly.com/product/0636920025283.do</Reference>
  <Features>hasScript</Features>
 </document>
</doc>

Although these reports are primarily designed for automated workflows, they provide an interesting peek into your EPUBs.

That’s as deep as we’ll go into this feature, though, because it’s still an early experiment and may have changed by the time you read this article. The epubcheck site should be updated to include more information as the report format is formalized, so you can check there for changes.
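To give a sense of how such a report might be consumed in an automated workflow, here is a minimal Python sketch that pulls the outcome and the messages out of a report shaped like the condensed example above. It assumes the element names appear exactly as in that example and without XML namespaces; real reports may well be namespaced or structured differently, in which case the lookups would need adjusting. The function name summarize_report is made up for illustration.

```python
import xml.etree.ElementTree as ET

# A trimmed copy of the condensed report shown in this article.
SAMPLE = """<doc>
 <document creationDateTime="2012-03-01T20:55:42-05:00">
  <assessmentInformation agentName="epubcheck" agentVersion="3.0-RC-1">
   <outcome>Not valid</outcome>
   <outcomeDetailNote>ERROR: FreeSerif.otf: resource missing</outcomeDetailNote>
  </assessmentInformation>
 </document>
</doc>"""


def summarize_report(xml_text):
    """Return the outcome plus the error and warning messages in a report."""
    root = ET.fromstring(xml_text)
    info = root.find("document/assessmentInformation")
    notes = [note.text or "" for note in info.findall("outcomeDetailNote")]
    return {
        "outcome": info.findtext("outcome"),
        # The message type is only identifiable from the start of the text.
        "errors": [n for n in notes if n.startswith("ERROR")],
        "warnings": [n for n in notes if n.startswith("WARNING")],
    }


print(summarize_report(SAMPLE))
```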

Help

If you’re ever in doubt about how to call epubcheck or want to verify whether features are still supported or new ones have been added, you can request a help listing from the program. Simply add the help argument after calling the .jar file:

$ java -jar epubcheck.jar -help

You should get information about the program and a listing of options similar to the following:

Epubcheck Version 3.0-RC-1

When running this tool, the first argument should be the name (with the path) of
the file to check.

If checking a non-epub file, the epub version of the file must be specified
using -v and the type of the file using -mode.

The default version is: 3.0.

Modes and versions supported:
-mode opf -v 2.0
-mode opf -v 3.0
-mode xhtml -v 2.0
-mode xhtml -v 3.0
-mode svg -v 2.0
-mode svg -v 3.0
-mode nav -v 3.0
-mode mo  -v 3.0 // For Media Overlays validation
-mode exp        // For expanded EPUB archives

This tool also accepts the following flags:
-save        = saves the epub created from the expanded epub
-out <file>  = ouput an assessment XML document in file (experimental)
-? or -help  = displays this help message

Reading Errors

Now that you have a grasp on how to invoke epubcheck to run a validation report, the next challenge is reading the error reports that come back from it. Later sections of this article will get into much more detail about what the errors themselves indicate, but this section looks at how to make sense of all the information that gets reported to simplify tracking down and correcting errors.

A typical message from epubcheck follows this basic pattern:

[ERROR|WARNING]: [file](line,offset): Message

The following is a sample error message that results if a closing quote character is omitted from a class attribute, for example:

ERROR: c:/epub/accessible_epub_3.epub/EPUB/ch01.xhtml(10,44): The value of
attribute "class" associated with an element type "section" must not contain
the '<' character.

Here you can see that this is an error (must be fixed to pass validation), that it is in the file /EPUB/ch01.xhtml inside the EPUB archive c:/epub/accessible_epub_3.epub, and that the error has been found 44 characters into line 10. Even if you don’t have an XML-aware editor, jumping to the exact line and character offset should be easy to do in any text editor.

You may not always get file, line, and offset information, depending on the problem. When epubcheck verifies that all items listed in the manifest are in the archive, it does not maintain information about the original package document XML. Consequently, if you have an entry for a nonexistent file, you’ll get an error like this:

ERROR: c:/epub/accessible_epub_3.epub: OPS/XHTML file EPUB/pr01a.xhtml is missing

This is when being able to interpret where errors are coming from and what they mean is going to be critical. You need to know that all your files are listed in the package document manifest to even begin figuring this kind of message out.
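If you have redirected the messages to a text file, the basic pattern is regular enough to split apart with a short script, which makes it easier to sort or filter a long report. The following Python sketch handles both shapes shown above (with and without line and offset information). It assumes each message sits on a single line, which may not hold for every epubcheck version; the names Message and parse_message are illustrative only.

```python
import re
from typing import NamedTuple, Optional


class Message(NamedTuple):
    severity: str
    file: str
    line: Optional[int]
    offset: Optional[int]
    text: str


# severity, then the file path, then an optional "(line,offset)", then the message
PATTERN = re.compile(r"^(ERROR|WARNING): (.*?)(?:\((\d+),(\d+)\))?: (.*)$")


def parse_message(raw):
    """Split one epubcheck message into its parts, or return None if it doesn't fit."""
    match = PATTERN.match(raw.strip())
    if match is None:
        return None
    severity, file, line, offset, text = match.groups()
    return Message(
        severity,
        file,
        int(line) if line else None,
        int(offset) if offset else None,
        text,
    )


located = parse_message(
    'ERROR: c:/epub/accessible_epub_3.epub/EPUB/ch01.xhtml(10,44): The value of '
    'attribute "class" associated with an element type "section" must not '
    "contain the '<' character."
)
unlocated = parse_message(
    "ERROR: c:/epub/accessible_epub_3.epub: OPS/XHTML file EPUB/pr01a.xhtml is missing"
)
print(located.file, located.line, located.offset)
print(unlocated.file, unlocated.line)
```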

You may also find that the line and character offsets seem misleading. If you were to forget a closing aside tag early on in your file, it may not get reported as an error until the containing section gets closed:

<section>
   ...
   <aside>
      <p>...</p>
      <-- forgot a closing tag here on line 22
   <p>...</p>
    ...
   <p>...</p>
</section> <-- but error reported here on line 196

The error message resulting from this tagging might be as follows:

ERROR: c:/epub/accessible_epub_3.epub/EPUB/ch01.xhtml(196,3): The element type
"aside" must be terminated by the matching end-tag "</aside>".

People new to validation typically want to know why the error location isn’t reported on the opening tag to simplify fixing the problem, but you have to bear in mind that there is no problem with the opening tag. The problem is with the closing tag, or lack of one before the section closes, and that doesn’t occur until line 196 in this case. The validator does not backtrack to the opening tag to report where the aside opened, because validators simply report what is wrong. For all the validator knows, you simply forgot the end tag at that point.

Part of validating is doing the sleuthing to find where these kinds of problems originate. Just hope that there aren’t a lot of asides in your file, because you’ll have to check each one in turn to find the broken one! An aside can contain another aside, like a div can contain a div, so the error could take a bit of time to track down.
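Some of that sleuthing can be roughed out with a script. The sketch below scans a file for start and end tags of one element name and reports the line numbers of openings that are never matched. It is a heuristic under several assumptions: each tag sits on a single line, and comments, CDATA sections, and other elements are ignored. With nested elements it can only point to candidates, not to the one you actually forgot to close. The function name unclosed_openings is made up for illustration.

```python
import re


def unclosed_openings(markup, name):
    """Return line numbers of <name> start tags left without a matching end tag."""
    tag = re.compile(r"<(/?)%s\b[^>]*?(/?)>" % re.escape(name))
    open_lines = []  # stack of line numbers for start tags still waiting to be closed
    for number, line in enumerate(markup.splitlines(), start=1):
        for match in tag.finditer(line):
            closing, self_closing = match.groups()
            if self_closing:
                continue  # <name/> opens and closes in one tag
            if closing:
                if open_lines:
                    open_lines.pop()  # pair the end tag with the latest opening
            else:
                open_lines.append(number)
    return open_lines


sample = "\n".join([
    "<section>",
    "   <aside>",
    "      <p>...</p>",
    "   <p>...</p>",
    "</section>",
])
print(unclosed_openings(sample, "aside"))
```

Run against the example above, the script points back to the line where the aside was opened, which is where you would start looking for the missing end tag.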

Beyond the Command Line

Running epubcheck from the command line is not the only option available. Integrating the library more seamlessly into internal workflows is an option, of course, but requires developer help. For those who don’t have those kinds of resources available, this section reviews a few other options that can simplify the validation process.

Web Validation

The IDPF currently maintains a web-based version of epubcheck at http://validator.idpf.org/. To run the validator, you simply select your EPUB file and click the Validate button on the page, as shown in Figure 11-4.

 Upload form includes a button to select the EPUB and another to begin validation
Figure 11-4. IDPF EPUB Validator service

The current version of epubcheck also powers this web service, but instead of command shell error output, you receive messages in the more human-readable table format shown in Figure 11-5.

Result table includes columns for type of error, location of the file, line and character offsets, and the error message
Figure 11-5. Web validation results

The web results make it simpler to identify the error type, file, line, and character offset of the reported problem, but this information is the same as is provided in the command shell results shown previously.

Unfortunately, the web interface is not for use by anyone doing commercial validation, and it also has some limitations that work against it even for users who meet the use criteria. For one, you are capped at a maximum file size of 10 MB. While this is not going to be problematic for simple text works, any publication with images, audio and video content, or embedded fonts will quickly go over the cap. It can also be a nuisance, and a waste of bandwidth, to upload your EPUB to the IDPF server over and over in order to have it validated. Depending on how big your EPUB is, it’s not the fastest or most effective use of time. Learning to use epubcheck from the command line is a better long-term strategy.

The source for the web service is also not available for general download as of this writing, but it could be made available at a later date once the validator moves out of its beta phase. Installing the service locally, whether on an individual PC running a web server or in a corporate environment, would greatly simplify the validation process for anyone wanting to avoid command line and/or commercial options.

It is possible to use the .jar file to create your own web service, but you would have to add a layer to it to parse and format the results in order to provide equivalent table markup.
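As a rough idea of what that formatting layer might involve, the following Python sketch turns already-parsed messages into a simple HTML table. Everything here is an assumption for illustration: the column headings, the function name to_html_table, and the row layout (type, file, line, offset, message) are not taken from the IDPF service, whose markup is not documented in this article.

```python
from html import escape


def to_html_table(rows):
    """Render (type, file, line, offset, message) tuples as an HTML table."""
    header = (
        "<tr><th>Type</th><th>File</th><th>Line</th>"
        "<th>Position</th><th>Message</th></tr>"
    )
    body = []
    for row in rows:
        # Escape every value so markup quoted in a message cannot break the page.
        cells = "".join(
            "<td>%s</td>" % escape("" if value is None else str(value))
            for value in row
        )
        body.append("<tr>%s</tr>" % cells)
    return "<table>\n%s\n%s\n</table>" % (header, "\n".join(body))


table = to_html_table([
    ("ERROR", "EPUB/ch01.xhtml", 10, 44, "must not contain the '<' character."),
    ("ERROR", "accessible_epub_3.epub", None, None, "EPUB/pr01a.xhtml is missing"),
])
print(table)
```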

Graphical Interface

A much-desired feature for epubcheck has been to add a graphical interface to simplify the whole process we’ve just gone through of configuring programs and paths and selecting files. Unfortunately, at this time, it remains a much-desired feature. The developers are aware of the need, so stay tuned.

Commercial Options

Although the process to manually call epubcheck can seem tedious, especially if you aren’t a developer who is regularly in the command shell, there are programs that natively integrate epubcheck and/or can be configured to run external tools like epubcheck from within them.

Prime among these is oXygen Editor (shown in Figure 11-6), which has native support for EPUB 2 and 3 markup editing. oXygen allows you to drag and drop your EPUB archive directly into the program, enabling editing of the content files without having to unzip. It also includes built-in support for the latest epubcheck validator, so all you have to do is click a button to validate your archive. It is also nondestructive, in that it will not modify your source markup when saving and validating.

Application has panes for browsing the EPUB archive, editing content documents, and reporting errors.
Figure 11-6. oXygen editing interface

Perhaps the most useful feature that oXygen provides is the ability to jump directly to the listed error. When you double-click an error in the result pane at the bottom of the program, the file is automatically loaded (if not already open) and the editor jumps to the corresponding line. oXygen also shows validation errors in red on the side of the text editor, enabling quick location and correction.

Understanding Errors

It’s somewhat disheartening to discover that the program you used to author your EPUB has generated invalid content, but it’s not atypical. There often aren’t straight 1:1 mappings when dealing with export routines that go from an internal layout format to a distribution format like EPUB. Adding to that, developers often try to help these processes through heuristic and natural language parsing tricks. The resulting content may appear to be okay in a reading system, but tag soup is not just an offense against the theoretical purity of specifications; it causes real-world problems for anyone using the markup to navigate.

This article can’t possibly be a reference to every single error that you might encounter in every technology that EPUB incorporates, but this section will walk through the main validation stages and look at what can go wrong. Hopefully, with a sense of what epubcheck is doing, even if you can’t find your particular problem here, you’ll find some hints to where you should be looking.

The other consideration is that error messages change over time, with the hope of making them easier to understand. The obvious result is that the error messages you find in the following sections may not exactly match what epubcheck reports depending on the version you’re using.

If there is one best practice to give when it comes to understanding errors (one learned from many years validating markup data), it’s to always start with the first error reported. Validators don’t generally stop at the first problem they find, and the result can be many, many spurious errors that all stem from the first problem (e.g., forgetting to include a closing tag can cause every following element to be reported as invalid).

A second, closely related tip is to validate often. The way that errors cascade can result in some odd issues appearing in your report, so never assume you can always pick out which ones are related to an earlier problem and which ones are unique. That’s how you spend time searching for improbable solutions to problems that didn’t actually exist. Running validation reports is quick and free, so when in doubt, run the report again.

And finally, note that validators are not infallible. You might find errors being reported that shouldn’t be, but there’s a difference between an incorrect check of a specification requirement and not being sure what an error means. If you are unsure whether it is the validator that is wrong or your understanding of the message, seek assistance. The IDPF forums are a friendly venue where you can ask for help deciphering error reports.

Common XML Errors

As EPUB is a predominantly XML-based format, there are a number of common errors that get reported across document types. If a document is not well formed, or does not meet schema requirements, the error message does not change, only the element and attribute names. Rather than list the same issues over and over, this section will tackle these problems once.

Document errors

Each XML file must have a single root element (e.g., for XHTML documents, this is the html element). epubcheck will generate the following errors if it finds XML files that aren’t conformant to this requirement:

Content is not allowed in prolog
This error occurs when you have text content before the root element. Only the XML declaration, processing instructions, and doctype declarations can precede the root element. It may also be a sign that you’ve accidentally specified a text file with an XML media type in the manifest.
Content is not allowed in trailing section
This is the opposite error, where you have text or markup content after the closing root tag.
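Both problems are basic well-formedness failures, so any XML parser will trip over them, not just epubcheck. As a minimal sketch, the Python snippet below feeds three small documents to the standard library parser. Note that Python reports these problems in its own words, so the messages will not match the epubcheck text above; the function name well_formed_problem is illustrative.

```python
import xml.etree.ElementTree as ET


def well_formed_problem(xml_text):
    """Return the parser's complaint as a string, or None if the text parses."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError as err:
        return str(err)
    return None


samples = {
    "valid": "<html><body/></html>",
    "text before root": "stray text<html><body/></html>",
    "text after root": "<html><body/></html>stray text",
}
for label, text in samples.items():
    print(label, "->", well_formed_problem(text))
```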

Element errors

If your markup is not conformant to the schema for a given document type, you’ll receive the following errors:

Element X not allowed here; expected the element end-tag, text or element A,B,C

This is probably the most common element error you’ll encounter, and A,B,C usually ends up being a wildly long list of alternative elements. This error can occur either when you’ve used an element where it’s not allowed (e.g., putting a div inside of a p in a content document), or have accidentally forgotten to close an element (e.g., omit a closing </p> tag and every following sibling block element will register as an error).

This error can also indicate that you’ve inserted an element out of order. The figcaption must be the first or last element in a figure, for example. If you place it anywhere else, you’ll get this error on the elements that follow it. The same applies to table markup. The order of the major divisions in the package document is also enforced (metadata, manifest, spine, guide, bindings).

This error can also occur if you forget a namespace. MathML and SVG in HTML5 do not require namespaces, for example, so if you forget to declare one, or copy an HTML5 example from the Web, the element might be reported as invalid.

And finally, make sure that you’re using lowercase element names. XHTML is case sensitive, so you cannot use element names like H1 and SECTION. If you do, you will also receive this error, stating that they are not allowed. All element names in the package document are also lowercase. It would be nicer if a distinction could be made between elements that aren’t defined and elements that aren’t allowed, as used to be the case, but that’s a limitation of the RelaxNG schemas you just have to work around.

Element X incomplete; expected A,B,C
This error often occurs in the package metadata if you omit one of the three required Dublin Core elements. XHTML content documents don’t have a lot of requirements, but they do exist (e.g., the ruby element requires at least one child rt). You’re more likely to encounter this error when you add MathML or SVG to your content documents, as there tend to be more dependencies.
The prefix X for element Y is not bound
You’ve used a prefix on an element without declaring it (e.g., dc: on the package metadata elements without declaring xmlns:dc="http://purl.org/dc/elements/1.1/"). Namespace declarations are often included on the root element but can be scoped to the most relevant element (e.g., the Dublin Core namespace is typically declared on the metadata element, not the root package element, because Dublin Core elements are not used outside of the metadata section). This error also occurs in content documents when MathML and SVG are embedded without a namespace declared.
Element X missing required attribute Y
The specified attribute cannot be omitted. An example is the unique-identifier attribute on the root package element of the package document. In content documents, forgetting href and src attributes is often the cause of this error.
Element type X must be followed by either attribute specifications, ">" or "/>"
This error occurs either when you’ve omitted a closing quote character on an attribute or have forgotten the closing angle bracket on the element.
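
The unbound prefix case is easy to reproduce outside epubcheck. The following is a minimal sketch, assuming Python's standard xml.etree.ElementTree parser rather than the parser epubcheck itself uses; the two metadata fragments and the parse_error helper are made up for illustration, and the wording of the message will differ from epubcheck's.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragments: the first uses the dc: prefix without declaring it.
UNBOUND = "<metadata><dc:title>Example</dc:title></metadata>"
BOUND = (
    '<metadata xmlns:dc="http://purl.org/dc/elements/1.1/">'
    "<dc:title>Example</dc:title></metadata>"
)


def parse_error(xml_text):
    """Return the parser's error message, or None if the text parses cleanly."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError as err:
        return str(err)
    return None


print(parse_error(UNBOUND))  # a message that mentions an unbound prefix
print(parse_error(BOUND))    # None
```

Declaring the namespace on the metadata element, as in the second fragment, is enough to bring every dc: element inside it into scope.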

Attribute errors

Likewise, if you use attributes improperly, schema validation will return the following errors:

Attribute X not allowed here; expected attribute A,B,C
One error type that attributes share with elements is being used in the wrong place. It’s no longer valid in HTML5 to use a name attribute on an a element, for example. Attributes are also case sensitive, which can cause this error.
The prefix X for attribute Y associated with an element type Z is not bound
Forgetting to declare namespaces is another shared issue. If you receive this message, you don’t have an in-scope namespace declaration. This problem typically occurs when using the epub:type attribute without declaring the namespace on the root html element, for example.
Duplicate ID X
In this case, you have two or more id attributes in the same file with the same value. You will need to manually inspect the attributes to determine which one needs to be changed.
Value of attribute "id" is invalid; must be an XML name without colons
This error most often occurs when an id attribute value is numeric (e.g., id="1"), begins with a number, or contains invalid characters. Although HTML5 has relaxed the restriction that all ids start with an alphabetic character, other XML formats allowed in EPUB 3 must still conform to this naming.
The value of attribute X associated with an element type Y must not contain the '<' character
This error may indicate that you’ve included a left angle bracket character in an attribute, but more often is an indication that you missed a closing quote character on an attribute (i.e., the validator sees the next tag as part of the attribute value).
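
If you need to track down duplicate or invalid id values by hand, a short script can list them for you. This is only a sketch: the id_problems helper and the sample markup are hypothetical, the document must already be well formed for it to parse, and the pattern is an ASCII-only approximation of an XML name without colons (the real definition also allows many non-ASCII characters).

```python
import re
import xml.etree.ElementTree as ET
from collections import Counter

# ASCII-only approximation of an XML name without colons.
NCNAME = re.compile(r"[A-Za-z_][A-Za-z0-9._-]*")


def id_problems(xml_text):
    """Return (duplicate ids, invalid ids) found in a well-formed document."""
    root = ET.fromstring(xml_text)
    ids = [el.get("id") for el in root.iter() if el.get("id") is not None]
    duplicates = sorted(v for v, n in Counter(ids).items() if n > 1)
    invalid = sorted({v for v in ids if not NCNAME.fullmatch(v)})
    return duplicates, invalid


sample = '<body><p id="intro"/><p id="intro"/><p id="1"/><p id="ch:2"/></body>'
print(id_problems(sample))  # (['intro'], ['1', 'ch:2'])
```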

Character encoding

All XML formats defined by the EPUB specification, including XHTML content documents, must be encoded as UTF-8 or UTF-16. The following errors may occur if your documents do not conform:

Only UTF-8 and UTF-16 encodings are allowed, detected X
Verify that the file is actually encoded as UTF-8 or UTF-16 (don’t trust the XML declaration).
Malformed byte sequence: X. Check encoding
This error typically arises when content in one encoding is pasted into a document encoded in another, but can also occur if you transcode your content from one character set to another. It indicates that there is a sequence of bytes that is not valid in the Unicode encoding being used, so it cannot be resolved to a character. When you view the file, you may not see anything at the location, as the malformed byte may not show as character data. epubcheck does not provide more detailed information, so you’ll typically need to open the invalid file in an XML editor that can report the exact location.
Any Publication Resource that is an XML-Based Media Type must be a conformant XML 1.0 Document. XML Version retrieved: #
You cannot use XML version 1.1 for XML content. If you’ve included an XML declaration at the top of your file, make sure that the version pseudoattribute is set to 1.0 (<?xml version="1.0"?>).

Note that CSS style sheets must also be encoded as UTF-8 or UTF-16. If you create your CSS files as plain ASCII text files, you should not receive an error. The ASCII character set maps to the same range of characters in UTF-8, so all ASCII text files are valid UTF-8 files.
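
If you don’t have an XML editor handy, the location of a malformed byte sequence can be found with a few lines of script. The sketch below assumes the file is meant to be UTF-8; find_bad_utf8 is a hypothetical helper, and the sample bytes simulate a Latin-1 é pasted into a UTF-8 document.

```python
def find_bad_utf8(data):
    """Return (line, column, bad bytes) for the first invalid UTF-8 sequence,
    or None if the data decodes cleanly. Line and column are 1-based."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError as err:
        line = data.count(b"\n", 0, err.start) + 1
        line_start = data.rfind(b"\n", 0, err.start) + 1
        # Everything before the first error is valid, so it can be decoded
        # to count characters rather than bytes.
        column = len(data[line_start:err.start].decode("utf-8")) + 1
        return line, column, data[err.start:err.end]
    return None


sample = b"<p>ok</p>\n<p>caf\xe9</p>"
print(find_bad_utf8(sample))  # (2, 7, b'\xe9')
```

To check a real file, read it in binary mode (open(path, "rb")) and pass the bytes to the function.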

Linking errors

One of the handier features of epubcheck is that it will verify all internal links to see if they can be resolved, and report problems if not:

'X': referenced resource missing in the package
You’ve attempted to link to the file X, but a matching resource could not be found in the container. Check that the resource exists and that there is an entry for it in the package document manifest.
'X': fragment identifier is not defined in 'Y'
The file Y could be located, but there isn’t an element inside it with the id X. Typos and renamed IDs are the most common cause.

Container Errors

Container errors can be some of the most perplexing to solve, because they often arise as a result of the way the content has been zipped up. In order to ensure that your EPUB can be opened and the content discovered, you need to ensure that there are no problems with the packaging. To that end, epubcheck verifies that your EPUB meets all the following conditions:

File name contains characters disallowed in OCF file names: X
See section 2.4 of the OCF specification for a list of characters that must not be used in your EPUB filename or any files in it.
Filename contains spaces. Consider changing filename such that URI escaping is not necessary
This message is actually just a warning. It is generated because it’s possible that a poorly designed reading system might break if there are spaces in your file names (e.g., failing to encode the spaces properly as %20), not because problems are expected.
File name contains non-ascii characters: X. Consider changing filename
This message is also a warning, similar to the preceding one. Although modern operating systems have no issue with non-ASCII characters in filenames, processing tools sometimes do. If you are targeting older EPUB 2 reading systems, this may be a concern, but it should not affect your decision to use these characters in EPUB 3.
Filename is not allowed to end with '.'
Ending filenames with a dot is a little more serious, so this is an error. Some operating systems do not handle filenames so named, which can break rendering.
Corrupted ZIP header
Occurs if the container does not begin with the string PK (i.e., it is not a valid ZIP file).
Cannot read header
Some form of file corruption has occurred that is preventing the ZIP file from being read.
Length of first filename in archive must be 8, but was #
If the first filename found in the archive is not eight characters, it cannot be the required mimetype file. If you manually zip your archive, you must add the mimetype before you add any other files.
Mimetype entry missing or not the first in archive
This error is rare but can occur if the first file is eight characters long (to get past the previous check) but is not the mimetype file. Again, check how the archive has been zipped.
Extra field length for first filename must be 0, but was #
Indicates that there is character data between the mimetype filename and its content (the extra fields are being used). This error can occur if the program you’ve used to zip the container adds additional metadata.
Mimetype contains wrong type (application/epub+zip) expected
Ensure that the media type has been typed correctly.
Mimetype file should contain only the string "application/epub+zip"
Ensure that there are no extra spaces or linebreaks in the file.
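
Most of the mimetype checks come down to the layout of the first entry in the ZIP file. The following sketch, using Python's standard zipfile module, writes the mimetype entry first and uncompressed, then inspects the first local file header in roughly the way the messages above describe. The write_epub and mimetype_problems names, and the wording of the problems they report, are illustrative assumptions and not epubcheck's own code or messages. The check that the entry is stored uncompressed reflects the OCF requirement that the mimetype file not be compressed.

```python
import io
import struct
import zipfile

MIMETYPE = b"application/epub+zip"


def write_epub(target, files):
    """Write a container to target; files maps archive paths to bytes."""
    with zipfile.ZipFile(target, "w", zipfile.ZIP_DEFLATED) as zf:
        # The mimetype entry goes in first and is stored, not compressed.
        zf.writestr("mimetype", MIMETYPE, compress_type=zipfile.ZIP_STORED)
        for name, data in files.items():
            zf.writestr(name, data)


def mimetype_problems(data):
    """Inspect the first local file header of a ZIP archive given as bytes."""
    if len(data) < 30 or data[:2] != b"PK":
        return ["not a readable ZIP file"]
    # Local file header: signature, version, flags, compression, time, date,
    # CRC, compressed size, uncompressed size, name length, extra length.
    fields = struct.unpack("<4sHHHHHIIIHH", data[:30])
    compression, name_len, extra_len = fields[3], fields[9], fields[10]
    problems = []
    if data[30:30 + name_len] != b"mimetype":
        problems.append("first entry is not the mimetype file")
    if extra_len != 0:
        problems.append("extra field is in use")
    start = 30 + name_len + extra_len
    if compression != 0 or data[start:start + len(MIMETYPE)] != MIMETYPE:
        problems.append("mimetype content is compressed or wrong")
    return problems


buf = io.BytesIO()
write_epub(buf, {"META-INF/container.xml": b"<container/>"})
print(mimetype_problems(buf.getvalue()))  # an empty list means no problems found
```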

epubcheck will also verify that the package document can be located by a reading system. The following errors indicate problems with this discovery process:

Required META-INF/container.xml resource is missing
Somewhat self-explanatory. The container.xml file is a required file in the META-INF directory, because it identifies the path to the package file.
No rootfiles with media type 'application/oebps-package+xml'
The container.xml file was found, but it does not contain an entry for the package file (check that the media-type attribute value on the rootfile element correctly matches the one in the error).
Entry X not found in zip file
The package document could not be found at the location specified by the rootfile element.
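
The discovery process a reading system follows can be sketched in a few lines, which may help when working out which of these three errors applies. This is an illustration only, using Python's standard library: find_package is a hypothetical helper, its error strings are paraphrases, and the sample archive is built in memory with placeholder content.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

CONTAINER_NS = "urn:oasis:names:tc:opendocument:xmlns:container"
OPF_TYPE = "application/oebps-package+xml"


def find_package(zf):
    """Return the archive path of the package document, or raise ValueError."""
    try:
        raw = zf.read("META-INF/container.xml")
    except KeyError:
        raise ValueError("META-INF/container.xml is missing") from None
    root = ET.fromstring(raw)
    for rootfile in root.iter("{%s}rootfile" % CONTAINER_NS):
        if rootfile.get("media-type") == OPF_TYPE:
            path = rootfile.get("full-path")
            if path not in zf.namelist():
                raise ValueError("entry %s not found in zip file" % path)
            return path
    raise ValueError("no rootfile with media type " + OPF_TYPE)


container = (
    '<container version="1.0" xmlns="%s"><rootfiles>'
    '<rootfile full-path="EPUB/package.opf" media-type="%s"/>'
    "</rootfiles></container>" % (CONTAINER_NS, OPF_TYPE)
)
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("META-INF/container.xml", container)
    zf.writestr("EPUB/package.opf", "<package/>")
with zipfile.ZipFile(buf) as zf:
    print(find_package(zf))  # EPUB/package.opf
```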

Package Validation

If you think of validation as a progression through Dante’s circles of hell, then once your content is packaged properly, the next circle you’ll find yourself in is checking whether your package document has been properly constructed.

The following metadata problems will be reported:

unique-identifier attribute in package element must reference an existing identifier element id
The unique-identifier attribute does not point to the id value of a dc:identifier element in the metadata section.
character content of element "X" invalid; must be a string with length at least 1
All metadata must be at least one character in length (whitespace does not count). This error indicates that an empty value was found.
Package dcterms:modified meta element must occur exactly once
The dcterms:modified property indicates the last modification date of the EPUB, and is used to create the publication identifier, so only one can be included in the metadata.
dcterms:modified illegal syntax (expecting: 'CCYY-MM-DDThh:mm:ssZ')
Check the time and date specified in the dcterms:modified property matches the specified format. You cannot use abbreviated dates or omit the timestamp.
@refines missing target id: 'X'
If the refines attribute begins with a hash (#), it must reference the ID of an element in the package document (typically a metadata element or a manifest item). This error occurs when a match cannot be found.
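
The dcterms:modified syntax is strict enough that it is worth generating the value rather than typing it. Here is a minimal sketch of a check and a generator for the CCYY-MM-DDThh:mm:ssZ form; both function names are hypothetical, and the check is an approximation of the pattern rather than epubcheck's own test.

```python
import re
from datetime import datetime, timezone

MODIFIED = re.compile(r"\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}Z")


def is_valid_modified(value):
    """True if value has the form CCYY-MM-DDThh:mm:ssZ and is a real date."""
    if not MODIFIED.fullmatch(value):
        return False
    try:
        datetime.strptime(value, "%Y-%m-%dT%H:%M:%SZ")
    except ValueError:
        return False
    return True


def now_modified():
    """Return the current UTC time in the required form."""
    return datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


print(is_valid_modified("2014-02-24T12:00:00Z"))       # True
print(is_valid_modified("2014-02-24"))                 # False
print(is_valid_modified("2014-02-24T12:00:00+01:00"))  # False
```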

epubcheck will also check for problems with resources that should, or shouldn’t, be listed in the manifest:

X file Y is missing
You have a reference to a missing resource. X identifies the type of file, but is not the media type (OPS/XHTML is used for XHTML content documents, image is used for any image type, etc.).
item (X) exists in the zip file, but is not declared in the manifest file
Warning that you have a file in the container that is not listed in the manifest.
'http://X/Y/Z': remote resource reference not allowed; resource must be placed in the OCF
This error indicates that you’ve invalidly referenced a resource outside the container (e.g., a file on the Web from an object tag). In older versions of epubcheck, this error was also emitted when remote audio/video clips were not listed in the manifest.
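
The first two of these checks amount to comparing two lists: what the manifest declares and what the container holds. The sketch below shows one way to make that comparison, assuming you have already extracted the archive's file names and the manifest href values; manifest_mismatches is a hypothetical helper and the file names are invented. Manifest hrefs are resolved relative to the package document, and remote references are skipped.

```python
import posixpath
from urllib.parse import unquote


def manifest_mismatches(container_names, package_path, manifest_hrefs):
    """Return (declared but missing, present but undeclared) archive paths."""
    base = posixpath.dirname(package_path)
    declared = {
        posixpath.normpath(posixpath.join(base, unquote(href)))
        for href in manifest_hrefs
        if "://" not in href  # remote resources are not in the container
    }
    present = {
        name
        for name in container_names
        if name != "mimetype"
        and name != package_path
        and not name.startswith("META-INF/")
        and not name.endswith("/")  # directory entries
    }
    return sorted(declared - present), sorted(present - declared)


names = [
    "mimetype",
    "META-INF/container.xml",
    "EPUB/package.opf",
    "EPUB/xhtml/ch1.xhtml",
    "EPUB/img/cover.jpg",
    "EPUB/notes.txt",
]
hrefs = ["xhtml/ch1.xhtml", "img/cover.jpg", "xhtml/ch2.xhtml"]
print(manifest_mismatches(names, "EPUB/package.opf", hrefs))
# (['EPUB/xhtml/ch2.xhtml'], ['EPUB/notes.txt'])
```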

epubcheck will also emit the following errors if resources don’t match the information supplied about them:

Item property: X is not defined for: Y
You’ve attached a properties attribute value to a file type to which it doesn’t belong (e.g., mathml on a JPEG).
This file should declare in opf the property: X
This message occurs when a content document contains a feature that hasn’t been declared in the properties attribute (e.g., scripting).
This file should not declare in opf the properties: X
The listed properties cannot be verified and should be removed from the properties attribute for the item.
Exactly one manifest item must declare the 'nav' property (number of 'nav' items: #).
You didn’t specify the navigation document or specified more than one.
Multiple occurrences of the 'cover-image' property (number of 'cover-image' items: #).
Similarly for the cover image.
Object type and the item media-type declared in manifest, do not match
The media type declared in the media-type attribute on the manifest entry does not match the type attribute specified on the object tag in the content file.

epubcheck will also alert you if you haven’t provided a core media type fallback for a foreign resource:

Manifest item element fallback attribute must resolve to another manifest item (given reference was 'X')
The ID referenced in the fallback attribute does not point to another item. Check for a typo and that the fallback hasn’t been removed.
Spine item with non-standard media-type 'X' with no fallback
You’ve referenced a file that is not an XHTML or SVG content document from the spine without providing a fallback to one of those two.
Spine item with non-standard media-type 'X' with fallback to non-spine-allowed media-type
Again, you’ve referenced a foreign resource from the spine, but this time the only fallback found is to another foreign resource.
Circular reference in fallback chain
Each fallback in a fallback chain must be to a unique resource. If one resource in the chain references another earlier in the chain, you end up in an endless loop of incompatible formats.
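
The four fallback errors are all outcomes of the same walk along the chain. As a rough sketch of that logic (not epubcheck's implementation), assume the manifest has been reduced to a dictionary mapping each item id to its media type and fallback id. The fallback_problem function and the short labels it returns are made up for illustration.

```python
# Media types that may appear in the spine without a fallback.
SPINE_TYPES = {"application/xhtml+xml", "image/svg+xml"}


def fallback_problem(manifest, item_id):
    """Walk the fallback chain for a spine item.

    manifest maps id -> (media_type, fallback_id or None).
    Returns None if the chain reaches an XHTML or SVG document,
    otherwise a short label describing the problem.
    """
    seen = []
    current = item_id
    while current is not None:
        if current in seen:
            return "circular"        # chain loops back on itself
        if current not in manifest:
            return "unresolved"      # fallback id matches no manifest item
        seen.append(current)
        media_type, fallback = manifest[current]
        if media_type in SPINE_TYPES:
            return None
        current = fallback
    # The chain ended without reaching a content document.
    return "no-fallback" if len(seen) == 1 else "bad-fallback"


manifest = {
    "pdf": ("application/pdf", "html"),
    "html": ("application/xhtml+xml", None),
    "a": ("application/pdf", "b"),
    "b": ("text/plain", "a"),
    "lonely": ("application/pdf", None),
}
print(fallback_problem(manifest, "pdf"))     # None
print(fallback_problem(manifest, "a"))       # circular
print(fallback_problem(manifest, "lonely"))  # no-fallback
```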

When adding media overlays, the required metadata is also verified:

Media overlay items must be of the 'application/smil+xml' type (given type was 'X')
Media overlays are SMIL files, so you must make sure the correct media type has been given (i.e., not application/xml).
Item media:duration meta element not set (expecting: meta property= 'media:duration' refines='#X')
When attaching a media overlay to a content document, you must add a meta element indicating the total audio duration for that document. This error indicates that this property is missing.
Global media:duration meta element not set
You must include a meta element with no refines attribute containing the cumulative time of all the individual overlays.

And finally, there are a couple of possible errors tested for if you include an NCX for rendering in EPUB 2 reading systems:

spine element toc attribute must reference the NCX manifest item (referenced media type was 'X')
The value of the toc attribute on the spine must be the same as the id attribute on the manifest entry for the NCX file.
spine element toc attribute must be set when an NCX is included in the publication
If you include an NCX, you must add a toc attribute to the spine.

Content Validation

Now, moving to the stage of validating content, providing lists of checks and error messages becomes more difficult. Although the EPUB 3 specification imposes some requirements and restrictions, error messages are more likely to come from the underlying technologies it employs.

Fortunately, many of the issues you’ll run into at the content level are similar in nature. In XHTML and SVG content documents, most errors are related to the invalid use of markup we covered in the generic XML errors section.

The following XHTML-specific errors may be reported by epubcheck.

General document and header errors:

The lang and xml:lang attributes must have the same value
When using both language attributes on the same element, their values must match.
There must not be more than one meta element with a charset attribute per document
A document only has one character encoding, so specifying the value twice is redundant and will possibly conflict.
The sizes attribute must not be specified on link elements that do not have a rel attribute that specifies the icon keyword
The sizes attribute is only allowed to be used to specify the dimensions of an icon referenced by the link element.
For each Document, there must be no more than one time element with a pubdate attribute that does not have an ancestor article element
A document can only have a single publication date. If other time elements contain publication dates, they must each be inside a unique article.
For each article element, there must be no more than one time element child with a pubdate attribute
Another duplication error. Each article can only have a single publication date.

Map element errors:

Duplicate map name 'X'
Two or more map elements have the same name attribute value, but each must be unique.
The id attribute on the map element must have the same value as the name attribute
Just one of those quirky things that must be true.

Form element errors:

A select element whose multiple attribute is not specified must not have more than one descendant option element with its selected attribute set
If you can pick only one option, it doesn’t make sense to specify that two or more are set by default.

Audio/video errors:

The track element label attribute value must not be the empty string
The label is used to announce the track type to the readers, so it cannot be empty.
There must not be more than one track child of a media element with the default attribute specified
As its name suggests, the default attribute is used to indicate which track to use when no reader preference is available. Specifying more than one default defeats the purpose of the attribute.

Referencing errors:

The X attribute must refer to an element in the same document (the ID 'Y' does not exist)
Some elements must reference other elements in the document. The for attribute on a label element, for example, must reference the id of the form element it labels.
The X attribute must refer to elements in the same document (target ID missing)
This error is the same as the last, but occurs when an attribute can reference more than one other element. There are a number of ARIA attributes that can reference multiple elements (aria-describedby, aria-labelledby, aria-controls, etc.). Check that each reference can be resolved.
The X attribute does not refer to an allowed target element (expecting: Y)
The attribute references another element, but it is the wrong kind of element. To use label again, it would be incorrect for it to point to anything but a form element.

The following errors impose additional restrictions on element and attribute usage that could not be enforced through the structural schema validation stage:

The X element must have a Y attribute
This error occurs if the bdo element does not include a dir attribute.
The X element must not appear inside Y elements
This error occurs if you attempt to embed one element inside another where it would make no sense or would cause rendering issues, such as an audio/video tag inside another audio or video, an address inside an address, etc.
The X element must have an ancestor Y element
This error occurs when an element is found outside of its expected ancestor. This error specifically occurs when an area tag is found outside of a map and when an image map is not wrapped inside of an a tag.

epub:type property errors:

Undefined property: X
If a property in the epub:type does not have a prefix, it must be defined in the EPUB Structural Semantics vocabulary.
Undeclared prefix: X
You have used a prefix that has not been declared in the epub:prefix attribute on the root html element.

SSML errors:

The ssml:ph attribute must not be specified on a descendant of an element that also carries this attribute
When you use the ssml:ph attribute, the pronunciation is used in place of the text content of the element it is attached to. If you include an ssml:ph on a descendant element, it will never be announced.

CSS errors:

The scoped style element must occur before any other flow content other than other style elements and inter-element whitespace
When adding CSS style definitions scoped to the current element, the style element must be the first child. This can be problematic when scoping styles for figures, as it is invalid to include the style element before a figcaption at the start of the figure.

Alt style sheet errors:

Conflicting attributes found: X
You’ve specified both horizontal and vertical or night and day.

Entity errors

The following errors occur when you include malformed entities:

The entity &xyz; was referenced, but not declared
You need to change the referenced named entity to a numeric one.
The entity name must immediately follow the '&' in the entity reference
You have an & in your document that needs to be changed to &amp; to be valid.
The reference to entity X must end with the ';' delimiter
You’re missing a semicolon at the end of an entity.
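
Converting named entities to numeric ones is tedious by hand but simple to script. The sketch below uses the entity table in Python's standard html.entities module and leaves the five entities predefined by XML (amp, lt, gt, quot, apos) alone. The to_numeric_entities name is hypothetical, and because the approach is a plain text substitution it will also rewrite anything that looks like an entity inside comments or CDATA sections, so review the result.

```python
import re
from html.entities import name2codepoint

# These five are predefined in XML and never need converting.
XML_PREDEFINED = {"amp", "lt", "gt", "quot", "apos"}
ENTITY = re.compile(r"&([A-Za-z][A-Za-z0-9]*);")


def to_numeric_entities(text):
    """Replace known HTML named entities with numeric character references."""
    def replace(match):
        name = match.group(1)
        if name in XML_PREDEFINED or name not in name2codepoint:
            return match.group(0)  # leave untouched
        return "&#%d;" % name2codepoint[name]
    return ENTITY.sub(replace, text)


print(to_numeric_entities("caf&eacute;&nbsp;&amp; tea"))
# caf&#233;&#160;&amp; tea
```

Names that are not in the table are left as they are, so they will still be reported by epubcheck and can be fixed individually.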

Note that the following error is not related to the use of entities in your content:

External entities are not allowed in XML. External entity declaration found: %OEBEntities

If you receive this error, you need to remove an XHTML 1.1 DOCTYPE declaration from the invalid file. The current version of epubcheck handles this issue with a more meaningful message:

Obsolete or irregular DOCTYPE statement. External DTD entities are not allowed.
Use '<!DOCTYPE html>' instead.

Style

Although epubcheck does include some minimal CSS verification checks, it does not perform true CSS validation. It will not tell you if you’ve entered property names or values incorrectly or even if you have malformed syntax (incorrect shorthands, missing brackets on definitions, missing semicolon delimiters at the end of properties, etc.).

Integration of a CSS3-compliant validator is in the future for the program, but is not possible as of this writing. At this time, all you’ll be notified of is the presence of properties that are not recommended by the specification (e.g., using position: fixed).

Scripting

A structural validator like epubcheck cannot assist you in finding errors in your code, because it does not attempt to interpret any JavaScript you’ve included. To verify that your code will work as expected, you need to test it in reading systems. Debugging code within reading systems will present its own challenges, unless error reporting materializes with scripting functionality.

It may be possible to validate the code used in individual content files by opening them in a browser, but if your code depends on any EPUB extensions (such as the epubReadingSystem object), you won’t get a useful result. Restrictions on access to the DOM, to the Internet, etc., are only going to be verifiable within reading systems, but at least with these you can sprinkle your code with debugging alerts to verify what is happening under the hood.

Accessibility

Although verifying the structural integrity of your publication is a great first step in ensuring its ability to be read, it is only a small part of what is required to verify that a publication is accessible. A validator can typically only tell you if the markup you’ve used is valid. What it can’t do is tell you if you could have done things better.

epubcheck is no different. It cannot determine for you if content should be part of the logical reading order or not, for example, because that distinction can only be verified by humans. Whether or not elements have been appropriately used is likewise not something epubcheck is good at determining. The list goes on. EPUB 3 and Accessibility covers a number of issues, but there is more to making accessible content than can be covered in one article.

To assist in understanding the issues and verifying that you have met key accessibility criteria, the IDPF is developing reference accessibility guidelines that can be used in conjunction with this article, with the goal of providing broader coverage of the many issues involved. The guidelines site also provides a quality assurance checklist, including the ability to dynamically generate a checklist depending on the features your EPUB contains.

The idea of creating an interactive validator for EPUB 3 content has been bandied about since the EPUB 3 specification was finalized, but as of this writing, work on one has not begun. The DAISY Consortium is currently investigating how to improve epubcheck to report accessibility issues and/or create this new tool, and seeking funding for the work, so there are positive signs that accessibility checking will improve in the near future.