Adapted with permission from EPUB 3 Best Practices (O'Reilly Media)
To paraphrase a common expression, there are three things you need to know about a finished EPUB 3 file: it must adhere to rules, rules, and damned rules.
Which is only to say that there is nothing unique about EPUB 3 as a document publishing format. The rules are there to ensure that your content can be opened and rendered by any reading system. They can’t tell you how your content will look on any given reading system, but they can alert you to bugs that are the result of bad markup. If you skip the validation stage, assuming that your file must be valid because it looked fine when tested in a reading system or because a program exported it, you run the risk of a lot of wasted time and effort later.
Some vendors will prevent your file from being distributed if it doesn’t validate (which is a good thing), in which case you’ll be forced back to this step right away should you try to avoid it. Others may not, or you might distribute the file yourself, in which case it might only be as you get flooded with angry emails from customers that you’ll learn all the things you did wrong. Once your reputation is tarnished, even if just in a comments section on a product page, it can be hard to get back. No one appreciates someone who hasn’t bothered to do basic validation to ensure their content renders, after all.
This article diverges slightly from the best practices pattern of other pieces on EPUB 3. The best practice is simply to validate your content. Instead, this article looks at how to get up and running with the epubcheck validation tool and then spends some time looking at some of the most common error messages you’re likely to encounter, including breaking down where they come from in the validation process.
The epubcheck tool is the gold standard as far as EPUB 3 validation goes. The tool has been around since the early EPUB 2 days. It was originally developed by Adobe but is now maintained as an open source tool by the IDPF. It’s free to use and modify as you need.
It has also improved significantly both in terms of the scope of what it checks for and the comprehensibility of the error messages it returns, part of a major upgrading it has undergone in conjunction with the release of EPUB 3. There’s still work to be done to add CSS and scripting support, but it’s come a long way from where it was.
Before downloading epubcheck, you will need to verify that you have Java installed on your computer. Any version of the Java Runtime Environment will do, which is the version that gets installed when you install Java for your browser (available from the Java website).
To simplify calling Java from the command line, you will need to add the path to the Java executable to the PATH environment variable. On Windows machines, for example, this path is typically c:\program files\java\jre7. Instructions on how to add this variable are operating system dependent, but plenty of resources exist on the Web. You can omit this step, but it means manually entering the full path to Java every time you want to run epubcheck.
epubcheck is currently hosted on a Google projects site under the same name. The latest stable build is typically linked to from the main page, but can also be found by clicking on the Downloads tab.
epubcheck does not have an installer, but instead comes as a ZIP file containing the necessary libraries to run. After downloading, simply unzip the contents to a directory in your operating system. The folder will contain the epubcheck .jar file and a directory called lib, which contains additional libraries that epubcheck depends on to run, as shown in Figure 11-1.
That’s it. You now have epubcheck installed on your computer.
epubcheck is a command-line tool, meaning that you’re going to have to become familiar with your operating system’s command shell. If you’re already familiar with the command line and how to run Java, you can skim this section to get the command line call. If not, the first task is bringing up the command shell:
- Windows users, click the Start menu button and type cmd in the Run box (XP users) or the new search box at the bottom of the Start menu (Vista and Windows 7).
- Mac users need to go to the Applications/Utilities directory and double-click on Terminal.
- Linux users may find the shell in a number of different places, and under a number of different names, depending on the flavor and version of Linux they are running.
One of the nuisances of a command-line tool like epubcheck is entering all the necessary paths in order to get it to run. Adding the java executable location to the PATH variable allows you to call it without having to type the full directory path, but what directory you invoke epubcheck from will affect the other paths you have to specify.
If you try to run epubcheck from the default directory your command shell opens in, you’ll need to add the full path to both the epubcheck .jar file and your EPUB:
$java -jar c:
If you change directories in the command shell to the epubcheck directory, you can avoid having to specify the full path to the .jar file:
$java -jar epubcheck.jar c:/books/mybook/xyz.epub
Conversely, if you navigate to your book directory, you just have to specify the path to the .jar file.
The actual epubcheck .jar file typically has a build number appended to the end of it (e.g., epubcheck-3.0-RC-1.jar). This build number will be omitted from the examples in this article, because it is subject to change.
Either way is a nuisance, but you can use a couple tricks to speed things up. The simplest is to use the autocomplete feature that most command shells provide. If you start typing the name of a file or directory, you can press the Tab key to fill the name in automatically. For example, to quickly insert the epubcheck .jar file, you could start by typing this:
$java -jar c:
Pressing the Tab key should expand the directory to epubcheck (if you had another directory in your root drive starting with ep, simply press the Tab key again to rotate through the possible options). You can then repeat this shortcut to add the .jar file. Because there is only one file starting with the letter e in the epubcheck folder, again you could type the one letter
$java -jar c:
Then press the Tab key to expand to the full .jar file name.
If you don’t like typing at all, another option is to open both the epubcheck and book directories first (e.g., in a My Computer or Windows Explorer window on Windows, or a Finder window on Macs). You can then drag and drop the files into the command shell. For example, first type the Java commands:
Then drag the epubcheck .jar file onto the Terminal window and drop it. The full path to the file will be automatically inserted:
$java -jar C:
You could then do the same to add the EPUB file to validate.
A final option is to create a script to automatically run epubcheck for you. On Windows, create a new text file containing the following command:
$java -jar c:
Save this file as epubcheck.bat in the epubcheck directory. On Linux and Macs, an equivalent shell script might be:
#!/bin/sh
java -jar ~/epubcheck/epubcheck.jar "$@"
Save this file as epubcheck.sh.
You can now add the epubcheck folder to the PATH environment variable, as you did earlier for the Java executable. If you close and re-open your terminal window after making this change, you can now invoke epubcheck from any directory simply by typing the name of the file you just created, as shown in Figure 11-2.
To validate a file, all you need to do now is specify its path after the script filename, regardless of what directory your terminal window initializes in:
Again, you could drag and drop the EPUB file if that’s simpler.
One last trick you can use to improve the command-line experience is to redirect the output to a file for easier reading. Command shells are awfully little windows to try to read error messages in, and flipping between the window and your content to find and understand the problems quickly becomes a headache. Depending on how the command shell is configured, and how many errors and warnings your book has, you may not even be able to scroll back to the beginning of the report, meaning the most critical error might no longer be discoverable.
You aren’t restricted to working in the command shell, though. To redirect errors to a file, you add the number 2 followed by a right angle bracket (>) to the end of the command that invokes epubcheck, and then include the path and name of the file to write to.
For example, to send errors to the file c:/books/error.txt, invoke epubcheck like this:
$java -jar epubcheck.jar c:/books/mybook.epub 2> c:/books/error.txt
As long as you are working with a text editor that automatically updates open files, you should be able to run the command over and over and immediately see the new results. The command shell window in Figure 11-3 shows only the information written to standard output. Errors are listed in the specified text file.
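If you drive epubcheck from a script rather than typing the command by hand, the same redirection can be done in code. The following Python sketch is only one way to approach it and is not part of epubcheck: it runs whatever command it is given and writes that command's standard error stream to a file, which is what 2> does in the shell. The function name run_and_save_errors is made up for this illustration.

```python
import subprocess


def run_and_save_errors(command, error_file):
    """Run a command and write its standard error stream to error_file.

    This mirrors appending `2> error_file` to the command in a shell.
    Standard output is left alone, so it still appears in the terminal.
    Returns the exit code of the command.
    """
    # Open in binary mode: the child process writes raw bytes to the file.
    with open(error_file, "wb") as errors:
        completed = subprocess.run(command, stderr=errors)
    return completed.returncode
```

A call such as run_and_save_errors(["java", "-jar", "epubcheck.jar", "c:/books/mybook.epub"], "c:/books/error.txt") would then correspond to the command shown above, assuming Java is on your PATH and you run it from the epubcheck directory.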
This section quickly reviews the different ways you can call epubcheck to validate EPUBs.
Validating EPUB archives
The typical use for epubcheck is to validate a completed EPUB archive. To do so, simply include the path to your EPUB after invoking the epubcheck .jar file:
$java -jar epubcheck.jar c:/books/mybook.epub
Make sure there are no spaces in the directory path to your EPUB or in the filename itself. If there are, you must enclose the entire path in quotes:
$java -jar epubcheck.jar "c:/Users/matt/My Documents/EPUBs/My Book.epub"
or URI-escape the spaces as %20:
$java -jar epubcheck.jar c:/Users/matt/My%20Documents/EPUBs/My%20Book.epub
If you forget to do this, the following cryptic error is generated:
$java.lang.RuntimeException: For files other than epubs, mode must be
epubcheck will interpret the path as three separate arguments because of the spaces: c:/Users/matt/My, Documents/EPUBs/My, and Book.epub. Because the first part of the path does not appear to be an EPUB, since it has no extension, epubcheck will report that error and stop processing. The next couple of sections demonstrate what the mode argument does.
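The splitting described here is done by the command shell, not by epubcheck. If you launch epubcheck from a script instead, one way to sidestep the quoting problem is to pass the arguments as a list, so that each item reaches Java as exactly one argument. The Python sketch below illustrates the idea; the helper name build_epubcheck_command is hypothetical.

```python
def build_epubcheck_command(jar_path, epub_path, extra_args=()):
    """Return the argument list for `java -jar <jar> <epub> [extra args]`.

    Because every list item is handed to Java as a single argument,
    a path containing spaces needs neither quotes nor %20 escapes.
    """
    return ["java", "-jar", jar_path, epub_path, *extra_args]


command = build_epubcheck_command(
    "epubcheck.jar",
    "c:/Users/matt/My Documents/EPUBs/My Book.epub",
)
# The EPUB path is still one argument, spaces and all.
epub_argument = command[3]
```

Passing this list to subprocess.run(command) would start the validator without the shell ever seeing the spaces.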
Validating unpacked EPUBs
Although most people reach the validation stage only at the very end of a project, when they have an archive file for distribution, it’s not the only workflow that epubcheck can handle. Being able to work on the unzipped files is extremely helpful, and if you have a folder containing the full structure of your EPUB (mimetype file, META-INF directory and content), you can run epubcheck on it using the mode argument as follows:
$java -jar epubcheck.jar c:/path/to/book -mode exp
The exp value is short for expanded, which doesn’t mean that epubcheck will run more tests, just that the input is an unpacked EPUB. This feature saves you from having to zip up your content each time you fix an error in order to see whether your publication will successfully validate.
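Before pointing epubcheck at a folder, it can save a round trip to confirm that the folder has the top-level pieces mentioned above. The Python sketch below is a rough pre-check under that assumption; it looks only for the mimetype file and the META-INF directory, does not inspect their contents, and is no substitute for validation. The function name missing_epub_parts is invented for this example.

```python
from pathlib import Path


def missing_epub_parts(folder):
    """Return the required top-level items that are absent from folder.

    Only the presence of the mimetype file and the META-INF directory
    is checked; what they contain is left to epubcheck.
    """
    root = Path(folder)
    missing = []
    if not (root / "mimetype").is_file():
        missing.append("mimetype")
    if not (root / "META-INF").is_dir():
        missing.append("META-INF")
    return missing
```

An empty list means both items were found and the folder is worth handing to epubcheck in expanded mode.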
A related, and largely unknown but extremely useful, feature of epubcheck is the ability to generate an EPUB archive after successful validation of an unpacked directory. If epubcheck returns a successful report (no errors, only warnings), you can request that it also zip up the directory contents by adding a -save argument to the command:
$java -jar epubcheck.jar c:/path/to/book -mode exp -save
If all goes well, you’ll find a finished .epub file in the directory where you ran the command. epubcheck will use the folder name containing your publication for the finished file.
Note that if you get the following error message, it means that you’re working in a directory where you can’t write the finished archive file:
java.lang.NullPointerException
    at com.adobe.epubcheck.util.Archive.createArchive(Archive.java:102)
    at com.adobe.epubcheck.tool.Checker.run(Checker.java:188)
    at com.adobe.epubcheck.tool.Checker.main(Checker.java:177)
The Windows command shell initializes by default in the write-protected Windows\system32 folder, for example. If you change the current directory to one where you have write permissions, the process will run smoothly. Linux and Macs typically start in the user’s home directory, so this error should be less common, but if you can’t find the file after epubcheck builds it, always check from the directory in which you ran the command.
Validating EPUB component files
You also have the option to validate individual component files using epubcheck (e.g., to validate content before going through the process of zipping your content up into a distribution archive).
To invoke epubcheck on individual files, you need to add the following two arguments to the command line:
- The type of file that is being validated, set with the -mode argument. The value must be one of the following:
  - mo: Media overlays
  - nav: Navigation document
  - opf: Package document
  - svg: SVG content document
  - xhtml: XHTML content document
- The version of EPUB that the file conforms to. The value can be either 2.0 or 3.0.
To validate a navigation document, for example, you’d invoke the following command:
$java -jar epubcheck.jar nav.xhtml -mode nav -version 3.0
Although you can use any mode for EPUB 3 validation, only opf, svg, and xhtml can be used to validate EPUB 2 content.
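If you script component validation, it can help to reject unsupported combinations before Java is ever started. The Python sketch below encodes the mode and version pairs listed in the help output of epubcheck 3.0-RC-1, together with the -mode and -v flag names that listing uses. Other versions of the tool may support different pairs, so treat the table as an assumption to verify against your own copy. The names SUPPORTED and component_args are hypothetical.

```python
# Mode/version pairs as listed by the epubcheck 3.0-RC-1 help output.
SUPPORTED = {
    "opf": {"2.0", "3.0"},    # package document
    "xhtml": {"2.0", "3.0"},  # XHTML content document
    "svg": {"2.0", "3.0"},    # SVG content document
    "nav": {"3.0"},           # navigation document (EPUB 3 only)
    "mo": {"3.0"},            # media overlays (EPUB 3 only)
}


def component_args(mode, version):
    """Return the extra command-line arguments for a single-file check.

    Raises ValueError when the mode/version pair is not in SUPPORTED.
    """
    if version not in SUPPORTED.get(mode, set()):
        raise ValueError(
            "mode %r is not supported for EPUB version %r" % (mode, version)
        )
    return ["-mode", mode, "-v", version]
```

The returned list is meant to be appended after the path of the file being checked.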
A new experimental option has been included in the latest version of epubcheck: the ability to generate an assessment report. These reports are XML files that not only contain the errors and warnings generated by epubcheck, but also provide various metadata about the EPUB, such as the Dublin Core metadata properties that have been set, the language of the publication, and what properties are known about its content (e.g., that it contains audio, video, MathML, script, etc.).
To generate a report, you must use the -out argument followed by the file to write the assessment to:
$java -jar epubcheck.jar c:/path/to/book -mode exp -out c:/reports/book.xml
At the time of this writing, the report format was not documented on the epubcheck site, but it is described as an extension of the documentMD format. Each report contains a root doc element, which always contains a child document element. This element lists the extracted information:
- The documentInformation element lists the filename (fileName) of the EPUB followed by all Dublin Core properties found (each listed in an element corresponding to its local name).
- The formatDesignation element lists the EPUB mime type and version number.
- The assessmentInformation element indicates whether the validation run was successful or not (outcome). If warnings or errors are reported, each message will be included in an outcomeDetailNote element (the type of message is not identified in the markup, but can be determined by the presence of ERROR at the start of the element).
- The characterCount element provides the total character count of all text data.
- The Language element provides the language of the publication as set in the package document.
- Zero or more Font elements list all embedded fonts.
- Zero or more Reference elements list all the external links and references.
- Zero or more Features elements list all the unique properties of the content, as defined in the properties attributes on manifest entries.
Here’s an example of a condensed assessment report:
<title>Accessible EPUB 3
<outcomeDetailNote>ERROR: FreeSerif.otf: resource missing
Although these reports are primarily designed for automated workflows, they provide an interesting peek into your EPUBs.
That’s as deep as we’ll go into this feature, though, because it’s still an early experiment and may have changed by the time you read this article. The epubcheck site should be updated to include more information as the report format is formalized, so you can check there for changes.
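For a sense of how an automated workflow might consume such a report, here is a small Python sketch that pulls the error messages out of the outcomeDetailNote elements. Because the report format was undocumented and experimental, everything about the sample is an assumption: the SAMPLE string is hand-written for illustration, it omits any namespaces a real report may declare, and the outcome value and the warning text are invented. The code matches elements by local name so that a namespace, if present, would not break it.

```python
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip a '{namespace}' prefix from an ElementTree tag, if any."""
    return tag.rsplit("}", 1)[-1]


def error_notes(report_xml):
    """Return the text of every outcomeDetailNote that starts with ERROR."""
    notes = []
    for element in ET.fromstring(report_xml).iter():
        if local_name(element.tag) != "outcomeDetailNote":
            continue
        text = (element.text or "").strip()
        if text.startswith("ERROR"):
            notes.append(text)
    return notes


# Hypothetical, hand-written report fragment (not real epubcheck output).
SAMPLE = """<doc>
  <document>
    <assessmentInformation>
      <outcome>invented outcome value</outcome>
      <outcomeDetailNote>ERROR: FreeSerif.otf: resource missing</outcomeDetailNote>
      <outcomeDetailNote>WARNING: invented warning text</outcomeDetailNote>
    </assessmentInformation>
  </document>
</doc>"""

errors = error_notes(SAMPLE)
```

With the sample above, errors holds only the FreeSerif.otf message; the warning is skipped because its text does not begin with ERROR.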
If you’re ever in doubt about how to call epubcheck or want to verify whether features are still supported or new ones have been added, you can request a help listing from the program. Simply add the -help argument after calling the .jar file:
$java -jar epubcheck.jar -help
You should get information about the program and a listing of options similar to the following:
Epubcheck Version 3.0-RC-1

When running this tool, the first argument should be the name (with the path) of the file to check.

If checking a non-epub file, the epub version of the file must be specified using -v and the type of the file using -mode.
The default version is: 3.0.

Modes and versions supported:
-mode opf -v 2.0
-mode opf -v 3.0
-mode xhtml -v 2.0
-mode xhtml -v 3.0
-mode svg -v 2.0
-mode svg -v 3.0
-mode nav -v 3.0
-mode mo -v 3.0 // For Media Overlays validation
-mode exp // For expanded EPUB archives

This tool also accepts the following flags:
-save = saves the epub created from the expanded epub
-out <file> = ouput an assessment XML document in file (experimental)
-? or -help = displays this help message
Now that you have a grasp on how to invoke epubcheck to run a validation report, the next challenge is reading the error reports that come back from it. Later sections of this article will get into much more detail about what the errors themselves indicate, but this section looks at how to make sense of all the information that gets reported to simplify tracking down and correcting errors.
A typical message from epubcheck follows this basic pattern:
[ERROR|WARNING]: [file](line,offset): Message
The following is a sample error message that results if a closing quote character is omitted from a class attribute, for example:
ERROR: c:/epub/accessible_epub_3.epub/EPUB/ch01.xhtml(10,44): The value of attribute "class" associated with an element type "section" must not contain the '<' character.
Here you can see that this is an error (must be fixed to pass validation), that it is in the file /EPUB/ch01.xhtml inside the EPUB archive c:/epub/accessible_epub_3.epub, and that the error has been found 44 characters into line 10. Even if you don’t have an XML-aware editor, jumping to the exact line and character offset should be easy to do in any text editor.
You may not always get file, line, and offset information, depending on the problem. When epubcheck verifies that all items listed in the manifest are in the archive, it does not maintain information about the original package document XML. Consequently, if you have an entry for a nonexistent file, you’ll get an error like this:
ERROR: c:/epub/accessible_epub_3.epub: OPS/XHTML file EPUB/pr01a.xhtml is missing
This is when being able to interpret where errors are coming from and what they mean is going to be critical. You need to know that all your files are listed in the package document manifest to even begin figuring this kind of message out.
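If you save the report to a file, a few lines of code can split each message into its parts. The Python sketch below is a loose interpretation of the pattern shown above, not an official grammar for epubcheck output: it treats the (line,offset) portion as optional so that messages like the missing-file error still parse. The names MESSAGE and parse_message are made up for this example.

```python
import re

# level, file, optional "(line,offset)", then the message text
MESSAGE = re.compile(
    r"^(?P<level>ERROR|WARNING): "
    r"(?P<file>.+?)"
    r"(?:\((?P<line>\d+),(?P<offset>\d+)\))?"
    r": (?P<text>.*)$"
)


def parse_message(report_line):
    """Split one report line into a dict, or return None if it doesn't fit.

    The 'line' and 'offset' values are integers when present and None
    when the message carries no location.
    """
    match = MESSAGE.match(report_line.strip())
    if match is None:
        return None
    parts = match.groupdict()
    for key in ("line", "offset"):
        if parts[key] is not None:
            parts[key] = int(parts[key])
    return parts
```

Applied to the class attribute error shown earlier, this yields line 10 and offset 44; applied to the missing-file error, both location values come back as None.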
You may also find that the line and character offsets seem misleading. If you were to forget a closing aside tag early on in your file, it may not get reported as an error until the containing section gets closed:
<-- forgot a closing tag here on line 22
<-- but error reported here on line 196
The error message resulting from this tagging might be as follows:
ERROR: c:/epub/accessible_epub_3.epub/EPUB/ch01.xhtml(196,3): The element type "aside" must be terminated by the matching end-tag "</aside>".
People new to validation typically want to know why the error location isn’t reported on the opening tag to simplify fixing the problem, but you have to bear in mind that there is no problem with the opening tag. The problem is with the closing tag, or lack of one before the section closes, and that doesn’t occur until line 196 in this case. The validator does not backtrack to the opening tag to report where the aside opened, because validators simply report what is wrong. For all the validator knows, you simply forgot the end tag at that point.
Part of validating is doing the sleuthing to find where these kinds of problems originate. Just hope that there aren’t a lot of asides in your file, because you’ll have to check each one in turn to find the broken one! An aside can contain another aside, like a div can contain a div, so the error could take a bit of time to track down.
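You can see the same behavior in any XML parser, not just epubcheck. The Python sketch below uses the standard library parser, whose message wording differs from epubcheck's, on a four-line fragment written for this illustration: the aside opens on line 2 and is never closed, yet the error is reported on line 4, where the section closes.

```python
import xml.etree.ElementTree as ET

# The aside on line 2 has no closing tag; the section closes on line 4.
MARKUP = """<section>
<aside>
<p>This aside is never closed.</p>
</section>"""


def first_parse_error(markup):
    """Return (line, message) for the first well-formedness error.

    Returns None when the markup parses cleanly.
    """
    try:
        ET.fromstring(markup)
    except ET.ParseError as error:
        line, _column = error.position
        return line, str(error)
    return None


problem = first_parse_error(MARKUP)
```

Here problem[0] is the line of the closing section tag, not the line where the aside opened, which is the situation described above in miniature.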
Beyond the Command Line
Running epubcheck from the command line is not the only option available. Integrating the library more seamlessly into internal workflows is an option, of course, but requires developer help. For those who don’t have those kinds of resources available, this section reviews a few other options that can simplify the validation process.
The IDPF currently maintains a web-based version of epubcheck at http://validator.idpf.org/. To run the validator, you simply select your EPUB file and click the Validate button on the page, as shown in Figure 11-4.
The current version of epubcheck also powers this web service, but instead of command shell error output, you receive messages in the more human-readable table format shown in Figure 11-5.
The web results make it simpler to identify the error type, file, line, and character offset of the reported problem, but this information is the same as is provided in the command shell results shown previously.
Unfortunately, the web interface is not for use by anyone doing commercial validation, and it also has some limitations that work against it even for users who meet the use criteria. For one, you are capped to a maximum file size of 10 MB. While this is not going to be problematic for simple text works, any publication with images, audio and video content, or embedded fonts will quickly go over the cap. It can also be a nuisance, and waste of bandwidth, to continuously upload your EPUB over and over to the IDPF server in order to have it validated. It’s not the fastest or most effective use of time depending on how big your EPUB is. Learning to use epubcheck from the command line is a better long-term strategy.
The source for the web service is also not available for general download as of this writing, but it could be made available at a later date once the validator moves out of its beta phase. Installing the service locally, whether on an individual PC running a web server or in a corporate environment, would greatly simplify the validation process for anyone wanting to avoid command line and/or commercial options.
It is possible to use the .jar file to create your own web service, but you would have to add a layer to it to parse and format the results in order to provide equivalent table markup.
A much-desired feature for epubcheck has been to add a graphical interface to simplify the whole process we’ve just gone through of configuring programs and paths and selecting files. Unfortunately, at this time, it remains a much-desired feature. The developers are aware of the need, so stay tuned.
Although the process to manually call epubcheck can seem tedious, especially if you aren’t a developer who is regularly in the command shell, there are programs that natively integrate epubcheck and/or can be configured to run external tools like epubcheck from within them.
Prime among these is oXygen Editor (shown in Figure 11-6), which has native support for EPUB 2 and 3 markup editing. oXygen allows you to drag and drop your EPUB archive directly into the program, enabling editing of the content files without having to unzip. It also includes built-in support for the latest epubcheck validator, so all you have to do is click a button to validate your archive. It is also nondestructive, in that it will not modify your source markup when saving and validating.
Perhaps the most useful feature that oXygen provides is the ability to jump directly to the listed error. By double-clicking on an error in the result pane at the bottom of the program, the file will be automatically loaded (if not already open) and jumped to the corresponding line. oXygen also shows validation errors in red on the side of the text editor, enabling quick location and correction.
It’s somewhat disheartening to discover that the program you used to author your EPUB has generated invalid content, but it’s not atypical. There often aren’t straight 1:1 mappings when dealing with export routines that go from an internal layout format to a distribution format like EPUB. Adding to that, developers often try to help these processes through heuristic and natural language parsing tricks. The resulting content may appear to be okay in a reading system, but tag soup is not just invalid to the theoretical purity of specifications but causes real-world problems for anyone using the markup to navigate.
This article can’t possibly be a reference to every single error that you might encounter in every technology that EPUB incorporates, but this section will walk through the main validation stages and look at what can go wrong. Hopefully, with a sense of what epubcheck is doing, even if you can’t find your particular problem here, you’ll find some hints to where you should be looking.
The other consideration is that error messages change over time, with the hope of making them easier to understand. The obvious result is that the error messages you find in the following sections may not exactly match what epubcheck reports depending on the version you’re using.
If there is one best practice to give when it comes to understanding errors (one learned from many years validating markup data), it’s to always start with the first error reported. Validators don’t generally stop at the first problem they find, and the result can be many, many spurious errors that are simply side effects of the first problem (e.g., forgetting to include a closing tag can cause every following element to be reported as invalid).
A second, closely related tip, is to validate often. The way that errors cascade can result in some odd issues appearing in your report, so never assume you can always pick out which ones are related to an earlier problem and which ones are unique. That’s how you spend time searching for improbable solutions to problems that didn’t actually exist. Running validation reports is quick and free, so when in doubt, run the report again.
And finally, note that validators are not infallible. You might find errors being reported that shouldn’t be, but there’s a difference between an incorrect check of a specification requirement and not being sure what an error means. If you are unsure whether it is the validator that is wrong or your understanding of the message, seek assistance. The IDPF forums are a friendly venue where you can ask for help deciphering error reports.
Common XML Errors
As EPUB is a predominantly XML-based format, there are a number of common errors that get reported across document types. If a document is not well formed, or does not meet schema requirements, the error message does not change, only the element and attribute names. Rather than list the same issues over and over, this section will tackle these problems once.
Each XML file must have a single root element (e.g., for XHTML documents, this is the html element). epubcheck will generate the following errors if it finds XML files that aren’t conformant to this requirement:
Content is not allowed in prolog
- This error occurs when you have text content before the root element. Only the XML declaration, processing instructions, and doctype declarations can precede the root element. It may also be a sign that you’ve accidentally specified a text file with an XML media type in the manifest.
Content is not allowed in trailing section
- This is the opposite error, where you have text or markup content after the closing root tag.
If your markup is not conformant to the schema for a given document type, you’ll receive the following errors:
Element X not allowed here; expected the element end-tag, text or
- This is probably the most common element error you’ll encounter, and A,B,C usually ends up being a wildly long list of alternative elements. This error can occur either when you’ve used an element where it’s not allowed (e.g., putting a div inside of a p in a content document), or have accidentally forgotten to close an element (e.g., omit a closing </p> tag and every following sibling block element will register as an error).
This error can also indicate that you’ve inserted an element out of order. The figcaption must be the first or last element in a figure, for example. If you place it anywhere else, you’ll get this error on the elements that follow it. The same applies to table markup. The order of the major divisions in the package document is also enforced.
This error can also occur if you forget a namespace. MathML and SVG in HTML5 do not require namespaces, for example, so if you forget to declare one, or copy an HTML5 example from the Web, the element might be reported as invalid.
And finally, make sure that you’re using lowercase element names. XHTML is case sensitive, so you cannot use element names like SECTION. If you do, you will also receive this error that they are not allowed. All element names in the package document are also lowercase. It would be nicer if a distinction could be made between elements that aren’t defined and elements that aren’t allowed, as used to be the case, but that’s a limitation of the RelaxNG schemas you just have to work around.
Element X incomplete; expected A,B,C
- This error often occurs in the package metadata if you omit one of the three required Dublin Core elements. XHTML content documents don’t have a lot of requirements, but they do exist (e.g., the ruby element requires at least one child rt). You’re more likely to encounter this error when you add MathML or SVG to your content documents, as there tend to be more dependencies.
The prefix X for element Y is not bound
- You’ve used a prefix on an element without declaring it (e.g., dc: on the package metadata elements without declaring xmlns:dc="http://purl.org/dc/elements/1.1/"). Namespace declarations are often included on the root element but can be scoped to the most relevant element (e.g., the Dublin Core namespace is typically declared on the metadata element, not the root package element, because Dublin Core elements are not used outside of the metadata section). This error also occurs in content documents when MathML and SVG are embedded without a namespace declared.
Element X missing required attribute Y
- The specified attribute cannot be omitted. An example is the unique-identifier attribute on the package element. In content documents, forgetting src attributes is often the cause of this error.
Element type X must be followed by either attribute specifications, ">" or "/>"
- This error occurs either when you’ve omitted a closing quote character on an attribute or have forgotten the closing angle bracket on the element.
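The unbound prefix error is easy to reproduce outside of epubcheck, which can help when you want to test a fix quickly. The Python sketch below uses the standard library's namespace-aware parser (its error wording is not the same as epubcheck's) on two small metadata fragments written for this illustration: one uses the dc: prefix without declaring it, the other declares the Dublin Core elements namespace on the metadata element.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"

# The dc: prefix is used but never declared.
UNBOUND = "<metadata><dc:title>Accessible EPUB 3</dc:title></metadata>"

# The same markup with the namespace declared on the metadata element.
BOUND = (
    '<metadata xmlns:dc="' + DC_NS + '">'
    "<dc:title>Accessible EPUB 3</dc:title>"
    "</metadata>"
)


def parses(markup):
    """Return True if the markup is well formed, including its prefixes."""
    try:
        ET.fromstring(markup)
    except ET.ParseError:
        return False
    return True


unbound_ok = parses(UNBOUND)  # False: the prefix has no declaration
bound_ok = parses(BOUND)      # True: the declaration is in scope
```

Declaring the namespace on any ancestor of the prefixed element, including the root, would make the second check pass in the same way.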
Likewise, if you use attributes improperly, schema validation will return the following errors:
Attribute X not allowed here; expected attribute A,B,C
- One error type that attributes share with elements is being used in the wrong place. Some attributes that earlier versions of HTML allowed on a elements are no longer valid in HTML5, for example. Attributes are also case sensitive, which can cause this error.
The prefix X for attribute Y associated with an element type Z is not bound
- Forgetting to declare namespaces is another shared issue. If you receive this message, you don’t have an in-scope namespace declaration. This problem typically occurs when using the epub:type attribute without declaring the namespace on the root html element, for example.
Duplicate ID X
- In this case, you have two or more id attributes in the same file with the same value. You will need to manually inspect the attributes to determine which one needs to be changed.
Value of attribute "id" is invalid; must be an XML name without colons
- This error most often occurs when an id attribute value is numeric (e.g., id="1"), begins with a number, or contains invalid characters. Although HTML5 has relaxed the restriction that all ids start with an alphabetic character, other XML formats allowed in EPUB 3 must still conform to this naming.
The value of attribute X associated with an element type Y must not contain the '<' character
- This error may indicate that you’ve included a left angle bracket character in an attribute, but more often is an indication that you missed a closing quote character on an attribute (i.e., the validator sees the next tag as part of the attribute value).
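Hunting for duplicate or badly formed id values by eye gets tedious in a long file, so a short script can do the first pass for you. The Python sketch below collects every id attribute in a well-formed document and reports the duplicated values along with those that fail a simplified name check. The check is an ASCII-only approximation of "an XML name without colons" and will wrongly flag legal non-ASCII names; the names SIMPLE_NAME and check_ids are invented for this example.

```python
import re
import xml.etree.ElementTree as ET
from collections import Counter

# Simplified, ASCII-only stand-in for an XML name without colons:
# a letter or underscore, then letters, digits, '.', '_' or '-'.
SIMPLE_NAME = re.compile(r"[A-Za-z_][A-Za-z0-9._-]*")


def check_ids(markup):
    """Return (duplicates, invalid) lists of id values found in markup.

    The markup must already be well formed, or parsing raises ParseError.
    """
    ids = [
        element.get("id")
        for element in ET.fromstring(markup).iter()
        if element.get("id") is not None
    ]
    counts = Counter(ids)
    duplicates = sorted(value for value, n in counts.items() if n > 1)
    invalid = sorted(
        value for value in counts if not SIMPLE_NAME.fullmatch(value)
    )
    return duplicates, invalid
```

Each reported value still needs a human decision about which occurrence to rename and what every link that points at it should become.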
All XML formats defined by the EPUB specification, including XHTML content documents, must be encoded as UTF-8 or UTF-16. The following errors may occur if your documents do not conform:
Only UTF-8 and UTF-16 encodings are allowed, detected X
- Verify that the file is actually encoded as UTF-8 or UTF-16 (don’t trust the XML declaration).
Malformed byte sequence: X. Check encoding
- This error typically arises when content in one encoding is pasted into a document encoded in another, but can also occur if you transcode your content from one character set to another. It indicates that there is a sequence of bytes that don’t conform to the Unicode specification, so they cannot be resolved to a character. When you view the file, you may not see anything at the location, as the malformed byte may not show as character data. epubcheck does not provide more detailed information, so to find the exact location, you’ll typically need to open the invalid file in an XML editor that can report the exact location.
Any Publication Resource that is an XML-Based Media Type must be a
conformant XML 1.0 Document. XML Version retrieved: #
- You cannot use XML version 1.1 for XML content. If you’ve included an XML declaration at the top of your file, make sure that the version pseudoattribute is set to 1.0.
Note that CSS style sheets must also be encoded as UTF-8 or UTF-16. If you create your CSS files as plain ASCII text files, you should not receive an error. The ASCII character set maps to the same range of characters in UTF-8, so all ASCII text files are valid UTF-8 files.
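Because epubcheck does not say where a malformed byte sequence sits, a few lines of script can stand in when no XML editor is at hand. The sketch below is only an illustration and assumes the file is supposed to be UTF-8; the function name find_malformed_utf8 is hypothetical.

```python
def find_malformed_utf8(data):
    """Return (line, column, byte_offset) of the first invalid UTF-8 byte, or None."""
    try:
        data.decode("utf-8")
        return None
    except UnicodeDecodeError as err:
        offset = err.start
        # Lines and columns are 1-based; the column is counted in bytes.
        line = data.count(b"\n", 0, offset) + 1
        last_newline = data.rfind(b"\n", 0, offset)
        column = offset - last_newline
        return line, column, offset


# A Latin-1 "e acute" (0xE9) pasted into an otherwise UTF-8 document.
broken = b"<p>ok</p>\n<p>caf\xe9</p>\n"
print(find_malformed_utf8(broken))
```

To check a real file, read it in binary mode (open(path, "rb").read()) and pass the bytes to the function.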
One of the handier features of epubcheck is that it will verify all internal links to see if they can be resolved, and report problems if not:
'X': referenced resource missing in the package
- You’ve attempted to link to the file X, but a matching resource could not be found in the container. Check that the resource exists and that there is an entry for it in the package document manifest.
'X': fragment identifier is not defined in 'Y'
- The file Y could be located, but there isn’t an element inside it with the id X. Typos and renamed IDs are the most common cause.
Container errors can be some of the most perplexing to solve, because they often arise as a result of the way the content has been zipped up. In order to ensure that your EPUB can be opened and the content discovered, you need to ensure that there are no problems with the packaging. To that end, epubcheck verifies that your EPUB meets all the following conditions:
File name contains characters disallowed in OCF file names: X
- See section 2.4 of the OCF specification for a list of characters that must not be used in your EPUB filename or any files in it.
Filename contains spaces. Consider changing filename such that URI
escaping is not necessary
- This message is actually just a warning. It is generated because it’s possible that a poorly designed reading system might break if there are spaces in your file names (e.g., failing to encode the spaces properly as %20), not because problems are expected.
File name contains non-ascii characters: X. Consider changing filename
- This message is also a warning, similar to the preceding one. Although modern operating systems have no issue with non-ASCII characters in filenames, processing tools sometimes do. If you are targeting older EPUB 2 reading systems, this may be a concern, but it should not affect your decision to use these characters in EPUB 3.
Filename is not allowed to end with '.'
- Ending filenames with a dot is a little more serious, so this is an error. Some operating systems do not handle filenames so named, which can break rendering.
Corrupted ZIP header
- Occurs if the container does not begin with the string PK (i.e., it is not a valid ZIP file).
Cannot read header
- Some form of file corruption has occurred that is preventing the ZIP file from being read.
Length of first filename in archive must be 8, but was #
- If the first filename found in the archive is not eight characters, it cannot be the required mimetype file. If you manually zip your archive, you must add the mimetype before you add any other files.
Mimetype entry missing or not the first in archive
- This error is rare but can occur if the first file is eight characters long (to get past the previous check) but is not the mimetype file. Again, check how the archive has been zipped.
Extra field length for first filename must be 0, but was #
- Indicates that there is character data between the mimetype filename and its content (the extra fields are being used). This error can occur if the program you’ve used to zip the container adds additional metadata.
Mimetype contains wrong type (application/epub+zip) expected
- Ensure that the media type has been typed correctly.
Mimetype file should contain only the string "application/epub+zip"
- Ensure that there are no extra spaces or linebreaks in the file.
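Most of the ZIP-level errors above come down to how the first entry in the archive is written. The following is a minimal sketch, using only Python's standard zipfile module, of zipping a container with the mimetype file first and uncompressed, and then repeating a few of the header checks by hand. The function names build_epub and check_first_entry and the problem strings are hypothetical; they are not epubcheck's own code or messages.

```python
import io
import struct
import zipfile

MIMETYPE = b"application/epub+zip"


def build_epub(files):
    """Zip files (a dict of archive path -> bytes) with the mimetype entry first."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        # The mimetype entry goes in first, stored without compression.
        archive.writestr("mimetype", MIMETYPE, compress_type=zipfile.ZIP_STORED)
        for path, content in files.items():
            archive.writestr(path, content, compress_type=zipfile.ZIP_DEFLATED)
    return buffer.getvalue()


def check_first_entry(data):
    """Repeat some of the first-entry checks on raw EPUB bytes; return a list of problems."""
    if data[:2] != b"PK":
        return ["Corrupted ZIP header"]
    if len(data) < 30:
        return ["Cannot read header"]
    # In a ZIP local file header, the filename length and the extra field
    # length are little-endian 16-bit values at byte offsets 26 and 28.
    name_len, extra_len = struct.unpack_from("<HH", data, 26)
    problems = []
    if name_len != 8:
        problems.append(f"first filename length is {name_len}, not 8")
    elif data[30:38] != b"mimetype":
        problems.append("first entry is not the mimetype file")
    if extra_len != 0:
        problems.append(f"extra field length is {extra_len}, not 0")
    start = 30 + name_len + extra_len
    if not problems and data[start:start + len(MIMETYPE)] != MIMETYPE:
        problems.append("mimetype content is wrong")
    return problems


epub_bytes = build_epub({"META-INF/container.xml": b"<container/>"})
print(check_first_entry(epub_bytes))
```

The sketch does not verify that the mimetype file contains nothing beyond the expected string, so trailing spaces or line breaks would still need to be caught by epubcheck itself.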
epubcheck will also verify that the package document can be located by a reading system. The following errors indicate problems with this discovery process:
Required META-INF/container.xml resource is missing
- Somewhat self-explanatory. The container.xml file is a required file in the META-INF directory, because it identifies the path to the package file.
No rootfiles with media type 'application/oebps-package+xml'
- The container.xml file was found, but it does not contain an entry for the package file (check that the media-type attribute value correctly matches the one in the error).
Entry X not found in zip file
- The package document could not be found at the location specified by the full-path attribute of the rootfile element in container.xml.
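The discovery process behind these three errors can be sketched in a few lines. The example below is an illustration rather than epubcheck's implementation: the function name find_package_path and the sample path EPUB/package.opf are hypothetical, while the namespace, the rootfile element, and its full-path and media-type attributes come from the OCF container format.

```python
import xml.etree.ElementTree as ET

OCF_NS = "urn:oasis:names:tc:opendocument:xmlns:container"
OPF_MEDIA_TYPE = "application/oebps-package+xml"


def find_package_path(container_xml, archive_names):
    """Return the package document path, or raise ValueError naming the problem."""
    root = ET.fromstring(container_xml)
    for rootfile in root.iter(f"{{{OCF_NS}}}rootfile"):
        if rootfile.get("media-type") == OPF_MEDIA_TYPE:
            path = rootfile.get("full-path")
            if path not in archive_names:
                raise ValueError(f"Entry {path} not found in zip file")
            return path
    raise ValueError(f"No rootfiles with media type '{OPF_MEDIA_TYPE}'")


CONTAINER = """<container version="1.0"
    xmlns="urn:oasis:names:tc:opendocument:xmlns:container">
  <rootfiles>
    <rootfile full-path="EPUB/package.opf"
        media-type="application/oebps-package+xml"/>
  </rootfiles>
</container>"""

names = {"mimetype", "META-INF/container.xml", "EPUB/package.opf"}
print(find_package_path(CONTAINER, names))
```

With a real EPUB, the set of names could be taken from zipfile.ZipFile(path).namelist() and the XML from the archive's META-INF/container.xml entry.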
If you think of validation as a progression through Dante’s circles of hell, then once your content is packaged properly, the next circle you’ll find yourself in is checking whether your package document has been properly constructed.
The following metadata problems will be reported:
unique-identifier attribute in package element must reference an
existing identifier element id
- The unique-identifier attribute does not point to the id value of a dc:identifier element in the metadata section.
character content of element "X" invalid; must be a string with length at least 1
- All metadata must be at least one character in length (whitespace does not count). This error indicates that an empty value was found.
Package dcterms:modified meta element must occur exactly once
- The dcterms:modified property indicates the last modification date of the EPUB and is used to create the publication identifier, so exactly one must be included in the metadata.
dcterms:modified illegal syntax (expecting: 'CCYY-MM-DDThh:mm:ssZ')
- Check that the date and time specified in the dcterms:modified property match the specified format. You cannot use abbreviated dates or omit the timestamp.
@refines missing target id: 'X'
- If the refines attribute begins with a hash (#), it must reference the ID of an element in the package document (an item in the manifest, for example). This error occurs when a match cannot be found.
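Since the dcterms:modified syntax error is usually caused by a hand-typed date, it can help to generate the value. Here is a small sketch, assuming you stamp the package document from a build script; the function names modified_timestamp and is_valid_modified are hypothetical.

```python
import re
from datetime import datetime, timezone

# Shape of CCYY-MM-DDThh:mm:ssZ: full date, full time, literal "Z" for UTC.
MODIFIED_PATTERN = re.compile(r"\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}Z\Z")


def modified_timestamp(moment=None):
    """Format a datetime (default: now) as CCYY-MM-DDThh:mm:ssZ in UTC."""
    moment = moment or datetime.now(timezone.utc)
    return moment.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


def is_valid_modified(value):
    """Check the shape of the value and that it names a real date and time."""
    if not MODIFIED_PATTERN.match(value):
        return False
    try:
        datetime.strptime(value, "%Y-%m-%dT%H:%M:%SZ")
    except ValueError:
        return False
    return True


stamp = modified_timestamp(datetime(2013, 1, 15, 9, 30, 0, tzinfo=timezone.utc))
print(stamp)                            # 2013-01-15T09:30:00Z
print(is_valid_modified("2013-01-15"))  # False: the time is missing
```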
epubcheck will also check for problems with resources that should, or shouldn’t, be listed in the manifest:
X file Y is missing
- You have a reference to a missing resource. X identifies the type of file, but is not the media type (OPS/XHTML is used for XHTML content documents, image is used for any image type, etc.).
item (X) exists in the zip file, but is not declared in the manifest file
- Warning that you have a file in the container that is not listed in the manifest.
'http://X/Y/Z': remote resource reference not allowed; resource must be placed in the OCF
- This error indicates that you’ve invalidly referenced a resource outside the container (e.g., a file on the Web from an object tag). In older versions of epubcheck, this error was also emitted when remote audio/video clips were not listed in the manifest.
epubcheck will also emit the following errors if resources don’t match the information supplied about them:
Item property: X is not defined for: Y
- You’ve attached a properties attribute value to a file type to which it doesn’t belong (e.g., mathml on a JPEG).
This file should declare in opf the property: X
- This message occurs when a content document contains a feature that hasn’t been declared in the properties attribute (e.g., scripting).
This file should not declare in opf the properties: X
- The listed properties cannot be verified and should be removed from the properties attribute for the item.
Exactly one manifest item must declare the 'nav' property (number of 'nav' items: #).
- You didn’t specify the navigation document or specified more than one.
Multiple occurrences of the 'cover-image' property (number of
'cover-image' items: #).
- Similarly, you’ve declared the cover-image property on more than one manifest item.
Object type and the item media-type declared in manifest, do not match
- The media type declared in the media-type attribute on the manifest entry does not match the type attribute specified on the object tag in the content file.
epubcheck will also alert you if you haven’t provided a core media type fallback for a foreign resource:
Manifest item element fallback attribute must resolve to another
manifest item (given reference was 'X')
- The ID referenced in the fallback attribute does not point to another item. Check for a typo and that the fallback hasn’t been removed.
Spine item with non-standard media-type 'X' with no fallback
- You’ve referenced a file that is not an XHTML or SVG content document from the spine without providing a fallback to one of those two.
Spine item with non-standard media-type 'X' with fallback to
- Again, you’ve referenced a foreign resource from the spine, but this time the only fallback found is to another foreign resource.
Circular reference in fallback chain
- Each fallback in a fallback chain must be to a unique resource. If one resource in the chain references another earlier in the chain, you end up in an endless loop of incompatible formats.
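The fallback errors above all describe one walk along the chain of fallback attributes. The sketch below shows that walk under simplifying assumptions: the manifest is a plain dictionary mapping each item ID to a (media type, fallback ID) pair, and the function name check_fallback_chain and the error strings are hypothetical rather than epubcheck's own.

```python
# Media types that can appear in the spine without a fallback.
SPINE_TYPES = {"application/xhtml+xml", "image/svg+xml"}


def check_fallback_chain(manifest, start_id, allowed_types=SPINE_TYPES):
    """Follow fallbacks from start_id; return the first item ID with an allowed type."""
    seen = set()
    current = start_id
    while current is not None:
        if current in seen:
            raise ValueError("Circular reference in fallback chain")
        seen.add(current)
        if current not in manifest:
            raise ValueError(f"fallback does not resolve to a manifest item: {current}")
        media_type, fallback = manifest[current]
        if media_type in allowed_types:
            return current
        current = fallback
    raise ValueError("no fallback to an allowed media type")


manifest = {
    "report-pdf": ("application/pdf", "report-xhtml"),
    "report-xhtml": ("application/xhtml+xml", None),
}
print(check_fallback_chain(manifest, "report-pdf"))  # report-xhtml
```

Tracking the IDs already visited is what turns an endless loop into a reportable error.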
When adding media overlays, the required metadata is also verified:
Media overlay items must be of the 'application/smil+xml' type (given type was 'X')
- Media overlays are SMIL files, so you must make sure the correct media type has been given (i.e., not
Item media:duration meta element not set (expecting: meta property=
- When attaching a media overlay to a content document, you must add a meta element indicating the total audio duration for that document. This error indicates that this property is missing.
Global media:duration meta element not set
- You must include a meta element with no refines attribute containing the cumulative time of all the individual overlays.
And finally, there are a couple of possible errors tested for if you include an NCX for rendering in EPUB 2 reading systems:
spine element toc attribute must reference the NCX manifest item
(referenced media type was 'X')
- The value of the toc attribute on the spine must be the same as the id attribute on the manifest entry for the NCX file.
spine element toc attribute must be set when an NCX is included in the publication
- If you include an NCX, you must add a toc attribute to the spine element.
Now, moving to the stage of validating content, providing lists of checks and error messages becomes more difficult. Although the EPUB 3 specification imposes some requirements and restrictions, error messages are more likely to come from the underlying technologies it employs.
Fortunately, many of the issues you’ll run into at the content level are similar in nature. In XHTML and SVG content documents, most errors are related to the invalid use of markup we covered in the generic XML errors section.
The following XHTML-specific errors may be reported by epubcheck.
General document and header errors:
The lang and xml:lang attributes must have the same value
- When using both language attributes on the same element, their values must match.
There must not be more than one meta element with a charset attribute per document
- A document only has one character encoding, so specifying the value twice is redundant and will possibly conflict.
The sizes attribute must not be specified on link elements that do not have a rel attribute that specifies the icon keyword
- The sizes attribute is only allowed to be used to specify the dimensions of an icon referenced by the link element.
For each Document, there must be no more than one time element with a pubdate attribute that does not have an ancestor article element
- A document can only have a single publication date. If other time elements contain publication dates, they must each be inside a unique article element.
For each article element, there must be no more than one time element child with a pubdate attribute
- Another duplication error. Each article can only have a single publication date.
Map element errors:
Duplicate map name 'X'
- Two or more map elements have the same name attribute value, but each must be unique.
The id attribute on the map element must have the same value as the name attribute
- Just one of those quirky things that must be true.
Form element errors:
A select element whose multiple attribute is not specified must not have more than one descendant option element with its selected attribute set
- If you can pick only one option, it doesn’t make sense to specify that two or more are set by default.
Track element errors:
The track element label attribute value must not be the empty string
- The label is used to announce the track type to the readers, so it cannot be empty.
There must not be more than one track child of a media element with the default attribute specified
- As its name suggests, the default attribute is used to indicate which track to use when no reader preference is available. Specifying more than one default defeats the purpose of the attribute.
The X attribute must refer to an element in the same document (the ID 'Y' does not exist)
- Some elements must reference other elements in the document. The for attribute on a label element, for example, must reference the id of the form element it labels.
The X attribute must refer to elements in the same document (target ID missing)
- This error is the same as the last, but occurs when an attribute can reference more than one other element. There are a number of ARIA attributes that can reference multiple elements (aria-controls, etc.). Check that each reference can be resolved.
The X attribute does not refer to an allowed target element (expecting: Y)
- The attribute references another element, but it is the wrong kind of element. To use label again, it would be incorrect for it to point to anything but a form element.
The following errors impose additional restrictions on element and attribute usage that could not be enforced through the structural schema validation stage:
The X element must have a Y attribute
- This error occurs if the bdo element does not include a dir attribute.
The X element must not appear inside Y elements
- This error occurs if you attempt to embed one element inside another where it would make no sense or would cause rendering issues, such as a video tag inside another video tag.
The X element must have an ancestor Y element
- This error occurs when an element is found outside of its expected ancestor. This error specifically occurs when an area tag is found outside of a map and when an image map is not wrapped inside of an a element.
epub:type property errors:
Undefined property: X
- If a property in the epub:type attribute does not have a prefix, it must be defined in the EPUB Structural Semantics vocabulary.
Undeclared prefix: X
- You have used a prefix that has not been declared in the prefix attribute on the root html element.
The ssml:ph attribute must not be specified on a descendant of an
element that also carries this attribute
- When you use the ssml:ph attribute, the pronunciation is used in place of the text content of the element it is attached to. If you include an ssml:ph on a descendant element, it will never be announced.
The scoped style element must occur before any other flow content other than other style elements and inter-element whitespace
- When adding CSS style definitions scoped to the current element, the style element must be the first child. This can be problematic when scoping styles for figures, as it is invalid to include the style element before a figcaption at the start of the figure.
Alt style sheet errors:
Conflicting attributes found: X
- You’ve specified both
The following errors occur when you include malformed entities:
The entity &xyz; was referenced, but not declared
- You need to change the referenced named entity to a numeric one.
The entity name must immediately follow the '&' in the entity reference
- You have an & in your document that needs to be changed to &amp;amp; to be valid.
The reference to entity X must end with the ';' delimiter
- You’re missing a semicolon at the end of an entity.
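Swapping named entities for numeric ones is easy to automate. The following is a minimal sketch using the entity table in Python's standard library: the function name named_to_numeric is hypothetical, the five entities predefined by XML are left alone, and names the table does not know are left untouched for manual review.

```python
import re
from html.entities import name2codepoint

# These five are predefined in XML and never need a declaration.
XML_PREDEFINED = {"amp", "lt", "gt", "quot", "apos"}
ENTITY = re.compile(r"&([A-Za-z][A-Za-z0-9]*);")


def named_to_numeric(text):
    """Replace known named entities with decimal numeric character references."""
    def replace(match):
        name = match.group(1)
        if name in XML_PREDEFINED or name not in name2codepoint:
            return match.group(0)
        return f"&#{name2codepoint[name]};"
    return ENTITY.sub(replace, text)


print(named_to_numeric("Caf&eacute; &amp; bar&nbsp;open"))
# Caf&#233; &amp; bar&#160;open
```

The sketch works on text, not on a parsed document, so it will also rewrite entity-like strings inside comments or CDATA sections.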
Note that the following error is not related to the use of entities in your content:
External entities are not allowed in XML. External entity declaration found: %OEBEntities
If you receive this error, you need to remove an XHTML 1.1 DOCTYPE declaration from the invalid file. The current version of epubcheck now handles this issue with a more meaningful message:
Obsolete or irregular DOCTYPE statement. External DTD entities are not allowed. Use '<!DOCTYPE html>' instead.
Although epubcheck does include some minimal CSS verification checks, it does not perform true CSS validation. It will not tell you if you’ve entered property names or values incorrectly or even if you have malformed syntax (incorrect shorthands, missing brackets on definitions, semicolon delimiters at the end of properties, etc.).
Integration of a CSS3-compliant validator is in the future for the program, but is not possible as of this writing. At this time, all you’ll be notified of is the presence of properties that are not recommended by the specification (e.g., using
It may be possible to validate the code used in individual content files by opening them in a browser, but if your code depends on any EPUB extensions (such as the
epubReadingSystem object), you won’t get a useful result. Restrictions on access to the DOM, to the Internet, etc., are only going to be verifiable within reading systems, but at least with these you can sprinkle your code with debugging alerts to verify what is happening under the hood.
Although verifying the structural integrity of your publication is a great first step in ensuring its ability to be read, it is only a small part of what is required to verify that a publication is accessible. A validator can typically only tell you if the markup you’ve used is valid. What it can’t do is tell you if you could have done things better.
epubcheck is no different. It cannot determine for you if content should be part of the logical reading order or not, for example, because that distinction can only be verified by humans. Whether or not elements have been appropriately used is likewise not something epubcheck is good at determining. The list goes on. EPUB 3 and Accessibility covers a number of issues, but there is more to making accessible content than can be covered in one article.
To help you understand the issues and verify that you have met key accessibility criteria, the IDPF is developing reference accessibility guidelines that can be used in conjunction with this article, with the goal of providing broader coverage of the many issues involved. The site also provides a quality assurance checklist, including the ability to dynamically generate a checklist depending on the features your EPUB contains.
The idea of creating an interactive validator for EPUB 3 content has been bandied about since the EPUB 3 specification was finalized, but as of this writing, work on one has not begun. The DAISY Consortium is currently investigating how to improve epubcheck to report accessibility issues and/or create this new tool, and seeking funding for the work, so there are positive signs that accessibility checking will improve in the near future.