HTTP 503 While Parsing XML

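If parsing XML fails with an HTTP 503 error, the likely cause is that the W3C wants to reduce its traffic: the parser tries to download the DTD named in the document's DOCTYPE, and the W3C server refuses the request.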

java.io.IOException: Server returned HTTP response code: 503 for URL: http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd
    at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1313)

See the W3C Team Blog for details. The recommended workaround is to cache the DTDs locally and let the parser read them from there.

Create a directory named dtd inside the package of the resolver class and store the .dtd and .ent files you need there. They can be downloaded from the W3C, e.g.:

  • http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd
  • http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd
  • http://www.w3.org/TR/xhtml1/DTD/xhtml-lat1.ent
  • http://www.w3.org/TR/xhtml1/DTD/xhtml-symbol.ent
  • http://www.w3.org/TR/xhtml1/DTD/xhtml-special.ent

Here is how to do this in Java:

package com.bensmann.xml;

import java.io.IOException;
import java.io.InputStream;
import org.xml.sax.EntityResolver;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

public class CachedDTD implements EntityResolver {

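    /**
     * Resolve the DTD or entity file named by 'systemId' from the local dtd directory.
     * @param publicId public identifier of the entity (unused)
     * @param systemId system identifier (URL) of the entity
     * @return InputSource for the locally cached file, or null to let the parser use its default lookup.
     */
    public InputSource resolveEntity(String publicId, String systemId) throws SAXException, IOException {
        // Use the last path segment of the URL as the local file name
        String[] resource = systemId.split("/");
        InputStream stream = CachedDTD.class.getResourceAsStream("dtd/" + resource[resource.length - 1]);
        if (stream == null) {
            // getResourceAsStream returns null for a missing file instead of throwing;
            // returning null here makes the parser fall back to fetching systemId itself
            return null;
        }
        InputSource source = new InputSource(stream);
        // Keep the original system ID so relative references inside the DTD still resolve
        source.setSystemId(systemId);
        return source;
    }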

}
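
To see the mechanism without setting up the dtd directory first, here is a minimal, self-contained sketch. It parses a document whose DOCTYPE points at the W3C URL, but answers the lookup from an in-memory stub, so no request leaves the machine. The names LocalDtdDemo, rootText, lookups and the two-line STUB_DTD are made up for this illustration; a real setup would return the cached file the way CachedDTD does.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.EntityResolver;
import org.xml.sax.InputSource;

public class LocalDtdDemo {

    // Hypothetical stand-in for a cached DTD file (not the real XHTML DTD)
    static final String STUB_DTD =
            "<!ELEMENT html (#PCDATA)>\n<!ENTITY nbsp \"&#160;\">";

    // Counts how often the parser asked the resolver for an external entity
    static int lookups = 0;

    /** Parse 'xml' with a local resolver and return the root element's text. */
    static String rootText(String xml) throws Exception {
        DocumentBuilder builder =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();
        EntityResolver resolver = (publicId, systemId) -> {
            lookups++;
            // Serve the stub instead of letting the parser open systemId
            return new InputSource(new StringReader(STUB_DTD));
        };
        builder.setEntityResolver(resolver);
        Document doc = builder.parse(new InputSource(new StringReader(xml)));
        return doc.getDocumentElement().getTextContent();
    }

    public static void main(String[] args) throws Exception {
        String xml = "<!DOCTYPE html PUBLIC \"-//W3C//DTD XHTML 1.0 Strict//EN\" "
                + "\"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd\">"
                + "<html>a&nbsp;b</html>";
        System.out.println("text length: " + rootText(xml).length());
        System.out.println("resolver lookups: " + lookups);
    }
}
```

The &nbsp; reference can only be expanded if the parser has read the DTD, so a successful parse suggests the resolver was used rather than the network.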

Making it really Groovy:

package com.bensmann.xml
class CachedDTD {

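    /**
     * EntityResolver that serves the DTD or entity file named by 'systemId'
     * from the local dtd directory. The closure returns an InputSource for
     * the cached file, or null to let the parser use its default lookup.
     */
    def static entityResolver = [
        resolveEntity: { publicId, systemId ->
            // Use the last path segment of the URL as the local file name
            def stream = CachedDTD.class.getResourceAsStream("dtd/" + systemId.split("/").last())
            // getResourceAsStream returns null for a missing file instead of throwing
            stream != null ? new org.xml.sax.InputSource(stream) : null
        }
    ] as org.xml.sax.EntityResolver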

}

Give it a try:

// Create instance of XmlSlurper and set EntityResolver
def slurper = new XmlSlurper()
slurper.setEntityResolver(com.bensmann.xml.CachedDTD.entityResolver)
// Parse UTF-8 XML file
slurper.parseText(new File("/Users/rbe/example.xml").getText("UTF-8"))
