Compressed HTTP Requests

Optimizing traffic size is important for two main reasons, speed and traffic volume: it saves time and bandwidth. Think of mobile applications, for example.

The common use case is to compress data leaving the web server (the response). But I’m using a web service which accepts an XML payload that in one case is around 4 MB, with a compressed size of around 270 kB.

So I tried to accept gzip’ed requests containing an XML document. It’s a Grails application, and parsing an incoming XML request body (by accessing it in a controller through request.XML) relies on the correct header.

For tests I used a shell script and curl:

curl -v -s \
    --trace-ascii http_post_trace.log \
    --data-binary @compressed.xml.gz \
    --basic -u "user:pwd" \
    -H "Content-Type: text/xml" \
    -H "Content-Encoding: gzip" \
    -o bla.pdf \
    -X POST \
    http://service.example.com/app
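
For clients that are not shell scripts, the same request can be sketched in Java. This is only an illustration: the class GzipPostClient and its methods are made up for this example, the URL and credentials in the usage line are the placeholders from the curl call above, and error handling is left out.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URI;
import java.nio.charset.StandardCharsets;
import java.util.Base64;
import java.util.zip.GZIPOutputStream;

// Hypothetical helper, not part of the original setup.
public class GzipPostClient {

    /** Compress a byte array into gzip format. */
    public static byte[] gzip(byte[] data) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (GZIPOutputStream gzipStream = new GZIPOutputStream(buffer)) {
            gzipStream.write(data);
        }
        return buffer.toByteArray();
    }

    /** POST gzip'ed XML with the same headers as the curl call; returns the HTTP status code. */
    public static int post(String url, String user, String password, byte[] xml) throws IOException {
        byte[] body = gzip(xml);
        HttpURLConnection connection = (HttpURLConnection) URI.create(url).toURL().openConnection();
        connection.setRequestMethod("POST");
        connection.setDoOutput(true);
        String credentials = Base64.getEncoder()
                .encodeToString((user + ":" + password).getBytes(StandardCharsets.UTF_8));
        connection.setRequestProperty("Authorization", "Basic " + credentials);
        connection.setRequestProperty("Content-Type", "text/xml");
        connection.setRequestProperty("Content-Encoding", "gzip");
        // Content-Length is the size of the compressed body
        connection.setFixedLengthStreamingMode(body.length);
        try (OutputStream out = connection.getOutputStream()) {
            out.write(body);
        }
        return connection.getResponseCode();
    }
}
```

A call would then look like GzipPostClient.post("http://service.example.com/app", "user", "pwd", xmlBytes), where xmlBytes holds the uncompressed document.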

Apache 2.2 and mod_deflate

Apache has a module, mod_deflate, which can also decompress incoming data via the SetInputFilter directive:

<IfModule mod_deflate.c>
    SetInputFilter DEFLATE
</IfModule>

Just combine this with mod_proxy and mod_proxy_ajp:

<VirtualHost *:80>
    <IfModule mod_deflate.c>
        SetInputFilter DEFLATE
    </IfModule>
    <IfModule mod_proxy.c>
        ProxyRequests Off
        ProxyPreserveHost On
        ProxyVia On
        <IfModule mod_proxy_ajp.c>
            ProxyPass /app ajp://localhost:8009/app
        </IfModule>
    </IfModule>
</VirtualHost>

So I gave it a try and expected just decompressed data arriving at the controller… and yes, data got decompressed, but headers are incorrect.

The server made a boo boo

Unfortunately there are two unresolved bugs, so I had to use another solution:

#54255: mod_deflate adjusts the headers “too late”, should have a fixups hook
#52595: requests with gzip+chunked encoded body don’t proxy reliably

Here’s the log:

$ cat http_post_trace.log
== Info: About to connect() to service.example.com port 80 (#0)
== Info:   Trying 127.0.0.1...
== Info: Adding handle: conn: 0x7fb1dc009000
== Info: Adding handle: send: 0
== Info: Adding handle: recv: 0
== Info: Curl_addHandleToPipeline: length: 1
== Info: - Conn 0 (0x7fb1dc009000) send_pipe: 1, recv_pipe: 0
== Info: Connected to service.example.com (127.0.0.1) port 80 (#0)
== Info: Server auth using Basic with user 'user'
=> Send header, 232 bytes (0xe8)
0000: POST /odisee/document/generate HTTP/1.1
0029: Authorization: Basic bXXXnnnMMM12345678==
0054: User-Agent: curl/7.30.0
006d: Host: service.example.com:80
008a: Accept: */*
0097: Content-Type: text/xml
00b9: Content-Encoding: gzip
00d1: Content-Length: 246
00e6: 
=> Send data, 246 bytes (0xf6)
0000: ..........e.AN.0.E.=E.=..BU..,..@bA9@h&R......X-.).;..?......,
0040: .r.pW....g......t....%.bNh....me..7......5"X83O.....Y{..E....3.C
0080: [))..n..0.St.*.Q6=....J..g.f>.2:.0..T.%l.....k.z.r.?...w....np..
00c0: 4..d....B".z. ].....!`..`.n....;...3.r....k..3.......
== Info: upload completely sent off: 246 out of 246 bytes
<= Recv header, 17 bytes (0x11)
0000: HTTP/1.1 200 OK
<= Recv header, 37 bytes (0x25)
0000: Date: Thu, 16 May 2013 10:26:09 GMT
<= Recv header, 28 bytes (0x1c)
0000: Transfer-Encoding: chunked
<= Recv header, 24 bytes (0x18)
0000: Content-Type: text/xml
[...]
== Info: Connection #0 to host service.example.com left intact
== Info: Closing connection #0

Setting LogLevel debug in httpd’s configuration shows this:

[Thu May 16 12:22:28 2013] [debug] ajp_header.c(224): Into ajp_marshal_into_msgb
[Thu May 16 12:22:28 2013] [debug] ajp_header.c(290): ajp_marshal_into_msgb: Header[0] [Authorization] = [Basic bXXXnnnMMM12345678==]
[Thu May 16 12:22:28 2013] [debug] ajp_header.c(290): ajp_marshal_into_msgb: Header[1] [User-Agent] = [curl/7.30.0]
[Thu May 16 12:22:28 2013] [debug] ajp_header.c(290): ajp_marshal_into_msgb: Header[2] [Host] = [service.example.com:80]
[Thu May 16 12:22:28 2013] [debug] ajp_header.c(290): ajp_marshal_into_msgb: Header[3] [Accept] = [*/*]
[Thu May 16 12:22:28 2013] [debug] ajp_header.c(290): ajp_marshal_into_msgb: Header[4] [Content-Type] = [application/x-gzip]
[Thu May 16 12:22:28 2013] [debug] ajp_header.c(290): ajp_marshal_into_msgb: Header[5] [Content-Encoding] = [gzip]
[Thu May 16 12:22:28 2013] [debug] ajp_header.c(290): ajp_marshal_into_msgb: Header[6] [Content-Length] = [246]
[Thu May 16 12:22:28 2013] [debug] ajp_header.c(450): ajp_marshal_into_msgb: Done
[Thu May 16 12:22:28 2013] [debug] mod_deflate.c(915): [client 127.0.0.1] Zlib: Inflated 228 to 387 : URL /app
[Thu May 16 12:22:28 2013] [debug] mod_proxy_ajp.c(268): proxy: APR_BUCKET_IS_EOS
[Thu May 16 12:22:28 2013] [debug] mod_proxy_ajp.c(273): proxy: data to read (max 8186 at 4)
[Thu May 16 12:22:28 2013] [debug] mod_proxy_ajp.c(288): proxy: got 387 bytes of data
[Thu May 16 12:22:28 2013] [debug] ajp_header.c(687): ajp_read_header: ajp_ilink_received 06
[Thu May 16 12:22:28 2013] [debug] ajp_header.c(697): ajp_parse_type: got 06
[Thu May 16 12:22:28 2013] [debug] mod_proxy_ajp.c(391): (20014)Internal error: ap_get_brigade failed
[Thu May 16 12:22:28 2013] [debug] mod_proxy_ajp.c(564): proxy: Processing of request failed backend: 0, output: 1
[Thu May 16 12:22:28 2013] [debug] proxy_util.c(2029): proxy: AJP: has released connection for (localhost)

These lines are interesting:

[Thu May 16 12:22:28 2013] [debug] mod_deflate.c(915): [client 127.0.0.1] Zlib: Inflated 228 to 387 : URL /app
[Thu May 16 12:22:28 2013] [debug] mod_proxy_ajp.c(391): (20014)Internal error: ap_get_brigade failed

Our request got decompressed but wasn’t forwarded correctly. After decompression, the request headers should look like this:

Content-Length: <uncompressed size of request body>
Content-Encoding: 
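
To make the expected result concrete, here is a small Java sketch of the header rewrite that has to accompany decompression. It is not how mod_deflate is implemented; the class DecompressedRequest is hypothetical, it assumes Java 9 or later for readAllBytes, and it removes Content-Encoding entirely where the listing above shows it as empty.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.util.Map;
import java.util.TreeMap;
import java.util.zip.GZIPInputStream;

// Hypothetical illustration; not taken from Apache or Grails.
public class DecompressedRequest {

    public final Map<String, String> headers;
    public final byte[] body;

    private DecompressedRequest(Map<String, String> headers, byte[] body) {
        this.headers = headers;
        this.body = body;
    }

    /** Inflate a gzip'ed body and adjust the headers so they describe the inflated body. */
    public static DecompressedRequest from(Map<String, String> originalHeaders, byte[] gzipBody)
            throws IOException {
        byte[] inflated;
        try (GZIPInputStream in = new GZIPInputStream(new ByteArrayInputStream(gzipBody))) {
            inflated = in.readAllBytes();
        }
        // HTTP header names are case-insensitive
        Map<String, String> headers = new TreeMap<>(String.CASE_INSENSITIVE_ORDER);
        headers.putAll(originalHeaders);
        // The body is no longer encoded, so the header must go
        headers.remove("Content-Encoding");
        // Content-Length must match the uncompressed size
        headers.put("Content-Length", String.valueOf(inflated.length));
        return new DecompressedRequest(headers, inflated);
    }
}
```

In the failing requests logged above, the body handed to the backend was 387 bytes long while the forwarded headers still said Content-Length: 246 and Content-Encoding: gzip.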

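Even if I add AddInputFilter in an attempt to decompress requests with Content-Type: text/xml (note that AddInputFilter is documented as mapping filename extensions, not MIME types, so it may not apply here at all):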

<IfModule mod_deflate.c>
    SetInputFilter DEFLATE
    AddInputFilter DEFLATE text/xml
</IfModule>

it does not work:

[Thu May 16 13:19:39 2013] [debug] ajp_header.c(224): Into ajp_marshal_into_msgb
[Thu May 16 12:22:28 2013] [debug] ajp_header.c(290): ajp_marshal_into_msgb: Header[0] [Authorization] = [Basic bXXXnnnMMM12345678==]
[Thu May 16 12:22:28 2013] [debug] ajp_header.c(290): ajp_marshal_into_msgb: Header[1] [User-Agent] = [curl/7.30.0]
[Thu May 16 12:22:28 2013] [debug] ajp_header.c(290): ajp_marshal_into_msgb: Header[2] [Host] = [service.example.com:80]
[Thu May 16 12:22:28 2013] [debug] ajp_header.c(290): ajp_marshal_into_msgb: Header[3] [Accept] = [*/*]
[Thu May 16 13:19:39 2013] [debug] ajp_header.c(290): ajp_marshal_into_msgb: Header[4] [Content-Type] = [text/xml]
[Thu May 16 13:19:39 2013] [debug] ajp_header.c(290): ajp_marshal_into_msgb: Header[5] [Content-Encoding] = [gzip]
[Thu May 16 13:19:39 2013] [debug] ajp_header.c(290): ajp_marshal_into_msgb: Header[6] [Content-Length] = [246]
[Thu May 16 13:19:39 2013] [debug] ajp_header.c(450): ajp_marshal_into_msgb: Done
[Thu May 16 13:19:39 2013] [debug] mod_deflate.c(915): [client 84.187.156.165] Zlib: Inflated 228 to 387 : URL /app
[Thu May 16 13:19:39 2013] [debug] mod_proxy_ajp.c(268): proxy: APR_BUCKET_IS_EOS
[Thu May 16 13:19:39 2013] [debug] mod_proxy_ajp.c(273): proxy: data to read (max 8186 at 4)
[Thu May 16 13:19:39 2013] [debug] mod_proxy_ajp.c(288): proxy: got 387 bytes of data
[Thu May 16 13:19:39 2013] [debug] ajp_header.c(687): ajp_read_header: ajp_ilink_received 06
[Thu May 16 13:19:39 2013] [debug] ajp_header.c(697): ajp_parse_type: got 06
[Thu May 16 13:19:39 2013] [debug] mod_proxy_ajp.c(391): (20014)Internal error: ap_get_brigade failed
[Thu May 16 13:19:39 2013] [debug] mod_proxy_ajp.c(564): proxy: Processing of request failed backend: 0, output: 1
[Thu May 16 13:19:39 2013] [debug] proxy_util.c(2029): proxy: AJP: has released connection for (localhost)

Solution

Java, Groovy, Grails

As Grails relies on the HTTP header when parsing XML (or JSON) through request.XML, and my solution with mod_deflate did not work as expected, I wrote a wrapper that decompresses a stream which may or may not be gzip’ed. Additionally, I can check the HTTP header Content-Encoding against the values gzip and/or deflate.

// Parse POST body: can be just text or gzip'ed stream
// Do not use request.XML as it relies on HTTP request headers
InputStream postBody = StreamHelper.decompressStream(request.inputStream)
List<String> lines = postBody.readLines('UTF-8')
Element xml = XmlHelper.asElement(lines)

The wrapper returns either the stream itself (wrapped in a PushbackInputStream) or a new GZIPInputStream(InputStream). This works because Java streams follow the Decorator pattern: each wrapper is itself an InputStream.

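import java.io.IOException;
import java.io.InputStream;
import java.io.PushbackInputStream;
import java.util.zip.GZIPInputStream;

public class StreamHelper {

    /**
     * Check if InputStream is gzip'ed by looking at the first two bytes (magic number)
     * and if it is, return a GZIPInputStream wrapped stream.
     * @param input An input stream.
     * @return The input (wrapped in a PushbackInputStream) or GZIPInputStream(input).
     */
    public static InputStream decompressStream(InputStream input) throws IOException {
        PushbackInputStream pushbackInputStream = new PushbackInputStream(input, 2);
        byte[] signature = new byte[2];
        // read() may return fewer than two bytes, so loop until both are read or the stream ends
        int length = 0;
        while (length < 2) {
            int n = pushbackInputStream.read(signature, length, 2 - length);
            if (n < 0) {
                break;
            }
            length += n;
        }
        // Push back only the bytes that were actually read
        if (length > 0) {
            pushbackInputStream.unread(signature, 0, length);
        }
        if (length == 2 && signature[0] == (byte) 0x1f && signature[1] == (byte) 0x8b) {
            return new GZIPInputStream(pushbackInputStream);
        } else {
            return pushbackInputStream;
        }
    }

}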
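
The magic-number check only recognizes gzip. As mentioned above, the Content-Encoding header could be consulted as well. The following is a minimal sketch of that idea, not the code used in the application: the class ContentEncodingHelper is hypothetical, and it assumes that deflate means zlib-wrapped data, which is what InflaterInputStream expects by default.

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.Locale;
import java.util.zip.GZIPInputStream;
import java.util.zip.InflaterInputStream;

// Hypothetical helper; the name is made up for this example.
public class ContentEncodingHelper {

    /**
     * Wrap the stream according to the Content-Encoding header value.
     * Unknown or missing values return the stream unchanged.
     */
    public static InputStream decode(InputStream input, String contentEncoding) throws IOException {
        String encoding = contentEncoding == null
                ? ""
                : contentEncoding.trim().toLowerCase(Locale.ROOT);
        switch (encoding) {
            case "gzip":
            case "x-gzip":
                return new GZIPInputStream(input);
            case "deflate":
                return new InflaterInputStream(input);
            default:
                return input;
        }
    }
}
```

In a controller this could be called as ContentEncodingHelper.decode(request.inputStream, request.getHeader('Content-Encoding')). If a proxy in front has already inflated the body but left the header in place, as in the logs above, GZIPInputStream will throw, so the magic-number check remains the more robust choice in that setup.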

Another trace:

== Info: About to connect() to localhost port 80 (#0)
== Info:   Trying 127.0.0.1... == Info: connected
== Info: Server auth using Basic with user 'user'
=> Send header, 298 bytes (0x12a)
0000: POST /odisee/document/generate HTTP/1.1
0029: Authorization: Basic bXXXnnnMMM12345678==
0054: User-Agent: curl/7.22.0 (x86_64-pc-linux-gnu) libcurl/7.22.0 Ope
0094: nSSL/1.0.1 zlib/1.2.3.4 libidn/1.23 librtmp/2.3
00c5: Host: localhost
00d6: Accept: */*
00e3: Content-Type: text/xml
00fb: Content-Encoding: gzip
0113: Content-Length: 277
0128: 
=> Send data, 277 bytes (0x115)
0000: .......Q..odisee_6201067693417485029.xml.e.AN.0.E.=E.=..BU..,..@
0040: bA9@h&R......X-.).;..?^?......,.r.pW....g......t....%.bNh....me.
0080: .7......5"X83O.....Y{..E....3.C[))..n..0.St.*.Q6=....J..g.f>.2:.
00c0: 0..T.%l.....k.z.r.?...w....np..4..d...^?.B".z. ].....!`..`.n....;
0100: ...3.r....k..3.......
== Info: upload completely sent off: 277 out of 277 bytes
<= Recv header, 17 bytes (0x11)
0000: HTTP/1.1 200 OK
<= Recv header, 37 bytes (0x25)
0000: Date: Mon, 20 May 2013 14:34:03 GMT
<= Recv header, 27 bytes (0x1b)
0000: Server: Apache-Coyote/1.1
<= Recv header, 60 bytes (0x3c)
0000: Cache-Control: no-cache,no-store,must-revalidate,max-age=0
<= Recv header, 44 bytes (0x2c)
0000: Content-Disposition: inline; filename=document.pdf
<= Recv header, 31 bytes (0x1f)
0000: Content-Type: application/pdf
<= Recv header, 23 bytes (0x17)
0000: Content-Length: 82289
<= Recv header, 20 bytes (0x14)
0000: Via: 1.1 127.0.1.1
<= Recv header, 2 bytes (0x2)
0000: 
<= Recv data, 8759 bytes (0x2237)
0000: %PDF-1.4.%.........2 0 obj.<</Length 3 0 R/Filter/FlateDecode>>.
[...]
1f36: EOF.
== Info: Connection #0 to host localhost left intact
== Info: Closing connection #0

HTH.
