RESOLVED – GZIP compression after URL-ReWrite on IIS

DRAFT / UNDER-CONSTRUCTION

blog.xarta.co.uk

UPDATE – RESOLVED ISSUES – NEED TO COMPLETELY CHANGE THIS

I would really appreciate it if anyone can tell me what I’m doing wrong here.

Simplified flow diagram for reverse proxy and webserver indicating requests, responses, and how gzip fits in
I’m using IIS 8.5 with URL Rewrite as a reverse proxy, putting it in front of a simple LAMP VM (Ubuntu 16.04, Apache 2, MySQL, PHP) that I’m using to get WordPress working on Linux (I used to have it directly on a Windows / IIS set-up). SSL is currently StartSSL (StartCom) on the public side, although the Firefox and Chrome updates this month (January 2017) might distrust my issued certificate. I’m using a self-signed PKI (managed with XCA) for the private network side, so it’s SSL all the way.

 

This problem

  • My back-end fresh WordPress install will happily provide gzip if the accept-encoding says so in the Request Header. I don’t want to have to remember to change each back-end server to not compress!
  • My URL ReWrite module can’t read compressed responses to see what it needs to rewrite e.g. https://blog.xarta.co.uk from WordPress to https://blog.xarta.co.uk on the internet. So before any request is sent to the back-end, HTTP_ACCEPT_ENCODING in the request header is stored in another variable and set to “”.
  • After all the rewriting is done, HTTP_ACCEPT_ENCODING is restored from the temporary server variable.
  • Then, on IIS, I thought the dynamic compression module would kick in, see that HTTP_ACCEPT_ENCODING says “gzip”, and re-compress the rewritten responses on the fly, roughly halving the payload in KB terms and also avoiding a knock-on issue:
    • /wp-admin/load-styles.php is requested with GET and failed to download completely every single time after the rewrite process. Compressed, it’s about 79KB. Uncompressed, it should be about 319KB but cuts out at about 240KB (well within what the GET spec and common browser and server implementations allow). Eventually I discovered this was due to some problem with the ARR buffer: increasing the threshold avoided the issue, apparently by avoiding the buffer. I’m still investigating this. In the interim, I had added an inbound-rule condition so that requests matching (.*)/wp-admin/load-styles.php(.*) do not have HTTP_ACCEPT_ENCODING set to empty, and a pre-condition so that the outbound rules ignore gzip-encoded responses, as nothing in that file needs rewriting. It’s a hack I’m keeping for now while I can’t compress the response.

 
Highlighting secondary issue revealed by post-rewrite-gzip failure – the size of this document, “load-styles.php”, exceeds the ARR buffer threshold.
211KB transferred using internal site rather than IIS reverse-proxy for wp-admin (Dashboard).
464KB transferred using IIS reverse-proxy for wp-admin (Dashboard). load-styles.php ignored by rewrite – still gzip.

...

<?xml version="1.0" encoding="UTF-8"?>
<configuration>
    <system.webServer>
        <rewrite>
            <rules>
                <clear />
                <rule name="ReverseProxyInboundRule1" enabled="true" stopProcessing="true">
                    <match url="(.*)" />
                    <conditions logicalGrouping="MatchAll" trackAllCaptures="false">
                        <add input="{URL}" pattern="(.*)/wp-admin/load-styles.php(.*)" negate="true" />
                    </conditions>
                    <serverVariables>
                        <!-- Stash the client's Accept-Encoding, then blank it so the back-end replies uncompressed -->
                        <set name="HTTP_X_ORIGINAL_ACCEPT_ENCODING" value="{HTTP_ACCEPT_ENCODING}" />
                        <set name="HTTP_ACCEPT_ENCODING" value="" />
                    </serverVariables>
                    <action type="Rewrite" url="https://blog.xarta.co.uk/{R:1}" />
                </rule>
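                <!-- Catches whatever the first rule skipped (load-styles.php) and proxies it with Accept-Encoding untouched -->
                <rule name="load-style-issue" enabled="true" stopProcessing="true">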
                    <match url="(.*)" />
                    <conditions logicalGrouping="MatchAll" trackAllCaptures="false" />
                    <action type="Rewrite" url="https://blog.xarta.co.uk/{R:1}" logRewrittenUrl="false" />
                </rule>
            </rules>
            <outboundRules>
                <clear />
                <rule name="ReverseProxyOutboundRule1" preCondition="ResponseIsHtml1" enabled="true">
                    <match filterByTags="None" pattern="http(s)?://blog.xarta.co.uk/(.*)" />
                    <conditions logicalGrouping="MatchAll" trackAllCaptures="true" />
                    <action type="Rewrite" value="https://blog.xarta.co.uk/{R:2}" />
                </rule>
                <rule name="javascript-encoded-anchor-tag" preCondition="ResponseIsHtml1" enabled="true">
                    <match pattern="href=(.*?)https://blog.xarta.co.uk/(.*?)\s"/>
                    <conditions logicalGrouping="MatchAll" trackAllCaptures="true" />
                    <action type="Rewrite" value="href={R:1}https://blog.xarta.co.uk/{R:2}"/>
                </rule>
                <rule name="javascript-encoded-form-element" preCondition="ResponseIsHtml1" enabled="true">
                    <match pattern="action=(.*?)https://blog.xarta.co.uk/(.*?)\\" />
                    <conditions logicalGrouping="MatchAll" trackAllCaptures="true" />
                    <action type="Rewrite" value="action={R:1}https://blog.xarta.co.uk/{R:2}\" />
                </rule>
                <rule name="javascript-backslashe-url" preCondition="ResponseIsHtml1" enabled="true" patternSyntax="ExactMatch">
                    <match pattern="https:\/\/blog.xarta.co.uk\/" />
                    <conditions logicalGrouping="MatchAll" trackAllCaptures="true" />
                    <action type="Rewrite" value="https:\/\/blog.xarta.co.uk\/" />
                </rule>
                <rule name="RestoreAcceptEncoding" preCondition="NeedsRestoringAcceptEncoding" enabled="true">
                    <match serverVariable="HTTP_ACCEPT_ENCODING" pattern="^(.*)" />
                    <conditions logicalGrouping="MatchAll" trackAllCaptures="true" />
                    <action type="Rewrite" value="{HTTP_X_ORIGINAL_ACCEPT_ENCODING}" />
                </rule>
                <preConditions>
                    <preCondition name="ResponseIsHtml1">
                        <add input="{RESPONSE_CONTENT_TYPE}" pattern="^text/(.+)" />
                        <add input="{RESPONSE_CONTENT_ENCODING}" pattern="gzip" negate="true" />
                    </preCondition>
                    <preCondition name="NeedsRestoringAcceptEncoding">
                        <add input="{HTTP_X_ORIGINAL_ACCEPT_ENCODING}" pattern=".+" />
                    </preCondition>
                    <preCondition name="gzip-only">
                        <add input="{RESPONSE_CONTENT_ENCODING}" pattern="gzip" />
                    </preCondition>
                </preConditions>
            </outboundRules>
        </rewrite>
        <urlCompression doStaticCompression="false" dynamicCompressionBeforeCache="false" doDynamicCompression="true" />
        <tracing>
            <traceFailedRequests>
                <add path="*">
                    <traceAreas>
                        <add provider="WWW Server" areas="Authentication,Security,Filter,StaticFile,CGI,Compression,Cache,RequestNotifications,Module,FastCGI,WebSocket,Rewrite,RequestRouting" verbosity="Verbose" />
                    </traceAreas>
                    <failureDefinitions timeTaken="00:00:00" statusCodes="502" verbosity="Ignore" />
                </add>
            </traceFailedRequests>
        </tracing>
    </system.webServer>
</configuration>
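
As far as I understand it, URL Rewrite will only let a rule set a server variable that has been registered as allowed, so the two variables used in the inbound rule above need an entry like the one below. This is a minimal sketch, not copied from my server. I am assuming it sits at server level (applicationHost.config), which is where IIS Manager's "View Server Variables" action normally writes it.

```xml
<!-- Sketch only: assumed to live in applicationHost.config, not in the site's web.config -->
<system.webServer>
    <rewrite>
        <allowedServerVariables>
            <!-- The header that the inbound rule blanks before proxying -->
            <add name="HTTP_ACCEPT_ENCODING" />
            <!-- The temporary variable holding the client's original value -->
            <add name="HTTP_X_ORIGINAL_ACCEPT_ENCODING" />
        </allowedServerVariables>
    </rewrite>
</system.webServer>
```

If a variable is missing from this list, the rule that sets it should fail with a server error rather than being silently ignored, which makes a missing entry fairly easy to spot.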

Primary source to help me

http://stackoverflow.com/questions/15926203/iis-as-a-reverse-proxy-compression-of-rewritten-response-from-backend-server

I followed suggestions including:


<urlCompression doStaticCompression="false" dynamicCompressionBeforeCache="false" doDynamicCompression="true" />


reg add HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\InetStp\Rewrite /v LogRewrittenUrlEnabled /t REG_DWORD /d 0

  • Checked the ordered listing of modules – check.

Other useful resources

https://blogs.msdn.microsoft.com/friis/2016/08/25/setup-iis-with-url-rewrite-as-a-reverse-proxy-for-real-world-apps/

https://www.iis.net/learn/extensions/url-rewrite-module/creating-outbound-rules-for-url-rewrite-module

https://www.iis.net/learn/extensions/url-rewrite-module/url-rewrite-module-configuration-reference#UsingServerVars

Other settings

I set-up an application pool for the "site" (on the reverse-proxy) (I have a few SSL sites on the IIS Server and use Server Name Indication).  And I set appropriate restrictive NTFS permissions for the physical path for the site.

During testing and experimentation, I experienced many "odd" intermittent errors, including the sudden cessation of https://blog.xarta.co.uk while pages from https://blog.xarta.co.uk/wp-admin/* still worked.  At one point I had specific issues when IFrames were used for some pages.  I received various error status codes, and "cache-private" notifications in response headers.  I investigated ARR caching (trying to disable as much as possible), buffer sizes, buffer thresholds etc., but many of the issues seem related to NTFS permissions on the physical path for the "site" (which just contains web.config).  It's difficult to pin down as there is definitely some "state" involved. So I temporarily set read access on that path for "Everyone" (if it definitely helps I'll try progressively tighter restrictions, e.g. IIS_IUSRS) and it seems to have cleared all issues except, for now, the gzip (dynamic compression) problem.

[Update-next-day] Nope. I had a bad-gateway error again, after updating a post. I noticed in the failed-request trace that I didn't have custom error codes in En-GB, so I fixed that, along with a permissions issue with En-US ... and then the problem went away again. However, I don't think it was because of the details of what I did so much as maybe a background time-out, or my somehow triggering a change in state somewhere that an iisreset can't equally accomplish.
Two warnings indicating a URL was unresolved leading to a 502 error
Maybe I'll try using the IP address rather than the internal DNS hostname for the back-end server, using a server variable to assign one on the proxy, just in case of intermittent DNS caching or availability issues? Feeling forced to examine the new failed request trace more carefully, in this particular instance I could see the ApplicationRequestRouting module / MAP_REQUEST_HANDLER making a mapping: getting rid of the "https://blog.xarta.co.uk" that had already been rewritten into the incoming request, and replacing it with "/". Obviously, just "/" could not be found. I'm not sure if this is to do with the reverse-rewrite-host-in-response-headers setting. But this only happens occasionally, so I'm not yet sure why it does it. Maybe I need to turn on caching again (which I now know to be separate from the "buffer")? The initial inbound rule was generated by a wizard, which might see an empty {URL}, expand that to "/" and rewrite to https://blog.xarta.co.uk. That seems to be confirmed by tutorials.
I also tried temporarily setting the application pool to DefaultAppPool (still pass-through authentication), and even set authentication to an admin account - just to rule out the direct-possibility.

Internal vs external wp-admin access

On my main xarta.co.uk site, I use a client certificate and basic authentication (forced over https) and Google Authenticator ... maybe overkill considering I haven't yet remodelled my home network in a secured way. (I'm intending to use pfSense on another compute stick, and VLAN tags on my cheap smart switches, as well as isolation using VMware VMs and Docker.) For a blog, I would like to be able to log in more simply, but also from the internet using blog.xarta.co.uk, and internally using blog.xarta.co.uk. I wanted to be able to access some resources internally without relying on the working presence of the "internet" proper (and without relying so much on fickle caches, more proxying, or lmhosts edits etc.). It seemed to me to be a simpler proposition to have any WordPress install set up as a complete "entity" only aware of its internal naming and PKI, in keeping with the principle of loose coupling. Alas, not every pattern that includes https://blog.xarta.co.uk in responses is being caught for rewrite yet. Nearly all are dealt with, but there are some annoying exceptions, and I don't fancy spending much more time on it right now. So, for now, I'm making the temporary concession of changing the site URL fields within WordPress.

Failed Request Tracing

I tried stopping Apache and making a new request via the proxy to force a 502 bad gateway error … just so I could look at a failed request tracing log.

I’m still not very good at reading them – i.e. when some modules are repeatedly referred to, but as far as I can tell, the pre-conditions module is mentioned somewhat after rewrite rules evaluation and dynamic compression … so I am somewhat confused! (The ACCEPT_ENCODING header isn't restored until really late on in the chain and after any mention of dynamic compression).

https://blog.xarta.co.uk/wp-content/uploads/2017/01/fr000005.xml

... These Failed Request Tracings work best in Internet Explorer, I think.
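
To catch the intermittent errors as well as the forced 502, the failure definition could be widened to a range of status codes. This is a rough sketch rather than what I currently have deployed: the 400-599 range and the trimmed list of trace areas are assumptions chosen for illustration, and everything else mirrors the tracing block in my web.config above.

```xml
<tracing>
    <traceFailedRequests>
        <add path="*">
            <traceAreas>
                <!-- Assumption: a trimmed set of areas, focused on rewrite, proxy and compression -->
                <add provider="WWW Server" areas="Compression,Cache,RequestNotifications,Module,Rewrite,RequestRouting" verbosity="Verbose" />
            </traceAreas>
            <!-- Assumption: 400-599 is an illustrative range; the deployed config traces only 502 -->
            <failureDefinitions timeTaken="00:00:00" statusCodes="400-599" verbosity="Ignore" />
        </add>
    </traceFailedRequests>
</tracing>
```

A wider range will produce many more trace files, so it is probably only sensible while actively hunting a problem.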

Conclusion

I'm obviously flapping about not knowing what I'm doing. I don't know why the restoration of the ACCEPT-ENCODING header happens so late, and seemingly after the dynamic compression module stops doing anything. I'm not sure why I still get occasional errors ... every time they happen I'll try to refer to failed request traces - I'll expand the capture criteria, to see if I can pin-point a common issue. I don't want to expend lots of time and energy researching and experimenting with this - it's just a tiny albeit essential part of my plans. For now, I will continue adding more rewrite rules for WordPress etc. ... but for the future, I want to turn to nginx and specifically running it either in Docker, or as a reverse proxy on a Raspberry Pi. nb the Pi2 and Pi3 both use hardware random number generators - but not the Pi zero unfortunately. I'm trying to get away from heavier tightly coupled infrastructure to more low-energy, modular approaches.

Background

My goals, for my IoT Anxious Annie project, include using my PCG01 Bay Trail (Atom) Compute Sticks for Docker containers (on one) and pfSense (on the other).  Each Compute Stick would require much less than 5W (5V, 1A) on average, making them a compelling choice for an always-on home infrastructure working alongside Raspberry Pis and Arduino devices, using (suitable) USB power banks as UPSs.  At the moment I have web services running on an IIS Windows VM, so I’m using it for the reverse proxy as I move toward the model of discrete back-end services behind a proxy on the network edge. I was thinking of maybe a Windows IIS container in the future (requiring more “reliable/low-power” hardware perhaps) … but it would probably be easier to use Nginx on a Raspberry Pi 3 or later.  I’ve discovered that a Raspberry Pi 2 is quite capable of heavy (streaming) use as an OpenVPN server, despite the Pi’s poor reputation for networking.

Eventually, my WordPress will sit in a container beside a Mosquitto MQTT hub container and .NET Core containers, all working within a 2GB RAM constraint.  I hope.  I want to be able to make my own images rather than use pre-built ones from the hub, and will move to using Alpine Linux, Nginx and PHP-FPM etc. in the WordPress Docker image.  The Ubuntu LAMP stack is an interim arrangement, and also gives me Docker Engine practice on Linux (Ubuntu), as using Docker on Windows became problematic when I first attempted Dockerising WordPress.
