A graph generated by PRTG showing the first 21 days of TimeOutTherapy response-time monitoring

TimeOutTherapy network issues

The site I made, self-host, and maintain for my friend … https://timeouttherapy.co.uk.

PRTG, Pingdom et al. are great for monitoring, and give useful feedback for cache tweaking etc. … but their results have confused me a little.

Over the last two months or so I’ve been updating timeouttherapy, alongside the physical and configuration set-up of my (new) network (mail & web servers, reverse proxy, real-time monitoring & alerts, and a fairly advanced pfSense deployment etc.), hMailServer backup scripts, updates to this website and, in particular, the plug-in I’m developing.

https://timeouttherapy.co.uk is served from my IIS webserver. It runs on Windows 10, so request queuing is limited (it being a desktop rather than a server OS), but that’s good enough for small sites, especially when also using image CDNs etc. and server-farm ARR caching in my DMZ-placed IIS reverse proxy. Eventually I’ll add another machine to make it a proper farm, put in another IIS reverse proxy, and leverage NLB on the pfSense box, at least for the purposes of a fail-over pool. NB: I use SSL termination (and effectively management of Let’s Encrypt certificates) on the IIS reverse proxy/to-be-proxies … but then also use my own PKI internally. My vlans only permit SSL to be passed. (The reverse proxies also redirect non-HTTPS to HTTPS etc.)

Overall I’m pleased with my progress and with network response & stability. However, I’m still seeing occasional unexplained outages that I need to investigate when I have time.

It’s most perplexing. Pingdom records 15-minute outages, but according to PRTG, which I host internally to monitor my servers and infrastructure, the outages last just a couple of minutes:

Screenshot taken 31st May @ 15:07 hours

The little down-times are mostly due to my deleting the caches, as I don’t have a staging server set up yet (pending moving some other sites, after which I’ll simply “clone” and adjust my “production” [VM] server). The PRTG monitoring is set up to test the HTTP/S response via my IIS reverse proxy (internally I use split-DNS but monitor the websites via the reverse proxy). Note … no downtime recorded at all since 27th May. No sign of a 15-minute outage on the 29th May.
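As a rough illustration of the kind of HTTP/S check described above, here is a minimal sketch in Python. It is not how PRTG itself works; the function names `fetch_status` and `probe` are made up for this example, and the "anything from 200 to 399 counts as up" rule is an assumption.

```python
import time
import urllib.request
from typing import Callable, Tuple


def fetch_status(url: str, timeout: float = 10.0) -> int:
    """Issue a GET, read the body, and return the HTTP status code.

    urllib raises an exception for 4xx/5xx responses, timeouts and
    connection failures; the caller is expected to handle that.
    """
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        resp.read()
        return resp.status


def probe(url: str,
          fetch: Callable[[str], int] = fetch_status) -> Tuple[bool, float, str]:
    """Time one request and return (is_up, elapsed_ms, detail)."""
    start = time.perf_counter()
    try:
        status = fetch(url)
        # Assumption for this sketch: 2xx and 3xx mean the site is up.
        ok, detail = 200 <= status < 400, f"HTTP {status}"
    except Exception as exc:
        ok, detail = False, f"{type(exc).__name__}: {exc}"
    elapsed_ms = (time.perf_counter() - start) * 1000
    return ok, elapsed_ms, detail
```

Calling `probe("https://timeouttherapy.co.uk")` on a schedule and logging the timestamped results would give a third, independent record to set beside the Pingdom and PRTG graphs. The `fetch` parameter is only there so the timing logic can be exercised without touching the network.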

Can’t see these 15-min outages on PRTG

The above image shows when I started monitoring with PRTG. There is still much to be done … the most sophisticated sensor configuration / set-up I have is the IMAP round-trip for my email server. I’ve yet to set up SNMP, but meanwhile use HTTP/S monitoring and ICMP pings that I carefully permit across my outbound-blocked vlans based on source/destination, and that are monitored with Snort (IPS/IDS). I also want to make my own sensors (just looking for a return value from a script) to check the results of some of my back-up scripts.
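A "return value from a script" sensor for a back-up could look something like the sketch below. The file path, the 26-hour freshness limit and the function names are all hypothetical. The `value:message` output line and the exit codes (0 for OK, 1 for warning, 2 for error) follow my understanding of PRTG's custom script sensor convention, which should be checked against the PRTG manual before relying on it.

```python
import os
import time
from typing import List, Optional, Tuple


def check_backup(path: str,
                 max_age_hours: float = 26.0,
                 now: Optional[float] = None) -> Tuple[int, str]:
    """Return (exit_code, "value:message") for one backup file.

    The value is the file's age in whole hours. Exit codes are an
    assumed convention: 0 = OK, 1 = warning, 2 = error.
    """
    now = time.time() if now is None else now
    if not os.path.isfile(path):
        return 2, "0:backup file missing"
    age_hours = (now - os.path.getmtime(path)) / 3600
    if os.path.getsize(path) == 0:
        return 2, f"{age_hours:.0f}:backup file is empty"
    if age_hours > max_age_hours:
        return 1, f"{age_hours:.0f}:backup is stale"
    return 0, f"{age_hours:.0f}:backup is fresh"


def main(argv: List[str]) -> int:
    """Print the sensor line for the file named in argv[1]; return the exit code."""
    # "backup.zip" is a placeholder default, not a real file name.
    path = argv[1] if len(argv) > 1 else "backup.zip"
    code, line = check_backup(path)
    print(line)
    return code
```

Run as a script, this would end with `sys.exit(main(sys.argv))` so that the monitoring tool sees both the printed line and the exit code. Depending on how the sensor is launched, a small batch or PowerShell wrapper that calls Python may be needed.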

Testing allowed me to work on getting my server/site response time down, and to focus on the biggest/easiest wins. There was an issue with sending vlans over my homeplug connection … that concentration of downtime was due to some problem with the homeplug, which requires further investigation.

I’ve checked Snort for anything that might explain it … “alerts” etc. (blocking is set to last longer than 15 minutes, so a block wouldn’t produce a 15-minute outage, and there’s nothing suspicious there either).

I’ve not noticed any corresponding loss of internet connectivity (using PRTG). For example, my phone connects to https://x.xarta.co.uk, where I expose PRTG monitor services via reverse proxy, plus PRTG probes continually check SaaS APIs (e.g. Google, Bing, YouTube etc.) and even notify me if their average response times change … no obvious answers.
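One way to cross-check the two monitors is to export each one's outage windows and list the external outages that have no internal counterpart. The sketch below assumes the windows have already been parsed into (start, end) pairs; the timestamps are invented for illustration and are not taken from real logs.

```python
from datetime import datetime, timedelta
from typing import List, Tuple

Window = Tuple[datetime, datetime]  # (start, end) of one recorded outage


def overlaps(a: Window, b: Window, slack: timedelta = timedelta(0)) -> bool:
    """True if the two windows intersect, allowing `slack` of clock skew."""
    return a[0] - slack <= b[1] and b[0] - slack <= a[1]


def unmatched(external: List[Window], internal: List[Window],
              slack: timedelta = timedelta(minutes=1)) -> List[Window]:
    """External outages that no internal outage overlaps."""
    return [e for e in external
            if not any(overlaps(e, i, slack) for i in internal)]


# Invented example data: one external 15-minute outage, one short internal one.
external_outages = [(datetime(2017, 5, 29, 10, 0), datetime(2017, 5, 29, 10, 15))]
internal_outages = [(datetime(2017, 5, 26, 14, 3), datetime(2017, 5, 26, 14, 5))]

for start, end in unmatched(external_outages, internal_outages):
    print(f"only seen externally: {start:%d %b %H:%M} to {end:%H:%M}")
```

Outages that show up only on the external side point towards something between the external probe and the reverse proxy (routing, DNS, the probe itself) rather than the web server.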

My watch would wake me up if there were any major alarms I needed to know about.

Anyway. Pingdom gives me a decent time, testing from Stockholm. This isn’t a cold-cache test. I serve the big images from Cloudinary, deferred: they load only after the page has loaded at lower res, and only if the DOM ready time was quick enough. Cloudinary itself caches on servers local to where the request comes in.

As an additional test I used Google PageSpeed Insights. I’m happy with the results. The deferred images I use score negatively. It’s difficult to combine the JavaScript scripts without breaking things, so the effort-versus-reward seems satisfactory as things are … same goes for above-the-fold render-blocking generally. This is a site I effectively maintain and host for free, for a friend, and putting in the effort to get better results in PageSpeed Insights wouldn’t be a productive use of my time. I think I do better than many commercial sites (bearing in mind that, despite being WordPress based, this is at the moment mostly a static brochure-style site). I’m planning on adding Twitter & LinkedIn widgets in the near future, and they might need some work / tweaking to avoid additional upfront page-load.

When developing the site, I use Chrome’s developer tools to emulate device screen sizes and to throttle the connection … which lets me check when the bigger images should not be downloaded.

e.g.

Chrome developer tools, throttled to an extremely slow “regular 3G” (less than 1/15th of the UK average), and at iPhone 6 Plus screen size. Responsively sized images are smaller, and the big, higher-quality deferred images won’t load because DOM ready took longer than the threshold I set. The time is slow, but not awful. The home page has the most image assets, which are then browser cached.

TODO (THIS-POST): alt-text / description / links etc. for images – and preload/defer etc. where possible. Add post with screenshots of timeouttherapy.co.uk pages at different screen sizes to demonstrate responsive work.