Google Cloud Platform

I’ve done extensive work on AWS using CloudFormation and Auto Scaling groups, and have implemented pretty much every service that Amazon offers in some shape or form over the last 3 years for various projects, probably the most notable implementations being Remote Control Tourist and AusPost Video Stamp. So with all this AWS work and experience, why would I look at Google Cloud Platform? Platform as a Service (PaaS) is the short answer; let me explain…

Imagine you have a small development team (half a dozen people): some are amazing frontend programmers, others (like me) do a lot of the backend programming, another is a solution architect and another is a tester. You need DevOps skills in-house to be able to use AWS the way it is intended, which means you need:
a.) multiple people trained in the AWS suite and how to implement its services
b.) multiple people with sysadmin/DevOps skills to provision the instances, or to implement a system for automating software installation and configuration

In a small team you are unlikely to have this crossover of skills, which means you are going to want a Platform as a Service offering. Now I know you could go with Elastic Beanstalk, however in my opinion there is no comparison to Google AppEngine.

Where AppEngine shines

You have a small site that only gets a little bit of traffic, but when it does get traffic, the traffic is really spiky. Naturally high availability is mandatory, because customers love it when their sites go down under load. On AWS this would look something like:
1x Elastic Load Balancer
2x Small EC2 Instances (in separate Availability Zones)
1x Small Multi Availability Zone RDS Instance

On top of that, Elastic Load Balancers are not as “elastic” as people think. In order to prepare for a big spike you need to:
a.) have a support contract with Amazon
b.) notify Amazon via support ticket at least 24 hours in advance with an estimation of your traffic breakdown

So that’s quite a list of stuff. What other requirements are there? Oh, you need the hosting to always be on in case you get the odd bit of traffic from time to time outside of your peak windows. Even with reserved instances you are talking about a fair bit of cash each month for the setup I’ve outlined above, and on top of all of that you still need Chef/Puppet/Ansible/SaltStack to handle the autoscaling side, which means a developer coding up configurations for this infrastructure.

Or you could create an AppEngine project, press one button to deploy it, and then pretty much forget about it. As an added bonus AppEngine will scale up and down as it needs to, and load balancing is a non-issue because it is completely automatic. So I have a high availability, autoscaling, 1-click deployable application that is also free when the traffic is below the daily free limits. This last point is possibly the most significant: a recent project would have cost me ~$250 per month on AWS. The project ran for 3 months. The cost on Google AppEngine was a staggering $75 for the entire 3 months!
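
To give a sense of how little is involved, here is a minimal sketch of the kind of AppEngine app I mean, under the Python 2.7 runtime (webapp2 ships with the SDK; the handler and route here are purely illustrative):

import webapp2

class MainPage(webapp2.RequestHandler):
    def get(self):
        self.response.headers['Content-Type'] = 'text/plain'
        self.response.write('Hello from AppEngine')

# app.yaml points its handlers at this WSGI application object
app = webapp2.WSGIApplication([('/', MainPage)])

Pair that with a handful of lines of app.yaml, run appcfg.py update, and you are live; scaling and load balancing are Google’s problem from then on.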

What is particularly interesting about the Google Cloud Platform is that while they appear to have fewer tools available, the tools they do have are exactly what you need to build large scale applications. Additionally, doing due diligence on optimisation and aggressively caching wherever possible to reduce database reads pays big dividends in cost savings. The way AppEngine is priced encourages developers to code properly, because they are rewarded for it.
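
To illustrate the kind of caching I mean, AppEngine’s memcache API makes read-through caching almost free to bolt on. A minimal sketch, where get_article() is a hypothetical stand-in for whatever actually hits the datastore:

from google.appengine.api import memcache

def cached_article(article_id):
    article = memcache.get(article_id)
    if article is None:
        article = get_article(article_id)  # hypothetical datastore read
        memcache.set(article_id, article, time=3600)  # cache for an hour
    return article

Every cache hit is a datastore read you are not billed for, which is exactly the behaviour the pricing model rewards.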

The most recent thing I’ve noticed in the last few weeks is that the UIs of both the administration system and the documentation have undergone a huge restructuring and restyling. The Google Cloud Platform really is the most accessible it has ever been and is definitely worth consideration. Well done Google!

Geofencing with Nginx and MaxMind’s GeoIP Database

Today I ran into an interesting problem that required me to geofence traffic from certain parts of the world. I knew about MaxMind’s GeoIP database, and I also knew about the Nginx GeoIP module (ngx_http_geoip_module), so I had a starting point. What I wanted to do was drive traffic from selected countries to a static landing page to reduce the load on the dynamic infrastructure. After purchasing the GeoIP database I immediately uploaded it to my web server and started building out my Nginx config file.

The first thing was to include the GeoIP database in the HTTP section of the Nginx master config file:

http {
    ...
    geoip_country  /path/to/GeoIP.dat;
    ...
}

Next up was to create a mapping for the countries that I wanted to geofence (in this example I’m geofencing India and China):

http {
    ...
    geoip_country  /path/to/GeoIP.dat;

    map $geoip_country_code $mapping {
        default     default;
        CN          fenced;
        IN          fenced;
    }
    ...
}

After creating the mappings we need to build a rudimentary config that proxies incoming requests to the appropriate backend. The important part here is that the mapping variable is used in the upstream names:

server {

    listen          80;
    server_name     example.com;

    location / {
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header Host $host;
        proxy_pass http://$mapping.server;
    }
}

upstream fenced.server {
    server  127.0.0.1:9002;
}

upstream default.server {
    server  127.0.0.1:9001;
}

Finally, all that is needed is to build out the configs for the ports we are listening on:


server {
    listen          9001;
    server_name     127.0.0.1;

    root    /path/to/webroot;
    index   index.html index.htm;

    location / {
        add_header X-Geo $geoip_country_code;
        add_header X-Geo3 $geoip_country_code3;
        add_header X-IP $remote_addr;

        try_files $uri $uri/ /index.html;
    }

}


server {
    listen          9002;
    server_name     127.0.0.1;

    root    /path/to/fenced;
    index   index.html index.htm;

    location / {
        add_header X-Geo $geoip_country_code;
        add_header X-Geo3 $geoip_country_code3;
        add_header X-IP $remote_addr;

        try_files $uri $uri/ /index.html;
    }

}
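
Before trusting the mapping in production it is worth sanity-checking the database lookups themselves. Here is a minimal sketch using the pygeoip module (my choice for illustration; any reader of the legacy GeoIP.dat format will do):

import pygeoip

gi = pygeoip.GeoIP('/path/to/GeoIP.dat')

fenced = ('CN', 'IN')
for ip in ('1.2.3.4', '8.8.8.8'):
    code = gi.country_code_by_addr(ip)
    backend = 'fenced' if code in fenced else 'default'
    print('%s resolves to %s and will hit the %s upstream' % (ip, code, backend))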

And that’s it. Obviously the configs require a lot of work to get them production-ready, but that is how I went about building a basic geofencing solution. I’d love to see other people’s solutions, as I had trouble finding other examples online.

Planning and Preparation – WTF am I going to build?

I’m going to build a single page web application to replace my WordPress site, so the first thing I need to do is work out what I want to build. I find this the hardest part of the process, as there are a million things I’d like to build, so narrowing it down to measurable goals can be quite tough. Let’s start off with the requirements: what must the site do?

Requirements:

  • Retain all WordPress data – Articles and Comments
  • Retain all permalinks to said articles (SEO matters)
  • Have live feeds from my social networking tools
  • Enable visitors to easily share the pages on my site
  • Single page web application
  • Parallax scrolling (flavour of the month, I’m a sucker for punishment)
  • Google SEO compliance for single page web apps
  • Scalable infrastructure with autoscaling

That’s quite a list, and some of those items are going to take a lot of work. First things first, we need to decide on the technologies we are going to use to build this monstrosity. I love Python, so we are definitely going to be using it in the backend and also for our deployment process. But there are so many web frameworks for Python; which one should we use? I’ve used Django a number of times and feel quite comfortable with it, so therefore we won’t be using it. Instead, I’m going to try out Tornado, and we’re going to be running it on top of PyPy. PyPy is an alternative implementation of Python with JIT compilation, giving it significant performance gains. Tornado is a popular Python web framework that was originally written for FriendFeed and has since been open sourced by Facebook; it is asynchronous and has numerous async modules that will aid our development (tornadio2 and asyncdynamo being two that spring to mind).

That’s all well and good, but we have some data we need to retain; those first two points in the requirements clearly say that we’re taking WordPress with us! Getting the data out of MySQL isn’t exactly difficult; a simple

mysqldump --user=username --password my_database > /tmp/my_database.sql

will dump all the SQL out to a file; however, we aren’t going to be using a relational database. We’re going to join the NoSQL fan club and store our articles in DynamoDB, seeing as this whole project will be running on AWS (Amazon Web Services). I’ve never used DynamoDB before, but a quick look at the documentation suggests it should do the job just fine for what I have in mind. Does an article need more than a primary key for querying? Guess we’ll find out soon enough.
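
To give an idea of what the migration target might look like, here is a rough sketch using boto’s DynamoDB support; the table name, key and attributes are just my guesses at this stage:

from boto.dynamodb2.fields import HashKey
from boto.dynamodb2.table import Table

# One-off: create the articles table, keyed on the permalink slug (a guess)
articles = Table.create('articles', schema=[HashKey('slug')])

# Migration then becomes one put_item per row pulled out of the MySQL dump
articles.put_item(data={
    'slug': 'planning-and-preparation',
    'title': 'Planning and Preparation - WTF am I going to build?',
    'body': '...',
})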

I’m thinking Disqus is probably the best solution for comments; I don’t particularly want to write my own commenting system when I can plug in a free service like Disqus. After a little reading I discovered that you can export from WordPress to Disqus using a format known as WXR. Looking at the example it seems pretty straightforward: loop through the existing WordPress DB, push the articles to DynamoDB, and put the new IDs into WXR for importing into Disqus, so that DynamoDB and Disqus are in sync. As a bonus, Disqus enables visitors to share pages on my site, so we’ll be killing two birds with one stone!

The next three items are all frontend questions. To build a single page app you’ll probably want some sort of JavaScript MVC framework, and Angular.js is the tool we are going to use. I could use Backbone or Ember, however I like Google, so I’ve decided to go with Angular.js; I like how opinionated it is, and the separation of HTML from JS is really nice (in my humble opinion). I’m going to have to do some research on the SEO compliance front, as I’m not too sure what the implications will be for our project, but some quick googling uncovered that Google has recommendations on how to make a single page app available to its crawler so you don’t cop a penalty, or worse yet, not get listed because the JavaScript isn’t processed by the crawler. On the parallax front, I’m sure there will be JavaScript libraries to help with this part, and as it’s more sugar than a requirement I won’t dwell on it here.
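
Back on the SEO point: the crux of Google’s AJAX crawling scheme is that the crawler rewrites #! URLs into an _escaped_fragment_ query parameter so the server can return a pre-rendered snapshot. A sketch of how that might look in Tornado (render_snapshot() is a hypothetical helper that renders the article server-side):

import tornado.web

class IndexHandler(tornado.web.RequestHandler):
    def get(self):
        fragment = self.get_argument('_escaped_fragment_', None)
        if fragment is not None:
            # Crawler requested /?_escaped_fragment_=/some/route
            self.write(render_snapshot(fragment))  # hypothetical helper
        else:
            # Real browser: serve the Angular shell and let the JS take over
            self.render('index.html')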

Our final requirement is building this site to scale. I’ve worked with Rackspace and Linode before, and have only relatively recently started working with Amazon Web Services, but (and it’s a huge BUT) AWS really is in a league of its own. We’re going to use as much of AWS as my wallet will let me! Seriously though, I’ll only be using t1.micro instances, because I really don’t want to sink hundreds of dollars into AWS each month for a technology demo. We’ll be building a CloudFormation template to bring up as much of our environment as humanly possible: an external elastic load balancer, 2 frontend web servers (n+1), an internal load balancer, 2 backend web/application servers (n+1), DynamoDB, Route53, S3, CloudFront, CloudWatch, SES and SQS, all nicely packaged inside a Virtual Private Cloud. We’ll also be implementing autoscale groups in case we get smashed with traffic.
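
Kicking the whole stack off can itself be a couple of lines of Python. A sketch using boto’s CloudFormation bindings (the stack name, template file and region are my own placeholders):

import boto.cloudformation

conn = boto.cloudformation.connect_to_region('ap-southeast-2')
with open('site.template') as f:
    conn.create_stack('nigeldunn-com', template_body=f.read())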

An additional requirement that isn’t on our list is SaltStack. I’ve used Puppet before, and I want to get across SaltStack as it’s written in Python and its configuration is done using YAML, which seems quite appealing compared to Puppet’s Ruby-based pseudo config language. We’ll be using SaltStack to provision our EC2 instances when they start up through autoscaling.

That’s it for this instalment. The next post will be about setting up DynamoDB and Disqus, we’ll even write some handy little code to migrate the data over for us!

Time for a change!

After much deliberating I’ve decided to rebuild nigeldunn.com from scratch. I’m still nutting out the details of what I’m going to use, but at this stage I’m planning on a single page app using HTML5, angular.js, PyPy and cyclone.io. I want this to be a knowledge sharing exercise as much as it is a redesign so…

The goals of this exercise are:

  1. Build an SEO-friendly single page web application
  2. Retain all of my blog articles and comments from WordPress
  3. Make the site fully responsive (in other words: desktop and mobile friendly)
  4. Add some interesting real time functionality to the site (websockets or similar)
  5. Document the process so others can learn from it
  6. Showcase how to build a high performance, scalable website using the latest techniques
  7. Illustrate how to effectively use AWS for high availability and auto scaling
  8. Maybe learn a few things along the way :)

Now, to be fair, most of my skill set is around DevOps, so I’ll do my best on the design front, but I may have to recruit a graphic designer to help. I also plan on releasing all of the code for my site, including deployment scripts and CloudFormation templates, via my GitHub account.

Because my site doesn’t get very much traffic I’m going to use t1.micro instances on AWS to keep the costs down, as this site will be a proof of concept (it doesn’t generate income, so I don’t want to pay hundreds of dollars a month unnecessarily); however, it will have autoscaling set up to allow for traffic spikes.

Continue on to Part 1: Planning and Preparation – WTF am I going to build?

Myki – Another Failed System


Myki is the digital ticketing system used for public transport in Victoria, Australia. It is also one of the worst systems I have ever seen when it comes to usability. The reason for this post is to outline my experiences with Myki and to list ways in which the system can be improved. So let’s start with the ugly…

1. Going to a Myki machine at a train station, topping up your card, and being told “please wait 24 hours before using the card” is ridiculous.

2. If the auto top-up feature fails overnight, locking the card so that it cannot be used either to top up or travel is ridiculous.

3. Not being able to update your credit card online and do an immediate payment to unlock your card is ridiculous.

4. Having to spend over an hour on the phone just to talk to someone who says “you need to wait 24 hours or purchase another Myki to travel today” is ridiculous.

5. Having a call center that can’t do manual credit card transactions or unlock your Myki then and there is ridiculous.

I’m a web developer; part of my job is to think about how to simplify things, how to make things so easy you don’t even realise you are doing them. Design, in other words. The fundamental problem with Myki is that a bunch of programmers sat down and coded an application without any thought to how the average person would use their product or how their decisions would impact the daily commuter. If someone needs to top up their card, they need it right then and there, because it is extremely likely that the machine just told them they are out of credit. Having a delayed system that makes the transaction at midnight each day is insanity.

When architecting Myki, did they ever think about message queues and asynchronous workers? Making a payment, whether online or at a machine at a station, should immediately hand the transaction to a worker to be processed against a payment gateway. Payments need to be immediate, because almost every user is going to discover they are out of credit at the moment they are actually out of credit. They aren’t going to log into Myki each day to check their balance; who wants to spend 5 minutes of their life each day checking their eticket?
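
To make the point concrete, the shape of the fix is only a few lines of code. A toy sketch in Python, where charge_card() and credit_and_unlock() are hypothetical calls to the payment gateway and the card system:

from Queue import Queue  # 'queue' on Python 3
from threading import Thread

payments = Queue()

def worker():
    while True:
        card_id, amount = payments.get()
        if charge_card(card_id, amount):        # hypothetical gateway call
            credit_and_unlock(card_id, amount)  # hypothetical card update
        payments.task_done()

t = Thread(target=worker)
t.daemon = True
t.start()

# A top-up at a machine or online just enqueues the payment; the card is
# usable as soon as the gateway answers, not at midnight tomorrow.
payments.put(('0123456789', 20.00))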

Locking a card for any reason other than it being lost or stolen is insane if there is no way for the user to immediately unlock it via a payment. Having a recurring billing system that overrides manual payments is crazy; not being able to make a manual payment to immediately unlock a card and cancel a pending recurring payment is hard to comprehend. Furthermore, not being able to log in to the website and update your credit card details so that the recurring payment will succeed the next time it tries is laughable at best.

And now the call center…
Having to wait an hour to talk to someone is really, really frustrating. Having to waste my time contacting you about something that should just work is frustrating enough without the hour-long wait. But, and this is a huge but, to then be told that there is nothing they can do to help you, and that you need to wait 24 hours or purchase a new Myki (at the non-trivial price of $16 per card), is beyond infuriating. I could forgive all of the previous failures if, and only if, the customer service representative could unlock and top up your card for you. Being told that they can’t help you after waiting an hour is like having someone spit in your face.

Myki, sort your shit out! Fire whoever designed your unbelievably frustrating system, build the tools your customer service people need so they can be helpful rather than cannon fodder for frustrated commuters, and do some real world testing of how people actually use the system, not how you sadistically wish to reprogram them.

RE:Masters

After going and seeing Amon Tobin live in Melbourne I got to thinking about classical music. I think classical music’s appeal has diminished significantly over time due to “pop” music and the advent of electronic music. What I think would be cool is for Amon to reinterpret the compositions of the classical masters: Mozart, Beethoven, Bach, Chopin, Tchaikovsky, etc.

Amon has an amazing ability to create interesting, intelligent music that never gets old. I think taking Amon’s unique style of finding and creating sounds and then layering them into fantastic arrangements makes him the perfect candidate to take on something as challenging as reinterpreting the Masters.

The title I came up with was:

RE:Masters

The title was based around satirising replying to the Masters via an electronic form which was never available in their day. It was also a play on the word remaster: to make a new master of a recording, typically in order to improve the sound quality.

I’m not a musician, I’m a computer programmer, so feel free to steal the idea, the album title, whatever suits. If you like the idea, I’d love a mention in the album cover :P

Linux and the desktop

I recently installed Ubuntu 12.04 LTS on one of my desktop machines just to play around with it, and I really was dumbstruck at how good Ubuntu Linux is on the desktop. My favourite apps all have versions available. I’ve been using Ubuntu Server for a number of years, and every so often I try out the desktop edition just to see what state it is in. The laptop that I bought to run Ubuntu on was a giant fail at the time, but works perfectly now (6 months later).

When I first tried Ubuntu 11.04/11.10 I had a lot of problems with Unity; the machine would either boot into GNOME or just hang and need to be rebooted 50 times to get to the desktop. This time around, however, 12.04 LTS installed and worked perfectly from the get-go. Unity really is nice to use and the desktop is very, very snappy. I was able to get up and running, productive, inside an hour (installing the OS and all the applications I needed), which I thought was pretty decent.

Apt is still one of my favourite features of Debian/Ubuntu; being able to install the majority of your software via apt-get makes life much easier when it comes to upgrades and security updates. The other good thing is that when it comes to compiling software, it just works, something that cannot be said of OS X.

I tried installing some apps that I love on my Mac just to see what they’re like on Linux. Spotify and Sublime Text 2 are two pieces of software that I use every day, and surprise, surprise, they were both available on Linux and both worked well. Spotify is still a preview edition, but it worked well enough that I had no noticeable problems with it. Sublime was perfect; the UI is identical to the OS X version and just as snappy to use.

I know there is GIMP for Linux; however, Adobe Photoshop is and always has been the reason I haven’t fully committed to Linux in the past. After reading the Adobe forums it appears that Adobe are still digging their heels in. Apparently they haven’t learnt anything from the gradual death of Flash. HTML5 and JavaScript will completely replace Flash. They could switch to a SaaS model for their software, or they could drop the price of Photoshop and sell more units, rather than fewer units at a higher price.

But I digress; Ubuntu 12.04 LTS Desktop is very nice. I haven’t spent months and months working with it, but my initial reactions are all positive. Ubuntu really feels like a cohesive desktop that just works. If I were building a new business I’d definitely be considering Ubuntu for the desktop, with SaaS for the software that is mission critical.

eWay Rebill API Python Wrapper

I couldn’t find any Python wrappers for the eWay Rebill API, so I smashed one out quickly. I’ve put it on my GitHub account so that anyone can use it (and improve it). It’s pretty rough, with no unit tests. Unfortunately the eWay API documentation doesn’t match the WSDL, and as a result you have to work out what each method name should be. The only requirement for this class is SUDS.
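
For anyone rolling their own, one useful trick is that printing a SUDS client dumps the service’s actual methods and types, which is the quickest way to see where the documentation and the WSDL disagree. A sketch (the WSDL URL here is illustrative; use the one from eWay’s documentation):

from suds.client import Client

# Printing the client lists the real method names exposed by the WSDL
client = Client('https://www.eway.com.au/path/to/rebill.asmx?WSDL')
print(client)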

Hopefully this saves someone a few hours.

Change is in the air…

Unfortunately, I think PHP and I are going to part company. A few weeks ago I decided to try out Django and Python; I’d never coded in Python before, but I’d seen a few performance benchmarks around the place showing some amazing stats for Python’s speed. Anyone who knows me knows I firmly believe that if you start with something slow at the lowest level, it’s only going to get exponentially worse with each layer above.

3 weeks in and I’m dumbfounded at how far I’ve come and what I’ve been able to build with almost no knowledge of the language. Compared to PHP, Python is beautiful to use. You can do so much with so little code. Everything is an object, so OOP isn’t optional. Building up an application can be incredibly quick, as there are tons of amazing modules that add instant functionality. And unlike the vast majority of PHP projects, the code seems to be of much higher quality, with all the modules I’ve used having extensive tests packaged with them.

Django seems very solid; the automatic admin system is one of my favourite features. It’s nice to have fun coding again, and Python and Django are certainly making it fun. Maybe if I get some time at a later date I’ll rip out WordPress and roll my own blog, or possibly put Mezzanine through its paces.

libevent-2.0.so.5: cannot open shared object file: No such file or directory

When installing Memcached on a machine I came across the following error when I tried to start it:

memcached: error while loading shared libraries: libevent-2.0.so.5: cannot open shared object file: No such file or directory

The solution for this on Debian/Ubuntu (and probably most other Linux distros) is as follows.

On a 32 bit system:

ln -s /usr/local/lib/libevent-2.0.so.5 /usr/lib/libevent-2.0.so.5

On a 64 bit system:

ln -s /usr/local/lib/libevent-2.0.so.5 /usr/lib64/libevent-2.0.so.5

After creating the appropriate symlink for your system you should now be able to start memcached as normal:

memcached -d -u memcached_user -m 256 -l 127.0.0.1 -p 11211