Automate provisioning a Linux VM in Microsoft Azure

At my company we’ve been looking at various cloud providers, including Microsoft Azure. My interest has always been in automating computer configuration, particularly on Linux with Puppet, and most cloud providers have an API for kicking off a custom script on a VM once it’s freshly installed and running. Microsoft’s API, however, does not seem to offer anything like that. Sufficient googling showed that others were reporting a similar problem with no clear solution, hence this blog post describing my approach.

I have to say, the xplat-cli (Cross Platform Command Line Interface), based on Node.js, is actually quite nice for programmers and fairly easy to use. But as mentioned, there’s not really a way to automate kicking off customization. The closest I found was the “CustomData” parameter, which lets you upload a file that must be 64 KB or less once base-64 encoded. The encoded contents get included in an XML file on the VM, /var/lib/waagent/ovf-env.xml, but nothing on the VM knows to decode and run them.
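
Since the limit applies to the encoded size rather than the file size, it can be worth checking a script before handing it to --custom-data. The following is only a rough sketch: check_custom_data_size is a hypothetical helper name (not part of the Azure CLI), and it assumes “64 KB” means 65536 bytes.

```shell
# Sketch: verify a CustomData file fits the limit once base-64 encoded.
# check_custom_data_size is a hypothetical helper, not an Azure CLI command.
# Assumption: the "64 KB" limit means 64 * 1024 = 65536 bytes.
check_custom_data_size() {
  local file="$1"
  local limit=$((64 * 1024))
  local encoded_size
  # Strip the newlines base64 adds for line wrapping before counting bytes
  encoded_size=$(( $(base64 < "$file" | tr -d '\n' | wc -c) ))
  if [ "$encoded_size" -gt "$limit" ]; then
    echo "too large: $encoded_size bytes encoded (limit $limit)" >&2
    return 1
  fi
  echo "ok: $encoded_size bytes encoded"
}
```

Base-64 encoding grows data by about a third, so under that assumption the raw script has to stay below roughly 48 KB.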

So, there are several options that we have:

  1. Don’t use the CustomData piece at all. Just use a script that creates your VM, then uses the ssh key you provisioned it with to scp a custom script for that VM over to it, and finally sshes to the VM and runs the script with sudo.
  2. Similar to the above, but rather than scp a custom script over to run, scp a fixed script that decodes the CustomData field from the XML file, writes that to a script, and runs it. This is a little more involved than #1, but it moves the VM customizations into the CustomData parameter rather than a custom script for each VM that gets copied. I’m not really sure this buys you anything in practice over #1, but it’s what I will outline below, since it’s the most encompassing of the three.
  3. Finally, you can create a VM image whose init scripts, on first boot, check the CustomData field, decode the data to a script, and run it.

In the example below, I assume you have already installed the azure-cli and connected your Azure subscription. (Note that I edited the installed “bin/azure” command to find the fully qualified azure.js script, and “azure” is in my PATH.)

Create your VM called “nattest” with a command similar to:

$ azure vm create --vm-size extrasmall --location "East US" --ssh 22 --no-ssh-password --ssh-cert ~/.ssh/NatAzureCert.pem --custom-data ~/Azure/linux/NatCustomTest nattest 0b11de9248dd4d87b18621318e037d37__RightImage-CentOS-6.5-x64-v13.5.2 nat

info:    Executing command vm create
+ Looking up image 0b11de9248dd4d87b18621318e037d37__RightImage-CentOS-6.5-x64-v13.5.2
+ Looking up cloud service
+ Creating cloud service
+ Retrieving storage accounts
+ Configuring certificate
+ Creating VM
info:    vm create command OK

Incidentally, you can get info about your new cloud server, including its IP address, by:

$ azure vm list --dns-name nattest --json
[
  {
    "DNSName": "nattest.cloudapp.net",
    "VMName": "nattest",
    "IPAddress": "100.79.96.21",
    "InstanceStatus": "RoleStateUnknown",
    "InstanceSize": "ExtraSmall",
    "InstanceStateDetails": "",
    "OSVersion": "",
    "Image": "0b11de9248dd4d87b18621318e037d37__RightImage-CentOS-6.5-x64-v13.5.2",
    "OSDisk": {
      "HostCaching": "ReadWrite",
      "DiskName": "nattest-nattest-0-201402212150500652",
      "MediaLink": "http://portalvhdsz934l0cn6dph9.blob.core.windows.net/vhd-store/nattest-87fbac9b59526826.vhd",
      "SourceImageName": "0b11de9248dd4d87b18621318e037d37__RightImage-CentOS-6.5-x64-v13.5.2",
      "OS": "Linux"
    },
    "DataDisks": "",
    "Network": {
      "Endpoints": [
        {
          "LocalPort": "22",
          "Name": "ssh",
          "Port": "22",
          "Protocol": "tcp",
          "Vip": "23.96.113.197",
          "EnableDirectServerReturn": "false"
        }
      ]
    }
  }
]

Above I see that, when it’s ready (a minute or two after the command line exits, since the VM is booting up), I can ssh to 23.96.113.197 with the private key corresponding to the public key I included in the machine creation.
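
The JSON above carries two addresses: IPAddress (the internal one) and Vip (the public one to ssh to). As a small sketch for pulling either out in a script, here is a hypothetical json_field helper. It is not a real JSON parser and assumes the one-pair-per-line layout shown above.

```shell
# json_field KEY: print the first string value for KEY from JSON on stdin.
# Hypothetical helper; assumes one "key": "value" pair per line, as in the
# pretty-printed output of "azure vm list --json" shown above.
json_field() {
  sed -n 's/.*"'"$1"'": *"\([^"]*\)".*/\1/p' | head -n 1
}

# Usage (needs a configured azure CLI, so it is left commented out here):
#   azure vm list --dns-name nattest --json | json_field Vip
```

If a real JSON tool is available on your machine, that would be more robust than pattern matching on lines.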

So, notice in the create command I included the --custom-data parameter with a filename (~/Azure/linux/NatCustomTest). That file contains whatever custom stuff I want root to do, for example, install Puppet:

#!/bin/bash

# Install puppet
rpm -ivh https://yum.puppetlabs.com/el/6/products/x86_64/puppetlabs-release-6-7.noarch.rpm
yum install -y yum-plugin-fastestmirror puppet

# etech repo
cd /etc/yum.repos.d
wget http://etechrepo.ops.invesco.net/etech.repo

# Get preconfigured puppet keys on
# ...

# run puppet
# ...

So that file’s contents get base-64 encoded and put in an XML file on the server when it’s provisioned. My script that creates the VM then needs to poll the VM to see when it’s ready. To do that, I need to get the IP address to check and run a test. The following works well as long as nc does not time out (it didn’t in my Linux tests, but it did when checking RDP on Windows servers, which took a lot longer to boot up!):

# Get the public IP address (the Vip field of the ssh endpoint)
IPADDRESS=$(azure vm list --json --dns-name nattest | grep Vip | cut -f4 -d\")
echo "VM created at $IPADDRESS... Waiting for VM to come up..."
nc -zv "$IPADDRESS" 22
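
A single nc call only waits as long as one connection attempt does, which is where the timeout on slower-booting servers comes from. One way around that is to retry in a loop. This is a sketch only: wait_for_port is a hypothetical helper, the retry count and delay are arbitrary defaults, and it assumes your nc supports the -z and -w options.

```shell
# wait_for_port HOST PORT [ATTEMPTS] [DELAY]: retry until the TCP port accepts
# a connection. Hypothetical helper; the defaults (30 tries, 10 seconds apart)
# are arbitrary and not taken from any Azure documentation.
wait_for_port() {
  local host="$1" port="$2" attempts="${3:-30}" delay="${4:-10}"
  local i=1
  while [ "$i" -le "$attempts" ]; do
    # -z: only test the connection; -w 5: give up on this try after 5 seconds
    if nc -z -w 5 "$host" "$port" 2>/dev/null; then
      echo "$host:$port is up after $i attempt(s)"
      return 0
    fi
    sleep "$delay"
    i=$((i + 1))
  done
  echo "gave up on $host:$port after $attempts attempts" >&2
  return 1
}
```

With this in place, the last line of the snippet above could become something like wait_for_port "$IPADDRESS" 22, and the same helper could be pointed at port 3389 for the Windows RDP case.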

Once that’s up, I scp my script to deal with the CustomData and run it:

scp -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null -i ~/.ssh/NatAzureKey.key ~/Azure/linux/runCustomData ${IPADDRESS}:
ssh -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null -i ~/.ssh/NatAzureKey.key -t -q $IPADDRESS "sudo ./runCustomData"

The only remaining piece is what’s in the runCustomData script:

#!/usr/bin/perl
# Script for bootstrapping an Azure Linux VM
use strict;
use warnings;
use MIME::Base64;

my $datafile   = '/var/lib/waagent/ovf-env.xml';
my $initscript = '/tmp/CustomData.init';

# Find the base-64 encoded CustomData element and decode it into a script
open(my $in, '<', $datafile) || die "Could not open $datafile: $!";
while (<$in>) {
  if (/CustomData>(.*)<\/CustomData>/) {
    my $base64CD = $1;
    open(my $out, '>', $initscript) || die "Could not write $initscript: $!";
    print $out decode_base64($base64CD)."\n";
    close($out);
    chmod(0555, $initscript);
  }
}
close($in);

# Run the decoded script (as root, since this script is invoked with sudo)
system($initscript);
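
For anyone who would rather not depend on Perl (for instance in a first-boot init script, as in option 3), the decode step can also be sketched in shell. extract_custom_data is a hypothetical helper name. It assumes a base64 command that accepts -d (as GNU coreutils does) and, like the Perl regex above, that the CustomData element sits on a single line.

```shell
# extract_custom_data XMLFILE OUTFILE: decode the CustomData element into an
# executable script. Hypothetical helper mirroring the Perl script above.
# Assumes the element is on one line and that "base64 -d" is available.
extract_custom_data() {
  local xml="$1" out="$2" encoded
  encoded=$(sed -n 's/.*CustomData>\(.*\)<\/CustomData>.*/\1/p' "$xml" | head -n 1)
  if [ -z "$encoded" ]; then
    echo "no CustomData found in $xml" >&2
    return 1
  fi
  printf '%s' "$encoded" | base64 -d > "$out"
  chmod 0555 "$out"
}

# Usage on the VM (as root), with the same paths as the Perl version:
#   extract_custom_data /var/lib/waagent/ovf-env.xml /tmp/CustomData.init \
#     && /tmp/CustomData.init
```

Unlike the Perl version, this sketch refuses to run anything when no CustomData element is found.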

So, putting it all together, you have a six-line bash script that:

  1. Creates your VM
  2. Gets the VM’s IP address and reports it
  3. Polls the VM until it is up
  4. Copies the runCustomData script to your user account with scp
  5. Connects to your user account with ssh and runs the runCustomData script as root, which decodes the CustomData and runs it, which in turn installs Puppet and does whatever else you want it to.
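
Assembled from the commands shown earlier, those steps might look roughly like the function below. Treat it as a sketch rather than my exact script: provision_vm and its arguments are hypothetical names, the key, certificate, and script paths are the ones used in this post, and the nc loop assumes the -z and -w options and will wait forever if the VM never comes up.

```shell
# Sketch of the five steps as one function, built from the commands in this post.
# provision_vm and its parameters are hypothetical; adjust the paths to your setup.
provision_vm() {
  local name="$1" user="$2" image="$3" custom_data="$4" ip
  # Left unquoted on use so it splits into separate ssh/scp options
  local ssh_opts="-o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null -i $HOME/.ssh/NatAzureKey.key"

  # 1. Create the VM, passing the customization script as CustomData
  azure vm create --vm-size extrasmall --location "East US" --ssh 22 \
    --no-ssh-password --ssh-cert "$HOME/.ssh/NatAzureCert.pem" \
    --custom-data "$custom_data" "$name" "$image" "$user" || return 1

  # 2. Get and report the public IP address
  ip=$(azure vm list --json --dns-name "$name" | grep Vip | cut -f4 -d\")
  echo "VM created at $ip... Waiting for VM to come up..."

  # 3. Poll until ssh answers (no upper bound in this sketch)
  until nc -z -w 5 "$ip" 22; do sleep 10; done

  # 4. and 5. Copy runCustomData over and run it as root
  scp $ssh_opts "$HOME/Azure/linux/runCustomData" "${ip}:" || return 1
  ssh $ssh_opts -t -q "$ip" "sudo ./runCustomData"
}

# Usage (left commented out because it creates a real VM):
#   provision_vm nattest nat \
#     0b11de9248dd4d87b18621318e037d37__RightImage-CentOS-6.5-x64-v13.5.2 \
#     ~/Azure/linux/NatCustomTest
```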

If establishing a longer-term approach, I’d go with option 3 and not have to scp over the runCustomData script. If going quick and dirty, I’d go with option 1, which does not have the 64 KB limitation on the custom script. Option 2 is really only best for showing how both options 1 and 3 might be implemented, although it could be argued that it’s better than option 3 in that you can use any stock VM, rather than having to keep updating a VM image with patches and then your custom script.

At any rate, have fun, and please let me know if you have suggestions for improving the process, or if I missed something completely obvious.