Random problems with Spring Boot and Docker

We run a bunch of Java Spring Boot microservices in Docker containers. I’m not sure Java is the best choice when it comes to running a lot of containers, simply because of JVM memory allocation, but that’s a different topic. The issue I want to address here is that we simply cannot start a bunch at once, or they… simply….. slow……. down…….. to ……. a ………. crawl. Server startup times go from 15 seconds to 30 minutes in cases where 20 containers are starting up at once! THAT’S CRAZY!

At any rate, this morning after server patching I decided I’d had enough, since starting them all up one by one was going to take me an hour. I dove into the logs of one of the containers that did not start within my arbitrary 90-second window (much longer than the usual 25 seconds when started on its own), and here’s what I saw:

  .   ____          _            __ _ _
 /\\ / ___'_ __ _ _(_)_ __  __ _ \ \ \ \
( ( )\___ | '_ | '_| | '_ \/ _` | \ \ \ \
 \\/  ___)| |_)| | | | | || (_| |  ) ) ) )
  '  |____| .__|_| |_|_| |_\__, | / / / /
 =========|_|==============|___/=/_/_/_/
 :: Spring Boot ::        (v1.3.0.RELEASE)
2016-07-17 07:13:20.456  INFO 1 --- [           main] com.company.EventApplication             : Starting EventApplication on 16bad0525076 with PID 1 (/opt/java1-service-1.0.0-SNAPSHOT.jar started by root in /)
2016-07-17 07:13:20.475  INFO 1 --- [           main] com.company.EventApplication             : The following profiles are active: production
2016-07-17 07:13:20.755  INFO 1 --- [           main] ationConfigEmbeddedWebApplicationContext : Refreshing org.springframework.boot.context.embedded.AnnotationConfigEmbeddedWebApplicationContext@619ffb1: startup date [Sun Jul 17 07:13:20 CDT 2016]; root of context hierarchy
2016-07-17 07:13:23.171  INFO 1 --- [           main] o.s.b.f.s.DefaultListableBeanFactory     : Overriding bean definition for bean 'beanNameViewResolver' with a different definition: replacing [Root bean: class [null]; scope=; abstract=false; lazyInit=false; autowireMode=3; dependencyCheck=0; autowireCandidate=true; primary=false; factoryBeanName=org.springframework.boot.autoconfigure.web.ErrorMvcAutoConfiguration$WhitelabelErrorViewConfiguration; factoryMethodName=beanNameViewResolver; initMethodName=null; destroyMethodName=(inferred); defined in class path resource [org/springframework/boot/autoconfigure/web/ErrorMvcAutoConfiguration$WhitelabelErrorViewConfiguration.class]] with [Root bean: class [null]; scope=; abstract=false; lazyInit=false; autowireMode=3; dependencyCheck=0; autowireCandidate=true; primary=false; factoryBeanName=org.springframework.boot.autoconfigure.web.WebMvcAutoConfiguration$WebMvcAutoConfigurationAdapter; factoryMethodName=beanNameViewResolver; initMethodName=null; destroyMethodName=(inferred); defined in class path resource [org/springframework/boot/autoconfigure/web/WebMvcAutoConfiguration$WebMvcAutoConfigurationAdapter.class]]
2016-07-17 07:13:24.728  INFO 1 --- [           main] s.b.c.e.t.TomcatEmbeddedServletContainer : Tomcat initialized with port(s): 8080 (http)
2016-07-17 07:13:24.783  INFO 1 --- [           main] o.apache.catalina.core.StandardService   : Starting service Tomcat
2016-07-17 07:13:24.786  INFO 1 --- [           main] org.apache.catalina.core.StandardEngine  : Starting Servlet Engine: Apache Tomcat/8.0.28
2016-07-17 07:13:25.180  INFO 1 --- [ost-startStop-1] o.a.c.c.C.[Tomcat].[localhost].[/]       : Initializing Spring embedded WebApplicationContext
2016-07-17 07:13:25.181  INFO 1 --- [ost-startStop-1] o.s.web.context.ContextLoader            : Root WebApplicationContext: initialization completed in 4451 ms
2016-07-17 07:13:25.790  INFO 1 --- [ost-startStop-1] o.s.b.c.e.ServletRegistrationBean        : Mapping servlet: 'dispatcherServlet' to [/]
2016-07-17 07:13:25.797  INFO 1 --- [ost-startStop-1] o.s.b.c.embedded.FilterRegistrationBean  : Mapping filter: 'metricFilter' to: [/*]
2016-07-17 07:13:25.798  INFO 1 --- [ost-startStop-1] o.s.b.c.embedded.FilterRegistrationBean  : Mapping filter: 'characterEncodingFilter' to: [/*]
2016-07-17 07:13:25.798  INFO 1 --- [ost-startStop-1] o.s.b.c.embedded.FilterRegistrationBean  : Mapping filter: 'hiddenHttpMethodFilter' to: [/*]
2016-07-17 07:13:25.799  INFO 1 --- [ost-startStop-1] o.s.b.c.embedded.FilterRegistrationBean  : Mapping filter: 'httpPutFormContentFilter' to: [/*]
2016-07-17 07:13:25.799  INFO 1 --- [ost-startStop-1] o.s.b.c.embedded.FilterRegistrationBean  : Mapping filter: 'requestContextFilter' to: [/*]
2016-07-17 07:13:25.799  INFO 1 --- [ost-startStop-1] o.s.b.c.embedded.FilterRegistrationBean  : Mapping filter: 'webRequestLoggingFilter' to: [/*]
2016-07-17 07:13:25.799  INFO 1 --- [ost-startStop-1] o.s.b.c.embedded.FilterRegistrationBean  : Mapping filter: 'applicationContextIdFilter' to: [/*]
2016-07-17 07:14:53.243  INFO 1 --- [ost-startStop-1] o.a.c.util.SessionIdGeneratorBase        : Creation of SecureRandom instance for session ID generation using [SHA1PRNG] took [87,395] milliseconds.

WHOA. What does that last line say? “Creation of SecureRandom instance for session ID generation using [SHA1PRNG] took [87,395] milliseconds.” Wow, almost 90 seconds. I’m not a Java developer, but I do know “PRNG” stands for pseudo-random number generator, and that a PRNG needs to be seeded with entropy (randomness) to be effective. Since Spring Boot’s embedded Tomcat was waiting on it during startup, I decided to try a test. But first, I wanted to measure how much entropy the system had.

With a little googling, I found the following command line works:

# cat /proc/sys/kernel/random/entropy_avail

I also found that collectd has a simple entropy plugin, so I enabled it and now have entropy readings sent to Graphite every 10 seconds.
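If collectd isn’t an option, a rough stand-in is easy to script. This is a sketch, not collectd’s implementation: it reads the same /proc counter and prints one line per sample in Graphite’s plaintext format ("metric path, value, epoch seconds"). The metric name is a made-up example.

```shell
#!/usr/bin/env bash
# Rough stand-in for collectd's entropy plugin (illustrative, not collectd's code).
# Output follows Graphite's plaintext protocol: "<metric path> <value> <epoch seconds>".

# Kernel entropy counter; override ENTROPY_FILE to read a different file (e.g. in tests).
ENTROPY_FILE=${ENTROPY_FILE:-/proc/sys/kernel/random/entropy_avail}

# Print one sample. The default metric name "<short hostname>.entropy.available"
# is a hypothetical naming scheme.
entropy_sample() {
  local metric=${1:-${HOSTNAME%%.*}.entropy.available} value
  read -r value < "$ENTROPY_FILE" || return 1
  printf '%s %s %s\n' "$metric" "$value" "$(date +%s)"
}

# Print COUNT samples (0 = run forever), INTERVAL seconds apart (default 10, like collectd).
entropy_watch() {
  local interval=${1:-10} count=${2:-0} n=0
  while :; do
    entropy_sample || return 1
    n=$((n + 1))
    if (( count > 0 && n >= count )); then
      return 0
    fi
    sleep "$interval"
  done
}
```

To feed Graphite directly, something like "entropy_watch 10 | nc graphite.example.com 2003" would work. Here graphite.example.com is a placeholder, and 2003 is Carbon’s default plaintext port.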

Now for my test… I stopped and started four microservices using our microservice “swap out” script. The script starts a new container and waits until “Server started in X seconds” is shown. It then removes the previous container of the same image while updating the vulcan proxy load balancer. So once one service was fully up, I started the next…

# for i in java2-service java3-service java4-service java5-service; do ./$i/swapout; done
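The wait-for-startup logic at the heart of a script like that can be sketched as below. This is a hypothetical reconstruction, not our swapout script. The function names, the DOCKER override and the 90-second default are illustrative. It matches Spring Boot’s own “Started <App> in N seconds” log message.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of a "swap out" step; not our actual swapout script.
# DOCKER can be overridden (e.g. with a stub) for testing.
DOCKER=${DOCKER:-docker}

# Wait until a container's log matches an extended regex, giving up after TIMEOUT seconds.
wait_for_log() {
  local name=$1 pattern=$2 timeout=${3:-90} waited=0
  until "$DOCKER" logs "$name" 2>&1 | grep -qE "$pattern"; do
    if (( waited >= timeout )); then
      echo "timed out after ${timeout}s waiting for $name" >&2
      return 1
    fi
    sleep 1
    waited=$((waited + 1))
  done
}

# swap_out IMAGE NEW_NAME OLD_NAME: start the new container, wait for Spring Boot's
# "Started <App> in N seconds" line, then remove the old container.
swap_out() {
  local image=$1 new=$2 old=$3
  "$DOCKER" run -d --name "$new" "$image" >/dev/null || return 1
  wait_for_log "$new" 'Started .* in [0-9.]+ seconds' 90 || return 1
  # The real script also updates the vulcan load balancer at this point.
  "$DOCKER" rm -f "$old" >/dev/null
}
```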

Once complete, I looked for instances of PRNG in their log files:

# for i in java2-service java3-service java4-service java5-service; do docker logs $i | grep PRNG; done
2016-07-17 08:54:39.224  INFO 1 --- [ost-startStop-1] o.a.c.util.SessionIdGeneratorBase        : Creation of SecureRandom instance for session ID generation using [SHA1PRNG] took [17,541] milliseconds.
2016-07-17 08:55:49.809  INFO 1 --- [ost-startStop-1] o.a.c.util.SessionIdGeneratorBase        : Creation of SecureRandom instance for session ID generation using [SHA1PRNG] took [19,913] milliseconds.
2016-07-17 08:57:03.131  INFO 1 --- [ost-startStop-1] o.a.c.util.SessionIdGeneratorBase        : Creation of SecureRandom instance for session ID generation using [SHA1PRNG] took [28,985] milliseconds.
2016-07-17 08:58:29.855  INFO 1 --- [ost-startStop-1] o.a.c.util.SessionIdGeneratorBase        : Creation of SecureRandom instance for session ID generation using [SHA1PRNG] took [57,045] milliseconds.
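To turn runs like this into numbers, the millisecond values can be pulled out of the log lines and totalled. Here is a small sketch. It assumes Tomcat’s message keeps the “took [N] milliseconds” wording shown above.

```shell
#!/usr/bin/env bash
# Read Tomcat log lines on stdin and print each SecureRandom creation time in ms,
# e.g. "... using [SHA1PRNG] took [17,541] milliseconds." -> 17541
prng_ms() {
  grep -o 'took \[[0-9,]*] milliseconds' | tr -d ',' | sed 's/[^0-9]//g'
}

# Print each delay in seconds (one decimal place), then the total.
prng_report() {
  prng_ms | awk '{ total += $1; printf "%.1f s\n", $1 / 1000 }
                 END { printf "total %.1f s\n", total / 1000 }'
}
```

For example: "for i in java2-service java3-service java4-service java5-service; do docker logs $i 2>&1; done | prng_report".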

Sure enough, each service took longer and longer to get enough entropy. You can see below that the entropy bounces back a little after the first hit, and a little again after the second, but it stays low through the third and fourth hits. Eventually it recovers.


We have a couple of options to make this better:

  • Tell Tomcat to use /dev/urandom rather than /dev/random, since /dev/urandom does not block. That’s not great for anything using crypto, but it should remove any wait on entropy when Tomcat creates its SecureRandom instance. I’d expect each of the samples above to take no longer than the first one did (around 17 seconds), and probably much less.
  • Find a way to add entropy to the system. I generally prefer this approach. We wouldn’t want to systematically switch all our Tomcat instances to the less secure option and inadvertently weaken a service where good randomness is critical.
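For the first option, the usual knob is the JVM’s java.security.egd system property rather than a Tomcat setting. Below is a sketch of a container entrypoint that sets it. It assumes the image starts the jar from a shell entrypoint we control, and it uses the jar path from the log above as the default.

```shell
#!/bin/sh
# Hypothetical container entrypoint: start the service with SecureRandom seeded
# from the non-blocking /dev/urandom. The odd "/dev/./urandom" spelling is a
# commonly used workaround for older JDKs that special-case "file:/dev/urandom".
APP_JAR=${APP_JAR:-/opt/java1-service-1.0.0-SNAPSHOT.jar}
exec java -Djava.security.egd=file:/dev/./urandom -jar "$APP_JAR" "$@"
```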

I investigated and found the haveged daemon, which feeds the kernel’s entropy pool using timing variations in the CPU… and WOW:


I then ran the same four microservice swapouts. Not only was there no impact on the later microservices, but overall microservice startup time was noticeably faster (the chart above includes microservice startup). There was no PRNG line in the logs (I believe Tomcat only logs it when there’s a noticeable delay), so I have nothing concrete to compare against other than the visibly faster startups. Looking at some of our Tomcat services that are not in Docker, I see they occasionally suffer from the same PRNG issue too, so this will help all of our Tomcat instances start up sooner.
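For reference, the commented commands below show roughly how to get haveged running on CentOS, where it is packaged in EPEL. The function underneath is a hypothetical pre-flight check, not part of haveged. It fails when the pool is below a threshold you choose, which could be handy before starting a batch of containers.

```shell
#!/usr/bin/env bash
# Sketch only. On CentOS, haveged comes from EPEL; setup is roughly:
#   yum install -y epel-release && yum install -y haveged
#   systemctl enable haveged && systemctl start haveged   # CentOS 7
#   chkconfig haveged on && service haveged start         # CentOS 6
#
# Hypothetical check: succeed only if the kernel reports at least MIN bits of
# available entropy. Newer kernels (roughly 5.18 and later) cap this counter
# at 256, so pick MIN accordingly.
entropy_ok() {
  local min=$1 file=${2:-/proc/sys/kernel/random/entropy_avail} avail
  [ -n "$min" ] || { echo "usage: entropy_ok MIN [file]" >&2; return 2; }
  read -r avail < "$file" || return 2
  (( avail >= min ))
}
```

For example, "entropy_ok 1000 || echo 'entropy is low; is haveged running?'", where 1000 is an arbitrary threshold.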

Script to generate PKI keys and CSRs

OpenSSL has always been a great tool for creating SSL/TLS PKI keys and certs, but I’ve never really had a one-liner for it… at least, not until today, when I was messing with some automation for ChatOps and Let’s Encrypt. This could easily be adapted to create self-signed certs if desired…

One prerequisite: you need to either edit openssl.cnf and set defaults for everything but the hostname, or edit the script below and put them in the \n\n\n string. Oh, and set the key passphrase and any other items in the settings section (the defaults are for a RHEL/CentOS environment).

#!/bin/bash
# autogen hostname [san hostname2] [san hostname3] etc
# With more than one hostname, a Subject Alternate Name cert request is created
# CSR and KEY are put in /etc/pki/CA/certs/auto/ directory.
# Assumes openssl.cnf is set up with all defaults except CN
# 4/3/2016, nwg 

# Settings (assumed RHEL/CentOS values; adjust for your environment)
openssl=/usr/bin/openssl
openssl_cnf=/etc/pki/tls/openssl.cnf
cert_pwd="changeme"        # placeholder key passphrase; set your own
hostname=$1
keyfile=/etc/pki/CA/certs/auto/${hostname}-key
csrfile=/etc/pki/CA/certs/auto/${hostname}-csr

if [ "$hostname" = "" ]; then 
    echo "Syntax: autogen hostname [san hostname2] [san hostname3] etc"
    exit 1
fi

if [ -r "$keyfile" ]; then
    echo "$keyfile already exists - either (re)move it or choose another hostname"
    exit 1
fi

if [ "$2" = "" ]; then
    # no san
    printf "\n\n\n\n\n${hostname}\n\n\n\n" | $openssl req -newkey rsa:2048 -sha256 -keyout $keyfile -out $csrfile -passout "pass:$cert_pwd" > /dev/null 2>&1
else
    # san: build "subjectAltName=DNS:host1,DNS:host2,..." from every argument
    sanstring=""
    for i in "$@"; do 
        if [ "$sanstring" = "" ]; then
            sanstring="subjectAltName=DNS:$i"
        else
            sanstring="$sanstring,DNS:$i"
        fi
    done
    #echo $sanstring

    printf "\n\n\n\n\n${hostname}\n\n\n\n" | $openssl req -newkey rsa:2048 -sha256 -keyout $keyfile -out $csrfile -passout "pass:$cert_pwd" -reqexts SAN -config <(cat $openssl_cnf <(printf "[SAN]\n$sanstring\n")) > /dev/null 2>&1
fi

chmod 400 $keyfile
ls -l $csrfile $keyfile
echo " "
cat $csrfile


As an example running it:

root:/etc/pki/CA #./autogen guyton.net www.guyton.net
-rw-r--r-- 1 root root 1212 Apr  3 16:55 /etc/pki/CA/certs/auto/guyton.net-csr
-r-------- 1 root root 1751 Apr  3 16:55 /etc/pki/CA/certs/auto/guyton.net-key
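Here is one way the self-signed adaptation mentioned at the top might look. It’s a sketch rather than a drop-in replacement for autogen, and it differs in three ways:

- It skips the prompts by passing the subject with -subj.
- It adds the SAN list with -addext. That option assumes OpenSSL 1.1.1 or newer, so it likely won’t work with the stock openssl on RHEL/CentOS 7 and earlier.
- It writes an unencrypted key (-nodes) instead of using a passphrase.

```shell
#!/usr/bin/env bash
# selfsign OUTDIR hostname [san hostname2] ...
# Writes OUTDIR/<hostname>-key (unencrypted, mode 400) and OUTDIR/<hostname>-crt,
# a one-year self-signed cert whose SAN list includes every hostname given.
selfsign() {
  local outdir=$1 hostname=$2 san="" h
  if [ -z "$outdir" ] || [ -z "$hostname" ]; then
    echo "Syntax: selfsign outdir hostname [san hostname2] ..." >&2
    return 1
  fi
  shift
  for h in "$@"; do
    san="${san:+$san,}DNS:$h"
  done
  openssl req -x509 -newkey rsa:2048 -sha256 -days 365 -nodes \
    -subj "/CN=$hostname" -addext "subjectAltName=$san" \
    -keyout "$outdir/$hostname-key" -out "$outdir/$hostname-crt" 2>/dev/null || return 1
  chmod 400 "$outdir/$hostname-key"
}
```

Usage would be "selfsign /etc/pki/CA/certs/auto guyton.net www.guyton.net". You can inspect the result with "openssl x509 -in /etc/pki/CA/certs/auto/guyton.net-crt -noout -text". Similarly, "openssl req -in /etc/pki/CA/certs/auto/guyton.net-csr -noout -text" shows what autogen put into a CSR.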

Automating Automated Testing

One of the popular tools for testing websites these days is Selenium, which passes scripted actions to a browser such as Firefox or Chrome. Unfortunately, it’s been a pain to set up. You need a virtual frame buffer running for the browser (or perhaps headless PhantomJS, which is cool, but still WORK), plus the browser itself… not to mention patching as updates come out.

Enter Docker: I can pull down the latest Docker image provided by SeleniumHQ, with either Firefox or Chrome, and bingo! It’s running and ready to go. (That’s the general magic of Docker, by the way.) Here’s an example:

[root@dockerhost] ~# docker run -d -p 4444:4444 selenium/standalone-chrome
Unable to find image 'selenium/standalone-chrome:latest' locally
Pulling repository selenium/standalone-chrome
c806a5e36041: Download complete 
511136ea3c5a: Download complete 
f0dde87450ec: Download complete 
76b658ecb564: Download complete 
4faa69f72743: Download complete 
2103b00b3fdf: Download complete 
60436a106b63: Download complete 
a5c56ead162a: Download complete 
1bcd40b41d9f: Download complete 
827b3070b898: Download complete 
f4f79c0be042: Download complete 
16bd409ea0a4: Download complete 
cd8ff3fed89b: Download complete 
4d67331e6a88: Download complete 
25e1b30f6eed: Download complete 
96ce19254976: Download complete 
8f0aaca2aae7: Download complete 
8e8240458885: Download complete 
cc1baa889ab6: Download complete 
2056ca638414: Download complete 
0606bc3f54f6: Download complete 
31a41159beb8: Download complete 
b532c7ea89cb: Download complete 
4129af115033: Download complete 
ba449c72b933: Download complete 
6f5a2f2e02a8: Download complete 
ba2fb7eae244: Download complete 
12317e85b372: Download complete 
4e75ed61c12f: Download complete 
5c9def4180f1: Download complete 
464ec9e0e9fb: Download complete 
9aaa498f52ed: Download complete 
639eff742ba8: Download complete 
a6fa8f2703b2: Download complete 
1715776d49ae: Download complete 
d834fd67171e: Download complete 
4b588c5bce51: Download complete 
cd22dea8848a: Download complete 
Status: Downloaded newer image for selenium/standalone-chrome:latest

[root@dockerhost] ~# docker ps
CONTAINER ID        IMAGE                               COMMAND                CREATED             STATUS              PORTS                    NAMES
c29d453b4d73        selenium/standalone-chrome:latest   "/opt/bin/entry_poin   2 minutes ago       Up 2 minutes        0.0.0.0:4444->4444/tcp   naughty_wright   

There! Up and running. Now I just need to use it. Selenium has client bindings for several languages; I’ll show the Node.js way. In this case, I’ll be running Node.js on another machine (with the selenium-webdriver module installed in the standard /usr/lib/node_modules directory). It will talk to my Docker container running on the host above (or on Amazon, or wherever I want it).
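Since the tests run from another machine, it can help to confirm the container is actually answering before starting them. This sketch assumes curl is available on the client. It also assumes the image serves the usual Selenium server status page at /wd/hub/status.

```shell
#!/usr/bin/env bash
# Poll a URL until it returns an HTTP success status, giving up after TIMEOUT seconds.
wait_for_url() {
  local url=$1 timeout=${2:-60} waited=0
  until curl -fsS -o /dev/null --max-time 5 "$url" 2>/dev/null; do
    if (( waited >= timeout )); then
      echo "no answer from $url after ${timeout}s" >&2
      return 1
    fi
    sleep 2
    waited=$((waited + 2))
  done
}
```

For example: "wait_for_url http://dockerhost:4444/wd/hub/status 60 && node nat.js".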

Here’s my script: note it connects to dockerhost:4444 as set up above…

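[root@DEV] tmp# cat nat.js 
var webdriver = require("selenium-webdriver");
function createDriver() {
    // Connect to the selenium/standalone-chrome container on dockerhost:4444
    return new webdriver.Builder()
        .usingServer("http://dockerhost:4444/wd/hub")
        .withCapabilities({ browserName: "chrome" })
        .build();
}
var driver = createDriver();
driver.get("http://www.example.com/") // hypothetical URL: substitute the page to test
    .then(function () { return driver.getTitle(); })
    .then(function (title) { console.log(title); })
    .then(function () { return driver.quit(); });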

Then all I need to do is run it! It should fetch the page and spit out the title.

[root@DEV] tmp# export NODE_PATH=/usr/lib/node_modules
[root@DEV] tmp# node nat.js 

Now I have a lot more incentive to learn how to use Selenium better, since the barriers to getting started are lower.

Automate provisioning a Linux VM in Microsoft Azure

At my company we’ve been looking at various cloud providers, including Microsoft Azure. My interest has always been in automating computer configuration, particularly on Linux with Puppet. Most cloud providers have an API for kicking off a custom script on a VM once it’s freshly installed and running, but Microsoft’s API does not seem to have anything like that. Sufficient googling showed that others were reporting a similar problem with no clear solution, hence this blog post describing my approach.

I have to say, the xplat-cli (Cross Platform Command Line Interface), based on Node.js, is actually quite nice for programmers and fairly easy to use. But as mentioned, there’s not really a way to automate kicking off customization. The closest I found was the “CustomData” parameter, which lets you upload a file that must be 64 KB or less once base64-encoded. The data gets embedded in an XML file, /var/lib/waagent/ovf-env.xml, but nothing on the VM decodes and runs it.

So, there are three options:

  1. Don’t use the CustomData piece at all. Just use a script that creates your VM, then uses the ssh key you provisioned it with to scp a custom script for that VM over to it, then ssh to the VM and run that script with sudo.
  2. Similar to the above, but rather than scp over a custom script, scp a fixed script that decodes the CustomData field from the XML file, writes it out as a script, and runs it. This is a little more involved than #1, but it moves the VM customizations into the CustomData parameter rather than a custom script copied to each VM. I’m not really sure this practically buys you anything over #1, but it’s what I will outline below, since it’s the most encompassing of the three.
  3. Finally, you can create a VM image whose init scripts, on first boot, check the CustomData field, decode the data to a script, and run it.

In the example below, I assume you have already installed the azure-cli and connected your Azure subscription.  (Note that I edited the installed “bin/azure” command to find the fully qualified azure.js script, and “azure” is in my path)

Create your VM called “nattest” with a command similar to:

$ azure vm create --vm-size extrasmall --location "East US" --ssh 22 --no-ssh-password --ssh-cert ~/.ssh/NatAzureCert.pem --custom-data ~/Azure/linux/NatCustomTest nattest 0b11de9248dd4d87b18621318e037d37__RightImage-CentOS-6.5-x64-v13.5.2 nat

info:    Executing command vm create
+ Looking up image 0b11de9248dd4d87b18621318e037d37__RightImage-CentOS-6.5-x64-v13.5.2
+ Looking up cloud service
+ Creating cloud service
+ Retrieving storage accounts
+ Configuring certificate
+ Creating VM
info:    vm create command OK

Incidentally, you can get info about your new cloud server, including its IP address, by:

$ azure vm list --dns-name nattest --json
[
  {
    "DNSName": "nattest.cloudapp.net",
    "VMName": "nattest",
    "IPAddress": "",
    "InstanceStatus": "RoleStateUnknown",
    "InstanceSize": "ExtraSmall",
    "InstanceStateDetails": "",
    "OSVersion": "",
    "Image": "0b11de9248dd4d87b18621318e037d37__RightImage-CentOS-6.5-x64-v13.5.2",
    "OSDisk": {
      "HostCaching": "ReadWrite",
      "DiskName": "nattest-nattest-0-201402212150500652",
      "MediaLink": "http://portalvhdsz934l0cn6dph9.blob.core.windows.net/vhd-store/nattest-87fbac9b59526826.vhd",
      "SourceImageName": "0b11de9248dd4d87b18621318e037d37__RightImage-CentOS-6.5-x64-v13.5.2",
      "OS": "Linux"
    },
    "DataDisks": "",
    "Network": {
      "Endpoints": [
        {
          "LocalPort": "22",
          "Name": "ssh",
          "Port": "22",
          "Protocol": "tcp",
          "Vip": "",
          "EnableDirectServerReturn": "false"
        }
      ]
    }
  }
]

The output above shows where to reach the VM. Once it’s ready (a minute or two after the command exits, since the VM is still booting), I can ssh to it with the private key corresponding to the public key I included when creating the machine.

So, notice that in the create command I included the --custom-data parameter with a filename (~/Azure/linux/NatCustomTest). That file contains whatever custom steps I want root to run, for example installing Puppet:

#!/bin/bash
# Install puppet
rpm -ivh https://yum.puppetlabs.com/el/6/products/x86_64/puppetlabs-release-6-7.noarch.rpm
yum install -y yum-plugin-fastestmirror puppet

# etech repo
cd /etc/yum.repos.d
wget http://etechrepo.ops.invesco.net/etech.repo

# Get preconfigured puppet keys on
# ...

# run puppet
# ...

That file’s contents get base64-encoded and put into an XML file on the server when it’s provisioned. My script that creates the VM then needs to poll the VM to see when it’s ready, which means getting the IP address and running a test. The following works well as long as nc does not time out. It didn’t in my Linux tests, but it did when checking RDP on Windows servers, which took a lot longer to boot!

# Get the IP address
IPADDRESS=`azure vm list --json --dns-name nattest | grep Vip | cut -f4 -d\"`
echo "VM created at $IPADDRESS... Waiting for VM to come up..."
nc -zv $IPADDRESS 22
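For slower boots (like those Windows servers), the single nc call can be wrapped in a retry loop so that a connect timeout doesn’t end the wait early. In this sketch, the 600-second default and the 5-second retry interval are arbitrary choices.

```shell
#!/usr/bin/env bash
# Retry a TCP connect until HOST:PORT accepts connections or TIMEOUT seconds pass.
# Each attempt uses nc's zero-I/O mode with a 5-second connect timeout.
wait_for_port() {
  local host=$1 port=$2 timeout=${3:-600} start=$SECONDS
  until nc -z -w 5 "$host" "$port" >/dev/null 2>&1; do
    if (( SECONDS - start >= timeout )); then
      echo "$host:$port still closed after ${timeout}s" >&2
      return 1
    fi
    sleep 5
  done
}
```

For the Linux VM this would be "wait_for_port $IPADDRESS 22 || exit 1". For a Windows server, you would check RDP on port 3389 instead.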

Once that’s up, I scp my script to deal with the CustomData and run it:

scp -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null -i ~/.ssh/NatAzureKey.key ~/Azure/linux/runCustomData ${IPADDRESS}:
ssh -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null -i ~/.ssh/NatAzureKey.key -t -q $IPADDRESS "sudo ./runCustomData"

The only remaining piece is what’s in the runCustomData script:

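#!/usr/bin/perl
# Script for bootstrapping an Azure Linux VM
use MIME::Base64;
my $datafile = "/var/lib/waagent/ovf-env.xml";
my $initscript = "/root/azureCustomData.sh";   # hypothetical path; any root-only location works
open(R, $datafile) || die "Could not open $datafile";
while (<R>) {
  if (/CustomData>(.*)\<\/CustomData>/) {
    my $base64CD=$1;
    open(W, ">$initscript") || die "Could not write $initscript";
    print W decode_base64($base64CD)."\n";
    close(W);
    chmod (0555, $initscript);
    system($initscript) == 0 || die "$initscript failed";
  }
}
close(R);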
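On images without Perl’s MIME::Base64, the decode step could be done with sed and base64 instead. This is a sketch of the same approach. Like the Perl regex, it assumes the CustomData element sits on one line. It also accepts a namespace prefix on the closing tag (e.g. </ns1:CustomData>). The caller chooses the output path.

```shell
#!/usr/bin/env bash
# decode_custom_data OUTFILE [XML]: extract the base64 CustomData payload from the
# provisioning XML and write the decoded script to OUTFILE (owner-only permissions).
decode_custom_data() {
  local out=$1 xml=${2:-/var/lib/waagent/ovf-env.xml} payload
  [ -n "$out" ] || { echo "usage: decode_custom_data OUTFILE [XML]" >&2; return 1; }
  payload=$(sed -n 's/.*CustomData>\([^<]*\)<\/[^>]*CustomData>.*/\1/p' "$xml" | head -n 1)
  [ -n "$payload" ] || { echo "no CustomData in $xml" >&2; return 1; }
  printf '%s' "$payload" | base64 -d > "$out" || return 1
  chmod 0700 "$out"
}
```

For example: "decode_custom_data /root/customdata.sh && /root/customdata.sh", where the path is hypothetical.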

So, putting it all together, you have a six-line bash script that:

  1. Creates your VM
  2. Gets the VM’s IP address and reports it
  3. Polls the VM until it is up
  4. SCPs the runCustomData script to your user account
  5. SSHes to your user account and runs the runCustomData script as root. That script decodes the CustomData and runs it, which installs Puppet and does whatever else you want.
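Assembled from the commands in this post, that script might look like the function below. The VM name, image, certificate, key and file paths are the ones used above. The only additions are the function wrapper and early returns when a step fails.

```shell
#!/usr/bin/env bash
# The steps listed above, assembled into one function. VM name, image, cert, key,
# and file paths are the ones used in this post; substitute your own.
provision_vm() {
  local vm=${1:-nattest} ip
  local image=0b11de9248dd4d87b18621318e037d37__RightImage-CentOS-6.5-x64-v13.5.2
  local key="$HOME/.ssh/NatAzureKey.key"
  local opts=(-o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null -i "$key")

  # 1. Create the VM, passing the custom script as CustomData
  azure vm create --vm-size extrasmall --location "East US" --ssh 22 --no-ssh-password \
    --ssh-cert "$HOME/.ssh/NatAzureCert.pem" --custom-data "$HOME/Azure/linux/NatCustomTest" \
    "$vm" "$image" nat || return 1

  # 2. Get the IP address and report it
  ip=$(azure vm list --json --dns-name "$vm" | grep Vip | cut -f4 -d'"')
  echo "VM created at $ip... Waiting for VM to come up..."

  # 3. Wait for ssh (a single nc call; gives up if the connect times out)
  nc -zv "$ip" 22 || return 1

  # 4-5. Copy runCustomData over and run it as root
  scp "${opts[@]}" "$HOME/Azure/linux/runCustomData" "${ip}:" || return 1
  ssh "${opts[@]}" -t -q "$ip" "sudo ./runCustomData"
}
```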

If establishing a longer-term approach, I’d go with option 3 and not have to scp over the runCustomData script. If going quick and dirty, I’d go with option 1, which doesn’t have the 64 KB limit on the custom script. Option 2 is really only best for showing how options 1 and 3 might be implemented. That said, it could be argued that it beats option 3 because you can use any stock VM, rather than having to keep updating a VM image with patches and then your custom script.

At any rate, have fun, and please let me know of suggestions for improving the process, or if I missed something completely obvious.


Clickjacking is a vulnerability where pages with sensitive functionality are placed in an invisible IFRAME that overlays seemingly innocuous content. By enticing the user to click various buttons in the innocuous content, the attacker can get victims to click buttons that perform sensitive functionality. Because the victim is actually interacting with the application through the hidden frame, the victim’s cookies containing the session identifier are being sent with each request. If they are already authenticated, any authenticated functionality would be accessible.

Steps to reproduce:
1. Open the HTML file below in Internet Explorer, changing the IFRAME target to a page with form input.

<html><head><script>
var keylog='Entered text: ';
function keypress() {
keylog = keylog + String.fromCharCode(window.event.keyCode);
window.status = keylog;
}
</script></head>
<body style="margin: 0; padding: 0" onkeypress="keypress()">
<div style="padding: 10px; border-bottom: 1px solid red; color: red;">
(see typed words in your status bar)
</div>
<iframe src="https://www.somesite.com/"
width="100%" height="90%" padding="0"
margin="0" frameborder="0" security="Restricted">
</iframe>
</body></html>

2. Enter text in any input field and observe that the page hosting the IFRAME echoes back the text typed into the framed page. Creepy!

Pages that include form input need to prevent other pages from embedding them in iframes and stealing keypresses. The following JavaScript can be used to “break out” of any frames and ensure that the site is loaded in the top window, not in a frame controlled by the attacker.

if (top!= self) top.location.href = self.document.location;
if (parent!= self) top.location.href = location.href;
if (top.frames.length!=0) top.location=self.document.location;
if (window!= window.top) top.location.href = location.href;
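A related, header-based defense is a different technique from the script above. The server sends X-Frame-Options: DENY or SAMEORIGIN, or a Content-Security-Policy with a frame-ancestors directive, which tells supporting browsers not to render the page inside another site’s frame. The sketch below checks a site’s response headers for either one. It only tests for presence and does not judge whether the directive’s value is strict enough.

```shell
#!/usr/bin/env bash
# Read HTTP response headers on stdin; succeed if they include X-Frame-Options
# DENY/SAMEORIGIN or a Content-Security-Policy frame-ancestors directive.
has_frame_protection() {
  local headers
  headers=$(tr -d '\r' | tr '[:upper:]' '[:lower:]')
  grep -Eq '^x-frame-options:[[:space:]]*(deny|sameorigin)' <<<"$headers" && return 0
  grep -Eq '^content-security-policy:.*frame-ancestors' <<<"$headers"
}
```

For example: "curl -sI https://www.somesite.com/ | has_frame_protection && echo protected || echo frameable".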