Random problems with spring boot and Docker

We run a bunch of Java spring-boot microservices in docker containers. I’m not sure java is the best choice when it comes to running a lot of containers, simply because of JVM memory allocation, but that’s a different topic. The issue I want to address here is that we simply cannot start a bunch at once, or they… simply….. slow……. down…….. to ……. a ………. crawl. Server startup times go from 15 seconds to, in cases where 20 containers are starting up, 30 minutes! THAT’S CRAZY!

At any rate, this morning after server patching, I decided that I’d had enough, as this was going to take me an hour to start them all up one by one. I dove into the logs of one of the ones that did not start within my arbitrary 90 second timeframe (much longer than the usual 25 seconds when starting solo), and here’s what I saw:

  .   ____          _            __ _ _
/\\ / ___'_ __ _ _(_)_ __  __ _ \ \ \ \
( ( )\___ | '_ | '_| | '_ \/ _` | \ \ \ \
\\/  ___)| |_)| | | | | || (_| |  ) ) ) )
  '  |____| .__|_| |_|_| |_\__, | / / / /
=========|_|==============|___/=/_/_/_/
:: Spring Boot ::        (v1.3.0.RELEASE)
 
2016-07-17 07:13:20.456  INFO 1 --- [           main] com.company.EventApplication             : Starting EventApplication on 16bad0525076 with PID 1 (/opt/java1-service-1.0.0-SNAPSHOT.jar started by root in /)
2016-07-17 07:13:20.475  INFO 1 --- [           main] com.company.EventApplication             : The following profiles are active: production
2016-07-17 07:13:20.755  INFO 1 --- [           main] ationConfigEmbeddedWebApplicationContext : Refreshing org.springframework.boot.context.embedded.AnnotationConfigEmbeddedWebApplicationContext@619ffb1: startup date [Sun Jul 17 07:13:20 CDT 2016]; root of context hierarchy
2016-07-17 07:13:23.171  INFO 1 --- [           main] o.s.b.f.s.DefaultListableBeanFactory     : Overriding bean definition for bean 'beanNameViewResolver' with a different definition: replacing [Root bean: class [null]; scope=; abstract=false; lazyInit=false; autowireMode=3; dependencyCheck=0; autowireCandidate=true; primary=false; factoryBeanName=org.springframework.boot.autoconfigure.web.ErrorMvcAutoConfiguration$WhitelabelErrorViewConfiguration; factoryMethodName=beanNameViewResolver; initMethodName=null; destroyMethodName=(inferred); defined in class path resource [org/springframework/boot/autoconfigure/web/ErrorMvcAutoConfiguration$WhitelabelErrorViewConfiguration.class]] with [Root bean: class [null]; scope=; abstract=false; lazyInit=false; autowireMode=3; dependencyCheck=0; autowireCandidate=true; primary=false; factoryBeanName=org.springframework.boot.autoconfigure.web.WebMvcAutoConfiguration$WebMvcAutoConfigurationAdapter; factoryMethodName=beanNameViewResolver; initMethodName=null; destroyMethodName=(inferred); defined in class path resource [org/springframework/boot/autoconfigure/web/WebMvcAutoConfiguration$WebMvcAutoConfigurationAdapter.class]]
2016-07-17 07:13:24.728  INFO 1 --- [           main] s.b.c.e.t.TomcatEmbeddedServletContainer : Tomcat initialized with port(s): 8080 (http)
2016-07-17 07:13:24.783  INFO 1 --- [           main] o.apache.catalina.core.StandardService   : Starting service Tomcat
2016-07-17 07:13:24.786  INFO 1 --- [           main] org.apache.catalina.core.StandardEngine  : Starting Servlet Engine: Apache Tomcat/8.0.28
2016-07-17 07:13:25.180  INFO 1 --- [ost-startStop-1] o.a.c.c.C.[Tomcat].[localhost].[/]       : Initializing Spring embedded WebApplicationContext
2016-07-17 07:13:25.181  INFO 1 --- [ost-startStop-1] o.s.web.context.ContextLoader            : Root WebApplicationContext: initialization completed in 4451 ms
2016-07-17 07:13:25.790  INFO 1 --- [ost-startStop-1] o.s.b.c.e.ServletRegistrationBean        : Mapping servlet: 'dispatcherServlet' to [/]
2016-07-17 07:13:25.797  INFO 1 --- [ost-startStop-1] o.s.b.c.embedded.FilterRegistrationBean  : Mapping filter: 'metricFilter' to: [/*]
2016-07-17 07:13:25.798  INFO 1 --- [ost-startStop-1] o.s.b.c.embedded.FilterRegistrationBean  : Mapping filter: 'characterEncodingFilter' to: [/*]
2016-07-17 07:13:25.798  INFO 1 --- [ost-startStop-1] o.s.b.c.embedded.FilterRegistrationBean  : Mapping filter: 'hiddenHttpMethodFilter' to: [/*]
2016-07-17 07:13:25.799  INFO 1 --- [ost-startStop-1] o.s.b.c.embedded.FilterRegistrationBean  : Mapping filter: 'httpPutFormContentFilter' to: [/*]
2016-07-17 07:13:25.799  INFO 1 --- [ost-startStop-1] o.s.b.c.embedded.FilterRegistrationBean  : Mapping filter: 'requestContextFilter' to: [/*]
2016-07-17 07:13:25.799  INFO 1 --- [ost-startStop-1] o.s.b.c.embedded.FilterRegistrationBean  : Mapping filter: 'webRequestLoggingFilter' to: [/*]
2016-07-17 07:13:25.799  INFO 1 --- [ost-startStop-1] o.s.b.c.embedded.FilterRegistrationBean  : Mapping filter: 'applicationContextIdFilter' to: [/*]
2016-07-17 07:14:53.243  INFO 1 --- [ost-startStop-1] o.a.c.util.SessionIdGeneratorBase        : Creation of SecureRandom instance for session ID generation using [SHA1PRNG] took [87,395] milliseconds.

WHOA. What’s that last line say? “Creation of SecureRandom instance for session ID generation using [SHA1PRNG] took [87,395] milliseconds” Wow, almost 90 seconds. I’m not a Java developer, but I do know “PRNG” = Pseudo Random Number Generator. I also know that a PRNG needs to be fed entropy / randomness to be effective. Seeing that the embedded Tomcat in spring boot was using it to start up, I decided to try a test. But before that, I wanted to also measure the amount of entropy the system had.

With a little googling, I found the following command line works:

#cat /proc/sys/kernel/random/entropy_avail
135

I also found that collectd has a simple entropy module, so I included it and now have entropy readings every 10 seconds being sent to graphite.

Now for my test… I stopped and started 4 microservices via our microservice “swap out” script – start a new container up, and when “Server started in X seconds” is shown, remove the previous one of the same image, while updating the vulcan proxy load balancer. Hence, once one was fully up, I started the next…

# for i in java2-service java3-service java4-service java5-service; do ./$i/swapout; done

Once complete, I looked for instances of PRNG in their log files:

# for i in java2-service java3-service java4-service java5-service; do docker logs $i | grep PRNG; done
2016-07-17 08:54:39.224  INFO 1 --- [ost-startStop-1] o.a.c.util.SessionIdGeneratorBase        : Creation of SecureRandom instance for session ID generation using [SHA1PRNG] took [17,541] milliseconds.
2016-07-17 08:55:49.809  INFO 1 --- [ost-startStop-1] o.a.c.util.SessionIdGeneratorBase        : Creation of SecureRandom instance for session ID generation using [SHA1PRNG] took [19,913] milliseconds.
2016-07-17 08:57:03.131  INFO 1 --- [ost-startStop-1] o.a.c.util.SessionIdGeneratorBase        : Creation of SecureRandom instance for session ID generation using [SHA1PRNG] took [28,985] milliseconds.
2016-07-17 08:58:29.855  INFO 1 --- [ost-startStop-1] o.a.c.util.SessionIdGeneratorBase        : Creation of SecureRandom instance for session ID generation using [SHA1PRNG] took [57,045] milliseconds.

Sure enough, each service took longer and longer to get enough entropy. You can see below that the entropy bounce back a little after the first hit, and a little again after the second hit, but the third and fourth hits both have entropy values down low. Eventually it recovers.

entropy-1

We have a couple of options to make this better:

  • Tell tomcat to use /dev/urandom rather than /dev/random – /dev/urandom is not blocking. That’s not great for anything using crypto, but should not add ANY additional time to create the Tomcat’s SecureRandom instance. I imagine each of the above samples would all take the same time as the first one did, around 17 seconds.
  • Find a way to add entropy to the system. I generally prefer this method, since we wouldn’t want to systematically have all our tomcat instances use the less secure one and inadvertently have a service where it’s critical not have good randomness.

I investigated and found the haveged daemon, which adds randomness to the system from process run times… and WOW:

entropy-2

I then ran the same four microservice swapouts, and not only was there no impact to later microservices, overall microservice startup time was noticeably faster. (the above chart includes MS startup) There was no PRNG line in the log (I believe if there’s no delay, there’s no mention in the log), so nothing real to compare against other than the visually faster bit. Looking at some of our tomcat services that are not in docker, I see they suffer from the same PRNG issue occasionally. This will help all of our tomcat instances start up sooner.

Automating Automated Testing

One of the popular tools to test websites these days is Selenium, which is used to pass scripted actions to a browser such as Firefox or Chrome.   Unfortunately, it’s been a pain to set up, and you need to get a virtual frame buffer running for the browser (or perhaps run headless phantomjs, which is cool, but still WORK), and the browser itself… not to mention patch as updates come out.

Enter Docker: with docker, I can pull down the latest docker image provided by SeleniumHQ, with either Firefox or Chrome, and bingo!  It’s running and ready to go.   (That’s the general magic of docker, by the way)  Here’s an example:

[root@dockerhost] ~# docker run -d -p 4444:4444 selenium/standalone-chrome
Unable to find image 'selenium/standalone-chrome:latest' locally
Pulling repository selenium/standalone-chrome
c806a5e36041: Download complete 
511136ea3c5a: Download complete 
f0dde87450ec: Download complete 
76b658ecb564: Download complete 
4faa69f72743: Download complete 
2103b00b3fdf: Download complete 
60436a106b63: Download complete 
a5c56ead162a: Download complete 
1bcd40b41d9f: Download complete 
827b3070b898: Download complete 
f4f79c0be042: Download complete 
16bd409ea0a4: Download complete 
cd8ff3fed89b: Download complete 
4d67331e6a88: Download complete 
25e1b30f6eed: Download complete 
96ce19254976: Download complete 
8f0aaca2aae7: Download complete 
8e8240458885: Download complete 
cc1baa889ab6: Download complete 
2056ca638414: Download complete 
0606bc3f54f6: Download complete 
31a41159beb8: Download complete 
b532c7ea89cb: Download complete 
4129af115033: Download complete 
ba449c72b933: Download complete 
6f5a2f2e02a8: Download complete 
ba2fb7eae244: Download complete 
12317e85b372: Download complete 
4e75ed61c12f: Download complete 
5c9def4180f1: Download complete 
464ec9e0e9fb: Download complete 
9aaa498f52ed: Download complete 
639eff742ba8: Download complete 
a6fa8f2703b2: Download complete 
1715776d49ae: Download complete 
d834fd67171e: Download complete 
4b588c5bce51: Download complete 
cd22dea8848a: Download complete 
Status: Downloaded newer image for selenium/standalone-chrome:latest
c29d453b4d73e8196c3ef0e76f894d543e3ed0e5061b5c866d1a9ca4d6398833

[root@dockerhost] ~# docker ps
CONTAINER ID        IMAGE                               COMMAND                CREATED             STATUS              PORTS                    NAMES
c29d453b4d73        selenium/standalone-chrome:latest   "/opt/bin/entry_poin   2 minutes ago       Up 2 minutes        0.0.0.0:4444->4444/tcp   naughty_wright   

There! Up and running…. Now I just need to use it… there are several languages that will do so – I’ll show the Node.JS way. In this case, I’ll be running node.js from another machine (having installed the selenium webdriver in the standard /usr/lib/node_modules dir), and it will talk to my docker container running on the host above (or Amazon, or wherever I wanted it).

Here’s my script: note it connects to dockerhost:4444 as set up above…

[root@DEV] tmp# cat nat.js 
var webdriver = require("selenium-webdriver");
 
function createDriver() {
    var driver = new webdriver.Builder()
        .usingServer('http://dockerhost:4444/wd/hub')
        .withCapabilities(webdriver.Capabilities.chrome())
        .build();
    driver.manage().timeouts().setScriptTimeout(10000);
    return driver;
}
 
var driver = createDriver();
driver.get("http://www.google.com");
 
driver.getTitle().then(function (title) {
    console.log(title);
});
 
driver.quit();

Then all I need to do is run it! It should fetch the page and spit out the title.

[root@DEV] tmp# export NODE_PATH=/usr/lib/node_modules
[root@DEV] tmp# node nat.js 
Google

Now I have a lot more incentive to learn how to use Selenium better, since the barriers to getting started are lower.