How to Avoid a Docker Database Disaster

Containers are cool, and everyone and their mother is trying to get on board with them. While many applications are a natural fit for containers, it feels like some are forced into containers so vendors can say, “Hey, look at me! I do containers too!” This is particularly true of database vendors who are using container hype to sell their software.

Imagine for a moment this not-so-unrealistic anecdote: your CEO just got back from a conference and heard all the really cool things that you can do with containers. He or she gives IT the edict to containerize everything after hearing the sales pitch: “Containers can unify DevOps pipelines for databases, apps, and resources in IT. Containers are easier and faster to set up and install than virtual machines. Containers lower management needs and hardware requirements relative to VMs by reducing infrastructure. All of this means huge cost savings. Wow! Aren’t containers great!?” So now you’re faced with this edict, and you have to figure out how to take a massive MS-SQL cluster and containerize it… or do you?

Perhaps not on this scale, but this scenario is one that enterprises are facing every day. The decision becomes less obvious when the scale is not as large as in the anecdote above but not as trivial as your grandma’s cooking blog either. The general rule of thumb for databases in containers has historically been “don’t do it” for two main reasons, both of which fly in the face of the kinds of workloads containers are intended for.

First, there’s the problem of data persistence. This is the foremost challenge of containers in general, and databases magnify it considerably. The storage story in containers, particularly in a fully managed environment, is pretty rough even as container technology continues to mature. In an unmanaged environment installed on IaaS or on premises, persistence requires the container technology to support volumes that can be mounted inside containers. There’s a myriad of technologies that can serve up volumes from storage devices and servers in the container space, but using volumes raises security concerns, among other things, as they are not always secure by default. One has to prevent other containers from mounting the database volume, and prevent other servers and services from seeing the data. Moreover, setting up volumes is not as trivial as it may seem.

Second, there’s the problem of application state. Containers can run all kinds of applications, but they are primarily intended for stateless applications. Nodes on a database cluster need to be able to discover and communicate with one another to share state information. Statefulness in an environment introduces challenges when one wants to scale up or down and also when one wants to create an environment with high availability and/or redundancy.

While these problems are not showstoppers, one must first ask: are containers the best option I have? If you’re considering going to the cloud, the answer to this question is almost always no. Most popular public clouds offer much better options through DBaaS. Microsoft Azure, for example, has Azure SQL, a PaaS solution for MS SQL databases. Microsoft also recently announced Cosmos DB, which offers a MongoDB-compatible API for document databases as well as the DocumentDB API and other popular storage formats. Likewise, Azure currently has MySQL and PostgreSQL DBaaS offerings in preview. All of these technologies are fully managed, meaning that encryption, scale, backups, redundancy, and so on are all managed by Microsoft in world-class data centers with best-of-breed IT security and physical security. Given the nature of DBaaS in the cloud, containers seem like a poor choice for databases.

On premises, though, DBaaS isn’t an option. It’s tempting to think of containers as the next best option, but this isn’t always the case. Scale as it relates to databases involves multiple factors: traffic volume, kinds of traffic, database complexity, database size, and demanding SLAs. One of the main drivers behind setting up a multi-node DBMS is to accommodate databases that score so high on one or more of these factors that an application requires all the resources of a given machine and then some. In such scenarios, containerizing the database will bring very little benefit. In fact, it will probably hurt rather than help, as it adds another abstraction layer for the database to go through, namely the container engine. Likewise, the benefits of containers are not realized, given that you’re likely to get only one container per machine in such scenarios, and one of the main reasons for going to containers is to improve application density on a given piece of physical or virtual hardware.

Where containers become a possibility, though, is for databases of a smaller scale that don’t have the sorts of requirements demanded by larger applications. In this case, the database’s volume, complexity, and size are low enough that it doesn’t require large, dedicated machinery to run, and the SLAs don’t call for redundancy. Some examples may be internal websites or blogs that don’t need high availability, one-off data gathering tools like surveys, or brochureware. These sorts of applications may be ideal for containerization, so here are a few considerations for putting their databases in containers.

Regardless of the size of the database, always, always, always persist data outside the container on a volume. This way, you don’t accidentally delete the data, either by pushing a new container when updating the software or by deleting the container.
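To make the volume advice concrete, here is a minimal sketch in Python that only builds (and prints) a `docker run` command line; it does not execute anything. The container name, volume name, image tag, and data directory are all illustrative assumptions, not values from any particular setup. The data directory shown is the one commonly used by the `postgres:16` image, so check your own image's documentation before relying on it.

```python
import shlex


def db_run_command(name, image, volume, data_dir):
    """Build a `docker run` argv that keeps database files on a named volume.

    All arguments are hypothetical placeholders supplied by the caller.
    """
    return [
        "docker", "run", "--detach",
        "--name", name,
        # Mount a named volume at the database's data directory so the data
        # outlives the container when it is replaced or removed.
        "--mount", f"type=volume,source={volume},target={data_dir}",
        image,
    ]


# Assumed example values; adjust for your own image and naming scheme.
cmd = db_run_command(
    name="blog-db",
    image="postgres:16",
    volume="blog-db-data",
    data_dir="/var/lib/postgresql/data",
)
print(shlex.join(cmd))
```

A real deployment would need image-specific settings as well (for example, credentials passed as environment variables), which are left out here to keep the focus on the mount. Because the volume is named rather than tied to the container, removing `blog-db` and starting a container from a newer image with the same `--mount` argument should pick up the existing data.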

Ensure that your container environment has enough CPU and memory resources. If it does not, database performance will suffer.

Keep the container as secure as possible. With Docker, it’s simple to set up an isolated network that only the containers for a given application can access. The database can be completely isolated from the external network this way.

Don’t try to recreate your database infrastructure using containers as a replacement for virtual machines or physical machines. This is especially true for applications whose SLA is not mission critical. Rather, for smaller applications, try to keep a single database per DBMS instance. Treat the database as an application component rather than as a separate environment that hosts databases. This approach also works well for open source databases, given that there are no restrictions on how many instances one can spin up.
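The resource and network considerations above can be sketched the same way. The Python below only assembles Docker command lines as lists of arguments and prints them; nothing is executed. The network name, container name, image, and the CPU and memory figures are assumptions chosen for illustration, not sizing recommendations.

```python
import shlex


def network_create_command(network):
    """Build a command that creates a network with no external access."""
    # --internal restricts external access to the network.
    return ["docker", "network", "create", "--internal", network]


def isolated_db_command(name, image, network, cpus, memory):
    """Build a `docker run` argv with explicit CPU/memory limits.

    No --publish flag is added, so no container port is mapped to the host.
    """
    return [
        "docker", "run", "--detach",
        "--name", name,
        "--network", network,
        "--cpus", str(cpus),   # CPU limit, e.g. 1.5 cores
        "--memory", memory,    # memory limit, e.g. "1g"
        image,
    ]


# Hypothetical values for a small internal application.
commands = [
    network_create_command("survey-backend"),
    isolated_db_command("survey-db", "postgres:16", "survey-backend", 1.5, "1g"),
]
for argv in commands:
    print(shlex.join(argv))
```

One design note: an application container that must also serve outside traffic would typically be attached to this internal network and to a second, outward-facing one, so only the application, not the database, is reachable from outside.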

As a container fanboy, I think containerization is cool technology, but just because something is cool doesn’t mean it’s a golden hammer that’s going to solve all my problems. No technology ever is, despite what the sales pitch says. The bottom line is to be cautious when considering containers for databases. This is one of those areas where containers are probably not the best choice in many cases; sometimes, however, for smaller applications the benefits of containers outweigh the costs. With this in mind, it’s always wise to count the cost of implementing a database solution with containers.