Zero-Downtime Deployments for Node.js Applications: A Step-by-Step Guide with AWS ECS and Terraform
Deploying updates to a live application can be risky, especially if downtime affects user experience. This guide will walk you through setting up zero-downtime deployments for a Node.js application using AWS ECS (Elastic Container Service) and Terraform. We’ll explore setting up an ECS cluster, deploying a Node.js app with Docker, and using Terraform to manage the infrastructure.
Prerequisites
Before we begin, ensure you have the following:
• AWS Account: Access to an AWS account with permissions to create and manage ECS, ALB, VPC, and other resources.
• Terraform: Installed and configured to manage AWS resources.
• Docker: Installed locally to build and test the Node.js application.
• Node.js and NPM: Installed for application development and testing.
Why Choose AWS ECS for Zero-Downtime Deployments?
AWS Elastic Container Service (ECS) is a fully managed container orchestration service that simplifies deploying, managing, and scaling containerized applications. ECS excels at zero-downtime deployments for several reasons:
- Managed Service: ECS abstracts the complexities of managing underlying compute resources, allowing developers to focus solely on container deployment and management.
- Blue-Green Deployment Support: When combined with an Application Load Balancer (ALB), ECS effortlessly facilitates blue-green deployments. This strategy enables deploying a new version of an application alongside the old one, with traffic switched to the new version once it’s proven stable.
- Seamless AWS Service Integration: ECS integrates smoothly with other AWS services such as CloudWatch for monitoring, IAM for security, and ECR for Docker image management. This integration creates a robust and secure environment for deploying production workloads.
- Serverless Options via AWS Fargate: ECS supports Fargate, a serverless compute engine that eliminates EC2 instance management. Fargate handles scaling, patching, and infrastructure security, further streamlining zero-downtime deployments.
- Dynamic Scaling and Load Balancing: ECS offers auto-scaling capabilities, ensuring your application scales up or down based on demand. When paired with Application Load Balancers, this ensures a consistent and reliable user experience without interruptions.
Introduction to Setting Up a Node.js Application for Zero Downtime
To set up a zero-downtime deployment pipeline for a Node.js application on AWS ECS, we need to follow a series of steps:
1. Dockerize the Application: Package the Node.js application and its dependencies into a Docker container.
2. Create an ECS Cluster and Task Definitions: Define how the Docker container will run in the ECS cluster using task definitions.
3. Set Up an Application Load Balancer (ALB): Use an ALB to distribute incoming traffic across multiple ECS tasks and manage blue-green deployments.
4. Deploy the Application Using Terraform: Use Terraform to automate the provisioning and deployment of AWS resources.
Why Dockerize the Application?
Dockerizing an application means packaging it with all its dependencies, libraries, and environment settings into a single, lightweight container. This container can be deployed consistently across different environments without worrying about environment-specific issues. Dockerization is a critical step for several reasons:
- Consistency Across Environments: Docker ensures that the application runs the same way in development, staging, and production environments, eliminating the “works on my machine” problem.
- Isolation: Containers provide isolation from the host system and other containers. This makes managing dependencies, libraries, and runtime environments easier and reduces conflicts.
- Simplified Deployment: Docker images provide a standardized unit of deployment. Once the application is packaged into a Docker image, it can be deployed to any Docker-compatible environment, such as AWS ECS.
- Scalability: Docker containers are lightweight and can be quickly started, stopped, or replicated. This makes scaling applications easier and more efficient, which is essential for managing traffic and ensuring zero downtime.
Here, we have a simple Node.js application to deploy to AWS ECS. Let’s dockerize the application first.
Dockerizing a Node.js Application
Dockerizing a Node.js application involves creating a Docker image that contains the application code and all the necessary dependencies. This image can then be deployed to any environment that supports Docker.
Here’s a basic overview of the steps to Dockerize a Node.js application:
1. Create a Dockerfile: A Dockerfile is a script containing instructions on how to build a Docker image for the application. Here’s an example:
FROM --platform=linux/amd64 node:lts-alpine as builder
# Create app directory
WORKDIR /app
COPY package*.json ./
RUN npm ci
COPY . .
RUN npm run build
# Production image
FROM --platform=linux/amd64 node:lts-alpine
ENV NODE_ENV=production
USER node
WORKDIR /app
COPY package*.json ./
RUN npm ci --production
COPY --from=builder /app/build ./build
ENV PORT=8080
EXPOSE 8080
CMD [ "node", "build/main.js" ]
The provided Dockerfile uses a multi-stage build process to create a Docker image for a Node.js application. This approach is efficient because it separates the build environment from the production environment, resulting in a smaller, more secure, and optimized image for deployment. Let’s break down each part of the Dockerfile step-by-step:
1. Build Stage
The first stage is the build stage where the application is built.
# Build stage
FROM --platform=linux/amd64 node:lts-alpine as builder
• FROM --platform=linux/amd64 node:lts-alpine as builder: This line specifies the base image for the build stage. It uses the node:lts-alpine image, a lightweight Alpine Linux-based Node.js image. The --platform=linux/amd64 option pins the build to the linux/amd64 architecture, ensuring the image is built for a specific platform regardless of the host machine (useful when building on Apple Silicon, for example). The as builder part names this stage so it can be referenced later.
# Create app directory
WORKDIR /app
• WORKDIR /app: This sets the working directory inside the Docker container to /app. All subsequent commands will be run from this directory.
COPY package*.json ./
RUN npm ci
• COPY package*.json ./: This copies the package.json and package-lock.json files from the host machine to the current working directory inside the Docker container.
• RUN npm ci: This installs the dependencies listed in package-lock.json using npm ci (clean install). npm ci is faster and more reliable for CI/CD environments because it uses the exact versions specified in package-lock.json.
COPY . .
RUN npm run build
• COPY . .: This copies all the application source files from the host machine to the Docker container.
• RUN npm run build: This runs the build command defined in package.json to create a production-ready build of the application. The resulting files are typically placed in a build directory.
2. Production Stage
The second stage is the production stage, where the final production image is created.
# Production image
FROM --platform=linux/amd64 node:lts-alpine
• FROM --platform=linux/amd64 node:lts-alpine: This starts a new stage using the same node:lts-alpine base image but without the as builder alias. This is a fresh, minimal environment that will only contain the production-ready application.
ENV NODE_ENV=production
USER node
WORKDIR /app
• ENV NODE_ENV=production: This sets the NODE_ENV environment variable to production, which optimizes the performance of the Node.js application for production use.
• USER node: This switches the user from root to node, which is a non-privileged user that comes with the node Docker image. Running as a non-root user enhances the security of the container.
• WORKDIR /app: This sets the working directory to /app for this stage as well.
COPY package*.json ./
RUN npm ci --production
• COPY package*.json ./: This copies the package.json and package-lock.json files again into the working directory of the production stage.
• RUN npm ci --production: This installs only the production dependencies by using the --production flag, which helps reduce the image size and the surface area for vulnerabilities.
COPY --from=builder /app/build ./build
• COPY --from=builder /app/build ./build: This copies the built application files from the /app/build directory of the builder stage into the ./build directory (i.e., /app/build) of the current (production) stage. This is where the built code lives and will be executed from.
ENV PORT=8080
EXPOSE 8080
• ENV PORT=8080: This sets the PORT environment variable to 8080, which is the port on which the Node.js application will run.
• EXPOSE 8080: This exposes port 8080 to the outside world, allowing external access to the application running inside the container.
CMD [ "node", "build/main.js" ]
• CMD [ "node", "build/main.js" ]: This is the command that runs when the Docker container starts. It starts the Node.js application by executing node build/main.js, which is the entry point of the built application.
Why Use a Multi-Stage Docker Build?
- Smaller Image Size: By separating the build and production stages, only the essential files and dependencies are included in the final image, reducing the overall size.
- Security: Running as a non-root user and excluding unnecessary build tools in the final image enhances security.
- Efficiency: Multi-stage builds avoid the overhead of creating and managing multiple Dockerfiles for different environments. They also speed up the CI/CD pipeline by ensuring that dependencies are cached and reused efficiently.
- Consistency: The build and deployment processes are defined in code (the Dockerfile), ensuring consistent and reproducible environments.
Now we have the Docker image ready to deploy to ECS. Let’s configure the infrastructure to provision the required resources.
Provisioning Infrastructure using Terraform
When configuring infrastructure to provision resources for deploying a Dockerized Node.js application on AWS ECS using Terraform, it’s common to follow a modular approach. This approach enhances code readability, reusability, and maintainability by organizing each resource in its own module. Here, we will follow the same approach, separating each resource into its own module and configuring them with Terraform.
Terraform structure
infra/
├── main.tf
├── variables.tf
├── outputs.tf
├── providers.tf
├── vpc/
│   ├── main.tf
│   ├── variables.tf
│   └── outputs.tf
├── ecs/
│   ├── main.tf
│   ├── variables.tf
│   └── outputs.tf
├── alb/
│   ├── main.tf
│   ├── variables.tf
│   └── outputs.tf
├── dynamodb/
│   ├── main.tf
│   ├── variables.tf
│   └── outputs.tf
└── cloudwatch/
    ├── main.tf
    ├── variables.tf
    └── outputs.tf
- main.tf: This is the root configuration file where the infrastructure’s main components are defined. It pulls together all the modules and provides overall orchestration; a sketch of how it might wire the modules together follows this list.
- variables.tf: This file contains input variables used across the infrastructure. These variables help make the code more dynamic and flexible.
- outputs.tf: Defines the outputs of the Terraform configuration, which are the values you may need to reference later, such as IDs, ARNs, or URLs.
- providers.tf: Specifies the cloud provider (AWS) and any additional provider-specific configurations required for the deployment.
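As a rough sketch of how the root main.tf might pull these modules together — the module inputs shown here are assumptions, since the actual variables.tf and outputs.tf files for each module are not shown in this guide:

module "vpc" {
  source     = "./vpc"
  aws_region = "${var.aws_region}"
}

module "alb" {
  source = "./alb"
  # Inputs such as the default subnet IDs and the VPC from the vpc
  # module's outputs would be passed through here.
}

module "ecs" {
  source = "./ecs"
  # Inputs such as the ECR repository URL/tag and the ALB target group
  # from the alb module's outputs would be passed through here.
}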
Module Structure
• vpc/: Sets up the Virtual Private Cloud (VPC), subnets, and security groups. This foundational module establishes the networking layer for other components.
• ecs/: Configures the ECS Cluster, Service, and Task Definition. It defines the compute environment for running the Dockerized application.
• alb/: Creates the Application Load Balancer (ALB). The ALB distributes incoming traffic to ECS Service instances, enabling zero-downtime deployments.
• cloudwatch/: Establishes CloudWatch Logs for monitoring and logging. This module captures application logs for troubleshooting.
Deploying the infrastructure requires a logical order to resolve dependencies correctly:
1. vpc/ Module: Begin with the VPC, subnets, and security groups. This foundational network layer must be in place before other components.
2. alb/ Module: Set up the Application Load Balancer next. The ALB depends on the subnets and security groups defined in the VPC module.
3. ecs/ Module: Deploy the ECS Cluster, Service, and Task Definition after the ALB. ECS resources rely on both the VPC for networking and the ALB for load balancing.
4. cloudwatch/ Module: Configure CloudWatch Logs to monitor ECS tasks. Deploy this module alongside or after the ECS module.
Now that we have an idea of the modules required to deploy the application to AWS ECS with zero downtime, let’s provision the infrastructure.
1. Configuring the Default VPC
The vpc/main.tf file is responsible for defining the networking environment required for deploying resources such as ECS (Elastic Container Service), the ALB (Application Load Balancer), and other AWS services. In this code, we create the default VPC along with default subnets in two different Availability Zones.
resource "aws_default_subnet" "default_subnet_ab" {
availability_zone = "${var.aws_region}a"
}
resource "aws_default_subnet" "default_subnet_ac" {
availability_zone = "${var.aws_region}b"
}
resource "aws_default_vpc" "ecs_vpc" {
tags = {
Name = "ecs-vpc"
}
}
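Because the ECS and ALB modules reference these resources (for example module.vpc.default_subnet_ab and module.vpc.vpc.id later in this guide), the vpc module needs to expose them as outputs. A minimal sketch of what vpc/outputs.tf could look like — the output names mirror the references used elsewhere here, but the file itself is an assumption:

output "vpc" {
  # Expose the whole default VPC resource so callers can use attributes like .id
  value = aws_default_vpc.ecs_vpc
}

output "default_subnet_ab" {
  value = aws_default_subnet.default_subnet_ab.id
}

output "default_subnet_ac" {
  value = aws_default_subnet.default_subnet_ac.id
}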
Now that we have the default VPC, let’s provision the resources for the ECS cluster and service.
2. Setting Up ECS Task Definition and ECS Service
The ecs/main.tf file contains the configuration for setting up the ECS Cluster, Task Definition, and ECS Service.
resource "aws_ecs_task_definition" "ecs-nodejs-api-task-definition" {
family = "ecs-nodejs-api-task"
container_definitions = <<DEFINITION
[
{
"name": "web-api",
"image": "${var.ecr_repository_url}:${var.ecr_repository_tag}",
"essential": true,
"portMappings": [
{
"containerPort": 8080,
"hostPort": 8080
}
],
"memory": 512,
"cpu": 256,
"logConfiguration": {
"logDriver": "awslogs",
"options": {
"awslogs-group": "${module.ecs_task_logs.log_group_name}",
"awslogs-region": "${var.aws_region}",
"awslogs-stream-prefix": "web-api"
}
}
}
]
DEFINITION
requires_compatibilities = ["FARGATE"]
network_mode = "awsvpc"
memory = 2048
cpu = 1024
execution_role_arn = "${aws_iam_role.ecsTaskExecutionRole.arn}"
}
• family: This is a name used to group multiple versions of a task definition.
• container_definitions: A JSON block that defines the containers within the task. Here, we specify the container’s name, Docker image, essential status, port mappings, memory, CPU, and log configuration.
• The image is pulled from an ECR repository (var.ecr_repository_url), with a specific tag (var.ecr_repository_tag).
• Port Mappings: Maps container port 8080 to the host port 8080.
• Log Configuration: Uses AWS CloudWatch Logs to capture and store logs, with the log group specified by module.ecs_task_logs.log_group_name.
• requires_compatibilities: Specifies that this task definition is compatible with the Fargate launch type.
• network_mode: Uses awsvpc mode for better security and networking control.
• memory and cpu: Specifies the memory and CPU requirements for the task.
• execution_role_arn: Refers to an IAM role (ecsTaskExecutionRole) that grants the ECS tasks necessary permissions.
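The ecsTaskExecutionRole referenced above is not shown in this excerpt. A minimal sketch of how it could be defined — trusting the ECS tasks service and attaching the AWS-managed task execution policy:

resource "aws_iam_role" "ecsTaskExecutionRole" {
  name = "ecsTaskExecutionRole"

  # Allow ECS tasks to assume this role
  assume_role_policy = <<POLICY
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": { "Service": "ecs-tasks.amazonaws.com" },
      "Action": "sts:AssumeRole"
    }
  ]
}
POLICY
}

resource "aws_iam_role_policy_attachment" "ecsTaskExecutionRole_policy" {
  role       = "${aws_iam_role.ecsTaskExecutionRole.name}"
  policy_arn = "arn:aws:iam::aws:policy/service-role/AmazonECSTaskExecutionRolePolicy"
}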
CloudWatch Log Group Configuration
resource "aws_cloudwatch_log_group" "ecs_task_logs" {
name = "ecs-web-api"
}
A Log Group in CloudWatch is a group of log streams that share the same retention, monitoring, and access control settings. A Log Stream represents a sequence of log events that share the same source, such as an ECS task.
• Centralized Logging: The ecs-web-api log group will store all the logs emitted by the ECS tasks running the Node.js application. This helps in centralized logging and monitoring.
• Log Management: CloudWatch Log Groups allow you to manage retention policies, monitor logs, and create alarms for specific patterns (e.g., errors or exceptions).
Logging is essential for zero-downtime deployments: it enables real-time monitoring of the rollout, facilitates quick identification and resolution of issues, and provides insight into application behavior before and after deployment. It also aids troubleshooting and potential rollbacks and supports continuous improvement of the deployment process. By maintaining comprehensive logs, teams can ensure smooth transitions, minimize risks, and optimize future updates, all of which are crucial for maintaining uninterrupted service during deployments.
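For example, a retention policy can be set directly on the log group; the 30-day value below is an illustrative choice, not part of the original configuration:

resource "aws_cloudwatch_log_group" "ecs_task_logs" {
  name = "ecs-web-api"

  # Keep task logs for 30 days before CloudWatch expires them
  retention_in_days = 30
}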
ECS Cluster and Service
resource "aws_ecs_cluster" "ecs-cluster" {
name = "ecs-web-api-cluster"
}
resource "aws_ecs_service" "ecs-nodejs-api-service" {
name = "ecs-nodejs-api-service"
cluster = "${aws_ecs_cluster.ecs-cluster.id}"
task_definition = "${aws_ecs_task_definition.ecs-nodejs-api-task-definition.arn}"
launch_type = "FARGATE"
desired_count = 1
load_balancer {
target_group_arn = "${module.alb.alb_target_group.arn}"
container_name = "web-api"
container_port = 8080
}
network_configuration {
subnets = ["${module.vpc.default_subnet_ab}", "${module.vpc.default_subnet_ac}"]
security_groups = ["${module.alb.ecs_service_security_group_id.id}"]
assign_public_ip = true
}
}
• ECS Cluster: The cluster (ecs-web-api-cluster) is the container orchestration environment where our ECS tasks will run.
• ECS Service:
• name: Name of the ECS service.
• cluster: Specifies the cluster where the service runs.
• task_definition: Refers to the ECS task definition that describes how the container should be configured.
• launch_type: Fargate is chosen for serverless compute resources, allowing focus on application development without managing servers.
• desired_count: Sets the number of tasks to run; here, it’s set to 1.
• load_balancer: Associates the ECS service with a target group in an Application Load Balancer (ALB).
• network_configuration: Specifies subnets, security groups, and whether to assign a public IP.
One parameter plays a major role in zero-downtime deployment: desired_count. It is set to 1 in this example for simplicity, but in a production environment you would typically set it higher to ensure high availability and fault tolerance. For zero-downtime deployments, you would:
- Start with multiple tasks running (e.g., desired_count = 2 or higher).
- When deploying a new version, ECS will gradually replace old tasks with new ones, ensuring that there’s always at least one healthy task running.
- The ALB continues to route traffic to healthy tasks during this process, preventing any downtime.
- Once all tasks are updated and healthy, the deployment is complete without any service interruption.
This approach ensures continuous service availability during updates, which is the essence of zero-downtime deployments. Adjusting the desired_count based on your application’s needs and traffic patterns is key to optimizing this process.
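Beyond desired_count, ECS exposes rolling-update settings on the service itself that control how many tasks may be stopped or started during a deploy. A minimal sketch of how the service block shown above could be extended — the count of 2 and the specific percentages are illustrative choices, not part of the original configuration:

resource "aws_ecs_service" "ecs-nodejs-api-service" {
  # ... name, cluster, task_definition, launch_type, load_balancer, and
  # network_configuration as shown above ...

  desired_count = 2

  # Never drop below the full desired count during a deployment, and allow
  # ECS to temporarily run up to twice as many tasks while the new task
  # definition rolls out.
  deployment_minimum_healthy_percent = 100
  deployment_maximum_percent         = 200
}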
3. Configuring the Application Load Balancer (ALB)
The alb/main.tf file configures the Application Load Balancer, which distributes incoming traffic to the ECS tasks for high availability.
Load Balancer Security Group
resource "aws_security_group" "load_balancer_sg" {
vpc_id = "${module.vpc.vpc.id}"
ingress {
from_port = 0
to_port = 0
protocol = "-1"
cidr_blocks = ["0.0.0.0/0"]
}
egress {
from_port = 0
to_port = 0
protocol = "-1"
cidr_blocks = ["0.0.0.0/0"]
}
}
This security group allows inbound and outbound traffic from any IP address. In a real-world scenario, it’s advisable to restrict these to specific IPs or ranges for security.
It is important to note that the security group configuration shown here is for demonstration purposes only. In a production environment, it is absolutely essential to implement strict security measures. Allowing unrestricted inbound and outbound traffic (0.0.0.0/0) poses significant security risks and should never be used in a real-world scenario. Instead, you should:
- Limit inbound traffic to only necessary ports and protocols (e.g., HTTP/HTTPS for web traffic).
- Restrict inbound access to known IP ranges or VPCs where possible.
- Configure outbound rules to allow only required connections, adhering to the principle of least privilege.
- Regularly audit and update security group rules to maintain a strong security posture.
Implementing these security best practices is critical for protecting your infrastructure and data in production environments.
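As a hedged illustration of what a tighter ALB security group could look like — a sketch under the assumption that the ALB serves plain HTTP on port 80 and only needs to reach the ECS tasks on port 8080; it is not part of the original configuration:

resource "aws_security_group" "load_balancer_sg" {
  vpc_id = "${module.vpc.vpc.id}"

  # Allow only HTTP traffic from the internet to the ALB
  ingress {
    from_port   = 80
    to_port     = 80
    protocol    = "tcp"
    cidr_blocks = ["0.0.0.0/0"]
  }

  # Allow the ALB to reach the ECS tasks on the container port only
  egress {
    from_port   = 8080
    to_port     = 8080
    protocol    = "tcp"
    cidr_blocks = ["0.0.0.0/0"]
  }
}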
ALB Target Group
resource "aws_lb_target_group" "alb_target_group" {
port = 8080
protocol = "HTTP"
target_type = "ip"
name_prefix = "alb-tg"
vpc_id = "${module.vpc.vpc.id}"
health_check {
path = "/health"
}
lifecycle {
create_before_destroy = true
}
}
• port and protocol: Specifies that the target group listens on HTTP port 8080.
• target_type: Set to ip for IP-based routing.
• name_prefix: Allows Terraform to generate a unique name based on this prefix.
• health_check: Ensures that only healthy targets receive traffic.
• lifecycle - create_before_destroy: Ensures a new target group is created before destroying the old one to prevent downtime during updates.
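Health checks are central to zero-downtime behavior: the ALB only sends traffic to a new task once it passes, and stops routing to tasks that fail. A hedged sketch of a more explicit health_check block — the interval, timeout, and threshold values are illustrative defaults, not taken from the original configuration:

  health_check {
    path                = "/health"
    protocol            = "HTTP"
    matcher             = "200" # expect an HTTP 200 from the /health endpoint
    interval            = 30    # seconds between checks
    timeout             = 5     # seconds to wait for a response
    healthy_threshold   = 3     # consecutive successes before a target is "healthy"
    unhealthy_threshold = 3     # consecutive failures before a target is "unhealthy"
  }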
Load Balancer Listener
resource "aws_lb_listener" "listener" {
load_balancer_arn = "${aws_alb.app_load_balancer.arn}"
port = 80
protocol = "HTTP"
default_action {
type = "forward"
target_group_arn = "${aws_lb_target_group.alb_target_group.arn}"
}
depends_on = [aws_lb_target_group.alb_target_group]
}
• port and protocol: The listener listens for incoming HTTP traffic on port 80.
• default_action: Forwards incoming traffic to the target group (alb_target_group).
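In production you would typically also terminate TLS at the ALB. A minimal sketch, assuming an ACM certificate is available — the var.certificate_arn variable is hypothetical and not part of the original configuration:

resource "aws_lb_listener" "https_listener" {
  load_balancer_arn = "${aws_alb.app_load_balancer.arn}"
  port              = 443
  protocol          = "HTTPS"
  ssl_policy        = "ELBSecurityPolicy-TLS13-1-2-2021-06"
  certificate_arn   = "${var.certificate_arn}" # hypothetical variable holding an ACM certificate ARN

  default_action {
    type             = "forward"
    target_group_arn = "${aws_lb_target_group.alb_target_group.arn}"
  }
}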
Application Load Balancer
resource "aws_alb" "app_load_balancer" {
name = "ecs-web-api-alb"
load_balancer_type = "application"
subnets = [
"${module.vpc.default_subnet_ab}",
"${module.vpc.default_subnet_ac}"
]
security_groups = [
"${aws_security_group.load_balancer_sg.id}",
]
}
• load_balancer_type: Specifies that this is an Application Load Balancer.
• subnets: Subnets where the ALB is deployed.
• security_groups: Associates the ALB with the load_balancer_sg security group defined earlier.
The ECS Cluster and Service are tightly integrated with the ALB for traffic distribution. The security groups and VPC settings ensure proper networking and isolation, while the use of CloudWatch provides observability into the application’s performance and logs.
Now that we have all the resources to create our AWS ECS environment, let’s deploy our Terraform module to provision the infrastructure. The process involves three key steps:
- Run terraform init to download the necessary provider plugins and set up the local environment for managing resources.
- Use terraform plan to generate an execution plan. This outlines the changes Terraform will make to achieve the desired state defined in your configuration files.
- After reviewing and approving the plan, execute terraform apply. This provisions the resources on AWS, creating a fully configured environment based on your code.
By following these steps, you’ll transform your Terraform configuration into a live, functioning AWS ECS infrastructure.
It’s important to thoroughly review the plan output before executing terraform apply. This review step allows you to:
- Verify that the planned changes align with your intentions and requirements
- Identify any potential issues or unintended modifications
- Ensure that critical resources are not accidentally deleted or altered
- Catch any configuration errors before they impact your live environment.
Check out the complete code here.
Conclusion
AWS ECS (Elastic Container Service) offers a robust solution for running containerized applications, providing a highly scalable, secure, and reliable environment with minimal operational overhead. By leveraging ECS with Terraform, you can achieve zero-downtime deployments for Node.js applications—ensuring seamless updates without service interruptions. Dockerizing the application and deploying it on ECS with Fargate allows for serverless management of containers, simplifying infrastructure requirements and reducing costs.
Integrating CloudWatch for logging enhances monitoring and troubleshooting capabilities, offering comprehensive insights into application performance. This approach—combining ECS, Terraform, Docker, and CloudWatch—provides a streamlined and efficient way to manage modern applications, reinforcing best practices in DevOps and cloud-native architectures.