Deploy a Jenkins Cluster on AWS

A few months ago, I gave a talk at Nexus User Conference 2018 on how to build a fully automated CI/CD platform on AWS using Terraform, Packer & Ansible. I illustrated how concepts like infrastructure as code, immutable infrastructure, serverless, and cluster discovery can be used to build a highly available and cost-effective pipeline. The platform I built is shown in the following diagram:



The platform has a Jenkins cluster with a dedicated Jenkins master and workers inside an autoscaling group. Each push event to the code repository will trigger the Jenkins master, which will schedule a new build on one of the available slaves. The slave will be responsible for running the unit and pre-integration tests, building the Docker image, pushing the image to a private registry, and deploying a container based on that image to the Docker Swarm cluster.



In this post, I will walk through how to deploy the Jenkins cluster on AWS using the most popular automation tools.

The cluster will be deployed into a VPC with 2 public and 2 private subnets across 2 availability zones. The stack will consist of an autoscaling group of Jenkins workers in the private subnets and a private instance for the Jenkins master sitting behind an Elastic Load Balancer. To add or remove Jenkins workers on demand, the CPU utilisation of the ASG will be used to trigger a scale-out (CPU > 80%) or scale-in (CPU < 20%) event. (See the figure below.)



To get started, we will create 2 AMIs (Amazon Machine Images) for our instances. To do so, we will use Packer, which allows you to bake your own machine images.

The first AMI will be used to create the Jenkins master instance. It uses the Amazon Linux Image as a base image, and for the provisioning part it uses a simple shell script:

{
"variables" : {
"region" : "eu-west-3",
"source_ami" : "ami-0ebc281c20e89ba4b"
},
"builders" : [
{
"type" : "amazon-ebs",
"profile" : "default",
"region" : "{{user `region`}}",
"instance_type" : "t2.micro",
"source_ami" : "{{user `source_ami`}}",
"ssh_username" : "ec2-user",
"ami_name" : "jenkins-master-2.107.2",
"ami_description" : "Amazon Linux Image with Jenkins Server",
"run_tags" : {
"Name" : "packer-builder-docker"
},
"tags" : {
"Tool" : "Packer",
"Author" : "mlabouardy"
}
}
],
"provisioners" : [
{
"type" : "file",
"source" : "COPY FILES",
"destination" : "COPY FILES"
},
{
"type" : "shell",
"script" : "./setup.sh",
"execute_command" : "sudo -E -S sh '{{ .Path }}'"
}
]
}

The shell script will be used to install the necessary dependencies, packages and security patches:

#!/bin/bash

echo "Install Jenkins stable release"
yum remove -y java
yum install -y java-1.8.0-openjdk
wget -O /etc/yum.repos.d/jenkins.repo http://pkg.jenkins-ci.org/redhat-stable/jenkins.repo
rpm --import https://jenkins-ci.org/redhat/jenkins-ci.org.key
yum install -y jenkins
chkconfig jenkins on

echo "Install Telegraf"
wget https://dl.influxdata.com/telegraf/releases/telegraf-1.6.0-1.x86_64.rpm -O /tmp/telegraf.rpm
yum localinstall -y /tmp/telegraf.rpm
rm /tmp/telegraf.rpm
chkconfig telegraf on
mv /tmp/telegraf.conf /etc/telegraf/telegraf.conf
service telegraf start

echo "Install git"
yum install -y git

echo "Setup SSH key"
mkdir /var/lib/jenkins/.ssh
touch /var/lib/jenkins/.ssh/known_hosts
chown -R jenkins:jenkins /var/lib/jenkins/.ssh
chmod 700 /var/lib/jenkins/.ssh
mv /tmp/id_rsa /var/lib/jenkins/.ssh/id_rsa
chmod 600 /var/lib/jenkins/.ssh/id_rsa

echo "Configure Jenkins"
mkdir -p /var/lib/jenkins/init.groovy.d
mv /tmp/basic-security.groovy /var/lib/jenkins/init.groovy.d/basic-security.groovy
mv /tmp/disable-cli.groovy /var/lib/jenkins/init.groovy.d/disable-cli.groovy
mv /tmp/csrf-protection.groovy /var/lib/jenkins/init.groovy.d/csrf-protection.groovy
mv /tmp/disable-jnlp.groovy /var/lib/jenkins/init.groovy.d/disable-jnlp.groovy
mv /tmp/jenkins.install.UpgradeWizard.state /var/lib/jenkins/jenkins.install.UpgradeWizard.state
mv /tmp/node-agent.groovy /var/lib/jenkins/init.groovy.d/node-agent.groovy
chown -R jenkins:jenkins /var/lib/jenkins/jenkins.install.UpgradeWizard.state
mv /tmp/jenkins /etc/sysconfig/jenkins
chmod +x /tmp/install-plugins.sh
bash /tmp/install-plugins.sh
service jenkins start

It will install the latest stable version of Jenkins and configure its settings:

  • Create a Jenkins admin user.
  • Create SSH, GitHub and Docker registry credentials.
  • Install all required plugins (Pipeline, Git plugin, Multi-branch Project, etc.); a sketch of a possible plugin installation script follows this list.
  • Disable the remote CLI, JNLP and unnecessary protocols.
  • Enable CSRF (Cross Site Request Forgery) protection.
  • Install the Telegraf agent for collecting resource and Docker metrics.
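The install-plugins.sh script itself is not listed in this post. A minimal sketch of what such a script could look like, assuming the plugins are simply downloaded from the Jenkins update center into the plugins directory (the plugin names are illustrative and dependency resolution is ignored):

#!/bin/bash
# Hypothetical sketch: fetch each plugin (.hpi) from the Jenkins update center
PLUGINS="git workflow-aggregator ssh-slaves credentials-binding"
mkdir -p /var/lib/jenkins/plugins
for plugin in $PLUGINS; do
  wget -q "https://updates.jenkins.io/latest/${plugin}.hpi" -O "/var/lib/jenkins/plugins/${plugin}.hpi"
done
chown -R jenkins:jenkins /var/lib/jenkins/plugins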

The second AMI will be used to create the Jenkins workers. Similarly to the first AMI, it uses the Amazon Linux Image as a base image and a shell script to provision the instance:

#!/bin/bash

echo "Install Java JDK 8"
yum remove -y java
yum install -y java-1.8.0-openjdk

echo "Install Docker engine"
yum update -y
yum install docker -y
usermod -aG docker ec2-user
service docker start

echo "Install git"
yum install -y git

echo "Install Telegraf"
wget https://dl.influxdata.com/telegraf/releases/telegraf-1.6.0-1.x86_64.rpm -O /tmp/telegraf.rpm
yum localinstall -y /tmp/telegraf.rpm
rm /tmp/telegraf.rpm
chkconfig telegraf on
usermod -aG docker telegraf
mv /tmp/telegraf.conf /etc/telegraf/telegraf.conf
service telegraf start

A Jenkins worker requires the Java JDK and Git to be installed. In addition, Docker Community Edition (for building Docker images) and a data collector agent (for monitoring) will be installed.

Now that our Packer template files are defined, issue the following commands to start baking the AMIs:

# validate packer template
packer validate ami.json

# build ami
packer build ami.json

Packer will launch a temporary EC2 instance from the base image specified in the template file and provision the instance with the given shell script. Finally, it will create an image from the instance. The following is an example of the output:



Sign in to the AWS Management Console, navigate to the “EC2 Dashboard” and click on “AMIs”; the 2 new AMIs should appear as below:



Now that our AMIs are ready to use, let’s deploy our Jenkins cluster to AWS. To achieve that, we will use an infrastructure-as-code tool called Terraform, which allows you to describe your entire infrastructure in template files.

I have split each component of my infrastructure into its own template file. The following template file is responsible for creating an EC2 instance from the Jenkins master’s AMI built earlier:

resource "aws_instance" "jenkins_master" {
ami = "${data.aws_ami.jenkins-master.id}"
instance_type = "${var.jenkins_master_instance_type}"
key_name = "${var.key_name}"
vpc_security_group_ids = ["${aws_security_group.jenkins_master_sg.id}"]
subnet_id = "${element(var.vpc_private_subnets, 0)}"

root_block_device {
volume_type = "gp2"
volume_size = 30
delete_on_termination = false
}

tags {
Name = "jenkins_master"
Author = "mlabouardy"
Tool = "Terraform"
}
}

Another template file is used to reference each AMI built with Packer:

data "aws_ami" "jenkins-master" {
most_recent = true
owners = ["self"]

filter {
name = "name"
values = ["jenkins-master-2.107.2"]
}
}

data "aws_ami" "jenkins-slave" {
most_recent = true
owners = ["self"]

filter {
name = "name"
values = ["jenkins-slave"]
}
}

The Jenkins workers (aka slaves) will be inside an autoscaling group of a minimum of 3 instances. The instances will be created from a launch configuration based on the Jenkins slave’s AMI:

// Jenkins slaves launch configuration
resource "aws_launch_configuration" "jenkins_slave_launch_conf" {
name = "jenkins_slaves_config"
image_id = "${data.aws_ami.jenkins-slave.id}"
instance_type = "${var.jenkins_slave_instance_type}"
key_name = "${var.key_name}"
security_groups = ["${aws_security_group.jenkins_slaves_sg.id}"]
user_data = "${data.template_file.user_data_slave.rendered}"

root_block_device {
volume_type = "gp2"
volume_size = 30
delete_on_termination = false
}

lifecycle {
create_before_destroy = true
}
}

// ASG Jenkins slaves
resource "aws_autoscaling_group" "jenkins_slaves" {
name = "jenkins_slaves_asg"
launch_configuration = "${aws_launch_configuration.jenkins_slave_launch_conf.name}"
vpc_zone_identifier = "${var.vpc_private_subnets}"
min_size = "${var.min_jenkins_slaves}"
max_size = "${var.max_jenkins_slaves}"

depends_on = ["aws_instance.jenkins_master", "aws_elb.jenkins_elb"]

lifecycle {
create_before_destroy = true
}

tag {
key = "Name"
value = "jenkins_slave"
propagate_at_launch = true
}

tag {
key = "Author"
value = "mlabouardy"
propagate_at_launch = true
}

tag {
key = "Tool"
value = "Terraform"
propagate_at_launch = true
}
}

To leverage the power of automation, we will make each worker instance join the cluster automatically (cluster discovery) using the Jenkins RESTful API:

#!/bin/bash

JENKINS_URL="${jenkins_url}"
JENKINS_USERNAME="${jenkins_username}"
JENKINS_PASSWORD="${jenkins_password}"
TOKEN=$(curl -u $JENKINS_USERNAME:$JENKINS_PASSWORD ''$JENKINS_URL'/crumbIssuer/api/xml?xpath=concat(//crumbRequestField,":",//crumb)')
INSTANCE_NAME=$(curl -s 169.254.169.254/latest/meta-data/local-hostname)
INSTANCE_IP=$(curl -s 169.254.169.254/latest/meta-data/local-ipv4)
JENKINS_CREDENTIALS_ID="${jenkins_credentials_id}"

sleep 60

curl -v -u $JENKINS_USERNAME:$JENKINS_PASSWORD -H "$TOKEN" -d 'script=
import hudson.model.Node.Mode
import hudson.slaves.*
import jenkins.model.Jenkins
import hudson.plugins.sshslaves.SSHLauncher
DumbSlave dumb = new DumbSlave("'$INSTANCE_NAME'",
"'$INSTANCE_NAME'",
"/home/ec2-user",
"3",
Mode.NORMAL,
"slaves",
new SSHLauncher("'$INSTANCE_IP'", 22, SSHLauncher.lookupSystemCredentials("'$JENKINS_CREDENTIALS_ID'"), "", null, null, "", "", 60, 3, 15),
RetentionStrategy.INSTANCE)
Jenkins.instance.addNode(dumb)
' $JENKINS_URL/script

At boot time, the user-data script above will be invoked: the instance's private IP address will be retrieved from the instance metadata, and a Groovy script will be executed to make the node join the cluster:

data "template_file" "user_data_slave" {
template = "${file("scripts/join-cluster.tpl")}"

vars {
jenkins_url = "http://${aws_instance.jenkins_master.private_ip}:8080"
jenkins_username = "${var.jenkins_username}"
jenkins_password = "${var.jenkins_password}"
jenkins_credentials_id = "${var.jenkins_credentials_id}"
}
}

Moreover, to be able to scale out and scale in instances on demand, I have defined 2 CloudWatch metric alarms based on the CPU utilisation of the autoscaling group:

// Scale out
resource "aws_cloudwatch_metric_alarm" "high-cpu-jenkins-slaves-alarm" {
alarm_name = "high-cpu-jenkins-slaves-alarm"
comparison_operator = "GreaterThanOrEqualToThreshold"
evaluation_periods = "2"
metric_name = "CPUUtilization"
namespace = "AWS/EC2"
period = "120"
statistic = "Average"
threshold = "80"

dimensions {
AutoScalingGroupName = "${aws_autoscaling_group.jenkins_slaves.name}"
}

alarm_description = "This metric monitors ec2 cpu utilization"
alarm_actions = ["${aws_autoscaling_policy.scale-out.arn}"]
}

resource "aws_autoscaling_policy" "scale-out" {
name = "scale-out-jenkins-slaves"
scaling_adjustment = 1
adjustment_type = "ChangeInCapacity"
cooldown = 300
autoscaling_group_name = "${aws_autoscaling_group.jenkins_slaves.name}"
}

// Scale In
resource "aws_cloudwatch_metric_alarm" "low-cpu-jenkins-slaves-alarm" {
alarm_name = "low-cpu-jenkins-slaves-alarm"
comparison_operator = "LessThanOrEqualToThreshold"
evaluation_periods = "2"
metric_name = "CPUUtilization"
namespace = "AWS/EC2"
period = "120"
statistic = "Average"
threshold = "20"

dimensions {
AutoScalingGroupName = "${aws_autoscaling_group.jenkins_slaves.name}"
}

alarm_description = "This metric monitors ec2 cpu utilization"
alarm_actions = ["${aws_autoscaling_policy.scale-in.arn}"]
}

resource "aws_autoscaling_policy" "scale-in" {
name = "scale-in-jenkins-slaves"
scaling_adjustment = -1
adjustment_type = "ChangeInCapacity"
cooldown = 300
autoscaling_group_name = "${aws_autoscaling_group.jenkins_slaves.name}"
}

Finally, an Elastic Load Balancer will be created in front of the Jenkins master’s instance and a new DNS record pointing to the ELB domain will be added to Route 53:

resource "aws_route53_record" "jenkins_master" {
zone_id = "${var.hosted_zone_id}"
name = "jenkins.slowcoder.com"
type = "A"

alias {
name = "${aws_elb.jenkins_elb.dns_name}"
zone_id = "${aws_elb.jenkins_elb.zone_id}"
evaluate_target_health = true
}
}

Once the stack is defined, provision the infrastructure with the terraform apply command:

# Install the AWS provider plugin
terraform init

# Dry-run check
terraform plan

# Provision the infrastructure
terraform apply --var-file=variables.tfvars

The command takes an additional parameter: a variables file with the AWS credentials and VPC settings (you can create a new VPC with Terraform from here):

region = ""

aws_profile = ""

shared_credentials_file = ""

key_name = ""

hosted_zone_id = ""

bastion_sg_id = ""

jenkins_username = ""

jenkins_password = ""

jenkins_credentials_id = ""

vpc_id = ""

vpc_private_subnets = []

vpc_public_subnets = []

ssl_arn = ""

Terraform will display an execution plan (the list of resources that will be created); type yes to confirm, and the stack will be created in a few seconds:



Jump back to the EC2 dashboard; the list of EC2 instances will be created:



In the terminal session, under the Outputs section, the Jenkins URL will be displayed:



Point your favorite browser to that URL and the Jenkins login screen will be displayed. Sign in using the credentials provided while baking the Jenkins master’s AMI:



If you click on “Credentials” from the navigation pane, a set of credentials should be created out of the box:



The same goes for plugins: the list of required plugins will have been installed as well:



Once the autoscaling group has finished creating the EC2 instances, they will join the cluster automatically, as you can see in the following screenshot:



You should now be ready to create your own CI/CD pipeline!



You can take this further and build a dynamic dashboard in your favorite visualisation tool like Grafana to monitor your cluster resource usage based on the metrics collected by the agent installed on each EC2 instance:



Drop your comments, feedback, or suggestions below — or connect with me directly on Twitter @mlabouardy.

AWS Events Analysis with ELK

Recording your AWS environment activity is a must-have. It helps you monitor your environment’s security continuously and detect suspicious or undesirable activity in real time, potentially saving thousands of dollars. Luckily, AWS offers a service called CloudTrail that allows you to achieve exactly that: it records all events in all AWS regions and logs every API call to a single S3 bucket.



From there, you can set up an analysis pipeline using the popular logging stack ELK (Elasticsearch, Logstash & Kibana) to read those logs, parse and index them, visualise them in a single dynamic dashboard, and even take action accordingly:



To get started, create an AMI with the ELK components installed and preconfigured. The AMI will be based on an Ubuntu image:

{
"variables" : {
"region" : "us-east-1"
},
"builders" : [
{
"type" : "amazon-ebs",
"profile" : "default",
"region" : "{{user `region`}}",
"instance_type" : "t2.xlarge",
"source_ami" : "ami-759bc50a",
"ssh_username" : "ubuntu",
"ami_name" : "elk-stack-6.2.4",
"ami_description" : "ELK Stack",
"run_tags" : {
"Name" : "packer-builder-docker",
"Tool" : "Packer",
"Author" : "mlabouardy"
}
}
],
"provisioners" : [
{
"type" : "file",
"source" : "./elasticsearch.yml",
"destination" : "/tmp/elasticsearch.yml"
},
{
"type" : "file",
"source" : "./cloudtrail.conf",
"destination" : "/tmp/cloudtrail.conf"
},
{
"type" : "file",
"source" : "./kibana.yml",
"destination" : "/tmp/kibana.yml"
},
{
"type" : "shell",
"script" : "./setup.sh",
"execute_command" : "sudo -E -S sh '{{ .Path }}'"
}
]
}

To provision the AMI, we will use the following shell script:

#!/bin/bash

echo "Install Java JDK 8"
apt-get update
apt-get install openjdk-8-jre -y

echo "Install ElasticSearch 6"
wget -qO - https://artifacts.elastic.co/GPG-KEY-elasticsearch | apt-key add -
echo "deb https://artifacts.elastic.co/packages/6.x/apt stable main" | tee -a /etc/apt/sources.list.d/elastic-6.x.list
apt-get update
apt-get install -y elasticsearch
chown -R elasticsearch:elasticsearch /usr/share/elasticsearch
mv /tmp/elasticsearch.yml /etc/elasticsearch/elasticsearch.yml

echo "Start ElasticSearch"
systemctl enable elasticsearch.service
systemctl start elasticsearch.service

echo "Install Logstash"
apt-get install -y apt-transport-https logstash
mv /tmp/cloudtrail.conf /etc/logstash/conf.d/cloudtrail.conf

echo "Start Logstash"
systemctl enable logstash
systemctl start logstash

echo "Install Kibana"
apt-get install -y kibana
mv /tmp/kibana.yml /etc/kibana/kibana.yml

echo "Start Kibana"
systemctl enable kibana
systemctl start kibana

Now that the template is defined, bake a new AMI with Packer:

packer build ami.json

Once the AMI is created, create a new EC2 instance based on the AMI with Terraform. Make sure to grant S3 permissions to the instance to be able to read CloudTrail logs from the bucket:

provider "aws" {
region = "${var.aws_region}"
}

data "aws_ami" "elk" {
most_recent = true
owners = ["self"]

filter {
name = "state"
values = ["available"]
}

filter {
name = "name"
values = ["elk-stack-6.2.4"]
}
}

resource "aws_security_group" "elk_sg" {
name = "elk_sg"
description = "Allow traffic on elasticsearch & kibana ports"

ingress {
from_port = "22"
to_port = "22"
protocol = "tcp"
cidr_blocks = ["0.0.0.0/0"]
}

ingress {
from_port = "9200"
to_port = "9200"
protocol = "tcp"
cidr_blocks = ["0.0.0.0/0"]
}

ingress {
from_port = "5601"
to_port = "5601"
protocol = "tcp"
cidr_blocks = ["0.0.0.0/0"]
}

egress {
from_port = "0"
to_port = "0"
protocol = "-1"
cidr_blocks = ["0.0.0.0/0"]
}

tags {
Name = "elk_sg"
Author = "mlabouardy"
Tool = "Terraform"
}
}

resource "aws_iam_role_policy" "cloudtrail_bucket_access_policy" {
name = "CloudTrailEventsBucketFullAccessPolicy"
role = "${aws_iam_role.cloudtrail_bucket_access_role.id}"

policy = <<EOF
{
"Version": "2012-10-17",
"Statement": [
{
"Action": [
"s3:*"
],
"Effect": "Allow",
"Resource": "arn:aws:s3:::cloudtrail-demo-2018"
}
]
}
EOF
}

resource "aws_iam_role" "cloudtrail_bucket_access_role" {
name = "CloudTrailEventsBucketFullAccessRole"

assume_role_policy = <<EOF
{
"Version": "2012-10-17",
"Statement": [
{
"Action": "sts:AssumeRole",
"Principal": {
"Service": "ec2.amazonaws.com"
},
"Effect": "Allow",
"Sid": ""
}
]
}
EOF
}

resource "aws_iam_instance_profile" "cloudtrail_bucket_access_profile" {
name = "cloudtrail_bucket_access_profile"
role = "${aws_iam_role.cloudtrail_bucket_access_role.name}"
}

resource "aws_instance" "elk" {
key_name = "${var.key_name}"
instance_type = "${var.instance_type}"
ami = "${data.aws_ami.elk.id}"
security_groups = ["${aws_security_group.elk_sg.name}"]
iam_instance_profile = "${aws_iam_instance_profile.cloudtrail_bucket_access_profile.name}"

root_block_device {
volume_size = 100
}

tags {
Name = "elk"
Author = "mlabouardy"
Tool = "Terraform"
}
}

Issue the following command to provision the infrastructure:

terraform apply

Head back to the AWS Management Console, navigate to CloudTrail, and click on the “Create Trail” button:



Give it a name and apply the trail to all AWS regions:



Next, create a new S3 bucket in which the events will be stored:



Click on “Create“, and the trail should be created as follows:


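If you prefer the CLI over the console, a rough equivalent of the steps above would be the following (the trail name is arbitrary, and the bucket must already exist with a bucket policy that allows CloudTrail to write to it, which the console normally sets up for you):

# Create a multi-region trail that logs to the S3 bucket used in this example, then start logging
aws cloudtrail create-trail --name cloudtrail-demo \
--s3-bucket-name cloudtrail-demo-2018 \
--is-multi-region-trail
aws cloudtrail start-logging --name cloudtrail-demo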

Next, configure Logstash to read CloudTrail logs at regular intervals. The geoip filter adds information about the geographical location of IP addresses based on the sourceIPAddress field. The logs are then stored in Elasticsearch automatically:

input {
s3 {
"bucket" => "cloudtrail-demo-2018"
}
}

filter {
json {
source => "message"
}

split {
field => "Records"
}

geoip {
source => "[Records][sourceIPAddress]"
target => "geoip"
add_tag => [ "cloudtrail-geoip" ]
}
}

output {
elasticsearch {
hosts => 'http://localhost:9200'
index => 'cloudtrail-%{+YYYY.MM.dd}'
}
}

In order for the changes to take effect, restart Logstash with the command below:

systemctl restart logstash

A new index should be created on Elasticsearch (http://IP:9200/_cat/indices?v)
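You can check this from the command line as well (replace IP with the instance's public IP address):

# List the indices and look for the cloudtrail-YYYY.MM.dd index created by Logstash
curl -s "http://IP:9200/_cat/indices?v" | grep cloudtrail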



In Kibana, create a new index pattern that matches the index format used to store the logs:



After creating the index pattern, we can start exploring our CloudTrail events:



Now that we have processed data inside Elasticsearch, let’s build some graphs. We will use the Map visualization in Kibana to monitor geo access to our AWS environment:



You can now see where the environment is being accessed from:



Next, create more widgets to display information about the user's identity, the user agent, and the actions taken by the user. The result will look something like this:



You can take this further and set up alerts based on specific events (for example, someone accessing your environment from an unexpected location) to be notified in near real time.

Full code can be found on my GitHub. Make sure to drop your comments, feedback, or suggestions below — or connect with me directly on Twitter @mlabouardy.

One-shot containers with Serverless

Have you ever had short-lived containers for use cases like the following?

  • Batch and ETL (Extract, Transform & Load) Jobs.
  • Database backups and synchronisation.
  • Machine Learning algorithms for generation of learning and training models.
  • Integration & Sanity tests.
  • Web scrapers & crawlers.

And you were wondering how you could deploy your container periodically or in response to an event? The answer is to use Lambda itself: the idea is to make a Lambda function trigger a deployment of your container from the build server. The following figure illustrates how this process can be implemented:



I wrote a simple application in Go to simulate a short-lived process using the sleep method:

package main

import (
"fmt"
"time"
)

func main() {
fmt.Println("Start working ...")
time.Sleep(10 * time.Second)
fmt.Println("Done")
}

As Go is a compiled language, I used the Docker multi-stage build feature to produce a lightweight Docker image with the following Dockerfile:

FROM golang:1.10
WORKDIR /go/src/github.com/mlabouardy/lambda-oneshot-container
COPY main.go .
RUN CGO_ENABLED=0 GOOS=linux go build -a -installsuffix cgo -o app .

FROM alpine:latest
RUN apk --no-cache add ca-certificates
WORKDIR /root/
COPY --from=0 /go/src/github.com/mlabouardy/lambda-oneshot-container/app .
CMD ["./app"]

Next, I have a simple CI/CD workflow in Jenkins; the following is the Jenkinsfile used to build the pipeline:

node('slaves'){
stage('Checkout'){
checkout scm
}

stage('Build'){
docker.build(image)
}

stage('Push'){
docker.withRegistry(registry, 'registry') {
docker.image(image).push("${commitID()}")

if (env.BRANCH_NAME == 'master') {
docker.image(image).push('latest')
}
}
}

stage('Deploy'){
build job: "oneshot-app-deployment"
}
}

An example of the pipeline execution is given as follows:



Now, every change to the application will trigger a new build on Jenkins, which will build the new Docker image, push it to a private registry, and deploy it to the Swarm cluster:



If you issue the “docker service logs APP_NAME” command on one of the cluster managers, you should see that your application is working as expected:



Now that our application is ready, let’s make it run every day at 8 AM using a Lambda function. The following is the entrypoint (handler) that will be executed on each invocation of the function:

func triggerJob() error {
url := fmt.Sprintf(`%s/job/%s/build`, os.Getenv("JENKINS_HOST"), os.Getenv("JENKINS_JOB"))

client := http.Client{}
req, err := http.NewRequest("POST", url, nil)
if err != nil {
return err
}

crumb, err := getToken()
if err != nil {
return err
}

req.Header.Set(crumb[0], crumb[1])
req.SetBasicAuth(os.Getenv("JENKINS_USERNAME"), os.Getenv("JENKINS_PASSWORD"))

resp, err := client.Do(req)
if err != nil {
return err
}
defer resp.Body.Close()

if resp.StatusCode != 201 {
return errors.New("Cannot trigger job")
}

return nil
}

It uses the Jenkins API to trigger the deployment process job.
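The same call can be reproduced from the command line, which is handy when debugging the function. A rough curl equivalent, reusing the environment variables expected by the handler (a crumb retrieval followed by the build trigger):

# Fetch a CSRF crumb, then trigger the deployment job
CRUMB=$(curl -s -u "$JENKINS_USERNAME:$JENKINS_PASSWORD" \
"$JENKINS_HOST"'/crumbIssuer/api/xml?xpath=concat(//crumbRequestField,":",//crumb)')
curl -s -X POST -u "$JENKINS_USERNAME:$JENKINS_PASSWORD" -H "$CRUMB" \
"$JENKINS_HOST/job/$JENKINS_JOB/build"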

Now that the function is defined, use the shell script below to do the following:

  • Build a deployment package (.zip file).
  • Create an IAM role with permissions to push logs to CloudWatch.
  • Create a Go based Lambda function from the deployment package.
  • Create a CloudWatch Event rule that will be executed everyday at 8am.
  • Make the CloudWatch Event invoke the Lambda function.
#!/bin/bash

## Override
JENKINS_HOST=""
JENKINS_USERNAME=""
JENKINS_PASSWORD=""
JENKINS_JOB=""
CRON_EXPRESSION="cron(0 8 * * ? *)"
## Global variables
AWS_REGION="us-east-1"
FUNCTION_NAME="RestartJob"

echo "Building binary"
GOOS=linux go build -o main main.go

echo "Generating deployment package"
zip deployment.zip main

echo "Creating IAM Role"
POLICY_ARN=$(aws iam create-policy --policy-name $FUNCTION_NAME --policy-document file://policy.json | jq -r '.Policy.Arn')
ROLE_ARN=$(aws iam create-role --role-name $FUNCTION_NAME --assume-role-policy-document file://role.json | jq -r '.Role.Arn')
aws iam attach-role-policy --role-name $FUNCTION_NAME --policy-arn $POLICY_ARN

echo "Creating Lambda function"
FUNCTION_ARN=$(aws lambda create-function --function-name $FUNCTION_NAME --runtime go1.x \
--handler main --role $ROLE_ARN \
--zip-file fileb://./deployment.zip \
--environment Variables="{JENKINS_HOST=$JENKINS_HOST,JENKINS_USERNAME=$JENKINS_USERNAME,JENKINS_PASSWORD=$JENKINS_PASSWORD,JENKINS_JOB=$JENKINS_JOB}" \
--region $AWS_REGION | jq -r '.FunctionArn')

echo "Creating CloudWatch Event rule"
RULE_ARN=$(aws events put-rule --name launch-container-daily --schedule-expression ''"$CRON_EXPRESSION"'' | jq -r '.RuleArn')
aws lambda add-permission --function-name $FUNCTION_NAME \
--statement-id 1 \
--action 'lambda:InvokeFunction' \
--principal events.amazonaws.com \
--source-arn $RULE_ARN
sed -i '.bak' 's/FUNCTION_ARN/'"$FUNCTION_ARN"'/g' targets.json
aws events put-targets --rule launch-container-daily --targets file://targets.json


echo "Cleaning up"
rm main deployment.zip *.bak

As a result, a Lambda function will be created. You can test it by invoking the function manually:

aws lambda invoke --function-name RestartJob output

A new deployment should be triggered in Jenkins and your application should be deployed once again:



That’s it: a quick example of how you can combine serverless with containers. You can go further and use Lambda functions to scale your services out and in on your Swarm/Kubernetes cluster, using either CloudWatch Events for expected traffic increases (holidays, Black Friday, etc.) or other managed AWS services like API Gateway in response to incoming client requests.

Full code can be found on my GitHub. Make sure to drop your comments, feedback, or suggestions below — or connect with me directly on Twitter @mlabouardy.

Docker on Elastic Beanstalk Tips

AWS Elastic Beanstalk is one of the most widely used PaaS offerings today. It allows you to deploy your application without provisioning the underlying infrastructure, while maintaining its high availability. However, it can be painful to use due to the lack of documentation and real-world scenarios. In this post, I will walk you through how to use Elastic Beanstalk to deploy Docker containers from scratch, followed by how to automate your deployment process with a continuous integration pipeline. At the end of this post, you should be familiar with advanced topics like debugging and monitoring your applications in EB.



1 – Environment Setup

To get started, create a new Application using the following AWS CLI command:

aws elasticbeanstalk create-application --application-name avengers \
--region eu-west-3

Create a new environment. Let’s call it “staging”:

aws elasticbeanstalk create-environment --application-name avengers \
--environment-name staging \
--solution-stack-name "64bit Amazon Linux 2017.09 v2.9.2 running Docker 17.12.0-ce" \
--option-settings file://options.json \
--region eu-west-3

Head back to the AWS Elastic Beanstalk console; your new environment should be created:



Point your browser to the environment URL; a sample Docker application should be displayed:



Let’s deploy our application. I wrote a small web application in Go to return a list of Marvel Avengers (I see you Thanos 😉 )

package main

import (
"encoding/json"
"io/ioutil"
"log"
"net/http"
)

type Avenger struct {
Character string `json:"character"`
Name string `json:"name"`
}

var avengers []Avenger

func init() {
data, _ := ioutil.ReadFile("avengers.json")
json.Unmarshal(data, &avengers)
}

func IndexHandler(w http.ResponseWriter, r *http.Request) {
response, _ := json.Marshal(avengers)
w.Header().Set("Content-Type", "application/json")
w.WriteHeader(200)
w.Write(response)
}

func main() {
http.HandleFunc("/", IndexHandler)
if err := http.ListenAndServe(":3000", nil); err != nil {
log.Fatal(err)
}
}

Next, we will create a Dockerfile to build the Docker image. Go is a compiled language, therefore we can use the Docker multi-stage feature to build a lightweight Docker image:

FROM golang:1.10 as builder
WORKDIR /go/src/github.com/mlabouardy/docker-eb-ci-mon
COPY main.go .
RUN CGO_ENABLED=0 GOOS=linux go build -a -installsuffix cgo -o app .

FROM alpine:latest
MAINTAINER mlabouardy
RUN apk --no-cache add ca-certificates
WORKDIR /root/
COPY --from=builder /go/src/github.com/mlabouardy/docker-eb-ci-mon/app .
COPY avengers.json .
EXPOSE 3000
CMD ["./app"]

Next, we create a Dockerrun.aws.json that describes how the container will be deployed in Elastic Beanstalk:

{
"AWSEBDockerrunVersion": "1",
"Image": {
"Name": "mlabouardy/avengers",
"Update": "true"
},
"Ports": [
{
"ContainerPort": "3000"
}
]
}

Now that the application is defined, create an application bundle as a ZIP package:

zip -r deployment.zip .

Then, create an S3 bucket to store the different versions of your application bundles:

aws s3 mb s3://avengers-docker-eb --region AWS_REGION

And create a new application version from the application bundle:

aws elasticbeanstalk create-application-version --application-name avengers \
--version-label v1 \
--source-bundle S3Bucket="avengers-docker-eb",S3Key="deployment.zip" \
--auto-create-application \
--region AWS_REGION


Finally, deploy the version to the staging environment:

aws elasticbeanstalk update-environment --application-name avengers \
--environment-name staging \
--version-label v1 --region AWS_REGION

Give it a few seconds while it’s deploying the new version:



Then, point your browser to the environment URL again; the list of Avengers will be returned in JSON format as follows:



Now that our Docker application is deployed, let’s automate this process by setting up a CI/CD pipeline.

2 – CI/CD Pipeline

I opted for CircleCI, but you’re free to use whatever CI server you’re familiar with; the same steps apply.

Create a circle.yml file with the following content:

version: 2
jobs:
build:
docker:
- image: circleci/golang:1.10

working_directory: /go/src/github.com/mlabouardy/docker-eb-ci-mon

steps:
- checkout

- setup_remote_docker

- run:
name: Install AWS CLI
command: |
sudo apt-get update
sudo apt-get install -y awscli

- run:
name: Test
command: go test

- run:
name: Build
command: docker build -t mlabouardy/avengers:latest .

- run:
name: Push
command: |
docker login -u$DOCKERHUB_LOGIN -p$DOCKERHUB_PASSWORD
docker tag mlabouardy/avengers:latest mlabouardy/avengers:${CIRCLE_SHA1}
docker push mlabouardy/avengers:latest
docker push mlabouardy/avengers:${CIRCLE_SHA1}

- run:
name: Deploy
command: |
zip -r deployment-${CIRCLE_SHA1}.zip .
aws s3 cp deployment-${CIRCLE_SHA1}.zip s3://avengers-docker-eb --region eu-west-3
aws elasticbeanstalk create-application-version --application-name avengers \
--version-label ${CIRCLE_SHA1} --source-bundle S3Bucket="avengers-docker-eb",S3Key="deployment-${CIRCLE_SHA1}.zip" --region eu-west-3
aws elasticbeanstalk update-environment --application-name avengers \
--environment-name staging --version-label ${CIRCLE_SHA1} --region eu-west-3

The pipeline will first prepare the environment by installing the AWS CLI, then run the unit tests. Next, a Docker image will be built and pushed to DockerHub. The last step creates a new application bundle and deploys it to Elastic Beanstalk.

In order to grant CircleCI permission to call AWS operations, we need to create a new IAM user with the following IAM policy:

{
"Version": "2012-10-17",
"Statement": [
{
"Sid": "VisualEditor0",
"Effect": "Allow",
"Action": [
"elasticbeanstalk:*",
"s3:*",
"ec2:*",
"cloudformation:*",
"autoscaling:*",
"elasticloadbalancing:*"
],
"Resource": "*"
}
]
}

Generate AWS access & secret keys. Then, head back to CircleCI, click on the project settings, and paste the credentials:
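If you'd rather do this from the CLI than the console, the user, its inline policy, and the access keys can be created along these lines (the user and policy names are arbitrary, and circleci-policy.json is assumed to contain the JSON policy above):

# Create the CI user, attach the policy, and generate access keys for CircleCI
aws iam create-user --user-name circleci
aws iam put-user-policy --user-name circleci \
--policy-name circleci-eb-deploy \
--policy-document file://circleci-policy.json
aws iam create-access-key --user-name circleci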



Now, every time you push a change to your code repository, a build will be triggered:



And a new version will be deployed automatically to Elastic Beanstalk:



3 – Monitoring

Monitoring your applications is mandatory. Unfortunately, CloudWatch doesn’t expose useful metrics like the memory usage of your applications in Elastic Beanstalk. Hence, in this part, we will solve this issue by creating our own custom metrics.

I will install a data collector agent on the instance. The agent will collect metrics and push them to a time-series database.

To install the agent, we will use the .ebextensions folder, in which we will create 3 configuration files:

  • 01-install-telegraf.config: install Telegraf on the instance
container_commands:
01downloadpackage:
command: "wget https://dl.influxdata.com/telegraf/releases/telegraf-1.6.0-1.x86_64.rpm -O /tmp/telegraf.rpm"
ignoreErrors: true
02installpackage:
command: "yum localinstall -y /tmp/telegraf.rpm"
ignoreErrors: true
03removepackage:
command: "rm /tmp/telegraf.rpm"
ignoreErrors: true
04enablereboot:
command: "chkconfig telegraf on"
ignoreErrors: true
05fixpermission:
command: "usermod -a -G docker telegraf"
ignoreErrors: true
  • 02-config-file.config: create a Telegraf configuration file to collect system usage & docker containers metrics.
files:
"/etc/telegraf/telegraf.conf":
mode: "000666"
owner: root
group: root
content: |
[global_tags]
hostname="Avengers"

# Read metrics about CPU usage
[[inputs.cpu]]
percpu = false
totalcpu = true
fieldpass = [ "usage*" ]
name_suffix = "_vm"

# Read metrics about disk usagee
[[inputs.disk]]
fielddrop = [ "inodes*" ]
mount_points=["/"]
name_suffix = "_vm"

# Read metrics about network usage
[[inputs.net]]
interfaces = [ "eth0", "eth1" ]
fielddrop = [ "icmp*", "ip*", "tcp*", "udp*" ]
name_suffix = "_vm"

# Read metrics about memory usage
[[inputs.mem]]
name_suffix = "_vm"

# Read metrics about swap memory usage
[[inputs.swap]]
name_suffix = "_vm"

# Read metrics about system load and uptime
[[inputs.system]]
name_suffix = "_vm"

# Read metrics from docker socket api
[[inputs.docker]]
endpoint = "unix:///var/run/docker.sock"
container_names = []
name_suffix = "_docker"

[[outputs.influxdb]]
database = "instances"
urls = ["http://172.31.38.51:8086"]
namepass = ["*_vm"]

[[outputs.influxdb]]
database = "containers"
urls = ["http://172.31.38.51:8086"]
namepass = ["*_docker"]
  • 03-start-telegraf.config: start Telegraf agent.
container_commands:
01starttelegraf:
command: "service telegraf start"
ignoreErrors: true

Once the application version is deployed to Elastic Beanstalk, metrics will be pushed to your time-series database. In this example, I used InfluxDB as the data store and created some dynamic dashboards in Grafana to visualize the metrics in real time:

Containers:



Hosts:



Note: for an in-depth explanation of how to configure Telegraf, InfluxDB & Grafana, read my previous article.

Full code can be found on my GitHub. Make sure to drop your comments, feedback, or suggestions below — or connect with me directly on Twitter @mlabouardy

Infrastructure Cost Optimization with Lambda

Having multiple environments is important for building a continuous integration/deployment pipeline and being able to reproduce production bugs with ease, but it comes at a price. In order to reduce the cost of AWS infrastructure, instances that run 24/7 unnecessarily (sandbox & staging environments) should be shut down outside of regular business hours.

The figure below describes an automated process to schedule the stopping and starting of instances to help cut costs. The solution is a perfect example of serverless computing.



Note: full code is available on my GitHub.

2 Lambda functions will be created; they will scan all environments looking for a specific tag. The tag we use is named ‘Environment’. Instances without an Environment tag will not be affected:

func getInstances(cfg aws.Config) ([]Instance, error) {
instances := make([]Instance, 0)

svc := ec2.New(cfg)
req := svc.DescribeInstancesRequest(&ec2.DescribeInstancesInput{
Filters: []ec2.Filter{
ec2.Filter{
Name: aws.String("tag:Environment"),
Values: []string{os.Getenv("ENVIRONMENT")},
},
},
})
res, err := req.Send()
if err != nil {
return instances, err
}

for _, reservation := range res.Reservations {
for _, instance := range reservation.Instances {
for _, tag := range instance.Tags {
if *tag.Key == "Name" {
instances = append(instances, Instance{
ID: *instance.InstanceId,
Name: *tag.Value,
})
}
}
}
}

return instances, nil
}

The StartEnvironment function will call the StartInstances method with the list of instance IDs returned by the previous function:

func startInstances(cfg aws.Config, instances []Instance) error {
instanceIds := make([]string, 0, len(instances))
for _, instance := range instances {
instanceIds = append(instanceIds, instance.ID)
}

svc := ec2.New(cfg)
req := svc.StartInstancesRequest(&ec2.StartInstancesInput{
InstanceIds: instanceIds,
})
_, err := req.Send()
if err != nil {
return err
}
return nil
}

Similarly, the StopEnvironment function will call the StopInstances method:

func stopInstances(cfg aws.Config, instances []Instance) error {
instanceIds := make([]string, 0, len(instances))
for _, instance := range instances {
instanceIds = append(instanceIds, instance.ID)
}

svc := ec2.New(cfg)
req := svc.StopInstancesRequest(&ec2.StopInstancesInput{
InstanceIds: instanceIds,
})
_, err := req.Send()
if err != nil {
return err
}
return nil
}

Finally, both functions will post a message to a Slack channel for real-time notifications:

func postToSlack(color string, title string, instances string) error {
message := SlackMessage{
Text: title,
Attachments: []Attachment{
Attachment{
Text: instances,
Color: color,
},
},
}

client := &http.Client{}
data, err := json.Marshal(message)
if err != nil {
return err
}

req, err := http.NewRequest("POST", os.Getenv("SLACK_WEBHOOK"), bytes.NewBuffer(data))
if err != nil {
return err
}

resp, err := client.Do(req)
if err != nil {
return err
}
defer resp.Body.Close()

return nil
}

Now that our functions are defined, let’s build the deployment packages (ZIP files) using the following Bash script:

#!/bin/bash

echo "Building StartEnvironment binary"
GOOS=linux GOARCH=amd64 go build -o main start/*.go

echo "Creating deployment package"
zip start-environment.zip main
rm main

echo "Building StopEnvironment binary"
GOOS=linux GOARCH=amd64 go build -o main stop/*.go

echo "Creating deployment package"
zip stop-environment.zip main
rm main

The functions require an IAM role to be able to interact with EC2. The StartEnvironment function has to be able to describe and start EC2 instances:

{
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Action": [
"logs:CreateLogStream",
"logs:PutLogEvents",
"logs:CreateLogGroup",
"ec2:DescribeInstances",
"ec2:StartInstances"
],
"Resource": [
"*"
]
}
]
}

The StopEnvironment function has to be able to describe and stop EC2 instances:

{
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Action": [
"logs:CreateLogStream",
"logs:PutLogEvents",
"logs:CreateLogGroup",
"ec2:DescribeInstances",
"ec2:StopInstances"
],
"Resource": [
"*"
]
}
]
}

Finally, create an IAM role for each function and attach the above policies:

#!/bin/bash

echo "IAM role for StartEnvironment"
arn=$(aws iam create-policy --policy-name StartEnvironment --policy-document file://start/policy.json | jq -r '.Policy.Arn')
result=$(aws iam create-role --role-name StartEnvironmentRole --assume-role-policy-document file://role.json | jq -r '.Role.Arn')
aws iam attach-role-policy --role-name StartEnvironmentRole --policy-arn $arn
echo "ARN: $result"

echo "IAM role for StopEnvironment"
arn=$(aws iam create-policy --policy-name StopEnvironment --policy-document file://stop/policy.json | jq -r '.Policy.Arn')
result=$(aws iam create-role --role-name StopEnvironmentRole --assume-role-policy-document file://role.json | jq -r '.Role.Arn')
aws iam attach-role-policy --role-name StopEnvironmentRole --policy-arn $arn
echo "ARN: $result"

The script will output the ARN for each IAM role:



Before jumping to the deployment part, we need to create a Slack webhook to be able to post messages to the Slack channel:
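Once the webhook is created, you can sanity-check it with a plain HTTP call before wiring it into the functions (replace TOKEN with your own webhook path):

# Post a test message to the Slack incoming webhook
curl -X POST -H 'Content-Type: application/json' \
-d '{"text":"Test notification from the cost optimizer"}' \
https://hooks.slack.com/services/TOKEN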



Next, use the following script to deploy your functions to AWS Lambda (make sure to replace the IAM roles, Slack WebHook token & the target environment):

#!/bin/bash

START_IAM_ROLE="arn:aws:iam::ACCOUNT_ID:role/StartEnvironmentRole"
STOP_IAM_ROLE="arn:aws:iam::ACCOUNT_ID:role/StopEnvironmentRole"
AWS_REGION="us-east-1"
SLACK_WEBHOOK="https://hooks.slack.com/services/TOKEN"
ENVIRONMENT="sandbox"

echo "Deploying StartEnvironment to Lambda"
aws lambda create-function --function-name StartEnvironment \
--zip-file fileb://./start-environment.zip \
--runtime go1.x --handler main \
--role $START_IAM_ROLE \
--environment Variables="{SLACK_WEBHOOK=$SLACK_WEBHOOK,ENVIRONMENT=$ENVIRONMENT}" \
--region $AWS_REGION


echo "Deploying StopEnvironment to Lambda"
aws lambda create-function --function-name StopEnvironment \
--zip-file fileb://./stop-environment.zip \
--runtime go1.x --handler main \
--role $STOP_IAM_ROLE \
--environment Variables="{SLACK_WEBHOOK=$SLACK_WEBHOOK,ENVIRONMENT=$ENVIRONMENT}" \
--region $AWS_REGION \


rm *-environment.zip

Once deployed, if you sign in to the AWS Management Console and navigate to the Lambda console, you should see that both functions have been deployed successfully:

StartEnvironment:



StopEnvironment:



To further automate the process of invoking the Lambda functions at the right time, AWS CloudWatch scheduled events will be used.

Create a new CloudWatch rule with the cron expression below (it will be invoked every day at 9 AM):



And another rule to stop the environment at 6 PM:



Note: all times are GMT.
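If you'd rather script these rules than click through the console, the equivalent CLI calls look roughly like this (the rule names are arbitrary; replace the account ID and region in the Lambda ARNs, and remember that CloudWatch Events must also be granted permission to invoke the functions, e.g. with aws lambda add-permission):

# Start the environment every day at 9 AM GMT
aws events put-rule --name start-environment-daily --schedule-expression "cron(0 9 * * ? *)"
aws events put-targets --rule start-environment-daily \
--targets "Id"="1","Arn"="arn:aws:lambda:us-east-1:ACCOUNT_ID:function:StartEnvironment"

# Stop the environment every day at 6 PM GMT
aws events put-rule --name stop-environment-daily --schedule-expression "cron(0 18 * * ? *)"
aws events put-targets --rule stop-environment-daily \
--targets "Id"="1","Arn"="arn:aws:lambda:us-east-1:ACCOUNT_ID:function:StopEnvironment"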

Testing:
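You can also invoke both functions manually from the CLI while testing (the function names match those used in the deployment script above):

aws lambda invoke --function-name StopEnvironment stop-output.json
aws lambda invoke --function-name StartEnvironment start-output.json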

a – Stop Environment



Result:



b – Start Environment



Result:



The solution is easy to deploy and can help reduce operational costs.

Full code can be found on my GitHub. Make sure to drop your comments, feedback, or suggestions below — or connect with me directly on Twitter @mlabouardy.

Deploy a Swarm Cluster with Alexa

Serverless and containers have changed the way we leverage public clouds and how we write, deploy and maintain applications. A great way to combine the two paradigms is to build a voice assistant with Alexa, based on Lambda functions written in Go, that deploys a Docker Swarm cluster on AWS.

The figure below shows all components needed to deploy a production-ready Swarm cluster on AWS with Alexa.



Note: Full code is available on my GitHub.

A user will ask Amazon Echo to deploy a Swarm Cluster:



Echo will intercept the user’s voice command with built-in natural language understanding and speech recognition, and convey it to the Alexa service. A custom Alexa skill will convert the voice command to intents:



The Alexa skill will trigger a Lambda function for intent fulfilment:



The Lambda function will use the AWS EC2 API to deploy a fleet of EC2 instances from an AMI with Docker CE preinstalled (I used Packer to bake the AMI to reduce the instances' cold start). Then, it pushes the cluster IP addresses to an SQS queue:



Next, the function will insert a new item into a DynamoDB table with the current state of the cluster:



Once SQS receives the message, a CloudWatch alarm (monitoring the ApproximateNumberOfMessagesVisible metric) will be triggered and, as a result, will publish a message to an SNS topic:



The SNS topic triggers a subscribed Lambda function:



The Lambda function will poll the queue for a new cluster and use the AWS Systems Manager API to provision a Swarm cluster on the fleet of EC2 instances created earlier:
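The function itself is not listed here, but the provisioning step it performs boils down to Run Command calls like the following (shown with the AWS CLI for illustration; the instance IDs, TOKEN and MANAGER_IP are placeholders, and the real function goes through the Go SDK):

# Initialise the Swarm on the manager instance via SSM Run Command
aws ssm send-command --instance-ids "i-0aaaaaaaaaaaaaaaa" \
--document-name "AWS-RunShellScript" \
--parameters commands="docker swarm init"

# Join a worker node using the token returned by the previous command
aws ssm send-command --instance-ids "i-0bbbbbbbbbbbbbbbb" \
--document-name "AWS-RunShellScript" \
--parameters commands="docker swarm join --token TOKEN MANAGER_IP:2377"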



For debugging, the function will output the Swarm Token to CloudWatch:



Finally, it will update the DynamoDB item state from Pending to Done and delete the message from SQS.

You can test your skill on your Amazon Echo, Echo Dot, or any Alexa device by saying, “Alexa, open Docker



At the end of the workflow described above, a Swarm cluster will be created:



At this point, you can check the status of your Swarm by running the usual commands on one of the managers, as shown below:
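For instance, assuming SSH access to a manager node:

# List the Swarm members and their status
docker node ls

# List the services running on the cluster
docker service ls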





Improvements & Limitations:

  • Lambda execution may time out if the cluster size is huge. You can use a master Lambda function to spawn child Lambdas.
  • The CloudWatch & SNS parts could be removed if SQS were supported as a Lambda event source (AWS, please!). DynamoDB streams or Kinesis streams cannot be used to notify Lambda, as I wanted to introduce some delay so the instances are fully created before setting up the Swarm cluster (maybe Simple Workflow Service?).
  • Inject SNS before SQS: SNS can add the message to SQS and trigger the Lambda function, so the CloudWatch alarm would no longer be needed.
  • You can improve the skill by adding new custom intents to deploy Docker containers on the cluster, or by asking Alexa to deploy the cluster into a specific VPC.

In-depth details about the skill can be found on my GitHub. Make sure to drop your comments, feedback, or suggestions below — or connect with me directly on Twitter @mlabouardy.

Immutable AMI with Packer

When dealing with hybrid or multi-cloud environments, you need identical machine images for multiple platforms built from a single source configuration. That’s where Packer comes into play.

To get started, find the appropriate package for your system and download Packer:

curl -o packer.zip https://releases.hashicorp.com/packer/1.2.2/packer_1.2.2_darwin_amd64.zip
unzip packer.zip -d /usr/local/bin
chmod +x /usr/local/bin/packer

With Packer installed, let’s just dive right into it and bake our AMI with a preinstalled Docker Engine in order to build a Swarm or Kubernetes cluster and avoid cold-start of node machines.

Packer is template-driven; templates are written in JSON format:

{
"variables" : {
"region" : "us-east-1"
},
"builders" : [
{
"type" : "amazon-ebs",
"profile" : "default",
"region" : "{{user `region`}}",
"instance_type" : "t2.micro",
"source_ami" : "ami-1853ac65",
"ssh_username" : "ec2-user",
"ami_name" : "docker-17.12.1-ce",
"ami_description" : "Amazon Linux Image with Docker-CE",
"run_tags" : {
"Name" : "packer-builder-docker",
"Tool" : "Packer",
"Author" : "mlabouardy"
}
}
],
"provisioners" : [
{
"type" : "shell",
"script" : "./setup.sh"
}
]
}

The template is divided into 3 sections:

  • variables: Custom variables that can be overridden at runtime by using the -var flag. In the above snippet, we’re specifying the AWS region.
  • builders: You can specify multiple builders depending on the target platforms (EC2, VMware, Google Cloud, Docker …).
  • provisioners: You can pass a shell script or use configuration management tools like Ansible, Chef, Puppet or Salt to provision the AMI and install all required packages and software.

Packer will use an existing Amazon Linux “gold image” from the marketplace and install the latest Docker Community Edition using the following Bash script:

#!/bin/sh

sudo yum update -y
sudo yum install docker -y
sudo service docker start
sudo usermod -aG docker ec2-user

Note: You can avoid hardcoding the Gold Image ID in the template by using the source_ami_filter attribute.

Before we take the template and build an image from it, let’s validate the template by running:

packer validate ami.json

Now that we have our template file and bash provisioning script ready to go, we can issue the following command to build our new AMI:

packer build ami.json


This will chew for a bit and finally output the AMI ID:



Next, create a new EC2 instance based on the AMI:

aws ec2 run-instances --image-id ami-3fbc1940 \
--count 1 --instance-type t2.micro \
--key-name KEY_NAME --security-group-ids SG_ID \
--region us-east-1 \
--tag-specifications 'ResourceType=instance,Tags=[{Key=Name,Value=demo}]'

Then, connect to your instance via SSH and type the following command to verify that the latest Docker release is installed:



Simple, right? Well, you can go further and set up a CI/CD pipeline to build your AMIs on every push, recreate your EC2 instances with the new AMIs, and roll back in case of failure.



Drop your comments, feedback, or suggestions below — or connect with me directly on Twitter @mlabouardy.

Become AWS Certified Developer with Alexa

Being AWS Certified can boost your career (increasing your salary, finding a better job or getting a promotion) and keep your expertise and skills relevant. And there’s no better way to prepare for your AWS Certified Developer Associate exam than getting your hands dirty and building a serverless quiz game with an Alexa Skill and AWS Lambda.



Note: all the code is available in my GitHub.

1 – DynamoDB

To get started, create a DynamoDB Table using the AWS CLI:

aws dynamodb create-table --table-name Questions \
--attribute-definitions AttributeName=ID,AttributeType=S \
--key-schema AttributeName=ID,KeyType=HASH \
--provisioned-throughput ReadCapacityUnits=5,WriteCapacityUnits=5 \
--region AWS_REGION

I prepared in advance a list of questions for the following AWS services:



[
{
"category" : "S3",
"questions" : [
{
"question": "In S3 what can be used to delete a large number of objects",
"answers" : {
"A" : "QuickDelete",
"B" : "Multi-Object Delete",
"C" : "Multi-S3 Delete",
"D" : "There is no such option available"
},
"correct" : "B"
},
{
"question": "S3 buckets can contain both encrypted and non-encrypted objects",
"answers" : {
"A" : "False",
"B" : "True"
},
"correct" : "B"
}
]
}
]

Next, import the JSON file to the DynamoDB table:

func main() {
cfg, err := external.LoadDefaultAWSConfig()
if err != nil {
log.Fatal(err)
}

raw, err := ioutil.ReadFile("questions.json")
if err != nil {
log.Fatal(err)
}

services := make([]Service, 0)
json.Unmarshal(raw, &services)

for _, service := range services {
fmt.Printf("AWS Service: %s\n", service.Category)
fmt.Printf("\tQuestions: %d\n", len(service.Questions))
for _, question := range service.Questions {
err = insertToDynamoDB(cfg, service.Category, question)
if err != nil {
log.Fatal(err)
}
}
}

}

The insertToDynamoDB function uses the AWS DynamoDB SDK and PutItemRequest method to insert an item into the table:

func insertToDynamoDB(cfg aws.Config, category string, question Question) error {
tableName := os.Getenv("TABLE_NAME")

item := DBItem{
ID: xid.New().String(),
Category: category,
Question: question.Question,
Answers: question.Answers,
Correct: question.Correct,
}

av, err := dynamodbattribute.MarshalMap(item)
if err != nil {
fmt.Println(err)
return err
}

svc := dynamodb.New(cfg)
req := svc.PutItemRequest(&dynamodb.PutItemInput{
TableName: &tableName,
Item: av,
})
_, err = req.Send()
if err != nil {
return err
}

return nil
}

Run the main.go file by issuing the following command:
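Since the code reads the table name from the TABLE_NAME environment variable, the import boils down to something like the following (exporting AWS_REGION is only needed if your default profile doesn't already set a region):

# Import the questions into the DynamoDB table
export TABLE_NAME=Questions
export AWS_REGION=us-east-1
go run main.go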



If you navigate to DynamoDB Dashboard, you should see that the list of questions has been successfully inserted:



2 – Alexa Skill

This is what ties it all together: it links the phrases the user says when interacting with the quiz to intents.

For people who are not familiar with NLP: Alexa is based on an NLP engine, a system that analyses phrases (user messages) and returns an intent. Intents describe what a user wants or wants to do; they are the intention behind the message. Alexa can learn new intents by attributing example messages to an intent. Behind the scenes, the engine will be able to predict the intent even for phrases it has never seen before.

So, sign up to the Amazon Developer Console and create a new custom Alexa skill. Set an invocation name as follows:



Create a new Intent for starting the Quiz:



Add a new slot type to store the user's choice:



Then, create another intent for AWS service choice:



And another one for the user's answer choice:



Save your interaction model. Then, you’re ready to configure your Alexa Skill.

3 – Lambda Function

The Lambda handler function is self-explanatory: it maps each intent to a code snippet. To keep track of the user’s score, we use the Alexa sessionAttributes property of the JSON response; the session attributes are then passed back to us with the next request, inside the session object. The list of questions is retrieved from DynamoDB using the AWS SDK, and SSML (Speech Synthesis Markup Language) is used to make Alexa speak a sentence ending in a question mark as a question, or to add pauses in the speech:

package main

import (
"context"
"fmt"
"log"
"strings"

"github.com/aws/aws-lambda-go/lambda"
)

func HandleRequest(ctx context.Context, r AlexaRequest) (AlexaResponse, error) {
resp := CreateResponse()

switch r.Request.Intent.Name {
case "Begin":
resp.Say(`<speak>
Choose the AWS service you want to be tested on <break time="1s"/>
A <break time="1s"/> EC2 <break time="1s"/>
B <break time="1s"/> VPC <break time="1s"/>
C <break time="1s"/> DynamoDB <break time="1s"/>
D <break time="1s"/> S3 <break time="1s"/>
E <break time="1s"/> SQS
</speak>`, false, "SSML")
case "ServiceChoice":
number := strings.TrimSuffix(r.Request.Intent.Slots["choice"].Value, ".")

questions, _ := getQuestions(services[number])

resp.SessionAttributes = make(map[string]interface{}, 1)
resp.SessionAttributes["score"] = 0
resp.SessionAttributes["correct"] = questions[0].Correct

resp.Say(text, false, "SSML")
case "AnswerChoice":
resp.SessionAttributes = make(map[string]interface{}, 1)
score := int(r.Session.Attributes["score"].(float64))
correctChoice := r.Session.Attributes["correct"]
userChoice := strings.TrimSuffix(r.Request.Intent.Slots["choice"].Value, ".")

text := "<speak>"
if correctChoice == userChoice {
text += "Correct</speak>"
score++
} else {
text += "Incorrect</speak>"
}

resp.Say(text, false, "SSML")

case "AMAZON.HelpIntent":
resp.Say(welcomeMessage, false, "PlainText")
case "AMAZON.StopIntent":
resp.Say(exitMessage, true, "PlainText")
case "AMAZON.CancelIntent":
resp.Say(exitMessage, true, "PlainText")
default:
resp.Say(welcomeMessage, false, "PlainText")
}
return *resp, nil
}

func main() {
lambda.Start(HandleRequest)
}

Note: Full code is available in my GitHub.
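The helpers referenced above (the services map, getQuestions, welcomeMessage, exitMessage and the question text) live in the full code. As an illustration only, a minimal getQuestions could scan the table with the same preview version of the AWS SDK for Go v2 and keep the questions of the chosen category; the details below are assumptions, not the exact implementation:

func getQuestions(category string) ([]Question, error) {
	cfg, err := external.LoadDefaultAWSConfig()
	if err != nil {
		return nil, err
	}

	// Scan the whole table (the IAM policy below only grants dynamodb:Scan)
	// and keep the questions that belong to the requested category.
	tableName := os.Getenv("TABLE_NAME")
	svc := dynamodb.New(cfg)
	req := svc.ScanRequest(&dynamodb.ScanInput{TableName: &tableName})
	res, err := req.Send()
	if err != nil {
		return nil, err
	}

	items := make([]DBItem, 0)
	if err := dynamodbattribute.UnmarshalListOfMaps(res.Items, &items); err != nil {
		return nil, err
	}

	questions := make([]Question, 0)
	for _, item := range items {
		if item.Category == category {
			questions = append(questions, Question{
				Question: item.Question,
				Answers:  item.Answers,
				Correct:  item.Correct,
			})
		}
	}
	return questions, nil
}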

Sign in to the AWS Management Console, and create a new Lambda function in Go from scratch:



Assign the following IAM policy to the function’s execution role:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "1",
      "Effect": "Allow",
      "Action": [
        "logs:CreateLogStream",
        "logs:CreateLogGroup",
        "logs:PutLogEvents"
      ],
      "Resource": "*"
    },
    {
      "Sid": "2",
      "Effect": "Allow",
      "Action": "dynamodb:Scan",
      "Resource": [
        "arn:aws:dynamodb:us-east-1:ACCOUNT_ID:table/Questions/index/ID",
        "arn:aws:dynamodb:us-east-1:ACCOUNT_ID:table/Questions"
      ]
    }
  ]
}

Generate the deployment package, upload it to the Lambda Console, and set the TABLE_NAME environment variable to the table name:
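The package can be built the same way as the Slack notification function later in this post (a Linux binary zipped into deployment.zip), and the variable can also be set from the CLI; the function name below is just a placeholder:

GOOS=linux GOARCH=amd64 go build -o main main.go
zip deployment.zip main

aws lambda update-function-configuration --function-name QUIZ_FUNCTION_NAME \
    --environment Variables={TABLE_NAME=Questions} \
    --region us-east-1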



4 – Testing

Now that you have created the function and put its code in place, it’s time to specify how it gets called. We’ll do this by linking the Lambda ARN to the Alexa Skill:



Once the information is in place, click Save Endpoints. You’re ready to start testing your new Alexa Skill!



To test, log in to the Alexa Developer Console and enable the “Test” switch on your skill from the “Test” tab:



Or use an Alexa-enabled device like Amazon Echo, by saying “Alexa, Open AWS Developer Quiz”:



Drop your comments, feedback, or suggestions below — or connect with me directly on Twitter @mlabouardy.

Slack Notification with CloudWatch Alarms & Lambda

ChatOps has emerged as one of the most effective techniques to implement DevOps. Hence, it is useful to receive notifications and infrastructure alerts in collaboration messaging platforms like Slack & HipChat.

AWS CloudWatch Alarms and SNS are a great mix to build a real-time notification system, as SNS supports multiple endpoints (Email, HTTP, Lambda, SQS). Unfortunately, SNS doesn’t support sending notifications to tools like Slack out of the box.



CloudWatch will trigger an alarm that sends a message to an SNS topic if the monitored data goes out of range. A Lambda function will be invoked in response to SNS receiving the message and will call the Slack API to post a message to the Slack channel.



To get started, create an EC2 instance using the AWS Management Console or the AWS CLI:

aws ec2 run-instances --image-id ami_ID \
    --count 1 --instance-type t2.micro \
    --key-name KEY_NAME --security-group-ids SG_ID \
    --region us-east-1 \
    --tag-specifications 'ResourceType=instance,Tags=[{Key=Name,Value=demo}]'

Next, create an SNS topic:

aws sns create-topic --name cpu-utilization-alarm-topic --region us-east-1

Then, set up a CloudWatch alarm that fires when the instance CPU utilization is higher than 40% and sends a notification to the SNS topic. Note that per-instance metrics in the AWS/EC2 namespace use the InstanceId dimension, so replace INSTANCE_ID below with the ID returned by run-instances:

aws cloudwatch put-metric-alarm --region us-east-1 \
    --alarm-name "HighCPUUtilization" \
    --alarm-description "High CPU Utilization" \
    --actions-enabled \
    --alarm-actions "TOPIC_ARN" \
    --metric-name "CPUUtilization" \
    --namespace AWS/EC2 --statistic "Average" \
    --dimensions "Name=InstanceId,Value=INSTANCE_ID" \
    --period 60 \
    --evaluation-periods 1 \
    --threshold 40 \
    --comparison-operator "GreaterThanOrEqualToThreshold"

As a result:



To be able to post messages to a Slack channel, we need to create a Slack Incoming WebHook. Start by setting up an incoming webhook integration in your Slack workspace:



Note down the returned WebHook URL for the upcoming part.
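You can quickly verify the WebHook from a terminal before wiring it to Lambda; the URL below is a placeholder for your own:

curl -X POST -H 'Content-type: application/json' \
    --data '{"text":"Hello from CloudWatch"}' \
    https://hooks.slack.com/services/TOKEN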

The Lambda handler function is written in Go. It takes the SNS message as an argument, parses it, and calls the Slack API to post a message to the Slack channel configured in the previous section:

package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"log"
	"net/http"
	"os"

	"github.com/aws/aws-lambda-go/lambda"
)

// Request mirrors the SNS event payload delivered to Lambda.
type Request struct {
	Records []struct {
		SNS struct {
			Type       string `json:"Type"`
			Timestamp  string `json:"Timestamp"`
			SNSMessage string `json:"Message"`
		} `json:"Sns"`
	} `json:"Records"`
}

// SNSMessage is the CloudWatch alarm notification embedded in the SNS message body.
type SNSMessage struct {
	AlarmName      string `json:"AlarmName"`
	NewStateValue  string `json:"NewStateValue"`
	NewStateReason string `json:"NewStateReason"`
}

type SlackMessage struct {
	Text        string       `json:"text"`
	Attachments []Attachment `json:"attachments"`
}

type Attachment struct {
	Text  string `json:"text"`
	Color string `json:"color"`
	Title string `json:"title"`
}

func handler(request Request) error {
	var snsMessage SNSMessage
	err := json.Unmarshal([]byte(request.Records[0].SNS.SNSMessage), &snsMessage)
	if err != nil {
		return err
	}

	log.Printf("New alarm: %s - Reason: %s", snsMessage.AlarmName, snsMessage.NewStateReason)
	slackMessage := buildSlackMessage(snsMessage)
	if err := postToSlack(slackMessage); err != nil {
		return err
	}
	log.Println("Notification has been sent")
	return nil
}

func buildSlackMessage(message SNSMessage) SlackMessage {
	return SlackMessage{
		Text: fmt.Sprintf("`%s`", message.AlarmName),
		Attachments: []Attachment{
			Attachment{
				Text:  message.NewStateReason,
				Color: "danger",
				Title: "Reason",
			},
		},
	}
}

func postToSlack(message SlackMessage) error {
	client := &http.Client{}
	data, err := json.Marshal(message)
	if err != nil {
		return err
	}

	req, err := http.NewRequest("POST", os.Getenv("SLACK_WEBHOOK"), bytes.NewBuffer(data))
	if err != nil {
		return err
	}
	req.Header.Set("Content-Type", "application/json")

	resp, err := client.Do(req)
	if err != nil {
		return err
	}
	defer resp.Body.Close()

	if resp.StatusCode != 200 {
		return fmt.Errorf("slack returned status code %d", resp.StatusCode)
	}

	return nil
}

func main() {
	lambda.Start(handler)
}

As Go is a compiled language, build the application and create a Lambda deployment package using the bash script below:

#!/bin/bash

echo "Building binary"
GOOS=linux GOARCH=amd64 go build -o main main.go

echo "Create deployment package"
zip deployment.zip main

echo "Cleanup"
rm main

Once created, use the AWS CLI to deploy the function to Lambda. Make sure to override the Slack WebHook with your own:

aws lambda create-function --function-name SlackNotification \
    --zip-file fileb://./deployment.zip \
    --runtime go1.x --handler main \
    --role IAM_ROLE_ARN \
    --environment Variables={SLACK_WEBHOOK=https://hooks.slack.com/services/TOKEN} \
    --region AWS_REGION

Note: For non-Gophers, you can download the zip file directly from here.

From here, configure the invoking service for your function to be the SNS topic we created earlier:

aws sns subscribe --topic-arn TOPIC_ARN --protocol lambda \
    --notification-endpoint LAMBDA_ARN \
    --region AWS_REGION
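Note: if SNS is not already allowed to invoke the function, you may also need to add a resource-based permission, for example:

aws lambda add-permission --function-name SlackNotification \
    --statement-id sns-invoke \
    --action lambda:InvokeFunction \
    --principal sns.amazonaws.com \
    --source-arn TOPIC_ARN \
    --region AWS_REGION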

Lambda Dashboard:



Let’s test it out. Connect to your instance via SSH, then install stress, a workload-generation tool:

sudo yum install -y stress

Issue the following command to generate some load on the CPU:

stress --cpu 2 --timeout 60s
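While stress is running, you can also watch the alarm change state from the CLI (the alarm name matches the one created earlier):

aws cloudwatch describe-alarms --alarm-names "HighCPUUtilization" \
    --region us-east-1 --query 'MetricAlarms[0].StateValue'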

You should receive a Slack notification as below:



Note: You can go further and customize your Slack message.

Drop your comments, feedback, or suggestions below — or connect with me directly on Twitter @mlabouardy.

Build a Serverless Production-Ready Blog

Are you tired of maintaining your CMS (WordPress, Drupal, etc.)? Paying expensive hosting fees? Fixing security issues every day?



Not long ago, I discovered a new blogging framework called Hexo which lets you publish Markdown documents in the form of blog posts. So, as always, I got my hands dirty and wrote this post to show you how to build a production-ready blog with Hexo, use AWS S3 to make your blog serverless, and pay only for what you use. Along the way, I will show you how to automate the deployment of new posts by setting up a CI/CD pipeline.

To get started, Hexo requires Node.js & Git to be installed. Once all requirements are installed, issue the following command to install the Hexo CLI:

npm install -g hexo-cli

Next, create a new empty project:

hexo init slowcoder.com

Modify the blog’s global settings in the _config.yml file:

# Site
title: SlowCoder
subtitle: DevOps News and Tutorials
description: DevOps, Cloud, Serverless, Containers news and tutorials for everyone
keywords: programming,devops,cloud,go,mobile,serverless,docker
author: Mohamed Labouardy
language: en
timezone: Europe/Paris

Start a local server with “hexo server”. By default, this is at http://localhost:4000. You’ll see Hexo’s pre-defined “Hello World” test post:



If you want to change the default theme, you just need to go here and find a new one you prefer.

I opted for the Magnetic theme as it includes many features:

  • Disqus and Facebook comments
  • Google Analytics
  • Cover image for posts and pages
  • Tags Support
  • Responsive Images
  • Image Gallery
  • Social Accounts configuration
  • Pagination

Clone the theme GitHub repository as below:

git clone https://github.com/klugjo/hexo-theme-magnetic themes/magnetic

Then update your blog’s main _config.yml to set the theme to magnetic (a one-line change: theme: magnetic). Once done, restart the server:



Now you are almost done with your blog setup. It is time to write your first article. To generate a new article file, use the following command:

hexo new POST_TITLE


Now, sign in to the AWS Management Console, navigate to the S3 Dashboard and create an S3 bucket, or use the AWS CLI to create a new one:

aws s3 mb s3://slowcoder.com

Add the following policy to the S3 bucket to make all objects public by default:

{
  "Id": "Policy1522074684919",
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "Stmt1522074683215",
      "Action": [
        "s3:GetObject"
      ],
      "Effect": "Allow",
      "Resource": "arn:aws:s3:::slowcoder.com/*",
      "Principal": "*"
    }
  ]
}

Next, enable static website hosting on the S3 bucket:

aws s3 website s3://slowcoder.com --index-document index.html

To automate deploying the blog to S3 each time a new article is published, we will set up a CI/CD pipeline using CircleCI.

Sign in to CircleCI using your GitHub account, then add the CircleCI configuration file (.circleci/config.yml) to your project:

version: 2
jobs:
  build:
    docker:
      - image: node:9.9.0
    working_directory: ~/slowcoder.com
    steps:
      - checkout
      - run:
          name: Install Hexo CLI
          command: npm install -g hexo-cli
      - restore_cache:
          keys:
            - npm-deps-{{ checksum "package.json" }}
      - run:
          name: Install dependencies
          command: npm install
      - save_cache:
          key: npm-deps-{{ checksum "package.json" }}
          paths:
            - node_modules
      - run:
          name: Generate static website
          command: hexo generate
      - run:
          name: Install AWS CLI
          command: |
            apt-get update
            apt-get install -y awscli
      - run:
          name: Push to S3 bucket
          command: cd public/ && aws s3 sync . s3://slowcoder.com

Note: Make sure to set the AWS Access Key ID and Secret Access Key in your project’s Settings page on CircleCI. The IAM user only needs enough permissions to list the bucket and upload objects (s3:PutObject).
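The exact permissions depend on your setup; a minimal policy for that CI user could look like the sketch below (bucket name as used above):

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": "s3:ListBucket",
      "Resource": "arn:aws:s3:::slowcoder.com"
    },
    {
      "Effect": "Allow",
      "Action": "s3:PutObject",
      "Resource": "arn:aws:s3:::slowcoder.com/*"
    }
  ]
}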

Now every time you push changes to your GitHub repo, CircleCI will automatically deploy the changes to S3. Here’s a passing build:



Finally, to make our blog user-friendly, we will setup a custom domain name in Route53 as below:



Note: You can go further and setup a CloudFront Distribution in front of the S3 bucket to optimize delivery of blog assets.

You can now test your brand new blog by visiting the following address: http://slowcoder.com :



Drop your comments, feedback, or suggestions below — or connect with me directly on Twitter @mlabouardy.
