Serverless Jenkins - Scaling Jenkins to infinity and beyond - Part 2
by Jörg Herzinger on June 30, 2020

Serverless Jenkins Agents

In the second part of this serverless Jenkins series (see part 1) we will take a look at the actual workers doing the heavy lifting of our jobs, be that building a Node.js web application or running Terraform for us. Here we face a common problem that every Jenkins administrator knows: maintaining agents takes just as much effort as running Jenkins itself, sometimes even more. You need a server that is accessible from Jenkins, you have to run the agent on it, and you need to install, maintain, upgrade and back up all the tools your application team needs. These agent servers tend to become a huge mess with tons of undocumented global configuration lying around, which makes them indispensable. The second problem is scaling. In principle agents scale horizontally, so you can simply add more servers, but because their configuration is often undocumented, reproducing one is a very hard task. Vertical scaling is sometimes an option, but it also requires some effort and you have to take your agent down for it. So here is what we want:

  • We want to scale horizontally on demand
  • We want to have documented and reproducible build systems
  • We want as little maintenance overhead as possible
  • And we want all of that to be cheap

There are various plugins for Jenkins that add worker agents on demand using classic services like EC2. Depending on your use case any of them can fit, but since we already know Fargate and it clearly beats all others in costs and scaling, we will go with that one. So let's head over to Jenkins and install the plugin we need.

In order for Jenkins to be allowed to spawn ECS tasks, we need to grant the Jenkins task's IAM role permission to do so. To simplify things for this setup we chose to assign full administrative permissions to the Jenkins master, but we strongly encourage you to restrict those permissions to only the required ones for a production rollout.

data "aws_iam_policy_document" "ecs_task_jenkins" {
  statement {
    effect = "Allow"

    principals {
      type        = "Service"
      identifiers = ["ecs-tasks.amazonaws.com"]
    }

    actions = ["sts:AssumeRole"]
  }
}

resource "aws_iam_role" "ecs_task_jenkins" {
  name               = "ServerlessJenkinsECSTaskContainerRole"
  assume_role_policy = data.aws_iam_policy_document.ecs_task_jenkins.json
}

resource "aws_iam_role_policy_attachment" "ecs_task_jenkins" {
  role       = aws_iam_role.ecs_task_jenkins.name
  policy_arn = "arn:aws:iam::aws:policy/AdministratorAccess"
}
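
For a production rollout, the administrator attachment above could be replaced with a narrower inline policy. The following is a minimal sketch, not a verified minimum: the action list is an assumption based on what an ECS agent plugin has to do (register task definitions, run and stop tasks, pass roles to them), and the names `jenkins_ecs_agents` and `jenkins-ecs-agents` are hypothetical. Check it against the documentation of the plugin version you install.

```hcl
# Sketch of a least-privilege policy for spawning agents.
# Assumption: this action list covers what the plugin calls; verify before use.
data "aws_iam_policy_document" "jenkins_ecs_agents" {
  statement {
    sid    = "ManageAgentTasks"
    effect = "Allow"

    actions = [
      "ecs:RegisterTaskDefinition",
      "ecs:DeregisterTaskDefinition",
      "ecs:ListTaskDefinitions",
      "ecs:DescribeTaskDefinition",
      "ecs:ListClusters",
      "ecs:ListContainerInstances",
      "ecs:DescribeContainerInstances",
      "ecs:RunTask",
      "ecs:StopTask",
      "ecs:DescribeTasks",
    ]

    # Could be narrowed further to the cluster and task definition ARNs.
    resources = ["*"]
  }

  statement {
    sid     = "PassRolesToAgentTasks"
    effect  = "Allow"
    actions = ["iam:PassRole"]

    # Assumption: ideally list only the agent task and execution role ARNs here.
    resources = ["*"]

    condition {
      test     = "StringEquals"
      variable = "iam:PassedToService"
      values   = ["ecs-tasks.amazonaws.com"]
    }
  }
}

# Attach the policy inline to the role defined above,
# in place of the AdministratorAccess attachment.
resource "aws_iam_role_policy" "jenkins_ecs_agents" {
  name   = "jenkins-ecs-agents"
  role   = aws_iam_role.ecs_task_jenkins.id
  policy = data.aws_iam_policy_document.jenkins_ecs_agents.json
}
```

If drop-down options in the Jenkins UI stay empty with such a policy, that is a hint that an additional read-only action is missing.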

Now that Jenkins is allowed to create ECS tasks as agents, all that is left for us is to configure it. Let's head over to the Jenkins management page, where we can now open "Manage Nodes and Clouds". If the IAM permissions are set correctly, everything should be available as drop-down options, so there is not much configuration to copy and paste.

We just specify a name, a region and the cluster to use. We don't have to choose credentials since they are provided by the ECS task's IAM role. The interesting part lies in the agent configuration.

Here we have to do some configuration. Most notably we specify Fargate as the launch type with "awsvpc" as the network mode, which means that each agent gets an IP address in our VPC and can therefore access our internal services and more. You might also want to specify the subnets and security groups to use for agent tasks.
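
To make these settings more concrete, the following Terraform sketch shows roughly what an equivalent Fargate task definition would look like. It is only an illustration, since the plugin builds its agent tasks from the values entered in the UI. The resource name, the family name, the CPU and memory sizes and the image tag are all assumptions, not values from this setup.

```hcl
# Illustrative only: in this setup the plugin manages the agent tasks itself.
resource "aws_ecs_task_definition" "jenkins_agent_example" {
  family                   = "jenkins-agent-example" # hypothetical name
  requires_compatibilities = ["FARGATE"]

  # Fargate requires awsvpc: every task gets its own network interface
  # and therefore its own IP address inside the VPC.
  network_mode = "awsvpc"

  # Assumed sizes; Fargate only accepts certain CPU/memory combinations.
  cpu    = "512"
  memory = "1024"

  container_definitions = jsonencode([
    {
      name      = "jenkins-agent"
      image     = "jenkins/inbound-agent:latest" # assumed image tag
      essential = true
    }
  ])
}
```

The subnets and security groups are not part of a task definition. They are supplied when the task is started, which is why the plugin asks for them separately.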

Agent to master connection

We have not yet talked about how this agent connects to Jenkins. If you want to know how this works in more depth you can read about it here. For now we only need to know that the agent is started with a URL and a secret, both passed by the plugin, which allow the connection. Since this connection uses a binary protocol called JNLP, which does not allow proxies in front of Jenkins, we have to find a way around this. So here is the idea:

We can't use the application load balancer, both because of the OpenID authentication that we configured on it and because Jenkins simply does not support agent connections through it. So we will use an additional network load balancer that the agents can use to connect directly to the web interface and to the JNLP port 50000. Since this opens up access to Jenkins through the new network load balancer, we have to think about security too.
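
A minimal Terraform sketch of such an internal network load balancer could look like the following. The resource names and the HTTP port 8080 are assumptions (8080 is the Jenkins default and may differ in your container), while `module.vpc` refers to the VPC module used elsewhere in this series.

```hcl
locals {
  # Ports the agents need to reach on the Jenkins master.
  jenkins_agent_ports = {
    http = 8080  # assumption: default Jenkins web port
    jnlp = 50000 # agent port mentioned above
  }
}

# Internal load balancer, reachable only from inside the VPC.
resource "aws_lb" "jenkins_agents" {
  name               = "jenkins-agents" # hypothetical name
  internal           = true
  load_balancer_type = "network"
  subnets            = module.vpc.private_subnets
}

# One TCP target group per port. Tasks in awsvpc mode are
# registered by IP address, hence target_type = "ip".
resource "aws_lb_target_group" "jenkins_agents" {
  for_each    = local.jenkins_agent_ports
  name        = "jenkins-agents-${each.key}"
  port        = each.value
  protocol    = "TCP"
  target_type = "ip"
  vpc_id      = module.vpc.vpc_id
}

# Plain TCP listeners that forward to the matching target group.
resource "aws_lb_listener" "jenkins_agents" {
  for_each          = local.jenkins_agent_ports
  load_balancer_arn = aws_lb.jenkins_agents.arn
  port              = each.value
  protocol          = "TCP"

  default_action {
    type             = "forward"
    target_group_arn = aws_lb_target_group.jenkins_agents[each.key].arn
  }
}
```

For the Jenkins task to receive traffic, the ECS service additionally needs one `load_balancer` block per target group so that the task is registered with both of them.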

In addition to the client secret that the agent uses, we will attach a security group to the agents (set in the agent configuration above) and allow access to Jenkins only from this security group and the application load balancer's security group. In Terraform this looks like this:

# After creation, copy the ID of this security group to your Jenkins cloud configuration
resource "aws_security_group" "jenkins_agent" {
  vpc_id = module.vpc.vpc_id

  egress {
    from_port   = 0
    to_port     = 0
    protocol    = "-1"
    cidr_blocks = ["0.0.0.0/0"]
  }

  tags = {
    Name        = "for Jenkins Fargate agents"
    Project     = local.project
    Environment = local.environment
  }
}

# Allow access from the application load balancer and the agents
resource "aws_ecs_service" "jenkins" {
  name            = "jenkins"
  cluster         = aws_ecs_cluster.container_instance.id
  task_definition = aws_ecs_task_definition.jenkins.arn
  desired_count   = 1

  network_configuration {
    subnets         = module.vpc.private_subnets
    security_groups = [aws_security_group.ecs_lb.id, aws_security_group.jenkins_agent.id]
  }

  ...
}

To use this, we configure the plugin to use the network load balancer's internal URL for connections from the agents to Jenkins.

Now let's test this setup with a simple job that runs "echo Hello world" in our new on-demand, scalable Jenkins agent setup.

Build tools for agents

Of course you will need your build tools available on the agent. For this you can either start from the current inbound-agent Docker image on GitHub or roll your own. To get you started, Red Hat has a nice collection of examples available.

Next, head over to part 3, which shows how to run jobs on actual EC2 instances.