Tips and tricks for accelerating your pipelines on GitLab

When practicing continuous integration and continuous deployment (CI/CD), optimizing build, test, and deploy times becomes crucial. GitLab, a popular platform for managing CI/CD pipelines, offers several strategies to speed up your workflows. Let’s explore some tricks you can use to accelerate the entire process.

Configure jobs from different stages to run in parallel

By default, stages in GitLab run sequentially: the jobs in one stage wait for all jobs in the previous stage to finish before they start. However, we can configure jobs from different stages to run in parallel, as long as these jobs do not depend on each other, for example when one job needs no data (artifacts) from a previous stage.

To allow one or more jobs to start outside of the pipeline order, you can use the needs keyword. If you define needs: [] inside the .yaml configuration of a job, that job has no dependency on any other job, and it will start as soon as possible, in parallel with the first jobs of the first stage. This way you can keep a job inside a stage for a clearer pipeline structure, while still letting it start independently and improving overall pipeline speed.

my-job:
  stage: my-stage
  script:
    - echo "This job runs independently"
  needs: []

Accordingly, you can define inside needs a list of all the jobs that a given job depends upon, so this job only waits for those jobs to finish before it can start. You can find more information about the needs keyword and the theory behind the directed acyclic graph (DAG) in the official documentation.
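As a minimal sketch (the job names build-job and test-job here are hypothetical), a job listing a single dependency starts as soon as that one job finishes, without waiting for the rest of the previous stage:

```yaml
build-job:
  stage: build
  script:
    - echo "Building"

test-job:
  stage: test
  script:
    - echo "Testing"
  needs: ["build-job"]  # starts right after build-job finishes, ignoring stage order
```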

Identify which jobs to run for different types of pipelines

GitLab supports various pipeline types, such as merge request pipelines, UI-triggered pipelines, and push pipelines. Each pipeline type serves a different purpose, so when you start designing your pipelines you should ask yourself which jobs should run for each pipeline type. For example, you most probably do not want the deploy job to run in a merge request pipeline, so you can define a rule that deactivates it.

To do that, use the rules keyword inside a job. For the merge request pipeline, the rule could look similar to this:

deploy-job:
  stage: deploy
  script:
    - echo "This is the deploy job"
  rules:
    - if: '$CI_PIPELINE_SOURCE == "merge_request_event"'
      when: never

The when: never property prevents the job from executing during merge request pipelines. Avoid using the deprecated only and except keywords.

The official documentation about the rules keyword can be found here.

Download only the needed artifacts from other jobs inside another job

By default, every folder or file that was made accessible with the artifacts keyword in a job is made available to all jobs in later stages. If you want to narrow down what each job downloads, you can use the dependencies keyword and define a list of the jobs whose artifacts the current job actually needs. That way, only the artifacts created by the listed jobs are downloaded. You can find more information about dependencies here.
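As a sketch (the job name build-job is hypothetical), a job in a later stage can restrict its artifact downloads to a single earlier job:

```yaml
package-job:
  stage: package
  script:
    - echo "Packaging"
  dependencies:
    - build-job  # only build-job's artifacts are downloaded; all others are skipped
```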

Use caching for faster accesses to common data between jobs

Unlike artifacts, caches are the better solution for common data used by many jobs across different stages. An example would be NuGet packages or the node_modules folder, which were restored in one job and are needed again in another one.
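As a sketch for a Node.js project (assuming npm and a package-lock.json file), the cache can be keyed on the lock file so it is only rebuilt when the dependencies change:

```yaml
install-job:
  stage: install
  script:
    - npm ci
  cache:
    key:
      files:
        - package-lock.json  # cache is reused until the lock file changes
    paths:
      - node_modules/
```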

For .NET applications specifically, you can set up caching using the configuration described in GitLab's official repository.

Use the Fast Zip option for zipping and uploading data

Uploading artifacts to the GitLab servers can be time-consuming. To speed up this process, consider using the Fast Zip option: enable the FF_USE_FASTZIP feature flag variable, which makes the runner use a faster zip implementation when archiving caches and artifacts. For more details, refer to this StackOverflow question.
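A sketch of the corresponding .gitlab-ci.yml snippet; the two compression-level variables are optional tuning knobs that take effect alongside the feature flag, assuming a sufficiently recent GitLab Runner:

```yaml
variables:
  FF_USE_FASTZIP: "true"            # use the faster zip implementation
  ARTIFACT_COMPRESSION_LEVEL: "fast" # trade smaller size for faster uploads
  CACHE_COMPRESSION_LEVEL: "fast"
```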
