Today I Learned ...

... about GitLab Caches and Artifacts (and which to use for node_modules)

When you deploy a NodeJS application with GitLab, one of the first steps is the installation of the dependencies. Many subsequent pipeline jobs will depend on the installation result, which is the node_modules folder (or several folders).

GitLab jobs, however, run completely independently of each other and don’t share any resulting output by default. So how can subsequent jobs use the node_modules folder?

In GitLab there are two ways to hand over files to other jobs: Artifacts and Caching. Here are some of the basic characteristics.

Artifacts
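
As a rough illustration of the artifact route, the install job could upload node_modules and let later jobs download it. This is only a sketch: the test job, its stage name, the `npm test` script and the one-hour expiry are assumptions made for the example, not something taken from a real project.

```yaml
# Sketch only: job names, the `test` stage, `npm test` and the
# expiry time are illustrative assumptions.
install:
  stage: .pre
  script:
    - npm ci
  artifacts:
    paths:
      - node_modules      # uploaded to GitLab when the job succeeds
    expire_in: 1 hour     # keep the upload only briefly

test:
  stage: test
  script:
    # Artifacts from jobs in earlier stages are downloaded by default,
    # so node_modules is already in place here.
    - npm test
```

The artifact belongs to one pipeline, so every new pipeline installs and uploads the dependencies again.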

Caches

Using a cache for the dependency installation seems like the way to go here, so how does it look in a .gitlab-ci.yml file? It’s pretty simple. A cache needs two basic properties, key and paths:

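install:
  stage: .pre # Predefined GitLab stage. Always runs at the beginning
  cache:
    key: my-npm-dependencies
    paths:
      - node_modules
  script:
    # Only install if no node_modules folder was restored from the cache
    - if [ ! -d node_modules ]; then npm ci; fi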

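So in this example, node_modules is restored from the cache if one exists, and the installation only has to happen if it doesn’t (as long as the script skips npm ci when node_modules is already there, because npm ci on its own always reinstalls from scratch). Pretty simple, right? There’s a catch though: what if the cache does exist but no longer reflects the currently needed dependencies (because, in the case of NodeJS, you have an updated package.json file)? This could lead to inconsistencies, because subsequent jobs would run on an outdated cache.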

In this case you could just make your cache key more specific and use the current commit hash:

  ...
  cache:
    key: ${CI_COMMIT_SHORT_SHA}-my-npm-dependencies
  ...

This would bind the cache to the current commit and make sure it’s always up to date. But it would also mean that every time the pipeline is started, it installs the dependencies, even if they haven’t changed at all. It would make node_modules available to subsequent jobs, but wouldn’t speed up anything for subsequent pipelines.

Fortunately there’s a better way to do it. Create a cache key from the dependency file itself:

...
  cache:
    key:
      files:
        - package-lock.json
    paths:
      - node_modules
...

So in this scenario, if the package-lock.json file has changed, the cache key changes with it and the dependencies are freshly installed. If it hasn’t changed, the job simply uses the existing cache (which is still perfectly valid).
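
Put together, a pipeline built on this idea might look roughly like the sketch below. The test job, its stage and the `npm test` script are assumptions for illustration. The `policy: pull` setting is an optional extra that stops the consuming job from re-uploading the cache, and the `if` guard is one possible way to skip the install when node_modules was restored.

```yaml
# Sketch only: the `test` job, its stage and `npm test` are
# illustrative assumptions.
install:
  stage: .pre
  cache:
    key:
      files:
        - package-lock.json   # key changes whenever the lock file changes
    paths:
      - node_modules
  script:
    # Skip the install when a matching cache was restored
    - if [ ! -d node_modules ]; then npm ci; fi

test:
  stage: test
  cache:
    key:
      files:
        - package-lock.json   # must match the key used by `install`
    paths:
      - node_modules
    policy: pull              # download only, never re-upload from this job
  script:
    - npm test
```

One caveat: GitLab treats a cache as an optimization rather than a guarantee, so a job that lands on a runner without access to the stored cache may start without node_modules.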
