... about GitLab Caches and Artifacts (and which to use for node_modules)
When you deploy a NodeJS application with GitLab, one of the first steps is the installation of the dependencies. Many subsequent pipeline jobs will depend on the installation result, which is the node_modules folder (or several folders).
GitLab jobs, however, run completely independently of each other and don’t share any resulting output by default. So how can subsequent jobs use the node_modules folder?
In GitLab there are two ways to hand over files to other jobs: Artifacts and Caching. Here are some of the basic characteristics.
Artifacts
- Are used to pass files between jobs within the same pipeline and thus cannot be used in different pipelines.
- Can be downloaded in the GitLab user interface.
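For comparison, this is roughly what handing over node_modules as an artifact could look like. It is only a sketch: the job name install and the one hour expiry are assumptions chosen for illustration.

```yaml
install:
  script:
    - npm ci
  artifacts:
    paths:
      - node_modules
    # Assumed value: keep the artifact only long enough for this pipeline
    expire_in: 1 hour
```

Jobs in later stages of the same pipeline download these artifacts by default, but the next pipeline starts from scratch again.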
Caches
- Are saved for a specific amount of time and can be used across pipelines.
- Are mainly used to speed up pipeline runs (e.g. dependency installation)
Using a cache for the dependency installation seems like the way to go here, so how does it look in a .gitlab-ci.yml file? It’s pretty simple: a cache needs two basic properties, key and paths:
install:
  stage: .pre # Predefined GitLab stage. Always runs in the beginning
  cache:
    key: my-npm-dependencies
    paths:
      - node_modules
  script:
    - npm ci
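The install job above only fills the cache. A subsequent job has to declare the same cache to get node_modules restored. A minimal sketch, assuming a hypothetical test job that runs npm test:

```yaml
test:
  stage: test
  cache:
    key: my-npm-dependencies   # must match the key used by the install job
    paths:
      - node_modules
    policy: pull               # only download the cache, never upload it again
  script:
    - npm test                 # hypothetical command, assumes a "test" script in package.json
```

Setting policy to pull saves the upload step in jobs that only read the dependencies. Keep in mind that a cache is not guaranteed to be present, for example when the job lands on a runner that doesn't have it.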
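So in this example node_modules is saved to the cache when the job finishes and restored the next time it starts. Keep in mind that GitLab still executes the script on every run and that npm ci removes an existing node_modules folder before installing, so the installation is only skipped if the script checks for the folder first. There’s a bigger catch though: What if the cache does exist, but does not reflect the currently needed dependencies any more (because, in the case of NodeJS, you have an updated package.json file)? This could lead to inconsistencies because subsequent jobs would run on an outdated cache.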
In this case you could just make your cache key more specific and use the current commit hash:
...
  cache:
    key: ${CI_COMMIT_SHORT_SHA}-my-npm-dependencies
...
This would bind the cache to the current commit and make sure it’s always updated. But it would also mean that every time the pipeline is started, it will install the dependencies, even if they haven’t changed at all. It would make node_modules available to subsequent jobs, but wouldn’t speed up anything for subsequent pipelines.
Fortunately there’s a better way to do it. Create a cache key from the dependency file itself:
...
  cache:
    key:
      files:
        - package-lock.json
    paths:
      - node_modules
...
So in this scenario, if the package-lock.json file has changed, the cache key changes as well, no matching cache is found and the dependencies are freshly installed. If it hasn’t changed, the existing cache (which is still perfectly valid) is simply restored.
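Put together, the install job could look something like this. It is a sketch rather than a definitive setup: the folder check in the script is an addition of mine, there because GitLab runs the script whether or not a cache was restored.

```yaml
install:
  stage: .pre
  cache:
    key:
      files:
        - package-lock.json   # key changes whenever the lock file changes
    paths:
      - node_modules
  script:
    # Install only when no cached node_modules folder was restored
    - if [ ! -d node_modules ]; then npm ci; fi
```

Because the key is derived from package-lock.json, a restored node_modules folder always belongs to the current lock file, so skipping npm ci in that case is safe.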
Links
- The official GitLab Cache-Documentation: Caching in GitLab CI/CD
- A nice blog post with some easy optimization tips: Difference between caches and artifacts in GitLab CI