Add prefix to CI Job Tokens

Proposal

Add a prefix to CI job tokens. As with Personal Access Tokens and their glpat- prefix, a prefix would make CI job tokens easier to identify, which in turn makes secret detection and incident response more effective.

Current behaviour

When a pipeline job is about to run, GitLab generates a unique token and injects it as the CI_JOB_TOKEN predefined variable.
The token has the same permissions to access the API [only on specific endpoints] as the user that caused the job to run.
The token is valid only while the pipeline job runs. After the job finishes, you cannot use the token anymore.

https://docs.gitlab.com/ee/ci/jobs/ci_job_token.html

In practice, a CI_JOB_TOKEN can live for up to one month if the project is configured with the maximum build timeout and the job runs that long. The default timeout is 1 hour. Jobs handled by SaaS runners on GitLab.com time out after 3 hours, regardless of the timeout configured in the project.

If a threat actor steals or finds a still-active token, they can perform the actions listed in the docs. Some high-level items:

Clone a private repo
The attack described here (currently )
Read / publish packages
Read artifacts, plans, environments, etc
(Maybe) Trigger another pipeline and get another CI_JOB_TOKEN

Current Format

Currently CI_JOB_TOKEN is constructed in the form "#{partition_id.to_s(16)}_#{Devise.friendly_token}":

Partition ID could be any number, rendered in hex. E.g. partition 123 -> 7b or 321 -> 141.
- The design blueprint indicates this could be up to four characters / max 65535.
- A spec helper indicates it could be up to 99999 -> 1869f
Devise friendly token is (pretty much) ^[\w-]{20}$
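
As a rough illustration of the current shape, the sketch below builds a token of the same form and checks it against the pattern described above. It is not GitLab's code: friendly_token_stand_in is a hypothetical replacement for Devise.friendly_token (so the example runs without Devise), and legacy_format is an assumed name for the pattern.

```ruby
require "securerandom"

# Hypothetical stand-in for Devise.friendly_token, used only so this sketch
# runs without Devise. 15 random bytes encode to exactly 20 URL-safe base64
# characters drawn from [A-Za-z0-9_-].
def friendly_token_stand_in
  SecureRandom.urlsafe_base64(15)
end

# Mirrors the current shape: hex partition ID, underscore, 20-character random part.
def legacy_job_token(partition_id)
  "#{partition_id.to_s(16)}_#{friendly_token_stand_in}"
end

# Assumed pattern for the current format: 1 to 5 hex digits, "_", 20 characters.
def legacy_format
  /\A\h{1,5}_[\w-]{20}\z/
end

token = legacy_job_token(123)
puts token.start_with?("7b_")     # true: 123 in hex is 7b
puts token.match?(legacy_format)  # true
puts 99_999.to_s(16)              # 1869f, the five-character upper bound from the spec helper
```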

Proposed Format

glcbt-, short for GitLab CI Build Token.

We use the term build token because that's what the model is (app/models/ci/build.rb). An alternative is gljt, for job token, which is a common name for this record, e.g. in the predefined job variables. But since the prefix is meant for detection by automated means and isn't really supposed to be user-friendly per se, sticking to the model abbreviation convention seems like a good idea. See also: https://docs.gitlab.com/ee/development/secure_coding_guidelines.html#token-prefixes

The resulting detection regex would be /^glcbt-[\h]{1,5}_[\w-]{20}$/
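
A minimal sketch of how that regex could be used for detection. Two details are assumptions rather than part of the proposal: in Ruby, ^ and $ match at line boundaries, so the sketch uses \A and \z for a strict whole-string check, and it adds an unanchored variant (job_token_scan_pattern, a hypothetical name) for finding tokens embedded in larger text such as logs. The sample token is made up.

```ruby
# Strict whole-string check, equivalent in intent to the regex above.
def job_token_pattern
  /\Aglcbt-\h{1,5}_[\w-]{20}\z/
end

# Assumed scanner variant: the lookarounds stop the match from starting or
# ending in the middle of a longer run of token-like characters.
def job_token_scan_pattern
  /(?<![\w-])glcbt-\h{1,5}_[\w-]{20}(?![\w-])/
end

# Returns every prefixed job token found in the given text.
def find_job_tokens(text)
  text.scan(job_token_scan_pattern)
end

sample = "Authorization: Bearer glcbt-7b_abcdefghijABCDEFGHIJ (fake token)"
puts find_job_tokens(sample).inspect
puts "7b_abcdefghijABCDEFGHIJ".match?(job_token_pattern)  # false: unprefixed tokens are not detected
```

The second check shows the motivation for the change: without the static prefix, the current format gives a scanner very little to anchor on.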

glcbt is not being used in gitlab-org (which includes gitlab-org/security-products/analyzers/secrets).

Risks of making a change

Breaking partitioning / uniqueness across partitions
- Likelihood: Nil. See #426137 (comment 1698221334)
- ~~Impact: TBD - corrupt database perhaps, incorrect distribution across partitions, broken builds...?~~
- Mitigation: testing
- (Alternative, not desirable) If needed we could break from the pattern of other token prefixes and instead put the new prefix in the middle: PARTITION_PREFIX_glcbt_TOKEN. That should still be detectable given we have a static component in there.
Breaking CI jobs
- Likelihood: Low. Unless there's something that breaks due to increased length, any GitLab consumer of CI_JOB_TOKEN should be unaffected. (No validations will be added that might reject unprefixed tokens). Nothing seemed to break when the partition prefix was added.
- Impact: broken builds, customer dissatisfaction, rollbacks, comms
- Mitigation: use group-based feature flag and test on GitLab.com gitlab-owned groups first (e.g. gitlab-org).
Breaking third-party systems by adding the prefix
- Likelihood: Low. This would only occur if third parties (who shouldn't need the CI_JOB_TOKEN anyway?...) assume it takes the current form. But, again, nothing seemed to break when the partition prefix was added in Dec 2022.
Breaking existing CI_JOB_TOKEN masking
- Likelihood: TBC
- Impact: see the threat actor actions listed under "Current behaviour" above, but Very Bad™️
- Mitigation: testing
Breaking CI jobs by improving in-job masking
- Mitigation: do in a separate issue, if it's even needed. (CI_JOB_TOKEN is already masked)
Making it easier for malicious entities to detect and misuse
- This risk is true for all the other prefixes we've added.
- The current risk is somewhat greater for CI Job Tokens since the current format is already somewhat predictable
- Mitigation: we add some frontend detection in the MR that introduces the feature, and we follow up closely with issues to update other scanners
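
To make the partitioning risk more concrete, here is a sketch of a partition lookup that accepts both the current and the prefixed format, in line with the point above that unprefixed tokens are not rejected. partition_id_from_token is a hypothetical helper, not something that exists in the GitLab codebase. Because g and l are not hex digits, the glcbt- prefix cannot be misread as part of the partition ID.

```ruby
# Hypothetical helper: returns the partition ID as an integer, or nil when
# the token matches neither the current nor the proposed format.
def partition_id_from_token(token)
  match = token.match(/\A(?:glcbt-)?(?<partition>\h{1,5})_[\w-]{20}\z/)
  match && match[:partition].to_i(16)
end

random_part = "abcdefghijABCDEFGHIJ" # made-up 20-character value

puts partition_id_from_token("7b_#{random_part}").inspect        # 123
puts partition_id_from_token("glcbt-7b_#{random_part}").inspect  # 123
puts partition_id_from_token("not-a-job-token").inspect          # nil
```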

TODO

Understand what parts of the codebase are extracting the partition prefix from the token (if any)
Validate that adding a prefix to the existing prefix is feasible
Update Ci::Build prefix
- Create a group-based FF & rollout issue
Create follow-up issues to update our scanners
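
One way the flag-gated rollout could look, as a sketch under stated assumptions: build_job_token and its prefix_enabled keyword are hypothetical, with prefix_enabled standing in for the result of a group-based feature flag check, and SecureRandom standing in for Devise.friendly_token.

```ruby
require "securerandom"

# Hypothetical generator. prefix_enabled stands in for a group-based feature
# flag lookup; SecureRandom.urlsafe_base64(15) stands in for
# Devise.friendly_token and yields 20 characters.
def build_job_token(partition_id, prefix_enabled:)
  base = "#{partition_id.to_s(16)}_#{SecureRandom.urlsafe_base64(15)}"
  prefix_enabled ? "glcbt-#{base}" : base
end

old_style = build_job_token(123, prefix_enabled: false)
new_style = build_job_token(123, prefix_enabled: true)

puts old_style.start_with?("7b_")         # true
puts new_style.start_with?("glcbt-7b_")   # true
puts new_style.length - old_style.length  # 6, the length of "glcbt-"
```

The only change visible to consumers is the six extra leading characters, which is why the increased length is the main thing to test for.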

Edited Dec 20, 2023 by Nick Malcolm