
continuous integration - How does CI affect semantic versioning?

In the Continuous Delivery book, it's recommended to keep everything, including CI scripts, in version control. Current CI systems such as GitLab CI already follow this rule of thumb and look for their CI scripts in the same codebase.
On the other hand, we version our codebase (and its built artifacts) whenever it changes, and we follow semantic versioning for that: incrementing the patch field for bugfixes, the minor field for non-breaking features, and so on.
We also make sure the version is incremented between commits by checking it in CI, along the lines of the sketch below.
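For illustration, a minimal sketch of that check, assuming the version lives in a VERSION file and releases are tagged vX.Y.Z (both of which are assumptions about the repo layout, not a given):

    #!/usr/bin/env python3
    """CI guard: fail the pipeline if the version was not bumped.
    Assumes a VERSION file and "vX.Y.Z" release tags; adjust to taste."""
    import subprocess
    import sys

    def parse(version: str) -> tuple:
        # "1.4.2" or "v1.4.2" -> (1, 4, 2), so tuples compare naturally
        return tuple(int(p) for p in version.strip().lstrip("v").split("."))

    current = parse(open("VERSION").read())

    # Last released version = most recent tag reachable from HEAD.
    previous = parse(subprocess.run(
        ["git", "describe", "--tags", "--abbrev=0"],
        capture_output=True, text=True, check=True,
    ).stdout)

    if current <= previous:
        sys.exit(f"VERSION {current} must exceed last release {previous}")
    print(f"ok: {previous} -> {current}")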
But there are commits that only change the CI scripts, e.g. adding an analysis job or optimizing another.
My question, after this long preface, is: what is the best practice for versioning such changes to the CI? They can affect the final built artifact (e.g. changing a build flag in a CI job for optimization).
Is it OK to increment the version in this case?



1 Reply


Git is a revision control system. Every time you commit something to a git repo, it labels the content of the repo with a hash that uniquely identifies that version of the repo, so applying semantic versioning to a git repo's content is redundant and pointless. The whole point of SemVer is to provide a means for producers to communicate risk to consumers. In other words, semantic versioning is intended for labeling the build product, not the bits that go into producing the build.
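To make the distinction concrete, here is a minimal sketch that labels the build product with a SemVer string while recording the input (the commit hash) as SemVer build metadata; the metadata part is ignored for version precedence but preserves traceability back to the repo state. The version value is a placeholder:

    #!/usr/bin/env python3
    """Stamp the product with SemVer plus the input's commit hash,
    e.g. "1.4.2+g1a2b3c4": the label communicates risk, the hash
    records which repo state produced the bits."""
    import subprocess

    def product_label(semver: str) -> str:
        sha = subprocess.run(
            ["git", "rev-parse", "--short", "HEAD"],
            capture_output=True, text=True, check=True,
        ).stdout.strip()
        return f"{semver}+g{sha}"

    print(product_label("1.4.2"))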

If you attempt to apply SemVer semantics to the repo, you are labeling the product's inputs, not the product itself. You should not apply a SemVer string until after all unit/regression/acceptance tests have passed. How else can you be certain that the code or build-script changes haven't broken anything?
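In pipeline terms, that means the label is the last step, applied only after validation; a sketch, with the build/test helpers as stand-ins for whatever your pipeline actually runs:

    """Order of operations: build, validate, and only then label."""
    import subprocess

    def build() -> str:
        return "dist/app.tar.gz"      # placeholder: produce the candidate package

    def run_tests(artifact: str) -> bool:
        return True                   # placeholder: unit/regression/acceptance suites

    def release(version: str) -> None:
        artifact = build()
        if not run_tests(artifact):
            raise SystemExit("tests failed; no SemVer label applied")
        # Only a validated product earns a version label.
        subprocess.run(["git", "tag", f"v{version}"], check=True)

    release("1.4.2")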


Pre-build labeling cannot work. Build processes capable of reproducing exactly the same output twice in a row are extremely rare, if any exist at all, and it is a violation of best practice to have multiple APIs/packages in the wild with the same SemVer string attached to them. If you label the repo content and then forward that label to the build output, every run of the build produces a package with different content, and there will always be some risk that more than one of those outputs is released into the wild. Many security-conscious consumers pay close attention to the content hash of the packages they consume. Detecting that a particular producer has released multiple package hashes without bumping the version number will raise red flags and lead to mistrust of that producer's internal processes.
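A sketch of the consumer-side check alluded to above: remember the content hash first seen for each version, and flag any later package that reuses the version with different bits. The known_hashes.json store is a hypothetical local file, not standard tooling:

    #!/usr/bin/env python3
    """Detect a re-release: same SemVer string, different content hash."""
    import hashlib
    import json
    import pathlib

    REGISTRY = pathlib.Path("known_hashes.json")  # hypothetical local store

    def check(version: str, package_path: str) -> None:
        digest = hashlib.sha256(pathlib.Path(package_path).read_bytes()).hexdigest()
        known = json.loads(REGISTRY.read_text()) if REGISTRY.exists() else {}
        if known.get(version, digest) != digest:
            raise SystemExit(f"red flag: {version} already published with a different hash")
        known[version] = digest
        REGISTRY.write_text(json.dumps(known, indent=2))

    # check("1.4.2", "app-1.4.2.tar.gz")  # example invocation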


This is a very deep topic that can't be fully covered here. Other issues to consider are OS, compiler, and tool-chain updates. Will you also be committing the entire build tool chain to the same repo? That is an untenable approach, full of hazards I can't fully enumerate without taking a few months off work to document them.

Best practice:

  • Use semantic commit messages that clearly state the developer's intent (one way to act on them is sketched after this list).
  • Validate build outputs prior to packaging/labeling.
  • Always keep humans in the loop for non-prerelease publications.
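On the first point, semantic commit messages also let tooling derive the bump; a sketch assuming Conventional-Commits-style prefixes (one convention among several, not something this answer prescribes):

    """Derive the SemVer bump from commit messages since the last release tag."""
    import subprocess

    def bump_kind(since_tag: str) -> str:
        log = subprocess.run(
            ["git", "log", f"{since_tag}..HEAD", "--format=%s%n%b"],
            capture_output=True, text=True, check=True,
        ).stdout
        if "BREAKING CHANGE" in log or "!:" in log:
            return "major"
        if any(line.startswith("feat") for line in log.splitlines()):
            return "minor"
        if any(line.startswith("fix") for line in log.splitlines()):
            return "patch"
        return "none"   # e.g. CI-only changes: nothing product-visible to version

    print(bump_kind("v1.4.2"))

Note the "none" case: commits that only touch CI scripts need not bump the product version at all, unless validation shows the product itself changed.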

Just for clarity, let me add that maintaining build scripts and tool manifests in the repo is considered a best practice: it ties the versions of your scripts and tools to the versions of the code you are building. Git does this job quite well, by creating a commit hash that encompasses the state of the entire repo (minus the tags, if I recall correctly). But there will eventually be issues with older versions of tools being withdrawn from file shares/feeds, particularly when they are found to create security vulnerabilities.
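A sketch of how a committed tool manifest might be enforced on the build machine; the tools.json name and format are invented here for illustration:

    """Verify installed tools match the manifest committed with the code.
    tools.json (hypothetical) maps tool -> [version flag, expected substring],
    e.g. {"gcc": ["--version", "12."]}."""
    import json
    import subprocess

    def verify_tools(manifest_path: str = "tools.json") -> None:
        manifest = json.load(open(manifest_path))
        for tool, (flag, expected) in manifest.items():
            out = subprocess.run(
                [tool, flag], capture_output=True, text=True, check=True,
            ).stdout
            if expected not in out:
                raise SystemExit(f"{tool}: wanted {expected!r} in {out.splitlines()[0]!r}")
        print("tool chain matches manifest")

    # verify_tools()  # example invocation, run as an early CI step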

It will sometimes be the case that older versions of your products cannot be reproduced using the earlier build process. Checking in the binaries is often promoted as a fix for this issue, but I would argue that it's an anti-pattern: binaries you will likely never want or need in the future should not be stored in your repo. It just clogs everything up.

Consider using an alternate archival system. Maintaining a separate archive of older tools isn't a bad idea, but you will often find that you simply can't run them on current hardware and OSes without significant reconfiguration of your build machine(s) and re-introduction of well-known security risks. Prune such an archive regularly, based on the latest known risks, weighing against that the cost of some additional work if/when the day ever comes that you need to build from a really old commit hash.

It is better to maintain an up-to-date build system that can build your entire code base back to some reasonable point in its history. That point is usually the oldest bits you are willing to actively support with bug fixes.

