#11: The Ultimate Artifact

From Junior to CTO Weekly Thoughts

Jul 23, 2023

The Ultimate Commit, which I described in the previous issue, defines how to make meaningful commits that drastically improve your automation, code review, and program comprehension. A commit by its nature is just a static and identifiable representation of sources: code, API schemas, and configuration files. We usually do not run sources directly: to become runnable, they need at least resolved and loaded dependencies. For the sake of reusability, however, we prefer to store only pointers to dependencies among the sources (Gradle build files, requirements.txt, package.json, etc.). So we package artifacts that either contain their dependencies or expect them to be provided. An artifact can be used without any awareness of development specifics, yet it represents the logic described in the sources. In this weekly issue, we discuss the perfect one: The Ultimate Artifact.

As with The Ultimate Commit, all my practice is based on Git as the most popular VCS. If you use a different VCS but understand how to map Git practices to it, this issue might be valuable for you as well.

Classification of The Ultimate Artifacts

Unfortunately, we cannot define a single artifact type by saying, “Docker image is the single thing you need in any scenario,” so let’s classify artifacts first. That might be obvious, but we have to align here.

Library

The main motivation for this kind of artifact is to provide out-of-the-box logic for other artifacts.

The key properties:

not executable - we cannot execute the provided logic without code that calls it
reusable in other artifacts - there should be a way to include the provided logic in another artifact

Examples:

.jar file
npm package
DLL file

Application

Application as a term might have different meanings in different contexts. In the context of this article, we are talking about a single executable that might run as a part of a distributed system or be self-sufficient.

The topic of application typology (in the wider sense) is out of the scope of this article. I will cover it in one of the next issues. Subscribe so you don't miss it.

The key properties:

executable - can be directly executed
might be reusable as a part of a bigger distributed system
has a single entry point or a convention for how to run it

Examples:

a docker image
.jar file
.war file
grep

Templates

Templates look similar to libraries, but the key difference is that users fully control them. Everyone can potentially override any piece of provided logic. Because of that property, they are distributed in the form of sources.

The key properties:

not executable - you can create an application from a template and execute it, but it is impossible to execute a template directly
overridable - any piece of a template can be modified
distributed as sources - source form is usually the only way to communicate with a template user in the same language
reusable in other artifacts - the same template helps to create different artifacts
depending on the implementation, it removes boilerplate or at least simplifies its creation
how the template is implemented determines how complex refreshing will be

Examples:

website templates
CI/CD templates

We classified artifacts, so let’s keep this in mind and proceed with the properties of the Ultimate Artifact. The first one is the Single Kind.

Properties of the Ultimate Artifact

The Single Kind, The Single Use Case

The Ultimate Artifact never mixes more than one kind (Library/Application/Template). Even if it looks useful to have one executable that runs a few different things, do not mix them. Leaving aside that this approach breaks the single responsibility principle, it confuses a user who has one particular use case to cover. A user who receives an extra entry point without being aware of it might run into new problems, and the extra entry point implicitly increases the attack surface.

Identifiable

The Ultimate Artifact is based on The Ultimate Commit. Any Ultimate Commit might produce zero or more artifacts for different reasons:

no artifact is produced because the logic did not change, for example, only documentation was updated
several kinds of artifacts are produced from the same code base (API schema + server app; a few apps in a monorepo)

So we can say that the artifact’s name, in conjunction with the commit hash, should uniquely identify the artifact (this is its primary identification) among the others produced from the same source repository. The commit hash alone is not enough, both because of potential collisions globally and because it does not distinguish the artifacts built from one commit in a monorepo. To provide globally unique identification, the Ultimate Artifact has a group name (aka namespace) at the beginning to distinguish a company or a project within the company.

It is also important to mention that neither the branch name nor tags are part of the primary identification: several branches and tags might point to the same commit, so it doesn’t make sense to choose “the best pointer”.
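The identification scheme described above can be sketched as a small value object. This is only an illustration: the class name ArtifactId, the "group/name@commit" rendering, and the sample values are my assumptions, not a format any registry prescribes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ArtifactId:
    """Primary identification: group (namespace), artifact name, commit hash."""

    group: str   # company or project namespace (hypothetical value below)
    name: str    # distinguishes artifacts built from the same commit
    commit: str  # Git commit hash of the sources

    def __str__(self) -> str:
        # The separator characters are an arbitrary choice for this sketch.
        return f"{self.group}/{self.name}@{self.commit}"


# Two artifacts built from one monorepo commit stay distinct thanks to the name.
commit = "3f2a9c1e5b7d4a6f8c0e1d2b3a4f5e6d7c8b9a0f"
api_schema = ArtifactId("acme", "orders-api-schema", commit)
server_app = ArtifactId("acme", "orders-server", commit)

print(api_schema)
print(api_schema == server_app)
```

Note that no branch or tag appears in the identifier: they are pointers to an artifact, not part of its identity.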

Searchable

Primary identification is not the only way to search for an artifact in the artifact registry. The Ultimate Artifact uses the branch name to point to the latest artifact built on the branch. For example, if a branch name equals a feature ticket code, that helps to find the most recent changes in the development of a particular feature. This also works for several components under development simultaneously. For example, the statement “Components A, B and C are impacted in feature XX-1234” means that to see the latest changes before release you just deploy components A, B, and C with tag XX-1234. Some artifact types support multiple tags (Docker); for others, only limited flexibility is available (Maven).
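A minimal sketch of the branch-tag lookup from the XX-1234 example, assuming an in-memory index instead of a real artifact registry. The function names publish and resolve and the short commit hashes are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

# (component, tag) -> commit hash of the artifact the tag currently points to
TagIndex = Dict[Tuple[str, str], str]


def publish(index: TagIndex, component: str, branch: str, commit: str) -> None:
    """Point the branch tag at the newest artifact built on that branch."""
    index[(component, branch)] = commit


def resolve(index: TagIndex, components: List[str], tag: str) -> Dict[str, Optional[str]]:
    """Find which artifact of each component the given tag points to."""
    return {component: index.get((component, tag)) for component in components}


index: TagIndex = {}
publish(index, "A", "XX-1234", "a1b2c3d")
publish(index, "B", "XX-1234", "d4e5f6a")
publish(index, "A", "XX-1234", "0f9e8d7")  # a newer commit on the branch moves the tag

to_deploy = resolve(index, ["A", "B", "C"], "XX-1234")
print(to_deploy)
```

Component C has no artifact tagged XX-1234 yet, so the lookup returns None for it rather than guessing a fallback.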

One option to improve searchability is to tag the commits that produced a particular artifact with the environment type (production / staging). This speeds up the hotfix procedure because it shows exactly which state of the code has to be changed. This point will be covered in detail when we discuss The Ultimate CD Process.

Searchability goes hand in hand with versioning because an artifact version is one of the options to search.

Versionizable

Identification is not versioning. Versioning helps to identify a particular artifact, but while all artifacts are identifiable, not all are versioned. For example, artifacts created during development require primary identification and searchability but not a real version.

Semver is the primary option for versioning The Ultimate Artifact. The importance of Semver for libraries is difficult to overestimate: probably every library user would like to know whether an update will break something or only introduce new features and fixes.
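To illustrate what Semver tells a library user, here is a rough sketch that classifies an update by comparing version numbers. It assumes plain MAJOR.MINOR.PATCH strings, ignores pre-release and build metadata, and does not handle the special 0.y.z rule; the function names are mine.

```python
from typing import Tuple


def parse(version: str) -> Tuple[int, int, int]:
    """Parse a plain MAJOR.MINOR.PATCH string (no pre-release or build metadata)."""
    major, minor, patch = (int(part) for part in version.split("."))
    return major, minor, patch


def update_kind(current: str, candidate: str) -> str:
    """Classify what moving from `current` to `candidate` promises under Semver."""
    cur, new = parse(current), parse(candidate)
    if new <= cur:
        return "not an update"
    if new[0] != cur[0]:
        return "breaking"  # major bump: incompatible changes are allowed
    if new[1] != cur[1]:
        return "feature"   # minor bump: backward-compatible additions
    return "fix"           # patch bump: backward-compatible fixes


print(update_kind("1.4.2", "2.0.0"))
print(update_kind("1.4.2", "1.5.0"))
print(update_kind("1.4.2", "1.4.3"))
```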

Some types of artifacts (like Maven) use versioning as the primary way to identify artifacts because tagging is not available. The limitation of this approach is that you cannot get an artifact for an arbitrary commit on a feature branch; versioning merged changes is the only option, so you cannot choose not to version a commit.

Auto-versioning is available for The Ultimate Artifact thanks to The Ultimate Commits. I will cover that topic in the next issue, where we will talk about The Ultimate CI Pipeline. Subscribe so you don't miss it.

Idempotent

The same sources should not produce a new artifact. If we have already built a commit on the default branch and then add a new tag to it, we should not build a new artifact for that tag. Rather, we should add a new pointer to the already-known artifact. This is one of the main motivations for hash-based identification in general: not generating new artifacts for the same commit.
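The "add a pointer instead of rebuilding" rule could look roughly like this. The Registry class, its in-memory storage, and the fake build function are assumptions made for the sketch; a real pipeline would query its artifact registry before building.

```python
from typing import Callable, Dict, Set, Tuple

Key = Tuple[str, str]  # (artifact name, commit hash)


class Registry:
    """Hypothetical in-memory registry that builds each (name, commit) only once."""

    def __init__(self, build: Callable[[str, str], bytes]) -> None:
        self._build = build
        self._artifacts: Dict[Key, bytes] = {}
        self._pointers: Dict[Key, Set[str]] = {}
        self.builds = 0  # counts real builds, to show idempotency

    def publish(self, name: str, commit: str, pointer: str) -> bytes:
        key = (name, commit)
        if key not in self._artifacts:
            # First time we see these sources: build for real.
            self._artifacts[key] = self._build(name, commit)
            self.builds += 1
        # Known or not, the branch or tag is just one more pointer.
        self._pointers.setdefault(key, set()).add(pointer)
        return self._artifacts[key]

    def pointers(self, name: str, commit: str) -> Set[str]:
        return set(self._pointers.get((name, commit), set()))


registry = Registry(build=lambda name, commit: f"{name}@{commit}".encode())
first = registry.publish("orders-server", "3f2a9c1", "main")
second = registry.publish("orders-server", "3f2a9c1", "v1.2.0")  # new tag, same commit

print(registry.builds)
print(sorted(registry.pointers("orders-server", "3f2a9c1")))
```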

This also means that if you use fast-forward (FF) merge and nobody introduced changes to the default branch before you tested an artifact, you will get exactly the same one in production.

Configurable

The Ultimate Artifact should be configurable, according to how it is used, via command-line arguments, environment variables, or configuration files. The Ultimate Artifact should be ready for configuration to change over time, so if its startup time is too long for some reason, it should be able to refresh values at runtime (where acceptable). Command-line arguments might be used to emphasize that a value can be updated only with a restart. At the same time, this doesn’t mean that environment variables cannot hold immutable values.
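As a rough sketch of the three configuration sources working together, the function below merges them in one fixed order. The precedence (defaults, then file, then environment, then command line), the JSON file format, and the names APP_PORT, APP_LOG_LEVEL, --port, and --log-level are all my assumptions for illustration.

```python
import argparse
import json
from typing import Dict, Mapping, Optional, Sequence, Union

Config = Dict[str, Union[int, str]]


def load_config(argv: Sequence[str], env: Mapping[str, str],
                file_text: Optional[str] = None) -> Config:
    """Merge configuration: defaults < file < environment < command line."""
    config: Config = {"port": 8080, "log_level": "info"}

    if file_text:  # contents of a JSON configuration file, if one was provided
        config.update(json.loads(file_text))

    if "APP_PORT" in env:
        config["port"] = int(env["APP_PORT"])
    if "APP_LOG_LEVEL" in env:
        config["log_level"] = env["APP_LOG_LEVEL"]

    parser = argparse.ArgumentParser()
    parser.add_argument("--port", type=int)
    parser.add_argument("--log-level")
    args = parser.parse_args(list(argv))
    if args.port is not None:
        config["port"] = args.port
    if args.log_level is not None:
        config["log_level"] = args.log_level
    return config


resolved = load_config(
    ["--port", "9000"],
    {"APP_PORT": "8000", "APP_LOG_LEVEL": "debug"},
    '{"log_level": "warn"}',
)
print(resolved)
```

In a real entry point you would pass sys.argv[1:] and os.environ; taking them as parameters keeps the function easy to test and makes it obvious that the artifact itself holds no environment-specific values.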

Environment type-independent

The Ultimate Artifact does not rely on the environment type (production / staging / etc.). If variables should have different values depending on the environment type, that part of the configuration should be moved outside of The Ultimate Artifact.

Deployment-independent

The Ultimate Artifact doesn’t contain information on how to deploy the artifact. Otherwise, deployment changes would affect the artifact’s version, which makes no sense and harms the separation of concerns.

Reusable (even partially)

The Ultimate Artifact defines the requirements to launch/use it. If the requirements are satisfied, the artifact can be reused. The fewer requirements that need to be satisfied, the more reusable the artifact is.

It is important to mention that some artifacts support partial reuse. For example, Docker layers let you arrange your image in order of increasing change probability: framework → dependencies → custom code. If only the custom code was updated, the first two layers are reused rather than rebuilt.
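The effect of layer ordering can be modeled with a toy example. This is a simplified model of chained layer identifiers, not how Docker actually computes digests; the function names and layer contents are hypothetical.

```python
import hashlib
from typing import List, Tuple

Layer = Tuple[str, str]  # (layer name, content fingerprint)


def layer_ids(layers: List[Layer]) -> List[str]:
    """Each layer's id depends on its own content and on every layer below it."""
    ids: List[str] = []
    parent = ""
    for name, content in layers:
        parent = hashlib.sha256(f"{parent}|{name}|{content}".encode()).hexdigest()
        ids.append(parent)
    return ids


def reused_layers(old: List[Layer], new: List[Layer]) -> int:
    """Count the leading layers that are identical and need no rebuild."""
    count = 0
    for old_id, new_id in zip(layer_ids(old), layer_ids(new)):
        if old_id != new_id:
            break
        count += 1
    return count


v1 = [("framework", "fw-1"), ("dependencies", "lock-a"), ("code", "build-1")]
code_change = [("framework", "fw-1"), ("dependencies", "lock-a"), ("code", "build-2")]
deps_change = [("framework", "fw-1"), ("dependencies", "lock-b"), ("code", "build-1")]

print(reused_layers(v1, code_change))
print(reused_layers(v1, deps_change))
```

Because a change invalidates every layer above it, the parts that change most often belong at the top: a code-only change keeps two layers, while a dependency change keeps only the framework layer even though the code is untouched.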

Integratable

The Ultimate Artifact must provide a UI (incl. CLI), an API, or both. This makes it possible either to integrate it seamlessly into a business process as a tool or to offer a clean, standardized API that follows one consistent approach and is user-centric. The topic of The Ultimate API is out of scope here but can probably be covered later.

Static

Versioned artifacts cannot be changed, but non-version tags can be moved. For example, if we tagged an artifact with a branch name as the latest one, a new commit to the same branch moves the tag to the new artifact, which replaces the previous one under that tag.
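One possible way to enforce this rule is to treat any tag that looks like a version as write-once. The TagStore class is hypothetical, and recognizing versions by a plain MAJOR.MINOR.PATCH pattern is an assumption of this sketch.

```python
import re
from typing import Dict

VERSION_TAG = re.compile(r"^\d+\.\d+\.\d+$")  # assumed shape of a version tag


class TagStore:
    """Hypothetical tag store: version tags are immutable, other tags can move."""

    def __init__(self) -> None:
        self._tags: Dict[str, str] = {}

    def set_tag(self, tag: str, commit: str) -> None:
        current = self._tags.get(tag)
        if VERSION_TAG.match(tag) and current is not None and current != commit:
            raise ValueError(f"version {tag} already points to {current}")
        self._tags[tag] = commit

    def get(self, tag: str) -> str:
        return self._tags[tag]


store = TagStore()
store.set_tag("XX-1234", "a1b2c3d")
store.set_tag("XX-1234", "0f9e8d7")  # branch tag moves to the newer commit
store.set_tag("1.2.0", "0f9e8d7")

try:
    store.set_tag("1.2.0", "ffffff0")  # attempt to re-point a released version
except ValueError as error:
    print(error)
```

Re-applying the same version to the same commit is allowed, which keeps repeated pipeline runs harmless.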

Secure

Before The Ultimate Artifact can be used, we must perform at least a security check which confirms the artifact doesn’t have obvious leaks. Security scanners like Snyk are helpful for that purpose.

It is probably obvious, but to guarantee that the Ultimate Artifact in use is the same one that was released, we have to sign it (an example for gradle).
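The idea behind signing can be shown in a few lines. Note the assumption: this sketch uses an HMAC with a shared secret from the Python standard library as a stand-in, whereas real artifact signing normally relies on asymmetric keys (for example, PGP signatures), so that consumers can verify without being able to sign.

```python
import hashlib
import hmac


def sign(artifact: bytes, key: bytes) -> str:
    """Produce a signature over the artifact bytes (HMAC-SHA256 stand-in)."""
    return hmac.new(key, artifact, hashlib.sha256).hexdigest()


def verify(artifact: bytes, key: bytes, signature: str) -> bool:
    """Check that the artifact is byte-for-byte the one that was signed."""
    return hmac.compare_digest(sign(artifact, key), signature)


key = b"demo-key-do-not-use"  # hypothetical secret, for illustration only
released = b"contents of the released artifact"
signature = sign(released, key)

print(verify(released, key, signature))
print(verify(released + b" tampered", key, signature))
```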

Stack-agnostic

The artifact is an out-of-the-box solution that you only need to configure and launch. The less the outside world has to know about the stack inside, the smaller the attack surface and the easier it is for the artifact to evolve. This is not a strict rule but rather a recommendation.

Here are my considerations regarding the Ultimate Artifact. In the next issue, we will discuss The Ultimate CI Pipeline, an automation process that will help you convert The Ultimate Commit to The Ultimate Artifact.

DevTower