License Metrics
We Are Transitioning to a New Licensing Model
Starting with version 2024.2, Alchemist uses a new licensing model that fully leverages the power of Alchemist's deep understanding of code and data. The new model is based solely on the token count metric described below.
The Problem with Counting Lines of Code
We believe that the cost of Alchemist must reflect the value it delivers to the end user. Most tools on the market rely on metrics such as lines of code or "complexity". Those metrics are easy for a tool to implement, but they represent neither the true size of the project nor the added value of automation.
Even worse, these popular metrics sometimes don't make sense at all. Consider the following two code snippets:
-- short comment
SELECT A, B, C FROM T1
/* very verbose comment with helpful description
of the query that spans multiple lines */
SELECT
A,
B,
C
FROM T1
Despite both snippets being exactly the same in terms of the result and conversion complexity, under a lines-of-code metric the second would cost you 3.5x more than the first (7 lines versus 2)!
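To make the arithmetic concrete, here is a toy line-count calculation over the two snippets above (our own illustration, not the output of any particular tool):

# Toy line-count comparison of the two snippets above.
short = "-- short comment\nSELECT A, B, C FROM T1"
verbose = """/* very verbose comment with helpful description
of the query that spans multiple lines */
SELECT
    A,
    B,
    C
FROM T1"""

lines = {name: len(src.splitlines())
         for name, src in [("short", short), ("verbose", verbose)]}
print(lines)                              # {'short': 2, 'verbose': 7}
print(lines["verbose"] / lines["short"])  # 3.5 -- same query, 3.5x the price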
The same applies to complexity metrics. For example, manually rewriting a job with one very complicated step may take less time than rewriting a job with ten simple steps, but a typical complexity metric would price it the other way around!
In practice, users of tools that rely on these metrics are often forced to pre-process the source (remove whitespace, comments, etc.), which not only complicates the migration process and testing but also reduces the quality of the output. Things like comments and formatting are important and should be preserved.
Alchemist Licensing Model
Thanks to the way Alchemist works, we can calculate and rely on our "super" metric, the Source Token Count, which is derived from the parsed abstract syntax tree (AST) and is independent of the formatting, comments, and other non-essential parts of the source.
In addition, we do not charge for identical source files: if two pieces of code have identical ASTs, they are counted as one.
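To make the idea concrete, here is a minimal sketch of such a counter in Python. This is not the implementation inside Alchemist, which works on the parsed AST rather than regular expressions; the tokenizer rules, function names, and fingerprinting scheme below are simplified assumptions for illustration only.

import hashlib
import re

# Sketch of a formatting-independent token counter for SQL-like source.
# Illustrative assumption: comments are stripped here for simplicity;
# either way, their length never affects the count.
TOKEN_RE = re.compile(
    r"(?P<comment>--[^\n]*|/\*.*?\*/)"  # a whole comment, however long
    r"|(?P<skip>[\s,;]+)"               # whitespace and delimiters
    r"|(?P<word>\w+)"                   # keywords, identifiers, literals
    r"|(?P<op>[^\s\w])",                # single-symbol operators such as +
    re.DOTALL,
)

def significant_tokens(src: str) -> list[str]:
    """Token stream with formatting, comments, and delimiters stripped."""
    return [m.group() for m in TOKEN_RE.finditer(src)
            if m.lastgroup in ("word", "op")]

def fingerprint(src: str) -> str:
    """Two sources with the same token stream get the same fingerprint."""
    joined = "\x00".join(significant_tokens(src))
    return hashlib.sha256(joined.encode()).hexdigest()

def billable_tokens(sources: list[str]) -> int:
    """Total token count, with identical sources counted only once."""
    seen: set[str] = set()
    total = 0
    for src in sources:
        fp = fingerprint(src)
        if fp not in seen:
            seen.add(fp)
            total += len(significant_tokens(src))
    return total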
We Are Transitioning to a New Licensing Model
Between versions 2023.5 and 2024.2, we also experimented with a second metric based on the "convertability" of the source. However, it proved to be a poor fit for the majority of our users, and we are transitioning to the new model based solely on the source token count.
Source Tokens
The true "size" is essentially a minimum number of significant "tokens" necesssary to represent the given source. A token can be as small as one symbol (e.g. +
operator), or as big as hundreds of symbols (e.g. single comment is one token).
Going back to our example above:
/* very verbose comment with helpful description
of the query that spans multiple lines */
SELECT
A,
B,
C
FROM T1
No matter how it is formatted, this query is counted as just 6 tokens (1 for SELECT, 3 for A, B, C, 1 for FROM, and 1 for T1).
This means:
- The size is entirely independent of the formatting
- The size is the same whether short or long names are used for tables, columns, variables, etc.
- Delimiters and whitespace are not counted
- Optional tokens are not counted either (e.g. in SAS, run; is optional after a step, so we don't count it)
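Continuing the illustrative sketch from above, we can check these properties on the two snippets from the beginning of this page, reusing the significant_tokens and billable_tokens helpers (again, a simplified model, not Alchemist's actual counter):

short = "-- short comment\nSELECT A, B, C FROM T1"
verbose = """/* very verbose comment with helpful description
of the query that spans multiple lines */
SELECT
    A,
    B,
    C
FROM T1"""

# Formatting, comments, and delimiters do not change the stream.
assert significant_tokens(short) == significant_tokens(verbose)
print(significant_tokens(short))          # ['SELECT', 'A', 'B', 'C', 'FROM', 'T1']
print(billable_tokens([short, verbose]))  # 6 -- the duplicate is counted once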
For non-code sources, we convert the source into an equivalent "virtual" code representation and then calculate the equivalent tokens (roughly, how many tokens it would take to express the same thing as code).
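As a purely conceptual illustration of that idea, reusing the significant_tokens helper from the sketch above: the job structure and the rendering rules below are invented for this example and do not reflect Alchemist's internal representation.

# Hypothetical non-code source: a GUI ETL job exported as (step, parameters).
job_steps = [
    ("extract", "A, B, C FROM T1"),
    ("filter",  "A > 0"),
    ("load",    "INTO T2"),
]

def to_virtual_code(steps: list[tuple[str, str]]) -> str:
    """Render each GUI step as one equivalent 'virtual' statement."""
    return "\n".join(f"{name.upper()} {params}" for name, params in steps)

virtual = to_virtual_code(job_steps)
print(virtual)
# EXTRACT A, B, C FROM T1
# FILTER A > 0
# LOAD INTO T2

# Price it with the same counter used for real code.
print(len(significant_tokens(virtual)))   # 13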