License Metrics

We Are Transitioning to a New Licensing Model

Starting from version 2023.5, Alchemist will use a new licensing model that fully leverages the power of Alchemist's deep understanding of code and data. This new model replaces the previous model, which was based on lines of code (for code sources) and job complexity (for non-code sources). We are working to make the transition as smooth as possible.

The Problem with Counting Lines of Code

We believe that the cost of Alchemist must reflect its value to the end user. Most tools on the market rely on metrics such as lines of code or "complexity". These metrics are easy for a tool to implement, but they represent neither the true size of the project nor the added value of automation.

Even worse, these popular metrics sometimes don't make sense at all. Consider the following two code snippets:

-- short comment
SELECT A, B, C FROM T1

/* very verbose comment with helpful description
of the query that spans multiple lines */
SELECT
    A,
    B,
    C
FROM T1

Although the two are identical in result and conversion complexity, a line-based price would charge 3.5 times as much for the second one (7 lines versus 2)!
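The 3.5-fold difference can be reproduced with a few lines of Python. This is only an illustrative sketch of a naive line counter, not how any particular tool computes its metric; the names `COMPACT_SQL`, `VERBOSE_SQL` and `count_lines` are made up for this example.

```python
# The two snippets from above, verbatim.
COMPACT_SQL = """-- short comment
SELECT A, B, C FROM T1"""

VERBOSE_SQL = """/* very verbose comment with helpful description
of the query that spans multiple lines */
SELECT
    A,
    B,
    C
FROM T1"""


def count_lines(source: str) -> int:
    """Count non-empty physical lines, as a naive lines-of-code metric would."""
    return sum(1 for line in source.splitlines() if line.strip())


compact_loc = count_lines(COMPACT_SQL)   # 2 lines
verbose_loc = count_lines(VERBOSE_SQL)   # 7 lines
ratio = verbose_loc / compact_loc        # 3.5

print(f"{compact_loc} vs {verbose_loc} lines: {ratio}x the price for the same query")
```

The counter sees only layout, so reformatting the same query changes the number it reports.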

The same applies to complexity metrics. For example, manually rewriting a job with one very complicated step may take less time than rewriting a job with ten simple steps, but a typical complexity metric would price them the other way around!

In practice, users of tools that rely on these metrics are often forced to pre-process the source (remove whitespace, comments, etc.), which not only complicates the migration process and testing but also reduces the quality of the output. Things like comments and formatting are important and should be preserved.

Alchemist Licensing Model

Thanks to the way Alchemist works, we can calculate and rely on two "super" metrics:

Source Size - a special metric based on the parsed abstract syntax tree (AST)
Target Dialect Support Level - a metric that reflects how much of the source is supported in full-auto mode versus semi-auto mode using pattern matching and templates

Thus you pay proportionally to the size of the source and the amount of manual work required to convert it.

Source Size

The size is essentially the minimum number of significant "tokens" necessary to represent the given source. A token can be as small as one symbol (e.g. the + operator) or as large as hundreds of symbols (e.g. a single comment is one token).

Going back to our example above:

/* very verbose comment with helpful description
of the query that spans multiple lines */
SELECT
    A,
    B,
    C
FROM T1

the size will be 7 tokens (1 for the comment, 1 for SELECT, 3 for A, B, C, 1 for FROM, 1 for T1).

This means:

The size is entirely independent of the formatting
The size is the same whether your users use short or long names for tables, columns, variables etc.
Delimiters and whitespace are not counted
Optional tokens are not counted either (e.g. in SAS, run; is optional after a step, so we don't count it)
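These properties can be illustrated with a toy token counter. It is a rough sketch under stated assumptions, not Alchemist's implementation: the real metric is computed from the parsed AST, whereas this example uses a hypothetical regular-expression tokenizer (`TOKEN_RE`, `source_size`) that only understands simple SQL and does not model the optional-token rule.

```python
import re

# Hypothetical toy tokenizer: each alternative is tried in order at every position.
TOKEN_RE = re.compile(
    r"""
    (?P<comment>--[^\n]*|/\*.*?\*/)    # a whole comment is a single token
  | (?P<word>[A-Za-z_][A-Za-z_0-9]*)   # keyword or identifier, of any length
  | (?P<op>[-+*/=<>]+)                 # operators such as +
  | (?P<delim>[,;()])                  # delimiters: recognised but not counted
  | (?P<space>\s+)                     # whitespace: recognised but not counted
    """,
    re.VERBOSE | re.DOTALL,
)

COUNTED = {"comment", "word", "op"}


def source_size(source: str) -> int:
    """Count significant tokens, ignoring delimiters and whitespace."""
    return sum(1 for m in TOKEN_RE.finditer(source) if m.lastgroup in COUNTED)


COMPACT_SQL = "-- short comment\nSELECT A, B, C FROM T1"
VERBOSE_SQL = (
    "/* very verbose comment with helpful description\n"
    "of the query that spans multiple lines */\n"
    "SELECT\n    A,\n    B,\n    C\nFROM T1"
)

print(source_size(COMPACT_SQL))  # 7
print(source_size(VERBOSE_SQL))  # 7

# Long and short names yield the same size.
print(source_size("SELECT customer_identifier FROM t"))  # 4
print(source_size("SELECT c FROM t"))                    # 4
```

Both layouts of the example query come out at 7, and renaming a column does not change the count.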

For non-code sources, we convert the source into an equivalent "virtual" code representation and then calculate the size.

Target Dialect Support Level

Metric is in Beta

Accurately calculating the support level is a complex task, and we are constantly improving its accuracy in various scenarios. During the beta period, we will deliberately reduce the support level by 10-20% when calculating the license cost, to make sure it is never overestimated.

The metric is calculated by running a full conversion in dry-run mode. In this mode we ignore any user-defined templates (so we only count out-of-the-box support) and don't generate any output. We then calculate the percentage of the source (in AST nodes) that was marked as supported by the engine.
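The final step, the share of AST nodes marked as supported, can be sketched as follows. Everything here is hypothetical and for illustration only: the `Node` class, its `supported` flag (standing in for the engine's dry-run verdict), and the `support_level` function are not part of Alchemist, and the beta-period reduction is deliberately left out.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical AST node; `supported` stands in for the dry-run result."""
    kind: str
    supported: bool
    children: list["Node"] = field(default_factory=list)


def walk(node: Node):
    """Yield the node and all of its descendants, depth first."""
    yield node
    for child in node.children:
        yield from walk(child)


def support_level(root: Node) -> float:
    """Percentage of AST nodes marked as supported."""
    nodes = list(walk(root))
    supported = sum(1 for n in nodes if n.supported)
    return 100.0 * supported / len(nodes)


# A made-up tree: one statement with three children, one of them unsupported.
tree = Node("select", True, [
    Node("columns", True),
    Node("from", True),
    Node("vendor_hint", False),
])

print(support_level(tree))  # 75.0
```

Because the percentage is taken over nodes rather than lines, it is independent of formatting in the same way the source size is.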

The additional benefit for you as a user is that we lock this number in at the moment of license purchase. So as we improve the support level with every release, you will get the benefit of it without any additional cost.