Skip to content

Template Configurations for Alchemist

Template configurations in Alchemist allow you to extend or modify out of the box conversion results using custom conversion templates. Templates are test files utilizing the Jinja templating engine to generate the desired output based on the provided configuration settings.

Understanding Templates

To use a template you need two things:

  • A template configuration in Alchemist configuration file.
  • A template text. It may either be stored in a file or provided inline in the configuration.

The template text is processed by the Jinja templating engine. Alchemist provides a number of built-in variables and functions that can be used in the template:

  • The node matched and being replaced by a template is always available in the template as node.
  • If match patterns are used, the match dictionary will have each value captured by a pattern (key matching the capture key).
  • render - a function that will defer back to Alchemist to convert a node. Usually used to convert children of the matched node.
  • get_node_by_id - a function that will return an AST node by its ID
  • now - current datetime in UTC.
  • none_rendered_str - a special object, that should be used to indicate that a node should not be rendered

For a full reference on Jinja syntax, see the Jinja documentation.

A target dialect may define additional template features. For more information, refer dialects docs.

Template Config Options

Option Description
template Name of the Jinja template file to use. If no extension is given, all of the following will be tried: sql, py, yml, yaml, jinja. Only one of template or inline must be set, except for the skip output_type - for which neither is necessary.
inline Inline template string to use. If inline is not empty, template must not be set.
output_type Type of output to generate. For supported output types see output types.
output_relative_path Relative path for the output file. Can be a Path, str, or None. If None, the template name + ID is used. Can also include {var_name} placeholders, which are substituted with a number of format variables, including all node properties and source_relative_path. This option is ignored for output types other than file.
match_nodes List of configs used to match the current node by type and/or attribute values. It uses the same syntax as Node Filter Configs. If any of the configs include this node (or none of them exclude it) - it is considered a match.
match_ids Deprecated! Use match_nodes instead. List of node Content or regular IDs to match (content_id and id attributes).
match_paths List of XPath match patterns. These patterns define which nodes should be processed by the template. For more information on how to write XPath patterns and examples, see the XPath Match Patterns section.
match_patterns List of tree match patterns. For more information on how to write match patterns and examples, see the Match Patterns section.
tags List of template tags. These tags are used to apply template filters.
model_output_type One of the supported JSON model names when output_type is set to MODEL. List of models depend on the dialect, refer to the dialects docs.

Example Configuration

converter:
  template_configs:
    - template: full_program
      output_type: "file"
      match_nodes:
        - include: "e237a0654a8e79ad04cc8050ea0cf235acb2ebaeeaca1a2681a6f2d199659fa6"
      output_relative_path: "{source_relative_path}/{source_file_name}.py"
    - template: intnx_runtime
      output_type: "snippet"
      match_paths:
        - SASSQLProcedure//SASCallExpression
      match_patterns:
        - '(SASCallExpression @func_name="INTNX" @arguments=[() -> interval, () -> start_from, () -> increment, () -> alignment])'
        - '(SASCallExpression @func_name="INTNX" @arguments=[() -> interval, () -> start_from, () -> increment])'
      tags:
        - "runtime"
    - inline: "SPLIT_PART(COL_NAME, '/', 2)"
      match_nodes:
        - include: "id_of_scan_expr"
    - inline: "some valid regex expression"
      match_nodes:
        - include: "id_of_prxchange_expr"

XPath Match Patterns

XPath match patterns are used to define which nodes in the AST should be processed by the template. XPath patterns in Alchemist are similar to XML XPath and consist of a sequence of paths. Each path is in the format @parent_field_name[index]type_name_or_pattern and is separated by a slash.

  • The slash is optional at the start of the XPath. If omitted, it is the same as using //the rest of the path.
  • // is a wildcard (anywhere) that matches any path.
  • @parent_field_name and index are optional.
  • type_name_or_pattern is optional, except for the last path in the XPath. It may be a type name or a a single node pattern (see below).

XPath Examples

Note: the following examples do not use real AST node types, which depend on the dialect. To get the real node tpyes you can use AST Explorer

  • CallExpr/@arguments[0]Id matches the first argument of a call expression at any level in the tree, which is of type Id.
  • (CallExpr @func_name="math.*")/@arguments[*]Id similar to the above, but matches any call expression whose function name matches the pattern math.*.
  • @arguments[*]CallExpr matches any CallExpr as long as it is an argument (stored in a field with the said name) of a parent expression.
  • CallExpr matches any call expression at any level in the tree.

Match Patterns

Match patterns are used for two main purposes:

  1. To match a particular node in the AST based on its attributes and children.
  2. To extract values from within the matched node subtree for use in the conversion template.

Patterns are written using a syntax similar to a cross between S-expressions and regex. To understand the syntax and usage, let's explore some examples of valid patterns.

Pattern Syntax

In match patterns, the following syntax is used:

  • Parentheses () are used to match an AST Node. What's inside these brackets define a node pattern. The minimal possible pattern is (), which will match any node.
  • The left bracket may be followed by one or more node types (class names) separated by |. Each class name will match the type itself and any subclasses. If 2+ classes specified via (Class1 | Class2) this works as OR
  • The rest of the content within the parentheses defines attributes and children. To match an attribute by value and/or capture it's value use the following general syntax @attr_name [= VALUE] [-> capture_key] where = VALUE and capture_key are both optional.
    • VALUE may be one of the following
      • a nested node pattern
      • A plain string or regex that will match against string represnetaion of the attribute. Regex uses Python syntax
      • A None keyword, that will match when the value is, well, you guessed it - None.
      • A variable $capture_key. This will match value against a previously captured value. E.g. (SASCallExpression @arguments=[(*) -> left, $left]) will match any function call that has two identical arguments
      • A sequence of values in [] brackets, including empty sequence []. Within the sequence you can define multiple values, capture_keys and even use regex-like quantifiers, like so: [VALUE -> capture_key, ANOTHER_0_OR_MORE_TIMES* -> another_capture_key, ()* -> tail]. capture_key is optional.
    • -> capture_key is used to capture the value of the attribute. The value is available in the pattern itself via variables (see above) as well as as in template match dictionary match["capture_key"]. If you want to capture multiple values into a list, use +, e.g. (@args=[() -> +args, () -> mid_arg, () -> +args]) will first and last arguments into args list.

Pattern Examples

As an example of a typical usage, let's see how pattern can be used to find and convert all SAS macro calls named GET_VIPS with 2 or more arguments. The following pattern will match such calls:

'(SASMacroCallExpression @arguments=[() -> call_type, () -> name_col, ()* -> tail_args] @func_name=(SASCodeName @identifier="GET_VIPS"))'

And may then be used in a template like so:

udf_vips_{{match["call_type"].value.value | lower}}({{match["name_col"].value.value | lower}},
{%- for arg in match["tail_args"] -%}
    {{- render(arg.value)}}{{ ", " if not loop.last else "" }}
{%- endfor -%}
)

Output Types

Output types define how the output of the template is handled. The following output types are supported by all dialects:

  • skip - the matching node will be skipped from conversion. This has the same effect as an excluding node filter, but allows using xpath and pattern matching.
  • snippet - The resulting string is inserted into the source file at the location of the matched node.
  • line - like snippet, but the resulting string is inserted on a new line.
  • chunk - same as line, except for the way it is represented in the side by side view in UI. Each chunk is a separate block in the view.
  • file - The resulting string becomes one of the reulsting converted outputs and is written to a file according to the path specification. The file path is defined by the output_relative_path field in the template config. The path is relative to the output path set in the configuration.
  • model - the result is treated as a json representation of a model, which may then be used by the specific dialect. This type should only be used for dialects that operate on models, such as the prophecy dialect. For more information, refer dialects docs.

In addition to these, dialects may define their own output types. Consult the dialect documentation for more information.

Usage examples

Altering the table in addition to the out-of-the-box conversion results

Alchemist doesn't handle permissions automatically. Here is an example of how using the template allows for customer-specific customization.

converter:
  template_configs:
    - template: template.jinja
      match_patterns:
        - |
          (SASSQLCreateAsSelectStatement
            @dataset=(SASDatasetExpression @dataset_name=(SASCodeQualifiedDatasetRef
              @source=(SASCodeName @identifier -> out_schema)
              @dataset_name=(SASCodeName @identifier -> out_table)
            ))
          )

      output_type: "line"
      tags:
        - "py_spark"
        - "py_var"

The content of the template.jinja file:

{{ render(node, no_templates=True) }}

spark.sql(f"ALTER TABLE {% raw %}{{% endraw %}{{match["out_schema"]}}{% raw %}}{% endraw %}.{{match["out_table"]}} SET OWNER TO `{OWNER_GROUP}`")

This will add the following statement at the end of rendering the node:

spark.sql(f"ALTER TABLE {out_schema}.out_table SET OWNER TO `{OWNER_GROUP}`")