Extra Template Features in Spark Dialect

The Spark dialect provides additional template output types.

Output Types

  • extra_imports - Raw Python code to be added to the imports cell of the generated notebook.

    Use this setting to import additional libraries used by your templates. There is no need to import PySpark functions, as they are automatically added by Alchemist.

    Match against SASProgram, SASMetaJob or SASEGProcessFlowContainer nodes.

    Alchemist will validate that the code is valid Python, but will not perform any semantic analysis on it.

    converter:
      user_template_paths:
        - "templates"
    
      template_configs:
        # Custom imports from a jinja file for SAS DI Jobs
        - template: imports.jinja
          match_patterns:
            - (SASMetaJob)
          output_type: extra_imports
    
        # Custom imports from an inline string for SAS EG Flows
        - inline: |
            from migration.helpers import *
            from pyspark.sql.functions import *
          match_patterns:
            - (SASEGProcessFlowContainer)
          output_type: extra_imports
    
        # Custom imports from an inline string for SAS Programs
        - inline: from migration.helpers import *
          match_patterns:
            - (SASProgram)
          output_type: extra_imports
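
    For illustration, the imports.jinja template referenced above might render into plain import statements like the following. This is a hypothetical sketch: the migration.helpers module mirrors the inline examples, and the remaining import is a placeholder for your own libraries.

    # Hypothetical rendered output of imports.jinja.
    # PySpark functions do not need to be imported here,
    # as Alchemist adds them automatically.
    import datetime
    from migration.helpers import *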
    
  • extra_setup_code - Raw Python code to be added to the setup cell of the generated notebook.

    This code supplements the setup code that Alchemist generates automatically, which creates a Spark session and initializes dictionaries for dynamic variables if necessary.

    Match against SASProgram, SASMetaJob or SASEGProcessFlowContainer nodes.

    If you want to override the entire setup code, use override_setup_code instead.

    Use this setting to add any additional setup code required for your templates to work. This code is executed after the imports and the Spark session creation, but before any other code in the notebook.

    Alchemist will validate that the code is valid Python, but will not perform any semantic analysis on it.

    converter:
      template_configs:
        - inline: |
            # Additional setup code
            spark.conf.set("spark.sql.adaptive.enabled", "true")
            spark.conf.set("spark.sql.adaptive.coalescePartitions.enabled", "true")
          match_patterns:
            - (SASProgram)
          output_type: extra_setup_code
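
    Given the config above, the resulting setup cell might look roughly like the sketch below, assuming spark_session_identifier resolves to spark (consistent with the inline code above) and that Alchemist's own session-creation code is emitted first:

    # Sketch of a generated setup cell (not authoritative output).
    # In the real notebook this import would live in the imports cell.
    from pyspark.sql import SparkSession

    # Session creation of the kind Alchemist's default setup performs.
    spark = SparkSession.builder.getOrCreate()

    # Additional setup code appended from the extra_setup_code template.
    spark.conf.set("spark.sql.adaptive.enabled", "true")
    spark.conf.set("spark.sql.adaptive.coalescePartitions.enabled", "true")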
    
  • override_setup_code - Raw Python code to be used as the setup cell of the generated notebook.

    This code completely replaces the automatically generated setup code, including the Spark session creation and the initialization of dynamic variables.

    Match against SASProgram, SASMetaJob or SASEGProcessFlowContainer nodes.

    Remember that the Alchemist converter generates code that assumes a Spark session is already created and available under the identifier specified in spark_session_identifier. If you override the setup code, you must ensure that a Spark session is created and available under that identifier.

    Alchemist also creates a vars dictionary used to store dynamic variables converted from macro expressions like &&prefix&counter; if the converted code uses dynamic variables, your override must initialize this dictionary as well (see the sketch after the config example below).

    converter:
      template_configs:
        - template: custom_setup.py
          match_patterns:
            - (SASProgram)
          output_type: override_setup_code
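
    A minimal sketch of what custom_setup.py might contain, assuming spark_session_identifier is set to spark and the converted code uses dynamic variables (the application name is a placeholder):

    # Hypothetical contents of custom_setup.py.
    from pyspark.sql import SparkSession

    # The session must be bound to the identifier configured in
    # spark_session_identifier, because the generated code refers to it.
    spark = SparkSession.builder.appName("migrated-job").getOrCreate()

    # Recreate the vars dictionary that the default setup code would
    # otherwise provide for dynamic (macro-derived) variables.
    vars = {}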
    
  • sas_di_tr_func - Specially formatted code with a metadata block, used to replace an entire SAS DI job function converted from a single SAS DI transform.

    Here is an example of the config.yaml:

    converter:
      template_configs:
      - match_nodes:
        - attr: sas_meta_id
          include:
          - A5JCJVA0.BW000753
          type: SASMetaTransformBase
        output_type: sas_di_tr_func
        template: 1_A5JCJVA0.BW000753.py
      user_template_paths:
      - templates
    

    Here is an example of the content for the sas_di_tr_func template file:

    _alc_template_metadata_yaml = """
    func_name: step5_Join
    in_ds_sas_meta_ids:
    - A5JCJVA0.BU0004QD
    - A5JCJVA0.BU0004QM
    out_ds_sas_meta_ids:
    - A5JCJVA0.BU0004QL
    pass_vars_arg: false
    extra_args: []
    extra_kwargs:
    - name: key1
      value: '"lit_value1"'
    step: 5
    """
    
    
    def step5_Join(df_input1, df_input2, key1="lit_value1"):
        # Code of the transform
        return df_output
    

    The _alc_template_metadata_yaml is a special variable that contains metadata for the template. It is used by Alchemist to generate the appropriate function call within the notebook code and is not included in the generated notebook. Currently, it contains the following fields:

    • func_name: The name of the function that will be called in the notebook's main job function. This should match the function name in the template code.
    • in_ds_sas_meta_ids: A list of SAS dataset meta ids that will be passed as dataframes to the function. The order of the ids should match the order of the function's arguments.
    • out_ds_sas_meta_ids: A list of SAS dataset meta ids that will be returned as dataframes by the function. The order of the ids should match the order of the function's return values.
    • pass_vars_arg: Specifies whether the vars argument should be passed to the function. It will be placed after the dataframe arguments.
    • extra_args: A list of additional positional argument names to be passed to the function, in the specified order.
    • extra_kwargs: A list of additional keyword arguments to be passed to the function.
    • step: The step number of the transform function. Currently used for informational purposes only.
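
    Putting the example together: given the metadata block above, the call that Alchemist generates inside the notebook's main job function would look roughly like this (an illustrative sketch; the dataframe variable names are placeholders, as the actual naming scheme is determined by Alchemist):

    # Sketch of the generated call for the step5_Join example.
    # Inputs follow in_ds_sas_meta_ids order, the output corresponds to
    # out_ds_sas_meta_ids, vars is omitted (pass_vars_arg: false), and
    # extra_kwargs supplies key1.
    df_bu0004ql = step5_Join(df_bu0004qd, df_bu0004qm, key1="lit_value1")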