Extra Template Features in Spark Dialect
The Spark dialect provides additional template output types.
Output Types
- `extra_imports` - Raw Python code to be added to the imports cell of the generated notebook.

  Use this setting to import additional libraries used by your templates. There is no need to import PySpark functions, as they are automatically added by Alchemist.

  Match against `SASProgram`, `SASMetaJob` or `SASEGProcessFlowContainer` nodes.

  Alchemist will validate that the code is valid Python, but will not perform any semantic analysis on it.

  ```yaml
  converter:
    user_template_paths:
      - "templates"
    template_configs:
      # Custom imports from a jinja file for SAS DI Jobs
      - template: imports.jinja
        match_patterns:
          - (SASMetaJob)
        output_type: extra_imports
      # Custom imports from an inline string for SAS EG Flows
      - inline: |
          from migration.helpers import *
          from pyspark.sql.functions import *
        match_patterns:
          - (SASEGProcessFlowContainer)
        output_type: extra_imports
      # Custom imports from an inline string for SAS Programs
      - inline: from migration.helpers import *
        match_patterns:
          - (SASProgram)
        output_type: extra_imports
  ```
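  A template file used for `extra_imports` simply renders to raw Python import statements. As a rough sketch, the `imports.jinja` file referenced above might contain nothing more than the imports themselves (modules other than `migration.helpers` are illustrative):

  ```python
  # imports.jinja -- rendered into the imports cell of the generated notebook
  from pyspark.sql.window import Window
  from migration.helpers import *
  ```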
- `extra_setup_code` - Raw Python code to be added to the setup cell of the generated notebook.

  This code is in addition to the setup code that Alchemist generates automatically, which creates a Spark session and initializes dictionaries for dynamic variables if necessary.

  Match against `SASProgram`, `SASMetaJob` or `SASEGProcessFlowContainer` nodes.

  If you want to override the entire setup code, use `override_setup_code` instead.

  Use this setting to add any additional setup code that is required for your templates to work. This code will be executed before any other code in the notebook, but after the imports and Spark session creation.

  Alchemist will validate that the code is valid Python, but will not perform any semantic analysis on it.
  ```yaml
  converter:
    template_configs:
      - inline: |
          # Additional setup code
          spark.conf.set("spark.sql.adaptive.enabled", "true")
          spark.conf.set("spark.sql.adaptive.coalescePartitions.enabled", "true")
        match_patterns:
          - (SASProgram)
        output_type: extra_setup_code
  ```
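  With the configuration above, the extra code is appended to the setup cell that Alchemist generates. As a rough sketch (the exact generated code depends on the conversion, and the session identifier `spark` is assumed here), the resulting setup cell could look like this:

  ```python
  # Illustrative setup cell: Alchemist-generated code followed by the extra code above
  from pyspark.sql import SparkSession  # normally placed in the imports cell

  spark = SparkSession.builder.getOrCreate()
  vars = {}  # dictionary for dynamic variables, created only when needed

  # Additional setup code
  spark.conf.set("spark.sql.adaptive.enabled", "true")
  spark.conf.set("spark.sql.adaptive.coalescePartitions.enabled", "true")
  ```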
- `override_setup_code` - Raw Python code to be used as the setup cell of the generated notebook.

  This code will completely replace the automatically generated setup code, including the Spark session creation and dynamic variables initialization.

  Match against `SASProgram`, `SASMetaJob` or `SASEGProcessFlowContainer` nodes.

  Remember that the Alchemist converter generates code that assumes a Spark session is already created and available under the identifier specified in `spark_session_identifier`. If you override the setup code, you must ensure that a Spark session is created and available under that identifier. Alchemist also creates a `vars` dictionary used to store dynamic variables converted from macro expressions like `&&prefix&counter`.

  ```yaml
  converter:
    template_configs:
      - template: custom_setup.py
        match_patterns:
          - (SASProgram)
        output_type: override_setup_code
  ```
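  As a rough sketch, a `custom_setup.py` template satisfying these requirements could look like the following, assuming the session identifier is `spark` and the converted code uses dynamic variables (the app name is a placeholder):

  ```python
  # custom_setup.py -- replaces the entire generated setup cell (illustrative)
  from pyspark.sql import SparkSession

  # The converted code expects a session under the identifier configured in
  # spark_session_identifier ("spark" assumed here).
  spark = (
      SparkSession.builder
      .appName("migrated_sas_job")  # placeholder configuration
      .getOrCreate()
  )

  # Dictionary for dynamic variables converted from macro expressions.
  vars = {}
  ```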
- `sas_di_tr_func` - Specially formatted code with a metadata block, used to replace the entire SAS DI job function converted from a single SAS DI transform.

  Here is an example of the `config.yaml`:

  ```yaml
  converter:
    template_configs:
      - match_nodes:
          - attr: sas_meta_id
            include:
              - A5JCJVA0.BW000753
            type: SASMetaTransformBase
        output_type: sas_di_tr_func
        template: 1_A5JCJVA0.BW000753.py
    user_template_paths:
      - templates
  ```

  Here is an example of the content of the `sas_di_tr_func` template file:

  ```python
  _alc_template_metadata_yaml = """
  func_name: step5_Join
  in_ds_sas_meta_ids:
    - A5JCJVA0.BU0004QD
    - A5JCJVA0.BU0004QM
  out_ds_sas_meta_ids:
    - A5JCJVA0.BU0004QL
  pass_vars_arg: false
  extra_args: []
  extra_kwargs:
    - name: key1
      value: '"lit_value1"'
  step: 5
  """


  def step5_Join(df_input1, df_input2, key1="lit_value1"):
      # Code of the transform
      return df_output
  ```
  The `_alc_template_metadata_yaml` is a special variable that contains metadata for the template. It is used by Alchemist to generate the appropriate function call within the notebook code and is not included in the generated notebook. Currently, it contains the following fields:

  - `func_name`: The name of the function that will be called in the notebook's main job function. This should match the function name in the template code.
  - `in_ds_sas_meta_ids`: A list of SAS dataset meta ids that will be passed as dataframes to the function. The order of the ids should match the order of the function's arguments.
  - `out_ds_sas_meta_ids`: A list of SAS dataset meta ids that will be returned as dataframes by the function. The order of the ids should match the order of the function's return values.
  - `pass_vars_arg`: Specifies whether the `vars` argument should be passed to the function. It will be placed after the dataframe arguments.
  - `extra_args`: A list of additional positional argument names to be passed to the function, in the specified order.
  - `extra_kwargs`: A list of additional keyword arguments to be passed to the function.
  - `step`: The step number of the transform function. Currently used for informational purposes only.
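  For the example above, this metadata tells Alchemist to pass the dataframes for the two input datasets positionally, append the extra keyword argument, and bind the returned dataframe to the output dataset. A rough sketch of the call that could be generated in the notebook's main job function (the dataframe variable names are placeholders; the actual names are chosen by the converter):

  ```python
  # in_ds_sas_meta_ids  -> positional dataframe arguments, in order
  # extra_kwargs        -> keyword arguments appended to the call
  # out_ds_sas_meta_ids -> dataframes bound to the function's return values
  df_BU0004QL = step5_Join(df_BU0004QD, df_BU0004QM, key1="lit_value1")
  ```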