Prophecy Dialect Configuration¶
The Prophecy Dialect Configuration extends the base converter config, allowing you to configure various aspects of the Prophecy conversion process. These configurations include settings for datasets, schemas, and other essential components of the Prophecy platform. The following sections describe the available configuration classes and their fields.
Also refer to the Prophecy-specific template configuration for more information on additional template types available in the Prophecy converter.
Prophecy Converter Config¶
Field | Description | Default Value | Localizable | Environment Variable |
---|---|---|---|---|
default_output_extension | Default output file extension. | .py | True | |
render_unsupported_output_as_comment | If True, the converter will render whatever it was able to convert for an unsupported source chunk as a comment. By definition the output will not be correct and may not even resemble valid code, but it may be useful for creating user templates or reporting feature requests. This is especially true for large statements/expressions where only a small part is unsupported; in that case the output may be very close to fully converted. Alchemist will determine a reasonable level at which to render the output, the same level as for render_unsupported_source_as_comment. | True | True | |
user_template_paths | List of folders with user templates to make available for conversion. | Empty list | False | |
template_configs | List of template configs. | ProphecySASTemplateConfig | False | |
template_tag_filter | Filter config for template tag inclusion/exclusion in matching. | StringFilterConfig | True | |
node_filter | Filter config for node inclusion/exclusion in rendering. | NodeFilterConfig | True | |
use_runtime | If True, the converter will use runtime user-defined functions where appropriate. Be aware that setting this to False may reduce the amount of automatically converted code, since some constructs have no static inline (non-runtime) equivalent. | True | True | ALC_USE_RUNTIME |
custom_udf_name_mapping | Custom name mapping for runtime functions, allowing custom names to be used for them. For the names of specific functions, consult the target dialect documentation. | Empty dict | True | |
conversion_comment_verbosity_level | Verbosity level for conversion comments (code, todo, warning, debug). code outputs only regular code comments retained from the source or considered part of the output; todo additionally outputs todo comments, which mark output that must be adjusted manually; warning additionally outputs warning comments, which mark potentially invalid code; debug outputs all comments, including developer warnings, which mark code that is unlikely to be invalid. | todo | True | |
conversion_mode | Conversion mode (normal, strict, lax). Changes code generation and how the comment verbosity level is handled. normal (the default) balances correctness and readability by allowing some heuristics about common cases where a 100% match would generate overly verbose and complex code, while still allowing short constructs that ensure correctness. strict prioritizes correctness over readability, striving to mark anything potentially not 100% correct in all scenarios as a todo item and reducing heuristics to a minimum. lax prioritizes readability over correctness, assuming the best-case scenario and avoiding the additional expressions that would be needed to handle edge cases. The verbosity level of conversion comments is also adjusted by the mode: in strict mode, warning comments are treated as todo and debug as warning, so more todo comments are generated; in lax mode, todo comments are treated as warning and warning as debug, so no todo comments are generated at all. | normal | True | |
llm | Configuration for GenAI-based conversion. | LLMConfig | False | |
spark_conf_ansi_enabled | Whether to generate code that assumes ANSI SQL mode is enabled in Spark. | True | True | |
sas | SAS to Spark conversion options. | SparkSASConfig | True | |
pbt_project_path | Path to the pbt project and thus the whole output. If not set, the project output directory + /prophecy will be used. This folder must contain the pbt_project.yml file. | None | False | ALC_PROPHECY_PBT_PATH |
pbt_project_file_name | Name of the pbt project file. | pbt_project.yml | True | |
output_pipeline_file_mask | Output file mask for pipeline files. | pipelines/{flow_name}/code/.prophecy/workflow.latest.json | False | |
output_dataset_file_mask | Output file mask for dataset files. | datasets/{dataset_name}/default/configs/dataset.json | True | |
dataset_id_mask | Dataset ID mask. | datasets/{dataset_name} | True | |
dataset_pbt_key_mask | Dataset PBT key mask. | datasets/{dataset_name} | True | |
project_id | Project ID. | unknown project id | False | ALC_PROPHECY_PROJECT_ID |
prophecy_fabric_id | Prophecy fabric ID. | -1 | False | ALC_PROPHECY_FABRIC_ID |
prophecy_fabric_cd | Prophecy fabric code. | unknown fabric code | False | ALC_PROPHECY_FABRIC_CD |
prophecy_author_email | Prophecy author email. | unknown@domain.com | False | ALC_PROPHECY_AUTHOR_EMAIL |
datasets_config | Prophecy Datasets Config. | ProphecyDatasetsConfig | True | |
schemas_config | Prophecy Schemas Config. | ProphecySchemasConfig | True | |
extra_pipeline_vars | Extra variables to be added to the pipeline context. | Empty list | True | |
ext_dependencies | External dependencies to be added to the pipeline context. | Empty list | False | |
disable_updating_existing_pbt_pipelines | If True, the pbt_project.yml meta will not be updated with existing pipelines. | False | True | |
custom_gems | Mapping from a custom gem type to a definition. | ProphecyCustomGemDef | True | |
enforce_schema_config | List of enforce schema configs. | EnforceSchemaConfig | True | |
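For orientation, here is a minimal sketch of how a few of these fields might be set in a YAML config file. The nesting under the converter: key follows the other examples on this page; the concrete values (path, IDs, email) are hypothetical placeholders, not defaults:

```yaml
converter:
  default_output_extension: .py
  conversion_mode: strict                      # one of: normal, strict, lax
  conversion_comment_verbosity_level: warning
  use_runtime: true
  pbt_project_path: /workspace/out/prophecy    # hypothetical path; must contain pbt_project.yml
  project_id: sales_migration                  # hypothetical project ID
  prophecy_fabric_id: 1234                     # hypothetical fabric ID
  prophecy_author_email: dev@example.com       # hypothetical author email
```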
Prophecy Schemas Config¶
Field | Description | Default Value |
---|---|---|
default_path_mask | Default path mask for schema storage. | "dbfs:/user/hive/warehouse/{schema_name}.db" |
default_unity_catalog | Default unity catalog for schemas. | None |
name_to_config | Mapping of schema names to individual schema configs. | { } |
name_to_path | This key is deprecated. Use name_to_config instead. | { } |
Prophecy Schema Config¶
Field | Description | Default Value |
---|---|---|
path | Schema storage path. | None (a value must be provided) |
unity_catalog | Name of unity catalog. | Not set |
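As a sketch of how these two configs combine, assuming the converter: nesting used elsewhere on this page (the schema name, paths, and catalog names below are hypothetical):

```yaml
converter:
  schemas_config:
    default_path_mask: "dbfs:/user/hive/warehouse/{schema_name}.db"
    default_unity_catalog: main                    # hypothetical catalog name
    name_to_config:
      reporting:                                   # hypothetical schema name
        path: "dbfs:/mnt/warehouse/reporting.db"   # hypothetical storage path
        unity_catalog: analytics                   # hypothetical catalog name
```

Any schema without an entry in name_to_config falls back to default_path_mask and default_unity_catalog.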
Prophecy Datasets Config¶
Defines dataset defaults as well as per-dataset configurations.
When Alchemist needs to find the configuration for a specific dataset, it follows the order of priority below. The lookup is case-insensitive: it doesn't matter whether names are uppercase or lowercase.
- First Priority, Schema and Dataset Name Combination: If a schema name is known/provided, Alchemist first tries to find a configuration that matches the combination of the schema name and the dataset name, joined with a period ('.'). For example, if the schema name is 'Schema1' and the dataset name is 'Dataset1', Alchemist looks for a configuration with the key 'Schema1.Dataset1', ignoring case.
- Second Priority, Dataset Name Only: If no configuration matches the schema and dataset name combination, or if no schema name is provided, Alchemist then tries to find a configuration that matches the dataset name only.
- Third Priority, Default Configuration: If no configuration matches the dataset name, Alchemist uses the default configuration.
Here's an example of how these configurations might look in a YAML file:
```yaml
datasets_config:
  default_format: delta
  default_type: File
  default_db_provider: HIVE
  default_db_file_format: DELTA
  name_to_config:
    Schema1.Dataset1:
      name: NewDatasetName
      schema_or_path: other_schema
    Dataset2:
      name: NewDatasetName2
    schema1.Dataset&mv:
      name: "{Config.dynamic_name}"
      schema_or_path: some_schema
      unity_catalog: non_default_catalog
      format_: catalogTable
      type_: database
      db_provider: delta
      db_file_format: delta
```
In this example, there are three dataset configurations. The first is for 'Schema1.Dataset1': both the schema and the name will be changed. The second is for any 'Dataset2' in any schema: the new name will be used, but the schema will depend on the schemas_config. The last one shows how a dynamic name in SAS DI (one using macro variables) is converted into a dynamic name in Prophecy, and also that each config can override the defaults. For all other datasets, the default configuration is used.
Field | Description | Default Value |
---|---|---|
default_format | Default format for datasets. | delta |
default_type | Default type for datasets. | File |
default_db_provider | Default database provider for datasets. | HIVE |
default_db_file_format | Default database file format for datasets. | DELTA |
name_to_config | Mapping of dataset names to individual dataset configs. | { } |
Prophecy Dataset Config¶
Field | Description | Default Value |
---|---|---|
name | A new name to use in the target. If not set, the original lowercased name will be used. | Not set |
schema_or_path | Schema name or path to the dataset. If not set, the schemas config will be used to resolve the schema. | Not set |
unity_catalog | Unity catalog name. If not set, the schemas config will be used to resolve the catalog. | Not set |
format_ | Dataset format. If not set, default_format from the datasets config will be used. | Not set |
type_ | Dataset type. If not set, default_type from the datasets config will be used. | Not set |
db_provider | Specifies the database provider. Only used when format_=CATALOGTABLE and type_=DATABASE. If not set, default_db_provider from the datasets config will be used. | Not set |
db_file_format | Specifies the table file format. Only used when format_=CATALOGTABLE and type_=DATABASE. If not set, default_db_file_format from the datasets config will be used. | Not set |
Prophecy Custom Gem Definition¶
This config makes Alchemist aware of the custom gems available in the target Prophecy environment, enabling their use in conversion and templates. A custom gem is defined by its gem_id and project_name.
Fields¶
Field | Description |
---|---|
gem_id | The unique identifier for the custom gem. |
project_name | The name of the project that the custom gem belongs to. |
Example¶
Here's an example of how you can define a custom gem in your configuration:
```yaml
converter:
  custom_gems:
    newGemType:
      gem_id: gitUri=https://<your_prophecy_git_path>/gem-builder.git&subPath=/prophecy&tag=release-tag&projectSubscriptionProjectId=12345&path=gems/newGemType
      project_name: your-project-name
```
In this example, a new custom gem type called newGemType is defined. The gem_id is a unique identifier that includes the Git URI of the gem, the subpath within the repository, the release tag, the project subscription project ID, and the path to the gem. The project_name is the name of the project that the gem belongs to.
Enforce Schema Config¶
This configuration is used to ensure explicit dataframe schemas for the Prophecy Pipeline. If the option is enabled, Alchemist will add extra Reformat GEMs to the matched Transform nodes. The columns of the Reformat will be determined based on the original table metadata.
If the enforce_input_schema setting is enabled, Alchemist will add a Reformat GEM before the matched Transform node. If the enforce_output_schema setting is enabled, Alchemist will add a Reformat GEM after the matched Transform node.
To avoid a pipeline cluttered with one-to-one mapping gems at every other step, Alchemist will attempt to analyze the pipeline and skip any redundant Reformat GEMs.
Optimization Logic
- Alchemist preserves the "main" gems and only affects the "synthetic" one-to-one Reformat mappings.
- Alchemist does not add the one-to-one mapping gem:
    - if it is followed by a Reformat GEM and has no other successor
    - if it is preceded by a Reformat GEM and does not change the columns
- Alchemist analyzes the ancestors of each one-to-one mapping gem step by step:
    - if a Reformat gem is found: add the gem if the columns are changed; otherwise, skip the gem
    - if a Script or Custom gem is found: add the gem
    - if a SQL, Join, or SetOperation gem is found: add the gem if the columns are not the same; otherwise, skip the gem
    - if an Aggregate gem is found:
        - if the Propagate All Input Columns option is enabled, add the gem
        - otherwise, check the Aggregate's agg, group by, and pivot columns: add the gem if the columns are not the same; otherwise, skip the gem
Fields¶
Field | Description |
---|---|
match_paths | A list of match paths for Transform node selection. |
enforce_input_schema | If set to True, Alchemist will add a Reformat Gem before the converted node to enforce the expected input columns. The default setting is False. |
enforce_output_schema | If set to True, Alchemist will add a Reformat Gem after the converted node to enforce the expected output columns. The default setting is False. |
enforce_column_types | If set to True, Alchemist will add cast(col as schema_data_type) expressions to enforce the expected column types. The default setting is False. |
keep_if_redundant | If set to True, Reformat Gems will be added even if they are redundant, e.g., if a Reformat Gem is followed by an identical Reformat Gem. The default setting is False. |
Example¶
Here's an example of how you can define an Enforce Schema Config:
```yaml
converter:
  enforce_schema_config:
    - match_paths:
        - "()"
      enforce_input_schema: true
      enforce_output_schema: true
    - match_paths:
        - '(SASMetaCustomTransform @name="CustomTr")'
        - "SASMetaLoaderTransform"
      enforce_input_schema: false
      enforce_output_schema: true
      enforce_column_types: true
      keep_if_redundant: true
```
In this example, two settings are defined. The first enables both options for all Transform nodes. The second enables the output schema with type casting for its columns and disables the input schema for all Transform nodes of type SASMetaCustomTransform with the name CustomTr, as well as for all Transform nodes of type SASMetaLoaderTransform. For more details on match paths, refer to the XPath Match Template configuration documentation.