Prophecy Dialect Configuration

The Prophecy Dialect Configuration extends the base converter config, allowing you to configure various aspects of the Prophecy conversion process. These configurations include settings for datasets, schemas, and other essential components of the Prophecy platform. The following sections describe the available configuration classes and their fields.

Also, refer to the Prophecy-specific template configuration for more information on additional template types available in the Prophecy converter.

Prophecy Converter Config

default_output_extension
  Default output file extension.
  Default: .py. Localizable: True.

render_unsupported_output_as_comment
  If True, the converter will render whatever it was able to convert for an unsupported source chunk as a comment. The output by definition will not be correct and may not even resemble valid code, but it may be useful for creating user templates or reporting feature requests.
  This is especially true for large statements/expressions where only a small part is not supported. In this case the output may be very close to a fully converted one.
  Alchemist will determine a reasonable level at which to render the output. This is the same level as for render_unsupported_source_as_comment.
  Default: True. Localizable: True.

user_template_paths
  List of folders with user templates to make available for conversion.
  Default: Empty list. Localizable: False.

template_configs
  List of template configs.
  Default: ProphecySASTemplateConfig. Localizable: False.

template_tag_filter
  Filter config for including/excluding template tags from matching.
  Default: StringFilterConfig. Localizable: True.

node_filter
  Filter config for including/excluding nodes from rendering.
  Default: NodeFilterConfig. Localizable: True.

use_runtime
  If True, the converter will use runtime user-defined functions where appropriate.
  Be aware that setting this to False may reduce the amount of automatically converted code, since for some constructs there may not be a non-runtime static inline version.
  Default: True. Localizable: True. Environment variable: ALC_USE_RUNTIME.

custom_udf_name_mapping
  Custom name mapping for runtime functions.
  This allows you to use custom names for runtime functions. For the names of specific functions, consult the target dialect documentation.
  Default: Empty dict. Localizable: True.

conversion_comment_verbosity_level
  Verbosity level for conversion comments (code, todo, warning, debug).
  Verbosity levels:
  - code: outputs only regular code comments retained from the source or considered part of the output.
  - todo: outputs code and todo comments; todo comments are used for output that has to be adjusted manually.
  - warning: outputs code, todo and warning comments; warning comments are used for potentially invalid code.
  - debug: outputs all comments, including developer warnings; debug comments are used for code that is unlikely to be invalid.
  Default: todo. Localizable: True.

conversion_mode
  Conversion mode (normal, strict, lax).
  Changes code generation and how the comment verbosity level is handled:
  - NORMAL: The default mode. Balances correctness and readability by allowing some heuristics for common cases where achieving a 100% match would generate overly verbose and complex code, while still allowing short constructs that ensure correctness.
  - STRICT: Prioritizes correctness over readability, striving to mark anything that is potentially not 100% correct in all scenarios as a TODO item and reducing heuristics to a minimum.
  - LAX: Prioritizes readability over correctness, assuming the best-case scenario and avoiding generating additional expressions that would be needed to handle edge cases.
  In addition, the verbosity level of conversion comments is adjusted based on the mode:
  - In strict mode, the warning comment is treated as todo, and debug is treated as warning, so more todo comments are generated.
  - In lax mode, the todo comment is treated as warning, and warning is treated as debug, meaning no todo comments are generated at all.
  Default: normal. Localizable: True.

llm
  Configuration for GenAI-based conversion.
  Default: LLMConfig. Localizable: False.

spark_conf_ansi_enabled
  Whether to generate code that assumes ANSI SQL mode is enabled in Spark.
  Default: True. Localizable: True.

sas
  SAS to Spark conversion options.
  Default: SparkSASConfig. Localizable: True.

pbt_project_path
  Path to the pbt project and thus the whole output.
  If not set, the project output directory + /prophecy will be used. This folder must contain the pbt_project.yml file.
  Default: None. Localizable: False. Environment variable: ALC_PROPHECY_PBT_PATH.

pbt_project_file_name
  Name of the pbt project file.
  Default: pbt_project.yml. Localizable: True.

output_pipeline_file_mask
  Output file mask for pipeline files.
  Default: pipelines/{flow_name}/code/.prophecy/workflow.latest.json. Localizable: False.

output_dataset_file_mask
  Output file mask for dataset files.
  Default: datasets/{dataset_name}/default/configs/dataset.json. Localizable: True.

dataset_id_mask
  Dataset ID mask.
  Default: datasets/{dataset_name}. Localizable: True.

dataset_pbt_key_mask
  Dataset PBT key mask.
  Default: datasets/{dataset_name}. Localizable: True.

project_id
  Project ID.
  Default: unknown project id. Localizable: False. Environment variable: ALC_PROPHECY_PROJECT_ID.

prophecy_fabric_id
  Prophecy fabric ID.
  Default: -1. Localizable: False. Environment variable: ALC_PROPHECY_FABRIC_ID.

prophecy_fabric_cd
  Prophecy fabric code.
  Default: unknown fabric code. Localizable: False. Environment variable: ALC_PROPHECY_FABRIC_CD.

prophecy_author_email
  Prophecy author email.
  Default: unknown@domain.com. Localizable: False. Environment variable: ALC_PROPHECY_AUTHOR_EMAIL.

datasets_config
  Prophecy Datasets Config.
  Default: ProphecyDatasetsConfig. Localizable: True.

schemas_config
  Prophecy Schemas Config.
  Default: ProphecySchemasConfig. Localizable: True.

extra_pipeline_vars
  Extra variables to be added to the pipeline context.
  Default: Empty list. Localizable: True.

ext_dependencies
  External dependencies to be added to the pipeline context.
  Default: Empty list. Localizable: False.

disable_updating_existing_pbt_pipelines
  If True, the pbt_project.yml metadata will not be updated with existing pipelines.
  Default: False. Localizable: True.

custom_gems
  Mapping from a custom gem type to a definition.
  Default: ProphecyCustomGemDef. Localizable: True.

enforce_schema_config
  List of enforce schema configs.
  Default: EnforceSchemaConfig. Localizable: True.
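
Here's an example of how some of these options might look in a YAML file. This is a minimal sketch with illustrative values only; the path, project ID, fabric ID, and email below are placeholders:

converter:
  conversion_mode: strict
  conversion_comment_verbosity_level: warning
  use_runtime: true
  spark_conf_ansi_enabled: true
  pbt_project_path: /path/to/output/prophecy
  project_id: my_project_id
  prophecy_fabric_id: 4567
  prophecy_author_email: author@example.com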

Prophecy Schemas Config

default_path_mask
  Default path mask for schema storage.
  Default: "dbfs:/user/hive/warehouse/{schema_name}.db".

default_unity_catalog
  Default unity catalog for schemas.
  Default: None.

name_to_config
  Mapping of schema names to individual schema configs.
  Default: { }.

name_to_path
  Deprecated. Use name_to_config instead.
  Default: { }.

Prophecy Schema Config

path
  Schema storage path.
  Default: None (a value must be provided).

unity_catalog
  Name of the unity catalog.
  Default: Not set.
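
Here's an example of how the schemas config might look in a YAML file. The schema names, path, and catalog names are illustrative placeholders:

converter:
  schemas_config:
    default_path_mask: "dbfs:/user/hive/warehouse/{schema_name}.db"
    default_unity_catalog: main_catalog
    name_to_config:
      Schema1:
        path: "dbfs:/mnt/project/schema1.db"
      Schema2:
        unity_catalog: analytics_catalog

In this example, datasets in Schema1 are stored under an explicit path, datasets in Schema2 go to a non-default unity catalog, and all other schemas fall back to default_path_mask and default_unity_catalog.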

Prophecy Datasets Config

Defines dataset defaults as well as per-dataset configurations.

When Alchemist needs to find a configuration for a specific dataset, it follows the order of priority below. The lookup is case-insensitive: it doesn't matter whether the names are in uppercase or lowercase.

  1. First Priority - Schema and Dataset Name Combination: If a schema name is known/provided, Alchemist first tries to find a configuration that matches the combination of the schema name and the dataset name, joined with a period ('.'). For example, if the schema name is 'Schema1' and the dataset name is 'Dataset1', Alchemist looks for a configuration with the key 'Schema1.Dataset1', ignoring case.

  2. Second Priority - Dataset Name Only: If Alchemist can't find a configuration that matches the schema and dataset name combination, or if no schema name is provided, it then tries to find a configuration that matches the dataset name only.

  3. Third Priority - Default Configuration: If Alchemist can't find a configuration that matches the dataset name, it then uses the default configuration.

Here's an example of how these configurations might look in a YAML file:

datasets_config:
  default_format: delta
  default_type: File
  default_db_provider: HIVE
  default_db_file_format: DELTA
  name_to_config:
    Schema1.Dataset1:
      name: NewDatasetName
      schema_or_path: other_schema
    Dataset2:
      name: NewDatasetName2
    schema1.Dataset&mv:
      name: "{Config.dynamic_name}"
      schema_or_path: some_schema
      unity_catalog: non_default_catalog
      format_: catalogTable
      type_: database
      db_provider: delta
      db_file_format: delta

In this example, there are three dataset configurations. The first configuration is for 'Schema1.Dataset1', for which both the schema and the name will be changed. The second configuration is for any 'Dataset2' in any schema: the new name will be used, but the schema will depend on the schemas_config. The last one showcases how a dynamic name in SAS DI (one using macro variables) is converted into a dynamic name in Prophecy, as well as the fact that each config can override the defaults. For all other datasets, the default configuration is used.

default_format
  Default format for datasets.
  Default: delta.

default_type
  Default type for datasets.
  Default: File.

default_db_provider
  Default database provider for datasets.
  Default: HIVE.

default_db_file_format
  Default database file format for datasets.
  Default: DELTA.

name_to_config
  Mapping of dataset names to individual dataset configs.
  Default: { }.

Prophecy Dataset Config

name
  A new name to use in the target. If not set, the original lowercased name will be used.
  Default: Not set.

schema_or_path
  Schema name or path to the dataset. If not set, the schemas config will be used to resolve the schema.
  Default: Not set.

unity_catalog
  Unity catalog name. If not set, the schemas config will be used to resolve the catalog.
  Default: Not set.

format_
  Dataset format. If not set, default_format from the datasets config will be used.
  Default: Not set.

type_
  Dataset type. If not set, default_type from the datasets config will be used.
  Default: Not set.

db_provider
  Specifies the provider of the database. Only used when format_=CATALOGTABLE and type_=DATABASE. If not set, default_db_provider from the datasets config will be used.
  Default: Not set.

db_file_format
  Specifies the table file format. Only used when format_=CATALOGTABLE and type_=DATABASE. If not set, default_db_file_format from the datasets config will be used.
  Default: Not set.

Prophecy Custom Gem Definition

This config is used to make Alchemist aware of custom gems available in the target Prophecy environment and to enable the use of custom gems in conversion and templates. A custom gem is defined by its gem_id and project_name.

Fields

gem_id
  The unique identifier for the custom gem.

project_name
  The name of the project that the custom gem belongs to.

Example

Here's an example of how you can define a custom gem in your configuration:

converter:
  custom_gems:
    newGemType:
      gem_id: gitUri=https://<your_prophecy_git_path>/gem-builder.git&subPath=/prophecy&tag=release-tag&projectSubscriptionProjectId=12345&path=gems/newGemType
      project_name: your-project-name

In this example, a new custom gem type called newGemType is defined. The gem_id is a unique identifier that includes the Git URI of the gem, the subpath within the repository, the release tag, the project subscription project ID, and the path to the gem. The project_name is the name of the project that the gem belongs to.

Enforce Schema Config

This configuration is used to ensure explicit dataframe schemas for the Prophecy Pipeline. If the option is enabled, Alchemist will add extra Reformat gems to the matched Transform nodes. The columns of the Reformat will be determined based on the original table metadata.

If the enforce_input_schema setting is enabled, Alchemist will add a Reformat gem before the matched Transform node. If the enforce_output_schema setting is enabled, Alchemist will add a Reformat gem after the matched Transform node.

To avoid a cluttered pipeline filled with one-to-one mapping gems at every other step, Alchemist will attempt to analyze the pipeline and skip any redundant Reformat gems.

Optimization Logic

  • Alchemist preserves the "main" gems and only affects the "synthetic" one-to-one Reformat mappings.
  • Alchemist does not add the one-to-one mapping gem:
    • if it is followed by a Reformat gem and does not have any other successor
    • if it is preceded by a Reformat gem and does not change the columns
  • Alchemist analyzes the ancestors of each one-to-one mapping gem step by step:
    • if a Reformat gem is found:
      • add the gem if the columns are changed; otherwise, skip the gem
    • if a Script or Custom gem is found:
      • add the gem
    • if a SQL, Join, or SetOperation gem is found:
      • add the gem if the columns are not the same; otherwise, skip the gem
    • if an Aggregate gem is found:
      • if the Propagate All Input Columns option is enabled, add the gem
      • otherwise, check the Aggregate's columns: agg, group by, and pivot columns
        • add the gem if the columns are not the same; otherwise, skip the gem

Fields

match_paths
  A list of match paths for Transform node selection.

enforce_input_schema
  If set to True, Alchemist will add a Reformat gem before the converted node to enforce the expected input columns. Default: False.

enforce_output_schema
  If set to True, Alchemist will add a Reformat gem after the converted node to enforce the expected output columns. Default: False.

enforce_column_types
  If set to True, Alchemist will add cast(col as schema_data_type) expressions to enforce the expected column type. Default: False.

keep_if_redundant
  If set to True, Reformat gems will be added even if they are redundant, e.g., if a Reformat gem is followed by the same Reformat gem. Default: False.

Example

Here's an example of how you can define an Enforce Schema Config:

converter:
  enforce_schema_config:
    - match_paths:
        - "()"
      enforce_input_schema: true
      enforce_output_schema: true
    - match_paths:
        - '(SASMetaCustomTransform @name="CustomTr")'
        - "SASMetaLoaderTransform"
      enforce_input_schema: false
      enforce_output_schema: true
      enforce_column_types: true
      keep_if_redundant: true

In this example, two settings are defined. The first setting enables both options for all Transform nodes. The second setting enables output schema enforcement with type casting for its columns and disables input schema enforcement for all Transform nodes of type SASMetaCustomTransform with the name CustomTr, as well as for all Transform nodes of type SASMetaLoaderTransform. For more details on match paths, refer to the XPath Match Template configuration documentation.