Prophecy Dialect Configuration

The Prophecy Dialect Configuration extends the base converter config, allowing you to configure various aspects of the Prophecy conversion process. These configurations include settings for datasets, schemas, and other essential components of the Prophecy platform. The following sections describe the available configuration classes and their fields.

Also refer to the Prophecy-specific template configuration for more information on the additional template types available in the Prophecy converter.

Prophecy Converter Config

| Field | Description | Default Value | Environment Variable |
|---|---|---|---|
| pbt_project_path | Path to the PBT project and the entire output. | None (a value must be provided) | ALC_PROPHECY_PBT_PATH |
| pbt_project_file_name | Name of the PBT project file. | "pbt_project.yml" | |
| output_pipeline_file_mask | Output file mask for pipeline files. | "pipelines/{flow_name}/code/.prophecy/workflow.latest.json" | |
| output_dataset_file_mask | Output file mask for dataset files. | "datasets/{dataset_name}/default/configs/dataset.json" | |
| dataset_id_mask | Dataset ID mask. | "datasets/{dataset_name}" | |
| dataset_pbt_key_mask | Dataset PBT key mask. | "datasets/{dataset_name}" | |
| project_id | Project ID. | "unknown project id" | ALC_PROPHECY_PROJECT_ID |
| prophecy_fabric_id | Prophecy fabric ID. | -1 | ALC_PROPHECY_FABRIC_ID |
| prophecy_fabric_cd | Prophecy fabric code. | "unknown fabric code" | ALC_PROPHECY_FABRIC_CD |
| prophecy_author_email | Prophecy author email. | "unknown@domain.com" | ALC_PROPHECY_AUTHOR_EMAIL |
| datasets_config | Prophecy Datasets Config (see below). | Default Datasets Config | |
| schemas_config | Prophecy Schemas Config (see below). | Default Schemas Config | |
| extra_pipeline_vars | Extra variables to be added to the pipeline context. | List of ConfigField models | |
| ext_dependencies | External dependencies to be added to the pipeline context. | List of ExtDependency models | |
| disable_updating_existing_pbt_pipelines | If True, the pbt_project.yml meta will not be updated with existing pipelines. | False | |
| custom_gems | Mapping from a custom gem type (as used in templates) to a definition config. | {} | |
| enforce_schema_config | List of Enforce Schema Config entries (see below). | [] | |
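
For reference, a minimal converter section might look like the following. This is an illustrative sketch only: the path, project ID, fabric values, and email are placeholders, and the fields are nested under a converter key in the same way as the later examples on this page.

converter:
  pbt_project_path: /path/to/pbt/output        # placeholder output path
  project_id: my_prophecy_project              # placeholder project ID
  prophecy_fabric_id: 1234                     # placeholder fabric ID
  prophecy_fabric_cd: dev_fabric               # placeholder fabric code
  prophecy_author_email: author@example.com    # placeholder author email

Each of these values can also be supplied through the corresponding environment variable listed in the table above.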

Prophecy Schemas Config

| Field | Description | Default Value |
|---|---|---|
| default_path_mask | Default path mask for schema storage. | "dbfs:/user/hive/warehouse/{schema_name}.db" |
| default_unity_catalog | Default Unity Catalog for schemas. | None |
| name_to_config | Mapping of schema names to individual schema configs. | { } |
| name_to_path | Deprecated; use name_to_config instead. | { } |

Prophecy Schema Config

| Field | Description | Default Value |
|---|---|---|
| path | Schema storage path. | None (a value must be provided) |
| unity_catalog | Name of the Unity Catalog. | Not set |
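
Here's an illustrative example of how a schemas_config might look in a YAML file. The schema name, path, and catalog names below are placeholders, not values required by the converter:

schemas_config:
  default_path_mask: "dbfs:/user/hive/warehouse/{schema_name}.db"   # documented default, shown explicitly
  default_unity_catalog: main_catalog                               # placeholder default catalog
  name_to_config:
    Schema1:                                                        # placeholder schema name
      path: "dbfs:/mnt/project/schema1"                             # placeholder storage path
      unity_catalog: other_catalog                                  # placeholder catalog override

In this sketch, every schema resolves to the default path mask and default catalog except Schema1, which receives an explicit path and a different Unity Catalog through its individual schema config.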

Prophecy Datasets Config

Defines dataset defaults as well as per-dataset configurations.

When Alchemist needs to find a configuration for a specific dataset, it follows the order of priority below. The lookup is case-insensitive, so the casing of schema and dataset names does not matter.

  1. First Priority - Schema and Dataset Name Combination: If a schema name is known/provided, Alchemist first tries to find a configuration that matches the combination of the schema name and the dataset name. The schema name and dataset name are combined using a period ('.'). For example, if the schema name is 'Schema1' and the dataset name is 'Dataset1', Alchemist looks for a configuration with the key 'Schema1.Dataset1' ignoring the case.

  2. Second Priority - Dataset Name Only: If Alchemist can't find a configuration that matches the schema and dataset name combination, or if no schema name is provided, it then tries to find a configuration that matches the dataset name only.

  3. Third Priority - Default Configuration: If Alchemist can't find a configuration that matches the dataset name, it then uses the default configuration.

Here's an example of how these configurations might look in a YAML file:

datasets_config:
  default_format: delta
  default_type: File
  default_db_provider: HIVE
  default_db_file_format: DELTA
  name_to_config:
    Schema1.Dataset1:
      name: NewDatasetName
      schema_or_path: other_schema
    Dataset2:
      name: NewDatasetName2
    schema1.Dataset&mv:
      name: "{Config.dynamic_name}"
      schema_or_path: some_schema
      unity_catalog: non_default_catalog
      format_: catalogTable
      type_: database
      db_provider: delta
      db_file_format: delta

In this example, there are three dataset configurations. The first configuration is for 'Schema1.Dataset1', for which both the schema and the name will be changed. The second configuration applies to 'Dataset2' in any schema: the new name will be used, but the schema will depend on the schemas_config. The last one shows how a dynamic name in SAS DI (one using macrovariables) is converted into a dynamic name in Prophecy, and also demonstrates that each config can override the defaults. For all other datasets, the default configuration is used.

| Field | Description | Default Value |
|---|---|---|
| default_format | Default format for datasets. | delta |
| default_type | Default type for datasets. | File |
| default_db_provider | Default database provider for datasets. | HIVE |
| default_db_file_format | Default database file format for datasets. | DELTA |
| name_to_config | Mapping of dataset names to individual dataset configs. | { } |

Prophecy Dataset Config

| Field | Description | Default Value |
|---|---|---|
| name | A new name to use in the target. If not set, the original lowercased name will be used. | Not set |
| schema_or_path | Schema name or path to the dataset. If not set, the schemas config will be used to resolve the schema. | Not set |
| unity_catalog | Unity Catalog name. If not set, the schemas config will be used to resolve the catalog. | Not set |
| format_ | Dataset format. If not set, default_format from the datasets config will be used. | Not set |
| type_ | Dataset type. If not set, default_type from the datasets config will be used. | Not set |
| db_provider | Specifies the database provider. Only used when format_=CATALOGTABLE and type_=DATABASE. If not set, default_db_provider from the datasets config will be used. | Not set |
| db_file_format | Specifies the table file format. Only used when format_=CATALOGTABLE and type_=DATABASE. If not set, default_db_file_format from the datasets config will be used. | Not set |

Prophecy Custom Gem Definition

This config makes Alchemist aware of the custom gems available in the target Prophecy environment and enables the use of custom gems in conversion and templates. A custom gem is defined by its gem_id and project_name.

Fields

| Field | Description |
|---|---|
| gem_id | The unique identifier for the custom gem. |
| project_name | The name of the project that the custom gem belongs to. |

Example

Here's an example of how you can define a custom gem in your configuration:

converter:
  custom_gems:
    newGemType:
      gem_id: gitUri=https://<your_prophecy_git_path>/gem-builder.git&subPath=/prophecy&tag=release-tag&projectSubscriptionProjectId=12345&path=gems/newGemType
      project_name: your-project-name

In this example, a new custom gem type called newGemType is defined. The gem_id is a unique identifier that includes the Git URI of the gem, the subpath within the repository, the release tag, the project subscription project ID, and the path to the gem. The project_name is the name of the project that the gem belongs to.

Enforce Schema Config

This configuration is used to enforce explicit dataframe schemas in the Prophecy pipeline. If the option is enabled, Alchemist will add extra Reformat gems to the matched Transform nodes. The columns of the Reformat are determined from the original table metadata.

If the enforce_input_schema setting is enabled, Alchemist will add a Reformat gem before the matched Transform node. If the enforce_output_schema setting is enabled, Alchemist will add a Reformat gem after the matched Transform node.

To avoid a cluttered pipeline filled with one-to-one mapping gems at every other step, Alchemist will attempt to analyze the pipeline and skip any redundant Reformat gems.

Optimization Logic

  • Alchemist preserves the "main" gems and only affects the "synthetic" one-to-one Reformat mappings
  • Alchemist does not add the one-to-one mapping gem:
    • if it is followed by a Reformat gem and does not have any other successor
    • if it is preceded by a Reformat gem and does not change the columns
  • Alchemist analyzes the ancestors of each one-to-one mapping gem step by step:
    • if a Reformat gem is found:
      • add the gem if the columns are changed; otherwise, skip the gem
    • if a Script or Custom gem is found:
      • add the gem
    • if a SQL, Join, or SetOperationGem is found:
      • add the gem if the columns are not the same; otherwise, skip the gem
    • if an Aggregate gem is found:
      • if the Propagate All Input Columns option is enabled, add the gem
      • otherwise, check the Aggregate's columns: agg, group by, and pivot columns
        • add the gem if the columns are not the same; otherwise, skip the gem

Fields

| Field | Description |
|---|---|
| match_paths | A list of match paths for Transform node selection. |
| enforce_input_schema | If set to True, Alchemist will add a Reformat gem before the converted node to enforce the expected input columns. The default is False. |
| enforce_output_schema | If set to True, Alchemist will add a Reformat gem after the converted node to enforce the expected output columns. The default is False. |
| enforce_column_types | If set to True, Alchemist will add cast(col as schema_data_type) expressions to enforce the expected column types. The default is False. |

Example

Here's an example of how you can define an Enforce Schema Config:

converter:
  enforce_schema_config:
    - match_paths:
        - "()"
      enforce_input_schema: true
      enforce_output_schema: true
    - match_paths:
        - '(SASMetaCustomTransform @name="CustomTr")'
        - "SASMetaLoaderTransform"
      enforce_input_schema: false
      enforce_output_schema: true
      enforce_column_types: true

In this example, two settings are defined. The first setting enables both options for all Transform nodes. The second setting enables the output schema with type casting for its columns and disables the input schema for all Transform nodes of type SASMetaCustomTransform with the name CustomTr, as well as for all Transform nodes of type SASMetaLoaderTransform. For more details on match paths, refer to the XPath Match Template configuration documentation.