Prophecy Dialect Configuration¶
The Prophecy Dialect Configuration extends the base converter config, allowing you to configure various aspects of the Prophecy conversion process. These configurations include settings for datasets, schemas, and other essential components of the Prophecy platform. The following sections describe the available configuration classes and their fields.
Also, refer to the Prophecy-specific template configuration for more information on the additional template types available in the Prophecy converter.
Prophecy Converter Config¶
Field | Description | Default Value | Environment Variable |
---|---|---|---|
pbt_project_path | Path to the PBT project and the entire output. | None (a value must be provided) | ALC_PROPHECY_PBT_PATH |
pbt_project_file_name | Name of the PBT project file. | "pbt_project.yml" | |
output_pipeline_file_mask | Output file mask for pipeline files. | "pipelines/{flow_name}/code/.prophecy/workflow.latest.json" | |
output_dataset_file_mask | Output file mask for dataset files. | "datasets/{dataset_name}/default/configs/dataset.json" | |
dataset_id_mask | Dataset ID mask. | "datasets/{dataset_name}" | |
dataset_pbt_key_mask | Dataset PBT key mask. | "datasets/{dataset_name}" | |
project_id | Project ID. | "unknown project id" | ALC_PROPHECY_PROJECT_ID |
prophecy_fabric_id | Prophecy fabric ID. | -1 | ALC_PROPHECY_FABRIC_ID |
prophecy_fabric_cd | Prophecy fabric code. | "unknown fabric code" | ALC_PROPHECY_FABRIC_CD |
prophecy_author_email | Prophecy author email. | "unknown@domain.com" | ALC_PROPHECY_AUTHOR_EMAIL |
datasets_config | Prophecy Datasets Config | Default Datasets Config | |
schemas_config | Prophecy Schemas Config | Default Schemas Config | |
extra_pipeline_vars | Extra variables to be added to the pipeline context. | List of ConfigField models | |
ext_dependencies | External dependencies to be added to the pipeline context. | List of ExtDependency models | |
disable_updating_existing_pbt_pipelines | If True, the pbt_project.yml meta will not be updated with existing pipelines. | False | |
custom_gems | Mapping from a custom gem type (as used in templates) to a definition config. | {} | |
enforce_schema_config | List of Enforce Schema Config entries. | [] | |
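For example, a minimal converter section might look like this. The values shown (paths, IDs, emails) are purely illustrative placeholders, not recommended defaults:

```yaml
converter:
  # Required: where the PBT project and all generated output are written
  pbt_project_path: /path/to/output/pbt_project
  project_id: my_project
  prophecy_fabric_id: 1234
  prophecy_fabric_cd: dev-fabric
  prophecy_author_email: user@example.com
```

Each of these fields can also be supplied through its environment variable (for example, ALC_PROPHECY_PBT_PATH), which takes the place of the YAML value.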
Prophecy Schemas Config¶
Field | Description | Default Value |
---|---|---|
default_path_mask | Default path mask for schema storage. | "dbfs:/user/hive/warehouse/{schema_name}.db" |
default_unity_catalog | Default unity catalog for schemas. | None |
name_to_config | Mapping of schema names to individual schema configs. | { } |
name_to_path | This key has been deprecated. Use name_to_config instead. | { } |
Prophecy Schema Config¶
Field | Description | Default Value |
---|---|---|
path | Schema storage path. | None (a value must be provided) |
unity_catalog | Name of unity catalog. | Not set |
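Putting the two configs together, a schemas_config section might look like this. The schema name, path, and catalog names are illustrative assumptions, not values from the source:

```yaml
schemas_config:
  # Used for any schema without an individual entry below
  default_path_mask: "dbfs:/user/hive/warehouse/{schema_name}.db"
  default_unity_catalog: main_catalog
  name_to_config:
    # Per-schema override using the Prophecy Schema Config fields
    reporting:
      path: "dbfs:/mnt/reporting"
      unity_catalog: reporting_catalog
```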
Prophecy Datasets Config¶
Defines dataset defaults as well as per-dataset configurations.
When Alchemist needs to find a configuration for a specific dataset, it follows a priority order. All name matching is case-insensitive.
- First Priority - Schema and Dataset Name Combination: If a schema name is known, Alchemist first looks for a configuration matching the schema name and the dataset name joined by a period ('.'). For example, for schema 'Schema1' and dataset 'Dataset1', Alchemist looks for a configuration with the key 'Schema1.Dataset1', ignoring case.
- Second Priority - Dataset Name Only: If no configuration matches the schema and dataset name combination, or if no schema name is provided, Alchemist then looks for a configuration matching the dataset name alone.
- Third Priority - Default Configuration: If no configuration matches the dataset name, Alchemist falls back to the default configuration.
Here's an example of how these configurations might look in a YAML file:
```yaml
datasets_config:
  default_format: delta
  default_type: File
  default_db_provider: HIVE
  default_db_file_format: DELTA
  name_to_config:
    Schema1.Dataset1:
      name: NewDatasetName
      schema_or_path: other_schema
    Dataset2:
      name: NewDatasetName2
    schema1.Dataset&mv:
      name: "{Config.dynamic_name}"
      schema_or_path: some_schema
      unity_catalog: non_default_catalog
      format_: catalogTable
      type_: database
      db_provider: delta
      db_file_format: delta
```
In this example, there are three dataset configurations. The first configuration is for 'Schema1.Dataset1', for which both the schema and the name will be changed. The second configuration is for any 'Dataset2' in any schema: the new name will be used, but the schema will depend on the schemas_config. The last one showcases how a dynamic name in SAS DI (one using macro variables) is converted into a dynamic name in Prophecy, as well as the fact that each config can override the defaults. For all other datasets, the default configuration is used.
Field | Description | Default Value |
---|---|---|
default_format | Default format for datasets. | delta |
default_type | Default type for datasets. | File |
default_db_provider | Default database provider for datasets. | HIVE |
default_db_file_format | Default database file format for datasets. | DELTA |
name_to_config | Mapping of dataset names to individual dataset configs | { } |
Prophecy Dataset Config¶
Field | Description | Default Value |
---|---|---|
name | A new name to use in the target. If not set, the original lowercased name will be used. | Not set |
schema_or_path | Schema name or path to the dataset. If not set, the schemas config will be used to resolve the schema. | Not set |
unity_catalog | Unity catalog name. If not set, the schemas config will be used to resolve the catalog. | Not set |
format_ | Dataset format. If not set, default_format from the datasets config will be used. | Not set |
type_ | Dataset type. If not set, default_type from the datasets config will be used. | Not set |
db_provider | Specifies the provider of the database. Only used when format_=CATALOGTABLE and type_=DATABASE. If not set, default_db_provider from the datasets config will be used. | Not set |
db_file_format | Specifies the table file format. Only used when format_=CATALOGTABLE and type_=DATABASE. If not set, default_db_file_format from the datasets config will be used. | Not set |
Prophecy Custom Gem Definition¶
This config makes Alchemist aware of the custom gems available in the target Prophecy environment and enables their use in conversion and templates. A custom gem is defined by its gem_id and project_name.
Fields¶
Field | Description |
---|---|
gem_id | The unique identifier for the custom gem. |
project_name | The name of the project that the custom gem belongs to. |
Example¶
Here's an example of how you can define a custom gem in your configuration:
```yaml
converter:
  custom_gems:
    newGemType:
      gem_id: gitUri=https://<your_prophecy_git_path>/gem-builder.git&subPath=/prophecy&tag=release-tag&projectSubscriptionProjectId=12345&path=gems/newGemType
      project_name: your-project-name
```
In this example, a new custom gem type called newGemType is defined. The gem_id is a unique identifier that includes the Git URI of the gem, the subpath within the repository, the release tag, the project subscription project ID, and the path to the gem. The project_name is the name of the project that the gem belongs to.
Enforce Schema Config¶
This configuration is used to ensure explicit dataframe schemas for the Prophecy Pipeline. If the option is enabled, Alchemist will add extra Reformat GEMs to the matched Transform nodes. The columns of the Reformat will be determined based on the original table metadata.
If the enforce_input_schema setting is enabled, Alchemist will add a Reformat GEM before the matched Transform node. If the enforce_output_schema setting is enabled, Alchemist will add a Reformat GEM after the matched Transform node.
To avoid a cluttered pipeline, filled with one-to-one mapping gems at every other step, Alchemist will attempt to analyze the pipeline and skip any redundant Reformat GEMs.
Optimization Logic
- Alchemist preserves the "main" gems and only affects the "synthetic" one-to-one Reformat mappings.
- Alchemist does not add the one-to-one mapping gem:
  - if it is followed by a Reformat GEM and does not have any other successor;
  - if it is preceded by a Reformat GEM and does not change the columns.
- Alchemist analyzes the ancestors of each one-to-one mapping gem step by step:
  - if a Reformat gem is found: add the gem if the columns are changed; otherwise, skip it;
  - if a Script or Custom gem is found: add the gem;
  - if a SQL, Join, or SetOperation gem is found: add the gem if the columns are not the same; otherwise, skip it;
  - if an Aggregate gem is found: if the Propagate All Input Columns option is enabled, add the gem; otherwise, check the Aggregate's agg, group-by, and pivot columns, and add the gem if the columns are not the same; otherwise, skip it.
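The ancestor-analysis rules above can be condensed into a small sketch. This is a simplified illustration, not Alchemist's actual API: the gem type names are strings, and a single columns_changed flag stands in for the per-gem column comparisons Alchemist performs internally.

```python
def should_add_reformat(ancestors, columns_changed, propagate_all=False):
    """Decide whether a synthetic one-to-one Reformat mapping is needed.

    ancestors       -- gem type names of the mapping's ancestors, nearest first
    columns_changed -- True if the mapping actually changes any columns
    propagate_all   -- Aggregate's "Propagate All Input Columns" option
    """
    for gem in ancestors:
        if gem == "Reformat":
            # A neighbouring Reformat makes the mapping redundant
            # unless the columns are actually changed.
            return columns_changed
        if gem in ("Script", "Custom"):
            # Output columns of Script/Custom gems are opaque: keep the gem.
            return True
        if gem in ("SQL", "Join", "SetOperation"):
            return columns_changed
        if gem == "Aggregate":
            # With "Propagate All Input Columns" the output is not fully
            # known, so keep the gem; otherwise compare the columns.
            return True if propagate_all else columns_changed
    # No informative ancestor found: keep the gem to be safe.
    return True
```

The sketch returns True when the Reformat should be added and False when it is redundant and can be skipped.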
Fields¶
Field | Description |
---|---|
match_paths | A list of match paths for Transform node selection. |
enforce_input_schema | If set to True, Alchemist will add the Reformat Gem before the converted node to enforce the expected input columns. The default setting is False. |
enforce_output_schema | If set to True, Alchemist will add the Reformat Gem after the converted node to enforce the expected output columns. The default setting is False. |
enforce_column_types | If set to True, Alchemist will add cast(col as schema_data_type) expressions to enforce the expected column types. The default setting is False. |
Example¶
Here's an example of how you can define an Enforce Schema Config:
```yaml
converter:
  enforce_schema_config:
    - match_paths:
        - "()"
      enforce_input_schema: true
      enforce_output_schema: true
    - match_paths:
        - '(SASMetaCustomTransform @name="CustomTr")'
        - "SASMetaLoaderTransform"
      enforce_input_schema: false
      enforce_output_schema: true
      enforce_column_types: true
```
In this example, two settings are defined. The first enables both options for all Transform nodes. The second enables the output schema with type casting for its columns, and disables the input schema, for all Transform nodes of type SASMetaCustomTransform with the name CustomTr, as well as for all Transform nodes of type SASMetaLoaderTransform. For more details on match paths, refer to the XPath Match Template configuration documentation.