Prophecy Dialect Configuration¶
The Prophecy Dialect Configuration extends the base converter config, allowing you to configure various aspects of the Prophecy conversion process. These configurations include settings for datasets, schemas, and other essential components of the Prophecy platform. The following sections describe the available configuration classes and their fields.
Also refer to the Prophecy-specific template configuration for more information on additional template types available in the Prophecy converter.
Prophecy Converter Config¶
Field | Description | Default Value | Localizable | Environment Variable |
---|---|---|---|---|
default_output_extension | Default output file extension. | .py | True | |
render_unsupported_output_as_comment | If True, the converter will render whatever it was able to convert for an unsupported source chunk as a comment. By definition the output will not be correct and may not even resemble valid code, but it may be useful for creating user templates or reporting feature requests. This is especially true for large statements/expressions where only a small part is unsupported; in that case the output may be very close to fully converted. Alchemist will determine a reasonable level at which to render the output, the same level as for render_unsupported_source_as_comment. | True | True | |
user_template_paths | List of folders with user templates to make available for conversion. | Empty list | False | |
template_configs | List of template configs. | ProphecySASTemplateConfig | False | |
template_tag_filter | Filter config for template tag inclusion/exclusion in matching. | StringFilterConfig | True | |
node_filter | Filter config for node inclusion/exclusion in rendering. | NodeFilterConfig | True | |
use_runtime | If True, the converter will use runtime user-defined functions where appropriate. Be aware that setting this to False may reduce the amount of automatically converted code, since some constructs have no static inline (non-runtime) equivalent. | True | True | ALC_USE_RUNTIME |
custom_udf_name_mapping | Custom name mapping for runtime functions, allowing custom names to be used for them. For the names of specific functions, consult the target dialect documentation. | Empty dict | True | |
conversion_comment_verbosity_level | Verbosity level for conversion comments (code, todo, warning, debug). code outputs only regular code comments retained from the source or considered part of the output; todo additionally outputs todo comments, which mark output that must be adjusted manually; warning additionally outputs warning comments, which mark potentially invalid code; debug outputs all comments, including developer warnings, which mark code that is unlikely to be invalid. | todo | True | |
conversion_mode | Conversion mode (normal, strict, lax). Changes code generation and how the comment verbosity level is handled. normal (the default) balances correctness and readability by allowing some heuristics about common cases where a 100% match would generate overly verbose and complex code, while still allowing short constructs that ensure correctness. strict prioritizes correctness over readability, striving to mark anything potentially not 100% correct in all scenarios as a todo item and reducing heuristics to a minimum. lax prioritizes readability over correctness, assuming the best-case scenario and avoiding the additional expressions that would be needed to handle edge cases. The verbosity level of conversion comments is also adjusted by the mode: in strict mode, warning comments are treated as todo and debug as warning, so more todo comments are generated; in lax mode, todo comments are treated as warning and warning as debug, so no todo comments are generated at all. | normal | True | |
llm | Configuration for GenAI-based conversion. | LLMConfig | False | |
spark_conf_ansi_enabled | Whether to generate code that assumes ANSI SQL mode is enabled in Spark. | True | True | |
sas | SAS to Spark conversion options. | SparkSASConfig | True | |
pbt_project_path | Path to the pbt project and thus the whole output. If not set, the project output directory + /prophecy will be used. This folder must contain the pbt_project.yml file. | None | False | ALC_PROPHECY_PBT_PATH |
pbt_project_file_name | Name of the pbt project file. | pbt_project.yml | True | |
output_pipeline_file_mask | Output file mask for pipeline files. | pipelines/{flow_name}/code/.prophecy/workflow.latest.json | False | |
output_dataset_file_mask | Output file mask for dataset files. | datasets/{dataset_name}/default/configs/dataset.json | True | |
dataset_id_mask | Dataset ID mask. | datasets/{dataset_name} | True | |
dataset_pbt_key_mask | Dataset PBT key mask. | datasets/{dataset_name} | True | |
project_id | Project ID. | unknown project id | False | ALC_PROPHECY_PROJECT_ID |
prophecy_fabric_id | Prophecy fabric ID. | -1 | False | ALC_PROPHECY_FABRIC_ID |
prophecy_fabric_cd | Prophecy fabric code. | unknown fabric code | False | ALC_PROPHECY_FABRIC_CD |
prophecy_author_email | Prophecy author email. | unknown@domain.com | False | ALC_PROPHECY_AUTHOR_EMAIL |
datasets_config | Prophecy Datasets Config. | ProphecyDatasetsConfig | True | |
schemas_config | Prophecy Schemas Config. | ProphecySchemasConfig | True | |
extra_pipeline_vars | Extra variables to be added to the pipeline context. | Empty list | True | |
ext_dependencies | External dependencies to be added to the pipeline context. | Empty list | False | |
disable_updating_existing_pbt_pipelines | If True, the pbt_project.yml meta will not be updated with existing pipelines. | False | True | |
custom_gems | Mapping from a custom gem type to a definition. | ProphecyCustomGemDef | True | |
enforce_schema_config | List of enforce schema configs. | EnforceSchemaConfig | True | |
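For orientation, here is a minimal sketch of how a few of these fields might be set in a YAML config file. The nesting under the converter: key follows the other examples on this page; the concrete values (path, IDs, email) are hypothetical placeholders, not defaults:

```yaml
converter:
  default_output_extension: .py
  conversion_mode: strict                      # one of: normal, strict, lax
  conversion_comment_verbosity_level: warning
  use_runtime: true
  pbt_project_path: /workspace/out/prophecy    # hypothetical path; must contain pbt_project.yml
  project_id: sales_migration                  # hypothetical project ID
  prophecy_fabric_id: 1234                     # hypothetical fabric ID
  prophecy_author_email: dev@example.com       # hypothetical author email
```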
Prophecy Schemas Config¶
Field | Description | Default Value |
---|---|---|
default_path_mask | Default path mask for schema storage. | "dbfs:/user/hive/warehouse/{schema_name}.db" |
default_unity_catalog | Default unity catalog for schemas. | None |
name_to_config | Mapping of schema names to individual schema configs. | { } |
name_to_path | This key is deprecated. Use name_to_config instead. | { } |
Prophecy Schema Config¶
Field | Description | Default Value |
---|---|---|
path | Schema storage path. | None (a value must be provided) |
unity_catalog | Name of unity catalog. | Not set |
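As a sketch of how these two configs combine, assuming the converter: nesting used elsewhere on this page (the schema name, paths, and catalog names below are hypothetical):

```yaml
converter:
  schemas_config:
    default_path_mask: "dbfs:/user/hive/warehouse/{schema_name}.db"
    default_unity_catalog: main                    # hypothetical catalog name
    name_to_config:
      reporting:                                   # hypothetical schema name
        path: "dbfs:/mnt/warehouse/reporting.db"   # hypothetical storage path
        unity_catalog: analytics                   # hypothetical catalog name
```

Any schema without an entry in name_to_config falls back to default_path_mask and default_unity_catalog.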
Prophecy Datasets Config¶
Defines dataset defaults as well as per-dataset configurations.
When Alchemist needs to find the configuration for a specific dataset, it follows the order of priority below. The lookup is case-insensitive: it doesn't matter whether names are uppercase or lowercase.
- First Priority, Schema and Dataset Name Combination: If a schema name is known/provided, Alchemist first tries to find a configuration that matches the combination of the schema name and the dataset name, joined with a period ('.'). For example, if the schema name is 'Schema1' and the dataset name is 'Dataset1', Alchemist looks for a configuration with the key 'Schema1.Dataset1', ignoring case.
- Second Priority, Dataset Name Only: If no configuration matches the schema and dataset name combination, or if no schema name is provided, Alchemist then tries to find a configuration that matches the dataset name only.
- Third Priority, Default Configuration: If no configuration matches the dataset name, Alchemist uses the default configuration.
Here's an example of how these configurations might look in a YAML file:
```yaml
datasets_config:
  default_format: delta
  default_type: File
  default_db_provider: HIVE
  default_db_file_format: DELTA
  name_to_config:
    Schema1.Dataset1:
      name: NewDatasetName
      schema_or_path: other_schema
    Dataset2:
      name: NewDatasetName2
    schema1.Dataset&mv:
      name: "{Config.dynamic_name}"
      schema_or_path: some_schema
      unity_catalog: non_default_catalog
      format_: catalogTable
      type_: database
      db_provider: delta
      db_file_format: delta
```
In this example, there are three dataset configurations. The first is for 'Schema1.Dataset1': both the schema and the name will be changed. The second is for any 'Dataset2' in any schema: the new name will be used, but the schema will depend on the schemas_config. The last one shows how a dynamic name in SAS DI (one using macro variables) is converted into a dynamic name in Prophecy, and also that each config can override the defaults. For all other datasets, the default configuration is used.
Field | Description | Default Value |
---|---|---|
default_format | Default format for datasets. | delta |
default_type | Default type for datasets. | File |
default_db_provider | Default database provider for datasets. | HIVE |
default_db_file_format | Default database file format for datasets. | DELTA |
name_to_config | Mapping of dataset names to individual dataset configs. | { } |
Prophecy Dataset Config¶
Field | Description | Default Value |
---|---|---|
name | A new name to use in the target. If not set, the original lowercased name will be used. | Not set |
schema_or_path | Schema name or path to the dataset. If not set, the schemas config will be used to resolve the schema. | Not set |
unity_catalog | Unity catalog name. If not set, the schemas config will be used to resolve the catalog. | Not set |
format_ | Dataset format. If not set, default_format from the datasets config will be used. | Not set |
type_ | Dataset type. If not set, default_type from the datasets config will be used. | Not set |
db_provider | Specifies the database provider. Only used when format_=CATALOGTABLE and type_=DATABASE. If not set, default_db_provider from the datasets config will be used. | Not set |
db_file_format | Specifies the table file format. Only used when format_=CATALOGTABLE and type_=DATABASE. If not set, default_db_file_format from the datasets config will be used. | Not set |
Prophecy Custom Gem Definition¶
This config makes Alchemist aware of the custom gems available in the target Prophecy environment, enabling their use in conversion and templates. A custom gem is defined by its gem_id and project_name.
Fields¶
Field | Description |
---|---|
gem_id | The unique identifier for the custom gem. |
project_name | The name of the project that the custom gem belongs to. |
Example¶
Here's an example of how you can define a custom gem in your configuration:
```yaml
converter:
  custom_gems:
    newGemType:
      gem_id: gitUri=https://<your_prophecy_git_path>/gem-builder.git&subPath=/prophecy&tag=release-tag&projectSubscriptionProjectId=12345&path=gems/newGemType
      project_name: your-project-name
```
In this example, a new custom gem type called newGemType is defined. The gem_id is a unique identifier that includes the Git URI of the gem, the subpath within the repository, the release tag, the project subscription project ID, and the path to the gem. The project_name is the name of the project that the gem belongs to.
Enforce Schema Config¶
This configuration is used to ensure explicit dataframe schemas for the Prophecy Pipeline. If the option is enabled, Alchemist will add extra Reformat GEMs to the matched Transform nodes. The columns of the Reformat will be determined based on the original table metadata.
If the enforce_input_schema setting is enabled, Alchemist will add a Reformat GEM before the matched Transform node. If the enforce_output_schema setting is enabled, Alchemist will add a Reformat GEM after the matched Transform node.
To avoid a pipeline cluttered with one-to-one mapping gems at every other step, Alchemist will attempt to analyze the pipeline and skip any redundant Reformat GEMs.
Optimization Logic
- Alchemist preserves the "main" gems and only affects the "synthetic" one-to-one Reformat mappings.
- Alchemist does not add the one-to-one mapping gem:
    - if it is followed by a Reformat GEM and has no other successor
    - if it is preceded by a Reformat GEM and does not change the columns
- Alchemist analyzes the ancestors of each one-to-one mapping gem step by step:
    - if a Reformat gem is found: add the gem if the columns are changed; otherwise, skip the gem
    - if a Script or Custom gem is found: add the gem
    - if a SQL, Join, or SetOperation gem is found: add the gem if the columns are not the same; otherwise, skip the gem
    - if an Aggregate gem is found:
        - if the Propagate All Input Columns option is enabled, add the gem
        - otherwise, check the Aggregate's agg, group by, and pivot columns: add the gem if the columns are not the same; otherwise, skip the gem
Fields¶
Field | Description |
---|---|
match_paths | A list of match paths for Transform node selection. |
enforce_input_schema | If set to True, Alchemist will add a Reformat Gem before the converted node to enforce the expected input columns. The default setting is False. |
enforce_output_schema | If set to True, Alchemist will add a Reformat Gem after the converted node to enforce the expected output columns. The default setting is False. |
enforce_column_types | If set to True, Alchemist will add cast(col as schema_data_type) expressions to enforce the expected column types. The default setting is False. |
keep_if_redundant | If set to True, Reformat Gems will be added even if they are redundant, e.g., if a Reformat Gem is followed by an identical Reformat Gem. The default setting is False. |
Example¶
Here's an example of how you can define an Enforce Schema Config:
```yaml
converter:
  enforce_schema_config:
    - match_paths:
        - "()"
      enforce_input_schema: true
      enforce_output_schema: true
    - match_paths:
        - '(SASMetaCustomTransform @name="CustomTr")'
        - "SASMetaLoaderTransform"
      enforce_input_schema: false
      enforce_output_schema: true
      enforce_column_types: true
      keep_if_redundant: true
```
In this example, two settings are defined. The first enables both options for all Transform nodes. The second enables the output schema with type casting for its columns and disables the input schema for all Transform nodes of type SASMetaCustomTransform with the name CustomTr, as well as for all Transform nodes of type SASMetaLoaderTransform. For more details on match paths, refer to the XPath Match Template configuration documentation.