Databricks Dialect Configuration
The Databricks Dialect Configuration extends the base converter config and allows you to configure additional aspects of the conversion process. The following sections describe the available configuration classes and their fields.
Databricks Converter Config
Field | Description | Default Value | Localizable | Environment Variable |
---|---|---|---|---|
default_output_extension | Default output file extension. | .py | True | |
render_unsupported_output_as_comment | If True, the converter will render whatever it was able to convert for an unsupported source chunk as a comment. The output will by definition not be correct and may not even resemble valid code, but it may be useful for creating user templates or reporting feature requests. This is especially true for large statements/expressions where only a small part is unsupported; in that case the output may be very close to a fully converted one. Alchemist will determine a reasonable level at which to render the output, the same level as for render_unsupported_source_as_comment. | True | True | |
user_template_paths | List of folders with user templates to make available for conversion. | Empty list | False | |
template_configs | List of template configs. | DBRSASTemplateConfig | False | |
template_tag_filter | Filter config for including/excluding template tags from matching. | StringFilterConfig | True | |
node_filter | Filter config for including/excluding nodes from rendering. | NodeFilterConfig | True | |
use_runtime | If True, the converter will use runtime user-defined functions where appropriate. Be aware that setting this to False may reduce the amount of automatically converted code, since for some constructs there may not be a non-runtime static inline version. | True | True | ALC_USE_RUNTIME |
custom_udf_name_mapping | Custom name mapping for runtime functions, allowing you to use your own names for them. For the names of specific functions, consult the target dialect documentation. | Empty dict | True | |
conversion_comment_verbosity_level | Verbosity level for conversion comments (code, todo, warning, debug): - code: outputs only regular code comments retained from the source or considered part of the output. - todo: outputs code and todo comments; todo comments are used for output that has to be adjusted manually. - warning: outputs code, todo and warning comments; warning comments are used for potentially invalid code. - debug: outputs all comments, including developer warnings; debug comments are used for code that is unlikely to be invalid. | todo | True | |
conversion_mode | Conversion mode (normal, strict, lax). Changes code generation and how the comment verbosity level is handled: - NORMAL: The default mode. Balances correctness and readability by allowing some heuristics for common cases where achieving a 100% match would generate overly verbose and complex code, while still allowing short constructs that ensure correctness. - STRICT: Prioritizes correctness over readability, striving to mark anything that is potentially not 100% correct in all scenarios as a TODO item and reducing heuristics to a minimum. - LAX: Prioritizes readability over correctness, assuming the best-case scenario and avoiding the additional expressions that would be needed to handle edge cases. In addition, the verbosity level of conversion comments is adjusted based on the mode: - In strict mode, the warning comment is treated as todo, and debug is treated as warning, so more todo comments are generated. - In lax mode, the todo comment is treated as warning, and warning is treated as debug, meaning no todo comments are generated at all. | normal | True | |
llm | Configuration for GenAI-based conversion. | LLMConfig | False | |
spark_conf_ansi_enabled | Whether to generate code that assumes ANSI SQL mode is enabled in Spark. | True | True | |
sas | SAS to Spark conversion options. | SparkSASConfig | True | |
group_nodes_into_paragraphs | Whether consecutive nodes of similar type should be grouped into a single notebook paragraph. | True | True | |
render_all_source_code | Whether notebooks should include entire original SAS code before the converted code. | True | True | |
render_markdown_headers | Whether notebooks should include Markdown cells with headers reflecting original SAS program structure. | True | True | |
file_path_map | File path mapping. | Empty dict | True | |
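To tie these fields together, here is a minimal, hedged configuration sketch. It assumes the fields sit under the same converter section used by the libref_to_schema example later on this page; the values are illustrative placeholders, not recommended settings.

converter:
  # Illustrative placeholders only; see the field descriptions above.
  default_output_extension: .py
  conversion_mode: strict
  conversion_comment_verbosity_level: warning
  use_runtime: true                       # can also be set via ALC_USE_RUNTIME
  render_unsupported_output_as_comment: true
  group_nodes_into_paragraphs: true
  render_all_source_code: false
  render_markdown_headers: true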
File Path Map
File path mapping is used to convert a source file location to a new cloud location.
The mapping specifies a prefix of the full path as it appears in the source and the target prefix it should be replaced with.
The longest matching prefix is used. If no prefix matches, the original path is kept unchanged (which is probably not what you want).
The resulting path is always automatically converted to a POSIX path.
Example:
- for the path C:\User\any_local_dir\file.xlsx
- the mapping can be {"C:\\User\\": "dbfs:/mnt/User/"}
- and the final path will be dbfs:/mnt/User/any_local_dir/file.xlsx
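A hedged sketch of how this mapping might appear in the configuration file is shown below, assuming file_path_map sits under the same converter section as the other fields above; the second prefix is a hypothetical addition for illustration only.

converter:
  file_path_map:
    # Longest matching prefix wins; backslashes are escaped inside the quoted source prefix.
    "C:\\User\\": "dbfs:/mnt/User/"
    # Hypothetical second prefix, included only for illustration.
    "C:\\Shared\\": "dbfs:/mnt/Shared/"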
Databricks SAS-Specific Converter Config
Field | Description | Default Value |
---|---|---|
year_cutoff | SAS YEARCUTOFF option value (see docs) | 40 |
libref_to_schema | Mapping of SAS librefs to Spark schemas. | Empty dictionary |
Example

Here's an example of how you can define libref_to_schema in the configuration file:
converter:
  sas:
    libref_to_schema:
      libref1: spark_schema1
      libref2: "{spark_schema_var}"
In this example, libref1 will be converted to spark_schema1, and libref2 will be converted to {spark_schema_var}, assuming that it will be used in f-strings and that the variable spark_schema_var will be defined in the output code.
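The year_cutoff option from the table above can presumably be set in the same sas block as libref_to_schema; the sketch below makes that assumption, and the value shown is simply the documented default rather than a recommendation.

converter:
  sas:
    # Assumed placement alongside libref_to_schema; 40 is the documented default.
    year_cutoff: 40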