GenAI Conversion¶
Alchemist leverages generative AI (GenAI) to extend its SAS-to-PySpark conversion capabilities beyond natively supported constructs. This feature provides intelligent fallback conversion for complex SAS code that would otherwise require manual translation.
How It Works¶
Alchemist follows a multi-tier conversion approach:
- Native Conversion First: When Alchemist recognizes a SAS construct as supported, it converts it natively, without GenAI, for optimal performance and accuracy.
- AI-Powered Fallback: For unsupported SAS Data Steps, SAS Procedures, and SAS Functions, Alchemist automatically engages GenAI (when enabled) to attempt conversion.
- Context-Aware Prompting: Alchemist generates a unique, context-specific prompt for each SAS construct, providing relevant information to help the model produce accurate PySpark code.
- Automated Validation: The generated code is automatically checked with Python's AST parser for syntax errors and the Ruff linter for code quality; if issues are detected, Alchemist requests corrections from the model (see the sketch after this list).
- Quality Assurance: The pipeline ensures that AI-generated code meets basic syntax and linting standards before it is emitted.
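The validation step can be pictured as follows. This is a minimal sketch, not Alchemist's actual implementation: it assumes the generated code arrives as a string, checks it with Python's standard-library `ast` parser, and then runs the `ruff` CLI (which must be installed) against a temporary file. The function name `validate_generated_code` is illustrative.

```python
import ast
import subprocess
import tempfile


def validate_generated_code(code: str) -> list[str]:
    """Return a list of problems found in AI-generated Python code."""
    problems: list[str] = []

    # 1. Syntax check via the standard-library AST parser.
    try:
        ast.parse(code)
    except SyntaxError as exc:
        problems.append(f"syntax error: {exc}")
        return problems  # no point linting code that does not parse

    # 2. Lint check via the Ruff CLI (assumed to be on PATH).
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as tmp:
        tmp.write(code)
        path = tmp.name
    result = subprocess.run(["ruff", "check", path], capture_output=True, text=True)
    if result.returncode != 0:
        problems.append(result.stdout.strip())

    return problems
```

In a correction loop like the one described above, any returned problems would be sent back to the model as a request to fix the generated code.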
Important: AI-generated code should always be reviewed and validated by developers, as accuracy and functional correctness cannot be guaranteed.
To control whether AI-based conversion is enabled, set the `ALC_AI_ENABLED` environment variable to `1` (enable) or `0` (disable).
It is recommended to set API keys via environment variables. Alchemist supports OpenAI and Anthropic as LLM providers: use `OPENAI_API_KEY` for OpenAI and `ANTHROPIC_API_KEY` for Anthropic.
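For example, when driving Alchemist from a Python session or test harness, the variables could be set for the current process like this (a sketch; in most setups you would export them in your shell or CI configuration instead, and the key value shown is a placeholder):

```python
import os

# Enable AI-based conversion and provide the Anthropic credential
# for this process only (illustrative values).
os.environ["ALC_AI_ENABLED"] = "1"
os.environ["ANTHROPIC_API_KEY"] = "sk-ant-..."  # placeholder, use your own key
```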
Configuration Options¶
| Field | Description | Default Value | Environment Variable |
|---|---|---|---|
| `enabled` | Whether AI-based conversion is enabled. | False | `ALC_AI_ENABLED` |
| `credential` | API key or credential used to authenticate with the provider. Alchemist also supports reading credentials from the canonical environment variables: `OPENAI_API_KEY` for OpenAI and `ANTHROPIC_API_KEY` for Anthropic. If the credential is not set, it is read from the environment variable corresponding to the provider. | | `ALC_AI_API_KEY` |
| `timeout` | Timeout for LLM calls, in seconds. | 10 | |
| `provider` | Which LLM provider to use. | anthropic | |
| `model_id` | The name of the model to use when calling the provider. | claude-sonnet-4-20250514 | `ALC_AI_MODEL_ID` |
| `api_url` | The base URL for an OpenAI-compatible LLM provider's API, useful for proxied or self-hosted services. Note that if this is set, the provider must be set to `openai` or left unset. If not set, the provider's default URL is used. | None | `ALC_AI_API_BASE_URL` |
Examples¶
Enable AI-based conversion using Anthropic API¶
This assumes you have set the `ANTHROPIC_API_KEY` environment variable.
```yaml
converter:
  llm:
    enabled: true
    provider: anthropic
    model_id: claude-sonnet-4-20250514
    timeout: 20
```
Enable AI-based conversion using OpenAI API via custom proxy¶
This assumes you have set the `OPENAI_API_KEY` and `ALC_AI_ENABLED` environment variables.
```yaml
converter:
  llm:
    provider: openai
    model_id: gpt-4.1
    api_url: https://custom.proxy/api
```