GenAI Conversion

Alchemist leverages generative AI (GenAI) to extend its SAS-to-PySpark conversion capabilities beyond natively supported constructs. This feature provides intelligent fallback conversion for complex SAS code that would otherwise require manual translation.

How It Works

Alchemist follows a multi-tier conversion approach:

  1. Native Conversion First: When Alchemist recognizes a SAS construct as supported, it converts the construct natively, without GenAI, for the best performance and accuracy.

  2. AI-Powered Fallback: For unsupported SAS Data Steps, SAS Procedures, and SAS Functions, Alchemist automatically uses GenAI to attempt conversion, provided AI-based conversion is enabled.

  3. Context-Aware Prompting: Alchemist generates unique, context-specific prompts for each SAS code construct, providing relevant information to help the model produce accurate PySpark code.

  4. Automated Validation: The generated code is validated automatically: Python AST parsing catches syntax errors, and the Ruff linter checks code quality. If either check reports issues, Alchemist asks the model for corrections.

  5. Quality Assurance: The pipeline ensures that AI-generated code meets basic syntax and linting standards before output.
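The validate-and-correct loop in steps 4 and 5 can be illustrated with a minimal sketch. This is not Alchemist's implementation: `syntax_error`, `validate_with_retries`, the `request_fix` callback, and the `max_fixes` limit are hypothetical names chosen for illustration, and the Ruff linting pass is left out so the example depends only on the standard library.

```python
import ast
from typing import Callable, Optional


def syntax_error(code: str) -> Optional[str]:
    """Return a short description of the first syntax error, or None if the code parses."""
    try:
        ast.parse(code)
    except SyntaxError as exc:
        return f"line {exc.lineno}: {exc.msg}"
    return None


def validate_with_retries(
    code: str,
    request_fix: Callable[[str, str], str],  # hypothetical: (code, error) -> corrected code
    max_fixes: int = 2,  # assumed limit; the real retry policy is not documented here
) -> str:
    """Check generated code and ask for a correction until it parses or fixes run out."""
    for fixes_left in range(max_fixes, -1, -1):
        error = syntax_error(code)
        if error is None:
            return code
        if fixes_left:
            # Feed the error back so the model can correct its own output.
            code = request_fix(code, error)
    raise ValueError(f"still invalid after {max_fixes} fix attempts: {error}")
```

A caller would pass a function that sends the failing code and the error message back to the model. Passing this check only means the code parses; it says nothing about whether the PySpark logic matches the original SAS.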

Important: AI-generated code should always be reviewed and validated by developers, as accuracy and functional correctness cannot be guaranteed.

To control whether AI-based conversion is enabled, you can set the ALC_AI_ENABLED environment variable to 1 (enable) or 0 (disable).

Setting API keys through environment variables is recommended. Alchemist supports OpenAI and Anthropic as LLM providers: use OPENAI_API_KEY for OpenAI and ANTHROPIC_API_KEY for Anthropic.

Configuration Options

| Field | Description | Default Value | Environment Variable |
| --- | --- | --- | --- |
| enabled | Whether AI-based conversion is enabled. | False | ALC_AI_ENABLED |
| credential | API key or credential to authenticate with the provider. Alchemist also supports reading credentials from the canonical environment variables OPENAI_API_KEY (for OpenAI) and ANTHROPIC_API_KEY (for Anthropic). If the credential is not set, it will be read from the environment variable corresponding to the provider. | | ALC_AI_API_KEY |
| timeout | Timeout for LLM calls in seconds. | 10 | |
| provider | Which LLM provider to use. | anthropic | |
| model_id | The name of the model to use when calling the provider. | claude-sonnet-4-20250514 | ALC_AI_MODEL_ID |
| api_url | The base URL for an OpenAI-compatible LLM provider's API, useful for proxied or self-hosted services. Note that if this is set, the provider must be set to OpenAI or unset. If not set, the provider's default URL will be used. | None | ALC_AI_API_BASE_URL |
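The credential fallback and the api_url constraint described above can be sketched as follows. This is an illustration only, not Alchemist's code: `resolve_credential` and `check_api_url` are hypothetical helpers, and the precedence shown (explicit credential, then ALC_AI_API_KEY, then the provider's canonical variable) is an assumption, since the documentation does not state the exact order.

```python
import os
from typing import Mapping, Optional

# Canonical provider variables named in the configuration options.
PROVIDER_ENV_VARS = {"openai": "OPENAI_API_KEY", "anthropic": "ANTHROPIC_API_KEY"}


def resolve_credential(
    provider: str,
    credential: Optional[str] = None,
    env: Mapping[str, str] = os.environ,
) -> Optional[str]:
    """Pick an API key: explicit value, then ALC_AI_API_KEY, then the provider variable."""
    # Assumption: ALC_AI_API_KEY is consulted before the provider-specific variable.
    return (
        credential
        or env.get("ALC_AI_API_KEY")
        or env.get(PROVIDER_ENV_VARS.get(provider.lower(), ""))
    )


def check_api_url(provider: Optional[str], api_url: Optional[str]) -> None:
    """Reject a custom api_url combined with a provider other than OpenAI."""
    if api_url and provider is not None and provider.lower() != "openai":
        raise ValueError("api_url requires provider 'openai' or no provider at all")
```

Passing `env` explicitly makes the lookup easy to exercise without touching the real environment, for example `resolve_credential("anthropic", env={"ANTHROPIC_API_KEY": "..."})`.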

Examples

Enable AI-based conversion using Anthropic API

This assumes you have set the ANTHROPIC_API_KEY environment variable.

converter:
  llm:
    enabled: true
    provider: anthropic
    model_id: claude-sonnet-4-20250514
    timeout: 20

Enable AI-based conversion using OpenAI API via custom proxy

This assumes you have set the OPENAI_API_KEY environment variable and set ALC_AI_ENABLED to 1, since the configuration below omits the enabled field.

converter:
  llm:
    provider: openai
    model_id: gpt-4.1
    api_url: https://custom.proxy/api