chinyik94/cares-migration-with-babelfish

Babelfish for PostgreSQL: Complete Migration Guide (Local & Production)

A step-by-step guide for migrating an existing MS SQL Server database (currently hosted on AWS RDS) to PostgreSQL using Babelfish — covering local development with Docker and production deployment on Amazon Aurora PostgreSQL with Babelfish.


Table of Contents

  1. What is Babelfish?
  2. Prerequisites
  3. Phase 1 — Assess Compatibility
  4. Phase 2 — Local Development Setup (Docker)
  5. Phase 3 — Schema & Data Migration
  6. Phase 4 — Application Reconfiguration & Testing
  7. Phase 5 — Production Setup (AWS Aurora PostgreSQL with Babelfish)
  8. Phase 6 — Production Data Migration
  9. Phase 7 — Security Hardening
  10. Phase 8 — Go-Live & Post-Migration
  11. Migration Mode: single-db vs multi-db
  12. Escape Hatches
  13. Interoperability Notes
  14. Troubleshooting
  15. Resources

1. What is Babelfish?

Babelfish for PostgreSQL is an open-source extension (Apache 2.0 / PostgreSQL license) developed by Amazon that enables PostgreSQL to understand:

  • T-SQL (Transact-SQL) — SQL Server's proprietary SQL dialect
  • TDS (Tabular Data Stream) — SQL Server's network protocol

This means existing SQL Server applications can connect to a Babelfish-enabled PostgreSQL instance with minimal or no code changes, using familiar SQL Server drivers and tools (SSMS, sqlcmd, System.Data.SqlClient, etc.).

Key Architecture

| Component | Description |
| --- | --- |
| TDS port (default 1433) | Accepts SQL Server client connections (SSMS, sqlcmd, ADO.NET SqlClient) |
| PostgreSQL port (default 5432) | Accepts standard PostgreSQL connections (psql, pgAdmin, Npgsql) |
| Babelfish extensions | babelfishpg_tds (TDS protocol), babelfishpg_tsql (T-SQL language), babelfishpg_common (data types), babelfishpg_money (money type) |

Both ports access the same underlying PostgreSQL database, allowing dual-protocol access.


2. Prerequisites

For Local Development

  • Docker and Docker Compose installed
  • SQL Server Management Studio (SSMS) or sqlcmd for TDS connections
  • Babelfish Compass (download) for compatibility assessment
  • bcp utility (SQL Server command-line tools) for data export/import
  • Access to the source SQL Server / RDS SQL Server instance

For Production (AWS)

  • AWS account with permissions for RDS/Aurora, VPC, IAM, Secrets Manager, KMS
  • AWS CLI configured with appropriate credentials
  • Source RDS SQL Server instance accessible from your network
  • (Optional) AWS DMS for continuous data replication

3. Phase 1 — Assess Compatibility

Before any migration, assess how compatible your SQL Server codebase is with Babelfish.

Step 1: Export DDL from SQL Server

  1. Open SSMS and connect to your source SQL Server / RDS SQL Server.
  2. Right-click the database → Tasks → Generate Scripts.
  3. Select all database objects (tables, views, stored procedures, functions, triggers).
  4. In Set Scripting Options → Advanced:
    • Script for Server Version: SQL Server 2019 or your current version
    • Types of data to script: Schema only
  5. Save the generated .sql file.

Step 2: Run Babelfish Compass

# Download and extract Babelfish Compass
# https://github.com/babelfish-for-postgresql/babelfish_compass/releases/latest

# Run the assessment
BabelfishCompass.bat MyReport MySchema.sql

Babelfish Compass generates an HTML compatibility report showing:

  • Supported features (will work as-is)
  • Review items (may need minor adjustments)
  • Not supported items (require rewrites or workarounds)

Step 3: Remediate Incompatibilities

  • Fix any unsupported T-SQL syntax identified in the Compass report.
  • Consider using escape hatches for features that can be safely ignored.
  • For unsupported features, plan PostgreSQL-native workarounds if needed.

Tip: Real-world migrations report 90-94% compatibility out of the box (per AWS case studies from CDL, FundApps, etc.).


4. Phase 2 — Local Development Setup (Docker)

Step 1: Create docker-compose.yaml

services:
  babelfishpg:
    image: jonathanpotts/babelfishpg
    ports:
      - "1432:1433"   # TDS port (SQL Server clients)
      - "5431:5432"   # PostgreSQL port
    volumes:
      - babelfishpg_data:/var/lib/babelfish/data
    restart: unless-stopped

volumes:
  babelfishpg_data:

Port mapping: We use 1432 / 5431 externally to avoid conflicts with any local SQL Server or PostgreSQL installations.

Step 2: Start the Container

docker compose up -d

Step 3: Verify Connectivity

Via TDS (SQL Server tools):

Server:   localhost,1432
Username: babelfish_user
Password: 12345678
Auth:     SQL Server Authentication

Via PostgreSQL:

Host:     localhost
Port:     5431
Username: babelfish_user
Password: 12345678

Advanced Docker Options

| Option | Command Flag | Description |
| --- | --- | --- |
| Custom username | -u my_user | Set a custom Babelfish login |
| Custom password | -p my_pass | Set a custom password |
| Custom DB name | -d my_db | Set the internal PostgreSQL database name |
| Migration mode | -m multi-db | Use multi-db instead of the default single-db |

Enable SSL on Local Docker (Optional)

For testing SSL locally, exec into the container:

docker exec -it <container_id> bash
cd /var/lib/babelfish/data
openssl req -new -x509 -days 365 -nodes -text -out server.crt -keyout server.key -subj "/CN=localhost"
chmod og-rwx server.key
echo "ssl = on" >> postgresql.conf

Then restart the container.


5. Phase 3 — Schema & Data Migration

Step 1: Deploy Schema to Babelfish

  1. Connect to your local Babelfish via SSMS at localhost,1432 using SQL Server Authentication.
  2. Run your cleaned/validated DDL script (from Phase 1) in a new query window.
  3. Verify tables, views, stored procedures, and functions were created successfully.

Sample schema script reference: local-setup/babelfish-migration/dev-cares-c4wx1.sql

Step 2: Migrate Data Using migrate-data-bcp.ps1

Use the automation script: local-setup/babelfish-migration/migrate-data-bcp.ps1

This script automates the full BCP migration flow by:

  • Discovering base tables from source and destination.
  • Skipping destination-missing tables and empty exports.
  • Generating per-table format files (.fmt).
  • Exporting from source and importing into Babelfish using those format files.
  • Preserving identity values with -E.
  • Producing migration logs and per-table error files for troubleshooting.

# Update config values at the top of the script, then run:
powershell -ExecutionPolicy Bypass -File .\local-setup\babelfish-migration\migrate-data-bcp.ps1

Recommended: Use migrate-data-bcp.ps1 instead of running manual per-table BCP commands.

Key BCP Flags

| Flag | Description |
| --- | --- |
| -c | Character data type |
| -t "^^" | Custom column delimiter (avoids commas in data) |
| -f | Format file for column mapping |
| -e | Error output file |
| -E | Preserve identity values |
| -T | Trusted connection (Windows auth, source only) |
| -U / -P | SQL Server authentication (for the Babelfish target) |
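
As an illustrative sketch only (hosts, credentials, and file names are placeholders, and the helper below is not part of the repository's scripts), the export/import pair for one table can be assembled from these flags, mirroring the manual commands shown in Phase 6:

```python
# Sketch: assemble the bcp export/import pair for one table from the flags
# above. Hosts, credentials, and file names here are placeholders.
def bcp_commands(table, source, target, user, password):
    """Return (export_cmd, import_cmd) as argument lists for subprocess.run."""
    data_file = f"{table}.txt"
    fmt_file = f"{table}.fmt"
    export_cmd = [
        "bcp", table, "out", data_file,
        "-f", fmt_file,          # format file for column mapping
        "-S", source, "-T",      # trusted connection to the source
    ]
    import_cmd = [
        "bcp", table, "in", data_file,
        "-f", fmt_file,
        "-e", f"{table}.err",    # per-table error file
        "-E",                    # preserve identity values
        "-S", target,
        "-U", user, "-P", password,  # SQL auth against the Babelfish target
    ]
    return export_cmd, import_cmd

exp, imp = bcp_commands("MyDB.dbo.MyTable", "src-host", "localhost,1432",
                        "babelfish_user", "12345678")
print(" ".join(exp))
print(" ".join(imp))
```

This is essentially what migrate-data-bcp.ps1 automates per table, plus format-file generation and logging.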

6. Phase 4 — Application Reconfiguration & Testing

Update Connection Strings

Point your application to Babelfish instead of SQL Server. The connection string format stays the same because Babelfish speaks TDS:

ADO.NET / Entity Framework (EDMX):

<add name="DataEntities"
  connectionString="metadata=res://*/AppCode.DataObject.DataObject.csdl|res://*/AppCode.DataObject.DataObject.ssdl|res://*/AppCode.DataObject.DataObject.msl;provider=System.Data.SqlClient;provider connection string=&quot;data source=localhost,1432;initial catalog=dev_cares_c4wx1;User ID=babelfish_user;Password=12345678;TrustServerCertificate=true;application name=EntityFramework&quot;"
     providerName="System.Data.EntityClient" />

Plain ADO.NET:

Data Source=localhost,1432;Initial Catalog=dev_cares_c4wx1;User ID=babelfish_user;Password=12345678;TrustServerCertificate=true

Note: provider=System.Data.SqlClient remains unchanged — Babelfish supports the TDS protocol natively.
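
A minimal sketch of assembling such a connection string from its parts (a hypothetical helper, not from this repository) — the main point being that SqlClient expects "host,port" with a comma, not "host:port":

```python
# Sketch: build an ADO.NET-style connection string for the Babelfish TDS
# endpoint. Values are the local-development placeholders used above.
def tds_connection_string(host, port, database, user, password,
                          trust_server_certificate=True):
    parts = {
        "Data Source": f"{host},{port}",  # SqlClient uses a comma before the port
        "Initial Catalog": database,
        "User ID": user,
        "Password": password,
        "TrustServerCertificate": "true" if trust_server_certificate else "false",
    }
    return ";".join(f"{k}={v}" for k, v in parts.items())

cs = tds_connection_string("localhost", 1432, "dev_cares_c4wx1",
                           "babelfish_user", "12345678")
print(cs)
```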

Testing Checklist

  • Application starts without connection errors
  • CRUD operations work correctly
  • Stored procedures execute as expected
  • Entity Framework queries return correct results
  • No type mismatch or collation errors in logs
  • Triggers fire correctly
  • Transaction rollback/commit behavior is correct

7. Phase 5 — Production Setup (AWS Aurora PostgreSQL with Babelfish)

Babelfish is a built-in capability of Amazon Aurora PostgreSQL-Compatible Edition at no additional cost.

Step 1: Create a Custom Cluster Parameter Group

Babelfish must be enabled via a parameter group before creating the cluster.

AWS Console:

  1. Go to RDS → Parameter groups → Create parameter group.
  2. Family: aurora-postgresql16 (or latest available)
  3. Type: DB Cluster Parameter Group
  4. Name: e.g., babelfish-cluster-params
  5. Edit the parameter group and set:
| Parameter | Value |
| --- | --- |
| rds.babelfish_status | on |
| babelfishpg_tsql.migration_mode | single-db or multi-db (see section 11) |
| babelfishpg_tsql.server_collation_name | sql_latin1_general_cp1_ci_as (or match your SQL Server collation) |

AWS CLI:

# Create parameter group
aws rds create-db-cluster-parameter-group \
  --db-cluster-parameter-group-name babelfish-cluster-params \
  --db-parameter-group-family aurora-postgresql16 \
  --description "Aurora PostgreSQL with Babelfish enabled"

# Enable Babelfish
aws rds modify-db-cluster-parameter-group \
  --db-cluster-parameter-group-name babelfish-cluster-params \
  --parameters \
    "ParameterName=rds.babelfish_status,ParameterValue=on,ApplyMethod=pending-reboot" \
    "ParameterName=babelfishpg_tsql.migration_mode,ParameterValue=single-db,ApplyMethod=pending-reboot"

Step 2: Create the Aurora PostgreSQL Cluster

AWS Console:

  1. Go to RDS → Create database.
  2. Choose Amazon Aurora → PostgreSQL-Compatible Edition.
  3. Select a supported version (Aurora PostgreSQL 16.x recommended).
  4. Under Babelfish settings, check Turn on Babelfish.
  5. Set your DB master username and password (this becomes the Babelfish login).
  6. Configure the TDS port (default 1433).
  7. Choose the parameter group created in Step 1.
  8. Configure VPC, subnets, security groups, and encryption as required.
  9. Click Create database.

AWS CLI:

# Create the Aurora cluster
aws rds create-db-cluster \
  --db-cluster-identifier my-babelfish-cluster \
  --engine aurora-postgresql \
  --engine-version 16.6 \
  --master-username babelfish_admin \
  --master-user-password '<STRONG_PASSWORD>' \
  --db-cluster-parameter-group-name babelfish-cluster-params \
  --vpc-security-group-ids sg-xxxxxxxxx \
  --db-subnet-group-name my-db-subnet-group \
  --storage-encrypted \
  --kms-key-id alias/my-rds-key \
  --port 5432

# Create writer instance
aws rds create-db-instance \
  --db-instance-identifier my-babelfish-instance-1 \
  --db-cluster-identifier my-babelfish-cluster \
  --db-instance-class db.r6g.large \
  --engine aurora-postgresql

Step 3: Verify Babelfish Is Active

After the cluster is available, connect via sqlcmd using the cluster endpoint on the TDS port (default 1433):

sqlcmd -S my-babelfish-cluster.cluster-xxxx.region.rds.amazonaws.com,1433 \
       -U babelfish_admin \
       -P '<PASSWORD>' \
       -d master

Run a quick test:

SELECT @@VERSION;
SELECT DB_NAME();

8. Phase 6 — Production Data Migration

Option A: BCP Export/Import (Offline)

Same approach as local migration, but pointing to the Aurora endpoint:

# Export from source RDS SQL Server
bcp MyDB.dbo.MyTable out MyTable.txt -f MyTable.fmt -S source-rds-endpoint.rds.amazonaws.com -U sa -P '<PASSWORD>'

# Import to Aurora Babelfish
bcp MyDB.dbo.MyTable in MyTable.txt -f MyTable.fmt -e MyTable.err -E \
  -S my-babelfish-cluster.cluster-xxxx.region.rds.amazonaws.com,1433 \
  -U babelfish_admin -P '<PASSWORD>'

Option B: AWS Database Migration Service (DMS)

For larger databases or to minimise downtime, use AWS DMS for continuous replication:

  1. Create a replication instance in the same VPC as your Aurora cluster.
  2. Create source endpoint pointing to your RDS SQL Server.
  3. Create target endpoint pointing to your Aurora PostgreSQL cluster (use the PostgreSQL port 5432, not TDS — DMS speaks PostgreSQL natively).
  4. Create a migration task with:
    • Migration type: Migrate existing data and replicate ongoing changes (for minimal downtime)
    • Table mappings: Select the schemas/tables to migrate
  5. Start the task and monitor progress in the DMS console.

Important: When using DMS with Babelfish, the target endpoint should connect via the PostgreSQL port (5432), and table names need to account for the schema flattening Babelfish performs (e.g., mydb_dbo.mytable in multi-db mode).

Option C: Linked Server + INSERT...SELECT (Small Datasets)

For smaller tables, you can use a linked server from your source SQL Server to push data directly via TDS.


9. Phase 7 — Security Hardening

9.1 Enforce SSL/TLS Encryption

On Aurora PostgreSQL (Production):

Aurora PostgreSQL supports SSL by default. To enforce encrypted TDS connections:

  1. In your cluster parameter group, set:

    babelfishpg_tds.tds_ssl_encrypt = true
    

    This rejects any unencrypted TDS connection.

  2. For PostgreSQL-port connections, set:

    rds.force_ssl = 1
    
  3. Download the AWS RDS CA certificate bundle and configure your clients to use it.

On Local Docker (Development):

See Enable SSL on Local Docker above. For development, TrustServerCertificate=true in connection strings is acceptable.

9.2 Network Security

  • Place Aurora in a private subnet (no public accessibility).
  • Use VPC Security Groups to restrict inbound access:
    • TDS port (1433) — only from application servers / VPN
    • PostgreSQL port (5432) — only from trusted admin IPs or bastion hosts
  • Use AWS PrivateLink or VPC Peering if the application runs in a different VPC.

9.3 Credential Management

  • Never hardcode passwords in application code or config files.
  • Use AWS Secrets Manager to store and rotate database credentials automatically.
  • Use IAM Database Authentication for PostgreSQL-port connections where possible.
  • In local development, use environment variables or a .env file (excluded from version control via .gitignore).

9.4 Babelfish User & Role Management

Babelfish creates predefined PostgreSQL roles (sysadmin, master_db_owner, master_dbo, etc.) to mirror SQL Server's ownership model. Do not modify these roles directly from PostgreSQL.

Create application-specific logins via T-SQL:

-- Create a login
CREATE LOGIN app_user WITH PASSWORD = 'StrongP@ssw0rd!';

-- Grant access to a database
USE dev_cares_c4wx1;
CREATE USER app_user FOR LOGIN app_user;
ALTER ROLE db_datareader ADD MEMBER app_user;
ALTER ROLE db_datawriter ADD MEMBER app_user;

9.5 Encryption at Rest

  • Enable storage encryption on the Aurora cluster (uses AWS KMS).
  • Use a customer-managed KMS key for full control over key rotation and access policies.

9.6 Auditing & Monitoring

  • Enable Aurora PostgreSQL audit logging via the pgaudit extension.
  • Use Amazon CloudWatch for metrics and alarms (connections, CPU, IOPS).
  • Enable RDS Performance Insights for query-level monitoring.
  • Enable Enhanced Monitoring for OS-level metrics.

10. Phase 8 — Go-Live & Post-Migration

Pre-Cutover Checklist

  • All schema objects validated on Aurora Babelfish
  • Data migrated and row counts verified
  • Application tested end-to-end against Aurora
  • SSL/TLS enforced on all connections
  • Credentials stored in Secrets Manager
  • Security groups locked down to minimum required access
  • Backup retention and snapshot policies configured
  • CloudWatch alarms set for key metrics
  • Rollback plan documented (e.g., re-point to original RDS SQL Server)

Cutover Steps

  1. Stop writes to the source RDS SQL Server (or put application into read-only mode).
  2. Final data sync — run a final BCP export/import or let DMS catch up.
  3. Verify data integrity — compare row counts and spot-check critical tables.
  4. Update application connection strings to point to the Aurora Babelfish endpoint.
  5. Deploy the updated application.
  6. Monitor CloudWatch, application logs, and Performance Insights closely.
  7. Validate core functionality with the team.
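
The comparison in step 3 can be sketched as pure logic (a hypothetical helper; gathering the counts themselves, e.g. via SELECT COUNT(*) against each side, is assumed and not shown):

```python
# Sketch: compare per-table row counts captured from the source SQL Server
# and the Aurora Babelfish target; any mismatch or missing table is flagged.
def diff_row_counts(source_counts, target_counts):
    """Return {table: (source, target)} for every mismatch or missing table."""
    mismatches = {}
    for table in sorted(set(source_counts) | set(target_counts)):
        src = source_counts.get(table)
        dst = target_counts.get(table)
        if src != dst:
            mismatches[table] = (src, dst)
    return mismatches

source = {"dbo.Orders": 10_000, "dbo.Customers": 2_500}
target = {"dbo.Orders": 10_000, "dbo.Customers": 2_499}
print(diff_row_counts(source, target))  # flags dbo.Customers
```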

Post-Migration

  • Monitor query performance; use SET BABELFISH_SHOWPLAN_ALL ON to review query plans.
  • Tune PostgreSQL parameters (shared_buffers, work_mem, effective_cache_size) as needed.
  • Consider using Aurora Serverless v2 for auto-scaling workloads.
  • Plan the eventual full PostgreSQL migration if desired (remove TDS dependency over time).

11. Migration Mode: single-db vs multi-db

Choose your migration mode before initializing Babelfish — it cannot be changed after initialization.

| Aspect | single-db | multi-db |
| --- | --- | --- |
| Use case | Single SQL Server database | Multiple SQL Server databases |
| Schema mapping (from PostgreSQL) | Schema names are identical (e.g., dbo.MyTable) | Prefixed with the DB name (e.g., mydb_dbo.MyTable) |
| Path to native PostgreSQL | Easier (fewer schema renames) | Harder (compound schema names) |
| Recommendation | Best if migrating one database and planning to eventually go fully native PostgreSQL | Best if migrating multiple databases or uncertain about future needs |

Setting via psql (self-hosted):

ALTER DATABASE demo SET babelfishpg_tsql.migration_mode = 'multi-db';

Setting via AWS Parameter Group (Aurora): Set babelfishpg_tsql.migration_mode in the cluster parameter group before creating the cluster.


12. Escape Hatches

Escape hatches let Babelfish silently ignore unsupported T-SQL features instead of raising errors. This is critical for running existing DDL scripts that use SQL Server-specific syntax not yet supported by Babelfish.

View Current Settings

EXEC sp_babelfish_configure '%';

Set All to Ignore (Useful for Migration)

EXEC sp_babelfish_configure '%', 'ignore', 'server';

Commonly Used Escape Hatches

| Escape Hatch | Default | Description |
| --- | --- | --- |
| escape_hatch_storage_options | ignore | Ignores ON PRIMARY, FILESTREAM_ON, etc. |
| escape_hatch_index_clustering | ignore | Ignores CLUSTERED/NONCLUSTERED (creates a B-tree index) |
| escape_hatch_table_hints | ignore | Ignores WITH (NOLOCK), etc. |
| escape_hatch_query_hints | ignore | Ignores OPTION(...) query hints |
| escape_hatch_session_settings | ignore | Ignores unsupported SET statements |
| escape_hatch_schemabinding_* | ignore | Ignores WITH SCHEMABINDING on views/procs/functions |
| escape_hatch_fulltext | strict | Raises an error for FULLTEXT features |
| escape_hatch_storage_on_partition | strict | Raises an error for table partitioning syntax |

Best practice: Start with a broad ignore during initial migration, then tighten to strict as you remediate individual issues.
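
When remediation scripts need to flip a specific set of hatches rather than everything, the sp_babelfish_configure calls can be generated mechanically. A small sketch (hypothetical helper; hatch names from the table above):

```python
# Sketch: emit T-SQL that sets a list of escape hatches to a given value at
# a given scope, using the sp_babelfish_configure form shown above.
def configure_hatches(hatches, value="ignore", scope="server"):
    return "\n".join(
        f"EXEC sp_babelfish_configure '{h}', '{value}', '{scope}';"
        for h in hatches
    )

tsql = configure_hatches(["escape_hatch_fulltext",
                          "escape_hatch_storage_on_partition"])
print(tsql)
```

The generated batch can then be run over the TDS port (SSMS or sqlcmd); tightening back to 'strict' later is the same call with a different value.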


13. Interoperability Notes

Babelfish provides dual-port access to the same underlying data. Be aware of these considerations:

Schema Name Mapping

  • T-SQL (TDS port): mydb.dbo.MyTable
  • PostgreSQL (PG port, multi-db mode): mydb_dbo.MyTable
  • PostgreSQL (PG port, single-db mode): dbo.MyTable
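
The flattening rule above can be sketched as a tiny mapping function (a hypothetical helper; identifier case-folding on the PostgreSQL side is not modeled here):

```python
# Sketch: map a T-SQL three-part name to the schema-qualified name visible
# on the PostgreSQL port, per the Babelfish flattening rules above.
def pg_schema_name(tsql_name, migration_mode="multi-db"):
    """'mydb.dbo.MyTable' -> 'mydb_dbo.MyTable' (multi-db) or 'dbo.MyTable'."""
    db, schema, table = tsql_name.split(".")
    if migration_mode == "multi-db":
        return f"{db}_{schema}.{table}"
    return f"{schema}.{table}"

print(pg_schema_name("mydb.dbo.MyTable"))               # mydb_dbo.MyTable
print(pg_schema_name("mydb.dbo.MyTable", "single-db"))  # dbo.MyTable
```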

Collation Differences

  • T-SQL default: case-insensitive ('a' = 'A' → true)
  • PostgreSQL default: case-sensitive ('a' = 'A' → false)
  • When accessing Babelfish tables from the PostgreSQL port, collation behavior depends on the connection type.

Transaction Semantics

  • T-SQL: On constraint violation, only the statement is rolled back; the transaction remains open.
  • PostgreSQL: On error, the entire transaction is rolled back.
  • When mixing dialects, PostgreSQL semantics take precedence.

Permissions

  • Objects created via T-SQL are accessible from both ports with the same user.
  • Objects created via PostgreSQL may not be visible or accessible from TDS.
  • Always create schemas and objects via T-SQL if they need to be accessed from both ports.

Best Practice

  • Keep the application on the TDS port for maximum SQL Server compatibility.
  • Use the PostgreSQL port only for admin tasks, reporting, or PostgreSQL-native features not available in T-SQL.
  • Do not modify Babelfish-managed roles or permissions from the PostgreSQL port.

14. Troubleshooting

| Issue | Resolution |
| --- | --- |
| Connection refused on TDS port | Ensure the security group allows inbound on port 1433; verify Babelfish is enabled (rds.babelfish_status = on) |
| SSMS Object Explorer errors | Expected — SSMS Object Explorer is not fully supported; use the Query Editor instead |
| CREATE DATABASE not working | Ensure you are in multi-db mode to create additional T-SQL databases |
| Schema not visible from PostgreSQL | In multi-db mode, schema names are prefixed (e.g., mydb_dbo); run \dn in psql to list schemas |
| Unsupported T-SQL feature error | Check whether an escape hatch exists for the feature; set it to ignore if acceptable |
| Performance regression | Query performance may differ between SQL Server and PostgreSQL; use SET BABELFISH_SHOWPLAN_ALL ON to review plans and tune PostgreSQL parameters |
| BCP import errors | Check the error files (-e flag); common issues include data type mismatches, NULL handling, and identity column conflicts |
| SSL/TLS handshake failure | For Aurora: ensure the client uses the RDS CA bundle; for Docker: ensure ssl = on in postgresql.conf |

15. Resources

Official Documentation

AWS

Tools

Further Reading

About

Documentation on migrating MS SQL Server to PostgreSQL with Babelfish support.
