Developer Guide - CData Python Connectors & SQLAlchemy
SQLAlchemy is a powerful SQL toolkit and Object-Relational Mapper (ORM) for Python. It provides an abstraction layer over SQL, making database interactions more efficient and structured.
Key features of SQLAlchemy include:
- ORM and core SQL support: Supports both ORM-based and raw SQL-based queries.
- Database-agnostic: Works with various relational databases through dialect systems.
- Lazy loading & eager loading: Optimizes query execution for better performance.
- Connection pooling: Efficiently manages database connections, minimizing overheads and improving performance.
- Declarative mapping: Uses Python classes to define database models.
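To make the declarative mapping and ORM features above concrete, here's a minimal, self-contained sketch using an in-memory SQLite database (the `User` model and its columns are illustrative, not part of any CData schema):

```python
from sqlalchemy import Column, Integer, String, create_engine
from sqlalchemy.orm import Session, declarative_base

# Declarative mapping: a plain Python class defines the table schema
Base = declarative_base()

class User(Base):
    __tablename__ = "users"
    id = Column(Integer, primary_key=True)
    name = Column(String(50))

# In-memory SQLite engine; any supported database URL works here
engine = create_engine("sqlite://")
Base.metadata.create_all(engine)

# ORM session: insert rows and query them back without writing raw SQL
with Session(engine) as session:
    session.add_all([User(name="Ada"), User(name="Grace")])
    session.commit()
    names = [u.name for u in session.query(User).order_by(User.name)]
```

Because the model class, not the database, defines the schema, swapping the connection URL is all it takes to point the same code at a different backend.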
Why Use SQLAlchemy with CData Python Connectors?
CData Python Connectors expand and enhance the power of the SQLAlchemy framework by providing real-time connectivity to 270+ data sources, including SaaS applications, ERP suites, big data platforms, and NoSQL systems—all without relying on complex API integrations, custom scripts, or third-party libraries.
The standard SQL-92 interface eliminates protocol-specific requests by providing a consistent query language, pushdown query processing, metadata discovery, and intelligent client-side execution, while built-in, optimized data processing delivers high performance for live data interaction.
Key features of CData Python Connectors
- Structured SQL Queries: Execute SQL queries seamlessly across relational and non-relational data sources.
- Real-Time Data Access: Retrieve live data without replication, ensuring up-to-date insights.
- Advanced SQL Capabilities: Effortlessly perform joins, aggregations, grouping functions, and other complex queries.
- Optimized Performance: Pushdown query processing minimizes latency by executing supported operations at the source.
- Seamless ORM Integration: Leverage SQLAlchemy's ORM to interact with diverse data sources using mapped classes.
- Cross-Platform Compatibility: Works across Windows, macOS, and Linux for flexible deployment.
- Unified Database Interaction: Standardize queries and data operations across multiple platforms.
- Scalable and Efficient: Enhance performance with intelligent query execution and metadata discovery.
Using SQLAlchemy with CData Python Connectors
Let's see how CData Python Connectors work with an example.
We'll create a Python script that prompts the user to supply the path containing CSV files, displays the file contents, and then performs an INNER JOIN.
For this, we'll use the CData Python Connector for CSV (with either a full license or the free Community License), along with a couple of CSV files (downloadable ZIP) stored in a local directory, such as C:\Users\Public\Downloads\CSV Folder.
Quick Note: In the following code, we'll include a property called RowScanDepth in the connection string used in the create_engine function. This optional property defines the number of rows scanned when dynamically determining column structures for data sources that don't automatically include metadata (like CSV files). A higher value improves accuracy but may increase processing time. Setting it to 0 scans the entire CSV file, while the default is 100.
Paste the following code into your favorite text editor or IDE and run it.
```python
import os
import warnings
import pandas as pd
from sqlalchemy import create_engine

# Suppress warnings
warnings.filterwarnings("ignore", category=DeprecationWarning)

# Get and validate the CSV directory path
csv_path = input("Enter the CSV directory path: ").strip()
if not os.path.isdir(csv_path):
    exit("Error: Directory does not exist.")

# Create a SQLAlchemy engine to connect to the CSV directory
engine = create_engine(f"csv:///?URI={csv_path};RowScanDepth=0")

# Retrieve available CSV files
tables_df = pd.read_sql("SELECT * FROM sys_tables", engine)
if tables_df.empty or "TableName" not in tables_df.columns:
    print("Error: No CSV files found.")
    exit()

# Display available CSV files
print("\nAvailable CSV files:")
for i, row in tables_df.iterrows():
    print(f"{i + 1}. {row['TableName']}")

# Select the first table (pandas .at raises KeyError for an out-of-range label)
try:
    table_choice1 = int(input("\nSelect the first table number: ")) - 1
    table1 = tables_df.at[table_choice1, "TableName"]
except (ValueError, KeyError, IndexError):
    print("Error: Invalid selection.")
    exit()

# Display the selected table's data
df1 = pd.read_sql(f'SELECT * FROM "{table1}"', engine)
print(f"\nSample Data from '{table1}':\n", df1.head())

# Select the second table
try:
    table_choice2 = int(input("\nSelect the second table number for JOIN: ")) - 1
    table2 = tables_df.at[table_choice2, "TableName"]
except (ValueError, KeyError, IndexError):
    print("Error: Invalid selection.")
    exit()

# Display the available columns in both tables
columns1 = pd.read_sql(f"SELECT ColumnName FROM sys_tablecolumns WHERE TableName = '{table1}'", engine)["ColumnName"].tolist()
columns2 = pd.read_sql(f"SELECT ColumnName FROM sys_tablecolumns WHERE TableName = '{table2}'", engine)["ColumnName"].tolist()
print(f"\nColumns in '{table1}': {columns1}")
print(f"Columns in '{table2}': {columns2}")

# Ask the user for the column to JOIN on, validating each selection
join_column1 = input(f"\nEnter the column from '{table1}' to JOIN on: ").strip()
if join_column1 not in columns1:
    print(f"Error: Column '{join_column1}' not found in '{table1}'.")
    exit()

join_column2 = input(f"\nEnter the column from '{table2}' to JOIN on: ").strip()
if join_column2 not in columns2:
    print(f"Error: Column '{join_column2}' not found in '{table2}'.")
    exit()

# Perform the JOIN query
query = f"""
SELECT *
FROM "{table1}" AS t1
INNER JOIN "{table2}" AS t2
  ON t1."{join_column1}" = t2."{join_column2}"
"""
try:
    joined_df = pd.read_sql(query, engine)
    print("\nJoined Data:\n", joined_df.head())
except Exception as e:
    print("Error executing JOIN:", str(e))
```
Once you run it, you'll be prompted to:
- Enter the directory path where your CSV files are stored.
- Choose two CSV files from the list of available tables.
- Select corresponding columns from both tables to join on.
- The script then performs an INNER JOIN on the selected column and displays the merged dataset.
How does this code work?
This script demonstrates how to use SQLAlchemy with CData Python Connectors to query CSV files as if they were database tables.
- SQLAlchemy Engine: The create_engine function creates a connection to the directory containing CSV files.
- Reading CSV Files: The script retrieves the list of available CSV files by querying sys_tables.
- Data Preview: It allows users to preview a sample of the selected CSV files.
- Performing a SQL Join: The script validates a common column for the join and performs an INNER JOIN, retrieving only the matching rows based on the selected column.
- Error Handling: It validates user input and ensures selected columns exist in both tables before executing the join.
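The INNER JOIN pattern at the heart of the script works against any SQLAlchemy engine. Here's a minimal, self-contained sketch using an in-memory SQLite database in place of the CSV connector (the Customers and Orders tables and their columns are illustrative stand-ins for your CSV files):

```python
import pandas as pd
from sqlalchemy import create_engine

# In-memory SQLite stands in for the CSV-backed engine
engine = create_engine("sqlite://")

# Illustrative tables; with the CData connector these would be CSV files
pd.DataFrame({"CustomerId": [1, 2, 3],
              "Name": ["Ada", "Grace", "Alan"]}).to_sql("Customers", engine, index=False)
pd.DataFrame({"CustomerId": [1, 3],
              "Total": [250.0, 99.5]}).to_sql("Orders", engine, index=False)

# INNER JOIN keeps only rows with a matching CustomerId in both tables
query = """
SELECT t1."CustomerId", t1."Name", t2."Total"
FROM "Customers" AS t1
INNER JOIN "Orders" AS t2
  ON t1."CustomerId" = t2."CustomerId"
ORDER BY t1."CustomerId"
"""
joined_df = pd.read_sql(query, engine)
```

Customer 2 has no order, so the result contains only two rows; a LEFT JOIN would keep all three customers and fill the missing Total with NULL.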
This approach makes working with CSV files more powerful by treating them like relational database tables and performing operations using SQL!
Free Community Licenses for data developers
CData Python Connectors further enhance the capabilities of the Python DB-API and SQLAlchemy by offering consistent, SQL-based connectivity to more than 270 data sources beyond traditional databases, including SaaS, NoSQL, and big data systems.
With the CData Python Community License, you get free-forever libraries to access your data in personal Python projects, all through familiar SQL. Request a license and start creating better-connected projects today!