Python Developer Guide - Getting Started

Python is a powerful programming language used for everything from basic data analysis to full-fledged app development.

In this guide, we'll create a simple Python script that prompts for the file path of your CSV files*, connects to them, and retrieves the data as tables. We'll also explore the unique features of CData Connectors, which go beyond basic connectivity to enable seamless data migration, replication, and live access.

CData Python Connectors let you read, write, and update data using familiar SQL queries, eliminating protocol-specific requests. With a standard SQL-92 interface, an embedded SQL engine, pushdown optimization, and enterprise-grade security, they simplify integration across 270+ cloud and on-prem sources.

We'll use the Community Edition of the CData Python Connector for CSV, VSCode, and pandas to demonstrate easy data access and manipulation. CData's optimized processing ensures high performance, pushing complex SQL operations to the source while handling unsupported queries client-side.

(* While this guide focuses on connecting CSV files, the same principles apply to all 270+ data sources supported by CData Connectors.)

Prerequisites

Visual Studio Code (VSCode): Download and install here.
Python Distribution: Download from here.
CData Python Connector for CSV (Community Edition): Download free from here.
Sample CSV Files: We've curated some sample CSV files for this guide. Download them here.

Getting started

Overview

Here's a quick overview of the steps:

Download, install, and configure the CData Python Connector for CSV and VSCode, and import the required modules.
Successfully establish a connection to the CSV file(s) and query data from them.
Perform advanced CData Connector functions such as data replication and SQL stored procedures.

STEP 1: Install, configure, and import the required tools and modules

1.1 Install Visual Studio Code (VSCode) and Python

You may skip these steps if you have already installed these tools.

For VSCode, download the installer from here, run it, and follow the on-screen instructions to complete the installation.
For Python, download the latest version from here, run the installer, and ensure you check the "Add Python to PATH" option during installation. This allows you to use Python from the command line.

1.2 Install CData Python Connector for CSV

Follow these steps to install and configure the CData Python Connector for CSV:

Dependencies Note: The Python connector supports Python versions 3.8, 3.9, 3.10, 3.11, and 3.12. If your default Python version is outside this range, you may need to create a virtual environment that uses one of the supported versions.
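As a quick check before installing, the supported range in the note above can be tested from Python itself. This is a minimal sketch: the function name is_supported_python is illustrative, and the version bounds are copied from the note rather than read from the connector.

```python
import sys

# Supported minor versions of Python 3, taken from the dependencies note above.
SUPPORTED_MINORS = range(8, 13)  # 3.8 through 3.12


def is_supported_python(version=None):
    """Return True if the (major, minor) version is in the supported range."""
    major, minor = (version or sys.version_info)[:2]
    return major == 3 and minor in SUPPORTED_MINORS


if __name__ == "__main__":
    current = ".".join(str(part) for part in sys.version_info[:2])
    status = "supported" if is_supported_python() else "not supported"
    print(f"Python {current} is {status} by the connector.")
```

If the check reports an unsupported version, a virtual environment built on a supported interpreter is the usual workaround.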

Extract the downloaded connector ZIP to your desired location.
Open a terminal or command prompt and navigate to the directory that contains the installer file for your platform (on Windows, the directory where the .whl file is located). For example: C:\Users\Public\Downloads\CSVPythonConnector\CData.Python.CSV\win\Python312\64
For Windows: Install the .whl file using the pip installer. Use the appropriate version for your Python version and architecture. Example command: pip install cdata_csv_connector-24.0.9111-cp312-cp312-win_amd64.whl
For Linux or macOS: Install the .tar.gz file using the pip installer. Use the appropriate version for your Python version and architecture. Example command: pip install cdata_csv_connector-24.0.####-python3.tar.gz
Confirm that the installation was successful by running: pip list

If cdata-csv-connector is listed, the installation was successful.
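The same check can be done from Python instead of scanning the pip list output by eye. The sketch below uses the standard importlib.metadata module; the helper name installed_version is illustrative, and the distribution name is the one this guide expects to see in pip list.

```python
from importlib import metadata


def installed_version(distribution):
    """Return the installed version of a distribution, or None if absent."""
    try:
        return metadata.version(distribution)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    # "cdata-csv-connector" is the name the guide expects to see in `pip list`.
    version = installed_version("cdata-csv-connector")
    if version is None:
        print("cdata-csv-connector is not installed in this environment.")
    else:
        print(f"cdata-csv-connector {version} is installed.")
```

Run it with the same interpreter (or virtual environment) that you used for pip install, since each environment has its own set of installed packages.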

1.3 Install the License for the CData Connector

Once you've installed the connector, install the Community Edition license key for your machine.

If you're using a trial version of the connector, the license file should already be placed under the lib\site-packages directory of your Python installation. For example: ~\AppData\Local\Programs\Python\Python312\Lib\site-packages\cdata\installlic_csv

If you're using the full Community Edition connector and don't see the license file, request and download a Community License for the CSV Connector from here.

1.4 Installing the License on Windows

Download and extract the ZIP file containing the license.
Open a terminal or command prompt.
Navigate to the license installer location, typically: C:\Users\Username\AppData\Local\Programs\Python\Python312\Lib\site-packages\cdata\installlic_csv
Or, if you downloaded it separately: C:\Downloads\cdata\installlic_csv
Run the installer with: .\license-installer.exe
Enter the registration details when prompted on screen to complete the installation.

1.5 Installing the License on macOS/Linux

Download and extract the ZIP file containing the license.
Open a terminal inside the extracted directory. Example: cd ~/Downloads/CData-Python-CSV
Navigate to the license installer location, typically: /usr/local/lib/python3.12/site-packages/cdata/installlic_csv
Run the installer: ./license-installer
Enter the registration details when prompted on screen to complete the installation.
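Because the installer location depends on where Python is installed, it can help to compute it rather than guess. The following is a sketch only: it assumes the connector was installed into the site-packages directory of the interpreter running the script, and the helper name license_installer_dir is illustrative. The cdata/installlic_csv folder names come from the paths shown above.

```python
import sysconfig
from pathlib import Path


def license_installer_dir(site_packages=None):
    """Return the expected installlic_csv directory under site-packages."""
    # "purelib" is the site-packages directory of the running interpreter.
    base = Path(site_packages or sysconfig.get_paths()["purelib"])
    return base / "cdata" / "installlic_csv"


if __name__ == "__main__":
    target = license_installer_dir()
    print(f"Expected license installer directory: {target}")
    print("Found." if target.is_dir() else "Not found; check your installation.")
```

If the directory is reported as not found, the connector may have been installed for a different interpreter or virtual environment.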

1.6 Create a new Python file and import required modules

Launch VSCode, go to File > New File, and select Python File.
Save the file to your desired location by going to File > Save.
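Add the following imports at the beginning of your script. pandas, SQLAlchemy, and petl are third-party packages, so install them with pip first if they are missing: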


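import os
import warnings
import cdata.csv as mod  # CData CSV module, used for the stored procedure call in Step 3
import pandas as pd  # DataFrames and read_sql
from sqlalchemy import create_engine  # Engine for the csv:/// connection
import petl as etl  # Optional: not used by the script in this guide
import sqlite3  # Optional: not used by the script in this guide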

STEP 2: Establish the connection to CSV files and query the data

2.1 Establish the Connection

Paste the code below the import statements.
It will prompt you to input the file path in the terminal. Make sure to provide the correct location of the root folder containing your CSV files.
For example: C:\Users\Public\Documents\CSV-files.
Note: The connection string in the create_engine function includes an optional property called RowScanDepth. This property takes an integer value that defines the number of rows to scan when dynamically determining column structures for data sources that don't automatically include metadata (like CSV files). A higher RowScanDepth value improves accuracy but may increase processing time. Setting this value to 0 scans the entire CSV document. The default value is 100.
After entering the path, you will be prompted to select the number corresponding to your desired CSV file. Select a number, and it will display the data.
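To see why the scan depth affects accuracy, consider a toy type-inference routine. This is an illustration written for this guide, not the connector's actual algorithm: the function infer_column_type and the sample column are hypothetical, and only the meaning of the depth value (0 scans everything) follows the note above.

```python
def infer_column_type(values, scan_depth=100):
    """Guess 'integer' or 'string' from the first scan_depth values.

    A scan_depth of 0 means every value is scanned, mirroring the
    behavior the note describes for RowScanDepth=0.
    """
    sample = values if scan_depth == 0 else values[:scan_depth]
    for value in sample:
        try:
            int(value)
        except ValueError:
            return "string"
    return "integer"


# A column whose first rows look numeric but whose later row does not.
column = ["1", "2", "3", "N/A"]

print(infer_column_type(column, scan_depth=3))  # integer
print(infer_column_type(column, scan_depth=0))  # string
```

A shallow scan misses the non-numeric value in the last row and guesses the wrong type, while a full scan catches it at the cost of reading every row.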


# Suppress SQLAlchemy deprecation warnings  
warnings.filterwarnings("ignore", category=DeprecationWarning)  

# Get and validate CSV directory path  
csv_path = input("Enter the CSV directory path: ").strip()  

if not os.path.isdir(csv_path):  
    exit("Error: Directory does not exist.")  

# Create a SQLAlchemy engine to connect to the CSV directory  
engine = create_engine(f"csv:///?URI={csv_path};RowScanDepth=0")  

# Retrieve a list of available CSV files  
tables_df = pd.read_sql("SELECT * FROM sys_tables", engine)  

# Ensure the directory contains valid CSV files  
if tables_df.empty or "TableName" not in tables_df.columns:  
    print("Error: No CSV files found.")  
    exit()  

# Display available CSV files with numbered choices  
print("\nAvailable CSV files:\n" + "\n".join(f"{i+1}. {row['TableName']}" for i, row in tables_df.iterrows()))  

try:
    # Read the selection and convert it to a zero-based position
    table_choice = int(input("\nSelect a file number: ")) - 1

    # Reject numbers outside the displayed list (including 0 and negatives)
    if not 0 <= table_choice < len(tables_df):
        raise IndexError("selection out of range")
    chosen_file = tables_df["TableName"].iloc[table_choice]

    # Read the selected CSV file into a DataFrame
    df = pd.read_sql(f'SELECT * FROM "{chosen_file}"', engine)

    # Display a preview of the data
    print("\nSample Data:\n", df.head())

except (ValueError, IndexError):
    print("Error: Invalid selection.")

Click on the Run Python File button in the top-right corner to execute the code. Then enter the directory path and your file selection in the terminal.

CData Python Connector for CSV Python Script Running in VSCode

This code prompts for a CSV directory path, validates it, and connects using SQLAlchemy. It retrieves the available CSV files, displays them in a numbered list, and asks the user to select one. The chosen file is then loaded into a pandas DataFrame and a preview is displayed; invalid input produces an error message.
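The selection check described above can be isolated into a small helper, which makes it easier to reject out-of-range numbers as well as non-numeric input. This is a sketch: parse_selection and the sample table names are hypothetical and not part of the connector or the script.

```python
def parse_selection(raw, count):
    """Convert 1-based user input into a 0-based index, or None if invalid."""
    try:
        number = int(raw.strip())
    except ValueError:
        return None
    # Reject 0, negatives, and numbers past the end of the list.
    if 1 <= number <= count:
        return number - 1
    return None


files = ["customers", "orders", "products"]  # hypothetical table names
for attempt in ["2", "0", "7", "abc"]:
    index = parse_selection(attempt, len(files))
    print(attempt, "->", files[index] if index is not None else "invalid")
```

Keeping the validation separate from the database call means it can be tested without a connection or any CSV files.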

CData Python Connector for CSV Python Script Output in VSCode Terminal

STEP 3: Perform advanced operations

Now that we've successfully connected to the CSV files and queried data, we can leverage CData Python Connector's advanced features to perform operations related to our data source. For the CSV Connector, this means we can create, move, copy, and delete files or even perform OAuth authentication with data storage services (like Google Cloud, AWS, and more) to securely access remote CSV files.

CData connectors are able to perform these operations through stored procedures. Stored procedures function like predefined commands that accept parameters, execute a task, and return relevant response data, including success or failure status.

The following code executes the ListFiles stored procedure, which lists all CSV files stored in a local or cloud-based directory.

Paste the code below at the end of your current script, then click the Run Python File button in the top-right corner to execute it.


# Open a direct connection with the CData CSV module to call stored procedures
conn = mod.connect(f"URI={csv_path}")
cur = conn.cursor()

try:
    # Run the ListFiles stored procedure and collect its result rows
    cur.callproc("ListFiles")
    results = cur.fetchall()

    if results:
        print("\nFiles in directory:")
        for row in results:
            print(row)
    else:
        print("\nNo files found in the directory.")

    print("\nStored procedure 'ListFiles' executed successfully.")

except Exception as e:
    print(f"\nError executing ListFiles: {e}")

finally:
    # Release the connection whether or not the call succeeded
    conn.close()

Once executed successfully, this will display a list of all available CSV files in the directory, helping you verify stored data before performing further operations.

ListFiles Stored Procedure Execution with CData Python Connector for CSV on VSCode
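If you plan to call more than one stored procedure, the call-and-fetch pattern can be wrapped in a helper. The sketch below is hedged in two ways: run_procedure is a hypothetical helper, and passing parameters as a second argument follows the generic Python DB-API callproc signature, which is assumed (not confirmed here) to apply to the connector. A stand-in cursor with made-up rows is included so the example runs without the connector installed.

```python
def run_procedure(cursor, name, params=None):
    """Call a stored procedure on a DB-API style cursor and return its rows."""
    if params:
        cursor.callproc(name, params)
    else:
        cursor.callproc(name)
    return cursor.fetchall()


class DemoCursor:
    """Stand-in cursor so the sketch runs without the connector installed."""

    def __init__(self, rows):
        self._rows = rows
        self.calls = []

    def callproc(self, name, params=None):
        # Record the call; a real cursor would execute the procedure here.
        self.calls.append((name, params))

    def fetchall(self):
        return list(self._rows)


# With the connector, pass conn.cursor() from your script instead of DemoCursor.
demo = DemoCursor([("customers.csv",), ("orders.csv",)])  # made-up rows
for row in run_procedure(demo, "ListFiles"):
    print(row)
```

Check the connector documentation for the procedures available for your data source and the parameters each one accepts.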

What's next?

Now that you've seen the power of Python with CData's Python Connectors, it's time to take your development to the next level. Check out the complete list of connectors and their documentation here. Here are some options you can explore further:

Integrate Python with App Development: Incorporate CData Python Connectors into your development projects with frameworks like Django, Flask, and FastAPI to build dynamic web applications that interact with your CSV data seamlessly. Enable real-time data access, streamline API integrations, and enhance backend functionality with efficient data querying, filtering, and transformation capabilities.
Move and Replicate Data in Real Time: Effortlessly migrate large datasets with millions of rows using CData Python Connectors. Leverage built-in optimized data processing and Change Data Capture (CDC) to transfer data efficiently between databases, cloud storage, and data warehouses with minimal downtime and seamless integration.
Perform Advanced Data Analysis: Dive deeper into pandas for more advanced data manipulations, such as time series analysis, pivot tables, or machine learning with Python.
Create Interactive Dashboards: Use libraries like Dash or Streamlit to create interactive, real-time dashboards that visualize your CSV data in an intuitive way.
Explore Other CData Python Connectors: Try out other CData Python Connectors for integrating with different data sources, such as SQL databases, APIs, or cloud platforms.

Conclusion

Python, combined with CData's Python connectors, streamlines data integration and automation, enabling easy manipulation and visualization. This guide demonstrated how to connect, query, and work with CSV data, empowering you to build efficient, data-driven applications.

Free Community License for data developers

CData Python Connectors further enhance the capabilities of the Python DB-API by offering consistent, SQL-based connectivity to more than 270 data sources beyond traditional databases, including SaaS, NoSQL, and big data systems.

With the CData Python Community License, you get free-forever libraries to access your data in personal Python projects, all through familiar SQL. Request a license and start creating better-connected projects today!