Developer Guide - CData Python Connector & Pandas

This article provides a guide for using pandas and other modules to visualize CSV data in Python with CData’s Python Connector for CSV. The guide creates a Python script that prompts a user to enter a file path for a folder containing CSV files, lists the tables available, and then the user selects a table to display the data from.

NOTE: While this article features the CData Python Connector for CSV, the principles can be applied to any of the 270+ data sources CData supports.

View the completed script below: Full script

Prerequisites 

This guide requires the following parameters: 

CData Python Connector for CSV
Python 3.8, 3.9, 3.10, 3.11, or 3.12 distribution installed
CSV files are accessible and have read permissions: Sample CSV files (downloadable ZIP)

Step-by-step guide

Install the required modules
Use the “pip” utility to install the Pandas module and the SQLAlchemy toolkit in the terminal:
pip install pandas pip install sqlalchemy
Create a Python file and import modules
Create a new Python file and import the modules with the following code in your Python file:
import pandas as pd import matplotlib.pyplot as plt from sqlalchemy import create_engine
Prompt for CSV File path
Create a new variable named “file_path” that prompts the user for a CSV file path and gathers the input:
file_path = input("Enter the full path to the folder that is holding your CSV file, for example ‘C:/Users/Public/CSVFiles’: ")
Create SQLAlchemy engine
With the input gathered, we can create the connection string for the CSV connection and create the SQLAlchemy engine used to execute queries. To do this, create a connection string variable, “conn_str”, and a variable for the SQLAlchemy engine that uses the connection string, “engine”:
conn_str = f"csv:///?URI={file_path};RowScanDepth=0" engine = create_engine(conn_str)
Note: We set the “RowScanDepth” property to “0” to ensure the entire CSV file is parsed when scanning rows to dynamically determine column names and data types for the tabular representation of the CSV file(s). Scanning the entire document may result in a longer request, but the column data types will be more accurate when reflecting the data model.
List the available tables
To list the available tables for the connection we will query the “sys_tables” System Table and display the results from that query. To do this, you will create a variable for the query and then provide the SELECT query in a call to the read_sql() method alongside the “engine” connection object:
query1 = f"SELECT TableName FROM sys_tables" df1 = pd.read_sql(query1, engine)
Pandas will now execute the query and return the results in the form of a data frame. For ease of use, we will convert the results into a list and then display the results alongside a table number that the user will use to select a table:
tables = df1["TableName"].tolist() # Display available tables print(" Available tables:") for idx, table in enumerate(tables, start=1): print(f"{idx}.{table}")
Prompt the user to select a table
With the tables listed, we can create a new variable, “table_choice”, that prompts the user to select a table number and then use another variable, “selected_table” to gather the table name based on the table number:
table_choice = int(input(" Select a table number from above: ")) - 1 selected_table = tables[table_choice]
Execute SQL and visualize CSV data
With the input gathered for the table selection, we can query the selected table and display the data. To do this, we will again create a new variable for the query and then provide the SELECT query in a call to the read_sql() method alongside the “engine” connection object:
query2 = f"SELECT * FROM [{selected_table}]" df2 = pd.read_sql(query2, engine)
Now we can display the data frame that contain the results from the query:
print(" Data from the selected table:") print(df2) exit()

Free Community License for data developers

CData Python Connectors further enhance the capabilities of the Python DB-API & Pandas by offering consistent, SQL-based connectivity to more than 270 data sources beyond traditional databases, including SaaS, NoSQL, and big data systems.

With the CData Python Community License, you get free-forever libraries to access your data in personal Python projects, all through familiar SQL. Request a license and start creating better-connected projects today!

Full script


import pandas as pd
import matplotlib.pyplot as plt
from sqlalchemy import create_engine

#Prompt User for CSV file path
file_path = input("Enter the full path to the folder that is holding your CSV file, for example 'C:/Users/Public/CSVFiles':  ")

# Create SQLAlchemy engine
conn_str = f"csv:///?URI={file_path};RowScanDepth=0"
engine = create_engine(conn_str)

#Query sys_tables to display available tables
query1 = f"SELECT TableName FROM sys_tables"
df1 = pd.read_sql(query1, engine)
tables = df1["TableName"].tolist()

# Display available tables
print("\nAvailable tables:")
for idx, table in enumerate(tables, start=1):
    print(f"{idx}.{table}")

# Ask the user to select a table
table_choice = int(input("\nSelect a table number from above: ")) - 1
selected_table = (tables[table_choice])

#Query the selected table
query2 = f"SELECT * FROM [{selected_table}]"
df2 = pd.read_sql(query2, engine)

#Display selected table data
print("\nData from the selected table:")
print(df2)
exit()