Developer Guide - CData Python Connectors & petl

This article provides a guide for using petl to replicate CSV data to a SQLite database in Python with CData’s Python Connector for CSV. The guide creates a Python script that prompts a user to enter a file path for a folder containing CSV files, lists the tables available, and then the user selects a table to display the data from.

While this article features the CData Python Connector for CSV, the principles can be applied to any of the 270+ data sources CData supports.

View the completed script here: Full script

Prerequisites

This guide requires the following prerequisites to follow along:

CData Python Connector for CSV
Python 3.8, 3.9, 3.10, 3.11 or 3.12
CSV file(s) accessible with read permissions: sample CSV files (downloadable ZIP)

Step-by-step guide

Install the required modules

Use the "pip" utility in a terminal to install the petl module
pip install petl
Create a python file and import modules

Create a new Python file and add the following import statements to your file:
import cdata.csv as csv_mod # Import CData CSV module import petl as etl import sqlite3 as db_mod
Prompt the user for the path to CSV file(s)
csv_path = input("Enter the full path to the directory containing your CSV file(s), for example 'C:/Users/Public/CSVFiles':")
Create a connection to the CSV directory using the CData Python Connector
csv_cnxn_str = f"URI={csv_path};RowScanDepth=0" csv_cnxn = csv_mod.connect(csv_cnxn_str);
Note: We set the "RowScanDepth" property to ensure the entire CSV file is parsed when scanning rows to dynamically determine column names and data types for the tabular representation of the CSV file(s). Scanning the entire document may result in a longer request, but column data types will be more accurate when reflecting the data model.
List the available tables

To list the available tables for the connection we will query the "sys_tables" System Table and display the results from that query. To do this, execute a SELECT query against the CSV connection object:
csv_cur = csv_cnxn.execute(f"SELECT TableName FROM sys_tables") table_names = csv_cur.fetchall() csv_cur.close() printf("\nAvailable tables:") for idx, table in enumerate(table_names, start=1): printf(f"{idx}. {table['TableName']}")
Prompt the user to select a table

With the tables listed, create a new variable "table_index" and prompt the user to select a table number. Then use another variable, "selected_table" to store the table name.
table_index = int(input("\nSelect a table number from above: "))–1 selected_table = table_names[table_index]['TableName']
Collect the metadata

Query the metadata (specifically the column names) from the CSV file using the "sys_tablecolumns" System Table. We'll use the column names to create the destination table in the SQLite database.
csv_cur = csv_cnxn.execute(f"SELECT ColumnName FROM sys_tablecolumns WHERE TableName = '{table_name}'") csv_columns = csv_cur.fetchall() csv_cur.close()
Prompt the user for the path to a database

Prompt the user for the path to the destination database. If the database does not exist, the SQLite3 module will create it.
db_path = input("\nPlease input the path to the directory for your SQLite database: ") db_cnxn = db_mod.connect(f"{db_path}/csv_data.db") db_cur = db_cnxn.cursor()
Configure the target database

At this point, we are ready to create a new table in the SQLite database based on the metadata of the replicated data. We'll use the column names from the source CSV file to generate the column definition part of the CREATE statement, setting each column type to TEXT for simplicity.
create_columns_part = ", ".join([f'"{col["ColumnName"]}" TEXT' for col in csv_columns]) db_cur.execute(f'CREATE TABLE IF NOT EXISTS [{table_name}] ({create_columns_part})') db_cur.close()
Replicate the CSV data to the database

Finally, we query the data from the CSV file using the petl module with the CData Python Connector and append the data to the SQLite database table
csv_data = etl.fromdb(csv_cnxn,f"SELECT * FROM [{table_name}]") etl.appenddb(csv_data, db_cnxn, table_name)

How to View the Database

To view the SQLite database, follow these steps:

Install a SQLite Viewer Extension

You can install this SQLite Viewer extension in VS Code or use an alternative database viewer.
Open the Database

Right-click on the csv_data.db file and select "Open With" >> "VS Code" (or any installed SQLite viewer).

Free Community License for data developers

CData Python Connectors further enhance the capabilities of the Python DB-API & petl by offering consistent, SQL-based connectivity to more than 270 data sources beyond traditional databases, including SaaS, NoSQL, and big data systems. With the CData Python Community License, you get free-forever libraries to access your data in personal Python projects, all through familiar SQL. Request a license and start creating better-connected projects today!

Full script


import cdata.csv as csv_mod
import petl as etl
import sqlite3 as db_mod

csv_path = input("\nPlease input the path to the directory for your CSV file(s): ")
csv_cnxn_string = f"URI={csv_path};RowScanDepth=0"
csv_cnxn = csv_mod.connect(csv_cnxn_string)
csv_cur = csv_cnxn.execute("SELECT TableName FROM sys_tables")
rs = csv_cur.fetchall()
csv_cur.close()

print("\nAvailable Tales")
for idx, row in enumerate(rs, start=1):
    print(f"{idx}. {row['TableName']}")

table_index = int(input("\nSelect a table number from above: "))-1

table_name = rs[table_index]['TableName']

csv_cur = csv_cnxn.execute(f"SELECT ColumnName FROM sys_tablecolumns WHERE TableName = '{table_name}'")
csv_columns = csv_cur.fetchall()
csv_cur.close()

db_path = input("\nPlease input the path to the directory for your SQLite database: ")
db_cnxn = db_mod.connect(f"{db_path}/csv_data.db")
db_cur = db_cnxn.cursor()
    
# Create table based on CSV structure (all columns are assumed text for simplicity) 
create_columns_part = ", ".join([f'"{col["ColumnName"]}" TEXT' for col in csv_columns]) 
db_cur.execute(f'CREATE TABLE IF NOT EXISTS [{table_name}] ({create_columns_part})') 
db_cur.close()

csv_data = etl.fromdb(csv_cnxn,f"SELECT * FROM [{table_name}]")

# Append data to SQLite database 
etl.appenddb(csv_data, db_cnxn, table_name)

Developer Guide - CData Python Connectors & petl

Prerequisites

Step-by-step guide

Install the required modules

Create a python file and import modules

Prompt the user for the path to CSV file(s)

Create a connection to the CSV directory using the CData Python Connector

List the available tables

Prompt the user to select a table

Collect the metadata

Prompt the user for the path to a database

Configure the target database

Replicate the CSV data to the database

How to View the Database

Install a SQLite Viewer Extension

Open the Database

Free Community License for data developers

Full script