Developer Guide - CData Python Connectors & petl

This article provides a guide for using petl to replicate CSV data to a SQLite database in Python with CData’s Python Connector for CSV. The guide creates a Python script that prompts a user to enter a file path for a folder containing CSV files, lists the tables available, and then the user selects a table to display the data from.

While this article features the CData Python Connector for CSV, the principles can be applied to any of the 270+ data sources CData supports.

View the completed script here: Full script

Prerequisites

This guide requires the following prerequisites to follow along:

  • CData Python Connector for CSV
  • Python 3.8, 3.9, 3.10, 3.11 or 3.12
  • CSV file(s) accessible with read permissions: sample CSV files (downloadable ZIP)

Step-by-step guide

  1. Install the required modules

    Use the "pip" utility in a terminal to install the petl module

    pip install petl
  2. Create a python file and import modules

    Create a new Python file and add the following import statements to your file:

    import cdata.csv as csv_mod # Import CData CSV module import petl as etl import sqlite3 as db_mod
  3. Prompt the user for the path to CSV file(s)

    csv_path = input("Enter the full path to the directory containing your CSV file(s), for example 'C:/Users/Public/CSVFiles':")
  4. Create a connection to the CSV directory using the CData Python Connector

    csv_cnxn_str = f"URI={csv_path};RowScanDepth=0" csv_cnxn = csv_mod.connect(csv_cnxn_str);

    Note: We set the "RowScanDepth" property to ensure the entire CSV file is parsed when scanning rows to dynamically determine column names and data types for the tabular representation of the CSV file(s). Scanning the entire document may result in a longer request, but column data types will be more accurate when reflecting the data model.

  5. List the available tables

    To list the available tables for the connection we will query the "sys_tables" System Table and display the results from that query. To do this, execute a SELECT query against the CSV connection object:

    csv_cur = csv_cnxn.execute(f"SELECT TableName FROM sys_tables") table_names = csv_cur.fetchall() csv_cur.close() printf("\nAvailable tables:") for idx, table in enumerate(table_names, start=1): printf(f"{idx}. {table['TableName']}")
  6. Prompt the user to select a table

    With the tables listed, create a new variable "table_index" and prompt the user to select a table number. Then use another variable, "selected_table" to store the table name.

    table_index = int(input("\nSelect a table number from above: "))–1 selected_table = table_names[table_index]['TableName']
  7. Collect the metadata

    Query the metadata (specifically the column names) from the CSV file using the "sys_tablecolumns" System Table. We'll use the column names to create the destination table in the SQLite database.

    csv_cur = csv_cnxn.execute(f"SELECT ColumnName FROM sys_tablecolumns WHERE TableName = '{table_name}'") csv_columns = csv_cur.fetchall() csv_cur.close()
  8. Prompt the user for the path to a database

    Prompt the user for the path to the destination database. If the database does not exist, the SQLite3 module will create it.

    db_path = input("\nPlease input the path to the directory for your SQLite database: ") db_cnxn = db_mod.connect(f"{db_path}/csv_data.db") db_cur = db_cnxn.cursor()
  9. Configure the target database

    At this point, we are ready to create a new table in the SQLite database based on the metadata of the replicated data. We'll use the column names from the source CSV file to generate the column definition part of the CREATE statement, setting each column type to TEXT for simplicity.

    create_columns_part = ", ".join([f'"{col["ColumnName"]}" TEXT' for col in csv_columns]) db_cur.execute(f'CREATE TABLE IF NOT EXISTS [{table_name}] ({create_columns_part})') db_cur.close()
  10. Replicate the CSV data to the database

    Finally, we query the data from the CSV file using the petl module with the CData Python Connector and append the data to the SQLite database table

    csv_data = etl.fromdb(csv_cnxn,f"SELECT * FROM [{table_name}]") etl.appenddb(csv_data, db_cnxn, table_name)

How to View the Database

To view the SQLite database, follow these steps:

  1. Install a SQLite Viewer Extension

    You can install this SQLite Viewer extension in VS Code or use an alternative database viewer.

  2. Open the Database

    Right-click on the csv_data.db file and select "Open With" >> "VS Code" (or any installed SQLite viewer).

Free Community License for data developers

CData Python Connectors further enhance the capabilities of the Python DB-API & petl by offering consistent, SQL-based connectivity to more than 270 data sources beyond traditional databases, including SaaS, NoSQL, and big data systems. With the CData Python Community License, you get free-forever libraries to access your data in personal Python projects, all through familiar SQL. Request a license and start creating better-connected projects today!

Full script

import cdata.csv as csv_mod import petl as etl import sqlite3 as db_mod csv_path = input("\nPlease input the path to the directory for your CSV file(s): ") csv_cnxn_string = f"URI={csv_path};RowScanDepth=0" csv_cnxn = csv_mod.connect(csv_cnxn_string) csv_cur = csv_cnxn.execute("SELECT TableName FROM sys_tables") rs = csv_cur.fetchall() csv_cur.close() print("\nAvailable Tales") for idx, row in enumerate(rs, start=1): print(f"{idx}. {row['TableName']}") table_index = int(input("\nSelect a table number from above: "))-1 table_name = rs[table_index]['TableName'] csv_cur = csv_cnxn.execute(f"SELECT ColumnName FROM sys_tablecolumns WHERE TableName = '{table_name}'") csv_columns = csv_cur.fetchall() csv_cur.close() db_path = input("\nPlease input the path to the directory for your SQLite database: ") db_cnxn = db_mod.connect(f"{db_path}/csv_data.db") db_cur = db_cnxn.cursor() # Create table based on CSV structure (all columns are assumed text for simplicity) create_columns_part = ", ".join([f'"{col["ColumnName"]}" TEXT' for col in csv_columns]) db_cur.execute(f'CREATE TABLE IF NOT EXISTS [{table_name}] ({create_columns_part})') db_cur.close() csv_data = etl.fromdb(csv_cnxn,f"SELECT * FROM [{table_name}]") # Append data to SQLite database etl.appenddb(csv_data, db_cnxn, table_name)