Back Up Databricks Data to SQL Server through SSIS



Effortlessly back up data to SQL Server by using the CData ADO.NET Provider for Databricks. In this article, we use an SSIS workflow to populate a database with Databricks data.

This article illustrates how to use the Databricks ADO.NET Data Provider within a SQL Server SSIS workflow to transfer Databricks data directly to a Microsoft SQL Server database. The same process applies to any CData ADO.NET Data Provider, so you can connect SQL Server directly to remote data through SSIS.

About Databricks Data Integration

Accessing and integrating live data from Databricks has never been easier with CData. Customers rely on CData connectivity to:

  • Access all versions of Databricks, from Runtime versions 9.1 through 13.X, as well as both the Pro and Classic Databricks SQL versions.
  • Leave Databricks in their preferred environment thanks to compatibility with any hosting solution.
  • Securely authenticate in a variety of ways, including personal access token, Azure Service Principal, and Azure AD.
  • Upload data to Databricks using Databricks File System, Azure Blob Storage, and AWS S3 Storage.

While many customers use CData's solutions to migrate data from different systems into their Databricks data lakehouse, several customers use our live connectivity solutions to federate connectivity between their databases and Databricks. These customers use SQL Server Linked Servers or PolyBase to get live access to Databricks from within their existing RDBMSs.

Read more about common Databricks use cases and how CData's solutions help solve data problems in our blog: What is Databricks Used For? 6 Use Cases.


Getting Started


  1. Open Visual Studio and create a new Integration Services project.
  2. Add a new Data Flow task from the toolbox onto the Control Flow screen.
  3. In the Data Flow screen, add an ADO.NET Source and an OLE DB Destination from the toolbox.

  4. Add a new connection and select .NET Providers\CData ADO.NET Provider for Databricks.
  5. In the connection manager, enter the connection details for Databricks.

    To connect to a Databricks cluster, set the properties as described below.

    Note: You can find the needed values in your Databricks instance by navigating to Clusters, selecting the desired cluster, and selecting the JDBC/ODBC tab under Advanced Options.

    • Server: Set to the Server Hostname of your Databricks cluster.
    • HTTPPath: Set to the HTTP Path of your Databricks cluster.
    • Token: Set to your personal access token (this value can be obtained by navigating to the User Settings page of your Databricks instance and selecting the Access Tokens tab).

  6. Open the ADO.NET Source editor and set the following information:

    • ADO.NET connection manager: Select the Databricks connection manager you just created.
    • Data access mode: Select 'SQL command'.
    • SQL command text: Enter a SELECT command, such as the one below:

      SELECT City, CompanyName FROM Customers WHERE Country = 'US'

  7. Close the ADO.NET Source editor and drag the arrow below the ADO.NET Source to connect it to the OLE DB Destination.
  8. Open the OLE DB Destination and enter the following information in the OLE DB Destination Editor.

    • Connection manager: Add a new connection and enter your server and database information. In this example, SQL Server Express is running on a separate machine.
    • Data access mode: Set the data access mode to "Table or view" and select the table or view to populate in your database.
  9. Configure any column mappings you need on the Mappings page.

  10. Close the OLE DB Destination Editor and run the project. After the SSIS task finishes executing, your database will be populated with data retrieved from Databricks.
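Step 8 asks you to select the table to populate. If that table does not exist yet, you can create it ahead of time in SQL Server. The T-SQL below is a minimal sketch, assuming a hypothetical destination table named dbo.DatabricksCustomers with NVARCHAR(255) columns sized for the sample query; neither the table name nor the column sizes come from this article, so adjust them to your data. It also includes two queries for checking the loaded rows after the package runs.

```sql
-- Hypothetical destination table for the sample query
--   SELECT City, CompanyName FROM Customers WHERE Country = 'US'
-- The table name and column sizes are assumptions; adjust them to your data.
IF OBJECT_ID(N'dbo.DatabricksCustomers', N'U') IS NULL
BEGIN
    CREATE TABLE dbo.DatabricksCustomers (
        -- Unicode columns, since the ADO.NET Source typically emits strings as DT_WSTR
        City        NVARCHAR(255) NULL,
        CompanyName NVARCHAR(255) NULL
    );
END;

-- After the package finishes, confirm that rows were loaded.
SELECT COUNT(*) AS RowsLoaded
FROM dbo.DatabricksCustomers;

-- Spot-check a few of the backed-up rows.
SELECT TOP (10) City, CompanyName
FROM dbo.DatabricksCustomers
ORDER BY CompanyName;
```

The OLE DB Destination inserts rows, so running the package repeatedly appends duplicate copies of the same data. One common option is to add an Execute SQL Task to the Control Flow that runs TRUNCATE TABLE dbo.DatabricksCustomers before the Data Flow task.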

Ready to get started?

Download a free trial of the Databricks Data Provider to get started.

Learn more:

Databricks ADO.NET Provider

Rapidly create and deploy powerful .NET applications that integrate with Databricks.