Use the API Server and Databricks ADO.NET Provider in Microsoft Power BI



You can use the API Server to feed Databricks data to Power BI dashboards. Simply drag and drop Databricks data into data visuals on the Power BI canvas.

The CData API Server enables your organization to create Power BI reports based on the current Databricks data (plus data from 200+ other ADO.NET Providers). The API Server is a lightweight Web application that runs on your server and, when paired with the ADO.NET Provider for Databricks, provides secure OData services of Databricks data to authorized users. The OData standard enables real-time access to the live data, and support for OData is integrated into Power BI. This article details how to create data visualizations based on Databricks OData services in Power BI.

About Databricks Data Integration

Accessing and integrating live data from Databricks has never been easier with CData. Customers rely on CData connectivity to:

  • Access all versions of Databricks, from Runtime versions 9.1 through 13.X to both the Pro and Classic Databricks SQL versions.
  • Leave Databricks in their preferred environment thanks to compatibility with any hosting solution.
  • Securely authenticate in a variety of ways, including personal access token, Azure Service Principal, and Azure AD.
  • Upload data to Databricks using Databricks File System, Azure Blob Storage, and AWS S3 Storage.

While many customers use CData's solutions to migrate data from different systems into their Databricks data lakehouse, several customers use our live connectivity solutions to federate connectivity between their databases and Databricks. These customers use SQL Server Linked Servers or PolyBase to get live access to Databricks from within their existing RDBMS.

Read more about common Databricks use-cases and how CData's solutions help solve data problems in our blog: What is Databricks Used For? 6 Use Cases.


Getting Started


Set Up the API Server

Follow the steps below to begin producing secure Databricks OData services:

Deploy

The API Server runs on your own server. On Windows, you can deploy using the stand-alone server or IIS. On a Java servlet container, drop in the API Server WAR file. See the help documentation for more information and how-tos.

The API Server is also easy to deploy on Microsoft Azure, Amazon EC2, and Heroku.

Connect to Databricks

After you deploy the API Server and the ADO.NET Provider for Databricks, provide authentication values and other connection properties needed to connect to Databricks by clicking Settings -> Connection and adding a new connection in the API Server administration console.

To connect to a Databricks cluster, set the properties as described below.

Note: You can find the required values in your Databricks instance by navigating to Clusters, selecting the desired cluster, and opening the JDBC/ODBC tab under Advanced Options.

  • Server: Set to the Server Hostname of your Databricks cluster.
  • HTTPPath: Set to the HTTP Path of your Databricks cluster.
  • Token: Set to your personal access token (this value can be obtained by navigating to the User Settings page of your Databricks instance and selecting the Access Tokens tab).

When you configure the connection, you may also want to set the Max Rows connection property. This will limit the number of rows returned, which is especially helpful for improving performance when designing reports and visualizations.
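As a rough illustration, the properties described above can be collected into a single connection string. This is only a sketch: the semicolon-delimited `Key=Value` format, the `MaxRows` spelling of the Max Rows property, and the helper name `build_connection_string` are assumptions rather than details given in this article, and all values are placeholders. In practice you enter these values in the API Server administration console.

```python
from typing import Dict, Optional


def build_connection_string(server: str, http_path: str, token: str,
                            max_rows: Optional[int] = None) -> str:
    """Join connection properties into a semicolon-delimited string.

    Assumption: the provider accepts "Key=Value;" pairs. Check the help
    documentation for the exact property names and format.
    """
    props: Dict[str, str] = {
        "Server": server,       # Server Hostname of the cluster
        "HTTPPath": http_path,  # HTTP Path of the cluster
        "Token": token,         # personal access token
    }
    if max_rows is not None:
        # Assumed spelling of the "Max Rows" property mentioned above.
        props["MaxRows"] = str(max_rows)
    return "".join(f"{key}={value};" for key, value in props.items())


# Placeholder values only, not real credentials.
conn = build_connection_string(
    server="example.cloud.databricks.com",
    http_path="/sql/example/path",
    token="my-personal-access-token",
    max_rows=1000,
)
print(conn)
```

Capping the row count this way is mainly useful while you design a report; you would likely remove or raise the cap before publishing.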

You can then choose the Databricks entities you want to allow the API Server access to by clicking Settings -> Resources.

Authorize API Server Users

After determining the OData services you want to produce, authorize users by clicking Settings -> Users. The API Server uses authtoken-based authentication and supports the major authentication schemes. Access can also be restricted based on IP address; by default, only connections from the local machine are allowed. You can authenticate as well as encrypt connections with SSL.

Connect to Databricks Data from Power BI

Follow the steps below to connect to Databricks data from Power BI.

  1. Open Power BI Desktop and click Get Data -> OData Feed. To start Power BI Desktop from PowerBI.com, click the download button and then click Power BI Desktop.
  2. Enter the URL to the OData endpoint of the API Server. For example:

    http://MyServer:8032/api.rsc
  3. Enter authentication for the API Server. To configure Basic authentication, select Basic and enter the username and authtoken for a user of the OData API of the API Server.

    The API Server also supports Windows authentication using ASP.NET. See the help documentation for more information.

  4. In the Navigator, select tables to load. For example, Customers.
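Before loading the feed in Power BI, it can help to check the endpoint outside of it. The sketch below builds, but does not send, the kind of request Power BI would issue for the Customers table, using the example URL and the Basic authentication scheme from the steps above. The user name, authtoken, and helper name `build_odata_request` are hypothetical; `$top` is a standard OData query option used here to keep the response small.

```python
import base64
import urllib.parse
import urllib.request


def build_odata_request(base_url: str, resource: str, user: str,
                        authtoken: str, top: int = 5) -> urllib.request.Request:
    """Build (but do not send) a GET request for the first `top` rows."""
    # Keep "$" unescaped so the query reads as "$top=5".
    query = urllib.parse.urlencode({"$top": top}, safe="$")
    url = f"{base_url.rstrip('/')}/{urllib.parse.quote(resource)}?{query}"
    # Basic authentication: base64("user:authtoken"), as in step 3 above.
    raw = f"{user}:{authtoken}".encode("utf-8")
    credentials = base64.b64encode(raw).decode("ascii")
    request = urllib.request.Request(url)
    request.add_header("Authorization", f"Basic {credentials}")
    request.add_header("Accept", "application/json")
    return request


# Hypothetical user and authtoken; replace with an authorized API Server user.
request = build_odata_request("http://MyServer:8032/api.rsc", "Customers",
                              user="MyUser", authtoken="MyAuthtoken")
print(request.full_url)

# To send the request against a running API Server:
# with urllib.request.urlopen(request) as response:
#     print(response.read().decode("utf-8"))
```

If the request succeeds from the machine running Power BI Desktop, the same URL and credentials should work in the OData Feed dialog.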

Create Data Visualizations

After pulling the data into Power BI, you can create data visualizations in the Report view. Follow the steps below to create a pie chart:

  1. Select the pie chart icon in the Visualizations pane.
  2. Select a dimension in the Fields pane: for example, City.
  3. Select a measure in the Fields pane: for example, CompanyName.
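To make the dimension and measure concrete, the sketch below reproduces roughly what the pie chart computes, assuming Power BI summarizes the text field CompanyName by counting it within each City. The sample rows and the helper name `pie_slices` are made up for illustration and are not real Databricks data.

```python
from collections import Counter
from typing import Dict, List

# Hypothetical rows standing in for the Customers table; not real data.
customers: List[Dict[str, str]] = [
    {"CompanyName": "Company A", "City": "Berlin"},
    {"CompanyName": "Company B", "City": "London"},
    {"CompanyName": "Company C", "City": "London"},
    {"CompanyName": "Company D", "City": "Madrid"},
]


def pie_slices(rows: List[Dict[str, str]], dimension: str,
               measure: str) -> Dict[str, float]:
    """Count non-empty `measure` values per `dimension` value, as shares of the total."""
    counts = Counter(row[dimension] for row in rows if row.get(measure))
    total = sum(counts.values())
    return {key: count / total for key, count in counts.items()}


slices = pie_slices(customers, "City", "CompanyName")
for city, share in slices.items():
    print(f"{city}: {share:.0%}")
```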

You can change sort options by clicking the ellipsis (...) button for the chart. Options to select the sort column and change the sort order are displayed.

You can use both highlighting and filtering to focus on data. Filtering removes unfocused data from visualizations; highlighting dims unfocused data. You can highlight fields by clicking them.

You can apply filters at the page level, at the report level, or to a single visualization by dragging fields onto the Filters pane. To filter on the field's value, select one of the values that are displayed in the Filters pane.
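The difference between filtering and highlighting can be sketched in a few lines. This is a conceptual model only, not how Power BI is implemented; the sample rows, the `dimmed` flag, and the function names are hypothetical.

```python
from typing import Dict, List

# Hypothetical rows; not real data.
rows: List[Dict[str, str]] = [
    {"CompanyName": "Company A", "City": "Berlin"},
    {"CompanyName": "Company B", "City": "London"},
    {"CompanyName": "Company C", "City": "London"},
]


def apply_filter(data: List[Dict[str, str]], field: str,
                 value: str) -> List[Dict[str, str]]:
    """Filtering: rows that do not match are removed from the visual."""
    return [row for row in data if row[field] == value]


def apply_highlight(data: List[Dict[str, str]], field: str,
                    value: str) -> List[Dict[str, object]]:
    """Highlighting: every row stays, but non-matching rows are marked as dimmed."""
    return [{**row, "dimmed": row[field] != value} for row in data]


filtered = apply_filter(rows, "City", "London")
highlighted = apply_highlight(rows, "City", "London")
print(len(filtered), len(highlighted))
```

The filtered result has fewer rows than the input, while the highlighted result keeps every row and only changes how each one would be drawn.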

Click Refresh to synchronize your report with any changes to the data.

Upload Databricks Data Reports to Power BI

You can now upload and share reports with other Power BI users in your organization. To upload a dashboard or report, log in to PowerBI.com, click Get Data in the main menu, and then click Files. Navigate to a Power BI Desktop file or Excel workbook. You can then select the report in the Reports section.

Refresh on Schedule and on Demand

You can configure Power BI to automatically refresh your uploaded report. You can also refresh the dataset on demand in Power BI. Follow the steps below to schedule refreshes through the API Server:

  1. Log in to Power BI.
  2. In the Dataset section, right-click the Databricks Dataset and click Schedule Refresh.
  3. If you are hosting the API Server on a public-facing server like Azure, you can connect directly. Otherwise, if you are connecting to a feed on your machine, you will need to expand the Gateway Connection node and select a gateway, for example, the Microsoft Power BI Personal Gateway.
  4. In the settings for your dataset, expand the Data Source Credentials node and click Edit Credentials.
  5. Expand the Schedule Refresh section, select Yes in the Keep Your Data Up to Date menu, and specify the refresh interval.

You can now share real-time Databricks reports through Power BI.
