Use the API Server and HDFS ADO.NET Provider to Access HDFS Data in Microsoft PowerPivot



Use the API Server to connect to live HDFS data in the PowerPivot business intelligence tool.

This article explains how to use the API Server and the ADO.NET Provider for HDFS (or any of 200+ other ADO.NET Providers) to expose HDFS data as OData services, and then consume that data in PowerPivot, the business intelligence tool in Microsoft Excel. Follow the steps below to retrieve HDFS data in Power Pivot.

Set Up the API Server

Follow the steps below to begin producing secure HDFS OData services:

Deploy

The API Server runs on your own server. On Windows, you can deploy using the stand-alone server or IIS. On a Java servlet container, drop in the API Server WAR file. See the help documentation for more information and how-tos.

The API Server can also be deployed on Microsoft Azure, Amazon EC2, and Heroku.

Connect to HDFS

After you deploy the API Server and the ADO.NET Provider for HDFS, open the API Server administration console, click Settings -> Connections, and add a new connection. Provide the authentication values and other connection properties needed to connect to HDFS.

To connect to HDFS, set the following connection properties:

  • Host: Set this value to the host of your HDFS installation.
  • Port: Set this value to the port of your HDFS installation. The default port is 50070.
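
Before saving the connection, you may want to confirm that the Host and Port values actually reach your cluster. The Python sketch below is a minimal check under stated assumptions: it assumes WebHDFS is enabled on the NameNode, that the configured port serves the Hadoop WebHDFS REST API without Kerberos or other authentication, and that `namenode.example.com` stands in for your own host. The `/webhdfs/v1` path and the `LISTSTATUS` operation belong to Hadoop's WebHDFS API, not to the API Server, and the helper names are hypothetical.

```python
import json
from urllib.parse import quote, urlencode
from urllib.request import urlopen


def webhdfs_url(host: str, port: int = 50070, path: str = "/", op: str = "LISTSTATUS") -> str:
    """Build a WebHDFS REST URL (hypothetical helper) for an HDFS path and operation."""
    # WebHDFS expects an absolute HDFS path appended to /webhdfs/v1.
    if not path.startswith("/"):
        path = "/" + path
    # quote() percent-encodes spaces and other unsafe characters but keeps "/".
    return f"http://{host}:{port}/webhdfs/v1{quote(path)}?{urlencode({'op': op})}"


def list_status(host: str, port: int = 50070, path: str = "/", timeout: float = 10.0) -> list:
    """Return the FileStatus entries of a directory; this performs a network request."""
    with urlopen(webhdfs_url(host, port, path), timeout=timeout) as resp:
        # A successful LISTSTATUS response looks like {"FileStatuses": {"FileStatus": [...]}}.
        return json.load(resp)["FileStatuses"]["FileStatus"]
```

If the host and port are correct, `list_status("namenode.example.com")` should return the entries of the root directory; a connection error or timeout suggests the values need another look.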

You can then choose which HDFS entities the API Server is allowed to access by clicking Settings -> Resources.

Additionally, click Settings -> Server and set the Default Format to XML (Atom) for compatibility with Excel.
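
With the Default Format set to XML (Atom), a client receives each resource as an XML feed. Assuming that feed follows the OData v2/v3 Atom layout, every row is an `entry` whose column values sit inside `m:properties`. The Python sketch below relies on that assumption and on the standard OData Atom namespace URIs; it is a minimal parser for inspecting a feed outside Excel, not a description of how Excel reads it. The column names in the sample feed (`PathSuffix`, `Length`, `Owner`) are hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List, Optional

# Namespace URIs used by OData v2/v3 Atom payloads.
NS = {
    "atom": "http://www.w3.org/2005/Atom",
    "m": "http://schemas.microsoft.com/ado/2007/08/dataservices/metadata",
    "d": "http://schemas.microsoft.com/ado/2007/08/dataservices",
}
# Fully qualified name of the m:null attribute that marks null values.
M_NULL = "{%s}null" % NS["m"]


def parse_atom_feed(xml_text: str) -> List[Dict[str, Optional[str]]]:
    """Turn an OData Atom feed into a list of {column: value} dicts.

    Values stay strings; m:type annotations are ignored in this sketch.
    """
    root = ET.fromstring(xml_text)
    rows = []
    for entry in root.findall("atom:entry", NS):
        props = entry.find("atom:content/m:properties", NS)
        if props is None:
            continue  # Skip entries without inline properties.
        row = {}
        for prop in props:
            # ElementTree tags look like "{namespace}LocalName"; keep the local name.
            name = prop.tag.split("}", 1)[-1]
            row[name] = None if prop.get(M_NULL) == "true" else prop.text
        rows.append(row)
    return rows


# Hypothetical sample feed with a single row.
SAMPLE_FEED = """<feed xmlns="http://www.w3.org/2005/Atom"
      xmlns:m="http://schemas.microsoft.com/ado/2007/08/dataservices/metadata"
      xmlns:d="http://schemas.microsoft.com/ado/2007/08/dataservices">
  <entry>
    <content type="application/xml">
      <m:properties>
        <d:PathSuffix>data.csv</d:PathSuffix>
        <d:Length>1024</d:Length>
        <d:Owner m:null="true"/>
      </m:properties>
    </content>
  </entry>
</feed>"""
```

For the sample, `parse_atom_feed(SAMPLE_FEED)` returns `[{"PathSuffix": "data.csv", "Length": "1024", "Owner": None}]`.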

Authorize API Server Users

After determining the OData services you want to produce, authorize users by clicking Settings -> Users. The API Server uses authtoken-based authentication and supports the major authentication schemes. Access can also be restricted by IP address; by default, only connections from the local machine are allowed. You can also use SSL to authenticate and encrypt connections.
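
The Power Pivot steps below sign in with Basic authentication, using an API Server user's name as the User Id and that user's authtoken as the Password. A client outside Excel can send the same credentials in an `Authorization` header. The Python sketch below assumes the API Server accepts this Basic user/authtoken pair, as those steps describe; the helper names and the user name are hypothetical. Because Basic credentials are only base64-encoded, not encrypted, send them over SSL whenever the server is not on the local machine.

```python
import base64
from typing import Dict
from urllib.request import Request


def basic_auth_header(user: str, authtoken: str) -> Dict[str, str]:
    """Build an HTTP Basic Authorization header with the authtoken as the password."""
    # Basic auth is base64("user:password"); it encodes but does not encrypt.
    credentials = f"{user}:{authtoken}".encode("utf-8")
    return {"Authorization": "Basic " + base64.b64encode(credentials).decode("ascii")}


def authorized_request(url: str, user: str, authtoken: str) -> Request:
    """Prepare (but do not send) a GET request carrying the Basic credentials."""
    return Request(url, headers=basic_auth_header(user, authtoken), method="GET")
```

Passing `authorized_request("http://localhost:8032/api.rsc", "excel_user", "<authtoken>")` to `urllib.request.urlopen` would then send the request with the user's credentials.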

Import HDFS Tables in Power Pivot

Follow the steps below to import tables that can be refreshed on demand:

  1. In Excel, click the PowerPivot Window icon in the PowerPivot tab to open PowerPivot.
  2. Click Home -> Get External Data -> From Data Service -> From OData Data Feed.
  3. Add authentication parameters: click Advanced and set the Integrated Security option to Basic. Set User Id to a user who has access to the CData API Server, and set Password to that user's authtoken.

  4. In the Base URL box, enter the OData URL of the CData API Server, for example, http://localhost:8032/api.rsc.

  5. Select which tables you want to import and click Finish.

  6. You can now work with HDFS data in Power Pivot.
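
Power Pivot builds its requests from the Base URL entered in step 4. To check what a table will return on refresh, or to pull a subset of it outside Excel, you can address the same feed directly. The Python sketch below assumes that each resource selected under Settings -> Resources is exposed at `<Base URL>/<resource name>` and that the API Server honors the standard OData system query options `$select`, `$filter`, and `$top`; the resource name `Files` and its columns are hypothetical.

```python
from typing import Iterable, Optional
from urllib.parse import quote


def odata_query_url(
    base_url: str,
    resource: str,
    select: Optional[Iterable[str]] = None,
    filter_expr: Optional[str] = None,
    top: Optional[int] = None,
) -> str:
    """Build an OData URL for one resource with optional $select, $filter and $top."""
    url = f"{base_url.rstrip('/')}/{quote(resource, safe='')}"
    options = []
    if select:
        # Column names are comma-separated; keep the commas readable.
        options.append("$select=" + quote(",".join(select), safe=","))
    if filter_expr:
        # Filter expressions contain spaces and quotes, so encode them fully.
        options.append("$filter=" + quote(filter_expr, safe=""))
    if top is not None:
        options.append(f"$top={int(top)}")
    return url + ("?" + "&".join(options) if options else "")
```

Sent with the Basic credentials from step 3, a URL such as `http://localhost:8032/api.rsc/Files?$top=10` should return at most ten rows, provided the server supports `$top`.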
