CAPI SNAP: The Simplest Way for Developers to Adopt CAPI

By Bruce Wile, CAPI Chief Engineer and Distinguished Engineer, IBM Power Systems

Last week at OpenPOWER Summit Europe, we announced a brand-new framework designed to make it easy for developers to begin using CAPI to accelerate their applications. The CAPI Storage, Network, and Analytics Programming Framework, or CAPI SNAP, was developed through a multi-company effort by OpenPOWER members and is now in alpha testing with multiple early-adopter partners.

But what exactly puts the “snap” in CAPI SNAP? To answer that, I want to give you a deeper look into the magic behind CAPI SNAP. The framework extends the CAPI technology by simplifying both the API (the call to the accelerated function) and the coding of the accelerated function itself. By using CAPI SNAP, your application gains performance both from FPGA acceleration and from moving compute resources closer to the vast amounts of data.

A Simple API

ISVs will be particularly interested in the programming enablement in the framework. The framework API makes it a snap for an application to call for an accelerated function. The innovative FPGA framework logic implements all the computer engineering interface logic, data movement, caching, and pre-fetching work—leaving the programmer to focus only on the accelerator functionality.

Without the framework, an application writer must create a runtime acceleration library to perform the tasks shown in Figure 1.

Figure 1: Calling an accelerator using the base CAPI hardware primitives

But now with CAPI SNAP, an application merely needs to make a function call, as shown in Figure 2. This simple API takes the source data (address/location), the specific accelerated action to perform, and the destination (address/location) for the resulting data.

Figure 2: Accelerated function call with CAPI SNAP

The framework takes care of moving the data to the accelerator and putting away the results.

Moving the Compute Closer to the Data

The simplicity of the API parameters is elegant and powerful. Not only can source and destination addresses be coherent system memory locations, but they can also be attached storage, network, or memory addresses. For example, if a framework card has attached storage, the application could source a large block (or many blocks) of data from storage, perform an action such as a search, intersection, or merge function on the data in the FPGA, and send the search results to a specified destination address in main system memory. This method has large performance advantages compared to the standard software method as shown in Figure 3.

Figure 3: Application search function in software (no acceleration framework)

Figure 4 shows how the source data flows into the accelerator card via the QSFP+ port, where the FPGA performs the search function. The framework then forwards the search results to system memory.

Figure 4: Application with accelerated framework search engine

The performance advantages of the framework are twofold:

  1. By moving the compute (in this case, search) closer to the data, the FPGA has a higher bandwidth access to storage.
  2. The accelerated search on the FPGA is faster than the software search.

Table 1 shows a 3x performance improvement between the two methods. By moving the compute closer to the data, the FPGA has a much higher ingress (or egress) rate versus moving the entire data set into system memory.

Table 1: Searching 100GB of data, CAPI SNAP framework vs. software-only

                              POWER + CAPI SNAP Framework     Software-only
  Ingest 100GB of data        Two 100Gb/s ports: 4 seconds    One PCIe Gen3 x8 NIC: 12.5 seconds
  Perform search              <1us                            <100us
  Results to system memory    <400ns                          0
  Total                       4.0000014 seconds               12.5001 seconds

Simplified Programming of Acceleration Actions

The programming API isn’t the only simplification in CAPI SNAP. The framework also makes it easy to program the “action code” on the FPGA. The framework takes care of retrieving the source data (whether it’s in system memory, storage, the network, etc.) as well as sending the results to the specified destination. The programmer, writing in a high-level language such as C/C++ or Go, needs only to focus on their data transform, or “action.” Framework-compatible compilers translate the high-level language to Verilog, which is then synthesized using Xilinx’s Vivado toolset.

With CAPI SNAP, the accelerated search code (searching for one occurrence) is this simple:

for (i = 0; i < Search.text_size; i++) {
    if (buffer[i] == Search.text_string) {
        Search.text_found_position = i;
    }
}

The open source release will include multiple, fully functional example accelerators to provide users with the starting points and the full port declarations needed to receive source data and return destination data.

Make CAPI a SNAP

Are you looking to explore CAPI SNAP for your organization’s own data analysis? Then apply to be an early adopter of CAPI SNAP by emailing us directly at capi@us.ibm.com. Be sure to include your name, organization, and the type of accelerated workloads you’d like to explore with CAPI SNAP.

You can also read more about CAPI and its capabilities in the accelerated enterprise in our CAPI series on the OpenPOWER Foundation blog.

You will continue to see a drumbeat of activity around the framework, as we release the source code and add more and more capabilities in 2017.

OpenPOWER Makes FPGA Acceleration a “SNAP”

By Bruce Wile, CAPI Chief Engineer and Distinguished Engineer, IBM

Improving on the CAPI Base Technology

In the datacenter, metrics matter. Competition between application providers is fierce, with pressure to deliver benchmarks that show continued competitive advantages in performance, price, and power. Application-level improvements rode the Moore’s Law performance curve for decades; they now require accelerator innovations to deliver the gains needed to retain current clients and win new business. FPGA acceleration has long been an option, but its difficult programming model and the specialized computer engineering skills it demands have kept FPGAs out of mainstream datacenters.

The biggest companies see this trend and have put significant resources into integrating FPGAs into the datacenter. But enabling FPGA acceleration for the masses has been a challenge. OpenPOWER’s Accelerator Workgroup is changing that.

The CAPI infrastructure, introduced on POWER8 in 2014, provides the technology and ecosystem foundation to enable datacenter applications to integrate with FPGA acceleration.  The technology base has everything needed to support the datacenter—virtualization (for multiple simultaneous context calls), a threaded model (for programming ease), removal of the device driver overhead (performance enablement), and an open ecosystem (for the masses to build upon).

As a result, FPGA experts around the world created CAPI accelerators, many of which are listed at ibm.biz/powercapi_examples. These are creative, compelling acceleration algorithms that open doors to capabilities previously beyond reach.

Check out “Facial analysis for emotion detection” (ibm.biz/powercapi_SS_emotionDetect) from SiliconScapes for a slick example.

But there’s still a skills gap between the FPGA experts (computer engineers) and the programming experts working for most Independent Software Vendors (ISVs).  For FPGAs to deliver on their promise of higher performance at lower cost and lower power, we need further enablement for ISVs to embrace FPGA acceleration.

“Extending the capability of the CAPI device will offer our engineers and ultimately our users more options for working efficiently with complex connected data,” explains Philip Rathle, VP of Products at OpenPOWER member Neo4j.

Accelerating Acceleration

Enter OpenPOWER and the Accelerator Workgroup.  At April 2016’s OpenPOWER Summit, multiple companies agreed to create a framework around CAPI. Two significant directives drove the work effort that followed:

  1. The framework would make it easy for programmers to call accelerators and write their own acceleration IP.
  2. The framework would be open source to enable continued enhancements and cross-company collaboration.

Collaboration grew for building the framework, with significant contributions from IBM, Xilinx, Rackspace, Eideticom, Reconfigure.io, Alpha-Data, and Nallatech. Each company brought unique skills and perspectives to the effort, with a common goal of releasing the first version of the open source framework by the end of 2016.

Bringing Developers CAPI in a SNAP!

Today, at OpenPOWER Summit Europe, we are announcing the CAPI Storage, Networking, and Acceleration Programming Framework, or CAPI SNAP Framework. The framework fulfills the team’s initial vision and will grow beyond the first release. Upon release, the framework, including source code, will be available for anyone to try via GitHub.

The framework is key for developers or anyone else looking to bring the power of FPGA acceleration to their data center. CAPI SNAP will:

  • Make it easy for developers to create new specialized algorithms for FPGA acceleration in high-level programming languages, like C++ and Go, instead of less user-friendly languages like VHDL and Verilog.
  • Make FPGA acceleration more accessible to ISVs and other organizations to bring faster data analysis to their users.
  • Leverage the OpenPOWER Foundation ecosystem to continually drive collaborative innovation.

Levyx Chief Business Development Officer Bernie Wu already sees how CAPI SNAP can make an impact for ISVs. “Levyx is focused on accelerating Big Data analytical and transactional operations to real-time velocities. The CAPI SNAP Framework will allow us to bring processing even closer to the data and simplify the programming model for this acceleration,” he says, adding, “We see the CAPI SNAP capability being used initially to boost or enable rich real-time analytics and stream processing in a variety of increasingly machine-to-machine driven use cases.”

Learn More and Try CAPI SNAP for Yourself!

For those interested in the CAPI SNAP Framework, we encourage you to watch for announcements at the OpenPOWER Summit Europe.  You can also read more about CAPI and its capabilities in the accelerated enterprise in our CAPI series on the OpenPOWER Foundation blog.

Are you looking to explore CAPI SNAP for your organization’s own data analysis? Then apply to be an early adopter of CAPI SNAP by emailing us directly at capi@us.ibm.com. Be sure to include your name, organization, and the type of accelerated workloads you’d like to explore with CAPI SNAP.

You will continue to see a drumbeat of activity around the framework, as we release the source code and add more and more capabilities in 2017.

Additional CAPI SNAP Reading from OpenPOWER Members

Alpha-Data: http://www.alpha-data.com/news.php