HP Leverages Texas Instruments ARM based DSPs over a RapidIO 2D Torus Unified Fabric to meet Performance Critical Analytics needs at PayPal
This blog post is based on an article written by Tiffany Trader at hpcwire.com, with some edits and added emphasis on RapidIO as the underlying unified fabric in the system described below. The original article on the HPCwire site can be found here.
The system described here uses HP’s ProLiant m800 cartridges, each powered by four of TI’s 66AK2Hx processors. Each processor integrates eight C66x DSP cores and four ARM Cortex-A15 cores using TI’s KeyStone II architecture, and the cartridges communicate over a unified RapidIO fabric in HP’s Moonshot server platform. The 66AK2Hx SoC is well suited to this type of application because it has several advantages for real-time processing.
RapidIO 2D Torus Unified Fabric
The HP Moonshot System uses existing Moonshot 1500 Chassis connections to implement a 2D Torus Mesh Fabric, providing a high speed general purpose interface among the cartridges for those applications that benefit from high bandwidth node-to-node communication.
The RapidIO 2D Torus unified fabric is routed in a torus ring configuration connecting up to 45 m800 cartridges. Each cartridge has 5 Gbps per lane connections in each direction to its north, south, east and west neighbors. This lets the HP Moonshot System serve HPC applications that need efficient localized traffic.
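To make the neighbor relationship concrete, here is a minimal sketch of 2D torus addressing. The 3 x 15 grid, the slot numbering, and the function names are assumptions made for illustration; the article only states that up to 45 cartridges connect to north, south, east and west neighbors.

```python
# Assumed layout (not from the article): 45 slots arranged as 3 rows x 15 columns,
# numbered row by row starting at 0.
ROWS, COLS = 3, 15


def torus_neighbors(slot, rows=ROWS, cols=COLS):
    """Return the north/south/east/west neighbor slots of `slot` on a 2D torus."""
    if not 0 <= slot < rows * cols:
        raise ValueError(f"slot {slot} is outside the {rows}x{cols} grid")
    r, c = divmod(slot, cols)
    # Modulo arithmetic wraps the edges around, which is what makes it a torus.
    return {
        "north": ((r - 1) % rows) * cols + c,
        "south": ((r + 1) % rows) * cols + c,
        "east": r * cols + (c + 1) % cols,
        "west": r * cols + (c - 1) % cols,
    }


def torus_hops(a, b, rows=ROWS, cols=COLS):
    """Minimum number of neighbor-to-neighbor hops between two slots."""
    ra, ca = divmod(a, cols)
    rb, cb = divmod(b, cols)
    dr = abs(ra - rb)
    dc = abs(ca - cb)
    # In each dimension, take the shorter way around the ring.
    return min(dr, rows - dr) + min(dc, cols - dc)
```

Under this assumed layout, opposite corners of the grid are only two hops apart, because each dimension wraps around. That wraparound is why a torus suits localized node-to-node traffic.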
The HP ProLiant m800 is the highest density DSP solution in an industry standard infrastructure in the market today, with 1,440 DSP cores, 720 ARM cores and up to 11.5 TB of storage in a single Moonshot chassis connected via a unified 5 Gbps per lane RapidIO fabric.
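The chassis totals follow from the per-cartridge figures given earlier: 45 cartridges, four SoCs per cartridge, and eight DSP cores plus four ARM cores per SoC. A small sketch of that arithmetic, with constant and function names chosen here for illustration:

```python
# Figures taken from the text above; the names are illustrative only.
CARTRIDGES_PER_CHASSIS = 45
SOCS_PER_CARTRIDGE = 4
DSP_CORES_PER_SOC = 8
ARM_CORES_PER_SOC = 4


def chassis_core_counts(cartridges=CARTRIDGES_PER_CHASSIS):
    """Return SoC, DSP core and ARM core totals for a given cartridge count."""
    socs = cartridges * SOCS_PER_CARTRIDGE
    return {
        "socs": socs,
        "dsp_cores": socs * DSP_CORES_PER_SOC,
        "arm_cores": socs * ARM_CORES_PER_SOC,
    }


if __name__ == "__main__":
    print(chassis_core_counts())
```

For a fully populated chassis this gives 180 SoCs, 1,440 DSP cores and 720 ARM cores.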
At the 2014 HPC User Forum in Seattle, Ryan Quick and Arno Kolster from PayPal described how the company is using HPC to transform its chaotic real-time server data into intelligent, actionable insight. The unique “Systems Intelligence” approach uses HP’s Moonshot server powered by TI processors to aggregate, analyze and act on transaction data in real time. A video of the PayPal presentation at the HPC User Forum is shown below.
The goal for PayPal is to detect patterns and anomalies and act on them before the user experience is negatively impacted. The main challenge is doing this in real time, as PayPal needs to process some 3 million events per second from thousands of sources in its datacenter. Source events include application logs, machine data, environmental data from the datacenters, and social media events. There are about 25 terabytes of data coming in per hour, 20 megabytes per second of machine data from thousands of PayPal servers, some 50,000 metadata relationships, and an ever-increasing tide of social media trends and customer interactions to consider.
The Systems Intelligence flow architecture shares many similarities with a PayPal fraud detection system that was also built on HPC principles. It’s fairly simplistic, says Kolster, but when it gets down to the actual deployment, it becomes much more complex. All the source event data gets thrown on a huge bus in real time and ingested by app servers, which are doing inline processing. There are complex event processors on each of those application servers, and a huge shared memory event window with the SGI UV2000. The event stream is augmented with offline databases, both relational and graph databases. The machine learning element pushes new models back into the application servers. An alerting and notification system is used for problem remediation.
PayPal’s exploration of HPC started as far back as 2006. As Ryan Quick, Principal Architect in the Advanced Technology Group at PayPal, explains, “Our job is looking at the next best thing.” Quick and Kolster started shopping in HPC because they had a set of problems, especially around real-time, that weren’t being met by the tools they could acquire from their regular channels.
“There’s a weird gray area where your needs aren’t being met in the enterprise, but HPC is still a little too bleeding-edge,” adds Quick.
In discussing how they decided upon the HP-TI platform, Quick recalls looking at Kolster and saying, “what they’ve done here is build an HPC cluster and they put it on a system on a chip.” The KeyStone multicore processors provided a powerful combination of four ARM Cortex-A15 cores, eight C66x DSPs, plus internal fabric and networking capabilities.
As RapidIO Steering Member Texas Instruments explains here, the 66AK2Hx SoC running in HP’s Moonshot platform has several advantages for real-time processing, including:
1) C66x DSP cores that have great signal processing performance as well as very low latency response times and can receive, process, and return packet data very quickly.
2) An integrated I/O fabric that moves data quickly and with low latency. The 66AK2Hx I/O fabric uses RapidIO, which has 10x lower hop-to-hop latency than Ethernet I/O.
3) Additional KeyStone II architecture elements such as the Multicore Navigator and TeraNet which further enable low latency data movement within and across devices.
The new PayPal platform essentially treats Complex Event Processing as digital signal processing. “You turn the data into a signal that can be analyzed in hardware,” says Quick. And with eight DSPs per SoC, they can ingest many signals at once and then run pattern recognition against all of them simultaneously. The system is also quite efficient: it runs at 55 watts per cartridge (4 SoCs/cartridge) and delivers an impressive 11.2 gigaflops-per-watt. As a point of comparison, the most energy-efficient supercomputer in the world according to the most recent edition of the Green500 list, TSUBAME-KFC in Japan, offers a more modest 4.4 gigaflops-per-watt.
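As a rough illustration of what turning event data into a signal can mean, the sketch below bins event timestamps into per-interval counts and then slides a template over the resulting signal using normalized correlation. This is not PayPal's implementation, which the article does not describe. The function names, the bin width and the 0.9 threshold are all assumptions, and the code runs on a general-purpose CPU rather than on the C66x DSPs.

```python
from math import sqrt


def events_to_signal(timestamps, bin_width, n_bins):
    """Count events per fixed-width time bin, giving a discrete signal."""
    signal = [0] * n_bins
    for t in timestamps:
        idx = int(t // bin_width)
        if 0 <= idx < n_bins:  # ignore events outside the observed range
            signal[idx] += 1
    return signal


def normalized_correlation(window, template):
    """Pearson-style similarity in [-1, 1]; returns 0.0 if either input is flat."""
    n = len(template)
    mean_w = sum(window) / n
    mean_t = sum(template) / n
    num = sum((w - mean_w) * (t - mean_t) for w, t in zip(window, template))
    den = sqrt(
        sum((w - mean_w) ** 2 for w in window)
        * sum((t - mean_t) ** 2 for t in template)
    )
    return num / den if den else 0.0


def find_pattern(signal, template, threshold=0.9):
    """Return start indices where the signal's shape matches the template."""
    n = len(template)
    return [
        i
        for i in range(len(signal) - n + 1)
        if normalized_correlation(signal[i:i + n], template) >= threshold
    ]


if __name__ == "__main__":
    # Hypothetical event timestamps in seconds, binned at one-second resolution.
    stamps = [0.1, 1.2, 1.5, 2.1, 2.2, 2.3, 3.4, 3.6, 4.5]
    sig = events_to_signal(stamps, bin_width=1.0, n_bins=6)
    print(sig, find_pattern(sig, [1, 2, 3]))
```

Each source could produce its own signal in this way, and the same correlation could then be run across many signals in parallel. That is the kind of workload the article says the eight DSP cores per SoC are used for.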
The application is currently available to the public through Texas Instruments. The product includes OpenCL, the full development kit and the software.