Written by Joel Engardio for MapD CEO Todd Mostak
From Video Games to Game-Changer for Data Insights
How MapD’s very fast GPU-accelerated SQL query engine and visual analytics platform transforms industries
By Todd Mostak
I asked the audience a pressing question at the Quandl Alternative Data Conference: “How many of you know you can do more with GPUs than play Doom or Quake?”
The crowd laughed as they raised their hands. Many indeed have seen the emergence of GPUs as an alternative compute platform. While graphics processing units and graphics cards were originally designed to play video games, it didn’t take long for gamers to see the greater potential of GPUs in their work lives.
People realized that rendering pixels to a screen was a fundamentally parallelizable problem. They also realized that the same hardware could run all sorts of very fast computations. GPUs have thousands of cores that work in parallel, which makes them good for many tough computational problems.
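To make "fundamentally parallelizable" concrete, here is a minimal Python sketch. It is not MapD code; the `shade` function and the image size are made up for illustration. Each pixel's value depends only on its own coordinates, so rows can be handed to separate workers in any order and the image comes out the same.

```python
from concurrent.futures import ThreadPoolExecutor

WIDTH, HEIGHT = 8, 4  # tiny hypothetical image


def shade(x, y):
    """Hypothetical per-pixel function: depends only on (x, y)."""
    return (x * 31 + y * 17) % 256


def render_row(y):
    """Render one row; it needs no data from any other row."""
    return [shade(x, y) for x in range(WIDTH)]


def render_serial():
    """Render the rows one after another."""
    return [render_row(y) for y in range(HEIGHT)]


def render_parallel(workers=4):
    """Render the rows on a pool of workers; map() keeps row order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(render_row, range(HEIGHT)))
```

The Python threads here only illustrate independence, not speed. A GPU applies the same idea with thousands of hardware threads instead of a handful of software ones.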
So what is the real-world application of this GPU power? While I’m not a quant expert and MapD doesn’t provide trading strategies or data, the MapD platform helps people find alpha in the markets, explore large data sets and test hypotheses in real time.
MapD is an open source SQL engine that runs on multiple GPUs per server and multiple servers per cluster. It can run queries over billions or tens of billions of records in milliseconds.
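For readers who want a picture of the kind of query involved, here is a small sketch of a standard SQL aggregation. It uses Python's built-in `sqlite3` module purely as a stand-in, not MapD itself, and the `trades` table, its columns and its three rows are hypothetical. The point is only the shape of the query: group, count and aggregate over every row.

```python
import sqlite3

# In-memory stand-in database; table and data are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE trades (symbol TEXT, price REAL, volume INTEGER)")
conn.executemany(
    "INSERT INTO trades VALUES (?, ?, ?)",
    [("AAA", 10.0, 100), ("AAA", 12.0, 300), ("BBB", 50.0, 200)],
)

# A typical analytic query: per-symbol trade count and
# volume-weighted average price (VWAP).
QUERY = """
SELECT symbol,
       COUNT(*) AS n_trades,
       SUM(price * volume) / SUM(volume) AS vwap
FROM trades
GROUP BY symbol
ORDER BY symbol
"""

rows = conn.execute(QUERY).fetchall()
for symbol, n_trades, vwap in rows:
    print(symbol, n_trades, vwap)
```

On three rows any database answers instantly. The interesting case is the same query shape over billions of rows, where every row has to be scanned and the work can be split across many cores.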
This helps companies get insights quickly and deploy those insights in the market faster. The financial services industry, particularly investment banks and hedge funds, has benefited as firms look for an edge in the large data sets they use.
MapD happens to do a lot of geospatial analytics now. But when I started MapD with my research at MIT, it was originally meant as a very fast database with an extension of fast visual analytics on top.
MapD solves the problem of visualizing massive data sets. At the Quandl conference, I showed the audience how to find important consumer behavior insights buried in a data set with millions of rows and 260 different features. I also demonstrated how to identify the strongest variables in this data set with an eye toward training and segmentation models.
This is all possible thanks to a slowdown in CPU performance gains, a rise in data and the emergence of the GPU. It’s worth noting some history to map how MapD came to be.
Until the early 2000s, CPUs were getting 30 to 40 percent faster every year. Then engineers hit a speed wall. It’s been difficult to sustain those performance gains and today there might be only 15 to 20 percent growth in CPU processing power annually.
At the same time, the growth of data has exploded. And not just from traditional sources like enterprise CRM data. Now data is piling up from point of sale systems, credit card spending and social media usage. Not to mention all the sensors and telemetry from cars and phones. Basically, anything that’s an asset and moves is likely sending data, to the point that people are drowning in it.
But people know there is a lot of value in this data if they can just pull the insights out of it.
That’s where GPUs come in. Because GPUs use a different processing paradigm than CPUs, their performance has been able to grow more than 50 percent each year. This unprecedented pace gives people a new weapon to keep up with data.
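The gap between those growth rates compounds quickly. Here is a quick back-of-the-envelope calculation using the percentages quoted in this article; the ten-year horizon is my own assumption, chosen only for illustration.

```python
def compound_gain(annual_rate, years):
    """Total speedup after `years` of steady `annual_rate` growth."""
    return (1 + annual_rate) ** years


YEARS = 10  # assumed horizon, for illustration only

cpu_low = compound_gain(0.15, YEARS)   # CPUs at 15 percent per year
cpu_high = compound_gain(0.20, YEARS)  # CPUs at 20 percent per year
gpu = compound_gain(0.50, YEARS)       # GPUs at 50 percent per year

print(f"CPU: {cpu_low:.1f}x to {cpu_high:.1f}x, GPU: {gpu:.1f}x")
```

Under these assumptions a decade yields roughly a 4x to 6x gain for CPUs and roughly a 58x gain for GPUs.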
For example, a CPU server might have 10 or 20 very fast cores. Compare that to a GPU server with more than 40,000 cores. Even though the GPU cores aren’t as fast or smart as the CPU cores, you have a massive amount of parallelism at your disposal.
That’s where MapD comes in. Would you rather have 10 or 20 knights or 40,000 angry peasants? People have been able to transform their industries by having so much power at their disposal.
Scientists were the first to leverage this parallel power for things like protein folding and simulating nuclear explosions. Then the machine learning industry benefited, with GPUs accelerating a leap in deep learning and neural networks.
I’m convinced the final frontier is being able to use GPUs for general purpose data analytics. These GPUs not only have a ton of computational bandwidth, but they also have a vast memory bandwidth that allows them to scan terabytes of data per second.
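Memory bandwidth is what turns "billions of records" into "milliseconds." A rough estimate shows why. The row count, column width and bandwidth below are assumed round numbers rather than measurements of any particular system, and the estimate ignores everything except reading the column from memory.

```python
def scan_seconds(rows, bytes_per_value, bandwidth_bytes_per_s):
    """Lower-bound time to read one column once at a given bandwidth."""
    return rows * bytes_per_value / bandwidth_bytes_per_s


ROWS = 10_000_000_000   # assumed: 10 billion records
BYTES_PER_VALUE = 4     # assumed: one 4-byte column
BANDWIDTH = 1e12        # assumed: 1 terabyte per second in aggregate

seconds = scan_seconds(ROWS, BYTES_PER_VALUE, BANDWIDTH)
print(f"{seconds * 1000:.0f} ms")
```

With these assumptions, scanning a 40-gigabyte column takes about 40 milliseconds, which is fast enough to feel interactive.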
I think that makes GPUs an excellent tool for interactive data analysis and query. And that’s what I built MapD for.