One of the main buzzwords to emerge from the tech world over the past two years has been “big data.”
But from a non-techie perspective, what exactly is big data? And how does it work?
I sat down the other day with Barry Eggers, the Managing Director of Lightspeed Venture Partners, to discuss his unique approach to the subject of big data. He likes to explain its impact by focusing on how it is beginning to transform major league baseball.
Besides, now that the holidays are past, if you squint hard enough, you can see spring training out there near the edge of the horizon already. So it’s time to talk baseball!
“I've been told that every MLB team has a data scientist,” Eggers told me. “When I heard that one of them, a small market team that gets a lot of wins for its investment, the Tampa Bay Rays, uses the Hadoop Cluster–the same tech that Google uses–I knew we had truly entered the era of big data for baseball.”
What he was referring to is Hadoop, an open-source framework modeled on the approach Google and others use to parse large amounts of data quickly by spreading the work across many machines.
“It can be understood by the way Google returns search results,” Eggers continued. “If you can index the entire web you can get better results. When you type in keywords at Google, it sends the words to thousands of machines in parallel and gets back results almost instantly; by breaking a problem into small pieces and doing parallel analysis you get more accurate results more quickly. That's what's behind the innovation in big data.”
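For readers who want to see the idea Eggers describes, here is a toy sketch in Python of "breaking a problem into small pieces and doing parallel analysis." It is not Google's or Hadoop's actual code. The function names and the sample lines are invented for illustration, and it uses threads on one computer where a real cluster would use many machines.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def count_words(chunk):
    """Map step: count the words in one small piece of the data."""
    return Counter(word.lower() for line in chunk for word in line.split())


def parallel_word_count(lines, pieces=4):
    """Split the lines into pieces, count each piece in parallel, then merge."""
    # Deal the lines out into `pieces` roughly equal chunks.
    chunks = [lines[i::pieces] for i in range(pieces)]
    with ThreadPoolExecutor(max_workers=pieces) as pool:
        partial_counts = list(pool.map(count_words, chunks))
    # Reduce step: combine the partial answers into one total.
    total = Counter()
    for partial in partial_counts:
        total.update(partial)
    return total


# Hypothetical sample data, invented for illustration.
sample_lines = [
    "strike ball strike",
    "ball hit strike",
    "hit hit run",
]
word_totals = parallel_word_count(sample_lines, pieces=2)
```

Each worker sees only its own slice of the data, and the final answer comes from adding the partial counts together. That split-then-combine shape is what lets the same job scale from three lines of text to years of records.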
He is talking not just about current information but also about the databases that have kept growing since the world started going digital two decades ago.
“What's exciting is we can not only look at today's data but look back at some 20 years of data and process it right away.”
So let’s turn back to baseball. As any casual fan or reader of "Moneyball" knows, baseball has long been a game heavy on statistics. We have batting averages, earned run averages, on-base percentages, slugging percentages, strikeout-to-walk ratios, fielding percentages, and, in more recent times, more exotic stats like OPS and WHIP.
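For the curious, most of these numbers are simple arithmetic on box-score counts. The short Python sketch below uses the standard definitions of each stat. The function names and the season totals fed into them are hypothetical, made up purely to show the math.

```python
def batting_average(hits, at_bats):
    """Hits divided by at-bats."""
    return hits / at_bats


def on_base_percentage(hits, walks, hit_by_pitch, at_bats, sac_flies):
    """Times on base divided by plate appearances that count toward OBP."""
    return (hits + walks + hit_by_pitch) / (at_bats + walks + hit_by_pitch + sac_flies)


def slugging_percentage(singles, doubles, triples, home_runs, at_bats):
    """Total bases divided by at-bats."""
    total_bases = singles + 2 * doubles + 3 * triples + 4 * home_runs
    return total_bases / at_bats


def ops(obp, slg):
    """On-base plus slugging."""
    return obp + slg


def whip(walks, hits_allowed, innings_pitched):
    """Walks plus hits allowed per inning pitched."""
    return (walks + hits_allowed) / innings_pitched


def earned_run_average(earned_runs, innings_pitched):
    """Earned runs allowed per nine innings."""
    return 9 * earned_runs / innings_pitched


# Hypothetical season for an invented hitter: 500 at-bats, 150 hits
# (95 singles, 30 doubles, 5 triples, 20 home runs), 50 walks,
# 5 hit-by-pitches, 5 sacrifice flies.
avg = batting_average(150, 500)
obp = on_base_percentage(150, 50, 5, 500, 5)
slg = slugging_percentage(95, 30, 5, 20, 500)
hitter_ops = ops(obp, slg)

# Hypothetical season for an invented pitcher: 200 innings, 160 hits,
# 40 walks, 70 earned runs.
pitcher_whip = whip(40, 160, 200)
pitcher_era = earned_run_average(70, 200)
```

With these made-up totals the hitter bats .300 with an OPS of about .866, and the pitcher carries a 1.00 WHIP and a 3.15 ERA.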
But these are all based on structured data like wins, at-bats, hits, errors, runs, and so on.
What excites Eggers is all the new data, much of it unstructured, being generated in modern ballparks wired with cameras and technologies monitoring every play from every angle, all game long.
“Now ballparks are outfitted to track more unstructured data like the trajectory of the baseball. Cameras are in place all over the ballpark to capture data that never was captured before like the trajectory of every pitch or the path to the ball taken by an outfielder. There is a 3D trajectory of every ball hit in that stadium. Along with wind and weather conditions, humidity, the dimensions of the park. That's a lot of data.”
It's what you might call “big data,” and it's exactly the kind of data that data scientists are trained to sort into meaningful information with algorithms.
One example that has been brought into game broadcasts already is the K-Zone – a virtual yellow box superimposed on your TV screen that indicates whether any given pitch passed through the strike zone or not.
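At its simplest, the question behind that yellow box is whether a point falls inside a rectangle. The Python sketch below is a simplified illustration, not the broadcast system's actual method. It treats the zone as a flat rectangle at the front of the plate and ignores the width of the ball. The class name and the zone heights are assumptions, since the real top and bottom vary with each batter. The one fixed number is the width of home plate, 17 inches.

```python
from dataclasses import dataclass

PLATE_WIDTH_IN = 17.0  # Home plate is 17 inches wide.


@dataclass
class StrikeZone:
    """A simplified, flat strike zone; heights are in inches above the ground."""

    bottom_in: float
    top_in: float

    def contains(self, x_in, z_in):
        """Return True if the pitch crossed the plate inside the zone.

        x_in is the horizontal distance from the center of the plate,
        z_in is the height of the ball as it crosses the plate.
        """
        half_width = PLATE_WIDTH_IN / 2
        inside_horizontally = -half_width <= x_in <= half_width
        inside_vertically = self.bottom_in <= z_in <= self.top_in
        return inside_horizontally and inside_vertically


# Hypothetical zone for an invented batter: knees at 18 inches, top at 42.
zone = StrikeZone(bottom_in=18.0, top_in=42.0)
down_the_middle = zone.contains(0.0, 30.0)  # centered, belt high
off_the_plate = zone.contains(10.0, 30.0)   # 10 inches wide of center
too_high = zone.contains(0.0, 50.0)         # over the plate but above the zone
```

The hard part in a real ballpark is not this check but measuring where the ball actually was, which is where the cameras come in.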
Over time, as data scientists employed by teams gather more and more information about pitch selection, pitch trajectory, a batter’s decision to hit, the ball’s trajectory, a fielder’s path to the ball, and so on, one can imagine a manager like Bruce Bochy flashing signs to Buster Posey, who relays them to Matt Cain, based on algorithmic predictions of what is most likely to get the current hitter out on the next pitch or sequence of pitches.
Bench Coach Ron Wotus, meanwhile, will be able to better position his defensive lineup in anticipation of which parts of the field the ball is most likely to go to, based on a data scientist’s current analysis of all those factors.
Eggers believes that as this kind of big data crunching takes hold in major league baseball, the initial advantage will go to the defense.
Of course, professional coaches and players already make all of these decisions in real time, based on a combination of extensive statistical analysis and what they feel in their gut.
But what Eggers knows is that data scientists and big data analysts can provide a more scientific basis for those in-game decisions–not to mention those all-important, front-office player acquisition decisions–that help make a team like the Giants so successful.
“The most important acquisition for teams this off-season might be a great data scientist,” says Eggers. “And in that, the home field advantage for A's and Giants is we are at the center of the universe for data scientists.”
Lightspeed is a VC firm based in Menlo Park that has backed more than 200 IT and consumer tech companies, including ShoeDazzle, The Honest Company, Kixeye, TaskRabbit, Riverbed Technology, DataStax, and MapR Technologies.
As for Lightspeed’s investment philosophy, Eggers says the firm tries to identify big themes that are so disruptive that in five years or so, there will be lots of opportunities for lots of players.
As for predicting the future and where a trend is headed (always a tricky proposition), he turns to another sport, hockey, and arguably the greatest hockey player of all time.
“Wayne Gretzky said, ‘A good hockey player plays where the puck is. A great hockey player plays where the puck is going to be.’ What I love about working with great entrepreneurs is their ability to see where the market is going two, three or even five years from now.”