Big Data and Horse Race Results: Inside the World of Predictive Analytics

Oakland A’s general manager Billy Beane became famous for pioneering the use of predictive data in baseball. In 2002, faced with a limited budget and the departure of three key free agents, Beane and his statisticians built a cost-effective roster by betting that a player’s on-base percentage mattered more than batting average, home runs, RBIs, and other traditional stats.

After a disappointing start, the A’s went on a 20-game winning streak later that season. Since 2002, Beane’s A’s have won the AL West five times, and predictive analytics has found its way into every baseball team’s front office.

Big data has slowly taken hold in many sports, including horse racing. The trick, as “Big Data” author Viktor Mayer-Schönberger told the Wall Street Journal, is to measure the right things, not just the things that are easy to quantify. It’s not easy to figure out which data points actually predict a winner. Even as analysts get better at it, they can’t predict any result with 100 percent certainty.

How Big Data Became Possible

The volume of data generated in the world has increased drastically thanks to the Web, mobile technology, and cloud computing. The proliferation of computers, starting in the 1960s, allowed all kinds of organizations, from sports teams to businesses, to start storing large amounts of data.

When people ventured onto the Web in the 1990s, companies like Google began passively collecting information on their browsing habits. Mobile and connected technology, ranging from smartphones to programmable refrigerators, has made it easier than ever to collect data. Cloud computing means that almost anyone can store large amounts of data and harness the computing power to analyze it. Experts estimate that about 90 percent of the world’s stored data was collected in just the past few years.

Predicting a Winner

Analysts can review data points such as how far a horse travels during a race, or how much ground it covers compared with how it finished, to determine whether covering less ground leads to more victories. Intuition says that horses covering the least ground have a better chance of winning, but that might not be true: a horse that covers more ground per race but shows bursts of free and clear speed can erase the disadvantage of the extra distance.
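To make that comparison concrete, here is a minimal Python sketch that splits starts at the median extra ground covered and compares win rates, then splits the wide-running group by whether the horse showed a late burst of speed. The record layout (`Start`, `extra_ground_m`, `late_burst`) and the sample data are hypothetical; measuring ground as extra metres beyond the shortest path is an assumption that lets starts at different race distances be compared.

```python
from dataclasses import dataclass
from statistics import median

@dataclass
class Start:
    """One horse's start in one race (hypothetical record layout)."""
    horse: str
    extra_ground_m: float  # metres run beyond the shortest path around the track
    late_burst: bool       # hypothetical flag: showed a clear burst of speed late
    won: bool

def win_rate(starts):
    """Fraction of the given starts that ended in a win (0.0 for an empty list)."""
    return sum(s.won for s in starts) / len(starts) if starts else 0.0

def compare_ground(starts):
    """Split starts at the median extra ground and compare win rates, then split
    the wide-running group by whether the horse showed a late burst of speed."""
    cutoff = median(s.extra_ground_m for s in starts)
    less = [s for s in starts if s.extra_ground_m <= cutoff]
    more = [s for s in starts if s.extra_ground_m > cutoff]
    return {
        "less_ground": win_rate(less),
        "more_ground": win_rate(more),
        "more_ground_with_burst": win_rate([s for s in more if s.late_burst]),
        "more_ground_no_burst": win_rate([s for s in more if not s.late_burst]),
    }

# Invented sample data, for illustration only.
example = [
    Start("A", 5.0, False, True),
    Start("B", 12.0, True, True),
    Start("C", 3.0, False, False),
    Start("D", 15.0, False, False),
    Start("E", 8.0, True, False),
    Start("F", 2.0, False, True),
]
print(compare_ground(example))
```

On real data, a higher win rate in the `more_ground_with_burst` group than in `more_ground_no_burst` would hint at the offsetting effect described above, though small groups can easily produce misleading rates.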

Analyzing huge sets of horse race data could answer multiple questions, including:

  • How relevant is distance traveled per race to a horse’s winning percentage?
  • Which factors make longer distances less of a problem for certain horses?
  • Does slow and steady truly win the race, or do winning horses follow another racing pattern?

These examples are only the beginning of what big data analytics could do for horse racing. For instance, a computer could compare data on thoroughbred racetracks and race courses with each horse’s past race results to determine which horses are more likely to win at each track. Predictive modeling can also factor in weather, diet, jockey, and even reports of the horse’s mood to estimate each horse’s chance of winning or placing on race day. The challenge is to figure out which factors truly predict which horses will win.
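One standard way to combine several factors into per-horse win probabilities is a multinomial (conditional) logit model: each horse gets a weighted score, and a softmax across the field converts the scores into probabilities that sum to 1. The minimal sketch below assumes three hypothetical factors and made-up weights; a real model would estimate the weights from historical races and would need far more care in choosing which factors to include.

```python
import math

# Hypothetical factor weights. A real model would estimate these from
# historical races (e.g. by maximum likelihood); these values are made up.
WEIGHTS = {
    "track_win_rate": 2.0,   # horse's past win rate at this track (0..1)
    "wet_track_form": 0.5,   # hypothetical form score on wet going (-1..1)
    "jockey_win_rate": 1.5,  # jockey's overall win rate (0..1)
}

def win_probabilities(field, weights=WEIGHTS):
    """Multinomial-logit sketch: score each horse as a weighted sum of its
    factors, then apply a softmax over the field so probabilities sum to 1."""
    scores = {
        horse: sum(weights[name] * value for name, value in factors.items())
        for horse, factors in field.items()
    }
    top = max(scores.values())  # subtract the max score for numerical stability
    exp_scores = {horse: math.exp(s - top) for horse, s in scores.items()}
    total = sum(exp_scores.values())
    return {horse: e / total for horse, e in exp_scores.items()}

# Invented field of three horses, for illustration only.
field = {
    "Horse A": {"track_win_rate": 0.30, "wet_track_form": 0.2, "jockey_win_rate": 0.18},
    "Horse B": {"track_win_rate": 0.10, "wet_track_form": 0.8, "jockey_win_rate": 0.22},
    "Horse C": {"track_win_rate": 0.05, "wet_track_form": -0.5, "jockey_win_rate": 0.12},
}
for horse, p in sorted(win_probabilities(field).items(), key=lambda kv: -kv[1]):
    print(f"{horse}: {p:.1%}")
```

Subtracting the top score before exponentiating does not change the probabilities; it only keeps `math.exp` from overflowing when scores are large.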

Betting the Analytical Way

Race tracks are littered with the dollars of people who choose horses because they just have a feeling about which one will get lucky on any given day. Once a mobile app puts horse racing predictions at people’s fingertips (and it’s only a matter of time until one does), some worry that big data will suck all of the joy out of picking winners.

One of the architects of Oakland’s success, statistician Paul DePodesta, thinks it’s unwise to ignore human judgment completely. Although data analysis can reduce uncertainty, it can’t flawlessly predict a human’s future performance. Today DePodesta crunches numbers for the New York Mets, which rely on a blend of data, traditional scouting, and a player’s development potential when building the roster. Even after years of statistical analysis, DePodesta still thinks there’s a human element to playing moneyball.

Most likely, big data will sweep over horse racing, but it won’t eliminate the thrill of the bet. There’s always something intangible that goes into making a horse a winner — on one particular track, on one particular day.
