The possibilities for Statcast are constrained more by the human imagination than by any technological barrier.
Photo by Aaron Gordon
On a perfect early September evening, Greg Cain, the senior director of sports data at MLB Advanced Media, gazed out at the pristine Fenway Park field from the bar above the right field stands.
I wondered if Cain saw the field the way I saw it, with nine players on defense, one at bat, and a whole lot of jokers in the dugouts, or if he could see what Statcast, MLB's state-of-the-art tracking system, saw: the ball being captured at 40,000 frames per second, in motion slower than human thought can conceive; the players being tracked with unprecedented accuracy by digital eyes; the complexities of each play being distilled into thousands of lines of code, Matrix-style, as the digital and "real" worlds imperceptibly merge like train tracks on the horizon. He probably sees both.
Statcast measures every activity on the field down to the microsecond, including the very spin of the ball as it hurtles towards home plate. This is the first season the technology is active in all 30 stadiums, marking what several MLB employees referred to as the beginning of the Statcast Era. It could revolutionize the way baseball is watched, analyzed, taught, and even thought about, because it will give spectators and general managers alike a view of the game unlike any they've had before.
This season, an extra broadcasting truck traveled with the MLB Network team solely for Statcast. VICE Sports was given exclusive access to see how this process works first-hand from inside the truck—which is kept a cool 63 degrees to prevent the electronics from overheating—during a recent Yankees-Red Sox game. During the game, MLB Network segment producer Mike Treanor directed his operator, Matt, to prepare potential Statcast segments for the live broadcast. The two sat side-by-side, faces lit by the glow of LCD screens showing every angle and graphic. If they were able to put a Statcast graphic together quickly enough, and felt it told a good story, they'd send it over to the main broadcasting truck.
In the top of the fifth inning, with the Yankees down 1-0, Stephen Drew came to bat with runners on second and third. On a 2-0 count, Rick Porcello threw an 83.34 mile-per-hour changeup. On that pitch, Porcello's extension was 7.02 feet, meaning his release point was a little over seven feet in front of the front edge of the rubber. Because pitches released closer to the plate appear faster to batters, Porcello's changeup had a perceived velocity of 84.51 mph, not much faster than its actual velocity. Just after leaving Porcello's hand, the ball was spinning at 1,534.2 rotations per minute.
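The perceived-velocity idea can be sketched with a simple constant-speed model: a pitch released with more extension has less distance to cover, so it reaches the plate as quickly as a faster pitch released from a shorter, "reference" extension would. The 6.0-foot reference extension and the `perceived_velocity` helper are assumptions for illustration, not Statcast's published formula; only the 60.5-foot rubber-to-plate distance is a rulebook figure.

```python
RUBBER_TO_PLATE_FT = 60.5      # 60 feet, 6 inches from the rubber to home plate
REFERENCE_EXTENSION_FT = 6.0   # ASSUMED "typical" extension, not an official MLB constant


def perceived_velocity(actual_mph: float, extension_ft: float,
                       reference_extension_ft: float = REFERENCE_EXTENSION_FT) -> float:
    """Speed a pitch released at the reference extension would need to arrive
    in the same time (constant-speed approximation, ignores drag)."""
    actual_distance = RUBBER_TO_PLATE_FT - extension_ft
    reference_distance = RUBBER_TO_PLATE_FT - reference_extension_ft
    # Same flight time => speed scales with the ratio of the two distances.
    return actual_mph * reference_distance / actual_distance


porcello = perceived_velocity(83.34, 7.02)
```

Under these assumptions, Porcello's changeup comes out to roughly 84.9 mph, a bit above the 84.51 mph Statcast reported; the gap comes from the guessed reference extension and the simplified model.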
All of this data tells us that, in a game he otherwise dominated, Porcello threw a bad pitch: a below-average version of his changeup, his worst pitch. Porcello throws his changeup only 7 percent of the time, and this particular changeup's spin rate was 6.6 percent lower than his season average. This meant the ball hung over the plate long enough for Drew to get the barrel on it.
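The spin comparison is plain percentage arithmetic. The article gives only the 6.6 percent figure, so the season average below is implied rather than reported, and because 6.6 is presumably rounded, the result is approximate. The helper names are hypothetical.

```python
def percent_below(value: float, average: float) -> float:
    """How far `value` falls below `average`, as a percentage of the average."""
    return (average - value) / average * 100


def implied_season_average(pitch_rpm: float, pct_below: float) -> float:
    """Work backward from one pitch's spin rate and its percent shortfall."""
    return pitch_rpm / (1 - pct_below / 100)


season_avg = implied_season_average(1534.2, 6.6)  # roughly 1,643 rpm
```

Working backward this way, a 1,534.2 rpm changeup that sits 6.6 percent below average implies a season average of roughly 1,643 rpm.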
The ball sailed 295.42 feet towards the left-center field wall at 104.11 miles per hour. Both runners scored as Drew raced into second, reaching a top speed of 18.3 miles per hour. The two-run double proved to be the game-winning hit.
On Statcast, the play looked like this:
On his laptop, Cain can pull up any play from any team this year and see an animation just like the one above, along with the broadcast footage, and every line of code for the play. The possibilities for Statcast, a true marvel, are constrained more by the human imagination than by any technological barrier.
"I've been very tight with this data and the system as a whole for a year, and I still don't even see the end goal," Cain shouted over the crowd. "It's just, in every direction, it's exponential."
These days, almost every statistical innovation is erroneously compared to Moneyball. Statcast is no exception. Newsweek called it "Moneyball on steroids," Wired dubbed it "Moneyball II: The Reckoning," and Fortune referred to it as the stuff of "Moneyball geek dreams." Despite these comparisons, Statcast is not Moneyball. Baseball is well past distilling existing stats through a new prism to uncover previously undervalued strategies.
If anything, Statcast innovates in the opposite direction from the sabermetrics and Moneyball movement. Moneyball covered the macro; Statcast focuses on the micro. While Moneyball prized knowledge of how players performed over long stretches of time to detect hidden value, Statcast measures players down to fractions of a second to precisely quantify what is right in front of our eyes. This is both the magic of Statcast and the source of its biggest criticism. If it tells us what we can already see, why do we need it? Do we really need military-grade technology to tell us Porcello shouldn't have thrown his worst pitch to Drew?
Statcast is made up of two different components: Trackman tracks the ball, and Hego eyes the players. Trackman is based on missile defense technology, and uses radar to measure the ball's movement at an astonishing 40,000 frames per second, which gets pared down to 100 frames per second to filter out noise. Those 100 frames per second are used to calculate the ball's three-dimensional coordinates. In addition, Trackman uses Doppler radar techniques to detect the spin of the ball by measuring the speed of the seams. "The end result," Cain says, "is a series of polynomials representing the trajectory of the ball, as pure as possible, all things considered."
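The two steps Cain describes, paring a high-rate signal down to fewer, cleaner frames and then describing the flight with a polynomial, can be illustrated generically. This sketch block-averages synthetic 40,000-per-second samples into 100-per-second frames and fits a quadratic by least squares. It is not Trackman's algorithm, which the article doesn't describe: the height curve, noise level, and helper names are assumptions, and the synthetic ball ignores drag and spin (the 16.1 term is half of gravity's 32.2 ft/s²).

```python
import random
from typing import List, Sequence, Tuple

RAW_HZ = 40_000               # raw radar rate cited in the article
OUTPUT_HZ = 100               # rate after the pare-down cited in the article
FACTOR = RAW_HZ // OUTPUT_HZ  # 400 raw samples averaged into each output frame


def block_average(samples: Sequence[float], factor: int) -> List[float]:
    """Downsample by averaging each consecutive block of `factor` samples."""
    n = len(samples) // factor
    return [sum(samples[i * factor:(i + 1) * factor]) / factor for i in range(n)]


def fit_quadratic(ts: Sequence[float], ys: Sequence[float]) -> Tuple[float, float, float]:
    """Least-squares fit of y = a + b*t + c*t**2 via the normal equations."""
    s = [sum(t ** k for t in ts) for k in range(5)]                  # sums of t^0..t^4
    r = [sum(y * t ** k for t, y in zip(ts, ys)) for k in range(3)]  # sums of y*t^k
    m = [[s[i + j] for j in range(3)] + [r[i]] for i in range(3)]    # augmented 3x3 system
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(3):
        pivot = max(range(col, 3), key=lambda row: abs(m[row][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for row in range(3):
            if row != col:
                f = m[row][col] / m[col][col]
                for k in range(col, 4):
                    m[row][k] -= f * m[col][k]
    a, b, c = (m[i][3] / m[i][i] for i in range(3))
    return a, b, c


# Synthetic ball height in feet: h(t) = 6.0 - 5.0 t - 16.1 t^2, plus radar-like noise.
rng = random.Random(42)
raw_t = [i / RAW_HZ for i in range(16_000)]  # 0.4 seconds of raw samples
raw_h = [6.0 - 5.0 * t - 16.1 * t * t + rng.gauss(0.0, 0.02) for t in raw_t]

t_100 = block_average(raw_t, FACTOR)  # 40 frames at 100 per second
h_100 = block_average(raw_h, FACTOR)
a, b, c = fit_quadratic(t_100, h_100)
```

Averaging 400 raw samples per frame shrinks random noise by a factor of about 20 (the square root of 400), which is why the fitted coefficients land close to the true curve.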
Then there's the Hego system, which uses cameras to keep track of the players. There are six cameras per park, arranged in two sets of three, one set for each side of the field. Each set creates a live panorama of the field and, using triangulation, can determine the position of any player on the field at 30 frames per second, the same frame rate as television broadcasts. The cameras calibrate their position using ballpark landmarks, such as the 379-foot sign on the left-center field wall at Fenway.
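Triangulation itself is easy to illustrate in two dimensions: if two cameras at known positions each report the direction toward a player, the player stands where the two sight lines cross. This is a simplified, hypothetical sketch (flat field, bearing angles only, made-up camera and player coordinates in feet); the article doesn't describe how Hego's calibrated panoramas actually compute positions.

```python
import math
from typing import Tuple

Point = Tuple[float, float]


def triangulate(cam_a: Point, bearing_a: float, cam_b: Point, bearing_b: float) -> Point:
    """Intersect two sight lines; bearings are in radians from the +x axis."""
    dax, day = math.cos(bearing_a), math.sin(bearing_a)
    dbx, dby = math.cos(bearing_b), math.sin(bearing_b)
    denom = dax * dby - day * dbx  # 2D cross product of the two directions
    if abs(denom) < 1e-12:
        raise ValueError("sight lines are parallel; position is undetermined")
    wx, wy = cam_b[0] - cam_a[0], cam_b[1] - cam_a[1]
    s = (wx * dby - wy * dbx) / denom  # distance along camera A's sight line
    return cam_a[0] + s * dax, cam_a[1] + s * day


# Hypothetical setup: two cameras 300 feet apart, player at (120, 250).
cam_a, cam_b = (0.0, 0.0), (300.0, 0.0)
bearing_a = math.atan2(250.0 - 0.0, 120.0 - 0.0)
bearing_b = math.atan2(250.0 - 0.0, 120.0 - 300.0)
estimate = triangulate(cam_a, bearing_a, cam_b, bearing_b)
```

A real system has to cope with measurement error in each bearing, which is one reason landmark calibration like the Fenway wall sign matters: small angle errors grow into larger position errors far from the cameras.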
The Trackman data then gets merged with the Hego system to create one complete picture of everything happening on the field.
Trackman determines when a play starts based on when the ball leaves the pitcher's hand. From that point on, every frame produced by Hego gets an ID number, which is synchronized with the Trackman data. Then the play is tied to MLB's official scorer, so that all the data just collected can be connected to everyday baseball language: a double to left-center by Stephen Drew, two runs scored.
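One simple way to line up the two feeds is to number each 30-frames-per-second Hego frame from the moment of release and pair it with the Trackman sample (100 per second) closest in time. This nearest-timestamp join and the helper names are assumptions for illustration; the article says only that the frame IDs are synchronized with the Trackman data, not how.

```python
from bisect import bisect_left
from typing import List, Tuple

HEGO_FPS = 30       # camera frame rate cited in the article
TRACKMAN_HZ = 100   # radar output rate cited in the article


def nearest_index(times: List[float], t: float) -> int:
    """Index of the value in the sorted list `times` closest to t."""
    i = bisect_left(times, t)
    if i == 0:
        return 0
    if i == len(times):
        return len(times) - 1
    return i if times[i] - t < t - times[i - 1] else i - 1


def sync_frames(num_hego_frames: int, trackman_times: List[float]) -> List[Tuple[int, int]]:
    """Pair each Hego frame ID (counted from release) with the nearest Trackman sample."""
    pairs = []
    for frame_id in range(num_hego_frames):
        t = frame_id / HEGO_FPS  # seconds since release
        pairs.append((frame_id, nearest_index(trackman_times, t)))
    return pairs


trackman_times = [i / TRACKMAN_HZ for i in range(50)]  # half a second of radar samples
pairs = sync_frames(15, trackman_times)
```

Because the two rates don't divide evenly, most camera frames fall between radar samples; picking the nearest one keeps the mismatch under 5 milliseconds, and a production system might interpolate along the fitted trajectory instead.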
Players can now be compared based on objective measures that require little to no context, such as the speed at which the ball leaves the bat, how many rotations per minute a changeup makes, how fast a player runs, and the route efficiency of an outfielder tracking down a fly ball.
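Route efficiency is generally described as the straight-line distance from a fielder's starting point to where he reaches the ball, divided by the distance he actually covered, so a perfectly direct route scores 1.0. A minimal sketch, assuming the route arrives as a list of (x, y) positions in feet such as a player-tracking feed might produce; the function names and sample routes are hypothetical.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def path_length(path: List[Point]) -> float:
    """Total distance covered along a sequence of tracked positions."""
    return sum(math.dist(p, q) for p, q in zip(path, path[1:]))


def route_efficiency(path: List[Point]) -> float:
    """Straight-line distance from start to finish divided by distance traveled."""
    if len(path) < 2:
        raise ValueError("need at least a start and an end position")
    traveled = path_length(path)
    if traveled == 0:
        raise ValueError("fielder did not move")
    return math.dist(path[0], path[-1]) / traveled


# Hypothetical routes: a direct run versus a misread that doglegs to the ball.
direct = route_efficiency([(0.0, 0.0), (1.0, 1.0), (2.0, 2.0)])
dogleg = route_efficiency([(0.0, 0.0), (3.0, 0.0), (3.0, 4.0)])
```

The dogleg route covers 7 units to reach a spot 5 units away, for an efficiency of 5/7, or about 71 percent.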
This revolutionary data set is being applied in two different ways. First, fans are being introduced to the data through MLB Network broadcasts and, soon, through regional sports broadcasts as well. For now, the broadcast ambitions for Statcast are relatively modest, as fans and the crew adjust to its capabilities. "Part of it is a gut reaction," Chris Pfeiffer, who runs the MLB Network broadcasts, says about the decision to include Statcast data.
"When there's a home run, as a fan, you're wondering, how far did that go? So, we kind of address that for the fan by doing the distance and some of the other metrics that come along with that." Pfeiffer thinks it will take viewers time to adjust. "As everybody gets more comfortable with it," he said, "they're going to expect the numbers on a great diving play in the outfield."
Treanor, the segment producer at the Boston game, said the Statcast info about Drew's double didn't go into the broadcast because it was a routine double to left, and utilizing the data for a deeper explanation would require more than the 20 to 30 seconds the broadcast structure allowed.
The second application of Statcast data is how teams will utilize it to analyze player performance. Cain says teams will need data scientists, statisticians, and even data visualization experts to create their own metrics with the massive amounts of data. One American League team executive said the biggest challenge is figuring out what to do with all the information—not just how to analyze it but also simply how to store it.
"The amount of data is staggering," he said.
For each game, Cain estimates, Statcast collects roughly 80 gigabytes of video and data (mostly video, since the data itself is just lines of text, which take up very little storage space). The challenge, Cain says, lies in "knowing how to look at the data, calculate all those elements, building a system to ingest our document, and then process this—and at scale, too, because we're talking about 2,500 games a year, around 400 plays a game, we're talking 200,000 plays a year."
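Multiplying out Cain's own round numbers gives a sense of the storage problem the AL executive describes. This is a back-of-the-envelope sketch that assumes every game produces the full 80 gigabytes and uses decimal units (1 TB = 1,000 GB).

```python
GB_PER_GAME = 80          # Cain's rough per-game estimate
GAMES_PER_SEASON = 2_500  # Cain's round number of games per year


def season_storage_tb(gb_per_game: float, games: int) -> float:
    """Total storage in terabytes, using decimal units (1 TB = 1,000 GB)."""
    return gb_per_game * games / 1_000


season_tb = season_storage_tb(GB_PER_GAME, GAMES_PER_SEASON)  # 200 TB
```

At that rate, a single season fills about 200 terabytes, and every additional season adds the same again.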
This mountain of data is why Pfeiffer and Cain both consider 2015 to be Year One of a new baseball era, the Statcast Era. The longer Statcast is in action, the more teams will be able to make historical comparisons. In 15 years, a general manager will be able to analyze how, for example, Jacob deGrom's slider has changed over time, down to micrometers on its drop or the number of individual rotations it makes. Is the slider losing enough that he needs to lean more heavily on his changeup? What about his fastball? Should he get a new contract? When, 40 years from now, a player comes along whom everyone calls the next Mike Trout, broadcasters will be able to look at his bat speed, exit velocity, route efficiency, speed on the base paths, and other pure measurements of baseball skill to put that comparison to the test.
We are not there yet. Because this is only Year One, the numbers are of somewhat limited use, since there is little context to compare them against. Pfeiffer says what fans want most are comparisons for more arcane measurements such as exit velocity, where the raw numbers alone don't tell us a whole lot. What is an average exit velocity? Above average? Great? Until this year, we had no idea. Now, we know Giancarlo Stanton hits the ball harder than anyone else in the Majors, with 14 of the 25 hardest-hit balls this season, including five of the top six. The longer Statcast collects data, the more context Pfeiffer can provide, and the more interesting these comparisons will be.
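Context of this kind usually means ranking a raw number against everyone else's. Here is a minimal sketch of turning an exit velocity into a percentile rank; the `percentile_rank` helper is hypothetical, and the sample speeds are made-up numbers standing in for a league-wide season of batted balls, not real Statcast data. Only the 104.11 mph figure comes from Drew's double.

```python
from bisect import bisect_left, bisect_right
from typing import Sequence


def percentile_rank(values: Sequence[float], x: float) -> float:
    """Percent of values below x, counting ties as half (mid-rank convention)."""
    if not values:
        raise ValueError("need at least one value")
    s = sorted(values)
    below = bisect_left(s, x)
    ties = bisect_right(s, x) - below
    return 100.0 * (below + 0.5 * ties) / len(s)


# Made-up exit velocities (mph) standing in for a league-wide sample.
sample_mph = [72.4, 79.8, 84.0, 86.5, 88.9, 90.3, 92.7, 95.1, 99.6, 108.2]
drew_rank = percentile_rank(sample_mph, 104.11)  # 90th percentile of this toy sample
```

With only ten made-up values the answer is crude; the point of collecting data season after season is that the same calculation over hundreds of thousands of batted balls turns "104 mph" into a meaningful "better than X percent of the league."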
This, more than anything else, is why Statcast bears little resemblance to sabermetrics. WAR, for instance, is a mathematical composite that only indirectly represents what you see on the field. That isn't to dismiss its usefulness, but it's at least part of why traditionalists have such a hard time accepting it. Statcast, on the other hand, provides the details of what we've been seeing all along, of what we've always known is important. Rather than condensing a thousand stories into one number, it turns one story into a thousand numbers.
"My point about the system is it's really measuring skills that are just there," Cain told me just before the first pitch was thrown, gesturing towards the field. "Like, you're seeing it. Whether you believe that measurement or not, that's on you. I mean, we just know that guy is running 21 miles per hour. Whether you care about that or not, it doesn't matter."