Monday, January 25, 2016

Pondering Point Patterns

Tennis players are actors in a social network of fellow competitors, and every exchange of shots in a rally can be seen as a conversation that is part of a narrative encompassing each new achievement in the sport. Do some players try to have the same conversation with every opponent they face? Do some have a more expansive vocabulary and select specific patterns of play every time they face specific opponents? Are some players able to change the conversation when it’s not going their way? Are there ‘truthy’ stories that sports journalists and experts spin to keep the banter lively and the audience engaged? Can statistical analysis reveal whether it makes any sense to be asking these questions?

Read the full article... published on The Tennis Notebook.

Saturday, January 23, 2016

The Refactor Factor - Pursuing Patterns


I've spent the past few months refactoring the codebase to support a range of new features and data sources that will surface in 2016.  It may not appear that a great deal is happening, and I certainly haven't produced what I thought I would in the timelines I envisioned, but from a cold start less than a year ago I'm finally producing code that I'm willing to share with the public on GitHub.
The "Match Radar" had a significant overhaul and was my first use of a "reusable" and "updating" approach to D3 chart design.  The release of the source for the "reusable, updating radar chart" was picked up by "Building Widgets" and resulted in an R derivative of the chart, "d3radarR".

More significant for the future, however, was the redevelopment of the Points-to-Set Chart, which resulted in the creation of a "Universal Match Object" (UMO).  The UMO "understands" the structure of tennis matches and serves as a "validating" container for point data from which numerous "views" of a match may be generated. Points-to-Set is one such "view".
I recently used the UMO to validate 96,000+ matches with point-by-point data provided by Jeff Sackmann. (You can read about the results of that analysis here.)  As I refactor my codebase, the UMO has become central to the integration of disparate data sources.
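To make the idea of a "validating" container with multiple "views" concrete, here is a minimal sketch in Python. It is not the UMO itself: the class name `GameContainer` and its methods are hypothetical, and it covers only a single standard game (with deuce and advantage) rather than a full match. It assumes each point is recorded simply as 0 (server won) or 1 (receiver won).

```python
class GameContainer:
    """Hypothetical, minimal stand-in for a validating container (one game only)."""

    CALLS = ["0", "15", "30", "40"]

    def __init__(self):
        # Winner of each point in order: 0 = server, 1 = receiver.
        self.points = []

    def _tally(self):
        return self.points.count(0), self.points.count(1)

    def complete(self):
        # A standard game ends at four or more points with a two-point margin.
        s, r = self._tally()
        return max(s, r) >= 4 and abs(s - r) >= 2

    def add_point(self, winner):
        # Validation: reject unknown winners and points played after the game ends.
        if winner not in (0, 1):
            raise ValueError("winner must be 0 (server) or 1 (receiver)")
        if self.complete():
            raise ValueError("game is already complete; point rejected")
        self.points.append(winner)

    def score(self):
        # View 1: the current score as it would be called.
        s, r = self._tally()
        if self.complete():
            return "game server" if s > r else "game receiver"
        if s >= 3 and r >= 3:
            if s == r:
                return "deuce"
            return "ad server" if s > r else "ad receiver"
        return f"{self.CALLS[s]}-{self.CALLS[r]}"

    def progression(self):
        # View 2: the running point tally after every point.
        out, s, r = [], 0, 0
        for w in self.points:
            if w == 0:
                s += 1
            else:
                r += 1
            out.append((s, r))
        return out


game = GameContainer()
for winner in [0, 0, 1, 0]:
    game.add_point(winner)
print(game.score())        # 40-15
print(game.progression())
```

The same stored points feed both `score()` and `progression()`, which is the sense in which one validated container can serve several views; a chart like Points-to-Set would be another consumer of the same data.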

A great deal of my focus of late has been on revisiting the data which Jeff Sackmann and his team of volunteers have been generating over at the Match Charting Project.  The number of charted matches has been growing by leaps and bounds.  When I started I believed I would be far more focused on data that can be captured using applications such as Pro Tracker Tennis.  Once I expanded the project to support multiple data sources, however, the draw of the larger (and growing) MCP dataset began to exert more influence.  There is a broader audience for visualizations of professional tennis matches, and a greater opportunity for conversations and collaborations with others who are interested in analyzing the data.

Patterns of Play

Last week Nikita Taparia and Jeff Sackmann launched the Tennis Data Storytelling Challenge.  Since one of my goals is to showcase visualizations of tennis data, I elected to be one of the sponsors of the challenge.  I will showcase the top three visualizations that result from the challenge and endeavor to present them in such a way that they can draw on the growing database of matches in real time.

To help promote the challenge I wrote a piece for The Tennis Notebook, "Pondering Point Patterns", and to further facilitate the exploration of MCP data I've been busy creating some MCP-specific tools.  The UMO has been central to this effort.  At this point, CSV files generated by the MCP spreadsheet are validated by the UMO, which then provides a standard API for querying and manipulating match data.

One of the most significant features of the Match Charting Project is that it is possible to capture not only point progressions and rally lengths, but also some details about specific strokes and shot placements during rallies.  Working with MCP data is driving the development of the UMO to a point where it will not only "understand" the structure of matches, but also the layout of the court.
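As a rough illustration of what working with shot-level data can look like, the sketch below tallies stroke and direction combinations across rallies. It assumes a simplified, hypothetical encoding in which each shot is a stroke letter (`f` for forehand, `b` for backhand) followed by a direction digit (1, 2 or 3 for three court zones). The actual MCP notation is richer and is not reproduced here, and the function names are my own.

```python
import re
from collections import Counter

# Hypothetical simplified encoding: stroke letter + direction digit, e.g. "f1b3f2".
SHOT = re.compile(r"([fb])([123])")


def parse_rally(rally):
    """Split a rally string into (stroke, direction) pairs, rejecting stray characters."""
    shots = SHOT.findall(rally)
    # Validation: the matched shots must reassemble into the full input string.
    if "".join(stroke + digit for stroke, digit in shots) != rally:
        raise ValueError(f"unrecognized shot encoding in {rally!r}")
    return [(stroke, int(digit)) for stroke, digit in shots]


def direction_counts(rallies):
    """Count how often each (stroke, direction) combination occurs across rallies."""
    counts = Counter()
    for rally in rallies:
        counts.update(parse_rally(rally))
    return counts


counts = direction_counts(["f1b3f2", "b3b3", "f1"])
print(counts[("b", 3)])  # 3
print(counts[("f", 1)])  # 2
```

Counts like these are the raw material for questions about patterns of play, such as how often a player directs a particular stroke to a particular zone.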

Jeff has written a number of articles based on analysis of shot-specific data. One of the first articles that caught my eye (there are some color charts) examines the effectiveness of "Return-of-Serve" placement; another article looks at the tactical importance of "finding" an opponent's backhand.  On the whole, however, there haven't been many explorations of this rich portion of the MCP data.  I've not even touched it, until now...