Workshop Example: Data Sprints

A crucial element of the Data Inquiry approach is the collaboration between the expert/apprentice data scientists and the actors engaged in actual societal situations. This collaboration helps the data scientists to consider not only the technical dimension of their intervention, but also the context in which it takes place and the conditions that can make it more than a simple exercise of technical skill. Civil society groups, for their part, will find not only help in the collection and treatment of data, but more importantly a fresh perspective on how their action can exploit digital records.

In the first phases of a project, the collaboration between data scientists and civil society organisations can happen remotely and asynchronously, through an exchange of emails that initiates a dialogue and converges on a shared definition of a possible joint project. Yet experience has taught us that the best format for carrying out the bulk of the collaboration is an intensive, time-boxed workshop, where the two groups work together (ideally in each other's physical presence) for at least 2 or 3 days and, if possible, for an entire week.

Over the years, the researchers of the Public Data Lab have developed a specific template for this kind of workshopping, learning from the open-source format of barcamps and datathons, and adapting it to the specificities of academic research and teaching.

Data sprints import two things from their open-source predecessors:

  1. The ‘quick and dirty’ (or ‘design to cost’) approach. The short and intensive nature of data sprints shields these events from the dream of exhaustivity sometimes associated with ‘big data’. Participants know that they will only be able to treat a limited quantity of records and that they will only achieve imperfect results, but they accept such constraints more as a challenge than as a flaw. Making the most of light infrastructures, simple logistics and agile organization, participants are aware that their work should reuse code and data gathered in earlier projects and that their outcomes should become the basis for further ventures.
  2. The heterogeneity of the actors involved. The need to achieve deliverable results by the end of the event requires gathering all the necessary competences, in terms of both technical skills and knowledge of the social situation at stake (hence the importance of convening all the participants during the sprint).

Unlike hackathons and barcamps, however, data sprints are always preceded by extensive preparation. Because the time available during a data sprint is limited, it is crucial to carry out some activities before the data sprint:

Careful preparation makes it possible to dedicate as much of the workshop time as possible to the activities that can only be carried out during the data sprint.

Finally, data sprints require more follow-up than hackathons and barcamps. The ‘quick and dirty’ approach that characterizes the sprinting days should be complemented by extensive refinement and documentation, to make sure that the work of the sprint actually bears fruit and generates the desired societal outcomes. Besides following up on the specific objectives of the sprint, effort should be invested in making datasets, scripts and visualizations reusable beyond their original projects. Sprints should remain faithful to their open-source roots and ensure that all the data, code and content produced are freely available under open licenses.

A more detailed description of the data sprint format can be found here