Flipping the Data Collection Script
A DISCUSSION OF
The Measured Body

Data never really die. The process of data generation requires abstraction and decontextualization away from the whole of what it aims to represent, classify, trace, and analyze. Like the body measurements of those white, male cadavers that Mona Sloane, Abigail Z. Jacobs, and Emanuel Moss describe in “The Measured Body” (Issues, Fall 2025), data drift downstream, along with the residue of their collectors’ choices, detached from original contexts or purposes.
The privacy aftereffects of health and biometric data gathering are profound. Still, the “But, privacy!” argument can ring hollow. I’ve heard researchers complain about the “nontrivial task” of gathering data reflecting human body poses, movements, and gestures, and about privacy regulations limiting use of that data. Some even resort to lists of online video links and timestamps that mark, for instance, when an infant’s leg twitches or head bobs. This way, the video data can be scraped to train artificial intelligence models detecting early signs of illness.
The people whose torsos and limbs don’t fit the mold of the homogenized, small-sample datasets behind fault-prone software monitoring their movements might also question the cries for data minimization and restriction. They may want better datasets to improve the systems that watch and sometimes penalize them.
The authors recognize this ethical tug-of-war. To bring “the tools of knowledge production closer to people’s lives,” they envision a roaming, social media-friendly data-gathering unit. But whose products is this knowledge production mission designed to build, and whose goals is it designed to achieve?
An ongoing national health effort intended to build an open-source genetic dataset reflective of minority populations has met criticism from some in the very groups it aims to include. Overcoming entrenched, generational distrust of sensitive data harvesting will be difficult, particularly as more people push back against health data collection and advocate for collective privacy rights.
Indeed, some of those whose lives researchers aim to improve through better AI training data might spot a data-gathering bus coming a mile away, perceiving it as another means of communal data exploitation and collective privacy invasion. Commercial DNA data breaches exposing sensitive information of small communities don’t help the crusade for enormous, generalized biometric datasets.
The authors offer an example involving sign language that reflects a more purposeful reform model for data collection. Sure, it’s better than the old way. But however well-intentioned the motives, even the authors state that it is the companies that “stand to benefit the most” from deployments of applications that might be built from the “vast mocap dataset” they propose.
Consider the worldwide Indigenous Data Sovereignty movement, which encourages flipping the data collection script entirely, not just tweaking the template. All sorts of communities can decide for themselves if and when they need, want, or are ready for data collection, what tools they want built from their data, or if such tools are desirable at all.
There are no easy answers here. The impulse to improve flawed systems is a good one. But these data collection challenges could impede goals for building mocap datasets that truly reflect the panoply of body shapes and sizes that move and commune in the real world.
Kate Kaye
Journalist and researcher of data use in complex systems
RedTail