Skeleton vs Texture: Human behavioral experiments on object recognition in a VR environment

Understanding the mechanisms underlying human object recognition has long been a fundamental pursuit in cognitive neuroscience and vision science. I propose that recognition strategies depend on distinct object properties, with structural cues dominating for rigid objects and textural cues dominating for deformable objects. Framing the problem this way lets us address fundamental questions about how the visual system optimally processes different object categories.

This essay presents an examination of my hypothesis that object recognition follows different pathways depending on the perceptual rigidity of the target: for rigid bodies, skeletal (physical) structure dominates identification processes, while for non-rigid (deformable) bodies, texture becomes the primary determinant.

Skeletal Structure in Rigid Object Recognition

Recent neurophysiological and behavioral evidence strongly supports the importance of skeletal representations in object recognition. Ayzenberg and colleagues demonstrated that skeletal similarity was the strongest predictor of human object similarity judgments when compared with other computational models, including image statistics, neural networks, and spatial relations. Participants consistently categorized objects by their underlying skeletal structure even when surface contours and non-accidental properties were manipulated, suggesting that skeletal descriptions provide a privileged source of information for object identification.

A potential threat for Neanderthals

The idea that rigid bodies are recognized more rapidly from their physical structure is consistent with evolutionary perspectives on perception. Throughout human evolution, the ability to quickly assess the potential threat posed by objects in the environment (such as rocks, tools, or animal bodies) has been essential for survival. Rigid objects are inherently more likely to cause harm because they maintain their shape and mass upon contact, which makes their overall physical form a critical cue for immediate danger assessment. Under these selective pressures, the visual system may have evolved to prioritize the rapid extraction of a “gist” or summary representation based on skeletal structure, allowing fast discrimination between harmful and harmless objects.

This capacity to swiftly recognize and categorize objects by global shape likely reflects a primitive system of object representation shared with other primates, in which abstract properties tied to physical structure guide behavioral responses independently of detailed experience or cultural knowledge. Contemporary neurobiological and behavioral studies suggest that humans and other animals rely on this shape-based processing system for quick threat evaluation, illustrating its fundamental role as an evolutionary adaptation for safety and efficient interaction with the environment.

The skeletal approach also offers the following computational advantages for rigid object recognition:

Skeletal representations provide a compact description of global shape structure that remains relatively invariant to local contour variations.
Skeletal representations offer a quantitative metric for computing shape similarity that closely matches human perceptual judgments.
Skeletal structures capture the hierarchical organization of object parts, enabling recognition across different viewing angles and partial occlusions, which makes them a more robust model for a biological system.
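
To make the idea of a quantitative skeletal metric concrete, here is a minimal sketch. It assumes a skeleton can be approximated by a set of points sampled along the medial axis, and it scores two skeletons by the mean distance from each point to its nearest neighbor on the other skeleton. The type `Point3` and the functions `meanNearestDistance` and `skeletalDissimilarity` are hypothetical names introduced for illustration; this is not necessarily the metric used in the cited work, and it assumes both skeletons are already aligned in a common coordinate frame.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <limits>
#include <vector>

// A skeleton is approximated as sampled 3D points along the medial axis
// (an assumption made for this sketch).
struct Point3 { double x, y, z; };

double euclidean(const Point3& a, const Point3& b) {
    const double dx = a.x - b.x, dy = a.y - b.y, dz = a.z - b.z;
    return std::sqrt(dx * dx + dy * dy + dz * dz);
}

// Mean distance from each point in `from` to its nearest point in `to`.
double meanNearestDistance(const std::vector<Point3>& from,
                           const std::vector<Point3>& to) {
    assert(!from.empty() && !to.empty());
    double total = 0.0;
    for (const Point3& p : from) {
        double best = std::numeric_limits<double>::infinity();
        for (const Point3& q : to) best = std::min(best, euclidean(p, q));
        total += best;
    }
    return total / static_cast<double>(from.size());
}

// Symmetric score: 0 for identical point sets, larger for more
// dissimilar skeletons.
double skeletalDissimilarity(const std::vector<Point3>& a,
                             const std::vector<Point3>& b) {
    return 0.5 * (meanNearestDistance(a, b) + meanNearestDistance(b, a));
}
```

Because the score depends only on skeleton points, changing local contours or surface texture leaves it untouched as long as the medial axis stays the same, which is the invariance described above.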

Texture Dominance in Deformable Object Recognition

Still a pile of clothes

For deformable objects, on the other hand, texture and material emerge as the primary recognition cues because structural features are inherently unstable. Schmid and Doerschner report that when objects undergo non-rigid deformations, the contribution of surface optics and mechanical properties to object perception gradually increases, with surface texture playing a crucial role in maintaining object identity despite shape changes. The human visual system is remarkably robust in perceiving non-rigid object motion across various material conditions, suggesting specialized processing mechanisms for deformable entities.

Material texture, in turn, provides several advantages for deformable object recognition when the object’s physical structure cannot serve as a strong cue:

Surface properties such as reflections, color patterns, and microscopic texture elements mostly remain stable during deformation, providing reliable identification cues.
Texture information is processed through specialized neural pathways that can operate independently of shape processing, allowing for robust recognition even when structural cues are compromised.

Jumping into Object Recognition in Virtual Reality

Virtual reality environments offer opportunities to study object recognition mechanisms under controlled experimental conditions while maintaining ecological validity. VR allows researchers to systematically manipulate object properties, including skeletal structure and surface texture, even in ways that would be impossible or impractical in real-world settings.

VR enables precise control over visual parameters, including lighting conditions, viewing angles, and object properties, while maintaining naturalistic interaction patterns similar to those in the real world. The controllable object properties include physical (mechanical) ones such as friction, elasticity, and stress.
Experiment designers can create novel object categories that isolate specific visual features without confounding variables present in real-world stimuli.
Real-time manipulation of object properties during recognition tasks enables thorough investigation of the entire recognition process.
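
One way to picture this kind of systematic manipulation is a trial list that crosses object rigidity with the cue being altered. The sketch below is only an illustration under assumed conditions: the enums `Rigidity` and `ChangedCue`, the struct `Trial`, and the function `buildTrials` are hypothetical, and the actual experimental design may differ.

```cpp
#include <cassert>
#include <vector>

// Hypothetical condition labels for illustration only.
enum class Rigidity { Rigid, Deformable };
enum class ChangedCue { None, Skeleton, Texture };

struct Trial {
    Rigidity rigidity;      // is the target object rigid or deformable?
    ChangedCue changedCue;  // which property differs from the learned object
    int repetition;         // index of the repetition within the condition
};

// Enumerate every rigidity x changed-cue combination `repetitions` times.
std::vector<Trial> buildTrials(int repetitions) {
    assert(repetitions > 0);
    std::vector<Trial> trials;
    for (Rigidity r : {Rigidity::Rigid, Rigidity::Deformable}) {
        for (ChangedCue c : {ChangedCue::None, ChangedCue::Skeleton,
                             ChangedCue::Texture}) {
            for (int i = 0; i < repetitions; ++i) {
                trials.push_back(Trial{r, c, i});
            }
        }
    }
    return trials;
}
```

With this layout, the hypothesis predicts that recognition of rigid objects suffers most in the `Skeleton` condition, while recognition of deformable objects suffers most in the `Texture` condition. In practice the list would also be shuffled before presentation.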

Creating rigid bodies: Articulated bodies

Articulated bodies: Rigid body movement with constraints
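
As a rough illustration of rigid body movement with constraints, the sketch below models a planar chain of rigid links connected by revolute joints. Each link keeps a fixed length (rigidity), and each joint angle is clamped to a limit (the constraint). The names `Vec2`, `Joint`, and `jointPositions` are hypothetical, and a real VR engine would handle this in 3D through its own physics components; this is a minimal 2D sketch, not the implementation planned for the experiment.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <vector>

struct Vec2 { double x, y; };

// One revolute joint plus the rigid link that follows it.
struct Joint {
    double length;    // fixed length of the rigid link
    double angle;     // rotation relative to the parent link, in radians
    double minAngle;  // lower joint limit, in radians
    double maxAngle;  // upper joint limit, in radians
};

// Forward kinematics: clamp each joint to its limits, then accumulate
// rotations from the root to find where every link ends.
// Returns the root followed by the end point of each link.
std::vector<Vec2> jointPositions(std::vector<Joint> chain, Vec2 root) {
    std::vector<Vec2> positions{root};
    double heading = 0.0;  // absolute orientation of the current link
    Vec2 tip = root;
    for (Joint& j : chain) {
        assert(j.minAngle <= j.maxAngle);
        j.angle = std::max(j.minAngle, std::min(j.angle, j.maxAngle));
        heading += j.angle;
        tip = Vec2{tip.x + j.length * std::cos(heading),
                   tip.y + j.length * std::sin(heading)};
        positions.push_back(tip);
    }
    return positions;
}
```

However the joints move, the distance between consecutive positions always equals the link length, so the skeletal structure of the object is preserved across poses.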

WIP

Creating non-rigid bodies: Deformable surface

Modeling deformable surface with elastic springs

// Springs: Hooke's law applied to each edge, plus velocity damping
vector<float> force = material.stiffness * delta + material.damping * velocity;

// Bending: the bending angle acos(dot), normalized by PI, scales an acceleration opposite to the vertex normal
float size = material.bending * math.acos(dot) / math.PI;
acceleration[edge.bendingNode] -= edge.bendingNode.normal * size;

// Gravity: constant acceleration applied to every vertex
vector<float> gravity;
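
The fragments above can be assembled into a small self-contained sketch of one simulation step. It assumes point masses connected by damped springs, gravity as a constant acceleration, and semi-implicit Euler integration. The bending term is left out for brevity. The types `Vec3`, `Material`, `Node`, and `Spring` and the functions `springForce` and `step` are hypothetical names chosen for this illustration and may not match the actual implementation.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

struct Vec3 { double x, y, z; };
Vec3 operator+(Vec3 a, Vec3 b) { return {a.x + b.x, a.y + b.y, a.z + b.z}; }
Vec3 operator-(Vec3 a, Vec3 b) { return {a.x - b.x, a.y - b.y, a.z - b.z}; }
Vec3 operator*(Vec3 a, double s) { return {a.x * s, a.y * s, a.z * s}; }
double dot(Vec3 a, Vec3 b) { return a.x * b.x + a.y * b.y + a.z * b.z; }
double length(Vec3 a) { return std::sqrt(dot(a, a)); }

struct Material { double stiffness, damping; };
struct Node { Vec3 position, velocity; double mass; bool pinned; };
struct Spring { int a, b; double restLength; };

// Damped Hooke force acting on node `a`; node `b` receives the opposite.
Vec3 springForce(const Node& a, const Node& b, double restLength,
                 const Material& m) {
    const Vec3 d = b.position - a.position;
    const double len = length(d);
    assert(len > 0.0);  // coincident nodes have no defined direction
    const Vec3 dir = d * (1.0 / len);
    const double stretch = len - restLength;  // > 0 when extended
    const double relativeSpeed = dot(b.velocity - a.velocity, dir);
    return dir * (m.stiffness * stretch + m.damping * relativeSpeed);
}

// Advance all nodes by one time step `dt` (semi-implicit Euler).
void step(std::vector<Node>& nodes, const std::vector<Spring>& springs,
          const Material& m, Vec3 gravity, double dt) {
    std::vector<Vec3> force(nodes.size(), Vec3{0.0, 0.0, 0.0});
    for (std::size_t i = 0; i < nodes.size(); ++i)
        force[i] = gravity * nodes[i].mass;  // weight of each point mass
    for (const Spring& s : springs) {
        const Vec3 f = springForce(nodes[s.a], nodes[s.b], s.restLength, m);
        force[s.a] = force[s.a] + f;
        force[s.b] = force[s.b] - f;
    }
    for (std::size_t i = 0; i < nodes.size(); ++i) {
        if (nodes[i].pinned) continue;  // pinned nodes act as anchors
        nodes[i].velocity = nodes[i].velocity + force[i] * (dt / nodes[i].mass);
        nodes[i].position = nodes[i].position + nodes[i].velocity * dt;
    }
}
```

Updating velocity before position keeps the integration more stable than plain explicit Euler at the same time step, although stiff springs still require a small `dt`. If a bending term based on `acos(dot)` is added, clamping `dot` to the range [-1, 1] first avoids NaN values caused by floating-point rounding.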

WIP