An amateur paleontologist's discovery of a seashell-like rock in the Saudi Arabian desert leads to a fascinating mathematical exploration of shell morphology and evolution.
In the vast expanse of the Alghat desert in Saudi Arabia, far from any coastline, a remarkable discovery was made: a rock that eerily resembles a seashell. The find lay at the base of a cliff, roughly 500 kilometers from the nearest coastline at Dammam, and it posed a geological puzzle that prompted a scientific investigation.
The discovery, documented in a GitHub repository by an anonymous researcher, represents a fascinating intersection of geology, mathematics, and evolutionary biology. What begins as a simple observation of a rock formation transforms into a sophisticated exploration of how we can understand ancient life through mathematical analysis of form.

Geological Context: When Deserts Were Seas
At first glance, the presence of a seashell-like rock in the desert seems impossible. Yet, as the author notes, carbonate rocks, marine fossils, coral fossils, and sedimentary structures are commonly found throughout the Arabian Peninsula. These geological remnants tell a story of a time when this arid region was submerged beneath ancient seas.
Specifically, during the Late Jurassic, approximately 150 million years ago, what we now call the Arabian Peninsula lay beneath a shallow sea. This explains why marine fossils can be found far from any present-day coastline. The stratigraphic evidence reveals a complex history of sea level changes, sedimentation, and tectonic uplift that eventually transformed these marine environments into the desert landscapes we see today.

The Citizen Scientist Approach
Faced with this intriguing fossil, the author, who has no formal paleontological training, embarked on a do-it-yourself investigation. Rather than seeking professional analysis, they approached the problem through a mathematical treatment of morphology: the study of shape and form.
"The proper way of answering these questions is to conduct a detailed analysis of the fossil," the author acknowledges, "this should be done by an expert paleontologist. However, I know no paleontology, or any paleontologist, so I figured I could DIY it myself (how hard could it be..?), though I'll do it strictly via its shape — or what's called its morphology."
This citizen science approach represents an interesting democratization of scientific inquiry, where computational tools and publicly available datasets enable individuals without specialized training to engage with complex scientific questions.

Mathematical Representation of Shell Morphology
The core of this investigation lies in representing shell shapes mathematically. The author used the shell dataset of Zhang et al., which contains 59,244 images covering 7,894 species. This dataset provided the foundation for comparative analysis.

Capturing 'shape' mathematically presents several challenges. Any object can be rotated, scaled, and translated, making direct comparison difficult. To address this, the author developed a normalization pipeline:
- Centering: Each shell was centered to the midpoint of the image
- Scaling: The scale of all shells was made equivalent, with the maximum distance from the origin set to 1
- Orientation: Pitch and yaw were addressed by selecting only samples where the shell's opening faced the camera. For roll rotation, the longest radius was used as a reference point, with shells rotated so this radius always appeared on the right
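The three normalization steps above can be sketched on a set of outline points. This is a minimal illustration, not the author's code: it assumes the outline is already available as an (N, 2) array, and it centers on the centroid of those points rather than on the image midpoint. The function name `normalize_contour` is hypothetical.

```python
import numpy as np

def normalize_contour(points):
    """Center, scale, and roll-align an (N, 2) array of outline points."""
    pts = np.asarray(points, dtype=float)
    # Centering: move the centroid of the outline to the origin (assumption).
    pts = pts - pts.mean(axis=0)
    # Scaling: the farthest point ends up at distance 1 from the origin.
    radii = np.linalg.norm(pts, axis=1)
    pts = pts / radii.max()
    # Roll: rotate so the longest radius points right (positive x-axis).
    far = pts[np.argmax(radii)]
    angle = np.arctan2(far[1], far[0])
    c, s = np.cos(-angle), np.sin(-angle)
    rotation = np.array([[c, -s], [s, c]])
    return pts @ rotation.T
```

After this transform, two outlines that differ only by position, size, or in-plane rotation map to nearly the same coordinates, which is what makes them comparable. A degenerate outline whose points all coincide would divide by zero and is not handled here.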
After normalization, the contour of each shell was extracted to 256 points relative to the center, resulting in a 256×2 matrix representing each shell's shape. This mathematical representation enabled quantitative comparison between different shells.
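One way to obtain a fixed number of outline points is to resample the closed contour at equal arc-length intervals. The source does not say how the 256 points were chosen (equal arc length, equal angle, or something else), so the sketch below is an assumption, and `resample_contour` is a hypothetical name.

```python
import numpy as np

def resample_contour(points, n_points=256):
    """Resample a closed (N, 2) outline to n_points spaced evenly by arc length."""
    pts = np.asarray(points, dtype=float)
    # Repeat the first point at the end so the last segment closes the loop.
    closed = np.vstack([pts, pts[:1]])
    segment_lengths = np.linalg.norm(np.diff(closed, axis=0), axis=1)
    cumulative = np.concatenate([[0.0], np.cumsum(segment_lengths)])
    # Target positions along the perimeter, excluding the duplicate endpoint.
    targets = np.linspace(0.0, cumulative[-1], n_points, endpoint=False)
    x = np.interp(targets, cumulative, closed[:, 0])
    y = np.interp(targets, cumulative, closed[:, 1])
    return np.column_stack([x, y])
```

Every shell then has the same 256×2 layout regardless of how many pixels its original outline contained.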
Dimensionality Reduction and Principal Component Analysis
With each shell represented as a 256×2 matrix (512 numbers if flattened into a single vector), the author faced the challenge of visualizing and understanding this high-dimensional data. The solution was to apply a dimensionality reduction technique, specifically Principal Component Analysis (PCA).
PCA transforms the original high-dimensional data into a lower-dimensional space while preserving as much of the original variance as possible. After applying PCA, the author found that just two principal components capture 67.25% of the variance in shell shapes, so roughly two-thirds of the variation across the dataset can be shown on a two-dimensional plot.
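A minimal PCA sketch using NumPy's singular value decomposition shows where an explained-variance figure of this kind comes from. It assumes each 256×2 shape is flattened into one row vector; whether the author used this layout or a particular library is not stated, and the function name `pca` is illustrative.

```python
import numpy as np

def pca(shapes, n_components=2):
    """PCA via SVD on shapes of size (n_samples, n_points, 2)."""
    data = np.asarray(shapes, dtype=float)
    X = data.reshape(len(data), -1)          # flatten each shape to one row
    mean = X.mean(axis=0)
    U, S, Vt = np.linalg.svd(X - mean, full_matrices=False)
    # Squared singular values are proportional to the variance per component.
    explained_ratio = S**2 / np.sum(S**2)
    scores = U[:, :n_components] * S[:n_components]
    return scores, Vt[:n_components], mean, explained_ratio[:n_components]

# Usage on random placeholder data (not the shell dataset):
rng = np.random.default_rng(0)
scores, components, mean, ratio = pca(rng.normal(size=(50, 256, 2)))
print(scores.shape, ratio.sum())
```

Summing the first two entries of the explained-variance ratio on the real data is how a number such as 67.25% would be obtained.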
The most fascinating aspect of this analysis is interpreting what these principal components represent. By examining shells at opposite ends of each principal component axis, the author determined that:
- PC1 primarily captures the 'pointiness' of shells, accounting for over 50% of the variance in shell shapes
- PC2 appears to represent symmetry or mass distribution along the vertical axis
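Interpreting a component by looking at shells at its two ends amounts to sorting the samples by their score on that component. The helper below is a hypothetical sketch of that lookup, assuming a `scores` array with one row per shell and one column per principal component.

```python
import numpy as np

def extremes_along_component(scores, component=0, k=3):
    """Return indices of the k lowest- and k highest-scoring samples on one component."""
    order = np.argsort(np.asarray(scores)[:, component])
    lowest = order[:k]
    highest = order[-k:][::-1]   # most extreme first
    return lowest, highest
```

Plotting the outlines at the returned indices side by side is what lets a label such as "pointiness" be attached to an axis.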

Mapping the Space of Shell Shapes
The resulting visualization reveals intriguing patterns in shell morphology across species. The plot shows PC1 on the x-axis and PC2 on the y-axis, with color representing shell roughness (calculated as the difference in slope between consecutive points).
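The roughness measure can be sketched as the average change in edge direction between consecutive segments of the outline. The source gives only the one-line description above, so this is one possible reading of it: the sketch uses direction angles instead of raw slopes (dy/dx) to avoid infinite values on vertical edges, and `roughness` is a hypothetical name.

```python
import numpy as np

def roughness(points):
    """Mean absolute turning angle (radians) around a closed (N, 2) outline."""
    pts = np.asarray(points, dtype=float)
    # Edge i runs from point i to point i+1, wrapping around at the end.
    edges = np.roll(pts, -1, axis=0) - pts
    angles = np.arctan2(edges[:, 1], edges[:, 0])
    # Change in direction between each edge and the next one.
    turn = np.diff(np.append(angles, angles[0]))
    # Wrap differences into [-pi, pi) so a small turn never reads as a near-full rotation.
    turn = (turn + np.pi) % (2 * np.pi) - np.pi
    return float(np.mean(np.abs(turn)))
```

On a square outline every corner turns by π/2, so the function returns approximately π/2. A smooth outline sampled at many points gives a small value, and a jagged one gives a larger value.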
Several interesting observations emerge from this mapping:
- Round shells (negative PC1 values) are far more common than pointy shells (positive PC1 values)
- Despite their abundance, round shells show less diversity in shape compared to pointy shells
- Pointy shells tend to be rougher than round shells
- No round shells in the dataset show significant asymmetry (all have PC2 values close to zero)
This mapping provides a quantitative visualization of shell diversity across species, highlighting patterns that might not be immediately apparent through qualitative observation alone.

Identifying the Alghat Fossil
With this mathematical framework in place, the author could finally address the original question: what does the Alghat fossil most closely resemble?
The analysis revealed that the fossil bears a striking resemblance to Sphincterochila candidissima, a species of land snail. However, this presents an interesting temporal discrepancy. While the Alghat fossil dates back to the Jurassic period (approximately 150 million years ago), the earliest known fossil of Sphincterochila candidissima is much younger, dating back only about 38 million years.
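A shape-based match of this kind can be sketched as a nearest-neighbor search: represent the fossil and every reference shell as a vector, then rank the references by distance. The source does not state which distance or which representation was used, so Euclidean distance is an assumption here, and the function and species labels are placeholders.

```python
import numpy as np

def nearest_shells(query, reference, names, k=3):
    """Rank reference shells by Euclidean distance to the query vector."""
    ref = np.asarray(reference, dtype=float)
    dist = np.linalg.norm(ref - np.asarray(query, dtype=float), axis=1)
    order = np.argsort(dist)[:k]
    return [(names[i], float(dist[i])) for i in order]

# Usage with made-up 2-D coordinates and placeholder labels:
reference = [[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]]
names = ["species_a", "species_b", "species_c"]
print(nearest_shells([0.9, 1.2], reference, names, k=2))
```

A small distance only says that two outlines look alike under this representation. It carries no information about age or ancestry, which is why the mismatch in dates matters.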

This temporal mismatch suggests several possibilities:
- The identification based solely on morphology may be incorrect
- The lineage represented by Sphincterochila candidissima may have ancient origins that are not yet represented in the fossil record
- The similar shape may result from convergent evolution, where unrelated species evolve similar forms in response to similar environmental pressures
The author favors the convergent evolution hypothesis, noting that "its eerie similarity to the Alghat fossil is still fascinating, and perhaps points to some sort of convergent evolution, where two different species evolve to have similar shapes due to similar environmental pressures."

Implications and Limitations
This investigation highlights both the power and limitations of morphological analysis. While shape provides an intuitive and accessible approach to comparing organisms, it has significant limitations:
- Convergent evolution: Similar shapes can evolve independently in unrelated species
- Developmental constraints: Organisms with different evolutionary histories may be constrained to develop similar forms
- Environmental adaptation: Similar environmental pressures can lead to similar solutions across lineages
The author acknowledges these limitations, stating that "shape is not the best way of determining shell lineage, but its eerie similarity to the Alghat fossil is still fascinating."

The Democratization of Scientific Inquiry
Beyond the specific findings about the Alghat fossil, this project represents an interesting example of how computational tools and publicly available data are democratizing scientific inquiry. By providing access to their methodology and analysis through a GitHub repository, the author enables others to build upon their work and potentially refine their approach.
The interactive tool at https://shell.hawzen.me allows anyone to explore the shell latent space and see where different shell species fit within this mathematical framework. This open approach to science contrasts with traditional academic publishing, where methodologies and data are often locked behind paywalls.

Conclusion: The Intersection of Curiosity and Computation
The story of the desert seashell exemplifies how personal curiosity, when combined with computational tools, can lead to meaningful scientific exploration. What began as a simple observation of a rock formation in the desert transformed into a sophisticated mathematical investigation of shell morphology and evolution.
This citizen science project demonstrates that valuable insights can emerge from unexpected places, and that the boundaries between professional and amateur science are becoming increasingly permeable in the digital age. As computational tools become more accessible and datasets more comprehensive, we can expect to see more such projects that bridge the gap between observation and analysis.
The Alghat fossil may remain a geological enigma in terms of its exact lineage, but the mathematical journey it inspired provides a fascinating glimpse into the possibilities of citizen science and the beauty of mathematical approaches to understanding biological form.
