There’s rarely time to write about every cool science-y story that comes our way. So this year, we’re once again running a special Twelve Days of Christmas series of posts, highlighting one science story that fell through the cracks in 2020, each day from December 25 through January 5. Today: archaeologists are using drones and satellite imagery, among other tools, to build large online datasets with an eye toward harnessing the power of big data for their research.
Archaeology is finally catching up with the so-called “digital humanities,” as evidenced by a February special edition of the Journal of Field Archaeology, devoted entirely to discussing the myriad ways in which large-scale datasets and associated analytics are transforming the field. The papers included in the edition were originally presented during a special session at a 2019 meeting of the Society for American Archaeology. The data sets might be a bit smaller than those normally associated with Big Data, but this new “digital data gaze” is nonetheless having a profound impact on archaeological research.
As we’ve reported previously, more and more archives are being digitized within the humanities, and scholars have been applying various analytical tools to those rich datasets, such as Google N-gram, Bookworm, and WordNet. Close reading of selected sources, the traditional method of scholars in the humanities, gives a deep but narrow view. Quantitative computational analysis can combine that close reading with a broader, more generalized bird’s-eye approach that can reveal hidden patterns or trends that otherwise might have escaped notice. The nature of the data archives and digital tools is a bit different in archaeology, but the concept is the same: combine the traditional “pick and trowel” detailed field work on the ground with more of a sweeping, big-picture, bird’s-eye view, in hopes of gleaning hidden insights.
One paper in particular demonstrates the power of this approach, authored by anthropologists Steven Wernke and Parker VanValkenburgh, of Vanderbilt University and Brown University, respectively. They collaborated with fellow co-author Akira Saito, an ethnohistorian with the National Museum of Ethnology in Japan, to develop two online databases, and used them to bring a fresh perspective to the forced resettlement of the Inca Empire in the 1570s by Spanish conquerors.
The Linked Open Gazetteer of the Andean Region (LOGAR) is designed to collect primary source information on relevant locations of interest to those who study the Andes region. It includes information collected from a comprehensive record of the resettlement known as the “Tasa de la Visita General,” maintained by the Spanish-appointed viceroy of Peru. The Geospatial Platform for Andean Culture, History and Archaeology (GeoPACHA) complements LOGAR. It’s an open-source, browser-based platform that lets users discover and document archaeological sites in the Andes by systematically surveying satellite and historic aerial imagery, via networks of trained teams.
The trio were able to create a comprehensive basemap of the planned colonial towns (reducciones) built during that mass resettlement. That helped them spot an intriguing pattern in the distribution of those reducciones: it seemed to follow a remarkably similar distribution of the Inca imperial infrastructure, namely its road system. Specifically, they noted similar clustering of populations in the greater Cuzco and Lima areas. “The Spanish, after about 40 years of being in Peru, were trying to figure out how to govern this vast territory,” VanValkenburgh told Ars. “They directly imitated what the Inca were doing. The resettlement was one of the initiatives at the core of that attempt to reimagine Spanish governance in an Inca model.”
This new emphasis on using the digital tools of Big Data doesn’t mean archaeologists are “throwing in the trowel” when it comes to traditional field work, however. Wernke and VanValkenburgh discussed with Ars the necessity of maintaining a crucial balance between the two approaches, as well as expounding upon the potential advantages and drawbacks of tapping into the power of scale.
Ars Technica: Archaeology has been lagging behind the humanities in terms of incorporating these techniques. Why is that?
Steven Wernke: Archaeologists generally think of field-collected data as the gold standard and we tend to be very bound to that standard. We tend to think of people as the main instrument of observation as archaeologists. But what we’re trying to do is not in any way to replace that, or claim that what we’re doing is somehow better. We’re trying to complement that approach with these new tools for turning old imagery into things that we can put online and search systematically.
The other dimension to this has to do with documenting archaeological heritage, at a time when it’s disappearing at an accelerating rate. Some of that is driven by climate change. In Peru we see this very concretely through intensifying El Niño events, which trigger all kinds of flooding on the north coast of Peru that has been destroying sites. Looting is another major problem, which is really responding to market forces that originated in the Northern hemisphere from the antiquities market. There are sites in Peru with bodies lying all over the place because a cemetery was identified and the looters have gone in to try to get metal and textiles.
Parker VanValkenburgh: A big part of this is about the nature of archaeological data itself. Genetic data has this sort of primordial modularity to it, so it can be reduced down to a series of specific variables. If you’re an archaeologist and you want to study the growth of cities across the ancient world, there are variations in archaeological data caused by human culture, which is not modular. That means it’s really difficult for people to agree on what the break points are between different sorts of classifications. It’s difficult to scale up data collection and datasets to the point that you can do these types of big data analyses. So it’s natural that big data analytics are first being applied in datasets that weren’t specifically collected for archaeological analysis in the first place, like satellite data.
Ars Technica: There is inevitable tension between finely tuned details and the so-called “eye in the sky” perspective offered by a systemic digitized approach. How do you find a good balance?
Wernke: In a nutshell, you find it by doing both. We’re both field archaeologists. I would say 90 percent of what I’ve done has been survey and excavation in the field. I’ve worked in the same valley in the Andes for 25 years. That’s not unusual. Archaeologists tend to specialize geographically and they tend to get to know a place intimately. I’m a big advocate for that. We’re hyper-conscious of the fact that this kind of godlike view you get from viewing the surface of the earth from everywhere and nowhere at once brings with it a lot of risk of overlooking all this local variability, and all the complexity on the ground that we ourselves have documented. Yet we now also know if we’re only on the ground, we’re missing several levels of forces that were acting on humans in the past, forces that are acting on us presently. We’re trying to link these things together. We’re not trying to displace one with the other.
VanValkenburgh: Archaeology is a field, at least in its modern iteration, that is really good at telling these intense small stories. It’s micro-historical in a way. That’s valuable when you have these big sweeping generalizations of what happened in the past and you effectively hold generalizing theory accountable by applying it in really specific locations. But where does that theory come from? Currently what we have is a random smattering of studies. Take one example, the Inca Empire. What we know about the Inca Empire is the sum of a bunch of somewhat random studies that, in their aggregate, tell us a lot about how the Inca worked and a lot about local variation. But it hasn’t been systematically sampled with an idea for understanding the entire system.
Big data is often discussed as being this radical alternative to the hypothetical and deductive approach. But in archaeology, it’s providing us with a better generalized model to test against, or to fit alongside, the really localized things we’re doing. That way we can better contextualize the type of micro-historical research that we both think is going to continue to be archaeology’s bread and butter for a long time.
Ars Technica: Your work on the Inca Empire, and the resettlement in particular, provides a useful case study. Let’s talk about the challenges of understanding this point in history and how big data can help.
Wernke: At its peak in the 16th century, the Inca Empire was the largest empire in the world before the Spanish invaded the Americas. Terence D’Altroy, a prominent scholar of the Incas at Columbia University, has written, “If the Inca Empire were in the Old World, it would stretch from St. Petersburg in the North to Cairo in the South.” It’s enormous. That’s a big data challenge that encompasses over five modern republics in today’s terms, millions of people, dozens of ethnolinguistic groups, and one of the most [geographically] diverse areas—from the driest desert in the world, to the wettest rainforest in the world, and everything in between as you go up the Andes on one side and down the other.
The traditional narrative is Spanish conquest, with a capital C. We’re trying to complicate that narrative on several levels. This mass resettlement program occurred in the 1570s, a generation after the invasion and about 40 years after the conquest of the Incas by Francisco Pizarro. You could easily fit [the resettlement] to that narrative of conquest, because over a million people virtually overnight were displaced to over a thousand towns built throughout the viceroyalty. Yes, it was hugely disruptive and a form of domination, but it was also profoundly dependent on what came before, in terms of Inca administrative infrastructure and local arrangements with Spanish administrators. Through this big global picture, we can start to see one dimension of that: how they were dependent on the Inca road system that came before. So what the Spanish would say was a form of conquest was in some sense kind of an afterlife of Inca imperialism, a recycling of Andean forms of imperialism.
VanValkenburgh: The Spanish invasion was literally an infection. The first invaders from the Old World to arrive in the Andes appeared with new pathogens. Then you think about the Inca Empire as being a body and the road system as being a circulatory system. The Spanish invaded and colonized this existing thing and took it over. People always say, how is it this group of 200 people took over a vast empire? They took over the way that a virus [takes over] a host, and they played native factions against each other. The Inca had conquered a bunch of people who were not particularly fond of them and the Spanish took advantage of those rivalries. But the way that they moved around their goods, their troops, etc., after the Spanish invaded was the same road system.
Ars Technica: How do you ensure the quality of your datasets, particularly in a field like archaeology, where you’re often dealing with imperfect or incomplete information?
Wernke: We’ve been careful to structure GeoPACHA as a peer review kind of system internally. It is crowdsourced in that many people are working on it, but it’s not just wide-open crowdsourcing. We have teams of researchers who are experts in their fields. They have come to the project with specific research questions that they want to address. They’re working with their students, and the students are working directly with them as regional editors for their projects. They review their team’s site identifications, and then we as the general editors do the final review on those before they are committed to the database. We also have a system for tracking coverage, a cell-based grid that’s overlaid on these survey areas. The user will mark a grid cell as surveyed, and that also gets reviewed by the regional editors and general editors to make sure that nothing was missed, that we don’t have false negatives.
VanValkenburgh: One of the things we both think about is, if we scale this up beyond where we’re currently working, can this model of peer review work? Wikipedia has found its own way to produce a lot of quality with a relatively small set of really dedicated people. In some ways that’s what we envision going forward. But there have been a number of different initiatives that have tried to harness the power of crowdsourcing for archaeological site identification out there, and I think initial results have shown that being able to identify archaeological sites in satellite imagery is actually pretty hard to do in some cases. The kind of vision that you need to interpret satellite imagery requires a fair amount of training. This is an issue that [also extends] to textual data and all other kinds of data that have become a big part of archaeological data science.
Ars Technica: Where would you like to see this field go over the next 10 years? What’s your vision for how this might all come together?
VanValkenburgh: Cooperation and ethical introspection are the two things I think are most important. There’s two parallel tracks in terms of data aggregation that are going on right now. We’ve got initiatives like Open Context, which is a pioneer in open data for archaeology, plus a number of other initiatives including the Digital Archaeological Record, to create massive databases that allow archaeologists to ask systematic questions about comparison. I’d love to see a cultural shift in the field, where people are attempting to get their data into repositories like that, thereby enabling us to do more systematic comparison.
There’s also the parallel development where people are going out and collecting large datasets with an end goal of modularity at large scales. LiDAR is one great example. That is a different type of big data. But I’d like for there to be some standards for data sharing in LiDAR, as well as across both of those parallel tracks. In the community, we need to have a serious conversation about the ethics and best practices of doing big archaeology. Working at scale poses questions about privacy and data sovereignty that are not unlike ones we face as archaeologists working on the ground. But the archaeologist working in a local community is beholden to personal relationships in a way that you don’t necessarily get when you’re working as an eye in the sky. So I think we’re entering a more contemplative phase of archaeological science.