[This blog comes in two parts. Part 2 is posted under the title "The Uses of Digital Philology in Tang-Song History - Part 2".]

Hilde De Weerdt (1/14/2017)

Prepublication version for Tang Song lishi pinglun special issue “Data and Digital Tools” for which I was asked to reflect back on my engagement with digital research in the field of Tang-Song history; updated with select features (DOCUSKY and COMPARATIVUS) 2018.

In the last five years Chinese historians have, like their counterparts working on other parts of the world, woken up to the fact that digital history need not be a reduction of texts to quantitative data and the aggregation thereof. They have come to see that with digital philology historians can supplement and enrich the repertoire of analytical perspectives and hermeneutic strategies that can be brought to the texts and material objects they study. In the brief communication that follows I will give an overview of some of the ways in which I have engaged with digital approaches to historical texts from the Tang and Song dynasties and discuss how I integrated these into my longer-term research projects on Tang and Song political and intellectual history.


Most historians have been working digitally for some ten to twenty years. Beyond the more obvious word-processing tasks and simple searches that we perform on a regular basis, many have also been creating tables, spreadsheets, and databases in the standard office software packages that have gradually come to supplement, if not fully replace, the notepads and card files of the past. My exploration of the new terrain of digital humanities began in this way. In 2007 I examined the data and the structure of the China Biographical Database [hereafter CBDB] to see what they might have to offer to someone interested in researching correspondence networks in twelfth-century Southern Song China. In the rudimentary report I presented at the “Prosopography of Middle Period China: Using The Chinese Biographical Database” Workshop that year, I concluded that CBDB was then unsuitable for research on correspondence networks due to a lack of relevant data. I also concluded that the model could and should be extended both by the ingestion of existing reference works and by more fine-grained smaller-scale research projects. In the process of investigating the CBDB (and reading through a database development manual) I had, moreover, become aware of the benefits of linking the texts and notes I was gathering in the course of my own research to the CBDB.

I made a first step in this direction while investigating the social and cultural history of note taking and the printing of notebooks in Song China. As part of my larger investigation into the reception history of texts relating to current affairs and dynastic history such as court gazettes, archival compilations, maps, and policy documents, I set out to compile a table of all printed editions of notebooks (biji) published during the Song Dynasty. I transformed the table into a spreadsheet listing author name, original title, standard title, original date of compilation, date of printed edition, printer name, patron name (if any), place of printing, current holding location (if the edition was still extant), and the source(s) attesting the existence of the printed edition. I was able to go much further in my analysis of the social and geographical backgrounds and careers of many of the authors by connecting my data to the biographical details about them contained in CBDB, which was being rapidly populated with prominent Song biographical reference sources at the time. I could also map the locations in which biji were printed by linking the place names mentioned in prefaces and catalogs to the coordinates listed in the historical geographical datasets provided by Lex Berman and others in CHGIS. I argued on the basis of the resulting data that we see in the social history of Song biji publishing not a shift in authorship from high court to local office-holders and to those without official rank (as Zhang Hui and others have maintained) but rather a broadening of authorship, with high court office-holders continuing to play an important, but no longer dominant, role in printed texts and with court politics continuing as a central concern in the genre (see also De Weerdt, Information, Territory, and Networks, ch. 6).

I later extended this methodology, linking data I gathered myself to existing databases that could enrich them, to the digital text of the notebooks I selected to read in full. It struck me in reading Wang Mingqing’s series of notebooks titled Huizhu lu that the best way to analyze the information network embedded in the text would be to create a database in the text itself. Wang Mingqing frequently mentions in his notebooks the person(s) with whom he had conversations, his source texts or the authors whose books he read, and the collectors he visited. I annotated a digital edition of the text, tagging, entry by entry, who served as an informant on what kinds of topics and which texts he commented upon. By adding the CBDB id for each person thus referred to in the tags inserted in the digital text and then connecting the data from my text files to CBDB, I was able to import from the latter database the dates, native place, and career information for hundreds of individuals mentioned in the text. This would have been tedious and practically inconceivable without digital methods. Even though I still added or corrected information on dozens of individuals (curating data always remains a necessity), the ready availability of such information from authoritative (if not infallible) reference materials such as Chang Bide’s 昌彼得 Songren zhuanji ziliao suoyin 宋人傳記資料索引 or Li Zhiliang’s 李之亮 Songdai junshou tongkao 宋代郡守通考 meant that reconstructing the temporal, social, and geographical dimensions of Wang Mingqing’s information network as reflected in his notebooks became a feasible experiment.
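
For readers who want a concrete picture of this kind of linking, the following is a minimal sketch in Python. It is not the markup or workflow actually used for Huizhu lu: the tag format, the role labels, the ids, and the lookup table are all invented placeholders, and a real project would query CBDB itself rather than a small dictionary.

```python
import re

# Hypothetical inline tag format (an assumption for illustration only):
#   <person cbdb_id="1001" role="informant">Name</person>
TAG = re.compile(r'<person cbdb_id="(\d+)" role="(\w+)">(.*?)</person>')

# Stand-in for a biographical lookup table. The ids and values are
# placeholders, not real CBDB records.
LOOKUP = {
    "1001": {"index_year": 1150, "native_place": "Place A"},
    "1002": {"index_year": 1165, "native_place": "Place B"},
}

def link_entry(entry_text, lookup):
    """Return one row per tagged person, enriched from the lookup table."""
    rows = []
    for cbdb_id, role, name in TAG.findall(entry_text):
        bio = lookup.get(cbdb_id)
        rows.append({
            "name": name,
            "role": role,
            "cbdb_id": cbdb_id,
            "index_year": bio["index_year"] if bio else None,
            "native_place": bio["native_place"] if bio else None,
            # Persons missing from the lookup are flagged for manual curation.
            "needs_curation": bio is None,
        })
    return rows

entry = ('Heard from <person cbdb_id="1001" role="informant">Person A</person>; '
         'also cites <person cbdb_id="9999" role="author">Person C</person>.')
rows = link_entry(entry, LOOKUP)
for row in rows:
    print(row)
```

The `needs_curation` flag reflects the point made above: records that cannot be matched automatically still have to be added or corrected by hand.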

I included this work in Part II of my latest book Information, Territory, and Networks: The Crisis and Maintenance of Empire in Song China. In this work I argue that a structural transformation took place in the production and dissemination of information about the Chinese polity. Genres such as maps showing the Chinese territories, archival compilations, court gazettes, and military geographies, which had been authored mainly by court officials for consumption by the same group, were by the twelfth century authored, read, and commented upon by cultural elites living across the Song Empire. I trace these changes in political communication between court and literati at the level of institutional change, legal history, and cultural production. On the basis of systematic analyses of notebooks and other genres recording literati reading practices, I further argue that the surge in literati textual production exhibited two important features that led to a tendency towards the formation and maintenance of large imperial states in Chinese history from this point onwards. First, literati tended to articulate an imperial mission, a commitment to an idealized Chinese commonwealth, in secondary discourse about the polity. Second, the scope of the networks through which texts were exchanged tended to be broad, diverse, and geographically cross-regional. It was my exploration of digital methodologies that made the latter findings and the novel examination of the reception history of state documents and the communication networks of literati possible.

I expanded the initial experiment with a group of graduate students (Lik Hang Tsui, Chen Yunju, Li Yun-Chung, and You Zixi) into a further examination of a broader selection of notebooks. Our original datasets are available for download from the site accompanying the book. With the help of Brent Hou Ieong Ho I also transformed the texts and data into an online platform that allows for the interactive visualization of the information networks embedded in the encoded notebook texts. Readers can test arguments made in Information, Territory and Networks and perform other analyses of the texts and data. Readers can:

• create heat and cluster maps of the places where informants hailed from or served in

• compare the temporal distribution of authors in different notebooks

• analyze the social backgrounds of informants by examining their office-holding record or absence thereof

• check on the frequency of citation of different kinds of informants or individuals across notebooks

• link back to the passages in which informants occur


Figure 1: Interactive platform for selected Song notebooks, showing native place information of informants and, at the bottom, the buttons for other view options including full text, table view, and chart views.

By examining the social relationships thus commemorated in notebooks I gained new insight into the genre and its development across the Song period. This kind of reading in context, alongside other forms of digital reading such as the corpus linguistic analysis of different editions of the text in chapter 8, transformed my earlier close readings of the texts and informed my interpretation. It allowed me, as a modern reader for whom contextualization always remains a challenge, to develop new insights into the ways in which individuals such as Wang Mingqing articulated distinctive positions on shared literati concerns such as Song-Jin relations in his notebooks.


My first exploration of digital methods took place at the micro- and meso-scale of source analysis. I was either working with individual texts (one set of notebooks compiled by one author) or corpora of selected texts from a small number of authors from the same period. With digital methods we should be able to do more. Digital reading holds in my view a unique promise to allow the researcher to zoom in and out. We are used to doing this now with images of art objects and material artefacts, zooming not only into the regions and details of the individual object but also zooming out to the larger collections and subcollections in which it has come to be classified (for a nice example, see Florian Kräutli’s Timeline suite for digital curation). The same holds for texts. Digital curation allows us to scale up and down, but we have yet to examine how this can best be done. For historians the importance of scaling is readily evident when we think of developments that involve an entire class of people, a vast area, or centuries of time. I will provide two examples of ongoing projects to explain how digital approaches can allow us to explore the macroscale in our work, one in the area of political history, the other in the area of urban history and the history of technology.

Political and intellectual historians of the Song period have long debated the factional struggles that involved large numbers of scholar-officials between the eleventh and thirteenth centuries. Most of the many studies on key moments in this history have focused either on the representation of events in court chronicles or on individual cases of well-known actors or, in some cases such as Huang Kuanchong’s work on Sun Yingshi, more peripheral ones. Within this framework it is difficult to develop a sense of how alliances were formed among larger collectivities. Networking was intertwined with the career of literati at various stages. It was essential when preparing and sitting examinations, seeking appointment and re-appointment, or when obtaining patronage for other types of employment. Networking involved literati in political coalitions. If it is the case that networking of this kind was necessary for careers and therefore pervasive, it follows that historians need to understand how factional politics worked not only at the top but also in the provinces. I believe that the larger question of how far factional politics filtered through to the provinces can be better answered by devising methods to explore the entirety of the existing record. With a group of postdoctoral fellows and doctoral students I have begun to analyze how the co-occurrence of the names of the men who appeared on factional lists can be used to explore such questions. Through a comparison of three key moments in factional struggle in the twelfth century we also aim to address the question of how factional alliances were transformed in the context of the broader social and cultural changes of the Song period.

In a preliminary experiment we ran the names (including alternate names) of all those mentioned on the Yuanyou list (1100s), those persecuted by Councilor Qin Gui 秦檜 (1090-1155) (1140s), and those on the Qingyuan roster (1190s) through three collections of texts. Each collection consisted of all prose texts gathered in the collected writings of authors who were active during the time the lists were compiled. (In Chinese Studies we are fortunate to have access to large corpora of digital texts that make analysis at this scale feasible.) We included the collected writings of all Song authors whose CBDB index year (normally 60 years of year of death) fell within a sixty-year range (-30 to +30) of the date established as the publication date of the list of names: 1104, 1142, and 1196. The 1104 corpus consisted of 56,969 documents in 23,701,759 characters by 2,231 authors; the 1142 corpus 47,040 documents in 18,780,575 characters by 1,139 authors; the 1196 corpus 52,593 documents in 23,446,605 characters by 2,598 authors. Given the limitations of automated detection at this point the data needed to be curated carefully to eliminate occurrences that did not refer to the people in question.
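
The selection rule described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the author records are invented placeholders, the index year is taken as given by CBDB, and treating both ends of the thirty-year window as inclusive is my assumption here rather than a documented detail of the experiment.

```python
# List dates taken from the text above.
LIST_YEARS = (1104, 1142, 1196)

def in_window(index_year, list_year, half_width=30):
    """True if the index year lies within list_year -/+ half_width (inclusive)."""
    return index_year is not None and abs(index_year - list_year) <= half_width

def select_authors(authors, list_year, half_width=30):
    """Names of authors whose collected writings would enter the corpus."""
    return [a["name"] for a in authors
            if in_window(a["index_year"], list_year, half_width)]

# Placeholder records, not real CBDB data.
authors = [
    {"name": "Author A", "index_year": 1080},
    {"name": "Author B", "index_year": 1134},
    {"name": "Author C", "index_year": 1172},
    {"name": "Author D", "index_year": None},  # no index year: excluded
]

corpora = {year: select_authors(authors, year) for year in LIST_YEARS}
for year, names in corpora.items():
    print(year, names)
```

Because the windows of neighbouring list dates overlap, the same author can fall into more than one corpus under this rule.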

What can we do with the curated datasets about the co-occurrence relationships amongst alleged faction members extracted from the collected prose writings of their contemporaries? From the datasets listing which documents discuss which faction members with what frequency we can see what authors, genres, and individual texts we may have ignored in earlier research and need to turn to in the future. Through network analyses of the co-occurrence relationships amongst faction members we examine whether the faction list network cohered in its entirety, what subgroups it was composed of, and what historical actors were central to the network and connected subgroups to each other. From here we return to the primary and secondary sources to further investigate the role of central actors and to check on subgroups that, to our knowledge, appear to have been ignored. We also continue the analysis of the co-occurrence network by examining to what extent factors such as native place, family connections, and career experience may have affected membership in the list in its entirety and in subgroups. Further down the line, we also aim to track how ties among faction members were represented over time by running the names against texts from later periods. This work is ongoing, but preliminary test runs of the clustering of names suggest that the structure of the presumed twelfth-century factions varied considerably, with the Yuanyou list forming a dense overall network and the Qingyuan list a more hierarchical structure, whereas no clear connections appear in the record for those persecuted by Qin Gui. This shows that historical network analysis can also be used to demonstrate the absence of relationships and network effects.
We are also advancing current practice in historical network analysis and taking advantage of the large-scale prosopographical and textual databases for Chinese history by developing sampling methods, comparing the networks resulting from co-occurrence and other types of relationship data to those of random samples of contemporaries. This work suggests, for example, that, on the basis of the extant record, Daoxue affiliates may have been an unusually close-knit group: running a sample of a hundred men whose backgrounds closely match those listed on the Qingyuan list through the same corpus of 52,593 documents does not yield comparable co-occurrence ties.
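
To make the logic of such a comparison concrete, here is a toy sketch in Python. It is not the project's pipeline: the names and documents are invented, matching is done by naive substring search (which in practice requires the careful curation described above), and edge density stands in for whatever network measures the actual analysis uses.

```python
from collections import Counter
from itertools import combinations

def cooccurrence(documents, names):
    """Count, for each pair of names, the documents mentioning both."""
    edges = Counter()
    for text in documents:
        # Naive substring matching; a real run needs disambiguation.
        present = sorted(n for n in names if n in text)
        for a, b in combinations(present, 2):
            edges[(a, b)] += 1
    return edges

def density(edges, names):
    """Share of all possible pairs that co-occur at least once."""
    n = len(names)
    possible = n * (n - 1) // 2
    return len(edges) / possible if possible else 0.0

# Invented placeholder corpus and name lists.
docs = [
    "A1 wrote to A2 about A3",
    "A2 and A3 met",
    "B1 travelled alone",
    "B2 visited B3",
]
listed = ["A1", "A2", "A3"]      # stands in for a faction list
baseline = ["B1", "B2", "B3"]    # stands in for a matched sample

listed_edges = cooccurrence(docs, listed)
baseline_edges = cooccurrence(docs, baseline)
print(density(listed_edges, listed), density(baseline_edges, baseline))
```

If the listed group is much denser than comparable samples drawn from the same corpus, that supports reading it as unusually close-knit; similar densities would suggest the ties are an artefact of the corpus.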

Data extraction of this kind can be helpful for a broad range of historical inquiry. With two other Ph.D. students, Xiong Huei-lan and Liu Jialong, I am currently also working on a longue-durée history of wall construction. On the basis of a set of regular expressions describing how information about the construction of walls is typically represented in wall and wall gate inscriptions (chengji, menji) and related records in local gazetteers and collected writings we are compiling a dataset covering the history of wall construction from the Song through the Qing periods. This will allow us to map, over time, the construction of walls, their maintenance, durability, materials, cost, labour force, design, size, location, and perceived functions. This kind of work should also benefit other historians, allowing, for example, urban and military historians interested in the comparative analysis of urban planning and military technology to work with nuanced datasets to draw larger conclusions about the relationship between city wall construction and the development of firepower.
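
A rough idea of what regular-expression-based extraction of this kind looks like is given in the Python sketch below. The patterns and the sample sentences are invented for illustration and are far simpler than anything a real project would need; they are not the expressions used in our dataset.

```python
import re

# Simplified character class for Chinese numerals (an assumption;
# real records use many more forms).
NUM = "[一二三四五六七八九十百千]+"

# Hypothetical patterns for a few features of a wall-building record.
PATTERNS = {
    "action": re.compile("(築城|修城|重修)"),
    "circumference_li": re.compile(f"周({NUM})里"),
    "height_zhang": re.compile(f"高({NUM})丈"),
}

def extract(record):
    """Return the first match for each feature, or None if absent."""
    found = {}
    for field, pattern in PATTERNS.items():
        match = pattern.search(record)
        found[field] = match.group(1) if match else None
    return found

# Invented sample sentences, not quotations from any source.
sample = "某年築城，周九里，高二丈。"
partial = "某年重修城垣。"

print(extract(sample))
print(extract(partial))
```

Fields that come back as `None` mark the places where a record is silent or phrased in a way the patterns do not yet cover, which is where manual checking and refinement of the expressions would come in.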


Figure 2. Pilot project on wall construction in Shaanxi, Guangdong, and Henan. Texts were marked up in batch for a wide variety of features in MARKUS, directly exported from MARKUS to DOCUSKY and DOCUGIS. Screenshot shows how the source text as well as the data extracted from it can be linked from the map features.

Such large-scale digital projects lend themselves to analysis at the macroscale and can also be designed to allow readers to zoom into the particulars of source texts, individual actors, particular places, or construction events. It may already be obvious that, in order to realize these goals, historians have to deal with new challenges. Much time will need to be invested in data curation; this is true for all digital projects—automated methods are never foolproof. Moreover, at the level of macroscale analyses, historians may have to adjust their expectations somewhat. Working digitally requires an adjustment in scholarly habits, a tolerance for experimentation and failure, for instance, and the willingness to deal with a certain measure of inaccuracy and messiness when working at elevated scales. Finally, this work cannot be undertaken by individuals. The collaboration of scholars with different types of expertise is required to develop humanities-specific digital methods and platforms that can lead to new insights.

[Please continue reading this blog at "The Uses of Digital Philology in Tang-Song History - Part 2".]