The Pamphlet collection of the DHLab on the PEERS platform offers work-in-progress research notes as well as methodological papers by the ENP-China team. The work-in-progress notes present preliminary results of experiments and research before submission to academic journals. The methodological notes develop aspects of method, or specific aspects of a research question, that usually cannot find their place in a published paper.
This inaugural issue of the DH Lab Pamphlet is devoted to our experiments in exploring and studying the Biographical Dictionary of Republican China edited by Howard L. Boorman.
This second issue of the Digital History Pamphlet Collection introduces the Consolidated National Advertising Co., one of the largest Chinese advertising agencies established prior to 1949. Drawing on Who’s Who, historical newspapers and the Shanghai Municipal Archives, and harnessing social network analysis and visualization tools, we propose to build a collective biography of the Consolidated group and its main actors.
This issue of the DH Lab Pamphlet presents mostly unpublished papers that examine aspects of the Shanghai elites through fiction (a study of three Shanghai novels) and propose a macro-historical reading of Shanghai history.
Knowledge, Power, and Networks – Elites in Transition in Modern China
The ENP-China project is pleased to announce the publication of the first book resulting from the ENP-China team’s work and the meetings and discussions that we organized with a group of international “like-minded” historians.
At the level of the chapters written by the members of the team, we are indebted to the collaborative work carried out with our colleagues in computer science/NLP/data science.
The contributors include a broad array of young and seasoned historians: Cécile Armand (Aix-Marseille University), Peter E. Hamilton (Lingnan University), Christian Henriot (Aix-Marseille University), Marilyn Levine (Central Washington University), Ling-ling Lien (Academia Sinica, Institute of Modern History), Yi-tang Lin (University of Geneva), Henrike Rudolf (University of Göttingen), Brett Sheehan (University of Southern California), Huei-min Sun (Academia Sinica, Institute of Modern History).
This volume was co-edited with two historians who belong to the next generation of China scholars, Sun Huei-min and Cécile Armand. I am particularly proud, on the book cover, to be framed between these two talented women.
The book is advertised on the website of Brill. We hope it will be a milestone in rethinking historical research in the direction of an integrated, data-rich history. It should be the first of a series of collective works that explore original, even unpublished sources, with innovative methods.
In the past decades, the world has watched the rise of China as an economic and military power and the emergence of Chinese transnational elites. What may seem like an entirely new phenomenon marks the revival of a trend initiated at the end of the Qing. The redistribution of power, wealth, and knowledge among the newly formed elites matured during the Republican period.
This volume demonstrates both the difficulty and the value of re-thinking the elites in modern China. It establishes that the study of the dynamic tensions within the elite and among elite groups in this epochal era is within reach if we are prepared to embrace forms of historical inquiry that integrate the abundant and even limitless historical resources and to engage with the rich repertoire of digital techniques/instruments available and question our previous research paradigms. This renewed approach brings historical research closer to an integrative data-rich history of modern China.
We wrote the design manual of the Modern China Biographical Database to present the nature, the purpose, and the structure of the database. This document is not strictly a “script” for analytical use. We chose this format to elaborate this manual for two main reasons: first, this was a collective exercise in which each of us could write his/her part(s) independently, which were then compiled automatically into a single document; second, we wanted to work with a format that allowed us to update any part at any time and to produce seamlessly a flexible document for the web.
In processing data extracted from sources, historical or otherwise, we often face the same issues of having to clean, homogenize, and group the data to make it useful for analysis. Data transformation is a tedious task that requires rigor and accuracy. While small data sets can be processed by hand in a spreadsheet, larger data sets quickly become time-consuming to process, with a greater risk of mistakes. An R script enables the systematic processing of data without erasing the original data set on which the transformation is being applied.
With the Data Transformation script we mean to provide a series of operations for the transformation of data, including messy data, to fit the requirements of clean tabular data. Our purpose is to facilitate the production of data in Chinese studies according to the norms of academia in Western countries, but the script can be extended to other fields of study. We propose here a wide range of examples, from simple text editing to the extraction of data from complex sentences. The examples we use apply to both English (or any Latin script) and Chinese.
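As a rough illustration of this kind of transformation, the sketch below turns two messy biographical strings into clean tabular rows while leaving the original records untouched. It is written in Python purely for illustration (the script described above is in R), and the sample records, the field names and the `clean` and `transform` helpers are all hypothetical.

```python
import re
import unicodedata

# Invented sample records mixing stray spaces and full-width punctuation.
RAW_RECORDS = [
    "  Wang  Ming (王明),born 1895 in Shanghai ",
    "LI Hua（李華）, born 1902 in  Beijing",
]

# One pattern for the assumed sentence shape: name (Chinese name), born YEAR in PLACE
PATTERN = re.compile(
    r"^(?P<name>[A-Za-z ]+?)\s*\((?P<name_zh>[\u4e00-\u9fff]+)\),\s*"
    r"born (?P<birth_year>\d{4}) in (?P<birth_place>[A-Za-z ]+)$"
)


def clean(text):
    """Normalize full-width punctuation and collapse repeated whitespace."""
    text = unicodedata.normalize("NFKC", text)  # full-width brackets become ASCII
    return re.sub(r"\s+", " ", text).strip()


def transform(records):
    """Return one row per record; the input list is never modified."""
    rows = []
    for raw in records:
        match = PATTERN.match(clean(raw))
        if match is None:
            # Keep unparsed records visible instead of dropping them silently.
            rows.append({"raw": raw, "parsed": False})
            continue
        row = match.groupdict()
        row["name"] = row["name"].title()
        row["birth_year"] = int(row["birth_year"])
        row.update(raw=raw, parsed=True)
        rows.append(row)
    return rows


TABLE = transform(RAW_RECORDS)
for row in TABLE:
    print(row)
```

Keeping the raw string in every row makes each transformation traceable back to its source, which is the main advantage of a script over manual editing in a spreadsheet.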
This is the first part of a multidirectional exploratory study of Shanghai industrialists in the Shenbao from the mid-19th to the mid-20th century. In this document, we examine the terms through which “industrialists” were named in the press. This essay takes two of the most common terms that designated “industrialists” in the Chinese press in the Republican period: 工業家 and 實業家. While 工業家 represents an unambiguous term for “industrialist”, 實業家 can refer to “entrepreneur” (in other sectors, including banking) and “industrialist”. Our purpose in this script is to extract all the texts that refer to any of these terms and to extract all the named entities (person, organization, location) mentioned in these texts. The second stage of this survey is to link these entities to events to which these terms may be related. The next instalment will explore a wider range of terms associated with “industrialists”.
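The first step described above, selecting every text that mentions either term, can be sketched as follows. This is a minimal Python illustration and not the project's own R script: the three sample articles are invented, `select_articles` is a hypothetical helper, and named entity extraction is left out.

```python
import re

# The two terms studied in the essay.
TERMS = ["工業家", "實業家"]
TERM_PATTERN = re.compile("|".join(map(re.escape, TERMS)))

# Invented sample texts standing in for newspaper articles.
ARTICLES = [
    {"id": "a1", "text": "本埠實業家昨日集會討論工廠問題"},
    {"id": "a2", "text": "著名工業家與實業家聯合發表宣言"},
    {"id": "a3", "text": "今日天氣晴朗"},
]


def select_articles(articles):
    """Keep articles that mention at least one term, with per-term counts."""
    selected = []
    for article in articles:
        hits = TERM_PATTERN.findall(article["text"])
        if hits:
            counts = {term: hits.count(term) for term in TERMS}
            selected.append({"id": article["id"], "counts": counts})
    return selected


SELECTED = select_articles(ARTICLES)
for item in SELECTED:
    print(item["id"], item["counts"])
```

Counting the two terms separately keeps the distinction between the unambiguous 工業家 and the broader 實業家 available for later analysis.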
Case 1 – The Rotary Club of China in the press
A Practical Guide to the ‘enpchina’ package: The Rotary Club in the Chinese Press: This guide aims to demonstrate how China historians can take advantage of the “enpchina” package to explore massive corpora of historical newspapers, focusing on a major Chinese newspaper – Shenbao 申報 – and a concrete case study – the Rotary Club of Shanghai 上海扶輪社 (Shanghai fulunshe) (Rmd version: https://bookdown.enpchina.eu/Rotary_sb_eng.Rmd).
A Practical Guide to the ‘enpchina’ package: The Rotary Club in the English-language Press: This guide aims to demonstrate how China historians can take advantage of the “enpchina” package to explore massive corpora of historical newspapers – i.e. ProQuest “Chinese Newspapers Collection” – taking the Rotary Club of Shanghai as a case study (Rmd version: https://bookdown.enpchina.eu/Rotary_pq_eng.Rmd).
Case 2 – American University Men of China
This tutorial series applies a place-based methodology to study Sino-American alumni networks in modern China, based on a directory of the American University Club of Shanghai published in 1936. It is divided into four parts:
1. Find and analyze places using the R package “Places” (html version, Markdown version)
2. From places to networks (a dual approach): Build, visualize and analyze place-based networks using igraph (html version, Markdown version).
3. Community detection in place-based networks (igraph): Identify and analyze subgroups of places
4. Place formation over time: Create period-based subnetworks to analyze the formation of academic places between 1883 and 1935
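One common way to move from places to networks is to link two places whenever the same person attended both, weighting each link by the number of shared alumni. The sketch below shows that projection in plain Python under this assumption. The tutorials themselves use igraph in R and may build their networks differently, and the attendance data and the `project_places` helper are invented for illustration.

```python
from collections import Counter
from itertools import combinations

# Invented attendance data: each person and the universities attended.
ATTENDANCE = {
    "Person A": {"Columbia", "Harvard"},
    "Person B": {"Columbia", "Harvard", "Yale"},
    "Person C": {"Yale"},
}


def project_places(attendance):
    """Count, for every pair of places, how many persons attended both."""
    edges = Counter()
    for places in attendance.values():
        # Sorting gives each unordered pair a single canonical key.
        for pair in combinations(sorted(places), 2):
            edges[pair] += 1
    return edges


PLACE_EDGES = project_places(ATTENDANCE)
for (place_a, place_b), weight in sorted(PLACE_EDGES.items()):
    print(place_a, place_b, weight)
```

The resulting weighted edge list is the kind of input that a graph library can then use for visualization or community detection.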
Case 3 – The Golden Age of the returned students
This collection of tutorials explores the presence of the returned students in the modern Chinese press. The press corpora include a dozen Chinese newspapers spanning the mid-19th to the mid-20th century. They are part of the large collections of historical sources that the ENP-China project has acquired and made available in full text for the first time. The potential for exploration is infinite. It may be disturbing too. As humanists trained in the close reading and critical hermeneutics of a limited, human-scale number of documents, we are poorly equipped for facing this data deluge. Where to start? How to proceed? These tutorials provide some useful tips for turning historians into data-driven humanists. We will experiment with various techniques and methods to handle massive historical corpora and approach modern Chinese history from new perspectives.
The purpose of this tutorial series is twofold:
- Substantially, to introduce a step change in the history of the returned students and contribute to a new understanding of their role in building a new China after the empire, a much disputed issue in the existing scholarship (Wang, 1966). The corpus-based, data-driven approach we propose will enrich and contextualize the biographical, prosopographical and cultural studies that have prevailed to date.
- Methodologically, we aim to:
- introduce the enpchina R package (a set of tools relying on the R programming language and tailored specifically for exploring massive, multilingual corpora of Chinese sources) and other R packages we consider useful for historical research;
- devise on-the-fly yet sustainable solutions for harnessing large collections of historical newspapers;
- empower historians with various programming skills so that they gain full control over the “datafication” process and escape the black boxes that we inherit from web platforms and off-the-shelf software.
We chose RStudio because it provides an integrated framework for combining a variety of approaches and commanding the complete chain of operations. In R, data-driven historians can conduct the entire research process (from data extraction to the exploration, analysis, interpretation and publishing of their findings and methodology) within a single, unified environment, while ensuring the traceability of the workflow and the replicability of their experiments through sharing the code and emphasizing collaboration. Moreover, it is supported by a large community of users (historians/scholars, data scientists/computing specialists) and it is constantly evolving toward greater integration and accessibility. Altogether, the following tutorials develop a standard workflow that any historian can emulate or transpose to her own research needs. She will be guided step by step from building the corpus to analyzing its textual content, mapping the underlying network of social actors and many other applications.
- Corpus building with the enpchina package: in this tutorial, you will learn how to use the enpchina package to build a corpus (i.e. a collection of newspaper articles) from a keyword-based query and to conduct a preliminary exploration of this corpus (Rmd version: https://bookdown.enpchina.eu/Liumei/01_Corpus.rmd).
- Text analysis with tidytext: apply basic text analysis techniques to approach the content of articles (tokenisation, word frequency, correlation, co-occurrences) with the package tidytext (Rmd version: https://bookdown.enpchina.eu/Liumei/02_TextAnalysis.Rmd).
- Text statistics with quanteda: learn how to create a corpus object to perform more advanced text analyses (frequency, time series) and visualisations (heat maps) with the package quanteda (Rmd version: https://bookdown.enpchina.eu/Liumei/021_TextStats.Rmd).
- Keyword extraction with quanteda: learn how to handle multi-word units (e.g. “United States”), extract key terms and compare corpora of varying size using more sophisticated metrics (TF-IDF, log-likelihood ratio test) (Rmd version: https://bookdown.enpchina.eu/Liumei/022_KeyTerm.Rmd).
- Text co-occurrences (1): explore relations between words, learn how to find and visualize collocates and to measure their significance (Rmd version: https://bookdown.enpchina.eu/Liumei/023_TextCooc.Rmd).
- Text co-occurrences (2): discover alternative ways of visualizing text collocations (Rmd version: https://bookdown.enpchina.eu/Liumei/024_Collocation.Rmd).
- Concordancing: to bridge the gap between distant and close reading, learn how to analyze words in their original context and apply regular expressions to refine your research (coming soon).
Sentiment analysis (coming soon)
Topic modeling (coming soon)
Named Entity Recognition (coming soon)
- Extraction with the enpchina package
- Processing: clean, homogenize and classify named entities with the tidyverse meta-package and OpenRefine R extensions.
- Network analysis of persons and organizations
- Mapping locations
Corpus forensics (coming soon)
- Text classification
- Text metrics
- Text features
- Text reuse
X-Boorman presents an enhanced digital version of the Biographical Dictionary of Republican China edited by Howard L. Boorman in 1967-1971. The dictionary is no longer in print and hardly used in modern Chinese historical studies. It lost much of its luster and relevance with the emergence of internet resources. Yet, this is a work based on meticulous research that provides information that remains relevant today. X-Boorman proposes an exploration of this work beyond the individual biographies, through network analysis, mapping, graphs, with an interface that incorporates pinyin and Chinese.
BDOC: Biographical Dictionary of Occupied China
The ENP-China project is developing collections of publications in the form of pamphlets and podcasts on the PEERS platform. PEERS is a non-profit organization created by researchers for researchers who can publish works directly with integrated data and operational code. This makes it possible for everyone to contribute to data, to explore data and to re-analyze data. The ENP-China team publishes intermediate research results, methodological papers, and also full papers in its series Digital History Lab.
The ENP-China collection on BN-Asie (Asia Digital Library) makes available all the source materials that researchers have used in the course of their research. Generally, it includes high-resolution digital versions, except when we were not able to obtain one. We processed all these files to turn them into searchable PDF files (OCR). While we establish the collection, all the deposited materials can be found by searching “enpc” in the keyword field of the search engine.
The ENP-China project maintains a bibliography of all the works (source materials as well as academic literature) that are relevant to the study of elites as a historical object. It is organised in six sections: Biographical databases [any database with historical biographical data], Elites_China (Eng) [academic works on elites in China in English and other Western languages], Elites_China (Zh) [academic works on elites in China in Chinese], Elites Foreigners [academic works on foreign elites and foreigners in China in English and other Western languages], Elites History [academic works on elites outside of China in English and other Western languages], Source Books [all the source materials used in the ENP-China project].
All the data sets produced in the course of the ENP-China project are made available on the ENP-China Data Repository on Zenodo, the long-term general-purpose open-access repository developed under the European OpenAIRE program and operated by CERN. It allows researchers to deposit data sets, research software, reports, and any other research-related digital artifacts. For each submission, a persistent digital object identifier (DOI) is minted, which makes the stored items easily citable.
Virtual Shanghai is a research and resource platform on the history of Shanghai from the mid-nineteenth century to the present day. It incorporates various sets of documents: essays, original documents, photographs, maps, quantitative data, etc. The objective of the project is to write a history of the city through the combined mobilization of these various types of documents. The implementation of this approach relies on the use of digital and GIS technologies. On the research side, the platform offers various ways to step into the history of the city and follow its course at different levels over time. On the resource side, apart from providing original textual and visual documents, it develops a powerful cartographic tool for spatial analysis and real-time mapping (to be upgraded soon). The authors of the present project subscribe to the idea of sharing scholarship and research tools for the benefit of scholars, students, and citizens at large.
Virtual Beijing is a digital platform on the social history of Beijing. The city has attracted the interest of historians of modern China only in the last decade. If we set aside the spate of books generated by the Olympic Games, few studies have addressed the social history of Beijing and, among them, few have actually devoted much attention to the spatial dimension as part of the historical analysis. Through this digital platform, we plan to explore further the social history of Beijing and to address its historical trajectory through case studies based on textual, visual, and spatial data.
Virtual Wenzhou is a platform for research and resources on the history of Wenzhou from the late imperial period (circa 1300-1911) to the present. It collects and assembles a wide array of sources, including photographs, old maps, archival documents, and analytical data. The aim of this project is to present a thorough collection of sources pertaining to the history and culture of Wenzhou, a port city in Southeast China that developed distinct cultures and languages. Aiming to enhance the understanding of Wenzhou
Numerica Sinica is a CNRS initiative, initially supported by INSHS and operated by TGIR Huma-Num, that provides a shared instrument of access to digital resources on Chinese worlds for the benefit of CNRS-affiliated units. The platform now includes a broad range of resources, including those purchased specifically for ANR projects and the ENP-China project.