ENP China Projects

Data Repository
All the data sets produced in the course of the ENP-China project are made available on the ENP-China Data Repository on Zenodo, the long-term general-purpose open-access repository developed under the European OpenAIRE program and operated by CERN. It allows researchers to deposit data sets, research software, reports, and any other research-related digital artifacts. For each submission, a persistent digital object identifier (DOI) is minted, which makes the stored items easily citable.
Bibliography
The ENP-China project maintains a bibliography of all the works, source materials as well as academic literature, that are relevant to the study of elites as a historical object. It is organised in six sections: Biographical databases [any database with historical biographical data], Elites_China (Eng) [academic works on elites in China in English and other Western languages], Elites_China (Zh) [academic works on elites in China in Chinese], Elites Foreigners [academic works on foreign elites and foreigners in China in English and other Western languages], Elites History [academic works on elites outside of China in English and other Western languages], Source Books [all the source materials used in the ENP-China project].
BN-Asie
The ENP-China collection on BN-Asie (Asia Digital Library) makes available all the source materials that researchers have used in the course of their research. Generally, it includes high-resolution digital versions, except when we were not able to obtain one. We processed all these files with OCR to turn them into searchable PDF files. While we establish the collection, all the deposited materials can be found by searching “enpc” in the keyword field of the search engine.
Publications
The ENP-China project is developing collections of publications in the form of pamphlets and podcasts on the PEERS platform. PEERS is a non-profit organization created by researchers for researchers, who can publish their works directly with integrated data and operational code. This makes it possible for everyone to contribute to data, to explore data and to re-analyze data. The ENP-China team publishes intermediate research results, methodological papers, and also full papers in its series Digital History Lab.
X-Boorman
X-Boorman presents an enhanced digital version of the Biographical Dictionary of Republican China edited by Howard L. Boorman in 1967-1971. The dictionary is no longer in print and is hardly used in modern Chinese historical studies. It lost much of its luster and relevance with the emergence of internet resources. Yet this is a work based on meticulous research, and the information it provides remains relevant today. X-Boorman proposes an exploration of this work beyond the individual biographies, through network analysis, mapping, and graphs, with an interface that incorporates pinyin and Chinese.
BDOC: Biographical Dictionary of Occupied China

Publications

ENP-China Publications
Knowledge, Power, and Networks - Elites in Transition in Modern China
The ENP-China project is pleased to announce the upcoming publication of the first book resulting from the ENP-China team’s work and the meetings and discussions that we organized with a group of international “like-minded” historians. For the chapters written by the members of the team, we are indebted to the collaborative work carried out with our colleagues in computer science/NLP/data science. The contributors include a broad array of young and seasoned historians: Cécile Armand (Aix-Marseille University), Peter E. Hamilton (Lingnan University), Christian Henriot (Aix-Marseille University), Marilyn Levine (Central Washington University), Ling-ling Lien (Academia Sinica, Institute of Modern History), Yi-tang Lin (University of Geneva), Henrike Rudolf (University of Göttingen), Brett Sheehan (University of Southern California), Huei-min Sun (Academia Sinica, Institute of Modern History). This volume was co-edited with two historians who belong to the next generation of China scholars, Sun Huei-min and Cécile Armand. I am particularly proud, on the book cover, to be framed between these two talented women. The book is advertised on the website of Brill. We hope it will be a milestone in rethinking historical research in the direction of an integrated, data-rich history. It should be the first of a series of collective works that explore original, even unpublished sources, with innovative methods.
In the past decades, the world has watched the rise of China as an economic and military power and the emergence of Chinese transnational elites. What may seem like an entirely new phenomenon marks the revival of a trend initiated at the end of the Qing. The redistribution of power, wealth, and knowledge among the newly formed elites matured during the Republican period. This volume demonstrates both the difficulty and the value of re-thinking the elites in modern China. It establishes that the study of the dynamic tensions within the elite and among elite groups in this epochal era is within reach if we are prepared to embrace forms of historical inquiry that integrate the abundant and even limitless historical resources, to engage with the rich repertoire of digital techniques/instruments available, and to question our previous research paradigms. This renewed approach brings historical research closer to an integrative data-rich history of modern China.
DHLab Pamphlets
The Pamphlet collection of the DHLab on the PEERS platform proposes work-in-progress research notes as well as methodological papers by the ENP-China team. The work-in-progress notes present preliminary results of experiments produced in the course of research, before submission to academic journals. The methodological notes develop aspects of method that usually cannot find their place in a published paper, or focus on a specific aspect of a research question.
DH Lab Pamphlet 1: This inaugural issue of the DH Lab Pamphlet is devoted to our experiments in exploring and studying the Biographical Dictionary of Republican China edited by Howard L. Boorman.
DH Lab Pamphlet 2: This second issue of the Digital History Pamphlet Collection aims at introducing the Consolidated National Advertising Co., one of the largest Chinese advertising agencies established prior to 1949. Drawing on Who's Who, historical newspapers and the Shanghai Municipal Archives, and harnessing social network analysis and visualization tools, we propose to build a collective biography of the Consolidated group and its main actors.
DH Lab Pamphlet 3: This issue of the DH Lab Pamphlet presents papers, most of them unpublished, that examine aspects of the Shanghai elites through fiction (a study of three Shanghai novels) and propose a macro-historical reading of Shanghai histor

Connected Projects

Virtual Shanghai
Virtual Shanghai is a research and resource platform on the history of Shanghai from the mid-nineteenth century to the present. It incorporates various sets of documents: essays, original documents, photographs, maps, quantitative data, etc. The objective of the project is to write a history of the city through the combined mobilization of these various types of documents. The implementation of this approach relies on the use of digital and GIS technologies. On the research side, the platform offers various ways to step into the history of the city and follow its course at different levels over time. On the resource side, apart from providing original textual and visual documents, it develops a powerful cartographic tool for spatial analysis and real-time mapping (to be upgraded soon). The authors of the present project subscribe to the idea of sharing scholarship and research tools for the benefit of scholars, students, and citizens at large.
Virtual Beijing
Beijing has attracted the interest of historians of modern China only in the last decade. If we set aside the spate of books generated by the Olympic Games, few studies have addressed the social history of Beijing and, among them, few have devoted much attention to the spatial dimension as part of the historical analysis. Through this digital platform, we plan to explore further the social history of Beijing and to address its historical trajectory through case studies based on textual, visual, and spatial data.
Virtual Wenzhou
Virtual Wenzhou is a platform for research and resources on the history of Wenzhou from the late imperial period (circa 1300-1911) to the present. It collects and assembles a wide array of sources, including photographs, old maps, archival documents, and analytical data. The aim of this project is to present a thorough collection of sources pertaining to the history and culture of Wenzhou, a port city in Southeast China that has developed distinct cultures and languages. Aiming to enhance the understanding of Wenzhou
MADSpace
MADSpace (Mapping Advertising Space) was born in 2016 as a digital companion to a PhD dissertation devoted to a spatial history of advertising in modern China. It was designed to store, organize and connect primary sources or raw data (archives, printed materials, maps, photos), analytical materials or cooked data (graphs, maps, trees, tables, timelines), multimedia narratives (dissertation, published papers, unpublished essays), bibliographical references and other resources. It also includes a relational database of some 2,000 historical actors involved in the advertising industry (companies, branded products). Since then, it has expanded beyond its initial purpose to include over 950 archival documents, 1,500 printed materials, 1,000 images, and more than 700 “cooked data” to date, all related to various interconnected research interests that branch out of the history of advertising, such as market expertise, consumer culture, Americanization, the modern press and public opinion in China. It is regularly updated and enriched.
Numerica Sinica
Numerica Sinica is a CNRS initiative, initially supported by INSHS and operated by TGIR Huma-Num, that creates a shared instrument of access to digital resources on Chinese worlds for the benefit of CNRS-affiliated units. The platform now includes a broad range of resources, including those purchased specifically for ANR projects and the ENP-China project.
Bibliothèque Numérique Asiatique
By developing a collection of digital books and a database of web resources in textual and visual form, the Asian Digital Library (BNA) aims to contribute to research and the dissemination of knowledge on East Asia. The BNA project draws largely on resources available in the personal libraries of scholars, on public institutional sites, on individual sites, in document and archival collections, and even on commercial sites. BNA seeks to gather East Asia-related resources that are often scattered across multiple sites with no link to Asia. The objective of BNA is not to duplicate or replace printed documents. It is primarily to provide documentation that is often scarce and difficult to access, mostly held in very distant libraries. In addition, through OCR processing, BNA strives to provide part of the available documents as searchable full text. This work takes time and is applied first to the resources most sought after for research. BNA has no opening hours, as it is available 24 hours a day on the web. There is no need to get in line at the circulation desk and there is no limit to the duration of the loan. The books you download become your books. For research as for teaching, BNA offers the possibility for everyone to access the same documents at the same time and without restriction. To comply with legal standards, the works subject to copyright on the BNA platform are accessible only through authenticated access (members of IrAsia and IAO). All copyright-free resources are naturally available to all without formality. We invite you to make unlimited use of the Asian Digital Library.
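近現代人物資訊整合系統 [Integrated information system on modern and contemporary persons] (Institute of Modern History)
The 近現代人物資訊整合系統 gathers many kinds of biographical materials on modern and contemporary China, including name directories, biographies, tables of officials, archives, and oral histories. These materials are edited, combined, and classified by theme to produce integrated biographical information. The persons recorded in the database span the period from the mid-Qing to the Republican era, and it currently holds more than 135,000 persons. The thematic categories include: 近代春秋TIS人物索引 (Jindai Chunqiu TIS person index), 上海地區人物錄 (directory of persons in the Shanghai area), 中國人物傳記資料 (Chinese biographical materials), 口述史叢書 (oral history series), and 近史所檔案館人名權威檢索系統 (name authority search system of the Institute of Modern History Archives).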
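中華民國政府官職資料庫 [Database of government offices of the Republic of China]
The 中華民國政府官職資料庫 collects important domestic literature related to business management and covers the management articles of all journals nationwide. It is an authoritative Chinese-language business management database with a complete bibliographic index and high-quality abstracts. ...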

Documentation

The Rotary Club [Cécile Armand]
Case 1 – The Rotary Club of China in the press
A Practical Guide to the ‘enpchina’ package: The Rotary Club in the Chinese Press
This guide aims to demonstrate how China historians can take advantage of the “enpchina” package to explore massive corpora of historical newspapers, focusing on a major Chinese newspaper, Shenbao 申報, and a concrete case study, the Rotary Club of Shanghai 上海扶輪社 (Shanghai fulunshe) (Rmd version: https://bookdown.enpchina.eu/Rotary_sb_eng.Rmd).
A Practical Guide to the ‘enpchina’ package: The Rotary Club in the English-language Press
This guide aims to demonstrate how China historians can take advantage of the “enpchina” package to explore massive corpora of historical newspapers, here the ProQuest “Chinese Newspapers Collection”, taking the Rotary Club of Shanghai as a case study (Rmd version: https://bookdown.enpchina.eu/Rotary_pq_eng.Rmd).
Case 2 – American University Men of China
This tutorial series applies a place-based methodology to study Sino-American alumni networks in modern China, based on a directory of the American University Club of Shanghai published in 1936. It is divided into four parts:
  1. Find and analyze places using the R package “Places” (html version, Markdown version).
  2. From places to networks (a dual approach): build, visualize and analyze place-based networks using igraph (html version, Markdown version).
  3. Community detection in place-based networks: identify and analyze subgroups of places with igraph.
  4. Place formation over time: create period-based subnetworks to analyze the formation of academic places between 1883 and 1935.
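One plausible reading of "from places to networks" is a two-mode projection: members are linked to the places they attended, and two places become connected when they share members. The following is a minimal sketch of that idea in plain Python, not the igraph code used in the tutorials. The directory rows and the helper names `project_onto_places` and `strength` are invented for illustration.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical directory rows: (member, university attended). Invented data.
ROWS = [
    ("Chen", "Columbia"),
    ("Chen", "Harvard"),
    ("Li", "Columbia"),
    ("Li", "MIT"),
    ("Wang", "Harvard"),
    ("Wang", "Columbia"),
    ("Zhao", "MIT"),
]


def project_onto_places(rows):
    """Return {(place_a, place_b): number of members shared by both places}."""
    places_by_person = defaultdict(set)
    for person, place in rows:
        places_by_person[person].add(place)

    weights = defaultdict(int)
    for places in places_by_person.values():
        # Every pair of places attended by the same person gains one tie.
        for a, b in combinations(sorted(places), 2):
            weights[(a, b)] += 1
    return dict(weights)


def strength(weights):
    """Sum the tie weights attached to each place (weighted degree)."""
    totals = defaultdict(int)
    for (a, b), w in weights.items():
        totals[a] += w
        totals[b] += w
    return dict(totals)


EDGES = project_onto_places(ROWS)
STRENGTH = strength(EDGES)
```

The resulting weighted edge list is the kind of input that a network library can then visualize or partition into subgroups. Members attached to a single place, such as the invented "Zhao", create no tie between places.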
Case 3 – The Golden Age of the returned students 
This collection of tutorials explores the presence of the returned students in the modern Chinese press. The press corpora include a dozen Chinese newspapers spanning from the mid-19th to the mid-20th century. They are part of the large collections of historical sources that the ENP-China project has acquired and made available in full text for the first time. The potential for exploration is infinite. It may be disturbing too. As humanists trained in the close reading and critical hermeneutics of a limited, human-scale amount of documents, we are poorly equipped for facing this data deluge. Where to start? How to proceed? These tutorials provide some useful tips for turning historians into data-driven humanists. We will experiment with various techniques and methods to handle massive historical corpora and approach modern Chinese history from new perspectives. The purpose of this tutorial series is twofold:
  1. Substantially, to introduce a step change in the history of the returned students and contribute to a new understanding of their role in building a new China after the empire, a much disputed issue in the existing scholarship (Wang, 1966). The corpus-based, data-driven approach we propose will enrich and contextualize the biographical, prosopographical and cultural studies that have prevailed to date.
  2. Methodologically, we aim to:
    • introduce the enpchina R package, a set of tools relying on the R programming language and tailored specifically for exploring massive, multilingual corpora of Chinese sources, along with other R packages we consider useful for historical research;
    • devise on-the-fly yet sustainable solutions for harnessing large collections of historical newspapers;
    • empower historians with various programming skills so that they gain full control over the “datafication” process and escape the black boxes that we inherit from web platforms and off-the-shelf software.
We chose RStudio because it provides an integrated framework for combining a variety of approaches and commanding the complete chain of operations. In R, data-driven historians can conduct the entire research process, from data extraction to the exploration, analysis, interpretation and publishing of their findings and methodology, within a single, unified environment, while ensuring the traceability of the workflow and the replicability of their experiments through sharing the code and emphasizing collaboration. Moreover, it is supported by a large community of users (historians/scholars, data scientists/computing specialists) and it is constantly evolving toward greater integration and accessibility. Altogether, the following tutorials develop a standard workflow that any historian can emulate or transpose to her own research needs. She will be guided step by step from building the corpus to analyzing its textual content, mapping the underlying network of social actors and many other applications.
Corpus building with the enpchina package: in this tutorial, you will learn how to use the enpchina package to build a corpus (i.e. a collection of newspaper articles) from a keyword-based query and to conduct a preliminary exploration of this corpus (Rmd version: https://bookdown.enpchina.eu/Liumei/01_Corpus.rmd).
Text analysis
  1. Text analysis with tidytext: apply basic text analysis techniques to approach the content of articles (tokenisation, word frequency, correlation, co-occurrences) with the package tidytext (Rmd version: https://bookdown.enpchina.eu/Liumei/02_TextAnalysis.Rmd).
  2. Text statistics with quanteda: learn how to create a corpus object to perform more advanced text analyses (frequency, time series) and visualisations (heat maps) with the package quanteda (Rmd version: https://bookdown.enpchina.eu/Liumei/021_TextStats.Rmd).
  3. Keyword extraction with quanteda: learn how to handle multi-word units (e.g. “United States”), extract key terms and compare corpora of varying size using more sophisticated metrics (TF-IDF, log-likelihood ratio test) (Rmd version: https://bookdown.enpchina.eu/Liumei/022_KeyTerm.Rmd).
  4. Text co-occurrences (1): explore relations between words, learn how to find and visualize collocates and to measure their significance (Rmd version: https://bookdown.enpchina.eu/Liumei/023_TextCooc.Rmd).
  5. Text co-occurrences (2): discover alternative ways of visualizing text collocations (Rmd version: https://bookdown.enpchina.eu/Liumei/024_Collocation.Rmd).
  6. Concordancing: to bridge the gap between distant and close reading, learn how to analyze words in their original context and apply regular expressions to refine your research (coming soon).
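Two of the techniques named in this list can be illustrated compactly: comparing corpora of different sizes with a log-likelihood statistic, and concordancing (keyword in context). The sketch below is plain Python, not the tidytext or quanteda code of the tutorials. The helpers `tokenize`, `log_likelihood`, `keywords` and `kwic` are hypothetical names, the statistic is the two-term form commonly used for keyness in corpus linguistics, and the tokenizer only handles English (Chinese text would need a word segmenter first).

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase and keep runs of ASCII letters (English only)."""
    return re.findall(r"[a-z]+", text.lower())


def log_likelihood(a, b, n1, n2):
    """G2 for a word seen a times in corpus 1 (n1 tokens), b times in corpus 2 (n2 tokens)."""
    expected1 = n1 * (a + b) / (n1 + n2)
    expected2 = n2 * (a + b) / (n1 + n2)
    g2 = 0.0
    if a > 0:
        g2 += a * math.log(a / expected1)
    if b > 0:
        g2 += b * math.log(b / expected2)
    return 2 * g2


def keywords(target, reference, top=3):
    """Rank words that are relatively more frequent in target than in reference."""
    t, r = Counter(tokenize(target)), Counter(tokenize(reference))
    n1, n2 = sum(t.values()), sum(r.values())
    scored = []
    for word, a in t.items():
        b = r.get(word, 0)
        if a / n1 > b / n2:  # keep only over-represented words
            scored.append((log_likelihood(a, b, n1, n2), word))
    return [word for _, word in sorted(scored, reverse=True)[:top]]


def kwic(text, keyword, window=3):
    """Return (left context, keyword, right context) for each occurrence."""
    tokens = tokenize(text)
    return [
        (" ".join(tokens[max(0, i - window):i]), tok, " ".join(tokens[i + 1:i + 1 + window]))
        for i, tok in enumerate(tokens)
        if tok == keyword
    ]
```

Because the expected counts are scaled by corpus size, the statistic is zero when a word has the same relative frequency in both corpora, which is what makes it usable for corpora of varying size. The concordance view then brings the reader back to the original context of each key term.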
Sentiment analysis (coming soon)
Topic modeling (coming soon)
Named Entity Recognition (coming soon)
  1. Extraction with the enpchina package
  2. Processing: clean, homogenize and classify named entities with the tidyverse meta-package and OpenRefine R extensions.
  3. Network analysis of persons and organizations
  4. Mapping locations
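The processing and network steps listed above can be sketched as follows. This is a minimal illustration in plain Python, not the enpchina, tidyverse or OpenRefine workflow itself. The alias table, the sample articles and the helper names `normalize` and `cooccurrences` are assumptions made for the example.

```python
import re
import unicodedata
from collections import Counter
from itertools import combinations

# Hypothetical alias table: cleaned lowercase key -> canonical form.
ALIASES = {
    "sun yat sen": "Sun Yat-sen",
    "sun yatsen": "Sun Yat-sen",
    "t v soong": "T. V. Soong",
}

# Invented sample: the person names extracted from three articles.
ARTICLES = [
    ["Sun Yat-sen", "T. V. Soong"],
    ["SUN  Yat-Sen.", "t.v. soong", "Wellington Koo"],
    ["Sun Yatsen"],
]


def normalize(name):
    """Homogenize spelling variants of one name into a single form."""
    name = unicodedata.normalize("NFKC", name)
    name = re.sub(r"[^\w\s]", " ", name)  # punctuation becomes a space
    key = re.sub(r"\s+", " ", name).strip().lower()
    # Fall back to a title-cased form when no alias is known.
    return ALIASES.get(key, key.title())


def cooccurrences(articles):
    """Count how often two persons are named in the same article."""
    pairs = Counter()
    for names in articles:
        people = sorted({normalize(n) for n in names})
        pairs.update(combinations(people, 2))
    return pairs


PAIRS = cooccurrences(ARTICLES)
```

Normalizing before counting matters: without it, "SUN  Yat-Sen." and "Sun Yatsen" would appear as separate nodes and split the ties of a single person. The alias table is the part that requires historical judgment and is usually built by hand.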
Corpus forensics (coming soon)
    • Text classification
    • Text metrics
    • Text features
    • Stylometry
    • Text reuse
Industrialist [Christian Henriot]
This is the first part of a multidirectional exploratory study of Shanghai industrialists in the Shenbao from the mid-19th to the mid-20th century. In this document, we examine the terms through which “industrialists” were named in the press. This essay takes two of the most common terms that designated “industrialists” in the Chinese press in the Republican period: 工業家 and 實業家. While 工業家 is an unambiguous term for “industrialist”, 實業家 can refer to an “entrepreneur” (in other sectors, including banking) as well as an “industrialist”. Our purpose in this script is to extract all the texts that refer to either of these terms and to extract all the named entities (person, organization, location) mentioned in these texts. The second stage of this survey is to link these entities to events to which these terms may be related. The next instalment will explore a wider range of terms associated with “industrialists”.
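The first step described here, selecting every text that mentions either term, can be sketched in a few lines. The example is plain Python rather than the R script of the study, the sample sentences are invented, and `select_articles` and `term_counts` are hypothetical helper names.

```python
import re
from collections import Counter

TERMS = ("工業家", "實業家")
PATTERN = re.compile("|".join(map(re.escape, TERMS)))

# Invented sample texts, for illustration only.
ARTICLES = [
    {"id": "a1", "text": "本埠實業家昨日集會討論國貨問題"},
    {"id": "a2", "text": "著名工業家與實業家聯合發起工廠"},
    {"id": "a3", "text": "今日天氣晴朗"},
]


def select_articles(articles):
    """Keep the articles whose text mentions at least one of the terms."""
    return [a for a in articles if PATTERN.search(a["text"])]


def term_counts(articles):
    """Count the occurrences of each term across the articles."""
    counts = Counter()
    for article in articles:
        counts.update(PATTERN.findall(article["text"]))
    return counts


SELECTED = select_articles(ARTICLES)
COUNTS = term_counts(SELECTED)
```

Counting the two terms separately keeps the ambiguity of 實業家 visible in the results instead of merging it with the narrower 工業家.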
Data Transformation [ENP-China Team]
In processing data extracted from sources, historical or otherwise, we often face the same issues of having to clean, homogenize, group, etc. the data to make it useful for analysis. Data transformation is a tedious task that requires rigor and accuracy. While small data sets can be processed by hand in a spreadsheet, larger data sets can quickly become time consuming, with a greater risk of making mistakes. An R script enables the systematic processing of data without erasing the original data set on which the transformation is being applied. With the Data Transformation script we mean to provide a series of operations for the transformation of data, including messy data, to fit the requirements of clean tabular data. Our purpose is to facilitate the production of data in Chinese studies according to the norms of academia in Western countries, but the script can be extended to other fields of study. We propose here a wide range of examples, from simple text editing to the extraction of data from complex sentences. The examples we use apply to both English (or any Latin script) and Chinese.
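As a rough illustration of what such a transformation involves, the sketch below turns messy one-line entries into clean tabular rows while leaving the original list untouched. It is written in plain Python, not R, and is not the Data Transformation script itself. The sample entries, the `RECORD` pattern and the helpers `clean` and `parse` are invented for the example.

```python
import re

# Invented messy entries mixing Latin script and Chinese.
RAW = [
    "  WANG  Daming (王大明), born 1890 in Ningbo ",
    "Li Xiaohua（李小華）, born 1875 in Suzhou",
    "Unknown entry",
]

# Name, Chinese name in half- or full-width brackets, birth year, birthplace.
RECORD = re.compile(
    r"^(?P<name>[A-Za-z ]+?)\s*[(（](?P<zh>[\u4e00-\u9fff]+)[)）],\s*"
    r"born (?P<year>\d{4}) in (?P<place>[A-Za-z ]+)$"
)


def clean(line):
    """Collapse repeated whitespace and trim; returns a new string."""
    return re.sub(r"\s+", " ", line).strip()


def parse(line):
    """Turn one entry into a row, or None when the pattern does not fit."""
    match = RECORD.match(clean(line))
    if match is None:
        return None  # leave unparsed entries visible rather than guessing
    return {
        "name": match["name"].title(),
        "name_zh": match["zh"],
        "birth_year": int(match["year"]),
        "birthplace": match["place"],
    }


# RAW is never modified: the table is built alongside it.
TABLE = [parse(line) for line in RAW]
```

Entries that do not match come back as `None`, so they can be reviewed by hand instead of being silently dropped or forced into the table.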
MCBD Design Manual [ENP-China Team]
We wrote the design manual of the Modern China Biographical Database to present the nature, the purpose, and the structure of the database. This document is not strictly a “script” for analytical use. We chose this format to write the manual for two main reasons: first, this was a collective exercise in which each of us could write his/her part(s) independently, which were then compiled automatically into a single document; second, we wanted to work with a format that allowed us to update any part at any time and to produce seamlessly a flexible document for the web.