Resources

HistText 1.0

HistText is the application developed in R by the ENP-China team to explore, extract, and process data from the historical digital corpora that the project acquired or produced. Initially, this application was designed as an internal tool (a series of search functions) under the name “enpchina R Library”. It provided ready-made functions to mine data in digital corpora. We plan to update all the scripts that we have made available so far as markdown files, but for those who use our initial package we still maintain the functions in the scripts under the ‘enpchina’ name. Eventually, ‘HistText’ will replace all these mentions.

The HistText application is used by the ENP-China research team on its own corpora, but its field of application is much broader. It can be used on any digital corpus, readily in English, French, and Chinese, and with only light adaptations for other languages. We have released the public version of the code on GitLab. The current version, HistText 1.0, comes with a complete HistText Manual that describes all the functions and provides ready-made example scripts.

We have also developed a public search interface that allows anyone to access and search the publicly accessible corpora (as opposed to corpora under copyright) without having to code in R. If your institution happens to have a license for the copyrighted corpora (ProQuest, Shenbao), please contact us and we will grant you access. The HistText online interface has two access pages: one for the Search and Query functions, one for Named Entity Extraction. The functions of the interface are fully described in the HistText User Guide.

There are two online presentations of the functions of the enpchina package and of HistText.

MCBD Design Manual [ENP-China Team]

We wrote the design manual of the Modern China Biographical Database to present the nature, the purpose, and the structure of the database. This document is not strictly a “script” for analytical use. We chose this format to elaborate this manual for two main reasons: first, this was a collective exercise in which each of us could write his/her part(s) independently, which were then compiled automatically into a single document; second, we wanted to work with a format that allowed us to update any part at any time and to produce seamlessly a flexible document for the web.

Data Transformation [ENP-China Team]

In processing data extracted from sources, historical or otherwise, we often face the same issues of having to clean, homogenize, group, etc. the data to make it useful for analysis. Data transformation is a tedious task that requires rigor and accuracy. While small data sets can be processed by hand in a spreadsheet, larger data sets can quickly become time consuming, with a greater risk of making mistakes. An R script enables the systematic processing of data without erasing the original data set on which the transformation is being applied.

With the Data Transformation script we mean to provide a series of operations for the transformation of data, including messy data, to fit the requirements of clean tabular data. Our purpose is to facilitate the production of data in Chinese studies according to the norms of academia in Western countries, but the script can be extended to other fields of study. We propose here a wide range of examples, from simple text editing to the extraction of data from complex sentences. The examples we use apply to both English (or any Latin script) and Chinese.
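As a rough illustration of this kind of operation, here is a minimal sketch in Python. The project's own script is written in R and is not reproduced here. The sample records, the field names, and the `parse_record` helper are all hypothetical. The sketch extracts a name, a birth year, and a place from semi-structured strings in Latin script and in Chinese, and builds a new table without altering the original records.

```python
import re

# Hypothetical messy records, invented for illustration only.
RAW_RECORDS = [
    "  wang   ming  (b. 1895) ; Shanghai ",
    "LI Hua (b.1902);Beijing",
    "王明（1895年生）；上海",
]

# One pattern for Latin-script entries, one for Chinese entries (assumed layouts).
LATIN = re.compile(
    r"^\s*(?P<name>[^(]+?)\s*\(b\.\s*(?P<year>\d{4})\)\s*;\s*(?P<place>.+?)\s*$"
)
CHINESE = re.compile(r"^\s*(?P<name>[^（]+)（(?P<year>\d{4})年生）；(?P<place>.+?)\s*$")


def normalize_name(name: str) -> str:
    """Collapse repeated whitespace; title-case names written in ASCII."""
    name = " ".join(name.split())
    return name.title() if name.isascii() else name


def parse_record(raw: str) -> dict:
    """Return a clean row, or a row flagged for manual review if nothing matches."""
    for pattern in (LATIN, CHINESE):
        match = pattern.match(raw)
        if match:
            return {
                "name": normalize_name(match["name"]),
                "birth_year": int(match["year"]),
                "place": match["place"],
                "needs_review": False,
            }
    return {"name": raw.strip(), "birth_year": None, "place": None, "needs_review": True}


# Build a new table; RAW_RECORDS itself is never modified.
clean_rows = [parse_record(raw) for raw in RAW_RECORDS]
```

Records that match neither pattern are kept and flagged rather than dropped, so that unusual entries can be checked by hand against the source.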

Industrialist [Christian Henriot]

This is the first part of a multidirectional exploratory study of Shanghai industrialists in the Shenbao from the mid-19th to the mid-20th century. In this document, we examine the terms through which “industrialists” were named in the press. This essay takes two of the most common terms that designated “industrialists” in the Chinese press in the Republican period: 工業家 and 實業家. While 工業家 is an unambiguous term for “industrialist”, 實業家 can refer to an “entrepreneur” (in other sectors, including banking) as well as an “industrialist”. Our purpose in this script is to extract all the texts that refer to either of these terms and all the named entities (person, organization, location) mentioned in these texts. The second stage of this survey is to link these entities to events to which these terms may be related. The next instalment will explore a wider range of terms associated with “industrialists”.

The Rotary Club [Cécile Armand]

Case 1 – The Rotary Club of China in the press

A Practical Guide to the ‘enpchina’ package: The Rotary Club in the Chinese Press: This guide aims to demonstrate how China historians can take advantage of the “enpchina” package to explore massive corpora of historical newspapers, focusing on a major Chinese newspaper – Shenbao 申報 – and a concrete case study – the Rotary Club of Shanghai 上海扶輪社 (Shanghai fulunshe) (Rmd version: https://bookdown.enpchina.eu/Rotary_sb_eng.Rmd).

A Practical Guide to the ‘enpchina’ package: The Rotary Club in the English-language Press: This guide aims to demonstrate how China historians can take advantage of the “enpchina” package to explore massive corpora of historical newspapers – i.e. ProQuest “Chinese Newspapers Collection” – taking the Rotary Club of Shanghai as a case study (Rmd version: https://bookdown.enpchina.eu/Rotary_pq_eng.Rmd).

Case 2 – American University Men of China

This tutorial series applies a place-based methodology to study Sino-American alumni networks in modern China, based on a directory of the American University Club of Shanghai published in 1936. It is divided into four parts:

1. Find and analyze places using the R package “Places” (html version, Markdown version)

2. From places to networks (a dual approach): Build, visualize and analyze place-based networks using igraph (html version, Markdown version).

3. Community detection in place-based networks (igraph): Identify and analyze subgroups of places

4. Place formation over time: Create period-based subnetworks to analyze the formation of academic places between 1883 and 1935
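To give a sense of how a network can be derived from a directory of members, the following Python sketch links two universities whenever at least one person attended both, and weights the link by the number of shared alumni. This is a generic bipartite projection offered as an assumption. It is not the method implemented in the “Places” package, nor the igraph code used in the tutorials, and the attendance data are invented.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical membership data: person -> universities attended (invented).
ATTENDANCE = {
    "person_a": {"Columbia", "Harvard"},
    "person_b": {"Columbia", "Harvard"},
    "person_c": {"Columbia", "Cornell"},
    "person_d": {"Cornell"},
}


def project_to_universities(attendance):
    """Return {(univ1, univ2): number of shared alumni}, with sorted pair keys."""
    weights = defaultdict(int)
    for universities in attendance.values():
        # Every pair of universities attended by the same person gains one unit.
        for pair in combinations(sorted(universities), 2):
            weights[pair] += 1
    return dict(weights)


def degree(edges):
    """Count how many distinct neighbours each university has."""
    neighbours = defaultdict(set)
    for left, right in edges:
        neighbours[left].add(right)
        neighbours[right].add(left)
    return {node: len(linked) for node, linked in neighbours.items()}


edges = project_to_universities(ATTENDANCE)
degrees = degree(edges)
```

People who attended a single university contribute no link in this projection, which is one reason a dedicated place-based approach may be preferable for real data.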

Case 3 – The Golden Age of the returned students

This collection of tutorials explores the presence of the returned students in the modern Chinese press. The press corpora include a dozen Chinese newspapers spanning the mid-19th to the mid-20th century. They are part of the large collections of historical sources that the ENP-China project has acquired and made available in full text for the first time. The potential for exploration is infinite. It may be disturbing too. As humanists trained in the close reading and critical hermeneutics of a limited, human-scale number of documents, we are poorly equipped to face this data deluge. Where to start? How to proceed? These tutorials provide some useful tips for turning historians into data-driven humanists. We will experiment with various techniques and methods to handle massive historical corpora and approach modern Chinese history from new perspectives.

The purpose of this tutorial series is twofold:

Substantively, to introduce a step change in the history of the returned students and contribute to a new understanding of their role in building a new China after the empire, a much disputed issue in the existing scholarship (Wang, 1966). The corpus-based, data-driven approach we propose will enrich and contextualize the biographical, prosopographical and cultural studies that have prevailed to date.

Methodologically, we aim to:

- introduce the enpchina R package (a set of tools relying on the R programming language and tailored specifically for exploring massive, multilingual corpora of Chinese sources) and other R packages we consider useful for historical research;
- devise on-the-fly yet sustainable solutions for harnessing large collections of historical newspapers;
- empower historians with various programming skills so that they gain full control over the “datafication” process and escape the black boxes that we inherit from web platforms and off-the-shelf software.

We chose RStudio because it provides an integrated framework for combining a variety of approaches and commanding the complete chain of operations. In R, data-driven historians can conduct the entire research process (from data extraction to the exploration, analysis, interpretation and publishing of their findings and methodology) within a single, unified environment, while ensuring the traceability of the workflow and the replicability of their experiments through shared code and collaboration. Moreover, it is supported by a large community of users (historians and other scholars, data scientists and computing specialists) and it is constantly evolving toward greater integration and accessibility. Altogether, the following tutorials develop a standard workflow that any historian can emulate or transpose to her own research needs. She will be guided step by step from building the corpus to analyzing its textual content, mapping the underlying network of social actors, and many other applications.

Corpus building with the enpchina package: in this tutorial, you will learn how to use the enpchina package to build a corpus (i.e. a collection of newspaper articles) from a keyword-based query and to conduct a preliminary exploration of this corpus (Rmd version: https://bookdown.enpchina.eu/Liumei/01_Corpus.rmd).

Text analysis

Text analysis with tidytext: apply basic text analysis techniques to approach the content of articles (tokenisation, word frequency, correlation, co-occurrences) with the package tidytext (Rmd version: https://bookdown.enpchina.eu/Liumei/02_TextAnalysis.Rmd).
Text statistics with quanteda: learn how to create a corpus object to perform more advanced text analyses (frequency, time series) and visualisations (heat maps) with the package quanteda (Rmd version: https://bookdown.enpchina.eu/Liumei/021_TextStats.Rmd).
Keyword extraction with quanteda: learn how to handle multi-word units (e.g. “United States”), extract key terms and compare corpora of varying size using more sophisticated metrics (TF-IDF, log-likelihood ratio test) (Rmd version: https://bookdown.enpchina.eu/Liumei/022_KeyTerm.Rmd).
Text co-occurrences (1): explore relations between words, learn how to find and visualize collocates and to measure their significance (Rmd version: https://bookdown.enpchina.eu/Liumei/023_TextCooc.Rmd).
Text co-occurrences (2): discover alternative ways of visualizing text collocations (Rmd version: https://bookdown.enpchina.eu/Liumei/024_Collocation.Rmd).
Concordancing: to bridge the gap between distant and close reading, learn how to analyze words in their original context and apply regular expressions to refine your research (coming soon).
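
To make the idea of comparing corpora of different sizes more concrete, here is a minimal Python sketch of a log-likelihood keyness score. It is not the quanteda API and not the code of the tutorials, which are written in R. The function names, the naive tokenizer, and the two sample texts are assumptions made for illustration. The score uses the common two-cell form of the log-likelihood statistic, which compares observed frequencies with the frequencies expected if the term were spread evenly across both corpora.

```python
import math
import re
from collections import Counter

# Hypothetical miniature corpora, invented for illustration only.
TARGET = "students returned from america students returned from japan"
REFERENCE = "merchants traded in shanghai merchants returned from canton market"


def tokenize(text):
    """Lowercase and keep runs of ASCII letters; Chinese text would need a segmenter."""
    return re.findall(r"[a-z]+", text.lower())


def log_likelihood(freq_target, freq_reference, size_target, size_reference):
    """Two-cell log-likelihood (G2) of a term in a target vs. a reference corpus."""
    total = freq_target + freq_reference
    combined_size = size_target + size_reference
    expected_target = size_target * total / combined_size
    expected_reference = size_reference * total / combined_size
    score = 0.0
    for observed, expected in (
        (freq_target, expected_target),
        (freq_reference, expected_reference),
    ):
        if observed > 0:  # a cell with zero observations contributes nothing
            score += observed * math.log(observed / expected)
    return 2 * score


def keyness(target_text, reference_text):
    """Rank terms that are relatively more frequent in the target corpus."""
    target = Counter(tokenize(target_text))
    reference = Counter(tokenize(reference_text))
    n_target, n_reference = sum(target.values()), sum(reference.values())
    scores = {}
    for term, count in target.items():
        # G2 is unsigned, so keep only terms over-represented in the target.
        if count / n_target > reference[term] / n_reference:
            scores[term] = log_likelihood(count, reference[term], n_target, n_reference)
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


ranked = keyness(TARGET, REFERENCE)
```

Because the expected frequencies are scaled by corpus size, the score remains comparable when one corpus is much larger than the other, which is the situation the keyword extraction tutorial addresses.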

Sentiment analysis (coming soon)

Topic modeling (coming soon)

Named Entity Recognition (coming soon)

Extraction with the enpchina package
Processing: clean, homogenize and classify named entities with the tidyverse meta-package and OpenRefine R extensions.
Network analysis of persons and organizations
Mapping locations

Corpus forensics (coming soon)

- Text classification
- Text metrics
- Text features
- Stylometry
- Text reuse

X-Boorman

X-Boorman presents an enhanced digital version of the Biographical Dictionary of Republican China, edited by Howard L. Boorman in 1967-1971. The dictionary is no longer in print and is hardly used in modern Chinese historical studies. It lost much of its luster and relevance with the emergence of internet resources. Yet this is a work based on meticulous research, and the information it provides remains relevant today. X-Boorman proposes an exploration of this work beyond the individual biographies, through network analysis, mapping, and graphs, with an interface that incorporates pinyin and Chinese.

Publications

The ENP-China project is developing collections of publications in the form of pamphlets and podcasts on the PEERS platform. PEERS is a non-profit organization created by researchers for researchers, who can publish works directly with integrated data and operational code. This makes it possible for everyone to contribute to data, to explore data and to re-analyze data. The ENP-China team publishes intermediate research results, methodological papers, and also full papers in its series Digital History Lab.

BN-Asie

The ENP-China collection on BN-Asie (Asia Digital Library) makes available all the source materials that researchers have used in the course of their research. Generally, it includes high-resolution digital versions, except when we were not able to get one. We processed all these files to turn them into searchable PDF files (OCR). While we establish the collection, all the deposited materials can be found by searching “enpc” in the keyword field of the search engine.

Bibliography

The ENP-China project maintains a bibliography of all the works (source materials as well as academic literature) that are relevant to the study of elites as a historical object. It is organised in six sections: Biographical databases [any database with historical biographical data], Elites_China (Eng) [academic works on elites in China in English and other Western languages], Elites_China (Zh) [academic works on elites in China in Chinese], Elites Foreigners [academic works on foreign elites and foreigners in China in English and other Western languages], Elites History [academic works on elites outside of China in English and other Western languages], Source Books [all the source materials used in the ENP-China project].

Data Repository

All the data sets produced in the course of the ENP-China project are made available on the ENP-China Data Repository on Zenodo, the long-term general-purpose open-access repository developed under the European OpenAIRE program and operated by CERN. It allows researchers to deposit data sets, research software, reports, and any other research-related digital artifacts. For each submission, a persistent digital object identifier (DOI) is minted, which makes the stored items easily citable.

Virtual Shanghai

Virtual Shanghai is a research and resource platform on the history of Shanghai from the mid-nineteenth century to the present. It incorporates various sets of documents: essays, original documents, photographs, maps, quantitative data, etc. The objective of the project is to write a history of the city through the combined mobilization of these various types of documents. The implementation of this approach relies on the use of digital and GIS technologies. On the research side, the platform offers various ways to step into the history of the city and follow its course at different levels over time. On the resource side, apart from providing original textual and visual documents, it develops a powerful cartographic tool for spatial analysis and real-time mapping (to be upgraded soon). The authors of the present project subscribe to the idea of sharing scholarship and research tools for the benefit of scholars, students, and citizens at large.

Virtual Beijing

Beijing has attracted the interest of historians of modern China only in the last decade. If we set aside the spate of books generated by the Olympic Games, few studies have addressed the social history of Beijing and, among them, few have actually devoted much attention to the spatial dimension as part of the historical analysis. Through the Virtual Beijing digital platform, we plan to explore further the social history of Beijing and to address its historical trajectory through case studies based on textual, visual, and spatial data.

Virtual Wenzhou

Virtual Wenzhou is a platform for research and resources on the history of Wenzhou from the late imperial period (circa 1300-1911) to the present. It collects and assembles a wide array of sources, including photographs, old maps, archival documents, and analytical data. The aim of this project is to present a thorough collection of sources pertaining to the history and culture of Wenzhou, a port city in Southeast China that developed distinct cultures and languages. Aiming to enhance the understanding of Wenzhou

MADSpace

MADSpace (Mapping Advertising Space) was born in 2016 as a digital companion to a PhD dissertation devoted to a spatial history of advertising in modern China. It was designed to store, organize and connect primary sources or raw data (archives, printed materials, maps, photos), analytical materials or cooked data (graphs, maps, trees, tables, timelines), multimedia narratives (dissertation, published papers, unpublished essays), bibliographical references and other resources. It also includes a relational database of some 2,000 historical actors involved in the advertising industry (companies, branded products). Since then, it has expanded beyond its initial purpose to include over 950 archival documents, 1,500 printed materials, 1,000 images, and more than 700 “cooked data” to date, all related to various interconnected research interests that branch out of the history of advertising, such as market expertise, consumer culture, Americanization, the modern press and public opinion in China. It is regularly updated and enriched.

https://madspace.org/

Numerica Sinica

Numerica Sinica is a CNRS initiative initially supported by INSHS and operated by TGIR Huma-Num in order to create a shared instrument of access to digital resources on Chinese worlds for the benefit of CNRS-affiliated units. The platform now includes a broad range of resources, including those purchased specifically for ANR projects and the ENP-China project.

近現代人物資訊整合系統 (Institute of Modern History)

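The 近現代人物資訊整合系統 (integrated information system on modern and contemporary persons) gathers many kinds of biographical material on modern and contemporary China, including name directories, biographies, tables of officials, archives, and oral histories. These materials are edited, combined, and classified by theme to form integrated biographical information. The persons included range from the mid-Qing to the Republican period, and the database currently holds more than 135,000 individuals. The thematic categories include: 近代春秋TIS人物索引 (Jindai Chunqiu TIS person index), 上海地區人物錄 (directory of persons of the Shanghai area), 中國人物傳記資料 (Chinese biographical materials), 口述史叢書 (oral history series), and 近史所檔案館人名權威檢索系統 (name authority search system of the Institute of Modern History Archives).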

Elites, Networks and Power in Modern China

© 2026 All rights reserved ENP China
