This page documents our research processes and methods. One of our main tools is RStudio, which we use for much of our data mining on corpora and for processing and analyzing the collected data. As part of the workflow, we produce R scripts that we share as markdown files. Our aim is to make the research process transparent and reproducible, and to make these scripts available to other scholars for their own use and adaptation.
The documentation is organized by topic, each reflecting an ongoing research case study.
Data transformation: We have prepared this script to help with data transformation in R. It is a series of mini-scripts for routine data transformation operations in historical research, including operations specific to Chinese. It covers a wide range of tasks, from basic editing (such as capitalizing words) to splitting data with regular expressions (regex).
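As a rough illustration of the two kinds of operation mentioned above (capitalizing words and splitting data on a regex), here is a minimal sketch. It is written in Python for illustration only and is not taken from the project's R scripts; the function names `capitalize_words` and `split_on_pattern`, and the default separator pattern, are assumptions made for this example.

```python
import re


def capitalize_words(text):
    """Capitalize the first letter of each whitespace-separated word.

    Hypothetical helper: the rest of each word is left unchanged, and runs
    of whitespace are collapsed to single spaces.
    """
    return " ".join(word[:1].upper() + word[1:] for word in text.split())


def split_on_pattern(text, pattern=r"\s*[;,、，；]\s*"):
    """Split a string on a regex and drop empty pieces.

    The default pattern is an assumption: it treats Western and Chinese
    commas and semicolons (plus the enumeration comma 、) as separators.
    """
    return [piece for piece in re.split(pattern, text.strip()) if piece]


print(capitalize_words("shanghai chamber of commerce"))
print(split_on_pattern("張三、李四；王五"))
```

The separator pattern would need adjusting to the punctuation actually found in a given corpus.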
« Industrialist » in the Shenbao: This is a study based on the common terms that designated « industrialists » in the Chinese press: 工業家 and 實業家. My approach is to extract all the texts (newspaper articles) that refer to either of these terms and then to extract all the Named Entities (NEs) mentioned in these texts. My purpose is to identify the actors and organizations mentioned in relation to the term « industrialist », the role or position of these actors, and, where relevant, their relation to events.
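The first step of this approach, selecting the articles that mention either term, can be sketched as follows. This is an illustrative Python sketch rather than the study's own R script. The record layout (dictionaries with `id` and `text` keys), the function name `find_matching_articles`, and the sample strings are all invented for the example. The Named Entity extraction step is left out because it depends on the NER model used.

```python
# The two terms named in the study.
TERMS = ("工業家", "實業家")


def find_matching_articles(articles, terms=TERMS):
    """Return the id and matched terms of every article containing a term.

    `articles` is assumed to be an iterable of dicts with "id" and "text"
    keys; this layout is hypothetical, not the project's actual data format.
    """
    matches = []
    for article in articles:
        found = [term for term in terms if term in article["text"]]
        if found:
            matches.append({"id": article["id"], "terms": found})
    return matches


# Invented placeholder records, not real Shenbao articles.
sample = [
    {"id": "a1", "text": "某實業家發起新廠"},
    {"id": "a2", "text": "本埠天氣晴朗"},
    {"id": "a3", "text": "工業家與實業家集會"},
]

for match in find_matching_articles(sample):
    print(match["id"], match["terms"])
```

Plain substring matching is used here because both terms are fixed strings. A regex would only be needed if variant spellings had to be caught as well.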