To carry out its research, the ENP-China team has set up a work environment built around a number of core instruments and technologies. Many of them rely on the Huma-Num large-scale research infrastructure (TGI Huma-Num), operated by CNRS (French National Center for Scientific Research), which hosts the instruments and stores the project's resources.
INCEpTION is an open-source platform for text annotation, developed by and based at the UKP Lab at TU Darmstadt. It facilitates the annotation of specific semantic phenomena by bringing all the related tasks together in a single web-based platform.
Padagraph is an open-source platform for collaborative editing, analysis, and three-dimensional exploration of large networks. It was developed in the Scala language by Pierre Magistry and Yannick Chudy. It handles very large networks by letting users navigate through subgraphs constructed on demand, while leaving them free to determine and modify what counts as a node, a property, or a link in their data as their needs and questions evolve. These features suit both the ENP-China team's need to process large quantities of data and the exploratory, experimental nature of the historical study we are conducting.
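To illustrate what navigating by construction of subgraphs can mean in practice, here is a minimal sketch in Python. It does not use Padagraph or reproduce its API: the function `ego_subgraph`, the edge list, and the node labels are all hypothetical and serve only to show how a small neighbourhood around one node can be extracted from a larger network.

```python
from collections import deque


def ego_subgraph(edges, seed, radius=1):
    """Return the nodes and edges lying within `radius` hops of `seed`.

    Hypothetical helper for illustration; not part of Padagraph.
    """
    # Build an undirected adjacency map from the edge list.
    adjacency = {}
    for source, target in edges:
        adjacency.setdefault(source, set()).add(target)
        adjacency.setdefault(target, set()).add(source)

    # Breadth-first search outward from the seed, stopping at `radius` hops.
    distance = {seed: 0}
    queue = deque([seed])
    while queue:
        node = queue.popleft()
        if distance[node] == radius:
            continue
        for neighbour in adjacency.get(node, ()):
            if neighbour not in distance:
                distance[neighbour] = distance[node] + 1
                queue.append(neighbour)

    nodes = set(distance)
    # Keep only the edges whose two endpoints both fall inside the subgraph.
    kept = [(s, t) for s, t in edges if s in nodes and t in nodes]
    return nodes, kept


# Made-up network: letters stand in for persons or institutions.
edges = [("A", "B"), ("B", "C"), ("C", "D"), ("A", "E")]
nodes, kept = ego_subgraph(edges, "A", radius=1)
print(sorted(nodes), kept)
```

Raising `radius` widens the view one hop at a time, so the full network never has to be displayed at once.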
R is a programming language and free software environment for statistical computing and graphics, supported by the R Foundation for Statistical Computing. Its community of users continually enriches it with increasingly sophisticated libraries. While team members with a computing background also use Python for computing and data science, those without one have learned R as a common language for data processing. The team uses an instance of RStudio hosted by Huma-Num. The ENP-China project has developed a dedicated R package to access, search, and process its corpora.
To collect, store, and share the large-scale corpora, data sets, and other documentary resources used in the project, we use the ShareDocs platform on Huma-Num. This guarantees access to them at any time, as well as their long-term preservation for future research. ShareDocs also includes services such as bulk OCR of image and PDF documents.
HEURIST’s research-driven data management system puts the user in charge, allowing them to design, create, manage, analyse and publish their own richly-structured database(s) within hours, through a simple web interface, without the need for programmers or consultants.
PostgreSQL, also known as Postgres, is a free and open-source relational database management system. The ENP-China team uses an instance of Postgres on Huma-Num to support and manage the geospatial data of the Modern China Geospatial Database.