Linguistic complexity in scientific writing: A large-scale diachronic study from 1821 to 1920

Published in Scientometrics (SSCI), 2022

Abstract: This study describes diachronic changes in linguistic complexity (i.e., overall, morphological, and syntactic complexity) in scientific writing using Kolmogorov complexity, an information-theoretic approach. We analyzed the entire dataset (i.e., all 24 text types, including articles, letters, and news) as well as two individual registers (i.e., the full texts and abstracts of articles) of Philosophical Transactions of the Royal Society of London, the world’s oldest scientific journal. Mann-Kendall trend tests were used to capture diachronic changes at the three complexity levels, and Pearson correlation coefficients were calculated to investigate the relationships among the three complexity metrics. Results showed that overall and morphological complexity increased from 1821 to 1920 in both the entire dataset and the full texts, indicating massive lexical expansion during this 100-year period, as evidenced by an increasing number of word-form variants in scientific writing. In contrast, syntactic complexity declined in the entire dataset and the full texts, suggesting a gradual shift towards grammatical simplification in scientific writing, particularly in word-order rules and syntactic patterns. A trade-off between syntactic and morphological complexity was also found in the entire dataset. Abstracts, however, followed the opposite pattern: overall and morphological complexity decreased while syntactic complexity increased. These results can help researchers better understand how linguistic complexity styles in scientific writing have changed and adjust their own writing accordingly to garner greater attention in academia.
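Kolmogorov complexity is uncomputable in general, so studies of this kind approximate it with an off-the-shelf compressor. The Python sketch below illustrates one compression-based formulation of the kind associated with Ehret (2017): overall complexity is taken as the compressed size c of a text, morphological complexity as -m/c where m is the compressed size after randomly deleting characters, and syntactic complexity as s/c where s is the compressed size after randomly deleting word tokens. The compressor (gzip), the 10% deletion rate, the iteration count, and all function names are illustrative assumptions, not the settings used in the paper.

```python
import gzip
import random


def compressed_size(text: str) -> int:
    """Approximate Kolmogorov complexity by the gzip-compressed size in bytes."""
    return len(gzip.compress(text.encode("utf-8")))


def distort_morphology(text: str, rate: float, rng: random.Random) -> str:
    """Randomly delete roughly `rate` of the non-whitespace characters."""
    return "".join(ch for ch in text if ch.isspace() or rng.random() >= rate)


def distort_syntax(text: str, rate: float, rng: random.Random) -> str:
    """Randomly delete roughly `rate` of the whitespace-separated word tokens."""
    return " ".join(w for w in text.split() if rng.random() >= rate)


def complexity_scores(text: str, rate: float = 0.1, n_iter: int = 10, seed: int = 0) -> dict:
    """Return overall, morphological, and syntactic scores (illustrative settings only)."""
    rng = random.Random(seed)
    c = compressed_size(text)
    # Average over several random distortions to damp sampling noise.
    m = sum(compressed_size(distort_morphology(text, rate, rng)) for _ in range(n_iter)) / n_iter
    s = sum(compressed_size(distort_syntax(text, rate, rng)) for _ in range(n_iter)) / n_iter
    return {"overall": c, "morphological": -m / c, "syntactic": s / c}


if __name__ == "__main__":
    sample = "The specimen was examined under the microscope and measured carefully. " * 50
    print(complexity_scores(sample))
```

Because gzip output includes a fixed header and depends on text length, scores from this sketch are only comparable across samples of similar size; no length normalization is attempted here.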

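The abstract pairs Mann-Kendall trend tests on each diachronic series with Pearson correlations among the three complexity metrics. A dependency-free Python sketch of both is given below, assuming one score per time slice (for example, per year). It uses the standard normal approximation with tie correction; the function names are hypothetical and this is not the authors' analysis code.

```python
from __future__ import annotations

import math
from collections import Counter
from typing import Sequence


def mann_kendall(x: Sequence[float]) -> tuple[int, float, float]:
    """Mann-Kendall trend test: return (S, Z, two-sided p) via the normal approximation."""
    n = len(x)
    if n < 3:
        raise ValueError("need at least three observations")
    # S counts increasing minus decreasing pairs over all i < j.
    s = 0
    for i in range(n - 1):
        for j in range(i + 1, n):
            diff = x[j] - x[i]
            s += (diff > 0) - (diff < 0)
    # Variance of S, corrected for groups of tied values.
    ties = Counter(x).values()
    var_s = (n * (n - 1) * (2 * n + 5)
             - sum(t * (t - 1) * (2 * t + 5) for t in ties)) / 18
    # Continuity-corrected standardized statistic.
    if s > 0:
        z = (s - 1) / math.sqrt(var_s)
    elif s < 0:
        z = (s + 1) / math.sqrt(var_s)
    else:
        z = 0.0
    p = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return s, z, p


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length series."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length series with at least two points")
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)
```

Applied to a chronologically ordered series of scores, a positive Z with a small p suggests a monotonic increase and a negative Z a monotonic decrease. For the correlation step, SciPy's `scipy.stats.pearsonr` is a standard alternative that also reports a p-value.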

Contribution: Li Wang conceptualized the study. Gui Wang wrote the Introduction, Results, and Discussion sections. Hui Wang wrote the Methodology section. Nan Wang and Xinyi Sun wrote the Literature Review section. Gui Wang and Hui Wang were both actively involved in developing the Discussion section. Gui Wang outlined the methodological workflow and was responsible for the primary statistical analysis. Xinyi Sun was involved in early data collection and organization. Hui Wang played a significant role in refining the study’s design. Li Wang, Hui Wang, and Gui Wang all reviewed, revised, and contributed to finalizing the manuscript, and Li Wang provided constructive feedback during its finalization. We thank Ehret (2017) for generously providing open resources for calculating Kolmogorov complexity.


Recommended citation: Wang, G., Wang, H., Sun, X., Wang, N., & Wang, L. (2022). Linguistic complexity in scientific writing: A large-scale diachronic study from 1821 to 1920. Scientometrics, 128(1), 441–460. https://doi.org/10.1007/s11192-022-04550-z