Electronics, Vol. 13, Pages 2949: SSuieBERT: Domain Adaptation Model for Chinese Space Science Text Mining and Information Extraction

Electronics doi: 10.3390/electronics13152949

Authors: Yunfei Liu, Shengyang Li, Yunziwei Deng, Shiyi Hao, Linjie Wang

As space science exploration continues, a large volume of domain-related materials and scientific literature is generated, mostly as text that contains rich and largely unexplored domain knowledge. Natural language processing has developed rapidly, and pre-trained language models provide promising tools for information extraction. However, space science is highly specialized, with many domain concepts and technical terms, and Chinese texts have complex language structures and word combinations, so general pre-trained models such as BERT may perform suboptimally. In this work, we investigate how to adapt BERT to Chinese space science and propose a space science-aware pre-trained language model, SSuieBERT. We validate it on downstream tasks such as named entity recognition, relation extraction, and event extraction, where it outperforms general models. To the best of our knowledge, SSuieBERT is the first pre-trained language model for space science, and it can promote information extraction and knowledge discovery from space science texts.
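The abstract does not detail the adaptation recipe, but domain-adaptive pre-training of a BERT-style model typically means continuing masked-language-model (MLM) training on in-domain text. As an illustrative sketch (not the authors' implementation), the core data-preparation step is BERT's 80/10/10 masking rule; the vocabulary size and `[MASK]` id below are assumptions matching a typical Chinese BERT tokenizer:

```python
import random

def mlm_mask(token_ids, vocab_size, mask_id, rng, p=0.15):
    """BERT-style MLM corruption: select ~15% of tokens, then apply the
    80/10/10 rule. Returns (inputs, labels); labels hold the original id
    at corrupted positions and -100 (the usual ignore index) elsewhere."""
    inputs, labels = [], []
    for tid in token_ids:
        if rng.random() < p:          # token selected for prediction
            labels.append(tid)
            r = rng.random()
            if r < 0.8:               # 80%: replace with [MASK]
                inputs.append(mask_id)
            elif r < 0.9:             # 10%: replace with a random token
                inputs.append(rng.randrange(vocab_size))
            else:                     # 10%: keep the original token
                inputs.append(tid)
        else:                         # token left untouched, no loss
            inputs.append(tid)
            labels.append(-100)
    return inputs, labels

# Example: mask a toy id sequence (21128 and 103 are illustrative values
# for a Chinese BERT vocabulary size and [MASK] id).
rng = random.Random(0)
inputs, labels = mlm_mask(list(range(200)), vocab_size=21128,
                          mask_id=103, rng=rng)
```

Continued pre-training would then minimize cross-entropy over the labeled positions on a space science corpus before fine-tuning on the downstream extraction tasks.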