Abstract: This paper presents a comprehensive methodology for extracting and processing data from the scientific literature to improve the performance of generative language models in the case of the ...