MIDSURE 2022

Abstract: Interdisciplinary scientists face a distinctive challenge: papers relevant to their research are spread across many journals in different disciplines, which multiplies the number of works they must survey. Nuclear astrophysics is a major interdisciplinary research field at MSU, where scientists must follow 40 journals across astrophysics and nuclear physics. Only a tenth of the publications in these journals are pertinent to nuclear astrophysicists. MSU's Joint Institute for Nuclear Astrophysics - Center for the Evolution of the Elements (JINA-CEE) has addressed this issue by creating its own virtual journal for the community. However, sorting through ~500 papers per week takes valuable time away from JINA-CEE researchers. To make the process more efficient, we have developed a machine learning tool that uses abstracts to automate the identification of papers relevant to the virtual journal. The tool converts the text of a publication's abstract into a numerical representation using term frequency-inverse document frequency (TF-IDF) and then applies a support vector machine (SVM) classifier to determine whether the paper belongs in the virtual journal. We trained the tool on 50,000 papers that JINA-CEE scientists had previously sorted by hand. We then tested it on an independent set of 20,000 papers, where it achieved an accuracy of 83%. The tool sorts a single paper in 18.5 ms, significantly less time than a JINA-CEE scientist needs. We are working with JINA-CEE scientists to integrate the tool into the virtual journal's sorting process by the end of the summer.
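
The two-stage approach described in the abstract (TF-IDF features followed by an SVM) can be illustrated with a small, dependency-free sketch. The abstract does not say which library, kernel, or hyperparameters the tool uses, so everything here is an assumption made for illustration: the linear kernel, the Pegasos-style subgradient training loop, the smoothed IDF formula, the regularization value, and the four toy "abstracts" with their labels are all hypothetical and are not taken from the JINA-CEE data.

```python
import math
from collections import Counter

def tokenize(text):
    # Lowercase and split on anything that is not a letter (a simplifying assumption).
    return "".join(c.lower() if c.isalpha() else " " for c in text).split()

def fit_idf(docs):
    # Smoothed inverse document frequency: log((1 + N) / (1 + df)) + 1.
    n = len(docs)
    df = Counter(term for doc in docs for term in set(tokenize(doc)))
    return {t: math.log((1 + n) / (1 + c)) + 1.0 for t, c in df.items()}

def tfidf(doc, idf):
    # Term counts weighted by IDF, then L2-normalized; unseen terms are dropped.
    counts = Counter(t for t in tokenize(doc) if t in idf)
    vec = {t: c * idf[t] for t, c in counts.items()}
    norm = math.sqrt(sum(v * v for v in vec.values())) or 1.0
    return {t: v / norm for t, v in vec.items()}

def train_linear_svm(X, y, lam=0.01, epochs=20):
    # Subgradient descent on the L2-regularized hinge loss (Pegasos-style),
    # without a bias term. Labels must be +1 (relevant) or -1 (not relevant).
    w, step = {}, 0
    for _ in range(epochs):
        for x, label in zip(X, y):
            step += 1
            eta = 1.0 / (lam * step)          # decaying learning rate
            margin = label * sum(w.get(t, 0.0) * v for t, v in x.items())
            shrink = 1.0 - 1.0 / step         # equals 1 - eta * lam
            w = {t: v * shrink for t, v in w.items()}
            if margin < 1.0:                  # hinge loss is active for this example
                for t, v in x.items():
                    w[t] = w.get(t, 0.0) + eta * label * v
    return w

def predict(doc, idf, w):
    # Sign of the linear decision function; a score of 0 falls to "not relevant".
    x = tfidf(doc, idf)
    score = sum(w.get(t, 0.0) * v for t, v in x.items())
    return 1 if score > 0 else -1

# Hypothetical toy training set: +1 = belongs in the virtual journal, -1 = does not.
docs = [
    "neutron capture rates for r-process nucleosynthesis",
    "weak lensing mass calibration of galaxy clusters",
    "nuclear reaction rates in x-ray bursts",
    "quark gluon plasma viscosity at collider energies",
]
labels = [1, -1, 1, -1]

idf = fit_idf(docs)
w = train_linear_svm([tfidf(d, idf) for d in docs], labels)

print(predict("neutron capture nucleosynthesis rates", idf, w))
print(predict("galaxy lensing calibration", idf, w))
```

On this toy data the two classes share no vocabulary, so the first query is labeled relevant and the second is not. A real deployment would more likely rely on an established library implementation, a held-out test set such as the 20,000 papers mentioned above, and tuned hyperparameters.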