Abstract
Objectives Current clinical knowledge synthesis strategies are not robust and lack long-term digital curation planning. New curation strategies that leverage data mining and text analysis are therefore essential to cope with the deluge of clinical research evidence, by ensuring a well-organized, ontologically mapped schema for generating research evidence. This project has the following objectives:
Develop, train and implement advanced machine learning methods to automate knowledge extraction, summarization and appraisal of biomedical literature.
Retrain BioBERT using predefined appraisal metrics for accurate question-answering (Q&A)-based knowledge extraction and summarization of biomedical literature.
Evaluate the performance of the retrained BioBERT model using ad hoc expert methods.
Deploy and train the ML model, and assess its performance using the predefined appraisal metrics.
Method We adopted BioBERT, a state-of-the-art pretrained deep learning NLP model, together with the BioASQ datasets. Based on domain expert knowledge, we developed critical appraisal metrics for the BioBERT model. We evaluated the trained model and fine-tuned it based on its performance. Model training and evaluation were performed using TensorFlow, a Python-based deep neural network library, hosted on AWS.
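The abstract does not describe how answers are pulled out of a passage during Q&A-based extraction. BERT-style models fine-tuned for extractive question answering typically output a start logit and an end logit for every token, and the answer is the span that maximizes their sum. The sketch below illustrates only that decoding step, assuming such per-token logits are already available; the function name `best_span`, the `max_answer_len` limit and the example tokens and logits are hypothetical and not taken from the authors' pipeline.

```python
from typing import List, Tuple


def best_span(
    start_logits: List[float],
    end_logits: List[float],
    max_answer_len: int = 30,  # hypothetical cap on answer length, in tokens
) -> Tuple[int, int, float]:
    """Return (start, end, score) maximizing start_logits[i] + end_logits[j],
    subject to i <= j < i + max_answer_len (end index is inclusive)."""
    if len(start_logits) != len(end_logits):
        raise ValueError("start and end logits must have the same length")
    n = len(start_logits)
    best = (0, 0, float("-inf"))
    for i in range(n):
        # Only consider end positions at or after the start, within the length cap.
        for j in range(i, min(n, i + max_answer_len)):
            score = start_logits[i] + end_logits[j]
            if score > best[2]:
                best = (i, j, score)
    return best


# Illustrative (made-up) tokens and logits for a single passage.
tokens = ["metformin", "reduces", "hepatic", "glucose", "production"]
start = [0.1, 0.2, 2.5, 0.3, 0.1]
end = [0.0, 0.1, 0.4, 0.5, 3.0]
i, j, score = best_span(start, end)
print(" ".join(tokens[i:j + 1]))  # hepatic glucose production
```

The exhaustive double loop keeps the logic easy to check; production decoders usually restrict the search to the top-k start and end positions and ignore tokens that belong to the question.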
Results We assessed the performance of the ML model with ROUGE (Recall-Oriented Understudy for Gisting Evaluation), a set of metrics for evaluating automatic summarization and machine translation, by calculating precision, recall and F-score between system and reference summaries. Preliminary results validated the model's appraisals and summaries of publications against expert-annotated summaries. Our BioBERT model, trained on the BioASQ corpus, provides a validated framework for automating literature mining and appraisal with performance comparable to that of human experts.
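As a concrete illustration of how precision, recall and F-score are obtained between a system summary and a reference summary, here is a minimal ROUGE-N sketch. The abstract does not state which ROUGE variant or tokenizer was used, so this assumes ROUGE-1 by default, lowercase whitespace tokenization and no stemming; the helper names `ngrams` and `rouge_n` are hypothetical. Official ROUGE implementations tokenize and stem differently, so scores from this sketch will not match them exactly.

```python
from collections import Counter
from typing import Dict, List


def ngrams(tokens: List[str], n: int) -> Counter:
    """Count the n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n(system: str, reference: str, n: int = 1) -> Dict[str, float]:
    """ROUGE-N precision, recall and F1 under simple whitespace tokenization."""
    sys_counts = ngrams(system.lower().split(), n)
    ref_counts = ngrams(reference.lower().split(), n)
    # Counter '&' keeps the minimum count per n-gram, i.e. clipped overlap.
    overlap = sum((sys_counts & ref_counts).values())
    sys_total = sum(sys_counts.values())
    ref_total = sum(ref_counts.values())
    precision = overlap / sys_total if sys_total else 0.0  # overlap / system n-grams
    recall = overlap / ref_total if ref_total else 0.0     # overlap / reference n-grams
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"precision": precision, "recall": recall, "f": f1}


scores = rouge_n(
    "the drug reduced mortality",
    "the drug significantly reduced mortality in adults",
)
# precision = 1.0, recall = 4/7 (about 0.571), F = 8/11 (about 0.727)
```

Here all four system unigrams appear in the reference (precision 1.0), but the system covers only four of the seven reference unigrams, which pulls recall and therefore the F-score down.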
Conclusions We adopted, trained and evaluated an NLP-based machine learning model using pretrained BioBERT and the BioASQ datasets. Our findings support the potential use of this model to automate literature summarization and appraisal, addressing the ever-growing overload of biomedical publications. We plan to further evaluate and improve the model's performance toward real-time knowledge synthesis.