
Machine learning approaches for cyanobacteria bloom prediction using metagenomic sequence data, a case study

Huang, JianDong, Zheng, Huiru, Wang, Haiying / HY and Jiang, Xingpeng (2017) Machine learning approaches for cyanobacteria bloom prediction using metagenomic sequence data, a case study. In: 2017 IEEE International Conference on Bioinformatics and Biomedicine, Kansas City, MO, USA. IEEE. 8 pp. [Conference contribution]


URL: http://dx.doi.org/10.1109/BIBM.2017.8217977

DOI: 10.1109/BIBM.2017.8217977


Cyanobacteria blooms are a serious public health threat and a global challenge. The literature on bloom prediction and forecasting has been growing, with the emphasis mostly on the relation between blooms and environmental factors, but the complexity of the bloom mechanism makes it difficult for such models to produce adequate output. The rapid development of next-generation sequencing techniques allows comprehensive and quick examination of the microbial community, especially the bloom community structure. This makes it possible to predict and forecast bloom occurrence from sequence data alone using machine learning techniques, yet few reports on this theme exist in the literature. In this case study, machine learning approaches were applied with metagenomic data as the only input (rather than environmental data) to predict Cyanobacteria blooms. k-NN classification, SVM classification and k-means clustering were applied and their efficiencies were evaluated using relevant indices. Feature selection was performed and the resulting sub-datasets were processed in turn. In the prediction experiment with the k-NN approach, the final year's data from the 8-year OTU time series were used as target data and various combinations of the preceding years' data were used as predictor data. The best results among the experiments, an F1 score of 1.00 and 100% sensitivity, specificity, precision, and accuracy, were obtained with all 7 preceding years as predictor input. This case study demonstrated the feasibility of using machine learning approaches for Cyanobacteria bloom prediction with only metagenomic sequence data, and the importance of feature selection in obtaining better output from the machine learning approaches.
Machine learning approaches based on metagenomic data are efficient, economical, and fast, and have the potential to be adopted as a promising means of bloom prediction in practice.
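
The k-NN experiment described above (earlier years as predictor data, the final year as target data, scored with F1, sensitivity, specificity, precision, and accuracy) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the function names `knn_predict` and `evaluation_indices`, the Euclidean distance, `k=3`, and the toy two-feature "OTU abundance" samples. It does not reproduce the study's data or results.

```python
from collections import Counter
from math import dist

def knn_predict(train_X, train_y, x, k=3):
    """Label x by majority vote among its k nearest training samples.

    Euclidean distance and k=3 are assumptions; the abstract does not
    state which distance measure or k the authors used.
    """
    nearest = sorted(range(len(train_X)), key=lambda i: dist(train_X[i], x))[:k]
    votes = Counter(train_y[i] for i in nearest)
    return votes.most_common(1)[0][0]

def evaluation_indices(y_true, y_pred, positive=1):
    """Compute the five indices named in the abstract from binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)

    def ratio(num, den):
        # Return 0.0 instead of raising when a class is absent.
        return num / den if den else 0.0

    precision = ratio(tp, tp + fp)
    sensitivity = ratio(tp, tp + fn)   # also called recall
    specificity = ratio(tn, tn + fp)
    accuracy = ratio(tp + tn, len(pairs))
    f1 = ratio(2 * precision * sensitivity, precision + sensitivity)
    return {
        "precision": precision,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "accuracy": accuracy,
        "f1": f1,
    }

# Hypothetical toy samples: (year, (two OTU relative abundances), bloom label).
samples = [
    (1, (0.10, 0.60), 0), (1, (0.70, 0.20), 1),
    (2, (0.15, 0.55), 0), (2, (0.65, 0.25), 1),
    (3, (0.12, 0.58), 0), (3, (0.72, 0.18), 1),
    (4, (0.08, 0.62), 0), (4, (0.68, 0.22), 1),
]

# Temporal split: the final year is the target, all earlier years are predictors.
final_year = max(year for year, _, _ in samples)
train = [(x, y) for year, x, y in samples if year < final_year]
test = [(x, y) for year, x, y in samples if year == final_year]

train_X, train_y = [x for x, _ in train], [y for _, y in train]
test_y = [y for _, y in test]
predictions = [knn_predict(train_X, train_y, x) for x, _ in test]

print(evaluation_indices(test_y, predictions))
```

Splitting by year rather than at random keeps the evaluation closer to a forecasting setting, since the classifier only ever sees samples that precede the target year. Restricting `train` to a subset of the earlier years would correspond to the "various combinations of the preceding years' data" mentioned in the abstract.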

Item Type: Conference contribution (Paper)
Keywords: Machine Learning; Cyanobacteria blooms; OTU (Operational Taxonomic Unit)
Faculties and Schools: Faculty of Computing & Engineering
Faculty of Computing & Engineering > School of Computing and Mathematics
Faculty of Life and Health Sciences > School of Geography and Environmental Sciences
Faculty of Life and Health Sciences
Research Institutes and Groups: Computer Science Research Institute > Smart Environments
Computer Science Research Institute
Computer Science Research Institute > Artificial Intelligence and Applications
Environmental Sciences Research Institute > Coastal Systems
ID Code: 39630
Deposited By: Dr Huiru Zheng
Deposited On: 21 Apr 2018 12:39
Last Modified: 21 Apr 2018 12:39
