Machine learning for the automatic detection of anomalous events
dc.contributor.advisor | Camp, Tracy | |
dc.contributor.author | Fisher, Wendy D. | |
dc.date.accessioned | 2017-05-19T21:21:41Z | |
dc.date.accessioned | 2022-02-03T13:00:52Z | |
dc.date.available | 2017-05-19T21:21:41Z | |
dc.date.available | 2022-02-03T13:00:52Z | |
dc.date.issued | 2017 | |
dc.identifier | T 8243 | |
dc.identifier.uri | https://hdl.handle.net/11124/170968 | |
dc.description | Includes bibliographical references. | |
dc.description | 2017 Spring. | |
dc.description.abstract | In this dissertation, we describe our research contributions for a novel approach to the application of machine learning for the automatic detection of anomalous events. We work in two different domains to ensure a robust data-driven workflow that could be generalized for monitoring other systems. Specifically, in our first domain, we begin with the identification of internal erosion events in earth dams and levees (EDLs) using geophysical data collected from sensors located on the surface of the levee. As EDLs across the globe reach the end of their design lives, effectively monitoring their structural integrity is of critical importance. The second domain of interest is related to mobile telecommunications, where we investigate a system for automatically detecting non-commercial base station routers (BSRs) operating in protected frequency space. The presence of non-commercial BSRs can disrupt the connectivity of end users, cause service issues for the commercial providers, and introduce significant security concerns. We provide our motivation, experimentation, and results from investigating a generalized novel data-driven workflow using several machine learning techniques. In Chapter 2, we present results from our performance study that uses popular unsupervised clustering algorithms to gain insight into our real-world problems, and evaluate our results using internal and external validation techniques. Using EDL passive seismic data from an experimental laboratory earth embankment, results consistently show a clear separation of events from non-events in four of the five clustering algorithms applied.
The results from experimenting with our BSR data, using various system information (SI) and system information blocks (SIBs), show we can make a clear distinction between commercial and non-commercial scans in both Universal Mobile Telecommunications System (UMTS) and Long Term Evolution (LTE); more work is needed to understand whether non-commercial BSRs can be discovered in the Global System for Mobile Communications (GSM) analysis. We also investigate and provide results on using ASN.1 encoded LTE data as input to our machine learning algorithms; we use encoded data to eliminate the need for extensive feature selection and manual analysis that could potentially introduce bias. Chapter 3 uses a multivariate Gaussian machine learning model to identify anomalies in our experimental data sets. For the EDL work, we used experimental data from two different laboratory earth embankments. Additionally, we explore five wavelet transform methods for signal denoising. The best performance is achieved with the Haar wavelets. We achieve up to 97.3% overall accuracy and less than 1.4% false negatives in anomaly detection. Using the BSR scans, we continue to see that the GSM broadcast messages are not suitable for our anomaly detection system. However, the multivariate Gaussian approach with the UMTS, LTE, and ASN.1 encoded LTE scans was successful in separating commercial from non-commercial BSRs with 100% overall accuracy. In Chapter 4, we research using two-class and one-class support vector machines (SVMs) for an effective anomaly detection system. We again use the two different EDL data sets from experimental laboratory earth embankments (each having approximately 80% normal and 20% anomalies) to ensure our workflow is robust enough to work with multiple data sets and different types of anomalous events (e.g., cracks and piping). We apply Haar wavelet-denoising techniques and extract nine spectral features from decomposed segments of the time series data.
The two-class SVM with 10-fold cross-validation achieved over 94% overall accuracy and 96% F1-score. The F1-score, the harmonic mean of precision and recall, is a measure of the algorithm's predictive performance. Experiments with the one-class SVM (no labeled data for anomalies) using the top features selected by our automatic feature selection algorithm increase our overall results from 83% accuracy and 89% F1-score to over 91% accuracy and 95% F1-score. The two-class SVM experiments with our BSR detection workflow, using the top two features for each data set, highlight the ability to make a distinction between commercial and non-commercial BSRs with 83.3% overall accuracy and 89.8% F1-score for GSM and an impressive 100% overall accuracy for UMTS, LTE, and the ASN.1 encoded LTE data. As expected, using labels for only normal data, the one-class SVM resulted in a lower overall performance. The overall accuracy for GSM, UMTS, and LTE dropped to 73.3%, 74.5%, and 91.1%, with F1-scores of 81.0%, 82.2%, and 95.0%, respectively. Our approach provides a means for automatically identifying anomalous events using various machine learning techniques. Detecting internal erosion events in aging EDLs, earlier than is currently possible, can allow more time to prevent or mitigate catastrophic failures. Results show that we can successfully separate normal from anomalous data observations in passive seismic data, and provide a step towards techniques for continuous real-time monitoring of EDL health. Our lightweight non-commercial BSR detection system also has promise in separating commercial from non-commercial BSR scans without the need for prior geographic location information, extensive time-lapse surveys, or a database of known commercial carriers. | |
dc.format.medium | born digital | |
dc.format.medium | doctoral dissertations | |
dc.language | English | |
dc.language.iso | eng | |
dc.publisher | Colorado School of Mines. Arthur Lakes Library | |
dc.relation.ispartof | 2017 - Mines Theses & Dissertations | |
dc.rights | Copyright of the original work is retained by the author. | |
dc.subject | BSR detection | |
dc.subject | machine learning | |
dc.subject | earth dam and levee | |
dc.subject | anomaly detection | |
dc.title | Machine learning for the automatic detection of anomalous events | |
dc.type | Text | |
dc.contributor.committeemember | Wang, Hua | |
dc.contributor.committeemember | Krzhizhanovskaya, Valeria | |
dc.contributor.committeemember | Zhang, Hao | |
dc.contributor.committeemember | Navidi, William Cyrus | |
dc.contributor.committeemember | Stone, Kerri | |
thesis.degree.name | Doctor of Philosophy (Ph.D.) | |
thesis.degree.level | Doctoral | |
thesis.degree.discipline | Computer Science | |
thesis.degree.grantor | Colorado School of Mines |
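
Illustrative note on the abstract: the sketch below shows, in general terms, how a multivariate Gaussian anomaly detector and the F1-score mentioned in the abstract can fit together. It is not the dissertation's code. The data are synthetic, the model is restricted to two features, the threshold rule (lowest log-density seen on normal training data) is an assumption, and treating "anomaly" as the positive class is also an assumption. The function names `fit_gaussian_2d`, `log_density`, `scores`, and `demo` are hypothetical.

```python
import math
import random


def fit_gaussian_2d(points):
    """Estimate the mean vector and covariance matrix of 2-D points."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    sxx = sum((x - mx) ** 2 for x, _ in points) / n
    syy = sum((y - my) ** 2 for _, y in points) / n
    sxy = sum((x - mx) * (y - my) for x, y in points) / n
    return (mx, my), (sxx, sxy, syy)


def log_density(point, mean, cov):
    """Log of the bivariate Gaussian density at `point`."""
    sxx, sxy, syy = cov
    det = sxx * syy - sxy * sxy
    dx, dy = point[0] - mean[0], point[1] - mean[1]
    # Squared Mahalanobis distance, using the closed-form 2x2 inverse.
    m2 = (syy * dx * dx - 2.0 * sxy * dx * dy + sxx * dy * dy) / det
    return -0.5 * m2 - math.log(2.0 * math.pi) - 0.5 * math.log(det)


def scores(y_true, y_pred):
    """Accuracy, precision, recall, and F1 (harmonic mean of the latter two)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t and p)
    fp = sum(1 for t, p in pairs if not t and p)
    fn = sum(1 for t, p in pairs if t and not p)
    tn = sum(1 for t, p in pairs if not t and not p)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2.0 * precision * recall / denom if denom else 0.0
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


def demo(seed=0):
    """Fit on synthetic normal data, then score an 80/20 normal/anomaly mix."""
    rng = random.Random(seed)
    train = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(200)]
    normal = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(80)]
    anomalies = [(rng.gauss(8, 0.5), rng.gauss(8, 0.5)) for _ in range(20)]

    mean, cov = fit_gaussian_2d(train)
    # Assumed threshold rule: anything less likely than every training point.
    threshold = min(log_density(p, mean, cov) for p in train)

    test_points = normal + anomalies
    y_true = [False] * len(normal) + [True] * len(anomalies)  # True = anomaly
    y_pred = [log_density(p, mean, cov) < threshold for p in test_points]
    return scores(y_true, y_pred)


if __name__ == "__main__":
    print(demo())
```

In practice the threshold would more likely be tuned on a labeled validation set, and real sensor or broadcast-message features would replace the synthetic points used here.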