Machine learning for encrypted Amazon Echo traffic classification
Jackson, Ryan Blake
Date Issued
2018
Abstract
As smart speakers like the Amazon Echo become more popular, they have given rise to rampant concerns regarding user privacy. This work investigates machine learning techniques to extract ostensibly private information from the TCP traffic moving between an Echo device and Amazon's servers, despite the fact that all such traffic is encrypted. Specifically, we investigate two supervised classification problems using six machine learning algorithms and three feature vectors. The "request type classification" problem seeks to determine what type of user request is being answered by the Echo. With six classes, we achieve 97% accuracy in this task using random forests. The "speaker identification" problem seeks to determine who, of a finite set of possible speakers, is speaking to the Echo. In this task, with two classes, we outperform random guessing by a small but statistically significant margin with an accuracy of 58%. We discuss the reasons for, and implications of, these results, and suggest several avenues for future research in this domain.
Rights
Copyright of the original work is retained by the author.