Telstra is Australia’s largest telecom provider. The company published a dataset of service disruption events on Kaggle, in which each event is categorized into one of three severity levels. I built a model to predict the severity of uncategorized events.
The first step of this project was to consolidate the data, which was provided in separate files. The files were linked by a common index, but a given index was sometimes assigned multiple entries, which I grouped together. I then converted the data to a fully numerical format and checked for missing values and outliers. As several features were non-ordinal categorical features, I encoded them using one-hot encoding.
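As a rough illustration of the consolidation step, the sketch below one-hot encodes a categorical column, collapses repeated index values into a single row, and joins the result onto a main table. The frame names, the column names (id, location, event_type) and the toy values are assumptions made for this example and are not taken from the actual files.

```python
import pandas as pd

# Hypothetical stand-ins for two of the separate files (names and values are assumptions)
events = pd.DataFrame({"id": [1, 2, 3], "location": ["loc_a", "loc_b", "loc_a"]})
event_types = pd.DataFrame({
    "id": [1, 1, 2, 3, 3],  # the same id can appear on several rows
    "event_type": ["type_x", "type_y", "type_x", "type_y", "type_z"],
})

# One-hot encode the categorical column, then group repeated ids into one row each
type_dummies = pd.get_dummies(event_types["event_type"], dtype=int)
type_per_id = type_dummies.groupby(event_types["id"]).max()

# One-hot encode the main table and join the grouped features on the shared index
features = pd.get_dummies(events.set_index("id"), columns=["location"], dtype=int)
features = features.join(type_per_id, how="left").fillna(0).astype(int)

print(features)
```

Taking the maximum per id records whether a category occurred at all; summing instead would keep a count, which may be the better choice if repetitions carry information.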
For model building, I started by trying out different classification algorithms, of which random forests performed best. Because the classes are imbalanced, I chose the F1 score as the evaluation metric; the best model achieved a score of 0.65.
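A comparison of this kind could be set up roughly as follows with scikit-learn. This is only a sketch: the data is a synthetic, imbalanced three-class set from make_classification rather than the Telstra data, logistic regression is an arbitrary second candidate, and macro-averaged F1 is an assumption, since the averaging used for the reported score is not stated here.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score

# Synthetic imbalanced three-class data as a placeholder for the real features
X, y = make_classification(
    n_samples=600, n_features=20, n_informative=8,
    n_classes=3, weights=[0.65, 0.25, 0.10], random_state=0,
)

# Candidate classifiers (the choice of candidates is an assumption)
candidates = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "random_forest": RandomForestClassifier(n_estimators=100, random_state=0),
}

# Stratified folds keep the class proportions similar in every split
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
scores = {
    name: cross_val_score(model, X, y, cv=cv, scoring="f1_macro").mean()
    for name, model in candidates.items()
}

for name, score in scores.items():
    print(f"{name}: mean macro F1 = {score:.3f}")
```

Macro averaging weights each class equally, so poor performance on a rare severity level lowers the score noticeably, whereas accuracy would largely hide it.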
In order to improve the model, I tried several feature selection and feature engineering techniques: choosing features by correlation with the target, recursive feature elimination, principal component analysis, adding interaction features, and combinations thereof. However, the best model remained the original random forest model.
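Three of these techniques can be compared against the plain random forest by wrapping each one in a scikit-learn pipeline, as in the sketch below. The synthetic data, the number of features or components kept (8), and the macro F1 scoring are all assumptions for illustration and do not reflect the settings used in the project.

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import RFE
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# Synthetic three-class data as a placeholder for the real features
X, y = make_classification(
    n_samples=300, n_features=15, n_informative=6, n_classes=3, random_state=0
)


def forest():
    """Return a fresh random forest so that no variant shares fitted state."""
    return RandomForestClassifier(n_estimators=50, random_state=0)


variants = {
    "baseline": forest(),
    # Recursive feature elimination: repeatedly drop the least important features
    "rfe": make_pipeline(RFE(forest(), n_features_to_select=8, step=2), forest()),
    # Principal component analysis: project onto the leading components
    "pca": make_pipeline(PCA(n_components=8), forest()),
    # Interaction features: add pairwise products of the original columns
    "interactions": make_pipeline(
        PolynomialFeatures(degree=2, interaction_only=True, include_bias=False),
        forest(),
    ),
}

results = {
    name: cross_val_score(model, X, y, cv=3, scoring="f1_macro").mean()
    for name, model in variants.items()
}

for name, score in results.items():
    print(f"{name}: mean macro F1 = {score:.3f}")
```

Putting each transformation inside the pipeline means it is refit on the training folds only, so the cross-validated scores are not inflated by information from the held-out fold.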
The Python code for this project is available in a Jupyter notebook on GitHub.