The goal of this project was to implement multiple machine learning algorithms from scratch to classify an Android app as malicious or benign based on the system calls the app made. Each algorithm’s predictions were submitted to Kaggle and scored using the F1 metric, with the primary goal being to maximize F1. A minimum of six unique algorithms was required: five built entirely from scratch, and one that could optionally use an external ML library.
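For reference, F1 is the harmonic mean of precision and recall, which simplifies to 2·TP / (2·TP + FP + FN). A minimal NumPy sketch of the metric is given here; the function name `f1_score` and the convention that label 1 means "malicious" are assumptions for illustration, not the project's actual code or Kaggle's scorer.

```python
import numpy as np

def f1_score(y_true, y_pred):
    """Binary F1 for 0/1 labels, treating 1 (assumed: malicious) as the positive class."""
    y_true = np.asarray(y_true).astype(bool)
    y_pred = np.asarray(y_pred).astype(bool)

    tp = np.sum(y_true & y_pred)    # malicious apps correctly flagged
    fp = np.sum(~y_true & y_pred)   # benign apps wrongly flagged
    fn = np.sum(y_true & ~y_pred)   # malicious apps missed

    denom = 2 * tp + fp + fn
    # With no positives in either array, F1 is undefined; return 0.0 by convention.
    return 0.0 if denom == 0 else float(2 * tp / denom)
```

Because true negatives do not appear in the formula, a model that labels every app benign scores 0 regardless of how many benign apps it gets right.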
All simple algorithms were coded from scratch in Python using only the NumPy and pandas libraries.
For the neural network models, I used PyTorch to build, train, and evaluate architectures efficiently.
As shown in the table, the Random Forest model outperformed all others with an F1 score of 0.89. Its ability to reduce variance through ensembling and to handle feature interactions made it particularly well suited to this dataset. The neural network also performed well, achieving an F1 score of 0.82. Surprisingly, the 5-fold ensemble did not improve on a single model, possibly because the individual models converged to similar solutions. The biggest surprise was how well the basic decision tree performed: it achieved the same score as the neural network.
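The ensembling idea behind a Random Forest can be sketched in two pieces: each tree is trained on a bootstrap resample of the data, and the trees' predictions are combined by majority vote. The helper names `bootstrap_indices` and `majority_vote` below are hypothetical and the tie-breaking rule is an assumption; this is a sketch of the general technique, not the implementation used in this project.

```python
import numpy as np

def bootstrap_indices(n_samples, rng):
    """Draw n_samples row indices with replacement (one resample per tree)."""
    return rng.integers(0, n_samples, size=n_samples)

def majority_vote(predictions):
    """Combine 0/1 predictions of shape (n_models, n_samples) into one label per sample.

    Assumption: ties (possible with an even number of models) resolve to 1.
    """
    predictions = np.asarray(predictions)
    return (predictions.mean(axis=0) >= 0.5).astype(int)

# Three hypothetical models voting on three apps.
votes = np.array([
    [1, 0, 1],
    [1, 1, 0],
    [0, 0, 1],
])
combined = majority_vote(votes)
```

Voting only helps when the models disagree in different places. If every row of `votes` is identical, the combined output equals any single row, which is consistent with an ensemble of near-identical models showing no gain over one model.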
Overall, models that could handle non-linearity and aggregate decisions, such as Random Forests and Neural Networks, were more effective in this task.