This project explores the musical and cultural dynamics that shape the success of Afrobeats and Amapiano tracks in the digital era. Using a curated dataset enriched with Spotify audio features, TikTok virality scores, streaming velocity, and Billboard chart presence, I developed a custom hit-classification framework tailored to each genre. Through exploratory data analysis and machine learning models—including logistic regression and random forest—I identified which features best predict hit potential. The findings challenge common industry assumptions, suggesting that visibility and shareability often outweigh traditional audio attributes like tempo and duration.
Final Model Results
The class-balanced Random Forest achieved strong hit-class performance, with an Afrobeats hit F1-score of 0.94 and an Amapiano hit F1-score of 0.91, outperforming Logistic Regression in minority-class hit detection.
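For readers unfamiliar with the metric, the hit-class F1-score is the harmonic mean of precision and recall on the "hit" label. The sketch below computes it from confusion-matrix counts. The function name and the counts are illustrative assumptions, not the project's actual confusion matrix.

```python
def f1_from_counts(tp: int, fp: int, fn: int) -> float:
    """F1 for the positive (hit) class from confusion-matrix counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Made-up counts for illustration only (not results from this project):
# 8 hits found, 1 non-hit flagged as a hit, 1 hit missed.
example_f1 = f1_from_counts(tp=8, fp=1, fn=1)
print(round(example_f1, 3))
```

Because F1 ignores true negatives, it is a more honest measure than accuracy when hits are the minority class.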
Motivation
With Afrobeats and Amapiano gaining mainstream traction on global platforms, I set out to investigate what drives virality in these genres—beyond traditional Western hit-making formulas. This project bridges data science with cultural analytics to explore how engagement signals, community dynamics, and platform-specific trends shape music success in decentralized ecosystems. By designing a genre-aware prediction model rooted in real-world virality metrics, my aim was not only to uncover predictive patterns, but to challenge industry assumptions about what a "hit" sounds like in the digital age.
🔍 Key Highlights

Methodology & Insights
I began by aggregating data from Spotify's API, capturing key audio features like tempo, duration, and beat strength. To enrich the dataset, I manually tracked TikTok virality, streams-per-day since release, and Billboard Africa chart appearances. I then defined genre-specific hit criteria tailored to Afrobeats and Amapiano, accounting for cultural and platform-driven dynamics. Once the tracks were labeled, I trained logistic regression and random forest models to evaluate predictive performance and extract feature importances. Finally, I selected the class-balanced Random Forest as the final model because it detected hits more reliably under class imbalance.
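The labeling and class-balancing steps can be sketched roughly as follows. Everything specific here is an assumption made for illustration: the threshold values, the field names, the rule "fast streaming plus either TikTok virality or a chart appearance", and the sample tracks are hypothetical, since the actual hit criteria are not spelled out above. The weighting formula is the common "balanced" heuristic (n_samples / (n_classes * class_count)), which is what scikit-learn's class_weight="balanced" option computes; whether the project used exactly that option is also an assumption.

```python
from collections import Counter
from datetime import date

# Hypothetical per-genre thresholds, NOT the project's real criteria.
HIT_RULES = {
    "afrobeats": {"min_streams_per_day": 50_000, "min_tiktok_score": 70},
    "amapiano": {"min_streams_per_day": 20_000, "min_tiktok_score": 60},
}


def streams_per_day(total_streams: int, release: date, snapshot: date) -> float:
    """Streaming velocity: total streams divided by days since release."""
    days = max((snapshot - release).days, 1)  # avoid division by zero
    return total_streams / days


def is_hit(track: dict, snapshot: date) -> int:
    """Return 1 if the track meets its genre's (hypothetical) hit rule."""
    rule = HIT_RULES[track["genre"]]
    velocity = streams_per_day(track["streams"], track["release"], snapshot)
    fast = velocity >= rule["min_streams_per_day"]
    viral = track["tiktok_score"] >= rule["min_tiktok_score"]
    return int(fast and (viral or track["billboard_africa"]))


def balanced_class_weights(labels: list) -> dict:
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {cls: n_samples / (n_classes * cnt) for cls, cnt in counts.items()}


# Made-up tracks for illustration only.
SNAPSHOT = date(2024, 6, 30)
RELEASE = date(2024, 4, 1)  # 90 days before the snapshot
tracks = [
    {"genre": "afrobeats", "streams": 9_000_000, "release": RELEASE,
     "tiktok_score": 85, "billboard_africa": True},
    {"genre": "afrobeats", "streams": 900_000, "release": RELEASE,
     "tiktok_score": 90, "billboard_africa": False},
    {"genre": "amapiano", "streams": 2_700_000, "release": RELEASE,
     "tiktok_score": 40, "billboard_africa": False},
    {"genre": "amapiano", "streams": 450_000, "release": RELEASE,
     "tiktok_score": 30, "billboard_africa": False},
]

labels = [is_hit(t, SNAPSHOT) for t in tracks]
weights = balanced_class_weights(labels)
print(labels, weights)
```

With hits in the minority, the hit class receives the larger weight, so errors on hits cost the model more during training. That is the mechanism by which a class-balanced model can improve minority-class detection.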
Lyric Analysis (Genius Integration):
I extended the model by collecting lyrics through the Genius API and applying VADER sentiment analysis to create a lyric_sentiment feature. This added an emotional-text layer to the project, allowing the model to test whether lyrical tone contributed additional predictive signal beyond streams, TikTok virality, Billboard presence, and Spotify popularity.
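To give a feel for what a lexicon-based score like lyric_sentiment looks like, here is a simplified stand-in. It is not the VADER library: the lexicon and its valence values are hypothetical, and VADER's handling of negation, intensifiers, and punctuation is omitted. It does reuse the normalization VADER applies to its compound score, x / sqrt(x^2 + 15), which squashes the summed valence into the range (-1, 1).

```python
import math
import re

# Tiny hypothetical lexicon with made-up valence values.
# VADER ships a much larger, human-rated lexicon.
TOY_LEXICON = {
    "love": 3.0, "joy": 2.5, "happy": 2.5,
    "pain": -2.5, "cry": -2.0, "lonely": -2.0,
}


def normalize(score: float, alpha: float = 15.0) -> float:
    """Squash an unbounded valence sum into (-1, 1)."""
    return score / math.sqrt(score * score + alpha)


def toy_lyric_sentiment(lyrics: str) -> float:
    """Sum the valence of known words, then normalize (simplified stand-in)."""
    tokens = re.findall(r"[a-z']+", lyrics.lower())
    raw = sum(TOY_LEXICON.get(tok, 0.0) for tok in tokens)
    return normalize(raw)


print(toy_lyric_sentiment("Love and joy all night"))
print(toy_lyric_sentiment("Lonely nights, so much pain"))
```

A single bounded number per track is easy to append as a feature column, which is why a sentiment score is a convenient first step before richer lyric representations.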
Findings
The models revealed that commonly assumed musical drivers of success—such as tempo, beat strength, and duration—had limited standalone predictive power. Instead, platform-driven signals like TikTok virality, Billboard Africa visibility, Spotify popularity, and streaming velocity carried more predictive value. These insights suggest that in today's digital music ecosystem, shareability and cultural momentum often outweigh traditional musical structure. Notably, Afrobeats hits tended to show stronger beat presence and higher average popularity scores, while Amapiano hits favored mid-tempo consistency and subtle rhythmic patterns.
Reflection & Future Directions
This project offers a data-driven framework for understanding the cultural and structural dynamics behind song success in Afrobeats and Amapiano. By integrating engagement signals and audio features, the model provides actionable insight into the evolving nature of musical virality. Looking ahead, future work could expand the dataset across more African genres, incorporate richer lyric embeddings beyond sentiment scores, and add regional platform signals from YouTube Shorts, Apple Music charts, Shazam, and radio play. I would also improve model generalizability by testing the pipeline on songs released after the original snapshot date and comparing whether the same predictors hold over time. Ultimately, this project shows that hit prediction is not just a sound problem — it is a cultural signal problem.
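The post-snapshot generalization check could start from a temporal split like the one sketched below. The function name, field names, and dates are hypothetical: tracks released on or before the snapshot date form the training pool, and later releases form a holdout set the model never saw.

```python
from datetime import date


def temporal_split(tracks: list, snapshot: date) -> tuple:
    """Split tracks into (released on/before snapshot, released after)."""
    train = [t for t in tracks if t["release"] <= snapshot]
    holdout = [t for t in tracks if t["release"] > snapshot]
    return train, holdout


# Made-up catalogue for illustration only.
catalogue = [
    {"title": "Track A", "release": date(2023, 11, 3)},
    {"title": "Track B", "release": date(2024, 2, 17)},
    {"title": "Track C", "release": date(2024, 8, 9)},
]

train_set, holdout_set = temporal_split(catalogue, snapshot=date(2024, 6, 30))
print(len(train_set), len(holdout_set))
```

Unlike a random split, a date-based split prevents information from later releases leaking into training, so a drop in holdout performance would indicate that the predictors shift over time.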






