0:00 / 0:00
Volume: 50%

Hit Song Predictor: Amapiano & Afrobeats

01
TAGS
Machine-Learning
Machine-Learning
Random-Forest-Model
Random-Forest-Model
Weighted-Ensemble-Model
Weighted-Ensemble-Model
0:00 / 0:00
Volume: 50%

Hit Song Predictor: Amapiano & Afrobeats

Hit Song Predictor: Amapiano & Afrobeats

01
TAGS
Machine-Learning
Machine-Learning
Random-Forest-Model
Random-Forest-Model
Weighted-Ensemble-Model
Weighted-Ensemble-Model
Weighted-Ensemble-Model
Hit Song Predictor: Amapiano & Afrobeats— Personal Project
Hit Song Predictor: Amapiano & Afrobeats— Personal Project

This project explores the musical and cultural dynamics that shape the success of Afrobeats and Amapiano tracks in the digital era. Using a curated dataset enriched with Spotify audio features, TikTok virality scores, streaming velocity, and Billboard chart presence, I developed a custom hit-classification framework tailored to each genre. Through exploratory data analysis and machine learning models—including logistic regression and random forest—I identified which features best predict hit potential. The findings challenge common industry assumptions, suggesting that visibility and shareability often outweigh traditional audio attributes like tempo and duration.

My Primary Motivation?

With Afrobeats and Amapiano gaining mainstream traction on global platforms, I set out to investigate what drives virality in these genres—beyond traditional Western hit-making formulas. This project bridges data science with cultural analytics to explore how engagement signals, community dynamics, and platform-specific trends shape music success in decentralized ecosystems. By designing a genre-aware prediction model rooted in real-world virality metrics, my aim was not only to uncover predictive patterns, but to challenge industry assumptions about what a "hit" sounds like in the digital age.


🔍 Key Highlights
Designed custom hit prediction rules for Afrobeats and Amapiano

Designed custom hit prediction rules for Afrobeats and Amapiano

Engineered a dataset with Spotify audio features, TikTok virality, Billboard chart presence, and streaming velocity to capture real-world performance metrics.

Engineered a dataset with Spotify audio features, TikTok virality, Billboard chart presence, and streaming velocity to capture real-world performance metrics.

Applied logistic regression and random forest models to uncover that visibility and shareability outperform traditional audio features in predicting hit potential.

Applied logistic regression and random forest models to uncover that visibility and shareability outperform traditional audio features in predicting hit potential.

Methodology & Insights

I began by aggregating data from Spotify's API, capturing key audio features like danceability, tempo, valence, and beat strength. To enrich the dataset, I manually tracked TikTok virality, streams-per-day since release, and Billboard Africa chart appearances. I then defined genre-specific hit criteria tailored to Afrobeats and Amapiano, accounting for cultural and platform-driven dynamics. Once labeled, I used logistic regression and random forest models to evaluate predictive performance and extract feature importances.

Lyric Analysis (Genius Integration):

I also scraped and analyzed lyrics from Genius to uncover patterns in language and cultural references, which helped categorize the emotional mood of each song in the dataset. Mood labels—such as"Confident," "Romantic," "Sad," or other emotions—were used as features during exploratory data analysis and model training. Afrobeats hits frequently invoked themes of love, success, and identity, while Amapiano lyrics leaned toward party culture and communal connection. This lyrical layer revealed how narrative tone and emotional resonance contribute to a song’s virality—beyond just its sound structure.

Findings

The models revealed that commonly assumed musical drivers of success—such as tempo, valence, and duration—had minimal standalone predictive power. Instead, external engagement signals like TikTok virality and chart visibility played a dominant role in determining hit potential. Notably, Afrobeats hits tended to show stronger beat presence and higher average popularity scores, while Amapiano hits favored mid-tempo consistency and subtle rhythmic patterns. These insights suggest that in today's digital music ecosystem, shareability and cultural momentum often outweigh traditional musical structure.

Reflection & Future Directions

This project offers a data-driven framework for understanding the cultural and structural dynamics behind song success in Afrobeats and Amapiano. By integrating engagement signals and audio features, the model provides actionable insight into the evolving nature of musical virality. Looking ahead, future work will involve deeper integration of lyrical sentiment analysis, semantic categorization, and temporal trend modeling. Expanding the scope to include other regional genres and incorporating listener behavioral data will further refine the predictive power and generalizability of the model. Ultimately, this research contributes to broader conversations at the intersection of data science, musicology, and digital culture.