This project explores the musical and cultural dynamics that shape the success of Afrobeats and Amapiano tracks in the digital era. Using a curated dataset enriched with Spotify audio features, TikTok virality scores, streaming velocity, and Billboard chart presence, I developed a custom hit-classification framework tailored to each genre. Through exploratory data analysis and machine learning models—including logistic regression and random forest—I identified which features best predict hit potential. The findings challenge common industry assumptions, suggesting that visibility and shareability often outweigh traditional audio attributes like tempo and duration.
My Primary Motivation?
With Afrobeats and Amapiano gaining mainstream traction on global platforms, I set out to investigate what drives virality in these genres—beyond traditional Western hit-making formulas. This project bridges data science with cultural analytics to explore how engagement signals, community dynamics, and platform-specific trends shape music success in decentralized ecosystems. By designing a genre-aware prediction model rooted in real-world virality metrics, my aim was not only to uncover predictive patterns, but to challenge industry assumptions about what a "hit" sounds like in the digital age.
🔍 Key Highlights
Methodology & Insights
I began by aggregating data from Spotify's API, capturing key audio features like danceability, tempo, valence, and beat strength. To enrich the dataset, I manually tracked TikTok virality, streams-per-day since release, and Billboard Africa chart appearances. I then defined genre-specific hit criteria tailored to Afrobeats and Amapiano, accounting for cultural and platform-driven dynamics. Once labeled, I used logistic regression and random forest models to evaluate predictive performance and extract feature importances.
Lyric Analysis (Genius Integration):
I also scraped and analyzed lyrics from Genius to uncover patterns in language and cultural references, which helped categorize the emotional mood of each song in the dataset. Mood labels—such as"Confident," "Romantic," "Sad," or other emotions—were used as features during exploratory data analysis and model training. Afrobeats hits frequently invoked themes of love, success, and identity, while Amapiano lyrics leaned toward party culture and communal connection. This lyrical layer revealed how narrative tone and emotional resonance contribute to a song’s virality—beyond just its sound structure.
Findings
The models revealed that commonly assumed musical drivers of success—such as tempo, valence, and duration—had minimal standalone predictive power. Instead, external engagement signals like TikTok virality and chart visibility played a dominant role in determining hit potential. Notably, Afrobeats hits tended to show stronger beat presence and higher average popularity scores, while Amapiano hits favored mid-tempo consistency and subtle rhythmic patterns. These insights suggest that in today's digital music ecosystem, shareability and cultural momentum often outweigh traditional musical structure.
Reflection & Future Directions
This project offers a data-driven framework for understanding the cultural and structural dynamics behind song success in Afrobeats and Amapiano. By integrating engagement signals and audio features, the model provides actionable insight into the evolving nature of musical virality. Looking ahead, future work will involve deeper integration of lyrical sentiment analysis, semantic categorization, and temporal trend modeling. Expanding the scope to include other regional genres and incorporating listener behavioral data will further refine the predictive power and generalizability of the model. Ultimately, this research contributes to broader conversations at the intersection of data science, musicology, and digital culture.