Hiphopology is a text mining tool that handles data for the partial discographies of my top 50 hip-hop artists. Some of my favourite artists had to be excluded because of the strict criteria I followed to keep the datasets consistent across artists (e.g. every artist included in this list must have at least 5 full-length projects released). The data is mined from all the verses on each project from each artist; lyrics in bridges, refrains, hooks, choruses, feature verses, etc. were not taken into account. This tool is not meant to compare artists, only to better understand the work and process of each artist as an individual.
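As a rough illustration of the selection rules described above (this is not the actual Hiphopology code), the two filters could be sketched in Python like this. The `Section` type, its `kind` and `performer` labels, and the function names are all assumptions made up for the example; only the threshold of 5 projects comes from the text.

```python
from dataclasses import dataclass

MIN_PROJECTS = 5  # "at least 5 full-length projects", per the criteria above


@dataclass
class Section:
    """One labelled part of a song. The labels are hypothetical."""
    kind: str       # e.g. "verse", "hook", "bridge", "chorus"
    performer: str  # who delivers this section
    text: str


def is_eligible(projects: list[str]) -> bool:
    """An artist qualifies only with enough full-length projects."""
    return len(projects) >= MIN_PROJECTS


def own_verses(artist: str, sections: list[Section]) -> list[str]:
    """Keep only verses performed by the artist, dropping hooks and features."""
    return [
        s.text
        for s in sections
        if s.kind == "verse" and s.performer == artist
    ]


# Made-up song structure for illustration only.
song = [
    Section("verse", "Artist A", "first verse text"),
    Section("hook", "Artist A", "hook text"),
    Section("verse", "Guest B", "feature verse text"),
]
kept = own_verses("Artist A", song)
```

Here `kept` holds only the first verse: the hook is dropped because it is not a verse, and the guest's verse is dropped because it is a feature.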
This project was inspired by the top 50 lists that became popular on Twitter in 2019, after one list went viral and became a trending topic of conversation (shout out Joe, Rory, Mal & Parks). I started to wonder what my personal top 50 list would look like. I thought about what my priorities would be: would I rank them by writing ability, wordplay, punchlines, beat selection, storytelling, or something else? This became an impossible decision, so I thought I'd design an algorithm and let the computer decide how to rank my top 50.
I had previously worked on a hip-hop analytics project called "Artist vs. Artist" that ran basic text mining on rap lyrics scraped from lyric sites, though I wasn't very happy with it, as the data always seemed to skew inaccurate. As I finished building "Artist vs. Artist" I also came to the realization that artists should not be pitted against each other, especially with arbitrary statistics that don't take into account the actual sound of the music. This led me to create an analytics platform with the goal of quantifying lyrical prowess, and doing so in a way that didn't compare or rank artists against one another. For this reason, Hiphopology only supplies you with each artist's stats and whether they're below or above average. Averages are only taken from the 50 top-tier rappers on this list, so an artist's stats being below average does not mean they aren't talented by any means.
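To make the "below or above average" idea concrete, here is a minimal Python sketch, assuming the average is a plain arithmetic mean over the artists on the list. The function name, the stat name, and the numbers are invented for the example and are not real Hiphopology data.

```python
from statistics import mean


def compare_to_average(stats: dict[str, float]) -> dict[str, str]:
    """Label each artist's value relative to the mean of the whole group."""
    avg = mean(stats.values())
    labels = {}
    for artist, value in stats.items():
        if value > avg:
            labels[artist] = "above average"
        elif value < avg:
            labels[artist] = "below average"
        else:
            labels[artist] = "average"
    return labels


# Hypothetical values for one stat; the group mean here is 105.
unique_words_per_verse = {
    "Artist A": 120.0,
    "Artist B": 95.0,
    "Artist C": 100.0,
}
labels = compare_to_average(unique_words_per_verse)
```

Because the mean is computed only over the artists passed in, a "below average" label says nothing about how an artist compares with anyone outside that group.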
The statistics shown here should not be used for any serious scientific purpose; they're anecdotal, fun, and interesting at best. That being said, I spent over 80 hours cleaning up and combing through the datasets by hand to ensure accuracy, as this project would be completely useless if I couldn't stand by my numbers. I should mention that I am not a statistician, and the majority of my statistics knowledge was learned while building this project. I did, however, consult a stats & maths major on my numbers.
If you're a dork like me and you want more information on the science and tech behind this project, or if you have questions, comments, or complaints, email [email protected]
Follow me on Instagram to keep up with my future projects. damienstewart.me