Are you a DJ, or do you simply love music, especially remixes and mashups? Perfect, because this article is made for you!
Photo by Gabriel Barletta on Unsplash
In this article we will create the following song using AI:
Not too bad, right? But how do we get there?
First, we will download the two songs used in the remix above: Never Forget You by Zara Larsson and Meant For You by WildVibes.
To download those I used a YouTube converter.
Since I am a big fan of Google Colab, I wrote the code using a Google Colab notebook. I will share a link to that notebook at the end of this article. To get access to the two songs within the notebook, I first uploaded them to Google Drive.
To access the songs saved to Google Drive from our notebook, we need to run the following code:
from google.colab import drive
drive.mount('/content/gdrive')
The first time you do this, a prompt window will ask you to grant access to your Google Drive.
Now, it’s time to install ffmpeg by using the following command:
!apt -qq install ffmpeg
ffmpeg is needed by spleeter to read and write audio files, and the spleeter module will be responsible for most of the magic in this article. We can install spleeter with:

!pip install spleeter

With spleeter it is very easy to split a song into vocals (the singing voice) and accompaniment (the instrumental backing track). Theoretically, you can also use spleeter to split a song into vocals, drums, bass, piano, and other stems, but we will split our two songs into vocals and accompaniment only. To do this, we just need to run the following command:
!spleeter separate -o output/ /content/gdrive/MyDrive/ai_remix/never_forget_you.mp3
We will do the same for the Meant For You song:
!spleeter separate -o output/ /content/gdrive/MyDrive/ai_remix/meant_for_you.mp3
You may wonder how spleeter separates vocals and accompaniment. The field of study behind this is called Audio Source Separation. Essentially, the goal is to train AI models that take a mixture and isolate the sounds of individual sources (e.g., the human voice). DJs already use such models to pull vocals out of tracks and create new remixes with them, and as the models keep improving they will become even more valuable. In the future I will write about Audio Source Separation and how it works in more detail. Feel free to follow me so you don't miss those articles. :-)
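To give a rough intuition: many separation models predict a soft mask over a time-frequency representation of the mixture and multiply it element-wise with the mixture to carve out one source. Here is a deliberately toy sketch of that masking idea with made-up numbers; it is an illustration of the general principle, not spleeter's actual code:

```python
# Toy illustration of soft time-frequency masking, the idea behind many
# source-separation models. The numbers below are made up for illustration.
mixture = [0.9, 0.5, 0.1, 0.7]      # magnitudes of the mixed signal
vocal_mask = [0.8, 0.2, 0.0, 1.0]   # a model-predicted soft mask in [0, 1]

vocals = [m * x for m, x in zip(vocal_mask, mixture)]
accompaniment = [(1 - m) * x for m, x in zip(vocal_mask, mixture)]

# With complementary soft masks, the two estimates sum back to the mixture.
print([round(v + a, 6) for v, a in zip(vocals, accompaniment)])
```

The real model operates on spectrograms and learns the mask from training data, but the final "splitting" step is conceptually this simple multiplication.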
Now that we have both songs separated into vocals and accompaniment, we can start remixing.
To do this, we must first install the pydub module with the following command:
!pip install pydub
The pydub module lets us overlay the two songs. Since Never Forget You runs at 146 beats per minute while Meant For You runs at 128 BPM, I'm using the following method to adjust the speed of the vocals of Never Forget You:
from pydub import AudioSegment

# Credits to Abhi Krishnan:
# https://stackoverflow.com/questions/51434897/how-to-change-audio-playback-speed-using-pydub
def speed_change(sound, speed=1.0):
    # Manually override the frame_rate. This tells the computer how many
    # samples to play per second.
    sound_with_altered_frame_rate = sound._spawn(sound.raw_data, overrides={
        "frame_rate": int(sound.frame_rate * speed)
    })
    # Convert the sound with the altered frame rate back to a standard frame
    # rate so that regular playback programs will work right. They often only
    # know how to play audio at standard frame rates (like 44.1 kHz).
    return sound_with_altered_frame_rate.set_frame_rate(sound.frame_rate)
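If it isn't obvious why overriding the frame rate changes the speed, here is a small sketch using only Python's built-in wave module (independent of pydub): the very same samples, declared at a lower frame rate, simply take longer to play back.

```python
import io
import wave

SR = 44100
# One second of 16-bit mono silence at 44.1 kHz (44100 two-byte samples).
frames = b"\x00\x00" * SR

def write_wav(framerate):
    """Write the same samples into an in-memory WAV at the given frame rate."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as w:
        w.setnchannels(1)
        w.setsampwidth(2)
        w.setframerate(framerate)
        w.writeframes(frames)
    buf.seek(0)
    return buf

def duration_seconds(buf):
    with wave.open(buf, "rb") as w:
        return w.getnframes() / w.getframerate()

print(round(duration_seconds(write_wav(SR)), 3))              # 1.0
print(round(duration_seconds(write_wav(int(SR * 0.95))), 3))  # ~1.053, i.e. slower
```

This is exactly what the frame_rate override in speed_change exploits, followed by a resample back to the standard rate so players handle the file normally.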
Let’s decrease the speed of Never Forget You:
from pydub import AudioSegment

vocals = AudioSegment.from_file("output/never_forget_you/vocals.wav", format="wav")
vocals_slower = speed_change(vocals, 0.95)
This doesn’t lower the speed of Never Forget You to the level of Meant For You, but I felt that both songs sounded best together with this ratio (feel free to try and adjust it yourself).
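As a side note, the ratio for an exact tempo match can be computed directly from the two BPM values:

```python
# BPM values from the article
source_bpm = 146  # Never Forget You (the vocals being slowed down)
target_bpm = 128  # Meant For You (the accompaniment)

exact_ratio = target_bpm / source_bpm
print(round(exact_ratio, 3))  # 0.877 would be an exact tempo match
# The article uses 0.95 instead: a compromise found by ear, slower than
# the original vocals but not all the way down to 128 BPM.
```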
So, it's finally time to overlay the vocals of Never Forget You with the accompaniment of Meant For You:
accompaniment = AudioSegment.from_file("output/meant_for_you/accompaniment.wav", format="wav")
ai_remix = accompaniment.overlay(vocals_slower[12000:], position=2000)
ai_remix = ai_remix[1500:]
ai_remix.export("ai_remix.mp3", format="mp3")
As you can see, I used pydub's overlay function to mix the two stems. I also arranged and trimmed them a bit so they fit together better. pydub indexes audio in milliseconds, so 12000 equals 12 seconds.
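To make the overlay behavior concrete, here is a minimal pure-Python sketch (not pydub's implementation) of mixing one list of samples into another at an offset. Like pydub's overlay, the result keeps the length of the base segment, so anything extending past its end is dropped:

```python
def overlay(base, other, position=0):
    """Mix `other` into `base` starting at sample index `position`,
    truncating at the end of `base` (as pydub's overlay does)."""
    mixed = list(base)
    for i, sample in enumerate(other):
        j = position + i
        if j >= len(mixed):
            break  # overlaid audio past the end of the base is discarded
        mixed[j] += sample
    return mixed

print(overlay([1, 1, 1, 1], [10, 10], position=1))  # [1, 11, 11, 1]
```

Real audio mixing also has to worry about clipping and sample formats, which pydub handles for us; this sketch only shows the positioning logic.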
That’s it! This is how I created the song you already listened to at the beginning of the article.
Feel free to subscribe, as I will be covering Audio Source Separation in future articles and explaining in more detail how such models actually work.
Link to the Colab Notebook: ai_remix.ipynb
Final Thoughts
I hope you enjoyed this article. I will publish more articles about Deep Learning related topics in the future. I also write about topics in the field of Data Science and Data Engineering.
Isn’t collaboration great? I’m always happy to answer questions or discuss ideas proposed in my articles. So don’t hesitate to reach out to me! 🙌 Also, make sure to subscribe or follow to not miss out on new articles.