Week 2 - Generating Music Using Deep Learning

b21626972
BBM406 Spring 2021 Projects
Apr 18, 2021

Hello, this is the second blog post for our BBM406 Fundamentals of Machine Learning term project. This week, we examined the models and evaluation metrics used for music generation with deep learning.

Methods

There are two main families of methods for generating symbolic music: RNN-based and CNN-based.

The first approach to automatic music generation is based on the Long Short-Term Memory (LSTM) model. LSTM is a recurrent neural network (RNN) architecture used in many areas of deep learning. At each time step, an amplitude value is fed into the LSTM cell.

LSTM cell
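To make the cell more concrete, here is a minimal sketch of a single LSTM time step written in plain Python. It assumes a scalar input, hidden state, and cell state, and the weights in `params` are made-up values for illustration only; a real model would use vectors, matrices, and learned weights.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def lstm_step(x, h_prev, c_prev, params):
    """One LSTM time step with scalar input, hidden state and cell state."""
    # Each gate has an input weight, a recurrent weight and a bias.
    pre = {}
    for name in ("i", "f", "o", "g"):
        w_x, w_h, b = params[name]
        pre[name] = w_x * x + w_h * h_prev + b
    i = sigmoid(pre["i"])    # input gate: how much new information to write
    f = sigmoid(pre["f"])    # forget gate: how much old cell state to keep
    o = sigmoid(pre["o"])    # output gate: how much of the cell state to expose
    g = math.tanh(pre["g"])  # candidate value to write into the cell
    c = f * c_prev + i * g   # updated cell state
    h = o * math.tanh(c)     # updated hidden state (the cell's output)
    return h, c

# Hypothetical weights, chosen only for illustration.
params = {name: (0.5, 0.1, 0.0) for name in ("i", "f", "o", "g")}

h, c = 0.0, 0.0
outputs = []
for amplitude in [0.0, 0.3, -0.2, 0.8]:  # one value per time step
    h, c = lstm_step(amplitude, h, c, params)
    outputs.append(h)
print(outputs)
```

The hidden state `h` and the cell state `c` are carried from one step to the next, which is how the cell remembers earlier parts of the sequence.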

An LSTM can be used in several input/output configurations: one-to-one, one-to-many, many-to-one, and many-to-many. We will use the many-to-many configuration for our LSTM music generation model, so the network takes a sequence as input and produces a sequence as output.
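One common way to set up many-to-many training data is to pair each input window with the same window shifted one step to the right, so that the model predicts the next value at every position. The sketch below shows this idea; the function name `make_many_to_many_pairs` and the example pitches are our own illustrative choices, not a fixed part of our pipeline.

```python
def make_many_to_many_pairs(values, seq_len):
    """Slice a sequence into (input, target) windows of equal length.

    The target window is the input window shifted one step to the right,
    so there is one target for every input step.
    """
    pairs = []
    for start in range(len(values) - seq_len):
        inputs = values[start:start + seq_len]
        targets = values[start + 1:start + seq_len + 1]
        pairs.append((inputs, targets))
    return pairs

# Hypothetical melody given as MIDI pitch numbers.
melody = [60, 62, 64, 65, 67, 69]
pairs = make_many_to_many_pairs(melody, seq_len=4)
for inputs, targets in pairs:
    print(inputs, "->", targets)
```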

The second approach to automatic music generation is based on the Convolutional Neural Network (CNN) model. Several CNN models have been proposed for music generation. One of them is the WaveNet architecture, which uses causal dilated 1D convolutions. Causality ensures that the output at a given time step depends only on current and past inputs, while dilation widens the receptive field, so each node is influenced by a longer span of the input. For a better understanding, we can examine the picture below.
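The effect of dilation on the receptive field can also be checked with a few lines of code. This is a minimal sketch in plain Python, assuming a kernel size of 2, all weights set to 1, and dilations of 1, 2, 4, and 8; these values are chosen for illustration and are not WaveNet's actual configuration.

```python
def causal_dilated_conv1d(signal, kernel, dilation):
    """Causal 1D convolution: output[t] uses signal[t], signal[t - dilation], ...

    Positions before the start of the signal are treated as zeros,
    so no output ever depends on a future input.
    """
    output = []
    for t in range(len(signal)):
        total = 0.0
        for k, weight in enumerate(kernel):
            index = t - k * dilation
            if index >= 0:
                total += weight * signal[index]
        output.append(total)
    return output

def receptive_field(kernel_size, dilations):
    """Number of input steps that can influence one output of the stacked layers."""
    return 1 + (kernel_size - 1) * sum(dilations)

# Push a unit impulse through four stacked layers and see how far it spreads.
dilations = [1, 2, 4, 8]
x = [1.0] + [0.0] * 19
for d in dilations:
    x = causal_dilated_conv1d(x, [1.0, 1.0], d)

print(receptive_field(2, dilations))     # 16
print(sum(1 for v in x if v != 0.0))     # 16
```

With only four layers, a single input sample reaches 16 output positions. Without dilation, four layers with the same kernel size would cover only 5.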

The other CNN-based method for music generation is the Deep Convolutional Generative Adversarial Network (DCGAN). GANs have recently become very popular as generative models, and several studies apply them to music generation. MidiNet is one of them; it uses a 13-dimensional chord representation.
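As an illustration of what a 13-dimensional chord vector could look like, the sketch below uses 12 entries as a one-hot encoding of the chord root and 1 entry as a major/minor flag. This layout is our assumption for the example, and the helper name `encode_chord` is hypothetical; please check the MidiNet paper for the exact definition.

```python
PITCH_CLASSES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]

def encode_chord(root, is_minor):
    """Return a 13-dimensional vector: one-hot root (12 entries) + minor flag (1 entry).

    The layout is an assumption made for this example.
    """
    vector = [0] * 13
    vector[PITCH_CLASSES.index(root)] = 1
    vector[12] = 1 if is_minor else 0
    return vector

print(encode_chord("C", is_minor=False))
print(encode_chord("A", is_minor=True))
```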

The model is composed of a generator CNN and a discriminator CNN. The basic architecture of MidiNet is shown below.
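The interplay between the generator and the discriminator is easiest to see in their loss functions. Below is a minimal sketch of the standard GAN losses for a single real sample and a single generated sample. It is not MidiNet's exact objective, and the probabilities passed in are made-up discriminator outputs.

```python
import math

def discriminator_loss(d_real, d_fake):
    """Binary cross-entropy: real samples should score near 1, generated ones near 0."""
    return -(math.log(d_real) + math.log(1.0 - d_fake))

def generator_loss(d_fake):
    """Non-saturating generator loss: the generator wants its output scored near 1."""
    return -math.log(d_fake)

# Hypothetical discriminator outputs (probability that the sample is real).
print(discriminator_loss(d_real=0.9, d_fake=0.1))  # small: discriminator is winning
print(generator_loss(d_fake=0.1))                  # large: generator is being caught
print(generator_loss(d_fake=0.9))                  # small: generator fools the discriminator
```

The two networks pull in opposite directions: whatever lowers the generator loss (a higher score on generated samples) raises the discriminator loss.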

Evaluation

To evaluate the quality of the model, we may ask volunteer participants to share their opinions on the generated music. Another family of evaluation metrics is based on probabilistic measures such as likelihood and density estimation, which have been used in tasks such as image generation and music generation. The recurrent model used in the study is trained with the goal of maximizing the log-likelihood of each training sequence. These probabilistic measures provide objective information. However, a high likelihood does not guarantee quality: there exist bad samples with very high likelihoods.
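For a sequence model, the log-likelihood of a piece is the sum of the log-probabilities that the model assigned to each note that actually occurred. Here is a minimal sketch, assuming we already have those per-step probabilities; the numbers are made up for illustration.

```python
import math

def sequence_log_likelihood(step_probs):
    """Sum of log-probabilities assigned to the values that actually occurred."""
    return sum(math.log(p) for p in step_probs)

def mean_negative_log_likelihood(step_probs):
    """Average negative log-likelihood per step (lower means a better fit)."""
    return -sequence_log_likelihood(step_probs) / len(step_probs)

# Hypothetical per-step probabilities from a trained model.
confident = [0.9, 0.8, 0.95, 0.85]
uncertain = [0.3, 0.2, 0.25, 0.4]

print(sequence_log_likelihood(confident))
print(sequence_log_likelihood(uncertain))
print(mean_negative_log_likelihood(confident))
```

A higher log-likelihood only means that the model found the sequence predictable. It does not measure how the sequence sounds, which is why listening tests are still useful.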

See you in the next blog post.
