Cocktail Party Problem
Imagine being at a party. Like at every party, the music is loud and everyone is shouting over it. Nevertheless, you can hear (almost) clearly the person you are talking to. This ability to focus your attention on one conversation in a noisy environment is what psychoacousticians call the cocktail party effect. The effect becomes a problem when someone is unable to focus their attention on a particular stimulus. The cocktail party problem was first studied in the early 1950s in the context of air traffic control. Since then, the problem has grown beyond the realm of psychology, and we can now use computational methods to solve it.
Here, we are going to present the most commonly used approach, Independent Component Analysis (ICA), through an implementation of the FastICA algorithm.
ICA
To solve the cocktail party problem, we have to isolate the sounds coming from the different sources. More formally, we have to separate a multivariate signal into multiple independent signals. When using ICA, we are looking for the linear transformation of the observed data that makes the estimated components as statistically independent as possible. So, how can we force the estimated components to be as independent as possible?
There are two approaches: we can try to minimize the mutual information of the components, or we can try to maximize their non-Gaussianity. Here, we are going to focus on the second approach, as it is the one used in FastICA.
Non-Gaussianity can be defined as how much spikier or flatter a distribution (here, the distribution of an audio signal) is compared to a Gaussian. The degree of spikiness of a distribution can be measured with a normalized form of its fourth moment, also called the kurtosis. It is defined by this formula:

$$\mathrm{kurt}(y) = E\{y^4\} - 3\left(E\{y^2\}\right)^2$$

If we assume that $y$ is a zero mean and unit variance variable, the kurtosis becomes:

$$\mathrm{kurt}(y) = E\{y^4\} - 3$$

The kurtosis is positive if the distribution is spikier than a Gaussian and negative if it is flatter. The non-Gaussianity can therefore be measured by computing the absolute value of the kurtosis.

Gaussian distribution (blue) compared to distributions with different kurtosis values (red)
However, this measure is extremely sensitive to outliers and is not a robust way to determine the non-Gaussianity of a distribution.
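Both properties can be seen in a quick numerical sketch (purely illustrative; the sample sizes and the seed are arbitrary):

import numpy as np

rng = np.random.default_rng(0)

# kurt(y) = E{y^4} - 3 after standardizing y to zero mean and unit variance
def excessKurtosis(y):
    y = (y - y.mean()) / y.std()
    return np.mean(y ** 4) - 3

gaussian = rng.standard_normal(100_000)    # kurtosis close to 0
laplacian = rng.laplace(size=100_000)      # spikier than Gaussian -> positive
uniform = rng.uniform(-1, 1, 100_000)      # flatter than Gaussian -> negative

print(excessKurtosis(gaussian))    # close to 0
print(excessKurtosis(laplacian))   # close to +3
print(excessKurtosis(uniform))     # close to -1.2

# Sensitivity to outliers: a single extreme sample distorts the estimate a lot
corrupted = np.concatenate([gaussian, [25.0]])
print(excessKurtosis(corrupted))   # no longer close to 0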
According to the Central Limit Theorem [1], when independent random variables are added, their normalized sum tends toward a Gaussian distribution. We can use this to characterize Gaussianity in terms of entropy: among random variables of equal variance, the Gaussian distribution is the one with the highest entropy. The entropy $H$ of a random variable $Y$ with density function $f(y)$ is defined as follows:

$$H(Y) = -\int f(y)\,\log f(y)\,dy$$
Based on this maximum-entropy property of the Gaussian distribution and on the above definition of entropy, we can define the negentropy $J$ as follows:

$$J(y) = H(y_{\mathrm{gauss}}) - H(y)$$

where $y_{\mathrm{gauss}}$ is a Gaussian variable with the same variance as $y$.
The Gaussian distribution being the distribution with the highest entropy, $H(y_{\mathrm{gauss}}) > H(y)$ under the assumption that $y$ is not Gaussian (which is the case in the cocktail party problem). Therefore, to find the most non-Gaussian $y$ we have to maximize $J(y)$. However, in practice we can't maximize the negentropy directly, as it is computationally expensive. Instead, we maximize an approximation of the negentropy. The negentropy approximation used by FastICA is the following:

$$J(y) \propto \left[\,E\{G(y)\} - E\{G(\nu)\}\,\right]^2$$
where $G$ is a non-quadratic function, $y$ a zero mean and unit variance variable and $\nu$ a Gaussian zero mean and unit variance variable. In the case of FastICA, $G(u) = \log\cosh(u)$.
Moreover, since $E\{G(\nu)\}$ is constant, we just have to maximize $E\{G(y)\}$ in order to maximize $J(y)$. That's where FastICA comes into play!
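As a small illustration (a minimal sketch; the distributions, sample sizes and seed are arbitrary), we can estimate this approximation for a Gaussian and a non-Gaussian signal and check that the non-Gaussian one scores higher:

import numpy as np

rng = np.random.default_rng(0)

# G(u) = log cosh(u), the non-quadratic function used by FastICA
def G(u):
    return np.log(np.cosh(u))

# Estimate J(y) ~ [E{G(y)} - E{G(nu)}]^2, with nu a standard Gaussian reference
def negentropyApprox(y, n_ref=1_000_000):
    y = (y - y.mean()) / y.std()          # enforce zero mean, unit variance
    nu = rng.standard_normal(n_ref)
    return (np.mean(G(y)) - np.mean(G(nu))) ** 2

gaussian = rng.standard_normal(100_000)
laplacian = rng.laplace(size=100_000)     # super-Gaussian (spiky) signal

print(negentropyApprox(gaussian))         # close to 0
print(negentropyApprox(laplacian))        # clearly larger than for the Gaussian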
FastICA
Let $X$ be the input data matrix of size $N \times M$ ($N$ mixed signals of $M$ samples each), $w$ a weight vector of size $N$ and $C$ the number of components. We want to find the matrix $W$ of size $C \times N$ that projects $X$ onto $C$ independent components (i.e., the matrix $S = WX$ whose rows are the independent components).
For FastICA to work, we need to make sure that $E\{XX^T\} = I$. We accomplish this by centering and whitening the input data matrix $X$. To center $X$, we subtract from each of its rows the mean value of that row:

$$x_{ij} \leftarrow x_{ij} - \frac{1}{M}\sum_{j'=1}^{M} x_{ij'}$$
To whiten $X$, we perform an eigenvalue decomposition of the covariance matrix of the centered $X$:

$$E\{XX^T\} = EDE^T$$

where $D$ is the diagonal matrix of eigenvalues and $E$ the matrix of eigenvectors. In this setting, the whitening transform is:

$$X \leftarrow D^{-1/2} E^T X$$

after which $E\{XX^T\} = I$.
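As a quick check (a minimal sketch with illustrative names, independent of the implementation below), we can verify that this eigenvalue-based whitening does give an identity covariance:

import numpy as np

rng = np.random.default_rng(0)

# Illustrative mixed data: N = 3 correlated signals, M = 10000 samples
N, M = 3, 10_000
A = rng.standard_normal((N, N))            # random mixing matrix
X = A @ rng.laplace(size=(N, M))           # mixtures of non-Gaussian sources

X = X - X.mean(axis=1, keepdims=True)      # centering (row-wise means)
cov = (X @ X.T) / M                        # covariance of the centered data
D, E = np.linalg.eigh(cov)                 # eigenvalues D, eigenvectors E
X_white = np.diag(D ** -0.5) @ E.T @ X     # X <- D^{-1/2} E^T X

print(np.allclose((X_white @ X_white.T) / M, np.eye(N)))  # True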
The problem is now to find the unit vector $w$ such that the projection $w^T X$ maximizes the non-Gaussianity, i.e. maximizes $E\{G(w^T X)\}$. We can achieve that by using an approximative iteration of Newton's method, defined by:

$$w \leftarrow w - \left[\frac{\partial F(w)}{\partial w}\right]^{-1} F(w)$$

where $F$ is the gradient of the function we want to maximize.
To do so, we have to compute the gradient and the derivative of the gradient of $E\{G(w^T X)\}$ with respect to $w$. We have:

$$\frac{\partial}{\partial w} E\{G(w^T X)\} = E\{X\, G'(w^T X)\}$$

$$\frac{\partial^2}{\partial w^2} E\{G(w^T X)\} = E\{X X^T G''(w^T X)\}$$
For the reasons explained on page 188 of [2], we cannot use this gradient directly: we have to add $w$ multiplied by a constant $\beta$. We obtain:

$$F(w) = E\{X\, G'(w^T X)\} + \beta w \qquad \frac{\partial F(w)}{\partial w} = E\{X X^T G''(w^T X)\} + \beta I$$
The approximative Newton's iteration becomes:

$$w \leftarrow w - \left[\,E\{X X^T G''(w^T X)\} + \beta I\,\right]^{-1}\left[\,E\{X\, G'(w^T X)\} + \beta w\,\right]$$
With the assumption that $X$ has been centered and whitened, we have $E\{X X^T G''(w^T X)\} \approx E\{X X^T\}\,E\{G''(w^T X)\} = E\{G''(w^T X)\}\, I$, and after multiplying by the scalar $\beta + E\{G''(w^T X)\}$ (the resulting change of scale and sign is absorbed by the normalization that follows), the iteration simplifies to:

$$w \leftarrow E\{X\, G'(w^T X)\} - E\{G''(w^T X)\}\, w \qquad w \leftarrow \frac{w}{\|w\|}$$
We now have a one-unit algorithm to solve the cocktail party problem. But we don't want to estimate only one independent component, we want several. A method to do so is to add, after the Newton iteration, a deflationary orthogonalization step defined by:

$$w_{p+1} \leftarrow w_{p+1} - \sum_{j=1}^{p} \left(w_{p+1}^T w_j\right) w_j$$

followed by a renormalization of $w_{p+1}$. This prevents $w_{p+1}$ from converging to one of the $p$ components already estimated. More details can be found on pages 192-194 of [2].
Algorithm
The complete algorithm will then look like this:
1. Center the matrix $X$
2. Whiten the matrix $X$
3. Create a matrix $W$ that will contain all the weight vectors $w_p$
4. Repeat the following steps $C$ times:
   4.1 Create a random weight vector $w_p$ of size $N$
   4.2 Perform a Newton iteration on $w_p$
   4.3 Perform a deflationary orthogonalization on $w_p$ using the current $W$
   4.4 Normalize $w_p$
   4.5 If $w_p$ is still changing, go back to step 4.2
   4.6 Put $w_p$ in $W$
5. Compute and return $S = WX$
Implementation
The implementation of the above algorithm is straightforward using Python and the Numpy library.
import numpy as np

# Definition of G' (the derivative of G(u) = log cosh(u))
def Gprime(x):
    return np.tanh(x)

# Definition of G''
def Gsecond(x):
    return np.ones(x.shape) - np.power(np.tanh(x), 2)

# Center matrix X by subtracting from each row its mean
def centerMatrix(X, N):
    mean = X.mean(axis=1)
    return X - mean.reshape((N, 1))

# Whiten matrix X with an eigenvalue decomposition of its covariance matrix
def whitenMatrix(X):
    cov = (X @ X.T) / X.shape[1]
    D, E = np.linalg.eigh(cov)
    DE = np.diag(1 / np.sqrt(D + 1e-5)) @ E.T
    return DE @ X

# One-unit Newton step: w <- E{X G'(w X)} - E{G''(w X)} w
def oneUnit(X, wp):
    term1 = np.mean(X * Gprime(wp @ X), axis=1)
    term2 = np.mean(Gsecond(wp @ X)) * wp
    return term1 - term2

# Deflationary orthogonalization against the rows of W already estimated
def orthogonalize(W, wp, i):
    return wp - ((wp @ W[:i, :].T) @ W[:i, :])

# wp normalization
def normalize(wp):
    return wp / np.linalg.norm(wp)

def fastICA(X, C, max_iter=1000):
    N = X.shape[0]
    X = centerMatrix(X, N)
    X = whitenMatrix(X)
    W = np.zeros((C, N))
    for i in range(C):
        wp = np.random.rand(1, N)
        for _ in range(max_iter):
            wp = oneUnit(X, wp)
            wp = orthogonalize(W, wp, i)
            wp = normalize(wp)
        W[i, :] = wp
    return W @ X
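Before moving to real audio, here is a minimal sanity check of fastICA on synthetic data (the two sources and the mixing matrix are made up for the example). ICA recovers the sources only up to permutation, sign and scale, so we check that each estimated component is strongly correlated with one of the true sources:

import numpy as np

# Two synthetic sources: a sine wave and a square wave, 10000 samples each
t = np.linspace(0, 8, 10_000)
S_true = np.vstack([np.sin(2 * np.pi * t),
                    np.sign(np.sin(3 * np.pi * t))])

A = np.array([[1.0, 0.5],
              [0.3, 2.0]])            # made-up mixing matrix
X_mixed = A @ S_true                  # observed mixtures

S_est = fastICA(X_mixed, C=2, max_iter=200)

# Cross-correlation between true and estimated components:
# expect roughly a permutation matrix, with one value near 1.0 in each row
corr = np.corrcoef(np.vstack([S_true, S_est]))[:2, 2:]
print(np.round(np.abs(corr), 2))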
Results
Let's now load some audio samples and see the results. Here we are going to use a dataset containing sounds coming from 9 different sources. You can find the data, the complete code and the results on my GitHub.
from scipy.io import wavfile
import matplotlib.pyplot as plt
import numpy as np
import os

# Function to load the audio data
def loadData(dataPath, nsources=9, size=50000):
    data = np.empty((nsources, size), np.float32)
    for i in range(nsources):
        fs, data[i, :] = wavfile.read(os.path.join(dataPath, 'mix' + str(i + 1) + '.wav'))
    return fs, data

# Function to display the audio waveforms
def displayData(X):
    C = X.shape[0]
    for i in range(C):
        plt.subplot(C, 1, i + 1)
        plt.plot(X[i, :])
    plt.show()

# Function to save the audio data separated by FastICA
def saveData(saveDataPath, fs, X):
    for i in range(X.shape[0]):
        wavfile.write(os.path.join(saveDataPath, 'ica_' + str(i + 1) + '.wav'), fs, X[i, :])

dataPath = ""      # path to the folder containing the mixed audio sounds
saveDataPath = ""  # path to the folder that will contain the results of fastICA
nComponents = 9    # number of components that we want

if not os.path.exists(saveDataPath):
    os.makedirs(saveDataPath)

fs, X = loadData(dataPath)
displayData(X)
S = fastICA(X, nComponents)
displayData(S)
saveData(saveDataPath, fs, S)

Data matrix before (left) and after (right) FastICA
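One practical caveat, not covered in the code above: the components returned by FastICA have arbitrary scale and sign, so the float values written by saveData may fall well outside the [-1, 1] range that most audio players expect for float WAV files. A minimal sketch of a rescaling step (the helper name and the 0.9 peak value are just illustrative choices):

import numpy as np

# Rescale each recovered component so that its peak absolute value is `peak`
def peakNormalize(X, peak=0.9):
    return (X / np.abs(X).max(axis=1, keepdims=True)) * peak

# Example usage before saving:
# saveData(saveDataPath, fs, peakNormalize(S).astype(np.float32))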
References
[1] MIT OpenCourseWare on Central Limit Theorem and the Law of Large Numbers by Jeremy Orloff and Jonathan Bloom
[2] Independent Component Analysis by Aapo Hyvärinen, Juha Karhunen and Erkki Oja