PCA on 2D point-clouds

This script relates the output of PCA (principal directions and the variances along them) to the geometry of 2D point-clouds. Five point-clouds built from random samples are considered:

  1. independent and uncorrelated
  2. independent and correlated
  3. dependent (linear relation) and correlated
  4. dependent and uncorrelated
  5. dependent (nonlinear relation) and correlated


% Copyright (C) 2018 Juan Pablo Carbajal
%
% This program is free software; you can redistribute it and/or modify
% it under the terms of the GNU General Public License as published by
% the Free Software Foundation; either version 3 of the License, or
% (at your option) any later version.
%
% This program is distributed in the hope that it will be useful,
% but WITHOUT ANY WARRANTY; without even the implied warranty of
% MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
% GNU General Public License for more details.
%
% You should have received a copy of the GNU General Public License
% along with this program. If not, see <http://www.gnu.org/licenses/>.

% Author: Juan Pablo Carbajal <ajuanpi+dev@gmail.com>
% Created: 2018-06-28

%close all
%clc

% Determine whether we are running GNU Octave
isOctave = exist ('OCTAVE_VERSION', 'builtin') ~= 0;

% Check for packages and functionality
if isOctave
  pkg load statistics % for mvnrnd
  % Use ~ and end (not ! and endif) so the file also parses in MATLAB
  if (~ isempty (pkg ('list', 'geometry')))
    hasgeometry = true;
    pkg load geometry   % for cov2ellipse
  else
    hasgeometry = false;
  end
else
  % Check for mvnrnd
  mvnrnd(0, 1, 1);
  % Check for Matgeom functionality
  try
    cov2ellipse(eye(2));
    hasgeometry = true;
  catch
    hasgeometry = false;
  end
end
if ~hasgeometry
  warning ('Matgeom (geometry package in Octave) not detected; ellipses will not be drawn.');
end

% #########
% helper functions

% plot cloud points
function plot_pointcloud (xy, str)
  plot (xy(:,1), xy(:,2), 'o', ...
    'color',[0.5 0.5 0.5], 'markersize', 3, 'markerfacecolor', 'auto');
  axis equal
  box off
  set (gca, 'xaxislocation', 'origin', 'yaxislocation', 'origin');

  cf  = corr (xy(:,1), xy(:,2));
  dc  = dcov (xy(:,1), xy(:,2));
  str = sprintf ('%s\nCorr. coeff.: %.2f\nDist. corr.: %.2f\n', str, cf, dc);
  title (str);
end

% plot scaled PCA vectors
function plot_pca (xy, flag)
  [T, mean_xy, P, S, l] = calibratePCA (xy, 2);

  ab    = sqrt (l).';  % ellipse semi-axes: std. dev. along each PC
  theta = rad2deg (atan2 (P(2,1), P(1,1)));  % orientation of the first PC, in degrees
  P     = P .* ab;     % scale each PC (column of P) by its semi-axis length

  h = addarrows (mean_xy, P, 'r');
  set (h, 'linewidth', 3, 'maxheadsize', 0.2);
  set (h(2), 'color', 'm');

  if flag
    hold on
    drawEllipse ([mean_xy ab theta], 'color', 'c', 'linewidth', 3)
    hold off
  end
end
% end helper functions
% #########
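
The helper plot_pca relies on calibratePCA, which is not defined in this script. As a rough sketch of what such a routine presumably provides (an assumption, not the original implementation), the principal directions and variances of a 2D cloud can be read off the eigendecomposition of its sample covariance. All names ending in _demo are hypothetical and exist only for this illustration.

```octave
% Sketch only: eigendecomposition-based PCA of an n-by-2 cloud.
% Assumed to be equivalent to what calibratePCA returns; not its actual code.
xy_demo   = [2 0; 0 1; -2 0; 0 -1];     % toy cloud, elongated along x
mean_demo = mean (xy_demo, 1);           % centre of the cloud (1-by-2)
K_demo    = cov (xy_demo);               % 2-by-2 sample covariance
[V, D]    = eig (K_demo);                % columns of V: principal directions
[l_demo, idx] = sort (diag (D), 'descend');   % variances, largest first
P_demo    = V(:, idx);                   % directions in the same order
ab_demo   = sqrt (l_demo).';             % std. dev. along each direction
theta_demo = rad2deg (atan2 (P_demo(2,1), P_demo(1,1)));  % angle of 1st PC
```

The quantities ab_demo and theta_demo play the role of ab and theta in plot_pca: they are the semi-axes and the orientation of the ellipse drawn around the cloud. The sign of each column of P_demo is arbitrary, so theta_demo is only defined up to 180 degrees.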

Point-cloud parameters

n = 1e3;                      % Number of samples in the clouds

Independent and Uncorrelated point-cloud

$$ x \sim \mathcal{N}(0, 1) $$ $$ y \sim \mathcal{N}(0, 1) $$

case_str = 'independent-uncorrelated';

xy_iu = randn (n, 2);
xy_iu = zscore (xy_iu);

figure (1)
  clf
  plot_pointcloud (xy_iu, case_str)
  plot_pca (xy_iu, hasgeometry)
Example_2DPointclouds_PCA-1.png

Independent and Correlated point-cloud

$$ \begin{bmatrix} x & y \end{bmatrix} \sim \mathcal{N}(0, K) $$

case_str = 'independent-correlated';

K_ic  = [1 -0.8; -0.8 1];          % Covariance matrix
xy_ic = mvnrnd ([0 0], K_ic, n);   % Samples
xy_ic = zscore (xy_ic);

figure (2)
  clf
  plot_pointcloud (xy_ic, case_str)
  plot_pca (xy_ic, hasgeometry)
  if hasgeometry
    ellipse = cov2ellipse (K_ic);
    hold on
    drawEllipse (ellipse, 'color', 'b', 'linewidth', 1);
    hold off
  end
Example_2DPointclouds_PCA-2.png
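
Because the cloud is standardized and K_ic has unit diagonal, the PCA result can be anticipated in closed form: a covariance of the form [1 rho; rho 1] has eigenvalues 1 + rho and 1 - rho, with eigenvectors along the two diagonals. The snippet below is a small numerical check of that statement under these assumptions; the names rho, K_rho, l_rho, P_rho and ab_rho are introduced here for illustration only.

```octave
% Sketch: PCA of a unit-variance 2D covariance with correlation rho.
rho    = -0.8;                       % off-diagonal entry of K_ic
K_rho  = [1 rho; rho 1];
[V, D] = eig (K_rho);
[l_rho, idx] = sort (diag (D), 'descend');    % variances, largest first
P_rho  = V(:, idx);                  % principal directions (columns)
l_expected = [1 + abs(rho); 1 - abs(rho)];    % closed-form eigenvalues, sorted
ab_rho = sqrt (l_rho).';             % semi-axes of the 1-sigma ellipse
```

For a negative rho the first principal direction lies along the anti-diagonal (1, -1), which is the long axis of the ellipse. The sample-based arrows from plot_pca should approximate these values, since the sample covariance only estimates K_ic.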

Dependent and Correlated point-cloud

$$ x \sim \mathcal{U}(-1, 1) $$ $$ y = a x + \epsilon \quad \epsilon \sim \mathcal{N}(0, \sigma^2) $$

case_str = 'dependent-correlated, linear';

xy_dc      = zeros (n, 2);
xy_dc(:,1) = 2 * rand (n, 1) - 1;                     % x ~ U(-1, 1)
% a = -0.5, sigma = 0.05; the noise reuses the standardized samples xy_iu(:,2)
xy_dc(:,2) = -0.5 * xy_dc(:,1) + 0.05 * xy_iu(:,2);
xy_dc      = zscore (xy_dc);

figure (3)
  clf
  plot_pointcloud (xy_dc, case_str)
  plot_pca (xy_dc, hasgeometry)
Example_2DPointclouds_PCA-3.png
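
As a rough check on the correlation coefficient shown in the figure title, assume the noise is independent of x with standard deviation 0.05 and take a = -0.5, the values used in the code above. For x uniform on (-1, 1):

$$ \mathrm{var}(x) = \frac{1}{3}, \quad \mathrm{cov}(x, y) = \frac{a}{3}, \quad \mathrm{var}(y) = \frac{a^2}{3} + \sigma^2 $$

$$ \rho = \frac{\mathrm{cov}(x, y)}{\sqrt{\mathrm{var}(x)\,\mathrm{var}(y)}} = \frac{a}{\sqrt{a^2 + 3\sigma^2}} = \frac{-0.5}{\sqrt{0.2575}} \approx -0.985 $$

The sample value should be close to this, up to sampling fluctuations. Standardizing with zscore does not change the correlation coefficient, since both columns are only shifted and rescaled by positive factors.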

Dependent and Uncorrelated point-cloud

$$ x \sim \mathcal{U}(-1, 1) $$ $$ y = a \frac{\epsilon}{\vert\epsilon\vert} x^2 + \epsilon \quad \epsilon \sim \mathcal{U}(-1, 1) $$

case_str = 'dependent-uncorrelated';

xy_du      = 2 * rand (n, 2) - 1;     % columns: x and eps, both U(-1, 1)
% y = a * sign(eps) .* x.^2 + eps, with a = 6
xy_du(:,2) = 6 * sign (xy_du(:,2)) .* xy_du(:,1).^2  + xy_du(:,2);
xy_du      = zscore (xy_du);

figure (4)
  clf
  plot_pointcloud (xy_du, case_str)
  plot_pca (xy_du, hasgeometry);
Example_2DPointclouds_PCA-4.png
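
Why this cloud is uncorrelated although y clearly depends on x can be seen with a short calculation. Write s for the sign of the noise and p for the power of x in the model (p = 2 in the code above), and assume x and the noise are independent and symmetric about zero:

$$ s = \frac{\epsilon}{\vert\epsilon\vert}, \quad y = a\, s\, x^p + \epsilon, \quad \mathrm{E}[x] = \mathrm{E}[\epsilon] = \mathrm{E}[s] = 0 $$

$$ \mathrm{cov}(x, y) = \mathrm{E}[x y] - \mathrm{E}[x]\,\mathrm{E}[y] = a\, \mathrm{E}[s]\, \mathrm{E}[x^{p+1}] + \mathrm{E}[x]\, \mathrm{E}[\epsilon] = 0 $$

The covariance vanishes for any p because the random sign averages out, yet the distribution of y given x still changes with x through the term a s x^p. PCA only sees the covariance, so it cannot detect this dependence; the distance correlation in the figure title is intended to expose it.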

Dependent and Correlated point-cloud (nonlinear relation, bimodal)

case_str = 'dependent-correlated, nonlinear, bimodal';

phi    = pi / 4;
xy_dc2 = xy_du * [cos(phi) -sin(phi); sin(phi) cos(phi)];
dxy    = 2*[1 1];
cloud1 = xy_du(:,2) < 0;
cloud2 = ~cloud1;
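% Explicit assignments instead of += and -=, which MATLAB does not support
xy_dc2(cloud1, :) = xy_dc2(cloud1, :) + dxy;
xy_dc2(cloud2, :) = xy_dc2(cloud2, :) - dxy;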
xy_dc2 = zscore (xy_dc2);

figure (5)
  clf
  plot_pointcloud (xy_dc2, case_str)
  plot_pca (xy_dc2, hasgeometry);
Example_2DPointclouds_PCA-5.png