MATLAB Example: PCA Dimensionality Reduction


Author: 凯鲁嘎吉 - 博客园 http://www.cnblogs.com/kailugaji/

1. Iris Data

5.1,3.5,1.4,0.2,1
4.9,3.0,1.4,0.2,1
4.7,3.2,1.3,0.2,1
4.6,3.1,1.5,0.2,1
5.0,3.6,1.4,0.2,1
5.4,3.9,1.7,0.4,1
4.6,3.4,1.4,0.3,1
5.0,3.4,1.5,0.2,1
4.4,2.9,1.4,0.2,1
4.9,3.1,1.5,0.1,1
5.4,3.7,1.5,0.2,1
4.8,3.4,1.6,0.2,1
4.8,3.0,1.4,0.1,1
4.3,3.0,1.1,0.1,1
5.8,4.0,1.2,0.2,1
5.7,4.4,1.5,0.4,1
5.4,3.9,1.3,0.4,1
5.1,3.5,1.4,0.3,1
5.7,3.8,1.7,0.3,1
5.1,3.8,1.5,0.3,1
5.4,3.4,1.7,0.2,1
5.1,3.7,1.5,0.4,1
4.6,3.6,1.0,0.2,1
5.1,3.3,1.7,0.5,1
4.8,3.4,1.9,0.2,1
5.0,3.0,1.6,0.2,1
5.0,3.4,1.6,0.4,1
5.2,3.5,1.5,0.2,1
5.2,3.4,1.4,0.2,1
4.7,3.2,1.6,0.2,1
4.8,3.1,1.6,0.2,1
5.4,3.4,1.5,0.4,1
5.2,4.1,1.5,0.1,1
5.5,4.2,1.4,0.2,1
4.9,3.1,1.5,0.1,1
5.0,3.2,1.2,0.2,1
5.5,3.5,1.3,0.2,1
4.9,3.1,1.5,0.1,1
4.4,3.0,1.3,0.2,1
5.1,3.4,1.5,0.2,1
5.0,3.5,1.3,0.3,1
4.5,2.3,1.3,0.3,1
4.4,3.2,1.3,0.2,1
5.0,3.5,1.6,0.6,1
5.1,3.8,1.9,0.4,1
4.8,3.0,1.4,0.3,1
5.1,3.8,1.6,0.2,1
4.6,3.2,1.4,0.2,1
5.3,3.7,1.5,0.2,1
5.0,3.3,1.4,0.2,1
7.0,3.2,4.7,1.4,2
6.4,3.2,4.5,1.5,2
6.9,3.1,4.9,1.5,2
5.5,2.3,4.0,1.3,2
6.5,2.8,4.6,1.5,2
5.7,2.8,4.5,1.3,2
6.3,3.3,4.7,1.6,2
4.9,2.4,3.3,1.0,2
6.6,2.9,4.6,1.3,2
5.2,2.7,3.9,1.4,2
5.0,2.0,3.5,1.0,2
5.9,3.0,4.2,1.5,2
6.0,2.2,4.0,1.0,2
6.1,2.9,4.7,1.4,2
5.6,2.9,3.6,1.3,2
6.7,3.1,4.4,1.4,2
5.6,3.0,4.5,1.5,2
5.8,2.7,4.1,1.0,2
6.2,2.2,4.5,1.5,2
5.6,2.5,3.9,1.1,2
5.9,3.2,4.8,1.8,2
6.1,2.8,4.0,1.3,2
6.3,2.5,4.9,1.5,2
6.1,2.8,4.7,1.2,2
6.4,2.9,4.3,1.3,2
6.6,3.0,4.4,1.4,2
6.8,2.8,4.8,1.4,2
6.7,3.0,5.0,1.7,2
6.0,2.9,4.5,1.5,2
5.7,2.6,3.5,1.0,2
5.5,2.4,3.8,1.1,2
5.5,2.4,3.7,1.0,2
5.8,2.7,3.9,1.2,2
6.0,2.7,5.1,1.6,2
5.4,3.0,4.5,1.5,2
6.0,3.4,4.5,1.6,2
6.7,3.1,4.7,1.5,2
6.3,2.3,4.4,1.3,2
5.6,3.0,4.1,1.3,2
5.5,2.5,4.0,1.3,2
5.5,2.6,4.4,1.2,2
6.1,3.0,4.6,1.4,2
5.8,2.6,4.0,1.2,2
5.0,2.3,3.3,1.0,2
5.6,2.7,4.2,1.3,2
5.7,3.0,4.2,1.2,2
5.7,2.9,4.2,1.3,2
6.2,2.9,4.3,1.3,2
5.1,2.5,3.0,1.1,2
5.7,2.8,4.1,1.3,2
6.3,3.3,6.0,2.5,3
5.8,2.7,5.1,1.9,3
7.1,3.0,5.9,2.1,3
6.3,2.9,5.6,1.8,3
6.5,3.0,5.8,2.2,3
7.6,3.0,6.6,2.1,3
4.9,2.5,4.5,1.7,3
7.3,2.9,6.3,1.8,3
6.7,2.5,5.8,1.8,3
7.2,3.6,6.1,2.5,3
6.5,3.2,5.1,2.0,3
6.4,2.7,5.3,1.9,3
6.8,3.0,5.5,2.1,3
5.7,2.5,5.0,2.0,3
5.8,2.8,5.1,2.4,3
6.4,3.2,5.3,2.3,3
6.5,3.0,5.5,1.8,3
7.7,3.8,6.7,2.2,3
7.7,2.6,6.9,2.3,3
6.0,2.2,5.0,1.5,3
6.9,3.2,5.7,2.3,3
5.6,2.8,4.9,2.0,3
7.7,2.8,6.7,2.0,3
6.3,2.7,4.9,1.8,3
6.7,3.3,5.7,2.1,3
7.2,3.2,6.0,1.8,3
6.2,2.8,4.8,1.8,3
6.1,3.0,4.9,1.8,3
6.4,2.8,5.6,2.1,3
7.2,3.0,5.8,1.6,3
7.4,2.8,6.1,1.9,3
7.9,3.8,6.4,2.0,3
6.4,2.8,5.6,2.2,3
6.3,2.8,5.1,1.5,3
6.1,2.6,5.6,1.4,3
7.7,3.0,6.1,2.3,3
6.3,3.4,5.6,2.4,3
6.4,3.1,5.5,1.8,3
6.0,3.0,4.8,1.8,3
6.9,3.1,5.4,2.1,3
6.7,3.1,5.6,2.4,3
6.9,3.1,5.1,2.3,3
5.8,2.7,5.1,1.9,3
6.8,3.2,5.9,2.3,3
6.7,3.3,5.7,2.5,3
6.7,3.0,5.2,2.3,3
6.3,2.5,5.0,1.9,3
6.5,3.0,5.2,2.0,3
6.2,3.4,5.4,2.3,3
5.9,3.0,5.1,1.8,3

2. MATLAB Program

function [COEFF,SCORE,latent,tsquared,explained,mu,data_PCA]=pca_demo()
x=load('iris.data');
[~,d]=size(x);
k=d-1; % keep the first k principal components
x=zscore(x(:,1:d-1)); % standardize the features (drop the label column)
[COEFF,SCORE,latent,tsquared,explained,mu]=pca(x);
% What pca does internally:
% 1) Take the sample matrix X, one sample per row, one feature per column.
% 2) Center the data: S = X with the mean of each column subtracted.
% 3) Compute the covariance matrix C = cov(S).
% 4) Eigendecompose the covariance matrix: [P,Lambda] = eig(C).
% 5) Done.
% Inputs and outputs:
% 1. The input X is an n-by-p matrix: each row is an observation, each column an attribute (feature).
% 2. COEFF is the p-by-p matrix of eigenvectors (of the covariance matrix); each
%    column is one principal-component vector, and the columns are sorted by
%    descending eigenvalue. To keep only the first k principal components, take COEFF(:,1:k).
% 3. SCORE is the projection of the data onto the principal-component vectors. Note
%    that it is the CENTERED data that is projected: SCORE = x0*COEFF, where x0 is X
%    with the column means subtracted (centering is per feature, not per sample),
%    so the mean must be added back when reconstructing.
% 4. latent is a column vector of the eigenvalues, in descending order.
% 5. tsquared is Hotelling's T-squared statistic for each observation of X.
% 6. explained is the percentage of the total variance explained by each principal component.
% 7. mu is the estimated mean of each variable of X.
x=bsxfun(@minus,x,mean(x,1));
data_PCA=x*COEFF(:,1:k);
latent1=100*latent/sum(latent); % rescale latent to sum to 100, so each entry is a contribution rate
pareto(latent1); % pareto plots only the first 95% of the cumulative distribution, so some entries of latent1 may not appear
xlabel('Principal Component');
ylabel('Variance Explained (%)');
% the line in the figure shows the cumulative variance explained
print(gcf,'-dpng','Iris PCA.png');
iris_pac=data_PCA(:,1:2);
save iris_pca iris_pac
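The steps listed in the comments above (center, covariance, eigendecomposition, project) can be sketched in NumPy for readers without MATLAB. The variable names mirror pca's outputs, but the data is a random stand-in for the standardized iris features, so this is illustrative only:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(150, 4))          # stand-in for the z-scored iris features

# Center each column (MATLAB's pca centers by default; mu holds the column means)
mu = X.mean(axis=0)
X0 = X - mu

# Covariance matrix and its eigendecomposition (eigh: C is symmetric)
C = np.cov(X0, rowvar=False)           # p x p
eigval, eigvec = np.linalg.eigh(C)

# Sort by descending eigenvalue: columns of COEFF, entries of latent
order = np.argsort(eigval)[::-1]
latent = eigval[order]
COEFF = eigvec[:, order]

# SCORE is the CENTERED data projected onto the eigenvectors
SCORE = X0 @ COEFF

# Percentage of total variance explained by each component
explained = 100 * latent / latent.sum()
```

The variance of each SCORE column equals the corresponding eigenvalue, which is exactly why latent doubles as a per-component variance budget.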

3. Results

iris_pca: the first two principal components

-2.25698063306803 0.504015404227653
-2.07945911889541       -0.653216393612590
-2.36004408158421       -0.317413944570283
-2.29650366000389       -0.573446612971233
-2.38080158645275       0.672514410791076
-2.06362347633724       1.51347826673567
-2.43754533573242       0.0743137171331950
-2.22638326740708       0.246787171742162
-2.33413809644009       -1.09148977019584
-2.18136796941948       -0.447131117450110
-2.15626287481026       1.06702095645556
-2.31960685513084       0.158057945820095
-2.21665671559727       -0.706750478104682
-2.63090249246321       -0.935149145374822
-2.18497164997156       1.88366804891533
-2.24394778052703       2.71328133141014
-2.19539570001472       1.50869601039751
-2.18286635818774       0.512587093716441
-1.88775015418968       1.42633236069007
-2.33213619695782       1.15416686250116
-1.90816386828207       0.429027879924458
-2.19728429051438       0.949277150423224
-2.76490709741649       0.487882574439700
-1.81433337754274       0.106394361814184
-2.22077768737273       0.161644638073716
-1.95048968523510       -0.605862870440206
-2.04521166172712       0.265126114804279
-2.16095425532709       0.550173363315497
-2.13315967968331       0.335516397664229
-2.26121491382610       -0.313827252316662
-2.13739396044139       -0.482326258880086
-1.82582143036022       0.443780130732953
-2.59949431958629       1.82237008322707
-2.42981076672382       2.17809479520796
-2.18136796941948       -0.447131117450110
-2.20373717203888       -0.183722323644913
-2.03759040170113       0.682669420156327
-2.18136796941948       -0.447131117450110
-2.42781878392261       -0.879223932713649
-2.16329994558551       0.291749566745466
-2.27889273592867       0.466429134628597
-1.86545776627869       -2.31991965918865
-2.54929404704891       -0.452301129580194
-1.95772074352968       0.495730895348582
-2.12624969840005       1.16752080832811
-2.06842816583668       -0.689607099127106
-2.37330741591874       1.14679073709691
-2.39018434748641       -0.361180775489047
-2.21934619663183       1.02205856145225
-2.19858869176329       0.0321302060908945
1.10030752013391        0.860230593245533
0.730035752246062       0.596636784545418
1.23796221659453        0.612769614333371
0.395980710562889       -1.75229858398514
1.06901265623960        -0.211050862633647
0.383174475987114       -0.589088965722193
0.746215185580377       0.776098608766709
-0.496201068006129      -1.84269556949638
0.923129796737431       0.0302295549588077
0.00495143780650871     -1.02596403732389
-0.124281108093219      -2.64918765259090
0.437265238506424       -0.0586846858581760
0.549792126592992       -1.76666307900171
0.714770518429262       -0.184815166484382
-0.0371339806719297     -0.431350035919633
0.872966018474250       0.508295314415273
0.346844440799832       -0.189985178614466
0.152880381053472       -0.788085297090142
1.21124542423444        -1.62790202112846
0.156417163578196       -1.29875232891050
0.735791135537219       0.401126570248885
0.470792483676532       -0.415217206131680
1.22388807504403        -0.937773165086814
0.627279600231826       -0.415419947028686
0.698133985336190       -0.0632819273014206
0.870620328215835       0.249871517845242
1.25003445866275        -0.0823442389434431
1.35370481019450        0.327722365822153
0.659915359649250       -0.223597000167979
-0.0471236447211597     -1.05368247816741
0.121128417400412       -1.55837168956507
0.0140710866007487      -1.56813894313840
0.235222818975321       -0.773333046281646
1.05316323317206        -0.634774729305402
0.220677797156699       -0.279909968621073
0.430341476713787       0.852281697154445
1.04590946111265        0.520453696157683
1.03241950881290        -1.38781716762055
0.0668436673617666      -0.211910813930204
0.274505447436587       -1.32537578085168
0.271425764670620       -1.11570381243558
0.621089830946741       0.0274506709978046
0.328903506457842       -0.985598883763833
-0.372380114621411      -2.01119457605980
0.281999617970590       -0.851099454545845
0.0887557702224096      -0.174324544331148
0.223607676665854       -0.379214256409087
0.571967341693057       -0.153206717308028
-0.455486948803962      -1.53432438068788
0.251402252309636       -0.593871222060355
1.84150338645482        0.868786147264828
1.14933941416981        -0.698984450845645
2.19898270027627        0.552618780551384
1.43388176486790        -0.0498435417617587
1.86165398830779        0.290220535935809
2.74500070081969        0.785799704159685
0.357177895625210       -1.55488557249365
2.29531637451915        0.408149356863061
1.99505169024551        -0.721448439846371
2.25998344407884        1.91502747107928
1.36134878398531        0.691631011499905
1.59372545693795        -0.426818952656741
1.87796051113409        0.412949339203311
1.24890257443547        -1.16349352357816
1.45917315700813        -0.442664601834978
1.58649439864337        0.674774813132046
1.46636772102851        0.252347085727036
2.42924030093571        2.54822056527013
3.29809226641255        -0.00235343587272177
1.24979406018816        -1.71184899071237
2.03368323142868        0.904369044486726
0.970663302005081       -0.569267277965818
2.88838806680663        0.396463170625287
1.32475563655861        -0.485135293486995
1.69855040646181        1.01076227706927
1.95119099025002        0.999984474306318
1.16799162725452        -0.317831851008113
1.01637609822602        0.0653241212065782
1.78004554289349        -0.192627479858818
1.85855159177699        0.553527164026207
2.42736549094542        0.245830911619345
2.30834922706014        2.61741528404554
1.85415981777379        -0.184055790370030
1.10756129219332        -0.294997832217552
1.19347091639304        -0.814439294423699
2.79159729280499        0.841927657717863
1.57487925633390        1.06889360300461
1.34254676764379        0.420846092290459
0.920349720485088       0.0191661621187343
1.84736314547313        0.670177571688802
2.00942543830962        0.608358978317639
1.89676252747561        0.683734258412757
1.14933941416981        -0.698984450845645
2.03648602144585        0.861797777652503
1.99500750598298        1.04504903502442
1.86427657131500        0.381543630923962
1.55328823048458        -0.902290843047121
1.51576710303099        0.265903772450991
1.37179554779330        1.01296839034343
0.956095566421630       -0.0222095406309480

Cumulative contribution rates

(Figure: Pareto chart of the variance explained by each principal component, with the cumulative-contribution line, as saved to Iris PCA.png)

As the chart shows, the first two principal components already account for about 95% of the total variance, so these two components can approximately represent the whole dataset.
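Checking how many components reach a target coverage is just a cumulative sum over explained. A minimal sketch (the eigenvalues below are hypothetical, not the actual iris values):

```python
import numpy as np

latent = np.array([2.91, 0.91, 0.15, 0.03])   # hypothetical eigenvalues, descending
explained = 100 * latent / latent.sum()       # per-component contribution (%)
cumulative = np.cumsum(explained)             # running total

# smallest number of components whose cumulative contribution reaches 95%
k = int(np.searchsorted(cumulative, 95) + 1)
print(k)  # -> 2
```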

4. pca_data.m

The helper normlization.m is given in the post MATLAB实例:聚类初始化方法与数据归一化方法 (clustering initialization and data normalization methods).

function data=pca_data(data, choose)
% PCA dimensionality reduction, keeping 90% of the variance
data = normlization(data, choose); % normalize
score = 0.90; % fraction of the variance to keep
[num,dim] = size(data);
xbar = mean(data,1);
means = bsxfun(@minus, data, xbar); % center the columns
C = means'*means/num; % covariance matrix (named C to avoid shadowing the built-in cov)
[V,D] = eig(C);
eigval = diag(D);
[~,idx] = sort(eigval,'descend');
eigval = eigval(idx);
V = V(:,idx); % eigenvectors are the COLUMNS of V, so reorder columns, not rows
p = 0;
for i=1:dim
   perc = sum(eigval(1:i))/sum(eigval);
   if perc > score
       p = i;
       break;
   end
end
E = V(:,1:p); % the first p principal-component directions
data = means*E; % project the centered data onto them
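A NumPy equivalent of pca_data, under the same conventions (biased covariance divided by num, eigenvectors as columns, 90% variance threshold); the normalization step is omitted since normlization.m is defined in the other post, and the test data is synthetic:

```python
import numpy as np

def pca_data(data, score=0.90):
    """Project data onto the fewest components whose cumulative variance share reaches score."""
    num, dim = data.shape
    means = data - data.mean(axis=0)        # center the columns
    C = means.T @ means / num               # covariance (biased, matching the MATLAB code)
    eigval, V = np.linalg.eigh(C)
    idx = np.argsort(eigval)[::-1]          # descending eigenvalues
    eigval, V = eigval[idx], V[:, idx]      # eigenvectors are the COLUMNS of V
    perc = np.cumsum(eigval) / eigval.sum()
    p = int(np.searchsorted(perc, score) + 1)
    return means @ V[:, :p]                 # num x p projection

# Synthetic data where two directions carry nearly all the variance
rng = np.random.default_rng(1)
X = rng.normal(size=(100, 5)) * np.array([5.0, 3.0, 1.0, 0.2, 0.1])
Y = pca_data(X)
```

Because the columns of X have variances roughly 25, 9, 1, 0.04, and 0.01, the first two components already exceed the 90% threshold, so Y keeps only two columns.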

References:

Junhao Hua. Distributed Variational Bayesian Algorithms. Github, 2017.

MATLAB实例:PCA(主成分分析)详解