homework5_ZhankunLuo


Zhankun Luo

PUID: 0031195279

Email: luo333@pnw.edu

Fall-2018-ECE-59500-009

Instructor: Toma Hentea

Table of Contents

  • Homework 5
    • Function
      • plot_point.m
      • Calc_SwSbSm.m
      • J3.m
      • FDR.m
    • Problem
      • Problem 5.1
        • Proof of $E[z] = \lVert \mu \rVert^2$
        • Proof of $\sigma_{z}^2 = \lVert \mu \rVert^2$
        • Proof of the probability of error $P_e$
          • Gram–Schmidt process
          • Preparing work
          • Calculate Probability
          • Calculate $P_e$
    • Computer Experiment
      • Experiment 5.2
        • meaning of S_w, S_b, S_m, J3
        • code: experiment5_2.m
        • result
        • conclusion
      • Experiment 5.4
        • Theory of Fisher’s Linear Discriminant
        • code: experiment5_4.m
        • result
        • conclusion

Homework 5

Function

plot_point.m

```matlab
function plot_point(X, y)
% This function can handle up to 6 different classes
[l, N] = size(X);  % N = no. of data vectors, l = dimensionality
if (l ~= 2)
    fprintf('NO PLOT CAN BE GENERATED\n')
    return
else
    pale = ['ro'; 'g+'; 'b.'; 'y.'; 'm.'; 'c.'];
    % Plot of the data vectors
    hold on
    for i = 1:N
        plot(X(1, i), X(2, i), pale(y(i), :))
    end
    hold off
end
```

Calc_SwSbSm.m

```matlab
function [ S_w, S_b, S_m ] = Calc_SwSbSm( X, y )
% [ S_w, S_b, S_m ] = Calc_SwSbSm( X, y )
% Calculate S_w, S_b, S_m
% OUTPUT:
%   S_w: the within-class scatter matrix
%   S_b: the between-class scatter matrix
%   S_m: the mixture scatter matrix, S_m = S_w + S_b
c = max(y);        % number of classes
[l, N] = size(X);  % N: number of vectors, l: dimensions
mu = zeros(l, c);
S_w = zeros(l, l);
S_b = zeros(l, l);
mu_0 = zeros(l, 1);
P = zeros(1, c);
for i = 1:c
    index_class_i = find(y == i);
    Mu = sum(X(:, index_class_i), 2) / length(index_class_i);
    mu(:, i) = Mu;
    mu_0 = mu_0 + sum(X(:, index_class_i), 2) / N;
    P(i) = length(index_class_i) / N;
    X_relative = X(:, index_class_i) - repmat(Mu, 1, length(index_class_i));
    S_wi = zeros(l, l);
    for j = 1:length(index_class_i)
        S_wi = S_wi + X_relative(:, j) * X_relative(:, j)';
    end
    S_w = S_w + S_wi / N;
end
for i = 1:c
    S_b = S_b + P(i) * (mu(:, i) - mu_0) * (mu(:, i) - mu_0)';
end
S_m = S_w + S_b;
```

J3.m

```matlab
function J3 = J3(S_w, S_m)
J3 = trace(S_w \ S_m);
end
```

FDR.m

```matlab
function [ FDR, w ] = FDR( X, y, D_y )
% function [ FDR, w ] = FDR( X, y, D_y )
% Fisher's Discriminant Ratio
% INPUT:
%   X:   points
%   y:   y == i ==> belongs to Class i
%   D_y: dimension of w, i.e. how many features Z_i = w_i'* X are kept
% OUTPUT:
%   FDR: trace((w'* S_w * w) \ (w'* S_b * w))
%   w:   projection matrix; Z = w'* X maximizes the FDR of X
[ S_w, S_b, S_m ] = Calc_SwSbSm( X, y );
[ Vector, Diag ] = eig( S_w \ S_b );
[~, order] = sort(diag(Diag), 'descend');  % sort explicitly: eig does not
Vector = Vector(:, order);                 % guarantee any eigenvalue order
w = Vector(:, 1:D_y);  % select the D_y eigenvectors with the largest eigenvalues
FDR = trace((w'* S_w * w) \ (w'* S_b * w));
end
```

Problem

Problem 5.1

Both classes, $\omega_1$ and $\omega_2$, are described by Gaussian distributions with the same covariance matrix $\Sigma = I$, where $I$ is the identity matrix, and with mean values $\mu$ and $-\mu$, respectively, where:

$$\mu = [\mu_1, \ldots, \mu_l]^T = \left[1, \frac{1}{\sqrt{2}}, \ldots, \frac{1}{\sqrt{l}}\right]^T, \qquad b_l = \lVert \mu \rVert = \sqrt{\sum_{i=1}^{l}\mu_i^2}$$

Define:

$$z \equiv x^T\mu, \quad \text{where} \quad x = [x_1, \ldots, x_l]^T$$

Proof of $E[z] = \lVert \mu \rVert^2$

$$\text{For } x \in \omega_1: \quad E[z] = E[x^T\mu] = E[x^T]\mu = \mu^T\mu = \lVert \mu \rVert^2$$
$$\text{For } x \in \omega_2: \quad E[z] = E[x^T\mu] = E[x^T]\mu = -\mu^T\mu = -\lVert \mu \rVert^2$$

Proof of $\sigma_{z}^2 = \lVert \mu \rVert^2$

For $x \in \omega_1$ (here $\Sigma = I$):

$$\sigma_{z}^2 = E[(z - E[z])^2] = E\left[\left((x-\mu)^T\mu\right)^2\right] = E\left[\mu^T(x-\mu)(x-\mu)^T\mu\right] = \mu^T \Sigma \mu = \mu^T\mu = \lVert \mu \rVert^2$$

For $x \in \omega_2$:

$$\sigma_{z}^2 = E\left[\left((x+\mu)^T\mu\right)^2\right] = \mu^T \Sigma \mu = \lVert \mu \rVert^2$$

This is consistent with the $P_e$ derivation below: there $z = b_l y_1$ with $y_1 \sim N(b_l, 1)$, so $\sigma_z^2 = b_l^2 = \lVert \mu \rVert^2$.
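A quick Monte Carlo check of both moments (a sketch; the choices $l = 5$ and $10^5$ samples are illustrative, not from the original homework):

```matlab
% Estimate E[z] and var(z) for z = x' * mu with x ~ N(mu, I) (class omega_1).
randn('seed', 0);               % same seeding style as the experiments below
l = 5;
mu = 1 ./ sqrt((1:l)');         % mu_i = 1 / sqrt(i)
X = mvnrnd(mu', eye(l), 1e5)';  % 10^5 samples of x, one per column
z = X' * mu;
fprintf('mean(z) = %.4f   (||mu||^2 = %.4f)\n', mean(z), norm(mu)^2);
fprintf('var(z)  = %.4f   (||mu||^2 = %.4f)\n', var(z), norm(mu)^2);
```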

Proof of the probability of error $P_e$

Gram–Schmidt process

Set:
$$\alpha_1 = \mu, \quad \alpha_2 = [0, 1, 0, \ldots, 0]^T, \quad \alpha_3 = [0, 0, 1, 0, \ldots, 0]^T, \quad \ldots, \quad \alpha_l = [0, \ldots, 0, 1]^T$$

These vectors are linearly independent, because the first component of $\mu$ is $1 \neq 0$.
Then:
$$\beta_1 = \alpha_1 = \mu$$
$$\beta_2 = \alpha_2 - \frac{\langle \alpha_2, \beta_1 \rangle}{\langle \beta_1, \beta_1 \rangle}\beta_1$$
$$\vdots$$
$$\beta_i = \alpha_i - \frac{\langle \alpha_i, \beta_1 \rangle}{\langle \beta_1, \beta_1 \rangle}\beta_1 - \cdots - \frac{\langle \alpha_i, \beta_{i-1} \rangle}{\langle \beta_{i-1}, \beta_{i-1} \rangle}\beta_{i-1} \quad (i = 2, \ldots, l)$$
Normalize:
$$e_1 = \frac{\beta_1}{\lVert \mu \rVert} = \frac{\mu}{b_l}, \qquad e_i = \frac{\beta_i}{\lVert \beta_i \rVert} \quad (i = 1, \ldots, l)$$
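A short sketch of this construction (illustrative, with $l = 4$; not part of the original homework files): build the $e_i$ by Gram–Schmidt, then verify that $P\mu = [b_l, 0, \ldots, 0]^T$ and that $P$ is orthogonal.

```matlab
l = 4;
mu = 1 ./ sqrt((1:l)');                % mu_i = 1 / sqrt(i)
A = [mu, [zeros(1, l-1); eye(l-1)]];   % columns: alpha_1 = mu, alpha_2, ..., alpha_l
E = zeros(l, l);
for i = 1:l
    beta = A(:, i);
    for j = 1:(i-1)
        beta = beta - (E(:, j)' * A(:, i)) * E(:, j);  % subtract projections
    end
    E(:, i) = beta / norm(beta);       % normalize: e_i
end
P = E';                                % rows of P are e_1^T, ..., e_l^T
disp(P * mu)                           % expected: [b_l; 0; ...; 0], b_l = norm(mu)
disp(norm(P * P' - eye(l)))            % expected: ~0, i.e. P is orthogonal
```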

Preparing work

Set:
$$P = [e_1, \ldots, e_l]^T, \qquad \langle e_i, e_j \rangle = \begin{cases} 0 & \text{if } i \neq j \\ 1 & \text{if } i = j \end{cases}$$

$$PP^T = [e_1, \ldots, e_l]^T [e_1, \ldots, e_l] = I_l$$

$P$ is an orthogonal matrix: $P^{-1} = P^T$, $P^TP = I_l$, $|\det(P)| = 1$.
Apply $P$ to $(x - \mu)$:

$$P(x-\mu) = [e_1, e_2, \ldots, e_l]^T (x-\mu) = [e_1^Tx, \ldots, e_l^Tx]^T - [b_l, 0, \ldots, 0]^T$$

because

$$e_i^T\mu = e_i^T(b_l e_1) = \begin{cases} 0 & \text{if } i \neq 1 \\ b_l & \text{if } i = 1 \end{cases}$$

Define $y = [y_1, \ldots, y_l]^T \equiv [e_1^Tx, \ldots, e_l^Tx]^T$; thus

$$P(x-\mu) = [y_1 - b_l, y_2, \ldots, y_l]^T$$

Calculate Probability

The probability density function for $x = [x_1, \ldots, x_l]^T$ is given by (with $\Sigma = I_l$):

class $\omega_1$:

$$p(x; \mu, \Sigma) = \frac{1}{(2\pi)^{l/2}|\Sigma|^{1/2}} \exp\left(-\frac{1}{2}(x-\mu)^T \Sigma^{-1}(x-\mu)\right) = \frac{1}{(2\pi)^{l/2}} \exp\left(-\frac{1}{2}(x-\mu)^T (x-\mu)\right)$$
$$= \frac{1}{(2\pi)^{l/2}} \exp\left(-\frac{1}{2}(x-\mu)^T P^T P (x-\mu)\right) = \frac{1}{(2\pi)^{l/2}} \exp\left(-\frac{1}{2}\left[(y_1 - b_l)^2 + \sum_{i=2}^{l} y_i^2\right]\right)$$

class $\omega_2$:

$$p(x; -\mu, \Sigma) = \frac{1}{(2\pi)^{l/2}|\Sigma|^{1/2}} \exp\left(-\frac{1}{2}(x+\mu)^T \Sigma^{-1}(x+\mu)\right) = \frac{1}{(2\pi)^{l/2}} \exp\left(-\frac{1}{2}\left[(y_1 + b_l)^2 + \sum_{i=2}^{l} y_i^2\right]\right)$$
Because:
$$dy = P\,dx, \qquad dx = P^{-1}dy = P^T dy, \qquad dx_1 \cdots dx_l = |\det(P^T)|\,dy_1 \cdots dy_l = dy_1 \cdots dy_l$$
Then:

For class $\omega_1$:

$$P(z = x^T\mu < 0 \mid \omega_1) = P(z = x^T(b_l e_1) = b_l y_1 < 0 \mid \omega_1) = P(y_1 < 0 \mid \omega_1)$$
$$= \iiint_{y_1 < 0} p(x; \mu, \Sigma)\,dx_1 \cdots dx_l = \iiint_{y_1 < 0} \frac{1}{(2\pi)^{l/2}} \exp\left(-\frac{1}{2}\left[(y_1 - b_l)^2 + \sum_{i=2}^{l} y_i^2\right]\right) dy_1 \cdots dy_l$$
$$= \int_{-\infty}^{0} \frac{1}{\sqrt{2\pi}} \exp\left(-\frac{1}{2}(y_1 - b_l)^2\right) dy_1 \prod_{i=2}^{l} \int_{-\infty}^{+\infty} \frac{1}{\sqrt{2\pi}} \exp\left(-\frac{1}{2}y_i^2\right) dy_i$$
$$= \int_{-\infty}^{0} \frac{1}{\sqrt{2\pi}} \exp\left(-\frac{1}{2}(y_1 - b_l)^2\right) dy_1 = \int_{-\infty}^{-b_l} \frac{1}{\sqrt{2\pi}} \exp\left(-\frac{1}{2}Z^2\right) dZ = \int_{b_l}^{+\infty} \frac{1}{\sqrt{2\pi}} \exp\left(-\frac{1}{2}Z^2\right) dZ$$
For class $\omega_2$:

$$P(z = x^T\mu > 0 \mid \omega_2) = P(y_1 > 0 \mid \omega_2) = \iiint_{y_1 > 0} \frac{1}{(2\pi)^{l/2}} \exp\left(-\frac{1}{2}\left[(y_1 + b_l)^2 + \sum_{i=2}^{l} y_i^2\right]\right) dy_1 \cdots dy_l$$
$$= \int_{0}^{+\infty} \frac{1}{\sqrt{2\pi}} \exp\left(-\frac{1}{2}(y_1 + b_l)^2\right) dy_1 = \int_{b_l}^{+\infty} \frac{1}{\sqrt{2\pi}} \exp\left(-\frac{1}{2}Z^2\right) dZ$$

Calculate $P_e$

We have $P(\omega_1) = \frac{1}{2}$, $P(\omega_2) = \frac{1}{2}$:

$$P_e = P(\omega_1)P(z < 0 \mid \omega_1) + P(\omega_2)P(z > 0 \mid \omega_2) = \frac{1}{2}\int_{b_l}^{+\infty} \frac{1}{\sqrt{2\pi}} \exp\left(-\frac{1}{2}Z^2\right) dZ + \frac{1}{2}\int_{b_l}^{+\infty} \frac{1}{\sqrt{2\pi}} \exp\left(-\frac{1}{2}Z^2\right) dZ = \int_{b_l}^{+\infty} \frac{1}{\sqrt{2\pi}} \exp\left(-\frac{1}{2}Z^2\right) dZ$$

where $b_l = \lVert \mu \rVert$. Since $b_l^2 = \sum_{i=1}^{l} \frac{1}{i}$ is a partial sum of the divergent harmonic series, $b_l \to \infty$ and therefore $P_e \to 0$ as the dimension $l$ grows.
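A minimal numeric sketch of this result (the values of $l$ are illustrative): since $P_e = Q(b_l)$, it can be evaluated with `erfc`.

```matlab
% Evaluate P_e = Q(b_l) = 0.5 * erfc(b_l / sqrt(2)) for growing dimension l:
% the error probability shrinks toward 0 as b_l = ||mu|| grows.
for l = [1 2 5 10 50 100]
    b_l = sqrt(sum(1 ./ (1:l)));      % b_l^2 = sum_{i=1}^{l} 1/i
    P_e = 0.5 * erfc(b_l / sqrt(2));  % Gaussian upper-tail integral Q(b_l)
    fprintf('l = %4d   b_l = %.4f   P_e = %.4f\n', l, b_l, P_e);
end
```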

Computer Experiment

Experiment 5.2

meaning of S_w, S_b, S_m, J3

number of classes: $M$

number of training samples in class $\omega_i$: $n_i$

total number of training samples: $N$

feature dimension: $l$

Within Class:
$$S_w = \sum_{i=1}^{M} P_i S_{wi}, \quad \text{where} \quad S_{wi} = E[(x-\mu_i)(x-\mu_i)^T], \quad P_i = \frac{n_i}{N}$$
Between Class:
$$S_b = \sum_{i=1}^{M} P_i (\mu_i - \mu_0)(\mu_i - \mu_0)^T, \quad \text{where} \quad \mu_0 = \sum_{i=1}^{M} P_i \mu_i$$
Mixture Scatter:
$$S_m = E[(x-\mu_0)(x-\mu_0)^T] = S_w + S_b$$
Criterion $J_3$:

$$J_3 = \operatorname{trace}(S_w^{-1} S_m)$$
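A minimal sanity check of these definitions (a sketch, assuming Calc_SwSbSm.m and J3.m from the Function section are on the MATLAB path; the data are arbitrary):

```matlab
% Two random 2-D classes; confirm S_m = S_w + S_b and compute J_3.
randn('seed', 1);
X = [randn(2, 50), randn(2, 50) + 3];  % class 1 near (0,0), class 2 near (3,3)
y = [ones(1, 50), 2 * ones(1, 50)];
[S_w, S_b, S_m] = Calc_SwSbSm(X, y);
disp(norm(S_m - (S_w + S_b)))          % numerically zero by construction
disp(J3(S_w, S_m))                     % J_3 = trace(S_w \ S_m)
```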

code: experiment5_2.m

```matlab
%% Computer Experiment 5.2
close('all'); clear; clc;
m = [-10 -10 10 10; -10 10 -10 10];
m_test = [-1 -1 1 1; -1 1 -1 1];
s1 = 0.2 * [1 0; 0 1];
s2 = 3 * [1 0; 0 1];
P = [0.25 0.25 0.25 0.25]';
N = 400;  % 100 points each class, 4 classes
%% Generate the 400 vectors in X1, X1_test, X2, X2_test
randn('seed', 0);  % reproducible
for i = 1:size(m, 2)
    if i == 1
        X1 = mvnrnd(m(:, i), s1, fix(P(i)*N))';
        X1_test = mvnrnd(m_test(:, i), s1, fix(P(i)*N))';
        X2 = mvnrnd(m(:, i), s2, fix(P(i)*N))';
        X2_test = mvnrnd(m_test(:, i), s2, fix(P(i)*N))';
    else
        X1 = [X1, mvnrnd(m(:, i), s1, fix(P(1)*N))'];
        X1_test = [X1_test, mvnrnd(m_test(:, i), s1, fix(P(1)*N))'];
        X2 = [X2, mvnrnd(m(:, i), s2, fix(P(1)*N))'];
        X2_test = [X2_test, mvnrnd(m_test(:, i), s2, fix(P(1)*N))'];
    end
end
y1 = [ones(1, fix(P(1)*N)), 2 * ones(1, fix(P(2)*N)), ...
      3 * ones(1, fix(P(3)*N)), 4 * ones(1, fix(P(4)*N))];
y1_test = y1; y2 = y1; y2_test = y2;
%% Plot all situations for the 4 classes
figure; plot_point(X1, y1);
title('Classes are Far away, $${\Sigma = 0.2 I}$$', 'Interpreter', 'latex');
figure; plot_point(X1_test, y2_test);
title('Classes are Close, $${\Sigma = 0.2 I}$$', 'Interpreter', 'latex');
figure; plot_point(X2, y2);
title('Classes are Far away, $${\Sigma = 3 I}$$', 'Interpreter', 'latex');
figure; plot_point(X2_test, y2_test);
title('Classes are Close, $${\Sigma = 3 I}$$', 'Interpreter', 'latex');
%% Calculate S_w, S_b, S_m, and J3 = trace(S_w \ S_m)
[S_w, S_b, S_m] = Calc_SwSbSm(X1, y1)
J_3 = J3(S_w, S_m)
[S_w, S_b, S_m] = Calc_SwSbSm(X1_test, y1_test)
J_3 = J3(S_w, S_m)
[S_w, S_b, S_m] = Calc_SwSbSm(X2, y2)
J_3 = J3(S_w, S_m)
[S_w, S_b, S_m] = Calc_SwSbSm(X2_test, y2_test)
J_3 = J3(S_w, S_m)
```

result

```
% m = [-10 -10 10 10;
%      -10  10 -10 10];   Sigma = 0.2 I
S_w = 0.2070    0.0046
      0.0046    0.2145
S_b = 99.8278   -0.2653
      -0.2653   100.0591
S_m = 100.0348  -0.2607
      -0.2607   100.2736
J_3 = 951.3471

% m_test = [-1 -1 1 1;
%           -1  1 -1 1];  Sigma = 0.2 I
S_w = 0.2042    0.0035
      0.0035    0.1999
S_b = 1.0225    -0.0537
      -0.0537   0.9842
S_m = 1.2266    -0.0502
      -0.0502   1.1841
J_3 = 11.9440

% m = [-10 -10 10 10;
%      -10  10 -10 10];   Sigma = 3 I
S_w = 3.0501    -0.0759
      -0.0759   3.1290
S_b = 99.7179   -0.4153
      -0.4153   100.9610
S_m = 102.7680  -0.4912
      -0.4912   104.0900
J_3 = 66.9920

% m_test = [-1 -1 1 1;
%           -1  1 -1 1];  Sigma = 3 I
S_w = 2.8682    -0.0492
      -0.0492   2.9545
S_b = 1.1316    0.0025
      0.0025    1.1283
S_m = 3.9997    -0.0466
      -0.0466   4.0828
J_3 = 2.7766
```

conclusion

  • When the classes share the same covariance $\Sigma$:
    $$S_w \approx \Sigma$$

  • When the classes are far away $\Rightarrow$ $\operatorname{trace}(S_b)$ is big:
    $$\operatorname{trace}(S_b) = \frac{1}{N}\sum_{i=1}^{M} n_i \lVert \mu_i - \mu_0 \rVert^2, \quad \text{where} \quad \mu_0 = \frac{1}{N}\sum_{i=1}^{M} n_i \mu_i$$

  • For $J_3 = \operatorname{trace}(S_w^{-1} S_m)$:

    • Relationship of $\sigma$, $\operatorname{trace}(S_b)$, and $J_3$ (see the numeric sketch after this list):
      $$\Sigma = \sigma I \quad \text{and} \quad S_w \approx \Sigma = \sigma I \ \Rightarrow$$
      $$J_3 = \operatorname{trace}(S_w^{-1}S_m) \approx \operatorname{trace}\left(\frac{1}{\sigma} S_m\right) = \frac{\operatorname{trace}(S_m)}{\sigma} = \frac{\operatorname{trace}(S_w) + \operatorname{trace}(S_b)}{\sigma} \approx \frac{\sigma l + \operatorname{trace}(S_b)}{\sigma} = l + \frac{\operatorname{trace}(S_b)}{\sigma}$$

      | | $m = 10\,[-1\ {-1}\ 1\ 1;\ {-1}\ 1\ {-1}\ 1]$, $\sigma = 0.2$ | $m = [-1\ {-1}\ 1\ 1;\ {-1}\ 1\ {-1}\ 1]$, $\sigma = 0.2$ |
      |---|---|---|
      | $J_3$ | 951.3471 | 11.9440 |
      | $l + \operatorname{trace}(S_b)/\sigma$ | 1001.4 | 12.0335 |

      | | $m = 10\,[-1\ {-1}\ 1\ 1;\ {-1}\ 1\ {-1}\ 1]$, $\sigma = 3$ | $m = [-1\ {-1}\ 1\ 1;\ {-1}\ 1\ {-1}\ 1]$, $\sigma = 3$ |
      |---|---|---|
      | $J_3$ | 66.9920 | 2.7766 |
      | $l + \operatorname{trace}(S_b)/\sigma$ | 68.8930 | 2.7533 |
    • How $J_3$ changes:

      Classes are far away:
      $$\lVert \mu_i - \mu_0 \rVert \uparrow \ \Rightarrow\ \operatorname{trace}(S_b) = \frac{1}{N}\sum_{i=1}^{M} n_i \lVert \mu_i - \mu_0 \rVert^2 \uparrow \ \Rightarrow\ J_3 = \operatorname{trace}(S_w^{-1}S_m) \approx l + \frac{\operatorname{trace}(S_b)}{\sigma} \uparrow$$
      Features within every class are close:
      $$\sigma \downarrow \ \Rightarrow\ J_3 = \operatorname{trace}(S_w^{-1}S_m) \approx l + \frac{\operatorname{trace}(S_b)}{\sigma} \uparrow$$
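As referenced above, a minimal numeric sketch of the approximation $J_3 \approx l + \operatorname{trace}(S_b)/\sigma$ (assuming experiment5_2.m has just been run, so that `X1` and `y1` are in the workspace; the value 0.2 is the $\sigma$ used to generate `X1`):

```matlab
% Recompute the first column of the comparison tables above.
[S_w, S_b, S_m] = Calc_SwSbSm(X1, y1);
l = size(X1, 1);                     % feature dimension (l = 2 here)
sigma = 0.2;                         % X1 was generated with Sigma = 0.2 * I
J3_exact  = trace(S_w \ S_m);        % 951.3471 in the run above
J3_approx = l + trace(S_b) / sigma;  % 1001.4 in the run above
fprintf('J3 = %.4f, l + trace(S_b)/sigma = %.4f\n', J3_exact, J3_approx);
```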

Experiment 5.4

    FDR: Fisher’s discriminant ratio

    Theory of Fisher’s Linear Discriminant





    The columns of $w$ correspond to the largest eigenvalues of $S_W^{-1}S_B$.
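    For reference, the criterion being maximized and the resulting eigenvalue problem (a standard statement of Fisher's discriminant, consistent with the two-class derivation in the conclusion below):

    $$J(w) = \frac{w^T S_b w}{w^T S_w w}, \qquad \frac{\partial J(w)}{\partial w} = 0 \ \Rightarrow\ S_b w = J(w)\, S_w w \ \Rightarrow\ S_w^{-1} S_b\, w = J(w)\, w$$

    So the maximizing directions are eigenvectors of $S_w^{-1}S_b$, and keeping the $D_y$ eigenvectors with the largest eigenvalues (as FDR.m does) maximizes the criterion.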

    code: experiment5_4.m

```matlab
%% Computer Experiment 5.4
close('all'); clear; clc;
m = [2 2.5; 4 10];
s1 = [1 0; 0 1];
s2 = 0.25 * [1 0; 0 1];
P = [0.5 0.5]';
N = 200;  % 100 points each class, 2 classes
%% Generate the 200 vectors in X1, X2
randn('seed', 0);  % reproducible
for i = 1:size(m, 2)
    if i == 1
        X1 = mvnrnd(m(:, i), s1, fix(P(i)*N))';
        X2 = mvnrnd(m(:, i), s2, fix(P(i)*N))';
    else
        X1 = [X1, mvnrnd(m(:, i), s1, fix(P(1)*N))'];
        X2 = [X2, mvnrnd(m(:, i), s2, fix(P(1)*N))'];
    end
end
y1 = [ones(1, fix(P(1)*N)), 2 * ones(1, fix(P(2)*N))];
y2 = y1;
%% Get FDR value and w (the projection direction) for X1, X2
[S_w, S_b, S_m] = Calc_SwSbSm(X1, y1)
[FDR_1, w_1] = FDR(X1, y1, 1)  % set D_y: dimension of w = 1
S_w \ (m(:, 1) - m(:, 2))
ans = ans / sqrt(sum(ans.^2))
[S_w, S_b, S_m] = Calc_SwSbSm(X2, y2);
[FDR_2, w_2] = FDR(X2, y2, 1)
S_w \ (m(:, 1) - m(:, 2))
ans = ans / sqrt(sum(ans.^2))
%% Plot both situations for the 2 classes
f1 = figure; plot_point(X1, y1); hold on;
h1 = ezplot(@(x, y) w_1(1)*x + w_1(2)*y);
hold on;
H1 = ezplot(@(x, y) w_1(2)*x - w_1(1)*y, [-2 8 0 15]);
f2 = figure; plot_point(X2, y2); hold on;
h2 = ezplot(@(x, y) w_2(1)*x + w_2(2)*y);
hold on;
H2 = ezplot(@(x, y) w_2(2)*x - w_2(1)*y, [-4 6 0 12]);
figure(f1); title('$${\Sigma = I}$$', 'Interpreter', 'latex');
set(h1, 'Color', 'r'); set(H1, 'Color', 'g');
legend([h1 H1], 'y = wTx', 'line to be projected on', 'Location', 'Best')
figure(f2); title('$${\Sigma = 0.25 I}$$', 'Interpreter', 'latex');
set(h2, 'Color', 'r'); set(H2, 'Color', 'g');
legend([h2 H2], 'y = wTx', 'line to be projected on', 'Location', 'NorthWest')
```


    result

```
% Sigma = I
FDR_1 = 8.1336
w_1 = -0.1118
      -0.9937
S_w \ (m(:, 1) - m(:, 2)) =   % for 2 classes
      -0.6136
      -5.5634
normalized S_w \ (m(:, 1) - m(:, 2)) =
      -0.1096
      -0.9940

% Sigma = 0.25 I
FDR_2 = 38.4050
w_2 =  0.0075
      -1.0000
S_w \ (m(:, 1) - m(:, 2)) =   % for 2 classes
      -0.0019
      -25.9308
normalized S_w \ (m(:, 1) - m(:, 2)) =
      -0.0001
      -1.0000
```

    conclusion

  • Theoretically, for 2 classes:

    When $J(w)$ is maximized:
    $$\text{Differentiating} \quad J(w) = \frac{w^T S_b w}{w^T S_w w} \quad \text{and setting the gradient to zero gives}$$
    $$(w^T S_w w)\, S_b w = (w^T S_b w)\, S_w w$$
    $$\Rightarrow \quad S_w^{-1} S_b\, w = J(w)\, w, \quad J(w) = k_1$$
    $$\text{Also} \quad S_b = (m_1 - m_2)(m_1 - m_2)^T, \quad S_b w = \left((m_1 - m_2)^T w\right)(m_1 - m_2), \quad (m_1 - m_2)^T w = k_2$$
    $$\Rightarrow \quad w = \frac{k_2}{k_1}\, S_w^{-1}(m_1 - m_2)$$
    So $w$ has the same direction as $S_w^{-1}(m_1 - m_2)$ (a numeric check appears in the sketch after this list).

    Actually:

    When $\Sigma = I$, the within-class covariance is large, and $w$ and $S_w^{-1}(m_1 - m_2)$ point in the same direction.

    When $\Sigma = 0.25I$, the within-class covariance is small, and there is a visible difference between the directions of $w$ and $S_w^{-1}(m_1 - m_2)$.

    | | $w$ | $S_w^{-1}(m_1 - m_2)$ | normalized $S_w^{-1}(m_1 - m_2)$ | $J(w)$ |
    |---|---|---|---|---|
    | $\Sigma = I$ | $[-0.1118;\ -0.9937]$ | $[-0.6136;\ -5.5634]$ | $[-0.1096;\ -0.9940]$ | 8.1336 |
    | $\Sigma = 0.25I$ | $[0.0075;\ -1.0000]$ | $[-0.0019;\ -25.9308]$ | $[-0.0001;\ -1.0000]$ | 38.4050 |
  • Theoretically:
    $$\text{Having} \quad S_w \approx \Sigma = \sigma I, \qquad \text{constraint:} \quad w^T w = 1, \qquad \text{max eigenvalue of } S_b:\ \Lambda_{max}$$
    $$\text{Differentiating} \quad J(w) = \frac{w^T S_b w}{w^T S_w w} \quad \text{gives} \quad (w^T S_w w)\, S_b w = (w^T S_b w)\, S_w w$$
    $$\Rightarrow \quad \text{eigenvalue problem:} \quad J(w)\, w = S_w^{-1} S_b\, w \approx \frac{S_b}{\sigma}\, w$$
    $$\text{For 2 classes:} \quad J(w) \approx \lambda_{max}\!\left(\frac{S_b}{\sigma}\right) = \frac{\Lambda_{max}}{\sigma}$$
    When the distance between the class centers is unchanged:
    $$(\mu_i - \mu_0)(\mu_i - \mu_0)^T = \text{const matrix} \ \Rightarrow\ S_b = \frac{1}{N}\sum_{i=1}^{M} n_i (\mu_i - \mu_0)(\mu_i - \mu_0)^T = \text{const matrix}$$
    $$\Rightarrow\ \Lambda_{max} = \text{const} \ \Rightarrow\ J(w)\,\sigma \approx \Lambda_{max} = \text{const}$$
    When $\sigma$ of the classes is big $\Rightarrow$ the FDR (Fisher's discriminant ratio) is small:
    $$\sigma \uparrow \ \Rightarrow\ J(w) \approx \frac{\Lambda_{max}}{\sigma} \downarrow$$
    When $\sigma$ of the classes is small $\Rightarrow$ the FDR is big:
    $$\sigma \downarrow \ \Rightarrow\ J(w) \approx \frac{\Lambda_{max}}{\sigma} \uparrow$$
    Actually:

    | | $\sigma$ | $J(w)$ | $\sigma J(w)$ | $\Lambda_{max}$ | $\frac{\sigma J(w)}{\Lambda_{max}} \times 100\%$ |
    |---|---|---|---|---|---|
    | $\Sigma = I$ | 1 | 8.1336 | 8.1336 | 8.7515 | 92.94% |
    | $\Sigma = 0.25I$ | 0.25 | 38.4050 | 9.6013 | 8.9344 | 107.46% |
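As referenced in the list above, a small numeric check of the direction claim (a sketch; it assumes experiment5_4.m has been run so that `X1`, `y1`, `m`, and `w_1` exist in the workspace):

```matlab
% Compare the FDR eigenvector w_1 with the Fisher direction S_w^{-1}(m_1 - m_2)
% via the cosine of the angle between them.
[S_w, ~, ~] = Calc_SwSbSm(X1, y1);
d = S_w \ (m(:, 1) - m(:, 2));
d = d / norm(d);               % normalized S_w^{-1}(m_1 - m_2)
cos_angle = abs(w_1' * d);     % |cos| = 1 <=> same (or opposite) direction
fprintf('|cos(w_1, S_w^{-1}(m_1 - m_2))| = %.4f\n', cos_angle);
```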
