Matting Laplacian

1. Color Line Assumption

ColorAssumption

  • Matting is a technique that extracts the foreground from an image.
  • Matting needs an assumption that for a small window ww of an image, its foreground pixels are colinear, so are its background pixels.
  • More specifically, for a pixel iwi \in w, there exists βi\beta_i, γi\gamma_i, F1F_1, F2F_2, B1B_1, and B2B_2 such that Fi=βiF1+(1βi)F2F_i = \beta_i F_1 + (1 - \beta_i) F_2 and Bi=γiB1+(1γi)B2B_i = \gamma_i B_1 + (1 - \gamma_i) B_2 where FiF_i is the foreground component of the pixel ii and BiB_i is the background component of it.
  • Moreover, the intensity IiI_i of the pixel ii consists of FiF_i and BiB_i such that Ii=αiFi+(1αi)BiI_i = \alpha_i F_i + (1 - \alpha_i) B_i for some αi\alpha_i.
  • Ii=αi(βiF1+(1βi)F2)+(1αi)(γiB1+(1γi)B2)I_i = \alpha_i (\beta_i F_1 + (1 - \beta_i) F_2) + (1 - \alpha_i)(\gamma_i B_1 + (1 - \gamma_i) B_2).
  • It can be also represented as
(F2B2F1F2B1B2)(αiαiβi(1αi)γi)=IiB2\begin{equation} \left(\begin{array}{ccc} F_2 - B_2 & F_1 - F_2 & B_1 - B_2 \end{array}\right) \left(\begin{array}{c} \alpha_i \\ \alpha_i \beta_i \\ (1 - \alpha_i) \gamma_i \end{array}\right) = I_i - B_2 \end{equation} (αiαiβi(1αi)γi)=(F2B2F1F2B1B2)1(IiB2)=(r1tr2tr3t)(IiB2)\begin{equation} \left(\begin{array}{c} \alpha_i \\ \alpha_i \beta_i \\ (1 - \alpha_i) \gamma_i \end{array}\right) = \left(\begin{array}{ccc} F_2 - B_2 & F_1 - F_2 & B_1 - B_2 \end{array}\right)^{-1} (I_i - B_2) = \left(\begin{array}{c} r^t_1 \\ r^t_2 \\ r^t_3 \end{array}\right) (I_i - B_2) \end{equation}
  • So αi=r1tIir1tB2\alpha_i = r^t_1 I_i - r^t_1 B_2. It implies that αi\alpha_i and IiI_i are in a linear relationship for ww. It means that for every window IiI_i and αi\alpha_i are in an affine relationship.

2. Matting Laplaciain

  • For the affinity of IiI_i and αi\alpha_i, αiaIi+b\alpha_i \approx a I_i + b for some aa and bb. Now consider the cost function J(α,a,b)J(\alpha, a, b).
j=1Niwj(αij(ajIij+bj))2+ϵaj22\begin{equation} \sum_{j=1}^N \sum_{i \in w_j} (\alpha^j_i - (a_j I^j_i + b_j))^2 + \epsilon \Vert a_j \Vert^2_2 \end{equation}

where NN is the number of windows in an image which is the same as the number of pixels in an image because the windows are created for each pixel. As the regularization term of this cost function, ϵaj22\epsilon \Vert a_j \Vert^2_2, is minimized, aja_j goes to zero, and it causes aibja_i \approx b_j. Here assume that IiI_i is a scalar for simplicity, but it can be also extended to a three-dimensional vector.

  • This cost function can be reformed as follows:
J(α,a,b)=j=1N(I1j1IWj1ϵ0)(ajbj)(α1jαWj0)2=j=1NGj(ajbj)αj2\begin{equation} J(\alpha, a, b) = \sum_{j=1}^N \left\| \left(\begin{array}{cc} I^j_1 & 1 \\ \vdots & \vdots \\ I^j_W & 1 \\ \sqrt{\epsilon} & 0 \end{array}\right) \left(\begin{array}{c} a_j \\ b_j \end{array}\right) - \left(\begin{array}{c} \alpha^j_1 \\ \vdots \\ \alpha^j_W \\ 0 \end{array}\right) \right\|^2 = \sum_{j=1}^N \left\| G_j \left(\begin{array}{c} a_j \\ b_j \end{array}\right) - \overline{\alpha_j} \right\|^2 \end{equation}

where WW is the number of pixels in each window.

  • Let the optimal aja_j and bjb_j for the window wjw_j be aja^{*}_j and bjb^{*}_j, then from the normal equation
(ajbj)=(GjtGj)1Gjtαj\begin{equation} \left(\begin{array}{c} a^{*}_j \\ b^{*}_j \end{array}\right) = (G^t_j G_j)^{-1} G^t_j \overline{\alpha_j} \end{equation}
  • Now, the cost function J(α)J(\alpha) for this optimal aja^{*}_j and bjb^{*}_j is
J(α)=j=1NGj(ajbj)αj2=j=1NGj(GjtGj)1Gjtαjαj2\begin{equation} J(\alpha) = \sum_{j=1}^N \left\| G_j \left(\begin{array}{c} a^{*}_j \\ b^{*}_j \end{array}\right) - \overline{\alpha_j} \right\|^2 = \sum_{j=1}^N \left\| G_j(G^t_j G_j)^{-1} G^t_j \overline{\alpha_j} - \overline{\alpha_j} \right\|^2 \end{equation}

Let Gj(GjtGj)1Gjt=XG_j(G^t_j G_j)^{-1} G^t_j = X, then Xt=Gj(GjtGj)1Gjt=XX^t = G_j(G^t_j G_j)^{-1} G^t_j = X, so XX is symmetric. Moreover, XXt=XX=Gj(GjtGj)1GjtGj(GjtGj)1Gjt=Gj(GjtGj)1Gjt=XXX^t = XX = G_j(G^t_j G_j)^{-1} G^t_j G_j(G^t_j G_j)^{-1} G^t_j = G_j(G^t_j G_j)^{-1} G^t_j = X. It implies that J(α)=j=1NXαjαj2=j=1N(XI)αj2=j=1Nαjt(XI)t(XI)αj=j=1Nαjt(XtI)(XI)αj=j=1Nαjt(XtXXXt+I)αj=j=1Nαjt(XXXt+I)αj=j=1Nαjt(IX)αj=j=1Nαjt(IGj(GjtGj)1Gjt)αj=j=1NαjtGjαj\begin{aligned} J(\alpha) &= \sum_{j=1}^N \left\| X \overline{\alpha_j} - \overline{\alpha_j} \right\|^2 = \sum_{j=1}^N \left\| (X - I) \overline{\alpha_j} \right\|^2 = \sum_{j=1}^N \overline{\alpha_j}^t (X - I)^t (X - I) \overline{\alpha_j} \\ &= \sum_{j=1}^N \overline{\alpha_j}^t (X^t - I) (X - I) \overline{\alpha_j} = \sum_{j=1}^N \overline{\alpha_j}^t (X^t X - X - X^t + I) \overline{\alpha_j} \\ &= \sum_{j=1}^N \overline{\alpha_j}^t (X - X - X^t + I) \overline{\alpha_j} = \sum_{j=1}^N \overline{\alpha_j}^t (I - X) \overline{\alpha_j} \\ &= \sum_{j=1}^N \overline{\alpha_j}^t (I - G_j(G^t_j G_j)^{-1} G^t_j) \overline{\alpha_j} = \sum_{j=1}^N \overline{\alpha_j}^t \overline{G_j} \overline{\alpha_j} \end{aligned}

  • Besides, GjtGjG^t_j G_j, (GjtGj)1(G^t_j G_j)^{-1}, (GjtGj)1Gjt(G^t_j G_j)^{-1} G^t_j, and Gj(GjtGj)1GjtG_j(G^t_j G_j)^{-1} G^t_j are as follows:
GjtGj=(I1jIWjϵ110)(I1j1IWj1ϵ0)=(ϵ+iWIi2iWIiiWIiW)=(W(σ2+μ2)+ϵWμWμW)(GjtGj)1=1W2(σ2+μ2)+WϵW2μ2(WWμWμW(σ2+μ2)+ϵ)=1W2σ2+Wϵ(WWμWμW(σ2+μ2)+ϵ)(GjtGj)1Gjt=1W2σ2+Wϵ(WWμWμW(σ2+μ2)+ϵ)(I1jIWjϵ110)=1Wσ2+ϵ(I1jμIWjμϵμI1j+σ2+μ2+ϵWμIWj+σ2+μ2+ϵWμϵ)\begin{aligned} G^t_j G_j &= \left(\begin{array}{cccc} I^j_1 & \cdots & I^j_W & \sqrt{\epsilon} \\ 1 & \cdots & 1 & 0 \end{array}\right) \left(\begin{array}{cc} I^j_1 & 1 \\ \vdots & \vdots \\ I^j_W & 1 \\ \sqrt{\epsilon} & 0 \end{array}\right) = \left(\begin{array}{cc} \epsilon + \sum_{i}^W I^2_i & \sum_{i}^W I_i \\ \sum_{i}^W I_i & W \end{array}\right) \\ &= \left(\begin{array}{cc} W(\sigma^2 + \mu^2) + \epsilon & W\mu \\ W\mu & W \end{array}\right) \\ (G^t_j G_j)^{-1} &= \frac{1}{W^2 (\sigma^2 + \mu^2) + W \epsilon - W^2 \mu^2} \left(\begin{array}{cc} W & -W\mu \\ -W\mu & W(\sigma^2 + \mu^2) + \epsilon \end{array}\right) \\ &= \frac{1}{W^2 \sigma^2 + W \epsilon} \left(\begin{array}{cc} W & -W\mu \\ -W\mu & W(\sigma^2 + \mu^2) + \epsilon \end{array}\right) \\ (G^t_j G_j)^{-1} G^t_j &= \frac{1}{W^2 \sigma^2 + W \epsilon} \left(\begin{array}{cc} W & -W\mu \\ -W\mu & W(\sigma^2 + \mu^2) + \epsilon \end{array}\right) \left(\begin{array}{cccc} I^j_1 & \cdots & I^j_W & \sqrt{\epsilon} \\ 1 & \cdots & 1 & 0 \end{array}\right) \\ &= \frac{1}{W \sigma^2 + \epsilon} \left(\begin{array}{cccc} I^j_1 - \mu & \cdots & I^j_W - \mu & \sqrt{\epsilon} \\ -\mu I^j_1 + \sigma^2 + \mu^2 + \frac{\epsilon}{W} & \cdots & -\mu I^j_W + \sigma^2 + \mu^2 + \frac{\epsilon}{W} & -\mu \sqrt{\epsilon} \end{array}\right) \\ \end{aligned} Gj(GjtGj)1Gjt=1Wσ2+ϵ(I1j1IWj1ϵ0)(I1jμIWjμϵμI1j+σ2+μ2+ϵWμIWj+σ2+μ2+ϵWμϵ)=1Wσ2+ϵ(ϵ(I1jμ)IuIvIuμ+σ2+μ2Ivμ+ϵWϵ(IWjμ)ϵ(I1jμ)ϵ(IWjμ)ϵ)\begin{aligned} G_j(G^t_j G_j)^{-1} G^t_j &= \frac{1}{W \sigma^2 + \epsilon} \left(\begin{array}{cc} I^j_1 & 1 \\ \vdots & \vdots \\ I^j_W & 1 \\ \sqrt{\epsilon} & 0 \end{array}\right) \left(\begin{array}{cccc} I^j_1 - \mu & \cdots & I^j_W - \mu & \sqrt{\epsilon} \\ -\mu I^j_1 + \sigma^2 + \mu^2 + \frac{\epsilon}{W} & \cdots & -\mu I^j_W + \sigma^2 + \mu^2 + \frac{\epsilon}{W} & -\mu \sqrt{\epsilon} \end{array}\right) \\ &= \frac{1}{W \sigma^2 + \epsilon} \left(\begin{array}{cccc} \ddots & & & \sqrt{\epsilon} (I^j_1 - \mu) \\ & I_u I_v - I_u \mu + \sigma^2 + \mu^2 - I_v \mu + \frac{\epsilon}{W} & & \vdots \\ & & \ddots & \sqrt{\epsilon} (I^j_W - \mu) \\ \sqrt{\epsilon} (I^j_1 - \mu) & \cdots & \sqrt{\epsilon} (I^j_W - \mu) & \epsilon \end{array}\right) \end{aligned}

where uu, v[1,W]v \in [1, W] are integers. From equations as above, (Gjuv)(\overline{G_j}_{uv}) which means the (u,v)(u, v) element of Gj\overline{G_j} is (Gj)uv=(IGj(GjtGj)1Gjt)uv=δuv1Wσ2+ϵ(IuIvIuμ+σ2+μ2Ivμ+ϵW)=δuv1Wσ2+ϵ(σ2+ϵW+(Iuμ)(Ivμ))=δuv1W(1+1σ2+ϵW(Iuμ)(Ivμ))\begin{aligned} (\overline{G_j})_{uv} &= (I- G_j(G_j^t G_j)^{-1} G_j^t )_{uv} \\ &= \delta_{uv} - \frac{1}{W \sigma^2 + \epsilon} \left( I_u I_v - I_u \mu + \sigma^2 + \mu^2 - I_v \mu + \frac{\epsilon}{W} \right) \\ &= \delta_{uv} - \frac{1}{W \sigma^2 + \epsilon} \left( \sigma^2 + \frac{\epsilon}{W} + (I_u - \mu)(I_v - \mu) \right) \\ &= \delta_{uv} - \frac{1}{W} \left( 1 + \frac{1}{\sigma^2 + \frac{\epsilon}{W}}(I_u - \mu)(I_v - \mu) \right) \end{aligned}

where \delta_{uv} is the Kronecker delta which means that 11 if u=vu = v and 00 otherwise.

  • Let the top-left W×WW \times W submatrix of Gj\overline{G_j} be [Gj][\overline{G_j}]. Now, J(α)J(\alpha) can be represented as the form mtLmm^t Lm where LL is the Matting Laplacian. Particularly, mm is an N×1N \times 1 matte vector where mkm_k is the matte of the kk-th pixel. Therefore, there exist multiple ii and jj pairs such that αij=mk\alpha^j_i = m_k.
J(α)=j=1NαjtGjαj=j=1N(α1jαWj0)Gj(α1jαWj0)=j=1N(α1jαWj)[Gj](α1jαWj)=mtLm=(m1mN)(L11L1NLN1LNN)(m1mN)=m1L11m1+m1L12m2++m1L1NmN+miLi1m1+miLi2m2++miLiNmN+mNLN1m1+mNLN2m2++mNLNNmN\begin{aligned} J(\alpha) &= \sum_{j=1}^N \overline{\alpha_j}^t \overline{G_j} \overline{\alpha_j} = \sum_{j=1}^N \left(\begin{array}{cccc} \alpha^j_1 & \cdots & \alpha^j_W & 0 \end{array}\right) \overline{G_j} \left(\begin{array}{c} \alpha^j_1 \\ \vdots \\ \alpha^j_W \\ 0 \end{array}\right) \\ &= \sum_{j=1}^N \left(\begin{array}{ccc} \alpha^j_1 & \cdots & \alpha^j_W \end{array}\right) [\overline{G_j}] \left(\begin{array}{c} \alpha^j_1 \\ \vdots \\ \alpha^j_W \end{array}\right) = m^t L m\\ &= \left(\begin{array}{ccc} m_1 & \cdots & m_N \end{array}\right) \left(\begin{array}{ccc} L_{11} & \cdots & L_{1N} \\ \vdots & \ddots & \vdots \\ L_{N1} & \cdots & L_{NN} \end{array}\right) \left(\begin{array}{c} m_1 \\ \vdots \\ m_N \end{array}\right) \\ \\ &= \begin{array}{c} m_1 L_{11} m_1 + m_1 L_{12} m_2 + \cdots + m_1 L_{1N} m_N + \\ \cdots \\ m_i L_{i1} m_1 + m_i L_{i2} m_2 + \cdots + m_i L_{iN} m_N + \\ \cdots \\ m_N L_{N1} m_1 + m_N L_{N2} m_2 + \cdots + m_N L_{NN} m_N \end{array} \end{aligned}
  • LklL_{kl} appears in only mkLklmlm_k L_{kl} m_l part. Therefore, LklL_{kl} is the sum of the coefficients of αi1j1αi2j2\alpha^{j_1}_{i_1} \alpha^{j_2}_{i_2} pairs which are equivalent to mkmlm_k m_l in j=1NαjtGjαj\sum_{j=1}^N \overline{\alpha_j}^t \overline{G_j} \overline{\alpha_j}.

aLa

In other words, for the window-set wklw_{kl} that includes the kk-th and ll-th pixels together in an image, LklL_{kl} can be figured out from this by calculating wwklαwtGwαw\sum_{w \in w_{kl}} \overline{\alpha_w}^t \overline{G_w} \overline{\alpha_w}. This is because there exists the set Θ={(pw,qw)pw=mk,qw=ml,pw,qwαw and wwkl}\Theta = \{ (p_w, q_w) \vert p_w = m_k, q_w = m_l, p_w, q_w \in \overline{\alpha_w} \text{ and } w \in w_{kl} \}. Therefore, LklL_{kl} is as follows: Lkl=wwkl(Gw)pwqw=wwklδpwqw1Wpwqw(1+1σw2+ϵW(Ipwμw)(Iqwμw))\begin{aligned} L_{kl} &= \sum_{w \in w_{kl}} (\overline{G_w})_{p_w q_w} = \sum_{w \in w_{kl}} \delta_{p_w q_w} - \frac{1}{W_{p_w q_w}} \left(1 + \frac{1}{\sigma^2_w + \frac{\epsilon}{W}}(I_{p_w} - \mu_w)(I_{q_w} - \mu_w)\right) \end{aligned}

  • LL is symmetric since Lkl=LlkL_{kl} = L_{lk}.
  • Finally, J=mtLmJ = m^t L m, so it can be solved by solving Lm=0Lm = 0 because Jm=2mtL=0\frac{\partial J}{\partial m} = 2m^t L = 0.

3. Adding Constraint

  • Any constant mm is in nullspace of LL. Consider cL(1,,1)tcL(1, \cdots, 1)^t for cRc \in R. Then its ii-th row element is j=1NLij\sum_{j=1}^N L_{ij}. For the window-set wiiw_{ii} that includes the ii-th pixel, let pw=mip_w = m_i, wwiiw \in w_{ii}. Then,
j=1NLij=wwiiuw(Gw)pwu=wwiiuwδpwu1W(1+1σw2+ϵW(Ipwμw)(Iuμw))=wwii(uwδpwuuw1W(1+1σw2+ϵW(Ipwμw)(Iuμw)))=wwii(111Wσw2+ϵ(Ipwuμw)uw(Iuμw))=wwii1Wσw2+ϵ(Ipwuμw)(WμwWμw)=wwii0=0\begin{aligned} \sum_{j=1}^N L_{ij} &= \sum_{w \in w_{ii}} \sum_{u \in w} (\overline{G_w})_{p_w u} = \sum_{w \in w_{ii}} \sum_{u \in w} \delta_{p_w u} - \frac{1}{W} \left(1 + \frac{1}{\sigma^2_w + \frac{\epsilon}{W}}(I_{p_w} - \mu_w)(I_{u} - \mu_w)\right) \\ &= \sum_{w \in w_{ii}} \left( \sum_{u \in w} \delta_{p_w u} - \sum_{u \in w} \frac{1}{W} \left(1 + \frac{1}{\sigma^2_w + \frac{\epsilon}{W}}(I_{p_w} - \mu_w)(I_{u} - \mu_w)\right) \right) \\ &= \sum_{w \in w_{ii}} \left( 1 - 1 - \frac{1}{W \sigma^2_w + \epsilon} (I_{p_w u} - \mu_w) \sum_{u \in w} (I_{u} - \mu_w) \right) \\ &= \sum_{w \in w_{ii}} - \frac{1}{W \sigma^2_w + \epsilon} (I_{p_w u} - \mu_w) (W\mu_w - W\mu_w) \\ &= \sum_{w \in w_{ii}} 0 = 0 \end{aligned}
  • So user scribbles are required which denote some of foreground and background pixels in an image. Let UU be a N×1N \times 1 vector whose ii-th element is 11 if it is the known foreground pixel or 00 otherwise. In addition, let F={fUf=1}F = \{ f \vert U_f = 1 \}.
J=minmtLm+λfF(mf1)2=minmtLm+λ(mU)tD(mU)\begin{aligned} J = \min m^t L m + \lambda \sum_{f \in F} (m_f - 1)^2 = \min m^t L m + \lambda (m - U)^t D (m - U) \end{aligned}

where DD is an N×NN \times N diagonal matrix which Dii=1D_{ii} = 1 if Ui=1U_{i} = 1 or 00 otherwise.

  • To solve, JJ, we need the following:
Jm=2mtL+λ(mU)t(D+Dt)=2mtL+2λ(mtUt)D=2mt(L+λD)2λUtD=0    mt(L+λD)=λUtD=λUt    (L+λD)tm=(L+λD)m=λU\begin{aligned} \frac{\partial J}{\partial m} &= 2 m^t L + \lambda (m - U)^t (D + D^t) = 2 m^t L + 2 \lambda (m^t - U^t) D \\ &= 2 m^t (L + \lambda D) - 2 \lambda U^t D = 0 \\ &\implies m^t (L + \lambda D) = \lambda U^t D = \lambda U^t \\ &\implies (L + \lambda D)^t m = (L + \lambda D) m = \lambda U \end{aligned}

Therefore, the solution can be from the linear system (L+λD)m=λU(L + \lambda D)m = \lambda U where λ\lambda is a large number, for example, 100100.

4. The eigenvectors of LL

  • Without user scribbles, the solution of J=mtLmJ = m^t L m is that of Lm=0Lm = 0.
  • Consider Lv=λvLv = \lambda v where vv is the eigenvector of LL and λ\lambda is the eigenvalue of vv. If we choose λ\lambda closing to 00, then Lv=λv0=LmLv = \lambda v \approx 0 = Lm.
  • This eigenvector vv looks like a grayscale image.
  • Let the set Φ={vLv=λv,λ0}\Phi = \{ v \vert Lv = \lambda v, \lambda \approx 0 \}. Then for s=Φs = \vert \Phi \vert, Lv1++Lvs=L(v1++vs)0Lv_1 + \cdots + Lv_s = L(v_1 + \cdots + v_s) \approx 0. It means that any combination of elements in Φ\Phi can be a candidate of mm. More formally, there exist some constants such that mi=1scivim \approx \sum_{i=1}^s c_i v_i and i=1sci=1\sum_{i=1}^s c_i = 1 for cRc \in \R. However, it is more realistic that we define an N×1N \times 1 vector β\beta such that mi=1sβivim \approx \sum_{i=1}^s \beta_i v_i and i=1sβi,j=1\sum_{i=1}^s \beta_{i, j} = 1 where βi,j\beta_{i, j} denotes the jj-th element of βi\beta_i for 1jN1 \le j \le N.
  • Meanwhile, it is likely that most elements of mm are 00 or 11 in the ground truth of mm. This is because most mim_i is in range (0,1)(0, 1) only when the ii-th pixel is in the boundary of the foreground. So, it is reasonable that if mm is well estimated, most elements of mm are 00 or 11.

Eigenvectors

  • Therefore, JJ can be reformed as follows:
J=mini=1sj=1Nβi,jρ+1βi,jρ\begin{aligned} J = \min \sum_{i=1}^s \sum_{j=1}^N \vert \beta_{i, j} \vert^\rho + \vert 1 - \beta_{i, j} \vert^\rho \end{aligned}

Each βi\beta_{i} is forced to be distributed to 00 or 11 when it is minimized. This cost function produces a good solution even when user scribbles are not given.

Reference

[1] Richard J. Radke, (2012) Computer Vision for Visual Effects, Cambridge University Press, Cambridge.


© 2022. All rights reserved.