CATEGORII DOCUMENTE
Afaceri Calculatoare Casa masina Didactica pedagogie Diverse Educatie Finante Geografie Istorie & politica Legislatie Limba Management Sanatate Tehnologie

Bulgara	Ceha slovaca	Croata	Engleza	Estona	Finlandeza	Franceza
Germana	Italiana	Letona	Lituaniana	Maghiara	Olandeza	Poloneza
Sarba	Slovena	Spaniola	Suedeza	Turca	Ucraineana

Administration	Animals	Art	Biology	Books	Botanics	Business	Cars
Chemistry	Computers	Comunications	Construction	Ecology	Economy	Education	Electronics
Engineering	Entertainment	Financial	Fishing	Games	Geography	Grammar	Health
History	Human-resources	Legislation	Literature	Managements	Manuals	Marketing	Mathematic
Medicines	Movie	Music	Nutrition	Personalities	Physic	Political	Psychology
Recipes	Sociology	Software	Sports	Technical	Tourism	Various

Symmetric positive-definite matrices and least-squares approximation

Mathematic

+ Font mai mare | - Font mai mic

DOCUMENTE SIMILARE

AN ILL FATED SATELLITE

A DIFFERENT DEMONSTRATION OF THE PYTHAGOREAN THEOREM USING THE POWER OF THE POINT WITH RESPECT TO A CIRCLE

VECTOR SPACES

Maths and Vibrations � Tutorial Sheet 4 Solutions

Linear transformations - Definition and General Properties

Conics - a class of plain curves

BILLINIAR FORMS AND QUADRATIC FORMS

Irish Maths Test

Standard Normal Probabilities

Geometrical Characteristics of the Beam Cross-Section

Symmetric positive-definite matrices and least-squares approximation

Symmetric positive-definite matrices have many interesting and desirable properties. For example, they are nonsingular, and LU decomposition can be performed on them without our having to worry about dividing by 0. In this section, we shall prove several other important properties of symmetric positive-definite matrices and show an interesting application to curve fitting by a least-squares approximation.

The first property we prove is perhaps the most basic.

Lemma 13

Any symmetric positive-definite matrix is nonsingular.

Proof Suppose that a matrix A is singular. Then by Corollary 3, there exists a nonzero vector x such that Ax = 0. Hence, x^TAx = 0, and A cannot be positive-definite.

The proof that we can perform LU decomposition on a symmetric positive-definite matrix A without dividing by 0 is more involved. We begin by proving properties about certain submatrices of A. Define the kth leading submatrix of A to be the matrix A_k consisting of the intersection of the first k rows and first k columns of A.

Lemma 14

If A is a symmetric positive-definite matrix, then every leading submatrix of A is symmetric and positive-definite.

Proof That each leading submatrix A_k is symmetric is obvious. To prove that A_k is positive-definite, let x be a nonzero column vector of size k, and let us partition A as follows:

Then, we have

since A is positive-definite, and hence A_k is also positive-definite.

We now turn to some essential properties of the Schur complement. Let A be a symmetric positive-definite matrix, and let A_k be a leading k k submatrix of A. Partition A as

Then, the Schur complement of A with respect to Ak is defined to be

(By Lemma 14, Ak is symmetric and positive-definite; therefore, exists by Lemma 13, and S is well defined.) Note that our earlier definition (23) of the Schur complement is consistent with definition (29), by letting k = 1.

The next lemma shows that the Schur-complement matrices of symmetric positive-definite matrices are themselves symmetric and positive-definite. This result was used in Theorem 12, and its corollary is needed to prove the correctness of LU decomposition for symmetric positive-definite matrices.

Lemma 15

If A is a symmetric positive-definite matrix and A_k is a leading data k k submatrix of A, then the Schur complement of A with respect to A_k is symmetric and positive-definite.

Proof That S is symmetric follows from Exercise 1-7. It remains to show that S is positive-definite. Consider the partition of A given in equation (28).

For any nonzero vector x, we have x^TAx > 0 by assumption. Let us break x into two subvectors y and z compatible with A_k and C, respectively. Because A_k^-1 exists, we have

by matrix magic. (Verify by multiplying through.) This last equation amounts to 'completing the square' of the quadractic form. (See Exercise 6-2.)

Since x^TAx > 0 holds for any nonzero x, let us pick any nonzero z and then choose , which causes the first term in equation (30) to vanish, leaving

as the value of the expression. For any z 0, we therefore have z^TSz = x^TAx > 0, and thus S is positive-definite.

Corollary 16

LU decomposition of a symmetric positive-definite matrix never causes a division by 0.

Proof Let A be a symmetric positive-definite matrix. We shall prove something stronger than the statement of the corollary: every pivot is strictly positive. The first pivot is a₁₁. Let e₁ be the first unit vector, from which we obtain . Since the first step of LU decomposition produces the Schur complement of A with respect to A₁ = (a₁₁), Lemma 15 implies that all pivots are positive by induction.

Least-squares approximation

Fitting curves to given sets of data points is an important application of symmetric positive-definite matrices. Suppose that we are given a set of m data points

(x₁,y₁),(x₂,y₂),,(x_m,y_m),

where the y_i are known to be subject to measurement errors. We would like to determine a function F (x) such that

y_i = F(x_i) +

_i,

for i = 1, 2, . . . , m, where the approximation errors i are small. The form of the function F depends on the problem at hand. Here, we assume that it has the form of a linearly weighted sum,

where the number of summands n and the specific basis functions f_j are chosen based on knowledge of the problem at hand. A common choice is f_j(x) = x^j^-1, which means that

F(x) = c₁ + c₂x + c₃x² +

+ c_nxⁿ^-1

is a polynomial of degree n - 1 in x.

By choosing n = m, we calculate each y_i exactly in equation (31). Such a high-degree F 'fits the noise' as well as the data, however, and generally gives poor results when used to predict y for previously unseen values of x. It is usually better to choose n significantly smaller than m and hope that by choosing the coefficients c_j well, we can obtain a function F that finds the significantl patterns in the data points without paying undue attention to the noise. Some theoretical principles exist for choosing n, but they are beyond the scope of this text. In any case, once n is chosen, we end up with an overdetermined set of equations that we wish to solve as well as we can. We now show how this can be done.

Let

denote the matrix of values of the basis functions at the given points; that is, a_ij = f_j(x_i). Let c = (c_k) denote the desired size-n vector of coefficients. Then,

is the size-m vector of 'predicted values' for y. Thus,

= Ac - y

is the size-m vector of approximation errors.

To minimize approximation errors, we choose to minimize the norm of the error vector , which gives us a least-squares solution, since

Since

we can minimize |||| by differentiating ||||² with respect to each c_k and then setting the result to 0:

The n equations (32) for k = 1, 2, . . . , n are equivalent to the single matrix equation

(Ac -y)^TA = 0

or, equivalently (using Exercise 1-3), to

A^T(Ac - y) = 0 ,

which implies

A^TAc = A^Ty .

In statistics, this is called the normal equation. The matrix A^T A is symmetric by Exercise 1-3, and if A has full column rank, then A^TA is positive-definite as well. Hence, (A^TA)^-1 exists, and the solution to equation (33) is

c = ((A^TA)^-1A^T)y
= A⁺y ,

where the matrix A⁺ = ((A^TA)^-1A^T) is called the pseudoinverse of the matrix A. The pseudoinverese is a natural generalization of the notion of a matrix inverse to the case in which A is nonsquare. (Compare equation (34) as the approximate solution to Ac = y with the solution A^-1b as the exact solution to Ax = b.)

As an example of producing a least-squares fit, suppose that we have 5 data points

shown as black dots in Figure 3. We wish to fit these points with a quadratic polynomial

F(x) = c₁ + c₂x + c₃x².

We start with the matrix of basis-function values

Figure 3 The least-squares fit of a quadratic polynomial to the set of data points . The black dots are the data points, and the white dots are their estimated values predicted by the polynomial F(x) = 1.2 - 0.757x + 0.214x², the quadratic polynomial that minimizes the sum of the squared errors. The error for each data point is shown as a shaded line.

whose pseudoinverse is

Multiplying y by A⁺, we obtain the coefficient vector

which corresponds to the quadratic polynomial

F(x) = 1.200 - 0.757x + 0.214x²

as the closest-fitting quadratic to the given data, in a least-squares sense.

As a practical matter, we solve the normal equation (33) by multiplying y by A^T and then finding an LU decomposition of A^T A. If A has full rank, the matrix A^T A is guaranteed to be nonsingular, because it is symmetric and positive-definite. (See Exercise 1-3 and Theorem 6.)

Exercises

Prove that every diagonal element of a symmetric positive-definite matrix is positive.

Let be a 2 2 symmetric positive-definite matrix. Prove that its determinant ac - b² is positive by 'completing the square' in a manner similar to that used in the proof of Lemma 15.

Prove that the maximum element in a symmetric positive-definite matrix lies on the diagonal.

Prove that the determinant of each leading submatrix of a symmetric positive-definite matrix is positive.

Let A_k denote the kth leading submatrix of a symmetric positive-definite matrix A. Prove that det(A_k)/det(A_k_-1) is the kth pivot during LU decomposition, where by convention det(A₀) = 1.

Find the function of the form

F(x) = c₁ + c₂x lg x + c₃e^x

that is the best least-squares fit to the data points

Show that the pseudoinverse A⁺ satisfies the following four equations:

AA A = A ,
A AA⁺= A⁺,
(AA⁺)^T= AA⁺,
(A⁺A)^T= A⁺A .

Problems

31-1 Shamir's boolean matrix multiplication algorithm

In Section 3, we observed that Strassen's matrix-multiplication algorithm cannot be applied directly to boolean matrix multiplication because the boolean quasiring Q = (, , 0, 1) is not a ring. Theorem 10 showed that if we used arithmetic operations on words of O(lg n) bits, we could nevertheless apply Strassen's method to multiply n n boolean matrices in O(n^{lg 7}) time. In this problem, we investigate a probabilistic method that uses only bit operations to achieve nearly as good a bound but with some small chance of error.

a. Show that R = (, , , 0, 1), where is the XOR (exclusive-or) function, is a ring.

Let A = (a_ij) and B = (b_ij) be n n boolean matrices, and let C = (c_ij) = AB in the quasiring Q. Generate A' = (a'_ij) from A using the following randomized procedure:

If a_ij = 0, then let a'_ij = 0.

If A_ij = 1, then let a'_ij = 1 with probability 1/2 and let a'_ij = 0 with probability 1/2. The random choices for each entry are independent.

b Let C' = (c'_ij) = A'B in the ring R. Show that c_ij = 0 implies c'_ij = 0. Show that c_ij = 1 implies c'_ij = 1 with probability 1/2.

c Show that for any > 0, the probability is at most /n² that a given c'_ij never takes on the value c_ij for lg(n²/) independent choices of the matrix A'. Show that the probability is at least 1 - that all c'_ij take on their correct values at least once.

d Give an O(n^{lg 7}lg n)-time randomized algorithm that computes the product in the boolean quasiring Q of two n n matrices with probability at least 1 - 1/n^k for any constant k > 0. The only operations permitted on matrix elements are , and .

31-2 Tridiagonal systems of linear equations

Consider the tridiagonal matrix

a. Find an LU decomposition of A.

b. Solve the equation Ax = (1 1 1 1 1)^T by using forward and back substitution.

c. Find the inverse of A.

d. Show that for any n n symmetric, positive-definite, tridiagonal matrix A and any n-vector b, the equation Ax = b can be solved in O(n) time by performing an LU decomposition. Argue that any method based on forming A^-¹ is asymptotically more expensive in the worst case.

e. Show that for any n n nonsingular, tridiagonal matrix A and any n-vector b, the equation Ax = b can be solved in O(n) time by performing an LUP decomposition.

31-3 Splines

A practical method for interpolating a set of points with a curve is to use cubic splines. We are given a set of n + 1 point-value pairs, where x₀ < x₁ < < x_n. We wish to fit a piecewise-cubic curve (spline) f(x) to the points. That is, the curve f(x) is made up of n cubic polynomials f_i(x) = a_i + b_ix + c_ix² + d_ix³ for i = 0,1, . . . , n - 1, where if x falls in the range x_i x x_i_+
1, then the value of the curve is given by f(x) = f_i(x - x_i). The points x_i at which the cubic polynomials are 'pasted' together are called knots. For simplicity, we shall assume that x_i = i for i = 0,1, . . . , n.

To ensure continuity of f(x), we require that

f(x_i) = f_i(0) = y_i,
f(x_i₊₁) = f_i(1) = y_i₊₁

for i = 0, 1, . . . , n - 1. To ensure that f(x) is sufficiently smooth, we also insist that there be continuity of the first derivative at each knot:

for i = 0, 1, . . . , n - 1.

a. Suppose that for i = 0, 1, . . . , n, we are given not only the point-value pairs but also the first derivatives D_i = f'(x_i) at each knot. Express each coefficient a_i, b_i, c_i, and d_i in terms of the values y_i, y_i₊₁, D_i, and D_i₊₁. (Remember that x_i = i.) How quickly can the 4n coefficients be computed from the point-value pairs and first derivatives?

The question remains of how to choose the first derivatives of f(x) at the knots. One method is to require the second derivatives to be continuous at the knots:

for i = 0, 1, . . . ,n - 1. At the first and last knots, we assume that ; these assumptions make f(x) a natural cubic spline.

b. Use the continuity constraints on the second derivative to show that for i = 1, 2, . . . , n - 1,

D_i + 4D_i + D_i_{+ 1} = 3(y_i_{+ 1} - y_i_{- 1}) .

c. Show that

2D₀+ D₁ = 3(y₁ - y₀) ,

D_n+ 2D_n = 3(y_n - y_n_{- 1}) .

d. Rewrite equations (35)--(37) as a matrix equation involving the vector D = D₀, D₁, . . . , D_n of unknowns. What attributes does the matrix in your equation have?

e. Argue that a set of n + 1 point-value pairs can be interpolated with a natural cubic spline in O(n) time (see Problem 31-2).

f. Show how to determine a natural cubic spline that interpolates a set of n + 1 points (x_i, y_i) satisfying x₀ < x₁ < < x_n, even when x_i is not necessarily equal to i. What matrix equation must be solved, and how quickly does your algorithm run?

Chapter notes

There are many excellent texts available that describe numerical and scientific computation in much greater detail than we have room for here. The following are especially readable: George and Liu [81], Golub and Van Loan [89], Press, Flannery, Teukolsky, and Vetterling[161, 162], and Strang[181, 182].

The publication of Strassen's algorithm in 1969 [183] caused much excitement. Before then, it was hard to imagine that the naive algorithm could be improved upon. The asymptotic upper bound on the difficulty of matrix multiplication has since been considerably improved. The most asymptotically efficient algorithm for multiplying n n matrices to date, due to Coppersmith and Winograd [52], has a running time of O(n^2.376). The graphical presentation of Strassen's algorithm is due to Paterson [155]. Fischer and Meyer [67] adapted Strassen's algorithm to boolean matrices (Theorem 10) .

Gaussian elimination, upon which the LU and LUP decompositions are based, was the first systematic method for solving linear systems of equations. It was also one of the earliest numerical algorithms. Although it was known earlier, its discovery is commonly attributed to C. F. Gauss (1777-1855). In his famous paper [183], Strassen also showed that an n n matrix can be inverted in O(n^{lg 7}) time. Winograd [203] originally proved that matrix multiplication is no harder than matrix inversion, and the converse is due to Aho, Hopcroft, and Ullman [4].

Strang [182] has an excellent presentation of symmetric positive-definite matrices and on linear algebra in general. He makes the following remark on page 334: 'My class often asks about unsymmetric positive definite matrices. I never use that term.'

6 Symmetric positive-definite matrices and least-squares approximation

The first property we prove is perhaps the most basic.

Lemma 13

Any symmetric positive-definite matrix is nonsingular.

Proof Suppose that a matrix A is singular. Then by Corollary 3, there exists a nonzero vector x such that Ax = 0. Hence, x^TAx = 0, and A cannot be positive-definite.

Lemma 14

If A is a symmetric positive-definite matrix, then every leading submatrix of A is symmetric and positive-definite.

Proof That each leading submatrix A_k is symmetric is obvious. To prove that A_k is positive-definite, let x be a nonzero column vector of size k, and let us partition A as follows:

Then, we have

since A is positive-definite, and hence A_k is also positive-definite.

We now turn to some essential properties of the Schur complement. Let A be a symmetric positive-definite matrix, and let A_k be a leading k k submatrix of A. Partition A as

Then, the Schur complement of A with respect to Ak is defined to be

Lemma 15

If A is a symmetric positive-definite matrix and A_k is a leading data k k submatrix of A, then the Schur complement of A with respect to A_k is symmetric and positive-definite.

Proof That S is symmetric follows from Exercise 1-7. It remains to show that S is positive-definite. Consider the partition of A given in equation (28).

For any nonzero vector x, we have x^TAx > 0 by assumption. Let us break x into two subvectors y and z compatible with A_k and C, respectively. Because A_k^-1 exists, we have

by matrix magic. (Verify by multiplying through.) This last equation amounts to 'completing the square' of the quadractic form. (See Exercise 6-2.)

Since x^TAx > 0 holds for any nonzero x, let us pick any nonzero z and then choose , which causes the first term in equation (30) to vanish, leaving

as the value of the expression. For any z 0, we therefore have z^TSz = x^TAx > 0, and thus S is positive-definite.

Corollary 16

LU decomposition of a symmetric positive-definite matrix never causes a division by 0.

Least-squares approximation

Fitting curves to given sets of data points is an important application of symmetric positive-definite matrices. Suppose that we are given a set of m data points

(x₁,y₁),(x₂,y₂),,(x_m,y_m),

where the y_i are known to be subject to measurement errors. We would like to determine a function F (x) such that

y_i = F(x_i) +

_i,

for i = 1, 2, . . . , m, where the approximation errors i are small. The form of the function F depends on the problem at hand. Here, we assume that it has the form of a linearly weighted sum,

where the number of summands n and the specific basis functions f_j are chosen based on knowledge of the problem at hand. A common choice is f_j(x) = x^j^-1, which means that

F(x) = c₁ + c₂x + c₃x² +

+ c_nxⁿ^-1

is a polynomial of degree n - 1 in x.

Let

denote the matrix of values of the basis functions at the given points; that is, a_ij = f_j(x_i). Let c = (c_k) denote the desired size-n vector of coefficients. Then,

is the size-m vector of 'predicted values' for y. Thus,

= Ac - y

is the size-m vector of approximation errors.

To minimize approximation errors, we choose to minimize the norm of the error vector , which gives us a least-squares solution, since

Since

we can minimize |||| by differentiating ||||² with respect to each c_k and then setting the result to 0:

The n equations (32) for k = 1, 2, . . . , n are equivalent to the single matrix equation

(Ac -y)^TA = 0

or, equivalently (using Exercise 1-3), to

A^T(Ac - y) = 0 ,

which implies

A^TAc = A^Ty .

c = ((A^TA)^-1A^T)y
= A⁺y ,

As an example of producing a least-squares fit, suppose that we have 5 data points

shown as black dots in Figure 3. We wish to fit these points with a quadratic polynomial

F(x) = c₁ + c₂x + c₃x².

We start with the matrix of basis-function values

Figure 3 The least-squares fit of a quadratic polynomial to the set of data points . The black dots are the data points, and the white dots are their estimated values predicted by the polynomial F(x) = 1.2 - 0.757x + 0.214x², the quadratic polynomial that minimizes the sum of the squared errors. The error for each data point is shown as a shaded line.

whose pseudoinverse is

Multiplying y by A⁺, we obtain the coefficient vector

which corresponds to the quadratic polynomial

F(x) = 1.200 - 0.757x + 0.214x²

as the closest-fitting quadratic to the given data, in a least-squares sense.

Exercises

Prove that every diagonal element of a symmetric positive-definite matrix is positive.

Let be a 2 2 symmetric positive-definite matrix. Prove that its determinant ac - b² is positive by 'completing the square' in a manner similar to that used in the proof of Lemma 15.

Prove that the maximum element in a symmetric positive-definite matrix lies on the diagonal.

Prove that the determinant of each leading submatrix of a symmetric positive-definite matrix is positive.

Let A_k denote the kth leading submatrix of a symmetric positive-definite matrix A. Prove that det(A_k)/det(A_k_-1) is the kth pivot during LU decomposition, where by convention det(A₀) = 1.

Find the function of the form

F(x) = c₁ + c₂x lg x + c₃e^x

that is the best least-squares fit to the data points

Show that the pseudoinverse A⁺ satisfies the following four equations:

AA A = A ,
A AA⁺= A⁺,
(AA⁺)^T= AA⁺,
(A⁺A)^T= A⁺A .

Problems

31-1 Shamir's boolean matrix multiplication algorithm

a. Show that R = (, , , 0, 1), where is the XOR (exclusive-or) function, is a ring.

Let A = (a_ij) and B = (b_ij) be n n boolean matrices, and let C = (c_ij) = AB in the quasiring Q. Generate A' = (a'_ij) from A using the following randomized procedure:

If a_ij = 0, then let a'_ij = 0.

If A_ij = 1, then let a'_ij = 1 with probability 1/2 and let a'_ij = 0 with probability 1/2. The random choices for each entry are independent.

b Let C' = (c'_ij) = A'B in the ring R. Show that c_ij = 0 implies c'_ij = 0. Show that c_ij = 1 implies c'_ij = 1 with probability 1/2.

31-2 Tridiagonal systems of linear equations

Consider the tridiagonal matrix

a. Find an LU decomposition of A.

b. Solve the equation Ax = (1 1 1 1 1)^T by using forward and back substitution.

c. Find the inverse of A.

e. Show that for any n n nonsingular, tridiagonal matrix A and any n-vector b, the equation Ax = b can be solved in O(n) time by performing an LUP decomposition.

31-3 Splines

To ensure continuity of f(x), we require that

f(x_i) = f_i(0) = y_i,
f(x_i₊₁) = f_i(1) = y_i₊₁

for i = 0, 1, . . . , n - 1. To ensure that f(x) is sufficiently smooth, we also insist that there be continuity of the first derivative at each knot:

for i = 0, 1, . . . , n - 1.

The question remains of how to choose the first derivatives of f(x) at the knots. One method is to require the second derivatives to be continuous at the knots:

for i = 0, 1, . . . ,n - 1. At the first and last knots, we assume that ; these assumptions make f(x) a natural cubic spline.

b. Use the continuity constraints on the second derivative to show that for i = 1, 2, . . . , n - 1,

D_i + 4D_i + D_i_{+ 1} = 3(y_i_{+ 1} - y_i_{- 1}) .

c. Show that

2D₀+ D₁ = 3(y₁ - y₀) ,

D_n+ 2D_n = 3(y_n - y_n_{- 1}) .

d. Rewrite equations (35)--(37) as a matrix equation involving the vector D = D₀, D₁, . . . , D_n of unknowns. What attributes does the matrix in your equation have?

e. Argue that a set of n + 1 point-value pairs can be interpolated with a natural cubic spline in O(n) time (see Problem 31-2).

Chapter notes

Politica de confidentialitate | Termeni si conditii de utilizare

DISTRIBUIE DOCUMENTUL

Vizualizari: 1904
Importanta:

Comenteaza documentul:

Te rugam sa te autentifici sau sa iti faci cont pentru a putea comenta

Creaza cont nou

Distribuie URL
https://www.scrigroup.com/limba/engleza/108/Symmetric-positivedefinite-mat33829.php

Adauga cod HTML in site
<a href="https://www.scrigroup.com/limba/engleza/108/Symmetric-positivedefinite-mat33829.php" target="_blank" title=" - https://www.scrigroup.com/limba/engleza/108/Symmetric-positivedefinite-mat33829.php">Symmetric positive-definite matrices and least-squares approximation</a>