Emile Timothy

Arbitrary-Precision Approximations of Lipschitz Functions by Linear Combinations of Bernstein Polynomials

Let \(P_n\) denote the set of all real polynomials of degree \(\leq n\). For any function \(f : [0, 1] \to \mathbb{R}\), we define the \(\ell_\infty\)-norm as $$\|f\|_\infty = \max\{|f(x)|: x \in [0, 1]\}$$ The Weierstrass approximation theorem says that for any continuous real function \(f\) on \([0, 1]\), $$\lim_{n\to\infty} \min_{p \in P_n} \|p - f\|_\infty = 0$$ This means that any continuous function on \([0,1]\) can be approximated arbitrarily well by polynomials of sufficiently high degree. In 1912, Bernstein gave an elegant probabilistic proof of this theorem. In this post, I'm going to prove a quantitative variant of the approximation theorem in which we place a Lipschitz condition on \(f\): that there is a \(C < \infty\) such that $$\sup_{x\neq y \in [0,1]} \frac{|f(x) - f(y)|}{|x - y|} \leq C$$ The goal of this post is to show that \(\|f - b_f\|_\infty \leq O(C/\sqrt{n})\), where \(b_f\) is the degree-\(n\) Bernstein approximation of \(f\), defined below. This means that I'm going to bound the furthest deviation of the approximation from \(f\) by a term that tends to \(0\) (at the rate of the inverse square root of \(n\)) as \(n\to\infty\). Along the way, I'll define the explicit polynomials from which this approximation is built: the Bernstein polynomials.
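As a concrete example of the Lipschitz condition (a sanity check of mine, not part of the proof): the function \(f(x) = |x - 1/2|\) is \(1\)-Lipschitz, which a crude numerical sweep can corroborate. The helper name and grid resolution below are illustrative choices.

```python
# A minimal sketch: numerically lower-bound the Lipschitz constant of
# f(x) = |x - 1/2| on [0, 1] by brute force over a uniform grid.
# The choice of f and the grid resolution are illustrative assumptions.
import numpy as np

def lipschitz_estimate(f, num_points=500):
    """Lower-bound sup |f(x)-f(y)| / |x-y| over a uniform grid on [0, 1]."""
    xs = np.linspace(0.0, 1.0, num_points)
    fx = f(xs)
    # Pairwise slopes |f(x)-f(y)| / |x-y| for all distinct grid points.
    num = np.abs(fx[:, None] - fx[None, :])
    den = np.abs(xs[:, None] - xs[None, :])
    mask = den > 0
    return (num[mask] / den[mask]).max()

f = lambda x: np.abs(x - 0.5)
print(lipschitz_estimate(f))  # ~1.0, consistent with C = 1
```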

The Bernstein polynomials of degree \(n\) are defined for \(0 \leq k \leq n\) as: $$b_{n,k}(x) = {n\choose k}x^k(1-x)^{n-k}$$ The Bernstein approximation of a function \(f\) is then given by $$b_f(x)=\sum_{k=0}^n f(k/n) b_{n,k}(x)$$ We now show that \(\|f-b_f\|_\infty \leq O(C/\sqrt{n})\) whenever \(f\) is \(C\)-Lipschitz. For \(x\in[0,1]\), let \(\{K_i\}_{i=1}^n\) be independently and identically distributed Bernoulli random variables with expectation \(x\), and let \(K = \sum_{i=1}^n K_i\). Then, \(\Pr(K_i = 1) = x\) and \(\Pr(K_i = 0) = 1-x\).
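Here is a minimal sketch of the Bernstein approximation in code, directly transcribing the formula above (the helper name `bernstein_approx` and the use of `scipy.special.comb` are my own choices):

```python
# A minimal sketch of the Bernstein approximation b_f of degree n,
# transcribing b_f(x) = sum_{k=0}^{n} f(k/n) * C(n,k) x^k (1-x)^(n-k).
import numpy as np
from scipy.special import comb

def bernstein_approx(f, n):
    """Return the degree-n Bernstein approximation of f as a callable."""
    ks = np.arange(n + 1)
    coeffs = f(ks / n)  # f evaluated at the grid points k/n
    def b_f(x):
        x = np.asarray(x, dtype=float)
        # b_{n,k}(x) = C(n,k) x^k (1-x)^(n-k), summed against f(k/n)
        basis = comb(n, ks) * x[..., None]**ks * (1 - x[..., None])**(n - ks)
        return basis @ coeffs
    return b_f
```

For instance, `bernstein_approx(lambda t: np.abs(t - 0.5), 50)` returns a callable evaluating a degree-50 polynomial approximation of \(|x - 1/2|\).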

We observe that \(b_{n,i}(x) = {n\choose i}x^i (1-x)^{n-i} = \Pr[K = i]\), since \(K = \sum_{i=1}^n K_i\) with \(K_i \sim \mathrm{Bernoulli}(x)\) implies \(K \sim \mathrm{Bin}(n,x)\). Then, \(b_f(x) = \mathbb{E}[f(K/n)]\), since \(b_f(x) = \sum_{i=0}^n f(i/n) b_{n,i}(x) = \sum_{i=0}^n f(i/n) \Pr[K = i] = \mathbb{E}[f(K/n)]\).
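As a quick sanity check of the identity \(b_f(x) = \mathbb{E}[f(K/n)]\), one can compare the exact sum against a Monte Carlo average over binomial samples (the sample count, degree, and test point below are arbitrary choices of mine):

```python
# A minimal sketch verifying b_f(x) = E[f(K/n)] for K ~ Bin(n, x):
# compare the exact sum against a Monte Carlo estimate.
import numpy as np

rng = np.random.default_rng(0)
f = lambda t: np.abs(t - 0.5)   # an illustrative 1-Lipschitz function
n, x = 50, 0.3                  # arbitrary degree and evaluation point

K = rng.binomial(n, x, size=200_000)   # samples of K ~ Bin(n, x)
mc_estimate = f(K / n).mean()          # Monte Carlo estimate of E[f(K/n)]

exact = bernstein_approx(f, n)(x)      # exact sum, from the earlier sketch
print(mc_estimate, exact)              # the two should agree to ~3 decimals
```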

\(\DeclareMathOperator*{\EE}{\mathbb{E}}\) Note that if \(|f(x) - b_f(x)| \leq \beta\) for all \(x\in[0,1]\), then \(\|f-b_f\|_\infty \leq \beta\) as well. So, it suffices to show that for all \(x\in[0,1]\), we have \(|f(x) - b_f(x)|\leq O(C/\sqrt{n})\). Since \(b_f(x) = \EE[f(K/n)]\) from earlier, we must have that: \begin{align*} |f(x)-b_f(x)| &= |f(x) - \EE[f(K/n)]| = |\EE[f(x) - f(K/n)]| \end{align*} The last equality follows by linearity of expectation (\(\EE[\alpha X + \beta Y] = \alpha \EE[X] + \beta\EE[Y]\) for constants \(\alpha,\beta\) and random variables \(X,Y\)), since \(f(x)\) is a constant for fixed \(x\). To simplify this further, we note that the absolute value is a convex function, since \(|\alpha X + \beta Y| \leq \alpha |X| + \beta|Y|\) for \(\alpha,\beta\geq 0\) with \(\alpha + \beta = 1\) (by the triangle inequality). Therefore, Jensen's inequality holds for it: for a convex function \(\zeta\), we have \(\zeta(\EE[X]) \leq \EE[\zeta(X)]\). Setting \(\zeta = |\cdot|\), we get: \begin{align*} |\EE[f(x) - f(K/n)]|&\leq \EE[|f(x) - f(K/n)|] \end{align*} Since \(f\) is \(C\)-Lipschitz continuous, we have \(|f(x) - f(K/n)|/|x-K/n| \leq C \implies |f(x) - f(K/n)|\leq C|x-K/n|\). So, \begin{align*} |f(x)-b_f(x)| &\leq \EE[|f(x)-f(K/n)|] \leq \EE[C|x-K/n|] \\ &= C\EE[|x-K/n|] \end{align*} In the last line, we pulled the constant \(C\) out by linearity of expectation. To bound the remaining expectation, we define a family of events \(\{D_i\}_{i=0}^n\): $$D_i = \bigg[\frac{i}{\sqrt{n}}<\bigg|\frac{K}{n}-x\bigg|\leq \frac{i+1}{\sqrt{n}}\bigg]$$ Since \(|K/n - x| \leq 1 \leq (n+1)/\sqrt{n}\), these events cover every outcome except \(K/n = x\), which contributes nothing to the expectation. So, by the law of total expectation, conditioning on the \(D_i\) gives us: \begin{align*} |f(x) - b_f(x)| \leq C\EE[|x-K/n|] = C\cdot \sum_{i=0}^n \EE[|x-K/n| \mid D_i]\Pr(D_i) \end{align*} Note that since \(D_i = [\frac{i}{\sqrt{n}}<|\frac{K}{n}-x|\leq \frac{i+1}{\sqrt{n}}]\), we can write \(\EE[|x-K/n|\mid D_i]\leq \frac{i+1}{\sqrt{n}}\). Additionally, we upper bound \(\Pr(D_i)\) by dropping the upper constraint and keeping only the lower threshold: \begin{align*} \Pr(D_i) &= \Pr\bigg(\frac{i}{\sqrt{n}}<\bigg|\frac{K}{n}-x\bigg|\leq\frac{i+1}{\sqrt{n}}\bigg) \leq \Pr\bigg(\frac{i}{\sqrt{n}}<\bigg|\frac{K}{n}-x\bigg|\bigg) \\ \end{align*} Splitting the absolute value into its two cases yields: \begin{align*} \Pr\bigg(\frac{i}{\sqrt{n}}<\bigg|\frac{K}{n}-x\bigg|\bigg) &= \Pr\bigg(\frac{i}{\sqrt{n}}<\frac{K}{n}-x\bigg) + \Pr\bigg(\frac{i}{\sqrt{n}} < x - \frac{K}{n}\bigg) \\ &= \Pr(i\sqrt{n}< K-x n) + \Pr(i\sqrt{n}< x n - K) \end{align*} Note that \(\EE[K-xn]=\EE[K]-xn = xn - xn = 0\), and likewise \(\EE[xn - K] = 0\), so both tails are deviations of a centered sum. This sets us up for the Chernoff bound.
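To make the key intermediate bound \(|f(x) - b_f(x)| \leq C\,\EE[|x - K/n|]\) concrete, here is a hedged numerical check that computes the right-hand side exactly from the binomial pmf (the test function and parameters are illustrative assumptions of mine):

```python
# A minimal sketch checking |f(x) - b_f(x)| <= C * E[|x - K/n|]
# exactly, using the Binomial(n, x) pmf for the expectation.
import numpy as np
from scipy.stats import binom

f = lambda t: np.abs(t - 0.5)   # illustrative 1-Lipschitz function, C = 1
C, x = 1.0, 0.3                 # Lipschitz constant and a test point

for n in [10, 100, 1000]:
    ks = np.arange(n + 1)
    pmf = binom.pmf(ks, n, x)                    # Pr[K = k]
    lhs = abs(f(x) - np.sum(f(ks / n) * pmf))    # |f(x) - b_f(x)|
    rhs = C * np.sum(np.abs(x - ks / n) * pmf)   # C * E[|x - K/n|]
    print(n, lhs <= rhs, rhs * np.sqrt(n))       # rhs*sqrt(n) stays bounded
```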

Before I continue, I'm quickly going to state the Chernoff bound (in its additive, Hoeffding form): Let \(X_i\) for \(1\leq i\leq n\) be Bernoulli with \(\EE[X_i]=p_i\). Set \(Y_i = X_i - p_i\), so \(\EE[Y_i]=0\), and set \(Y = \sum_{i=1}^n Y_i\). Then, for \(\lambda>0\), we must have that \(\Pr[Y>\lambda\sqrt{n}]\leq e^{-2\lambda^2}\).
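As a hedged empirical check of this tail bound, one can compare the exact binomial tail against \(e^{-2\lambda^2}\) (the parameters below are arbitrary choices of mine):

```python
# A minimal sketch checking Pr[K - nx > lambda*sqrt(n)] <= exp(-2*lambda^2)
# for K ~ Bin(n, x), using the exact binomial survival function.
import numpy as np
from scipy.stats import binom

n, x = 1000, 0.3
for lam in [0.5, 1.0, 2.0]:
    tail = binom.sf(n * x + lam * np.sqrt(n), n, x)  # Pr[K > nx + lam*sqrt(n)]
    print(lam, tail, np.exp(-2 * lam**2), tail <= np.exp(-2 * lam**2))
```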

Therefore, the Chernoff bound tells us that \(\Pr(i\sqrt{n}< K-xn) \leq e^{-2i^2}\) and \(\Pr(i\sqrt{n}< xn-K) \leq e^{-2i^2}\). So, \(\Pr(D_i)\leq 2e^{-2i^2}\). Combining this with \(\EE[|x-K/n|\mid D_i] \leq \frac{i+1}{\sqrt{n}}\), we get that: \begin{align*} |f(x) - b_f(x)| &\leq C\cdot \sum_{i=0}^n \EE[|x-K/n| \mid D_i]\Pr(D_i) \leq C\sum_{i=0}^n \frac{2(i+1)e^{-2i^2}}{\sqrt{n}} = \frac{2C}{\sqrt{n}} \sum_{i=0}^n (i+1) e^{-2i^2} \end{align*} We now bound the resulting finite sum by its infinite counterpart: \(\sum_{i=0}^n (i+1)e^{-2i^2} \leq \sum_{i=0}^\infty (i+1)e^{-2i^2} = 1 + \sum_{i=1}^\infty (i+1)e^{-2i^2} \leq 1 + 2\sum_{i=1}^\infty ie^{-2i^2}\), where the last step uses \(i + 1 \leq 2i\) for \(i\geq 1\). Next, \(e^{2i^2}\geq i^3\) for all \(i\geq 1\), since taking natural logs on both sides reduces this to \(2i^2 \geq 3\ln i\), which holds for every \(i\geq 1\). Thus, \(ie^{-2i^2}\leq i/i^3 = 1/i^2\), and so \(\sum_{i=1}^\infty ie^{-2i^2}\leq \sum_{i=1}^\infty \frac{1}{i^2} = \frac{\pi^2}{6}\). Therefore: \begin{align*} |f(x) - b_f(x)| &\leq \frac{2C}{\sqrt{n}} \sum_{i=0}^n (i+1) e^{-2i^2} \\ &\leq \frac{2C}{\sqrt{n}} \cdot \bigg(1 + \frac{\pi^2}{3}\bigg) \\ &= O(C/\sqrt{n}) \end{align*} Hence, \(|f(x) - b_f(x)| \leq O(C/\sqrt{n})\) for every \(x\in[0,1]\). But, from our earlier observation, this then implies that \begin{align*} \boxed{\|f - b_f\|_\infty \leq O(C/\sqrt{n})} \end{align*} This proves the claim. So, all \(C\)-Lipschitz functions can be approximated arbitrarily well by a linear combination of Bernstein polynomials, where the furthest deviation errs by at most \(O(C/\sqrt{n})\), where \(n\) is the degree of the Bernstein polynomial.
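As a closing sanity check (my own numerical experiment, not part of the proof), one can estimate \(\|f - b_f\|_\infty\) on a grid for the \(1\)-Lipschitz function \(f(x) = |x - 1/2|\) and watch the error decay like \(1/\sqrt{n}\):

```python
# A minimal sketch: estimate ||f - b_f||_inf on a grid and confirm
# that sqrt(n) * error stays bounded, consistent with O(C / sqrt(n)).
import numpy as np
from scipy.stats import binom

f = lambda t: np.abs(t - 0.5)   # 1-Lipschitz, so C = 1
xs = np.linspace(0, 1, 1001)    # evaluation grid (a discretization choice)

for n in [16, 64, 256, 1024]:
    ks = np.arange(n + 1)
    # b_f(x) = sum_k f(k/n) * Pr[K = k] with K ~ Bin(n, x), per the proof
    bf = binom.pmf(ks[None, :], n, xs[:, None]) @ f(ks / n)
    err = np.max(np.abs(f(xs) - bf))
    print(n, err, np.sqrt(n) * err)  # sqrt(n) * err should remain O(1)
```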