Transfunctions Applied to Plans, Markov Operators and Optimal Transport

Jason Bentley; Piotr Mikusiński

doi:10.2478/amsil-2024-0020

1. Introduction

Let ℳ_X and ℳ_Y be vector spaces of finite signed measures defined on measurable spaces (X, ∑_X) and (Y, ∑_Y), respectively. For a finite positive measure µ on (X, ∑_X) and a real-valued function f ∈ ℒ¹(X, µ), let fµ denote the measure A ↦ ∫_A f dµ, and define $\mathcal{M}_\mu^{p,+} := \{ f\mu : f \in \mathcal{L}^p(X,\mu),\ f \ge 0 \}$ and $\mathcal{M}_\mu^{p} := \{ f\mu : f \in \mathcal{L}^p(X,\mu) \}$ for p ∈ [1, ∞]. We define ℳ_µ to be the set of all finite signed measures absolutely continuous with respect to µ. By the Radon–Nikodym Theorem, $\mathcal{M}_\mu = \mathcal{M}_\mu^1$. Similarly, we define $\mathcal{M}_\nu^{p,+}$ and $\mathcal{M}_\nu^{p}$ for a finite positive measure ν on (Y, ∑_Y).

A transfunction is any function Φ: ℳ_X → ℳ_Y , [8]. Strongly σ-additive transfunctions are those which are linear and continuous with respect to total variation. We will sometimes call an operator between Banach spaces strongly σ-additive if it is linear and norm-continuous.

Plans have applications for finding weak solutions for optimal transport problems, [10]. Markov operators, defined in Section 2, have some similarities to stochastic matrices, [4]. Plans and Markov operators have a bijective correspondence as described in [9] and in Section 2. We assign to any corresponding Markov operator/plan pair (T, κ) with marginals µ, ν a unique transfunction Φ: ℳ_µ → ℳ_ν, called a Markov transfunction. However, each Markov transfunction corresponds to a family of Markov operators (resp. plans) which have different marginals but follow the same “instructions”. Φ, T, and κ are related via the equalities $\begin{aligned} \Phi(1_A\mu)(B) &= \int_B T(1_A)\,d\nu = \kappa(A \times B), \\ \int_Y g\,d\Phi(f\mu) &= \int_Y T(f)\,d(g\nu) = \int_{X \times Y} (f \otimes g)\,d\kappa, \end{aligned}$ which hold for all measurable A ⊆ X, B ⊆ Y, and for all f ∈ ℒ^∞(X) and g ∈ ℒ^∞(Y). The first set of equalities, although simpler, implies the second set by strong σ-additivity of Φ, bounded linearity of T, and σ-additivity of κ.

In our investigation of transfunctions we are motivated by the theory developed for the Monge-Kantorovich transportation problems and their far-reaching outcomes; see [1], [5], [6], and [10].

Let ℱ_X and ℱ_Y be spaces of measurable functions which are integrable by measures in ℳ_X and ℳ_Y, respectively. If {ℱ_X, ℳ_X} and {ℱ_Y, ℳ_Y} are separating pairs with respect to integration as defined in Section 3, then we define the Radon adjoint of Φ: ℳ_X → ℳ_Y (if it exists) to be the unique linear bounded operator Φ* : ℱ_Y → ℱ_X such that $\int_X \Phi^*(g)\,d\lambda = \int_Y g\,d\Phi(\lambda)$ for all g ∈ ℱ_Y and λ ∈ ℳ_X.

If X and Y are second-countable locally compact Hausdorff spaces, if ℱ_X and ℱ_Y are Banach spaces of bounded continuous functions (uniform norm) and if ℳ_X and ℳ_Y are Banach spaces of finite regular signed measures (total variation), then any strongly σ-additive weakly-continuous transfunction Φ: ℳ_X → ℳ_Y has an adjoint Φ* which is a linear, uniformly-continuous and bounded-pointwise-continuous operator (and vice versa) such that ||Φ|| = ||Φ*||. When X and Y are also non-atomic, ℳ_X and ℳ_Y include measures which are non-atomic and strictly-positive; see [2].

In future research, we wish to develop functional analysis on transfunctions, and adjoints may be utilized to this end. In contexts where operators on functions are more appropriate or preferable, the adjoint may prove crucial.

A simple transfunction Φ: ℳ_X → ℳ_Y is one which has the form $\Phi(\lambda) := \sum_{i=1}^{m} \langle f_i, \lambda\rangle\,\rho_i$ for f₁, . . . , f_m ∈ ℱ_X and ρ₁, . . . , ρ_m ∈ ℳ_Y, where 〈f_i, λ〉 := ∫_X f_i dλ. Simple transfunctions are weakly-continuous and strongly σ-additive. When working with locally compact Polish (metric) spaces, simple Markov transfunctions have two advantages: they weakly approximate all Markov transfunctions, and a subclass of them can be utilized to approximate the optimal cost between two marginals with respect to a transport cost c(x, y) that is bounded by αd(x, y)^p for constants α, p > 0.
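As an illustration only, the following Python sketch evaluates a simple transfunction on finite sets, where a measure is stored as a dictionary of point masses and each f_i as a dictionary of function values. The helper names `pair` and `simple_transfunction` and all sample data are assumptions made for this sketch; they are not notation from the paper.

```python
from fractions import Fraction as F

def pair(f, lam):
    """<f, lam>: integral of f against the finite signed measure lam."""
    return sum(f[x] * w for x, w in lam.items())

def simple_transfunction(fs, rhos):
    """Return Phi with Phi(lam) = sum_i <f_i, lam> rho_i."""
    def phi(lam):
        out = {}
        for f, rho in zip(fs, rhos):
            c = pair(f, lam)
            for y, w in rho.items():
                out[y] = out.get(y, F(0)) + c * w
        return out
    return phi

# Hypothetical data: X = {0, 1, 2}, Y = {"a", "b"}.
f1 = {0: F(1), 1: F(1, 2), 2: F(0)}
f2 = {0: F(0), 1: F(1, 2), 2: F(1)}      # f1 + f2 = 1 on X
rho1 = {"a": F(1), "b": F(0)}            # probability measures on Y
rho2 = {"a": F(1, 4), "b": F(3, 4)}
phi = simple_transfunction([f1, f2], [rho1, rho2])

lam = {0: F(2), 1: F(4), 2: F(-1)}       # a finite signed measure on X
image = phi(lam)                         # masses 17/4 at "a" and 3/4 at "b"
```

In this particular example the functions f_i are nonnegative and sum to 1, and the ρ_i are probability measures, so Φ is positive and preserves total mass.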

In [3] the notions of localization of a transfunction and the graph of a transfunction are introduced and studied. They give us an insight into which transfunctions arise from continuous functions or measurable functions or are close to such functions.

This paper is based on part of Bentley’s PhD dissertation.

2. Markov transfunctions

In this section, we describe a class of transfunctions in which each transfunction corresponds to a family of plans and a family of Markov operators. First, we introduce these concepts. All measurable or continuous functions shall be real-valued in this text. Note that the following definitions allow for all finite positive measures rather than all probability measures.

Definition 2.1

Let µ and ν be finite positive measures on (X, ∑_X) and (Y, ∑_Y ) respectively with ||µ|| = ||ν||. Let κ be a finite positive measure on the product measurable space (X × Y, ∑_X×Y ). We say that κ is a plan with marginals µ and ν if κ(A × Y ) = µ(A) and κ(X × B) = ν(B) for all A ∈ ∑_X and B ∈ ∑_Y . We define Π(µ, ν) to be the set of all plans with marginals µ and ν.

If random variables X, Y have laws µ, ν, then any coupling of X, Y has a law κ which is a plan in Π(µ, ν).
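For finite X and Y a plan is simply a nonnegative matrix with prescribed row and column sums. The sketch below checks Definition 2.1 for two different plans sharing the same marginals; the sets, the measures and the function names are hypothetical choices made for illustration.

```python
from fractions import Fraction as F

# Hypothetical finite example: X = {0, 1}, Y = {0, 1, 2}, with ||mu|| = ||nu|| = 1.
mu = [F(1, 2), F(1, 2)]
nu = [F(1, 4), F(1, 4), F(1, 2)]

def marginals(kappa):
    """Return (first marginal, second marginal) of a matrix kappa[x][y]."""
    first = [sum(row) for row in kappa]
    second = [sum(row[y] for row in kappa) for y in range(len(kappa[0]))]
    return first, second

def is_plan(kappa, mu, nu):
    """Check positivity and both marginal conditions of Definition 2.1."""
    nonneg = all(w >= 0 for row in kappa for w in row)
    return nonneg and marginals(kappa) == (mu, nu)

# The product plan (independent coupling), normalised by the common total mass.
total = sum(mu)
product_plan = [[m * n / total for n in nu] for m in mu]

# A different plan with the same marginals.
other_plan = [[F(1, 4), F(1, 4), F(0)],
              [F(0),    F(0),    F(1, 2)]]
```

Both matrices belong to Π(µ, ν), which shows that the marginals alone do not determine the plan.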

Definition 2.2

Let µ and ν be finite positive measures on (X, ∑_X) and (Y, ∑_Y ) respectively with ||µ|| = ||ν||, and let p ∈ [1, ∞]. We say that a function T : ℒ^p(X, µ) → ℒ^p(Y, ν) is a Markov operator if:

(i) T is linear with T 1_X = 1_Y;
(ii) f ≥ 0 implies T f ≥ 0 for all f ∈ ℒ^p(X, µ);
(iii) ∫_X f dµ = ∫_Y T f dν for all f ∈ ℒ^p(X, µ).
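In the finite case the three conditions can be checked directly. The sketch below builds an operator T from a hypothetical plan κ by the formula (T f)(y) = (1/ν(y)) Σ_x κ(x, y) f(x), which assumes ν(y) > 0 for every y, and verifies (i) to (iii) on sample functions. The matrix and the names `T` and `integral` are assumptions of this sketch.

```python
from fractions import Fraction as F

# Hypothetical plan on X = {0, 1}, Y = {0, 1, 2}.
kappa = [[F(1, 4), F(1, 8), F(1, 8)],
         [F(0),    F(1, 8), F(3, 8)]]
mu = [sum(row) for row in kappa]                       # first marginal
nu = [sum(row[y] for row in kappa) for y in range(3)]  # second marginal

def T(f):
    """(T f)(y) = (1 / nu(y)) * sum_x kappa(x, y) f(x), assuming nu(y) > 0."""
    return [sum(kappa[x][y] * f[x] for x in range(2)) / nu[y] for y in range(3)]

def integral(values, measure):
    """Integral of a function (list of values) against a measure (list of masses)."""
    return sum(v * w for v, w in zip(values, measure))

f = [F(3), F(-1)]
Tf = T(f)
```

By construction, ∫_B T(1_A) dν = κ(A × B) in this example, which is the relation between Markov operators and plans used throughout this section.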

Notice that the definition of Markov operators depends on underlying measures µ and ν on X and Y respectively, even when p = ∞. We now define some properties for transfunctions that are analogous to (ii) and (iii) from Definition 2.2.

Definition 2.3

Let Φ: ℳ_X → ℳ_Y be a transfunction.

(i) Φ is positive if λ ≥ 0 implies that Φλ ≥ 0 for all λ ∈ ℳ_X.
(ii) Φ is measure-preserving if (Φλ)(Y) = λ(X) for all λ ∈ ℳ_X.
(iii) Φ is Markov if it is strongly σ-additive, positive and measure-preserving.

By [9], there is a bijective relationship between plans and Markov operators. We will show soon that a relationship between Markov operators and Markov transfunctions exists, which will imply that all three concepts are connected.

Lemma 2.4

Let µ be a finite positive measure on (X, ∑_X), and define the map J_µ : ℒ¹(X, µ) → ℳ_µ via J_µf := fµ. Then J_µ (hence $J_\mu^{-1}$) is a positive linear isometry.

Proof

Positivity and linearity of integrals with respect to µ ensure that J_µ is positive and linear. Surjectivity of J_µ is the statement of the Radon–Nikodym Theorem. Injectivity and isometry hold because $\|J_\mu f\| = \|J_\mu(f^+) - J_\mu(f^-)\| = \int_X f^+\,d\mu + \int_X f^-\,d\mu = \int_X |f|\,d\mu = \|f\|_1.$

Theorem 2.5

Let µ and ν be finite positive measures on X and Y respectively, with ||µ|| = ||ν||, and let s ∈ [1, ∞]. For every Markov operator T : ℒ^s(X, µ) → ℒ^s(Y, ν), there exists a unique Markov transfunction $\Phi : \mathcal{M}_\mu^s \to \mathcal{M}_\nu^s$ such that $\int_B T(1_A)\,d\nu = \Phi(1_A\mu)(B)$ for all A ∈ ∑_X and B ∈ ∑_Y.

Every Markov transfunction $\Phi : \mathcal{M}_\mu^s \to \mathcal{M}_\nu^s$ corresponds to a family of Markov operators $\{T_{\lambda,\rho} : \mathcal{L}^\infty(X,\lambda) \to \mathcal{L}^\infty(Y,\rho) \mid \lambda \in \mathcal{M}_\mu^{s,+},\ \rho = \Phi\lambda\}$ which satisfies $\int_B T_{\lambda,\rho}(1_A)\,d\rho = \Phi(1_A\lambda)(B)$ for all A ∈ ∑_X and B ∈ ∑_Y.

Proof

First, we prove each statement for s = 1, then extend the argument to other values of s. Let T : ℒ¹(X, µ) → ℒ¹(Y, ν) be a Markov operator. Define $\Phi = J_\nu\,T\,J_\mu^{-1}$. Since all three operators in the definition of Φ are positive and strongly σ-additive, we see that Φ is also positive and strongly σ-additive. Next, if λ ∈ ℳ_µ, then $(\Phi\lambda)(Y) = J_\nu(T J_\mu^{-1}\lambda)(Y) = \int_Y T(J_\mu^{-1}\lambda)\,d\nu = \int_X J_\mu^{-1}(\lambda)\,d\mu = \lambda(X)$ by the definitions of the isometries $J_\mu^{-1}$ and J_ν, and by property (iii) of T, so Φ is measure-preserving. Finally, notice that $\Phi(1_A\mu)(B) = J_\nu T(J_\mu^{-1}(1_A\mu))(B) = J_\nu(T 1_A)(B) = \int_B T(1_A)\,d\nu$ for all A ∈ ∑_X and B ∈ ∑_Y, hence the relation holds.

Now let s ∈ (1, ∞] and let T : ℒ^s(X, µ) → ℒ^s(Y, ν) be a Markov operator. By Theorem 1 from [9], T can be uniquely extended to a Markov operator T̂ on ℒ¹(X, µ). By our previous argument, T̂ corresponds to a Markov transfunction Φ̂ defined on ℳ_µ. We define Φ to be the restriction of Φ̂ to $\mathcal{M}_\mu^s$. The necessary properties are inherited from the previous argument.

Now we prove the second statement. Let s ∈ [1, ∞], let $\Phi : \mathcal{M}_\mu^s \to \mathcal{M}_\nu^s$ be a Markov transfunction, let $\lambda \in \mathcal{M}_\mu^s$ be positive, and define $\rho := \Phi(\lambda) \in \mathcal{M}_\nu^s$, which is also positive. Define $T = T_{\lambda,\rho} := J_\rho^{-1}\,\Phi\,J_\lambda$ with domain ℒ^∞(X, λ). Then $T(1_X) = J_\rho^{-1}\Phi(J_\lambda(1_X)) = J_\rho^{-1}(\Phi\lambda) = J_\rho^{-1}\rho = 1_Y.$

Since all three operators in the definition of T are positive and strongly σ-additive, we see that T is also positive and strongly σ-additive, satisfying parts (i) and (ii) of Definition 2.2. Next, if f ∈ ℒ^∞(X, λ), then $\int_Y T f\,d\rho = \int_Y J_\rho^{-1}(\Phi J_\lambda f)\,d\rho = (\Phi(J_\lambda f))(Y) = (J_\lambda f)(X) = \int_X f\,d\lambda,$ so (iii) of Definition 2.2 is met. Finally, notice that $\int_B T(1_A)\,d\rho = \int_B J_\rho^{-1}(\Phi J_\lambda(1_A))\,d\rho = \Phi(J_\lambda(1_A))(B) = \Phi(1_A\lambda)(B)$ for all A ∈ ∑_X and B ∈ ∑_Y, so the relation holds.

One consequence of Theorem 2.5 is that any Markov transfunction defined on $\mathcal{M}_\mu^s$ for s ∈ [1, ∞] uniquely extends or restricts to $\mathcal{M}_\mu^{s'}$ for all s′ ∈ [1, ∞], thus the value of s is insignificant. This is analogous to a similar property held by Markov operators, as in [9].

The remainder of this section aims to emphasize the importance of Theorem 2.5. For any p ∈ [1, ∞], a transfunction $\Phi : \mathcal{M}_\mu^p \to \mathcal{M}_\nu^p$, a Markov operator T : ℒ^p(X, µ) → ℒ^p(Y, ν) and a plan κ ∈ Π(µ, ν) that satisfy the equalities $\Phi(1_A\mu)(B) = \int_B T(1_A)\,d\nu = \kappa(A \times B)$ for all measurable A ⊆ X and B ⊆ Y contain the same information (transportation method), but convey it differently. By extending the equalities above to all f ∈ ℒ^p(X, µ) and g ∈ ℒ^q(Y, ν) with 1/p + 1/q = 1, we have $\int_Y g\,d\Phi(f\mu) = \int_Y T(f)\,d(g\nu) = \int_{X \times Y} (f \otimes g)\,d\kappa.$

Note that if some positive measure µ′ also generates $\mathcal{M}_\mu^p$, and if we define ν′ = Φ(µ′), then the same transfunction $\Phi : \mathcal{M}_\mu^p \to \mathcal{M}_\nu^p$ corresponds to a Markov operator T′ : ℒ^p(X, µ′) → ℒ^p(Y, ν′) and to a plan κ′ with marginals µ′ and ν′. Therefore T and T′ are different Markov operators, κ and κ′ are different plans, yet they follow the same “instructions” encoded by Φ. In this regard, Φ is a global way to describe a transportation method independent of marginals. If µ′ instead generates a smaller space than ℳ_µ, then Φ restricted to ℳ_µ′ contains part but not all of the instructions. Regardless, Φ will be Markov on this restriction. Notably, if µ′ = hµ, then Φ: ℳ_µ′ → ℳ_ν′ has associated Markov operator T_h(f) := T(hf) and associated plan κ′ = (h ⊗ 1_Y)κ.
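The independence from marginals can be seen in a finite model. In the sketch below the “instructions” are encoded by a hypothetical row-stochastic kernel P, so that Φ(λ)(y) = Σ_x λ(x) P[x][y]; changing the marginal from µ to µ′ = hµ changes the plan to (h ⊗ 1_Y)κ while P stays fixed. The kernel, the density h and the function names are assumptions made for illustration.

```python
from fractions import Fraction as F

# Hypothetical "instructions": a row-stochastic kernel on X = {0, 1}, Y = {0, 1, 2}.
P = [[F(1, 2), F(1, 4), F(1, 4)],
     [F(0),    F(1, 4), F(3, 4)]]

def phi(lam):
    """Phi(lam)(y) = sum_x lam(x) P[x][y]; linear, positive and mass-preserving."""
    return [sum(lam[x] * P[x][y] for x in range(2)) for y in range(3)]

def plan(lam):
    """Plan kappa(x, y) = lam(x) P[x][y], with marginals lam and phi(lam)."""
    return [[lam[x] * P[x][y] for y in range(3)] for x in range(2)]

mu = [F(1, 2), F(1, 2)]
h = [F(3, 2), F(1, 2)]                       # density of mu' with respect to mu
mu_prime = [h[x] * mu[x] for x in range(2)]

kappa = plan(mu)
kappa_prime = plan(mu_prime)
# (h ⊗ 1_Y) kappa: rescale row x of kappa by h(x).
rescaled = [[h[x] * w for w in kappa[x]] for x in range(2)]
```

The two plans differ, yet both are produced by the same kernel, which is the finite analogue of one transfunction serving a whole family of marginals.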

3. Radon adjoints of transfunctions

Let (X, ∑_X) be a Borel measurable space, let ℱ_X be a subset of bounded measurable real-valued functions on X and let ℳ_X be a subset of finite signed measures on X. Analogously, we have Y , ℱ_Y and ℳ_Y . For f ∈ ℱ_X and λ ∈ ℳ_X, define 〈f, λ〉 := ∫_X f dλ. Similarly, for g ∈ ℱ_Y and ρ ∈ ℳ_Y , define 〈g, ρ 〉 := ∫_Y g dρ. Occasionally, the elements within angular brackets shall be written in reverse order.

We say that {ℱ_X, ℳ_X} is a separating pair if 〈f₁, λ〉 = 〈f₂, λ〉 for all λ ∈ ℳ_X implies that f₁ = f₂, and if 〈f, λ₁〉 = 〈f, λ₂〉 for all f ∈ ℱ_X implies that λ₁ = λ₂. In this section, we shall develop some theory for two choices of the collections {ℱ_X, ℳ_X} and {ℱ_Y , ℳ_Y }, which we call the continuous setting and the measurable setting.

Definition 3.1

Let {ℱ_X, ℳ_X} and {ℱ_Y, ℳ_Y} each be a separating pair, let Φ: ℳ_X → ℳ_Y be a transfunction, and let S : ℱ_Y → ℱ_X be a function. Then Φ and S are Radon adjoints of each other if the equation $\int_Y g\,d\Phi(\lambda) = \int_X S(g)\,d\lambda, \text{ i.e. } \langle g, \Phi(\lambda)\rangle = \langle S(g), \lambda\rangle,$ holds for all g ∈ ℱ_Y and λ ∈ ℳ_X.

By utilizing the separation properties of 〈·, ·〉, Radon adjoints of both kinds are unique if they exist. We shall denote the Radon adjoint of Φ by Φ* and of S by S*.

If (Φ, S) is a Radon adjoint pair, then for all g ∈ ℱ_Y, $\left\langle g, \Phi\sum\nolimits_i \lambda_i \right\rangle = \left\langle Sg, \sum\nolimits_i \lambda_i \right\rangle = \sum\nolimits_i \langle Sg, \lambda_i\rangle = \sum\nolimits_i \langle g, \Phi\lambda_i\rangle = \left\langle g, \sum\nolimits_i \Phi\lambda_i \right\rangle,$ meaning that Φ is linear. Similarly, for all λ ∈ ℳ_X, $\left\langle S\sum\nolimits_i g_i, \lambda \right\rangle = \left\langle \sum\nolimits_i g_i, \Phi\lambda \right\rangle = \sum\nolimits_i \langle g_i, \Phi\lambda\rangle = \sum\nolimits_i \langle S g_i, \lambda\rangle = \left\langle \sum\nolimits_i S g_i, \lambda \right\rangle,$ meaning that S is linear.

Example 3.2

If Φ = f_# (the push-forward operator) for some measurable f : X → Y , then Φ*(g) = g ○ f = f*g (the pull-back operator acting on g). This is because ∫_Y g d(f_#λ) = ∫_X g ○ f dλ for all g ∈ ℱ_Y, λ ∈ℳ_X.
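A quick numerical check of this change-of-variables identity, for a signed measure made of finitely many point masses, might look as follows. The map f, the test function g and the measure are hypothetical, and `push_forward` and `pair` are names introduced only for this sketch.

```python
from fractions import Fraction as F

def push_forward(f, lam):
    """f_# lam: the measure B -> lam(f^{-1}(B)), for lam given by point masses."""
    out = {}
    for x, w in lam.items():
        out[f(x)] = out.get(f(x), F(0)) + w
    return out

def pair(g, lam):
    """<g, lam>: integral of g against the finite signed measure lam."""
    return sum(g(x) * w for x, w in lam.items())

f = lambda x: x * x            # hypothetical measurable map X -> Y
g = lambda y: 2 * y + 1        # hypothetical test function on Y
lam = {-2: F(1, 3), -1: F(-1), 1: F(1, 2), 2: F(1, 6)}   # signed measure on X

lhs = pair(g, push_forward(f, lam))    # <g, f_# lam>
rhs = pair(lambda x: g(f(x)), lam)     # <g o f, lam>
```

Both sides evaluate to 3 here, even though f is not injective and the push-forward merges point masses.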

Example 3.3

If X = Y and Φλ := fλ for some continuous (or measurable) f : X → ℝ, then Φ*(g) = gf. This is because ∫_X g d(fλ) = ∫_X gf dλ for all g ∈ ℱ_X, λ ∈ ℳ_X.

Definition 3.4

Let {ℱ_X, ℳ_X} and {ℱ_Y , ℳ_Y } each be a separating pair.

(i) (f_n) weakly converges to f in ℱ_X, notated as $f_n \xrightarrow{w} f$, if every finite regular measure λ on X yields 〈f_n, λ〉 → 〈f, λ〉 as n → ∞.
(ii) $(\lambda_n)_{n=1}^{\infty}$ weakly converges to λ in ℳ_X, notated as $\lambda_n \xrightarrow{w} \lambda$, if every bounded continuous f : X → ℝ yields 〈f, λ_n〉 → 〈f, λ〉 as n → ∞.
(iii) An operator S : ℱ_Y → ℱ_X is weakly continuous if $g_n \xrightarrow{w} g$ in ℱ_Y implies that $S g_n \xrightarrow{w} S g$ in ℱ_X.
(iv) A transfunction Φ: ℳ_X → ℳ_Y is weakly continuous if $\lambda_n \xrightarrow{w} \lambda$ in ℳ_X implies that $\Phi\lambda_n \xrightarrow{w} \Phi\lambda$ in ℳ_Y.

Note that weak convergence of (f_n) in Definition 3.4 (i) is the same notion as bounded-pointwise convergence.

4. Approximations of identity

Definition 4.1

For a metric space (X, d) with x ∈ X, A ⊆ X and δ > 0, define B(x; δ) := {z ∈ X : d(x, z) < δ} to be the δ-ball around x and define $B(A;\delta) := \bigcup_{x \in A} B(x;\delta)$ to be the δ-inflation around A.

The following two lemmas aid in showing Proposition 4.5.

Lemma 4.2

Let (X, d) be a locally compact metric space. The positive function c: X → (0, ∞] defined via $c(x) := \sup\{\delta > 0 : B(x;\delta) \text{ is precompact}\}$ is either identically ∞ or it is finite and continuous on X. It follows that every compact set K has a precompact inflation B(K; δ) for some δ > 0.

Lemma 4.3

Let (X, d) be a locally compact Polish metric space. Then there exists a pair of sequences $(x_i)_{i=1}^{\infty}$ from X and $(\beta_i)_{i=1}^{\infty}$ from (0, 1] and there exists a function p: ℕ → ℕ such that for all n ∈ ℕ, $K_n := \bigcup_{i=1}^{p(n)} \overline{B(x_i;\beta_i/n)}$ is compact with $K_{n+1} \supseteq K_{n+1}^{\circ} \supseteq K_n$ and $\bigcup_{n=1}^{\infty} K_n = X$.

Using the setup from Lemma 4.3, we define the collection of sets $C_{n,i} := \overline{B(x_i;\beta_i/n)} \setminus \bigcup_{j<i} \overline{B(x_j;\beta_j/n)}$ for all n, i ∈ ℕ. For fixed n these sets are pairwise disjoint, and it follows for any n ∈ ℕ that $\bigcup_{i=1}^{p(n)} C_{n,i} = K_n$.

Definition 4.4

A measure µ is called a point-mass measure at x if µ(A) = 1 when x ∈ A and µ(A) = 0 when x ∉ A. A finite linear combination of point-mass measures is called a simple measure.

It is straightforward to show that simple measures are regular. The following proposition suggests a method to create approximations of identity, which shall be discussed in their respective sections below.

Proposition 4.5

Simple measures on a second-countable locally compact Hausdorff space form a dense subset of all finite regular measures with respect to weak convergence.

Proof

Construct sequences $(x_i)_{i=1}^{\infty}$, $(\beta_i)_{i=1}^{\infty}$, p: ℕ → ℕ and (C_{n,i}) via Lemma 4.3. Fix some positive finite measure $\lambda \in \mathcal{M}_X^{+}$. Construct a sequence $(\lambda_n)_{n=1}^{\infty}$ of positive simple measures via $\lambda_n := \sum_{i=1}^{p(n)} \lambda(C_{n,i})\,\delta_{x_i}$. We will show that $\lambda_n \xrightarrow{w} \lambda$. In doing so, we fix some function f ∈ C_b(X) and show that 〈f, λ_n〉 → 〈f, λ〉. For density of signed measures, one utilizes the Jordan decomposition and applies a similar argument for each component.

Let ε > 0. Define η := ε/(3||f|| + 3||λ|| + 1) so that ||f||η < ε/3 and ||λ||η < ε/3. Choose some natural M such that $\lambda(K_M^c) < \eta$. Apply Lemma 4.2 to obtain some α > 0 with $L := \overline{B(K_M;\alpha)}$ being compact. By uniform continuity of f|_L, choose some natural N > M such that 2/N < α and for all x ∈ L, f(B(x; 2/N) ∩ L) ⊆ B(f(x); η).

Now let n > N. Define $\rho_{n,M} := \sum_{i=1}^{p(n)} \lambda(C_{n,i} \cap K_M)\,\delta_{x_i}$. Notice that C_{n,i} ∩ K_M ≠ Ø implies that x_i ∈ B(K_M; 1/n) and that C_{n,i} ⊆ B(K_M; 2/n) ⊆ L, resulting in f(C_{n,i}) ⊆ B(f(x_i); η). Three observations can be made:

(a) $|\langle f, \lambda - 1_{K_M}\lambda\rangle| \le \|f\| \cdot \lambda(K_M^c) < \|f\|\eta$;
(b) $|\langle f, 1_{K_M}\lambda - \rho_{n,M}\rangle| \le \left|\int_{K_M} f\,d\lambda - \sum_{i=1}^{p(n)} f(x_i)\,\lambda(C_{n,i} \cap K_M)\right| < \|\lambda\|\eta$;
(c) $|\langle f, \rho_{n,M} - \lambda_n\rangle| \le \|f\| \sum_{i=1}^{p(n)} \lambda(C_{n,i} \cap K_M^c) \le \|f\|\,\lambda(K_M^c) < \|f\|\eta$.

Therefore, | 〈f, λ − λ_n〉| < 3(ε/3) = ε.
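The last step combines the three observations through the triangle inequality. Written out, using only the bounds (a), (b), (c) and the choice of η (so that ||f||η < ε/3 and ||λ||η < ε/3), the estimate reads:

```latex
\begin{aligned}
|\langle f, \lambda - \lambda_n\rangle|
  &\le |\langle f, \lambda - 1_{K_M}\lambda\rangle|
     + |\langle f, 1_{K_M}\lambda - \rho_{n,M}\rangle|
     + |\langle f, \rho_{n,M} - \lambda_n\rangle| \\
  &< \|f\|\eta + \|\lambda\|\eta + \|f\|\eta
   < \frac{\varepsilon}{3} + \frac{\varepsilon}{3} + \frac{\varepsilon}{3}
   = \varepsilon .
\end{aligned}
```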

For any finite signed measure λ on X, the sequence (λ_n) of simple measures from Proposition 4.5 weakly converges to λ, hence the sequence of transfunctions (I_n) given by $I_n : \lambda \mapsto \lambda_n = \sum_{i=1}^{p(n)} \langle 1_{C_{n,i}}, \lambda\rangle\,\delta_{x_i}$ is an approximation of identity.
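To make the construction concrete, here is a minimal sketch on X = [0, 1] with the usual metric. It assumes, purely for illustration, that (x_i) enumerates the dyadic rationals level by level, that β_i = 1 for all i, and that p(n) = 2^m + 1 where m is the least integer with 2^m ≥ n; these choices are not taken from the paper. A point z belongs to C_{n,i} exactly when i is the first index whose closed ball of radius 1/n contains z.

```python
from fractions import Fraction as F

def dyadic_points(m):
    """x_1, x_2, ...: dyadic rationals in [0, 1] of level <= m, listed level by level."""
    pts = [F(0), F(1)]
    for level in range(1, m + 1):
        pts += [F(k, 2 ** level) for k in range(1, 2 ** level, 2)]
    return pts

def approx_identity(lam, n):
    """I_n(lam) = sum_i lam(C_{n,i}) delta_{x_i}, with beta_i = 1 and X = [0, 1]."""
    m = 0
    while 2 ** m < n:          # p(n) = 2^m + 1 points: the 1/n-balls cover [0, 1]
        m += 1
    centers = dyadic_points(m)
    out = {}
    for z, w in lam.items():
        # z lies in C_{n,i} for the first i whose closed ball of radius 1/n contains z.
        x = next(c for c in centers if abs(z - c) <= F(1, n))
        out[x] = out.get(x, F(0)) + w
    return out

def pair(f, lam):
    """<f, lam>: integral of f against a measure given by point masses."""
    return sum(f(x) * w for x, w in lam.items())

lam = {F(1, 3): F(2), F(5, 7): F(1), F(9, 10): F(3)}   # hypothetical positive measure
f = lambda x: x * x                                     # 2-Lipschitz on [0, 1]
errors = [abs(pair(f, lam) - pair(f, approx_identity(lam, n))) for n in (1, 4, 16, 64)]
```

Since every point is moved by at most 1/n and f is 2-Lipschitz on [0, 1], each error is bounded by 2||λ||/n = 12/n in this example, while the total mass of I_n(λ) always equals that of λ.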

The approximation of identity above is simply described with characteristic functions (1_{C_n,i}) and point-mass measures (δ_{x_i}). However, in each of the two settings below, either the characteristic functions must be replaced by bounded continuous functions or the point-mass measures must be replaced by compactly-supported measures that are absolutely continuous with respect to some underlying measure. With the correct choice of replacements, the same argument as given in Proposition 4.5 can be applied, yielding valid approximations of identities for the respective settings.

For the remainder of this paper, let X and Y be locally-compact Polish spaces, and pick any complete metric for each of them when needed.

5. Continuous setting: ℱ = C_b, ℳ = ℳ_fr

Let ℱ_X = C_b(X) denote the Banach space of all bounded continuous functions on X with the uniform norm and let ℳ_X = ℳ_fr(X) denote the Banach space of all finite (hence, regular) signed measures on X with the total variation norm. Develop Y , ℱ_Y , and ℳ_Y analogously. It is known that {ℱ_X, ℳ_X} is a separating pair in this setting.

An approximation of identity can be formed in this setting: keep the point-mass measures ρ_{n,i} := δ_{x_i}, then for each natural n, replace the characteristic functions {1_{C_{n,i}} : 1 ≤ i ≤ p(n)} used in Proposition 4.5 with positive compactly supported continuous functions {f_{n,i} : 1 ≤ i ≤ p(n)} such that f_{n,i} ≤ 1_{B(C_{n,i};1/n)} and $1_{K_n} \le \sum_{i=1}^{p(n)} f_{n,i} \le 1_{B(K_n;1/n)}$. Then an approximation of identity in the continuous setting is given by the sequence (I_n), where $I_n : \lambda \mapsto \sum_{i=1}^{p(n)} \langle f_{n,i}, \lambda\rangle\,\rho_{n,i} = \sum_{i=1}^{p(n)} \langle f_{n,i}, \lambda\rangle\,\delta_{x_i}.$

Theorem 5.1

Every strongly σ-additive and weakly-continuous transfunction Φ: ℳ_fr(X) → ℳ_fr(Y ) has a strongly σ-additive and weakly-continuous Radon adjoint S : C_b(Y ) → C_b(X). Conversely, every strongly σ-additive and weakly-continuous operator S : C_b(Y ) → C_b(X) has a strongly σ-additive and weakly-continuous Radon adjoint Φ: ℳ_fr(X) → ℳ_fr(Y ). When the Radon adjoint pair exists, their operator norms are equal (with respect to total-variation and uniform-convergence).

Proof

For the first claim, define S(g)(x) := 〈g, Φ(δ_x)〉 for all g ∈ C_b(Y) and for all x ∈ X so that 〈S(g), δ_x〉 = 〈g, Φ(δ_x)〉. Let x_n → x in X, so that $\delta_{x_n} \xrightarrow{w} \delta_x$, which means that $\Phi(\delta_{x_n}) \xrightarrow{w} \Phi(\delta_x)$. Also let g_n → g bounded-pointwise in C_b(Y) (i.e. $g_n \xrightarrow{w} g$). Then the statements below ensure that S(g) ∈ C_b(X), that S is bounded (hence uniformly continuous) and that S is bounded-pointwise-continuous (via the Dominated Convergence Theorem): $\begin{aligned} & S(g)(x_n) = \langle S(g), \delta_{x_n}\rangle = \langle g, \Phi(\delta_{x_n})\rangle \to \langle g, \Phi(\delta_x)\rangle = \langle S(g), \delta_x\rangle = S(g)(x); \\ & \|S(g)\| = \sup_{x \in X} |S(g)(x)| = \sup_{x \in X} |\langle S(g), \delta_x\rangle| = \sup_{x \in X} |\langle g, \Phi(\delta_x)\rangle| \le \|g\| \cdot \|\Phi\|; \\ & S(g_n)(x) = \langle S(g_n), \delta_x\rangle = \langle g_n, \Phi\delta_x\rangle \to \langle g, \Phi\delta_x\rangle = \langle S(g), \delta_x\rangle = S(g)(x). \end{aligned}$

Since countable linear combinations of point-mass measures are weakly dense in ℳ_fr(X), the linearity and weak-continuity of the second coordinate in the 〈·, ·〉 structure and the weak-continuity of Φ yield that 〈S(g), λ〉 = 〈g, Φλ〉 for all g ∈ C_b(Y) and λ ∈ ℳ_fr(X). Hence, S is the Radon adjoint of Φ with the desired properties.

For the second claim, note that for every λ ∈ ℳ_fr(X), the continuous functional g ↦ 〈S(g), λ〉 defined on C₀(Y) has Riesz representation 〈·, Φ(λ)〉 for some unique signed measure Φ(λ) ∈ ℳ_fr(Y). Defining Φ in this manner for all λ, we obtain the equation 〈S(g), λ〉 = 〈g, Φλ〉 for all g ∈ C₀(Y) and λ ∈ ℳ_fr(X). C₀(Y) is dense in C_b(Y) with respect to bounded-pointwise convergence, so with $C_0(Y) \ni g_n \xrightarrow{w} g \in C_b(Y)$, it follows that 〈g_n, Φλ〉 → 〈g, Φλ〉 by the Dominated Convergence Theorem. Similarly, bounded-pointwise-continuity of S ensures that $S(g_n) \xrightarrow{w} S(g)$, which means that 〈S(g_n), λ〉 → 〈S(g), λ〉 by the Dominated Convergence Theorem. Therefore, 〈S(g), λ〉 = 〈g, Φλ〉 for all g ∈ C_b(Y) and λ ∈ ℳ_fr(X), implying that Φ is the Radon adjoint of S. To see that Φ is weakly-continuous, let $\lambda_n \xrightarrow{w} \lambda$. Then 〈g, Φλ_n〉 = 〈S(g), λ_n〉 → 〈S(g), λ〉 = 〈g, Φλ〉. Therefore, $\Phi\lambda_n \xrightarrow{w} \Phi\lambda$. Finally, ||Φ|| ≤ ||S||, hence ||Φ|| = ||S||, follows via $\|\Phi\lambda\| = \sup_{\|g\| = 1} |\langle g, \Phi\lambda\rangle| = \sup_{\|g\| = 1} |\langle S(g), \lambda\rangle| \le \|S\| \cdot \|\lambda\|.$
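On finite sets every function is continuous and every measure is a vector of point masses, so the theorem reduces to a statement about a matrix and its transpose. The sketch below, with a hypothetical matrix M and helper names chosen for illustration, defines S by S(g)(x) = 〈g, Φ(δ_x)〉 as in the proof, checks the adjoint identity, and compares the two operator norms by brute force over extreme points of the unit balls.

```python
from fractions import Fraction as F
from itertools import product

# Hypothetical transfunction on finite sets: (Phi lam)(y) = sum_x lam(x) M[x][y].
M = [[F(1),    F(-2), F(0)],
     [F(1, 2), F(1),  F(3)]]

def phi(lam):
    return [sum(lam[x] * M[x][y] for x in range(2)) for y in range(3)]

def adjoint(g):
    """Radon adjoint: S(g)(x) = <g, Phi(delta_x)> = sum_y M[x][y] g(y)."""
    return [sum(M[x][y] * g[y] for y in range(3)) for x in range(2)]

def pair(f, lam):
    return sum(a * b for a, b in zip(f, lam))

lam = [F(2), F(-1)]
g = [F(1), F(0), F(-1, 2)]

# ||Phi|| for total variation: attained at point masses (extreme points of the unit ball).
norm_phi = max(sum(abs(w) for w in phi(d)) for d in ([F(1), F(0)], [F(0), F(1)]))
# ||S|| for the uniform norm: attained at sign functions (extreme points of the unit ball).
norm_S = max(max(abs(v) for v in adjoint(list(s)))
             for s in product([F(1), F(-1)], repeat=3))
```

Both norms equal 9/2 here, the largest total variation of Φ(δ_x) over x, in line with ||Φ|| = ||S||.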

6. Measurable setting: ℱ = ℒ^∞, ℳ = ℳ^∞

In this setting, let (X, ∑_X, µ) be a finite measure space, let ℱ_X := ℒ^∞(X, µ) and let $\mathcal{M}_X := \mathcal{M}_\mu^{\infty}$. Define (Y, ∑_Y, ν), ℱ_Y, ℳ_Y analogously. Then it is straightforward to verify that ℱ_X and ℳ_X separate each other.

An approximation of identity can be formed in this setting: for each natural n and 1 ≤ i ≤ p(n), replace each point-mass measure δ_{x_i} used in Proposition 4.5 with the measure ρ_{n,i} := 1_{C_{n,i}}µ and define f_{n,i} := 1_{C_{n,i}}/µ(C_{n,i}) when µ(C_{n,i}) > 0; otherwise, define f_{n,i} = 0. That is, an approximation of identity in the measurable setting is given by the sequence (I_n), where $I_n : \lambda \mapsto \sum_{i=1}^{p(n)} \langle f_{n,i}, \lambda\rangle\,\rho_{n,i} = \sum_{\substack{i=1 \\ \mu(C_{n,i}) > 0}}^{p(n)} \left\langle \frac{1_{C_{n,i}}}{\mu(C_{n,i})}, \lambda \right\rangle 1_{C_{n,i}}\mu.$ The following lemma will be used in the proof of the next theorem:
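For λ = fµ, the operator I_n replaces the density f on each cell by its µ-average over that cell. The sketch below shows this on a hypothetical five-point space with a hand-picked partition into cells; the data and the name `cell_average` are assumptions of the sketch, and the cells merely stand in for the sets C_{n,i}.

```python
from fractions import Fraction as F

def cell_average(density, mu, cells):
    """Density of I_n(f mu) w.r.t. mu: on each cell C with mu(C) > 0, the mu-average of f."""
    out = [F(0)] * len(density)
    for cell in cells:
        mass = sum(mu[x] for x in cell)
        if mass > 0:                      # cells of mu-measure zero contribute nothing
            avg = sum(density[x] * mu[x] for x in cell) / mass
            for x in cell:
                out[x] = avg
    return out

mu = [F(1), F(2), F(1), F(0), F(4)]          # hypothetical finite positive measure
density = [F(3), F(0), F(-1), F(7), F(2)]    # hypothetical bounded density f
cells = [[0, 1], [2, 3], [4]]                # stand-ins for the sets C_{n,i}
averaged = cell_average(density, mu, cells)
```

The averaged density is constant on each cell, and the total mass Σ_x f(x)µ(x) is unchanged by the averaging.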

Lemma 6.1

For every strongly σ-additive transfunction $\Phi : \mathcal{M}_\mu^{\infty} \to \mathcal{M}_\nu^{\infty}$, there is a unique strongly σ-additive transfunction $\Phi^{\dagger} : \mathcal{M}_\nu^{\infty} \to \mathcal{M}_\mu^{\infty}$ such that Φ^†(1_B ν)(A) = Φ(1_Aµ)(B) for all measurable A ⊆ X, B ⊆ Y, which implies that 〈f, Φ^†(gν)〉 = 〈Φ(fµ), g〉 for all f ∈ ℒ^∞(X, µ), g ∈ ℒ^∞(Y, ν). Also, Φ^†† = Φ.

Proof

Let Φ be strongly σ-additive. For fixed B ⊆ Y, it follows by strong σ-additivity of Φ that the set function A ↦ Φ(1_Aµ)(B) is a measure. Define this measure to be Ψ(1_B ν). Then Ψ, defined on {1_B ν | B ⊆ Y}, is a strongly σ-additive transfunction that behaves like Φ^† in the equality above. Ψ can be linearly extended to $\mathcal{M}_\nu^{\infty}$ according to the following equalities for A ⊆ X, g ≅ ∑_j β_j 1_{B_j} with ∑_j |β_j| < ∞: $\Psi(g\nu)(A) = \sum_{j=1}^{\infty} \beta_j\,\Psi(1_{B_j}\nu)(A) = \sum_{j=1}^{\infty} \beta_j\,\Phi(1_A\mu)(B_j) = \int_Y g\,d\Phi(1_A\mu).$

The extended Ψ is strongly σ-additive on $\mathcal{M}_\nu^{\infty}$. A similar calculation shows that $\int_X f\,d\Psi(g\nu) = \int_Y g\,d\Phi(f\mu)$ for all f ∈ ℒ^∞(X, µ) and g ∈ ℒ^∞(Y, ν). Therefore Φ^† is uniquely determined to be Ψ. Finally, Φ^††(1_Aµ)(B) = Φ^†(1_B ν)(A) = Φ(1_Aµ)(B) for all A ⊆ X and B ⊆ Y, so Φ^†† = Φ.

If Φ is Markov, then Φ^† is also Markov. Furthermore, the plans κ, κ^† corresponding to Φ, Φ^† respectively are dual to each other: that is, κ(A × B) = κ^†(B × A) for all measurable sets A ⊆ X, B ⊆ Y . However, Φ^† is sensitive to the choice of measures µ and ν, which is not ideal when working with non-injective extensions of Markov transfunction Φ.

Theorem 6.2

Every strongly σ-additive $\Phi : \mathcal{M}_\mu^\infty \to \mathcal{M}_\nu^\infty$ has a linear and bounded Radon adjoint S : ℒ^∞(Y, ν) → ℒ^∞(X, µ). Conversely, every linear and bounded operator S : ℒ^∞(Y, ν) → ℒ^∞(X, µ) has a strongly σ-additive Radon adjoint $\Phi : \mathcal{M}_\mu^\infty \to \mathcal{M}_\nu^\infty$.

Proof

Assume that $\Phi : \mathcal{M}_\mu^\infty \to \mathcal{M}_\nu^\infty$ is strongly σ-additive and define $S := J_\mu^{-1} \Phi^\dagger J_\nu$ with domain ℒ^∞(Y, ν). Then S is linear and bounded. S = Φ* follows because for any f ∈ ℒ^∞(X, µ) and g ∈ ℒ^∞(Y, ν), $\langle g, \Phi(f\mu) \rangle = \langle \Phi^\dagger(g\nu), f \rangle = \langle (J_\mu S) g, f \rangle = \langle S(g), f\mu \rangle .$

On the other hand, assume that S : ℒ^∞(Y, ν) → ℒ^∞(X, µ) is linear and bounded. Then define $\Psi := J_\mu S J_\nu^{-1}$ and Φ := Ψ^†. Then Φ is strongly σ-additive. Φ = S* follows because for any f ∈ ℒ^∞(X, µ) and g ∈ ℒ^∞(Y, ν), $\langle S(g), f\mu \rangle = \langle (J_\mu^{-1} \Psi)(g\nu), f\mu \rangle = \langle \Phi^\dagger(g\nu), f \rangle = \langle g, \Phi(f\mu) \rangle .$

7.

Simple transfunctions

Let ℱ_X, ℱ_Y , ℳ_X, and ℳ_Y be defined in either the continuous setting or the measurable setting.

Definition 7.1

A transfunction Φ: ℳ_X → ℳ_Y is simple if there exist functions $(f_i)_{i=1}^{m}$ from ℱ_X and there exist measures $(\rho_i)_{i=1}^{m}$ from ℳ_Y such that $\forall \lambda \in \mathcal{M}_X, \quad \Phi\lambda = \sum_{i=1}^{m} \langle f_i, \lambda \rangle \rho_i .$

It is straightforward to verify that simple transfunctions are strongly σ-additive. In the continuous setting, simple transfunctions are also weakly-continuous. Therefore by Theorem 5.1 in the continuous setting or Theorem 6.2 in the measurable setting, the Radon adjoint Φ* exists and satisfies $\forall g \in \mathcal{F}_Y, \quad \Phi^{*} g = \sum_{i=1}^{m} \langle g, \rho_i \rangle f_i .$
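The adjoint formula can be checked directly on finite spaces. The sketch below assumes X has three points and Y has two, with functions and measures stored as plain lists; the names `simple_phi`, `simple_phi_star`, `fs`, and `rhos` are hypothetical and chosen only for illustration.

```python
def pair(h, m):
    """Pairing <h, m> of a function h with a measure m on a finite space."""
    return sum(hi * mi for hi, mi in zip(h, m))

def simple_phi(fs, rhos, lam):
    """Phi(lam) = sum_i <f_i, lam> rho_i, returned as point masses on Y."""
    out = [0] * len(rhos[0])
    for f_i, rho_i in zip(fs, rhos):
        coeff = pair(f_i, lam)
        out = [o + coeff * r for o, r in zip(out, rho_i)]
    return out

def simple_phi_star(fs, rhos, g):
    """Phi*(g) = sum_i <g, rho_i> f_i, returned as function values on X."""
    out = [0] * len(fs[0])
    for f_i, rho_i in zip(fs, rhos):
        coeff = pair(g, rho_i)
        out = [o + coeff * v for o, v in zip(out, f_i)]
    return out

# Hypothetical data with m = 2 terms.
fs = [[1, 0, 2], [0, 1, 1]]   # functions f_1, f_2 on X
rhos = [[1, 1], [0, 2]]       # measures rho_1, rho_2 on Y
lam = [1, 2, 3]               # measure on X
g = [5, -1]                   # function on Y

left = pair(g, simple_phi(fs, rhos, lam))        # <g, Phi(lam)>
right = pair(simple_phi_star(fs, rhos, g), lam)  # <Phi* g, lam>
print(left, right)
```

Both sides expand to ∑_i 〈f_i, λ〉〈g, ρ_i〉, so the agreement here is an instance of the general identity rather than a coincidence of the chosen data.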

Note that the approximations of identity covered in both the continuous setting and the measurable setting involve sequences of simple transfunctions.

Theorem 7.2

In both the continuous setting and the measurable setting, linear weakly-continuous transfunctions can be approximated by simple transfunctions with respect to weak convergence; that is, simple transfunctions form a dense subset of linear weakly-continuous transfunctions with respect to weak convergence.

Proof

Let Φ: ℳ_X → ℳ_Y be a linear weakly-continuous transfunction and fix λ ∈ ℳ_X. Define Φ_n := ΦI_n, where $I_n : \lambda \mapsto \sum_{i=1}^{p(n)} \langle f_{n,i}, \lambda \rangle \rho_{n,i}$ forms the approximation of identity as defined in either Subsection 3.2 (continuous setting) or Subsection 3.3 (measurable setting). Then, by linearity of Φ, $\Phi_n \lambda = \sum_{i=1}^{p(n)} \langle f_{n,i}, \lambda \rangle \, \Phi\rho_{n,i}$, implying that Φ_n is a simple transfunction. It follows by $I_n \lambda \xrightarrow{w} \lambda$ and by weak-continuity of Φ that $\Phi_n \lambda = \Phi(I_n \lambda) \xrightarrow{w} \Phi\lambda$.

8.

Applications: optimal transport

Markov transfunctions provide a new perspective to optimal transport theory.

Definition 8.1

Let (X, ∑_X, µ) and (Y, ∑_Y , ν) be Polish measure spaces with finite positive measures µ and ν, respectively, with ||µ|| = ||ν||. A cost function is any continuous function c: X × Y → [0, ∞). A plan κ ∈ Π(µ, ν) is c-optimal if ∫_{X×Y} c dκ ≤ ∫_{X×Y} c dπ for all π ∈ Π(µ, ν). A Markov transfunction Φ: ℳ_X → ℳ_Y is c-optimal on µ if the corresponding plan κ with marginals µ and Φµ is c-optimal, and Φ is simply c-optimal if it is c-optimal on ℳ_X.

The next proposition implies that optimal inputs for Φ form a large class of measures.

Proposition 8.2

Let (X, ∑_X), (Y, ∑_Y ) be Polish spaces, let c be a cost function, and let Φ: ℳ_X → ℳ_Y be a Markov transfunction. If Φ is c-optimal on µ ∈ ℳ_X, then Φ is c-optimal on $\mathcal{M}_\mu^\infty$.

Proof

The proof follows easily from Theorem 4.6 in [10] on the inheritance of optimality of plans by restriction.

In the next theorem, we provide a “warehouse strategy” which approximates the optimal cost between fixed marginals with respect to some cost function. First, we subdivide the input marginal by local regions, and send the subdivided measures to point mass measures – warehouses – within their respective regions. Second, we transfer mass between warehouses via the discrete transport problem. Finally, the warehouses locally redistribute to form the output marginal. The overall cost of transport via the warehouse strategy approaches the optimal cost as the size of the regions decreases.

Theorem 8.3

Let (X, ∑_X) be a locally compact Polish measurable space with complete metric d, let λ and ρ be finite positive compactly-supported measures with ||λ|| = ||ρ||, and let c: X × X → [0, ∞) be a cost function with c(x, y) ≤ αd(x, y)^p for some constants α, p > 0. The optimal cost between marginals λ, ρ with respect to c can be sufficiently approximated by the costs of simple Markov transfunctions.

Proof

Consider the approximation of identity (I_n) from the continuous setting in Section 5. For large n, we create a composition of three simple Markov transfunctions: λ first maps to $I_n \lambda = \sum_{i=1}^{p(n)} \langle f_{n,i}, \lambda \rangle \delta_{x_i}$, which maps to $I_n \rho = \sum_{i=1}^{p(n)} \langle f_{n,i}, \rho \rangle \delta_{x_i}$, which finally maps to ρ. These steps are measure-preserving because K_n (from Lemma 4.3) contains the supports of λ and ρ for large n. The most crucial goal is to determine the optimal simple Markov transfunction for the middle step.

The first and last steps cost no more than αn^{−p}||λ|| each, which tends to 0 as n → ∞. This means that the optimal cost between marginals λ_n := I_nλ and ρ_n := I_nρ approaches the optimal cost between marginals λ and ρ as n → ∞. Computing the former optimal cost is the well-known discrete version of the Monge-Kantorovich transport problem.

By approximating each of the values 〈f_{n,i}, λ〉 ≈ a_{n,i}/z and 〈f_{n,i}, ρ〉 ≈ b_{n,i}/z for natural numbers a_{n,i}, b_{n,i}, z with 1 ≤ i ≤ p(n), the middle step can approximately be interpreted as the Assignment Problem on a weighted bipartite graph between vertex sets P and Q, where P denotes the set created by forming a_{n,i} copies of a vertex corresponding to each δ_{x_i} in λ_n, Q denotes the set created by forming b_{n,j} copies of a vertex corresponding to each δ_{x_j} in ρ_n, and edges are drawn between these vertices with weight c(x_i, x_j). This problem has been studied and can be solved in time polynomial in $|P| = \sum_{i=1}^{p(n)} a_{n,i} \approx \|\lambda\| z$; the Hungarian method is one well-known algorithm [7].
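The reduction to the Assignment Problem can be illustrated on a tiny instance. The sketch below builds the vertex multisets P and Q from the integer multiplicities and finds a cheapest perfect matching by brute force over permutations. This brute-force search is a stand-in chosen for brevity and runs in factorial time; it is not the Hungarian method cited above. The function name `assignment_cost`, the warehouse locations, and the multiplicities are hypothetical.

```python
from itertools import permutations

def assignment_cost(points, a, b, cost):
    """Minimum total edge weight of a perfect matching between P and Q.

    points : warehouse locations x_1, ..., x_{p(n)}
    a, b   : integer multiplicities a_{n,i}, b_{n,i} with equal sums
    cost   : cost function c(x, y)
    """
    P = [x for x, k in zip(points, a) for _ in range(k)]  # a_i copies of x_i
    Q = [x for x, k in zip(points, b) for _ in range(k)]  # b_j copies of x_j
    if len(P) != len(Q):
        raise ValueError("multiplicities must have equal sums")
    # Brute force over all matchings; only feasible for very small |P|.
    return min(sum(cost(p, q) for p, q in zip(P, perm))
               for perm in permutations(Q))

# Hypothetical instance: three warehouses on a line, z = 3 units of mass.
points = [0, 1, 2]
a = [2, 1, 0]   # source multiplicities
b = [0, 1, 2]   # target multiplicities
best = assignment_cost(points, a, b, lambda x, y: abs(x - y))
print(best)
```

Dividing the returned value by z converts the matching weight back into an approximate transport cost between λ_n and ρ_n. For instances of realistic size a polynomial-time solver would be needed in place of the permutation search.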

Although Theorem 8.3 provides a sequence of simple transfunctions that approximate the optimal cost between fixed marginals, the sequence is not expected to converge weakly to an optimal Markov transfunction, as the solutions to the middle step could vary greatly as n increases. Consequently, we can find a Markov transfunction whose cost between marginals is sufficiently close to the optimal cost, but Theorem 8.3 does not provide an optimal Markov transfunction.

However, for any Markov transfunction between fixed marginals, the next theorem yields an approximation by simple Markov transfunctions with respect to weak convergence. Consequently, the cost between the marginals of the constructed sequence of simple Markov transfunctions approaches the cost for the original transfunction.

Theorem 8.4

Let (X, ∑_X, µ) and (Y, ∑_Y , ν) be locally compact Polish measure spaces with finite compactly-supported positive measures µ and ν such that ||µ|| = ||ν||. Any Markov transfunction Φ: ℳ_µ → ℳ_ν can be approximated by simple Markov transfunctions with respect to weak convergence.

Proof

Consider the approximation of identity (I_n) with respect to µ from the measurable setting from Section 6. Let n be large so that K_n (from Lemma 4.3) contains the supports of µ and ν.

Let κ be the plan corresponding to Markov transfunction Φ from Theorem 2.5. For 1 ≤ i, j ≤ p(n), the quantity κ(C_{n,i} × C_{n,j}) represents how much mass transfers from 1_{C_{n,i}}µ to 1_{C_{n,j}}ν. If µ(C_{n,i})ν(C_{n,j}) > 0, then we can approximate the nonzero measure (1_{C_{n,i}} ⊗ 1_{C_{n,j}})κ with $\kappa_{n,i,j} := \kappa(C_{n,i} \times C_{n,j}) \, \frac{1_{C_{n,i}}\mu}{\mu(C_{n,i})} \times \frac{1_{C_{n,j}}\nu}{\nu(C_{n,j})} .$ Otherwise, we define κ_{n,i,j} := 0. Then κ_n := ∑_i ∑_j κ_{n,i,j} is a plan from Π(µ, ν) which corresponds to a Markov transfunction Φ_n from Theorem 2.5.
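The claim that κ_n has marginals µ and ν can be verified on a finite example. The following sketch assumes finite X and Y, a plan stored as a matrix of point masses, and separate partitions of X and Y standing in for the cells C_{n,i} and C_{n,j}; the name `block_plan` and the sample plan are illustrative assumptions. Exact rational arithmetic is used so the marginal check is not affected by rounding.

```python
from fractions import Fraction as F

def block_plan(kappa, cells_x, cells_y):
    """Return (kappa_n, mu, nu), where kappa_n spreads each block mass
    kappa(C_i x C_j) as a product of normalized restrictions of mu and nu."""
    nx, ny = len(kappa), len(kappa[0])
    mu = [sum(kappa[x]) for x in range(nx)]                         # first marginal
    nu = [sum(kappa[x][y] for x in range(nx)) for y in range(ny)]   # second marginal
    kappa_n = [[F(0)] * ny for _ in range(nx)]
    for Ci in cells_x:
        mu_c = sum(mu[x] for x in Ci)
        for Cj in cells_y:
            nu_c = sum(nu[y] for y in Cj)
            if mu_c * nu_c == 0:
                continue  # kappa_{n,i,j} := 0 on null blocks
            mass = sum(kappa[x][y] for x in Ci for y in Cj)  # kappa(C_i x C_j)
            for x in Ci:
                for y in Cj:
                    kappa_n[x][y] = mass * mu[x] * nu[y] / (mu_c * nu_c)
    return kappa_n, mu, nu

# Hypothetical plan on a 4 x 4 grid with two cells per axis.
kappa = [[F(1, 8), F(1, 8), F(0), F(0)],
         [F(0), F(1, 4), F(0), F(0)],
         [F(0), F(0), F(1, 4), F(0)],
         [F(0), F(0), F(1, 8), F(1, 8)]]
cells = [[0, 1], [2, 3]]
kappa_n, mu, nu = block_plan(kappa, cells, cells)
row_sums = [sum(row) for row in kappa_n]
col_sums = [sum(kappa_n[x][y] for x in range(4)) for y in range(4)]
print(row_sums == mu, col_sums == nu)
```

The block masses of κ and κ_n agree by construction, while inside each block κ_n forgets the fine structure of κ and keeps only the shapes of µ and ν.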

Next, we show that $\kappa_n \xrightarrow{w} \kappa$ as n → ∞. Let c ∈ C_b(X × Y ) with ||c|| ≤ 1, and for 1 ≤ i, j ≤ p(n), let β_{n,i,j} := sup c(C_{n,i} × C_{n,j}) − inf c(C_{n,i} × C_{n,j}). By uniform continuity of c on K_n × K_n, we have that β_n := max{β_{n,i,j} | 1 ≤ i, j ≤ p(n)} → 0 as n → ∞, which implies that $|\langle c, \kappa - \kappa_n \rangle| \leq \sum_i \sum_j \beta_{n,i,j} \, \kappa(C_{n,i} \times C_{n,j}) \leq \beta_n \|\kappa\| \to 0 .$

There are some properties of Φ_n worth noting: Φ_n maps $\mathcal{M}_\mu^\infty$ to span{1_{C_{n,j}}ν}; Φ_n behaves as a matrix when applied to span{1_{C_{n,i}}µ}; the structure of κ_n guarantees that Φ_n = Φ_nI_n. If we choose bases (1_{C_{n,i}}µ) and (1_{C_{n,j}}ν), the matrix M_n representing Φ_n has entries M_n(j, i) := κ(C_{n,i} × C_{n,j})/ν(C_{n,j}).
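As a small numerical illustration of the matrix M_n, the sketch below builds it from a table of block masses and checks two consequences of the Markov property: the coefficient vector of µ (all ones) is sent to the coefficient vector of ν (all ones), and total mass is preserved. The names `markov_matrix` and `kcell` and the sample values are hypothetical.

```python
from fractions import Fraction as F

def markov_matrix(kcell):
    """kcell[i][j] = kappa(C_i x C_j); returns (M, mu_cells, nu_cells)
    with M[j][i] = kappa(C_i x C_j) / nu(C_j), and zero rows on null cells."""
    n_i, n_j = len(kcell), len(kcell[0])
    mu_cells = [sum(kcell[i]) for i in range(n_i)]                          # mu(C_i)
    nu_cells = [sum(kcell[i][j] for i in range(n_i)) for j in range(n_j)]   # nu(C_j)
    M = [[kcell[i][j] / nu_cells[j] if nu_cells[j] else F(0)
          for i in range(n_i)] for j in range(n_j)]
    return M, mu_cells, nu_cells

def apply(M, coeffs):
    """Coefficients of Phi_n(sum_i coeffs[i] 1_{C_i} mu) in the basis (1_{C_j} nu)."""
    return [sum(m * c for m, c in zip(row, coeffs)) for row in M]

# Hypothetical block masses for two cells on each side.
kcell = [[F(1, 4), F(1, 4)],
         [F(0), F(1, 2)]]
M, mu_cells, nu_cells = markov_matrix(kcell)

ones_out = apply(M, [1, 1])   # image of mu itself
x = [2, 0]                    # lambda = 2 * 1_{C_1} mu
y = apply(M, x)
mass_in = sum(c * m for c, m in zip(x, mu_cells))
mass_out = sum(c * m for c, m in zip(y, nu_cells))
print(M, ones_out, mass_in, mass_out)
```

With these values M has rows (1, 0) and (1/3, 2/3), and the input and output masses are both 1. The rows of M need not sum to 1 in general; they do here because M sends the all-ones vector to itself, reflecting Φ_nµ = ν.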

Let $\lambda \in \mathcal{M}_\mu^\infty$. Then $\Phi_n \lambda = \Phi_n I_n \lambda = \sum_j \left\langle \sum_i \frac{\kappa(C_{n,i} \times C_{n,j})}{\nu(C_{n,j})} \frac{1_{C_{n,i}}}{\mu(C_{n,i})}, \lambda \right\rangle 1_{C_{n,j}} \nu ,$ showing that Φ_n is simple.

We now show that $\Phi_n \lambda \xrightarrow{w} \Phi\lambda$ as n → ∞. Let g ∈ C_b(Y ) with ||g|| ≤ 1.

Let ε > 0. Since λ = fµ for some f ∈ ℒ^∞(X, µ), choose some f̃ ∈ C_b(X) such that ||(f − f̃)µ|| < ε/3. Since $\|\Phi^{*}\| = \|\Phi_n^{*}\| = 1$, we have that $|\langle g, \Phi(f - \tilde{f})\mu \rangle| = |\langle \Phi^{*} g, (f - \tilde{f})\mu \rangle| \leq \|\Phi^{*} g\| \cdot \|(f - \tilde{f})\mu\| \leq \varepsilon/3 ,$ and that $|\langle g, \Phi_n(f - \tilde{f})\mu \rangle| = |\langle \Phi_n^{*} g, (f - \tilde{f})\mu \rangle| \leq \|\Phi_n^{*} g\| \cdot \|(f - \tilde{f})\mu\| \leq \varepsilon/3 .$ Since $\kappa_n \xrightarrow{w} \kappa$ as n → ∞ and f̃ ⊗ g ∈ C_b(X × Y ), there is some natural N so that for all n ≥ N, $|\langle g, (\Phi - \Phi_n)\tilde{f}\mu \rangle| = |\langle \tilde{f} \otimes g, \kappa - \kappa_n \rangle| < \varepsilon/3 .$ It follows from the three estimates above and the triangle inequality that |〈g, (Φ − Φ_n)λ〉| ≤ ε for n ≥ N.

Theorem 8.4 can be strengthened by removing the assumptions that µ and ν are compactly supported; the approximation of identity may not capture all of µ or λ, and κ_n ∈ Π(1_{K_n}µ, 1_{K_n}ν) may not belong to Π(µ, ν), but the rest of the analysis holds. Notably, to show $\kappa_n \xrightarrow{w} \kappa$ as n → ∞, the inequalities become $\begin{aligned} |\langle c, \kappa - \kappa_n \rangle| &\leq \kappa(K_n^c) + \sum_i \sum_j \beta_{n,i,j} \, \kappa(C_{n,i} \times C_{n,j}) \\ &\leq \kappa(K_n^c) + \beta_n \|\kappa_n\| \to 0 . \end{aligned}$
