Let ℳX and ℳY be vector spaces of finite signed measures defined on measurable spaces (X, ∑X) and (Y, ∑Y ), respectively. For finite positive measure µ on (X, ∑X) and for real-valued function f ∈ ℒ1(X, µ), let fµ denote the measure A ↦ ∫A f dµ and define
A transfunction is any function Φ: ℳX → ℳY , [8]. Strongly σ-additive transfunctions are those which are linear and continuous with respect to total variation. We will sometimes call an operator between Banach spaces strongly σ-additive if it is linear and norm-continuous.
Plans have applications for finding weak solutions for optimal transport problems, [10]. Markov operators, defined in Section 2, have some similarities to stochastic matrices, [4]. Plans and Markov operators have a bijective correspondence as described in [9] and in Section 2. We assign to any corresponding Markov operator/plan pair (T, κ) with marginals µ, ν a unique transfunction Φ: ℳµ → ℳν – called a Markov transfunction. However, each Markov transfunction corresponds to a family of Markov operators (resp. plans) which have different marginals but follow the same “instructions”. Φ, T , and κ are related via the equalities
In our investigation of transfunctions we are motivated by the theory developed for the Monge-Kantorovich transportation problems and their far-reaching outcomes; see [1], [5], [6], and [10].
Let ℱX and ℱY be spaces of measurable functions which are integrable by measures in ℳX and ℳY , respectively. If {ℱX, ℳX} and {ℱY , ℳY } are separating pairs with respect to integration as defined in Section 3, then we define the Radon adjoint of Φ: ℳX → ℳY (if it exists) to be the unique linear bounded operator Φ* : ℱY → ℱX such that
If X and Y are second-countable locally compact Hausdorff spaces, if ℱX and ℱY are Banach spaces of bounded continuous functions (uniform norm) and if ℳX and ℳY are Banach spaces of finite regular signed measures (total variation), then any strongly σ-additive weakly-continuous transfunction Φ: ℳX → ℳY has an adjoint Φ* which is a linear, uniformly-continuous and bounded-pointwise-continuous operator (and vice versa) such that ||Φ|| = ||Φ*||. When X and Y are also non-atomic, ℳX and ℳY include measures which are non-atomic and strictly-positive; see [2].
In future research, we wish to develop functional analysis on transfunctions, and adjoints may be utilized to this end. In contexts where operators on functions are more appropriate or preferable, the adjoint may prove crucial.
A simple transfunction Φ: ℳX → ℳY is one which has the form
In [3] the notions of localization of a transfunction and the graph of a transfunction are introduced and studied. They give us an insight into which transfunctions arise from continuous functions or measurable functions or are close to such functions.
This paper is based on part of Bentley’s PhD dissertation.
In this section, we describe a class of transfunctions in which each transfunction corresponds to a family of plans and a family of Markov operators. First, we introduce these concepts. All measurable or continuous functions shall be real-valued in this text. Note that the following definitions allow for all finite positive measures rather than all probability measures.
Let µ and ν be finite positive measures on (X, ∑X) and (Y, ∑Y ) respectively with ||µ|| = ||ν||. Let κ be a finite positive measure on the product measurable space (X × Y, ∑X×Y ). We say that κ is a plan with marginals µ and ν if κ(A × Y ) = µ(A) and κ(X × B) = ν(B) for all A ∈ ∑X and B ∈ ∑Y . We define Π(µ, ν) to be the set of all plans with marginals µ and ν.
If random variables X, Y have laws µ, ν, then any coupling of X, Y has a law κ which is a plan in Π(µ, ν).
Let µ and ν be finite positive measures on (X, ∑X) and (Y, ∑Y ) respectively with ||µ|| = ||ν||, and let p ∈ [1, ∞]. We say that a function T : ℒp(X, µ) → ℒp(Y, ν) is a Markov operator if:
- (i)
T is linear with T 1X = 1Y ;
- (ii)
f ≥ 0 implies T f ≥ 0 for all f ∈ ℒp(X, µ);
- (iii)
∫X f dµ = ∫Y T f dν for all f ∈ ℒp(X, µ).
Notice that the definition of Markov operators depends on underlying measures µ and ν on X and Y respectively, even when p = ∞. We now define some properties for transfunctions that are analogous to (ii) and (iii) from Definition 2.2.
Let Φ: ℳX → ℳY be a transfunction.
- (i)
Φ is positive if λ ≥ 0 implies that Φ λ ≥ 0 for all λ ∈ ℳX.
- (ii)
Φ is measure-preserving if (Φλ)(Y ) = λ(X) for all λ ∈ ℳX.
- (iii)
Φ is Markov if it is strongly σ-additive, positive and measure-preserving.
By [9], there is a bijective relationship between plans and Markov operators. We will show soon that a relationship between Markov operators and Markov transfunctions exists, which will imply that all three concepts are connected.
Let µ be a finite positive measure on (X, ∑X), and define the map Jµ : ℒ1(X, µ) → ℳµ via Jµf := fµ. Then Jµ (hence
Positivity and linearity of integrals with respect to µ ensure that Jµ is positive and linear. Surjectivity of Jµ is the statement of the Radon–Nikodym Theorem. Injectivity and isometry hold because
Let µ and ν be finite positive measures on X and Y respectively, with ||µ|| = ||ν|| and let s ∈ [1, ∞]. For every Markov operator T : ℒs(X, µ) → ℒs(Y, ν), there exists a unique Markov transfunction
Every Markov transfunction
First, we prove each statement for s = 1, then extend the argument to other values of s. Let T : ℒ1(X, µ) → ℒ1(Y, ν) be a Markov operator. Define
Now let s ∈ (1, ∞] and let T : ℒs(X, µ) → ℒs(Y, ν) be a Markov operator. By Theorem 1 from [9], T can be uniquely extended to a Markov operator T̂ on ℒ1(X, µ). By our previous argument, T̂ corresponds to a Markov transfunction Φ̂ defined on ℳµ. We define Φ to be the restriction of Φ̂ to
Now we prove the second statement. Let s ∈ [1, ∞], let
Since all three operators in the definition of T are positive and strongly σ-additive, we see that T is also positive and strongly σ-additive, satisfying parts (i) and (ii) of Definition 2.2. Next, if f ∈ ℒ∞(X, λ), then
One consequence from Theorem 2.5 is that any Markov transfunction defined on
The remainder of this section aims to emphasize the importance of Theorem 2.5. For any p ∈ [1, ∞], a transfunction
Note that if some positive measure µ′ also generates
Let (X, ∑X) be a Borel measurable space, let ℱX be a subset of bounded measurable real-valued functions on X and let ℳX be a subset of finite signed measures on X. Analogously, we have Y , ℱY and ℳY . For f ∈ ℱX and λ ∈ ℳX, define 〈f, λ〉 := ∫X f dλ. Similarly, for g ∈ ℱY and ρ ∈ ℳY , define 〈g, ρ 〉 := ∫Y g dρ. Occasionally, the elements within angular brackets shall be written in reverse order.
We say that {ℱX, ℳX} is a separating pair if 〈f1, λ〉 = 〈f2, λ〉 for all λ ∈ ℳX implies that f1 = f2, and if 〈f, λ1〉 = 〈f, λ2〉 for all f ∈ ℱX implies that λ1 = λ2. In this section, we shall develop some theory for two choices of the collections {ℱX, ℳX} and {ℱY , ℳY }, which we call the continuous setting and the measurable setting.
Let {ℱX, ℳX} and {ℱY , ℳY } each be a separating pair, let Φ: ℳX → ℳY be a transfunction, and let S : ℱY → ℱX be a function. Then Φ and S are Radon adjoints of each other if the equation
By utilizing the separation properties of 〈·, ·〉, Radon adjoints of both kinds are unique if they exist. We shall denote the Radon adjoint of Φ by Φ* and of S by S*.
If (Φ, S) is a Radon adjoint pair, then for all g ∈ ℱY ,
If Φ = f# (the push-forward operator) for some measurable f : X → Y , then Φ*(g) = g ○ f = f*g (the pull-back operator acting on g). This is because ∫Y g d(f#λ) = ∫X g ○ f dλ for all g ∈ ℱY, λ ∈ℳX.
If X = Y and Φλ:= f λ for some continuous (or measurable) f : X → ℝ, then Φ* (g) = gf. This is because ∫X g d(fλ) = ∫X g f dλ for all g ∈ ℱX, λ∈ ℳx.
Let {ℱX, ℳX} and {ℱY , ℳY } each be a separating pair.
- (i)
(fn) weakly converges to f in ℱX, notated as
, if every finite regular measure λ on X yields 〈fn, λ〉 → 〈f, λ〉 as n → ∞.f_n {\underrightarrow w}f - (ii)
weakly converges to λ in ℳX, notated as\left( {{\lambda _n}} \right)_{n = 1}^\infty , if every bounded continuous f : X → ℝ yields 〈f, λn〉 → 〈f, λ〉 as n → ∞.\lambda _n \underrightarrow {\,w\,}\lambda - (iii)
An operator S : ℱY → ℱX is weakly continuous if
in ℱY implies thatg_n \underrightarrow {\,w\,}g in ℱX.Sg_n \underrightarrow {\,w}\,Sg - (iv)
A transfunction Φ: ℳX → ℳY is weakly continuous if
in ℳX implies that\lambda _n \underrightarrow {\,w}\,\lambda in ℳY.\Phi \lambda _n \,\underrightarrow w\,\Phi \lambda
Note that weak convergence of (fn) in Definition 3.4 (i) is the same notion as bounded-pointwise convergence.
For a metric space (X, d) with x ∈ X, A ⊆ X and δ > 0, define B(x; δ) := {z ∈ X : d(x, z) < δ} to be the δ-ball around x and define B(A; δ) := ∪x∈AB(x; δ) to be the δ-inflation around A.
The following two lemmas aid in showing Proposition 4.5.
Let (X, d) be a locally compact metric space. The positive function c: X → (0, ∞] defined via
Let (X, d) be a locally compact Polish metric space. Then there exists a pair of sequences
Using the setup from Lemma 4.3, we define the collection of sets
A measure µ is called a point-mass measure at x if µ(A) = 1 when x ∈ A and µ(A) = 0 when x ∉ A. A finite linear combination of point-mass measures is called a simple measure.
It is straightforward to show that simple measures are regular. The following proposition suggests a method to create approximations of identity, which shall be discussed in their respective sections below.
Simple measures on a second-countable locally compact Hausdorff space form a dense subset of all finite regular measures with respect to weak convergence.
Construct sequences
Let ε > 0. Define η := ε/(3||f|| + 3||λ|| + 1) so that ||f|| η < ε/3 and that ||λ||η < ε/3. Choose some natural M such that
Now let n > N. Define
- (a)
;\left| {\left\langle {f,\lambda - {1_{{K_M}}}\lambda } \right\rangle } \right| \le \left\| f \right\| \cdot \lambda (K_M^c) < \left\| f \right\|\eta - (b)
;\left| {\left\langle {f,{1_{{K_M}}}\lambda - {\rho _{n,M}}} \right\rangle } \right| \le \left| {\int_{{K_M}} {f\,d\lambda - \sum\nolimits_{i = 1}^{p(n)} {f({x_i})\lambda ({C_{n,i}} \cap {K_M})} } } \right| < \left\| \lambda \right\|\eta - (c)
.\left| {\left\langle {f,{\rho _{n,M}} - {\lambda _n}} \right\rangle } \right| \le \left\| f \right\|\sum\nolimits_{i = 1}^{p(n)} {\lambda ({C_{n,i}} \cap K_M^c) \le \left\| f \right\|\lambda (K_M^c) < \left\| f \right\|\eta }
For any finite signed measure λ on X, the sequence (λn) of simple measures from Proposition 4.5 weakly converges to λ, hence the sequence of transfunctions (In) given by
The approximation of identity above is simply described with characteristic functions (1Cn,i) and point-mass measures (δxi). However, in each of the two settings below, either the characteristic functions must be replaced by bounded continuous functions or the point-mass measures must be replaced by compactly-supported measures that are absolutely continuous with respect to some underlying measure. With the correct choice of replacements, the same argument as given in Proposition 4.5 can be applied, yielding valid approximations of identities for the respective settings.
For the remainder of this paper, let X and Y be locally-compact Polish spaces, and pick any complete metric for each of them when needed.
Let ℱX = Cb(X) denote the Banach space of all bounded continuous functions on X with the uniform norm and let ℳX = ℳfr(X) denote the Banach space of all finite (hence, regular) signed measures on X with the total variation norm. Develop Y , ℱY , and ℳY analogously. It is known that {ℱX, ℳX} is a separating pair in this setting.
An approximation of identity can be formed in this setting: keep the point-mass measures ρn,i := δxi, then for each natural n, replace the characteristic functions {1Cn,i : 1 ≤ i ≤ p(n)} used in Proposition 4.5 with positive compactly supported continuous functions {fn,i : 1 ≤ i ≤ p(n)} such that fn,i ≤ 1B(Cn,i;1/n) and that
Every strongly σ-additive and weakly-continuous transfunction Φ: ℳfr(X) → ℳfr(Y ) has a strongly σ-additive and weakly-continuous Radon adjoint S : Cb(Y ) → Cb(X). Conversely, every strongly σ-additive and weakly-continuous operator S : Cb(Y ) → Cb(X) has a strongly σ-additive and weakly-continuous Radon adjoint Φ: ℳfr(X) → ℳfr(Y ). When the Radon adjoint pair exists, their operator norms are equal (with respect to total-variation and uniform-convergence).
For the first claim, define S(g)(x) := 〈g, Φ(δx)〉 for all g ∈ Cb(Y ) and for all x ∈ X so that 〈S(g), δx〉 = 〈g, Φ(δx)〉. Let xn → x on X, so that
Since countable linear combinations of point-mass measures are weakly dense in ℳfr(X), the linearity and weak-continuity of the second coordinate in the 〈·, ·〉 structure and the weak-continuity of Φ yields that 〈S(g), λ〉 = 〈g, Φ λ〉 for all g ∈ Cb(Y ) and λ ∈ ℳfr(X). Hence, S is the Radon adjoint of Φ with the desired properties.
For the second claim, note that for every λ ∈ ℳfr(X), the continuous functional g ↦ 〈S(g), λ〉 defined on C0(Y ) has Riesz representation 〈·,Φ (λ)〉 for some unique signed measure Φ(λ) ∈ ℳfr(Y ). Defining Φ in this manner for all λ, we obtain the equation 〈S(g), λ〉 = 〈g, Φλ〉 for all g ∈ C0(Y ) and λ ∈ ℳfr(X). C0(Y ) is dense in Cb(Y ) with respect to bounded-pointwise convergence, so with
In this setting, let (X, ∑X, µ) be a finite measure space, let ℱX := ℒ∞(X, µ) and let
An approximation of identity can be formed in this setting: for each natural n and 1 ≤ i ≤ p(n), replace each point-mass measure δxi used in Proposition 4.5 with the measure ρn,i := 1Cn,iµ and define fn,i := 1Cn,i/µ(Cn,i) when µ(Cn,i) > 0; otherwise, define fn,i = 0. That is, an approximation of identity in the measurable setting is given by the sequence (In), where
For every strongly σ-additive transfunction
Let Φ be strongly σ-additive. For fixed B ⊆ Y , it follows by strong σ-additivity of Φ that the set function A ↦ Φ(1Aµ)(B) is a measure. Define this measure to be Ψ(1B ν). Then Ψ, defined on {1B ν| B ⊆ Y } is a strongly σ-additive transfunction that behaves like Φ† in the equality above. Ψ can be linearly extended to
The extended Ψ is strongly σ-additive on
If Φ is Markov, then Φ† is also Markov. Furthermore, the plans κ, κ† corresponding to Φ, Φ† respectively are dual to each other: that is, κ(A × B) = κ†(B × A) for all measurable sets A ⊆ X, B ⊆ Y . However, Φ† is sensitive to the choice of measures µ and ν, which is not ideal when working with non-injective extensions of Markov transfunction Φ.
Every strongly σ-additive
Assume that
On the other hand, assume that S : ℒ∞(Y, ν) → ℒ∞(X, µ) is linear and bounded. Then define
Let ℱX, ℱY , ℳX, and ℳY be defined in either the continuous setting or the measurable setting.
A transfunction Φ: ℳX → ℳY is simple if there exist functions
It is straightforward to verify that simple transfunctions are strongly σ-additive. In the continuous setting, simple transfunctions are also weakly-continuous. Therefore by Theorem 5.1 in the continuous setting or Theorem 6.2 in the measurable setting, the Radon adjoint Φ* exists and satisfies
Note that the approximations of identity covered in both the continuous setting and the measurable setting involve sequences of simple transfunctions.
In both the continuous setting and the measurable setting, linear weakly-continuous transfunctions can be approximated by simple transfunctions with respect to weak convergence; that is, simple transfunctions form a dense subset of linear weakly-continuous transfunctions with respect to weak convergence.
Let Φ: ℳX → ℳY be weakly-continuous transfunction and fix λ ∈ ℳX. Define Φn := Φ In, where
Markov transfunctions provide a new perspective to optimal transport theory.
Let (X, ∑X, µ) and (Y, ∑Y , ν) be Polish measure spaces with finite positive measures µ and ν, respectively, with ||µ|| = ||ν||. A cost function is any continuous function c: X×Y → [0, ∞). A plan κ ∈ Π(µ, ν) is c-optimal if ∫X×Y c dκ ≤ ∫X×Y c dπ for all π∈ Π(µ, ν). A Markov transfunction Φ: ℳX → ℳY is c-optimal on µ if the corresponding plan κ with marginals µ and Φµ is c-optimal, and Φ is simply c-optimal if it is c-optimal on ℳX.
The next proposition implies that optimal inputs for Φ form a large class of measures.
Let (X, ∑X), (Y, ∑Y ) be Polish spaces, let c be a cost function, and let Φ: ℳX → ℳY be a Markov transfunction. If Φ is c-optimal on µ ∈ ℳX, then Φ is c-optimal on
The proof follows easily from Theorem 4.6 in [10] on the inheritance of optimality of plans by restriction.
In the next theorem, we provide a “warehouse strategy” which approximates the optimal cost between fixed marginals with respect to some cost function. First, we subdivide the input marginal by local regions, and send the subdivided measures to point mass measures – warehouses – within their respective regions. Second, we transfer mass between warehouses via the discrete transport problem. Finally, the warehouses locally redistribute to form the output marginal. The overall cost of transport via the warehouse strategy approaches the optimal cost as the size of the regions decreases.
Let (X, ∑X) be a locally compact Polish measurable space with complete metric d, let λ and ρ be finite positive compactly-supported measures with ||λ|| = ||ρ||, and let c: X × X → [0, ∞) be a cost function with c(x, y) ≤ αd(x, y)p for some constants α, p > 0. The optimal cost between marginals λ, ρ with respect to c can be sufficiently approximated by the costs of simple Markov transfunctions.
Consider the approximation of identity (In) from the continuous setting in Section 5. For large n, we create a composition of three simple Markov transfunctions: λ first maps to
The first and last steps cost no more than αn−p||λ|| each, which reduces to 0 as n → ∞. This means that the optimal cost between marginals λn and ρn approaches the optimal cost between marginals λ and ρ as n → ∞. Solving the former optimal cost is the well-known discrete version of the Monge-Kantorovich transport problem.
By approximating each of the values 〈fn,i, λ 〉 ≈ an,i/z and 〈fn,i,ρ〉 ≈ bn,i/z for natural numbers an,i, bn,i, z with 1 ≤ i ≤ p(n), the middle step can approximately be interpreted as the Assignment Problem on a weighted bipartite graph between vertex sets P and Q, where P denotes a set created by forming an,i copies of a vertex corresponding to each δxi in λn, Q denotes the set created by forming bn,j copies of a vertex corresponding to each δxj in ρn, and drawing edges between these vertices with weight c(xi, xj). This problem has been studied, and can be solved in polynomial time of
Although Theorem 8.3 provides a sequence of simple transfunctions that approximate the optimal cost between fixed marginals, the sequence is not expected to converge weakly to an optimal Markov transfunction, as the solutions to the middle step could vary greatly as n increases. Consequently, we can find a Markov transfunction whose cost between marginals is sufficiently close to the optimal cost, but Theorem 8.3 does not provide an optimal Markov transfunction.
However, for any Markov transfunction between fixed marginals, the next theorem yields an approximation by simple Markov transfunctions with respect to weak convergence. Consequently, the cost between the marginals of the constructed sequence of simple Markov transfunctions approaches the cost for the original transfunction.
Let (X, ∑X, µ) and (Y, ∑Y , ν) be locally compact Polish measure spaces with finite compactly-supported positive measures µ and ν such that ||µ|| = ||ν||. Any Markov transfunction Φ: ℳµ → ℳν can be approximated by simple Markov transfunctions with respect to weak convergence.
Consider the approximation of identity (In) with respect to µ from the measurable setting from Section 6. Let n be large so that Kn (from Lemma 4.3) contains the supports of µ and ν.
Let κ be the plan corresponding to Markov transfunction Φ from Theorem 2.5. For 1 ≤ i, j ≤ p(n), the quantity κ(Cn,i × Cn,j) represents how much mass transfers from 1Cn,iµ to 1Cn,jν. If µ(Cn,i)ν(Cn,j) > 0, then we can approximate nonzero measure (1Cn,i ⊗ 1Cn,j ) κ with
Next, we show that
There are some properties of Φn worth noting: Φn maps
Let
We now show that
Let ε > 0. Since λ = fµ for some f ∈ ℒ∞(X, µ), choose some f̃ ∈ Cb(X) such that ||(f − f̃)µ|| < ε/3. Since
Theorem 8.4 can be strengthened by removing the assumptions that µ and ν are compactly supported; the approximation of identity may not capture all of µ nor λ, and κn ∈ Π(1Knµ, 1Kn ν) may not belong to Π(µ, ν), but the rest of the analysis holds. Notably, to show