Optimal transport
Introduction
summary of this cool article, with a brief history and various applications
- problem: artisans waste too much time picking up milk from farms that are too far away
- Monge problem: find a function T, called the Monge map, that assigns each farm to a single artisan (all the production of a farm goes to that one artisan), considering
- distances
- respective demands
- Kantorovich relaxation (the Monge-Kantorovich problem): find a (probabilistic) coupling Γ that is allowed to split the production of each farm among all artisans
- always has a solution under mild assumptions on the distances
- optimal transport:
- a mechanism of transforming one (continuous) probability distribution into another one with the lowest possible effort.
- effort → measured by the Wasserstein distance (which compares the dissimilarity between any pair of points from the two distributions)
- What are the advantages of the Wasserstein metric compared to the Kullback-Leibler divergence?
- it is symmetric and satisfies the triangle inequality (a true metric)
- KL is neither: it depends on the argument "order"!!!
- key: model data as probability distributions
- image → distribution over pixels with probabilities given by their intensities normalized to sum to one.
- text → discrete distributions of words with probabilities given by their frequencies of occurrence.
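A tiny sketch of the image case (toy 2x2 "image", my own numbers; real images would be loaded with an imaging library):

```python
import numpy as np

# toy 2x2 grayscale image
img = np.array([[0.0, 128.0],
                [255.0, 128.0]])

# distribution over pixels: intensities normalized to sum to one
p = img.ravel() / img.sum()
```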
- interpolation
- intermediate transformations along the transformation path
- e.g. 2 images
- minimize the distances between matched pixels
- applications:
- [CV] color transfer
- sample pixels (RGB data) from 2 images and align using OT
- simple code using pot in the following section
- [CV] Poisson image editing with OT
- apply OT to align color gradients from 2 images
- solve the Poisson equation to blend the boundaries
- (this step is from the original Poisson image editing; the catch is that it preserves the source's colors. That's where OT comes in)
- simple code using pot in the following section
- [NLP] automatic translation
- align word embeddings
POT library (Python Optimal Transport)
- pip install pot
- import ot
- transport cost matrix
- M=ot.dist(source, target)
- Earth Mover’s distance (discrete Wasserstein distance)
- soln: transport matrix
- ot.emd(a, b, M), where a and b are the source and target weight vectors
- entry (i, j): amount moved from source point i to target point j
- sum of rows: supply (source weights)
- sum of columns: demand (target weights)
- slow: O(n^3 log n)
- n=max(source dim, target dim)
- Pro:
- integers, sparse soln
- Sinkhorn algorithm: a faster alternative, obtained by adding an entropic regularization term
- soln: transport matrix
- ot.sinkhorn(a, b, M/M.max(), reg), where a and b are the weight vectors
- solution is not sparse; entries can be fractional (sometimes doesn't make sense, e.g. for indivisible goods)
- color transfer
- sample Xs from IMGs and sample Xt from IMGt
- Xs(ource) dim: (pixel, rgb)
- Xt(arget) dim: (pixel, rgb)
- ot_emd = ot.da.EMDTransport()
- ot_emd.fit(Xs=Xs, Xt=Xt)
- transp_Xt_emd = ot_emd.inverse_transform(Xt=IMGt) (maps the target image's pixels into the source color distribution)
Policy fusion for diverse tasks using optimal transport
- goal
- train base policies for diverse tasks and fuse them to get a policy that solves all
- motivation
- most real world applications require numerous specialized capabilities
- training from scratch to achieve these tasks is challenging
- too sample-inefficient to get a solution in reasonable time
- difficulties in designing reward fn for all capabilities
- fuse existing policies to learn new skills using optimal transport theory
- learn new skills more effectively
- within a shorter time period
- key obs / hypothesis
- for each policy network with the same input features, similar weight vectors at each individual layer may represent common learned patterns
- method
- align two NN policies layer by layer
- two sets of weight vectors → probability distributions of Dirac measures (delta functions)
- align the neurons that are most likely to represent common patterns
- optimal transport: minimizing the cost of aligning the neurons
- Earth Mover's distance: sum of the work (moving to the target → flow_i * dist_i) normalized by the total flow: EMD = (Σ_i flow_i * dist_i) / (Σ_i flow_i)
- reorder the neurons of one policy to the neurons of the other policy according to the optimal map
- rearrange the corresponding input weights of the following layer to match before going to the next layer
- fuse a new policy by averaging the weights
- demo
- image classification problem: MNIST
- RL: Reacher / Walker