Optimal transport
Introduction
summary of this cool article, with a brief history and various applications
- problem: artisans waste too much time picking up milk from farms that are too far away
- Monge problem: find a function T, called the Monge map, that assigns each farm to a single artisan (all the production of a farm goes to that one artisan), considering
- distances
- respective demands
- Kantorovich relaxation (the Monge-Kantorovich problem): find a (probabilistic) coupling Γ that is allowed to split the production of each farm among all artisans
- always has a solution under mild assumptions on the distances
- optimal transport:
- a mechanism of transforming one (continuous) probability distribution into another one with the lowest possible effort.
- effort → measured by the Wasserstein distance (which compares the dissimilarity between any pair of points from the two distributions)
- What are the advantages of the Wasserstein metric compared to the Kullback-Leibler divergence?
- it is symmetric and satisfies the triangle inequality (a true metric)
- KL is neither: it depends on the argument "order"!!!
- key: model data as probability distributions
- image → distribution over pixels with probabilities given by their intensities normalized to sum to one.
- text → discrete distributions of words with probabilities given by their frequencies of occurrence.
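A tiny sketch of the image case (toy 2x2 "image", my own numbers; real images would be loaded with an imaging library):

```python
import numpy as np

# toy 2x2 grayscale image
img = np.array([[0.0, 128.0],
                [255.0, 128.0]])

# distribution over pixels: intensities normalized to sum to one
p = img.ravel() / img.sum()
```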
- interpolation
- intermediate transformations along the transformation path
- e.g. 2 images
- minimize the distances between matched pixels
- applications:
- [CV] color transfer
- sample pixels (RGB data) from 2 images and align using OT
- simple code using pot in the following section
- [CV] Poisson image editing with OT
- apply OT to align color gradients from 2 images
- solve the Poisson equation to blend the boundaries
- (this step is from the original Poisson image editing; the catch is that it preserves the source's colors. That's where OT comes in)
- simple code using pot in the following section
- [NLP] automatic translation
- align word embeddings
POT library (Python Optimal Transport)
- pip install pot
- import ot
- transport cost matrix
- M=ot.dist(source, target)
- Earth Mover’s distance (discrete Wasserstein distance)
- soln: transport matrix
- ot.emd(a, b, M), where a and b are the source and target weight vectors
- entry (i, j): amount moved from source point i to target point j
- sum of rows: supply (source weights)
- sum of columns: demand (target weights)
- slow: O(n^3 log n)
- n=max(source dim, target dim)
- Pro:
- integers, sparse soln
- Sinkhorn algorithm: a faster alternative, obtained by adding an entropic regularization term
- soln: transport matrix
- ot.sinkhorn(a, b, M/M.max(), reg), where a and b are the weight vectors
- solution is not sparse; entries can be fractional (sometimes doesn't make sense, e.g. for indivisible goods)
- color transfer
- sample Xs from IMGs and sample Xt from IMGt
- Xs(ource) dim: (pixel, rgb)
- Xt(arget) dim: (pixel, rgb)
- ot_emd = ot.da.EMDTransport()
- ot_emd.fit(Xs=Xs, Xt=Xt)
- transp_Xt_emd = ot_emd.inverse_transform(Xt=IMGt) (maps the target image's pixels into the source color distribution)
Policy fusion for diverse tasks using optimal transport
- goal
- train base policies for diverse tasks and fuse them to get a policy that solves all
- motivation
- most real world applications require numerous specialized capabilities
- training from scratch to achieve these tasks is challenging
- too sample-inefficient to get a solution in reasonable time
- difficulties in designing reward fn for all capabilities
- fuse existing policies to learn new skills using optimal transport theory
- learn new skills more effectively
- within a shorter time period
- key obs / hypothesis
- for each policy network with the same input features, similar weight vectors at each individual layer may represent common learned patterns
- method
- align two NN policies layer by layer
- two sets of weight vectors → probability distributions of Dirac measures (delta functions)
- align the neurons that are most likely to represent common patterns
- optimal transport: minimizing the cost of aligning the neurons
- Earth Mover's distance: sum of the work (moving to the target → flow_i * dist_i) normalized by the total flow: EMD = (Σ_i flow_i * dist_i) / (Σ_i flow_i)
- reorder the neurons of one policy to the neurons of the other policy according to the optimal map
- rearrange the corresponding input weights of the following layer to match before going to the next layer
- fuse a new policy by averaging the weights
- demo
- image classification problem: MNIST
- RL: Reacher / Walker