Iterative residual policy for dynamic manipulation

Dynamic manipulation of deformable objects

  • complex dynamics (introduced by object deformation and high-speed action)
  • strict task requirements (defined by a precise goal specification)
  • rope whipping
    • action space: target angles for two joints and angular velocity across all joints
    • observation space
      • tip trajectory as an image
      • pixel values correspond to occupancy probability
        • observed trajectories have binary pixel values
        • predicted trajectories are real-valued, between 0 and 1
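
As a minimal sketch of this observation format, the snippet below rasterizes a list of tip positions into a binary occupancy image. The function name `rasterize_trajectory`, the grid size, and the workspace bounds are illustrative assumptions, not details taken from the paper.

```python
def rasterize_trajectory(points, size=8, lo=-1.0, hi=1.0):
    """Mark every grid cell visited by the tip as occupied (1), else 0.

    points: iterable of (x, y) tip positions in workspace coordinates.
    size, lo, hi: hypothetical image resolution and square workspace bounds.
    """
    image = [[0] * size for _ in range(size)]
    scale = size / (hi - lo)
    for x, y in points:
        # Points outside the workspace are clamped to the border (an assumption).
        col = min(size - 1, max(0, int((x - lo) * scale)))
        row = min(size - 1, max(0, int((hi - y) * scale)))  # row 0 is the top
        image[row][col] = 1
    return image
```

An observed trajectory produced this way contains only 0s and 1s, while a model's prediction for the same image would hold a probability in each cell.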

Iterative Residual Policy (IRP)

  • learns delta dynamics that predict the effects of delta action on the previously-observed trajectory
  • metric for picking an action from the predicted trajectories: minimum distance from any point of the tip trajectory to the goal location
  • loss to train the prediction model: binary cross-entropy between the predicted trajectory and the true trajectory observed after applying the delta action
  • sample actions from a uniform distribution and delta actions from a Gaussian distribution
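
The pieces above can be sketched in plain Python. This is a rough illustration under several assumptions, not the authors' implementation: images are lists of rows, "occupied" means probability at or above a hypothetical threshold of 0.5, the goal is given as a (row, col) pixel, and `predict_delta` is a placeholder for the learned delta dynamics model with an assumed signature of (observed image, action, delta action) → predicted image.

```python
import math
import random


def bce_loss(pred, target, eps=1e-7):
    """Mean binary cross-entropy between two same-sized images (lists of rows)."""
    total, count = 0.0, 0
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            p = min(1.0 - eps, max(eps, p))  # clamp to avoid log(0)
            total -= t * math.log(p) + (1 - t) * math.log(1.0 - p)
            count += 1
    return total / count


def min_goal_distance(image, goal, threshold=0.5):
    """Smallest pixel distance from any occupied pixel to the goal (row, col).

    Returns math.inf if no pixel reaches the threshold.
    """
    best = math.inf
    for r, row in enumerate(image):
        for c, p in enumerate(row):
            if p >= threshold:
                best = min(best, math.hypot(r - goal[0], c - goal[1]))
    return best


def select_action(action, observed, goal, predict_delta,
                  n_samples=32, sigma=0.1, rng=random):
    """Sample Gaussian delta actions and return the adjusted action whose
    predicted trajectory passes closest to the goal, with that distance."""
    candidates = []
    for _ in range(n_samples):
        delta = [rng.gauss(0.0, sigma) for _ in action]
        predicted = predict_delta(observed, action, delta)
        candidates.append((min_goal_distance(predicted, goal), delta))
    best_dist, best_delta = min(candidates, key=lambda item: item[0])
    return [a + d for a, d in zip(action, best_delta)], best_dist
```

In an iterative loop, the returned action would be executed, the resulting trajectory observed, and `select_action` called again with the new observation until the tip passes close enough to the goal. `bce_loss` would be used separately, when fitting the prediction model to trajectories observed after applying sampled delta actions.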

Spatial action map

My thoughts on IRP

  • The approach still follows the concept of learning world dynamics (states + action → next states)
    • train a model to predict trajectories well, and choose actions based on this
  • The key differences make it useful for applying the framework to unseen environments or agents
    • delta actions effectively capture the effect of the changes in actions
      • (even without delta actions, the iterative or the sampling parts of the method can both still be applied)
      • similar to the concept of using velocity in the states instead of previous location
    • whole trajectory prediction instead of step-by-step prediction
      • → this makes it easy to set the criteria for picking the best action for the next iteration
      • this limits applications to repeatable tasks with complex dynamics

[1] Chi, Cheng, et al. "Iterative residual policy: for goal-conditioned dynamic manipulation of deformable objects." arXiv preprint arXiv:2203.00663 (2022).