Repository

labmlai/annotated_deep_learning_paper_implementations

🧑‍🏫 59 Implementations/tutorials of deep learning papers with side-by-side notes 📝; including transformers (original, xl, switch, feedback, vit, ...), optimizers (adam, adabelief, ...), GANs (cyclegan, stylegan2, ...), 🎮 reinforcement learning (ppo, dqn), capsnet, distillation, ... 🧠
Issue: at line 98, `route_prob = self.softmax(self.switch(x))`, and at line 102, `route_prob_max, routes = torch.max(route_prob, dim=-1)`. As far as I know, the `torch.max` operation is non-differentiable, so the parameters of `self.switch` cannot be trained, which means the experts would effectively be chosen randomly each time. I don...
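
For context, here is a minimal runnable sketch of the routing step the question refers to (assumed tensor shapes and module names; this is not the repository's exact code). While the indices returned by `torch.max` (`routes`) are indeed non-differentiable, the max values (`route_prob_max`) are differentiable, and the Switch Transformer scales each expert's output by them, which is the path through which gradients reach the routing network:

```python
import torch
import torch.nn as nn

d_model, n_experts = 8, 4
x = torch.randn(10, d_model)                      # 10 tokens (hypothetical batch)

switch = nn.Linear(d_model, n_experts)            # routing network
experts = nn.ModuleList(nn.Linear(d_model, d_model) for _ in range(n_experts))

route_prob = torch.softmax(switch(x), dim=-1)               # line 98 in the question
route_prob_max, routes = torch.max(route_prob, dim=-1)      # line 102 in the question

# Dispatch each token to its selected expert. The indices `routes`
# carry no gradient: this part is non-differentiable, as the question says.
out = torch.zeros_like(x)
for i in range(n_experts):
    idx = (routes == i).nonzero(as_tuple=True)[0]
    if idx.numel():
        out[idx] = experts[i](x[idx])

# Scale by the (differentiable) max routing probability: gradients flow
# back into `switch` through this multiplication.
out = out * route_prob_max.unsqueeze(-1)

out.sum().backward()
print(switch.weight.grad is not None)             # True: the router is trainable
```

Running this prints `True`: even though `routes` itself carries no gradient, the scaling by `route_prob_max` gives `self.switch` a training signal (the paper's auxiliary load-balancing loss provides a further one, not shown in this sketch).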