This work proposes a framework that uses reinforcement learning to optimize a max-pressure controller that accounts for phase-switching loss. We first extend max-pressure control by introducing a switching curve and prove that the resulting control method is throughput-optimal in a store-and-forward network. We then extend this theoretical policy with a distributed approximation and position-weighted pressure, so that policy-gradient reinforcement learning algorithms can be used to optimize the parameters of the policy network. The proposed framework combines the strengths of data-driven methods and theoretical control models. It is also well suited to real-world implementation, because the control policy can be generated in a distributed fashion from local observations.
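
As an illustration only, the following Python sketch shows the general idea of max-pressure phase selection with a penalty for switching. It is a simplified stand-in under stated assumptions, not the controller proposed in this work: the fixed `switch_margin` is a hypothetical substitute for the switching curve, pressure is taken as the plain difference between upstream and downstream queue lengths (no position weighting, no saturation flows), and all names (`phase_pressure`, `select_phase`, the link labels) are invented for the example.

```python
from typing import Dict, List, Tuple

# A movement is a pair (upstream link, downstream link). Hypothetical labels.
Movement = Tuple[str, str]


def phase_pressure(moves: List[Movement], queues: Dict[str, float]) -> float:
    """Sum of (upstream queue - downstream queue) over the phase's movements."""
    return sum(queues[up] - queues[down] for up, down in moves)


def select_phase(
    phases: Dict[str, List[Movement]],
    queues: Dict[str, float],
    current: str,
    switch_margin: float,
) -> str:
    """Keep the current phase unless another phase beats it by more than
    switch_margin. The fixed margin is an assumption made for this sketch;
    it only mimics the effect of penalizing phase switches."""
    pressures = {name: phase_pressure(moves, queues) for name, moves in phases.items()}
    best = max(pressures, key=pressures.get)
    if best != current and pressures[best] - pressures[current] > switch_margin:
        return best
    return current


# Toy intersection with two phases and locally observed queue lengths.
phases = {
    "NS": [("n_in", "s_out"), ("s_in", "n_out")],
    "EW": [("e_in", "w_out"), ("w_in", "e_out")],
}
queues = {
    "n_in": 5.0, "s_in": 4.0, "e_in": 8.0, "w_in": 6.0,
    "n_out": 1.0, "s_out": 1.0, "e_out": 1.0, "w_out": 2.0,
}

next_phase = select_phase(phases, queues, current="NS", switch_margin=3.0)
```

In this toy setting the decision uses only the queues on links adjacent to the intersection, which is the sense in which such a rule can run on local observations. In a learning-based variant, quantities such as the margin or the pressure weights would be parameters to optimize rather than fixed constants.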