This paper develops a gain randomization framework to defend feedback control systems against inference attacks, in which an adversary with access to the system states attempts to infer the system model. To prevent the inference attack, the control gain at each time step is randomly selected from a predefined set of control gains. We cast the gain selection problem as an optimal control problem in which a gain selection policy at each time step selects a control gain according to a probability distribution such that (i) the quadratic control cost is minimized and (ii) the adversary's uncertainty about the selected control gain is maximized. In our formulation, the gain selection policy is allowed to depend on the entire history of the state measurements, and the adversary's uncertainty about the control gain is captured by the Kullback-Leibler (KL) divergence between a uniform distribution and the posterior distribution of the feedback gains given the history of the system states. We first derive the backward Bellman optimality equation for the gain selection problem and study the structural properties of the optimal gain selection policy. Our results show that the optimal gain selection policy depends only on the current state of the system, rather than the entire state history, which reduces the gain selection problem to a nonlinear Markov decision process. We next derive a policy gradient theorem for the gain selection problem, which provides an expression for the gradient of the objective function with respect to the parameters of a stationary (time-invariant) policy. The policy gradient theorem allows us to develop a stochastic gradient descent algorithm for computing an optimal policy. Finally, we demonstrate the effectiveness of our results using a numerical example.
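
The sketch below illustrates the basic gain-randomization mechanism described above: at each time step a gain is drawn from a finite set according to a state-dependent probability distribution and the corresponding feedback control is applied. It is only a minimal illustration under assumed dynamics and cost matrices (A, B, Q, R, the gain set, and the softmax parameterization theta are all illustrative choices, not the paper's specification), and it omits the KL-divergence privacy term and the policy gradient update developed in the paper.

```python
# Minimal sketch of randomized gain selection (illustrative, not the paper's algorithm):
# at each step a gain is sampled from a finite set {K_1, ..., K_m} via a
# state-dependent softmax policy, and the control u = -K_i x is applied.
import numpy as np

rng = np.random.default_rng(0)

# Assumed discrete-time linear system x_{t+1} = A x_t + B u_t + w_t
A = np.array([[1.0, 0.1], [0.0, 1.0]])
B = np.array([[0.0], [0.1]])
Q = np.eye(2)          # state cost weight (assumed)
R = np.array([[0.1]])  # control cost weight (assumed)

# Predefined finite set of feedback gains (assumed given)
gains = [np.array([[1.0, 1.5]]), np.array([[1.2, 1.8]]), np.array([[0.8, 1.2]])]

def policy_probs(x, theta):
    """Softmax gain-selection probabilities; theta holds one row of weights per gain."""
    logits = theta @ np.concatenate([x.ravel(), [1.0]])  # linear features of the current state
    logits -= logits.max()                               # numerical stability
    p = np.exp(logits)
    return p / p.sum()

def rollout(theta, T=50):
    """Simulate one trajectory; return quadratic cost and empirical gain-usage frequencies."""
    x = np.array([[1.0], [0.0]])
    cost, counts = 0.0, np.zeros(len(gains))
    for _ in range(T):
        p = policy_probs(x, theta)
        i = rng.choice(len(gains), p=p)        # randomized gain selection
        counts[i] += 1
        u = -gains[i] @ x                      # feedback control with the sampled gain
        cost += float(x.T @ Q @ x + u.T @ R @ u)
        w = 0.01 * rng.standard_normal((2, 1)) # process noise
        x = A @ x + B @ u + w
    return cost, counts / T

theta0 = np.zeros((len(gains), 3))             # zero parameters give uniform randomization
print(rollout(theta0))
```

With zero parameters the policy randomizes uniformly over the gain set, which maximizes the adversary's uncertainty but ignores control cost; the paper's formulation trades off these two objectives through the optimized policy parameters.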