We propose to use the Hamming bound to derive an optimality criterion for learning hash codes with a deep network. In particular, when the number of binary hash codes (typically the number of image categories) and the code length are known, it is possible to derive an upper bound on the minimum Hamming distance between the hash codes. This upper bound can then be used to define the loss function for learning hash codes. By encouraging the margin (the minimum Hamming distance) between the hash codes of different image categories to match the upper bound, we are able to learn theoretically optimal hash codes. Our experiments show that our method significantly outperforms competing deep learning-based approaches and obtains top performance on benchmark datasets.
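To make the bound concrete, the following is a minimal sketch of how an upper bound on the minimum Hamming distance could be computed from the number of codes and the code length. It assumes the classical sphere-packing form of the Hamming bound, M * sum_{i=0}^{t} C(n, i) <= 2^n with t = floor((d - 1) / 2); the exact derivation used in the paper may differ. The function names `hamming_ball_volume` and `max_min_distance` are hypothetical and chosen only for illustration.

```python
from math import comb


def hamming_ball_volume(code_length: int, radius: int) -> int:
    """Number of binary words within Hamming distance `radius` of a fixed word."""
    return sum(comb(code_length, i) for i in range(radius + 1))


def max_min_distance(num_codes: int, code_length: int) -> int:
    """Largest d that the sphere-packing (Hamming) bound does not rule out.

    Assumption: this uses the classical bound
        num_codes * V(code_length, t) <= 2 ** code_length,  t = (d - 1) // 2,
    which is a necessary condition only. Codes attaining the returned
    distance are not guaranteed to exist.
    Returns 0 if even d = 1 is impossible (more codes than binary words).
    """
    total_words = 2 ** code_length
    best = 0
    for d in range(1, code_length + 1):
        t = (d - 1) // 2  # radius of the disjoint balls around each code
        if num_codes * hamming_ball_volume(code_length, t) <= total_words:
            best = d
        else:
            break  # ball volume grows with t, so larger d cannot satisfy the bound
    return best


if __name__ == "__main__":
    # Illustrative setting: 10 image categories, 16-bit hash codes.
    print(max_min_distance(num_codes=10, code_length=16))
```

Under these assumptions, the returned value could serve as the target margin that a loss function encourages between the hash codes of different categories. Because the sphere-packing bound is only a necessary condition, a tighter bound may apply for some combinations of code count and code length.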