Images captured by fisheye lenses violate the pinhole camera assumption and suffer from distortions. Rectification of fisheye images is therefore a crucial preprocessing step for many computer vision applications. In this paper, we propose an end-to-end multi-context collaborative deep network for removing distortions from single fisheye images. In contrast to conventional approaches, which focus on extracting hand-crafted features from input images, our method learns high-level semantics and low-level appearance features simultaneously to estimate the distortion parameters. To facilitate training, we construct a synthesized dataset that covers various scenes and distortion parameter settings. Experiments on both synthesized and real-world datasets show that the proposed model significantly outperforms current state-of-the-art methods. Our code and synthesized dataset will be made publicly available.