Conventional stereoscopic displays suffer from vergence-accommodation conflict which cause visual fatigue. Integral imaging-based (II) displays resolves this problem by directly projecting light field sub-views into the eye using microlens arrays. However, II-based light field displays has inherent trade-off between angular and spatial resolutions. In this paper, we propose a novel display concept called coded time-division light field display (C-TDM-LFD), which projects encoded light field sub-views to the viewers' eyes, offering correct cues for vergence-accommodation reflex. By jointly optimizing display inputs and pattern of coded apertures, our pipeline can render high resolution refocused images from sparse light field sub-views with minimal aliasing effects. By simulating light transport and image formation with Fourier optics, we can learn display inputs and coded aperture patterns via deep learning in an end-to-end fashion. To our knowledge, we are among the first to optimize the light field display pipeline with deep learning. We verify our concept with objective image quality metrics (PSNR, SSIM) and optics software simulation, and perform extensive studies on the various customizable design variables in our display pipeline. Experiments results show that our method can generate refocused images with higher quality both quantitatively and qualitatively compared to baseline display designs.