This paper is concerned with computationally efficient learning of homogeneous sparse halfspaces in $\Rd$ under noise. Though recent works have established attribute-efficient learning algorithms under various types of label noise (e.g., bounded noise), it remains an open question when and how $s$-sparse halfspaces can be efficiently learned under the challenging {\em malicious noise} model, where an adversary may corrupt both the unlabeled data distribution and the labels. We answer this question in the affirmative by designing a computationally efficient algorithm with near-optimal label complexity $\tilde{O}\big(s \log^3 d \cdot \log^4\frac{1}{\epsilon}\big)$ and noise tolerance $\eta = \Omega(\epsilon)$, where $\epsilon \in (0, 1)$ is the target error rate. Our main techniques include attribute-efficient paradigms for instance reweighting and for empirical risk minimization, and a new analysis of uniform concentration for unbounded data, all of which crucially take the structure of the underlying halfspace into account. To the best of our knowledge, this is the first near-optimal result in this setting. As a byproduct of our analysis, we resolve a long-standing problem in statistics and machine learning: we show that a global optimum of sparse principal component analysis can be found in polynomial time without any statistical assumption on the data. This result may be of independent interest to both communities.