Modern instance segmentation approaches mainly adopt the sequential ``detect then segment'' paradigm popularized by Mask R-CNN and have achieved considerable progress. However, they usually struggle to segment huddled instances, i.e., instances that are crowded together. The essential reason is that the detection step is learned only under box-level supervision. Without guidance from mask-level supervision, the features extracted from regions containing huddled instances are noisy and ambiguous, which makes the detection problem ill-posed. To address this issue, we propose a new region-of-interest (RoI) feature extraction strategy, named Shape-aware RoIAlign, which focuses feature extraction within a region well aligned with the shape of the instance of interest rather than within a rectangular RoI. We instantiate Shape-aware RoIAlign by introducing a novel refining module built upon Mask R-CNN, which takes the mask predicted by Mask R-CNN as the region that guides the computation of Shape-aware RoIAlign. Based on the RoI features re-computed by Shape-aware RoIAlign, the refining module updates both the bounding box and the mask predicted by Mask R-CNN. Experimental results show that the refining module equipped with Shape-aware RoIAlign achieves consistent and remarkable improvements over Mask R-CNN models with different backbones on the challenging COCO dataset. The code will be released.
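
To make the idea of shape-guided feature extraction concrete, the following is a minimal NumPy sketch, not the implementation described in this work. It assumes one plausible reading: RoI features are first sampled on a regular grid inside the box, as in standard RoIAlign with one sample per bin, and are then weighted by a predicted soft mask so that responses outside the instance shape are suppressed. The function names (`bilinear_sample`, `roi_align`, `shape_aware_roi_align`), the single-sample-per-bin simplification, and the multiplicative masking are all assumptions made for illustration.

```python
import numpy as np


def bilinear_sample(feat, ys, xs):
    """Bilinearly sample feat of shape (C, H, W) at float coordinates ys, xs."""
    _, height, width = feat.shape
    ys = np.clip(ys, 0, height - 1)
    xs = np.clip(xs, 0, width - 1)
    y0 = np.floor(ys).astype(int)
    x0 = np.floor(xs).astype(int)
    y1 = np.minimum(y0 + 1, height - 1)
    x1 = np.minimum(x0 + 1, width - 1)
    wy = ys - y0
    wx = xs - x0
    top = feat[:, y0, x0] * (1 - wx) + feat[:, y0, x1] * wx
    bottom = feat[:, y1, x0] * (1 - wx) + feat[:, y1, x1] * wx
    return top * (1 - wy) + bottom * wy


def roi_align(feat, box, out_size=7):
    """Simplified RoIAlign: one bilinear sample at the centre of each bin."""
    x0, y0, x1, y1 = box
    ys = y0 + (np.arange(out_size) + 0.5) * (y1 - y0) / out_size
    xs = x0 + (np.arange(out_size) + 0.5) * (x1 - x0) / out_size
    grid_y, grid_x = np.meshgrid(ys, xs, indexing="ij")
    return bilinear_sample(feat, grid_y, grid_x)


def shape_aware_roi_align(feat, box, mask, out_size=7):
    """Assumed variant: weight the pooled RoI features by a soft instance mask.

    mask has shape (out_size, out_size) with values in [0, 1] and is taken
    to be the mask predicted for this box by a first-stage model.
    """
    pooled = roi_align(feat, box, out_size)
    return pooled * mask[None, :, :]
```

In this sketch, an all-ones mask reduces the operation to the plain rectangular pooling, while a mask that is zero over part of the box removes the contribution of that part, for example a neighbouring instance that overlaps the box.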