Complete and textured 3D reconstruction of dynamic scenes has been facilitated by mapped RGB and depth information acquired by RGB-D cameras based multi-view systems. One of the most critical steps in such multi-view systems is to determine the relative poses of all cameras via a process known as extrinsic calibration. In this work, we propose a sensor fusion framework based on a weighted bi-objective optimization for refinement of extrinsic calibration tailored for RGB-D multi-view systems. The weighted bi-objective cost function, which makes use of 2D information from RGB images and 3D information from depth images, is analytically derived via the Maximum Likelihood (ML) method. The weighting factor appears as a function of noise in 2D and 3D measurements and takes into account the affect of residual errors on the optimization. We propose an iterative scheme to estimate noise variances in 2D and 3D measurements, for simultaneously computing the weighting factor together with the camera poses. An extensive quantitative and qualitative evaluation of the proposed approach shows improved calibration performance as compared to refinement schemes which use only 2D or 3D measurement information.