The nonlocal network is designed for capturing long-range spatial-temporal dependencies in several computer vision tasks. Although having shown excellent performances, it needs an elaborate preparation for both the number and position of the building blocks. In this paper, we propose a new formulation of the nonlocal block and interpret it from the general graph signal processing perspective, where we view it as a fully-connected graph filter approximated by Chebyshev polynomials. The proposed nonlocal block is more efficient and robust, which is a generalized form of existing nonlocal blocks (e.g. nonlocal block, nonlocal stage). Moreover, we give the stable hypothesis and show that the steady-state of the deeper nonlocal structure should meet with it. Based on the stable hypothesis, a full-order approximation of the nonlocal block is derived for consecutive connections. Experimental results illustrate the clear-cut improvement and practical applicability of the generalized nonlocal block on both image and video classification tasks.
Pedestrian trajectory prediction is a challenging task because of the complexity of real-world human social behaviors and uncertainty of the future motion. For the first issue, existing methods adopt fully connected topology for modeling the social behaviors, while ignoring non-symmetric pairwise relationships. To effectively capture social behaviors of relevant pedestrians, we utilize a directed social graph which is dynamically constructed on timely location and speed direction. Based on the social graph, we further propose a network to collect social effects and accumulate with individual representation, in order to generate destination-oriented and social-aware representations. For the second issue, instead of modeling the uncertainty of the entire future as a whole, we utilize a temporal stochastic method for sequentially learning a prior model of uncertainty during social interactions. The prediction on the next step is then generated by sampling on the prior model and progressively decoding with a hierarchical LSTMs. Experimental results on two public datasets show the effectiveness of our method, especially when predicting trajectories in very crowded scenes.