Unsupervised domain adaptation (UDA) aims to transfer knowledge from a labeled source domain to perform well on an unlabeled target domain. Recent deep self-training methods follow an iterative process: they predict on the target domain and then take the confident predictions as hard pseudo-labels for retraining. However, these pseudo-labels are often unreliable and can easily lead to deviated solutions with propagated errors. In this paper, we resort to the energy-based model and constrain the training of unlabeled target samples with an energy function minimization objective, which can be applied as a simple additional regularization. In this framework, it is possible to gain the benefits of the energy-based model while retaining strong discriminative performance in a plug-and-play fashion. We conduct extensive experiments on the most popular large-scale UDA benchmarks for image classification and semantic segmentation to demonstrate its generality and effectiveness.
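The abstract does not give the exact energy function. As a minimal sketch, assuming the free energy of a softmax classifier, $E(x) = -T \log \sum_k \exp(f_k(x)/T)$ (the form used by joint energy-based models), the energy of unlabeled target samples can be added to the source cross-entropy as an extra regularization term. The function names, the regularization weight `weight`, and the temperature `temperature` are hypothetical choices for illustration, not the paper's definitions.

```python
import math
from typing import Sequence


def _logsumexp(values: Sequence[float]) -> float:
    """Numerically stable log(sum(exp(v)))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def free_energy(logits: Sequence[float], temperature: float = 1.0) -> float:
    """Assumed energy: E(x) = -T * log(sum_k exp(f_k(x) / T))."""
    return -temperature * _logsumexp([z / temperature for z in logits])


def cross_entropy(logits: Sequence[float], label: int) -> float:
    """Softmax cross-entropy for a single labeled source sample."""
    return _logsumexp(logits) - logits[label]


def uda_loss(source_logits, source_labels, target_logits,
             weight: float = 0.1, temperature: float = 1.0) -> float:
    """Supervised source loss plus the mean energy of unlabeled target samples."""
    ce = sum(cross_entropy(l, y) for l, y in zip(source_logits, source_labels))
    ce /= len(source_labels)
    energy = sum(free_energy(l, temperature) for l in target_logits)
    energy /= len(target_logits)
    return ce + weight * energy
```

Because the target term needs no label, it replaces nothing in the supervised objective; it is simply added on top, which matches the "plug-and-play" description.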
Compared with single-image crowd counting, video provides spatial-temporal information about the crowd that can help improve the robustness of crowd counting. However, translation, rotation, and scaling of people change the density map of heads between neighbouring frames. Meanwhile, in dynamic scenes, people walking in or out or being occluded change the head count. To alleviate these issues in video crowd counting, we propose a Locality-constrained Spatial Transformer Network (LSTN). Specifically, we first leverage a Convolutional Neural Network to estimate the density map of each frame. Then, to relate the density maps of neighbouring frames, a Locality-constrained Spatial Transformer (LST) module is introduced to estimate the density map of the next frame from that of the current frame. To facilitate performance evaluation, we collect a large-scale video crowd counting dataset containing 15K frames with about 394K annotated heads captured from 13 different scenes. As far as we know, it is the largest video crowd counting dataset. Extensive experiments on our dataset and other crowd counting datasets validate the effectiveness of LSTN for crowd counting.
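To illustrate how a spatial transformer can map one frame's density map onto the next, here is a minimal sketch of affine warping with bilinear sampling, the basic operation of a spatial transformer; the head count is the sum of the density map. It assumes a single global affine matrix `theta` in pixel coordinates (each output pixel is mapped back to a source location). The locality constraint of the LST module is not described in the abstract and is not modeled here. The names `warp_density`, `bilinear_sample`, and `count` are hypothetical.

```python
import math
from typing import List, Sequence

Grid = List[List[float]]


def bilinear_sample(img: Grid, x: float, y: float) -> float:
    """Bilinearly interpolate img at (x, y); locations outside the map contribute 0."""
    h, w = len(img), len(img[0])
    x0, y0 = math.floor(x), math.floor(y)
    dx, dy = x - x0, y - y0
    value = 0.0
    for yy, wy in ((y0, 1.0 - dy), (y0 + 1, dy)):
        for xx, wx in ((x0, 1.0 - dx), (x0 + 1, dx)):
            if 0 <= yy < h and 0 <= xx < w:
                value += wy * wx * img[yy][xx]
    return value


def warp_density(density: Grid, theta: Sequence[Sequence[float]]) -> Grid:
    """Warp a density map; theta = [[a, b, tx], [c, d, ty]] maps output (x, y) to a source point."""
    h, w = len(density), len(density[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            sx = theta[0][0] * x + theta[0][1] * y + theta[0][2]
            sy = theta[1][0] * x + theta[1][1] * y + theta[1][2]
            out[y][x] = bilinear_sample(density, sx, sy)
    return out


def count(density: Grid) -> float:
    """Estimated head count: the integral (sum) of the density map."""
    return sum(map(sum, density))
```

In a video setting, one could compare the warped current-frame map with the next frame's estimated map; mass that leaves the image boundary or appears from nowhere corresponds to the people walking in or out that the abstract mentions.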
Previous spatial-temporal action localization methods commonly follow the object detection pipeline to estimate bounding boxes and labels of actions. However, the temporal relations within an action have not been fully explored. In this paper, we propose an end-to-end Progress Regression Recurrent Neural Network (PR-RNN) for online spatial-temporal action localization, which learns to infer the action by temporal progress regression. Two new action attributes, called progression and progress rate, are introduced to describe the temporal engagement and relative temporal position of an action. In our method, frame-level features are first extracted by a Fully Convolutional Network (FCN). Subsequently, detection results and action progress attributes are regressed by a Convolutional Gated Recurrent Unit (ConvGRU) based on all observed frames instead of a single frame or a short clip. Finally, a novel online linking method is designed to connect single-frame results into spatial-temporal tubes with the help of the estimated action progress attributes. Extensive experiments demonstrate that the progress attributes improve localization accuracy by providing a more precise temporal position of an action in unconstrained videos. Our proposed PR-RNN achieves state-of-the-art performance for most IoU thresholds on two benchmark datasets.
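The abstract names the two progress attributes without defining them. A minimal sketch of per-frame regression targets, assuming that progression is 1 while the action is in progress and 0 otherwise, and that progress rate is the normalized position $(t - t_s)/(t_e - t_s)$ within the action's extent $[t_s, t_e]$, could look like this. The function name and both definitions are illustrative assumptions.

```python
from typing import List, Tuple


def progress_targets(num_frames: int, start: int, end: int) -> List[Tuple[float, float]]:
    """Return (progression, progress_rate) for each frame of a video.

    Assumed definitions: progression = 1.0 inside [start, end] (inclusive), else 0.0;
    progress_rate = relative position of the frame within the action, in [0, 1].
    """
    targets = []
    for t in range(num_frames):
        if start <= t <= end:
            rate = (t - start) / (end - start) if end > start else 1.0
            targets.append((1.0, rate))
        else:
            targets.append((0.0, 0.0))
    return targets
```

Under these assumptions, an online linker could treat a tube whose predicted progress rate approaches 1 as nearly finished, which is one way the estimated attributes might help connect single-frame detections.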
LinkedIn's Talent Solutions business contributes around 65% of LinkedIn's annual revenue and provides tools for job providers to reach out to potential candidates and for job seekers to find suitable career opportunities. LinkedIn's job ecosystem has been designed as a platform to connect job providers and job seekers, and to serve as a marketplace for efficient matching between potential candidates and job openings. A key mechanism to help achieve these goals is the LinkedIn Recruiter product, which enables recruiters to search for relevant candidates and obtain candidate recommendations for their job postings. In this work, we highlight a set of unique information retrieval, system, and modeling challenges associated with talent search and recommendation systems.
Talent search and recommendation systems at LinkedIn strive to match potential candidates to the hiring needs of a recruiter or a hiring manager, expressed as a search query or a job posting. Recent work in this domain has mainly focused on two model families: linear models, which do not take complex relationships between features into account, and ensemble tree models, which introduce non-linearity but are still insufficient for exploring all potential feature interactions and strictly separate feature generation from modeling. In this paper, we present the results of applying deep and representation learning models to LinkedIn Recruiter. Our key contributions include: (i) learning semantic representations of sparse entities within the talent search domain, such as recruiter ids, candidate ids, and skill entity ids, for which we use neural network models that take advantage of the LinkedIn Economic Graph, and (ii) deep models for learning recruiter engagement and candidate response in talent search applications. We also explore learning-to-rank approaches applied to deep models and show their benefits for the talent search use case. Finally, we present offline and online evaluation results for LinkedIn talent search and recommendation systems, and discuss potential challenges along the path to a fully deep model architecture. The challenges and approaches discussed generalize to any multi-faceted search engine.
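The abstract does not say which learning-to-rank objective was used. As one common example, here is a minimal sketch of a pairwise logistic (RankNet-style) loss that a deep scoring model could be trained with: for every pair of candidates where one has a higher relevance label, it penalizes the model when the better candidate does not score higher. The function names and the choice of this particular loss are assumptions for illustration.

```python
import math
from typing import Sequence


def softplus(x: float) -> float:
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def pairwise_logistic_loss(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Mean of log(1 + exp(-(s_i - s_j))) over all pairs with labels[i] > labels[j].

    Returns 0.0 when no pair has differing labels (nothing to rank).
    """
    total, pairs = 0.0, 0
    for i in range(len(scores)):
        for j in range(len(scores)):
            if labels[i] > labels[j]:
                total += softplus(-(scores[i] - scores[j]))
                pairs += 1
    return total / pairs if pairs else 0.0
```

Compared with a pointwise loss on each candidate independently, a pairwise loss only cares about relative order within a query's result list, which is what a recruiter sees.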
The visual attributes of cells, such as nuclear morphology and chromatin openness, are critical for histopathology image analysis. By learning cell-level visual representations, we can obtain a rich mix of features that are highly reusable for various tasks, such as cell-level classification, nuclei segmentation, and cell counting. In this paper, we propose a unified generative adversarial network architecture with a new loss formulation to perform robust cell-level visual representation learning in an unsupervised setting. Our model is not only label-free and easily trained but also capable of cell-level unsupervised classification with interpretable visualization, achieving promising results in the unsupervised classification of bone marrow cellular components. Based on the proposed cell-level visual representation learning, we further develop a pipeline that exploits the variety of cellular elements to perform histopathology image classification, the advantages of which are demonstrated on bone marrow datasets.
In this paper, we present an architecture that executes a complex machine learning model, such as a neural network capturing semantic similarity between a query and a document, and we deploy it to a real-world production system serving 500M+ users. We present the challenges that arise in a real-world system and how we solve them. We demonstrate that our architecture provides competitive modeling capability without any significant impact on system latency. Our modular solution and insights can be used by other real-world search systems to realize and productionize recent gains in neural networks.
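The abstract does not describe how latency is kept low. One common pattern, shown here purely as an assumption and not as the authors' design, is to apply the expensive semantic-similarity model only to the top-k results of a cheaper first-stage ranker, leaving the rest of the list in its original order. In this minimal sketch, embeddings are given as plain vectors, cosine similarity stands in for the neural model's score, and `rerank_top_k` and `cosine` are hypothetical names.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def rerank_top_k(query_vec: Sequence[float],
                 candidates: List[Tuple[str, float]],
                 doc_vecs: Dict[str, Sequence[float]],
                 k: int) -> List[str]:
    """Re-score only the first k candidates (already sorted by a cheap first-stage score)."""
    head, tail = candidates[:k], candidates[k:]
    head = sorted(head, key=lambda c: cosine(query_vec, doc_vecs[c[0]]), reverse=True)
    return [doc_id for doc_id, _ in head] + [doc_id for doc_id, _ in tail]
```

Bounding the neural model's work to k documents per query keeps its cost independent of the size of the candidate set, which is one way to avoid a latency regression.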
We consider a D2D-enabled cellular network in which user equipments (UEs) owned by rational users are incentivized with tokens to form D2D pairs. They exchange tokens electronically to "buy" and "sell" D2D services. Meanwhile, the devices can choose their transmission mode, i.e., receiving data via cellular links or via D2D links. Thus, taking the different benefits brought by diverse traffic types as prior knowledge, the UEs can use their tokens more efficiently through transmission mode selection. In this paper, we investigate the optimal transmission mode selection strategy and token collection policy that maximize the long-term utility in a dynamic network environment. We prove that the optimal policy is a threshold strategy and that the thresholds have a monotonicity property. Numerical simulations verify our observations and show the gain obtained from transmission mode selection.
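A threshold strategy is easy to state concretely. In this minimal sketch, a UE buys D2D service for a packet of a given traffic type only when its token count reaches that type's threshold, and falls back to the cellular link otherwise. The direction of the monotonicity (thresholds non-increasing as the traffic type's benefit increases, with types indexed from lowest to highest benefit) is an assumption, as are the function names; the token collection policy is not modeled.

```python
from typing import Sequence


def thresholds_are_monotone(thresholds: Sequence[int]) -> bool:
    """Assumed monotonicity: thresholds do not increase as traffic benefit increases."""
    return all(a >= b for a, b in zip(thresholds, thresholds[1:]))


def select_mode(tokens: int, traffic_type: int, thresholds: Sequence[int]) -> str:
    """Threshold policy: spend tokens on D2D only if the balance reaches the type's threshold."""
    if tokens >= thresholds[traffic_type]:
        return "D2D"
    return "cellular"
```

With thresholds `[5, 3, 1]`, a UE holding 4 tokens keeps them for low-benefit traffic (type 0) but spends them on medium-benefit traffic (type 1), which is the intuition behind saving tokens for traffic that gains more from D2D.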
Cumulative local muscle fatigue may lead to potential musculoskeletal disorder (MSD) risks, and subject-specific muscle fatigability needs to be considered to reduce these risks. This study was conducted to determine the local muscle fatigue rate at the shoulder joint level based on an exponential function derived from a muscle fatigue model. Forty male subjects performed a fatiguing operation in a static posture at relative force levels ranging from 14% to 33%. Remaining maximum muscle strength was measured after different fatiguing sessions, and the time course of strength decline was fitted to the exponential function. Subject-specific fatigue rates of shoulder joint moment strength were determined. Good correspondence ($R^2>0.8$) was found in the regression for the majority of subjects (35 out of 40). Substantial inter-individual variability in fatigue rate was found and discussed.
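The abstract does not state which muscle fatigue model the exponential function comes from. Assuming a model in which, under a constant relative load $f$, the remaining strength decays as $F(t) = \mathrm{MVC}\,\exp(-k f t)$, the subject-specific fatigue rate $k$ can be estimated by a least-squares fit of $\ln(F/\mathrm{MVC})$ against $f t$ through the origin, and the goodness of fit can be reported as $R^2$ on the strength scale. The function name and this log-linear fitting procedure are illustrative assumptions; the study may have fitted the curve differently.

```python
import math
from typing import Sequence, Tuple


def fit_fatigue_rate(times: Sequence[float], strengths: Sequence[float],
                     mvc: float, relative_load: float) -> Tuple[float, float]:
    """Fit F(t) = mvc * exp(-k * f * t) and return (k, R^2).

    Assumes at least one time > 0 and all strengths > 0.
    k is in units of 1/time (e.g. 1/min if times are in minutes).
    """
    xs = [relative_load * t for t in times]
    ys = [math.log(s / mvc) for s in strengths]
    # Least squares for y = -k * x (line through the origin).
    k = -sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)
    # R^2 of the fitted curve against the measured strengths.
    preds = [mvc * math.exp(-k * x) for x in xs]
    mean_s = sum(strengths) / len(strengths)
    ss_res = sum((s - p) ** 2 for s, p in zip(strengths, preds))
    ss_tot = sum((s - mean_s) ** 2 for s in strengths)
    r2 = 1.0 - ss_res / ss_tot if ss_tot else 1.0
    return k, r2
```

Fitting in log space turns the exponential model into a one-parameter linear regression, so each subject's fatigue rate has a closed-form estimate.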
This paper presents our work on the relationship between evaluation results obtained in a virtual environment (VE) and in a realistic environment (RE) for assembly tasks. The evaluation results consist of subjective results (body part discomfort, BPD, and rating of perceived exertion, RPE) and objective results (posture and physical performance). The same tasks were performed with the same experimental configuration in RE and in VE, the evaluation results were measured in each environment, and the results were then compared. A slight difference in posture was found between VE and RE, but according to a conventional ergonomic posture assessment method, this difference did not greatly change the effect on people. Correlations between VE and RE were found for BPD and performance results using linear regression. Moreover, BPD, physical performance, and RPE results were significantly higher in VE than in RE. These results indicate that subjects feel more discomfort and fatigue in VE than in RE because of the additional effort required in VE.
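The VE/RE correlation analysis can be illustrated with ordinary least-squares regression on paired measurements: each subject's RE value is the predictor and the corresponding VE value the response. This minimal sketch returns the slope, intercept, and Pearson correlation coefficient; the function name and the pairing convention are assumptions, and no data from the study are reproduced here.

```python
import math
from typing import Sequence, Tuple


def linear_fit(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float, float]:
    """Ordinary least squares y = slope * x + intercept, plus Pearson's r.

    Assumes len(x) == len(y) >= 2 and that neither x nor y is constant.
    """
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    r = sxy / math.sqrt(sxx * syy)
    return slope, intercept, r
```

A high r with a slope or intercept that shifts VE values upward would be consistent with the pattern described above: VE and RE results correlate, while VE values are systematically higher.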