Abstract:The general capabilities of Large Language Models (LLM) highly rely on the composition and selection on extensive pretraining datasets, treated as commercial secrets by several institutions. To mitigate this issue, we open-source the details of a universally applicable data processing pipeline and validate its effectiveness and potential by introducing a competitive LLM baseline. Specifically, the data processing pipeline consists of broad collection to scale up and reweighting to improve quality. We then pretrain a 7B model BaichuanSEED with 3T tokens processed by our pipeline without any deliberate downstream task-related optimization, followed by an easy but effective supervised fine-tuning stage. BaichuanSEED demonstrates consistency and predictability throughout training and achieves comparable performance on comprehensive benchmarks with several commercial advanced large language models, such as Qwen1.5 and Llama3. We also conduct several heuristic experiments to discuss the potential for further optimization of downstream tasks, such as mathematics and coding.
Abstract:Large penetration of renewable energy sources (RESs) brings huge uncertainty into the electricity markets. While existing deterministic market clearing fails to accommodate the uncertainty, the recently proposed stochastic market clearing struggles to achieve desirable market properties. In this work, we propose a value-oriented forecasting approach, which tactically determines the RESs generation that enters the day-ahead market. With such a forecast, the existing deterministic market clearing framework can be maintained, and the day-ahead and real-time overall operation cost is reduced. At the training phase, the forecast model parameters are estimated to minimize expected day-ahead and real-time overall operation costs, instead of minimizing forecast errors in a statistical sense. Theoretically, we derive the exact form of the loss function for training the forecast model that aligns with such a goal. For market clearing modeled by linear programs, this loss function is a piecewise linear function. Additionally, we derive the analytical gradient of the loss function with respect to the forecast, which inspires an efficient training strategy. A numerical study shows our forecasts can bring significant benefits of the overall cost reduction to deterministic market clearing, compared to quality-oriented forecasting approach.
Abstract:In this paper, we propose LF-PGVIO, a Visual-Inertial-Odometry (VIO) framework for large Field-of-View (FoV) cameras with a negative plane using points and geodesic segments. Notoriously, when the FoV of a panoramic camera reaches the negative half-plane, the image cannot be unfolded into a single pinhole image. Moreover, if a traditional straight-line detection method is directly applied to the original panoramic image, it cannot be normally used due to the large distortions in the panoramas and remains under-explored in the literature. To address these challenges, we put forward LF-PGVIO, which can provide line constraints for cameras with large FoV, even for cameras with negative-plane FoV, and directly extract omnidirectional curve segments from the raw omnidirectional image. We propose an Omnidirectional Curve Segment Detection (OCSD) method combined with a camera model which is applicable to images with large distortions, such as panoramic annular images, fisheye images, and various panoramic images. Each point on the image is projected onto the sphere, and the detected omnidirectional curve segments in the image named geodesic segments must satisfy the criterion of being a geodesic segment on the unit sphere. The detected geodesic segment is sliced into multiple straight-line segments according to the radian of the geodesic, and descriptors are extracted separately and recombined to obtain new descriptors. Based on descriptor matching, we obtain the constraint relationship of the 3D line segments between multiple frames. In our VIO system, we use sliding window optimization using point feature residuals, line feature residuals, and IMU residuals. Our evaluation of the proposed system on public datasets demonstrates that LF-PGVIO outperforms state-of-the-art methods in terms of accuracy and robustness. Code will be open-sourced at https://github.com/flysoaryun/LF-PGVIO.
Abstract:Unit commitment (UC) are essential tools to transmission system operators for finding the most economical and feasible generation schedules and dispatch signals. Constraint screening has been receiving attention as it holds the promise for reducing a number of inactive or redundant constraints in the UC problem, so that the solution process of large scale UC problem can be accelerated by considering the reduced optimization problem. Standard constraint screening approach relies on optimizing over load and generations to find binding line flow constraints, yet the screening is conservative with a large percentage of constraints still reserved for the UC problem. In this paper, we propose a novel machine learning (ML) model to predict the most economical costs given load inputs. Such ML model bridges the cost perspectives of UC decisions to the optimization-based constraint screening model, and can screen out higher proportion of operational constraints. We verify the proposed method's performance on both sample-aware and sample-agnostic setting, and illustrate the proposed scheme can further reduce the computation time on a variety of setup for UC problems.
Abstract:Prediction intervals offer an effective tool for quantifying the uncertainty of loads in distribution systems. The traditional central PIs cannot adapt well to skewed distributions, and their offline training fashion is vulnerable to unforeseen changes in future load patterns. Therefore, we propose an optimal PI estimation approach, which is online and adaptive to different data distributions by adaptively determining symmetric or asymmetric probability proportion pairs for quantiles. It relies on the online learning ability of reinforcement learning to integrate the two online tasks, i.e., the adaptive selection of probability proportion pairs and quantile predictions, both of which are modeled by neural networks. As such, the quality of quantiles-formed PI can guide the selection process of optimal probability proportion pairs, which forms a closed loop to improve the quality of PIs. Furthermore, to improve the learning efficiency of quantile forecasts, a prioritized experience replay strategy is proposed for online quantile regression processes. Case studies on both load and net load demonstrate that the proposed method can better adapt to data distribution compared with online central PIs method. Compared with offline-trained methods, it obtains PIs with better quality and is more robust against concept drift.
Abstract:The further development of deep neural networks is hampered by the limited GPU memory resource. Therefore, the optimization of GPU memory resources is highly demanded. Swapping and recomputation are commonly applied to make better use of GPU memory in deep learning. However, as an emerging domain, several challenges remain:1)The efficiency of recomputation is limited for both static and dynamic methods. 2)Swapping requires offloading parameters manually, which incurs a great time cost. 3) There is no such dynamic and fine-grained method that involves tensor swapping together with tensor recomputation nowadays. To remedy the above issues, we propose a novel scheduler manager named DELTA(Dynamic tEnsor offLoad and recompuTAtion). To the best of our knowledge, we are the first to make a reasonable dynamic runtime scheduler on the combination of tensor swapping and tensor recomputation without user oversight. In DELTA, we propose a filter algorithm to select the optimal tensors to be released out of GPU memory and present a director algorithm to select a proper action for each of these tensors. Furthermore, prefetching and overlapping are deliberately considered to overcome the time cost caused by swapping and recomputing tensors. Experimental results show that DELTA not only saves 40%-70% of GPU memory, surpassing the state-of-the-art method to a great extent but also gets comparable convergence results as the baseline with acceptable time delay. Also, DELTA gains 2.04$\times$ maximum batchsize when training ResNet-50 and 2.25$\times$ when training ResNet-101 compared with the baseline. Besides, comparisons between the swapping cost and recomputation cost in our experiments demonstrate the importance of making a reasonable dynamic scheduler on tensor swapping and tensor recomputation, which refutes the arguments in some related work that swapping should be the first and best choice.
Abstract:Rotational displacement about the grasping point is a common grasp failure when an object is grasped at a location away from its center of gravity. Tactile sensors with soft surfaces, such as GelSight sensors, can detect the rotation patterns on the contacting surfaces when the object rotates. In this work, we propose a model-based algorithm that detects those rotational patterns and measures rotational displacement using the GelSight sensor. We also integrate the rotation detection feedback into a closed-loop regrasping framework, which detects the rotational failure of grasp in an early stage and drives the robot to a stable grasp pose. We validate our proposed rotation detection algorithm and grasp-regrasp system on self-collected dataset and online experiments to show how our approach accurately detects the rotation and increases grasp stability.