

Abstract: Accurate forecasting of package volumes at delivery stations is critical for last-mile logistics, where errors lead to inefficient resource allocation, higher costs, and delivery delays. We propose OpComm, a forecasting and decision-support framework that combines supervised learning with reinforcement learning-based buffer control and a generative AI-driven communication module. A LightGBM regression model generates station-level demand forecasts, which serve as context for a Proximal Policy Optimization (PPO) agent that selects buffer levels from a discrete action set. The reward function penalizes under-buffering more heavily than over-buffering, reflecting real-world trade-offs between unmet demand risks and resource inefficiency. Station outcomes are fed back through a Monte Carlo update mechanism, enabling continual policy adaptation. To enhance interpretability, a generative AI layer produces executive-level summaries and scenario analyses grounded in SHAP-based feature attributions. Across 400+ stations, OpComm reduced Weighted Absolute Percentage Error (WAPE) by 21.65% compared to manual forecasts, while lowering under-buffering incidents and improving transparency for decision-makers. This work shows how contextual reinforcement learning, coupled with predictive modeling, can address operational forecasting challenges and bridge statistical rigor with practical decision-making in high-stakes logistics environments.
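
As a minimal sketch of the asymmetric reward described in this abstract, the Python snippet below penalizes under-buffering more heavily than over-buffering and scores a discrete set of candidate buffer levels against a realized demand. The penalty weights, the buffer fractions, and the names buffer_reward, UNDER_PENALTY, and OVER_PENALTY are illustrative assumptions, not parameters taken from OpComm.

import numpy as np

# Assumed penalty weights: under-buffering (unmet demand) is costlier than
# over-buffering (idle capacity), mirroring the asymmetry the abstract describes.
UNDER_PENALTY = 3.0
OVER_PENALTY = 1.0

def buffer_reward(buffer_level: float, realized_demand: float) -> float:
    """Reward for a chosen buffer level given realized station demand.

    Returns a negative value proportional to the shortfall (weighted heavily)
    or the surplus (weighted lightly); zero only when buffer matches demand.
    """
    gap = realized_demand - buffer_level
    if gap > 0:                      # under-buffered: unmet demand
        return -UNDER_PENALTY * gap
    return OVER_PENALTY * gap        # over-buffered: gap <= 0, mild penalty

# Example: score a discrete action set of buffer fractions on top of a forecast.
forecast = 12_000                                  # station-level point forecast
actions = np.array([0.00, 0.05, 0.10, 0.15])       # assumed buffer fractions
realized = 12_900                                  # observed demand

rewards = [buffer_reward(forecast * (1 + a), realized) for a in actions]
best = actions[int(np.argmax(rewards))]
print(f"best buffer fraction under this draw: {best:.0%}")

In a PPO setting, the forecast would enter the agent's observation and this reward would be accumulated over station outcomes rather than evaluated for a single draw.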




Abstract: This work introduces a methodology to adjust forecasts based on node-specific cost-function asymmetry. The proposed model generates savings by dynamically incorporating the cost asymmetry into the forecasting-error probability distribution to favor the least expensive scenario. Savings are calculated, and a self-regulation mechanism modulates the magnitude of the adjustments based on the observed savings, enabling the model to adapt to station-specific conditions and unmodeled factors such as calibration errors or shifting macroeconomic dynamics. Finally, empirical results demonstrate the model's ability to achieve \$5.1M in annual savings.
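
The abstract does not give the exact adjustment rule; one natural reading of "incorporating the cost asymmetry into the forecasting error probability distribution" is a newsvendor-style quantile shift, sketched below under an assumed Gaussian error model. The function name asymmetric_adjustment, the Gaussian assumption, and the example costs are illustrative only, not the authors' formulation.

from scipy.stats import norm

def asymmetric_adjustment(point_forecast: float,
                          error_std: float,
                          cost_under: float,
                          cost_over: float) -> float:
    """Shift a point forecast toward the cheaper error side.

    Assumes Gaussian forecast errors and picks the quantile given by the
    newsvendor critical ratio cost_under / (cost_under + cost_over), so a
    higher under-forecasting cost pushes the adjusted forecast upward.
    """
    critical_ratio = cost_under / (cost_under + cost_over)
    return point_forecast + error_std * norm.ppf(critical_ratio)

# Example: at this node, under-forecasting costs 3x more than over-forecasting,
# so the adjustment nudges the forecast above the 10,000 point estimate.
adjusted = asymmetric_adjustment(point_forecast=10_000, error_std=800,
                                 cost_under=3.0, cost_over=1.0)
print(round(adjusted))

A self-regulation mechanism as described in the abstract could then scale error_std or the critical ratio up or down based on realized savings at each station.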




Abstract: Humans naturally share information with those they are connected to, and video has become one of the dominant mediums for communication and expression on the Internet. To support the creation of high-quality, large-scale video content, a modern pipeline requires a comprehensive understanding of both the raw input materials (e.g., the unedited footage captured by cameras) and the editing components (e.g., visual effects). In video editing scenarios, models must process multiple modalities (e.g., vision, audio, text) with strong background knowledge and handle flexible input lengths (e.g., hour-long raw videos), which poses significant challenges for traditional models. In this report, we introduce Vidi, a family of Large Multimodal Models (LMMs) for a wide range of video understanding and editing scenarios. The first release focuses on temporal retrieval, i.e., identifying the time ranges within the input videos that correspond to a given text query, which plays a critical role in intelligent editing. The model is capable of processing hour-long videos with strong temporal understanding, e.g., retrieving time ranges for specific queries. To support comprehensive evaluation in real-world scenarios, we also present the VUE-TR benchmark, which introduces five key advancements: 1) Video duration: significantly longer than in existing temporal retrieval datasets; 2) Audio support: includes audio-based queries; 3) Query format: diverse query lengths and formats; 4) Annotation quality: ground-truth time ranges are manually annotated; 5) Evaluation metric: a refined IoU metric that supports evaluation over multiple time ranges. Remarkably, Vidi significantly outperforms leading proprietary models, e.g., GPT-4o and Gemini, on the temporal retrieval task, indicating its superiority in video editing scenarios.
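
The refined IoU metric is only named in the abstract; a plausible set-based formulation, merging each side's time ranges and taking intersection length over union length, is sketched below. The helper names (merge_intervals, overlap_length, multi_range_iou) and the example ranges are hypothetical and not the official VUE-TR implementation.

def merge_intervals(intervals):
    """Merge overlapping [start, end] time ranges (in seconds)."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return merged

def overlap_length(a, b):
    """Total length of the intersection of two merged interval lists."""
    total, i, j = 0.0, 0, 0
    while i < len(a) and j < len(b):
        lo, hi = max(a[i][0], b[j][0]), min(a[i][1], b[j][1])
        total += max(0.0, hi - lo)
        if a[i][1] < b[j][1]:
            i += 1
        else:
            j += 1
    return total

def multi_range_iou(predicted, ground_truth):
    """IoU between two sets of time ranges: |intersection| / |union|."""
    pred, gt = merge_intervals(predicted), merge_intervals(ground_truth)
    inter = overlap_length(pred, gt)
    union = sum(e - s for s, e in pred) + sum(e - s for s, e in gt) - inter
    return inter / union if union > 0 else 0.0

# Example: two predicted ranges vs. two annotated ranges (seconds) -> ~0.71
print(multi_range_iou([[10, 40], [100, 130]], [[15, 45], [95, 125]]))

Treating the prediction and annotation as sets of intervals lets a single score reward coverage of every ground-truth range while penalizing spurious or overly long predictions.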