Abstract:Machine learning relies heavily on data, yet the continuous growth of real-world data poses challenges for efficient dataset construction and training. A fundamental yet unsolved question is: given our current model and data, does a new data (sample/batch) need annotation/learning? Conventional approaches retain all available data, leading to non-optimal data and training efficiency. Active learning aims to reduce data redundancy by selecting a subset of samples to annotate, while it increases pipeline complexity and introduces bias. In this work, we propose Info-Coevolution, a novel framework that efficiently enables models and data to coevolve through online selective annotation with no bias. Leveraging task-specific models (and open-source models), it selectively annotates and integrates online and web data to improve datasets efficiently. For real-world datasets like ImageNet-1K, Info-Coevolution reduces annotation and training costs by 32\% without performance loss. It is able to automatically give the saving ratio without tuning the ratio. It can further reduce the annotation ratio to 50\% with semi-supervised learning. We also explore retrieval-based dataset enhancement using unlabeled open-source data. Code is available at https://github.com/NUS-HPC-AI-Lab/Info-Coevolution/.
Abstract:The growth of evolutionary computing (EC) methods in the exploration of complex potential energy landscapes of atomic and molecular clusters, as well as crystals over the last decade or so is reviewed. The trend of growth indicates that pure as well as hybrid evolutionary computing techniques in conjunction of DFT has been emerging as a powerful tool, although work on molecular clusters has been rather limited so far. Some attempts to solve the atomic/molecular Schrodinger Equation (SE) directly by genetic algorithms (GA) are available in literature. At the Born-Oppenheimer level of approximation GA-density methods appear to be a viable tool which could be more extensively explored in the coming years, specially in the context of designing molecules and materials with targeted properties.