Michael Pokorny
Abstract:Benchmarks are important tools for tracking the rapid advancements in large language model (LLM) capabilities. However, benchmarks are not keeping pace in difficulty: LLMs now achieve over 90% accuracy on popular benchmarks like MMLU, limiting informed measurement of state-of-the-art LLM capabilities. In response, we introduce Humanity's Last Exam (HLE), a multi-modal benchmark at the frontier of human knowledge, designed to be the final closed-ended academic benchmark of its kind with broad subject coverage. HLE consists of 3,000 questions across dozens of subjects, including mathematics, humanities, and the natural sciences. HLE is developed globally by subject-matter experts and consists of multiple-choice and short-answer questions suitable for automated grading. Each question has a known solution that is unambiguous and easily verifiable, but cannot be quickly answered via internet retrieval. State-of-the-art LLMs demonstrate low accuracy and calibration on HLE, highlighting a significant gap between current LLM capabilities and the expert human frontier on closed-ended academic questions. To ground research and policymaking in a clear understanding of model capabilities, we publicly release HLE at https://lastexam.ai.
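To make the notion of calibration concrete, the following is a minimal sketch of a generic binned calibration error: the gap between a model's stated confidence and its actual accuracy, averaged over confidence bins. The function name `calibration_error`, the bin count, and the toy inputs are illustrative assumptions; the abstract does not specify which calibration metric HLE uses.

```python
def calibration_error(confidences, correct, n_bins=10):
    """Binned calibration error (hypothetical helper, not from the paper).

    confidences: stated confidence in [0, 1] for each answer.
    correct: whether each answer was right.
    Returns the bin-size-weighted mean of |accuracy - mean confidence|.
    """
    n = len(confidences)
    bins = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        # Confidence 1.0 is placed in the last bin.
        index = min(int(conf * n_bins), n_bins - 1)
        bins[index].append((conf, ok))

    total = 0.0
    for members in bins:
        if not members:
            continue
        mean_conf = sum(c for c, _ in members) / len(members)
        accuracy = sum(1 for _, ok in members if ok) / len(members)
        total += (len(members) / n) * abs(accuracy - mean_conf)
    return total


# Toy data (assumed): a model that claims 90% confidence but is right 25% of the time.
overconfident = calibration_error([0.9, 0.9, 0.9, 0.9], [True, False, False, False])
```

A model with low accuracy can still be well calibrated if it reports low confidence; in this toy case the high stated confidence combined with low accuracy yields a large error.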
Abstract:Fuel cells using oxygen and glucose could power microscopic robots operating in blood vessels. Swarms of such robots can significantly reduce oxygen concentration, depending on the time between successive transits of the lung, hematocrit variation in vessels and tissue oxygen consumption. These factors differ among circulation paths through the body. This paper evaluates how these variations affect the minimum oxygen concentration due to robot consumption and where it occurs: mainly in moderate-sized veins toward the end of long paths prior to their merging with veins from shorter paths. This shows that tens of billions of robots can obtain hundreds of picowatts throughout the body with minor reduction in total oxygen. However, a trillion robots significantly deplete oxygen in some parts of the body. By storing oxygen or limiting their consumption in long circulation paths, robots can actively mitigate this depletion. The variation in behavior is illustrated in three cases: the portal system which involves passage through two capillary networks, the spleen whose slits significantly slow some of the flow, and large tissue consumption in coronary circulation.
Abstract:Microscopic robots in the bloodstream could obtain power from fuel cells using glucose and oxygen. Previous studies of small numbers of such robots operating near each other showed how robots compete with their neighbors for oxygen. However, proposed applications involve billions of such robots operating throughout the body. With such large numbers, the robots can have systemic effects on oxygen concentration. This paper evaluates these effects and their consequences for robot power generation, oxygen available to tissue and heating as such robots move with the blood. When robots consume oxygen as fast as it diffuses to their surfaces, available power decreases significantly as robots move from the lungs, through arteries to capillaries and veins. Tens of billions of robots can obtain hundreds of picowatts throughout the circuit, while a trillion robots significantly deplete oxygen in the veins. Robots can mitigate this depletion by limiting their oxygen consumption, either overall or in specific locations or situations.
Abstract:Ultrasound can power implanted medical devices. This paper evaluates its feasibility for microscopic robots in tissue that mechanically harvest energy using pistons. At these sizes, viscous drag dominates the piston motion and acoustic attenuation limits how far power can reach. Combining these factors shows that frequencies around 100 kHz can deliver hundreds of picowatts to well-separated micron-size robots in low-attenuation tissues within about 10 cm of the skin. However, applications of microscopic robots could involve large numbers, in which case the robots themselves significantly increase acoustic attenuation. Robots can mitigate this attenuation using cooperative swarm behaviors, with trade-offs among individual power, group performance and the complexity of the robot controllers. With such mitigating behaviors, acoustic power can be useful for swarms of a few hundred billion robots in the body that each use tens of picowatts, on average, and can tolerate significant variability in available power, e.g., as robots in the bloodstream move from near the skin to deep within the body, or from low- to high-attenuation tissue such as the lungs.
Abstract:We study the relationship between the Quantum Approximate Optimization Algorithm (QAOA) and the underlying symmetries of the objective function to be optimized. Our approach formalizes the connection between quantum symmetry properties of the QAOA dynamics and the group of classical symmetries of the objective function. The connection is general and includes but is not limited to problems defined on graphs. We show a series of results exploring the connection and highlight examples of hard problem classes where a nontrivial symmetry subgroup can be obtained efficiently. In particular, we show how classical objective function symmetries lead to invariant measurement outcome probabilities across states connected by such symmetries, independent of the choice of algorithm parameters or number of layers. To illustrate the power of the developed connection, we apply machine learning techniques to predict QAOA performance based on symmetry considerations. We provide numerical evidence that a small set of graph symmetry properties suffices to predict the minimum QAOA depth required to achieve a target approximation ratio on the MaxCut problem, in a practically important setting where QAOA parameter schedules are constrained to be linear and hence easier to optimize.
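The invariance claim can be illustrated numerically on a small case. The sketch below is an assumption-laden toy, not the authors' code: it simulates standard QAOA (uniform initial state, transverse-field mixer) for MaxCut on a 4-node ring with arbitrarily chosen angles, and checks that each bitstring and its complement have equal measurement probability. Complementing all bits is a classical symmetry of the MaxCut objective. The function names, graph, and angles are all hypothetical.

```python
import numpy as np


def maxcut_costs(n, edges):
    """Cut value of every bitstring 0..2^n - 1 (bit i of the index is node i)."""
    index = np.arange(2 ** n)
    bits = (index[:, None] >> np.arange(n)) & 1
    costs = np.zeros(2 ** n)
    for u, v in edges:
        costs += bits[:, u] ^ bits[:, v]
    return costs


def apply_mixer(state, n, beta):
    """Apply exp(-i * beta * X) to every qubit of the state vector."""
    c, s = np.cos(beta), -1j * np.sin(beta)
    psi = state.reshape([2] * n)
    for q in range(n):
        psi = np.moveaxis(psi, q, 0)
        psi = np.stack([c * psi[0] + s * psi[1], s * psi[0] + c * psi[1]])
        psi = np.moveaxis(psi, 0, q)
    return psi.reshape(-1)


def qaoa_probs(n, edges, gammas, betas):
    """Measurement probabilities after QAOA layers with the given angles."""
    costs = maxcut_costs(n, edges)
    state = np.full(2 ** n, 2 ** (-n / 2), dtype=complex)  # uniform superposition
    for gamma, beta in zip(gammas, betas):
        state = np.exp(-1j * gamma * costs) * state  # cost (phase) layer
        state = apply_mixer(state, n, beta)          # mixing layer
    return np.abs(state) ** 2


# Assumed toy instance: 4-node ring, two layers, arbitrary angles.
ring = [(0, 1), (1, 2), (2, 3), (3, 0)]
probs = qaoa_probs(4, ring, gammas=[0.37, 1.1], betas=[0.8, 0.25])

# Index x and its bitwise complement are 2^n - 1 - x, so reversing the
# probability vector pairs each bitstring with its complement.
symmetric = np.allclose(probs, probs[::-1])
```

Changing the angles or the number of layers should leave `symmetric` true, since the cost values, the initial state, and the mixer are all unchanged by flipping every bit.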
Abstract:We present an algorithm for learning a latent variable generative model via generative adversarial learning where the canonical uniform noise input is replaced by samples from a graphical model. This graphical model is learned by a Boltzmann machine which learns a low-dimensional feature representation of data extracted by the discriminator. A quantum annealer, the D-Wave 2000Q, is used to sample from this model. This algorithm joins a growing family of algorithms that use a quantum annealing subroutine in deep learning, and provides a framework to test the advantages of quantum-assisted learning in GANs. Fully connected, symmetric bipartite and Chimera graph topologies are compared on a reduced stochastically binarized MNIST dataset, for both classical and quantum annealing sampling methods. The quantum-assisted associative adversarial network successfully learns a generative model of the MNIST dataset for all topologies, and is also applied to the LSUN dataset bedrooms class for the Chimera topology. Evaluated using the Fréchet inception distance and inception score, the quantum and classical versions of the algorithm are found to have equivalent performance for learning an implicit generative model of the MNIST dataset.
Abstract:Objects moving in fluids experience patterns of stress on their surfaces determined by their motion and the geometry of nearby boundaries. Fish and underwater robots can use these patterns for navigation. This paper extends this stress-based navigation to microscopic robots in tiny vessels, where robots can exploit the physics of fluids at low Reynolds number. This applies, for instance, in vessels with sizes and flow speeds comparable to those of capillaries in biological tissues. We describe how a robot can use simple computations to estimate its motion, orientation and distance to nearby vessel walls from fluid-induced stresses on its surface. Numerically evaluating these estimates for a variety of vessel sizes and robot positions shows they are most accurate when robots are close to vessel walls.
Abstract:Objects moving in fluids experience patterns of stress on their surfaces determined by the geometry of nearby boundaries. Flows at low Reynolds number, as occur in microscopic vessels such as capillaries in biological tissues, have relatively simple relations between stresses and nearby vessel geometry. Using these relations, this paper shows how a microscopic robot moving with such flows can use changes in stress on its surface to identify when it encounters vessel branches.
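As a rough check on why capillary-scale flows count as low Reynolds number, the sketch below evaluates the standard definition Re = density × speed × length / viscosity. The numerical values are order-of-magnitude assumptions (water-like fluid, a 10 µm vessel, 1 mm/s flow) chosen for illustration and are not taken from the abstracts.

```python
def reynolds_number(density, speed, length, viscosity):
    """Reynolds number Re = density * speed * length / viscosity (SI units)."""
    return density * speed * length / viscosity


# Assumed order-of-magnitude values, not figures from the papers:
density = 1000.0   # kg/m^3, roughly water
speed = 1e-3       # m/s, about 1 mm/s
length = 1e-5      # m, about 10 micrometers
viscosity = 1e-3   # Pa*s, roughly water

re_capillary = reynolds_number(density, speed, length, viscosity)
```

Under these assumptions the result is about 0.01, far below 1, which is the regime where viscous forces dominate inertia.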
Abstract:Microscopic robots could perform tasks with high spatial precision, such as acting on precisely targeted cells in biological tissues. Some tasks may benefit from robots that change shape, such as elongating to improve chemical gradient sensing or contracting to squeeze through narrow channels. This paper evaluates the energy dissipation for shape-changing (i.e., metamorphic) robots whose size is comparable to bacteria. Unlike larger robots, surface forces dominate the dissipation. Theoretical estimates indicate that the power likely to be available to the robots, as determined by previous studies, is sufficient to change shape fairly rapidly even in highly viscous biological fluids. Achieving this performance will require significant improvements in manufacturing and material properties compared to current micromachines. Furthermore, optimally varying the speed of shape change only slightly reduces energy use compared to uniform speed, thereby simplifying robot controllers.
Abstract:A number of organizations, ranging from terrorist groups such as ISIS to politicians and nation states, reportedly conduct explicit campaigns to influence opinion on social media, posing a risk to democratic processes. There is thus a growing need to identify and eliminate "influence bots" - realistic, automated identities that illicitly shape discussion on sites like Twitter and Facebook - before they get too influential. Spurred by such events, DARPA held a 4-week competition in February/March 2015 in which multiple teams supported by the DARPA Social Media in Strategic Communications program competed to detect a set of previously identified "influence bots" serving as ground truth on a specific topic within Twitter. Past work regarding influence bots often has difficulty supporting claims about accuracy, since there is limited ground truth (though some exceptions do exist [3,7]). Moreover, with the exception of [3], no past work has looked specifically at identifying influence bots on a specific topic. This paper describes the DARPA Challenge and the methods used by the three top-ranked teams.