Meaning has been called the "holy grail" of a variety of scientific disciplines, ranging from linguistics to philosophy, psychology and the neurosciences. The field of Artificial Intelligence (AI) is very much a part of that list: the development of sophisticated natural language semantics is a sine qua non for achieving a level of intelligence comparable to that of humans. Embodiment theories in cognitive science hold that human semantic representation depends on sensori-motor experience; the abundant evidence that human meaning representation is grounded in the perception of physical reality leads to the conclusion that meaning must depend on a fusion of multiple (perceptual) modalities. Despite this, AI research in general, and its subdisciplines such as computational linguistics and computer vision in particular, have focused primarily on tasks that involve a single modality. Here, we propose virtual embodiment as an alternative, long-term strategy for AI research that is multi-modal in nature and that allows for the kind of scalability required to develop the field coherently and incrementally, in an ethically responsible fashion.