In this paper I explore a number of issues in the analysis of data requirements for statistical NLP systems. A preliminary framework for viewing such systems is proposed and a sample of existing works are compared within this framework. The first steps toward a theory of data requirements are made by establishing some results relevant to bounding the expected error rate of a class of simplified statistical language learners as a function of the volume of training data.