This paper presents a comparative evaluation of the systems that participated in the Spanish and English lexical sample tasks of Senseval-2. It focuses on pairwise comparisons between systems, to assess the degree to which they agree, and on measuring the difficulty of the test instances included in these tasks.