Item response theory (IRT) models are the main psychometric approach for developing, evaluating, and refining multi-item instruments and for scaling latent traits, whereas multilevel models are the primary statistical method for handling the dependence among person responses that arises when primary units (e.g., students) are nested within clusters (e.g., classes). This article introduces multilevel IRT (MLIRT) modeling and provides the basic information needed to conduct, interpret, and report an MLIRT analysis. The procedures are demonstrated using a sample data set based on the National Institute for the Evaluation of School System survey, which was completed in Italy by fifth-grade students nested in classrooms to assess math achievement. The data and command files (Stata, Mplus, flexMIRT) needed to reproduce all analyses and plots in this article are available as supplemental online materials at http://jea.sagepub.com/supplemental.