Baas, S., Boucherie, R.J., and Fox, J.-P. (2022). Bayesian covariance structure modeling of multi-way nested data. arXiv, 2201.10612 (link). (draft version).
(Summary: Our Bayesian statistical method enables the modeling of negative clustering effects. In our study, the analyzed differences between interval-censored event times of patients, who were randomly allocated to treatment with different stents (BIO-RESORT), showed evidence for equal risk of the three stents with respect to three event types. We found a negative correlation between event times of patients who received the same treatment. Unobserved factors such as heart surgeon or the medical centre could explain heterogeneity in event times between patients receiving the same treatment.)

Santos, Jose R.S., Azevedo, Caio L.N., Fox, J.-P. (2021). Bayesian longitudinal item response modeling with multivariate asymmetric serial dependencies. Journal of Statistical Computation and Simulation (GSCS). (link)

Fox, J.-P. and Smink, W.A.C. (2021). Assessing an alternative for ‘negative variance components’: A gentle introduction to Bayesian covariance structure modelling for negative associations among patients with personalized treatments. arXiv, 2106.10107 (link). (draft version). Psychological Methods. Advance online publication.
(Summary: The effect of a (personalized) treatment can differ across individuals, and this can lead to negative associations among measurements of individuals who are treated by the same therapist. Our Bayesian modeling approach can model negative associations among clustered measurements and aids in the interpretation of negative clustering effects. In our study, we identified individual variability in treatment effects and identified those who benefitted from the treatment, although a significant main treatment effect could not be found.)

Fox, J.-P., Klotzke, K.K., Veen, D. (2021). Generalized linear randomized response modeling using GLMMRR. arXiv, 2106.10171 (link). (draft version). R Journal (online 15-12-2021) (link). (Software contribution).

Fox, J.-P., Klotzke, K., Simsek, A.S. (2021). LNIRT: An R Package for joint modeling of response accuracy and times. arXiv, 2106.10144 (link). (draft version). (Software contribution).

Nielsen, N.M., Smink, W.A.C. & Fox, J.-P. (2021). Small and negative correlations among clustered observations: limitations of the linear mixed effects model. Behaviormetrika, 48, 51-77. DOI: 10.1007/s41237-020-00130-8. (Open Access)
(Summary: The linear mixed effects model is designed for positive cluster correlation. Negative cluster correlation is often ignored, but we show that this leads to deflated Type-I errors, invalid standard errors and confidence intervals in regression analysis.)


Fox, J.-P., Koops, J., Feskens, R., Beinhauer, L. (2020). Bayesian covariance structure modelling for measurement invariance testing. Behaviormetrika 47, 385–410. DOI 10.1007/s41237-020-00119-3. (Open Access)

Fox J-P, Wenzel J, Klotzke K. (2020). The Bayesian covariance structure model for testlets. Journal of Educational and Behavioral Statistics. doi:10.3102/1076998620941204.(Draft Version)

Fox J.-P. (2020). Special issue on item response theory in medical studies. Statistical Methods in Medical Research, 29(4):959-961. doi:10.1177/0962280220902660. (Open Access)
(Summary: This special issue represents a cross-section of current research in IRT for medical studies. The debate on the use of IRT in health research and practices is highlighted, and advanced comparisons between IRT-based and CTT-based approaches in longitudinal data analysis are shown. This special issue will be of use to interested researchers both in psychometric/statistical methods and in relevant applications. This issue might also serve as a point of reference for advanced IRT modeling, for instance when the response process is more complicated leading to complex response behavior.)

Gorter R, Fox J.-P., Eekhout I, Heymans M, Twisk J. (2020) Missing item responses in latent growth analysis: Item response theory versus classical test theory. Statistical Methods in Medical Research, 29(4):996-1014. doi:10.1177/0962280219897706.(Open Access)
(Summary: in medical research, repeated questionnaire data is used to model latent variables over time. A direct comparison is made between latent growth analysis under classical test theory and item response theory, while including effects of missing item responses. Our study shows that parameter estimates of the latent growth model using item response theory have less bias and smaller standard errors than those estimates using classical test theory.)

Gorter R, Fox J.-P., Riet GT, Heymans M, Twisk J. (2020) Latent growth modeling of IRT versus CTT measured longitudinal latent variables. Statistical Methods in Medical Research, 29(4):962-986. doi:10.1177/0962280219856375.(Draft Version)
(Summary: In many medical and epidemiological studies, the individual health outcomes cannot be observed directly and are indirectly observed through survey items. It is shown that estimated individual trajectories using item response theory, compared to classical test theory to measure outcomes, provide a more detailed description of individual change over time, since item response patterns are more informative about the health measurements than sum scores.)

Fox J.-P. Reaction to “Sufficient statistics and insufficient explanations”: Use your information. (2020) Statistical Methods in Medical Research, 29(4):991-995. doi:10.1177/0962280219893460.(Draft Version)


J Mulder, J.-P. Fox (2019). Bayes factor testing of multiple intraclass correlations. Bayesian Analysis 14 (2), 521-552. (Open Access)

Klotzke, K., Fox, J.-P. (2019). Modeling dependence structures for response times in a Bayesian framework. Psychometrika 84, 649–672. DOI: 10.1007/s11336-019-09671-8 (Open Access)

Klotzke, Konrad, Fox, J.-P. (2019): Bayesian covariance structure modeling of responses and process data. Frontiers. Collection. DOI: 10.3389/fpsyg.2019.01675 (Open Access)

T Keuning, M van Geel, A Visscher, J.-P. Fox (2019). Assessing and validating effects of a data‐based decision‐making intervention on student growth for mathematics and spelling. Journal of educational measurement 56(4), 757-792. (Draft Version)

J Mulder, X Gu, A Olsson-Collentine, A Tomarken, F Böing-Messing, … (2019). BFpack: Flexible bayes factor testing of scientific theories in R. arXiv preprint arXiv:1911.07728

M van Geel, T Keuning, A Visscher, J.-P. Fox (2019). Changes in educational leadership during a data-based decision making intervention. Leadership and policy in schools 18 (4), 628-647.

WAC Smink, J.-P. Fox, E Tjong Kim Sang, AM Sools, GJ Westerhof, … (2019). Understanding therapeutic change process research through multilevel modeling and text mining. Frontiers in psychology 10, 1186. (Open Access)

FJR Van de Vijver, F Avvisati, E Davidov, M Eid, J.-P. Fox, N Le Donné, … (2019). Invariance analyses in large-scale studies. OECD. (Open Access)

J.-P. Fox, Duco Veen, and Konrad Klotzke (2019). Generalized Linear Mixed Models for Randomized Responses. Methodology 15:1, 1-18. (Draft Version)

Feskens R., Fox JP., Zwitser R. (2019) Differential Item Functioning in PISA Due to Mode Effects. In: Veldkamp B., Sluijter C. (eds). Theoretical and Practical Advances in Computer-based Educational Measurement. Methodology of Educational Measurement and Assessment. Springer, Cham.


Wlömert, N., Pellenwessel, D., Fox, J.-P., & Clement, M. (2019). Multidimensional assessment of social desirability bias: an application of multiscale item randomized response theory to measure academic misconduct. Journal of Survey Statistics and Methodology, 7(3), 365-397.

Fox, J.-P., van den Berg, S.M., and Veldkamp, B.P. (May 2018). Bayesian psychometric methods. In Handbook of Psychometric Testing, P. Irwing, T. Booth and D. Hughes. Wiley-Blackwell. (Draft Version). ISBN: 978-1-118-48983-3.


Fox, J.-P., Mulder, J., Sinharay, S. (2017). Bayes Factor Covariance Testing in Item Response Models. Psychometrika. DOI: 10.1007/s11336-017-9577-6.
(Published Version (Springer Nature Content Sharing Initiative)).

Fox, J.-P., Marianti, S. (2017). Person-Fit Statistics for Joint Models for Accuracy and Speed. Journal of Educational Measurement, 54(2), 243-262. DOI: 10.1111/jedm.12143. (Draft Version) (Web Supplement)

Van Geel, M., Keuning, T., Visscher, V., and Fox, J.-P. (2017). Changes in Educators’ Data Literacy During a Data-Based Decision Making Intervention. Teaching and Teacher Education, 64, pp. 187–198. (Draft Version)

Veldkamp, B.P., Avetisyan, M., Weissman, A., & Fox, J.-P. (2017). Stochastic programming for individualized test assembly with mixture response time models. Computers in Human Behavior, 76, 693-702. DOI: 10.1016/j.chb.2017.04.060.


Gorter, R., Fox, J.-P., Apeldoorn, A., and Twisk, J.W.R. (2016). Measurement model choice influences randomized controlled trial results. Journal of Clinical Epidemiology, 79, pp. 140-149. doi: 10.1016/j.jclinepi.2016.06.011.
(Summary: The increase in measurement precision is shown when using item response theory — instead of classical test theory — when analyzing questionnaire data of patient-reported outcomes in randomized clinical trials (RCT). A bias of around one standard deviation was found in the estimated trend of measurements when using sum scores, where item response theory showed negligible bias).

Schmidt, S., Troitschanskaia, O., and Fox, J.-P. (2016). Pretest-Posttest-Posttest Multilevel IRT Modeling of Competence Growth of Students in Higher Education in Germany. Journal of Educational Measurement, 53(3), 352-367. DOI: 10.1111/jedm.12115. Special Issue: Valid Assessment of Student Competencies in Higher Education-Methodological Innovations and Perspectives for Educational Measurement. (article).

Van Geel, M., Keuning, T., Visscher, A., and Fox, J.-P. (2016). Assessing the effects of a school-wide data-based decision-making intervention on student achievement growth in primary schools. American Educational Research Journal, 53, pp. 360-394. (article)

Fox, J.-P., and Marianti, S. (2016). Joint modeling of ability and differential speed using responses and response times. Multivariate Behavioral Research. (article)

Van der Linden, W.J. and Fox, J.-P. (2016) Joint hierarchical modeling of responses and response times. In Handbook of Modern Item Response Theory, W.J van der Linden (Ed.), Vol 1, Chapter 29, Chapman and Hall/CRC Press.

Fox, J.-P. (2016). Bayesian randomized item response theory models for sensitive measurement. In Handbook of Modern Item Response Theory, W.J van der Linden and R.K. Hambleton (Eds.), Vol 1, Chapter 28, Chapman and Hall/CRC Press. (Draft Version)

Fox, J.-P. and Glas C.A.W. (2016). Multilevel item response theory: An overview. In Handbook of Modern Item Response Theory, W.J van der Linden and R.K. Hambleton (Eds), Vol. 1, Chapter 24, Chapman and Hall/CRC Press. (Draft Version)

Keuning, T., van Geel, M., Visscher, A., Fox, J.-P., and Molenaar, N. (2016). The transformation of schools’ social networks during a Data-Based Decision-Making Reform. Teachers College Record, Vol 118, Number 9, p. 1-33.

Verhagen, A.J., Levy, R., Millsap, R.E., and Fox, J.-P. (2016). Evaluating evidence for invariant items: A Bayes factor applied to testing measurement invariance in IRT models. Journal of Mathematical Psychology, Volume 72, Pages 171–182. (article)


Azevedo, C.L.N., Fox, J.-P. and Andrade, D.F. (2015). Longitudinal Multiple-Group IRT Modeling: Covariance pattern selection using MCMC and RJMCMC. International Journal of Quantitative Research in Education, 2, 213-243. (article)

Camilli, G., and Fox, J.-P. (2015). An aggregate IRT procedure for exploratory factor analysis. Journal of Educational and Behavioral Statistics, 40(4). DOI:10.3102/1076998615589185.

Gorter, R., Fox, J.-P., and Twisk, J.W.R. (2015). Why item response theory should be used for longitudinal questionnaire data analysis in medical research. BMC Medical Research Methodology, 15(1):55. DOI:10.1186/s12874-015-0050-x. (Highly Accessed) (Open Access)
(Summary: Multi-item questionnaires are important instruments for monitoring health in epidemiological longitudinal studies. We show the negative impact of using sum-scores — in comparison to using item response data — as the dependent variable in a longitudinal data analysis. The use of sum-scores leads to an overestimation of the random error variance and an underestimation of the variance between performances of participants.) 

Van den Hout, A., Fox, J.-P., and Muniz-Terrera, G. (2015). Longitudinal mixed-effects models for latent cognitive function. Statistical Modelling, 15(4):366-387. DOI:10.1177/1471082X14555607.
(Summary: Cognitive function of elderly — we use data from the Cambridge City over-75s Cohort Study — is modeled with a bent-cable change-point predictor to identify potential decline of cognitive function over time. It is shown that longitudinal (MMSE) test data can be used to measure individual trajectories of cognitive function with the change-point predictor to describe non-linear trends.)

Trompetter, H.R., Bohlmeijer, E.T., Fox, J.-P., Schreurs, K.M.G. (2015). Psychological flexibility and catastrophizing as associated change mechanisms during online Acceptance & Commitment Therapy for chronic pain. Behaviour Research and Therapy, 74:50-59. DOI:10.1016/j.brat.2015.09.001.

De Jong, M.G., J.-P. Fox, and Steenkamp, J.E.B.M. (2015). Quantifying under- and over-reporting in surveys through a dual questioning-technique design. Journal of Marketing Research.

van den Berg, S.M., de Moor, M., […], Fox, J.-P., […]. (2015). Harmonization of Neuroticism and Extraversion phenotypes across inventories and cohorts in the Genetics of Personality Consortium: an application of Item Response Theory. Behavior Genetics. DOI:10.1007/s10519-014-9654-x.

Gosselt, J., van Hoof, J., Gent, B., and Fox, J.-P. (2015). Violent frames. Analyzing Internet Movie Database reviewers’ text descriptions of media violence and gender differences from 39 years of U.S. action, thriller, crime, and adventure movies. International Journal of Communication, 9, 547-567. (doi: 1932–8036/20150005)

Azevedo, C.L.N., Fox, J.-P., and Andrade, D.F. (2015). Bayesian longitudinal item response modeling with restricted covariance pattern structures. Statistics and Computing. (doi 10.1007/s11222-014-9518-5)

Fox, J.-P., M. Marsman, J. Mulder, and J.A. Verhagen (2015). Complex latent variable modeling in educational assessment. Communications in Statistics – Simulation and Computation.(doi:10.1080/03610918.2014.939518)


Fox, J.-P., Klein Entink, R.K., and C. Timmers (2014). The Joint Multivariate Modeling of Multiple Mixed Response Sources: Relating Student Performances with Feedback Behavior. Multivariate Behavioral Research, Volume 49(1), p 54-66. (doi:10.1080/00273171.2013.843441). (article)

Marianti, S., Fox, J.-P., M. Avetisyan, B.P. Veldkamp, and Tijmstra, J. (2014). Testing for aberrant behavior in response time modeling. Journal of Educational and Behavioral Statistics, 39(6), 426-451. (doi: 10.3102/1076998614559412) (article)


Fox, J.-P. (2013). Multivariate zero-inflated modeling with latent predictors: Modeling feedback behavior. Computational Statistics and Data Analysis, 68, 361–374. (article)

Fox, J.-P., Avetisyan, M., van der Palen, J. (2013). Mixture randomized item-response modeling: a smoking behavior validation study. Statistics in Medicine, 32(27) p. 4821-4837. (doi: 10.1002/sim.5859). (article) (Supportive material)
(Summary: The randomized item-response technique is validated in a treatment–control design. The results of a multi-item measure to assess individual smoking behavior are compared to breath test results. The detection rate of smokers is higher when using the randomized-response technique in survey administration.)

Fox, J.-P., Klein Entink, R.H., Avetisyan, M. (2013). Compensatory and noncompensatory multidimensional randomized item response models. British Journal of Mathematical and Statistical Psychology, 67, 133-152. (article) (Supportive material; real data study, simulation study, description)

Verhagen, J. and Fox, J.-P. (2013). Longitudinal measurement in health-related surveys. A Bayesian joint growth model for multivariate ordinal responses. Statistics in Medicine, 32 Issue 17, p. 2988-3005. (doi: 10.1002/sim.5692) (article)
(Summary: Longitudinal surveys measuring physical or mental health status are often used to evaluate treatments. A method is proposed to examine if the statistical properties of survey items change over occasions. The developed statistical model allows for changes in item characteristics while guaranteeing a proper measurement scale.)

Fledderus, M., Bohlmeijer, E.T., Fox, J.-P., Schreurs, K.M.G., and Spinhoven, P. (2013). The role of psychological flexibility in a self-help acceptance and commitment therapy intervention for psychological distress in a randomized controlled trial. Behaviour Research and Therapy, 51, 142-151. (article)

Mulder, J. and Fox J.-P (2013). Bayesian tests on components of the compound symmetry covariance matrix. Statistics and Computing. January 2013, Volume 23, Issue 1, pp 109-122, (online, doi 10.1007/s11222-011-9295-3).(article)

Verhagen, J. and Fox, J.-P. (2013). Bayesian Tests of Measurement Invariance. British Journal of Mathematical and Statistical Psychology, 66, 383-401. (online doi: 10.1111/j.2044-8317.2012.02059.x.).


Avetisyan, M. and Fox, J.-P. (2012). The Dirichlet-Multinomial Model for Multivariate Randomized Response Data and Small Samples. Psicologica: International Journal of Methodology and Experimental Psychology, v33 n2 p362-390. (article)

Azevedo, C.L.N. and Andrade, D.F. and Fox, J.-P. (2012). A Bayesian generalized multiple group IRT model with model-fit assessment tools. Computational Statistics and Data Analysis, 56, issue 12, 4399-4412 (doi: 10.1016/j.csda.2012.03.017).(article)


Van den Hout, A., Fox, J.-P., and Klein Entink, R.H. (2011). Bayesian inference for an illness-death model for stroke with cognition as a latent time-dependent risk factor. Statistical Methods in Medical Research. (online, doi: 10.1177/0962280211426359). (article).
(Summary: Longitudinal (MMSE) questionnaire data is used to model occasion-specific cognitive function, which is used as a predictor variable to model the transition intensities between healthy and unhealthy states. The time-dependent transition intensities are steered by cognitive function within an illness-death model. The method extends current statistical inference regarding disease progression and cognitive function and is illustrated with data from the Medical Research Council Cognitive Function and Ageing Study in the UK (1991–2005).)

Klein Entink, R.H., Fox, J.-P., and van den Hout, A. (2011). A mixture model for the joint analysis of latent developmental trajectories and survival. Statistics in Medicine, 30, 2310-2325. (article)
(Summary: The use of questionnaires by clinicians is widespread. We used an item response theory model to measure cognitive function of elderly using the mini-mental-state-examination (MMSE) survey. Our statistical method enables including the developmental trajectory of cognitive decline as a predictor variable (risk factor) for survival to examine the relationship between a change in cognitive impairment and survival. An advantage of our approach is that the often arbitrary classification of individuals based on their MMSE sum scores is avoided.)


Fox, J.-P. (2010). Bayesian Item Response Modeling: Theory and Applications. New York: Springer. ISBN 1441907416

Fox, J.-P., and A. J. Verhagen (2010). Random item effects modeling for cross-national survey data. In E. Davidov, P. Schmidt, and J. Billiet (Eds.), Cross-cultural Analysis: Methods and Applications (pp), London: Routledge Academic. (article)

De Jong, M.G., Pieters, F.G.M., Fox, J.-P. (2010). Reducing social desirability bias through item randomized response: An application to measure underreported desires. Journal of Marketing Research, 47, 14-27.(article)

Van der Linden, W.J., Klein Entink, R.H., Fox, J.-P. (2010). IRT parameter estimation with response times as collateral information. Applied Psychological Measurement, 34, 327-347.(article)


Klein Entink, R.H., van der Linden, W.J., Fox, J.-P. (2009). A Box-Cox normal model for response times. British Journal of Mathematical and Statistical Psychology, 62, 621-640.(article)

Klein Entink, R.H., Kuhn, J.-T., Hornke, L.F., and Fox, J.-P (2009). Evaluating cognitive theory: A joint modeling approach using responses and response times. Psychological Methods, 14, 54-75.(article)

Klein Entink, R.H., Fox, J.-P., van der Linden, W.J. (2009). A multivariate multilevel approach to the modeling of accuracy and speed of test takers. Psychometrika, 74, 21-48.(article)


Fox, J.-P. (2008). Bayesian item response models for complex survey data. Proceedings of the 23rd international workshop on Statistical Modelling. Eilers, P. (Ed.), 19-26.(article)

Fox, J.-P., and Meijer, R.R. (2008). Using IRT to obtain individual information from randomized response data: An application using cheating data. Applied Psychological Measurement, 32, 595-610.(article)

Fox, J.-P., and Wyrick, C.H. (2008). A mixed effects randomized item response model. Journal of Educational and Behavioral Statistics, 33, 389-415.(article)

Fox, J.-P. (2008). Beta-Binomial ANOVA for multivariate randomized response data. British Journal of Mathematical and Statistical Psychology, 61, 453-470. (article)

De Jong, M.G., Steenkamp, J.B.E.M., Fox, J.-P., and Baumgartner, H. (2008). Using item response theory to measure extreme response style in marketing research: A global investigation. Journal of Marketing Research, 45, 104-115.(article)


Fox, J.-P. (2007). Multilevel IRT modeling in practice. Journal of Statistical Software, 20, issue 5.(article)

Fox, J.-P., Klein Entink, R.H., van der Linden, W.J. (2007). Modeling of responses and response times with the package cirt. Journal of Statistical Software, 20, issue 7.

De Jong, M.G., Steenkamp, J.B.E.M., and Fox, J.-P. (2007). Relaxing cross-national measurement invariance using a hierarchical IRT model. Journal of Consumer Research, 34, 260-278.

Klein Entink, R.H., Fox, J.-P., Betlem, B.H.L., Roffel, B. (2007). Hierarchical process modeling: Describing within- and between-run variation. Journal of Process Control, 17, 349-361.(article)


Fox, J.-P., Pimentel, J.L., Glas, C.A.W. (2006). Fixed effects IRT model. Behaviormetrika, 33, 27-42.(article)


Fox, J.-P., & Glas, C.A.W. (2005). Bayesian modification indices for IRT models. Statistica Neerlandica, 59, 95-106.(article)

Fox, J.-P. (2005). Randomized item response theory models. Journal of Educational and Behavioral Statistics, 30, 189-212.(article)

Fox, J.-P. (2005). Multilevel IRT using dichotomous and polytomous items. British Journal of Mathematical and Statistical Psychology, 58, 145-172.(article)


Fox, J.-P. (2004). Applications of multilevel IRT modeling. School Effectiveness and School Improvement, 15, 261-280.(article)

Fox, J.-P. (2004). Modeling response error in school effectiveness research. Statistica Neerlandica, 58, 138-160.(article)

Fox, J.-P. (2004). Multilevel IRT Model Assessment. In van der Ark, Croon, Sijtsma (Eds.) New Developments in Categorical Data Analysis for the Social and Behavioral Sciences (p. 227-252), London: Lawrence Erlbaum Associates, Inc.(article)


Fox, J.-P. (2003). Stochastic EM for Estimating the Parameters of a Multilevel IRT Model. British Journal of Mathematical and Statistical Psychology, 56, 65-81.(article)

Fox, J.-P., & Glas, C.A.W. (2003). Bayesian modeling of measurement error in predictor variables using item response theory. Psychometrika 68, 169-191.(article)


Fox, J.-P. & Glas, C.A.W. (2002). Modeling measurement error in a structural multilevel model. In G.A. Marcoulides & I. Moustaki (Eds.), Latent Variable and Latent Structure Models (pp. 245-269), London: Lawrence Erlbaum Associates, Publishers.(article)


Fox, J.-P. & Glas, C.A.W. (2001). Bayesian Estimation of a Multilevel IRT Model using Gibbs Sampling. Psychometrika, 66, 271-288.(article)

Fox, J.-P. (2001). Multilevel IRT: A Bayesian perspective on estimating parameters and testing statistical hypotheses. Unpublished doctoral dissertation, Twente University, Enschede, Netherlands. (2002 Psychometric Society Dissertation Award). (article)