Excerpt
There are two separate NCLEX examinations: the National Council Licensure Examination for Registered Nurses (NCLEX-RN®) examination and the National Council Licensure Examination for Practical/Vocational Nurses (NCLEX-PN®) examination. These two examinations have different test plans, item pools, passing standards, and, to a large extent, candidate pools (some states allow RN candidates who have not fully completed their educational program to take the PN examination).1 Historically, each was administered in paper-and-pencil format (twice a year, to a total of over 120 000 RN candidates and over 60 000 PN candidates per year). Passing rates hovered around 90% for first-time takers educated in the United States (the “reference group” on whom item difficulties are estimated).2
The NCLEX-RN examination in the paper-and-pencil (PP) format was a 2-day examination consisting of 372 questions: 300 operational items used in scoring and 72 pretest items being field-tested. All operational items had known difficulties, having been calibrated and equated to a common scale using the Rasch model.3 The 1-day NCLEX-PN examination had 240 items, 204 of which were operational and 36 of which were pretest. The passing standards, set using a modified-Angoff technique and reevaluated every 3 years, were maintained as logit ability levels on the Rasch-calibrated “bank scale,” and each candidate’s equated ability estimate was compared to that standard to determine pass or fail status.
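Scoring on the Rasch-calibrated bank scale reduces to comparing a logit ability estimate against a logit passing standard. The following is a minimal sketch of that comparison in Python; the function names and the passing-standard value are illustrative assumptions, not actual NCLEX figures:

```python
import math

def rasch_p(theta: float, b: float) -> float:
    """Probability of a correct response under the Rasch model,
    where theta is candidate ability and b is item difficulty
    (both expressed in logits on the common bank scale)."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

def classify(theta_hat: float, passing_standard: float) -> str:
    """Compare an equated ability estimate to the passing standard."""
    return "pass" if theta_hat >= passing_standard else "fail"

# When ability equals item difficulty, the response probability is 0.5.
print(rasch_p(0.5, 0.5))        # 0.5
# Hypothetical passing standard of -0.16 logits on the bank scale.
print(classify(0.30, -0.16))    # pass
```

Because both quantities live on the same equated scale, the pass/fail decision is a single comparison; all of the measurement work lies in calibrating the items and estimating theta.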
On the PP examinations, efforts were made to better target the examination difficulties to the pass/fail standard. Historically, the average difficulty of administered items has been lower than the passing standard, as has the average difficulty of items in the pool. In addition, it was difficult to predict item difficulty and to write items with difficulties near the passing standard, which left the item pools with a wide range of difficulties. These factors, combined with the high-stakes nature of the examinations and the resultant need for large item pools, made the option of building a peaked test implausible.4 In 1986, the decision-making body of the National Council voted to pursue CAT as a more efficient method for assessing the competence of nurses to practice safely and effectively in entry-level positions.
In part because of the shortage of items with difficulties in the region of the passing standard, the goal of CAT was envisioned as estimating a candidate’s ability as accurately and efficiently as possible. A candidate’s examination would end when the probability of the candidate’s ability being on the other side of the passing standard (ie, of the final pass/fail decision being different if more items were administered) fell below a specified level.
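Under a normal approximation to the maximum-likelihood ability estimate, a stopping rule of this kind can be sketched as follows. This is a generic confidence-based termination criterion, not the operational NCLEX rule; the 95% confidence level and function names are illustrative assumptions:

```python
import math

def rasch_info(theta: float, b: float) -> float:
    """Fisher information contributed by one Rasch item at ability theta."""
    p = 1.0 / (1.0 + math.exp(-(theta - b)))
    return p * (1.0 - p)

def misclassification_prob(theta_hat: float, se: float, standard: float) -> float:
    """Normal-approximation probability that the candidate's true ability
    lies on the other side of the passing standard from theta_hat."""
    z = abs(theta_hat - standard) / se
    return 0.5 * math.erfc(z / math.sqrt(2.0))  # standard normal tail area

def should_stop(theta_hat: float, item_difficulties: list[float],
                standard: float, alpha: float = 0.05) -> bool:
    """Stop testing once the probability of being on the other side of
    the standard falls below alpha (illustrative default: 5%)."""
    info = sum(rasch_info(theta_hat, b) for b in item_difficulties)
    se = 1.0 / math.sqrt(info)  # asymptotic standard error of the MLE
    return misclassification_prob(theta_hat, se, standard) < alpha

# After 60 well-targeted items, an estimate 1 logit above the standard
# is classified with high confidence, so testing can end.
print(should_stop(1.0, [0.0] * 60, 0.0))
# After only 10 items, an estimate 0.1 logits above the standard is
# still too uncertain, so testing continues.
print(should_stop(0.1, [0.0] * 10, 0.0))
```

The key consequence, noted above, is that candidates far from the passing standard satisfy the criterion after few items, while candidates near the standard receive longer examinations.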
CAT software was developed that used Item Response Theory (IRT) item difficulties and a maximum-likelihood estimation method to calculate candidates’ abilities.
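For the Rasch model, the maximum-likelihood ability estimate has a simple iterative solution. The sketch below uses Newton-Raphson iteration on the log-likelihood; it is a textbook illustration under the Rasch model, not the actual NCLEX software:

```python
import math

def rasch_p(theta: float, b: float) -> float:
    """Probability of a correct response under the Rasch model."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

def mle_theta(responses: list[int], difficulties: list[float],
              iters: int = 25) -> float:
    """Newton-Raphson maximum-likelihood estimate of ability.
    responses: 0/1-scored answers; difficulties: calibrated item b's (logits).
    Requires a mixed response pattern (a perfect or zero score has no
    finite MLE, so operational programs handle those cases separately)."""
    theta = 0.0
    for _ in range(iters):
        ps = [rasch_p(theta, b) for b in difficulties]
        # Score function: observed minus expected number correct.
        grad = sum(x - p for x, p in zip(responses, ps))
        # Fisher information: sum of p(1-p) over administered items.
        info = sum(p * (1.0 - p) for p in ps)
        theta += grad / info
    return theta

# Two items of difficulty 0, one right and one wrong: the MLE is 0 logits.
print(mle_theta([1, 0], [0.0, 0.0]))
# Two of three equal items correct: the MLE is log(2) ~ 0.693 logits.
print(mle_theta([1, 1, 0], [0.0, 0.0, 0.0]))
```

Under the Rasch model the number-correct score is a sufficient statistic for ability given the administered items, which is why the update depends on the responses only through observed minus expected correct.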