Comparability of Computer and Paper-and-Pencil Versions of Algebra and Biology Assessments
description- – This study examined comparability of student scores obtained from computerized and paper-and-pencil formats of the large-scale statewide end-of-course (EOC) examinations in the two subject areas of Algebra and Biology. Evidence in support of comparability of computerized and paper-based tests was sought by examining scale scores, item parameter estimates, test characteristic curves, test information functions, Rasch ability estimates at the content domain level, and the equivalence of the construct. Overall, the results support the comparability of computerized and paper-based tests at the item, subtest, and whole-test levels in both subject areas. For both subject areas, no evidence was found to suggest that the administration mode changed the construct being measured.
- – CBT
- – 2007-12-21
- – application/pdf
Examining the Relationship between Students' Mathematics Test Scores and Computer Use at Home and at School
description- – Over the past decade, standardized test results have become the primary tool used to judge the effectiveness of schools and educational programs, and today, standardized testing serves as the keystone for educational policy at the state and federal levels. This paper examines the relationship between fourth grade mathematics achievement and technology use at home and at school. Using item-level achievement data, individual students' state test scores on the Massachusetts Comprehensive Assessment System (MCAS), and student and teacher responses to detailed technology-use surveys, this study examines the relationship between technology use and mathematics performance among 986 regular students, from 55 intact fourth grade classrooms in 25 schools across 9 school districts in Massachusetts. The findings from this study suggest that various uses of technology are differentially related to student outcomes and that in general, student and teacher technology uses are weakly related to mathematics achievement on the MCAS. Implications for improving methods for examining the relationship between technology use and standardized test scores are presented.
- – computer
- – 2008-01-18
- – application/pdf
Does it Matter if I Take My Mathematics Test on Computer? A Second Empirical Study of Mode Effects in NAEP
description- – This article describes selected results from the Math Online (MOL) study, one of three field investigations sponsored by the National Center for Education Statistics (NCES) to explore the use of new technology in NAEP. Of particular interest in the MOL study was the comparability of scores from paper- and computer-based tests. A nationally representative sample of eighth-grade students was administered a computer-based mathematics test and a test of computer facility, among other measures. In addition, a randomly parallel group of students was administered a paper-based test containing the same math items as the computer-based test. Results showed that the computer-based mathematics test was statistically significantly harder than the paper-based test. In addition, computer facility predicted online mathematics test performance after controlling for performance on a paper-based mathematics test, suggesting that degree of familiarity with computers may matter when taking a computer-based mathematics test in NAEP.
- – 2008-06-17
- – application/pdf
Comparisons between Classical Test Theory and Item Response Theory in Automated Assembly of Parallel Test Forms
description- – The automated assembly of alternate test forms for online delivery provides an alternative to computer-administered, fixed test forms, or computerized-adaptive tests when a testing program migrates from paper/pencil testing to computer-based testing. The weighted deviations model (WDM) heuristic is particularly promising for automated test assembly (ATA) because it is computationally straightforward and produces tests with desired properties under realistic testing conditions. Unfortunately, research into the WDM heuristic has focused exclusively on Item Response Theory (IRT) methods even though there are situations under which Classical Test Theory (CTT) item statistics are the only data available to test developers. The purpose of this study was to investigate the degree of parallelism of test forms assembled with the WDM heuristic using both CTT and IRT methods. Alternate forms of a 60-item test were assembled from a pool of 600 items. One CTT and two IRT approaches were used to generate content and psychometric constraints. The three methods were compared in terms of conformity to the test-assembly constraints, average test overlap rate, content parallelism, and statistical parallelism. The results led to a primary conclusion that the CTT approach performed at least as well as the IRT approaches. Possible reasons for the comparability of the three test-assembly approaches are discussed, and suggestions for future ATA applications are provided.
- – learning
- – 2008-04-29
- – application/pdf
Does Survey Medium Affect Responses? An Exploration of Electronic and Paper Surveying in British Columbia Schools
description- – The purpose of this study was to determine whether or not survey medium (electronic versus paper format) has a significant effect on the results achieved. To compare survey media, responses from elementary students to British Columbia's Satisfaction Survey were analyzed. Although this study was not experimental in design, the data set served as a rich source from which to investigate the research question. The methods included reliability, item mean, factor analysis, response rate and response completeness comparisons across survey media. From the analyses, the differences between electronic and paper media in this study appear to be minor, and do not seem to have a significant effect on overall results. In conclusion, the medium does not seem to overly affect response patterns and does not pose any threats to the validity or reliability of survey results.
- – learning
- – 2008-04-11
- – application/pdf
Using the Attribute Hierarchy Method to Make Diagnostic Inferences about Examinees' Cognitive Skills in Algebra on the SAT
description- – The purpose of this study is to apply the attribute hierarchy method (AHM) to a subset of SAT algebra items administered in March 2005 to promote cognitive diagnostic inferences about examinees. The AHM is a psychometric method for classifying examinees' test item responses into a set of structured attribute patterns associated with different components from a cognitive model of task performance. An attribute is a description of the procedural or declarative knowledge needed to perform a task. These attributes form a hierarchy of cognitive skills that represent a cognitive model of task performance. The study was conducted in two steps. In step 1, a cognitive model was developed by having content specialists, first, review the SAT algebra items, identify their salient attributes, and order the item-based attributes into a hierarchy. Then, the cognitive model was validated by having a sample of students think aloud as they solved each item. In step 2, psychometric analyses were conducted on the SAT algebra cognitive model by evaluating the model-data fit between the expected response patterns generated by the cognitive model and the observed response patterns produced from a random sample of 5000 examinees who wrote the items. Attribute probabilities were also computed for this random sample of examinees so diagnostic inferences about their attribute-level performances could be made. We conclude the study by describing key limitations, highlighting challenges inherent to the development and analysis of cognitive diagnostic assessments, and proposing directions for future research. This article contains embedded media (video and audio files) and may take a few minutes to download. You will need Flash Player 9.0 (available from www.adobe.com) to play the files. An alternate, smaller version of this article that does not contain media files is available below under the Alternate Version heading.
- – 2008-02-12
- – application/pdf
Examining Differences in Examinee Performance in Paper and Pencil and Computerized Testing
description- – The study evaluated the comparability of two versions of a certification test: a paper-and-pencil test (PPT) and computer-based test (CBT). An effect size measure known as Cohen's d and differential item functioning (DIF) analyses were used as measures of comparability at the test and item levels, respectively. Results indicated that the effect sizes were small (d<0.20) and not statistically significant (p>0.05), suggesting no substantial difference between the two test versions. Moreover, DIF analysis revealed that reading and mathematics items were comparable for both versions. However, three writing items were flagged for DIF. Substantive reviews failed to identify format differences that could explain the performance differences, so the causes of DIF could not be identified.
- – 2007-11-20
- – application/pdf
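The test-level comparison in the abstract above rests on Cohen's d, the difference between two group means divided by a pooled standard deviation. The following is a minimal sketch of that calculation, not the study's own analysis: the function name and the score lists are hypothetical and invented purely for illustration, and the pooled sample standard deviation is assumed as the denominator.

```python
import math
from statistics import mean, variance


def cohens_d(group_a, group_b):
    """Standardized mean difference using the pooled sample standard deviation."""
    n_a, n_b = len(group_a), len(group_b)
    # Pool the two sample variances, weighting each by its degrees of freedom.
    pooled_var = (
        (n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)
    ) / (n_a + n_b - 2)
    return (mean(group_a) - mean(group_b)) / math.sqrt(pooled_var)


# Hypothetical scores, invented for illustration only (not data from the study).
ppt_scores = [70, 72, 68, 75, 71, 69]
cbt_scores = [71, 70, 69, 74, 72, 68]

d = cohens_d(ppt_scores, cbt_scores)
print(f"d = {d:.3f}")
```

By the conventional rule of thumb the abstract refers to, an absolute value of d below 0.20 is read as a small effect.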
Automated Essay Scoring Versus Human Scoring: A Comparative Study
description- – The current research was conducted to investigate the validity of automated essay scoring (AES) by comparing group mean scores assigned by AES and human raters. Data collection included two standardized writing tests - WritePlacer Plus and the Texas Higher Education Assessment (THEA) writing test. The research sample of 107 participants was drawn from a Hispanic serving institution in South Texas. The One-Way Repeated-Measures ANOVA and the follow-up Paired Samples t test were conducted to examine the group mean differences. Results of the tests indicated that the mean score assigned by IntelliMetric was significantly higher than faculty human raters' mean score on the WritePlacer Plus test, and the IntelliMetric mean score was also significantly higher than the THEA mean score assigned by human raters from National Evaluation Systems. A statistically significant difference also existed between the human raters' mean score on WritePlacer Plus and human raters' mean score on THEA. These findings did not corroborate previous studies that reported non-significant mean score differences between AES and human scoring.
- – computer
- – 2007-10-18
- – application/pdf
Enhancing the Design and Delivery of Assessment Systems: A Four-Process Architecture
description- – Persistent elements and relationships underlie the design and delivery of educational assessments, despite their widely varying purposes, contexts, and data types. One starting point for analyzing these relationships is the assessment as experienced by the examinee: 'What kinds of questions are on the test?,' 'Can I do them in any order?,' 'Which ones did I get wrong?,' and 'What's my score?' These questions, asked by people of all ages and backgrounds, reveal an awareness that an assessment generally entails the selection and presentation of tasks, the scoring of responses, and the accumulation of these response evaluations into some kind of summary score. A four-process architecture is presented for the delivery of assessments: Activity Selection, Presentation, Response Processing, and Summary Scoring. The roles and the interactions among these processes, and how they arise from an assessment design model, are discussed. The ideas are illustrated with hypothetical examples. The complementary modular structures of the delivery processes and the design framework are seen to encourage coherence among assessment purpose, design, and delivery, as well as to promote efficiency through the reuse of design objects and delivery processes.
- – 2002-10-01
- – application/pdf
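The four delivery processes named in the abstract above (Activity Selection, Presentation, Response Processing, Summary Scoring) can be pictured as a simple loop. The sketch below is only an illustration under stated assumptions: every function name, the fixed-order selection rule, the exact-match scoring, and the number-correct summary are hypothetical choices and are not taken from the paper.

```python
def select_activity(task_pool, administered):
    """Activity Selection: pick the next task (assumed here: fixed linear order)."""
    for task in task_pool:
        if task["prompt"] not in administered:
            return task
    return None  # nothing left to administer


def present(task, examinee):
    """Presentation: show the task and capture the examinee's work product."""
    return examinee(task["prompt"])


def process_response(task, response):
    """Response Processing: evaluate one response (assumed: exact match to a key)."""
    return 1 if response == task["key"] else 0


def summarize(observations):
    """Summary Scoring: accumulate evaluations (assumed: number correct)."""
    return sum(observations)


def deliver(task_pool, examinee):
    """Cycle through the four processes until no task remains."""
    administered, observations = set(), []
    while (task := select_activity(task_pool, administered)) is not None:
        response = present(task, examinee)
        observations.append(process_response(task, response))
        administered.add(task["prompt"])
    return summarize(observations)


# Hypothetical tasks and a simulated examinee, invented for illustration.
tasks = [
    {"prompt": "2 + 3", "key": "5"},
    {"prompt": "4 * 2", "key": "8"},
    {"prompt": "9 - 6", "key": "3"},
]
answers = {"2 + 3": "5", "4 * 2": "6", "9 - 6": "3"}
score = deliver(tasks, lambda prompt: answers[prompt])
print(score)
```

Keeping the four steps as separate functions mirrors the modularity the abstract emphasizes: an adaptive selection rule or a different scoring model could replace one function without touching the others.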
Investigating Children's Emerging Digital Literacies
description- – Departing from the view that the digital divide is a technical issue, the EDC Center for Children and Technology (CCT) and Computers for Youth (CFY) have completed a 1-year comparative study of children's use of computers in low- and middle-income homes. To assess emerging digital literacy skills at home, we define digital literacy as a set of habits through which children use computer technology for learning, work, socializing, and fun. Our findings indicate that both groups of children used the computer to do schoolwork. Many children with leisure time at home also spent 2 to 3 hours a day communicating with peers, playing games, and pursuing creative hobbies. When solving technical problems, the children from low-income homes relied more on formal help providers such as CFY and schoolteachers, while the children from middle-income homes turned to themselves, their families, and their peers. All the children developed basic literacy with word processing, email, and the Web. Not surprisingly, those children who spent considerably more time online developed more robust skills in online communication and authoring. The results also show that children's digital literacy skills are emerging in ways that reflect local circumstances, such as the length of time children had a computer at home; the family's ability to purchase stable Internet connectivity; the number of computers in the home and where they are located (bedroom or public area); parents' attitudes toward computer use; parents' own experience and skills with computers; children's leisure time at home; the computing habits of children's peers; the technical expertise of friends, relatives, and neighbors; homework assignments; and the direct instruction provided by teachers in the classroom. The findings highlight issues impacting social, school, and assessment policy and practice.
Specifically, these results have implications for local educational systems interested in developing digital literacy assessment instruments that demonstrate progress as well as specific areas that need improvement. The digital literacy analysis model developed in this study affords teachers opportunities to start to construct activities based on 5 central digital literacy components: computing for a range of purposes, understanding the function of and ability to use common tools, communication literacy, Web literacy, and troubleshooting skills. These activities can help teachers scaffold for their students and themselves the range of digital literacy proficiency skills, that is, their proficiency in using common tools as well as their use of different communications and Web tools. However, when it comes to large-scale assessments of digital literacy of teachers and students at the national and federal levels, the use of the digital literacy analysis model outlined in this study would be operationally and financially impractical. The field urgently needs to develop valid methods and instruments of assessment that help aggregate state and federal data as schools and districts at the local level acquire more and more technology. These methods and measurement instruments are likely to include surveys, e-readiness assessment tools, multiple-choice tests, pre- and post-tests, etc., that can measure individual as well as group progress in digital literacy.
- – 2002-08-01
- – application/pdf
Assessing Student Problem-Solving Skills With Complex Computer-Based Tasks
description- – Valid formative assessment is an essential element in improving both student learning and the professional development of educators. Various shortcomings in common assessment modalities, however, hinder our ability to make and evaluate such formative decisions. The diffusion of computer technology into American classrooms offers new opportunities to evaluate student learning and a rich, new source of data upon which to make inferences about the formative interventions that will improve learning. The path from data to inference, however, requires appropriate methodologies that can fully exploit the data without discarding or oversimplifying the behavioral complexity of student activity. This study used IMMEX, a computerized simulation and problem-solving tool, along with artificial neural networks as pattern recognizers to identify the common types of strategies high school chemistry students used to solve qualitative chemistry problems. Then, based on the calculated probabilities that students would transition between these strategy types over time, hidden Markov chain analysis allowed us to develop a model of the capacity of the current curriculum to produce students able to apply chemistry content to a real-world problem.
- – 2002-06-01
- – application/pdf
Automated Essay Scoring Using Bayes' Theorem
description- – Two Bayesian models for text classification from the information science field were extended and applied to student-produced essays. Both models were calibrated using 462 essays with two score points. The calibrated systems were applied to 80 new, pre-scored essays with 40 essays in each score group. Manipulated variables included the two models; the use of words, phrases and arguments; two approaches to trimming; stemming; and the use of stopwords. While the text classification literature suggests the need to calibrate on thousands of cases per score group, accuracy of over 80% was achieved with the sparse dataset used in this study.
- – 2002-06-01
- – application/pdf
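To make the idea of Bayesian text classification into two score groups concrete, here is a minimal sketch of one common approach, multinomial naive Bayes with add-one smoothing. It is an assumption that this resembles either of the two models in the study; the listing does not name them. The function names and the tiny calibration essays are hypothetical and invented for illustration, and the sketch omits the trimming, stemming, and stopword steps the abstract mentions.

```python
import math
from collections import Counter


def train(essays_by_score):
    """Calibrate word counts per score group from pre-scored essays."""
    vocab, word_counts, doc_counts = set(), {}, {}
    for score, essays in essays_by_score.items():
        counts = Counter()
        for essay in essays:
            counts.update(essay.lower().split())
        word_counts[score] = counts
        doc_counts[score] = len(essays)
        vocab.update(counts)
    return vocab, word_counts, doc_counts


def classify(essay, model):
    """Return the score group with the highest posterior under Bayes' theorem."""
    vocab, word_counts, doc_counts = model
    total_docs = sum(doc_counts.values())
    best_score, best_logp = None, -math.inf
    for score, counts in word_counts.items():
        total_words = sum(counts.values())
        # Start from the log prior, then add the log likelihood of each word.
        logp = math.log(doc_counts[score] / total_docs)
        for word in essay.lower().split():
            if word not in vocab:
                continue  # words never seen in calibration carry no evidence
            # Add-one (Laplace) smoothing avoids zero probabilities.
            logp += math.log((counts[word] + 1) / (total_words + len(vocab)))
        if logp > best_logp:
            best_score, best_logp = score, logp
    return best_score


# Hypothetical calibration essays, invented for illustration only.
calibration = {
    "pass": ["the evidence supports the claim clearly",
             "the argument cites evidence and reasons"],
    "fail": ["i think stuff is good",
             "stuff is stuff i guess"],
}
model = train(calibration)
print(classify("evidence and reasons support the argument", model))
```

Working in log probabilities keeps the product of many small word likelihoods from underflowing, which matters once essays run to hundreds of words.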
Inexorable and Inevitable: The Continuing Story of Technology and Assessment
description- – This paper argues that the inexorable advance of technology will force fundamental changes in the format and content of assessment. Technology is infusing the workplace, leading to widespread requirements for workers skilled in the use of computers. Technology is also finding a key place in education. This is occurring not only because technology skill has become a workplace requirement. It is also happening because technology provides information resources central to the pursuit of knowledge and because the medium allows for the delivery of instruction to individuals who couldn't otherwise obtain it. As technology becomes more central to schooling, assessing students in a medium different from the one in which they typically learn will become increasingly untenable. Education leaders in several states and numerous school districts are acting on that implication, implementing technology-based tests for low- and high-stakes decisions in elementary and secondary schools and across all key content areas. While some of these examinations are already being administered statewide, others will take several years to bring to fully operational status. These groundbreaking efforts will undoubtedly encounter significant difficulties that may include cost, measurement, technological-dependability, and security issues. But most importantly, state efforts will need to go beyond the initial achievement of computerizing traditional multiple-choice tests to create assessments that facilitate learning and instruction in ways that paper measures cannot.
- – 2002-06-01
- – application/pdf
A Feasibility Study of On-the-Fly Item Generation in Adaptive Testing
description- – The goal of this study was to assess the feasibility of an approach to adaptive testing using item models based on the quantitative section of the Graduate Record Examination (GRE) test. An item model is a means of generating items that are isomorphic, that is, equivalent in content and equivalent psychometrically. Item models, like items, are calibrated by fitting an IRT response model. The resulting set of parameter estimates is imputed to all the items generated by the model. An on-the-fly adaptive test tailors the test to examinees and presents instances of an item model rather than independently developed items. A simulation study was designed to explore the effect an on-the-fly test design would have on score precision and bias as a function of the level of item model isomorphicity. In addition, two types of experimental tests were administered - an experimental, on-the-fly, adaptive quantitative-reasoning test as well as an experimental quantitative-reasoning linear test consisting of items based on item models. Results of the simulation study showed that under different levels of isomorphicity, there was no bias, but precision of measurement was eroded at some level. However, the comparison of experimental, on-the-fly adaptive test scores with the GRE test scores closely matched the test-retest correlation observed under operational conditions. Analyses of item functioning on the experimental linear test forms suggested that a high level of isomorphicity across items within models was achieved. The current study provides a promising first step toward significant cost reduction and theoretical improvement in test creation methodology for educational assessment.
- – 2003-11-01
- – Bejar, Isaac I.
- – Lawless, René R.
- – Morley, Mary E.
- – Wagner, Michael E.
- – Bennett, Randy E.
- – Revuelta, Javier
- – application/pdf
An Exploratory Study to Examine the Feasibility of Measuring Problem-Solving Processes Using a Click-Through Interface
description- – In this study we investigated the feasibility of a novel user interface to support the measurement of problem-solving processes. Our research questions addressed the use of a "click-through" interface to measure the "generate-and-test" problem-solving process for a design problem. A click-through interface requires the user to explicitly perform an online action (e.g., to view time, the user has to click on a "time" icon). This interface allowed us to measure participants' intentional acts. Freshman college students were given the task of modifying a given, computer-interactive bicycle pump to satisfy performance requirements. The simulation interface provided participants with point-and-click access to controls to modify pump parameters, to run the simulation, to view important information, and to attempt to solve the task. Lag sequential analyses of participants' problem-solving processes over time showed cyclical behavior consistent with the generate-and-test strategy of modifying the pump design, running the simulation, viewing the information, and then either modifying the design or attempting to solve the problem and then modifying the design again. This behavior set was remarkably stable, with most lag 1 associations greater than .80. Our approach to measuring problem-solving processes appears feasible and promising, but more work is needed to gather additional validity evidence.
- – 2003-08-01
- – application/pdf
The Effect of Computers on Student Writing: A Meta-analysis of Studies from 1992 to 2002
description- – Meta-analyses were performed including 26 studies conducted between 1992 and 2002 focused on the comparison between K-12 students writing with computers vs. paper-and-pencil. Significant mean effect sizes in favor of computers were found for quantity of writing (d=.50, n=14) and quality of writing (d=.41, n=15). Studies focused on revision behaviors between these two writing conditions (n=6) revealed mixed results. Other studies collected for the meta-analysis which did not meet the statistical criteria were also reviewed briefly. These articles (n=35) indicate that the writing process is more collaborative, iterative, and social in computer classrooms as compared with paper-and-pencil environments. For educational leaders questioning whether computers should be used to help students develop writing skills, the results of the meta-analyses suggest that on average students who use computers when learning to write are not only more engaged and motivated in their writing, but they produce written work that is of greater length and higher quality.
- – 2003-02-01
- – application/pdf
Computerized Adaptive Testing: A Comparison of Three Content Balancing Methods
description- – Content balancing is often a practical consideration in the design of computerized adaptive testing (CAT). This study compared three content balancing methods, namely, the constrained CAT (CCAT), the modified constrained CAT (MCCAT), and the modified multinomial model (MMM), under various conditions of test length and target maximum exposure rate. Results of a series of simulation studies indicate that there is no systematic effect of content balancing method on measurement efficiency and pool utilization. However, among the three methods, the MMM appears to consistently over-expose fewer items.
- – 2003-12-01
- – application/pdf
Examinee Characteristics Associated With Choice of Composition Medium on the TOEFL Writing Section
description- – The Test of English as a Foreign Language (TOEFL) contains a direct writing assessment, and examinees are given the option of composing their responses at a computer terminal using a keyboard or composing their responses in handwriting. This study sought to determine whether examinees from different demographic groups choose handwriting versus word-processing composition media with equal likelihood. The relationship between several demographic characteristics of examinees and their composition medium choice on the TOEFL writing assessment is examined using logistic regression. Females, speakers of languages based on non-Roman/Cyrillic character systems, examinees from Africa and the Middle East, and examinees with less proficient English skills were more likely to choose handwriting. Although there were only small differences between age groups with respect to composition medium choice in most geographic regions, younger examinees from Europe and older examinees from Asia were more likely to choose handwriting than their regional counterparts.
- – 2003-12-01
- – application/pdf
Developing Computerized Versions of Paper-and-Pencil Tests: Mode Effects for Passage-Based Tests
description- – As testing moves from paper-and-pencil administration toward computerized administration, how to present tests on a computer screen becomes an important concern. Of particular concern are tests that contain necessary information that cannot be displayed on screen all at once for an item. Ideally, the method of presentation should not interfere with examinee performance on the test. Examinees should perform similarly on an item regardless of the mode of administration. This paper discusses the development of a computer interface for passage-based, multiple-choice tests. Findings are presented from two studies that compared performance across computer and paper administrations of several fixed-form tests. The effect of computer interface changes made between the two studies is discussed. The results of both studies showed some performance differences across modes. Evaluations of individual items suggested a variety of factors that could have contributed to mode effects. Although the observed mode effects were in general small, overall the findings suggest that it would be beneficial to develop an understanding of factors that can influence examinee behavior and to design a computer interface accordingly, to ensure that examinees are responding to test content rather than features inherent in presenting the test on computer.
- – 2004-02-01
- – application/pdf
Telementoring as a Collaborative Agent for Change
description- – This case study explored the effectiveness of telementoring as a vehicle for preservice teachers to hone skills in the teaching of writing, to establish a mentoring relationship with urban high school students, and to help struggling writers improve writing skills necessary for student achievement. Inherent in this research was the goal to develop a collaborative model between the university and the high school for using technology to improve "at-risk" urban students' skills in writing. Additionally, the research allowed preservice teachers to learn about themselves as evolving teachers as they broached some of the difficulties of teaching writing to academically diverse students and learned about the scarcity of resources and difficult realities that exist for urban students.
- – 2004-05-01
- – application/pdf