Questions and Answers
PEG Scoring Engine
How does PEG grade my essay?
How accurate is my score? How does it compare with those of real human judges?
What is meant by a “good faith” essay?
How does PEG evaluate content?
Given the emphasis on the analysis and synthesis of textual information prescribed in the Common Core State Standards for Writing and the imperative to “write across the curriculum,” what is the role of automated scoring in the classroom?
What kinds of scores can PEG give?
Why does PEG seem to ignore some grammar “trouble spots” identified by Microsoft Word (or other programs)?
The PEG score I received just doesn’t make sense to me. It is much better (or worse) than I expected. Why is that?
How does NC Write work?
What grades can use NC Write?
Does the student’s work follow him/her year over year?
Is NC Write compatible with iPads or Android tablets?
Can I see a demonstration of NC Write before I purchase?
What are the minimum technical requirements for NC Write?
How much is NC Write?
I didn’t find an answer to my question. Who can I contact?
PEG Scoring Engine
Project Essay Grade (PEG) software is the automated essay scoring engine that powers NC Write (and other products as well). The technology is based on more than 40 years of research by Dr. Ellis Batten Page, whose pioneering work in the field of computational linguistics has distinguished him as the father of computer-based essay scoring.
Using advanced, proven statistical techniques, PEG analyzes written prose, calculates measures that reflect the intrinsic characteristics of writing (fluency, diction, grammar, construction, etc.), and models the decisions of professional readers to produce scores that deliver unparalleled accuracy and reliability.
PEG, PEGScore, and Project Essay Grade are registered trademarks of Measurement Incorporated, Durham, NC, USA.
How does PEG grade my essay?
Although a computer can neither “read” nor “understand” your essay in the traditional sense, it can describe (in numeric terms) many of its characteristics. For example, how many words does it contain? How many words are misspelled? How many sentences have more than one verb? What’s the average sentence length?
There are hundreds of characteristics that can be used to describe your essay in mathematical terms. Many, like those mentioned above, are fairly easy to understand and relate to the way a human might evaluate a piece of writing. Others are more complex and abstract, and their relationship to scoring decisions is less obvious. For example, in certain situations, the bigram (two-letter combination) “na” can be correlated to a human judge’s perception of an essay’s “fluidity,” although the exact nature of this relationship is not fully understood. New measures are being identified every day as the fields of computational linguistics and natural language processing continue to advance.
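To make the idea of describing an essay in numbers concrete, here is a minimal sketch that computes a few of the measures mentioned above. It is an illustration only and not PEG's actual feature set: the function name, the `dictionary` word list, and the simple regular expressions are assumptions, and measures such as the number of verbs per sentence would need a part-of-speech tagger that is not shown.

```python
import re

def describe_essay(text, dictionary):
    """Return a few simple numeric descriptors of an essay.

    `dictionary` is a hypothetical set of correctly spelled lowercase
    words; a real system would use a far larger word list.
    """
    words = re.findall(r"[A-Za-z']+", text)
    # Treat runs of ., ! or ? as sentence boundaries (a rough heuristic).
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    misspelled = [w for w in words if w.lower() not in dictionary]
    return {
        "word_count": len(words),
        "misspelled_count": len(misspelled),
        "sentence_count": len(sentences),
        "avg_sentence_length": len(words) / len(sentences) if sentences else 0.0,
        # Occurrences of the two-letter combination "na" inside words.
        "bigram_na_count": sum(w.lower().count("na") for w in words),
    }

sample = "The cat ate a banana. The dgo ran away!"
known_words = {"the", "cat", "ate", "a", "banana", "ran", "away"}
print(describe_essay(sample, known_words))
```

Each essay becomes a small table of numbers, which is the form a statistical model needs.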
In training itself to grade your essay, PEG software has analyzed thousands of essays—written by others just like you, on topics just like the one you’ll write about—and has compiled a wealth of statistics about their characteristics. In addition, this “training set” has been graded by experienced professional judges at Measurement Incorporated using a standard set of criteria—the same set that one of these human judges would use to grade yours.
By analyzing the scores and the computed characteristics of the training set, PEG software can identify those factors that figure most prominently in the judges’ scoring decisions. Combining these factors, PEG software produces a mathematical equation (called a model) that predicts how the judges might have scored your essay. In short, the computer is not really grading your essay at all. It is simulating how expert professional judges would grade it based on the decisions they made grading other essays of similar form.
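The general idea of fitting an equation to human scores can be sketched with a deliberately tiny example. The code below fits a straight line to a single characteristic (word count) using ordinary least squares. This is a stand-in chosen for clarity: the text does not say which statistical techniques PEG uses, a real model combines many characteristics, and the function names and training data here are invented for illustration.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

def predict_score(feature, slope, intercept, low=1, high=6):
    """Predict a whole-number score, clamped to the scoring scale."""
    raw = slope * feature + intercept
    return max(low, min(high, round(raw)))

# Hypothetical training set: word counts and the scores human judges gave.
word_counts = [100, 200, 300, 400]
human_scores = [2, 3, 4, 5]

slope, intercept = fit_line(word_counts, human_scores)
print(predict_score(320, slope, intercept))
```

The prediction for a new essay comes entirely from patterns in how the judges scored the training essays, which is why the quality of those human scores matters so much.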
In many respects, the model used to score your essay reflects the quality of the decisions that were made by those who graded the original training set. In that, you can have great confidence. With more than 25 years of experience in scoring high stakes writing assessments for state departments of education, Measurement Incorporated has earned a reputation for scoring quality unrivaled in the testing industry.
How accurate is my score? How does it compare with those of real human judges?
PEG scores are extremely accurate and, when used to evaluate good faith essays, are virtually indistinguishable from scores that experienced human judges would assign.
Within the testing industry, there are several measures of score reliability. The most common is percent agreement, which measures how often human readers agree with one another. “Agreement” in this context usually means that the scores recorded by Judge A and Judge B will differ by no more than one score point.
Human scoring environments can differ significantly based on a number of factors including the level of training and the clarity of the scoring criteria themselves. Typically, however, when using a six-point scale, well-trained judges will independently agree on exactly the same score for a given essay about 65 percent of the time. Within one score point, judges will agree about 85-95 percent of the time. PEG software typically yields agreement rates ranging from 93-98 percent.
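The two agreement measures described above are simple to compute. The sketch below uses made-up scores from two hypothetical judges on a six-point scale; the function name is an assumption.

```python
def agreement_rates(scores_a, scores_b):
    """Return (exact, adjacent) agreement between two lists of scores.

    Exact agreement counts identical scores; adjacent agreement counts
    scores that differ by no more than one point.
    """
    if len(scores_a) != len(scores_b) or not scores_a:
        raise ValueError("score lists must be non-empty and of equal length")
    n = len(scores_a)
    exact = sum(a == b for a, b in zip(scores_a, scores_b)) / n
    adjacent = sum(abs(a - b) <= 1 for a, b in zip(scores_a, scores_b)) / n
    return exact, adjacent

# Invented scores for ten essays.
judge_a = [4, 3, 5, 2, 6, 4, 3, 5, 1, 4]
judge_b = [4, 4, 5, 2, 4, 4, 3, 6, 1, 3]

exact, adjacent = agreement_rates(judge_a, judge_b)
print(f"exact: {exact:.0%}, within one point: {adjacent:.0%}")
```

The same function works for comparing a scoring engine with a human judge: pass the engine's scores as one of the two lists.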
A recent study sponsored by the Hewlett Foundation compared the accuracy and reliability of several automated scoring engines. For its analysis, the foundation used a variety of statistical measures, including quadratic weighted kappa, which measures agreement between judges factoring out the influence of chance agreement. In seven of the eight essay sets evaluated, PEG led all scoring engines in correctly predicting the scores assigned by human readers and even out-performed the agreement achieved by the two independent readers themselves.
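For readers curious about quadratic weighted kappa, the sketch below shows the standard textbook calculation. It is not taken from the study itself, and the function name and sample scores are assumptions. Disagreements are penalized by the square of their distance, and the result is compared with the disagreement expected by chance from each judge's score distribution.

```python
def quadratic_weighted_kappa(rater_a, rater_b, min_score, max_score):
    """Quadratic weighted kappa for two lists of integer scores.

    1.0 means perfect agreement, 0.0 means agreement no better than
    chance, and negative values mean worse than chance.
    """
    k = max_score - min_score + 1
    n = len(rater_a)
    # observed[i][j] counts essays scored i by rater A and j by rater B.
    observed = [[0] * k for _ in range(k)]
    for a, b in zip(rater_a, rater_b):
        observed[a - min_score][b - min_score] += 1
    hist_a = [sum(row) for row in observed]
    hist_b = [sum(observed[i][j] for i in range(k)) for j in range(k)]

    weighted_observed = 0.0
    weighted_expected = 0.0
    for i in range(k):
        for j in range(k):
            weight = (i - j) ** 2 / (k - 1) ** 2
            weighted_observed += weight * observed[i][j]
            weighted_expected += weight * hist_a[i] * hist_b[j] / n
    if weighted_expected == 0:
        # Both raters used a single identical score for every essay.
        return 1.0
    return 1.0 - weighted_observed / weighted_expected

# Invented scores on a six-point scale.
judge_a = [4, 3, 5, 2, 6, 4, 3, 5, 1, 4]
judge_b = [4, 4, 5, 2, 4, 4, 3, 6, 1, 3]
print(round(quadratic_weighted_kappa(judge_a, judge_b, 1, 6), 3))
```

Unlike percent agreement, this statistic is not inflated when both judges simply give most essays the same middle score.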
It is important to note that although PEG software is extremely reliable in terms of producing scores that are comparable to those awarded by human judges, it can be fooled. Computers, like humans, are not perfect.
What is meant by a “good faith” essay?
PEG presumes “good faith” essays authored by “motivated” writers. A “good faith” essay is one that reflects the writer’s best efforts to respond to the assignment and the prompt without trickery or deceit. A “motivated” writer is one who genuinely wants to do well and for whom the assignment has some consequence (a grade, a factor in admissions or hiring, etc.).
Efforts to “spoof” the system by typing in gibberish, repetitive phrases, or off-topic, illogical prose will produce illogical and essentially meaningless results.
How does PEG evaluate content?
Like most automated scoring technologies, PEG, when properly trained, can determine whether a student’s essay is on topic. PEG can identify the presence or absence of key words that give clues to the content. For example, references to the Nina, Pinta, and Santa Maria would lead PEG to the conclusion that the topic was related to the voyage of Christopher Columbus, provided that these keywords were defined prior to the analysis (or were frequently referenced in the training set).
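A keyword check of this kind can be sketched in a few lines. The example below is an assumption about how such a check might look, not a description of PEG's implementation: the function names, the keyword list, and the two-keyword threshold are all invented for illustration.

```python
import re

def topic_keyword_hits(essay, keywords):
    """Return the set of predefined keywords that appear in the essay."""
    text = essay.lower()
    return {
        kw for kw in keywords
        if re.search(r"\b" + re.escape(kw.lower()) + r"\b", text)
    }

def looks_on_topic(essay, keywords, min_hits=2):
    """Treat an essay as on topic if it mentions enough keywords."""
    return len(topic_keyword_hits(essay, keywords)) >= min_hits

# Hypothetical keywords defined before the analysis.
columbus_keywords = {"Columbus", "Nina", "Pinta", "Santa Maria"}

essay = "Columbus navigated his tiny ships to the shores of Santa Maria."
print(looks_on_topic(essay, columbus_keywords))
print(looks_on_topic("My dog likes to swim in the lake.", columbus_keywords))
```

A check like this only detects that the expected words are present. It says nothing about whether the statements made with those words are true.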
However, analyzing the content for “correctness” is a much more complex challenge illustrated by the “Columbus problem.” Consider the sentence, “Columbus navigated his tiny ships to the shores of Santa Maria.” The sentence, of course, is well-framed, grammatically sound, and entirely on topic. It is also incorrect. Without a substantial knowledge base specifically aligned to the question, artificial intelligence (AI) technology will fail to grasp the meaning behind the prose. Likewise, evaluating how well a student has analyzed a problem or synthesized information from an article or other stimulus is currently beyond the capabilities of today’s state-of-the-art automated scoring technologies.
Given the emphasis on the analysis and synthesis of textual information prescribed in the Common Core State Standards for Writing and the imperative to “write across the curriculum,” what is the role of automated scoring in the classroom?
Simply put, the demands for more classroom writing will put an enormous strain on teachers to grade the material. As a trusted (and tireless) teaching assistant, PEG (and the many products based on it) can do much of the “heavy lifting” by focusing on the constructs of good writing (organization, support, usage, mechanics, sentence variety, etc.), freeing teachers, including those in Social Studies, History, Science, and other content areas, to concentrate on content and the accuracy or correctness of the student’s essay.
What kinds of scores can PEG give?
PEG can produce a variety of scores depending on the goal of the assessment. A holistic score is one that gives the writer a single indicator of the “general quality” of his or her writing. Typically, holistic scores are calculated using whole numbers within a range, such as 1-4 or 1-6. Holistic scores provide a measure of performance relative to others of similar age and educational background, using a common set of scoring criteria (called a rubric).
Trait scores, on the other hand, can provide a more meaningful insight into one’s writing ability and may suggest areas of strength or weakness. In trait scoring, several characteristics intrinsic to good writing can be analyzed. Typical traits include organization, grammar, support (for an argument), or mechanics (such as punctuation and capitalization). Each trait is scored independently according to its own rubric. For example, on the “organization” trait, an essay that directs the reader through the text in a clear and logical manner will invariably produce higher scores than one that jumps from point to point with little or no transition or coherence. Taken together, trait scores provide instructional direction to the writer. Like holistic scores, trait scores are typically reported as whole numbers on a standard scale.
Why does PEG seem to ignore some grammar “trouble spots” identified by Microsoft Word (or other programs)?
PEG’s grammar checker can detect and provide feedback for a wide variety of syntactic, semantic and punctuation errors. These errors include, but are not limited to, run-on sentences, sentence fragments and comma splices; homophone errors and other errors of word choice; and missing or misused commas, apostrophes, quotation marks and end punctuation. In addition, the grammar checker can locate and offer feedback on style choices inappropriate for formal writing.
Unlike commercial grammar checkers, however, PEG only reports those errors for which there is a high degree of confidence that the “error” is indeed an error. Commercial grammar checkers generally implement a lower threshold and, as a result, may report more errors. The downside is that they also report a higher number of “false positives” (errors that aren’t errors). Because PEG factors these error conditions into scoring decisions, we are careful not to let “false positives” prejudice an otherwise well-constructed essay.
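The effect of a reporting threshold can be illustrated with a small sketch. Everything in it is hypothetical: the `Flag` class, the confidence values, and the two thresholds are invented to show the trade-off and do not reflect how PEG or any commercial checker actually represents its findings.

```python
from dataclasses import dataclass

@dataclass
class Flag:
    """A possible grammar error with a hypothetical confidence in [0, 1]."""
    text: str
    rule: str
    confidence: float

def report_flags(flags, threshold):
    """Keep only the flags whose confidence meets the threshold."""
    return [f for f in flags if f.confidence >= threshold]

candidates = [
    Flag("their going home", "homophone", 0.95),
    Flag("the data is clear", "agreement", 0.55),
    Flag("However he left", "missing comma", 0.80),
]

strict = report_flags(candidates, threshold=0.90)   # fewer, surer reports
lenient = report_flags(candidates, threshold=0.50)  # more reports, more false positives
print(len(strict), len(lenient))
```

A stricter threshold misses some real errors but avoids penalizing a writer for something that was never wrong.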
PEG also uses other factors to evaluate grammar. In essay scoring (human or computer), grammar “quality” is seldom assessed against an absolute standard of flawless English. Instead, it is a relative or comparative measure across the population of essays being evaluated, taking into account such factors as the age or educational level of the writers. In addition, in many contexts, grammar and mechanics (spelling, capitalization, and punctuation) are secondary to the primary goal of written communication: to impart meaning and understanding. In the relative scheme of things, the grammatical distinction between “data is” and “data are” (for example) is not critical to understanding. Only when grammar and mechanics complicate or impede the process of communication do they become significant concerns. This is what PEG is designed to measure: the degree to which grammar and mechanics contribute to or detract from the overall quality of the writing. So, for example, a “mechanics” score of three (on a five-point scale) suggests that the writer’s application of punctuation, spelling, and capitalization rules has a neutral effect on the overall quality of the essay. A score of one or two suggests that poor mechanics play a significant role in shaping the overall perception of quality.
This is not to suggest that grammar is unimportant. On the contrary, good grammar is a prerequisite for good writing. However, the assessment of one’s knowledge of grammar and convention is best accomplished through other forms of assessment.
As research continues to advance the science of computational linguistics, PEG’s direct grammar checking algorithms will begin to play a more significant role in the assignment of grammar scores that complement or supplement the traditional but more indirect “quality” measures.
The PEG score I received just doesn’t make sense to me. It is much better (or worse) than I expected. Why is that?
As described under “How does PEG grade my essay?” above, PEG scores are derived from an analysis of a training set of essays and scores assigned to those essays by human readers. From this process, PEG builds a mathematical formula (called a model) that predicts what the score would be for a new essay of similar form. If the essay you’ve submitted is outside of that frame of reference, the scores returned by PEG will be unreliable and, consequently, invalid. To illustrate, consider the following scenario:
A team of teachers has been assembled to select the best essays submitted by fifth graders on the topic, “How I Spent My Summer Vacation.” In the course of their evaluation, they encounter a poem written by one of the students about a favorite swimming hole. Although generally on topic, in terms of construction, organization and mechanics the writing sample bears no resemblance to the other submissions. The teachers quickly recognize the poem as being out of place and evaluate it against a different set of criteria (alternatively, they could reject it as being non-responsive to the task). Confronted with this same situation, however, PEG dutifully calculates the statistics and executes its model, oblivious to the fact that the poem is beyond anything it has been trained to score.
Unfortunately, because writing tasks can vary widely, there is no single model that can be applied universally with any degree of reliability. The good news is that PEG can score almost any writing task—from a newspaper article to a business letter to a college admissions essay, provided that it has been properly trained with appropriate samples.
How does NC Write work?
NC Write is an online formative assessment tool created to improve students’ writing skills by enabling them to submit essays, receive feedback and revise their work. Students compose essays in response to one of our NC Write pre-packaged prompts or to a teacher-generated prompt. Upon completion of the essay, the student submits it for feedback. Within seconds, NC Write generates a holistic score as well as a score for each of the Six Traits of Writing. This feedback also includes suggestions on improvements for each trait and recommends specific lessons or interactive tutorials. The teacher can provide additional feedback through the use of in-line essay corrections, textual-evidence and content-accuracy scoring and instant messaging.
What grades can use NC Write?
NC Write is designed for use in grades 3-12. However, the writing program has been used with great success with higher level 2nd graders supported by ample teacher modeling.
Does the student’s work follow him/her year over year?
Yes. As long as the student is actively enrolled in NC Write and is submitting essays to be reviewed, his/her work will be accessible throughout different school years. This work is housed in a student portfolio, where students and teachers can view all submitted essays, score progress over time, and time spent on lessons.
Is NC Write compatible with iPads or Android tablets?
Yes – the entire site is available on tablets.
Can I see a demonstration of NC Write before I purchase?
Yes. Please contact us to schedule a demonstration of NC Write.
What are the minimum technical requirements for NC Write?
Minimum technical requirements for NC Write can be found here
How much is NC Write?
NC Write is priced per student per school year. With each NC Write license, students and teachers have unlimited access to the program, with no limitations on essay submissions or time of use. All that is needed to log in is a unique user name and password, and the system can be accessed 24/7 from school, from home, or from wherever you can connect to the internet. For a quote, please click here
NC Write Usage Questions
NC Write includes a wealth of usage information in its HELP section, available for licensed subscribers. If you are a subscriber, please log in using your account credentials.