the blog of Carol Burris

Why the NY VAM measure of high school principals is flawed

The New York State Education Department has been developing a VAM measure of high school principals to be used this year, even though its parameters have not been shared with those who will be evaluated. It was introduced to the Board of Regents only this month.

Below is a letter that I sent to the Regents expressing my concerns. Thanks to Kevin Casey, the Executive Director of SAANYS; Dr. Jack Bierwirth, the Superintendent of Herricks; and fellow high school principals Paul Gasparini and Harry Leonardatos for their review and input.

May 18, 2013

Dear Members of the Board of Regents:

It is mid-May and the school year is nearly over. High school principals, however, have yet to be informed about what will comprise our VAM score, which will be 25% of our APPR evaluation this year. A PowerPoint presentation was recently posted on the State Education Department website, following the April meeting of the Regents. The very few ‘illustrative’ slides relevant to our value-added measure do not provide sufficient detail about how scores will be derived, or information regarding the validity or reliability of the model that will be used to evaluate our work. The slides also do not answer the most important question of all—what specifically does VAM evaluate about a principal’s work?

Upon seeing the slides, I contacted SAANYS, which provided additional information. What I received raised more doubts regarding the validity, reliability, and fairness of the measure. I will be most interested to read the BETA report when it becomes available. More important, it is apparent that, as with the 3-8 evaluation system, there may be unintended consequences for students and for schools.

Construct validity is the degree to which a measurement instrument, in this case the VAM score, actually measures what it purports to measure.   The measure, therefore, should isolate the effect of the high school principal on student learning, to the exclusion of other factors that might influence student achievement.  Because this model does not appear in any of the research on principal effectiveness, we do not know if it indeed isolates the influence of a high school principal on the chosen measures outside of the context of factors such as setting, funding, Board of Education policies and the years of service (and therefore influence) of the individual principal.

Simply because AIR can produce a bell curve on which to place high school leaders, it does not follow that the respective position of any principal on that curve is meaningful. That is because the individual components of the measure, which I discuss below, are highly problematic.
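To make that concern concrete, consider a minimal sketch (in Python, with entirely hypothetical numbers; this is not AIR’s actual model): feed pure random noise into the ranking arithmetic, and a tidy bell curve with ‘ineffective’ and ‘developing’ bands still emerges.

```python
# A minimal sketch with hypothetical numbers: even pure noise,
# once ranked, produces a bell curve and neat performance bands.
import random

random.seed(2013)

# 1,000 imaginary principals whose "scores" are nothing but noise.
scores = sorted(random.gauss(0, 1) for _ in range(1000))

# Any cut points will sort them into categories, meaningful or not.
ineffective_cut = scores[100]   # bottom 10% would be labeled 'ineffective'
developing_cut = scores[300]    # next 20% would be labeled 'developing'

print(f"'ineffective' below {ineffective_cut:.2f}")
print(f"'developing' below {developing_cut:.2f}")
# The labels are guaranteed by the arithmetic, not by anything
# the "scores" actually measure.
```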

The First Component—ELA/Algebra Growth

The first proposed measure compares student scores on seventh and eighth grade tests against scores on two Regents exams—the Integrated Algebra Regents and the English Language Arts Regents. It is a misnomer to call it a growth measure. The Integrated Algebra Regents, which is taken by students in Grades 7-12, is a very different test from the seventh and eighth grade math tests. It is an end-of-course exam, not one that shows continuous growth in skills. Because it is a graduation requirement, some students take it several times before passing.

Because many students take the Integrated Algebra Regents in Grade 8, the number of data points with which to compare principals will also vary widely across the state. For example, if you were to use the Integrated Algebra scores of my ninth-grade students this year, you would have 14 scores of the weakest math students in the ninth-grade class. That is because about 250 ninth-graders passed the test in Grade 8. You would have a few more scores if you included 10th-12th graders who have yet to pass the test. These scores would be the scores of ELL students who are recent arrivals, students who transferred in from out of state or from other districts, students with severe learning disabilities, or students with attendance issues. In many of these cases, there would be no middle-school scores for comparison purposes.

At the end of the day, perhaps there would be 20 scores in the possible pool. How is that a defensible partial measure of the effectiveness of a principal of nearly 1200 students? There are other schools that accelerate all students in Grade 8, and still others that accelerate not all, but nearly all eighth-graders. There are still other schools that give the Algebra Regents to some students in Grade 7, thus further complicating the problem.
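A brief simulation suggests how volatile an estimate built on so few scores would be. The numbers below are hypothetical, a sketch of sampling statistics rather than of the state’s model:

```python
# Hypothetical sketch: how unstable an estimated "principal effect"
# is when it rests on ~20 leftover scores instead of a full cohort.
import random
import statistics

random.seed(42)

def estimate_spread(n_students, trials=2000):
    """Year-to-year spread of a mean 'effect' estimate when the
    principal's true effect is exactly zero (noise sd = 1)."""
    estimates = [
        statistics.mean(random.gauss(0.0, 1.0) for _ in range(n_students))
        for _ in range(trials)
    ]
    return statistics.stdev(estimates)

print("spread with 250 scores:", round(estimate_spread(250), 3))  # about 0.06
print("spread with  20 scores:", round(estimate_spread(20), 3))   # about 0.22
# The same principal looks nearly four times more volatile when
# judged on 20 scores -- that volatility is noise, not leadership.
```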

The second measure that comprises the Math/ELA growth measure compares similar students’ performance on the eighth-grade ELA exam and the ELA Regents. Some schools give 11th-graders that test in January, others in June. That means that principal ‘effectiveness’ will be, as in the case of Algebra, compared using different exams given at different times of the year. The ‘growth’ in English Language Arts skills takes place over the course of three years, in Grades 9, 10, and 11. Therefore, any principal who has been in her school for fewer than three years has had only proportional influence on the scores.

The Second Component—The Growth in Regents Exams Passed

The second component of principal effectiveness counts the number of Regents examinations passed in a given year, comparing the progress of similar students. This is a novel concept, but again there is no research that demonstrates that it has any relationship to principal effectiveness, and, like the first measure, it is highly problematic.

First, not all Regents exams are similar in difficulty, although they are counted equally in the proposed model. There are 11th-graders who take the Earth Science Regents, a basic science exam of low difficulty, and others who take the Physics Regents, which the state correctly categorizes as an advanced science exam. Both groups of students may have accrued the same number of Regents exams (5) and have similar middle-school scores (thus meeting the test of ‘similar students’), but certainly Earth Science would be far easier to pass. Yet for each exam, the principal gets (or does not get) a comparative point.

And what of schools that are unusual in their course of studies? Scarsdale High School offers only six Regents exams, choosing instead to have its students take rigorous tests based on Singapore Math. It also gives its own challenging physics exam in lieu of the Regents. Will the principal of Scarsdale High School be scored ineffective because his high-performing students cannot keep up the count? Or will he be advantaged in the upper grades, when his high-performing students are compared to students with low Regents counts who frequently failed exams, thus disadvantaging the principals of schools serving less affluent populations?

The Ardsley School District double accelerates a group of students in mathematics. Some students enter their high school having passed three Regents exams—two in mathematics and one in science. Who will be the ‘similar students’ for these ninth-graders? How will the principals of portfolio schools, which give only the English Regents, receive a score? Is the system fair to principals of schools with no summer school program, whose students therefore have fewer opportunities to pass the exams? How will a VAM score be generated for principals of BOCES high schools that give few or no Regents exams? Will Regents exams taken at BOCES count toward the score of the home school principal, who has absolutely no influence on instruction there, or toward that of the BOCES principal? The scores are presently reported from the home school.

The Board of Regents allows a score of 55 to serve as a passing score for students with disabilities. How will this measure affect the principals of schools with large numbers of special education students, especially those schools that have, as their mission, the improvement of students’ emotional health rather than the attainment of a score of 65?

The Unintended Consequences of Implementation

All of the above bring into question the incentives and disincentives that will be created by the system.  This is the most important consideration of all, because the unintended consequences of change affect our students.  Will this point system incentivize principals to encourage students to take less challenging, easier-to-pass science Regents rather than the advanced sciences of chemistry and physics?  Will schools such as Scarsdale High School and portfolio schools abandon their unusual curricula from which we all can learn, in order to protect their principals from ineffective and developing scores?

Will principals find themselves in conflict with parents who want their children to attend BOCES programs in the arts and in career tech, rather than continue the study of advanced mathematics and science that are rewarded by the system?  Will we find that in some cases, it is in a principal’s interest that students take fewer exams so that they are compared with lower performing ‘similar’ students?  What will happen to rigorous instruction when simply passing the test is rewarded?  Will special education students be pressured to repeatedly take exams beyond what is in their best interest in order to achieve a 65 for ‘the count’? No ethical measure of performance should put the best interests of students in possible conflict with the best interests of the adults who serve them.

Most important of all, how will this affect the quality of leadership of our most at-risk schools, where principals work with greater stress and fewer supports? School improvement is difficult work, especially when it involves working with high-needs students. This model does not control for teacher effects; therefore, it is in fact a crude measure of both teacher and principal effects. If the leadership of the school is removed due to an ineffective VAM score, who will want to step in, knowing that receiving an ineffective score the following year is nearly inevitable?

Why would a new principal who receives a developing score want to risk staying in a school in need of strong leadership, knowing that it will take several years to achieve substantial improvement on any of these measures? The response that VAM is only a partial measure of effectiveness is hollow. An additional 15% is based on student achievement, and the majority of composite points are in the ineffective category, deliberately designed so that ‘ineffective’ in the first two categories assures an ineffective rating overall.
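The arithmetic of that design can be sketched in a few lines. The cut points below are hypothetical placeholders (the actual APPR bands are set in regulation), but they illustrate how an ineffective rating on the two test-based subcomponents can force an ineffective composite no matter what else a principal earns:

```python
# Hypothetical sketch of the composite design described above:
# if the two test-based subcomponents yield almost no points when
# rated 'ineffective', a perfect score everywhere else cannot lift
# the composite out of the ineffective band.
GROWTH_MAX = 25          # the 25% VAM subcomponent
LOCAL_MAX = 15           # the additional 15% student achievement measure
OTHER_MAX = 60           # everything else (observations, etc.)
INEFFECTIVE_BELOW = 65   # hypothetical composite cutoff

def composite(growth, local, other):
    return growth + local + other

assert composite(GROWTH_MAX, LOCAL_MAX, OTHER_MAX) == 100  # full 100-point scale

# 'Ineffective' on both test-based measures: assume only 2 points each.
total = composite(growth=2, local=2, other=OTHER_MAX)
print(total, "->", "ineffective" if total < INEFFECTIVE_BELOW else "not ineffective")
# 64 -> ineffective, even with all 60 remaining points earned.
```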

We frequently see the unintended consequences of changes in New York State education policy. The press recently noted a drop in the Regents Diploma with Advanced Designation rate, which resulted from the decision to eliminate the August Advanced Algebra/Trigonometry and Chemistry Regents. The use of the four-year graduation rate as a high-stakes measure has resulted in the proliferation of ‘credit recovery’ programs of dubious quality, along with teacher complaints of being pressured to pass students with poor attendance and grades, especially in schools under pressure to improve. These are but two obvious examples of the unintended consequences of policy decisions. The actions that you take and the measures that you use in a high-stakes manner greatly affect our students, often in negative ways that were never intended.

You have before you an opportunity to show courageous leadership. You can acknowledge with candor that the VAM models for teachers and principals developed by the department and AIR are not ready to be used because they have not met the high standards of validity, reliability and fairness that you require.  You can acknowledge that even if they were perfect measures, the unintended consequences from using them make them unacceptable. Or, you can favor form over substance, allowing the consequences that come from rushed models to occur.  You can raise every bar and continue to load on change and additional measures, or you can acknowledge and support the truth that school improvement takes time, capacity building, professional development and district and state support.

I hope that you will seize this moment to pause, ask important questions, provide transparency to stakeholders and seek their input before rushing yet another evaluation system into place. Creating a bell curve of relative performance may look like progress and science, but it is a measure without meaning that will not help schools improve.  Thank you for considering these concerns.

Sincerely,

Carol Burris, Ed. D.

Principal of South Side High School

18 Responses to “Why the NY VAM measure of high school principals is flawed”

  1. Kathy Brown

APPR in New York needs to be stopped in its tracks! It was ill-conceived and poorly implemented, and it is using scary algorithms to wreak havoc on our careers as educators and, ultimately, on our students. Please keep informing all of us so we can have meaningful dialog with our elected officials so they can help us fight this attack on New Yorkers’ civil rights.

  2. carolcorbettburris

    No ethical measure of performance should put the best interests of students in possible conflict with the best interests of the adults who serve them.

  3. Dayamonay

    For years I have taught the class of struggling students who have little support at home (which is important in primary grades). When they first began talking about value added, my question became who would want to teach those kids? It’s more work and the growth shows less. But they need teachers too.

  4. Concerned Educator

    Dr. Burris has made the case for what every educator in NY knows — the top-down policy dictates of the NYSED non-educators have failed and continue to fail. And the victims are the kids, parents, teachers, administrators and the tax paying public. A once proud, effective, and exemplary state public education system has been reduced to chaos. “In chaos, there’s profit” = the NYSED mantra.

  5. reality-based educator

    Carol,

    On one of NYC Educator’s blog posts about VAM and APPR, you made a comment that ultimately under the new teacher evaluation system nobody would lose their jobs because it won’t stand up to court challenges.

    As my colleagues and I have sat through meetings on the 100+ page Danielson rubric this week that has a lot of people scared they will get dinged on technicalities, as we discussed what APPR will look like in NYC after Commissioner King imposes a system June 1, as we looked at the requirements for Student Learning Objectives and talked about what the Growth Measures and Value-Added Measurements are going to look like, I couldn’t help but think back to your comment at NYC Educator’s blog:

    No one will lose their jobs because this system won’t stand up to court challenges.

    I wonder if the state educrats and Regents board members and others who have supported these new evaluation systems for both teachers and principals realize that by making the systems so complex and convoluted, they have sowed the seeds of destruction for these systems? Because it is hard to see how systems this unworkable stand up to either court challenges or voters.

    When it becomes clear to the parents all around the state how bad the intended and unintended consequences of this new system are, I bet the politicians are going to hear it from them.

    Given how scandal-ridden Albany is these days, I bet those politicians actually listen as well. Even the governor, with his political popularity falling by the month as measured in both the Marist and Quinnipiac polls, will be softened up to listen to parents and educators.

    The arrogance and hubris of Cuomo, Tisch, King and the other education reform proponents who have promoted these untested and unpiloted systems, who have attacked critics of these systems rather than acknowledged concerns and worked to assuage those concerns, will ultimately be the reason why these systems are dismantled sooner rather than later.

John King has already blamed the news media for their failure to properly convey the complexity of these reforms to the public so that they can understand them (http://gothamschools.org/2013/05/10/alone-among-policy-heavyweights-vallas-conveys-reform-fears/).

    King is misguided, of course.

As Paul Vallas told an audience of education reformers at the same meeting, “We’re losing the communications game because we don’t have a good message to communicate … we’re trying to sell a system where you literally have binders on individual teachers with rubrics that are so complicated … that they’ll just make you suicidal.”

    Indeed.

One more thing: thank you for all the work you have done to enlighten teachers and the public about the dangers of these new evaluation systems.

  6. Fred Smith

Given the minimal information that the SED has provided (minimal is a euphemism), Carol Burris has devoted far more time and thought to understanding the presumed workings and implications of its latest model than went into its design. The terrible likelihood is that the state will insist on rolling it out anyway.

The fragmented PowerPoint description offers three ways to gauge a principal’s effectiveness using a combination of ELA and math exam results and subsequent performance on Regents exams. None of the details are spelled out. The state’s message seems to be that as long as data and quantification are thrown together in a cauldron, an objective procedure will emerge. Calling this a Beta report doesn’t cut it.

More questions are raised by SED’s presentation than are answered, given the multivariate complexities involved in defining and evaluating leadership and student achievement, and in accounting for the countless confounding factors that influence school performance.

    Carol Burris has taken one more sow’s ear from SED and given us a silk purse of a response.

  7. Robert Rendo

    “If the leadership of the school is removed due to an ineffective VAM score, who will want to step in, knowing that receiving an ineffective score the following year is nearly inevitable?”

    Carol, I found your appeal to be comprehensive and fair. Thank you for so many critical points.

    With regard to what you wrote above, the answer is simple:

    Those who ultimately step in under this model, which I think IS very intentional about removing education as a public trust, will be from private charter school management companies, and, at least in New York State, will no longer be subject to the same APPR system that engulfed the principals and faculty in the first place.

The rules are set up to give charterization the freedom to do away with current paradigms, and that includes eliminating much of the APPR system that caused the harm that triggered the use of a private management company. It is not so cyclical, because the pattern gets broken once the charter takes over. So far, close to 20% of the nation’s charter schools could easily and legally be required to close due to poor performance, but the agencies that monitor and regulate them turn their heads away in the name of money-driven deals struck behind closed doors.

    So who will step in?

    The answer is obvious.

  8. Amber

    Could be worse. I work in NC and last year received an “ineffective” rating based on test scores of children who weren’t even in my class, children I have NEVER taught. Administrative and peer observations were great. Personal growth plan completed, yet ineffective based on scores that I had no control over. Sad!
We were given a song and dance about the benefits of VAM, but my opinion has always been that you will have a difficult time recruiting in a school that is 98% FRL and 78% ESL. They will never fix education until society’s ills have been addressed. To think that years of neglect can be erased in months by an effective teacher is naive. Why don’t we ever hold parents accountable?

  9. David Greene

    Carol,

It was indeed a pleasure reading your essay. So much has been written by outsiders who don’t really get it. You get it because you are on the inside looking out, feeling the pressures. Teachers and administrators both get it because they have to weigh these outsider-imposed pressures against the real needs of their students. The regents who run the state ed department, along with John King, are also outsiders. They are not principals or teachers. I know a regent personally. He is a great guy, a true believer in the education we want, but he feels too inhibited politically to do more than state an objection or two.

As for Regents exams… they have always been a joke. They are among the longest-tenured standardized exams in the nation and have always varied in scope, value, and validity depending on subject and year. These tests by committee have often consisted of questions too long and too cumbersome for many students to do as well on as they do on teacher- and school-based assessments. For decades, passing rates have been determined more by tutoring and test-prep rates than by actual learning.

Now that jobs are being decided on the outcomes of these tests, which hardly matter in either Scarsdale or Ardsley High School because of the high pass rates in both schools (I taught in the former and live in the latter’s district; my kids went to schools in both), perhaps we can do more than choose not to follow these test-based, outside-imposed VAM and APPR judgments. Perhaps we can instead look at them and demand the more authentic assessments that not just the professionals deserve but, more importantly, our students deserve.

    Sometimes being on the defensive allows one to turn to the offensive stance that should have been taken long ago.

  10. MichaelAronson

    One has to wonder, at this point, whether all of these “unintended consequences” really are unintended. To call them “unintended consequences” is to assume that our policymakers had the right idea but the wrong strategy or execution. But are we really all on the same side? Maybe our leaders are not incompetent, as we’d like to think, but rather very competent–at catering to interests that believe more in the free market than free public education.

  11. Rick

In Florida we are in our second year of implementing VAM. My strongest teacher received a low VAM score because she willingly takes at-risk students. During my observation this year I am scoring her in the highest range possible to offset questionable VAM results. So much for interrater reliability…

