The Use of Actuarial Risk Assessment in the Commitment of Sexually Violent Persons

Karen Lynn O'Brien, J.D.
Introduction

Most people are familiar with actuarial instruments through their use in the insurance industry. These statistically based formulas predict a person's likelihood of ending up in any of the various situations with which insurance companies are concerned. The actuarial instruments are founded on the postulate that people with certain important factors in common are likely to have similar outcomes in a specified time period. Without these methods of prediction, insurance companies would be less capable of generating a profit.

Actuarial risk assessment instruments, based on the same methodology the insurance industry has used for decades, have recently been applied to determinations of whether individuals should be involuntarily committed based on their likelihood of committing sexually violent crimes. Actuarial risk assessment evidence is presented in conjunction with results from clinical evaluations to prove that an individual's risk of reoffending is high enough to meet the statutory standard. This form of evidence is unique in that it is the first instance in the modern American legal system in which statistical probabilities, based on the conduct of others, have been offered as relevant and weighty evidence to deprive an individual of his or her liberty.

Sexually Violent Person Commitment Law

Legislation to civilly commit sex offenders who have finished their prison terms has been a trend since the 1990s. The concept behind these laws is that sex offenders who have a high risk of recidivism, as defined by the legislature in these statutes, should be prevented from being released into society even though they have completed the punishment imposed on them by the criminal law.

Courts have confirmed the constitutionality of sexually violent persons legislation, and the propriety of it is beyond the scope of this discussion. There has also been judicial approval of the two methods of assessing risk: clinical and actuarial.

The State must prove three basic elements in a sexually violent person commitment, and although the verbiage of the elements varies between states, the concept is basically the same. The elements the State must prove are:

1. The respondent to the petition was convicted of a sexually violent offense, adjudicated delinquent for a sexually violent offense, or found not guilty by reason of insanity of a sexually violent offense.

2. The respondent is dangerous because he or she suffers from a mental disorder.

3. That mental disorder makes it substantially probable that the person will engage in acts of sexual violence.

Legislatures and those who pursue the commitment of sexually violent persons claim that the law is enforced only against the most dangerous sex offenders. Persons eligible for commitment under sexually violent persons laws are evaluated by the state to determine whether their commitment should be sought. Evaluations are generally performed by mental health professionals employed by the department of corrections or otherwise working for the state, and the commitment proceedings are carried out in a court of law. The assertion that only the most dangerous sex offenders are civilly committed rests on four assumptions concerning the probability that harm will occur if they are not committed. Two of these assumptions are scientific and two are legal. The scientific assumptions are that the probability of dangerousness is susceptible of measurement and that there is a way to discriminate between predictions of higher and lower probability. The legal assumptions are that there are standards that allow commitments based on the former while excluding confinement based on the latter, and that these standards are, in fact, enforced. The justification for sexually violent persons legislation is thus premised upon a reliable determination of the likelihood of recidivism.

Risk Assessment and Predictions of Dangerousness

Some sexually violent persons statutes make determinations on risk of reoffending, and others on dangerousness. However, all actuarial risk assessments predict risk. The difference between risk and dangerousness is more than semantic. Risk has greater utility and more flexibility than dangerousness. It addresses the presence of a potential hazard and the probability of its occurrence. Dangerousness connotes a narrow but not precisely defined swath of human behavior, typically acts of interpersonal violence.

Risk can capture a broader range including an offender's risk of eloping, violating parole, drinking, using drugs, developing depression, and committing suicide, among others. While discussions of dangerousness often conflate several distinct concepts (the probability and magnitude of harm), discussions of risk demand clarity about the specific type of behavior in question.

There are two sources of the shortcomings in risk assessment. The first is that humans have a limited ability to assess future risk of harmful behavior. The future is inherently unknowable, of course. But in many scientific areas humans have for many years used empirical research to improve their predictions about the future. As in any method of predicting the future, the quality of risk assessment is variable, and improvement is possible with the knowledge that comes with time and research.

The second source of the shortcomings of any risk assessment is the legal system of SVP laws. The risk thresholds justifying commitment are vague. Courts are still struggling to set standards that are reviewable and enforceable. Finally, there is no assurance that risk thresholds are uniform or that risk assessments are performed to the highest standards, although work is being done to establish standards in this area.

Risk Assessment testimony of any kind is evaluated at three stages of the litigation process. First there is the judicial threshold decision of admissibility. Here, judges consider the relevance, prejudice, and reliability of the evidence. Second, judges determine the legal sufficiency of the evidence: whether, if fully credited by the jury or other fact finder, it satisfies the legal standard for commitment. Finally, an assessment of the weight of the testimony is made. This inquiry is into whether, and to what extent, the witness's testimony is credible, and how much influence this testimony ought to have in deciding the ultimate question.

Actuarial Risk Assessment

Actuarial scales are developed using statistical analyses of groups of individuals, here sex offenders, with known outcomes during a follow-up period. The outcome recorded at follow-up is usually whether the subjects were arrested for or convicted of a new sexual offense, or were not identified as having committed a new sexual offense.

These analyses tell us which items, called "predictor variables," do the best job of differentiating between those who reoffended and those who did not reoffend within a specified time period. Some of these variables inevitably do a better job than others. Based on this information, the scientists developing the instruments determine how much weight should be assigned to each item. The variables are then combined to form a scale, which is tested on many other groups of offenders. This testing of the scale on new samples is called cross-validation.

After the scale has been used on many samples with a sufficiently large number of offenders, the scores derived from the scale may be expressed as estimates of the probability that individuals with that score will reoffend within a specified time frame. The concept is that individuals who share the same characteristics on the scale will perform similarly when released.

At this point, a "life" or "experience" table is developed that provides probabilistic estimates of reoffense for each score, or range of scores, for different time frames (e.g. within 12, 36, 60, or 120 months). These estimates are usually expressed in terms of the percentage of individuals in the development and validation samples with a particular score who reoffended sexually within the specified time period. These actuarial scales are sometimes referred to as "mechanistic" because a statistical formula is used to derive an individual's overall score.

The "Experience Table," then, is the focus in risk assessments. The individual to be assessed is scored on the factors, which are combined according to the formula, and the resultant risk score is compared to the table, which yields a probability representing the proportion of the reference group that reoffended. What is said is that an individual with a particular score has characteristics that place him in a group of persons with the same score who were observed (over the follow-up period) to have a given probability, or frequency, of sexual recidivism.
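The scoring and lookup steps described above can be sketched in a few lines of Python. This is only an illustration: the item names, weights, and table percentages below are invented for the example and are not taken from any published instrument.

```python
# Hypothetical item weights; real instruments publish their own coding rules.
ITEM_WEIGHTS = {
    "prior_sexual_offense": 2,
    "young_at_release": 1,
    "unrelated_victim": 1,
}

# Hypothetical experience table: for each total score, the proportion of a
# made-up reference group with that score who reoffended within the given
# number of months.
EXPERIENCE_TABLE = {
    0: {60: 0.05, 120: 0.08},
    1: {60: 0.09, 120: 0.13},
    2: {60: 0.15, 120: 0.22},
    3: {60: 0.25, 120: 0.35},
    4: {60: 0.40, 120: 0.50},
}


def total_score(items):
    """Add up the weights of every item coded as present."""
    return sum(ITEM_WEIGHTS[name] for name, present in items.items() if present)


def group_rate(score, months):
    """Look up the reference group's observed reoffense rate for a score."""
    return EXPERIENCE_TABLE[score][months]


profile = {
    "prior_sexual_offense": True,
    "young_at_release": False,
    "unrelated_victim": True,
}
score = total_score(profile)
print(score, group_rate(score, 60))
```

Note that the value returned describes the reference group sharing the score, not a measured property of the individual being assessed.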

The development and improvement of actuarial risk assessment instruments for predicting risk of reoffending in sex offenders has been rapid. Actuarial risk analysis has yet to take into account the changes in risk status that might be accomplished through effective treatment or well-designed community supervision. These are considered "dynamic" factors. Empirically driven revisions to, and support for, existing scales emerge roughly every 12 months, and a new wave of research on dynamic scales has appeared within the past several years.

Clinical v. Actuarial Risk Assessment

Since clinical risk assessment evidence is always presented in a sexually violent person commitment trial, it is important that the differences between the two types of risk assessments be made clear. The distinctions between the two can be instrumental in both the admissibility of the evidence and the weight given to it by the trier of fact. In the clinical method the decision-maker, a doctor specializing in mental health, combines or processes information in his or her head. In the actuarial or statistical method, the human judge is eliminated and conclusions rest solely on empirically established relations between data and the condition or event of interest.

In the typical clinical evaluation, the expert examines the individual, gathers and reviews as much other information (e.g., medical and institutional records, court records, and other documents pertaining to criminal history) as possible, and applies his expertise to produce an opinion. In many situations, the opinion is expressed in terms that are of direct relevance to the legal question before the court: the expert characterizes the individual's level of risk using legally relevant terms such as "highly likely" or "substantially probable". In many jurisdictions' commitment proceedings, the clinical expert will also testify as to whether the subject has any mental illness.

There are many potential sources of error in clinical risk assessment. These include:
1. Ignoring or using incorrect base rates;
2. Assigning suboptimal or incorrect weights to information (e.g., overweighing "high profile" but relatively non-predictive information);
3. Failing to take into account regression toward the mean;
4. Failing to properly take into account covariation;
5. Relying on illusory correlations between predictor variables and the criterion (i.e., basing decisions upon the presence or absence of information that is unrelated or only weakly related to the criterion);
6. Failing to acknowledge the natural bias among forensic examiners toward "conservative" judgments, defined as an increased potential for incorrect judgments of dangerousness associated with a reluctance to find someone not dangerous;
7. Failing to receive, and thus benefit from, feedback on judgment errors.

Comparative functioning findings demonstrate several strengths of actuarial risk assessment. Unlike clinical risk assessment, in which the ability of the examiner must be taken on faith, actuarial risk assessment allows a quantification of its accuracy, and a comparative examination of accuracy - a known rate of error.

Actuarial methods - Equal or Superior to Clinical Judgments

Grove study

This study consisted of an analysis of prior tests of actuarial instruments. It reported on a meta-analysis of 136 studies in which predictions by both human judges and "mechanical-prediction schemes" had been compared. In all instances, the predictions fell in the realm of psychology or medicine (i.e., all predictions involved human behavior or medical diagnoses), and in all instances the clinician and the actuarial expert had access to the same predictor variables and made their predictions on the basis of the same criterion. In only eight out of 136 studies was the clinical prediction superior to actuarial prediction. In the other 128 studies, either the results were comparable or actuarial prediction was superior. Actuarial prediction was found to be superior in 33% to 47% of the studies, depending on the type of analysis used. Whether the clinician had more data did not significantly alter the superiority of actuarial prediction. Moreover, in those instances in which the clinician had access to a clinical interview, the superiority of actuarial prediction was even greater.

Hanson Study

This study found very small correlations between clinical judgments and recidivism, and stronger correlations between actuarial assessments and recidivism. A correlation is a statistic that measures how strongly two sets of values are related; the higher the correlation, the more accurate the prediction tool. In an aggregation of 61 sexual offender recidivism studies (23,393 subjects), the correlation between clinical assessments and sexual recidivism was .10. The correlation with violent recidivism was .06, and the correlation with recidivism in general was .14. Actuarial methods correlated .46 for sexual recidivism, .46 for violent recidivism, and .42 for general recidivism.
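For readers unfamiliar with the statistic, the following is a minimal sketch of how a correlation between a risk score and a recorded outcome might be computed. It assumes the Pearson coefficient with the outcome coded 1 (reoffended) or 0 (did not); the study itself may have used a different estimator, and the data are invented for illustration.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sum((x - mean_x) ** 2 for x in xs)
    spread_y = sum((y - mean_y) ** 2 for y in ys)
    return covariance / sqrt(spread_x * spread_y)


# Made-up data: one risk score and one outcome (1 = reoffended) per person.
scores = [0, 1, 1, 2, 3, 3, 4, 5]
reoffended = [0, 0, 1, 0, 0, 1, 1, 1]
print(round(pearson_r(scores, reoffended), 2))
```

A value near 0 means the scores tell us little about who reoffended, while a value near 1 means higher scores go almost uniformly with reoffending.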

Bonta Study

The researchers in the Bonta study performed an aggregation of 64 independent samples (derived from 58 studies) in order to examine predictors of recidivism for mentally disordered and non-disordered offenders. The relevant or governing offense was non-sexual in 97% of the cases. The correlation between general recidivism and clinical judgment ranged from .06 to .16. The correlation between general recidivism and "objective risk" (actuarial) assessment ranged from .34 to .44.

Actuarial Risk Assessment Instruments

VRAG - Violence Risk Appraisal Guide

The VRAG is the most frequently reported actuarial risk assessment scale in the empirical literature. It was developed to assess violent recidivism, and is not necessarily the most relevant actuarial risk assessment tool for predicting recidivism of sex offenders.

This scale was initially based on a sample of 618 men, about 15% of whom were sex offenders, who had been committed, and later released, as mentally disordered offenders to the maximum security psychiatric hospital in Penetanguishene, Ontario for assessment or treatment. The men were followed after release to determine which engaged in any "violent" recidivism, an outcome variable that included all "hands-on" sexual offenses. The average time "at risk" in the community was about seven years.

Almost one-third of the sample committed a new violent offense during the follow-up period.

A large number of potential predictors of violence were examined and the following variables were selected as particularly related to subsequent violence:

1. Separation from parents before age 16
2. Elementary school maladjustment
3. Alcohol abuse history
4. Never married
5. History of nonviolent offenses
6. Failure on prior conditional release
7. Age at index offense
8. Victim injury in index offense
9. Male victim in index offense
10. Diagnosis of any personality disorder according to the Diagnostic and Statistical Manual of the American Psychiatric Association (DSM)
11. Diagnosis of schizophrenia according to the DSM
12. Hare's Psychopathy Checklist (PCL-R) Score

In an early study, the correlation between the VRAG and violent recidivism was .46. The two predictors with the highest correlations with violent recidivism were the Psychopathy Checklist (.34) and elementary school maladjustment (.31).

In a cross-validation study, the developers tested the VRAG using an independent sample of 159 sex offenders who were not included in the original construction sample. It yielded similar results: the correlation of the VRAG with violent recidivism (.47) was quite comparable to the correlation of .46 observed in the original study. Because the sample consisted of sex offenders alone, the developers then assessed the ability of the VRAG to predict outcomes limited to sexual recidivism, rather than violent recidivism generally. The inquiry suggested that the VRAG does a better job at predicting violent recidivism (nonsexual as well as sexual) than at predicting general sexual recidivism, which inevitably includes many crimes that are on the low end of a violence continuum. In the cross-validation study, the VRAG's correlation with sexual recidivism was .20.

Although there is no uniformly accepted index of accuracy for predictive models such as the VRAG, the "AUC value" is generally regarded as an index that should be reported. The AUC value corresponds to the probability that a randomly selected, truly dangerous individual will be scored as more dangerous than a randomly selected, non-dangerous individual. Near-perfect accuracy in discriminating between dangerous and non-dangerous individuals would yield an AUC value that approached 1.00, while chance prediction would yield an AUC value of .50. Studies examining the AUC value suggest that the VRAG may have better predictive capabilities in terms of violent recidivism as compared to sexual recidivism.
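The pairwise interpretation of the AUC value given above can be turned directly into a small computation. The sketch below compares every recidivist's score with every non-recidivist's score; counting tied scores as half a "win" is a common convention assumed here, and the scores are invented for illustration.

```python
def auc(recidivist_scores, non_recidivist_scores):
    """Share of recidivist/non-recidivist pairs in which the recidivist
    received the higher score (ties count as half)."""
    wins = 0.0
    for r in recidivist_scores:
        for n in non_recidivist_scores:
            if r > n:
                wins += 1.0
            elif r == n:
                wins += 0.5
    return wins / (len(recidivist_scores) * len(non_recidivist_scores))


# Made-up scores for people who did and did not reoffend during follow-up.
reoffended = [5, 7, 8]
did_not_reoffend = [2, 5, 6]
print(round(auc(reoffended, did_not_reoffend), 2))
```

If the instrument separated the two groups perfectly the function would return 1.0; if the scores carried no information it would hover around 0.5.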

The Mossman study examined 58 studies of violence prediction, finding that the median AUC value for all 58 studies was .73 and the weighted average was .78.

A similar study conducted by Rice and Harris determined that the VRAG's AUC value associated with violent recidivism was .77, a result comparing favorably with the group of studies Mossman reported on. In the same study, the AUC value associated with sexual recidivism was .60, clearly suboptimal.

The VRAG was designed to assess risk of interpersonal violence, so it is not surprising that it falls short when it comes to differentiating among samples composed exclusively of sexual offenders, many of whom have minimal or no history of physical violence. The VRAG variable with the greatest weight is the PCL-R score, and none of the twelve items captures the sexual pathology (e.g., sexually deviant thoughts/fantasies, intensity of sexual preoccupation with children, amount of contact with children) that would seem to be critical for most child molesters and some types of rapists. An attempt to predict sexual recidivism must take into account sexually deviant thoughts and behaviors, which is why the VRAG alone is not as well suited to predicting sexual recidivism as it is to predicting general violent recidivism.

SORAG

Developed by Vern Quinsey, Marnie Rice, and Grant Harris, the SORAG was constructed on the same principles as the VRAG. The underlying study examined the predictive efficacy of a large number of variables on a sample of child sex offenders and rapists (predominantly child molesters). The results supported the predictive validity of ratings on the Psychopathy Checklist, penile plethysmographic assessment, and prior criminal history. Penile plethysmographic assessment is a physiological test that measures the flow of blood to and from the penis and genital area, an indicator of sexual arousal, while the subject is presented with visual and/or auditory material. Basically, the SORAG is the VRAG reconstructed with sexual recidivism as the object.

The new scale included eleven of the items on VRAG. Victim injury was dropped, and male victim was changed to sexual offenses only against girls under 14.

Three new items were added: History of violent offenses, Number of prior convictions for sexual offenses, and Deviant Sexual preference (phallometric test results).

An examination of the predictive efficacy of the SORAG with incest offenders indicated that it worked as well for them as for non-incest offenders. When examining violent recidivism, the AUC value of the SORAG for the entire sample was .76, compared with .80 for incest offenders only. When examining sexual recidivism, the AUC value for the entire sample was .81, compared with .67 for incest offenders only.

RRASOR

The Rapid Risk Assessment for Sex Offense Recidivism scale was developed by R. Karl Hanson, who used aggregate data from eight follow-up studies that included 2,592 subjects. Hanson examined seven variables that had emerged as important from an earlier meta-analysis:

1. Official recorded prior sex offenses

2. Stranger victims

3. Any prior non-sexual offenses

4. Age (at time of release for those who were in prison and at time of evaluation for those in the community)

5. Marital Status

6. Any non-related victims (victims not having a biological, step, or foster relationship with the offender)

7. Any male victims (child or adult)

Hanson then chose the four variables that were most strongly associated with sexual recidivism:

1. Prior sexual offenses

2. Age at risk less than 25

3. Extrafamilial victims

4. Male victims

In the original study, RRASOR correlated .27 with sexual recidivism using the scale development samples. The AUC value was .71. Using a different cross validation sample, the scale correlated .25 with sexual recidivism and the AUC value was .67.

SACJ and SACJ-Min

David Thornton developed the SACJ and the SACJ-Min through exploratory analyses on several datasets in England. This actuarial risk assessment instrument is rated using a multi-stage process.

In the First Stage, documented convictions are coded in the following five areas:

1. Any current sexual offense

2. Any prior sexual offense

3. Any current non-sexual violent offense

4. Any prior violent offense

5. Four or more prior (distinct) sentencing occasions

If four or five of the first stage factors are coded as present, the offender is automatically classified as high risk.

If two or three factors are present, the offender is classified as medium risk.

If one or none of the factors are present, risk is considered low.
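The first-stage rule just described amounts to counting the factors coded as present and mapping the count to a risk band. A minimal sketch follows; the identifier names are hypothetical labels chosen for this example, and the second-stage aggravating factors are not modeled.

```python
# Hypothetical identifiers for the five first-stage conviction factors.
STAGE_ONE_FACTORS = (
    "current_sexual_offense",
    "prior_sexual_offense",
    "current_nonsexual_violent_offense",
    "prior_violent_offense",
    "four_or_more_prior_sentencing_occasions",
)


def stage_one_risk(coded):
    """Map the number of first-stage factors present to a risk band."""
    count = sum(1 for factor in STAGE_ONE_FACTORS if coded.get(factor, False))
    if count >= 4:
        return "high"
    if count >= 2:
        return "medium"
    return "low"


example = {
    "current_sexual_offense": True,
    "prior_sexual_offense": True,
    "prior_violent_offense": False,
}
print(stage_one_risk(example))  # two factors present -> "medium"
```

Factors missing from the input are treated as absent here, which is a simplification; an actual coding manual may handle missing information differently.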

The Second Stage incorporates one of two sets of variables that are regarded as potentially aggravating factors.

Set A, which is relatively easy to code quickly and reliably, includes:

1. Any stranger victims

2. Any male victims

3. Never married

4. Convictions for non-contact sex offenses

The five Stage 1 items plus the four Set A items comprise the SACJ-Min: the minimum required for a valid assessment.

Set B, which is more time-consuming and difficult to score, includes:

1. Substance abuse

2. Deviant sexual arousal

3. Psychopathy

4. Placement in residential care as a child

SACJ-Min was validated on a different sample of approximately 500 sex offenders released from prisons in 1979. Follow-up data was collected on the complete sample after 16 years. In this validation study, the SACJ-Min correlated .34 with sexual recidivism and .30 with any sexual or violent reoffense.

Static-99

Hanson and Thornton collaborated to integrate the RRASOR and the SACJ-Min to create the Static-99. This instrument is probably the most widely used actuarial risk assessment instrument, and it includes only static variables. The "99" in the name, which refers to the year, suggests that the scale is a work in progress, and the most recent version of this instrument is the Static-2002 (below).

The Static-99 has 10 variables: 8 of the 9 original SACJ-Min variables (all but current sex offense) and all 4 of the RRASOR variables. Since two of the four RRASOR variables were also on the SACJ-Min, only two new variables were added to the eight SACJ-Min variables.

Validation studies have produced comparable support for the Static-99 and the SORAG, with AUC values of .71 (any offense), .70 (any serious offense), and .70 (any sexual offense). In a recent cross-validation study on a large Swedish sample of 1,400 male sex offenders, the AUC value was .74 for any violent offense and .76 for a sexual offense. In the same study, the Static-99 predicted sexual recidivism comparably for child molesters (.76) and rapists (.75).

Static-2002

The Static-2002 is a result of the evolutionary process of science and ongoing development of increasingly accurate actuarial assessment tools. This tool contains 13 risk predictors, three more than the Static-99. With five new items, coding changes to at least four other items, and one Static-99 item dropped, the Static-2002 is considered a substantially different scale.

MnSOST-R

The MnSOST-R is a tool developed by Douglas Epperson for use in connection with Minnesota's program of community notification. The tool consists of 16 items: number of sex convictions, length of sex offending history, whether the offender was under supervision at the time of any of the sex crimes, public place, use of force or threat of force, multiple acts on same victim, age of victim, age difference between offender and teenage victim, stranger victim, adolescent antisocial behavior, drug or alcohol abuse within one year of crime, employment history, discipline history while incarcerated, chemical dependency treatment while incarcerated, sex offender treatment while incarcerated, and age at release from institution.

Barbaree Study

In the most extensive meta-analysis to date, Howard Barbaree conducted a comparative analysis of five actuarial risk assessment procedures and the PCL-R.

The study resulted in strong support for the SORAG, which had the largest AUC value when predicting any serious reoffense (.73). The other tools were rated as follows: .70 for Static-99, .69 for PCL-R, and .58 for MnSOST-R. In predicting any reoffense, the study's results were most notably as follows: SORAG .76, VRAG .77, and the others ranged from .71 to .60. For predicting any sexual reoffense: RRASOR .77, Static-99 .70, SORAG .70, and AUC values of .65 to .61 obtained for the MnSOST-R, the VRAG, and the PCL-R.

Harris Study

In the most recent comparative study of actuarial risk assessment tools, the VRAG, SORAG, RRASOR and Static-99 were examined in four independent samples totaling 396 sex offenders.

The correlation between the SORAG and violent recidivism was .39, and ranged from .31 to .37 across samples. The same correlation for the Static-99 was .21, ranging from .13 to .25 across samples. AUC values for violent recidivism ranged from .69 to .67 for the SORAG and from .60 to .67 for the Static-99.

As for sexual recidivism, AUC values ranged from .59 to .71 for the SORAG and from .54 to .67 for the Static-99. Both scales performed slightly better for child molesters than for rapists. For sexual recidivism among child molesters, AUC values were .70 for the SORAG and .65 for the Static-99. For sexual recidivism among rapists, AUC values were .62 for the SORAG and .59 for the Static-99. Under "favorable conditions" (e.g., fewer missing items and a fixed follow-up time), the AUC values for prediction of sexual recidivism were as high as .79 for the SORAG and .76 for the Static-99.

Admissibility of Actuarial Risk Assessment

Courts generally have three concerns about expert testimony: the connection between the testimony and the legal issue (relevance), the reliability of the evidence, and the potential prejudicial impact of the testimony. The use of actuarial risk assessment in sexually violent persons commitments has for the most part survived challenges asserting that it does not meet the Frye or Daubert standards.

Frye and Daubert

There are two basic tests used in courts in the United States to determine the admissibility of scientific evidence. Some states have developed their own requirements, but many follow a variation of either Frye v. United States or Daubert v. Merrell Dow Pharmaceuticals, Inc.

The test in federal court, as well as in at least ten states, was announced in Daubert and has been codified as Rule 702 of the Federal Rules of Evidence. Under this standard, the trial judge is the gatekeeper of expert testimony. There are four factors to consider in deciding whether an expert opinion is admissible: 1) whether there is a hypothesis that is scientifically valid; 2) whether the study has been subject to peer review and publication; 3) the known or potential rate of error; and 4) general acceptance.

The Frye Standard for expert testimony was the precursor to Daubert, and is still followed in many states, including Illinois. Under Frye, expert testimony is admissible if the "scientific principle or discovery...[has] gained general acceptance in the particular field in which it belongs."

Expert Evidence Acceptance Standards

States accepting Daubert:

Connecticut
Indiana
Kentucky
Louisiana
Massachusetts
New Mexico
Oklahoma
South Dakota
Texas
West Virginia

States accepting Frye:

Alaska
Arizona
California
Colorado
Florida
Illinois
Kansas
Maryland
Michigan
Missouri
Nebraska
New York
Pennsylvania
Washington

States with their own tests:

Arkansas
Delaware
Georgia
Iowa
Military
Minnesota
Montana
North Carolina
Oregon
South Carolina
Utah
Vermont
Wyoming

Daubert can be more permissive to newly developing scientific methods, while it highly scrutinizes so-called soft sciences (clinical assessment).

Reliability is considered by every court, and due to the context, civil commitment of dangerous individuals, the reliability threshold should be high.

Although actuarial risk assessment is a serious enterprise, backed by sophisticated empirical methodology, courts must determine whether it is sufficiently reliable, which is often evaluated by looking at the possible imperfections of the instruments. Some of the imperfections that can be attacked include small sample sizes, lack of cross-validation, an inadequate number of peer-reviewed publications, absence of information on standard errors, and absence of manuals with standardized instructions for scoring.

Appellate Courts' Rulings on the Admissibility of Actuarial Risk Assessment Instruments

Three appellate courts have addressed the reliability of actuarial risk assessment; two admitted the evidence and one excluded it. In general, none of the three courts engaged in a sophisticated evaluation of the science underlying actuarial risk assessment. The admitting courts appear to be saying, in a general way, "this science seems weighty," while the essence of the excluding court's reasoning was that the evidence about the science seemed rather thin.

All three seemed to evaluate reliability in the context of potential prejudice. The issue was phrased not as "how accurate does risk assessment have to be to justify liberty deprivation" but rather as "how accurate does it have to be to avoid potential prejudice arising from labeling actuarial risk assessment as 'science'".

The courts judged the potential prejudice of Actuarial Risk Assessment in part by its relationship to clinical risk assessment. Actuarial Risk Assessment was used in conjunction with a full clinical assessment in all three cases. The two admitting courts thought that this conjunction was significant in that it would serve to make clear to the jury that actuarial risk assessment was just another piece of information, passed through the judgment of the clinician, and in this way undercut its (undue) influence as "science". The rejecting court opined that the inclusion of Actuarial Risk Assessment in the clinician's information base would transform the otherwise admissible clinical judgment into potentially prejudicial "science." Note that all three courts thought that clinical judgments would be routinely admitted, even if Actuarial Risk Assessment were excluded.

In Re R.S.

In In Re R.S., the court held that actuarial risk assessment is "reliable for use in [sex offender commitment cases] as an aid in predicting recidivism." The court noted that "actuarial instruments are at least as reliable, if not more so, than clinical interviews."

The court reasoned that, since expert testimony concerning future dangerousness based on clinical judgment alone has been found sufficiently reliable for admission into evidence at criminal trials, it is logical that testimony based upon a combination of clinical judgment and actuarial instruments is also reliable. The court noted that actuarial evidence not only provides the court with additional relevant information but, in the view of some, may even provide a more reliable prediction of recidivism.

The court did not delve into the controversy about the adequacy of the science. According to the court, reliability is contextual: "What constitutes reasonable reliability depends in part on the context of the proceedings involved." The court, sitting in New Jersey, where commitments are tried to a judge and not a jury, balanced the reliability of the evidence against its potential prejudice.

The New Jersey Supreme Court affirmed and said that it was impressed with the weight of the science: "the extensive expert testimony in this matter concerning validation studies, cross-validation studies, correlation coefficients, and clinically-derived factors attests to reliability in this context." The Court strongly suggested that its holding might be limited to the use of actuarial risk assessment only as part of a broader clinical evaluation, and that actuarial risk assessment was not to be used as a litmus test.

In Re Holtz

In In Re Holtz, the Iowa intermediate appellate court, sitting en banc, approved the admissibility of actuarial risk assessment. The court also noted that reliability is contextual: "the amount of foundation necessary to establish reliability depends on the complexity of the testimony and the likely impact of the testimony on the fact-finding process." The appellate court conducted no independent review of the science beyond what was presented at the trial court level. It admitted the actuarial risk assessment testimony, but cautioned that "the instruments were used in conjunction with a full clinical evaluation and their limitations were clearly made known to the jury."

People v. Taylor

In this Illinois case, the court first determined that actuarially based testimony is subject to a Frye analysis. Under Illinois precedents, clinically based psychological testimony is not subject to Frye. The court rejected the approach of several other courts that exempts hybrid clinical-actuarial testimony from Frye.

The court held that the state had failed to carry its burden of establishing that the actuarial instruments relied upon had achieved the level of validity required for admissibility. "The instruments are in the experimental stages and the validity of these instruments has not been established."

With respect to the MnSOST-R, the court noted that the developers had not "released the raw data upon which the MnSOST-R was based, and other researchers have not had the opportunity to replicate and scrutinize the study." The state did not introduce sufficient "statistical evidence demonstrating the reliability and accuracy of these instruments."

The court expressed concern about "frequent scoring inconsistencies by different evaluators" and the absence of any "rules on the methods or procedures to combine the results of the various instruments and what weight should be placed upon the instruments in evaluating sexual offender recidivism."

Finally, the court concluded: "Lacking a threshold showing of any indicia of validity, these instruments should not be presented to the jury as 'science.'" The state's witness had claimed that "these instruments are more accurate than pure clinical judgment," but the court refused to credit this testimony, reasoning that the witness "offered no support for his conclusions" other than his "own assertions."

In practice, this case has not kept actuarial risk assessment out of sexually violent persons litigation. In Cook County, evidence from actuarial risk assessments is presented at every sexually violent persons trial. There have been several subsequent Frye hearings in which trial courts have concluded that the instruments are generally accepted in the relevant scientific community.

General Acceptance

When courts evaluate the general acceptance of a science, they must determine what scientific principle is involved, whether the principle or some aspect of the principle is novel, what the relevant scientific community is, and whether that community accepts the principle.

Courts have taken four distinct approaches as to whether actuarial risk assessment is a novel science:

1. Analyze ARA and CRA separately for admissibility purposes.

2. Sidestep the question of ARA novelty by characterizing actuarially derived information as just another element of the clinical judgment.

3. Treat the supplementing of a clinical evaluation with ARA as undercutting the exemption of "pure clinical" testimony, which subjects the expert's entire judgment to Frye.

4. Subject all expert risk assessment (clinical, actuarial, and mixed) to admissibility vetting, as required in federal courts under Kumho.

There is vehement disagreement about whether actuarial methods in general, or specific instruments in particular, are well enough developed to be used in the liberty-deprivation context of sexually violent persons cases. Some researchers urge the "complete replacement of existing practice with actuarial methods," and suggest that the use of clinical methods, where actuarial ones are available, would be "unethical." Other scholars conclude that "even the best studied and validated actuarial tools for assessing dangerousness... has not been demonstrated as suitable for practical purposes in many instances, or to be superior to clinical assessments."

There is no solid evidence about the degree of acceptance of actuarial methods among experts. Some report that the usage rate is fairly high, at least in SVP cases, but others report that "most professionals continue to use a subjective, clinical judgment approach when making predictive decisions."

Some scholars conclude that "there currently are no widely accepted professional standards or guidelines regarding what constitutes the most appropriate approach to conducting sex offender risk assessments." These authors suggest that courts should only look to the novel aspects of actuarial risk assessment, as compared to clinical risk assessment.

It is important to note that statistical theories based on the same science as actuarial risk assessment instruments have been applied in numerous, diverse contexts, including weather forecasting, law school admissions, disability determinations, predicting the quality of the vintage for red Bordeaux wines, and predicting the quality of sound in opera houses. This is often the most persuasive argument in favor of the general acceptance of similar statistically based tools.

Relevance/Fit

When evaluating evidence based on actuarial risk assessment, courts must consider the evidence's relevance or fit: whether the risk measured by the actuarial risk assessment tools is the same as the risk that must be determined under the governing law.

It should be noted that fit questions are possible to raise only because of the relative precision and transparency of actuarial risk assessment. In the clinical method, by contrast, the clinician translates empirical research into risk assessment testimony. The relationship between the outcomes measured in the research and the outcomes of interest in the courtroom may be obscured by the opacity of the clinician's expert judgment.

There are two types of problems concerning fit: outcome-measure problems and group-based problems.

Actuarial risk assessment tests report the probability of a certain outcome; if this outcome is defined differently from the outcome of interest in the SVP law, then fit is imperfect. For example, California requires an assessment of the risk of "predatory" sexual offenses. None of the existing actuarial risk assessment scales limits its outcome measure to "predatory" crimes, and some of the scales may be better at predicting imminent, relatively minor reoffenses than the long-term risk of severe crime. Some tools, such as the VRAG, measure the risk of violent recidivism, including both sexual and non-sexual crimes. Under some sexually violent persons laws, the relevant question concerns risk in the short term "under close supervision" or risk "with treatment," while current static actuarial risk assessment scales, in general, measure long-term stable risk and do not take changeable environmental factors into consideration.

The risk of recidivism estimated by actuarial risk assessment scales is based on aggregate or group data. The instruments report the empirically measured rate of recidivism among a group of sex offenders who share a set of characteristics with the subject of the evaluation. Opponents claim that the group-based information is not relevant to the individual risk assessment required by law. They claim that there is not enough proof that this individual is the same as others in the group merely because he shares common characteristics. Justice Coyne of the Minnesota Supreme Court has said: "Not only are the statistics concerning the violent behavior of others irrelevant, but it seems to be wrong to confine any person on the basis not of that person's own prior conduct, but on the basis of statistical evidence regarding the behavior of other people."
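
The group-based logic described above can be sketched in a few lines of Python. This is only an illustration: the item names, point values, score bins, and rates below are invented for the example and are not drawn from any actual instrument or study.

```python
# Hypothetical scoring items and point values (not from any real instrument).
ITEM_POINTS = {
    "prior_sex_offenses": 2,
    "under_25_at_release": 1,
    "unrelated_victim": 1,
}

# Hypothetical score bins: (lowest score, highest score, observed group rate).
# Each rate stands for the share of a development sample in that bin who
# reoffended during follow-up. The numbers are invented.
GROUP_RATES = [
    (0, 1, 0.10),
    (2, 3, 0.25),
    (4, 4, 0.40),
]


def total_score(present_items):
    """Add up the points for every item present in the subject's history."""
    return sum(ITEM_POINTS[name] for name in present_items)


def group_rate(score):
    """Return the recidivism rate of the reference group sharing this score."""
    for low, high, rate in GROUP_RATES:
        if low <= score <= high:
            return rate
    raise ValueError(f"score {score} is outside the scale")


score = total_score({"prior_sex_offenses", "unrelated_victim"})
print(score, group_rate(score))
```

In this sketch, a subject with a prior offense and an unrelated victim scores 3 and is assigned the 25 percent rate observed in the hypothetical group with scores of 2 to 3. The number describes the group; the objection discussed above concerns whether it also describes the individual.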

The Broad Form of Group-Based Objection is that group probabilities are inherently different from predictions of individual behavior. This gives rise to a deep philosophical dispute about whether it makes any sense to speak of probability when applied to a single individual as opposed to a group. In reality, a given individual, released from prison, either commits another crime (risk is 100%) or does not (risk is 0%). The broad form of the objection does not distinguish actuarial risk assessment from clinical risk assessment, because all prediction must be group-based, or at least group-informed; otherwise it would merely be a guess. However, actuarial risk assessment makes explicit what clinical risk assessment obscures: that prediction and risk assessment are inherently group-based exercises.

The Narrow Form of Group-Based Objection acknowledges that all risk assessment is inherently group-based, but complains that actuarial risk assessment is fixed or immutable, classifying people into predefined bins that are too rigid and fail to account for significant individual differences. All prediction necessarily treats individuals as abstractions, isolating "essential" features of the group, but critics argue that actuarial risk assessment is especially defective because the predictive scales are limited to a few pre-determined items. Thus, while a clinician's expertise presumably allows him or her to choose the factors that he or she deems most salient for the individual (i.e., to mentally construct the most relevant reference group), actuarial risk assessment rigidly restricts its assessment to the pre-set factors. Clinical risk assessment, this objection goes, may necessarily categorize individuals, but at least it uses the most salient factors about the individual to construct its categories. In contrast, the pre-selected risk predictors of actuarial risk assessment may fail to account for some significant fact about the individual.

In In re Valdez, No. 99-000045CI, at 6 (Fla. Cir. Ct. Aug. 21, 2000) (order granting motion to exclude evidence), the trial court points out that none of the actuarial risk assessment tests "seem[s] to include whether the person has been or is being treated, whether he has been or still is incarcerated, is under house arrest, or is comatose, although to the unsophisticated, one or more of those factors would seem to bear heavily on future conduct."

There are three basic approaches for dealing with the "Fit" objection.

First, there are some, relatively rare, circumstances in which actuarial risk assessment should be disregarded in favor of clinical judgments.

Second, some commentators and practitioners advocate the "adjustment" of actuarial risk assessment scores to account for individualized (dynamic) risk factors. Under this method, the examiner adds or subtracts percentage points from the actuarial risk assessment results to reflect risk factors that (in the examiner's judgment) are not adequately reflected in the actuarial risk assessment result. But many commentators believe that this form of "adjustment" transforms actuarial risk assessment into clinical risk assessment, depriving actuarial risk assessment of its advantage over clinical methods.
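
The adjustment method can be sketched as follows, in Python and with invented numbers. The function name and the adjustment labels are hypothetical; no instrument or guideline is known to prescribe these values.

```python
def adjust_estimate(actuarial_rate, adjustments):
    """Shift an actuarial rate by examiner-chosen percentage-point changes.

    actuarial_rate: group rate from the scale, as a fraction (0.25 = 25%).
    adjustments: mapping of a label to a signed change, also as a fraction.
    The result is clamped so that it stays a valid probability.
    """
    adjusted = actuarial_rate + sum(adjustments.values())
    return min(1.0, max(0.0, adjusted))


# Invented example: the examiner subtracts 5 points for completed treatment
# and adds 10 points for a factor the scale does not score.
estimate = adjust_estimate(
    0.25,
    {"completed_treatment": -0.05, "unscored_risk_factor": 0.10},
)
print(round(estimate, 2))
```

Every value in the adjustments mapping comes from the examiner's judgment rather than from data, which is the commentators' point: once those values are added, the final figure is no longer purely actuarial.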

Finally, the Weight Method recognizes that sometimes actuarial risk assessment simply does not answer the precise question asked by the SVP court. The proper approach is to recognize that the actuarial information is relevant to, but not dispositive of, the legal question. The lack of precise fit is accounted for in the reduced weight given to the actuarial risk assessment information, but not in a "modification" of that information.

Prejudice

If ARA is materially more prejudicial than clinical assessments, it might not be illogical to exclude the former while admitting CRA. There are three potential sources of prejudice:

1. Concern that the scientific and statistical nature of actuarial assessments will unduly influence the fact-finder into giving it more weight and credibility than it deserves, and that the principle of "actuarial superiority" will exacerbate this tendency. This argument claims that the weaknesses of some ARA instruments are too complex for lay fact-finders to apprehend.

2. Some worry that juries will ignore the lack of "fit" between the actuarially derived risk and the legally relevant risk, thus giving ARA too much weight.

3. The "incriminating significance" of statistical probabilities is obscure.

Conclusion

In conclusion, actuarial risk assessment is yet another resource for determining whether or not a person should be subject to commitment as a sexually violent person. This science is subject to the rules of evidence as any other science, and can be properly bolstered or attacked at trial by a well-informed attorney.

Sources:

Forensic Use of Actuarial Risk Assessment with Sex Offenders: Accuracy, Admissibility, and Accountability, 40 Am. Crim. L. Rev. 1443, 1443 (2003).

Kansas v. Hendricks, 521 U.S. 346, 358 (1997).

State v. Post, 541 N.W.2d 115, 132 (Wis. 1995).

In Re Blodgett, 510 N.W.2d 910, 917 n.15 (Minn. 1994).

725 ILCS 207/5

William M. Grove et al., Clinical Versus Mechanical Prediction: A Meta-Analysis, 12 Psychol. Assessment 19, 19 (2000).

Howard E. Barbaree et al., Evaluating the Predictive Accuracy of Six Risk Assessment Instruments for Adult Sex Offenders, 28 Crim. Just. & Behav. 490, 492 (2000).

John Monahan et al., Rethinking Risk Assessment: The MacArthur Study of Mental Disorder and Violence 7 (2001).

Randy K. Otto & John Petrila, Admissibility of Testimony Based on Actuarial Scales in Sex Offender Commitments: A Reply to Doren, 3 Sex Offender Law Report 1 (2002); Terence W. Campbell, Sexual Predator Evaluations and Phrenology: Considering Issues of Evidentiary Reliability, 18 Beh. Sci. & L. 111, 128 (2000).
