Yesterday, I posed these questions:
Here I have a couple of urns. The one on the left contains 70 red balls and 30 black. The one on the right contains 30 red and 70 black.
While you weren’t looking, I reached into one of these urns and randomly drew out a dozen balls…4 of them were red and 8 were black.
1. If you had to guess, which urn would you guess I drew from?
2. What’s your estimate of the odds that you’re right?
3. Do you think you’re right beyond a reasonable doubt?
I stole this problem from the decision theorist Howard Raiffa, with some minor changes (he used bags instead of urns, and green and white balls instead of red and black — and he drew his twelve balls with replacement, rather than all at once, which has only a tiny effect on the probability). Here, with appropriate minor wording changes, is what Raiffa had to say:
At a cocktail party a few years ago, I asked a group of lawyers, who were discussing the interpretation of probabilistic evidence, what they would answer…
First of all, they wanted to know whether there was any malice aforethought on the part of the experimenter. I assured them of the neutrality of the experimenter, and told them it would be appropriate to assign a .5 chance to each urn.
“In this case”, one lawyer exclaimed after thinking awhile, “I would bet you drew from the left-hand urn”.
“No, you don’t understand”, one of his colleagues retorted. “The drawing was eight blacks and four reds, not the other way around”.
“Yes, I understand, but in my experience at the bar, life is just plain perverse, so I would bet on the left-hand urn! But I am not really a betting man.”
The other lawyers all agreed that this was not a very rational thing to do — that the evidence was in favor of the right-hand urn.
“But by how much?” I persisted. After a while a consensus emerged: The evidence is meager; the odds might go up from 50-50 to 55-45, but “…as lawyers we are trained to be skeptical, so we would slant our best judgments downward and act as if the odds were still roughly 50-50”.
The correct answer is about 98%. Yes, the balls were drawn from the right-hand urn beyond a reasonable doubt. This story points out the fact that most subjects vastly underestimate the power of a small sample. The lawyers described above had an extreme reaction, but even my statistics students clustered their guesses around .70.
Now the audience here at The Big Questions is substantially more sophisticated than most lawyers and even most statistics students, and therefore quite a few correctly calculated the probability at 98%. (In Raiffa’s experiment, where the balls were drawn with replacement, the answer is 96.7%, which I changed to 98% in the above quote.) Several commenters also worried, as did Raiffa’s lawyers, that I might not have chosen the urn according to the equivalent of a coin flip. Fair enough, though I did indeed mean for you to make this natural default assumption.
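For readers who want to check the 98% and 96.7% figures, here is a minimal Python sketch (not Raiffa's or the author's own calculation) that applies Bayes' rule with equal priors to both versions of the experiment: drawing the dozen balls all at once (a hypergeometric likelihood) and drawing with replacement (a binomial likelihood). The function names are illustrative.

```python
from math import comb

# Urn contents from the post: left = 70 red / 30 black, right = 30 red / 70 black.
LEFT = {"red": 70, "black": 30}
RIGHT = {"red": 30, "black": 70}


def likelihood_all_at_once(urn, red, black):
    """Hypergeometric probability of exactly `red` red and `black` black balls
    when the whole handful is drawn at once (no replacement)."""
    total = urn["red"] + urn["black"]
    return comb(urn["red"], red) * comb(urn["black"], black) / comb(total, red + black)


def likelihood_with_replacement(urn, red, black):
    """Probability of one particular sequence of draws with replacement.
    The binomial coefficient is identical for both urns, so it cancels below."""
    total = urn["red"] + urn["black"]
    return (urn["red"] / total) ** red * (urn["black"] / total) ** black


def posterior_right(likelihood, red, black, prior_right=0.5):
    """Bayes' rule with two hypotheses: P(right-hand urn | observed draw)."""
    right = likelihood(RIGHT, red, black) * prior_right
    left = likelihood(LEFT, red, black) * (1 - prior_right)
    return right / (right + left)


if __name__ == "__main__":
    print(round(posterior_right(likelihood_all_at_once, red=4, black=8), 4))
    print(round(posterior_right(likelihood_with_replacement, red=4, black=8), 4))
```

Run as a script, it should print values close to 0.98 and 0.967. In the all-at-once case the evidence favors the right-hand urn by a likelihood ratio of roughly 48 to 1.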
To make this result a little more graphic, suppose you had the opportunity, on the first of every month, to place a bet that’s as close to a sure thing as this one is. Then you’d lose your bet only about once every four years.
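The arithmetic behind “once every four years” is just the mean of a geometric distribution. A one-function sketch, assuming one independent bet per month with a 98% chance of winning (the helper name is hypothetical):

```python
def years_between_losses(p_win, bets_per_year=12):
    """Mean waiting time, in years, until the first lost bet when each independent
    bet wins with probability p_win (the mean of a geometric distribution)."""
    return 1 / ((1 - p_win) * bets_per_year)


print(round(years_between_losses(0.98), 2))  # 4.17
```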
Is Raiffa right that 98% is “beyond a reasonable doubt”? Given a reasonable interpretation of what “reasonable” means, I think the answer is pretty clearly yes. There’s not much in life that we can be more than 98% sure of.
If I were on trial for the crime of drawing from the right urn, I hope this evidence would be strong enough to convict me. If you’re unwilling to convict on this evidence, then you’re ipso facto willing to free 49 guilty men before you’ll convict a single innocent. According to the frequently cited Blackstone Standard, “it is better that ten guilty men escape than that one innocent suffer”. To let 49 guilty men escape is to go far above and beyond this standard.
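The conversion between a certainty threshold and a Blackstone-style ratio is plain odds arithmetic. The sketch below (hypothetical helper names) turns 98% into 49 guilty freed per innocent, and turns Blackstone's ten back into a threshold of about 91%.

```python
def guilty_freed_per_innocent(threshold):
    """Odds of guilt at a given certainty. Refusing to convict at this certainty
    means accepting that many guilty acquittals per innocent spared."""
    return threshold / (1 - threshold)


def threshold_for_ratio(n_guilty):
    """Inverse: the certainty implied by 'n guilty escape rather than one innocent suffer'."""
    return n_guilty / (n_guilty + 1)


print(round(guilty_freed_per_innocent(0.98)))  # 49
print(round(threshold_for_ratio(10), 3))       # 0.909, Blackstone's ten
```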
A few more remarks:
- While 10 guilty men might indeed be the industry standard, the legal scholar Sasha Volokh has documented a long tradition that encompasses a wide range of numbers, some as high as 100 or more. It is difficult for me to believe that the largest of those numbers were ever meant to be taken seriously.
- Note carefully the wording of the Blackstone Standard: “It is better that ten guilty men escape than that one innocent suffer.” That is not at all the same thing as saying “It is better that ten guilty men escape than that one innocent be convicted”. A false conviction is indeed a form of suffering, but so is victimization at the hands of an acquitted criminal (or a criminal emboldened by the difficulty of obtaining convictions). So even by the Blackstone Standard, in order to minimize the suffering of innocents, it’s quite plausible you’d want to convict on substantially less than 90% certainty.
- Indeed, I learned from yesterday’s comments that as an empirical matter, potential jurors appear to set their cutoff for conviction at something like 70-74% certainty.
- 70-74% certainty sounds like roughly the right standard to me in a world where the police can be counted on not to take advantage of that standard by falsifying evidence against people they don’t like. Given that prospect, though, I think I prefer something a little tougher — though not as tough as 98%. I addressed this point in considerably more detail in my book More Sex is Safer Sex.
- One commenter suggested that if we adopted a “98%” standard, then 1 out of every 50 people on death row would be innocent. That’s not true, because under that standard, we’d convict everyone who’s 98% sure to be guilty plus everyone who’s 99% sure, plus everyone who’s 99.5% sure, and so forth. So among the entire convicted population, the fraction of innocents would likely be well below 2%.
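To see why the innocent share among the convicted falls below 2%, take a set of posterior probabilities of guilt, convict everyone at or above the cutoff, and average the chance of innocence over those convicted. The posteriors below are invented for illustration; they are not data from the post or from any court.

```python
def expected_innocent_share(posteriors, threshold):
    """Convict everyone whose posterior P(guilty) is at least `threshold`; return the
    expected fraction of the convicted who are in fact innocent."""
    convicted = [p for p in posteriors if p >= threshold]
    if not convicted:
        return 0.0
    return sum(1 - p for p in convicted) / len(convicted)


# Hypothetical posteriors, invented for illustration only.
cases = [0.60, 0.85, 0.98, 0.99, 0.995, 0.999, 0.9999]
print(round(expected_innocent_share(cases, 0.98), 5))  # 0.00722
```

With these made-up numbers the expected innocent share is about 0.7%, well under the 2% that the cutoff alone might suggest.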
It depends on how likely prosecutors are to find the right person to put on trial.
“One commenter suggested that if we adopted a “98%” standard, then 1 out of every 50 people on death row would be innocent. That’s not true, because under that standard, we’d convict everyone who’s 98% sure to be guilty plus everyone who’s 99% sure, plus everyone who’s 99.5% sure, and so forth. So among the entire convicted population, the fraction of innocents would likely be well below 2%.”
Much, much below 2%. There would be many more certainties above 99% than in the 98% range. Suppose the evidence takes the form of a random number of balls (not just 12, but anywhere from 0 to 50). I’m too lazy to do the math, but I’d bet any number of balls over 25 would give a probability of over 99% (for one urn or the other) more than 9 times out of ten.
That is: there are lots and lots of cases where the evidence is so overwhelming that the probability is over 99.999%. But it would be hard to come up with a set of facts (real life, not urns) where the probability would be between 97.5% and 98.5%.
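Lukas's bet can be checked numerically. This sketch makes simplifying assumptions that are not his: a fixed number of draws n, drawn with replacement from mirrored urns that are 70% one color, and equal priors. It returns the probability that the draw pushes the posterior for one urn or the other above 99%.

```python
from math import comb


def share_decisive(n, p=0.7, cutoff=0.99):
    """Probability that n draws (with replacement, equal priors) push the posterior
    for one urn or the other above `cutoff`.

    Assumes mirrored urns (a fraction p of one color in the first, 1 - p in the
    second), so the likelihood ratio depends only on d = majority - minority
    count: it is (p / (1 - p)) ** |d|.
    """
    ratio = p / (1 - p)
    total = 0.0
    for k in range(n + 1):  # k = balls of the true urn's majority color
        d = k - (n - k)
        lr = ratio ** abs(d)
        favored_posterior = lr / (lr + 1)
        if favored_posterior > cutoff:
            total += comb(n, k) * p**k * (1 - p) ** (n - k)
    return total


if __name__ == "__main__":
    for n in (12, 26, 40, 50):
        print(n, round(share_decisive(n), 3))
```

Looping over `range(26, 51)` tests the “nine times out of ten” guess directly under these assumptions; drawing without replacement from the 100-ball urns would give somewhat different numbers.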
It is as Lukas says. I suppose another way to think about it is if we changed to “balance of probabilities”, or a 50% burden of proof. If the prosecutors brought the same cases to trial that they do now, then all those who are currently convicted would be convicted, plus a few others who narrowly escape under the reasonable-doubt system. However, prosecutors will of course bring new cases under the new system. As soon as enough evidence can be accumulated to get a reasonable chance of conviction, it would make sense to prosecute that person. The important factor is how many people will “fit the bill” for a 50% chance. If these outnumber the guilty one, then there could well be many more innocent people in prison.
An example: a classic murder mystery in a stately home. The patriarch is murdered! Of course, we all know that the butler did it, but each of the 10 guests had means, motive and opportunity. In a 50% system, you could prosecute any one of the 10 guests, plus the butler, and convict any one of them. The chances of getting the right man are slim. The prosecutors will stop looking when they have enough to prosecute one person.
Harold:
In a 50% system, you could prosecute any one of the 10 guests, plus the butler, and convict any one of them.
No you couldn’t. Given the facts as you’ve stated them, each guest has a 10% chance of being guilty.
As a mathematical problem, the probabilities are appropriate. As a real-world “reasonable doubt” problem, more information is needed.
For instance, how were the urns filled? Suppose that whoever filled them first counted out and placed the majority color into the urn, then counted off and inserted the minority color. This could have been done with no malice, simply as a way to make the job easier.
In this case, our “random selection” is biased towards the minority color, as it’s occupying the top layer of the candidate marbles.
I would hope that in a legal liability situation, this possibility would be checked, as it could result in a high probability of false conviction.
Steve, I meant it to be clear that the butler did it.
Harold: Then I should have said 9%.
No. The butler has a 100% chance of being guilty, since he is the one who did it. All the others are innocent, but they have a 9% chance of being convicted.
Some more waffle to get it sorted in my own head. Let’s say that the 10 people have varying amounts of evidence against them, ranging from 60% for the nephew to 97% for the wife and 99% for the butler. If you set your burden of proof at 50%, you could successfully prosecute any one of them. Say the police interview the nephew first. “OK,” says Inspector Knacker, “I think we have our man. No need to look any further.” He could say this whoever he picked first. With a 98% threshold, everyone is out except the butler. Inspector Knacker must continue to gather and sift the evidence, or perhaps call in someone with a few more little grey cells.
So the important criterion for determining how many innocent people will end up in prison is the ratio of innocent people who could be convicted to guilty ones. I don’t know what this is, but it is clearly higher the lower the burden of proof. If it follows a normal distribution, then it will rise very rapidly as you come down from the extremes.
I believe the probability of guilt given the evidence only needs to rise above 1/2 provided that my prior reflects a fair understanding of ‘reasonable doubt’ and ‘presumed innocence’ for the particular situation.
Suppose, for instance, that a person was arrested for the crime of antirationalism, and physical evidence was gathered and sent to the lab to be analyzed. The test results were positive. Furthermore, we know that the tests are positive for 100% of guilty people:
Pr[positive test | guilty] = 1.0
and for 10% of innocent people:
Pr[positive test | innocent] = 0.1
The evidence based on this test is 10:1 in favor of guilt, so is it reasonable to convict?
This person has no prior record, and is from a demographic in which only 1 in 20 people commit the crime of antirationalism. In this case, I would say the evidence does not go beyond reasonable doubt, because, if I apply a prior probability for innocence of 0.95, then
Pr[guilty | positive test] = 0.34.
I would convict, however, if I was told that only 5% of innocent people showed a positive test:
Pr[positive test | innocent] = 0.05,
because then the evidence would be 20:1 in favor of guilt, and this would be enough to overcome my presumption of innocence and result in
Pr[guilty | positive test] = 0.513.
That would be enough for me because I included the prior in my calculations, and the prior allows me to account for my understanding of what ‘reasonable doubt’ and ‘presumed innocence’ should mean.
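Thomas Bayes's two posteriors follow from one line of Bayes' rule. A minimal sketch using his stated numbers (prior probability of guilt 0.05, a test that always flags the guilty, and false-positive rates of 10% and 5%); the function name is illustrative:

```python
def posterior_guilt(prior_guilty, p_pos_if_guilty, p_pos_if_innocent):
    """P(guilty | positive test) by Bayes' rule."""
    guilty = prior_guilty * p_pos_if_guilty
    innocent = (1 - prior_guilty) * p_pos_if_innocent
    return guilty / (guilty + innocent)


print(round(posterior_guilt(0.05, 1.0, 0.10), 3))  # 0.345
print(round(posterior_guilt(0.05, 1.0, 0.05), 3))  # 0.513
```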
The fact of the matter is that in the real world, you don’t have the luxury of knowing the distributions of each urn, if the problem really is that simple to begin with (in which case you’d bring on an expert statistician). You can only make educated guesses, and any deductions made therein are then based on those estimates which introduces, you know, reasonable doubts. That’s why the lawyers questioned the integrity of the experiment itself. If you know the population characteristics, which you never do, you’re probably omniscient anyway, in which case you hardly need the courts at all.
Harold:
Let’s say that the 10 people have varying amounts of evidence against them, ranging from 60% for the nephew to 97% for the wife and 99% for the butler
The probabilities must add to one.
Thomas Bayes: I interpret “reasonable doubt” to apply to your posterior, so all the info about your prior is already built into it. Given that, I think all we need to describe our standard is a single number.
Harold: Let’s say that the 10 people have varying amounts of evidence against them, ranging from 60% for the nephew to 97% for the wife and 99% for the butler
Isn’t “the amount of evidence against” a person just the probability that they did it given the available evidence? So shouldn’t these numbers sum to 1?
I am being unclear. I am saying that the amount of evidence against each person is sufficient to obtain a conviction with the burden of proof quoted. For the nephew, say, he was there, he was alone with the victim for 30 minutes before lunch, he stood to gain a small inheritance, and he had a small blood spot on his trousers. The prosecutor reckons this is enough to get a conviction if the burden of proof is 60%. Since it is actually 50%, the prosecutor decides to go ahead with the trial, and has a good chance of convicting.
Of course, he should also investigate the other guests, and the butler, but since he has already got his conviction, he has no incentive to do so.
In my hypothetical case, lowering the burden of proof from 98% to 50% has meant that instead of 1 person being convictable, 11 people are. The greater the amount of “doubt” allowed, the less likely it is that the person in the dock is guilty.
You are a member of the jury in the prosecution of Prisoner Jo for murder. A video camera recorded a riot in which all but one of the prisoners in a yard attack and kill all the guards present. The video does not permit the viewer to determine which prisoner did not participate. Prisoner Jo claims to have been that prisoner.
Do you vote to convict on this evidence? Would your answer change if the number of prisoners in the riot was 9? Would your answer change if the number of prisoners in the riot was 90?
Imagine that the prosecutor introduces the testimony of Prisoner Mo. Mo also claims to have been the one prisoner that did not participate in the attack, thereby rebutting the testimony of Prisoner Jo. The prosecution concedes that it entered into a plea deal wherein the prosecution will not seek to convict Prisoner Mo on the condition that Prisoner Mo testify against all the other prisoners. Does this evidence alter your analysis?
I sense that legal theorists would decline to convict purely on probabilistic evidence alone, even if very strong, if they could not find some additional fig leaf behind which to justify their vote. But probabilistic evidence combined with the barest, most self-serving scintilla of additional evidence would suffice for conviction.
(Variation on the theme: The crotchety old newspaper editor tells his ambitious young reporter, “Nice work. But when you’ve been in the business as long as I have, you know never to run a story saying ‘All elected officials are crooks’; we’d be tied up in defamation suits for years. So I’m altering the story to say, ‘All elected officials except one are crooks….’”)
Only on the prosecutor’s side. The jury gets to examine a set of facts selected by the prosecution and the defense, and that set of facts depends on who the prosecution decides to indict.
Harold: The definition of a 90% standard is that it’s a standard according to which 90% of those who are convicted are actually guilty.
The existence of the other ten guests is evidence that goes into the question of whether any one guest meets that standard.
If you’re saying that by lowering the standard, we invite the police to suppress relevant evidence (such as the existence of the other guests) then I agree with you — and I mentioned this concern in my post. But *if the standard were properly enforced* then none of your guests would meet a 90% standard.
nobody.really: Surely if there were a million prisoners in the yard, and it was known that all but one were guilty, nobody would have any hesitation about convicting any one of them. So we’re arguing about where the cutoff is, not about a separate principle.
I think Harold is talking about the case where we interview 1 person and, GIVEN ONLY THIS, think he is likely to have done it. More formally, let N be the event of interviewing the nephew, W the event of having interviewed the wife and B the event of having interviewed the butler. Then there is no particular reason for
P[Nephew did it | N] + P[Butler did it | B] + … + P[Wife did it | W] to be 1.
Of course P[Nephew did it | N] + P[Butler did it | N] + … + P[Wife did it | N] = 1, and if T represents having interviewed everyone, P[Nephew did it | T] + P[Butler did it | T] + … + P[Wife did it | T] = 1.
Now in real life we’d like to interview everyone, but there are limits to how much we can investigate (interviewing everyone in a 100 mile radius might be excessive). So I think Harold’s point is that the order in which we gather the evidence matters.
There is an interesting follow-up question:
If we assume that the lower limit for reasonable doubt is 70%, what is the chance that a jury of 12 people will unanimously vote to convict an innocent man?
(Is it 0.3^12 or roughly 1 in 2 million? It’s been a long time since I worked with probabilities.)
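Under the commenter's implicit assumption, that each of the twelve jurors independently has a 0.3 chance of voting to convict an innocent defendant, the unanimous-error probability is a single power. Independence is a strong assumption here, since all twelve jurors hear the same evidence; the sketch only confirms the arithmetic.

```python
def unanimous_error(p_single=0.3, jurors=12):
    """Chance that every juror errs, ASSUMING each errs independently with p_single."""
    return p_single ** jurors


p = unanimous_error()
print(p, round(1 / p))  # about 5.3e-07, i.e. roughly 1 in 1.9 million
```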
To clarify, my above post assumes we decide to collect evidence until we’re somewhat sure of who did it, as opposed to deciding on a set amount of investigating, doing it, and leaving it there.
Jonathan Kariv: Be that as it may, with a 90% standard, 90% of those convicted will be guilty — by the definition of a 90% standard (and barring unscrupulous manipulation).
Lukas:
Only on the prosecutor’s side. The jury gets to examine a set of facts selected by the prosecution and the defense, and that set of facts depends on who the prosecution decides to indict.
Yes, exactly. Though as long as the prosecutor presents the evidence honestly, the jury knows everything the prosecutor knows, so this is true on the jury’s side also. Your concern is not with the standard itself, but with the ease with which the standard can be manipulated — a separate issue that I did address in the original post.
As a conceptual matter, I don’t know that I agree.
As a practical matter, I know I disagree. When we observe murder conducted by very large numbers of people, I sense that we generally prosecute a few leaders at most, and occasionally convene Truth & Reconciliation Commissions for the rest. One murder is a crime; a million murders is a statistic.
@9 Thomas Bayes:
“I believe the probability of guilt given the evidence only needs to rise above 1/2 provided that my prior reflects a fair understanding of ‘reasonable doubt’”
You’re using the wrong standard. What you have defined is “preponderance of the evidence”, the lowest standard used in civil suits. That’s very different from the “beyond a reasonable doubt” standard used in criminal cases.
To all: Your “odds are strong that this is correct” is the “Clear and convincing proof” standard. Again, not good enough for criminal cases. E.g.: Person A claims to be a Powerball Lottery winner. That must be a lie, since the odds are so amazingly low for any given person to be a winner.
“Beyond a reasonable doubt” doesn’t mean “very likely”. It’s been equated to “you better be damned sure”. Using the previous example, it’s the prosecution’s responsibility to prove that person A did not actually win the lottery.
Also note that the “10 guilty men” is a rhetorical statement of policy, not a setting of a mathematical limit.
Nick: What you are saying is that lawyers, in practice, must solve problems far more difficult than this one.
That, I think, should make Raiffa’s experience more disturbing, not less. If lawyers can’t even solve the simple problems, why would we ever expect them to solve the hard ones?
I think what Harold is trying to say is that if at first glance you are 90% sure the person you interview is guilty, then you would go on to prosecute. But on further investigation, it turns out there was another person in the room, meaning you’re now only 49% sure that the first person is guilty.
I guess this turns out to be the opposite of what Harold thinks. Whether you require 89% or 50% confidence, you would prosecute straight away. But if further investigation shows you are only 49% confident, you would not prosecute. That just says that if more evidence casts more doubt on the person’s guilt, you are less likely to prosecute, whatever your definition of “reasonable doubt”.
I think I am working toward saying that the lower the standard, the harder it is to “properly enforce”. I am also getting mixed up between “90% who are actually guilty” and “10% doubt” in the mind of the juror. I am saying that if you lower the standard then more people become potential culprits. It is not suppression of evidence by the police, but a lowering of the quality of the investigation. It is an inescapable aspect of human nature. Sure, the proper way to conduct the investigation would be to gather all evidence and then decide who is most likely to have done it. If this were done, then the butler would be prosecuted. I am saying that it is much less likely that the butler will be investigated if there are several other people who can be convicted. This would all be done with the best intentions on the part of the police. Evidence is not suppressed, merely never gathered. Why would you carry on with an expensive investigation if you already have your man?
It would then be incumbent on the defence to offer the evidence that there were other people who could have done it. The defence is in no position to conduct this enquiry.
I will think on this some more.
The standard for reasonable doubt is not supposed to change based on the potential penalty associated with a guilty verdict, but in practice I think juries require a higher standard of proof for harsher punishments. An economist might endorse the idea of minimizing the total cost of errors rather than just the number of errors.
The only error I’m considering here is a false conviction. I have no idea if more draconian punishments have correspondingly higher benefits to society, but I doubt it.
If confronted with evidence that is based solely on mathematical analysis and that analysis provides that a circumstance exists in which a person is not guilty … that is reasonable doubt. This is because the evidence fails to prove anything. Thus, the prosecutor must go find some evidence connecting a person to the crime. Absent that, the case must be thrown out.
So … in answer to your question, the evidence failed to prove anything. It only demonstrated the likelihood of different scenarios. Big deal. Now you must offer proof.
Clifford Nelson: So are you saying you’d never convict anyone? Or are you saying you’d convict people who have a 96% chance of being guilty based on one kind of evidence, but not those who have a 98% chance based on another kind — in other words, that you’d prefer a world with more false convictions?
Steve: “The definition of a 90% standard is that it’s a standard according to which 90% of those who are convicted are actually guilty.”
Isn’t this a different thing? The probability of guilt given a conviction is a characteristic of the decision scheme, and is traded off with the probability of guilt given an acquittal. This is a ‘performance characteristic’ for the decision scheme; it is not equal to the probability of being guilty given the evidence.
Let’s look at the original example and say that a guilty person is one who draws their 12 balls from urn 1. A conviction is then guessing urn 1 based on the number of black balls in the 12 total. With equal priors for ‘guilt’ and ‘innocence’, there seems to be much agreement that 8 black balls are enough to ‘convict’, and this is because the posterior is sufficiently large:
P(urn 1 | 8 black, 4 red) = 0.98.
What if we see 7 black and 5 red? Evidently, this would not meet a 90% standard:
P(urn 1 | 7 black, 5 red) = 0.87.
What about 6 black and 6 red? No way:
P(urn 1 | 6 black, 6 red) = 0.50.
But what if my scheme is to ‘convict’ every time I see 6 or more black balls? Will this ensure that 90% of those who are convicted are actually guilty? I believe the answer is yes:
P(urn 1 | 6 or more black balls in 12) = 0.9036
P(urn 1 | 5 or fewer black balls in 12) = 0.0317
So, 90% of the people I convict will be guilty, and 3% of the people I acquit will be guilty.
If I change my scheme to ‘convict’ every time I see 8 or more black balls, then:
P(urn 1 | 8 or more black balls in 12) = 0.9920
P(urn 1 | 7 or fewer black balls in 12) = 0.2112
and 99% of my convictions will be guilty, but 21% of my acquittals will be guilty.
In between these I could set my threshold at 7 black balls:
P(urn 1 | 7 or more black balls in 12) = 0.9683
P(urn 1 | 6 or fewer black balls in 12) = 0.0964
so that 97% of my convictions will be guilty, as will 10% of my acquittals.
If it is sufficient that 90% of the convicted are guilty, then I would use a scheme that set my threshold at 6 black balls because that would meet the standard and have the lowest chance of acquitting a guilty person.
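The six figures above can be reproduced by summing hypergeometric probabilities on each side of the cutoff. A sketch assuming, as in the comment, equal priors, twelve balls drawn at once, and that “urn 1” (guilty) holds 70 black and 30 red while the other holds 30 black and 70 red; the helper names are illustrative.

```python
from math import comb


def black_count_pmf(black_in_urn, total=100, draws=12):
    """Hypergeometric pmf of the number of black balls among `draws` balls taken at once."""
    red_in_urn = total - black_in_urn
    return [comb(black_in_urn, k) * comb(red_in_urn, draws - k) / comb(total, draws)
            for k in range(draws + 1)]


URN1 = black_count_pmf(70)  # 'guilty' urn: 70 black, 30 red
URN2 = black_count_pmf(30)  # 'innocent' urn: 30 black, 70 red


def scheme(threshold, prior_urn1=0.5):
    """Convict when at least `threshold` black balls appear.
    Returns (P(urn 1 | convicted), P(urn 1 | acquitted))."""
    conv1 = sum(URN1[threshold:]) * prior_urn1
    conv2 = sum(URN2[threshold:]) * (1 - prior_urn1)
    acq1 = sum(URN1[:threshold]) * prior_urn1
    acq2 = sum(URN2[:threshold]) * (1 - prior_urn1)
    return conv1 / (conv1 + conv2), acq1 / (acq1 + acq2)


for t in (6, 7, 8):
    guilty_convicted, guilty_acquitted = scheme(t)
    print(t, round(guilty_convicted, 4), round(guilty_acquitted, 4))
```

Raising the threshold buys a higher share of guilty convictions at the price of more guilty acquittals, which is the trade-off the comment describes.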
One factor I think we need to consider, though, is how our confidence in the answer can change with time.
Suppose a year after the Great Urn Robbery, someone develops fingerprinting techniques that can tell us with 99% accuracy which urn you took the balls from. If it confirms the right one, no biggie, but if it says the left one, then it starts to get trickier.
To me, this is a strong case against the death penalty (though not against convicting people, obviously). Once you kill someone, if you find out they’re innocent, it’s hard to go back and say ‘Oops, my bad!’ If they’re in jail, we could at least slightly redeem ourselves as a population.
I realize that this is only tangentially related, but I think it needs to be factored in. Forensics is becoming more and more powerful in ways that would have been unheard of a few decades ago. DNA evidence has already exonerated people on death row and neuroscientists are working on crazy new techniques that could be just as standard in the future. However sure we are today, tomorrow, our opinions could be radically different.
This analysis is incorrect.
Given the assumption that you started out equally likely to pick from one of the two urns, the fact that you got what you got means that there is a 98% probability that you picked the second urn.
In the equivalent criminal case, this assumption is a bad assumption.
Imagine, say, that you match a suspect’s blood against the blood found at the scene of the crime. This blood matching test has a 98% chance of being accurate.
If you have two suspects, you know that one of them committed the crime, and only one suspect has matching blood, it is very likely that that suspect committed the crime (in fact, a chance even higher than 98%, because you are using the test twice, once on each suspect.)
However, if you start grabbing people off the streets, check their blood, and take the first person you find who fails the blood test, then say “he must be guilty because he matches the blood found at the scene of the crime with a 98% probability”, you’re wrong. The chance that he is guilty is almost nil, because it is not equally likely that a person grabbed off the streets is guilty or innocent, so even ruling out 98% of the innocent people still leaves you with a lot more innocent than guilty.
Or put it another way: A test to detect a fatal disease is 98% accurate. You take the test and it says you are sick. What is the chance that you actually are sick? Assuming that the disease is rare in the first place, the 2% chance of false positives is applied to a much larger group of people than the 98% chance of accurate positives and your chance of being sick is quite low. If the disease has a 1 in 1000 prevalence, then out of every 50000 people the test will find 49 out of 50 diseased persons, but it will also give positives to 999 out of 49950 healthy persons. The chance that given a positive result you’re actually sick is 49 / (999 + 49) or about 4.7%.
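The disease example is a positive-predictive-value calculation. A minimal sketch with the comment's numbers (prevalence 1 in 1000, 98% of sick people test positive, 2% of healthy people test positive); the function name is illustrative:

```python
def positive_predictive_value(prevalence, sensitivity, false_positive_rate):
    """P(sick | positive test) by Bayes' rule."""
    true_pos = prevalence * sensitivity
    false_pos = (1 - prevalence) * false_positive_rate
    return true_pos / (true_pos + false_pos)


print(round(positive_predictive_value(1 / 1000, 0.98, 0.02), 4))  # 0.0468
```

This matches the comment's head count of 49 / (999 + 49).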
Super-Fly,
I would use the new information in my ‘prior’. No longer would I believe it was equally likely that he drew from each urn. The prior for urn 2 would now be
Pr[urn 2 | fingerprints on urn 1] = 0.01,
and the posterior would be:
Pr[urn 2 | 8 black, 4 red, fingerprints on urn 1] = 0.33,
or about one in three. This is one reason why it is good to understand the role of priors. They can be used to see how sensitive your decision method is to other information you may or may not learn. By the way, the 99% reliable fingerprints on urn 1 wouldn’t be enough to overcome seeing 10 black and 2 red balls.
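The fingerprint update is easiest in odds form: posterior odds equal prior odds times the likelihood ratio from the balls. In that comment, “urn 2” is the 70-black urn favored by the draw. A sketch assuming balls drawn all at once from the 100-ball urns; function names are illustrative.

```python
from math import comb


def likelihood_ratio(black, red, total=100, heavy_black=70, light_black=30):
    """P(draw | 70-black urn) / P(draw | 30-black urn) for balls drawn all at once.
    The shared denominator comb(total, black + red) cancels, so it is omitted."""
    heavy = comb(heavy_black, black) * comb(total - heavy_black, red)
    light = comb(light_black, black) * comb(total - light_black, red)
    return heavy / light


def update(prior, lr):
    """Odds-form Bayes: posterior odds = prior odds * likelihood ratio."""
    odds = prior / (1 - prior) * lr
    return odds / (1 + odds)


# Prior of 0.01 for the 70-black urn after fingerprints point at the other one.
print(round(update(0.01, likelihood_ratio(8, 4)), 2))   # 0.33
print(round(update(0.01, likelihood_ratio(10, 2)), 2))  # 0.96
```

The 8-black draw carries a likelihood ratio of about 48, which against prior odds of 1 to 99 leaves roughly one chance in three; a 10-black, 2-red draw carries a ratio of about 2,400, enough to outweigh the fingerprints.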
It strikes me that the original Law School Admissions Test has morphed from a test of lawyer and law student probability/statistical skills (no doubt pitiful) into a discussion of what level of proof we need to convict.
Steve:
“If I were on trial for the crime of drawing from the right urn, I hope this evidence would be strong enough to convict me.”
Yes, but what if you were on trial for a felony where there is a penalty for conviction, would you still be so hopeful?
I am saying that to convict you need evidence that connects the accused to the crime. The fact that he or she can be included in a very small group of people that all “may” have committed the crime is of no significance. Thus, even if I heard evidence that the accused was 1 of only 10 people (out of the billions of people on this planet) who could have committed the crime I would acquit. Think about it this way: If I didn’t acquit then all ten should be convicted for one crime.
In fact, your example left no room for the prosecution to even argue for a conviction since the evidence itself (without any doubt) left the jury unable to distinguish between more than one person as to their guilt of the alleged crime.
Ken Arromdee:
I’m not sure why you think your examples contradict the analysis.
The question I’m asking is: Suppose you’ve got a defendant who, based on a correct Bayesian calculation, is 98% likely to have committed the crime. Should you convict him?
You are making a point about how to do the correct Bayesian analysis. That’s fine, but it’s not (I think) in conflict with anything I’ve asked or said.
There is a big difference between saying that people will vote to convict at 70-74% certainty and saying they will vote to convict where there is a 70-74% probability of guilt. For example, in the case of the urn the consensus would have been that guilt was about 50-55% certain, whereas the actual probability of guilt would be 98%. My guess is that people’s assessment of their own certainty of guilt and the actual probability of guilt are not highly correlated.
Another issue is that guilty verdicts have to be unanimous, which means all twelve people would have to have a 70%+ level of certainty. In practice, that means that the average level of certainty for guilty verdicts will be higher, and perhaps significantly higher, than 70%.
Thomas Bayes writes:
Suppose, for instance, that a person was arrested for the crime of antirationalism, and physical evidence was gathered and sent to the lab to be analyzed. The test results were positive. Furthermore, we know that the tests are positive for 100% of guilty people:
Pr[positive test | guilty] = 1.0
and for 10% of innocent people:
Pr[positive test | innocent] = 0.1
The evidence based on this test is 10:1 in favor of guilt, so is it reasonable to convict?
The answer depends entirely on a variable we haven’t yet addressed: what fraction of accused people are guilty. If that fraction is small, then most of the hits will be false positives. Bruce Schneier has treated this problem at greater length:
http://www.schneier.com/blog/archives/2006/03/data_mining_for.html
John David Galt wrote:
—
The answer depends entirely on a variable we haven’t yet addressed: what fraction of accused people are guilty. If that fraction is small, then most of the hits will be false positives. Bruce Schneier has treated this problem at greater length:
—
I’m not sure why you stopped my quote where you did. This was precisely the point I was trying to make with the next words I wrote after the spot where you ended my quote:
“This person has no prior record, and is from a demographic in which only 1 in 20 people commit the crime of antirationalism. In this case, I would say the evidence does not go beyond reasonable doubt, because, if I apply a prior probability for innocence of 0.95, then
Pr[guilty | positive test] = 0.34.”
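A minimal Python sketch of the calculation in that quote, using its numbers (a test that is positive for all guilty people and for 10% of innocent people, and a prior probability of guilt of 0.05). The loop shows how the answer swings with the base rate of guilt, which is the false-positive point; the extra prior values there are arbitrary illustrations.

```python
def posterior_guilt(prior_guilty: float,
                    p_pos_given_guilty: float = 1.0,
                    p_pos_given_innocent: float = 0.1) -> float:
    """Bayes' rule: P(guilty | positive test)."""
    true_pos = prior_guilty * p_pos_given_guilty
    false_pos = (1 - prior_guilty) * p_pos_given_innocent
    return true_pos / (true_pos + false_pos)


# The quoted example: 1 in 20 people in the demographic are guilty.
print(round(posterior_guilt(0.05), 2))  # prints 0.34

# Same test, different base rates (illustrative values only).
for prior in (0.01, 0.05, 0.25, 0.5, 0.9):
    print(f"prior {prior:.2f} -> posterior {posterior_guilt(prior):.3f}")
```

Equivalently, in odds form: prior odds of guilt of 1:19, multiplied by the 10:1 likelihood ratio, give posterior odds of 10:19, which is 10/29, or about 0.34.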
Prof. Landsburg, do you think the epistemological issues highlighted by the Gettier Problem have any bearing on this discussion?
When faced with this scenario in real life, any decent lawyer would hire a statistics expert to explain how the balls relate to quantitative probabilities before opining one way or t’other. Any lawyer who gave you a different answer than this is either a statistics expert in her own right or else not a good lawyer.
… or to put it another way (in my above example) you would have a 10% chance of convicting the person responsible and 90% chance of convicting an innocent person.
I would say a 90% chance that the accused is innocent is a reasonable doubt.
BTW, after I posed the question, I wondered if anyone had ever considered Gettier problems in a legal context, and after some searching, I found this: Michael S. Pardo, The Gettier Problem and Legal Proof.
I find that I have to look at a slightly different problem in order to think about this better – confidence in the evidence alone wasn’t giving me enough. The following probabilities are generated to solve for 10 guilty people going free for each innocent person sentenced. The analysis starts after the indictment…
Probability Indicted people are guilty: .85
Probability Indicted people are innocent: .15
Probability declared innocent given guilt: .6
Probability declared guilty given guilt: .4
Probability declared innocent given innocence: .992
Probability declared guilty given innocence: .008
The first four probabilities are assumed in order to get the last two. I think they are in the range of believability. I assumed that once you are indicted there is a very good chance you are guilty (p=.85). That is an important probability or the whole enterprise becomes difficult to maintain in remotely believable numbers. Next, I wanted to make sure that any solution had P(Innocent given guilt) < P(Innocent given innocence). That seemed only fair. One would hope that the innocent are more likely to be declared innocent — it also kept me from totally unrealistic solutions.
Folks can check my math if they want. If my probability of 10 men going free is right, then I feel pretty good about it. That calculation goes as follows: (.85*.6)**10
I'm startled by how often the guilty need to be found innocent to make this work. Also, when I look at my numbers I doubt very much that 99.2% of the innocent are found innocent. However, in order to lower that 99.2% number, we must increase the number of guilty being found innocent. This exercise shows the difficulty in living up to that 10 vs 1 ideal.
Once you have this loaded into a spreadsheet you can change up the first four probabilities and see how your last two are affected.
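For anyone without a spreadsheet handy, here is a rough Python equivalent. It uses the assumed probabilities from the comment and appears to reproduce its method: set the chance that ten indicted people in a row are guilty and acquitted, (.85*.6)**10, equal to the chance that one indicted person is innocent and convicted, then solve for P(declared guilty | innocent). The last line is a different reading of the 10-to-1 ideal (expected acquittals of the guilty per expected conviction of an innocent) and is not the comment's calculation.

```python
# Assumed inputs from the comment above.
p_guilty = 0.85                 # P(indicted person is guilty)
p_innocent = 1 - p_guilty       # P(indicted person is innocent) = 0.15
p_acquit_given_guilty = 0.6     # P(declared innocent | guilty)

# The comment's criterion: ten guilty people going free in a row should be
# exactly as likely as one innocent person being convicted.
p_ten_guilty_go_free = (p_guilty * p_acquit_given_guilty) ** 10

# Solve p_innocent * x = p_ten_guilty_go_free for x = P(declared guilty | innocent).
p_convict_given_innocent = p_ten_guilty_go_free / p_innocent

print(f"(.85*.6)**10                    = {p_ten_guilty_go_free:.6f}")
print(f"P(declared guilty | innocent)   = {p_convict_given_innocent:.4f}")
print(f"P(declared innocent | innocent) = {1 - p_convict_given_innocent:.4f}")

# Alternative reading (not the comment's): expected guilty acquittals per
# expected innocent conviction, using the rounded 0.008 from the comment.
ratio = (p_guilty * p_acquit_given_guilty) / (p_innocent * 0.008)
print(f"guilty freed per innocent convicted = {ratio:.0f}")
```

Under the comment's criterion the numbers check out (about 0.0079, which rounds to .008). Under the expected-count reading, the same inputs free roughly 425 guilty people per innocent person convicted, far more than 10.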
Dave W: If the lawyer is advocating for his client’s best interest, then he will be incentivized to try to trick the jury and give a bogus sense of the statistics of the situation. Alan Dershowitz, generally thought to be a good lawyer, explained a bogus version of the statistics in the O.J. case, and he is heralded for helping his client:
http://expertvoices.nsdl.org/cornell-info204/2010/04/13/bayes-rule-misapplied/
I actually don’t know whether Dershowitz was trying to trick the jury or just confused, but either way nobody has suggested that he is not a good lawyer because of this episode.
In cases where statistics experts are relevant, lawyers on both sides hire statistics experts. The statistics expert on each side can say what the statistics expert on the other side is doing wrong.
The OJ case had nothing to do with statistics.
The OJ case had everything to do with the fact that Los Angeles police lied too frequently over too long a period of time in the community from which the jury was drawn. We know that OJ did it. We know this because we know that somebody investigating the case other than police (i.e., a veritable army of journalists) would have dredged up some good evidence pointing away from OJ if he didn’t do it. Instead, the journalists were stymied. No policeman broke ranks and admitted that there was a conspiracy (because there was none). No DNA scientist admitted to faking test results or purposely contaminating samples (because that didn’t happen). About the only aspect of the case that was not fully explored was the nature of the relationship between Nicole Brown and Ron Goldman (presumably out of some kind of respect for their memories).
However, the jury is not allowed to consider the very factors that make us so sure that OJ did it. The only thing the jury can consider is the word of the police. Police in that time and place had tarnished their credibility so badly and so repeatedly that they were dismissed out of hand. Moreover, that was the right thing to do, if the jurors really felt that way about police. It simply doesn’t take many frame-up jobs before there can be no justified convictions in any case (or at least any case that relies on police testimony).
If group x has a history of lying, and a conviction absolutely depends on the testimony of group x, then there can be no conviction because there is reasonable doubt.
It might have been a different case if the prosecution had brought in the army of disinterested journalists to testify — that is how we know there was no police conspiracy here — but the prosecution didn’t do that, didn’t consider doing that, didn’t want to dignify the jury’s obvious concerns about the police. The prosecution paid the price for that bit of hubris and OJ reaped a windfall.
And let’s do a thought experiment. Imagine that at his cocktail party, Professor Landsburg had asked the lawyers the question about the balls and they all gave the correct answer, to wit, “I would consult a statistics professor to quantify before I formed my own opinion.”
In this thought experiment what is the probability (p) that Professor Landsburg would have blogged about the conversation here at his blog?
Dave W.:
You say, “The OJ case had nothing to do with statistics.”
What does this mean? The jurors were tasked with assessing the probability of O.J.’s guilt, taking into consideration all factors presented including evidence of prior dishonesty on the part of the police.
You say, “The only thing the jury can consider is the word of the police.”
Plus DNA experts, physical evidence, Kato Kaelin, etc.
potential jurors appear to set their cutoff for conviction at something like 70-74% certainty . . . 70-74% certainty sounds like roughly the right standard to me in a world where the police can be counted on not to take advantage of that standard by falsifying evidence against people they don’t like. Given that prospect, though, I think I prefer something a little tougher — though not as tough as 98%.
Here is how to model the problem statistically:
Let’s say that we decide that jurors should vote guilty if there is a 90% chance of guilt. Furthermore, let’s say that a given case occurs where we know that if every potential juror in the world heard the trial, deliberated and then gave their opinion on the probability of the defendant’s guilt, then:
The mean would be a 90% probability of guilt, taken over the entire population; the distribution would be a bell curve centered on 90%, with one standard deviation spanning 85% to 95%.
Now we pick 12 jurors at random from this bell curve population and instruct them each to vote guilty only if they each individually think there is at least a 90% chance of guilt. What is the probability that there will be a hung jury? If the hung jury leads to another trial, with jurors from the same population (except the ones that heard the previous trial(s) are excluded), then what can we say statistically about how many successive hung juries would occur before a conviction would occur?
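Here is one way to put numbers on that model in Python. Several assumptions are mine, not the comment's: the spread is read as a standard deviation of 5 percentage points; jurors' estimates are independent normal draws (ignoring the small slice of a normal curve above 100%); nobody changes their mind in deliberation; and unanimity is required both to convict and to acquit, with anything else counted as a hung jury. Excluding earlier jurors is ignored because the population is treated as effectively unlimited.

```python
from statistics import NormalDist

# Hypothetical model: each juror's estimate of P(guilt) is an independent draw
# from Normal(mean=0.90, sd=0.05); a juror votes guilty iff the estimate is at
# least the 0.90 threshold. Deliberation is ignored.
MEAN, SD, THRESHOLD, JURY_SIZE = 0.90, 0.05, 0.90, 12

p_guilty_vote = 1 - NormalDist(MEAN, SD).cdf(THRESHOLD)

# Unanimity is assumed for both outcomes; anything else is a hung jury.
p_convict = p_guilty_vote ** JURY_SIZE
p_acquit = (1 - p_guilty_vote) ** JURY_SIZE
p_hung = 1 - p_convict - p_acquit

# Retrials are independent repeats of the same experiment, so the number of
# trials until the first non-hung verdict is geometric.
expected_trials_to_verdict = 1 / (p_convict + p_acquit)
p_verdict_is_conviction = p_convict / (p_convict + p_acquit)

print(f"P(single juror votes guilty) = {p_guilty_vote:.3f}")
print(f"P(conviction) = {p_convict:.6f}, P(acquittal) = {p_acquit:.6f}")
print(f"P(hung jury) = {p_hung:.5f}")
print(f"expected trials until a verdict = {expected_trials_to_verdict:.0f}")
print(f"P(that verdict is a conviction) = {p_verdict_is_conviction:.2f}")
```

Because the threshold sits exactly at the mean, each juror's vote is a coin flip, so a unanimous guilty vote has probability (1/2)^12 = 1/4096. Under these assumptions about 2,048 trials would be expected before any verdict, and that verdict is as likely to be an acquittal as a conviction.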
SIDE NOTE: I have long thought that jurors should cast their votes in terms of probability, rather than yes / no, and that verdicts and/or judgements should be based on statistical analysis of the jurors’ probabilities, rather than on consensus (or lack thereof). In 1990, I suggested this idea on a law school exam. That did not work out well. I think you get something far more intellectually honest if you have jurors submit probabilistic answers. As one example, you can chop off the outliers.
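One way to read that suggestion, sketched in Python: each juror submits a probability, the k highest and k lowest estimates are dropped ("chop off the outliers"), and the remaining estimates are averaged and compared with a threshold. The comment doesn't specify an aggregation rule, so the trimmed mean, the trim count of 2, the 0.90 threshold, and the twelve example estimates are all hypothetical.

```python
def trimmed_mean(values: list[float], k: int) -> float:
    """Mean after dropping the k smallest and k largest values."""
    if 2 * k >= len(values):
        raise ValueError("trim count too large for the number of values")
    kept = sorted(values)[k:len(values) - k]
    return sum(kept) / len(kept)


def verdict(juror_probs: list[float], k: int = 2, threshold: float = 0.90) -> str:
    """Hypothetical rule: convict iff the trimmed mean clears the threshold."""
    return "guilty" if trimmed_mean(juror_probs, k) >= threshold else "not guilty"


# Twelve made-up juror estimates, with two low outliers.
probs = [0.99, 0.97, 0.95, 0.94, 0.93, 0.92, 0.92, 0.91, 0.90, 0.88, 0.40, 0.20]

print(f"trimmed mean {trimmed_mean(probs, 2):.2f} -> {verdict(probs)}")          # 0.92 -> guilty
print(f"plain mean   {trimmed_mean(probs, 0):.2f} -> {verdict(probs, k=0)}")     # 0.83 -> not guilty
```

With these made-up numbers the two low outliers decide the case under a plain average but not under the trimmed one, which is the kind of difference the side note has in mind.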
“The question I’m asking is: Suppose you’ve got a defendant who, based on a correct Bayesian calculation, is 98% likely to have committed the crime. Should you convict him?”
The question you wrote doesn’t say that the defendant is 98% likely to have committed the crime. It doesn’t give the probability that the defendant committed the crime at all–it invites the reader to calculate it. If I calculate it, don’t make the same assumptions as you, and don’t get 98%, I’ve still answered the question.
Ken Arromdee: If you look more carefully, you’ll see that I did specify (in the post you’re now responding to) that the urn was chosen pursuant to the flip of a fair coin.
I stand corrected.
But this Bayesian probability in real life 1) probably won’t be based on that kind of 50% chance and 2) will be a much more significant consideration in determining reasonable doubt than the question of exactly where the threshold is. Or to put it another way, you’ve made an unrealistic assumption about the big issue and you’re now asking a question about a small issue; the insight that this can give you into jury rulings will of necessity be very limited.
The DNA experts were essentially police. The DNA experts weren’t experts who testify for the state in half their cases and for the defendant in the other half. The DNA experts are on team police and the jury knew that. To the extent the DNA experts were not on team police, in a realistic sense, the prosecutor needed to show that to establish their cred in the eyes of the jury. I could imagine such a case. I can imagine putting a DNA investigator on the stand and asking him, “Why should this jury not believe that you are just a functionary of the LAPD?” I can imagine the DNA expert having a good, convincing answer to that challenge. However, that is not what the prosecutor chose to do. The prosecution paid a price for that hubris and OJ reaped a windfall.
The physical evidence came from the police, which means that believing the physical evidence existed, and was what it purported to be, entailed believing the police.
As far as Kato Kaelin: lol.
To add to what Ken said, making that assumption not only misses the bigger issue, but it results in a severe disconnect when you change the question to the very different one where choosing the wrong box is a crime. It’s hard to know which issue different people are addressing in comments.
It doesn’t matter if the urn was chosen with malice. It’s not like you are choosing the balls first, then assigning them to an urn. If you had deliberately chosen the left urn to throw people off, then of course, the balls would have come out differently.