Talk:Conditional probability




The second paragraph of this article reads: "In the Bayesian interpretation of probability, the conditioning event is interpreted as evidence for the conditioned event [...] (In fact, this is also the Frequentist interpretation.)" Is there an interpretation of probability that rejects this idea? If not, the qualification on this statement is unnecessary to the point of being misleading. Even if there are major objections to the idea, this paragraph could be better structured. Nejssor (talk) 17:00, 4 December 2013 (UTC)

Graph 2004

The graph on this page implies the area is important, but it is not. I'd suggest a 2D line. 2010-11-29 —Preceding unsigned comment added by (talk) 04:31, 30 November 2010 (UTC)

I've struck out the sentence about decision trees. There is certainly no sense in which conditional probability calculations are generally easier with decision trees. Decision trees can indeed be interpreted as conditional probability models (or not), but in any event, they are a very, very small part of the world of conditional probability, and making an unwarranted assertion about a minor topic is out of place. Wile E. Heresiarch 17:13, 1 Feb 2004 (UTC)


Conditional probability is the probability of some event A, given that some other event, B, has already occurred


In these definitions, note that there need not be a causal or temporal relation between A and B. A may precede B, or vice versa, or they may happen at the same time.

This statement is totally confusing - if event B has already occurred, there has to be a temporal relation between A and B (i.e. B happens before A). --Abdull 12:50, 25 February 2006 (UTC)

I've reworded it. --Zundark 14:32, 25 February 2006 (UTC)
Great, thank you! --Abdull 11:24, 26 February 2006 (UTC)

Since the subject of the article is completely formal, I dislike the references to time, expressions like "temporal relation" or one event "preceding" another, because I find them informal in this context. In the framework of the probability space where we are working, time is not formally introduced: at what "time" does an event take place? In fact, when we specifically want to represent or model how our knowledge of the world (represented by random variables) grows as time passes, we can do it by means of filtrations. And I feel the same goes for the "causal relation"; in the article such a notion is not defined formally.--zeycus 15:22, 23 February 2007 (UTC)

The purpose of this paragraph is to dispel the common misconception that conditional probability has something to do with temporal relationships or causality. The paragraph is necessarily informal, as a probability space does not even have such concepts. (By the way, contrary to your suggestion on my Talk page, this paragraph was added by Wile E. Heresiarch on 10 February 2004. The rewording I mentioned above did not touch this paragraph, it simply removed incorrect suggestions of temporal relationships elsewhere in the article. All this can be seen from the edit history.) --Zundark 08:36, 24 February 2007 (UTC)
I apologize for attributing the paragraph to you. I understand what you mean, but I think it is important to separate formal notions from informal ones. So I will add a short comment afterwards. --zeycus 9:42, 24 February 2007 (UTC)

Undefined or Indeterminate?

In the Other Considerations section, the statement "If P(B) = 0, then P(A|B) is left undefined." seems incorrect. Is it not more correct to say that P(A|B) is indeterminate?

P(A|B) = P(A∩B)/P(B) = 0/0, regardless of A.

Bob Badour 04:36, 11 June 2006 (UTC)

It's undefined. If you think it's not undefined, then what do you think its definition is? --Zundark 08:54, 11 June 2006 (UTC)
Indeterminate, as I said; the definition of which one would paraphrase as incalculable or unknown. However, an indeterminate form can be undefined, and the consensus in the literature is to call the conditional undefined in the abovementioned case. There are probably reasons for treating it as undefined that I am unaware of, and changing the text in the article would be OR. Thank you for your comments, and I apologize for taking your time. -- Bob Badour 00:07, 12 June 2006 (UTC)

Something about this is bothering me. Suppose X is a standard normal random variable. I am considering the events A = {X = 0} and B = {X = 0 or X = 5}, for example. Clearly P(B) = 0. However, I feel that P(A|B) should be defined, and in fact equal to f(0)/(f(0)+f(5)), where f is the density function of X. In order to informally justify this, I would define A_ε = [−ε, ε] and B_ε = [−ε, ε] ∪ [5−ε, 5+ε] for any ε > 0. Then, if I am not wrong, lim_{ε→0} P(A_ε|B_ε) = f(0)/(f(0)+f(5)).

Suppose someone tells me that a number has been obtained from a standard normal variable, that it is 0 or 5, and that I have a chance for a double-or-nothing bet trying to guess which one of them it was. Shouldn't I bet on the 0? And how can I argue it, if not with the calculations above? Opinions are most welcome. What do you think? -- zeycus 18:36, 22 February 2007 (UTC)
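The limiting argument can be checked numerically (a quick sketch of my own, not from any source cited in this thread; it assumes X standard normal, A_ε = [−ε, ε] and B_ε = [−ε, ε] ∪ [5−ε, 5+ε] as described above):

```python
import math

def phi(x):
    # standard normal density f(x)
    return math.exp(-x * x / 2) / math.sqrt(2 * math.pi)

def prob_interval(a, b):
    # P(a <= X <= b) for X standard normal, via the error function
    cdf = lambda x: 0.5 * (1 + math.erf(x / math.sqrt(2)))
    return cdf(b) - cdf(a)

def cond_prob(eps):
    # P(A_eps | B_eps), with A_eps = [-eps, eps]
    # and B_eps = [-eps, eps] union [5-eps, 5+eps]
    p_a = prob_interval(-eps, eps)
    p_b = p_a + prob_interval(5 - eps, 5 + eps)
    return p_a / p_b

limit = phi(0) / (phi(0) + phi(5))
for eps in (1.0, 0.1, 0.001):
    print(eps, cond_prob(eps))   # approaches the limit as eps shrinks
print(limit)                     # approx 0.999996
```

The limit f(0)/(f(0)+f(5)) is very close to 1, which matches the intuition that one should bet on the 0.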

I think you are absolutely right. However, the theory needed to obtain this is a lot more complicated than the theory needed to understand conditional probability as such. Should the article state clearly from the start that we are dealing with discrete distributions only, and then perhaps have a last section dealing with generalization to continuous distributions?--Niels Ø (noe) 19:33, 22 February 2007 (UTC)
It's already valid for continuous distributions, at least now that I've cleaned up the definition section. It's not usual to define P(A|B) when P(B) = 0, but if someone can find a decent reference for this then it might be worth adding to the article. --Zundark 11:25, 24 February 2007 (UTC)
I was not able to find any source defining P(A|B) when P(B) = 0. I posted in a math forum, and after an interesting discussion someone gave a nice argument (a bit long to be copied here) justifying why it does not make sense. I consider the point clarified. --zeycus 15:30, 28 February 2007 (UTC)
I'm afraid these comments are almost entirely wrong. It is perfectly possible to condition on events of probability zero, and this is in fact common. Consider tossing a coin. If one does not know if the coin is fair or not, in the Bayesian world one assigns a probability distribution to the parameter p representing the probability of getting a head. This distribution reflects the degree of belief one has in the fairness of the coin. In the event that this distribution is continuous it is perfectly reasonable to condition on the event that p = 1/2, even though this event has probability zero. To define these conditional probabilities rigorously requires measure theory, and this approach agrees with the naive interpretation given in first level courses. A good reference is Basic Stochastic Processes by Zastawniak and Brzezniak. PoochieR 21:45, 6 November 2007 (UTC)
To repeat myself from above: Should the article state clearly from the start that we are dealing with discrete distributions only, and then perhaps have a last section dealing with generalization to continuous distributions?--Niels Ø (noe) 12:31, 7 November 2007 (UTC)
No, because we aren't dealing only with discrete distributions. --Zundark 12:39, 7 November 2007 (UTC)
Some of the most important modern uses of conditional probability are in Martingale theory, with direct practical applications in all areas of mathematical finance. It is simply impossible to deal with these without conditioning on events of probability zero, so I think it's important that you should include these. A way round would be to make it clear that the definition you have given is a naive definition, which only works for conditioning on events with probability > 0; however, to give the definition which works for conditioning on any event requires the use of measure theory. The measure-theoretic definition agrees with the naive definition where that is applicable. The natural way to express the measure-theoretic formulation is in terms of conditional expectations, conditional on sigma-algebras of events; in this formulation P(A|F) = E(1_A|F), where 1_A is the indicator random variable of event A. A better reference than the one I gave before is: Probability with Martingales, David Williams, Ch.9. PoochieR 09:41, 8 November 2007 (UTC)
I am quite happy to edit your definition with respect to the case P(B) = 0, but you cannot leave it as it is. The correct definition, to make the discrete case correspond with the more general case, is to define P(A|B) = P(A) when P(B) = 0. There are no problems then with the naive interpretation, and it has the benefit of agreeing with the more sophisticated approach. In many ways it is similar to the debates that used to go on regarding 0^0. PoochieR 18:16, 15 November 2007 (UTC)
Not to be too rude, PoochieR, but P(A|B) = P(A) whenever P(B) = 0 is missing an obvious case: we clearly want P(B|B) = 1! Don't forget that the marginal distribution is, nonetheless, intended to be a distribution. ub3rm4th (talk) 10:10, 14 December 2008 (UTC)

This is a general encyclopedia. I think it's important to write a readable and accessible article, as far as possible, and as a matter of presentation, I think we do that best by limiting ourselves to discrete situations. The purely continuous cases (requiring integrals and such), and the mixed cases (requiring measure theory) can either be treated

  • further down in the article,
  • in separate articles,
  • or by reference to external sources like MathWorld.

--Niels Ø (noe) (talk) 14:00, 23 November 2007 (UTC)

About conditioning on zero-probability condition: see also Conditioning (probability). Boris Tsirelson (talk) 16:34, 14 December 2008 (UTC)

Use of 'modulus signs' and set theory

Are the modulus signs in the "Definition" section intended to refer to the cardinality of the respective sets? It's not clear from the current content of the page. I think the set theory background to probability is a little tricky, so perhaps more explanation could go into this section?

I absolutely agree.--Niels Ø (noe) 14:13, 29 January 2007 (UTC)

I may be wrong, but it seems to me that the definition is not just unfortunate, but simply incorrect. Consider for example the probability space with Ω = {a, b, c}, the set of events 2^Ω, and probabilities P({a}) = 1/2, P({b}) = 1/4 and P({c}) = 1/4. Let A = {a} and B = {a, b}. Then P(A|B) = P(A∩B)/P(B) = (1/2)/(3/4) = 2/3. However, |A∩B|/|B| = 1/2. --zeycus 4:46, 24 February 2007 (UTC)

The text talks about elements randomly chosen from a set. The author's intent clearly is that this implies symmetry.--Niels Ø (noe) 08:29, 24 February 2007 (UTC)
Yes, you are absolutely right. But then, why define conditional probability only in that particular case, when it makes sense and is usually defined for any probability space with the same formula P(A|B) = P(A∩B)/P(B)? --zeycus 8:43, 24 February 2007 (UTC)
I agree this is a weak section in the article; one should not have to guess about the author's intentions. Anyway, I think the idea is to generalize from the fairly obvious situation with symmetry to the general formulae. Of course, that kind of reasoning does not really belong under the heading "Definition". Go ahead; try your hand at it!--Niels Ø (noe) 10:07, 24 February 2007 (UTC)
I've restored the proper definition. --Zundark 11:06, 24 February 2007 (UTC)

Valid for continuous distributions?

Two events A and B are mutually exclusive if and only if P(A∩B) = 0...

Let X be a continuous random variable, e.g. normally distributed with mean 0 and standard deviation 1. Let A be the event that X >= 0, and B the event that X <= 0. Then, A∩B is the event X=0, which has probability 0, but which is not impossible. I don't think A and B should be called exclusive in this case. So, either the context of the statement from the article I quote above should be made clear (For discrete distributions,...), or the statement itself should be modified.

Would it in all cases be correct to say that A and B are exclusive if and only if A∩B = Ø ? Suppose U={0,1,2,3,4,5,6}, P(X=0)=0 and P(X=x)=1/6 for x=1,2,3,4,5,6 (i.e. a silly but not incorrect model of a die). Are A={X even}={0,2,4,6} and B={X<2}={0,1} mutually exclusive or not?--Niels Ø (noe) 14:13, 29 January 2007 (UTC)
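The "silly die" makes the point directly checkable (a small sketch using the probabilities above): A∩B is nonempty, yet has probability zero, so the two candidate definitions of "mutually exclusive" disagree on it.

```python
from fractions import Fraction

# the "silly die": outcome 0 has probability 0, outcomes 1..6 have 1/6 each
P = {0: Fraction(0)}
P.update({x: Fraction(1, 6) for x in range(1, 7)})

A = {0, 2, 4, 6}   # X even
B = {0, 1}         # X < 2

inter = A & B
p_inter = sum(P[x] for x in inter)
print(inter, p_inter)   # the intersection is {0}, with probability 0
```

Under the definition A∩B = Ø they are not mutually exclusive; under the definition P(A∩B) = 0 they are.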

I wonder if that definition is correct. In the article mutually exclusive, n events are defined as exclusive if the occurrence of any one of them automatically implies the non-occurrence of the remaining n − 1 events. Very similarly, in mathworld:
n events are said to be mutually exclusive if the occurrence of any one of them precludes any of the others.
As Niels said, that is in fact stronger than saying P(A∩B) = 0. Somehow, I think the definition now in the article should be labeled as "almost mutually exclusive". Shouldn't we just say that A and B are mutually exclusive just if A∩B = Ø, and avoid all this fuss?--User:Zeycus 10:03, 20 March 2007 (UTC)
No answer in three weeks. In a few days, if nobody has anything to say, I will change the definition in the article.--User:Zeycus 14:03, 9 April 2007 (UTC)
Done.--User:Zeycus 8:30, 13 April 2007 (UTC)


Thank you for your fine work. However, it would be useful to more people if you would provide a definition of your math notation such as

An example?

Here's an example involving conditional probabilities that makes sense to me, and usually also to the students to whom I teach this stuff. It clearly shows the difference between P(A|B) and P(B|A).

As the example is currently in the article, there's no need to repeat it here.--Niels Ø (noe) (talk) 11:28, 16 December 2007 (UTC)

So that's my example. Do you like it? Should I include it in the article? Can you perhaps help me improve on it first?--Niels Ø (noe) 11:48, 13 February 2007 (UTC)

No replies for 10 days. I don't know how to interpret that, but I'll now be bold and add my example to the article.--Niels Ø (noe) 09:25, 23 February 2007 (UTC)
Suggestions for improvement:
* It's a bit confusing that the odds of having the disease and the odds of a false positive are BOTH 1%. It would be better to have one be different, say 10%.
* I think some people (including myself) see things better graphically. You can represent the same problem (using your original numbers) as a 1.0 x 1.0 square, with one edge divided up into 0.99 and 0.01, and the other edge divided up into 0.99 and 0.01. Now you have one large rectangle (0.99 x 0.99) which represents those that test negative and are negative, and a tiny rectangle (0.01 x 0.01) that represents those that test negative but are positive. The remaining two tall and skinny rectangles (0.99 x 0.01) and (0.99 x 0.01) represent those who test positive. One of those skinny rectangles represents positive and testing positive, the other represents negative and testing positive. Those are about the same size, which is why half the positive tests are false positives. I think exploding the rectangles, exaggerating the size of the 0.01 portions, and clearly labelling them would help too.
Clemwang 04:05, 22 March 2007 (UTC)

I agree with Clemwang that a diagram would make the explanation more intuitive.Jfischoff (talk) 01:03, 5 August 2009 (UTC)

So, my example has been in the article for about half a year now. My text has some imperfections - e.g. the way equations are mixed into sentences, which is grammatically incorrect. I hoped someone with a better command of English than myself might correct that, but nothing has happened. I wonder, did anyone actually read this example?

Replies to Clemwang: I don't think having all three probabilities equal 1% is really a problem. Of course, without making the example less realistic, one might let them be 1%, 2% and 3%, say. - I like the type of diagram you suggest; in my experience, they are good for understanding this type of situation, but (surprisingly to me), tree diagrams are more helpful for solving problems (i.e. fewer students mess things up that way). In this particular example, the graphical problem of comparing 1% to 99% is severe; the best solution would actually be if someone could come up with a meaningful example to replace mine, where the three probabilities are 10%, 20% and 30%, say.--Niels Ø (noe) 11:23, 17 September 2007 (UTC)

Is it just me or is there a typo in the example of the article where it gives the final result of false positives as .5%? Shouldn't it be 50%, as it says on this page? 22:22, 29 September 2007 (UTC)

I'm not sure what you mean. It says at one place: "0.99% / 1.98% = 50%". Reading "%" as "times 0.01", it just says 0.0099 / 0.0198 = 0.50, which is correct.--Niels Ø (noe) 06:53, 2 October 2007 (UTC)
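For readers checking the arithmetic, the whole example reduces to a few lines (a sketch assuming, as discussed in this thread, 1% prevalence and 1% false-positive and false-negative rates):

```python
prevalence = 0.01   # P(disease)
false_neg = 0.01    # P(test negative | disease)
false_pos = 0.01    # P(test positive | no disease)

p_pos_and_sick    = prevalence * (1 - false_neg)    # 0.0099
p_pos_and_healthy = (1 - prevalence) * false_pos    # 0.0099
p_pos = p_pos_and_sick + p_pos_and_healthy          # 0.0198

# P(disease | positive test), by the definition of conditional probability
print(p_pos_and_sick / p_pos)   # 0.5
```

The two ways a test comes back positive are equally likely here, which is exactly why half the positives are false.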

A comment on the (quite good!) example currently in the article: remove the sentence "With the numbers chosen here, the last result is likely to be deemed unacceptable: half the people testing positive are actually false positives." This is misleading and makes the example seem more mysterious than it is. The reason the probability of not having the disease conditioned on a positive test is so large is because the disease is *so* rare... Very little harm is done to the public in general, only the small percentage who tested positive. It is a small point but for the effect of increasing the wow factor of the example I think some of its simplicity is being hidden. If no comment in 10 days (today is july 14 2009), I will remove the sentence. —Preceding unsigned comment added by (talk) 07:45, 14 July 2009 (UTC)

I'm glad you like the example (I made it up!). I think that sentence serves a purpose: To help a reader get his/her mind around the example and fully understand the difference between A|B and B|A. Alternatively, the numbers could be tweaked a bit to make the false positives more acceptable - but I think it is better as it is.--Noe (talk) 09:04, 14 July 2009 (UTC)

Improving the Independence Section

Someone ought to expand the discussion of independence to n terms. For instance, three events E, F, G are independent iff:

P(E∩F) = P(E)P(F)
P(E∩G) = P(E)P(G)
P(F∩G) = P(F)P(G)
P(E∩F∩G) = P(E)P(F)P(G)

And so on. Verbally, every combination of k of the n events (as in n choose k, for k = 2, 3, ..., n) must satisfy the product rule for ALL of them to be independent of each other. Most textbooks I've seen include independence definitions for more than two events.

—The preceding unsigned comment was added by (talkcontribs).
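The distinction the n-event definition is making can be seen in the classic two-coin example of events that are pairwise independent but not mutually independent (my own illustration, not from the thread):

```python
from fractions import Fraction
from itertools import product

omega = list(product("HT", repeat=2))   # two fair coin flips, 4 outcomes

def P(event):
    return Fraction(len(event), len(omega))

E = {w for w in omega if w[0] == "H"}   # first flip heads
F = {w for w in omega if w[1] == "H"}   # second flip heads
G = {w for w in omega if w[0] == w[1]}  # the flips match

# every pair satisfies the product rule ...
pairwise = all(P(X & Y) == P(X) * P(Y)
               for X, Y in [(E, F), (E, G), (F, G)])
# ... but the triple condition fails
triple = P(E & F & G) == P(E) * P(F) * P(G)
print(pairwise, triple)   # True False
```

So checking only pairs is not enough, which is why the definition quantifies over all k-subsets.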

This article is only concerned with independence as far as it relates to conditional probability. The general case is covered in the article on statistical independence. --Zundark 21:27, 20 July 2007 (UTC)

WikiProject class rating

This article was automatically assessed because at least one WikiProject had rated the article as start, and the rating on other projects was brought up to start class. BetacommandBot 03:52, 10 November 2007 (UTC)

P(A | B,C) etc

The article is all about the conditional probability of A given B. What about the probability of A given B AND C? (Plus extensions to more variables.)

Maybe the answer is blindingly obvious to people who know about the subject, but to an ignoramus with only three numerate degrees to his name, who nevertheless finds probability theory the most counter-intuitive maths he's ever studied, it's as clear as mud.

-- (talk) 18:17, 1 February 2008 (UTC)

A, B and C would be "events". An event is any subset of the "sample space" U, i.e. the set of all possible outcomes. What you call P(A|B,C) or P(A|B and C) would be P(A|B∩C), i.e. the probability of A given that the event B∩C has happened. Here, B∩C is the intersection of B and C, i.e. the event that happens if both B and C happen at the same time. Confused? Try reading this again, with the following dicing events in mind:
A={X even}={2,4,6}
B={X at most four}={1,2,3,4}
C={X at least three}={3,4,5,6}
Then, P(A|B∩C) = P(A∩B∩C)/P(B∩C) = (1/6)/(2/6) = 1/2.
As this is equal to P(A), in this case A happens to be independent of B∩C.
Did that help?--Noe (talk) 20:01, 1 February 2008 (UTC)
Well, partially, but I'd hoped for a formula in terms of conditional and marginal probabilities. Moreover, I think bringing time into it confuses the issue. For example, suppose A = "It rains today", B = "It rained yesterday", C = "It rained the day before yesterday". Clearly the events described in B and C can't happen "at the same time" in any sense, although the propositions B and C can both be true. P(A|B,C) then means "the probability that it rains today knowing that it rained on the previous two days". OK, suppose I know P(A), the probability of rain on any single day; P(A) = P(B) = P(C) because the day labels are arbitrary. Suppose I also know P(A|B), the probability of rain on one day given rain the previous day; P(A|B) = P(B|C) by the same argument. How do I work out P(A|B,C) in terms of these probabilities (and possibly others)?
You need to supply something like the probability of it raining three days in a row -- P(A,B,C|I); or alternatively, the probability of it raining both the day after and the day before a rainy day -- P(A,C|B).
Does C give you any more information about A than you already have through B ? Maybe it does, maybe it doesn't. It depends, given the data, or the physical intuition, that you're assessing your probabilities from. Jheald (talk) 12:35, 11 March 2008 (UTC)
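Jheald's "maybe it does, maybe it doesn't" can be made concrete by brute-force summation over two hypothetical weather models (toy numbers of my own): in a Markov chain the extra day C adds nothing, while under a two-day-memory model it changes the answer, so P(A|B,C) genuinely cannot be computed from P(A) and P(A|B) alone.

```python
from itertools import product

R, D = True, False  # rain / dry

def markov(c, b, a):
    # P(C=c, B=b, A=a): two-state chain where tomorrow
    # matches today with probability 0.7
    p = 0.5                       # P(rain on the first day)
    p *= 0.7 if b == c else 0.3
    p *= 0.7 if a == b else 0.3
    return p

def two_day_memory(c, b, a):
    # today depends on BOTH previous days (invented numbers)
    p = 0.5 * (0.7 if b == c else 0.3)
    rain_today = {(R, R): 0.9, (D, R): 0.6, (R, D): 0.4, (D, D): 0.1}[(c, b)]
    return p * (rain_today if a == R else 1 - rain_today)

def cond(joint, given):
    # P(A = rain | given(c, b)), by summing the joint over the sample space
    num = sum(joint(c, b, a) for c, b, a in product([R, D], repeat=3)
              if a == R and given(c, b))
    den = sum(joint(c, b, a) for c, b, a in product([R, D], repeat=3)
              if given(c, b))
    return num / den

for j in (markov, two_day_memory):
    print(round(cond(j, lambda c, b: b == R), 3),
          round(cond(j, lambda c, b: b == R and c == R), 3))
# markov:         P(A|B) = P(A|B,C) = 0.7   (C adds no information)
# two_day_memory: P(A|B) = 0.81, P(A|B,C) = 0.9   (C matters)
```

Both models have the same one-day structure, yet disagree on P(A|B,C); only the full joint distribution settles it.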
BTW, the notation used here (intersection, B∩C) is applicable to sets. When dealing with logical propositions such as the ones above, it's more appropriate to use the notation of conjunction: A ∧ B. However, most publications seem to use the comma notation instead. Also Pr(.) is often used nowadays for a single probability value, to distinguish it from p(.) or P(.) for a probability density. So Pr(A|B,C) would be my preference.
-- (talk) 00:02, 11 March 2008 (UTC) (formerly

Merge with Marginal distribution

The Marginal distribution article does not present enough information to stand on its own. It should be merged into this article.

Neelix (talk) 14:52, 13 April 2008 (UTC)

I disagree. Marginal distribution should be expanded. Michael Hardy (talk) 15:47, 13 April 2008 (UTC)
Oppose, per Michael Hardy. Marginal distribution is a sufficiently important and distinctive idea that it deserves its own article. Plus it's a rather different thing from a conditional distribution. Jheald (talk) 16:06, 13 April 2008 (UTC)

Maybe I should expand on this a bit. "Marginal probability" is a rather odd concept. The "marginal probability" of an event is merely the probability of the event; the word "marginal" merely emphasizes that it's not conditional, and is used in contexts in which it is important to emphasize that. So the occasions when it's important to emphasize that are very context-dependent. For those reasons I can feel a certain amount of sympathy for such a "merge" proposal. But on the other hand, just look at the way the concept frequently gets used, and that convinces me that it deserves its own article. Wikipedia is quite extensive in coverage, and it's appropriate that articles are not as clumped together as if coverage were not so broad. Michael Hardy (talk) 18:05, 13 April 2008 (UTC)

That does make sense. I will remove the merge suggestion. The marginal distribution article, however, still requires expansion. I will place a proper notice on that article. Neelix (talk) 18:10, 13 April 2008 (UTC)

First impression

I feel that the whole page needs rewriting. The following statement, for example, cannot be a definition because it contains many implications and does not make sense when taken alone:

Marginal probability is the probability of one event, regardless of the other event. Marginal probability is obtained by summing (or integrating, more generally) the joint probability over the unrequired event. This is called marginalization. The marginal probability of A is written P(A), and the marginal probability of B is written P(B). —Preceding unsigned comment added by (talk) 15:42, 14 May 2008 (UTC)

That paragraph was not optimally clear; I've tried to rewrite it, but there ought to be a section on Marginal probability in the main body of the article. An example with a table of joint probabilities in which the margins are the marginal probabilities might help to clarify this; something like:
     B1   B2   B3  | TOT
A1   23%  17%  31% |  71% 
A2   16%   4%   9% |  29%
TOT  39%  21%  40% | 100%
but preferably with something concrete, meaningful, and realistic, instead of the abstract Ai and Bj. The totals, given in the margins, are marginal probabilities.  --Lambiam 09:18, 19 May 2008 (UTC)
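The margins of such a table are computed exactly as the quoted paragraph describes, by summing the joint probability over the unrequired variable (a minimal sketch using the sample numbers above):

```python
from fractions import Fraction as F

joint = {
    ("A1", "B1"): F(23, 100), ("A1", "B2"): F(17, 100), ("A1", "B3"): F(31, 100),
    ("A2", "B1"): F(16, 100), ("A2", "B2"): F(4, 100),  ("A2", "B3"): F(9, 100),
}

# marginalize: sum the joint probability over the unrequired variable
P_A, P_B = {}, {}
for (a, b), p in joint.items():
    P_A[a] = P_A.get(a, 0) + p
    P_B[b] = P_B.get(b, 0) + p

print(P_A)   # A1 -> 71/100, A2 -> 29/100 (the row totals)
print(P_B)   # B1 -> 39/100, B2 -> 21/100, B3 -> 40/100 (the column totals)
```

The results reproduce the TOT row and column of the table, i.e. the probabilities "in the margins".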

Distributions and variables?

The first sentence of the lead says:

This article defines some terms which characterize probability distributions of two or more variables.

I think this sentence is superfluous, and makes things harder than they need be. One can teach good parts of probability theory, including conditional probability, without ever mentioning distributions or variables. E.g., rolling a die, you have a space U={a,b,c,d,e,f} (representing 1,2,3,4,5,6, but I use letters to emphasize that I'm NOT introducing a random variable taking numerical values), a simple probability function P(i)=1/6 for all i in U, events like A and B being subsets of U, an extended probability function P(A) (in this symmetrical case, P(A) = n(A)/n(U) where n counts elements in a set). With A = {even} = {b,d,f} and B = {more than four eyes} = {e,f}, you can find P(B|A), say.--Noe (talk) 08:55, 18 September 2008 (UTC)
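Noe's letter-die setup can be spelled out in a few lines (a sketch of the computation, using the sets defined above and the symmetric rule P(E) = n(E)/n(U)):

```python
from fractions import Fraction

U = {"a", "b", "c", "d", "e", "f"}   # six equally likely faces
A = {"b", "d", "f"}                  # "even"
B = {"e", "f"}                       # "more than four eyes"

def P(event):
    # symmetric case: P(E) = n(E)/n(U)
    return Fraction(len(event), len(U))

print(P(A & B) / P(A))   # P(B|A) = (1/6)/(1/2) = 1/3
```

No random variable or distribution is ever mentioned, which is exactly the point being made.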

I completely agree. (talk) 16:47, 27 September 2008 (UTC)

Conditioning on a random variable

What does it mean to condition on a continuous random variable? The definition given here does not seem to extend to such cases. (talk) 16:22, 27 September 2008 (UTC)

See also Conditioning (probability). Boris Tsirelson (talk) 16:45, 14 December 2008 (UTC)

Other Considerations Section

I am removing the "other considerations" section. The full text of the section was as follows:

  • If B is an event and P(B) > 0, then the function Q defined by Q(A) = P(A|B) for all events A is a probability measure.

These remarks are stated completely out of context and I feel they do not belong here, at least without further explanation. Instead, I have moved the remark about P(A|B) being a probability measure to the article on probability spaces, where it fits more naturally. I have simply removed the remark about data mining entirely; perhaps a section on "applications of conditional probability" (or Bayesian theory) could use it, but as an isolated remark it seems irrelevant. Birkett (talk) 08:04, 16 December 2008 (UTC)

Condition should be called C, not A (in which case the non-condition is ~C)

I think it would be much cleaner to keep the condition as C. I know letters don't matter in math and it is contextual, but nevertheless ...
It made me a little confused when I read it the first time. (talk) 15:54, 1 January 2009 (UTC)

Specific properties of sets ?

Hi all. Any source I am picking to look up conditional probability doesn't mention anything about the properties that the events A and B need to satisfy. That is, it is always stated that it can be any A and B.

Now since events are mere sets of elementary outcomes of the probability space Ω, wouldn't it make sense to add the restriction that for P(A|B), A needs to be a subset of B, since otherwise an elementary outcome not included in B wouldn't be altered in its probability of occurring when B had occurred?

Also see page 134 of this book: free book on probability

Don't mind me if I'm wrong ... totally not an expert.

--Somewikian (talk) 15:32, 23 January 2009 (UTC)

I don't get your point. What exactly would be bad about the event A not being a subset of B? MartinPoulter (talk) 17:05, 23 January 2009 (UTC)
I am not certain whether by 'probability space A' you mean the same thing as 'event A'. Anyway, I didn't read the aforementioned page in that textbook carefully enough (check it out yourself: textbook p. 134, or p. 142 in the pdf file I gave a link to above). The thing they are doing is deriving the equation for conditional probability, and they do so using the condition that for an elementary outcome the conditional probability after the occurrence of an event that does not include said elementary outcome is set to zero. This might sound more confusing than it actually is ... please check out the textbook I mentioned and let me know whether you think including this here would make sense (I find the derivation given in that textbook pretty nice and easy to grasp ... and I am a total amateur).--Somewikian (talk) 21:23, 23 January 2009 (UTC)
An elementary outcome not included in B WILL BE altered in its probability of occurring when B has occurred. Namely, its probability will VANISH! —Preceding unsigned comment added by (talk) 19:27, 23 January 2009 (UTC)
You're correct, but it might just be because the equation for conditional probability is derived and defined that way.--Somewikian (talk) 21:24, 23 January 2009 (UTC)
But this is quite natural. In the light of the new information, that outcome becomes impossible (known not to happen). What else could be its new probability, if not zero?
Logically thinking I agree with you absolutely ... I am just someone that likes his definitions right and complete. I think I will just add a section on deriving the equation given in this article - no harm in that I hope. --Somewikian (talk) 08:21, 24 January 2009 (UTC)

It is grossly wrong to say that A must be a subset of B in order that P(A|B) exist. If you were to insist on that, you would be throwing out MOST of the situations in which conditional probability is used. And also most of the situations in which you see it in elementary exercises in textbooks. Michael Hardy (talk) 17:12, 24 January 2009 (UTC)

Oh please, I already said I got the section in that textbook I was referring to wrong. Did you just not see that, or did you just have to make a point?
According to that textbook: the derivation of P(A|B) = P(A∩B)/P(B) includes setting, for each elementary outcome ω not contained in B, the conditional probability to zero, as in P({ω}|B) = 0.
Again though: I am not a stats buff. I am a complete amateur at stats - so please go and blame the authors of that book and not me since I am just citing their work.--Somewikian (talk) 17:36, 24 January 2009 (UTC)

There, I added a section on deriving the equation ... bash me for errors and please correct them while you're at it. --Somewikian (talk) 08:28, 25 January 2009 (UTC)


IF A and B are mutually exclusive events, are they independent events? —Preceding unsigned comment added by (talk) 09:05, 3 February 2009 (UTC)

Usually not. Drawing a card from a 52-card deck of playing cards, "Aces" and "Kings" are mutually exclusive:
P(A)=4/52; P(K)=4/52; P(A and K)=0; P(A|K)=P(K|A)=0;
but "Aces" and "Hearts" are independent:
P(A)=4/52; P(H)=13/52; P(A and H)=P(Ace of Hearts)=1/52=P(A)*P(H); P(A|H)=1/13=P(A); P(H|A)=1/4=P(H);
and "Aces" and "Unicorns" (which do not exist) are both mutually exclusive and (arguably) independent:
P(A)=4/52; P(U)=0; P(A and U)=0=P(A)*P(U); P(A|U) undefined(?); P(U|A)=0=P(U).
--Noe (talk) 14:16, 3 February 2009 (UTC)
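Noe's card numbers can be verified by brute-force enumeration of the deck (a quick sketch; the "Unicorns" case is omitted, since P(A|U) is undefined):

```python
from fractions import Fraction
from itertools import product

ranks = ["A", "K", "Q", "J"] + [str(n) for n in range(2, 11)]
suits = ["hearts", "diamonds", "clubs", "spades"]
deck = list(product(ranks, suits))   # 52 equally likely cards

def P(event):
    return Fraction(len(event), len(deck))

aces   = {c for c in deck if c[0] == "A"}
kings  = {c for c in deck if c[0] == "K"}
hearts = {c for c in deck if c[1] == "hearts"}

print(P(aces & kings))                          # 0: mutually exclusive
print(P(aces & hearts) == P(aces) * P(hearts))  # True: independent
print(P(aces & hearts) / P(hearts))             # P(A|H) = 1/13 = P(aces)
```

Exclusivity forces P(A|K) = 0 ≠ P(A), so (apart from degenerate cases) mutually exclusive events with positive probability are never independent.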

Reducing sample space

The related Dutch article about conditional probability states:

In probability theory we use the term 'conditional probability' if we know that an event, say B, has happened, by which the possible outcomes are reduced to B.

However, on this page it is defined as the probability of some event A, given the occurrence of some other event B, without saying anything about statistical dependency. Further on it says that if A and B have no impact on the probability of each other, events A and B are independent, which means that P(A|B) = P(A).

This doesn't say anything about the situation being conditional or not; it only says that probabilities are in both perspectives the same. "given the occurrence of some other event B" is a perspective rather than a (limited) situation.

Furthermore, there is a fundamental difference between reducing the sample space and affecting probabilities. Two thrown dice don't affect each other's probabilities, but the total sample space is reduced by knowing the outcome of one die.

What is your opinion about the given definition on the Dutch page? Heptalogos (talk) 10:52, 13 February 2009 (UTC)
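Heptalogos's distinction can be made concrete with a quick enumeration (my own sketch): conditioning on one die reduces the 36-outcome sample space without changing the other die's probabilities, while it does change the probabilities of joint events like the total.

```python
from fractions import Fraction
from itertools import product

omega = list(product(range(1, 7), repeat=2))   # 36 equally likely outcomes

def P(event):
    return Fraction(len(event), len(omega))

B = {w for w in omega if w[0] == 6}            # first die shows 6
A = {w for w in omega if w[1] == 3}            # second die shows 3
T = {w for w in omega if w[0] + w[1] >= 10}    # total at least 10

print(len(B))                    # conditioning on B leaves 6 outcomes
print(P(A & B) / P(B) == P(A))   # True: the other die is unaffected
print(P(T), P(T & B) / P(B))     # 1/6 vs 1/2: joint events do change
```

So "reducing the sample space to B" is the general mechanism; independence is the special case where the reduction leaves a particular event's probability unchanged.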

Other articles

There are quite a few articles that attempt to deal with conditional probability, with varying levels of quality:

Perhaps some of these should be combined? —3mta3 (talk) 09:53, 27 June 2009 (UTC)

I totally agree. Right now it is quite arbitrary which article contains what. I think reorganizing, unifying, and probably merging into fewer articles is necessary. (talk) 12:37, 3 November 2013 (UTC)


Given P(B|A) = 1 and P(B|~A) = p, can I find P(A|B) without knowing anything about P(A) or P(B)? I have an algorithm which, if correct, gives the correct result all the time. If incorrect, it gives the correct result 10% of the time. It seems that if I get a correct result, I should be able to have some confidence that the algorithm is correct. Intuitively, I think it should be possible, but I can't seem to formulate it right. —Preceding unsigned comment added by (talk) 17:21, 10 July 2009 (UTC)

Short answer: no, it depends on P(A). Longer answer: See Bayes theorem or ask at Wikipedia:Reference desk/Mathematics. —3mta3 (talk) 17:47, 10 July 2009 (UTC)
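The dependence on the prior is easy to see numerically; a minimal Python sketch of Bayes' theorem with the likelihoods stated above (P(B|A) = 1 and, for the 10% case, P(B|~A) = 0.1):

```python
# Bayes' theorem:
#   P(A|B) = P(B|A) P(A) / (P(B|A) P(A) + P(B|~A) (1 - P(A)))
# With the likelihoods fixed, the posterior still varies with the prior P(A).
def posterior(prior, p_b_given_a=1.0, p_b_given_not_a=0.1):
    return (p_b_given_a * prior) / (
        p_b_given_a * prior + p_b_given_not_a * (1 - prior))

for prior in (0.01, 0.5, 0.99):
    print(f"P(A) = {prior:.2f}  ->  P(A|B) = {posterior(prior):.4f}")
```

Getting a correct result does raise the posterior above the prior (the 0.1 in the denominator shrinks the ~A term), but the posterior cannot be pinned down without P(A).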


This article would be very confusing to someone who is not mathematically minded (e.g. me). When reading the article I was confronted with a large wall of formulae with no clear explanation of what they meant. There needs to be a section in this article with a clear explanation with examples of what conditional probability is and how it can be used by a non mathematically minded person without the use of complex formulae. —Preceding unsigned comment added by (talk) 10:38, 3 August 2009 (UTC)

I totally agree! The terminology and pictography are initially unfathomable :( We're not here to show off, guys; we're here to share information in a digestible format. Dickmojo (talk) —Preceding undated comment added 11:40, 21 February 2012 (UTC).

What does this notation mean and does it make sense?

Does the notation above mean the following?

If not, what does it mean?

If B is an uncountable set of measure 0 and all the terms of the sum are 0, then what is meant? I don't see how the proposed definition (if a definition is what it is supposed to be) makes sense. Michael Hardy (talk) 11:00, 9 September 2009 (UTC)

I agree, it doesn't make sense in its present form. Moreover, trying to define such a quantity misses the whole point of the Borel–Kolmogorov paradox (i.e. that one should condition on a sigma algebra, not an event). I would suggest using the formal definition on Conditional_expectation#Definition_of_conditional_probability (though I was the one who wrote it, so am somewhat biased).–3mta3 (talk) 12:10, 9 September 2009 (UTC)
Being equally biased, I want to say that I did my best explaining all that in Conditioning (probability). Boris Tsirelson (talk) 13:55, 9 September 2009 (UTC)
I think Michael Hardy meant
In any case I think the approach to justifying the measure zero case using the limit of conditioning on a set that shrinks to the required set, as in Conditioning (probability) should be said explicitly as it understandable without measure theory. Melcombe (talk) 16:28, 9 September 2009 (UTC)
Yes, but let us not forget that the limit depends crucially on the choice of the shrinking sequence of non-negligible sets. (This is also the origin of the Borel–Kolmogorov paradox.) In some lucky cases we have a "favorite" sequence; in other cases we do not. Boris Tsirelson (talk) 17:32, 9 September 2009 (UTC)
Another problem with such approach is that it can lead to a non-regular conditional probability (also generating paradoxes); see Talk:Regular conditional probability#Non-regular conditional probability. Boris Tsirelson (talk) 17:36, 9 September 2009 (UTC)
I am the guilty party for the erroneous terms in the discrete case. Likewise, I agree the notation is much clearer than just plain . As for the Borel–Kolmogorov paradox, yes, "the relevant sigma field must not be lost sight of", to quote Billingsley. If there is a way to make the statement rigorous that would be great. Thanks all who've worked on these articles. Btyner (talk) 01:08, 10 September 2009 (UTC)
I do not know what was meant initially, but I do know one case that really leads to formulas of this kind. Let Y be a random variable whose distribution is a mix of an absolutely continuous part (that is, having a density) and a discrete part (that is, a finite or countable collection of atoms) (thus, no singular part is allowed), while X is absolutely continuous. Now, if a measurable subset B of the real line contains no atoms of Y and is of positive probability (w.r.t. Y), then only the absolutely continuous part is relevant, and the formula with integrals holds. If, however, B is of zero Lebesgue measure but still of positive probability w.r.t. Y (thus, contains at least one atom), then only the discrete part is relevant, and the formula with a sum over atoms y holds. Note however that Omega will not appear in the formulas; integrals are on the real line (and its subsets). Boris Tsirelson (talk) 08:21, 10 September 2009 (UTC)

(Unindenting for ease) I am unclear about the supposed relation between the various articles but, assuming that "Conditioning (probability)" is aimed at those who work at the level of measure theory, can this one (conditional probability) be aimed at those looking for something simpler? For example in the above formula, can "measure zero" be replaced by "probability zero". The text here is making enough assumptions that the complicated situations Boris Tsirelson refers to cannot occur. I think what is needed here is just a pointer that more complicated cases are discussed in the other articles. Melcombe (talk) 09:18, 10 September 2009 (UTC)

As for me, "Conditioning (probability)" is aimed to connect three levels. Boris Tsirelson (talk) 10:58, 10 September 2009 (UTC)
Here is an instructive example.
First, the conditional probability of the event given that Y belongs to the two-point set (where a>0) is
If you (the reader) believe that it is just , then substitute it into the total probability formula and observe a wrong result for the unconditional probability Or alternatively, think about an infinitesimal interval and the corresponding set (You see, "the relevant sigma field must not be lost sight of", indeed.)
Second, the conditional density of X (under the same condition on Y) is

Boris Tsirelson (talk) 10:51, 10 September 2009 (UTC)

Thus, the naive summation over points of B (without special weights) does not work even under most usual assumptions on the joint distribution. As I like to say it, a point (of a continuum) is not a measurement unit; thus, it is senseless to say that one point is just one half of a two-point set. Sometimes it is two thirds (or another fraction)! Boris Tsirelson (talk) 11:09, 10 September 2009 (UTC)
I don't follow your choice of case to consider, but it seems you are saying that the limit of
as the δyi approach zero, depends on their relationship as they approach zero, which looks correct. But can the formula in the article be saved, sensibly, by restricting B to a single point? Melcombe (talk) 12:50, 10 September 2009 (UTC)
Yes, this is what I say. Yes, for a single point the formula is just a formula from textbooks (rather than someone's original research or error), and can (and should) be saved. With some stipulations about "almost everywhere" if you want it to be rigorous, or without, if this is not the issue. Boris Tsirelson (talk) 14:17, 10 September 2009 (UTC)
Now about my case to consider. I mean that we do not observe a random variable Y but only such a function of it:
Boris Tsirelson (talk) 14:23, 10 September 2009 (UTC)

I have replaced the text with the above conclusion as to what is wanted in the definition section. Unfortunately the overall structure is now poor because there is now something like a "derivation" in the definition section, and then a derivation section. Melcombe (talk) 14:52, 21 September 2009 (UTC)

Replacement for inaccurate bit

Suggested replacement for incorrect text (but I was hoping someone would make a good change themselves).--

For example, if X and Y are non-degenerate and jointly continuous random variables with density ƒX,Y(x, y) then, if B has positive measure,

The case where B has zero measure can only be dealt with directly in the case that B={y0}, representing a single point, in which case

It is important to note that if A has measure zero the conditional probability is zero. An indication of why the more general case of zero measure cannot be dealt with in a similar way can be seen by noting that the limit, as all δyi approach zero, of

depends on their relationship as they approach zero. Melcombe (talk) 12:49, 11 September 2009 (UTC)

Much improved, but the last bit seems a little strange, as yi seems to denote separate things on the left- and right-hand sides. Btyner (talk) 01:08, 12 September 2009 (UTC)
Where is the problem? There are the yi and the δyi, which I think is a fairly usual notation. It isn't δ×yi. The apparent sum over yi might be better as a sum over i ... this change might help. Melcombe (talk) 09:00, 14 September 2009 (UTC)
The LHS looks like a function of the pair (yi, δyi) is being defined, yet the RHS has no dependence on yi due to the summation. Btyner (talk) 00:07, 15 September 2009 (UTC)
The LHS contains "one of", which means the union, and is similar to the summation.Boris Tsirelson (talk) 05:52, 15 September 2009 (UTC)
I didn't use the union symbol mainly because I was too lazy to look up how to do this, but also because I am somewhat against using over-sophisticated maths symbols where something simpler will do, particularly in what can be or should be non-technical articles or sections. We just need to find something that fits in with the level of the rest of the article. Melcombe (talk) 09:25, 15 September 2009 (UTC)
And here is an interesting experimental fact: out of two readers, one (me) understood your notation, and one (Btyner) did not. Boris Tsirelson (talk) 12:49, 15 September 2009 (UTC)
Indeed, it occurred to me late last night that there might be an implicit union there. So is the open interval from yi to yi + δyi, rather than , which is how I (wrongly) interpreted it the first time. Anyway it seems to me you'd want the left endpoint to be closed not open. Would it be too cumbersome to write
 ? Thanks all, Btyner (talk) 00:35, 16 September 2009 (UTC)
Looks OK. But would it be preferable (more usual?) to use an interval centered on yi? And perhaps an approximation sign rather than an equals? Melcombe (talk) 09:28, 16 September 2009 (UTC)
Definitely prefer . As for the interval, I think uncentered is traditional because it is cleaner for proofs involving the cdf. Btyner (talk) 19:15, 19 September 2009 (UTC)
There still is something wrong: I guess the integral over A has been forgotten. Nijdam (talk) 11:56, 1 January 2010 (UTC)
Yes, it is forgotten! Boris Tsirelson (talk) 13:16, 1 January 2010 (UTC)

I propose adding a link in external references

The proposed link is

It provides examples of calculations with Conditional probability.

Does anyone object? —Preceding unsigned comment added by Kaslanidi (talkcontribs) 20:22, 30 December 2009 (UTC)

I object, per WP:ELNO points 1, 4, and 11. - MrOllie (talk) 20:31, 30 December 2009 (UTC)

I second the objection. OhNoitsJamie Talk 20:41, 30 December 2009 (UTC)


P(A|B) or P(A)?

1. P(A) is conditional when the possible outcomes of the experiment are reduced to event B. This is a reduction of the sample space only; it doesn't have to change the probability of A.
2. If event B has no impact on P(A) (A and B are independent), the unconditional and conditional probabilities are the same.

Combining these statements, the following is possible and true:

3. Event B reduces the sample space of event A, but not its probability. P(A|B) = P(A).
4. P(A) is conditional because of statement 1, but it doesn't need conditional maths, because of statement 2.
Heptalogos (talk) 22:08, 31 December 2009 (UTC)

Yes, this may happen. But I do not see any paradox here. Likewise, 2/4 could differ from 1/2, but is in fact equal to it. Boris Tsirelson (talk) 05:49, 1 January 2010 (UTC)

In the case of option 3, it is paradoxical, because there seems no use or sense in regarding the situation as conditional. Conditional probability is the probability of some event A, given the occurrence of some other event B. But event B does not affect the probability of event A! Heptalogos (talk) 10:24, 1 January 2010 (UTC)

In general it does; in some special cases it does not. Independence is a special case of dependence. Similarly, zero is a special case of a number. Many centuries ago it was sometimes treated as a paradox: a number expresses a quantity of something; there seems no use or sense in quantity of nothing... Boris Tsirelson (talk) 11:01, 1 January 2010 (UTC)
How do you know that in general it does? From the references, Grinstead and Snell speak about an event E as a condition (second line from chapter 4.1). This could be any event, without even changing the sample space. Isn't there any formal definition? What is the source for the statement in this article that when "the possible outcomes of the experiment are reduced to B", the probability is conditional? Heptalogos (talk) 21:59, 1 January 2010 (UTC)
I am astonished by your questions. The formal definition is, of course, P(A|B) = P(A ∩ B) / P(B), for P(B) > 0.
All these words about reducing the sample space, derivation (of the definition!) etc. are rather informal comments/explanations; you do not need them as far as you understand the definition and accept it. How do I know that in general it does? Since (once again), independence is a special case of dependence. Boris Tsirelson (talk) 16:34, 2 January 2010 (UTC)
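The definition is easy to illustrate numerically; a small Python sketch with two fair dice (an editorial illustration, contrasting an event independent of B with one that is not):

```python
# Sample space of two fair dice, probabilities by exact counting.
from fractions import Fraction
from itertools import product

omega = list(product(range(1, 7), range(1, 7)))  # all 36 outcomes

def prob(event):
    return Fraction(sum(1 for w in omega if event(w)), len(omega))

def cond_prob(a, b):
    # The definition: P(A|B) = P(A and B) / P(B).
    return prob(lambda w: a(w) and b(w)) / prob(b)

b = lambda w: w[0] == 6              # die 1 shows 6
a_indep = lambda w: w[1] % 2 == 0    # die 2 is even: independent of B
a_dep = lambda w: w[0] + w[1] >= 10  # sum >= 10: depends on B

# Independence is the special case where conditioning changes nothing.
assert cond_prob(a_indep, b) == prob(a_indep) == Fraction(1, 2)
# In general conditioning does change the probability: 1/2 vs 1/6.
assert cond_prob(a_dep, b) == Fraction(1, 2) != prob(a_dep)
```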
It's a funny but useful discussion we are having. Most mathematicians only need their formulas and their understanding of them. They are dealing with maths, not reality. An author of a textbook, however, needs to answer the question which formulas are most useful. One step further, an encyclopedia should preferably describe the use of a (man-made) concept it describes. Remember, this is not a textbook.
The average mathematician obviously doesn't produce his own problems to be solved, nor does he doubt their relevance. The astonishing thing to me is that the introduction of this article presents P(C|A) as a conditional probability which is simply solved by the argument "it is 1/6, since die 2 must land on 5". What's the use of the complex formula further on? After which it continues to show that P(B|A) makes even less sense, using the formula. It is only in the example of the conditional probability fallacy that I find the (?) benefit of the formula. Heptalogos (talk) 23:02, 2 January 2010 (UTC)
If you can give more instructive examples, just do it. Yes, we mathematicians see things somewhat differently. Given a formula, we bother whether it is true (in general), or not. The question when and how it should be used is too informal; theory does not deal with such questions. Experience, intuition, sometimes even talent suggest which formula is worth using in a given situation. Our examples often explain the meaning of the formula rather than its typical use. Boris Tsirelson (talk) 06:43, 3 January 2010 (UTC)
If in a probability space an event B occurs, the sample space is reduced to B and the probability becomes the conditional probability given B. Some events may be independent of B, others not. If A and B are independent, P(A|B) = P(A); if C and B are not independent, P(C|B) ≠ P(C). The reduced probability space "needs" conditional treatment, although for events independent of B the conditional probability is easily calculated, since it equals the unconditional one. Nijdam (talk) 11:45, 1 January 2010 (UTC)

When events are independent, a reduced probability space does not need conditional treatment, because P(A|B)=P(A). Why should it? What is the common rule? I would like the article to say something about it. Heptalogos (talk) 18:16, 1 January 2010 (UTC)

Do I write in Chinese? Nijdam (talk) 19:11, 1 January 2010 (UTC)

The usefulness of conditional solution

I hope the following is true and clear:

Statement: the conditional method is useful for finding the probability of a cause (C) given the effect (E), when the effect is unequally distributed over the sample space.

I will use three somewhat similar examples to demonstrate. The first one also explains the conditional method. The similar part of all three:

There are two baskets with fruit. Basket 1 has two apples and an orange. Basket 2 has an apple, an orange and a peach. A basket is picked randomly, from which an item is picked randomly.
  • Example 1, calculating a cause given the effect.

Given that an apple is picked (event E), what is the chance that it comes from basket 1 (event C)?

1. The probability to pick basket 1 is 1/2.
2. The probability to pick an apple from it is 2/3.
3. The probability to pick both basket 1 and the apple is 1/2 * 2/3 = 1/3 = 2/6.
4. The probability to pick basket 2 is 1/2.
5. The probability to pick an apple from it is 1/3.
6. The probability to pick both basket 2 and the apple is 1/2 * 1/3 = 1/6.

Both probabilities to eventually pick an apple are 2/6 for basket 1 and 1/6 for basket 2, together 3/6. So, the probability that the apple was picked from basket 1 is (2/6) / (3/6) = 2/3. In words: the probability that basket 1 provided the picked apple is the probability of both events together (step 3) divided by the overall probability to pick an apple (steps 3 and 6 combined).
In formula: P(C|E) = P(C and E) / P(E).

  • Example 2, calculating an effect given the cause.

Given that basket 1 is picked (event C), what is the chance to pick an apple (event E)?

P(E|C) = P(C and E) / P(C) = 2/6 / 1/2 = 2/3.
Since P(C) = 1 (the cause is given) and thus P(C and E) = P(E), P(C and E) / P(C) = P(E)/1 = P(E). Therefore P(E|C) = P(E).
The probability to pick an apple from it is 2/3, as in statement 2, which is unconditional. There's no need for the conditional method.

  • Example 3, calculating a cause given the effect, when the effect is equally distributed over the sample space.

Given that an orange is picked (event E), what is the chance that it comes from basket 1 (event C)?

P(C|E) = P(C and E) / P(E) = (1/6) / (1/6 + 1/6) = 1/2.
The orange has the same probability to be picked from every basket, because it's equally distributed over all baskets. This makes the probability the same as the probability of the cause, as in statement 1. Therefore P(C|E) = P(C). There's no need for the conditional method.
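All three examples can be checked by exact enumeration; a short Python sketch assuming the basket setup described above (an editorial illustration):

```python
# Outcomes are (basket, item) pairs: a basket with probability 1/2,
# then an item uniformly from that basket.
from fractions import Fraction

baskets = {1: ['apple', 'apple', 'orange'],
           2: ['apple', 'orange', 'peach']}
outcomes = {}
for b, items in baskets.items():
    for item in items:
        key = (b, item)
        outcomes[key] = outcomes.get(key, 0) + Fraction(1, 2 * len(items))

def prob(event):
    return sum(p for w, p in outcomes.items() if event(w))

def cond(a, b):
    # P(A|B) = P(A and B) / P(B)
    return prob(lambda w: a(w) and b(w)) / prob(b)

apple = lambda w: w[1] == 'apple'
orange = lambda w: w[1] == 'orange'
basket1 = lambda w: w[0] == 1

assert cond(basket1, apple) == Fraction(2, 3)   # Example 1: cause given effect
assert cond(apple, basket1) == Fraction(2, 3)   # Example 2: effect given cause
assert cond(basket1, orange) == Fraction(1, 2)  # Example 3: equally distributed effect
assert cond(basket1, orange) == prob(basket1)   # ... so P(C|E) = P(C)
```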

You're missing the point: there is no such thing as a conditional method. In every case you mention, a conditional probability has to be calculated. What you seem to demonstrate is that in some cases this conditional probability is easily determined without using Bayes' formula. Nothing new or surprising. Nijdam (talk) 17:20, 18 January 2010 (UTC)
At least you're not missing the point; thanks for using the right terms. Good to know that this is not surprising to you. Now let's bother about the rest of our customers. First of all I tried to provide an example (1) that would be easy to understand and at the same time explain the formula. Secondly I try to address the usefulness of what you call Bayes' formula, which doesn't seem to be a maths question but would IMO be so nice to have on Wikipedia. I've read Bayes' theorem, in which the formula P(C|E) = P(C and E)/P(E) is only mentioned somewhere halfway as the definition of conditional probability. I miss an introduction/definition/description of this formula. Heptalogos (talk) 20:16, 18 January 2010 (UTC)


What is the meaning of the following section from the article?

In general it does not make much sense to ask after observation of a remarkable series of events, "What is the probability of this?"; which is a conditional probability based upon observation. The distinction between conditional and unconditional probabilities can be intricate if the observer who asks "What is the probability?" is himself/herself an outcome of a random selection. The name "Wyatt Earp effect" was coined in an article "Der Wyatt Earp Effekt" (in German) showing through several examples its subtlety and impact in various scientific domains.

Nijdam (talk) 13:48, 31 August 2010 (UTC)


Conditional probability is the probability of some event A, given the occurrence of some other event B.

I am thinking of it like this,

"Conditional probability is the probability of some event x being an element of a set A, given that the event x is an element of a set B."

I think of an event as taking a ball from a bag of balls. Or an event might actually be anything. The key thing is that an event is a member of sets.

Am I correct or barking? Thepigdog (talk) 11:46, 2 February 2011 (UTC)

Barking is a strong word... but an event is not an ELEMENT of a set; it IS a set. The elements are called outcomes, so "event x" should be "outcome x". However, you say "some event x", and "some" here suggests to me that it's about a SPECIFIC x, which it isn't. Exactly why the original sentence should be changed is not clear to me, and my English is not good enough to be sure if the following would be better, but it is how I'd write it:
"The conditional probability of an event A given an event B is the probability that the event A occurs, when it is known that event B occurs."
Introducing the terminology of sets and elements is, I think, confusing to many lay readers.-- (talk) 14:37, 2 February 2011 (UTC)
Yes, I see my terminology was wrong. An event is the set to which the outcome belongs.
I suggest the following wording,
"The conditional probability of an event A given an event B is the probability that the outcome x occurs in the event A, when it is known that the outcome x is in the event B."
I think this demystifies the language and explains what is really happening. I believe that describing probability in terms of elements (outcomes) makes it easier for a layperson like me to understand.
The probability is then clearly the proportion of the possible outcomes in B that are also in A.
The terminology P(A|B) is confusing because A|B has no meaning. Only P(A|B) has meaning. This is also confusing for the layperson/student. In a sense all probabilities are conditional because P(A) = P(A|All).
Thepigdog (talk) 03:26, 5 February 2011 (UTC)

Write (A|B) instead of P(A|B)

The conditional probability is written P(A|B), or perhaps p(A|B) or Pr(A|B). Why not omit the P and write (A|B) instead of P(A|B) for the conditional probability, and (A| ) instead of P(A) for the unconditional probability? Like the bra-ket notation of quantum mechanics.

The formula

(A∩B| ) = (A|B)(B| ) = (B|A)(A| )

is easier to read than

P(A∩B) = P(A|B)P(B) = P(B|A)P(A)

Bo Jacoby (talk) 15:03, 6 July 2011 (UTC).

We have to follow established notation. Anything else would clearly be WP:OR. As for the wider question of whether the P actually serves anything, I think maybe it does -- for example, avoiding any confusion with certain inner product notations. Jheald (talk) 11:14, 7 July 2011 (UTC)

Terminology Section

In my view, the section Terminology is unnecessary. Although I can see the logic in that joint and marginal probabilities are part of the definition of conditional probability, this article should not be for explaining these concepts - it should be sufficient to link to the relevant articles. This material stems from very early in the article history when it was stated "This article defines some terms which characterize probability distributions of two or more variables." I deleted the section, however it was reverted, with the reason that its existence was "to avoid entries in see also section". I am of the opposite opinion - if anything, it is more appropriate and concise to link from See Also to the relevant articles. Thoughts? Gnathan87 (talk) 13:58, 4 August 2011 (UTC)

The purpose of the "See also" section is plainly set out at WP:SEEALSO. There needn't be a "terminology" section, but the article needs to say enough about the related concepts mentioned there presently that the article stands on its own. Melcombe (talk) 14:10, 4 August 2011 (UTC)
Apologies, I wasn't very clear on this - my preferred way forward would be to 1. remove the Terminology section as it stands 2. rewrite the article to include links to Joint probability and Marginal probability in a context that more obviously establishes relevance to the topic 3. also include these links from a navbox. While it is desirable to make an article as self sufficient as possible, a balance should be struck between readers' needs and focus. I would argue that many readers will be familiar with these concepts, and for those who are not, it will be more useful to point to the prerequisites. Gnathan87 (talk) 18:36, 4 August 2011 (UTC)

This article has come a long way since 2007; I'm re-rating it as C-class quality. -Bryanrutherford0 (talk) 17:37, 19 July 2013 (UTC)


The lead contains discussion that belongs into the article body, please refer to WP:LEAD. Paradoctor (talk) 00:42, 17 January 2014 (UTC)