Quantopian Lecture Series: Hypothesis Testing

Rigorous statistics is the underpinning of any quantitative finance, and the underpinning of rigorous statistics is careful formulation and testing of hypotheses. In this notebook we introduce the concept of testing a hypothesis.

Hypothesis testing is incredibly important because it allows you to decide whether a certain property of the market holds at a chosen level of confidence. Without it you are basically just guessing, with no sense of how likely it is that you're wrong.

All lectures can be found here: https://www.quantopian.com/lectures

Disclaimer

The material on this website is provided for informational purposes only and does not constitute an offer to sell, a solicitation to buy, or a recommendation or endorsement for any security or strategy, nor does it constitute an offer to provide investment advisory services by Quantopian. In addition, the material offers no opinion with respect to the suitability of any security or specific investment. No information contained herein should be regarded as a suggestion to engage in or refrain from any investment-related course of action as none of Quantopian nor any of its affiliates is undertaking to provide investment advice, act as an adviser to any plan or entity subject to the Employee Retirement Income Security Act of 1974, as amended, individual retirement account or individual retirement annuity, or give advice in a fiduciary capacity with respect to the materials presented herein. If you are an individual retirement or other investor, contact your financial advisor or other fiduciary unrelated to Quantopian about whether any given investment idea, strategy, product or service described herein may be appropriate for your circumstances. All investments involve risk, including loss of principal. Quantopian makes no guarantees as to the accuracy or completeness of the views expressed in the website. The views are subject to change, and may have become unreliable for various reasons, including changes in market conditions or economic circumstances.


As always, thanks for the lecture. I noticed a couple of things that should be revised:

  • In the section "Hypothesis Testing on Means", when describing the t-distribution, it should read, "It has fatter tails and a lower peak, giving more flexibility compared to a normal distribution." (The lecture says that the t-distribution has a higher peak than the normal distribution.)

  • In the example on "Hypothesis Testing on Variances" (in the last cell of that section), the lecture should read, "Because we are using the 'less than or equal to' formulation of a one-sided hypothesis test, we reject the null hypothesis if our test statistic is greater than the critical value." (The lecture states that we reject the null if our test statistic is less than the critical value.)
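Both corrections can be checked numerically. The sketch below, assuming SciPy and NumPy are available, first compares the peak height and tail mass of a t-distribution (5 degrees of freedom, an arbitrary choice) with the standard normal, then runs a "less than or equal to" variance test on simulated returns. The function `upper_tail_variance_test` and all the data are hypothetical illustrations, not code from the lecture.

```python
import numpy as np
from scipy import stats

# --- Point 1: the t-distribution has a LOWER peak and FATTER tails than the normal ---
df = 5  # degrees of freedom; an arbitrary illustrative choice
peak_t, peak_norm = stats.t.pdf(0, df), stats.norm.pdf(0)
# two-sided probability of a value more than 3 units from the centre
tail_t, tail_norm = 2 * stats.t.sf(3, df), 2 * stats.norm.sf(3)
print(f"peak:  t={peak_t:.4f}  normal={peak_norm:.4f}")
print(f"tails: t={tail_t:.4f}  normal={tail_norm:.4f}")


# --- Point 2: the "less than or equal to" variance test rejects in the UPPER tail ---
def upper_tail_variance_test(x, sigma0_sq, alpha=0.05):
    """Hypothetical helper: test H0: sigma^2 <= sigma0_sq against HA: sigma^2 > sigma0_sq."""
    n = len(x)
    # (n - 1) * s^2 / sigma0^2 follows chi-square(n - 1) at the boundary of H0
    stat = (n - 1) * np.var(x, ddof=1) / sigma0_sq
    critical = stats.chi2.ppf(1 - alpha, df=n - 1)
    # large statistics are evidence against H0, so reject when stat > critical
    return stat, critical, stat > critical


rng = np.random.default_rng(0)
noisy = rng.normal(0.0, 0.02, size=50)   # hypothetical returns, true sd 0.02
calm = rng.normal(0.0, 0.005, size=50)   # hypothetical returns, true sd 0.005
for name, data in (("noisy", noisy), ("calm", calm)):
    stat, crit, reject = upper_tail_variance_test(data, sigma0_sq=0.01 ** 2)
    print(f"{name}: statistic={stat:.1f} critical={crit:.1f} reject H0: {reject}")
```

With a hypothesised standard deviation of 0.01, the high-variance sample should land above the upper critical value and the low-variance sample well below it, which is why the rejection rule must read "greater than".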

Hope this helps. Feel free to let me know if I am mistaken.

Mike

Thanks, Mike! Good catches. I will correct the lecture.


Hi, you mentioned p-values as the "probability that the null hypothesis is false". This is actually a wrong interpretation; in fact, most people misinterpret the p-value. The right way to interpret it is: assuming that the null hypothesis is true, what is the probability that we would see a difference at least as large as the one we observed? So it doesn't say whether the null hypothesis is false or true; it already assumes it is true.
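The "assume the null is true" reading can be made concrete by simulation. This is a minimal sketch, assuming SciPy and NumPy and using made-up returns: it generates many samples from a world in which H0 (true mean = 0) holds, and counts how often they produce a t-statistic at least as extreme as the observed one. That fraction should agree closely with the p-value SciPy reports for a two-sided one-sample t-test.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)

# Hypothetical observed sample: 30 returns whose true mean is slightly positive.
sample = rng.normal(loc=0.005, scale=0.02, size=30)
t_obs = stats.ttest_1samp(sample, popmean=0.0).statistic

# Simulate the world in which H0 (true mean = 0) holds, many times over.
# The t-statistic does not depend on the scale, so any sd would do here.
n_sims = 20_000
null_samples = rng.normal(loc=0.0, scale=0.02, size=(n_sims, 30))
null_t = stats.ttest_1samp(null_samples, popmean=0.0, axis=1).statistic

# Fraction of null-world statistics at least as extreme as the observed one (two-sided).
p_simulated = np.mean(np.abs(null_t) >= abs(t_obs))
p_analytic = stats.ttest_1samp(sample, popmean=0.0).pvalue
print(f"simulated p={p_simulated:.4f}  analytic p={p_analytic:.4f}")
```

Nothing in this calculation estimates how likely H0 itself is; every simulated sample is drawn with H0 already taken as true.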

Hey there, Seine.

You are completely correct; in fact, the full passage in the lecture reads: "Often people will interpret p-values as the "probability that the null hypothesis is false", but this is misleading. A p-value only makes sense when compared to the significance level. If a p-value is less than α, we reject the null and otherwise we do not. Lower p-values do not make something "more statistically significant"."

If this is not the passage you were referring to or if you believe the lecture is still incorrect please let me know. We want to be sure that there is nothing misleading in the lecture as we understand how subtle p-value interpretations can be.

I was reviewing this material as well, and the Python notebook is a really great format. It allows for the kind of exchange we're having here, where we can share and reproduce results.

My interpretation of a p-value agrees with Seine's. And not to get too nitpicky, but I don't see a p-value as binary. The result of the test is binary, but the p-value is a probability (as defined above). Moreover, the usefulness of the p-value is that it depends only on the null hypothesis and the sample statistic, so anybody can bring whatever significance level they have in mind. In other words, a p-value depends on the assumed underlying distribution (the null hypothesis) and on the sample statistic we obtained, and you can then either retain or reject the null hypothesis depending on the level of significance you require.
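The point that one p-value can serve readers with different significance levels is easy to show. The sketch below, assuming SciPy and a small hypothetical list of daily returns, computes a single two-sided p-value against the null hypothesis of zero mean and then applies three different values of α to it; only the final decision changes, never the p-value.

```python
from scipy import stats

# Hypothetical daily returns (percent); any sample would do.
returns = [0.4, -0.1, 0.7, 0.3, -0.2, 0.5, 0.6, 0.1, -0.3, 0.8]

# The p-value depends only on the data and the null hypothesis (mean = 0)...
p_value = stats.ttest_1samp(returns, popmean=0.0).pvalue

# ...while each reader can compare it against their own significance level.
decisions = {}
for alpha in (0.10, 0.05, 0.01):
    decisions[alpha] = "reject H0" if p_value < alpha else "fail to reject H0"
    print(f"alpha={alpha:.2f}: p={p_value:.4f} -> {decisions[alpha]}")
```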

I am currently reading the Hypothesis Testing lecture and found two lines that seem contradictory. Excerpts are below; I have used bold font to highlight the lines in question.

The Null and Alternative Hypothesis
The first thing we need to introduce is the null hypothesis, commonly written as H0. The null hypothesis is the default case, generally reflecting the current common conception of the world. The alternative hypothesis is the one you are testing.

First we state the hypothesis that we wish to test. We do this by identifying a null hypothesis and an alternative hypothesis. The null hypothesis, H0, is the one that we want to test, while the alternative hypothesis, HA, is the hypothesis that is accepted in the case where H0 is rejected.
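One way to see how the two hypotheses divide the work in practice: the probability calculation is always carried out as if H0 were true, while HA determines which direction of departure counts as evidence against H0. The sketch below assumes SciPy 1.6 or later (for the `alternative` argument of `scipy.stats.ttest_1samp`) and uses simulated, hypothetical returns.

```python
import numpy as np
from scipy import stats

# Hypothetical sample of 40 daily returns.
rng = np.random.default_rng(7)
returns = rng.normal(loc=0.003, scale=0.01, size=40)

# H0: mean return = 0 (the default view). The p-value is computed as if H0 were true.
# HA only decides which tail(s) count as "extreme".
one_sided = stats.ttest_1samp(returns, popmean=0.0, alternative="greater")  # HA: mean > 0
two_sided = stats.ttest_1samp(returns, popmean=0.0)                         # HA: mean != 0

print(f"t={two_sided.statistic:.3f}")
print(f"HA: mean > 0  -> p={one_sided.pvalue:.4f}")
print(f"HA: mean != 0 -> p={two_sided.pvalue:.4f}")
```

When the observed t-statistic is positive, the one-sided p-value is half the two-sided one, because only the upper tail is treated as evidence for HA.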

I wanted to add one more excerpt of something that appears contradictory:

For example, we might estimate a sample mean as 100, with a confidence interval of (90, 110) at a 95% confidence level. This doesn't mean that the true population mean is between 90 and 110 with 95% probability

For example, if our 99% confidence interval for the mean of MSFT returns was (−0.0020, 0.0023), that would mean that there was a 99% chance that the true value of the mean was within that interval.


"For example, we might estimate a sample mean as 100, with a confidence interval of 90,110 at a 95% confidence level. This doesn't mean that the true population mean is between 90 and 110 with 95% probability, as the true mean is a fixed value and the probability is 100% or 0% but we don't know which one. Instead what this means is that over many computations of a 95% confidence interval assuming underlying assumptions about distributions hold, the population mean will be in the interval 95% of the time."

How is the first part of this statement different from the second part?
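The distinction the passage draws can be checked with a coverage simulation. In this sketch (assuming NumPy and SciPy; the population mean of 100, standard deviation of 20, and sample size of 50 are hypothetical), the true mean is fixed and known to the simulation. Each individual 95% t-interval either contains it or does not; the 95% describes the fraction of intervals, over many repetitions, that do.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(123)

true_mean = 100.0   # fixed population mean; unknown in practice (hypothetical value)
sigma = 20.0        # population standard deviation (hypothetical)
n = 50              # observations per sample
n_trials = 10_000   # number of repeated experiments
confidence = 0.95

# Draw n_trials independent samples and build one t-interval per sample.
samples = rng.normal(true_mean, sigma, size=(n_trials, n))
means = samples.mean(axis=1)
sems = stats.sem(samples, axis=1)                 # standard error of the mean (ddof=1)
t_crit = stats.t.ppf(0.5 + confidence / 2, df=n - 1)
low, high = means - t_crit * sems, means + t_crit * sems

# Each single interval either contains the true mean (True) or it does not (False).
contains = (low <= true_mean) & (true_mean <= high)
print(f"first interval: ({low[0]:.2f}, {high[0]:.2f}) contains true mean: {contains[0]}")

# The confidence level is a statement about the long-run fraction of such intervals.
coverage = contains.mean()
print(f"fraction of intervals containing the true mean: {coverage:.3f}")
```

Any one printed interval, like the (90, 110) in the example, is simply right or wrong; the coverage fraction across all intervals is what comes out close to 0.95.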