WEBVTT
1
00:00:21.980 --> 00:00:27.990
Welcome to Swayam. This course is called Biostatistics and Mathematical Biology. Thank you for choosing
2
00:00:27.990 --> 00:00:33.680
this web course and I welcome you all to this course, and I hope you will have a great time
3
00:00:33.680 --> 00:00:34.960
attending this course.
4
00:00:34.960 --> 00:00:39.320
First of all, let me tell you why you would want to take this course, what benefits
5
00:00:39.320 --> 00:00:45.280
you are going to get out of this course, or in other words, why you would want to study
6
00:00:45.280 --> 00:00:51.370
Biostatistics and Mathematical Biology. There are three main benefits that you
7
00:00:51.370 --> 00:00:56.329
can gain by learning Biostatistics and Mathematical Biology if you are a biology student.
8
00:00:56.329 --> 00:01:01.750
First of all, you can analyze your own data, which is very important, and infer meaningful
9
00:01:01.750 --> 00:01:07.060
conclusions from your data. The second is that this course, and learning biostatistics,
10
00:01:07.060 --> 00:01:14.910
will help you to read and understand primary research articles on your subject,
11
00:01:14.910 --> 00:01:15.910
in your field.
12
00:01:15.910 --> 00:01:20.840
For example, if you are reading any kind of scientific article, which is
13
00:01:20.840 --> 00:01:25.750
also called scientific literature, right? Or a primary research paper, or scholarly literature;
14
00:01:25.750 --> 00:01:30.530
there are a lot of names for that. These are original research articles. So, if you read this kind
15
00:01:30.530 --> 00:01:34.650
of original research article, it’s very difficult to understand that research article
16
00:01:34.650 --> 00:01:39.979
if you do not understand how these biostatistical, or statistical, operations are performed.
17
00:01:39.979 --> 00:01:45.300
So, this course will help you to understand and decipher the primary research article.
18
00:01:45.300 --> 00:01:51.220
And the third important point is that it helps you make informed decisions in your life. For example,
19
00:01:51.220 --> 00:01:57.429
purchasing an annuity or an insurance policy, or being an informed customer, right, to
20
00:01:57.429 --> 00:02:03.090
buy things in an informed way, or to understand the risks associated with investing
21
00:02:03.090 --> 00:02:09.069
or the mathematical expectation in a lottery or gambling, and so on. So, the course will not only help
22
00:02:09.069 --> 00:02:16.580
biologists, it’ll also help you with most of the general reasoning of everyday life.
23
00:02:16.680 --> 00:02:23.300
There is a very famous quote by Ronald Aylmer Fisher, who is often called the father of modern statistics, and
24
00:02:23.300 --> 00:02:28.500
his famous statement is that "To call in the statistician after the experiment is done
25
00:02:28.500 --> 00:02:35.430
may be no more than asking him to perform a post-mortem examination: he may be able to
27
00:02:35.430 --> 00:02:38.609
say what the experiment died of."
28
00:02:38.609 --> 00:02:45.049
So, a common strategy for most of the biology students and investigators is to collaborate
29
00:02:45.049 --> 00:02:50.110
with a statistician for analyzing the data, after the data has been generated.
30
00:02:50.110 --> 00:02:56.579
So, that is not the right way. So, my motivation for starting this particular course is
31
00:02:56.579 --> 00:03:01.920
to enable students to know how to design an experiment
32
00:03:01.920 --> 00:03:07.569
properly and how to analyze the data, right, so that statistics comes into the picture
33
00:03:07.569 --> 00:03:13.569
much earlier, even during the design of the scientific experiments.
34
00:03:13.569 --> 00:03:17.959
The duration of this course is fifteen weeks. So, it is a three-credit course; in these fifteen
35
00:03:17.959 --> 00:03:23.200
weeks, each week we are going to cover two modules. So, that is
36
00:03:23.200 --> 00:03:27.370
the structure of this particular course. And the level of this course is postgraduate.
37
00:03:27.370 --> 00:03:32.000
So, for graduate students this will be of great help, right?
38
00:03:32.000 --> 00:03:37.469
So, the transfer of credits between UGC-recognized universities across India,
39
00:03:37.469 --> 00:03:43.310
pan-India, is possible. So, you can take these credits as part of your existing, ongoing postgraduate
40
00:03:43.310 --> 00:03:46.830
program, and you can earn the credits from this course because this is
41
00:03:46.830 --> 00:03:53.200
the UGC Swayam platform. The course is also absolutely free to take,
42
00:03:53.200 --> 00:03:56.799
so there are many advantages to taking this particular MOOC.
43
00:03:56.799 --> 00:04:01.969
So, the course may also be taken by anyone, irrespective of educational background,
44
00:04:01.969 --> 00:04:07.250
for understanding probability and statistics. So, as part of your lifelong learning, you
45
00:04:07.250 --> 00:04:12.930
can take this course. So, there are no prerequisites for taking this particular course.
47
00:04:12.930 --> 00:04:17.180
Let me tell you about the course. The course in one sentence, I can tell you, is
48
00:04:17.180 --> 00:04:23.560
that it is a non-mathematical, intuitive introduction to mathematics and statistics for biologists.
49
00:04:23.560 --> 00:04:28.180
So, the target audience is biologists, and it's a non-mathematical and
50
00:04:28.180 --> 00:04:34.240
intuitive introduction to the discipline of statistics and mathematical biology.
51
00:04:34.240 --> 00:04:39.430
The target audience, as I told you, is postgraduate-level students of the sciences with limited or
52
00:04:39.430 --> 00:04:44.080
no background in mathematics other than high school mathematics. So, I do not
53
00:04:44.080 --> 00:04:49.280
expect you to have a strong background in mathematics. We will actually start from ground zero.
54
00:04:49.280 --> 00:04:53.890
So take it easy, the course is going to be a lot of fun. So, you know, you can take this
55
00:04:53.890 --> 00:04:58.400
course, there's absolutely no problem in that. So it's suitable for postgraduate-level
56
00:04:58.400 --> 00:05:00.650
students in science and medicine.
57
00:05:00.650 --> 00:05:05.660
So, as I'm not looking for any special mathematical background from you, I'm
58
00:05:05.660 --> 00:05:10.990
going to make it as simple as possible, right, so the course is going to be very
59
00:05:10.990 --> 00:05:15.560
simple and very non-technical, and it will be great for those students who have
60
00:05:15.560 --> 00:05:19.090
never been exposed to college mathematics, for example.
61
00:05:19.090 --> 00:05:23.570
And the prerequisite for the course is nothing but a basic understanding of
62
00:05:23.570 --> 00:05:29.050
high school mathematics. And a degree in the sciences would be beneficial, because most
63
00:05:29.050 --> 00:05:33.340
of the examples that I'm going to cover in this particular course will be from the sciences,
64
00:05:33.340 --> 00:05:36.780
especially from biology. So the course is targeted at biological sciences
65
00:05:36.780 --> 00:05:40.040
students or biomedical students or medical students.
66
00:05:40.040 --> 00:05:45.670
Now, coming to the course objectives: what are the course objectives?
67
00:05:45.670 --> 00:05:51.840
The first objective is to introduce the basic concepts of probability, statistics and
68
00:05:51.840 --> 00:05:56.630
statistical hypothesis testing to students of biology. So I'm going to introduce the
69
00:05:56.630 --> 00:06:01.920
basic concepts of probability and statistics to students of biology. The second is to elaborate how
70
00:06:01.920 --> 00:06:06.680
to interpret the statistical results of a published study, so how to actually read and
71
00:06:06.680 --> 00:06:11.750
understand the published literature, the published papers. So, you can actually
72
00:06:11.750 --> 00:06:16.240
go through the published literature in a much more informed manner after taking
73
00:06:16.240 --> 00:06:17.800
this particular course.
74
00:06:17.800 --> 00:06:22.750
The third is choosing the right statistical test for the scientific problem and interpreting the
75
00:06:22.750 --> 00:06:27.810
research. You may have a specific scientific problem at hand. So, how do you actually
76
00:06:27.810 --> 00:06:32.570
perform, and how do you choose, the best statistical test to suit your requirements?
77
00:06:32.570 --> 00:06:38.800
This particular course will enable you to do exactly that. And it will also sensitize students
78
00:06:38.800 --> 00:06:43.900
to various statistical pitfalls to avoid; that will be elaborated as part
79
00:06:43.900 --> 00:06:49.700
of this course. And the last is to provide a brief framework of mathematical biology for students
80
00:06:49.700 --> 00:06:55.050
of biology, so mathematical biology will also be introduced as part of this particular course.
82
00:06:55.050 --> 00:07:00.960
So, the total number of modules in this course will be thirty, and each week we
83
00:07:00.960 --> 00:07:06.220
are going to cover two modules, right? Each module in turn is subdivided into three
84
00:07:06.220 --> 00:07:11.210
sections, and each section will consist of around thirteen minutes of video and around
85
00:07:11.210 --> 00:07:18.820
a thousand words of e-text. So, I hope everybody will actually go through this particular course.
87
00:07:18.820 --> 00:07:24.540
So the weekly time commitment will be approximately three hours. Each week, I expect
88
00:07:24.540 --> 00:07:29.920
you to commit around three hours, right, so that would mean a total of eighty
89
00:07:29.920 --> 00:07:37.190
minutes of video per week, plus eighty minutes of reading as well as problem solving.
90
00:07:37.190 --> 00:07:41.100
So eighty minutes of video, eighty minutes of reading and problem solving, plus twenty minutes
91
00:07:41.100 --> 00:07:45.600
of assessment. So the overall time commitment for the entire
92
00:07:45.600 --> 00:07:50.380
course is around forty-five hours, which includes twenty hours of video.
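The time budget quoted here can be checked with a few lines of arithmetic. This is only a sketch: the numbers are the ones stated in the lecture, while the variable names are made up for illustration.

```python
# Figures as stated in the lecture; the variable names are illustrative.
WEEKS = 15
MODULES_PER_WEEK = 2
SECTIONS_PER_MODULE = 3
SECTION_VIDEO_MIN = 13   # "around thirteen minutes" per section

VIDEO_MIN = 80           # stated weekly video time
READING_MIN = 80         # stated weekly reading and problem solving
ASSESSMENT_MIN = 20      # stated weekly assessment time

# Six sections of about 13 minutes give 78 minutes, i.e. roughly 80.
video_from_sections = MODULES_PER_WEEK * SECTIONS_PER_MODULE * SECTION_VIDEO_MIN

weekly_min = VIDEO_MIN + READING_MIN + ASSESSMENT_MIN
weekly_hours = weekly_min / 60
total_hours = WEEKS * weekly_hours
video_hours = WEEKS * VIDEO_MIN / 60

print(f"video from sections: about {video_from_sections} min per week")
print(f"weekly commitment: {weekly_min} min = {weekly_hours:.0f} h")
print(f"whole course: {total_hours:.0f} h, of which video: {video_hours:.0f} h")
```

The totals agree with the stated three hours per week, forty-five hours overall and twenty hours of video.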
93
00:07:50.380 --> 00:07:55.400
So, in the example week you can see here, on Sunday I’m going to release module
94
00:07:55.400 --> 00:08:01.290
one, section one, the video and e-text; on Monday I’ll release
95
00:08:01.290 --> 00:08:06.590
module number one, the same module, section two, and the next day section three.
96
00:08:06.590 --> 00:08:12.490
So on the third day, I'm going to release an ungraded test for this particular module. Remember that there
97
00:08:12.490 --> 00:08:17.360
are two types of tests, ungraded and graded. In the case of ungraded tests, your marks will
98
00:08:17.360 --> 00:08:21.250
not be counted towards your final performance.
99
00:08:21.250 --> 00:08:27.700
Now, on Wednesday, we'll start with the second module, section one, then section
100
00:08:27.700 --> 00:08:32.539
two, then section three, and then on the same day, we are going to have an ungraded test
101
00:08:32.539 --> 00:08:36.789
for that particular module. And on Saturday we are going to have the graded test, so that
102
00:08:36.789 --> 00:08:40.570
is how in each week we are going to cover two modules.
103
00:08:40.570 --> 00:08:45.450
So let us go through the modules one by one. From the first week to week
104
00:08:45.450 --> 00:08:50.920
number five, we are going to cover ten modules, right? So, in the first week we are going
105
00:08:50.920 --> 00:08:55.759
to cover the module entitled Biostatistics and Mathematical Biology: An Introduction, and
106
00:08:55.759 --> 00:08:59.930
subsequently, in the same week, we are going to cover Types of Studies.
107
00:08:59.930 --> 00:09:05.517
In the second week we are going to cover Levels of Measurement and Summarizing the Data: Tabular Presentation.
108
00:09:05.517 --> 00:09:09.310
In the third week we are going to cover Summarizing the Data:
109
00:09:09.310 --> 00:09:14.639
Graphical Presentation, and Charting with Excel. And now coming to the fourth week, we
110
00:09:14.639 --> 00:09:19.149
are going to cover Descriptive Statistics: Point Estimates, and then Interval Estimates.
111
00:09:19.149 --> 00:09:24.689
And in the fifth week, we are going to cover Error Bars, Moments, Normality Test and Outliers.
112
00:09:24.689 --> 00:09:30.350
Now coming to weeks six to ten: in the sixth week we are going to cover Concepts
113
00:09:30.350 --> 00:09:35.360
of Population, Sample, Confidence Interval, and subsequently, in the same week, Statistical
114
00:09:35.360 --> 00:09:41.380
Hypothesis Testing. And in the seventh week we are going to cover Statistical Significance and P-Values
115
00:09:41.380 --> 00:09:46.470
and Relationship between Confidence Intervals and Statistical Significance.
116
00:09:46.470 --> 00:09:51.889
Subsequently, in the next week, we are going to cover Statistical Power and Choosing the
117
00:09:51.889 --> 00:09:55.830
Right Sample Size; I'm going to elaborate how to choose the best sample size for your
118
00:09:55.830 --> 00:10:01.269
data. And subsequently the t-distribution and the test of significance based on the t-distribution
119
00:10:01.269 --> 00:10:02.450
will be elaborated.
120
00:10:02.450 --> 00:10:07.699
Now in the ninth week we are going to cover the F-distribution and the test of significance
121
00:10:07.699 --> 00:10:12.339
based on the F-distribution, and in the same week we are going to cover the chi-squared distribution
122
00:10:12.339 --> 00:10:16.889
and the test of significance based on the chi-squared distribution.
123
00:10:16.889 --> 00:10:22.019
And finally, in the tenth week, we are going to cover Comparing Proportions, and
124
00:10:22.019 --> 00:10:28.670
in the same week we are also going to cover Gaussian, Binomial, Lognormal and Poisson Distributions.
125
00:10:28.670 --> 00:10:31.980
So, different kinds of distributions will be elaborated in the tenth week.
126
00:10:31.980 --> 00:10:36.819
In the eleventh week we are going to cover Pearson's Correlation and Simple Linear Regression, and
127
00:10:36.819 --> 00:10:42.819
in the twelfth week we're going to cover Non-Linear Regression as well as Nonparametric Tests.
128
00:10:42.819 --> 00:10:46.879
Then in the thirteenth week we are going to cover Permutations and Combinations, and in the
129
00:10:46.879 --> 00:10:52.550
same week we are also going to cover Probability. In the fourteenth
130
00:10:52.550 --> 00:10:57.160
week, we are going to cover Bayes' Theorem and Maximum Likelihood. And in the same week
131
00:10:57.160 --> 00:11:02.770
we're going to cover Statistics with MS Excel and GraphPad Prism, so these two
132
00:11:02.770 --> 00:11:07.890
most important software packages we are going to cover comprehensively in this particular course.
133
00:11:07.890 --> 00:11:12.520
And finally, in the fifteenth week, the last week of this program, we are going to cover
134
00:11:12.520 --> 00:11:17.209
Key Concepts of Statistics. This is a kind of summing-up of the whole course, okay, so
135
00:11:17.209 --> 00:11:21.069
the key takeaways from this course we are going to cover, and finally Statistical Pitfalls
136
00:11:21.069 --> 00:11:25.370
to Avoid; those are the main takeaways from this course that we are going
137
00:11:25.370 --> 00:11:27.189
to cover in the fifteenth week.
138
00:11:27.189 --> 00:11:33.130
We will be covering two of the most widely used software packages for biostatistical analysis.
139
00:11:33.130 --> 00:11:38.230
The first one is Microsoft Excel, while the second one is called GraphPad Prism.
140
00:11:38.230 --> 00:11:43.300
Version seven is being used for this MOOC. So, let us first look at Microsoft Excel.
141
00:11:43.300 --> 00:11:51.329
I click here, the Microsoft Excel icon. Here you can see four groups: uranium, lead, arsenic, and mercury.
143
00:11:51.329 --> 00:11:56.069
I'll just show you how to perform a commonly used statistical analysis, ANOVA.
144
00:11:56.069 --> 00:12:03.309
I click Data here, then I click Data Analysis, and I click Anova: Single Factor; you
145
00:12:03.309 --> 00:12:09.230
can see there are three types of ANOVA here: two-factor with replication, two-factor
146
00:12:09.230 --> 00:12:13.280
without replication, and single factor. So, I select single factor here. I tell you again,
147
00:12:13.280 --> 00:12:19.730
don't worry, we are going to cover all about ANOVA later in this course.
149
00:12:19.730 --> 00:12:24.639
So, first I click Input Range here and define the input range, which also includes the labels.
150
00:12:24.639 --> 00:12:30.350
Since it contains labels, I tick Labels in First Row here.
151
00:12:30.350 --> 00:12:37.160
I click New Worksheet Ply and click OK here. So, we get the results
152
00:12:37.160 --> 00:12:41.720
of the single-factor ANOVA, which also show the P value here. The obtained P
153
00:12:41.720 --> 00:12:47.930
value is three point three six E minus zero six, that means three point three six multiplied
154
00:12:47.930 --> 00:12:54.009
by ten to the power of minus six. So this is the P value. And again, I tell you, don't worry about it;
155
00:12:54.009 --> 00:12:59.490
I will teach you how to interpret this P value, but this is how to perform the one-way ANOVA
156
00:12:59.490 --> 00:13:00.490
in a nutshell.
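Excel prints this P value in E notation, which can be confusing at first. The following is a small sketch in Python (not part of the Excel demonstration, and the variable names are only illustrative) showing that 3.36E-06 is the same number as 3.36 multiplied by ten to the power of minus six.

```python
import math

# The P value as Excel displays it, parsed from its E-notation text form.
p_value = float("3.36E-06")

# The same quantity written out as 3.36 multiplied by 10 to the power -6.
written_out = 3.36 * 10 ** -6

# Compare with a tolerance, since floating-point products are not exact.
print(math.isclose(p_value, written_out))
print(f"{p_value:.8f}")   # plain decimal form
print(f"{p_value:.2e}")   # back to E notation
```

The decimal form makes the scale clear: the first non-zero digit appears at the sixth decimal place.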
157
00:13:00.490 --> 00:13:07.360
Now, let us look at GraphPad Prism. Here is one example data sheet; here we have
158
00:13:07.360 --> 00:13:13.050
group A and group B. These are nothing but the marks that the students got in MST one
159
00:13:13.050 --> 00:13:18.339
and then MST two. So we will have to check out these two groups
160
00:13:18.339 --> 00:13:20.600
for the column statistics.
161
00:13:20.600 --> 00:13:26.949
So first I just have to click here to highlight group A and group B. Then I go here to
162
00:13:26.949 --> 00:13:34.269
Insert at the top, then new graph from the existing data; then I click here, column statistics,
163
00:13:34.269 --> 00:13:40.089
the scatter plot with bar; so this is the scatter plot with bar, or I can also click
164
00:13:40.089 --> 00:13:45.279
here the scatter plot. So I simply click here the scatter plot, I click OK, and then we
165
00:13:45.279 --> 00:13:51.420
have got this scatter plot. So we can see here MST one and MST two; each dot represents
166
00:13:51.420 --> 00:13:56.519
one data point, that is, the marks that a student
167
00:13:56.519 --> 00:14:03.029
got. On the Y axis are the marks, while the X axis shows MST one and MST two. The middle
168
00:14:03.029 --> 00:14:10.390
line is basically the average, while this plus and minus is the ninety-five percent confidence interval.
170
00:14:10.390 --> 00:14:13.759
Again, as I told you, don't worry; I will actually tell you all about this confidence interval
171
00:14:13.759 --> 00:14:20.300
and how to calculate this particular ninety-five percent confidence interval, et cetera.
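As a rough preview of what the middle line and the plus and minus bars represent, here is a minimal Python sketch. It is not how Prism computes its output: it assumes a normal approximation for the interval (a t-based interval would be somewhat wider for a sample this small), and the marks are hypothetical values invented for illustration, not the data shown in the video.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev

def mean_with_ci(values, confidence=0.95):
    """Return (mean, lower, upper) using a normal approximation."""
    n = len(values)
    m = mean(values)
    sem = stdev(values) / sqrt(n)                    # standard error of the mean
    z = NormalDist().inv_cdf(0.5 + confidence / 2)   # about 1.96 for 95%
    return m, m - z * sem, m + z * sem

# Hypothetical marks, invented for illustration only.
mst_one = [12, 15, 14, 10, 18, 16, 13, 17]

m, lower, upper = mean_with_ci(mst_one)
print(f"mean = {m:.2f}, 95% CI = ({lower:.2f}, {upper:.2f})")
```

The mean corresponds to the middle line of the plot, and the interval to the bars around it.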
172
00:14:20.300 --> 00:14:24.329
The textbook that we are going to follow for this course is this book, Intuitive Biostatistics,
173
00:14:24.329 --> 00:14:28.529
which is available in bookstores all around the country, or you can even order it
174
00:14:28.529 --> 00:14:33.869
online. But you don't really need to buy this particular book; we are going to cover
175
00:14:33.869 --> 00:14:37.170
most of the contents of this book and how to perform the operations
176
00:14:37.170 --> 00:14:41.309
as outlined in this particular book. So anyway, this is our course textbook, which is actually
177
00:14:41.309 --> 00:14:46.380
called Intuitive Biostatistics: A Nonmathematical Guide to Statistical Thinking, from Oxford
178
00:14:46.380 --> 00:14:49.929
University Press, by Harvey Motulsky.
179
00:14:49.929 --> 00:14:53.819
Coming to the assessment of this particular course,
180
00:14:53.819 --> 00:14:59.499
twenty percent of the total credit you are going to earn from the online
181
00:14:59.499 --> 00:15:04.170
tests, that is, the graded test that we are going to have each week.
182
00:15:04.170 --> 00:15:07.550
So from those graded assignments and graded tests you're going to earn twenty percent
183
00:15:07.550 --> 00:15:14.240
of the total score of this particular course, and the remaining eighty percent you will be
184
00:15:14.240 --> 00:15:17.470
earning through the proctored examinations in select centers.
185
00:15:17.470 --> 00:15:22.319
Most probably the centers will be decided later by the UGC Swayam platform. So, you will
186
00:15:22.319 --> 00:15:27.649
have to go to that particular center and take that examination, with pen and
187
00:15:27.649 --> 00:15:32.410
paper or computer-based; that will be decided later on. So eighty percent will be
188
00:15:32.410 --> 00:15:37.240
through the proctored test, while twenty percent will be through the online examinations, where
189
00:15:37.240 --> 00:15:39.230
you have total freedom.
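The twenty and eighty percent weighting can be written as a simple weighted sum. The sketch below is only an illustration: the function name and the example scores are hypothetical, and how Swayam actually combines and rounds the marks is not stated in the lecture.

```python
def final_score(online_pct, proctored_pct, online_weight=20, proctored_weight=80):
    """Combine two percentage scores using the stated 20/80 weighting."""
    # Weights are whole percentages so the arithmetic stays exact.
    return (online_weight * online_pct + proctored_weight * proctored_pct) / 100

# Hypothetical student: 90% in the weekly online tests, 70% in the proctored exam.
score = final_score(90, 70)
print(f"final score: {score:.1f}%")
```

Because the proctored examination carries four times the weight of the online tests, the final score lies much closer to the proctored result.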
190
00:15:39.230 --> 00:15:44.029
There are several learning outcomes for this particular course. The first one is to learn the scope
191
00:15:44.029 --> 00:15:48.229
and application of the field of biostatistics and mathematical biology.
192
00:15:48.229 --> 00:15:53.019
The second one is to learn the correct way to interpret data using tables as well
193
00:15:53.019 --> 00:15:57.730
as diagrams, so how to interpret the data. The third is to learn how to choose
194
00:15:57.730 --> 00:16:03.110
the right test out of the repertoire of different statistical tests for the scientific
195
00:16:03.110 --> 00:16:07.470
problem at hand. So, for your scientific problem how to choose the best statistical test, right?
197
00:16:08.470 --> 00:16:13.179
Now, the fourth learning outcome of this particular
198
00:16:13.179 --> 00:16:18.660
course is to learn how to interpret the statistical results of the published scientific study.
199
00:16:18.660 --> 00:16:24.079
So, how to interpret that particular data. The fifth learning outcome is to learn how to
200
00:16:24.079 --> 00:16:29.869
perform the commonly used descriptive and inferential statistical tests on scientific
201
00:16:29.869 --> 00:16:35.670
data, and the interpretation of that data, how to interpret that data correctly.
203
00:16:35.670 --> 00:16:41.309
And finally, to learn how to perform commonly used statistical tests online and using
204
00:16:41.309 --> 00:16:47.350
MS Excel, and also to learn about the statistical pitfalls to avoid. So there are several learning
205
00:16:47.350 --> 00:16:53.220
outcomes of this particular course. And remember, this course is going to be as non-mathematical
206
00:16:53.220 --> 00:16:58.009
and as non-technical as possible, and it's going to be a cool course, and there is
207
00:16:58.009 --> 00:17:03.059
absolutely no problem associated with this course, other than a lot of fun. There are no prerequisites
208
00:17:03.059 --> 00:17:07.829
that I'm actually looking for in this particular course. Once again, thank you for
209
00:17:07.829 --> 00:17:13.449
choosing the course, and a warm welcome to the course. I suggest that you interact with other
210
00:17:13.449 --> 00:17:15.589
students and meet through the discussion forums.