Visit www.dummies.com/cheatsheet/biostatistics to view this book's cheat sheet.
Table of Contents
Introduction
About This Book
Conventions Used in This Book
What You’re Not to Read
Foolish Assumptions
How This Book Is Organized
Part I: Beginning with Biostatistics Basics
Part II: Getting Down and Dirty with Data
Part III: Comparing Groups
Part IV: Looking for Relationships with Correlation and Regression
Part V: Analyzing Survival Data
Part VI: The Part of Tens
Icons Used in This Book
Where to Go from Here
Part I: Beginning with Biostatistics Basics
Chapter 1: Biostatistics 101
Brushing Up on Math and Stats Basics
Doing Calculations with the Greatest of Ease
Concentrating on Clinical Research
Drawing Conclusions from Your Data
Statistical estimation theory
Statistical decision theory
A Matter of Life and Death: Working with Survival Data
Figuring Out How Many Subjects You Need
Getting to Know Statistical Distributions
Chapter 2: Overcoming Mathophobia: Reading and Understanding Mathematical Expressions
Breaking Down the Basics of Mathematical Formulas
Displaying formulas in different ways
Checking out the building blocks of formulas
Focusing on Operations Found in Formulas
Basic mathematical operations
Powers, roots, and logarithms
Factorials and absolute values
Functions
Simple and complicated formulas
Equations
Counting on Collections of Numbers
One-dimensional arrays
Higher-dimensional arrays
Arrays in formulas
Sums and products of the elements of an array
Chapter 3: Getting Statistical: A Short Review of Basic Statistics
Taking a Chance on Probability
Thinking of probability as a number
Following a few basic rules
Comparing odds versus probability
Some Random Thoughts about Randomness
Picking Samples from Populations
Recognizing that sampling isn’t perfect
Digging into probability distributions
Introducing Statistical Inference
Statistical estimation theory
Statistical decision theory
Homing In on Hypothesis Testing
Getting the language down
Testing for significance
Understanding the meaning of “p value” as the result of a test
Examining Type I and Type II errors
Grasping the power of a test
Going Outside the Norm with Nonparametric Statistics
Chapter 4: Counting on Statistical Software
Desk Job: Personal Computer Software
Checking out commercial software
Focusing on free software
On the Go: Calculators and Mobile Devices
Scientific and programmable calculators
Mobile devices
Gone Surfin’: Web-Based Software
On Paper: Printed Calculators
Chapter 5: Conducting Clinical Research
Designing a Clinical Study
Identifying aims, objectives, hypotheses, and variables
Deciding who will be in the study
Choosing the structure of the study
Using randomization
Selecting the analyses to use
Defining analytical populations
Determining how many subjects to enroll
Putting together the protocol
Carrying Out a Clinical Study
Protecting your subjects
Collecting and validating data
Analyzing Your Data
Dealing with missing data
Handling multiplicity
Incorporating interim analyses
Chapter 6: Looking at Clinical Trials and Drug Development
Not Ready for Human Consumption: Doing Preclinical Studies
Testing on People during Clinical Trials to Check a Drug’s Safety and Efficacy
Phase I: Determining the maximum tolerated dose
Phase II: Finding out about the drug’s performance
Phase III: Proving that the drug works
Phase IV: Keeping an eye on the marketed drug
Holding Other Kinds of Clinical Trials
Pharmacokinetics and pharmacodynamics (PK/PD studies)
Bioequivalence studies
Thorough QT studies
Part II: Getting Down and Dirty with Data
Chapter 7: Getting Your Data into the Computer
Looking at Levels of Measurement
Classifying and Recording Different Kinds of Data
Dealing with free-text data
Assigning subject identification (ID) numbers
Organizing name and address data
Collecting categorical data
Recording numerical data
Entering date and time data
Checking Your Entered Data for Errors
Creating a File that Describes Your Data File
Chapter 8: Summarizing and Graphing Your Data
Summarizing and Graphing Categorical Data
Summarizing Numerical Data
Locating the center of your data
Describing the spread of your data
Showing the symmetry and shape of the distribution
Structuring Numerical Summaries into Descriptive Tables
Graphing Numerical Data
Showing the distribution with histograms
Summarizing grouped data with bars, boxes, and whiskers
Depicting the relationships between numerical variables with other graphs
Chapter 9: Aiming for Accuracy and Precision
Beginning with the Basics of Accuracy and Precision
Getting to know sample statistics and population parameters
Understanding accuracy and precision in terms of the sampling distribution
Thinking of measurement as a kind of sampling
Expressing errors in terms of accuracy and precision
Improving Accuracy and Precision
Enhancing sampling accuracy
Getting more accurate measurements
Improving sampling precision
Increasing the precision of your measurements
Calculating Standard Errors for Different Sample Statistics
A mean
A proportion
Event counts and rates
A regression coefficient
Chapter 10: Having Confidence in Your Results
Feeling Confident about Confidence Interval Basics
Defining confidence intervals
Looking at confidence levels
Taking sides with confidence intervals
Calculating Confidence Intervals
Before you begin: Formulas for confidence limits in large samples
The confidence interval around a mean
The confidence interval around a proportion
The confidence interval around an event count or rate
The confidence interval around a regression coefficient
Relating Confidence Intervals and Significance Testing
Chapter 11: Fuzzy In Equals Fuzzy Out: Pushing Imprecision through a Formula
Understanding the Concept of Error Propagation
Using Simple Error Propagation Formulas for Simple Expressions
Adding or subtracting a constant doesn’t change the SE
Multiplying (or dividing) by a constant multiplies (or divides) the SE by the same amount
For sums and differences: Add the squares of SEs together
For averages: The square root law takes over
For products and ratios: Squares of relative SEs are added together
For powers and roots: Multiply the relative SE by the power
Handling More Complicated Expressions
Using the simple rules consecutively
Checking out an online calculator
Simulating error propagation — easy, accurate, and versatile
Part III: Comparing Groups
Chapter 12: Comparing Average Values between Groups
Knowing That Different Situations Need Different Tests
Comparing the mean of a group of numbers to a hypothesized value
Comparing two groups of numbers
Comparing three or more groups of numbers
Analyzing data grouped on several different variables
Adjusting for a “nuisance variable” when comparing numbers
Comparing sets of matched numbers
Comparing within-group changes between groups
Trying the Tests Used for Comparing Averages
Surveying Student t tests
Assessing the ANOVA
Running Student t tests and ANOVAs from summary data
Running nonparametric tests
Estimating the Sample Size You Need for Comparing Averages
Simple formulas
Software and web pages
A sample-size nomogram
Chapter 13: Comparing Proportions and Analyzing Cross-Tabulations
Examining Two Variables with the Pearson Chi-Square Test
Understanding how the chi-square test works
Pointing out the pros and cons of the chi-square test
Modifying the chi-square test: The Yates continuity correction
Focusing on the Fisher Exact Test
Understanding how the Fisher Exact test works
Noting the pros and cons of the Fisher Exact test
Analyzing Ordinal Categorical Data with the Kendall Test
Studying Stratified Data with the Mantel-Haenszel Chi-Square Test
Chapter 14: Taking a Closer Look at Fourfold Tables
Focusing on the Fundamentals of Fourfold Tables
Choosing the Right Sampling Strategy
Producing Fourfold Tables in a Variety of Situations
Describing the association between two binary variables
Assessing risk factors
Evaluating diagnostic procedures
Investigating treatments
Looking at inter- and intra-rater reliability
Chapter 15: Analyzing Incidence and Prevalence Rates in Epidemiologic Data
Understanding Incidence and Prevalence
Prevalence: The fraction of a population with a particular condition
Incidence: Counting new cases
Understanding how incidence and prevalence are related
Analyzing Incidence Rates
Expressing the precision of an incidence rate
Comparing incidences with the rate ratio
Calculating confidence intervals for a rate ratio
Comparing two event rates
Comparing two event counts with identical exposure
Estimating the Required Sample Size
Chapter 16: Feeling Noninferior (Or Equivalent)
Understanding the Absence of an Effect
Defining the effect size: How different are the groups?
Defining an important effect size: How close is close enough?
Recognizing effects: Can you spot a difference if there really is one?
Proving Equivalence and Noninferiority
Using significance tests
Using confidence intervals
Some precautions about noninferiority testing
Part IV: Looking for Relationships with Correlation and Regression
Chapter 17: Introducing Correlation and Regression
Correlation: How Strongly Are Two Variables Associated?
Lining up the Pearson correlation coefficient
Analyzing correlation coefficients
Regression: What Equation Connects the Variables?
Understanding the purpose of regression analysis
Talking about terminology and mathematical notation
Classifying different kinds of regression
Chapter 18: Getting Straight Talk on Straight-Line Regression
Knowing When to Use Straight-Line Regression
Understanding the Basics of Straight-Line Regression
Running a Straight-Line Regression
Taking a few basic steps
Walking through an example
Interpreting the Output of Straight-Line Regression
Seeing what you told the program to do
Looking at residuals
Making your way through the regression table
Wrapping up with measures of goodness-of-fit
Scientific fortune-telling with the prediction formula
Recognizing What Can Go Wrong with Straight-Line Regression
Figuring Out the Sample Size You Need
Chapter 19: More of a Good Thing: Multiple Regression
Understanding the Basics of Multiple Regression
Defining a few important terms
Knowing when to use multiple regression
Being aware of how the calculations work
Running Multiple Regression Software
Preparing categorical variables
Recoding categorical variables as numerical
Creating scatter plots before you jump into your multiple regression
Taking a few steps with your software
Interpreting the Output of a Multiple Regression
Examining typical output from most programs
Checking out optional output available from some programs
Deciding whether your data is suitable for regression analysis
Determining how well the model fits the data
Watching Out for Special Situations that Arise in Multiple Regression
Synergy and anti-synergy
Collinearity and the mystery of the disappearing significance
Figuring How Many Subjects You Need
Chapter 20: A Yes-or-No Proposition: Logistic Regression
Using Logistic Regression
Understanding the Basics of Logistic Regression
Gathering and graphing your data
Fitting a function with an S shape to your data
Handling multiple predictors in your logistic model
Running a Logistic Regression with Software
Interpreting the Output of Logistic Regression
Seeing summary information about the variables
Assessing the adequacy of the model
Checking out the table of regression coefficients
Predicting probabilities with the fitted logistic formula
Making yes or no predictions
Heads Up: Knowing What Can Go Wrong with Logistic Regression
Don’t fit a logistic function to nonlogistic data
Watch out for collinearity and disappearing significance
Check for inadvertent reverse-coding of the outcome variable
Don’t misinterpret odds ratios for numerical predictors
Don’t misinterpret odds ratios for categorical predictors
Beware the complete separation problem
Figuring Out the Sample Size You Need for Logistic Regression
Chapter 21: Other Useful Kinds of Regression
Analyzing Counts and Rates with Poisson Regression
Introducing the generalized linear model
Running a Poisson regression
Interpreting the Poisson regression output
Discovering other things that Poisson regression can do
Anything Goes with Nonlinear Regression
Distinguishing nonlinear regression from other kinds
Checking out an example from drug research
Running a nonlinear regression
Interpreting the output
Using equivalent functions to fit the parameters you really want
Smoothing Nonparametric Data with LOWESS
Running LOWESS
Adjusting the amount of smoothing
Part V: Analyzing Survival Data
Chapter 22: Summarizing and Graphing Survival Data
Understanding the Basics of Survival Data
Knowing that survival times are intervals
Recognizing that survival times aren’t normally distributed
Considering censoring
Looking at the Life-Table Method
Making a life table
Interpreting a life table
Graphing hazard rates and survival probabilities from a life table
Digging Deeper with the Kaplan-Meier Method
Heeding a Few Guidelines for Life Tables and the Kaplan-Meier Method
Recording survival times the right way
Recording censoring information correctly
Interpreting those strange-looking survival curves
Doing Even More with Survival Data
Chapter 23: Comparing Survival Times
Comparing Survival between Two Groups with the Log-Rank Test
Understanding what the log-rank test is doing
Running the log-rank test on software
Looking at the calculations
Assessing the assumptions
Considering More Complicated Comparisons
Coming Up with the Sample Size Needed for Survival Comparisons
Chapter 24: Survival Regression
Knowing When to Use Survival Regression
Explaining the Concepts behind Survival Regression
The steps of Cox PH regression
Hazard ratios
Running a Survival Regression
Interpreting the Output of a Survival Regression
Testing the validity of the assumptions
Checking out the table of regression coefficients
Homing in on hazard ratios and their confidence intervals
Assessing goodness-of-fit and predictive ability of the model
Focusing on baseline survival and hazard functions
How Long Have I Got, Doc? Constructing Prognosis Curves
Running the proportional-hazards regression
Finding h
Estimating the Required Sample Size for a Survival Regression
Part VI: The Part of Tens
Chapter 25: Ten Distributions Worth Knowing
The Uniform Distribution
The Normal Distribution
The Log-Normal Distribution
The Binomial Distribution
The Poisson Distribution
The Exponential Distribution
The Weibull Distribution
The Student t Distribution
The Chi-Square Distribution
The Fisher F Distribution
Chapter 26: Ten Easy Ways to Estimate How Many Subjects You Need
Comparing Means between Two Groups
Comparing Means among Three, Four, or Five Groups
Comparing Paired Values
Comparing Proportions between Two Groups
Testing for a Significant Correlation
Comparing Survival between Two Groups
Scaling from 80 Percent to Some Other Power
Scaling from 0.05 to Some Other Alpha Level
Making Adjustments for Unequal Group Sizes
Allowing for Attrition
Biostatistics For Dummies®
Published by
John Wiley & Sons, Inc.
111 River St.
Hoboken, NJ 07030-5774
www.wiley.com
Copyright © 2013 by John Wiley & Sons, Inc., Hoboken, New Jersey
Published simultaneously in Canada
No part of this publication may be reproduced, stored in a retrieval system or transmitted in any form or by any means, electronic, mechanical, photocopying, recording, scanning or otherwise, except as permitted under Sections 107 or 108 of the 1976 United States Copyright Act, without the prior written permission of the Publisher. Requests to the Publisher for permission should be addressed to the Permissions Department, John Wiley & Sons, Inc., 111 River Street, Hoboken, NJ 07030, (201) 748-6011, fax (201) 748-6008, or online at http://www.wiley.com/go/permissions.
Trademarks: Wiley, the Wiley logo, For Dummies, the Dummies Man logo, A Reference for the Rest of Us!, The Dummies Way, Dummies Daily, The Fun and Easy Way, Dummies.com, Making Everything Easier, and related trade dress are trademarks or registered trademarks of John Wiley & Sons, Inc., and/or its affiliates in the United States and other countries, and may not be used without written permission. All other trademarks are the property of their respective owners. John Wiley & Sons, Inc., is not associated with any product or vendor mentioned in this book.
Limit of Liability/Disclaimer of Warranty: The publisher and the author make no representations or warranties with respect to the accuracy or completeness of the contents of this work and specifically disclaim all warranties, including without limitation warranties of fitness for a particular purpose. No warranty may be created or extended by sales or promotional materials. The advice and strategies contained herein may not be suitable for every situation. This work is sold with the understanding that the publisher is not engaged in rendering legal, accounting, or other professional services. If professional assistance is required, the services of a competent professional person should be sought. Neither the publisher nor the author shall be liable for damages arising herefrom. The fact that an organization or Website is referred to in this work as a citation and/or a potential source of further information does not mean that the author or the publisher endorses the information the organization or Website may provide or recommendations it may make. Further, readers should be aware that Internet Websites listed in this work may have changed or disappeared between when this work was written and when it is read.
For general information on our other products and services, please contact our Customer Care Department within the U.S. at 877-762-2974, outside the U.S. at 317-572-3993, or fax 317-572-4002.
For technical support, please visit www.wiley.com/techsupport.
Wiley publishes in a variety of print and electronic formats and by print-on-demand. Some material included with standard print versions of this book may not be included in e-books or in print-on-demand. If this book refers to media such as a CD or DVD that is not included in the version you purchased, you may download this material at http://booksupport.wiley.com. For more information about Wiley products, visit www.wiley.com.
Library of Congress Control Number: 2013936422
ISBN 978-1-118-55398-5 (pbk); ISBN 978-1-118-55395-4 (ebk); ISBN 978-1-118-55396-1 (ebk); ISBN 978-1-118-55399-2 (ebk)
Manufactured in the United States of America
10 9 8 7 6 5 4 3 2 1
About the Author
John C. Pezzullo, PhD, is an adjunct associate professor at Georgetown University. He has had a half-century of experience supporting researchers in the physical, biological, and social sciences. For more than 25 years, he led a dual life at Rhode Island Hospital as an information technology programmer/analyst (and later director) while also providing statistical and other technical support to biological and clinical researchers at the hospital. He then joined the faculty at Georgetown University as informatics director of the National Institute of Child Health and Human Development's Perinatology Research Branch. He has held faculty appointments in the departments of obstetrics and gynecology, biomathematics and biostatistics, pharmacology, nursing, and internal medicine. He is now semi-retired and living in Florida, but he still teaches biostatistics and clinical trial design to Georgetown students over the Internet. He created the StatPages.info website, which provides online statistical calculating capability and other statistics-related resources.
Dedication
To my wife, Betty: Without your steadfast support and encouragement, I would never have been able to complete this book. To Mom and Dad, who made it all possible. And to our kids, our grandkids, and our great-grandkids!
Author’s Acknowledgments
My heartfelt thanks to Matt Wagner of Fresh Books, Inc., and to Lindsay Lefevere for the opportunity to write this book; to Tonya Cupp, my special editor, who tutored me in the “Wiley ways” during the first quarter of the chapter-writing phase of the project; to Georgette Beatty, my project editor, who kept me on the path and on target (and mostly on time) throughout the process; to Christy Pingleton, the copy editor, for making sure what I said was intelligible; and to William Miller and Donatello Telesca, the technical reviewers, for making sure that what I said was correct.
Special thanks to Darrell Abernethy for his invaluable suggestions in Chapter 6.
And a special word of appreciation to all my family and friends, who provided so much support and encouragement throughout the whole project.
Publisher’s Acknowledgments
We're proud of this book; please send us your comments at http://dummies.custhelp.com. For other comments, please contact our Customer Care Department within the U.S. at 877-762-2974, outside the U.S. at 317-572-3993, or fax 317-572-4002.
Some of the people who helped bring this book to market include the following:
Acquisitions, Editorial, and Vertical Websites
Senior Project Editor: Georgette Beatty
Executive Editor: Lindsay Sandman Lefevere
Copy Editor: Christine Pingleton
Assistant Editor: David Lutton
Editorial Program Coordinator: Joe Niesen
Technical Editors: Dr. William G. Miller, Donatello Telesca
Editorial Manager: Michelle Hacker
Editorial Assistant: Alexa Koschier
Cover Photo: Test tubes © Mike Kemp/jupiterimages; Graph courtesy of John Pezzullo
Composition Services
Project Coordinator: Sheree Montgomery
Layout and Graphics: Carrie A. Cesavice
Proofreaders: Debbye Butler, John Greenough
Indexer: Ty Koontz
Special Help
Tonya Cupp, Sarah Faulkner, Todd Lothery, Danielle Voirol
Publishing and Editorial for Consumer Dummies
Kathleen Nebenhaus, Vice President and Executive Publisher
David Palmer, Associate Publisher
Kristin Ferguson-Wagstaffe, Product Development Director
Publishing for Technology Dummies
Andy Cummings, Vice President and Publisher
Composition Services
Debbie Stailey, Director of Composition Services
Introduction
Biostatistics is the practical application of statistical concepts and techniques to topics in biology. Because biology is such a broad field — studying all forms of life from viruses to trees to fleas to mice to people — biostatistics covers a very wide area, including designing biological experiments, safely conducting research on human beings, collecting and verifying data from those studies, summarizing and displaying that data, and analyzing the data to draw meaningful conclusions from it.
No book of reasonable size can hope to span all the subspecialties of biostatistics, including molecular biology, genetics, agricultural studies, animal research (in the lab and in the wild), clinical trials on humans, and epidemiological research. So I've concentrated on the most widely applicable topics and on the topics that are most relevant to research on humans (that is, clinical research). I chose these topics on the basis of a survey of graduate-level biostatistics curricula from major universities. I hope this book covers the topics you're most interested in; if it doesn't, please tell me what you wish I had included. You can e-mail me at jcp12345@gmail.com, and I'll try to respond to your message.
About This Book
I wrote this book as a reference — something you go to when you want information about a particular topic. So you don’t have to read it from beginning to end; you can jump directly to the part you’re interested in. In fact, I hope you’ll be inclined to pick it up from time to time, open it to a page at random, read a page or two, and get a little something useful from it.
This book generally doesn’t show you the detailed steps to perform every statistical calculation by hand. That may have been necessary in the mid-1900s, when statistics students spent hours in a “computing lab” (that is, a room that had an adding machine in it) calculating a correlation coefficient, but nowadays computers do all the computing. (See Chapter 4 for advice on choosing statistical software.) When describing statistical tests, my focus is always on the concepts behind the method, how to prepare your data for analysis, and how to interpret the results. I keep mathematical formulas and derivations to a minimum in this book; I include them only when they help explain what’s going on. If you really want to see them, you can find them in many biostatistics textbooks, and they’re readily available online.
Because good experimental design is crucial for the success of any research, this book gives special attention to the design of clinical trials and, specifically, to calculating the number of subjects you need to study. You find easy-to-apply examples of sample-size calculations in the chapters describing significance tests in Parts III, IV, and V and in Chapter 26.
Conventions Used in This Book
Here are some typographic conventions I use throughout this book:
When I introduce a new term, I put the term in italics and define it. I also use italics occasionally to emphasize important information.
In bulleted lists, I often place the most important word or phrase of each bulleted item in boldface text. The action parts of numbered steps are also boldface.
I show web links (URLs) as monotype text.
When this book was printed, some web addresses may have needed to break across two lines of text. If that happened, rest assured that I haven’t put in any extra characters (like hyphens) to indicate the break. So, when using one of these web addresses, just type in exactly what you see in this book, as though the line break doesn’t exist.
The abbreviation sd or SD always refers to the standard deviation.
Anytime you see the word significant in reference to a p value, it means p ≤ 0.05.
When you see the lowercase italicized letter e in a formula, it refers to the mathematical constant 2.718..., which I describe in Chapter 2. (On the very rare occasions that it stands for something else, I say so.)
I alternate between using male and female pronouns (instead of saying “he or she,” “him or her,” and so on) throughout the book. No gender preference is intended.
What You’re Not to Read
Although I try to keep technical (that is, mathematical) details to a minimum, I do include them occasionally. The more complicated ones are marked by a Technical Stuff icon. You can skip over these paragraphs, and it won’t prevent you from understanding the rest of the material. You can also skip over anything that’s in a sidebar (text that resides in a box). Sidebars contain nonessential but interesting stuff, like historical trivia and other “asides.”
Foolish Assumptions
I wrote this book to help several kinds of people, and I assume you fall into one of the following categories:
Students at the undergraduate or graduate level who are taking a course in biostatistics and want help with the topics they’re studying in class
People who have had no formal biostatistical training (perhaps no statistical training at all) but find themselves having to deal with data from biological or clinical studies as part of their job
Doctors, nurses, and other healthcare professionals who want to carry out clinical research
If you’re interested in biostatistics, then you’re no dummy. But I bet you sometimes feel like a dummy when it comes to biostatistics, or statistics in general, or mathematics. Don’t feel bad — I’ve felt that way many times over the years, and still feel like that whenever I’m propelled into an area of biostatistics I haven’t encountered before. (If you haven’t taken a basic statistics course yet, you may want to get Statistics For Dummies by Deborah J. Rumsey, PhD — published by Wiley — and read parts of that book first.)
The important thing to keep in mind is that you don’t have to be a math genius to be a good biological or clinical scientist — one who can intelligently design experiments, execute them well, collect and analyze data properly, and draw valid conclusions. You just have to have a good grasp of the basic concepts and know how to utilize the sophisticated statistical software that has become so widely available.
How This Book Is Organized
I’ve divided this book into six parts, and each part contains several chapters. The following sections describe what you find in each part.
Part I: Beginning with Biostatistics Basics
This part provides preparation and context for the rest of this book. Here, I bring you up to speed on math and statistics concepts so that you’re comfortable with them throughout this book. Then I provide advice on selecting statistical software. And finally I describe one major setting in which biostatistics is used: clinical research.
Part II: Getting Down and Dirty with Data
This part focuses on the raw material that biostatistical analysis works with — data. You probably already know the two main types of data: numerical (or quantitative) data, such as ages and heights, and non-numerical data, such as names and genders. Part II gets into the more subtle (but very important) distinctions between different data types.
You discover how to collect data properly, how to summarize it concisely and display it as tables and graphs, and how to describe the quality of the data (its precision and the uncertainties associated with your measured values). And you find out how the precision of your raw data affects the precision of other things you calculate from that data.
Part III: Comparing Groups
This part describes some of the most common statistical analyses you carry out on your data — comparing variables between groups. You discover how to answer questions like these: Does an arthritis medication reduce joint pain more than a placebo? Does a history of diabetes in a parent predict the likelihood of diabetes in the child? And if so, by how much?
You also find out how to show that there’s no meaningful difference between two groups. Is a generic drug really equivalent to the name brand? Does a new drug not interfere with normal heart rhythm? This endeavor entails more than just not proving that there is a difference — absence of proof is not proof of absence, and there are special ways to prove that there’s no important difference in your data.
Throughout this part, I discuss common statistical techniques for comparing groups such as t tests, ANOVAs, chi-square tests, and the Fisher Exact test.
Part IV: Looking for Relationships with Correlation and Regression
This part takes you through the very broad field of regression analysis — studying the relationships that can exist between variables. You find out how to test for a significant association between two or more variables and how to express that relationship in terms of a formula or equation that predicts the likely value of one variable from the observed values of one or more other variables. You see how useful such an equation can be, both for understanding the underlying science and for doing all kinds of practical things based on that relationship.
After reviewing the simple straight-line and multiple linear regression techniques you probably encountered in a basic stats course, you discover how to handle the more advanced problems that occur in the real world of biological research — logistic regression for analyzing yes-or-no kinds of outcomes, like “had a miscarriage”; Poisson regression for analyzing the frequency of recurring events, such as the number of hospitalizations for emphysema patients; and nonlinear regression when the relationship between the variables can take on a complicated mathematical form.
Part V: Analyzing Survival Data
This part is devoted to the analysis of one very special and important kind of data in biological research — survival time (or, more generally, the time to the first occurrence of some particular kind of event). You see what makes this type of data so special and why special methods are needed to deal with it correctly. You see how to calculate survival curves, test for a significant difference in survival between two or more groups of subjects, and apply the powerful and general methods of regression analysis to survival data.
Part VI: The Part of Tens
The final two chapters of this book provide “top-ten lists” of handy information and rules that you’ll probably refer to often. Chapter 25 describes ten of the most common statistical distribution functions that you encounter in biostatistical research. Some of these distributions describe how your observed data values are likely to fluctuate, and some are used primarily in conjunction with the common significance tests (t tests, chi-square tests, and ANOVAs). Chapter 26 contains a set of handy rules of thumb you can use to get quick estimates of the number of subjects you need to study in order to have a good chance of obtaining significant results.
Icons Used in This Book
Icons (the little drawings in the margins of this book) are used to draw your attention to certain kinds of material. Here’s what they mean:
Where to Go from Here
You’re already off to a good start: you’ve read this introduction, so you have a good idea of what this book is all about (at least what its major parts are about). For an even better idea of what’s in it, take a look at the Contents at a Glance, which drills down into each part and shows you what each chapter is about. Finally, skim through the full-blown table of contents, which drills further down into each chapter, showing you the sections and subsections of that chapter.
If you want to get the big picture of what biostatistics encompasses (at least those parts of biostatistics covered in this book), then read Chapter 1. This is a top-level overview of the basic concepts that make up this entire book. Here are a few other special places you may want to jump into:
If you’re uncomfortable with mathematical notation, then Chapter 2 is the place to start.
If you want a quick refresher on basic statistics (the kind of stuff that would be taught in a Stats 101 course), then read Chapter 3.
You can get an introduction to clinical research in Chapters 5 and 6.
If you want to know about collecting, summarizing, and graphing data, jump to Part II.
If you need to know about working with survival data, you can go right to Part V.
If you’re puzzled about some particular statistical distribution function, then look at Chapter 25.
And if you need to do some quick sample-size estimates, turn to Chapter 26.
Part I: Beginning with Biostatistics Basics
In this part . . .
Get comfortable with mathematical notation that uses numbers, special constants, variables, and mathematical symbols — a must for all you mathophobes.
Review basic statistical concepts — such as probability, randomness, populations, samples, statistical inference, and more — to get ready for the study of biostatistics.
Choose and acquire statistical software (both commercial and free), and discover other ways to do statistical calculations, such as calculators, mobile devices, and web-based programs.
Understand clinical research — how biostatistics influences the design and execution of clinical trials and how treatments are developed and approved.