Table of Contents

Introduction

About This Book

Conventions Used in This Book

What You’re Not to Read

Foolish Assumptions

How This Book Is Organized

Part I: Beginning with Biostatistics Basics

Part II: Getting Down and Dirty with Data

Part III: Comparing Groups

Part IV: Looking for Relationships with Correlation and Regression

Part V: Analyzing Survival Data

Part VI: The Part of Tens

Icons Used in This Book

Where to Go from Here

Part I: Beginning with Biostatistics Basics

Chapter 1: Biostatistics 101

Brushing Up on Math and Stats Basics

Doing Calculations with the Greatest of Ease

Concentrating on Clinical Research

Drawing Conclusions from Your Data

Statistical estimation theory

Statistical decision theory

A Matter of Life and Death: Working with Survival Data

Figuring Out How Many Subjects You Need

Getting to Know Statistical Distributions

Chapter 2: Overcoming Mathophobia: Reading and Understanding Mathematical Expressions

Breaking Down the Basics of Mathematical Formulas

Displaying formulas in different ways

Checking out the building blocks of formulas

Focusing on Operations Found in Formulas

Basic mathematical operations

Powers, roots, and logarithms

Factorials and absolute values

Functions

Simple and complicated formulas

Equations

Counting on Collections of Numbers

One-dimensional arrays

Higher-dimensional arrays

Arrays in formulas

Sums and products of the elements of an array

Chapter 3: Getting Statistical: A Short Review of Basic Statistics

Taking a Chance on Probability

Thinking of probability as a number

Following a few basic rules

Comparing odds versus probability

Some Random Thoughts about Randomness

Picking Samples from Populations

Recognizing that sampling isn’t perfect

Digging into probability distributions

Introducing Statistical Inference

Statistical estimation theory

Statistical decision theory

Homing In on Hypothesis Testing

Getting the language down

Testing for significance

Understanding the meaning of “p value” as the result of a test

Examining Type I and Type II errors

Grasping the power of a test

Going Outside the Norm with Nonparametric Statistics

Chapter 4: Counting on Statistical Software

Desk Job: Personal Computer Software

Checking out commercial software

Focusing on free software

On the Go: Calculators and Mobile Devices

Scientific and programmable calculators

Mobile devices

Gone Surfin’: Web-Based Software

On Paper: Printed Calculators

Chapter 5: Conducting Clinical Research

Designing a Clinical Study

Identifying aims, objectives, hypotheses, and variables

Deciding who will be in the study

Choosing the structure of the study

Using randomization

Selecting the analyses to use

Defining analytical populations

Determining how many subjects to enroll

Putting together the protocol

Carrying Out a Clinical Study

Protecting your subjects

Collecting and validating data

Analyzing Your Data

Dealing with missing data

Handling multiplicity

Incorporating interim analyses

Chapter 6: Looking at Clinical Trials and Drug Development

Not Ready for Human Consumption: Doing Preclinical Studies

Testing on People during Clinical Trials to Check a Drug’s Safety and Efficacy

Phase I: Determining the maximum tolerated dose

Phase II: Finding out about the drug’s performance

Phase III: Proving that the drug works

Phase IV: Keeping an eye on the marketed drug

Holding Other Kinds of Clinical Trials

Pharmacokinetics and pharmacodynamics (PK/PD studies)

Bioequivalence studies

Thorough QT studies

Part II: Getting Down and Dirty with Data

Chapter 7: Getting Your Data into the Computer

Looking at Levels of Measurement

Classifying and Recording Different Kinds of Data

Dealing with free-text data

Assigning subject identification (ID) numbers

Organizing name and address data

Collecting categorical data

Recording numerical data

Entering date and time data

Checking Your Entered Data for Errors

Creating a File that Describes Your Data File

Chapter 8: Summarizing and Graphing Your Data

Summarizing and Graphing Categorical Data

Summarizing Numerical Data

Locating the center of your data

Describing the spread of your data

Showing the symmetry and shape of the distribution

Structuring Numerical Summaries into Descriptive Tables

Graphing Numerical Data

Showing the distribution with histograms

Summarizing grouped data with bars, boxes, and whiskers

Depicting the relationships between numerical variables with other graphs

Chapter 9: Aiming for Accuracy and Precision

Beginning with the Basics of Accuracy and Precision

Getting to know sample statistics and population parameters

Understanding accuracy and precision in terms of the sampling distribution

Thinking of measurement as a kind of sampling

Expressing errors in terms of accuracy and precision

Improving Accuracy and Precision

Enhancing sampling accuracy

Getting more accurate measurements

Improving sampling precision

Increasing the precision of your measurements

Calculating Standard Errors for Different Sample Statistics

A mean

A proportion

Event counts and rates

A regression coefficient

Chapter 10: Having Confidence in Your Results

Feeling Confident about Confidence Interval Basics

Defining confidence intervals

Looking at confidence levels

Taking sides with confidence intervals

Calculating Confidence Intervals

Before you begin: Formulas for confidence limits in large samples

The confidence interval around a mean

The confidence interval around a proportion

The confidence interval around an event count or rate

The confidence interval around a regression coefficient

Relating Confidence Intervals and Significance Testing

Chapter 11: Fuzzy In Equals Fuzzy Out: Pushing Imprecision through a Formula

Understanding the Concept of Error Propagation

Using Simple Error Propagation Formulas for Simple Expressions

Adding or subtracting a constant doesn’t change the SE

Multiplying (or dividing) by a constant multiplies (or divides) the SE by the same amount

For sums and differences: Add the squares of SEs together

For averages: The square root law takes over

For products and ratios: Squares of relative SEs are added together

For powers and roots: Multiply the relative SE by the power

Handling More Complicated Expressions

Using the simple rules consecutively

Checking out an online calculator

Simulating error propagation — easy, accurate, and versatile

Part III: Comparing Groups

Chapter 12: Comparing Average Values between Groups

Knowing That Different Situations Need Different Tests

Comparing the mean of a group of numbers to a hypothesized value

Comparing two groups of numbers

Comparing three or more groups of numbers

Analyzing data grouped on several different variables

Adjusting for a “nuisance variable” when comparing numbers

Comparing sets of matched numbers

Comparing within-group changes between groups

Trying the Tests Used for Comparing Averages

Surveying Student t tests

Assessing the ANOVA

Running Student t tests and ANOVAs from summary data

Running nonparametric tests

Estimating the Sample Size You Need for Comparing Averages

Simple formulas

Software and web pages

A sample-size nomogram

Chapter 13: Comparing Proportions and Analyzing Cross-Tabulations

Examining Two Variables with the Pearson Chi-Square Test

Understanding how the chi-square test works

Pointing out the pros and cons of the chi-square test

Modifying the chi-square test: The Yates continuity correction

Focusing on the Fisher Exact Test

Understanding how the Fisher Exact test works

Noting the pros and cons of the Fisher Exact test

Analyzing Ordinal Categorical Data with the Kendall Test

Studying Stratified Data with the Mantel-Haenszel Chi-Square Test

Chapter 14: Taking a Closer Look at Fourfold Tables

Focusing on the Fundamentals of Fourfold Tables

Choosing the Right Sampling Strategy

Producing Fourfold Tables in a Variety of Situations

Describing the association between two binary variables

Assessing risk factors

Evaluating diagnostic procedures

Investigating treatments

Looking at inter- and intra-rater reliability

Chapter 15: Analyzing Incidence and Prevalence Rates in Epidemiologic Data

Understanding Incidence and Prevalence

Prevalence: The fraction of a population with a particular condition

Incidence: Counting new cases

Understanding how incidence and prevalence are related

Analyzing Incidence Rates

Expressing the precision of an incidence rate

Comparing incidences with the rate ratio

Calculating confidence intervals for a rate ratio

Comparing two event rates

Comparing two event counts with identical exposure

Estimating the Required Sample Size

Chapter 16: Feeling Noninferior (Or Equivalent)

Understanding the Absence of an Effect

Defining the effect size: How different are the groups?

Defining an important effect size: How close is close enough?

Recognizing effects: Can you spot a difference if there really is one?

Proving Equivalence and Noninferiority

Using significance tests

Using confidence intervals

Some precautions about noninferiority testing

Part IV: Looking for Relationships with Correlation and Regression

Chapter 17: Introducing Correlation and Regression

Correlation: How Strongly Are Two Variables Associated?

Lining up the Pearson correlation coefficient

Analyzing correlation coefficients

Regression: What Equation Connects the Variables?

Understanding the purpose of regression analysis

Talking about terminology and mathematical notation

Classifying different kinds of regression

Chapter 18: Getting Straight Talk on Straight-Line Regression

Knowing When to Use Straight-Line Regression

Understanding the Basics of Straight-Line Regression

Running a Straight-Line Regression

Taking a few basic steps

Walking through an example

Interpreting the Output of Straight-Line Regression

Seeing what you told the program to do

Looking at residuals

Making your way through the regression table

Wrapping up with measures of goodness-of-fit

Scientific fortune-telling with the prediction formula

Recognizing What Can Go Wrong with Straight-Line Regression

Figuring Out the Sample Size You Need

Chapter 19: More of a Good Thing: Multiple Regression

Understanding the Basics of Multiple Regression

Defining a few important terms

Knowing when to use multiple regression

Being aware of how the calculations work

Running Multiple Regression Software

Preparing categorical variables

Recoding categorical variables as numerical

Creating scatter plots before you jump into your multiple regression

Taking a few steps with your software

Interpreting the Output of a Multiple Regression

Examining typical output from most programs

Checking out optional output available from some programs

Deciding whether your data is suitable for regression analysis

Determining how well the model fits the data

Watching Out for Special Situations that Arise in Multiple Regression

Synergy and anti-synergy

Collinearity and the mystery of the disappearing significance

Figuring How Many Subjects You Need

Chapter 20: A Yes-or-No Proposition: Logistic Regression

Using Logistic Regression

Understanding the Basics of Logistic Regression

Gathering and graphing your data

Fitting a function with an S shape to your data

Handling multiple predictors in your logistic model

Running a Logistic Regression with Software

Interpreting the Output of Logistic Regression

Seeing summary information about the variables

Assessing the adequacy of the model

Checking out the table of regression coefficients

Predicting probabilities with the fitted logistic formula

Making yes or no predictions

Heads Up: Knowing What Can Go Wrong with Logistic Regression

Don’t fit a logistic function to nonlogistic data

Watch out for collinearity and disappearing significance

Check for inadvertent reverse-coding of the outcome variable

Don’t misinterpret odds ratios for numerical predictors

Don’t misinterpret odds ratios for categorical predictors

Beware the complete separation problem

Figuring Out the Sample Size You Need for Logistic Regression

Chapter 21: Other Useful Kinds of Regression

Analyzing Counts and Rates with Poisson Regression

Introducing the generalized linear model

Running a Poisson regression

Interpreting the Poisson regression output

Discovering other things that Poisson regression can do

Anything Goes with Nonlinear Regression

Distinguishing nonlinear regression from other kinds

Checking out an example from drug research

Running a nonlinear regression

Interpreting the output

Using equivalent functions to fit the parameters you really want

Smoothing Nonparametric Data with LOWESS

Running LOWESS

Adjusting the amount of smoothing

Part V: Analyzing Survival Data

Chapter 22: Summarizing and Graphing Survival Data

Understanding the Basics of Survival Data

Knowing that survival times are intervals

Recognizing that survival times aren’t normally distributed

Considering censoring

Looking at the Life-Table Method

Making a life table

Interpreting a life table

Graphing hazard rates and survival probabilities from a life table

Digging Deeper with the Kaplan-Meier Method

Heeding a Few Guidelines for Life Tables and the Kaplan-Meier Method

Recording survival times the right way

Recording censoring information correctly

Interpreting those strange-looking survival curves

Doing Even More with Survival Data

Chapter 23: Comparing Survival Times

Comparing Survival between Two Groups with the Log-Rank Test

Understanding what the log-rank test is doing

Running the log-rank test on software

Looking at the calculations

Assessing the assumptions

Considering More Complicated Comparisons

Coming Up with the Sample Size Needed for Survival Comparisons

Chapter 24: Survival Regression

Knowing When to Use Survival Regression

Explaining the Concepts behind Survival Regression

The steps of Cox PH regression

Hazard ratios

Running a Survival Regression

Interpreting the Output of a Survival Regression

Testing the validity of the assumptions

Checking out the table of regression coefficients

Homing in on hazard ratios and their confidence intervals

Assessing goodness-of-fit and predictive ability of the model

Focusing on baseline survival and hazard functions

How Long Have I Got, Doc? Constructing Prognosis Curves

Running the proportional-hazards regression

Finding h

Estimating the Required Sample Size for a Survival Regression

Part VI: The Part of Tens

Chapter 25: Ten Distributions Worth Knowing

The Uniform Distribution

The Normal Distribution

The Log-Normal Distribution

The Binomial Distribution

The Poisson Distribution

The Exponential Distribution

The Weibull Distribution

The Student t Distribution

The Chi-Square Distribution

The Fisher F Distribution

Chapter 26: Ten Easy Ways to Estimate How Many Subjects You Need

Comparing Means between Two Groups

Comparing Means among Three, Four, or Five Groups

Comparing Paired Values

Comparing Proportions between Two Groups

Testing for a Significant Correlation

Comparing Survival between Two Groups

Scaling from 80 Percent to Some Other Power

Scaling from 0.05 to Some Other Alpha Level

Making Adjustments for Unequal Group Sizes

Allowing for Attrition

Biostatistics For Dummies®

Published by
John Wiley & Sons, Inc.
111 River St.
Hoboken, NJ 07030-5774
www.wiley.com

Published simultaneously in Canada

No part of this publication may be reproduced, stored in a retrieval system or transmitted in any form or by any means, electronic, mechanical, photocopying, recording, scanning or otherwise, except as permitted under Sections 107 or 108 of the 1976 United States Copyright Act, without the prior written permission of the Publisher. Requests to the Publisher for permission should be addressed to the Permissions Department, John Wiley & Sons, Inc., 111 River Street, Hoboken, NJ 07030, (201) 748-6011, fax (201) 748-6008, or online at http://www.wiley.com/go/permissions.

Trademarks: Wiley, the Wiley logo, For Dummies, the Dummies Man logo, A Reference for the Rest of Us!, The Dummies Way, Dummies Daily, The Fun and Easy Way, Dummies.com, Making Everything Easier, and related trade dress are trademarks or registered trademarks of John Wiley & Sons, Inc., and/or its affiliates in the United States and other countries, and may not be used without written permission. All other trademarks are the property of their respective owners. John Wiley & Sons, Inc., is not associated with any product or vendor mentioned in this book.

Limit of Liability/Disclaimer of Warranty: The publisher and the author make no representations or warranties with respect to the accuracy or completeness of the contents of this work and specifically disclaim all warranties, including without limitation warranties of fitness for a particular purpose. No warranty may be created or extended by sales or promotional materials. The advice and strategies contained herein may not be suitable for every situation. This work is sold with the understanding that the publisher is not engaged in rendering legal, accounting, or other professional services. If professional assistance is required, the services of a competent professional person should be sought. Neither the publisher nor the author shall be liable for damages arising herefrom. The fact that an organization or Website is referred to in this work as a citation and/or a potential source of further information does not mean that the author or the publisher endorses the information the organization or Website may provide or recommendations it may make. Further, readers should be aware that Internet Websites listed in this work may have changed or disappeared between when this work was written and when it is read.

For general information on our other products and services, please contact our Customer Care Department within the U.S. at 877-762-2974, outside the U.S. at 317-572-3993, or fax 317-572-4002.

For technical support, please visit www.wiley.com/techsupport.

Wiley publishes in a variety of print and electronic formats and by print-on-demand. Some material included with standard print versions of this book may not be included in e-books or in print-on-demand. If this book refers to media such as a CD or DVD that is not included in the version you purchased, you may download this material at http://booksupport.wiley.com. For more information about Wiley products, visit www.wiley.com.

Library of Congress Control Number: 2013936422

ISBN 978-1-118-55398-5 (pbk); ISBN 978-1-118-55395-4 (ebk); ISBN 978-1-118-55396-1 (ebk); ISBN 978-1-118-55399-2 (ebk)

Manufactured in the United States of America

About the Author

John C. Pezzullo, PhD, is an adjunct associate professor at Georgetown University. He has had a half-century of experience supporting researchers in the physical, biological, and social sciences. For more than 25 years, he led a dual life at Rhode Island Hospital as an information technology programmer/analyst (and later director) while also providing statistical and other technical support to biological and clinical researchers at the hospital. He then joined the faculty at Georgetown University as informatics director of the National Institute of Child Health and Human Development's Perinatology Research Branch. He has held faculty appointments in the departments of obstetrics and gynecology, biomathematics and biostatistics, pharmacology, nursing, and internal medicine. He is now semi-retired and living in Florida, but he still teaches biostatistics and clinical trial design to Georgetown students over the Internet. He created the StatPages.info website, which provides online statistical calculating capability and other statistics-related resources.

Dedication

To my wife, Betty: Without your steadfast support and encouragement, I would never have been able to complete this book. To Mom and Dad, who made it all possible. And to our kids, our grandkids, and our great-grandkids!

Author’s Acknowledgments

My heartfelt thanks to Matt Wagner of Fresh Books, Inc., and to Lindsay Lefevere for the opportunity to write this book; to Tonya Cupp, my special editor, who tutored me in the “Wiley ways” during the first quarter of the chapter-writing phase of the project; to Georgette Beatty, my project editor, who kept me on the path and on target (and mostly on time) throughout the process; to Christy Pingleton, the copy editor, for making sure what I said was intelligible; and to William Miller and Donatello Telesca, the technical reviewers, for making sure that what I said was correct.

Special thanks to Darrell Abernethy for his invaluable suggestions in Chapter 6.

And a special word of appreciation to all my family and friends, who provided so much support and encouragement throughout the whole project.

Publisher’s Acknowledgments

We're proud of this book; please send us your comments at http://dummies.custhelp.com. For other comments, please contact our Customer Care Department within the U.S. at 877-762-2974, outside the U.S. at 317-572-3993, or fax 317-572-4002.

Some of the people who helped bring this book to market include the following:

Acquisitions, Editorial, and Vertical Websites

Senior Project Editor: Georgette Beatty

Executive Editor: Lindsay Sandman Lefevere

Copy Editor: Christine Pingleton

Assistant Editor: David Lutton

Editorial Program Coordinator: Joe Niesen

Technical Editors: Dr. William G. Miller, Donatello Telesca

Editorial Manager: Michelle Hacker

Editorial Assistant: Alexa Koschier

Composition Services

Project Coordinator: Sheree Montgomery

Layout and Graphics: Carrie A. Cesavice

Proofreaders: Debbye Butler, John Greenough

Indexer: Ty Koontz

Special Help
Tonya Cupp, Sarah Faulkner, Todd Lothery, Danielle Voirol

Publishing and Editorial for Consumer Dummies

Kathleen Nebenhaus, Vice President and Executive Publisher

David Palmer, Associate Publisher

Kristin Ferguson-Wagstaffe, Product Development Director

Publishing for Technology Dummies

Andy Cummings, Vice President and Publisher

Composition Services

Debbie Stailey, Director of Composition Services

Introduction

Biostatistics is the practical application of statistical concepts and techniques to topics in biology. Because biology is such a broad field — studying all forms of life from viruses to trees to fleas to mice to people — biostatistics covers a very wide area, including designing biological experiments, safely conducting research on human beings, collecting and verifying data from those studies, summarizing and displaying that data, and analyzing the data to draw meaningful conclusions from it.

No book of reasonable size can hope to span all the subspecialties of biostatistics, including molecular biology, genetics, agricultural studies, animal research (in the lab and in the wild), clinical trials on humans, and epidemiological research. So I've concentrated on the most widely applicable topics and on the topics that are most relevant to research on humans (that is, clinical research). I chose these topics on the basis of a survey of graduate-level biostatistics curricula from major universities. I hope this book covers most of the topics you're most interested in; but if it doesn't, please tell me what you wish I had included. You can e-mail me at jcp12345@gmail.com, and I'll try to respond to your message.

About This Book

I wrote this book as a reference — something you go to when you want information about a particular topic. So you don’t have to read it from beginning to end; you can jump directly to the part you’re interested in. In fact, I hope you’ll be inclined to pick it up from time to time, open it to a page at random, read a page or two, and get a little something useful from it.

This book generally doesn’t show you the detailed steps to perform every statistical calculation by hand. That may have been necessary in the mid-1900s, when statistics students spent hours in a “computing lab” (that is, a room that had an adding machine in it) calculating a correlation coefficient, but nowadays computers do all the computing. (See Chapter 4 for advice on choosing statistical software.) When describing statistical tests, my focus is always on the concepts behind the method, how to prepare your data for analysis, and how to interpret the results. I keep mathematical formulas and derivations to a minimum in this book; I include them only when they help explain what’s going on. If you really want to see them, you can find them in many biostatistics textbooks, and they’re readily available online.

Because good experimental design is crucial for the success of any research, this book gives special attention to the design of clinical trials and, specifically, to calculating the number of subjects you need to study. You find easy-to-apply examples of sample-size calculations in the chapters describing significance tests in Parts III, IV, and V and in Chapter 26.

Conventions Used in This Book

Here are some typographic conventions I use throughout this book:

When I introduce a new term, I put the term in italics and define it. I also use italics occasionally to emphasize important information.

In bulleted lists, I often place the most important word or phrase of each bulleted item in boldface text. The action parts of numbered steps are also boldface.

I show web links (URLs) as monotype text.

When this book was printed, some web addresses may have needed to break across two lines of text. If that happened, rest assured that I haven’t put in any extra characters (like hyphens) to indicate the break. So, when using one of these web addresses, just type in exactly what you see in this book, pretending as though the line break doesn’t exist.

Whenever you see the abbreviation sd or SD, it always refers to the standard deviation.

Anytime you see the word significant in reference to a p value, it means p ≤ 0.05.

When you see the lowercase italicized letter e in a formula, it refers to the mathematical constant 2.718..., which I describe in Chapter 2. (On the very rare occasions that it stands for something else, I say so.)

I alternate between using male and female pronouns (instead of saying “he or she,” “him or her,” and so on) throughout the book. No gender preference is intended.
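The three numerical conventions in the list above (SD, "significant," and the constant e) can be made concrete with a few lines of Python. This is only an illustrative sketch: the data values, the name ALPHA, and the helper is_significant are invented for the example, and the sample (n - 1) form of the standard deviation is an assumption rather than something this introduction specifies.

```python
import math
import statistics

# Hypothetical data values, invented purely for illustration.
values = [4.0, 6.0, 8.0]

# "sd" or "SD" always refers to the standard deviation.
# Assumption: the sample (n - 1) form is used here.
sd = statistics.stdev(values)

# Convention: "significant" in reference to a p value means p <= 0.05.
ALPHA = 0.05  # hypothetical name for the cutoff


def is_significant(p_value: float) -> bool:
    """Return True when a p value meets the p <= 0.05 convention."""
    return p_value <= ALPHA


# The lowercase italic e in a formula is the constant 2.718...
e = math.e

print(f"SD = {sd}")
print(f"p = 0.03 significant? {is_significant(0.03)}")
print(f"p = 0.20 significant? {is_significant(0.20)}")
print(f"e = {e:.3f}")
```

Note that the cutoff is inclusive: under this convention a p value of exactly 0.05 counts as significant.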

What You’re Not to Read

Although I try to keep technical (that is, mathematical) details to a minimum, I do include them occasionally. The more complicated ones are marked by a Technical Stuff icon. You can skip over these paragraphs, and it won’t prevent you from understanding the rest of the material. You can also skip over anything that’s in a sidebar (text that resides in a box). Sidebars contain nonessential but interesting stuff, like historical trivia and other “asides.”

Foolish Assumptions

I wrote this book to help several kinds of people, and I assume you fall into one of the following categories:

Students at the undergraduate or graduate level who are taking a course in biostatistics and want help with the topics they’re studying in class

People who have had no formal biostatistical training (perhaps no statistical training at all) but find themselves having to deal with data from biological or clinical studies as part of their job

Doctors, nurses, and other healthcare professionals who want to carry out clinical research

If you’re interested in biostatistics, then you’re no dummy. But I bet you sometimes feel like a dummy when it comes to biostatistics, or statistics in general, or mathematics. Don’t feel bad — I’ve felt that way many times over the years, and still feel like that whenever I’m propelled into an area of biostatistics I haven’t encountered before. (If you haven’t taken a basic statistics course yet, you may want to get Statistics For Dummies by Deborah J. Rumsey, PhD — published by Wiley — and read parts of that book first.)

The important thing to keep in mind is that you don’t have to be a math genius to be a good biological or clinical scientist — one who can intelligently design experiments, execute them well, collect and analyze data properly, and draw valid conclusions. You just have to have a good grasp of the basic concepts and know how to utilize the sophisticated statistical software that has become so widely available.

How This Book Is Organized

I’ve divided this book into six parts, and each part contains several chapters. The following sections describe what you find in each part.

Part I: Beginning with Biostatistics Basics

This part provides preparation and context for the remainder of this book. Here, I bring you up to speed on math and statistics concepts so that you’re comfortable with them throughout this book. Then I provide advice on selecting statistical software. And finally I describe one major setting in which biostatistics is utilized: clinical research.

Part II: Getting Down and Dirty with Data

This part focuses on the raw material that biostatistical analysis works with — data. You probably already know the two main types of data: numerical (or quantitative) data, such as ages and heights, and non-numerical data, such as names and genders. Part II gets into the more subtle (but very important) distinctions between different data types.

You discover how to collect data properly, how to summarize it concisely and display it as tables and graphs, and how to describe the quality of the data (its precision and the uncertainties associated with your measured values). And you find out how the precision of your raw data affects the precision of other things you calculate from that data.

Part III: Comparing Groups

This part describes some of the most common statistical analyses you carry out on your data — comparing variables between groups. You discover how to answer questions like these: Does an arthritis medication reduce joint pain more than a placebo? Does a history of diabetes in a parent predict the likelihood of diabetes in the child? And if so, by how much?

You also find out how to show that there’s no meaningful difference between two groups. Is a generic drug really equivalent to the name brand? Does a new drug not interfere with normal heart rhythm? This endeavor entails more than just not proving that there is a difference — absence of proof is not proof of absence, and there are special ways to prove that there’s no important difference in your data.

Throughout this part, I discuss common statistical techniques for comparing groups such as t tests, ANOVAs, chi-square tests, and the Fisher Exact test.

Part IV: Looking for Relationships with Correlation and Regression

This part takes you through the very broad field of regression analysis — studying the relationships that can exist between variables. You find out how to test for a significant association between two or more variables and how to express that relationship in terms of a formula or equation that predicts the likely value of one variable from the observed values of one or more other variables. You see how useful such an equation can be, both for understanding the underlying science and for doing all kinds of practical things based on that relationship.

After reviewing the simple straight-line and multiple linear regression techniques you probably encountered in a basic stats course, you discover how to handle the more advanced problems that occur in the real world of biological research — logistic regression for analyzing yes-or-no kinds of outcomes, like “had a miscarriage”; Poisson regression for analyzing the frequency of recurring events, such as the number of hospitalizations for emphysema patients; and nonlinear regression when the relationship between the variables can take on a complicated mathematical form.

Part V: Analyzing Survival Data

This part is devoted to the analysis of one very special and important kind of data in biological research — survival time (or, more generally, the time to the first occurrence of some particular kind of event). You see what makes this type of data so special and why special methods are needed to deal with it correctly. You see how to calculate survival curves, test for a significant difference in survival between two or more groups of subjects, and apply the powerful and general methods of regression analysis to survival data.

Part VI: The Part of Tens

The final two chapters of this book provide “top-ten lists” of handy information and rules that you’ll probably refer to often. Chapter 25 describes ten of the most common statistical distribution functions that you encounter in biostatistical research. Some of these distributions describe how your observed data values are likely to fluctuate, and some are used primarily in conjunction with the common significance tests (t tests, chi-square tests, and ANOVAs). Chapter 26 contains a set of handy rules of thumb you can use to get quick estimates of the number of subjects you need to study in order to have a good chance of obtaining significant results.

Icons Used in This Book

Icons (the little drawings in the margins of this book) are used to draw your attention to certain kinds of material. Here’s what they mean:

Remember: This icon signals something that’s really worth keeping in mind. If you take away anything from this book, it should be the material marked with this icon.

Technical Stuff: I use this icon to flag things like derivations and computational formulas that you don’t have to know or understand but that may give you a deeper insight into other material. Feel free to skip over any information with this icon.

Tip: This icon refers to helpful hints, ideas, shortcuts, and rules of thumb that you can use to save time or make a task easier. It also highlights different ways of thinking about some topic or concept.

Warning: This icon alerts you to a topic that can be tricky or a concept that people often misunderstand.

Where to Go from Here

You’re already off to a good start: you’ve read this introduction, so you have a good idea of what this book is all about (at least what the major parts of the book are all about). For an even better idea of what’s in it, take a look at the Contents at a Glance, which drills down into each part and shows you what each chapter is all about. Finally, skim through the full-blown table of contents, which drills further down into each chapter, showing you the sections and subsections of that chapter.

If you want to get the big picture of what biostatistics encompasses (at least those parts of biostatistics covered in this book), then read Chapter 1. This is a top-level overview of the basic concepts that make up this entire book. Here are a few other special places you may want to jump into:

If you’re uncomfortable with mathematical notation, then Chapter 2 is the place to start.

If you want a quick refresher on basic statistics (the kind of stuff that would be taught in a Stats 101 course), then read Chapter 3.

You can get an introduction to clinical research in Chapters 5 and 6.

If you want to know about collecting, summarizing, and graphing data, jump to Part II.

If you need to know about working with survival data, you can go right to Part V.

If you’re puzzled about some particular statistical distribution function, then look at Chapter 25.

And if you need to do some quick sample-size estimates, turn to Chapter 26.
