Stochastic Models and Inference for Genetic Data

Course content

Introduction to topics in statistical genetics, that is, the application of statistical methods for modelling and drawing inferences from genetic data, in particular DNA data. The course will develop mathematical theory and statistical models to understand how genetic data vary in a population.

How do we statistically understand genetic variation? How do we describe genetic relatedness between individuals? When did a genetic disease first appear in a population? What can we learn about a population's history from a sample of genetic data?

Random variables describing genetic data from individuals in a population are highly correlated ("exchangeable random variables"), and standard asymptotic theory does not apply. The theory and models are based on Markov processes, in discrete and continuous time. Inference procedures are based on Markov Chain Monte Carlo (MCMC), Importance Sampling (IS) and Approximate Bayesian Computation (ABC). These procedures will be discussed in general and applied to the setting of the course.

Key mathematical/statistical concepts are ancestral processes, the coalescent process, the age and frequency of alleles (genetic types) in populations, and inference for genetic data based on such processes. Relatedness between individuals is described in terms of a stochastic graph.
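As a small illustration of the coalescent process mentioned above, the following sketch simulates the waiting times between coalescence events for a sample of n lineages. It assumes the standard (Kingman) coalescent, in which the waiting time while k lineages remain is exponential with rate k(k-1)/2, measured in coalescent time units. The function names are hypothetical and chosen for this example only; the course material may set things up differently.

```python
import random


def coalescent_waiting_times(n, rng):
    """Waiting times between coalescence events for a sample of n lineages.

    Assumption: standard Kingman coalescent, so while k lineages remain
    the waiting time is exponential with rate k * (k - 1) / 2.
    """
    times = []
    for k in range(n, 1, -1):
        rate = k * (k - 1) / 2
        times.append(rng.expovariate(rate))
    return times


def expected_tmrca(n):
    """Expected time to the most recent common ancestor: 2 * (1 - 1/n)."""
    return 2.0 * (1.0 - 1.0 / n)


if __name__ == "__main__":
    rng = random.Random(1)
    n, reps = 10, 20000
    # Monte Carlo estimate of the mean tree height, compared with theory.
    mean_height = sum(sum(coalescent_waiting_times(n, rng)) for _ in range(reps)) / reps
    print("simulated mean TMRCA:", mean_height)
    print("theoretical mean TMRCA:", expected_tmrca(n))
```

The sum of the waiting times is the height of the genealogy, so averaging it over many replicates should come close to the theoretical value.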



MSc Programme in Statistics

Learning outcome

At the end of the course the student will have knowledge about how genetic variation is modelled, ancestral processes, and how inference can be made from such processes. The student will have the knowledge to

  • explain population genetic models, like the Wright-Fisher model
  • explain the coalescent process and Ewens sampling formula
  • explain the frequency distribution of alleles (types)
  • explain statistical methods for inference on genetic data (ABC, MCMC)
  • explain what a genealogy is
  • explain the use of Markov chains to model genetic variation
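
To give a flavour of the Ewens sampling formula named in the list above, here is a minimal sketch that evaluates it for a given allelic partition. It assumes the usual parametrisation, where a_j is the number of allele types observed exactly j times in a sample of size n and theta is the scaled mutation rate. The function name and argument layout are hypothetical.

```python
from math import factorial


def ewens_probability(counts, theta):
    """Probability of an allelic partition under the Ewens sampling formula.

    counts[j - 1] is a_j, the number of allele types seen exactly j times.
    The sample size n is implied by the partition: n = sum of j * a_j.
    """
    n = sum(j * a for j, a in enumerate(counts, start=1))
    # Rising factorial theta * (theta + 1) * ... * (theta + n - 1).
    rising = 1.0
    for i in range(n):
        rising *= theta + i
    prob = factorial(n) / rising
    for j, a in enumerate(counts, start=1):
        prob *= (theta / j) ** a / factorial(a)
    return prob


if __name__ == "__main__":
    theta = 2.0
    # All partitions of a sample of size 3: three singletons,
    # one singleton plus one pair, or one type seen three times.
    for counts in ([3, 0, 0], [1, 1, 0], [0, 0, 1]):
        print(counts, ewens_probability(counts, theta))
```

For a sample of size 2 the formula reduces to the familiar statement that two sampled genes are of different types with probability theta / (theta + 1).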



The student will acquire the skills to analyse simple genetic data sets and to extract basic mathematical properties of ancestral processes.

At the end of the course the students will have the competence to

  • carry out inference for (simple) genetic data sets
  • extract relevant mathematical properties of genetic models
  • extract biological insight from mathematical/statistical models

3 hours of lecturing, 2 hours of exercise classes per week for 7 weeks.

See Absalon for a list of course literature.

VidSand1, Stat1, Beting or similar

Student participation is expected, for example, by presentation of exercises and/or course material.

7.5 ECTS
Type of assessment
Written examination, 4 hours under invigilation
All aids allowed
Marking scale
7-point grading scale
Censorship form
No external censorship
One internal examiner
Criteria for exam assessment

In order to obtain the grade 12 the student should convincingly and accurately demonstrate the knowledge, skills and competences described under Learning Outcome.

Single subject courses (day)

  Category      Hours
  Exam             45
  Lectures         28
  Exercises        21
  Preparation     112
  Total           206