Intro to R for Data Science

Course content

R is free for everyone to use, and it is powerful. It has become one of the most widely used programming languages for statistical analysis in the social sciences and is, for this reason, a highly sought-after skill among employers. The same is true of the emerging field of “Data Science”, which extends well beyond the social sciences. This course will teach you how to do (social) data science with R: you will learn how to get your data into shape, transform and manipulate it, visualise it, and model it statistically. The course will also briefly introduce logistic regression and multilevel modelling. Apart from these skills, which are necessary for conducting classical statistics, you will also learn some basic programming in R and how to do reproducible research and report your results using R Markdown. Please note that this class presumes a solid background in basic statistics (i.e., descriptive statistics and multiple OLS regression).

Education

MA Research Methodology and Practice (MSc Curriculum 2015)

Course package (MSc 2015):

Velfærd, ulighed og mobilitet/Welfare, inequality and mobility
Viden, organisation og politik/Knowledge, organisation and politics
Kultur, livsstil og hverdagsliv/Culture, lifestyle and everyday life

Learning outcome

Knowledge:

  • R programming language
  • RStudio
  • R Markdown

 

Skills:

  • Students will be able to conduct statistical analysis with R.
  • Students will be able to write their own functions, loops and so on in R.
  • Students will be able to prepare presentations and reports with R Markdown.

 

Competences:

  • Students will increase their analytical and logical cognitive capacities
  • Students should be able to transform and manipulate data to prepare it for statistical analyses. They will be able to think about data in a less narrow way, because R is more flexible than other statistical programming languages.
  • Students should be able to conduct their own research based on analyses carried out in R.
  • Students should be able to prepare reproducible research reports and presentations with R Markdown.

 

The course consists of lectures, class assignments, student presentations and a final paper that presents an empirical analysis reported using R Markdown. Students are expected to contribute actively.

The course is largely based on: Wickham, H. & Grolemund, G. (2017): R for Data Science. O’Reilly. This book is freely available at: http://r4ds.had.co.nz/

 

Other useful books are:

Matloff, N. (2011): The Art of R Programming. No Starch Press.

Teetor, P. (2011): R Cookbook. O’Reilly.

This course is not an introduction to statistics. I expect students to have a solid background in basic statistics, including a thorough understanding of linear regression (OLS) with dummy variable predictors and interaction terms. This is a prerequisite. If you lack it, I suggest first taking my “Applied Multilevel Modeling” course in the fall semester.

Students will need to bring their own laptop.

Oral

I give structured feedback on student presentations and the final paper. Solutions to the class assignments will also be presented.

ECTS
7.5 ECTS
Type of assessment
Written assignment
Individual/group.
A written take-home essay is defined as an assignment that addresses one or more questions. The exam is based on the course syllabus, i.e. the literature set by the teacher.

The written take-home essay must be no longer than 10 pages. For group assignments, 5 extra pages are allowed per additional student. Further details on this exam form can be found in the Curriculum and in the General Guide to Examinations on KUnet.
Marking scale
7-point grading scale
External examination
No external examiner
Criteria for exam assessment

See learning outcome.

  • Category: Hours
  • Class Instruction: 28
  • Course Preparation: 35
  • Preparation: 16
  • Exercises: 56
  • Exam Preparation: 71
  • Total: 206