Introduction to Data Science (IDS)

Course content

The amount and complexity of available data are steadily increasing. To make use of this wealth of information, computing systems are needed that turn data into knowledge. Machine learning is concerned with developing software that automatically analyses data to make predictions, categorizations, and recommendations. Machine learning algorithms are already an integral part of today's computing systems, for example in search engines, recommender systems, and biometric applications. Machine learning provides a set of tools that are widely applicable to data analysis across diverse problem domains such as data mining, digital image and signal analysis, natural language modeling, bioinformatics, physics, economics, and biology.

The purpose of the course is to introduce non-Computer Science students to probabilistic data modeling and the most common techniques from statistical machine learning and data mining. Students will obtain a working knowledge of basic data modeling and data analysis using fundamental machine learning techniques.

This course is relevant for students from, among others, the programmes in IT and Cognition, Bioinformatics, Physics, Biology, Chemistry, Economics, and Psychology.

The course covers the following tentative topic list:

  • Foundations of statistical learning, probability theory.
  • Classification methods, such as linear models and k-nearest neighbors.
  • Regression methods, such as linear regression.
  • Clustering.
  • Dimensionality reduction and visualization techniques such as principal component analysis (PCA).

Education

MSc programme in IT and Cognition
MSc programme in Bioinformatics

Learning outcome

At course completion, the successful student will have:

Knowledge of

  • the general principles of data analysis;
  • elementary probability theory for modeling and analyzing data;
  • the basic concepts underlying classification, regression, and clustering;
  • common pitfalls in machine learning.

Skills in

  • applying linear and non-linear techniques for classification and regression;
  • elementary data clustering;
  • visualizing and evaluating results obtained with machine learning techniques;
  • identifying and handling common pitfalls in machine learning;
  • using machine learning and data mining toolboxes.

Competences in

  • recognizing and describing possible applications of machine learning and data analysis in their field of science;
  • comparing, appraising and selecting machine learning methods for specific tasks;
  • solving real-world data mining and pattern recognition problems by using machine learning techniques.

Lecture and exercise classes

See Absalon when the course is set up.

Very basic calculus and programming knowledge is required.

The courses NDAK16003U Introduction to Data Science (IDS) and NDAB15001U Modelling and Analysis of Data (MAD) overlap substantially in both topics and level, so students are advised not to take both.

ECTS
7.5 ECTS
Type of assessment
Continuous assessment
Assessment of 5-7 homework assignments.
Aid
All aids allowed
Marking scale
7-point grading scale
Censorship form
No external censorship
Several internal examiners.
Criteria for exam assessment

See learning outcome.

Single subject courses (day)

  • Category: Hours
  • Lectures: 28
  • Preparation: 14
  • Practical exercises: 57
  • Theory exercises: 57
  • Project work: 50
  • Total: 206