Natural Language Processing (NLP)
Have you ever wondered how to build a system that can process, understand or generate text automatically? For instance, to translate between languages, answer questions, or recognise the names of people in text? Then this course is for you.
This course will introduce the fundamentals of natural language processing (NLP), i.e., computational models of language and their applications to text. Language is at the heart of human intelligence, giving NLP a central role in Artificial Intelligence research and development.
We will combine machine learning (ML), including fundamental formalisms and algorithms, with substantial hands-on experience, i.e., the practical implementation of the methods for concrete NLP problems.
The course will use interactive lecture materials built with Jupyter notebooks. Course materials from last year are publicly available at https://github.com/copenlu/stat-nlp-book, and the course will closely follow last year’s iteration. Please skim these materials if you are in doubt about course prerequisites or course content.
The course covers the following tentative topic list:
• NLP tasks: language modelling, text classification, semantics, information extraction, parsing, pragmatics, machine translation, summarisation, question answering
• methods: text classification, structured prediction, representation and deep learning, conditional random fields, beam search
• implementations: relationship between NLP tasks, efficient implementations
Throughout the course, we will also discuss the themes of discriminative and generative learning and different ways of obtaining supervision for training statistical NLP models.
core NLP tasks (e.g. machine translation, question answering, information extraction)
methods (e.g. classification, structured prediction, representation learning)
implementations (e.g. relationship between NLP tasks, efficient implementations)
identify the different kinds of NLP tasks
choose the correct algorithm for a given problem situation
implement core algorithms in Python
assess the most appropriate algorithms to solve a given NLP problem
distinguish and evaluate the advantages of different approaches to the same task
decompose natural language tasks into manageable components
evaluate systems quantitatively and qualitatively
apply the learned skills to other areas that face similar challenges, for example data science, political science research, or gene sequencing
The format of the class consists of lectures (including guest lectures), exercises, and project work.
Selected papers and book chapters. See Absalon when the course is set up.
Knowledge of machine learning (probability theory, linear algebra, classification, neural networks) and programming (Python) is required, either through formal education or self-study. No prior knowledge of natural language processing or linguistics is required.
Relevant machine learning competencies can be obtained through one of the following courses:
- NDAK22002U Advanced Deep Learning (ADL)
- NDAK22000U Machine Learning A (MLA)
- NDAK15007U Machine Learning (ML)
- NDAK16003U Introduction to Data Science (IDS)
- Machine Learning, Coursera
Academic qualifications equivalent to a BSc degree are recommended.
If you are in doubt about whether you meet the course prerequisites, you can check the course materials from last year here: https://github.com/copenlu/stat-nlp-book.
This course will teach the fundamentals of natural language processing, in terms of methods, typical tasks and implementations. For those students with a specific interest in opinion and data mining, the course NDAK14004U Web Science (WS) is recommended. There will be no significant overlap between the two courses, and students are welcome to attend both of them.
PhD students can register for MSc courses by following the same procedure as credit students; see link above.
- 7.5 ECTS
- Type of assessment
Written examination, 1.5 hours under invigilation
Written assignment, during course
- Type of assessment details
- The exam consists of two parts:
1. A group project, written during the course, that counts for 50% of the mark (group members either hand in individual reports or mark their contributions in the group report).
2. A 1.5-hour written exam that counts for 50% of the mark.
- All aids allowed
The use of Large Language Models (LLM)/Large Multimodal Models (LMM) – such as ChatGPT and GPT-4 – is permitted for the written assignment.
- Marking scale
- 7-point grading scale
- Censorship form
- No external censorship
Several internal examiners
Criteria for exam assessment
See Learning Outcome.
Single subject courses (day)
- Theory exercises
- Practical exercises
- Project work
- Course number
- Programme level
- Full Degree Master
- Block 1
- No limit
The number of seats may be reduced in the late registration period
- Study Board of Mathematics and Computer Science
- Department of Computer Science
- Faculty of Science
- Daniel Hershcovich
- Anders Søgaard