Social Data Science: Text Data and Deep Learning

Course content

"Social Data Science: Text Data and Deep Learning" is one of two new courses in Social Data Science that build on the introductory summer school course in social data science. The courses introduce students to three new essential data structures and teach state-of-the-art methods for applying data science and machine learning techniques through practical examples and hands-on experience. In each course, we discuss how novel social data science applications apply these tools.

"Social Data Science: Text Data and Deep Learning" focuses on methods for analyzing unstructured data. Unstructured data such as images, video and text used to be confined to small-N qualitative studies within the social sciences. However, recent developments in both natural language processing (NLP) and computer vision (CV), broadly speaking the field of AI, hold great promise for social data scientists wishing to supplement deep qualitative readings and analysis of unstructured data with quantitative insights and generalization from large corpora of unstructured text and images.

The course begins with an introduction to neural networks and transfer learning. In many cases involving unstructured data, the high dimensionality of both language and vision means that the old supervised learning paradigm of training models from scratch using limited instructive samples (training data) is either impossible or very inefficient. In these cases, transfer learning can be used to adapt large pre-trained models, trained on very large labeled or unlabeled datasets, to a specific task. Next, we cover text as data, an abundant and readily available data source in the form of news articles, speeches, forum threads, social media posts, encyclopedias, et cetera. Lastly, the course introduces methods for using digital images as data.

Education

MSc programme in Economics – elective course.

The PhD Programme in Economics at the Department of Economics – elective course with research module (PhD students must contact the study administration and the lecturer in order to write the research assignment).

NOTE: Due to an overlapping syllabus, this course cannot be taken if the course "Topics in Social Data Science" (AØKK08371U) has already been taken.

 

Learning outcome

After completing the course, the student is expected to be able to:

Knowledge:

  • Discuss fundamental concepts in machine learning: model generalization, overfitting, loss functions, the bias-variance trade-off and cross-validation.
  • Account for various learning strategies, algorithms and approaches: clustering and unsupervised learning, supervised learning, semi-supervised learning, transfer learning, multi-task learning.
  • Identify and define the potential of different representations of text, structured and unstructured.
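
As an illustration of the cross-validation concept listed above, the following is a minimal sketch in plain Python. The function names `k_fold_indices` and `cross_validate`, the toy targets and the mean-only baseline "model" are assumptions made for this example; they are not taken from the course material.

```python
import random


def k_fold_indices(n_samples, k, seed=0):
    """Split range(n_samples) into k shuffled, near-equal folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Taking every k-th index gives folds whose sizes differ by at most one.
    return [indices[i::k] for i in range(k)]


def cross_validate(ys, k=5, seed=0):
    """Return the held-out mean squared error of a mean-only baseline per fold."""
    scores = []
    for held_out in k_fold_indices(len(ys), k, seed):
        held = set(held_out)
        train_y = [y for i, y in enumerate(ys) if i not in held]
        # Hypothetical "model": always predict the mean of the training targets.
        prediction = sum(train_y) / len(train_y)
        errors = [(ys[i] - prediction) ** 2 for i in held_out]
        scores.append(sum(errors) / len(errors))
    return scores


# Toy targets, invented for the example.
ys = [float(i % 4) for i in range(20)]
fold_scores = cross_validate(ys, k=5)
mean_score = sum(fold_scores) / len(fold_scores)
```

Each observation is held out exactly once, so `mean_score` estimates how the baseline would perform on unseen data rather than on the data it was fitted to.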

 

Skills:

  • Apply fundamental machine learning tools, including model selection, hyperparameter search and robust model validation.
  • Use neural networks to make predictions from unstructured data.
  • Extract reliable information from text data using supervised learning and techniques from natural language processing.
  • Master computer vision methods to extract features from image data.
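
To make the idea of turning text into features concrete, the following is a minimal bag-of-words sketch in plain Python. It is only one of many possible text representations; the helper names `tokenize` and `bag_of_words`, the simple regular-expression tokenizer and the two example sentences are assumptions made for this illustration.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and keep runs of letters and apostrophes as tokens."""
    return re.findall(r"[a-z']+", text.lower())


def bag_of_words(documents):
    """Return a sorted vocabulary and a document-term count matrix."""
    counts = [Counter(tokenize(doc)) for doc in documents]
    vocabulary = sorted(set().union(*counts))
    # One row per document, one column per vocabulary term.
    matrix = [[c[term] for term in vocabulary] for c in counts]
    return vocabulary, matrix


# Example sentences, invented for the illustration.
docs = ["The minister gave a speech.", "The speech was about the economy."]
vocabulary, matrix = bag_of_words(docs)
```

The resulting rows of `matrix` can serve as input features for a supervised model; word order is discarded, which is the main limitation of this representation.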

 

Competencies:

  • Integrate theoretical and applied knowledge within the field of Social Data Science and formulate powerful research questions given an interesting dataset.
  • Construct validated and documented data sets for social science from unstructured text and media data.
  • Communicate results using comprehensive statistics and modern visualization methods, in particular for plotting new data types.
  • Critically evaluate the implications of results, taking into account model limitations and biases, and systematic noise introduced by data collection and sampling methods.

Lectures and lab sessions with exercises.

The following is a partial, tentative list of course readings.

  • Bishop, Christopher: Pattern Recognition and Machine Learning. Springer, 2006.
  • Cantu, Francisco & Michelle Torres: "Learning to See: Visual Analysis for Social Science Data".
  • Gentzkow, M., Kelly, B. T., & Taddy, M. Text as Data. Journal of Economic Literature.
  • Grimmer, J., & Stewart, B. M. (2013). Text as data: The promise and pitfalls of automatic content analysis methods for political texts. Political Analysis, 21(3), 267-297.
  • Hastie, T., Tibshirani, R., & Friedman, J. (2008). The Elements of Statistical Learning: Data Mining, Inference, and Prediction.
  • Jurafsky, Dan, and James H. Martin. Speech and language processing. Vol. 3. London: Pearson, 2014.

Students must have completed the summer school course "Introduction to Social Data Science" at the Study of Economics, University of Copenhagen (or a similar introduction to Python and machine learning). The specific skills needed are:
- Ability to write core Python code and to use the NumPy and pandas packages, including transforming, merging and aggregating data.
- Experience with training linear machine learning models, model validation and model selection.
- Experience with transforming data into inputs for machine learning.

Schedule:
3 hours of lectures once a week from week 6 to week 20 (except holidays)
2 hours of exercise classes once a week from week 6/7 to week 20/21 (except holidays)

The overall schedule for the BA 3rd year and Master courses can be seen at KUnet:
MSc in Economics => "courses and teaching" => "Planning and overview" => "Your timetable"
BA i Økonomi/KA i Økonomi => "Kurser og undervisning" => "Planlægning og overblik" => "Dit skema"

Timetable and venue:
To see the time and location of lectures and exercise classes, please click the link(s) under "Se skema" (See schedule) on the right side of this page (F means Spring).

You can find similar information in English at
https://skema.ku.dk/ku1920/uk/module.htm
-Select Department: “2200-Økonomisk Institut” (and wait for a response)
-Select Module: “2200-F20; [Name of course]”
-Select Report Type: “List – Weekdays”
-Select Period: “Forår/Spring – Week 5-30”
-Press: “View Timetable”

Written
Oral
Individual
Collective

 

The students will receive:

Written feedback on mandatory assignments.
Immediate feedback from quizzes on the content of the lectures.

ECTS
7.5 ECTS
Type of assessment
Written assignment, 24 hours
Individual take-home assignment. The students are allowed to communicate about the given problem set but must work on, write and upload the assignment answer individually. Be aware that the plagiarism rules must be complied with. The exam assignment is given in English and must be answered in English.
Aid
All aids allowed
Marking scale
7-point grading scale
Censorship form
No external censorship for the written exam. The exam may be selected for external censorship by random check.
Criteria for exam assessment

Students are assessed on the extent to which they master the learning outcome for the course.

To receive the top grade, the student must, with no or only a few minor weaknesses, demonstrate excellent performance, displaying a high level of command of all aspects of the relevant material, and be able to make use of the knowledge, skills and competencies listed in the learning outcomes.

Single subject courses (day)

  • Category: Hours
  • Exam: 24
  • Preparation: 112
  • Lectures: 42
  • Class Instruction: 28
  • Total: 206