1. able to understand the motivations and principles for building natural language processing systems
2. able to master a set of key machine learning / statistical methods which are widely used in and beyond NLP
3. able to implement practical applications of NLP using tools such as NLTK, PyTorch and DyNet
Mapping from Course Learning Outcomes to Programme Learning Outcomes

|       | PLO a | PLO b | PLO c | PLO d | PLO e | PLO f | PLO g | PLO h | PLO i | PLO j |
|-------|-------|-------|-------|-------|-------|-------|-------|-------|-------|-------|
| CLO 1 | T     | T     |       |       |       |       |       |       |       | T     |
| CLO 2 | T     |       | T     | T     |       |       |       |       |       |       |
| CLO 3 | T     |       |       | T     |       |       |       |       | T     |       |

T - Teach, P - Practice

For BEng(CompSc) Programme Learning Outcomes, please refer to here.
Calendar Entry:
Natural language processing (NLP) is the study of human language from a computational perspective. The course focuses on machine learning and corpus-based methods and algorithms, covering syntactic, semantic and discourse processing models. We describe the use of these methods and models in applications including syntactic parsing, information extraction, statistical machine translation, dialogue systems, and summarization. The course starts with language models (LMs), which are front and center in NLP, and then introduces key machine learning (ML) ideas that students should grasp (e.g. feature-based models, log-linear models, and then neural models). It ends with modern generic meaning representation methods (e.g. BERT/GPT-3) and the idea of pretraining / finetuning.
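To make the language-modelling starting point concrete, here is a minimal sketch of a bigram model with add-one (Laplace) smoothing in Python; the toy corpus, function names, and padding symbols are illustrative assumptions, not course materials.

```python
from collections import Counter

def train_bigram_lm(sentences):
    """Count unigrams and bigrams over sentences padded with <s> and </s>."""
    unigrams, bigrams = Counter(), Counter()
    for tokens in sentences:
        padded = ["<s>"] + tokens + ["</s>"]
        unigrams.update(padded)
        bigrams.update(zip(padded, padded[1:]))
    return unigrams, bigrams

def bigram_prob(w_prev, w, unigrams, bigrams, vocab_size):
    """P(w | w_prev) with add-one (Laplace) smoothing."""
    return (bigrams[(w_prev, w)] + 1) / (unigrams[w_prev] + vocab_size)

# Toy corpus (illustrative only).
corpus = [["the", "cat", "sat"], ["the", "dog", "sat"]]
uni, bi = train_bigram_lm(corpus)
print(bigram_prob("the", "cat", uni, bi, vocab_size=len(uni)))
```

Smoothing reserves probability mass for unseen bigrams, which is why the count is incremented by 1 and the denominator by the vocabulary size.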
Detailed Description:

| Topic | Content | Mapped to CLOs |
|-------|---------|----------------|
| Introduction to NLP, Language Models | Computational Linguistics / Natural Language Processing, Bigram/trigram models, Smoothing | 1 |
| Tagging, Hidden Markov Models | POS tagging / Named-Entity Recognition (NER), Generative Models, Noisy Channel Model, Hidden Markov Models (HMM), Viterbi Algorithm (see the Viterbi sketch after this table) | 1, 2, 3 |
| Log-Linear Models | Features in NLP, Parameter Estimation (Learning), Regularization | 1, 2 |
| Parsing, Context-free Grammars | Syntactic Structure, Context-free Grammars (CFGs), Ambiguity | 2, 3 |
| Probabilistic Context-free Grammars, Lexicalized Context-free Grammars | CKY Algorithm, Head words, Dependency Parsing | 2, 3 |
| Log-Linear Models for Tagging and for History-based Parsing | MEMM, CRF, (advanced) EM algorithm | 2, 3 |
| Feedforward Neural Networks, Computational Graphs, Backpropagation | Neural Networks, Chain rule, Loss function | 2, 3 |
| Word Embeddings in Feedforward Networks | Word2vec, Neural structured prediction (e.g. Tagging and Dependency parsing) | 2, 3 |
| Recurrent Networks, LSTMs | RNN language models, LSTM gates, Seq2seq models | 2, 3 |
| Statistical Machine Translation | Alignment, Phrase-based MT | 1, 2 |
| Transformers and Attention Mechanism | Neural Machine Translation, Multi-head attention (see the attention sketch after this table) | 2, 3 |
| Contextualized Word Representation | BERT, GPT-3, Pretraining and fine-tuning | 1, 2, 3 |
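The tagging unit centres on Viterbi decoding for HMMs. Below is a minimal sketch of the algorithm in Python; the two-tag POS model, its words, and all probabilities are made-up toy values for illustration.

```python
def viterbi(obs, states, start_p, trans_p, emit_p):
    """Most likely state sequence for an observation sequence under a toy HMM."""
    # V[t][s] = probability of the best path ending in state s at time t
    V = [{s: start_p[s] * emit_p[s].get(obs[0], 0.0) for s in states}]
    back = [{}]
    for t in range(1, len(obs)):
        V.append({})
        back.append({})
        for s in states:
            prob, prev = max(
                (V[t - 1][p] * trans_p[p][s] * emit_p[s].get(obs[t], 0.0), p)
                for p in states
            )
            V[t][s], back[t][s] = prob, prev
    # Trace back pointers from the best final state.
    last = max(V[-1], key=V[-1].get)
    path = [last]
    for t in range(len(obs) - 1, 0, -1):
        path.append(back[t][path[-1]])
    return list(reversed(path))

# Hypothetical toy POS model: two tags, invented probabilities.
states = ["N", "V"]
start = {"N": 0.6, "V": 0.4}
trans = {"N": {"N": 0.3, "V": 0.7}, "V": {"N": 0.8, "V": 0.2}}
emit = {"N": {"dogs": 0.5, "bark": 0.1}, "V": {"dogs": 0.1, "bark": 0.6}}
print(viterbi(["dogs", "bark"], states, start, trans, emit))  # ['N', 'V']
```

Dynamic programming keeps one best score per state per position, so decoding is linear in sentence length rather than exponential in the number of tag sequences.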
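The Transformer unit builds on scaled dot-product attention, Attention(Q, K, V) = softmax(QK^T / sqrt(d_k)) V, the building block of multi-head attention. Here is a minimal PyTorch sketch; the tensor shapes and random inputs are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def scaled_dot_product_attention(q, k, v):
    """Attention(Q, K, V) = softmax(QK^T / sqrt(d_k)) V."""
    d_k = q.size(-1)
    scores = q @ k.transpose(-2, -1) / d_k ** 0.5  # (..., len_q, len_k)
    weights = F.softmax(scores, dim=-1)            # attention distribution
    return weights @ v, weights

# Illustrative toy tensors: batch of 1, sequence of 4, dimension 8.
q = k = v = torch.randn(1, 4, 8)
out, attn = scaled_dot_product_attention(q, k, v)
print(out.shape, attn.shape)  # torch.Size([1, 4, 8]) torch.Size([1, 4, 4])
```

Multi-head attention runs several such attention functions in parallel over learned projections of Q, K and V, then concatenates the results.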