COMP3361 Natural language processing

2022-23
Instructor(s): Kong Lingpeng
(Class A) No. of credit(s): 6
Recommended Learning Hours:
Lecture: 26.0
Tutorial: 13.0
Pre-requisite(s): COMP3314 or COMP3340; and MATH1853
Co-requisite(s):  
Mutually exclusive with:  
Remarks:

Course Learning Outcomes

1. Able to understand the motivations and principles for building natural language processing systems
2. Able to master a set of key machine learning / statistical methods that are widely used in and beyond NLP
3. Able to implement practical applications of NLP using tools such as NLTK, PyTorch and DyNet
Mapping from Course Learning Outcomes to Programme Learning Outcomes
       PLO a  PLO b  PLO c  PLO d  PLO e  PLO f  PLO g  PLO h  PLO i  PLO j
CLO 1: T, T, T
CLO 2: T, T, T
CLO 3: T, T, T

T - Teach, P - Practice
For the BEng(CompSc) Programme Learning Outcomes, please refer to the programme website.

Syllabus

Calendar Entry:
Natural language processing (NLP) is the study of human language from a computational perspective. The course focuses on machine learning and corpus-based methods and algorithms. We will cover syntactic, semantic and discourse processing models, and describe the use of these methods and models in applications including syntactic parsing, information extraction, statistical machine translation, dialogue systems, and summarization. The course starts with language models (LMs), which are front and center in NLP, and then introduces key machine learning (ML) ideas that students should grasp (e.g. feature-based models, log-linear models, and neural models). We will end with modern generic meaning representation methods (e.g. BERT/GPT-3) and the idea of pretraining and fine-tuning.

Detailed Description:

Introduction to NLP, Language Models
Computational Linguistics / Natural Language Processing, Bigram/trigram models, Smoothing (Mapped to CLOs: 1)
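
To make the first topic concrete, here is a minimal sketch (not course code; function names and the toy corpus are my own) of a bigram language model with add-one (Laplace) smoothing:

```python
from collections import Counter

def train_bigram_lm(corpus):
    """Count unigrams and bigrams over tokenized sentences padded with <s>/</s>."""
    unigrams, bigrams = Counter(), Counter()
    for sent in corpus:
        tokens = ["<s>"] + sent + ["</s>"]
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))
    return unigrams, bigrams

def bigram_prob(w_prev, w, unigrams, bigrams, vocab_size):
    """Add-one smoothed estimate of P(w | w_prev)."""
    return (bigrams[(w_prev, w)] + 1) / (unigrams[w_prev] + vocab_size)
```

Smoothing ensures unseen bigrams still receive nonzero probability, at the cost of shaving mass from observed ones.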
Tagging, Hidden Markov Models
POS tagging / Named-Entity Recognition (NER), Generative Models, Noisy Channel Model, Hidden Markov Models (HMM), Viterbi Algorithm (Mapped to CLOs: 1, 2, 3)
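
The Viterbi algorithm covered here is a max-product dynamic program over HMM states; the sketch below (with made-up determiner/noun parameters in the usage, not course data) is illustrative only:

```python
def viterbi(obs, states, start_p, trans_p, emit_p):
    """Most likely hidden state sequence for an observation sequence under an HMM."""
    # V[t][s] = probability of the best path ending in state s at time t
    V = [{s: start_p[s] * emit_p[s][obs[0]] for s in states}]
    back = [{}]
    for t in range(1, len(obs)):
        V.append({}); back.append({})
        for s in states:
            best_prev = max(states, key=lambda p: V[t - 1][p] * trans_p[p][s])
            V[t][s] = V[t - 1][best_prev] * trans_p[best_prev][s] * emit_p[s][obs[t]]
            back[t][s] = best_prev
    # Trace back from the best final state
    last = max(states, key=lambda s: V[-1][s])
    path = [last]
    for t in range(len(obs) - 1, 0, -1):
        path.append(back[t][path[-1]])
    return list(reversed(path))
```

In practice one works in log space to avoid underflow on long sequences.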
Log-Linear Models
Features in NLP, Parameter Estimation (Learning), Regularization (Mapped to CLOs: 1, 2)
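
A log-linear model scores each candidate label y for input x as exp(w · f(x, y)), normalized over the candidates. A minimal sketch (feature names and dictionaries are illustrative, not course code):

```python
import math

def log_linear_probs(weights, feats):
    """P(y | x) proportional to exp(w . f(x, y)).
    feats maps each candidate label to its {feature_name: value} dict."""
    scores = {y: sum(weights.get(f, 0.0) * v for f, v in fv.items())
              for y, fv in feats.items()}
    z = sum(math.exp(s) for s in scores.values())  # partition function
    return {y: math.exp(s) / z for y, s in scores.items()}
```

Learning then maximizes the log-likelihood of the training labels, typically with an L2 penalty on the weights for regularization.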
Parsing, Context-free Grammars
Syntactic Structure, Context-free Grammars (CFGs), Ambiguity (Mapped to CLOs: 2, 3)
Probabilistic Context-free Grammars, Lexicalized Context-free Grammars
CKY Algorithm, Head words, Dependency Parsing (Mapped to CLOs: 2, 3)
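
The CKY algorithm fills a chart of nonterminals over spans of the sentence, bottom-up. A recognition-only sketch for a CFG in Chomsky normal form (grammar encoding and the toy rules are my own, not course code):

```python
def cky_recognize(words, grammar, start="S"):
    """CKY recognition for a CFG in Chomsky normal form.
    grammar: list of (lhs, rhs) with rhs a 1-tuple (terminal) or 2-tuple (nonterminals)."""
    n = len(words)
    chart = [[set() for _ in range(n + 1)] for _ in range(n)]
    for i, w in enumerate(words):                      # width-1 spans from lexical rules
        for lhs, rhs in grammar:
            if rhs == (w,):
                chart[i][i + 1].add(lhs)
    for span in range(2, n + 1):                       # wider spans from binary rules
        for i in range(n - span + 1):
            j = i + span
            for k in range(i + 1, j):                  # split point
                for lhs, rhs in grammar:
                    if len(rhs) == 2 and rhs[0] in chart[i][k] and rhs[1] in chart[k][j]:
                        chart[i][j].add(lhs)
    return start in chart[0][n]
```

The probabilistic version keeps the best score per nonterminal and span instead of a set, which yields the most probable parse under a PCFG.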
Log-Linear Models for Tagging and for History-based Parsing
MEMM, CRF, (advanced) EM algorithm (Mapped to CLOs: 2, 3)
Feedforward Neural Networks, Computational Graphs, Backpropagation
Neural Networks, Chain rule, Loss function (Mapped to CLOs: 2, 3)
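
Backpropagation is just the chain rule applied through the computational graph. A deliberately tiny sketch (one scalar hidden unit, squared loss; all names are mine, and real networks use vectorized autograd frameworks instead):

```python
import math

def forward(x, w1, b1, w2, b2):
    """One-hidden-unit network: h = tanh(w1*x + b1), y = w2*h + b2."""
    h = math.tanh(w1 * x + b1)
    return h, w2 * h + b2

def grads(x, target, w1, b1, w2, b2):
    """Backpropagation via the chain rule for L = 0.5 * (y - target)^2."""
    h, y = forward(x, w1, b1, w2, b2)
    dy = y - target              # dL/dy
    dw2, db2 = dy * h, dy        # output-layer gradients
    dh = dy * w2                 # dL/dh, propagated back through w2
    da = dh * (1 - h * h)        # through tanh: d tanh(a)/da = 1 - h^2
    dw1, db1 = da * x, da        # input-layer gradients
    return dw1, db1, dw2, db2
```

A finite-difference check against the analytic gradient is the standard sanity test for hand-written backprop.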
Word Embeddings in Feedforward Networks
Word2vec, Neural structured prediction (e.g. Tagging and Dependency parsing) (Mapped to CLOs: 2, 3)
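
The skip-gram variant of word2vec with negative sampling trains embeddings so that true (center, context) pairs score high and sampled negatives score low. A sketch of the per-pair loss (plain-list vectors and function names are illustrative, not the word2vec implementation):

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def skipgram_ns_loss(center_vec, context_vec, negative_vecs):
    """Skip-gram negative-sampling loss for one (center, context) pair:
    -log sigmoid(u_context . v_center) - sum_neg log sigmoid(-u_neg . v_center)."""
    dot = lambda a, b: sum(x * y for x, y in zip(a, b))
    loss = -math.log(sigmoid(dot(context_vec, center_vec)))
    for neg in negative_vecs:
        loss -= math.log(sigmoid(-dot(neg, center_vec)))
    return loss
```

Gradient descent on this loss pulls the context vector toward the center vector and pushes the negatives away.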
Recurrent Networks, LSTMs
RNN language models, LSTM gates, Seq2seq models (Mapped to CLOs: 2, 3)
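
The LSTM gates mentioned here control what the cell state forgets, writes, and exposes at each step. A scalar toy version (the weight layout is my own simplification; real LSTMs use weight matrices over vectors):

```python
import math

def lstm_step(x, h_prev, c_prev, W):
    """One scalar LSTM step. W maps gate name -> [w_x, w_h, bias]."""
    sig = lambda z: 1.0 / (1.0 + math.exp(-z))
    f = sig(W["f"][0] * x + W["f"][1] * h_prev + W["f"][2])        # forget gate
    i = sig(W["i"][0] * x + W["i"][1] * h_prev + W["i"][2])        # input gate
    o = sig(W["o"][0] * x + W["o"][1] * h_prev + W["o"][2])        # output gate
    g = math.tanh(W["g"][0] * x + W["g"][1] * h_prev + W["g"][2])  # candidate write
    c = f * c_prev + i * g       # cell state: gated mix of old memory and new input
    h = o * math.tanh(c)         # hidden state exposed to the next layer / step
    return h, c
```

The additive cell-state update is what lets gradients flow across long sequences better than in a plain RNN.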
Statistical Machine Translation
Alignment, Phrase-based MT (Mapped to CLOs: 1, 2)
Transformers and Attention Mechanism
Neural Machine Translation, Multi-head attention (Mapped to CLOs: 2, 3)
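
The core of the Transformer is scaled dot-product attention, softmax(QK^T / sqrt(d)) V; multi-head attention runs several of these in parallel on projected inputs. A single-head, plain-Python sketch (list-of-lists matrices for clarity, not an efficient implementation):

```python
import math

def scaled_dot_attention(queries, keys, values):
    """Scaled dot-product attention over lists of vectors."""
    d = len(keys[0])
    out = []
    for q in queries:
        # Similarity of this query to every key, scaled by sqrt(d)
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
        m = max(scores)                          # shift for numerical stability
        exps = [math.exp(s - m) for s in scores]
        z = sum(exps)
        weights = [e / z for e in exps]          # softmax attention weights
        # Weighted average of the value vectors
        out.append([sum(w * v[j] for w, v in zip(weights, values))
                    for j in range(len(values[0]))])
    return out
```

A query that matches one key strongly yields an output close to that key's value; a query matching nothing yields roughly an average of the values.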
Contextualized Word Representation
BERT, GPT-3, Pretraining and fine-tuning (Mapped to CLOs: 1, 2, 3)

Assessment:
Continuous Assessment: 50%
Written Examination: 50%

Teaching Plan

Please refer to the corresponding Moodle course.
