## Study Program

**3-YEAR FULL-TIME PHD PROGRAM IN CONTEMPORARY INFORMATICS: COMPUTER MODELLING AND DATA MINING**

**1st year, semester 1:**

**Selected problems of computer science I (seminar 45 h)**

The aim of the seminar is to introduce PhD students to the problems of computer science across most of its key areas. For this purpose seven seminars are scheduled, each led by an eminent specialist in the field presented. In addition to a review of basic knowledge, students will be presented with current research carried out in the respective field. The topics of the seminars are as follows:

1) Automata and formal languages
2) Introduction to complexity theory
3) Introduction to combinatorics
4) Discrete mathematics
5) Introduction to concurrent systems
6) Architecture of computer systems
7) Introduction to game theory

**Advanced statistical methods (30 h Lect + 30 h Lab)**

Advanced statistical methods will be discussed, with special attention to those frequently used in data mining. In particular, inferential methods in regression, analysis of variance and generalized linear models, as well as methods of modelling nominal data, will be covered. One of the main recurring subjects will be checking model adequacy and assessing the accuracy of model-based predictions. A discussion of existing software implementations of the relevant methods will accompany the theoretical material.

**Data Mining (30 h Lect + 30 h Lab)**

This course covers modern statistical approaches to data mining. Within supervised learning, the following topics will be discussed: linear methods for classification, Bayesian and maximum likelihood classification, within-class nonparametric density estimation and nearest neighbour classifiers, mixture discriminant analysis and prototype methods, and modern approaches to regression analysis. Within unsupervised learning, the following topics will be addressed: finding structure in data and data dimension reduction (classical and non-classical approaches to principal component analysis, multidimensional scaling, factor analysis and independent component analysis), and cluster analysis (k-means and similar approaches including SOM, hierarchical clustering, clustering on subsets of attributes). Due emphasis will be placed on approaches and problems which cross the boundaries of the two learning paradigms mentioned, i.e., semi-supervised learning, ensemble learning and modern kernel methods (including support vector machines).

**1st year, semester 2:**

**Selected problems of computer science II (seminar 45 h)**

The aim of the seminar is to introduce PhD students to the problems of computer science across most of its key areas. For this purpose seven seminars are scheduled, each led by an eminent specialist in the field presented. In addition to a review of basic knowledge, students will be presented with current research carried out in the respective field. The topics of the seminars are as follows:

1) Basic notions and algorithms of graph theory
2) Principles of programming languages
3) New SOA-based technologies in distributed systems
4) New developments in databases
5) Introduction to verification of concurrent systems
6) Mathematical logic in computer science applications
7) Information and communication technologies assisting people with a range of disabilities

**Selected problems of Data Mining (30 h Lect + 30 h Lab)**

The course will present representative topics in contemporary data mining, such as inference graphs, data streams and tree structures, transfer learning, deep learning and meta-learning. Subjects such as Gaussian process regression, multi-instance learning and uplift modelling will also be discussed. Moreover, typical examples of inference with high- and ultra-high-dimensional feature spaces, such as microarray and GWAS analysis, will be presented. The important question of how to merge conclusions from different studies and combine different methods will be discussed. In particular, collective wisdom approaches such as committees of classifiers, and methods such as bagging, boosting and random forests, will be learned through hands-on practice.

**Advanced Machine Learning – selected problems (30 h Lect + 30 h Lab)**

The first part of the course will cover basic methods and techniques of machine learning. Topics will include classification techniques such as decision trees, Bayesian classification, regression, similarity-based methods, neural networks, ensemble methods and others. Methods of assessing classifier performance will also be discussed. Unsupervised learning (clustering) will be covered, including methods such as k-means and Gaussian mixture models. The second part of the course will cover data mining methods, mainly rule-based approaches, with a primary focus on association rules but also including covering-based methods, fuzzy logic and rough-set-based approaches. Throughout the course, students will read selected research papers related to the material discussed in class.

**2nd year, semester 3:**

**Advanced computer network management – performance analysis and information processing methods (30 h Lect + 30 h Lab)**

The course provides in-depth knowledge of mathematical models and methods for computer network management and communication analysis. The emphasis will be placed on congestion control and avoidance, computer network performance analysis, and the identification of performance bottlenecks. The topics covered will also include management information processing, event correlation methodology and root cause analysis. Particular attention will be given to the theoretical background of the methods discussed. The topics will be illustrated by suitably selected and adapted labs.

**Computer Modeling (30 h Lect + 30 h Lab)**

The lecture is devoted to the principles and problems of modelling. Different types of models and their analysis are considered. Most of the time is allocated to dynamic models described by ordinary or partial differential equations. Among the former, the class of compartmental models, which consist of a set of first-order ordinary differential equations representing the flow and accumulation of materials or energy in some entities, is discussed in detail. Different structures, parameter and structural identifiability, indistinguishability, and other problems are reviewed. Models with constant, time-dependent, and the most common nonlinear parameters are considered. The construction, classification and most practically important types of models described by partial differential equations are discussed. Some practical examples of such models are given.

**Modern Cryptography (30 h Lect + 30 h Lab)**

The two-semester course is intended as an introduction to problems of cryptography and its applications. In the first semester we shall introduce the mathematical background as well as the number-theoretic reference problems behind modern cryptography. We shall mainly discuss public-key cryptosystems and their security parameters. For example, we shall deal with the RSA, DSA and ECDSA schemes. We shall consider various digital signature schemes as well as Diffie-Hellman key agreement protocols.

**2nd year, semester 4:**

**Mining massive data sets (30 h Lect + 30 h Lab)**

The course will address the basic issues related to Big Data analysis and the adaptation of data mining and statistical techniques to this framework. The main topics include an introduction to Map-Reduce and the new software stack, link analysis, methods of finding similar items, clustering techniques for massive data sets, and mining data streams. Important issues such as dimension reduction and feature selection will be thoroughly discussed, as well as the question of what is lost and gained by data reduction.

**Natural language processing (30 h Lect + 30 h Lab)**

This course aims to introduce issues concerning the collection and processing of information stored in the form of natural language texts (in Polish, but also in English). It presents the fundamental techniques of natural language processing (NLP) and introduces some current NLP research issues and typical applications. In particular, problems concerning the description of natural language utterances and methods of morphological, syntactic and semantic analysis are presented. Various techniques for shallow and deep processing of linguistic data are covered: regular expressions, context-free grammars and unification for describing linguistic phenomena, as well as more robust approaches based on statistical and machine learning methods. They will be introduced in the context of specific applications, i.e. information extraction, automatic summarization, question answering, dialog systems and machine translation.

**Advanced methods in Cryptography (30 h Lect + 30 h Lab)**

The two-semester course is intended as an introduction to problems of cryptography and its applications. In the second semester we intend to consider pseudorandom bits and sequences, stream and block ciphers (for example DES, triple DES and AES) as well as hash functions (for example hash functions from the SHA family). We shall also discuss linear codes and secret sharing schemes.

**3rd year, semester 5:**

**Modern metaheuristics and their applications (30 h Lect + 30 h Lab)**

The purpose of the lecture is to present the current state of nature-inspired algorithms and techniques used to solve combinatorial optimization problems, machine learning problems and problems related to simulating the behaviour of complex systems. We start by considering relatively classical techniques such as evolutionary algorithms, simulated annealing, tabu search, and cellular and learning automata. Next, we will move to techniques such as artificial immune systems, particle swarm computation and differential evolution. Finally, we will present recent algorithms and techniques such as generalized extremal optimization, gene expression programming and hyperheuristics. We will discuss different applications of metaheuristics, in particular routing in transportation systems, scheduling of tasks in grid and cloud computing systems, management problems in wireless and mobile systems, the design of cryptographic systems, the detection of anomalies or attacks in computer networks, and the evolution of strategies in games.

**Web scale data mining and processing (30 h Lect + 30 h Lab)**

The lecture will present selected web mining techniques, focusing on the processing of documents that are available in very large numbers. On the one hand, a large number of documents requires algorithms that are as simple and efficient as possible (in order to reduce the processing time). On the other hand, it appears that a large number of documents reduces ambiguity to such an extent that simple techniques are sufficient to obtain satisfactory results. The lecture is based on our own software solutions for web mining, in particular for grouping documents and extracting interesting information.

**Social Informatics (30 h Lect + 30 h Lab)**

The course is devoted to the study of the design, deployment and uses of information and communication technologies (ICT) that account for their interaction with institutional and cultural contexts, including organizations and society. Many examples of the advantageous use of ICT tools will be provided, such as a detailed analysis of the DARPA Red Balloon Challenge. Methods such as collective wisdom, crowdsourcing and crowdfunding will be discussed. The final part of the lecture will be devoted to recommendation systems and methods of mining social networks.

**3rd year, semester 6:**

**Information theory (30 h Lect + 30 h Lab)**

The lecture is an introduction to the questions of information theory and statistics that are linked by the problem of universal compression. From this point of view, we review various concepts and facts of probability theory and theoretical computer science that concern lossless compression, ergodic processes, Kolmogorov complexity, maximum entropy modelling, exponential families, Bayesian reasoning, and the EM algorithm.

**Analysis of temporal data (30 h Lect + 30 h Lab)**

The aim of the course is to present and thoroughly discuss the main advanced methods of representation, modelling and prediction for time series (temporal data), in particular techniques for their indexing (query by content), classification, clustering, anomaly detection and segmentation. Methods of analysis and modelling of financial time series, including model-based and resampling-based prediction methods together with the evaluation of their accuracy, will be discussed. The final part of the course will be devoted to subjects of possible PhD research, in particular model selection methods and change-point detection for time series data.

**Advanced data analysis and software development with R and SAS**

The main features of the R and SAS packages for data analysis and graphics will be discussed. Moreover, students will learn how to handle data effectively in both environments, write applications in SAS/SQL and R, and avail themselves of the multitude of functions and packages provided by both systems. Important issues such as importing programs written in other languages, debugging and time efficiency will be addressed, as well as multivariate data presentation, handling and inference.

**From the 2nd to the 6th semester (inclusive):**

**PhD seminar (10 hours)**

The PhD seminar is a series of individual meetings with the supervisor, held according to the approved schedule of the doctoral thesis. The seminar is intended for consultations with the supervisor, during which progress on the PhD thesis and problems encountered in the work are discussed. The seminar may also include a presentation by the student of partial results achieved within the framework of the PhD thesis.