Artificial intelligence and data mining for medical applications

Instructor: Prof. Matjaž Kukar

Prerequisites: Basic knowledge of probability and statistics, programming, machine learning, and algorithms is required. Knowledge of data analysis and visualization in Python (pandas, sklearn, matplotlib) is recommended, but not strictly required.

Class schedule

  • March 20, 2019 - h: 9-13 - Meeting Room, Building M.
  • March 27, 2019 - h: 9-13 - Meeting Room, Building M.
  • March 29, 2019 - h: 9-13 - Meeting Room, Building M.
  • April 3, 2019 - h: 9-13 - Meeting Room, Building M.

Course duration

Lectures: 8 hours
Tutorials (practical work in classroom): 8 hours

Course syllabus

The syllabus is based on a selection of modern data mining techniques, experiences and good practices for creating and deploying AI-based solutions for medical problems. In lectures we will review the growing need for healthcare improvements, especially in populous and developing countries. We will focus on specific problems of prediction (diagnostics) in medical applications, and examine how they differ from theoretical approaches and from other applications. Through practical examples and experiences we will evaluate specific situations that differ from the more usual data mining scenarios.
In practical sessions the gained knowledge will be applied to particular data analysis tasks using open-source tools. Students will investigate and solve assignments based on real-world anonymized medical problems.

  1. Introduction and motivation for using AI in medicine. Typical applications, scenarios and goals. Roles of information systems.
  2. Experimental design and possible issues. STARD and REMARK guidelines. CRISP-DM.
  3. Data acquisition. Encoding diagnoses (ICD) and health measurements (LOINC).
  4. Different types of data: parameterized health measurements (tests), text in natural language, images. The role of natural language processing.
  5. Specific measures for ensuring data quality and detecting possible problems.
  6. Preferred measures of data mining success in medical applications.
  7. Requirements for practical deployment: assessment of uncertainty and explanation of particular predictions.
  8. Deployment of AI-based tools for medical purposes. Ethical issues. Classes and certification of medical devices.
  9. Legislation, GDPR. Privacy, anonymization/pseudonymization and cryptographic approaches.
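
As an illustration of topic 6, the following is a minimal sketch that assumes sensitivity and specificity are among the measures discussed; the syllabus itself does not name specific measures. The function name and the labels are hypothetical and serve only as an example.

```python
def sensitivity_specificity(y_true, y_pred):
    """Return (sensitivity, specificity) for binary labels, where 1 = condition present."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # missed cases
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false alarms
    # Guard against division by zero when a class is absent from y_true.
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    return sensitivity, specificity


# Made-up labels for eight patients (not real data).
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 0, 0, 1, 0]

sens, spec = sensitivity_specificity(y_true, y_pred)
print(f"sensitivity={sens:.2f}, specificity={spec:.2f}")
```

In a diagnostic setting the two values are usually reported together, since plain accuracy can look good on imbalanced data even when many true cases are missed.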

Objectives and competences

The goal of the course is for students to become acquainted with data mining tools and approaches for solving problems that are difficult or impractical to tackle with other methods. Students will be able to apply and critically appraise the gained knowledge on real-world medical problems and scenarios. They will be able to decide which of the presented techniques should be used for a given problem, to critically evaluate data quality, and to develop a prototype solution.

Intended learning outcomes

  • Knowledge and understanding: knowledge of specific requirements, limitations and methods for medical data acquisition, processing and utilization of results (data mining predictive models).
  • Application: the application of the presented methods within open-source data analysis tools.
  • Reflection: understanding the suitability of different analytic techniques for specific medical problems, their strengths and weaknesses, understanding technical limitations and ethical dilemmas.
  • Transferable skills: understanding and solving complex problems. Critical reflection on different analytical techniques. Evaluation of data quality and results. Use of analytical tools and information technology.


At the end of the course, students will have the opportunity to take an examination to formally pass the course. The examination will consist of a written part with theoretical and practical questions.


  1. Lang, TA, Altman, DG. Basic statistical reporting for articles published in biomedical journals: the "Statistical Analyses and Methods in the Published Literature" or the SAMPL Guidelines. Int J Nurs Stud. 2015, 52(1):5-9.
  2. Linusson, Henrik, Ulf Johansson, Henrik Boström and Tuve Löfström. “Reliable Confidence Predictions Using Conformal Prediction.” PAKDD (2016): 77-88.
  3. Shafer, Glenn and Vladimir Vovk. “A Tutorial on Conformal Prediction.” Journal of Machine Learning Research 9 (2008): 371-421.
  4. Ribeiro, Marco Tulio, Sameer Singh, and Carlos Guestrin. "Why should I trust you? Explaining the predictions of any classifier." Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, 2016.
  5. Shapley sampling values: Strumbelj, Erik, and Igor Kononenko. "Explaining prediction models and individual predictions with feature contributions." Knowledge and information systems 41.3 (2014): 647-665.
  6. EU General Data Protection Regulation (GDPR). Accessed: 12. 2. 2019
  7. Pandas. Accessed: 12. 2. 2019
  8. Matplotlib. Accessed: 12. 2. 2019
  9. Scikit-learn. Accessed: 12. 2. 2019