AID101 Fundamentals of Data Science
Course Code | Course Title | Weekly Hours* (T) | Weekly Hours* (P) | ECTS | Weekly Class Schedule |
AID101 | Fundamentals of Data Science | 3 | 2 | 6 | |
Prerequisite | None | It is a prerequisite to | None |
Lecturer | Fahir Kanlic | Office Hours / Room / Phone | |
E-mail | fkanlic@ius.edu.ba |
Assistant | | Assistant E-mail | |
Course Objectives | The course equips students with theoretical and practical knowledge of data science, a rapidly growing field, including the technical skills needed to work with data in a popular programming language (Python). It introduces the current concepts, principles, and tools of data science: data types, data structures, data manipulation techniques, data modelling, data visualization, and machine learning algorithms and techniques. The course emphasizes a hands-on approach, with interactive exercises that apply these techniques and concepts to real-life datasets from a variety of disciplines. Students will gain the skills to apply specific analytics tools and to interpret the resulting solutions.
By the end of this course, students will be able to:
1. Understand the main steps and challenges of a data science project
2. Apply appropriate methods and tools to collect, clean, explore, and visualize data
3. Perform basic statistical analysis and hypothesis testing on data
4. Implement and evaluate common machine learning algorithms for classification and regression
5. Communicate and present data science results effectively and responsibly |
Textbook | 1. Grus, J. 2019, Data Science from Scratch: First Principles with Python, 2nd edition, O'Reilly Media
2. McKinney, W. 2017, Python for Data Analysis: Data Wrangling with Pandas, NumPy, and IPython, 2nd edition, O'Reilly Media |
Additional Literature |
Learning Outcomes | After successful completion of the course, the student will be able to: | |||||||||
Teaching Methods | Combination of lectures (theory and background of each topic) and practical exercises (programming practice in which the learned algorithms are applied to real-world datasets) |
Teaching Method Delivery | Face-to-face | Teaching Method Delivery Notes | ||||||||
WEEK | TOPIC | REFERENCE |
Week 1 | Introduction to Data Science | - Define data science and its applications - Explain the data science workflow - Identify the types and sources of data - Install and use Python and Jupyter Notebook |
Week 2 | Data Collection and Wrangling | - Use Python libraries to read, write, and manipulate data - Perform data cleaning and preprocessing - Handle missing values and outliers - Apply web scraping and APIs to collect data from the web |
Week 3 | Data Exploration and Visualization | - Use descriptive statistics to summarize data - Use Python libraries to create various types of plots - Explore distributions, correlations, and relationships in the data - Apply dimensionality reduction techniques to reduce data complexity |
Week 4 | Statistical Inference and Hypothesis Testing | - Understand the concepts of population, sample, parameter, and statistic - Apply sampling methods and calculate sampling errors - Perform hypothesis tests and construct confidence intervals - Interpret p-values and significance levels |
Week 5 | Linear Regression | - Understand the concept of linear regression and its assumptions - Implement simple and multiple linear regression using Python - Evaluate the performance and accuracy of linear regression models - Identify and handle multicollinearity, heteroscedasticity, and non-linearity |
Week 6 | Logistic Regression | - Understand the concept of logistic regression and its applications - Implement logistic regression using Python - Evaluate the performance and accuracy of logistic regression models - Use the confusion matrix, ROC curve, and AUC to measure classification performance |
Week 7 | Classification Algorithms | - Understand the concept of classification and its applications - Implement k-nearest neighbors (KNN), decision trees, and random forests using Python - Compare the advantages and disadvantages of different classification algorithms - Tune hyperparameters and optimize classification models |
Week 8 | Clustering Algorithms | - Understand the concept of clustering and its applications - Implement k-means, hierarchical clustering, and DBSCAN using Python - Compare the advantages and disadvantages of different clustering algorithms - Evaluate clustering results using internal and external metrics |
Week 9 | MID-TERM | |
Week 10 | Association Rule Mining | - Understand the concept of association rule mining and its applications - Implement the Apriori algorithm using Python - Interpret association rules using support, confidence, lift, and leverage - Apply association rule mining to market basket analysis |
Week 11 | Text Mining | - Understand the concept of text mining and its applications - Perform text preprocessing using Python (tokenization, stemming, lemmatization) - Apply bag-of-words (BOW) and term frequency-inverse document frequency (TF-IDF) to represent text data - Implement sentiment analysis using Python |
Week 12 | Natural Language Processing (NLP) | - Understand the concept of natural language processing (NLP) and its applications - Apply regular expressions to extract information from text data - Implement named entity recognition (NER) using Python - Use word embeddings to capture the semantic meaning of words |
Week 13 | Neural Networks | - Understand the concept of neural networks and their applications - Explain the structure and components of a neural network (input layer, hidden layer, output layer) - Implement a simple neural network using Python (feedforward propagation, backpropagation) - Use activation functions, loss functions, and optimization algorithms in neural networks |
Week 14 | Deep Learning | - Understand the concept of deep learning and its applications - Explain the difference between shallow and deep neural networks - Implement convolutional neural networks (CNNs) using Python for image recognition - Implement recurrent neural networks (RNNs) using Python for sequence modeling - Use TensorFlow and Keras to build and train deep learning models |
Week 15 | Data Science Ethics and Social Issues | - Understand the ethical and social issues of data science, such as privacy, fairness, accountability, and transparency - Identify and address potential biases and harms in data collection, analysis, and use - Apply ethical principles and frameworks to data science projects - Communicate and present data science results in a responsible and trustworthy manner |
Assessment Methods and Criteria |
Evaluation Tool | Quantity | Weight | Alignment with LOs |
Final Exam | 1 | 30 | |
Semester Evaluation Components |
Midterm | 1 | 30 | |
Quizzes | 2 | 10 | |
Term project and presentation | 1 | 10 | |
Lab assignments | 10 | 20 | |
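As a rough illustration of how the weights above could combine into a course score, the sketch below computes a weighted average. It assumes that the weights are percentages (they sum to 100) and that every component is scored on a 0-100 scale; the syllabus states neither explicitly. The names `WEIGHTS` and `weighted_score` are hypothetical, and the sample scores are made up for the example.

```python
# Weights copied from the assessment table (assumed to be percentages).
WEIGHTS = {
    "Final Exam": 30,
    "Midterm": 30,
    "Quizzes": 10,
    "Term project and presentation": 10,
    "Lab assignments": 20,
}


def weighted_score(scores, weights=WEIGHTS):
    """Return the weighted course score on a 0-100 scale.

    `scores` maps each evaluation tool to a score between 0 and 100
    (an assumed scale, not stated in the syllabus).
    """
    if sum(weights.values()) != 100:
        raise ValueError("weights must sum to 100")
    return sum(scores[name] * weight for name, weight in weights.items()) / 100


# Made-up scores for a hypothetical student.
sample_scores = {
    "Final Exam": 80,
    "Midterm": 70,
    "Quizzes": 90,
    "Term project and presentation": 85,
    "Lab assignments": 95,
}

print(weighted_score(sample_scores))
```

Under these assumptions, the final exam and the midterm together account for 60 of the 100 points, so they dominate the result.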
*** ECTS Credit Calculation *** |
Activity | Hours | Weeks | Student Workload Hours |
Lecture hours | 3 | 14 | 42 |
Assignments | 2 | 10 | 20 |
Active labs | 2 | 14 | 28 |
Home study | 1 | 14 | 14 |
In-term exam study | 11 | 1 | 11 |
Final exam study | 11 | 1 | 11 |
Term project/presentation | 2 | 12 | 24 |
Total Workload Hours = | 150 |
ECTS Credit = | 6 |
*T= Teaching, P= Practice
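The workload figures can be checked with a short calculation. The sketch below recomputes each row as hours × weeks and sums the results. The ratio of 25 workload hours per ECTS credit is an assumption inferred from the stated totals (150 hours, 6 ECTS); the syllabus does not give it. The names `ACTIVITIES`, `HOURS_PER_ECTS`, and `total_workload` are hypothetical.

```python
# Rows from the ECTS workload table: (activity, hours, weeks).
ACTIVITIES = [
    ("Lecture hours", 3, 14),
    ("Assignments", 2, 10),
    ("Active labs", 2, 14),
    ("Home study", 1, 14),
    ("In-term exam study", 11, 1),
    ("Final exam study", 11, 1),
    ("Term project/presentation", 2, 12),
]

# Assumption: inferred from 150 total hours / 6 ECTS, not stated in the syllabus.
HOURS_PER_ECTS = 25


def total_workload(activities):
    """Sum hours * weeks over all activities."""
    return sum(hours * weeks for _, hours, weeks in activities)


total = total_workload(ACTIVITIES)
ects = total / HOURS_PER_ECTS

for name, hours, weeks in ACTIVITIES:
    print(f"{name}: {hours} x {weeks} = {hours * weeks}")
print(f"Total workload hours: {total}")
print(f"ECTS credits: {ects:g}")
```

The per-row products reproduce the Student Workload Hours column, and their sum matches the stated total.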
Course Academic Quality Assurance: Semester Student Survey | Last Update Date: 08/04/2024 |