Data Science
Note: The four mandatory units taught in the 1st semester (Big Data Management, Data Science Methodologies and Technologies, Prediction Models, and Pattern Recognition) are open to international students, so these units may be taught in English.
Programme Structure for 2024/2025
Curricular Courses | Scholar Group | Credits
Data Driven Strategy Optimization | Common Branch | 6.0 ECTS
Time Series Analysis and Forecasting | Common Branch | 6.0 ECTS
Deep Learning for Computer Vision | Common Branch | 6.0 ECTS
Bayesian Modelling | Common Branch | 6.0 ECTS
Big Data Processing and Modeling | Common Branch | 6.0 ECTS
Text Mining for Data Science | Common Branch | 6.0 ECTS
Advanced Network Analysis | Paths > Holders of a 1st Cycle in Data Science or related | 6.0 ECTS
Advanced Distributed Databases | Paths > Holders of a 1st Cycle in Data Science or related | 6.0 ECTS
Business Analytics Fundamentals | Paths > Holders of a 1st Cycle in Data Science or related | 6.0 ECTS
Big Data Management | Paths > Holders of 1st Cycle in Other Areas | 6.0 ECTS
Data Science Methodologies and Technologies | Paths > Holders of 1st Cycle in Other Areas | 6.0 ECTS
Prediction Models | Paths > Holders of 1st Cycle in Other Areas | 6.0 ECTS
Pattern Recognition | Paths > Holders of 1st Cycle in Other Areas | 6.0 ECTS
Ciberlaw | Common Branch | 6.0 ECTS
Project Design for Data Science | Common Branch | 6.0 ECTS
Dissertation in Data Science | Final Work | 42.0 ECTS
Master Project in Data Science | Final Work | 42.0 ECTS
Data Driven Strategy Optimization
LG1. Understand data-driven decision-making
LG2. Learn to use dynamic optimization and reinforcement learning algorithms appropriately
LG3. Apply and evaluate reinforcement learning algorithms in real situations
LG4. Gain new knowledge/practice in Python
1. The cost of doing nothing in an organization (data-related doing)
2. Data-driven strategies
3. Markov process, dynamic optimization, Bellman equation
4. Basic principles of AI
5. Environment, agents, strategy, actions, losses and gains, experience-based learning
6. Reinforcement learning algorithms: Q-learning, Multi-Armed Bandits, Value and Policy Iteration
7. Optimizing decision-making
8. Case studies
Weekly quizzes (4 x 5%): 20% of the final grade
Group work/project with individual oral presentation: 80% (70% + 10%) of the final grade
A minimum of 10 points is required to pass.
Bibliography:
Osborne, P., Singh, K., Taylor, M., Applying Reinforcement Learning on Real-World Data with Practical Examples in Python, 2022.
Richard S. Sutton and Andrew G. Barto, Reinforcement Learning: An Introduction, The MIT Press, 2018.

Enes Bilgin, Mastering Reinforcement Learning with Python, 2020.
Chan, L., Hogaboam, L., Cao, R., Applied Artificial Intelligence in Business, 2022.
Time Series Analysis and Forecasting
At the end of this learning unit's term, the student must be able to:
LG1. Recognize and apply the classical time series models;
LG2. Recognize and apply ARIMA and GARCH models;
LG3. Recognize and apply multivariate time series models;
LG4. Recognize and apply Machine Learning algorithms (neural networks) for time series forecasting/trading;
LG5. Perform basic programming and computation with R and Python;
LG6. Apply the studied concepts: extract information and value from real-world data.
P1. Time series (2 lectures)
P1.1. Basic concepts
P1.2. Trends and seasonality
P2. Introduction to univariate stochastic time series models (4 lectures)
P2.1. Stationarity, unit root tests
P2.2. ARMA/ARIMA models
P2.3. Residual assumptions, diagnostic tests
P2.4. Volatility, risk, ARCH/GARCH models
P2.5. Forecasting, measuring the forecast accuracy
P3. Introduction to multivariate time series models (2 lectures)
P3.1. VAR/VECM models
P3.2. Cointegration analysis and applications
P3.3. Forecasting
P4. Machine (Deep) Learning (6 lectures)
P4.1. Neural networks for time series
P4.2. LSTM, forecasting and trading
P5. Programming/computing with Python
P6. Application of the studied concepts: information and value extraction from real-world data (2 lectures)
The following learning methodologies (LM) will be used:
LM1. Expository, presenting the theoretical frameworks;
LM2. Participative, with analysis of scientific papers;
LM3. Active, through group work;
LM4. Experimental, in computer laboratories, performing analyses on real data;
LM5. Self-study, through autonomous work (AW) by the student, as set out in the Class Planning.

The periodic evaluation comprises:
a) An individual test (60%).
b) A team work (40%).
The periodic evaluation requires students to attend at least 80% of classes. The test covers all topics.
Under this mode of evaluation, students must achieve a minimum grade of 8.5 in the individual test and of 10 in the team work. Otherwise, they must take a final exam (minimum passing score: 10).
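As an illustration of the periodic evaluation rule above, the following Python sketch combines the two component grades. The function name, the 0-20 scale, and the unrounded result are assumptions made for this example; the source does not say whether the weighted result itself must also reach 10, so the sketch does not check that.

```python
from typing import Optional

# Weights and minimum grades taken from the evaluation rules stated above.
TEST_WEIGHT = 0.60
TEAM_WEIGHT = 0.40
TEST_MINIMUM = 8.5
TEAM_MINIMUM = 10.0


def periodic_grade(test: float, team_work: float) -> Optional[float]:
    """Return the weighted periodic grade on a 0-20 scale (assumed).

    Returns None when either component is below its minimum, which
    (per the rules above) sends the student to the final exam.
    """
    if test < TEST_MINIMUM or team_work < TEAM_MINIMUM:
        return None
    return TEST_WEIGHT * test + TEAM_WEIGHT * team_work
```

For instance, a test grade of 12 and a team work grade of 15 would combine to roughly 13.2, while a test grade of 8 would return None regardless of the team work grade.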
Bibliography:
Course files (slides and scripts) to be made available on e-learning/Fenix.
Yves Hilpisch (2018), Python for Finance, 2nd Edition, O'Reilly Media, Inc.
Tarek A. Atwan (2022), Time Series Analysis with Python Cookbook, Packt Publishing.
Mills, T.C. (2019), Applied Time Series Analysis: A Practical Guide to Modeling and Forecasting, Academic Press, Elsevier Inc.
Brooks, C. (2019), Introductory Econometrics for Finance, 4th ed., Cambridge University Press.

Edward Raff (2022), Inside Deep Learning: Math, Algorithms, Models, Manning Publications Co.
Louis Owen (2022), Hyperparameter Tuning with Python, Packt Publishing.
James Ma Weiming (2019), Mastering Python for Finance: Implement advanced state-of-the-art financial statistical applications using Python, 2nd Edition, Packt Publishing.
Juselius, K. (2006), The Cointegrated VAR Model: Methodology and Applications, Oxford University Press.
Deep Learning for Computer Vision
O1: To know the basic digital image formation process
O2: To represent an image in different color spaces and in the frequency domain
O3: To perform typical image processing operations
O4: To extract low-level characteristics from an image
O5: To implement an automatic learning system based on classic algorithms for image content classification
O6: To know the typical architecture of a convolutional neural network (CNN) and to understand how it works
O7: To solve a medium-complexity image classification problem using CNNs
O8: To apply knowledge transfer and fine-tuning methodologies based on pre-trained CNNs
O9: To use deep learning algorithms to identify objects in images
O10: To know deep learning algorithms for automatic generation of multimedia content
O11: To manipulate images using the OpenCV library
O12: To use the TensorFlow library to develop machine learning applications
C1: Image acquisition and representation
C2: Image operations
C3: Extraction of image features
C4: Introduction to machine learning
C5: Artificial neural networks
C6: Convolutional neural networks
C7: Knowledge transfer
C8: Network architectures for detecting and identifying image objects
C9: Network architectures for automatic content generation
Given the eminently practical nature of the course, there is no exam assessment modality; there are only continuous or periodic assessment modalities.
Continuous Assessment (requires attendance of at least 2/3 of the classes)
- Participation in class (20%): individual, evaluated based on exercises performed during the classes;
- Challenges (20%): group work, carried out "at home";
- Project (60%): carried out in a group, includes a report and is subject to an oral discussion with individual evaluation.
Periodic Assessment (mainly intended for those who cannot attend classes)
- Practical test (40%): individual, held at the end of the academic term;
- Project (60%): individual or group, includes a report and is subject to an oral discussion with individual evaluation.
All components have a minimum grade of 8 (out of 20).
Regardless of the modality followed, the grade for the "Project" component is limited by the individual performance demonstrated in the oral discussion, according to the following rule:
- Very good performance: no limit;
- Good performance: max. of 16 (out of 20);
- Sufficient performance: max. of 12 (out of 20);
- Poor performance: failed the course.
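The cap on the "Project" grade can be read as a simple lookup. The sketch below is one possible interpretation in Python; the function name, the lowercase performance labels, and the use of None to signal failing the course are assumptions made for illustration only.

```python
from typing import Optional

# Maximum "Project" grade (0-20 scale) for each oral-discussion outcome.
# A cap of None means "no limit"; "poor" is handled separately as a failure.
PROJECT_CAPS = {"very good": None, "good": 16.0, "sufficient": 12.0}


def capped_project_grade(project: float, performance: str) -> Optional[float]:
    """Return the project grade after applying the oral-discussion cap.

    Returns None when the performance is "poor", i.e. the course is failed.
    """
    if performance == "poor":
        return None
    if performance not in PROJECT_CAPS:
        raise ValueError(f"unknown performance level: {performance!r}")
    cap = PROJECT_CAPS[performance]
    return project if cap is None else min(project, cap)
```

Under this reading, a group project graded 18 would count as 16 for a student whose discussion was rated "good", and would stay at 18 for a student rated "very good".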
Bibliography:
Tomás Brandão, Course materials made available on the e-learning platform, 2023.
J. Howse, J. Minichino, Learning OpenCV 4 with Python 3, 3rd Edition, Packt Publishing, 2020.
M. Elgendy, Deep Learning for Vision Systems, Manning, 2020.

M. Nixon, A. Aguado, Feature Extraction and Image Processing for Computer Vision, 4th Edition, Academic Press, 2019.
I. Goodfellow, Y. Bengio, A. Courville, Deep Learning, MIT Press, 2016.
Various authors, OpenCV library tutorials and documentation, https://opencv.org/
Various authors, TensorFlow library tutorials and documentation, https://www.tensorflow.org/
R. Szeliski, Computer Vision: Algorithms and Applications, 2nd Edition, Springer, 2021, https://szeliski.org/Book/
F. Chollet, Deep Learning with Python, 2nd Edition, Manning, 2021.
Bayesian Modelling
OA1. Understand the basic concepts of Bayesian modelling
OA2. Apply Bayesian regression, classification and optimization models to support decision making
OA3. Apply the Bayesian approach to statistical learning
1. Bayes' theorem and the Bayesian paradigm
2. Graphical and hierarchical models
3. Bayesian inference
4. Bayesian optimization
5. Bayesian regression and classification
6. Bayesian latent factor models
The following learning methodologies (LM) will be used:
LM1. Expository, presenting the theoretical frameworks;
LM2. Participative, with analysis and solution of exercises;
LM3. Active, through group work;
LM4. Experimental laboratory, with development and operation of computer "models";
LM5. Self-study, through autonomous work (AW) by the student, as set out in the Class Planning.

Students may choose either Periodic Evaluation or Final Exam.
PERIODIC EVALUATION:
- Group work (50%), with a minimum grade of 8
- Individual test (50%), with a minimum grade of 8
Approval requires a minimum grade of 10.
EXAM EVALUATION:
The Exam Evaluation, in any of the officially scheduled exam periods, is a written exam (weight 100%), with a minimum grade of 10 to pass.
Bibliography:
R / Python code
Various scientific papers
Lecture slides
Reich, B. J., S. K. Ghosh (2019), Bayesian Statistical Methods, Boca Raton: Chapman and Hall/CRC.
McElreath, R. (2020), Statistical Rethinking: A Bayesian Course with Examples in R and Stan, CRC Press.
Levy, R., Mislevy, R. J. (2016), Bayesian Psychometric Modeling, 1st Edition. Boca Raton: Chapman and Hall/CRC.
Kruschke, J. K. (2015), Doing Bayesian Data Analysis: A Tutorial with R, JAGS, and Stan. Academic Press / Elsevier.

Durr, O., B. Sick (2020), Probabilistic Deep Learning, Manning Publications Co.
Theodoridis, S. (2020), Machine Learning: A Bayesian and Optimization Perspective, Elsevier Ltd.
Martin, O., R. Kumar, J. Lao (2022), Bayesian Modeling and Computation in Python, CRC Press.
Heard, N. (2021), An Introduction to Bayesian Inference, Methods and Computation, Berlin: Springer Cham.
Albert, J., J. Hu (2020), Probability and Bayesian Modeling, Boca Raton: CRC Press/Taylor & Francis Group.
Big Data Processing and Modeling
At the end of this course, students should be able:
OA1: to know and understand the principal big data processing platforms
OA2: to understand and know how to apply distributed programming / computing models
OA3: to understand and know the stages (pipeline) of a machine learning big data project
OA4: to apply supervised or unsupervised learning techniques to large scale problems
OA5: to understand what Deep Learning is and its techniques
OA6: to understand and know how to apply techniques for processing data streams in real time
CP1: Big data platforms
CP2: Machine learning for big data
CP3: Large scale supervised/unsupervised learning
CP4: Introduction to deep learning
CP5: Data stream analysis
Assessment can be performed in one of the following modes:
[1] Periodic assessment, comprising:
- one written test worth 60% of the final score, with a minimum score of 8 out of 20 required to pass the UC;
- one project (in groups), worth 40% of the final score.
[2] Final exam consisting of a theory part and a practice part, to be taken at Iscte-IUL (see mandatory details in the Observations field).
Bibliography:
- Mining of Massive Datasets, A. Rajaraman, J. Ullman, Cambridge University Press, 2011.
- Big Data: Algorithms, Analytics, and Applications, Kuan-Ching Li et al., Chapman and Hall/CRC, 2015.
- Learning Spark: Lightning-Fast Big Data Analysis, Holden Karau, A. Konwinski, P. Wendell and M. Zaharia, O'Reilly Media, 2015.
- Understanding Deep Learning, Simon J.D. Prince, MIT Press, 2023.
- Advanced Analytics with Spark: Patterns for Learning from Data at Scale, Sandy Ryza et al., O'Reilly Media, 2017.
- Practical Data Science with Hadoop and Spark: Designing and Building Effective Analytics at Scale, Ofer Mendelevitch, Casey Stella and Douglas Eadline, Addison-Wesley, 2016.

- All of Statistics: A Concise Course in Statistical Inference, L. Wasserman, Springer, 2003.
- The Elements of Statistical Learning, Trevor Hastie, Robert Tibshirani, and Jerome Friedman, Springer, 2001.
- Deep Learning, Ian Goodfellow, Yoshua Bengio and Aaron Courville, MIT Press, 2016.
Text Mining for Data Science
OA1: Perform tokenization, create dictionaries and perform other processing tasks to prepare text for classification tasks
OA2: Create numerical vector representations from text data
OA3: Build or apply classifiers, such as Naïve Bayes or nearest-neighbor
OA4: Perform "tagging" of text data
OA5: Cluster documents using the k-means algorithm
OA6: Describe basic concepts and methods in text mining, for example document representation, information extraction, text classification and clustering, and topic modeling
OA7: Use available corpora, commercial and open-source text analysis and visualization tools to explore patterns
OA8: Understand conceptually the mechanism of advanced text mining algorithms for information extraction, text classification and clustering, and their applications in real-world problems
OA9: Choose appropriate technologies for specific text analysis tasks, and evaluate the benefit and challenges of the chosen technical solution
Introduction
CP1: Importance of large quantities of text, challenges and current methods
CP2: Unstructured vs. (semi)structured information
CP3: Obtaining and filtering information, information extraction and Data Mining
Document Representation
CP4: Document preprocessing
CP5: Feature extraction: terms as features
CP6: Term weighting schemes
CP7: Vector space models
CP8: Similarity measures
Natural Language Processing
CP9: Language models
CP10: Morphology and part-of-speech tagging
CP11: Complex structures: syntactic analysis
CP12: Information extraction
Text Classification
CP13: Introduction to statistical machine learning
CP14: Evaluation
CP15: Generative classifiers
CP16: Discriminative classifiers
CP17: Unsupervised learning
CP18: Text Mining Resources
Case Study
CP19: Sentiment analysis
CP20: Topic classification and identification
This course uses only periodic assessment and does not include exams.
Assessment components:
a) TESTS (2 mini-tests: 5% each; final test: 40%), taken during the course period;
b) PROJECT (50%).
The TESTS grade can be replaced by a written test to be taken in the assessment period corresponding to the 1st season, 2nd season or special season (Art. 14 of the RGACC).
The PROJECT grade is limited to the TESTS grade plus 6 points.
Students may improve their grade in the TESTS component by taking a written test during the assessment period corresponding to the 1st season. Students wishing to do so must inform the teachers as soon as the periodic assessment marks are published.
Attendance is not a requirement for approval.
Bibliography:
* Machine Learning for Text (2018). Charu C. Aggarwal. https://doi.org/10.1007/978-3-319-73531-3
* An Introduction to Text Mining: Research Design, Data Collection, and Analysis, 1st Edition (October 11, 2017). Gabe Ignatow, Rada F. Mihalcea. SAGE Publications. https://methods.sagepub.com/book/an-introduction-to-text-mining
* Speech and Language Processing (3rd ed. draft, 2023), Dan Jurafsky and James H. Martin. Available at: https://web.stanford.edu/~jurafsky/slp3/

* Natural Language Processing for Social Media, Second Edition. Synthesis Lectures on Human Language Technologies. Morgan & Claypool, 2017. Atefeh Farzindar and Diana Inkpen. https://link.springer.com/book/10.1007/978-3-031-02167-1
* Jacob Eisenstein. Introduction to Natural Language Processing. Adaptive Computation and Machine Learning. The MIT Press, 2019. https://mitpress.mit.edu/9780262042840/introduction-to-natural-language-processing/
Advanced Network Analysis
After successfully attending the curricular unit, students should be able to:
OA1. Know the fundamental concepts of network science
OA2. Know the essential metrics and methods for describing and analyzing networks
OA3. Know how to use network analysis and visualization software
OA4. Know how to collect data, analyze and model networks
OA5. Know how to analyze diffusion processes in networks
OA6. Implement a network analysis solution to solve a given problem.
CP1. Introduction to the notion of network and Network Science
CP2. Software for network analysis
CP3. Graphs and network metrics
CP4. Static network models
CP5. Power laws and scale-free networks
CP6. Dynamic network models
CP7. Strategic network models
CP8. Processes in networks: percolation, diffusion and search
CP9. Robustness and resilience
CP10. Communities
CP11. Higher order networks and temporal networks
Given the practical nature of the contents, the assessment is based on a project. Its subject should be aligned with all or part of the syllabus.
Exercises in class (10%).
Project (90%), including teamwork (report and software: 40%) and oral exam (50%).
All components of the project (proposal, report, software, and oral exam) are mandatory. The minimum grade for each component is 10 on a scale of 0 to 20.
There is a single deadline for submitting the project, except for students accepted to the special assessment period, who may submit during that period.
Class attendance is not mandatory.
There is no final exam.
Students aiming to improve their grade can submit a new project in the following academic year.
Bibliography:
Mark Newman, "Networks", second edition, Oxford University Press, 2020.
Albert-László Barabási, "Network Science", Cambridge University Press, 2016. Available online at http://networksciencebook.com
Advanced Distributed Databases
This course aims to enhance students' understanding of distributed database management systems (DBMS). It focuses on providing practical skills in designing, implementing and managing these databases, considering challenges such as replication and fragmentation. The curricular unit highlights the importance of guaranteeing the consistency and durability of data in distributed environments, as well as the efficient integration of multiple databases. Finally, it seeks to encourage students to have a critical and analytical view of future trends and innovations in this field.
1. Introduction to Distributed Database Management Systems (DBMS)
2. Distributed Database Design
3. Distributed Data Control
4. Distributed Transaction Processing
5. Data Replication
6. Database Integration
Given its eminently practical nature, the curricular unit (UC) is not assessed by exam.
Assessment therefore takes place as follows:
1st season:
1. [60%] Group work with individual presentation and discussion* (min. 10 points)
2. [40%] Written test (min. 8 points)
* The individual discussion is decisive: poor performance may result in failing the UC, regardless of the quality of the group work delivered.
2nd season and Special Season:
3. [60%] Individual work without presentation or discussion (min. 10 points)
4. [40%] Written test (min. 8 points)
Bibliography:
• M. Tamer Özsu and Patrick Valduriez. (2019). Principles of Distributed Database Systems (4th ed.). Springer Publishing Company, Incorporated.
• White, Tom. (2015). Hadoop: The Definitive Guide (4th ed.). O'Reilly Media, Inc. ISBN: 9781491901632

• Moniruzzaman, A B M & Hossain, Syed. (2013). NoSQL Database: New Era of Databases for Big data Analytics - Classification, Characteristics and Comparison. Int J Database Theor Appl. 6.
• Fay Chang, Jeffrey Dean, Sanjay Ghemawat, Wilson C. Hsieh, Deborah A. Wallach, Mike Burrows, Tushar Chandra, Andrew Fikes, and Robert E. Gruber. (2006). Bigtable: a distributed storage system for structured data. In Proceedings of the 7th USENIX Symposium on Operating Systems Design and Implementation - Volume 7 (OSDI '06). USENIX Association, USA, 15.
Business Analytics Fundamentals
LO1. Understand how to use big data and data analytics to outcompete traditional companies in their industries.
LO2. Learn effective data visualization and elaborate efficient reports.
LO3. Develop soft skills, including Teamwork and Collaboration, Communication, Agile and Critical Thinking.
P1. Data-driven decision making.
P2. Types of Analytics.
P3. Data visualization.
P4. Effective Business Presentation / communication; ability to explain complex analytical models and results.
P5. Power BI Analytics Platform.
1st Sitting:
Individual work with digital presentation and oral discussion (100%; minimum of 10 points in the final classification).
(LO 1, 2, 3)
2nd Sitting:
Individual work with digital presentation and oral discussion (100%; minimum of 10 points in the final classification).
(LO 1, 2, 3)
Scale: 0-20 points.
Bibliography:
Aspin, A., Pro Power BI Desktop: Self-Service Analytics and Data Visualization for the Power User, 2020, 3rd Edition, Apress.
Microsoft, Microsoft Learn Power BI, n.a., Microsoft, https://learn.microsoft.com/en-us/training/powerplatform/power-bi
Albright, S. & Winston, W., Business Analytics: Data Analysis & Decision Making, 2019, 7th Edition, South-Western College Pub.
Berthold, M.R., Borgelt, C., Höppner, F., Klawonn, F. & Silipo, R., Guide to Intelligent Data Science: How to Intelligently Make Use of Real Data, 2020, 2nd Edition, Springer International Publishing.
Knaflic, C. N., Storytelling com dados: um Guia Sobre Visualização de Dados Para Profissionais de Negócios, 2019, Alta Books.

McCandless, D., Knowledge is Beautiful, 2014, William Collins.
Bahga, A. & Madisetti, V., Big Data Science & Analytics: A Hands-On Approach, 2016, VPT.
Meier, M., Baldwin, D., & Strachnyi, K., Mastering Tableau 2021: Implement advanced business intelligence techniques and analytics with Tableau, 2021, 3rd Edition, Packt.
Big Data Management
1. Manipulate NoSQL Databases using JSON;
2. Implement distributed and fault-tolerant data storage solutions;
3. Migrate data between databases;
4. Design and extract information from a multidimensional Data Warehouse;
5. Develop soft skills, namely Problem Solving, Teamwork and Collaboration and Critical Observation (achieved via assessment process).
1. Review of Relational Databases and advanced (aggregate) SQL queries in MySQL;
2. Introduction to NoSQL databases and database implementation in MongoDB;
3. Mapping between Relational Databases and Document Databases;
4. Data extraction using JSON;
5. Redundancy and data distribution to manage fault tolerance and large volumes of information;
6. Data migration between different storage systems;
7. Introduction to data warehouse technology;
8. Data processing and integration to populate a Data Warehouse;
9. Information Extraction from a Data Warehouse (Querying and Reporting).
The curricular unit (UC) can be completed by exam (1st or 2nd Season) or by periodic evaluation.
Periodic assessment consists of a test (50%), with a minimum grade of 7, and a group project (50%). The test date coincides with the 1st Season exam date.
Bibliography:
MongoDB homepage.
Golfarelli, M., Rizzi, S., Data Warehouse Design: Modern Principles and Methodologies, McGraw-Hill Osborne Media, 1st Edition, May 26, 2009.
Damas, L., "SQL - Structured Query Language", FCA Editora de Informática, 2005 (II);
Date, C.J., "An Introduction to Database Systems", Addison-Wesley Publishing Company, 6th edition, 1995 (I.2, I.3, I.4, II);
NoSQL Database: New Era of Databases for Big data Analytics - Classification, Characteristics and Comparison, A B M Moniruzzaman, Syed Akhter Hossain, 2013 (https://arxiv.org/abs/1307.0191)
Data Science Methodologies and Technologies
At the end of the Unit, the student will be able to:
OA1. Plan Data Science projects, according to the problem's context
OA2. Execute and control Data Science projects
OA3. Autonomously develop a critical mindset to choose the best approach to address solutions for real-world problems that encompass data preparation, modeling, and results evaluation
CP1. Introduction to Data Science and Methodologies;
CP2. Identification of problem types and approaches (supervised and unsupervised learning);
CP3. Concepts of data extraction and preparation;
CP4. Modeling and evaluation;
- Neural Networks
- Feedforward Networks
- Backpropagation Algorithm
- Hyperparameter Optimization
- Applications with R: regression and classification
- Case studies
CP5. Business Intelligence & Analytics applications;
CP6. Data Science technologies.
1st evaluation period: group work with presentation and individual discussion, 100% of the final grade (minimum 10 points)
2nd evaluation period: individual work, 100% of the final grade (minimum 10 points)
Bibliography:
Roiger, R. J. (2020). Just Enough R! An Interactive Approach to Machine Learning and Analytics. CRC Press.
Boehmke, B.; Greenwell, B. (2020). Hands-On Machine Learning with R. CRC Press.
Sharda, R., Delen, D., Turban, E., Aronson, J., & Liang, T. P. (2014). Business Intelligence and Analytics: Systems for Decision Support (Required). Prentice Hall.
Witten, I. H., Frank, E., Hall, M. A., & Pal, C. J. (2016). Data Mining: Practical Machine Learning Tools and Techniques. Morgan Kaufmann.
Provost, F., & Fawcett, T. (2013). Data Science for Business: What You Need to Know about Data Mining and Data-Analytic Thinking. O'Reilly Media, Inc.

Chapman, P., Clinton, J., Kerber, R., Khabaza, T., Reinartz, T., Shearer, C., & Wirth, R. (2000). CRISP-DM 1.0: Step-by-step data mining guide.
Prediction Models
LG1: Understanding data analytics: scopes of application and procedures
LG2: Perform data analytics using R
LG3: Evaluate and interpret the data analytics results
Introduction to Machine Learning: supervised methods for prediction and classification.
PC1: Introduction
1.1. Regression Problems
1.2. Classification Problems
1.3. Training and Test Sets
1.4. Cross Validation
PC2: Linear Regression
2.1. Simple Linear Regression
2.2. Multiple Linear Regression
2.3. Applications with R
PC3: Logistic Regression
3.1. Simple Logistic Regression
3.2. Multiple Logistic Regression
3.3. Applications with R
PC4: Decision Tree-based Methods
4.1. Construction of Decision Tree Algorithms
4.2. Performance Improvement: Bagging and Boosting
4.3. Classification and Regression Trees (CART) Algorithm
4.4. Random Forests
4.5. Applications with R
1st evaluation period
Group work with presentation and individual discussion (100% of the final grade)
- Approval: minimum grade of 10 points
2nd evaluation period
Individual work (100% of the final grade)
- Approval: minimum grade of 10 points
Bibliography:
Hastie, T.; Tibshirani, R.; Friedman, J. (2009). The Elements of Statistical Learning: Data Mining, Inference, and Prediction. 2nd ed. New York: Springer.
Berk, R.A. (2017). Statistical Learning from a Regression Perspective. 2nd ed. Springer.
Boehmke, B.; Greenwell, B. (2020). Hands-On Machine Learning with R. CRC Press.

Larose, D.; Larose, C. (2015). Data Mining and Predictive Analytics. John Wiley & Sons.
Efron, B.; Hastie, T. (2016). Computer Age Statistical Inference: Algorithms, Evidence and Data Science. Cambridge University Press.
Burger, S. V. (2018). Introduction to Machine Learning with R. O'Reilly.
Roiger, R. J. (2020). Just Enough R! An Interactive Approach to Machine Learning and Analytics. CRC Press.
Pattern Recognition
LG1: Understanding unsupervised data analytics
LG2: Use R for unsupervised data analytics
LG3: Evaluate, validate and interpret the results
PC1: Introduction to unsupervised learning methods
PC2: Principal component analysis (PCA)
- Main concepts and steps
- Examples using R
PC3: Non-probabilistic clustering techniques
- Hierarchical methods
- Partitioning methods
- Examples using R
PC4: Probabilistic clustering techniques
- The EM algorithm
- Mixture models
- Latent class models
- Examples using R
PC5: Association rules
- Frequent items and association rules
- Apriori algorithm
- Examples using R
Students may choose either Periodic Evaluation or Final Exam.
PERIODIC EVALUATION:
- Group work (50%), with a minimum grade of 8
- Individual test (50%), with a minimum grade of 8
Approval requires a minimum grade of 10.
EXAM:
The Final Exam is a written exam. Students must achieve a minimum grade of 10 to pass.
Bibliography:
James, G., Witten, D., Hastie, T., Tibshirani, R. (2013), An Introduction to Statistical Learning: with Applications in R, New York: Springer.
Hastie, T., Tibshirani, R., Friedman, J. (2009), The Elements of Statistical Learning: Data Mining, Inference, and Prediction. 2nd ed. New York: Springer.
Hair, J. F., Black, W. C., Babin, B. J., Anderson, R. E. (2014), Multivariate Data Analysis, 7th Edition, Essex, UK: Pearson Education.

Wedel, M., Kamakura, W. A. (2000), Market Segmentation. Conceptual and Methodological Foundations (2nd edition), International Series in Quantitative Marketing. Boston: Kluwer Academic Publishers.
McLachlan, G. J., Peel, D. (2000), Finite Mixture Models. New York: John Wiley & Sons.
Lattin, J., D. Carroll and P. Green (2003), Analyzing Multivariate Data, Pacific Grove, CA: Thomson Learning.
Jolliffe, I. (1986), Principal Component Analysis. New York: Springer-Verlag.
Hennig, C., Meila, M., Murtagh, F., Rocci, R. (eds.) (2016), Handbook of Cluster Analysis, Handbooks of Modern Statistical Methods. Boca Raton: Chapman & Hall/CRC.
Aggarwal, C. C., Reddy, C. K. (eds.) (2014), Data Clustering: Algorithms and Applications. Boca Raton: CRC Press.
Ciberlaw
This CU aims to raise students' awareness of the principles and rules that apply to ICT uses, and of their significance as an expression of the values that businesses, markets and technological progress itself should accommodate. It also seeks to promote students' knowledge acquisition and to encourage their critical perspectives, combining theory and practice through the analysis and discussion of case studies.
1. Introduction: the TIPC and the sources of national law
2. Importance of European policies
3. Constitutional principles, freedoms and rights in the 'software age'
4. Cybersecurity law
5. Computer programs: related rights
6. Protection of personal data and privacy: the EU General Data Protection Regulation and the Enforcement Act
7. Emerging challenges: big data, information quality, cybercrime and algorithmic decision making
8. Meaning of crisis management
9. Ethics and mechanisms of criminal participation
Assessment is based on two individual research papers, one of which is delivered as an oral presentation in a format to be defined (80%). Active participation in classes counts positively towards the final grade (20%).
Bibliography
Gonçalves, Maria Eduarda, "Tensões entre a liberdade de informação e a propriedade intelectual na era digital", in Jorge Bacelar Gouveia and Heraldo de Oliveira Silva (eds.), I Congresso Luso-Brasileiro de Direito, Coimbra, Almedina, 2014, pp. 275-295.
Gonçalves, Maria Eduarda, "The EU Data Protection Reform and the Challenges of the Big Data. Remaining uncertainties and ways forward", Information & Communications Technology Law 26 (2), 2017, p. 126.
Gonçalves, Maria Eduarda, Direito da Informação, Novos direitos e modos de regulação na sociedade da informação, Coimbra, Almedina, 2003 (next edition scheduled for 2019).
Reed, C., Computer Law, 7th Edition, Oxford, Oxford University Press, 2012.
Revista do IDN - Nação e Defesa, n.º 133, CiberSegurança.
MARTINS, José Carlos Lourenço - Gestão de Segurança da Informação e Cibersegurança nas Organizações: Sistema e método, Sílabas & Desafios, October 2021, ISBN 9789898842596.
https://link.springer.com/content/pdf/10.1007/s11292022095042.pdf
 https://www.academia.edu/39724415/Protocolo_de_Sa%C3%ADda_pol%C3%ADtica_e_plano_no_contexto_da_trilogia_da_Segurança_da_Informação
 https://www.academia.edu/699096/Do_espectro_de_conflitualidade_nas_redes_de_informacao_por_uma_reconstrucao_conceptual_do_terrorismo_no_ciberespaco
 https://www.academia.edu/40494857/Segurança_da_informação_e_cibersegurança_aspetos_práticos_e_legislação
 https://www.academia.edu/699210/CONTRIBUTO_PARA_ESTUDOS_DE_INTELLIGENCE_SOBRE_OS_SETE_ESPAÇOS_DE_CONFLITO_POR_UM_MODELO_HOLÍSTICO_DE_ANÁLISE
LEVITT, Steven D., DUBNER, Stephen J. - Freakonomics, Penguin, 2005.
LINDSTROM, Martin - Brandwashed, 1st ed., Gestão Plus, 2012.
GLEICK, James - Informação, 1st ed., Círculo Leitores, 2012.
AYRES, Ian - Super Crunchers, 1st ed., Academia do Livro, 2010.
Complementary Bibliography
Project Design for Data Science
LG1. Acquire the skills to define a specific research problem
LG2. Acquire the skills to identify a dataset suitable for addressing the proposed research goal
LG3. Acquire the skills to evaluate and critically discuss the results achieved in light of the defined problem
LG4. Acquire the skills to conduct a literature review that positions the research problem and establishes its relevance
LG5. Acquire scientific writing skills
SC1. Framing the research subject
SC2. Defining the research problem
SC3. Conducting a literature review
SC4. Defining the scientific body of knowledge
SC5. Identifying and analysing a data source relevant to the research problem
SC6. Critically analysing the results in Data Science
SC7. Developing scientific writing
1st and 2nd assessment periods: individual writing of a scientific article and its presentation (100%)
Bibliography
Gregor, S., & Hevner, A. R. (2013). Positioning and presenting design science research for maximum impact. MIS Quarterly, 37(2).
Gastel, B., & Day, R. A. (2016). How to write and publish a scientific paper. ABC-CLIO.
Agarwal, R., & Dhar, V. (2014). Big data, data science, and analytics: The opportunity and challenge for IS research.
Hall, S. (2017, June). Practise makes perfect: developing critical thinking and writing skills in undergraduate science students. In Proceedings of the 3rd International Conference on Higher Education Advances (pp. 1044-1051). Editorial Universitat Politècnica de València.
Dissertation in Data Science
Learning goals (LG):
LG1 Independent scientific thought and originality
LG2 Scientific skills
LG3 Logical coherence and scientific argumentation
LG4 Quality of the presentation
Syllabus contents (SC):
SC1 Formulate the starting question
SC2 Identify the relevant literature and prepare a theoretical and empirical review
SC3 Formulate the research problem and the hypotheses
SC4 Design a study to test the hypotheses
SC5 Carry out the study
SC6 Analyse and interpret the results
SC7 Draw up the dissertation plan
SC8 Write the dissertation
A panel will assess the dissertation in a public examination, once the supervisor has approved it as complete and of sufficient quality for public presentation. Assessment will be based on the scientific merit of the study and on its theoretical and methodological adequacy.
Bibliography
Garson, G. (2001), Guide to Writing Empirical Papers, Theses, and Dissertations, Marcel Dekker Inc.
Bui, Yvonne N. (2014), How to Write a Master's Thesis, Sage Publications, Inc.
Punch, Keith F. (2016), Developing Effective Research Proposals, Sage Publications.
Master Project in Data Science
Learning goals (LG):
LG1 Independent scientific thought and originality
LG2 Scientific skills
LG3 Logical coherence and scientific argumentation
LG4 Quality of the presentation
Syllabus contents (SC):
SC1 Formulate the starting question
SC2 Identify the relevant literature and prepare a theoretical and empirical review
SC3 Formulate the research problem and the hypotheses
SC4 Design a study to test the hypotheses
SC5 Carry out the study
SC6 Analyse and interpret the results
SC7 Draw up the Master Project plan
SC8 Write the Master Project
A panel will assess the Master Project in a public examination, once the supervisor has approved it as complete and of sufficient quality for public presentation. Assessment will be based on the scientific merit of the study and on its theoretical and methodological adequacy.
Bibliography
Garson, G. (2001), Guide to Writing Empirical Papers, Theses, and Dissertations, Marcel Dekker Inc.
Bui, Yvonne N. (2014), How to Write a Master's Thesis, Sage Publications, Inc.
Punch, Keith F. (2016), Developing Effective Research Proposals, Sage Publications.
Accreditations