This course is designed for students who are new to data science. After an introduction to basic arithmetic, variables, and data structures in Python, students will learn how to collect and extract data from real datasets. Data analysis skills using control flow and Python packages (e.g., NumPy, SciPy, and Pandas) will be introduced. To address the needs of big data processing, distributed computing frameworks (e.g., Spark) and visualization tools with Python will be discussed. Students may apply basic learning algorithms with Python packages (e.g., scikit-learn) to extract knowledge from data.
- apply the Python language fundamentals, including basic syntax, variables, and process flows, to write their first program
- apply functions and import packages to work with complex and/or large data sets
- apply scientific packages (e.g., NumPy and SciPy) to perform useful computations
- process text files using external packages (e.g., tabula)
- apply data visualization tools to visualize large data sets
This course is designed to enable students to learn the significance of data visualization in data science and big data analytics, and to develop the knowledge and skills to present quantitative data using data visualization tools. The course emphasizes the practical aspects of data science, with a focus on using the R or Python programming language to process data, produce visualizations, and interpret these visualizations. Students will learn the practice of data cleaning, data reshaping, basic tabulation, aggregation, and visual representation in order to improve their understanding of complex data and models.
- Describe the development and principles of data analytics and data visualization
- Identify different types of data (qualitative vs. quantitative) and use the analysis techniques (probabilistic, regression, cluster, etc.) best suited to exploring them
- Draw conclusions and formulate hypotheses from data presented graphically
- Apply theories of data analytics and data visualization, and demonstrate competence in using software (Python, R/RStudio, Excel, etc.) for data visualization and data analytics
- Analyze, critique, and revise data visualizations
This course is designed to enable students to learn database and data mining concepts and techniques for big data analytics and development in different domains. The course concentrates on the practical issues of using databases and data mining to solve big data problems. The content includes data modeling for databases and data warehouses, SQL, Python programming for databases, and Python and R programming for data mining applications. Students will learn the skills of database modeling, querying, and programming, as well as programming techniques for data mining.
- Model data in a relational database using entity-relationship (ER) techniques
- Construct and develop database applications using SQL and Python
- Perform data warehouse analysis
- Construct and perform data mining tasks using Python or R
The course starts from the foundations of machine learning (ML). First, basic concepts such as linear algebra, probability and information theory, and numerical methods will be introduced. Next, an overview of machine learning, inductive learning, and representation learning will be given. The deep learning topics covered are artificial neural networks, Bayesian networks and learning, deep learning and deep neural networks, and convolutional neural networks. Throughout the course, the practical methodology of using tools such as TensorFlow, Keras, or scikit-learn will be emphasized.
- Understand the fundamentals of machine learning, including basic learning techniques on big datasets and learning process flows
- Use machine learning tools (e.g., Keras, scikit-learn, and TensorFlow) on datasets
- Design basic learning approaches, such as Bayesian networks, inductive learning, and representation learning, with the tools
- Design basic artificial neural networks, including feedforward neural networks and the backpropagation (BP) algorithm, and deep models such as convolutional neural networks, with the tools
- Introduction to Data Science Programming
Ian Goodfellow, Yoshua Bengio, and Aaron Courville, Deep Learning, MIT Press, 2016.