E0 259 : Data Analytics
August 2024
Instructors
Ramesh Hariharan (Strand Genomics and ECE, IISc)
Vikram Srinivasan (Founder, Needl.ai and ECE, IISc)
Rajesh Sundaresan (ECE and RBCCPS, IISc)
Teaching Assistants
Tejashree S
Shivam Vinayak Vathsa
Mohd Azfar
Lecture Hours
Lectures: Tuesdays and Thursdays 15:30 - 17:10 hrs
Location: MP20
First class: Thursday 06 August 2024, 15:30 - 17:10 hrs
Teaching Assistant Hours
TA hour: Thursdays, 18:00 - 19:00 hrs
Location: MP20
Project Presentation
Friday 15 November 2024, 09:00 hrs onward, day long, attendance compulsory
Location: MP20
Examinations
Quiz in lieu of Assignment 2: Saturday 05 October 2024, 16:00 - 17:00 hrs.
Final examination: Thursday 28 November 2024, 14:00 - 17:00 hrs, as per SCC final exam timetable.
2024 Lecture Progression
- Community detection
- Mars orbit
- Effects of smoking
- Primer on hypothesis testing (in preparation for the following modules)
- Cricket - Duckworth-Lewis-Stern method
- Visual neuroscience
- Colour blindness
- Recommendation systems
- Natural language processing
2023 Lecture Progression
- Cricket - Duckworth-Lewis-Stern method
- Primer on hypothesis testing (in preparation for the following modules)
- Community detection
- Effects of smoking
- Visual neuroscience
- Mars orbit
- Covid-19 modelling
- Colour blindness
- Recommendation systems
- Natural language processing
2021 Lectures and assignments
2018 complementary lectures and assignments
2017 complementary lectures and assignments
2016 complementary lectures and assignments
- Module 12: Functional connectivity patterns of the brain (Rajesh Sundaresan)
2015 complementary lectures and assignments
Course syllabus
Data sets from astronomy, genomics, neuroscience, sports, surveillance cameras, and social networks will be analysed to answer specific scientific questions. Statistical tools and modeling techniques will be introduced as needed to analyse the data and eventually address the scientific question.
Course Description
Data analytics has become increasingly important in recent times. Several industries are now built around the use of data for decision making. Several research areas, genomics and neuroscience being notable examples, increasingly rely on large-scale data generation rather than small-scale experimentation to generate initial hypotheses, which creates a need for data analytics. This course will develop modern statistical tools and modeling techniques through hands-on data analysis in a variety of application domains.
The course will illustrate the principles of hands-on data analytics through 10 case studies. On each topic, we will introduce a scientific question and discuss why it should be addressed. Next, we will present the available data and how it was collected. We will then discuss models, provide analyses, and finally touch upon how to address the scientific question using the analyses.
In one of the previous offerings, we covered the following case studies.
- Astronomy: From Tycho Brahe's observations to the conclusion that Mars moves in an elliptical orbit.
- Visual Neuroscience: Neural correlates predict search difficulty.
- Genomics: Understanding the causes of cancer.
- Sports: The Duckworth-Lewis-Stern method for setting targets in shortened limited-overs cricket matches.
- Genomics: The basis for red-green colour blindness.
- Genomics: Population history of India.
- Signal Processing: Video background separation.
- Networks: Community detection.
- Recommendation systems.
Prerequisites
- Random Processes (E2 202) OR Probability and Statistics (E0 232) OR equivalent.
Course Grade
There will be about seven assignments, one on each of the first seven modules. A fair amount of hands-on work is expected. Students will use Python.
- 50/100 : Assignments
- 20/100 : Final examination
- 30/100 : Course project and presentation
Reference Texts
- There is no textbook for this course. Lecture slides will be available on the Moodle page (2024) or from this webpage (previous years).