Module 8 of E0 259, Data Analytics, August 2018

Retinoblastoma data set

Lectures

Lectures 1 and 2 (Ramesh Hariharan)
Lecture 3 (Ramesh Hariharan)

Data set for August 2017 assignment

Unilateral retinoblastoma diagnosis data (.csv file):
This file contains data on the number of months to diagnosis.

Age related cancer data (.csv file, same as slide 80 of Lecture 3):
This file contains the incidence rates in each age group per 100,000 for various cancers.

Assignment

Due: No due date. Discussion is encouraged, but write your own code. Please comply with the ethics policy.

1. Unilateral retinoblastoma.
(a) From the unilateral-retinoblastoma data set, compute the empirical cdf of unilateral retinoblastoma up to 73 months. Scale the cdf down by a factor of 15000, so that it rises to 1/15000 instead of 1. Assuming that the disease occurred due to a single event that hits each cell with probability p per month, consider the corresponding cdf and a squared error loss function (integrated across the period of interest, here 73 months). Find the value of p that minimises this loss function.
(b) Assuming that the disease occurred due to two events that hit each cell with probabilities p1 and p2 per month, respectively, find the values of p1 and p2 that minimise a squared error loss function. Use the data only up to 36 months to compute p1 and p2.
(c) Now consider p2(t) = p2 exp(-a(t-36)) when t is at least 36. In other words, assume that the value of p2 tapers off exponentially after 36 months. Find the parameter 'a' assuming squared error loss.
(d) Consider p2 to remain constant and p1 to taper off exponentially after 36 months. Find the parameter 'a' assuming squared error loss. Report which of the three (1a, 1c or 1d) best fits the given data.
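
As a starting point for 1(a), the sketch below builds the scaled empirical cdf and fits p by golden-section search. It assumes the single-event cdf is F(t) = 1 - (1 - p)^t and approximates the integrated loss by a sum over whole months; the model derived in the lectures may differ. The function names, the search bracket and the sample numbers are illustrative and are not part of the assignment.

```python
import math


def empirical_cdf(months, horizon=73, scale=1.0 / 15000):
    """Scaled empirical cdf evaluated at t = 0, 1, ..., horizon (months)."""
    n = len(months)
    return [scale * sum(1 for m in months if m <= t) / n
            for t in range(horizon + 1)]


def one_hit_cdf(p, t):
    """Assumed model: P(event has occurred by month t) = 1 - (1 - p)**t."""
    # expm1/log1p keep precision when p is very small.
    return -math.expm1(t * math.log1p(-p))


def squared_error(p, target):
    """Sum over months of the squared gap between model cdf and target cdf."""
    return sum((one_hit_cdf(p, t) - y) ** 2 for t, y in enumerate(target))


def fit_p(target, lo=1e-12, hi=1e-2, iters=200):
    """Golden-section search over log10(p).

    Assumes the loss has a single minimum inside [lo, hi]; the bracket
    itself is a guess and may need adjusting for real data.
    """
    g = (math.sqrt(5) - 1) / 2
    a, b = math.log10(lo), math.log10(hi)
    for _ in range(iters):
        c = b - g * (b - a)
        d = a + g * (b - a)
        if squared_error(10 ** c, target) < squared_error(10 ** d, target):
            b = d  # minimum lies in [a, d]
        else:
            a = c  # minimum lies in [c, b]
    return 10 ** ((a + b) / 2)


if __name__ == "__main__":
    # Hypothetical diagnosis ages in months, NOT the course data set.
    sample = [3, 7, 12, 12, 18, 25, 31, 40, 55, 70]
    target = empirical_cdf(sample)
    print(fit_p(target))
```

Searching over log10(p) rather than p is a design choice here: the scaled cdf is of order 1/15000, so plausible values of p span several orders of magnitude. Parts (b) to (d) have more than one parameter and would need a multi-dimensional search instead.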

2. Age-related cancer.
(a) From the age-related cancer data set, compute the empirical cdf for each of the 8 cases. Considering p_i = p for all i, find the values of p and k that minimize the sum of squared errors in each of the cases.
(b) Consider p_i = p_(i-1) + d for all i = 2, 3, ..., k. In each case, find the values of p1, d, and k that minimize the sum of squared errors.
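
One possible way to organise the search in 2(a) is sketched below: for each candidate k, fit p by a one-dimensional search, then keep the k with the smallest loss. The sketch assumes that the k events are independent and may occur in any order, so that the cdf is the product of the per-event cdfs 1 - (1 - p_i)^t; the lectures may use a different multi-stage model. The function names, the search bracket, the cap k_max and the synthetic points are all hypothetical.

```python
import math


def k_hit_cdf(ps, t):
    """Assumed model: P(all events have occurred by time t), where event i
    occurs independently with probability ps[i] in each time step."""
    out = 1.0
    for p in ps:
        out *= -math.expm1(t * math.log1p(-p))  # 1 - (1 - p)**t
    return out


def sse(ps, points):
    """Sum of squared errors over (t, cdf_value) pairs."""
    return sum((k_hit_cdf(ps, t) - y) ** 2 for t, y in points)


def fit_equal_p(points, k, lo=1e-9, hi=0.5, iters=200):
    """Golden-section search over log10(p) with p_i = p for all i.

    Assumes a single minimum inside [lo, hi]. Returns (p, loss).
    """
    g = (math.sqrt(5) - 1) / 2
    a, b = math.log10(lo), math.log10(hi)
    for _ in range(iters):
        c = b - g * (b - a)
        d = a + g * (b - a)
        if sse([10 ** c] * k, points) < sse([10 ** d] * k, points):
            b = d
        else:
            a = c
    p = 10 ** ((a + b) / 2)
    return p, sse([p] * k, points)


def fit_p_and_k(points, k_max=10):
    """Try k = 1, ..., k_max and return the best (k, p, loss)."""
    best = None
    for k in range(1, k_max + 1):
        p, loss = fit_equal_p(points, k)
        if best is None or loss < best[2]:
            best = (k, p, loss)
    return best


if __name__ == "__main__":
    # Synthetic points generated from the assumed model, NOT the course data.
    points = [(t, k_hit_cdf([0.01] * 3, t)) for t in range(5, 90, 5)]
    print(fit_p_and_k(points, k_max=6))
```

Because k_hit_cdf accepts any list of probabilities, the arithmetic progression of 2(b) can be evaluated by passing [p1 + i * d for i in range(k)]; fitting p1 and d jointly needs a two-parameter search, which is not shown here.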

3. Demonstrate a 1-1 mapping between a tournament and its mirror for cyclic tournaments in the context of Lecture 3 of this module. Show that the mapping is indeed 1-1.