Module 1 of E0 259, Data Analytics, August 2019

Cricket data set

Lectures

Lectures (Rajesh Sundaresan)

Data set (Thanks to Eamon McGinn for compiling the data from Cricinfo.)

ODI over-by-over data (.csv file):
This file contains over-by-over data on ODI matches from 1999 to 2011. It is taken from this site. The site also provides R code for finding the 'run production functions', but you will do something marginally different in the following assignment.

Assignment 1

Due: 23:55 hrs, Saturday 31 August 2019. Discussion is encouraged, but write your own code. Please comply with the ethics policy.

1. Using only the first-innings data in the above data set, find the best-fit 'run production functions' in terms of wickets-in-hand w and overs-to-go u. Assume the model Z(u,w) = Z0(w)[1 - exp{-Lu/Z0(w)}]. Use the sum-of-squared-errors loss function, summed across overs and wickets.
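As a rough illustration of the model and loss named above, the following sketch evaluates Z(u,w) and the sum of squared errors in plain Python. The names run_production, sse_loss, z0_demo and L_demo are hypothetical, the parameter values are made up, and the observations are synthetic. It also assumes each observation is a triple (u, w, runs), where runs is the number of runs scored from that point to the end of the innings; how you build such triples from the csv file is left to you.

```python
import math

def run_production(u, w, z0, L):
    """Z(u, w) = Z0(w) * [1 - exp(-L * u / Z0(w))].

    z0 maps wickets-in-hand w (1..10) to Z0(w); L is shared by all curves.
    """
    z0_w = z0[w]
    return z0_w * (1.0 - math.exp(-L * u / z0_w))

def sse_loss(observations, z0, L):
    """Sum of squared errors over (u, w, runs) observations."""
    return sum((runs - run_production(u, w, z0, L)) ** 2
               for u, w, runs in observations)

# Hypothetical parameter values, for illustration only (not fitted to the data).
z0_demo = {w: 30.0 * w for w in range(1, 11)}
L_demo = 10.0

# Synthetic, noise-free observations generated from the model itself.
demo_obs = [(u, w, run_production(u, w, z0_demo, L_demo))
            for w in range(1, 11) for u in range(1, 51)]

loss_at_truth = sse_loss(demo_obs, z0_demo, L_demo)        # exact fit
loss_perturbed = sse_loss(demo_obs, z0_demo, L_demo + 1.0)  # wrong L
```

A loss function of this shape can be handed to a general-purpose minimizer such as scipy.optimize.minimize, with the eleven parameters packed into a single vector.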

Note that this model forces all ten curves to have the same slope, L, at u = 0. Provide a plot of the ten functions, and report the 11 parameters (Z0(1), ..., Z0(10) and L) associated with the 10 production functions, along with the total error.
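The remark about equal slopes follows directly from differentiating the assumed model with respect to u. The short derivation below uses only the symbols of the model as stated, and shows as well that Z0(w) is the limiting value of each curve:

```latex
\frac{\partial Z(u,w)}{\partial u}
  = Z_0(w) \cdot \frac{L}{Z_0(w)} \exp\!\left\{-\frac{L u}{Z_0(w)}\right\}
  = L \exp\!\left\{-\frac{L u}{Z_0(w)}\right\},
\qquad
\left.\frac{\partial Z(u,w)}{\partial u}\right|_{u=0} = L,
\qquad
\lim_{u \to \infty} Z(u,w) = Z_0(w).
```

The slope at u = 0 does not depend on w, which is why a single L is shared across the ten curves.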

Feel free to use the nonlinear regression tools available in Python. Some date fields are in a different format, with an extra comma. Write a short script to clean this up. There is no need to send us the cleaning script or the cleaned csv file.
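One possible shape for such a cleaning step is sketched below. It rests on assumptions that may not match the real file: that a malformed row has exactly one more field than the header, and that the extra field comes right after the date column. The function merge_split_dates, the column layout and the sample rows are all hypothetical.

```python
import csv
import io

def merge_split_dates(rows, date_col):
    """Yield rows of the expected width, rejoining a date split by a stray comma.

    Assumes a bad row has exactly one extra field, located right after date_col.
    """
    rows = iter(rows)
    header = next(rows)
    width = len(header)
    yield header
    for row in rows:
        if len(row) == width + 1:
            merged = row[date_col].strip() + ", " + row[date_col + 1].strip()
            row = row[:date_col] + [merged] + row[date_col + 2:]
        yield row

# Tiny made-up sample; the real file's columns and date formats may differ.
sample = "Match,Date,Over,Runs\n1,14/01/2006,1,4\n2,Jan 15, 2006,1,6\n"
cleaned = list(merge_split_dates(csv.reader(io.StringIO(sample)), date_col=1))
```

After this step every row has the same number of fields as the header, so the data can be loaded with a standard csv reader; parsing the two date formats into a common one would be a separate step.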