data-gym: Demo notebooks & datasets
General
Data I/O
- Extracting data: background color of cells in Excel files
- Reading in data from Google spreadsheet
- Converting from PDF using batch mode
Plotting
Prelim. analyses
- Discretize variables using numpy.matlib.repmat
- Data visualization via dimensionality-reduction
- Explore some ONC datasets
- Explore the YaleED dataset
- Explore clinical codes; calculate some clinical indices
Preprocessing
Training models
- Logistic regression and LSTM using PIMA
- LR and MLP using PIMA
- Time-series classification (WISDM dataset)
Misc.
Common code snippets
Helpful links
Resources
Visual analytics
- “Making Data Visual” by Danyel Fisher and Miriah Meyer
- Visualizing missing data quickly
- Visualization: bad examples
- Violinplot via Seaborn
- Cartograms (explained in first ten seconds)
Statistics & cross-validation
- “Statistical Approaches to the Model Comparison Task in Learning Analytics,” Gardner & Brooks, LAK2017
- Slides “Statistical testing for classification…” by B Evans
- Notes: “[…] ordinal regression is known to perform better than softmax on ordinal data (Cheng et al., 2008)”
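
To make the ordinal-regression note above concrete, here is a minimal sketch of a cumulative ("thermometer") target encoding, which is one common way to train a network on ordinal labels with sigmoid outputs instead of a softmax. The function names `ordinal_encode` and `ordinal_decode`, the 0.5 cutoff, and the toy labels are illustrative assumptions, not taken from the cited paper or from any notebook in this repository.

```python
import numpy as np

def ordinal_encode(labels, num_classes):
    """Encode label k as k ones followed by zeros (length num_classes - 1).

    Hypothetical helper for illustration only.
    """
    labels = np.asarray(labels)
    thresholds = np.arange(num_classes - 1)
    # Entry j is 1 when the label lies strictly above threshold j.
    return (labels[:, None] > thresholds[None, :]).astype(int)

def ordinal_decode(probs, cutoff=0.5):
    """Decode by counting the leading outputs that exceed the cutoff.

    The cutoff of 0.5 is an assumed default.
    """
    above = np.asarray(probs) > cutoff
    # cumprod zeroes out everything after the first output below the cutoff.
    return np.cumprod(above, axis=1).sum(axis=1)

# Toy labels from a 4-level ordinal scale (0 < 1 < 2 < 3).
targets = ordinal_encode([0, 1, 3], num_classes=4)
decoded = ordinal_decode(targets)
```

Unlike one-hot targets, neighbouring classes share most of their target bits, so a prediction that is off by one level is penalized less than one that is off by three.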
Meta-data