Data Model for Data Science Model Performance Evaluation

Data scientists need to experiment with models, with feature engineering, and with the type of model (traditional or deep learning) before presenting a recommendation, supported by a story, to the decision makers.

For models such as recommendation systems or value estimation models, it is also important to record their performance along with the actual events that followed, so that we know what needs to be done in the future.
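
Recording each prediction next to the actual outcome is what makes a value estimation model's performance measurable after the fact. Below is a minimal sketch. It assumes a hypothetical `PredictionRecord` type and uses mean absolute error as an illustrative metric, because the text does not prescribe one.

```python
from dataclasses import dataclass


@dataclass
class PredictionRecord:
    # Hypothetical record pairing one prediction with the actual event.
    model_id: str
    predicted_value: float
    actual_value: float


def mean_absolute_error(records: list[PredictionRecord], model_id: str) -> float:
    """Average |predicted - actual| over one model's recorded predictions."""
    errors = [abs(r.predicted_value - r.actual_value)
              for r in records if r.model_id == model_id]
    if not errors:
        raise ValueError(f"no records for model {model_id!r}")
    return sum(errors) / len(errors)
```

For example, two predictions of 100 and 200 against actual values of 110 and 190 give a mean absolute error of 10.0 for that model.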

A data model that stores the relevant data about the data science activity across the whole model lifecycle, from start to finish, would help.

There are two reasons why it makes sense to store this data and analyze it to make improvements.

First, unlike a deterministic program, a stochastic model can be built with many different algorithms. A machine learning model is judged on whether its output is acceptable rather than on whether it is correct or incorrect, so improvement is almost always possible.

Second, as business conditions and other factors affecting the input data change, the effect on the output may be sudden. Adjustments then become necessary or even urgent.
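
A timestamped performance history per model, like the one the Model Performance table described below is meant to hold, makes such sudden shifts detectable. One approach is to compare recent scores with the earlier baseline. This is a minimal sketch. It assumes each record carries a single higher-is-better metric value, and the `window` and `tolerance` parameters are hypothetical, with illustrative defaults.

```python
from statistics import mean


def sudden_drop(metric_history: list[float], window: int = 3,
                tolerance: float = 0.05) -> bool:
    """Return True if the mean of the last `window` values is more than
    `tolerance` below the mean of all earlier values.

    Assumes higher metric values are better (e.g. accuracy).
    """
    if len(metric_history) <= window:
        return False  # not enough history to form a baseline
    baseline = mean(metric_history[:-window])
    recent = mean(metric_history[-window:])
    return baseline - recent > tolerance
```

Take the history 0.91, 0.90, 0.92, 0.91, 0.84, 0.83, 0.82. The baseline of about 0.91 sits roughly 0.08 above the recent mean, so this flags a drop. A more careful setup might use a statistical test, but a fixed tolerance keeps the idea visible.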

I like this paradigm. You use machine learning models to provide diagnoses and suggest actions. Then you build machine learning models on top of these models to perform a second-order analysis and, based on it, take action on your original pipeline.
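
Here is a very small instance of this second-order idea. It fits an ordinary least-squares trend to each model's recorded metric over time and flags models whose performance is declining, so they can be revisited in the pipeline. The simple linear fit stands in for whatever second-order model you would actually use. The function names and the `(model_id, time, metric)` row shape are hypothetical.

```python
from collections import defaultdict


def ols_slope(xs: list[float], ys: list[float]) -> float:
    """Slope b of the ordinary least-squares line y = a + b*x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        return 0.0  # all observations at the same time: no trend
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


def declining_models(rows: list[tuple[str, float, float]],
                     threshold: float = 0.0) -> list[str]:
    """rows are hypothetical (model_id, time, metric) tuples from the
    performance history. Returns, sorted, the models whose fitted trend
    is below `threshold`, i.e. getting worse for a higher-is-better metric."""
    series: dict[str, tuple[list[float], list[float]]] = defaultdict(lambda: ([], []))
    for model_id, t, metric in rows:
        series[model_id][0].append(t)
        series[model_id][1].append(metric)
    return sorted(m for m, (ts, ms) in series.items()
                  if len(ts) >= 2 and ols_slope(ts, ms) < threshold)
```

Suppose model "m1" scores 0.90, 0.88, 0.85 and model "m2" scores 0.80, 0.82, 0.83 at times 1, 2, 3. Only "m1" has a negative slope (-0.025), so only "m1" is flagged.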

Table – Model Performance

Date Timestamp

Dataset Id

Pipeline Id

Model Id

Table – Pipeline

Pipeline Id

Pipeline Step

Pipeline Step Description
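
With one row per (Pipeline Id, Pipeline Step), an experiment's pipeline can be reconstructed in order and compared with another pipeline. That comparison is useful when two runs differ in a single step, such as the model type. The sketch assumes Pipeline Step is an integer position. The row layout and example step descriptions are hypothetical.

```python
from collections import defaultdict
from itertools import zip_longest
from typing import Optional

# Hypothetical row of the Pipeline table: (pipeline_id, step, description).
PipelineRow = tuple[str, int, str]


def steps_by_pipeline(rows: list[PipelineRow]) -> dict[str, list[str]]:
    """Group rows by pipeline and order each pipeline's step
    descriptions by step number."""
    grouped: dict[str, list[tuple[int, str]]] = defaultdict(list)
    for pipeline_id, step, description in rows:
        grouped[pipeline_id].append((step, description))
    return {pid: [d for _, d in sorted(steps)] for pid, steps in grouped.items()}


def step_differences(a: list[str], b: list[str]) -> list[tuple[int, Optional[str], Optional[str]]]:
    """Position-by-position (1-based) comparison of two ordered step
    lists; a missing step on either side shows up as None."""
    return [(i, x, y)
            for i, (x, y) in enumerate(zip_longest(a, b), start=1)
            if x != y]
```

Consider two pipelines that share imputation and scaling steps but train different models in step 3. Comparing them yields a single difference at position 3.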

Table – Model Type

Model Type

Model Type Description

Table – Model

Model Id

Model Name

Model Description

Model Type

Table – Raw Dataset

Raw Dataset Id

Raw Dataset

Table – Final Dataset

Final Dataset Id

Final Dataset

Table – Final Dataset Feature

Final Dataset Id

Feature Id

Feature Type

Feature Origin Type

Feature Name

Feature Description

Valid Values
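
The Valid Values column makes it possible to check incoming feature values before they reach a model. The source leaves the representation of Valid Values open. This sketch assumes it is either a set of allowed categories or an inclusive numeric (min, max) range, and the `FeatureSpec` type is hypothetical.

```python
from dataclasses import dataclass
from typing import Optional, Union


@dataclass(frozen=True)
class FeatureSpec:
    # Hypothetical in-memory form of one Final Dataset Feature row.
    feature_id: str
    feature_name: str
    allowed: Optional[frozenset] = None                # categorical valid values
    value_range: Optional[tuple[float, float]] = None  # inclusive numeric range


def is_valid(spec: FeatureSpec, value: Union[str, float]) -> bool:
    """Check one value against the feature's Valid Values."""
    if spec.allowed is not None:
        return value in spec.allowed
    if spec.value_range is not None:
        low, high = spec.value_range
        return isinstance(value, (int, float)) and low <= value <= high
    return True  # no constraint recorded for this feature


def invalid_features(specs: list[FeatureSpec], row: dict) -> list[str]:
    """Names of features present in `row` whose values are not valid."""
    return [s.feature_name for s in specs
            if s.feature_name in row and not is_valid(s, row[s.feature_name])]
```

For instance, with "region" restricted to {"north", "south"} and "age" to 0 through 120, the row {"region": "east", "age": 34} reports only "region" as invalid.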

Table – Feature Instance Prediction

Feature Value List

Model Id

Actual Value
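
Taken together, the tables above can be sketched as a relational schema, here using Python's built-in sqlite3 module. Column types and snake_case names are assumptions. The sketch also fills in a few things the source leaves open, all hypothetical:

- Model Performance gets metric_name and metric_value columns, because the listed fields identify a run but record no score.
- Feature Instance Prediction gets a predicted_value column next to Actual Value.
- Dataset Id in Model Performance is taken to mean a Final Dataset Id.

```python
import sqlite3

# Sketch of the tables above as SQLite DDL. Assumed (not in the source):
# column types, metric_name/metric_value on model_performance,
# predicted_value on feature_instance_prediction, JSON text for the
# feature value list, and dataset_id referring to final_dataset.
SCHEMA = """
CREATE TABLE model_type (
    model_type             TEXT PRIMARY KEY,
    model_type_description TEXT
);
CREATE TABLE model (
    model_id          TEXT PRIMARY KEY,
    model_name        TEXT NOT NULL,
    model_description TEXT,
    model_type        TEXT REFERENCES model_type(model_type)
);
CREATE TABLE pipeline (
    pipeline_id               TEXT NOT NULL,
    pipeline_step             INTEGER NOT NULL,
    pipeline_step_description TEXT,
    PRIMARY KEY (pipeline_id, pipeline_step)
);
CREATE TABLE raw_dataset (
    raw_dataset_id TEXT PRIMARY KEY,
    raw_dataset    TEXT  -- e.g. a location or URI for the raw data
);
CREATE TABLE final_dataset (
    final_dataset_id TEXT PRIMARY KEY,
    final_dataset    TEXT
);
CREATE TABLE final_dataset_feature (
    final_dataset_id    TEXT REFERENCES final_dataset(final_dataset_id),
    feature_id          TEXT NOT NULL,
    feature_type        TEXT,
    feature_origin_type TEXT,
    feature_name        TEXT,
    feature_description TEXT,
    valid_values        TEXT,
    PRIMARY KEY (final_dataset_id, feature_id)
);
CREATE TABLE model_performance (
    date_timestamp TEXT NOT NULL,
    dataset_id     TEXT REFERENCES final_dataset(final_dataset_id),
    pipeline_id    TEXT,  -- no declared FK: pipeline_id alone is not unique
    model_id       TEXT REFERENCES model(model_id),
    metric_name    TEXT,  -- assumed column
    metric_value   REAL   -- assumed column
);
CREATE TABLE feature_instance_prediction (
    feature_value_list TEXT,  -- assumed JSON-encoded list of feature values
    model_id           TEXT REFERENCES model(model_id),
    predicted_value    REAL,  -- assumed column
    actual_value       REAL
);
"""


def create_schema(conn: sqlite3.Connection) -> None:
    """Enable foreign-key enforcement, then create all tables."""
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(SCHEMA)


def average_metric_by_model_type(conn: sqlite3.Connection,
                                 metric_name: str) -> list[tuple]:
    """Average one metric per model type (e.g. traditional vs deep learning)."""
    query = """
        SELECT m.model_type, AVG(p.metric_value)
        FROM model_performance AS p
        JOIN model AS m ON m.model_id = p.model_id
        WHERE p.metric_name = ?
        GROUP BY m.model_type
        ORDER BY m.model_type
    """
    return conn.execute(query, (metric_name,)).fetchall()
```

A query like average_metric_by_model_type is one way to back the comparison between traditional and deep learning models with recorded data rather than anecdote. With foreign keys enabled, a performance row that names an unknown model is rejected at insert time.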


With thanks to

Lynn Langit
Srinivasan Srivatsan

