The Data Scientists need to experiment with models, with feature engineering , with type of model – traditional or deep learning, before making a recommendation with a story to the decision makers.
It is also important in case of models such as Recommendation Systems or Value Estimations to be able record their performance and also the actual events thereof so as to know what needs to be done in future.
A data model which stores relevant data about the data science activity from start to finish of the model lifecycle would help.
There are two reasons why storing this data and analyzing to make improvements make sense.
One, as opposed to a deterministic programming model, a stochastic model may have many options in terms of algorithms. Since in a machine model, the test is only acceptability rather than correct or incorrect, the improvements are almost always possible.
Second, as business conditions and other things impacting the data used as input change, the impact to the output may be more sudden and adjustments either necessary or simply urgent.
I like the paradigm. You use machine learning models to provide diagnosis and suggestions for actions. Then you create machine learning models on these models to do a second order analysis and hence, take action on your original pipeline.
Table – Model Performance
Data Set Id
Table – Pipeline
Pipeline Step Description
Table – Model Type
Model Type Description
Table – Model
Table – Raw Dataset
Raw Dataset Id
Table – Final Dataset
Final Dataset Id
Table -Final Dataset Feature
Final Dataset Id
Feature Origin Type
Table- Feature Instance Prediction
Feature Value List
With thanks to