What is the difference between a training and a test dataset?

Sajid Lhessani
3 min read · Sep 28, 2019

This is a companion piece to my data science articles, intended to give my readers a deeper understanding.

Within a dataset, the training set is used to build a model, while a test (or validation) set is used to validate the model that was built. Data points in the training set are excluded from the test (validation) set. Usually, a dataset is divided either into a training set and a validation set (some people say 'test set' instead), or into a training set, a validation set and a test set.
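A common way to produce such a split is a minimal sketch like the following, using scikit-learn's `train_test_split` on a toy array (the 60/20/20 proportions here are an illustrative assumption, not a rule):

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Toy dataset: 100 samples with 3 features each, and binary labels
X = np.arange(300).reshape(100, 3)
y = np.arange(100) % 2

# First split off a held-out test set (20%), then carve a validation
# set out of the remaining data. No data point appears in two sets.
X_trainval, X_test, y_trainval, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)
X_train, X_val, y_train, y_val = train_test_split(
    X_trainval, y_trainval, test_size=0.25, random_state=42)

print(len(X_train), len(X_val), len(X_test))  # 60 20 20
```

Splitting twice is the usual trick when three sets are needed: 0.25 of the remaining 80% yields a validation set the same size as the test set.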

In machine learning, we try to create a model that predicts well on data it has not seen. So we use the training data to fit the model and the test data to evaluate it: the fitted model generates predictions for the test set, whose true labels are held back from training. Dividing the dataset into a train and a test set in this way lets us measure metrics such as accuracy and precision on data that played no part in fitting the model.
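The fit-then-evaluate workflow can be sketched as follows; the synthetic dataset and logistic regression model are illustrative choices, not something prescribed by the article:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, precision_score

# Synthetic binary-classification data for demonstration
X, y = make_classification(n_samples=500, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0)

model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)        # fit using training data only
y_pred = model.predict(X_test)     # predict on unseen test data

# Compare predictions against the held-back test labels
print("accuracy:", accuracy_score(y_test, y_pred))
print("precision:", precision_score(y_test, y_pred))
```

Because the test labels were never shown to the model during fitting, these scores estimate how the model would perform on genuinely new data.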

The split proportion is entirely up to you and the task you face. It is not essential that 70% of the data go to training and the rest to testing; it depends entirely on the dataset being used and the task to be solved.


Data scientist working in banking and capital markets, based in London. Follow me on YouTube: https://www.youtube.com/c/AlgorithmicTradingbySajid