Training an ML Model Inside a Docker Container

Tirth Patel
5 min read · May 26, 2021

🔘 Steps to be followed to achieve this task 🔘

👉 Pull the CentOS container image from Docker Hub and create a new container.

👉 Install Python on top of the Docker container.

👉 Copy or create, inside the container, the machine learning model that you built in a Jupyter notebook.

Pre-requisite

  • Docker should be installed on your system.

First of all, we need to check whether Docker is installed. We can check using the docker --version command. Then start the Docker service with the command systemctl start docker.
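If you would rather script this check than type it by hand, here is a minimal Python sketch. The helper name docker_version is hypothetical; the sketch only assumes that the docker CLI is on the PATH and accepts --version.

```python
import shutil
import subprocess


def docker_version():
    """Return the output of `docker --version`, or None if Docker is unavailable."""
    # shutil.which looks the executable up on PATH without running it.
    if shutil.which("docker") is None:
        return None
    try:
        result = subprocess.run(
            ["docker", "--version"],
            stdout=subprocess.PIPE,
            stderr=subprocess.PIPE,
            universal_newlines=True,  # decode output as text (works on Python 3.6+)
            check=True,
            timeout=30,
        )
    except (subprocess.SubprocessError, OSError):
        return None
    return result.stdout.strip()


if __name__ == "__main__":
    version = docker_version()
    print(version if version else "Docker does not appear to be installed.")
```

This only confirms that the client is installed; it does not tell you whether the Docker service has been started.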

Now that the Docker service is running on our RedHat 8 system, we need to pull the CentOS image from Docker Hub.

To pull any image, use docker pull <IMAGE_NAME>:<TAG>. Here, that is docker pull centos:latest

The latest CentOS image has now been downloaded, and we can create a container from it.

To create the container, use the command below:

docker run -it --name os1 centos:latest
  • The -it option combines -i (interactive) and -t (allocate a terminal). Together they let us interact with the OS through a shell.

We now have a new container named os1, created from the centos:latest image. Next, we need to install Python 3 inside it so that we can install various machine learning libraries.

To do this, run the command yum install python3-pip

We often need to clear the screen, but by default this image does not include the clear command. We can ask yum which package provides it.

Run the command yum whatprovides clear, then install the package it reports.

Similarly, we will need the ifconfig command to check the IP address later. It is provided by the net-tools package, which you can install with yum install net-tools -y.

Now install the pandas library so that we can load the dataset. Run the command pip3 install pandas

This also installs numpy, which pandas depends on. Next, we need the scikit-learn library, which provides functions to create ML models.

pip3 install scikit-learn
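Before moving on, it can help to confirm that the libraries are actually importable inside the container. The sketch below is one way to do that; the helper name missing_packages is hypothetical, and the list of required packages is an assumption based on what the scripts later in this article import.

```python
import importlib.util


def missing_packages(names):
    """Return the subset of `names` that the import system cannot find."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # scikit-learn is imported as `sklearn`, not `scikit-learn`.
    required = ["pandas", "numpy", "sklearn", "joblib"]
    missing = missing_packages(required)
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All required packages are installed.")
```

The check only looks the packages up; it does not import them, so it runs quickly even for large libraries.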

Our base environment is now ready. Next, we need to get the dataset inside the Docker container.

There are several ways to do this. One of the easiest is to upload the dataset to GitHub and download it inside the Docker container using the git clone command.

Here, my dataset is on my Windows machine, so we first have to transfer the dataset file from Windows to the RHEL8 VM. To do this, install WinSCP.

Provide the IP address of the RHEL8 VM as the hostname (check it using the ifconfig command), then provide the username and password.

Now just drag and drop the file. The left side is your Windows machine and the right side is the RHEL8 VM.

Now we have the file on the Linux system at /root/.

With the dataset on our Linux system, we can transfer it to the Docker container using the command below:

docker cp <SOURCEFILE_PATH> <CONTAINER_NAME>:<DESTINATION_PATH>

SOURCEFILE_PATH: Path to the file inside your base OS, i.e. RHEL8 here.
CONTAINER_NAME: Name of the container to which you want to transfer the file.
Note: The container must already exist; docker cp works whether it is running or stopped.
DESTINATION_PATH: Path inside the Docker container to which you want to copy the file from the base OS.

Here, our SOURCEFILE_PATH is /root/salary.csv

First, let us create a workspace in our Docker container.

mkdir /root/salaryApp/

We want to transfer the file to /root/salaryApp inside the os1 container, so CONTAINER_NAME is os1 and DESTINATION_PATH is /root/salaryApp

Go to your base OS, open a new terminal window, and run the command below.

docker cp /root/salary.csv os1:/root/salaryApp/
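If you copy files into containers often, the same command can be wrapped in a small Python helper on the base OS. This is only a sketch: the function names build_docker_cp and copy_to_container are hypothetical, and it assumes the docker CLI is installed and the target container exists.

```python
import subprocess


def build_docker_cp(source_path, container, dest_path):
    """Build the argument list for `docker cp SOURCE CONTAINER:DEST`."""
    return ["docker", "cp", source_path, "{}:{}".format(container, dest_path)]


def copy_to_container(source_path, container, dest_path):
    """Run docker cp; raises CalledProcessError if the copy fails."""
    cmd = build_docker_cp(source_path, container, dest_path)
    # Passing a list (not a shell string) avoids quoting problems with paths.
    subprocess.run(cmd, check=True)
```

Calling copy_to_container("/root/salary.csv", "os1", "/root/salaryApp/") from the base OS would run the same command as the one shown above.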

Now we have our dataset in our workspace inside the Docker container.
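It is worth confirming that the copied file has the columns the training script expects. The sketch below checks only the header row using Python's built-in csv module. The helper name missing_columns is hypothetical, and the column names YearsExperience and Salary are the ones used by the training script in this article.

```python
import csv
import io

REQUIRED_COLUMNS = {"YearsExperience", "Salary"}


def missing_columns(csv_file, required=REQUIRED_COLUMNS):
    """Return the required column names absent from the CSV header row."""
    reader = csv.reader(csv_file)
    header = next(reader, [])  # an empty file yields an empty header
    return sorted(set(required) - {name.strip() for name in header})


if __name__ == "__main__":
    # An in-memory sample stands in for salary.csv; the values are made up.
    sample = io.StringIO("YearsExperience,Salary\n1.0,40000\n2.0,45000\n")
    print(missing_columns(sample))  # prints []
```

To check the real file, open it with open("/root/salaryApp/salary.csv", newline="") and pass the file object to missing_columns.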

Now it's time to create a Python script that trains our model and saves it in our workspace.

Create the file using vi. From /root/salaryApp, run the command vi main.py

import joblib
import pandas as pd
from sklearn.linear_model import LinearRegression

# Load the dataset (salary.csv must be in the current directory)
data = pd.read_csv("salary.csv")
print("Dataset has been loaded ...")

# Create the feature matrix (2-D) and the target (1-D)
feature = data[["YearsExperience"]]
target = data["Salary"]

# Train a linear regression model
model = LinearRegression()
model.fit(feature, target)
print("Model has been created ...")

# Save the trained model to the workspace
joblib.dump(model, "salary_model.pkl")
print("MODEL SAVED SUCCESSFULLY IN WORKSPACE ...")

When we run main.py, the model is trained and saved in our workspace, i.e. /root/salaryApp

Let us check whether it works. Run the file using the command python3 main.py
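To get a feel for what model.fit computes in main.py, here is a small sketch of ordinary least squares for a single feature, written in plain Python. It is not scikit-learn's implementation, the function name fit_line is hypothetical, and the data points are invented for illustration.

```python
def fit_line(xs, ys):
    """Ordinary least squares for one feature: returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # slope = sum of co-deviations of x and y / sum of squared deviations of x
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    # The fitted line always passes through the point (mean_x, mean_y).
    intercept = mean_y - slope * mean_x
    return slope, intercept


if __name__ == "__main__":
    # Made-up points lying exactly on salary = 5000 * years + 30000.
    years = [1.0, 2.0, 3.0, 4.0]
    salary = [35000.0, 40000.0, 45000.0, 50000.0]
    slope, intercept = fit_line(years, salary)
    print(slope, intercept)  # prints 5000.0 30000.0
```

For a model trained on one feature, as in main.py, the slope corresponds to model.coef_[0] and the intercept to model.intercept_.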

The model has now been created and saved in this directory. To predict the salary for a given number of years of experience, create a file predict.py in the same workspace.

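import joblib
import pandas as pd

# Load the model saved by main.py
model = joblib.load("salary_model.pkl")

# Predict; float() also accepts fractional years such as 2.5
exp = float(input("Enter years of experience: "))
# Use the same column name as in training so scikit-learn sees matching feature names
pred = model.predict(pd.DataFrame({"YearsExperience": [exp]}))
print("Expected Salary is", round(pred[0], 2), "INR.")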

Now we can run this script with python3 predict.py. It first loads the model, then predicts the salary for the years of experience we provide.
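The script will crash with a traceback if the user types something that is not a number. One possible way to handle that is to validate the input first, as in the sketch below. The helper name parse_experience is hypothetical, and rejecting negative values is an assumption about what counts as valid input.

```python
import math


def parse_experience(text):
    """Convert user input to a non-negative number of years, or raise ValueError."""
    years = float(text.strip())  # float() raises ValueError on non-numeric text
    # Reject values such as "nan", "inf", and negative numbers.
    if not math.isfinite(years) or years < 0:
        raise ValueError("Years of experience must be a non-negative number.")
    return years
```

In predict.py, this could replace the direct conversion: exp = parse_experience(input("Enter years of experience: ")), wrapped in a try/except ValueError block that prints a friendly message.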

We have now successfully trained a model and made a prediction inside a Docker container.

Thanks for reading 😃
