Integrating LVM with Hadoop and providing Elasticity to DataNode Storage
Increase the size of Data Node on the fly.
The prerequisites for this blog are a basic knowledge of Hadoop, how to set up a simple cluster, and Linux partitioning.
I already have my hdfs-site.xml and core-site.xml files configured and a Hadoop cluster running on VirtualBox. Below is an image of the DataNode running.
Below is an image of the NameNode, i.e. the master, running.
Below is an image of the initial cluster with a size of 10GB, as my DataNode contributes 8GB from its root “/” folder.
Now, attach two hard disks to the DataNode so that we can create a Volume Group (VG). A VG is like a virtual hard disk created by merging two or more Physical Volumes (PVs), where a PV is an actual hard disk connected to the DataNode's OS.
We can check all attached hard disks with the “fdisk -l” command.
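As a quick sanity check, the newly attached disks should show up as block devices. A minimal sketch, assuming the two disks appear as /dev/sdb and /dev/sdc (device names can differ on your VM):

```shell
# List all attached disks and their sizes
fdisk -l | grep '^Disk /dev/sd'

# lsblk gives a more compact tree view of disks, partitions and mount points
lsblk -o NAME,SIZE,TYPE,MOUNTPOINT
```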
Now that we have two hard disks of 20GB and 30GB, we can merge them logically using LVM.
Step 1 is to create a Physical Volume on each attached hard disk, using the commands below.
pvcreate /dev/sdb
pvcreate /dev/sdc
You can also display information about a created PV with pvdisplay /dev/sdb.
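The PV creation and verification can be sketched as one short sequence, assuming the disks are /dev/sdb (20GB) and /dev/sdc (30GB):

```shell
# Initialize both disks as LVM physical volumes
pvcreate /dev/sdb
pvcreate /dev/sdc

# One-line summary per PV; pvdisplay /dev/sdb gives full details for a single PV
pvs
```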
Once we have the physical volumes, we need to create a Volume Group (VG). A VG is like a virtual hard disk with the combined space of both PVs created above. Below is the command to create the VG from the two PVs.
vgcreate elastichadoop /dev/sdb /dev/sdc
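After creating the VG, vgdisplay lets us confirm its total size. A sketch:

```shell
# Create the volume group from both physical volumes
vgcreate elastichadoop /dev/sdb /dev/sdc

# Verify: VG Size should be roughly 50GB (20GB + 30GB)
vgdisplay elastichadoop
```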
Now that we have a virtual hard disk in the form of a VG, we can create partitions from it, then format and mount them. A partition created this way is known as a Logical Volume (LV): since it is not carved directly from a physical disk, it is logical. Here, I have created a 25GB partition and left the remaining 25GB unallocated. Below is the command to create the LV.
lvcreate --size 25G --name elasticlv1 elastichadoop
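A quick way to confirm the LV was carved out correctly:

```shell
# Carve a 25GB logical volume out of the 50GB volume group
lvcreate --size 25G --name elasticlv1 elastichadoop

# Verify: elasticlv1 should appear under the VG with LSize 25.00g
lvs elastichadoop
```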
After creating the LV, we need to format it.
To format: mkfs.ext4 /dev/elastichadoop/elasticlv1
To mount, we use the folder where the DataNode directory is created. Here, my DataNode folder is /dn1.
To mount: mount /dev/elastichadoop/elasticlv1 /dn1
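Putting the format and mount steps together, with a df check to confirm /dn1 now sits on the new LV (this assumes /dn1 is the directory set as dfs.datanode.data.dir in hdfs-site.xml, as in this setup):

```shell
# Format the LV with ext4 and mount it on the DataNode directory
mkfs.ext4 /dev/elastichadoop/elasticlv1
mkdir -p /dn1
mount /dev/elastichadoop/elasticlv1 /dn1

# Verify: /dn1 should report roughly 25GB of capacity
df -h /dn1
```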
Now that we have 25GB of storage mounted, if we check the Hadoop cluster size, it will be 25GB.
Put some data on the cluster so that we can verify the data remains intact after increasing the cluster size.
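For example, a small test file can be written into HDFS so we can check it again later (the file name and path here are purely illustrative):

```shell
# Create a sample file and copy it into HDFS (illustrative name)
echo "elasticity test data" > /tmp/sample.txt
hdfs dfs -put /tmp/sample.txt /
hdfs dfs -ls /
```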
Now, if we have a use case that requires increasing the size of the DataNode on the fly, i.e. without shutting down the cluster, we can achieve this by extending the LV.
To extend the LV size, use the command lvextend --size +10G /dev/elastichadoop/elasticlv1
After extending the LV, the filesystem must also be resized. Here we face the major challenge: if we simply reformat the partition, the existing data will be wiped, so instead we use resize2fs.
resize2fs extends the existing ext4 filesystem over the newly added space without touching the data already on it. Here, we have 25GB of formatted space and 10GB of extra, unformatted space, so resize2fs brings only that extra 10GB into the filesystem.
Use the command: resize2fs /dev/elastichadoop/elasticlv1
And we can see that, on the fly, the size of the Hadoop cluster increased to 35GB.
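The whole on-the-fly grow therefore takes just two commands, with no unmount and no DataNode restart:

```shell
# Grow the LV by 10GB from the VG's unallocated space
lvextend --size +10G /dev/elastichadoop/elasticlv1

# Grow the ext4 filesystem online to fill the enlarged LV
resize2fs /dev/elastichadoop/elasticlv1

# Verify: /dn1 should now report roughly 35GB
df -h /dn1
```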
We can also check whether our data is preserved.
Here, we can see that we increased the storage of the DataNode on the fly without disturbing the data.
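The same can be confirmed from the HDFS side:

```shell
# Reported capacity for this DataNode should now show ~35GB
hdfs dfsadmin -report

# Files written earlier should still be listed intact
hdfs dfs -ls /
```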
Now, we can also decrease the storage of the DataNode as per any use case.
Note: to decrease storage, we have to take the folder offline (unmount it) so that users don't face any issues.
There are 5 steps to follow to decrease the storage of the DataNode.
Step 1: Take the folder offline, i.e. unmount it from the LV.
Use the command: umount /dn1
Step 2: Scan the filesystem on the LV.
Use the command: e2fsck -f /dev/elastichadoop/elasticlv1
Step 3: Shrink the filesystem to the target size with resize2fs.
Use the command: resize2fs /dev/elastichadoop/elasticlv1 30G
Here 30G is used because I need to reduce the cluster size by 5GB, making it 30GB, so the filesystem must first be shrunk to 30GB. The freed space can then be released with lvreduce in the next step. We always have to keep in mind what is more important, data or storage: if we need the data, we can't reduce the storage below the space it occupies.
Step 4: Reduce the LV size with lvreduce.
Use the command: lvreduce --size 30G /dev/elastichadoop/elasticlv1
Step 5: Finally, mount the LV back on the DataNode directory.
Use the command: mount /dev/elastichadoop/elasticlv1 /dn1
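The five steps, in order, can be sketched as one sequence. Note that shrinking is destructive if done wrong: always shrink the filesystem before the LV, and never below the space the data occupies.

```shell
# Step 1: take the DataNode directory offline
umount /dn1

# Step 2: check the filesystem before resizing (required by resize2fs)
e2fsck -f /dev/elastichadoop/elasticlv1

# Step 3: shrink the ext4 filesystem to the target size first
resize2fs /dev/elastichadoop/elasticlv1 30G

# Step 4: then shrink the LV to match (lvreduce asks for confirmation)
lvreduce --size 30G /dev/elastichadoop/elasticlv1

# Step 5: remount the LV on the DataNode directory
mount /dev/elastichadoop/elasticlv1 /dn1
```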
Finally, we have to start the Hadoop service again, and we can see that the storage has decreased from 35GB to 30GB.
Also, the data in the cluster is not altered.
If you have any suggestions, please feel free to connect with me and put the suggestions in the comment section.