Integrating LVM with Hadoop and providing Elasticity to DataNode Storage

Tirth Patel
5 min read · Oct 28, 2020

Increase the size of DataNode storage on the fly.

The prerequisites for this blog are a basic knowledge of Hadoop, knowing how to set up a simple cluster, and familiarity with partitions in Linux.

I already have my hdfs-site.xml and core-site.xml files configured and a Hadoop cluster running on VirtualBox. Below is an image of the DataNode running.

Datanode

Below is an image of the NameNode, i.e. the master, running.

NameNode
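For reference, here is a minimal sketch of those two configuration files. This assumes a Hadoop 1.x-style layout (in Hadoop 2/3 the files live under etc/hadoop and the property names are fs.defaultFS and dfs.datanode.data.dir); the NameNode IP and port below are placeholders for your own setup.

```
# On the DataNode: point HDFS at the /dn1 directory used throughout this post
cat > $HADOOP_HOME/conf/hdfs-site.xml <<'EOF'
<configuration>
  <property>
    <name>dfs.data.dir</name>
    <value>/dn1</value>
  </property>
</configuration>
EOF

# On every node: tell Hadoop where the NameNode (master) listens
cat > $HADOOP_HOME/conf/core-site.xml <<'EOF'
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://192.168.56.101:9001</value>
  </property>
</configuration>
EOF
```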

Below is an image of the initial cluster with a size of 10GB, as my DataNode has 8GB in its root ("/") partition.

initial cluster

Now, attach two hard disks to the DataNode so that we can create a Volume Group. A Volume Group (VG) is like a virtual hard disk created by merging two or more Physical Volumes (PVs). A PV is an actual hard disk connected to the DataNode's OS.

We can check all the attached hard disks with the fdisk -l command.

All H/D attached
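Two quick ways to list the attached disks (the new ones typically show up as /dev/sdb and /dev/sdc, but the names depend on the order in which VirtualBox attaches them):

```
# Detailed disk and partition listing
fdisk -l

# Compact tree of block devices with sizes and mount points
lsblk
```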

Now that we have two hard disks of 20GB and 30GB, we can merge them logically using LVM.

Step 1 is to create a Physical Volume from each attached hard disk. We can achieve this with the commands below.

pvcreate /dev/sdb and pvcreate /dev/sdc

You can also display information about a created PV with pvdisplay /dev/sdb.

Creating Physical Volume
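Putting the step together, with a quick verification (pvdisplay and pvs are standard LVM reporting commands):

```
# Initialize both attached disks as LVM Physical Volumes
pvcreate /dev/sdb
pvcreate /dev/sdc

# Verify: per-PV details, or a compact summary of all PVs
pvdisplay /dev/sdb
pvs
```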

After creating the physical volumes, we need to create a Volume Group (VG). The VG is like a virtual hard disk with the combined space of both PVs created above. Below is the command to create a VG from the two PVs.

vgcreate elastichadoop /dev/sdb /dev/sdc

creating vg
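The same step with verification; the VG Size reported by vgdisplay should be roughly the sum of the two disks, about 50GB here:

```
# Merge the two PVs into a single Volume Group named elastichadoop
vgcreate elastichadoop /dev/sdb /dev/sdc

# Verify: total size, free extents, and member PVs
vgdisplay elastichadoop
```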

Now that we have the VG acting as a virtual hard disk, we can create partitions from it, then format and mount them for use. Such a partition is known as a Logical Volume (LV): since it is not carved directly from a physical disk, it is a logical volume. Here, I have created a 25GB partition, leaving the remaining 25GB unallocated. Below is the command to create the LV.

lvcreate --size 25G --name elasticlv1 elastichadoop

lv created
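As a consolidated sketch (lvdisplay is the standard LVM reporting command for logical volumes):

```
# Carve a 25GB Logical Volume named elasticlv1 out of the elastichadoop VG
lvcreate --size 25G --name elasticlv1 elastichadoop

# Verify: the LV appears as /dev/elastichadoop/elasticlv1
lvdisplay /dev/elastichadoop/elasticlv1
```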

After creating the LV, we need to format it.

To format: mkfs.ext4 /dev/elastichadoop/elasticlv1

For mounting, we need to mount the LV on the directory where the DataNode folder is created. Here, my DataNode folder is /dn1.

To mount: mount /dev/elastichadoop/elasticlv1 /dn1

format and mount the lv
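Both steps together, with a check that the mount landed where the DataNode expects it (mkdir -p is only needed if /dn1 does not already exist):

```
# Format the LV with an ext4 filesystem
mkfs.ext4 /dev/elastichadoop/elasticlv1

# Mount it on the DataNode directory
mkdir -p /dn1
mount /dev/elastichadoop/elasticlv1 /dn1

# Verify: /dn1 should now show about 25GB
df -h /dn1
```

Note that this mount will not survive a reboot unless a matching entry is added to /etc/fstab.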

Now that we have 25GB of storage mounted, if we check the Hadoop cluster size, it will be 25GB.

Updated cluster
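To check the capacity from the command line (on Hadoop 2/3 the equivalent is hdfs dfsadmin -report):

```
# Report configured capacity and per-DataNode usage from the NameNode
hadoop dfsadmin -report
```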

Let's put some data on the cluster so that we can verify that the data remains intact after increasing the cluster size.

put data on cluster and read the data
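For example, with a hypothetical test file:

```
# Upload a file to HDFS and read it back to confirm it is stored
hadoop fs -put testfile.txt /
hadoop fs -cat /testfile.txt
```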

Now, if we have a use case for increasing the size of the DataNode on the fly, that is, without shutting down the cluster, we can achieve this by extending the LV size.

To extend the LV size, use the command lvextend --size +10G /dev/elastichadoop/elasticlv1

extend size

After extending the LV, we also have to extend the filesystem over the new space. Here we face the major challenge: if we simply reformat the partition, it will remove the data already present, so we make use of resize2fs instead.

resize2fs grows the existing ext4 filesystem to cover the newly added space without touching the existing data. Here, we have 25GB of formatted space and 10GB of extra, unformatted space, so resize2fs extends the filesystem over only that new 10GB.

Use the command: resize2fs /dev/elastichadoop/elasticlv1

And we can see that, on the fly, the size of the Hadoop cluster has increased to 35GB.
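The whole on-the-fly grow sequence, assuming the LV stays mounted throughout (growing ext4 online is supported; recent LVM versions can also combine both steps with lvextend -r):

```
# Grow the LV by 10GB while it remains mounted
lvextend --size +10G /dev/elastichadoop/elasticlv1

# Grow the ext4 filesystem over the new space; existing data is untouched
resize2fs /dev/elastichadoop/elasticlv1

# Verify: /dn1 should now show about 35GB
df -h /dn1
```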

We can also check if our data is preserved or not.

no loss of data

Here, we can see that we increased the storage of the DataNode on the fly, without disturbing the data.

Now, we can also decrease the storage of the DataNode if a use case demands it.

Note: To decrease storage, we have to take the folder offline so that users don't face any issues.

There are five steps to follow to decrease the storage of the DataNode; a consolidated sketch follows the steps below.

Step 1: Take the folder offline, i.e. unmount the LV from the folder.

Use the command: umount /dn1

Step 2: Check the filesystem on the LV.

Use the command: e2fsck -f /dev/elastichadoop/elasticlv1

Step 3: Shrink the filesystem to the target size with resize2fs.

Use the command: resize2fs /dev/elastichadoop/elasticlv1 30G

Here 30G is because I need to reduce the size of the cluster by 5GB and make it 30GB, so we shrink the filesystem down to 30GB first. The space freed beyond that can then be removed with lvreduce in the next step. We always have to keep in mind what is more important, data or storage: if we need the data that fills the space, we can't reduce the storage.

Step 4: Reduce the LV size with lvreduce.

Use the command: lvreduce --size 30G /dev/elastichadoop/elasticlv1

reduced size

Step 5: Finally, mount the LV on the DataNode directory.

Use the command: mount /dev/elastichadoop/elasticlv1 /dn1

mounting
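All five steps as one sketch (unlike growing, ext4 cannot be shrunk while mounted, hence the unmount first):

```
# Step 1: take the DataNode directory offline
umount /dn1

# Step 2: check the filesystem; resize2fs insists on this before shrinking
e2fsck -f /dev/elastichadoop/elasticlv1

# Step 3: shrink the filesystem to the target size first
resize2fs /dev/elastichadoop/elasticlv1 30G

# Step 4: then shrink the LV to match (lvreduce asks for confirmation)
lvreduce --size 30G /dev/elastichadoop/elasticlv1

# Step 5: bring the directory back online
mount /dev/elastichadoop/elasticlv1 /dn1
```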

Finally, we have to start the Hadoop service again, and we can see that the storage has decreased from 35GB to 30GB.

Also, the data in the cluster is not altered.

If you have any suggestions, please feel free to connect with me and leave them in the comments section.
