Shared Trauma

Kalle Happonen

Geek. Product Owner @CSCfi

This time I'll write about a problem I did manage to DuckDuckGo, but finding it did little to make me happy once I had debugged it. Our OpenStack version is Icehouse. This issue has been actively debated, and different versions might have different defaults. I'm not completely sure whether Juno handles this correctly for shared storage. We also use libvirt + kvm, and I'm not sure how other hypervisors handle this.

It started when some machines had problems rebooting. The filesystem seemed to be corrupted. The instances' console logs had errors like these in them:

block I/O error in device 'drive-virtio-disk0': Stale file handle (116)
block I/O error in device 'drive-virtio-disk0': Operation not permitted (1)

We run the OpenStack virtual machines on NFS. It was a nice and easy way of getting live migration support. However, it seems that the OpenStack defaults aren't great if you're using shared backend storage, mainly because of one flag:

remove_unused_base_images = True

It becomes a problem when the following default flag is also in use:

use_cow_images = True

First, a very brief rundown of how virtual machines boot from a glance image. When a virtual machine is scheduled to a compute host, the compute host needs to get the image for it from glance. First it checks whether the image is already cached. If not, it downloads the image to the _base directory under the instance storage directory. Then it converts the image from whatever format it happened to be in to raw. Next, if use_cow_images is set, a copy-on-write image is created for the virtual machine. This is basically a live diff between the base image and whatever the virtual machine does with its data. It makes booting virtual machines really fast.
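
To make that flow concrete, here is a minimal shell sketch of it. It is an illustration, not nova's actual code: the function names are made up, the fetch command is a hypothetical stand-in for "download from glance and convert to raw", and the file naming (SHA-1 of the image ID) is the scheme described further down in this post.

```shell
# Illustration only: mimics the cache-then-overlay flow, not nova's real code.

# Path of the cached base image: <instances dir>/_base/<sha1 of image id>.
base_path() {
  printf '%s/_base/%s\n' "$1" "$(printf '%s' "$2" | sha1sum | cut -d ' ' -f 1)"
}

# Fetch into the cache only on a miss. "$3" is a hypothetical command that
# downloads image "$2" and writes it in raw format to the path it is given.
ensure_base_image() {
  base=$(base_path "$1" "$2")
  mkdir -p "$1/_base"
  [ -e "$base" ] || "$3" "$2" "$base"
}

# With use_cow_images, the instance disk is a qcow2 overlay on the base image.
# Arguments: instances dir, image id, instance directory name.
create_cow_disk() {
  mkdir -p "$1/$3"
  qemu-img create -f qcow2 \
    -o "backing_file=$(base_path "$1" "$2"),backing_fmt=raw" "$1/$3/disk"
}
```

The overlay only records a path to its backing file, which is why the instance disk is useless once the file in _base is gone.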

In time these base images might fill up the disk, and that's where the remove_unused_base_images flag comes in. It tells nova to periodically check whether there are base images that are not used by any virtual machines, and to remove them. Each compute node makes this decision based only on its own virtual machines. This is bad if it isn't the only node using those base images. Very bad. There will be instances running on other nodes with their base images deleted.
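
Since the flag defaults to True, a node is exposed unless its nova.conf explicitly says otherwise. A small sketch of a check for that, assuming the option is set on a line of its own. The function name and the /etc/nova/nova.conf path in the usage note are assumptions, not something from our setup.

```shell
# Hypothetical check: succeeds only if the given config file explicitly
# sets remove_unused_base_images to False (commented-out lines do not count).
base_image_cleanup_disabled() {
  grep -Eiq '^[[:space:]]*remove_unused_base_images[[:space:]]*=[[:space:]]*false[[:space:]]*$' "$1"
}
```

Something like `base_image_cleanup_disabled /etc/nova/nova.conf || echo "cleanup still enabled on $(hostname)"` could then be run on every compute node.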

The immediate impact of this is not as catastrophic as it sounds. Deleting the base image file means it can't be seen anymore, but it is only really removed once all handles to it are closed. So the other virtual machines keep chugging along happily, since they have a handle open to the base file. Until you reboot them. Or do anything else that shuts them down and starts them again. That's when kvm (or qemu?) opens the virtual machine image and then tries to open the deleted base image, with disappointing results.
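
The "deleted but still open" behaviour is easy to see in a shell. This is a small demonstration on a local filesystem, with a temporary file standing in for the base image. Across NFS clients the details can differ, so treat it as an illustration of the idea only.

```shell
# A file that is removed while a handle is open stays readable via that handle.
demo_dir=$(mktemp -d)
echo "base image data" > "$demo_dir/base"

exec 3< "$demo_dir/base"     # open a read handle, like a running qemu process
rm "$demo_dir/base"          # the name disappears from the directory...
[ ! -e "$demo_dir/base" ] && echo "name removed"

still_readable=$(cat <&3)    # ...but the data is still reachable through fd 3
echo "via open handle: $still_readable"

exec 3<&-                    # closing the last handle really frees the data
rmdir "$demo_dir"
```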

If you still have the base image in glance, fixing this isn't too bad. First disable the remove_unused_base_images flag on the nodes. Then check which base images are missing. Running this in the shared images directory on a node should tell you (with this you'll get duplicates):

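for image in */disk; do
  # Take only the "backing file:" line and keep just the path
  bf=$(qemu-img info "$image" | sed -n 's/^backing file: \([^ ]*\).*/\1/p')
  # Report backing files that no longer exist (skip disks without one)
  [ -n "$bf" ] && [ ! -e "$bf" ] && echo "$bf is missing"
done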

The names of the base images are sha1sum hashes of the glance image ids, so correlating those should tell you which images you can still replace.

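# Image ids: second column of the table, skipping borders and the header row
ids=$(nova image-list | grep -v -e '+------' -e Status | cut -f 2 -d "|")
for id in $ids; do
  # The _base file name is the SHA-1 of the id, hashed without a trailing newline
  hash=$(printf '%s' "$id" | sha1sum | cut -d ' ' -f 1)
  echo "id $id = shasum $hash"
done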
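
The correlation itself can also be scripted. A minimal sketch, assuming the output of the missing-image check was saved to a file and the image ids are passed as arguments. Both function names are hypothetical.

```shell
# Hypothetical helper: _base file name for a glance image id.
base_name_for_id() {
  printf '%s' "$1" | sha1sum | cut -d ' ' -f 1
}

# Hypothetical helper: first argument is a file holding the "... is missing"
# lines, the remaining arguments are image ids that still exist in glance.
# Prints the ids whose base image is missing and can therefore be restored.
restorable_ids() {
  missing_file=$1
  shift
  for id in "$@"; do
    name=$(base_name_for_id "$id")
    if grep -qF -- "$name" "$missing_file"; then
      echo "$id -> $name"
    fi
  done
}
```

Missing base images that match none of the ids are the ones no longer in glance.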

Then you download the image, convert it to raw with this command, copy it to the _base directory, and change its ownership (nova:qemu probably?).

qemu-img convert -O raw <glance-image> <sha1sum-name>
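
Put together, the restore could look roughly like the sketch below. It is untested against a real cloud and rests on assumptions: the default _base path, the nova:qemu ownership guessed above, a glance client that supports image-download --file, and the DRY_RUN and WORK_DIR variables, which are made up here so the commands can be reviewed before anything is touched.

```shell
# Print the command when DRY_RUN=1, otherwise execute it.
run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then echo "$*"; else "$@"; fi
}

# Arguments: glance image id, _base directory (assumed default), owner (a guess).
restore_base_image() {
  image_id=$1
  base_dir=${2:-/var/lib/nova/instances/_base}
  owner=${3:-nova:qemu}
  work=${WORK_DIR:-/tmp}
  name=$(printf '%s' "$image_id" | sha1sum | cut -d ' ' -f 1)

  run glance image-download --file "$work/$image_id" "$image_id" &&
  run qemu-img convert -O raw "$work/$image_id" "$work/$name" &&
  run cp "$work/$name" "$base_dir/$name" &&
  run chown "$owner" "$base_dir/$name"
}
```

Running `DRY_RUN=1 restore_base_image <image-id>` first prints the four commands without executing them.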

The problem with this is that the base image might have been removed a long time ago, and the glance image might not exist anymore. Luckily, in our case we could recreate the machines whose base images were missing from glance too. If you can't, and you happen to be in the same unfortunate situation, then maybe one of these will help (all of these are untested):

Check if you have backups of the images
Attach a volume to the vm, dd sda to the volume and try to boot from it?
Do some freaky stuff with the open file descriptors of the process to copy the base file?
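
For the last idea, Linux exposes a process's open files under /proc/<pid>/fd, and a deleted file can be copied out through that path for as long as the process keeps it open. The sketch below shows the mechanism on a local filesystem. Like the ideas above it is untested against a real qemu process on NFS, where deleted files may show up differently, and the function names are hypothetical.

```shell
# List the descriptors of a process that point at deleted files.
list_deleted_fds() {
  for fd in /proc/"$1"/fd/*; do
    target=$(readlink "$fd") || continue
    case $target in
      *' (deleted)') echo "${fd##*/} $target" ;;
    esac
  done
}

# Copy the contents behind descriptor "$2" of process "$1" to file "$3".
recover_deleted_fd() {
  cp "/proc/$1/fd/$2" "$3"
}
```

With the pid of the qemu process, `list_deleted_fds <pid>` would show the candidate descriptors and `recover_deleted_fd <pid> <fd> <destination>` would copy one out. The virtual machine must keep running while the copy is made.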

Conclusion: Bad OpenStack! Do not delete by default.

January 09, 2015
