Grizzly Fate - Part 1
We recently updated one of our OpenStack installations from Grizzly to Icehouse. Updating OpenStack by one version is tricky at best, and if you try two versions, you're bound to hit some issues. We did.
Here is a short recap of the extra steps we had to take for the update. It does not include basic tasks like updating the packages or stamping the databases. I have also left out a lot of preparation and testing. For example, we took a dump of the running database and tried all the database update steps in a separate virtual machine several times, until we got the process working.
Since you can run OpenStack in a million different ways (probably literally), here are our starting and ending setups.
Start:
- OpenStack RDO Grizzly
- Quantum with linuxbridge + vlans
- Manually installed
End:
- OpenStack RDO Icehouse
- Neutron with ml2 plugin and linuxbridge + vlans
- Puppet/Ansible managed HA setup
It was a somewhat ambitious jump to go from Quantum to the two-versions-newer Neutron and ml2 at the same time. The update was a fork-lift upgrade: we reinstalled everything on new virtual machines (except the compute nodes) to get a clean environment.
The first issue we hit was that RDO is packaged so that nova only comes with database migration scripts from the previous version to the current version. For example, Icehouse comes with Havana -> Icehouse database migration scripts, but not with Grizzly -> Havana. So we had to find a Havana version of
/usr/lib/python2.6/site-packages/nova/db/sqlalchemy/migrate_repo/versions/
That done (find the Havana RPM, unpack it), the
185_rename_unique_constraints.py
broke on our systems. I'm not sure why, but it tried to work on constraints that weren't there. After fixing the constraints manually (see the query sketch below for checking what actually exists), it worked. Now we could comfortably run first the Grizzly -> Havana update and then the Havana -> Icehouse one.
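To compare what the script expected against what was actually in the database, queries along these lines help. This is just a sketch assuming a MySQL/MariaDB backend; instances is only an example table name, substitute whatever table the error message names.
use nova;
-- show the indexes and constraints that actually exist on the table
show create table instances;
-- or list them via information_schema
select constraint_name, constraint_type from information_schema.table_constraints
where table_schema="nova" and table_name="instances";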
The tricky part was the Quantum/Neutron database upgrade. The database migration scripts themselves worked fine. However, Havana introduced a new table called portbindingports. The problem is that the table starts out empty. In our case, since we never started a non-ml2 Neutron version that would have populated it, the table stayed empty. The ml2 migration script relies on portbindingports being correctly populated: it uses the table to populate the ml2_port_bindings table.
During an update of another system, we noticed that an incorrectly populated ml2_port_bindings table can cause bigger issues. In our case nova-compute didn't start: any nova-compute that hosted virtual machines with ports missing from ml2_port_bindings failed to start with the following message.
NovaException: vif_type parameter must be present for this vif_driver implementation
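A quick way to spot affected ports is to look for ports that have no binding row at all. This is only a sketch, assuming the Icehouse ml2 schema where ml2_port_bindings.port_id references ports.id:
use neutron;
-- ports that ended up without an ml2 binding row (assumes ml2_port_bindings.port_id)
select ports.id, ports.device_id, ports.device_owner
from ports left join ml2_port_bindings on ports.id=ml2_port_bindings.port_id
where ml2_port_bindings.port_id is null;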
In the previous system we hacked together an ad-hoc fix, but in this upgrade we could plan better.
To populate the portbindingports table, we ran the following SQL magic. This was done after the Grizzly -> Icehouse database update, but before running the ml2 migration script.
use neutron;
-- bind each compute port to the host where its nova instance runs
insert into portbindingports
  select ports.id, nova.instances.host from ports
  join nova.instances on ports.device_id=nova.instances.uuid
  where device_owner like "compute%";
-- bind the remaining active (non-compute) ports to the host pouta-net
insert into portbindingports
  select ports.id, "pouta-net" from ports
  where status!="DOWN" and device_owner not like "compute%";
After doing these things, we had successfully updated the databases. Or so we thought... But more about that in the next post.