Yay, backstab hit again

What a nice morning to get to work… one OpenStack cloud is unreachable via both Horizon and the monitoring systems.

After a quick check it seemed that Keystone was not authenticating users at all. What the heck? Yesterday evening everything was fine. Checking keystone.log revealed a deadlock in the database:

2016-06-30 00:22:33.585 9974 ERROR keystone DBDeadlock: (_mysql_exceptions.OperationalError) (1213, 'Deadlock found when trying to get lock; try restarting transaction') [SQL: u'DELETE FROM token WHERE token.expires <= %s'] [parameters: (datetime.datetime(2016, 6, 29, 11, 44, 8),)]

Hmm, that is worrying, but it should not kill the whole system, right? Checking mariadb.log:

160615 2:00:10 InnoDB: ERROR: over 95 percent of the buffer pool is occupied by
InnoDB: lock heaps or the adaptive hash index! Check that your
InnoDB: transactions do not set too many row locks.

and a lot of other nasty things I never want to see in any <whatnot>db.log. Finally, restarting mariadb and httpd cleared the situation and users were able to authenticate again.
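By the way, the post-mortem below is InnoDB's LATEST DETECTED DEADLOCK section; I assume it was pulled the usual way from the client (it can also end up in the error log if innodb_print_all_deadlocks is enabled):

# mysql -e 'SHOW ENGINE INNODB STATUS\G'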

Rigor mortis:
------------------------
LATEST DETECTED DEADLOCK
------------------------
160527 0:36:36
*** (1) TRANSACTION:
TRANSACTION 8739812, ACTIVE 173 sec
mysql tables in use 1, locked 1
LOCK WAIT 66 lock struct(s), heap size 1128888, 7504 row lock(s)
MySQL thread id 607714, OS thread handle 0x7ef9978e3700, query id 436018551 10.39.224.11 keystone_admin updating
DELETE FROM token WHERE token.expires <= '2016-05-26 18:33:05'
*** (1) WAITING FOR THIS LOCK TO BE GRANTED:
RECORD LOCKS space id 0 page no 110419 n bits 72 index `PRIMARY` of table `keystone`.`token` trx id 8739812 lock_mode X locks rec but not gap waiting
*** (2) TRANSACTION:
TRANSACTION 873E4C4, ACTIVE 5 sec updating or deleting
mysql tables in use 1, locked 1
150 lock struct(s), heap size 63928, 7505 row lock(s), undo log entries 1
MySQL thread id 607711, OS thread handle 0x7ef997df7700, query id 436081360 10.39.224.11 keystone_admin updating
DELETE FROM token WHERE token.expires <= '2016-05-26 19:46:25'
*** (2) HOLDS THE LOCK(S):
RECORD LOCKS space id 0 page no 110419 n bits 72 index `PRIMARY` of table `keystone`.`token` trx id 873E4C4 lock_mode X locks rec but not gap
*** (2) WAITING FOR THIS LOCK TO BE GRANTED:
RECORD LOCKS space id 0 page no 106394 n bits 408 index `ix_token_expires_valid` of table `keystone`.`token` trx id 873E4C4 lock_mode X locks rec but not gap waiting
*** WE ROLL BACK TRANSACTION (1)
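Both transactions are the same token_flush DELETE, started over an hour apart and still grinding through overlapping rows. A quick ad hoc way to see how big the expired-token backlog has grown (my own query, not from the logs above):

# mysql -e "SELECT COUNT(*) FROM keystone.token WHERE expires <= NOW();"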

So this has something to do with token cleaning, which is handled by a cron job under the keystone user:


# crontab -u keystone -l
# HEADER: This file was autogenerated at 2016-04-14 10:15:25 +0300 by puppet.
# HEADER: While it can still be managed manually, it is definitely not recommended.
# HEADER: Note particularly that the comments starting with 'Puppet Name' should
# HEADER: not be deleted, as doing so could cause duplicate cron jobs.
# Puppet Name: keystone-manage token_flush
PATH=/bin:/usr/bin:/usr/sbin SHELL=/bin/sh
*/1 0 * * * keystone-manage token_flush >>/dev/null 2>&1

All right, looks OK… wait… */1 0 * * *? That's every minute for the first hour after midnight, what the !"#?
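To spell out the cron fields (the second line being, I presume, what was actually intended):

*/1 0 * * *   every minute while the hour is 0, i.e. 60 runs between 00:00 and 00:59
  1 0 * * *   once per night, at 00:01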
Checking keystone.pp:

if hiera('CONFIG_KEYSTONE_DB_PURGE_ENABLE', false) {
  class { '::keystone::cron::token_flush':
    minute      => '*/1',
    require     => Service['crond'],
    destination => '/dev/null',
  }
  service { 'crond':
    ensure => 'running',
    enable => true,
  }
}

I don't get the rationale behind that, so I changed the minute from */1 to 1 and ran
su -s /bin/sh -c "keystone-manage --debug token_flush" keystone

which removed some amount of expired tokens. By the way, the run took several minutes.
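For reference, the token_flush stanza now looks roughly like this (only the minute changed; on the next run Puppet should regenerate the crontab entry as 1 0 * * *):

class { '::keystone::cron::token_flush':
  minute      => '1',              # once per night instead of 60 times during hour 0
  require     => Service['crond'],
  destination => '/dev/null',
}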

OpenStack Liberty

# rpm -qa | grep openstack-packstack
openstack-packstack-puppet-7.0.0-0.10.dev1682.g42b3426.el7.noarch
openstack-packstack-7.0.0-0.10.dev1682.g42b3426.el7.noarch
Bugzilla ->
