xend: take care of dead qemu-dm process
author Keir Fraser <keir.fraser@citrix.com>
Mon, 26 May 2008 07:53:25 +0000 (08:53 +0100)
committer Keir Fraser <keir.fraser@citrix.com>
Mon, 26 May 2008 07:53:25 +0000 (08:53 +0100)
This patch fixes xend so that when a fatal error occurs (e.g. the
qemu-dm process is killed), it logs an error message, marks the domain
as crashed, and then does what the domain config file specifies for a
crash. Code is added to xend to check the device model (DM) status of
HVM domains every 30 seconds.
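
The status check described above can be sketched in isolation as follows.
This is only an illustration in modern Python, not the code in this patch:
the helper names (stat_is_zombie, read_process_stat, device_model_is_dead)
are hypothetical, and treating a process that no longer exists as dead is
an assumption that goes beyond the patch, which only tests for the 'Z'
state. The sketch matches on the first character of the ps STAT field
because ps can append modifier characters to the state letter.

```python
import subprocess


def stat_is_zombie(stat):
    """Return True if a ps STAT string describes a zombie (defunct) process."""
    # ps may append modifier characters (e.g. "Zs", "Z+"), so test only the
    # first character rather than comparing the whole string to "Z".
    return stat.strip().startswith("Z")


def read_process_stat(pid):
    """Return the ps STAT field for pid, or "" if ps reports nothing."""
    # "stat=" requests the STAT column with an empty header, so no header
    # line has to be skipped when reading the output.
    result = subprocess.run(
        ["ps", "-o", "stat=", "-p", str(int(pid))],
        capture_output=True,
        text=True,
        check=False,
    )
    return result.stdout.strip()


def device_model_is_dead(pid):
    """Hypothetical helper: True if the process is a zombie or is gone.

    Assumption: an empty STAT (no such process) also counts as dead.
    """
    stat = read_process_stat(pid)
    return stat == "" or stat_is_zombie(stat)
```

A caller would pass the pid stored under image/device-model-pid to
device_model_is_dead() on each polling pass and mark the domain as
crashed when it returns True.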

Signed-off-by: Xiaowei Hu <xiaowei.hu@oracle.com>
tools/python/xen/xend/server/SrvServer.py

index 95be674da82912afa5bacc80f214c98216c84f13..5b4a6b0349cd5f79375d7459db0cb542a1405fd6 100644
@@ -44,6 +44,7 @@ import fcntl
 import re
 import time
 import signal
+import os
 from threading import Thread
 
 from xen.web.httpserver import HttpServer, UnixHttpServer
@@ -148,14 +149,27 @@ class XendServers:
 
             # Reaching this point means we can auto start domains
             try:
-                xenddomain().autostart_domains()
+                dom = xenddomain()
+                dom.autostart_domains()
             except Exception, e:
                 log.exception("Failed while autostarting domains")
 
             # loop to keep main thread alive until it receives a SIGTERM
             self.running = True
             while self.running:
-                time.sleep(100000000)
+                # Handle HVM domains whose device model (DM) has died unexpectedly.
+                for item in dom.domains.values():
+                    if item.info.is_hvm():
+                        device_model_pid = item.gatherDom(('image/device-model-pid', str))
+                        dm_stat_cmd = "ps -o stat --no-headers -p"+device_model_pid
+                        dm_stat = os.popen(dm_stat_cmd).readline().rstrip()
+                        if dm_stat == 'Z':
+                            log.warn("Device model for domain " + str(item.domid) + " was killed unexpectedly")
+                            item.info['crashed'] = 1
+                            item.refreshShutdown(item.info)
+                        else:
+                            continue
+                time.sleep(30)
                 
             if self.reloadingConfig:
                 log.info("Restarting all XML-RPC and Xen-API servers...")