VMware vCloud Automation Center -- now known as vRealize Automation -- is designed to help businesses manage their cloud and provision various data center resources. But getting through the deployment process unscathed can be a challenge when all the disparate components won't play nice together.
In this article, we dive into a deployment of vCloud Automation Center (vCAC) to see where things go wrong and how to find the root cause. This article can't solve every possible problem, but it can help you analyze your own deployment. Because the switch to the name vRealize Automation is relatively new, we will continue to refer to the product as vCAC.
Check on the components
With vCAC, it is very important to use the correct versions of all the components. Using the wrong version of SSO, the database server or other modules can lead to broken features or erratic behavior. Check the vCAC Support Matrix document in the documentation section of VMware's website. This is also the place to go when you need more background information on the components described in this article.
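When you compare your environment against the Support Matrix, a small helper can save you from misreading version strings. The following is a minimal sketch. It assumes a Linux shell with GNU coreutils, because it relies on `sort -V`. The function name `version_at_least` is hypothetical, and the version numbers you pass in must come from the Support Matrix and from your own components.

```shell
# Hypothetical helper: version_at_least HAVE NEED
# Succeeds (exit status 0) when version HAVE is equal to or newer than NEED.
# Relies on GNU sort -V, which compares dot-separated fields numerically,
# so 6.10 correctly sorts after 6.9.
version_at_least() {
    local have=$1 need=$2 lowest
    lowest=$(printf '%s\n%s\n' "$need" "$have" | sort -V | head -n 1)
    # If NEED sorts first (or the two are equal), HAVE is at least NEED.
    [ "$lowest" = "$need" ]
}
```

For example, `version_at_least "$installed" "$required" || echo "not supported"`, with both values filled in from your own environment.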
Before diving into the troubleshooting steps, have a look at the components that are involved. A minimal deployment contains these machines and services:
- Identity Appliance or existing vSphere SSO
- vCAC appliance
- Windows IaaS Server
- SQL Database Server
- Active Directory or OpenLDAP
A diagram from the vCAC 6.1 Reference Architecture technical white paper details the relationships among the components. This document is a must-read when implementing this product. This particular diagram is for a minimal deployment, but the reference architecture document also contains an overview of a larger deployment where all components are duplicated for redundancy.
It is also important to know which component runs which part of the product, because that tells you where to look when a particular function fails.
Problems with the login process?
The Identity Appliance or your existing SSO server handles authentication. If you have authentication problems, that's where you need to look. Is SSO still functioning? Can you log in to the vCenter environment with the Web Client?
When you have login problems -- or problems with anything else -- there are a few usual suspects to check. These areas should be configured for the entire vCAC deployment to work correctly, not only the authentication process.
- DNS: Make sure all components have a fully qualified domain name (FQDN) and are registered in your DNS environment.
- Time synchronization: Verify that all servers in your vCAC deployment use the same time source and that they are actively synchronizing with that time source.
- Certificates: All servers should have valid SSL certificates.
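The three checks above can be scripted for the Linux-based components. The sketch below assumes a shell with `getent`, `ntpq` (from ntpd) and `openssl` installed. The function names are hypothetical, and host names such as `vcac.example.com` are placeholders. The Windows servers in the deployment need equivalent checks done with Windows tools.

```shell
# names_include FQDN: reads "address name [alias...]" lines (getent format)
# from stdin and succeeds if any name equals FQDN, ignoring case.
names_include() {
    awk -v want="$1" '
        BEGIN { want = tolower(want) }
        { for (i = 2; i <= NF; i++) if (tolower($i) == want) found = 1 }
        END { exit !found }'
}

# check_dns FQDN: the forward lookup must succeed, and the reverse lookup of
# the returned address must map back to the same FQDN.
check_dns() {
    local fqdn=$1 addr
    addr=$(getent hosts "$fqdn" | awk '{ print $1; exit }')
    if [ -z "$addr" ]; then
        echo "DNS: $fqdn does not resolve"; return 1
    fi
    if ! getent hosts "$addr" | names_include "$fqdn"; then
        echo "DNS: reverse lookup of $addr does not return $fqdn"; return 1
    fi
    echo "DNS: $fqdn -> $addr OK"
}

# check_ntp: succeeds if ntpd has selected a time source
# (ntpq -p marks the selected peer with '*' in the first column).
check_ntp() {
    if ntpq -pn 2>/dev/null | grep -q '^\*'; then
        echo "NTP: synchronized"
    else
        echo "NTP: no selected time source"; return 1
    fi
}

# check_cert FQDN [PORT]: fetches the server certificate, prints its subject
# and expiry date, and fails if it cannot be retrieved or has expired.
check_cert() {
    local fqdn=$1 port=${2:-443} pem
    pem=$(openssl s_client -connect "$fqdn:$port" -servername "$fqdn" \
            </dev/null 2>/dev/null | openssl x509 2>/dev/null)
    if [ -z "$pem" ]; then
        echo "CERT: no certificate from $fqdn:$port"; return 1
    fi
    printf '%s\n' "$pem" | openssl x509 -noout -subject -enddate
    # -checkend 0 exits non-zero if the certificate has already expired.
    printf '%s\n' "$pem" | openssl x509 -noout -checkend 0 >/dev/null
}
```

For example, run `check_dns vcac.example.com` and `check_cert vcac.example.com` for every machine in the component list. Then run `check_ntp` on each Linux component.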
The vCAC Web portal layout
Once you're logged in, the Web interface displays two areas served by two different servers. The area on the left side comes from the vCAC appliance via a Tomcat server. The area on the right side comes from the Internet Information Services (IIS) server that runs on your Windows-based infrastructure as a service (IaaS) server. This server runs IIS for the Model Manager; in a smaller environment, it also runs your Distributed Execution Manager (DEM) Orchestrator and DEM Worker instances.
When the left side loads and the right side gives an HTTP Error 404, look at your Windows IaaS server and verify that IIS is running.
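To see what the IaaS web server returns without going through the portal, you can query it directly. This is a minimal sketch that assumes `curl` is available. The helpers `http_status` and `describe_status` are hypothetical, and the URL you pass in is a placeholder for your own IaaS server.

```shell
# http_status URL: prints the HTTP status code, or 000 if there is no response.
# --insecure skips certificate validation so a certificate problem does not
# hide the HTTP status; check certificates separately.
http_status() {
    curl --silent --insecure --output /dev/null --max-time 10 \
         --write-out '%{http_code}' "$1"
}

# describe_status CODE: turns a status code into a troubleshooting hint.
# IIS sites using Windows authentication typically answer 401 to an
# anonymous request, which still means the web server is up.
describe_status() {
    case $1 in
        000)         echo "no HTTP response: check that the IaaS server is up" ;;
        404)         echo "HTTP 404: verify that IIS is running on the IaaS server" ;;
        2??|3??|401) echo "HTTP $1: the web server is responding" ;;
        *)           echo "HTTP $1: unexpected response, check the IIS logs" ;;
    esac
}
```

Usage: `describe_status "$(http_status https://iaas.example.com/)"`.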
Verify the services
When any of the components in the portal fails, check that all services are registered. To verify this, open the Web management interface on port 5480 of the vCAC appliance.
The Services tab shows the registered services. One exception is the sts-service, which is never listed as registered. When any of the services is not registered or reports a status of FAILED, you can investigate the problem or take a shortcut and restart the services. There is no need to reboot the entire Linux appliance. Unfortunately, there is no feature to restart individual services, so you have to restart the vcac-server service, which restarts all of them. Access the appliance with Secure Shell (SSH) or log in to the local console through the vSphere Web Client, and on the command line execute the following command:
service vcac-server restart
This command stops the vcac-server service and then starts it again. In the next image, you can see that the service was stopped and immediately started again. Note that vCAC starts an instance of a Tomcat server in which all the server components run, and this Tomcat instance takes quite a while to initialize.
It may take up to 15 minutes before the services appear in the Web management interface. If you want to trace what's happening, check the messages log file with this command:
tail -f /var/log/messages
This shows the end of the messages log file and will print any new line that is added to the file on the screen. After a while, all services should be registered. If not, it's time to investigate what's wrong with that service.
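If the full messages log is too noisy to follow, you can filter the live output down to lines that mention errors or warnings. This is a minimal sketch assuming GNU grep, and `problem_lines` is a hypothetical name.

```shell
# problem_lines: copies stdin to stdout, keeping only lines that contain
# "error" or "warn" in any letter case. --line-buffered makes grep print
# each match immediately, which matters when reading from tail -f.
problem_lines() {
    grep --line-buffered -iE 'error|warn'
}
```

Usage: `tail -f /var/log/messages | problem_lines`. The same filter works on any other log file you pipe into it.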
When one of the services is not registered or has failed, open the service status page of your vCAC appliance in your browser. Search the page for the words error or warning to find the lines that contain more information about the service with the problem.
It's impossible to list all possible errors and warnings here, but a little Googling, along with the steps in this article, should help you analyze and correct most issues.
Check other locations for troubleshooting
When all the components appear to be online or when you can't find any errors or warnings, then the next step is to look into the log files. VMware has collected all log locations in its Knowledge Base article 2074803.
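Once you have the log locations from the KB article, a quick count of error and warning lines per file shows where to start reading. This is a minimal sketch, and `count_problems` is a hypothetical helper. No paths are hard-coded, and it only reads logs that are accessible from the machine where you run it. For the IaaS server logs, for example, you would copy them over first.

```shell
# count_problems FILE...: prints "FILE errors=N warnings=M" for each file.
# Matching is case-insensitive; grep -c prints 0 when nothing matches.
count_problems() {
    local f errors warnings
    for f in "$@"; do
        if [ ! -r "$f" ]; then
            echo "$f not readable"; continue
        fi
        errors=$(grep -ci 'error' "$f")
        warnings=$(grep -ci 'warn' "$f")
        echo "$f errors=$errors warnings=$warnings"
    done
}
```

Pass it the log paths from the KB article for your version, then open the files with the highest counts first.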
When all services are running and the entire vCAC deployment is up, other components can still fail, such as an endpoint that cannot be reached or storage that is full. For those types of problems, look for errors and warnings on the Infrastructure tab under Monitoring > Log. This page allows you to search for errors and warnings by keyword and other filters.
The Distributed Execution Status page under Infrastructure > Monitoring can also hold some clues when problems arise. This is where the DEM Orchestrator and DEM Worker instances are listed, along with the status of running workflows.
If a DEM Worker is offline -- and it's the only DEM worker -- none of the workflows will process. If you have multiple DEM Workers, your workflows might still process, but it might take longer for them to finish. If you have a large number of pending workflows, end users might start complaining about delays with requests.