We have FMS HA set up on one federation child, and both of its nodes are listed in the federation.config file on the federation parent server (the Federator); the first two entries (02a and 02b) are the two HA nodes.
The problem was observed in the web console on the Federator: the hosts reporting to 02b were shown as not active, although we knew they were active, and the web console on 02b itself showed all the hosts as active.
When 02a failed over to 02b, the Federator reported in the FMS log that it could not contact 02b on the JNDI port 1099:
2011-07-11 16:20:36.003 WARN [QuartzScheduler.Utility_Worker-2] com.quest.nitro.service.federation.FederationConnectionManager - Unable to connect to the remote server "jnp://foglight-02b:1099". The server may not be running or may not support working with federation servers.
The Federator could not resolve the short hostname (as opposed to the FQDN).
Pinging the short hostname from the Federator failed because the target hostname could not be resolved.
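The name resolution and port check described above can be reproduced from the Federator with a short script. This is only a minimal sketch, not a Foglight tool: the function name check_host is hypothetical, and the script only tests whether the name resolves and whether a TCP connection to the JNDI port (1099, as seen in the log) can be opened.

```python
import socket


def check_host(hostname, port=1099, timeout=3.0):
    """Return (resolved_ip_or_None, port_reachable) for hostname.

    check_host is a hypothetical helper for illustration only.
    """
    # Step 1: name resolution, the part that failed for the short hostname.
    try:
        ip = socket.gethostbyname(hostname)
    except OSError:
        return None, False

    # Step 2: try to open a TCP connection to the JNDI port.
    try:
        with socket.create_connection((ip, port), timeout=timeout):
            return ip, True
    except OSError:
        return ip, False


if __name__ == "__main__":
    # "foglight-02b" is the short hostname taken from the log line.
    print(check_host("foglight-02b"))
```

If the first element of the result is None, the name did not resolve from the machine where the script was run, which matches the failure seen when pinging the short hostname.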
STATUS: Add the fully qualified domain name (FQDN) to the local hosts file of all the target federated boxes.
1) The CAUSE was found using the jmx-console, logged in as user 'foglight'.
2) Scroll down to Domain Name: com.quest.nitro.fed
3) Select service: FederationConnectionManager
4) Scroll down to function java.lang.String diagnosticSnapshotAsString()
and click the INVOKE button. The resulting list showed the truncated hostnames (not FQDNs).
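If the snapshot output is long, the short hostnames can be picked out with a small script. This is a rough sketch under a stated assumption: it assumes the text contains server URLs in the same jnp://host:port form that appears in the FMS log line; the actual format of the diagnosticSnapshotAsString() output is not shown here and may differ. The function name find_short_hostnames is hypothetical.

```python
import re

# Matches URLs such as jnp://foglight-02b:1099 (form taken from the FMS log).
JNP_URL = re.compile(r'jnp://([^:/\s"]+):(\d+)')


def find_short_hostnames(text):
    """Return hostnames from jnp:// URLs in text that contain no dot.

    A hostname without a dot is treated as a short name (not an FQDN).
    """
    short = []
    for host, _port in JNP_URL.findall(text):
        if "." not in host and host not in short:
            short.append(host)
    return short


if __name__ == "__main__":
    sample = 'Unable to connect to the remote server "jnp://foglight-02b:1099".'
    print(find_short_hostnames(sample))
```

Any hostname returned by this check is a candidate for the hosts file fix described under STATUS.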