The cluster module is used to specify cluster behavior. It is configured in the ecelerity-cluster.conf file. The following is the default configuration:
cluster {
  cluster_group = "ec_cluster"
  control_group = "ec_console"
  logs = [
    rejectlog = "/var/log/ecelerity/rejectlog.cluster"
    paniclog = "/var/log/ecelerity/paniclog.cluster"
    mainlog = "/var/log/ecelerity/mainlog.cluster"
    acctlog = "/var/log/ecelerity/acctlog.cluster"
    bouncelog = "/var/log/ecelerity/bouncelog.cluster"
  ]
  Replicate "inbound_cidr" {}
  Replicate "outbound_cidr" {}
  Replicate "outbound_domains" {}
  Replicate "outbound_binding_domains" {}
  Replicate "shared_named_throttles" {}
  # DuraVIP network topology hints
  Topology "10.1.1.0/24" {
    cidrmask = "32"
    interface = "eth1"
  }
}
The following are the configuration options defined within this module:
- arp_all_hosts
-
When plumbing a DuraVIP™, you can either aggressively send out ARP information to ensure that the network knows about the IP address assignment (true) or target the ARP to specific hosts of interest (false).
Default value is true. You may consider changing this to false if your network experiences problems with the burst of ARP traffic around the DuraVIP™ move.
- cluster_group
-
Defines the cluster communication group. Default value is ec_cluster. Under normal circumstances, this option should be left at the default value. The DuraVIP™ system will coordinate IP ownership responsibilities via the cluster_group EVS group.
- control_group
-
Defines the cluster console communication group. Default value is ec_console. Under normal circumstances, this option should be left at the default value. Each node can respond to normal console commands received on the control_group. The cluster manager utilizes this group to issue cluster-wide configuration commands that update and discover changes in configuration information.
- duravip_balance_set_size
-
When balancing DuraVIP™s, how many to process as a batch in response to a balance request.
Clusters with large numbers of DuraVIP™s (especially when they are not explicitly preferenced) will take less time to converge if this number is increased. It is imperative that this number be set consistently across all nodes, as inconsistent values across the nodes will result in a cluster that will not converge (since the nodes will not all agree on the same parameters). Therefore, it is strongly recommended that all the nodes be brought down before changing this option. The value must be greater than 1.
- heartbeat_start_delay
-
How many seconds to wait after startup before the cluster heartbeat is activated. Default value is 15.
- heartbeats_per_sec
-
How often to send a heartbeat. The heartbeat is used to help detect "byzantine" nodes in the cluster. Default value is 1.
- if_check_interval
-
How often to run through a maintenance cycle to make sure that the interfaces plumbed on the system match up to the cluster internal view. Default value is 30.
- if_down_limit
-
As part of the maintenance cycle, when detecting that we need to plumb an IP address, how long to wait before deciding that we should bring it online. This avoids rapid "flapping". Default value is 4.
- log_active_interval
-
This option, along with log_idle_interval, is used to tune centralized logging (logmove). When logmove is actively sending data to the log aggregator, it will sleep for log_active_interval seconds between each segment send. When the job idles (no segments are pending), it will sleep for log_idle_interval seconds before looking for another segment. Default value is 1.
- log_group
-
When this option is enabled, the paniclog messages are broadcast over Spread, using the specified group name. Another Spread-enabled application or the spuser tool can then listen in on paniclog events.
- log_idle_interval
-
Amount of time to sleep before looking for another segment. Default value is 10.
- logs
-
The logs dictionary configures the log file server. In conjunction with aggregated logging, the log file service makes up the clustered logging solution. This service lets subscribers connect to Momentum, request a "replay" of logs since their last checkpoint, and then checkpoint the reader. This is a durable logging mechanism for aggregation. Default values are as follows:
logs = [
  rejectlog = "/var/log/ecelerity/rejectlog.cluster"
  paniclog = "/var/log/ecelerity/paniclog.cluster"
  mainlog = "/var/log/ecelerity/mainlog.cluster"
  acctlog = "/var/log/ecelerity/acctlog.cluster"
  bouncelog = "/var/log/ecelerity/bouncelog.cluster"
]
Each logfile that should be serviced is given a key name and a corresponding local path that should match the path portion of the cluster:// log destination specified in the other loggers throughout your configuration (see the logger sketch after this list). This dictionary defines the logs that the cluster module on the node will tell the log aggregator are available for aggregation. The log aggregator will periodically attempt to connect to the nodes to pull logs. It does this by connecting to the address configured in the ECCluster_Listener stanza on the node. Once connected, it will try to consume records from the jlogs published by the node and write that data to log files on the log aggregator.
- nodeaddr
-
Canonical cluster address for the node.
If not specified, gethostbyname(nodename) is used to determine the address. The address must be routable via the cluster network, and must not be 127.0.0.1.
- nodename
-
Override the node name that is used to canonically identify this cluster node.
The nodename is determined according to the following logic: when ec_ctl runs, it determines the node name (and subcluster) as configured in cluster.boot and exports EC_SUB_CLUSTER and EC_NODE_NAME to the environment. If you do not explicitly configure the nodename option, the cluster module will look for the EC_NODE_NAME environment variable and take that as the value. If EC_NODE_NAME is not set in the environment, it will use the system hostname, truncated at the first '.'. Note also that modules can use the cluster_nodename hook to determine the effective value of the nodename. An override sketch appears after this list.
- unconditional_rebind
-
Whether the full set_binding logic is invoked when assessing messages for internal cluster message moves, or whether to use an optimization that avoids calling out to whatever set_binding logic is in place. Default value is true.
- view_balance_interval
-
How often DuraVIP™ views are subject to balancing. This option is similar to view_mature_time and should not be adjusted without consulting support. Default value is 10.
- view_broadcast_interval
-
When non-zero, how often to speculatively broadcast a view announcement to the cluster. This should not be needed except in rare cases when the cluster does not seem to be in sync with views; only enable it as directed by support. Default value is 0.
- view_mature_time
-
How long a DuraVIP™ view needs to remain unchanged before considering it "mature".
Increasing the value will make the cluster take longer to fully converge and balance DuraVIP™s; reducing the value will make it take less time. This option should not generally need to be altered, but you may consider doing so if the cluster is experiencing instability; it is best to seek advice from support in that case. Default value is 5.
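To illustrate how the logs dictionary relates to the rest of the configuration, the following sketch shows a logger in the main configuration pointed at a cluster:// destination whose path portion matches the mainlog entry above. The mainlog option name and the exact destination syntax are assumptions here, not verbatim defaults; check them against your own configuration.
# Sketch only: the "mainlog" option name and the cluster:// destination
# syntax are assumptions; the path portion matches the logs dictionary
# entry configured in the cluster module above.
mainlog = "cluster:///var/log/ecelerity/mainlog.cluster"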
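Similarly, the nodename and nodeaddr options described above can be set explicitly in the cluster stanza when the values derived from the environment or from gethostbyname() are not appropriate. This is a minimal sketch; the node name and address below are placeholders, not recommended values:
cluster {
  # Placeholder values; nodeaddr must be routable via the cluster network
  # and must not be 127.0.0.1.
  nodename = "mta-node1"
  nodeaddr = "10.1.1.10"
  ...
}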
The replication component of the cluster module is its most powerful and versatile feature. The Replicate directive allows you to apply a sound and efficient replication framework to the data managed within Momentum.
The default ecelerity-cluster.conf file defines the following replication:
cluster {
  ...
  Replicate "inbound_cidr" {}
  Replicate "outbound_cidr" {}
  Replicate "outbound_domains" {}
  Replicate "outbound_binding_domains" {}
  Replicate "shared_named_throttles" {}
  ...
}
The following replication types are supported:
Replication Types
- cache
-
Replicate a data cache across the cluster
In addition to native Momentum data, it is possible to replicate user-controlled data sets as well (such as caches). This provides a transparent and convenient mechanism to cache data at the module level in a medium that is accessible from every node participating in the cluster. A configuration sketch appears after this list of types.
- inbound_cidr
-
Replicate inbound SMTP connections
Such metrics as the number of current connections from a specific netblock are calculated locally by referencing an internal structure called a CIDR tree. By specifying Replicate "inbound_cidr" {}, you tell the clustering subsystem to share all the local information about inbound connections tracked in its CIDR tree with every other node in the cluster (and vice versa). Using this shared information, the replication system will maintain an aggregated "cluster-wide" CIDR tree representing all inbound connections to the cluster. This option is included in the default ecelerity-cluster.conf file.
- outbound_binding_domains
-
Replication in support of cluster_scope_max_outbound_connections
The Replicate "outbound_binding_domains" {} stanza ensures that the cluster_scope_max_outbound_connections option works cluster-wide. This option is included in the default ecelerity-cluster.conf file.
- outbound_cidr
-
Replicate outbound SMTP connections
Similar to Replicate "inbound_cidr" {}, specifying Replicate "outbound_cidr" {} tells the clustering subsystem to share all the local information about outbound connections tracked in its CIDR tree with every other node in the cluster (and vice versa). The same is possible for outbound connections grouped by destination domain via Replicate "outbound_domains" {}. For outbound connections, it may be desirable to be more granular than aggregating on a cluster-wide basis. This option is included in the default ecelerity-cluster.conf file.
file. - outbound_domains
-
Replicate outbound domains
This option is included in the default ecelerity-cluster.conf file. For details, see outbound_cidr.
- OBTM
-
Replicate outbound message throttles
- OBTC
-
Replicate outbound connection throttles
- shared_named_throttles
-
Enable replication of shared throttles
- metrics
-
Maintain cluster-wide time series views for an IP address or CIDR block
- eccmgr_metrics
-
Update the eccmgr but not other nodes
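As noted under the cache type, user-controlled data sets can also be replicated. The following is a minimal sketch of such a stanza, using options described in the Options list below; the object name and size are hypothetical, and the exact stanza a given module expects should be confirmed before relying on it:
Replicate "my_module_cache" {
  # "cache" is the replication type listed above
  type = "cache"
  # hypothetical bound; the default of -1 means the maximum supported integer size
  max_size = 100000
}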
The following options are valid within the Replicate scope:
Options
- parameters
-
Defines which metrics will be replicated across the cluster. Possible values for this option are as follows:
- snapshot
- connect
- delivery
- transfail
- permfail
- reception
- rejection
- audit_series
- gauge_cache
You may define multiple values in the following way: parameters="connect;reception;rejection". There is no default value for this option.
- type
-
Replication type, as listed in Replication Types. There is no default value for this option.
- max_delay
-
Maximum amount of time to buffer replication messages before sending them out. Default value is 1.0.
- max_pending
-
Maximum number of replication messages to buffer before sending them out. Default value is 100.
- max_size
-
Maximum size of a replication cache. This option is only valid for caches. Default value is -1, indicating the maximum supported integer size.
- binding_group
-
This parameter is only valid for the outbound_cidr and outbound_domains replication types. You can define multiple groups using the class option to track based on different binding groups. There is no default value for this option.
- class
-
This option is useful when you need multiple replication objects of the same type. For example, you could create a Replicate object named outbound_cidr_foo for binding group foo, but you would then have to define the class as outbound_cidr so the module knows what type to register it as. There is no default value for this option. In practice this option is rarely used. A sketch of this pattern appears after this list.
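Putting the class and binding_group options together, the example described under class might be written roughly as follows. The object name and binding group are hypothetical, and the max_delay and max_pending values simply restate the documented defaults:
Replicate "outbound_cidr_foo" {
  # register this object as the outbound_cidr replication type
  class = "outbound_cidr"
  # hypothetical binding group whose outbound connections are tracked separately
  binding_group = "foo"
  # documented defaults, shown here only for illustration
  max_delay = 1.0
  max_pending = 100
}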
For additional information about using the Replicate scope, see Data Replication.
The DuraVIP™ featureset maintains the availability of MultiVIP© bindings and listener services on IP addresses despite node failures. Each binding or listener that should be managed in this fashion should be marked with an Enable_Duravip = true option, as shown in the sketch below.
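For example, a binding in the main configuration might be marked roughly as follows; the binding name is a placeholder, and any other options inside the stanza are omitted:
Binding "example_binding" {
  # manage this binding's IP addresses via DuraVIP
  Enable_Duravip = true
  ...
}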
The following is the default DuraVIP™ network topology defined in the ecelerity-cluster.conf file:
cluster {
  ... other options here ...
  # DuraVIP network topology hints
  Topology "10.1.1.0/24" {
    cidrmask = "32"
    interface = "eth1"
  }
}
Because Momentum is responsible for adding and removing the corresponding IP addresses, more information must be known about the IP networks and physical interfaces on which these IPs will reside. Within the cluster module configuration, the following options in the Topology scope provide this additional information:
- interface
-
In the example Topology scope shown previously, 10.1.1.0/24 informs the clustering module that IPs in the specified range will be added to the eth1 ethernet interface.
- cidrmask
-
When bringing an IP address online, you must also know the netmask it will be using. The cidrmask option indicates the number of bits in the netmask for a given IP address. In the example, the IP address should be added with a /32 netmask (i.e., 255.255.255.255). It is most common to add IP aliases with a 255.255.255.255 netmask, but this can vary between operating systems. A sketch showing an additional network appears after this list.
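Networks that will carry DuraVIP™ addresses are each described with a Topology stanza like the one above. As a sketch, a second network served by a different interface could be described as follows; the network, netmask, and interface name are hypothetical:
Topology "10.2.2.0/24" {
  # plumb addresses from this range with a /32 netmask (255.255.255.255)
  cidrmask = "32"
  # physical interface on which these IPs will reside
  interface = "eth2"
}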
For additional details about DuraVIP™, see DuraVIP™: IP Failover.
The cluster module can be controlled and queried through the console. Note: These commands do not execute when issued from within the eccmgr service.
The following cluster commands are available:
- cluster abort
-
Abort the job with the specified id.
- cluster arp show
-
Resolve the MAC addresses of the cluster. Sample output follows.
12:34:15 ecelerity(/tmp/2025)> cluster arp show
10.80.116.204 [00:e0:81:63:5c:e8] 43s
10.80.116.202 [00:30:48:74:28:24] 13s
10.80.116.201 [00:e0:81:63:5d:64] 8s
10.80.117.25 [00:30:48:52:f9:06] 8s
10.80.117.2 [00:00:5e:00:01:0c] 8s
10.80.116.203 [00:30:48:74:29:ee] 8s
- cluster duravip move from_host to_host
-
The only safe way to perform a DuraVIP™ move is to use a broadcast cluster duravip move command from within the eccmgr service. For details, see broadcast cluster duravip move from_host to_host.
- cluster duravip announce view
-
This command announces a view update using the current local view.
Warning
If you modify DuraVIP™ bindings, a possible race condition means that a config reload taking effect on multiple machines at the same time can cause nodes to disagree about who owns which binding. For this reason, it is strongly suggested that you broadcast this command to the cluster by issuing broadcast cluster duravip announce view immediately after config reload. Doing this synchronizes ownership of the bindings and eliminates a possible race condition among the nodes. A sketch of this sequence appears after this command list.
- cluster duravip debug { on | off }
-
Turn DuraVIP™ debugging on or off.
- cluster duravip show
-
Show the current state of DuraVIP™ bindings. Sample output follows.
12:35:44 ecelerity(/tmp/2025)> cluster duravip show
DuraVIP State:
10.80.116.50/flowgomail-hotmail-50/flowgomail-hotmail-51/flowgomail-hotmail-52: (owned,safe) multivip
  [view stable for 5420 seconds] [configuration stable for 5409 seconds]
    labrat-2 multivip
    labrat-1 multivip
  * labrat-4 multivip
10.80.116.53/flowgomail-hotmail-53: (owned,safe) multivip
  [view stable for 5420 seconds] [configuration stable for 5466 seconds]
  * labrat-2 multivip
    labrat-1 multivip
    labrat-4 multivip
10.80.116.54/flowgomail-hotmail-54: (owned,safe) multivip
  [view stable for 5420 seconds] [configuration stable for 5465 seconds]
  * labrat-2 multivip
    labrat-1 multivip
    labrat-4 multivip
...
The asterisk on the left indicates that the current machine owns that particular DuraVIP™.
- cluster duravip show tables
-
Display the DuraVIP™ state tables in XML format.
- cluster help
-
Display the available console commands.
- cluster info
-
Display the current operating status and parameters. Sample output follows.
13:38:31 ecelerity(/tmp/2025)> cluster info
Daemon: 4803 [m:#ec0-22787#labrat-1,s:#ec1-22787#labrat-1]
Control Group: ec_console
Cluster Group: ec_cluster (labrat-4 is leader)
Logger Group: none
- cluster membership
-
Display the current operating status and parameters including the private name of the node, the node name, and the node type. If the node is disconnected, no information is available. Sample output follows.
22:16:53 /tmp/2025> cluster membership
Private Name: #ec0-20768#pono
Node Name: pono
Subcluster: default
Node Type: Momentum
Version: 3.0.10.30663 r(30669)
Address: 10.79.0.20:4802
Groups: ec_console, ec_sc_default, ec_cluster

Private Name: #ec0-08422#uhalehe
Node Name: uhalehe
Subcluster: default
Node Type: Momentum
Version: 3.0.10.30663 r(30669)
Address: 10.79.0.15:4802
Groups: ec_console, ec_sc_default, ec_cluster

Private Name: #barca#barca
Node Name: (Null)
Subcluster: (Null)
Node Type: Manager
Version: 3.0.10.30663 r(30669)
Address: 0.0.0.0:0
Groups: ec_cluster
- cluster nodeaddr
-
Show the cluster protocol service address. Sample output follows.
13:40:17 ecelerity(/tmp/2025)> cluster nodeaddr
10.80.116.201:4802
- cluster nodename
-
Show the name of the local node.
- cluster reset
-
Reset the node cluster membership.
Warning
This command restarts Spread, forcing a new negotiation of DuraVIP™s, and as such should only be used in consultation with support.
- cluster shared list
-
This command displays the currently managed objects. For example:
22:36:50 ecelerity(2025)> cluster shared list
Currently managed objects:
inbound_cidr [lazy, cidrtree w/ snapshots]
outbound_cidr [lazy, cidrtree w/ snapshots]
outbound_domains [lazy, gauge table w/ snapshots]
The name of the shared object type is followed by a description. In all current versions of Momentum, all objects are "lazy", which refers to a network protocol optimization used when serializing the data and sharing it with the cluster. This is followed by an expanded representation of the replication object type: cidrtree, gauge table, and so on.
- cluster shared show obj_name
-
Display the specified shared object. Sample output follows.
15:26:28 ecelerity(/tmp/2025)> cluster shared show inbound_cidr
lazy, snapped cidrtree 'inbound_cidr', view 'global'
- cluster shared show obj_name from node_name
-
Display the specified shared object from the specified node's perspective.
- cluster show logs
-
Show the size, name and location of the cluster logs. Sample output follows.
15:40:34 ecelerity(/tmp/2025)> cluster show logs
mainlog [on disk: 1604005 bytes] [subscriber: 'master' @ 00000000:0000a43a]
paniclog [on disk: 9184 bytes] [subscriber: 'master' @ 00000000:00000059]
rejectlog [on disk: 3950 bytes] [subscriber: 'master' @ 00000000:0000001e]
bouncealllog [on disk: 0 bytes]
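As referenced in the warning under cluster duravip announce view, after a configuration reload that changes DuraVIP™ bindings has taken effect on the nodes, the view announcement should be broadcast from the cluster manager console. This is a sketch only; the prompt shown is illustrative, and output is omitted:
eccmgr> broadcast cluster duravip announce view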