
Performance Tips

Last updated May 2024

Momentum is an exceptionally powerful all-in-one email infrastructure solution. However, the default configuration shipped with the installation bundle is not tuned for maximum throughput in every use case. This chapter provides tips for optimizing Momentum's performance.

CPU Optimization

With the Supercharger licensed feature, Momentum runs multiple event loop schedulers and uses multicore CPUs more efficiently. In this model, it is also possible to assign dedicated event loops, with the desired concurrency, to individual listeners (e.g. the HTTP listener). The default configuration, by contrast, relies solely on thread pools to offload specific tasks, so Momentum keeps running on the original master event loop only and can occasionally become bottlenecked there.

The Supercharger's "75% of CPU cores" formula works well on systems that are largely SMTP-driven. For systems with larger message generation flows (i.e., REST injections), the number of event loops can be limited to 4 or 5, with higher concurrency values assigned to the msg_gen thread pools (see the gen_transactional_threads configuration). For instance:

msg_gen  {
  (...)
  gen_transactional_threads = 4
}

Also, the CPU thread pool handles many of the functions in the REST flows, so it is recommended to increase its concurrency from the default value of 4:

ThreadPool "CPU" {
  concurrency = 8
}

Lastly, it is recommended to assign separate event loops to listeners to reduce latency and improve overall performance. For instance, the following configuration assigns dedicated event loops to the ESMTP and HTTP listeners:

ESMTP_Listener {
  event_loop = "smtp_pool"
  (...)
}
(...)
HTTP_Listener {
  event_loop = "http_pool"
  (...)
}

Better Caching

Momentum has some built-in caches that can be tuned to improve performance. The following are the most important ones:

Generic Getter

This cache is used for parameters that are not in a binding/domain scope, so anything global, including module configuration, lives in the generic getter cache. This cache gets a lot of traffic, so setting it in ecelerity.conf to a few million entries is reasonable:

generic_getter_cache_size = 4000000

Regex Match

The match cache stores the results of queries against regular expression domain stanzas. This cache is enabled by default, but its default size is very small (16384 entries). Making it larger is a good idea, especially if you use any regular expression domain stanzas:

match_cache_size = 2000000

Boosting jemalloc Performance

jemalloc has demonstrated excellent performance and stability, which is why it became Momentum's default memory allocator. However, it is possible to get even more out of it by tuning the MALLOC_CONF environment variable.

Add these lines to the /opt/msys/ecelerity/etc/environment file (creating it if it does not exist):

MALLOC_CONF="background_thread:true"
export MALLOC_CONF

Then (re)start the ecelerity service.
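
To confirm that the running process picked up the setting, you can inspect its environment. This is a generic Linux check rather than a Momentum command, and the pgrep pattern is an assumption; adjust it to however the ecelerity process is named on your system:

tr '\0' '\n' < /proc/$(pgrep -o ecelerity)/environ | grep MALLOC_CONF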

Tuning Lua

Lua has a garbage collector that can be tuned to improve performance. The following are some recommended settings:

In the ecelerity.conf file:

ThreadPool "gc" {
    concurrency = 10
}
(...)
scriptlet "scriptlet" {
    (...)
    gc_every = 20
    gc_step_on_recycle = true
    gc_stepmul = 300
    gc_threadpool = "gc"
    gc_trace_thresh = 1000
    gc_trace_xref_thresh = 1000
    global_trace_interval = 13
    max_uses_per_thread = 5000
    reap_interval = 13
    use_reusable_thread = true
}

Enforce these settings in the /opt/msys/ecelerity/etc/environment file:

USE_TRACE_THREADS=true
export USE_TRACE_THREADS
LUA_USE_TRACE_THREADS=true
export LUA_USE_TRACE_THREADS
LUA_NUM_TRACE_THREADS=8
export LUA_NUM_TRACE_THREADS
LUA_NON_SIGNAL_COLLECTOR=true
export LUA_NON_SIGNAL_COLLECTOR

Miscellaneous Configuration

The following ecelerity.conf settings are known to improve performance for various Momentum tasks. Before applying them, however, review their documentation and make sure they fit your environment and use cases:

fully_resolve_before_smtp = false
growbuf_size = 32768
inline_transfail_processing = 0
initial_hash_buckets = 64
keep_message_dicts_in_memory = true
large_message_threshold = 262144
max_resident_active_queue = 1000
max_resident_messages = 100000

Miscellaneous Tips

  • Don't forget to adjust sysctl settings for best TCP connection performance (a sample set of kernel parameters follows this list);
  • Prefer chunk_logger over logging to the paniclog. The rationale, taken from the chunk_logger page:

Logging to the paniclog in the scheduler thread (the main thread) can limit throughput and cause watchdog kills. (...) [It] involves disk I/O, and writing to the paniclog in the scheduler thread may block its execution for a long time, thereby holding up other tasks in the scheduler thread and decreasing throughput.
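
As an illustration of the sysctl point above, here is a minimal sketch of TCP-related kernel settings. These are generic Linux example values, not Momentum-specific recommendations; validate each one against your traffic profile and kernel documentation before applying. They could live in a file such as /etc/sysctl.d/90-momentum.conf (a hypothetical name) and be applied with sysctl --system:

# Example values only - tune for your environment
# Widen the ephemeral port range and connection backlogs for many concurrent TCP connections
net.ipv4.ip_local_port_range = 1024 65535
net.core.somaxconn = 4096
net.core.netdev_max_backlog = 4096
# Larger socket buffers for high-throughput connections
net.core.rmem_max = 16777216
net.core.wmem_max = 16777216
net.ipv4.tcp_rmem = 4096 87380 16777216
net.ipv4.tcp_wmem = 4096 65536 16777216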
