fcmaes extends scipy.optimize with additional optimization methods, fast C++/Eigen-based implementations, and a coordinated parallel retry mechanism. It supports multi-threaded use of several gradient-free algorithms, some implemented in C++. It is designed for real-world problems, including multi-objective and constrained optimization. The main algorithms are available in both Python and C++ and support parallel fitness evaluation as well as parallel retry strategies.
Detailed benchmark results are available in performance.
- Built for optimization problems that are difficult to solve and benefit from modern many-core CPUs.
- Supports parallel fitness evaluation and several parallel retry strategies.
- Scales well with the number of available CPU cores.
- Keeps algorithm overhead low, even in high-dimensional problems.
- Includes the multi-objective and constrained optimizer MODE. It combines ideas from Differential Evolution and NSGA-II and supports parallel function evaluation. It also uses enhanced multiple constraint ranking, which improves constraint handling in engineering design problems.
- Supports quality-diversity methods, including CVT-map-elites, a CMA-ES emitter, and the new "diversifier" meta-algorithm based on CVT-map-elites archives.
- Offers a selection of efficient single-objective optimizers.
- Provides an ask-tell interface for CMA-ES, CR-FM-NES, DE, MODE, and PGPE.
The native BiteOpt backend now also provides an ask/tell interface through bitecpp.Bite_C.
This makes it possible to parallelize a single BiteOpt optimization run:
generate a batch of candidate points with ask(), evaluate them externally in parallel, and feed the resulting objective values back with tell().
This is most useful for expensive objective functions where wall-clock time is dominated by the simulation or model evaluation. In that setting, evaluating several candidates concurrently can reduce elapsed optimization time even if the optimizer itself needs a few more function evaluations.
There is an important trade-off: BiteOpt normally updates its internal selection strategy immediately after each evaluation.
In ask/tell mode these updates are delayed until tell(), so a whole batch is generated from a frozen optimizer state.
That delayed feedback introduces a convergence penalty compared to the ordinary sequential BiteOpt loop.
In other words, ask/tell usually improves parallel throughput, but may reduce convergence efficiency measured in evaluations.
So the rule of thumb is:

- For cheap objective functions, prefer the standard sequential BiteOpt run or parallel retry.
- For expensive objective functions, BiteOpt ask/tell can be a very practical way to parallelize one optimization run across several cores or workers.
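The batch ask/evaluate/tell loop described above can be sketched as follows. Note that `ToyAskTell` is a deliberately simplified stand-in written for this illustration, not the actual `bitecpp.Bite_C` API; it only demonstrates the pattern of generating a batch from a frozen state, evaluating it in parallel, and applying the delayed feedback once per batch.

```python
import random
from concurrent.futures import ThreadPoolExecutor

class ToyAskTell:
    """Stand-in ask/tell optimizer for illustration (NOT the bitecpp.Bite_C API)."""
    def __init__(self, bounds, dim=2, seed=0):
        self.bounds = bounds
        self.dim = dim
        self.rng = random.Random(seed)
        self.best_x, self.best_y = None, float("inf")

    def ask(self, batch_size):
        # the whole batch is generated from the current (frozen) optimizer state
        lo, hi = self.bounds
        return [[self.rng.uniform(lo, hi) for _ in range(self.dim)]
                for _ in range(batch_size)]

    def tell(self, xs, ys):
        # delayed feedback: the state is updated once per batch, not per evaluation
        for x, y in zip(xs, ys):
            if y < self.best_y:
                self.best_x, self.best_y = x, y

def sphere(x):
    # cheap placeholder for an expensive simulation or model evaluation
    return sum(v * v for v in x)

opt = ToyAskTell(bounds=(-5.0, 5.0))
with ThreadPoolExecutor(max_workers=4) as pool:
    for _ in range(50):
        xs = opt.ask(8)                  # generate a batch of candidates
        ys = list(pool.map(sphere, xs))  # evaluate the batch in parallel
        opt.tell(xs, ys)                 # feed the objective values back
```

For a genuinely expensive, CPU-bound fitness function a process pool (or external workers) would replace the thread pool, but the ask/evaluate/tell structure stays the same.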
autoresearch-circuit applies the same split-brain pattern to biochemical circuit design, inspired by the CircuiTree paper. An LLM proposes circuit topology — which genes regulate which. fcmaes tunes the kinetic rates against a noisy stochastic simulator. A validation stage stress-tests the result with knockouts, knockdowns, and parameter jitter. Structure and numbers stay separated, each handled by the right tool.
autoresearch-trading applies Karpathy's autoresearch idea to trading algorithm optimization. LLMs are good at structure. Classical optimizers are good at searching numbers. So I built a system that separates the two. I think this pattern goes beyond trading. Anywhere you have a simulator and a mix of logic + tunable numbers, this approach may be useful.
See Optimization Assistant: how to set up, configure, and use an open-source AI optimization assistant specialized for fcmaes.
See Moran Process: optimize a Moran process. Worth a look if you have to optimize parameters of an expensive simulation. Includes a detailed comparison with Optuna.
See BuckinghamExamples: reveal the underlying structure of physical models by using evolutionary algorithms to fit continuous powers that maximize data-collapse R² under variance constraints.
See Spherical t-design: shows how to compute weighted spherical t-designs to optimize laser fusion reactor designs.
LLMs can help to generate code implementing a trading strategy and can even propose ways to optimize the final return. prophet_opt.py shows:
- The LLM prompts used to generate the strategy back-testing code.
- How to identify the parameters to optimize using the AI.
- How the parameter optimization process can be automated efficiently, utilizing trading simulations executed in parallel.
- Switch from ctypes C++ binding to nanobind.
- macOS fully supported, including ARM.
- Improved documentation and tutorials.
- Improved type annotations.
- Improved test coverage.
- Logging is now based on loguru. All examples are adapted.
- New dependency: loguru. numba is optional and mainly used as a speedup for MAP-Elites and some examples.
- New tutorial related to the GECCO 2023 Space Optimization Competition: ESAChallenge.
- You can define an initial population as a guess for multi-objective optimization.
- Pure Python versions of the algorithms are now usable for parallel retry as well. Pure Python features: algorithms CMA-ES, CR-FM-NES, DE, MODE (multi-objective), and MAP-Elites+Diversifier (quality diversity). All Python algorithms support an ask-tell interface and parallel function evaluation. Parallel retry and advanced retry (smart boundary management) are also supported for these algorithms.
- Python version > 3.7 required; 3.6 is no longer supported.
- PEP 484 compatible type hints, useful for IDEs like PyCharm.
- Most algorithms now support a unified ask/tell interface: cmaes, cmaescpp, crfmnes, crfmnescpp, de, decpp, mode, modecpp, pgpecpp. This is useful for monitoring and parallel fitness evaluation.
- Added support for Quality Diversity (QD): MAP-Elites with an additional CMA-ES emitter, the new meta-algorithm Diversifier (a generalized variant of CMA-ME), "drill down" for specified niches, and bidirectional archive <-> store transfer between the QD archive and the smart boundary management meta-algorithm (advretry). All QD algorithms support parallel optimization utilizing all CPU cores and provide statistics for the solutions associated with a specific niche: mean, stdev, maximum, minimum, and count.
Derivative-free optimization of machine learning models often involves several thousand decision variables and requires GPU/TPU-based parallelization of both the fitness evaluation and the optimization algorithm. CR-FM-NES, PGPE, and the QD-Diversifier applied to CR-FM-NES (CR-FM-NES-ME) are excellent choices in this domain. Since fcmaes has a different focus (parallel optimizations and parallel fitness evaluations on CPUs), we contributed these algorithms to EvoJax, which utilizes JAX for GPU/TPU execution.
To utilize modern many-core processors, use the single-objective algorithms with parallel retry for cheap fitness functions; otherwise use parallel function evaluation.
- MO-DE: a new multi-objective optimization algorithm merging concepts from Differential Evolution and NSGA-II. Implemented both in Python and in C++. Provides an ask/tell interface and supports constraints and parallel function evaluation. Can also be applied to single-objective problems with constraints. Supports mixed-integer problems (see CFD for details).
- CVT-map-elites/CMA: a new Python implementation of CVT-map-elites including a CMA-ES emitter, providing low algorithm overhead and excellent multi-core scaling even for fast fitness functions. Enables "drill down" for specific selected niches. See mapelites.py and Map-Elites.
- Diversifier: a new Python meta-algorithm based on CVT-map-elites archives, generalizing ideas from CMA-ME to other wrapped algorithms. See diversifier.py and Quality Diversity.
- BiteOpt: algorithm from Aleksey Vaneev, see BiteOpt. Only a C++ version is provided. If your problem is single-objective and you have no clue which algorithm to apply, try this first. It works well with almost all problems. For constraints you have to use weighted penalties.
- Differential Evolution: implemented both in Python and in C++. Additional concepts implemented are temporal locality, stochastic reinitialization of individuals based on their age, and oscillating CR/F parameters. Provides an ask/tell interface and supports parallel function evaluation. Supports mixed-integer problems (see CFD for details).
- CMA-ES: implemented both in Python and in C++. Provides an ask/tell interface and supports parallel function evaluation. Good option for a low number of decision variables (< 500).
- CR-FM-NES: Fast Moving Natural Evolution Strategy for High-Dimensional Problems, see https://arxiv.org/abs/2201.11422. Derived from https://github.com/nomuramasahir0/crfmnes. Implemented both in Python and in C++. Both implementations provide parallel function evaluation and an ask/tell interface. Good option for a high number of decision variables (> 100).
- PGPE: Parameter Exploring Policy Gradients, see http://mediatum.ub.tum.de/doc/1099128/631352.pdf. Implemented in C++. Provides parallel function evaluation and an ask/tell interface. Good option for a very high number of decision variables (> 1000) and for machine learning tasks. An equivalent Python implementation can be found in pgpe.py; use this on GPUs/TPUs.
- Wrapper for cmaes, which provides different CMA-ES variants implemented in Python, such as separable CMA-ES and CMA-ES with Margin (see https://arxiv.org/abs/2205.13482), which improves support for mixed-integer problems. The wrapper additionally supports parallel function evaluation.
- Dual Annealing: Eigen-based implementation in C++. Use the scipy implementation if you prefer a pure Python variant or need more configuration options.
- Expressions: there are two operators for constructing expressions over optimization algorithms: sequence and random choice. Not only the single-objective algorithms above, but also scipy and NLopt optimization methods and custom algorithms can be used for defining algorithm expressions.
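The two composition operators can be pictured with plain callables. This is only a schematic under stated assumptions: the fcmaes operator classes (such as `Sequence`, shown in the usage section below) wrap optimizer objects rather than bare functions, and the `make_sequence`/`make_choice` helpers here are invented for illustration.

```python
import random

def make_sequence(optimizers):
    """Run each stand-in optimizer in turn, feeding the result to the next one."""
    def run(x0):
        x = x0
        for opt in optimizers:
            x = opt(x)
        return x
    return run

def make_choice(optimizers, seed=0):
    """Pick one stand-in optimizer at random per invocation."""
    rng = random.Random(seed)
    def run(x0):
        return rng.choice(optimizers)(x0)
    return run

# stand-in "optimizers": each moves the point toward the sphere minimum at 0
halve = lambda x: [v / 2 for v in x]
third = lambda x: [v / 3 for v in x]

seq = make_sequence([halve, third])
print(seq([6.0, -6.0]))  # halve then third: [1.0, -1.0]
```

The same compositional idea lets fcmaes chain, for example, a global DE phase with a local CMA-ES refinement, or randomize which algorithm handles each retry.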
- `pip install fcmaes`
- Recommended Python environment: miniforge with Python 3.12.
Wheels are built by the release CI. If no matching wheel is available for your
platform or Python version, pip falls back to a source build.
- `pip install fcmaes`
- Install the Microsoft Visual C++ runtime libraries: https://support.microsoft.com/en-us/help/2977003/the-latest-supported-visual-c-downloads
- Recommended Python environment: miniforge with Python 3.12.
For parallel fitness function evaluation, use the native Python optimizers or the ask/tell interface of the C++ ones. Python multiprocessing works better on Linux. To get optimal scaling from parallel retry and parallel function evaluation, you may still use:

- The Linux subsystem for Windows (WSL). WSL can read/write NTFS, so you can do your development on an NTFS partition; just the Python call is routed to Linux. If performance of the fitness function is an issue and you don't want to use WSL, consider the fcmaes Java port: fcmaes-java.
Usage is similar to scipy.optimize.minimize. For parallel retry use:

```python
from fcmaes import retry
ret = retry.minimize(fun, bounds)
```

The retry logs the mean and standard deviation of the results, so it can be used to test and compare optimization algorithms. You may choose different algorithms for the retry:

```python
from fcmaes.optimizer import Bite_cpp, De_cpp, Cma_cpp, Sequence
ret = retry.minimize(fun, bounds, optimizer=Bite_cpp(100000))
ret = retry.minimize(fun, bounds, optimizer=De_cpp(100000))
ret = retry.minimize(fun, bounds, optimizer=Cma_cpp(100000))
ret = retry.minimize(fun, bounds, optimizer=Sequence([De_cpp(50000), Cma_cpp(50000)]))
```

More examples can be found at https://github.com/dietmarwo/fast-cma-es/blob/master/examples. Check the tutorials for more details.
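The comparison idea behind the retry log can be mimicked with the standard library. `run_once` below is a hypothetical stand-in for a single retry (best of a fixed budget of random samples of a toy objective), invented for this sketch, not fcmaes code:

```python
import random
import statistics

def run_once(seed, evals=200):
    """Stand-in for one retry: best of `evals` random samples of the 2-D sphere function."""
    rng = random.Random(seed)
    return min(
        sum(v * v for v in (rng.uniform(-5, 5), rng.uniform(-5, 5)))
        for _ in range(evals)
    )

# repeat the "retry" 32 times and summarize, as the retry log does
results = [run_once(s) for s in range(32)]
mean, stdev = statistics.mean(results), statistics.stdev(results)
print(f"mean {mean:.4f} stdev {stdev:.4f}")
```

A lower mean across retries at the same evaluation budget indicates a better-suited algorithm for the problem; the standard deviation shows how reproducible that result is.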
Runtime dependencies:

- numpy: https://github.com/numpy/numpy, version >= 1.20
- scipy: https://github.com/scipy/scipy, version >= 1.8
- scikit-learn: https://github.com/scikit-learn/scikit-learn (for CVT-Map-Elites), version >= 1.1
Compile-time dependencies for source builds:

- Eigen: https://gitlab.com/libeigen/eigen (version >= 3.4.0 is required for CMA).
- PCG Random Number Generation: https://github.com/imneme/pcg-cpp, used in all C++ optimization algorithms.
- LBFGSpp: https://github.com/yixuan/LBFGSpp/tree/master/include, used for dual annealing local optimization.
Prebuilt wheels are built by CI for Linux, Windows, and macOS.
Optional dependencies:
Example dependencies:
- pykep: install with `pip install pykep`.
- Pull requests and branch pushes build wheels on Linux, Windows, and macOS.
- A manual GitHub Actions run with `publish_target=testpypi` publishes to TestPyPI.
- Tags matching `v*` publish to PyPI.
- CI releases use Trusted Publishing, so no long-lived PyPI API token is required in GitHub Actions.
@misc{fcmaes2025,
author = {Dietmar Wolz},
title = {fcmaes - A Python-3 derivative-free optimization library},
note = {Python/C++ source code, with description and examples},
year = {2025},
publisher = {GitHub},
journal = {GitHub repository},
howpublished = {Available at \url{https://github.com/dietmarwo/fast-cma-es}},
}