Computational effort and emulation error
Nothing forces a new perspective on our work like questions from other people. In this post we revisit the idea of emulation, inspired by some surgical questions formulated by Kris Villez, aiming to understand the relation between an emulator and the simulator it is based on.
These are the questions we will discuss:
Can my emulator make exact predictions of my simulator output for any input?
If my emulator makes exact predictions, will it be faster than my simulator?
How small should my emulation error be?
The figure above summarizes the discussions in this post. They involve a simulator (“current simulator” in the figure) and a function(al) used to compute some low-dimensional output, which is the relevant signal for the task at hand. In other words, we are looking for an input-output relation, where the output is not the raw output of the simulator but that of some dimension-reducing function(al). An emulator is built to map inputs to this low-dimensional output.
Let's describe the different items in the figure above.
Y-axis: Emulation error
The vertical axis of the figure indicates the emulation error, that is, the error we obtain when we test our emulator on new inputs and compare the emulation result with the output of the simulator. In machine learning this is usually called the test or generalization error.
X-axis: Computational effort
The horizontal axis shows how much computation we need to get a result. The current simulator defines the computational budget, i.e. an emulator that needs a similar computational effort is not interesting. Computational effort is usually measured as the amount of time we have to wait to get an output from the simulator (runtime), but it could be measured in other terms relevant for the application at hand, e.g. electrical power consumption.
On this axis we also mark the effort required by an ideal implementation of the simulator (“ideal simulator”): the ultimate simplification of the simulator. For very complex systems this ideal simulator tends to be the system itself.
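To make the notion of computational effort concrete, here is a minimal runtime comparison between a simulator and an emulator. Everything in it is an invented stand-in: the “simulator” just performs a deliberately slow sum that happens to have a cheap closed-form approximation.

```python
import time

def current_simulator(x):
    # hypothetical stand-in for an expensive simulator: a deliberately slow sum
    return sum((x * k) ** 0.5 for k in range(1, 200_001))

def emulator(x):
    # cheap closed-form approximation of the same sum:
    # sum_k sqrt(x*k) ~ (2/3) * sqrt(x) * N^(3/2)
    return (2.0 / 3.0) * x ** 0.5 * 200_000 ** 1.5

def runtime(f, x, repeats=3):
    # wall-clock time (best of a few repeats), the usual proxy for computational effort
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        f(x)
        best = min(best, time.perf_counter() - start)
    return best

print(runtime(emulator, 4.0) < runtime(current_simulator, 4.0))  # True: far cheaper
```

Runtime is only one possible measure; an energy meter or hardware counter would take the place of `runtime` when power consumption is the relevant cost.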
There are two lines emanating from the worst emulator: i) “Simulator simplification” and ii) “Exact prior”. We will explain the latter in subsequent paragraphs. The former, “Simulator simplification”, refers to the situation in which, using the data available, we are able to build an emulator that is faster than the current simulator and is also exact. This is seldom possible using the data-driven methods commonly used in emulation, and it also depends heavily on the nature of the simulated process. Such a situation is more related to the optimization of algorithms, data structures, and hardware. We will not directly discuss this situation here, but see the part on “Exact emulation”.
Task error tolerance
Another important aspect of the figure is the horizontal green line labeled “Task error tolerance”. That is, the task might be accomplished with the same performance even if the emulator doesn’t match the simulator+function(al) results exactly. For discussions on this you can read about approximate computing 1, but the intuition is quite straightforward: take $y$ as the output of your simulator+function(al) and say that the relevant signal is a linear combination of those outputs, i.e. $s = w^\top y$. Now consider an emulator that produces a wrong output $\hat{y} = y + e$. Any error $e$ that fulfills the equality $w^\top e = 0$ will produce exact results on $s$.
In other words, being wrong doesn’t imply that the error will propagate to the relevant signals. Hence the task, expressed above as a linear combination of the outputs, might give us some room for error and the possibility of saving some computational resources. This is one of the reasons why emulation is very useful when the task reduces the dimension of the simulator outputs. The reduction might create an effective “null-space” in which we can place the emulation error. I write “null-space” in quotes because I am abusing a concept from linear algebra, but the idea holds. In thermodynamics we would speak of entropy to quantify the size of the allowed error set (a sort of partition function).
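The null-space idea can be illustrated with a tiny numerical sketch (all numbers invented): an emulator that is wrong componentwise, but whose error lies in the null space of the task weights, still yields the exact task output.

```python
def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))

# invented task weights: the relevant signal is s = w . y
w = [1.0, -2.0, 1.0, 3.0]

# "true" simulator+function(al) output and the task output it implies
y = [1.0, 2.0, -1.0, 2.0]
s_true = dot(w, y)

# an emulation error chosen in the "null-space" of the task: dot(w, e) == 0
e = [2.0, 1.0, 0.0, 0.0]
y_emulated = [yi + ei for yi, ei in zip(y, e)]  # componentwise quite wrong

s_emulated = dot(w, y_emulated)
print(s_true == s_emulated)  # True: the task output is exact despite the wrong emulator
```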
Here the people conversant with parametric and non-parametric methods will have to excuse my hand-waving explanations. With the aim of making this article valuable for the non-experts some correctness was thrown away.
In most cases, improving our emulator by reducing the error increases the computational effort. For example, if we are using a fixed basis of functions to regress the simulation results (a parametric method, e.g. generalized Fourier series, polynomial chaos expansions, or Wiener/Volterra series), the emulator can get better by adding new basis functions/kernels and adapting the already existing ones (e.g. via basis pursuit (denoising), matching pursuit, dictionary learning, etc.).
In the case of non-parametric methods such as Gaussian Processes, this could be achieved by adding more data points.
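As a toy illustration of error decreasing with effort (here, the number of training points), the following sketch uses a piecewise-linear interpolant as a stand-in for a non-parametric emulator; the simulator is an invented smooth function.

```python
import math

def simulator(x):
    # invented smooth stand-in for the simulator+function(al) output
    return math.exp(-x) * math.sin(3.0 * x)

def build_emulator(xs, ys):
    # minimal non-parametric emulator: piecewise-linear interpolation of the data;
    # adding data points refines it (and makes each evaluation slightly costlier)
    def emulate(x):
        for (x0, y0), (x1, y1) in zip(zip(xs, ys), zip(xs[1:], ys[1:])):
            if x0 <= x <= x1:
                t = (x - x0) / (x1 - x0)
                return (1.0 - t) * y0 + t * y1
        raise ValueError("input outside the training range")
    return emulate

def max_test_error(n_train):
    xs = [i / (n_train - 1) for i in range(n_train)]
    emu = build_emulator(xs, [simulator(x) for x in xs])
    test_points = [i / 200 for i in range(201)]
    return max(abs(emu(x) - simulator(x)) for x in test_points)

errors = [max_test_error(n) for n in (5, 10, 20, 40)]
print(errors[0] > errors[1] > errors[2] > errors[3])  # True: more data, smaller error
```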
When the emulation error crosses the level defined by the task tolerance, we have found the optimal emulator. This sounds good, but the problem is that most of the time we cannot calculate the task error tolerance, or the magnitude of the error is not actually a good indicator. Take our example above: two emulators with the same error magnitude do not necessarily both have their errors orthogonal to the weights of the linear combination.
Exact prior and prior mismatch
When should we expect to achieve a zero error? Only when we are able to guess/find the correct representation of the input-output relation we are trying to learn.
Let's get away from math for a second to illustrate the negative case. What I wrote above is something like: if you are using cat features to represent birds, you can expect to get close, but not that close, and you will probably never get a good representation of a bird. More mathematically, we could say that if the relation we are trying to learn is not contained in the “space” generated by the chosen basis, then we can hope for a small but not negligible error. A common example of this phenomenon is the difficulty of approximating a discontinuous function by a finite series of continuous functions, i.e. the Gibbs phenomenon.
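The Gibbs phenomenon is easy to see numerically: a truncated Fourier series of a square wave keeps overshooting near the jump by roughly 9% of the jump height, no matter how many terms we add. A small sketch (grid size and term counts chosen arbitrarily):

```python
import math

def partial_sum(x, n_terms):
    # Fourier series of a unit square wave, truncated after n_terms odd harmonics
    return (4.0 / math.pi) * sum(
        math.sin((2 * k + 1) * x) / (2 * k + 1) for k in range(n_terms)
    )

def overshoot(n_terms):
    # maximum of the partial sum on (0, pi/2]; the square wave itself never exceeds 1
    xs = (i * math.pi / 10_000 for i in range(1, 5_001))
    return max(partial_sum(x, n_terms) for x in xs) - 1.0

# the overshoot (relative to the jump of height 2) refuses to vanish as terms are added
print(overshoot(10) > 0.15 and overshoot(100) > 0.15)  # True
```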
The positive case is embodied by the Nyquist–Shannon sampling theorem, in which a band-limited signal is perfectly recovered using only a discrete set of samples.
It all boils down to the same thing: if your prior knowledge (your assumptions about the input-output relation) is good, then you can hope for a perfect emulation. Furthermore, if your emulation method is cheaper than your simulator+function(al), you will get a sort of simulator simplification, because your emulator will be cheaper to evaluate than your simulator and still produce perfect outputs. However, I wouldn’t expect this to happen very often in practice…but it is not impossible.
On exact emulation
To illustrate the possibility of exact emulation, I need you to use your imagination. Imagine that there is an alien race that has not yet discovered Hooke’s law of linear elasticity. But they are quite advanced in terms of molecular simulations; they can build simulators that calculate the forces between many, many atoms. Among these aliens there is a great inventor who is developing a device that, when deformed, produces a force.
The inventor sets up a simulator using their molecular technique. The model describes the shape of the device for any deformation and calculates the forces between all the molecules in the device. This way the inventor can simulate the force generated at each deformation. Their simulator is, of course, computing a lot more than just that force value: it is tracking the positions and velocities of all the molecules in the device and calculating all the intermolecular forces. This is a very costly simulator indeed!
To accelerate the molecular simulator, the inventor produces a dataset of small deformations and the corresponding forces. Using this dataset they build an interpolating function, i.e. an emulator, that will provide the force for any deformation. It turns out that the interpolant is just a linear function, the emulation is exact and it is also many times faster than the simulator! The inventor just discovered Hooke’s law.
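The inventor's story can be sketched numerically. In this toy model (all constants invented) each molecule contributes a force that happens to be linear in the deformation, so a linear emulator fitted from two simulator runs reproduces the simulator exactly while skipping the molecular loop.

```python
# toy "molecular" device: each molecule contributes a linear force (constants invented)
N_MOLECULES = 100_000
spring_constants = [0.001 + 0.00001 * (i % 7) for i in range(N_MOLECULES)]

def molecular_simulator(deformation):
    # expensive: loops over every molecule at each call
    return sum(k * deformation for k in spring_constants)

# tiny training dataset of (deformation, force) pairs produced by the simulator
d0, d1 = 0.0, 1.0
f0, f1 = molecular_simulator(d0), molecular_simulator(d1)
hooke_constant = (f1 - f0) / (d1 - d0)  # the "discovered" aggregate spring constant

def emulator(deformation):
    # cheap: one multiplication, yet it reproduces this simulator exactly
    return f0 + hooke_constant * deformation

print(emulator(2.0) == molecular_simulator(2.0))  # True
```

The exactness relies on the aggregate relation being linear; for deformations large enough to leave the linear regime, a real molecular simulator would diverge from this emulator.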
How is this possible?
On the one hand, the emulator is not calculating all the things the simulator is calculating. On the other hand, the total force generated by the device is an aggregation of all the intermolecular forces, and although each individual force has a very complicated relation with the deformation, the total force does not. So two ingredients are at play: i) the emulator does not compute everything the simulator does, and ii) the observed output (force) maintains a simple relationship with the input (deformation). That is, we are throwing information away and, with some luck, the relation between the remaining magnitudes can be expressed in simpler terms.
When these two ingredients are present you can hope to find a very good emulator. In physics people use the word renormalization to refer to the process of throwing away irrelevant information and calculating the new effective model (also called an effective theory or coarse-grained model). The method is fully formalized for quantum fields, but the ideas are spreading to general models 2 3. I expect emulation will profit a lot from these developments.
Han, J., & Orshansky, M. (2013). Approximate computing: An emerging paradigm for energy-efficient design. 2013 18th IEEE European Test Symposium (ETS). https://doi.org/10.1109/ETS.2013.6569370 ↩
Wolpert, D. H., Grochow, J. A., Libby, E., & DeDeo, S. (2014). A framework for optimal high-level descriptions in science and engineering—preliminary report. arXiv preprint arXiv:1409.7403. ↩
Crutchfield, J. P., James, R. G., Marzen, S., & Varn, D. P. (2014). Understanding and designing complex systems: Response to “A framework for optimal high-level descriptions in science and engineering—preliminary report”. arXiv preprint arXiv:1412.8520. ↩