Useful Links

2014-02-26T18:01:01Z

2009-11-03T09:39:04Z


== General ==

=== What is a global surrogate model? ===

A global [http://en.wikipedia.org/wiki/Surrogate_model surrogate model] is a mathematical model that mimics the behavior of a computationally expensive simulation code over '''the complete parameter space''' as accurately as possible, using as few data points as possible. Note that optimization is not the primary goal, although it can be done as a post-processing step. Global surrogate models are useful for:

* design space exploration, to get a ''feel'' of how the different parameters behave
* sensitivity analysis
* ''what-if'' analysis
* prototyping
* visualization
* ...

In addition, they are a cheap way to model large-scale systems: multiple global surrogate models can be chained together in a model cascade.

See also the [[About]] page.

=== What about surrogate driven optimization? ===

When they hear the term '''surrogate driven optimization''', most people associate it with trust-region strategies and simple polynomial models. These frameworks first construct a local surrogate, which is optimized to find an optimum. Afterwards, a move limit strategy decides how the local surrogate is scaled and/or moved through the input space. Subsequently the surrogate is rebuilt and optimized again. In other words, the surrogate gradually zooms in on the optimum. For instance, the [http://www.cs.sandia.gov/DAKOTA/ DAKOTA] Toolbox implements such strategies, where the surrogate construction is separated from the optimization.

Such a framework was earlier implemented in the SUMO Toolbox but was deprecated as it didn't fit the philosophy and design of the toolbox.

Instead another, equally powerful, approach was taken. The current optimization framework is in fact a sample selection strategy that balances local and global search. In other words, it balances exploring the input space against exploiting the information the surrogate gives us.

A configuration example can be found [[Config:SampleSelector#expectedImprovement|here]].

=== What is (adaptive) sampling? Why is it used? ===

In classical Design of Experiments you need to specify the design of your experiment up-front. Or in other words, you have to say up-front how many data points you need and how they should be distributed. Two examples are Central Composite Designs and Latin Hypercube designs. However, if your data is expensive to generate (e.g., an expensive simulation code) it is not clear how many points are needed up-front. Instead data points are selected adaptively, only a couple at a time. This process of incrementally selecting new data points in regions that are the most interesting is called adaptive sampling, sequential design, or active learning. Of course the sampling process needs to start from somewhere so the very first set of points is selected based on a fixed, classic experimental design. See also [[Running#Understanding_the_control_flow]].
SUMO provides a number of different sampling algorithms: [[SampleSelector]]
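
The idea can be illustrated with a toy Matlab sketch. It is not one of the SUMO sample selection algorithms: the function <code>f</code> is a hypothetical stand-in for an expensive simulator, and the selection criterion (refine the interval across which the response changes most) is an assumption chosen only to keep the example short.

<source lang="matlab">
% Toy adaptive sampling loop in 1D (illustrative only).
f = @(x) sin(3*x) .* exp(-x);   % hypothetical stand-in for an expensive simulation
x = linspace(0, 3, 4);          % small, fixed initial design
y = f(x);

for iter = 1:6
    % Score every interval by its width times the change in response across it.
    score = diff(x) .* (abs(diff(y)) + eps);
    [~, k] = max(score);

    % Evaluate one new point in the middle of the most interesting interval.
    xNew = (x(k) + x(k+1)) / 2;
    x = [x(1:k), xNew, x(k+1:end)];
    y = [y(1:k), f(xNew), y(k+1:end)];
end

fprintf('%d samples selected, smallest spacing %.3f\n', numel(x), min(diff(x)));
</source>

Only one new simulation is run per iteration, so the total cost is decided while the run progresses rather than up-front.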

Of course, sometimes you do not want to do sampling. For example, if you have a fixed dataset you may just want to load all the data in one go and model that. For how to do this, see [[FAQ#How_do_I_turn_off_adaptive_sampling_.28run_the_toolbox_for_a_fixed_set_of_samples.29.3F]].

=== What about dynamical, time dependent data? ===

The original design and purpose was to tackle static input-output systems, where there is no memory: just a complex mapping that must be learnt and approximated. Of course, you can take a fixed time interval and apply the toolbox, but that is typically not the desired solution. Usually you are interested in time series prediction, e.g., given a set of output values from time t=0 to t=k, predict what happens at time t=k+1,k+2,...

The toolbox was originally not intended for this purpose. However, it is quite easy to add support for recurrent models. Automatic generation of dynamical models would involve adding a new model type (just like you would add a new regression technique) or require adapting an existing one. For example it would not be too much work to adapt the ANN or SVM models to support dynamic problems. The only extra work besides that would be to add a new [[Measures|Measure]] that can evaluate the fidelity of the models' prediction.

Naturally, you would be unable to use sample selection, since it makes no sense in those problems, unless there is a specialized need for it. In that case you would add a new [[SampleSelector]].

For more information on this topic [[Contact]] us.

=== What about classification problems? ===

The main focus of the SUMO Toolbox is on regression/function approximation. However, the framework for hyperparameter optimization, model selection, etc. can also be used for classification. Starting from version 6.3 a demo file is included in the distribution that shows how this works on a well known test problem. If you want to play around with this feature without waiting for 6.3 to be released [[Contact|just let us know]].

=== Can the toolbox drive my simulation code directly? ===

Yes it can. See the [[Interfacing with the toolbox]] page.

=== What is the difference between the M3-Toolbox and the SUMO-Toolbox? ===

The SUMO Toolbox is a complete, full-featured framework for automatically generating approximation models and performing adaptive sampling. In contrast, the M3-Toolbox was more of a proof-of-principle.

=== What happened to the M3-Toolbox? ===

The M3 Toolbox project has been discontinued (Fall 2007) and superseded by the SUMO Toolbox. Please contact tom.dhaene@ua.ac.be for any inquiries and requests about the M3 Toolbox.

=== How can I stay up to date with the latest news? ===

To stay up to date with the latest news and releases, we recommend subscribing to our newsletter [http://www.sumo.intec.ugent.be here]. Traffic will be kept to a minimum (1 message every 2-3 months) and you can unsubscribe at any time.

You can also follow our blog: [http://sumolab.blogspot.com/ http://sumolab.blogspot.com/].

=== What is the roadmap for the future? ===

There is no explicit roadmap since much depends on where our research leads us, what feedback we get, which problems we are working on, etc. However, to get an idea of features to come you can always check the [[Whats new]] page.

You can also follow our blog: [http://sumolab.blogspot.com/ http://sumolab.blogspot.com/].

=== Will there be an R/Scilab/Octave/Sage/.. version? ===

At the start of the project we considered moving to one of the available open source alternatives to Matlab. However, after much discussion we decided against this for several reasons(*), including:

* The quality and amount of available Matlab documentation
* The quality and number of Matlab toolboxes
* Many well documented interfacing options (esp. Java)
* Existing experience and know-how

Matlab certainly has its problems and deficiencies, but the number of advanced algorithms and toolboxes makes it a very attractive platform. Equally important is the fact that every function is properly documented and includes examples, tutorials, and in some cases GUI tools. Many things would have been a lot harder and/or more time consuming to implement on one of the other platforms. The other platforms remain on our radar, however, and we do look into them from time to time. In principle it would even be possible to write a bridge between Matlab and them.

(*) We are not saying those projects are poor or useless, quite the contrary. It is just that given our situation, goals, and resources at the time, Matlab was the best choice for us.

=== What are collaboration options? ===

We will gladly help out with any SUMO-Toolbox related questions or problems. However, since we are a university research group the most interesting goal for us is to work towards some joint publication (e.g., we can help with the modeling of your problem). Alternatively, it is always nice if we could use your data/problem (fully referenced and/or anonymized if necessary of course) as an example application during a conference presentation or in a PhD thesis.

The most interesting case is if your problem involves sample selection and modeling. This means you have some simulation code or script to drive and you want an accurate model while minimizing the number of data points. In this case, in order for us to optimally help you, it would be easiest if we could run your simulation code (or script) locally or access it remotely. Otherwise it is difficult to give good recommendations about what settings to use.

If this is not possible (e.g., expensive, proprietary or secret modeling code) or if your problem does not involve sample selection, you can send us a fixed data set that is representative of your problem. Again, this may be fully anonymized and will be kept confidential of course.

In either case (code or dataset) remember:

* the data file should be an ASCII file in column format (each row containing one data point) (see also [[Interfacing_with_the_toolbox]])
* include a short description of your data:
** number of inputs and number of outputs
** the range of each input (or scaled to [-1 1] if you do not wish to disclose this)
** if the outputs are real or complex valued
** how noisy the data is or if it is completely deterministic (computer simulation) (please also see: [[FAQ#My_data_contains_noise_can_the_SUMO-Toolbox_help_me.3F]]).
** if possible the expected range of each output (or scaled if you do not wish to disclose this)
** if possible the names of each input/output + a short description of what they mean
** any further insight you have about the data, expected behavior, expected importance of each input, etc.
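
A data file in this layout can be written from Matlab as sketched below. The file name, the sample values, and the column order (inputs first, then outputs) are assumptions made for illustration; see [[Interfacing_with_the_toolbox]] for the format the toolbox actually expects.

<source lang="matlab">
% Hypothetical data set: 4 data points, 2 inputs and 1 output.
inputs  = [0.0 10; 2.5 20; 5.0 30; 10.0 40];
outputs = [1.2; 3.4; 0.7; 2.9];

% One data point per row, columns separated by spaces.
data = [inputs outputs];
dlmwrite('mydata.txt', data, 'delimiter', ' ', 'precision', '%.10g');

% Print the basic facts requested in the description above.
fprintf('inputs: %d, outputs: %d\n', size(inputs, 2), size(outputs, 2));
for i = 1:size(inputs, 2)
    fprintf('range of input %d: [%g, %g]\n', i, min(inputs(:, i)), max(inputs(:, i)));
end
</source>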

If you have any further questions or comments related to this please [[Contact]] us.

=== Can you help me model my problem? ===

Please see the previous question: [[FAQ#What_are_collaboration_options.3F]]

== Installation and Configuration ==

=== What is the relationship between Matlab and Java? ===

Many people do not know this, but your Matlab installation automatically includes a Java virtual machine. By default, Matlab seamlessly integrates with Java, allowing you to create Java objects from the command line (e.g., 's = java.lang.String'). It is possible to disable Java support, but the SUMO Toolbox requires it to be enabled. To check whether Java is enabled you can use the 'usejava' command.
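
A minimal check from the Matlab command line could look as follows. The string value and the messages are arbitrary; only <code>usejava</code> and <code>java.lang.String</code> come from the text above.

<source lang="matlab">
% Check whether the Java Virtual Machine is available in this Matlab session.
if usejava('jvm')
    % Create a Java object directly from Matlab to confirm the integration works.
    s = java.lang.String('SUMO');
    fprintf('Java is enabled, test string has %d characters\n', s.length());
else
    warning('Java is disabled; start Matlab without the -nojvm option.');
end
</source>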

=== What is Java, why do I need it, do I have to install it, etc. ? ===

The short answer is: no, don't worry about it. The long answer is: some of the code of the SUMO Toolbox is written in [http://en.wikipedia.org/wiki/Java_(programming_language) Java], since it makes a lot more sense in many situations and is a proper programming language instead of a scripting language like Matlab. Since Matlab automatically includes a JVM to run Java code there is nothing you need to do or worry about (see the previous FAQ entry). Unless it is not working, of course; in that case see [[FAQ#When_running_the_toolbox_you_get_something_like_.27.3F.3F.3F_Undefined_variable_.22ibbt.22_or_class_.22ibbt.sumo.config.ContextConfig.setRootDirectory.22.27]].

=== What is XML? ===

XML stands for eXtensible Markup Language and is related to HTML (= the stuff web pages are written in). The first thing you have to understand is that XML '''does not do anything'''. Honest. Many engineers are not used to it and think it is some complicated computer programming language-stuff-thingy. This is of course not the case (we ignore some of the fancy stuff you can do with it for now). XML is a markup language, meaning it provides rules for how you can annotate or structure existing text.

The way SUMO uses XML is really simple and there is not much to understand. First some simple terminology. Take the following example:

<source lang="xml">
<Foo attr="bar">bla bla bla</Foo>
</source>

Here we have '''a tag''' called ''Foo'' containing text ''bla bla bla''. The tag Foo also has an '''attribute''' ''attr'' with value ''bar''. '<Foo>' is what we call the '''opening tag''', and '</Foo>' is the '''closing tag'''. Each time you open a tag you must close it again. How you name the tags or attributes is totally up to you, you choose :)

Let's take a more interesting example. Here we have used XML to represent information about a recipe for pancakes:

<source lang="xml">
<recipe category="dessert">
  <title>Pancakes</title>
  <author>sumo@intec.ugent.be</author>
  <date>Wed, 14 Jun 95</date>
  <description>
    Good old fashioned pancakes.
  </description>
  <ingredients>
    <item>
      <amount>3</amount>
      <type>eggs</type>
    </item>

    <item>
      <amount>0.5 tablespoon</amount>
      <type>salt</type>
    </item>
    ...
  </ingredients>
  <preparation>
    ...
  </preparation>
</recipe>
</source>

So basically, you see that XML is just a way to structure, order, and group information. That's it! SUMO uses it to store and structure configuration options, and this works well due to the nice hierarchical nature of XML.

If you understand this, you know all you need to read the SUMO configuration files. If you need more information see the tutorial here: [http://www.w3schools.com/XML/xml_whatis.asp http://www.w3schools.com/XML/xml_whatis.asp]. You can also have a look at the Wikipedia page here: [http://en.wikipedia.org/wiki/XML http://en.wikipedia.org/wiki/XML]

=== Why does SUMO use XML? ===

XML is the de facto standard way of structuring information. This ranges from spreadsheet files (Microsoft Excel for example), to configuration data, to scientific data, ... There are even whole database systems based solely on XML. So basically, it is an intuitive way to structure data and it is used everywhere. As a result, there is a very large number of libraries and programming languages available that can parse and handle XML easily. That means less work for the programmer. Then of course there is stuff like XSLT, XQuery, etc. that makes life even easier.
So basically, it would not make sense for SUMO to use any other format :)

=== I get an error that SUMO is not yet activated ===

Make sure you installed the activation file that was mailed to you, as explained in the [[Installation]] instructions. Also double check that your system meets the [[System requirements]] and that [http://www.sumowiki.intec.ugent.be/index.php/FAQ#When_running_the_toolbox_you_get_something_like_.27.3F.3F.3F_Undefined_variable_.22ibbt.22_or_class_.22ibbt.sumo.config.ContextConfig.setRootDirectory.22.27 Java is enabled]. To fully verify that the activation file is installed correctly, ensure that the file ContextConfig.class is present in the directory ''<SUMO installation directory>/bin/java/ibbt/sumo/config''.

Please note that more flexible research licenses are available if it is possible to [[FAQ#What_are_collaboration_options.3F|collaborate in any way]].

== Upgrading ==

=== How do I upgrade to a newer version? ===

Delete your old <code><SUMO-Toolbox-directory></code> completely and replace it by the new one. Install the new activation file / extension pack as before (see [[Installation]]), start Matlab and make sure the default run works. To port your old configuration files to the new version: make a copy of default.xml (from the new version) and copy over your custom changes (from the old version) one by one. This should prevent any weirdness if the XML structure has changed between releases.

If you had a valid activation file for the previous version, just [[Contact]] us (giving your SUMOlab website username) and we will send you a new activation file. Note that to update an activation file you must first unzip a copy of the toolbox to a new directory and install the activation file as if it was the very first time. Upgrading of an activation file without performing a new toolbox install is (unfortunately) not (yet) supported.

== Using ==

=== I have no idea how to use the toolbox, what should I do? ===

See: [[Running#Getting_started]]

=== I want to try one of the different examples ===

See [[Running#Running_different_examples]].

=== I want to model my own problem ===

See : [[Adding an example]].

=== I want to contribute some data/patch/documentation/... ===

See : [[Contributing]].

=== How do I interface with the SUMO Toolbox? ===

See : [[Interfacing with the toolbox]].

=== What configuration options (model type, sample selection algorithm, ...) should I use for my problem? ===

See [[General_guidelines]].

=== Ok, I generated a model, what can I do with it? ===

See: [[Using a model]].

=== How can I share a model created by the SUMO Toolbox? ===

See : [[Using a model#Model_portability| Model portability]].

=== I dont like the final model generated by SUMO how do I improve it? ===

Before you start the modeling you should really ask yourself this question: ''What properties do I want to see in the final model?'' You have to think about what, for you, constitutes a good model and what constitutes a poor model. Then you should rank those properties depending on how important you find them. Examples are:

* accuracy in the training data
** is it important that the error in the training data is exactly 0, or do you prefer some smoothing
* accuracy outside the training data
** this is the validation or test error, how important is proper generalization (usually this is very important)
* what does accuracy mean to you? a low maximum error, a low average error, both, ...
* smoothness
** should your model be perfectly smooth or is it acceptable that you have a few small ripples here and there for example
* are some regions of the response more important than others?
** for example you may want to be certain that the minima/maxima are captured very accurately but everything in between is less important
* are there particular special features that your model should have
** for example, capture underlying poles or discontinuities correctly
* extrapolation capability
* ...

It is important to note that these criteria often conflict. The classical example is fitting noisy data: pushing the training error ever lower eventually raises the testing error, because the model starts fitting the noise. A natural approach is to combine multiple criteria, see [[Multi-Objective Modeling]].
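
The conflict between training and test accuracy on noisy data can be reproduced with a few lines of plain Matlab. This sketch does not use the toolbox: the test function, the noise level, and the polynomial degrees are arbitrary assumptions. The higher degree always fits the training data at least as well, while its error on the noise-free reference set is typically larger.

<source lang="matlab">
% Noisy samples of a smooth function (hypothetical example data).
x = linspace(-1, 1, 12)';
y = sin(pi * x) + 0.2 * randn(size(x));

% Dense, noise-free reference set used as test data.
xTest = linspace(-1, 1, 200)';
yTest = sin(pi * xTest);

degrees  = [3 9];
trainErr = zeros(size(degrees));
testErr  = zeros(size(degrees));
for i = 1:numel(degrees)
    p = polyfit(x, y, degrees(i));
    trainErr(i) = sqrt(mean((polyval(p, x) - y).^2));          % RMSE on the training data
    testErr(i)  = sqrt(mean((polyval(p, xTest) - yTest).^2));  % RMSE on the test data
    fprintf('degree %d: training RMSE %.3f, test RMSE %.3f\n', ...
            degrees(i), trainErr(i), testErr(i));
end
</source>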

Once you have decided on a set of requirements, the question is whether the SUMO-Toolbox can produce a model that meets them. In SUMO, model generation is driven by one or more [[Measures]], so you should choose the combination of [[Measures]] that most closely matches your requirements. Of course we cannot provide a Measure for every single property, but it is very straightforward to [[Add_Measure|add your own Measure]].

Now, let's say you have chosen what you think are the best Measures but you are still not happy with the final model. Reasons could be:

* you need more modeling iterations or you need to build more models per iteration (see [[Running#Understanding_the_control_flow]]). This will result in a more extensive search of the model parameter space, but will take longer to run.
* you should switch to a different model parameter optimization algorithm (e.g., instead of the Pattern Search variant, try the Genetic Algorithm variant of your AdaptiveModelBuilder)
* the model type you are using is not ideally suited to your data
* there simply is not enough data, use a larger initial design or perform more sampling iterations to get more information per dimension
* maybe the sample distribution is causing troubles for your model (e.g., Kriging can have problems with clustered data). In that case it could be worthwhile to choose a different sample selection algorithm.
* the range of your response variable is not ideal (for example, neural networks have trouble modeling data if the range of the outputs is very small)

You may also refer to the [[General_guidelines]]. Finally, it may of course be that your problem is simply a very difficult one that does not approximate well. Even then, you should at least get something satisfactory.

If you are having these kinds of problems, please [[Reporting_problems|let us know]] and we will gladly help out.

=== My data contains noise can the SUMO-Toolbox help me? ===

The original purpose of the SUMO-Toolbox was for it to be used in conjunction with computer simulations. Since these are fully deterministic you do not have to worry about noise in the data and all the problems it causes. However, the methods in the toolbox are general fitting methods that work on noisy data as well. So yes, the toolbox can be used with noisy data, but you will have to be more careful about how you apply the methods and how you perform model selection. It is only when you use the toolbox with a noisy simulation engine that a few special options may need to be set. In that case [[Contact]] us for more information.

Note though, that the toolbox is not a statistical package. If you have noisy data and need noise estimation algorithms, kernel smoothing algorithms, etc., you should look towards other tools.

=== What is the difference between a ModelBuilder and a ModelFactory? ===

See [[Add Model Type]].

=== Why are the Neural Networks so slow? ===

The ANN models are an extremely powerful model type that gives very good results on many problems. However, they are quite slow to use. There are some things you can do:

* use trainlm or trainscg instead of the default training function trainbr. trainbr gives very good, smooth results but is slower to use. If results with trainlm are not good enough, try using msereg as a performance function.
* try setting the training goal (= the SSE to reach during training) to a small positive number (e.g., 1e-5) instead of 0.
* check that the output range of your problem is not very small. If your response data lies between 10e-5 and 10e-9 for example it will be very hard for the neural net to learn it. In that case rescale your data to a more sane range.
* switch from ANN to one of the other neural network modelers: fanngenetic or nanngenetic. These are a lot faster than the default backend based on the [http://www.mathworks.com/products/neuralnet/ Matlab Neural Network Toolbox]. However, the accuracy is usually not as good.
* If you are using [[Measures#CrossValidation| CrossValidation]] try to switch to a different measure since CrossValidation is very expensive to use. CrossValidation is used by default if you have not defined a [[Measures| measure]] yourself. When using one of the neural network model types, try to use a different measure if you can. For example, our tests have shown that minimizing the sum of [[Measures#SampleError| SampleError]] and [[Measures#LRMMeasure| LRMMeasure]] can give equal or even better results than CrossValidation, while being much cheaper (see [[Multi-Objective Modeling]] for how to combine multiple measures). See also the comments in <code>default.xml</code> for examples.
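
Rescaling a response with a very small range, as suggested in the list above, can be done by hand before the data is handed to the toolbox. The sketch below uses made-up response values and a linear map to [-1, 1]; the target range is an assumption, any moderate range would do.

<source lang="matlab">
% Hypothetical response values with a very small range.
y = [3e-9; 8e-7; 2e-6; 9e-6];

% Linearly map the response to [-1, 1] before modeling.
yMin = min(y);
yMax = max(y);
yScaled = 2 * (y - yMin) / (yMax - yMin) - 1;

% Map model predictions (here simply yScaled itself) back to the original units.
yRestored = (yScaled + 1) * (yMax - yMin) / 2 + yMin;

fprintf('scaled range: [%g, %g]\n', min(yScaled), max(yScaled));
fprintf('largest round-trip error: %g\n', max(abs(yRestored - y)));
</source>

Keep <code>yMin</code> and <code>yMax</code> with the model, since they are needed to convert every later prediction back.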

See also [[FAQ#How_can_I_make_the_toolbox_run_faster.3F]]

=== How can I make the toolbox run faster? ===

There are a number of things you can do to speed things up. These are listed below. Remember though that the main reason the toolbox may seem to be slow is due to the many models being built as part of the hyperparameter optimization. Please make sure you fully understand the [[Running#Understanding_the_control_flow|control flow described here]] before trying more advanced options.

* First of all check that your virus scanner is not interfering with Matlab. If McAfee or any other program wants to scan every file SUMO generates this really slows things down and your computer becomes unusable.

* Turn off the plotting of models in [[Config:ContextConfig#PlotOptions| ContextConfig]], you can always generate plots from the saved mat files

* This is an important one. For most model builders there is an option "maxFunEvals", "maxIterations", or equivalent. This value sets the maximum number of models built between 2 sampling iterations, so lower it to speed things up. The higher this number, the slower the run, but the better the models ''may'' be. Equivalently, for the Genetic model builders reduce the population size and the number of generations.

* Disable some, or even all of the [[Config:ContextConfig#Profiling| profilers]] or disable the output handlers that draw charts. For example, you might use the following configuration for the profilers:

<source lang="xml">
<Profiling>
  <Profiler name=".*share.*|.*ensemble.*|.*Level.*" enabled="true">
    <Output type="toImage"/>
    <Output type="toFile"/>
  </Profiler>

  <Profiler name=".*" enabled="true">
    <Output type="toFile"/>
  </Profiler>
</Profiling>
</source>

The ".*" means match zero or more of any character ([http://java.sun.com/j2se/1.5.0/docs/api/java/util/regex/Pattern.html see here for the full list of supported wildcards]). Thus in this example all the profilers that have "share", "ensemble", or "Level" in their name are enabled and saved both as a text file (toFile) and as an image file (toImage). All the other profilers are saved just to file. The idea is to only save to image what you actually want as an image, since image generation is expensive. If you do this or switch off image generation completely, you will see everything run much faster.
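
Because the patterns are Java regular expressions, they can be tried out from the Matlab prompt before a run. The profiler names below are hypothetical, and the sketch assumes the toolbox matches a pattern against the whole profiler name (which is what <code>String.matches</code> does); the actual matching code in SUMO may differ.

<source lang="matlab">
% Hypothetical profiler names, used only to show how the pattern behaves.
names = {'sharedModels', 'ensembleSize', 'LevelPlot', 'bestModelScore'};
pattern = '.*share.*|.*ensemble.*|.*Level.*';

for i = 1:numel(names)
    % String.matches succeeds only if the whole name matches the pattern.
    name = java.lang.String(names{i});
    fprintf('%-16s matches: %d\n', names{i}, name.matches(pattern));
end

% ".*" may match zero characters, so the bare word "share" matches as well.
bare = java.lang.String('share');
disp(bare.matches('.*share.*'))
</source>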

* Decrease the logging granularity: a log level of FINE (the default is FINEST or ALL) is more than granular enough. Setting it to FINE, INFO, or even WARNING should speed things up.
* If you have a multi-core/multi-cpu machine:
** if you have the Matlab Parallel Computing Toolbox, try setting the parallelMode option to true in [[Config:ContextConfig]]. Now all model training occurs in parallel. This may give unexpected errors in some cases so beware when using.
** if you are using a native executable or script as the sample evaluator set the threadCount variable in [[Config:SampleEvaluator#LocalSampleEvaluator| LocalSampleEvaluator]] equal to the number of cores/CPUs (only do this if it is ok to start multiple instances of your simulation script in parallel!)
* If you are using [[Measures#CrossValidation]] see if you can avoid it and use one of the other measures or a combination of measures (see [[Multi-Objective Modeling]])
* Do not use the Min-Max measure, it can slow things down. See also [[FAQ#How_do_I_force_the_output_of_the_model_to_lie_in_a_certain_range]]
* If you are using neural networks see [[FAQ#Why_are_the_Neural_Networks_so_slow.3F]]

If you are having problems with very slow or seemingly hanging runs:

* Do a run inside the [http://www.mathworks.com/access/helpdesk/help/techdoc/index.html?/access/helpdesk/help/techdoc/matlab_env/f9-17018.html&http://www.google.be/search?client=firefox-a&rls=org.mozilla%3Aen-US%3Aofficial&channel=s&hl=nl&q=matlab+profiler&meta=&btnG=Google+zoeken Matlab profiler] and see where most time is spent.
* Monitor CPU and physical/virtual memory usage while the SUMO toolbox is running and see if you notice anything strange.

Also note that by default Matlab only allocates about 117 MB memory space for the Java Virtual Machine. If you would like to increase this limit (which you should) please follow the instructions [http://www.mathworks.com/support/solutions/data/1-18I2C.html?solution=1-18I2C here]. See also the general memory instructions [http://www.mathworks.com/support/tech-notes/1100/1106.html here].
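
To see how much memory the Java Virtual Machine inside your own Matlab session may use, you can query it directly. This is a small sketch using the standard <code>java.lang.Runtime</code> class; the variable names are arbitrary.

<source lang="matlab">
% Query the Java Virtual Machine that runs inside Matlab.
runtime    = java.lang.Runtime.getRuntime();
maxHeapMB  = runtime.maxMemory() / 2^20;                             % upper limit of the heap
usedHeapMB = (runtime.totalMemory() - runtime.freeMemory()) / 2^20;  % currently in use
fprintf('Java heap: %.0f MB in use, %.0f MB maximum\n', usedHeapMB, maxHeapMB);
</source>

Run it again after changing the heap settings to confirm that the new limit is in effect.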

To check whether your SUMO run has hung, monitor your log file (with the level set at least to FINE). If you see no changes for about 30 minutes, the toolbox has probably stalled. In that case, please [[Reporting problems|report the problem here]].

Such problems are hard to identify and fix so it is best to work towards a reproducible test case if you think you found a performance or scalability issue.

=== How do I build models with more than one output ===

Sometimes you have multiple responses that you want to model at once. See [[Running#Models_with_multiple_outputs]]

=== How do I turn off adaptive sampling (run the toolbox for a fixed set of samples)? ===

See : [[Adaptive Modeling Mode]].

=== How do I change the error function (relative error, RMSE, ...)? ===

The [[Measures| <Measure>]] tag specifies the algorithm to use to assign models a score, e.g., [[Measures#CrossValidation| CrossValidation]]. It is also possible to specify which '''error function''' to use, in the measure. The default error function is '<code>rootRelativeSquareError</code>'.

Say you want to use [[Measures#CrossValidation| CrossValidation]] with the maximum absolute error, then you would put:

<source lang="xml">
<Measure type="CrossValidation" target="0.001" errorFcn="maxAbsoluteError"/>
</source>

On the other hand, if you wanted to use the [[Measures#ValidationSet| ValidationSet]] measure with a relative root-mean-square error you would put:

<source lang="xml">
<Measure type="ValidationSet" target="0.001" errorFcn="relativeRms"/>
</source>

The error functions can be found in the <code>src/matlab/tools/errorFunctions</code> directory. You are free to modify them and add your own. Remember that the choice of error function is very important, so think it through carefully. Also see [[Multi-Objective Modeling]].

=== How do I enable more profilers? ===

Go to the [[Config:ContextConfig#Profiling| <Profiling>]] tag and put <code>"<nowiki>.*</nowiki>"</code> as the regular expression. See also the next question.

=== What regular expressions can I use to filter profilers? ===

See the syntax [http://java.sun.com/j2se/1.5.0/docs/api/java/util/regex/Pattern.html here].

=== How can I ensure deterministic results? ===

See : [[Random state]].

=== How do I get a simple closed-form model (symbolic expression)? ===

See : [[Using a model]].

=== How do I enable the Heterogenous evolution to automatically select the best model type? ===

Simply use the [[Config:AdaptiveModelBuilder#heterogenetic| heterogenetic modelbuilder]] as you would any other.

=== What is the combineOutputs option? ===

See [[Running#Models_with_multiple_outputs]]

=== What error function should I use? ===

The default error function is the Root Relative Square Error (RRSE). Alternatively, meanRelativeError may be more intuitive, but you have to be careful if you have function values close to zero, since the relative error then explodes or even becomes infinite. You could also use one of the combined relative error functions (these contain a +1 in the denominator to account for small values), but then you get something between a relative and an absolute error, which is hard to interpret.

So to be sure, an absolute error (like the RMSE) seems the safest bet. However, in that case you have to come up with sensible accuracy targets and realize that you will build models that try to fit the regions of high absolute value better than the low ones.
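
The differences between these error functions are easy to see on a small made-up example. The formulas below are the common textbook definitions and are written out here only for illustration; the toolbox's own implementations live in <code>src/matlab/tools/errorFunctions</code> and may differ in detail. In particular, the "combined" form is just one plausible reading of "a +1 in the denominator".

<source lang="matlab">
% Hypothetical true values (one close to zero) and predictions that are all off by 0.01.
a = [0.001; 0.5; 1.0; 2.0];
p = a + 0.01;

rmse     = sqrt(mean((p - a).^2));                        % absolute error measure
rrse     = sqrt(sum((p - a).^2) / sum((a - mean(a)).^2)); % relative to predicting the mean
meanRel  = mean(abs(p - a) ./ abs(a));                    % explodes for values near zero
combined = mean(abs(p - a) ./ (1 + abs(a)));              % assumed "+1 in the denominator" form

fprintf('RMSE %.4f, RRSE %.4f, mean relative %.4f, combined %.4f\n', ...
        rmse, rrse, meanRel, combined);
</source>

Although every prediction is equally far off in absolute terms, the mean relative error is dominated by the single point near zero, which is exactly the pitfall described above.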

Picking an error function is a very tricky business and many people do not realize this. Which one is best for you and what targets you use ultimately depends on your application and on what kind of model you want. There is no general answer.

A recommended read is [http://www.springerlink.com/content/24104526223221u3/ this paper]. See also the page on [[Multi-Objective Modeling]].

=== I just want to generate an initial design (no sampling, no modeling) ===

Do a regular SUMO run, except set the 'maxModelingIterations' in the SUMO tag to 0. The resulting run will only generate (and evaluate) the initial design and save it to samples.txt in the output directory.

=== How do I start a run with the samples of of a previous run, or with a custom initial design? ===

Use a Dataset design component, for example:

<source lang="xml">
<InitialDesign type="DatasetDesign">
<Option key="file" value="/path/to/the/file/containing/the/points.txt"/>
</InitialDesign>
</source>

=== What is a level plot? ===

A level plot is a plot that shows how the error histogram changes as the best model improves. An example is:
<gallery>
Image:levelplot.png
</gallery>
Level plots only work if you have a separate dataset (test set) that the model can be checked against. See the comments in default.xml for how to enable level plots.

=== I am getting a java out of memory error, what happened? ===
Datasets are loaded through Java, so the Java heap space is used for storing the data. If you try to load a huge dataset (> 50MB), you might run into the maximum heap size. You can solve this by raising the heap size as described on the following webpage:
[http://www.mathworks.com/support/solutions/data/1-18I2C.html]

=== How do I force the output of the model to lie in a certain range ===

See [[Measures#MinMax]].

=== My problem is high dimensional and has a lot of input parameters (more than 10). Can I use SUMO? ===

That depends. Remember that the main focus of SUMO is to generate accurate 'global' models. If you want to do sampling, the practical dimensionality is limited to around 6-8 (though it depends on the problem and on how cheap the simulations are!), since the more dimensions there are, the more space you need to fill. At that point you need to see if you can extend the models with domain specific knowledge (to improve performance) or apply a dimensionality reduction method ([[FAQ#Can_the_toolbox_tell_me_which_are_the_most_important_inputs_.28.3D_variable_selection.29.3F|see the next question]]). On the other hand, if you do not need sample selection but have a fixed dataset that you want to model, then the performance on high dimensional data just depends on the model type. For example, SVM type models are independent of the dimension and thus can always be applied, though things like feature selection are always recommended.
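
To get a feel for why more dimensions mean more space to fill, the following Python sketch counts the points in a full factorial grid. The grid is only an illustration chosen here (the toolbox sample selectors do not sample on such a grid), and the choice of 10 points per dimension is arbitrary.

```python
def grid_size(points_per_dimension, dimensions):
    """Number of points in a full factorial grid over the input space."""
    return points_per_dimension ** dimensions

# With a fixed resolution per input, the number of points grows exponentially.
for dimensions in (2, 4, 6, 8, 10):
    print(dimensions, "inputs:", grid_size(10, dimensions), "points")
```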

=== Can the toolbox tell me which are the most important inputs (= variable selection)? ===

When tackling high dimensional problems a crucial question is "Are all my input parameters relevant?". Normally domain knowledge would answer this question but this is not always straightforward. In those cases a whole set of algorithms exists for doing dimensionality reduction (= feature selection). Support for some of these algorithms may eventually make it into the toolbox, but none are currently implemented; that is a whole PhD thesis on its own. However, if a model type provides functions for input relevance determination the toolbox can leverage this. For example, the LS-SVM model available in the toolbox supports Automatic Relevance Determination (ARD). This means that if you use the SUMO Toolbox to generate an LS-SVM model, you can call the function ''ARD()'' on the model and it will give you a list of the inputs it thinks are most important.

=== Should I use a Matlab script or a shell script for interfacing with my simulation code? ===

When you want to link SUMO with an external simulation engine (ADS Momentum, SPECTRE, FEBIO, SWAT, ...) you need a [http://en.wikipedia.org/wiki/Shell_script shell script] (or executable) that takes the requested points from SUMO, sets up the simulation engine (e.g., sets the necessary input files), calls the simulator for all the requested points, reads the output (e.g., one or more output files), and returns the results to SUMO (see [[Interfacing with the toolbox]]).

Which one you choose (Matlab script + [[Config:SampleEvaluator#matlab|Matlab Sample Evaluator]], or shell script/executable with [[Config:SampleEvaluator#local|Local Sample Evaluator]]) is basically a matter of preference; take whatever is easiest for you.

HOWEVER, there is one important consideration: Matlab does not support threads, so if you use a Matlab script to interface with the simulation engine, simulations and modeling will happen sequentially, NOT in parallel. The modeling code will sit around waiting, doing nothing, until the simulation(s) have finished. If your simulation code takes a long time to run this is not very efficient. In version 6.2 we will probably fix this by using the Parallel Computing Toolbox.

On the other hand, using a shell script/executable does allow the modeling and simulation to occur in parallel (at least if you wrote your interface script in such a way that it can be run multiple times in parallel, i.e., no shared global directories or variables that can cause [http://en.wikipedia.org/wiki/Race_condition race conditions]).
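
One way to keep an interface executable safe to run several times in parallel is to give every invocation its own scratch directory. The Python sketch below illustrates the idea only. The helper names, the file name <code>input.txt</code>, and the stand-in simulator are all hypothetical, and the way SUMO actually passes points to your executable is described on the [[Interfacing with the toolbox]] page, not here.

```python
import os
import tempfile

def run_in_private_directory(simulate, point):
    """Run one simulation inside a fresh scratch directory.

    Each call gets its own directory, so two interface processes running
    at the same time never read or overwrite each other's files.
    """
    with tempfile.TemporaryDirectory(prefix="sim_") as workdir:
        input_path = os.path.join(workdir, "input.txt")
        with open(input_path, "w") as handle:
            handle.write(" ".join(str(value) for value in point))
        # The directory (and everything in it) is removed when the block exits.
        return simulate(input_path, workdir)

def fake_simulator(input_path, workdir):
    """Hypothetical stand-in for a real simulator: returns the sum of the inputs."""
    with open(input_path) as handle:
        values = [float(token) for token in handle.read().split()]
    return sum(values)

print(run_in_private_directory(fake_simulator, [0.5, 1.5]))
```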

As a side note, if you already put work into a Matlab script, it is still possible to use a shell script: write a shell script that starts Matlab (using the -nodisplay or -nojvm options), executes your script (using the -r option), and exits Matlab again. Of course this is not very elegant and adds some overhead, but depending on your situation it may be worth it.
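
Such a wrapper does not have to be a shell script; any executable will do. The following Python sketch shows the idea under some assumptions: the <code>matlab</code> executable is on your path, <code>my_simulation</code> is a hypothetical name for your own Matlab script, and the try/catch wrapper is just one way to make sure Matlab exits even when the script fails.

```python
import subprocess

def build_matlab_command(script_name, display_flag="-nodisplay"):
    """Build the argument list for a one-shot, non-interactive Matlab run.

    display_flag is "-nodisplay" or "-nojvm". The statement passed to -r runs
    the script and then exits Matlab, also when the script raises an error.
    """
    statement = (
        f"try, {script_name}, "
        "catch err, disp(err.message), exit(1), end, exit(0)"
    )
    return ["matlab", display_flag, "-r", statement]

def run_matlab_script(script_name):
    """Start Matlab, run the script, and wait until Matlab has exited."""
    return subprocess.run(build_matlab_command(script_name), check=True)

# Only print the command here; call run_matlab_script() to actually launch Matlab.
print(build_matlab_command("my_simulation"))
```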

== Troubleshooting ==

=== I have a problem and I want to report it ===

See : [[Reporting problems]].

=== I sometimes get flat models when using rational functions ===

First make sure the model is indeed flat, and does not just appear so on the plot. You can verify this by looking at the output axis range and making sure it is within reasonable bounds. When there are poles in the model, the axis range is sometimes stretched to make it possible to plot the high values around the pole, causing the rest of the model to appear flat. If the model contains poles, refer to the next question for the solution.

The [[Config:AdaptiveModelBuilder#rational| RationalModel]] tries to do a least squares fit, based on which monomials are allowed in numerator and denominator. We have experienced that in some cases a flat model is simply found as the best least squares fit. There are several possible causes for this:

* There are few sample points, and the model parameters (as explained [[Model types explained#PolynomialModel|here]]) force the model to use only a very small set of degrees of freedom. The solution in this case is to increase the minimum percentage bound in the RationalFactory section of your configuration file: change the <code>"percentBounds"</code> option to <code>"60,100"</code>, <code>"80,100"</code>, or even <code>"100,100"</code>. A setting of <code>"100,100"</code> will force the polynomial models to always exactly interpolate. However, note that this does not scale very well with the number of samples (to counter this you can set <code>"maxDegrees"</code>). If you still get weird, spiky models after increasing <code>"percentBounds"</code>, you simply need more samples or you should switch to a different model type.
* Another possibility is that given a set of monomial degrees, the flat function is just the best possible least squares fit. In that case you simply need to wait for more samples.
* The measure you are using is not accurately estimating the true error; try a different measure or error function. Note that a maximum relative error is dangerous to use, since the 0-function (= a flat model) has a lower maximum relative error than a function which overshoots the true behavior in some places but is otherwise correct.
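
The last cause in the list above can be checked with a few numbers. The Python sketch below uses made-up data and a hypothetical helper function (it is not the toolbox's error function): the flat zero model scores a maximum relative error of exactly 1, while a model that is exact everywhere except for one overshoot scores worse.

```python
def max_relative_error(y_true, y_pred):
    """Largest |error| / |true value| over all points (true values must be nonzero)."""
    return max(abs(t - p) / abs(t) for t, p in zip(y_true, y_pred))

y_true = [1.0, 2.0, 4.0, 2.0, 1.0]

# A flat model that predicts 0 everywhere: every relative error is exactly 1.
flat_model = [0.0] * len(y_true)

# A model that is exact everywhere except for one overshoot at the peak.
overshooting_model = [1.0, 2.0, 9.0, 2.0, 1.0]

print(max_relative_error(y_true, flat_model))          # 1.0
print(max_relative_error(y_true, overshooting_model))  # 1.25
```

Judged by this error function alone, the useless flat model wins.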

=== When using rational functions I sometimes get 'spikes' (poles) in my model ===

When the denominator polynomial of a rational model has zeros inside the domain, the model will tend to infinity near these points. In most cases these models will only be recognized as being 'the best' for a short period of time. As more samples get selected these models get replaced by better ones and the spikes should disappear.

So, it is possible that a rational model with 'spikes' (caused by poles inside the domain) will be selected as best model. This may or may not be an issue, depending on what you want to use the model for. If it doesn't matter that the model is very inaccurate at one particular, small spot (near the pole), you can use the model with the pole and it should perform properly.
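
To see whether a given one-dimensional rational model has a pole inside the domain, you can look for zeros of its denominator. The Python sketch below is a simple illustration, not toolbox code: the function names are hypothetical, the coefficient ordering is an assumption, and the check only scans a grid for sign changes, so it can miss zeros of even multiplicity (where the denominator touches zero without changing sign) or zeros that fall between grid points in pairs.

```python
def evaluate_polynomial(coefficients, x):
    """Horner evaluation; coefficients are ordered from highest to lowest degree."""
    result = 0.0
    for coefficient in coefficients:
        result = result * x + coefficient
    return result

def denominator_changes_sign(denominator, lower, upper, steps=1000):
    """Return True if the denominator is zero or changes sign on [lower, upper].

    A sign change means the rational model has a pole inside the domain.
    """
    previous = evaluate_polynomial(denominator, lower)
    if previous == 0.0:
        return True
    for i in range(1, steps + 1):
        x = lower + (upper - lower) * i / steps
        current = evaluate_polynomial(denominator, x)
        if current == 0.0 or previous * current < 0:
            return True
        previous = current
    return False

# x^2 + 1 has no real zeros, so there is no pole on [-1, 1].
print(denominator_changes_sign([1.0, 0.0, 1.0], -1.0, 1.0))
# x - 0.3 is zero at x = 0.3, so the model has a pole inside [-1, 1].
print(denominator_changes_sign([1.0, -0.3], -1.0, 1.0))
```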

However, if the model should have a reasonable error on the entire domain, several methods are available to reduce the chance of getting poles or remove the possibility altogether. The possible solutions are:

* Simply wait for more data; usually the spikes disappear (but not always).
* Lower the maximum of the <code>"percentBounds"</code> option in the RationalFactory section of your configuration file. For example, say you have 500 data points. If the maximum of the <code>"percentBounds"</code> option is set to 100 percent, the degrees of the polynomials in the rational function can go up to 500. If you set the maximum of the <code>"percentBounds"</code> option to 10, on the other hand, the maximum degree is set at 50 (= 10 percent of 500). You can also use the <code>"maxDegrees"</code> option to set an absolute bound.
* If you roughly know the output range your data should have, an easy way to eliminate poles is to use the [[Measures#MinMax| MinMax]] [[Measures| Measure]] together with your current measure ([[Measures#CrossValidation| CrossValidation]] by default). This will cause models whose response falls outside the min-max bounds to be penalized extra, thus spikes should disappear.
* Use a different model type (RBF, ANN, SVM,...), as spikes are a typical problem of rational functions.
* Increase the population size if using the genetic version.
* Try using the [[SampleSelector#RationalPoleSuppressionSampleSelector| RationalPoleSuppressionSampleSelector]]. It was designed to get rid of this problem more quickly, but it only selects one sample at a time.

However, these solutions may still not suffice in some cases. The underlying reason is that the order selection algorithm contains quite a lot of randomness, making it prone to over-fitting. This issue is being worked on but will take some time. Automatic order selection is not an easy problem.
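
The <code>"percentBounds"</code> arithmetic from the list above can be written out as a small Python sketch. This is an illustration of the numbers in the text only: the function name is hypothetical, and both the rounding (integer division) and the way <code>"maxDegrees"</code> is combined with the percentage (taking the smaller of the two) are assumptions, not a description of the toolbox internals.

```python
def max_allowed_degree(num_samples, max_percent, max_degrees=None):
    """Upper bound on the degree implied by the maximum of "percentBounds".

    max_percent is the upper percentage (e.g. 10 for "percentBounds" = "x,10").
    max_degrees, if given, acts as an absolute cap (assumed behaviour).
    """
    bound = num_samples * max_percent // 100
    if max_degrees is not None:
        bound = min(bound, max_degrees)
    return bound

print(max_allowed_degree(500, 100))                  # 500
print(max_allowed_degree(500, 10))                   # 50
print(max_allowed_degree(500, 100, max_degrees=40))  # 40
```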

=== There is no noise in my data yet the rational functions don't interpolate ===

[[FAQ#I sometimes get flat models when using rational functions |See this question]].

=== When loading a model from disk I get "Warning: Class ':all:' is an unknown object class. Object 'model' of this class has been converted to a structure." ===

You are trying to load a model file without the SUMO Toolbox in your Matlab path. Make sure the toolbox is in your Matlab path.

In short: Start Matlab, run <code><SUMO-Toolbox-directory>/startup.m</code> (to ensure the toolbox is in your path) and then try to load your model.

=== When running the SUMO Toolbox you get an error like "No component with id 'annpso' of type 'adaptive model builder' found in config file." ===

This means you have specified to use a component with a certain id (in this case an AdaptiveModelBuilder component with id 'annpso') but a component with that id does not exist further down in the configuration file (in this particular case 'annpso' does not exist but 'anngenetic' or 'ann' does, as a quick search through the configuration file will show). So make sure you only declare components which have a definition lower down. To see which components are available, simply scroll down the configuration file and see which ids are specified. Please also refer to the [[Toolbox configuration#Declarations and Definitions | Declarations and Definitions]] page.

=== When using NANN models I sometimes get "Runtime error in matrix library, Choldc failed. Matrix not positive definite" ===

This is a problem in the mex implementation of the [http://www.iau.dtu.dk/research/control/nnsysid.html NNSYSID] toolbox. Simply delete the mex files, the Matlab implementation will be used and this will not cause any problems.

=== When using FANN models I sometimes get "Invalid MEX-file createFann.mexa64, libfann.so.2: cannot open shared object file: No such file or directory." ===

This means Matlab cannot find the [http://leenissen.dk/fann/ FANN] library itself to link to dynamically. Make sure it is in your library path, i.e., on Unix systems, make sure it is included in LD_LIBRARY_PATH.

=== When trying to use SVM models I get 'Error during fitness evaluation: Error using ==> svmtrain at 170, Group must be a vector' ===

You forgot to build the SVM mex files for your platform. For Windows they are pre-compiled for you; on other systems you have to compile them yourself with the makefile.

=== When running the toolbox you get something like '??? Undefined variable "ibbt" or class "ibbt.sumo.config.ContextConfig.setRootDirectory"' ===

First see [[FAQ#What_is_the_relationship_between_Matlab_and_Java.3F | this FAQ entry]].

This means Matlab cannot find the needed Java classes. This typically means that you forgot to run 'startup' (to set the path correctly) before running the toolbox (using 'go'). So make sure you always run 'startup' before running 'go' and that both commands are always executed in the toolbox root directory.

If you did run 'startup' correctly and you are still getting an error, check that Java is properly enabled:

# typing 'usejava jvm' should return 1
# typing 's = java.lang.String', this should ''not'' give an error
# typing 'version('-java')' should return at least version 1.5.0

* If (1) returns 0, then the JVM of your Matlab installation is not enabled. Check your Matlab installation or startup parameters (did you start Matlab with -nojvm?).
* If (2) fails but (1) is OK, there is a very weird problem; check the Matlab documentation.
* If (3) returns a version before 1.5.0, you will have to upgrade Matlab to a newer version or force Matlab to use a custom, newer JVM (see the Matlab docs for how to do this).

=== You get errors related to ''gaoptimset'',''psoptimset'',''saoptimset'',''newff'' not being found or unknown ===

You are trying to use a component of the SUMO toolbox that requires a Matlab toolbox that you do not have. See the [[System requirements]] for more information.

=== After upgrading I get all kinds of weird errors or warnings when I run my XML files ===

See [[FAQ#How_do_I_upgrade_to_a_newer_version.3F]]

=== I get a warning about duplicate samples being selected, why is this? ===

Sometimes, in special circumstances, multiple sample selectors may select the same sample at the same time. Even though in most cases this is detected and avoided, it can still happen when multiple outputs are modelled in one run, and each output is sampled by a different sample selector. These sample selectors may then accidentally choose the same new sample location.

=== I sometimes see the error of the best model go up, shouldn't it decrease monotonically? ===

There is no short answer here, it depends on the situation. Below 'single objective' refers to the case where during the hyperparameter optimization (= the modeling iteration) combineOutputs=false, and there is only a single measure set to 'on'. The other cases are classified as 'multi objective'. See also [[Multi-Objective Modeling]].

# '''Sampling off'''
## ''Single objective'': the error should always decrease monotonically, you should never see it rise. If it does, [[reporting problems|report it as a bug]].
## ''Multi objective'': There is a very small chance the error can temporarily increase but it should be safe to ignore. In this case it is best to use a multi objective enabled modeling algorithm.
# '''Sampling on'''
## ''Single objective'': inside each modeling iteration the error should always monotonically decrease. At each sampling iteration the best models are updated (to reflect the new data), so the best model score may increase there; this is normal behavior(*). It is possible that the error increases for a short while, but as more samples come in it should decrease again. If this does not happen you are using a poor measure or a poor hyperparameter optimization algorithm, or there is a problem with the modeling technique itself (e.g., clustering in the datapoints is causing numerical problems).
## ''Multi objective'': Combination of 1.2 and 2.1.

(*) This is normal if you are using a measure like cross validation that is less reliable on little data than on more data. However, in some cases you may wish to override this behavior if you are using a measure that is independent of the number of samples the model is trained with (e.g., a dense, external validation set). In this case you can force a monotonic decrease by setting the 'keepOldModels' option in the SUMO tag to true. Use with caution!

=== At the end of a run I get Undefined variable "ibbt" or class "ibbt.sumo.util.JpegImagesToMovie.createMovie" ===

This is normal, the warning printed out before the error explains why:

''[WARNING] jmf.jar not found in the java classpath, movie creation may not work! Did you install the SUMO extension pack? Alternatively you can install the java media framwork from java.sun.com''

By default, at the end of a run, the toolbox will try to generate a movie of all the intermediate model plots. To do this it requires the extension pack to be installed (you can download it from the SUMO lab website). So install the extension pack and you will no longer get the error. Alternatively you can simply set the "createMovie" option in the <SUMO> tag to "false".
Note that there is nothing to worry about: everything has run correctly, only the movie creation is failing.

=== On startup I get the error "java.io.IOException: Couldn't get lock for output/SUMO-Toolbox.%g.%u.log" ===

This error means that SUMO is unable to create the log file. Check that the output directory exists and has the correct permissions. If your output directory is on a shared (network) drive this could also cause problems. Also make sure you are running the toolbox (calling 'go') from the toolbox root directory, and not from some toolbox subdirectory! This is very important.

If you still have problems you can override the default logfile name and location as follows:

In the <FileHandler> tag inside the <Logging> tag add the following option:

<source lang="xml">
<Option key="Pattern" value="My_SUMO_Log_file.log"/>
</source>

This means that from now on the SUMO log file will be saved as the file "My_SUMO_Log_file.log" in the SUMO root directory. You can use any path you like.
For more information about this option see [http://java.sun.com/j2se/1.4.2/docs/api/java/util/logging/FileHandler.html the FileHandler Javadoc].

=== The Toolbox crashes with "Too many open files" what should I do? ===

This is a known bug, see [[Known_bugs#Version_6.1]].

If this does not fix your problem then do the following:

On Windows, try increasing the limit in Windows as dictated by the error message. Also, when you get the error, use the fopen('all') command to see which files are open and send us the list, so we can help you debug the problem further. Even better would be to use the Process Explorer utility [http://technet.microsoft.com/en-us/sysinternals/bb896653.aspx available here]. When you get the error, don't shut down Matlab but start Process Explorer and see which SUMO-Toolbox related files are open. If you then [[Reporting_problems|let us know]] we can further debug the problem.

On Linux again don't shut down Matlab but:

* open a new terminal window
* type:
<source lang="bash">
lsof > openFiles.txt
</source>
* Then [[Contact|send us]] the following information:
** the file openFiles.txt
** the exact Linux distribution you are using (Red Hat 10, CentOS 5, SUSE 11, etc).
** the output of
<source lang="bash">
uname -a ; df -T ; mount
</source>

As a temporary workaround you can try increasing the maximum number of open files ([http://www.linuxforums.org/forum/redhat-fedora-linux-help/64716-where-chnage-file-max-permanently.html see for example here]). We are currently debugging this issue.

In general: to be safe it is always best to do a SUMO run from a clean Matlab startup, especially if the run is important or may take a long time.

=== When using the LS-SVM models I get lots of warnings: "make sure lssvmFILE.x (lssvmFILE.exe) is in the current directory, change now to MATLAB implementation..." ===

The LS-SVMs have a C implementation and a Matlab implementation. If you don't have the compiled mex files, the toolbox will use the Matlab implementation and give a warning, but everything will work properly. To get rid of the warnings, compile the mex files [[Installation#Windows|as described here]]; this can be done very easily. Alternatively, simply comment out the lines that produce the output in the lssvmlab directory in src/matlab/contrib.

=== I get an error "Undefined function or method 'trainlssvm' for input arguments of type 'cell'" ===

You most likely forgot to [[Installation#Extension_pack|install the extension pack]].

=== When running the SUMO-Toolbox under Linux, the [http://en.wikipedia.org/wiki/X_Window_System X server] suddenly restarts and I am logged out of my session ===

Note that in Linux there is an explicit difference between the [http://en.wikipedia.org/wiki/Linux_kernel kernel] and the [http://en.wikipedia.org/wiki/X_Window_System X display server]. If the kernel crashes or panics, your system completely freezes (you have to reset manually) or your computer does a full reboot. Luckily this is very rare. However, if your display server (X) crashes or restarts, your operating system is still running fine; it's just that you have to log in again since your graphical session has terminated. This FAQ entry is only for the latter. If you find your kernel is panicking or freezing, that is a more fundamental problem and you should contact your system admin.

So what happens is that after a few seconds when the toolbox wants to plot the first model [http://en.wikipedia.org/wiki/X_Window_System X] crashes and you are suddenly presented with a login screen. The problem is not due to SUMO but rather to the Matlab - Display server interaction.

What you should first do is set plotModels to false in the [[Config:ContextConfig]] tag, run again and see if the problem occurs again. If it does please [[Reporting_problems| report it]]. If the problem does not occur you can then try the following:

* Log in as root (or use [http://en.wikipedia.org/wiki/Sudo sudo])
* Edit the following configuration file using a text editor (pico, nano, vi, kwrite, gedit,...)

<source lang="bash">
/etc/X11/xorg.conf
</source>

Note: the exact location of the xorg.conf file may vary on your system.

* Look for the following line:

<source lang="bash">
Load "glx"
</source>

* Comment it out by replacing it by:

<source lang="bash">
# Load "glx"
</source>

* Then save the file, restart your X server (if you do not know how to do this simply reboot your computer)
* Log in again, and try running the toolbox (making sure plotModels is set to true again). It should now work. If it still does not please [[Reporting_problems| report it]].

Note:
* this is just an empirical workaround, if you have a better idea please [[Contact|let us know]]
* if you wish to debug further yourself please check the Xorg log files and those in /var/log
* another possible workaround is to start Matlab with the "-nodisplay" option. That could work as well.

=== I get the error "Failed to close Matlab pool cleanly, error is Too many output arguments" ===

This happens if you run the toolbox on Matlab version 2008a and you have the parallel computing toolbox installed. You can simply ignore this error message, it does not cause any problems. If you want to use SUMO with the parallel computing toolbox you will need Matlab 2008b.

=== The toolbox seems to keep on running forever, when or how will it stop? ===

The toolbox will keep on generating models and selecting data until one of the termination criteria has been reached. It is up to ''you'' to choose these targets carefully, so how long the toolbox runs simply depends on what targets you choose. Please see [[Running#Understanding_the_control_flow]].

Of course choosing targets up front is not always easy and there is no real solution for this, except thinking carefully about what type of model you want (see [[FAQ#I_dont_like_the_final_model_generated_by_SUMO_how_do_I_improve_it.3F]]). When in doubt you can always use a small value (or 0) and then simply quit the running toolbox using Ctrl-C when you think it has run long enough.

While one could implement fancy, automatic stopping algorithms, their actual benefit is questionable.

Changelog

2009-10-19T09:13:34Z

Admin:

Below you will find the detailed list of changes in every new release. For a more high level overview see the [[Whats new]] page.

== 6.2.1 - 19 October 2009 ==

* This release fixes a number of bugs from 6.2. All users are strongly requested to upgrade.

== 6.2 - 6 October 2009 ==

* A new neural network modelbuilder "ann". This is a lot faster than the existing "anngenetic" and the quality of the models is roughly the same
* The sample selection infrastructure is now much more powerful, sample selection criteria can be combined with much more flexibility. This opens the way to dynamic variation of sampling criteria.
* Support for Input constraints / multiple output sampling in the LOLA-Voronoi sample selection algorithm
* Support for auto-sampled inputs (e.g., frequency in an EM context) in LOLA-Voronoi. This is useful if a particular input is already sampled by your simulator.
* Automatic filtering of samples close to each other in CombinedSampleSelector
* Support for TriScatteredInterp in InterpolationModel when it is available (Matlab version 2009a and later)
* Sample selectors that support it (for example: LOLA-Voronoi) now give priorities to new samples, so that samples are submitted and evaluated in order of importance.
* Support for pre-calculated Latin Hypercube Designs, these will be automatically downloaded and used where possible and will improve performance
* The Blind Kriging models have been improved and can now also be used as ordinary Kriging models. Since these models are superior to the existing DACE Toolbox models, the DACE Toolbox backend has been removed.
* The EGOModelBuilder (do model parameter optimization using the EGO algorithm) now uses a nested blind kriging model instead of one based on the DACE Toolbox. This allows for better accuracy
* The Kriging correlation functions can now be chosen automatically (instead of only the correlation parameters)
* Support for multiobjective optimization in the EGO framework (extended version of probability of improvement)
* DelaunaySampleSelector and OptimizeCriterion support the same set of criteria
* EGO Improvement criteria can now be used together with DACEModel, RBFModel, and SVMModel (LS-SVM backend only)
* Added a model type and builder that does linear/cubic/nearest neighbour interpolation
* All error functions and measures now consistently deal with complex valued data and multiple output models
* Various improvements in the Model Info GUI as part of the Model browser tool
* Improved stability in LRMMeasure, a behavioral complexity metric to help ensure parsimonious models
* The profiler GUI has been updated and improved, and support for textual profilers has been added.
* Improved performance when using Measures, especially for models with multiple outputs.
* Improved management of the best model trace, also in pareto mode
* Removed the debug output when using (LS-)SVM models and added compiled mex files for Windows
* Ported the remaining classes to Matlab's Classdef format
* Increased use of the parallel computing toolbox (if available) in order to speed up modeling
* Improved the Matlab file headers so the help text is more informative (always includes at least the signature)
* Support for plotting the model prediction uncertainty in the model browser (only for 1D plots and not supported by all model types)
* Added support for so-called "reference by id" on every level of the config. If a tag of a particular type is defined on top-level with an id, it can be referenced everywhere else, instead of copying it entirely. See rationalPoleSupression sample selector and patternsearch Optimizer, for example.
* EmptyModelBuilder added - in case you just want to use the sequential design facilities of the toolbox, but not its models.
* Various cleanups and bugfixes

== 6.1.1 - 17 April 2009 ==

* Various cleanups and bugfixes (see [[Known bugs]] for 6.1)

== 6.1 - 16 February 2009 ==

* The default error function is now the Bayesian Error Estimation Quotient (BEEQ)
* Full support for multi-objective model generation, multiple measures can now be enforced simultaneously. This can also be applied to generating models with multiple outputs (combineOutputs = true). Together with the automatic model type selection algorithm (heterogenetic) this allows the automatic selection of the best model type per output.
* The model browser GUI now supports QQ plots
* The Gradient Sample Selection Algorithm has been renamed to the Local Linear Sample Selector (LOLASampleSelector)
* The modelbuilders have been refactored and some removed. This is a result of the optimizer hierarchy being cleaned up. Adding a new model parameter optimization routine should now be more straightforward.
* The interface classes have been renamed to factories as this is more correct. All implementations have been ported to Matlab's new Classdef format and the inheritance hierarchy has been cleaned up. It should now be significantly easier to add support for new approximation types.
* The ModelInterfaces are now known as ModelFactories, this is more correct. Note that the XML tagnames have been changed as well.
* The Model class hierarchy has been converted to the new Classdef format. This means that models generated with previous versions of the toolbox will no longer be loadable in this version.
* The heterogenetic model builder for automatic model type selection has been cleaned up and made more robust.
* Rational models now support all available modelbuilders. This means that order selection can be done by PSO, DIRECT, Simulated Annealing, ... instead of just GA and Sequential.
* New optimizers added are (they can also be used as model builders): Differential Evolution
* Added a Blind Kriging model type implementation as a backend of KrigingModel
* Addition of an EGO model builder. This allows optimization of the model parameters using the well known Efficient Global Optimization (EGO) algorithm. In essence this uses a nested Kriging Model to predict which parameters should be used to build the next model.
* Trivial dependencies on the Statistics Toolbox have been removed
* Added a new smoothness measure (LRMMeasure) that helps to ensure smooth models and reduce erratic bumps. It works best when combined with other Measures (such as SampleError for ANN models)
* Models now have a simple evaluateDerivative() method that allows one to easily get gradient information. The base class implementation is very simple but works. Models can override this method to get more efficient implementations.
* Added experimental support for the Matlab Parallel Computing Toolbox (local scheduler only). This means that when the parallelMode option in ContextConfig is switched on, model construction will make use of all available cores/cpu's.
* Many speed improvements, some quite significant.
* Various cleanups and bugfixes

== 6.0.1 - Released 23 August 2008 ==

* Fixed a number of (minor) bugs in the 6.0 release

== 6.0 - Released 6 August 2008 ==

* Many important bugs have been fixed that could have resulted in sub-optimal models
* Addition of a Model Browser GUI, this allows you to easily 'walk' through multi-dimensional models
* Moved the InitialDesign tag outside of the SUMO tag
* Some speed improvements
* Removed support for dummy inputs
* Measure scores and input/output names are saved inside the models, allowing for more usable plots
* Added the project directory concept: each example is now self-contained in its own directory
* #simulatorname# can now be used in the run name; it will be replaced by the real simulator name
* Input dimensions can be ignored during sampling if the simulator samples them for you. This is useful in EM applications, for example, where frequency points can be cheap.
* Logging framework revamped; logs can now be saved on a per-run basis
* The global score calculation has changed! It is now a weighted sum of all individual measures (the weights are configurable but default to 1)
* Added a simple polynomial model where the orders can be chosen manually
* Countless cleanups, minor bugfixes and feature enhancements
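
The weighted-sum global score mentioned above can be illustrated with a short Python sketch. The function <code>global_score</code> and the example score values are hypothetical; only the rule itself (a weighted sum of the individual measure scores, with weights defaulting to 1) comes from the release note.

```python
def global_score(measure_scores, weights=None):
    """Combine individual measure scores into one global score.

    measure_scores: dict mapping a measure name to its score.
    weights: optional dict mapping a measure name to its weight;
             any measure without an explicit weight gets weight 1.
    """
    if weights is None:
        weights = {}
    return sum(weights.get(name, 1.0) * score
               for name, score in measure_scores.items())


# Illustrative values only, not results from a real run.
scores = {"SampleError": 0.2, "MinMax": 0.1}
unweighted = global_score(scores)
weighted = global_score(scores, {"MinMax": 2.0})
```

With the default weights the result is simply the sum of the measure scores; raising a weight makes the corresponding measure count more heavily in model selection.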

== 5.0 - Released 8 April 2008 ==

* In April 2008, the first public release of the '''Surrogate Modeling (SUMO) Toolbox''' (v5.0) occurred.
* A major new release with countless fixes, improvements, new sampling and modeling algorithms, and much more.

List of changes:

* Fixed the 'Known bugs' for v4.2 (see Wiki)
* Data points now have priorities (assigned by the sample selectors)
* Vastly reworked and improved the sample evaluator framework
** robust handling of failed or 'lost' data points
** pluggable input queue infrastructure to make advanced scheduling policies possible
* The number of samples to select each iteration is now chosen dynamically, based on the time needed for modeling, the length of one simulation, the number of compute nodes available, ... A user-specified upper bound can still be set, of course.
* Model plots are now in the original space instead of the normalized ([-1 1]) space
* The default error function is now the root relative square error (= a global relative error)
* Intelligent seeding of each new model parameter optimization iteration. This means the model parameter space is searched much more efficiently and completely
* Added a fast Neural Network Modeler based on FANN (http://fann.sf.net)
* Added a Neural Network Modeler based on NNSYSID (http://www.iau.dtu.dk/research/control/nnsysid.html)
* The LS-SVM model type has been merged with the SVM model type. The SVM model now supports three backends: libSVM, SVMlight, and lssvm
* Added a SampleSelector using infill sampling criteria (ISC).
** The expected improvement from EGO/superEGO is provided among others. (only usable with Kriging and RBF)
* More robust handling of SSH sessions when running simulators on a remote cluster
* The TestSamples measure has been renamed to ValidationSet
* The Polynomial model type has been renamed to the more apt Rational model
* The grid and voronoi sample selectors have been renamed to Error and Density respectively
* Drastically reduced memory usage when performing many runs with multiple datasets (datasets are cached)
* Added utility functions for easily summarizing profiler data from a large number of runs
* Lots of speed improvements in the gradient sample selector
* The default settings have been harmonized and much improved
* The (LS)SVM parameter space is now searched in log10 instead of ln space
* Added a TestMinimum measure
** compares the minimum of the surrogate model against a predefined value (for instance a known minimum)
* Added a MinimumProfiler
** tracks the minimum of the surrogate model versus the number of iterations
* Movie creation now works on all supported platforms
* Added an optimizer class hierarchy for solving subproblems transparently
* Cleaned up the structure of all the model classes so they no longer contain an interface object. This was confusing and led to error-prone code. Virtually all subsref and subsasgn implementations have also been removed.
* The MinMax measure is now enabled by default
* The Optimization framework was removed (and replaced) for various reasons, see: http://sumowiki.intec.ugent.be/index.php/FAQ#What_about_surrogate_driven_optimization.3F
* Fixed the file output of the profiler, formatting is correct now
* New implementation of a maximin latin hypercube design
** Minimizes pairwise correlation
** Minimizes intersite distance
* Removed dependency of factorial design on the statistics toolbox
* Added a plotOptions tag, this allows for more customisability of model plots (grey scale, light effects, ...)
* Profiler plots can now also be saved as JPG, PNG, EPS, PDF, PS and SVG
* Countless cleanups, minor bugfixes and feature enhancements
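
For reference, the root relative square error named above as the new default error function is commonly defined as the square root of the summed squared errors divided by the summed squared deviations of the true values from their mean. The Python sketch below follows that common definition; the function name is hypothetical and the toolbox's own implementation may differ in detail.

```python
import math


def root_relative_square_error(y_true, y_pred):
    """Root relative square error (RRSE), using the common definition.

    0 means a perfect fit; 1 means the model is no better than always
    predicting the mean of the true values.
    """
    mean_true = sum(y_true) / len(y_true)
    squared_error = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    squared_spread = sum((t - mean_true) ** 2 for t in y_true)
    if squared_spread == 0.0:
        raise ValueError("RRSE is undefined when all true values are equal")
    return math.sqrt(squared_error / squared_spread)


perfect_fit = root_relative_square_error([1.0, 2.0, 3.0], [1.0, 2.0, 3.0])
mean_predictor = root_relative_square_error([1.0, 2.0, 3.0], [2.0, 2.0, 2.0])
```

Because the error is normalized by the spread of the data, it is a global relative error: it does not depend on the absolute scale of the output.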

== 4.2 - Released 18 October 2007 ==

* Fixed the 'Known bugs' for v4.1 (see Wiki)
* Simulators can be passed options through an <Options> tag
* Added a fixed model builder so you can manually force which model parameters to use
* Removed ProActive dependency for the SGE distributed backend
* Improved Makefile under unix/linux
* Data produced by simulators no longer needs to be pre-scaled to [-1 1], this can be done automatically from the simulator configuration file
* Deprecated the optimization framework. It is currently being redesigned, and a better, more integrated version will be released with the next toolbox version.
* Lots of cleanups, minor bugfixes and small feature enhancements
* In October 2007, the development of the M3-Toolbox was discontinued.

== 4.1 - Released 27 July 2007 ==

* Fixed the 'Known bugs' for v4.0 (see Wiki)
* Vastly improved test sample distribution if a test set is created on the fly
* Gradient sample selector now works with complex outputs and has improved neighbourhood selection
* Speed and usability improvements in the profiler framework
* Improvements in the profiler DockedView widget (added a right click context menu)
* Addition of some new examples
* Added an option (on by default) that selects a certain percentage of the grid sample selector's points randomly, making the algorithm more robust
* Some cleanups, minor bugfixes and feature enhancements

== 4.0 - Released 22 June 2007 ==

* IMPORTANT: the best model score is now 0 instead of 1, this is more intuitive
* Reworked and improved the model scoring mechanism, now based on a Pareto analysis. This makes it possible to combine multiple measures in a sensible way.
* Added a proof-of-concept surrogate driven optimization framework. Note that this is an initial implementation which works, but don't expect state-of-the-art results.
* Cleanup and refactoring of the profiler framework
* The profiling of model parameters has been totally reworked and this can now easily be tracked in a nice GUI widget
* Cleanup of error function logic so you can now easily use different error functions (relative, RMS, ...) in the measures
* Improved model plotting
* Support for the SVMlight library (you must download it yourself in order to use it)
* Added a MinMax measure which can be used to suppress spikes in rational models
* Support for extinction prevention in the heterogenetic modeler
* Fixed warnings (and in some cases errors) when loading models from disk
* Respect the maximum running time more accurately
* Many cleanups, minor bugfixes and feature enhancements
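
The release note above does not describe how the Pareto-based scoring works in detail. As a rough illustration of the general idea, the Python sketch below keeps only the models whose measure scores are not dominated by another model, assuming lower scores are better (consistent with the best score now being 0). The names <code>dominates</code> and <code>pareto_front</code> and the example scores are hypothetical.

```python
def dominates(a, b):
    """True if score tuple a is at least as good as b on every measure
    and strictly better on at least one (lower is better)."""
    return (all(x <= y for x, y in zip(a, b))
            and any(x < y for x, y in zip(a, b)))


def pareto_front(score_tuples):
    """Return the score tuples that no other tuple dominates."""
    return [s for s in score_tuples
            if not any(dominates(other, s) for other in score_tuples)]


# Each tuple holds the scores of one model on two measures (illustrative).
model_scores = [(0.1, 0.5), (0.2, 0.2), (0.3, 0.6), (0.5, 0.1)]
front = pareto_front(model_scores)
```

Here the model scoring (0.3, 0.6) is dropped because another model beats it on both measures, while the remaining three represent different trade-offs between the two measures.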

== 3.3 - Released 2 May 2007 ==

* Fixed incorrect summary at the end of a run
* Fixed bug due to duplicate sample points
* Ability to evaluate multiple samples in parallel locally (support for dual/multi-core machines)
* Speedups when reading in datasets
* Added 2 new modelbuilders that optimize the parameters using:
** Pattern Search (requires the Matlab direct search toolbox)
** Simulated Annealing (requires Matlab v7.4 and the direct search toolbox)
** The Matlab Optimization Toolbox (includes different gradient based methods like BFGS)
* A new density based sample selection algorithm (VoronoiSampleSelector)
* New simulator examples to test with
* Addition of a profiler to generate levelplots
* Ability to generate Matlab API documentation using m2html
* New neural network training algorithms based on Differential Evolution and Particle Swarm Optimization
* It is now possible to call the toolbox with specific samples/values directly, e.g., go('myConfigFile.xml',xValues,yValues);
* Many minor bugfixes and feature enhancements

== 3.2 - Released 9 Mar 2007 ==

* Many important bugfixes
* Documentation improvements
* Fully working support for RBF models
* New measure profilers that track the errors on measures
* Many new predefined functions and datasets to test with. We now have over 50 examples!

== 3.1 - Released 28 Feb 2007 ==

* Small bugfixes and usability improvements
* Improved documentation
* Working implementation of a heterogeneous evolutionary modelbuilder
* More examples

== 3.0 - Released 14 Feb 2007 ==

* Availability of pre-built binaries
* Extensive refactoring and code cleanups
* Many bugfixes and usability improvements
* Resilience against simulator crashes
* Ability to set the maximum running time for one sample evaluation
* Vastly improved Genetic model builder + a neural network implementation
* Addition of a RandomModelBuilder to use as a baseline benchmark
* Possible to add dummy input variables or to model only a subset of the available inputs while clamping others
* Improved multiple output support
** outputs can be modeled in parallel
** each output can be configured separately (e.g. per output: model type, accuracy requirements (measure), sample selection algorithm, complex handling flag, etc.)
** multiple outputs can be combined into one model if the model type supports this
* Noisy (Gaussian, outliers, ...) versions of a given output can be automatically added
* New and improved directory structure for output data
* New model types:
** Kriging (based on the DACE MATLAB Kriging Toolbox by Lophaven, Nielsen and Sondergaard)
** Splines (based on the MATLAB Splines Toolbox, only for 1D and 2D)
* Matlab scripts can now be used as data sources (simulators) as well
* New initial experimental design
** Based on a dataset
** Combination of existing designs
** Based on the complexity of different 1D fits
* Addition of new datasets and predefined functions as modeling examples

== 2.0 - Released 15 Nov 2006 ==

* Initial release of the M3-Toolbox - open source