Tutorial: Neural Synthesis in Max 8 with RAVE

Learn to perform neural audio synthesis inside Max 8 using nn~.

Are you looking to dig a little deeper into neural audio synthesis within patching environments? This tutorial is made for you!

Installation

We will use nn~ to interface neural audio synthesis models with both Max and Pure Data, so be sure to download the files corresponding to your platform from the latest release!

Install nn~

For Max 8

Just unarchive the nn_max_msp_OS_ARCH.tar.gz archive and place the resulting folder in the Packages folder of your Max 8 folder. You can put it somewhere else, but do not forget to add its location to Max's File Preferences!

For Pure Data

Just unarchive the Pure Data archive corresponding to your platform and place the folder in the externals folder of your Pd folder. On macOS, do not forget to remove the quarantine attribute by running in a terminal:

cd /path/to/nn/folder  # replace with the location of your external!
xattr -r -d com.apple.quarantine .

Download a model

Keep in mind that nn~ is only a bridge between patching environments and neural synthesis models, so you will also need to download an nn~-compatible model. So far, nn~ is compatible with RAVE and vschaos2; head to the corresponding pages to fetch the models you want to try out.

Important: in Max, your models must be reachable through Max's File Preferences, so be sure to put them in an appropriate location. In Pure Data, the models must be in the same folder as the external.

Generating audio with RAVE

nn~ installed? Models downloaded? We are now ready to play a little bit with nn~.

Audio transformation with forward

The most straightforward way of generating sound with RAVE through nn~ is the forward function. The arguments of nn~ are:

nn~ MODEL_NAME [METHOD_NAME] [BUFFER_SIZE]

where MODEL_NAME is the name of the model (for example, vintage.ts), METHOD_NAME is the name of the method (forward by default), and BUFFER_SIZE is the internal buffer size used by nn~ to process the sound (the smallest available size by default). To use your model as an audio effect, just connect an audio input to nn~ and nn~ to an audio output, for instance by placing nn~ vintage.ts between an adc~ and a dac~.

And that's all! You should now be able to perform neural transformation of your incoming sound 🎶
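To make the argument order concrete, here is a small Python sketch that parses the text of an nn~ box and fills in the defaults described above. NNBoxArgs, parse_nn_box and buffer_duration_ms are hypothetical helpers written for this tutorial, not part of nn~; the duration helper is plain arithmetic, assuming BUFFER_SIZE is counted in samples.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical helpers (not part of nn~): they only mirror the box syntax
# `nn~ MODEL_NAME [METHOD_NAME] [BUFFER_SIZE]` to show how defaults apply.


@dataclass
class NNBoxArgs:
    model_name: str
    method_name: str = "forward"       # forward is the default method
    buffer_size: Optional[int] = None  # None = let nn~ pick (smallest by default)


def parse_nn_box(text: str) -> NNBoxArgs:
    """Parse the text of an nn~ box, e.g. 'nn~ vintage.ts encode 4096'."""
    tokens = text.split()
    if not tokens or tokens[0] != "nn~":
        raise ValueError("not an nn~ box")
    args = tokens[1:]
    if not 1 <= len(args) <= 3:
        raise ValueError("expected: nn~ MODEL_NAME [METHOD_NAME] [BUFFER_SIZE]")
    parsed = NNBoxArgs(model_name=args[0])
    if len(args) >= 2:
        parsed.method_name = args[1]
    if len(args) == 3:
        parsed.buffer_size = int(args[2])
    return parsed


def buffer_duration_ms(buffer_size: int, sample_rate: float) -> float:
    """Duration covered by one internal buffer, in milliseconds."""
    return 1000.0 * buffer_size / sample_rate
```

For instance, parse_nn_box("nn~ vintage.ts") keeps method_name at "forward" with no explicit buffer size, and a 2048-sample buffer at 44.1 kHz spans about 46.4 ms. As a rule of thumb (an assumption worth checking on your own setup), larger buffers ease the CPU load but add delay between input and output.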

Tip: you can disable the internal DSP of an nn~ box by sending it an enable 0 message, and re-enable it with enable 1. Very convenient to save some CPU!

Latent manipulations with encode & decode

That was too easy, so let's make things a little more difficult. Both RAVE and vschaos2 are auto-encoders, meaning that they take sound as input, generate sound as output, and are trained to reconstruct the incoming sounds of the dataset. This processing relies on two separate stages:

  • an encoding process, where a given window of incoming audio (let's say 2048 samples) is transformed into a set of latent variables (128 parameters in general);
  • and a decoding process, which turns these 128 latent variables back into sound.

The forward function is actually just these two functions chained together. With nn~, you can access them separately through the encode and decode functions.
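The encode/decode split can be illustrated with a toy auto-encoder in plain Python. ToyAutoEncoder is a hypothetical stand-in, not RAVE: its "latents" are simply averages of sub-segments of each 2048-sample window, so reconstruction is lossy, but the shapes and the chaining follow the description above: one latent frame per window, one list per latent dimension (like one encode outlet per dimension), and forward defined as decode(encode(x)). The 8 latent dimensions are an arbitrary choice to keep the example small.

```python
from typing import List

Signal = List[float]
Latents = List[List[float]]  # latents[dim][frame]: one list per latent dimension


class ToyAutoEncoder:
    """Hypothetical stand-in for a neural auto-encoder (NOT RAVE).

    Each window of `ratio` samples becomes one frame of `latent_dim` values;
    here a 'latent' is just the mean of one sub-segment of the window.
    """

    def __init__(self, ratio: int = 2048, latent_dim: int = 8):
        if ratio % latent_dim:
            raise ValueError("ratio must be a multiple of latent_dim")
        self.ratio = ratio
        self.latent_dim = latent_dim
        self.chunk = ratio // latent_dim

    def encode(self, x: Signal) -> Latents:
        n_frames = len(x) // self.ratio  # samples that don't fill a window are dropped
        z = [[0.0] * n_frames for _ in range(self.latent_dim)]
        for f in range(n_frames):
            window = x[f * self.ratio:(f + 1) * self.ratio]
            for d in range(self.latent_dim):
                seg = window[d * self.chunk:(d + 1) * self.chunk]
                z[d][f] = sum(seg) / self.chunk
        return z

    def decode(self, z: Latents) -> Signal:
        n_frames = len(z[0])
        y: Signal = []
        for f in range(n_frames):
            for d in range(self.latent_dim):
                y.extend([z[d][f]] * self.chunk)  # hold each latent over its segment
        return y

    def forward(self, x: Signal) -> Signal:
        # forward is simply encode followed by decode
        return self.decode(self.encode(x))
```

Wiring every encode outlet straight into decode is exactly what forward does; the interesting part starts when some of those lists are replaced by something else.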

Each outlet of encode corresponds to one latent dimension of the input audio, so you can access each latent parameter separately. In vschaos2 all the latent dimensions are exposed to the user, while in RAVE only a subspace of these latent dimensions can be controlled, depending on the actual morphology of the latent space (see the video for more information).

Accordingly, the decode function has as many inputs as there are latent dimensions (plus conditioning inputs for vschaos2). Hence, connecting every patch cord from encode to decode is equivalent to the forward function; however, we can now access individual latents to transform the latent space. This is a great aspect of auto-encoding architectures: they span a full spectrum between an audio effect, where all the latents coming from an audio input are given to the decoder, and a synthesizer, where all the latents are directly controlled by the user. For example, we can mix encoded, user-controlled and automated latents in a single patch.

You can find this patch below, in the Example 1 section. Here, we feed the first four latent dimensions from the audio encoder, but map the 5th and 7th dimensions to a slider and the 6th and 8th dimensions to a parametrized LFO. During training, latent dimensions are pushed towards an isotropic normal distribution, so most of the information usually lies between -3 and 3; however, this may depend on your model, so do not hesitate to adapt the ranges through exploration.
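The hybrid conditioning of Example 1 boils down to building each decoder input from a different source. The Python sketch below assumes a model with 8 latent dimensions and one latent frame per 2048 samples (about 21.5 frames per second at 44.1 kHz); mix_latents and lfo are hypothetical helpers, and clipping the slider and LFO values to [-3, 3] is our own choice based on the range discussed above, not something nn~ does for you.

```python
import math
from typing import List

Latents = List[List[float]]  # latents[dim][frame]: one list per decoder input


def lfo(frame: int, frame_rate: float, freq_hz: float, amp: float = 3.0) -> float:
    """Sine LFO evaluated once per latent frame."""
    return amp * math.sin(2.0 * math.pi * freq_hz * frame / frame_rate)


def mix_latents(encoded: Latents, slider: float, lfo_freq_hz: float,
                frame_rate: float, limit: float = 3.0) -> Latents:
    """Decoder inputs as in Example 1 (dimensions counted from 1):
    1-4 pass through from the encoder, 5 and 7 follow a slider,
    6 and 8 follow an LFO. Slider and LFO values are clipped to [-limit, limit]."""
    if len(encoded) < 4:
        raise ValueError("need at least 4 encoded dimensions")
    n_frames = len(encoded[0])

    def clip(v: float) -> float:
        return max(-limit, min(limit, v))

    out = [list(encoded[d]) for d in range(4)]  # dims 1-4: driven by the audio
    slider_row = [clip(slider)] * n_frames
    lfo_row = [clip(lfo(f, frame_rate, lfo_freq_hz)) for f in range(n_frames)]
    # 0-based indices 4..7 correspond to dimensions 5..8
    out += [list(slider_row), list(lfo_row), list(slider_row), list(lfo_row)]
    return out
```

Moving more dimensions from the encoder to the slider or LFO side slides the patch from "audio effect" towards "synthesizer".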

RAVE usually sorts latent dimensions by their impact on the output sound, so this setup stays reactive to the input audio (through the first, most influential dimensions) while remaining controllable by the user. Such hybrid conditioning of the decoder allows endless sound shaping through neural synthesis; do not hesitate to try out any idea you have!

Multi-channel functionalities (Max 8 only)

Now, let's take a deeper look at the multi-channel functionalities of nn~ with the mc.nn~ and mcs.nn~ objects. These objects are very convenient both for patching and for saving CPU, so do not hesitate to integrate them into your workflow!

Batched transformation of sounds

Imagine you want to decode several sounds at the same time: a naive approach would be to duplicate the nn~ box for each sound.

However, besides the tedium of repetitive patching, this strategy is also dramatically inefficient: with four nn~ boxes, the model is loaded four times into RAM, and the processing load is multiplied by four. If your computer is not a racehorse, this can cause CPU overload and glitchy audio clicks.

Fortunately, mc.nn~ is here to save us! mc.nn~ uses the multi-channel feature of Max 8 to perform batch processing, meaning that it can process several inputs at the same time with a single model. Furthermore, depending on your hardware, the model may parallelize this processing efficiently: minimal CPU cost and a single model in RAM. Problem solved! To do this, just gather your sounds with an mc.pack~ object and send the result to an mc.nn~ instance.
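The difference between the two strategies is mostly about how many copies of the model exist. The Python sketch below is purely illustrative: ToyModel is a hypothetical placeholder (a fixed gain instead of a neural network) that counts its own instances, duplicated_boxes mimics one nn~ box per sound, and batched_box mimics mc.pack~ feeding a single mc.nn~.

```python
from typing import List

Signal = List[float]


class ToyModel:
    """Hypothetical stand-in for a loaded model; the instance counter is a
    proxy for how many copies would sit in RAM."""
    instances = 0

    def __init__(self) -> None:
        ToyModel.instances += 1

    def forward(self, batch: List[Signal]) -> List[Signal]:
        # Each channel is processed independently; a real model could run
        # the whole batch in one vectorized pass.
        return [[0.5 * s for s in channel] for channel in batch]


def duplicated_boxes(signals: List[Signal]) -> List[Signal]:
    """One nn~ box per sound: one model copy per sound."""
    return [ToyModel().forward([sig])[0] for sig in signals]


def batched_box(signals: List[Signal]) -> List[Signal]:
    """mc.pack~ into a single mc.nn~: one model processes every channel."""
    model = ToyModel()
    return model.forward(signals)
```

Both functions return the same audio, but the batched version creates a single model instead of one per sound; the actual CPU savings depend on how well the model parallelizes the batch on your machine.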

Latent manipulation with mc.nn~ and mcs.nn~

In addition to saving time and CPU, multi-channel processing can also be used to perform latent operations efficiently with mcs.nn~; but first, let's see the difference between mc.nn~ and mcs.nn~.

  • mc.nn~ has the same number of inputs/outputs as its nn~ counterpart, and automatically adapts to the lowest channel count among its inputs. For example, if every input has 4 channels, the outputs will have 4 channels; but if a single one has only 3 channels, the outputs will have 3 channels.
  • mcs.nn~ gathers all the inputs/outputs of a single instance into one multi-channel input/output, so the number of batches must be declared at initialization. Imagine a decode function with 8 inputs: an mcs.nn~ isis decode 1 will then have one input, requiring 8 channels. To process 4 inputs at the same time, you will need an mcs.nn~ isis decode 4, with 4 inputs each requiring 8 channels.
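The channel bookkeeping of the two objects fits in a few lines of Python. These helpers are hypothetical and only encode the rules stated above: mc.nn~ follows the smallest channel count among its inputs, mcs.nn~ has one input per batch carrying one channel per method input, and going from the mc.nn~ layout (cables = method inputs, channels = batches) to the mcs.nn~ layout (cables = batches, channels = method inputs) amounts to a transpose.

```python
from typing import List, TypeVar

T = TypeVar("T")


def mc_nn_output_channels(input_channel_counts: List[int]) -> int:
    """mc.nn~ adapts to the lowest channel count among its inputs."""
    return min(input_channel_counts)


def mcs_nn_inlet_layout(n_method_inputs: int, n_batches: int) -> List[int]:
    """Channels per inlet of `mcs.nn~ MODEL METHOD N_BATCHES`:
    one inlet per batch, one channel per input of the method."""
    return [n_method_inputs] * n_batches


def mc_to_mcs(mc_layout: List[List[T]]) -> List[List[T]]:
    """Transpose mc.nn~-style wiring (one cable per method input, one channel
    per batch) into mcs.nn~-style wiring (one cable per batch, one channel per
    method input). zip() stops at the shortest cable."""
    return [list(batch) for batch in zip(*mc_layout)]
```

For the decode example above, mcs_nn_inlet_layout(8, 4) gives four inlets of 8 channels each, while mc.nn~ would take 8 cables of 4 channels each. Note that zip stopping at the shortest cable mirrors mc.nn~ adapting to its smallest input.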

Both mc.nn~ and mcs.nn~ can be used to decode 4 sounds at the same time. The two objects are equivalent in terms of performance; they only differ in how their inputs and outputs are laid out, to allow different uses. mcs.nn~, for example, is very convenient for batch operations on the latent dimensions of different sounds, as in the patch of Example 2.

In this example, we simply add some latent noise to two different batches. mcs.nn~ lets us manipulate all the latents at once while still performing different operations for each batch, since each batch travels on its own patch cord (an operation that would have been very tedious with mc.nn~). You can find this patch below, in the Example 2 section.
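Here is a Python sketch of the idea behind Example 2, working on a single latent frame laid out the way mcs.nn~ expects it (one list per batch, one value per latent dimension). add_batch_noise is a hypothetical helper, and the Gaussian noise and its amounts are assumptions for illustration; the actual patch may generate its noise differently.

```python
import random
from typing import List, Sequence

Frame = List[List[float]]  # frame[batch][latent_dim], i.e. the mcs.nn~ layout


def add_batch_noise(frame: Frame, amounts: Sequence[float], seed: int = 0) -> Frame:
    """Add Gaussian noise with standard deviation amounts[b] to every latent of
    batch b; an amount of 0 leaves that batch untouched. Returns a new frame."""
    if len(amounts) != len(frame):
        raise ValueError("expected one noise amount per batch")
    rng = random.Random(seed)
    return [
        [(v + rng.gauss(0.0, amt)) if amt > 0 else v for v in batch]
        for batch, amt in zip(frame, amounts)
    ]
```

Because each batch sits on its own cable (here, its own inner list), treating batches differently is just a matter of giving each list its own amount; with mc.nn~, the same operation would require splitting every latent cable by channel.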

Well, that's it! Do not hesitate to ask your questions in the RAVE VST Forum.

Compressed patches

Example 1

<pre><code>
----------begin_max5_patcher----------
1122.3oc0XErjahCD8r8WAEG2xwFIP.dNmTU1C6w8zVolRFzXqLBIBRj3IoR
91WIAXSBXO3wxYpbvXnkDO8590Rs3aym4uQrmH88ty6+7lM6aymMyZxXXV6y
y7Kv6yXXosa9YhhBBW4unoMEYuxZOQsyKmVzYmlasJ17w2.AcF40ETNinruI
vQihZUm0fVqkXU1NJe68UjLUy7KAEuDsva85kAK7P.yUXvx.uOXFw2mO2bYw
UxBzoYQfaXQbXxMmEX8b4n4xJhT2KrhJ32ynbRlsUcGgCoIXsanIJFYoVB5F
xy2+0QBTfTGwffnkQvTTBn8ZjNrACswsvvqhObxWzyzAzQR29iwHTzEPHvYx
eRf14NHn4OKEfFJzNllWi5oRRy.LSHNl46TNlgYDufkd.Xh2aBW5EtbLJiFk
xwuPJCSrQMfNh9rb9kwVIilSpFiIwGRCwU3BhhTcOgi2vH8keNIvBfnNUo9g
n.2yxKT2F5FcaLJ7OFcKzM51VJ+ZpaA2dcaKKe0zs+0np1w2mGdojKBAGt8Q
qFNI41pgefIzyyQH2ZGETgmYaSMEGtsYhUAiBFv6GDUEXU+jjgw7Ed9av7sN
1Wjd68EipAfgouV9hSjGj8TFiLVtP7MLU3zRBmlJP9ZNNaLtgt.tclpDgqiM
DHBX2BNB0c0kwGN+GdTIU5kqKcOmLBYFuBwzKMP0Rlvlh1aEp+12p8.cI7SP
2KovhzmktfT3kR2EN4tWjSqjgehQkpCZZQEcKU+5TjhRg0OX2Ic3AXmx5cJZ
AQppHZeUCiO5KJoZiYhJi+SeRxmcUxGDLl3KaYhM8laci3S0XFU8jcxsAKoY
9iF7P8WYjqFG8yDYCsBYPyhLg8qrvBpplSMqkdmWTT2aqvH4FLstPgf4WN0N
SwUZV9geIVwH7spNOrYtoc7YOJO1wbrB2JBZUAy7yXzxCepldMnaBuQJX0Jh
1UXeq9+CNixUB4Nu2+16V8uRRkbUtn3QZ0ijUuU+xkZO7pbLs39Bk8evp2sW
Ugyzl+6hMjLJirJaWkn.WtSvIKwzGZcF1.htYtVE0r+wY61iTdiHDWmSEFK8
5Pq9rdc35f.X5g5mMsIIriAaieJDfRRWGkDFmFAAfEZSs2Gffw5VS67elgyD
hxd5QqGTvUDt5doBqHct29NxdipyjM8r4ll70VKsosVD8MeUke4qnYUkF6+b
trTTWk0Ih5Nceurrbc5mVgbj1cqu6EehkKlLPwS.HyAu8BtVjPSAoHWfT3Dc
dQWKPvoPoPWPIvTPB5BjBlnyKo25WlSKZV682CxngHCtJjWOEmqY1AtRma7T
A5ZihoS.nXGfiQaGNgPVnCvANAbfN.GvDvA3.blhT+ZiOSYaDzOwYGkLOUfC
bct7TVRd3JxMaXiKK+rtln1NagPWl8GEUGNxfutj1lGsGF2uh7Yprekm93Jc
YmJcIJ0UM0AsOt4bX1ZHq30zVNpI2bS4a5R3M0LIKwYskXpqze92m++.u94U
J
-----------end_max5_patcher-----------
</code></pre>

Example 2

<pre><code>
----------begin_max5_patcher----------
1062.3oc6X0zaaiCD8r8uBAcrqqsnjohbO0Cs.6hE6wcuTTDPSyXyFJRURpV
GTT+ae4GRxxIRtx01In.8PBMGRwYduY3Liz2FOJboXKQEF7lfODLZz2FOZjS
jUvnp4iByQawLjxssPEVTP1ENwuDckSnX4mdc7hZg7xbQolQztmHpRZARi2P
4quURvZuFSffoQSpG.IQMCAeb+YQ40GUrU12GO19uICzdwh7bBWWaaZxVmxC
YHsQb.WPUj.jwf2um1fJ6jAELKyAi49AfaHtWLA9IvDm7Uiw8DHkim9G6Bhl
1ENR6FGfJodQ5GJHdPDlWxzT7FDmSXJ5ZNhE1X+c4IAPGLSuwN.8Shuj9w9w
rhTfjFu4tfrtPNrajGeBHexIyGvHWHcrmVVDeb93mIFnfgdfQU5lqhBIcM0X
WLBesdi2LbWnBMF98p8laaxYdXCFjnbhlHukvQKYj1A4GRbvGoOMIuP3PgMH
uOVcOS10ur+shh0TAGIeXuohYzhMD55MNZMA1b94hUtSNbIRQwMffZbDXgz5
GLGUaLbmfwDecMSrrkEG8CCnAySbC23hqSRa4AuSHyQbc2pyYV5Rt4PshmOu
dgOWhXT8CO01OLfnNLmlSTZIwXdUNzF0uBoQUgGUwGd5pIYdqELKgVpDrRs4
phO1H7ePXJWKTaB9y28lY+qhHUyVIxumJumL6clCWYrkYqPz7ay0tQvr2uUK
QXi3+JeIASYjY+sIzZ5+YdVRxTD8tJv3nGyxbSHkSYGea2S49PRT4JpvJo0F
phVKiSVDEcyBHn0ZLgnnEc4n.A2lW+Vk1jQnleZyDsdpZQtad9eD7ww6kbh2
H6uxX7IWDINJ8pWYr2Lppob9t.phpBVQvlaZAwcgJvKQh0ZhIxyHwYW9JM2w
DlynKDGc5YK6kMrXeIxld3XkQRR8Yg7kUidBXcIgbaN852bwq5o4BP10s4BX
jq9IHyQFvjmylK5qgJvMWWLWEm+hzPkqm3cAu0Zsp.PZWvO84wkC.C5ZN3Jk
9iv6M8G.9Bl9CrH4pk96hzXI3WqFKSe9ZrrxCtv+9AdG3u6q7w8Uh2HE4nhM
BN4X8Ud7sMn9JShyhhfKZpg870WoixCYT9i+.LtnFq7CukpDkRbcXecGXAf8
dNiK0buoNj4C6+rLV2amYBFrlfCQSoGrob5pBgITPU+YQRlBMQ7vXeDeTyry
01R8.731VEUcVZxVyeXZ5rwDXHZJ9BnIvf8rWjXneDlbTb+wP.eTi+kxiMub
T8rykExFBeCuD78PhVqMmyiumODMAuD2KFrm8rYuAioyl8hFpe5P1ymTGUT7
ESgupc6ThoIqOIj1oYSbSob+T2KtEJIegpZWrODIMsNnMs.TJ8E61l5apx0i
hjWRqJ1Xf2XaMZSCb1BipBDtpEFSedi+93+Wpifq6
-----------end_max5_patcher-----------
</code></pre>