This post examines some of the details of the technology underlying the electronic part of get used to it man.... For a gentler overview of the same topic from the composer's perspective, take a look at Matthias Kriesberg's blog entry.
The post is split into two sections. The first is a summary of the components of the performance system and the information that passes between them. The second looks at how the system implements the performance measurement, evaluation, and sound modification functions which drive the electronic part. Along the way there are notes on the current state of the system and on how the development plan is supported by our pending workshop data collection experiment.
The electronic part of get used to it man ... is formed by modifying the sound of the piano as it is performed. The physical setup is conceptually simple: microphones pick up the piano and feed the signal to a computer, which modifies the sound before emitting it through speakers placed around the piano. The algorithm used to control how the sound is modified is based on evaluating how the piece is performed relative to the score and to past performances of the piece by the same or other performers. The bulk of this post describes the technology which supports and implements the performance evaluation and sound modification algorithms.
It is also worth noting that the electronic part is really only a modification of the sound of the piano. Although there is no technical reason that new musical material could not be added by the computer as it modifies the sound, that's not how this piece works. In this case all of the notes originate with the piano part; the electronic part then modifies the sound of the piano but doesn't add or subtract any notes or structural content.
The diagram of the performance system architecture depicts the connections between the parts of the performance program. This architecture is typical of many score-driven interactive music systems.
The easiest way to understand the diagram is by beginning with the piano and following the rightward pointing arrows.
The red boxes in the diagram represent units which are primarily based on static data, that is, data created prior to the performance, like the score. The blue and green boxes represent components which process data in real time as events occur during the performance. The green boxes represent standard interactive system components, whereas the blue boxes are specialized to this piece.
The score follower is critical to the operation of the system because it maintains the real-time location of the performance within the score. This is important because the performance measurement and evaluation, as discussed below, are score dependent.
The score follower developed for this piece operates in two modes: 'tracking' and 'search'. In 'tracking' mode the follower has high confidence that it knows where the player is in the score, and it updates that location by scanning a localized window for the best match (minimum edit distance) between the performance data and the score. In 'search' mode the follower has low confidence that it is locked onto the player's true location in the score. It then scans ahead of the last known high-confidence position in search of a new high-confidence location. This approach has worked well with this piece based on the performance data we have collected so far.
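To make the matching step concrete, here is a minimal C sketch of the windowed minimum-edit-distance search, assuming both the score and the incoming performance have been reduced to plain pitch sequences. The names, buffer sizes, and confidence handling here are hypothetical; the real follower is more elaborate.

```c
#include <limits.h>

#define PERF_N 8   /* hypothetical length of the recent-note buffer  */
#define WIN_N  8   /* hypothetical length of the score match window  */

/* Levenshtein edit distance between two pitch sequences.
   Requires an <= PERF_N and bn <= WIN_N. */
int edit_dist( const int* a, int an, const int* b, int bn )
{
  int d[PERF_N+1][WIN_N+1];

  for(int i=0; i<=an; ++i) d[i][0] = i;
  for(int j=0; j<=bn; ++j) d[0][j] = j;

  for(int i=1; i<=an; ++i)
    for(int j=1; j<=bn; ++j)
    {
      int sub = d[i-1][j-1] + (a[i-1] != b[j-1]);
      int del = d[i-1][j]   + 1;
      int ins = d[i][j-1]   + 1;
      int m   = sub < del ? sub : del;
      d[i][j] = m < ins ? m : ins;
    }

  return d[an][bn];
}

/* Scan score positions in [lo,hi) and return the window start giving
   the best (minimum edit distance) match to the recent performance. */
int best_match( const int* scoreV, int scoreN,
                const int* perfV,  int perfN,
                int lo, int hi, int* distRef )
{
  int best_i = lo;
  int best_d = INT_MAX;

  for(int i=lo; i<hi && i+WIN_N<=scoreN; ++i)
  {
    int d = edit_dist( perfV, perfN, scoreV+i, WIN_N );

    if( d < best_d )
    {
      best_d = d;
      best_i = i;
    }
  }

  *distRef = best_d;
  return best_i;
}
```

In 'tracking' mode, lo and hi would bracket a narrow window around the last known location; in 'search' mode the range widens ahead of the last high-confidence position, and the returned distance can serve as the basis for the confidence estimate.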
The machine readable score is an enhanced version of the score used by the pianist. The enhancements support connections to the other parts of the performance system, particularly the 'Parameter Database'. Other enhancements support better audio rendering which is used for simulating the piece during development.
Currawong Music Labs has developed customized tools for forming the machine readable score from musical score files (MusicXML) as well as for adding additional annotations for the electronic part. For more about this take a look at the score_gen function of the cmtools program.
The actual sound transformation algorithm is embedded in the 'Signal Processing and Routing' unit. In technical terms the transform is a non-linear filter which operates in the frequency domain. In slightly less obscure terms, the transform consists of a short time Fourier transform, followed by a bank of simple non-linear functions which operate on each of the channels of the spectral magnitude, followed by an inverse Fourier transform. The non-linear function is similar to the style of transfer function commonly used for time domain dynamics processors. In this case, however, the transfer function is applied separately to each channel of the magnitude spectrum rather than to the time-domain audio signal. The result is that each channel is affected differently by the non-linear function, depending on the magnitude in that channel. The number of Fourier transform channels varies, but is usually 512 or 1024, and the sample rate is 96 kHz.
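As an illustration of the per-channel idea, here is a C sketch of one such transfer function in the style of a simple dynamics processor. The struct and function names are hypothetical, and the analysis/synthesis (STFT/ISTFT) steps are assumed to be handled elsewhere; the actual transforms are linked below.

```c
#include <math.h>

/* Hypothetical per-channel transfer function parameters. */
typedef struct
{
  float thresh_db;  /* knee threshold in dB           */
  float ratio;      /* compression ratio above thresh */
} xform_t;

/* Apply a compressor-style transfer function independently to each
   magnitude channel: bins above the threshold are scaled down by the
   ratio, bins below pass unchanged. Each channel is therefore affected
   according to its own magnitude rather than the broadband level. */
void xform_magnitude( const xform_t* x, float* magV, unsigned binN )
{
  for(unsigned i=0; i<binN; ++i)
  {
    float db = 20.0f * log10f( magV[i] + 1e-12f );

    if( db > x->thresh_db )
    {
      float out_db = x->thresh_db + (db - x->thresh_db) / x->ratio;
      magV[i]      = powf( 10.0f, out_db / 20.0f );
    }
  }
  /* The modified magnitudes are then recombined with the original
     phases and passed through the inverse transform. */
}
```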
Space doesn't permit a complete discussion of this algorithm; however, an article to follow will examine it in more detail. In the meantime have a look at the code that implements it here. To hear what these transforms actually sound like try out our Transform Audition Tool.
The 'Performance Measurement' unit implements a set of algorithms which attempt to quantify some elements of the performance. The current set of measurements collects information related to dynamics, tempo, grace note articulation, and note accuracy. The measurements are not taken continuously but rather during selected segments placed every few seconds throughout the score. The segments, and the specific measurements within them, were selected intentionally in order to measure how a pianist is performing relative to the score. No one measurement is particularly revealing, but taken as a whole they build an elemental picture of how a player performs the piece.
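As a rough illustration of the segment idea, the sketch below pairs a score span with the measurement taken there. The types and names are hypothetical and simplified relative to the real system; a tempo measurement, for example, might reduce to a mean inter-onset interval over the notes sounded within the segment.

```c
/* A hypothetical measurement segment: a score span paired with the
   quantity measured there. The real system's structures will differ. */
typedef enum { kDynMeasId, kTempoMeasId, kGraceMeasId, kAccuracyMeasId } measId_t;

typedef struct
{
  unsigned  begLoc;   /* score location where the segment begins       */
  unsigned  endLoc;   /* score location where the segment ends         */
  measId_t  measId;   /* which measurement applies to this segment     */
  double    value;    /* result, filled in once the segment completes  */
} seg_t;

/* Example: a tempo measurement reduced to the mean inter-onset
   interval (in seconds) over the notes sounded within the segment. */
double mean_ioi( const double* onsetSecV, unsigned noteN )
{
  double sum = 0;

  for(unsigned i=1; i<noteN; ++i)
    sum += onsetSecV[i] - onsetSecV[i-1];

  return noteN > 1 ? sum / (noteN-1) : 0;
}
```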
Once a given set of measurements is complete they are passed on to the 'Evaluation' unit. The job of the 'Evaluation' unit is to place the current performance in both a local and historic context and in doing so generate difference information which will ultimately drive the transform algorithms.
Historic context in this case refers to the difference between how a segment is performed versus how this same segment was performed previously by the same or different players. We see the ability of the piece to carry a history of previous performances and to re-contextualize itself within that history as an important feature of this technology.
Local context refers to how a segment was performed relative to earlier segments by the same player. This is useful because it gives information that helps to characterize the player and the evolution of the performance over time. It is also a source of statistical data which may prove important for calibrating the system when little other data is available.
Deciding how best to achieve these goals is one of the primary motivations for the Currawong Project sponsored development workshop outlined here. The working hypothesis for this experiment is that statistical models of the measurement data will be able to gauge both the quality (signal versus noise) and the relative location of a performance in the model-space of other performances.
The 'Evaluation' unit is therefore expected to work from several models which the system refers to as profiles. A segment profile describes how a given segment was performed across multiple players. A player profile describes how a single player performed the piece across segments. During a concert performance the current player's profile is therefore necessarily incomplete. A simple player profile might describe overall dynamic level or tempo. A more complex profile would describe the variance in grace note duration across a specific set of segments.
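One hypothetical way a profile statistic might be accumulated is online, as each new performance of a segment arrives, for example with a running mean and variance (Welford's algorithm). This is only a sketch of the bookkeeping, not the system's actual model.

```c
/* Online mean/variance (Welford's algorithm), updated as each new
   observation of a measurement arrives. */
typedef struct
{
  unsigned n;     /* count of observations     */
  double   mean;  /* running mean              */
  double   m2;    /* sum of squared deviations */
} stat_t;

void stat_update( stat_t* s, double x )
{
  s->n    += 1;
  double d = x - s->mean;
  s->mean += d / s->n;
  s->m2   += d * (x - s->mean);
}

double stat_variance( const stat_t* s )
{ return s->n > 1 ? s->m2 / (s->n - 1) : 0; }
```

A segment profile could then hold one such statistic per measurement, aggregated across players, while a player profile holds the same statistics per segment for a single player.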
To determine the final set of variables used to form the profiles requires analysis of real data. The goal of this analysis is to uncover the measurements and statistics that give stable and musically interesting results with regard to how an individual player navigates the piece and how performances are meaningfully different.
The 'Parameter Generation' unit translates performance evaluation information into a new set of values which will update the sound produced by the audio transformation algorithms. In geometric terms this process maps from the performance evaluation space to the audio transform parameter space.
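Since the final form of this mapping is still open (see below), the sketch here simply assumes, for illustration, an affine map from an evaluation vector to parameter values, clipped to each parameter's legal range. The dimensions and names are hypothetical.

```c
#define EVAL_N  4   /* hypothetical evaluation dimensions */
#define PARM_N  3   /* hypothetical transform parameters  */

typedef struct
{
  float W[PARM_N][EVAL_N];  /* map matrix             */
  float b[PARM_N];          /* offsets                */
  float minV[PARM_N];       /* parameter lower bounds */
  float maxV[PARM_N];       /* parameter upper bounds */
} pmap_t;

/* Map an evaluation vector to transform parameter values, clipping
   each result to that parameter's legal range. */
void pmap_apply( const pmap_t* m, const float* evalV, float* parmV )
{
  for(unsigned i=0; i<PARM_N; ++i)
  {
    float v = m->b[i];

    for(unsigned j=0; j<EVAL_N; ++j)
      v += m->W[i][j] * evalV[j];

    parmV[i] = v < m->minV[i] ? m->minV[i] : (v > m->maxV[i] ? m->maxV[i] : v);
  }
}
```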
The true output of the 'Evaluation' unit awaits the results of our performance experiment, so the final implementation of the 'Parameter Generation' unit is also not fully determined. However, since the final step in the processing chain, the signal processing algorithms, is known, the output of the 'Evaluation' unit can at least be partially described. This has allowed the development of a provisional working version of the 'Parameter Generation' unit which supports the audition of hypothetical performances.
Based on this provisional tool we have arrived at a structure which describes each transform as having a fixed part and a dynamic part. In signal processing terms the fixed part represents a signal processing topology and the dynamic part is the set of values assigned to the topology's variables. To get an intuitive view of what this means take a look at the Transform Audition Tool. The colored overlays in this app represent different fixed signal processing topologies. The tool realizes the transforms by drawing on a pre-selected dynamic part. This isn't as flexible as the actual performance system, which can generate the dynamic part in response to the performance, but it is still useful for sketching possible renderings of the electronic part.
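One way to express this split in code, purely as a sketch with hypothetical names: the fixed part becomes a processing routine and the dynamic part becomes the value set bound to it.

```c
/* The fixed part as a processing routine over the magnitude spectrum,
   the dynamic part as the value set bound to its variables. */
typedef void (*topoFunc_t)( float* magV, unsigned binN, const float* parmV );

typedef struct
{
  topoFunc_t topoFunc;  /* fixed part: the signal processing topology  */
  float*     parmV;     /* dynamic part: values bound to its variables */
  unsigned   parmN;
} transform_t;

void transform_exec( const transform_t* t, float* magV, unsigned binN )
{ t->topoFunc( magV, binN, t->parmV ); }
```

In the Transform Audition Tool the parmV values would come from a pre-selected table; in performance they would be produced by the 'Parameter Generation' unit.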
Having described all the parts of the system, a few notes are worth making about the system as a whole.
As described above it may appear that, once a set of measurements is complete, the subsequent processing immediately begins to affect the transform algorithms. In fact there is a time offset between where the measurements are taken and where the results of those measurements are applied. This is in contrast to many interactive music systems, which seek to respond continuously and immediately to a player's current gesture. The time offset is intentionally selected by the composer on a case by case basis and is generally on the order of a few seconds. This approach affords the opportunity to select where a given set of measurements will influence a subsequent part. It also simplifies the measurement algorithms, which are then not constrained by real-time, or even causal, considerations, since the section under examination can complete prior to being analyzed.
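A sketch of how this deferred application might be organized: evaluation results are queued against a later, composer-chosen score location and fired as the score follower reaches it. All names and sizes here are illustrative.

```c
#define UPD_N    3    /* hypothetical count of transform parameters */
#define PEND_MAX 64   /* maximum simultaneously pending updates     */

typedef struct
{
  unsigned applyLoc;      /* score location where the update takes effect */
  float    parmV[UPD_N];  /* parameter values to apply                    */
} pending_t;

pending_t pendV[PEND_MAX];
unsigned  pendN = 0;

/* Called when a segment's evaluation completes: queue its effect at a
   composer-chosen location a few seconds downstream in the score. */
void schedule( unsigned applyLoc, const float* parmV )
{
  if( pendN < PEND_MAX )
  {
    pendV[pendN].applyLoc = applyLoc;
    for(unsigned i=0; i<UPD_N; ++i)
      pendV[pendN].parmV[i] = parmV[i];
    ++pendN;
  }
}

/* Called as the score follower advances: fire any pending updates
   whose location has been reached and keep the rest. */
void advance( unsigned curLoc, void (*applyFunc)(const float*) )
{
  unsigned keepN = 0;

  for(unsigned i=0; i<pendN; ++i)
    if( pendV[i].applyLoc <= curLoc )
      applyFunc( pendV[i].parmV );
    else
      pendV[keepN++] = pendV[i];

  pendN = keepN;
}
```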
get used to it man... is designed to work in either a concert setting or as an installation/performance. The description here assumes a concert setting, but the installation/performance version works in a very similar manner. The primary technical difference is that the Steinway Spirio would be actuated under the control of recorded measurements from actual performances.
The approach to performance measurement and evaluation outlined here might be criticized as overly reductive. The measurements and evaluation employed are not comparable to what a skilled listener brings to performance analysis. Likewise, what is referred to here as historic contextualization bears little relation to how a musicologist might use that term. The metrics collected by this system are, however, typical of the data collected during a scientific or quantitative musical performance experiment, and that research has shown them to be stable and indicative of some elements of a performance. Seen another way, these are the kinds of measurements that machines can make, and so this is what is used. This approach is analogous to the measurements made by web-analytics companies as they track online behavior: web technology uses simple measurements of what is clicked on in an effort to build a more nuanced profile of the subject's interests and motivations.
Finally we end with what really matters: the code. All of the parts of the system are implemented in 'C' (with a smattering of C++) and released as open source under the GNU GPL license. The real-time signal processing framework is hosted by our kc programming environment and you can find most of the actual code in the libcm library.
© currawong project 2021