an experiment in contemporary dance
what is statistical experimental design?
Statistical experimental design is a way to set up experiments to get as much information as possible, as efficiently as possible. We’re gonna go right in with a classic (read: pedantic) example:
Say you work in a manufacturing plant, and you make metal bolts. You’re seeing a decrease in quality in your bolts, and you need to figure out how to keep the quality top-notch. You’re pretty sure either the cutting speed or the cooling temperature of the process is what’s causing variations in your bolts. Traditionally, you might make a bunch of bolts at different cutting speeds to see how they do, pick the optimal cutting speed, and then make a bunch of bolts at that cutting speed with different cooling temperatures to find the optimal temperature. And then you’re done!
It’s not a bad plan, but this traditional experiment has two downsides:
Testing one thing at a time hides insight into how the things interact with each other. If you find the optimal cutting speed first, and then temperature, you might miss a true best setting – temperature could vastly change the process at a different (seemingly sub-optimal) cutting speed. That new pair of settings could have a better outcome than anything you'd find by tuning cutting speed or temperature alone.
It’s expensive to make a bunch of bolts, and we live in the real world on a budget.
Enter experimental design! Carefully structured tests can measure the impact of many factors, with fewer resources.
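To make the first downside concrete, here's a minimal sketch of the bolt example. The `quality` function and its numbers are entirely made up for illustration; the levels are coded as -1 (low) and +1 (high), and the speed-times-temperature term stands in for an interaction.

```python
from itertools import product

def quality(speed, temp):
    """Hypothetical bolt-quality score; levels are coded -1 (low) / +1 (high)."""
    return 5 + 2 * speed + 4 * speed * temp  # speed * temp is the interaction term

# One thing at a time: tune speed at the baseline (low) temperature,
# then tune temperature at whichever speed won.
baseline_temp = -1
best_speed = max((-1, 1), key=lambda s: quality(s, baseline_temp))
best_temp = max((-1, 1), key=lambda t: quality(best_speed, t))
ofat_setting = (best_speed, best_temp)

# Factorial: try every combination of levels and keep the best one.
factorial_setting = max(product((-1, 1), repeat=2), key=lambda st: quality(*st))

print("one at a time:", ofat_setting, quality(*ofat_setting))
print("factorial:    ", factorial_setting, quality(*factorial_setting))
```

With these made-up numbers, the one-at-a-time search settles on low speed and low temperature, while trying all four combinations turns up the high/high setting that only looks good once both factors change together.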
let's get clear on some vocabulary.
run = a one-minute dance piece
factors = controllable variables = the things i can change in a piece of choreography
output = results = the answers to the survey
main effect = how a single factor affects the audience
2nd-order interaction = how two factors work together to change the audience's opinion
so... statistics and dance?
In college, I kept myself busy earning two degrees – one in industrial engineering, and one in dance. At Northwestern University, the engineering school and the dance studios are on opposite ends of campus, and there’s only ten minutes between classes, so I had it down to a science: pull sweats on over my dance clothes, run out to the most strategically-placed bike rack, hustle along the Lake Michigan path to the back entrance of the Technological Institute, run up the stairs, and slide into my seat at Tech just in time.
I am pretty sure the other engineering students didn’t know what to think about the (usually sweaty) girl in legwarmers – and I am also pretty sure I was the only dance major who rushed to the computer lab to use the queuing simulation software after performances instead of going out for margaritas… so it’s fair to say that a) this right-brain-left-brain thing is a core part of who I am and b) I had a lot of thinking time as I ran around not being social in college.
I used those bike rides up and down campus to shake off one way of thinking to get into the other – and this idea started on one of those rides, when one side of my brain was fired up about the way we talk about live performance, and the other side was fired up about the power of statistics.
Also, if I had a nickel for every frat boy who cleverly asked me if I was “going to engineer dances for a living,” I’d… buy you lunch or something. What now, bros?!?
So anyway. Here we are. A seed of an idea planted in 2007 that I haven’t been able to shake for ten years…!
At its heart, this experiment is about opening up a conversation about dance. The public generally feels comfortable talking about music, or television or movies; we don’t always have the vocabulary – or opportunity – to speak about dance. This experiment is gently poking fun at the So You Think You Can Dance formulaic kick-jump-spin way of creating “moving” pieces of choreography… but it’s also poking fun at the idea that dance should only be academic and "serious." It’s both, and I believe we, as artists, can do a better job empowering our audiences to trust their gut and what they love… and arming them with the language to share it with us. Here’s a start.
the analysis and the factors
We measured five things: classical movement, unison dancing, rhythmic clarity, relationship between the performers, and accompaniment.
Why those? Aren’t there more?
There are an infinite number of things that affect your opinion of a piece of dance. We love art because it’s subjective; it touches us with its mystery. We all know it’s not this simple, and really, we’d never want it to be. I'll start this analysis by dropping the façade of scientific infallibility, and we can still enjoy the rigorous math together. (Isn’t that why you came here? Rigorous math?)
I started the selection in a brainstorm with dancers and friends – we freewrote pages and pages of all the things that describe a piece of dance. Then I cut out everything that wasn’t about the choreography (like costumes, set design, lighting) and everything that was out of my choreographic control (audiences’ favorite choreographers and companies, famous performers). I categorized the remaining factors into three groups:
movement factors: the actual motions the dancers make
examples: style of dance, small movements vs. big movements, sharp movements or flowing, fast or slow
compositional factors: how the movements are put together
examples: how many dancers are on stage, are they dancing in unison, do dancers have a relationship to one another
musical factors: how the movement is accompanied
examples: pop music vs. classical music vs. no music, fast or slow accompaniment, unusual music choices vs. expected pairings
I selected five factors from the three groups because of… logistics. Long story short, five factors was the sweet spot to ensure we could measure each factor and each second-order interaction with statistical significance, while keeping the dance-watching part of it reasonable. There are 32 one-minute pieces, which allowed us to get that statistical significance and still fit into an entertainment-friendly evening of a live show.
Like any good experimenter, I figured out this sweet spot the hard way. In 2014, we did a studio showing of this project with eight factors. However, the structure of that experiment meant that our second-order interactions were confounded with one another. We could tell when our audiences were responding to an interaction, but we couldn’t tell which ones. Bummer. You can read about it here.
Long story long, we fit our experiment to a factorial design, noted as 2^k. “k” is the number of factors we’re measuring, and the “2” means that each factor is at two levels (we’ll call it “high” vs. “low,” or “on” vs. “off”). We’ll note the factors as capital letters A, B, C, D, and E. Feel free to gloss over the level of detail here… you’ll get the picture. I chose this design for its resolution as well – it’s resolution V, where the identity relationship is I=ABCDE, which describes how the factors overlap with one another. Resolution V was ideal for this because all main factor effects and second-order interactions can be measured clearly. Third-order interactions are confounded with some second-order interactions, but that’s getting crazy for this experiment, and we feel happy here.
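If the 2^k notation feels abstract, here's a minimal sketch of what a full two-level layout for five factors looks like. It is an illustration only, not the exact run sheet used in the show: the letters A through E come from the text, but which dance factor goes with which letter, and the order of the runs, are assumptions.

```python
from itertools import combinations, product

FACTORS = "ABCDE"  # labels from the text; the factor-to-letter mapping is assumed

# Each run is one combination of low (-1) / high (+1) levels for the five factors.
runs = [dict(zip(FACTORS, levels)) for levels in product((-1, 1), repeat=len(FACTORS))]

def interaction_column(run, pair):
    """Coded level of a two-factor interaction: the product of the two factor levels."""
    return run[pair[0]] * run[pair[1]]

# Every pair of factors gives one second-order interaction.
pairs = list(combinations(FACTORS, 2))

# Print each run as a row of plus and minus signs.
print("    " + " ".join(FACTORS))
for number, run in enumerate(runs, start=1):
    print(f"{number:2d}  " + " ".join("+" if run[f] == 1 else "-" for f in FACTORS))

print(len(runs), "runs,", len(FACTORS), "main effects,", len(pairs), "second-order interactions")
```

Two levels for each of five factors gives 2^5 = 32 combinations, which matches the 32 one-minute pieces, and five factors give ten possible pairs.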
At least I had a notebook – the dancers had to hold all this choreography in their brains and bodies! We rehearsed two or three times a week, January through May of 2016, for our four-show series at The Tank, a fabulous little theatre in midtown Manhattan.
We set the 32 pieces in 8-piece segments, and we used those segments to randomize the order between shows without totally throwing the dancers for a loop with full randomization. We know audiences take a few pieces to “feel out” their ratings – it’s rare that a first piece would receive a 1 or a 10, because there’s nothing to compare it to yet. You may have noticed this in yourself through the fully randomized video version today, too.
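Here's one way the segment shuffling could be sketched. The text doesn't say whether the order inside each 8-piece segment stayed fixed, so this assumes it did; `show_order` and its parameters are hypothetical names for illustration.

```python
import random

def show_order(num_pieces=32, segment_size=8, seed=None):
    """Return piece numbers for one show, shuffling whole segments.

    Assumption: the order of pieces inside each segment never changes,
    only the order of the segments themselves.
    """
    pieces = list(range(1, num_pieces + 1))
    segments = [pieces[i:i + segment_size] for i in range(0, num_pieces, segment_size)]
    rng = random.Random(seed)  # a seed makes a given show's order repeatable
    rng.shuffle(segments)
    return [piece for segment in segments for piece in segment]

# One (hypothetical) order per show in a four-show series.
for show in range(1, 5):
    print("show", show, show_order(seed=show))
```

Shuffling four segments instead of 32 individual pieces gives far fewer possible orders, but the dancers only ever have to remember four sequences.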
In the live show, audiences used the same survey you’re using today online. We’d perform each one-minute piece, pause for them to record responses, and have them submit their surveys at intermission.
During intermission I’d download the results and plug them into a spreadsheet to fit a regression to the responses. This is what happens behind the scenes in this online version – but it really was exciting to stand there and plug in the numbers while everyone was milling around getting intermission drinks!
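For a sense of what fitting that regression involves, here's a minimal sketch in Python rather than a spreadsheet. It assumes a balanced full 2^5 layout coded as -1/+1 and one average rating per piece; the ratings below are invented, and `fit_effects` is a hypothetical helper, not the spreadsheet's actual formula.

```python
from itertools import combinations, product

FACTORS = "ABCDE"
design = list(product((-1, 1), repeat=len(FACTORS)))  # 32 coded runs

def fit_effects(ratings):
    """Least-squares coefficients for the mean, main effects, and two-factor interactions.

    In a balanced, +/-1-coded full factorial the model columns are orthogonal,
    so each coefficient is simply (column . ratings) / number_of_runs.
    """
    n = len(design)
    coefs = {"mean": sum(ratings) / n}
    for i, name in enumerate(FACTORS):
        coefs[name] = sum(run[i] * y for run, y in zip(design, ratings)) / n
    for i, j in combinations(range(len(FACTORS)), 2):
        label = FACTORS[i] + FACTORS[j]
        coefs[label] = sum(run[i] * run[j] * y for run, y in zip(design, ratings)) / n
    return coefs

# Invented ratings: this audience likes A, dislikes B, and likes C and E together.
ratings = [5.5 + 1.0 * a - 0.5 * b + 0.75 * c * e for a, b, c, d, e in design]
effects = fit_effects(ratings)

# List the effects from strongest to weakest, skipping the overall mean.
for label, value in sorted(effects.items(), key=lambda kv: -abs(kv[1])):
    if label != "mean":
        print(f"{label:>2}: {value:+.2f}")
```

A positive coefficient means the audience rated pieces higher when that factor (or pair of factors) was "on"; a negative one means they preferred it "off."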
There are two changes I made for the online analysis:
1. I normalized all responses. Often individuals skew positive or negative -- some audience members never rated anything below an 8, and some never went outside of 4-6. This is true to each person's perspective, I'm sure, but it doesn't make for helpful results; I normalized each individual's range to be the full 1-10.
2. We allow partial responses in the online version; for any unwatched pieces, we assign a personal average. It's not perfect but it keeps missing entries from throwing off the regression, and it feels worth it since asking for 45 minutes of attention is... a lot to ask.
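The two changes above can be sketched as follows. This assumes a simple min-max stretch for the normalization and fills in skipped pieces after rescaling; the spreadsheet may do either step differently, and `normalize` is a hypothetical helper name.

```python
def normalize(ratings, low=1.0, high=10.0):
    """Stretch one person's ratings so their own minimum maps to `low` and maximum to `high`.

    `ratings` has one entry per piece, with None for pieces the person skipped.
    Skipped pieces are filled with that person's average rescaled rating.
    """
    seen = [r for r in ratings if r is not None]
    if not seen:
        return list(ratings)  # no responses at all, nothing to rescale
    r_min, r_max = min(seen), max(seen)
    if r_max == r_min:
        # Every rating is identical: there is no range to stretch, so use the midpoint.
        scaled = [None if r is None else (low + high) / 2 for r in ratings]
    else:
        span = (high - low) / (r_max - r_min)
        scaled = [None if r is None else low + (r - r_min) * span for r in ratings]
    rated = [s for s in scaled if s is not None]
    personal_average = sum(rated) / len(rated)
    return [personal_average if s is None else s for s in scaled]

# A generous rater who never goes below 8 and skipped the last piece.
print(normalize([8, 9, 10, None]))
# A cautious rater who stays between 4 and 6.
print(normalize([4, 5, 6]))
```

Both raters end up spanning the full 1-10 range, so a "9 out of 8-to-10" and a "5 out of 4-to-6" count as the same middling response.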
The big difference between the live show and this online version is what happens after the results. Online, you get to see your own individual preferences, and compare to the audience as a whole – both your online responses and responses of our live audiences in 2016 – and see examples of pieces that exhibit factors you responded to.
Live audiences didn’t get to see their individual results; instead, we used the aggregate audience’s regression curve to create a custom dance piece for that audience, in the moment. I would jot down the audience’s results and scurry backstage to the dancers; we’d huddle and I'd share something like “okay, they responded to classical movement, non-unison, and a positive second-order interaction between rhythmic clarity & traditional accompaniment.” And then I’d step onstage to share this analysis with the audience while the dancers created a custom dance. The magic of the theatre...!
I wish there was a way to re-create that magic online. Alas, with statistical power comes responsibility, and we can’t just hang out in the studio all day waiting to make custom dances for you.
Armed with my new set of five factors, I mapped out 32 pieces of choreography. My choreography notebook is covered in charts of plus and minus signs, and I'd start choreographing by writing something like this at the top of the page:
the conversation & what's next
We ended each live performance with a conversation, which was by far my favorite part. We always had a juicy conversation, and we learned something new every time. I offer this conversation to you as well! Email questions & comments to firstname.lastname@example.org, and we’ll keep a running conversation live on the blog.
Also to come on the blog – as we get more responses, we’ll be able to continue the analysis, including demographic conclusions (age, familiarity with dance, and who knows what else…)!
...Also also, I haven't told you the full truth until just now. I said this was a factorial experiment with five factors... it is, but it's also a fractional factorial experiment designed 2^(k-p), in our case 2^(7-2). Still resolution V, but I need some more data before I can give you those results, so you'll have to keep your eyes peeled for that one, too.
Thanks for experimenting with us!
nerd out some more
Here's the spreadsheet doing the calculations. It's admittedly.... more creative than elegant. If you want a walkthrough or have any questions... email me! It essentially works from the right-most tabs to the left-most, from the live show responses, to the online responses coming in, the analysis, the normalization, the regression, and the charts and information feeding this website.
Articles about experimental design:
The Vimeo page with all videos (and other good stuff)