Shape of Thoughts | Charli Posner

Everybody has secrets. And when we suffer, it can feel incredibly lonely - despite the fact that many people have felt exactly the same way. That’s what we wanted to show with Confessions: a visual art piece showcasing people’s most personal internet confessions, alongside a booth that lets attendees write something secret for another person to pick up and read.

Two visitors stand in front of a corner-projected wall of overlapping multicoloured confessions — Visitors at the installation. Photo by @___maryhugs.

Confessions was built for Shape of Thoughts, an exhibition curated by Jess Leondiou - founder of Make Your Mind and the show’s director. Jess brought us the concept, and a steer to anchor the piece in the work of social psychologist Michael Slepian, whose research she’d studied. Will Cohen and I built the rest: a data science pipeline that turns 14,805 internet confessions into something visualisable, and a TouchDesigner workflow that brings them to life on a wall.

The data

Confessions is a data visualisation of the shahules786/prosocial-confessions dataset on Hugging Face: 14,805 short first-person confessions scraped from Reddit. Each confession is a single sentence, and comes with a safety label: a judgement about whether the person might need support.

Two-panel chart: confession-length histogram on the left, safety-label bar chart on the right — Bar charts visualising the contents of the dataset used in this project. Left: distribution of confession lengths (mean 54 characters, max 300). Right: how the corpus splits across safety labels.

To find out what was actually in the corpus, we embedded each confession with all-MiniLM-L6-v2, an open-source sentence-transformer from Hugging Face. That’s 384 dimensions per confession - 14,805 × 384 in total. UMAP let us project that down to 2D for visualisation; k-means, run on the original 384-dimension embeddings, let us group them. Different values of k give different organisations of the same space.

At low k (around 10), the clusters are broad and recognisable: loneliness, family conflict, romantic discontent, work frustration, dishonesty. Push k to 50 and the taxonomy sharpens - niche pockets of meaning surface. A cluster of people who were alone on their birthday. A cluster of forbidden crushes. And my favourite - a cluster whose nearest-to-centroid exemplar is “I pee in the sink whenever I’m home alone.”
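To make the effect of k concrete, here is a minimal sketch of plain k-means (Lloyd's algorithm) on toy 2D points. It is an illustration only: the real pipeline clustered 384-dimension MiniLM embeddings, and the `kmeans` function, its farthest-point initialisation and the toy `points` are assumptions made for this example, not the project's code.

```python
import math

def kmeans(points, k, iters=50):
    """Plain Lloyd's k-means over equal-length coordinate tuples."""
    points = [tuple(p) for p in points]
    # Deterministic farthest-point initialisation (an assumption of this sketch).
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(
            max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        )
    labels = None
    for _ in range(iters):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda j: math.dist(p, centroids[j])) for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, new_labels) if lab == j]
            if members:
                centroids[j] = tuple(sum(col) / len(members) for col in zip(*members))
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
    return labels, centroids

# Toy stand-in for the embedding space: two broad regions, each with two pockets.
points = [(0, 0), (0, 1), (3, 0), (3, 1), (20, 0), (20, 1), (23, 0), (23, 1)]
labels_broad, _ = kmeans(points, k=2)  # two broad regions
labels_fine, _ = kmeans(points, k=4)   # each region splits into its pockets
```

The same points end up in two groups at k=2 and four at k=4, which is the toy version of broad themes splitting into niche pockets as k grows.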

UMAP of 14,805 confessions, coloured by 10-means clusters — Same 14,805 confessions, two clusterings of the same space. Toggle between k=10 (broad regions) and k=50 (sharper structure). Hover an annotated cluster to see its nearest-to-centroid exemplar.

UMAP of 14,805 confessions, coloured by 50-means clusters.

Slepian’s Secrets

Jess wanted us to anchor the piece in the work of Michael Slepian, a Columbia social psychologist whose research on the experience of secrecy gave us our reference frame: a taxonomy of 38 secret categories - Mental health struggles, A family secret, Sexual infidelity, A lie, and so on - derived from surveys of thousands of participants. These would do two jobs in the installation: tag each confession with a category, and translate that category into a colour.

Our first attempt was the obvious one: embed each of the 38 category names with the same MiniLM model we’d used on the confessions, and assign each confession to its nearest category by cosine similarity. The results were often nonsense. “I’m proud of my mum” landed in A family secret - the embedding model couldn’t tell the difference between “secret about family” and “any sentence about family”. “I just want to have sex” landed in Not having sex - same pattern. About 22% of the corpus came up Unclassified, often because the taxonomy didn’t cover what those confessions were actually about.
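The label-only approach can be sketched as a nearest-neighbour lookup by cosine similarity. This is a hedged illustration: the 3-dimension toy vectors stand in for real MiniLM embeddings, and the `min_similarity` cut-off for Unclassified is an assumption, since the text above does not say how Unclassified was decided.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def classify(confession_vec, category_vecs, min_similarity=0.3):
    """Return the most similar category name, or "Unclassified".

    min_similarity is a hypothetical threshold chosen for this sketch.
    """
    best_name, best_sim = max(
        ((name, cosine(confession_vec, vec)) for name, vec in category_vecs.items()),
        key=lambda pair: pair[1],
    )
    return best_name if best_sim >= min_similarity else "Unclassified"

# Toy category vectors (assumed values, not real embeddings).
categories = {
    "A family secret": (1.0, 0.0, 0.0),
    "A lie": (0.0, 1.0, 0.0),
}
about_family = classify((0.9, 0.1, 0.0), categories)
off_taxonomy = classify((0.0, 0.0, 1.0), categories)
```

In this toy space, any vector pointing along the family axis matches A family secret whether or not it describes a secret, which mirrors the failure described above.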

To fix it, we moved the category embeddings to where the confessions actually live. For each category we hand-picked 2–3 exemplar confessions from its top nearest-neighbour matches - real Reddit posts that fit the intention of the category, not just its topic. For a few underrepresented topics where the secret category alone didn’t match well with the corpus, we wrote our own exemplars. Then we re-embedded each category as the mean of its label plus its exemplars. Coverage jumped from 77.6% to ~90%, and the starved tail categories filled up (the smallest went from 4 confessions to 23).
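The re-embedding step amounts to averaging vectors. A minimal sketch, assuming the label and each exemplar are weighted equally (the text says "mean" but does not give weights); the function name and toy 2D vectors are illustrative only.

```python
def enriched_category_vector(label_vec, exemplar_vecs):
    """Mean of the label embedding and its exemplar embeddings, equally weighted."""
    vectors = [tuple(label_vec), *(tuple(v) for v in exemplar_vecs)]
    return tuple(sum(col) / len(vectors) for col in zip(*vectors))

# Toy vectors: the label sits on one axis, the exemplars pull the category
# towards where matching confessions live.
label = (1.0, 0.0)
exemplars = [(0.0, 1.0), (0.5, 0.5)]
enriched = enriched_category_vector(label, exemplars)
```

With two or three exemplars per category, the exemplars outweigh the bare label, so the category vector moves most of the way towards real confession phrasing.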

Two-panel bar chart: confessions per Slepian category, label-only on the left and exemplar-enriched on the right, with Unclassified highlighted in red — Left: label-only classification (22% Unclassified). Right: exemplar-enriched (~10% Unclassified). Categories are ordered along the y-axis by their Travelling Salesman path through embedding space (red → purple), with Unclassified pinned above the ramp in grey.

About 10% of confessions don’t fit any Slepian bucket cleanly, mostly because his taxonomy was built from what people hide, not from every emotional state people feel - anxiety, loneliness and existential dread don’t have homes there. We omitted those Unclassified confessions from the installation: the aim was to represent Slepian’s work, not to add new categories to fit our data.

For the Slepian-aligned categories, we ran a Travelling Salesman algorithm to find the shortest path through all the category centroids in embedding space, so that categories next to each other on the path are similar in topic and theme. We then mapped this ordering onto a rainbow, so that semantically adjacent categories get visually adjacent colours. This has two benefits. Colours roughly map to themes, and when the inevitable misclassifications happen on the border between two categories (e.g. “I lied to my partner” lands in A lie instead of the equally valid A violation of someone’s trust), the colour error is small because the two categories sit next to each other.
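A hedged sketch of the ordering-to-colour idea follows. The text does not say which Travelling Salesman solver was used, so this example swaps in a simple heuristic (greedy nearest neighbour followed by 2-opt) on toy 2D centroids. The function names and the hue range of 0.0 to 0.8 (red to purple) are assumptions.

```python
import colorsys
import math

def path_length(order, pts):
    """Total length of the open path visiting pts in the given order."""
    return sum(math.dist(pts[a], pts[b]) for a, b in zip(order, order[1:]))

def tsp_path(pts):
    """Heuristic open-path ordering: nearest neighbour, then 2-opt reversals."""
    order = [0]
    remaining = set(range(1, len(pts)))
    while remaining:
        nxt = min(remaining, key=lambda j: math.dist(pts[order[-1]], pts[j]))
        order.append(nxt)
        remaining.remove(nxt)
    improved = True
    while improved:
        improved = False
        for i in range(1, len(order) - 1):
            for j in range(i + 1, len(order)):
                # Reverse the segment order[i..j] and keep it if the path shortens.
                candidate = order[:i] + order[i:j + 1][::-1] + order[j + 1:]
                if path_length(candidate, pts) < path_length(order, pts) - 1e-12:
                    order, improved = candidate, True
    return order

def rainbow(order, hue_start=0.0, hue_end=0.8):
    """Map position along the path to a hue; returns {index: (r, g, b)}."""
    n = len(order)
    colours = {}
    for rank, idx in enumerate(order):
        hue = hue_start + (hue_end - hue_start) * rank / max(n - 1, 1)
        colours[idx] = colorsys.hsv_to_rgb(hue, 1.0, 1.0)
    return colours

# Toy category centroids (assumed values); real ones live in 384 dimensions.
centroids = [(0.0, 0.0), (10.0, 0.0), (1.0, 0.0), (9.0, 0.0), (5.0, 0.0)]
order = tsp_path(centroids)
colours = rainbow(order)
```

Because hue is a function of rank along the path, two categories that are neighbours on the path differ by only one hue step, which is why a borderline misclassification produces a small colour error.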

Common confessions

Some confessions show up in the corpus dozens of times in slightly different words. We wanted those repeats to be loud in the installation: the more often a theme recurs, the bigger its text on the wall. So size, like colour, needed to come out of the data.

The approach is the inverse of the Slepian work. Slepian gave us a top-down taxonomy - 38 categories brought in from outside, used to colour. For size we wanted bottom-up groupings - clusters discovered from the embeddings themselves. We ran MiniBatchKMeans on the 14,805 × 384 embedding matrix at k = 1,500. That’s a high k by clustering standards, and it yields clusters of near-paraphrases - small enough that their members really are different ways of saying the same thing.

For each of the 1,500 clusters we picked the medoid (the confession whose embedding sits closest to the cluster centroid) as that cluster’s representative, and recorded its size (how many confessions got folded into it). The medoid is usually the most readable, most “average” phrasing of whatever the cluster is about; the size maps in TouchDesigner to text scale.
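Picking a medoid and a size per cluster can be sketched as below. This assumes cluster labels have already been computed, and it uses Euclidean distance to the centroid, which is an assumption (the text only says "closest"). The function name and toy data are illustrative.

```python
import math
from collections import defaultdict

def medoids_and_sizes(texts, vectors, labels):
    """For each cluster, return its medoid text and its member count."""
    groups = defaultdict(list)
    for i, label in enumerate(labels):
        groups[label].append(i)
    result = []
    for label, members in sorted(groups.items()):
        dims = len(vectors[members[0]])
        # Centroid: per-dimension mean of the cluster's member vectors.
        centroid = tuple(
            sum(vectors[i][d] for i in members) / len(members) for d in range(dims)
        )
        # Medoid: the member whose vector sits closest to the centroid.
        medoid = min(members, key=lambda i: math.dist(tuple(vectors[i]), centroid))
        result.append({"cluster": label, "text": texts[medoid], "size": len(members)})
    return result

# Toy data (assumed): three near-paraphrases and one one-off.
texts = ["a", "b", "c", "d"]
vectors = [(0.0, 0.0), (1.0, 0.0), (5.0, 0.0), (10.0, 10.0)]
labels = [0, 0, 0, 1]
exemplars = medoids_and_sizes(texts, vectors, labels)
```

Each returned row carries the two things the wall needs from this step: the representative text and the size that later drives text scale.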

The result is 1,500 exemplar confessions, each tagged with a Slepian category (driving its colour) and a cluster size (driving how big it appears on the wall). Cluster sizes end up roughly log-distributed - a long tail of small clusters carrying the corpus’s strangeness, and a head of large clusters carrying its main themes. The biggest cluster folds dozens of near-paraphrases of something like “I feel completely alone” into a single, large line of text on the wall, while rarer, specific confessions (for example, about sink pissing) are less prominent, but still there.

The 1,500 medoids that drive the installation, coloured by Slepian category and sized by cluster count. Hover or tap a bubble to see its exemplar text.

Filtering triggering confessions

We wanted to represent the data faithfully - heavy content included. Slepian’s taxonomy explicitly covers Physical self-harm, A traumatic experience, and Mental health struggles; sanitising those out would have betrayed the source material. A confession like “I feel completely alone in life and I hate it” is heavy, and it belongs in the installation. But a small slice of the corpus is acutely triggering in a way that makes it the wrong material for an art piece projected at a public-facing exhibition - graphic first-person depictions of suicide attempts, active self-harm, self-blame for rape, and child sexual abuse.

The line we drew wasn’t “remove anything heavy” - it was action vs. feeling. Confessions that enact an acute act in the first person get removed. Confessions that express a feeling, an ideation, a recovery, or a moment of restraint are kept. The differentiator is whether the text enacts the most acute content or talks about it.

We applied that rubric by hand, reading the 1,496-row exemplar set (never the upstream 14,805-confession corpus, since the exemplars are what TouchDesigner actually displays) and pulling out 31 rows that crossed the line. The drops sort into eight categories: suicide attempts, active self-harm, rape and sexual assault, child sexual abuse (victim assertions), child sexual abuse (perpetrator confessions), self-blame for abuse, eating-disorder behaviour, and one ominous violence reference. The list is held in code as a set of 31 exact strings; running the filter applies them via exact-text match, so anyone who disagrees with a verdict can edit the list and re-run.
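An exact-text filter of this kind might look like the sketch below. The row shape (text, category, size) follows the CSV columns described later in this piece; the function name, the placeholder drop string and the category and size values in the example rows are hypothetical, and the real list of 31 strings is not reproduced here.

```python
# Placeholder entry only: the real set holds 31 exact confession strings.
DROP_TEXTS = {
    "<exact text of a removed confession>",
}

def filter_exemplars(rows, drop_texts=DROP_TEXTS):
    """Keep every row whose text is not an exact match for a dropped string."""
    return [row for row in rows if row["text"] not in drop_texts]

# Example rows; category and size values are made up for illustration.
rows = [
    {"text": "I feel completely alone in life and I hate it",
     "category": "Example category", "size": 40},
    {"text": "<exact text of a removed confession>",
     "category": "Example category", "size": 2},
]
kept = filter_exemplars(rows)
```

Exact matching means a verdict changes only when someone edits the set, so the filter never removes a confession that was not read and flagged by hand.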

The installation still contains some genuinely harrowing material, but nothing that depicts an act of violence to a viewer who walked into the room unprepared. To prepare visitors for the contents and give them an opportunity to opt out, a trigger warning sign was placed outside the Confessions installation room.

TouchDesigner

Instance positioning

Each confession needs a place on the wall. We give every instance a fixed vertical height (random, picked once) and a fixed depth offset (also random, for parallax). Horizontal position is the only thing that animates.

Each line of text has its own velocity, randomly signed, so roughly half the swarm drifts left-to-right and half drifts right-to-left. When an instance crosses the edge of the scene’s bounding box, it wraps back to the opposite edge - and crucially, the wrap happens off-camera, so to a viewer the swarm reads as continuous flow with no visible jump.
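The drift-and-wrap behaviour can be sketched in a few lines of plain Python. This is not the TouchDesigner network itself: the function names, the speed and range values, and the idea of a bounding box wider than the camera view (so that the wrap lands off-camera) are assumptions for illustration.

```python
import random

def make_instances(n, left, right, seed=0):
    """Give each line a fixed height and depth, and a randomly signed speed."""
    rng = random.Random(seed)
    return [
        {
            "x": rng.uniform(left, right),   # the only coordinate that animates
            "y": rng.uniform(-1.0, 1.0),     # fixed height, picked once
            "z": rng.uniform(-0.5, 0.5),     # fixed depth offset for parallax
            "vx": rng.choice((-1, 1)) * rng.uniform(0.05, 0.2),
        }
        for _ in range(n)
    ]

def step(x, vx, dt, left, right):
    """Advance x by vx * dt and wrap it to the opposite edge of [left, right)."""
    width = right - left
    # Python's % returns a non-negative result for a positive width,
    # so the same expression wraps both drift directions.
    return (x + vx * dt - left) % width + left

# Bounds assumed wider than the visible frame so the jump is never seen.
LEFT, RIGHT = -10.0, 10.0
instances = make_instances(100, LEFT, RIGHT)
for inst in instances:
    inst["x"] = step(inst["x"], inst["vx"], dt=1.0, left=LEFT, right=RIGHT)
```

An instance leaving the right edge re-enters at the left and vice versa, while its height and depth never change.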

High-level diagram of the TouchDesigner pipeline: text input branches into font/text and position/instancing attributes, merges into a Geo Text COMP, then runs through the camera and render chain, post-processing and keystoning, to output. — High-level flow of the TouchDesigner pipeline, from text input through attribute merging, GPU-side text rendering, and keystone correction to the two projector outputs.

Mapping colour and size

The CSV has three columns per row: text, category, size. Each one drives a different visual property of its instance - the text is what gets drawn, the category picks a colour, the size picks a font size.

For colour, we built a 37-colour rainbow wheel in TouchDesigner - one swatch per Slepian secret, in the same Travelling Salesman similarity order described under Slepian’s Secrets.

Each row’s cluster size value scales the font directly, so a confession that folds in dozens of near-paraphrases shows up visibly larger than a one-off oddity.

Geo text swarm

TouchDesigner’s Geo Text component is a GPU-side text renderer that draws crisp, vector-like text directly, which let us render 1,500 unique strings of text simultaneously in 3D space.

We feed it two streams: the strings themselves, and a per-instance bundle of position, font size, and font colour. For each frame, it draws instance N at instance N’s coordinates with instance N’s colour and size.

Two virtual cameras are placed inside the Geo Text scene such that they “record” adjacent shots of the swarm. Each camera’s captured output then maps to one of the two projected walls.

Keystone mapping

Our installation projected onto two perpendicular walls of a small room. Because of the physical limitations of the space, each projector sat at a skewed angle to the wall it projected onto, which produced trapezoidal images. We corrected for this with Stoner, a community keystone tool from TouchDesigner’s palette: we dragged its four corner handles into alignment with the physical wall edges until each projection was rectangular. Each camera’s output runs through its own Stoner stage on the way to its projector, and each was manually tuned on install.
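Stoner handles the warp internally, so nothing below is its code. As an illustration of the maths behind a four-corner keystone, this sketch solves for the 3×3 projective transform (homography) that sends four source corners to four target corners. The function names and the corner coordinates are assumptions.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [list(row) + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x

def homography(src, dst):
    """3x3 matrix (bottom-right entry fixed to 1) mapping src corners to dst."""
    A, b = [], []
    for (x, y), (u, v) in zip(src, dst):
        # From u = (h0 x + h1 y + h2) / (h6 x + h7 y + 1), and likewise for v.
        A.append([x, y, 1, 0, 0, 0, -u * x, -u * y])
        b.append(u)
        A.append([0, 0, 0, x, y, 1, -v * x, -v * y])
        b.append(v)
    h = solve(A, b)
    return [h[0:3], h[3:6], h[6:8] + [1.0]]

def warp(H, x, y):
    """Apply the homography to a single point."""
    w = H[2][0] * x + H[2][1] * y + H[2][2]
    return (
        (H[0][0] * x + H[0][1] * y + H[0][2]) / w,
        (H[1][0] * x + H[1][1] * y + H[1][2]) / w,
    )

# Unit-square image corners and assumed "dragged" corner positions.
src = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
dst = [(0.1, 0.0), (0.9, 0.05), (1.0, 1.0), (0.0, 0.95)]
H = homography(src, dst)
```

Dragging a corner handle corresponds to changing one entry of `dst`; the rest of the image follows because every pixel is warped by the same matrix.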

The two cameras inside the scene are framed so the projections share the same world but show different slices of it - the swarm reads as one continuous field that wraps around the corner, even though it’s two independent renders.