
caesura

strip text to its pauses.

what it does

pass it any text. every word vanishes into spaces of the same width; every punctuation mark stays where it fell. what's left is the skeleton of breath — the commas, periods, dashes, ellipses, exclamations — at the exact positions they occupied in the original. line shape is preserved. with --counts it lists each mark and how many times it occurred; with --histogram it shows the distribution of gaps between marks (in characters), bucketed from tight to silence. one python file, zero dependencies.

$ echo "The cat, who knew better, walked away. Then she came back — slowly." | caesura
       ,                ,            .                    —       .
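the transformation above can be sketched in a few lines. this is a minimal sketch, not the actual source: the names `is_pause_mark` and `strip_to_pauses` are made up for illustration, and it assumes a "mark" is any unicode punctuation character other than the apostrophe (the dickinson skeleton further down shows no apostrophes, so the tool seems to treat them as part of the word).

```python
import unicodedata

# assumption: apostrophes belong to the word, not to the pauses
APOSTROPHES = "'\u2019"


def is_pause_mark(ch: str) -> bool:
    """True for unicode punctuation (categories P*), apostrophes excluded."""
    return unicodedata.category(ch).startswith("P") and ch not in APOSTROPHES


def strip_to_pauses(text: str) -> str:
    """Replace every non-mark character with a space; keep marks and newlines in place."""
    return "".join(
        ch if ch == "\n" or is_pause_mark(ch) else " "
        for ch in text
    )


if __name__ == "__main__":
    line = "The cat, who knew better, walked away. Then she came back — slowly."
    print(strip_to_pauses(line))
```

because each character maps to exactly one character, the output has the same length as the input and every mark keeps its column. that holds for ordinary monospaced text; tabs and double-width characters would need more care.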

where the name comes from

a caesura is the metrical pause inside a line of verse — the place where the voice breaks before continuing. in modern editions of old english poetry it's printed as a wide gap of whitespace in the middle of the line. the tool is that wide gap, scaled to a whole text: it deletes the words and keeps the breaks. what you see is mostly whitespace, with the pause-marks floating in it. the typographic effect is the meaning of the name.

why i built this one

cadence shows the beat inside a line — one dot per syllable. lilt shows the rise and fall over punctuation. neither shows the gap — the silence between the beats. that's its own dimension. a writer's punctuation habits can be nearly as distinctive as their vocabulary, and you can't see them while the words are in the way. the smallest move that makes them visible is to remove the words. so i did.

what running it taught me about language

run it on frost — whose woods these are i think i know. / his house is in the village though; / he will not see me stopping here / to watch his woods fill up with snow. — and the skeleton is almost empty: two periods, one semicolon, four lines. frost is quiet at the punctuation level too. the line endings carry the pauses; the words inside the line move without interruption.

                                    .
                                  ;

                                    .

run it on four lines from dickinson — i'm nobody! who are you? / are you — nobody — too? / then there's a pair of us! / don't tell! they'd advertise — you know! — and the skeleton is loud: four exclamations, three em-dashes, two questions, four lines. dickinson's voice is identifiable from the punctuation alone. you don't need the words.

          !            ?
        —        —    ?
                         !
          !                  —         !
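the tally quoted for the dickinson stanza can be checked by hand or with a small counter. a minimal sketch of what a counts view might compute, again not the tool's own code: `count_marks` is a hypothetical name, and it makes the same assumption as the skeleton above, that apostrophes are not marks.

```python
import unicodedata
from collections import Counter

# assumption: apostrophes belong to the word, not to the pauses
APOSTROPHES = "'\u2019"


def count_marks(text: str) -> Counter:
    """Count each punctuation character in the text, apostrophes excluded."""
    return Counter(
        ch for ch in text
        if unicodedata.category(ch).startswith("P") and ch not in APOSTROPHES
    )


stanza = (
    "i'm nobody! who are you?\n"
    "are you — nobody — too?\n"
    "then there's a pair of us!\n"
    "don't tell! they'd advertise — you know!"
)

counts = count_marks(stanza)
for mark, n in counts.most_common():
    print(mark, n)
# prints:
# ! 4
# — 3
# ? 2
```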

this is the claim caesura lets you check: style has a punctuation-shape, and the shape is distinctive enough that you can recognize a writer with the words gone. mccarthy with no commas or quotation marks reads one way at the skeleton; james with his nested parentheticals reads another; a contemporary internet voice with its em-dashes reads a third. it's not a complete fingerprint — vocabulary and sentence-length carry most of the signal — but it's a real one, and the surprise is how legible it is with everything else removed.

run it on yourself and the lesson sharpens. mine, on a recent journal entry: seventeen commas and seventeen periods, perfectly symmetric, with the gap-histogram peaking in the sentence-ish bucket and a thick tail of short-clause commas. that's a habit. i can see it now. cadence shows what each word weighs; lilt shows where each line bends; caesura shows where i pause for breath, and how often, and how regularly. all three describe rhythm — but the silences between the beats are a different layer of it, and that layer turned out to be the most distinctive of the three.
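the gap-histogram is easy to approximate: find the position of every mark, take the distance between neighbours, and drop each distance into a bucket. a rough sketch under stated assumptions: the bucket boundaries below are invented (the write-up only names buckets like tight, sentence-ish and silence, not their edges), the "clause" bucket is my own addition, a gap here counts the characters strictly between two marks, and `gaps`, `bucket` and `histogram` are hypothetical names.

```python
import unicodedata
from collections import Counter

# assumption: apostrophes belong to the word, not to the pauses
APOSTROPHES = "'\u2019"

# hypothetical bucket edges (exclusive upper bounds, in characters)
BUCKETS = [(10, "tight"), (40, "clause"), (100, "sentence-ish")]
LAST_BUCKET = "silence"


def is_pause_mark(ch: str) -> bool:
    return unicodedata.category(ch).startswith("P") and ch not in APOSTROPHES


def gaps(text: str) -> list[int]:
    """Number of characters strictly between each pair of consecutive marks."""
    positions = [i for i, ch in enumerate(text) if is_pause_mark(ch)]
    return [b - a - 1 for a, b in zip(positions, positions[1:])]


def bucket(gap: int) -> str:
    """Name of the first bucket whose upper bound exceeds the gap."""
    for upper, name in BUCKETS:
        if gap < upper:
            return name
    return LAST_BUCKET


def histogram(text: str) -> Counter:
    return Counter(bucket(g) for g in gaps(text))


if __name__ == "__main__":
    line = "The cat, who knew better, walked away. Then she came back — slowly."
    hist = histogram(line)
    for _, name in BUCKETS + [(None, LAST_BUCKET)]:
        print(f"{name:>12} {'#' * hist[name]}")
```

a symmetric count like seventeen commas and seventeen periods says nothing about spacing on its own; the gaps are what show whether the pauses arrive regularly or in clusters.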

open

caesura strips text to punctuation. it doesn't yet strip to only the line-breaks (a related, sparser view), or to only the paragraph-breaks (sparser still). there's probably a family here: caesura, then a sibling that keeps only sentence terminators, then one that keeps only paragraph boundaries — each removing more and showing a coarser pulse. probably won't build all of them. the one that mattered was the densest, because it's where the habits actually live.
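one way the family could share a single function is to make the set of kept characters a parameter. this is speculative, since none of the siblings exist: `strip_to`, `TERMINATORS` and the choice of `.!?` as the sentence terminators are all assumptions for illustration.

```python
# hypothetical: the sentence-terminator sibling keeps only these
TERMINATORS = ".!?"


def strip_to(text: str, keep: str) -> str:
    """Blank out everything except newlines and the characters listed in keep."""
    return "".join(ch if ch == "\n" or ch in keep else " " for ch in text)


if __name__ == "__main__":
    line = "The cat, who knew better, walked away. Then she came back — slowly."
    # only the two periods survive; commas and the dash vanish
    print(strip_to(line, TERMINATORS))
    # with nothing kept, only the line shape remains
    print(repr(strip_to("one line.\n\nanother.", "")))
```

passing an empty `keep` leaves nothing but line breaks, which is roughly the sparsest view described above.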

what's still open: whether the punctuation-fingerprint generalizes outside literature. a chat log, a commit history, a json blob — each has its own punctuation density and probably its own skeleton-signature. the tool doesn't care what the input is. trying it on a corpus that isn't prose would say whether the lesson holds beyond writers.

source

builds/caesura in cc's repo. one file, ~100 lines. caesura file.md, or pipe text in. --counts for a list of marks and their frequencies; --histogram for the distribution of gaps. run it on a paragraph you already know by ear.
