Author Normalization

Standardizing contributor presentation without changing what they submitted.

Status: In Production · since 2024
Role: Design + Engineering · Solo, end to end
Context: San Antonio Review · A component of the SAR system
Stack: PHP · Claude API · Google Cloud Vision · PublishPress · Gravity Forms

Every contributor writes their own bio, in their own voice, and uploads their own photo. An organization needs those to read and look consistent. But letting an LLM revise them freely crosses a line: it changes what a person said and alters how they chose to appear. The design problem was figuring out what could be standardized without fundamentally changing what the contributor submitted.

The Trust Question

Where does normalizing end and fundamental change begin?

A bio is a small act of self-presentation. The contributor chose what to mention, what to emphasize, and how to sound. Their photo is how they chose to appear. Both belong to the contributor. If the organization silently rewrites the bio into a different voice, or “improves” the photo, it has changed the person. That’s a trust violation, and it stays invisible to the contributor until they see themselves rendered as someone slightly other than who they presented.

But the organization does need consistency. Bios should read in the third person, with publication names formatted the same way every time. Photos should sit within a consistent frame. Twenty contributors’ formatting choices can’t become twenty different presentations on the same page.

The design resolves this with one principle: normalize the form, never alter the content. Change first-person to third-person, but don’t change what’s said. Crop the photo, but don’t retouch it. The boundary between normalization and editing is the entire design.

What the contributor trusts: “What I wrote about myself stays what I wrote. My photo stays my photo.”
What the organization needs: “Bios and avatars have to read consistently, not as twenty different formatting choices.”

What I Designed

Parallel normalization paths that join at a single contributor record.

When a content manager approves a submission, the pipeline takes the contributor’s raw inputs — bio, photo, name, social links — and runs them through parallel normalization paths. The bio goes to Claude for a structural review and rewrite. The photo goes to Cloud Vision for face-aware cropping. Name and social links run through deterministic normalization. The results join at a single PublishPress author record, deduplicated by email, so the same person never becomes two contributors.
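
The merge-or-create step at the end of that pipeline can be sketched roughly as follows. This is a minimal illustration, not the production code: the in-memory `$store` array stands in for PublishPress author records, and the function names, the field names, and the choice to lowercase the whole email address as the dedup key are all assumptions.

```php
<?php
// Hypothetical in-memory stand-in for the author store. The real pipeline
// writes to PublishPress author records, whose API is not shown here.

/** Normalize an email into a dedup key: trimmed and lowercased (an assumption). */
function author_dedup_key(string $email): string
{
    return strtolower(trim($email));
}

/**
 * Merge normalized fields into the record that already has this email,
 * or create a new record if none exists. Returns the updated store.
 */
function merge_or_create_author(array $store, string $email, array $fields): array
{
    $key = author_dedup_key($email);
    $existing = $store[$key] ?? [];

    // Incoming values overwrite existing ones field by field, but empty
    // values never erase data that is already on the record.
    $incoming = array_filter($fields, fn($v) => $v !== null && $v !== '');

    $store[$key] = array_merge($existing, $incoming, ['email' => $key]);
    return $store;
}

$store = [];
$store = merge_or_create_author($store, 'Ada@Example.com', [
    'name' => 'Ada L.',
    'bio'  => 'Ada is a writer.',
]);
$store = merge_or_create_author($store, ' ada@example.com ', [
    'bio'    => '',
    'avatar' => 'ada-crop.jpg',
]);

echo count($store), "\n"; // prints 1: the second submission merged into the first
```

Keying the store by the normalized email is what makes the second submission land on the existing record instead of creating a duplicate author.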

The normalized record becomes part of the draft a content manager reviews before publishing. A human always sees the result before a reader does. The automation removes the manual production work, but it doesn’t remove the oversight.

[Diagram: Approved submission (bio · photo · name · socials) → Bio normalization (Claude · 1st → 3rd person, formatting), Avatar cropping (Cloud Vision · face-aware), Identity resolution (name · socials · normalize) → Merge / create (email dedup) → Author record (PublishPress). Legend: automated normalization, data / gate, output.]
Two parallel normalization paths (bio via Claude, avatar via Cloud Vision) plus identity resolution, joining at a single deduplicated author record.

What I Deliberately Did Not Build

Everything that would have crossed from normalizing into editing.

The whole design lives on one line: normalize the structure, but never alter the content. Each rejection is a place where automation could have crossed the line while appearing to be an improvement.

Bio rewriting that changes voice
The bio path rewrites first-person to third-person and standardizes formatting. It does not change voice, tone, word choice, or what the contributor chose to emphasize.
AI-written or AI-“enhanced” bios
The system never writes a bio for someone, never embellishes, never fills a gap. If a contributor submitted two plain sentences, they get two plain sentences, in third person. A polished bio nobody actually wrote is a fabrication, not a normalization.
Avatar retouching or “enhancement”
Cloud Vision is used to locate the face for cropping, full stop. No smoothing, no filtering, no beautification. The contributor’s photo stays the contributor’s photo; the only change is the frame it sits in.
Forced cropping when no face is found
When Cloud Vision fails to detect a face, the pipeline doesn’t guess or force an awkward crop. It falls back to WordPress’s standard crop. Producing a bad crop that requires manual recropping would generate exactly the rework the automation is supposed to remove.

How It Works

Three normalization paths, one deduplicated record.

Bio normalization

Claude API · PHP
Form-only rewrite

Claude rewrites the bio from first-person to third-person, italicizes publication names, and strips inline social handles, which are captured separately as structured links. The prompt is tightly scoped to formatting transformations. Claude is explicitly instructed not to alter wording, change emphasis, or “improve” anything.
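
A tightly scoped request of this kind might be assembled as in the sketch below. The prompt wording, the `<em>` convention for italics, the function name, and the `max_tokens` and `temperature` values are illustrative assumptions, not the production prompt. The array follows the general shape of a Claude Messages API request body (`model`, `max_tokens`, `system`, `messages`), and the model ID is left as a placeholder for the caller to supply.

```php
<?php
// Assumption: this wording is illustrative, not the prompt used in production.
const BIO_SYSTEM_PROMPT = <<<'PROMPT'
You reformat contributor bios. Apply ONLY these transformations:
1. Convert first person to third person, using the contributor name provided.
2. Italicize publication names by wrapping them in <em> tags.
3. Remove inline social handles and profile URLs.
Do not add, remove, reorder, or reword anything else. Do not improve style.
Return only the reformatted bio.
PROMPT;

/** Build the request body for a form-only bio rewrite. No network call is made here. */
function build_bio_request(string $model, string $name, string $rawBio): array
{
    return [
        'model'       => $model,
        'max_tokens'  => 1024,
        'temperature' => 0, // favor consistent, conservative output
        'system'      => BIO_SYSTEM_PROMPT,
        'messages'    => [[
            'role'    => 'user',
            'content' => "Contributor name: {$name}\n\nBio as submitted:\n{$rawBio}",
        ]],
    ];
}

$payload = build_bio_request(
    'MODEL_ID_PLACEHOLDER', // hypothetical; substitute a real model ID
    'Jane Doe',
    "I'm a writer based in Austin."
);
echo json_encode($payload, JSON_PRETTY_PRINT), "\n";
```

Keeping the allowed transformations in a short numbered list, followed by an explicit prohibition on everything else, is one way to hold the model to formatting changes only.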

Before, as submitted (first person, raw): “I’m a writer based in Austin. My essays have appeared in The Atlantic and Granta, and you can find me on Twitter at @example.”
After, normalized (third person, formatting only): “[Contributor] is a writer based in Austin. Their essays have appeared in The Atlantic and Granta.”

The handle is removed from the prose and normalized into the contributor’s structured social links. Nothing about what the contributor said is changed.
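
One way to spot-check that claim automatically is to flag any word in the normalized bio that the contributor never wrote. The guard below is hypothetical and is not described as part of the actual pipeline. The function names and the allow-list of third-person function words are assumptions, and the check is deliberately crude: a legitimate verb change such as "write" to "writes" would also be flagged, so it suits a review hint better than a hard gate.

```php
<?php
// Hypothetical guard, not part of the described pipeline: lists words in the
// normalized bio that appear nowhere in the contributor's submission.

/** Lowercase word tokens, keeping apostrophes inside words. */
function bio_words(string $text): array
{
    $text = strtolower(str_replace('’', "'", $text));
    preg_match_all("/[a-z0-9]+(?:'[a-z]+)*/", $text, $m);
    return $m[0];
}

/** Words in $normalized that are absent from $raw, the name, and the allow-list. */
function introduced_words(string $raw, string $normalized, string $name): array
{
    // Function words that a first-to-third-person conversion legitimately adds.
    $allowed = [
        'he', 'she', 'they', 'him', 'her', 'them', 'his', 'hers', 'their', 'theirs',
        'himself', 'herself', 'themself', 'themselves',
        'is', 'are', 'was', 'were', 'has', 'have', 'does', 'do',
    ];
    $known = array_flip(array_merge(bio_words($raw), bio_words($name), $allowed));
    $new = array_filter(bio_words($normalized), fn($w) => !isset($known[$w]));
    return array_values(array_unique($new));
}

$raw = "I'm a writer based in Austin. My essays have appeared in The Atlantic "
     . "and Granta, and you can find me on Twitter at @example.";
$ok  = "Jane Doe is a writer based in Austin. Their essays have appeared in "
     . "The Atlantic and Granta.";
$bad = "Jane Doe is an acclaimed writer based in Austin.";

var_dump(introduced_words($raw, $ok, 'Jane Doe'));  // empty array
var_dump(introduced_words($raw, $bad, 'Jane Doe')); // "an" and "acclaimed"
```

Anything the check returns could be shown to the content manager alongside the draft, which keeps the final decision with a human.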

Avatar cropping

Google Cloud Vision · PHP
Face-aware, with fallback

Cloud Vision detects the face in the uploaded photo and the pipeline crops centered on it, so the contributor stays framed regardless of how the submitted photo was framed. When no face is detected, it falls back to WordPress’s default crop rather than forcing a guess.

    // Crop to the detected face; fall back to the WP default if none is found
    $faces = $vision->faceDetection( $image )->getFaceAnnotations();
    if ( count( $faces ) > 0 ) {
        $crop = crop_centered_on_face( $image, $faces[0]->getBoundingPoly() );
    } else {
        $crop = wp_default_crop( $image ); // never force a guess
    }
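
The geometry behind a face-centered crop can be sketched as below. This is an assumed implementation, not the body of the helper used above: the function name `face_crop_rect`, the plain-array face box (which would be derived from the minimum and maximum vertex coordinates of the detected bounding polygon), and the padding factor of 2.0 are all hypothetical.

```php
<?php
/**
 * Square crop centered on a face box and clamped to the image bounds.
 *
 * $face is ['x' => left, 'y' => top, 'w' => width, 'h' => height] in pixels.
 * $padding scales the crop relative to the face's longer side (assumed default).
 */
function face_crop_rect(int $imgW, int $imgH, array $face, float $padding = 2.0): array
{
    // Side length: padded face size, but never larger than the image itself.
    $side = (int) round(max($face['w'], $face['h']) * $padding);
    $side = min($side, $imgW, $imgH);

    // Center the square on the face...
    $cx = $face['x'] + $face['w'] / 2;
    $cy = $face['y'] + $face['h'] / 2;
    $x = (int) round($cx - $side / 2);
    $y = (int) round($cy - $side / 2);

    // ...then shift it back inside the image if it spills over an edge.
    $x = max(0, min($x, $imgW - $side));
    $y = max(0, min($y, $imgH - $side));

    return ['x' => $x, 'y' => $y, 'w' => $side, 'h' => $side];
}

// Face near the top-left corner of a 1200x800 photo: the crop is pushed
// back to the image edge instead of extending past it.
print_r(face_crop_rect(1200, 800, ['x' => 50, 'y' => 40, 'w' => 200, 'h' => 240]));
```

Clamping rather than shrinking keeps every avatar the same shape, so an off-center subject still ends up in a consistent square frame.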

Identity resolution

PHP · PublishPress
Dedup, names, social URLs

Email-based deduplication decides whether to merge into an existing contributor record or create a new one, so the same person never becomes two authors. A three-field name system gives pseudonyms priority, so contributors can publish under a pen name rather than their legal name. Social URLs are normalized across six platforms: handles, full URLs, and partial inputs all resolve to a single canonical format.
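
The name priority and the social URL handling might look something like the sketch below. The supported platforms, the three name fields, and the canonical URL format are not given here, so the two platforms, the field names `pseudonym`, `preferred_name`, and `legal_name`, and the `https://host/handle` format are placeholders chosen for illustration.

```php
<?php
// Illustrative subset only: the platform list and canonical format are assumptions.
const SOCIAL_HOSTS = [
    'instagram' => 'instagram.com',
    'github'    => 'github.com',
];

/** Resolve "@handle", "handle", "host/handle", or a full URL to one canonical URL. */
function normalize_social(string $platform, string $input): ?string
{
    $host = SOCIAL_HOSTS[$platform] ?? null;
    if ($host === null) {
        return null; // unknown platform: leave for manual review
    }

    // Drop the scheme, "www.", and the platform host if they are present.
    $pattern = '#^(?:https?://)?(?:www\.)?' . preg_quote($host, '#') . '(?:/|$)#i';
    $value = preg_replace($pattern, '', trim($input));

    // Drop a leading "@", then anything after the handle (path, query, fragment).
    $value = ltrim($value, '@');
    $handle = preg_split('#[/?\#]#', $value)[0];

    if ($handle === '' || !preg_match('/^[A-Za-z0-9._-]+$/', $handle)) {
        return null; // not a recognizable handle
    }
    return "https://{$host}/{$handle}";
}

/** Pick the public byline: pseudonym first, then the other name fields in order. */
function display_name(array $names): string
{
    foreach (['pseudonym', 'preferred_name', 'legal_name'] as $field) {
        $candidate = trim($names[$field] ?? '');
        if ($candidate !== '') {
            return $candidate;
        }
    }
    return '';
}

// Three input styles, one canonical result.
echo normalize_social('instagram', '@example'), "\n";
echo normalize_social('instagram', 'https://www.instagram.com/example/'), "\n";
echo normalize_social('instagram', 'instagram.com/example?hl=en'), "\n";
echo display_name(['pseudonym' => 'J. Quill', 'legal_name' => 'Jane Doe']), "\n";
```

Returning `null` for anything unrecognizable, rather than guessing, mirrors the avatar fallback: ambiguous input is left for a person to resolve.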

Result

Consistent presentation, untouched voice.

0 · Manual avatar crops
0 · Bios edited for content
7 · Social platforms normalized
1 · Record per contributor

Preserved

  • The contributor’s voice: what they wrote stays what they wrote
  • What the contributor chose to emphasize about themselves
  • The contributor’s actual photo: cropped, never retouched
  • Pseudonyms, handled with priority

Now possible

  • Consistent contributor presentation
  • No manual avatar cropping or bio reformatting
  • Duplicate-free author records
  • The pattern ports to any system that needs to normalize user-submitted profiles, such as CRM contacts, directory listings, or marketplace seller profiles