Data Lifecycle — Example: Social Media Photo Post

The Scenario

A user takes a photo on their phone, types a caption, adds a location tag, and posts it to a social media platform. Their followers see it in their feeds. Some like it. One person comments. The post appears in search results. A week later, the user checks how many views it got.

This example is interesting because the data fans out — one input (a photo) creates dozens of downstream data flows touching many different parts of the system.


Step 1: Identify All the Data

The obvious:

  • The photo file
  • The caption text
  • The location tag
  • Likes
  • Comments

The hidden:

| Data | Why It Exists |
|---|---|
| Photo metadata (EXIF) | Camera embeds date, time, GPS coordinates, camera model, and exposure settings into every photo file |
| Multiple photo sizes | The platform doesn't serve the original 12 MB file; it creates thumbnail, medium, and full-size versions |
| The follow graph | The system needs to know who follows this user to build their feeds |
| Feed entries for every follower | Each follower's personalized feed needs an entry for this post |
| Notification records | Followers with notifications enabled need to be alerted |
| Search index entries | The caption and location need to be searchable |
| View count | Every time someone sees the post, it's counted |
| Engagement metrics | Likes, comments, shares, and saves, each tracked separately |
| Content moderation signals | Automated scan for prohibited content, nudity detection, etc. |
| Ad relevance signals | The platform categorizes the post to match with advertisers |
| User activity timestamp | "Last active" and "posting frequency" updated |
| Privacy settings | Who can see this post? Public? Friends only? Custom list? |

A single photo post touches 15+ data categories.


Step 2: Full Lifecycle Map

Phase 1: Upload and Ingest

| # | Stage | What Happens | Category |
|---|---|---|---|
| 1 | User taps "Post" | Photo file + caption + location sent to server | Transport (phone → server) |
| 2 | Upload received | Raw data held in temporary upload storage | Storage (temporary) |
| 3 | Input validated | File type check (is it actually an image?), file size check (under limit?), caption length check | Transform (validation) |
| 4 | Content moderation scan | Automated analysis for prohibited content | Transform (analysis) |
| 5 | EXIF data extracted | GPS, timestamp, camera info pulled from photo file | Transform (extraction) |
| 6 | EXIF data compared to provided location | If user tagged "Paris" but EXIF says "Tokyo," flag for review | Transform (comparison) |
| 7 | Photo resized | Original → thumbnail (150px), medium (600px), large (1200px) | Transform (image processing) |
| 8 | Photos stored | All sizes stored in file storage (a separate file system, not the database) | Storage (persistent) |
| 9 | EXIF stripped from public copies | GPS and camera data removed from versions served to viewers (privacy) | Transform (redaction) |
| 10 | Post record created | Database record: post ID, user ID, caption, location, timestamp, photo URLs, privacy settings | Storage (persistent) |
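
Stage 3 (input validation) can be sketched as follows. This is a minimal illustration under stated assumptions, not the platform's actual checks: the size limit, the caption limit, and the function name validate_upload are all hypothetical, and the file-type check only inspects the leading "magic bytes" of JPEG and PNG files.

```python
from typing import List

# Hypothetical limits; the source does not specify real values.
MAX_FILE_BYTES = 20 * 1024 * 1024
MAX_CAPTION_CHARS = 2200

# Leading "magic bytes" that identify two common image formats.
IMAGE_SIGNATURES = {
    "jpeg": b"\xff\xd8\xff",
    "png": b"\x89PNG\r\n\x1a\n",
}


def validate_upload(photo: bytes, caption: str) -> List[str]:
    """Return a list of validation errors; an empty list means the upload passes."""
    errors = []
    # File type check: trust the content, not the file extension.
    if not any(photo.startswith(sig) for sig in IMAGE_SIGNATURES.values()):
        errors.append("not a recognized image type")
    # File size check.
    if len(photo) > MAX_FILE_BYTES:
        errors.append("file too large")
    # Caption length check.
    if len(caption) > MAX_CAPTION_CHARS:
        errors.append("caption too long")
    return errors
```

Returning every error at once, rather than stopping at the first, lets the client show the user all problems in a single round trip.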

Phase 2: Distribution (Fan-Out)

| # | Stage | What Happens | Category |
|---|---|---|---|
| 11 | Follower list retrieved | System looks up everyone who follows this user | Transport (database → distribution service) |
| 12 | Privacy filter applied | Remove followers who are blocked or excluded by privacy settings | Transform (filtering) |
| 13 | Feed entries created | For each eligible follower, a feed entry is generated pointing to this post | Storage (persistent, one entry per follower) |
| 14 | Notification candidates identified | Which followers have notifications enabled for this user? | Transform (filtering) |
| 15 | Notifications dispatched | Push notifications sent to eligible followers | Transport (server → notification service → devices) |
| 16 | Notification delivery logged | For each notification: sent/delivered/failed | Storage (persistent) |
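
Stages 12 to 14 amount to two filters and one write per eligible follower. The sketch below shows that shape using in-memory collections; the names FeedEntry and fan_out, and the representation of blocks and notification preferences as a set and a dict, are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List, Set, Tuple


@dataclass(frozen=True)
class FeedEntry:
    """A pointer from one follower's feed to one post (hypothetical record)."""
    follower_id: str
    post_id: str


def fan_out(
    post_id: str,
    followers: List[str],
    blocked: Set[str],
    notifications_enabled: Dict[str, bool],
) -> Tuple[List[FeedEntry], List[str]]:
    """Return (feed entries to store, followers to notify) for one new post."""
    # Stage 12: drop followers excluded by blocks or privacy settings.
    eligible = [f for f in followers if f not in blocked]
    # Stage 13: one feed entry per eligible follower.
    entries = [FeedEntry(f, post_id) for f in eligible]
    # Stage 14: only followers who opted in are notification candidates.
    notify = [f for f in eligible if notifications_enabled.get(f, False)]
    return entries, notify
```

Note that the notification list is derived from the already-filtered list, so a blocked follower can never be notified even if their notification preference is on.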

Phase 3: Indexing

| # | Stage | What Happens | Category |
|---|---|---|---|
| 17 | Caption text indexed | Words from caption added to search index | Transform (tokenization) + Storage (search index) |
| 18 | Location indexed | Location added to geographic search | Storage (geo index) |
| 19 | Hashtags extracted and indexed | #sunset, #paris pulled from caption and indexed | Transform (extraction) + Storage (hashtag index) |
| 20 | Post added to user's profile timeline | Post appears on the user's own profile page | Storage (profile index) |
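
Stages 17 and 19 can be illustrated with a toy inverted index. This is a simplified sketch: real search systems apply far more elaborate tokenization, and the function names and the dict-of-sets index layout here are assumptions.

```python
import re
from typing import Dict, List, Set

HASHTAG_RE = re.compile(r"#(\w+)")
WORD_RE = re.compile(r"\w+")


def extract_hashtags(caption: str) -> List[str]:
    """Stage 19: pull hashtags out of the caption, lowercased, without the '#'."""
    return [tag.lower() for tag in HASHTAG_RE.findall(caption)]


def tokenize(caption: str) -> List[str]:
    """Stage 17 (transform): split the caption into lowercase word tokens."""
    return [word.lower() for word in WORD_RE.findall(caption)]


def index_post(index: Dict[str, Set[str]], post_id: str, caption: str) -> None:
    """Stage 17 (storage): map each token to the set of posts containing it."""
    for token in tokenize(caption):
        index.setdefault(token, set()).add(post_id)
```

For example, extract_hashtags("Evening walk #Sunset #paris") yields ["sunset", "paris"]. The index is a second, derived copy of the caption, which is why it has to be updated again if the caption is edited or the post is deleted.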

Phase 4: Engagement (Ongoing)

| # | Stage | What Happens | Category |
|---|---|---|---|
| 21 | Follower views post | Post data retrieved and displayed | Transport (server → follower's phone) |
| 22 | View recorded | View count incremented | Transform (increment) + Storage (counter update) |
| 23 | Follower taps "Like" | Like event sent to server | Transport (phone → server) |
| 24 | Like recorded | Like record created (who liked what, when) | Storage (persistent) |
| 25 | Like count updated | Post's like count incremented | Transform (increment) |
| 26 | Post author notified of like | Notification sent to original poster | Transport (server → phone) |
| 27 | Someone comments | Comment text sent to server | Transport (phone → server) |
| 28 | Comment validated and stored | Checked for length/prohibited content, then saved | Transform + Storage |
| 29 | Comment count updated | Post's comment count incremented | Transform (increment) |
| 30 | Post author notified of comment | Notification sent | Transport |
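
Stages 24 and 25 store two different things: an atomic like record and a derived counter. The sketch below keeps both in memory. The class name LikeStore is hypothetical, and ignoring a repeated like from the same user is an assumption; the source does not say how duplicates are handled.

```python
from datetime import datetime, timezone
from typing import Dict, Tuple


class LikeStore:
    """In-memory stand-in for the like table and the per-post like counter."""

    def __init__(self) -> None:
        # Stage 24: one record per (user, post), storing when the like happened.
        self.likes: Dict[Tuple[str, str], datetime] = {}
        # Stage 25: derived counter, kept so feeds don't recount on every view.
        self.like_counts: Dict[str, int] = {}

    def like(self, user_id: str, post_id: str) -> bool:
        """Record a like; return False if this user already liked the post."""
        key = (user_id, post_id)
        if key in self.likes:
            return False  # Assumed behavior: a repeated like is a no-op.
        self.likes[key] = datetime.now(timezone.utc)
        self.like_counts[post_id] = self.like_counts.get(post_id, 0) + 1
        return True
```

Because the counter is derived from the records, the two can drift apart if one write succeeds and the other fails; a real system would update them together or periodically rebuild the counter from the records.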

Phase 5: Analytics (Later)

| # | Stage | What Happens | Category |
|---|---|---|---|
| 31 | User checks "insights" | Analytics data aggregated from view counts, like records, comment records | Transform (aggregation) |
| 32 | Insights displayed | Aggregated data formatted and sent to user | Transform (formatting) + Transport (server → phone) |
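
Stage 31 is an aggregation over atomic records. As a rough sketch, assume each engagement event is available as a (post_id, kind) pair; that event shape and the function name aggregate_insights are illustrative assumptions, not the platform's schema.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple

Event = Tuple[str, str]  # (post_id, kind), where kind is "view", "like", or "comment"


def aggregate_insights(events: Iterable[Event], post_id: str) -> Dict[str, int]:
    """Count views, likes, and comments for one post from raw event records."""
    counts = Counter(kind for pid, kind in events if pid == post_id)
    # Always report all three metrics, even when a count is zero.
    return {kind: counts.get(kind, 0) for kind in ("view", "like", "comment")}
```

The insights are computed on demand from records that already exist, so nothing new has to be stored at the moment a view or like happens.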

Step 3: The Fan-Out Problem

This example reveals a pattern the other examples don't: fan-out.

When a user with 10,000 followers posts a photo, the system must:

  • Create 10,000 feed entries (one per follower)
  • Potentially send 10,000 notifications
  • Handle 10,000 potential views, likes, and comments

This is a one-to-many transport and storage problem. The lifecycle of a single post multiplies at the distribution phase.

                                           ┌─ Follower A's feed
                                           ├─ Follower B's feed
    ┌──────┐       ┌────────┐              ├─ Follower C's feed
    │ Post │──────►│Fan-Out │──────────────├─ Follower D's feed
    │      │       │Service │              ├─ ...
    └──────┘       └────────┘              └─ Follower N's feed
                       │
                       │
                       ▼
                  ┌──────────┐             ┌─ Notification → A
                  │Notify    │─────────────├─ Notification → B
                  │Service   │             └─ Notification → (subset)
                  └──────────┘

This creates interesting data lifecycle questions:

  • Do you create all 10,000 feed entries immediately? Or lazily when each follower opens their app?
  • What if a follower opens their feed while the fan-out is still in progress? Do they see the post or not?
  • What if the user deletes the post 5 seconds after posting? Can you recall all 10,000 feed entries?

These are design decisions that emerge directly from mapping the lifecycle.
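
The first and third questions can be made concrete by comparing two toy designs: writing feed entries eagerly at post time, or computing the feed lazily when a follower reads it. Both classes below are hypothetical in-memory sketches, not a description of how any particular platform works.

```python
from typing import Dict, List, Set


class EagerFeed:
    """Fan-out on write: one stored feed entry per follower."""

    def __init__(self) -> None:
        self.followers: Dict[str, Set[str]] = {}  # author -> followers
        self.feeds: Dict[str, List[str]] = {}     # follower -> post IDs

    def post(self, author: str, post_id: str) -> None:
        for follower in self.followers.get(author, set()):
            self.feeds.setdefault(follower, []).append(post_id)

    def delete(self, author: str, post_id: str) -> int:
        """Recall the post from every feed; return how many entries were removed."""
        removed = 0
        for follower in self.followers.get(author, set()):
            feed = self.feeds.get(follower, [])
            if post_id in feed:
                feed.remove(post_id)
                removed += 1
        return removed


class LazyFeed:
    """Fan-out on read: no per-follower entries; feeds are built when opened."""

    def __init__(self) -> None:
        self.following: Dict[str, Set[str]] = {}        # follower -> authors
        self.posts_by_author: Dict[str, List[str]] = {}

    def post(self, author: str, post_id: str) -> None:
        self.posts_by_author.setdefault(author, []).append(post_id)

    def delete(self, author: str, post_id: str) -> None:
        self.posts_by_author[author].remove(post_id)

    def read_feed(self, follower: str) -> List[str]:
        feed: List[str] = []
        for author in sorted(self.following.get(follower, set())):
            feed.extend(self.posts_by_author.get(author, []))
        return feed
```

In the eager design, a delete has to touch as many feed entries as the post created. In the lazy design, a delete is a single removal, but every feed read pays the cost of assembling posts from all followed authors.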


Step 4: Multiple Storage Locations — Same Data

Notice that the same photo exists in multiple forms and multiple places:

| Version | Storage Location | Purpose | Lifetime |
|---|---|---|---|
| Original upload | Temporary upload storage | Processing input | Deleted after processing (hours) |
| Original (full resolution) | Permanent file storage | Backup/recovery, "download original" feature | Forever (or until user deletes post) |
| Large (1200px) | Permanent file storage + CDN cache | Desktop viewing | Forever |
| Medium (600px) | Permanent file storage + CDN cache | Mobile feed viewing | Forever |
| Thumbnail (150px) | Permanent file storage + CDN cache | Grid view, previews | Forever |

Five copies of what started as one photo. Each has a different purpose and potentially a different lifecycle. If the user deletes the post, every copy that still exists must be deleted, and the CDN caches must be invalidated as well. Missing any one copy leaves orphaned data sitting in storage indefinitely.
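
One way to avoid orphans is to derive every storage key from the post ID, so deletion can enumerate all versions instead of relying on a remembered list. The sketch below assumes a hypothetical key layout (photos/<post_id>/<size>.jpg), models file storage as a dict and the CDN cache as a set, and assumes the temporary upload was already removed after processing.

```python
from typing import Dict, List, Set

# The four permanently stored versions from the table above.
SIZES = ("original", "large", "medium", "thumbnail")


def photo_keys(post_id: str) -> List[str]:
    """All storage keys that should exist for a post (hypothetical layout)."""
    return [f"photos/{post_id}/{size}.jpg" for size in SIZES]


def delete_post_photos(
    post_id: str, storage: Dict[str, bytes], cdn_cache: Set[str]
) -> List[str]:
    """Delete every stored version and invalidate its CDN entry.

    Returns the keys that were expected but not found in storage.
    """
    missing = []
    for key in photo_keys(post_id):
        if storage.pop(key, None) is None:
            missing.append(key)
        # Invalidate the cached copy whether or not the stored file was found.
        cdn_cache.discard(key)
    return missing
```

Returning the missing keys gives the caller something to log or alert on, since a version that was expected but absent suggests the storage and the post record have already drifted apart.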


Step 5: Comparing All Three Examples

| Aspect | Coffee Shop | ATM | Social Media Post |
|---|---|---|---|
| Data flow shape | Linear (order → payment → fulfillment) | Linear with two-phase commit | Fan-out (one post → many feeds) |
| Number of data copies | 1 (the order) | 1 (the transaction) | Many (photo versions, feed entries, index entries) |
| Time sensitivity | Minutes (order should be ready soon) | Seconds (transaction must be instant) | Mixed (post immediately, analytics later) |
| Deletion complexity | Simple (one record) | N/A (transactions are permanent legal records) | Complex (must remove from all copies, feeds, indexes, caches) |
| Who consumes the data | Customer + barista | Customer + bank | Thousands of followers, search engines, analytics |
| Biggest hidden data | Tax config, menu version | Fraud signals, hardware status | Follow graph, EXIF metadata, content moderation |

Key Takeaways From This Example

  1. Fan-out multiplies the lifecycle — one action can create thousands of downstream data events
  2. The same data exists in multiple forms — and each form has its own storage, its own lifecycle, and its own deletion requirements
  3. Indexing is a separate lifecycle stage — making data searchable requires transforming and storing it in additional specialized formats
  4. Privacy intersects with data lifecycle — EXIF stripping, privacy-filtered distribution, and blocked-user exclusion are all transforms driven by non-obvious data (privacy settings, block lists)
  5. Analytics are derived data — not stored at the time of action, but aggregated later from atomic records (views, likes, comments)

When mapping a system where one action triggers many reactions, always ask: "How many copies of this data exist, and what happens to all of them when the original changes?"