18 — V2 → V3 data migration (PM-facing)

This is the plain-language version of how we plan to move Swasti’s existing mForm V2 data into the new V3 Frappe + mobile-app system. Read this before reading the engineer-facing detail at migration/plan.md in the repo. Nothing has been migrated yet. We’re at the planning stage.

The picture in one diagram

V2 (running today)                              V3 (where we're going)
─────────────────────                           ─────────────────────
 Mongo collections                              Frappe doctypes
                                                + mobile app
  form_1000  →  500K member responses    →    Member  (1 row each)
  form_1002  →  540K scheme apps          →    Scheme Application
  form_1003  →  502K scheme followups     →    Scheme Application Followup
  form_1004  →   60K document apps        →    Document Application
  form_1005  →   13K document followups   →    Document Application Followup
  form_1010  →  1.9K health screenings    →    Health Screening V3  (PM decision)
  form_1011  →   72 HS followups          →    Health Screening Followup (PM decision)

  users      →  507 V2 surveyors          →    Frappe Users with surveyor roles
  geography  →  states/dists/blocks/etc.  →    already imported (May 6)
  masters    →  schemes/docs/donors etc.  →    already imported (May 6)

About 1.6 million response rows to move. Geography and master data are already done; both were imported with the V3 patches on May 6.
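The 1.6 million figure can be reproduced from the diagram with a quick sum. This is only a back-of-envelope check: the first five counts are the rounded figures shown above, and the dictionary name is made up for illustration.

```python
# Rounded per-collection counts, copied from the diagram above.
# The two Health Screening figures use the exact counts quoted later in this note.
v2_counts = {
    "form_1000": 500_000,  # member responses
    "form_1002": 540_000,  # scheme applications
    "form_1003": 502_000,  # scheme followups
    "form_1004": 60_000,   # document applications
    "form_1005": 13_000,   # document followups
    "form_1010": 1_913,    # health screenings
    "form_1011": 72,       # health screening followups
}

total_rows = sum(v2_counts.values())
print(f"{total_rows:,} rows, about {total_rows / 1e6:.1f} million")
```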

What we mean by “migration”

We read each V2 response, translate its fields into V3 shape, and insert it as a row in the matching V3 doctype. V2 keeps running untouched throughout — until the very last “cut” step, where V2 is briefly frozen so we can snapshot it and bring V3 in sync.

We do not copy the V2 database into V3. The shape is different. The V3 doctypes were designed during the kickoff walkthroughs with you and PM — sometimes V3 added fields V2 doesn’t have (Disability Certificate attach), sometimes V3 dropped fields V2 had (some “looping” sub-questions). The mapping is what bridges the two.

How one Member’s record will travel

Take a real-shaped V2 row (names changed). On the V2 side, in form_1000:

_id:           ObjectId("66e9f3eb…")          ← V2's unique id
formId:        1000                           ← which V2 form
b_pro_name:    "Lakshmi Devi"
order5:        "Suresh Kumar"                 ← Father / Spouse
b_prof_dob:    "1992-04-15"
b_prof_mob:    "9876543210"
b_prof_sex:    "Female"
order13:       "Married"                      ← Marital
order17:       "No"                           ← Disabled?
state:         ObjectId("6700…")              ← Karnataka, by ref
village:       ObjectId("6705…")              ← Avalahalli
userId:        ObjectId("66af…")              ← the surveyor
createdAt:     2024-11-03 09:14:11 UTC
… + 40 more orderN fields …

After migration, the same record lives in V3 as a row in the Member doctype:

name:                MEM-2026-00xxx           ← V3 auto-assigned
v2_response_id:      66e9f3eb…                ← preserved for audit trail
b_pro_name:          "Lakshmi Devi"           ← same field name
order5:              "Suresh Kumar"           ← same field name
b_prof_dob:          "1992-04-15"             ← same
b_prof_mob:          "9876543210"             ← same
b_prof_sex:          "Female"                 ← same
order13:             "Married"                ← same
order17:             "No"                     ← same
state:               "Karnataka"              ← V3 server_name, via lookup
village:             "Avalahalli"             ← V3 server_name, via lookup
surveyor:            lakshmi.devi@…           ← V3 Frappe User
creation:            2024-11-03 09:14:11

For Member, 48 of 56 fields carry across with the exact same column name — V3 kept V2’s orderN convention deliberately, so the mapping is mechanical. 6 more fields (the “Other IDs” cluster) reshape from 7 flat V2 columns into a single V3 child table with up to 3 rows. 1 field (order24, the row-count selector) is dropped — V3 derives the count from the child table itself. 1 field (order20, “Additional Details” free-text) is the only one needing a PM decision: drop, keep, or merge into a notes field.

The geography and surveyor references translate from V2 ObjectId to V3 server_name via lookup tables we already built.
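To make the Member journey concrete, here is a minimal sketch of the translation step in Python. It is an illustration under assumptions, not the migration code: the lookup dictionaries, their placeholder ids, the surveyor email, and the function name are all hypothetical, only a handful of the 48 same-name fields are listed, and the "Other IDs" child table and timezone handling are left out.

```python
from datetime import datetime

# Hypothetical lookup tables: V2 id (as a string) -> V3 name.
# The real tables are the ones already built for geography and surveyors.
GEO_LOOKUP = {"v2-geo-state-1": "Karnataka", "v2-geo-village-1": "Avalahalli"}
USER_LOOKUP = {"v2-user-1": "surveyor@example.org"}

# A few of the fields that keep the exact same name in V3.
SAME_NAME_FIELDS = [
    "b_pro_name", "order5", "b_prof_dob", "b_prof_mob",
    "b_prof_sex", "order13", "order17",
]


def transform_member(v2_row: dict) -> dict:
    """Translate one V2 form_1000 row into a V3 Member-shaped dict."""
    v3 = {"doctype": "Member", "v2_response_id": str(v2_row["_id"])}

    # Same-name fields are copied across unchanged.
    # order24 is not in the list, so it is dropped simply by not being copied.
    for field in SAME_NAME_FIELDS:
        if field in v2_row:
            v3[field] = v2_row[field]

    # References go through the lookups. A missing entry raises KeyError,
    # so an unmapped geography or surveyor fails loudly instead of silently.
    v3["state"] = GEO_LOOKUP[str(v2_row["state"])]
    v3["village"] = GEO_LOOKUP[str(v2_row["village"])]
    v3["surveyor"] = USER_LOOKUP[str(v2_row["userId"])]

    # The original V2 timestamp is preserved as the V3 creation time.
    v3["creation"] = v2_row["createdAt"].strftime("%Y-%m-%d %H:%M:%S")
    return v3


example = {
    "_id": "v2-response-1",
    "b_pro_name": "Lakshmi Devi",
    "order5": "Suresh Kumar",
    "order24": "2",
    "state": "v2-geo-state-1",
    "village": "v2-geo-village-1",
    "userId": "v2-user-1",
    "createdAt": datetime(2024, 11, 3, 9, 14, 11),
}
migrated = transform_member(example)
print(migrated)
```

Failing on a missing lookup is a deliberate choice in this sketch: a row with an unknown village is better stopped and reported than inserted with a blank.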

The other forms are harder

Scheme Application / Scheme Followup / Document Application / Document Followup — these were rewritten in V3 during the kickoff PM walkthroughs (notes 11, 13). V3 dropped the orderN convention and used semantic field names (donor_name instead of name_donor, father_spouse_name instead of order5). So the mapping doc now does a two-pass match: first by exact column name (HIGH confidence), then by label text similarity (MED confidence with a percent score — you’ll see “100% — PM confirm” on rows where the V2 question text and V3 field label are word-for-word identical). The geography columns (state, district, block, gramPanchayat, village) appear flat on every V2 form but on V3 are stored only on Member — every other form derives them via the Member link. The mapping doc shows those as DROP with that explanation, so PMs don’t have to wonder where State went.
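The two-pass match described above can be sketched roughly as follows. This is an assumption-laden illustration: it uses Python's standard `difflib.SequenceMatcher` as the similarity measure and a 0.8 cut-off, neither of which is confirmed as what the real mapping script uses, and the V2 labels in the example are invented.

```python
from difflib import SequenceMatcher

# Geography columns that V3 stores only on Member (from the paragraph above).
GEO_COLUMNS = {"state", "district", "block", "gramPanchayat", "village"}


def label_similarity(a: str, b: str) -> float:
    """Similarity between two labels, from 0.0 to 1.0, ignoring case."""
    return SequenceMatcher(None, a.strip().lower(), b.strip().lower()).ratio()


def match_fields(v2_fields: dict, v3_fields: dict, threshold: float = 0.8) -> dict:
    """Map V2 column -> (V3 column or None, confidence, percent score or None).

    Both inputs are {column_name: label_text}. The threshold is an assumption.
    """
    results = {}
    unmatched_v3 = dict(v3_fields)

    # Pass 1: geography is dropped, exact column names are HIGH confidence.
    for name in v2_fields:
        if name in GEO_COLUMNS:
            results[name] = (None, "DROP", None)
        elif name in unmatched_v3:
            results[name] = (name, "HIGH", None)
            del unmatched_v3[name]

    # Pass 2: best label match among the V3 fields still unclaimed.
    for name, label in v2_fields.items():
        if name in results:
            continue
        best_name, best_score = None, 0.0
        for v3_name, v3_label in unmatched_v3.items():
            score = label_similarity(label, v3_label)
            if score > best_score:
                best_name, best_score = v3_name, score
        if best_name is not None and best_score >= threshold:
            results[name] = (best_name, "MED", round(best_score * 100))
            del unmatched_v3[best_name]
        else:
            results[name] = (None, "MANUAL", None)
    return results


# Invented labels, for illustration only.
v2 = {
    "member": "Member",
    "name_donor": "Name of Donor",
    "order5": "Father / Spouse Name",
    "state": "State",
    "order9": "Remarks by surveyor",
}
v3 = {
    "member": "Member",
    "donor_name": "Name of Donor",
    "father_spouse_name": "Father / Spouse Name",
}
result = match_fields(v2, v3)
for column, outcome in result.items():
    print(column, outcome)
```

Anything that falls below the cut-off lands in MANUAL rather than being guessed, which is what keeps the PM walkthrough list honest.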

Form	HIGH	MED	DROP	CHILD	MANUAL	Total
Member	48	0	1	6	1	56
Scheme Application	3	22	6	0	12	43
Scheme Application Followup	0	10	4	0	23	37
Document Application	4	26	6	0	3	39
Document Application Followup	0	10	4	0	25	39
Health Screening V3	6	24	5	0	56	91
Health Screening Followup	0	25	0	0	72	97

Per-form PM walkthrough scope is the MANUAL count plus a confirm-pass on the MED rows. For Document Application that's 3 rows to decide and 26 to nod through. For Health Screening V3, where the V3 doctype is a greenfield rewrite from PM Fahim's walkthrough video, 56 V2 questions have no V3 home, which is why the separate yes/no decision on migrating Health Screening has to come before its walkthrough.
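The walkthrough scope for every form follows from the table by the same rule. The short Python sketch below copies the table's numbers, checks that each row adds up to its total, and prints the "decide" and "confirm" counts; the variable names are illustrative only.

```python
# Counts copied from the table above, in column order: HIGH, MED, DROP, CHILD, MANUAL, total.
table = {
    "Member": (48, 0, 1, 6, 1, 56),
    "Scheme Application": (3, 22, 6, 0, 12, 43),
    "Scheme Application Followup": (0, 10, 4, 0, 23, 37),
    "Document Application": (4, 26, 6, 0, 3, 39),
    "Document Application Followup": (0, 10, 4, 0, 25, 39),
    "Health Screening V3": (6, 24, 5, 0, 56, 91),
    "Health Screening Followup": (0, 25, 0, 0, 72, 97),
}

scope = {}
for form, (high, med, drop, child, manual, total) in table.items():
    # Sanity check: the five categories must account for every V2 field.
    assert high + med + drop + child + manual == total, form
    # Walkthrough scope: MANUAL rows to decide, MED rows to confirm.
    scope[form] = (manual, med)
    print(f"{form}: {manual} to decide, {med} to confirm")
```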

Health Screening V3 is harder still: the fields, the question order, and the status pills are all different from V2. The PM call here is whether we migrate V2 HS data at all, or start fresh on V3 from this point forward. Volumes are low (1,913 V2 screenings + 72 follow-ups), so either path is defensible.

What we need from you (the PM decisions)

Before any code touches any data:

Date cutoff — do we migrate all-time V2 data, or last N months/years? Older rows may reference geographies that have been re-named since.
Health Screening — migrate the 1,913 V2 screenings at all, or treat V3 HS as a clean restart from V3 launch?
Surveyor scope — migrate all 507 V2 users, or only those active in the last (e.g.) 6 months? Inactive accounts still have data attributed to them.
Dedupe — testers have been creating sample Members on staging during Phase 6 triage; if a real V2 Member happens to share Name + Father + DOB + Phone with a tester row, what do we do (skip / merge / flag for review)?
V2 freeze window — when does V2 stop accepting new writes? We need a clean snapshot moment to do the final cut.
Per-form mapping review — for each of the 5 in-scope forms, we need a 30–45 min walkthrough of the mapping doc to confirm/correct the field-by-field translation. Member first (it’s the most ready), then Scheme App, then Followup, then Doc, then HS (if in scope).
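For the Dedupe decision above, it may help to see how a "same person" check on Name + Father + DOB + Phone could work. The sketch below is one possible approach, not a settled design: the normalisation rules (lower-casing, collapsing spaces, keeping only digits in the phone number) and the function names are assumptions, and it only finds the collisions. What to do with them (skip, merge, or flag) stays a PM call.

```python
import re


def dedupe_key(row: dict) -> tuple:
    """Build a comparison key from Name + Father/Spouse + DOB + Phone."""
    def norm_text(value) -> str:
        # Lower-case and collapse repeated whitespace.
        return " ".join(str(value or "").lower().split())

    # Keep digits only, so "98765 43210" and "9876543210" compare equal.
    phone = re.sub(r"\D", "", str(row.get("b_prof_mob") or ""))
    return (
        norm_text(row.get("b_pro_name")),
        norm_text(row.get("order5")),
        str(row.get("b_prof_dob") or "").strip(),
        phone,
    )


def find_collisions(v2_rows: list, staging_rows: list) -> list:
    """Return the V2 rows whose key already exists among the staging rows."""
    existing = {dedupe_key(r) for r in staging_rows}
    return [r for r in v2_rows if dedupe_key(r) in existing]


staging = [{"b_pro_name": "lakshmi devi", "order5": "Suresh Kumar",
            "b_prof_dob": "1992-04-15", "b_prof_mob": "9876543210"}]
incoming = [
    {"b_pro_name": "Lakshmi  Devi ", "order5": "suresh kumar",
     "b_prof_dob": "1992-04-15", "b_prof_mob": "98765 43210"},
    {"b_pro_name": "Another Person", "order5": "Someone Else",
     "b_prof_dob": "1988-01-01", "b_prof_mob": "9000000000"},
]
collisions = find_collisions(incoming, staging)
print(len(collisions), "row(s) need a dedupe decision")
```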

Translations come along for the ride

V2 stored every question and most options in 5–7 languages — mr, ta, kn, ml, te, hi, plus English. V3’s field labels live in English in the doctype JSON; Frappe’s translation engine swaps them at render time based on the surveyor’s User.language. The mobile SDK already supports this via mobile_auth.get_translations?lang=<code>.

We extract the V2 translations and emit them as Frappe-format CSVs at migration/translations/{te,ta,kn,ml,mr,hi}.csv. After PM sign-off they drop into mform_swasti/translations/ and load on the next bench migrate — no app rebuild needed.

Current extracted rows: 75 for Telugu / Kannada / Malayalam, 65 for Tamil, 52 for Marathi, 0 for Hindi (V2 left almost everything in English on Hindi forms). The extractor is deliberately conservative — only exact-EN-source matches, no position-based option joins, no echoes-of-English. PM-visible quality issues from the V2 source are listed in migration/translations/README.md.
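The conservative extraction rules can be illustrated with a small Python sketch. Several things here are assumptions: the shape of the V2 question records (one dict per question, keyed by language code), the function names, and the two-column source/translation layout, which is how we understand Frappe's per-app translation CSVs but should be checked against the Frappe version in use.

```python
import csv
import io


def build_translation_rows(v2_questions: list, v3_labels: set, lang: str) -> list:
    """Return (english_source, translation) pairs that pass the conservative rules."""
    rows, seen = [], set()
    for question in v2_questions:
        english = (question.get("en") or "").strip()
        translated = (question.get(lang) or "").strip()
        if english not in v3_labels:
            continue  # only exact English-source matches against V3 labels
        if not translated or translated.lower() == english.lower():
            continue  # skip blanks and echoes of the English text
        if english in seen:
            continue  # one row per source string
        seen.add(english)
        rows.append((english, translated))
    return rows


def to_csv(rows: list) -> str:
    """Serialise the pairs as quoted CSV text, one pair per line."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, quoting=csv.QUOTE_ALL, lineterminator="\n")
    writer.writerows(rows)
    return buffer.getvalue()


# Hypothetical V2 records, for illustration only.
questions = [
    {"en": "Name", "kn": "ಹೆಸರು"},
    {"en": "Mobile Number", "kn": "Mobile Number"},  # echo of English, skipped
    {"en": "Old V2 question", "kn": "ಹೆಸರು"},        # no matching V3 label, skipped
]
labels = {"Name", "Mobile Number"}
kn_rows = build_translation_rows(questions, labels, "kn")
kn_csv = to_csv(kn_rows)
print(kn_csv)
```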

What we will NOT do without a green light

Touch V2 prod beyond read-only inventory queries.
Migrate any row before its per-form mapping doc is signed off.
Skip the staging dry-run.
Skip the UAT mirror.
Run the prod migration without a known-good rollback path.

Where the engineer-facing detail lives

In the swasti-mform-migration repo:

migration/plan.md — the full engineer plan
migration/inventory.md — V2 collection census
migration/mappings/<doctype>.md — per-form V2 → V3 field tables
migration/v2-form-schemas/form_NNNN.md — V2 question dumps
migration/samples/ — 5-row sample reads (gitignored, PII)

Timeline (proposed)

Week	What	Who
Wk 1	Per-form mapping walkthroughs — Member first, then Scheme, then Document	Engineer + PM (Fahim)
Wk 2	Status taxonomy + surveyor mapping confirmation; ELI5 sign-off	PM + Akshat
Wk 2	Extended inventory pass on prod (off-peak)	Engineer
Wk 3	Staging dry run, sample migration	Engineer; PM smoke-tests on Vivo
Wk 4	Full staging migration	Engineer
Wk 5	UAT migration + PM/QA cycle	Engineer + Akshat
Wk 6	Prod cut (locked window)	Engineer + DevOps + PM signed off

Six weeks end-to-end, assuming PM availability for the per-form reviews. Faster if mapping reviews compress.
