Swasti · mForm V2→V3

13 — Scheme eligibility (authoritative)

This note specifies how the V3 Scheme Application form surfaces only the schemes a member qualifies for, derived from the PM-supplied Scheme Master CSV (raw/scheme-master-2026-05-20.csv — a dump from V2 production, 186 schemes).

Why this matters: V2’s “Eligible Schemes” picker isn’t a flat list — it’s filtered against the member’s profile. V3 must do the same. The list goes from 186 → typically 30–80 schemes depending on the member’s state / age / gender / caste / occupation / income.

Per Fahim (2026-05-20): “After profiling only applicable schemes are shown.”

This note covers the matching algorithm, the canonicalisation strategy for the document master, the gap analysis vs the current Scheme Master doctype, and the implementation plan.


1. Source data shape

The PM CSV is one-hot encoded — instead of one “states” column with comma-separated values, each state / gender / caste / etc. has its own column whose value is either the category name (this scheme applies) or empty (it doesn’t).

Family | Columns | Encoding
State | 8 | State_Kerala, State_Maharashtra, State_Tamil Nadu, State_Madhya Pradesh, State_Andhra Pradesh, State_Karnataka, State_Rajasthan, State_Uttar Pradesh
Gender | 3 | Gender_Male, Gender_Female, Gender_Other
Caste | 4 | Caste_OBC, Caste_ST, Caste_SC, Caste_General
Marital Status | 4 | Marital Status_Single, _Widowed, _Married, _Divorced
Occupation | 52 | Occupation_Farmer, Occupation_Student, … (full list in CSV)
Documents required | 165 | Select Documents_Aadhaar Card, etc.
Age | 1 regex | Age Rule (Regex), applied against str(age_in_years)
Income | 1 regex | Income Rule, applied against str(annual_income_INR)

Cell semantics: non-empty cell = scheme applies to that category. Empty = doesn’t apply. A scheme that should apply to all four castes has all four caste columns filled with the category name.
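A minimal sketch of how one one-hot row could be collapsed into per-family sets, assuming the column prefixes listed above. The function name decode_row, the "Scheme Name" column and the sample row are illustrative only and are not taken from the real CSV.

```python
import csv
import io

# Column-name prefixes per family, as listed in the source-data table.
FAMILY_PREFIXES = {
    "states": "State_",
    "genders": "Gender_",
    "castes": "Caste_",
    "marital_statuses": "Marital Status_",
    "occupations": "Occupation_",
    "documents": "Select Documents_",
}


def decode_row(row):
    """Collapse one-hot columns into one set of category names per family."""
    decoded = {family: set() for family in FAMILY_PREFIXES}
    for column, cell in row.items():
        value = (cell or "").strip()
        if not value:
            continue  # empty cell: the scheme does not apply to this category
        for family, prefix in FAMILY_PREFIXES.items():
            if column.startswith(prefix):
                decoded[family].add(value)
                break
    return decoded


# Hypothetical sample row, not real Scheme Master data.
sample = io.StringIO(
    "Scheme Name,State_Kerala,State_Karnataka,Gender_Male,Gender_Female\n"
    "Demo Scheme,,Karnataka,,Female\n"
)
decoded = decode_row(next(csv.DictReader(sample)))
print(decoded["states"], decoded["genders"])
```

Columns that match no family prefix (here the hypothetical "Scheme Name") are ignored by this sketch; the real importer would read them separately.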


2. Matching algorithm

A scheme S is applicable to member M if ALL of the following hold:

M.state          ∈ S.states
M.gender         ∈ S.genders
M.caste          ∈ S.castes
M.marital_status ∈ S.marital_statuses
M.occupation     ∈ S.occupations
regex(S.age_rule, str(M.age))            matches
regex(S.income_rule, str(M.annual_income)) matches

Empty regex or .* = no filter on that dimension (always passes).

Empty-family semantics (asked Fahim 2026-05-20, response pending): if a scheme has zero values filled in a family, do we exclude or include all? Our current reading: exclude. Will lock once confirmed.
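The rules above can be sketched as a single predicate. This is an illustration under stated assumptions, not the production matcher: the Member and Scheme classes are hypothetical containers, and an empty family excludes the scheme, which follows the current reading that is still pending confirmation.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Member:
    state: str
    gender: str
    caste: str
    marital_status: str
    occupation: str
    age: int
    annual_income: int


@dataclass
class Scheme:
    states: set = field(default_factory=set)
    genders: set = field(default_factory=set)
    castes: set = field(default_factory=set)
    marital_statuses: set = field(default_factory=set)
    occupations: set = field(default_factory=set)
    age_rule: str = ".*"
    income_rule: str = ".*"


def rule_passes(rule, value):
    """Empty rule or '.*' means no filter; otherwise match against str(value)."""
    rule = (rule or "").strip()
    if rule in ("", ".*"):
        return True
    return re.search(rule, str(value)) is not None


def is_applicable(s, m):
    """True only if every dimension passes (ALL-of semantics)."""
    # Assumption: an empty family set excludes the scheme for everyone.
    return (
        m.state in s.states
        and m.gender in s.genders
        and m.caste in s.castes
        and m.marital_status in s.marital_statuses
        and m.occupation in s.occupations
        and rule_passes(s.age_rule, m.age)
        and rule_passes(s.income_rule, m.annual_income)
    )
```

If Fahim confirms the opposite reading (empty family means "applies to all"), only the set-membership checks would change; the regex handling stays the same.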

Pseudo-SQL (Frappe-side query):

SELECT name, scheme_name, category, description, scheme_link
FROM `tabScheme Master` s
WHERE
  EXISTS (SELECT 1 FROM `tabScheme Eligibility State` WHERE parent = s.name AND state = :member_state)
  AND EXISTS (SELECT 1 FROM `tabScheme Eligibility Gender` WHERE parent = s.name AND gender = :member_gender)
  AND EXISTS (SELECT 1 FROM `tabScheme Eligibility Caste` WHERE parent = s.name AND caste = :member_caste)
  AND EXISTS (SELECT 1 FROM `tabScheme Eligibility Marital` WHERE parent = s.name AND marital_status = :member_marital)
  AND EXISTS (SELECT 1 FROM `tabScheme Eligibility Occupation` WHERE parent = s.name AND occupation = :member_occupation)
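  -- Empty or NULL rule also means "no filter", same as '.*'
  AND (COALESCE(s.age_rule, '') IN ('', '.*') OR :member_age REGEXP s.age_rule)
  AND (COALESCE(s.income_rule, '') IN ('', '.*') OR :member_income REGEXP s.income_rule)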

The current Scheme Master doctype stores these as Select (single-value) fields — won’t fit the multi-value reality. See gap analysis below.


3. Data-coverage stats (V2 production, current state)

Dimension | Coverage
State | RJ 66%, KA 65%, TN 64.5%, UP 64.5%, AP 63%, MP 61%; Kerala 1%, Maharashtra 0%
Gender | Female 99.5%, Male 87%, Other 86%
Caste | ST 99.5%, OBC/SC/General 99% each
Marital Status | Single 98%, Widowed 96%, Married 96%, Divorced 95%
Occupation | Farmer 79%, Student 72%, 50+ others with long-tail coverage
Age | 107/186 no filter; common bands 18-50, 18-60, 60+, 18-119
Income | 140/186 no filter; common caps ₹6L, ₹3L, ₹5L, ₹1L

These are the live numbers from V2 prod — not a draft, not incomplete. A Maharashtra member in V2 today sees an empty applicable-schemes list. V3 inherits that behaviour until the client tags Maharashtra-applicable schemes.

Universal schemes (no filters at all): 0/186. Every scheme has at least one filter — usually the state list.


4. Age & income regex patterns

PM expresses age/income limits as regex matched against the string form of the number.

Regex | Means
.* | no filter
^(1[89]$|^[2-4][0-9]$|^50)$ | 18 ≤ age ≤ 50
^(1[8-9]|[2-6][0-9]|70)$ | 18 ≤ age ≤ 70
^(6[0-9]$|^[7-9][0-9]$|^1[01][0-9]$|^120)$ | 60 ≤ age ≤ 120 (age ≥ 60 in practice)
^(0|[1-9][0-9]{0,4}|[1-5][0-9]{5}|600000)$ | annual income ≤ ₹6L
^([0-9]{1,5}|[1-4][0-9]{5}|500000)$ | annual income ≤ ₹5L

For implementation we store both: the raw regex (canonical from PM) and pre-derived age_min, age_max, income_max columns populated at import time. The derived columns drive the indexed query; the raw regex is the audit trail. If derivation fails for a future weird regex, fall back to REGEXP for that scheme.
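One way the import-time derivation could work is by brute force: test the regex against every integer in a bounded domain and accept the result only if the matches form one contiguous range. This is a sketch, not the importer's actual logic; derive_bounds and the domain limits are assumptions.

```python
import re

NO_FILTER = ("", ".*")


def derive_bounds(rule, domain_max):
    """Return (lo, hi) for a rule, (None, None) for 'no filter',
    or None when bounds cannot be derived (caller falls back to REGEXP)."""
    rule = (rule or "").strip()
    if rule in NO_FILTER:
        return (None, None)
    pattern = re.compile(rule)
    hits = [n for n in range(domain_max + 1) if pattern.search(str(n))]
    if not hits:
        return None
    lo, hi = hits[0], hits[-1]
    if len(hits) != hi - lo + 1:
        return None  # gaps in the matched range: not a simple min/max rule
    return (lo, hi)


print(derive_bounds(r"^(1[89]$|^[2-4][0-9]$|^50)$", 150))  # (18, 50)
print(derive_bounds(r"^(1[8-9]|[2-6][0-9]|70)$", 150))     # (18, 70)
```

The domain limit has to exceed the largest value any rule encodes (for example, above 120 for age), otherwise the derived upper bound is an artefact of the search range rather than of the regex.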


5. Document master — single source of truth

Documents are referenced from three places — Member Profiling (which IDs/docs the member possesses), Scheme Application (required docs to apply), Document Application (which document the surveyor is helping the member get). All three use the same Document Master doctype. One canonical name per document, used everywhere.

Today’s state — two parallel masters:

Doctype | Rows | Used by
Civic Document Type | 12 | Member Profiling b_prof_doc
Document Master | 32 | Document Application

11 documents overlap with identical naming (Bank Account, Birth Certificate, Community Certificate, Death Certificate, Domicile Certificate, Family ID, Income Certificate, PAN Card, Passport, Ration Card, Voter ID). One naming conflict: Aadhaar (Civic) vs Aadhar Card (Document Master). The Government-of-India canonical name is Aadhaar Card, so we adopt that.

Target state — single Document Master:

Field | Type | Purpose
document_name | Data (autoname) | Canonical Title-Case name, e.g. Aadhaar Card
description | Small Text | Optional, surveyor-readable explanation
legacy_mongo_id | Data | Idempotent re-import key

No category field on Document Master. The master is the canonical name registry. Each form declares its own visible subset — surveyors don’t see all 184 docs in one form.

Form | Visible subset | How it’s defined
Member Profiling b_prof_doc | 12 civic IDs (Aadhaar Card, PAN Card, Voter ID, …) | Hardcoded in kMemberFlow schema; names match Document Master rows
Scheme Application required_documents | Per-scheme list (typically 3–8 documents) | Loaded from the Scheme Master row’s required_documents child rows, populated from the V2 CSV Select Documents_* columns at import time
Document Application | Application-specific subset (~32 docs today) | Existing field, references Document Master

All three reference the same canonical names — so a member who marked Aadhaar Card in the Member form, a scheme that requires Aadhaar Card, and a Document Application for Aadhaar Card all join cleanly on the same string. That’s the entire point of unifying the master.

Canonicalisation rules (Title Case, applied in importer):

From | To
Aadhaar / Aadhar Card | Aadhaar Card
Disability certificate / Disability Certificate | Disability Certificate
Bhamashah Card (dup row) | single row
Email ID (dup row) | single row
Pan Card | PAN Card
Ration card | Ration Card
Birth certificate / Domicile certificate (case only) | Title Case
All other 155+ scheme-only docs | Title Case, no trailing whitespace
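A minimal sketch of these rules as an importer helper. The function name canonicalise and the ALIASES map are assumptions; the map holds only the explicit renames from the rules above, and only all-lowercase words are capitalised so that acronyms such as "ID" survive.

```python
# Explicit renames from the rules above; keys are lower-cased raw names.
ALIASES = {
    "aadhaar": "Aadhaar Card",
    "aadhar card": "Aadhaar Card",
    "pan card": "PAN Card",
}


def canonicalise(raw):
    """Map a raw document name to its canonical Title-Case form."""
    words = raw.split()  # also drops leading/trailing and repeated whitespace
    alias = ALIASES.get(" ".join(words).lower())
    if alias:
        return alias
    # Capitalise only all-lowercase words; leave "ID", "PAN" etc. untouched.
    return " ".join(w.capitalize() if w.islower() else w for w in words)


print(canonicalise("Ration card "))  # Ration Card
print(canonicalise("Pan Card"))      # PAN Card
```

Duplicate rows (Bhamashah Card, Email ID) collapse naturally if the importer upserts by the canonical name returned here.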

Migration plan (executed in a single seeder pass):

  1. Run a unification importer that:
    • Upserts the 12 Civic Document Type rows into Document Master using canonical names (e.g. Aadhaar → Aadhaar Card).
    • Normalises the 32 existing Document Master rows to canonical names.
    • Reads the 165 scheme-master document column names, normalises to canonical, upserts into Document Master.
    • Final expected size: ~184 unique documents (32 from current Document Master, 12 from Civic Document Type — all already in DM — and 152 net-new from the scheme CSV).
  2. Update the Member form schema (b_prof_doc) in kMemberFlow to use canonical names (e.g. Aadhaar → Aadhaar Card). The hardcoded 12-item list stays; the names just align with Document Master rows so a member’s selection maps cleanly to scheme requirements and downstream Document Applications.
  3. Wire the Scheme Master doctype’s required_documents child table as a Link → Document Master. The scheme master CSV importer populates this from the per-row Select Documents_* columns, mapping each raw column name to its canonical Document Master row.
  4. Migrate existing Member rows that stored the old name: a one-time data fix in the Frappe DB and the mobile SQLite cache (Aadhaar → Aadhaar Card).
  5. Deprecate Civic Document Type (drop the doctype after the migration; remove from the Mobile Configuration sync list).

The Member form’s picker still shows the same 12 items, so the surveyor experience doesn’t change. Only the stored names change, so that member selections, scheme requirements and Document Applications all share one canonical string.


6. Gap analysis vs current Scheme Master doctype

The current mform_swasti Scheme Master doctype has scalar fields (one value each):

state (Link to State)            — single state only
gender (Select)                  — single gender only
caste (Select)                   — single caste only
marital_status (Select)          — single marital_status only
occupation (Select)              — single occupation only
age_rule (Data)                  — free-text regex
income_rule (Data)               — free-text regex
select_documents (Small Text)    — comma-separated list

Gap | Type | Fix
state is single Link, must be multi-value | MISSING | Convert to Table MultiSelect (child doctype with state Link → State).
gender / caste / marital_status / occupation each need to allow multi-value | MISSING | Same: Table MultiSelect children.
age_min, age_max, income_max derived columns | MISSING | Add as Int; populated by importer from the regex. Original regex kept in age_rule / income_rule for fidelity.
required_documents as Table MultiSelect of Scheme Required Document | MISSING | Currently select_documents is comma-separated Small Text. Should be a child doctype so the Document Application flow can reuse the canonical list.
Raw age_rule and income_rule (free text) | KEEP | Source-of-truth regex from PM. Derived min/max are the primary lookup keys.
0 schemes tagged to Maharashtra | NOT A DEFECT | This is V2 prod truth. V3 inherits.
Kerala has 2 schemes only | NOT A DEFECT | V2 prod truth.

7. Implementation plan

  1. Schema migration in mform_swasti:
    • Create child doctypes: Scheme Eligibility State, Scheme Eligibility Gender, Scheme Eligibility Caste, Scheme Eligibility Marital, Scheme Eligibility Occupation, Scheme Required Document.
    • Migrate Scheme Master fields to Table MultiSelect.
    • Add derived: age_min, age_max, income_max.
  2. Document Type seeder (seeders/document_types_canonical.py):
    • Normalise the 165 raw columns → 162 canonical Title-Case names.
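    • Upsert into Document Master; the new Scheme Required Document child doctype links to these rows (see section 5).
    • Align the 7 Member-form-aligned names with their Document Master rows, since Civic Document Type is deprecated (see section 5).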
  3. Scheme Master CSV importer (seeders/scheme_master_csv.py, mirror the donor importer):
    • Read all 186 rows.
    • For each one-hot family column with a non-empty cell, append a child row.
    • Parse the age/income regexes to derive numeric bounds.
    • Keep legacy_mongo_id = Transaction Id for idempotent re-runs.
  4. Frappe applicable-schemes API, mform_swasti.api.applicable_schemes(member_name):
    • Reads the member’s state / gender / caste / marital / occupation / age / income.
    • Joins against the child doctypes, applies regex bounds.
    • Returns name, description, link, required_documents.
  5. Mobile: replace the placeholder Scheme Application picker with a search-over-applicable list, calling the API on screen open. Cache locally so the surveyor can work offline after first open.
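The idempotent re-run in step 3 can be illustrated with an in-memory stand-in for the doctype table. This is a sketch only: the dict store replaces the Frappe database, upsert_schemes is a hypothetical name, and the "Scheme Name" column is assumed for illustration; only "Transaction Id" as the legacy_mongo_id key comes from this note.

```python
def upsert_schemes(store, rows):
    """Insert or overwrite schemes keyed by legacy_mongo_id.

    Returns (created, updated) so a re-run can be checked for idempotency.
    """
    created = updated = 0
    for row in rows:
        key = row["Transaction Id"].strip()
        record = {
            "legacy_mongo_id": key,
            # "Scheme Name" is an assumed column name, not confirmed by the CSV.
            "scheme_name": row["Scheme Name"].strip(),
        }
        if key in store:
            updated += 1
        else:
            created += 1
        store[key] = record
    return created, updated


# Hypothetical rows, not real Scheme Master data.
rows = [
    {"Transaction Id": "abc123", "Scheme Name": "Demo Scheme A"},
    {"Transaction Id": "def456", "Scheme Name": "Demo Scheme B"},
]
store = {}
first_run = upsert_schemes(store, rows)
second_run = upsert_schemes(store, rows)
print(first_run, second_run, len(store))
```

Running the importer twice leaves the same number of records; the second pass only overwrites, which is the property the legacy_mongo_id key is meant to guarantee.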

8. Open items

  1. Empty-family semantics — asked Fahim 2026-05-20 in mForm V3 <> Swasti. Current reading: a scheme with 0 values filled in a family is excluded for everyone. Confirm before locking the seeder logic.
  2. Document canonicalisation: decided on our side. Standardise to Member-form Title Case in the importer; no V2-side cleanup needed. Fahim’s confirmation is still pending but informational, not blocking.
  3. Age / income regex format for future iterations — would Fahim prefer to send age_min, age_max, income_max as numeric columns next time? Quality-of-life ask; not blocking.

This is the source of truth. Future Scheme Application work must reference this note. Update when Fahim resolves the open items above.


Last updated 2026-05-04