13 — Scheme eligibility (authoritative)

This note specifies how the V3 Scheme Application form surfaces only the schemes a member qualifies for, derived from the PM-supplied Scheme Master CSV (raw/scheme-master-2026-05-20.csv — a dump from V2 production, 186 schemes).

Why this matters: V2’s “Eligible Schemes” picker isn’t a flat list — it’s filtered against the member’s profile. V3 must do the same. The list goes from 186 → typically 30–80 schemes depending on the member’s state / age / gender / caste / occupation / income.

Per Fahim (2026-05-20): “After profiling only applicable schemes are shown.”

This note covers the matching algorithm, the canonicalisation strategy for the document master, the gap analysis vs the current Scheme Master doctype, and the implementation plan.

1. Source data shape

The PM CSV is one-hot encoded — instead of one “states” column with comma-separated values, each state / gender / caste / etc. has its own column whose value is either the category name (this scheme applies) or empty (it doesn’t).

Family	Columns	Encoding
State	8	`State_Kerala`, `State_Maharashtra`, `State_Tamil Nadu`, `State_Madhya Pradesh`, `State_Andhra Pradesh`, `State_Karnataka`, `State_Rajasthan`, `State_Uttar Pradesh`
Gender	3	`Gender_Male`, `Gender_Female`, `Gender_Other`
Caste	4	`Caste_OBC`, `Caste_ST`, `Caste_SC`, `Caste_General`
Marital Status	4	`Marital Status_Single`, `_Widowed`, `_Married`, `_Divorced`
Occupation	52	`Occupation_Farmer`, `Occupation_Student`, … (full list in CSV)
Documents required	165	`Select Documents_Aadhaar Card`, etc.
Age	1 regex	`Age Rule (Regex)` — applied against `str(age_in_years)`
Income	1 regex	`Income Rule` — applied against `str(annual_income_INR)`

Cell semantics: non-empty cell = scheme applies to that category. Empty = doesn’t apply. A scheme that should apply to all four castes has all four caste columns filled with the category name.
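
The cell semantics above can be sketched as a small decoder that collapses the one-hot columns into one list per family. This is a minimal illustration, not the importer itself: `FAMILY_PREFIXES`, `decode_row` and the `Scheme Name` column in the sample are hypothetical names, and the sketch assumes the column prefixes are exactly as listed in the table.

```python
import csv
import io

# Column prefixes per family, taken from the table above.
# The dict keys are hypothetical field names chosen for this sketch.
FAMILY_PREFIXES = {
    "states": "State_",
    "genders": "Gender_",
    "castes": "Caste_",
    "marital_statuses": "Marital Status_",
    "occupations": "Occupation_",
    "required_documents": "Select Documents_",
}


def decode_row(row):
    """Collapse the one-hot columns of one CSV row into a list per family."""
    decoded = {family: [] for family in FAMILY_PREFIXES}
    for column, cell in row.items():
        if not (cell or "").strip():
            continue  # empty cell: the scheme does not apply to this category
        for family, prefix in FAMILY_PREFIXES.items():
            if column.startswith(prefix):
                # The category is read from the column name; per the note the
                # cell holds the same category name when it is filled.
                decoded[family].append(column[len(prefix):])
    return decoded


# Synthetic two-line CSV; "Scheme Name" is a made-up column for the demo.
sample = io.StringIO(
    "Scheme Name,State_Kerala,State_Karnataka,Gender_Female,Gender_Male\n"
    "Demo Scheme,,Karnataka,Female,\n"
)
decoded = decode_row(next(csv.DictReader(sample)))
print(decoded["states"], decoded["genders"])
```

A family with no filled cells decodes to an empty list, which is the case the empty-family question in section 2 is about.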

2. Matching algorithm

A scheme S is applicable to member M if ALL of the following hold:

M.state          ∈ S.states
M.gender         ∈ S.genders
M.caste          ∈ S.castes
M.marital_status ∈ S.marital_statuses
M.occupation     ∈ S.occupations
regex(S.age_rule, str(M.age))            matches
regex(S.income_rule, str(M.annual_income)) matches

Empty regex or .* = no filter on that dimension (always passes).

Empty-family semantics (asked Fahim 2026-05-20, response pending): if a scheme has zero values filled in a family, do we exclude or include all? Our current reading: exclude. Will lock once confirmed.
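
The rule above can be sketched as a plain Python predicate. This is a sketch under stated assumptions: the `Scheme` and `Member` classes and their field names are hypothetical, and an empty family excludes the scheme, which is the current unconfirmed reading.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Scheme:  # hypothetical shape, mirrors the families in section 1
    states: set = field(default_factory=set)
    genders: set = field(default_factory=set)
    castes: set = field(default_factory=set)
    marital_statuses: set = field(default_factory=set)
    occupations: set = field(default_factory=set)
    age_rule: str = ".*"
    income_rule: str = ".*"


@dataclass
class Member:  # hypothetical shape
    state: str
    gender: str
    caste: str
    marital_status: str
    occupation: str
    age: int
    annual_income: int


def rule_passes(rule, value):
    """Empty regex or '.*' means no filter; otherwise match str(value)."""
    if rule is None or rule.strip() in ("", ".*"):
        return True
    return re.search(rule, str(value)) is not None


def is_applicable(s, m):
    # Membership in an empty set is False, so an empty family excludes the
    # scheme for everyone (ASSUMPTION: pending confirmation from Fahim).
    return (
        m.state in s.states
        and m.gender in s.genders
        and m.caste in s.castes
        and m.marital_status in s.marital_statuses
        and m.occupation in s.occupations
        and rule_passes(s.age_rule, m.age)
        and rule_passes(s.income_rule, m.annual_income)
    )


scheme = Scheme(
    states={"Karnataka"}, genders={"Female", "Male"}, castes={"SC", "ST"},
    marital_statuses={"Married"}, occupations={"Farmer"},
    age_rule=r"^(1[89]$|^[2-4][0-9]$|^50)$",
)
member = Member("Karnataka", "Female", "SC", "Married", "Farmer", 30, 100000)
print(is_applicable(scheme, member))  # True
```

If Fahim's answer turns out to be "include all", only the five membership checks change (for example `not s.states or m.state in s.states`); the regex handling stays as is.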

Pseudo-SQL (Frappe-side query):

SELECT name, scheme_name, category, description, scheme_link
FROM `tabScheme Master` s
WHERE
  EXISTS (SELECT 1 FROM `tabScheme Eligibility State` WHERE parent = s.name AND state = :member_state)
  AND EXISTS (SELECT 1 FROM `tabScheme Eligibility Gender` WHERE parent = s.name AND gender = :member_gender)
  AND EXISTS (SELECT 1 FROM `tabScheme Eligibility Caste` WHERE parent = s.name AND caste = :member_caste)
  AND EXISTS (SELECT 1 FROM `tabScheme Eligibility Marital` WHERE parent = s.name AND marital_status = :member_marital)
  AND EXISTS (SELECT 1 FROM `tabScheme Eligibility Occupation` WHERE parent = s.name AND occupation = :member_occupation)
  AND (s.age_rule IS NULL OR s.age_rule IN ('', '.*') OR :member_age REGEXP s.age_rule)
  AND (s.income_rule IS NULL OR s.income_rule IN ('', '.*') OR :member_income REGEXP s.income_rule)

The current Scheme Master doctype stores each of these as a single-value field (Link for state, Select for the rest), which cannot hold the multi-value data in the CSV. See the gap analysis in section 6.

3. Data-coverage stats (V2 production, current state)

Dimension	Coverage
State	RJ 66%, KA 65%, TN 64.5%, UP 64.5%, AP 63%, MP 61%; Kerala 1%, Maharashtra 0%
Gender	Female 99.5%, Male 87%, Other 86%
Caste	ST 99.5%, OBC/SC/General 99% each
Marital Status	Single 98%, Widowed 96%, Married 96%, Divorced 95%
Occupation	Farmer 79%, Student 72%, 50+ others with long-tail coverage
Age	107/186 no filter; common bands 18-50, 18-60, 60+, 18-119
Income	140/186 no filter; common caps ₹6L, ₹3L, ₹5L, ₹1L

These are the live numbers from V2 prod — not a draft, not incomplete. A Maharashtra member in V2 today sees an empty applicable-schemes list. V3 inherits that behaviour until the client tags Maharashtra-applicable schemes.

Universal schemes (no filters at all): 0/186. Every scheme has at least one filter — usually the state list.
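
For reference, coverage here means the share of the 186 schemes whose cell in a given column is non-empty. A minimal sketch of that calculation follows; the `coverage` helper is hypothetical and the rows are synthetic, built only to reproduce the Kerala (2 schemes) and Maharashtra (0 schemes) counts stated in this note.

```python
def coverage(rows, column):
    """Share (0.0 to 1.0) of rows whose cell in `column` is non-empty."""
    if not rows:
        return 0.0
    tagged = sum(1 for row in rows if (row.get(column) or "").strip())
    return tagged / len(rows)


# Synthetic stand-in for the CSV: 2 schemes tagged to Kerala, none to
# Maharashtra, 186 rows in total.
rows = (
    [{"State_Kerala": "Kerala", "State_Maharashtra": ""}] * 2
    + [{"State_Kerala": "", "State_Maharashtra": ""}] * 184
)

print(round(100 * coverage(rows, "State_Kerala")))       # 1
print(round(100 * coverage(rows, "State_Maharashtra")))  # 0
```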

4. Age & income regex patterns

PM expresses age/income limits as regex matched against the string form of the number.

Regex	Means
`.*`	no filter
`^(1[89]$|^[2-4][0-9]$|^50)$`	18 ≤ age ≤ 50
`^(1[8-9]|[2-6][0-9]|70)$`	18 ≤ age ≤ 70
`^(6[0-9]$|^[7-9][0-9]$|^1[01][0-9]$|^120)$`	age ≥ 60 (regex caps at 120)
`^(0|[1-9][0-9]{0,4}|[1-5][0-9]{5}|600000)$`	annual income ≤ ₹6L
`^([0-9]{1,5}|[1-4][0-9]{5}|500000)$`	annual income ≤ ₹5L

For implementation we store both: the raw regex (canonical from PM) and pre-derived age_min, age_max, income_max columns populated at import time. The derived columns drive the indexed query; the raw regex is the audit trail. If derivation fails for a future weird regex, fall back to REGEXP for that scheme.
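
One way the import-time derivation could work is to probe the regex with candidate numbers rather than parse it. The sketch below is an assumption, not the agreed importer logic: `derive_age_bounds` and `derive_income_max` are hypothetical helpers, ages are probed from 0 to 130, and income rules are assumed to be pure upper caps. A `ValueError` is the signal to fall back to REGEXP for that scheme.

```python
import re

NO_FILTER = ("", ".*")


def _matcher(rule):
    pattern = re.compile(rule)
    return lambda n: pattern.search(str(n)) is not None


def derive_age_bounds(rule, max_age=130):
    """Return (age_min, age_max); (None, None) means no filter.

    Raises ValueError when the matching ages are not one contiguous band.
    """
    if rule is None or rule.strip() in NO_FILTER:
        return (None, None)
    matches = _matcher(rule)
    ages = [a for a in range(max_age + 1) if matches(a)]
    if not ages or ages != list(range(ages[0], ages[-1] + 1)):
        raise ValueError(f"cannot derive an age band from {rule!r}")
    return (ages[0], ages[-1])


def derive_income_max(rule, ceiling=10**9):
    """Return the income cap, or None for no filter.

    ASSUMPTION: the rule is a pure upper cap, i.e. every value from 0 up to
    the cap matches and nothing above it does. Binary search finds the cap.
    """
    if rule is None or rule.strip() in NO_FILTER:
        return None
    matches = _matcher(rule)
    if not matches(0) or matches(ceiling):
        raise ValueError(f"cannot derive an income cap from {rule!r}")
    lo, hi = 0, ceiling  # invariant: matches(lo) and not matches(hi)
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if matches(mid):
            lo = mid
        else:
            hi = mid
    return lo


print(derive_age_bounds(r"^(1[89]$|^[2-4][0-9]$|^50)$"))  # (18, 50)
print(derive_income_max(r"^(0|[1-9][0-9]{0,4}|[1-5][0-9]{5}|600000)$"))  # 600000
```

Probing avoids writing a regex parser, at the cost of trusting the contiguous-band and upper-cap assumptions; any rule that violates them raises and stays on the REGEXP path.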

5. Document master — single source of truth

Documents are referenced from three places: Member Profiling (which IDs/docs the member possesses), Scheme Application (required docs to apply), and Document Application (which document the surveyor is helping the member get). In the target state all three use the same Document Master doctype, with one canonical name per document used everywhere.

Today’s state — two parallel masters:

Doctype	Rows	Used by
`Civic Document Type`	12	Member Profiling `b_prof_doc`
`Document Master`	32	Document Application

11 documents overlap with mostly identical naming (Bank Account, Birth Certificate, Community Certificate, Death Certificate, Domicile Certificate, Family ID, Income Certificate, PAN Card, Passport, Ration Card, Voter ID). One naming conflict: Aadhaar (Civic Document Type) vs Aadhar Card (Document Master). The Government of India spelling is Aadhaar, so we adopt Aadhaar Card as the canonical name.

Target state — single Document Master:

Field	Type	Purpose
`document_name`	Data (autoname)	Canonical Title-Case name, e.g. `Aadhaar Card`
`description`	Small Text	Optional, surveyor-readable explanation
`legacy_mongo_id`	Data	Idempotent re-import key

No category field on Document Master. The master is the canonical name registry. Each form declares its own visible subset — surveyors don’t see all 184 docs in one form.

Form	Visible subset	How it’s defined
Member Profiling `b_prof_doc`	12 civic IDs (Aadhaar Card, PAN Card, Voter ID, …)	Hardcoded in `kMemberFlow` schema; names match Document Master rows
Scheme Application `required_documents`	Per-scheme list (typically 3–8 documents)	Loaded from the Scheme Master row’s `required_documents` child rows — populated from the V2 CSV `Select Documents_*` columns at import time
Document Application	Application-specific subset (~32 docs today)	Existing field, references Document Master

All three reference the same canonical names — so a member who marked Aadhaar Card in the Member form, a scheme that requires Aadhaar Card, and a Document Application for Aadhaar Card all join cleanly on the same string. That’s the entire point of unifying the master.

Canonicalisation rules (Title Case, applied in importer):

From	To
`Aadhaar` / `Aadhar Card`	`Aadhaar Card`
`Disability certificate` / `Disability Certificate`	`Disability Certificate`
`Bhamashah Card` (dup row)	single row
`Email ID` (dup row)	single row
`Pan Card`	`PAN Card`
`Ration card`	`Ration Card`
`Birth certificate` / `Domicile certificate` (case only)	Title Case
All other 155+ scheme-only docs	Title Case, no trailing whitespace
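
The rules above could be applied by a small normaliser like the following. It is a sketch only: `ALIASES`, `ACRONYMS` and `canonical_document_name` are hypothetical names, and the acronym list is an assumption inferred from the `PAN Card` and `Email ID` rows (plain `str.title()` would produce `Pan Card` and `Email Id`).

```python
# Explicit renames from the rules table, keyed by case-folded raw name.
ALIASES = {
    "aadhaar": "Aadhaar Card",
    "aadhar card": "Aadhaar Card",
}

# Words that must stay upper-case (ASSUMPTION: inferred from the table).
ACRONYMS = {"pan": "PAN", "id": "ID"}


def canonical_document_name(raw):
    """Trim and collapse whitespace, apply aliases, then Title Case."""
    key = " ".join(raw.split()).casefold()
    if key in ALIASES:
        return ALIASES[key]
    return " ".join(ACRONYMS.get(word, word.capitalize()) for word in key.split(" "))


raw_names = [
    "Aadhaar", "Aadhar Card", "Pan Card", "Ration card ",
    "Disability certificate", "Disability Certificate", "Email ID", "Email ID",
]
# Duplicates collapse because they normalise to the same string.
print(sorted({canonical_document_name(name) for name in raw_names}))
```

Running the importer's upsert on the normalised name rather than the raw column name is what makes duplicate rows such as `Bhamashah Card` and `Email ID` collapse to a single Document Master row.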

Migration plan (executed in a single seeder pass):

Run a unification importer that:
- Upserts the 12 Civic Document Type rows into Document Master using canonical names (e.g. Aadhaar → Aadhaar Card).
- Normalises the 32 existing Document Master rows to canonical names.
- Reads the 165 scheme-master document column names, normalises to canonical, upserts into Document Master.
- Final expected size: ~184 unique documents. That is the 32 rows already in Document Master plus 152 net-new from the scheme CSV; the 12 Civic Document Type rows all map onto rows among those 32, so they add none.
Update the Member form schema (b_prof_doc) in kMemberFlow to use canonical names (e.g. Aadhaar → Aadhaar Card). The hardcoded 12-item list stays — the names just align with Document Master rows so a member’s selection maps cleanly to scheme requirements and downstream Document Applications.
Wire the Scheme Master doctype’s required_documents child table as a Link → Document Master. The scheme master CSV importer populates this from the per-row Select Documents_* columns, mapping each raw column name to its canonical Document Master row.
Migrate existing Member rows that stored the old name — one-time data fix in the Frappe DB and the mobile SQLite cache (Aadhaar → Aadhaar Card).
Deprecate Civic Document Type (drop the doctype after the migration; remove from the Mobile Configuration sync list).

The Member form’s picker still shows the same 12 items, so the surveyor experience doesn’t change. Only the stored names change: they now match the canonical Document Master rows.

6. Gap analysis vs current `Scheme Master` doctype

Current mform_swasti Scheme Master doctype has scalar fields (Select for one value each):

state (Link to State)            — single state only
gender (Select)                  — single gender only
caste (Select)                   — single caste only
marital_status (Select)          — single marital_status only
occupation (Select)              — single occupation only
age_rule (Data)                  — free-text regex
income_rule (Data)               — free-text regex
select_documents (Small Text)    — comma-separated list

Gap	Type	Fix
`state` is single Link, must be multi-value	MISSING	Convert to Table MultiSelect (child doctype with `state` Link → State).
`gender` / `caste` / `marital_status` / `occupation` each need to allow multi-value	MISSING	Same: Table MultiSelect children.
`age_min`, `age_max`, `income_max` derived columns	MISSING	Add as Int; populated by importer from the regex. Original regex kept in `age_rule` / `income_rule` for fidelity.
`required_documents` as Table MultiSelect of `Scheme Required Document`	MISSING	Currently `select_documents` is comma-separated `Small Text`. Should be a child doctype so the Document Application flow can reuse the canonical list.
Raw `age_rule` and `income_rule` (free text)	KEEP	Source-of-truth regex from PM. Derived min/max are the primary lookup keys.
0 schemes tagged to Maharashtra	NOT A DEFECT	This is V2 prod truth. V3 inherits.
Kerala has 2 schemes only	NOT A DEFECT	V2 prod truth.

7. Implementation plan

Schema migration in mform_swasti:
- Create child doctypes: Scheme Eligibility State, Scheme Eligibility Gender, Scheme Eligibility Caste, Scheme Eligibility Marital, Scheme Eligibility Occupation, Scheme Required Document.
- Migrate Scheme Master fields to Table MultiSelect.
- Add derived: age_min, age_max, income_max.
Document Type seeder (seeders/document_types_canonical.py):
- Normalise the 165 raw columns → 162 canonical Title-Case names.
- Upsert into Document Master (see section 5); the Scheme Required Document child rows link to these canonical rows.
- Link the 7 Member-form-aligned names to the existing Civic Document Type rows.
Scheme Master CSV importer (seeders/scheme_master_csv.py, mirror the donor importer):
- Read all 186 rows.
- For each one-hot family column with a non-empty cell, append a child row.
- Parse the age/income regexes to derive numeric bounds.
- Keep legacy_mongo_id = Transaction Id for idempotent re-runs.
Frappe applicable-schemes API — mform_swasti.api.applicable_schemes(member_name):
- Reads the member’s state / gender / caste / marital / occupation / age / income.
- Joins against the child doctypes, applies regex bounds.
- Returns name, description, link, required_documents.
Mobile: replace the placeholder Scheme Application picker with a search-over-applicable list, calling the API on screen open. Cache locally so the surveyor can work offline after first open.

8. Open items

Empty-family semantics — asked Fahim 2026-05-20 in mForm V3 <> Swasti. Current reading: a scheme with 0 values filled in a family is excluded for everyone. Confirm before locking the seeder logic.
Document canonicalisation: decided on our side. We standardise to Member-form Title Case; no V2-side cleanup is needed. Fahim’s confirmation is still pending but is informational only, not blocking.
Age / income regex format for future iterations — would Fahim prefer to send age_min, age_max, income_max as numeric columns next time? Quality-of-life ask; not blocking.

This is the source of truth. Future Scheme Application work must reference this note. Update when Fahim resolves the open items above.
