india-location-normalizer

TotalClaw, by totalclaw

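Normalize India real-estate location text into canonical city and locality fields (Mumbai and Pune v1), with confidence and unresolved flags. Use when leads contain aliases such as Goregaon, Andheri W, PCMC, Hinjewadi, Baner, or Wakad. Recommended chain position: lead-extractor, then india-location-normalizer, then sentiment-priority-scorer. Do not use for writes or outbound actions.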

Install / Download

TotalClaw CLI (recommended)

totalclaw install totalclaw:totalclaw~vishalgojha-india-location-normalizer

cURL (direct download, no login required)

curl -fsSL https://skills.taituai.com/api/skills/totalclaw%3Atotalclaw~vishalgojha-india-location-normalizer/file -o vishalgojha-india-location-normalizer.md

# India Location Normalizer

Resolve messy India locality aliases into canonical location fields without side effects.

## Quick Triggers

- Normalize Mumbai/Pune location aliases from extracted leads.
- Map PCMC and Hinjewadi variants to canonical localities.
- Resolve Mumbai shorthand like `Scruz`, `Khar`, `Andheri W`, `Turner Road`, `Carter Road`.
- Standardize locality names before scoring or storage.

## Recommended Chain

`message-parser -> lead-extractor -> india-location-normalizer -> sentiment-priority-scorer`

Target KPI for production tuning: improve canonical Mumbai/Pune locality resolution versus extractor-only baseline.
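
One way to track this KPI is to compare exact-match resolution rates on a small labeled sample. The sketch below is illustrative only: the `resolution_rate` helper, the sample leads, and the labels are all made up for the example and are not part of the skill.

```python
def resolution_rate(predicted, gold):
    """Share of leads whose predicted locality equals the labeled canonical one."""
    if not gold:
        return 0.0
    hits = sum(1 for p, g in zip(predicted, gold) if p is not None and p == g)
    return hits / len(gold)


# Hypothetical labeled sample (not real evaluation data).
gold = ["Andheri West", "Hinjewadi", "Wakad", "Santacruz"]
# Raw extractor output: aliases are left as written, one lead has no location.
baseline = ["Andheri W", "Hinjewadi", None, "Scruz"]
# Output after normalization: one lead stays unresolved rather than guessed.
normalized = ["Andheri West", "Hinjewadi", "Wakad", None]

uplift = resolution_rate(normalized, gold) - resolution_rate(baseline, gold)
print(f"baseline={resolution_rate(baseline, gold):.2f} "
      f"normalized={resolution_rate(normalized, gold):.2f} uplift={uplift:.2f}")
```

Counting an unresolved lead as a miss keeps the metric honest: the normalizer gets no credit for leads it declines to resolve.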

## Execute Workflow

1. Accept lead-location payload from Supervisor.
2. Validate input against `references/location-normalizer-input.schema.json`.
3. Use `references/india-location-aliases-v1.json` as the authoritative lookup map.
4. Match in this order:
   - exact alias match (case-insensitive)
   - token-normalized alias match (trim punctuation, collapse spaces)
   - conservative fuzzy match only when clearly unambiguous
5. Return one normalized location record per input lead with:
   - `city`
   - `locality_canonical`
   - `micro_market`
   - `matched_alias`
   - `confidence`
   - `unresolved_flag`
6. Validate output against `references/location-normalizer-output.schema.json`.
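
The matching order in step 4 can be sketched as a three-stage cascade. This is a minimal sketch under stated assumptions, not the skill's actual implementation: the in-memory `ALIASES` map, its entries, the `micro_market` labels, the confidence values (1.0, 0.9, 0.7), and the fuzzy cutoff of 0.85 are all hypothetical, and the real `references/india-location-aliases-v1.json` may be shaped differently. Schema validation (steps 2 and 6) is omitted.

```python
import difflib
import re

# Hypothetical alias map; the real alias file may use another structure.
ALIASES = {
    "andheri w": {"city": "Mumbai", "locality_canonical": "Andheri West",
                  "micro_market": "Western Suburbs"},
    "scruz": {"city": "Mumbai", "locality_canonical": "Santacruz",
              "micro_market": "Western Suburbs"},
    "hinjewadi": {"city": "Pune", "locality_canonical": "Hinjewadi",
                  "micro_market": "West Pune"},
    "wakad": {"city": "Pune", "locality_canonical": "Wakad",
              "micro_market": "West Pune"},
}


def token_normalize(text):
    """Lowercase, replace punctuation with spaces, collapse whitespace."""
    cleaned = re.sub(r"[^\w\s]", " ", text.lower())
    return " ".join(cleaned.split())


# Index of token-normalized alias -> original alias key.
BY_TOKENS = {token_normalize(alias): alias for alias in ALIASES}


def build_record(alias, confidence):
    """Assemble a resolved output record for a known alias."""
    return {**ALIASES[alias], "matched_alias": alias,
            "confidence": confidence, "unresolved_flag": False}


def normalize_location(raw):
    # Stage 1: exact alias match, case-insensitive.
    key = raw.strip().lower()
    if key in ALIASES:
        return build_record(key, 1.0)
    # Stage 2: token-normalized alias match.
    tokens = token_normalize(raw)
    if tokens in BY_TOKENS:
        return build_record(BY_TOKENS[tokens], 0.9)
    # Stage 3: conservative fuzzy match, accepted only when exactly one
    # alias clears the cutoff.
    close = difflib.get_close_matches(tokens, list(BY_TOKENS), n=2, cutoff=0.85)
    if len(close) == 1:
        return build_record(BY_TOKENS[close[0]], 0.7)
    # No confident match: keep the input and flag it as unresolved.
    return {"city": None, "locality_canonical": None, "micro_market": None,
            "matched_alias": raw, "confidence": 0.0, "unresolved_flag": True}
```

With this map, `normalize_location("Andheri-W.")` resolves at stage 2 and `normalize_location("Hinjewdi")` resolves at stage 3, while an unknown value such as `"Koregaon Park"` comes back with `unresolved_flag` set to `True`. Asking `get_close_matches` for two candidates and accepting only a single hit is one simple way to keep the fuzzy stage conservative.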

## Enforce Boundaries

- Never parse raw chat exports.
- Never extract non-location entities.
- Never write to Google Sheets, databases, or files.
- Never send messages or trigger external channels.
- Never auto-resolve low-confidence ambiguous aliases.

## Handle Ambiguity

1. If multiple localities match equally, set `unresolved_flag: true`.
2. If no confident match exists, preserve input in `matched_alias` and mark unresolved.
3. Prefer a false negative (leaving a lead unresolved) over a false positive (assigning the wrong city or locality).
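
These rules can be sketched as a small decision function over scored candidates. The candidate shape, the `resolve` name, and the `min_confidence` threshold of 0.8 are assumptions made for illustration. Preserving the raw input in `matched_alias` on a tie is also an assumption, since the rules above only require it when no confident match exists.

```python
import math


def resolve(raw, candidates, min_confidence=0.8):
    """Pick a locality only when one candidate clearly wins.

    `candidates` is a hypothetical list of dicts with the keys
    alias, city, locality_canonical, micro_market, and score.
    """
    unresolved = {"city": None, "locality_canonical": None, "micro_market": None,
                  "matched_alias": raw, "confidence": 0.0, "unresolved_flag": True}
    if not candidates:
        return unresolved
    ranked = sorted(candidates, key=lambda c: c["score"], reverse=True)
    best = ranked[0]
    # Rule 1: a different locality with the same top score is a tie.
    tied = any(
        math.isclose(c["score"], best["score"])
        and c["locality_canonical"] != best["locality_canonical"]
        for c in ranked[1:]
    )
    # Rules 2 and 3: when in doubt, stay unresolved instead of guessing.
    if tied or best["score"] < min_confidence:
        return unresolved
    return {"city": best["city"],
            "locality_canonical": best["locality_canonical"],
            "micro_market": best["micro_market"],
            "matched_alias": best["alias"],
            "confidence": best["score"],
            "unresolved_flag": False}


# "Andheri" alone could mean Andheri West or Andheri East.
andheri = [
    {"alias": "andheri", "city": "Mumbai", "locality_canonical": "Andheri West",
     "micro_market": "Western Suburbs", "score": 0.9},
    {"alias": "andheri", "city": "Mumbai", "locality_canonical": "Andheri East",
     "micro_market": "Western Suburbs", "score": 0.9},
]
print(resolve("Andheri", andheri)["unresolved_flag"])  # tie between two localities
```

Here the bare alias `Andheri` stays unresolved because two localities score equally, whereas passing only the first candidate would resolve to `Andheri West`.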
