employee-skills-importer

TotalClaw 作者 totalclaw

解析员工技能 CSV 文件，识别技能类别和个人技能，从员工表中查找员工 ID，并为技能类别、技能和员工技能表生成幂等 SQL INSERT 语句。

安装 / 下载方式

TotalClaw CLI推荐

totalclaw install totalclaw:totalclaw~inna-demidova-employee-skills-importer

cURL直接下载，无需登录

curl -fsSL https://skills.taituai.com/api/skills/totalclaw%3Atotalclaw~inna-demidova-employee-skills-importer/file -o inna-demidova-employee-skills-importer.md

## 概述（中文）

解析员工技能 CSV 文件，识别技能类别和个人技能，从员工表中查找员工 ID，并为技能类别、技能和员工技能表生成幂等 SQL INSERT 语句。

## 原文

# Employee Skills Importer

This skill automates the process of importing employee skills from CSV files into a Supabase database. It parses the CSV, checks what already exists in the database, and generates idempotent SQL scripts to insert missing data.

## Overview

The skill performs a 3-step process:
1. **Identify and insert missing skill categories** - Extract categories from CSV headers, check database, generate INSERT script
2. **Identify and insert missing skills** - Extract skills with their categories, check database, generate INSERT script  
3. **Generate employee_skills INSERT script** - Map employees by name, link skills, create final INSERT statements

## CSV Format Requirements

The CSV must have:
- **Row 1**: Empty or metadata (ignored)
- **Row 2**: Skill category names spanning multiple columns
- **Row 3+**: Individual skill names (column headers, may span multiple rows due to line breaks)
- **Employee data rows**: Employee data with First Name, Last Name in first two columns, followed by skill experience values

Example structure:
```
,,,,,,.NET,,,,,Front-end,,,Java,,,
First Name,Last Name,Full Name,Unit,...,C#,ASP.net,MVC,...,JavaScript,HTML,CSS,...,Java,Spring,...
John,Doe,John Doe,Unit 1,...,5,4,3,...,6,6,5,...,0,0,...
```

## Workflow

### Step 1: Skill Categories

1. Parse row 2 to extract unique category names
2. Query the database to check existing categories:
   ```sql
   SELECT name FROM skill_categories
   ```
3. Generate idempotent INSERT for missing categories:
   ```sql
   INSERT INTO skill_categories (name) 
   VALUES ('Category1'), ('Category2'), ('Category3')
   ON CONFLICT (name) DO NOTHING;
   ```

### Step 2: Skills

1. Parse skill name rows and map to categories from row 2
2. Query database for existing skills:
   ```sql
   SELECT s.name, sc.name as category_name 
   FROM skills s 
   LEFT JOIN skill_categories sc ON s.category_id = sc.id
   ```
3. For each skill to insert:
   - Find the category_id using a subquery
   - Generate idempotent INSERT:
   ```sql
   INSERT INTO skills (name, category_id)
   VALUES 
     ('C#', (SELECT id FROM skill_categories WHERE name = '.NET')),
     ('JavaScript', (SELECT id FROM skill_categories WHERE name = 'Front-end'))
   ON CONFLICT (name) DO NOTHING;
   ```

### Step 3: Employee Skills

1. Parse employee rows (first_name, last_name, skill values)
2. Query employees table to get employee IDs:
   ```sql
   SELECT id, first_name, last_name FROM employees
   ```
3. For each employee, for each skill with non-zero experience:
   - Look up employee_id by matching first_name + last_name
   - Look up skill_id using subquery
   - **CRITICAL: Use TRIM() in WHERE clause to handle whitespace variations in database**
   - Generate INSERT:
   ```sql
   INSERT INTO employee_skills (employee_id, skill_id, years_of_experience)
   VALUES 
     (
       (SELECT id FROM employees WHERE TRIM(first_name) = 'John' AND TRIM(last_name) = 'Doe'),
       (SELECT id FROM skills WHERE name = 'C#'),
       5
     )
   ON CONFLICT (employee_id, skill_id) DO UPDATE 
   SET years_of_experience = EXCLUDED.years_of_experience;
   ```

## Important Notes

### Database Schema
- `skill_categories` table: id (uuid), name (text, unique)
- `skills` table: id (uuid), name (text, unique), category_id (uuid FK to skill_categories)
- `employees` table: id (uuid), first_name (text), last_name (text)
- `employee_skills` table: id (uuid), employee_id (uuid FK), skill_id (uuid FK), years_of_experience (real)

### Idempotency
All generated SQL scripts use `ON CONFLICT` clauses to ensure they can be run multiple times without errors:
- For categories and skills: `ON CONFLICT (name) DO NOTHING`
- For employee_skills: `ON CONFLICT (employee_id, skill_id) DO UPDATE SET years_of_experience = EXCLUDED.years_of_experience`

### Data Handling
- Skip employees with zero or empty experience values for a skill
- Handle numeric experience values (can be integers or decimals like 0.5, 1.7, etc.)
- Clean up skill names by trimming whitespace and removing line breaks
- Skip rows where employee lookup fails (employee not found in database)
- Handle multi-line CSV cells properly
- **CRITICAL: Deduplicate employee-skill pairs before generating SQL** - Keep the highest years value when duplicates exist
- **CRITICAL: Automatically correct employee name spellings** - Use fuzzy matching to find and correct minor spelling differences (e.g., "Victoriia" → "Viktoriia")
- **CRITICAL: Trim all employee names** - Remove leading/trailing whitespace from all names
- **CRITICAL: Use TRIM() in SQL WHERE clauses** - Database may have extra spaces (e.g., "Yurii   Solokha" with 3 spaces)
- **CRITICAL: Skip employees with no match** - If no close match found in database, exclude those records and report them

### Error Prevention
- Always use subqueries for foreign key lookups rather than hardcoding UUIDs
- Validate that category names match between row 2 and skill lookups
- Report any employees from CSV not found in the database
- Report any skills that couldn't be mapped to categories

**CRITICAL - Prevent Duplicate Key Violations:**
1. Before generating the employee_skills INSERT, deduplicate all records by (first_name, last_name, skill)
2. When duplicates exist, keep the record with the highest years_of_experience value
3. This prevents: `ON CONFLICT DO UPDATE command cannot affect row a second time`

**CRITICAL - Automatic Name Correction:**
1. Before generating SQL, validate ALL employees exist in the database
2. For employees not found by exact match:
   - Use fuzzy matching (Levenshtein distance or similar) to find close matches in database
   - If a close match is found (e.g., "Victoriia" → "Viktoriia"), automatically use the database spelling
   - If no close match is found, skip the employee entirely
3. Generate a report showing:
   - Employees with automatic corrections applied: "CSV name → Database name"
   - Employees skipped (no match found): List with number of skills skipped
4. This prevents: `null value in column "employee_id" violates not-null constraint`

## Output Format

The skill produces three SQL scripts plus one report file:

**1_insert_categories.sql**
```sql
-- Insert missing skill categories
INSERT INTO skill_categories (name) 
VALUES ('.NET'), ('Front-end'), ('Java')
ON CONFLICT (name) DO NOTHING;
```

**2_insert_skills.sql**
```sql
-- Insert missing skills with category mapping
INSERT INTO skills (name, category_id)
VALUES 
  ('C#', (SELECT id FROM skill_categories WHERE name = '.NET')),
  ('ASP.net', (SELECT id FROM skill_categories WHERE name = '.NET')),
  ('JavaScript', (SELECT id FROM skill_categories WHERE name = 'Front-end'))
ON CONFLICT (name) DO NOTHING;
```

**3_insert_employee_skills.sql**
```sql
-- Insert employee skills
-- Records have been deduplicated and filtered for valid employees only
-- Using TRIM() in WHERE clause to handle whitespace in database
INSERT INTO employee_skills (employee_id, skill_id, years_of_experience)
VALUES 
  (
    (SELECT id FROM employees WHERE TRIM(first_name) = 'John' AND TRIM(last_name) = 'Doe'),
    (SELECT id FROM skills WHERE name = 'C#'),
    5
  ),
  (
    (SELECT id FROM employees WHERE TRIM(first_name) = 'John' AND TRIM(last_name) = 'Doe'),
    (SELECT id FROM skills WHERE name = 'JavaScript'),
    6
  )
ON CONFLICT (employee_id, skill_id) DO UPDATE 
SET years_of_experience = EXCLUDED.years_of_experience;
```

## Execution Steps

When the user provides a CSV file:

1. **Parse the CSV structure**
   - Read the file and validate format
   - Extract category names from row 2
   - Extract skill names from subsequent rows (handling multi-line cells)
   - Map each skill to its category based on column positions

2. **Query existing data from Supabase**
   - Fetch all existing skill_categories
   - Fetch all existing skills with their categories
   - Fetch all employees (id, first_name, last_n