employee-skills-importer
Parse employee skills CSV files, identify skill categories and individual skills, look up employee IDs from an employees table, and generate idempotent SQL INSERT statements for skill_categories, skills, and employee_skills tables.
安装 / 下载方式
TotalClaw CLI推荐
totalclaw install clawskills:clawskills~inna-demidova-employee-skills-importercURL直接下载,无需登录
curl -fsSL https://skills.taituai.com/api/skills/clawskills%3Aclawskills~inna-demidova-employee-skills-importer/file -o inna-demidova-employee-skills-importer.md# Employee Skills Importer
This skill automates the process of importing employee skills from CSV files into a Supabase database. It parses the CSV, checks what already exists in the database, and generates idempotent SQL scripts to insert missing data.
## Overview
The skill performs a 3-step process:
1. **Identify and insert missing skill categories** - Extract categories from CSV headers, check database, generate INSERT script
2. **Identify and insert missing skills** - Extract skills with their categories, check database, generate INSERT script
3. **Generate employee_skills INSERT script** - Map employees by name, link skills, create final INSERT statements
## CSV Format Requirements
The CSV must have:
- **Row 1**: Empty or metadata (ignored)
- **Row 2**: Skill category names spanning multiple columns
- **Row 3+**: Individual skill names (column headers, may span multiple rows due to line breaks)
- **Employee data rows**: Employee data with First Name, Last Name in first two columns, followed by skill experience values
Example structure:
```
,,,,,,.NET,,,,,Front-end,,,Java,,,
First Name,Last Name,Full Name,Unit,...,C#,ASP.net,MVC,...,JavaScript,HTML,CSS,...,Java,Spring,...
John,Doe,John Doe,Unit 1,...,5,4,3,...,6,6,5,...,0,0,...
```
## Workflow
### Step 1: Skill Categories
1. Parse row 2 to extract unique category names
2. Query the database to check existing categories:
```sql
SELECT name FROM skill_categories
```
3. Generate idempotent INSERT for missing categories:
```sql
INSERT INTO skill_categories (name)
VALUES ('Category1'), ('Category2'), ('Category3')
ON CONFLICT (name) DO NOTHING;
```
### Step 2: Skills
1. Parse skill name rows and map to categories from row 2
2. Query database for existing skills:
```sql
SELECT s.name, sc.name as category_name
FROM skills s
LEFT JOIN skill_categories sc ON s.category_id = sc.id
```
3. For each skill to insert:
- Find the category_id using a subquery
- Generate idempotent INSERT:
```sql
INSERT INTO skills (name, category_id)
VALUES
('C#', (SELECT id FROM skill_categories WHERE name = '.NET')),
('JavaScript', (SELECT id FROM skill_categories WHERE name = 'Front-end'))
ON CONFLICT (name) DO NOTHING;
```
### Step 3: Employee Skills
1. Parse employee rows (first_name, last_name, skill values)
2. Query employees table to get employee IDs:
```sql
SELECT id, first_name, last_name FROM employees
```
3. For each employee, for each skill with non-zero experience:
- Look up employee_id by matching first_name + last_name
- Look up skill_id using subquery
- **CRITICAL: Use TRIM() in WHERE clause to handle whitespace variations in database**
- Generate INSERT:
```sql
INSERT INTO employee_skills (employee_id, skill_id, years_of_experience)
VALUES
(
(SELECT id FROM employees WHERE TRIM(first_name) = 'John' AND TRIM(last_name) = 'Doe'),
(SELECT id FROM skills WHERE name = 'C#'),
5
)
ON CONFLICT (employee_id, skill_id) DO UPDATE
SET years_of_experience = EXCLUDED.years_of_experience;
```
## Important Notes
### Database Schema
- `skill_categories` table: id (uuid), name (text, unique)
- `skills` table: id (uuid), name (text, unique), category_id (uuid FK to skill_categories)
- `employees` table: id (uuid), first_name (text), last_name (text)
- `employee_skills` table: id (uuid), employee_id (uuid FK), skill_id (uuid FK), years_of_experience (real)
### Idempotency
All generated SQL scripts use `ON CONFLICT` clauses to ensure they can be run multiple times without errors:
- For categories and skills: `ON CONFLICT (name) DO NOTHING`
- For employee_skills: `ON CONFLICT (employee_id, skill_id) DO UPDATE SET years_of_experience = EXCLUDED.years_of_experience`
### Data Handling
- Skip employees with zero or empty experience values for a skill
- Handle numeric experience values (can be integers or decimals like 0.5, 1.7, etc.)
- Clean up skill names by trimming whitespace and removing line breaks
- Skip rows where employee lookup fails (employee not found in database)
- Handle multi-line CSV cells properly
- **CRITICAL: Deduplicate employee-skill pairs before generating SQL** - Keep the highest years value when duplicates exist
- **CRITICAL: Automatically correct employee name spellings** - Use fuzzy matching to find and correct minor spelling differences (e.g., "Victoriia" → "Viktoriia")
- **CRITICAL: Trim all employee names** - Remove leading/trailing whitespace from all names
- **CRITICAL: Use TRIM() in SQL WHERE clauses** - Database may have extra spaces (e.g., "Yurii Solokha" with 3 spaces)
- **CRITICAL: Skip employees with no match** - If no close match found in database, exclude those records and report them
### Error Prevention
- Always use subqueries for foreign key lookups rather than hardcoding UUIDs
- Validate that category names match between row 2 and skill lookups
- Report any employees from CSV not found in the database
- Report any skills that couldn't be mapped to categories
**CRITICAL - Prevent Duplicate Key Violations:**
1. Before generating the employee_skills INSERT, deduplicate all records by (first_name, last_name, skill)
2. When duplicates exist, keep the record with the highest years_of_experience value
3. This prevents: `ON CONFLICT DO UPDATE command cannot affect row a second time`
**CRITICAL - Automatic Name Correction:**
1. Before generating SQL, validate ALL employees exist in the database
2. For employees not found by exact match:
- Use fuzzy matching (Levenshtein distance or similar) to find close matches in database
- If a close match is found (e.g., "Victoriia" → "Viktoriia"), automatically use the database spelling
- If no close match is found, skip the employee entirely
3. Generate a report showing:
- Employees with automatic corrections applied: "CSV name → Database name"
- Employees skipped (no match found): List with number of skills skipped
4. This prevents: `null value in column "employee_id" violates not-null constraint`
## Output Format
The skill produces three SQL scripts plus one report file:
**1_insert_categories.sql**
```sql
-- Insert missing skill categories
INSERT INTO skill_categories (name)
VALUES ('.NET'), ('Front-end'), ('Java')
ON CONFLICT (name) DO NOTHING;
```
**2_insert_skills.sql**
```sql
-- Insert missing skills with category mapping
INSERT INTO skills (name, category_id)
VALUES
('C#', (SELECT id FROM skill_categories WHERE name = '.NET')),
('ASP.net', (SELECT id FROM skill_categories WHERE name = '.NET')),
('JavaScript', (SELECT id FROM skill_categories WHERE name = 'Front-end'))
ON CONFLICT (name) DO NOTHING;
```
**3_insert_employee_skills.sql**
```sql
-- Insert employee skills
-- Records have been deduplicated and filtered for valid employees only
-- Using TRIM() in WHERE clause to handle whitespace in database
INSERT INTO employee_skills (employee_id, skill_id, years_of_experience)
VALUES
(
(SELECT id FROM employees WHERE TRIM(first_name) = 'John' AND TRIM(last_name) = 'Doe'),
(SELECT id FROM skills WHERE name = 'C#'),
5
),
(
(SELECT id FROM employees WHERE TRIM(first_name) = 'John' AND TRIM(last_name) = 'Doe'),
(SELECT id FROM skills WHERE name = 'JavaScript'),
6
)
ON CONFLICT (employee_id, skill_id) DO UPDATE
SET years_of_experience = EXCLUDED.years_of_experience;
```
## Execution Steps
When the user provides a CSV file:
1. **Parse the CSV structure**
- Read the file and validate format
- Extract category names from row 2
- Extract skill names from subsequent rows (handling multi-line cells)
- Map each skill to its category based on column positions
2. **Query existing data from Supabase**
- Fetch all existing skill_categories
- Fetch all existing skills with their categories
- Fetch all employees (id, first_name, last_name)
3. **Generate Script 1: Categories**
- Compare CSV categories against database
-