Chemical Structure Converter

ClawSkills · Author: aipoch-ai · v0.1.0

Convert between IUPAC names, SMILES strings, and molecular formulas for chemical compounds. Supports structure validation, identifier interconversion, and cheminformatics data preparation for drug discovery and chemical research workflows.


Installation / Download

TotalClaw CLI (recommended)
totalclaw install clawskills:aipoch-ai~chemical-structure-converter
cURL (direct download, no login required)
curl -fsSL https://skills.taituai.com/api/skills/clawskills%3Aaipoch-ai~chemical-structure-converter/file -o chemical-structure-converter.md
Git (fetch the source from the repository)
git clone https://github.com/openclaw/skills && cd skills && git checkout dcfbd32ef78765b5ac843615bb5076312a01db59
# Chemical Structure Converter

Interconvert between different chemical structure representations including IUPAC names, SMILES strings, molecular formulas, and common names. Essential for cheminformatics workflows, database standardization, and compound registration in drug discovery and chemical research.

**Key Capabilities:**
- **Multi-Format Conversion**: Convert between IUPAC names, SMILES, InChI, and molecular formulas
- **SMILES Validation**: Validate SMILES syntax for structural correctness
- **Batch Processing**: Process multiple compounds for database standardization
- **Identifier Lookup**: Retrieve all available identifiers for known compounds
- **Structure Standardization**: Normalize chemical representations for consistency

---

## When to Use

**✅ Use this skill when:**
- **Standardizing chemical databases** with mixed naming conventions
- Preparing **compound libraries** for virtual screening or cheminformatics analysis
- **Converting structures** from publications (IUPAC names) to machine-readable formats (SMILES)
- **Validating SMILES strings** before using them in computational chemistry tools
- **Registering new compounds** in chemical inventory systems
- **Matching compounds** across different databases with different identifier types
- Creating **structure-activity relationship (SAR)** tables with consistent formatting

**❌ Do NOT use when:**
- Needing **3D structure generation** or conformer search → Use molecular modeling software (RDKit, OpenBabel)
- Performing **quantum chemistry calculations** → Use Gaussian, ORCA, or similar packages
- Working with **reaction schemes** or multi-step synthesis → Use reaction planning tools
- Requiring **patent structure searching** → Use specialized patent databases (SciFinder, STN)
- Converting **biological sequences** (DNA, protein) → Use bioinformatics tools
- Needing **spectral data prediction** (NMR, MS) → Use specialized prediction software

**Related Skills:**
- **Upstream**: `chemical-storage-sorter`, `adme-property-predictor`
- **Downstream**: `molecular-docking-predictor`, `bio-ontology-mapper`

---

## Integration with Other Skills

**Upstream Skills:**
- `chemical-storage-sorter`: Classify chemicals by hazard group before storage registration
- `adme-property-predictor`: Convert structures to standardized formats before ADME prediction
- `safety-data-sheet-reader`: Extract chemical names from SDS for structure lookup

**Downstream Skills:**
- `molecular-docking-predictor`: Convert compound libraries to 3D structures for docking
- `bio-ontology-mapper`: Map chemical structures to standardized ontologies (ChEBI, PubChem)
- `lab-inventory-tracker`: Register standardized chemical identifiers in inventory

**Complete Workflow:**
```
Literature/Patent → chemical-structure-converter → adme-property-predictor → molecular-docking-predictor → Hit Selection
```

---

## Core Capabilities

### 1. Multi-Format Chemical Identifier Conversion

Convert chemical structures between different representation formats for database interoperability.

```python
from scripts.main import ChemicalStructureConverter

converter = ChemicalStructureConverter()

# Convert compound name to all available identifiers
chemical_name = "aspirin"
data = converter.name_to_identifiers(chemical_name)

if data:
    print(f"Compound: {chemical_name}")
    print(f"IUPAC Name: {data['iupac']}")
    print(f"SMILES: {data['smiles']}")
    print(f"Formula: {data['formula']}")
    print(f"Molecular Weight: {data['mw']} g/mol")

# Output:
# Compound: aspirin
# IUPAC Name: 2-acetoxybenzoic acid
# SMILES: CC(=O)Oc1ccccc1C(=O)O
# Formula: C9H8O4
# Molecular Weight: 180.16 g/mol
```

**Supported Conversions:**

| From → To | Method | Use Case |
|-----------|--------|----------|
| **Name → SMILES** | Database lookup | Literature to database |
| **SMILES → IUPAC** | Structure recognition | Machine-readable to human-readable |
| **IUPAC → SMILES** | Name parsing | Chemical registration |
| **SMILES → Formula** | Atom counting | Quick MW calculation |
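
The **SMILES → Formula** row lists molecular weight as a quick downstream calculation: once a formula is known, the weight is just a sum of atomic weight × count. The sketch below is not the skill's implementation; `parse_formula`, `molecular_weight`, and the `ATOMIC_WEIGHTS` table are illustrative names, and the parser assumes a flat formula such as `C9H8O4` (no parentheses, hydrates, or charges).

```python
import re

# Conventional standard atomic weights (rounded) for elements common in
# drug-like compounds; extend the table as needed.
ATOMIC_WEIGHTS = {
    "H": 1.008, "C": 12.011, "N": 14.007, "O": 15.999, "F": 18.998,
    "Na": 22.990, "P": 30.974, "S": 32.06, "Cl": 35.45, "K": 39.098,
    "Br": 79.904, "I": 126.904,
}
ELEMENT_COUNT = re.compile(r"([A-Z][a-z]?)(\d*)")


def parse_formula(formula: str) -> dict:
    """Parse a flat formula such as 'C9H8O4' into {element: count}."""
    counts = {}
    pos = 0
    for match in ELEMENT_COUNT.finditer(formula):
        if match.start() != pos:  # something unparseable was skipped
            raise ValueError(f"Cannot parse {formula!r} at position {pos}")
        element, digits = match.groups()
        counts[element] = counts.get(element, 0) + int(digits or 1)
        pos = match.end()
    if pos != len(formula):  # trailing characters the pattern did not consume
        raise ValueError(f"Cannot parse {formula!r} at position {pos}")
    return counts


def molecular_weight(formula: str) -> float:
    """Sum atomic weight x count over all elements in the formula."""
    counts = parse_formula(formula)
    unknown = sorted(set(counts) - set(ATOMIC_WEIGHTS))
    if unknown:
        raise KeyError(f"No atomic weight for {unknown}")
    return sum(ATOMIC_WEIGHTS[el] * n for el, n in counts.items())


print(f"{molecular_weight('C9H8O4'):.2f} g/mol")  # prints "180.16 g/mol"
```

With these rounded weights, aspirin's `C9H8O4` comes out at 180.16 g/mol, the same value as the converter output above.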

**Best Practices:**
- ✅ **Use canonical SMILES** for database storage (one string per structure, provided the same toolkit generates all of them)
- ✅ **Validate conversions** against known reference compounds
- ✅ **Preserve stereochemistry** during conversions (`@`/`@@` for tetrahedral centers, `/` and `\` for double-bond geometry)
- ✅ **Check tautomeric forms**: the same compound may be written as different tautomers

**Common Issues and Solutions:**

**Issue: Compound not in local database**
- Symptom: Returns "Unknown structure" for valid compounds
- Solution: Look the compound up in an external database (PubChem or ChemSpider APIs); add frequently used compounds to the local database

**Issue: Multiple valid SMILES for same compound**
- Symptom: Different SMILES strings represent same molecule
- Solution: Use canonical SMILES generation (requires RDKit or similar)
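
For the missing-compound case, one option is a local table with a remote fallback. The sketch below assumes PubChem's PUG REST name lookup (`/compound/name/<name>/property/<list>/JSON`); `lookup`, `fetch_pubchem`, and `pubchem_url` are hypothetical helpers, not part of `scripts.main`, and the returned keys simply mirror the `iupac`/`smiles`/`formula`/`mw` fields used by `name_to_identifiers` above. Check PubChem's current documentation for property names and usage limits before relying on it.

```python
import json
import urllib.parse
import urllib.request
from typing import Callable, Optional

PUBCHEM_BASE = "https://pubchem.ncbi.nlm.nih.gov/rest/pug"
# PubChem has been revising its SMILES property names, so the record parser
# below accepts either "IsomericSMILES" or "SMILES" in the response.
PROPERTIES = "IUPACName,MolecularFormula,MolecularWeight,IsomericSMILES"


def pubchem_url(name: str) -> str:
    """Build a PUG REST property URL for a compound name."""
    quoted = urllib.parse.quote(name, safe="")
    return f"{PUBCHEM_BASE}/compound/name/{quoted}/property/{PROPERTIES}/JSON"


def fetch_pubchem(name: str, timeout: float = 10.0) -> Optional[dict]:
    """Return the first PubChem property record for `name`, or None on any failure."""
    try:
        with urllib.request.urlopen(pubchem_url(name), timeout=timeout) as resp:
            payload = json.load(resp)
    except (OSError, ValueError):  # network/HTTP errors (incl. 404 not found), bad JSON
        return None
    records = payload.get("PropertyTable", {}).get("Properties", [])
    return records[0] if records else None


def lookup(name: str, local_db: dict,
           fetch: Callable[[str], Optional[dict]] = fetch_pubchem) -> Optional[dict]:
    """Check the local table first; fall back to `fetch` and cache the result."""
    key = name.strip().lower()
    if key in local_db:
        return local_db[key]
    record = fetch(name)
    if not record:
        return None
    mw = record.get("MolecularWeight")
    result = {
        "iupac": record.get("IUPACName"),
        "smiles": record.get("IsomericSMILES") or record.get("SMILES"),
        "formula": record.get("MolecularFormula"),
        "mw": float(mw) if mw is not None else None,  # may arrive as a string
    }
    local_db[key] = result  # cache so the next call stays local
    return result
```

Passing `fetch` as a parameter keeps the network call swappable: tests can use a stub, and another source could be plugged in the same way.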

### 2. SMILES String Validation

Validate SMILES syntax to ensure structural integrity before computational processing.

```python
from scripts.main import ChemicalStructureConverter

converter = ChemicalStructureConverter()

# Validate SMILES strings
smiles_examples = [
    "CC(=O)Oc1ccccc1C(=O)O",  # Aspirin - valid
    "CCO",                     # Ethanol - valid
    "C(=O",                    # Invalid - unclosed parenthesis
    "C1CCCCC",                 # Invalid - unclosed ring
]

for smiles in smiles_examples:
    is_valid, message = converter.validate_smiles(smiles)
    status = "✅ Valid" if is_valid else "❌ Invalid"
    print(f"{smiles:<30} {status}: {message}")

# Output:
# CC(=O)Oc1ccccc1C(=O)O        ✅ Valid: Valid SMILES syntax
# CCO                          ✅ Valid: Valid SMILES syntax
# C(=O                         ❌ Invalid: Mismatched parentheses
# C1CCCCC                      ❌ Invalid: Ring closure error
```

**Validation Checks:**

| Check | Description | Example Error |
|-------|-------------|---------------|
| **Parentheses** | Matching ( and ) | `C(=O` - missing closing |
| **Brackets** | Matching [ and ] | `[Na+` - missing closing |
| **Ring closures** | Matching digits | `C1CC` - ring not closed |
| **Atom validity** | Recognized elements | `CQC` - `Q` is not an element symbol |
| **Valence** | Chemical validity | `C(C)(C)(C)(C)C` - 5 bonds to C |
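
The parentheses, bracket, ring-closure, and atom-symbol rows can all be checked in one left-to-right scan. This is a minimal sketch, not the skill's `validate_smiles`; `check_smiles_syntax` is a hypothetical name, and it checks syntax only (not valence, not the contents of bracket atoms, and not where bonds and branches are allowed to appear).

```python
ORGANIC = {"B", "C", "N", "O", "P", "S", "F", "I"}  # plus two-letter Cl and Br
AROMATIC = {"b", "c", "n", "o", "p", "s"}
OTHER = set("-=#$:/\\.*")  # bond symbols, '.' (disconnected parts), '*' wildcard


def check_smiles_syntax(smiles: str) -> tuple:
    """Return (is_valid, message) for a purely syntactic SMILES check."""
    depth = 0            # current branch depth
    open_rings = set()   # ring labels seen an odd number of times
    i = 0
    while i < len(smiles):
        ch = smiles[i]
        if ch == "[":
            end = smiles.find("]", i + 1)
            if end == -1:
                return False, f"Unclosed bracket atom at position {i}"
            i = end + 1  # isotope, H count, charge inside brackets are not checked
            continue
        if ch == "]":
            return False, f"Unmatched ']' at position {i}"
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:
                return False, f"Unmatched ')' at position {i}"
        elif ch.isdigit() or ch == "%":
            if ch == "%":  # two-digit ring label such as %10
                label = smiles[i + 1:i + 3]
                if len(label) != 2 or not label.isdigit():
                    return False, f"Bad two-digit ring label at position {i}"
                i += 3
            else:
                label = ch
                i += 1
            open_rings ^= {label}  # first occurrence opens, second closes
            continue
        elif smiles[i:i + 2] in ("Cl", "Br"):
            i += 2
            continue
        elif ch not in ORGANIC and ch not in AROMATIC and ch not in OTHER:
            return False, f"Unrecognized character {ch!r} at position {i}"
        i += 1
    if depth != 0:
        return False, "Mismatched parentheses"
    if open_rings:
        return False, f"Ring closure error: unclosed label(s) {sorted(open_rings)}"
    return True, "Valid SMILES syntax"
```

Tracking ring labels as a set lets a digit be reused once its ring has closed, as in `C1CC1C1CC1`.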

**Best Practices:**
- ✅ **Always validate** SMILES before using in downstream tools
- ✅ **Check for aromaticity** (lowercase `c`, `n`, `o` denote aromatic atoms)
- ✅ **Verify stereochemistry** (`@`/`@@` mark chirality)
- ✅ **Use explicit hydrogens** where they resolve ambiguity (e.g., `[nH]` in pyrrole, `c1cc[nH]c1`)

**Common Issues and Solutions:**

**Issue: Valid syntax but chemically impossible**
- Symptom: SMILES passes validation but structure is unrealistic
- Solution: Use chemical validation tools (RDKit SanitizeMol) for deeper checks

**Issue: Tautomeric ambiguity**
- Symptom: Keto and enol forms of the same compound produce different SMILES
- Solution: Apply tautomer canonicalization when consistency is required
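
For the valence row of the validation table, a cheap first pass before a full toolkit check is to sum explicit bond orders per atom and compare them with a maximum valence. This sketch rests on several assumptions: `valence_errors` is a hypothetical helper, the input has already passed a syntax check, only unbracketed aliphatic organic-subset atoms are checked (aromatic, wildcard, and bracket atoms are skipped), and the limits are the largest OpenSMILES normal valences (N 5, P 5, S 6). Toolkits such as RDKit are stricter, for example rejecting neutral pentavalent nitrogen.

```python
# Largest normal valence per organic-subset element (OpenSMILES: N 3/5, P 3/5, S 2/4/6).
MAX_VALENCE = {"B": 3, "C": 4, "N": 5, "O": 2, "P": 5, "S": 6,
               "F": 1, "Cl": 1, "Br": 1, "I": 1}
BOND_ORDER = {"-": 1, "=": 2, "#": 3, "$": 4, ":": 1.5, "/": 1, "\\": 1}


def valence_errors(smiles: str) -> list:
    """List aliphatic organic-subset atoms whose explicit bond orders exceed MAX_VALENCE."""
    symbols, checked, bond_sum = [], [], []
    prev = None      # index of the atom the next bond attaches to
    pending = None   # explicit bond order waiting for its second atom
    branches = []    # stack of branch-point atoms
    rings = {}       # open ring label -> (atom index, bond order or None)

    def add_atom(symbol, check):
        nonlocal prev, pending
        idx = len(symbols)
        symbols.append(symbol)
        checked.append(check)
        bond_sum.append(0.0)
        if prev is not None:
            order = pending if pending is not None else 1  # implicit bond is single here
            bond_sum[prev] += order
            bond_sum[idx] += order
        prev, pending = idx, None

    i = 0
    while i < len(smiles):
        ch = smiles[i]
        step = 1
        if ch == "[":
            end = smiles.index("]", i)
            add_atom(smiles[i + 1:end], False)  # bracket atom: connectivity only
            step = end + 1 - i
        elif smiles[i:i + 2] in ("Cl", "Br"):
            add_atom(smiles[i:i + 2], True)
            step = 2
        elif ch in MAX_VALENCE:
            add_atom(ch, True)
        elif ch.islower() or ch == "*":
            add_atom(ch, False)  # aromatic or wildcard atom: not checked
        elif ch in BOND_ORDER:
            pending = BOND_ORDER[ch]
        elif ch == "(":
            branches.append(prev)
        elif ch == ")":
            prev = branches.pop()
        elif ch == ".":
            prev = None
        elif ch.isdigit() or ch == "%":
            label = smiles[i + 1:i + 3] if ch == "%" else ch
            step = 3 if ch == "%" else 1
            if label in rings:  # ring closure: bond back to the opening atom
                other, opened_order = rings.pop(label)
                order = pending or opened_order or 1
                bond_sum[other] += order
                bond_sum[prev] += order
            else:
                rings[label] = (prev, pending)
            pending = None
        i += step

    return [f"{symbols[k]} (atom {k}) has {bond_sum[k]:g} bonds, max {MAX_VALENCE[symbols[k]]}"
            for k in range(len(symbols))
            if checked[k] and bond_sum[k] > MAX_VALENCE[symbols[k]]]
```

For the table's example, `valence_errors("C(C)(C)(C)(C)C")` reports the first carbon with 5 bonds, while aspirin's SMILES returns an empty list.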

### 3. Batch Structure Processing

Process multiple chemical structures simultaneously for database standardization.

```python
from scripts.main import ChemicalStructureConverter

converter = ChemicalStructureConverter()

# Batch process compound list
compound_list = [
    "aspirin",
    "caffeine", 
    "glucose",
    "ethanol",
    "unknown_compound"
]

results = []
for compound in compound_list:
    data = converter.name_to_identifiers(compound)
    if data:
        results.append({
            'name': compound,
            'iupac': data['iupac'],
            'smiles': data['smiles'],
            'formula': data['formula'],
            'mw': data['mw']
        })
    else:
        print(f"⚠️  Warning: '{compound}' not found in database")

# Display results table
print("\n" + "="*80)
print(f"{'Name':<20} {'Formula':<15} {'MW':<10} {'SMILES'}")
print("="*80)
for r in results:
    print(f"{r['name']:<20} {r['formula']:<15} {r['mw']:<10.2f} {r['smiles'][:40]}")
```
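
The loop above prints a summary table. To follow the missing-compound logging and CSV-export practices listed below, the same loop can write rows with Python's `csv` module and return the names that were not found. `export_batch` and `chunked` are hypothetical helpers; `lookup` is assumed to be any callable returning a dict with `iupac`, `smiles`, `formula`, and `mw` keys (or a falsy value), such as `converter.name_to_identifiers`. Chunking here only adds a flush checkpoint per batch; it does not batch the lookups themselves.

```python
import csv
from itertools import islice
from typing import Callable, Iterable, Iterator, List, Optional, TextIO

FIELDS = ["name", "iupac", "smiles", "formula", "mw"]


def chunked(items: Iterable, size: int) -> Iterator[list]:
    """Yield successive lists of at most `size` items."""
    if size < 1:
        raise ValueError("size must be at least 1")
    it = iter(items)
    while batch := list(islice(it, size)):
        yield batch


def export_batch(names: Iterable[str],
                 lookup: Callable[[str], Optional[dict]],
                 out: TextIO,
                 batch_size: int = 500) -> List[str]:
    """Write one CSV row per resolved compound and return the names that were not found."""
    writer = csv.DictWriter(out, fieldnames=FIELDS)
    writer.writeheader()
    missing = []
    for batch in chunked(names, batch_size):
        for name in batch:
            data = lookup(name)
            if not data:
                missing.append(name)  # keep for manual review
                continue
            writer.writerow({"name": name, **{key: data.get(key) for key in FIELDS[1:]}})
        out.flush()  # checkpoint: push rows written so far out of Python's buffer
    return missing
```

For a file, open it with `open("compounds.csv", "w", newline="")` as the `csv` module recommends, then call `export_batch(compound_list, converter.name_to_identifiers, fh)`.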

**Best Practices:**
- ✅ **Process in batches** of 100-1000 for large databases
- ✅ **Log missing compounds** for manual review
- ✅ **Export to CSV** for Excel or cheminformatics tools
- ✅ **Include CAS numbers** when available for verific