Page generated from: QSet_Implementation_Guide.md



QSet Parser Implementation Guide

Version: 4.2
Target: Language-agnostic implementation guide
Spec Reference: QSet_Spec_v4_2.md

---

Overview

What is QSet?

QSet is a human-readable data format for configuration files, tables, and structured text. It is a subset of the full SET-File specification (v4.2), designed for simplicity and ease of implementation.

Key Design Principles

File Extension

QSet files use the .qset extension, as in test_data_complete.qset.

---

Architecture

Three-Tier Library Design


┌─────────────────────────────────────┐
│   QSet_Read (Basic Parser)          │  ← Start here
│   - Parse to indexed arrays         │
│   - No field interpretation         │
│   - Minimal overhead                │
└─────────────────────────────────────┘
                 ▼
┌─────────────────────────────────────┐
│   QSet_Read_Table (Table-Aware)     │  ← Add table support
│   - Recognizes {field|definitions}  │
│   - Returns associative arrays      │
│   - Column-based access             │
└─────────────────────────────────────┘
                 ▼
┌─────────────────────────────────────┐
│   QSet_CRUD (Full Operations)       │  ← Complete functionality
│   - Create, Update, Delete          │
│   - Build files from scratch        │
│   - Nested function architecture    │
└─────────────────────────────────────┘

Why This Architecture?

---

Core Parsing Algorithm

State Machine


State: OUTSIDE_GROUP
  - Ignore lines (comments)
  - Detect [GROUPNAME] → enter REGULAR_GROUP
  - Detect [{GROUPNAME}] → enter TEXT_GROUP

State: REGULAR_GROUP
  - Parse lines: split on |, handle escapes, trim fields
  - Detect empty line → exit to OUTSIDE_GROUP
  - Detect [EOG] → exit to OUTSIDE_GROUP
  - Detect new group → exit and enter new group

State: TEXT_GROUP
  - Capture lines as-is (no parsing)
  - Do NOT exit on empty lines
  - Detect [EOG] at line start → exit to OUTSIDE_GROUP
  - Detect new group → exit and enter new group

Pseudocode


groups = {}
current_group = null
in_text_group = false
line_num = 0

for each line in file:
    line_num++

    # Check for [EOG] FIRST (before other patterns)
    if line matches "^\[EOG\]$":
        current_group = null
        in_text_group = false
        continue

    # Check for text group start
    if line matches "^\[\{([A-Za-z0-9_-]+)\}\]$":
        group_name = captured_name
        groups[group_name] = {type: "text", content: "", line_start: line_num}
        current_group = group_name
        in_text_group = true
        continue

    # Check for regular group start
    if line matches "^\[([A-Za-z0-9_-]+)\]$":
        group_name = captured_name
        groups[group_name] = {type: "regular", rows: [], line_start: line_num}
        current_group = group_name
        in_text_group = false
        continue

    # Handle content based on state
    if current_group is null:
        # Outside groups - ignore (comment)
        continue

    if in_text_group:
        # Append the raw line to the text content, keeping lines
        # separated by "\n" so the original layout is preserved
        append line to groups[current_group].content
    else:
        # Regular group
        if line is empty:
            current_group = null
            continue

        fields = parse_line(line)
        append {line: line_num, data: fields} to groups[current_group].rows

return groups
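
The pseudocode above can be sketched in Python roughly as follows. This is a minimal illustration, not the reference implementation: the names parse_qset and naive_split are hypothetical, the input is assumed to be an iterable of lines, and the default field splitter deliberately ignores backslash escapes so the block stays self-contained. Text content is collected as a list and joined with "\n" at the end.

```python
import re
from typing import Callable, Dict, Iterable, List

EOG_RE = re.compile(r"^\[EOG\]$")
TEXT_GROUP_RE = re.compile(r"^\[\{([A-Za-z0-9_-]+)\}\]$")
REGULAR_GROUP_RE = re.compile(r"^\[([A-Za-z0-9_-]+)\]$")


def naive_split(line: str) -> List[str]:
    """Placeholder splitter (assumption): ignores backslash escapes."""
    return [field.strip() for field in line.split("|")]


def parse_qset(lines: Iterable[str],
               split_fields: Callable[[str], List[str]] = naive_split) -> Dict[str, dict]:
    groups: Dict[str, dict] = {}
    current = None          # name of the open group, or None when outside
    in_text_group = False

    for line_num, raw in enumerate(lines, start=1):
        line = raw.rstrip("\r\n")

        # [EOG] is tested first: the regular-group pattern would also match it.
        if EOG_RE.match(line):
            current, in_text_group = None, False
            continue

        text_match = TEXT_GROUP_RE.match(line)
        if text_match:
            current, in_text_group = text_match.group(1), True
            groups[current] = {"type": "text", "content": [], "line_start": line_num}
            continue

        regular_match = REGULAR_GROUP_RE.match(line)
        if regular_match:
            current, in_text_group = regular_match.group(1), False
            groups[current] = {"type": "regular", "rows": [], "line_start": line_num}
            continue

        if current is None:
            continue  # outside any group: treated as a comment

        if in_text_group:
            groups[current]["content"].append(line)  # captured as-is
        elif line == "":
            current = None  # an empty line closes a regular group
        else:
            groups[current]["rows"].append(
                {"line": line_num, "data": split_fields(line)})

    # Join the captured text lines into one string per text group.
    for group in groups.values():
        if group["type"] == "text":
            group["content"] = "\n".join(group["content"])
    return groups
```

Passing the field splitter as a parameter keeps the state machine independent of the escape rules, so an escape-aware splitter can be supplied without touching the loop.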

Line Parsing (Regular Groups)


function parse_line(line):
    fields = []
    current = ""
    escaped = false

    for each char in line:
        if escaped:
            if char == '|':
                current += '|'              # Escaped pipe
            else:
                current += '\\' + char      # Not a pipe, keep backslash
            escaped = false
        else if char == '\\':
            escaped = true
        else if char == '|':
            fields.append(trim(current))
            current = ""
        else:
            current += char

    # Handle trailing backslash
    if escaped:
        current += '\\'

    fields.append(trim(current))
    return fields
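
A direct Python rendering of the parse_line pseudocode above might look like this. It is a sketch that follows the same rules (only a backslash followed by a pipe is an escape; any other backslash is kept), and the assertions at the end are illustrative checks written for this example rather than entries from the provided test files.

```python
from typing import List


def parse_line(line: str) -> List[str]:
    """Split a regular-group line on unescaped pipes and trim each field."""
    fields: List[str] = []
    current: List[str] = []
    escaped = False

    for char in line:
        if escaped:
            # Only "\|" is an escape; any other "\x" keeps its backslash.
            current.append("|" if char == "|" else "\\" + char)
            escaped = False
        elif char == "\\":
            escaped = True
        elif char == "|":
            fields.append("".join(current).strip())
            current = []
        else:
            current.append(char)

    if escaped:
        current.append("\\")  # a trailing backslash is kept literally

    fields.append("".join(current).strip())
    return fields


# Illustrative checks, using raw strings so backslashes are literal.
assert parse_line("a||c") == ["a", "", "c"]
assert parse_line("a|b|") == ["a", "b", ""]
assert parse_line(r"a\|b|c") == ["a|b", "c"]
assert parse_line(r"C:\Windows | x") == [r"C:\Windows", "x"]
```

Collecting characters in a list and joining once per field avoids repeated string concatenation, which matters mainly for very long lines.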

---

Data Structures

Internal Representation

javascript
{
  groups: {
    "GROUPNAME": {
      type: "regular",           // or "text"
      line_start: 10,            // Line number where group starts
      line_end: 15,              // Line number where group ends (optional)

      // For regular groups:
      rows: [
        {
          line: 11,              // Original line number
          data: ["field1", "field2", "field3"]  // Parsed fields
        },
        ...
      ],

      // For text groups:
      content: "raw text content\nwith newlines"
    }
  }
}
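
In a typed language the same shape can be written down as small record types. The sketch below uses Python dataclasses; the class names Row and Group and the sample values are assumptions made for illustration, while the attribute names mirror the structure above.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Row:
    line: int          # original line number in the file
    data: List[str]    # parsed, trimmed fields


@dataclass
class Group:
    type: str                                      # "regular" or "text"
    line_start: int                                # line of the group header
    line_end: Optional[int] = None                 # optional, as noted above
    rows: List[Row] = field(default_factory=list)  # regular groups only
    content: str = ""                              # text groups only


# Hypothetical sample data.
groups: Dict[str, Group] = {
    "USERS": Group(
        type="regular",
        line_start=10,
        line_end=15,
        rows=[Row(line=11, data=["1", "alice", "alice@example.com"])],
    ),
}

# Keeping the line number on each row lets a caller map a row back to the file.
rows_by_line = {row.line: row for row in groups["USERS"].rows}
```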

Why Track Line Numbers?

---

API Design Patterns

Naming Conventions

Function Prefixes:

Return Values:

The Peg Concept

Many operations accept a "peg" parameter for flexible targeting:

python
# Peg as integer → row number
qset_crud_set_field(file, 'USERS', 1, 2, 'new@email.com')
#                                   ↑
#                               row number

# Peg as string → search first field
qset_crud_set_field(file, 'USERS', '2', 2, 'new@email.com')
#                                   ↑
#                              key to match
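
One way to implement the peg dispatch is to branch on the argument's type. The helper below is a hypothetical sketch (resolve_peg is not part of the documented API) and it assumes row numbers are 1-based, which the text above does not state explicitly.

```python
from typing import List, Optional, Union

Peg = Union[int, str]


def resolve_peg(rows: List[List[str]], peg: Peg) -> Optional[int]:
    """Return the 1-based row number the peg points at, or None if nothing matches."""
    if isinstance(peg, bool):
        # bool is a subclass of int in Python; reject it to avoid surprises.
        raise TypeError("peg must be an int or a str")
    if isinstance(peg, int):
        # Integer peg: interpreted as a row number (assumed 1-based).
        return peg if 1 <= peg <= len(rows) else None
    # String peg: compared against the first field of each row.
    for row_num, fields in enumerate(rows, start=1):
        if fields and fields[0] == peg:
            return row_num
    return None


rows = [
    ["2", "bob", "bob@example.com"],
    ["1", "alice", "alice@example.com"],
]
first_row = resolve_peg(rows, 1)     # the first row, whose key happens to be "2"
keyed_row = resolve_peg(rows, "1")   # the row whose first field is "1"
```

The sample rows are ordered so that the integer 1 and the string "1" resolve to different rows, which is the distinction the peg concept relies on.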

Benefits:

Nested Function Architecture (CRUD)

Layer 1: Find Functions (return line numbers)

_qset_find_line_by_row(qset, group, row_num) → line_number
_qset_find_line_by_key(file, group, key) → line_number (optimized)
_qset_find_line_by_field(qset, group, field, value) → line_number

Layer 2: Update Functions (operate on line numbers)

_qset_update_field_at_line(file, line_num, field_num, value)
_qset_update_line_data(file, line_num, new_data)
_qset_delete_line_at(file, line_num)
_qset_insert_line_at(file, line_num, data)

Layer 3: Public API (combines find + update)

qset_crud_set_field(file, group, peg, field_num, value) {
    line = find_line(peg)
    update_field_at_line(line, field_num, value)
}
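
The layering can be illustrated with a small in-memory sketch. Everything here is an assumption made for the example: the functions operate on a list of lines rather than a file, row and field numbers are taken as 1-based, fields are split naively on every pipe (escapes are ignored), and the function names only loosely follow the ones listed above.

```python
import re
from typing import Iterator, List, Optional, Tuple, Union

# Matches [NAME], [{NAME}] and [EOG]: any of these ends the current group.
MARKER_RE = re.compile(r"^\[\{?[A-Za-z0-9_-]+\}?\]$")


def _group_body(lines: List[str], group: str) -> Iterator[Tuple[int, str]]:
    """Yield (index, line) for each data line of a regular group."""
    inside = False
    for index, line in enumerate(lines):
        if not inside:
            inside = line == f"[{group}]"
        elif line == "" or MARKER_RE.match(line):
            return
        else:
            yield index, line


# Layer 1: find functions return an index into `lines`, or None.
def find_line_by_row(lines: List[str], group: str, row_num: int) -> Optional[int]:
    for count, (index, _) in enumerate(_group_body(lines, group), start=1):
        if count == row_num:
            return index
    return None


def find_line_by_key(lines: List[str], group: str, key: str) -> Optional[int]:
    for index, line in _group_body(lines, group):
        # Only the first field is inspected; the rest of the line is not parsed.
        if line.split("|", 1)[0].strip() == key:
            return index
    return None


# Layer 2: update functions operate on an index and know nothing about pegs.
def update_field_at_line(lines: List[str], index: int, field_num: int, value: str) -> None:
    fields = [f.strip() for f in lines[index].split("|")]
    if not 1 <= field_num <= len(fields):
        raise IndexError(f"field {field_num} out of range")
    fields[field_num - 1] = value
    lines[index] = "|".join(fields)


# Layer 3: the public function combines a find with an update.
def set_field(lines: List[str], group: str, peg: Union[int, str],
              field_num: int, value: str) -> bool:
    if isinstance(peg, int):
        index = find_line_by_row(lines, group, peg)
    else:
        index = find_line_by_key(lines, group, peg)
    if index is None:
        return False
    update_field_at_line(lines, index, field_num, value)
    return True
```

Because layer 2 only ever sees a line index, a new lookup strategy can be added in layer 1 without changing any update code.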

Benefits:

---

Implementation Phases

Phase 1: Basic Parser (QSet_Read)

Goal: Parse QSet files into simple data structures

Deliverables:

Test Coverage:

Estimated Effort: 4-8 hours

---

Phase 2: Table Support (QSet_Read_Table)

Goal: Add field definition awareness

Deliverables:

Test Coverage:

Estimated Effort: 2-4 hours
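
To make the idea of field definition awareness concrete, here is a hedged sketch of turning indexed rows into associative arrays. It assumes that a group's first row is a definition line of the form {id|name|email}, which the basic parser would return as ["{id", "name", "email}"]; the exact placement and syntax of definition lines is governed by the spec, and the name rows_to_records is hypothetical.

```python
from typing import Dict, List


def rows_to_records(rows: List[List[str]]) -> List[Dict[str, str]]:
    """Map data rows to dicts keyed by the names in a leading {a|b|c} row."""
    if not rows:
        return []
    header = rows[0]
    if not (header[0].startswith("{") and header[-1].endswith("}")):
        raise ValueError("group has no {field|definitions} row")

    # Strip the opening brace from the first name and the closing brace
    # from the last one (they are the same element for a one-field header).
    names = list(header)
    names[0] = names[0][1:].strip()
    names[-1] = names[-1][:-1].strip()

    # zip() stops at the shorter sequence, so extra fields are dropped and
    # missing fields are simply absent from the record.
    return [dict(zip(names, row)) for row in rows[1:]]


parsed_rows = [
    ["{id", "name", "email}"],
    ["1", "alice", "alice@example.com"],
    ["2", "bob", "bob@example.com"],
]
records = rows_to_records(parsed_rows)
```

With records in this form, a caller can use column names such as records[0]["email"] instead of positional indexes.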

---

Phase 3: CRUD Operations (QSet_CRUD)

Goal: Modify existing files and build from scratch

Core CRUD Deliverables:

Build from Scratch Deliverables:

Test Coverage:

Estimated Effort: 8-16 hours

---

Testing & Validation

Test Files Provided

    • test_data_complete.qset - Comprehensive test data
        - 28 groups covering all features and edge cases
    • test_data_expected.qset - Expected results (human-readable)
        - QSet format describing parser output
    • test_data_expected.json - Expected results (for JSON addicts)
        - Same data in JSON format

Test Strategy

Unit Tests:

Integration Tests:

Validation Tests:

Common Edge Cases

    • Empty fields - a||c should parse as ["a", "", "c"]
    • Trailing empty fields - a|b| should parse as ["a", "b", ""]
    • Escaped pipes - a\|b|c should parse as ["a|b", "c"]
    • Backslash before a non-pipe character - test\ |data keeps the backslash and still splits at the pipe, giving ["test\", "data"]
    • [EOG] detection - Only at the start of a line, never mid-line
    • Group name validation - Only alphanumeric characters, _ and -
    • Unicode content - Read and write files as UTF-8
    • Whitespace trimming - Trim each field after splitting; preserve whitespace inside a field

---

Language-Specific Notes

Python

python
# Use split() with escape handling
# Dict for groups
# List comprehensions for filtering

groups = {}
with open('file.qset', 'r', encoding='utf-8') as f:
    for line_num, line in enumerate(f, start=1):
        line = line.rstrip('\r\n')  # Drop the line terminator
        ...  # Process line

Libraries:

---

JavaScript/Node.js

javascript
// Use fs.readFileSync or fs.promises
// Object for groups
// Array methods (map, filter, find)

const fs = require('fs');
const content = fs.readFileSync('file.qset', 'utf8');
const groups = {};
const lines = content.split(/\r\n|\r|\n/);
lines.forEach((line, idx) => {
    const lineNum = idx + 1;
    // Process line
});

Considerations:

---

Java

java
// Use BufferedReader
// HashMap for groups
// ArrayList for rows

Map<String, Group> groups = new HashMap<>();
try (BufferedReader reader = new BufferedReader(
        new FileReader(filename, StandardCharsets.UTF_8))) {
    String line;
    int lineNum = 0;
    while ((line = reader.readLine()) != null) {
        lineNum++;
        // Process line
    }
}

---

C#

csharp
// Use File.ReadLines() or StreamReader
// Dictionary for groups
// List<T> for rows

var groups = new Dictionary<string, Group>();
int lineNum = 0;
foreach (string line in File.ReadLines(filename, Encoding.UTF8))
{
    lineNum++;
    // Process line
}

---

Go

go
// Use bufio.Scanner
// map for groups
// slices for rows

groups := make(map[string]*Group)
scanner := bufio.NewScanner(file)
lineNum := 0
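for scanner.Scan() {
    lineNum++
    line := scanner.Text()
    // Process line
}
if err := scanner.Err(); err != nil {
    // Handle read errors (for example, a line longer than the scanner buffer)
}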

---

Ruby

ruby
# Use File.foreach
# Hash for groups
# Arrays for rows

groups = {}
File.foreach(filename, encoding: 'utf-8').with_index(1) do |line, line_num|
    line = line.chomp  # Drop the line terminator
    # Process line
end

---

Performance Considerations

Parsing Performance

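Large Files:

Optimization: First Field Search

# Instead of parsing every line, compare only the first field.
# Note: this shortcut does not handle an escaped pipe inside the first field.
for line in lines:
    first_field = line.split('|', 1)[0].strip()
    if first_field == search_value:
        # Found it! Now parse the full line
        fields = parse_line(line)
        break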

Write Operations:

---

Common Pitfalls

❌ Don't Do This:

    • Checking [EOG] after the regular group pattern
        - [EOG] will match \[([A-Za-z0-9_-]+)\] and create a group named "EOG"
    • Treating every backslash as an escape character
        - C:\Windows becomes C:Windows
    • Not trimming fields
        - | value | should become "value", not " value "
    • Treating text groups like regular groups
        - Text groups have no delimiter processing
    • Forgetting empty-line group endings
        - Regular groups end at the first empty line; text groups do not

---

Reference Implementation

PHP Implementation Available:

Use as reference for:

---

Getting Help

Resources:

Community:

---

License

QSet Specification: CC BY 4.0
This Implementation Guide: CC BY 4.0

---

_End of QSet Implementation Guide v4.2_



Page last modified on January 05, 2026, at 03:55 PM