Page generated from: QSet_Implementation_Guide.md

QSet Parser Implementation Guide

Version: 4.2
Target: Language-agnostic implementation guide
Spec Reference: QSet_Spec_v4_2.md

---

Overview
Architecture
Core Parsing Algorithm
Data Structures
API Design Patterns
Implementation Phases
Testing & Validation
Language-Specific Notes

---

Overview

What is QSet?

QSet is a simplified, human-readable data format for configuration files, tables, and structured text. It is a subset of the full SET-File specification (v4.2), designed to be easy to implement.

Key Design Principles

Human-readable first - Files should be easy to read and edit
Minimal escaping - Only \| needs escaping in regular groups
Line-based parsing - Simple state machine, no complex grammar
Default delimiters - Pipe | for fields, no configuration needed

File Extension

Primary: .qset
Alternative: .set (when using Q-Set conventions)

---

Architecture

Three-Tier Library Design


┌─────────────────────────────────────┐
│   QSet_Read (Basic Parser)          │  ← Start here
│   - Parse to indexed arrays         │
│   - No field interpretation         │
│   - Minimal overhead                │
└─────────────────────────────────────┘
                 ▼
┌─────────────────────────────────────┐
│   QSet_Read_Table (Table-Aware)     │  ← Add table support
│   - Recognizes {field|definitions}  │
│   - Returns associative arrays      │
│   - Column-based access             │
└─────────────────────────────────────┘
                 ▼
┌─────────────────────────────────────┐
│   QSet_CRUD (Full Operations)       │  ← Complete functionality
│   - Create, Update, Delete          │
│   - Build files from scratch        │
│   - Nested function architecture    │
└─────────────────────────────────────┘

Why This Architecture?

Incremental complexity - Implement basic features first, add advanced later
User choice - Users include only what they need
Clear separation - Each library has focused responsibility
Reusable components - Higher libraries build on lower ones

---

Core Parsing Algorithm

State Machine


State: OUTSIDE_GROUP
  - Ignore lines (comments)
  - Detect [GROUPNAME] → enter REGULAR_GROUP
  - Detect [{GROUPNAME}] → enter TEXT_GROUP

State: REGULAR_GROUP
  - Parse lines: split on |, handle escapes, trim fields
  - Detect empty line → exit to OUTSIDE_GROUP
  - Detect [EOG] → exit to OUTSIDE_GROUP
  - Detect new group → exit and enter new group

State: TEXT_GROUP
  - Capture lines as-is (no parsing)
  - Do NOT exit on empty lines
  - Detect [EOG] at line start → exit to OUTSIDE_GROUP
  - Detect new group → exit and enter new group

Pseudocode


groups = {}
current_group = null
in_text_group = false
line_num = 0

for each line in file:
    line_num++

    # Check for [EOG] FIRST (before other patterns)
    if line matches "^\[EOG\]$":
        current_group = null
        in_text_group = false
        continue

    # Check for text group start
    if line matches "^\[\{([A-Za-z0-9_-]+)\}\]$":
        group_name = captured_name
        groups[group_name] = {type: "text", content: "", line_start: line_num}
        current_group = group_name
        in_text_group = true
        continue

    # Check for regular group start
    if line matches "^\[([A-Za-z0-9_-]+)\]$":
        group_name = captured_name
        groups[group_name] = {type: "regular", rows: [], line_start: line_num}
        current_group = group_name
        in_text_group = false
        continue

    # Handle content based on state
    if current_group is null:
        # Outside groups - ignore (comment)
        continue

    if in_text_group:
        # Append to text content
        append line to groups[current_group].content
    else:
        # Regular group
        if line is empty:
            current_group = null
            continue

        fields = parse_line(line)
        append {line: line_num, data: fields} to groups[current_group].rows

return groups
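The pseudocode above can be sketched in Python roughly as follows. This is a minimal sketch, not the reference implementation: the names `parse_qset` and `naive_split` are hypothetical, and `naive_split` is a deliberately simple placeholder that ignores the `\|` escape (it stands in for the escape-aware `parse_line` described in the next subsection).

```python
import re

EOG_RE = re.compile(r"^\[EOG\]$")
TEXT_START_RE = re.compile(r"^\[\{([A-Za-z0-9_-]+)\}\]$")
GROUP_START_RE = re.compile(r"^\[([A-Za-z0-9_-]+)\]$")


def naive_split(line):
    # Placeholder splitter (assumption): ignores the \| escape on purpose.
    return [field.strip() for field in line.split("|")]


def parse_qset(text, split=naive_split):
    groups = {}
    current = None      # name of the open group, or None when outside any group
    in_text = False

    for line_num, line in enumerate(text.splitlines(), start=1):
        if EOG_RE.match(line):          # tested before the group-start patterns
            current, in_text = None, False
            continue

        m = TEXT_START_RE.match(line)
        if m:
            current, in_text = m.group(1), True
            groups[current] = {"type": "text", "content": [], "line_start": line_num}
            continue

        m = GROUP_START_RE.match(line)
        if m:
            current, in_text = m.group(1), False
            groups[current] = {"type": "regular", "rows": [], "line_start": line_num}
            continue

        if current is None:
            continue                    # comment outside any group
        if in_text:
            groups[current]["content"].append(line)
        elif line == "":
            current = None              # blank line closes a regular group
        else:
            groups[current]["rows"].append({"line": line_num, "data": split(line)})

    # Text lines were collected in a list; join them once at the end.
    for group in groups.values():
        if group["type"] == "text":
            group["content"] = "\n".join(group["content"])
    return groups


SAMPLE = """\
This line is a comment.
[USERS]
1 | alice | alice@example.com
2 | bob | bob@example.com

[{NOTES}]
first line

third line
[EOG]
"""
parsed = parse_qset(SAMPLE)
```

Note that `str.splitlines()` also splits on a few less common separators (form feed, for example), so a stricter implementation may prefer to split on `\r\n`, `\r`, and `\n` only.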

Line Parsing (Regular Groups)


function parse_line(line):
    fields = []
    current = ""
    escaped = false

    for each char in line:
        if escaped:
            if char == '|':
                current += '|'              # Escaped pipe
            else:
                current += '\\' + char      # Not a pipe, keep backslash
            escaped = false
        else if char == '\\':
            escaped = true
        else if char == '|':
            fields.append(trim(current))
            current = ""
        else:
            current += char

    # Handle trailing backslash
    if escaped:
        current += '\\'

    fields.append(trim(current))
    return fields
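A direct Python transcription of the line parser might look like the sketch below. It follows the pseudocode step by step; collecting characters in a list and joining them is an implementation choice made here, not something the guide prescribes.

```python
def parse_line(line):
    """Split a regular-group line on unescaped pipes and trim each field."""
    fields = []
    current = []          # characters of the field being built
    escaped = False       # True when the previous character was a backslash

    for ch in line:
        if escaped:
            # Only \| is an escape; any other pair keeps its backslash.
            current.append("|" if ch == "|" else "\\" + ch)
            escaped = False
        elif ch == "\\":
            escaped = True
        elif ch == "|":
            fields.append("".join(current).strip())
            current = []
        else:
            current.append(ch)

    if escaped:           # trailing backslash at end of line
        current.append("\\")

    fields.append("".join(current).strip())
    return fields


examples = [parse_line("a||c"), parse_line("a|b|"), parse_line(r"a\|b|c")]
```

With these inputs, `examples` holds `["a", "", "c"]`, `["a", "b", ""]`, and `["a|b", "c"]`, matching the edge cases listed under Testing & Validation.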

---

Data Structures

Internal Representation

javascript
{
  groups: {
    "GROUPNAME": {
      type: "regular",           // or "text"
      line_start: 10,            // Line number where group starts
      line_end: 15,              // Line number where group ends (optional)

      // For regular groups:
      rows: [
        {
          line: 11,              // Original line number
          data: ["field1", "field2", "field3"]  // Parsed fields
        },
        ...
      ],

      // For text groups:
      content: "raw text content\nwith newlines"
    }
  }
}
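In a typed setting, the same shape could be expressed with small record types. The sketch below uses Python dataclasses; the class names `Row`, `Group`, and `QSet` are illustrative assumptions and do not come from the spec.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Row:
    line: int                 # original line number in the source file
    data: List[str]           # parsed, trimmed fields


@dataclass
class Group:
    type: str                                        # "regular" or "text"
    line_start: int
    line_end: Optional[int] = None                   # optional, may stay unset
    rows: List[Row] = field(default_factory=list)    # used by regular groups
    content: str = ""                                # used by text groups


@dataclass
class QSet:
    groups: Dict[str, Group] = field(default_factory=dict)


doc = QSet()
doc.groups["USERS"] = Group(type="regular", line_start=10, line_end=15)
doc.groups["USERS"].rows.append(Row(line=11, data=["field1", "field2", "field3"]))
```

Keeping `rows` and `content` on one type mirrors the single-object layout shown above; an implementation could just as well split regular and text groups into two types.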

Why Track Line Numbers?

Error reporting - Show users which line has problems
CRUD operations - Know exactly where to update/delete
Debugging - Trace parsed data back to source

---

API Design Patterns

Naming Conventions

Function Prefixes:

qset_load() - Parse file
qset_get_*() - Retrieve data
qset_find_*() - Search operations
qset_crud_*() - Modification operations
_qset_*() - Internal/private functions

Return Values:

null - Item not found
false - Operation failed / error
Data - Success

The Peg Concept

Many operations accept a "peg" parameter for flexible targeting:

python
# Peg as integer → row number
qset_crud_set_field(file, 'USERS', 1, 2, 'new@email.com')
#                                   ↑
#                               row number

# Peg as string → search first field
qset_crud_set_field(file, 'USERS', '2', 2, 'new@email.com')
#                                   ↑
#                              key to match

Benefits:

Single function handles multiple use cases
Intuitive: numbers = position, strings = search
Type checking determines behavior
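The type-based dispatch could be sketched as below. `resolve_peg` is a hypothetical helper, and two details are assumptions rather than facts from the spec: integer pegs are treated as 0-based indexes into the parsed rows, and string pegs are compared against the first field exactly.

```python
def resolve_peg(rows, peg):
    """Return the source line number the peg points at, or None if not found."""
    if isinstance(peg, bool):
        # bool is a subclass of int in Python; reject it explicitly.
        raise TypeError("peg must be an int or a str")
    if isinstance(peg, int):
        # Integer peg: position in the group (0-based here, an assumption).
        if 0 <= peg < len(rows):
            return rows[peg]["line"]
        return None
    if isinstance(peg, str):
        # String peg: search the first field of each row.
        for row in rows:
            if row["data"] and row["data"][0] == peg:
                return row["line"]
        return None
    raise TypeError("peg must be an int or a str")


rows = [
    {"line": 11, "data": ["2", "bob", "bob@example.com"]},
    {"line": 12, "data": ["1", "alice", "alice@example.com"]},
]
by_position = resolve_peg(rows, 1)    # second row in the group
by_key = resolve_peg(rows, "2")       # row whose first field is "2"
```

The example data is ordered so that the integer `1` and the string `"2"` select different rows, which is the distinction the peg concept relies on.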

Nested Function Architecture (CRUD)

Layer 1: Find Functions (return line numbers)


_qset_find_line_by_row(qset, group, row_num) → line_number
_qset_find_line_by_key(file, group, key) → line_number (optimized)
_qset_find_line_by_field(qset, group, field, value) → line_number

Layer 2: Update Functions (operate on line numbers)


_qset_update_field_at_line(file, line_num, field_num, value)
_qset_update_line_data(file, line_num, new_data)
_qset_delete_line_at(file, line_num)
_qset_insert_line_at(file, line_num, data)

Layer 3: Public API (combines find + update)


qset_crud_set_field(file, group, peg, field_num, value) {
    line = find_line(peg)
    update_field_at_line(line, field_num, value)
}

Benefits:

Reusable - Find functions used by all operations
Testable - Each layer tested independently
Clean - Public API is simple, complexity hidden
Efficient - Find once, update once
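To make the layering concrete, here is a small in-memory sketch that works on a list of lines instead of a file. The function names are shortened, hypothetical stand-ins for the `_qset_*` helpers above; the sketch ignores the `\|` escape, assumes 0-based field numbers, and always rewrites the line with ` | ` separators.

```python
import re

HEADER_RE = re.compile(r"\[(\{[A-Za-z0-9_-]+\}|[A-Za-z0-9_-]+)\]")


def find_line_by_key(lines, group, key):
    """Layer 1: 1-based line number of the row whose first field equals key."""
    inside = False
    for num, line in enumerate(lines, start=1):
        if not inside:
            inside = (line == f"[{group}]")
            continue
        if line == "" or HEADER_RE.fullmatch(line):
            return None               # group ended without a match
        if line.split("|", 1)[0].strip() == key:
            return num
    return None


def update_field_at_line(lines, line_num, field_num, value):
    """Layer 2: replace one field on a known line; False if out of range."""
    fields = [f.strip() for f in lines[line_num - 1].split("|")]
    if not 0 <= field_num < len(fields):
        return False
    fields[field_num] = value
    lines[line_num - 1] = " | ".join(fields)
    return True


def set_field(lines, group, key, field_num, value):
    """Layer 3: public operation; find once, then update once."""
    line_num = find_line_by_key(lines, group, key)
    if line_num is None:
        return None                   # not found
    return update_field_at_line(lines, line_num, field_num, value)


lines = [
    "[USERS]",
    "1 | alice | old@example.com",
    "2 | bob | bob@example.com",
    "",
    "[OTHER]",
    "2 | x | y",
]
result = set_field(lines, "USERS", "2", 2, "new@email.com")
```

The return values follow the convention from Naming Conventions: `None` when the row is not found, `False` when the update fails, and a truthy value on success.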

---

Implementation Phases

Phase 1: Basic Parser (QSet_Read)

Goal: Parse Q-Set files into simple data structures

Deliverables:

qset_load(filename) - Parse file
qset_parse_string(content) - Parse from string
qset_get_row(qset, group, row_num) - Get row
qset_get_group(qset, group) - Get entire group
qset_get_text(qset, group) - Get text content
qset_find(qset, group, value) - Search for value
qset_list_groups(qset) - List all groups

Test Coverage:

Regular groups with pipes
Text groups with raw content
Escape sequences (\|)
Empty lines between groups
Comments outside groups

Estimated Effort: 4-8 hours
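Once the parser produces the structure from Data Structures, the read accessors are thin lookups. The sketch below uses the function names from the deliverables list, but the bodies are assumptions: row numbers are treated as 0-based, and `qset_find` is taken to return the first row that contains the value in any field.

```python
def qset_list_groups(qset):
    return list(qset["groups"])


def qset_get_group(qset, group):
    return qset["groups"].get(group)          # None when the group is missing


def qset_get_row(qset, group, row_num):
    entry = qset["groups"].get(group)
    if entry is None or entry["type"] != "regular":
        return None
    rows = entry["rows"]
    return rows[row_num]["data"] if 0 <= row_num < len(rows) else None


def qset_get_text(qset, group):
    entry = qset["groups"].get(group)
    if entry is None or entry["type"] != "text":
        return None
    return entry["content"]


def qset_find(qset, group, value):
    """First row with value in any field (assumed semantics), else None."""
    entry = qset["groups"].get(group)
    if entry is None or entry["type"] != "regular":
        return None
    for row in entry["rows"]:
        if value in row["data"]:
            return row["data"]
    return None


# Hand-built structure standing in for parser output.
demo = {
    "groups": {
        "USERS": {
            "type": "regular",
            "line_start": 1,
            "rows": [
                {"line": 2, "data": ["1", "alice"]},
                {"line": 3, "data": ["2", "bob"]},
            ],
        },
        "NOTES": {"type": "text", "line_start": 5, "content": "hello\nworld"},
    }
}
```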

---

Phase 2: Table Support (QSet_Read_Table)

Goal: Add field definition awareness

Deliverables:

qset_table_has_fields(qset, group) - Check for field def
qset_table_get_fields(qset, group) - Get field names
qset_table_get_row(qset, group, row) - Associative array
qset_table_get_cell(qset, group, row, field) - Specific cell
qset_table_get_column(qset, group, field) - Column values
qset_table_find_by_field(qset, group, field, value) - Search

Test Coverage:

Groups with {field|definitions}
Associative array returns
Column extraction
Field-based searching

Estimated Effort: 2-4 hours
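A table layer can be built on top of the basic rows. The sketch below assumes that a field definition is the first row of a group and is written as `{id|name|email}`, so the basic parser has already split it into `["{id", "name", "email}"]`; the exact placement and syntax should be checked against the spec. The helper names are hypothetical and take the row list directly.

```python
def table_get_fields(rows):
    """Return field names from a leading {a|b|c} row, or None if absent."""
    if not rows:
        return None
    first = rows[0]["data"]
    if not (first and first[0].startswith("{") and first[-1].endswith("}")):
        return None
    names = list(first)
    names[0] = names[0][1:]           # drop the opening brace
    names[-1] = names[-1][:-1]        # drop the closing brace
    return [name.strip() for name in names]


def table_get_rows(rows):
    """Return data rows as dicts keyed by field name, or None without fields."""
    fields = table_get_fields(rows)
    if fields is None:
        return None
    # zip() stops at the shorter side, so surplus fields in a row are dropped.
    return [dict(zip(fields, row["data"])) for row in rows[1:]]


def table_get_column(rows, field_name):
    records = table_get_rows(rows)
    if records is None:
        return None
    return [record.get(field_name) for record in records]


rows = [
    {"line": 2, "data": ["{id", "name", "email}"]},
    {"line": 3, "data": ["1", "alice", "alice@example.com"]},
    {"line": 4, "data": ["2", "bob", "bob@example.com"]},
]
records = table_get_rows(rows)
```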

---

Phase 3: CRUD Operations (QSet_CRUD)

Goal: Modify existing files and build from scratch

Core CRUD Deliverables:

qset_crud_set_field(file, group, peg, field, value) - Update field
qset_crud_update_row(file, group, peg, data) - Update row
qset_crud_delete_row(file, group, peg) - Delete row
qset_crud_add_row(file, group, data) - Add row
qset_crud_get_fields(file, group) - Get field defs
qset_crud_set_fields(file, group, fields) - Set field defs
qset_crud_delete_fields(file, group) - Remove field defs

Build from Scratch Deliverables:

qset_crud_create_file(file) - New empty file
qset_crud_add_group(file, name, type) - Add group
qset_crud_set_text(file, group, content) - Set text content
qset_crud_add_comment(file, comment, peg1, peg2) - Add comments

Test Coverage:

Field updates (by row and by key)
Row additions and deletions
Field definition management
Building complete files from scratch
Comment positioning (header, footer, attached, floating)

Estimated Effort: 8-16 hours
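Building files from scratch needs the inverse of parsing: escaping pipes when rows are written. The helpers below are a minimal sketch with hypothetical names; joining fields with ` | ` is a stylistic assumption, since the parser trims whitespace around fields either way.

```python
def format_row(fields):
    """Serialize fields into one regular-group line, escaping literal pipes."""
    return " | ".join(str(f).replace("|", "\\|") for f in fields)


def build_regular_group(name, rows):
    """Return the lines of a regular group, closed by a blank line."""
    lines = [f"[{name}]"]
    lines.extend(format_row(row) for row in rows)
    lines.append("")                  # blank line ends a regular group
    return lines


def build_text_group(name, content):
    """Return the lines of a text group, closed by [EOG]."""
    return [f"[{{{name}}}]", *content.split("\n"), "[EOG]"]


document = (
    build_regular_group("USERS", [["1", "alice"], ["2", "a|b"]])
    + build_text_group("NOTES", "first\n\nthird")
)
```

Two limits follow from the parsing rules rather than from this sketch: leading and trailing whitespace in a field does not survive a round trip, and a field containing a backslash directly before a pipe cannot be represented unambiguously.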

---

Testing & Validation

Test Files Provided

test_data_complete.qset - Comprehensive test data

- 28 groups covering all features and edge cases
- Regular groups, text groups, empty groups
- Escaped content, Unicode, special characters

test_data_expected.qset - Expected results (human-readable)

- QSet format describing parser output
- Field-by-field validation data

test_data_expected.json - Expected results (for JSON addicts)

- Same data in JSON format
- Easier for automated testing

Test Strategy

Unit Tests:

Test each function independently
Cover edge cases (empty fields, escapes, etc.)
Verify line number tracking

Integration Tests:

Parse complete test file
Verify all groups detected
Check data integrity

Validation Tests:

Compare parser output to expected results
Verify field counts, row counts
Check escaped content parsing

Common Edge Cases

Empty fields - a||c should parse as ["a", "", "c"]
Trailing empty fields - a|b| should be ["a", "b", ""]
Escaped pipes - a\|b|c should be ["a|b", "c"]
Backslash not before a pipe - test\ |data keeps the backslash and still splits: ["test\", "data"]
[EOG] detection - Only at line start, not mid-line
Group name validation - Only alphanumeric, _, -
Unicode content - Properly handle UTF-8
Whitespace trimming - After split, preserve internal
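The `[EOG]` and group-name cases can be pinned down with a small line classifier and a table of expected results. `classify_line` is a hypothetical helper; it uses the name pattern from the pseudocode and treats `[EOG]` as a marker only when it is the entire line.

```python
import re

TEXT_START_RE = re.compile(r"\[\{([A-Za-z0-9_-]+)\}\]")
GROUP_START_RE = re.compile(r"\[([A-Za-z0-9_-]+)\]")


def classify_line(line):
    """Return (kind, name); kind is 'eog', 'text_start', 'group_start' or 'content'."""
    if line == "[EOG]":               # checked first so it never becomes a group
        return ("eog", None)
    m = TEXT_START_RE.fullmatch(line)
    if m:
        return ("text_start", m.group(1))
    m = GROUP_START_RE.fullmatch(line)
    if m:
        return ("group_start", m.group(1))
    return ("content", None)


CASES = {
    "[EOG]": ("eog", None),
    "see [EOG] here": ("content", None),      # mid-line marker is plain content
    "[USERS_2-a]": ("group_start", "USERS_2-a"),
    "[{README}]": ("text_start", "README"),
    "[my group]": ("content", None),          # space is not allowed in a name
    "[]": ("content", None),                  # empty name is not a group
}
results = {line: classify_line(line) for line in CASES}
```

Using `fullmatch` avoids a Python-specific trap: with `match` and a trailing `$`, a line that still carries its newline would also match.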

---

Language-Specific Notes

Python

python
# Use split() with escape handling
# Dict for groups
# List comprehensions for filtering

groups = {}
with open('file.qset', 'r', encoding='utf-8') as f:
    for line_num, line in enumerate(f, start=1):
        line = line.rstrip('\r\n')  # Drop the line terminator before matching
        # Process line

Libraries:

re for regex matching
json for JSON output (if needed)

---

JavaScript/Node.js

javascript
// Use fs.readFileSync or fs.promises
// Object for groups
// Array methods (map, filter, find)

const groups = {};
const lines = content.split(/\r\n|\r|\n/);
lines.forEach((line, idx) => {
    const lineNum = idx + 1;
    // Process line
});

Considerations:

Handle both LF and CRLF line endings
Use strict mode for better error checking

---

Java

java
// Use BufferedReader
// HashMap for groups
// ArrayList for rows

Map<String, Group> groups = new HashMap<>();
try (BufferedReader reader = new BufferedReader(
        new FileReader(filename, StandardCharsets.UTF_8))) {
    String line;
    int lineNum = 0;
    while ((line = reader.readLine()) != null) {
        lineNum++;
        // Process line
    }
}

---

C#

csharp
// Use File.ReadLines() or StreamReader
// Dictionary for groups
// List<T> for rows

var groups = new Dictionary<string, Group>();
int lineNum = 0;
foreach (string line in File.ReadLines(filename, Encoding.UTF8))
{
    lineNum++;
    // Process line
}

---

Go

go
// Use bufio.Scanner
// map for groups
// slices for rows

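// file is assumed to be an already opened *os.File (or any io.Reader)
groups := make(map[string]*Group)
scanner := bufio.NewScanner(file)
lineNum := 0
for scanner.Scan() {
    lineNum++
    line := scanner.Text()
    // Process line
}
if err := scanner.Err(); err != nil {
    // Handle read errors, including lines longer than the scanner's default 64 KiB limit
}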

---

Ruby

ruby
# Use File.foreach
# Hash for groups
# Arrays for rows

groups = {}
File.foreach(filename, encoding: 'utf-8').with_index(1) do |line, line_num|
    line = line.chomp  # Drop the line terminator before matching
    # Process line
end

---

Performance Considerations

Parsing Performance

Large Files:

Use streaming/line-by-line reading (don't load entire file)
Build indexes lazily (only when needed)
Consider memory vs speed tradeoffs

Optimization: First Field Search


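# Instead of parsing every line, compare only the first field.
# Keys that contain an escaped pipe (\|) still need the full parser.
for line in lines:
    # partition() returns the whole line when it contains no pipe
    first_field = line.partition('|')[0].strip()
    if first_field == search_value:
        fields = parse_line(line)  # Found it: now parse the full line
        break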

Write Operations:

Read → Modify → Write pattern
Use temp file + atomic rename
Consider file locking for concurrent access
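The temp file plus atomic rename pattern could look like this in Python. `write_atomic` is a hypothetical helper; it relies on `os.replace`, which replaces the target in one step when the temporary file is on the same filesystem. File locking is not covered here.

```python
import os
import tempfile


def write_atomic(path, lines):
    """Write lines to path so readers see either the old or the new file."""
    directory = os.path.dirname(os.path.abspath(path))
    # Create the temp file next to the target so the rename stays on one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8", newline="\n") as tmp:
            tmp.write("\n".join(lines) + "\n")
            tmp.flush()
            os.fsync(tmp.fileno())    # push the data to disk before renaming
        os.replace(tmp_path, path)
    except BaseException:
        # Clean up the temp file if anything failed before the rename.
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise


with tempfile.TemporaryDirectory() as workdir:
    target = os.path.join(workdir, "demo.qset")
    write_atomic(target, ["[USERS]", "1 | alice"])
    with open(target, encoding="utf-8") as handle:
        written = handle.read()
    leftovers = os.listdir(workdir)
```

One caveat: `mkstemp` creates the file with restrictive permissions, so an implementation that must preserve the original file mode needs to copy it before the rename.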

---

Common Pitfalls

❌ Don't Do This:

Checking [EOG] after regular group pattern

- [EOG] will match \[([A-Za-z0-9_-]+)\] and create a group named "EOG"

Fix: Check [EOG] FIRST

Escaping all backslashes

- C:\Windows becomes C:Windows

Fix: Treat a backslash as an escape only when it comes directly before a pipe

Not trimming fields

- | value | should become "value", not " value "

Fix: Trim after split

Treating text groups like regular groups

- Text groups have no delimiter processing

Fix: Check group type before processing

Forgetting empty line endings

- Regular groups end on empty lines
- Text groups do NOT end on empty lines

Fix: State machine must track group type

---

Reference Implementation

PHP Implementation Available:

qset_read.php - Basic parser (~200 lines)
qset_read_table.php - Table support (~300 lines)
qset_crud.php - Full CRUD (~900 lines)

Use as reference for:

Algorithm implementation
Edge case handling
Function signatures
Test strategies

---

Getting Help

Resources:

QSet Specification: QSet_Spec_v4_2.md
Test Files: test_data_complete.qset
Expected Results: test_data_expected.qset / .json
PHP Reference: qset_read.php, qset_read_table.php, qset_crud.php

Community:

Website: https://setfiles.org
Issues: Report problems or ask questions
Contributions: Share your implementation!

---

License

QSet Specification: CC BY 4.0
This Implementation Guide: CC BY 4.0

---

_End of QSet Implementation Guide v4.2_

Page last modified on January 05, 2026, at 03:55 PM
