KryptoStack
README

KryptoStack is licensed under the GNU General Public License v3.0.
See the LICENSE file for more details.

Project Goal

A minimalist stack-based programming language designed as a flexible
framework for implementing and exploring cryptologic algorithms.
Its syntax and semantics are inspired by PostScript and Forth, offering
a concise and expressive environment for learning and experimentation.

Manifesto

KryptoStack is an open-source project.

The following interests could be addressed by the project:

  • A minimalist implementation of a stack-oriented programming language
  • The implementation of mathematical algorithms with a focus on cryptography
  • The execution of pre-built cryptographic algorithms
  • The coding of the interaction between pre-built algorithms

The project and the code are written in English.
For mathematical operators, whose creation is a primary goal of the project,
multilingualism is enabled.

Readability and clarity of the program code take precedence over optimization
in the implementation.

All artifacts can be built with a minimal toolset. Currently, these are
the GNU C++ compiler, make, flex, Bash, Boost and git. The project is hosted
on GitLab, and the environment's features like CI/CD are used. However, a
build of all artifacts and tests must always be possible without GitLab.

Documentation is preferably embedded in the code and should be cleanly
extractable with Doxygen. All documentation is included in the git repository.

The main branch of the code must always allow a build of the artifacts
at the turn of the day, and these must pass the automated tests.

The code and documentation are refactored continuously.

There is an extensive catalog of automated tests. When making design
decisions, the testability of a possible solution is given high priority.

Quick

Quick Build

Install, if not already on your system, the GNU C++ compiler, make, flex,
the quadmath library, the Boost library and git. For the documentation
open the GitLab Pages of the project. There you can find the latest
project and user information.

To build the artifacts and run all tests:

make e2e

Optionally install Doxygen and LaTeX to generate the HTML documentation
in ./public/ with:

make public

Command Line Interfaces

Quick Runs

To run an example:

make ks; make parser; ./kb Examples/Factorial.ks

To run all tests:

make e2e

To run all examples:

make examples

Use it as a calculator:

kc 25346 95095 gcd
kc 1 2 1 33 { mul } for
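
As a cross-check of the two calculator lines, the following C++ sketch computes the same values. It assumes that for follows the PostScript convention (initial, increment, limit, procedure), so that the second line multiplies an initial 1 by 2, 3, ..., 33. All function names are hypothetical and not part of KryptoStack.

```cpp
#include <cassert>
#include <numeric>
#include <string>

// Hypothetical helper: product 1 * 2 * ... * p_n in a 128-bit integer
// (GNU extension __int128).
__int128 factorial(int p_n) {
  assert(p_n >= 0);
  __int128 result = 1;
  for (int i = 2; i <= p_n; ++i) {
    result *= i;
  }
  return result;
}

// Hypothetical helper: decimal text of a non-negative 128-bit integer,
// because the standard streams do not print __int128 directly.
std::string to_decimal(__int128 p_value) {
  assert(p_value >= 0);
  if (p_value == 0) {
    return "0";
  }
  std::string digits;
  while (p_value > 0) {
    const int digit = static_cast<int>(p_value % 10);
    digits.insert(digits.begin(), static_cast<char>('0' + digit));
    p_value /= 10;
  }
  return digits;
}

// Hypothetical helper mirroring "25346 95095 gcd".
long gcd_example() {
  return std::gcd(25346L, 95095L);
}
```

Under the stated assumption, the second calculator line corresponds to to_decimal(factorial(33)). This value no longer fits into 64 bits, which is where a 128-bit integer type becomes useful.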

Technologies

This project leverages the following core technologies:

  • GNU C++ 20: utilizing modern C++
  • git: for version control
  • GitLab with features: issues, pipelines, pages, releases, etc.
  • make: automating the build process for the command-line and for CI/CD
  • flex: generating the lexical analyzer
  • Bash: scripting for automating tasks
  • Doxygen: generating documentation for both the project's code and the
    custom KryptoStack language
  • Markdown: a lightweight format for documentation.
  • quadmath: supporting 128-bit integers and 128-bit floating point numbers
  • LaTeX: enhancing the visual quality of Doxygen-generated documentation
  • Boost C++ 1.83: for various extensions, e.g. asserts, unit testing

Optional tools:

  • cppcheck and clang-tidy: static analysis for code quality and
    bug prevention
  • KDevelop and/or Visual Studio Code: IDEs for efficient development
  • cloc: to get simple code metrics
  • lcov and gcov: to get and visualize test coverage data

Tools from the pipeline:

  • shellcheck: a Unix shell lint integrated in the GitLab pipeline
  • markdownlint-cli2: a markdown lint integrated in the GitLab pipeline
  • GitLab SAST: a GitLab-maintained security scanner for the pipeline

Glossary

Build Type

There are three build types for different software processes
and environments: DEVELOP, PRODUCTION and PROFILE.

Core Code

If a KryptoStack operator is coded directly in C++, it is referred to as core code.

ks

The second interpreter pass reads and interprets its KSN-Format input.

KS-Format

The file format for the stack-based programming language KryptoStack.
Each line of code can contain commands, which are short, case-sensitive
keywords, and operands, which are pushed onto a data stack and
manipulated by operators. The language supports various data types,
including integers, real numbers, booleans, strings, arrays, and
dictionaries, and follows a reverse Polish notation (RPN) syntax, where
operators follow their operands.
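
To illustrate the reverse Polish notation described above, here is a minimal C++ sketch of an integer-only evaluator. It is not the KryptoStack interpreter. The operator names add and sub are assumed from PostScript, and the function name eval_rpn is hypothetical.

```cpp
#include <cassert>
#include <sstream>
#include <stack>
#include <string>

// Evaluates a whitespace-separated RPN expression over integers.
// Only "add", "sub" and "mul" are handled in this illustration.
long eval_rpn(const std::string& p_source) {
  std::stack<long> operands;
  std::istringstream tokens(p_source);
  std::string token;
  while (tokens >> token) {
    if (token == "add" || token == "sub" || token == "mul") {
      assert(operands.size() >= 2);  // operators follow their operands
      const long rhs = operands.top();
      operands.pop();
      const long lhs = operands.top();
      operands.pop();
      if (token == "add") {
        operands.push(lhs + rhs);
      } else if (token == "sub") {
        operands.push(lhs - rhs);
      } else {
        operands.push(lhs * rhs);
      }
    } else {
      operands.push(std::stol(token));  // operand: pushed onto the data stack
    }
  }
  assert(operands.size() == 1);
  return operands.top();
}
```

For example, eval_rpn("3 4 add 5 mul") first replaces 3 and 4 by their sum and then multiplies the result by 5.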

KSN-Format

The exchange format between the first interpreter pass called "parser"
and the second interpreter pass called "ks".

Operator

A subspecies of semantic objects, that operates on other semantic
objects, the stacks or the interpreter itself. Some of the operators are
procedures and the others are core code.

Object Type Code or OTCode

A one-character code for each SO class. The virtual ot() member function
of all SO classes returns this code and allows explicit type-based decisions
besides C++ language polymorphism.
See: object types
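
The idea of a one-character type code can be sketched in C++ as follows. The classes are reduced stand-ins that only borrow the names SO, SOI and SOB. The helper type_signature is hypothetical and merely shows one possible explicit type-based decision.

```cpp
#include <memory>
#include <string>
#include <vector>

// Reduced stand-in for the SO hierarchy; the real classes have more members.
class SO {
public:
  virtual ~SO() = default;
  virtual char ot() const = 0;  // one-character object type code
};

class SOI : public SO {
public:
  char ot() const override { return 'I'; }
};

class SOB : public SO {
public:
  char ot() const override { return 'B'; }
};

// Hypothetical helper: collects the codes of all objects, for example to
// compare them with an expected operand signature.
std::string type_signature(const std::vector<std::shared_ptr<SO>>& p_objects) {
  std::string signature;
  for (const auto& object : p_objects) {
    signature += object->ot();
  }
  return signature;
}
```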

parser

The first interpreter pass checks the syntax and converts the program
into KSN-Format.

Procedure Level

The procedure level counts the nesting level of curly braces in the
program code.
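
A minimal sketch of such a counter, assuming that braces inside strings need no special treatment (the real parser may handle this differently). The function name procedure_level is hypothetical.

```cpp
#include <string>

// Returns the procedure level reached after reading p_source:
// +1 for every '{' and -1 for every '}'.
int procedure_level(const std::string& p_source) {
  int level = 0;
  for (const char c : p_source) {
    if (c == '{') {
      ++level;
    } else if (c == '}') {
      --level;
    }
  }
  return level;
}
```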

Property

For the classes Interpreter and Context there are attributes, together with
Setters and Getters, which change the behaviour. These properties can be set
with KryptoStack language operators.

Semantic Object or SO

Forth calls them words.

Task Tag

We use IDEA:, TODO: and FIXME: as Task Tags

Verbose Mode

All tools support a -v option to generate more run time information.

Vocabulary

A subdirectory that contains ks files which, when executed, create one
or more functionally related dictionaries.

Parser Step Details

Reads the standard input stream, runs checks and normalizes it into the
KSN-Format on standard output.

Ignores the remainder of lines starting with %.

Halts execution upon error detection.

Normalizes I tokens by stripping leading plus signs.

Removes the opening and closing parentheses from the string.

Removes all single backslashes from the string, preserving double
backslashes (\\) and newline sequences (\n). These are the only
escape sequences recognized in the KSN format.
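
The two normalization rules above can be sketched in C++ as follows. This is one reading of the description, not the flex-based parser itself. The function names are hypothetical, and the sketch strips only one leading plus sign.

```cpp
#include <cstddef>
#include <string>

// Strips one leading '+' from an integer token, e.g. "+42" -> "42".
std::string normalize_integer(const std::string& p_token) {
  if (!p_token.empty() && p_token.front() == '+') {
    return p_token.substr(1);
  }
  return p_token;
}

// Removes the enclosing parentheses of a string token and drops single
// backslashes, keeping the two-character sequences \\ and \n untouched.
std::string normalize_string(const std::string& p_token) {
  std::string body = p_token;
  if (body.size() >= 2 && body.front() == '(' && body.back() == ')') {
    body = body.substr(1, body.size() - 2);
  }
  std::string result;
  for (std::size_t i = 0; i < body.size(); ++i) {
    if (body[i] != '\\') {
      result += body[i];
      continue;
    }
    const bool has_next = i + 1 < body.size();
    if (has_next && (body[i + 1] == '\\' || body[i + 1] == 'n')) {
      result += body[i];
      result += body[i + 1];
      ++i;  // the escape sequence is copied as a pair
    }
    // otherwise the single backslash is dropped and the next character
    // is handled by the following loop iteration
  }
  return result;
}
```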

The -v option activates verbose mode.

The -h option prints a command line help.

KSN-Format Details

This is a simple line-oriented format.
Each line begins with a KSNCode followed by a colon (:).
The code identifies the data type of the subsequent value.

The different KSNCodes and data types are:

  • I: A signed integer
  • R: A floating-point number
  • B: A boolean literal
  • S: An unquoted string with just two escape sequences \\ and \n
  • N: A literal name token consisting of any characters but whitespaces
  • X: An executable name token consisting of any characters but whitespaces
  • #: The value is a comment
  • E: An error message
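
Reading one line of this format could look like the following C++ sketch. It assumes that the value starts directly after the colon. The struct KsnLine and the function parse_ksn_line are hypothetical names.

```cpp
#include <optional>
#include <string>

// One decoded KSN line: the KSNCode and the text after the colon.
struct KsnLine {
  char code;
  std::string value;
};

// Splits "I:42" into {'I', "42"}. Returns std::nullopt if the line does
// not start with one of the documented codes followed by a colon.
std::optional<KsnLine> parse_ksn_line(const std::string& p_line) {
  static const std::string known_codes = "IRBSNX#E";
  if (p_line.size() < 2 || p_line[1] != ':') {
    return std::nullopt;
  }
  if (known_codes.find(p_line[0]) == std::string::npos) {
    return std::nullopt;
  }
  return KsnLine{p_line[0], p_line.substr(2)};
}
```

Only the first colon separates code and value, so a string value may itself contain colons.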

Object Types

This is a list of SO classes and their OTCode.

All objects are either executable or non-executable, aka literal.
This executable status becomes manifest with different OTCodes for
name objects and array objects.

Simple Objects

A duplicate of an object of these types duplicates the value of the object.

SOL - 0
The null object is used as placeholder in arrays.

SOB - B
A boolean value.

SOI - I
A 128-bit integer.

SOM - M
A mark object.

SON - N or n
N ... for non-executable name objects
n ... for executable name objects

SOO - O
A regular registered operator.

SOo - o
An unregistered operator.

SOR - R
A real number.

Composite Objects

A duplicate of an object of these types shares its value with the
original object.

SOA - A or a
A ... for a non-executable array
a ... for an executable array aka procedure

SOD - D
A dictionary is a list of key-value pairs of SO's.

SOS - S
A string.

SOK - K
A stack of SO's.
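
The difference between the two duplication behaviours can be sketched in C++. The types SimpleInt and SharedString are hypothetical stand-ins, and using std::shared_ptr for the shared value is an assumption about how the sharing could be realized.

```cpp
#include <memory>
#include <string>

// Simple object: a duplicate carries its own copy of the value.
struct SimpleInt {
  long value;
};

// Composite object: duplicates share one value through a shared pointer.
struct SharedString {
  std::shared_ptr<std::string> value;
};

// Changing the duplicate of a simple object leaves the original untouched.
bool simple_duplicate_is_independent() {
  SimpleInt original{1};
  SimpleInt copy = original;
  copy.value = 2;
  return original.value == 1;
}

// Changing the duplicate of a composite object is visible in the original.
bool composite_duplicate_is_shared() {
  SharedString original{std::make_shared<std::string>("abc")};
  SharedString copy = original;
  copy.value->append("def");
  return *original.value == "abcdef";
}
```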

Files and Directories

File Types

Extension          Description
ks                 KryptoStack language source
cpp, h             C++ source
md                 Markdown documentation
l, c               flex source and generated code
yml                YAML data
json               JSON data and configuration
sh, inc            Bash script
none               executables and some Bash scripts
tpl                Bash template
cloc               CLOC configuration
kdev4              KDevelop configuration
gcno               profiling structure information
gcda               profiling run data
html, css          HTML source
info               lcov data
out                end-to-end test reference output

Extra              Description
.gitignore         git ignore
LICENSE            License ASCII text
Doxyfile           Doxygen configuration
compile_flags.txt  clang configuration

Directories of the git Repository

./Coverage
Test coverage data.

./Examples
Example programs.

./.git
That's the git Repository.

./.kdev4
Is a living proposal for a KDevelop configuration.

./public
Location for GitLab Pages web server files.
The Doxygen-generated HTML documentation can be found in ./public/doxygen/

./Suite*
Each 'Suite' directory contains a collection of test cases.

./tmp
Exists for temporary files e.g. for testing purposes.

./Tools
Contains TOOLS for the development, e.g. git hooks.

./vocabularies
Is only a container for its subdirectories.
These subdirectories are called vocabularies and the names of the
subdirectories are used to identify the vocabularies.

./.vscode
Is a living proposal for a Visual Studio Code configuration.

Examples

See EXAMPLES

Programming Language

See LANGUAGE

Tests

See TESTS

Tools Directory

See TOOLS.

Build Types

There are three so-called build types. They denote build and, in particular,
compilation settings for different environments.

DEVELOP
Compilation without time-consuming optimizations.
Used to develop and debug the code.

PRODUCTION
Compilation with good optimization. All debug code and all asserts
are removed.
Used for the GitLab pipeline.

PROFILE
Compilation without any optimization and without inlining of functions.
The code is instrumented to generate profiling data.
Used to analyze function, line and branch test coverage.

The make utility can be directed by: BUILD_TYPE=PRODUCTION make ks or
as in the GitLab pipeline configuration by: make BUILD_TYPE=PRODUCTION ks.

The command line tools kb, ks, and parser print the build type with which
they were generated as the first line in their help text.

Release Strategy

Version Numbering Scheme

We use a two-part Version Number consisting of a major and minor
version, separated by a period. We started with "0.9".
Release Numbers are assigned to these version numbers by appending
a sequential number starting from 1. Therefore, the first release number
was 0.9.1. When the version number is increased, the last number is reset
to 1. Consequently, the first release of version 1.0 is 1.0.1.
Release numbers are stored in git as Release Tags prefixed with 'v'.
Thus, the first release tag is "v0.9.1". In git, the first commit intended
for a release is tagged with this release tag.
The last commit in the release cycle receives a Release Completion Tag
in git. This release completion tag in git has the form of the release tag
with the appended text "--release". Therefore, the first release
completion tag is named "v0.9.1--release".
For each release completion tag, a GitLab Release Object, i.e., a
release in the sense of GitLab, is created.
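
The numbering scheme can be condensed into a few lines of C++. This is only an illustration of the rules above. The struct and function names are hypothetical and not part of the project's tooling.

```cpp
#include <string>

// A release number: two-part version number plus a sequential number.
struct ReleaseNumber {
  int major_version;
  int minor_version;
  int sequence;  // starts at 1 for every new version number
};

// Release tag as stored in git, e.g. "v0.9.1".
std::string release_tag(const ReleaseNumber& p_release) {
  return "v" + std::to_string(p_release.major_version) + "." +
         std::to_string(p_release.minor_version) + "." +
         std::to_string(p_release.sequence);
}

// Release completion tag, e.g. "v0.9.1--release".
std::string release_completion_tag(const ReleaseNumber& p_release) {
  return release_tag(p_release) + "--release";
}

// When the version number is increased, the last number is reset to 1.
ReleaseNumber first_release_of(int p_major, int p_minor) {
  return ReleaseNumber{p_major, p_minor, 1};
}
```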

CHANGELOG

When a new release is created, a new section is opened in the
CHANGELOG.md for this release. The Release Date of the previous
release is also assigned then. This is the date the respective GitLab
release object was created.
The section of the previous release in the CHANGELOG is manually revised
when a new release is created. The revised content of the CHANGELOG section
for a release serves as the Release Notes.

For each commit, the corresponding commit message should be appended to
the CHANGELOG. This can be done automatically by installing a git hook
from TOOLS.

What is a Release?

A release consists of:

  • a release number
  • a release tag in Git
  • a release completion tag in Git
  • all associated commits
  • a section in the CHANGELOG
  • the revised release notes
  • a GitLab release object

Release Change Check List

  • check for FIXME's in the task tags
  • check markdownlint output
  • check SAST artifact-report
  • assess the lcov HTML output
  • update CHANGELOG with release date
  • revise CHANGELOG section to create the release notes
  • last commit of remaining changes
  • git push
  • with current vx.y.z add vx.y.z--release tag to this last commit
  • git push origin tag vx.y.z--release
  • check pipelines in GitLab
  • create a GitLab release object
  • open new section in CHANGELOG with date "open"
  • first commit of new release cycle
  • git push
  • add vx.y.z+1 tag
  • git push origin tag vx.y.z+1
  • make clean
  • review make release-info output
  • make e2e
  • review ./ks -h release-info output

Naming and Structure Conventions

The first character of a class name is an uppercase letter.

Class attributes end with an underscore.

Parameters are consistently prefixed with p_.

Functions within anonymous namespaces are consistently prefixed with s_.

Test case names are prefixed with B_.

Test suite names are prefixed with Suite.

using namespace xyz; is not used at all.

An indentation consists of 2 spaces.

A maximum of 4 levels of indentation should be maintained (a never nesting approach).

We use the dollar sign character as part of identifiers.

The test coverage is determined using gcov. The latter requires
a clean separation of code lines for each statement to be measured.
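
A short C++ sketch shows several of these conventions together. The class Tally and the function s_clamp_to_zero are hypothetical and exist only for this illustration.

```cpp
namespace {

// Function in an anonymous namespace: prefixed with s_.
int s_clamp_to_zero(int p_value) {
  if (p_value < 0) {
    return 0;
  }
  return p_value;
}

}  // namespace

// Class name starts with an uppercase letter.
class Tally {
private:
  int count_;  // class attribute: ends with an underscore

public:
  explicit Tally(int p_start) : count_(s_clamp_to_zero(p_start)) {}

public: /* accessor */
  int count() const { return count_; }

public: /* other */
  Tally& increment() {
    ++count_;
    return *this;
  }
};
```

Each statement sits on its own line with an indentation of 2 spaces, which also suits the line-based coverage measurement with gcov.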

Order of Class Members

{c++}
class AClass {
private:
protected:
public:
  // starting with ctors and dtors
public: /* accessor */
  // setters and getters
public: /* virtual */
  // virtual member functions
public: /* other */
  // other member functions
};

Prerequisites, makefile and GitLab Integration

The prerequisites are:

  • GNU C++ compiler with the usual tools like make
  • flex
  • doxygen
  • quadmath library
  • Boost library

The available makefile targets can be displayed by running the
make command without options.

We use Ghostscript as our reference implementation for PostScript.
Hint: To invoke gs without rendering, set the environment
variable with export GS_DEVICE=nullpage.

The makefile is triggered from the GitLab pipelines. All process
details are implemented in the makefile and its helper scripts.

The different images in use for the CI/CD stages and the various
package prerequisites are documented within the GitLab YAML file for CI/CD.

Interpreter Execution Model

main() loop

Main interpreter loop

  • Reads a KSN-line and calls s_ksnline().
  • Then processes the execution stack until it is empty.
    SOOs are executed.
    Executable SONs call their SON::load_exec().
    All other SOs are pushed onto the operand stack.
  • Repeat.
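
The loop above can be sketched in C++ with two kinds of objects only: integer literals and operators. Executable names and SON::load_exec() are left out. The types Machine and Object and the functions run, run_lines and op_mul are hypothetical and do not reflect the project's classes.

```cpp
#include <cassert>
#include <functional>
#include <utility>
#include <variant>
#include <vector>

struct Machine;
using Operator = std::function<void(Machine&)>;
using Object = std::variant<long, Operator>;  // literal or operator

struct Machine {
  std::vector<long> operands;     // operand stack
  std::vector<Object> execution;  // execution stack
};

// Hypothetical operator: multiplies the two topmost operands.
void op_mul(Machine& p_machine) {
  assert(p_machine.operands.size() >= 2);
  const long rhs = p_machine.operands.back();
  p_machine.operands.pop_back();
  const long lhs = p_machine.operands.back();
  p_machine.operands.pop_back();
  p_machine.operands.push_back(lhs * rhs);
}

// Processes the execution stack until it is empty.
void run(Machine& p_machine) {
  while (!p_machine.execution.empty()) {
    Object object = std::move(p_machine.execution.back());
    p_machine.execution.pop_back();
    if (std::holds_alternative<Operator>(object)) {
      std::get<Operator>(object)(p_machine);  // operators are executed
    } else {
      p_machine.operands.push_back(std::get<long>(object));  // others are pushed
    }
  }
}

// Feeds one object per "line" and drains the execution stack after each.
long run_lines(const std::vector<Object>& p_lines) {
  Machine machine;
  for (const Object& line : p_lines) {
    machine.execution.push_back(line);
    run(machine);
  }
  assert(!machine.operands.empty());
  return machine.operands.back();
}
```

In this simplified model, every object passes through the execution stack. The description of Interpreter::ksnline() indicates that the real interpreter pushes literals directly onto the operand stack.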

Interpreter::ksnline()

Processes one line of the KSN-format input.

  • Executable names outside of procedures and {, } are pushed onto
    the execution stack.
  • Everything else is pushed onto the operand stack.

SON::load_exec()

Looks up a name and executes it.

  • Procedures will be unfolded onto the execution stack.
  • Operators will be executed.
  • Executable names will be called recursively.
  • Other SOs will be pushed to the operand stack.

SOO::exec()

Calls the C++ machine code associated with the SOO and SOo.

SOA::unfold2exec()

Unfolds duplicates of the array-content to the execution stack.

operator bind

Replaces executable names with operator objects, recursing into elements
that are SOA. Also performs an optimization/compilation if Context::compile_
is set to true.

operators begin and end

These dictionary stack manipulations influence what is found by
SON::load_exec().

operators exec

The exec operator pushes

onto the execution stack.

loop operators

The loop operators push

onto the execution stack.

C++ Notes

The minimum C++ standard in use is C++20 with GNU extensions.

Remarkable C++ features in use:

  • lambdas and function pointers
  • multiple inheritance
  • virtual dtors
  • anonymous namespaces
  • 128 bit integer and 128 bit floating point data types
  • initialization of static inline int attributes within
    the class definition
  • static class members and static member functions
  • curiously recurring template pattern
  • explicit ctors
  • [[nodiscard]] and [[noreturn]] attributes
  • auto type deduction and decltype()
  • shared pointers
  • iterators for STL classes
  • std::initializer_list<>
  • dollar signs in identifiers
  • constexpr and regular usage of const qualifiers
  • delegating ctors
  • in-class initialization
  • deleted and defaulted methods
  • std::source_location

C++ features that we intentionally avoid:

  • exception handling

Patterns and architecture:

  • singleton pattern (class Interpreter)
  • adapter pattern for functions and classes (stoint128(); class SOK)
  • pipes ( parser | ks )
  • composite pattern (class SO hierarchy)
  • ? proxy (without interface: class Counter)
  • polymorphic containers (class SOA)
  • null objects (class SOL)
  • bridge pattern / opaque pointers (class PimplTime)
  • rudimentary design by contract with preconditions, postconditions and class invariants

Design by Contract

Invariants, pre- and postconditions are documented inline with Doxygen.

Certain contracts can be fulfilled by strong typing:

  • size_t ensures a non-negative integer
  • enum AngularUnit
  • enums opError and inError
  • etc.
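
How strong typing can stand in for an explicit check is sketched below. The enumerators Degree and Radian and both function names are assumptions made for this illustration. The source only names the enum AngularUnit.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical enumerators: a caller cannot pass an arbitrary integer
// where an AngularUnit is expected.
enum class AngularUnit { Degree, Radian };

// Converts an angle given in p_unit to radians.
double to_radians(double p_angle, AngularUnit p_unit) {
  constexpr double pi = 3.14159265358979323846;
  if (p_unit == AngularUnit::Degree) {
    return p_angle * pi / 180.0;
  }
  return p_angle;
}

// size_t cannot hold a negative index, so only the upper bound is checked.
bool is_valid_index(const std::vector<int>& p_values, std::size_t p_index) {
  return p_index < p_values.size();
}
```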

const qualifier for call parameters IDEA: TBD

Class Invariants

Classes with invariants have to inherit protected from class DbC
from dbc.h.

A protected bool invariant() const noexcept implements all invariant checks
listed in the Doxygen @invariant class documentation.

If a class has a parent class with an invariant(), then it must call
this Parent::invariant().

#ifndef DBC_IS_VOID guards the invariant checker and auxiliary code
to exclude them in BUILD_TYPE PRODUCTION.

DBC_INV_CTOR(classname); will be used

  • at the end of ctors
  • at the beginning of dtors

DBC_INV; will be used

  • in non-constant public functions
  • not in static functions
  • not for single parameter setters

DBC_INV_RAII(classname) can be used to force the call to invariant()
at function return.

It performs the checks, but not in BUILD_TYPE PRODUCTION.
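
The invariant rules above can be sketched in plain C++ without the project's macros. The classes Range, NonEmptyRange and InvariantGuard are hypothetical. They use assert from the standard library instead of DBC_INV, DBC_INV_CTOR and DBC_INV_RAII, whose actual definitions are found in dbc.h.

```cpp
#include <cassert>

// Hypothetical guard in the spirit of DBC_INV_RAII: the check runs in the
// destructor, i.e. on every return path of the enclosing function.
template <typename Check>
class InvariantGuard {
private:
  Check check_;

public:
  explicit InvariantGuard(Check p_check) : check_(p_check) {}
  ~InvariantGuard() { assert(check_()); }
};

// Hypothetical class with the invariant low_ <= high_.
class Range {
private:
  int low_;
  int high_;

protected:
  bool invariant() const noexcept { return low_ <= high_; }

public:
  Range(int p_low, int p_high) : low_(p_low), high_(p_high) {
    assert(invariant());  // checked at the end of the ctor
  }

public: /* accessor */
  int low() const { return low_; }
  int high() const { return high_; }

public: /* other */
  Range& shift(int p_delta) {
    InvariantGuard guard([this] { return invariant(); });
    low_ += p_delta;
    high_ += p_delta;
    return *this;
  }
};

// A derived class adds its own condition and calls the parent's invariant.
class NonEmptyRange : public Range {
protected:
  bool invariant() const noexcept {
    return Range::invariant() && low() < high();
  }

public:
  NonEmptyRange(int p_low, int p_high) : Range(p_low, p_high) {
    assert(invariant());
  }
};
```

Compiling with NDEBUG removes the assert calls, which is comparable to excluding the checks in BUILD_TYPE PRODUCTION.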

IDEA: no instantiation forced with non-public ctors, etc.

Preconditions

The preconditions are listed in the Doxygen @pre member function
documentation. DBC_PRE is used for the implementation of the
precondition checks.

Postconditions

The postconditions are listed in the Doxygen @post
member function documentation.
DBC_POST is used for the implementation of the postcondition checks.

Boost

The installation of the Boost library in a GitLab pipeline image requires
apt -y install gfortran- libboost-all-dev because of Fortran configuration
issues.

The following Boost modules are in use:

Boost.Assert

BOOST_ASSERT, BOOST_ASSERT_MSG and BOOST_ASSERT_IS_VOID
are used as building blocks for
DBC_PRE, DBC_POST, DBC_INV, DBC_INV_CTOR, DBC_INV_RAII and DBC_IS_VOID.

Boost.Test

See TESTS

Static Code Analysis and Code Statistics

cppcheck

cppcheck is used as an integrated code tool in KDevelop and as a target
within the makefile.

clang-tidy

clang-tidy is controlled by the makefile.
The output is sent to tmp/statcode1.txt.
Integration into Visual Studio Code is TBD.

Clazy

Integrates with KDevelop. Detailed configuration TBD.
Makefile integration TBD.

Count Lines Of Code

The tool cloc is integrated in the makefile. It generates CLOC,
which is part of the documentation.

shellcheck

shellcheck is a Unix shell lint. It is integrated with the component
mechanism in the GitLab pipeline.
Some checks by shellcheck are disabled generally within the GitLab
YAML configuration.
SC2086 is disabled by a comment convention within the shell scripts:
# shellcheck disable=SC2086.

markdownlint-cli2

markdownlint-cli2 is a Markdown lint. It is integrated with the
component mechanism in the GitLab pipeline. The check is configured
to always succeed.

KDevelop Editor

Has built-in analysis for the editor.
KDevelop requires a compile_commands.json file to run clang-tidy
and Clazy.
bear generates this file and is integrated in the makefile.

Visual Studio Code Editor

Has built-in analysis for the editor.

GitLab SAST

GitLab SAST is a GitLab-maintained pipeline feature for automatic security tests.

Task Tags

The task tags information is extracted using the Tools/tasktags.sh script.
This script is directly integrated into the makefile, making it part of
the automated build process. Editors and IDEs support this kind of tagging
by highlighting the texts.

We utilize three distinct tags to categorize different types of tasks.
However, specific configuration might be required depending on the editor
or IDE used.

FIXME:
This is a known bug in the software. This bug should be fixed in the
next software release.

TODO:
A task that still needs to be completed. This task should be finished
before the next major software release.

IDEA:
This is an idea for a potential improvement to the software. Implementing
this idea is planned, but the exact timing is not yet determined.

The current list of task tags can be found under TASKTAGS.

GitLab Notes

The following GitLab features are used:

  • Issues & Incidents for planning
  • CI/CD Pipelines with build, test, and deploy stages
  • Pages for hosting the program documentation
  • Issue Boards
  • Milestones for a thematic structuring of tickets
  • Releases
  • Tags to mark the beginning and the end of a release cycle.
  • Components to import ready to run CI/CD steps:
    • to-be-continuous/bash
    • components/sast
    • components/markdownlint

Additional features will be integrated over time.

KDevelop

KDevelop creates a .kdev4 file and a ./.kdev4 directory. These files
are integrated into the git repository to share them.

Visual Studio Code

VSC creates a ./.vscode directory. This directory is integrated into
the git repository to share it.
The 'Todo Tree' extension can handle our task tags appropriately.

Doxygen

Commenting is done in Doxygen Javadoc style. This allows for loose
integration into KDevelop, which parses these comments.
JAVADOC_AUTOBRIEF is enabled in Doxyfile to eliminate explicit
@brief tags.
The Doxygen output is located in the ./public/doxygen directory.
The launch of Doxygen is integrated as a target within the makefile.
The Doxygen output is published as GitLab pages. The process is automated
using GitLab's CI/CD features.

We use the style /** to start comments. Functions are documented at
their point of declaration. Inline LaTeX formulas are used to improve
the output.

References