Copyright 2023 Brian Davis - CC-BY-NC-SA

Introduction

In early 2023 I worked through the excellent Crafting Interpreters (https://craftinginterpreters.com/) by Robert Nystrom. It's empowering and educational in a way that sent me down a very deep rabbit hole of language design. My brain latched onto this problem and couldn't put it down, with no other goal than my own enjoyment. I will use this series to synthesize and organize my thoughts.

For lack of a better name, my mythical language is called Dyn, which rhymes with Djinn.

Basic Syntax Decisions and Why

Minimalism

I'm fascinated by minimalist languages for two reasons. First, if there is less of the language to remember, getting to fluency is faster and less of my working memory is taken up by the language itself. Python is generally considered an easy language to learn, much easier than, say, C++, but I still find myself having to look up the names of the magic methods and the order of arguments in hasattr.

Second, I think it's fun to start with a very basic language and build up the paradigms you need, be they functional or object-oriented or what have you. I get a kick out of doing this in Lua, and practitioners of Forth and Lisp report similar joy.

There is an argument to be had over whether building an ad-hoc object model is a Good Thing™ or not, but since this entire exercise is about doing what I feel is fun, I'm going to run with it.

ALGOL

I'm going to roughly stick to the structure of the ALGOL family of languages. While I find the minimalism of Lisp and Forth fascinating, I find matching nested parentheses tedious and postfix notation an unnecessary cognitive load. This means I will have code blocks, nested functions, and lexical scope.

Statements vs. Expressions

For my purposes a statement is a piece of code that does not result in a value, while an expression does result in a value. A typical expression looks like 1+2, while a typical statement looks like print(4). The advantage of making control structures like if into expressions (à la Rust) is that it becomes easier to eliminate intermediate variables, each of which is an opportunity to introduce several classes of errors. I am not an expression purist, meaning I think some algorithms are clearer for having intermediate variables, but I also want more tools at my disposal to eliminate them.

That means that the ternary operator (i.e., if/else as an expression) is one of the first things I wanted to add to my language. I like the way Rust allows the same syntax for if/else in both statement and expression forms. That is much nicer than C's ternary operator. I find it particularly annoying that the order of operands is different between C and Python.
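
To make the intermediate-variable point concrete, here is a small sketch in Python, used only because Dyn has no settled syntax yet. The function names are made up for illustration. The statement form needs a variable whose only job is to carry a value out of the branches, while the expression form produces the value directly. Note the operand order: Python puts the condition in the middle, where C's cond ? a : b puts it first.

```python
def parity_statement(n):
    # Statement form: `label` exists only to carry a value out of the branches.
    if n % 2 == 0:
        label = "even"
    else:
        label = "odd"
    return label


def parity_expression(n):
    # Expression form: the if/else itself produces the value.
    return "even" if n % 2 == 0 else "odd"


print(parity_statement(3), parity_expression(4))
```

Both functions return the same result for every integer; the second simply has one less name to get wrong.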

As I think about which keywords should be statements versus expressions, the only ones I'm confident shouldn't be expressions are the flow control statements. return and similar keywords do not make sense to me as expressions.

I'm still on the fence about variable definition as an expression or statement. On the one hand I don't think any sane person wants to be defining new variables in the middle of an expression. On the other hand I have this idea about using the local scope as a first class data structure that kind of wants variable definition to be an expression. More on that later.

Typed vs Untyped

I think any modern language should have type inference as table stakes. But whether it uses static, dynamic, or gradual types depends on the goals of the language. For myself, I like the metaphor of gradual stiffening. I want to start prototyping with a lot of flexibility and then gradually lock down type definitions as the design becomes clearer. I think a modern language needs fully dynamic types as first class citizens, and also the full capability of static types.
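
As a rough illustration of gradual stiffening, the Python sketch below shows the same calculation at two stages of a design. The names area_loose, area_strict, and Rect are hypothetical. Keep in mind that Python's annotations are not enforced at runtime, so a separate type checker is needed to get the benefit; the stiffening I have in mind for Dyn would go further than this.

```python
from dataclasses import dataclass


def area_loose(shape):
    # Prototype stage: no annotations, `shape` is any dict with the right keys.
    return shape["width"] * shape["height"]


@dataclass
class Rect:
    width: float
    height: float


def area_strict(shape: Rect) -> float:
    # Later stage: the same logic with the shape of the data pinned down.
    return shape.width * shape.height


print(area_loose({"width": 2, "height": 3}))
print(area_strict(Rect(2.0, 3.0)))
```

The body of the function barely changes between the two stages; what changes is how much the reader and the tooling can know without running the code.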

I also want to take it a step further and have data validation and requirements more in line with modern databases. I should be able to constrain a variable (field) to a set of values, etc.
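
One way to picture a field constrained to a set of values is the Python sketch below. It is only an assumption about what such a constraint could feel like, not a design for Dyn; OneOf and Ticket are invented names. The constraint is checked every time the field is assigned, much like a CHECK constraint on a database column.

```python
class OneOf:
    """Descriptor that restricts an attribute to a fixed set of values."""

    def __init__(self, *allowed):
        self.allowed = frozenset(allowed)

    def __set_name__(self, owner, name):
        # Store the real value under a private name on the instance.
        self.name = "_" + name

    def __get__(self, obj, objtype=None):
        if obj is None:
            return self
        return getattr(obj, self.name)

    def __set__(self, obj, value):
        if value not in self.allowed:
            raise ValueError(f"{value!r} is not one of {sorted(self.allowed)}")
        setattr(obj, self.name, value)


class Ticket:
    status = OneOf("open", "closed")

    def __init__(self, status):
        self.status = status  # goes through OneOf.__set__


t = Ticket("open")
t.status = "closed"      # accepted
try:
    t.status = "maybe"   # rejected, the old value is kept
except ValueError as err:
    print(err)
```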

Using a keyword for defining a variable isn't strictly necessary for the grammar to work. However, I am drawn to using a let keyword. The added visual clarity is a signal to think more deeply about whether the variable is necessary and what constraints I am prepared to place on it.

One thing I'm still puzzling over is how to indicate variable type when referring to a variable. I don't like that a variable's definition and its use can be so far apart, requiring the programmer to remember the type of the variable when using it. But the Hungarian-notation convention of including the type in the name (common in older Windows code) is really annoying, and Perl's convention of prefixing the name with a sigil to indicate type is also not great. This is an open question for me.

Whitespace

I've been using Python for a long time and I have always made the case that significant whitespace is not a big deal. You pick a convention and stick with it or, even better, use an automatic formatter. That said, there are annoyances. Copy/pasting code sometimes breaks it. Posting it on the web sometimes breaks it. And there is the oddball code base that mixes up spaces and tabs, and that can be a real headache.

With this experience I will not choose significant whitespace for a language I am designing. What I will do is write a canonical formatter, because consistency is a Very Good Thing™. And I will make it easy to opt out for certain code blocks, because sometimes hand-formatted code, particularly data, is the best.

Line Endings

Lua proves that semicolons are not necessary for separating statements. I will allow semicolons to separate statements, but if the parser encounters a newline when it is not looking for an operand, it will treat the newline as an implicit semicolon. This means that to split a long expression across lines, you simply end the line with an operator. There is no need to escape the newline or require explicit semicolons.

I think the rule is simple enough for the programmer to work with and not be annoyed by missing semicolon errors.
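
Here is a minimal Python sketch of the newline rule, under some loud assumptions: the operator set is invented, the sample source is only Dyn-like, and the check is done on raw text rather than tokens, so semicolons or operators inside strings and comments would fool it. A real parser would make the same decision from its own state (is an operand expected?) instead of peeking at the last character of the line.

```python
# Hypothetical set of line endings after which an operand is still expected.
CONTINUATION_TOKENS = ("+", "-", "*", "/", "=", "(", ",")


def split_statements(source):
    """Split source text into statements using the newline rule."""
    statements = []
    pending = []  # fragments of the statement being accumulated

    def flush():
        text = " ".join(pending).strip()
        if text:
            statements.append(text)
        pending.clear()

    for line in source.splitlines():
        *terminated, last = line.split(";")
        for piece in terminated:
            pending.append(piece.strip())
            flush()  # an explicit semicolon always ends the statement
        last = last.strip()
        if not last:
            continue  # blank remainder: nothing new to decide
        pending.append(last)
        # A newline ends the statement unless an operand is still expected.
        if not last.endswith(CONTINUATION_TOKENS):
            flush()
    flush()
    return statements


example = """let total = 1 +
    2 +
    3
print(total); print(total * 2)
"""
print(split_statements(example))
```

The three-line sum comes out as a single statement because each of its first two lines ends with an operator, and the last line splits in two at the explicit semicolon.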

Keywords vs Sigils

I prefer words to symbols. Sigil operators should be few and obvious, i.e., they should follow ALGOL family conventions. Anything non-obvious should use a keyword.

Blocks

I want to avoid a ton of brace matching, but without significant whitespace we have to use something. Curly braces are so ubiquitous in ALGOL family languages that I think they are acceptable. While it is up to the language designer to make choices that encourage a relatively flat structure, it is ultimately up to the programmer to avoid a nested brace hell.

I am going to use {} to contain a block of code, a list of statements. The most common kind of statement is the expression statement.

In Rust, a block can result in a value, equal to the last expression in the block. My idea is that a block resolves to a data structure composed of the block's local scope. How this data structure fits with my type model, and what sort of data structure it is, are questions I will explore in future posts.
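
To give a feel for a block resolving to its local scope, the Python sketch below uses exec as a stand-in for a Dyn block. This is only an approximation under my own assumptions: run_block is a made-up helper, and a plain dict is just one candidate for what the resulting data structure might be.

```python
def run_block(source):
    """Run `source` in a fresh scope and return that scope as a dict."""
    scope = {}
    exec(source, {}, scope)  # names assigned by the block land in `scope`
    return scope


result = run_block("""
x = 1
y = x + 2
""")
print(result)  # {'x': 1, 'y': 3}
```

Every name defined inside the block becomes a field of the result, which is why variable definition starts to look like it wants to be an expression.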

Conclusion

There is a lot more I could cover, but I'm going to stop here with basic syntax decisions. So far I have a minimal language in the ALGOL tradition, with more keywords than sigils, braces instead of significant whitespace, implicit or explicit line endings, and gradual typing. Nothing particularly radical there, but I've hinted at some of the areas I want to explore: gradual typing, granular data modeling, and local scope as a first-class data structure.