NEXT: learn then incorporate erlang and prolog. consider prolog as the ideal middle ground between contracts and type systems.
it’s important for developers to learn
- C (procedural)
- prolog (declarative, horn clause-based)
- apl (array-oriented, tensor-based)
- assembly (procedural, purely stateful, register machine)
- haskell (purely functional, λ-calculus)
- factor (concatenative, stack machine)
- erlang (agent & message-based, π-calculus)
these are different from common langs like python, ruby, go, java, &c. C fits in this category too, except that C is the simplest version of all these, and so is best. common langs are all complex, interchangeable, and mixed-paradigm—all of which are bad qualities.
learning all these languages is important because each language is very different from all the rest. each is a simple language with one underlying structure, one paradigm. one can look at these languages altogether and infer the fundamentals of computing, and see how paradigms completely at odds with each other nonetheless are turing-complete, convenient, and elegant.
considering such things makes one realize how bloated & confused most other langs are, so choosing simple langs is obvious and liberating. the simple langs demonstrate that "languages" are hardly needed; instead we need simple notations for describing equivalent formal systems. such study reveals that common assumptions are false. truths:
- variables aren’t needed
- we can remove arbitrary chunks of a program and a valid program remains
- making self-modifying programs is plain & simple
- we can run programs backwards
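a tiny illustration of the first truth (a python sketch; the function names are mine): the same computation written with named intermediates, then point-free, with no variables at all.

```python
from functools import reduce

def compose(*fns):
    # right-to-left composition: compose(f, g)(x) == f(g(x))
    return reduce(lambda f, g: lambda x: f(g(x)), fns)

# with variables:
def shout_vars(s):
    upper = s.upper()
    bang = upper + "!"
    return bang

# without variables: the pipeline is pure structure.
shout_pf = compose(lambda s: s + "!", str.upper)

assert shout_vars("hey") == shout_pf("hey") == "HEY!"
```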
TODO: replace "e.g."'s by total consideration.
what are the necessary components of the minimal practical programming language? turing machines are the simplest, but impractical. here’s what we’ll do:
- look at languages' models only. e.g. lisp uses lists, factor uses stacks, and om uses function composition.
- consider these models as mathematical structures, then describe them by graphs with unlabeled edges and valueless nodes
- compare these graphs. e.g. list is synonymous with stack (and arrays are equivalent, but with different efficiency for random access and reshaping.) functions are representable by lambdas, which can be described by lists, so functions are isomorphic to lists (composing lists (via cons) is equivalent to composing lambdas.) actual "computation" is called eval or β-reduction.
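the list/lambda correspondence can be sketched concretely. here's a minimal evaluator (a python sketch; all names and the tuple encoding ('lambda', formal, body) are my own arbitrary choices): a lambda is just a list, and "computation" is β-reduction, substituting the argument for the formal in the body.

```python
def beta(term, env=None):
    """evaluate a lambda-as-list term; env maps names to values."""
    env = env or {}
    if isinstance(term, str):        # a variable: look it up
        return env[term]
    if isinstance(term, int):        # a literal
        return term
    if term[0] == 'lambda':          # ('lambda', formal, body) -> closure
        _, formal, body = term
        return lambda arg: beta(body, {**env, formal: arg})
    f, x = term                      # application: (f, x)
    return beta(f, env)(beta(x, env))

# identity applied to 42: ((λx. x) 42)
assert beta((('lambda', 'x', 'x'), 42)) == 42
# constant function: ((λx. 7) 5)
assert beta((('lambda', 'x', 7), 5)) == 7
```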
what do we have so far, then?
- things can be stored in stacks / lists / lambda formals, or registers / alists / maps / defines (as lua shows us, x = 4 is the same as setting key x to value 4 in the global map (_G in lua)). these are generally seen as lists whose elements may be a pair of key & value.
- data inevitably have scope, even if global. they can be accessed only by procedures that have access to their scope. lacking global scope, scope endowment is accomplished via scoped binds, naturally accomplished by parameter passing (let is just alternative syntax for lambda composition.)
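the same two observations in a python sketch (assumption: i model lua's global table _G with an explicit dict named G, a name of my own):

```python
# "x = 4" is just a put into a map:
G = {}

def setvar(name, value):
    G[name] = value      # assignment is a key/value put, a la lua's _G

setvar('x', 4)
assert G['x'] == 4

# and `let` is just sugar for lambda application:
# (let ([y 3]) (+ y 1))  ==  ((λ (y) (+ y 1)) 3)
assert (lambda y: y + 1)(3) == 4
```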
it should be enough to simply define data as relations of other data; this is merely lists, generally graphs, which specify constraints/relations of data, including access and mutation, thereby coding synchronization or other relations of puts & gets. then it should be enough to define programs as mutations (regardless of whether in-place mutation or function) thereof. this is pretty easy in any lisp for which mutation is faster than recursion: make a few global vars in a module, and allow access to them wherever necessary. you can use such simple, few globals as stack(s) or registers. we could make this work for recursion-preferring langs too, but that’s a little more difficult for reasons described in the example in just-use-lists.adoc.
what all good models have in common is minimizing every variable’s scope. however, the language must make doing so elegant. many langs fail here; it’s often preferable to use larger-than-necessary scopes for convenience.
- to help us remember that we’re just using the lambda calculus, let’s call "eval" β
- lambdas are pairs of an input list and output list a la picolisp
- quotation will always be quasiquotation (henceforth qq), because it’s strictly more capable than quotation without substitution. there’s no need to terminate a list with (); if you want to recurse, just use atom.
- with such qq & λ, cons is redundant and thus omitted
- car, cdr, if/cond, atom, eq, and lambdas will be altogether replaced by a common generalization of them: parsers. they’re like cond on steroids: maps from predicates to values, but with syntax for extracting data and expressing predicates neatly, not obviously distinguishing between equality vs predicate matches, nor predicate satisfaction of form vs value. every lambda’s formals supports parser syntax.
- though apply might not be strictly necessary (and can be considered a convenience macro,) we don’t need to consider it; we’re already using qq, and unquote is a part of that. we can use unquote instead of apply a la janet.
thus our language’s grammar is: qq, β, λ [parser]; and its vocabulary is interned symbols, words/bytes, and pairs. this is purely functional; adding set would change that.
- macros (as a lang feature) aren’t needed; for metaprogramming (mp) just use qq & β. also, mp is nice like racket instead of messy like defmacro because lambdas are parsers.
- lambdas will accept an optional label, for easy recursion/goto/continuation. there will not be a "define" form. all binds will be accomplished by parsers.
with stack-based langs, we’ve only positional parameters, not named ones, and we access them by pop [factor, elisp], so we don’t even need identifiers (including ordinals) to reference them. however, this is just a less convenient version of parsers.
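a quick python sketch of positional, identifier-free parameter access a la factor: arguments live on a stack, and words pop what they need.

```python
stack = []

def push(v):
    stack.append(v)

def add():
    # pops two operands, pushes their sum; no parameter names involved
    b, a = stack.pop(), stack.pop()
    push(a + b)

push(2); push(3); add()
assert stack == [5]
```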
TODO: what about coparsers to help ppl write valid programs?
static (or early) means "determinable before runtime;" dynamic (or late) means "determined only during runtime." think of static vs dynamic arrays in c: static ones' addresses can be known without running the program; dynamic ones' addresses aren’t knowable, and even the size isn’t knowable before running the program, and even then, it may change throughout the program!
dynamic indicates polymorphism (i.e. variation, instead of being constant): one of many values will be chosen, and we don’t know which without tracing program state, whereas static means that only one value is possible. for example, if run is an ad-hoc polymorphic identifier (e.g. a method of an abstract class in c++ or java) then its value is determined by an object, e.g. obj:run(). this lua syntax is akin to oopy obj.run() in java, but is actually syntactic sugar for run(obj)—an equivalent haskell-style functional approach. this example of dynamic connoting polymorphism is specifically one of dynamic binding. whether a binding is static or dynamic refers to the binding’s value.
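the same sugar is visible in python (a sketch; the class and method names are mine): obj.run() is resolved dynamically, and is equivalent to calling the class's function with obj as its argument.

```python
class Greeter:
    def run(self):
        return "hi"

obj = Greeter()
# method-call sugar vs plain function application on the object:
assert obj.run() == Greeter.run(obj) == "hi"
```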
- static is concrete enough that we can use non-algebraic rewrite rules.
- static is always potentially faster than dynamic.
firstly, neither scope nor binds is necessary, as assembly language demonstrates. however, it’s arguable that the names of registers are identifiers bound to given values, and that the scope is totally global—a "zero" scope, so to speak. there’s no avoiding the facts that:
- values are addressed somehow, whether by address in a heap, or position on a stack, &c
- in all expressions, subexpressions are either bound or free, and there must be a rule for determining which
_dynamic binding_ is sometimes used [to refer to late binding], but is more commonly used to refer to dynamic scope.
consider the following picolisp code:
(de f () (+ x 4)) ; (1)
(f) ; NIL ; (2)
(de x . 65) ; (3)
(f) ; 69 ; (4)
TIP: this shows another of picolisp’s good features: NIL propagation instead of crashing on a free identifier.
the combination of dynamic binding and dynamic scope allows this code to be valid: in line 1, the dynamic binding allows x to accept whatever value it’s bound to when f is invoked. dynamic scope allows x in f to inherit the top-level value of x.
NOTE: being top-level isn’t particularly relevant; all that matters is that the x in f is in an inner scope relative to the x outside of f.
though dynamic binding makes parameter passing unnecessary, parameters are still nice, since they allow unbound (anonymous) expressions to be given to functions for use; this frees us of the need to bind everything to names.
if we’d used dynamic binding and lexical scope, then f’s definition would be invalid; x isn’t in its scope. lexical scope is known for closures; let’s look at a practical dyn.bind/lex.scope example:
(de f () ; closure of x
(de x . 4)
(+ x y))
(f) ; NIL; y isn't bound
(de y . 6)
(f) ; 10. dynamic binding: f can use y, and as per lexical scoping rules, y is in f's scope.
x ; NIL. x is bound only within f
except that that doesn’t actually happen in picolisp, since picolisp doesn’t use lexical scoping. i’ve yet to learn (let alone understand) picolisp’s bind & scope mechanisms.
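the two scoping rules can be contrasted in a python sketch (names are mine). python is lexically scoped: a closure captures the binding visible where it was written. dynamic scope can be emulated with an explicit stack of bindings that lookups search at call time.

```python
# lexical scope: y is resolved at definition site
def make_f():
    y = 6
    return lambda: y + 4

f = make_f()
y = 100                      # irrelevant: f never sees this binding
assert f() == 10

# emulated dynamic scope: y is resolved at call time, from whatever
# bindings happen to be live on the dynamic frame stack
dyn = [{}]                   # stack of dynamic frames

def g():
    return dyn[-1]['y'] + 4

dyn.append({'y': 6})
assert g() == 10
```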
- it seems that dynamic scope implies dynamic binding, since scope determines binds' values.
- scoping relates to:
  - deallocation in gc langs
  - context delimitation
  - semantics of free variables. usually illegal, but:
    - in lua and picolisp free vars are null
    - theorem provers (e.g. agda,) type checkers (e.g. haskell,) or logical deduction systems (e.g. datalog) could use them as part of a reïfication engine.
- generally every binding syntax has its own associated scoping rule, even if many use the same rules. for example, the for (& other loop) syntaxes in algol languages bind where the scope is the loop body.
- lexical scoping is more natural to function composition (applicative style;) dynamic scoping is more natural to mutation.
- racket’s parameters are dynamically scoped
of course, scope & binds are concerns only if they’re used, which they aren’t in concatenative paradigms, aside from possibly defining functions, which always (i think, at least in apl & factor) have scopes exactly their parameters (ɑ [& ω] or the stack.)
for a statement in one context to be able to modify another context is a grave mistake, completely confused and senseless. one should have either [pure] functions or subroutines (which do not return a value; they’re pure mutation.) within either a function or subroutine (collectively, subprograms) definition one may bind; these binds are valid only within the body (and not in subprograms called within the subprogram) and are freed upon the subprogram terminating. subprograms are then merely delimited sections of the whole program.
what makes programming difficult is when expectations about program behavior aren’t clear. the ability to merge multiple different rules is a primary cause of such difficulty & danger. thus multiparadigm is good if there’s also separation of paradigms, such as purely functional or purely mutative. a clear violation of this design principle is languages featuring a local keyword, implying that there’s no single consistent scoping rule, which means that we as programmers generally need to read through every single subprogram just to know its behavior. haskell uses the IO monad as a clever yet overly restrictive solution to clearly delimiting/marking pure vs impure functions.
to consider a "single program" as such is foolish; we should be able to add or remove any subprograms and still be left with a valid (though possibly nonsense) program.
iteration or recursion. generally goto where dataflow is a cycle [graph theory]. given that goto is just funcall, goto is a useful generalization, suggesting that it should be used for all program sequencing (deriving execution paths from a graph of statements [graph theory]).
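the "goto is just funcall" idea can be sketched in python (names and the trampoline encoding are mine): each basic block is a function returning the next block to jump to, and a trampoline runs the resulting graph of statements without growing the call stack.

```python
def countdown(n, acc):
    # one basic block: test, then either halt or "goto" itself
    if n == 0:
        return ('done', acc)
    return ('goto', countdown, (n - 1, acc + [n]))

def trampoline(block, args):
    # drives execution by following goto-as-funcall edges
    while True:
        result = block(*args)
        if result[0] == 'done':
            return result[1]
        _, block, args = result

assert trampoline(countdown, (3, [])) == [3, 2, 1]
```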
the crux of this document: many languages demand constraints for the sake of safety. i say that it’s better to demand such simplicity that safety is hardly needed; that the likelihood of someone doing something improper is small because they have few options, and what options exist are always encoded obviously rather than following some special [complex] syntax, convention, or model. simple syntax, conventions, and models are good. for example, stack-based langs or lisp are simple; they each have few rules that define them. this means fewer things for programmers (or compilers or interpreters) to consider. fewer possibilities means higher predictability, and so the programmer’s expectation of what’s happening is more likely to accurately describe what’s actually happening!
- btw fortran is often faster than hand-written asm b/c fortran has a very good optimizer
interesting langs not yet considered (but not necessarily to be considered:)
- roc (potentially better than haskell for programming (cf type algebra.) terser syntax, maybe faster, non-curried though, type checking always succeeds if types are correct, type annotations are never needed, MUCH improved notation for ADTs, and ADTs are closer to row-polymorphic types)
- rust
- pony
- mercury (based on charity, if memory serves)
- illumos
things like go, zig, and other langs that’re basically fast python/ruby/js/v will never be considered unless one is found with particular algebraic language properties or a particularly interesting runtime model.
- picolisp: completely hackable (including modifiable during runtime,) uses multiple processes instead of multithreading (thus actor-based concurrency)*
- factor: concatenative, monoidal, optimized
- j: concatenative, parallel
*as joe armstrong said, "[system] threads are evil anyway because they share resources. you have nice things in operating systems which are actually isolated, so one process can’t fuck up another process' internal data structures, but threads are evil, 'cause what’s the difference between threads and processes? it’s that threads can fuck up each others' internal data structures, so they’re absolutely the things you don’t want to program with."
all three: simple, based on one data structure (list, stack, tensor), efficient (both cpu & mem; enabled due to language symmetry,) algebraic (particular patterns of lisp, monoids & stack updates, tensors,) data-based (both picolisp & factor see programs as data to which functions can be applied. i’m unsure how this is with j.)
- erlang (to learn): distributed (built on the π-calculus,) fault-tolerant (b/c of agent independence)
all of them altogether:
- systems that update others or themselves incrementally such that each increment does not destabilize the system (i.e. the system can recover; yes, it may error, but it can recover)
- systems that work together and grow together. yes, some may die, and others may spawn new ones
principles:
- fault-tolerant
- isolation (how are things related or unrelated; if related, then one breaking causes the other to break. the surest way to maintain stability is to reduce dependence)
- concurrency
- implicitly parallel (like haskell’s evaluation of applicative do blocks)
- 0 downtime (updates during execution)
- processes repair other processes that are too broken to repair themselves (i.e. processes stabilize destabilized processes. this is an alternative to killing and spawning a new, replacement process)
- upon a process’s death, its occurrence & reason are sent to a living node, which passes that info to wherever it should go
these principles should be applied to data storage, too.
"each module being a unit of service and a unit of failure. a failure does not propagate beyond the module."
joe armstrong’s talk, systems that run forever self-heal and scale demonstrates that sequential programming is inherently flawed and is therefore a bad practice (excepting small programs like cat that serve one simple function that’s merely evaluated once per invocation.) also all erlang processes being concurrent explains the adage, "let it crash." in such systems "crashing" refers to a cell rather than an animal.
of course erlang satisfies all these things, since it’s built specifically to model physical and organic systems.
- relation has the same meaning as relation of data, since data just means stuff. data or datum is exactly equal to vacuous unqualified mathematical symbols.
- smc means self-modifying code
disregard givens; design from scratch by first principles: defining [adj] constraints and their implications. each problem (i.e. thing that needs solving) is partitioned into two classes of constraints: the desire and universal constraints. we always seek the (optimized subset of the) intersection of those constraints. a simple though abstract example is a solution set of linear equations. we may have one solution, none (i.e. the empty set,) or many (particularly in linear algebra, many always implies infinite.) a less abstract example: the universal constraints of physics are the laws of physics, and we desire to fly. our solution is then the intersection of mathematical expressions that describe flight and physics' universal constraints. this is obviously a complex example: its solution is not obvious, and many solutions exist, naturally partitioned into flight that’s valid only in fluid and flight that’s valid otherwise. be it not pretty or simple, it’s realistic. if we want to find the best solutions, then we must consider problems in their grand complexity, not artificially approximated in terms of cookie-cutter niceties—such mental tools as [for computer science] lists or other common data structures. all models more specific than predicate logic skew truth. such "prefab" solutions must be abolished. they may be easier for humans to reason about, but their inherent arbitrariness makes them more difficult to systematically reason about. this is particularly consequential when we consider that computers are ideal for solving problems systematically! both humans and computers reason best by rules rather than by piecewise composition of seemingly "neat" structures not described by predicates. algebra is the study of axioms' implications irrespective of the set over which the axioms hold. this means reasoning only about properties—not mentally tracing dataflow nor the state of a program, which is error-prone, annoying, and unnecessary.
example algebraic designs are programs described by stacks, arrays, or the lambda calculus. i say described and not describable to mean that the programmer reasons in terms of these structures, rather than programs merely permitting expression by such structures. this begets elegance in the same way that an algorithm elegantly expressed in polar coordinates is nicer than one reasoned in cartesian coordinates, despite polar/cartesian equivalence. we want the user to know how to express programs by an algebra simple enough that the computer can heavily optimize the program; or express a desire in terms of an algebra that a computer can solve in the given context of universal constraints.
there are only two properties to make a program ideal: efficiency and elegance.
- structure generally means form, i.e. arrangement (of data), i.e. particular relation (of data.) i parenthesize "of data" to emphasize that structure is independent of data, but ultimately is useless unless applied to data; structure is abstract over data, and like all abstractions, represents useful truth, but in practice must eventually be reified. pointfree functions are example structure abstracted over data.
- consider the lisp basis: cons, car, cdr, quote, lambda, def (which binds to data (incl lambdas) or macros a la pil,) if, set, eq, atom (opposite of pair?,) eval. in addition to pil’s lambda shape, car & cdr can be done exclusively via deconstruction: ((a b) (cons 1 2) (+ a b)). this is the applicative form; the pointfree version is 2. map [haskell, scheme] should be called 2: and should be an overloaded form of fold (same function, different (default) params.)
to avoid
apply(which should be done) all functions will take a list of arguments that will be parsed-out; much work will be in optimizing parsing fn args. -
problems: encourages recursion, requiring optimizations/translations to stateful version. using continuations (viz named let) should be easily translatable to assembly jump statements.
-
describes intermediate data. this should be replaced or optimized into pipelines that maximize allocated memory reuse.
-
how can i merge sexps perfection with photon basis e.g.
a == b ⇒ _? do i so need? no;condcovers this perfectly.
- compare FORTRAN against j and picolisp
- revise notes. reserve function for the mathematical concept, and use continuation (or some other possibly more-appropriate term) to refer to memory addresses that the instruction pointer can validly have, i.e. those that can be goto’d.
- ensure that i mention the importance of anonymous ADTs: for them to express a program elegantly they must be anonymous, just like functional programming without lambdas (i.e. with only named functions) would be horrible.
- see https://en.wikipedia.org/wiki/Satisfiability_modulo_theories
- reconsider type classes in terms of factor’s oop system
- discuss randomized algorithms & probabilistic data structures
- discuss ADT constructors/destructors [destructuring aka pattern matching] vs their functional equivalents: constructors & traversals
- explore arrays as ad-hoc polymorphism, e.g. a hierarchy of algebraic type classes can be expressed by a simple spec on arrays: the unit value is stored at position 0; + is stored at 1; × at 2; &c. as in this case, the number may have meaning rather than being arbitrary. the hierarchy is determined (calculable) simply by pointwise addition of arrays, checking which resultant cells are 0/nil. this is really using arrays as tuples that represent abstract structures, then using set-theoretic operations to relate those structures.
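a small sketch of the positional-array part of that idea (the position convention — unit at 0, the operation at 1 — is from the note above; the concrete instances and names are mine). a "type class instance" is just an array indexed by convention:

```python
# monoid instances as positional arrays: [unit, plus]
sum_monoid  = [0,  lambda a, b: a + b]
list_monoid = [[], lambda a, b: a + b]

def mconcat(instance, xs):
    # fold xs with whatever operation & unit the array holds
    unit, plus = instance[0], instance[1]
    out = unit
    for x in xs:
        out = plus(out, x)
    return out

assert mconcat(sum_monoid, [1, 2, 3]) == 6
assert mconcat(list_monoid, [[1], [2]]) == [1, 2]
```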
- note in the appropriate place that using data structures adds only readability to function composition—"let over lambda."
- fully expand (to completeness) §programming mindset
- merge discussions of languages with ./wares-and-langs.adoc
- discuss beauty as a heuristic for elegance. to determine beauty, express code by audial or visual space, e.g. a beautiful FSM graph will appear beautiful. a visual description of syntax (a la sqlite) will appear beautiful if symmetrical and simple enough. or perhaps it may appear beautiful yet infinitely complex, like fractals.
- revise section on linearity into one that discusses units: 1 as the base case and also the seed for generation, e.g. naturals as (0,1,+), integers with the addition of inverse, and rationals with the addition of division.
- n-dim structures are products of (n-1)-dim structures, for both continuous and discrete spaces; discuss this fact with respect to arrays, lists, and continuous spaces, finally seeing them all as relations over universally-quantified variables whose meaning is found once a space is assumed, e.g. "∀x" meaning symmetry about x, where x is either an integer or real depending on whether the statement is considered in discrete or analytic mathematics. e.g. 0-dim is a point. introduce one "∀", and now you’ve added a dimension: 0-dim := ∃p. p = . 1-dim is ∀[x : 0-dim]. x. 2-dim is ∀[x : 1-dim]. &c. this is _true_ dimensionality. pseudodimensionality is emulation of dimensionality by modulus, which allows reshaping, e.g. all arrays of shape [a][b]…[c] where a × b × … × c is constant can be reshaped into one another, i.e. reshaping is symmetric about cardinality.
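the pseudodimensionality point can be sketched directly (python; names mine): a flat array is viewed as [rows][cols] purely by modulus arithmetic, so any two shapes with equal cardinality interconvert.

```python
flat = [0, 1, 2, 3, 4, 5]            # cardinality 6

def view(flat, rows, cols):
    # reshaping is symmetric about cardinality:
    assert rows * cols == len(flat)
    # index (r, c) is emulated on the flat array by r*cols + c
    return [[flat[r * cols + c] for c in range(cols)] for r in range(rows)]

assert view(flat, 2, 3) == [[0, 1, 2], [3, 4, 5]]
assert view(flat, 3, 2) == [[0, 1], [2, 3], [4, 5]]
```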
- discuss array/list equivalence by matrix representation of tree, and compare to 3d and higher-dimensional structures.
- rearrange this document: 1) overview; 2) common fallacies; 3) what programs must be (we’ve a lot to consider even when we’re considering only the most basic language!), and how lisp is the natural language for programs; 3.x) subsection on "programs" as evaluable relations, and its implications on how programming relates to general math, language, game theory, &c formal systems; 4) now that we’ve identified the basis for programs, consider the structure of complex programs (this is where (0,1,+) (i.e. monoids) will be discussed, not just wrt programs, but in general, again e.g. constructing numbers); 5) why monoids are not enough (we want to be able to calculate programs rather than merely evaluate them.) this section will consider SMT solvers and hoare’s work, evaluating how appropriate each is. however, i must be careful to not consider these systems if they’re foolishly concerned with trivialities, such as excluded middle (continuum fallacy; truth can be a real value) or "whether constructivism is correct", and i must avoid formal systems' common nonsensical considerations such as russell’s paradox (improper definition.) such things are correctly not the concern of practical programmers! note that such nonsenses are always of logic and never of math: they discuss truth/validity rather than structure! as programmers, we care only about useful ideas/programs. we deal with numbers & relations. yes we use logic & math, but only insofar as they help us program! any math that won’t eventually be implemented by relations of numbers isn’t relevant to us.
- to introduce both algebra and programming/relation as primitives, first discuss unit & relation, then evaluation, then axioms as a particular variety of relations, and logic as a particular variety of evaluation. this simultaneously introduces the fundamental(s) of mathematics, and demonstrates programs as nothing more than mathematical expressions, which computers can evaluate.
- where should i discuss denotational semantics?
- think about goto vs delimcont vs retroactively adding algebraic evaluation rules. this is similar to, or may include, bottom-propagation. consider (println (+ x (* y (if (= z 0) ⊥ z)))); if z = 0 then the println statement won’t execute b/c the whole expression will be ⊥. the propagation boundary must be defined. for example, if this println were in a begin block, we’d need to allow begin’s other statements to evaluate, unless they’re explicitly linked to the println. such propagation boundary determination may require the partitioning of all expressions as being 1) part of another expression; 2) in a begin block; 3) in a let* block. propagation would occur for (1) & (3) only, since these are the cases when things are dependent. in the case of a let block, when all bound values are assumedly needed, it’d be sensible to have or on expressions that can produce ⊥. (or, ⊥) is an alternative to if. (this is simply nixy trees.) (cond, v) where v is any value which propagates, generalizes and is an alternative to maybe, either, etc insofar as short-circuiting or addition (i.e. +0 or ×0) and is also an alternative to synchronous exception handling and backtracking [parsing].
- related: raising exceptions as a control flow mechanism. this is a variety of hook- or event-based control flow. perhaps this is what the π-calculus is about?
- discuss [consequences of [technique]] using list instead of maybe, citing ~/programming/nicholaschandoke-me/articles/racket-macros.adoc:§keyword args:general correct solution as an example if needed. the idea is simply that list (recursive product with base case 0 = '()) generalizes maybe (coproduct with 0 = Nothing.) because lists feature recursion, they’re superior to maybe. one might suggest that (maybe, cons) is just as good as list, but they’re probably isomorphic. anyway, list is terser and still familiar, and therefore remains preferable.
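the list-generalizes-maybe idea, sketched in python (names mine): [] plays Nothing, [x] plays Just x, and monadic bind is just flat-mapping.

```python
def bind(m, f):
    # flat-map: [] short-circuits, [x] feeds x to f
    return [y for x in m for y in f(x)]

def safe_div(a, b):
    # "Nothing" on division by zero, "Just a/b" otherwise
    return [] if b == 0 else [a / b]

assert bind([10], lambda x: safe_div(x, 2)) == [5.0]
assert bind([10], lambda x: safe_div(x, 0)) == []
assert bind([],   lambda x: safe_div(x, 2)) == []   # Nothing propagates
```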
- discuss the algebra of lists and maybe, e.g. #f × n = #f, #f + n = n. cite haskell notes on how maybe generalizes boolean rings, and see wikipedia on boolean lattice structure. discuss how this structure is homomorphic in maybe and list, and see that article and my haskell notes on maybe+list semirings. consider how the lattice generalizes the 2-element boolean algebra so commonly used in cs.
- discuss recursion in terms of unit. algebraically it likely lacks 0, having only the recursion operation over a set, which would make it a semigroup. is there any useful conception of it as a monoid or more-endowed structure?
- unquote should be available anywhere. outside of a qq, it’s eval. also, mentioning just b/c it’s related, splice should be useful outside of qq, instead of apply
- suggest a syntax (both for natural language and computer langs) for "base case & recursive case," e.g. "int &rec list +" to mean an int wrapped in a list, or that wrapped in a list, …. "int &rec list *" would mean int, or int wrapped in a list, or that wrapped in a list….
  - perhaps (list … int) and (list …+ int)
- as an example…of something probably mentioned elsewhere herein, use lists instead of maybe for optionality/short-circuiting, and show traditional lisps (i.e. those with t & '()) treating the empty list as 0: + (append) with 0 is identity, and × (cartesian product) [TODO: are these correct? product and coproduct should be dual. can i describe cart-prod as a categorical dual of append? likely not. anyway, monadic join is defined in terms of #f or cartesian product, both of which are practically multiplication by 0, i.e. #f or (). point is: we don’t need maybe. list generalizes maybe, and is therefore better.]
- (point a00-45) say i’ve a loop (let loop ([x 0] [y 0]) …), and for the first n iterations, x is used, but thereafter it isn’t. usually it’d stick around in memory. we can say (let loop ([args '(0 0)]) …) so that we can reduce the amount of memory used. however, without special optimization, we’ll lose time in cons & uncons. still, this is an interesting solution.
we can refactor
(define (ema p)
  (let ([α (/ 2 (add1 p))])
    (let next ([k 0] [x 0])
      (λ (y) (if (< k p)
                 ;; x accumulates a mean
                 (cons "NaN" (next (add1 k) (+ x (/ y p))))
                 ;; x is the most recent ema value
                 (let ([x (+ (* α y) (* (- 1 α) x))])
                   (cons x (next k x))))))))
into
(define (ema p)
  (letrec ([α (/ 2 (add1 p))]
           [next (λ (x y) (+ (* α y) (* (- 1 α) x)))]
           [f (λ (x) (λ (y) (let ([x (next x y)]) (cons x (f x)))))])
    (let loop ([k 0] [x 0])
      (λ (y) (if (< k p)
                 (cons "NaN" (loop (add1 k) (+ x (/ y p))))
                 (let ([x (next x y)])
                   (cons x (f x))))))))
both of which are effectively equal. an example invocation:
(void (for/fold ([p (ema 3)]) ([x '(0 0 0 1 0 2 5)])
  (let ([P (p x)])
    (printf "~a " (car P)) (cdr P))))
prints NaN NaN NaN 1/2 1/4 9/8 49/16. note that we discard for/fold’s return value, which is a procedure.
- we not only omit the eventually-redundant k parameter, but also the if statement that branches upon it.
- we need to use the next function in both the average-accumulating and ema-accumulating cases.
things to do when i’ve enough time:
- look into agda and f*
- learn lenses further
- consider HKDTs
- read about free theorems (wadler 1989)
- refactor util.rkt:795~ into lenses or arrays
abstractions with (possibly many) numerical degree(s) are ideal. e.g. recursion schemes, tensors, ADTs, functions. all of these structures represent both data structures and transforms.
- algebraic (symmetrical except for ad-hoc definition of the algebra’s rules)
  - the more symmetry something has, the fewer data are needed to describe it, and the more uniform & predictable its behavior. therefore algebras should be compared by a measure of symmetry in order to identify the best algebra.
- pointfree
- duals: recursion schemes & generation rules (generative functions and/or recursive ADTs)
- all structures are mathematically just (recursively) nested relations. any operator that doesn’t lose information is a relation: →, cons, [a b] (array). all structures isomorphic to any structure can be used interchangeably. therefore the question of which structure to use is purely dependent on the language/runtime’s special considerations of those structures. because all graphs can be expressed by linked lists, and graphs are the most general data structure, we know that arrays and functions can each express any data structure.
function & data structure equivalence
-
-
-
tacit (e.g. group operation notation)
-
branchless
-
fixed-point arithmetic
-
types should be first class & algebraic (like in the lux proglang) i.e. you can write type families just like ordinary lambdas
-
support & use anonymous data types
-
typing should follow type theory convention & arithmetic (seeing types as expressions of numbers and algebraic operators)
-
use types to design programs (namely primitive combinators,) then use an untyped runtime
-
how can asymmetric physical devices (e.g. pumps, diodes, sawteeth) suggest digital analogues?
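the fixed-point arithmetic bullet above can be made concrete. here is a minimal Q16.16 sketch in C — the format choice, type name, and helpers are mine, not prescribed by these notes:

```c
// a minimal Q16.16 fixed-point sketch: 16 integer bits, 16 fractional
// bits, stored in a plain int32_t. all names here are illustrative.
#include <stdint.h>

typedef int32_t fix16;               /* value = raw / 2^16 */

#define FIX16_ONE (1 << 16)

fix16 fix16_from_int(int n)  { return n * FIX16_ONE; }
int   fix16_to_int(fix16 x)  { return x >> 16; }

/* addition is ordinary integer addition; multiplication must rescale,
   because (a·2^16)(b·2^16) = ab·2^32 — one factor of 2^16 too many. */
fix16 fix16_mul(fix16 a, fix16 b) {
    return (fix16)(((int64_t)a * b) >> 16);
}
```

e.g. `fix16_mul(FIX16_ONE / 2, FIX16_ONE / 2)` yields `FIX16_ONE / 4`, i.e. ½ × ½ = ¼ without any floating point.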
when designing or programming, have this mindset, think in these terms, ask these questions.
before development:
-
what predicates/structure (predicate = axiom, which are the only things that define abstract structures) define the problem & solution?
-
what, if any, transforms between them do we need to identify? how are they similar i.e. which defining properties or implied properties/behaviors do they exhibit?
-
which tools work well with these structures?
during development:
-
of a structure
-
describe it as an element of an algebra or a point in space e.g. binrec is point 2 in the space of recurrence relations.
-
describe it as both abstract structure and data structure. one defines properties/behaviors; the other implements it in terms of relations [of atoms]. e.g. the λ-calculus is an abstract structure of 3 unary operations (ɑ,β,η) (ɑ is parameterized by the value to rename to, much like ln is unary b/c the base e is implied) over the set of lambda expressions. i haven’t identified its structure more specifically than merely an abstract structure—seems more lattice-like than field-like, but who knows? anyway, its data structure representation/implementation in fp is…λ’s. we get direct translation! cool. that’s the most efficient evaluation model. however, if we wanted to manipulate lambdas as data, then our programmatic representation would be lists or macros—whatever method considers lambdas as a relation of a list of formals and an output expressed in terms of the formals. the only reason to express lambdas as lists is the language design. fortunately lisps make lambdas exactly equivalent to lists, the only difference between the two being whether, at any point in a program, the list is evaluated as a lambda or not. lisp invented the term sexp to unify lists and lambdas.
-
of which sets/categories is it a member i.e. what’re its types? e.g. binrec is in the set/category of recursion schemes.
-
what’re its properties and to which sets do they belong? for example, binrec has some property of value 2. this property can be called degree, norm, rank, w/e.
-
TODO: revise this whole section into parts: 1) initial concept; 2) reasoning to that concept’s conclusion; 3) reasoning about programs in terms of that conclusion.
these…aren’t quite correct considerations; following these considerations to their conclusions, we find that sexps are the natural structure of programs. however, sexps aren’t always the best implementation. the only alternative is arrays, whose inherent difference is only SIMD support and memory allocation & traversal concerns. both of their formal semantics are identical. in fact, the λ-calculus can be entirely reduced: ɑ-conversion isn’t needed when all η-reductions have been applied, and β-reduction is equivalent to running a program; therefore programs expressed only as the composition of pointfree functions obviate the λ-calculus; for such programs, the only λ-calculus operation is β-reduction, which is moot since it’s the only thing that separates a program from data i.e. it describes programs as executable data and highlights how programs are ultimately merely binary strings that hardware translates into physical phenomena to achieve computation.
obviously programming directly in bitstrs isn’t practical. still, we should program in a practical algebra most similar to bitstrs i.e. the least-complex practical algebra. matrices may work. on the principle that everything of degree 2+ can be expressed by compositions of things of degree 1 (i.e. every element is expressible in terms of a unit element,) i suggest matrices (linear transforms, i.e. transforms of degree 1) and matrix composition (which may or may not be matrix multiplication, which is function composition when we consider matrices as representations of linear functions.) nonlinear expressions (e.g. x² - x) are expressible by matrices, e.g. in this case [[1 -1]] or the point (1, -1) in the polynomial vector space P(2). linearity is essential in all contexts; any set whose elements cannot be expressed linearly can itself be expressed linearly, as this quadratic polynomial example demonstrates. another example is binrec being composed of two linrecs. note that we must consider whether these linear combinations are linearly independent or not! the linearity is not of the calculation, but of the program itself. i conjecture that all computations are inherently linear i.e. can be expressed by compositions of linear structures, regardless of structure. we may search for counter-examples simply by trying to express any given algorithm/program/function only by linear structures, requiring as many structures as necessary. also, our consideration of our program is only up to the conditions under which the program halts. our consideration does not concern memory manipulation nor other runtime constraints since we can express in-place updates as a pure function that shadows an identifier e.g.
we don’t consider whether a swap function returns a new tuple or updates in-place; we care only that both are single steps forward in an algorithm, considering that swap := (a,b) ↦ (b,a) is the same function whether used as let x = swap x in … or let x' = swap x in … or x ← swap x; …. such useless concerns cannot exist in pointfree systems. an example class of systems in which this concept is degenerate/singular is cat-stack-langs such as factor: all factor functions are simultaneously pure and impure, all updating the stack in-place. we may say that functions take from the stack and put new elements, and so they’re pure, but what’s the difference between updating memory in-place vs popping from the stack and replacing what used to be there with an updated value? the state monad also demonstrates that purity is an illusory concept. therefore fp is no more suited to provability than procedural designs; in fact, precisely, systems are provable iff they’re algebraically described.
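the claim that x² - x "is" the coefficient point (1, -1) in P(2) can be sketched directly. here, including the zero constant term, the expression is the vector (0, -1, 1) over the basis (1, x, x²), and evaluation is a dot product — a linear operation over a linear representation of a nonlinear expression. the function name and layout are mine:

```c
// x^2 - x represented as the coefficient vector (0, -1, 1) over the
// basis (1, x, x^2) of P(2). evaluation is a dot product against the
// basis vector — no expression tree, no branching.
double poly_eval(const double *coeffs, int n, double x) {
    double basis = 1.0, acc = 0.0;   /* basis runs 1, x, x^2, ... */
    for (int i = 0; i < n; i++) {
        acc += coeffs[i] * basis;
        basis *= x;
    }
    return acc;
}
```

e.g. `poly_eval((double[]){0, -1, 1}, 3, 3.0)` computes 9 - 3 = 6.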
we want a notation that makes the asymmetry (there exists) obviously distinct from the symmetry (for all.) we want self-similar bases—bootstrap-driven development.
if you can’t describe a structure by only numbers and the set to which it belongs, then you don’t have a good concept. lambdas cannot be, though certain classes of them can, e.g. recursive combinators.
TODO: complete this section
-
state is unsafe
-
safety should be a concern
-
that programming language is [should be] a countable noun
state is program data whose values aren’t defined at runtime. for polymorphic programs this can include the program structure.
the commonly-mentioned "problem with state" is really multiple continuations (as defined in racket reference §1) accessing a common state, where the state change is supposed to affect only a proper subset of those continuations, but instead affects a superset of that subset. solutions are:
-
scope with parameter passing (including threads with mailboxes; see racket’s synchronization model)
-
purity: only mathematical functions—no effectful subroutines. all binds are immutable function parameters. note that concatenative and applicative languages are not different models; they’re different styles. concatenative programming simply prefers higher-order combinators.
-
mutex
-
meticulous management of simple state (namely implemented by a stack and/or register machine)
-
π-calculus or other multi-agent models, producer/consumer, blocking queues, barriers, semaphores, &c
or combinations thereof, e.g. rust’s ownership model, which is basically a combination of semaphores and scope with parameter passing.
obviously stacks[concat]/lists[app] (factor or lisp) are more fp and registers (asm) more state. functions implicitly act on the stack or each function uses particular registers. asm is purely stateful (no functions,) though again a sequence of updates is equivalent to applying a sequence of unary functions over a loop, which is equivalent to a single composed function.
in discussions of pure fp, stateful updates are commonly said to be impossible. however, state is used rather obviously: rather than updating a variable in place, instead we call a closure which returns updated state. so funcall instead of set and programs are compositions of functions instead of updating a state then reading from that state. again, stack-based langs are the degenerate case of state: they’re simultaneously purely functional and purely stateful. they illustrate that (g . f) a where f : a → b, g : b → c (pure fp) is equivalent to a f g (stack) or x = a; f x; g x (purely stateful; f & g do not return anything; they merely update x’s state.) the ability to shadow variables also demonstrates how state is practically used: (let ([x 4]) (let ([x 5]) (print x))) is no better than x = 4 x = 5 print x.
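the (g . f) a ≡ a f g ≡ stateful-update equivalence reads plainly even in C. f and g here are arbitrary stand-ins of mine:

```c
// the same computation two ways: pure composition vs. a sequence of
// in-place updates to a single bind.
int f(int x) { return x + 1; }
int g(int x) { return x * 2; }

int pure_style(int a) { return g(f(a)); }   /* (g . f) a */

int stateful_style(int a) {
    int x = a;   /* x = a                          */
    x = f(x);    /* f merely "updates x's state"   */
    x = g(x);    /* g merely "updates x's state"   */
    return x;
}
```

both are single walks forward through the same algorithm; only the notation differs.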
so…what’s the problem with state, again? the problem is shared, emergent state: when the execution of a function affects execution of other functions without our anticipating it. we want to be able to track state, to track emergent behaviors, to know that our system isn’t doing anything that we hadn’t anticipated. while we’re on the subject, type checkers are supposed to guarantee that our program cannot behave outside some specified logical-behavioral constraints. indeed constraining behaviors, and enforcing logical/mathematical structure is good. however, choosing a good canonical basis for programs is a superior alternative to adjoining type systems (or any other systems) onto the actual program—the code that actually produces some effect or result from an initial condition.
whatever you do, be stupidly simple. use registers or a stack, state or function composition. multi-paradigm langs are bad unless the paradigms "work well together" (some reflection of models satisfying common patterns and/or having simple transforms between/among them.) like don’t write a pure function in a non-pure lang, 'cause then people will wonder about its purity. don’t make people wonder; don’t give authors options, because then readers will need to check through all options to see which were used. and don’t rely on the author documenting, since it’s unnecessary and often authors don’t document. even if they do, due to the complexity of natural language, they may explain it in a way that only some people can understand. simple means predictable, meaning that devs don’t need as much awareness when coding; coding is then more automatic, easier to learn, less likely to have mistakes, and easier to write parsers/tools for.
see my just-use-lists.adoc article for use of set & reset as a stateful alternative to let. another example: sum a b; inc [asm] isn’t really different from (add1 (+ a b)) [lisp]. you can make a macro or preprocessor that maps procedures to the registers that they modify, and thus effectively compose functions. hell, you can just use the stack in assembly, much (or "just"?) like you would in factor.
state enables communication among threads (locking mechanism required.) this is fine design. it may require defining a function with a local bind whose access & mutation is regulated by the function, which may use a sync mechanism in addition to more application-specific logic. for systems that support it, a functional alternative is mailboxes or other message passing mechanisms that erlang surely has. state also makes control flow easier to express when multiple branches of a complex dataflow update a common state; this saves us the trouble of refactoring the lambdas within the control flow (which are already altogether complex) to accept or return extra parameters. for an example of this, see just-use-lists.adoc.
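the "state enables communication among threads (locking mechanism required)" pattern — a local bind whose access & mutation is regulated by a sync mechanism — can be sketched with POSIX threads. the counter and worker names are illustrative, not from these notes:

```c
// shared state whose access & mutation is regulated by a mutex.
// two workers communicate through the counter; the lock guarantees
// no update is lost.
#include <pthread.h>

static long counter = 0;
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

static void *worker(void *arg) {
    (void)arg;
    for (int i = 0; i < 100000; i++) {
        pthread_mutex_lock(&lock);   /* sync mechanism regulates access */
        counter++;
        pthread_mutex_unlock(&lock);
    }
    return NULL;
}

long run_two_workers(void) {
    pthread_t a, b;
    pthread_create(&a, NULL, worker, NULL);
    pthread_create(&b, NULL, worker, NULL);
    pthread_join(a, NULL);
    pthread_join(b, NULL);
    return counter;               /* 200000 iff no updates were lost */
}
```

without the lock the two read-modify-write sequences interleave and updates vanish — exactly the shared, emergent state discussed below.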
ironically the very things that people say are bad about state are the very things that make it good, if done properly! just use locking mechanisms and, for state accessed & updated in various branches in a single thread, just scope the state properly, e.g. using let!
using the λ-calculus, we don’t even need let. without set the λ-calc is implicitly pure. it’s simple but requires λ composition; everything is scoped. commonly it’s lexical scoping, but other scoping rules work, too. (see wikipedia’s consideration of ɑ-renaming.) however, this problem is solved by all languages: they allow top-level definitions (i.e. those outside any lambdas.)
consider:
f :: Int -> String -> String -> IO ()
g :: Int -> String -> String -> Int -> String -> String -> (Int, String, String)
quick aside: fun fact! curry in haskell is like apply in lisp! consider how currying is implicit in haskell, as eval is implicit in lisp, and haskell tuples are lisp lists/pairs. curry is binary and apply n-ary, though.
;; typed racket
(: f (-> (-> Int String String) Void))
(: g (-> Int String String Int String String (List Int String String)))
(f 0 "tom" "cook") or (apply f '(0 "tom" "cook"))
aside over. now then, this is obviously horrible, especially to refactor. clearly the triple describes a person and should be grouped:
data Person = CommonInputs { n :: Int, name :: String, job :: String }
f :: Person -> IO ()
g :: Person -> Person -> Person
which is convenient, but requires us to define a type [haskell] or struct [racket]. not bad. in untyped racket’s case the only refactoring we’d need is to change the struct’s definition. all continuations that use the struct remain as they were. however, what if we want more types? we’d need to add upon the struct or add another struct as a function parameter. such concerns are resolved by using a/lists instead of structs. in this case functions would take the a/list as their one parameter. (this is similar to taking a single row-type parameter in purescript or a map in clojure or an object in js.)
things must be in scope to be referenced! the only way to be more convenient than passing parameters is to use globally-scoped binds. immutable globals (as with all immutable binds) are universally accepted as fine. global mutable binds are usually considered the most dangerous thing ever, but again, depending on the program’s needs, adding a locking/sync mechanism or restricting their scope makes them fine, e.g. global static binds, or in scheme, using define-values to define multiple functions that use a common state that’s been bound within the define-values form, acting as a sort of closure. it’s not really a closure because define-values is a special form separate from a lambda, but whatever. another solution is using parameters in racket (fluid-let in other lisps).
ultimately lexical scoping solves any unsafe non-concurrent mutability problem(s). TODO: after understanding picolisp’s dynamic binding & lexical scoping model, add to this section. the fact that "things must be in scope to be referenced" is always true, but most programmers are so familiar with lexical scoping that suggesting the ability to define a form in terms of variables that may be declared and bound at runtime is as unintuitive as a logic without modus ponens. this is the only thing more dangerous/flexible than mutable globals in static scoping. its advantage is complete orthogonality & flexibility: we don’t need to include any modules in order to reference variables undeclared. this is dangerous only if we don’t account for the possibility of a variable being unbound, assuming that we can check whether unbound variables are a falsy value; if even trying to reference unbound variables insofar as checking their boundedness (using if or some other, more particular special form e.g. boundp in elisp) raises an error, then dynamic binding is quite dangerous.
a language is a tuple of (semantics,syntax). there’s no particular reason to couple these things. useful uncoupling is the variety of syntaxes that translate to instruction sets such as jvm, clr, asm. clojure and java both go to jvm. c and abcl to asm. in fact, abcl is an example of how one syntax (common lisp) may compile to multiple semantic/runtime systems, e.g. CL→asm, CL→C.
while we’re at it, lisps (most notably picolisp) do not have syntax beyond a literal encoding of data relations, which is the bare minimum needed to express a program, or any structure for that matter! so why have syntax? it clearly isn’t needed or even commonly useful. but let’s assume that sometimes it may be nice; then we use macros (or, again most notably in the case of picolisp, eval.)
so now we’ve reduced language out of existence (or beyond a degenerate form) to the truth of:
-
some bare-necessity syntax (viz one that describes relations)
-
optional "on-demand" syntax for convenience i.e. the "syntax" is merely a small & particular system of relation. usually the relation is a function or constraint on composition of functions of particular forms. it’s important to note that functions aren’t special; they’re just relations. instead, evaluation is special; it enables reasoning about ideas / enacting instructions. notice i didn’t say `eval`! `eval` is only one variety of evaluation. another example variety is type checking, which, rather than executing instructions, parses relations of logical elements, checking for logical consistency or net implication.
-
semantics/runtimes
compilers are maps from syntax to semantics: they parse strings obeying syntactical constraints into executable systems obeying semantic constraints. the particularity of syntax is enforced by the compiler and assumed by the programmer. compilers are thus another version of evaluation. compilers’ evaluation is (usually reductional) translation, whereas interpreters’ evaluation is execution. there are an unknown and ever-changing number of various evaluators, and many of them can work together, which means that the number of evaluation systems is the sum of simple evaluation systems and compound ones—a number bound to remain unknown since it’s so large and difficult to calculate.
for example, chicken scheme compiles to C, which can then be compiled to bytecode or machine code by e.g. llvm or cc. in this case we’re bound to the intersection of scheme and C semantics: we cannot use chicken to express an operation that’d be impossible in C, nor can it express an operation impossible in scheme…though in this case, C’s semantics are probably a superset of scheme’s, so compiling to C doesn’t really further restrict our program’s semantics beyond scheme’s.
finally, let’s discuss semantics. that’s a plural word. why should we group certain semantic properties? an example is strictness. some languages are "lazy-eval" whereas others are "strict-eval"…by default. anyone can delay evaluation by using coroutines, generators, quoting, wrapping in a lambda, &c. what about typing? that’s a common semantic property. well, frequently it too is optional! python, racket, & js (to name a few) introduced typing retroactively. furthermore, what exactly is typing? it’s logical consideration of structure. but we don’t need typing in order to have that; contracts are an example alternative mechanism. what about typing being more than mere checks, such as type classes for ad-hoc polymorphism? python and racket (and probably js) already have that. frankly, all typing boils down to if statements or some equivalent system and—you guessed it!—particular relations.
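"wrapping in a lambda" to delay evaluation has an analogue even in C: a thunk, i.e. a function pointer plus a memo flag. the names here are mine:

```c
// a thunk: the computation is suspended until forced, then memoized,
// so a "strict" language recovers lazy evaluation on demand.
typedef struct {
    int (*compute)(void);   /* the suspended computation */
    int value;
    int forced;
} thunk;

int force(thunk *t) {
    if (!t->forced) {
        t->value = t->compute();
        t->forced = 1;
    }
    return t->value;
}

static int call_count = 0;
static int expensive(void) { call_count++; return 42; }
```

`thunk t = {expensive, 0, 0};` then forcing twice returns 42 both times while `expensive` runs exactly once — strictness really is a default, not a fixed semantic property.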
at this point, i’ll just say it: every bit of syntax or semantics is just constraints on relations. code is just data under some interpretation. if data has order [structure] (isn’t chaotic nonsense) then it has information, and we can extract that information. "code" or "language" is nothing more than how we choose to parse/represent structure. the natural language is then one that only encodes relations, and whose syntax uses the minimal number of characters needed to unambiguously represent relations. a natural consequence of such a primitive language is that more complex and convenient notations (e.g. '(1 2 3) to represent '(1 . (2 . (3 . ())))) can be written as mere relations. as it turns-out, lisp is this language; it is nothing more than homoiconic relations an arbitrary subset of which can be evaluated.
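the point that '(1 2 3) is mere sugar for '(1 . (2 . (3 . ()))) — nothing but nested pair relations — can be made concrete. a minimal cons sketch in C, names mine:

```c
// '(1 2 3) is '(1 . (2 . (3 . ()))) — three nested pair relations
// terminated by the empty relation NULL.
#include <stdlib.h>

typedef struct pair { int car; struct pair *cdr; } pair;

pair *cons(int car, pair *cdr) {
    pair *p = malloc(sizeof *p);
    p->car = car;
    p->cdr = cdr;
    return p;
}

int length(const pair *p) { return p == NULL ? 0 : 1 + length(p->cdr); }
```

`cons(1, cons(2, cons(3, NULL)))` is the desugared list; everything else a lisp does is interpretation of such relations.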
carbon is such an interesting element that an entire branch of chemistry is devoted to carbon-containing compounds. that subfield even uses unique notations. of all the elements, carbon is the only one that has such devotion. compounds are merely relations of elements. however, these relations must obey certain rules; some combinations of elements combine to produce compounds and some do not. this is the chemistry algebra: the set of elements, and the set of axiomatic operators that enable the production of compounds. we as programmers should seek the mathematical analogues of carbon: versatile elements of the algebra’s set that give us a great variety of compounds via their support of various composition rules. that is what i mean by algebra-based programming.
the terms category, algebra, and space are similar and frequently discussions are not so specific that either term is more appropriate than another. an appropriate term that generalizes all of them is [abstract] structure. however, i prefer the term algebra because:
-
it’s simultaneously unambiguous & terse; structure could refer to data structure, abstract structure, or some general notion of form. the disambiguation abstract structure is lengthy.
-
it connotes algebraic [axiomatic] structure & operations. elementary algebra is familiar: it permits systematic, axiomatic, deterministic solving algorithms based on iterated inverse operations and re-expressions for certain classes of structures, and naturally suggests other methods, such as galois theory or calculus, to solve problems for which the algorithm fails to produce solutions.
the particular distinctions:
-
categories consider particular elements
-
algebras consider axiomatic operations irrespective of their sets[' elements]
-
space is vague and certain spaces may be algebraic structures or not. regardless, spaces are abstract structures that formally generalize notions of geometric space—namely that all spaces have points; all elements of all spaces can be appropriately described by ordered tuples, and a point’s numeric value always immediately relates it to other points i.e. all points have neighborhoods.
-
many abstract structures that feature "space" in their term are not truly spaces, e.g. the suggestion that probability spaces generalize euclidean spaces is at best unhelpful and at worst misleading
-
spaces are the most common, so by "algebra" i mean "abstract structure satisfying properties (ir)respective of its associated collection where that collection may be defined ad-hoc and/or procedurally, whose elements' relations are described either ad-hoc or by an equivalence relation."
usually we’re programming for a purpose beyond play or exploration of structure for its own sake. so disregard programming languages! we don’t want expressivity; we want to encode sensible ideas with least effort! we aren’t trying to express ideas; we’re trying to identify solutions to problems! we want a program that satisfies some needs, and we don’t care how it’s found or what its form is so long as it’s easy and works. expressivity just gets in the way; it offers more options than are necessary. expressivity burdens us with the confounding responsibility of identifying expressions that work elegantly together—no mere feat!
| ad-hoc polymorphism | parametric polymorphism |
|---|---|
| asymmetry | symmetry |
| ∃ | ∀ |
| no closure | closure |
| enumeration | generation rule |
| finite/bounded | infinite/unbounded |
| closed exprs | non-closed exprs |
-
axioms or basic rules (e.g. the peano numeral data type definition) should be the only ad-hoc statements.
-
ad-hoc polymorphism appropriately encodes constraints, whereas parametric polymorphism expresses variable freedom.
the combination of ad-hoc & parametric polymorphism enables specification of any mathematical rule with exact specificity. this is true of types as well as functions. for example, bounded recursion is a coupling of an asymmetrical rule (base case) and a symmetrical one (recursive case.) commonly ad-hoc rules limit the extension of generation rules. beware types that seem to be symmetrical, e.g.
unfoldr :: (b -> Maybe (a, b)) -> b -> [a]
there’s asymmetry in the case on Maybe’s constructors. "but, wait!" you say? "Maybe is simply a monoid!" if that’s the case, then we should be able to express it in terms of monoidal functors:
unfold :: Applicative f => (b -> f (a, b)) -> b -> [a]
then reify that to Maybe and run it. it’s impossible to define unfold without casing on Maybe’s constructors. asymmetry is required for halting.
programs should be calculated, preferably by a computer, but at least by hand. efficient calculation requires symmetry: for everything to obey the same axioms without (ad-hoc) exception. ad-hoc (asymmetry) means arbitrary association, and must be accounted for by cond—a generalization of case or if. (see §branchless for examples.) aside from the syntactic cruft of branching blocks, ad-hoc means more rules—more cases for us to account for, which means more varieties of behaviors to reason about, which means more thinking, more code, and greater chance that we’ll fail to account for edge cases. precisely the trouble with asymmetry is lack of closure. operations that aren’t closed over a set are much more of a hassle to reason about.
every program’s only complexity should be exactly its asymmetry, and a language/notation should make very obvious which parts of it are symmetrical or not.
consider the following definition of exponential moving average:
(define (ema p ys)
(let-values ([(α) (/ 2 (add1 p))]
[(init rst) (split-at ys p)])
;; ema is a function not over the past p values, but over the past 1 value.
(let loop
;; ema uses its period only for its first value (which is the mean of same period);
;; all successive ema values are functions of only their predecessor.
([v (mean init)] [rst rst])
(if (null? rst)
null
(let ([new-v (+ (* α (car rst)) (* (- 1 α) v))])
(cons new-v (loop new-v (cdr rst))))))))
it takes an n-list and returns an (n-p)-list. suppose we’d like to refactor it into a cons-sequence (a form of non-strict list:)
(define (ema p)
(let ([α (/ 2 (add1 p))])
(let next ([k 0] [x 0])
(if (< k p)
;; x accumulates a mean
(cons "NaN" (λ (y) (next (add1 k) (+ x (/ y p)))))
;; x is the most recent ema value
(let ([x (+ (* α y) (* (- 1 α) x))])
(cons x (next k x)))))))
this program is incomplete: it references unbound identifier y. in fact, refactoring `if`’s "false" clause into a form such that
-
it has the same shape as the "true" clause, namely
(cons value unary-procedure) -
it returns the updated value of x in car
is impossible!
the problem is when k == p: x is the mean of the first p values, but for k > p, x is the most recent ema value. the variation of behavior (program structure/shape) being dependent on or partitioned by sgn(p - k) is asymmetry; the program isn’t symmetric about (p - k), nor even asymmetric about (p - k) on (> 0), but instead asymmetric about (p - k) on sgn(p - k).
NOTE: asymmetry is here described by the syntax "asymmetric about value on equivalence relation" where the number of partitions resulting from the equivalence relation is called the degree of asymmetry. if "symmetric about" is used, then the "on" clause must be omitted; this case means that there does not exist any equivalence relation of program structure/shape; exactly one shape describes the program (in reduced/simplified form.)
therefore we must use cond (or case) instead of merely if, since cond supports an arbitrary number of degrees of asymmetry, whereas if supports only 2. any attempts to represent the desired program:
(define (ema p)
(let ([α (/ 2 (add1 p))])
(let next ([k 0] [x 0])
(cond [(= k (sub1 p)) ;; the last NaN value
(cons "NaN"
(λ (y) (let ([P (next (add1 k) (+ x (/ y p)))]) ;; next returns a pair
;; return P with its first element modified
(cons (+ (* α y) (* (- 1 α) (car P))) (cdr P)))))]
[(< k p) ;; x accumulates a mean
(cons "NaN" (λ (y) (next (add1 k) (+ x (/ y p)))))]
[else ;; x is the current ema value
(cons x (λ (y) (next k (+ (* α y) (* (- 1 α) x)))))]))))
(car (for/fold ([p (ema 3)]) ([x '(0 0 0 1 0 2 5)]) (printf "~a " (car p)) ((cdr p) x)))
prints NaN NaN NaN 0 1/2 1/4 9/8 49/16. note that the final value is the one returned by the loop, whereas the others were displayed by the printf statement inside for/fold.
however, if we refactor the whole function instead of just `if`’s "false" clause, then we can transform the program into a symmetric form. the general technique to do this is to identify the shape of the most complex expression, then express the other forms by that same shape.
(define (ema p)
(let ([α (/ 2 (add1 p))])
(let next ([k 0] [x 0])
(λ (y)
(if (< k p)
;; x accumulates a mean
(cons "NaN" (next (add1 k) (+ x (/ y p))))
;; x is the most recent ema value
(let ([x (+ (* α y) (* (- 1 α) x))])
(cons x (next k x))))))))
(void (for/fold ([p (ema 3)]) ([x '(0 0 0 1 0 2 5)])
(let ([P (p x)])
(printf "~a " (car P))
(cdr P))))
which we can refactor into
(letrec ([α (/ 2 (add1 p))]
[next (λ (x y) (+ (* α y) (* (- 1 α) x)))]
;; produce ema values
[f (λ (x) (λ (y) (let ([x (next x y)]) (cons x (f x)))))])
;; accumulate mean into x for first p elements
(let loop ([k 0] [x 0]) (λ (y) (if (< k p)
(cons "NaN" (loop (add1 k) (+ x (/ y p))))
(let ([x (next x y)]) (cons x (f x)))))))
you may wonder why we need to transform this program but not the original one which returned a list. the answer is that the original’s asymmetry was encoded in the split-at statement, which increments the degree of asymmetry, but we also discard the list of NaN’s, which decrements the degree.
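the recurrence itself — seed with the mean of the first p values, then v ← αy + (1-α)v with α = 2/(p+1) — is independent of list representation. a C sketch over the same sample input (the function shape is mine):

```c
// ema as a plain recurrence: out receives the n - p ema values.
void ema(int p, const double *ys, int n, double *out) {
    double alpha = 2.0 / (p + 1);
    double v = 0.0;
    for (int i = 0; i < p; i++)      /* v accumulates the mean of init  */
        v += ys[i] / p;
    for (int i = p; i < n; i++) {    /* each value depends only on its  */
        v = alpha * ys[i] + (1 - alpha) * v;   /* predecessor           */
        out[i - p] = v;
    }
}
```

for p = 3 over (0 0 0 1 0 2 5) this yields 1/2, 1/4, 9/8, 49/16 — matching the printed output above; the asymmetry survives only as the two loops, the seed loop and the recurrence loop.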
TODO: cons-sequences vs scheme streams, and consider backtracking (see the queens problem)
a good algebra has few operations and few ways to express any program. this means that if a programmer doesn’t know how to code a program, or they’re unfocused, then glancing the algebra itself, or a list of common patterns/idioms, suggests to the programmer a definition, so that programming happens more automatically.
this section suggests identifying consequences of unusual interpretations.
relation equivalence allows us to consider interesting things: a → b → c can be described by [a b c]. how can we use matrices to compute or reason about types? if we describe types by numbers, e.g. arity, or a sequence of arities (e.g. (a → b) → b → b has arity sequence [1 0 0]) &c, can we calculate particularly useful function types, then from that derive a suggested function definition? if that sounds like a one-in-a-million shot, remember that functions are just lambdas, and their only supported operation is application/composition. if we consider that all n-ary functions can be described by applications of unary functions (demonstrated by currying) then all functions are trees of unary functions, which may be expressed as tuples or any other binary relation. it’s easy to identify calculi about such simple structures.
we can express a graph by an adjacency matrix. we can take eigenvalues & eigenvectors of matrices. what do the eigens of an adjacency matrix represent or tell us? under which varieties of graphs is this operation sensible? is the cardinality of the operation on adjacency matrices any less useful than it is on matrices for which the operation is already assumed useful? if not, why? if not, this must mean that, despite matrices and graphs being isomorphic, there’s a difference in either axioms or amount of information between adjacency matrices and matrices for which eigens are useful! if eigens are found to have meaning, perhaps that can give a better implementation of some common structure/method that we’ve been using.
not only must we write extra code to account for branching, it’s also slower for the computer to execute, because it can only predict (often badly, and always entailing extra computation) what the upcoming code is. if there’s no branching, then it obviously loads whatever code is next; however, if it branches, then it must load code from wherever the branch tells it to go (determined at runtime,) which generally is impossible to know in advance.
//branching
//faster b/c compiler optimized into 3 instructions
int min (int a, int b) { if (a < b) return a; else return b; }
//branchless. relies on comparison statements returning 0 or 1 instead of true | false
//slower b/c the assembly outputted by the compiler was poor
int min(int a, int b) { return a * (a < b) + b * (b <= a); }
//branching
void upper(char* d, int n) {
  for (int i = 0; i < n; i++)
    if (d[i] >= 'a' && d[i] <= 'z')
      d[i] -= 32;
}
//branchless. 7x faster.
void upper(char* d, int n) {
  for (int i = 0; i < n; i++)
    d[i] -= 32 * (d[i] >= 'a' && d[i] <= 'z');
}

the general branchless form is a × cond_a + … + z × cond_z, which can obviously be expressed by a matrix.
| Note | be careful that whatever method you choose is efficient as output by your compiler! |
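the same a × cond_a + … + z × cond_z form can be sketched in haskell via fromEnum (Bool → 0/1). bmin and upperCode are assumed names, mirroring the C above; whether the backend actually emits branchless machine code is compiler-dependent:

```haskell
-- branchless min: a*[a<b] + b*[b<=a], with fromEnum as the 0/1 bracket
bmin :: Int -> Int -> Int
bmin a b = a * fromEnum (a < b) + b * fromEnum (b <= a)

-- branchless uppercasing of a character code
upperCode :: Int -> Int
upperCode c = c - 32 * fromEnum (c >= fromEnum 'a' && c <= fromEnum 'z')
```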
-
branchless handcoded assembly is always faster than branching handcoded assembly
-
array-based programming on some architectures (e.g. intel) avoids branching but still loops, by using special looping-designated registers like esi & edi. instructions like cmp …; cmovg … are branchless but still obviously use conditionals.
-
SIMD/AVX branchless is the fastest variety of cpu (cf gpu) programs. many apl programs should be easily translatable to SIMD branchless.
-
self-modifying code can replace branching code by branchless code once a condition has been met. such conditions may be specified in code, e.g. using a compare statement, or they can arise naturally as the code self-modifies, e.g. if statements from a set of loop instructions are removed upon each loop, eventually leading the loop to collapse into a nop, or otherwise tend toward a state that, upon reaching, guarantees that the loop shall never be evaluated again during the program’s runtime.
the kakoune philosophy is basically the unix one (specialized and composable via RPC/IPC/piping/sockets) plus speed, simplicity, orthogonality (ideally only one implementation of each function, and functions' functionalities don’t overlap,) language-agnostic (but not form-agnostic!), and the notation is the language (as is the case in APL.)
in mathematics a relation is a coupling of 2+ things whereas similarity is a set of common properties held by 2+ things. in mathematics objects like sets can be unordered collections. in programming this is akin to a block of memory being allocated then elements written to it in nondeterministic order. non-ordering implies that any operations on particular elements are less than ideally efficient. these elements are truly unordered; the access and state of each element is entirely unrelated to any others. non-ordering is enabled by random-access memory. this being said, random access still supports sequential access; we can order array elements.
if items are to be ordered, then they can be ordered either by contextual or intrinsic property. in the former case, we must use general structures (e.g. list) whereas in the latter case, we can use efficient structures that exploit inherent properties (e.g. heap.)
| access | order type | example structure | required relations |
|---|---|---|---|
| random | n/a | array | 0 |
| sequential | arbitrary, contextual | array, list | n, unpredicated (not constrained to predicate(s)) |
| determined | intrinsic, total ordering | heap | 1, predicated (equivalence relation properties) |
-
in the case of sequential ordering, n relations are needed whether using a linked list or an array: in both cases each relation is a pointer. though there are pointers in an unordered array, they’re irrelevant: they don’t signify anything, because there’s no order to signify.
-
because sequential access is determined by unpredicated relations, both cons cells and unary functions exactly represent (binary) relation and are therefore isomorphic. (linked) graphs and recursion are consequent structures. in untyped lisps cons is the only primitive except value (number, char) and maybe string. every cons pair is ad-hoc; a universal analogue is a rule for consing, such as an anamorphism.
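a rule for consing, i.e. an anamorphism, can be sketched with the Prelude’s unfoldr; countdown is an assumed example name:

```haskell
import Data.List (unfoldr)

-- an anamorphism: one rule generates every cons pair, instead of
-- consing ad-hoc. the seed shrinks until the stop condition holds.
countdown :: Int -> [Int]
countdown = unfoldr (\n -> if n <= 0 then Nothing else Just (n, n - 1))
```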
| Note | the default assumption for relations is that they’ren’t symmetric/commutative. their order is seen as a lack of property rather than having the commutative property. |
TODO: merge with §data/function equivalence
self-relation is noteworthy because it’s the simplest class of cycles: loops. all unpredicated relations are equal; therefore (a,b) = (a → b) etc. self-relation on the unpredicated binary relation produces chains of arbitrary order which is…common in programming? like really? is it more common than unordered or totally ordered structures? maybe it’s important only because it describes linked lists, which are then used like unordered arrays but with more flexible (de)allocation? TODO
self-relation, if it were to have a degree, would obviously have a degree of 1. as with many things of degree 1, its significance (as its own concept) is its simplicity. self-relation commonly isn’t anything special compared to general relation. for example, recursion (a loop) isn’t worth distinguishing from corecursion (a cycle.) however, it commonly does make a difference, too. for example, in type theory, it’s obviously best to use fully-reduced/simplified types, i.e. a recursive type of one recursive variable rather than many, when possible. self-relation…can describe structures of arbitrary size, composed of units…. TODO TODO: questions about self-relation usually specifically question equivalence relations: reflexivity, symmetry, transitivity. but this is true specifically of predicated structures.
self-relation goes by many context-dependent names:
| context | name |
|---|---|
| graph theory | cycle or loop |
| definition | recursion |
| geometry | fractal (maybe always recursive) |
| aristotelian logic | posture |
| general grammar | reflexivity |
| program runtimes | reflection |
| systems analysis | feedback |
and there are many more, such as involution (a function that’s its own inverse.) recursion is often treated as a special topic, but the more general self-relation or cycles should be considered instead. consequently the question of "recursive vs iterative" is foolish; they’ren’t particularly related and certainly don’t form a dichotomy. they’re both essentially cycles that may halt if a branch is taken.
TODO
recursion is the method of traversal: the very fact that the primitive relation (cons) is binary implies that:
-
all arbitrary data structures are expressible by cons
-
trees (graphs with exactly one path to any node) can be traversed entirely by simply recursively unconsing
-
this assumes that cons is applied symmetrically: (… . (c . (b . a))…). if a different rule were used—e.g. (d . b . a) . c
-
traversal by uncons means that all data structures expressed entirely by cons can be traversed entirely by simple applications of map (though for graphs with cycles that would map over some elements multiple times)
-
this applies to structures that aren’t really connected, but are expressed as such for convenience. for example, a zipper’s two lists may be considered separate, but where zipper is a type alias for (cons L1 L2), a single nested map would iterate over both. in other words, in practice in languages like picolisp where cons is the only relation (no arrays,) all structures are graphs all of whose nodes share at least one edge with another node, even if that’s not true of the abstract graph that they represent. in this case the structure traversal would need to know which edges are "true" edges vs merely implementation-necessary edges.
-
-
thus we see that there are rules for generating data structures, and inverse rules for traversing them—unfolds & folds. therefore types that describe rules for generating structures also describe traversals. recursion is simply closure over application/parameterization. where do we draw the line between recursion and repetition or loops? they’re all about self-similarity/factorability. it’s about recurrence.
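the generate-then-consume pairing above can be sketched as an unfold composed with a fold (a hylomorphism); factorial is an assumed example, using only the Prelude:

```haskell
import Data.List (unfoldr)

-- generate [n, n-1 .. 1] by one rule (unfold), then consume it by the
-- inverse kind of rule (fold): factorial as unfold-then-fold.
factorial :: Int -> Int
factorial = product . unfoldr (\n -> if n < 1 then Nothing else Just (n, n - 1))
```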
TODO: how did i get to talking about types here?
type systems are algebraic. they’re abstract structures, not data structures. they’re collections of abstract rules, not collections of data.
recursive types' base case examples:
List a := Nil | Cons a (List a) -- base case & recursion are both constructors
StateT s m a := Monad m => s -> m (a, s) -- StateT is a monad transformer: a monad parameterized by another monad.
-- it's therefore recursive. the base case is m being not a transformer,
-- i.e. a monad not parameterized by another monad.

ad-hoc specs are effort; symmetrical is implied. ad-hoc means that you’re specifying something specific in order to accommodate a specific problem.
the key is not amount of abstraction, but appropriateness: the basis functions should work together well. e.g. semirings are awesome.
abstract structures are better than data structures, too: we can easily compare abstract structures' definitions, whereas data structures are usually defined (and designed) separately, in the manner of, "we’re doing such-and-such with some blocks of memory; what’s a structure that accommodates exactly that and no more, so that it’s efficient?" this kind of thinking gets us a large selection of incompatible structures whose definitions are hidden in implementation details. such thinking is entirely devoid of symmetry and mathematical property. by contrast, we can easily relate abstract structures: a monoid and monad are obviously similar, as are semigroups and monoids.
we want:
-
freedom from redundancy (e.g. using fold when map would do)
-
defaults (parameters) are a good way to do this. however, whereas commonly "parameters" regards functions, we should consider structural parameters, e.g. when a category is unspecified, Identity is assumed, or in the case of numeric division, 1 is assumed as the numerator if only one arg is specified. in the case of map & fold, we need default parts of the function itself! that’s true abstraction & reification!
-
-
using appropriate basis (e.g. using polar for a cylinder rather than cartesian)
it’s been noted that programmers commonly spend significant time fixing things that they’ve made, which is interpreted as unnecessary difficulty, as opposed to dealing with natural difficulty. while true, there’re universal & subtle varieties of this:
-
no matter what code we write, whatever designs we use, we must make other code or designs work with them.
-
whatever designs we use, we’re constrained to them. commonly we must change their internals, augment them, and sometimes split them into parts. efficient autoadaptation or re-solving are ideal. both require a goal [predicate], and respectively require 1) each component being context-sensitive or 2) a (not necessarily deterministic) method for solving the system.
thus we want a small variety of structures that we know fit together easily and still express all programs elegantly (clearly, tersely.)
any separation of data is partitioning, whether it be a vs b in (a, b) or split(f)-at or partition (which should be called split-on aka group)
remainder is a partitioning scheme and modular arithmetic forms a ring. remainders are as common in programming as are partitions / splits / groups / equivalence relations. partitioning is the most fundamental and so most pervasive pattern in programming. for example, we see it in many recursive loops: the structure that we’re iterating over is partitioned, and some of that data is consumed into an output, leaving the remainder for the next iteration. this is basic, not insightful. the interesting question is whether recursion also forms a ring.
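the consume-part-leave-remainder loop can be sketched with divMod, here as base-10 digit extraction (digits is an assumed name):

```haskell
-- each step consumes one digit (the remainder) and passes the rest
-- (the quotient) to the next iteration: partitioning via mod.
digits :: Int -> [Int]
digits 0 = [0]
digits n = go n []
  where
    go 0 acc = acc
    go m acc = let (q, r) = m `divMod` 10 in go q (r : acc)
```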
-
contrast with data structures
-
the subject of universal and/or abstract algebra and/or category theory
-
an algebra is axiomatic operations on any set
-
in universal algebra the set is never particular; it’s always free. i.e. it’s discussion about the operations alone, where the operations all operate over any arbitrary set
-
you may think of universal algebra as point-free algebra
-
-
in abstract algebra each algebraic structure (e.g. groups) may consider special sets (e.g. vector space over a field)
-
categories' (classes of) objects are ad-hoc/particular/bound rather than parametric/free. the arrows particularly consider classes of objects.
-
-
abstract structures are interpretations. abstract structures interpret data structures, but also identify classes of data structures.
-
functors
-
applicatives (strong lax monoidal functors)
-
monads
-
-
-
semigroups
-
monoids
-
-
ring
-
modular arithmetic
-
boolean
-
semirings
-
-
groups
-
vector spaces
-
optics
-
recursion schemes
-
hughes' arrows
-
algebraic effects
-
barbie/HKDTs (good idea, but flawed in generality & reification under type systems that i know of (scala or haskell.) not even available in typed racket.)
TODO: this section needs research
Q: what does cons (the primitive relation operator) necessitate about traversals?
trying to implement a doubly-linked list as e.g. '((0 . 1) . (1 . 2) . (2 . 3)) : (Listof (Pairof a a)) does not work insofar as, given only any element of the structure, we cannot navigate either to the left or to the right of the structure. given any element, we want to be able to navigate to the rest of the structure; that implies that the rest of the structure must be present inside the current element. well, the "rest of the structure" means "a traversable cons chain" because every structure is a cons chain. well, lists are a common cons chain. let’s try replacing elements by lists: (Listof (List a) (List a)). well, that’s not right. its rank seems off. (Pairof (List a) (List a)) (a zipper) works. indeed, it’s two lists, which corresponds to the two directions that we want to navigate. furthermore, it’s symmetrical, whereas list is a non-symmetrical type: its recursive case is on its RHS. such (a)symmetry and rank determine which recursion schemes to use to traverse structures.
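the zipper the paragraph arrives at, as a minimal haskell sketch; the names fromList/left/right/focus are assumptions:

```haskell
-- the pair of lists from the text: the left list holds what's behind
-- the focus (nearest element first), the right list holds the focus
-- and what's ahead.
type Zipper a = ([a], [a])

fromList :: [a] -> Zipper a
fromList xs = ([], xs)

right, left :: Zipper a -> Zipper a
right (ls, x : rs) = (x : ls, rs)
right z            = z
left (l : ls, rs) = (ls, l : rs)
left z            = z

focus :: Zipper a -> Maybe a
focus (_, x : _) = Just x
focus _          = Nothing
```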
-
associativity
-
commutativity
-
distributivity
-
closure/recursion
-
inversion (nb. usually called inverse element, but an element’s inversion property is always relative to an operation. no element is inherently an inversion of any other element.)
-
involution
-
-
identity
-
idempotency/fixedness
-
information change [amount] (e.g. integration adds a constant; injections in some sense lose information but bijections don’t: forgetful functors, isomorphisms)
-
short-circuiting (achieved by multiplication by the additive identity)
-
order
-
analogue (e.g. homomorphisms)
-
uniqueness
-
basis [vector spaces] / generating set [groups]
unless you’ve special need (which i can’t imagine, but assume may be possible,) use fixed-point arithmetic (including ratios) instead of floating-point. they’re faster and exact rather than approximate.
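the exact-vs-approximate point can be checked directly, assuming Data.Ratio for exact fractions:

```haskell
import Data.Ratio ((%))

-- floating point accumulates representation error
floatSum :: Double
floatSum = 0.1 + 0.2          -- not equal to 0.3

-- rationals are exact
ratioSum :: Rational
ratioSum = 1 % 10 + 2 % 10    -- exactly 3/10
```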
verdict: type theory is good. typed languages are either so poorly typed as to be not considerably better than untyped langs (e.g. java, c++); or are well typed but have flawed type checkers (haskell, typed racket) that ultimately make them good for most programs, but literally completely obtrusive for certain programs, and merely cumbersome for some. therefore it’s best to use types and type calculators to design programs, but code the programs in untyped (or latently/dynamically typed) languages.
typing can be its own programming language if done properly. types describe data/functions, which are equivalent, i.e. there’s a bijection between types and functions. this is a reflection of the curry-howard-lambek correspondence.
types add (and enforce) structure.
-
polymorphism
-
provability: encoding or guaranteeing specific properties as types rather than verifying by predicates at runtime
-
especially useful for preventing unobvious invalid values. if a program crashes due to invalid data, then it’s obvious where/why. however, handlable invalid data is unobvious, e.g. (* precondition: x ≠ 0 *) (de [x] (send-to-remote-api "DoThing" (add1 x))). usually dependent typing (e.g. ada’s) is needed to avoid this class of errors.
-
lessens probability that a program will crash
-
-
defines/expresses grammars well. yes, types, to the extent that they’re specific (e.g. dependent typing is more specific than non-dependent), can implement parsers.
-
especially useful when many similar but significantly/importantly distinct data & morphisms are used, e.g. git would benefit from types to easily know which similar operations work on branches vs commits vs files.
-
-
identifies improper/incomplete refactoring. e.g. if i change a type’s shape but fail to account for that change in functions of that type, then the checker immediately tells me. this is especially useful for polymorphic types.
-
we can use types to identify what we know; this is a metric of how well each part of the program is understood.
-
types are only useful when you’re working with distinct types that are valid only in particular relations. for example, types are useless for arithmetic, since only one type (complex numbers) is used.
-
types are only trouble (quite a bit of it) when we’re having trouble identifying structure. typing directly opposes flexibility.
-
types are useful exactly insofar as they’re specific; (unqualified) general types (the most extreme being Any) are not helpful. a qualified general type like C a ⇒ a → T a → a is useful and most powerful.
-
jack-in repls or eDSLs are good cases of whether types usefully add structure or limit expression
-
in languages without good type inference (e.g. typed racket):
-
typing syntax adds cruft, which competes with brevity
-
passing polymorphic functions to higher-order functions (and →) is a hassle
-
| Note | typed racket is faster than untyped, since types are used instead of contracts! therefore it’s better (though possibly with less helpful error messages) to use other lisps for untyped code. |
algebra-based programming does not need typing because all of the operations and valid compositions thereof satisfy laws. therefore the question is no longer typechecking, but rather whether the described program is valid. invalid programs will be caught at compile time. an invalid program can only be one containing any invalid (g ∘ f), i.e. codomain/domain mismatch.
types may be useful if they obey an algebra, again with closure. see §data/function equivalence below.
haskell is a good case study of a language based on abstract structures with a good type system that nonetheless is not the most preferable language. it currently has the most capable & elegant type system of all languages. here’re the reasons that i don’t use haskell:
-
lacks:
-
(elegant) row types
-
(elegant or efficient) dependent types
-
type sequences (a … in racket)
-
list types capable enough to iterate over a b c …
-
refinement types
-
-
ghc (at least) fails to infer multiparameter type class instances
-
uses nominal typing (only)
-
neither isomorphic nor equivalent types are implicitly coercible
-
by haskell’s design, they’re needed for type class instance lookup. this is yet another suggestion that type classes have flawed design
-
suggests seeing a type for its intended purpose rather than for its form
-
no anonymous types
-
no anonymous newtypes. we can’t bind type classes to type forms on-the-fly
-
-
a simple example expression that haskell cannot handle: a function that applies an elementwise operation over a tensor of types.
simple example problem: the Eq type class implies that things can be compared by only one equivalence relation. it doesn’t directly imply this; we could define Eq2, Eq3, &c (even though that’s obviously stupid.) still the real trouble comes as functions use types like Eq t ⇒ t → t → Bool. now if i want to use this function with a different equivalence relation, then i’d need to create a new type. oh, wait! that’s easy because i can use newtype. but this is obviously less elegant than the obvious way: simply saying, t | a == b = …—on-the-fly overrides.
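the newtype workaround being criticized, sketched concretely with an assumed CaseInsensitive wrapper:

```haskell
import Data.Char (toLower)

-- same representation as String, but a different Eq instance: the
-- only way haskell lets us attach a second equivalence relation.
newtype CaseInsensitive = CI String

instance Eq CaseInsensitive where
  CI a == CI b = map toLower a == map toLower b
```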
newtypes' only utility is changing a type’s type class instances. a very strange & inelegant idea compared to the obvious solution:
-
types are considered for their form alone
-
like lambdas, types are algebraic expressions and may be bound to identifiers; the binding and expression are separate (y’know, like they are in typed racket?)
-
types are algebraic: they’re composed of only primitive operations. equivalence, isomorphism are considered by the type system and calculated by the same laws of lambda calculus: α-translation, β-reduction, η-reduction. given that haskell’s elected to not have row types (and so type composition/application cannot be commutative) the least it can do is make its type compositions follow the same rules as actual haskell code, i.e. the lambda calculus!
e.g. haskell’s rose Tree type would be a mere type alias for ∀a. rec r: a × (List (r a)), which, considering that List := ∀a. rec r: 1 + (a × r a) expands to
∀a. rec r: a × (rec s: 1 + r a) × s (r a) which may or may not β-reduce; i’ve not learned type theory well enough to say. i’m also not sure if these types should be expressed by rec or μ.
rather than type classes, we want context-specific interpretations or roles:
-
derive morphisms from types that satisfy some property, e.g. a predicate on a type that refines it into (or derives) a type class instance
-
instead of type classes, use functions from types to output functions, e.g. ∀a b. a × b ↦ (\(a,b) → a == b) : (∀a b. a × b) → 2 (this example assumes that == is defined on ∀a b. a × b)
-
this breaks haskell’s separation of types and functions, excepting type families, which are defined ad-hoc, but are recursive, so more complex values can be derived, albeit inelegantly and generally inefficiently (increased compilation times)
-
this point is somewhat incorrect/flawed. i’ll come back to it later.
lenses are a perfect example of how a single type replaces a type class of methods get & set, and instead of instancing a type class, each lens is simply defined as a function. even better, this makes lenses composable! this is possible because, unlike type classes, functions are types and obey symmetric type algebra rather than being ad-hoc. ad-hoc structures inherently resist axioms.
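a minimal sketch of the van laarhoven lens shape being praised: one function type subsumes both get & set, and lenses compose with plain (.). this is a reduction of real lens libraries, not their actual definitions:

```haskell
{-# LANGUAGE RankNTypes #-}
import Data.Functor.Const (Const (..))
import Data.Functor.Identity (Identity (..))

-- one function type replaces the get/set pair
type Lens s a = forall f. Functor f => (a -> f a) -> s -> f s

-- get: run the lens with the Const functor, which ignores rebuilding
view :: Lens s a -> s -> a
view l = getConst . l Const

-- set: run the lens with Identity, which just rebuilds the structure
set :: Lens s a -> a -> s -> s
set l a = runIdentity . l (const (Identity a))

-- an example lens onto the first component of a pair
_1 :: Lens (a, b) a
_1 f (a, b) = fmap (\a' -> (a', b)) (f a)
```

because a lens is just a function, `_1 . _1` is itself a lens into a nested pair; no type class instance lookup is involved.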
-
in languages that don’t feature ADTs, like c++ or java, types merely augment actual program logic in order to prevent errors
-
ADT-based langs like ocaml or haskell use types to help ensure correctness but also to design programs and both attribute & enforce axioms
-
type systems with dependent & refinement typing like agda’s or f*'s are capable of encoding the entirety of programs "purely as types" because predicates are part of the types and the only other code needed is pattern matching (type deconstruction) and function calls (which, by including recursion, includes looping.)
the ultimate use of types is their implicit computation of every implied fact, so that the programmer never needs to specify any implied (redundant) information. this entails some kind of solver.
the power of function types is that they describe all λ-exprs and that they’re explicit. data types vs functions is a false dichotomy. that’s why it’s so difficult to decide how abstract to make data types; should they be higher-kinded? should their constructors take function parameters, or should we use functions on the data type whose constructors take data parameters? these questions are unnecessary and can easily lead one to waste time trying to specify the best definitions of data types—ones that’re flexible and elegant and work together. functions already satisfy all of those conditions, and are anonymous.
| Note | the following entails a summary of christophe calvès' series of articles on types |
val currencies: Set[String] = Set("EUR", "USD", "JPY")
final case class Conversion(
from: String{currencies.contains(from)},
to: String{currencies.contains(to) && from < to }
)
type ConversionRates = Map[Conversion, rate:Double{rate > 0}]
-
rate > 0 is dependent typing
-
currencies is an enum of strings to effectively identify a subset of all strings (haskell Currency = EUR | USD | JPY but more generalizable)
-
Conversion is an unordered tuple type
-
the predicate from < to ensures that: 1) pairs are of distinct currencies; 2) no pair of currencies can be specified twice, e.g. USD→JPY and JPY→USD being defined separately, and so possibly being inconsistent
-
a map is used instead of tuples to complement point (2)
-
-
| Note | in type theory, the uninhabited type is called 0; the unit type is 1, booleans are 2, &c |
in referentially transparent programs, such as those of haskell, programs are mathematical functions. i’m going to say the same thing 3 times for clarity:
-
all data are thus either program inputs or outputs, or inputs or outputs of the functions whose composition is
main. -
program inputs (hard-coded data) are passed to a function, whose output is passed to another function, …, whose output is passed to a function upon whose evaluation the program halts.
-
a datum b : b may be produced from an a by a function f : a → b. if b is to be used anywhere (which it must, if it’s to be useful,) the only way that it can be used is by being passed to another function, say g : b → c. this is equivalent to morphisms a → b → c—"a to b to c"—expressed by the function g ∘ f : a → c. entire programs are function composition; therefore all intermediate data are function parameters.
again, because all binary relations are isomorphic, and recursing on them produces all structures, all structures are isomorphic independent of relation operator. many haskell libraries, e.g. lenses, use functions instead of data. curry & uncurry demonstrate equivalence of product types and function types by being bijections between the two.
every data is bijective with a pair of inverse functions; therefore data & functions are equivalent. a common (though only as an implementation detail) example is build, which generates a list using a function. another example is StateT, which is essentially (i.e. excepting kleislihood) a chain of function compositions evaluating to a final state, like how build evaluates to a list.
--- 0 as data & function
data Void
type VoidFn = ∀ a. a
d2f :: Void -> VoidFn
d2f x = case x of {}
f2d :: VoidFn -> Void
f2d x = x
--- 1 as data & function
data Unit = Unit
type UnitFn = a -> a
unitFn :: UnitFn
unitFn x = x
d2f :: Unit -> UnitFn
d2f Unit = unitFn
f2d :: UnitFn -> Unit
f2d f = f Unit
--- 2 as data & function
data Bool = True | False
type BoolFn = a -> a -> a
true,false :: BoolFn
true a _ = a
false _ b = b
d2f :: Bool -> BoolFn
d2f True = true
d2f False = false
f2d :: BoolFn -> Bool
f2d f = f True False
--- &c
-
a nullary product type is the unit. this is why unit is written (); cf (A,B).
data Prod a ... = Prod a ... -- constructor is of type a -> ... -> Prod a ...
type ProdFn a ... = ∀ c. (a -> ... -> c) -> c
constructor :: a -> ... -> ProdFn a ...
constructor a ... f = f a ...
d2f :: Prod a ... -> ProdFn a ...
d2f (Prod a ...) = constructor a ...
f2d :: ProdFn a ... -> Prod a ...
f2d f = f Prod

data Coprod a ... = A a | ... -- each constructor is of type t -> Coprod a ...
type CoprodFn a ... = ∀ c. (a -> c) -> ... -> c
-- n represents the nth constructor
injN :: ∃ n ∈ (a ...). n -> CoprodFn a ...
injN n ... _ ... f ... _ = f n -- f :: (n -> c)
d2f :: Coprod a ... -> CoprodFn a ...
d2f = \case
(A a) -> injN a
⋮
(N n) -> injN n
f2d :: CoprodFn a ... -> Coprod a ...
f2d f = f A ... N

an example of non-obvious type equivalence as proven by inverse bijections:
data N where
Z :: N
S :: N -> N
f :: Maybe N -> N
f Nothing = Z
f (Just n) = S n
invF :: N -> Maybe N
invF Z = Nothing
invF (S n) = Just n
-
therefore N ≅ Maybe N. considering that Maybe a ≅ 1 + a, N is a solution to t ≅ 1 + t. in fact, it’s the least fixed point of the type-level function Maybe : a → Maybe a! the greatest fixed point is an infinite peano.
| Note | μ: (* → *) → * is the least fixed point operator, i.e. T ≅ μ(F). μ(Maybe) = N. this example using an alternate λ-like notation: N = μT.(1 + T) |
as you’d expect, the function version of N is a → (a → a) → a. morphisms between the GADT and such functions are obvious by now. this function is the primitive for all recursive structures.
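the function version of N, sketched as a church numeral; zero/suc/toInt are assumed names, and RankNTypes is required:

```haskell
{-# LANGUAGE RankNTypes #-}

-- the function version of N from the text: a -> (a -> a) -> a
type ChurchN = forall a. a -> (a -> a) -> a

zero :: ChurchN
zero z _ = z

-- successor: apply the step function one more time
suc :: ChurchN -> ChurchN
suc n z s = s (n z s)

-- instantiate the numeral at Int to read it back
toInt :: ChurchN -> Int
toInt n = n 0 (+ 1)
```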
-
each recursive type is the smallest solution of some type equation. this isn’t a surprise when we consider that fix can be easily used to implement recursion.
-
List a = μT.(1 + a × T)
-
streams are the greatest fixed point
-
-
-
ADTs are types expressible by relations of 0, 1, +, ×, and μ
-
BinTree a = μT.(1 + a + (T × T))
-
TODO: how to express recursive types literally instead of in terms of μ? TODO: given this, do i want to add anything to the statement that recursion is closure under function application? TODO: types are inherently for pure programs. how to apply them to stateful programs (for speed, e.g. using vector instead of list)?
now that we know function/ADT equivalence and ADTs' basis, we’re ready to consider recursion schemes: the factorization of recursive functions.
-- one base case
s1 :: a -> (Int -> a -> a) -> Int -> a
s1 base rec = f
where
f :: Int -> a
f 0 = base
f n = let r = f (n-1) -- this is why Int type is present instead of a
in rec n r
-- tail recursive version
s1 base rec n = aux base 1
where
aux res i = if i <= n
then aux (rec i res) (i + 1)
else res
fact,sum :: Int -> Int
fact = s1 1 (*)
sum = s1 0 (+)
list = s1 [] (:)
-- two base cases
s2 :: a -> a -> (a -> a -> a) -> Int -> a
s2 base1 base2 rec = aux
where
aux 0 = base1
aux 1 = base2
aux n = rec (aux (n - 1)) (aux (n - 2))
-- tail-recursive version
s2 base1 base2 rec n = aux base1 base2 2
where
aux b1 b2 i = if i <= n
then aux b2 (rec b1 b2) (i + 1)
else b2
fib = s2 1 1 (+)
type bintree a = forall c. (a -> c) -> (c -> c -> c) -> c
data BinTree a = Leaf a | Node (BinTree a) (BinTree a)
tree :: Int -> BinTree Bool
tree = s2 (Leaf False) (Leaf True) Node

i used BinTree rather than bintree because it gives a more elegant definition of tree. now i wonder about function types' utility. their beauty is symmetry: they express both functions and ADTs symmetrically, AND they encode ADTs anonymously, thereby focusing on the ADT’s form rather than its name or intended purpose. they extend the algebra of (function) types, seeing ADTs as their arrows (constructors and dual pattern matching) rather than as categories or choices or structs! therefore function types are the fundamental algebra of computation.
however, they’re troublesome to use in current languages (except maybe f*, coq, or agda, as i’ven’t learned them yet.) our programming language really should elegantly support algebraic operations on types, including implicitly solving a type-algebraic equation for a type solution. perhaps, however, recursion schemes & optics are together enough to express all programs elegantly.
at least function/data equivalence allows us to systematically derive data types from functions, which may or may not be useful.
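what "recursion schemes as factorization" looks like concretely: μ as a Fix datatype and one fold (cata) shared by every recursive consumer. the names follow the recursion-schemes literature, but the code here is a sketch of mine, not a library excerpt:

```haskell
{-# LANGUAGE DeriveFunctor #-}

-- μ as a datatype: Fix f is the least fixed point of the functor f
newtype Fix f = In (f (Fix f))

-- catamorphism: the one fold factored out of every recursive consumer
cata :: Functor f => (f a -> a) -> Fix f -> a
cata alg (In x) = alg (fmap (cata alg) x)

-- List a = μT.(1 + a × T), as in the text: the base functor
data ListF a r = NilF | ConsF a r deriving Functor

fromList :: [a] -> Fix (ListF a)
fromList = foldr (\a r -> In (ConsF a r)) (In NilF)

-- one algebra = one traversal; the recursion itself lives in cata
sumAlg :: ListF Int Int -> Int
sumAlg NilF        = 0
sumAlg (ConsF a r) = a + r
```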
rather than latent or general typing, by numeric typing i mean using complex numbers as the only data type. complex numbers have many useful algebraic properties and describe much of the natural world, which should describe at least most practical (cf theoretic) programs; usually programs compute things that laypeople can understand, let alone things that can be described by complex numbers! a generalization (albeit losing some algebraic properties) of binary complex numbers is arrays (n-ary numbers,) or even more generally, tensors (arrays of arbitrary nesting patterns.)
benefits of complex numbers:
-
great cardinality
-
contain the boolean ring
-
fast & efficient computation, and ubiquitous (especially regarding both cpu & gpu opcodes)
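a heavily hedged sketch of the booleans-inside-complex-numbers claim: truth values embedded as 0 and 1 in Complex Double, with conjunction as multiplication and negation as subtraction from 1. this particular encoding is my assumption, not an established one:

```haskell
import Data.Complex (Complex (..), realPart)

type B = Complex Double

true', false' :: B
true'  = 1 :+ 0
false' = 0 :+ 0

-- conjunction is multiplication on the 0/1 embedding
andB :: B -> B -> B
andB = (*)

-- negation is 1 - b
notB :: B -> B
notB b = (1 :+ 0) - b

toBool :: B -> Bool
toBool b = realPart b /= 0
```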
benefits:
-
consider whole program at once. no being lost in detail.
compose pointfree operators:
| how | lang |
|---|---|
| threading macro (esp. supporting insertion point via underscore) | lisp |
| pointfree composition | haskell |
| concatenative programming | apl |
ideally the language would infer pointfree, e.g. (+ car last) would be shorthand for (λ (a) (+ (car a) (last a))). haskell’s applicative → is decent—(+) <$> head <*> last—but lacks elegant generalization (viz nesting.)
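such inference amounts to a lifting combinator; a sketch in Python, where fork is a hypothetical name for the lift:

```python
# fork lifts (f, g, h) into λa. f(g(a), h(a)); the pointfree
# shorthand (+ car last) becomes fork(add, head, last).
def fork(f, g, h):
    return lambda a: f(g(a), h(a))

head = lambda xs: xs[0]
last = lambda xs: xs[-1]
add = lambda x, y: x + y

sum_ends = fork(add, head, last)  # λa. head(a) + last(a)
```

nesting is where this stops being elegant: fork only handles one shape of application, which is the generalization the text says haskell's applicative style lacks.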
no programmatic entities should be given names; they should be given symbols that are either arbitrary, or correspondent (e.g. ∧ & ∨, whose vertical inversion describes their duality,) or common not for their use, but for their behavior, e.g. + & × are used when they’re defined to obey the common identities, associativity, &c. the reason to never name based on usage is that:
-
the name is not as descriptive as the definition itself
-
definitions are often modified incrementally as new uses arise, but names do not support such small/elegant alterations where the new name describes its difference from the original
-
homomorphisms abound. it should be assumed that in every case where something has some purpose, there’s a separate case where the purpose is different but analogous. having separate names for entities with closely related mathematical definitions hides their similarity. finally, there are too many axes of similarity for words to elegantly describe: in addition to homomorphic (a difference of context,) things may differ in abstraction, implementation, arity, axioms, &c. composable symbols are the best (and arguably the only decent) notation that we have.
Note: functions are called words
-
purely functional: all functions implicitly have the stack as the only argument. thus each function is implicitly a stack endomorphism.
-
no arguments are named. no local binds.
-
refactoring functions is practically a non-issue compared to applicative languages
-
it’s like whole programs are implicitly in the threading macro
-
-
satisfies algebraic design; functions are the only elements and composition is the only operation on them. this allows us to see the program for its structure rather than purpose.
-
e.g. stack words bi & dup are \f g → \x → [f x, g x] and \f g → \x → g x (f x) where x is on the stack.
-
-
plural symmetry: returning or accepting multiple values is no different from one
-
composing variadic functions is just as easy as unary ones. this enables interesting tacit programming.
-
-
prefers many simple functions over fewer complex ones. this encourages writing higher-order functions and makes programs tacit, again preferring a composition of many small functions to create various composite functions on-the-fly still without requiring much code
-
functions are printable
-
lisp-like macros (homoiconic)
-
continuations (each of which is a tuple of stacks)
-
coroutines
-
exception handling
-
it’s interesting that the word short can modify head &c to take what’s available instead of erroring. i should try to implement that in scheme.
-
om seems to be the best catlang. however, it needs funding & development.
-
joy, factor, forth, seem to be the best available catlangs. however:
-
forth is like C: no types, so reflection isn’t feasible; fast, low-level, less suggestive of functional paradigm
-
despite being beautiful and algebraic, joy is apparently, at least currently, slower and less practical than factor.
-
i’m choosing factor as the language that i’ll use at least until om is ready.
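the short idea from above (a modifier that makes head &c take what's available instead of erroring) can be sketched as a higher-order word; in Python, with hypothetical names:

```python
# `short` wraps a word so it takes what's available instead of
# erroring: on an out-of-range access it falls back to returning
# its whole input unchanged.
def short(word):
    def safe(xs):
        try:
            return word(xs)
        except IndexError:
            return xs  # the word asked for more than is available
    return safe

head = lambda xs: xs[0]
take2 = lambda xs: [xs[0], xs[1]]

r1 = short(head)([])        # [] instead of an error
r2 = short(take2)([7])      # [7] instead of an error
r3 = short(head)([1, 2])    # normal behavior when enough is available
```

in scheme the same shape works with a guard (with-handlers in racket) around the wrapped word.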
as i’ven’t yet identified a method for determining an algebra from a set of needs, here i’ll fumble with vagaries that can be explored.
-
goal: all specific/complex functions have small, simple, pointfree definitions. this requires good choice of common/fundamental functions.
-
generation functions that guarantee certain data forms
-
implies that other functions don’t need to check their inputs
-
abstract structures are defined by their axioms/behaviors; being algebraic, they aren’t defined in terms of particular data. differently, data structures always contain particular arbitrary data, and are defined for fast particular operations, viz search, get, set. an abstract consideration of data structure is concerned both with the algebraic properties of the data and with storing the data such that desired operations are efficient.
an example is the heap: it requires its elements be totally ordered. the definition [implementation] of the structure is strictly dependent on this property. therefore the structure itself is imbued with algebraic truth, allowing simpler definitions of search, get & set—at least when search is a predicate only of the ordering, e.g. defined in terms of <, not, and ordered constants. a search for numbers that divide 3 would be no better here than in a data structure defined without regard to algebraic properties. minheaps or maxheaps even enforce O(1) access of a set’s min or max. very cool.
it’s silly to choose a structure; it’s more sensible to identify relevant algebraic properties of data, then identify a structure defined about those properties; data structure should always be derived from abstract structure unless you’re using a probabilistic data structure, in which case obviously the structure should correspond with a probability distribution of certain events.
structures are ranked by their specificity (to a problem) and speed for a set of operations. there are only two data structure operations: traverse & transform.
-
traversal (identify a subset of elements)
-
arbitrary element(s)
-
traverse a proper subset of elements
-
traverse all elements
-
-
particular element(s) e.g. max of maxheap
-
particular element(s) as determined by (particular) predicate, e.g. predicate (> 5) for a heap
-
-
reshape
-
reindex (e.g. matrix transpose or reversing a sequence)
-
rearrange (change a graph’s edge set)
-
resize
-
insert at arbitrary position
-
add to or extend a side (concat, cons, snoc)
-
delete
-
-
-
traverse generalizes get & set from one element to subsets. anything that can be gotten can be set or traversed.
-
note, however, that some structures, like red/black trees, spend effort to reshape themselves after a set
-
-
each structure permits particular traversals, and, because all structures are relations among data, all traversals can be expressed as recursion schemes
traversal is partitioned into 3 classes because reasoning about each can be quite different depending on the structure. for example, traversing an element in a list is similar to but a bit easier than traversing a substring, traversing the whole list is easiest since that’s what we already have, and traversing all but the last element is the slowest possible traversal of a (singly) linked list. parallelism is no consideration for individual traversals, and easiest to consider for complete traversals.
for efficiency, all structures should be built & consumed non-strictly; then consume ∘ produce allocates no memory, and one can map over the structure multiple times, and have those automatically combined into a single traversal. to implement non-strict in an otherwise strict runtime, use sequences: functions from index to element. notable index types are 0, Int, (Array Int). examples of sequence superiority: 1) range 10 doesn’t allocate memory; 2) traversing [2 3 4], then traversing [6 4 7] is faster than traversing [2 3 4] ++ [6 4 7] since it elides O(n) concatenation.
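a sketch in Python of sequences as index functions, showing that two maps fuse into one traversal with no intermediate allocation (the names are mine):

```python
# a sequence is (length, function from index to element); no element
# exists until asked for.
def seq_range(n):
    return (n, lambda i: i)

# mapping composes onto the index function. no traversal happens here,
# so consecutive maps fuse automatically into one pass.
def seq_map(f, s):
    n, at = s
    return (n, lambda i: f(at(i)))

def seq_to_list(s):  # only here do we actually traverse
    n, at = s
    return [at(i) for i in range(n)]

# two maps over a range: nothing is allocated until seq_to_list
s = seq_map(lambda x: x * x, seq_map(lambda x: x + 1, seq_range(5)))
```

`seq_range` allocates nothing, mirroring the claim that `range 10` doesn’t allocate memory.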
generators are easily represented by closures:
(define (nats [n 0]) (cons n (λ () (nats (add1 n)))))
(nats) ; '(0 . #<procedure>)
((cdr (nats))) ; '(1 . #<procedure>)
;; print 1 through 10
(let loop ([p (nats)])
(let ([n (car p)])
(when (< n 10)
(displayln n)
(loop ((cdr p))))))
;; fusion
(define (cs-map f s) (cs-fold (λ (e acc) (cons (f e) acc)) '() s))
i call generators of this style cons sequences. i just realized that’s an unintentional pun…maybe i’ll rename it later. anyway, it’s just non-strict looping, just returning the current value and a thunk to loop again (or not loop again.)
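the same cons-sequence idea in Python (nats and cs_fold are my translations, with short-circuiting via a done predicate):

```python
# a cons sequence is (value, thunk), or None when exhausted.
def nats(n=0):
    return (n, lambda: nats(n + 1))

# left fold over a cons sequence; the `done` predicate lets it
# short-circuit, so infinite sequences are fine.
def cs_fold(f, acc, s, done=lambda a: False):
    while s is not None and not done(acc):
        v, thunk = s
        acc = f(acc, v)
        s = thunk()  # force the next cell only when needed
    return acc

# collect the first 10 naturals from an infinite sequence
first10 = cs_fold(lambda a, v: a + [v], [], nats(),
                  done=lambda a: len(a) >= 10)
```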
-
it’s like a fold, but
-
always a left fold
-
can short circuit
-
supports infinite sequences
-
evaluates elements only as needed
-
using multiple states is just as easy as one. e.g.
foldl (\(s,t,u) a → (s',t',u')) (s0,t0,u0) vs (let loop ([s s0] [t t0] [u u0]) (loop s' t' u')). the former is easy because of pattern matching. however, in languages without pattern matching, it’d be a hassle.
-
semi-stateful: each closure has its own state, but closure states are independent, so closures can be freely duplicated and passed to various other closures
-
states can be saved and resumed later
-
-
terminating generators can simply return any non-pair when they’re done. the empty list is a good choice.
-
-
converting a closure to a list does not really make sense; we’re using closures instead of lists.
-
in a language whose recursion is inefficient and/or a non-functional language, we’d instead return the address of a function to invoke, rather than the function itself
-
using non-strict thunks like this is just waste if a strict version would serve as well. a good language must default to strict eval, using non-strict only when the code implicitly requires it
another benefit of cons-sequences: on particular iterations of their execution, they can perform stateful actions, which other cons-sequences can use. consider the following code:
(for/fold ([i 1]) ([x X] [y (Y X)]) ; where X & Y are sequences
  (+ (* i x) y))
ideally X would be traversed only once. this would be a necessity if X performed stateful actions during its execution. (why state instead of returning usual values & rolling state in a list? dunno; maybe this functional approach is always better.) but this would require Y to be a cons-sequence, since it’d need to take each x ∈ X as an arg. at least, if not needed nor cleaner, it saves redundant computation.
i have multiple loops that generate sequences of differing lengths. the way that i want to use these lists together is particular—not simply shortening lists to the smallest’s length, nor padding shorter lists with default values to make them all the length of the longest list.
solution: with sequences or cons-sequences, instead of, on any given iteration, omitting a return value, we can return a zero value; thus padding would be natural; this supports complex padding patterns. it would be interesting to consider how cons-sequences could be redefined to support dependency on other cons-sequences. this immediately seems to relate to frp, non-strict eval langs like haskell, and agent-based models like erlang; such a model may be better for coding in general. of course, the second that you mix functional/applicative and stateful programming, the question "why not use only one" arises, and it’s such a valid question that we immediately see that there cannot be a good reason to combine them; they’re each orthogonal bases. non-functional applicative programming is just functional programming before people understood how to do it properly. purely stateful programming is what actually physically occurs, and so must always be used. therefore fp is the λ-calculus (plus some extra needless complexity in many languages): a dsl for composing functions, whose final product supports a transform to stateful machine code.
(define (even-nats [n 0])
  (if (even? n)
      (cons n (λ () (even-nats (add1 n))))
      ;; strict eval if we'ren't returning
      ;; a value on this iteration
      (even-nats (add1 n))))
apply, unquote, and cons in formals
for a language to feature both apply and . in formals (e.g. (define (f x . xs))) is redundant.
;; unary f
(define (mmap f xs)
(if (null? xs)
'()
(cons (f (car xs))
(mmap f (cdr xs)))))
;; variadic f
(define (mmapm f . xss)
(if (ormap null? xss)
'()
(cons (apply f (map car xss))
(apply mmapm f (map cdr xss)))))
;; same, but latter is less efficient b/c it maps twice.
;; ideally we'd not have variadic functions; instead we'd
;; use 1. a function that transforms ((1 2 3) (a b c))
;; into ((1 a) (2 b) (3 c)) with 2. apply (or unquote).
(mmapm cons '(0 1 2) '(a b c))
(mmap (curry apply cons) (mmapm list '(0 1 2) '(a b c)))
also apply (at least as it’s defined in racket) is just a less-capable form of unquote-splicing:
(apply + '(1 2) '(3 4)) ; fails b/c apply expects only its last argument to be a list
(apply + (append '(1 2) '(3 4))) ; works
(+ ,@'(1 2) ,@'(3 4)) ; obviously works
(+ ,@'(1 2) 3 4) ; we can unquote only where necessary
this form of unquote splicing is available only in lisps that allow it outside of a quasiquotation.
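for comparison, python's argument splat behaves like the more capable unquote-splicing: splices may appear anywhere in a call, not just at the end (apply_ below is a hypothetical helper standing in for apply):

```python
# apply-style: only a trailing list can be spliced...
def apply_(f, *args):
    *front, last = args      # everything before the last arg is literal
    return f(*front, *last)  # the last arg must be the list

# ...whereas the splat syntax splices anywhere, like ,@
a, b = [1, 2], [3, 4]
r1 = apply_(max, 0, [1, 2])  # apply-style: the list must come last
r2 = max(*a, *b)             # splice both lists
r3 = max(*a, 3, 4)           # splice only where necessary
```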
TODO: see ~/codenotes/plurality-in-lisp.rkt about fusion. the above cons sequences do not support fusion.
TODO: all data structures are graphs; how does graph theory relate to identifying data structures? graph theory obviously considers traversal & structure. it’s also related to group/galois theory, which concerns symmetries. frankly, all discrete mathematics should be considered by one structure. see Graph Theory by Russell Merris (Wiley Series in Discrete Mathematics & Optimization.)
you may generally think of data structures as graphs in the graph theory sense. for example, a zipper (a duple of lists) would be considered as the disjoint union of two paths. this is a rather strange yet apparently correct description. not being well studied in either group or graph theory, i can’t comment further, but i assume that both disciplines enlighten us to better interpretations of zippers, as with any other data structures. certainly we can interpret every data structure as a graph, and optimize the structure by
-
optimizing traversal: minimizing the shortest path between given nodes
-
optimizing balancing: minimizing the difference between a graph and the rebalanced graph
as with everything, data structures should not be seen as "some few things each complex and worthy of study." instead, usually, each structure should be seen as no more particular than one number out of an infinite number of numbers. structures should be implemented on-the-fly just like lambdas. this is almost always feasible because:
-
most structures are built on few symmetries
-
structures can be defined by other structures
typed racket’s array library (part of the math package) usefully supports three varieties of strictness:
- strict: array wrapping a vector. evaluation changes vector in memory.
- non-strict: function from indexes to element. recomputed on each eval.
- lazy: memoized function
or, as a table:
| strictness | caches values? | evaluates indexes → value function |
|---|---|---|
| strict | yes | on each evaluation |
| non-strict | no | on every evaluation at each index |
| lazy | yes | on first evaluation at each index |
every structure permits terse, elegant traversals & reshapes when these functions are written in terms of the structure’s symmetries. reasoning by symmetry allows easily identifying solutions that would be very difficult to reason about by studying "frame-by-frame" updates to structures. non-coincidentally this is the same as reasoning about recursion: it’s difficult to trace every function call, but much easier to understand in terms of closure and base & recursive cases.
for example, it’s difficult to understand folds over rose trees unless you understand the rose tree’s symmetry.
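a fold over rose trees, sketched in Python: the symmetry is "a node is a value plus a list of subtrees," so the fold mirrors exactly that shape (representation and names are mine):

```python
# rose tree: (value, [children]); the fold mirrors that shape exactly:
# fold each child, then combine with the node's value.
def fold_rose(f, t):
    value, children = t
    return f(value, [fold_rose(f, c) for c in children])

t = (1, [(2, []), (3, [(4, [])])])
total = fold_rose(lambda v, rs: v + sum(rs), t)           # sum of all values
depth = fold_rose(lambda v, rs: 1 + max(rs, default=0), t)
```

no frame-by-frame tracing is needed: each combining function only states what happens at one node given already-folded children.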
rose trees and zippers are simple compositions of lists. rose trees use particular nestings of lists whereas zippers use multiple non-nested lists. how could i identify particular substrings of a list? a substring is defined by the triple (list, start, end). multiple would then be [(list, start, end)], but i know that the list is constant over all substrings, so i can factor it out: (list, [(start, end)]). that’s technically a "new" data structure. if we want only to traverse the list of intervals in any order, or fifo, then we’re done. however, if we want to traverse in order of substring length, then we store them in a data structure defined by order, e.g. a heap. for this we’d need to tell the heap to sort on end - start. racket’s data-lib binary heaps are constructed over a <= operation.
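the factored structure (list, [(start, end)]) with a heap ordered by length, sketched with Python's heapq; the tuple ordering on (end - start, …) plays the role of racket's <= constructor:

```python
import heapq

s = list("abcdef")
intervals = [(0, 4), (2, 3), (1, 5)]   # (start, end) pairs over s

# heap entries are keyed on end - start, i.e. substring length
heap = [(end - start, start, end) for start, end in intervals]
heapq.heapify(heap)

# pop substrings in order of increasing length
by_length = []
while heap:
    _, start, end = heapq.heappop(heap)
    by_length.append(''.join(s[start:end]))
```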
-
array
-
static
-
dynamic
-
-
cons/pointer digraph
-
skip list
-
DAG
-
tree
-
balanced tree
-
binary search tree, heap, splay tree
-
-
rose tree
-
finger tree
-
list
-
stack
-
ring buffer
-
alist
-
-
-
-
-
hashmap
-
zipper
-
difference list (purely functional substitute for doubly-linked list)
hashmaps are an interesting solution to making alists faster. however, with ordered keys, splay trees may be better than hashmaps since they grow & rebalance well, whereas, depending on the hashmap implementation, the hashmap may not grow quickly (say, if it uses dynamic arrays under the hood.)
Note: efficient general-purpose structures
this section currently isn’t even attempting to be complete
where these structures should be used is debatable. however, they’re listed here simply because they’re impressive, independent of their suitability. they’re efficient for many applications where not much forethought is put into the nature of their data.
-
finger tree
-
rope
-
skip list (apparently generally superior to balanced trees)
here we’re assuming that we’re choosing a data structure instead of a generation function or recursion scheme, in case it’s worth entertaining.
it’s far too easy to assume a structure or framework simply because of its popularity or support from builtin functions. we need to plainly but carefully consider our reasons for using given structures:
-
efficiency
-
speed
-
memory
-
-
elegance/naturality
-
convenience (it and/or functions on it are already implemented)
-
recommendation (either explicitly or implicitly, e.g. being a language’s builtin type)
instead ponder:
-
why are you putting your data in a structure? why do you need to structure unstructured data? what properties should your data have after being structured? what’s the least structure that you need to implement in order to achieve the desired relations of data?
-
how will it be generated? (function output)
-
how will it be consumed? (function input)
you may make your own structure (which should be very easy if you follow these best paradigms) or you’ll know which of many already-available ones to choose.
structures are for code, not readability! whatever structure you impose—whether data structure, abstract structure, or structured functions, do so expressly to the end of expressing program logic better. an example is recursion schemes. not an example is a Person struct of name, age, &c. that’s descriptive data, not programmatic data! all descriptive data should be stored either in a database or by common types, here (again, like a database) as a matrix: an unordered list of vectors of known size. always use basic structures when they’ll do. just like you should never assume use of basic structures for implementing program logic, so should you never consider anything beyond basic structures for descriptive data, i.e. data that isn’t calculated in a way that significantly affects the program’s behavior.
suppose ads := [a…z] ↦ [b - a, …, z - y]. the following are various implementations:
(define (ads s) (map - (cdr s) (drop-right s 1))) ; if (drop-right _ 1) is O(n), then this impl is O(2n)
(define (ads s) (map - (cdr s) s)) ; O(n). assumes that map returns at end of shortest list rather than requiring all same length
(define (ads s) (let r ([p (car s)] [s (cdr s)]) (and s (let ([e (car s)]) (cons (- e p) (r e (cdr s))))))) ; O(n)
(define (ads s) (reverse (cdr (foldl (λ (e t) `(,e . (,(- e (car t)) . ,(cdr t)))) `(,(car s) . ()) (cdr s))))) ; O(2n)
(define (ads s) (and s (cdr s) (cons (- (cadr s) (car s)) (ads (cdr s))))) ; O(n)
_TODO ; stack paradigm, implemented by a zipper
(define-syntax-rule (sp/ a is ...) (array-slice-ref a (list is ...)))
(define (ads a) (array- (sp/ a (:: 1 #f 1)) (sp/ a (:: 0 (sub1 (array-size a)) 1))))
Note: versions 3 & 5 assume that the empty list is the false value. in scheme we’d say (if (or (null? s) (null? (cdr s))) s …) instead of (and s (cdr s) …).
version 5 is the best (fastest & simplest) list implementation.
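for comparison, the same function in Python; zip stops at the shorter argument, which is the behavior version 2 assumes of map:

```python
# ads [a..z] -> [b - a, ..., z - y]: a single O(n) pass.
# zip(s, s[1:]) pairs each element with its successor and stops
# at the shorter argument, so no length bookkeeping is needed.
def ads(s):
    return [b - a for a, b in zip(s, s[1:])]
```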
both:
-
relate arbitrary data
-
support nesting, which means multidimensionality like a matrix (an array (the primitive relation) of arrays) or a matrix of matrices (which supports flattening). the former doesn’t increase depth, but the latter does.
-
-
are equally apt for iteration
-
run in parallel just the same (on cpus): we can perform multiple map operations in separate (virtual) threads.
-
pointwise ops use one gpu cycle, so arrays are, only on such architectures, faster than lists.
-
-
support n-ary operations (in scheme, map is like haskell’s zipNWith)
neither suggests a traversal; traversals are problem-specific. iteration clearly depends on shape, e.g. a tensor, cycle, general graph with(out) cycles, DAG, tree, list, stack.
arrays (tensors):
-
random access
-
a structure that’s nothing more than direct data access. it’s the simplest arbitrary-size random access data structure possible.
-
-
generalize bitwise operations
-
can encode:
-
graphs
-
linear transformations
-
-
fixed rectangular shape (determined by the size of each axis)
-
O(1) get & set
-
O(1) length
-
O(1) removal or duplication of axes
-
O(1) transpose
-
slow addition of shape
-
parallelizable pointwise ops (since matrices can be considered as columns or rows)
-
especially good for combinatorics (for both efficiency/parallelism and expression elegance)
linked list (graphs):
-
sequential access
-
support ragged matrices
-
O(1) push & pop
-
O(m) get & set
-
O(n) length
-
purely functional
zippers:
-
eliminate the trouble of choosing take vs split-at; they’re effectively the same. to split a zipper is the same as searching through a list.
-
naturally iterate: input on the right, output on the left if all elements are consumed; else old elements on the left and a separate output list.
-
generalize to arbitrary dimensions. xmonad’s StackSet is an example of a non-linear zipper
-
are dual to queues. both are pairs of lists that transform into one list by reversing one list and appending to the other, but queues do so in a slightly different manner, and thus achieve different functionality.
-
queues and zippers are alternatives to doubly-linked lists.
-
as lists are to graphs, matrices (or vectors, depending on processor architecture) are to tensors: graphs or tensors are the most general mathematical structure, but are programmatically implemented in terms of lists or matrices.
zippers are not a particularly fundamental structure; they’re given as an example of a possible improvement on a list—namely when one would want to take or split a list. this segues into another point: each graph structure must be designed for particular traversals. however, this is not true of arrays! arrays support all possible traversals over arrays equally well, since arrays are O(1) get & set. therefore arrays are the default solution if you don’t know how you’ll traverse your data, but you know that it won’t change size.
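a list zipper sketched in Python: a pair of lists with the focus at the head of the right list, so splitting is just reading off the pair (names are mine; with linked lists the focus move is O(1), python lists are used here only for illustration):

```python
# zipper: (left, right); left is stored reversed so the focus
# sits at the head of both lists.
def zipper(xs):
    return ([], list(xs))

def move_right(z):            # shift the focus one element to the right
    left, right = z
    return ([right[0]] + left, right[1:])

def split(z):                 # splitting is just reading off the pair
    left, right = z
    return (list(reversed(left)), right)

z = move_right(move_right(zipper([1, 2, 3, 4])))
```

searching through the list and splitting it are the same walk: move the focus until the predicate holds, then read off the pair.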
it’s impossible to make strict purely functional doubly-linked lists. however, lazy ones are possible. being that even strict languages support non-strict functions by wrapping the original function in a lambda [thunk], construction &al traversals can be non-strict, thus making purely functional doubly-linked lists possible when implemented purely by functions rather than data structures i.e. the "data" of the structure is stored entirely as lambda arguments.
i’m also considering a synchronizable untyped structure (struct fs (f stack v)) (or encoded as a closure): elements are pushed to it until a condition is met, and then (begin (set! v (apply f stack)) (set! stack '()))
-
arrays are superior for fixed-shape rectangular data. graphs are better for data of irregular pattern and shape
-
when you don’t know what variety of traversals you’ll be doing on your data, but you know that the data’s shape won’t change, then arrays are best because all their supported traversals (and mutations) are equally fast
-
typed racket’s math’s array library (at least when using non-strict arrays) uses relative indices, so:
-
rotating an array is as simple as changing its starting offset.
-
slicing, reshaping, removing axes or duplicating axes are O(1)
-
-
-
if slicing operations seem ineffective, consider them on the transpose of an array
lists aren’t as powerful as arrays, but i’ve yet to identify which powers are useful:
-
we can start & end array iteration at arbitrary indices, whereas we must start list iteration at its beginning, and though we can stop iteration after an arbitrary number, we cannot express this number in terms of its length without having O(n) traversal.
-
suppose we’ve a db-style m×n matrix (rows are records, cols are fields.) in this case, the array’s shape is known at compile time. we can just as well declare n lists each of length m. if we need fast lookup, we can use a treeset. it’s not O(1), but O(log₂ n) isn’t bad. a proper B-tree or splay tree or skip list (the latter two are implemented in racket’s data-lib) is even better. this being said, matrices don’t need balancing!
-
vector search always returns an index; if you want the index, then there it is; however, if you want the value, then it’s trivial to get the value given an index. lists support search, but would need to return the index and value simultaneously in order to satisfy both needs without writing multiple functions. furthermore splitting lists would require either even more nearly-identical function definitions, or a more complex single function. splits are most elegantly done with zippers, but zippers are still less efficient than arrays, even if they’re on the same order of magnitude.
verdict: sequences (lazy lists) and vectors defined by index functions to elements are best; these don’t require upfront allocation (or, often, any allocation,) and they support fusion. array implementations often come with better functions than lists; such functions should be defined & used for lists, too. for example, scalar extension is done on arrays of any shape (tensors,) but this functionality isn’t standardly implemented for lists; for lists you’d need to nest map multiple times, and you’d need to make the nesting relative to the nested lists' shape. (optics & recursion schemes solve this.)
open questions:
-
how commonly do structures sizes change? why?
-
when (and how often) do we know the shape of our data when we initialize it?
j & apl, and probably both burlesque & bqn.
advantages:
-
array-based. array algebra, because it works on multiple data automatically, is a simpler alternative to iteration; rather than performing an operation on each element, an operation is performed on all elements at once. furthermore we can consider a whole structure easily rather than using functions that go element-by-element, which are inelegant whenever we need to consider specific substructures.
-
implicit plurality & mapping
-
simple & regular
-
polymorphic
-
algebraic (viz boolean and [modular] integer rings)
-
suggestive. they’re small but particular languages.
-
instead of defining polymorphism or kwargs, abtl uses idioms, which can be modified inline as different functionality is desired.
-
relieves burden of designing collections of functions modularly, which usually entails careful forethought about polymorphism, default values, and when functions should be split into multiple smaller functions.
-
an example is apl’s grade operators: they break a sort operation into multiple parts, so that one may easily choose only part of a sorting operation, or the whole thing, without any appreciable difference in effort.
-
-
reshaping uses modular arithmetic
-
(in apl at least) rank differs from depth: a matrix (array of arrays) has depth 1 but rank 2. "The length of the shape of an array is equal to its rank. Therefore, we can find the rank of an array with ⍴⍴—the shape of the shape. Since a scalar is rank 0 (i.e it has no dimensions) the shape of a scalar has length 0 and is an empty vector: ⍬"
-
scalar operations are applied to all points without changing shape
-
in α f ω, if α or ω is scalar, then f is applied to all elements of ω or α (this is called scalar extension). if both are arrays of the same shape, then f is applied pointwise. otherwise incompatible shapes raise an error; in t.racket’s math/array, if shapes differ, then the smaller shape is repeated until it’s the same size as the larger array (this is called broadcasting). this makes expressions like 1 2 3 + (3 2 1) (¯1 ¯2 ¯3) ((44 71 11) 1 (32 0.5)) valid. this example’s β-reduction is (4 3 2) (1 0 ¯1) ((47 74 14) 4 (35 3.5)); 1+ is mapped over (3 2 1), 2+ is mapped over (¯1 ¯2 ¯3), and 3+ is mapped over ((44 71 11) 1 (32 0.5)).
-
this expression is invalid in t.racket’s math/array (error "expected rectangular data"). however, for arrays of the same shape, broadcasting is obvious: (array+ (array #[1 2 3]) (array #[#[3 2 1] #[-1 -2 -3] #[44 71 11]])) ⇒ (array #[#[4 4 4] #[0 0 0] #[45 73 14]]). see §6.3.1 broadcasting rules for complete spec. i haven’t understood the rules technically enough to want to try to find an array representation that accomplishes the same result as the abtl (viz apl) version.
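apl-style scalar extension over nested lists can be sketched in Python as recursive pervasion (not math/array’s rectangular broadcasting; pervade is my name):

```python
import operator

# pervade applies a binary op through nested lists, extending a
# scalar on either side to match the other side's nesting.
def pervade(f, a, w):
    if isinstance(a, list) and isinstance(w, list):
        assert len(a) == len(w), "length error"   # apl-style mismatch
        return [pervade(f, x, y) for x, y in zip(a, w)]
    if isinstance(a, list):
        return [pervade(f, x, w) for x in a]      # w is scalar: extend it
    if isinstance(w, list):
        return [pervade(f, a, y) for y in w]      # a is scalar: extend it
    return f(a, w)                                # both scalar: apply

# the ragged example from the text: 1 2 3 + nested vector
r = pervade(operator.add, [1, 2, 3],
            [[3, 2, 1], [-1, -2, -3], [[44, 71, 11], 1, [32, 0.5]]])
```

this reproduces the β-reduction given above, which math/array rejects as non-rectangular.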
-
TODO: interpret this section into terms of recursion schemes, then generalize to arbitrary computations, generalizing matrices if/when necessary.
recursive → iterative → matrix. credits to https://rybczak.net/2015/11/01/calculation-of-fibonacci-numbers-in-logarithmic-number-of-steps/
fib n := fib (n - 2) + fib (n - 1), fib 0 := 0, fib 1 := 1
TODO: how was this reformulation derived from the recursive form? what subset of recursion schemes support this optimization?
fib n = go n 0 1 where go k a b = (k == 0) * a + go (k - 1) b (a + b) # branchless style
the iterative definition is a linear combination, as in the recursive case, even in branching form. therefore it can be expressed by a matrix. being that fibonacci is a 2nd-degree recscheme (i.e. over 2 recursive variables) our matrix will be a 2x2: [[0 1] [1 1]], which satisfies the recursive identity: T * [[Fn] [Fn+1]] = [[Fn+1] [Fn + Fn+1]] = [[Fn+1] [Fn+2]]. it preserves the original increment between the 1st & 2nd recursive variables, i.e. the difference between (n + 1) - n = (n + 2) - (n + 1) = 1. the transform is applied to the recursive variables column vector.
-
it’ll always be a square matrix because we aren’t adding nor subtracting degrees of freedom / variables.
-
notice that 0 & 1 here are not mere scaling factors; they determine absence or not of variables, which means that they describe the actual code (inclusion of identifiers in code or not.)
-
this is exactly the same as branchless programming or using predicates as masks.
-
we can use matrices of 0’s & 1’s to both describe and computationally encode the structure of algorithms themselves. of course matrices might not be able to well describe any code (though they may perhaps) and so we’d use other structures such as graphs or semilattices to describe program logic. the significance of such encodings is that these structures exhibit known mathematical properties, and so can be systematically optimized/reduced, transmuted, compared, and other goodies. this is in the direction of computationally producing, optimizing, and hacking together programs, as well as formally discussing the structure and therefore form of programs independent of each their expected runtime contexts/purposes.
-
as with all vector spaces, matrices encode linear combination. linalg is neat & useful, but how do matrices compare with other algebraic structures for their description and computability of programs? btw macros don’t count; they are programs that produce others, but they don’t compute programs algebraically; they just follow arbitrary instructions. it’s impossible to write a macro to optimize some source code. syntax trees are not algebraic.
-
-
thus the iterative form is reformulated into Fn := (matref (T^n * [[F0] [F1]]) [0 0]). we can compute matrix exponentiation, but as is common with any recursion, e.g. factorial, we can preferably omit redundant intermediate operations. in fact, one can study any system’s evolution to see common patterns (e.g. self-similarity or other forms of modulus) and then factor-out those patterns and apply any reductions. perhaps rather than "recursion" you may think of
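the matrix form, sketched in Python with exponentiation by squaring, computes F_n in O(log n) 2×2 multiplications:

```python
# 2x2 matrix multiply
def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

# T^n by repeated squaring: O(log n) multiplications
def matpow(T, n):
    R = [[1, 0], [0, 1]]          # identity
    while n:
        if n & 1:
            R = matmul(R, T)
        T = matmul(T, T)
        n >>= 1
    return R

def fib(n):
    T = [[0, 1], [1, 1]]
    # T^n applied to the seed column [F0, F1] = [0, 1] yields [Fn, Fn+1];
    # since the seed is (0, 1), Fn is just T^n's top-right entry.
    return matpow(T, n)[0][1]
```

the "redundant intermediate operations" the text mentions are exactly what squaring elides: rather than n applications of T, only the bits of n are consulted.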
if a program is described entirely by mutable data structures, such as lists or matrices, then the structure of the code is simple and lends itself to smc (self-modifying code), which makes programs adaptable during runtime, and makes their definitions simple if the program is mostly created by bootstrapping, like programming a diploid cell rather than a whole person. obviously, like the cell, such programs would take time to "grow;" that growth should be saved at various stages, or should at least be done at "compile time" rather than when the program is run by a user. saving programs is trivial when they’re just simple data structures.
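a minimal sketch of program-as-mutable-data (python, with an invented two-instruction set purely for illustration): because the program is a plain list, the running program can rewrite itself, and saving its "grown" state is just saving the list:

```python
import copy

# a program is just a list of instructions; each instruction is a tuple.
# invented instruction set: ("add", k) adds k to an accumulator;
# ("patch", i, instr) overwrites instruction i of the program itself.

def run(program):
    program = copy.deepcopy(program)  # work on our own mutable copy
    acc = 0
    pc = 0
    while pc < len(program):
        op = program[pc]
        if op[0] == "add":
            acc += op[1]
        elif op[0] == "patch":        # the self-modifying step
            _, i, instr = op
            program[i] = instr
        pc += 1
    return acc, program  # the final program is a savable "grown" state

bootstrapped = [("patch", 2, ("add", 40)),  # rewrite instruction 2 before reaching it
                ("add", 2),
                ("add", 0)]                 # placeholder, replaced at runtime
acc, grown = run(bootstrapped)
# acc == 42; `grown` can be serialized and re-run without re-growing
```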
macros are considered different from functions, which is asymmetrical. for example, we infamously can’t apply or. that’s stupid; if we just define or as a function (i.e. a manipulation of relations) then obviously there’s no problem (since all code must be defined as such):
(define ns (make-base-namespace))
(define (my-or . xs)
  (if (null? xs)
      #f
      (let ([x (eval (car xs) ns)])
        (if x x (apply my-or (cdr xs))))))
(my-or #f 3 '(error "error"))
(apply my-or '(#f 3 4 (error "error")))
both work. note that a bare (eval (car xs)) fails inside a racket module with "?: literal data is not allowed; no #%datum syntax transformer is bound", because eval doesn’t get a useful namespace there by default; passing an explicit namespace (here from make-base-namespace) fixes it.
-
apply works nicely with the fact that quoting produces a list; it allows me to effectively do
(my-or 'a 'b 'c …) without needing to quote each argument: (apply my-or '(a b c …))
-
a proper lisp would allow me to say
(if xs) because () would be falsy and #f wouldn’t exist
consider the following function. it operates on sequences and lists. that’s one degree of asymmetry, as clearly marked by case-lambda. why have both? how can we express this one simple idea (take no more than) by one simple function rather than two?
;; 1st form: take m elements if available; else take whole list, whose length must be less than m.
;; 2nd form: e.g. (let-values ([(more? next) (sequence-generate (in-naturals))]) (take-no-more-than 3 more? next))
;; the 2nd form was chosen because, at least currently in typed racket, (-> Natural (Sequenceof a) (Values (Listof a) (Sequenceof a)))
;; isn't neat to implement.
(: take-no-more-than (∀ (a) (case-> (-> Real (Listof a) (Listof a))
(-> Real (-> Boolean) (-> a) (Listof a)))))
(define take-no-more-than
(case-lambda
[(m xs0) (let loop ([count 0] [xs xs0])
(if (or (>= count m) (null? xs))
null
(cons (car xs)
(loop (add1 count) (cdr xs)))))]
[(m more? next) (let loop ([count 0])
(if (or (>= count m) (not (more?)))
null
(cons (next)
(loop (add1 count)))))]))
we begin by comparing the two. to compare them, they should be normalized, which means expressed in common terms & forms, then simplified, namely by factoring and reducing redundancies. the obvious, yet particular, solution is to accept only sequences, and convert the list to a sequence before passing it to take-no-more-than. this works because an inexpensive morphism from list to sequence exists. generally, though, one case is not a subcase of another, and such lucky, simple refactors aren’t possible. anyway, let’s compare. their difference:
-
in the list case, the state is kept in a loop variable, whereas in the sequence case it’s hidden in a closure
-
instead of null? the sequence version uses (not (more?)).
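the particular refactor above (accept only sequences; convert the list at the boundary) can be sketched in python, where iter/islice play the role of the list-to-sequence morphism (illustrative only, not the racket implementation):

```python
import itertools

def take_no_more_than(m, xs):
    # xs may be a list or any iterable; iter() is the inexpensive
    # morphism from list to sequence, after which one loop serves both cases.
    return list(itertools.islice(iter(xs), m))

take_no_more_than(3, [1, 2])             # → [1, 2] (whole list; fewer than m)
take_no_more_than(3, itertools.count())  # → [0, 1, 2] (infinite sequence)
```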
but that’s not an enlightened description; it’s a plain one, one that sees only what the code is, not what the program is. we aren’t reverse-engineering some hacky code here; this is code whose purpose we already know. we can reason about its intent rather than its machinery. it’s a function that we can reimplement any way we want so long as it fulfills its purpose. a description of it in cs terms:
terminating loop, i.e. there are a variant and an invariant. consider the whole program as a triple of code, data (corresponding to the .code & .data sections of assembly code), and presently-allocated memory (the stack, registers, heap). the variant must be represented in data, memory, or code, any of which can be called state: that which can be saved & loaded in order to suspend & resume the program, just like save states in video game emulators, or suspending & resuming an os. iff the state is saved in code, then the code is polymorphic [self-modifying]; else the state is stored in some memory structure.
.. the obvious picture is state stored in a single memory block whose value is overwritten
.. but in fp, state is stored in the parameters to the loop (a lambda/continuation). the loop address is a constant value in memory, which means that, unless the compiler/interpreter optimizes, memory is dynamically allocated during the loop body, its value is set to the updated state, that memory address is passed to the subsequent invocation of the loop, and, assuming the old state isn’t referenced elsewhere, its memory cell is deallocated. not a memory leak, but needless, inefficient (de)allocation.
.. finally, in polymorphic code the memory state is not significantly different from the code state
the variant is the tuple (count, xs or (next, more?)); the invariant is at least the loop address and the code.
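to make the variant concrete, a python sketch (illustrative): reify the variant as an explicit tuple and the loop becomes a pure step function, so a "save state" is literally that tuple:

```python
# the variant of the list-case loop, reified as a tuple (count, xs, acc).
# running to completion and suspending midway are both just operations
# on this saved state, like an emulator save state.

def step(state, m):
    count, xs, acc = state
    if count >= m or not xs:
        return None, acc                   # loop finished: yield the result
    return (count + 1, xs[1:], acc + [xs[0]]), acc

def take_no_more_than(m, xs):
    state = (0, xs, [])                    # initial variant
    while True:
        state, acc = step(state, m)
        if state is None:
            return acc

# suspend after one step, "save", then resume later from `saved`:
saved, _ = step((0, [10, 20, 30], []), 2)
# saved == (1, [20, 30], [10])
```

the invariant (the code of step, the bound m) never needs saving; only the variant does, which is exactly the suspend/resume claim above.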
but those’re the variant and invariant of the implementation. what are the mathematical variant & invariant?
TODO: identify