notes - www - Unnamed repository; edit this file 'description' to name the repository.

commit 7bc0b8035631920489f81e15033acb759349be96
parent 60f8915c59dda3618b23497ef5fbf4081adbf47a
Author: Georges Dupéron <georges.duperon@gmail.com>
Date:   Fri, 16 Dec 2016 17:34:06 +0100

notes

Diffstat:
A features-and-implementation  | 40 ++++++++++++++++++++++++++++++++++++++++
A features-and-implementation2  | 143 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
A features-and-implementation3  | 100 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

3 files changed, 283 insertions(+), 0 deletions(-)
diff --git a/features-and-implementation b/features-and-implementation
@@ -0,0 +1,40 @@
+* gc
+* automatic generation of mappings Old → New
+
+
+* safe and automatic generation of 1↔1, 1↔many and many↔1 backward references
+
+** global directives which apply to all nodes types: insert a pointer to the root, insert a backward pointer for every link (i.e. .back brings back to the previous node)
+
+** the backward references specify the link that should be reversed, as well as a path from there. That way, methods can have a backward pointer to the "call" instructions referencing them, but the backward pointer should not
+   directly point to the "call" instruction, but instead to the instruction's containing method, accessed with ".method". Maybe some syntactic sugar can allow this to be specified as #:backward-reference method.instructions.*[call]
+
+* try to think about PHOAS
+
+** Strong HOAS article: https://www.schoolofhaskell.com/user/edwardk/phoas
+
+
+
+* mapping-placeholder          = Placeholder type, one for each mapping
+* mapping-incomplete-result    = (~> mapping) types within the mapping's result type are replaced by (U mapping-placeholder mapping-incomplete-result), and node types are replaced by (U xxx old-node)
+* mapping-with-promises-result = node types within the mapping's result type are replaced by node-promise
+
+* node-promise                 = (Promise node-with-promises)
+* node-with-promises           = node types within the node's fields are replaced by node-promise
+
+
+
+
+
+
+
+
+
+
+
+To discuss:
+* maybe add a "self" variable, so that a mapping can give to others a placeholder for the node it is currently generating? That wouldn't play well with mappings which generate multiple nodes and/or part of a node.
+** self is a opaque "SELF" token, not an actual placeholder
+
+Minor TODOs:
+* If we can guarantee that any two nodes within the same graph are (not (equal? …)), then we can implement a fast negative equality test, in addition to the existing fast equality test (which compares the "raw" field of nodes).
diff --git a/features-and-implementation2 b/features-and-implementation2
@@ -0,0 +1,142 @@
+On verification
+===============
+
+* Some guarantees can be enforced statically using the type system.
+  They are in a way proven to be enforeced in the expanded code,
+  but in order to write verified compilers, it would be necessary to
+  prove that the type indeed enforces the desired property, for all
+  possible macro inputs.
+  
+  Advantage: these proofs need to be written only once for the graph macro,
+  and can be relied on by all compilers written using these static guarantees.
+* Some guarantees can be enforced statically "by construction",
+  and the design patterns used to enforce them are guaranteed to be
+  correctly applied because the code is macro-generated.
+  In order to write verified compilers, it would be necessary to
+  prove that the macro-generated code always enforces the desired property.
+
+  Advantage: these proofs need to be written only once for the graph macro,
+  and can be relied on by all compilers written using these static guarantees.
+* Some guarantees can be enforced at run-time.
+  This allows catching bugs early, and ensures no incorrect output can occur
+  when these guarantees are broken. It does not however ensure that the compiler
+  will never break these guarantees. It is therefore still possible to ship a
+  broken compiler which fails on valid inputs.
+
+  Drawback: it is necessary for every compiler written using such guards to prove
+  that the guards are never raised.
+
+
+Features
+========
+
+* GC:
+  * Group nodes by SCC
+  * Two kinds of "indices" pointers:
+    * index within the current strongly-connected component
+    * direct pointer to a descendant SCC + index within it
+  * To limit the waste when there is an SCC not containing nodes of a given type,
+    do not use a fixed tuple of vectors of nodes. Instead, use a "list with possible omissions".
+    Problem: the size of the fully-expanded type in TR is O(2ⁿ) for these.
+    Either:
+    * use casts
+    * pay the compile-time price of a huge type (not feasible with large N)
+    * pay the run-time price of bloated components (feasible but costly with large N)
+    * use a case→ function, which returns the empty vector as a fallback. With pure functions, this could be doable.
+      * Problem: generating all the possible functions means O(2^n) templates for closures that may be used at
+        run-time
+      * Otherwise, generate one closure over 1 array, one closure over 2 arrays, one over 3 arrays, etc.
+        and also close over some markers indicating to which "real" arrays these correspond.
+        This means O(2n) space usage worst-case.
+        Problem again: the construction function for such a case→ while minimizing the number of values closed on
+        has a large O(2^n) type too.
+    * typed clojure has typed heterogeneous maps https://github.com/clojure/core.typed/wiki/Types#heterogeneous-maps
+  * In theory it could be possible to give hints to the GC,
+    but Racket's GC does not offer that possibility.
+
+GC Implementation:
+* Must be optional
+* Initially, all nodes within a single component, all indices are of the "slef + index" kind.
+* Use tarjan's algorithm
+* group nodes by SCC
+  Update references to nodes in the same SCC, using just an index
+  Update references to nodes in "lower" SCCs, using a pointer to the SCC + an index.
+* Due to the higher memory cost (+ 1 reference for references to other SCCs),
+  and the poor set of options for implementation (cast, exponential size of type, or large memory usage)
+  we will not implement the GC.
+  
+  We argue that in most cases it would not be useful, as compilers will normally work on the whole graph, and
+  not on a subgraph. In the rare case where only part of a graph is needed, applying the graph-level identity
+  transformation to the nodes of interest would effectively copy them and the nodes reachable from these new roots,
+  thereby allowing the original graph to be freed.
+
+* Invariants
+  * Scope of the invariants:
+    * Input contracts, types and structural properties
+    * Output contracts, types and structural properties
+    * Transformation invariants (relate the input and output graphs)
+  * Time of verification:
+    * Run-time
+      Should be easy enough to implement: Add a define-graph-invariant form, recognize these options, and check
+      them at run-time. Implemented as some sorts of contracts.
+    * Compile-time
+      * Node types
+      * Statically-enforced structural invariants at the level of node types
+        (e.g. disallow cycles in the types, to ensure there are no cycles in the instance)
+      * Macros can be used to generate code which is known to be correct
+        e.g. paths 
+        * Concern: static "by construction" guarantees may interfere with each other, if they fill in some nodes,
+          e.g. a "no cycles starting from this node" constraint would not work as expected if a "backwards link"
+          is filled in afterwards. We probably need to hardcode a set of constraints which know about eachother
+          and about the potential interactions.
+      * Conserve well-scopedness within a transition: pass in nodes flagged with a ∀ type, and check
+        that the output contains that flag.
+      * PHOAS
+  * Specification
+    * Invariants specified in the graph type
+    * Transformation invariants specified on the graph creation code
+    * Checks (run-time or the various compile-time mechanisms) are specified in the graph creation code
+      The graph creation code must enforce all invariants within the type
+      That way, it is guaranteed that any instance of the graph type satisfies its invariants, either by
+      construction, or as a guard at the end of the construction.
+
+* Automatic generation of mappings
+  * when there is no mapping taking an old node as an input, a mapping is automatically generated.
+    The mapping simply translates the old node to the new node type, and recursively transforms its fields,
+    traversing lists etc.
+  * When there is a mapping with more than one argument, then no mapping is auto-generated for that input node type,
+    and instead calls to the mapping must be explicit (i.e. can't return the old node type).
+  * This means that we have a mechanism before the actual core graph macro, which checks and decides which mappings
+    to auto-generate.
+  * We also have a mechanism for auto-calling transformations on input node types
+  * Possibility to escape this, in order to actually insert a reference to the old graph?
+    * Some notation in the type, to indicate that it's "protected" in some way?
+    * Some wrapper indicating that it should merely be unwrapped, not processed? (I think that's better).
+
+* Structural node equality
+  * Would be nice to coalesce nodes which are equal? (i.e. hash consing)
+    so that any two nodes which are equal? within the same graph have the same index.
+    I suppose this would be rather costly: O(n log n) comparisons, with each comparison potentially
+    costly, in principle. If we cached results, we could achieve a better run-time in practice, and
+    perhaps a better theoretical complexity if we handled cycles ourselves.
+    * The general algorithm when there are no unordered sets is deterministic finite automaton minimization
+    * The general algorithm when there are unordered sets is nondeterministic finite automaton minimization
+    * We could cache all calls to equal? on two nodes for a limited dynamic scope.
+    * If we have this, then we can, for comparisons within the same graph, quickly return #t or #f based on eq?
+  * Alpha-equivalence comparisons
+    (i.e. ignore the actual values of some fields, except for constraining the shape of the graph)
+    Possibility: create values which are unique within the graph, but are ignored when comparing
+                 nodes from different graphs.
+    I'm not sure whether plainly erasing the information won't be enough, for a weaker form of alpha-equivalence?
+    * Alpha-equivalence can't easily be implemented atop equal?, dropping it for now.
+      It's not extremely useful anyway, as thanks to the full-blown graph representation, we are likely to
+      discard names early on (keeping them just for printing / debuggin purposes).
+
+
+
+
+
+
+* Coloring: the existing graph library for Racket has some coloring algorithms:
+  http://docs.racket-lang.org/graph/index.html#%28def._%28%28lib._graph%2Fmain..rkt%29._coloring%2Fgreedy%29%29
+  Maybe we can build a wrapper for those?
+\ No newline at end of file
diff --git a/features-and-implementation3 b/features-and-implementation3
@@ -0,0 +1,99 @@
+* Invariants
+  * Scope of the invariants:
+    * Input contracts, types and structural properties
+    * Output contracts, types and structural properties
+    * Transformation invariants (relate the input and output graphs)
+  * Time of verification:
+    * Run-time
+      Should be easy enough to implement: Add a define-graph-invariant form, recognize these options, and check
+      them at run-time. Implemented as some sorts of contracts.
+    * Compile-time
+      * Node types
+      * Statically-enforced structural invariants at the level of node types
+        (e.g. disallow cycles in the types, to ensure there are no cycles in the instance)
+      * Macros can be used to generate code which is known to be correct
+        e.g. paths 
+        * Concern: static "by construction" guarantees may interfere with each other, if they fill in some nodes,
+          e.g. a "no cycles starting from this node" constraint would not work as expected if a "backwards link"
+          is filled in afterwards. We probably need to hardcode a set of constraints which know about eachother
+          and about the potential interactions.
+      * Conserve well-scopedness within a transition: pass in nodes flagged with a ∀ type, and check
+        that the output contains that flag. Potentially out-of-scope fields in the input do not have the flag.
+      * PHOAS
+  * Specification
+    * Invariants specified in the graph type
+    * Transformation invariants specified on the graph creation code
+    * Checks (run-time or the various compile-time mechanisms) are specified in the graph creation code
+      The graph creation code must enforce all invariants within the type
+      That way, it is guaranteed that any instance of the graph type satisfies its invariants, either by
+      construction, or as a guard at the end of the construction.
+
+; The body should produce a function of type (→ (Listof Nodeᵢ) … Boolean)
+; The body should also return an indication about which invariants it ensures. Maybe use {~seq #:ensures (invariant-name inariant-arg …)} …
+  however this does not allow the body to perform some minimal amount of rewriting on the "ensures" options.
+(define-syntax/parse (define-graph-contract (name g-descriptor arg …) . body)
+  #'(define-syntax name
+      (graph-contract
+        (λ (g-descriptor arg …) . body))))
+
+;; TODO: find a way to translate this to a type, with subtyping for weaker invariants.
+;; The type only serves to testify that the invariant was statically enforced or dynamically checked at construction, it does not actually encode the property using the type system.
+(struct invariants-wrapper ()) ;; Use a private struct to prevent forging of the invariants aggregated in a case→ (since it is never executed, any non-terminating λ could otherwise be supplied).
+(define-for-syntax invariant-introducer
+  (make-syntax-introducer))
+;; the body should return the syntax for a type, such that less precise invariants are supertypes of that type.
+(define-syntax/parse (define-graph-invariant (name g-descriptor arg …) . body)
+  #'(define-syntax name
+      (graph-invariant
+        (λ (g-descriptor arg …) . body))))
+
+The type of a graph node with the invariants inv₁ … invₙ on the graph must include an extra dummy field with type
+(invariants-wrapper
+  (case (→ inv₁ inv-arg … #t)
+        …
+        (→ invₙ inv-arg … #t)))
+
+The invariant arguments can be symbols, to indicate node types or field names.
+
+Concern: invariants on a graph may not have the same semantics if the graph has more or fewer nodes?
+
+* Automatic generation of mappings
+  * when there is no mapping taking an old node as an input, a mapping is automatically generated.
+    The mapping simply translates the old node to the new node type, and recursively transforms its fields,
+    traversing lists etc.
+  * When there is a mapping with more than one argument, then no mapping is auto-generated for that input node type,
+    and instead calls to the mapping must be explicit (i.e. can't return the old node type).
+  * This means that we have a mechanism before the actual core graph macro, which checks and decides which mappings
+    to auto-generate.
+  * We also have a mechanism for auto-calling transformations on input node types
+  * Possibility to escape this, in order to actually insert a reference to the old graph?
+    * Some notation in the type, to indicate that the output should be an "old" node here
+    * In the rare case where we have an (U (Old nd) nd), in the mapping use some (old v) wrapper
+      indicating that the returned it should merely be unwrapped, not processed.
+
+(define-syntax define-graph-type
+  (syntax-parser
+   [(_ _name [_nodeᵢ [_fieldᵢⱼ :colon _τᵢⱼ] …] …)
+    ;; save the graph type metadata
+    ]))
+
+(define-syntax define-graph-transformer
+  (syntax-parser
+   [(_ _name graph-type)
+    ;; TODO
+    ]))
+
+* Structural node equality
+  * Would be nice to coalesce nodes which are equal? (i.e. hash consing)
+    so that any two nodes which are equal? within the same graph have the same index.
+    I suppose this would be rather costly: O(n log n) comparisons, with each comparison potentially
+    costly, in principle. If we cached results, we could achieve a better run-time in practice, and
+    perhaps a better theoretical complexity if we handled cycles ourselves.
+    * The general algorithm when there are no unordered sets is deterministic finite automaton minimization
+    * The general algorithm when there are unordered sets is nondeterministic finite automaton minimization
+    * We could cache all calls to equal? on two nodes for a limited dynamic scope.
+    * If we have this, then we can, for comparisons within the same graph, quickly return #t or #f based on eq?
+
+* Coloring: the existing graph library for Racket has some coloring algorithms:
+  http://docs.racket-lang.org/graph/index.html#%28def._%28%28lib._graph%2Fmain..rkt%29._coloring%2Fgreedy%29%29
+  Maybe we can build a wrapper for those?
+\ No newline at end of file

	www Unnamed repository; edit this file 'description' to name the repository.
	Log \| Files \| Refs \| README

A	features-and-implementation	\|	40	++++++++++++++++++++++++++++++++++++++++
A	features-and-implementation2	\|	143	+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
A	features-and-implementation3	\|	100	+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++