commit 7bc0b8035631920489f81e15033acb759349be96
parent 60f8915c59dda3618b23497ef5fbf4081adbf47a
Author: Georges Dupéron <georges.duperon@gmail.com>
Date: Fri, 16 Dec 2016 17:34:06 +0100
notes
Diffstat:
3 files changed, 283 insertions(+), 0 deletions(-)
diff --git a/features-and-implementation b/features-and-implementation
@@ -0,0 +1,40 @@
+* gc
+* automatic generation of mappings Old → New
+
+
+* safe and automatic generation of 1↔1, 1↔many and many↔1 backward references
+
+** global directives which apply to all nodes types: insert a pointer to the root, insert a backward pointer for every link (i.e. .back brings back to the previous node)
+
+** the backward references specify the link that should be reversed, as well as a path from there. That way, methods can have a backward pointer to the "call" instructions referencing them, but the backward pointer should not
+ directly point to the "call" instruction, but instead to the instruction's containing method, accessed with ".method". Maybe some syntactic sugar can allow this to be specified as #:backward-reference method.instructions.*[call]
+
+* try to think about PHOAS
+
+** Strong HOAS article: https://www.schoolofhaskell.com/user/edwardk/phoas
+
+
+
+* mapping-placeholder = Placeholder type, one for each mapping
+* mapping-incomplete-result = (~> mapping) types within the mapping's result type are replaced by (U mapping-placeholder mapping-incomplete-result), and node types are replaced by (U xxx old-node)
+* mapping-with-promises-result = node types within the mapping's result type are replaced by node-promise
+
+* node-promise = (Promise node-with-promises)
+* node-with-promises = node types within the node's fields are replaced by node-promise
+
+
+
+
+
+
+
+
+
+
+
+To discuss:
+* maybe add a "self" variable, so that a mapping can give to others a placeholder for the node it is currently generating? That wouldn't play well with mappings which generate multiple nodes and/or part of a node.
+** self is a opaque "SELF" token, not an actual placeholder
+
+Minor TODOs:
+* If we can guarantee that any two nodes within the same graph are (not (equal? …)), then we can implement a fast negative equality test, in addition to the existing fast equality test (which compares the "raw" field of nodes).
diff --git a/features-and-implementation2 b/features-and-implementation2
@@ -0,0 +1,142 @@
+On verification
+===============
+
+* Some guarantees can be enforced statically using the type system.
+ They are in a way proven to be enforeced in the expanded code,
+ but in order to write verified compilers, it would be necessary to
+ prove that the type indeed enforces the desired property, for all
+ possible macro inputs.
+
+ Advantage: these proofs need to be written only once for the graph macro,
+ and can be relied on by all compilers written using these static guarantees.
+* Some guarantees can be enforced statically "by construction",
+ and the design patterns used to enforce them are guaranteed to be
+ correctly applied because the code is macro-generated.
+ In order to write verified compilers, it would be necessary to
+ prove that the macro-generated code always enforces the desired property.
+
+ Advantage: these proofs need to be written only once for the graph macro,
+ and can be relied on by all compilers written using these static guarantees.
+* Some guarantees can be enforced at run-time.
+ This allows catching bugs early, and ensures no incorrect output can occur
+ when these guarantees are broken. It does not however ensure that the compiler
+ will never break these guarantees. It is therefore still possible to ship a
+ broken compiler which fails on valid inputs.
+
+ Drawback: it is necessary for every compiler written using such guards to prove
+ that the guards are never raised.
+
+
+Features
+========
+
+* GC:
+ * Group nodes by SCC
+ * Two kinds of "indices" pointers:
+ * index within the current strongly-connected component
+ * direct pointer to a descendant SCC + index within it
+ * To limit the waste when there is an SCC not containing nodes of a given type,
+ do not use a fixed tuple of vectors of nodes. Instead, use a "list with possible omissions".
+ Problem: the size of the fully-expanded type in TR is O(2ⁿ) for these.
+ Either:
+ * use casts
+ * pay the compile-time price of a huge type (not feasible with large N)
+ * pay the run-time price of bloated components (feasible but costly with large N)
+ * use a case→ function, which returns the empty vector as a fallback. With pure functions, this could be doable.
+ * Problem: generating all the possible functions means O(2^n) templates for closures that may be used at
+ run-time
+ * Otherwise, generate one closure over 1 array, one closure over 2 arrays, one over 3 arrays, etc.
+ and also close over some markers indicating to which "real" arrays these correspond.
+ This means O(2n) space usage worst-case.
+ Problem again: the construction function for such a case→ while minimizing the number of values closed on
+ has a large O(2^n) type too.
+ * typed clojure has typed heterogeneous maps https://github.com/clojure/core.typed/wiki/Types#heterogeneous-maps
+ * In theory it could be possible to give hints to the GC,
+ but Racket's GC does not offer that possibility.
+
+GC Implementation:
+* Must be optional
+* Initially, all nodes within a single component, all indices are of the "slef + index" kind.
+* Use tarjan's algorithm
+* group nodes by SCC
+ Update references to nodes in the same SCC, using just an index
+ Update references to nodes in "lower" SCCs, using a pointer to the SCC + an index.
+* Due to the higher memory cost (+ 1 reference for references to other SCCs),
+ and the poor set of options for implementation (cast, exponential size of type, or large memory usage)
+ we will not implement the GC.
+
+ We argue that in most cases it would not be useful, as compilers will normally work on the whole graph, and
+ not on a subgraph. In the rare case where only part of a graph is needed, applying the graph-level identity
+ transformation to the nodes of interest would effectively copy them and the nodes reachable from these new roots,
+ thereby allowing the original graph to be freed.
+
+* Invariants
+ * Scope of the invariants:
+ * Input contracts, types and structural properties
+ * Output contracts, types and structural properties
+ * Transformation invariants (relate the input and output graphs)
+ * Time of verification:
+ * Run-time
+ Should be easy enough to implement: Add a define-graph-invariant form, recognize these options, and check
+ them at run-time. Implemented as some sorts of contracts.
+ * Compile-time
+ * Node types
+ * Statically-enforced structural invariants at the level of node types
+ (e.g. disallow cycles in the types, to ensure there are no cycles in the instance)
+ * Macros can be used to generate code which is known to be correct
+ e.g. paths
+ * Concern: static "by construction" guarantees may interfere with each other, if they fill in some nodes,
+ e.g. a "no cycles starting from this node" constraint would not work as expected if a "backwards link"
+ is filled in afterwards. We probably need to hardcode a set of constraints which know about eachother
+ and about the potential interactions.
+ * Conserve well-scopedness within a transition: pass in nodes flagged with a ∀ type, and check
+ that the output contains that flag.
+ * PHOAS
+ * Specification
+ * Invariants specified in the graph type
+ * Transformation invariants specified on the graph creation code
+ * Checks (run-time or the various compile-time mechanisms) are specified in the graph creation code
+ The graph creation code must enforce all invariants within the type
+ That way, it is guaranteed that any instance of the graph type satisfies its invariants, either by
+ construction, or as a guard at the end of the construction.
+
+* Automatic generation of mappings
+ * when there is no mapping taking an old node as an input, a mapping is automatically generated.
+ The mapping simply translates the old node to the new node type, and recursively transforms its fields,
+ traversing lists etc.
+ * When there is a mapping with more than one argument, then no mapping is auto-generated for that input node type,
+ and instead calls to the mapping must be explicit (i.e. can't return the old node type).
+ * This means that we have a mechanism before the actual core graph macro, which checks and decides which mappings
+ to auto-generate.
+ * We also have a mechanism for auto-calling transformations on input node types
+ * Possibility to escape this, in order to actually insert a reference to the old graph?
+ * Some notation in the type, to indicate that it's "protected" in some way?
+ * Some wrapper indicating that it should merely be unwrapped, not processed? (I think that's better).
+
+* Structural node equality
+ * Would be nice to coalesce nodes which are equal? (i.e. hash consing)
+ so that any two nodes which are equal? within the same graph have the same index.
+ I suppose this would be rather costly: O(n log n) comparisons, with each comparison potentially
+ costly, in principle. If we cached results, we could achieve a better run-time in practice, and
+ perhaps a better theoretical complexity if we handled cycles ourselves.
+ * The general algorithm when there are no unordered sets is deterministic finite automaton minimization
+ * The general algorithm when there are unordered sets is nondeterministic finite automaton minimization
+ * We could cache all calls to equal? on two nodes for a limited dynamic scope.
+ * If we have this, then we can, for comparisons within the same graph, quickly return #t or #f based on eq?
+ * Alpha-equivalence comparisons
+ (i.e. ignore the actual values of some fields, except for constraining the shape of the graph)
+ Possibility: create values which are unique within the graph, but are ignored when comparing
+ nodes from different graphs.
+ I'm not sure whether plainly erasing the information won't be enough, for a weaker form of alpha-equivalence?
+ * Alpha-equivalence can't easily be implemented atop equal?, dropping it for now.
+ It's not extremely useful anyway, as thanks to the full-blown graph representation, we are likely to
+ discard names early on (keeping them just for printing / debuggin purposes).
+
+
+
+
+
+
+* Coloring: the existing graph library for Racket has some coloring algorithms:
+ http://docs.racket-lang.org/graph/index.html#%28def._%28%28lib._graph%2Fmain..rkt%29._coloring%2Fgreedy%29%29
+ Maybe we can build a wrapper for those?
+\ No newline at end of file
diff --git a/features-and-implementation3 b/features-and-implementation3
@@ -0,0 +1,99 @@
+* Invariants
+ * Scope of the invariants:
+ * Input contracts, types and structural properties
+ * Output contracts, types and structural properties
+ * Transformation invariants (relate the input and output graphs)
+ * Time of verification:
+ * Run-time
+ Should be easy enough to implement: Add a define-graph-invariant form, recognize these options, and check
+ them at run-time. Implemented as some sorts of contracts.
+ * Compile-time
+ * Node types
+ * Statically-enforced structural invariants at the level of node types
+ (e.g. disallow cycles in the types, to ensure there are no cycles in the instance)
+ * Macros can be used to generate code which is known to be correct
+ e.g. paths
+ * Concern: static "by construction" guarantees may interfere with each other, if they fill in some nodes,
+ e.g. a "no cycles starting from this node" constraint would not work as expected if a "backwards link"
+ is filled in afterwards. We probably need to hardcode a set of constraints which know about eachother
+ and about the potential interactions.
+ * Conserve well-scopedness within a transition: pass in nodes flagged with a ∀ type, and check
+ that the output contains that flag. Potentially out-of-scope fields in the input do not have the flag.
+ * PHOAS
+ * Specification
+ * Invariants specified in the graph type
+ * Transformation invariants specified on the graph creation code
+ * Checks (run-time or the various compile-time mechanisms) are specified in the graph creation code
+ The graph creation code must enforce all invariants within the type
+ That way, it is guaranteed that any instance of the graph type satisfies its invariants, either by
+ construction, or as a guard at the end of the construction.
+
+; The body should produce a function of type (→ (Listof Nodeᵢ) … Boolean)
+; The body should also return an indication about which invariants it ensures. Maybe use {~seq #:ensures (invariant-name inariant-arg …)} …
+ however this does not allow the body to perform some minimal amount of rewriting on the "ensures" options.
+(define-syntax/parse (define-graph-contract (name g-descriptor arg …) . body)
+ #'(define-syntax name
+ (graph-contract
+ (λ (g-descriptor arg …) . body))))
+
+;; TODO: find a way to translate this to a type, with subtyping for weaker invariants.
+;; The type only serves to testify that the invariant was statically enforced or dynamically checked at construction, it does not actually encode the property using the type system.
+(struct invariants-wrapper ()) ;; Use a private struct to prevent forging of the invariants aggregated in a case→ (since it is never executed, any non-terminating λ could otherwise be supplied).
+(define-for-syntax invariant-introducer
+ (make-syntax-introducer))
+;; the body should return the syntax for a type, such that less precise invariants are supertypes of that type.
+(define-syntax/parse (define-graph-invariant (name g-descriptor arg …) . body)
+ #'(define-syntax name
+ (graph-invariant
+ (λ (g-descriptor arg …) . body))))
+
+The type of a graph node with the invariants inv₁ … invₙ on the graph must include an extra dummy field with type
+(invariants-wrapper
+ (case (→ inv₁ inv-arg … #t)
+ …
+ (→ invₙ inv-arg … #t)))
+
+The invariant arguments can be symbols, to indicate node types or field names.
+
+Concern: invariants on a graph may not have the same semantics if the graph has more or fewer nodes?
+
+* Automatic generation of mappings
+ * when there is no mapping taking an old node as an input, a mapping is automatically generated.
+ The mapping simply translates the old node to the new node type, and recursively transforms its fields,
+ traversing lists etc.
+ * When there is a mapping with more than one argument, then no mapping is auto-generated for that input node type,
+ and instead calls to the mapping must be explicit (i.e. can't return the old node type).
+ * This means that we have a mechanism before the actual core graph macro, which checks and decides which mappings
+ to auto-generate.
+ * We also have a mechanism for auto-calling transformations on input node types
+ * Possibility to escape this, in order to actually insert a reference to the old graph?
+ * Some notation in the type, to indicate that the output should be an "old" node here
+ * In the rare case where we have an (U (Old nd) nd), in the mapping use some (old v) wrapper
+ indicating that the returned it should merely be unwrapped, not processed.
+
+(define-syntax define-graph-type
+ (syntax-parser
+ [(_ _name [_nodeᵢ [_fieldᵢⱼ :colon _τᵢⱼ] …] …)
+ ;; save the graph type metadata
+ ]))
+
+(define-syntax define-graph-transformer
+ (syntax-parser
+ [(_ _name graph-type)
+ ;; TODO
+ ]))
+
+* Structural node equality
+ * Would be nice to coalesce nodes which are equal? (i.e. hash consing)
+ so that any two nodes which are equal? within the same graph have the same index.
+ I suppose this would be rather costly: O(n log n) comparisons, with each comparison potentially
+ costly, in principle. If we cached results, we could achieve a better run-time in practice, and
+ perhaps a better theoretical complexity if we handled cycles ourselves.
+ * The general algorithm when there are no unordered sets is deterministic finite automaton minimization
+ * The general algorithm when there are unordered sets is nondeterministic finite automaton minimization
+ * We could cache all calls to equal? on two nodes for a limited dynamic scope.
+ * If we have this, then we can, for comparisons within the same graph, quickly return #t or #f based on eq?
+
+* Coloring: the existing graph library for Racket has some coloring algorithms:
+ http://docs.racket-lang.org/graph/index.html#%28def._%28%28lib._graph%2Fmain..rkt%29._coloring%2Fgreedy%29%29
+ Maybe we can build a wrapper for those?
+\ No newline at end of file