ArchiveOrangemail archive

The Haskell Cafe


haskell-cafe.haskell.org
(List home) (Recent threads) (40 other Haskell lists)

Subscription Options

  • RSS or Atom: Read-only subscription using a browser or aggregator. This is the recommended way if you don't need to send messages to the list. You can learn more about feed syndication and clients here.
  • Conventional: All messages are delivered to your mail address, and you can reply. To subscribe, send an email to the list's subscribe address with "subscribe" in the subject line, or visit the list's homepage here.
  • Moderate traffic list: up to 30 messages per day
  • This list contains about 115,781 messages, beginning Oct 2000
  • 2 messages added yesterday
Report the Spam
This button sends a spam report to the moderator. Please use it sparingly. For other removal requests, read this.
Are you sure? yes no

Chuch encoding of data structures in Haskell

Ad
Günther Schmidt 1274972055Thu, 27 May 2010 14:54:15 +0000 (UTC)
Hi all,

I'm exploring the use of church encodings of algebraic data types in 
Haskell.
Since it's hard to imagine being the first to do so I wonder if folks 
here could point me to some references on the subject.

I'm looking for examples of church encodings in Haskell a little bit 
beyond Church Booleans and Church Numerals.

Günther
C. McCann 1274985911Thu, 27 May 2010 18:45:11 +0000 (UTC)
2010/5/27 Günther Schmidt : > I'm exploring the use of church encodings of algebraic data types in > Haskell. > Since it's hard to imagine being the first to do so I wonder if folks here > could point me to some references on the subject. > > I'm looking for examples of church encodings in Haskell a little bit beyond > Church Booleans and Church Numerals.
The fully general description of Church encoding is rather simple, but I've rarely seen it described explicitly. Consider the type of Church encodings for Bool, Either, and the 2-tuple (written here as "Pair" for clarity): churchedBool :: t -> t -> t churchedEither :: (a -> t) -> (b -> t) -> t churchedPair :: (a -> b -> t) -> t And compare the signatures for the constructors of the non-Church encoded types: True :: Bool False :: Bool Left :: a -> Either a b Right :: b -> Either a b Pair :: a -> b -> Pair a b We can observe two patterns: 1) The Church encodings take as many arguments as the type has constructors 2) The type of each argument is the same as the signature of a constructor, except returning an arbitrary type. As this suggests, "Church decoding" is as simple as applying the Church encoded type to each of the constructors. >From the above, a general description of Church encoding can be deduced: The encoding of a value is a function that replaces each data constructor with an arbitrary function. The Church encoding represents, in a way, the most generalized means of using values of that type--which is why Haskell already includes variations of "Church encode" functions for a few standard types, like so: encodeEither x = \f g -> either f g x encodeMaybe x = \z f -> maybe z f x encodeTuple x = \f -> uncurry f x encodeBool x = \t e -> if x then t else e But what of Church numerals? First, we must consider the Church-encoding of recursive data types. Given arbitrary nested data types, there's nothing else that can be done--the outer types know nothing of the types they contain. But if an inner type is known to be the same as the outer type, there are two options for the encoding: Work only with the outermost value, as with non-recursive types, or work with the recursive value as a whole, by having the outermost value pass its arguments inward. Now, consider the signature of a Church numeral: churchedNumeral :: (t -> t) -> t -> t Given the above, what can we say about the equivalent "decoded" data type? It takes two arguments, so we have two constructors. The second argument is a single value, so the associated constructor is nullary. The first argument must be associated with a unary constructor, but look at the parameter it takes: the same type as the result! This is how we can tell that Church numerals are the encoding of a recursive type. Since the only way a recursive constructor can do anything with the values it contains is to pass its arguments inward, the value it has to work with is the result of doing so. Thus, the other constructor must take a single argument of its own type. What does this look like as a standard data type? data Nat = S Nat | Z Good old inefficient Peano numbers, of course! Keeping all that in mind, consider a List type, and both ways of encoding it: data List a = Nil | Cons a (List a) churchedListOuter :: t -> (a -> ___ -> t) -> t churchedListRecursive :: t -> (a -> t -> t) -> t For the "outermost" method, what type belongs in place of the ___? The second argument of Cons is itself a List, so naively we would like to simply put the type of "churchedListOuter" itself in place of ___, but that won't work. In fact, nothing will work here, because recursion on an "outermost encoded" list is impossible in a typed λ-calculus without some means of using general recursion provided as a primitive operation. Nested tuples can be used, but the length of the list will be reflected in the type, preventing polymorphism over arbitrary lists. This is also why Church encoding is most often seen in the setting of the untyped λ-calculus. The implicitly recursive encoding, however, presents no such problems. So, perhaps a function to Church encode lists would be useful? Indeed it would, and as before, it already exists: encodeList x = \z f -> foldr f z x Recall the earlier observation that "decoding" involves applying the encoded type to its equivalent constructors; the same holds true for recursive types, as demonstrated by right-folding a list with (:) and [], or applying a Church numeral to 0 and (+ 1). Regarding recursive data types as the least fixed point of a non-recursive data type is thus tied to replacing the "outermost" encoding with the "recursive" encoding, and the "Church encode" function for a recursive type is simply a generalized right fold, partially applied. Now, the descriptions above are rather informal, and ignore the possibility of infinite lazy recursive data structures, among other gaps; but perhaps will help to get you started regardless. - C.
Günther Schmidt 1274990027Thu, 27 May 2010 19:53:47 +0000 (UTC)
Hello C,

thank you for explaining.

The funny thing is that I have never seen anybody take this even a 
single step further than you have in your email.

In particular I have not found anything where someone might use church 
encoding to solve a quite practical problem, namely for implementing 
extensible records.

For instance suppose we'd have

data Person = Person { firstName :: String, lastName :: String }

person1 :: Person
person1 = Person "John" "Doe"

alternatively, and I'm not 100% if this what is commonly understood as 
"church encoding" I'm just guessing, we could write this:

person1' :: (String -> String -> x) -> x
person1' = \fn -> fn "John" "Doe"

Now from a practical standpoint, up to here anyway, the first encoding 
is far more useful.
But there is a problem when we want to extend the Person type by a new 
field, age for instance. Person as such is *not* extensible, modifying 
it would require us to rewrite existing code.

But person1' is easily extended:

person1'' :: (String -> String -> Int -> x) -> x
person1'' = \fn -> person1' fn  36

or more generally:

(.+.) :: (t1 -> t -> t2) -> t -> t1 -> t2
rec .+. v = \fn -> rec fn v

person1''' :: (String -> String -> Int -> x) -> x
person1''' = person1'' .+. 36

There is also a way to shrink the "record" from the left:

drp :: ((t1 -> t) -> b) -> t -> b
drp rec = \fn -> rec $ \_ -> fn


I can think of a number of applications for this and especially I 
believe it should be possible to create more "combinators".

The approach is so simple and trivial that it must have occurred to 
people a hundred times over. Yet I do not find any other examples of 
this. Whenever I google for church encoding the examples don't go beyond 
church numerals.

Am I googling for the wrong keywords?

Best regards

Günther
Am 27.05.10 19:10, schrieb C. McCann: > 2010/5/27 Günther Schmidt<gue.schmidt@web.de>: >> I'm exploring the use of church encodings of algebraic data types in >> Haskell. >> Since it's hard to imagine being the first to do so I wonder if folks here >> could point me to some references on the subject. >> >> I'm looking for examples of church encodings in Haskell a little bit beyond >> Church Booleans and Church Numerals. > > The fully general description of Church encoding is rather simple, but > I've rarely seen it described explicitly. Consider the type of Church > encodings for Bool, Either, and the 2-tuple (written here as "Pair" > for clarity): > > churchedBool :: t -> t -> t > churchedEither :: (a -> t) -> (b -> t) -> t > churchedPair :: (a -> b -> t) -> t > > And compare the signatures for the constructors of the non-Church encoded types: > > True :: Bool > False :: Bool > Left :: a -> Either a b > Right :: b -> Either a b > Pair :: a -> b -> Pair a b > > We can observe two patterns: 1) The Church encodings take as many > arguments as the type has constructors 2) The type of each argument is > the same as the signature of a constructor, except returning an > arbitrary type. As this suggests, "Church decoding" is as simple as > applying the Church encoded type to each of the constructors. > > From the above, a general description of Church encoding can be > deduced: The encoding of a value is a function that replaces each data > constructor with an arbitrary function. The Church encoding > represents, in a way, the most generalized means of using values of > that type--which is why Haskell already includes variations of "Church > encode" functions for a few standard types, like so: > > encodeEither x = \f g -> either f g x > encodeMaybe x = \z f -> maybe z f x > encodeTuple x = \f -> uncurry f x > encodeBool x = \t e -> if x then t else e > > But what of Church numerals? First, we must consider the > Church-encoding of recursive data types. Given arbitrary nested data > types, there's nothing else that can be done--the outer types know > nothing of the types they contain. But if an inner type is known to be > the same as the outer type, there are two options for the encoding: > Work only with the outermost value, as with non-recursive types, or > work with the recursive value as a whole, by having the outermost > value pass its arguments inward. > > Now, consider the signature of a Church numeral: > > churchedNumeral :: (t -> t) -> t -> t > > Given the above, what can we say about the equivalent "decoded" data > type? It takes two arguments, so we have two constructors. The second > argument is a single value, so the associated constructor is nullary. > The first argument must be associated with a unary constructor, but > look at the parameter it takes: the same type as the result! This is > how we can tell that Church numerals are the encoding of a recursive > type. Since the only way a recursive constructor can do anything with > the values it contains is to pass its arguments inward, the value it > has to work with is the result of doing so. Thus, the other > constructor must take a single argument of its own type. What does > this look like as a standard data type? > > data Nat = S Nat | Z > > Good old inefficient Peano numbers, of course! > > Keeping all that in mind, consider a List type, and both ways of encoding it: > > data List a = Nil | Cons a (List a) > > churchedListOuter :: t -> (a -> ___ -> t) -> t > churchedListRecursive :: t -> (a -> t -> t) -> t > > For the "outermost" method, what type belongs in place of the ___? The > second argument of Cons is itself a List, so naively we would like to > simply put the type of "churchedListOuter" itself in place of ___, but > that won't work. In fact, nothing will work here, because recursion on > an "outermost encoded" list is impossible in a typed λ-calculus > without some means of using general recursion provided as a primitive > operation. Nested tuples can be used, but the length of the list will > be reflected in the type, preventing polymorphism over arbitrary > lists. This is also why Church encoding is most often seen in the > setting of the untyped λ-calculus. > > The implicitly recursive encoding, however, presents no such problems. > So, perhaps a function to Church encode lists would be useful? Indeed > it would, and as before, it already exists: > > encodeList x = \z f -> foldr f z x > > Recall the earlier observation that "decoding" involves applying the > encoded type to its equivalent constructors; the same holds true for > recursive types, as demonstrated by right-folding a list with (:) and > [], or applying a Church numeral to 0 and (+ 1). > > Regarding recursive data types as the least fixed point of a > non-recursive data type is thus tied to replacing the "outermost" > encoding with the "recursive" encoding, and the "Church encode" > function for a recursive type is simply a generalized right fold, > partially applied. > > Now, the descriptions above are rather informal, and ignore the > possibility of infinite lazy recursive data structures, among other > gaps; but perhaps will help to get you started regardless. > > - C.
Brandon S. Allbery KF8NH 1275008230Fri, 28 May 2010 00:57:10 +0000 (UTC)
On May 27, 2010, at 13:44 , Günther Schmidt wrote: > The approach is so simple and trivial that it must have occurred to > people a hundred times over. Yet I do not find any other examples of > this. Whenever I google for church encoding the examples don't go > beyond church numerals.
Hm. If I reorder your (.+.) slightly and reparenthesize: reordered_cons :: (t -> (t1 -> t2)) -> t -> (t1 -> t2) churchedNumeral :: (t -> t ) -> t -> t t unifies with (t1 -> t2), giving us a Church numeral made up of (t1,t2). (I think.)
Brandon S. Allbery KF8NH 1275003801Thu, 27 May 2010 23:43:21 +0000 (UTC)
On May 27, 2010, at 19:07 , Brandon S. Allbery KF8NH wrote: > reordered_cons :: (t -> (t1 -> t2)) -> t -> (t1 -> t2) > churchedNumeral :: (t -> t ) -> t -> t > > t unifies with (t1 -> t2), giving us a Church numeral made up of > (t1,t2). (I think.)
Which also explains why that record representation isn't used: it's as inefficient as Peano numbers are.
Dan Doel 1275008588Fri, 28 May 2010 01:03:08 +0000 (UTC)
On Thursday 27 May 2010 7:15:15 pm Brandon S. Allbery KF8NH wrote: > On May 27, 2010, at 19:07 , Brandon S. Allbery KF8NH wrote: > > reordered_cons :: (t -> (t1 -> t2)) -> t -> (t1 -> t2) > > churchedNumeral :: (t -> t ) -> t -> t > > > > t unifies with (t1 -> t2), giving us a Church numeral made up of > > (t1,t2). (I think.) > > Which also explains why that record representation isn't used: it's > as inefficient as Peano numbers are.
Church encoding has additional efficiency problems, too. If we look at Peano numerals for instance, the Church encoding is: forall r. r -> (r -> r) -> r This means that our only method of computation with them is the structurally recursive fold. So, suppose we want to implement a predecessor operation. With Haskell-like data, we do: pred :: Nat -> Nat pred Zero = Zero pred (Suc n) = n This operation is expected to be O(1). What if we want to write this using only fold? Well, we have to do tricks (this same trick works to define tail with foldr): pred n = snd $ fold (Zero, Zero) (\ ~(m, _) -> (Suc m, m)) So, we've done it, but if you look closely, you'll see that we've actually rebuilt the entire 'tail' of the number. So this definition is O(n) (at least, if it all gets evaluated). So, Church encodings are less efficient than a direct implementation is capable of being. There is a dual problem with codata. You can encode the extended natural numbers as: exists s. (s, s -> Either () s) but this only gives you the anamorphism. So, if you want to write the successor operation, it's: succ n = (Nothing, step) where step Nothing = Right (Just n) step (Just m) = case observe m of Left () -> Left () Right m' -> Right (Just m') so we end up re-unfolding the entire number. Taking the paramorphism/apomorphism as a primitive operation fixes this, I believe, but you cannot, to my knowledge, encode types that way in, say, System F, because it would involve: Nat = forall r. r -> (Nat -> r -> r) -> r which is still a recursive equation. -- Dan
Stephen Tetley 1275008459Fri, 28 May 2010 01:00:59 +0000 (UTC)
Of interest, (.+.) is the T combinator - called (##) in Peter
Thiemann's Wash and the queer bird in Raymond Smullyan's To Mock a
Mockingbird.

Your technique might well relate to the 'element transforming style'
of Wash, see the Modelling HTML in Haskell paper.

Best wishes

Stephen
Jason Dagit 1275017882Fri, 28 May 2010 03:38:02 +0000 (UTC)
2010/5/27 Günther Schmidt > Hello C, > > thank you for explaining. > > The funny thing is that I have never seen anybody take this even a single > step further than you have in your email. > > In particular I have not found anything where someone might use church > encoding to solve a quite practical problem, namely for implementing > extensible records. >
I think this article uses Church encoded lists, I could be wrong: http://okmij.org/ftp/Algorithms.html#zip-fold... Jason
Stefan Monnier 1275009163Fri, 28 May 2010 01:12:43 +0000 (UTC)
> churchedBool :: t -> t -> t

Important detail: the precise type is "∀t. t → t → t".

> encodeBool x = \t e -> if x then t else e

So the type of encodeBool should be:

 Bool → ∀t. t → t → t

whereas Haskell will infer it to be

  ∀t. Bool → t → t → t

which means that a given object can only be eliminated to one type.
I.e. to make such an encoding really usable, you need "deep
polymorphism" (which GHC supports just fine, but which is not part of
the Haskell standard).


        Stefan
C. McCann 1275012576Fri, 28 May 2010 02:09:36 +0000 (UTC)
On Thu, May 27, 2010 at 9:11 PM, Stefan Monnier wrote: > I.e. to make such an encoding really usable, you need "deep > polymorphism" (which GHC supports just fine, but which is not part of > the Haskell standard).
Ah, yes, and thank you for pointing that out. My message involved a great deal of hand-waving that I neglected to clearly identify as such. The same caveat of course applies to most Church encodings. For instance, the proper type of a (recursive) encoded list would be: type ChurchList a = ∀t. t -> (a -> t -> t) -> t ...where "a" is fixed as the type of the list elements. Thus, "cons" ought to look something like this charming type: cons :: ∀a. a -> (∀t. t -> (a -> t -> t) -> t) -> (∀t. t -> (a -> t -> t) -> t) For extra fun, try writing an instance such as: instance (Show a) => Show (ChurchList a) where [...etc.] ...at which point, perhaps, we remind ourselves that the language is named "Haskell", not "Alonzo", and drop the whole matter. - C.
wren ng thornton 1275023959Fri, 28 May 2010 05:19:19 +0000 (UTC)
Stefan Monnier wrote: >> churchedBool :: t -> t -> t > > Important detail: the precise type is "∀t. t → t → t". > >> encodeBool x = \t e -> if x then t else e > > So the type of encodeBool should be: > > Bool → ∀t. t → t → t > > whereas Haskell will infer it to be > > ∀t. Bool → t → t → t
Those are the same type. > which means that a given object can only be eliminated to one type. No, the t is universally quantified just fine. Perhaps you're thinking of the need for rank-2 polymorphism in functions which take Church-encoded arguments?
Ivan Miljenovic 1275024319Fri, 28 May 2010 05:25:19 +0000 (UTC)
On 28 May 2010 15:18, wren ng thornton wrote: > Stefan Monnier wrote: >>> >>> churchedBool :: t -> t -> t >> >> Important detail: the precise type is "∀t. t → t → t". >> >>> encodeBool x = \t e -> if x then t else e >> >> So the type of encodeBool should be: >> >> Bool → ∀t. t → t → t >> >> whereas Haskell will infer it to be >> >> ∀t. Bool → t → t → t > > > Those are the same type.
I can see a slight distinction between them, based upon when the quantification occurs (the former returns a function that work on all t, the latter will do that encoding for all t). For most purposes there is no difference, but IIUC the former will let you do hlist style stuff post-encoding whilst the latter doesn't.
wren ng thornton 1275030064Fri, 28 May 2010 07:01:04 +0000 (UTC)
Ivan Miljenovic wrote: > On 28 May 2010 15:18, wren ng thornton wrote: >> Stefan Monnier wrote: >>>> churchedBool :: t -> t -> t >>> Important detail: the precise type is "∀t. t → t → t". >>> >>>> encodeBool x = \t e -> if x then t else e >>> So the type of encodeBool should be: >>> >>> Bool → ∀t. t → t → t >>> >>> whereas Haskell will infer it to be >>> >>> ∀t. Bool → t → t → t >> >> Those are the same type. > > I can see a slight distinction between them, based upon when the > quantification occurs (the former returns a function that work on all > t, the latter will do that encoding for all t).
The isomorphism in System F: in : (∀t. Bool → t → t → t) → (Bool → ∀t. t → t → t) in f = \b. /\t. f @t b out : (Bool → ∀t. t → t → t) → (∀t. Bool → t → t → t) out g = /\t. \b. g b @t If you're using the raw System F, then there's a slight difference ---as there always is between isomorphic things which are non-identical--- but that difference is not theoretically significant. If you're using a higher-level language like Haskell, then the conversions are done for you, so there's even less difference. The expression 1+1 is distinct from 2: it has a different representation in memory, it takes a different number of steps to normalize, etc. Sometimes these differences are important for engineering (e.g., dealing with space leaks), but they're rarely significant for theory.
Ivan Lazar Miljenovic 1275008188Fri, 28 May 2010 00:56:28 +0000 (UTC)
Günther Schmidt writes: > Hi all, > > I'm exploring the use of church encodings of algebraic data types in > Haskell. > Since it's hard to imagine being the first to do so I wonder if folks > here could point me to some references on the subject. > > I'm looking for examples of church encodings in Haskell a little bit > beyond Church Booleans and Church Numerals.
It's not quite practical, but have a look at Paul Bailes' Totally Functional Programming papers: http://www.itee.uq.edu.au/~paul/tfp-papers/ Also, Types and Programming Languages by Benjamin Pierce (who keeps reminding me of M*A*S*H, even though his middle name starts with a C...) has a decent section on church encodings.
Ad
Home | About | Privacy