Please Buy now at Leanpub
Written and illustrated by GetContented.
Published on 2016-08-02.
Let’s say we have a movie collection, and for some reason, we're only interested in movies whose first letter is in the first half of the alphabet. In our program, we'll call those good movies, and the others bad.
movies =
[ "Aeon Flux"
, "The Black Cat"
, "Superman"
, "Stick It"
, "The Matrix Revolutions"
, "The Raven"
, "Inception"
, "Looper"
, "Hoodwinked"
, "Tell-Tale"
]
We'd like to be able to make a new list of our movies with “good” or “bad” appended to the name so we know which we can watch.
What we need is a function that decides if a movie is good:
isGood :: String -> Bool
isGood (x:_) = x <= 'M'
isGood _ = False
This kind of function, one that returns a Bool value that is depending on another value, is called a predicate function. It’s yet another fancy word which comes to us from the subject of Logic. It simply means something can be either affirmed or denied about something else. That’s exactly what’s going on here. We’re finding out if something is good or bad, by our arbitrary distinction, and returning a Bool
value depending on that.
The first line takes any String
of at least one element, matches x to the first element, then throws the rest away, and uses the (<=) :: Ord a => a -> a -> Bool
operator to check if it’s earlier in the alphabet than 'M'
.
There are some new things here, obviously. First is that we’ve got single quotation marks around M. What is the type of this thing? Well it’s 'M' :: Char
. In Haskell, single quotation marks are used to indicate that something is a Char
value.
As we mentioned before, a String
is simply [Char]
. That means "Hello"
is the exact same thing as ['H', 'e', 'l', 'l', 'o']
. So, String
has a special syntax, just like list has a special syntax. You can either write strings as a list of Char
values, or you can write them with double quotation marks around them.
Back to our predicate function, we notice we’re using the (<=)
function. This is called “less than or equal to”. Its type indicates it takes two values of type constrained by the Ord
typeclass. This typeclass is for types whose values can be compared in some ordered way (hence the name).
We’re matching on the String
argument using (:)
, grabbing out the head as x
, comparing it to the Char
value ‘M’
to see if it’s in the first half of the alphabet, because A
is “less than or equal to” M
. The head of a list is the Haskell name for its first item, and the tail is the name for the remainder.
The Ord
typeclass provides a type with the compare
, (<)
, (<=)
, (>)
, (>=)
, max
, and min
functions. They provide functionality to do various comparisons of values.
The end result of the above is a function that checks if a movie’s title starts with a letter in the first half of the alphabet.
Now we'll see a function that takes a movie title and gives us a new name for it, depending on its good/bad assessment:
assess :: String -> String
assess movie = movie ++ " - " ++ assessment
where assessment = if isGood movie
then "Good"
else "Bad"
In this function, we’re introducing you to a where
clause. This isn’t an expression, so we can’t embed where
clauses anywhere we like, but rather they let us write definitions that are only “in effect” within the level above where they appear. This is called local scoping. That is, the scope of the definition only applies locally to the where
clause.
This is why in the assess
function above, we can use the definition for assessment
within the function’s body expression.
We can have multiple definitions and even functions written in where
clauses. Let’s see another definition of the assess function that has two definitions in its where
clause:
assess' :: String -> String
assess' movie = movie ++ " - " ++ assessment
where assessment = if movieIsGood
then "Good"
else "Bad"
movieIsGood = isGood movie
Here we’ve just pulled isGood movie
out into its own locally scoped definition. We don’t recommend it, but if you wanted to get crazy, you could even put where
clauses within definitions within other where
clause definitions. Usually if you have this level of nesting, you should consider pulling your functions into smaller pieces.
Next we're going to see a partial function so we can explain how it works to you in stages. Here, by partial we mean it's not total, because it’s missing some cases of values of its arguments. That is, the function doesn't cover every possibility. This is not a good thing to do, ordinarily. We're just doing it here to show how we build up a function in steps.
So, this function we want to write should take a list of movies and give us back the newly named movie list...
assessMovies :: [String] -> [String]
assessMovies [] = []
assessMovies [y] = [assess y]
... but it doesn't, because it only works properly on lists of length 0 or 1, and we already know that in general, lists can be any length. The [y]
pattern only matches against lists of one item, naming the item y
as it does, and here we're returning a new list with the newly named movie in it by passing it to our assess
function. We see we can have function applications inside of lists without trouble.
Next, we’ll see that we add a pattern for lists of two items and change the whole thing to use the (:)
operator as a pattern-matcher, because we know [x]
and (x:[])
are the same thing, but also to use the (:)
operator to make the new list rather than how we did above:
assessMovies :: [String] -> [String]
assessMovies [] = []
assessMovies (y:[]) = assess y : []
assessMovies (x:y:[]) = assess x : assess y : []
This is a little better, but what about lists with more than two items? Also, it’s getting tedious writing so many definitions, and a pattern of repetition is emerging. As we become programmers, we need to learn to see these repeated patterns, because they're usually an indication that something can be done in a better, or more abstract way.
One of the first patterns to notice is our second definition works for 1-item lists, and our third pattern-match has the exact same 1-item list pattern in it as the second definition.
If we could match the (y:[])
part as one whole variable name in the third pattern, then somehow replace it with an application of the assessMovies function itself, we might be able to reduce these two definitions down to just one.
Well, you probably guessed that we can do this, and in fact this is exactly what we're going to do. This process of using the function we're defining within that function definition itself is called recursion, and it's one of the foundational underpinnings of Haskell.
So these two duplicated patterns can be resolved with a single definition, as long as we keep our empty list as a base case, otherwise we'd end up in an endless cycle. The empty list is our way out of the recursion. The recursive definition handles everything other than the empty list, which the base case handles.
assessMovies :: [String] -> [String]
assessMovies [] = []
assessMovies (x:xs) = assess x : assessMovies xs
So, if we had the list ["The Matrix"]
, that's equivalent to "The Matrix" : []
, and so matches the second pattern with "The Matrix"
as x
, and []
as xs
. The body of the function uses the list construction operator (:)
to join the result of assess x
to the application of assessMovies
with its argument of the empty list, which results in the empty list as the base case, ending the recursion. You should think about how this works across a few different lists of different sizes.
Another way to think of this is if you had to crack some eggs. You could write down a definition for making cracked eggs like this: “crack eggs” is: take a line of eggs. Crack the first egg, then do “crack eggs” on the rest of the eggs until there are none left. See how “crack eggs” is both the definition and used in itself? It's the same idea here, except Haskell creates a new list rather than modifies an existing one. We define building a list of assessed movies as: if the list is empty, give an empty list, otherwise take the first item and make its assessment String, joining that as the head of the list to the evaluation of the rest of the assessed movie titles list.
It's really beautiful, isn't it? You can start to see why having purity can really pay off. We can do this kind of substitution and equational reasoning easily. Why? precisely because we have pure functions, and because Haskell gives us these nice abstract pieces to work with.
This pattern - applying a function (the function assess
here) to each element of a list — is so common that there’s actually a function called map
that extracts this mapping functionality. It takes a mapping function and a list as arguments. Its type is map :: (a -> b) -> [a] -> [b]
. This type says that map is a function of two arguments, the first one of which is a function.
The assess
function’s type is String -> String
. This fits into the pattern of the first argument of map: a -> b
. In Haskell a -> b
means the types can be different, but don’t have to. It's a function of one type to a second type (which may or may not be the same type). So, assess :: String -> String
fits the first argument to map
. If we supply the function map
with the function assess
as its first argument, that means we’re saying the type a
in map
’s type signature must be String
, and also that the type b
must be String
, too. That means the second argument and the return type must both be [String]
.
Let’s rewrite assessMovies
using map
to get an intuition of how to use it, and build a small full program around it. If any of this is unclear, the next chapter will most likely clear it up as it delves more into functions such as map
.
import qualified Data.List as L
movies =
[ "Aeon Flux"
, "The Black Cat"
, "Superman"
, "Stick It"
, "The Matrix Revolutions"
, "The Raven"
, "Inception"
, "Looper"
, "Hoodwinked"
, "Tell-Tale"
]
isGood :: String -> Bool
isGood (x:_) = x <= 'M'
isGood _ = False
assess :: String -> String
assess movie = movie ++ " - " ++ assessment
where assessment = if isGood movie
then "Good"
else "Bad"
assessMovies :: [String] -> [String]
assessMovies = map assess
assessedMovies :: [String]
assessedMovies = assessMovies movies
main :: IO ()
main = putStrLn (L.intercalate "\n" assessedMovies)
First we import the Data.List
package so we can use the intercalate function (which takes a joining list of a type, and a list of Lists the same type and builds a fresh list by joining each element of the list together with the joining item between). Here we’re using it with [Char]
, and [[Char]]
which is the same as String
and [String]
, because String
is just a type alias for [Char]
.
If we were to define a “new” version of String
, we could do it like this, because String
is simply a type alias to a list of Char
, otherwise known as a type synonym:
-- type sets up a type alias:
type NewString = [Char]
This is how we create a type alias (or type synonym). Once we have this in our program, everywhere Haskell sees NewString
in your program, it will interpret it as [Char]
. Note that these are not different types, they're identical, they just have different interchangeable names.
Anyway, after this, back in our main program, we set up the movies expression, and the isGood
and assess
functions. Then we create the assessMovies
function which takes the list of movies and uses map
with assess
to build a list of assessed movies. Once that is done, we create the assessedMovies
expression that simply applies assessMovies
to the movies list.
The main
function then simply prints out with the passed in newline ("\n"
is the special newline character in a String
) between each of the assessedMovies
(that’s what intercalate
does).
See if you can change the program so that it has a different first letter that decides which movies are bad and good.
After you’ve done that, see if you can change the program so that it has different movie titles in it.
If you’ve enjoyed reading this, please consider purchasing a copy at Leanpub today
Please follow us, and check out our videos Follow @HappyLearnTutes
Also, Volume 2 is now in beta and being written! Show your support and register your interest at its Leanpub site.