Modern PHP Developer
Table of Contents
- Your first iterator class
- Why iterator?
- SPL Iterators
- ArrayObject vs SPL ArrayIterator
- Iterating the File System
- Peeking ahead with CachingIterator
- Generators
- The end
If you have used a for or foreach loop in PHP, the idea of iteration is most likely not foreign to you. You hand an array to a loop and perform some logic inside it, but did you know that you can also loop over data structures other than arrays? That's where iterators come into play.
Below is a summarised definition of an iterator from Wikipedia:
In computer programming, an iterator is an object that enables a programmer to traverse a container, particularly lists.[...] Note that an iterator performs traversal and also gives access to data elements in a container, but does not perform iteration [...]. An iterator is behaviorally similar to a database cursor.
Some key points to remember here:
- An iterator enables us to traverse a container, much as we would traverse an array.
- An iterator does not perform the iteration itself. In our previous example, for does the iteration. Other loop types such as foreach and while can do the iteration too.
Now that we know the definition of an iterator, the concept may still be somewhat obscure, but do not worry, we aren't done yet. We have established that an iterator works similarly to an array and that it can be looped through.
It is helpful to understand how an array actually works in a for loop. Let's take a look at the code below:
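The original listing is not reproduced here, so the following is a minimal sketch of such a loop. The sample values in $data and the $seen array are made up for illustration.

```php
<?php
// Sample data, invented for this sketch.
$data = ['apple', 'banana', 'cherry'];

$seen = [];
// Steps 1, 2 and 3 live in the loop header: initialise, check, advance.
for ($i = 0; $i < count($data); $i++) {
    $key   = $i;          // step 4: key of the current element
    $value = $data[$i];   // step 5: value of the current element

    $seen[$key] = $value;
    echo $key, ' => ', $value, PHP_EOL;
}
```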
Here is how an array works in a for loop:
- In step 1, we set $i to 0. ( $i = 0 )
- In step 2, we check whether $i is less than the length of $data. ( $i < count($data) )
- In step 3, we increase $i value by 1. ( $i++ )
- In step 4, we can access the key of the current element. ( $key = $i )
- In step 5, we can also get the value of current element. ( $value = $data[$i] )
We can abstract the steps as simple functions as below:
- Step 1 = rewind().
- Step 2 = valid().
- Step 3 = next().
- Step 4 = key().
- Step 5 = current().
At an abstract level, we can imagine that any object providing the five functions above can be looped through.
In fact, an iterator is nothing but a class that implements all five steps mentioned above. PHP provides a standard Iterator interface for exactly this purpose, and the Standard PHP Library (SPL), a collection of interfaces and classes meant to solve common problems, builds its ready-made iterators on top of it.
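To make the mapping between the five steps and the five methods concrete, here is a small sketch that drives an iterator by hand and then lets foreach do the same work. It uses SPL's ArrayIterator as a stand-in; the sample data is made up.

```php
<?php
$iterator = new ArrayIterator(['a' => 1, 'b' => 2, 'c' => 3]);

// Driving the iterator manually: rewind(), valid() and next() in the
// loop header, key() and current() in the body.
$manual = [];
for ($iterator->rewind(); $iterator->valid(); $iterator->next()) {
    $manual[$iterator->key()] = $iterator->current();
}

// foreach makes the same five calls for us behind the scenes.
$viaForeach = [];
foreach ($iterator as $key => $value) {
    $viaForeach[$key] = $value;
}
```

Both loops should collect exactly the same keys and values.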
1. Your first iterator class
Now that we understand what an iterator is, it's time to build our first one.
Our first iterator represents the top 10 starred PHP repositories on GitHub. We can pass it into a foreach and loop through it just like an array. We will name it TrendingRepositoriesIterator.
First, we need to make our class implement the Iterator interface.
An iterator must always implement the five methods described above. The final code of TrendingRepositoriesIterator is as follows:
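The original listing is not available here, so what follows is only a sketch of how such a class might look, assuming PHP 8.0 or later. Several details are assumptions made for illustration: the optional $fetcher constructor argument (a hypothetical hook so the class can be tried without network access), the choice to store each repository's full_name, and the exact search query sent to GitHub.

```php
<?php
class TrendingRepositoriesIterator implements Iterator
{
    private array $repos = [];
    private int $pointer = 0;
    /** @var callable|null Hypothetical hook returning a list of repositories. */
    private $fetcher;

    public function __construct(?callable $fetcher = null)
    {
        $this->fetcher = $fetcher;
        $this->populate();
    }

    public function populate(): void
    {
        if ($this->fetcher !== null) {
            $this->repos = array_values(($this->fetcher)());
            return;
        }
        // Assumed query: most-starred PHP repositories via GitHub's search API.
        // GitHub rejects requests that carry no User-Agent header.
        $context = stream_context_create([
            'http' => ['header' => "User-Agent: trending-repos-sketch\r\n"],
        ]);
        $url  = 'https://api.github.com/search/repositories'
              . '?q=language:php&sort=stars&order=desc&per_page=10';
        $json = file_get_contents($url, false, $context);
        $payload = $json === false ? [] : json_decode($json, true);
        $this->repos = array_column($payload['items'] ?? [], 'full_name');
    }

    public function rewind(): void { $this->pointer = 0; }
    public function valid(): bool { return isset($this->repos[$this->pointer]); }
    public function next(): void { $this->pointer++; }
    public function key(): mixed { return $this->pointer; }
    public function current(): mixed { return $this->repos[$this->pointer]; }
}

// Usage with a stubbed fetcher, so no HTTP request is made.
$iterator = new TrendingRepositoriesIterator(
    fn () => ['laravel/laravel', 'symfony/symfony']
);
foreach ($iterator as $position => $repo) {
    echo $position + 1, '. ', $repo, PHP_EOL;
}
```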
- public function populate(): We will not go in depth on this function, as that would defeat the purpose of this chapter. Basically, it fetches the top 10 starred PHP repositories from GitHub via the public GitHub API and stores them in the $repos property.
- private $repos: We use this property to store the fetched repositories.
- private $pointer: We could use the array's internal pointer to do the job; however, since we are building our own iterator, we want to retain full control.
- public function __construct(): The constructor fetches the target repositories when an object is instantiated.
- public function rewind(): We use this to set the pointer to the first position.
- public function valid(): As long as a value is set at the current pointer, the position is valid.
- public function next(): This is used to advance the pointer by one position.
- public function key(): This returns the position of the current pointer.
- public function current(): This returns the value at the current pointer.
Let's see a use case of TrendingRepositoriesIterator, which can be used just like an array:
Awesome! Now we have written our first iterator and as you can see, it is actually very easy and straightforward.
2. Why iterator?
You might still wonder why we need to use an iterator. Can't we just use an array? The answer is yes and no. In most cases, an array is sufficient for the job, although iterators do come with some key advantages, which we will share next. Keep in mind, we are by no means suggesting using iterators in all circumstances.
In our first iterator, TrendingRepositoriesIterator, the details of traversing GitHub repositories are completely hidden from the outside. We can update how we get the data, where we get the data from, and how we want to traverse the resources. No change is needed in the client code. This, known as encapsulation, is one of the key concepts of Object-Oriented Programming.
Additional examples include:
To iterate through MySQL results, we can use:
To iterate through the content of a text file, we can:
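For the text file case, one possible approach is SPL's SplFileObject, which can be traversed line by line with foreach. This is only a sketch: the temporary file and its three lines are created here purely so the example is self-contained.

```php
<?php
// Create a throwaway file so the sketch can run anywhere.
$path = tempnam(sys_get_temp_dir(), 'lines');
file_put_contents($path, "first line\nsecond line\nthird line\n");

$file = new SplFileObject($path);
// Strip the trailing newline of each line and skip empty lines.
$file->setFlags(
    SplFileObject::DROP_NEW_LINE
    | SplFileObject::READ_AHEAD
    | SplFileObject::SKIP_EMPTY
);

$lines = [];
foreach ($file as $line) {
    $lines[] = $line;
    echo $line, PHP_EOL;
}

// Release the file handle before deleting the temporary file.
$file = null;
unlink($path);
```

The client code never touches fopen() or fgets(); it only sees something it can loop over.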
With an iterator, we can encapsulate the process of traversing the resource so that the outside world is not aware of the internal operations. In fact, the outside world does not need to know where we get the data from or how it is traversed in a loop. All it needs to know is that it can iterate through the resource as simply as:
Encapsulation is a very powerful concept and it enables us to write clean code.
Efficient memory usage
Efficient memory usage is a key benefit of iterators.
In our TrendingRepositoriesIterator class, we could fetch the resource dynamically, meaning we would only fetch data from the GitHub API when the next() method is called. This technique is called lazy loading. It can save a significant amount of memory, because a value is only generated when it is needed.
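The sketch below illustrates lazy loading with a page-by-page iterator, assuming PHP 8.0 or later. The class name LazyPageIterator and the $loadPage callable are hypothetical; in a real application the callable would wrap an API request, whereas here it reads from an in-memory array and counts how often it is invoked.

```php
<?php
class LazyPageIterator implements Iterator
{
    /** @var callable Hypothetical loader: receives a page number, returns its items. */
    private $loadPage;
    private int $page = 0;
    private array $items = [];
    private int $offset = 0;    // position inside the current page
    private int $position = 0;  // overall position, used as the key

    public function __construct(callable $loadPage)
    {
        $this->loadPage = $loadPage;
    }

    private function load(int $page): void
    {
        $this->page   = $page;
        $this->items  = array_values(($this->loadPage)($page));
        $this->offset = 0;
    }

    public function rewind(): void { $this->position = 0; $this->load(1); }
    public function valid(): bool { return isset($this->items[$this->offset]); }
    public function current(): mixed { return $this->items[$this->offset]; }
    public function key(): mixed { return $this->position; }

    public function next(): void
    {
        $this->position++;
        $this->offset++;
        if (!isset($this->items[$this->offset])) {
            // The next page is requested only once the current one is used up.
            $this->load($this->page + 1);
        }
    }
}

$calls = 0;
$pages = [1 => ['a', 'b'], 2 => ['c']];
$iterator = new LazyPageIterator(function (int $page) use (&$calls, $pages) {
    $calls++;
    return $pages[$page] ?? [];
});

$firstTwo = [];
foreach ($iterator as $item) {
    $firstTwo[] = $item;
    if (count($firstTwo) === 2) {
        break; // page 2 has not been requested at this point
    }
}
$callsAfterTwoItems = $calls;
```

Stopping after two items means the loader runs only once; the second page is never requested unless the loop actually reaches it.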
Easy to add additional functionalities
Another benefit of using iterator is that we can decorate it to add additional functionalities. Take our TrendingRepositoriesIterator class for example. We want to exclude "laravel" from the resource. One obvious method is to update our original class, although that is of course not what we would do here.
We can decorate the original iterator using SPL's CallbackFilterIterator and no change is needed for TrendingRepositoriesIterator at all.
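As a sketch of this decoration, the example below wraps an iterator in CallbackFilterIterator. To stay self-contained it uses an ArrayIterator with made-up repository names in place of TrendingRepositoriesIterator; any Iterator could be passed in the same way.

```php
<?php
// Stand-in for TrendingRepositoriesIterator; the names are sample data.
$repositories = new ArrayIterator([
    'laravel/laravel',
    'symfony/symfony',
    'composer/composer',
]);

// The callback receives the current value, the key and the inner iterator.
// Returning true keeps the element, returning false skips it.
$withoutLaravel = new CallbackFilterIterator(
    $repositories,
    function ($current, $key, $iterator) {
        return strpos($current, 'laravel') === false;
    }
);

$names = [];
foreach ($withoutLaravel as $name) {
    $names[] = $name;
    echo $name, PHP_EOL;
}
```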
The cool part of this is that there is no duplication of objects. The callback fires only as the wrapped TrendingRepositoriesIterator advances through next(), and the filtering logic is applied to each element at that moment. This is a great way to save memory as well as boost performance.
3. SPL Iterators
Now that we understand the power and benefits of using iterators, it is good practice to use iterators to solve suitable problems. However if we were to write iterators by ourselves whenever we encounter a new problem, it would be very time consuming since it does require us to implement a set of pre-defined functions.
Luckily, PHP has done a good job of providing a set of iterators for solving some common problems. In the following sections, we will work through a set of common iterators provided by SPL. As a refresher, SPL stands for Standard PHP Library, which was built to provide a collection of interfaces and classes that are meant to solve common problems.
4. ArrayObject vs SPL ArrayIterator
In PHP, array is one of the eight primitive types. PHP provides 79 functions for handling array-related tasks (reference). It is completely suitable to use arrays; however, there are times, depending on how much you embrace Object-Oriented programming, when you may want to use an array as an object. For this case, PHP provides two classes that make arrays first-class citizens in Object-Oriented code.
The first option we have is ArrayObject. This class allows objects to work as arrays.
Let's take a look at its class signature:
As we have seen above, ArrayObject implements IteratorAggregate. What is IteratorAggregate? It is an interface to create an external iterator. In simple terms, it is a quick way to create an iterator: instead of implementing the Iterator interface with its five methods (rewind, valid, current, key and next), IteratorAggregate allows you to delegate that task to another iterator. All you need to do is implement a single method, getIterator().
ArrayObject's getIterator() creates an external ArrayIterator, which provides the iteration feature.
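To illustrate the delegation, here is a minimal sketch of a class of our own that implements IteratorAggregate. The BookShelf class and its titles are hypothetical and exist only for this example.

```php
<?php
class BookShelf implements IteratorAggregate
{
    private array $titles;

    public function __construct(array $titles)
    {
        $this->titles = $titles;
    }

    // The only method required: hand the traversal over to another iterator.
    public function getIterator(): Iterator
    {
        return new ArrayIterator($this->titles);
    }
}

$shelf = new BookShelf(['Modern PHP', 'Clean Code']);

$titles = [];
foreach ($shelf as $title) {
    $titles[] = $title;
    echo $title, PHP_EOL;
}
```

BookShelf never defines rewind(), valid(), current(), key() or next(), yet foreach can still traverse it.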
As ArrayObject implements IteratorAggregate, we can use it in a foreach loop just as an array.
The primary reason we want to use ArrayObject is to use array in Object-Oriented fashion.
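A short sketch of both points, using made-up sample data: ArrayObject in a foreach loop, and array-like operations expressed as method calls.

```php
<?php
$fruits = new ArrayObject(['apple', 'banana']);

// Object-oriented counterparts of common array operations.
$fruits->append('cherry');   // like $array[] = 'cherry'
$fruits[] = 'date';          // array syntax still works via ArrayAccess
$total = $fruits->count();   // like count($array)

$list = [];
foreach ($fruits as $index => $fruit) {
    $list[$index] = $fruit;
    echo $index, ' => ', $fruit, PHP_EOL;
}

// The external iterator that foreach relies on.
$external = $fruits->getIterator();
```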
ArrayIterator works similarly to ArrayObject.
Let's take a look at its class signature as well:
It is almost identical to ArrayObject in terms of the interfaces it implements. The only difference is that, instead of the IteratorAggregate interface that ArrayObject implements, it implements SeekableIterator (which itself extends Iterator).
We use ArrayIterator the same way as we use ArrayObject in a foreach loop:
Use array in Object-Oriented fashion:
You may be wondering when to use ArrayObject and when to use ArrayIterator. It is important to know the difference and the relationship between ArrayObject and ArrayIterator.
As we have already discovered in the ArrayObject section, ArrayObject actually creates ArrayIterator as an external iterator. It is fair to say ArrayIterator does what ArrayObject does, and it provides more functionality, specifically seeking to a position. This is accomplished by implementing SeekableIterator.
Besides moving a pointer from top to bottom as iterator, it allows you to randomly jump to a position.
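A brief sketch of seeking with ArrayIterator, using made-up data. seek() moves the pointer straight to the given position and throws an OutOfBoundsException when that position does not exist.

```php
<?php
$letters = new ArrayIterator(['a', 'b', 'c', 'd']);

// Jump directly to the third element (positions start at 0).
$letters->seek(2);
$current = $letters->current();
echo $current, PHP_EOL;

// Seeking past the end is reported with an exception.
$outOfRange = false;
try {
    $letters->seek(10);
} catch (OutOfBoundsException $e) {
    $outOfRange = true;
}
```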
Finally, both classes ship with SPL: the PHP manual lists ArrayIterator among the SPL iterators and ArrayObject among SPL's miscellaneous classes.
5. Iterating the File System
It is a very common task to list the contents of a given directory. PHP provides lots of functions for handling the file system. One of them is scandir().
Suppose we are given a task to list out all files in a given directory as below:
---books
   | ---book_item_1.txt
   | ---book_item_2.txt
   | ---book_item_3.txt
   | ---book_item_4.txt
We can accomplish it through scandir() as shown below:
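A minimal sketch of the scandir() approach follows. So that it can run anywhere, it first recreates the books directory under the system temp directory; that setup and the cleanup at the end are scaffolding for the example, not part of the task itself.

```php
<?php
// Scaffolding: build a throwaway copy of the "books" directory.
$dir = sys_get_temp_dir() . DIRECTORY_SEPARATOR . uniqid('books_');
mkdir($dir);
foreach (range(1, 4) as $n) {
    touch($dir . DIRECTORY_SEPARATOR . "book_item_$n.txt");
}

// The actual task: list everything inside the directory.
$entries = scandir($dir);
print_r($entries);

// Cleanup.
array_map('unlink', glob($dir . DIRECTORY_SEPARATOR . '*.txt'));
rmdir($dir);
```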
Note that the result of scandir() includes "." and "..". These are two virtual directories that you'll find in each directory of the file system.
As this chapter is about iterators, we are going to introduce some iterators for handling the filesystem. Hopefully in your next project, you will be able to utilize some of them. Three iterators come in handy: DirectoryIterator, FilesystemIterator and RecursiveDirectoryIterator.
Before we look into each one of them, it is useful to take a look at their inheritance relationship: RecursiveDirectoryIterator extends FilesystemIterator, which extends DirectoryIterator, which in turn extends SplFileInfo.
The DirectoryIterator class provides a simple interface for viewing the contents of filesystem directories.
To accomplish the same task, we can use DirectoryIterator:
The only parameter needed to create a DirectoryIterator object is a directory's path. Compared to the scandir() function, DirectoryIterator returns an object instead of the file name as a string. The object holds various pieces of information about the file, which we can make use of.
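Here is a sketch of the DirectoryIterator version. As before, the temporary books directory is scaffolding created only for this example. Because DirectoryIterator also yields the two virtual directories, the loop skips them with isDot().

```php
<?php
// Scaffolding: build a throwaway copy of the "books" directory.
$dir = sys_get_temp_dir() . DIRECTORY_SEPARATOR . uniqid('books_');
mkdir($dir);
foreach (range(1, 4) as $n) {
    touch($dir . DIRECTORY_SEPARATOR . "book_item_$n.txt");
}

$names = [];
$extensions = [];
foreach (new DirectoryIterator($dir) as $file) {
    if ($file->isDot()) {
        continue; // skip "." and ".."
    }
    // Each $file is an object exposing details about the entry.
    $names[] = $file->getFilename();
    $extensions[] = $file->getExtension();
}
sort($names); // directory order is not guaranteed, so sort for display
print_r($names);

// Cleanup.
array_map('unlink', glob($dir . DIRECTORY_SEPARATOR . '*.txt'));
rmdir($dir);
```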
To accomplish the same task by using FilesystemIterator, we can use:
This looks almost the same as DirectoryIterator, except that FilesystemIterator has automatically filtered out the two virtual directories.
Are they really the same? We can use a simple method to tell the differences:
The result of running above script from CLI is:
Now we can see they are actually quite different internally:
- DirectoryIterator returns an integer as the key and a DirectoryIterator as the value in a loop.
- FilesystemIterator returns a string of full path as the key and a SplFileInfo object as the value in a loop.
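The difference described above can be checked with a sketch like the following, which inspects the first key and value each iterator produces. The temporary directory is again scaffolding for the example.

```php
<?php
// Scaffolding: a throwaway directory with one file in it.
$dir = sys_get_temp_dir() . DIRECTORY_SEPARATOR . uniqid('books_');
mkdir($dir);
touch($dir . DIRECTORY_SEPARATOR . 'book_item_1.txt');

// Look at the first key/value pair from DirectoryIterator.
foreach (new DirectoryIterator($dir) as $key => $value) {
    $directoryKeyType = gettype($key);
    $directoryValueClass = get_class($value);
    break;
}

// Look at the first key/value pair from FilesystemIterator.
foreach (new FilesystemIterator($dir) as $key => $value) {
    $filesystemKeyType = gettype($key);
    $filesystemValueClass = get_class($value);
    break;
}

echo "DirectoryIterator:  $directoryKeyType => $directoryValueClass", PHP_EOL;
echo "FilesystemIterator: $filesystemKeyType => $filesystemValueClass", PHP_EOL;

// Cleanup.
unlink($dir . DIRECTORY_SEPARATOR . 'book_item_1.txt');
rmdir($dir);
```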
In fact, FilesystemIterator comes with a bit more flexibility. When creating a FilesystemIterator object, it accepts a directory's path as the first parameter, similar to DirectoryIterator. Moreover, you can optionally pass flags as a second parameter to configure various aspects of the iterator's behavior:
- FilesystemIterator::CURRENT_AS_PATHNAME: This flag will make FilesystemIterator return file path instead of SplFileInfo object as the value.
- FilesystemIterator::CURRENT_AS_FILEINFO: This flag will make FilesystemIterator return SplFileInfo object as the value. This is the default behavior. You don't have to set it explicitly.
- FilesystemIterator::CURRENT_AS_SELF: This flag will make FilesystemIterator return FilesystemIterator itself as the value.
- FilesystemIterator::KEY_AS_PATHNAME: This flag will make FilesystemIterator return file path as the key. This is the default behavior. You don't have to set it explicitly.
- FilesystemIterator::KEY_AS_FILENAME: This flag will make FilesystemIterator return file name and extension instead of file path as the key.
- FilesystemIterator::FOLLOW_SYMLINKS: This flag will make RecursiveDirectoryIterator::hasChildren() follow symlinks.
- FilesystemIterator::NEW_CURRENT_AND_KEY: This flag helps set two other flags(FilesystemIterator::KEY_AS_FILENAME and FilesystemIterator::CURRENT_AS_FILEINFO) at once.
- FilesystemIterator::SKIP_DOTS: This flag will make FilesystemIterator ignore virtual directories ("." and "..").
- FilesystemIterator::UNIX_PATHS: This flag will make FilesystemIterator use the Unix-style directory separator (/) regardless of the system the PHP script runs on.
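As a minimal sketch of how these flags combine, the example below joins three of them with the bitwise OR operator so that keys become file names and values become path strings. The temporary directory is scaffolding for the example.

```php
<?php
// Scaffolding: a throwaway directory with two files in it.
$dir = sys_get_temp_dir() . DIRECTORY_SEPARATOR . uniqid('books_');
mkdir($dir);
touch($dir . DIRECTORY_SEPARATOR . 'book_item_1.txt');
touch($dir . DIRECTORY_SEPARATOR . 'book_item_2.txt');

$flags = FilesystemIterator::KEY_AS_FILENAME
       | FilesystemIterator::CURRENT_AS_PATHNAME
       | FilesystemIterator::SKIP_DOTS;

// Keys are file names and values are plain path strings.
$map = iterator_to_array(new FilesystemIterator($dir, $flags));
ksort($map); // directory order is not guaranteed
print_r($map);

// Cleanup.
array_map('unlink', glob($dir . DIRECTORY_SEPARATOR . '*.txt'));
rmdir($dir);
```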
6. Peeking ahead with CachingIterator
In this section, we will introduce an iterator with the ability to peek at the next element in an iteration. This feature enables us to do a lot of useful things, such as executing something different when the iterator reaches the end of the list.
The class with this great power is CachingIterator.
Let's first take a look at its class signature, then we will go into the details of its usage.
CachingIterator inherits from IteratorIterator. What is IteratorIterator? It is simply a wrapper around another iterator. Under the hood, it forwards calls to the five Iterator methods (rewind(), current(), key(), valid(), next()) to the iterator it wraps. We can also retrieve the inner iterator by calling getInnerIterator().
Due to the nature of this class, the inner iterator's pointer always moves one step ahead of CachingIterator, and CachingIterator provides a method hasNext() to tell us if it reaches the end of the list. That is how CachingIterator peeks ahead.
Now, let's put it into action.
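A small sketch of peeking ahead, with made-up data: the loop appends a comma after every element except the last one, which gets a full stop instead.

```php
<?php
$fruits = new CachingIterator(
    new ArrayIterator(['apple', 'banana', 'cherry'])
);

$sentence = '';
foreach ($fruits as $fruit) {
    $sentence .= $fruit;
    // hasNext() tells us whether another element follows the current one.
    $sentence .= $fruits->hasNext() ? ', ' : '.';
}

echo $sentence, PHP_EOL;
```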
Result of running above script in a CLI:
Similar to other iterators, to create a CachingIterator instance, we pass in an iterator as the first parameter to the class constructor. As we can see, the real magic behind peeking ahead is provided by the hasNext() method. This method tells us whether there is an immediate next element.
Beside the first parameter, CachingIterator also optionally accepts a second parameter as a flag.
- CachingIterator::CALL_TOSTRING: It converts every element to a string, so casting the CachingIterator to a string returns the string form of the current element. This is the default behavior.
- CachingIterator::CATCH_GET_CHILD: It will capture all exceptions thrown when accessing children.
- CachingIterator::TOSTRING_USE_KEY: It will return the key value when casting the iterator to a string in a loop.
- CachingIterator::TOSTRING_USE_CURRENT: It will return the current value when casting the iterator to a string in a loop.
- CachingIterator::TOSTRING_USE_INNER: It will return the inner iterator cast to a string when casting the iterator to a string in a loop. If we set this flag in the same code as the previous example, it will throw an exception. This is because ArrayIterator does not implement the __toString() method.
- CachingIterator::FULL_CACHE: As its name suggests, CachingIterator is also able to cache. When this flag is set, it caches the elements it has iterated over so that they can be retrieved later, for example with getCache().
7. Generators
By now you are hopefully convinced of the benefits of iterators. They encapsulate the details of traversing and they can be much more memory-efficient than building arrays in memory. However, everything has its price. To create an iterator, we still have to implement the Iterator interface. You might be put off by the idea of implementing the five methods that the Iterator interface requires. It is time consuming and sometimes even complex to implement them.
Starting from PHP 5.5, you no longer need to be intimidated. PHP introduced generators, which provide an easy way to implement simple iterators without the overhead or complexity of writing a class that implements the Iterator interface.
What exactly is a generator? A generator looks like a normal PHP function, except that it contains a special keyword, "yield".
Below is a simple example of a generator function. We won't have such a generator in the real world application - it is here for demonstration only:
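As a minimal sketch, the hypothetical countTo() function below yields the numbers from 1 up to a limit. The function name and its parameter are made up for demonstration.

```php
<?php
// A generator function: the presence of yield is what makes it one.
function countTo(int $limit): Generator
{
    for ($i = 1; $i <= $limit; $i++) {
        yield $i; // hand one value to the caller, then pause here
    }
}

$numbers = [];
foreach (countTo(3) as $number) {
    $numbers[] = $number;
    echo $number, PHP_EOL;
}

// Calling the function returns a Generator object, which is an Iterator.
$generator = countTo(3);
$isIterator = $generator instanceof Iterator;
```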
Internally, PHP recognizes a function as a generator function when it spots the yield keyword. Whenever a generator function is called, PHP returns a Generator object instead of running the function body straight away. This object is an instance of the internal Generator class, which implements the Iterator interface. This way, users are able to create iterators without writing the contracted code, all thanks to PHP generators.
yield is used wherever we need to provide the next value of the iteration. Think of it as return in a function, or as the current() method in a regular iterator.
Let's turn our first iterator class, TrendingRepositoriesIterator, into a generator function:
It turns out to be much less code with a generator. We can also use it in a foreach loop, in the same way as we did with TrendingRepositoriesIterator:
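Putting both steps together, here is a sketch of a generator-based version and its use in a foreach loop. The function name trendingRepositories() and its $fetchRepositories argument are hypothetical; the callable stands in for the GitHub API request so the example can run without network access.

```php
<?php
// Hypothetical generator counterpart of TrendingRepositoriesIterator.
// $fetchRepositories is an assumed callable returning a list of repositories.
function trendingRepositories(callable $fetchRepositories): Generator
{
    foreach ($fetchRepositories() as $position => $repository) {
        yield $position => $repository;
    }
}

// Stubbed data instead of a real API response.
$fetchStub = function () {
    return ['laravel/laravel', 'symfony/symfony'];
};

$repos = [];
foreach (trendingRepositories($fetchStub) as $position => $repo) {
    $repos[$position] = $repo;
    echo $position + 1, '. ', $repo, PHP_EOL;
}
```

The pointer bookkeeping and the five Iterator methods are gone; the Generator object PHP creates takes care of them.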
Note that generators themselves do not provide anything special - they just make creating iterators simpler. In other words, they are definitely not replacements for iterators.
8. The End
Hopefully this simple tutorial helped you with your development.