
How does WP (3) resolve (pretty) URLs?

When I was checking a pretty custom WP site, I couldn’t easily figure out how the URLs flow through the system, so I decided to write down what I found out about how it works.

index.php -> wp-blog-header.php -> wp-load.php -> wp-includes/functions.php -> wp() -> (wp-includes/classes.php/WP)wp->main() -> wp->parse_request()

… is the basic flow used to decode the URLs.

The first point you need to look at is:

// Fetch the rewrite rules.
$rewrite = $wp_rewrite->wp_rewrite_rules();

In my case this ($rewrite) contained a ton of rules which I couldn’t really match with any code or rules in the database. Where are these rules coming from? The following comment on the function suggests that they are cached:

* WP_Rewrite::rewrite_rules()} is that this method stores the rewrite rules
* in the ‘rewrite_rules’ option and retrieves it. This prevents having to

I wanted to make sure. To find out, we have to dive into wp_rewrite_rules(), which can be found in wp-includes/rewrite.php:

function wp_rewrite_rules() {
$this->rules = get_option('rewrite_rules');

The above $this->rules already contained the ‘ghost rules’, so apparently they are stored somewhere in the database. To find out where, you just debug get_option() in functions.php. This brought me to:

$alloptions = wp_load_alloptions();

And options are retrieved, obviously, from:

if ( !$alloptions_db = $wpdb->get_results( "SELECT option_name, option_value FROM $wpdb->options WHERE autoload = 'yes'" ) )
    $alloptions_db = $wpdb->get_results( "SELECT option_name, option_value FROM $wpdb->options" );

So this is indeed a kind of cache: the generated rules are stored in the options table under ‘rewrite_rules’. To flush that cache, go to wp-admin -> Settings -> Permalinks and click Save Changes. After that the rules are regenerated.
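The caching behaviour described above can be sketched in a few lines. This is only an illustration, not WordPress's actual code: the $options array is an in-memory stand-in for the options table, and build_rules(), cached_rewrite_rules() and flush_rules() are hypothetical names made up for this example.

```php
<?php
// Hypothetical in-memory stand-in for the wp_options table.
$options = array();

// Hypothetical stand-in for the expensive rule generation step.
function build_rules() {
    return array(
        'review/([^/]+)/?$' => 'index.php?review=$matches[1]',
    );
}

// Return the stored rules, regenerating and storing them only when missing.
function cached_rewrite_rules(array &$options) {
    if (empty($options['rewrite_rules'])) {
        $options['rewrite_rules'] = build_rules();
    }
    return $options['rewrite_rules'];
}

// Removing the stored value forces regeneration on the next read.
function flush_rules(array &$options) {
    unset($options['rewrite_rules']);
}

$first = cached_rewrite_rules($options);                      // generated and stored
$options['rewrite_rules']['stale/?$'] = 'index.php?stale=1';  // simulate a leftover 'ghost rule'
$stale = cached_rewrite_rules($options);                      // still served from the store
flush_rules($options);
$fresh = cached_rewrite_rules($options);                      // rebuilt, ghost rule gone
```

As long as nothing flushes the stored value, a leftover rule keeps coming back even when no code generates it any more, which matches the ‘ghost rules’ seen earlier.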

After flushing, there were still a few of the mysterious rules in the rewrite list. Those were due to:

register_taxonomy( 'review', 'post', array( 'label' => __('Review') ) );

When a custom taxonomy is registered, you will see rules like this:

[review/([^/]+)/feed/(feed|rdf|rss|rss2|atom)/?$] => index.php?review=$matches[1]&feed=$matches[2]
[review/([^/]+)/(feed|rdf|rss|rss2|atom)/?$] => index.php?review=$matches[1]&feed=$matches[2]
[review/([^/]+)/page/?([0-9]{1,})/?$] => index.php?review=$matches[1]&paged=$matches[2]
[review/([^/]+)/?$] => index.php?review=$matches[1]
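To make the pattern in those four rules explicit, here is a small sketch that builds the same rule shapes from a taxonomy slug. The taxonomy_rules() helper is hypothetical and is not how WordPress generates them internally; it assumes the slug contains no regex special characters.

```php
<?php
// Hypothetical helper: builds the four rule shapes shown above for one slug.
// Assumes $slug contains no regex special characters.
function taxonomy_rules($slug) {
    $base = $slug . '/([^/]+)';
    $var  = 'index.php?' . $slug . '=$matches[1]';
    return array(
        $base . '/feed/(feed|rdf|rss|rss2|atom)/?$' => $var . '&feed=$matches[2]',
        $base . '/(feed|rdf|rss|rss2|atom)/?$'      => $var . '&feed=$matches[2]',
        $base . '/page/?([0-9]{1,})/?$'             => $var . '&paged=$matches[2]',
        $base . '/?$'                               => $var,
    );
}

// Print the rules in the same "[pattern] => query" form as the list above.
$rules = taxonomy_rules('review');
foreach ($rules as $match => $query) {
    echo "[$match] => $query\n";
}
```

The more specific patterns (feed, page) come before the catch-all term rule, which matters because the first matching rule wins.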

Now that ‘mystery’ was solved, on with the rest of the flow:

In the parse_request() method you can find:

foreach ( (array) $rewrite as $match => $query) {

This loop walks through all the rewrite rules. WP uses them to translate pretty URLs into query arguments that the application can use to fetch the post(s) and other data it needs, and then combine that with the templates.

if ( preg_match("#^$match#", $request_match, $matches) ||
    preg_match("#^$match#", urldecode($request_match), $matches) ) {
// Got a match.
$this->matched_rule = $match;

Once WP has found a match, we can check what the result is by temporarily adding a debug line:

die(“$match => $query”);
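The matching loop can be tried outside WordPress with a stand-alone sketch. The two rules are copied from the taxonomy example; find_rule() and the request path 'review/windows-7/page/2' are made up for illustration, and the real parse_request() does more work around this loop.

```php
<?php
// Rules copied from the taxonomy example; order matters, the first match wins.
$rewrite = array(
    'review/([^/]+)/page/?([0-9]{1,})/?$' => 'index.php?review=$matches[1]&paged=$matches[2]',
    'review/([^/]+)/?$'                   => 'index.php?review=$matches[1]',
);

// Hypothetical helper: returns the first matching rule, its query and the
// captured groups, or null when no rule matches the request.
function find_rule(array $rewrite, $request) {
    foreach ($rewrite as $match => $query) {
        if (preg_match("#^$match#", $request, $matches) ||
            preg_match("#^$match#", urldecode($request), $matches)) {
            return array($match, $query, $matches);
        }
    }
    return null;
}

list($match, $query, $matches) = find_rule($rewrite, 'review/windows-7/page/2');
echo "$match => $query\n";
```

Here $matches[1] holds the term slug and $matches[2] the page number, ready to be substituted into the query.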

An example of a matching rule is:

review/([^/]+)/([^/]+)(/[0-9]+)?/?$ => index.php?$matches[1]&name=$matches[2]&page=$matches[3]

The $matches[N] placeholders in the rule’s query are then replaced with the groups captured from the URL. This is done with this code:

$query = addslashes(WP_MatchesMapRegex::apply($query, $matches));

and the result is, for instance:

review/([^/]+)/([^/]+)(/[0-9]+)?/?$ => ms-windows-7&name=windows-7-is-better-but-still-sucks&page=
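The substitution step can be approximated like this. fill_matches() is a simplified, hypothetical stand-in for WP_MatchesMapRegex::apply() and not its real implementation; the request path is reconstructed from the result shown above, and stripping everything up to the "?" is an assumption based on that result no longer containing "index.php?".

```php
<?php
// Simplified, hypothetical stand-in for the substitution step: replaces each
// literal "$matches[N]" in the query with capture group N (empty if unset).
function fill_matches($query, array $matches) {
    return preg_replace_callback(
        '/\$matches\[([1-9][0-9]*)\]/',
        function ($m) use ($matches) {
            $index = (int) $m[1];
            return isset($matches[$index]) ? urlencode($matches[$index]) : '';
        },
        $query
    );
}

$rule  = 'review/([^/]+)/([^/]+)(/[0-9]+)?/?$';
$query = 'index.php?$matches[1]&name=$matches[2]&page=$matches[3]';

// Assumed request path, reconstructed from the result shown above.
preg_match("#^$rule#", 'review/ms-windows-7/windows-7-is-better-but-still-sucks', $matches);

// Drop everything up to and including the "?" so only the query string remains.
$filled = preg_replace('!^.+\?!', '', fill_matches($query, $matches));
echo $filled, "\n";
```

The optional third group did not take part in the match, so $matches[3] is unset and the page argument ends up empty.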

With the following function this is translated to a PHP array:

parse_str($query, $perma_query_vars);

This results in something like this:

Array ( [ms-windows-7] => [name] => windows-7-is-better-but-still-sucks [page] => )
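This last step is plain PHP, so it is easy to reproduce on its own. The query string below is the one from the example above; nothing WordPress-specific is needed.

```php
<?php
$query = 'ms-windows-7&name=windows-7-is-better-but-still-sucks&page=';

// parse_str() fills the array passed as the second argument by reference.
parse_str($query, $perma_query_vars);

// Keys without a value, or with an empty one, end up as empty strings.
print_r($perma_query_vars);
```

Note that the first key has no name of its own: because the rule's query starts with a bare $matches[1], the captured slug itself becomes the array key.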

With this information, it should be clear (or at least easy to follow) how WP resolves its URLs. With a debugger this is all easy enough to find, but it’s still handy to see it all on one page.