I’ve been in charge of Centre for Digital Media’s SEO since January 2018. Aside from the technical optimizations I’ve been implementing and various content & UX projects, I’ve come to realize that a big part of my job has been around SEO advocacy and awareness-raising.
Carrying out SEO audits and submitting GitHub issues is one thing, but changing processes and shifting mindsets around SEO leads to higher rewards.
In this post I attempt to demonstrate, through different examples, how a lack of awareness of and belief in SEO can lead to higher crawl demand, poor indexation, and ultimately bad UX.
Blog post tags
During my initial audit I found an extremely large number of tag pages on our website. All of them had been indexed by Google. Our tag pages are just list pages with our header/footer and a list of tagged blog posts in between.
Our content team had created a large set of tags for each one of our blog posts. Among the generated tags were variations of the same tag (e.g. “cdm news”, “cdmnews”), acronyms… and more.
Drupal’s tagging feature doesn’t come with an autocomplete function, so typing “cdm news” in the tag box would lead to the creation of the /news/cdm-news tag page, while typing “cdmnews” would create /news/cdmnews.
The content team hadn’t realized this was happening and that they were inadvertently creating hundreds of duplicates every month. This had a huge impact on SEO.
I chose to keep our tag pages as they were useful from a UX perspective, but I immediately took steps to remove them from Google’s index as they were providing no real value for SEO. Here are the steps I took to address the problem:
- I redirected all our tag pages to a new /tag/ directory (e.g. the tag page /news/cdmnews became /news/tag/cdmnews). This was a cleaner solution and made it easier for me to deindex them;
- I added the rule “Disallow: /news/tag/*” in robots.txt;
- And I’m now developing new SEO copy for our category pages to convert them to key landing pages for our website.
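The first two steps can be sketched roughly as follows. The redirect below is a single illustrative example written for Apache; the real rules cover hundreds of tag slugs and may be generated differently:

```apache
# .htaccess: permanently move an old tag URL into the new /tag/ directory
Redirect 301 /news/cdmnews /news/tag/cdmnews
```

```
# robots.txt: keep crawlers out of the entire new tag directory
User-agent: *
Disallow: /news/tag/*
```

Note that robots.txt only stops crawling; pages already in the index drop out over time once Google can no longer reach them and the old URLs 301 elsewhere.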
These changes helped me clean up the sitemap and exclude a whole bunch of low-quality pages from Google’s index.
Duplicate pages
As with any 10-year-old website, a fairly large number of duplicate pages had been created over time. I identified two main causes: 1) human error and 2) poor back-end design/development.
- Event pages: our events team creates a new event page every time a new event comes up. For every event page, Drupal had been creating a new calendar page to feed our website calendar. Our calendar pages only contained a link to their associated event page; they provided no SEO or UX value at all. I chose to remove the calendar from our website entirely and redirect the whole calendar directory to our events directory with an .htaccess rule.
- Duplicate content pages: our main program page was accessible through two separate URLs (/program/mdm and /program/master-digital-media-program) because of Drupal’s weird alias system that lets you add a second URL alias to the same node. This issue can happen on any website and in any organization, but Drupal makes it particularly easy for webmasters to create duplicates. For a landing page as important as our master’s program page, this had to be fixed; I deleted the second alias and added a redirect to /program/mdm.
- Mobile subdomain: our mobile subdomain (m.thecdm.ca) had been configured to redirect to our main website (thecdm.ca) after our responsive redesign. However, I noticed that the SSL certificate for this subdomain had expired, which prevented the server from executing the redirect rule. This led to the indexation of hundreds of mobile pages that were no longer needed. We renewed the certificate and everything was back in order.
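The calendar redirect above can be sketched as a single .htaccess rule. The directory names here are assumptions and may not match our actual paths:

```apache
# Collapse the retired calendar directory into the events directory.
# 301 = permanent, so Google gradually drops the old calendar URLs.
RedirectMatch 301 ^/calendar(/.*)?$ /events/
```

RedirectMatch is used instead of a plain Redirect so one rule catches the directory itself plus every page beneath it.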
Pagination
As you can tell from the picture above, we had to implement a pagination solution to improve the indexation of our /news/archive directory.
Our news page had an infinite scroll feature that allowed users to scroll through our news posts down to the very last one. The feature was convenient; however, our server was returning a different URL for every new set of news posts, and Google had indexed every single one of these sets (see ?page=* in the URLs above).
There are multiple ways of addressing pagination issues. I evaluated the main solutions and I chose the one that made the most sense from a UX perspective.
Our school has a year-long master’s program, and each academic year generates a whole bunch of articles around student projects and industry news. I chose to organize our news articles by year and create year directories instead.
- /news/archive now contains a list of all of our 2018 news articles
- /news/archive?page=2 became /news/archive/2017
- I added navigation buttons to “Previous year” and “Next year” on each list page
- I implemented the right rel="next" and rel="prev" tags on each list page
- I haven’t changed Drupal’s canonical system so far
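On the year pages, the navigation tags look roughly like this. The URLs are illustrative, and I’m assuming the sequence runs from the newest list page (/news/archive) toward older years:

```html
<!-- On /news/archive/2017, assuming /news/archive is the first page
     in the sequence and /news/archive/2016 is the next (older) one -->
<link rel="prev" href="https://thecdm.ca/news/archive" />
<link rel="next" href="https://thecdm.ca/news/archive/2016" />
```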
That led to Google removing every news list page from its index, apart from the main /news/archive page, which lists all the most recent news. Our /news/archive page is now more likely to rank and our visitors will have an easier time finding our most recent news.
I know that fully solving pagination issues requires more than that, but I chose to address it step by step for two reasons: 1) I wanted to test and see how Google’s crawler would react, and 2) the following two tweaks require quite a lot of dev time:
- A canonical on the first page only
- A “noindex, follow” tag on all the news list pages, except the first one
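Sketched out, those two planned tweaks would look something like this (the URLs are illustrative):

```html
<!-- First page only: self-referencing canonical -->
<link rel="canonical" href="https://thecdm.ca/news/archive" />

<!-- Every other list page: keep links crawlable but out of the index -->
<meta name="robots" content="noindex, follow">
```

The "follow" part matters: it lets Google keep discovering the articles linked from the deindexed list pages.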
Status code 200?
I encountered an anomaly with our CMS and our back-end configuration: some directories were returning a valid 200 status code for URLs with random strings appended after the last “/”.
Because of internal linking errors (mostly), Google had indexed those pages, and they were all showing up in the SERP. I changed the server response to 404 and everything was back to normal.
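The actual fix lives in our CMS and server configuration, but as an illustration only, an Apache rule along these lines can force a 404 for junk appended to a page that should be a leaf URL (the path pattern here is an assumption):

```apache
# Illustrative sketch -- the real paths and patterns may differ.
# In Apache 2.4, RewriteRule with a "-" substitution and R=404
# returns a 404 instead of redirecting:
RewriteEngine On
RewriteRule ^program/mdm/.+$ - [R=404,L]
```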
Immediate next steps
- Improve the sitemap cron job to make sure any noindex/redirected/deleted pages are removed from our sitemaps.
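As a hypothetical sketch of that sitemap filter: the cron job would keep a URL only if it still resolves with a 200 and isn’t marked noindex. The function and data below are my own stand-ins, not our actual implementation:

```python
# Hypothetical sketch of the planned sitemap filter. The status codes and
# robots meta values are stand-ins for whatever the cron job fetches.
def should_stay_in_sitemap(status_code: int, robots_meta: str) -> bool:
    """A page stays in the sitemap only if it resolves with 200 and its
    robots meta tag does not contain 'noindex'."""
    return status_code == 200 and "noindex" not in robots_meta.lower()

# Example pages: a live list page, a deindexed tag page, a redirected page
urls = [
    ("/news/archive", 200, "index, follow"),
    ("/news/tag/cdmnews", 200, "noindex, follow"),
    ("/calendar/old-event", 301, ""),
]

sitemap = [u for (u, code, meta) in urls if should_stay_in_sitemap(code, meta)]
print(sitemap)  # only /news/archive survives the filter
```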
- …more to come.