Since this is a beginner’s guide, let’s start with the basics.
What is Technical SEO?
Technical SEO is the process of optimizing your website to help search engines like Google find, crawl, understand, and index your pages. The goal is to be found and improve rankings.
How complicated is technical SEO?
It depends. The fundamentals aren’t really difficult to master, but technical SEO can be complex and hard to understand. I’ll keep things as simple as I can with this guide.
In this chapter we’ll cover how to make sure search engines can efficiently crawl your content.
How crawling works
Crawlers grab content from pages and use the links on those pages to find more pages. This lets them find content on the web. There are a few systems in this process that we’ll talk about.
A crawler has to start somewhere. Generally, crawlers build a list of all the URLs they find through links on pages. A secondary source of URLs is sitemaps, which are created by users or by various systems that have lists of pages.
All the URLs that need to be crawled or re-crawled are prioritized and added to the crawl queue. This is basically an ordered list of URLs Google wants to crawl.
The crawler is the system that grabs the content of the pages.
Processing systems handle canonicalization (which we’ll talk about in a minute), send pages to the renderer (which loads the page like a browser would), and process the pages to get more URLs to crawl.
The index is the set of stored pages that Google shows to users.
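The flow described above can be sketched as a toy crawler. This is only an illustration under simplifying assumptions, not how Google works: the `PAGES` dictionary is a made-up stand-in for the web, and the queue is plain first-in, first-out rather than prioritized.

```python
from collections import deque

# Hypothetical in-memory "web": each URL maps to the links found on that page.
PAGES = {
    "/": ["/blog", "/about"],
    "/blog": ["/blog/post-1", "/"],
    "/about": [],
    "/blog/post-1": ["/blog"],
}

def crawl(seed_urls):
    """Breadth-first crawl: fetch a page, extract its links, queue new URLs."""
    queue = deque(seed_urls)   # crawl queue: URLs waiting to be fetched
    seen = set(seed_urls)      # every URL discovered so far
    index = []                 # pages that were fetched and stored
    while queue:
        url = queue.popleft()
        links = PAGES.get(url)  # crawler step: grab the page content
        if links is None:
            continue            # page does not exist, nothing to store
        index.append(url)
        for link in links:      # processing step: find more URLs to crawl
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return index

crawled = crawl(["/"])
```

Starting from the single seed URL, every page reachable through links ends up in `crawled`, and each page is fetched only once.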
There are a few ways you can control what gets crawled on your website. Here are a few options.
A robots.txt file tells search engines where they can and can’t go on your site.
One quick note: Google may index pages that they can’t crawl if links are pointing to those pages. This can be confusing, but if you want to keep pages from being indexed, check out this guide and flowchart, which walk you through the process.
There’s a crawl-delay directive you can use in robots.txt that many crawlers support that lets you set how often they can crawl pages. Unfortunately, Google doesn’t respect this. For Google you’ll need to change the crawl rate in Google Search Console as described here.
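To see how a crawler might read these rules, here is a minimal sketch using Python's standard `urllib.robotparser` module. The robots.txt content, the `ExampleBot` user agent, and the example.com URLs are all invented for the illustration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content for illustration.
ROBOTS_TXT = """\
User-agent: *
Disallow: /admin/
Crawl-delay: 10
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# Ask whether a given user agent may fetch a given URL.
can_crawl_blog = parser.can_fetch("ExampleBot", "https://example.com/blog/")
can_crawl_admin = parser.can_fetch("ExampleBot", "https://example.com/admin/users")

# Crawl-delay value that applies to this user agent (None if not set).
delay = parser.crawl_delay("ExampleBot")
```

A crawler that honors the file would skip anything under `/admin/` and wait the given number of seconds between requests. Whether a particular crawler honors `Crawl-delay` is up to that crawler.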
If you want the page to be accessible to some users but not search engines, then what you probably want is one of these three options:
- Some kind of login system;
- HTTP Authentication (where a password is required for access);
- IP Whitelisting (which only allows specific IP addresses to access the pages)
This type of setup is best for things like internal networks, member-only content, or staging, test, and development sites. It allows a group of users to access the pages, but search engines will not be able to access them and will not index them.
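As a rough sketch of the HTTP Authentication option, the check below validates a Basic `Authorization` header and returns a 401 status when credentials are missing or wrong. The function names and the `staging` / `s3cret` credentials are hypothetical. In practice this is usually configured in the web server rather than written by hand.

```python
import base64
import hmac

def is_authorized(auth_header, username, password):
    """Return True if an HTTP Basic Authorization header matches the credentials."""
    if not auth_header or not auth_header.startswith("Basic "):
        return False
    try:
        decoded = base64.b64decode(auth_header[len("Basic "):]).decode("utf-8")
    except ValueError:  # covers malformed base64 and invalid UTF-8
        return False
    expected = f"{username}:{password}"
    # compare_digest avoids leaking information through timing differences
    return hmac.compare_digest(decoded.encode("utf-8"), expected.encode("utf-8"))

def status_for(auth_header):
    # 401 asks the client for credentials; a crawler has none, so it gets no content
    return 200 if is_authorized(auth_header, "staging", "s3cret") else 401

header = "Basic " + base64.b64encode(b"staging:s3cret").decode("ascii")
```

A crawler requesting the page without credentials receives the 401 response and never sees the content.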
How to see crawl activity
For Google specifically, the easiest way to see what they’re crawling is with the Google Search Console Crawl Stats report which gives you more information about how they’re crawling your website.
If you want to see all crawl activity on your website, then you will need to access your server logs and possibly use a tool to better analyze the data. This can get fairly advanced, but if your hosting has a control panel like cPanel, you should have access to raw logs and some aggregators like Awstats and Webalizer.
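If you have raw logs, a small script can already answer basic questions. The sketch below assumes the common "combined" access log format and uses invented log lines. It counts requests whose user agent mentions Googlebot. Keep in mind that user agents can be spoofed, so a string match alone does not prove a request came from Google.

```python
import re
from collections import Counter

# Hypothetical lines in the "combined" access log format.
LOG_LINES = [
    '66.249.66.1 - - [10/Oct/2023:13:55:36 +0000] "GET /blog/ HTTP/1.1" 200 5120 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
    '203.0.113.7 - - [10/Oct/2023:13:56:01 +0000] "GET /blog/ HTTP/1.1" 200 5120 "-" "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"',
    '66.249.66.1 - - [10/Oct/2023:13:57:12 +0000] "GET /old-page HTTP/1.1" 404 312 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
]

# Request line, status code, response size, referrer, user agent.
LINE_RE = re.compile(
    r'"(?P<method>[A-Z]+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) \S+ "[^"]*" "(?P<agent>[^"]*)"'
)

def googlebot_hits(lines):
    """Count (path, status) pairs for requests whose user agent mentions Googlebot."""
    hits = Counter()
    for line in lines:
        match = LINE_RE.search(line)
        if match and "Googlebot" in match.group("agent"):
            hits[(match.group("path"), int(match.group("status")))] += 1
    return hits

hits = googlebot_hits(LOG_LINES)
```

Grouping by status code like this makes it easy to spot crawled URLs that return errors.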
Each website is going to have a different crawl budget, which is a combination of how often Google wants to crawl a site and how much crawling your site allows. More popular pages and pages that change often will be crawled more often, and pages that don’t seem to be popular or well linked will be crawled less often.
If crawlers see signs of stress while crawling your website, they’ll typically slow down or even stop crawling until conditions improve.
After pages are crawled, they’re rendered and sent to the index. The index is the master list of pages that can be returned for search queries. Let’s talk about the index.
In this chapter we’ll talk about how to make sure your pages are indexed and check how they’re indexed.
A robots meta tag is an HTML snippet that tells search engines how to crawl or index a certain page. It’s placed into the <head> section of a web page, and looks like this:
<meta name="robots" content="noindex" />
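To check a page for this tag yourself, a minimal sketch with Python's built-in `html.parser` could look like the following. The class and function names are made up for the example, and it only looks at the generic `robots` meta tag, not crawler-specific tags or HTTP headers.

```python
from html.parser import HTMLParser

class RobotsMetaParser(HTMLParser):
    """Collect directives from <meta name="robots"> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "meta" and (attributes.get("name") or "").lower() == "robots":
            content = attributes.get("content") or ""
            # The content attribute may hold several comma-separated directives.
            for part in content.split(","):
                if part.strip():
                    self.directives.add(part.strip().lower())

def is_noindex(html):
    parser = RobotsMetaParser()
    parser.feed(html)
    return "noindex" in parser.directives

page = '<html><head><meta name="robots" content="noindex" /></head><body></body></html>'
blocked = is_noindex(page)
```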
When there are multiple versions of the same page, Google will select one to store in their index. This process is called canonicalization, and the URL selected as the canonical will be the one Google shows in search results. They use many different signals to select the canonical URL, including canonical tags, redirects, internal links, and sitemap URLs.
The easiest way to see how Google has indexed a page is to use the URL Inspection Tool in Google Search Console. It will show you the Google-selected canonical URL.
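One signal a site can send is a `rel="canonical"` link element in the page's head. The sketch below reads the canonical URL a page declares, using invented names and an example.com URL. This is only the page's own suggestion, and the Google-selected canonical may differ.

```python
from html.parser import HTMLParser

class CanonicalParser(HTMLParser):
    """Record the href of the first <link rel="canonical"> tag."""

    def __init__(self):
        super().__init__()
        self.canonical = None

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        rel_values = (attributes.get("rel") or "").lower().split()
        if tag == "link" and "canonical" in rel_values and self.canonical is None:
            self.canonical = attributes.get("href")

def declared_canonical(html):
    parser = CanonicalParser()
    parser.feed(html)
    return parser.canonical

page = '<head><link rel="canonical" href="https://example.com/shoes" /></head>'
canonical_url = declared_canonical(page)
```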
One of the hardest things for SEOs is prioritization. There are a lot of best practices, but some changes will have more of an impact on your rankings and traffic than others. Here are some of the projects I’d recommend prioritizing.
Make sure pages you want people to find can be indexed in Google. The two previous chapters were all about crawling and indexing and that was no accident.
You can check the Indexability report in Site Audit to find pages that can’t be indexed and the reasons why. It’s free in Ahrefs Webmaster Tools.
Reclaim lost links
Websites tend to change their URLs over the years. In many cases, these old URLs have links from other websites. If they’re not redirected to the current pages then those links are lost and no longer count for your pages. It’s not too late to do these redirects and you can quickly reclaim any lost value. Think of this as the fastest link building you will ever do.
Site Explorer -> yourdomain.com -> Pages -> Best by Links -> add a “404 not found” HTTP response filter. I usually sort this by “Referring Domains”.
This is what it looks like for 1800flowers.com.
Looking at the first URL in archive.org, I see that this was previously the Mother’s Day page. By redirecting that one page to the current version, you’d reclaim 225 links from 59 different websites and there are plenty more opportunities.
You’ll want to 301 redirect any old URLs to their current locations to reclaim this lost value.
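At its simplest, this is a lookup from old paths to current ones. The sketch below uses made-up paths and a hypothetical `respond` function. Real redirects are normally configured in your web server, CMS, or CDN rather than in application code like this.

```python
# Hypothetical mapping of retired URLs to their current locations.
REDIRECTS = {
    "/mothers-day-2015": "/mothers-day",
    "/old-roses": "/flowers/roses",
}

# Hypothetical set of pages that currently exist.
LIVE_PAGES = {"/mothers-day", "/flowers/roses"}

def respond(path):
    """Return (status_code, location) for a request path."""
    if path in REDIRECTS:
        return 301, REDIRECTS[path]  # permanent redirect to the current page
    if path in LIVE_PAGES:
        return 200, None             # page exists, serve it
    return 404, None                 # no page and no redirect: links here are lost

result = respond("/mothers-day-2015")
```

Every old URL that has links and returns 404 is a candidate for a new entry in the mapping.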
Add internal links
Internal links are links from one page on your site to another page on your site. They help your pages be found and also help the pages rank better. We have a tool within Site Audit called “Link opportunities” that helps you quickly locate these opportunities.
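To make the definition concrete, here is a small sketch that separates internal links from external ones by comparing hostnames. The function name and URLs are invented, and it treats different subdomains as different sites, which may not match how you think about your own domain.

```python
from urllib.parse import urljoin, urlparse

def internal_links(page_url, hrefs):
    """Return absolute URLs from hrefs that point to the same host as page_url."""
    host = urlparse(page_url).netloc
    result = []
    for href in hrefs:
        absolute = urljoin(page_url, href)  # resolve relative links
        if urlparse(absolute).netloc == host:
            result.append(absolute)
    return result

links = internal_links(
    "https://example.com/blog/post",
    ["/pricing", "guide", "https://example.com/about", "https://other.example/page"],
)
```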
Add schema markup
Schema markup is code that helps search engines understand your content better and powers many features that can help your website stand out from the rest in search results. Google has a search gallery that shows the various search features and the schema needed for your site to be eligible.
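Schema markup is often added as a JSON-LD script tag. The sketch below builds one for an article. The function name and the sample values are placeholders, and the properties shown are only a small subset, so check the search gallery for what a given feature actually requires.

```python
import json

def article_json_ld(headline, author_name, date_published):
    """Build a JSON-LD script tag describing an article with schema.org terms."""
    data = {
        "@context": "https://schema.org",
        "@type": "Article",
        "headline": headline,
        "author": {"@type": "Person", "name": author_name},
        "datePublished": date_published,
    }
    # Escape "</" so a value can never close the script tag early.
    payload = json.dumps(data).replace("</", "<\\/")
    return '<script type="application/ld+json">' + payload + "</script>"

snippet = article_json_ld("Technical SEO basics", "Jane Doe", "2024-01-15")
```

The resulting tag goes into the page's HTML, where search engines can read it alongside the visible content.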
The projects we’ll talk about in this chapter are all good things to focus on, but they may require more work and have less benefit than the quick win projects from the previous chapter. That doesn’t mean you shouldn’t do them. This is just to help you get an idea of how to prioritize various projects.
Page experience signals
These are lesser ranking factors, but still things you want to look at for the sake of your users. They cover aspects of the website that impact user experience (UX).
Core Web Vitals
Core Web Vitals are the speed metrics that are part of Google’s Page Experience signals used to measure user experience. The metrics measure visual load with Largest Contentful Paint (LCP), visual stability with Cumulative Layout Shift (CLS), and interactivity with Interaction to Next Paint (INP), which replaced First Input Delay (FID) in March 2024.
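Google publishes "good" and "poor" thresholds for each of these metrics. The sketch below rates a measured value against those published thresholds. The function and dictionary names are made up, and both FID and INP (the interactivity metric that replaced FID in March 2024) are included.

```python
# (upper bound for "good", upper bound for "needs improvement")
THRESHOLDS = {
    "LCP": (2.5, 4.0),   # seconds
    "CLS": (0.1, 0.25),  # unitless layout shift score
    "FID": (100, 300),   # milliseconds
    "INP": (200, 500),   # milliseconds
}

def rate(metric, value):
    """Classify a measurement as good, needs improvement, or poor."""
    good, needs_improvement = THRESHOLDS[metric]
    if value <= good:
        return "good"
    if value <= needs_improvement:
        return "needs improvement"
    return "poor"

lcp_rating = rate("LCP", 2.1)
cls_rating = rate("CLS", 0.2)
fid_rating = rate("FID", 350)
```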
HTTPS protects the communication between your browser and server from being intercepted and tampered with by attackers. This provides confidentiality, integrity and authentication to the vast majority of today’s WWW traffic. You want your pages loaded over HTTPS and not HTTP.
Any website that shows a lock icon in the address bar is using HTTPS.
Simply put, mobile-friendliness checks whether web pages display properly and are easy to use on mobile devices.
How do you know how mobile-friendly your site is? Check the “Mobile Usability” report in Google Search Console.
Safe browsing checks make sure pages aren’t deceptive, don’t include malware, and don’t have any harmful downloads.
Interstitials block content from being seen. These are popups that cover the main content and that users may have to interact with before they go away.
Hreflang — for multiple languages
Hreflang is an HTML attribute used to specify the language and geographical targeting of a webpage. If you have multiple versions of the same page in different languages, you can use the hreflang tag to tell search engines like Google about these variations. This helps them to serve the correct version to their users.
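As an illustration, the sketch below generates the link elements for a page that exists in several languages. The function name and URLs are hypothetical. Each language version would normally carry the full set of tags, including one that points to itself.

```python
from html import escape

def hreflang_tags(versions):
    """Build <link> elements from a mapping of hreflang codes to URLs."""
    return [
        f'<link rel="alternate" hreflang="{escape(code)}" href="{escape(url)}" />'
        for code, url in versions.items()
    ]

tags = hreflang_tags({
    "en": "https://example.com/",
    "de": "https://example.com/de/",
    "x-default": "https://example.com/",  # fallback for unmatched languages
})
```

The tags go into the head section of every language version listed in the mapping.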
General maintenance / website health
These tasks aren’t likely to have much impact on your rankings, but are generally good things to fix for user experience.
Broken links are links on your site that point to non-existent resources. These can be either internal (i.e., to other pages on your domain) or external (i.e., to pages on other domains).
You can find broken links on your website quickly with Site Audit in the Links report. It’s free in Ahrefs Webmaster Tools.
Redirect chains are a series of redirects that happen between the initial URL and the destination URL.
You can find redirect chains on your website quickly with Site Audit in the Redirects report. It’s free in Ahrefs Webmaster Tools.
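To show what a chain looks like, the sketch below follows redirects through an in-memory mapping instead of making real requests. The mapping, the paths, and the `max_hops` limit are all invented for the example.

```python
# Hypothetical redirect rules: each path points to where it redirects.
REDIRECTS = {"/a": "/b", "/b": "/c", "/c": "/final"}

def redirect_chain(start, redirects, max_hops=10):
    """Follow redirects from start and return every URL visited, in order."""
    chain = [start]
    seen = {start}
    while chain[-1] in redirects and len(chain) <= max_hops:
        next_url = redirects[chain[-1]]
        chain.append(next_url)
        if next_url in seen:
            break  # redirect loop: stop instead of cycling forever
        seen.add(next_url)
    return chain

chain = redirect_chain("/a", REDIRECTS)
hops = len(chain) - 1
```

Any chain with more than one hop can usually be shortened by pointing the first URL straight at the final destination.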
These tools help you improve the technical aspects of your website.
Google Search Console
Google Search Console (previously Google Webmaster Tools) is a free service from Google that helps you monitor and troubleshoot your website’s appearance in their search results.
Use it to find and fix technical errors, submit sitemaps, see structured data issues, and more.
Bing and Yandex have their own versions and so does Ahrefs. Ahrefs Webmaster Tools is a free tool that’ll help you improve your website’s SEO performance. It allows you to:
- Monitor your website’s SEO health
- Check for 100+ SEO issues
- View all your backlinks
- See all the keywords you rank for
- Find out how much traffic your pages are receiving
- Find internal linking opportunities
It’s our answer to the limitations of Google Search Console.
Google’s Mobile-Friendly Test
Google’s Mobile-Friendly Test checks how easily a visitor can use your page on a mobile device. It also identifies specific mobile-usability issues like text that’s too small to read, the use of incompatible plugins, and so on.
The mobile-friendly test shows what Google sees when they crawl the page. You can also use the Rich Results Test to see the content Google sees for desktop or mobile devices.
Chrome DevTools is Chrome’s built-in web page debugging tool. Use it to debug page speed issues, improve web page rendering performance, and more.
From a technical SEO standpoint, it has endless uses.
Ahrefs SEO Toolbar is a free extension for Chrome and Firefox that provides useful SEO data about the pages and websites you visit.
Its free features are:
- On-page SEO report
- Redirect tracer with HTTP Headers
- Broken link checker
- Link highlighter
- SERP positions
In addition, as an Ahrefs user, you get:
- SEO metrics for every site and page you visit, and for Google search results
- Keyword metrics, such as search volume and keyword difficulty, directly in SERP
- SERP results export
PageSpeed Insights analyzes the loading speed of your web pages. Alongside the performance score, it also shows actionable recommendations to make pages load faster.
Let’s wrap this up
All of this is just scratching the surface of technical SEO. This should help you with the basics and many of the sections have additional links for you to dive in further. There are many other topics that weren’t covered in this guide so I put together a list if you want to learn more.
Have fun exploring and learning. Message me on Twitter if you have any questions.