A Definitive Guide to SEO with a Serverless React Application in 2019

An exhaustive overview of all the ways you can render your React application to make it blazingly fast to load and work with social media scrapers and webcrawlers.

In this post, we will go through many of the options for building a React app that will get properly crawled by search engines and social media sites. This isn't totally exhaustive but it focuses on options that are serverless so you don't have to manage a fleet of EC2s or Docker containers.

SEO is an often overlooked aspect of building a full stack web application in React because there are so many other components to build just to get the site working, so it is easy to forget about it until the end. The tricky thing is that you often can't tell it isn't working until you submit your site to Google and come back a week later, after it has crawled your site, to find that none of your beautiful meta tags show up when you do a Google search for your site. The left shows what the Google result actually looks like, while the right is what you'd expect based on the dynamic tags that you are setting.

Rendering is critical for displaying SEO on dynamic pages.

The cause for this is rooted in a common design pattern for starting your site with the create-react-app generator, so let's go through it. After creating the boilerplate, you can add page titles and meta tags with React Helmet. Here’s what a React Helmet component might look like for a typical static page:

const seo = {
  title: 'About',
  description: 'This is an awesome site that you definitely should check out.',
  url: 'https://www.mydomain.com/about',
  image: 'https://mydomain.com/images/home/logo.png',
}

<Helmet
  title={`${seo.title} | Code Mochi`}
  meta={[
    {
      name: 'description',
      property: 'og:description',
      content: seo.description,
    },
    { property: 'og:title', content: `${seo.title} | Code Mochi` },
    { property: 'og:url', content: seo.url },
    { property: 'og:image', content: seo.image },
    { property: 'og:image:type', content: 'image/png' },
    { property: 'twitter:image:src', content: seo.image },
    { property: 'twitter:title', content: `${seo.title} | Code Mochi` },
    { property: 'twitter:description', content: seo.description },
  ]}
/>

When seo is pulled from static data, there are no issues- Google will scrape all of it. We run into trouble when seo relies on fetching data from a server. This is the case if, instead of a static about page, we wanted to make a blog page where we pull that data from an api using GraphQL or REST. In that case, seo would initially be empty and would only be filled in after we receive data from the server. Here's what a blog page might look like with React Apollo:

import React from 'react';
import { Query } from 'react-apollo';
import { Helmet } from 'react-helmet';
import { withRouter } from 'react-router-dom';
import _ from 'lodash';

// Local modules (paths are illustrative).
import Loading from './Loading';
import { BLOG_QUERY } from './queries';

const BlogPage = ({ match }) => (
    <Query variables={{ name: match.params.title }} query={BLOG_QUERY}>
        {({ data, loading }) => {
            const blog = _.get(data, 'blog');
            if (loading || !blog) return <Loading />;
            const { seo } = blog;

            return (
                <div>
                    <Helmet
                        title={`${seo.title} | Code Mochi`}
                        meta={[
                            { name: 'description', property: 'og:description', content: seo.description },
                            { property: 'og:title', content: `${seo.title} | Code Mochi` },
                            { property: 'og:url', content: seo.url },
                            { property: 'og:image', content: seo.image },
                            { property: 'og:image:type', content: 'image/jpeg' },
                            { property: 'twitter:image:src', content: seo.image },
                            { property: 'twitter:title', content: `${seo.title} | Code Mochi` },
                            { property: 'twitter:description', content: seo.description },
                        ]} />
                    <div>
                        {/* Code for the blog post. */}
                    </div>
                </div>
            );
        }}
    </Query>
);

export default withRouter(BlogPage);

Initially, while the data is loading, <BlogPage> will simply return the <Loading /> component. It's only after loading finishes that we reach the main portion of the code block, so the <Helmet> component won't be rendered until that point. Ideally, we'd like the Google crawler to wait on the page until the data has loaded, but unfortunately that is not something we have control over.
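One quick way to see the problem for yourself is to fetch the raw HTML the way a scraper does- without executing any JavaScript- and check whether the tags are present. Here's a rough sketch in Node (the URL is just a placeholder):

// Fetch the raw HTML of a page (no JavaScript execution, just like most
// scrapers) and check whether an og: tag made it into the initial markup.
const https = require('https');

https.get('https://www.mydomain.com/blog/my-first-post', res => {
  let html = '';
  res.on('data', chunk => (html += chunk));
  res.on('end', () => {
    console.log(
      html.includes('og:title')
        ? 'og:title is in the initial HTML'
        : 'og:title is missing - a scraper will never see it'
    );
  });
});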

There are a couple of approaches you can take to solve this problem and they all have their tradeoffs. We'll first go over some concepts:

Server-Side Rendering

Server side rendering works by rendering websites on the fly as they are requested.

This is where you have a server that runs your frontend website. When it receives a request for a page, the server takes the first pass at rendering it before sending back the HTML, JS, and CSS. Any data that needs to be fetched from an api is fetched by the frontend server itself, and the page is rendered before anything is delivered to the user's browser. This ensures that a blog page has all of its title and meta tags rendered before it reaches the user. Since the Google web crawler acts like a user, the page it receives will already be populated with the correct title and meta tags, so they will be ingested properly.
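To make that concrete, here is a minimal sketch of what a hand-rolled server-side rendering setup could look like with Express, ReactDOMServer, and react-helmet. The App component, data-fetching step, and bundle path are assumptions- frameworks like Next.js handle all of this for you.

// server.js - a minimal server-side rendering sketch (not production code).
import express from 'express';
import React from 'react';
import { renderToString } from 'react-dom/server';
import { Helmet } from 'react-helmet';
import App from './App'; // your root component (illustrative path)

const server = express();

server.get('*', (req, res) => {
  // Any api data would need to be fetched before this point (for example
  // with Apollo's getDataFromTree) so Helmet has real values to work with.
  const markup = renderToString(<App url={req.url} />);
  const helmet = Helmet.renderStatic();

  res.send(`<!DOCTYPE html>
<html>
  <head>
    ${helmet.title.toString()}
    ${helmet.meta.toString()}
  </head>
  <body>
    <div id="root">${markup}</div>
    <script src="/bundle.js"></script>
  </body>
</html>`);
});

server.listen(3000);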

Static Site Rendering

Static site rendering will render all possible pages during compile time.

This is where every page on your website will be pre-rendered at the time of building your site. This is distinguished from Server Side Rendering because instead of a server actively rendering a page when requested, all the possible site pages are pre-rendered and available without any further building required. This approach works especially well with static hosting solutions such as AWS S3 because an actively running server is not needed.
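Stripped of all the framework machinery, the core idea looks something like this- render every known route to its own HTML file at build time. The routes list and Page component below are made up for illustration; Gatsby does this for you (and much more).

// build.js - a toy static rendering step that writes one HTML file per route.
import fs from 'fs';
import path from 'path';
import React from 'react';
import { renderToStaticMarkup } from 'react-dom/server';
import Page from './Page'; // illustrative component that renders a given route

const routes = ['/', '/about', '/blog/my-first-post'];

for (const route of routes) {
  const html = renderToStaticMarkup(<Page route={route} />);
  const outDir = path.join('build', route);
  fs.mkdirSync(outDir, { recursive: true });
  fs.writeFileSync(path.join(outDir, 'index.html'), `<!DOCTYPE html>${html}`);
}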

These are the two main classes of rendering, but there are several solutions for these two approaches:

Next.js

Next.js is a server-side rendering framework for React. It will render pages on the fly as they are requested from a user. There are two modes that it can operate in:

Option 1. Actively running a server.

This will run Next.js on an EC2 instance or possibly as a Docker Container.

Pros:

  • Standard way of running Next.js.

Cons:

  • Have to pay for an actively running server even if it isn't being used. Looking at $15/month minimum.
  • Need to manage scaling server instances up and down as demand for your site rises and falls. This is where Docker, Kubernetes, and a host of managed services come into play, and things get complicated really fast at that point. The upside is that by then your site is probably successful enough that you could pay a DevOps person to take care of this aspect if it is something you don't want to deal with.
  • Not currently AWS Amplify compatible.

Option 2. As a lambda function.

Next.js recently introduced a new mode called serverless where you can build each individual page as a lambda function that gets hosted either through AWS or using Zeit's now service.
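Enabling it is a one-line config change, and each page can fetch its own data inside the lambda before the HTML is sent back, so the title and meta tags are already populated when a crawler fetches the page. Here's a rough sketch (the api URL is made up):

// next.config.js - opt into the serverless target (available since Next.js 8).
module.exports = {
  target: 'serverless',
};

// pages/about.js - each page becomes its own lambda-compatible function.
import React from 'react';
import Head from 'next/head';
import fetch from 'isomorphic-unfetch';

const About = ({ description }) => (
  <div>
    <Head>
      <title>About | Code Mochi</title>
      <meta property="og:description" content={description} />
    </Head>
    <p>{description}</p>
  </div>
);

// Runs on the server for every request, so the tags above are filled in
// before the response reaches the browser (or a web crawler).
About.getInitialProps = async () => {
  const res = await fetch('https://api.mydomain.com/about');
  const { description } = await res.json();
  return { description };
};

export default About;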

Pros:

  • Serverless- you just pay for what you use. Likely will be in the free tier until you have hundreds or thousands of users (depending on usage patterns obviously).
  • Scales up and down effortlessly.

Cons:

  • Need to watch out for the payload size- you can't have too many npm packages loaded.
  • Can have a slow initial load time if the site hasn't been visited in a while. These so-called cold starts are based on the complexity of your page and the dependencies you have.
  • Each page is an entire copy of your website, so it gets downloaded each time someone navigates around (but it gets cached in the user's browser afterward).
  • Not currently AWS Amplify compatible.

Gatsby

Gatsby is a static site rendering framework for React. It renders pages at build time, so all possible pages have already been rendered as separate HTML files and are ready to be downloaded before they are even uploaded to the server. This site is actually rendered using this method!
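As a rough sketch of what that looks like, pages are created in gatsby-node.js during the build- the GraphQL type and template path below are made up for illustration:

// gatsby-node.js - create one page per blog post at build time.
exports.createPages = async ({ graphql, actions }) => {
  const { createPage } = actions;

  // Hypothetical query; the shape depends on your data source.
  const result = await graphql(`
    {
      allBlogPost {
        nodes {
          slug
        }
      }
    }
  `);

  // Every page (and its title and meta tags) is rendered to HTML during
  // `gatsby build`, before anything is uploaded to the server.
  result.data.allBlogPost.nodes.forEach(({ slug }) => {
    createPage({
      path: `/blog/${slug}`,
      component: require.resolve('./src/templates/blog-post.js'),
      context: { slug },
    });
  });
};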

Pros:

  • Blazingly fast: nothing to render so the page load times are super fast. Google PageSpeed Insights is going to love your site because it is so quick.
  • Great for SEO- all titles and meta tags are generated during build time so Google has no trouble reading them.
  • AWS Amplify compatible.

Cons:

  • Can be bad for dynamic sites where not all possible page combinations are known at build time. An example might be an auction website or something where users are generating content.
  • No good way to create all possible pages during build time because the data from an api can change in the future.
  • Needs additional finagling to handle both static content and dynamic content because you'll have some api calls happening during build time and others during run time.

Gatsby can render dynamic routes, but since those pages are generated on the client instead of on a server, they will not be populated with the correct meta tags and title. Static content will still load, however. If you had a site that was a marketplace, as an example, Google would be able to fetch the tags for the static portions of the site, such as the home page or posts page, but it wouldn't be able to get the tags for an individual post page at posts/:id, because its title and tags need data from the server to populate.
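For reference, this is roughly how a client-only route is set up in gatsby-node.js (assuming a posts page at src/pages/posts.js). Everything under /posts/* renders in the browser at runtime, which is exactly why its tags are invisible to scrapers:

// gatsby-node.js - turn /posts into a client-only route that matches /posts/*.
exports.onCreatePage = async ({ page, actions }) => {
  const { createPage } = actions;

  if (page.path.match(/^\/posts/)) {
    page.matchPath = '/posts/*';
    createPage(page); // everything under /posts/* is rendered client-side
  }
};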

Prerender.cloud

Prerender.cloud will render websites on the fly in a separate lambda function.

This is a service that sits in front of your application and pre-renders the content before it delivers it back to the client or the Google web crawler. I've used this service before and it works great- PocketScholar, a science app I previously built, uses this technique.

Pros:

  • It will pre-render any webpage on demand so it is like Next.js but it will work with an existing create-react-app or statically generated site such as Gatsby or create-react-app's build static option.
  • You deploy it yourself using a cloud formation stack on your AWS account.
  • AWS Amplify compatible.
  • You are serving your site from a static s3 bucket, so it will infinitely scale as you get more users and you only pay for what you use.

Cons:

  • It's a service that you pay for based on the number of requests that your web application gets per month. It’s initially free but then is $9/month for 600-20,000 requests.
  • It doesn’t eliminate the cold starts that are present with AWS lambda- it can take a few seconds to load a website if the lambda hasn’t been used in the past 25 minutes or so.

Conclusion

There are a few ways to handle React and SEO and each has its benefits and drawbacks. Here is a table with the highlights:

| Benefits | Create-React-App | Prerender.cloud | Gatsby | Gatsby with dynamic routes | Next.js | Next.js serverless mode |
|---|---|---|---|---|---|---|
| Pay for what you use | X | X | X | X | | X |
| Seamlessly scale | X | X | X | X | | X |
| Fast initial load times | X | | X | X | X | |
| Blazingly fast initial load times | X | | X | X | | |
| Render dynamic and static content | X | X | | X | X | X |
| Create new pages and routes without a rebuild | X | X | | | X | X |
| Webcrawler / Social Media scrapable (static) | X | X | X | X | X | X |
| Webcrawler / Social Media scrapable (dynamic) | | X | * | * | X | X |
| AWS Amplify Compatible | X | X | X | X | | |

* A Gatsby dynamic route will not set the metadata or title because it needs to fetch data from the server.

Starting with Create React App (CRA), we can see that while it is serverless, which makes it easy on scalability and cost, it fails for SEO purposes for any content that is dynamic. Prerender.cloud is a good option to put in front of a CRA app because it adds the rendering capability for search engines and social media sharing, but it has the disadvantage of cold starts from the lambda function, which can make it a little slow if the site hasn't been accessed in the past 25 minutes.

Gatsby is great for static sites and it wins in the speed department. It will allow you to have dynamic routes, but it won't allow you to benefit from SEO on those routes because it needs to fetch data from the server when rendering them in the user's browser.

Next.js is great for both dynamic and static routes, but until recently you had to manage a running server instance. A glimmer of the best of all worlds lies in the serverless mode for Next.js. Although slower than Gatsby, pages get rendered on the fly, so all relevant tags are populated with their proper values for web crawlers. The only potential downside with this approach is that Next.js is currently not supported by AWS Amplify, so you can't use it with the automated deployment pipeline, authentication, or AppSync GraphQL endpoint.
