This is my tale of how I built this site, and how you can build the same thing! When you’re done you’ll have a fast-loading website served from a content distribution network. Very snazzy!
Your site will not have scaling issues, but you may end up paying a little bit of money if you get a lot of traffic. But I’m talking on the order of pennies or dollars!
Glossary
- AWS: Amazon Web Services. A number of software products built by Amazon to power their vast computational empire. To offset the cost of idle hardware they started letting other people use it. Now it’s its own “huge thing”. Amazon provides foundational software that can be utilized and combined to power anything from email servers to games, and yes, static websites.
- S3: Simple Storage Service. This is one service offered by Amazon that allows an individual to group files into “buckets” of similar content. Each file has a key that can look like a directory and filename, like /assignments/econ101/jan2018/assignment-1.docx. The content of this file can be fetched if you have the right credentials, know its “filename” (which Amazon calls its “key”), and know what “bucket” it’s stored in. Buckets can be configured with complex permissions, and can even serve as an extremely limited web server. Each file has its own set of metadata which can inform how S3 delivers your content to web browsers.
- Built: When I talk about “Building” the website, I’m talking about the process by which your content is combined with templates and other files to produce the HTML, JavaScript, Cascading Style Sheets, and images that you collectively call your “website”.
- Deployed: For someone to be able to load your website in a web browser, the files I mentioned above (HTML, JavaScript, et cetera…) must be deployed to a computer with special software or a specialized service that can serve up your content for you. It’s generally more expensive to run servers all the time just in case someone goes to your website. So, unless you have a lot of traffic and a seriously complex dynamic portion of your site, you probably don’t need dedicated servers. More economical and efficient is using a service like Amazon’s S3 (mentioned above).
- Markdown: Markdown is a way you can put punctuation around the text in your content that informs the build process how you would like the content to be converted into HTML documents. For example, if you would like a word to be a link, you would surround the word-to-be-linked with square brackets (example: [Click Here!]) and, directly after the closing square bracket with no spaces in between, add a URL surrounded by parentheses (example: (https://www.google.com)). Together these form [Click Here!](https://www.google.com), which becomes a “Click Here!” link in the built page.
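To make the Markdown entry a little more concrete, here is a minimal Python sketch of the link conversion described above. It only illustrates the idea and is not how Hugo does it: Hugo uses a full Markdown parser, and the function name markdown_links_to_html and the regular expression are my own assumptions for this example.

```python
import re

# Matches [link text](url). Deliberately naive, for illustration only:
# real Markdown parsers also handle nesting, escapes, and link titles.
LINK_PATTERN = re.compile(r"\[([^\]]+)\]\(([^)\s]+)\)")


def markdown_links_to_html(text: str) -> str:
    """Replace every [text](url) in `text` with an HTML anchor tag."""
    return LINK_PATTERN.sub(r'<a href="\2">\1</a>', text)


if __name__ == "__main__":
    print(markdown_links_to_html("[Click Here!](https://www.google.com)"))
```

Text that contains no link syntax passes through the function unchanged, which is roughly how Markdown treats plain prose.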
Why Build Your Own? Why not just use Medium?
Before you get started on this path, you should definitely ask yourself if this guide is for you. You will probably be interested in this guide if any of the following apply to you:
- You want to have the maximum amount of control over how your site is built and deployed.
- You want to have complete ownership over the content and processes that power your website.
- You want to have a site that can scale from 10 visitors a day to 10,000,000 visitors a day and not crash.
- You want to have a site that loads as fast in Beijing, China as it does in Nashville, Tennessee as it does in Seoul, Korea.
- You want all of the above and you don’t want to have to pay through the nose for it.
Tools
Hugo
The engine of the system you will be building is a tool named “Hugo”. Hugo is a computer program in a family of programs collectively referred to as “static site generators.”
Similar tools include GitHub’s “Jekyll”, if you’ve heard of that.
All of these tools do the same thing: take your marked-up content and combine it with a “theme” to produce HTML files. These HTML files are then sometimes combined with other files to support your content such as:
- 🎨 Files that define the visual representation of your content, these are called “style sheets.” These style sheets are written in a specialized language called Cascading Style Sheets or CSS. The basics of CSS are very easy to grasp, but the longer you use it and the deeper you go, the more you realise you know nothing 🤗
- 🤖 Files that contain programming code that can be used to do everything from ensuring your users have entered the right kind of data into the right form fields all the way up to some of the most complex applications on the planet. Google’s Gmail application is quite complex, and that’s probably not even the tip of the iceberg. These files are written in a programming language called JavaScript. You may have heard of Java; this is not Java. The name is a ridiculous quirk of programming history that’s not worth bothering with. Some people refer to it by its standardized name these days: “ECMAScript” or “ES”. There has been a huge explosion in interest in JavaScript/ES over the last decade. For this reason, in the mid-2000s the development of the language began to accelerate through various working groups and such. This has led to a vastly different programming language and environment than you may remember from 1999. There are different versions of ECMAScript: Version 5 (ES5) was released in 2009, ES5.1 in 2011, ES6 in 2015 (which is also referred to as ES2015), then ES7 (ES2016) and ES8 (ES2017). You can read all about them on Wikipedia if you care. What you must be careful about is which browsers support what. I would wager that as I write this (February 2018), not many browsers yet support most of ES8 (ES2017). And anyway, this matters only if you end up writing any JavaScript, which you may! But you probably won’t write very much right away. Just be aware that if you’re scouting around the internet looking for code and you manage to locate a snippet written for ES8, it may work in some web browsers and not others. (Let’s talk about cross-browser rendering differences some day!)
- 🖼 Image files that are used in a myriad of ways: as top-of-article photography, as art or illustrations sprinkled between prose, or as decorative flourishes used in the design of your site itself.
Once Hugo is finished with its build process, a complete “package” of your site exists and is ready to be delivered to the Amazon services that will handle providing your content to web browsers all around the world!
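If it helps to picture what a static site generator does, here is a toy Python sketch of the “content plus theme equals HTML files” idea described above. It is not Hugo’s implementation (Hugo is written in Go and uses its own template language); the THEME template and the build_page and build_site names are assumptions made up for this illustration.

```python
from string import Template
from typing import Dict

# A stand-in for a theme: one HTML template with named placeholders.
THEME = Template(
    "<html><head><title>$title</title></head>"
    "<body><h1>$title</h1><p>$body</p></body></html>"
)


def build_page(title: str, body: str) -> str:
    """Combine one piece of content with the theme to produce an HTML page."""
    return THEME.substitute(title=title, body=body)


def build_site(pages: Dict[str, Dict[str, str]]) -> Dict[str, str]:
    """Build every page; returns a mapping of output path -> HTML."""
    return {
        f"{slug}/index.html": build_page(page["title"], page["body"])
        for slug, page in pages.items()
    }


if __name__ == "__main__":
    site = build_site({"about": {"title": "About", "body": "Hello there."}})
    for path, html in site.items():
        print(path, "->", html)
```

The important property is that all of this work happens once, at build time, so the files that get served later never need a program running behind them.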
A little aside about serving up “websites”
The ol’ website serving industry has swung wildly from extreme to extreme on the ol’ static vs. dynamic spectrum. Originally sites were static HTML and text. Then people began serving sites dynamically with C and Perl using CGI. Then folks found that dynamically rendering every page on every pageload was not something that could scale.
This drove many, most notably Six Apart’s Movable Type, to pursue a static site serving option based upon their notorious Publish Queue (itself based upon the sturdy TheSchwartz job queue system). Essentially, when someone created a new blog post, the system would calculate what pages needed to be regenerated and schedule a rebuild job for those pages.
The publish queue workers would grab a pending job, compile the page or pages, and store them to disk.
You could have follow-on jobs that would react to the newly compiled files and copy those files to dumb web frontends. You could further snap things up with Varnish and a CDN like CacheFly or Akamai (then) or S3 and CloudFront (now).
Hugo is a system that acts very much like these, in that markup is consumed and HTML files are compiled. But unlike the Publish Queue and more like Jekyll, most Hugo-based tasks are performed on the command line.
A Hugo-based project consists of a number of directories. Each directory directly affects the HTML files that are produced.
Installing
Follow the directions on the Hugo webpage to get the hugo program installed on your computer.
Getting Started
We’ll just make a basic site for now and slowly upgrade it over this guide.
First you need to create a new “site”. We’re going to call it “The First Byte”:
% hugo new site the_first_byte
Configuration File (config.toml)
The Hugo configuration file has two broad sections. The first is a global section that controls how Hugo behaves and is at the top of the file:
baseURL = "/"
theme = "cocoa-enhanced"
builddrafts = false
canonifyurls = true
contentdir = "content"
languageCode = "en-US"
title = "The First Byte"
author = "The First Byte"
- baseURL: This will be prepended to any URLs generated by Hugo. You could also set this to something like “http://the-first-byte.com/”
- theme: The name of the theme you’ll be using. We’ll be using one named ‘cocoa-enhanced’. This name must correspond to a directory inside the themes directory of your project.
- builddrafts: Tells Hugo whether or not you want it to build files which are marked as drafts.
- canonifyurls: When true, Hugo rewrites relative URLs in the generated HTML into absolute URLs based on baseURL.
Directory Structure
archetypes
content
data
layouts
static
themes
How Hugo produces HTML files
This section covers the process Hugo follows to combine the files in the aforementioned directories into static HTML files.
About Page
TBD
Posts
Raw content is placed in a folder named content. Top-level pages like an about page can be generated with the command hugo new about.md. This will create a new file at content/about.md. This will map to the URL https://the-first-byte.com/about/.
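The mapping from a content file to its URL can be sketched in a few lines of Python. This is a simplification under stated assumptions: it only models the default “pretty URL” behaviour shown above, it ignores front matter overrides, permalink settings, and special files such as _index.md, and the function name content_path_to_url is hypothetical.

```python
from pathlib import PurePosixPath


def content_path_to_url(
    content_path: str, base_url: str = "https://the-first-byte.com/"
) -> str:
    """Map a file under content/ to the URL it would be served at.

    Simplified model: content/about.md -> <base_url>/about/
    """
    relative = PurePosixPath(content_path).relative_to("content")
    # Drop the .md extension, so about.md becomes about.
    slug = relative.with_suffix("")
    return f"{base_url.rstrip('/')}/{slug.as_posix()}/"


if __name__ == "__main__":
    print(content_path_to_url("content/about.md"))
```

Each page ends up as its own directory containing an index.html file, which is why the URL ends in a slash rather than in .html.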
… more …
Themes
Themes are templates that control how the various pages of the site are composed into HTML files. These templates can use logic, helper methods, access data about pages, and more.
..more..
Finding Themes
TBD
Installing Themes
TBD
Source Control (Git, GitHub)
TBD
Wercker
Wercker is a service that can be connected to your code through a GitHub repository. When you push your code changes to GitHub, it will send a signal to Wercker and Wercker will schedule a task. This task can be anything really. Wercker spins up a small virtual server or Dockerized container and is ultimately just running a program on a Linux server somewhere.
First you must create a new application and name it something like “The First Byte”. Attach it to a GitHub repository.
You’ll set up two pipelines here: build and deploy. The Build pipeline will be connected to a “step” that more-or-less just executes the hugo command to generate the site’s static files.
The Deploy pipeline will be attached to a step that executes aws s3 sync with some configuration options.
These two pipelines will be attached in what Wercker calls a “workflow”.
Configuration File (wercker.yml)
This file allows us to define the pipelines I described in the previous section (build and deploy).
Finally, we’ll need to configure the workflow to recognize these two pipelines.
AWS
AWS will facilitate cheap(ish) static site hosting. You can probably do this for free by using GitHub Pages, but I’m not sure you can use a custom domain and have SSL. But if you’re fine with just plain HTTP, check out just using GitHub Pages.
If you’re going the AWS route, you get a fair bit more control over exactly how your content is served, and it’s probably the next logical step up after GitHub starts chafing at the edges.
… more ..
S3
S3 is a service that Amazon provides that can both store our static files and serve them up as a kind of super-simple webserver. S3 is what we will be calling the “origin” of our content. We call it that because this bucket of content is considered the “source of truth” about what files should exist on our site.
When we sync new files up to S3 in Wercker, the website served by S3 will immediately update.
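Conceptually, a sync compares what was just built with what the bucket already holds and only transfers the differences. Here is a rough Python sketch of that decision, with loud assumptions: the real aws s3 sync command compares file size and modification time rather than content hashes, it only deletes remote files when asked to, and the digest and plan_sync names are invented for this example.

```python
import hashlib
from typing import Dict, List, Tuple


def digest(data: bytes) -> str:
    """Fingerprint file contents so two versions can be compared cheaply."""
    return hashlib.sha256(data).hexdigest()


def plan_sync(
    local: Dict[str, bytes], remote: Dict[str, str]
) -> Tuple[List[str], List[str]]:
    """Decide which keys to upload and which to delete.

    `local` maps key -> freshly built file contents.
    `remote` maps key -> digest of what the bucket currently holds.
    """
    # Upload anything that is new or whose contents changed.
    uploads = sorted(
        key for key, data in local.items() if remote.get(key) != digest(data)
    )
    # Delete anything in the bucket that the build no longer produces.
    deletes = sorted(key for key in remote if key not in local)
    return uploads, deletes


if __name__ == "__main__":
    built = {"index.html": b"new home page", "about/index.html": b"about"}
    bucket = {"index.html": digest(b"old home page"), "old.html": digest(b"x")}
    print(plan_sync(built, bucket))
```

Because the bucket is treated as the source of truth, removing stale files matters as much as uploading new ones.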
Permissions / Policies
CloudFront
CloudFront distributes your static content to content distribution servers all over the world. This allows your content to load quickly everywhere. If you did not use CloudFront, users loading your site close to where your S3 bucket lives (Virginia, say) would see very speedy load times. Users in Beijing would see very much the opposite.
CloudFront can work with Route 53 to use DNS to send the user’s browser to a server geographically near to them. That means everyone in the world will see your site loading quickly, not just a select few in America.
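One way to picture a single CDN server is as a cache sitting in front of the origin. The toy Python model below shows the idea, and why invalidations exist at all. It is only a sketch: real CloudFront edges also expire content based on time-to-live settings and cache headers, and the EdgeCache class and its method names are assumptions for illustration, not an AWS API.

```python
from typing import Callable, Dict


class EdgeCache:
    """A toy model of one CDN edge location in front of an origin."""

    def __init__(self, fetch_from_origin: Callable[[str], bytes]) -> None:
        self._fetch_from_origin = fetch_from_origin
        self._cache: Dict[str, bytes] = {}
        self.origin_hits = 0  # how many times we had to go back to the origin

    def get(self, path: str) -> bytes:
        """Serve from the local cache, fetching from the origin on a miss."""
        if path not in self._cache:
            self.origin_hits += 1
            self._cache[path] = self._fetch_from_origin(path)
        return self._cache[path]

    def invalidate(self, path: str) -> None:
        """Forget a cached file so the next request refetches it."""
        self._cache.pop(path, None)


if __name__ == "__main__":
    origin = {"/index.html": b"version 1"}
    edge = EdgeCache(lambda path: origin[path])
    print(edge.get("/index.html"))  # fetched from the origin
    origin["/index.html"] = b"version 2"
    print(edge.get("/index.html"))  # still the cached copy
    edge.invalidate("/index.html")
    print(edge.get("/index.html"))  # refetched after invalidation
```

The middle request is the interesting one: the bucket has already changed, but the edge keeps serving its cached copy until that copy is invalidated.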
Distribution
TBD
Origin
TBD
Behaviors
TBD
Invalidations
IAM
You must generate an access key and secret for Wercker to use.
Access Keys belong to Users. Users belong to Groups. Groups can have many Policies. Policies can have many Permissions.
Creating a custom Policy
The policy will contain five Permissions:
- ListBucket on the_first_byte.com bucket.
- GetObject on the_first_byte.com bucket.
- DeleteObject on the_first_byte.com bucket.
- PutObject on the_first_byte.com bucket.
- PutObjectAcl on the_first_byte.com bucket.
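For reference, a policy granting those five permissions might look roughly like the JSON this Python sketch prints. Treat it as a starting point under assumptions rather than the exact policy used for this site: the build_deploy_policy function is hypothetical, the bucket name is simply the one from the list above (substitute your own), and I am assuming ListBucket is granted on the bucket itself while the object-level actions are granted on the keys inside it.

```python
import json
from typing import Any, Dict


def build_deploy_policy(bucket: str) -> Dict[str, Any]:
    """Build an IAM policy document allowing a deploy job to sync to `bucket`."""
    bucket_arn = f"arn:aws:s3:::{bucket}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Listing applies to the bucket itself.
                "Effect": "Allow",
                "Action": ["s3:ListBucket"],
                "Resource": [bucket_arn],
            },
            {
                # Object-level actions apply to the keys inside the bucket.
                "Effect": "Allow",
                "Action": [
                    "s3:GetObject",
                    "s3:DeleteObject",
                    "s3:PutObject",
                    "s3:PutObjectAcl",
                ],
                "Resource": [f"{bucket_arn}/*"],
            },
        ],
    }


if __name__ == "__main__":
    # Bucket name taken from the text above; replace it with your own.
    print(json.dumps(build_deploy_policy("the_first_byte.com"), indent=2))
```

The printed JSON is the kind of document you would paste into the policy editor when creating the custom policy.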
Creating a Group
The group will be named the_first_byte_s3.
Creating a User
The user will be named the_first_byte_s3. Give it programmatic access, then get the access key ID and key secret. Plug these into Wercker.
Register Domain
Register the domain the-first-byte.com with whoever you want; you’ll need to point the domain to the nameservers at AWS. What I’ll show you here is how to do it with Amazon’s own domain registration service.
Route 53
Route 53 records will be created automatically when you get the domain set up. You will need to create an Alias record for the root domain that points at the CloudFront distribution.
SSL
First you’ll need to request a certificate from Amazon. Amazon will ask that you verify that you own the domain, and the easiest way is via DNS. Amazon will ask you to make a specific CNAME record, and you’ll do that within Route 53. Once AWS sees the new DNS record, you will be approved, and the certificate will be issued.
You can then go to your CloudFront distro to use a “Custom SSL Certificate” and link it to the certificate you just made.
DNS
DNS controls the connections between your domain name and the specific servers that will be responsible for serving the content for your site.
…more…
Editing (Prose.io)
… more …