Why do we want to learn HTTP?

Most web applications are built on HTTP (HyperText Transfer Protocol), the protocol we use to transmit data on the web.

In short, HTTP defines the format of communication between a client (usually a web browser) and a server.

HTTP was created so that text documents could link to one another, eventually forming "hypertext" and making communication easier. HTTP is the foundation of web communication, so it is something we must learn and understand thoroughly.

The basic concept of HTTP

When we learn about computer networking, we usually divide the network stack into five (or more) layers.

One might ask: why do we split the network into different layers at all?
Well, communication between two computers is actually very complicated, and splitting it into layers breaks one difficult problem down into several simpler ones. Layering also means we only need to care about the layers that are relevant to us, not everything that makes up the whole stack.

HTTP sits at the topmost layer, also referred to as the application layer, which is the layer we developers care about most.

The basic HTTP communication process

HTTP lives in the application layer, but a web request obviously involves other protocols besides HTTP as well.

When we visit a webpage, we usually type the domain name, e.g. www.domain.com, into the browser's address bar. When we hit Enter, DNS (Domain Name System) translates the domain name into the server's IP address, and the browser then uses that IP address to communicate with the server.
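To see the name-to-address step on its own, here is a minimal Python sketch using the standard socket module. It asks the system resolver rather than talking to a DNS server directly, and it uses localhost (usually answered from the machine's hosts file) because www.domain.com above is only a placeholder; for a real domain the same call goes through DNS.

```python
import socket

def resolve(hostname: str) -> list[str]:
    """Ask the system resolver (which may query DNS) for the IP addresses of a host name."""
    infos = socket.getaddrinfo(hostname, 80, proto=socket.IPPROTO_TCP)
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr[0] is the IP address.
    return sorted({info[4][0] for info in infos})

# The exact result depends on the machine, e.g. 127.0.0.1 and/or ::1 for localhost.
print(resolve("localhost"))
```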

When we load a website or perform actions on it, the browser sends an HTTP request to the server, and the server sends back an HTTP response containing the requested content.

[Figure: a sample HTTP request]
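Below is a minimal, self-contained Python sketch of one request/response round trip. Instead of a real website, it assumes a tiny local server built with the standard http.server module (the handler name and the /hello.html path are just for illustration), and it uses http.client in place of the browser.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

class HelloHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        body = b"<h1>Hello</h1>"
        self.send_response(200)  # status line, e.g. "HTTP/1.0 200 OK"
        self.send_header("Content-Type", "text/html")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()  # blank line separating the headers from the body
        self.wfile.write(body)

    def log_message(self, *args):  # silence the default request logging
        pass

# Port 0 lets the operating system pick a free port.
server = HTTPServer(("127.0.0.1", 0), HelloHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()

# The "browser": send a request and read the response.
conn = http.client.HTTPConnection("127.0.0.1", server.server_port)
conn.request("GET", "/hello.html", headers={"Accept": "text/html"})
response = conn.getresponse()
status, reason, body = response.status, response.reason, response.read()
print(status, reason, response.getheader("Content-Type"), body)

conn.close()
server.shutdown()
server.server_close()
```

Running it prints the status code, reason phrase, content type and body: 200 OK text/html b'<h1>Hello</h1>'.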

That is the HTTP part. Next comes the TCP protocol, which splits the HTTP data into segments and makes sure they are delivered reliably. TCP uses a three-way handshake to establish a connection: the client sends a SYN flag to the server, the server responds with a SYN/ACK flag, and once the client receives it, the client sends back an ACK flag. This exchange confirms that both sides can send and receive before any data is transmitted; during the transfer itself, TCP uses acknowledgments and retransmission to keep delivery reliable.
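Application code never sends the SYN, SYN/ACK and ACK flags itself; the operating system does it when a socket connects. The Python sketch below (a local exchange with made-up messages) shows where that happens: in create_connection() on the client side and behind accept() on the server side.

```python
import socket
import threading

# Listening socket on localhost; port 0 lets the OS choose a free port.
listener = socket.create_server(("127.0.0.1", 0))
port = listener.getsockname()[1]
received = []

def serve_once():
    # accept() hands us a connection whose three-way handshake the kernel has already completed.
    conn, _addr = listener.accept()
    with conn:
        data = conn.recv(1024)
        received.append(data)
        conn.sendall(b"echo:" + data)  # application-level reply, not a TCP flag

t = threading.Thread(target=serve_once)
t.start()

# create_connection() makes the kernel send SYN, wait for SYN/ACK and answer with ACK.
with socket.create_connection(("127.0.0.1", port)) as client:
    client.sendall(b"hello")
    reply = client.recv(1024)

t.join()
listener.close()
print(reply)  # b'echo:hello'
```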

Next up is the IP protocol, which delivers the packets to the destination IP address. On each local network along the way, however, frames are actually delivered by MAC address (the fixed physical address of a network card), and IP addresses can change over time while MAC addresses do not. ARP (Address Resolution Protocol) is used to find the MAC address that corresponds to a given IP address.

Below that are the hardware-related layers: the data link layer and the physical layer. As that goes well beyond what we intend to learn here, I will stop here.

And that is the whole process of requesting a web page.

Intent of a request

The most common request methods we use in web development are GET and POST.

As the names suggest, GET is usually used to retrieve data, while POST is used to submit data to the server.

As a matter of fact, HTTP also supports other request methods, such as HEAD, PUT, DELETE, OPTIONS and so on.

HTTP provides multiple request methods so that the client can tell the server what it wants to do. For example, when the request method is OPTIONS, the server usually responds with the methods it supports for the requested resource (in the Allow header).
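Here is a small sketch of an OPTIONS exchange, again against a hypothetical local server built with Python's http.server. The list of allowed methods is made up for the example; a real server would report what it actually supports.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

class OptionsHandler(BaseHTTPRequestHandler):
    # Hypothetical list of methods this server claims to support.
    ALLOWED = "GET, HEAD, POST, OPTIONS"

    def do_OPTIONS(self):
        # 204 No Content: the whole answer is carried in the headers.
        self.send_response(204)
        self.send_header("Allow", self.ALLOWED)
        self.end_headers()

    def log_message(self, *args):  # silence the default request logging
        pass

server = HTTPServer(("127.0.0.1", 0), OptionsHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()

conn = http.client.HTTPConnection("127.0.0.1", server.server_port)
conn.request("OPTIONS", "/")
response = conn.getresponse()
status, allow = response.status, response.getheader("Allow")
response.read()
conn.close()
server.shutdown()
server.server_close()

print(status, allow)  # 204 GET, HEAD, POST, OPTIONS
```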

RESTful architecture, which has become very common, also builds on these HTTP methods, using them to express the operation to perform on a resource.

HTTP is stateless

When we say that HTTP is stateless, we mean that the protocol itself keeps no state between requests: each request is handled independently, and the server does not remember who sent the previous one. This design keeps HTTP simple, which makes it easier for servers to handle a large number of requests quickly.

Because HTTP is stateless, we often use cookies to keep track of state. For example, the server can send a cookie to the client (in a Set-Cookie response header) to remember who the client is. When the client visits the server again, the browser automatically attaches the cookie to the request (in a Cookie header), and by doing so the server can easily identify the client.
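Python's standard http.cookies module can build and parse these headers. In the sketch below, the cookie name session_id and its value are hypothetical, and the Cookie header that a browser would send back is written by hand.

```python
from http.cookies import SimpleCookie

# Server side: create a cookie that identifies the client (name and value are hypothetical).
outgoing = SimpleCookie()
outgoing["session_id"] = "abc123"
outgoing["session_id"]["path"] = "/"
outgoing["session_id"]["httponly"] = True  # not readable from JavaScript
set_cookie_header = outgoing.output(header="Set-Cookie:")
print(set_cookie_header)

# Client side: on its next request, the browser sends the cookie back like this.
cookie_header = "session_id=abc123"

# Server side again: parse the Cookie header to recognise the client.
incoming = SimpleCookie()
incoming.load(cookie_header)
print(incoming["session_id"].value)  # abc123
```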

Persistent connection

In HTTP/1.0, by default the TCP connection is closed as soon as each request/response exchange is done. That is fine if the requested file is small, but things are different when we request a webpage full of resources such as images. Loading each image is a separate HTTP request (to whichever server hosts the image), so while loading the page the browser repeatedly sets up a TCP connection, fetches an image, and tears the connection down once the image is loaded.

This behavior wastes a lot of resources on connection setup alone. In HTTP/1.1, connections are persistent by default: multiple requests and responses can be sent over a single TCP connection, so there is no need to set up a new connection for every resource. This is what we usually refer to as a persistent connection, HTTP keep-alive, or HTTP connection reuse. (HTTP/1.1 also defines pipelining, which lets the client send the next request without waiting for the previous response, but it is rarely enabled in practice.)
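The following Python sketch makes three requests over one http.client connection to a local server that speaks HTTP/1.1. The server records the client's port number for each request; if all requests arrive from one port, they shared one TCP connection. The paths are made up to mimic a page with images.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

seen_client_ports = []

class KeepAliveHandler(BaseHTTPRequestHandler):
    protocol_version = "HTTP/1.1"  # keep the connection open by default

    def do_GET(self):
        # The client's port identifies which TCP connection carried this request.
        seen_client_ports.append(self.client_address[1])
        body = self.path.encode()
        self.send_response(200)
        # Content-Length tells the client where this response ends
        # without the server having to close the connection.
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):  # silence the default request logging
        pass

server = HTTPServer(("127.0.0.1", 0), KeepAliveHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()

conn = http.client.HTTPConnection("127.0.0.1", server.server_port)
bodies = []
for path in ("/index.html", "/logo.png", "/photo.jpg"):
    conn.request("GET", path)
    resp = conn.getresponse()
    bodies.append(resp.read())  # read the full body before reusing the connection
conn.close()
server.shutdown()
server.server_close()

print(len(seen_client_ports), len(set(seen_client_ports)))  # 3 1: three requests, one connection
```

Note the Content-Length header: on a persistent connection, the client cannot rely on the connection closing to know where a response ends.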

Increase data transmission efficiency

Before we get into that, we need to understand what an HTTP entity is.

HTTP entity - to be added.

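Commonly used HTTP status codes

Status | Code | Description
Success | 200 | OK: the request was processed successfully
Success | 204 | No Content: the request succeeded, but there is no body to return (the browser stays on the current page)
Success | 206 | Partial Content: only the requested range of the resource is returned (the client asks with a Range header; the response carries a Content-Range header)
Redirect | 301 | Moved Permanently: the resource has been permanently assigned a new URL; clients should use the new URL from now on
Redirect | 302 | Found: the resource is temporarily at a different URL; clients should keep using the original URL in the future
Redirect | 303 | See Other: like 302, but explicitly requires the client to retrieve the resource with GET
Redirect | 304 | Not Modified: a conditional request found the resource unchanged, so the client can use its cached copy (in the 3xx range, although not really a redirect)
Redirect | 307 | Temporary Redirect: like 302, except the client must not change a POST request into a GET request
Client Error | 400 | Bad Request: the request message has a syntax error
Client Error | 401 | Unauthorized: authentication is required or has failed
Client Error | 403 | Forbidden: the server refuses access to the resource
Client Error | 404 | Not Found: the requested resource was not found
Server Error | 500 | Internal Server Error: the server hit an error while processing the request
Server Error | 503 | Service Unavailable: the server is temporarily overloaded or down for maintenance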
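Python's standard library already knows these codes and their reason phrases through http.HTTPStatus. The describe helper and the category names below are hypothetical, chosen to mirror the groupings in the status code table.

```python
from http import HTTPStatus

# Hypothetical mapping from the first digit of a status code to a category name.
CATEGORIES = {1: "Informational", 2: "Success", 3: "Redirect", 4: "Client Error", 5: "Server Error"}

def describe(code: int) -> str:
    status = HTTPStatus(code)  # raises ValueError for codes Python does not know
    return f"{CATEGORIES[code // 100]} {status.value} {status.phrase}"

for code in (200, 204, 206, 301, 302, 303, 304, 307, 400, 401, 403, 404, 500, 503):
    print(describe(code))
```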

Applications between the server and the client

An HTTP server can host multiple sites; that is, it can be configured with multiple virtual hosts. When a user visits these different sites, the requests actually go to the same HTTP server, which uses the Host header of each request to tell which site is being requested.
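A minimal sketch of how a server might choose a site from the Host header. The host names and document-root paths are hypothetical, and the port stripping is simplified (it would not handle IPv6 literals such as [::1]:8080).

```python
from typing import Optional

# Hypothetical virtual-host table: two sites served by the same HTTP server.
SITES = {
    "www.example.com": "/var/www/example",
    "blog.example.com": "/var/www/blog",
}

def pick_site(host_header: str) -> Optional[str]:
    """Return the document root for the site named in a Host header, or None."""
    # The Host header may carry a port (e.g. "blog.example.com:8080");
    # host names are case-insensitive.
    hostname = host_header.split(":", 1)[0].strip().lower()
    return SITES.get(hostname)

print(pick_site("blog.example.com:8080"))  # /var/www/blog
print(pick_site("shop.example.com"))       # None: no such virtual host
```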

There are also some applications that sit between the client and the server.

A proxy is a network service that lets a client connect to another endpoint (usually a server) indirectly, without establishing a direct connection. Many network routers provide this functionality, and a server that provides it is called a proxy server. Proxies are commonly used to protect privacy and security.

A proxy works like this: the client first establishes a connection with the proxy server, and the proxy server then connects to the actual target server to obtain the resource, such as a file. A caching proxy also stores the resource in its local cache before returning it to the client, so later requests for the same resource can be served without contacting the target server again.
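The sketch below follows those steps with two local servers: a hypothetical origin server and a caching proxy in front of it. To stay short it makes simplifying assumptions: the proxy always forwards to one fixed upstream (a real forward proxy reads the target from the absolute URL in the request line), it caches every GET forever, and it ignores status codes and Cache-Control.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

origin_hits = []  # paths the origin server actually had to serve

class OriginHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        origin_hits.append(self.path)
        body = b"file contents for " + self.path.encode()
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass

cache = {}  # hypothetical in-memory cache: path -> body

class CachingProxyHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path not in cache:
            # Cache miss: open our own connection to the target server and fetch the resource.
            upstream = http.client.HTTPConnection("127.0.0.1", origin.server_port)
            upstream.request("GET", self.path)
            cache[self.path] = upstream.getresponse().read()
            upstream.close()
        body = cache[self.path]
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass

origin = HTTPServer(("127.0.0.1", 0), OriginHandler)
proxy = HTTPServer(("127.0.0.1", 0), CachingProxyHandler)
for srv in (origin, proxy):
    threading.Thread(target=srv.serve_forever, daemon=True).start()

# The client only ever talks to the proxy.
results = []
for _ in range(2):
    client = http.client.HTTPConnection("127.0.0.1", proxy.server_port)
    client.request("GET", "/report.pdf")
    results.append(client.getresponse().read())
    client.close()

for srv in (origin, proxy):
    srv.shutdown()
    srv.server_close()

print(results[0] == results[1], origin_hits)  # True ['/report.pdf']: the second request came from the cache
```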

Gateway - to be added.

Tunnel - to be added.

HTTP Header

An HTTP request message consists of the following parts:

Request start line:
  POST /hello.html HTTP/1.1
Request headers:
  Host: www.example.com
  User-Agent: Mozilla/4.0 (compatible; MSIE5.01; Windows NT)
  Accept: text/html
  Accept-Language: en-us
  Accept-Encoding: gzip, deflate
  Upgrade-Insecure-Requests: 1
General headers:
  Connection: keep-alive
Entity headers:
  Content-Type: application/x-www-form-urlencoded
  Content-Length: 19
Blank line:
  (an empty line that marks the end of the headers)
Message body:
  id=12345&value=true
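To make the structure concrete, here is a simplified Python parser that splits a raw request into start line, headers and body. It is only a sketch: it assumes a trimmed POST version of the example message with Content-Length set, and it ignores repeated headers, chunked bodies and malformed input.

```python
from urllib.parse import parse_qs

def parse_request(raw: bytes):
    """Split a raw HTTP/1.1 request into start line parts, headers and body (simplified)."""
    head, _, rest = raw.partition(b"\r\n\r\n")  # the blank line ends the header section
    lines = head.decode("iso-8859-1").split("\r\n")
    method, target, version = lines[0].split(" ")  # request start line
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()  # header names are case-insensitive
    length = int(headers.get("content-length", "0"))
    return method, target, version, headers, rest[:length]

raw = (
    b"POST /hello.html HTTP/1.1\r\n"
    b"Host: www.example.com\r\n"
    b"Content-Type: application/x-www-form-urlencoded\r\n"
    b"Content-Length: 19\r\n"
    b"\r\n"
    b"id=12345&value=true"
)
method, target, version, headers, body = parse_request(raw)
print(method, target, headers["host"], body)  # POST /hello.html www.example.com b'id=12345&value=true'
print(parse_qs(body.decode()))                # {'id': ['12345'], 'value': ['true']}
```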

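Some common HTTP response headers:

  • Location: http://example.com - the server tells the browser to redirect to that page (used with 3xx responses)
  • Server: Apache Tomcat - the server tells the browser which web server software it runs
  • Content-Encoding: gzip - the server tells the browser how the returned data is compressed
  • Content-Length: 80 - the server tells the browser the length of the returned body, in bytes
  • Content-Language: en-us - the server tells the browser the language of the returned content
  • Content-Type: text/html - the server tells the browser the media type of the returned data
  • Last-Modified: Thu, 11 Jul 2019 18:23:41 GMT - the server tells the browser when the data was last updated
  • Refresh: 1; url=http://example.com - the server tells the browser to go to the given URL (or reload the page) after 1 second
  • Content-Disposition: attachment; filename=a.zip - the server tells the browser to download the data as a file named a.zip instead of displaying it
  • Transfer-Encoding: chunked - the server tells the browser the body is sent in chunks, so the total length does not need to be known in advance
  • Set-Cookie: SS=Q0=5Lb_nQ; path=/search - the server tells the browser to save a cookie
  • Expires: -1 - an invalid date, which the browser treats as already expired, so the response is stale immediately
  • Cache-Control: no-cache - the browser must revalidate with the server before reusing a cached copy (no-store forbids caching entirely)
  • Pragma: no-cache - the HTTP/1.0 predecessor of Cache-Control: no-cache
  • Connection: close / keep-alive - the server tells the browser whether the connection will be closed after the response or kept open
  • Date: Thu, 11 Jul 2019 18:23:41 GMT - the server tells the browser the date and time the response was generated
  • more to be added...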
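As an illustration of how a few of these headers fit together, the sketch below compresses a body with gzip and builds matching Date, Content-Type, Content-Encoding, Content-Length and Cache-Control values. The function name and the HTML body are hypothetical; a real server would also check the request's Accept-Encoding header before compressing.

```python
import gzip
from email.utils import formatdate

def build_response_headers(body: bytes) -> tuple[dict, bytes]:
    """Hypothetical helper: gzip the body and describe it with matching headers."""
    compressed = gzip.compress(body)
    headers = {
        "Date": formatdate(usegmt=True),  # HTTP date format, e.g. "Thu, 11 Jul 2019 18:23:41 GMT"
        "Content-Type": "text/html; charset=utf-8",
        "Content-Encoding": "gzip",  # the body is gzip-compressed
        "Content-Length": str(len(compressed)),  # length of the bytes actually sent
        "Cache-Control": "no-cache",  # cached copies must be revalidated before reuse
    }
    return headers, compressed

html = b"<html>" + b"hello " * 200 + b"</html>"
headers, payload = build_response_headers(html)

# The client undoes the Content-Encoding to get the original body back.
assert gzip.decompress(payload) == html
print(headers["Content-Length"], "bytes sent instead of", len(html))
```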

HTTPS in brief

HTTP is not secure by nature, as its content is not encrypted at all. It doesn't verify the identity of either the server or the client. It also can't guarantee data integrity: data can be altered by a third party before reaching the recipient.

Tools such as packet sniffers can easily capture HTTP traffic. Even if you encrypt the HTTP message content yourself, only the content is protected: someone who intercepts the message may not be able to read it, but can still tamper with it.

The standard way to secure HTTP is to run it over SSL (today, its successor TLS); this is what we call HTTPS (the S stands for Secure). We will talk more about HTTPS in the next post.

For identity verification, HTTPS relies on certificates issued by trusted third-party certificate authorities (CAs). By checking a server's certificate, the client can tell whether the server is legitimate.
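In Python, the ssl module's default client context already performs these checks: it loads the system's trusted CA certificates and requires a valid certificate that matches the host name. The commented-out part is a usage sketch that needs network access; www.example.com is only a placeholder.

```python
import ssl

# Default client context: trusted CA certificates loaded, verification switched on.
context = ssl.create_default_context()
print(context.verify_mode == ssl.CERT_REQUIRED)  # True: the server must present a valid, CA-signed certificate
print(context.check_hostname)                    # True: the certificate must match the host name

# Usage sketch (requires network access, so it is left commented out):
# import socket
# with socket.create_connection(("www.example.com", 443)) as sock:
#     with context.wrap_socket(sock, server_hostname="www.example.com") as tls:
#         print(tls.version(), tls.getpeercert()["subject"])
```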


I hope this post is somewhat useful. Some parts still remain to be completed. That's all for now, see ya!
