Why do we want to learn HTTP?
Most web applications are built on HTTP (Hypertext Transfer Protocol), the protocol we use to transmit data.
In short, HTTP is the communication format between a client (usually a web browser) and a server.
HTTP was created so that documents could link to one another and eventually form "hypertext" for easier communication. We can say that HTTP is the foundation of web communication, and that's something we must learn and understand thoroughly.
The basic concept of HTTP
When we learn about computer networking, we usually divide the network into 5 (or more) layers.
One might ask, why do we need to split the network into different layers?
Well, the communication between two computers is actually very complicated, and layering breaks one difficult problem down into multiple simpler problems. Also, once the layers are separated, we only need to care about the layers that are relevant to us, not about everything underneath them.
HTTP is located on the topmost layer, which is also referred to as the application layer, and that is the layer we developers care about the most.
The basic HTTP communication process
We know that HTTP lives in the application layer, and web communication obviously involves other protocols besides HTTP.
When we visit a webpage, we usually type a domain name, e.g. www.domain.com, in the browser URL bar. When we hit the Enter key, DNS (Domain Name System) translates the domain name into an IP address, which the browser then uses to communicate with the server.
When we load the website or perform actions on it, we simply make an HTTP request to the server, and the server responds with the requested data.
And that's HTTP. Below it sits TCP, which splits the HTTP data into segments and makes sure they are delivered reliably and in order. TCP uses a three-way handshake to establish a connection before any data is sent: the client sends a SYN segment to the server, the server replies with a SYN-ACK, and the client answers with an ACK. Once the connection is established, the receiver acknowledges the data it gets and anything that is lost is retransmitted, which is how TCP keeps the transmission reliable.
Next up is IP. IP delivers the resulting packets to the server's IP address. On a local network the packet still has to reach a physical device, so ARP (Address Resolution Protocol) is used to map an IP address to a MAC address. An IP address may change over time, while a MAC address is the hardware address of a network card and normally stays the same.
Below that, we reach the hardware-related layers: the data link layer and the physical layer. As this goes far beyond what we intend to learn here, I will stop right here.
And that is the whole process when we request a web page.
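To make the steps above a bit more concrete, here is a minimal sketch in Python that resolves a host name, opens a TCP connection, and sends an HTTP request as plain text. It assumes a throwaway local server (the hypothetical HelloHandler) instead of a real website, so names like fetch_over_raw_socket and demo are mine, not part of any standard API.

```python
import socket
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class HelloHandler(BaseHTTPRequestHandler):
    """Hypothetical handler standing in for a real web server."""

    def do_GET(self):
        body = b"hello"
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the demo quiet


def fetch_over_raw_socket(host: str, port: int, path: str) -> str:
    # Step 1: name resolution (the job DNS does for public host names).
    family, socktype, proto, _, address = socket.getaddrinfo(
        host, port, family=socket.AF_INET, type=socket.SOCK_STREAM
    )[0]
    # Step 2: connect() makes the OS perform the TCP three-way handshake.
    with socket.socket(family, socktype, proto) as conn:
        conn.connect(address)
        # Step 3: the HTTP request is just text sent over that connection.
        request = f"GET {path} HTTP/1.0\r\nHost: {host}\r\n\r\n"
        conn.sendall(request.encode("ascii"))
        chunks = []
        while True:
            data = conn.recv(4096)
            if not data:  # the server closed the connection
                break
            chunks.append(data)
    return b"".join(chunks).decode("iso-8859-1")


def demo() -> str:
    server = HTTPServer(("127.0.0.1", 0), HelloHandler)  # port 0 = any free port
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        return fetch_over_raw_socket("localhost", server.server_port, "/")
    finally:
        server.shutdown()
        server.server_close()
```

Calling demo() returns the raw response text: a status line, some headers, a blank line, and then the body. The IP, data link, and physical layers are handled entirely by the operating system and the hardware, which is why they never show up in the code.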
Intention of request
The most common request methods that we use in web development are POST and GET.
GET is usually used to 'get' data, while POST is used to 'post' data.
As a matter of fact, HTTP also supports other request methods, such as HEAD, PUT, DELETE, OPTIONS and so on.
HTTP provides multiple request methods to let the server know what the client wants to do. For example, when the request method is OPTIONS, the server usually returns the methods it supports for the requested resource.
RESTful architecture, which is very common nowadays, also builds on these HTTP request methods to express what should happen to a resource.
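As a rough illustration of how the method tells the server what to do, here is a small sketch of a dispatcher that works on an in-memory dictionary. The handle function, the store, and the choice of status codes (201 for a newly created resource, 204 for a successful update or delete) are my own assumptions for the example, not something HTTP forces on every server.

```python
from typing import Dict, Optional, Tuple

SUPPORTED_METHODS = ("GET", "HEAD", "PUT", "DELETE", "OPTIONS")


def handle(method: str, path: str, store: Dict[str, str],
           body: Optional[str] = None) -> Tuple[int, Dict[str, str], str]:
    """Return (status code, headers, body) for one request."""
    allow = {"Allow": ", ".join(SUPPORTED_METHODS)}
    if method == "OPTIONS":
        # Tell the client which methods this server supports.
        return 204, allow, ""
    if method in ("GET", "HEAD"):
        if path not in store:
            return 404, {}, ""
        content = store[path]
        headers = {"Content-Length": str(len(content.encode()))}
        # HEAD returns the same headers as GET but no body.
        return 200, headers, content if method == "GET" else ""
    if method == "PUT":
        created = path not in store
        store[path] = body or ""
        return (201 if created else 204), {}, ""
    if method == "DELETE":
        if store.pop(path, None) is None:
            return 404, {}, ""
        return 204, {}, ""
    # Any other method is not supported by this sketch.
    return 405, allow, ""
```

For example, handle("PUT", "/notes/1", store, "hi") creates the resource, and a later handle("GET", "/notes/1", store) returns it. The same path means different things depending on the method, which is the idea RESTful APIs rely on.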
HTTP is stateless
When we say that HTTP is stateless, it simply means that the protocol itself does not keep any state between requests. Each request is handled on its own, and the server doesn't know who sent the previous one. The purpose of this design is to keep HTTP simple, so that a server can handle a large number of requests quickly.
While HTTP is stateless, we often use cookies to keep track of state. For example, the server can send a cookie to the client to remember who the client is. When the client visits the server again, the browser automatically attaches the cookie to the request, and by doing so the server can easily identify who the client is.
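Here is a minimal sketch of that round trip using Python's standard http.cookies module. The sessions dictionary, the cookie name session_id, and the login and identify helpers are hypothetical names chosen for the example.

```python
import secrets
from http.cookies import SimpleCookie
from typing import Dict, Optional

sessions: Dict[str, str] = {}  # session id -> user name, kept on the server


def login(user: str) -> str:
    """Return the Set-Cookie header value the server sends after login."""
    session_id = secrets.token_hex(8)
    sessions[session_id] = user
    cookie = SimpleCookie()
    cookie["session_id"] = session_id
    cookie["session_id"]["path"] = "/"
    cookie["session_id"]["httponly"] = True
    return cookie["session_id"].OutputString()


def identify(cookie_header: str) -> Optional[str]:
    """Look up the user from the Cookie header of a later request."""
    cookie = SimpleCookie()
    cookie.load(cookie_header)
    morsel = cookie.get("session_id")
    return sessions.get(morsel.value) if morsel else None
```

The browser only sends back the name=value part of the cookie, so identify(login("alice").split(";")[0]) gives back "alice". The state lives on the server and in the cookie, not in HTTP itself.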
Continuous connection
In HTTP/1.0, the TCP connection is closed once the request is done. That is fine if we only request a small file. Things are different if we request a webpage that is full of resources such as images. Loading each image is a separate HTTP request (to whichever server hosts the image). While loading the webpage, the browser repeatedly opens a TCP connection, fetches one image, and closes the connection again.
With such behavior, a lot of resources are wasted on setting up and tearing down connections. HTTP/1.1 keeps the TCP connection open by default, so multiple requests and responses can travel over a single connection. This is usually referred to as a persistent connection, HTTP keep-alive, or HTTP connection reuse. On top of that, HTTP/1.1 defines pipelining, where the client doesn't need to wait for the first response before it sends out the second request, although in practice browsers rarely enable it.
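The difference can be observed with a small experiment. The sketch below assumes a local test server (the hypothetical KeepAliveHandler) and counts how many distinct client connections the server sees when the same http.client.HTTPConnection is reused versus closed after every request.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

connections_seen = set()  # (client ip, client port) pairs observed by the server


class KeepAliveHandler(BaseHTTPRequestHandler):
    protocol_version = "HTTP/1.1"  # lets http.server keep connections open

    def do_GET(self):
        connections_seen.add(self.client_address)
        body = b"ok"
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the demo quiet


def count_connections(reuse: bool, requests: int = 3) -> int:
    connections_seen.clear()
    server = ThreadingHTTPServer(("127.0.0.1", 0), KeepAliveHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        conn = http.client.HTTPConnection("127.0.0.1", server.server_port)
        for _ in range(requests):
            conn.request("GET", "/")
            conn.getresponse().read()  # drain the body before the next request
            if not reuse:
                conn.close()  # the next request() opens a new TCP connection
        conn.close()
    finally:
        server.shutdown()
        server.server_close()
    return len(connections_seen)
```

With reuse=True all three requests share one TCP connection. With reuse=False every request pays for its own connection setup, which is the HTTP/1.0 behavior described above.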
Increase data transmission efficiency
Before we get into that, we need to understand what an HTTP entity is.
HTTP entity - to be added.
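Commonly used HTTP status codes
Category | Code | Description |
---|---|---|
Success | 200 | OK: the request was processed successfully |
Success | 204 | No Content: the request was processed successfully, but there is no body to return, so the browser stays on the current page |
Success | 206 | Partial Content: only part of the resource is returned (the range is given by Content-Range) |
Redirect | 301 | Moved Permanently: the resource has been permanently assigned a new URL, and clients should use the new URL from now on |
Redirect | 302 | Found: the resource is temporarily available at a different URL, and clients should keep using the original URL |
Redirect | 303 | See Other: similar to 302, but explicitly requires the client to use the GET method to retrieve the resource |
Redirect | 304 | Not Modified: the condition sent with the request (e.g. If-Modified-Since) shows the cached copy is still valid, so no body is returned and the client uses its cache |
Redirect | 307 | Temporary Redirect: same as 302, except the client must not change a POST request to a GET request |
Client Error | 400 | Bad Request: the request message has a syntax error |
Client Error | 401 | Unauthorized: authentication is required or has failed |
Client Error | 403 | Forbidden: the server refuses access to the resource |
Client Error | 404 | Not Found: the requested resource does not exist on the server |
Server Error | 500 | Internal Server Error: the server hit an error while processing the request |
Server Error | 503 | Service Unavailable: the server is overloaded or down for maintenance |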
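The first digit of the code decides the category, and Python ships the standard reason phrases in http.HTTPStatus. The describe helper below is just a small sketch of mine to show how the two fit together.

```python
from http import HTTPStatus

CATEGORIES = {
    1: "Informational",
    2: "Success",
    3: "Redirect",
    4: "Client Error",
    5: "Server Error",
}


def describe(code: int) -> str:
    status = HTTPStatus(code)  # raises ValueError for unknown codes
    return f"{CATEGORIES[code // 100]}: {status.value} {status.phrase}"


print(describe(200))  # Success: 200 OK
print(describe(304))  # Redirect: 304 Not Modified
print(describe(404))  # Client Error: 404 Not Found
```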
Application program between server and client
An HTTP server can host multiple sites, which means it can be configured to support multiple virtual hosts. When users visit these different sites, they are actually sending requests to the same HTTP server.
There are also some programs that sit between the client and the server.
A proxy is a network service that allows a client to connect to another endpoint (usually a server) without establishing a direct connection. Network devices such as routers often provide this functionality. A server that provides such a service is referred to as a proxy server. Proxies are commonly used to protect privacy and security.
A proxy works like this: the client first establishes a connection with the proxy server, then the proxy server connects to the actual target server to obtain the resource, such as a file. The proxy may store the resource in its local cache, and then returns it to the client.
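The caching part of that flow can be sketched in a few lines. This is a toy model, assuming an in-memory cache that never expires. The CachingProxy class and the fetch_from_origin callable are hypothetical stand-ins for a real proxy and its network call to the target server.

```python
from typing import Callable, Dict


class CachingProxy:
    """Toy forward proxy: fetch from the origin once, then serve from cache."""

    def __init__(self, fetch_from_origin: Callable[[str], bytes]) -> None:
        self._fetch = fetch_from_origin  # stands in for the real network request
        self._cache: Dict[str, bytes] = {}
        self.origin_hits = 0  # how many times the target server was contacted

    def get(self, url: str) -> bytes:
        if url not in self._cache:
            self.origin_hits += 1
            self._cache[url] = self._fetch(url)
        return self._cache[url]
```

Requesting the same URL twice through the proxy contacts the target server only once. A real proxy would also have to respect caching headers such as Cache-Control and Expires, which this sketch ignores.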
Gateway - to be added.
Tunnel - to be added.
HTTP Header
An HTTP message consists of the following parts:
Part | Example |
---|---|
Request Start Line | GET /hello.html HTTP/1.1 |
Request Header | Host: www.example.com User-Agent: Mozilla/4.0 (compatible; MSIE5.01; Windows NT) Accept: text/html Accept-Language: en-us Accept-Encoding: gzip, deflate |
General Header | Connection: keep-alive Upgrade-Insecure-Requests: 1 |
Entity Header | Content-Type: application/x-www-form-urlencoded Content-Length: 19 |
Blank Line | |
Message Body | id=12345&value=true |
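The parts in the table above map directly onto the raw text of a message: a start line, header lines, a blank line, then the body. Here is a minimal parsing sketch. The parse_http_message function and the POST request in the usage note are made up for illustration, and the parser assumes a well-formed message with a text body.

```python
from typing import Dict, Tuple
from urllib.parse import parse_qs


def parse_http_message(raw: str) -> Tuple[str, Dict[str, str], str]:
    # The first blank line separates the head from the body.
    head, _, body = raw.partition("\r\n\r\n")
    start_line, *header_lines = head.split("\r\n")
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        # Header names are case-insensitive, so normalize them.
        headers[name.strip().lower()] = value.strip()
    return start_line, headers, body


raw_request = (
    "POST /form HTTP/1.1\r\n"
    "Host: www.example.com\r\n"
    "Content-Type: application/x-www-form-urlencoded\r\n"
    "Content-Length: 19\r\n"
    "\r\n"
    "id=12345&value=true"
)
start_line, headers, body = parse_http_message(raw_request)
form = parse_qs(body)  # {'id': ['12345'], 'value': ['true']}
```

Note that Content-Length matches the 19 bytes of the body, which is how the receiver knows where the message ends.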
HTTP response header examples:
Header | Meaning |
---|---|
Location: http://example.com | Server tells the browser to redirect to that page |
Server: apache tomcat | Server tells the browser which web server software it runs |
Content-Encoding: gzip | Server tells the browser how the returned data is compressed |
Content-Length: 80 | Server tells the browser the length of the returned data in bytes |
Content-Language: en-us | Server tells the browser the language of the returned content |
Content-Type: text/html | Server tells the browser the type of the returned data |
Last-Modified: Tue, 11 Jul 2019 | Server tells the browser when the data was last updated |
Refresh: 1; url=http://example.com | Server tells the browser to load that URL after 1 second |
Content-Disposition: attachment; filename=a.zip | Server tells the browser to download the data as a file named a.zip instead of displaying it |
Transfer-Encoding: chunked | Server tells the browser that the data is sent in chunks |
Set-Cookie: SS=Q0=5Lb_nQ; path=/search | Server tells the browser to save a cookie |
Expires: -1 | Server tells the browser the data is already expired, so it should not be served from cache |
Cache-Control: no-cache | Server tells the browser to check with the server before reusing a cached copy |
Pragma: no-cache | Older HTTP/1.0 equivalent of Cache-Control: no-cache |
Connection: close (or keep-alive) | Server tells the browser whether the connection is closed or kept open after the response |
Date: Tue, 11 Jul 2019 18:23:41 GMT | Server tells the browser when the response was generated |

More to be added...
HTTPS in brief
HTTP is not secure by nature, as the content is not encrypted at all. It doesn't verify the identity of either the server or the client. It also doesn't guarantee data integrity (data can be altered by a third party before reaching the recipient).
There are tools that can capture HTTP traffic easily. Even if you encrypt the HTTP message body yourself, only the content is encrypted. Someone who intercepts the message can still tamper with it, even without being able to decrypt it.
The standard way to establish a secure HTTP connection is to run it over SSL/TLS, which we usually refer to as HTTPS (the s stands for secure). We will talk more about HTTPS in the next post.
As for identity, HTTPS relies on certificates issued by third-party certificate authorities. By validating the certificate, the client can tell whether a server is legitimate.
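On the client side, this verification is usually switched on by default. As a small sketch, Python's ssl.create_default_context() returns a context that requires a certificate from a trusted authority and checks that it matches the host name. The tls_client_defaults helper is just my own wrapper to make those two settings visible.

```python
import ssl


def tls_client_defaults() -> dict:
    context = ssl.create_default_context()
    return {
        # The server must present a certificate signed by a trusted authority.
        "certificate_required": context.verify_mode == ssl.CERT_REQUIRED,
        # The certificate must match the host name we are connecting to.
        "hostname_checked": context.check_hostname,
    }
```

Such a context can then be passed to a client, for example http.client.HTTPSConnection(host, context=context), and the connection fails if the server cannot prove its identity.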
I hope this post is somewhat useful. Some parts are still to be completed soon. That's all for now, see ya!