Casie Bao


All the lights we cannot see


HTTP

Introduction

HTTP is a TCP/IP based communication protocol, used to deliver data, such as HTML files, images, query results etc., on the World Wide Web. HTTP specification specifies how client’s request data will be constructed and send to the server, and how the servers respond to these requests. It has the following benefits:

  • stateless: clients and servers are aware of each other only during a current request. The HTTP client initiates an HTTP request, and after the request made, the client waits for the response. The server processes the request and sends a response back after which client disconnect and both of them forget about each other.
  • HTTP has two versions. HTTP/1.0 and HTTP/1.1 (HTTP/2.0 just came up and we will skip that in this post). HTTP/1.0 uses a new connection for each request/response exchange. HTTP/1.1 differs from it as the same connection can be used for more than one resquest/response exchanges.

HTTP Parameters

The following are important parameters:

  • HTTP version
  • URI: it is case-insensitive
  • Date/Time Formats: all HTTP data/time stamps must be in GMT.
  • Character sets: it specifies the character sets the client prefer. By default, it uses US-ASCII. Users can also change into ISO-8859-1 or ISO-8859-7.
  • Content encodings: it specifies an encoding algorithm that has been used to encode the content before passing it over the network. It allows a doc to be compressed or usefully transformed without losing the identity.
  • Media types: HTTP uses “Content-Type” and “Accept” header fields in order to provides open and extensible data typing and type negotiation. The general syntax to specify a media type is “type/subtype”. For example, “image/gif”, “text/html”. These types’ name are case-insensitive.
  • Language tags: HTTP uses language tags within the “Accept-Language” and “Content-Language” fields. The general syntax is “primary tag-(subtag)”. For instance, “en”, “en-us”, “en-cockney”.

HTTP Client

A HTTP client is a program that establishes a connection to a server for the purpose of sending HTTP request messages. For example, a HTTP client can be a web browser or any other client.

A HTTP request message has the following format:

<A request line>
<zero of more (General/Request/entity) Fields>
<empty line>
<optionally a message body>

A request line

The general syntax for a request line is “ SP SP HTTP-version " where SP stands for separator, and CRLF means a line feed "\n". For example, "GET /hello.html HTTP/1.1".

  • HTTP Method: GET/POST/PUT/DELETE
  • Request URI: used to establish a tunnel to the server identified by the URI

HTTP Server

A HTTP server is a program that accepts connections in order to serve HTTP requests by sending HTTP response messages. For example, it can be a web server like Apache.

A HTTP response includes the HTTP version, a status code, and a MIME-like message containing server information, entity meta information etc.

A HTTP response message has the following format:

<A status line>
<zero of more (General/Response/entity)-Header Fields>
<empty line>
<optionally a message body>

Status line

A status line has the general syntax “ SP SP CRLF", where the status code is a 3-digit integer, and reason phrase is to disply error condition. For example, a status line is "HTTP/1.1 404 Not Found."

Response Field

Response fields give information about the server and about further access to the resource identified by the request URI (e.g., further access includes proxy authenticate etc.).

General-Header Fields

Below are some important fields specified in the general fields section.

  • Cache-Control: the goal is to eliminate the need to send requests in many cases and to eliminate the need to send full responses in many cases. Thus, to improve the performance. It is either a “cache-request-directive” or “cache-response-directive” to specify parameters for the cache or to request certain kinds of documents from the cache. For example, “cache-control: no-cache” means a cache must not use the response to satisfy a subsequent request without successfully revalidation with the origin server. (NOT clear, need to rewrite)
  • Connection: it is either “close” or “keep-alive”. Close means that a connection will be closed after completion of the response. On the other hand, HTTP/1.1 by default uses persistent connection(get asked by a staff DevOps, is project’s calling optimization endpoint using http persistent connection), where connection doesn’t automately closed after a transaction. So a “keep-alive” status is shown.
  • Date
  • Upgrade: allows the client to specify what additional communication protocols it supports and would like to use it if the server finds it appropriate to switch protocols. For example, “upgrade: HTTP/2.0”. Note that for websocket protocols, it is based on HTTP protocols with an additional upgrade included in header to indicate it is for websockets. (NOT CLEAR, NEED RE_WRITE)

Client Request-Header Fields

  • Authentization: credentials consists of credentials containing the authentication information of the user agent.
  • cookie: contains a name/value pair of information stored for that URI. For example: “Cookie: ="
  • Host: to specify the internet host and port of the resources being requested. By default is port 80. For example, a request line is “GET /pub/www HTTP/1.1” and host-header is “HOST: www.w3.org”. It is a request on the origin server for “www.w3.org/pub/www”.

Server Response-Header Fields

  • Via: be used by gateways and proxies to indicate the intermediate protocols and recipicents. For example, “Via: 1.0 fred 1.1 nowhere.com (Apache/1.1)”. It is a request message send from an HTTP/1.0 user agent to an interal proxy named “fred”, which uses a HTTP/1.1 to forward the request to a public proxy “nowhere.com”, which completes a request by forwarding it to the origin server “www.ics.uci.edu”. The request is received by “www.ics.uci.edu” have the above “Via” header field. It has the general syntax “Via: [ "/" ] [ ":" ]", where "" defines the public proxy URL and port.

Entity-Header Fields

It defineds meta information about the entity body or if no body is present, it is about the resource identified by the request.

Below are some important entity-headers:

  • Allow: list the set of methods supported by the resource identified by the URI. e.g., “Allow: GET, PUT”
  • Content-Type
  • Content-Encoding
  • Content-Language
  • Content-Location: supply the source location for the entity enclosed in the message when entity is accessible from a location separated from the requested resources URI. (NEED CLARIFICATION, for example, in db)

Message Body

The message body part is optionally for an HTTP message. But if it is available, then it is used to carry the entity-body associated with the request or response. If entity-body is associated, usually “Content-Type” and “Content-Length” headers lines specify the nature of the body associated. A message body is the one which carries the actual HTTP request data (including form, image data etc), and HTTP response data from the server (including files, images etc).

An example of a message body. It is a file of “Content-type: text/html”:

<html>
    <body>
        <h1> Hello!</h1>
    </body>
</html>

HTTP Security

An HTTP usually experence the following security threats:

  • proxies and caching: proxies have access to security-related information such as personal information, so proxy operators should protect the systems on which proxies run. Second, cache contents should be protected as sensitive information may be included.
  • Authentication: exisiting HTTP clients typically retain authentication information indefinitely. For example, HTTP/1.1 doesn’t provide a method for server to direct clients to discard these cached credentials.
  • check location headers and spoofings: if a single server supports multiple organizations that don’t trust each other, then it must check the values of location and content location header in the response.

General suggestions: Since HTTP clients are often privy to large amount of personal information.

  1. all credential information must be sorted at the server in encrypted form.
    • revealing the specific software version of the server will allow the server machine to become more vulunerable to attacks against software that is known to certain security holes.
    • proxies that serves as a portal through a network firewall shall take special precautions regarding the transfer of header information that identifies the hosts behind the firewall.

MIME type (Media type)

A media type (formerly known as MIME type)[1] is a two-part identifier for file formats and format contents transmitted on the Internet. e.g., image/gif, text/csv

Long polling

  • The HTTP long polling mechanism can be applied to either persistent or non-persistent HTTP connections. The use of persistent HTTP connections will avoid the additional overhead of establishing a new TCP/IP connection.
Latest Articles

Bash

PATHWhat is PATHWhen you wish to install programs into other locations on your host, but be able to execute them easily without specifying their exact location. This can be easily done by adding a ...…

Read More
Eariler Articles

DTO

DTO stands for “Data Transfer Objects”. It is a design pattern for a representation of data, it can be formatted as JSON, XML, etc. It differs from JSON as JSON is a type of serialization and DTO i...…

Read More