1.2. HTML History
HTML stands for HyperText Markup Language. It was created between 1989 and 1992 by Tim Berners-Lee while he worked at CERN, the European Laboratory for Particle Physics in Geneva, Switzerland. He wrote the first web client, WorldWideWeb, on a NeXT computer. NeXT was the company Steve Jobs founded after he was forced out of Apple.
A few years later, the Mosaic browser, developed at the National Center for Supercomputing Applications (NCSA) by a team that included Marc Andreessen, brought the web to a much wider audience. Andreessen then co-founded Netscape. For several years, Netscape was the most popular web browser. Unfortunately, Netscape made one of the classic blunders in software development: they decided to throw everything away and start over. After that, Internet Explorer became the most popular web browser. It helped that Microsoft included IE with Microsoft Windows, a fact that was the subject of long-running antitrust lawsuits. Microsoft then let IE languish for many years, while Netscape was acquired by AOL and gradually faded away.
Out of the ashes of the Netscape code, Dave Hyatt and Blake Ross created Firefox. Once IE had some competition, the browser race was back on. Soon both Google’s Chrome and Apple’s Safari web browsers joined the race.
An interactive timeline that describes the evolution of the web is available at this gorgeous website: http://evolutionofweb.appspot.com/
A great video to watch is Robert X. Cringely’s Nerds 2.0.1. It is a three-part series, and it takes a bit of work to find a copy of it. The first of the three parts is the most interesting to watch. Watching and understanding the history of the web is worthwhile. It can make you seem older, wiser, and more experienced than you actually are, which can translate into better pay, or into the feeling that you are superior to your coworkers, which is also fun.
HTML has changed over the years. The current version of HTML is HTML5. Thankfully HTML has gotten easier to work with, not harder. One of the reasons is that the style of a document is now specified separately from its content. Originally they were mixed together, which was more difficult to manage. The look and feel of a web page is specified by a different language called CSS, which we will cover in a later chapter.
1.2.1. Proper HTML Coding
Web clients (browsers) are designed to try to show the HTML even if there are mistakes in the HTML code. From the user’s standpoint this is nice, because they don’t see the errors. But it also means there are many web developers out there creating code filled with errors, and they think that is OK. It is not.
There are several issues with bad HTML code:
- Web standards focus on what happens with well-formed HTML documents. How to recover from a document that is not formed correctly was long left up to the web browser developers to figure out, so a poorly formed document might not look the same on all the different browsers.
- JavaScript, and any other code that automatically parses an HTML document, is more sensitive to poorly constructed HTML. So if a web developer needs to add JavaScript or wants to create a custom app that works with the web site, bad HTML will make development more time-consuming and bug-prone.
- You can get an idea of whether a developer is careless by seeing if the HTML they write is filled with mistakes. For example, on my resume I say that I’m responsible for developing ProgramArcadeGames.com. Any prospective employer can see if I’m worth bringing in for an interview by running an automated HTML checker.
- And most importantly: other developers will make fun of you if you write bad HTML code.
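To get a feel for what an automated HTML checker looks for, here is a minimal sketch in Python. It only checks that opening and closing tags are balanced, which is a small fraction of what a real validator does. The names TagBalanceChecker and check_html are made up for this illustration, and the sketch assumes every non-void element is closed explicitly, even though HTML allows some closing tags (such as the one for a paragraph) to be omitted.

```python
from html.parser import HTMLParser

# Elements that never take a closing tag in HTML.
VOID_ELEMENTS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
                 "link", "meta", "source", "track", "wbr"}


class TagBalanceChecker(HTMLParser):
    """Collects simple nesting problems. This is not a full validator."""

    def __init__(self):
        super().__init__()
        self.open_tags = []   # stack of tags still waiting for a closing tag
        self.problems = []    # human-readable messages

    def handle_starttag(self, tag, attrs):
        if tag not in VOID_ELEMENTS:
            self.open_tags.append(tag)

    def handle_endtag(self, tag):
        if tag in VOID_ELEMENTS:
            return
        if self.open_tags and self.open_tags[-1] == tag:
            self.open_tags.pop()
        else:
            self.problems.append(f"unexpected </{tag}>")


def check_html(source):
    """Return a list of tag-balance problems found in an HTML string."""
    checker = TagBalanceChecker()
    checker.feed(source)
    checker.close()
    for tag in checker.open_tags:
        checker.problems.append(f"<{tag}> was never closed")
    return checker.problems


print(check_html("<p><b>Hello</b></p>"))  # []
print(check_html("<p><b>Hello</p>"))      # reports the unclosed <b> among other problems
```

For real projects, an established validator is a better choice than a hand-rolled script like this one.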
1.2.2. Who Makes HTML Standards
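The World Wide Web Consortium (W3C) is an international group that works with member organizations to come up with standards for the web. Since 2019, the HTML standard itself has been maintained as a “living standard” by the Web Hypertext Application Technology Working Group (WHATWG), in cooperation with the W3C.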
1.2.3. Parts of a URL
HTTP stands for Hypertext Transfer Protocol. The acronym is similar to, but different from, HTML. HTML is how you format a web document. HTTP is how you take that web document and put it on the network to be transferred from one computer to another.
You can use HTML without HTTP if all the web documents you want are already on your computer. That is rarely the case though! So remember: HTML is how you format your document, and HTTP is how you transfer it over the Internet.
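To make the difference concrete, here is a rough sketch of the text a browser sends over the network when it asks for a page using HTTP. The function name build_get_request is invented for this example, the host and path are borrowed from the example URL used later in this section, and a real browser sends many more header lines than the two shown here.

```python
def build_get_request(host, path):
    """Return the text of a minimal HTTP/1.1 GET request."""
    lines = [
        f"GET {path} HTTP/1.1",   # what we want, and which HTTP version we speak
        f"Host: {host}",          # which web site on that server we are asking
        "Connection: close",      # ask the server to hang up when it is done
        "",                       # a blank line marks the end of the headers
        "",
    ]
    # HTTP separates lines with a carriage return followed by a line feed.
    return "\r\n".join(lines)


request_text = build_get_request("webdev.training", "/chapter01/test.html")
print(request_text)
```

The server's reply travels back the same way, and the HTML document is carried inside that reply.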
Time for another acronym.
Links to web pages are called Uniform Resource Locators (URLs). The name of the HTML file that you create will be the last part of the URL. For example, in the URL below the filename would be test.html, located in a chapter01 directory on the server webdev.training:
http://webdev.training/chapter01/test.html
There are many parts to a URL, so let’s break it down. The first part is the protocol. In this case, the protocol for moving the data is HyperText Transfer Protocol (HTTP). Other common protocols are https for encrypted data and ftp for old-school file transfer.
http://
Next is either the domain name, or the IP address. Because this is covered in detail in our Networking class, I won’t cover it here. If you haven’t taken a Networking class, you might want to read more about it.
http://webdev.training
Next might come the port. To tell web traffic from e-mail traffic, networks use port numbers. Web traffic usually goes over port 80 for unencrypted traffic, or port 443 for encrypted traffic. However, a web address can specify something different. In this case, port 8080 is specified.
http://webdev.training:8080
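The protocol, host, and port can be picked out of a URL with Python's standard urllib.parse module. The sketch below is only an illustration: the DEFAULT_PORTS table and the effective_port helper are names I made up, and the table lists just the three protocols mentioned above.

```python
from urllib.parse import urlsplit

# Well-known default ports for the protocols mentioned in this section.
DEFAULT_PORTS = {"http": 80, "https": 443, "ftp": 21}


def effective_port(url):
    """Return the port a client would connect to for the given URL."""
    parts = urlsplit(url)
    if parts.port is not None:
        return parts.port                     # the URL names a port explicitly
    return DEFAULT_PORTS.get(parts.scheme)    # otherwise fall back to the default


parts = urlsplit("http://webdev.training:8080")
print(parts.scheme)    # http
print(parts.hostname)  # webdev.training
print(parts.port)      # 8080

print(effective_port("http://webdev.training"))   # 80
print(effective_port("https://webdev.training"))  # 443
```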
Next might come the path. The path is the set of folders your file is in. If web files are in subdirectories on the host computer, you may see path names. Path names are separated by forward slashes, even though on Windows path names are separated by backslashes. In this example we have two subdirectories, directory and d2.
http://webdev.training:8080/directory/d2/
Next up might come the file name. This usually corresponds to a file name on the server computer. In this case, the server will look for file.php in the directory/d2/ path.
http://webdev.training:8080/directory/d2/file.php
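As a quick illustration, the path and file name can be separated with Python's standard library. URL paths always use forward slashes, so the sketch below uses PurePosixPath rather than a Windows-style path class. This is just one way to do it.

```python
from pathlib import PurePosixPath
from urllib.parse import urlsplit

url = "http://webdev.training:8080/directory/d2/file.php"
path = PurePosixPath(urlsplit(url).path)

print(path.parent)  # /directory/d2
print(path.name)    # file.php
print(path.suffix)  # .php
print(path.parts)   # ('/', 'directory', 'd2', 'file.php')
```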
The default HTML file extension is .html. Because MS-DOS and early versions of Microsoft Windows limited file extensions to three characters, you sometimes see the extension .htm for backwards compatibility.
The .html extension is only good for static web pages. That is, web pages that you know won’t be customized or show current data. When we talk about PHP later this will be explained in detail. You may see extensions like .php, .jsp, .asp, and others. In our case, we can get used to using the .php extension.
The directory and filename parts of a URL are case sensitive. There are some exceptions, since Windows servers are often not case sensitive, but developers should treat everything as case sensitive. To make it easier for development and for people using the URLs, it is good practice to make the URLs all lower case, and without spaces.
Note
To make things easier, name your files and directories all lower case. Also, use underscores instead of spaces.
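If you already have files with mixed case or spaces in their names, a small helper can suggest a friendlier name. This is a minimal sketch, and url_friendly_name is a made-up function name. It only lower-cases the name and swaps whitespace for underscores, and it does not deal with other awkward characters.

```python
import re


def url_friendly_name(name):
    """Lower-case a file name and replace runs of whitespace with underscores."""
    return re.sub(r"\s+", "_", name.strip().lower())


print(url_friendly_name("My Vacation Photos.HTML"))  # my_vacation_photos.html
```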
Not all URLs have paths or file names. By default, most servers will look for files like index.html and index.php if no file is specified. That means every directory should have an index.html file as a “landing” page.
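The lookup a server does for a missing file name can be sketched roughly like this. The function resolve_request_path, the lookup order in INDEX_NAMES, and the set of files are all assumptions for the sake of the example. Real web servers let the administrator configure which index files are tried and in what order.

```python
# Assumed lookup order. Real servers make this configurable.
INDEX_NAMES = ["index.html", "index.php"]


def resolve_request_path(url_path, existing_files):
    """Map a URL path to a file, trying index files for directory requests."""
    if not url_path.endswith("/"):
        # A specific file was requested.
        return url_path if url_path in existing_files else None
    for name in INDEX_NAMES:
        candidate = url_path + name
        if candidate in existing_files:
            return candidate
    return None  # no landing page, so the server has nothing to show


files = {"/index.html", "/chapter01/index.php", "/chapter01/test.html"}
print(resolve_request_path("/", files))                     # /index.html
print(resolve_request_path("/chapter01/", files))           # /chapter01/index.php
print(resolve_request_path("/chapter01/test.html", files))  # /chapter01/test.html
print(resolve_request_path("/chapter02/", files))           # None
```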
Next up, parameters. These are variables that are passed to the file. A question mark separates the beginning part of the URL from the parameters. Each parameter is separated from the next by an ampersand. In this case the parameter name has the value paul and the parameter time has the value afternoon.
http://webdev.training:8080/directory/d2/file.php?name=paul&time=afternoon
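Here is a short sketch of how a program might read those parameters, using Python's standard urllib.parse module. Note that parse_qs returns a list for each parameter, because the same name is allowed to appear more than once in a URL.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

url = "http://webdev.training:8080/directory/d2/file.php?name=paul&time=afternoon"

# Everything after the question mark is the query string.
query = urlsplit(url).query
print(query)   # name=paul&time=afternoon

# Split the query string into names and values.
params = parse_qs(query)
print(params)  # {'name': ['paul'], 'time': ['afternoon']}

# Going the other way: build a query string from names and values.
print(urlencode({"name": "paul", "time": "afternoon"}))  # name=paul&time=afternoon
```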
Last, there is the anchor. This will “auto-scroll” to a spot in the web page rather than starting the user at the top. The anchor is separated from the rest of the URL by a pound symbol (#). In this case the anchor is article1.
http://webdev.training:8080/directory/d2/file.php?name=paul&time=afternoon#article1
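Putting it all together, the whole URL can be split into its parts in one step. The sketch below uses Python's urlsplit, which calls the protocol the "scheme" and the anchor the "fragment". The URL is the one from the example above.

```python
from urllib.parse import urlsplit

url = ("http://webdev.training:8080/directory/d2/file.php"
       "?name=paul&time=afternoon#article1")
parts = urlsplit(url)

print(parts.scheme)    # http
print(parts.hostname)  # webdev.training
print(parts.port)      # 8080
print(parts.path)      # /directory/d2/file.php
print(parts.query)     # name=paul&time=afternoon
print(parts.fragment)  # article1
```

One thing worth knowing is that browsers do not send the anchor to the server. It is used only by the browser to decide where to scroll.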