While the human element may be the most critical aspect of Web-based communication, effective Web design is also extremely dependent on correct technical execution. If a site is poorly constructed or error-ridden, visitors may lose sight of its message or function. To excel at Web design, practitioners should have a complete understanding of the elements of the Web medium.
The Web medium is composed of three major components: client, server, and network. We will briefly survey each component and its subcomponents here in order to provide designers with a complete vocabulary of modern Web technology and, possibly, to provoke further study. We will also provide links about the activities of the various standards bodies, particularly the World Wide Web Consortium (W3C), which defines Web technologies, and the IETF, which sets many of the network-related protocols. Later chapters will focus on correct site execution and the effects of Web technologies on design decisions.
Core Web Technologies
The Web is implemented as a client-server system over a vast public network called the Internet. The three components of any client-server system are the client side, the server side, and the network. A visualization of the basic components that make up the Web is shown in Figure 3-1. We will now survey each of the primary components in turn, starting with the client side, which is primarily defined by the browser.
The Web browser is the interpreter of our Web sites. It is very important to understand the Web browser being supported and what capabilities it has. The two most common browsers at the time of this book’s publication are Microsoft’s Internet Explorer (which accounts for the majority of browser users) and Netscape’s Communicator (Navigator). While these two browsers account for most users accessing public Web sites, there are numerous other versions of browsers in use.
The problem with published browser usage reports is that they don’t necessarily reflect your browsing audience. Consider a site that publishes Macintosh software—its browser usage pattern might actually show a fair number of users with OmniWeb, a Macintosh-specific browser that has a notable number of rabid followers. However, most sites probably wouldn’t consider OmniWeb something to even think about. Depending on your users, the types of browsers will vary. From statistics showing that surveyed sites favor a particular browser, it does not necessarily follow that your site will exhibit the same browser usage patterns—though it is pretty likely. Look at your own log files to determine browser usage patterns. If you are building an intranet site, you might not even have to look at your logs to understand what browsers are in use.
Rule: Beware of relying on published browser usage figures; track actual browser usage on your site.
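As a rough illustration of tracking your own usage, the following Python sketch tallies browser families from access-log lines. The sample log lines, the User-Agent strings, and the classification rules are all hypothetical simplifications; a real log will contain far more variety and needs more careful matching.

```python
import re
from collections import Counter

# Hypothetical access-log lines in the common "combined" format; the last
# quoted field on each line is the User-Agent string sent by the browser.
SAMPLE_LOG = [
    '10.0.0.1 - - [12/Mar/2002:10:01:00 -0800] "GET / HTTP/1.0" 200 5120 "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0)"',
    '10.0.0.2 - - [12/Mar/2002:10:02:10 -0800] "GET / HTTP/1.0" 200 5120 "-" "Mozilla/4.0 (compatible; MSIE 5.5; Windows 98)"',
    '10.0.0.3 - - [12/Mar/2002:10:03:45 -0800] "GET / HTTP/1.0" 200 5120 "-" "Mozilla/4.78 [en] (Windows NT 5.0; U)"',
    '10.0.0.4 - - [12/Mar/2002:10:04:30 -0800] "GET / HTTP/1.0" 200 5120 "-" "Mozilla/4.5 (compatible; OmniWeb/4.1; Mac_PowerPC)"',
]

# Matches the last double-quoted field on a line.
USER_AGENT = re.compile(r'"([^"]*)"\s*$')


def classify(user_agent):
    """Map a User-Agent string to a coarse browser family (simplified rules)."""
    if "OmniWeb" in user_agent:
        return "OmniWeb"
    if "MSIE" in user_agent:
        return "Internet Explorer"
    if user_agent.startswith("Mozilla/4"):
        return "Netscape 4.x"
    return "Other"


def browser_counts(lines):
    """Count hits per browser family across the given log lines."""
    counts = Counter()
    for line in lines:
        match = USER_AGENT.search(line)
        if match:
            counts[classify(match.group(1))] += 1
    return counts


for family, hits in browser_counts(SAMPLE_LOG).most_common():
    print(f"{family}: {hits}")
```

The order of the tests in classify matters: many browsers identify themselves as "Mozilla" for compatibility, so the more specific tokens have to be checked first.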
Given a mix of browsers made up of the top two vendors with a smattering of other browsers, the question becomes how this information relates to site design and technology use. One possibility is to look at the various browsers and their capabilities, and then design for some common set of features.
The only problem with moving to the next generation is that the gap between what different generations of browsers support can be rather large. Because of this, sites (and users) significantly favor Internet Explorer over Netscape. (The installed base for IE browsers includes between 85% and 90% of all users at the time of this writing.) With the advent of Netscape’s Mozilla-based browsers (Netscape 6 and 7, and Mozilla 1.0), things may get more interesting because these browsers promise more support for standards-based Web page development than Netscape’s 4.x generation browsers. Even so, there will not be an overnight adoption of new, non-IE browsers around the Web. The larger the installed base, the longer it will take for consumers to embrace new technologies. Therefore, public sites should consider developing for at least one, if not two, generations prior to the current release of a browser. Even more than six years after the release of the 2.x generation browsers, some public sites still support that generation of browsers perfectly.
Tip: Consider developing for at least the last two, if not three, versions of a browser to account for slow upgrades.
It is easy to be overwhelmed with potential browser considerations, even if dealing just with the major browsers’ most recent versions. At the time of this writing, there were more than 20 major versions of the 4.x generation alone and more than 400 other different potential Netscape variations (primarily older versions or beta releases) floating around the Web, all with different capabilities and bugs. Of course, Netscape isn’t the only browser vendor, and there are slight upgrades made to Internet Explorer as well. The only point to make here is that browsers are moving targets. Every release has new features and different bugs. Just because someone is using a 4.x generation browser doesn’t guarantee a site will work the same under the same version on another platform or under an interim release. Sorry, but Netscape 4 or Internet Explorer 4 on Windows won’t work the same on Macintosh and NT. Even different interim releases like 4.03 and 4.5 may have significant differences in page rendering and bugs. Add in the continual use of half-done beta browsers, and you have a recipe for disaster. Pages often won’t render correctly, and errors will ensue. Users unfortunately won’t always place blame correctly. A small layout problem may be interpreted as the designer screwing up, not the browser vendor releasing a poorly tested product.
Rule: Users often don’t blame browsers for simple errors—they blame sites.
So what’s a developer to do? First, make sure you know what’s going on. Keep up with the latest news in browsers at sites like http://www.upsdell.com/BrowserNews/. In particular, watch out for beta and interim releases. They are often the most dangerous, and users will not consider a 6.1 and 6.2 to be significantly different.
Tip: Be careful of features in beta and interim releases of browsers.
The next thing to consider is exactly what browsers you need to be aware of. This requires that you know the browsers used by the site’s audience, so look to your log files. In general, public sites should be as browser agnostic as possible, while private sites like intranets may be designed specifically for a single browser. Designers should be aware of the browser families listed in Table 3-2. Users interested in development for non-PC platforms may also find Palm (http://www.palmos.com/dev/), television (http://www.developers.aoltv.com/ and http://developer.msntv.com/), and cell phone simulators (http://developer.openwave.com/) very useful tools for testing sites.
Given the number of browsers available and the significant difficulties involved in testing dozens of different configurations just to ensure a site renders under common viewing environments, some authors decide to write for a particular browser version or indicate that a particular vendor’s browser is the preferred viewing platform. Many sites that do this exhibit a browser badge on the site. If a particular browser is required, do not blatantly advertise it on the home page as many sites do. It simply announces that you practice exclusionary development.
The foundation of any Web page is markup. Markup technologies such as HTML, XHTML, and XML define the structure and possible meaning of page content. Despite the common belief that markup languages define the look of Web pages, and the equally common use of HTML in this manner, page appearance should really be accomplished using other technologies, particularly style sheets.
HTML (HyperText Markup Language) is the primary markup technology used in Web pages. Traditional HTML is defined by an SGML (Standard Generalized Markup Language) DTD (Document Type Definition; see the upcoming section “XML”) and comes in three primary versions (HTML 2, HTML 3.2, and HTML 4). HTML 4 comes in three varieties: transitional, strict, and frameset, with most document authors using the transitional variant. HTML 4.01 is the most current and final version of HTML.
While the various tags and rules of HTML are fairly well defined, most browser vendors provide extensions to the language beyond the W3C definition. Further, the browsers themselves do little to enforce the markup language rules, leading to sloppy usage of the technology. Also, while HTML should be used primarily for structuring a document, many developers use it to format the document for display as well. HTML’s formatting duties should eventually be completely supplanted by Cascading Style Sheets (CSS). However, even with adequate style sheet support in browsers, many developers continue to use HTML tables and even proprietary HTML tags in their page design. There are no plans for further development of HTML by the W3C and browser vendors, and developers are encouraged to embrace XHTML.
The HTML 4.0 specification is available at the following URL:
XHTML is a reformulation of HTML using XML (Extensible Markup Language) rather than SGML. XHTML solves two primary problems with HTML. First, XHTML continues to force designers to separate the look of the document from its structure, by putting more emphasis on the use of style sheets. Second, XHTML brings much stricter enforcement of markup rules to Web pages. For example, XHTML documents must contain only lowercase tags, always have quotes on attributes, and basically follow all the rules as defined in the specification. Figure 3-3 shows an example document in HTML and its equivalent in XHTML.
A rigorous discussion of HTML and XHTML that covers all the requirements of XHTML can be found in Appendix C as well as in the companion book, HTML: The Complete Reference (www.htmlref.com).
XHTML’s syntactical strictness is both its biggest benefit and biggest weakness. Well-formed pages may be easier to manipulate and exchange by a program but are harder to create for a human. Uptake of XHTML has been slow because of this strictness. XHTML’s extra rigor makes it less accessible than HTML, which is much more forgiving to beginners. So, until more tools that generate correct XHTML become available, the language will probably continue its slow uptake in the Web community at large.
The following URLs provide important information about XHTML:
- XHTML 1.0 Specification: http://www.w3.org/TR/xhtml1/
- XHTML Basic Specification: http://www.w3.org/TR/xhtml-basic/
- XHTML 1.1 (Module-based XHTML) Specification: http://www.w3.org/TR/xhtml11/
Extensible Markup Language (XML) is being touted by many as a revolutionary markup technology that will change the face of the Web. Yet, despite the hype, few understand exactly what XML actually is. In short, XML is a form of SGML modified for the Web; thus, it allows developers to define their own markup language. So, if you want to invent YML (Your Markup Language) with XML, you can. To do this, you would define the rules of your invented language by writing a document type definition, or DTD. A DTD defines how a language can be used by indicating what elements can contain what other elements, the values of attributes, and so on. A simple DTD to define a grading language for elementary school children is shown here:
<!-- Grades DTD -->
<!ELEMENT grades (student+)>
<!ELEMENT student (course+)>
<!ATTLIST student
    name  CDATA          #REQUIRED
    sex   (M|F)          #REQUIRED
    level (1|2|3|4|5|6)  #REQUIRED>
<!ELEMENT course EMPTY>
<!ATTLIST course
    title CDATA          #REQUIRED
    grade (PASS|FAIL)    #REQUIRED>
This DTD file named grades.dtd would be referenced by an XML file such as the one shown here:
<?xml version="1.0"?>
<!DOCTYPE grades SYSTEM "grades.dtd">
<!-- the document instance -->
<grades>
  <student name="Thomas" sex="M" level="3">
    <course title="Math" grade="PASS" />
    <course title="English" grade="FAIL" />
  </student>
  <student name="Sylvia" sex="F" level="1">
    <course title="Math" grade="PASS" />
    <course title="Art" grade="PASS" />
  </student>
</grades>
The example would not only be syntactically checked, but we could check the validity of the document against the DTD, a process known as validation. Yet, regardless of correctness, without a defined presentation you will not see much of a result, as shown in Figure 3-4. Presentation will eventually be handled by applying style rules to the XML document using one of the technologies discussed in the next section.
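The difference between a syntax check and validation can be sketched in Python. The standard library parser used here only checks that a document is well formed; it does not validate against a DTD. The rule tables below are therefore transcribed by hand from the grades DTD as an assumption-laden stand-in for a real validating parser, and they cover only the attribute rules, not the element content models. The function name check_grades and the sample documents are illustrative.

```python
import xml.etree.ElementTree as ET

# Rules transcribed by hand from the grades DTD (attribute rules only).
REQUIRED = {
    "student": ("name", "sex", "level"),
    "course": ("title", "grade"),
}
ALLOWED = {
    ("student", "sex"): {"M", "F"},
    ("student", "level"): {"1", "2", "3", "4", "5", "6"},
    ("course", "grade"): {"PASS", "FAIL"},
}


def check_grades(xml_text):
    """Return a list of problems; an empty list means every check passed."""
    try:
        root = ET.fromstring(xml_text)  # fails if the markup is not well formed
    except ET.ParseError as err:
        return [f"not well formed: {err}"]

    problems = []
    if root.tag != "grades":
        problems.append(f"root element is <{root.tag}>, expected <grades>")
    for elem in root.iter():
        for attr in REQUIRED.get(elem.tag, ()):
            if attr not in elem.attrib:
                problems.append(f"<{elem.tag}> is missing required attribute {attr}")
        for attr, value in elem.attrib.items():
            allowed = ALLOWED.get((elem.tag, attr))
            if allowed is not None and value not in allowed:
                problems.append(f"<{elem.tag}> has {attr}={value!r}, which the DTD does not allow")
    return problems


GOOD = ('<grades><student name="Thomas" sex="M" level="3">'
        '<course title="Math" grade="PASS" /></student></grades>')
BAD_VALUE = GOOD.replace('grade="PASS"', 'grade="MAYBE"')   # well formed, not valid
NOT_WELL_FORMED = GOOD.replace("</student>", "")            # unclosed element

for label, doc in [("good", GOOD), ("bad value", BAD_VALUE), ("unclosed tag", NOT_WELL_FORMED)]:
    print(label, "->", check_grades(doc) or "no problems found")
```

The second sample document parses without complaint yet breaks the language's rules, which is exactly the case that validation exists to catch.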
Many readers may now be wondering about the value of developers defining their own individual markup languages. Why not just use XHTML or HTML? Wouldn’t inventing new languages be the equivalent of creating a markup Tower of Babel on the Internet? Maybe, or it just may enable a whole new range of possibilities for markup. So far, the negative impact of inventing too many custom XML-based languages has been limited, and most Web developers are content using a commonly defined language like XHTML, WML (Wireless Markup Language), SVG (Scalable Vector Graphics), and numerous other XML-based languages. The precision and self-description properties of XML documents should enable a new class of Web technologies called Web Services that really could change the Web by allowing sites and programs to talk with each other more easily.
Style Sheet Technologies
Markup languages like HTML do not excel at presentation. This is not a shortcoming of the technology; HTML was simply not designed for this task. In reality, the look of the page should be controlled by the design elements provided by CSS (Cascading Style Sheets). In some cases, particularly when using an XML language, markup transformation may also be required to create the appropriate presentation format, so XSL (Extensible Stylesheet Language) will be used as well.
CSS (Cascading Style Sheets) is used to specify the look of a Web page. This technology has been present at least partially in browsers as old as Internet Explorer 3.0, but it has long been overlooked in favor of HTML-based layout for a variety of reasons, including lack of consistent browser and tool support, as well as simple developer ignorance. With the rise of the 6.x generations of browsers, CSS is finally becoming a viable prospect for page layout.
CSS-based style sheets specify rules that define the presentation of a type of tag (for example, <h1>), a group or, more correctly, class of tags, or a single tag as indicated by its id attribute. Style sheet rules can be used to define a variety of visual aspects of page objects, including color, size, and position. The various style rules can be combined depending on tag usage, thus the “cascading” moniker for the technology. An example of CSS in use is shown in Figure 3-5.
These URLs provide more information about CSS:
- CSS1 Specification: http://www.w3.org/TR/REC-CSS1/
- CSS2 Specification: http://www.w3.org/TR/REC-CSS2/
XSL is another style sheet technology used on the Web. It is primarily used to style XML languages. This is usually accomplished through XSL Transformation (XSLT), which is often used to convert XML markup into other markup, often XHTML or HTML plus CSS. It is possible to also use XSL Formatting Objects to style content, but, so far, this does not seem to be a commonly employed aspect of XSL. Thus, when developers speak of XSL, they often are speaking of XSLT. The relationship is set on the second line in the grades.xml file. The grades.xsl file specifies the transformations that would result in the HTML output.
Information about XSL can be found at these URLs:
- XSL Transformations 1.0 Specification: http://www.w3.org/TR/xslt
- XSL Activity at W3C: http://www.w3.org/Style/XSL/
Most Web browsers support either directly or through extension a variety of image formats, such as GIF, JPEG, Flash, and PNG. The image formats can be separated into two general categories: bitmap (or raster) images and vector images. Raster images describe each individual pixel and its color, while vector images describe an image generally as a collection of mathematical directions used to draw—or more precisely, render—the image. Regardless of storage format, all images become bitmaps onscreen.
Some designers speak of the value of one general format over the other, but, in reality, both have their problems. Vector images tend to be compact in description and can be scaled mathematically, but they suffer in potential rendering time and realism. Bitmap images can be very detailed but do not scale up well and tend to be very large in terms of file size. We will examine the specific image formats in the following sections.
GIF (Graphics Interchange Format) is a bitmap format that does not provide a great degree of compression or color support, being limited to 8-bit or 256 simultaneous colors. However, the GIF format is relatively versatile and supports transparency, animation, and interlacing. It is commonly used in Web pages for logos, graphical navigation elements, and photos that do not require high-quality reproduction.
Information about the GIF Specification can be found at this URL:
JPEG (Joint Photographic Experts Group) images support up to 24-bit color and are well suited for reproduction of photographs. Despite being a raster format, JPEG images allow designers to balance file size with image quality and support an impressive lossy compression algorithm that can significantly shrink image size with little discernable quality loss to the casual viewer. JPEG images do support progressive loading, but are not quite as versatile as GIF images because they lack transparency and animation features.
Information about JPEGs can be found at these URLs:
- JPEG Activity at the W3C: http://www.w3.org/Graphics/JPEG/
- JPEG Specification: http://www.jpeg.org/
The JPEG 2000 standard aims to eliminate many of the problems with JPEG and provide an even greater degree of quality and compression than standard JPEG files. However, so far, JPEG 2000 is not available in Web browsers.
PNG (Portable Network Graphics) images provide an advanced image format designed to replace GIF as the dominant form of graphics on the Web. PNG images provide three primary advantages over GIF: alpha transparency, which provides variable degrees of transparency (versus GIF, which has a single degree of transparency); gamma correction to help improve image brightness across systems; and improved interlacing and compression. While PNG provides numerous benefits, many of its advanced features are not properly implemented in the latest browsers, so the rush to embrace the format has yet to materialize.
Information about Flash can be found at these URLs:
- Macromedia’s Flash Homepage: http://www.flash.com
- SWF File Format Page: http://www.openswf.org
Information about SVG can be found at these URLs:
VML (Vector Markup Language) is yet another vector image format used in Web pages. It is relatively unnoticed by most Web developers, despite the fact that it has been natively supported in Microsoft Internet Explorer since the 5.0 version. It was briefly introduced to the W3C for standardization, but SVG is being pushed over VML, and Flash is currently the popular vector format for the masses. However, Microsoft-oriented developers should be well aware of this format, since it is found in pages exported from Microsoft products.
Information about VML can be found at these URLs:
- W3C VML Note: http://www.w3.org/TR/NOTE-VML
Other Image Formats
The previously discussed image formats are the primary, well-supported image formats on the Web. However, other image formats are supported in some browsers, and, in theory, the <img> tag does not discriminate among the types of images included in a Web page. The most important other format is probably BMP, which is supported by Microsoft’s Internet Explorer. A variant called Wireless BMP (WBMP) is also noteworthy and is supported in some wireless browsers. Many browsers, particularly older browsers or those with a UNIX release, support Xbitmaps. Using plug-ins or helper applications, everything from PostScript files to TIFFs can be viewed in a browser.
Audio technologies on the Internet cover a lot of ground, from traditional download-and-play systems in a variety of formats such as WAV and MP3 to streaming audio, which attempts to play data as it is downloaded over a connection. Surprisingly, the most advanced technologies, and the most popular, may not be the best solution for Web sites. For example, MP3 files, while of high quality, tend to take too long to download, and streaming technologies might not provide reliable playback in all situations because of the unpredictable delivery conditions on the Internet. Fortunately, much has improved since the simple days of adding a WAV or MIDI file for background music, but there is still a long way to go before sounds will become commonplace, primarily because of the large size of audio files.
Audio files can be compressed to reduce the amount of data being sent. The software on the serving side compresses the data, which is decompressed and played back on the receiving end. The compression/decompression software is known together as a codec. Just like image formats, audio compression methods are either lossy or lossless. Typically, audio codecs are lossy because of size considerations.
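The lossless and lossy distinction can be illustrated with a toy Python sketch. It is not how a production audio codec works: the lossless half uses general-purpose zlib compression, and the lossy half simply throws away the low byte of each 16-bit sample. The synthetic 440 Hz tone, the 8,000-samples-per-second rate, and the function names are all assumptions chosen for the example.

```python
import math
import zlib
from array import array

# One second of a 440 Hz tone as signed 16-bit samples (8,000 samples per second).
SAMPLES = array("h", (int(20000 * math.sin(2 * math.pi * 440 * n / 8000))
                      for n in range(8000)))
RAW = SAMPLES.tobytes()


def lossy_encode(samples):
    """Keep only the high byte of each 16-bit sample, halving the data."""
    return bytes((s >> 8) & 0xFF for s in samples)


def lossy_decode(data):
    """Rebuild 16-bit samples; the discarded low byte comes back as zero."""
    return array("h", ((b - 256 if b >= 128 else b) << 8 for b in data))


# Lossless: the receiver gets back exactly what the sender started with.
packed = zlib.compress(RAW)
restored = zlib.decompress(packed)
print("lossless:", len(RAW), "bytes ->", len(packed), "bytes, identical:", restored == RAW)

# Lossy: smaller by construction, but the result only approximates the original.
approx = lossy_decode(lossy_encode(SAMPLES))
worst_error = max(abs(a - b) for a, b in zip(SAMPLES, approx))
print("lossy:", len(RAW), "bytes ->", len(lossy_encode(SAMPLES)), "bytes, worst sample error:", worst_error)
```

Real lossy codecs decide what to discard based on what listeners are unlikely to hear, which is how they can discard much more data than this sketch does with less audible damage.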
The holy grail of Internet multimedia is certainly high-quality, 30-frames-per-second, real-time video. The main challenge to delivering video over the Internet is its extreme size. Digital video is measured by the number of frames per second of video and by the size and resolution of these frames. A 640 × 480 image with 24-bit color and a frame rate of 30 frames per second takes up a staggering 27MB per second, and that’s without sound. Add CD-quality audio (705,600 bits of data for each second of audio; for stereo, double that amount to 1.4 Mbps) and the file size increases proportionately. Granted, these are uncompressed frames and audio, but the point is that a lot of compression as well as bandwidth is needed for high-quality, large-size video.
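The arithmetic behind these figures is easy to reproduce. The short Python sketch below works through it, assuming the usual CD parameters of 44,100 samples per second at 16 bits per sample; the function names are illustrative.

```python
def video_bytes_per_second(width, height, bits_per_pixel, frames_per_second):
    """Uncompressed video data rate in bytes per second."""
    bytes_per_frame = width * height * bits_per_pixel // 8
    return bytes_per_frame * frames_per_second


def audio_bits_per_second(sample_rate, bits_per_sample, channels):
    """Uncompressed audio data rate in bits per second."""
    return sample_rate * bits_per_sample * channels


video = video_bytes_per_second(640, 480, 24, 30)
mono = audio_bits_per_second(44100, 16, 1)
stereo = audio_bits_per_second(44100, 16, 2)

print(f"video:  {video:,} bytes per second")    # 27,648,000, roughly 27MB
print(f"mono:   {mono:,} bits per second")      # 705,600
print(f"stereo: {stereo:,} bits per second")    # 1,411,200, roughly 1.4 Mbps
```

Changing the frame size or frame rate in the first call shows how quickly smaller, slower video brings the raw data rate down, which is why early Web video tends to be small and jerky.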
As with audio, numerous formats are supported for Web-based video, including AVI, QuickTime, MPEG, RealVideo, and ASF. Table 3-5 presents a brief overview of the various Web video formats.
Even with improvements in network and compression technology, audio and video services have a long way to go on the Web if they are to approach the quality and reliability that users are familiar with from radio and television. Until that time, developers should always proceed with caution with real-time media technologies. Further, just because audio and video can be delivered over the Web doesn’t mean that they should be. Always pick the best media format for the message to be delivered and remember that if you have nothing to say, whether it is in Flash or not isn’t going to help. We now switch gears and turn our attention to the programming aspects of the Web medium.
Understanding the basic idea of adding programming to a site isn’t hard, but it’s easy to get overwhelmed by the number of technologies to choose from, particularly if you assume that each is very different. The reality is that Web programming technologies can be placed into two basic groups: client-side and server-side. Client-side technologies are those that run on the client, generally within the context of the browser, though some technologies like Java applets or ActiveX controls may appear to run, or may truly run, beyond the browser, and helper applications do so implicitly. Programs that run on the server instead are appropriately termed server-side programming. Table 3-6 presents the general programming choices available to Web developers.
- CGI scripts and programs
- Server API programs
  - Apache modules
  - ISAPI extensions and filters
  - Java servlets
  - Active Server Pages (ASP/ASP.NET)
  - ColdFusion
  - PHP
One approach to client-side programming comes in the form of programmed solutions, like helper applications. In the early days of the Web, around the time of Mosaic or Netscape 1.x, browsers had limited functionality and support for media beyond HTML. If new media types or binary forms were encountered, they had to be passed to an external program called a “helper application.” Helper applications generally run outside the browser window. An example of a helper application would be a compression or archive tool like WinZip, which would be launched automatically when a compressed file was downloaded from the Web. Helpers are often problematic because they are not well integrated with the browser and lack methods to communicate back to the Web browser. Because the helper was not integrated within the Web browser, external media types and binaries could not be easily embedded within the Web page. Last, helper applications generally had to be downloaded and installed by the user, which kept many people from using them.
The idea of a helper application is rather simple: it is a program that the browser calls upon for help. Any program can be a helper application for a Web browser, assuming that a MIME type can be associated with the helper. When an object is delivered on the Web, HTTP header information is added to the object, indicating its type. This information is in the form of a MIME type. For example, every Acrobat file should have a content-type of application/pdf associated with it. When a browser receives a file with such a MIME type, it will look in its preferences to determine how to handle the file. These options may include saving the file to disk, deleting the file, or handing the file off to another program, such as a helper or browser plug-in. With MIME types and helpers, a developer can put Microsoft Word files on their Web site; users may be able to download them and read them automatically, assuming they have the appropriate helper application.
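The lookup described above can be sketched in a few lines of Python. The preference table, the action strings, and the function name action_for are hypothetical; only mimetypes.guess_type comes from the Python standard library, and it stands in here for the way a server typically derives a content type from a file extension.

```python
import mimetypes

# Hypothetical preference table: MIME type -> what the browser does with it.
HANDLERS = {
    "text/html": "render in the browser window",
    "application/pdf": "hand off to the Acrobat helper",
    "application/msword": "hand off to Microsoft Word",
}
DEFAULT_ACTION = "offer to save the file to disk"


def action_for(content_type):
    """Pick an action from the value of a Content-Type header."""
    # Drop optional parameters such as "; charset=iso-8859-1".
    mime_type = content_type.split(";", 1)[0].strip().lower()
    return HANDLERS.get(mime_type, DEFAULT_ACTION)


# Server side: the type is usually derived from the file extension.
guessed_type, _encoding = mimetypes.guess_type("manual.pdf")
print(guessed_type, "->", action_for(guessed_type))

# Client side: header values as they might arrive with a response.
for header in ("text/html; charset=iso-8859-1", "application/msword", "application/x-unknown"):
    print(header, "->", action_for(header))
```

Falling back to a save-to-disk action for unknown types mirrors the behavior described above, and it shows why a server that sends the wrong content type can leave users with a download prompt instead of a document.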
Oddly, helper applications are not used as much as they could be. Consider, for example, the use of HTML on an intranet. Within an organization, data may often be created in Microsoft Word or Excel format. While it is possible to easily translate such information into HTML, why would one want to? HTML is relatively expensive to create, often difficult to update, and may limit the quality of the document’s presentation. The main reason that documents are put in HTML is that they can be read ubiquitously, meaning we don’t have to rely on users having a particular application, other than a Web browser, to read our document. However, in an intranet, this probably isn’t an issue. In fact, it might be easier to create helper mappings on every system within a corporation rather than to reformat documents in HTML.
The plug-in approach of extending a browser’s feature set has its drawbacks. Users must locate and download plug-ins, install them, and even restart their browsers. Many users find this rather complicated. Netscape 4 offers some installation relief with self-installing (somewhat) plug-ins and other features, but plug-ins remain troublesome. To further combat this problem, many of the most commonly requested plug-ins, such as Macromedia’s Flash, are included as a standard feature with Netscape browsers. The standard plug-ins are primarily geared towards media handling and include Macromedia Flash and Shockwave, Adobe Acrobat, and RealPlayer (audio and video). If plug-ins are used, make sure to focus on the popular ones first, given the installation hassle you’ll put the user through.
Suggestion: Focus on using only the more popular plug-in technologies unless automatic installation can be performed.
Even if installation were not such a problem, plug-ins are not available on every machine. An executable program, or binary, must be created for each particular operating system; thus, most plug-ins work on Windows systems, though a few of the more popular ones have versions that work on Macintosh and UNIX systems as well.
The main benefit of plug-ins is that they can be well integrated into Web pages. They may be included by using the <embed> or <object> tags, though <embed> is nearly always favored. For example, to embed a short Flash movie called welcome.swf that can be viewed by a Flash player plug-in, you would use the following HTML fragment:
<embed src="welcome.swf" quality="high" type="application/x-shockwave-flash" scale="exactfit" width="406" height="59" bgcolor="#FFFF00"> </embed>
The <embed> element displays the plug-in (in this case, a Flash animation) as part of the HTML document. Of course, always remember that the main downside of plug-ins is the barrier to entry they create because of installation and system requirements. If installation can be improved, designers will be able to rely on the technologies provided more and more.
ActiveX (http://www.microsoft.com/activex), which is the Internet portion of the Component Object Model (COM), is Microsoft’s component technology for creating small components, or controls, within a Web page. ActiveX distributes these controls via the Internet, adding new functionality to Internet Explorer. Microsoft maintains that ActiveX controls are more similar to generalized components than to plug-ins because ActiveX controls can reside beyond the browser, even within container programs such as Microsoft Office. ActiveX controls are similar to Netscape plug-ins in that they are persistent and machine-specific. Although this makes resource use a problem, installation is not an issue: the components download and install automatically.
Security is a big concern for ActiveX controls. Because these small pieces of code potentially have full access to a user’s system, they could cause serious damage. This capability, combined with automatic installation, creates a serious problem with ActiveX. End users may be quick to click a button to install new functionality, only to have it do something malicious, like erase an important system file. The potentially unrestricted functionality of ActiveX controls creates a gaping security hole.
Certificates only provide some indication that the control creator is reputable; they do nothing to prevent a control from actually doing something malicious—that’s up to the user to prevent. Safe Web browsing should be practiced by accepting controls only from reputable sources.
Adding an ActiveX control to a Web page requires the use of the <object> tag. For example, this markup is used to add a Flash file to a page.
<object classid="clsid:D27CDB6E-AE6D-11cf-96B8-444553540000"
        codebase="http://download.macromedia.com/pub/shockwave/cabs/flash/swflash.cab#version=5,0,0,0"
        width="406" height="59">
  <param name="movie" value="welcome.swf" />
  <param name="quality" value="high" />
  <param name="scale" value="exactfit" />
  <strong>Sorry, no ActiveX in this browser!</strong>
</object>
What appears in a browser with no ActiveX? Just a short message indicating the user doesn’t have ActiveX. The reality is that the page should allow alternative technologies, such as plug-ins using the <embed> tag or even images, before giving a failure message.
Suggestion: If ActiveX controls are used on a public site, make sure to provide alternatives for Netscape or other browsers.
The main downside of component technologies like Netscape plug-ins and Microsoft ActiveX controls is that they are fairly operating system specific. Not every user runs on Windows or even Macintosh, so how do you deal with such a heterogeneous world? One solution is to create a common environment and port it to all systems—this is the intent of Java.
Sun Microsystems’ Java technology (http://www.javasoft.com) is an attractive, revolutionary approach to cross-platform, Internet-based development. Java promises a platform-neutral development language, somewhat similar in syntax to C++, that allows programs to be written once and deployed on any machine, browser, or operating system that supports the Java virtual machine (JVM). Web pages use small Java programs, called applets, that are downloaded and run directly within a browser to provide new functionality.
Applets are written in the Java language and compiled to a machine-independent byte code in the form of a .class file, which is downloaded automatically to the Java-capable browser and run within the browser environment. But even with a fast processor, the end system may appear to run the byte code slowly compared to a natively compiled application because the byte code must be interpreted by the JVM. This leads to the common perception that Java is slow. The reality is that Java isn’t necessarily slow, but its interpretation can be. Even with recent Just-In-Time (JIT) compilers in newer browsers, Java often doesn’t deliver performance equal to natively compiled applications.
Rule: Consider end-user system performance carefully when using Java.
Even if compilation weren’t an issue, current Java applets generally aren’t persistent; they may have to be downloaded again and again. Java-enabled browsers act like thin-client applications because they add code only when they need it. In this sense, the browser doesn’t become bloated with added features, but expands and contracts upon use.
Adding a Java applet to a Web page is relatively easy and can be done using the <applet> or <object> tag, though <applet> is preferred for backward compatibility. If, for example, we had a .class file called helloworld, we might reference it with the following markup:
<applet code="helloworld.class" height="50" width="175"> <h1>Hello World for you non-Java-aware browsers</h1> </applet>
In the preceding code, between <applet> and </applet> is an alternative rendering for browsers that do not support Java or that have Java support disabled.
Security in Java has been a serious concern from the outset. Because programs are downloaded and run automatically, a malicious program could be downloaded and run without the user being able to stop it. Under the first implementation of the technology, Java applets had little access to resources outside the browser’s environment. Within Web pages, applets can’t write to local disks or perform other potentially harmful functions. This framework has been referred to as the Java sandbox. Developers who want to provide Java functions outside of the sandbox must write Java applications that run as separate applications from browsers. Other Internet programming technologies (Netscape plug-ins and ActiveX) provide less safety from damaging programs.
The reality of Java, as far as a Web designer is concerned, is that it really isn’t useful on public sites. There are so many different Java Virtual Machines in browsers that the idea of “write once, run everywhere” has been turned into “write once, debug everywhere.” The major benefit of Java applets just isn’t there. Designers should need no proof other than the fact that major sites that relied on Java applets have in most cases long since removed them. However, within intranets or on the server side in the form of Java servlets, we have seen Java achieve significant success.
</script> <h1>Welcome back to HTML</h1>
- ECMAScript Spec: http://www.ecma.ch/ecma1/STAND/ECMA-262.HTM
- Microsoft Scripting Information: http://msdn.microsoft.com/scripting
Document Object Model
Online information about the DOM can be found at these URLs:
The Web server handles the server side of the Web communications medium, responding to the various HTTP requests made to it. Servers may directly return various file objects, such as HTML documents, images, multimedia files, scripts, or style sheets, or they may run executable programs, which return a similar result. In this sense, the Web server acts both as a file server and as an application server. We will survey the basic components of the server side here before addressing the network components of the medium.
Like the Web browser, the Web server frames the environment of each Web transaction. The term “Web server” is usually understood to mean both the hardware and software. The major issue with hardware is whether the Web server is capable of handling the memory, disk, and network input/output requirements resulting from site traffic. The interplay between the operating system, such as UNIX or Windows 2000, and the Web server software also closely affects both performance and security.
From Apache to Zeus, all Web server software platforms handle basic HTTP transactions, but all tend to offer more than basic file serving facilities. Most Web server platforms provide basic security and authentication services, logging, and programming facilities. An in-depth discussion of the popular servers and their facilities is presented in Chapter 17; here, we will focus only on the programming aspects of site
The oldest of the server-side programming technologies, CGI (Common Gateway Interface) programs can be written in nearly any programming language, though commonly Perl is associated with CGI applications. CGI is not a language or program, but in fact just a way to program—unlike other server-side programming environments, which define both language and style. CGI defines the basic input and output methods for server-side programs launched by a Web server, as illustrated in Figure 3-12. While assumed by some to be slow and insecure, CGI is adequate for many Web development projects when correctly understood and used.
Online information about CGI can be found at these URLs:
- CGI Overview and Documentation: http://hoohoo.ncsa.uiuc.edu/cgi/overview.html
- CGI Resource Index: http://cgi.resourceindex.com/
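The input and output conventions described above can be sketched in a few lines. The following Python example is only an illustration under stated assumptions: the function name handle_request and the form field name are hypothetical. It reads the query string that a Web server passes in the QUERY_STRING environment variable and writes a header block, a blank line, and an HTML body to standard output, which is the general shape of any CGI response.

```python
import os
import sys
from html import escape
from urllib.parse import parse_qs


def handle_request(environ):
    """Build a CGI-style response (headers, blank line, body)."""
    # The Web server places the URL's query string in QUERY_STRING.
    query = parse_qs(environ.get("QUERY_STRING", ""))
    # "name" is a hypothetical form field used only for illustration.
    name = query.get("name", ["world"])[0]
    # Escape user input before placing it in markup.
    body = "<h1>Hello, {}!</h1>".format(escape(name))
    # A CGI program emits at least a Content-Type header, then a blank line.
    return "Content-Type: text/html\r\n\r\n" + body


if __name__ == "__main__":
    # When launched by a Web server, the response goes to standard output.
    sys.stdout.write(handle_request(os.environ))
```

Because the program only reads environment variables and writes text, the same approach works in nearly any language, which is the point made above about CGI being a way to program rather than a language.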
Server-side scripting technologies, such as Microsoft’s Active Server Pages (ASP) or Macromedia’s ColdFusion, allow dynamic pages to be created easily. All server-side scripting languages, including the popular ASP, ColdFusion, JSP, and PHP languages, work fairly similarly. The idea is that script templates that contain a combination of HTML and scripting language are executed server side to build a resulting Web page. Usually, some form of server engine intercepts page requests, and when files with certain extensions—such as .asp, .cfm, .jsp, .php, or .shtml—are encountered, the script elements in the page are replaced with the resulting markup output.
Server-side scripting languages are often used to build dynamic pages from databases, personalize content for users, or generate reusable components in pages. The syntax for each language is different, and many developers are somewhat religious about the merits of one language over the next, but the fact of the matter is that none of them scales well for extremely high-volume sites. Such sites usually require server API programs, which are discussed next.
Online information about server-side scripting can be found at these URLs:
- ASP Information: http://msdn.microsoft.com/asp
- ColdFusion Information: http://www.macromedia.com/software/coldfusion/
- PHP Information: http://www.php.net/
- JSP Information: http://java.sun.com/products/jsp
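To make the template idea concrete, here is a minimal Python sketch of what a server-side scripting engine does in principle: it scans a page for script elements and replaces them with generated output. The placeholder syntax is loosely modeled on ASP and JSP expressions, and the render function and PLACEHOLDER pattern are assumptions for illustration, not part of any of the products named above.

```python
import re
from html import escape

# Matches hypothetical <%= name %> placeholders embedded in HTML.
PLACEHOLDER = re.compile(r"<%=\s*(\w+)\s*%>")


def render(template, context):
    """Replace each placeholder with the escaped value from context."""
    def substitute(match):
        # Unknown names render as an empty string in this sketch.
        value = context.get(match.group(1), "")
        return escape(str(value))
    return PLACEHOLDER.sub(substitute, template)


# The surrounding HTML passes through untouched; only the script
# element is replaced with its output.
page = render("<h1>Welcome, <%= user %></h1>", {"user": "Pat"})
```

Real engines evaluate full scripting languages rather than simple name lookups, but the request flow is the same: template in, finished markup out.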
Server API (Application Programming Interface) programs are special server-side programs built to interact closely with the Web server. A simple way to think of server API programs is as plug-ins to a Web server. Common APIs include ISAPI for Microsoft’s IIS server, NSAPI for the Netscape/iPlanet/Sun server, Apache Modules for Apache, and Java servlets for Java-enabled Web servers. The benefit of server API programs is that their close interaction with the Web server generally translates into high performance. The downside, of course, is the complexity of writing such a program and the possibility that an errant server module may actually crash the entire server.
Information about server APIs can be found at these URLs:
- Apache Module Information: http://modules.apache.org/
- ISAPI Filters/Extension Information: http://msdn.microsoft.com
- Java Servlet Information: http://java.sun.com/products/servlet
Network and Related Protocols
The underlying protocols of the Web include the TCP/IP suite of networking protocols. Not a single protocol but a group of protocols, TCP/IP is what makes all services on the Internet possible. Individually, IP (Internet Protocol) provides the basic addressing and routing information necessary to deliver data across the Internet, while TCP (Transmission Control Protocol) provides the facilities that make communications reliable, such as error correction and retransmission. Together with the Domain Name System (DNS), which translates fully qualified domain names like www.webdesignref.com into their underlying IP addresses (22.214.171.124), these protocols make it possible to build higher-level services, such as e-mail or Web sites, on the Internet. Knowledge of lower-level protocols may seem pointless to many Web designers, but it is particularly helpful to understand networking details when designing extremely scalable Web sites. However, regardless of site aims, the next protocol discussed should be understood by every Web designer.
HTTP (Hypertext Transfer Protocol) is the application-level protocol that handles the discussion between a user-agent, generally a Web browser, and a Web server. The protocol is simple and defines eight basic commands (GET, POST, HEAD, PUT, DELETE, OPTIONS, TRACE, and CONNECT) that can be made by a user-agent to request or manipulate data. Responses may contain both numeric and textual codes (for example, 404 Not Found) and associated data.
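The text-based nature of the protocol is easiest to see by assembling a request by hand. The following Python sketch, offered only as an illustration, builds a minimal HTTP 1.1 request and splits a response status line into its numeric and textual parts. The function names build_request and parse_status_line are assumptions made for this example.

```python
def build_request(method, path, host):
    """Return the text of a minimal HTTP/1.1 request."""
    lines = [
        "{} {} HTTP/1.1".format(method, path),  # request line
        "Host: {}".format(host),                # required in HTTP/1.1
        "Connection: close",
        "",                                     # blank line ends the headers
        "",
    ]
    return "\r\n".join(lines)


def parse_status_line(line):
    """Split a status line such as 'HTTP/1.1 404 Not Found' into parts."""
    version, code, reason = line.strip().split(" ", 2)
    return version, int(code), reason


request_text = build_request("GET", "/", "www.webdesignref.com")
status = parse_status_line("HTTP/1.1 404 Not Found")
```

Sending request_text over a TCP connection to port 80 of a Web server would be enough to retrieve a page, which shows how little machinery the protocol demands of a user-agent.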
The simplicity of the HTTP protocol is both a blessing and a curse. It is simple to implement, but its lack of state management and its performance problems plague Web developers. The HTTP 1.1 specification as defined in RFC 2616 addressed many of the performance problems, but state management still has to be resolved using cookies, hidden data variables, or extended URLs. An overview of HTTP can be found in Chapter 17, while Appendix G details its request and response format.
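As a rough sketch of the cookie approach to state management, the following Python example uses the standard library's http.cookies module to produce the header a server would send and to read back the value a browser later returns. The cookie name session, the value abc123, and the two helper function names are hypothetical.

```python
from http.cookies import SimpleCookie


def make_set_cookie_header(name, value):
    """Return a Set-Cookie header line that stores one value in the browser."""
    cookie = SimpleCookie()
    cookie[name] = value
    cookie[name]["path"] = "/"  # send the cookie back for every page on the site
    return cookie.output()


def read_cookie(cookie_header_value, name):
    """Return the named value from a browser's Cookie header, or None."""
    cookie = SimpleCookie()
    cookie.load(cookie_header_value)
    morsel = cookie.get(name)
    return morsel.value if morsel is not None else None


header = make_set_cookie_header("session", "abc123")
returned = read_cookie("session=abc123; theme=dark", "session")
```

Since HTTP itself remembers nothing between requests, the identifier carried in the cookie is what lets a server tie a series of otherwise unrelated requests to one visitor.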
Information about HTTP can be found at these URLs:
- W3C HTTP Activity: http://www.w3.org/Protocols/
- HTTP 1.1 Specification: ftp://ftp.isi.edu/in-notes/rfc2616.tx
MIME (Multipurpose Internet Mail Extensions), the unsung hero of Web protocols, is used by browsers to determine what kind of data they have received from a server. Specifically, an HTTP header called Content-type contains a MIME value, which is looked up by a browser to understand what type of data it is receiving and what to do with it. Servers append MIME types to HTTP headers either by generating them from a program or by mapping a file extension (for example, .html) to an appropriate MIME type (for example, text/html). MIME allows Web sites to deliver any type of data, not just the common Web formats like HTML.
Information about MIME can be found at this URL:
- MIME Specification: http://www.ietf.org/rfc/rfc2045.txt
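The extension-to-type mapping a server performs can be sketched with Python's standard mimetypes module. This is only an illustration of the lookup step; the helper name content_type_for and the choice of application/octet-stream as a fallback for unknown extensions are assumptions, and real servers are configured through their own mapping tables.

```python
import mimetypes


def content_type_for(filename, default="application/octet-stream"):
    """Map a file name to the MIME type a server might send for it."""
    # guess_type returns a (type, encoding) pair based on the extension.
    guessed, _encoding = mimetypes.guess_type(filename)
    # Fall back to a generic binary type when the extension is unknown.
    return guessed or default


html_type = content_type_for("index.html")
gif_type = content_type_for("logo.gif")
```

The returned value is what would appear in the Content-type header of the response, telling the browser how to handle the data that follows.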
To request and link to Web pages, it is necessary to use an addressing scheme. Web users are familiar with URLs (Uniform Resource Locators), like http://www.webdesignref.com/, which specify protocol and location. In specifications, URI (Uniform Resource Identifier) is the more commonly accepted term for short names or address strings that refer to a resource on the Web. Yet, whatever the name, URIs or URLs do not provide all that may be required on the Web in the future, since they specify only location. Uniform Resource Names (URNs) and Uniform Resource Characteristics (URCs) may eventually be implemented to provide non-location-dependent addressing and extra information about resources, respectively. However, resource characteristics are more commonly specified using a form of meta data, as described next.
Online information about addressing can be found at this URL:
- W3C Addressing Activity: http://www.w3.org/Addressing/
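To show how a URL breaks down into protocol and location, the following Python sketch splits an address into its parts using the standard urllib.parse module. The path, query string, and fragment in the sample address are invented for illustration, as is the helper name describe_url.

```python
from urllib.parse import urlsplit


def describe_url(url):
    """Return the main components of a URL as a dictionary."""
    parts = urlsplit(url)
    return {
        "protocol": parts.scheme,   # how to fetch the resource
        "host": parts.netloc,       # which server holds it
        "path": parts.path,         # where it lives on that server
        "query": parts.query,       # extra data passed to the server
        "fragment": parts.fragment, # a location within the document
    }


# The path, query, and fragment below are hypothetical.
info = describe_url("http://www.webdesignref.com/chapters/index.html?page=3#top")
```

Every component describes where or how to retrieve something, which illustrates why such addresses are location dependent.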
Meta data is defined as data about data. Web developers may be familiar with putting meta data in a Web page using the <meta> tag. Often, this is used to specify keywords and descriptions for search engines. For example,
<meta name="keywords" content="robots, androids, bots">
<meta name="description" content="Demo Company makes the best robots in the Solar System!">
Meta data is also used in Web pages to control page characteristics, particularly those related to HTTP headers. For example,
<meta http-equiv="Expires" content="Wed, 15 May 2002 08:21:57 GMT" />
would set an expiration date for a Web page using the HTTP Expires header.
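To suggest how a search engine or other program might read such tags, here is a minimal Python sketch that collects <meta> values from a page using the standard html.parser module. The MetaCollector class and collect_meta function are names assumed for this example, and the sketch ignores many details a real indexer would handle.

```python
from html.parser import HTMLParser


class MetaCollector(HTMLParser):
    """Gather name/content and http-equiv/content pairs from <meta> tags."""

    def __init__(self):
        super().__init__()
        self.named = {}       # e.g. keywords, description
        self.http_equiv = {}  # e.g. expires

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        content = attributes.get("content", "")
        if "name" in attributes:
            self.named[attributes["name"].lower()] = content
        elif "http-equiv" in attributes:
            self.http_equiv[attributes["http-equiv"].lower()] = content


def collect_meta(markup):
    """Parse markup and return the populated collector."""
    collector = MetaCollector()
    collector.feed(markup)
    return collector


found = collect_meta(
    '<meta name="keywords" content="robots, androids, bots">'
    '<meta http-equiv="Expires" content="Wed, 15 May 2002 08:21:57 GMT" />'
)
```

Keeping the two kinds of meta data in separate dictionaries mirrors the distinction drawn above between descriptive values and values that stand in for HTTP headers.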
The key to meta data is having a consistent and sufficiently descriptive vocabulary for describing data. The Resource Description Framework (RDF) provides a standard way of using XML to represent meta data in the form of statements about properties and relationships of items on the Web. However, RDF itself is just a framework and needs a vocabulary. A popular vocabulary called Dublin Core has started to gain some traction. Still, at the time of this edition’s writing, the use of meta data vocabularies beyond the simple <meta> tag for keywords and descriptions is not common practice on the Web as a whole, though it is prevalent in many large sites and very common in large intranets.
Online information about meta data can be found at these URLs:
- W3C RDF Information: http://www.w3.org/RDF/
- Dublin Core Metadata Initiative: http://dublincore.org/
Finally, the latest wrinkle in the Web medium is the rise of Web Services. The basic concept of Web Services is that Web sites may interact directly with each other, exchanging information or even running programs remotely. Web Services allow for complex distributed applications to be built using the pieces of various Web sites. For example, imagine running a small travel site and offering flight, hotel, and car rental booking services directly from your site through a large travel partner’s Web site without the user being aware. Web Services would provide the facilities for your site to talk to others and seamlessly make such a service possible.
The key to Web Services is the use of standardized message formats, typically specified in XML. A protocol called SOAP (Simple Object Access Protocol) appears to be the leading candidate for Web Services. However, others do exist, and Web Services are not prevalent enough yet to assume victory for SOAP. Beyond messaging protocols, Web Services also require a facility for service providers to describe their offered services, and for users to discover the services they require. So far, service description is being handled by a language called WSDL (Web Services Description Language), while service discovery is handled by UDDI (Universal Description, Discovery, and Integration). As mentioned, these technologies may not necessarily become standard; but regardless of what is adopted, Web Services will provide for a much richer Web experience, which is coming to be known as the semantic Web.
Information about Web Services can be found at these URLs:
- W3C Web Services Activity: http://www.w3.org/2002/ws/
- W3C Semantic Web Activity: http://www.w3.org/2001/sw/
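To give a sense of what a standardized XML message looks like, the following Python sketch builds a simple SOAP 1.1 envelope of the kind the travel site in the earlier example might send to its partner. The operation name FindFlights, its parameters, and the urn:example:travel namespace are all invented for illustration; an actual service would define these in its own description.

```python
import xml.etree.ElementTree as ET

# Namespace of the SOAP 1.1 envelope.
SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/"


def build_envelope(operation, params, service_ns="urn:example:travel"):
    """Return a SOAP envelope, as text, that requests one remote operation."""
    ET.register_namespace("soap", SOAP_NS)
    envelope = ET.Element("{%s}Envelope" % SOAP_NS)
    body = ET.SubElement(envelope, "{%s}Body" % SOAP_NS)
    # The operation and its parameters live in the service's own namespace,
    # which is hypothetical here.
    call = ET.SubElement(body, "{%s}%s" % (service_ns, operation))
    for name, value in params.items():
        ET.SubElement(call, "{%s}%s" % (service_ns, name)).text = str(value)
    return ET.tostring(envelope, encoding="unicode")


message = build_envelope("FindFlights", {"origin": "SAN", "destination": "JFK"})
```

The resulting text would typically be sent to the partner site in the body of an HTTP POST request, with the reply arriving as another XML envelope.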
A good portion of the activity in the Web Services space revolves around Microsoft’s .NET technology, which provides SOAP support as well as a sophisticated Web programming environment. However, what .NET actually means to Web Services and what it includes are still very fluid. The best source of information on the Microsoft variant of Web Services is http://www.microsoft.com/net/.