DOM and the First Nodes of the Document

654 words | 2016-5-7

What happens inside browsers? How does this heap of tags turn into such amazing pages on the screen? Magic. I’ll try to depict something primitive to display HTML.


Everything is simple: read a series of articles about the toy web engine. And then I will repeat only the minimum.

First step: DOM

DOM (document object model) is a node tree. A node has zero or more children. We keep the descendants as a simple list. It is clear that the list is very poorly scaled, but probably for a toy engine this will not be a problem.

And most importantly — get acquainted with CLOS!

(defclass node ()
  ((children :accessor node-children
	     :initform '()
	     :initarg :children)))

Adding a child is quite simple, although it is possible that the reverse order of the children in the list will affect the future. In that case, just change the method.

(defgeneric node-add-child (tree-node child)
	    (:method ((tree-node node) (child node))
		     (setf (node-children tree-node)
			   (push child (node-children tree-node)))))

There are many types of nodes, But the toy engine will only use a few. Just a text:

;; Text is a simplest node.
;; Just string of text.
(defclass text-node (node)
  ((text :accessor text-node-text
	 :initform ""
	 :initarg :text)))

Then the element or tag and its attributes. Attributes are stored in a hash table:

;; Element represents all remaining types of nodes (joke).
;; Tag name and any number of attributes.
(defclass element-node (node)
  ((tag :accessor element-node-tag
	:initform ""
	:initarg :tag)
   (attrs :reader element-node-attrs
	  :initform (make-hash-table :test #'equal))))  ; "key" --- "value", they are strings
							; so use "equal"

And a comment:

;; Comment
(defclass comment-node (node)

Debug image

To visualize a tree that will be built on an HTML document, we’ll write a couple of functions that create a file for Graphviz:

Suppose we have a simple HTML file:

<html class=e>
 <body>Why yes.</body>

Since the HTML parser is not even in the project, we’ll do the tree with our hands:

(ql:quickload :toy-engine)
(in-package :toy-engine)
; <html class="e">
(defparameter *root* (make-instance 'element-node :tag "html"))
(setf (gethash "class" (element-node-attrs *root*)) "e")
; <head>
(defparameter *head* (make-instance 'element-node :tag "head"))
(node-add-child *root* *head*)
; <title>
(node-add-child *head* (make-instance 'element-node :tag "title"))
; Aliens?
(node-add-child (car (node-children *head*)) (make-instance 'text-node :text "Aliens?"))
; <body>
(defparameter *body* (make-instance 'element-node :tag "body"))
(node-add-child *root* *body*)
; Why yes.
(node-add-child *body* (make-instance 'text-node :text "Why yes."))

(pp->dot #p"~/tmp/" (lambda () (pp-dom *root*)))

After running through Graphviz we get the following image of the tree:

HTML tree, part 1

;;; ========================
;;; Pretty-print a DOM tree.
;;; ========================
(defgeneric pp-node-title (tree-node)
	    (:documentation "What to print in the node box.")
	    (:method ((tree-node node))
	    (:method ((tree-node text-node))
		     (text-node-text tree-node))
	    (:method ((tree-node element-node))
		     (let ((s (concatenate 'string (element-node-tag tree-node) "|")))
		       (concatenate 'string s
				    (with-output-to-string (out)
				      (maphash #'(lambda (k v)
						   (format out "~a:~a," k v))
					       (element-node-attrs tree-node)))
	    (:method ((tree-node comment-node))

(defgeneric pp-node-class (tree-node)
	    (:documentation "Enumerate classes of nodes.")
	    (:method ((tree-node node))         "N")
	    (:method ((tree-node text-node))    "T")
	    (:method ((tree-node element-node)) "E")
	    (:method ((tree-node comment-node)) "C"))

(defun pp-dom (root-node)
  "Print DOM tree in the nice way."
  (let ((nodes (make-hash-table))                ; node object ---
	(types (make-hash-table :test #'equal))) ; node class --- new
	  (( (node)
		       "New sequentual id for class object."
		       (if (gethash (pp-node-class node) types)
			 (incf (gethash (pp-node-class node) types))
			 (setf (gethash (pp-node-class node) types) 0)))
	   ( (node)
		   "Id by object"
		   (let ((id (gethash node nodes)))
		     (if id
		       (setf (gethash node nodes) ( node)))))
	   (pp-node-name (node)
			 "Make node name for graphwizard."
			 (format nil "~a~d" (pp-node-class node) ( node)))
	   (pp-node (node)
		    "Print one node."
		      ((name (pp-node-name node)))
		      (format t "~a[label=~s];~%" name
			    (concatenate 'string name ":" (pp-node-title node)))
		      (mapc #'(lambda (x)
				  (format t "~a -> ~a~%" name (pp-node-name x))
				  (pp-node x)))
			    (node-children node)))))
	  (write-string "digraph dom {label=\"DOM tree\";node [color=lightblue2, style=filled];")
	  (pp-node root-node)
	  (write-string "}"))))

(defun pp->dot (fname thunk)
  "Run the thunk and write any it's output to file."
  (with-open-file (*standard-output*
		    :direction :output
		    :if-exists :supersede)
    (funcall thunk)))