The Objective…
1. Have a framework for hosting static HTML pages at no cost
2. Linking between pages should still work
Where can I host some content?
There are probably lots of places where you can host content for free. Your internet service provider might offer free space, and Yahoo might have something as well. And of course there are WordPress.org (this blog), Blogger, etc. These will get your content online for free, but they limit how you create the content and how you maintain it.
I was looking for a way to host my documentation (a continuation of this post). My documentation lives on github.com, and I wanted to be able to view it as a web page. Really, I was just playing around and having some fun.
Here is how I got it done
Set up your Google Apps Script project. See my other post on this.
To host the GitHub pages, we need to fetch the HTML we want to load dynamically. In my example I use github.com for hosting, but you could use code.google.com or something similar. Google Apps Script makes fetching content easy:
UrlFetchApp.fetch("http://somedomain.com/path/to/file/to/download").getContentText();
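However, if we follow the steps from my previous blog post, the content shows up but the links do not work: relative links in the fetched page resolve against the address the web app is served from, not the original site. So we have to parse the downloaded document and rewrite its links.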
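To see why the links break, here is a small illustration that uses the standard URL class (available in browsers and Node.js) to resolve the same relative link against two base addresses. Both addresses are placeholders: PAGE_URL stands for the original page, and SCRIPT_URL stands in for wherever the web app is actually served from.

```javascript
// Illustration only: resolve one relative link against two different base URLs.
// SCRIPT_URL and PAGE_URL are hypothetical placeholders, not real addresses.
const SCRIPT_URL = "https://script.google.com/macros/s/SCRIPT_ID/exec";
const PAGE_URL = "https://example.com/docs/index.html";

// Where a browser would send a click on <a href="...">, given the page's base URL
function resolveLink(href, baseUrl) {
  return new URL(href, baseUrl).href;
}

console.log(resolveLink("other.html", PAGE_URL));   // https://example.com/docs/other.html
console.log(resolveLink("other.html", SCRIPT_URL)); // https://script.google.com/macros/s/SCRIPT_ID/other.html
```

The second address does not exist, which is why each relative link has to be rewritten before the page is served.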
Parsing the document and rewriting links
1. Find “<a ”
2. Check that there is an “href” attribute
3. Parse the actual link
4. Replace the link contents
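For comparison, the four steps can also be expressed with a single regular expression. This is a swapped-in technique, not the character-by-character scan used in this post, and rewriteLinks and ANCHOR_HREF are hypothetical names. Like the post, it assumes internal links are relative and skips anything starting with http.

```javascript
// Regular-expression version of the four steps (an alternative to a manual scan).
// Captures: 1 = "<a ... href =" prefix, 2 = the quote character, 3 = the link itself.
// [^>]*? cannot cross ">", so <a> tags without an href are skipped (step 2).
const ANCHOR_HREF = /(<a\s[^>]*?\bhref\s*=\s*)(["'])(.*?)\2/gi;

function rewriteLinks(html, rewrite) {
  return html.replace(ANCHOR_HREF, (match, prefix, quote, url) => {
    // leave absolute links alone; assume internal links are relative
    if (/^https?:/i.test(url)) return match;
    // step 4: replace only the link contents, keeping the original quotes
    return prefix + quote + rewrite(url) + quote;
  });
}

const html = '<a class="nav" href="docs/a.html">A</a> <a name="top">T</a> <a href=\'http://x.com\'>X</a>';
console.log(rewriteLinks(html, (url) => "/proxy?file=" + url));
// <a class="nav" href="/proxy?file=docs/a.html">A</a> <a name="top">T</a> <a href='http://x.com'>X</a>
```

A regex like this is compact but brittle on unusual markup (for example, it would also match a data-href attribute), which is one reason to prefer small, testable functions either way.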
The problems I came across
1. On the first pass, I didn't account for links without an "href" attribute.
2. I thought it was so easy that I could write it procedurally with no functions. That was a mistake: just because you're playing around doesn't mean you should skip breaking a problem down into simpler parts.
3. It's really hard to diagnose problems in Google Apps Script.
3a. Put all the code in your .gs files so you can step through it in the debugger. This would have saved me a lot of time.
Finding the Next Valid “<a ” tag
/**
 * Return the position of the opening quote of the next relative href:
 * <a class="someclass" href ="/some/url"
 *                            ^
 *                            pos
 *
 * @param {String} rawHtml The raw HTML to parse
 * @param {int} pos The start position to begin parsing of the rawHtml
 * @return {int} The position, or -1 if there are no more links
 */
function getNextAtagUrlStart(rawHtml, pos){
  // find the next "<a " at or after pos
  var start = rawHtml.indexOf("<a ", pos);
  if(start == -1)
    return -1; // no more <a> tags

  // only look for "href" inside this tag, so we never pick up one from a later tag
  var tagEnd = rawHtml.indexOf(">", start);
  if(tagEnd == -1)
    return -1; // unterminated tag
  var temp = rawHtml.substring(start, tagEnd).indexOf("href");
  if(temp == -1)
    return getNextAtagUrlStart(rawHtml, start + 2); // this <a> tag has no href, so look again

  start += temp + 4; // 4 is the length of "href"
  while(rawHtml[start] == " ") start++; // skip white space: href = "
  start++;                              // skip the "="
  while(rawHtml[start] == " ") start++; // skip white space up to the opening " or '

  // If this is an absolute link, skip it. This assumes internal links are relative.
  if(rawHtml.substring(start + 1, start + 5) == "http")
    return getNextAtagUrlStart(rawHtml, start);

  // return the position of the link, including the leading " or '
  return start;
}
Get the End of the URL string
// Return the position just after the closing " or ' of the URL whose
// opening quote is at `start`
function getEndOfUrl(rawHtml, start){
  var wrapper = rawHtml[start]; // the opening " or '
  // a quoted attribute value ends at the next matching quote, whether it is
  // followed by ">", a space, "/>" or a line break
  var end = rawHtml.indexOf(wrapper, start + 1);
  if(end == -1)
    return rawHtml.length; // unterminated value: stop at the end of the document
  // this takes us to the position after the closing " or '
  return end + 1;
}
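doGet splices each new link in with body.substring(0,pos+1)+url+body.substring(endUrl-1). The same index arithmetic on a single tag, as a standalone sketch (spliceUrl is a hypothetical name, and it finds the closing quote with indexOf rather than calling getEndOfUrl):

```javascript
// Replace the text between the opening quote at quotePos and its matching
// closing quote: keep everything up to and including the opening quote,
// insert newUrl, then continue from the closing quote.
function spliceUrl(html, quotePos, newUrl) {
  const quote = html[quotePos];                       // the opening " or '
  const closePos = html.indexOf(quote, quotePos + 1); // the matching closing quote
  if (closePos === -1) return html;                   // unterminated value: leave it alone
  return html.substring(0, quotePos + 1) + newUrl + html.substring(closePos);
}

const tag = '<a href="a.html">A</a>';
console.log(spliceUrl(tag, tag.indexOf('"'), "b.html")); // <a href="b.html">A</a>
```

Keeping the quotes outside the spliced range means the rewritten link keeps whichever quote style the original page used.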
Get only the <body> Content
function getInnerText(rawHtml){
  var start = rawHtml.indexOf("<body");
  // include the whole closing tag; "</body>" is 7 characters long
  var end = rawHtml.indexOf("</body>") + 7;
  return rawHtml.substring(start, end);
}
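getInnerText assumes a lowercase <body> tag and that the page has one. A slightly more defensive sketch (extractBody is a hypothetical name) searches case-insensitively and supplies the missing tags, so the rest of the pipeline still receives a "<body ...>...</body>" string:

```javascript
// Return "<body ...>...</body>" from an HTML document, matching tags case-insensitively.
function extractBody(html) {
  const start = html.search(/<body/i);
  if (start === -1) return "<body>" + html + "</body>"; // fragment without a body: wrap it

  const closeRe = /<\/body>/gi;
  closeRe.lastIndex = start;         // only look for a closing tag after the opening one
  const m = closeRe.exec(html);
  if (m === null) return html.substring(start) + "</body>"; // unclosed body: close it ourselves

  return html.substring(start, m.index + m[0].length);
}

console.log(extractBody('<HTML><BODY id="a">x</BODY></HTML>')); // <BODY id="a">x</BODY>
console.log(extractBody("<p>x</p>"));                           // <body><p>x</p></body>
```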
Create your HTML Template File
<html>
<head>
<!-- Use the w3 default template so things look ok -->
<link rel="stylesheet" href="http://www.w3.org/StyleSheets/Core/Swiss" type="text/css">
</head>
<body>
<? output.append("<div"+body.substring(5,body.length-7)+"</div>"); ?>
</body>
</html>
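The scriptlet replaces the fetched "<body ...>" and "</body>" with "<div ...>" and "</div>", so the body's attributes survive inside the template's own <body>. The same slice as a plain function (bodyToDiv is a hypothetical name), assuming the input runs from "<body" through the full 7-character "</body>":

```javascript
// "<body" is 5 characters and "</body>" is 7, so this slice keeps the
// body's attributes and contents and drops only the two tag names.
function bodyToDiv(body) {
  return "<div" + body.substring(5, body.length - 7) + "</div>";
}

console.log(bodyToDiv('<body class="x"><p>Hi</p></body>')); // <div class="x"><p>Hi</p></div>
```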
Put it All Together
Google Apps Script web apps use two default entry-point functions:
1. doGet – handles GET requests, i.e. a normal web browser loading the page.
2. doPost – handles POST requests. These most likely come from form submissions.
First, I declare some global variables:
var base, domain, template, file;
This doGet function will set these global variables…
function doGet(e) {
  var tpl;

  // if the file or domain parameter is missing, show the 404 template
  if(!e || !e.parameter.file || !e.parameter.domain){
    tpl = HtmlService.createTemplateFromFile('404'); // load the 404 html file
  } else {
    // choose the template to load: "base" (described above) unless the tpl
    // parameter asks for one of the others with specific styling
    template = e.parameter.tpl ? e.parameter.tpl : "base";
    base = ScriptApp.getService().getUrl();
    domain = e.parameter.domain;
    file = e.parameter.file;
    tpl = HtmlService.createTemplateFromFile(template);

    // dynamically load the HTML and keep only <body ...</body>
    var response = UrlFetchApp.fetch(domain + file).getContentText();
    var body = getInnerText(response);

    // rewrite every relative link so it points back at this script
    var pos = getNextAtagUrlStart(body, 0);
    while(pos != -1 && pos < body.length){
      var endUrl = getEndOfUrl(body, pos);
      // getUrl builds the replacement for the link text body.substring(pos+1, endUrl-1)
      var url = getUrl(body, pos + 1, endUrl - 1);
      // splice in the new url, keeping the surrounding quotes
      body = body.substring(0, pos + 1) + url + body.substring(endUrl - 1);
      pos = getNextAtagUrlStart(body, pos);
    }

    // make the body variable available to the HTML template file we loaded
    tpl.body = body;
  }

  // render the page
  return tpl.evaluate();
}
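doGet calls getUrl(body, pos+1, endUrl-1), which is not shown in this post. Below is a hypothetical sketch of what such a helper could compute, written as a pure function with explicit arguments instead of the globals: it resolves the relative href against the directory of the current file and routes it back through the web app using the same domain and file parameters doGet reads. The names normalizePath and makeProxyUrl are assumptions, file paths are assumed to start with "/", and query strings or #fragments inside the href are not treated specially.

```javascript
// Collapse "." and ".." segments, e.g. "/docs/../img/a.html" -> "/img/a.html".
function normalizePath(path) {
  const out = [];
  for (const seg of path.split("/")) {
    if (seg === ".") continue;        // "./" means "this directory"
    if (seg === "..") {               // "../" means "parent directory"
      if (out.length > 1) out.pop();  // never pop the leading "" of an absolute path
      continue;
    }
    out.push(seg);
  }
  return out.join("/");
}

// Turn a relative href found in `file` into a link back through the web app at `base`.
function makeProxyUrl(base, domain, file, relUrl) {
  // directory of the page being shown, e.g. "/docs/" for "/docs/index.html"
  const dir = file.substring(0, file.lastIndexOf("/") + 1);
  // root-relative links replace the path; other links are relative to the directory
  const joined = relUrl.startsWith("/") ? relUrl : dir + relUrl;
  return base + "?domain=" + encodeURIComponent(domain) +
    "&file=" + encodeURIComponent(normalizePath(joined));
}

console.log(makeProxyUrl(
  "https://script.google.com/macros/s/SCRIPT_ID/exec", // placeholder web app URL
  "https://example.com", "/docs/index.html", "../img/a.html"));
// https://script.google.com/macros/s/SCRIPT_ID/exec?domain=https%3A%2F%2Fexample.com&file=%2Fimg%2Fa.html
```

Inside the script, a getUrl wrapper could then return makeProxyUrl(base, domain, file, rawHtml.substring(start, end)).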
Conclusion
In the end, my implementation has some shortcomings:
1. Doesn’t handle forms
2. Doesn't seem to handle in-page anchors (links like href="#section") used for bookmarking and internal references
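One possible way around the anchor problem is to leave such links alone instead of rewriting them. Below is a hypothetical filter (isRewritable is not part of the code above) that skips #anchors along with absolute forms the "http" check misses, such as mailto: and protocol-relative //host links. Whether skipped anchors then behave as expected inside the web app is still an open question.

```javascript
// Hypothetical filter: should this href be rewritten to go through the web app?
function isRewritable(href) {
  const url = href.trim();
  if (url === "" || url.startsWith("#")) return false; // empty or in-page anchor (#section)
  if (url.startsWith("//")) return false;              // protocol-relative link to another host
  // any "scheme:" prefix (http:, https:, mailto:, javascript:, ...) marks an absolute link
  return !/^[a-z][a-z0-9+.-]*:/i.test(url);
}

for (const href of ["docs/a.html", "#top", "https://x.com", "mailto:me@x.com", "//cdn.x.com/a.js"]) {
  console.log(href, isRewritable(href));
}
```

In getNextAtagUrlStart, a check like this could replace the substring(start+1,start+5) == "http" test, applied to the URL text between the quotes.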
Here is an example where I load my pages from https://github.com/Will-Smelser/openProjects