<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Learning by Doing &#187; wordpress</title>
	<atom:link href="http://tech.williamgunn.org/tag/wordpress/feed/" rel="self" type="application/rss+xml" />
	<link>http://tech.williamgunn.org</link>
	<description>...and trying, and failing, and trying again.</description>
	<lastBuildDate>Wed, 25 Mar 2009 00:17:47 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.3</generator>
		<item>
		<title>Updated to 2.4.bleeding(trunk)</title>
		<link>http://tech.williamgunn.org/2007/10/27/updated-to-24bleeding/</link>
		<comments>http://tech.williamgunn.org/2007/10/27/updated-to-24bleeding/#comments</comments>
		<pubDate>Sat, 27 Oct 2007 22:00:43 +0000</pubDate>
		<dc:creator>Mr. Gunn</dc:creator>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[K2]]></category>
		<category><![CDATA[subversion]]></category>
		<category><![CDATA[theme]]></category>
		<category><![CDATA[upgrade]]></category>
		<category><![CDATA[wordpress]]></category>

		<guid isPermaLink="false">http://tech.williamgunn.org/2007/10/27/updated-to-24bleeding/</guid>
		<description><![CDATA[I was going to update to 2.3.1, but I thought I&#8217;d see if the trunk was working as expected first, and it is. It&#8217;s actually fairly easy to upgrade using subversion, so here&#8217;s what I did. Updated betablog(my nonpublic mirror of my working site) to trunk using svn sw http://svn.automattic.com/wordpress/trunk/ . Files changed: A wp-admin/js/edit-posts.js [...]]]></description>
			<content:encoded><![CDATA[<p>I was going to update to 2.3.1, but I thought I&#8217;d see if the trunk was working as expected first, and it is.</p>
<p>It&#8217;s actually fairly easy to upgrade using subversion, so here&#8217;s what I did.</p>
<p><span id="more-37"></span></p>
<h3>Updated betablog (my nonpublic mirror of my working site) to trunk</h3>
<p>using <code>svn sw http://svn.automattic.com/wordpress/trunk/ .</code><br />
Files changed:<br />
A    wp-admin/js/edit-posts.js<br />
A    wp-includes/js/wp-lists.js<br />
A    wp-includes/images/wlw<br />
A    wp-includes/images/wlw/wp-icon.png<br />
A    wp-includes/images/wlw/wp-watermark.png<br />
A    wp-includes/images/wlw/wp-comments.png<br />
A    wp-includes/wlwmanifest.xml</p>
<p>No files I have edited are part of this update, so I shouldn&#8217;t have to re-edit anything.  <code>svn update</code> should incorporate the changes into files, instead of overwriting the whole file, unless there are overlapping changes.  The only local modification I&#8217;ve made to this is to add my analytics tag to <code>footer.php</code>, but this hasn&#8217;t been modified in the repository version.  </p>
<p><code>[smithers]$ svn status -u<br />
       *     6188   wp-includes/functions.php<br />
       *     6188   wp-admin/setup-config.php<br />
?                   wp-config.php<br />
?                   sitemap_gen.py<br />
?                   robots.txt<br />
?                   wp-content/uploads<br />
X                   wp-content/plugins/akismet<br />
?                   wp-content/plugins/code-markup.php<br />
M            6188   wp-content/themes/default/footer.php<br />
?                   config.xml<br />
?                   google5422237c5650fd97.html<br />
       *     6188   wp-settings.php<br />
?                   .htaccess<br />
Status against revision:   6294</p>
<p>Performing status on external item at 'wp-content/plugins/akismet'<br />
Status against revision:  23396</code></p>
<p>I&#8217;ll add a comment to <code>wp-admin/setup-config.php</code> to see if the file is overwritten or just modified.</p>
<p>I don&#8217;t understand just yet how to diff my copy against the repository.  In other words, I can get a list of what I&#8217;ve changed since last update, but I can&#8217;t get a line-by-line list of what they&#8217;ve changed without doing an update.  However, doing an update should tell me if my changes overlap, and then I can manually incorporate them, but this probably won&#8217;t ever be an issue because the only thing I&#8217;ve changed in the core files is the tag output.  The analytics stuff is at the end of the file, where it&#8217;s not likely to overlap.</p>
<p>Added comment to <code>wp-admin/setup-config.php</code></p>
<p><code>[smithers]$ svn status -u<br />
       *     6188   wp-includes/functions.php<br />
M      *     6188   wp-admin/setup-config.php<br />
?                   wp-config.php<br />
?                   sitemap_gen.py<br />
?                   robots.txt<br />
?                   wp-content/uploads<br />
X                   wp-content/plugins/akismet<br />
?                   wp-content/plugins/code-markup.php<br />
M            6188   wp-content/themes/default/footer.php<br />
?                   config.xml<br />
?                   google5422237c5650fd97.html<br />
       *     6188   wp-settings.php<br />
?                   .htaccess</p>
<p>[smithers]$ svn update<br />
U    wp-includes/functions.php<br />
U    wp-settings.php<br />
G    wp-admin/setup-config.php</code></p>
<p>U means the repository file changed, but I didn&#8217;t change it locally (it had status<code>     *</code> before the update).<br />
G means it changed in both places and the two sets of changes were merged (i.e. it had status<code> M   *</code> before the update).</p>
<p>Removed change from <code>wp-admin/setup-config.php</code><br />
<code>[smithers]$ svn status -u<br />
?                   wp-config.php<br />
?                   sitemap_gen.py<br />
?                   robots.txt<br />
?                   wp-content/uploads<br />
X                   wp-content/plugins/akismet<br />
?                   wp-content/plugins/code-markup.php<br />
M            6294   wp-content/themes/default/footer.php<br />
?                   config.xml<br />
?                   google5422237c5650fd97.html<br />
?                   .htaccess<br />
M            6294   wp-admin/setup-config.php<br />
Status against revision:   6294</code></p>
<p>So <code>footer.php</code> remains modified, because it wasn&#8217;t updated, and <code>setup-config.php</code> remains M: even though I removed the change I made, making it an exact copy of the repository version, it&#8217;s still different from the version in my .svn directory. That&#8217;s fine, because I don&#8217;t plan on checking any changes into the repository any time in the foreseeable future.</p>
<p>Ah, <code>svn update</code> doesn&#8217;t change the version number, it just updates the existing code, so totally new files don&#8217;t get added, and WP still thinks you&#8217;re running the old version, even though subversion reports the revision number correctly.  <code>svn update</code>, then <code>svn sw</code> is perhaps a good idea, then, so that repository changes get incorporated locally, then new files get added.  I don&#8217;t know if <code>svn sw</code> will overwrite, but I know one way to find out.  I&#8217;ll edit <code>wp-admin/setup-config.php</code> in my other blog that hasn&#8217;t been updated yet, and instead of <code>svn update</code>, I&#8217;ll do <code>svn sw</code>.</p>
<p>That results in a C status for the changed file, so that means I&#8217;ll have to manually incorporate changes.  This is most likely due to the fact my comment was at the top of the file.</p>
<p><code>[smithers]$ svn sw http://svn.automattic.com/wordpress/trunk/ .<br />
C    wp-admin/setup-config.php<br />
U    wp-admin/link-manager.php<br />
U    wp-admin/page.php<br />
U    wp-admin/export.php<br />
U    wp-admin/edit-pages.php<br />
U    wp-admin/categories.php<br />
Updated to revision 6294.</code></p>
<p>If you get a conflict, you need to do one of three things:</p>
<ul>
<li>Merge the conflicted text “by hand” (by examining and editing the conflict markers within the file).</li>
<li>Copy one of the temporary files on top of your working file.</li>
<li>Run <code>svn revert &lt;filename&gt;</code> to throw away all of your local changes.</li>
</ul>
<p>Once you&#8217;ve resolved the conflict, you need to let Subversion know by running <code>svn resolved</code>. This removes the three temporary files and Subversion no longer considers the file to be in a state of conflict.</p>
<p>You have to give <code>svn resolved</code> a path relative to where you are (no leading slash), or you get an error.  Changing into the directory first also works.</p>
<p><code>[smithers]$ svn resolved /wp-admin/setup-config.php<br />
svn: warning: '/wp-admin' is not a working copy</p>
<p>[smithers]$ cd wp-admin<br />
[smithers]$ svn resolved setup-config.php<br />
Resolved conflicted state of 'setup-config.php'</code></p>
<p>Some themes are in subversion, too, and since themes are more often customized, it makes sense to use subversion for them, especially the complicated K2 theme.</p>
<p><code>[smithers]$ svn status -u<br />
M      *      500   theloop.php<br />
       *      500   page.php<br />
       *      500   page-archives.php<br />
       *      500   single.php<br />
       *      500   k2.pot<br />
       *      500   header.php<br />
       *      500   attachment.php<br />
Status against revision:    582</code></p>
<p>So my changes to <code>theloop.php</code> will need to be merged.  Let's try <code>svn update</code>.</p>
<p><code>[smithers]$ svn update<br />
D    app/display/sbm/backup.php<br />
D    app/display/sbm/header.php<br />
D    css/rollingarchives.css<br />
A    css/options.css<br />
A    css/humanmsg.css<br />
A    css/rollingarchives.css.php<br />
A    css/header.php<br />
C    theloop.php<br />
A    images/sbmmanager/buttoncircle.png<br />
A    images/sbmmanager/deletebutton.png<br />
A    images/sbmmanager/undo.png<br />
U    single.php<br />
A    js/jquery.easing.js.php<br />
A    js/jquery.humanundo.js.php<br />
A    js/jquery.humanmsg.js.php<br />
Updated to revision 582.</code><br />
So I got the dreaded C for <code>theloop.php</code>, but I guess I should have expected that.<br />
<code>rollingarchives.css</code> is gone, interesting, and <code>humanundo.js</code> functions have been added.  <code>single.php</code> updated without a conflict, thankfully, and <code>header</code>, <code>footer</code>, and <code>style.css</code> haven't been changed.</p>
<p>Because of the revision marks in <code>theloop.php</code>, the file requires resolution of the changes before it will work again.  In my case, almost the whole file has changed, so I'll have to re-add the tag code to the new version.</p>
<p>Lots of changes to <code>theloop.php</code>.  One thing K2 does that I don't agree with is that it only shows tags on single post pages. I think it makes more sense to just stick the tags in the place of the categories at the top.</p>
<h3>Problems:</h3>
<ul>
<li>No problems with updating the blog code.</li>
<li>I don't understand why there's <code>' .</code> surrounding everything in <code>theloop.php</code>.</li>
<li>Pasting the code in as before isn't working.</li>
<li>Pasting just <code>the_tags(__('Tags: ','k2_domain'), ', ', '.')</code> in place of <code>k2_nice_category</code> works, but the tags are output before the time and date.</li>
</ul>
<h3>Current status:</h3>
<ul>
<li>All sites upgraded to 2.4.bleeding.</li>
<li>K2 theme is updated on betablog, but my customizations to <code>theloop.php</code> aren't there because I need to figure out how to put them in the place of the categories.</li>
</ul>
<h3>Useful subversion commands:</h3>
<p><code>svn status -u</code> shows which files will be updated when you run <code>svn update</code><br />
* means files will be updated<br />
M means files I've changed<br />
X means an unversioned directory that is specified as an external, so this applies to WP-supplied plug-ins.<br />
? means files that aren't being tracked in subversion.  Files I've created, such as the sitemap generator, which isn't part of WordPress, will have ? status.</p>
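<p>To make the status codes concrete, here is a minimal Python sketch that reads lines in the shape shown in the <code>svn status -u</code> listings above.  The function name and the table of meanings are my own illustration, not part of Subversion, and it assumes the <code>*</code> flag appears within the first nine characters of a line and that paths contain no spaces.</p>

```python
# Hypothetical helper: these names are illustrative, not part of Subversion.
STATUS_MEANINGS = {
    "M": "locally modified",
    "?": "not under version control",
    "X": "directory pulled in by an externals definition",
    "C": "in conflict",
    "A": "scheduled for addition",
    "D": "scheduled for deletion",
    " ": "no local changes",
}


def parse_status_line(line):
    """Split one `svn status -u` line into (status, out_of_date, path)."""
    flags = line[:9]          # assumption: the status columns fit in 9 characters
    status = line[0]          # first column is the item's own status
    out_of_date = "*" in flags
    path = line.split()[-1]   # assumption: the path has no spaces
    return status, out_of_date, path


sample = [
    "       *     6188   wp-includes/functions.php",
    "M      *     6188   wp-admin/setup-config.php",
    "?                   wp-config.php",
    "M            6188   wp-content/themes/default/footer.php",
]
for line in sample:
    status, out_of_date, path = parse_status_line(line)
    note = STATUS_MEANINGS.get(status, "other")
    if out_of_date:
        note += ", newer version in repository"
    print(path + ": " + note)
```

<p>A line with both M and <code>*</code> is the case that ends up as G (merged) or C (conflict) after an update.</p>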
]]></content:encoded>
			<wfw:commentRss>http://tech.williamgunn.org/2007/10/27/updated-to-24bleeding/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Do you have too many passwords to remember?</title>
		<link>http://tech.williamgunn.org/2007/09/27/do-you-have-too-many-passwords-to-remember/</link>
		<comments>http://tech.williamgunn.org/2007/09/27/do-you-have-too-many-passwords-to-remember/#comments</comments>
		<pubDate>Thu, 27 Sep 2007 22:23:39 +0000</pubDate>
		<dc:creator>Mr. Gunn</dc:creator>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[dreamhost]]></category>
		<category><![CDATA[howto]]></category>
		<category><![CDATA[openid]]></category>
		<category><![CDATA[php]]></category>
		<category><![CDATA[wordpress]]></category>

		<guid isPermaLink="false">http://tech.williamgunn.org/2007/09/27/do-you-have-too-many-passwords-to-remember/</guid>
		<description><![CDATA[Or worse, do you use the same password for everything? There&#8217;s a better way to do it, which relieves every site and blog owner from having to store your password. The way to do it is to set up an OpenID, that way the only person you have to blame is yourself, and perhaps your [...]]]></description>
			<content:encoded><![CDATA[<p>Or worse, do you use the same password for everything?  There&#8217;s a better way to do it, which relieves every site and blog owner from having to store your password.</p>
<p>The way to do it is to set up an <a href="http://openid.net">OpenID</a>.  That way the only person you have to blame is yourself, and perhaps your hosting provider.  Another benefit of doing this touches on a big interest of mine: owning your identity online.  You see, in the OpenID scheme, your blog URL is all you provide upon login.  The rest of the information is exchanged by encrypted API handshake between the OpenID provider (your site) and the site you&#8217;re logging into (the OpenID consumer).</p>
<p>The setup is not for the faint of heart, but it&#8217;s not that bad, and I took good notes on how I did it.</p>
<p>To set up your site as an OpenID provider, you need the following things:</p>
<li>A hosting account running PHP5 and offering shell access, and a domain name.  I always recommend <a href="http://www.dreamhost.com/r.cgi?233446">Dreamhost</a>, and you can get hosting for only $5/month using my &#8220;Synthesis&#8221; promo code.
<p>Step 1: Download the latest and greatest version of the PHP script. It&#8217;s 0.6 currently.<br />
Log into your account, open a shell window, and issue the following commands (if you&#8217;re not running PHP as CGI, you can skip the .htaccess part and do everything from the root URL if you want &#8211; read below to see how to tell):<br />
<code>mkdir me<br />
cd me<br />
svn co https://www.siege.org/svn/oss/phpMyID/trunk/ .</code></p>
<p>Step 2: Visit MyID.config.php in your web browser.  You should see something like this:</p>
<blockquote><p>This is an OpenID server endpoint. For more information, see http://openid.net/<br />
Server: http://williamgunn.org/me/OpenID.config.php<br />
Realm: phpMyID<br />
Login</p></blockquote>
<p>Step 3: Edit MyID.config.php to include your info.<br />
- pick a username<br />
- generate and include your password hash<br />
<code> echo -n 'username:realm:password' | openssl md5</code><br />
- optionally enter personal information to be supplied to sites.</p>
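<p>If you don&#8217;t have <code>openssl</code> handy, the same hash can be sketched in Python.  This assumes the hash is simply the MD5 of the colon-joined string, as the command above suggests; the function name and the credentials are placeholders of mine.</p>

```python
import hashlib


def myid_password_hash(username, realm, password):
    """Return the hex MD5 digest of 'username:realm:password'."""
    joined = ":".join([username, realm, password])
    return hashlib.md5(joined.encode("utf-8")).hexdigest()


# Placeholder credentials; substitute your own.
# The realm shown on the endpoint page in Step 2 is phpMyID.
print(myid_password_hash("username", "phpMyID", "password"))
```

<p>Depending on your version, <code>openssl md5</code> may print a short label before the digest; only the 32 hex characters belong in the config file.</p>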
<p>Step 4: Go back to your web browser and hit the login link on the page you loaded before.  You&#8217;ll be redirected, and a window will pop up asking for your username and password.  Enter them, and press enter.  If you get a message saying you&#8217;re logged in, you&#8217;re not running PHP as CGI, and you can skip the rest of this paragraph.  If you are running PHP as CGI, you need a .htaccess file to convert the authentication headers into environment variables the script can use. This is because &#8220;<a href="http://us3.php.net/manual/en/features.http-auth.php">The HTTP Authentication hooks in PHP are only available when it is running as an Apache module and is hence not available in the CGI version.</a>&#8221;  The fix is easy: if you didn&#8217;t make a subdirectory above, make one, and create a file called .htaccess containing the following code (you may already have an example file in your root directory).<br />
<code># Option 1, mod_rewrite (req)<br />
RewriteEngine on<br />
RewriteCond %{HTTP:Authorization} !^$<br />
RewriteCond %{QUERY_STRING} openid.mode=authorize<br />
RewriteCond %{QUERY_STRING} !auth=<br />
RewriteCond %{REQUEST_METHOD} =GET<br />
RewriteRule (.*) %{REQUEST_URI}?%{QUERY_STRING}&amp;auth=%{HTTP:Authorization} [L]</code></p>
<p>Now, you should be able to hit the login link and get logged in.</p>
<p>I get redirected to the following URL: <code>http://williamgunn.org/me/scriptname.config.php?openid.mode=id_res&amp;openid.identity=http%3A%2F%2Fwilliamgunn.org%2Fme%2Fscriptname.config.php&amp;openid.assoc_handle=[redacted]&amp;openid.return_to=http%3A%2F%2Fwilliamgunn.org%2Fme%2Fscriptname.config.php&amp;openid.signed=mode%2Cidentity%2Cassoc_handle%2Creturn_to&amp;openid.sig=[more redacted stuff]</code></p>
<p>Now set allow_gmp and allow_test to true in the config file, allowing encryption aka Smart Mode and testing, and load http://yoursite.com/you/MyID.config.php?openid.mode=test</p>
<p>If you&#8217;re on Dreamhost, which doesn&#8217;t have support for GMP in the PHP binaries, you will get a page looking somewhat like this:</p>
<table border="1" cellpadding="4">
<tr>
<th>bcmath</th>
<td style="background: #ffff99">warn &#8211; not loaded</td>
</tr>
<tr>
<th>gmp</th>
<td style="background: #ffff99">warn &#8211; not loaded</td>
</tr>
<tr>
<th>logfile</th>
<td style="background: #ffff99">warn &#8211; log is not writable</td>
</tr>
<tr>
<th>session</th>
<td style="background: #99ff99">pass</td>
</tr>
<tr>
<th>secret</th>
<td style="background: #99ff99">pass</td>
</tr>
<tr>
<th>expire</th>
<td style="background: #99ff99">pass</td>
</tr>
<tr>
<th>base64</th>
<td style="background: #99ff99">pass</td>
</tr>
<tr>
<th>hmac</th>
<td style="background: #99ff99">pass</td>
</tr>
<tr>
<th>bigmath</th>
<td style="background: #ff9999">fail &#8211; big math functions are not available.</td>
</tr>
<tr>
<th>sha1_20</th>
<td style="background: #99ff99">pass</td>
</tr>
<tr>
<th>x_or</th>
<td style="background: #99ff99">pass</td>
</tr>
</table>
<p>If you get that, you&#8217;re ready to delegate your OpenID identity.  Go to the root of your domain (or wherever you wish) and enter the following code in a file named index.html<br />
<code>&lt;link href="http://williamgunn.org/me/OpenID.config.php" rel="openid.server" /&gt;<br />
&lt;link href="http://williamgunn.org/me/OpenID.config.php" rel="openid.delegate" /&gt;</code><br />
If you&#8217;ve already got an index.html, just put the two lines in the head section.</p>
<p>Now go log in to a <a href="https://www.myopenid.com/directory">site that accepts OpenID</a>, or just leave a comment <a href="http://betablog.williamgunn.org/">here</a>. For details on how to set up your site to accept OpenID, see <a href="http://tech.williamgunn.org/2007/09/27/are-you-worried-about-a-hacker-stealing-the-logon-details-of-people-who-have-registered-at-your-site/">here</a>.</li>
]]></content:encoded>
			<wfw:commentRss>http://tech.williamgunn.org/2007/09/27/do-you-have-too-many-passwords-to-remember/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Write a Google sitemap for your WordPress blog</title>
		<link>http://tech.williamgunn.org/2007/09/21/write-a-google-sitemap-for-your-wordpress-blog/</link>
		<comments>http://tech.williamgunn.org/2007/09/21/write-a-google-sitemap-for-your-wordpress-blog/#comments</comments>
		<pubDate>Fri, 21 Sep 2007 23:17:57 +0000</pubDate>
		<dc:creator>admin</dc:creator>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[blogging]]></category>
		<category><![CDATA[dreamhost]]></category>
		<category><![CDATA[google]]></category>
		<category><![CDATA[sitemap]]></category>
		<category><![CDATA[wordpress]]></category>

		<guid isPermaLink="false">http://tech.williamgunn.org/2007/09/21/write-a-google-sitemap-for-your-wordpress-blog/</guid>
		<description><![CDATA[One of the most effective ways to increase the visibility of your content is to make sure it&#8217;s indexed regularly by Google. However, the Googlebot sometimes has a hard time with database-driven websites like WordPress blogs, so it helps if you tell Google which URLs to visit. The way to do that is with an [...]]]></description>
			<content:encoded><![CDATA[<p>One of the most effective ways to increase the visibility of your content is to make sure it&#8217;s indexed regularly by Google.  However, the Googlebot sometimes has a hard time with database-driven websites like WordPress blogs, so it helps if you tell Google which URLs to visit.  The way to do that is with an XML sitemap.  There are a couple different kinds of sitemaps, which work with different search engines, but I&#8217;m only going to talk about the XML sitemap supported by Google and Yahoo.  There&#8217;s also a Google sitemap generator for WordPress, but if you&#8217;re like me, you try to keep the number of active plug-ins to a minimum to make your site as fast as possible.</p>
<p>Not only will a sitemap ensure Google has the freshest content from your site, but it will also make your site run faster by telling the Googlebot that it doesn&#8217;t need to crawl your back archives with the same frequency as your front page.  This is especially important for shared hosting situations like Dreamhost. Because <a href="http://wiki.dreamhost.com/Googlebot_behaving_badly#Googlebot_behaving_badly">the Googlebot alone can use 50% of the CPU of the shared server</a>, if your site isn&#8217;t configured properly, you could bog down the server for everyone else and even get your site taken offline<sup><a href="#1">1</a></sup>.</p>
<p>To set this up you&#8217;ll need an account with Google Webmaster Tools, the downloadable <a href="https://www.google.com/webmasters/tools/docs/en/sitemap-generator.html">sitemap generator</a>, and a hosting account that uses <a href="http://www.analog.cx/">Analog</a> logging and offers Python support.  I use <a href="http://dreamhost.com">Dreamhost</a>.  If you need a host, <a href="https://dreamhost.com/signup/">check &#8216;em out</a> <i>(and use promo code &#8220;Synthesis&#8221; to get your first year for $60)</i>.</p>
<p>First, download the program and upload it to the base directory of your website.  Unzip the package and open up config-example.xml.  In config-example.xml are the parameters that control how the URL list that makes up the sitemap is generated.  You&#8217;ll need to rename this to config.xml for it to work.  There are two steps to setting up config.xml: including URLs, and excluding URLs.  Because sitemap_gen doesn&#8217;t do any crawling itself, you have to supply it with a list of URLs.  One simple way to do this is with a text listing of URLs, but manually adding to this list every time you write a new post would get tedious. Conveniently, sitemap_gen can parse logfiles, so you can use your logs as the URL list. The frequency with which URLs appear in your logs also allows sitemap_gen to assign a priority score to each URL, letting the Googlebot know which pages to update more frequently and which pages it doesn&#8217;t need to crawl as often.</p>
<p>Next, find the section in config.xml that says, &#8220;The &#8220;site&#8221; node describes your basic web site.&#8221;  In this section, you want to replace <code>http://www.example.com</code> with the URL of your site.  Replace <code>/var/www/docroot/sitemap.xml.gz</code> or whatever comes after <code>store_into</code> with the name of your sitemap.  I used sitemap.xml.gz, to generate a compressed sitemap for Google to read.</p>
<p>Moving down the file, find the INPUTS section.  This is where you will specify which URLs to <strong>include</strong> in the sitemap.  This part is broken up into sections which contain different link inclusion mechanisms.  You can only use one mechanism at a time, so delete or comment out the sections until you get to the one that talks about accesslogs.  Remove two of the three example statements in brackets in this section, and modify the remaining one to contain the full path to your access logs. You can use the * character to specify all the logs in the directory, like so: <code>&lt;accesslog path="/path/to/logs/access.log*" encoding="UTF-8" /&gt;</code>.  Delete the remaining sections in the INPUTS section.</p>
<p>The next section is the filters section.  This is where you will specify which URLs to <strong>exclude</strong>.  You can do a lot of fancy stuff here, but the most important thing for WordPress is to remove URLs that lead to non-content pages, like wp-login, for example<a href="#2"><sup>2</sup></a>.  In these statements you tell sitemap_gen which URLs to add or remove from the list, using normal wildcards or regular expressions.  I recommend keeping this as simple as possible, avoiding the use of pass statements because those act like short circuits and will leave matching URLs in the list no matter what you specify later, and in conjunction with regular expressions, this can sometimes be non-intuitive and hard to debug.</p>
<p>Here&#8217;s my filters section:</p>
<pre>
<code>&lt;filter action="drop"  type="regexp"  pattern="/wp-admin/" /&gt;</code>
<code>&lt;filter action="drop"  type="regexp"  pattern="/wp-login/" /&gt;</code>
<code>&lt;filter action="drop"  type="regexp"  pattern="wp-cron\.php" /&gt;</code>
<code>&lt;filter action="drop"  type="regexp"  pattern="wp-login\.php" /&gt;</code>
<code>&lt;filter action="drop"  type="regexp"  pattern="/doc/" /&gt;</code>
<code>&lt;filter action="drop"  type="regexp"  pattern="/noexist_" /&gt;</code>
<code>&lt;filter action="drop"  type="regexp"  pattern="/\?p=[\d]" /&gt;</code>
<code>&lt;filter action="drop"  type="regexp"  pattern="/\?s=[a-zA-Z0-9]" /&gt;</code>
<code>&lt;filter action="drop"  type="regexp"  pattern="/Photos/tags/.*\.html" /&gt;</code>
<code>&lt;filter action="drop"  type="regexp"  pattern="/Photos/tags/.*/tags/" /&gt;</code>
<code>&lt;filter action="drop"  type="regexp"  pattern="/wp-content/" /&gt;</code>
<code>&lt;filter action="drop"  type="regexp"  pattern="/wp-includes/" /&gt;</code>
<code>&lt;filter action="drop"  type="regexp"  pattern="/stats/" /&gt;</code>
<code>&lt;filter action="drop"  type="regexp"  pattern="/_vti_bin/" /&gt;</code>
<code>&lt;filter action="drop"  type="regexp"  pattern="/MSOffice/" /&gt;</code>
<code>&lt;filter action="drop"  type="regexp"  pattern="/dh_phpmyadmin/" /&gt;</code>
<code>&lt;filter action="drop"  type="regexp"  pattern="/htmledit/" /&gt;</code>
<code>&lt;filter action="drop"  type="regexp"  pattern="/robots\.txt" /&gt;</code>
<code>&lt;filter action="drop"  type="regexp"  pattern="/sitemap\.xml" /&gt;</code>
<code>&lt;filter action="drop"  type="regexp"  pattern="/xmlrpc\.php" /&gt;</code>
<code>&lt;filter action="drop"  type="wildcard"  pattern="*.jpg" /&gt;</code>
<code>&lt;filter action="drop"  type="wildcard"  pattern="*.tif" /&gt;</code>
<code>&lt;filter action="drop"  type="wildcard"  pattern="*.tiff" /&gt;</code>
<code>&lt;filter action="drop"  type="wildcard"  pattern="*.bmp" /&gt;</code>
<code>&lt;filter action="drop"  type="wildcard"  pattern="*.ico" /&gt;</code>
<code>&lt;filter action="drop"  type="wildcard"  pattern="*.js" /&gt;</code>
<code>&lt;filter action="drop"  type="wildcard"  pattern="*.css" /&gt;</code>
<code>&lt;filter action="drop"  type="wildcard"  pattern="*.gif" /&gt;</code>
<code>&lt;!-- Exclude URLs within UNIX-style hidden files or directories --&gt;</code>
<code>&lt;filter action="drop"  type="regexp"  pattern="/\.[^/]*" /&gt;</code>
</pre>
<p>That&#8217;s all fairly straightforward, I hope, but two things merit explaining.  The section below</p>
<pre>
<code>&lt;filter action="drop"  type="regexp"  pattern="/\?p=[\d]" /&gt;</code>
<code>&lt;filter action="drop"  type="regexp"  pattern="/\?s=[a-zA-Z0-9]" /&gt;</code>
<code>&lt;filter action="drop"  type="regexp"  pattern="/Photos/tags/.*\.html" /&gt;</code>
<code>&lt;filter action="drop"  type="regexp"  pattern="/Photos/tags/.*/tags/" /&gt;</code>
</pre>
<p>is an example of one way to remove redundant URLs from your list.  You don&#8217;t need both the &#8220;Pretty URL&#8221; to a post and its /?p=number URL, and if you&#8217;ve changed that setting recently, they will both show up in your logs.  The <code>/\?p=[\d]</code> pattern tells sitemap_gen to exclude any URL of the form /?p=some number.  Also, you don&#8217;t necessarily need search result pages to appear in the list, so the next line takes care of that.  The following two lines are for use with the <a href="http://www.silaspartners.com/">Flickr Photo Gallery</a> plugin.  This plugin allows you to browse your tags just as you would at Flickr, but this creates a URL problem when the site is crawled, resulting in 90% of your logs being composed of redundant crap.  Those two lines remove all the URLs pertaining to the gallery except gallery pages and display pages for a single tag.</p>
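<p>A quick way to check what those four patterns catch before running the generator is to try them in Python.  This is only a sketch: it assumes a drop filter fires when the pattern matches anywhere in the URL, and the URLs below are made up for illustration.</p>

```python
import re

# Patterns copied from the four filter lines above.
DROP_PATTERNS = [
    r"/\?p=[\d]",
    r"/\?s=[a-zA-Z0-9]",
    r"/Photos/tags/.*\.html",
    r"/Photos/tags/.*/tags/",
]


def is_dropped(url):
    """True if any drop pattern matches somewhere in the URL (assumed semantics)."""
    return any(re.search(pattern, url) for pattern in DROP_PATTERNS)


# Hypothetical URLs, for illustration only.
for url in [
    "http://www.example.com/?p=37",
    "http://www.example.com/?s=openid",
    "http://www.example.com/2007/10/27/updated-to-24bleeding/",
    "http://www.example.com/Photos/tags/cats/tags/dogs/",
    "http://www.example.com/Photos/tags/cats/",
]:
    print("drop" if is_dropped(url) else "keep", url)
```

<p>The pretty permalink and the single-tag gallery page survive; the numeric post URL, the search page, and the nested tag page do not.</p>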
<p>The next thing worth mentioning is the pair of lines below.  They filter out requests that are generated when someone using IE visits your page with the discussion toolbar loaded.  IE looks to see if your site supports it, which mine doesn&#8217;t.</p>
<p><code>&lt;filter action="drop"  type="regexp"  pattern="/_vti_bin/" /&gt; </code><br />
<code>&lt;filter action="drop"  type="regexp"  pattern="/MSOffice/" /&gt;</code></p>
<p>After processing your logs and applying some intelligent filter rules to exclude URLs that aren&#8217;t content-containing parts of your site, you&#8217;re ready to submit. Run <code>python sitemap_gen.py --config=config.xml --testing</code>, extract the sitemap.xml file from sitemap.xml.gz, and load it in your browser.  Look through it and make sure your rules have worked as expected, then run the command again, removing the <code>--testing</code> part.  If you want to get fancy, you can set this up as a cron job.  If you do, run it on access.log.0, yesterday&#8217;s logs, around 2am.  That way you don&#8217;t miss any traffic as the logging switches over at midnight.</p>
<p>Finally, log into Google webmaster tools and submit your sitemap to Google!</p>
<p><a name="1"></a>To see how much of your traffic is coming from the Googlebot, SSH to your server and run <code>tail -10000 access.log| awk '{print $1}' | sort | uniq -c |sort -n</code> from the same directory as your access.log files.  The first column is the number of connections, the second is the IP making those connections.  IPs that start with 66.249 are the Googlebot.  If 66.249 is the last entry, and the number of connections is very high (over a thousand, say) and many times bigger than the number of connections for the second most frequent IP, you probably need to do something before the hosting company does something for you, like ban Google from accessing your site.<br />
<a name="2"></a>I&#8217;m not exactly sure whether it would be better to leave some things in but set them to zero priority; for now, I have the non-content stuff removed.   Really, the non-content pages should probably be excluded in robots.txt.</p>
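<p>The traffic check from the first note can also be done in Python if you&#8217;d rather not chain shell commands.  The sketch below assumes the client IP is the first whitespace-separated field of each log line, which is the same assumption the awk command makes.  The function names and the sample log lines are made up.</p>

```python
from collections import Counter


def requests_per_ip(log_lines):
    """Count requests per client IP, taking the IP to be the first field of each line."""
    counts = Counter(line.split()[0] for line in log_lines if line.strip())
    # Ascending by count, so the busiest IP comes last, as with `sort -n`.
    return sorted(counts.items(), key=lambda item: item[1])


def share_for_prefix(log_lines, prefix="66.249."):
    """Fraction of requests whose IP starts with the given prefix."""
    ranked = requests_per_ip(log_lines)
    total = sum(count for _, count in ranked)
    matching = sum(count for ip, count in ranked if ip.startswith(prefix))
    return matching / total if total else 0.0


# Made-up log lines in a common access log layout.
sample_log = [
    '66.249.66.1 - - [21/Sep/2007:02:00:01 +0000] "GET / HTTP/1.1" 200 512',
    '192.0.2.10 - - [21/Sep/2007:02:00:02 +0000] "GET /about/ HTTP/1.1" 200 734',
    '66.249.66.1 - - [21/Sep/2007:02:00:03 +0000] "GET /?p=12 HTTP/1.1" 200 901',
    '198.51.100.7 - - [21/Sep/2007:02:00:04 +0000] "GET /feed/ HTTP/1.1" 200 333',
    '66.249.66.1 - - [21/Sep/2007:02:00:05 +0000] "GET /?s=cats HTTP/1.1" 200 420',
]

for ip, count in requests_per_ip(sample_log):
    print(count, ip)
print("share from 66.249.*:", share_for_prefix(sample_log))
```

<p>To mimic <code>tail -10000</code> on a real log, you could pass <code>collections.deque(open("access.log"), maxlen=10000)</code> as the lines.</p>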
]]></content:encoded>
			<wfw:commentRss>http://tech.williamgunn.org/2007/09/21/write-a-google-sitemap-for-your-wordpress-blog/feed/</wfw:commentRss>
		<slash:comments>5</slash:comments>
		</item>
		<item>
		<title>I&#8217;ve had enough.</title>
		<link>http://tech.williamgunn.org/2007/08/25/ive-had-enough/</link>
		<comments>http://tech.williamgunn.org/2007/08/25/ive-had-enough/#comments</comments>
		<pubDate>Sun, 26 Aug 2007 02:07:07 +0000</pubDate>
		<dc:creator>admin</dc:creator>
				<category><![CDATA[asides]]></category>
		<category><![CDATA[blogging]]></category>
		<category><![CDATA[wordpress]]></category>

		<guid isPermaLink="false">http://tech.williamgunn.org/2007/08/25/ive-had-enough/</guid>
		<description><![CDATA[I&#8217;m sticking with the default lame-ass Kubrick theme, as it seems to be the only one that plug-in developers test against, and I don&#8217;t have time to mess around editing the template to fix one thing while breaking another. EDIT: I couldn&#8217;t resist, I&#8217;m trying K2]]></description>
			<content:encoded><![CDATA[<p>I&#8217;m sticking with the default lame-ass Kubrick theme, as it seems to be the only one that plug-in developers test against, and I don&#8217;t have time to mess around editing the template to fix one thing while breaking another.</p>
<p><strong>EDIT: I couldn&#8217;t resist, I&#8217;m trying K2</strong></p>
]]></content:encoded>
			<wfw:commentRss>http://tech.williamgunn.org/2007/08/25/ive-had-enough/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Google Documents and WordPress</title>
		<link>http://tech.williamgunn.org/2007/06/27/google-documents-and-wordpress/</link>
		<comments>http://tech.williamgunn.org/2007/06/27/google-documents-and-wordpress/#comments</comments>
		<pubDate>Wed, 27 Jun 2007 17:26:27 +0000</pubDate>
		<dc:creator>admin</dc:creator>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[bad]]></category>
		<category><![CDATA[dissertation]]></category>
		<category><![CDATA[google]]></category>
		<category><![CDATA[google documents]]></category>
		<category><![CDATA[wordpress]]></category>

		<guid isPermaLink="false">http://tech.williamgunn.org/2007/06/27/google-documents-and-wordpress/</guid>
		<description><![CDATA[My dissertation post is here. When I edit the document at Google Documents and republish, it overwrites the post, so any explanatory text or tags are lost. One thing that is a little annoying is how it tries to take over the right-click context menu. I end up with the Google Document right-click menu opening [...]]]></description>
			<content:encoded><![CDATA[<p>My dissertation post is <a href="http://www.synthesis.williamgunn.org/2007/06/27/111/">here</a>.   When I edit the document at Google Documents and republish, it overwrites the post, so any explanatory text or tags are lost.  One thing that is a little annoying is how it tries to take over the right-click context menu.  I end up with the Google Documents right-click menu opening up, with the Firefox context menu on top of it, obscuring the top half of the Google menu.</p>
<p>I would have thought Google would have known better than to try to subvert such an important browser function.  Bad Google, Bad!</p>
<p>The good: revision control, easy collaboration, seamless output to many formats, rich editing features.<br />
The bad: post metadata isn&#8217;t preserved, non-standard browser UI, no way (that I know of) to put the post on a separate page.</p>
<p>Maybe I could get the best of both by sticking the RSS feed of revisions on a separate WP page.</p>
]]></content:encoded>
			<wfw:commentRss>http://tech.williamgunn.org/2007/06/27/google-documents-and-wordpress/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Keywords work now, and editing works.  Now to get widgets figured out.</title>
		<link>http://tech.williamgunn.org/2007/06/13/keywords-work-now-and-editing-works-now-to-get-widgets-figured-out/</link>
		<comments>http://tech.williamgunn.org/2007/06/13/keywords-work-now-and-editing-works-now-to-get-widgets-figured-out/#comments</comments>
		<pubDate>Wed, 13 Jun 2007 13:19:56 +0000</pubDate>
		<dc:creator>admin</dc:creator>
				<category><![CDATA[asides]]></category>
		<category><![CDATA[blogging]]></category>
		<category><![CDATA[theme]]></category>
		<category><![CDATA[tiga]]></category>
		<category><![CDATA[widgets]]></category>
		<category><![CDATA[wordpress]]></category>

		<guid isPermaLink="false">http://tech.williamgunn.org/2007/06/13/keywords-work-now-and-editing-works-now-to-get-widgets-figured-out/</guid>
		<description><![CDATA[Deleting posts from the manage page doesn&#8217;t work, but deleting from the edit entry page does work. There are about 10 support threads at WordPress for this, but no resolution. The ones where it was a rights issue have been figured out, but not the weird behavior of the manage page. Because the widgets work [...]]]></description>
			<content:encoded><![CDATA[<p>Deleting posts from the manage page doesn&#8217;t work, but deleting from the edit entry page does work.  There are about 10 support threads at WordPress for this, but no resolution.  The ones where it was a rights issue have been figured out, but not the weird behavior of the manage page.</p>
<p>Because the widgets work in the default theme, but not in Tiga, there must be some weirdness with the theme, but I should be able to paste the widget code into sidebar.php in the theme directory.</p>
<p>I don&#8217;t think wp-admin/widgets.php works with Tiga, because it expects wp-content/plugins/widgets.php.  I&#8217;ll have to check that soon, and in the meantime, I could probably just paste the code in.</p>
<p>Replacing Tiga&#8217;s sidebar.php with the default theme&#8217;s works, but the formatting is screwed up.  I need to figure out what parts of the default sidebar need to be reproduced in Tiga&#8217;s.</p>
]]></content:encoded>
			<wfw:commentRss>http://tech.williamgunn.org/2007/06/13/keywords-work-now-and-editing-works-now-to-get-widgets-figured-out/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>I&#8217;m having issues with my old theme, Tiga, and the new wordpress.</title>
		<link>http://tech.williamgunn.org/2007/06/12/im-having-issues-with-my-old-theme-tiga-and-the-new-wordpress/</link>
		<comments>http://tech.williamgunn.org/2007/06/12/im-having-issues-with-my-old-theme-tiga-and-the-new-wordpress/#comments</comments>
		<pubDate>Wed, 13 Jun 2007 00:22:56 +0000</pubDate>
		<dc:creator>admin</dc:creator>
				<category><![CDATA[asides]]></category>
		<category><![CDATA[blogging]]></category>
		<category><![CDATA[keywords]]></category>
		<category><![CDATA[painintheass]]></category>
		<category><![CDATA[RSS]]></category>
		<category><![CDATA[sidebar]]></category>
		<category><![CDATA[theme]]></category>
		<category><![CDATA[upgrade]]></category>
		<category><![CDATA[widgets]]></category>
		<category><![CDATA[wordpress]]></category>

		<guid isPermaLink="false">http://tech.williamgunn.org/2007/06/12/im-having-issues-with-my-old-theme-tiga-and-the-new-wordpress/</guid>
		<description><![CDATA[Some things don&#8217;t work until I get this figured out. Specifically, the Jerome&#8217;s Keywords plugin doesn&#8217;t work with Tiga and WordPress 2.2 together, though it works with either alone. Many of the fancier sidebar widgets don&#8217;t work, like the one that displays RSS feeds. Not only that, but I can&#8217;t delete posts.]]></description>
			<content:encoded><![CDATA[<p>Some things don&#8217;t work until I get this figured out.</p>
<p>Specifically, the Jerome&#8217;s Keywords plugin doesn&#8217;t work with Tiga and WordPress 2.2 together, though it works with either alone.<br />
Many of the fancier sidebar widgets don&#8217;t work, like the one that displays RSS feeds.</p>
<p>Not only that, but I can&#8217;t delete posts.</p>
]]></content:encoded>
			<wfw:commentRss>http://tech.williamgunn.org/2007/06/12/im-having-issues-with-my-old-theme-tiga-and-the-new-wordpress/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>

