JasonDaly.name

PHP, Ruby, Symfony, Rails, Doctrine, MooTools. Web Development.

August 9, 2011

Spelling Suggestions from Google in Ruby

Google used to offer a SOAP API for spelling suggestions and corrections, but took it out of service in November 2010. Since then, the only reliable way I had found to get Google's recommended spelling for a misspelled phrase was through the same interface their toolbar browser extension uses to correct spelling mistakes. My SpellCheck project is a PHP5 tool that asks Google for its suggestions on a given phrase the same way the toolbar does. Based on the response, it then parses the original phrase and updates it with the recommended changes.

How the Google Toolbar Used to Make Replacements

The Google toolbar sends an HTTP POST request to https://www.google.com/tbproxy/spell?lang=en&hl=en, which originally carried an XML body like the one below (Note: appls and ornages is the phrase being queried).

<?xml version="1.0" encoding="utf-8" ?>  
  <spellrequest textalreadyclipped="0" ignoredups="0" ignoredigits="1" ignoreallcaps="1">  
    <text>appls and ornages</text>  
  </spellrequest>
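For illustration, here is a minimal Ruby sketch of how that request could be built with the standard library's Net::HTTP. The helper names build_spell_request and send_spell_request are hypothetical, and so is the Content-Type header, which is an assumption rather than something taken from the toolbar. Since the tbproxy endpoint has been retired, treat the sending half as historical rather than something to run today.

```ruby
require 'net/http'
require 'uri'

# Endpoint quoted above; Google has since retired it, so do not expect a
# useful response from it today.
SPELL_URI = URI('https://www.google.com/tbproxy/spell?lang=en&hl=en')

# Hypothetical helper: wraps a phrase in the <spellrequest> envelope
# shown above and returns an unsent POST request.
def build_spell_request(phrase)
  # Escape &, < and > so arbitrary input keeps the XML well-formed.
  escaped = phrase.encode(xml: :text)
  body = <<~XML
    <?xml version="1.0" encoding="utf-8" ?>
    <spellrequest textalreadyclipped="0" ignoredups="0" ignoredigits="1" ignoreallcaps="1">
      <text>#{escaped}</text>
    </spellrequest>
  XML

  request = Net::HTTP::Post.new(SPELL_URI)
  # Assumed content type; the toolbar's actual headers are not documented here.
  request['Content-Type'] = 'text/xml; charset=utf-8'
  request.body = body
  request
end

# Hypothetical helper: sends the request over TLS and returns the raw
# XML response body.
def send_spell_request(request)
  Net::HTTP.start(SPELL_URI.host, SPELL_URI.port, use_ssl: true) do |http|
    http.request(request).body
  end
end
```

Keeping construction separate from sending means the envelope can be inspected (or logged) without touching the network.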

The response also came back as XML, with a body something like:

<?xml version="1.0" encoding="utf-8" ?>
  <suggestions>
    <c o="0" l="5">apples\tapple\tapps</c>
    <c o="10" l="7">oranges\torange</c>
  </suggestions>

Each c node has an o attribute, the offset where the word to be replaced starts, and an l attribute, the length of that original word. The text content of each c node is a tab-delimited list of suggestions, ordered by how likely each one is to be what you actually meant to type.
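To make the o/l scheme concrete, here is a small sketch that parses a response like the one above with REXML (which ships with Ruby) and swaps in the first suggestion for each flagged word. It assumes the offsets count characters in the submitted phrase; parse_corrections, apply_corrections and SAMPLE_RESPONSE are hypothetical names, not part of the SpellCheck project.

```ruby
require 'rexml/document'

# Hypothetical sample in the old response format. Inside an unquoted
# heredoc, "\t" becomes a real tab, matching the tab-delimited lists.
SAMPLE_RESPONSE = <<~XML
  <?xml version="1.0" encoding="utf-8" ?>
  <suggestions>
    <c o="0" l="5">apples\tapple\tapps</c>
    <c o="10" l="7">oranges\torange</c>
  </suggestions>
XML

# Hypothetical helper: turns each <c> node into [offset, length, suggestions].
# Assumes o and l count characters in the phrase that was submitted.
def parse_corrections(xml)
  doc = REXML::Document.new(xml)
  REXML::XPath.match(doc, '//c').map do |c|
    [c.attributes['o'].to_i, c.attributes['l'].to_i, c.text.to_s.split("\t")]
  end
end

# Hypothetical helper: replaces each flagged span with its first (most
# likely) suggestion. Working from the highest offset down keeps the
# remaining offsets valid when a replacement changes the phrase's length.
def apply_corrections(phrase, corrections)
  corrections.sort_by { |c| -c[0] }.reduce(phrase.dup) do |text, (offset, length, suggestions)|
    # Leave the word alone if Google offered nothing for it.
    next text if suggestions.empty?

    text[offset, length] = suggestions.first
    text
  end
end

puts apply_corrections('appls and ornages', parse_corrections(SAMPLE_RESPONSE))
# prints "apples and oranges"
```

Replacing from the end of the phrase matters: if appls were replaced with apples first, ornages would no longer start at offset 10.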

Google’s HTTP responses no longer contain o or l attributes, suggesting their toolbar does a bit more work to determine where replacements should be made based on the suggestions returned.

Moving to Ruby and Another Solution

While rewriting an application in Ruby, I needed to find a different way to get suggestions from Google. Thanks to Nokogiri's CSS selector support, this proved to be trivial.

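require 'cgi'
require 'open-uri'
require 'nokogiri'
require 'awesome_print'

# Join the command-line arguments into one phrase and URL-encode it
query = CGI.escape(ARGV.join(' '))

# Fetch and parse Google's main search results page
# (URI.open replaces Kernel#open for URLs as of Ruby 3.0)
doc = Nokogiri::HTML(URI.open("http://www.google.com/search?q=#{query}"))

# The suggested spelling is a link inside the #topstuff block
nodes = doc.css('#topstuff p a')

ap nodes.first.content unless nodes.empty?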

(Note: You can save this as spell_suggest.rb and run ruby spell_suggest.rb "appls and ornages")

Instead of going through the Google toolbar's interface, this queries and parses the main Google search results page. An added benefit of this approach is that its suggestions seem to account for context better than the toolbar's. The toolbar appears to inspect and process each word independently, whereas the main search page considers the entire phrase. A word that looks misspelled on its own may make perfect sense in context. For example, the toolbar might suggest Daily for my last name Daly when the query is Jason Daly, while the Ruby solution leveraging Google's main search page returns no suggestion.

Tags: ruby nokogiri open-uri google code php spelling spellcheck