JasonDaly.name

PHP, Ruby, Symfony, Rails, Doctrine, MooTools. Web Development.

Posts tagged with "spellcheck"

August 9, 2011

Spelling Suggestions from Google in Ruby

Google used to offer a SOAP API for spelling suggestion/correction but put it out of service in November 2010. Since then, the only reliable way I had found to get Google’s recommended spelling for a misspelled phrase was through the same interface their toolbar browser extension uses to correct spelling mistakes. My SpellCheck project is a PHP5 tool that asks Google for its suggestion for a given phrase the same way the toolbar does; based on the response, the original phrase is parsed and updated to reflect the recommended changes.

How the Google Toolbar Used to Make Replacements

The Google toolbar sends an HTTP POST request to https://www.google.com/tbproxy/spell?lang=en&hl=en, originally containing an XML body as shown below (note: appls and ornages is the phrase being queried).

<?xml version="1.0" encoding="utf-8" ?>  
  <spellrequest textalreadyclipped="0" ignoredups="0" ignoredigits="1" ignoreallcaps="1">  
    <text>appls and ornages</text>  
  </spellrequest>
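As a rough illustration of the request described above, here is a minimal Ruby sketch that builds the same XML body and prepares the POST without sending it. The helper names build_spell_request_body and build_spell_request are hypothetical, and the text/xml content type is my assumption rather than something the toolbar is known to send.

```ruby
require 'net/http'
require 'uri'

# Endpoint quoted in the post. The request below is only built, never sent.
SPELL_URI = URI('https://www.google.com/tbproxy/spell?lang=en&hl=en')

# Hypothetical helper: wraps a phrase in the <spellrequest> envelope shown above.
# String#encode(xml: :text) escapes &, < and > so the phrase stays valid XML.
def build_spell_request_body(phrase)
  <<~XML
    <?xml version="1.0" encoding="utf-8" ?>
    <spellrequest textalreadyclipped="0" ignoredups="0" ignoredigits="1" ignoreallcaps="1">
      <text>#{phrase.encode(xml: :text)}</text>
    </spellrequest>
  XML
end

# Hypothetical helper: prepares the POST request object for the phrase.
def build_spell_request(phrase)
  request = Net::HTTP::Post.new(SPELL_URI)
  request['Content-Type'] = 'text/xml' # assumed content type
  request.body = build_spell_request_body(phrase)
  request
end

puts build_spell_request('appls and ornages').body
```

Keeping the body builder separate from the request makes it easy to inspect the XML before anything goes over the wire.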

The response would also come back as XML, with a body something like

<?xml version="1.0" encoding="utf-8" ?>
  <suggestions>
    <c o="0" l="5">apples\tapple\tapps</c>
    <c o="10" l="7">oranges\torange</c>
  </suggestions>

Each c node contains an o attribute, which is the starting offset of a word to be replaced, and an l attribute, which is the length of the original word. The text content of each c node is a tab-delimited list of suggestions, ordered by how likely each one is to be what you really meant to type.
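To make the offset and length handling concrete, here is a small Ruby sketch of how such a response could be applied to the original phrase. It is not the SpellCheck implementation: apply_suggestions is a hypothetical helper, it uses REXML rather than anything from the PHP project, it always takes the first suggestion, and it assumes o and l count characters rather than bytes.

```ruby
require 'rexml/document'

# Hypothetical helper: replaces each flagged word with its first suggestion.
# Assumes the o/l attributes are character offsets and lengths.
def apply_suggestions(phrase, response_xml)
  doc = REXML::Document.new(response_xml)

  corrections = REXML::XPath.match(doc, '//c').map do |node|
    {
      offset: node.attributes['o'].to_i,
      length: node.attributes['l'].to_i,
      best: node.text.to_s.split("\t").first # suggestions are tab-delimited
    }
  end

  corrected = phrase.dup
  # Work from the end of the phrase backwards so that a replacement of a
  # different length does not shift the offsets still waiting to be applied.
  corrections.sort_by { |c| -c[:offset] }.each do |c|
    next if c[:best].nil?
    corrected[c[:offset], c[:length]] = c[:best]
  end
  corrected
end

# The \t escapes become real tab characters inside this heredoc.
response = <<~XML
  <?xml version="1.0" encoding="utf-8" ?>
  <suggestions>
    <c o="0" l="5">apples\tapple\tapps</c>
    <c o="10" l="7">oranges\torange</c>
  </suggestions>
XML

puts apply_suggestions('appls and ornages', response) # prints "apples and oranges"
```

Applying the corrections right to left is the important design choice here: "apples" is one character longer than "appls", so going left to right would leave the offset for "ornages" pointing one character too early.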

Google’s HTTP responses no longer contain o or l attributes, suggesting their toolbar does a bit more work to determine where replacements should be made based on the suggestions returned.

Moving to Ruby and Another Solution

While rewriting an application in Ruby, I needed to find a different way to get suggestions from Google. Using Nokogiri’s CSS selector support, this proved to be trivial.

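require 'cgi'
require 'open-uri'
require 'nokogiri'
require 'awesome_print'

# ARGV is an array, so join the arguments into one phrase before escaping.
query = CGI.escape(ARGV.join(' '))
# URI.open replaces the bare open call, which Ruby 3.0+ no longer supports for URLs.
doc = Nokogiri::HTML(URI.open("http://www.google.com/search?q=#{query}"))
# The suggestion is the first link matched by this selector on the results page.
nodes = doc.css('#topstuff p a')

ap nodes[0].content if nodes.length > 0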

(Note: You can put this in spell_suggest.rb and run ruby spell_suggest.rb "appls and ornages")

Here the main Google search results page is queried and parsed instead of working with the Google toolbar. An added benefit of this solution is that the suggestions seem to account for context better than the toolbar’s. The toolbar’s suggestions seem to inspect and process each word independently, whereas the main search page accounts for the entire phrase. A word that alone might be considered misspelled may make perfect sense in context. For example, my last name Daly might come back from the toolbar with the suggestion Daily when the query is Jason Daly, whereas the Ruby solution leveraging Google’s main search page returns no suggestion.

Tags: ruby nokogiri open-uri google code php spelling spellcheck

May 8, 2010

Introducing SpellCheck

After learning that GitHub now offers SVN support (nearly all of my development work is done using SVN), I decided it was time to properly version my small changes to the great gRaphael library by forking the original code with my own account. I also decided to start maintaining the smaller utilities I write for personal use through git on GitHub. The first of these packages is \Deefour\SpellCheck.

SpellCheck v1.0

Introducing “SpellCheck v1.0”. As mentioned in the README,

SpellCheck leverages “…the XML request/response used by the Google Toolbar…” accepting “… a string to be transformed into a corrected version of itself.”

This class simply makes all corrections suggested by Google to the original string passed in. Admittedly, this is not as flexible as some will like, but for now it suits my needs and is a great start. Some small points:

Usage instructions and code can be found in the v1.0 Tag on GitHub.

Tags: php spellcheck releases v1.0 code github 5.3 php5.3