First look: Google Cloud Machine Learning soars

Four rich, pretrained machine learning APIs bring the smarts behind Google to your apps

1 2 Page 2
Page 2 of 2

Beyond that, the translation API is straightforward. Supply the source and target language codes, as many source strings as you wish, and your API key; optionally, specify the output format. Options include HTML or plain-text output, pretty printing (using indentations and line breaks), and a callback function.
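As a rough illustration of those parameters, here is a minimal Python sketch that only assembles a request URL and does not call the service. The endpoint and the parameter names (`key`, `source`, `target`, `q`, `format`, `prettyprint`, `callback`) are assumptions based on the v2 REST interface and should be checked against the current documentation; `build_translate_url` is a hypothetical helper, not part of any Google library.

```python
from urllib.parse import urlencode

# Assumed v2 REST endpoint; verify against the current Translate API docs.
TRANSLATE_ENDPOINT = "https://translation.googleapis.com/language/translate/v2"


def build_translate_url(api_key, source, target, strings,
                        fmt="text", pretty=False, callback=None):
    """Assemble a GET URL for a translation request (hypothetical helper)."""
    params = [("key", api_key), ("source", source), ("target", target)]
    # One "q" parameter per source string; several strings may share a request.
    params += [("q", s) for s in strings]
    params.append(("format", fmt))  # assumed values: "html" or "text"
    params.append(("prettyprint", "true" if pretty else "false"))
    if callback:
        params.append(("callback", callback))
    return TRANSLATE_ENDPOINT + "?" + urlencode(params)


url = build_translate_url(
    "YOUR_API_KEY", "en", "ru",
    ["Good, fresh Russian black bread", "Good morning"],
)
print(url)
```

Keeping URL construction separate from the network call makes it easy to inspect exactly what would be sent before spending any quota.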


Here we are testing the English to Russian pair in the Translate API. The English source “Good, fresh Russian black bread” translates to a textbook Russian phrase that uses all of the adjective endings. Only the placement of “русский” (Russian) at the end of the sentence differs from the textbook version of the phrase.

In many cases, such as the example above, the translation will be of high quality. In others, such as the example below, the translation will fail spectacularly.

By way of comparison, Haven OnDemand can currently identify 85 languages and perform sentiment analysis on them, but cannot translate them. Azure Cognitive Services can detect sentiment and key phrases in four languages, but cannot do translations, though Bing translations are almost as common on the web as Google translations.


Here we attempt to translate Genesis 1:2 from Hebrew to English. An acceptable translation would be something close to “Now the earth was unformed and void, and darkness was upon the face of the deep; and the spirit of God hovered over the face of the waters.” The machine translation, “And Hartz, Hith Tho and Bho, and Hsc, upon-Fni pit; And Roh god, Mrhft upon-Fni Hmim,” is ludicrous. Strangely, some of the phrases making up this line of Biblical Hebrew are translated well by the API when isolated; for example, “תֹהוּוָבֹהוּ”, tohu v’bohu, a little like the English “topsy-turvy,” is correctly translated as “chaos.” I honestly thought that Google Translate would do OK with this line. (Yes, Hebrew is a completely supported language.) I think it was confused by the grammar near the beginning of the line; a human would have put the correct words together, but the machine couldn't.

Google Cloud Vision API

The Google Cloud Vision API is a trained ML service for categorizing images and extracting various features. It can classify images into thousands of pretrained categories, ranging from generic objects and animals found in the image (such as a cat), to general conditions (for example, dusk), to specific landmarks (Eiffel Tower, Grand Canyon), and identify general properties of the image, such as its dominant colors. It can isolate areas that are faces, then apply geometric (facial orientation and landmarks) and emotional analyses to the faces, although it does not recognize faces as belonging to specific people. The Vision API can also read and extract text from images in some 10 languages, identify product logos, and detect adult, violent, and medical content.

You can construct a JSON request for the Cloud Vision API that either contains the image (in base64 format) or points to the image (in a Google Cloud Storage bucket). The request also needs to contain a list of the features you want to extract, along with the maximum number of items to return for each feature. You can request processing of multiple images in one call, but you’d risk running into the total size limitation.
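A request body along those lines might be assembled as in the Python sketch below. The field names (`requests`, `image.content`, `image.source.gcsImageUri`, `features`, `type`, `maxResults`) and the `LABEL_DETECTION` feature type are assumptions drawn from the v1 REST interface and should be verified against the API reference; `build_vision_request` is a hypothetical helper, and nothing here contacts the service.

```python
import base64
import json


def build_vision_request(features, image_bytes=None, gcs_uri=None):
    """Build a request body for one image (hypothetical helper).

    features: list of (feature_type, max_results) pairs.
    Exactly one of image_bytes or gcs_uri must be given.
    """
    if (image_bytes is None) == (gcs_uri is None):
        raise ValueError("pass exactly one of image_bytes or gcs_uri")
    if image_bytes is not None:
        # Inline image: raw bytes encoded as base64 text.
        image = {"content": base64.b64encode(image_bytes).decode("ascii")}
    else:
        # Reference to an object in a Google Cloud Storage bucket.
        image = {"source": {"gcsImageUri": gcs_uri}}
    feature_list = [{"type": t, "maxResults": n} for t, n in features]
    # "requests" is a list, so several images could share one call.
    return {"requests": [{"image": image, "features": feature_list}]}


body = build_vision_request([("LABEL_DETECTION", 5)], image_bytes=b"abc")
print(json.dumps(body, indent=2))
```

Because `requests` is a list, batching more images means appending more entries, which is also where the total size limitation would come into play.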

I managed to run into the size limit for single images on the first JPEG I tried on the service. I naively took a high-quality APS-C DSLR JPEG I had exported from my Lightroom catalog and tried getting a label for it using Python code (checked out from GitHub) from the Google Label Detection tutorial. After struggling with and solving some authentication issues, I got a mysterious Error 400 with “Request Admission Denied.” An email query to my contacts at Google got me the suggestion to look at the Best Practices for the Vision service; as it happens, my file was 6MB, and the limit is 4MB. I generated another version with a lower JPEG quality that was less than 4MB, and this time got a correct label back from the service.
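A pre-flight size check could catch this kind of failure before the request is sent. The Python sketch below assumes the 4MB limit means 4 × 1024 × 1024 bytes and applies to the image file itself; both points are assumptions to confirm in the best-practices documentation. The helper names are hypothetical.

```python
# Assumed interpretation of the 4MB per-image limit mentioned above.
VISION_IMAGE_LIMIT_BYTES = 4 * 1024 * 1024


def fits_vision_limit(num_bytes, limit=VISION_IMAGE_LIMIT_BYTES):
    """Return True if an image of num_bytes is within the assumed limit."""
    return num_bytes <= limit


def base64_length(num_bytes):
    """Length in characters of the padded base64 encoding of num_bytes bytes."""
    # Base64 emits 4 characters for every 3 input bytes, rounding up.
    return 4 * ((num_bytes + 2) // 3)


six_mb = 6 * 1024 * 1024
print(fits_vision_limit(six_mb))   # the 6MB file from the text is too large
print(base64_length(six_mb))       # inline encoding is about a third larger
```

In practice you would pass `os.path.getsize(path)` for a file on disk. The base64 figure matters when several inline images share one request, since the encoded form is about a third larger than the files themselves.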

Google suggests several applications for the Vision API. One is to catalog your image collection because not everyone faithfully adds keyword tags to all their images, and not every cloud photo service retains EXIF data in uploaded images. Another is to detect and moderate offensive content in images. (No, I did not try to test that myself. Google has plenty of experience filtering offensive material from image searches.)


Vision Explorer is a demo of the Google Cloud Vision API that processed 80,000 images from the Wikimedia Commons to extract all of the features that the Vision API can produce. In this particular case, the API successfully labeled the image as “architecture” with a 75 percent probability, and “lighting” and “dusk” with lower probabilities. It also recognized faces on the statues in the frieze and characterized each one.

Further suggested applications include tasks like finding your logo in images on social media, detecting emotions from faces in those images, and automatically extracting text from selected images. If you wanted to get wild and crazy, you might pipe all of the non-English text retrieved into the Cloud Translate service and analyze the sentiment of all of the OCR’d text using the Cloud Natural Language API.

Among the competition, Haven OnDemand offers four image analysis services: bar-code recognition, face detection, corporate logo recognition, and OCR. The Google Cloud Vision API doesn’t do bar-code recognition, but returns more information about detected faces, recognizes many more items than logos, and has a more mature OCR implementation. (HPE’s OCR is still in preview.)

IBM Bluemix offers a Watson Visual Recognition service that does general classification, face detection, text extraction (English-only, beta), and visual training and tagging. Google Cloud Vision is better at the first three (and offers more capabilities in those areas), but doesn’t do training. Visual training is something you can do with Google TensorFlow now and should be able to do with the Cloud Machine Learning Platform when it is available to the public.

Microsoft Azure Cognitive Services has Face and Emotion APIs that are currently in preview. The Face API does face detection, verification, identification, grouping, and similar face searching; the Emotion API classifies the mood of faces detected by the Face API. These two APIs together provide a subset of the capabilities of Google Cloud Vision.

Machine learning at your service

As we’ve seen, the four Google applied machine learning APIs discussed -- the beta natural-language processing and speech-to-text APIs and the production language translation and vision classification APIs -- are based on engines that have long histories of production use at Google, with millions of requests served for consumer-facing services. In most cases, a given feature of the Cloud Machine Learning APIs will perform as well as or better than competitive APIs from HPE, IBM, and Microsoft, and will have more options.

For example, Google Cloud Speech (in beta) transcribes more than 80 languages and variants; its nearest competitor, Microsoft Bing Speech Recognition, supports 28 languages and variants. The accuracy? Well, it’ll depend as much on the conditions as the service, but my experience with Google voice search and Cortana gives a slight nod to Google.

Nevertheless, as the car ads say in fine print, your mileage may vary. If you’re considering using natural-language processing, speech-to-text, translation, or vision APIs, the Google Cloud Machine Learning services are worth testing in your application and on your data.

The dollar cost of trying them out is minimal because of the free monthly service allowances. The effort to try them out is fairly low. I learned to use all four APIs and all three kinds of authentication over a weekend, and I had only one glitch, which I would have been able to avoid had I read the best-practices documentation before trying the vision API.

Google Cloud Machine Learning pricing 

Cloud Natural Language API: Priced per feature per thousand records, ranging from 25 cents to $2 depending on feature and quantity; first 50,000 records per month are free

Cloud Speech API: 0.6 cents per 15 seconds; first 60 minutes per month are free

Cloud Translate API: $20 per 1 million characters of text for translation, plus $20 per 1 million characters of text for language detection

Cloud Vision API: Priced per feature and prices drop with increased usage; prices range from 60 cents per thousand features to $5 per thousand features; first 1,000 requests per month are free

Cloud storage (per gigabyte per month): Standard Storage 2.6 cents, Durable Reduced Availability (DRA) Storage 2 cents, Nearline Storage 1 cent
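To get a feel for what these rates mean in practice, here is a small Python sketch that estimates monthly Speech and Translate charges from the figures listed above. It assumes that speech usage is billed in whole 15-second increments rounded up and that the free 60 minutes are deducted from the monthly total; neither detail is stated in the price list, so treat the results as rough estimates. The function names are hypothetical.

```python
import math

SPEECH_USD_PER_15_SECONDS = 0.006   # 0.6 cents
SPEECH_FREE_SECONDS = 60 * 60       # first 60 minutes per month
TRANSLATE_USD_PER_MILLION = 20.0    # same rate for translation and detection


def speech_cost_usd(seconds_in_month):
    """Estimate the monthly Speech charge (rounding rule is an assumption)."""
    billable = max(0, seconds_in_month - SPEECH_FREE_SECONDS)
    increments = math.ceil(billable / 15)
    return increments * SPEECH_USD_PER_15_SECONDS


def translate_cost_usd(characters, with_detection=False):
    """Estimate the Translate charge; language detection doubles the rate."""
    cost = characters * TRANSLATE_USD_PER_MILLION / 1_000_000
    return cost * 2 if with_detection else cost


print(speech_cost_usd(2 * 60 * 60))      # two hours of audio in a month
print(translate_cost_usd(500_000))       # half a million characters
print(translate_cost_usd(500_000, True)) # same text with detection added
```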

This story, "First look: Google Cloud Machine Learning soars" was originally published by InfoWorld.
