What recruiters know about you is about to get a whole lot deeper than what you put on your resume. An emerging class of search engines is taking a big data approach to recruiting by crawling the Web for every bit of data about you, assembling it into a master profile, rating your knowledge, skill levels and interests, and serving it up to recruiters who can filter it by location, skill, the school you attended and a range of other criteria.
Today the technology is mainly used as a tool for finding scarce software development talent. But its use could broaden to other types of jobs, including high-tech, legal, medical and engineering positions, according to the vendors and an analyst.
What Gild's algorithms tell recruiters about you
Here are two examples of Gild's algorithms for evaluating skills and knowledge based on what the company finds out about a subject on the Web. The company has created more than 50,000 of these "features," or rules, for its Gild Source service.
Using Bayesian analysis, Gild claims it can predict how skilled a subject might be even when little data is available, such as when a person has no open-source code available for evaluation.
Raw data: In an online profile you describe yourself as proficient in C/C++.
Conclusion: You're not very good at either.
Logic: C and C++ are completely different languages. So why would you lump them together? Listing them together indicates that you may have just put them in as checklist items.
Raw data: On Twitter you recently said that Celery sucks.
Conclusion: You have knowledge of Python, Django and Celery.
Logic: The fact that you dislike the asynchronous processing toolkit, written in Python and used extensively in Python Web development, means you're not only familiar with Celery but almost certainly are knowledgeable about Python and Django, with which Celery is commonly used.
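The two rules above can be pictured as small scoring functions. The Python sketch below is purely illustrative. The function names, the regular expression, the penalty value and the skill map are all hypothetical, because Gild has not published how its 50,000-plus features are implemented.

```python
import re

# Hypothetical rule weight; Gild's actual values are not public.
LUMPED_LISTING_PENALTY = -0.2

# Matches "C/C++" or "C, C++" in a self-description.
LUMPED_C_CPP = re.compile(r"\bC\s*[/,]\s*C\+\+", re.IGNORECASE)

# Hypothetical "commonly used with" map: a mention of the key implies the values.
IMPLIED_SKILLS = {
    "celery": {"python", "django"},
    "django": {"python"},
}


def c_cpp_checklist_feature(profile_text: str) -> float:
    """Penalize C and C++ listed as a single item, a possible checklist entry."""
    return LUMPED_LISTING_PENALTY if LUMPED_C_CPP.search(profile_text) else 0.0


def implied_skills_feature(mentioned: set[str]) -> set[str]:
    """Expand mentioned tools into implied skills, following the map transitively.

    Sentiment is deliberately ignored: "Celery sucks" is still evidence
    of familiarity with Celery.
    """
    known: set[str] = set()
    stack = [m.lower() for m in mentioned]
    while stack:
        skill = stack.pop()
        if skill not in known:
            known.add(skill)
            stack.extend(IMPLIED_SKILLS.get(skill, ()))
    return known


print(c_cpp_checklist_feature("Proficient in C/C++ and Java"))  # -0.2
print(c_cpp_checklist_feature("Proficient in C++ templates"))   # 0.0
print(sorted(implied_skills_feature({"Celery"})))  # ['celery', 'django', 'python']
```

Keeping each rule as a separate small function mirrors the idea of many independent "features" whose outputs are combined later.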
Last year Red Hat hired more than 1,000 people. But it wasn't easy to find the software development and engineering talent needed to fill many of those seats. "We use LinkedIn Recruiter extensively," says CIO Lee Congdon.
But the top-notch talent that the open-source software developer is looking for doesn't always bother keeping an updated resume on LinkedIn or elsewhere, and many of the best software engineers don't need to look on job boards for a better position.
So this year, Red Hat decided to be more proactive. It began using a cloud-based service from Gild that takes a big data approach, mining the social Web to identify and evaluate qualified talent.
Working with Gild, Red Hat was able to quickly come up with a ranked list of prospective software engineering candidates, complete with contact information that in some cases Gild harvests from the prospect's source code. "We're very satisfied with the early results," Congdon says.
Red Hat tested the tool by scoring some known quantities: people it had already hired. In each case Gild Source's report accurately scored them as a good fit for the job. While Congdon declined to discuss specific hires, he says the correlation between traditional recruiting methods and Gild Source has been "notable."
In addition to identifying new prospects, Gild Source correctly flagged qualified candidates that Red Hat had already found using its traditional recruiting methods. The fact that its longer list included those same people validated the tool, in Congdon's eyes.
Comparing services
Gild, along with competitors RemarkableHire, TalentBin and Entelo, is part of an emerging niche of companies that mine social activity on the Web to help recruiters discover and evaluate skilled technical talent quickly, without waiting for qualified candidates to identify themselves by building and updating profiles on online job boards or LinkedIn.
How RemarkableHire processes your info
Raw data: Your Ruby repositories on GitHub have a large number of reputable followers.
Conclusion: You are skilled in Ruby development.
Logic: You are making contributions that the community deems valuable. If those followers are highly rated by RemarkableHire's algorithms, they carry even more weight, resulting in an even higher aptitude score.
Raw data: You tweet about Java frequently.
Conclusion: None.
Logic: One-way social contributions that lack a response from the community are meaningless. Some talented Java developers tweet about Java, but so do poor Java developers and recruiters looking to fill Java developer roles.
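Here is a minimal sketch of the weighting logic in these two examples, assuming a simple additive score. The `Contribution` class, the 0-to-1 responder ratings and the "one point per responder plus that responder's rating" formula are all hypothetical. RemarkableHire has not disclosed its actual scoring.

```python
from dataclasses import dataclass, field


@dataclass
class Contribution:
    """One piece of online activity tied to a skill (hypothetical model)."""
    skill: str
    # Ratings (0 to 1) already assigned to each community member who
    # responded, e.g. by following a repository. Empty for one-way posts
    # such as unanswered tweets.
    responder_ratings: list[float] = field(default_factory=list)


def aptitude(contributions: list[Contribution], skill: str) -> float:
    """Score a skill from community responses, not from raw activity."""
    score = 0.0
    for c in contributions:
        if c.skill == skill:
            # Each responder counts once, plus extra weight for their own
            # rating. A contribution with no responders adds nothing.
            score += sum(1.0 + r for r in c.responder_ratings)
    return score


work = [
    Contribution("ruby", [0.5, 0.75, 0.25]),  # repository with three followers
    Contribution("java"),                     # tweets that drew no response
    Contribution("java"),
]
print(aptitude(work, "ruby"))  # 4.5
print(aptitude(work, "java"))  # 0.0
```

However many one-way Java tweets are added, the Java score stays at zero. That matches the "Conclusion: None" verdict above.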
Gild has 6 million profiles. TalentBin claims to have "tens of millions," while RemarkableHire says "we are in the single-digit millions of complete/matched/merged profiles." But as with other types of search engines, says Scott Rothrock, president and co-founder of RemarkableHire, what matters is the ability to put the best possible matches on the first few pages of results.
It's best to take those numbers with a grain of salt, says Peter Kazanjy, CEO at TalentBin, because everyone defines a profile differently. Profiles may be incomplete, or information from different sources may not be matched up into a single profile.
Content from some sources, such as GitHub, may be crawled and fully indexed, while data from other sources simply establishes that, for example, the user has a Twitter account, without indexing or analyzing the subject's tweets. The profile record might include a link to the Twitter account but contain nothing to show that the person has been tweeting extensively about Ruby.
So when comparing services, it's important to understand not just which sites each service includes in its search results, but what it indexes and analyzes from those sites and what it doesn't.
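The gap between linking an account and indexing it can be sketched in a few lines of Python. Matching records on a shared email address is a simplifying assumption made here for illustration. Real services have to reconcile identities across sites that often share no common key, which is one reason some records never get merged. All data below is made up.

```python
from collections import defaultdict

# Hypothetical crawl output. "skills" is filled in only when a source's
# content was actually indexed and analyzed; otherwise the record merely
# establishes that an account exists.
records = [
    {"source": "github", "email": "ada@example.com", "indexed": True, "skills": {"ruby"}},
    {"source": "twitter", "email": "ada@example.com", "indexed": False, "skills": set()},
    {"source": "stackoverflow", "email": "bob@example.com", "indexed": True, "skills": {"python"}},
]


def merge_profiles(records: list[dict]) -> dict[str, dict]:
    """Group records that share an email address into one master profile."""
    profiles = defaultdict(lambda: {"sources": [], "skills": set()})
    for rec in records:
        profile = profiles[rec["email"]]
        profile["sources"].append(rec["source"])  # the link is always kept
        if rec["indexed"]:
            profile["skills"] |= rec["skills"]    # content only if indexed
    return dict(profiles)


merged = merge_profiles(records)
print(merged["ada@example.com"])
# {'sources': ['github', 'twitter'], 'skills': {'ruby'}}
```

Ada's merged profile links to her Twitter account, but nothing from her tweets reaches the skills list, because that source was never indexed.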
How TalentBin processes your info
Raw data: You have sent a number of messages to an Objective-C mailing list that mention Core Audio, Core Data and Core Animation.
Conclusion: You are familiar with iOS and Mac OS X development, especially the audio, data persistence and UI animation frameworks. As such, your experience would be relevant to rich iOS apps that deal with audio and stored user state.
Logic: Core Audio is the framework iOS uses for audio processing, Core Data is used to store and manage user data, and Core Animation is the toolkit that enables rich animations.
Raw data: You are a member of both the Quantified Self Meetup and Cassandra Users Group Meetup on Meetup.com, and have frequently RSVP'd to their events.
Conclusion: You would be an interesting candidate for a Fitbit, NikeFuel, RunKeeper or Jawbone-type wearable computing software engineering role.
Logic: As a member of the "Quantified Self" meetup, you have demonstrated an interest in the instrumentation of the human body, and as a member of the Cassandra User Group, you have shown an interest in a key tool used for the management and analysis of the "big data" that these various wearable computing companies create.
Raw data: You are listed as an inventor, with five others, on a patent filed in 2012, regarding VMware virtual machine memory handling and moving virtual machines across wide area networks.
Conclusion: You have experience with virtual machine memory, high-performance networking and virtualization, and worked at VMware recently.
Logic: As one of a small group of listed inventors, you were likely a key contributor to the project, and thus are familiar with the technology underlying the patent and, more broadly, with VMware as an organization.
These services are available by subscription: you pay for use of the tool, not per search or per name returned.
Prices range from $6,000 per year per seat for TalentBin, to $349/month for RemarkableHire, to $8,400/year ($700/month) for Gild Source. Gild also offers a 90-day license for $2,700 or $900/month.
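The plans are quoted in different units, so converting them to an annual figure makes them easier to compare. This short Python calculation assumes each price is per seat (the article states this only for TalentBin) and that monthly plans are paid for 12 months.

```python
# List prices as quoted, converted to dollars per year.
annual_cost = {
    "TalentBin": 6_000,          # quoted per year per seat
    "RemarkableHire": 349 * 12,  # quoted per month
    "Gild Source": 8_400,        # quoted per year ($700/month)
}

for vendor, cost in sorted(annual_cost.items(), key=lambda kv: kv[1]):
    print(f"{vendor}: ${cost:,}/year")

# Gild's 90-day license, expressed per month.
print(f"Gild 90-day license: ${2_700 / 3:,.0f}/month")
```

On these assumptions, RemarkableHire is the least expensive of the three at $4,188 a year. Gild's short-term license costs more per month than its annual plan ($900 versus $700).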
The startups are benefiting from a growing trend in recruiting. In response to the high demand for high-tech talent, many large organizations have assembled sourcing teams. These are specialized recruiting groups that look for highly qualified people, including "passive candidates" who aren't necessarily looking for a job, says Sarah White, principal strategist with Sarah White Associates, an analysis firm that specializes in recruiting technology.
She thinks the idea could spread well beyond just recruiting software engineers. "Two years ago these products didn't even exist, but we are already seeing it go beyond the developer and software engineering area" to other technical disciplines and even sales and marketing, she says.
While people in other positions tend to have a smaller online footprint than open source software developers, there's still plenty to mine, these vendors argue, both in social media and in other areas, such as patent databases for engineering roles and PubMed in the healthcare field.
Congdon is a believer. "It will be interesting to watch the dynamics in the marketplace," he says: "In the future, your online body of work will speak more loudly in the recruiting process than will your resume and interviewing skills."
Different approaches
At one level, all of the vendors in this space do the same thing. "The base technical approach is not dissimilar to that of a public search engine," says RemarkableHire's Rothrock. But their approaches vary, as do the online sites that each crawls. And the tools are evolving on a monthly basis, both in terms of features and the number of sites on the Web that each monitors.
To identify potential candidates who are about to start looking for a new job, Entelo looks for "social insights" ranging from layoff announcements to changes to a person's social profile.
Gild Source's stock in trade lies in its rankings of developers' code stored on open source sites. "We predictively pull data only on developers," says Dr. Vivienne Ming, chief scientist.
Gild Source's service continuously crawls 65 social sites on the Web, including GitHub and Stack Overflow, where developers might hang out, answer questions and contribute code. It pulls in all of the data it finds, processes it, stores the results in a 20-plus gigabyte MongoDB database, and assembles the far-flung data into more than eight million individual profiles that include both structured and unstructured data. Users of the service, companies looking to fill jobs, can filter results by categories such as location, degree or school, and can link back to code examples.
The results Gild Source offers take into account both how other people in online forums rank each person's expertise and Gild Source's own evaluation of the code that person has written for open source projects. The service then issues an overall knowledge score as well as rankings for specific skills and for influence in the open source community.
For developers who don't contribute code to open source projects, Gild Source has developed predictive algorithms using Bayesian analysis. "We are deeply machine learning-driven," says Ming. "We can predict someone's skill level from the surrounding information. It's highly effective."
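Gild hasn't described its models in detail, but a naive Bayes calculation can illustrate the general idea of inferring skill from indirect evidence. In this Python sketch the skill levels, the signal names and every probability are invented. The only point is to show how a prior belief is updated by each observed signal and then normalized.

```python
# Hypothetical prior over skill levels for someone with no public code.
PRIOR = {"novice": 0.3, "intermediate": 0.5, "expert": 0.2}

# Hypothetical likelihoods: P(signal is observed | skill level).
LIKELIHOOD = {
    "accepted_answers_online": {"novice": 0.05, "intermediate": 0.3, "expert": 0.7},
    "attends_language_meetups": {"novice": 0.2, "intermediate": 0.4, "expert": 0.5},
}


def posterior(signals: list[str]) -> dict[str, float]:
    """Naive Bayes update: treat signals as independent given the skill level."""
    unnormalized = {}
    for level, prob in PRIOR.items():
        for signal in signals:
            prob *= LIKELIHOOD[signal][level]
        unnormalized[level] = prob
    total = sum(unnormalized.values())
    return {level: p / total for level, p in unnormalized.items()}


result = posterior(["accepted_answers_online", "attends_language_meetups"])
print({level: round(p, 2) for level, p in result.items()})
# {'novice': 0.02, 'intermediate': 0.45, 'expert': 0.53}
```

Although "expert" starts with only a 20 percent prior, two supporting signals are enough to make it the most likely level. This is a simple version of predicting skill "from the surrounding information." The independence assumption is a simplification, because real signals are often correlated.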
RemarkableHire uses what it calls "social evidence" that people are knowledgeable in a particular skill by looking, among other things, for recognition by their peers and indications that they've provided the best answers to questions posted online. "We look for signals within the content that someone has expertise in a particular skill," says Rothrock. The company then provides skills proficiency ratings of one to four stars for each subject.
Joy Garlock, manager of professional recruiting at Gannett Digital Division, has used RemarkableHire for the past few months to find and interview multiple candidates, and extended two offers in the first month after signing up. (She declined to talk about the outcome of the offers.)
The candidates "weren't even looking," Garlock says. "This is an opportunity for us to be in their world as opposed to them coming to us."
TalentBin focuses on discovering talent rather than qualifying it, but the company does offer a "level of intensity" score in particular skills (such as the Ruby programming language) that correlates with the prospect's interest level in a given area, says CEO Kazanjy. He hopes to expand beyond TalentBin's core software engineering jobs to positions in engineering and healthcare by mining some 40 different online sources, including social media, vertical communities, online publications such as PubMed, mailing lists and patent databases. "This approach is extensible to any sort of knowledge worker," he argues.
The fledgling businesses have been successful enough to get the attention of at least one online job board. Dice.com, a tech-focused site, recently launched a similar service, called OpenWeb. That tool excels at the complex process of assembling the bits and pieces of data it gathers from across the Web into a master profile for each individual, analyst White says.
Building a profile