Discover our scoring service.

Geescore business logic

Geescore™ displays an AI-based, validated Jobseeker score and provides a downloadable PDF as proof of the score, for submission to hiring managers.

Latest Developments - our 3rd Gen Deep Transformer model.

A BERT-based text-to-text transformer model is used to score a resume against a job description. The transformer model extracts the context and the relevant words or phrases that are semantically related to that context. These words and phrases are then compared against the context and related words and phrases identified in the job description, and scored based on text matching and a probability score for each matching text value. These scores are averaged, and the average is given as the score for that particular section.

The contexts and their related keywords and phrases are extracted first. They are then embedded and converted to tensors, and a similarity score is computed between the two tensors. This gives an accurate score between two paragraphs of text; in our case, the job description versus the different sections of a resume.
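
The embed-then-compare step can be sketched as follows. This is a minimal illustration, not the production model: a simple word-count vector stands in for the transformer embedding, and the names `embed` and `cosine_similarity` are hypothetical.

```python
import math
import re
from collections import Counter


def embed(text):
    """Stand-in embedding: a sparse word-count vector (assumption; the
    real system uses transformer embeddings converted to tensors)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine_similarity(vec_a, vec_b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * vec_b[word] for word, count in vec_a.items())
    norm_a = math.sqrt(sum(c * c for c in vec_a.values()))
    norm_b = math.sqrt(sum(c * c for c in vec_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


job_description = "data scientist with python modeling and forecasting experience"
resume_section = "built forecasting models in python as a data scientist"
unrelated_section = "prepared quarterly tax filings, audited ledgers"

related = cosine_similarity(embed(job_description), embed(resume_section))
unrelated = cosine_similarity(embed(job_description), embed(unrelated_section))
```

Here `related` comes out higher than `unrelated`, because the first resume section shares words with the job description and the second shares none.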

We score by extracting the work experience, skill set, education, awards & certification, and interests sections from the resume; each section is scored against the entire job description. The Transformer Model Score is very strict: a section has to be an absolute or near-absolute match to get a decent score. We can confidently say that a score of 70 and above means the resume is a good fit for the job. A score below 50 is a bad match, and scores from 50 up to 70 are a moderate fit for the job.
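
The score bands above can be written as a small helper. This is only a sketch; the function name `classify_fit` and the label strings are hypothetical.

```python
def classify_fit(score):
    """Map a 0-100 score to the fit bands described above."""
    if score >= 70:
        return "good fit"
    if score >= 50:
        return "moderate fit"
    return "bad match"
```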

The Transformer Model scores make up 65% of the total score. The remaining 35% comes from custom scoring for resume additions, location distance, a steady job in the last few years, and any asset links added, such as LinkedIn or GitHub links.
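
As a rough sketch of the 65/35 split, assuming both component scores are already on a 0-100 scale (that scale, and the name `total_score`, are assumptions rather than details from the description above):

```python
TRANSFORMER_WEIGHT = 0.65  # share of the total score stated above
CUSTOM_WEIGHT = 0.35       # remaining share for custom scoring


def total_score(transformer_score, custom_score):
    """Combine two 0-100 scores using the 65/35 split (scale is an assumption)."""
    return TRANSFORMER_WEIGHT * transformer_score + CUSTOM_WEIGHT * custom_score
```

For instance, a transformer score of 80 and a custom score of 60 combine to roughly 73.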

For example, for a Data Scientist job, the contexts would be data, modeling, data preprocessing, prediction, classification, forecasting, optimization, hyperparameter tuning and so on. If a Data Scientist's resume is scored against a Data Scientist job for each section, it will get a higher score than an Accountant's resume scored against the same Data Scientist job.

Scoring Validation

  1. The parameter scoring is based on expert analysis, feature engineering and feature selection techniques. The higher the score, the higher the probability that a Jobseeker is in the correct work domain and highly suitable for the position.
  2. Feature engineering is the process of using domain knowledge of the data to create features (i.e. scoring parameters) that make machine learning algorithms work.
  3. Feature selection is the process of automatically or manually selecting the features/parameters that contribute most to the prediction variable/output we are interested in.
  4. Applying feature engineering and feature selection techniques to our datasets, we found approximately 60 scoring parameters, in addition to the extension into more specific custom scoring modules.
  5. Using machine learning algorithms, we find scores for each parameter.

 

Techniques:

 

Feature Engineering Techniques

  1. Imputation – Handle missing data, incoherent data
  2. Outlier Analysis and Handling Outliers
  3. Data preprocessing, such as removing punctuation, stop words, etc.
  4. Tokenization
  5. Vectorization
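
A minimal sketch of the preprocessing, tokenization and vectorization steps listed above, using only the standard library. The stop-word list, the vocabulary and the function names are illustrative assumptions; imputation and outlier handling are not shown.

```python
import re
from collections import Counter

# Tiny illustrative stop-word list (assumption; real lists are much longer).
STOP_WORDS = {"a", "an", "and", "the", "in", "of", "to", "with", "for"}


def tokenize(text):
    """Lowercase, strip punctuation, split into tokens, drop stop words."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return [t for t in tokens if t not in STOP_WORDS]


def vectorize(tokens, vocabulary):
    """Bag-of-words count vector over a fixed vocabulary."""
    counts = Counter(tokens)
    return [counts[word] for word in vocabulary]


tokens = tokenize("Experience with Python, SQL, and the Python ecosystem.")
vector = vectorize(tokens, ["python", "sql", "java"])
```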

Feature Selection Techniques

  1. Univariate or Multivariate Selection
  2. Recursive feature elimination (RFE)
  3. Topic Selection
  4. K-Fold Cross Validation
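
To illustrate the K-fold cross validation item above, here is a small sketch of how a dataset can be split into folds. The helper name `k_fold_indices` is hypothetical, and it uses contiguous, unshuffled folds for simplicity.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, test_indices) for k contiguous folds."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    # Spread any remainder over the first folds so sizes differ by at most 1.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, test
        start += size


folds = list(k_fold_indices(10, 3))
```

Each sample lands in exactly one test fold, so every feature set under evaluation is validated against all of the data once.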

Machine Learning Algorithms

  1. Latent Dirichlet Allocation (LDA) Algorithm
  2. Deep learning Neural network
  3. Gibbs Sampling / Incremental Variational Inference

Scoring

We are pleased to share the science and business logic of the Geescore™ Jobseeker Scoring solution. It is a dynamic hybrid approach to scoring based on constant improvement: lowering bias, increasing objectivity, and validating scores. Currently we use a hybrid of LDA (for work experience, skills, education and working domain) and our Classic scoring algorithms, with 3 levels of matching science, as well as real-world recruiting parameters for derived attributes such as job matches, steady job, location, etc.

 

We start with what is commonly called “matching science”. This is a methodology for extracting words, phrases and acronyms from both a job posting and a Jobseeker resume, and comparing how well this data matches.

Based on the data match and similarity scores, we calculate the probability of two matched words or phrases.

This is done by selecting topics or keywords from the job description and a vector of words from the resume. For each topic, a joint probability distribution of the relevant keywords is formulated. The probability values show, for a particular topic, the joint probability of the topics/keywords and the set of words from the resume. The higher the probability, the better suited the resume is to the job posting.

 

For example, a person who has worked as a Data Scientist has a higher probability of having Python programming in their skill set, and vice versa. By calculating the sum of the products of the probabilities for each intersecting set, such as domain, work experience, skill set and various other parameters, we arrive at an accurate score for a resume with respect to a job description.
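
One possible reading of the sum-of-products step, for a single section, is sketched below. The probability values are made up for illustration, and the exact formula used in production is not specified here, so treat `sum_of_products` as a hypothetical helper.

```python
def sum_of_products(job_probs, resume_probs):
    """Sum P_job(k) * P_resume(k) over keywords present in both sets."""
    shared = job_probs.keys() & resume_probs.keys()
    return sum(job_probs[k] * resume_probs[k] for k in shared)


# Hypothetical keyword probabilities for one section (skills).
job_skills = {"python": 0.5, "sql": 0.3, "statistics": 0.2}
data_scientist_resume = {"python": 0.6, "statistics": 0.3, "excel": 0.1}
accountant_resume = {"excel": 0.7, "bookkeeping": 0.3}

ds_score = sum_of_products(job_skills, data_scientist_resume)
acct_score = sum_of_products(job_skills, accountant_resume)
```

With these illustrative numbers the Data Scientist resume scores above zero, while the Accountant resume, which shares no keywords with the job, scores zero.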

 

For many HRTech solutions with filtering and scoring, this is a core function. To improve matching results, many providers use machine learning to train their solution by better classifying the data. Geescore™’s Jobseeker Scoring solution uses a combination of hands-on human research and classification, alongside machine learning.

A few other features make Geescore™ significantly different. When we engage with the Jobseeker online and via email, we encourage them to add more career information (ADD), share links to their portfolios and social presence (SHARE), and help us fix issues discovered during the scoring process (FIX). This is valuable decision-making data for Hiring Managers. In the near future we will also begin scoring this content. Right now we reward an engaged and interested Jobseeker for adding, sharing and fixing with a small boost in their score.

The Geescore™ Jobseeker Scoring solution also applies a set of 12+ “real-world” recruiting factors as part of our scoring, such as commuting distance, Jobseeker interest in a job posting, relevant domain expertise, and more. Finally, our view of matching science is that it is just a start to developing custom scoring modules.

We apply both machine learning and human research to develop custom scoring that helps you avoid spending time on unsuitable Jobseekers and improves your talent acquisition efforts. Clients have all kinds of methods and systems to find more success when hiring Jobseekers. Some use personality or EQ testing. Others consider background checking, and most take the time to check Jobseeker references.

 

We provide some recommended remedial actions to consider, based on the scoring results.

More about our scoring

The hybrid scoring has two parts: standard scoring for derived attributes, and statistical scoring for the different sections of a resume.

Standard scoring involves extracting keywords, acronyms and key phrases from a job description and a resume, matching them, and deriving the score from the match count. The more relevant matching keywords or key phrases found, the higher the score.
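
A minimal sketch of standard scoring, assuming the score is the share of job keywords found in the resume, scaled to 0-100. How the match count is actually converted to a score is not stated above, so this ratio and the name `standard_score` are assumptions.

```python
def standard_score(job_keywords, resume_keywords):
    """Fraction of job keywords found in the resume, scaled to 0-100."""
    job = {k.lower() for k in job_keywords}
    resume = {k.lower() for k in resume_keywords}
    if not job:
        return 0.0
    return 100.0 * len(job & resume) / len(job)


score = standard_score(
    ["Python", "SQL", "machine learning", "AWS"],
    ["python", "sql", "Excel"],
)
```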

Statistical scoring involves semantic analysis of keywords, from which a probability is calculated for each keyword. The keywords are spread across a probability distribution that gives the probability of each keyword or phrase occurring in a resume and a job description together. For example, for an Accounts Manager job, the probability will be higher for a jobseeker working in accounts as a manager than for a jobseeker working as a software development manager. This probability is calculated by doing a semantic search of the bigrams and trigrams occurring in the job description and resume and finding the similar topics (keywords and key phrases). The probability is calculated for each topic, then summed and normalized to give the score.
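
The bigram/trigram idea can be illustrated with a simplified sketch. It uses exact n-gram overlap as a stand-in for the semantic search, and relative frequency as the probability; both are assumptions made for illustration, as are the function names.

```python
import re
from collections import Counter


def ngrams(text, n):
    """Return the list of word n-grams in text."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    return [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]


def ngram_probabilities(text):
    """Relative frequency of each bigram and trigram in text."""
    counts = Counter(ngrams(text, 2) + ngrams(text, 3))
    total = sum(counts.values())
    return {gram: c / total for gram, c in counts.items()} if total else {}


def statistical_score(job_text, resume_text):
    """Sum the job-side probability of n-grams that also occur in the resume.

    The job probabilities sum to 1, so the result is normalized to [0, 1].
    """
    job_probs = ngram_probabilities(job_text)
    resume_grams = set(ngram_probabilities(resume_text))
    return sum(p for gram, p in job_probs.items() if gram in resume_grams)


job = "accounts manager for accounts payable team"
accounts = "worked as accounts manager leading accounts payable"
software = "software development manager for backend services"

accounts_score = statistical_score(job, accounts)
software_score = statistical_score(job, software)
```

In this toy case the accounts resume scores higher than the software development resume, in line with the Accounts Manager example above.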

Hybrid scoring involves both standard and statistical scoring, each applied to different parameters. Statistical scoring is done on resume parameters that have a lot of content, from which topics can be extracted and used for modelling. Resume sections such as work experience, skills, education, the entire raw resume text, relevant experience based on domain, and candidate interests are scored using statistical scoring. Standard scoring is done on derived attributes such as location, steady job, missing years or gaps, job domain and other parameters.

After scoring a resume with hybrid scoring, we have different scores for different parameters. These scores are normalized to a factor between 0 and 1, multiplied by a referential factor (a multiplication factor reflecting the percentage of the overall score that the hybrid scoring constitutes), and then multiplied by 100 to give the final score. For example, 0.8756 multiplied by 100 gives a final hybrid score of 87.56 (rounded up to 88). The higher the score, the better suited the jobseeker is for the job.
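
The final-score arithmetic can be sketched as below. Averaging the normalized parameter scores is an assumption (the text does not say how they are combined), and `final_hybrid_score` is a hypothetical name.

```python
import math


def final_hybrid_score(parameter_scores, referential_factor=1.0):
    """Average normalized scores, apply the referential factor, scale to 0-100, round up."""
    if not parameter_scores:
        raise ValueError("at least one parameter score is required")
    for value in parameter_scores.values():
        if not 0.0 <= value <= 1.0:
            raise ValueError("parameter scores must be normalized to [0, 1]")
    mean = sum(parameter_scores.values()) / len(parameter_scores)
    scaled = mean * referential_factor * 100
    # Round to 2 decimals first so float noise (e.g. 88.00000000000001)
    # does not push the ceiling up by one.
    return math.ceil(round(scaled, 2))
```

With a single normalized score of 0.8756 and a referential factor of 1.0, this returns 88, matching the example above.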

How we reduce BIAS

We use Latent Dirichlet Allocation (LDA) for scoring. Using a Gensim pretrained model, we create bigrams and trigrams from one document (the resume), which are plotted against the bigrams and trigrams (topics) of another document (the job description), with a probability score calculated when they are plotted in a Dirichlet distribution.

 

Right now, bias is handled through the hyperparameter beta (β), which represents topic-word density between two documents or texts. The β parameter specifies prior beliefs about word sparsity and uniformity within topics, adjusting for the bias that certain topics will favour certain words.

 

A high β value is set dynamically, based on the number of topics, for good job posts, as more topics are derived from the job description. For a bad job post, or an irrelevant job and resume, the value of β will be low because fewer topics are shared between the job description and the resume. This handles the bias and consequently gives a lower score, as expected.

 

Right now, the β value and the num_topics hyperparameter for each section have been determined through trial and error.
There are other parameters, such as alpha and dynamic adjustment of num_topics, which can be tuned further.
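
To show where β, alpha and num_topics enter an LDA model, here is a small pure-Python collapsed Gibbs sampler. It is an illustrative sketch only, not the Gensim-based production code; in Gensim's `LdaModel` the β prior corresponds to the `eta` argument. The function name `lda_gibbs` and the default values are assumptions.

```python
import random


def lda_gibbs(docs, num_topics, alpha=0.1, beta=0.01, iterations=200, seed=0):
    """Collapsed Gibbs sampling for LDA.

    docs: list of token lists. Returns (topic_word, vocab), where
    topic_word[k][w] is the smoothed probability of vocab[w] under topic k.
    """
    rng = random.Random(seed)
    vocab = sorted({word for doc in docs for word in doc})
    word_id = {word: i for i, word in enumerate(vocab)}
    n_words = len(vocab)

    doc_topic = [[0] * num_topics for _ in docs]             # n(d, k)
    topic_word = [[0] * n_words for _ in range(num_topics)]  # n(k, w)
    topic_total = [0] * num_topics                           # n(k)
    assignments = []

    # Random initial topic for every token.
    for d, doc in enumerate(docs):
        doc_assignments = []
        for word in doc:
            k = rng.randrange(num_topics)
            w = word_id[word]
            doc_topic[d][k] += 1
            topic_word[k][w] += 1
            topic_total[k] += 1
            doc_assignments.append(k)
        assignments.append(doc_assignments)

    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, word in enumerate(doc):
                w, k = word_id[word], assignments[d][i]
                # Remove the token's current assignment from the counts.
                doc_topic[d][k] -= 1
                topic_word[k][w] -= 1
                topic_total[k] -= 1
                # Conditional p(z = t | all other assignments), unnormalized.
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][w] + beta)
                    / (topic_total[t] + beta * n_words)
                    for t in range(num_topics)
                ]
                k = rng.choices(range(num_topics), weights=weights)[0]
                assignments[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][w] += 1
                topic_total[k] += 1

    # Smoothed topic-word distributions; beta acts as the word prior.
    probs = [
        [(topic_word[k][w] + beta) / (topic_total[k] + beta * n_words)
         for w in range(n_words)]
        for k in range(num_topics)
    ]
    return probs, vocab


docs = [
    "python data modeling python prediction".split(),
    "ledger audit tax ledger invoice".split(),
]
topic_word, vocab = lda_gibbs(docs, num_topics=2)
```

A larger `beta` spreads each topic's probability more evenly across the vocabulary, while a smaller one concentrates it on fewer words.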

 

Next Steps:
We plan to use GridSearchCV and other techniques, such as t-distributed Stochastic Neighbor Embedding (t-SNE), to tune and find the optimal hyperparameters.
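
The idea behind a grid search can be sketched with the standard library alone. scikit-learn's GridSearchCV adds cross-validation on top of the same exhaustive loop. The objective function below is a placeholder, not a real model evaluation, and all names are hypothetical.

```python
from itertools import product


def grid_search(param_grid, evaluate):
    """Try every combination in param_grid; return (best_params, best_score)."""
    names = list(param_grid)
    best_params, best_score = None, float("-inf")
    for values in product(*(param_grid[name] for name in names)):
        params = dict(zip(names, values))
        score = evaluate(**params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


def toy_objective(num_topics, beta):
    """Placeholder objective (assumption): peaks at num_topics=10, beta=0.1."""
    return -((num_topics - 10) ** 2) - (beta - 0.1) ** 2


best_params, best_score = grid_search(
    {"num_topics": [5, 10, 20], "beta": [0.01, 0.1, 1.0]},
    toy_objective,
)
```

In practice `toy_objective` would be replaced by a function that trains the model with the given hyperparameters and returns a validation metric.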

 

The next phase of scoring is to use the current parsing and scoring data to build a deep learning NN model that gives even more accurate scoring.