A Machine Learning study of Google Analytics Metrics that predict Content Quality for ranking purposes

This study investigated the impact of eleven Google Analytics metrics that the search engine optimisation (SEO) community regularly cites, or considers, as influencing how Google’s algorithms rate the quality of webpage content and, subsequently, how those webpages rank within its search engine. This was an important endeavour given that Google often states, and the SEO community confirms, that “Content is king”; that is, quality and engaging content represents the most important factor in ranking a webpage. Moreover, very little research has been undertaken within the SEO community on this subject, and most references to it are observational. Hence, understanding which of these eleven metrics most influence Google’s algorithms, and the relationships between them, will allow marketers and business owners alike to better distribute and maximize the effect of SEO budgets, and to grow the business of their clients as well. In carrying out the study the researcher used four out-of-the-box and presumably code-free machine learning solutions offered by Google Cloud, and built three custom Python models to compare the performance of the Google Cloud pre-built solutions with that of the custom-made solutions. This approach was particularly important for testing one key research question: can all marketers, regardless of their programming skills, leverage point-and-click machine learning solutions like Google Cloud’s to significantly improve the process of testing their hypotheses?

Firstly, the study found that metrics believed to signal great on-page / on-website engagement had a far lower impact on webpage rankings than traditionally thought. Specifically, these metrics are Bounce rate, Pageviews, Time on page, Average Session Duration, Unique visitors, Average pages per session and Average page load. Secondly, metrics related to strength of brand and loyalty were traditionally believed to be of secondary importance and a natural consequence of how the metrics above performed. The logic was that as users spend more time on the website, they become more loyal and return to the website more often. The study found that, on the contrary, loyalty and strength of brand drive more engagement with the website. These metrics are Direct website traffic, Returning visitors and Frequency of visits, and they were ranked as the most important features impacting the webpage ranking within Google. Finally, the study suggests that conversion rates, as set up by webmasters in Google Analytics, also have a strong impact on the target; this is a somewhat controversial finding given the high potential for manipulation of this metric by webmasters, something Google is working hard to eliminate.

With regard to the research question, of the four pre-built solutions offered on Google Cloud only one met its promise of a point-and-click, code-free interface. That said, this solution performed significantly better than the models built in Python, and as such the researcher concludes that all marketers can leverage this tool to improve their SEO strategies and impact, regardless of their programming skills.


From the very beginning of the profession of search engine optimization in 1997 (Baker, 2017), marketers vied for the commercial advantage of “owning” the first position in Google search results. At the time, the most common SEO practices included building thousands of links from unreliable sources such as link farms, press releases and directories, stuffing meta titles, meta descriptions and meta keywords, stuffing URL descriptions, and obsessively stuffing keywords into webpages (Singureanu, 2018). It did work for a time, mainly due to three factors. First, disciplines such as machine learning or AI were still at an emerging stage. Second, the capability of computers to store and process data was low at that time and has increased exponentially since. Third, the amount of data available in digital form was very low compared with the amount available now. In brief, Google’s access to data was limited, and the processing power of computers was inadequate. For these reasons, to determine the relevancy of a website Google had to rely, however unwillingly, on ranking signals that were mainly under the control of webmasters (Singureanu, 2018).

Fast forward to today and the power balance has changed; SEOs have far less control over factors impacting the ranking position of websites within Google. In fact, most ranking signals considered to be essential in the past play very little or no direct role in ranking websites at present. For example, backlinks are still important; however, two essential Google algorithm updates, Panda in 2011 and Penguin in 2012, ensured that long gone were the days when SEO specialists submitted mediocre websites to hundreds of directories and link farms and got well ranked within Google (Baker, 2017). Similarly, in The Art of SEO the authors provide the following advice for optimizing images: “A descriptive caption underneath the image is helpful…Make sure the image filename or image src string contains your primary keyword…Always use the image alt attribute” (Enge, Spencer, Stricchiola & Fishkin, 2013, p. 415). All SEO tools also point to the lack of alt tags as a technical issue. Some of them, like Yoast’s SEO plugin (Yoast, 2019), even go as far as providing information on whether the images on a page include your keyword within their alt text descriptions. However, as early as 2012 Google’s algorithms had taught themselves to recognize cats within images, without humans ever teaching the algorithms about cats (Chace, 2015). Other examples include Google correctly classifying a picture of a boy riding a motorbike as “boy riding a motorbike on a dirty road”, and an image of two pizzas on a stove as “two pizzas on a stove”. In fact, image, text-to-speech, speech-to-text and video recognition technology has developed to the point that Google is now offering this technology as a service via its Google Cloud (Google, 2019). To think of an application, recordings of conversations can already be analyzed by topic, voice tonality, and fluency vs. silence (Marr, 2015).
Given the improvements in speech-to-text technologies, recordings can also be downloaded in a text format and further sentiment analysis performed. Technology is also already available to recognize faces, behavior, situations, and even words within videos (Marr, 2015). Many other examples can be provided of how advancements in machine learning make traditional SEO factors like image or video alt text optimization redundant, or at least significantly reduce their weight as ranking factors. Indeed, over the last few years, in addition to several smaller experiments carried out on a daily basis, Google has made several major algorithm updates in order to improve the quality of the websites it ranks: June 2019, March 2019, August 2018, April 2018, March 2018 and December 2017, to name just a few. The renowned digital marketing expert Larry Kim (2019) explains in his “Rankbrain Judgement Day” article that “the way 30 trillion web pages are ranked changed forever on October 26, 2015. That’s when the world became aware of Rankbrain, Google’s machine-learning artificial intelligence system…Google calls Rankbrain ‘the third-most important signal contributing to the result of a search query’” (Kim, 2019). Kim continues and emphasizes that in future, SEO signals that are currently still under some human control will be completely dismissed by Rankbrain, which will become the number one ranking factor within Google. This idea was also discussed by Rand Fishkin, the highly awarded SEO leader, who argued that in the future Google may leverage algorithmic inputs in search rankings without requiring human intervention (Kim, 2019). In effect, these SEO experts argue that Google is slowly dismissing ranking signals which are under human control.

And yet, SEO specialists continue to deny that traditional SEO is dying. For example, another highly respected SEO leader, Neil Patel, argues that “for a new age of machine learning, the AI takeover, and algorithm-driven algorithms…SEO is still alive and kicking” (Patel, 2019), echoing the arguments of a large portion of the SEO population. Showing support for the idea, Dennis (2019) titles his article “Is SEO Dead? The Answer is Yes, and No”. What Dennis is actually arguing in his article is that if SEO professionals “stick with outdated strategies the only thing you’ll be seeing are low rankings” (Dennis, 2019). He argues that SEOs should instead focus on long term value by creating high quality content and promoting that content “like crazy”. In fact, all his advice revolves around the idea of creating great content that people engage with, which sends signals to Google that the content is relevant to the user’s search query, so that Google rewards these websites with higher rankings. Most SEOs also place content as a top ranking factor. For example, Wolffberg (2019) places Content Quality as the number one ranking factor and argues that “Content is still king”. Similarly, in her “Top 7 Ranking Signals: What REALLY Matters in 2019?” article, Crowe (2018) places “Publish High-Quality Content” as the top factor impacting search engine rankings. The “content is king” idea is at the heart of every article discussing the topic of SEO. Of course, some other factors are also mentioned within the SEO community as impacting search rankings; however, most of them revolve around the idea of great content being made available to the user fast and in the right format for the device it is being consumed on. For example, after placing quality content in first place in terms of influence on Google rankings, Wolffberg (2019) places content freshness second and backlinks to the content third, followed by mobile first and page speed.
In a nutshell, Wolffberg argues that to rank in Google, websites need great content that is relevant and up to date (freshness), popular (backlinks) and loads fast on mobile and desktop.

2. Aims and objectives

Given the importance of content as a ranking factor, it is important to understand what metrics are used by Google’s Rankbrain to assess the quality of content. This will inform the strategy of digital marketing specialists in optimising metrics that enhance the quality of their content by Rankbrain’s rules. Understanding these factors will also prepare webmasters and SEOs for a future where great content will represent the only factor, or the factor with the highest weight, that Google will consider in ranking websites. This will, in turn, support small businesses in competing with larger websites, but will also encourage webmasters in regulated industries (e.g. CBD, loans, medicine) to continuously enhance the quality of the information provided to their audience, provide useful and factual advice, and really focus on providing the best possible answers to their audience’s questions.

Within the literature, several metrics were mentioned as potentially influencing Google’s ranking of quality content, though most of these factors have not been researched thoroughly using methodological approaches such as statistics or machine learning models. Hence, the overall aim of this study is to close this gap in research. From this perspective, the metrics that will be studied are:

  • Pageviews
  • Time on page
  • Average Session Duration
  • Bounce rate
  • Average Pages per session
  • Unique visitors
  • Returning Visitors
  • Direct website traffic
  • Conversion rate
  • Frequency of visits
  • Average page load 

From another perspective, following the literature research step the researcher identified several main challenges. Firstly, there is little to no empirical research on the Google Analytics metrics that could influence Google’s perception of quality content. Most of the debates on the subject are hypothetical, and various metrics are suggested rather than demonstrated as influencing search engine rankings. Moreover, when an attempt is made to conduct an SEO experiment, it is mainly carried out on an observational basis and tests a single feature, e.g. asking lots of users to click on a specific website URL at the same time and then observing the result. No further consideration is given to other factors that could correlate with increased traffic and contribute to improvements in rankings. Another popular method for testing SEO changes on various pages is split testing, where pages are split into groups, various changes are applied to each group and changes in search engine ranking are monitored (Bradford, 2018). The researcher argues that these popular tests provide very little information on the relationships between the various features tested, their relationships with the outcome (search engine rankings) and so forth. Indeed, popular tools like Optimizely make it easy for marketers to perform split tests, further propagating the idea that split tests provide real insights into the performance of specific groups of pages. An important goal of this paper is to support digital marketers in learning a better way of understanding the performance of their pages, using the same machine learning technology used by Google, in a code-free environment. Indeed, it is the goal of the researcher to put what used to be complicated and complex machine learning processes in the hands of every digital marketer.
If every digital marketer is able to easily apply machine learning to test their hypotheses, as opposed to the inefficient process of split testing, then the whole profession of digital marketing will greatly improve its understanding of search engine ranking factors. This is why the researcher provides, within the literature research chapter, an extended argument addressed to data scientists, managers and inexperienced coders for using Google Cloud ML as a strong alternative to the more programmatic approach through R or Python.

Thus, it is also essential that any marketer can replicate this approach for their clients on a point-and-click basis, regardless of their level of programming skills. Specifically, the aim of the study is, on the one hand, to demonstrate a more empirical approach to SEO factors and, on the other hand, to provide marketers with easy-to-use tools to run machine learning experiments of their own.

Hence, the study is also looking to answer the following research question:

Can marketers of all backgrounds, including those without programming experience, implement machine learning models to test their SEO hypotheses by using Google Cloud’s AutoML Tables and pre-built algorithms options?

3. Methodology

In this section I will outline the methods of study and the reasons behind the approach to the study.

3.1 Literature research

Reviewing existing literature on the subject will be the first step in gaining a better understanding of the level of research in this area. The search for existing papers will begin on UEL’s online library portal, and a variety of online journals will be reviewed. Furthermore, Google searches will be performed to identify relevant articles or trustworthy sources of information that complement the UEL library results. In fact, Google searches will most likely represent a primary method of literature research as a source of up-to-date information, given the fast pace at which Google has been updating its algorithms over the last few years, as mentioned in the introduction section. In this context, it is very likely that many studies carried out earlier than 2018, and in some cases earlier than 2019, will be deemed irrelevant. For example, in 2019 Google has already performed two major algorithm updates, which saw large respected websites like Daily Mail losing a large amount of traffic (Schwartz, 2019).

The search for relevant online papers in UEL’s library will follow a structured approach: identify long-tail keywords that best match the topic, search the online library using those keywords, identify relevant articles, read them and, if deemed relevant, print the articles for further review. Should the long-tail keyword strategy not deliver sufficient results, the keyword broadness will be increased, and the search process repeated. The same process will be performed within the Google search engine; however, the search will initially be performed on some of the most respected online SEO magazines. Some examples of these magazines include https://searchengineland.com, https://www.seroundtable.com or https://www.searchenginejournal.com . This approach will ensure that the research is performed on high authority dedicated platforms, where all articles and studies are written by experts in the field and well scrutinised before and after being published. Subsequently, the search will be broadened to the Google search engine as a whole, where the researcher will search for any relevant articles in high authority online publications or newspapers like Daily Mail, New York Times, Forbes and so forth. Finally, several books on the subjects of Search Engine Optimisation, Data Science and Predictive Analytics will also be reviewed.

Throughout this step the researcher will look to identify trends and beliefs within the SEO community, studies performed and their results, methodologies and their weaknesses, and further areas proposed for research. This step will inform the researcher’s approach and choices in terms of his own methodology, machine learning models and assessment of results.

3.2 Approach to research: Predictive Modelling

Reasons for using this approach

Within the SEO community there are a lot of anecdotal references to factors that impact Google’s ranking of great content. For example, Rand Fishkin carried out several experiments regarding the impact of bounce rate on search rankings and found mixed results (Stetzer, 2016). However, the experiments consisted of groups of people visiting test websites and bouncing/leaving the website at various intervals of time, rather than a rigorous machine learning approach using historical analytics data. By contrast, other providers of SEO services like Fisher (2019) provide anecdotal arguments, based on no sound research, to conclude that “a website’s bounce rate is perhaps one of the most undervalued metrics of a successful SEO campaign” and launch into advising marketers on methods for improving their website bounce rates. It is therefore important that a more scientific approach be employed to assess the impact of these metrics on Google’s Rankbrain perception of quality content. Machine learning models taking Google Analytics, Google Search Console and Google Adwords metrics as inputs will further the understanding of these factors.

Metrics Selected

The metrics that will be selected as inputs for the various models will be identified at the literature research stage. All metrics will be available in the Google Analytics, Google Adwords and Google Search Console accounts of a website that will be used as a case study. Google Analytics reports will be extracted and the importance of these metrics for average ranking analysed. The analysis is restricted to a period of three months (April-June 2019), and provides up-to-date information on Google’s perception of the website’s content quality. The data will be exported for every day of this period.
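Exported analytics reports tend to present some metrics as formatted text rather than plain numbers, for instance percentages or minute-and-second durations, so a small normalisation step is usually needed before modelling. The sketch below is one possible way to do this in Python; the function names and the exact string formats it accepts are assumptions made for illustration, not a description of the export format used in this study.

```python
import re

# Assumed duration layout: optional hours, minutes and seconds, e.g. "0m26s" or "1h02m03s".
_DURATION = re.compile(r"^(?:(\d+)h)?(?:(\d+)m)?(?:(\d+)s)?$")


def parse_percentage(value: str) -> float:
    """Convert a string such as '68%' into a fraction (0.68)."""
    return float(value.strip().rstrip("%").strip()) / 100.0


def parse_duration_seconds(value: str) -> int:
    """Convert a string such as '0m26s' into a number of seconds (26)."""
    match = _DURATION.match(value.strip())
    if match is None or not any(match.groups()):
        raise ValueError(f"unrecognised duration: {value!r}")
    hours, minutes, seconds = (int(g) if g else 0 for g in match.groups())
    return hours * 3600 + minutes * 60 + seconds


# Hypothetical daily row as it might appear in an exported report.
row = {"bounce_rate": "68%", "avg_time_on_site": "0m26s"}
numeric_row = {
    "bounce_rate": parse_percentage(row["bounce_rate"]),
    "avg_time_on_site": parse_duration_seconds(row["avg_time_on_site"]),
}
```

Keeping every input as a plain number matters here because all of the models considered in this study take numerical features.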

As mentioned, the data is reported via Google Analytics, Google Search Console and Google Adwords for each of the metrics I listed in the proposal; for example, a metric like bounce rate will be reported in Google Analytics as 68%. Other metrics, like average time on site, are reported in a format such as 0m26s (26 seconds).

Machine learning: Algorithms used

Singureanu (2018) argued at great length that advancements in machine learning have been commodifying, or are on track to commodify, many disciplines including search engine optimisation, paid advertising and machine learning itself. His investigations led to further exploration of the Google Cloud services, which showed that Google is already offering a variety of out-of-the-box models that can handle most types of data, including the processes of preparation, feature engineering, choice of optimal model, training, evaluation and hyperparameter tuning, all in an easy-to-use interface with no coding experience required. One such service, Google AutoML Tables, promises to “automatically search through Google’s model zoo for structured data to find the best model for your needs, ranging from linear/logistic regression models for simpler datasets to advanced deep, ensemble, and architecture-search methods for larger, more complex ones.” (Google AutoML Table, 2019) Moreover, Google also states that “AutoML Tables automates feature engineering on a wide range of tabular data primitives — such as numbers, classes, strings, timestamps, and lists — and also helps you detect and take care of missing values, outliers, and other common data issues.” (Google AutoML Table, 2019) The researcher has therefore decided to take a different approach to building the machine learning models that will answer the research question of this thesis. Specifically, given that all Google Analytics features that will act as inputs into the models are of a numerical nature, the author will use four different services or pre-built models within Google Cloud, all taking in numerical data. This approach will enable the researcher to further develop his skills, in addition to the R programming approach learned in class and practiced as part of the Machine Learning module assignment.
Furthermore, the researcher will use Python to develop two XGBoost machine learning models (with and without hyperparameter tuning), and a Keras model. These models will add further understanding of the performance of Google Cloud vs. traditional approaches. The researcher preferred Python to R due to his extensive previous experience with this programming language.

The four Google models that will be used are:

– AutoML Tables (Google AutoML Table, 2019)

– Linear learner (Google, 2019)

– Wide and Deep. This algorithm “combines a linear that learns and “memorizes” a wide range of rules with a deep neural network that “generalizes” the rules and applies them correctly to similar features in new, unseen data.” (Google, 2019)

– XGBoost (eXtreme gradient boosting). This algorithm “enables efficient supervised learning for classification, regression, and ranking tasks. XGBoost training is based on decision tree ensembles, which combine the results of multiple classification and regression models.” (Google, 2019)

Why this approach

On a personal level the researcher is confident that out-of-the-box models offered by companies like Google or Amazon will further improve over time, and that gaining a good understanding of how to leverage cloud machine learning services provides a competitive advantage to all data-driven companies, including those in digital marketing. The cost of entry will be lower as well, hence efficiencies of scale will be achieved in terms of payroll. Moreover, putting machine learning in the hands of a variety of teams that are not trained in programming will enable organisations to leverage these services at all levels, drive a data-led culture and gain a distinctive competitive advantage. Finally, the researcher believes that the availability of pre-built models within cloud services is on track to commodify a large part of the Data Science profession and would like to further investigate this matter. From a more practical perspective, in spite of the large variety of algorithms available, advances in the field mean that only a handful are required for most tasks. For example, Hodnett and Wiley (2018) point out that GBMs (Gradient Boosting Machines) have proven to be very successful with structured data, while Lantz (366: 2015) confirms that “boosting is thought to be one of the most significant discoveries in machine learning”. Google Cloud appears to confirm these statements by offering the pre-built XGBoost and Wide and Deep solutions.
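To make the boosting idea concrete, the following is a minimal sketch of gradient boosting for regression with squared error, using one-split decision trees (stumps) as weak learners. It is written in plain Python for illustration only: it is not XGBoost, which adds regularisation, deeper trees and many engineering optimisations, and all function names and the toy data are hypothetical.

```python
from statistics import mean


def fit_stump(X, residuals):
    """Find the single feature/threshold split that minimises squared error."""
    best = None
    for j in range(len(X[0])):
        for threshold in sorted({row[j] for row in X}):
            left = [r for row, r in zip(X, residuals) if row[j] <= threshold]
            right = [r for row, r in zip(X, residuals) if row[j] > threshold]
            if not left or not right:
                continue  # a split must leave rows on both sides
            left_value, right_value = mean(left), mean(right)
            error = (sum((r - left_value) ** 2 for r in left)
                     + sum((r - right_value) ** 2 for r in right))
            if best is None or error < best[0]:
                best = (error, j, threshold, left_value, right_value)
    if best is None:  # no valid split: fall back to the mean residual
        m = mean(residuals)
        return (0, float("inf"), m, m)
    return best[1:]


def stump_predict(stump, row):
    j, threshold, left_value, right_value = stump
    return left_value if row[j] <= threshold else right_value


def fit_gradient_boosting(X, y, n_rounds=50, learning_rate=0.1):
    """Each round fits a stump to the current residuals and adds a damped copy."""
    base = mean(y)
    predictions = [base] * len(y)
    stumps = []
    for _ in range(n_rounds):
        residuals = [t - p for t, p in zip(y, predictions)]
        stump = fit_stump(X, residuals)
        stumps.append(stump)
        predictions = [p + learning_rate * stump_predict(stump, row)
                       for p, row in zip(predictions, X)]
    return base, learning_rate, stumps


def predict(model, row):
    base, learning_rate, stumps = model
    return base + learning_rate * sum(stump_predict(s, row) for s in stumps)


# Toy data (hypothetical): one numeric feature and a step-shaped target.
X = [[1.0], [2.0], [3.0], [4.0], [5.0], [6.0]]
y = [1.0, 1.0, 1.0, 5.0, 5.0, 5.0]
model = fit_gradient_boosting(X, y)
```

Each round corrects a fraction of the error left by the previous rounds, which is why an ensemble of very weak learners can end up fitting structured data closely.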

In relation to the goal of this project, all metrics that are the subject of this research are of a numerical nature, which makes the pre-built Google algorithms ideal for the task. We will be able to assess the performance of simpler models (logistic regression), deep learning (Wide and Deep) and one powerful model based on decision tree ensembles (XGBoost). Moreover, it will be very interesting to compare the findings of the traditional Python-built algorithms with those of the Google Cloud algorithms, and of AutoML Tables with those of the other models, given Google’s claim that the AutoML Tables service provides state-of-the-art machine learning technology and identifies the best possible model depending on the data provided and the optimisation goal.

Analysis

Regression models provide the following evaluation metrics in Google Cloud: MAE (mean absolute error), RMSE (root mean squared error), RMSLE (root mean squared logarithmic error), R^2 (coefficient of determination) and MAPE (mean absolute percentage error). Precision, recall, false positive rate, a confusion matrix and feature importance graphics are also made available (Google, 2019). Hodnett and Wiley (2018) point out that for regression models (which is our case) the most often used evaluation method is RMSE. Chollet and Allaire (2018) also confirm that for balanced sets RMSE is the most commonly used metric. Indeed, Kuhn and Johnson (2016) also prefer RMSE as the main method for assessing the performance of regression models.
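For reference, the regression metrics named above can be computed from their standard definitions as in the sketch below. The function name and the sample values are hypothetical, and the sketch assumes non-negative values (for RMSLE), non-zero actual values (for MAPE) and a non-constant target (for R^2).

```python
import math


def regression_metrics(actual, predicted):
    """Return MAE, RMSE, RMSLE, MAPE (in percent) and R^2 for two equal-length lists."""
    n = len(actual)
    errors = [p - a for a, p in zip(actual, predicted)]
    mae = sum(abs(e) for e in errors) / n
    rmse = math.sqrt(sum(e * e for e in errors) / n)
    # RMSLE compares log(1 + value), so it penalises relative rather than absolute error.
    rmsle = math.sqrt(
        sum((math.log1p(p) - math.log1p(a)) ** 2 for a, p in zip(actual, predicted)) / n
    )
    mape = 100.0 * sum(abs(e / a) for a, e in zip(actual, errors)) / n
    mean_actual = sum(actual) / n
    ss_res = sum(e * e for e in errors)
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    r2 = 1.0 - ss_res / ss_tot
    return {"mae": mae, "rmse": rmse, "rmsle": rmsle, "mape": mape, "r2": r2}


# Hypothetical average ranking positions: observed vs. predicted.
metrics = regression_metrics([2.0, 4.0, 6.0, 8.0], [3.0, 4.0, 5.0, 8.0])
```

Because RMSE squares each error before averaging, it weights large misses more heavily than MAE does, which is one reason it is a common default for regression.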

For the purpose of our research, the cost of errors is the same for all classes and the dataset is well balanced; hence, we will be using RMSE as the main evaluation metric.

Potential challenges

The main challenge will be learning the Google Cloud environment in the first instance. This step involves not only the machine learning environment but also modules like Google’s Cloud Storage, data pipelines as needed, or Dataprep, the data preparation service. Some cost will be involved as well, as all algorithms are offered on a pay-as-you-go basis. That said, there are a variety of online resources on the subject, from Google’s own training videos and documentation to affordable courses on platforms like Qwiklabs or Coursera.

Case Study

The research will use one of the researcher’s clients as a case study. For privacy reasons the client will remain anonymous, even though the client has consented to the use of his name. Data from this client’s Google Analytics account will be downloaded in CSV files as indicated in the methodology, and machine learning models will be built accordingly. The client is currently actively investing in optimising many of the SEO metrics analysed and will greatly benefit both from assessing the impact of his digital marketing actions and from focusing his efforts on the most important features.

4. Literature research

4.1 Cloud Services for Machine Learning

4.1.1 Introduction

The Caret package by Max Kuhn acts as a unified framework for over 238 algorithms, using a variety of training options from models accepting case weights, Bagging, Bayesian and Boosting models to Support Vector Machines, Logistic Regression or Multivariate Adaptive Regression models (Kuhn, 2019). Thus, it comes as no surprise that data science is perceived as one of the most promising professions of the future, given the variety of skills required to perform it. Indeed, the well-known consultancy firm McKinsey and Company predicts a large shortage of data science professionals, going as far as forecasting a while ago that the “US alone could face a shortage of 140,000 to 190,000 people with deep analytical skills as well as 1.5 million managers and analysts with the know how to use the analysis of big data to make effective decisions” (Manyika, 2011 in Provost & Fawcett, 13: 2013). Given the large and growing market potential, a variety of companies entered the space to capture a piece of the big-data-as-a-service market. Google, Microsoft, IBM and Amazon are some of the best-known companies that have been developing cloud-based services. Of course, these companies are aware of the gap in skills that prevented those 1.5 million managers in the US alone from adopting technologies like machine learning, and they tackled the challenge by making it significantly easier for people with no programming experience to build machine learning models. For example, Google Cloud now offers a variety of out-of-the-box pre-trained models and APIs, from Translation and Video Intelligence to XGBoost and Deep Learning (Google, 2019). However, the number of out-of-the-box algorithms offered by Google Cloud is well below the 238 supported by Caret, which could lead many practitioners to dismiss its power.
It turns out, however, that in reality, given the great advances in the field, only a few models outperform most if not all of the previous models available…our study of Angus Lift Trucks, a forklift truck company selling and hiring forklifts, confirmed our assumptions when we benchmarked Google’s out-of-the-box algorithms against several other models to predict the metrics leading to people submitting a quote on each of the two pages. Indeed, we have already seen that Lantz (2015) praised the discovery of boosting techniques as one of the most important advances in the field, and that Hodnett and Wiley (2018) pointed to the success of GBMs. J. J. Allaire and Francois Chollet, the creator of the Keras package, also pointed out that while Random Forests was the top algorithm used in Kaggle competitions from 2010 until 2014, in 2014 gradient boosting machines took over. Allaire and Chollet emphasize that the gradient boosting technique is “one of the best, if not the best algorithm for dealing with nonperceptual data today. Alongside deep learning, it’s one of the most commonly used techniques in Kaggle competitions” (Allaire and Chollet, 16: 2018). Finally, Allaire and Chollet (18: 2018) conclude: “These are the two techniques you should be most familiar with in order to be successful in applied machine learning today: gradient boosting machines for shallow learning problems, and deep learning for perceptual problems. In technical terms, you’ll need to be familiar with XGBoost and Keras…” Of course, Google Cloud offers the out-of-the-box XGBoost algorithm. Moreover, the Keras package runs on top of Google’s Tensorflow, which in effect means that Google’s out-of-the-box Deep Learning offering is at least as powerful as Keras.

4.1.2 The Complexity of Running Machine Learning

One of the largest barriers to adopting machine learning at an enterprise level is the lack of skills, as the McKinsey report suggested. Indeed, the traditional process of machine learning does require a high level of skills and knowledge. For example, Talari (2018) mentions programming, statistics, machine learning, linear algebra, calculus, data visualization and data wrangling as some of the many skills required to become a data scientist. A survey of data scientists presented in Forbes found that data preparation accounts for about 80% of the work of data scientists (Press, 2016). The article was tellingly titled “Cleaning Big Data: Most Time-Consuming, Least Enjoyable Data Science Task, Survey Says”. Thus, preparation of data was found to be both the least pleasant and the most time-consuming task (Gill, 2016).

Some of the tasks associated with data preparation include dealing with missing values, different scales of measurement, high correlation or association between predictors, sparse predictors, continuous predictors that follow symmetric or skewed distributions, categorical predictors that are balanced or unbalanced, single-value predictors, collinearity, binning techniques, or predictors without a relationship with the target (Kuhn and Johnson, 2016). Further data preparation tasks identified by Hodnett and Wiley (2018) are the requirement to convert labels to factors for classification problems when using the R programming language, and performing dimensionality reduction on data using techniques like Principal Component Analysis (PCA) in order to tackle what Kelleher, Namee and D’Arcy (2015) refer to as the curse of dimensionality. Instances that have extremely large values for certain features, outliers and irregular cardinality are other data issues that need addressing as part of the data preparation process (Kelleher, Namee and D’Arcy, 2015). Converting data to different types is also something data scientists have to attend to (Provost and Fawcett, 2013). Finally, Kuhn and Johnson (27: 2016) conclude that “data preparation can make or break a model’s predictive ability”.
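Three of the preparation tasks listed above (missing values, different scales of measurement and single-value predictors) can be sketched in a few lines of plain Python. This is a simplified illustration under stated assumptions: the function names and column values are hypothetical, missing values are represented as `None`, and median imputation and z-score standardisation are only one of several reasonable choices.

```python
from statistics import mean, median, pstdev


def impute_median(column):
    """Replace missing values (None) with the median of the observed values."""
    observed = [v for v in column if v is not None]
    fill = median(observed)
    return [fill if v is None else v for v in column]


def is_single_value(column):
    """A predictor with only one distinct value carries no information."""
    return len(set(column)) <= 1


def standardise(column):
    """Rescale to zero mean and unit variance so predictors share a common scale."""
    mu, sigma = mean(column), pstdev(column)
    if sigma == 0:
        return [0.0 for _ in column]
    return [(v - mu) / sigma for v in column]


# Hypothetical daily columns from an analytics export.
raw = {
    "pageviews": [120.0, None, 180.0, 150.0],
    "bounce_rate": [0.68, 0.71, None, 0.65],
    "tracking_enabled": [1.0, 1.0, 1.0, 1.0],
}
imputed = {name: impute_median(values) for name, values in raw.items()}
prepared = {
    name: standardise(values)
    for name, values in imputed.items()
    if not is_single_value(values)
}
```

In practice the median, mean and standard deviation would be estimated on the training data only and then reused for validation and test data, so that no information leaks from the held-out rows.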

Another common issue in machine learning is overfitting, where the model performs better on the training data than on new data (Hodnett and Wiley, 2018). Indeed, most writers on the subject agree that overfitting is one of the main reasons why predictive models fail (Provost and Fawcett, 2013; Kelleher, Namee and D’Arcy, 2015; Kuhn and Johnson, 2016; Hodnett and Wiley, 2018; Allaire and Chollet, 2018; Lantz, 2015).

Yet another important decision that data scientists need to make relates to their evaluation technique. For example, many practitioners still use the holdout evaluation technique, where data is split into a single training set and a single test set. However, Provost and Fawcett (126: 2013) argue “Should we have any confidence in a single estimate of model accuracy? It might have just been a single particularly lucky (or unlucky) choice of training and test data”. Many researchers have also argued that validation using a single test set can be a poor choice for combating overfitting and that resampling methods like cross-validation should be used (Kuhn and Johnson, 2016).
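The difference between a single holdout estimate and a resampled estimate can be sketched in a few lines of Python. This is a minimal illustration using only the standard library: the toy target values, the mean-only baseline "model" and all function names are assumptions made for the example and are not taken from the study.

```python
import math
import random


def rmse(actual, predicted):
    """Root mean squared error between two equal-length sequences."""
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def k_fold_indices(n_rows, k, seed=0):
    """Shuffle the row indices once and deal them into k disjoint folds."""
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def cross_validated_rmse(targets, k=5, seed=0):
    """Score a mean-only baseline with k-fold cross-validation.

    Each fold is held out exactly once; the baseline predicts the mean of
    the remaining targets. One RMSE value is returned per fold.
    """
    scores = []
    for fold in k_fold_indices(len(targets), k, seed):
        held_out = set(fold)
        train = [t for i, t in enumerate(targets) if i not in held_out]
        prediction = sum(train) / len(train)
        test = [targets[i] for i in fold]
        scores.append(rmse(test, [prediction] * len(test)))
    return scores


# Hypothetical target values, for illustration only.
toy_targets = [1, 3, 2, 8, 5, 9, 4, 7, 6, 10, 12, 11, 15, 13, 14, 20, 18, 16, 19, 17]
fold_scores = cross_validated_rmse(toy_targets, k=5)
print("per-fold RMSE:", [round(s, 3) for s in fold_scores])
print("mean RMSE:", round(sum(fold_scores) / len(fold_scores), 3))
```

The spread of the per-fold values shows how much a single holdout estimate could vary depending on which rows happen to land in the test set.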

Furthermore, splitting the data into training and test sets alone may not be sufficient to prevent overfitting, which is why adding a third validation split is recommended to prevent data leakage and overfitting (Lantz, 2015; Kelleher, Namee and D’Arcy, 2015). Hence, deciding on an evaluation method needs to consider aspects like whether a hold-out validation set or a cross-validation technique is better for the task at hand (Chollet and Allaire, 2018).
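A three-way split of this kind could look as follows. This is a sketch only: the 80/10/10 proportions, the fixed seed and the function name are assumptions chosen for the example, not values prescribed by the cited authors.

```python
import random


def train_validation_test_split(rows, validation_share=0.1, test_share=0.1, seed=0):
    """Shuffle the rows and cut them into three disjoint subsets."""
    if validation_share + test_share >= 1:
        raise ValueError("validation and test shares must leave room for training data")
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_share)
    n_validation = round(len(shuffled) * validation_share)
    test = shuffled[:n_test]
    validation = shuffled[n_test:n_test + n_validation]
    train = shuffled[n_test + n_validation:]
    return train, validation, test


rows = list(range(100))  # stand-in for 100 data rows
train, validation, test = train_validation_test_split(rows)
print(len(train), len(validation), len(test))
```

The validation set is used while tuning the model, so that the test set is consulted only once, for the final performance estimate.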

Another critical task that data scientists have traditionally taken on is feature engineering, a process that uses various techniques to “apply hardcoded (nonlearned) transformations to the data before it goes into the model” (Kelleher, Namee and D’Arcy, 2015; Chollet and Allaire, 93: 2018) and to remove non-informative or redundant predictors from the model (Kuhn and Johnson, 2016). Kuhn and Johnson (27: 2016) confirm that “how the predictors are encoded, called feature engineering can have a significant impact on model performance”. Perhaps the most telling example of how data scientists perceive feature engineering is a training session on YouTube by Data Science Dojo, in which the trainer proudly told his students eight times that feature engineering was the main process offering job security to data scientists (Data Science Dojo, 2017).

It is therefore not hard to argue that becoming a data scientist is a very demanding undertaking given the large number of skills required, which explains the large skills gap and shortage.

4.1.3 Advantages of XGBoost

As Lantz pointed out, gradient boosting machines and deep learning have been revolutionising machine learning, while Chollet and Allaire confirmed that nowadays knowledge of the XGBoost and Deep Learning algorithms represents the most important skill required of machine learning practitioners. XGBoost is a tree-based machine learning algorithm using a gradient boosting framework… “apply the principle of boosting weak learners using the gradient descent architecture. However, XGBoost improves upon the base GBM framework through systems optimization and algorithmic enhancements” (Morde, 2019). Put more simply, XGBoost uses a technique in which many weak classifiers are combined into a strong classifier (Kuhn and Johnson, 2016).
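The principle of combining many weak learners can be illustrated with a deliberately small sketch. The code below is plain gradient boosting for squared-error regression with one-split decision stumps; it is not XGBoost itself and omits the regularisation and systems optimisations that Morde (2019) refers to. The toy data, learning rate, number of rounds and all function names are assumptions made for the example.

```python
def fit_stump(xs, residuals):
    """Find the single threshold on x that best splits the residuals."""
    best = None
    for threshold in sorted(set(xs))[:-1]:
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        error = (sum((r - left_mean) ** 2 for r in left)
                 + sum((r - right_mean) ** 2 for r in right))
        if best is None or error < best[0]:
            best = (error, threshold, left_mean, right_mean)
    if best is None:
        raise ValueError("need at least two distinct x values")
    return best[1:]  # (threshold, left_mean, right_mean)


def predict_stump(stump, x):
    threshold, left_mean, right_mean = stump
    return left_mean if x <= threshold else right_mean


def fit_boosted_stumps(xs, ys, n_rounds=50, learning_rate=0.1):
    """Each round fits a stump to the current residuals (the negative
    gradient of the squared-error loss) and adds a damped copy of it."""
    base = sum(ys) / len(ys)
    predictions = [base] * len(ys)
    stumps = []
    for _ in range(n_rounds):
        residuals = [y - p for y, p in zip(ys, predictions)]
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        predictions = [p + learning_rate * predict_stump(stump, x)
                       for p, x in zip(predictions, xs)]
    return {"base": base, "learning_rate": learning_rate, "stumps": stumps}


def predict(model, x):
    boost = sum(predict_stump(s, x) for s in model["stumps"])
    return model["base"] + model["learning_rate"] * boost


def mean_squared_error(actual, predicted):
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


# Hypothetical non-linear toy data.
xs = list(range(1, 11))
ys = [x * x for x in xs]
model = fit_boosted_stumps(xs, ys)
baseline_mse = mean_squared_error(ys, [model["base"]] * len(ys))
boosted_mse = mean_squared_error(ys, [predict(model, x) for x in xs])
print("baseline MSE:", round(baseline_mse, 2), "boosted MSE:", round(boosted_mse, 2))
```

No single stump can describe the curve, but the sum of many small corrections tracks it far more closely than the constant baseline does.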

Indeed, Kuhn and Johnson (2016) show that tree-based algorithms like XGBoost have a variety of advantages over traditional ML algorithms. For example, tree-based algorithms can handle a variety of data, whether sparse, skewed, continuous or categorical, without the need to pre-process it. Moreover, decision trees do not require input specifying the relationship of the predictors with the outcome. Furthermore, tree-based models handle missing values and zero-variance predictors (predictor variables with a single unique value) effectively. Tree-based models are also resistant to outliers and have built-in feature selection (Kuhn and Johnson, 2016). Similarly, Hodnett and Wiley (2018) point to the popularity of Gradient Boosting Machines due to their ability to avoid overfitting effectively. Finally, Chollet and Allaire (2018) explain that before deep learning and gradient boosting machines, feature engineering was a critical task of data scientists “because classical shallow algorithms didn’t have hypothesis spaces rich enough to learn useful features by themselves…fortunately, modern deep learning and gradient boosting machines remove the need for feature engineering…are capable of automatically extracting useful features from raw data” (Chollet and Allaire, 94: 2018).

In conclusion, while “the way you presented the data to the algorithm was essential to its success” (Chollet and Allaire, 94: 2018), modern algorithms like XGBoost require neither pre-processing of the data nor feature engineering, and are robust to predictor noise (Kuhn and Johnson, 550: 2016). The only downside suggested by Kuhn and Johnson (2016) and Lantz (2015) is the high computational power required by the algorithms. However, Hodnett and Wiley (2018) point out that using Google’s Cloud ML Engine to run machine learning jobs in the cloud has the advantage of both significantly reducing the computation time and reducing costs under the pay-as-you-go option. In a nutshell, the superiority and ease of implementation of a small number of algorithms over the multitude of more traditional algorithms explains the decisions of both Google Cloud and AWS to supplement their offering with these specific best-in-class pre-built algorithms, in addition to machine learning solutions capable of analysing the data presented and automatically choosing the best algorithms to tackle that data in the cloud.

4.1.4 Pre-built Algorithms on Google Cloud

Google’s AI Platform (formerly Google Cloud ML Engine) enables the training of machine learning models, hosting of the trained model in the cloud and use of the new model for prediction on new data. In their introduction to the built-in algorithms Google emphasizes that “with built-in algorithms on AI Platform, you can run training jobs on your data without writing any code for a training application. You can submit your data, select an algorithm, and allow AI platform to handle the pre-processing and training for you. After that, it’s easy to deploy your model and get predictions on AI platform” (Google, 2019). Furthermore, Google’s pre-built algorithms allow for easy hyperparameter tuning: the user selects a goal metric, like maximizing accuracy or minimizing the training loss, and can tune specific hyperparameters or set ranges for their values (Google, 2019). Google Cloud offers three built-in algorithms: a linear learner used for logistic regression, binary classification and multiclass classification; a wide and deep algorithm for large-scale classification and regression tasks such as recommendation systems, search or ranking problems; and XGBoost for efficient supervised learning on classification, regression and ranking tasks (Google, 2019).

In terms of data pre-processing, Google Cloud AI Platform only requires that the uploaded data is in CSV format, that the header row is removed and that the target column is the first column. For the Linear Learner and Wide and Deep algorithms, users must additionally convert integers into floats when the data is intended to be treated as numerical, i.e. {101.0, 102.0, 103.0}; when integers are instead intended to be treated as categorical, a non-numeric string is prepended to each value, i.e. {code_101, code_102, code_103} (Google, 2019). The AI Platform then proceeds to analyse the data: the data type of each column is detected automatically, the transformation to be applied is identified, and some statistics are computed for the data. Next, the AI Platform transforms the data by splitting the dataset into training, validation and test sets, removing rows that have in excess of 10% of the features missing, and filling in missing values using the mean for numerical columns and zeroes for categorical columns (XGBoost) (Google, 2019).
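The formatting requirements can be sketched as a small preparation step run before upload. This is an illustration only: the column names, the sample values and the helper functions are hypothetical, and reading the string prefix as a marker for integer columns meant to be categorical is an assumption about the intent of the {code_101, code_102, code_103} example.

```python
import csv
import io


def prepare_rows(header, rows, target, numeric_int_columns=(), categorical_int_columns=()):
    """Put the target first, rewrite integer columns, and drop the header."""
    target_index = header.index(target)
    order = [target_index] + [i for i in range(len(header)) if i != target_index]
    as_float = {header.index(name) for name in numeric_int_columns}
    as_code = {header.index(name) for name in categorical_int_columns}
    prepared = []
    for row in rows:
        new_row = []
        for i in order:
            value = row[i]
            if i in as_float:
                value = str(float(value))   # 101 becomes 101.0
            elif i in as_code:
                value = f"code_{value}"     # 101 becomes code_101
            new_row.append(value)
        prepared.append(new_row)
    return prepared  # the header row is intentionally not included


def to_csv_text(rows):
    buffer = io.StringIO()
    csv.writer(buffer, lineterminator="\n").writerows(rows)
    return buffer.getvalue()


# Hypothetical columns and values.
header = ["pageviews", "region_code", "ranking"]
rows = [["120", "101", "3"], ["87", "102", "12"]]
prepared = prepare_rows(
    header, rows, target="ranking",
    numeric_int_columns=["pageviews"],
    categorical_int_columns=["region_code"],
)
csv_text = to_csv_text(prepared)
print(csv_text)
```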

Example of transformation, where rows with 10% of missing values are removed and the row has 10 values:
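The rule can be sketched in Python as follows. The sketch assumes that "in excess of 10%" means a row with exactly one of ten values missing is kept, and that column means are computed after the sparse rows have been removed; neither detail is confirmed by the source. The sample rows and function names are hypothetical.

```python
def drop_sparse_rows(rows, max_missing_share=0.10):
    """Keep rows whose share of missing (None) values is within the limit."""
    kept = []
    for row in rows:
        missing_share = sum(value is None for value in row) / len(row)
        if missing_share <= max_missing_share:
            kept.append(row)
    return kept


def fill_missing(rows, categorical_columns=()):
    """Fill None with the column mean (numerical) or zero (categorical)."""
    categorical = set(categorical_columns)
    fill_values = []
    for column in range(len(rows[0])):
        if column in categorical:
            fill_values.append(0)
            continue
        present = [row[column] for row in rows if row[column] is not None]
        fill_values.append(sum(present) / len(present) if present else 0.0)
    return [
        [fill_values[c] if value is None else value for c, value in enumerate(row)]
        for row in rows
    ]


# Three hypothetical rows of ten values each.
rows = [
    [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0],      # complete
    [3.0, None, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0, 11.0, 12.0],   # 1 of 10 missing
    [None, None, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0],     # 2 of 10 missing
]
kept = drop_sparse_rows(rows)
filled = fill_missing(kept)
print(len(kept), "rows kept;", "filled value:", filled[1][1])
```

With ten values per row, one missing value is exactly 10% and the row survives, while two missing values (20%) exceed the limit and the row is dropped.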

Finally, small differences exist in the transformation process depending on the built-in algorithm used; however, these transformations are all performed automatically.

The Linear Learner algorithm

Google makes the use of the linear learner algorithm extremely simple. Following the pre-processing step already described, “the AI platform processes the mix of categorical and numerical data into an all numerical dataset in order to prepare it for training” (Google, 2019). Then, using the dataset and the parameters supplied, Google runs the training.

The process of training and predicting is relatively simple for all three built-in algorithms; the difference lies in the model-specific hyperparameters.

The Wide and Deep algorithm

The very same approach is being employed by Google in training both the Linear Learner and the Wide and Deep algorithm.

The process of uploading data and deploying is identical to that of the linear model; the main differences are in the hyperparameters.

XGBoost

XGBoost works on numerical tabular data and is therefore ideal for the task at hand. Google employs the very same approach for training XGBoost as for the Linear Learner and the Wide and Deep algorithms.

The process of uploading data and deploying is identical to that of the linear model; the main differences are in the hyperparameters.

AutoML Tables

Another Machine Learning service provided by Google, “AutoML Tables enables your entire team of data scientists, analysts, and developers to automatically build and deploy state-of-the-art machine learning models on structured data at massively increased speed and scale” (Google, 2019). In terms of pre-processing, AutoML Tables automates feature engineering and helps the user detect and fix issues related to missing values, outliers and other common data issues. The interface is codeless, and AutoML produces state-of-the-art models by automatically matching the best algorithms available from Google to the data at hand. Easy to deploy, it also saves time by reducing “the time needed to go from data to top-quality, production ready machine learning models from months to just a few days” (Google, 2019).

In the first instance, AutoML allows users to upload their data and provides information about missing data, correlation, cardinality and distribution of the features. Next, AutoML performs automatic feature engineering tasks like standardization and normalization, creating one-hot encodings and embeddings for categorical features, basic processing for text features and extracting date- and time-related features from Timestamp columns. At the next step, AutoML trains the model using several algorithms at the same time, which allows the performance of the models to be compared and the best model to be identified. The algorithms included within AutoML’s architecture are: Linear, Feedforward deep neural network, Gradient Boosting Decision Tree, AdaNet and Ensembles of various model architectures (Google, 2019). AutoML also provides a lot of flexibility in setting the data split percentages and in weighting some rows more heavily than others.
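Two of the feature engineering steps named above, standardization and one-hot encoding, can be illustrated with the standard library alone. The sketch only shows what these terms mean; it makes no claim about how AutoML Tables implements them internally, and the sample values and function names are hypothetical.

```python
import statistics


def standardize(values):
    """Rescale values to zero mean and unit (population) standard deviation."""
    mean = statistics.fmean(values)
    stdev = statistics.pstdev(values)
    if stdev == 0:
        return [0.0 for _ in values]  # a constant column carries no information
    return [(v - mean) / stdev for v in values]


def one_hot(values):
    """Turn a categorical column into one 0/1 indicator column per category."""
    categories = sorted(set(values))
    encoded = [[1 if v == c else 0 for c in categories] for v in values]
    return categories, encoded


# Hypothetical numerical and categorical columns.
session_seconds = [2, 4, 4, 4, 5, 5, 7, 9]
traffic_source = ["organic", "direct", "organic"]

scaled = standardize(session_seconds)
categories, encoded = one_hot(traffic_source)
print([round(v, 2) for v in scaled])
print(categories, encoded)
```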

The training process is simple, truly a point and click process:

4.2 Google Analytics Metrics (features)

4.2.1 Introduction

Given the importance of great content many practitioners have been hypothesizing about the Google Analytics metrics that best enhance Google’s perception of what great content is. Enge, Spencer, Stricchiola and Fishkin (2013) explain that the method used by Search Engines to evaluate quality content is by measuring user interaction. They present Bounce Rate, Time on Site and page views per visitor as the main Google Analytics metrics measuring quality of content. Neil Patel (2019) also agrees with Bounce rate, Average Time on Site and Page Views metrics but proposes that New Customers, Returning visitors, Page Depth and Frequency of Visits should also be included as measures of quality content. Finally, Osman (2019) best sums up all the Google Analytics metrics that may represent measures of quality content for ranking purposes:

  • Bounce rate
  • Time on page
  • Average Session Duration
  • Page Views
  • Average Pages per session
  • Unique visitors
  • Returning Visitors
  • Direct website traffic
  • Conversion rate
  • Frequency of visits
  • Average page load time

4.2.2 Bounce rate

Enge, Spencer, Stricchiola and Fishkin (50: 2013) define bounce rate as “The percentage of visitors who visit only one page on your website”. As an example of poor content quality, they point to a large number of users clicking on a search result only to return immediately to the search results and click on another link. The explanation behind this example is that users bouncing back quickly from the website suggests the website is not relevant to their search query, hence Google will award the website lower search engine rankings. However, other practitioners argue that a high bounce rate is not necessarily a bad thing, as it may mean that the user received the information he or she needed and left the website happy. For example, Stetzer (2016) argues that “Having a high bounce rate on something like a ‘contact us’ page can actually be a good thing. That’s more of a call-to-action site, where the goal of that particular page is to have the user find the contact information, and then actually contact the business. The visitor got what they came for and then left. Extra navigation around the website doesn’t really mean anything in this case.” From another perspective, Haws (2013) points to a statement from Matt Cutts, Google’s liaison with the SEO community, who stated that Google is not using Bounce Rate as a ranking factor at all. Thus, a more scientific approach to determining the impact of Bounce rate on rankings is required.

4.2.3 Time on page

Enge, Spencer, Stricchiola and Fishkin (50: 2013) define time on site as “The Average amount of time spent by users on the site”. Osman (2019) continues “this metric provides an indication of interest”. Osman argues that when a user spends only 10 seconds on a 2100 long page “you can be very sure that they weren’t interested in the content”. However, Haws (2013) points out that “Google has never stated whether or not they use Visitor time (or time on site) as a ranking factor”. Hence, further investigation is needed to better understand the impact of this metric.

4.2.4 Average Session Duration

Osman (2019) defines average session duration as a metric tracking the average duration of all the activity of a visitor on a particular website. She recommends that webmasters analyse their average session duration metric and determine what has made users exit the page prematurely. In effect, just as is the case with Bounce rate, Osman assumes that a higher average session duration equates with users being more engaged with the content. Of course, the same argument presented in the section on bounce rate stands: what if users are consistently finding the answer to their question, which removes the need for them to stick around on the website?

4.2.5 Pageviews

Pageviews is defined by Osman (2019) as “the most basic of all user engagement metrics, measuring an instance of a user visiting a particular page on your website…a high number can be assumed to be an indicator of interest and /or good SEO practices”. In effect, Osman argues that the more pages are viewed by users, the better the quality of the website. However, opinions are divided on the matter. For example, in a forum conversation on one of the most prestigious SEO websites (moz.com) titled “Do Page Views Matter? (ranking factor?)”, Rand Fishkin, the renowned SEO thinker, argues that “page views in and of themselves are almost certainly not a raw ranking factor, but it could well be that engagement metrics that correlate well with page views do have a direct or indirect positive impact on rankings” (Fishkin, 2015). Again, it is worth noting that both views are argued on a hypothetical basis, with no real research into the matter, which is why this research will further the knowledge on the subject.

4.2.6 Average pages per session

The metric is defined by Enge, Spencer, Stricchiola and Fishkin (50: 2013) as “the average number of pages viewed per visitor on your site”. While the authors present Average pages per session as a ranking factor, no research is presented to confirm this assertion. De Vivo (2019) shows that the average pages per session for most industries is 2 and that anything more than that is a positive signal of engagement. De Vivo explains by pointing out that users visiting more pages is a sign that they are engaged with the content and that the website is useful to the users. Of course, as we have previously seen this may not always be the case, as the example provided in the Bounce Rate section suggested. Users visiting more pages could equally be a signal that the structure of the website needs improving, users cannot easily find what they are looking for or the pages are irrelevant to their queries. In reality, it is likely that the impact of the average pages per session may occur in correlation with other factors like time on site. (De Vivo, 2019)


4.2.7 Unique visitors

Osman (2019) uses the term unique visitor for a user who has visited a website at least once; the metric measures how many individual users the website has reached. In effect, the metric assumes that the more visitors the website has, the more popular its content is, and the more Google will reward the website with higher rankings. A study conducted by Gavrilas (2015), who ran a paid campaign on Reddit and increased traffic to his website by 20,000 visitors, found that his website’s search engine rankings also improved, even for very competitive keywords. This finding is in line with the findings of the experiment conducted by Rand Fishkin in the middle of a conference, when he asked attendees to take their phones out and start interacting with a particular website page (Gavrilas, 2015). This indicates that Unique visitors does have an impact on Google rankings, though the impact started to wear off shortly after the campaign had finished. The author proposed that the increased number of visitors to the website combined with the performance of the bounce rate metric may have influenced the result, though no further empirical research has been conducted. It will therefore be interesting to assess this metric as a standalone metric, its collinearity with other metrics and its overall impact.

4.2.8 Returning Visitors

Osman (2019) refers to new website visitors as users who have accessed a website for the first time on a specific device, while returning visitors are users that have already visited a website within a timeframe of two years. Osman argues that a high returning visitors metric could be a sign of loyal followers while “the opposite situation demonstrates that you have some work to do to get people to come back again”. (Osman, 2019) Sutter (2018) also believes that this metric is a ranking signal, though he is taking a more holistic approach and suggests that the impact of Returning Visitors can only be viewed in relation to other metrics of engagement. However, one common trend within the conversation is that once more the suggestions offered by various experts don’t seem to be backed up with empirical research on the matter.

4.2.9 Direct website traffic

Bennet (2019) explains that direct traffic as measured in Google Analytics reports “a traffic source when it has no data on how the session arrived at your website, or when the referring source has been configured to be ignored”. A 2017 study carried out by SEMrush, the top provider of SEO tools, concluded that of all engagement metrics, direct website visits carry the largest value for ranking purposes (SEMrush, 2017). However, within the SEO industry there is still a lot of controversy regarding this finding. For example, in a forum conversation on Quora, Johnson (2018) quotes the SEMrush study to conclude that direct traffic may have an impact on search engine rankings as it signals brand recognition. By contrast, Hedgepeth (2019) points out that Direct Traffic is in fact traffic with no value given the unreliable sources it comes from, and concludes that “direct traffic, to some extent, is something we must live with (at least for the time being)”. Finally, Bennet (2017) both agrees and disagrees, concluding in the end that not all direct traffic is bad.

4.2.10 Conversion rate

Osman (2019) refers to the conversion rate as the percentage of website visitors that complete desired actions on the website, such as purchases or subscriptions. Conversions can be configured in Google Analytics and within Google’s paid services platform (Google Ads), hence the assumption made by Osman is that, as Google can track conversions, the algorithm will also take conversions as an input. However, the conversation is again hypothetical if we consider that the 2017 SEMrush study of ranking factors did not include conversions within its top seventeen factors of impact. The researcher found very little to no research on the subject, hence this study is intended to open the door for further research on the impact of conversions as a ranking factor.

4.2.11 Frequency of visits

Virgillito (2016) refers to frequency of visits as a metric measured in Google Analytics for the loyalty of returning visitors, that is, the number of times the same visitors return to the website. Virgillito suggests that the more often visitors return to the website, the more loyal they are, indicating that the website is of great quality and thus worthy of being ranked better by Google. However, once again this claim is not supported by empirical research, and very little research has been carried out to reveal the impact of this metric as a search engine ranking factor. To test it, the researcher will use as inputs into the machine learning algorithms three features representing the number of people who visited the website two, three and four times.


4.2.12 Average page load time

Kusinitz (2019) refers to page speed as a measure of how fast a webpage loads when users click on the search result for their query within Google. He shows that page speed has an essential impact on user experience and as such is a critical ranking factor. Indeed, Barysevich (2019) refers to several updates by Google to conclude that page speed is a very important factor for user experience. However, in spite of all the assertions about the importance of page speed as a ranking factor, the 2017 SEMrush study of the top seventeen ranking factors surprisingly did not find page speed to be a top ranking signal. The metric that best measures page speed within Google Analytics is Average Page Load time.


5.1 Google Cloud Tables model

Below we can see the Evaluation metrics following the training of the model, and the feature importance graph.

5.1.1 Bounce rate

The importance score for bounce rate is 1.545, and it is the lowest ranked feature.

5.1.2 Time on page

The importance score for this feature is 3.553, and it is the 4th lowest ranked feature.

5.1.3 Average Session Duration

The importance score for this feature is 2.846, and it is the 3rd lowest ranked feature.

5.1.4 Pageviews

The importance score for this feature is 4.287, and it is ranked as the 7th most important metric.

5.1.5 Average pages per session

The importance score for this feature is 3.57, and it is the 5th lowest ranked feature.

5.1.6 Unique visitors

The importance score for this feature is 4.256, and it is ranked as the 8th most important metric.

5.1.7 Returning visitors

The importance score for this feature is 6.68, and it is ranked as the 5th most important metric.

5.1.8 Direct website traffic

This is the highest ranked feature by far, with an importance score of 31.61.

5.1.9 Conversion rate

The importance score for this feature is 5.879, and it is ranked as the 6th most important metric.

5.1.10 Frequency of visits

The importance score for Frequency of Visits is 16.363 for three visits, 8.406 for two visits and 8.282 for four visits. In this order, these features are ranked in positions 2, 3 and 4.

5.1.11 Average page load time

The importance score for this feature is 2.742, and it is the 2nd lowest ranked feature.
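The rankings quoted in sections 5.1.1 to 5.1.11 follow from sorting the reported importance scores. The short sketch below re-derives them; the scores are those reported above, while the dictionary labels and variable names are chosen for this example.

```python
# Importance scores as reported in sections 5.1.1 to 5.1.11.
importance = {
    "Bounce rate": 1.545,
    "Time on page": 3.553,
    "Average Session Duration": 2.846,
    "Pageviews": 4.287,
    "Average pages per session": 3.57,
    "Unique visitors": 4.256,
    "Returning visitors": 6.68,
    "Direct website traffic": 31.61,
    "Conversion rate": 5.879,
    "Frequency of visits (two visits)": 8.406,
    "Frequency of visits (three visits)": 16.363,
    "Frequency of visits (four visits)": 8.282,
    "Average page load time": 2.742,
}

# Sort from most to least important and attach a 1-based rank.
ranked = sorted(importance.items(), key=lambda item: item[1], reverse=True)
rank_of = {name: position for position, (name, _) in enumerate(ranked, start=1)}

for name, score in ranked:
    print(f"{rank_of[name]:>2}. {name:<36} {score:>7.3f}")

total = sum(importance.values())
print("sum of scores:", round(total, 3))
```

The thirteen scores sum to roughly 100, which is consistent with each score being a percentage share of total importance, although the source does not state this explicitly.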

5.2 Google Cloud XGboost model

Despite Google’s promise of a codeless machine learning process when running this algorithm, the researcher struggled to run it on a point-and-click basis, even after purchasing the full Google Cloud support package. The researcher succeeded in deploying the model but was unable to use it for prediction purposes due to what seemed to be a complicated coding requirement, which contradicted the purpose of the study. Moreover, Google’s own Cloud support was less than supportive, slow to respond and seemed to lack understanding of this particular product. Hence, the researcher was not able to train the model.


5.3 Google Cloud Linear model

As with the case of the Google Cloud XGboost model, the researcher was not able to train the linear model on a codeless basis, hence the training of the model did not occur.


5.4 Google Cloud Wide & Deep model

As with the case of the Google Cloud XGboost model, the researcher was not able to train the Wide and Deep model on a codeless basis, hence the training of the model did not occur.


5.5 Python XGboost model

Below we can see the Evaluation metrics following the training of the model, and the feature importance graph. For the standard XGboost model, RMSE has a value of 3.238639. In line with the findings of Kuhn and Johnson (2016), optimised versions of the XGboost model do not generally bring significant improvements. In our case, the optimised model saw a slightly improved RMSE value of 3.145879. It is beyond the purpose of this paper to fully explain the code that produced these results; however, the code and the commentary for the whole exercise are included in the appendices section of the study.
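For readers unfamiliar with the metric, RMSE and the size of the improvement between the two reported values can be expressed as below. The two RMSE figures are those reported above; the function names and the small worked example are illustrative and are not the code from the appendices.

```python
import math


def rmse(actual, predicted):
    """Root mean squared error: the square root of the mean squared residual."""
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def relative_improvement(before, after):
    """Share by which an error metric has fallen (positive means better)."""
    return (before - after) / before


# Tiny worked example: residuals of 2 and 0 give sqrt((4 + 0) / 2).
example = rmse([3, 5], [1, 5])

# RMSE values reported for the standard and the optimised XGboost model.
standard_rmse = 3.238639
optimised_rmse = 3.145879
gain = relative_improvement(standard_rmse, optimised_rmse)
print(f"example RMSE: {example:.4f}")
print(f"relative improvement from tuning: {gain:.2%}")
```

On the reported figures the tuning step lowers RMSE by a little under 3%, which supports the description of the improvement as slight.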

5.5.1 Bounce rate

This feature has the 4th highest importance on the target prediction.

5.5.2 Time on page

This feature is ranked as 9th most important feature.

5.5.3 Average Session Duration

This feature is being ranked as the 3rd most important feature.

5.5.4 Pageviews

This feature is ranked as the 8th most important feature on the target.

5.5.5 Average pages per session

This feature is being ranked as the third lowest ranked feature.

5.5.6 Unique visitors

This feature is being ranked by far as the most important feature on the target.

5.5.7 Returning visitors

This feature is being ranked as the fourth lowest ranked feature.

5.5.8 Direct website traffic

This feature is being ranked as second most important feature impacting the target.

5.5.9 Conversion rate

This feature is being ranked as 5th most important feature.

5.5.10 Frequency of visits

Frequency of visits to a website is being ranked as the 7th most important feature for two visits, and as the lowest impact features for three and four visits.

5.5.11 Average page load time

This feature is being ranked as the 6th most important feature.


5.6 Python Keras model

Below we can see the Evaluation metrics following the training of the model, and the feature importance graph. The RMSE value of the model is 9.664. The code and commentary for the whole exercise is included in the paper in the appendices section.

5.6.1 Bounce rate

This feature is being ranked as the 5th most important feature.

5.6.2 Time on page

This feature is being ranked as the 4th lowest feature.

5.6.3 Average Session Duration

This feature is being ranked as the 7th most important feature.

5.6.4 Pageviews

This feature is being ranked as the 8th most important feature.

5.6.5 Average pages per session

This feature is being ranked as the 2nd lowest feature.

5.6.6 Unique visitors

This feature is being ranked as the most important feature.

5.6.7 Returning visitors

This feature is being ranked as the 9th most important feature.

5.6.8 Direct website traffic

This feature is being ranked as the 2nd most important feature.

5.6.9 Conversion rate

This feature is being ranked as the 3rd most important feature.

5.6.10 Frequency of visits

Frequency of visits to a website is being ranked as the 4th most important feature for two visits, 6th for three visits and last for four visits.

5.6.11 Average page load time

This feature is being ranked as the 4th lowest feature.


6. Analysis


6.1 Introduction

Of the four models trained, the Google Cloud Tables model has delivered on its promise of a codeless, easy-to-run interface and is by far the most accurate, achieving an RMSE of 1.576 after only one hour of training. This is impressive given that the tuned XGboost model achieved an RMSE of 3.145, the regular XGboost model an RMSE of 3.238 and the Keras model an RMSE of 9.664. As the researcher was not able to complete the Google Cloud pre-built algorithm exercises, no RMSE values were obtained for the Google Cloud XGboost, Linear and Wide & Deep models. Hence, the researcher will use the Google Cloud Tables model to assess the impact of the engagement metrics taken as features on the target (Google Ranking).

6.2 Can marketers of all backgrounds /without programming experience implement Machine learning models to test their SEO hypotheses by using Google Cloud’s Tables and Pre-built algorithms options?

On the one hand, the superior performance of Google Cloud Tables indicates that Google Cloud has indeed put state-of-the-art machine learning technology in the hands of all marketers. The researcher concludes that all marketers can use Google Cloud Tables to replicate the experiment, and many other SEO experiments, to improve their SEO strategies. On the other hand, Google’s choice of a simple regression model for this task from its “zoo” of algorithms (which included deep learning and Boosting Algorithms) seems to contradict Allaire and Chollet’s (2018) statement that modern data scientists only have to be familiar with XGBoost models and Deep Learning (Keras) techniques. From another perspective, the ease of use of the Google Cloud Tables model seems to achieve the task of democratizing machine learning and of eliminating the requirement for a deep understanding of traditional machine learning models. Finally, the researcher was disappointed with the platform experience and with the low quality of Google Cloud support services when trying to get the XGBoost, Linear and Wide & Deep models running. For this reason, the researcher was unable to compare the performance of the Tables model against the other three Google Cloud pre-built models. As such, the author feels that only after running the state-of-the-art Google Cloud pre-built solutions could he fully confirm or refute Allaire and Chollet’s statement regarding the performance of the XGboost and Deep Learning algorithms against traditional machine learning algorithms. A recommendation for further research was made in the dedicated section. That said, the automatic choice by Google Cloud Tables of a linear algorithm over the more advanced XGboost and deep learning models gives the researcher confidence that, in spite of not being able to run the XGboost and Wide & Deep built-in algorithms, the study delivered the best possible RMSE value for the data presented.

6.3 Bounce rate

The low ranking of this feature confirms the statement by Google’s Matt Cutts that Google is not using bounce rate as an SEO ranking factor. The assertion made by many SEOs that a high bounce rate could in fact be a good thing, as users found what they were looking for in one click, seems to stand. In this light, the large percentage of SEO practitioners who advocate reducing bounce rate as an essential part of ranking efforts can now review their SEO strategies and redirect their efforts and resources towards more impactful actions.

6.4 Time on page

The ranking of this feature as the 4th lowest in importance on target contradicts Osman’s (2019) assessment that when webpages have low time on page scores “you can be very sure that they weren’t interested in the content”. Moreover, this finding sheds light on the problem described by Haws (2013) regarding Google never mentioning time on page as a ranking metric. This also seems to confirm the findings on the impact of bounce rate on rankings: users not spending a lot of time on a webpage does not equate to low-quality content; on the contrary, it could mean they found what they needed and left the website happy. This finding is very important for how SEOs understand quality of content, and indicates once more that SEOs must focus on answering the user’s question rather than on writing long articles (which has been standard advice within the SEO industry over the last couple of years).

6.5 Average Session Duration

Just as in the case of the time on page metric, the ranking of this feature as the 3rd lowest in importance on target contradicts Osman’s encouragement to review this metric on a regular basis as a matter of urgency. The same conclusions and advice presented in the Time on page section apply: lower session duration can also be an indication that users found answers to their questions and left the website happy. Thus, SEOs must focus on user experience and understand what users really want as an answer to their search query; writing comprehensive articles on specific topics may not always be advisable, regardless of the popular SEO advice, and may in fact increase users’ dissatisfaction by making it more difficult to find the answer to their question.

6.6 Pageviews

While Osman referred to this metric as an indicator of user interest and good SEO practices, its positioning as the 7th ranking factor with a weighting of only 4.287 contradicts her statement. In this light, the researcher agrees with Rand Fishkin’s assertion that “page views in and of themselves are almost certainly not a raw ranking factor”. However, this study also sheds light on Rand Fishkin’s statement that “it could well be that engagement metrics that correlate well with page views do have a direct or indirect positive impact on rankings”. Specifically, the Python models found no significant internal correlation with regard to page views, which further confirms that page views do not have a significant impact on Google ranking. This finding makes intuitive sense given that Google ranks pages rather than websites. For example, a search query for “Ferrari” will return the page that best answers this query; it seems obvious that users visiting other, unrelated pages on the same website (e.g. Rolex, clothes etc.) will do nothing to improve the user’s experience of the Ferrari page and hence will bear no value for the ranking of the Ferrari page. Moreover, as we discovered that metrics like Bounce rate have a very low impact on ranking websites, we can conclude that a lower number of page views may not necessarily equate to a lack of engagement; on the contrary, the user may in fact be satisfied with the answer and find no need to wander around the website.
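A check for internal correlation of the kind mentioned above could be sketched as follows. This is a minimal illustration and not the study’s actual code: the helper name `highly_correlated_pairs`, the 0.8 threshold, the `returning_visitors` column name and the synthetic data are all assumptions made for the example.

```python
import numpy as np
import pandas as pd


def highly_correlated_pairs(df, threshold=0.8):
    """Return (feature_a, feature_b, r) for pairs with |Pearson r| >= threshold."""
    corr = df.corr()
    cols = list(corr.columns)
    pairs = []
    for i in range(len(cols)):
        for j in range(i + 1, len(cols)):
            r = corr.iloc[i, j]
            if abs(r) >= threshold:
                pairs.append((cols[i], cols[j], float(r)))
    return pairs


# Synthetic data for illustration only; not the study's dataset.
rng = np.random.default_rng(0)
n = 200
returning = rng.normal(size=n)
demo = pd.DataFrame({
    "page_views": rng.normal(size=n),
    "bounce_rate": rng.normal(size=n),
    "returning_visitors": returning,
    # Built to track `returning` closely, so the pair is flagged.
    "frequency_of_visits_2": returning + rng.normal(scale=0.1, size=n),
})
print(highly_correlated_pairs(demo))
```

In this synthetic example only the deliberately related pair is reported; independent columns such as `page_views` fall well below the threshold.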

6.7 Average pages per session

This is another factor with a low importance at only 3.57, and its ranking as the 5th lowest impact feature contradicts Enge, Spencer, Stricchiola and Fishkin’s reference to this metric as a strong ranking factor. Just as we have seen in the section on page views, visiting more pages of the same website will neither enhance the value of one particular page, nor be an indication of better engagement vs. websites with lower average pages per session. Furthermore, no internal correlation was found between average pages per session and other metrics, which contradicts De Vivo’s suggestion that this metric may play an important part when taken in correlation with other factors like time on site.

6.8 Unique visitors

This is another factor with a low importance at only 4.256, and its ranking as only the 8th most important feature contradicts the findings of the study by Gavrilas, and Rand Fishkin’s live experiment results. Moreover, no internal correlation was found between this metric and other metrics like bounce rate, as suggested by Gavrilas. This finding was very surprising given the widely held belief within the SEO community that more new visitors equate to high interest in and popularity of the webpage, and hence that Google would reward the page with higher rankings. Of course, Fishkin’s live experiment was run in 2015, and since then Google’s machine learning algorithms have improved and presumably dismissed or reduced the importance of this factor. Again, intuitively this finding makes sense: a large number of unique visitors reaching the website may not always be an indicator of a good-quality website. For example, a company may run a Reddit campaign as Gavrilas did in 2015, and drive users to a low-quality article; users would quickly leave the website. However, the lack of correlation between this metric and metrics like bounce rate or time on site negates this theory as well, which leaves us with the conclusion that unique visitors does not represent a significant ranking factor.

6.9 Direct website traffic

The ranking of this feature as the highest ranked feature, with a score of 31.61, confirms the findings of the SEMRush study regarding the high impact of this metric. This finding sheds light on an important issue: while unique visitors to the website do not seem to have a strong impact on website rankings, because they can be “bought” via advertising campaigns, direct traffic is an indication of loyalty and brand power, as users type the name of the website directly into their browser. Again, this finding is intuitive given the well-known Google preference for brands in search results. For example, one of the top digital marketers, Brian Dean, tellingly titles his SearchEngineJournal article “Love it or Hate It, There’s No Doubt That Google Prefers Brands”. The celebrated digital marketing speaker Aron Wall also concludes that “Google Loves Brands” and supports his conclusion with various research, while also referring to direct statements by Eric Schmidt and Matt Cutts, at the time CEO and Head of Spam respectively: “Brands are the solution not the problem…brands are how you sort out the cesspool” (Eric Schmidt in Wall, 2016), and “We actually came up with a classifier to say, okay, IRS or Wikipedia or New York Times is over on this side, and the low-quality sites are over on this side” (Matt Cutts in Wall, 2016). In conclusion, it makes sense that a strong branding feature such as direct traffic bears such a high importance as a ranking signal.
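One common, model-agnostic way to obtain feature rankings of the kind discussed in this chapter is permutation importance: shuffle one feature at a time and measure how much the prediction error grows. The sketch below is a hand-rolled illustration under stated assumptions; it is not the procedure Google Cloud Tables uses to produce its scores, it uses a plain least-squares model rather than the study’s models, and the data, coefficients and feature names are synthetic.

```python
import numpy as np


def fit_linear(X, y):
    """Ordinary least squares with an intercept; returns the coefficient vector."""
    X1 = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(X1, y, rcond=None)
    return coef


def predict_linear(coef, X):
    return np.column_stack([np.ones(len(X)), X]) @ coef


def rmse(y_true, y_pred):
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))


def permutation_importance(coef, X, y, n_repeats=10, seed=0):
    """Mean increase in RMSE when each column is shuffled in turn."""
    rng = np.random.default_rng(seed)
    baseline = rmse(y, predict_linear(coef, X))
    importances = []
    for col in range(X.shape[1]):
        increases = []
        for _ in range(n_repeats):
            X_perm = X.copy()
            X_perm[:, col] = rng.permutation(X_perm[:, col])
            increases.append(rmse(y, predict_linear(coef, X_perm)) - baseline)
        importances.append(float(np.mean(increases)))
    return importances


# Synthetic data: feature_a drives the target strongly, feature_b weakly,
# feature_c not at all. These weights are arbitrary, chosen for illustration.
rng = np.random.default_rng(42)
X = rng.normal(size=(500, 3))
y = 5.0 * X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.1, size=500)

coef = fit_linear(X, y)
names = ["feature_a", "feature_b", "feature_c"]
scores = permutation_importance(coef, X, y)
for name, score in sorted(zip(names, scores), key=lambda pair: -pair[1]):
    print(name, round(score, 3))
```

For brevity the importance is measured on the same data used for fitting; in practice a held-out set would normally be used.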

6.10 Returning visitors

Given its importance score of 6.68, and its ranking as the 5th most important metric, the study confirms Osman’s statement that a high number of return visitors represents a sign of loyalty. No internal correlation was found between this metric and others, which rejects Sutter’s suggestion that this metric is impactful only in relation to other metrics. Just as in the case of Direct Traffic, the Returning Visitors metric seems to indicate a high level of loyalty and brand strength, and as Google “loves brands” it certainly seems to love this metric too.

6.11 Conversion rate

The positioning of this metric in 6th place among the most important metrics, with a weighting of 5.879, is somewhat surprising for several reasons. Firstly, the highly regarded SEMRush study did not find conversion to influence website ranking. Secondly, creative SEOs could manipulate this metric by creating easily achievable goals within Google Analytics. This runs counter to Google’s obsession with reducing or eliminating ranking factors that can be manipulated with human input. No significant internal correlation was found while running the algorithms either, which could otherwise have explained the high importance of this metric. This study found that conversion rate has a strong impact on website ranking. However, the researcher believes that more research is needed to investigate this finding and its potential correlation with other factors not studied as part of this research. A recommendation for further research is made by the researcher in the dedicated chapter.

6.12 Frequency of visits

This metric was found to have a strong impact on the target, given that its three features ranked in the 2nd, 3rd and 4th positions. The findings confirm Virgillito’s statement that the more often users return to the website, the more the website’s quality is rewarded by Google with higher rankings. And, as we have seen in the Direct Traffic and Returning visitors sections, the more loyal the visitors are the stronger the brand is perceived to be, and the stronger the brand is the more Google “loves” the brand. The Python models also identified a high internal correlation of the three frequency features with the Returning Visitors metric, and on removing these features the model improved slightly.
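The effect of removing correlated features can be checked by comparing the held-out error with and without them. The following is a minimal sketch of such an ablation, assuming a simple least-squares model and synthetic data; the helper `holdout_rmse`, the 80/20 split and all variable names are hypothetical and do not reproduce the study’s XGBoost or Keras models.

```python
import numpy as np


def holdout_rmse(X, y, train_fraction=0.8, seed=0):
    """Fit least squares on a random training split; return RMSE on the rest."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(X))
    cut = int(train_fraction * len(X))
    train, test = order[:cut], order[cut:]
    X1 = np.column_stack([np.ones(len(X)), X])  # add an intercept column
    coef, *_ = np.linalg.lstsq(X1[train], y[train], rcond=None)
    residuals = y[test] - X1[test] @ coef
    return float(np.sqrt(np.mean(residuals ** 2)))


# Synthetic data for illustration only; not the study's dataset.
rng = np.random.default_rng(1)
n = 300
returning = rng.normal(size=n)
frequency = returning + rng.normal(scale=0.05, size=n)  # near-duplicate feature
other = rng.normal(size=n)
y = 2.0 * returning + 1.0 * other + rng.normal(scale=0.5, size=n)

X_full = np.column_stack([returning, frequency, other])
X_reduced = np.column_stack([returning, other])  # correlated feature removed

# The same seed gives both models the same train/test split.
rmse_full = holdout_rmse(X_full, y)
rmse_reduced = holdout_rmse(X_reduced, y)
print(f"all features:       {rmse_full:.3f}")
print(f"without correlated: {rmse_reduced:.3f}")
```

When a removed feature is almost a copy of one that remains, the two errors are expected to be close, which is consistent with only a slight change being observed.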

6.13 Average page load time

Given its ranking as the 2nd lowest impact on target at 2.742, the study concludes that this metric has a very low impact on Google’s ranking of websites. The finding is in line with the algorithmic study carried out by SEMRush, and is highly surprising given that page speed is widely acknowledged within the SEO community as a highly important factor affecting user experience and SEO rankings. The findings of this study and of SEMRush’s study re-emphasize the importance of empirical research using advanced machine learning tools to test SEO hypotheses; nowhere is this truer than in regard to this metric, whose importance has been emphasized over and over again through the years.


7.1 Introduction

This chapter comes as a natural conclusion of the Findings and Analysis chapters. Instead of each subject being discussed individually, as was the case in those chapters, an overall conclusion on the impact of all features is drawn; the researcher found this approach more appropriate given that each factor was discussed at length within the Analysis chapter. Based on this analysis, the chapter ends with some recommendations for the client and the general implications of the study.

7.2 Discussion

The findings of the analysis show a very clear trend: building a digital brand by increasing loyalty to the brand website and the number of return visitors pays off. The high rankings of metrics like Direct Website Traffic, Frequency of Visits and Returning Visitors confirm that Google’s algorithms reward high loyalty to a page and website. By contrast, many myths are being debunked. Specifically, contrary to popular belief, metrics like Bounce rate, Time on page, Average session duration, Pageviews or Average pages per session are representative neither of higher or lower engagement with a webpage or website, nor of a better-quality website. In fact, lower values of any or all of these metrics may just as well be a sign of relevancy: users found what they were looking for and left the website happy and engaged. Another metric that is often thought to impact website rankings is the number of unique visitors to the website; however, as the study found, this metric is not a measure of loyalty or quality of the website. The researcher speculates that this is because new users can be purchased via advertising campaigns rather than earned through relevancy to the search query or loyalty.

Conversion rate is another metric that was found to have a high impact on website rankings. The finding is surprising, and it suggests that as users convert to being clients Google considers that they have found what they were looking for. The purchase or conversion would represent the highest level of engagement of users with a brand. 

Finally, the low impact of Pagespeed on website ranking goes against the popular SEO belief that the faster the website the more engaged the users are, and the less likely to leave the website. The researcher speculates that given the high prevalence of loyalty and strength of a brand as found by the study, users are more likely to accept lower page speed scores to reach a website they value and are loyal to.


7.3. Recommendations to the client

The client has been changing his SEO strategy over the past 12 months as advised by the digital marketing agency supporting his digital growth. Specifically, he increased the length of each article from the usual 800 words to an average of 1400 words in order to positively impact metrics like time on page or average session duration. He has also re-worked the internal linking of the website by adding more links to other pages/articles from and to the pages and articles visited by the users; this was aimed at improving metrics like bounce rate, page views, average pages per session and average session duration. Similarly, the client started to add long FAQ sections to the actual pages in response to the same advice that more content will improve these metrics. Furthermore, the client has invested a significant amount of resources, both in agency time and in purchasing new hosting for the website, separate hosting for the database and so forth; this action was meant to improve the average page load metric significantly, in line with the popular SEO advice. Finally, the client embarked on a spending spree, running Facebook and Twitter ads to increase the number of unique visitors to the website, with no effort put into segmentation, relevant audiences or any other data-based decision-making processes.

In spite of all these actions, the client continues to lose SEO traffic and market share vs. the previous period. We now know that the client has been focusing on the wrong metrics or, as the famous and universally applicable Pareto principle would put it, he has focused on the 80% of the actions that delivered 20% of the results. However, as this study found, the client is better off focusing 80% of his time on building brand loyalty and power, and loyal visitors. This will positively impact metrics like direct traffic, returning visitors, frequency of visits and conversions. It is beyond the scope of this study to provide advice on specific digital marketing strategies or actions, but the client must reconsider the goals of his digital marketing campaign, and the targets set for the digital marketing agency he employs. For example, at the moment the client is targeting the agency on increased visits to the website, even though we have seen that the real challenge is in transforming those unique visitors into returning / loyal website visitors. Similarly, the client is targeting the agency on improving metrics like bounce rate, average session duration or average pages per session, which as we have seen are the wrong metrics, a problem discussed at length by Eric Ries in his 2011 book “The Lean Startup: How Constant Innovation Creates Radically Successful Businesses”. The client should also redistribute some of the large budget he allocated to improving what seemed to be an acceptable page speed score towards improving the metrics we discussed. Finally, improvements in SEO rankings and SEO traffic are more relevant targets than the total number of visitors, which can be skewed by irrelevant traffic from social media campaigns.


7.4 Implications of the study

The study confirmed some of the advice provided by SEO experts on improving the average Google ranking of websites. On the other hand, the study has also found that many of the factors believed to have a high importance for website ranking do not. As such, the study emphasizes the importance of a more empirical framework to supplement the current testing techniques within the SEO industry. The ease of use of Google Tables’ machine learning algorithms removes any barrier for marketers to further test their assumptions on client websites. This will ensure that popular advice is supported by empirical research, and that both SEO practitioners and clients benefit from the newly found insights, and from a more rigorous process of testing assumptions. Specifically, the study found that SEO practitioners are both misusing the agency time and advertising budget allocated by the client and failing to reach the full growth potential of their client websites; this is because they are focusing on low-impact metrics. Instead, agencies should focus on building a strong brand and loyalty, which is more likely to maximize the returns for their clients. From another perspective, increased transparency and a more ethical culture have to be built between the agency / consultant and the clients. Specifically, clients trust SEO consultants to set up the correct metrics and targets, educate them and avoid lack of clarity in what is being achieved. An essential implication of the study is that Google has managed to put machine learning in the hands of every marketer or business owner. For this reason, SEO consultants should reconsider and evolve their role to understand and leverage the large amount of data available within their client businesses. This study demonstrated how existing client data and out-of-the-box machine learning solutions can be used to improve website ranking.
The researcher is confident that marketers can leverage machine learning in many other innovative ways to further improve their clients’ businesses.

Finally, the study re-emphasises the need for a more empirical approach within the SEO industry, and the need for SEOs to adopt a new, data-driven rather than observational mindset in testing their hypotheses. Machine learning tools like Google Cloud AutoML Tables are available, though they cannot be leveraged without this change in mindset within the SEO community.



8.1. Findings vs. Expectations

The researcher is confident that the study has reached its desired outcomes. On the one hand, the study found some very interesting gaps between the rhetorical approach employed by SEOs and the empirical approach via machine learning. On the other hand, the hypothesis that machine learning is accessible to all marketers regardless of programming skills was confirmed by the ease of use of Google’s Tables service, its choice of a simpler regression model and the superior performance of this model against the XGBoost and Keras models in Python. However, the researcher expected the implementation of the built-in algorithms to be hassle-free, in line with Google Cloud’s promises; this expectation was not met and has led to one of the main weaknesses of the research: while the Google Tables model clearly outperformed the XGBoost and Keras models built in Python, we do not know how this model would perform against the powerful models offered as pre-built algorithms. However, the fact that Google’s AutoML Tables service automatically chose a simple regression model for the task, at the expense of the XGBoost or deep learning models that were also available in its arsenal, is a good indication that this was indeed the best possible model for this project’s data.

With regard to the importance of engagement features, the researcher had some of his long-held beliefs contradicted by the results; the researcher will certainly explore the findings of the study further with more data, different websites, different industries and more. For example, based on the large amount of literature read on the subject, conferences attended, blog posts and various talks and courses, the researcher believed and conveyed messages such as: higher time on page, higher average session duration and low bounce rates are indications of content relevancy to the search query, the user is more engaged, the webmaster answers the user’s search query better and so forth. Similarly, as standard advice the researcher would encourage clients to continuously improve the speed of their website. The researcher also perceived a large number of unique visitors to a website as a positive factor at all times, the logic behind this being based on studies like the Reddit study or Rand Fishkin’s experiment discussed earlier. Of course, we have seen that the study suggests these long-held beliefs are wrong. Furthermore, the researcher did not expect to find that conversions would influence ranking decisions by Google. This was mainly for two reasons: on the one hand, conversion data can be easily manipulated by creating irrelevant conversion actions in Google Analytics, and on the other hand Google as a company is obsessed with removing any ranking factors that can be manipulated by human intervention and thereby impact user experience.

Finally, the researcher has always considered metrics like direct traffic, frequency of visits and returning visitors as an effect of the perceived engagement metrics discussed above. Specifically, the researcher reasoned that if users spend more time on the website, it must be that they like it and subsequently return to it. Thus, metrics like direct traffic, frequency of visits or returning visitors were perceived as a natural consequence of other metrics, and secondary in importance to them. By contrast, the study suggests that the roles of these features as ranking factors are actually reversed: it is because of the strength of, and loyalty to, the brand that users keep coming back to the website, spend more time on it, visit more pages and accept slower page load times.


8.2 Planning and Executing the Study

Reflecting on the whole planning and execution of the study, the researcher feels that despite very thorough preparation there was one particular unwanted setback, which could neither have been anticipated nor prevented. Specifically, the codeless point-and-click promise of Google Cloud’s pre-built models was not met. The researcher struggled to get support from the Google Cloud team, even after purchasing a $100 support package. The response was slow, and mostly irrelevant to the matter at hand. Moreover, the documentation of the models was generic, applying mostly to custom-built algorithms that require programmer input rather than to a simple codeless pre-built algorithm option. The researcher brought this to the attention of the Google Cloud Support team in detail. Finally, despite the fact that the data was chosen to be as representative and relevant as possible, the researcher would welcome the opportunity to re-run the analysis on a larger number of websites; this would allow the researcher both to generalize the findings better and to eliminate the possibility that results were influenced by other factors, e.g. online competition in that particular industry.


8.3 Recommendations for further research

The study has produced some very interesting findings vs. the popular beliefs within the SEO community. The researcher is confident that the study has opened the door to further investigation in an area that is poorly researched within the SEO community. Given the limitations of the study (three months’ worth of data and one case study), the researcher recommends that the study be replicated on a larger scale, with data spanning a longer period of time and more case studies. Such a replication should further investigate surprising findings such as the high importance of conversion rates for Google ranking, including the relationship and correlation of this metric with other metrics not considered by this study.

Furthermore, the researcher recommends research on data collected for periods between major Google algorithm updates; this will provide SEOs with a trend showing the evolution of each feature’s importance over the years, and a glimpse into future Google developments, i.e. which features Google has improved on and values more. Finally, the researcher recommends that the study be replicated on groups of websites per industry; this is important as Google may place different value on these engagement metrics depending on industry and user intent. Specifically, metrics like Frequency of visits and Pageviews may bear lower weight on ranking for an accounting website, where people are looking for specific services, than for a tech-related blog, where the expectation is that users spend more time reading through the content.

8.4 Final word

The researcher found this study extremely useful in further understanding the factors impacting Google’s decisions to rank websites, and the relationships between these factors. Overall, the researcher significantly increased his awareness of the need to continuously assess, analyse and understand the perceptions held by SEO consultants, and of the type of actions and skills these consultants must develop to support the success of their clients’ businesses.

That said, given the strong focus Google places on local SEO, with reviews being a very important ranking factor in this space, there has never been a better time to provide great customer service and earn those positive reviews. As the really bad experience I personally had with companies like Affix Scaffolding shows, it will be very hard to recover from some of those negative reviews.


Barysevich, Aleh (2019). 14 ranking signals you need to optimise for in 2019. [online] Available at: https://searchenginewatch.com/2018/12/10/ranking-signals-2019-optimize/ [Accessed July 5 2019]

Bennet, Tom (2017). The Complete Guide to Direct Traffic in Google Analytics. [online] Available at: https://moz.com/blog/guide-to-direct-traffic-google-analytics [Accessed July 11 2019]

Bradford, Craig (2018). What is SEO Split testing? [online] Available at: https://www.distilled.net/resources/what-is-seo-split-testing/ [Accessed July 6 2019]

Chace, Calum (2015). Surviving AI. The promise and peril of artificial intelligence, GB: Three Cs

Chollet, Francois. Allaire, J J (2018). Deep Learning with R. UK: Manning Publications Co.

Ciaburro, Giuseppe. Ayyadevara, V Kishore. Perrier, Alexis (2018) Hands-On Machine Learning on Google Cloud Platform. UK : Packt Publishing

Crowe, Anna (2018). Top 7 Ranking Signals: What REALLLY Matters in 2019? [online] Available at: https://www.searchenginejournal.com/seo-guide/ranking-signals/ [Accessed June 10 2019]

Data Science Dojo (2017). Intro to Machine Learning with R & Caret. [online] Available at: https://www.youtube.com/watch?v=z8PRU46I3NY [Accessed July 09 2019]

Dennis (2019), Is SEO Dead? The Answer is Yes and No. [online] Available at:  https://www.coredna.com/blogs/is-seo-dead [Accessed June 9 2019]

De Vivo, Marcela (2019). The 7 Most Important Ranking Factors in 2019. [online] Available at: https://www.singlegrain.com/seo/seo-ranking-factors-2019/ [Accessed July 8 2019]

Dean, Brian (2013). Love it or Hate It, There’s No Doubt That Google Prefers Brands. [online] Available at https://www.searchenginejournal.com/love-it-or-hate-it-theres-no-doubt-that-google-prefers-brands/59379/#close [Accessed August 21 2019]

Enge, Eric. Spencer, Stephan. Stricchiola, Jessie & Fishkin, Rand (2013). The Art of SEO. Mastering Search Engine Optimization, US: O’Reilly

Fisher, Pat (2019). Bounce Rates & SEO…What are they good for…absolutely…something? [online] Available at: https://www.outerboxdesign.com/search-marketing/search-engine-optimization/bounce-rate-effect-on-search-rankings [Accessed June 8 2019]

Fishkin, Rand (2015). Do Page Views Matter? (ranking factor?). [online] Available at https://moz.com/community/q/do-page-views-matter-ranking-factor [Accessed July 8 2019]

Gavrilas, Razvan (2015) Traffic Improves SEO and Affects Google Rankings, new research says. [online] Available at: https://cognitiveseo.com/blog/7013/traffic-boosts-organic-rankings-new-research-reveals-interesting-facts/  [Accessed on July 10 2019]

Google (2019), Cloud APIs. [online] Available at: https://cloud.google.com/apis/ [Accessed June 8 2019]

Google (2018). AutoML Tables. [online] Available at: https://cloud.google.com/automl-tables/ [Accessed June 11 2019]

Google (2019). Introduction to built-in algorithms. [online] Available at: [Accessed June 9 2019]

Google (2019). Evaluating Models. [online] Available at: https://cloud.google.com/automl-tables/docs/evaluate [Accessed June 9 2019]

Haws, Spencer (2013). How important is time on site for ranking in Google? [online] Available at: https://www.nichepursuits.com/how-important-is-time-on-site-for-ranking-in-google/ [Accessed July 9 2019]

Hedgepeth, Cory (2019). This is why your direct traffic in Google Analytics is so high. [online] Available at: https://www.directom.com/direct-traffic-google-analytics/ [Accessed July 9 2019]

Hodnett, Mark. Wiley, F Joshua (2018). R Deep Learning Essentials. UK : Packt Publishing

Johnson, Patricia (2018). Is direct traffic good or bad for SEO?. [online] Available at: https://www.quora.com/Is-direct-traffic-good-or-bad-for-SEO  [Accessed July 3 2019]

Kelleher, John D. Brian, Mac Namee. D’Arcy, Aoife (2015). Fundamentals of Machine Learning for Predictive Data Analytics. US : MIT Press

Kim, Larry (2019). RankBrain Judgment Day: 4 SEO Strategies You’ll Need to Survive. [online] Available at: https://www.wordstream.com/blog/ws/2016/03/16/rankbrain-seo [Accessed June 9 2019]

Kuhn, Max. Kjell, Johnson (2016).  Applied Predictive Modelling, US: Springer

Kusinitz, Sam (2019). The Ultimate Guide to Google Ranking Factors in 2019. [online] Available at: https://blog.hubspot.com/marketing/google-ranking-algorithm-infographic [Accessed July 5 2019]

Lantz, Brett (2015). Machine Learning with R. UK : Packt Publishing Ltd

Marr, Bernard (2015). Big Data. Using Smart Big Data Analytics and Metrics To Make Better Decisions And Improve Performance, GB: Wiley

Morde, Vishal (2019). XGBoost Algorithm: Long May She Reign! The new queen of Machine Learning algorithms taking over the world. [online] Available at: https://towardsdatascience.com/https-medium-com-vishalmorde-xgboost-algorithm-long-she-may-rein-edd9f99be63d [Accessed July 10 2019]

Osman, Maddy (2019). Top 10 User Engagement KPIs to Measure. [online] Available at: https://www.searchenginejournal.com/content-marketing-kpis/user-engagement-metrics/ [Accessed June 9 2019]

Patel, Neil (2019). Is SEO dead? [online] Available at: https://neilpatel.com/blog/seo-dead/  [Accessed June 8 2019]

Patel, Neil (2019). How to Measure Reader Engagement and Loyalty Using Google Analytics. [online] Available at: https://neilpatel.com/blog/how-to-measure-reader-engagement-and-loyalty-using-google-analytics/ [Accessed July 11 2019]

Press, Gil (2016). Cleaning Big Data: Most Time-Consuming, Least Enjoyable Data Science Task, Survey Says. [online] Available at: https://www.forbes.com/sites/gilpress/2016/03/23/data-preparation-most-time-consuming-least-enjoyable-data-science-task-survey-says/#2927f7076f63 [Accessed July 7 2019]

Provost, Foster. Fawcet, Tom (2013). Data Science for Business: What You Need to Know About Data Mining and Data-Analytical Thinking. US : O’Reilly

Ries, Eric (2011). The Lean Startup: How Constant Innovation Creates Radically Successful Businesses; UK : Penguin Books

Schwartz, Barry (2019). Daily Mail SEO says site lost big after June Google update, asks community for help. [online] Available at: https://searchengineland.com/daily-mail-seo-says-site-lost-big-after-june-google-update-asks-community-for-help-317926 [Accessed June 10 2019]

SEMRush (2017). Ranking Factors 2.0. [online] Available at: https://www.semrush.com/ranking-factors/  [Accessed July 2 2019]

Singureanu, Constantin (2018). Hacking Digital Growth 2025: Exploiting Human Biases, Tools of the Trade & The Future of Digital Marketing. Amazon UK

Sutter, Brian (2018). 7 User Engagement Metrics That Influence SEO. [online] Available at: https://www.forbes.com/sites/briansutter/2018/03/24/7-user-engagement-metrics-that-influence-seo/#2edcd8fb567b  [Accessed July 1 2019]

Stetzer, Adam (2016). Do bounce rates affect a site’s search engine ranking? [online] Available at: https://searchenginewatch.com/2016/05/04/do-bounce-rates-affect-a-sites-search-engine-ranking/ [Accessed June 9 2019]

Talari, Saikumar (2018). Top Skills every Data Scientist needs to Master. [online] Available at: https://towardsdatascience.com/top-skills-every-data-scientist-needs-to-master-5aba4293b88 [Accessed July 8 2019]

Virgillito, Dan (2016). How to track user engagement with Google Analytics. [online] Available at: https://www.elegantthemes.com/blog/tips-tricks/how-to-track-user-engagement-with-google-analytics [Accessed July 4 2019]

Yoast (2019), Yoast SEO: the #1 WordPress SEO plugin. [online] Available at: https://yoast.com/wordpress/plugins/seo/ [Accessed June 8 2019]

Wall, Aaron (2016). Google Loves Brands. The Rise of Brands in Google’s Relevancy Algorithms. [online] Available at: http://www.seobook.com/learn-seo/infographics/brand-branding-brands.php [Accessed August 21 2019]

Wolffberg, Doron (2019). Top 10 Google Ranking Factors For 2019. [online] Available at: https://yellowheadinc.com/blog/google-ranking-factors [Accessed June 8 2019]



2. Python XGBoost models



import xgboost as xgb

from xgboost import XGBRegressor

import sklearn

from sklearn.metrics import mean_squared_error

from sklearn.model_selection import train_test_split

import eli5

from eli5.sklearn import PermutationImportance

import pandas as pd

import matplotlib

import matplotlib.pyplot as plt

import seaborn as sns

import numpy as np

import warnings



data = pd.read_csv("Engagement.csv")




output: unique_visitors 0

new_users 0

bounce_rate 0

average_pages_per_session 0

average_session_duration_seconds 0

average_page_load_time 0

av_time_on_page 0

page_views 0

direct_website_traffic 0

conversions 0

frequency_of_ visits_2 0

frequency_of_ visits_3 0

frequency_of_ visits_4 0

average_ranking_on_google 1

dtype: int64


We have a single null value in the “average_ranking_on_google” column and remove the row containing it before we continue.


data.dropna(inplace = True)
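The call that produced the null counts above is not shown in the listing. A minimal sketch of how such a check is commonly done in pandas follows, using a small made-up dataframe in place of Engagement.csv (the values are illustrative only and are not the study's data):

```python
import numpy as np
import pandas as pd

# Toy dataframe standing in for Engagement.csv (all values are made up).
toy = pd.DataFrame({
    "unique_visitors": [120, 340, 560, 80],
    "bounce_rate": [0.45, 0.52, 0.38, 0.61],
    "average_ranking_on_google": [12.0, np.nan, 4.5, 30.0],
})

# Count missing values per column.
null_counts = toy.isnull().sum()
print(null_counts)

# Drop every row that contains at least one missing value.
clean = toy.dropna()
print(clean.shape)
```

Dropping the row is reasonable here because only one target value is missing; with many missing values, imputation would be worth considering instead.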


5.    FEATURE SELECTION: investigate correlation between features

corrmat = data.corr()

top_corr_features = corrmat.index

plt.figure(figsize = (20, 12))

#plot heat map

g = sns.heatmap(data[top_corr_features].corr(), annot = True, cmap = "RdYlGn")


Fig1: Heatmap to study the correlation between features



Features such as “frequency_of_visits_2”, “frequency_of_visits_3” and “frequency_of_visits_4” are highly correlated with “page_views” and “direct_website_traffic”. Given this high correlation, these features can be considered redundant and dropped from the dataframe. Before dropping them, though, we create a baseline XGBoost model and assess its RMSE performance.
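Reading correlated pairs off a heatmap can be complemented by listing them programmatically. The sketch below is one possible way to do this; the helper name highly_correlated_pairs, the 0.9 threshold and the synthetic data are all assumptions made for illustration and are not part of the study:

```python
import numpy as np
import pandas as pd

def highly_correlated_pairs(df, threshold=0.9):
    """Return (feature_a, feature_b, r) for every pair with |r| >= threshold."""
    corr = df.corr()
    cols = corr.columns
    pairs = []
    # Walk the upper triangle only, so each pair is reported once.
    for i in range(len(cols)):
        for j in range(i + 1, len(cols)):
            r = corr.iloc[i, j]
            if abs(r) >= threshold:
                pairs.append((cols[i], cols[j], float(r)))
    return pairs

# Synthetic data: the second column is built from the first, the third is independent.
rng = np.random.default_rng(0)
page_views = rng.normal(1000, 200, size=200)
toy = pd.DataFrame({
    "page_views": page_views,
    "frequency_of_visits_2": page_views * 0.3 + rng.normal(0, 5, size=200),
    "bounce_rate": rng.uniform(0.2, 0.8, size=200),
})

pairs = highly_correlated_pairs(toy, threshold=0.9)
print(pairs)
```

The threshold is a judgment call: a lower value flags more pairs as redundant, at the risk of discarding features that still carry independent signal.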


X1 = data.drop('average_ranking_on_google', axis = 1)

y1 = data['average_ranking_on_google']


7.     Convert the dataset into an optimized data structure called DMatrix, which is supported by the XGBoost algorithm.

data_dmatrix = xgb.DMatrix(X1.values, label = y1)

8.     SPLIT DATA INTO TRAINING AND TESTING: create the training and test sets for hold-out validation of the results using the train_test_split function from sklearn’s model_selection module, with test_size equal to 20% of the data. To maintain reproducibility, a random seed is also assigned.

X_train1, X_test1, y_train1, y_test1 = train_test_split(X1, y1, test_size = 0.2, random_state = 123)



Step1: Initiate XGBRegressor class and assign it to a variable

xg_reg_1 = XGBRegressor(objective = 'reg:squarederror')


Step2: Fit the model to training set

xg_reg_1.fit(X_train1.values, y_train1.values)




Step3: Make predictions on testing data

preds1 = xg_reg_1.predict(X_test1.values)


Step4: Compute RMSE

rmse = np.sqrt(mean_squared_error(y_test1, preds1))

print("RMSE: %f" % (rmse))


output: RMSE: 3.238639
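RMSE is the square root of the mean squared error, so it is expressed in the same units as the target (here, ranking positions). A small worked sketch with made-up numbers shows the relationship; the helper names mse and rmse are illustrative:

```python
import numpy as np

def mse(y_true, y_pred):
    """Mean of the squared differences between targets and predictions."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean((y_true - y_pred) ** 2))

def rmse(y_true, y_pred):
    """Square root of the MSE, in the same units as the target."""
    return float(np.sqrt(mse(y_true, y_pred)))

# Made-up rankings: the errors are 1, 0, -2 and 3 positions.
y_true = [3.0, 5.0, 7.0, 10.0]
y_pred = [2.0, 5.0, 9.0, 7.0]

toy_mse = mse(y_true, y_pred)    # (1 + 0 + 4 + 9) / 4 = 3.5
toy_rmse = rmse(y_true, y_pred)  # sqrt(3.5)
print(toy_mse, toy_rmse)
```

Because the two quantities differ by a square root, an MSE and an RMSE should never be compared directly with each other.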


10. OPTIMIZE XGBOOST MODEL: remove redundant features.

Step1: Remove redundant features


removed = ['frequency_of_ visits_2', 'frequency_of_ visits_3', 'frequency_of_ visits_4', 'direct_website_traffic', 'average_pages_per_session', 'page_views']


Step2: Drop features from the dataframe

data.drop(removed, axis = 1, inplace = True)


Step3: Review the heatmap to assess if we have more correlated features

corrmat = data.corr()

top_corr_features = corrmat.index

plt.figure(figsize = (20, 12))

g = sns.heatmap(data[top_corr_features].corr(), annot = True, cmap = "RdYlGn")



Fig 2: Heatmap after removing the correlated features


Most features are independent of each other.

Step4: Separate features (X) and labels (y)

X2 = data.drop('average_ranking_on_google', axis = 1)

y2 = data['average_ranking_on_google']


Step5: Split data in training and testing sets

X_train2, X_test2, y_train2, y_test2 = train_test_split(X2, y2, test_size = 0.2, random_state = 123)


Step6: Initiate the XGBRegressor class and assign it to a variable

xg_reg_2 = XGBRegressor(objective = 'reg:squarederror')


Step7: Fit model to training dataset

xg_reg_2.fit(X_train2.values, y_train2.values)


Step8: Make predictions on testing data

preds2 = xg_reg_2.predict(X_test2.values)


Step9: Compute RMSE

rmse = np.sqrt(mean_squared_error(y_test2, preds2))

print("RMSE: %f" % (rmse))

output: RMSE: 3.145879


We observe that the RMSE of 3.1459 for the optimized model is lower (better) than the 3.2386 of the baseline model, though not by much.
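The two RMSE values above come from a single 80/20 split, so a difference of about 0.09 may partly reflect which rows happened to land in the test set. One common way to check this is k-fold cross-validation. The sketch below is an illustration under stated assumptions: it uses synthetic data, and scikit-learn's GradientBoostingRegressor stands in for XGBRegressor so that the example runs without xgboost installed.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import KFold, cross_val_score

# Synthetic regression data (not the study's dataset).
rng = np.random.default_rng(123)
X = rng.normal(size=(200, 5))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(scale=0.5, size=200)

# Stand-in for XGBRegressor; any scikit-learn compatible regressor fits here.
model = GradientBoostingRegressor(random_state=123)
cv = KFold(n_splits=5, shuffle=True, random_state=123)

# scikit-learn reports negated MSE so that higher is better;
# flip the sign and take the square root to obtain one RMSE per fold.
neg_mse = cross_val_score(model, X, y, cv=cv, scoring="neg_mean_squared_error")
fold_rmse = np.sqrt(-neg_mse)

print("RMSE per fold:", fold_rmse)
print("mean: %f, std: %f" % (fold_rmse.mean(), fold_rmse.std()))
```

If the gap between two models is smaller than the spread across folds, the improvement is probably not meaningful.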


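Use XGBoost’s built-in plot_importance function to visualize the importance of each feature. The figure size is set before plotting so that it applies to the plot.

plt.rcParams['figure.figsize'] = [18, 18]

xgb.plot_importance(xg_reg_1, grid = False, height = 0.5)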





Fig 3: Feature Importance on the original data

plt.rcParams['figure.figsize'] = [18, 18]

xgb.plot_importance(xg_reg_2, grid = False, height = 0.5)




Fig 4: Feature importance on the new data without correlated features


Step1: Extract the names of all features used by the optimized XGBoost model

feature_names = X2.columns.tolist()


Step2: Use PermutationImportance, an ELI5 class that scores each feature by how much the model’s performance drops when that feature’s values are shuffled

perm_xgb = PermutationImportance(xg_reg_2).fit(X_train2.values, y_train2.values)

eli5.show_weights(perm_xgb, feature_names = feature_names)




Fig 5: The weight table for the data without correlated features


The table above suggests that “unique_visitors” with a weight value of 0.3558 has the highest importance on target.
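To make the idea behind the weight table concrete, the sketch below applies permutation importance to synthetic data in which the relative influence of each feature is known in advance. It uses scikit-learn's permutation_importance function and a linear model as stand-ins for ELI5 and XGBoost; the feature names and data are invented for illustration.

```python
import numpy as np
from sklearn.inspection import permutation_importance
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Synthetic data: only the first two columns influence the target.
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 3))
y = 5.0 * X[:, 0] + 1.0 * X[:, 1] + rng.normal(scale=0.1, size=300)
feature_names = ["strong_feature", "weak_feature", "noise_feature"]

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=123
)
model = LinearRegression().fit(X_train, y_train)

# Shuffle one column at a time and measure the average drop in the model's score.
result = permutation_importance(
    model, X_test, y_test, n_repeats=10, random_state=123
)

ranking = sorted(
    zip(feature_names, result.importances_mean),
    key=lambda pair: pair[1],
    reverse=True,
)
for name, score in ranking:
    print(f"{name}: {score:.4f}")
```

This sketch scores importance on the held-out test set, which measures how much each feature contributes to generalization; computing it on the training set instead can overstate features that the model has overfit.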

3. Python Keras Model


The input dimension to the model is 7 which is equivalent to the number of features present in our data after removing highly correlated features.


Step1: Import libraries


import tensorflow as tf

import keras

from keras.layers import Dense

from keras.models import Sequential

from keras.layers import BatchNormalization

from keras.layers import Dropout

from keras.optimizers import Adam, SGD

from keras.wrappers.scikit_learn import KerasRegressor

import eli5

from eli5.sklearn import PermutationImportance


Step2: Build model

def base_model():

    model = Sequential()

    model.add(Dense(200, input_dim = 7, kernel_initializer = 'normal', activation = 'relu'))

    model.add(Dense(50, kernel_initializer = 'normal', activation = 'relu'))

    model.add(Dense(1, kernel_initializer = 'normal'))

    model.compile(loss = 'mean_squared_error', optimizer = Adam(lr = 0.00001))

    return model


Step3: Call base_model function above

my_model = KerasRegressor(build_fn = base_model)


Step4: Fit model to training data

history = my_model.fit(X_train2, y_train2, epochs = 7000, validation_data = (X_test2, y_test2), shuffle = True)




plt.figure(figsize = (15, 6))

plt.plot(range(7000), history.history['loss'], label = "Training Loss")

plt.plot(range(7000), history.history['val_loss'], label = "Validation Loss")


plt.ylabel("Training Loss/Validation Loss")

plt.title("Training Loss vs Validation Loss")

plt.legend(loc = "best")



Fig 6: The loss plot for training and testing data

We observe that the loss starts at a high value for both training and validation data. As training progresses, the loss for both begins to decrease. After 2500 epochs the training and validation losses converge, and after around 5000 epochs the loss stagnates. We conclude that the loss would not decrease any further beyond 5000 epochs.
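The plateau above was read off the plot. The same judgment can be expressed as a simple patience rule, which is the idea behind early stopping (Keras provides an EarlyStopping callback based on it). The pure-Python sketch below is illustrative only: the function name plateau_epoch, the patience and min_delta values, and the loss values are all assumptions.

```python
def plateau_epoch(losses, patience=3, min_delta=0.0):
    """Return the index of the best epoch once `patience` consecutive epochs
    pass without an improvement larger than `min_delta`.
    Return None if the loss never plateaus under that rule."""
    best = float("inf")
    best_epoch = None
    waited = 0
    for epoch, loss in enumerate(losses):
        if loss < best - min_delta:
            # Meaningful improvement: remember it and reset the counter.
            best = loss
            best_epoch = epoch
            waited = 0
        else:
            waited += 1
            if waited >= patience:
                return best_epoch
    return None

# Made-up validation losses that flatten out after the fifth value.
val_loss = [10.0, 8.0, 6.5, 6.0, 5.9, 5.95, 5.92, 5.91, 5.93]
stop_at = plateau_epoch(val_loss, patience=3, min_delta=0.05)
print(stop_at)
```

Applied to a real training history, such a rule would stop training near the plateau instead of running all 7000 epochs.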

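Step6: Evaluate the model on validation data

evaluate = my_model.predict(X_test2)

print(f"MSE: {mean_squared_error(y_test2, evaluate)}")


output: MSE: 9.664037599592092

Note that mean_squared_error returns the mean squared error without taking the square root, so the value above is an MSE rather than an RMSE. Its square root, approximately 3.11, is the figure comparable with the RMSE values reported for the XGBoost models.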



Fig 7: Feature importance for the keras model

We observe that “unique_visitors” is the highest ranked feature for the target output, “average_ranking_on_google”. Surprisingly, features like “frequency_of_visits_4” are assigned the lowest weight, the complete opposite of the Google Cloud AutoML Tables model.


1. The aim of the exercise was predicting the “average_ranking_on_google” given various input features. Given the nature of the data, this was a “Regression Problem”.

2. We started by investigating feature correlations with a heatmap.

3. A baseline XGBoost model was built taking all features as input, and delivered an RMSE value of 3.238639.

4. To assess the impact on the baseline model, correlated features were removed and a new XGBoost model was built. The new model delivered an RMSE of 3.145879, only a slight improvement on the baseline model.

5. We then conducted a weight ranking exercise which suggested that “unique_visitors” was the most influential feature on target. This was followed by “bounce_rate”, “conversions” and “average_session_duration_seconds”.

6. A Keras model was also built and trained on the dataset for 7000 epochs. A visualization of the training and validation loss suggests that the model stagnates around 6000 epochs, when the loss stops declining. The reported error of 9.6640 appears well above those of the XGBoost Python and Google Cloud AutoML Tables models. However, it was computed with mean_squared_error without taking the square root, so it is an MSE; the corresponding RMSE is approximately 3.11, and this is the figure to compare with the RMSE values of the other models.

7. The Keras model also found “unique_visitors” to be the highest ranked feature. “new_users” was the second most influential feature, followed by “bounce_rate”.

8. Overall, the regression model chosen by Google Cloud AutoML Tables performed best of all models, mainly due to the nature of the data and the size of the dataset, i.e. neural networks perform better on large datasets.

9. The PCA technique is generally helpful on datasets with a large number of inputs. In our case, we had a low number of features hence the researcher felt that PCA was unnecessary.

  1. Data input for Google Cloud built in algorithms (XGBoost, Linear, Wide & Deep)
  1. Data input for Google Cloud TablesML algorithm