Interview with Gary Illyes – Google Webmaster Trends Analyst

In 2018 I had the opportunity to interview Google Webmaster Trends Analyst Gary Illyes in Delhi, India. Unfortunately, I didn’t get around to transcribing the interview and publishing it until now. Better late than never!

Siddharth Lal: E-commerce sites tend to have hundreds of pages of product listings for a particular product. Would you recommend pagination, an all-products page, or lazy loading? What method would you use?

Gary Illyes: I would most likely not use lazy loading because most search engines are not well equipped to handle lazy loading, including Google. We are well aware of that. We are trying to figure out a solution to that. But we’re definitely not there yet. There are hacks for lazy loading, for example creating individual URLs and then, when someone clicks on that URL, scrolling to that position on the page with JavaScript, but they may not be perfect solutions.

Lazy loading can either take the form of low-res placeholder images that load with the rest of the page content, or of loading only the visible part of the page and waiting to load the rest until it is scrolled to.

Showing everything on the page is probably not feasible if you think about 10,000 products. Let’s say you are Flipkart: you can’t show every single piece of electronics that you have in store. Each time you have thousands of products, it will slow down the page and it will be a nightmare for the user to navigate it. Pagination… that’s the way to do it. It works for pretty much every e-commerce site.

Siddharth: But then if we use pagination. Let’s assume Flipkart black shoes. So if it’s black shoes and they’ve got, say, 700 black shoes and they’ve got 100 products per page. Page 1, 2, 3, 4, 5, 6, 7, but in effect they’re not unique pages. From that perspective, how do you change the titles, since all of them are black shoes? So do you have 7 or 50 pages all saying black shoes… how do you make sure they’re unique?

Gary: One thing is that it’s very likely that you don’t want to get every one of those pages indexed. So have one master page, implement the rel=previous & rel=next, and canonical to whichever page makes more sense… Usually it’s the first page but sometimes it’s not, so I would do that. It gets more complicated when you have faceted navigation and you allow users to filter on shoe colour. Actually the same thing applies… basically you pick one page that represents what the consumer is looking for and then… and basically you get that page indexed. Usually it wouldn’t even work for you to get every page indexed because the differences between the pages are usually so tiny that search engines just count them as duplicates sometimes. The search engines will just pick one, and that’s supposedly suboptimal because sometimes search engines might pick the wrong one. So with rel=canonical and rel=previous & rel=next you actually have control over which page should show up on search.
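As a rough illustration of the markup Gary describes, here is a minimal Python sketch that builds the canonical, prev and next link tags for one page of a paginated listing. The function name, the `?page=N` URL pattern and the example domain are assumptions made for this sketch, not something from the interview. It uses the attribute values `rel="prev"` and `rel="next"`, and it points the canonical at the first page by default, which Gary says is usually, but not always, the right choice.

```python
from html import escape


def pagination_link_tags(base_url, page, total_pages, canonical_page=1):
    """Return the <link> tags for one page of a paginated listing.

    Hypothetical helper: assumes page 1 lives at base_url and later
    pages at base_url?page=N.
    """
    if not 1 <= page <= total_pages:
        raise ValueError("page must be between 1 and total_pages")

    def page_url(n):
        return base_url if n == 1 else f"{base_url}?page={n}"

    # Every page in the cluster points at the chosen leading page.
    tags = [f'<link rel="canonical" href="{escape(page_url(canonical_page))}">']
    if page > 1:
        tags.append(f'<link rel="prev" href="{escape(page_url(page - 1))}">')
    if page < total_pages:
        tags.append(f'<link rel="next" href="{escape(page_url(page + 1))}">')
    return tags


for tag in pagination_link_tags("https://example.com/black-shoes", 3, 7):
    print(tag)
```

The tags would go in the `<head>` of each listing page; the first and last pages simply omit the prev or next tag.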

Siddharth: But then the main reason you would have this is because your products are listed there? So then that would be the discovery process of the search engines to get to that particular shoe model. There’s no other way to get to that. So you still have to show Google they’re not all the same page.

Gary: You want to show Google that they are different pages but in the same cluster.

Siddharth: Correct, but then as soon as you add the canonical to the first page.

Gary: Then you just say that’s the leading page.

Siddharth: Correct, but then does that mean that Google, even though I have some products which are great products but they’re on page 3. And if I’ve just told Google with a rel canonical page one is the main page and with canonical in effect, it says all the other pages are duplicates of this. Then will that mean that the product which is on page 3 does not now get any kind of… The search spider doesn’t go there because it says forget it, it’s not important.

Gary: So Googlebot will go to the other pages, it will most definitely go through the pages because we want to confirm that you actually didn’t make a mistake. So we are going to visit them but the rel=previous/next will help us cluster them and basically the signals that come from the other pages will funnel to the leading page.

Siddharth: My concern is more around the product which is on page 3, page 2. Will Google then devalue that product saying that these products are not on the main page? Link juice that passes through for example. The Home page which has got the maximum page value… authority.

Gary: Most of the time, the home page doesn’t have the most value. It’s true for very big sites, but for small sites, very often a leaf page might be the most valuable not the home page.

Siddharth: Ya, but let’s leave aside small sites, because if they’re e-commerce, that’s why you get into a “previous”, “next” rel.

Gary: Yes, so rel “prev”, “next” will help us with passing link juice from the parent page which has the canonical. The link juice in the cluster will pass normally as it would without the rel canonical. It will also enable us to do discovery. So basically, we are going to visit those pages, we are going to extract the links, we are going to try to crawl those pages. That said, a sitemap is still the way to go for the listings because we will never guarantee that we are actually going to go through the 10th page or the 12th page. There is a very good chance, because we have enough crawling capacity. If we have enough crawling capacity then we are going to go to the 11th page, but there is no guarantee; there is a chance that we are not going to crawl it.
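To make the sitemap suggestion concrete, here is a minimal Python sketch that writes product URLs into an XML sitemap using the standard sitemaps.org namespace. The function name and the example URLs are assumptions for illustration; a real site would also have to respect the protocol’s per-file limits and might add optional fields such as `lastmod`.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(urls):
    """Return a sitemap XML string listing the given URLs (hypothetical helper)."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url in urls:
        # Each product gets its own <url><loc>...</loc></url> entry,
        # so discovery does not depend on crawling page 10 or 12 of a listing.
        entry = ET.SubElement(urlset, "url")
        ET.SubElement(entry, "loc").text = url
    return ET.tostring(urlset, encoding="unicode")


print(build_sitemap([
    "https://example.com/shoes/black-runner",
    "https://example.com/shoes/black-loafer",
]))
```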

Siddharth: Correct! So the XML sitemap makes sense for that. And would this information also hold true if it was not an e-commerce site but just an entertainment site or information site, so same thing (Gary says yes). So even if it’s songs and all instead of having (inaudible) So users have rel previous and have all the songs.

Gary: It’s pretty much the same thing for any site. Basically if you have a similar structure to an e-commerce site then you would do rel “prev” “next”.

NOTE: Google has discontinued indexing rel=next/prev tags (announced via a tweet in March 2019). Leaving the tags in place will not harm your SEO in any way. While Google has stopped indexing rel=next/prev, Bing continues to index the tags.

Siddharth: Social signals specifically, I’ll divide this into two. One is social signals from something like Facebook, which is generally behind login credentials so my information is private. However, all these people keep saying “hey, you must invest in social media. You must have a Facebook page. You must do all of this.” How would Google take into account all the shares, likes etc., because they’re all hidden behind a login password?

Gary: The easy answer is that we don’t.

Siddharth: Correct, because it’s a dark web kind of thing.

Gary: Aha, but not because things are hidden behind the login. So for example if I search for a site facebook.com. And, there would be plenty, plenty of posts from Facebook. Let me restart. Basically, we don’t do social signals. These sites are treated pretty much as a normal site, like any other site. We do this consciously, this was a choice that we made. Because if we started to rely on likes, for example, and someone at Facebook wakes up one day and decides to put up a robots.txt block on the likes URL, then we’re screwed. Basically we would have created the signal relying almost entirely on a third party, and then our users would perhaps get less relevant results. That’s the main reason why we can’t do this. Login does complicate things a bit, but for most social networks there is a sizeable number of pages that are public.

Siddharth: So what you’re saying basically is that Social Signals will not matter?

Gary: No, social signals will not matter. Those sites are treated as normal sites. There’s nothing special about them.

Siddharth: So brand mentions on a social site. You will not take it as a normal site or you will take it as a normal site?

Social media has no more weight than a normal site.

Gary: We would take it as a normal site. So basically just because someone mentioned you on Twitter it won’t be more valuable than when someone mentions you on a normal site.

Siddharth: So therefore then, whether it’s Facebook or whether it’s YouTube or Quora, it does not matter. It’s the same thing since you’re valuing it exactly the same.

2nd Audio

Siddharth: So content that’s only visible when the user clicks to expand or has to click on a tab to make that content visible. We’ve heard conflicting information and I think from various Google analysts, like from John who will come out and say something versus you etc. My own belief… that’s me personally, is that content that requires clicking to reveal itself should not be counted or should be devalued versus the content that’s visible. Is there any Google policy about it? And what would be your advice to webmasters for this?

Gary: I think the reason for the conflict in information out there is that things changed. So, when we started moving towards having more and more mobile content in our index, then at one point we realized that there are tons and tons of content hidden behind tabs/accordions, and until that point we were demoting this content. So basically if you had a piece of content visible on the page, on page load, then that piece of content would rank much better than content that is only shown when the user clicks the accordion to expand the content. We changed that approach. Right now it doesn’t matter if you have some content behind a tab/accordion. It will do just as well as content that is visible without clicking. There is a caveat: the content must already be on the page. (Siddharth clarifying: So it should not be dynamically pulled at the point of clicking.) So basically if you click something and then initiate an AJAX call, pulling a piece of content from the server, that will not work. We are not even going to see it, let alone rank it. So I believe that’s where the confusion is coming from. Information that we put out will change because we are changing search every single day. We fix things that we believe are not optimal and this was one of those things.
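One rough way to check Gary’s caveat is to look at the HTML the server sends before any click happens and see whether the tab or accordion text is already in it. The Python sketch below does that with the standard library’s `html.parser`. The class and function names and the sample markup are assumptions for illustration, and this only inspects raw HTML; it says nothing about how Googlebot actually renders a page.

```python
from html.parser import HTMLParser


class _TextCollector(HTMLParser):
    """Collects text nodes from HTML, ignoring <script> and <style> bodies."""

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.chunks.append(data)


def text_in_initial_html(html, phrase):
    """True if phrase occurs in the page text as delivered, hidden or not."""
    collector = _TextCollector()
    collector.feed(html)
    collector.close()
    page_text = " ".join(" ".join(collector.chunks).split()).lower()
    return " ".join(phrase.split()).lower() in page_text


# Content sits in the markup but is collapsed: present on page load.
accordion = '<div class="accordion" hidden><p>Free returns within 30 days</p></div>'
# Content would only arrive after a click triggers a request: absent on page load.
ajax_tab = '<div id="reviews"></div><script>fetch("/reviews")</script>'

print(text_in_initial_html(accordion, "free returns"))
print(text_in_initial_html(ajax_tab, "great shoes"))
```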

An example tab layout that might hide content that could rank well.

Siddharth: Fair enough! You’ve actually answered my second part because I said for the above let’s take an example of mobile instead of Desktop where the screen dynamics are different. And for example, Wikipedia if you open it, requires pretty much everything to be clear for the content to be visible.  So what you’re saying, even though today Desktop indexing is still prevalent, you’re still taking into account content which is hidden behind the accordion. So it matters equally.

Gary: But then, as you know, when you’re on a mobile phone, you have much less real estate. It makes perfect sense to hide things behind the accordion like Wikipedia. But that shouldn’t mean basically that you want to funnel your users and match up to that main piece of information. So for example if I search for Gandhi, then if I go to Wikipedia, they will throw in my face a first paragraph that gives brief information about Gandhi’s life and achievements. Then I will have the table which contains tabular data, like what were his prominent names or prominent achievements in short and where he was born and when he died, what he’s famous for… stuff like that, and the rest of the page is hidden behind accordions. Basically if I wanted the user to delve deeper into the topic, then I would start including those important links.

Siddharth: In India, local languages, how do we figure out searches for them? Like we’ve got the keyword planner tool, which helps with English but if people are searching in Hindi, Gujarati etc. Is there a way to track them?

Gary: Not really. Right now those keyword planners kind of suck because you have to pay for it.

Siddharth: Ya but those are English because they are typed in English but if you typed in another script like Gurumukhi, it wouldn’t come.

Gary: With these languages, it’s pretty much safe to assume that there is no good content out there. And, that would work for the foreseeable future. Basically you can assume that if you’re looking for, e.g. “how to do something” related content, there will not be good content for that in Hindi. So right now I think there should be, and I don’t understand why there is not a boom in content creation in Hindi, Punjabi, Telugu etc., because there is literally no content to rank.

Siddharth: Yes, I totally agree. For example, I was talking to you the other day about Saavn, the music streaming site. Currently they’re in English. And another thing I was also talking to them about: why don’t we take this and make this into Hindi? And not only Hindi, we can then make it into Marathi or all of them and that will immediately give us a very big advantage. You know? Gurumukhi, Punjabi, all of that and that would be a good way to do things and that’s something I’ve been talking to a lot of my clients about.

Gary: The only thing is that when you translate these sites, you have to pay lots of attention to actually translating the sites correctly, not just using Google Translate and instantly having a new site. Machine translation has evolved a lot over the years, but it’s still not perfect.

Siddharth: And would we then use hreflang in this because Like IN… Hindi I think Hindi there’s one but I don’t think we have it for the sub languages like Marathi and Punjabi.

Gary: This is a good example I think. “Yeh Junoon”. Basically, if I click on this site, which was automatically translated by Google. Even if the translation is not accurate, basically I can’t read it (because it is in Hindi), but this would be the correct way to start.

Siddharth: What are the results that come through? So if someone had done it in Hindi, that would’ve come up on top?

Third person: Hinglish websites are coming up in results but Hindi are low.

Gary: Yes, like for example this is in Hinglish. But, the funny thing is that if you had content that is in Hindi script (Devanagari or Sanskrit)

Gary: Sanskrit, let’s say. I wouldn’t say they are low quality results but low relevancy results. So, if you have content for Yeh Junoon on this page then you would most likely be somewhere here or even here.

Third Person: In future, is it better to write in Hindi than in Hinglish?

Gary: Well, Hinglish is also good. We understand that people are writing in Hinglish here so we rank similarly. Like for example when I switch back (unintelligible) What?

Siddharth: But what happens when you voice search? Right now because you can’t type with it but, as soon as voice search starts exploding, then people will talk in the language. So then the language will come in effect. Right?

Gary: But see. We would be able on our side to translate this into this or this into this. So if it’s technically possible to use this interchangeable Hinglish…

Siddharth: I have another thought on that. So instead of going and creating content on a different page for that, what if on the same page, you start a Hindi translation? So the URL still remains the same. Now you have a problem of the hreflang kicking in but, Google will see both the pieces of information.

Gary: I would go back to ask you whether that’s what your users would like.

Siddharth: True, they don’t know. You would have to do a test and see.

Third Person: I think most Indian people are using English.

Siddharth: Yes, like you could have a tab I’m thinking you know where someone clicks it, and now it’s in Hindi versus the tab which says English, but the URL then remains constant.

Gary: I would probably separate them. I know why I would separate them. So we try to offer results for the users in the language that they are comfortable with. So for example if we see that the browser is set to accept language English, Hindi, Hinglish, then we would try to offer them results that are in Hindi and for that you probably want to have Hindi content separate from the English one because if you mix the two together then our language detection algorithms will freak out. They will not be able to distinguish.
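If the language versions live on separate URLs, as Gary suggests, they can be tied together with hreflang annotations. The Python sketch below generates those tags. The helper name and URLs are assumptions for illustration. The language codes are ISO 639-1 codes, which do exist for Hindi (`hi`), Marathi (`mr`) and Punjabi (`pa`) as well as English (`en`).

```python
def hreflang_tags(versions, default_lang=None):
    """Return hreflang <link> tags for a page's language versions.

    versions maps a language code to the URL of that version
    (hypothetical structure for this sketch).
    """
    tags = [
        f'<link rel="alternate" hreflang="{lang}" href="{url}">'
        for lang, url in sorted(versions.items())
    ]
    if default_lang is not None:
        # x-default marks the fallback for users whose language is not listed.
        tags.append(
            f'<link rel="alternate" hreflang="x-default" href="{versions[default_lang]}">'
        )
    return tags


for tag in hreflang_tags(
    {
        "en": "https://example.com/en/songs",
        "hi": "https://example.com/hi/songs",
        "mr": "https://example.com/mr/songs",
    },
    default_lang="en",
):
    print(tag)
```

Each language version would carry the full set of tags, including one pointing at itself.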

Siddharth: Is there a way to track voice search queries right now?

Gary: You can assume. So basically the search analytics does tell us about voice queries as well. You know what voice queries typically look like. Basically they are longer form because no one is going to type out a longer sentence on the phone or whatever. And, basically if you go through Search Console keyword data, supporting data, then you would probably be able to find these very long-tail queries. The thing is that these keywords will not have many impressions. So basically, there will be many variations for the same things and they would only have one or two or three impressions.

Voice queries typically contain more natural language than a typed query.

Siddharth: So what I’m trying to figure out from what you just said. Search Console, the search queries, it’s sort of already taking into account the voice search but it’s just not segregating them out. And the way for us to look at it logically is anything, say, which is long tail. There’s a high chance that if it’s long tail, it could be voice search.

Gary: Well, it also matters how it’s phrased. Because long tail could mean that I’ve picked a piece of text from, I don’t know, the Kamasutra and pasted it in.

Siddharth: Very interesting example you’re giving here. Okay, okay. Go ahead. I like it.

Gary: Just took a piece of text and pasted it in. I don’t think I know what language it is written in. So I probably would not know what I’m pasting but I would take a sentence or one line, let’s say, and paste it in the search box and then see what results come up. That’s a long-tail thing because it’s part of a sentence that I copied from a page. The reason I picked Kamasutra is that I don’t know any other book, off the top of my head anyway. And then, that should strike you as a webmaster as a non-voice query. Because it will be, perhaps obviously, either a full sentence or a broken sentence. But then when people ask voice queries, they would phrase them similarly to how they would ask their friend or whoever else, like “Ok Google! Why is the sky blue?”. It’s a question. If you go through search results and see something like this, then you as a webmaster can quite safely assume that there was no one typing why is the sky blue in the search box. Perhaps there would be some users, but my point is that the voice queries are more similar to natural language than the keyword-based searches that we used to do on desktop.
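Gary’s rule of thumb (long, phrased like a natural question, and seen only a handful of times) can be turned into a crude filter over exported query data. The Python sketch below is only a heuristic under assumed thresholds: the function name, the list of question words, the `min_words` and `max_impressions` defaults, and the `(query, impressions)` row format are all assumptions for illustration. Search Console itself does not label voice queries.

```python
QUESTION_WORDS = frozenset(
    {"who", "what", "when", "where", "why", "how", "is", "are", "can", "does", "do", "should"}
)


def likely_voice_queries(rows, min_words=4, max_impressions=3):
    """Return queries that look like spoken questions.

    rows is an iterable of (query, impressions) pairs, for example
    read from an exported query report (assumed format).
    """
    flagged = []
    for query, impressions in rows:
        words = query.lower().split()
        is_long = len(words) >= min_words
        # A pasted sentence fragment is long too, so also require question phrasing.
        is_question = bool(words) and words[0] in QUESTION_WORDS
        is_rare = impressions <= max_impressions
        if is_long and is_question and is_rare:
            flagged.append(query)
    return flagged


sample = [
    ("why is the sky blue", 2),
    ("black shoes", 900),
    ("and then that should strike you as", 1),
]
print(likely_voice_queries(sample))
```

As Gary notes, some people do type full questions, so anything flagged this way is a guess rather than a measurement.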

Third Person: We can’t track keyword queries in Google Analytics as it shows up as not provided. Is that going to change?

Gary: That’s not going to change. So in Search Console you can definitely see those queries. So that’s why basically we recognize that Google Analytics is good for tracking these things but it was giving way too granular data and our privacy policy is very strict on what kind of data we will give out. But Search Console on the other hand can give this data because it strips away pretty much every single piece of data that would allow you to identify single users. That’s why you would track traffic from search in Search Console and other sources of traffic in Google Analytics.

Siddharth: Image optimization. With image optimization, the four main factors that we’re generally talking about are the file name, the content around the image, the alt tag, and compression of the image so it shouldn’t be too big. These are generally the four main things, right? Out of these four, which do you believe is the main factor that would affect image optimization? In your opinion, for webmasters.

Gary: I’ll throw in one more, that is image schema. I would also throw in image sitemaps. What is the most important? I don’t want to create the illusion that we have a ranking for these ranking signals. Depending on what other results your picture is showing up with in the results, the ranking signals might have different weights. That’s why I’m quite mad that we said that RankBrain is the third most important, because probably you can twist it in a way that it would look like it’s the third most important because it triggers for so many queries. But then in the vast majority of the cases, it perhaps will not do pretty much anything for the results. It looks at them, says yes, this looks about right to me, and it doesn’t change the word. But if triggered, it looks like we apparently count it as active, so it’s counted as the third most important ranking signal. I don’t like that; that’s why I wouldn’t create an order of importance for image searches.

Siddharth: But all of them. Are they important?

Gary: All are important. Typically the more signals you provide the better. I would focus mostly on on-page things. So for example, have your image close to the content, because that helps us enormously to associate the image with something that we can rank for. For example, if you’re writing about a pond, but the image of the pond is next to content about cats or Old Delhi or whatever, then we might just associate Old Delhi with an image of a pond. That’s not necessarily what users want, so you put your image next to the piece of content where you write about it. Captions are good for these sorts of things.

Siddharth: So that’s the title of the image.

Gary: Going further, let’s say that you have an image and right below the image you have the caption of the image. File name is also important to some extent. I think we started giving less weight to that (because everyone can name their image like cat or whatever and then they expect it will show up for cat) but it still has some weight. The alt attribute value is supposed to be taken into account. Title is taken into account. Nowadays even the title of the page is taken into account.

Third Person: Content of the article is also taken into account.

Gary: The problem is that we get those reports a lot, but we keep getting them from sales people, from our own sales people. And, search will not fix anything for sales. Basically the sky can fall and we would just sit there and if sales reported to us that the sky is falling, we would say “sorry, you are sales”.

Third Person: Actually it is coming from content, from BBC news. It is coming from content people. The same thing has been written there.

Gary: I will write it down again.

Third Person: I think it is important right?

Gary: I like that guy over there a lot.

Siddharth: Ya this one, Narendra Modi. How’s it coming for most criminal person in the world. And he’s in Google image search. Someone has I think Google bombed it. It’s taken care of now, Google bomb?

Third person: In text, it’s okay. But in images it is not fine.

Gary: So the thing with Google bombs is that we believe that we can handle them well, or well enough. Basically if you see a spike of something, for example, to the same image or same page, all of them saying “criminal” or “Obama Criminal” or whatever, then it will work unfortunately quite well, or at least it used to work quite well. Nowadays we believe that we took care of that to the biggest extent, and we can still be surprised. I mean our algorithms can still be surprised by very ingenious spammers or scammers who can find new ways to exploit these loopholes. But, honestly I don’t remember looking into any Google bomb for many months.

Siddharth: So just coming back to that image search thing. Would you really differentiate the title of the image versus the alt text of the image? Because what would the difference be? Let’s say it’s an image of this diary, I could have both exactly the same, right? I’ve always struggled when people say this should be the title and this should be the alt tag.

Gary: I would also not care. Well, from an accessibility point of view I would care because I think that the screen readers look for one of them, not both, so I would pay more attention to that. And if you hover over the image in the browser, that will display the title.

Siddharth: But they can be the same thing? I think they should be the same thing.

Gary: It doesn’t matter I think. I would also argue that it is fine if you only fill out one of them.

Siddharth: So talking about the alt text. People tend to generally, most people they think it’s a place to put keywords whereas the reality of it. Let’s assume it’s this diary that we’re looking at. It is a yahoo diary purple colour. But they were talking about, the page was about shirts. Even though this image is of a diary, they’ll put shirts – blue shirts, green shirts. Right? That I believe is totally wrong and rather it should be. This is a purple color yahoo diary is how this should be called. So does Google therefore really look into this and devalue the thing because people are constantly just doing this. Pretty much nearly every SEO I talk to hasn’t reached a certain level, just thinks of it as keywords “let me put some keywords” into the alt attributes.

Third person: We should write natural name i.e. about diary. And it should be Diary, further name Yahoo Diary. But we don’t need to do keyword stuffing.

Gary: So I believe, if the keywords are not stuffed, they are just purple, diary, yahoo – I can’t come up with any others – that is kind of also fine. It’s not like we’re going to penalize them for that.

Siddharth: Ya purple diary yahoo is absolutely fine or yahoo purple diary. That’s all fine. As long as you don’t go and say blue shirt because we’re writing…

Gary: We can differentiate between typical and off topic things, I’m going to tell you how. We can drop those things easily.

Siddharth: And a lot of e-commerce sites tend to, because there are thousands of images on their site and suddenly you go and tell them that “Oh, you’re missing an alt attribute here and you need to put one for each”. They look at us and say that’s a massive project. Why would you want to do that? But we believe it’s important, right? Every image should have an alt tag. Because that will also add value to the page.

Gary: So basically, if we don’t have enough information on the page, or information which can be associated with the image, then that image will not rank at all or it will rank weakly. And that’s why you want to provide more data, similar to, for example, structured data.
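For a site with thousands of images, finding the ones with no alt text does not have to be a manual project. The Python sketch below uses the standard library’s `html.parser` to list the `src` of every `<img>` whose alt attribute is missing or blank. The class and function names and the sample markup are assumptions for illustration. Note that an empty alt can be a deliberate choice for purely decorative images, so the output is a list to review, not a list of errors.

```python
from html.parser import HTMLParser


class _ImgAltAudit(HTMLParser):
    """Records the src of each <img> with a missing or blank alt attribute."""

    def __init__(self):
        super().__init__()
        self.missing = []

    def handle_starttag(self, tag, attrs):
        # Self-closing <img ... /> tags are also routed here by HTMLParser.
        if tag == "img":
            attributes = dict(attrs)
            if not (attributes.get("alt") or "").strip():
                self.missing.append(attributes.get("src") or "")


def images_missing_alt(html):
    """Return the src values of images that have no usable alt text."""
    audit = _ImgAltAudit()
    audit.feed(html)
    audit.close()
    return audit.missing


page = """
<img src="/img/yahoo-diary.jpg" alt="Purple Yahoo diary">
<img src="/img/blue-shirt.jpg">
<img src="/img/green-shirt.jpg" alt="  " />
"""
print(images_missing_alt(page))
```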

Siddharth: What’s your favorite ranking signal?

Gary: PageRank (backlinks – votes from other sites). I like it because it’s a signal that was created over 20 years ago. And, in a sense, we’re still using the same signal that we used 20 years ago. And I think that something that lasts 20 years, on the internet, that’s worth admiring. I can’t think of any other ranking signal that lasted that long. Basically, we came up with lots of things over the years and basically, we learned something and 2,3,4,5 years later we unlaunched them. I personally unlaunched a dozen ranking signals over the years because they were not relevant anymore or because they were doing the same thing that some other ranking signal indirectly did.

Siddharth: Talking about EAT – Expertise, Authoritativeness and Trustworthiness. What exactly is the difference between authority, expertise and trust? Or are they all just the same thing?

Gary: They are pretty much the same thing. I think, if you’re in any one of those, you’re in a good category, that’s one thing. If you’d like a more in-depth description then it’s in the latest guidelines.

Siddharth: Yup, that’s from Feb 27. I got that off your tweet only at one point.

Gary: So these examples that you see here, these are the rating pages. This is for…. In this case this is a “your money or your life” topic. The page shows characteristics of a low-authority site. No contact information, no indication of who wrote the content, no evidence of medical expertise, no authority and heavy monetization from ads. Therefore, this page is not trustworthy. So basically, if we know the… ok I will simplify it, if we know the author, and we would be very sure that the author is a doctor and the site is publishing consistently high-quality content. Hence the authors of that content are doctors, then that’s basically something that we would trust more because this EAT, this matters a lot in “your money or your life” topics, keywords where if you get wrong information, then you can actually lose your money or your house or even die.

Siddharth: But so essentially they are the same thing, whether if someone refers to expert means of communication.

Gary: See even there we were using a slash between expertise and authority, which means that we are kind of thinking that it’s kind of the same thing. I would say that it’s more a description of a bucket, so people or sites, pages that verifiably provide accurate, good information.

Siddharth: I’m still getting the feeling that, they’re sort of all very integral to each other, all directly linked. Because if you’re an expert, then you’re trustworthy right? If you’re an authority, you’re trustworthy.

Gary: So you’re an SEO, you know how many experts there are in your field of work who are not necessarily trustworthy.

Siddharth: But then therefore, they’re not experts. If they were real experts…

Gary: But they’ve made a reputation for themselves that they are experts. I know quite a few of these people who created the image that they’re experts. Basically people trust them, they’re trustworthy. But then they’re preaching nonsense. So yeah I think they’re roughly the same bucket and that they relate to each other.
