Twenty years ago this year, I authored a book called "Search Engine Marketing: The Essential Best Practice Guide." It is generally regarded as the first comprehensive guide to search engine optimization and the underlying science of information retrieval (IR).
I thought it would be useful to look back at what I wrote in 2002 and see how it stacks up today. We'll start with the most fundamental aspects of what's involved in crawling the web.
It's important to understand the history and background of the internet and search in order to understand where we are today and what's next. And let me tell you, there is a lot of ground to cover.
Our industry is now hurtling into another new iteration of the internet. We'll start by reviewing the foundational work I covered in 2002. Then we'll explore the present, with an eye toward the future of SEO, looking at a few important examples (e.g., structured data, cloud computing, IoT, edge computing, 5G).
All of this is a mega leap from where the internet began.
Join me, won't you, as we wander down search engine optimization memory lane.
We tend to use the terms World Wide Web and internet interchangeably. However, they are not the same thing.
You'd be surprised how many people don't understand the difference.
Another iteration, which brought it closer to what we know now, was created in 1973 by scientist Vint Cerf (currently chief internet evangelist at Google).
The World Wide Web was invented by British scientist Tim Berners-Lee (now Sir) in the late 1980s.
Interestingly, most people seem to think that something equivalent to a lifetime of scientific research and experimentation took place before his invention was launched. But that's not the case at all. Berners-Lee invented the World Wide Web during his lunch hour one day in 1989 while enjoying a ham sandwich in the staff cafe at the CERN Laboratory in Switzerland.
And, to add a little clarity to the headline of this article, from the following year (1990) the web has been crawled in one way or another by one bot or another to this present day (hence 32 years of crawling the web).
The web was never meant to do what we now expect it to do (and those expectations keep growing).
Berners-Lee originally devised and developed the web to meet the demand for automated information-sharing between scientists at universities and institutes around the world.
So much of what we try to make the web do is alien to the inventor and the browser (which Berners-Lee also invented).
And this is very relevant to the major scalability challenges search engines face when trying to harvest content to index and keep fresh, while also trying to discover and index new content.
Obviously, the World Wide Web came with inherent challenges. And that brings me to another hugely important fact to highlight.
It's the "pervasive myth" that began when Google first launched and seems to be as pervasive now as it was back then: the belief that Google has access to the entire web.
Not true. In fact, not even close.
When Google first started crawling the web in 1998, its index contained around 25 million unique URLs. Ten years later, in 2008, it announced it had hit the major milestone of having seen 1 trillion unique URLs on the web.
More recently, I've seen numbers suggesting Google is aware of some 50 trillion URLs. But here's the big difference we all need to know about as SEOs: being aware of URLs is not the same as having crawled and indexed them.
And 50 trillion is a lot of URLs. But it's only a tiny fraction of the entire web.
Google (or any other search engine) can crawl an enormous amount of content on the surface of the web. But there is also a huge amount of content on the "deep web" that crawlers simply cannot get access to. It sits locked behind interfaces leading to colossal amounts of database content. As I pointed out in 2002, crawlers aren't equipped with a monitor and keyboard!
Also, that 50 trillion unique URLs figure is arbitrary. I have no idea what the real figure is at Google right now (and they have no idea themselves how many pages there really are on the World Wide Web either).
These URLs don't all lead to unique content, either. The web is full of spam, duplicate content, iterative links to nowhere and all kinds of other web debris.
In 2002, I created a visual interpretation of "the general anatomy of a crawler-based search engine":
Clearly, this image didn't earn me any graphic design awards. But it was an accurate indication of how the various components of a web search engine came together in 2002. It certainly helped the emerging SEO industry gain a better understanding of why the industry, and its practices, were necessary.
Although the technologies search engines use have advanced considerably (think: artificial intelligence/machine learning), the principal drivers, processes and underlying science remain the same.
Although the terms "machine learning" and "artificial intelligence" have found their way into the industry lexicon more and more in recent years, I wrote this in the section on the anatomy of a search engine 20 years ago:
"In the conclusion of this section, I'll be touching on 'learning machines' (support vector machines) and artificial intelligence (AI), which is where the field of web search and retrieval inevitably has to go next."
It's hard to believe that there are literally only a handful of general-purpose search engines around the planet crawling the web, with Google (arguably) being the largest. I say that because back in 2002, there were dozens of search engines, with new startups almost every week.
As I regularly rub shoulders with much younger practitioners in the industry, I still find it kind of amusing that many don't even realize that SEO existed before Google was around.
Although Google gets a lot of credit for the innovative way it approached web search, it learned a great deal from a guy named Brian Pinkerton. I was lucky enough to interview Pinkerton (on more than one occasion).
He's the inventor of the world's first full-text retrieval search engine, WebCrawler. And although he was ahead of his time at the dawn of the search industry, he had a good laugh with me when he explained his first setup for a web search engine. It ran on a single 486 machine with 800MB of disk and 128MB of memory, and a single crawler downloading and storing pages from only 6,000 websites.
That's somewhat different from what I wrote about Google in 2002 as a "new generation" search engine crawling the web:
"The word 'crawler' is almost always used in the singular; however, most search engines actually have a number of crawlers with a 'fleet' of agents carrying out the work on a massive scale. For instance, Google, as a new-generation search engine, started off with four crawlers, each keeping open about three hundred connections. At peak speeds, they downloaded the information from over one hundred pages per second. Google (at the time of writing) now relies on 3,000 Linux PCs with over ninety terabytes of disk storage. They add thirty new machines per day to their server farm just to keep up with growth."
And that pattern of scaling up and growth at Google has continued at a pace since I wrote that. It's been a while since I saw an accurate figure, but maybe a few years back, I saw an estimate that Google was crawling 20 billion pages a day. It's likely even more than that now.
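To make the "fleet of agents" idea a little more concrete, here is a minimal sketch of a multi-threaded fetcher with a cap on simultaneous connections. It is purely illustrative: the seed URLs, worker count and connection limit are made-up placeholders, not anything Google actually uses.

```python
# A minimal sketch of the "fleet of agents" idea: several worker threads share
# one frontier queue, and a semaphore caps the number of open connections.
# Seed URLs, worker count and limits below are illustrative placeholders.
import queue
import threading
import urllib.request

FRONTIER = queue.Queue()
MAX_CONNECTIONS = 8                      # the 2002 quote mentions ~300 per crawler
connection_slots = threading.BoundedSemaphore(MAX_CONNECTIONS)

def fetch_worker():
    while True:
        url = FRONTIER.get()
        if url is None:                  # sentinel: shut this worker down
            FRONTIER.task_done()
            break
        with connection_slots:           # respect the global connection cap
            try:
                with urllib.request.urlopen(url, timeout=10) as resp:
                    page = resp.read()
                print(f"fetched {len(page)} bytes from {url}")
            except Exception as exc:
                print(f"failed {url}: {exc}")
        FRONTIER.task_done()

for seed in ["https://example.com/", "https://example.org/"]:   # placeholders
    FRONTIER.put(seed)

workers = [threading.Thread(target=fetch_worker, daemon=True) for _ in range(4)]
for w in workers:
    w.start()
FRONTIER.join()                          # wait until the frontier is drained
for _ in workers:
    FRONTIER.put(None)                   # stop the fleet
```

In a real crawler, of course, the workers would also parse each page for new links and feed them back into the frontier, which is exactly where the scale problem comes from.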
Is it possible to rank in Google's top 10 results if your page has never been crawled?
As improbable as it may seem in the asking, the answer is "yes." And again, it's something I addressed back in 2002 in the book:
"From time to time, Google will return a listing, or even a single link to a document, which has not yet been crawled, but with notification that the document only appears because the keywords appear in other documents with links that point to it."
What's that all about? How is this possible?
Hyperlink analysis. Yep, that's backlinks!
There's a difference between crawling, indexing and simply being aware of unique URLs. Here's the further explanation I gave:
"If you go back to the huge challenges outlined in the section on crawling the web, it's plain to see that one should never assume, following a visit from a search engine spider, that ALL the pages in your website have been indexed. I have clients with websites of varying degrees in number of pages. Some fifty, some 5,000, and in all honesty I can say that not one of them has every single page indexed by every major search engine. All the major search engines have URLs on the 'frontier' of the crawl, as it's known, i.e., the crawler control will frequently have millions of URLs in the database which it knows exist but have not yet been crawled and downloaded."
There have been many occasions when I've seen examples of this. The top 10 results following a query would sometimes show a basic URL with no title or snippet (or metadata).
Here's an example I used in a presentation from 2004. Look at the bottom result, and you'll see what I mean.
Google is aware of the importance of that page because of the linkage data surrounding it. But no supporting information has been pulled from the page, not even the title tag, because the page obviously hasn't been crawled. (Of course, this can also occur with the evergreen, still-happens-all-the-time little blunder when someone leaves a robots.txt file in place preventing the site from being crawled.)
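Here's a toy illustration (definitely not Google's actual pipeline) of how a page that was never fetched can still be ranked: the only text available for it is the anchor text of the links pointing at it, plus a crude link-based importance signal. All URLs and anchors below are invented for the example.

```python
# Toy sketch: an uncrawled URL gets a "pseudo-document" built entirely from the
# anchor text of inbound links, so it can still match a query and be ranked,
# even though no title or snippet exists for it. All data here is made up.
from collections import defaultdict

# (source_page, target_url, anchor_text) triples discovered while crawling
links = [
    ("https://a.example/page1", "https://target.example/report", "annual widget report"),
    ("https://b.example/news",  "https://target.example/report", "2002 widget report pdf"),
    ("https://c.example/blog",  "https://target.example/report", "widget sales figures"),
]

anchor_index = defaultdict(list)   # target URL -> anchor-text pseudo-document
inlink_count = defaultdict(int)    # crude stand-in for link-based importance

for source, target, anchor in links:
    anchor_index[target].append(anchor)
    inlink_count[target] += 1

def score(url, query_terms):
    """Match query terms against anchor text only, weighted by in-link count."""
    text = " ".join(anchor_index[url]).lower()
    matches = sum(term in text for term in query_terms)
    return matches * inlink_count[url]

print(score("https://target.example/report", ["widget", "report"]))
# Non-zero score: the URL can appear in results as a bare link with no title
# or snippet, because the page itself was never downloaded.
```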
I've highlighted that sentence in bold for two reasons:
I'm only going to embellish the "courtesy" point a little more, as it relates directly to the robots.txt file/protocol. All the challenges of crawling the web that I explained 20 years ago still exist today (just on a much larger scale).
Because crawlers retrieve data at vastly greater speed and depth than humans, they can (and sometimes do) have a crippling impact on a website's performance.
That's why a courtesy policy is needed, governed on the one hand by the programming of the crawler and the plot of the crawl, and on the other by the robots.txt file.
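As a quick illustration of the robots.txt side of that courtesy policy, here is a minimal check using Python's standard-library parser. The site URL and the "ExampleBot" user agent are placeholders, and the five-second fallback delay is just an assumed conservative default.

```python
# A minimal politeness check with the standard-library robots.txt parser.
# The target site and the user-agent string are hypothetical placeholders.
import time
import urllib.robotparser

rp = urllib.robotparser.RobotFileParser()
rp.set_url("https://www.example.com/robots.txt")
rp.read()                                    # fetch and parse robots.txt

USER_AGENT = "ExampleBot"
url = "https://www.example.com/archive/2002/report.html"

if rp.can_fetch(USER_AGENT, url):
    # Honor an explicit Crawl-delay directive if the site declares one,
    # otherwise fall back to an assumed conservative pause between requests.
    delay = rp.crawl_delay(USER_AGENT) or 5
    time.sleep(delay)
    # ... fetch the page here ...
else:
    print(f"robots.txt disallows {url} for {USER_AGENT}")
```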
The faster a search engine can crawl new content and recrawl existing pages in the index, the fresher the content will be.
Getting the balance right? That's the hard part.
Let's say, purely hypothetically, that Google wanted to keep thorough coverage of news and current affairs and decided to crawl the entire New York Times website every day (or even every week) without any courtesy factor at all. The crawler would most likely use up all of the site's bandwidth. And that would mean nobody could get to read the paper online because of bandwidth hogging.
Thankfully, beyond just the courtesy factor, we now have Google Search Console, where it's possible to manipulate the speed and frequency at which websites are crawled.
OK, we've covered a lot of ground, just as I knew we would.
There have certainly been many changes to both the internet and the World Wide Web, but the crawling part still seems to be hindered by the same old issues.
That said, a while back, I saw a presentation by Andrey Kolobov, a researcher in the field of machine learning at Bing. He created an algorithm to do a balancing act with the bandwidth, courtesy and importance issue when plotting the crawl.
I found it highly informative, surprisingly straightforward and quite easily explained. Even if you don't understand the math, no worries, you'll still get an indication of how he tackles the problem. And you'll also hear the word "importance" in the mix again.
Basically, as I explained earlier about URLs on the frontier of the crawl, hyperlink analysis is important before you get crawled, and indeed it may well be the reason behind how quickly you get crawled. You can watch the short video of his presentation here.
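To give a feel for what that balancing act looks like in practice, here is a deliberately simplified scheduler sketch. It is not Kolobov's algorithm, just a generic illustration: URLs are popped in order of an importance score, but the same host is never hit more often than a fixed politeness interval allows. The scores, URLs and two-second interval are all invented.

```python
# Simplified crawl-scheduling sketch (not Kolobov's actual algorithm): pick the
# most "important" URL next, but respect a per-host politeness interval.
# Importance scores, URLs and the interval below are made-up examples.
import heapq
import time
from urllib.parse import urlparse

POLITENESS_INTERVAL = 2.0        # seconds between requests to the same host

# (negative importance, url) so heapq pops the highest-importance URL first
frontier = [
    (-0.9, "https://news.example/front-page"),
    (-0.7, "https://news.example/sports"),
    (-0.4, "https://blog.example/post-1"),
]
heapq.heapify(frontier)
next_allowed = {}                # host -> earliest time we may fetch it again

while frontier:
    neg_importance, url = heapq.heappop(frontier)
    host = urlparse(url).netloc
    wait = next_allowed.get(host, 0.0) - time.monotonic()
    if wait > 0:
        time.sleep(wait)         # bandwidth may be idle, but courtesy comes first
    print(f"crawl {url} (importance {-neg_importance:.1f})")
    next_allowed[host] = time.monotonic() + POLITENESS_INTERVAL
```

A real scheduler would also weigh how often each page tends to change, so that bandwidth is spent where freshness matters most.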
Now let's move on to what's happening right now, and how the web, 5G and enhanced content formats are cranking up.
The web has been a sea of unstructured data from the beginning. That's the way it was invented. And as it still grows exponentially every day, the challenge search engines have is having to crawl and recrawl existing documents in the index to analyze and update them if any changes have been made, in order to keep the index fresh.
It is a daunting task.
It would be so much easier if the data were structured. And so much of it actually is, as structured databases drive so many websites. But the content and the presentation are separated, of course, because the content has to be published purely in HTML.
There have been many attempts I've been aware of over the years where custom extractors have been built to try to convert HTML into structured data. But mostly, those attempts were very fragile, laborious and totally error-prone operations.
Something else that has changed the game completely is that websites in the early days were hand-coded and designed for clunky old desktop machines. But now, the number of varying form factors used to retrieve webpages has hugely changed the layout formats websites must target.
As I said, because of the inherent challenges with the web, search engines such as Google are never likely to be able to crawl and index the entire web.
So, what would be an alternative way to vastly improve the process? What if we let the crawler keep doing its regular job and make a structured data feed available at the same time?
Over the past decade, the importance and usefulness of this idea has grown and grown. To many, it's still quite a new idea. But, again, Pinkerton, the WebCrawler inventor, was way ahead on this subject 20 years ago.
He and I discussed the idea of domain-specific XML feeds to standardize the syntax. At that time, XML was new and considered to be the future of browser-based HTML.
It's called extensible because it's not a fixed format like HTML. XML is a "metalanguage" (a language for describing other languages, which lets you design your own customized markup languages for limitless different types of documents). Various other approaches were vaunted as the future of HTML but couldn't meet the required interoperability.
However, one approach that got a lot of attention is known as MCF (Meta Content Framework), which introduced ideas from the field of knowledge representation (frames and semantic nets). The idea was to create a common data model in the form of a directed labeled graph.
Yes, the idea is better known as the Semantic Web. And what I just described is the early vision of the knowledge graph. That idea dates back to 1997, by the way.
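To make the "directed labeled graph" idea concrete, here is a tiny sketch of how facts can be stored as labeled edges between nodes, much as a knowledge graph does. The entities and predicates are illustrative examples drawn from this article, not from any real dataset.

```python
# A tiny "directed labeled graph": each fact is an edge (subject, predicate,
# object), and answering a question means following edges with a given label.
# The facts below are illustrative examples only.
knowledge_graph = [
    ("WebCrawler", "createdBy",  "Brian Pinkerton"),
    ("WebCrawler", "instanceOf", "SearchEngine"),
    ("Google",     "instanceOf", "SearchEngine"),
    ("Google",     "startedCrawlingIn", "1998"),
]

def objects(subject, predicate):
    """Follow edges labeled `predicate` out of the node `subject`."""
    return [o for s, p, o in knowledge_graph if s == subject and p == predicate]

print(objects("WebCrawler", "createdBy"))    # ['Brian Pinkerton']
print(objects("Google", "instanceOf"))       # ['SearchEngine']
```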
That said, it was 2011 when everything started to come together, with the founding of schema.org by Bing, Google, Yahoo and Yandex. The idea was to present webmasters with a single vocabulary. Do the markup work once, and reap the benefits across every search engine consuming it.
OK, I don't want to stray too far into the huge importance of structured data for the future of SEO. That has to be an article of its own. So I'll come back to it another time in detail.
But you can probably see that, if Google and the other search engines can't crawl the entire web, the importance of feeding them structured data to help them rapidly update pages without having to recrawl them over and over makes an enormous difference.
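As a small, hedged example of what that "do the work once" markup looks like, here is a Python snippet that builds a schema.org Article description and serializes it as the JSON-LD block a page would embed in its HTML head. The headline is taken from this article; the author name and date are placeholders.

```python
# Sketch of schema.org markup as JSON-LD: describe the page once in a shared
# vocabulary so search engines can read its structure without inferring it
# from the HTML body. Author and date below are placeholder values.
import json

structured_data = {
    "@context": "https://schema.org",
    "@type": "Article",
    "headline": "Search Engine Marketing: The Essential Best Practice Guide",
    "author": {"@type": "Person", "name": "Example Author"},   # placeholder
    "datePublished": "2002-01-01",                              # placeholder
}

snippet = (
    '<script type="application/ld+json">\n'
    + json.dumps(structured_data, indent=2)
    + "\n</script>"
)
print(snippet)   # this block would be embedded in the page's <head>
```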
Having said that, and this is particularly important, you still need to get your unstructured data recognized for its E-A-T (expertise, authoritativeness, trustworthiness) factors before the structured data really kicks in.
As I've already discussed, over the past four decades the internet has evolved from a peer-to-peer network, to the overlay of the World Wide Web, to a mobile internet revolution, cloud computing, the Internet of Things, edge computing and 5G.
The shift toward cloud computing gave us the industry phrase "the Cloudification of the internet."
Huge warehouse-sized data centers provide services to manage computing, storage, networking, data management and control. That means cloud data centers are often located near hydroelectric plants, for instance, to supply the huge amount of power they need.
Now, the "Edgeification of the internet" turns it all back around, from being further away from users to being right next to them.
Edge computing is about physical hardware devices located in remote locations at the edge of the network, with enough memory, processing power and computing resources to collect data, process that data and execute on it in near real-time with limited help from other parts of the network.
By placing computing services closer to these locations, users benefit from faster, more reliable services and a better user experience, while companies benefit by being better able to support latency-sensitive applications, identify trends and offer vastly superior products and services. (IoT devices and edge devices are often used interchangeably.)
With 5G and the power of IoT and edge computing, the way content is created and distributed will also change dramatically.
We already see elements of virtual reality (VR) and augmented reality (AR) in all kinds of different apps. And in search, it will be no different.
AR imagery is a natural initiative for Google, and they've been messing around with 3D images for a couple of years now, testing, testing, testing as they do. But already, they're incorporating that low-latency access to the knowledge graph and bringing content in more visually compelling ways.
During the height of the pandemic, the now "digitally accelerated" end user became accustomed to engaging with the 3D images Google was sprinkling into the mix of results. At first it was animals (dogs, bears, sharks) and then cars.
Last year, Google announced that during that period the 3D featured results were interacted with more than 200 million times. That means the bar has been set, and we all need to start thinking about creating these richer content experiences, because the end user (perhaps your next customer) is already expecting this enhanced type of content.
If you haven't experienced it yourself yet (and not everyone, even in our industry, has), here's a very cool treat. In this video from last year, Google introduces famous athletes into the AR mix. And superstar athlete Simone Biles gets to interact with her AR self in the search results.
Having established the various phases and developments of the internet, it's not hard to say that anything connected in one way or another will be the driving force of the future.
Because of the advanced hype that much of the technology receives, it's easy to dismiss it with thoughts such as: IoT is just about smart lightbulbs, and wearables are just about fitness trackers and watches. But the world around you is changing in more ways than you can imagine. It's not science fiction.
IoT and wearables are two of the fastest-growing technologies and hottest research topics that will hugely expand consumer electronics applications (communications in particular).
The future has been a long time arriving this time around. But it's here.
We live in a connected world where billions of computers, tablets, smartphones, wearable devices, gaming consoles and even medical devices, indeed entire buildings, are digitally processing and delivering information.
Here's an interesting little factoid for you: it's estimated that the number of devices and items connected to IoT already eclipses the number of people on earth.
We'll stop here. But there's much more to come.
I plan to break down what we now call search engine optimization in a series of monthly articles focusing on the foundational aspects. Although, the term "SEO" wouldn't enter the lexicon for some while yet, as the cottage industry of "doing stuff to get found at search engine portals" began to emerge in the mid-to-late 1990s.
Until then, be well, be productive and absorb everything around you in these exciting technological times. I'll be back again with more in a few weeks.
The opinions expressed in this article are those of the guest author and not necessarily Search Engine Land. Staff authors are listed here.