Deploy a tool to dynamically render pages
Our DOI landing pages are rendered client-side by JavaScript when the page is loaded. Unfortunately, not all indexing robots (including Google's bots) are able to run these scripts to generate the corresponding HTML content of the page.
The solution is a dynamic rendering service: when a request comes from a bot, it serves a pre-rendered HTML version of the DOI landing page instead of the JavaScript version served to regular visitors at doi.esrf.fr.
We have not yet deployed such a service locally (i.e. at ESRF). Instead we are using a remote third-party service, wired in here: https://gitlab.esrf.fr/icat/doi-landing-page/-/blob/master/www/.htaccess#L22
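A locally deployed renderer would be wired in the same way as the current third-party one: the web server detects bot requests by User-Agent and proxies them to the renderer. The sketch below follows the common prerender `.htaccess` pattern; the renderer host `prerender.esrf.fr` and the exact bot list are placeholders, not the actual configuration.

```apache
# Sketch of bot-detection rewrite rules (placeholders, modelled on the
# linked .htaccess). "prerender.esrf.fr" stands for the local renderer.
RewriteEngine On
# Only rewrite requests whose User-Agent looks like a crawler...
RewriteCond %{HTTP_USER_AGENT} (googlebot|bingbot|yandex|baiduspider) [NC]
# ...and which ask for a page, not a static asset.
RewriteCond %{REQUEST_URI} !\.(js|css|png|jpg|gif|ico)$ [NC]
# Proxy the request to the renderer, which returns the pre-rendered HTML.
RewriteRule ^(.*)$ http://prerender.esrf.fr/https://doi.esrf.fr/$1 [P,L]
```

Switching from the remote service to the local one would then only require changing the renderer host in these rules.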
This is not a long-term solution because:
- we depend on a third-party service which can be shut down at any time
- we cannot configure the service.
One important point is to serve the HTML version of a DOI landing page to the Google bots fast enough. If rendering takes too long, the Google indexing service may hit a timeout and the page is not indexed. This can be solved with a caching system, which is normally a built-in feature of a dynamic renderer, but we have no control over the caching of the third-party service.
Statistics on how long the Google bots take to download a DOI landing page from us are available at https://www.google.com/webmasters/tools/crawl-stats?siteUrl=https%3A%2F%2Fdoi.esrf.fr%2F&utm_source=search_console&utm_campaign=left-nav-legacy-tool&hl=fr&authuser=1
Proposed steps:
- choose a dynamic renderer (in agreement with TID)
- deploy it locally (TID)
- fill its cache with all the current DOI landing pages
- write a script which periodically refills the cache to include the newly generated DOI landing pages
- configure the DOI landing page service such that it uses the local dynamic renderer instead of the remote service.
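The cache-filling steps above could look like the sketch below: fetch each landing page with a bot-like User-Agent so the renderer generates and caches the HTML. This is a minimal sketch under assumptions — the DOI list file format, the User-Agent string matched by the bot rules, and the idea that a bot-style request triggers (re)caching are all placeholders to be adapted to the chosen renderer.

```python
"""Sketch of a cache warm-up script for DOI landing pages.

Assumptions (placeholders): DOIs are read from a text file, one per line;
fetching a page with a bot User-Agent makes the renderer cache it.
"""
import sys
import urllib.request

BASE_URL = "https://doi.esrf.fr"  # landing-page service
# Any User-Agent matched by the bot-detection rules (assumption).
BOT_UA = "Googlebot/2.1 (+http://www.google.com/bot.html)"


def landing_urls(dois):
    """Map DOI names to their landing-page URLs."""
    return [f"{BASE_URL}/{doi}" for doi in dois]


def warm(url, timeout=30):
    """Request the page as a bot so the renderer (re)fills its cache."""
    req = urllib.request.Request(url, headers={"User-Agent": BOT_UA})
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        resp.read()
        return resp.status


if __name__ == "__main__" and len(sys.argv) > 1:
    # DOI list file given as first argument (hypothetical input format).
    with open(sys.argv[1]) as f:
        dois = [line.strip() for line in f if line.strip()]
    for url in landing_urls(dois):
        print(url, warm(url))
```

Run periodically (e.g. from cron) against the full DOI list, the same loop covers both the initial cache fill and the refresh that picks up newly generated DOI landing pages.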