Deep Dive into Selenium's communication setup with ChromeDriver

Reading code and understanding the detailed design and implementation of a proven project is a good way to spend time, and I am trying to make it a habit. Last week, this led me to the Selenium and Chromium source code. First, I decided to dig into the nitty-gritty of how “selenium” sets up the communication channel with “chromedriver”. In an upcoming article, I will deconstruct the chromedriver internals and how it works with third-party clients and the Chrome browser.

To a user like you and me,
– “Selenium” is a collection of programming language bindings that implement the WebDriver API, talk to driver servers using the W3C WebDriver protocol (JSON over HTTP), and drive browser automation
– “chromedriver” is an OS-specific executable server (shipped as a binary file) provided by the Chrome browser vendor, which controls the Chrome browser using the CDP (Chrome DevTools Protocol)

The whole story of how the communication channel between selenium and chromedriver gets set up hides behind two highly abstracted lines of code that users commonly write:

In Python:

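from selenium import webdriver
from selenium.webdriver.chrome.service import Service
service = Service(executable_path="path/to/chromedriver")
driver = webdriver.Chrome(service=service)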

In Java:

System.setProperty("webdriver.chrome.driver", "path/to/chromedriver");
WebDriver driver = new ChromeDriver();

For this explanation the language binding doesn’t matter, so I will use the Java one.

There is nothing fancy in this first line.

System.setProperty("webdriver.chrome.driver", "path/to/chromedriver");

It sets a JVM system property with the key “webdriver.chrome.driver” and the value “path/to/chromedriver”. Note that this is a property of the running JVM process, not an operating-system environment variable; Selenium reads it later to locate the chromedriver executable. Internally, “System.setProperty” first validates the key and then, if a security manager is installed in the JVM, calls its “checkPermission()” method, which throws a SecurityException if writing this property is not allowed.
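A minimal JDK-only sketch of the behaviour described above. The class name “PropertyScopeDemo” is hypothetical. It shows that the value is visible through System.getProperty inside the same JVM, that an empty key is rejected before any security-manager check happens, and that the property is not exported as an OS environment variable.

```java
public class PropertyScopeDemo {

    static final String KEY = "webdriver.chrome.driver";

    // Sets the property on the current JVM and reads it back.
    static String setAndRead(String path) {
        System.setProperty(KEY, path);
        return System.getProperty(KEY);
    }

    // The key is validated before any security-manager check:
    // an empty key is rejected with IllegalArgumentException.
    static boolean rejectsEmptyKey() {
        try {
            System.setProperty("", "value");
            return false;
        } catch (IllegalArgumentException expected) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(setAndRead("path/to/chromedriver")); // path/to/chromedriver
        System.out.println(rejectsEmptyKey());                   // true
        // System properties are not exported to the OS environment,
        // so the same name is normally absent from System.getenv().
        System.out.println(System.getenv(KEY));
    }
}
```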

Moving on to the next line. This is where all the real work happens and the Chrome browser opens up.

WebDriver driver = new ChromeDriver();

Here, the no-argument constructor of the “ChromeDriver” class (in Selenium’s “org.openqa.selenium.chrome” package) is called. The created object is assigned to the WebDriver-typed reference “driver” for later use.

Internally, this no-argument constructor first calls one of the parameterized constructors of the “ChromeDriver” class, which in turn calls another parameterized constructor. The constructor chain looks like this:

public ChromeDriver()
public ChromeDriver(ChromeDriverService service, ChromeOptions options)
public ChromeDriver(ChromeDriverService service, ChromeOptions options, ClientConfig clientConfig)
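The delegation itself is ordinary Java this(...) chaining. The sketch below mirrors only its shape: “MiniChromeDriver”, “Service”, “Options” and “Config” are hypothetical stand-ins, not Selenium classes, and each constructor just records that it ran.

```java
import java.util.ArrayList;
import java.util.List;

public class ConstructorChainDemo {

    // Hypothetical stand-ins for ChromeDriverService, ChromeOptions and ClientConfig.
    static class Service {}
    static class Options {}
    static class Config {
        static Config defaultConfig() { return new Config(); }
    }

    static class MiniChromeDriver {
        // Field initializers run inside the only constructor that does not
        // delegate with this(...), i.e. the three-argument one.
        final List<String> trace = new ArrayList<>();

        // 1st: supplies a default service and fresh options.
        MiniChromeDriver() {
            this(new Service(), new Options());
            trace.add("MiniChromeDriver()");
        }

        // 2nd: adds a default client configuration.
        MiniChromeDriver(Service service, Options options) {
            this(service, options, Config.defaultConfig());
            trace.add("MiniChromeDriver(service, options)");
        }

        // 3rd: where the real work would happen.
        MiniChromeDriver(Service service, Options options, Config config) {
            trace.add("MiniChromeDriver(service, options, config)");
        }
    }

    public static void main(String[] args) {
        new MiniChromeDriver().trace.forEach(System.out::println);
    }
}
```

Running it prints the three-argument constructor first: each this(...) call must finish before the calling constructor’s remaining body runs, so the most specific constructor completes first.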

Understanding how these three ChromeDriver constructors work makes many things crystal clear. So, let’s start with the first one.

1. First constructor of the ChromeDriver class
Its body looks like this:

public ChromeDriver() {
  this(ChromeDriverService.createDefaultService(), new ChromeOptions());
}

As you can see, it calls the second constructor and passes it two arguments: a default “ChromeDriverService” and a new “ChromeOptions” object. Our focus will be on the first one:

ChromeDriverService.createDefaultService()

There is an important class, “DriverService” (in the “org.openqa.selenium.remote.service” package, implementing the “Closeable” interface), that manages the lifecycle of native executable driver servers such as chromedriver. In general, these driver servers must implement the W3C WebDriver protocol to be able to communicate with Selenium. To manage the lifecycle of the chromedriver server during execution, the child class “ChromeDriverService” extends the parent class “DriverService” and inherits its non-private fields and methods. Other driver servers have their own DriverService subclasses as well, e.g.:

EdgeDriverService
FirefoxDriverService
GeckoDriverService
InternetExplorerDriverService
SafariDriverService
SafariTechPreviewDriverService
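To illustrate the parent/child split, here is a deliberately tiny model. “MiniDriverService” and “MiniChromeService” are hypothetical and far simpler than Selenium’s classes, and start() does not launch anything: the base class owns the generic lifecycle and implements Closeable, while the subclass only supplies driver-specific details.

```java
import java.io.Closeable;
import java.io.IOException;

public class ServiceHierarchyDemo {

    // Hypothetical base class: owns the generic start/stop lifecycle.
    abstract static class MiniDriverService implements Closeable {
        private boolean running;

        // Each concrete service names its own executable.
        abstract String executableName();

        void start() throws IOException {
            // A real implementation would launch executableName() here.
            running = true;
        }

        boolean isRunning() { return running; }

        // Closeable lets callers use try-with-resources to guarantee shutdown.
        @Override
        public void close() {
            running = false;
        }
    }

    // Hypothetical Chrome-specific subclass: inherits the lifecycle unchanged.
    static class MiniChromeService extends MiniDriverService {
        @Override
        String executableName() { return "chromedriver"; }
    }

    public static void main(String[] args) throws IOException {
        MiniChromeService service = new MiniChromeService();
        try (service) {
            service.start();
            System.out.println(service.executableName() + " running: " + service.isRunning());
        }
        System.out.println("running after close: " + service.isRunning());
    }
}
```

Because the base class implements Closeable, callers can use try-with-resources so the server is stopped even if an exception is thrown in between.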

The createDefaultService() method of the “ChromeDriverService” class configures and returns a “ChromeDriverService” object with the default configuration, which can later be started as a service. Internally, createDefaultService() uses the Builder design pattern (through a static nested class called “Builder”) to configure and return the “ChromeDriverService” instance.

First, a new Builder object (with default configuration) is created, and then its “build()” method, inherited from the parent “DriverService.Builder” class, is called.

public static ChromeDriverService createDefaultService() {
    return new Builder().build();
}
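For readers new to the pattern, here is a stripped-down builder in the same new Builder().build() style. “MiniService” and its setter names are hypothetical, not Selenium’s API; the point is that optional settings are collected through chained calls and the finished object is produced in a single build() call.

```java
import java.time.Duration;

public class BuilderDemo {

    // Hypothetical immutable service description, created only through its builder.
    static final class MiniService {
        final String executable;
        final int port;
        final Duration timeout;

        private MiniService(String executable, int port, Duration timeout) {
            this.executable = executable;
            this.port = port;
            this.timeout = timeout;
        }

        // Static nested builder, mirroring the "new Builder().build()" call style.
        static final class Builder {
            private String executable = "chromedriver";
            private int port = 0;       // placeholder; Selenium's build() swaps 0 for a free port
            private Duration timeout;   // placeholder; Selenium's build() swaps null for a default

            Builder executable(String executable) { this.executable = executable; return this; }
            Builder port(int port) { this.port = port; return this; }
            Builder timeout(Duration timeout) { this.timeout = timeout; return this; }

            // Collects whatever was configured into one immutable object.
            MiniService build() {
                return new MiniService(executable, port, timeout);
            }
        }
    }

    public static void main(String[] args) {
        MiniService defaults = new MiniService.Builder().build();
        MiniService custom = new MiniService.Builder()
            .port(9515)
            .timeout(Duration.ofSeconds(5))
            .build();
        System.out.println(defaults.port + " " + defaults.timeout); // 0 null
        System.out.println(custom.port + " " + custom.timeout);     // 9515 PT5S
    }
}
```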

The “.build()” method performs some important actions:
1. If no port was set, it finds a free port on the system for the chromedriver server to listen on and assigns that port to the service (using the “PortProber.findFreePort()” method)
2. If the timeout is null, it sets a default timeout of 20 seconds for the service (using the “getDefaultTimeout()” method)
3. It creates and returns the actual usable ChromeDriverService with the parameters set in the steps before

public DS build() {
  if (port == 0) {
    port = PortProber.findFreePort();
  }

  if (timeout == null) {
    timeout = getDefaultTimeout();
  }

  List<String> args = createArgs();

  DS service = createDriverService(exe, port, timeout, args, environment);
  port = 0; // reset port to allow reusing this builder

  return service;
}
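The first two steps can be approximated with the JDK alone. This sketch is an assumption, not PortProber’s actual implementation: it asks the OS for an unused port by binding a ServerSocket to port 0, then releases it, and it applies the 20-second default when no timeout was set. Any probe of this kind has a small race: another process could take the port between the probe and chromedriver binding to it.

```java
import java.io.IOException;
import java.net.ServerSocket;
import java.time.Duration;

public class BuildDefaultsDemo {

    static final Duration DEFAULT_TIMEOUT = Duration.ofSeconds(20);

    // Binding to port 0 lets the OS pick an unused port; we record it and release it.
    static int findFreePort() throws IOException {
        try (ServerSocket socket = new ServerSocket(0)) {
            return socket.getLocalPort();
        }
    }

    // Step 1: a port of 0 means "not chosen yet".
    static int resolvePort(int port) throws IOException {
        return (port == 0) ? findFreePort() : port;
    }

    // Step 2: a null timeout means "use the default".
    static Duration resolveTimeout(Duration timeout) {
        return (timeout == null) ? DEFAULT_TIMEOUT : timeout;
    }

    public static void main(String[] args) throws IOException {
        System.out.println("port: " + resolvePort(0));
        System.out.println("timeout: " + resolveTimeout(null)); // timeout: PT20S
    }
}
```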

This “createDriverService” method calls the parameterized constructor of the “ChromeDriverService” class, which in turn calls the constructor of its superclass “DriverService”; that constructor sets the service URL to http://localhost:<portNumber>. This is how the ChromeDriverService gets created and returned.

public ChromeDriverService(File executable, int port, Duration timeout, List<String> args, Map<String, String> environment)

2. Second constructor of the ChromeDriver class
The body of the second “ChromeDriver” constructor looks like this:

public ChromeDriver(ChromeDriverService service, ChromeOptions options) {
  this(service, options, ClientConfig.defaultConfig());
}

The “service” and “options” arguments were created by the first constructor and passed on to this second one. The main job of this second constructor is to create a default “ClientConfig” (via ClientConfig.defaultConfig()) and pass it on to the third constructor. I will not discuss the ClientConfig part.

3. Third constructor of the ChromeDriver class
The body of the third “ChromeDriver” constructor looks like this:

public ChromeDriver(ChromeDriverService service, ChromeOptions options, ClientConfig clientConfig) {
  super(generateExecutor(service, options, clientConfig), options, ChromeOptions.CAPABILITY);
  casting = new AddHasCasting().getImplementation(getCapabilities(), getExecuteMethod());
  cdp = new AddHasCdp().getImplementation(getCapabilities(), getExecuteMethod());
}

This constructor first calls the static “ChromeDriver.generateExecutor()” method, passing along the parameters it received from the previous constructor: “service”, “options” and “clientConfig”. The “generateExecutor()” method takes those parameters and
– finds the path to the chromedriver executable (with the help of the “DriverFinder” class)
– sets that executable path on the service
– creates and returns a “ChromeDriverCommandExecutor” object that wraps this service
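A simplified model of those three steps follows. The classes, the method signature and the lookup order (explicit path first, then the “webdriver.chrome.driver” system property) are assumptions for illustration; the real DriverFinder handles more cases.

```java
import java.util.Optional;

public class DriverPathDemo {

    static final String PROPERTY = "webdriver.chrome.driver";

    // Hypothetical stand-ins for ChromeDriverService and ChromeDriverCommandExecutor.
    static class Service {
        String executable;
    }

    static class Executor {
        final Service service;
        Executor(Service service) { this.service = service; }
    }

    // Simplified lookup: an explicit path wins, then the system property.
    // Selenium's DriverFinder does more, e.g. it can fall back to Selenium Manager.
    static Optional<String> findDriverPath(String explicitPath) {
        if (explicitPath != null) {
            return Optional.of(explicitPath);
        }
        return Optional.ofNullable(System.getProperty(PROPERTY));
    }

    // The three steps from the list above: find the path, set it on the
    // service, then wrap the service in an executor.
    static Executor generateExecutor(Service service, String explicitPath) {
        service.executable = findDriverPath(explicitPath)
            .orElseThrow(() -> new IllegalStateException("chromedriver not found"));
        return new Executor(service);
    }

    public static void main(String[] args) {
        System.setProperty(PROPERTY, "path/to/chromedriver");
        Executor executor = generateExecutor(new Service(), null);
        System.out.println(executor.service.executable); // path/to/chromedriver
    }
}
```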

During object creation, the “ChromeDriverCommandExecutor” constructor calls its parent class constructor, “ChromiumDriverCommandExecutor”, which calls its parent, “DriverCommandExecutor”, which in turn calls its parent, “HttpCommandExecutor”.
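Even though the calls start at ChromeDriverCommandExecutor, Java runs the constructor bodies from the top of the hierarchy down: each super(...) call must complete before the subclass body continues. The hypothetical Mini-prefixed classes below mirror the four levels and record the order in which their bodies finish.

```java
import java.util.ArrayList;
import java.util.List;

public class ExecutorChainDemo {

    static final List<String> ORDER = new ArrayList<>();

    // Hypothetical mirror of the four-level hierarchy; each constructor
    // records itself only after its super(...) call has returned.
    static class MiniHttpCommandExecutor {
        MiniHttpCommandExecutor() { ORDER.add("HttpCommandExecutor"); }
    }

    static class MiniDriverCommandExecutor extends MiniHttpCommandExecutor {
        MiniDriverCommandExecutor() { super(); ORDER.add("DriverCommandExecutor"); }
    }

    static class MiniChromiumDriverCommandExecutor extends MiniDriverCommandExecutor {
        MiniChromiumDriverCommandExecutor() { super(); ORDER.add("ChromiumDriverCommandExecutor"); }
    }

    static class MiniChromeDriverCommandExecutor extends MiniChromiumDriverCommandExecutor {
        MiniChromeDriverCommandExecutor() { super(); ORDER.add("ChromeDriverCommandExecutor"); }
    }

    public static void main(String[] args) {
        new MiniChromeDriverCommandExecutor();
        System.out.println(String.join(" -> ", ORDER));
        // HttpCommandExecutor -> DriverCommandExecutor -> ChromiumDriverCommandExecutor -> ChromeDriverCommandExecutor
    }
}
```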

The third “ChromeDriver” constructor passes this newly created “ChromeDriverCommandExecutor” object (along with the Chrome options) to its parent class constructor, “ChromiumDriver”, which in turn calls its parent class constructor, “RemoteWebDriver”. This is where the passed capabilities are initialized.

The final step, calling the “startSession()” method from inside the “RemoteWebDriver” constructor, is what launches the Chrome browser on your system.

Things happening inside the startSession() method of the RemoteWebDriver class
Inside the “startSession()” method of the “RemoteWebDriver” class, the passed capabilities are first checked for non-W3C-compliant capability names and for a Chrome-specific “w3c: false” setting, using these methods:

checkNonW3CCapabilities(capabilities);
checkChromeW3CFalse(capabilities);
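Here is a sketch of what such checks can look like, based on the rules in the W3C WebDriver specification rather than on Selenium’s exact code: a capability name is acceptable if it is a standard name or a vendor extension containing a colon (such as “goog:chromeOptions”). The key list and method names are my own and illustrative.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.stream.Collectors;

public class CapabilityCheckDemo {

    // Standard capability names from the W3C WebDriver spec (plus webSocketUrl
    // from WebDriver BiDi); treat this list as illustrative, not exhaustive.
    static final Set<String> STANDARD_KEYS = Set.of(
        "acceptInsecureCerts", "browserName", "browserVersion", "platformName",
        "pageLoadStrategy", "proxy", "setWindowRect", "strictFileInteractability",
        "timeouts", "unhandledPromptBehavior", "webSocketUrl");

    // Vendor extension capabilities must contain a colon, e.g. "goog:chromeOptions".
    static boolean isW3CKey(String key) {
        return STANDARD_KEYS.contains(key) || key.contains(":");
    }

    // Keys a strict W3C check would flag, sorted for stable output.
    static List<String> nonW3CKeys(Map<String, Object> capabilities) {
        return capabilities.keySet().stream()
            .filter(key -> !isW3CKey(key))
            .sorted()
            .collect(Collectors.toList());
    }

    // Looks for "w3c": false inside the Chrome-specific options map.
    static boolean chromeW3CDisabled(Map<String, Object> capabilities) {
        Object options = capabilities.get("goog:chromeOptions");
        if (options instanceof Map) {
            return Boolean.FALSE.equals(((Map<?, ?>) options).get("w3c"));
        }
        return false;
    }

    public static void main(String[] args) {
        Map<String, Object> caps = Map.of(
            "browserName", "chrome",
            "version", "120",                            // legacy, non-W3C name
            "goog:chromeOptions", Map.of("w3c", false));
        System.out.println(nonW3CKeys(caps));        // [version]
        System.out.println(chromeW3CDisabled(caps)); // true
    }
}
```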

Then RemoteWebDriver’s “execute()” method is called with the new-session payload as its argument.

protected Response execute(CommandPayload payload){}

Inside it, a new “Command” object is created, and then the “execute()” method of the “DriverCommandExecutor” class is called with that command.

public Response execute(Command command) throws IOException {}

This calls the “start()” method of the “DriverService” class, which launches the chromedriver executable.

public void start() throws IOException{}
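Starting the service boils down to launching the executable with the chosen port and then waiting until the server accepts connections. The sketch below is hypothetical: it builds the command line (chromedriver does accept --port=<n>) and polls a plain TCP connection with a deadline, whereas Selenium, as far as I know, checks the server’s HTTP status endpoint instead.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.Socket;
import java.time.Duration;
import java.util.List;

public class ServiceStartDemo {

    // chromedriver accepts its listening port as --port=<n>.
    static List<String> commandLine(String executable, int port) {
        return List.of(executable, "--port=" + port);
    }

    // Polls until something accepts TCP connections on the loopback port,
    // or returns false once the timeout has elapsed.
    static boolean waitUntilAvailable(int port, Duration timeout) throws InterruptedException {
        long deadline = System.nanoTime() + timeout.toNanos();
        InetSocketAddress address = new InetSocketAddress(InetAddress.getLoopbackAddress(), port);
        while (System.nanoTime() < deadline) {
            try (Socket socket = new Socket()) {
                socket.connect(address, 200);
                return true;
            } catch (IOException notListeningYet) {
                Thread.sleep(100);
            }
        }
        return false;
    }

    public static void main(String[] args) throws Exception {
        List<String> command = commandLine("path/to/chromedriver", 9515);
        System.out.println(command); // [path/to/chromedriver, --port=9515]
        // A real start() would launch it, e.g. new ProcessBuilder(command).start(),
        // and then call waitUntilAvailable(9515, Duration.ofSeconds(20)).
    }
}
```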

Next, the “invokeExecute()” method of the “DriverCommandExecutor” class is called, which in turn calls the “execute()” method of its parent class, “HttpCommandExecutor”.

Response invokeExecute(Command command) throws IOException {}

public Response execute(Command command) throws IOException {}
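The execute() → invokeExecute() → super.execute() hop is a simple layering: the subclass does its driver-specific preparation and then hands the command to the generic HTTP layer. Below is a minimal sketch with hypothetical classes; using the command name "newSession" as the start trigger is my simplification of what DriverCommandExecutor does.

```java
import java.util.ArrayList;
import java.util.List;

public class LazyStartDemo {

    // Hypothetical stand-in for HttpCommandExecutor: records what it would send.
    static class MiniHttpExecutor {
        final List<String> sent = new ArrayList<>();

        String execute(String commandName) {
            sent.add(commandName);
            return "response to " + commandName;
        }
    }

    // Hypothetical stand-in for DriverCommandExecutor: starts the driver
    // server before the new-session command, then defers to the HTTP layer.
    static class MiniDriverExecutor extends MiniHttpExecutor {
        boolean serviceRunning;

        @Override
        String execute(String commandName) {
            if ("newSession".equals(commandName) && !serviceRunning) {
                serviceRunning = true; // a real executor would call service.start() here
            }
            return invokeExecute(commandName);
        }

        // Hook that forwards to the parent's (non-overridden) implementation.
        String invokeExecute(String commandName) {
            return super.execute(commandName);
        }
    }

    public static void main(String[] args) {
        MiniDriverExecutor executor = new MiniDriverExecutor();
        System.out.println(executor.serviceRunning);        // false
        System.out.println(executor.execute("newSession")); // response to newSession
        System.out.println(executor.serviceRunning);        // true
    }
}
```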

Then the “createSession()” method of the “ProtocolHandshake” class is called. It sends the new-session request to chromedriver, which finally launches the Chrome browser on your system.

public Result createSession(HttpHandler client, Command command) throws IOException {}

public Either<SessionNotCreatedException, Result> createSession(HttpHandler client, NewSessionPayload payload) throws IOException {}

private Either<SessionNotCreatedException, Result> createSession(HttpHandler client, Supplier<InputStream> contentSupplier, long size) {}
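On the wire, the handshake is an HTTP POST to the driver’s /session endpoint whose JSON body carries a W3C “capabilities” object with “alwaysMatch” and “firstMatch” members. This sketch uses java.net.http to build (not send) such a request; the hand-written body containing only browserName is an assumption, since Selenium’s real payload also carries the Chrome options.

```java
import java.net.URI;
import java.net.http.HttpRequest;

public class NewSessionRequestDemo {

    // Minimal W3C new-session body; no JSON escaping, so simple names only.
    static String newSessionBody(String browserName) {
        return "{\"capabilities\":{\"alwaysMatch\":{\"browserName\":\"" + browserName
            + "\"},\"firstMatch\":[{}]}}";
    }

    // Builds (but does not send) the POST the handshake boils down to.
    static HttpRequest newSessionRequest(int port, String body) {
        return HttpRequest.newBuilder(URI.create("http://localhost:" + port + "/session"))
            .header("Content-Type", "application/json; charset=utf-8")
            .POST(HttpRequest.BodyPublishers.ofString(body))
            .build();
    }

    public static void main(String[] args) {
        HttpRequest request = newSessionRequest(9515, newSessionBody("chrome"));
        System.out.println(request.method() + " " + request.uri());
        // POST http://localhost:9515/session
    }
}
```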

Finally, RemoteWebDriver’s “execute()” method receives the response, which contains the newly created session ID, the status and other information such as the capabilities the driver actually granted.

Several more classes take part in this initiation process, e.g. HttpHandler, HttpResponse, HttpMessage and NettyClient, but covering them would require a bigger article.
