open

A module providing the ‘open’ command as an alternative to the Selenium-based command. It uses the requests library for HTTP requests and BeautifulSoup for HTML parsing.

asrch.commands.no_driver._nd_open.cache_results(get_url, config_path, tab_name, search_string, etag: str)
asrch.commands.no_driver._nd_open.get_cache_file_path(config_path, tab_name)
asrch.commands.no_driver._nd_open.get_html(url: str, header: str = '', proxy: dict[str, str] | None = None, log: bool = True, *, parser: str | None = 'html.parser') → bs4.BeautifulSoup

Fetch a web page and return its parsed HTML.

Parameters:
  • url (str) – The URL of the web page.

  • proxy (dict[str, str] | None, optional) – Proxy settings to use for the request, defaults to None.

  • log (bool, optional) – Flag indicating whether to log the request, defaults to True.

Raises:
  • ValueError – If the URL is empty.

  • Exception – If there are issues with the request.

Returns:

The parsed content of the web page.

Return type:

bs4.BeautifulSoup
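
The fetch step described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the module's implementation: it uses the standard library's urllib as a stand-in for requests so that it runs without third-party packages, and it returns raw text rather than a BeautifulSoup object. The name `fetch_text`, its `timeout` parameter, and the User-Agent value are hypothetical.

```python
from typing import Optional
from urllib.request import ProxyHandler, Request, build_opener


def fetch_text(url: str, proxy: Optional[dict] = None, timeout: float = 10.0) -> str:
    """Hypothetical helper: fetch a URL and return its body as text."""
    if not url:
        # Mirrors the documented behaviour: an empty URL is rejected up front.
        raise ValueError("URL must not be empty")

    # Assumption: the proxy mapping looks like {"http": "http://host:port"}.
    handlers = [ProxyHandler(proxy)] if proxy else []
    opener = build_opener(*handlers)

    request = Request(url, headers={"User-Agent": "asrch-docs-example"})
    with opener.open(request, timeout=timeout) as response:
        # Fall back to UTF-8 when the response does not declare a charset.
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")
```

In the real function the returned text would then be handed to BeautifulSoup together with the chosen `parser`.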

asrch.commands.no_driver._nd_open.get_index(url: str, mode: str = '') → list[str]

Fetches and processes URLs from a web page.

Parameters:
  • url (str) – The URL of the web page to fetch and process.

  • mode (str, default is “”) – Optional mode to determine the format of the output list. If “url_list”, returns a list of URLs. Otherwise, returns a list with indexed URLs.

Returns:

A list of URLs, either in indexed format or as a plain list depending on the mode.

Return type:

list[str]

Raises:
  • ValueError – If the provided URL is empty.

  • requests.exceptions.RequestException – If there is an error with the HTTP request.

  • FeatureNotFound – If BeautifulSoup cannot find the requested parser.
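
The effect of `mode` on the output can be sketched as below. This is a minimal illustration that works on an HTML string with the standard library's `html.parser` rather than BeautifulSoup, so it runs without extra packages. `LinkCollector` and `index_links` are hypothetical names, and the `[n] url` layout is an assumption, since the indexed format is not documented here.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collects the href of every <a> tag, resolved against a base URL."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if href:
            # Relative links become absolute URLs.
            self.links.append(urljoin(self.base_url, href))


def index_links(html, base_url, mode=""):
    """Hypothetical helper: return links as a plain list or an indexed list."""
    collector = LinkCollector(base_url)
    collector.feed(html)
    collector.close()
    if mode == "url_list":
        return collector.links
    # Assumption: the "[n] url" layout is illustrative only.
    return [f"[{i}] {link}" for i, link in enumerate(collector.links)]
```

With `mode="url_list"` the caller gets URLs that can be passed straight to another request; any other value yields display-oriented strings.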

asrch.commands.no_driver._nd_open.get_js(url: str, header: str = '', proxy: dict[str, str] | None = None, log: bool = True, *, parser: str | None = 'html.parser') → list[str] | str

Get JavaScript sources from a web page.

asrch.commands.no_driver._nd_open.get_page(url: str, header: str = '', proxy: dict[str, str] | None = None, log: bool = True, *, parser: str | None = 'html.parser', browse: bool = False, images: bool = False, debug: bool = False) → str

Get the content of a web page.

Parameters:
  • url (str) – The URL of the web page.

  • proxy (dict[str, str] | None, optional) – Proxy settings to use for the request, defaults to None.

  • log (bool, optional) – Flag indicating whether to log the request, defaults to True.

Raises:
  • ValueError – If the URL is empty.

  • Exception – If there are issues with the request.

Returns:

The content of the web page.

Return type:

str

asrch.commands.no_driver._nd_open.handle_page_navigation_input(prompt: str, url_list: list[str], history: list[str], tabs: list[str], config_path: str, etag: str) → str
asrch.commands.no_driver._nd_open.highlight_elements(soup: bs4.BeautifulSoup, base_url: str) → bs4.BeautifulSoup

Highlight all elements in a BeautifulSoup object.

Parameters:
  • soup (BeautifulSoup) – The soup object to parse.

Returns:

The soup object.

Return type:

BeautifulSoup

asrch.commands.no_driver._nd_open.inspect(url: str, header: str = '', proxy: dict[str, str] | None = None, log: bool = True, *, parser: str | None = 'html.parser', browse: bool = False, mode: Literal['js', 'html'] = 'html') → None

Inspects a web page by making an HTTP request with optional custom headers and proxies.

Parameters:
  • url (str) – The URL of the web page to inspect.

  • header (str, default is ‘’) – Optional custom headers to include in the HTTP request.

  • proxy (dict[str, str] | None, default is None) – Optional dictionary of proxy settings to use for the HTTP request. If None, no proxies are used.

  • log (bool, default is True) – Whether to enable logging of the HTTP request and response.

  • parser (str | None, default is ‘html.parser’) – Optional HTML parser to use with BeautifulSoup. If None, no parser is specified.

  • browse (bool, default is False) – Whether to enable browsing mode, which might affect how the page content is processed.

  • mode (Literal[‘js’, ‘html’], default is ‘html’) – Specifies the mode of operation. Can be either ‘js’ for JavaScript processing or ‘html’ for HTML parsing.

Returns:

This function does not return a value.

Return type:

None

Raises:
  • ValueError – If the url is empty or invalid.

  • requests.exceptions.RequestException – If there is an issue with the HTTP request.

  • FeatureNotFound – If the specified parser is not installed or cannot be found.

  • Exception – For other unexpected errors that may occur during the inspection process.

Note:
  • Ensure that the header parameter is properly formatted as a valid header string.

  • The proxy dictionary should be formatted with valid proxy settings.

  • The mode parameter determines whether JavaScript or HTML parsing is used.

  • The browse parameter might affect how the content is handled, depending on its implementation.
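
The notes above on `header`, `proxy`, and `mode` can be illustrated with a small validation sketch. It is hypothetical throughout: the accepted header string format is not documented here, so the code assumes newline-separated `Name: value` pairs, and both `parse_header_string` and `check_inspect_arguments` are illustrative names, not part of asrch. Raising ValueError for an unknown mode is also an assumption.

```python
def parse_header_string(header):
    """Hypothetical helper: turn 'Name: value' lines into a headers mapping."""
    headers = {}
    for line in header.splitlines():
        if not line.strip():
            continue  # Ignore blank lines.
        name, sep, value = line.partition(":")
        if not sep or not name.strip():
            raise ValueError(f"Malformed header line: {line!r}")
        headers[name.strip()] = value.strip()
    return headers


def check_inspect_arguments(url, header="", proxy=None, mode="html"):
    """Hypothetical helper: validate inputs before any request is made."""
    if not url:
        raise ValueError("URL must not be empty")
    if mode not in ("js", "html"):
        # Assumption: an unknown mode is treated as an error.
        raise ValueError(f"Unsupported mode: {mode!r}")
    return {
        "url": url,
        "headers": parse_header_string(header),
        # Assumption: proxies map a scheme to a proxy URL,
        # e.g. {"https": "http://127.0.0.1:8080"}.
        "proxies": proxy or {},
        "mode": mode,
    }
```

Checking the arguments before sending anything keeps malformed input from surfacing later as a less specific request error.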

Seealso: