open
A module providing the ‘open’ command as an alternative to the Selenium-based command. It uses the requests library for HTTP requests and BeautifulSoup for HTML parsing.
- asrch.commands.no_driver._nd_open.cache_results(get_url, config_path, tab_name, search_string, etag: str)
- asrch.commands.no_driver._nd_open.get_html(url: str, header: str = '', proxy: dict[str, str] | None = None, log: bool = True, *, parser: str | None = 'html.parser') → bs4.BeautifulSoup
Fetch a web page and return its parsed content.
- Parameters:
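url (str) – The URL of the web page.
header (str, optional) – Custom headers to include in the HTTP request, defaults to ''.
proxy (dict[str, str] | None, optional) – Proxy settings to use for the request, defaults to None.
log (bool, optional) – Flag indicating whether to log the request, defaults to True.
parser (str | None, optional) – HTML parser to use with BeautifulSoup, defaults to 'html.parser'.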
- Raises:
ValueError – If the URL is empty.
Exception – If there are issues with the request.
- Returns:
The parsed content of the web page.
- Return type:
bs4.BeautifulSoup
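The argument handling described above can be sketched as follows. This is a minimal illustration, not the module's own code: `build_request_kwargs` is a hypothetical helper, the 10-second timeout is an assumed value, and the proxy mapping follows the scheme-to-proxy-URL shape that requests accepts for its `proxies` argument.

```python
from typing import Any, Optional


def build_request_kwargs(
    url: str, proxy: Optional[dict[str, str]] = None
) -> dict[str, Any]:
    """Hypothetical helper: validate the URL and assemble keyword
    arguments in the shape that requests.get() accepts."""
    if not url or not url.strip():
        # Mirrors the documented behaviour: an empty URL is a ValueError.
        raise ValueError("URL must not be empty")

    # The timeout is an assumption made for this sketch.
    kwargs: dict[str, Any] = {"url": url, "timeout": 10}
    if proxy is not None:
        # requests expects a mapping of URL scheme to proxy URL.
        kwargs["proxies"] = dict(proxy)
    return kwargs


example = build_request_kwargs(
    "https://example.com",
    proxy={"http": "http://127.0.0.1:8080", "https": "http://127.0.0.1:8080"},
)
```

Passing the proxy as a dictionary keyed by scheme lets HTTP and HTTPS traffic be routed separately, which is why the signature uses `dict[str, str] | None` rather than a single string.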
- asrch.commands.no_driver._nd_open.get_index(url: str, mode: str = '') → list[str]
Fetches and processes URLs from a web page.
- Parameters:
url (str) – The URL of the web page to fetch and process.
mode (str, default is “”) – Optional mode to determine the format of the output list. If “url_list”, returns a list of URLs. Otherwise, returns a list with indexed URLs.
- Returns:
A list of URLs, either in indexed format or as a plain list depending on the mode.
- Return type:
list[str]
- Raises:
ValueError – If the provided URL is empty.
requests.exceptions.RequestException – If there is an error with the HTTP request.
bs4.FeatureNotFound – If the requested parser is not available to BeautifulSoup.
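The two output modes can be illustrated with a standard-library sketch that works on an HTML string instead of fetching a page. `index_links` and `_LinkCollector` are hypothetical names, and the `[n] url` index format is an assumption; the real `get_index` may format its entries differently.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class _LinkCollector(HTMLParser):
    """Collects href values from <a> tags, resolved against a base URL."""

    def __init__(self, base_url: str) -> None:
        super().__init__()
        self.base_url = base_url
        self.links: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if href:
            # Resolve relative links such as "/docs" against the page URL.
            self.links.append(urljoin(self.base_url, href))


def index_links(html: str, base_url: str, mode: str = "") -> list[str]:
    """Hypothetical stand-in for the link-listing step of get_index."""
    if not base_url:
        raise ValueError("URL must not be empty")
    collector = _LinkCollector(base_url)
    collector.feed(html)
    if mode == "url_list":
        return collector.links
    # Assumed index format for illustration only.
    return [f"[{i}] {link}" for i, link in enumerate(collector.links)]


sample = '<a href="/docs">Docs</a><a href="https://example.org/x">X</a>'
plain = index_links(sample, "https://example.com", mode="url_list")
indexed = index_links(sample, "https://example.com")
```

In this sketch `plain` holds the bare URLs, while `indexed` prefixes each one with its position so that a later command could refer to a link by number.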
- asrch.commands.no_driver._nd_open.get_js(url: str, header: str = '', proxy: dict[str, str] | None = None, log: bool = True, *, parser: str | None = 'html.parser') → list[str] | str
Get JavaScript sources from a web page.
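One plausible reading of "JavaScript sources" is the set of `src` URLs on a page's `<script>` tags. The sketch below shows that reading with the standard library only; `script_sources` and `_ScriptCollector` are hypothetical names, inline scripts are deliberately ignored, and the `str` branch of the documented return type is not modelled because the source does not describe it.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class _ScriptCollector(HTMLParser):
    """Collects the src attribute of every <script> tag."""

    def __init__(self, base_url: str) -> None:
        super().__init__()
        self.base_url = base_url
        self.sources: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            src = dict(attrs).get("src")
            if src:
                # Inline scripts have no src and are skipped here.
                self.sources.append(urljoin(self.base_url, src))


def script_sources(html: str, base_url: str) -> list[str]:
    """Hypothetical stand-in: list external script URLs found in html."""
    collector = _ScriptCollector(base_url)
    collector.feed(html)
    return collector.sources


page = """
<html><head>
<script src="/static/app.js"></script>
<script>console.log("inline");</script>
<script src="https://cdn.example.net/lib.js"></script>
</head></html>
"""
found = script_sources(page, "https://example.com/index.html")
```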
- asrch.commands.no_driver._nd_open.get_page(url: str, header: str = '', proxy: dict[str, str] | None = None, log: bool = True, *, parser: str | None = 'html.parser', browse: bool = False, images: bool = False, debug: bool = False) → str
Get the content of a web page.
- Parameters:
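url (str) – The URL of the web page.
header (str, optional) – Custom headers to include in the HTTP request, defaults to ''.
proxy (dict[str, str] | None, optional) – Proxy settings to use for the request, defaults to None.
log (bool, optional) – Flag indicating whether to log the request, defaults to True.
parser (str | None, optional) – HTML parser to use with BeautifulSoup, defaults to 'html.parser'.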
- Raises:
ValueError – If the URL is empty.
Exception – If there are issues with the request.
- Returns:
The content of the web page.
- Return type:
str
- asrch.commands.no_driver._nd_open.highlight_elements(soup: bs4.BeautifulSoup, base_url: str) → bs4.BeautifulSoup
Highlight all elements in a BeautifulSoup object.
- Parameters:
soup (BeautifulSoup) – The soup object to parse.
- Returns:
The soup object with its elements highlighted.
- Return type:
BeautifulSoup
- asrch.commands.no_driver._nd_open.inspect(url: str, header: str = '', proxy: dict[str, str] | None = None, log: bool = True, *, parser: str | None = 'html.parser', browse: bool = False, mode: Literal['js', 'html'] = 'html') → None
Inspects a web page by making an HTTP request with optional custom headers and proxies.
- Parameters:
url (str) – The URL of the web page to inspect.
header (str, default is ‘’) – Optional custom headers to include in the HTTP request.
proxy (dict[str, str] | None, default is None) – Optional dictionary of proxy settings to use for the HTTP request. If None, no proxies are used.
log (bool, default is True) – Whether to log the HTTP request and response.
parser (str | None, default is ‘html.parser’) – HTML parser to use with BeautifulSoup. If None, no parser is specified.
browse (bool, default is False) – Whether to enable browsing mode, which might affect how the page content is processed.
mode (Literal[‘js’, ‘html’], default is ‘html’) – Specifies the mode of operation. Can be either ‘js’ for JavaScript processing or ‘html’ for HTML parsing.
- Returns:
Nothing. The function is called for its side effects.
- Return type:
None
- Raises:
ValueError – If the url is empty or invalid.
requests.exceptions.RequestException – If there is an issue with the HTTP request.
bs4.FeatureNotFound – If the specified parser is not available to BeautifulSoup.
Exception – For other unexpected errors that may occur during the inspection process.
- Note:
Ensure that the header parameter is properly formatted as a valid header string.
The proxy dictionary should be formatted with valid proxy settings.
The mode parameter determines whether JavaScript or HTML parsing is used.
The browse parameter might affect how the content is handled, depending on its implementation.
- Seealso:
BeautifulSoup documentation for parsing options: https://www.crummy.com/software/BeautifulSoup/bs4/doc/
requests documentation for HTTP request details: https://docs.python-requests.org/en/latest/
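As a rough guide to preparing a call, the sketch below assembles and checks arguments matching the documented `inspect` signature. `make_inspect_kwargs` is a hypothetical helper, and rejecting an unknown `mode` with ValueError is an assumption of this sketch; the source documents ValueError only for an empty or invalid URL.

```python
from typing import Any, Literal, Optional, get_args

# The two modes listed in the documented signature.
Mode = Literal["js", "html"]


def make_inspect_kwargs(
    url: str,
    header: str = "",
    proxy: Optional[dict[str, str]] = None,
    log: bool = True,
    *,
    parser: Optional[str] = "html.parser",
    browse: bool = False,
    mode: str = "html",
) -> dict[str, Any]:
    """Hypothetical helper: check arguments, then return them as a dict
    whose keys match the parameter names of inspect()."""
    if not url:
        raise ValueError("url must not be empty")
    if mode not in get_args(Mode):
        # Assumed behaviour; the real function may handle this differently.
        raise ValueError(f"mode must be one of {get_args(Mode)}, got {mode!r}")
    return {
        "url": url,
        "header": header,
        "proxy": proxy,
        "log": log,
        "parser": parser,
        "browse": browse,
        "mode": mode,
    }


kwargs = make_inspect_kwargs(
    "https://example.com",
    proxy={"https": "http://127.0.0.1:8080"},
    mode="js",
)
# With asrch installed, the call would then be: inspect(**kwargs)
```

Because `parser`, `browse`, and `mode` are keyword-only in the documented signature, unpacking a dictionary keeps the call explicit about which options are set.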