asrch.commands._open package

Usage examples

Open Command Usage Examples

Open text from URL:

$ asrch open text --url https://scrapethissite.com/simple/pages
Output: [body text of the specified URL]

Browser mode (new in 7.4.0):

$ asrch open text --browse https://scrapethissite.com/simple/pages
Output: [example]

>>> Body text
    [body text example]
    [page text example last line]
>>> (1) https://example.com/page1, (2) https://example.com/page2, (3) https://example.com/page3,
>>> [q]uit, [h]istory ([<] back in history 1) or Enter page number: 3

[q]: quit the program
[h]: navigate through your session history, e.g., h1 (first entry in history), h2 (second entry), h3 (third entry)
[<]: go back to the last opened page in history

Enter page number: go to one of the numbered pages listed in the output above

>>> [q]uit, [h]istory ([<] back in history 1) or Enter page number: 3
>>> Navigating to https://example.com/page3
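
The prompt commands shown above can be modeled as a small input parser. The following is a minimal sketch, not asrch's own implementation: `parse_browse_input` and the action names it returns are hypothetical, and only the command syntax (`q`, `hN`, `<`, a page number) is taken from the usage notes.

```python
import re
from typing import Optional, Tuple


def parse_browse_input(raw: str) -> Tuple[str, Optional[int]]:
    """Map raw prompt input to a hypothetical (action, index) pair."""
    text = raw.strip().lower()
    if text == "q":
        return ("quit", None)
    if text == "<":
        return ("back", None)
    # h1, h2, ... select an entry in the session history (1-based).
    match = re.fullmatch(r"h([0-9]+)", text)
    if match:
        return ("history", int(match.group(1)))
    # A bare number selects one of the numbered links from the page output.
    if re.fullmatch(r"[0-9]+", text):
        return ("page", int(text))
    # Anything else (including a bare "h") is treated as invalid in this sketch.
    return ("invalid", None)
```

With this sketch, entering `3` at the prompt maps to `("page", 3)` and `h2` maps to `("history", 2)`; how asrch itself reacts to unrecognized input is not documented here.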

Subpackages

Module contents

open command

asrch.commands._open.get_html(url: str, proxy: str | None, header: bool = False, log=True) → str

Get the HTML source code of a web page.

Parameters:

url (str) – The URL of the web page.
proxy (str | None) – Proxy to be used for the request, or None. The signature defines no default, so this argument is required.
header (bool) – Flag indicating whether to include headers in the request, defaults to False.
log (bool) – Flag indicating whether to log actions, defaults to True.

Returns:

The HTML source code of the web page.

Return type:

str
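
To illustrate how the `url`, `proxy`, and `header` parameters could map onto an HTTP request, here is a minimal sketch built on the standard library's `urllib.request`. It is not how asrch performs the request: `build_request`, `DEFAULT_HEADERS`, and the User-Agent value are assumptions made for illustration, and nothing is sent over the network.

```python
import urllib.request
from typing import Optional, Tuple

# Hypothetical header set; the headers asrch actually sends are not documented here.
DEFAULT_HEADERS = {"User-Agent": "Mozilla/5.0 (compatible; example-client)"}


def build_request(
    url: str, proxy: Optional[str] = None, header: bool = False
) -> Tuple[urllib.request.Request, urllib.request.OpenerDirector]:
    """Prepare a request and an opener without performing any network I/O."""
    # Only attach the extra headers when the flag is set.
    headers = dict(DEFAULT_HEADERS) if header else {}
    request = urllib.request.Request(url, headers=headers)

    handlers = []
    if proxy is not None:
        # Route both HTTP and HTTPS traffic through the given proxy.
        handlers.append(urllib.request.ProxyHandler({"http": proxy, "https": proxy}))
    # Without an explicit proxy, urllib keeps its default handlers.
    opener = urllib.request.build_opener(*handlers)
    return request, opener
```

Calling `opener.open(request)` on the returned pair would then perform the actual fetch.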

asrch.commands._open.get_image(url: str | None, proxy: str | None, header: bool, download: bool | None = False, workspace: str | None = None) → collections.abc.Generator[str, None, None]

Get the image from a URL.

Parameters:

url – The URL of the image. If None, returns an empty generator.
proxy – Proxy to be used for the request, or None. The signature defines no default, so this argument is required.
header – Flag indicating whether to include headers in the request.
download – Flag indicating whether to download the image, defaults to False.

Raises:

ValueError – If the URL is empty.
NoSuchElementException – If the element can’t be found.
ElementNotVisibleException – If the element isn’t visible in the DOM.
StaleElementReferenceException – If the element can no longer be accessed.

Returns:

A generator yielding the image content as strings.
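
Because the return value is a generator, the `None` and empty-URL cases described above behave slightly differently from a regular function. The sketch below is a stand-in, not asrch's code: `iter_image_sources` is a hypothetical name, and it yields the URL itself as a placeholder for the image content. It shows the general Python behavior that an exception raised inside a generator function only surfaces once the generator is iterated.

```python
from typing import Generator, Optional


def iter_image_sources(url: Optional[str]) -> Generator[str, None, None]:
    """Hypothetical stand-in mirroring the documented None / empty-URL handling."""
    if url is None:
        # A bare return in a generator function produces an empty generator.
        return
    if not url.strip():
        raise ValueError("URL must not be empty")
    # Placeholder: a real implementation would yield image content here.
    yield url
```

Creating the generator with an empty string does not raise immediately; the `ValueError` appears on the first `next()` call or when the generator is consumed in a loop.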

asrch.commands._open.get_js(url: str, proxy: str | None, header: bool = False, log=True) → collections.abc.Generator[str, None, None]

Get the JavaScript content from a web page.

Parameters:

url (str) – The URL of the web page.
proxy (str | None) – Proxy to be used for the request, or None. The signature defines no default, so this argument is required.
header (bool) – Flag indicating whether to include headers in the request, defaults to False.
log (bool) – Flag indicating whether to log actions, defaults to True.

Returns:

A generator yielding the JavaScript content as strings.

Return type:

Generator[str, None, None]
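
As an illustration of a generator that yields JavaScript as strings, the following sketch extracts inline `<script>` bodies from an HTML string using the standard library's `html.parser`. This is an assumption about one way to do it, not a description of how `get_js` works internally; `_ScriptCollector` and `iter_inline_scripts` are hypothetical names, and whether asrch also resolves external `src` scripts is not stated here.

```python
from html.parser import HTMLParser
from typing import Generator, List


class _ScriptCollector(HTMLParser):
    """Collect the text content of every <script> element."""

    def __init__(self) -> None:
        super().__init__()
        self._in_script = False
        self.scripts: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            self._in_script = True
            self.scripts.append("")

    def handle_endtag(self, tag):
        if tag == "script":
            self._in_script = False

    def handle_data(self, data):
        # Script bodies may arrive in several chunks, so accumulate them.
        if self._in_script:
            self.scripts[-1] += data


def iter_inline_scripts(html: str) -> Generator[str, None, None]:
    """Yield each non-empty inline script body found in the HTML."""
    collector = _ScriptCollector()
    collector.feed(html)
    collector.close()
    for script in collector.scripts:
        # Skip empty bodies, e.g. <script src="..."></script>.
        if script.strip():
            yield script.strip()
```

A caller can iterate the result lazily with a `for` loop or materialize it with `list(...)`.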

asrch.commands._open.get_page(url: str | int, header: bool = False, proxy: str | None = None, log=True, parser='html.parser', browse: bool = False) → str | int

Get the content of a web page.

Parameters:

url (str) – The URL of the web page.
header (bool, optional) – Flag indicating whether to include headers in the request, defaults to False.
proxy (Optional[str], optional) – Proxy to be used for the request, defaults to None.
log (bool, optional) – Flag indicating whether to log actions, defaults to True.
parser (str, optional) – The parser to use for BeautifulSoup, defaults to 'html.parser'.
browse (bool, optional) – Flag indicating whether to enter browse mode, defaults to False.

Raises:

ValueError – If the URL is empty.
Exception – If there are issues with the request.

Returns:

The content of the web page if successful, otherwise an error code.

Return type:

Union[str, int]
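
Since the return type is `Union[str, int]`, callers need to tell page content apart from an error code. A minimal sketch of that check follows; `describe_result` is a hypothetical helper, and the specific error code values are not documented here, so the example only distinguishes by type.

```python
from typing import Union


def describe_result(result: Union[str, int]) -> str:
    """Summarize a str-or-int result of the kind documented above."""
    if isinstance(result, int):
        # An integer signals failure; its meaning is not specified in this reference.
        return f"failed with code {result}"
    return f"fetched {len(result)} characters"
```

For example, `describe_result("abc")` returns `"fetched 3 characters"`, while `describe_result(1)` returns `"failed with code 1"`.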

asrch.commands._open.get_screenshot(url: str, proxy: str | None, header: bool = False, log=True) → None

Take a screenshot of a web page.

Parameters:

url (str) – The URL of the web page.
proxy (str | None) – Proxy to be used for the request, or None. The signature defines no default, so this argument is required.
header (bool) – Flag indicating whether to include headers in the request, defaults to False.
log (bool) – Flag indicating whether to log actions, defaults to True.

asrch.commands._open.highlight_elements(soup)

Highlight all elements in a BeautifulSoup object.

asrch.commands._open.inspect(url: str, proxy: str | None, header: bool = False, log=True, parser='html.parser') → str

Get the HTML source code of a web page.

Parameters:

url (str) – The URL of the web page.
proxy (str | None) – Proxy to be used for the request, or None. The signature defines no default, so this argument is required.
header (bool) – Flag indicating whether to include headers in the request, defaults to False.
log (bool) – Flag indicating whether to log actions, defaults to True.

Returns:

The HTML source code of the web page.

Return type:

str
