asrch.commands._open package

Usage examples

Open Command Usage Examples

  • Open text from URL:

    $ asrch open text --url https://scrapethissite.com/simple/pages
    Output: [body text of the specified URL]
    
  • Browser mode (new in 7.4.0):

    $ asrch open text --browse https://scrapethissite.com/simple/pages
    Output: [example]
    
    >>> Body text
        [body text example]
        [page text example last line]
    >>> (1) https://example.com/page1, (2) https://example.com/page2, (3) https://example.com/page3,
    >>> [q]uit, [h]istory ([<] back in history 1) or Enter page number: 3
    
    [q]: quit program
    [h]: navigate through your session history, e.g., h1 (first element in history), h2 (second element), h3 (third)
    [<]: go back to the last opened page in history
    
    Enter page number: go to one of the numbered pages listed in the output above
    
    >>> [q]uit, [h]istory ([<] back in history 1) or Enter page number: 3
    >>> Navigating to https://example.com/page3
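
The prompt commands above can be modelled as a small parser. The following is a minimal sketch, not asrch's actual implementation: `BrowseAction` and `parse_browse_input` are hypothetical names, and the handling of invalid input is an assumption.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class BrowseAction:
    """Parsed result of one line typed at the browse prompt (hypothetical type)."""
    kind: str                    # "quit", "history", "back", "page", or "invalid"
    index: Optional[int] = None  # 1-based history or page number, when relevant


def _positive_int(text: str) -> Optional[int]:
    """Return text as an int if it is a plain decimal number >= 1, else None."""
    if text.isascii() and text.isdigit() and int(text) >= 1:
        return int(text)
    return None


def parse_browse_input(raw: str) -> BrowseAction:
    """Map prompt input such as "q", "h2", "<", or "3" to an action."""
    text = raw.strip().lower()
    if text == "q":
        return BrowseAction("quit")
    if text == "<":
        return BrowseAction("back")
    if text.startswith("h"):
        # "h1" is the first history entry, "h2" the second, and so on.
        number = _positive_int(text[1:])
        if number is not None:
            return BrowseAction("history", number)
        return BrowseAction("invalid")
    number = _positive_int(text)
    if number is not None:
        return BrowseAction("page", number)
    return BrowseAction("invalid")
```

With this sketch, entering `3` yields a "page" action with index 3, and `h1` yields a "history" action with index 1.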
    

Subpackages

Module contents

open command

asrch.commands._open.get_html(url: str, proxy: str | None, header: bool = False, log=True) -> str

Get the HTML source code of a web page.

Parameters:
  • url (str) – The URL of the web page.

  • proxy (str, optional) – Proxy to be used for the request, defaults to None.

  • header (bool) – Flag indicating whether to include headers in the request, defaults to False.

  • log (bool) – Flag indicating whether to log actions, defaults to True.

Returns:

The HTML source code of the web page.

Return type:

str
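
How a fetch with the `url`, `proxy`, and `header` parameters might look can be sketched with the standard library alone. This is not the code behind `get_html`: `fetch_html`, `DEFAULT_HEADERS`, and the `timeout` parameter are hypothetical, and treating `header` as "send a browser-like User-Agent" is an assumption about what the flag means.

```python
import urllib.request
from typing import Optional

# Hypothetical header set; the headers asrch actually sends are not documented here.
DEFAULT_HEADERS = {"User-Agent": "Mozilla/5.0 (compatible; example-fetcher)"}


def fetch_html(url: str, proxy: Optional[str] = None, header: bool = False,
               timeout: float = 10.0) -> str:
    """Return the body of `url` as text, optionally through an HTTP(S) proxy."""
    # An empty mapping disables proxies, including any set in the environment.
    proxies = {"http": proxy, "https": proxy} if proxy else {}
    opener = urllib.request.build_opener(urllib.request.ProxyHandler(proxies))

    request = urllib.request.Request(url, headers=DEFAULT_HEADERS if header else {})
    with opener.open(request, timeout=timeout) as response:
        # Fall back to UTF-8 when the server does not declare a charset.
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")
```

A call such as `fetch_html("https://scrapethissite.com/simple/pages", header=True)` would return the page source as a string, provided the network is reachable.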

asrch.commands._open.get_image(url: str | None, proxy: str | None, header: bool, download: bool | None = False, workspace: str | None = None) -> collections.abc.Generator[str, None, None]

Get the image from a URL.

Parameters:
  • url – The URL of the image. If None, returns an empty generator.

  • proxy – Proxy to be used for the request, defaults to None.

  • header – Flag indicating whether to include headers in the request.

  • download – Flag indicating whether to download the image, defaults to False.

Raises:
  • ValueError – If the URL is empty.

  • NoSuchElementException – If the element can’t be found.

  • ElementNotVisibleException – If the element isn’t visible in the DOM.

  • StaleElementReferenceException – If the element can no longer be accessed.

Returns:

A generator yielding the image content as strings.

asrch.commands._open.get_js(url: str, proxy: str | None, header: bool = False, log=True) -> collections.abc.Generator[str, None, None]

Get the JavaScript content from a web page.

Parameters:
  • url (str) – The URL of the web page.

  • proxy (str, optional) – Proxy to be used for the request, defaults to None.

  • header (bool) – Flag indicating whether to include headers in the request, defaults to False.

  • log (bool) – Flag indicating whether to log actions, defaults to True.

Returns:

A generator yielding the JavaScript content as strings.

Return type:

Generator[str, None, None]
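
A generator with this return type can be sketched using only the standard library. The example below is illustrative and not how `get_js` is implemented: `iter_inline_scripts` and `_ScriptCollector` are hypothetical names, and it assumes "JavaScript content" means the bodies of inline `<script>` elements in already-fetched HTML.

```python
from html.parser import HTMLParser
from typing import Generator, List


class _ScriptCollector(HTMLParser):
    """Collect the text inside each inline <script> element."""

    def __init__(self) -> None:
        super().__init__()
        self.scripts: List[str] = []
        self._in_script = False
        self._chunks: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            self._in_script = True
            self._chunks = []

    def handle_data(self, data):
        # Script bodies may arrive in several pieces, so buffer them.
        if self._in_script:
            self._chunks.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_script:
            self._in_script = False
            body = "".join(self._chunks).strip()
            if body:  # skip empty elements such as <script src="..."></script>
                self.scripts.append(body)


def iter_inline_scripts(html: str) -> Generator[str, None, None]:
    """Yield the body of every non-empty inline script, in document order."""
    collector = _ScriptCollector()
    collector.feed(html)
    collector.close()
    yield from collector.scripts
```

Because the result is a generator, callers iterate over it or wrap it in `list(...)` to collect all scripts at once.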

asrch.commands._open.get_page(url: str | int, header: bool = False, proxy: str | None = None, log=True, parser='html.parser', browse: bool = False) -> str | int

Get the content of a web page.

Parameters:
  • url (str) – The URL of the web page.

  • header (bool, optional) – Flag indicating whether to include headers in the request, defaults to False.

  • proxy (Optional[str], optional) – Proxy to be used for the request, defaults to None.

  • log (bool, optional) – Flag indicating whether to log actions, defaults to True.

  • parser (str, optional) – The parser to use for BeautifulSoup, defaults to 'html.parser'.

  • browse (bool, optional) – Flag indicating whether to enter browse mode, defaults to False.

Raises:
  • ValueError – If the URL is empty.

  • Exception – If there are issues with the request.

Returns:

The content of the web page if successful, otherwise an error code.

Return type:

Union[str, int]
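
Because the return value is either page content or an integer error code, callers need to check the type before using it. The sketch below shows one way to do that. `page_text_or_raise` is a hypothetical helper, and the meaning of individual error codes is not documented here, so the example only reports the code.

```python
from typing import Callable, Union


def page_text_or_raise(fetch: Callable[[str], Union[str, int]], url: str) -> str:
    """Call `fetch(url)` and return its text, raising if it returns an error code."""
    result = fetch(url)
    if isinstance(result, int):
        # An int signals failure; which codes are possible is an open assumption.
        raise RuntimeError(f"fetching {url!r} failed with error code {result}")
    return result


# Possible usage with asrch installed (not executed here):
#   from asrch.commands._open import get_page
#   text = page_text_or_raise(get_page, "https://scrapethissite.com/simple/pages")
```

Passing the fetch function as an argument keeps the helper independent of asrch, so it can be exercised with a stub.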

asrch.commands._open.get_screenshot(url: str, proxy: str | None, header: bool = False, log=True) -> None

Take a screenshot of a web page.

Parameters:
  • url (str) – The URL of the web page.

  • proxy (str, optional) – Proxy to be used for the request, defaults to None.

  • header (bool) – Flag indicating whether to include headers in the request, defaults to False.

  • log (bool) – Flag indicating whether to log actions, defaults to True.

asrch.commands._open.highlight_elements(soup)

Highlight all elements in a BeautifulSoup object.

asrch.commands._open.inspect(url: str, proxy: str | None, header: bool = False, log=True, parser='html.parser') -> str

Get the HTML source code of a web page.

Parameters:
  • url (str) – The URL of the web page.

  • proxy (str, optional) – Proxy to be used for the request, defaults to None.

  • header (bool) – Flag indicating whether to include headers in the request, defaults to False.

  • log (bool) – Flag indicating whether to log actions, defaults to True.

Returns:

The HTML source code of the web page.

Return type:

str