Scraping Web Pages in Workflows
Lesson Overview
In this video, you'll learn how to scrape information from existing websites and incorporate it into your workflows using the Web Page Scrape step in AirOps. The lesson covers key configuration options and demonstrates how to combine the scraped data with language models for further processing.
- 0:00: Introduction to fetching information from websites for workflows
- 0:12: Adding and configuring the Web Page Scrape step
- 1:52: Examining the scraped markdown output
- 2:02: Combining scraped data with language models for summarization
Key Concepts
Web Page Scrape Step
The Web Page Scrape step in AirOps fetches content from an existing website so you can use it in a workflow. Add the step to your workflow, then configure its options: the target URL, the maximum length, the output type (text, markdown, or HTML), and whether to extract only the main content.
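To make these options concrete, here is a minimal Python sketch that models them as a small validated settings object. It is not the AirOps step schema: the class name `ScrapeConfig`, the field names, and the default values are all hypothetical and chosen only to mirror the options described above.

```python
from dataclasses import dataclass

# Hypothetical names throughout: these fields mirror the options described
# in the lesson, not the actual AirOps step schema.
OUTPUT_TYPES = ("text", "markdown", "html")


@dataclass(frozen=True)
class ScrapeConfig:
    url: str                        # target page to scrape
    max_length: int = 10_000        # cap on returned characters (assumed default)
    output_type: str = "markdown"   # one of OUTPUT_TYPES
    main_content_only: bool = True  # extract only the main content

    def __post_init__(self):
        # Reject obviously invalid settings before any request is made.
        if not self.url.startswith(("http://", "https://")):
            raise ValueError(f"URL must start with http:// or https://: {self.url!r}")
        if self.max_length <= 0:
            raise ValueError("max_length must be a positive number of characters")
        if self.output_type not in OUTPUT_TYPES:
            raise ValueError(f"output_type must be one of {OUTPUT_TYPES}")


config = ScrapeConfig(url="https://example.com/blog/post")
```

Validating the settings up front means a typo in the output type or a missing URL scheme fails immediately rather than partway through a workflow run.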
Output Types and Configuration
When configuring the Web Page Scrape step, you can choose from three output types:
- Text: Returns only the text content of the page
- Markdown: Returns the text with lightweight structure, such as headings, links, and images
- HTML: Returns the full HTML content of the page
You can also set a maximum length to cap the number of characters returned, which helps control input token costs when the output is passed to a large language model.
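The effect of output type and maximum length on prompt size can be sketched in plain Python. This is an illustration using only the standard library, not how AirOps implements the step: `html_to_text`, `truncate`, and `estimate_tokens` are hypothetical helpers, and the four-characters-per-token figure is a rough rule of thumb for English text rather than an exact tokenizer count.

```python
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects visible text, skipping <script> and <style> bodies."""

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0 and data.strip():
            self.parts.append(data.strip())


def html_to_text(html):
    """Rough stand-in for a 'text' output type: drop tags, keep the words."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.parts)


def truncate(content, max_length):
    """Keep at most max_length characters, as a maximum-length setting would."""
    return content[:max_length]


def estimate_tokens(content):
    """Very rough heuristic (assumption): about 4 characters per token."""
    return len(content) // 4


page = "<html><body><h1>Pricing</h1><p>Plans start at $10.</p></body></html>"
text = html_to_text(page)

print(text)                                         # Pricing Plans start at $10.
print(truncate(text, 7))                            # Pricing
print(estimate_tokens(page) > estimate_tokens(text))  # True: HTML costs more
```

The same page is noticeably larger as HTML than as text, which is why choosing a lighter output type and a sensible maximum length both reduce what the language model has to read.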
Combining Scraped Data with Language Models
Once you have scraped the desired web page content, you can feed the output into a language model prompt for further processing. For example, you can ask the model to summarize the scraped content in a specified number of sentences, as demonstrated in the video.
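As a sketch of that hand-off, the function below assembles a summarization prompt from scraped content. The function name `build_summary_prompt`, its parameters, and the prompt wording are assumptions made for illustration. In AirOps the scraped output is referenced from the prompt step inside the workflow builder, so no code like this is needed there.

```python
def build_summary_prompt(scraped_content, sentences=3, max_length=8000):
    """Build a summarization prompt from scraped page content.

    Hypothetical helper: trims the content to max_length characters so the
    prompt stays within a predictable size, then wraps it in instructions.
    """
    if sentences < 1:
        raise ValueError("sentences must be at least 1")

    body = scraped_content[:max_length].strip()
    unit = "sentence" if sentences == 1 else "sentences"
    return (
        f"Summarize the following web page content in {sentences} {unit}.\n\n"
        f"---\n{body}\n---"
    )


scraped = "# Pricing\n\nPlans start at $10 per month and include unlimited workflows."
prompt = build_summary_prompt(scraped, sentences=2)
print(prompt)
```

Delimiting the scraped content with separator lines keeps the instruction visibly distinct from the page text, which matters when the page itself contains instruction-like sentences.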
Key Takeaways
- The Web Page Scrape step in AirOps simplifies the process of fetching information from websites and incorporating it into workflows.
- When configuring the Web Page Scrape step, consider the desired output type (text, markdown, or HTML) and set a maximum length to manage input token costs.
- Scraped web page content can be passed into a language model prompt for tasks such as summarization, so a single workflow can both fetch a page and process it.