Web Scraping Features

42rows provides specialized scraping tools for different types of web content. All scraping operations are processed server-side, with results delivered to your email upon completion.

Available Scraping Tools

SERP Scraping

How It Works

  1. Run a search on Google
  2. Copy the search results URL from your browser and paste it into 42rows
  3. Set the desired number of results
  4. Receive ranking data and metadata for each result

Available Options

  • Add to Training Data: Save search results as context embeddings
  • Get Result Details: Extract metadata and rich snippets
  • Create 42rows Table: Automatically generate table with results
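
For example, a typical SERP scraping request might look like the following (the search query, result count, and options are illustrative):

Search URL: https://www.google.com/search?q=project+management+software
Number of results: 20
Options: Get Result Details, Create 42rows Table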

Article Scraping

Capabilities

  • Extract article content from web pages
  • Preserve article structure
  • Capture metadata (author, date, etc.)
  • Generate structured content tables
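
Assuming the metadata listed above, a generated article table might include columns such as:

URL | Title | Author | Publication Date | Article Content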

E-commerce Scraping

Features

  • Product information extraction
  • Price data collection
  • Product specifications capture
  • Image URL extraction
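
For illustration, a product results table built from these fields might contain columns such as:

Product URL | Name | Price | Specifications | Image URLs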

Job Listing Scraping

Data Collection

  • Job title and description
  • Company information
  • Location and salary data
  • Requirements and qualifications
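
Based on the fields above, a job listings table might include columns such as:

Job Title | Company | Location | Salary | Requirements | Description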

News Scraping

Content Extraction

  • News article content
  • Publication information
  • Category and topic data
  • Related content links
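
As an illustration, a news results table might include columns such as:

Headline | Article Content | Publication | Category | Related Links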

Common Configuration Options

Available Settings

  • Result Limit:
    • Control number of items to scrape
    • Set maximum results
    • Manage processing load
  • Output Format:
    • Create 42rows table
    • Generate downloadable files
    • Structure data for analysis
  • Processing Options:
    • Add to training data
    • Extract additional metadata
    • Create embeddings for context
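
Putting these settings together, an illustrative configuration for a single scraping job might be:

Result Limit: 50
Output Format: Create 42rows table
Processing Options: Add to training data, Extract additional metadata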

Using Scraped Data

Data Integration

  • Table Operations:
    • Process scraped data with AI
    • Generate content from results
    • Analyze collected information
  • Context Creation:
    • Use scraped content as context
    • Create embeddings for AI operations
    • Build knowledge bases
  • Data Export:
    • Download structured results
    • Share collected data
    • Create reports
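
For example, once scraped content sits in a table column, it can be referenced from an AI prompt column using the {letter} syntax described under the Crawler model below (the column letters and prompt wording are illustrative):

Column A: Scraped article content
Column B (AI Prompt): "Summarize the article in {A} in three bullet points"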

Best Practices

  • Planning:
    • Define clear scraping objectives
    • Choose appropriate scraping type
    • Set reasonable result limits
  • Processing:
    • Monitor email notifications
    • Review results for quality
    • Plan post-processing steps
  • Integration:
    • Structure data appropriately
    • Plan AI processing workflows
    • Consider context creation

Crawler Model

The Crawler model allows you to extract content from multiple URLs automatically. Available in both chat and table interfaces, it processes URLs in your data and returns their content.

Setting Up URL Crawling

Table Configuration

  1. Create a column containing URLs
  2. Add a new column for crawled content
  3. Select Crawler model in the prompt settings
  4. Reference URL column using {letter} syntax

Example Setup

Column A: URLs to crawl
Column B (Crawler Model) prompt: "Extract content from URL: {A}"

Chat Usage

  • Select Crawler model from model dropdown
  • Provide URLs in your prompt
  • Receive extracted content in response
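
A chat request might look like the following (the URLs are placeholders):

Model: Crawler
Prompt: "Extract the content from https://example.com/blog/post-1 and https://example.com/blog/post-2"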

Content Processing

  • URL Processing:
    • Each URL is processed individually
    • Content is extracted and cleaned
    • Results maintain page structure
  • Batch Processing:
    • Process multiple URLs simultaneously
    • Server-side execution
    • Email notification upon completion

Common Use Cases

  • Process list of blog posts
  • Extract content from multiple articles
  • Gather information from product pages
  • Collect data from documentation pages

Implementation Example

// Column Setup
Column A: Product URLs
Column B (Crawler): "Extract content from {A}"
Column C (AI Prompt): "Analyze extracted content from {B} and create product summary"

Best Practices

  • URL Management:
    • Verify URL formatting
    • Check URL accessibility
    • Consider rate limiting
  • Processing:
    • Start with small URL batches
    • Monitor processing status
    • Review extracted content quality