Web Scraping Features

42rows provides specialized scraping tools for different types of web content. All scraping operations are processed server-side, with results delivered to your email upon completion.

Available Scraping Tools

SERP Scraping

How It Works

  1. Run a search on Google
  2. Copy the search results URL from your browser and paste it into 42rows
  3. Set the desired number of results
  4. Receive ranking data and metadata for each result

Available Options

  • Add to Training Data: Save search results as context embeddings
  • Get Result Details: Extract metadata and rich snippets
  • Create 42rows Table: Automatically generate table with results
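
For example, a typical SERP scraping request might look like the following (the search query, result count, and options are illustrative):

Search URL: https://www.google.com/search?q=project+management+software
Number of results: 20
Options: Get Result Details, Create 42rows Table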

Article Scraping

Capabilities

  • Extract article content from web pages
  • Preserve article structure
  • Capture metadata (author, date, etc.)
  • Generate structured content tables
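
Assuming the metadata listed above, a generated article table might include columns such as:

URL | Title | Author | Publication Date | Article Content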

E-commerce Scraping

Features

  • Product information extraction
  • Price data collection
  • Product specifications capture
  • Image URL extraction
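
For illustration, a product results table built from these fields might contain columns such as:

Product URL | Name | Price | Specifications | Image URLs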

Job Listing Scraping

Data Collection

  • Job title and description
  • Company information
  • Location and salary data
  • Requirements and qualifications
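
Based on the fields above, a job listings table might include columns such as:

Job Title | Company | Location | Salary | Requirements | Description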

News Scraping

Content Extraction

  • News article content
  • Publication information
  • Category and topic data
  • Related content links
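
As an illustration, a news results table might include columns such as:

Headline | Article Content | Publication | Category | Related Links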

Common Configuration Options

Available Settings

  • Result Limit:
    • Control number of items to scrape
    • Set maximum results
    • Manage processing load
  • Output Format:
    • Create 42rows table
    • Generate downloadable files
    • Structure data for analysis
  • Processing Options:
    • Add to training data
    • Extract additional metadata
    • Create embeddings for context
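
Putting these settings together, an illustrative configuration for a single scraping job might be:

Result Limit: 50
Output Format: Create 42rows table
Processing Options: Add to training data, Extract additional metadata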

Using Scraped Data

Data Integration

  • Table Operations:
    • Process scraped data with AI
    • Generate content from results
    • Analyze collected information
  • Context Creation:
    • Use scraped content as context
    • Create embeddings for AI operations
    • Build knowledge bases
  • Data Export:
    • Download structured results
    • Share collected data
    • Create reports
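
For example, once scraped content sits in a table column, it can be referenced from an AI prompt column using the {letter} syntax described under the Crawler model below (the column letters and prompt wording are illustrative):

Column A: Scraped article content
Column B (AI Prompt): "Summarize the article in {A} in three bullet points"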

Best Practices

  • Planning:
    • Define clear scraping objectives
    • Choose appropriate scraping type
    • Set reasonable result limits
  • Processing:
    • Monitor email notifications
    • Review results for quality
    • Plan post-processing steps
  • Integration:
    • Structure data appropriately
    • Plan AI processing workflows
    • Consider context creation

Crawler Model

The Crawler model allows you to extract content from multiple URLs automatically. Available in both chat and table interfaces, it processes URLs in your data and returns their content.

Setting Up URL Crawling

Table Configuration

  1. Create a column containing URLs
  2. Add a new column for crawled content
  3. Select Crawler model in the prompt settings
  4. Reference URL column using {letter} syntax

Example Setup

Column A: URLs to crawl
Column B (Crawler Model) prompt: "Extract content from URL: {A}"

Chat Usage

  • Select Crawler model from model dropdown
  • Provide URLs in your prompt
  • Receive extracted content in response
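
A chat request might look like the following (the URLs are placeholders):

Model: Crawler
Prompt: "Extract the content from https://example.com/blog/post-1 and https://example.com/blog/post-2"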

Content Processing

  • URL Processing:
    • Each URL is processed individually
    • Content is extracted and cleaned
    • Results maintain page structure
  • Batch Processing:
    • Process multiple URLs simultaneously
    • Server-side execution
    • Email notification upon completion

Common Use Cases

  • Process list of blog posts
  • Extract content from multiple articles
  • Gather information from product pages
  • Collect data from documentation pages

Implementation Example

// Column Setup
Column A: Product URLs
Column B (Crawler): "Extract content from {A}"
Column C (AI Prompt): "Analyze extracted content from {B} and create product summary"

Best Practices

  • URL Management:
    • Verify URL formatting
    • Check URL accessibility
    • Consider rate limiting
  • Processing:
    • Start with small URL batches
    • Monitor processing status
    • Review extracted content quality