Documentation Index
Fetch the complete documentation index at: https://firecrawl-noaa-mar-900-create-self-partnership-provisioning.mintlify.app/llms.txt
Use this file to discover all available pages before exploring further.
了解如何使用 Firecrawl 的核心功能抓取 GitHub 的仓库、议题(issues)和文档。
npm install @mendable/firecrawl-js zod
通过 Zod schema 从代码仓库中提取结构化数据。
import FirecrawlApp from '@mendable/firecrawl-js';
import { z } from 'zod';
const firecrawl = new FirecrawlApp({ apiKey: process.env.FIRECRAWL_API_KEY });
const result = await firecrawl.scrape('https://github.com/firecrawl/firecrawl', {
formats: [{
type: 'json',
schema: z.object({
name: z.string(),
description: z.string(),
stars: z.number(),
forks: z.number(),
language: z.string(),
topics: z.array(z.string())
})
}]
});
console.log(result.json);
搜索 GitHub 上的仓库、issue 或文档。
import FirecrawlApp from '@mendable/firecrawl-js';
const firecrawl = new FirecrawlApp({ apiKey: process.env.FIRECRAWL_API_KEY });
const searchResult = await firecrawl.search('machine learning site:github.com', {
limit: 10,
sources: [{ type: 'web' }], // { type: 'news' }, { type: 'images' }
scrapeOptions: {
formats: ['markdown']
}
});
console.log(searchResult);
抓取单个 GitHub 页面(仓库、issue 或文件)。
import FirecrawlApp from '@mendable/firecrawl-js';
const firecrawl = new FirecrawlApp({ apiKey: process.env.FIRECRAWL_API_KEY });
const result = await firecrawl.scrape('https://github.com/firecrawl/firecrawl', {
formats: ['markdown'] // 例如 html、links 等
});
console.log(result);
发现仓库或文档站点中所有可用的 URL。请注意:Map 仅返回 URL,不包含页面内容。
import FirecrawlApp from '@mendable/firecrawl-js';
const firecrawl = new FirecrawlApp({ apiKey: process.env.FIRECRAWL_API_KEY });
const mapResult = await firecrawl.map('https://github.com/vercel/next.js/tree/canary/docs');
console.log(mapResult.links);
// 返回不含内容的 URL 数组
从仓库或文档中抓取多个页面。
import FirecrawlApp from '@mendable/firecrawl-js';
const firecrawl = new FirecrawlApp({ apiKey: process.env.FIRECRAWL_API_KEY });
const crawlResult = await firecrawl.crawl('https://github.com/facebook/react/wiki', {
limit: 10,
scrapeOptions: {
formats: ['markdown']
}
});
console.log(crawlResult.data);
同时抓取多个 GitHub 链接。
import FirecrawlApp from '@mendable/firecrawl-js';
const firecrawl = new FirecrawlApp({ apiKey: process.env.FIRECRAWL_API_KEY });
// 等待完成
const job = await firecrawl.batchScrape([
'https://github.com/vercel/next.js',
'https://github.com/facebook/react',
'https://github.com/microsoft/typescript'],
{
options: {
formats: ['markdown']
},
pollInterval: 2,
timeout: 120
}
);
console.log(job.status, job.completed, job.total);
console.log(job);
一次性从多个仓库中抽取结构化数据。
import FirecrawlApp from '@mendable/firecrawl-js';
import { z } from 'zod';
const firecrawl = new FirecrawlApp({ apiKey: process.env.FIRECRAWL_API_KEY });
// 等待完成
const job = await firecrawl.batchScrape([
'https://github.com/vercel/next.js',
'https://github.com/facebook/react'],
{
options: {
formats: [{
type: 'json',
schema: z.object({
name: z.string(),
description: z.string(),
stars: z.number(),
language: z.string()
})
}]
},
pollInterval: 2,
timeout: 120
}
);
console.log(job.status, job.completed, job.total);
console.log(job);