Expose the service as a REST API #43

Closed
databill86 opened this issue Nov 21, 2023 · 6 comments · Fixed by #52
Labels
enhancement New feature or request released

Comments

@databill86

As a follow-up to pull request #38:

Would it be possible to expose the service as an API? That would make it much simpler to run locally, without the need to publish the gpt crawler. It would be perfect if it were containerized!
I'm no expert in JS. I tried to implement an Express.js server with the help of ChatGPT, but I ran into a lot of exceptions and errors, so I gave up ^^

This is my attempt:

// file: app/src/api.ts

import express from 'express';
import cors from 'cors';
import fileUpload from 'express-fileupload';
import { readFile } from 'fs/promises';
import { startCrawling } from './main';

// Create a new express application instance
const app = express();
const port = 3000; // You may want to make the port configurable

// Enable JSON and file upload functionality
app.use(cors());
app.use(express.json());
app.use(fileUpload());

// Define a POST route to accept config and run the crawler
app.post('/crawl', async (req, res) => {
    // Verify that we have the configuration in the request
    if (!req.files || !req.files.config) {
        return res.status(400).json({ message: 'Config file is required.' });
    }

    // express-fileupload yields an array when several files share a field name
    const upload = Array.isArray(req.files.config) ? req.files.config[0] : req.files.config;

    try {
        // Parse inside the try block so malformed JSON is caught instead of escaping the handler
        const config = JSON.parse(upload.data.toString('utf-8'));

        await startCrawling(config);

        // Read the output file after crawling and send it in the response
        const outputFileContent = await readFile(config.outputFileName, 'utf-8');
        res.contentType('application/json');
        return res.send(outputFileContent);
    } catch (error) {
        // Error objects serialize to {} in JSON, so send the message string instead
        const detail = error instanceof Error ? error.message : String(error);
        return res.status(500).json({ message: 'Error occurred during crawling', error: detail });
    }
});

// Start the Express server
app.listen(port, () => {
    console.log(`API server listening at http://localhost:${port}`);
});

export default app;
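
For anyone who wants to try a route like this, a client call might look roughly like the sketch below. It assumes Node 18+ (global `fetch`, `FormData`, `Blob`) and a server listening on the port from the attempt above. The names `CrawlConfig`, `buildCrawlForm`, and `requestCrawl` are hypothetical, and `outputFileName` is the only config field taken from the code above; any other fields depend on the crawler's actual config.

```typescript
// Hypothetical client for the POST /crawl route sketched above.
// Assumes Node 18+ (global fetch, FormData, Blob).

// Only outputFileName appears in the server attempt; other fields are left open.
interface CrawlConfig {
  outputFileName: string;
  [key: string]: unknown;
}

// Build the multipart body with the config attached under the "config" field,
// which is the field the server reads via req.files.config.
function buildCrawlForm(config: CrawlConfig): FormData {
  const form = new FormData();
  const file = new Blob([JSON.stringify(config)], { type: "application/json" });
  form.append("config", file, "config.json");
  return form;
}

// Send the config and return the crawler output parsed as JSON.
async function requestCrawl(baseUrl: string, config: CrawlConfig): Promise<unknown> {
  const response = await fetch(`${baseUrl}/crawl`, {
    method: "POST",
    body: buildCrawlForm(config),
  });
  if (!response.ok) {
    throw new Error(`Crawl request failed with status ${response.status}`);
  }
  return response.json();
}
```

With the server running, a call such as `await requestCrawl("http://localhost:3000", { outputFileName: "output.json" })` would upload the config as form-data and resolve with whatever the crawl wrote to the output file.
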
@marcelovicentegc
Contributor

Hey, @databill86! That's a great idea. By the way, this could even evolve into a website in the future, enabling people with less expertise to use this kind of service, and even turn into an actual paid product. #38 does make implementing such an API easier, as it already abstracts the core methods that can be extended to different clients (API, CLI, etc.).

@databill86
Author

Thank you!
That's really cool! I can't wait to see this feature

@adityak74
Contributor

I can work on this one.

@databill86
Author

Hello @adityak74,

I appreciate your interest in this feature. I wanted to check in on the status of #52 and ask when you anticipate it being completed.

Thank you.

@adityak74
Contributor

Hi @databill86, I am going to finish this up this week and hope to complete it by Sunday.

@marcelovicentegc marcelovicentegc added the enhancement New feature or request label Nov 28, 2023

🎉 This issue has been resolved in version 1.2.0 🎉

The release is available on:

Your semantic-release bot 📦🚀


3 participants