Crawling a whole blog #592

Answered by adbar
TomLucidor asked this question in Q&A

To replicate the functionality, you'd need to use the --explore option on the command line together with --backup-dir html/. Note that Trafilatura would also extract the content in the same run.
In some cases it makes more sense to get the data first and then use Trafilatura locally on the downloaded content (which also answers Q2).
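A minimal sketch of the "download first, extract locally" approach, assuming the raw pages were saved as .html files under html/ (the file extension and the helper name extract_backup_dir are assumptions for illustration, not part of Trafilatura; only trafilatura.extract is the library's own function):

```python
from pathlib import Path
from typing import Callable, Dict, Optional


def extract_backup_dir(
    backup_dir: str,
    extractor: Optional[Callable[[str], Optional[str]]] = None,
) -> Dict[str, str]:
    """Run a text extractor over every saved HTML file in backup_dir.

    Hypothetical helper: returns a mapping of file path to extracted text,
    skipping files for which the extractor returns nothing.
    """
    if extractor is None:
        # Imported lazily so the helper can also be used with another extractor.
        from trafilatura import extract
        extractor = extract

    results: Dict[str, str] = {}
    # Assumption: backed-up pages carry a .html extension.
    for path in sorted(Path(backup_dir).rglob("*.html")):
        html = path.read_text(encoding="utf-8", errors="replace")
        text = extractor(html)
        if text:
            results[str(path)] = text
    return results
```

Calling extract_backup_dir("html/") would then process the downloaded pages without any further network requests, so extraction settings can be changed and re-run on the same data.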

Q1: This is not the same, requires a lot of training, or is not as efficient (depending on the use case and the package).

Answer selected by TomLucidor