# go-linkfinder
`go-linkfinder` is a simple web crawling and link extraction tool written in Go, inspired by [https://github.com/GerbenJavado/LinkFinder](https://github.com/GerbenJavado/LinkFinder). It supports fetching content from URLs or files, recursively crawling same-domain links up to a specified depth, and customizing request headers, user-agent, timeout, and delay between requests.
> **WARNING**: This tool is intended for **authorized security testing and web research only**. Unauthorized crawling may violate website terms of service or legal regulations. The author and contributors are not responsible for misuse. Always obtain explicit permission before crawling any site.
## Features
- **Input Sources**: Fetch content from URLs or local files.
- **Recursive Crawling**: Configurable recursion depth to crawl same-domain links.
- **Request Customization**: Set HTTP headers and user-agent strings.
- **Timeout Control**: Customize HTTP request timeout.
- **Delay Between Requests**: Configurable delay (default 4 seconds) to avoid overloading servers.
- **Automatic Gzip Handling**: Supports gzip compressed HTTP responses.
- **URL Normalization**: Resolves relative URLs to absolute form.
- **Duplicate Detection**: Tracks and avoids revisiting URLs.
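The URL normalization and duplicate detection features can be sketched in Go using the standard library's `net/url` package. This is a minimal illustration of the general technique, not the tool's actual implementation:

```go
package main

import (
	"fmt"
	"net/url"
)

// normalize resolves a possibly-relative link against a base URL,
// producing an absolute URL (a sketch of the normalization step).
func normalize(base, link string) (string, error) {
	b, err := url.Parse(base)
	if err != nil {
		return "", err
	}
	l, err := url.Parse(link)
	if err != nil {
		return "", err
	}
	return b.ResolveReference(l).String(), nil
}

func main() {
	seen := make(map[string]bool) // duplicate detection: tracks visited URLs
	links := []string{"/about", "https://example.com/about", "contact.html"}
	for _, raw := range links {
		abs, err := normalize("https://example.com/index.html", raw)
		if err != nil || seen[abs] {
			continue // skip malformed or already-visited URLs
		}
		seen[abs] = true
		fmt.Println(abs)
	}
}
```

Here `/about` and `https://example.com/about` normalize to the same absolute URL, so the second one is skipped as a duplicate.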
## Installation
### Prerequisites
- **Go**: Version 1.21 or later.
- **Make**: For building with the provided Makefile.
- **Git**: To clone the repository.
### Steps
- Clone the repository:
```
$ git clone https://cgit.heqnx.com/go-linkfinder
$ cd go-linkfinder
```
- Install dependencies:
```
$ go mod tidy
```
- Build for all platforms:
```
$ make all
```
- Binaries will be generated in the `build/` directory for Linux, Windows, and macOS. Alternatively, build for a specific platform:
```
$ make linux-amd64
$ make windows-amd64
$ make darwin-arm64
```
- (Optional) Run directly with Go:
```
$ go run main.go -depth <depth> -delay <delay> -header <Header: Value> -input <url/file> -timeout <timeout> -user-agent <ua>
```
## Usage
### Command-Line Flags
```
Usage of ./go-linkfinder-linux-amd64:
-depth int
recursion depth for same-domain links (0 disables crawling) (default 0)
-delay int
delay between requests in seconds when crawling (only applies if depth > 0) (default 4)
-header value
add HTTP header to request (can be repeated, e.g. -header "Authorization: Bearer token")
-input string
url or file path (required)
-timeout int
timeout for HTTP requests in seconds (default 10)
-user-agent string
set User-Agent header (default "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/17.10 Safari/605.1.1")
```
## Examples
### Crawl a single URL without recursion
```
$ ./go-linkfinder -input https://example.com -depth 0
```
- Extracts and prints all links found on the initial page only.
### Recursively crawl same-domain links with delay
```
$ ./go-linkfinder -input https://example.com -depth 2 -delay 5
```
- Crawls links up to 2 levels deep within the same domain.
- Waits 5 seconds between each HTTP request to avoid server overload.
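The depth-limited, same-domain crawl with an inter-request delay can be sketched as follows. The `fetch` function here is a stand-in for the real HTTP request and link extraction; this is an illustration of the crawling strategy, not the tool's actual code:

```go
package main

import (
	"fmt"
	"net/url"
	"time"
)

// crawl visits start and follows same-domain links breadth-first,
// up to depth levels, pausing delay between requests.
func crawl(start string, depth int, delay time.Duration,
	fetch func(string) []string) []string {
	base, err := url.Parse(start)
	if err != nil {
		return nil
	}
	seen := map[string]bool{start: true}
	queue := []string{start}
	var visited []string
	for d := 0; d <= depth && len(queue) > 0; d++ {
		var next []string
		for _, u := range queue {
			visited = append(visited, u)
			if d == depth {
				continue // reached maximum depth; do not follow further links
			}
			for _, link := range fetch(u) {
				l, err := url.Parse(link)
				if err != nil || l.Host != base.Host || seen[link] {
					continue // skip malformed, cross-domain, or repeated URLs
				}
				seen[link] = true
				next = append(next, link)
			}
			time.Sleep(delay) // wait between requests to avoid server overload
		}
		queue = next
	}
	return visited
}

func main() {
	// A tiny in-memory "site" standing in for live HTTP responses.
	pages := map[string][]string{
		"https://example.com/":  {"https://example.com/a", "https://other.com/x"},
		"https://example.com/a": {"https://example.com/b"},
	}
	fetch := func(u string) []string { return pages[u] }
	for _, u := range crawl("https://example.com/", 2, 0, fetch) {
		fmt.Println(u) // the cross-domain other.com link is never visited
	}
}
```

Note how the cross-domain link to `other.com` is filtered out, matching the same-domain restriction described above.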
### Use custom headers and user-agent
```
$ ./go-linkfinder -input https://example.com -depth 1 -header "Authorization: Bearer mytoken" -user-agent "CustomAgent/1.0"
```
- Crawls 1 level deep, sending the given `Authorization` header and custom User-Agent with every request.
## License
This project is licensed under the MIT License. See the LICENSE file for details.
## Disclaimer
`go-linkfinder` is provided "as is" without warranty. The author and contributors are not liable for any damages or legal consequences arising from its use. Use responsibly and only in authorized environments.