[Batch/DOS CMD Commands: Summary and Notes] External commands: cmd download and web-grabbing commands (wget)
2022-06-25 06:39:00 【dssgresadfsrgre】
1. Advantages of wget as a downloader
1) Supports resuming interrupted downloads
2) Supports both FTP and HTTP downloads
3) Supports proxy servers
4) Simple and convenient to configure
5) Small program, completely free
2. Downloading and installing wget
The official page is: Wget for Windows
Oddly, though, every download link there returned a 301 error.
So I had to download from another site instead; I recommend GNU Wget 1.21.3 for Windows.
I downloaded the 64-bit zip package of version 1.21.3.
After unzipping, move the whole folder into the system32 directory on the C drive and add it to the PATH environment variable; the installation is then complete.
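A minimal sketch of those last two steps from the prompt, assuming the unzipped folder is C:\tools\wget (a hypothetical location; any folder works as long as it ends up on PATH). Note that setx rewrites the user-level PATH, so some people prefer doing this through the System Properties dialog instead:
rem append the wget folder to the user PATH (takes effect in NEW cmd windows only)
setx PATH "%PATH%;C:\tools\wget"
rem open a fresh cmd window, then verify:
wget --version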
3. How to use the wget command
3.1 Getting wget's help text: wget --help
If this command runs normally, the installation succeeded. If it fails with "'wget' is not recognized as an internal or external command, operable program or batch file", the installation failed, usually because the PATH environment variable is misconfigured (see the quick check sketched after the help text below).
C:\Users\Administrator>wget --help
GNU Wget 1.21.3, a non-interactive network retriever.
Usage: wget [OPTION]... [URL]...
Mandatory arguments to long options are mandatory for short options too.
Startup:
-V, --version display the version of Wget and exit
-h, --help print this help
-b, --background go to background after startup
-e, --execute=COMMAND execute a `.wgetrc'-style command
Logging and input file:
-o, --output-file=FILE log messages to FILE
-a, --append-output=FILE append messages to FILE
-d, --debug print lots of debugging information
-q, --quiet quiet (no output)
-v, --verbose be verbose (this is the default)
-nv, --no-verbose turn off verboseness, without being quiet
--report-speed=TYPE output bandwidth as TYPE. TYPE can be bits
-i, --input-file=FILE download URLs found in local or external FILE
--input-metalink=FILE download files covered in local Metalink FILE
-F, --force-html treat input file as HTML
-B, --base=URL resolves HTML input-file links (-i -F)
relative to URL
--config=FILE specify config file to use
--no-config do not read any config file
--rejected-log=FILE log reasons for URL rejection to FILE
Download:
-t, --tries=NUMBER set number of retries to NUMBER (0 unlimits)
--retry-connrefused retry even if connection is refused
--retry-on-http-error=ERRORS comma-separated list of HTTP errors to retry
-O, --output-document=FILE write documents to FILE
-nc, --no-clobber skip downloads that would download to
existing files (overwriting them)
--no-netrc don't try to obtain credentials from .netrc
-c, --continue resume getting a partially-downloaded file
--start-pos=OFFSET start downloading from zero-based position OFFSET
--progress=TYPE select progress gauge type
--show-progress display the progress bar in any verbosity mode
-N, --timestamping don't re-retrieve files unless newer than
local
--no-if-modified-since don't use conditional if-modified-since get
requests in timestamping mode
--no-use-server-timestamps don't set the local file's timestamp by
the one on the server
-S, --server-response print server response
--spider don't download anything
-T, --timeout=SECONDS set all timeout values to SECONDS
--dns-servers=ADDRESSES list of DNS servers to query (comma separated)
--bind-dns-address=ADDRESS bind DNS resolver to ADDRESS (hostname or IP) on local host
--dns-timeout=SECS set the DNS lookup timeout to SECS
--connect-timeout=SECS set the connect timeout to SECS
--read-timeout=SECS set the read timeout to SECS
-w, --wait=SECONDS wait SECONDS between retrievals
(applies if more then 1 URL is to be retrieved)
--waitretry=SECONDS wait 1..SECONDS between retries of a retrieval
(applies if more then 1 URL is to be retrieved)
--random-wait wait from 0.5*WAIT...1.5*WAIT secs between retrievals
(applies if more then 1 URL is to be retrieved)
--no-proxy explicitly turn off proxy
-Q, --quota=NUMBER set retrieval quota to NUMBER
--bind-address=ADDRESS bind to ADDRESS (hostname or IP) on local host
--limit-rate=RATE limit download rate to RATE
--no-dns-cache disable caching DNS lookups
--restrict-file-names=OS restrict chars in file names to ones OS allows
--ignore-case ignore case when matching files/directories
-4, --inet4-only connect only to IPv4 addresses
-6, --inet6-only connect only to IPv6 addresses
--prefer-family=FAMILY connect first to addresses of specified family,
one of IPv6, IPv4, or none
--user=USER set both ftp and http user to USER
--password=PASS set both ftp and http password to PASS
--ask-password prompt for passwords
--use-askpass=COMMAND specify credential handler for requesting
username and password. If no COMMAND is
specified the WGET_ASKPASS or the SSH_ASKPASS
environment variable is used.
--no-iri turn off IRI support
--local-encoding=ENC use ENC as the local encoding for IRIs
--remote-encoding=ENC use ENC as the default remote encoding
--unlink remove file before clobber
--keep-badhash keep files with checksum mismatch (append .badhash)
--metalink-index=NUMBER Metalink application/metalink4+xml metaurl ordinal NUMBER
--metalink-over-http use Metalink metadata from HTTP response headers
--preferred-location preferred location for Metalink resources
Directories:
-nd, --no-directories don't create directories
-x, --force-directories force creation of directories
-nH, --no-host-directories don't create host directories
--protocol-directories use protocol name in directories
-P, --directory-prefix=PREFIX save files to PREFIX/..
--cut-dirs=NUMBER ignore NUMBER remote directory components
HTTP options:
--http-user=USER set http user to USER
--http-password=PASS set http password to PASS
--no-cache disallow server-cached data
--default-page=NAME change the default page name (normally
this is 'index.html'.)
-E, --adjust-extension save HTML/CSS documents with proper extensions
--ignore-length ignore 'Content-Length' header field
--header=STRING insert STRING among the headers
--compression=TYPE choose compression, one of auto, gzip and none. (default: none)
--max-redirect maximum redirections allowed per page
--proxy-user=USER set USER as proxy username
--proxy-password=PASS set PASS as proxy password
--referer=URL include 'Referer: URL' header in HTTP request
--save-headers save the HTTP headers to file
-U, --user-agent=AGENT identify as AGENT instead of Wget/VERSION
--no-http-keep-alive disable HTTP keep-alive (persistent connections)
--no-cookies don't use cookies
--load-cookies=FILE load cookies from FILE before session
--save-cookies=FILE save cookies to FILE after session
--keep-session-cookies load and save session (non-permanent) cookies
--post-data=STRING use the POST method; send STRING as the data
--post-file=FILE use the POST method; send contents of FILE
--method=HTTPMethod use method "HTTPMethod" in the request
--body-data=STRING send STRING as data. --method MUST be set
--body-file=FILE send contents of FILE. --method MUST be set
--content-disposition honor the Content-Disposition header when
choosing local file names (EXPERIMENTAL)
--content-on-error output the received content on server errors
--auth-no-challenge send Basic HTTP authentication information
without first waiting for the server's
challenge
HTTPS (SSL/TLS) options:
--secure-protocol=PR choose secure protocol, one of auto, SSLv2,
SSLv3, TLSv1, TLSv1_1, TLSv1_2, TLSv1_3 and PFS
--https-only only follow secure HTTPS links
--no-check-certificate don't validate the server's certificate
--certificate=FILE client certificate file
--certificate-type=TYPE client certificate type, PEM or DER
--private-key=FILE private key file
--private-key-type=TYPE private key type, PEM or DER
--ca-certificate=FILE file with the bundle of CAs
--ca-directory=DIR directory where hash list of CAs is stored
--crl-file=FILE file with bundle of CRLs
--pinnedpubkey=FILE/HASHES Public key (PEM/DER) file, or any number
of base64 encoded sha256 hashes preceded by
'sha256//' and separated by ';', to verify
peer against
--random-file=FILE file with random data for seeding the SSL PRNG
--ciphers=STR Set the priority string (GnuTLS) or cipher list string (OpenSSL) directly.
Use with care. This option overrides --secure-protocol.
The format and syntax of this string depend on the specific SSL/TLS engine.
HSTS options:
--no-hsts disable HSTS
--hsts-file path of HSTS database (will override default)
FTP options:
--ftp-user=USER set ftp user to USER
--ftp-password=PASS set ftp password to PASS
--no-remove-listing don't remove '.listing' files
--no-glob turn off FTP file name globbing
--no-passive-ftp disable the "passive" transfer mode
--preserve-permissions preserve remote file permissions
--retr-symlinks when recursing, get linked-to files (not dir)
FTPS options:
--ftps-implicit use implicit FTPS (default port is 990)
--ftps-resume-ssl resume the SSL/TLS session started in the control connection when
opening a data connection
--ftps-clear-data-connection cipher the control channel only; all the data will be in plaintext
--ftps-fallback-to-ftp fall back to FTP if FTPS is not supported in the target server
WARC options:
--warc-file=FILENAME save request/response data to a .warc.gz file
--warc-header=STRING insert STRING into the warcinfo record
--warc-max-size=NUMBER set maximum size of WARC files to NUMBER
--warc-cdx write CDX index files
--warc-dedup=FILENAME do not store records listed in this CDX file
--no-warc-compression do not compress WARC files with GZIP
--no-warc-digests do not calculate SHA1 digests
--no-warc-keep-log do not store the log file in a WARC record
--warc-tempdir=DIRECTORY location for temporary files created by the
WARC writer
Recursive download:
-r, --recursive specify recursive download
-l, --level=NUMBER maximum recursion depth (inf or 0 for infinite)
--delete-after delete files locally after downloading them
-k, --convert-links make links in downloaded HTML or CSS point to
local files
--convert-file-only convert the file part of the URLs only (usually known as the basename)
--backups=N before writing file X, rotate up to N backup files
-K, --backup-converted before converting file X, back up as X.orig
-m, --mirror shortcut for -N -r -l inf --no-remove-listing
-p, --page-requisites get all images, etc. needed to display HTML page
--strict-comments turn on strict (SGML) handling of HTML comments
Recursive accept/reject:
-A, --accept=LIST comma-separated list of accepted extensions
-R, --reject=LIST comma-separated list of rejected extensions
--accept-regex=REGEX regex matching accepted URLs
--reject-regex=REGEX regex matching rejected URLs
--regex-type=TYPE regex type (posix|pcre)
-D, --domains=LIST comma-separated list of accepted domains
--exclude-domains=LIST comma-separated list of rejected domains
--follow-ftp follow FTP links from HTML documents
--follow-tags=LIST comma-separated list of followed HTML tags
--ignore-tags=LIST comma-separated list of ignored HTML tags
-H, --span-hosts go to foreign hosts when recursive
-L, --relative follow relative links only
-I, --include-directories=LIST list of allowed directories
--trust-server-names use the name specified by the redirection
URL's last component
-X, --exclude-directories=LIST list of excluded directories
-np, --no-parent don't ascend to the parent directory
Email bug reports, questions, discussions to <[email protected]>
and/or open issues at https://savannah.gnu.org/bugs/?func=additem&group=wget.
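If the command is not recognized, a quick way to check what cmd can and cannot see is shown below; and since the help text is long, piping it through more shows it one screen at a time:
rem show where (or whether) cmd finds wget on PATH
where wget
rem page through the long help text
wget --help | more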
3.2 Basic usage: wget URL
Suppose we want to download the cover image of a Bilibili video. We can run 【wget https://i2.hdslb.com/bfs/archive/[email protected]_378h_1c.webp】.
The download succeeds.
Opening this webp image in the 2345 image viewer shows it correctly.
Note that not every image can be downloaded with wget. For example, the Google logo lives at https://www.google.com/images/branding/googlelogo/2x/googlelogo_color_92x30dp.png, and when we try to fetch it with wget, the server refuses to respond.
Yet opening the same link directly in a browser works fine.
The download did eventually succeed. The resulting file still needed a file extension added; once added, it opened without any problem.
Finally, let's try downloading a web page, say the Baidu homepage, by running 【wget www.baidu.com】.
It lands in the current directory; opening it shows the page looks as expected.
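wget sets a nonzero exit code when a download fails, so a batch script can test %errorlevel% after each fetch — a minimal sketch (the URL is just an example):
@echo off
rem download quietly, then branch on wget's exit code
wget -q www.baidu.com
if %errorlevel% neq 0 (
    echo download failed with code %errorlevel%
) else (
    echo download ok
)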
3.3 Sending the download log to a file: the -o option (lowercase)
The long form of the short option -o is --output-file. So to download the Baidu homepage while logging to a file, run 【wget -o log.txt www.baidu.com】 or 【wget --output-file=log.txt www.baidu.com】.
After running the command, a log file and an html file appear in the current directory, and the log output no longer shows in the cmd window.
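If several downloads should share one cumulative log, the help text above also lists -a/--append-output — a sketch (filenames and URLs are examples):
rem the first run creates the log; -a on later runs appends instead of overwriting
wget -o log.txt www.baidu.com
wget -a log.txt www.csdn.net
rem inspect the combined log
type log.txt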
3.4 Reading URLs in bulk from a text file: the -i option
The long form of -i is --input-file. So if we want to download a batch of resources whose URLs are stored in a text file, we can run 【wget -i url.txt】 or 【wget --input-file=url.txt】.
Create a txt file containing two URLs, one per line (sketched below).
Run 【wget -i url.txt】; the downloads succeed.
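A quick way to build such a file straight from the prompt, with two example URLs standing in for the real ones:
rem write one URL per line (no space before the redirection, to avoid a trailing blank)
echo www.baidu.com>url.txt
echo www.csdn.net>>url.txt
wget -i url.txt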
3.5 Downloading under a chosen name: the -O option (uppercase)
From the previous subsections it is easy to see that if you don't name the downloaded file yourself, wget generally either adds an html extension automatically or adds no extension at all.
So, to save the trouble of renaming files in Explorer afterwards, we can pick the name right in the download command.
The long form of -O is --output-document. To download and rename in one step, run 【wget -O filename url】 or 【wget --output-document=filename url】.
For example, run 【wget -O baidu.html www.baidu.com】.
If you want to save into a specific directory and rename at the same time, write 【wget -O filepath url】, where the last component of filepath is the file name.
So -O can not only rename the file but also cover the job of the -P option described next.
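One caveat from my own testing (not stated in the help text): unlike -P, -O does not create missing directories, so the target folder has to exist first. A sketch:
rem create the folder if needed, then save the page into it under a chosen name
if not exist webpage mkdir webpage
wget -O webpage\baidu.html www.baidu.com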
3.6 Downloading into a specified folder: the -P option (uppercase)
The long form of -P is --directory-prefix. So to save a download into a given folder, run 【wget -P directory url】 or 【wget --directory-prefix=directory url】.
To store the download in the directory webpage (which need not exist yet), run 【wget -P webpage www.baidu.com】.
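This combines naturally with the -i option from section 3.4 — a sketch that pulls a whole URL list into one folder (url.txt as before):
rem download every URL listed in url.txt into the webpage folder
wget -P webpage -i url.txt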
3.7 Recursively downloading a whole site: the -r option
The long form of -r is --recursive. So to fetch all of a site's resources, run 【wget -r url】 or 【wget --recursive url】.
Let's try crawling CSDN. Everything downloaded is saved under a folder named after the URL, so after running 【wget -r www.csdn.net】 we need not worry about the files getting mixed up with anything else.
When it finished, only two files had been downloaded, one of them the site's robots.txt, whose rules wget obeys — CSDN evidently keeps crawlers out. It is a big commercial site, after all, so I'll concede defeat...
Let's try a smaller site instead, say the official website of a well-known university in Beijing. (A friendly reminder: don't mess around. If your crawl causes no trouble, fine, but if it does, the university might send you a lawyer's letter, so don't crawl continuously; a brief test is enough.)
Run 【wget -r www.tsinghua.edu.cn】 and the cmd window keeps scrolling, which suggests this site has no anti-crawler mechanism.
Not wanting to get anywhere near actual trouble, though, I pressed Ctrl+C to stop the crawl, leaving the files downloaded up to that point on disk.
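If you do experiment with -r, the options in the help text above make it easy to keep the load polite — a sketch combining a depth limit, a delay between requests, and a no-parent restriction (the URL is only an example):
rem shallow, throttled recursive fetch: depth 2, 1-second waits, stay below the start directory
wget -r -l 2 -w 1 -np www.tsinghua.edu.cn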
4. A summary and reflection on CMD commands
While experimenting with the -i option above, I found that running 【wget -i=url.txt】 fails to find the file.
After reading other articles about options that come in both short and long forms, I concluded that the safest habit is to separate the option from its value with a space.
The reason: in the short form, the option and its value cannot be joined by an equals sign, though a space works; in the long form, either an equals sign or a space works.
With the wrong syntax, the value gets bound to the option incorrectly.
For example, after running 【wget -o=log.txt www.baidu.com】, the log file is named not "log.txt" but "=log.txt".
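Putting the rule in one place — a sketch of which spellings work, using -o/--output-file as the example:
rem short form: separate with a space only
wget -o log.txt www.baidu.com
rem long form: '=' or a space both work
wget --output-file=log.txt www.baidu.com
wget --output-file log.txt www.baidu.com
rem WRONG: short form with '=' produces a file literally named "=log.txt"
rem wget -o=log.txt www.baidu.com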