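A visual no-code/code-free web crawler/spider. EasySpider (易采集) is a visual tool for browser automation testing, data collection, and crawling that lets you design and run crawler tasks graphically without writing code. Also known as ServiceWrapper, an intelligent service wrapping system for web applications.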
OTHER License
If downloads are slow, try the mirror hosted in mainland China: mainland China download link.
The Windows x64 version supports 64-bit Windows 10/Windows Server 2016 and above. The Windows x32 version supports both 32-bit and 64-bit Windows 7 and above, so users of 64-bit Windows 7 should also download the x32 version of EasySpider. Note that the Chrome browser bundled with the x32 version is fixed at version 109 and will not follow Chrome updates (to stay compatible with Windows 7). To collect data with the latest Chrome, run the x64 version on Windows 10 x64 or above. No release supports Windows Server 2012 or below; on those systems you need to compile and run the software yourself.
For the MacOS version, use the system's built-in Archive Utility to unzip the .7z file. The MacOS version supports all chipsets, including Apple's own chips (such as M1, M2) and Intel chips (such as Core i7); make sure you download the build that matches your chip. The minimum operating system version is 11.1. For older operating system versions, download the v0.2.0 Mac release, or download the code and compile it yourself; an example compilation method can be found in this issue.
Similarly, the Linux version is only suitable for Ubuntu 20.04 and above, Deepin, Debian, and their derivatives. If you want to collect data on other Linux distributions, please download the code and compile it yourself. For an example of compiling on CentOS, see this issue.
Please scroll down to the bottom of this section to download EasySpider.
Published by NaiboWang 10 months ago
For English version of update notes, please see: https://github.com/NaiboWang/EasySpider/wiki/Update-notes-of-version-0.6.0
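eval("expression") can be used to reference an expression in the Python environment directly, with no need to store the value in a variable through a custom operation first. Example: use the exec option of a custom operation to define a variable a:
self.a = 1
In the XPath of an extract-data operation, use the following value to represent /html/body/div[1]:
/html/body/div[eval("self.a")]
Use the exec option of a custom operation again to change the value of a:
self.a = self.a + 1
The XPath used for extraction then becomes /html/body/div[2].
This suits scenarios that have no "next page" button, where pages can only be turned by clicking the page numbers one after another; see the example tutorial.
Enter outside:myCode.py and the program will read and execute the code in myCode.py under the EasySpider directory. This suits scenarios that involve a lot of code and benefit from an IDE. Note that EasySpider supports custom Python functions, importing external Python packages, and exception handling with try...except.
The eval("Python code") keyword can be used to input a value generated dynamically by Python code while the task runs. The JS("return JS code") keyword is also supported for inputting text generated dynamically by JavaScript (the JS code must not contain line breaks). For example, JS("return new Date().getMonth()+1")/2023 inputs "current month/2023", that is, 12/2023 (the value entered in December 2023).
Multi-layer nested iframes can be handled, and the experience is the same as without iframes. Note, however, that the XPath must be one that can only be located inside the intended iframe page; an XPath like //body will only locate the body tag in the first iframe layer.
After an extract-data operation has been designed, the browser operation panel asks whether to add a paging operation. You can then specify the position of the paging button, and an extract-data operation with paging is generated in the flowchart automatically.
The browser operation panel has a new batch text input feature, which automatically generates a loop operation with a text list.
Extract-data operations can be set to store their data as a new row or not. If not, no new row is generated; the data is held temporarily until another extract-data operation generates a new row, and both are stored together as one row. This suits scenarios where lists on a page are linked to each other: #35, #189.
If you exit manually, delete the temporary user information directory under the TempUserDataFolder folder yourself.
Published by NaiboWang 10 months ago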
Release Note: Thank you for supporting EasySpider. I have been busy with a paper recently and started developing new features as soon as it was submitted. I am now releasing the 0.6.0 Beta for 64-bit Windows. Everyone is welcome to test the new features and report any problems through GitHub issues. If a week of testing turns up no significant problems, builds for all other operating systems will follow after that week.
eval: Any XPath or JavaScript code snippet can now incorporate expressions directly from the Python environment using eval("expression_value"), eliminating the need for intermediate storage variables. For instance:
Define a variable a using the exec option in a custom operation:
self.a = 1
In the XPath of a data extraction operation, use the following value to represent /html/body/div[1]:
/html/body/div[eval("self.a")]
Change the value of a using the exec option:
self.a = self.a + 1
The XPath used for data extraction then becomes /html/body/div[2].
This is particularly useful for scenarios where there is no "next page" button, and pages must be turned by clicking different page numbers. A detailed tutorial will be released soon; example task file: 290.json
5. External Code File for Exec and Eval: Users can now write Python code in an IDE like VSCode and input outside:myCode.py in the task input box. The program will execute the code from myCode.py in the EasySpider directory. This is suitable for scenarios requiring extensive code that benefits from an IDE. Note that EasySpider supports custom Python functions, importing external Python packages, and using try...except for exception handling.
6. Handling Multi-Layer Nested iframes: The experience is the same as with no iframes, but XPath should be set to locate elements only within the specified iframe. Thus, a generic XPath like //body will only target the body tag of the first iframe layer.
7. Post-Data Extraction Paging Prompt: After designing a data extraction operation, the browser console will suggest whether to add paging. Specifying the paging button location automatically generates a data extraction operation with paging functionality:
8. Batch Text Input Feature: Automatically generates a loop operation with a text list.
9. Option to Store Extracted Data as a New Row: If set to 'no', the data isn't stored as a new row but temporarily held until another data extraction operation creates a new row, and both are then stored together as one row. This is suitable for scenarios where lists on a page are linked to each other: Issue #35, Issue #189
10. Pause Function in Custom Operations: Allows pausing the program, useful when a captcha or other interactive page appears.
11. Refresh Page Function in Custom Operations.
12. Send Email Feature in Custom Operations.
13. Alert Dialog Handling in Click Element Operations: Choose to accept or dismiss alerts.
14. Optimizations for Parallel Execution: For browser executions with user information, the user directory is now copied before execution to solve parallel execution issues. Multiple task executions or command line programs can be run in parallel. After task completion, the copied user information folder is automatically deleted (if manually exited, delete the TempUserDataFolder directory manually).
15. Automatic Operation Naming: Operations are automatically named based on the scenario, eliminating the need to manually rename operations. Examples include default names for click and move operations based on the text value of the element, loop operations named according to loop type, and automatic renaming when switching custom operations/loops/conditional branches.
16. Single Element Loop Optimization: For loops like continuously clicking a pagination button, the unchanged content check can be limited to a single element instead of the entire page.
17. Default File Download Location: Now set to the task folder.
18. New Conditional Branches Added to the Right Side.
19. Right-Click Menu in Flowchart: Enables trial run (debug run), copy, cut, delete elements, and adjust the order of conditional branches.
20. Add a close hint at the bottom right of the operation prompt box, which is useful for cases where the QR code is occluded during login. You can click the "×" at the bottom right to close the operation panel.
21. Custom Pause/Control Keys When Saving Tasks: Different programs can use different keys to pause/continue.
22. Maximize Browser Window Option When Saving Tasks.
23. Data Overwrite Mode When Writing Data: Each execution of the same task ID will delete the original file and recollect data (requires static file name setting).
24. MySQL Database Writing: When encountering duplicate data, ignore and continue running. Suitable for scenarios where inserting duplicate data is undesirable (requires setting the database table's primary key to specific fields; otherwise, as per EasySpider's design, the primary key is an auto-increment ID, preventing duplicates).
25. Base64 Image Download: Handles images that require login for download (not always effective).
26. Enhanced Exception Handling: Prevents accidental interruptions during collection; retries in case of interruption, bug fixes for history rollback.
27. The browser window with user information mode can remember the browser position from the last design task, instead of splitting the screen equally with the flowchart every time. (Official Version)
28. Clicking on an element now supports clicking based on coordinates, which is useful for scenarios where you need to click in an empty space to close some dialog or popup window. For example, if the coordinates of the space are (10, 10), you would write point(10, 10) in the element's XPath field to represent a click at the web page coordinates (10, 10). (Official Version)
29. You can choose whether to remove duplicate data after the data collection is completed. Please note that this feature needs to be executed at the end of the task, so exiting in the middle of the task execution will prevent deduplication!!! (Official Version)
30. In the loop through a non-fixed/fixed cycle list, text list, or URL list, it is possible to set the option to skip the first n iterations. This feature is useful for scenarios where the task is interrupted midway, and there is no desire to restart from the beginning (official version).
31. When executing a task, one may manually specify the task ID. In this case, clicking the "Execute Directly" or "Get ID" button will not generate a new task ID but will use the specified ID instead. If the specified task ID has previously existed, the task's invocation file will be overwritten. This is suitable for scenarios where, after modifying the task workflow, there is no wish to start with a new task ID; instead, one may want to continue appending and writing files within the original task ID folder.
32. ddddocr Library Upgrade.
33. UI Update.
34. Chrome Browser Upgrade to Version 120.
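The eval("...") placeholder described earlier in these notes can be pictured as a string substitution performed just before the XPath is used. The following is only a minimal sketch of that idea, not EasySpider's actual implementation: TaskContext, EVAL_PATTERN, and expand_eval are hypothetical names, and the regular expression assumes the expression itself contains no double quotes.

```python
import re

# Hypothetical pattern: matches eval("...") around a double-quoted expression.
EVAL_PATTERN = re.compile(r'eval\("([^"]*)"\)')


class TaskContext:
    """Hypothetical stand-in for the object that custom code refers to as `self`."""


def expand_eval(template: str, context: TaskContext) -> str:
    """Replace every eval("expr") placeholder with the string value of expr."""

    def substitute(match: re.Match) -> str:
        # Evaluate the expression with `self` bound to the shared context.
        return str(eval(match.group(1), {"self": context}))

    return EVAL_PATTERN.sub(substitute, template)


context = TaskContext()
xpath_template = '/html/body/div[eval("self.a")]'

# First custom operation (exec option): define the variable.
exec("self.a = 1", {"self": context})
first_xpath = expand_eval(xpath_template, context)

# Second custom operation (exec option): advance to the next page number.
exec("self.a = self.a + 1", {"self": context})
second_xpath = expand_eval(xpath_template, context)

print(first_xpath)   # /html/body/div[1]
print(second_xpath)  # /html/body/div[2]
```

Because the template is expanded again each time it is used, changing self.a between two extractions is enough to move from div[1] to div[2], which matches the page-number scenario described in the notes.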
Published by NaiboWang about 1 year ago
Tutorial with EXEC and EVAL usage examples: https://github.com/NaiboWang/EasySpider/wiki/EXEC%E5%92%8CEVAL%E7%94%A8%E6%B3%95%E7%A4%BA%E4%BE%8B
This release only includes the Windows x64 and x32 versions. You are welcome to try it and promptly report bugs as issues. Versions for other operating systems will be released as version 0.5.2 after Bilibili switches from displaying the number of video plays to displaying total play time; until then, please use version 0.3.5 on other operating systems.
The Windows x64 version supports 64-bit Windows 10 and above. The Windows x32 version supports both 32-bit and 64-bit Windows 7 and above, so users of 64-bit Windows 7 should also download the x32 version. Note that the Chrome browser bundled with the x32 version is fixed at version 109 and will not follow Chrome updates (to stay compatible with Windows 7). To collect data with the latest Chrome, run the x64 version on Windows 10 x64 or above.
For the MacOS version, use the system's built-in Archive Utility to unzip the .tar.gz file. The MacOS version supports all chipsets, including Intel chips (such as Core i7) and Apple's own chips (such as M1, M2); make sure you download the build that matches your chip. The minimum operating system version is 11.1. For older operating system versions, download the v0.2.0 Mac release, or download the code and compile it yourself; an example compilation method can be found in this issue.
Similarly, the Linux version is only compatible with Ubuntu 20.04 and above, Deepin, Debian, and their derivatives. If you want to use other Linux distributions for data collection, please download the code and compile it yourself.
Major Update: Added the ability to run Python code, manipulate custom variables, and retrieve variable values directly in the current environment for custom actions. Loops and conditional statements also support recognition of custom variables and expressions:
This option provides advanced functionality: you can manipulate the running browser directly with Python code, and define, modify, and assign variables that are shared across the whole execution environment. Examples:
Use self.browser to refer to the browser currently being operated on and call Selenium APIs on it. For instance, self.browser.find_element(By.CSS_SELECTOR, "body").send_keys(Keys.END) scrolls to the bottom of the page.
self.myVar = 1
self.myVar = self.myVar + 1
print(self.myVar)
If you want to record your custom variables as field values, choose the next option: Retrieve Python Expression Value in Execution Environment (eval operation).
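The difference between running statements (exec) and retrieving an expression value (eval) can be illustrated with plain Python. This is a minimal sketch under stated assumptions rather than EasySpider's own code: TaskContext, run_exec, and run_eval are hypothetical names, and the real self.browser (a Selenium driver) is left out so that the example runs on its own.

```python
class TaskContext:
    """Hypothetical stand-in for the object that custom code sees as `self`."""


def run_exec(code: str, context: TaskContext) -> None:
    """Run Python statements against the shared context; nothing is returned."""
    exec(code, {"self": context})


def run_eval(expression: str, context: TaskContext):
    """Evaluate a single Python expression against the context and return its value."""
    return eval(expression, {"self": context})


context = TaskContext()

# Two separate exec operations: the variable persists between them.
run_exec("self.myVar = 1", context)
run_exec("self.myVar = self.myVar + 1", context)

# eval operations return values that could be recorded or used as conditions.
value = run_eval("self.myVar", context)
is_one = run_eval("self.myVar == 1", context)

# An assignment is a statement, not an expression, so eval rejects it.
try:
    run_eval("self.myVar = 1", context)
    assignment_allowed = True
except SyntaxError:
    assignment_allowed = False

print(value, is_one, assignment_allowed)  # 2 False False
```

Keeping state on one shared object is what lets a later operation read what an earlier one wrote; the SyntaxError shows why assignments belong in the exec option.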
This option allows you to directly return the value of a Python expression and to refer to the return value of this operation elsewhere using Field["operation name"]. Examples:
self.browser, which refers to the browser currently being operated on. You can directly use Selenium APIs, e.g., self.browser.find_element(By.CSS_SELECTOR, "body").text returns the text of the current page.
self.myVar
self.myVar == 1, whose result can be used in conditional statements and loops.
Note: assignment statements such as self.myVar = 1 cannot be used here. If you want to perform an assignment, choose the previous option: Run Python code on current environment (the "exec" operation).
Within a loop, multiple input fields can now be associated with text from a looped list by matching corresponding index values.
At execution time, you can specify the path of an Excel file to read from. For multiple fields within a looped text list, multiple columns with the same name can be read from Excel and merged automatically.
Relative element clicks and move-to-element events within a loop can be set using relative XPath. However, this feature is not compatible with task files from previous versions. Previous version files need manual modification, where XPaths used for element clicks within the loop must be set to empty in order to work. It's recommended to directly use the new version's task design.
UI Major Update: Operations can be added, flow can be modified, and anchor points can be adjusted through drag-and-drop actions. Adding operations, cutting elements, and adjusting anchor points can all be achieved through dragging and dropping. Right-click to delete elements. Double-click arrows to directly adjust anchor points.
Added a close button in the bottom right corner of the browser console to handle scenarios where the console obstructs captcha or login prompts.
Option to clear other non-operation-defined field values before recording a field.
Added the feature to skip the current loop iteration, i.e., Continue functionality.
All XPaths can be replaced with variable values using Field["field value"].
For data extraction operations, added the ability to resume execution from the last saved position when re-executing a task (set during task save), to address the issue of starting from the beginning after unexpected program termination.
Replaced OCR functionality with ddddocr, eliminating the need for manual environment installation and improving OCR recognition accuracy.
Fixed a bug where an extra row of data wasn't saved during data extraction.
Set a waiting condition for an element to appear before executing an operation.
Can extract attribute values of elements.
Added copyright and usage agreement statements.
Full version supports the function to "Scroll down continuously until the page content remains unchanged." The exit conditions for looped operations of clicking the next page have been updated to "Next page button not found" and "Page content change not detected."
Optimized log formatting.
Added the ability to save files in JSON format.
Updated Chrome version to 115.
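The Field["..."] placeholders mentioned in the list above can be thought of as a lookup into the most recently recorded values. The sketch below is illustrative only and is not taken from EasySpider's source: FIELD_PATTERN, expand_fields, and the sample field names are hypothetical, and leaving unknown names untouched is an assumption made here so that mistakes stay visible.

```python
import re

# Hypothetical pattern for Field["name"] placeholders.
FIELD_PATTERN = re.compile(r'Field\["([^"\]]+)"\]')


def expand_fields(template: str, fields: dict) -> str:
    """Replace each Field["name"] with the latest value recorded under that name.

    Names that have no recorded value are left as they are (an assumption).
    """

    def substitute(match):
        name = match.group(1)
        return str(fields[name]) if name in fields else match.group(0)

    return FIELD_PATTERN.sub(substitute, template)


# Sample values standing in for previously extracted fields.
latest_fields = {"row": 3, "keyword": "laptop"}

xpath = expand_fields('//table/tr[Field["row"]]/td[1]', latest_fields)
text = expand_fields('Search: Field["keyword"]', latest_fields)
untouched = expand_fields('//div[Field["missing"]]', latest_fields)

print(xpath)  # //table/tr[3]/td[1]
print(text)   # Search: laptop
```

The same substitution works for XPaths and for text typed into input boxes, since both are plain strings at the moment they are used.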
Published by NaiboWang over 1 year ago
Explanation of the issue where link addresses cannot be collected in some cases on the Windows x64 version: https://github.com/NaiboWang/EasySpider/issues/128
The Windows x64 version supports 64-bit Windows 10 and above. The Windows x86 version supports both 32-bit and 64-bit Windows 7 and above, so users of 64-bit Windows 7 should also download the x86 version. Note that the Chrome browser bundled with the x86 version is fixed at version 109 and will not follow Chrome updates (to stay compatible with Windows 7). To collect data with the latest Chrome, run the x64 version on Windows 10 x64 or above.
For the MacOS version, use the system's built-in Archive Utility to unzip the .tar.gz file. The MacOS version supports all chipsets, including Intel, M1, M2, and other processors. The minimum operating system version is 11.1. For older operating system versions, download the v0.2.0 Mac release, or download the code and compile it yourself; an example compilation method can be found in this issue.
Similarly, the Linux version is only compatible with Ubuntu 20.04 and above, Deepin, Debian, and their derivatives. If you want to use other Linux distributions for data collection, please download the code and compile it yourself.
Field["parameter name"] represents the most recently extracted page parameter value or custom operation return value, which provides full variable functionality.
The exit loop option of a custom operation can be used at any position to exit the loop directly; that is, Break functionality has been added.
Data inside <iframe> tags can be extracted.
Press the p key on the keyboard to pause and continue task execution.
The XPath Helper extension can be used to debug XPath during the execution stage, in conjunction with the pause feature above.
Excel/TXT files, can be written to MySQL databases, can specify data types as integer/decimal/date, etc.; click here to view the MySQL writing tutorial.
<enter> or <ENTER> can be used to represent a hard return, that is, pressing Enter in the current text box after the text has been entered.
The advanced options of the open webpage operation support getting the current page's cookies and modifying them.
Data/Task_ID. If you want to save to a different path, use a relative path such as ../../. For example, ../../JS means the saved file is named JS and is stored in the folder at the same level as the Data folder, that is, the EasySpider main folder.
Confirm button, but the task still needs to be saved manually.
Published by NaiboWang over 1 year ago
Video walkthrough of the new features: scheduled task execution, multiple modes for selecting child elements, and using extracted values as variable input.
The Windows x64 and x32 versions support Windows 10 and above. For Windows 7, please download the Windows 7 special edition (Chrome 109 is the last Chrome version to support Windows 7). Please make sure not to download the wrong version.
The MacOS version supports all chipsets, including Intel, M1, M2, and other processors. The minimum operating system version is 11.1. For older operating system versions, download the v0.2.0 Mac release, or download the code and compile it yourself; an example compilation method can be found in this post.
Similarly, the Linux version is only compatible with Ubuntu 20.04 and above, Deepin, Debian, and their derivatives. If you want to use other Linux distributions for data collection, please download the code and compile it yourself.
Version 0.3.2 is compatible with tasks from version 0.3.1.
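In the input text and open webpage options, the most recently extracted field value can be used as a variable for text input, written as Field["field_name"].
Files such as PDFs can be downloaded.
Fixed a bug where the screen could stay blank for about 10 seconds after launch, so the software can now be used on intranets, the dark web, and any local network.
Fixed a bug where the current page URL and title might not be extracted.
Fixed a bug where OCR recognition might fail to extract text.
Extraction now saves data locally after every 10 records collected.
When modifying a task, the default anchor position is after the last operation in the task flow.
Updated Chrome to version 114.
Published by NaiboWang over 1 year ago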
The Windows version supports Windows 10 and above. There is no directly usable build of this version for Windows 7 (Chrome 109 is the last Chrome version to support Windows 7), but the 32-bit build of v0.2.0 works, and this version can also be run by compiling it yourself. If you want to use Windows 7 for data collection, please download the 32-bit build of v0.2.0 or download the code and compile it yourself.
The MacOS version supports all chipsets, including Intel, M1, M2, and other processors. The minimum operating system version is 11.1. For older operating system versions, download the v0.2.0 Mac release, or download the code and compile it yourself; an example compilation method can be found in this post.
Similarly, the Linux version is only compatible with Ubuntu 20.04 and above, Deepin, Debian, and their derivatives. If you want to use other Linux distributions for data collection, please download the code and compile it yourself.
A video covering the latest features has been uploaded to Bilibili; it is very useful and recommended viewing.
[Important] Custom condition checks using the return value of a JS command inside a loop item - Part 2
Note that the .json files in the tasks folder of v0.3.1 are not compatible with any previous version. Please redesign your tasks in v0.3.1.
Custom scripts are now also supported in branch conditions and loop conditions: whether the script's return value is true decides the outcome, which greatly increases the flexibility of tasks. A setting to break out of a loop from code was added, and custom operations can manipulate elements within the loop.
Multiple XPath expressions are generated simultaneously for user selection, and the XPath Helper extension is pre-installed for XPath debugging.
Added the functionality to extract the background image URL of elements, current page title, and current page URL.
Added the capability to save screenshots of elements or entire web pages. This feature works best in headless mode.
Added the functionality to download images.
Added OCR recognition of elements. To use this feature, Tesseract library needs to be installed first: https://tesseract-ocr.github.io/tessdoc/Installation.html
Directly extract the return value of executing JavaScript code on elements, allowing for functionalities such as regular expression matching and obtaining the background color of elements.
Added the capability to switch dropdown options and extract the selected value and text of dropdown options.
Significantly improved user guidance and explanations to make the software more user-friendly. This includes instructions on handling iframe tags, explanations of parameter meanings for various options, and explanations on modifying the XPath for loop items, and more.
Added instructions on how to execute tasks from the command line.
Added parallel mode which can run different tasks concurrently.
Added headless mode configuration, allowing the software to run without a browser interface.
Fixed the issue where Chinese paths couldn't be recognized correctly when using user-configured browser modes.
Fixed the issue where the program would freeze when there was no unconditional branch in the conditional branching.
Fixed the issue where the input box would freeze after saving a task.
Added the option to set the maximum waiting time for page load in the "Open Page" and "Click element" operations.
Added the functionality to move the mouse to an element.
Displays a prompt when an element cannot be found.
Fixed the webpage scrolling bug.
New Field Function at Extract Data operation.
The task name is initialized with the value of the page title upon the first visit.
Added version update prompts.
Added the information of the publisher as requested.
Updated Chrome version to 113.
Published by NaiboWang over 1 year ago
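The Windows 64-bit Beta has been uploaded. Everyone is welcome to test it; please open an issue promptly if you find problems or bugs. The official release and builds for other operating systems will go live as v0.3.1 before the end of May.
A video covering the latest features has been uploaded to Bilibili; it is very useful and recommended viewing.
[Important] Custom condition checks using the return value of a JS command inside a loop item - Part 2
Note that the .json files in the task folder of v0.3.0 are not compatible with v0.2.0. Please redesign your tasks in v0.3.0.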
Published by NaiboWang over 1 year ago
A completely restructured version that supports the following features:
Published by NaiboWang over 1 year ago
This version has been deprecated; please download the latest version.
Supported on Windows 10/11 x64 (amd64), Windows 10/11 x86 (386), Windows 7 (.NET Framework 4.7 required), Linux x64 (tested on Ubuntu 20.04 and above), and MacOS x64 (both Intel and Arm chips such as M1).