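I have a folder A that contains 1880 HTML files, and a subfolder B inside it that contains 50 HTML files. I need code that lists every file that exists only in folder A, i.e. the difference 1880 minus 50. I tried the following code, but it does not give the expected result. I suspect the problem is that Python cannot tell the main folder apart from the subfolder.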
```python
import os

folder1 = r"C:\Folder-Oana\extracted"
folder2 = r"C:\Folder-Oana\extracted\translated"

# Get the list of HTML files in each folder
html_files_folder1 = [f.lower() for f in os.listdir(folder1) if f.lower().endswith('.html')]
html_files_folder2 = [f.lower() for f in os.listdir(folder2) if f.lower().endswith('.html')]

# Find the difference between the two file lists
missing_files = list(set(html_files_folder1) - set(html_files_folder2))

# Display the missing files
if missing_files:
    print("HTML files present in folder 1 but not found in folder 2:")
    for filename in missing_files:
        print(filename)
else:
    print("No HTML files were found in folder 1 that are missing from folder 2.")
```
This code gives the wrong result: it reports that there are no missing files.
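One thing worth ruling out: `os.listdir` is not recursive. It returns only the names directly inside the given folder, so the subfolder shows up as the single name `translated`, which the `.html` filter already drops. The sketch below does the same comparison with `pathlib`, keeps only regular files via `Path.is_file()`, and compares names case-insensitively. The helper name `html_only_in_parent` is hypothetical, and the sketch assumes only the top level of each folder matters.

```python
from pathlib import Path


def html_only_in_parent(parent, child):
    """Hypothetical helper: return sorted lowercase names of .html files that
    sit directly in `parent` but have no same-named (case-insensitive) .html
    file directly in `child`. Neither folder is searched recursively."""

    def html_names(folder):
        # iterdir() yields only the immediate entries of `folder`;
        # is_file() skips subdirectories such as "translated".
        return {
            p.name.lower()
            for p in Path(folder).iterdir()
            if p.is_file() and p.suffix.lower() == ".html"
        }

    # Set difference: names in the parent folder that the child folder lacks.
    return sorted(html_names(parent) - html_names(child))
```

Called as `html_only_in_parent(r"C:\Folder-Oana\extracted", r"C:\Folder-Oana\extracted\translated")`, it should return the names that exist only in `extracted`. If the result is still empty on the real folders, printing `len()` of each name set is a quick way to check whether the script is reading the folders you expect.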
Example:

Contents of `c:\Folder-Oana\extracted`:

```
c:\Folder-Oana\extracted\translated\ <DIR> 01/05/2024 13:36 ----
c:\Folder-Oana\extracted\2.html
c:\Folder-Oana\extracted\3.html
c:\Folder-Oana\extracted\4.html
c:\Folder-Oana\extracted\5.html
c:\Folder-Oana\extracted\11.html
c:\Folder-Oana\extracted\12.html
c:\Folder-Oana\extracted\13.html
c:\Folder-Oana\extracted\14.html
c:\Folder-Oana\extracted\15.html
c:\Folder-Oana\extracted\16.html
c:\Folder-Oana\extracted\17.html
```

Contents of `c:\Folder-Oana\extracted\translated`:

```
c:\Folder-Oana\extracted\translated\11.html
c:\Folder-Oana\extracted\translated\12.html
c:\Folder-Oana\extracted\translated\13.html
c:\Folder-Oana\extracted\translated\14.html
c:\Folder-Oana\extracted\translated\15.html
c:\Folder-Oana\extracted\translated\16.html
c:\Folder-Oana\extracted\translated\17.html
c:\Folder-Oana\extracted\translated\18.html
c:\Folder-Oana\extracted\translated\19.html
c:\Folder-Oana\extracted\translated\24.html
c:\Folder-Oana\extracted\translated\25.html
c:\Folder-Oana\extracted\translated\26.html
```
Expected output:

```
2.html
3.html
4.html
5.html
```
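The set difference itself can be checked without touching the disk. This sketch hard-codes the example names listed above and sorts the result by the numeric part of each name, which assumes every name has the form `<integer>.html` as in the example.

```python
# Names copied from the example listing above.
extracted = ["2.html", "3.html", "4.html", "5.html", "11.html", "12.html",
             "13.html", "14.html", "15.html", "16.html", "17.html"]
translated = ["11.html", "12.html", "13.html", "14.html", "15.html", "16.html",
              "17.html", "18.html", "19.html", "24.html", "25.html", "26.html"]

# Names present in `extracted` but absent from `translated`.
only_in_extracted = set(extracted) - set(translated)

# Sort numerically so "2.html" comes before "11.html";
# assumes every name looks like "<integer>.html".
result = sorted(only_in_extracted, key=lambda name: int(name.split(".")[0]))

for name in result:
    print(name)
```

On this data the result matches the expected output, so an empty result on the real folders would point at the inputs (the paths or the actual file names) rather than at the set difference.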
Version 2 (this version also fails to detect the files that exist only in the `\extracted` folder; instead it compares all the HTML files):
```python
import os
import filecmp

def compare_folders(folder1, folder2):
    dcmp = filecmp.dircmp(folder1, folder2)
    diff_files = dcmp.diff_files
    if diff_files:
        print(f"The following files differ between {folder1} and {folder2}:")
        for file in diff_files:
            print(f" - {file}")
    else:
        print(f"No differences found between {folder1} and {folder2}.")

if __name__ == "__main__":
    folder1 = r"C:\Folder-Oana\extracted"
    folder2 = r"C:\Folder-Oana\extracted\translated"
    compare_folders(folder1, folder2)
```
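`filecmp.dircmp.diff_files` lists files that exist in both folders but whose contents differ, so it never reports files that are present on only one side. The `dircmp` attribute for names found only in the first folder is `left_only`. A minimal sketch, assuming only the top level of each folder matters; the helper name `html_only_in_left` is hypothetical. Because `left_only` also contains directories (here the `translated` subfolder itself), the names are filtered to `.html`.

```python
import filecmp


def html_only_in_left(folder1, folder2):
    """Hypothetical helper: return sorted .html names found directly in
    folder1 but not directly in folder2 (top level only)."""
    dcmp = filecmp.dircmp(folder1, folder2)
    # left_only holds entries that exist only in folder1: files *and*
    # directories (e.g. "translated"), so keep just the .html names.
    return sorted(name for name in dcmp.left_only if name.lower().endswith(".html"))
```

With the paths from the question, `html_only_in_left(r"C:\Folder-Oana\extracted", r"C:\Folder-Oana\extracted\translated")` should return only the top-level `.html` names missing from `translated`. `dcmp.right_only` gives the opposite direction if that is ever needed.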