Rust challenge 54/100 - duplicate file finder
Table of content
What is this  
The rules of the game are explained in my original.
54th Challenge
Challenge
Find all files that have at least two instances and output them on the terminal.
Solution  
This is fairly simple, the result is seen here. This was a lot of fun and could be made into actual tool :)
     use std::collections::HashSet;
    use clap::{App, Arg};
    fn main() {
        let matches = App::new("dupdel")
            .version("1.0")
            .author("Maebli")
            .about("Finds duplicate files in current directory recursively")
            .arg(Arg::with_name("folder")
                .short('f')
                .long("folder")
                .help("Specify the folder in which the duplicates should be searched")
                .takes_value(true))
            .get_matches();
        let folder = matches.value_of("folder").unwrap_or(".");
        let hash_set = HashSet::new();
        find_duplicates(hash_set, folder);
    }
    fn find_duplicates(mut hash_set: HashSet<Vec<u8>>, folder: &str) -> HashSet<Vec<u8>>{
        if let Some(folder) = std::fs::read_dir(folder).ok() {
            for entry in folder {
                if let Ok(entry) = entry {
                    if entry.path().exists() {
                        if entry.path().is_dir() {
                            hash_set = find_duplicates(hash_set, entry.path().to_str().unwrap());
                        } else {
                            let file = std::fs::read(entry.path()).unwrap();
                            if hash_set.contains(&file) {
                                println!("🪡 Found duplicate: {}", entry.path().to_str().unwrap());
                            } else {
                                hash_set.insert(file);
                            }
                        }
                    }
                }
            }
        }
        hash_set
    }
    See the repo on github. Intrestingly enough I found this article which describes typical pitfalls and found this repos that is apprently one of the best tools for this task.




