macOS File Indexing Using Python

A Python-based solution for indexing and searching files on a macOS system

The tool indexes every file on a specified volume, stores each file's metadata in an SQLite database, and supports fast lookups through semantic similarity and fuzzy matching.

Download the script here: https://github.com/earthinversion/macos_file_indexing

🚀 Features

  • Index all files on a specified macOS volume
  • Store metadata (file path, file kind, size, volume name, modified time) in SQLite
  • Perform semantic search using FAISS and Sentence Transformers
  • Fuzzy matching to find similar filenames
  • Formatted search results in a human-readable table
  • Fast repeated searches, because the FAISS search cache is reused until you choose to rebuild it
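
The semantic search feature pairs text embeddings with a nearest-neighbour index. The dependency-free sketch below illustrates the idea only: it swaps Sentence Transformers for toy character-trigram counts and FAISS for a brute-force cosine-similarity scan. The names `trigram_vector`, `cosine`, and `top_k` are hypothetical and are not taken from the project.

```python
import math
from collections import Counter


def trigram_vector(name):
    """Toy stand-in for an embedding: character-trigram counts of a filename."""
    padded = f"  {name.lower()} "
    return Counter(padded[i:i + 3] for i in range(len(padded) - 2))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[gram] for gram, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def top_k(query, filenames, k=3):
    """Brute-force nearest-neighbour search: score every name, keep the best k."""
    q = trigram_vector(query)
    scored = [(name, cosine(q, trigram_vector(name))) for name in filenames]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]


# Filenames taken from the sample session later in this post.
names = ["location_info.yml", "run_info.yml", "config.yaml", "wpa_supplicant.conf"]
results = top_k("location_info.yaml", names, k=2)
for name, score in results:
    print(f"{score:.2f}  {name}")
```

A real embedding model captures meaning rather than spelling, and an index such as FAISS avoids scoring every entry on each query, which is what makes the approach practical for a whole volume.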

🔧 Installation

1️⃣ Install Dependencies

pip install faiss-cpu sentence-transformers fuzzywuzzy pandas tabulate tqdm numpy python-Levenshtein

The sqlite3 module ships with Python's standard library, so it does not need to be installed with pip.

2️⃣ Clone the Repository

git clone https://github.com/earthinversion/macos-file-indexing.git
cd macos-file-indexing

3️⃣ Set Up the Database

  • Edit the configuration in the config.yaml file
  • Run the following command to build the indexing database:
python file_indexer.py
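
To make the indexing step concrete, here is a minimal sketch of walking a directory tree and storing per-file metadata in SQLite. It is not the project's `file_indexer.py`: the `build_index` function, the `files` table, and its column names are assumptions, and `mimetypes` is used as a simple stand-in for whatever detection produces the "File Kind" values in the project's output.

```python
import mimetypes
import os
import sqlite3
from datetime import datetime


def build_index(root, db_path, volume):
    """Walk `root`, store one metadata row per file, and return the row count."""
    conn = sqlite3.connect(db_path)
    # Hypothetical schema mirroring the metadata fields listed under Features.
    conn.execute(
        """CREATE TABLE IF NOT EXISTS files (
               path TEXT PRIMARY KEY,
               name TEXT,
               kind TEXT,
               size INTEGER,
               volume TEXT,
               modified TEXT
           )"""
    )
    rows = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                info = os.stat(path)
            except OSError:
                continue  # skip unreadable or vanished files
            kind = mimetypes.guess_type(name)[0] or "unknown"
            modified = datetime.fromtimestamp(info.st_mtime).strftime("%Y-%m-%d %H:%M:%S")
            rows.append((path, name, kind, info.st_size, volume, modified))
    # INSERT OR REPLACE lets the index be rebuilt without duplicate-path errors.
    conn.executemany("INSERT OR REPLACE INTO files VALUES (?, ?, ?, ?, ?, ?)", rows)
    conn.commit()
    conn.close()
    return len(rows)
```

A call such as `build_index("/Volumes/QSIS_DISK", "file_index.db", "QSIS_DISK")` would index the volume shown in the sample output below; the database filename here is illustrative. Keep the database outside the indexed tree if you do not want it to index itself.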

🔍 Searching for Files

  • To search for a file, use:
python search_files.py
Do you want to rebuild the search cache? (yes/no): no
Enter filename to search: location_info.yaml

❌ Exact match not found. Suggested files:
+---+-------------------------------------------------------------------+------------------------------------+--------------+-----------+---------------------+
|   |                               Path                                |             File Kind              | Size (bytes) |  Volume   |    Modified Time    |
+---+-------------------------------------------------------------------+------------------------------------+--------------+-----------+---------------------+
| 0 | /Volumes/QSIS_DISK/event_data_download_waveform_api/._config.yaml | AppleDouble encoded Macintosh file |   4.00 KB    | QSIS_DISK | 2025-01-26 14:18:00 |
| 1 |          /Volumes/QSIS_DISK/QSIS-Server-run/run_info.yml          |             ASCII text             |    101 B     | QSIS_DISK | 2022-06-27 23:55:45 |
| 2 |     /Volumes/QSIS_DISK/qsis-server-inspect/data/run_info.yml      |             ASCII text             |   2.46 KB    | QSIS_DISK | 2023-03-18 02:08:30 |
| 3 |           /Volumes/QSIS_DISK/line-bot-qsis/config.yaml            |             ASCII text             |    140 B     | QSIS_DISK | 2023-01-14 17:21:30 |
| 4 |  /Volumes/QSIS_DISK/event_data_download_waveform_api/config.yaml  |      Unicode text, UTF-8 text      |    511 B     | QSIS_DISK | 2025-01-25 12:51:35 |
+---+-------------------------------------------------------------------+------------------------------------+--------------+-----------+---------------------+

🔍 Best fuzzy match:
+---+----------------------------------------------------------+------------+--------------+-----------+---------------------+
|   |                           Path                           | File Kind  | Size (bytes) |  Volume   |    Modified Time    |
+---+----------------------------------------------------------+------------+--------------+-----------+---------------------+
| 0 | /Volumes/QSIS_DISK/qsis-server-inspect/location_info.yml | ASCII text |   1.10 KB    | QSIS_DISK | 2023-04-07 22:32:39 |
+---+----------------------------------------------------------+------------+--------------+-----------+---------------------+
Enter filename to search: wpa_supplicant.conf

✅ Exact match found:
+---+----------------------------------------+------------+--------------+-----------+---------------------+
|   |                  Path                  | File Kind  | Size (bytes) |  Volume   |    Modified Time    |
+---+----------------------------------------+------------+--------------+-----------+---------------------+
| 0 | /Volumes/QSIS_DISK/wpa_supplicant.conf | ASCII text |    161 B     | QSIS_DISK | 2022-03-30 20:18:02 |
+---+----------------------------------------+------------+--------------+-----------+---------------------+
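
The session above tries an exact filename match first and falls back to similar names when none is found. The sketch below reproduces that flow under stated assumptions: it uses the standard library's `difflib` as a stand-in for fuzzywuzzy, and `search`, `format_size`, the `files` table, and the 1024-based size units are all hypothetical, not confirmed details of `search_files.py`.

```python
import difflib
import sqlite3


def format_size(num_bytes):
    """Render a byte count in the style of the sample output, e.g. 161 -> '161 B'."""
    if num_bytes < 1024:
        return f"{num_bytes} B"
    size = float(num_bytes)
    for unit in ("KB", "MB", "GB", "TB"):
        size /= 1024  # assumption: binary units
        if size < 1024 or unit == "TB":
            return f"{size:.2f} {unit}"


def search(conn, filename, cutoff=0.6):
    """Return ('exact', rows) on a filename hit, else ('fuzzy', rows) for close names."""
    query = "SELECT path, size FROM files WHERE name = ?"
    exact = conn.execute(query, (filename,)).fetchall()
    if exact:
        return "exact", exact
    names = [row[0] for row in conn.execute("SELECT DISTINCT name FROM files")]
    close = difflib.get_close_matches(filename, names, n=5, cutoff=cutoff)
    rows = []
    for name in close:  # get_close_matches returns the best match first
        rows.extend(conn.execute(query, (name,)).fetchall())
    return "fuzzy", rows


# Tiny in-memory index; the second byte count is illustrative.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE files (path TEXT, name TEXT, size INTEGER)")
conn.executemany(
    "INSERT INTO files VALUES (?, ?, ?)",
    [
        ("/Volumes/QSIS_DISK/wpa_supplicant.conf", "wpa_supplicant.conf", 161),
        ("/Volumes/QSIS_DISK/qsis-server-inspect/location_info.yml", "location_info.yml", 1126),
    ],
)

for term in ("wpa_supplicant.conf", "location_info.yaml"):
    kind, rows = search(conn, term)
    for path, size in rows:
        print(kind, path, format_size(size))
```

Checking for an exact match before any fuzzy scoring keeps the common case to a single SQL lookup; the similarity pass only runs when that lookup returns nothing.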

Utpal Kumar

Geophysicist | Geodesist | Seismologist | Open-source Developer
I am a geophysicist with a background in computational geophysics, currently working as a postdoctoral researcher at UC Berkeley. My research focuses on seismic data analysis, structural health monitoring, and understanding deep Earth structures. I have had the opportunity to work on diverse projects, from investigating building characteristics using smartphone data to developing 3D models of the Earth's mantle beneath the Yellowstone hotspot.

In addition to my research, I have experience in cloud computing, high-performance computing, and single-board computers, which I have applied in various projects. This includes working with platforms like AWS, GCP, Linode, DigitalOcean, as well as supercomputing environments such as STAMPEDE2, ANVIL, Savio and PERLMUTTER (and CORI). My work involves developing innovative solutions for structural health monitoring and advancing real-time seismic response analysis. I am committed to applying these skills to further research in computational seismology and structural health monitoring.
