.git folders on deployed websites shouldn’t be accessible to the public, since they can contain sensitive information such as database passwords, API keys and development IDE settings. Ideally, such sensitive data shouldn’t be stored in a .git directory at all, and if it is, the directory should be hidden from the public. Czech researcher Vladimír Smitka found about 400,000 websites with exposed .git folders, which were found to contain sensitive information.
Web developers who keep a .git directory on a deployed site often confirm that it is not publicly available by trying to open <web-site>/.git/. Receiving a 403 error appears to confirm that the directory is not accessible, which is the expected result. However, testing <web-site>/.git/ alone can be misleading: the 403 error merely reflects the fact that the directory has no index.html or index.php and that autoindex is disabled, so individual files inside the directory may still be served when requested directly. Smitka recommends carrying out the test with <web-site>/.git/HEAD instead.
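The check described above can be sketched in a few lines of Python. This is a minimal illustration, not Smitka's actual script: the function names `looks_like_git_head` and `git_head_exposed` are hypothetical, and treating a bare hexadecimal hash (a detached HEAD) as a match is an added assumption that goes beyond the “refs” string mentioned in the research.

```python
import urllib.error
import urllib.request


def looks_like_git_head(body: str) -> bool:
    """Return True if the text resembles the contents of a real .git/HEAD file."""
    text = body.strip()
    # A normal HEAD file points at a branch, e.g. "ref: refs/heads/master".
    if text.startswith("ref:") and "refs/" in text:
        return True
    # Assumption: a detached HEAD holds a bare object hash (40 or 64 hex chars).
    return len(text) in (40, 64) and all(c in "0123456789abcdef" for c in text)


def git_head_exposed(base_url: str, timeout: float = 10.0) -> bool:
    """Request <base_url>/.git/HEAD and report whether a Git HEAD file came back."""
    url = base_url.rstrip("/") + "/.git/HEAD"
    try:
        # urlopen raises HTTPError for 403/404, so reaching read() means success.
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            body = resp.read(1024).decode("utf-8", errors="replace")
    except (urllib.error.URLError, OSError, ValueError):
        return False
    # Inspect the body so that custom error pages served with 200 are not counted.
    return looks_like_git_head(body)
```

Calling `git_head_exposed("https://example.com")` on a site you are responsible for returns `True` only when the HEAD file itself is served, which is the condition the 403 on /.git/ fails to rule out.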
Smitka began his research by analyzing Czech websites back in July. He acquired a list of Czech domains and modified his scripts to check for the presence of the /.git/HEAD file and the “refs” string within it. In less than 2 days, he found 1,925 Czech sites with an accessible .git repository. He dug deeper to verify the severity and found both database passwords and unauthenticated uploaders. The list of commits in the /.git/logs/HEAD file usually contains the email addresses of the developers who made the commits. He used these email addresses to contact the site developers and warn them about the issue, writing a script to automate the notifications. After a month of sending alerts, he rescanned the sites and was pleased to discover that the .git directory was accessible on just 874 of them.
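Extracting contact addresses from a /.git/logs/HEAD file can be sketched as follows. This assumes the standard Git reflog line layout (old hash, new hash, name, address in angle brackets, timestamp, then a tab and the message); the helper name `committer_emails` is hypothetical and the code is an illustration rather than the script Smitka used.

```python
import re

# Matches an address enclosed in angle brackets, as Git writes committer identities.
EMAIL_IN_REFLOG = re.compile(r"<([^<>\s]+@[^<>\s]+)>")


def committer_emails(reflog_text: str) -> list[str]:
    """Return the unique committer addresses in a reflog, in first-seen order."""
    seen: dict[str, None] = {}
    for line in reflog_text.splitlines():
        # The identity precedes the tab; the free-form commit message follows it.
        identity = line.split("\t", 1)[0]
        match = EMAIL_IN_REFLOG.search(identity)
        if match:
            # Lowercase so the same address in different casing is counted once.
            seen.setdefault(match.group(1).lower(), None)
    return list(seen)
```

Deduplicating per file matters because one developer typically appears on many reflog lines; merging the per-site results in the same way would be one route to a single list of unique addresses to notify.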
After the discoveries on Czech websites, Vladimír Smitka decided to expand his search to a worldwide scale. He admits this was challenging, as he had to obtain a very large list of domains and then scan a huge volume of data. He ended up with more than 230 million domains. During the scan, he found 390,000 web pages with an open .git directory. In the second part of the scan, he visited these pages again to collect email contacts from the /.git/logs/HEAD file and obtained 290,000 email addresses tied to different domains. These were reduced to a list of 90,000 unique email addresses, and warnings were sent out over the course of a week.
In conclusion, Smitka said:
“In the end, I would like to recommend to everyone that you watch what you upload to your website more carefully – it’s not just about system versions but also various temporary test scripts. It is also good to remember that things are changing – server configurations and team members, and what doesn’t seem like a problem today may be problem tomorrow.”