Insta_Delete

by eddyizm

on March 8th 2020

Since 2010, I have posted more images on IG than on Flickr: 8,200+, with a good chunk of them "throwaway" images. I had already backed up my photos, so now came the challenge of cleaning up my feed. Spring cleaning, pruning the hedges, whatever you want to call it, IG doesn't make it easy.



I decided to build myself a bot (really a glorified script) that first scrolls as far back as possible on my feed, scrapes the page for URLs by parsing out the href links, saves them to a file, then logs in with a mobile-emulated browser and deletes those old posts.





See screenshot of my starting profile, 8262 posts.



I wrote a script that is working now, albeit not the most Pythonic or the cleanest. It is doing its job, considering I have less than a year of experience with Python (my background is in the Microsoft stack), and I am continually amazed by the power and ease of Python and its ecosystem of packages and community.

(Full project available on my GitHub)

First, I load the necessary packages:

# -*- coding: UTF-8 -*-
from selenium import webdriver
from selenium.webdriver.common.action_chains import ActionChains
from selenium.common.exceptions import TimeoutException
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.chrome.options import Options
from bs4 import BeautifulSoup, SoupStrainer
from datetime import datetime
import time
import os
import sys

Set your paths to the url file and login details (login details can be entered directly):


# store urls to delete later
log_path = 'C:/Users/eddyizm/Source/Repos/seleniumTesting/env/media_urls.txt'
logintext = "C:\\Users\\eddyizm\\Desktop\\Work\\login.txt"
URLS = []

Next, I define helper functions to pause between steps, read the URL file, write URLs to it, parse the href data with Beautiful Soup, and scroll the profile to the end:


def stime(seconds):
    return time.sleep(seconds)

def OpenLog():
    with open(log_path, 'r', encoding= 'utf-8') as g:
        lines = g.read().splitlines()
        return (lines)

def WriteToArchive(log, data):
    with open(log, 'w', encoding= 'utf-8') as f:
        for d in data:
            if d.startswith('https://www.instagram.com/'):
                f.write(str(d)+'\n')
            else:
                f.write('https://www.instagram.com'+str(d)+'\n')
            
             
def parse_href(data):
    url_list = []
    for link in BeautifulSoup(data, "html.parser", parse_only=SoupStrainer('a') ):
        if link.has_attr('href'):
            t = link.get('href')
            if t is not None:
                url_list.append(t)
                
    return url_list            

def scroll_to_end():
    # JavaScript that scrolls to the bottom and returns the new page height
    scroll_js = "window.scrollTo(0, document.body.scrollHeight);var lenOfPage=document.body.scrollHeight;return lenOfPage;"
    browser = webdriver.Chrome()
    get_html = None
    print (datetime.now().strftime("%Y-%m-%d %H:%M:%S"))
    print ('scrolling profile to get more urls')
    try:
        browser.get("https://www.instagram.com/eddyizm")
        lenOfPage = browser.execute_script(scroll_js)
        match = False
        count = 0
        while not match:
            lastCount = lenOfPage
            time.sleep(10)  # give the next batch of posts time to load
            lenOfPage = browser.execute_script(scroll_js)
            count += 1
            # stop once the page height stops growing, but only after more than 100 scrolls
            if (lastCount == lenOfPage) and (count > 100):
                match = True

        get_html = browser.page_source
        print ('scrolled down: '+str(count)+' times!')
        print (datetime.now().strftime("%Y-%m-%d %H:%M:%S"))
    except Exception as err:
        print (err)
    finally:
        # close the browser on both success and failure
        browser.close()

    return get_html
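
To make the parse-and-save step easier to see in isolation, here is a minimal sketch that runs without a browser. It swaps Beautiful Soup for the standard library's html.parser so it has no dependencies, and it assumes (based on the log output further down) that only links containing "/p/" are posts worth keeping. The names HrefCollector and extract_post_urls and the sample HTML are made up for illustration; they are not part of the project.

```python
from html.parser import HTMLParser

BASE_URL = "https://www.instagram.com"


class HrefCollector(HTMLParser):
    """Collect the href attribute of every <a> tag (stdlib stand-in for SoupStrainer('a'))."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value is not None:
                    self.hrefs.append(value)


def extract_post_urls(html):
    """Return absolute post URLs in page order, without duplicates."""
    collector = HrefCollector()
    collector.feed(html)
    urls = []
    for href in collector.hrefs:
        # Assumption: post links contain "/p/"; everything else is skipped.
        if "/p/" not in href:
            continue
        # Same normalization as WriteToArchive: prefix relative links with the site root.
        full = href if href.startswith(BASE_URL) else BASE_URL + href
        if full not in urls:
            urls.append(full)
    return urls


# Made-up sample markup standing in for browser.page_source
sample_html = """
<a href="/p/NJM0L/?taken-by=eddyizm">one</a>
<a href="/explore/">explore</a>
<a href="https://www.instagram.com/p/NLNSX/?taken-by=eddyizm">two</a>
<a href="/p/NJM0L/?taken-by=eddyizm">duplicate</a>
"""
post_urls = extract_post_urls(sample_html)
print(post_urls)
```

With the sample markup, only the two distinct post links survive, both as absolute URLs, which is the shape the URL file needs.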

Finally, I added the function that logs in to the site and deletes the images. Unfortunately, I had a hard time breaking it up into smaller chunks; the challenge is how to pass the Selenium browser session back and forth. I believe that is possible, but I have barely skimmed the docs. This long function uses the functions above to log in and delete the posts. I currently have the counter set to 15 posts per run, up from 10, and the script runs 5 times a day. Please visit the repo for the source.
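
One way the session could be passed around is simply as a function argument: create the browser once, then hand the same object to each smaller function. The sketch below is not the code in the repo. make_mobile_browser, delete_batch, delete_post and FakeBrowser are hypothetical names, the "Nexus 5" device name is assumed to be one your Chrome version recognizes, and the real login and delete clicks are left out because they depend on Instagram's page markup. A fake browser stands in so the batching logic can be run without Selenium.

```python
def make_mobile_browser(device_name="Nexus 5"):
    """Start Chrome with mobile emulation (needs selenium and chromedriver installed)."""
    from selenium import webdriver
    from selenium.webdriver.chrome.options import Options

    options = Options()
    # "mobileEmulation" is a ChromeDriver experimental option; the device name is an assumption.
    options.add_experimental_option("mobileEmulation", {"deviceName": device_name})
    return webdriver.Chrome(options=options)


def delete_batch(browser, urls, delete_post, limit=15):
    """Delete at most `limit` posts with one shared browser; return (deleted, remaining)."""
    deleted = []
    for url in urls[:limit]:
        delete_post(browser, url)  # the same session is reused for every post
        deleted.append(url)
        print('POST DELETED: ' + url)
    return deleted, urls[len(deleted):]


class FakeBrowser:
    """Stand-in for a Selenium driver that only records the pages it visited."""

    def __init__(self):
        self.visited = []

    def get(self, url):
        self.visited.append(url)


def fake_delete_post(browser, url):
    # A real version would open the post and click through the delete dialog.
    browser.get(url)


browser = FakeBrowser()
urls = ["https://www.instagram.com/p/%d/" % i for i in range(30)]  # made-up URLs
deleted, remaining = delete_batch(browser, urls, fake_delete_post, limit=15)
print(len(deleted), len(remaining))
```

The remaining list is what would be written back to the URL file, so the next scheduled run picks up where this one stopped.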

Scheduled Task / Crontab
On my Windows machine I set up a scheduled task that fires off the script via a batch file, which activates the virtual environment and appends the output to a log file. Linux and Mac would be just as easy using crontab.


REM ************************************************************
REM Batch file to run python script
REM ************************************************************

@echo off
cmd /k "cd /d C:\Users\eddyizm\Source\Repos\seleniumTesting\env\Scripts && activate && cd /d  C:\Users\eddyizm\Source\Repos\seleniumTesting && python insta_delete.py >> C:\Users\eddyizm\Source\Repos\seleniumTesting\env\log.txt"  


Log file output
Handy for debugging and for keeping track of how long the scrolling takes and how the deleting is progressing. I tail this file to my Dropbox or email to keep an eye on it.


----------------------------------------------------------------------------------------------------- 
--------------------------------------- new session ------------------------------------------------- 
2018-08-07 11:00:18
----------------------------------------------------------------------------------------------------- 
file size: 0
file empty, going to scroll
2018-08-07 11:00:22
scrolling profile to get more urls
scrolled down: 617 times!
2018-08-07 12:43:22
logging in as mobile device to delete
2018-08-07 12:43:28
length of file: 30
counter: 10
DELETING POSTS!
2018-08-07 12:43:59
POST DELETED: https://www.instagram.com/p/NJM0L/?taken-by=eddyizm
POST DELETED: https://www.instagram.com/p/NLNSX/?taken-by=eddyizm
POST DELETED: https://www.instagram.com/p/NOkLl/?taken-by=eddyizm
POST DELETED: https://www.instagram.com/p/NO2KG/?taken-by=eddyizm
POST DELETED: https://www.instagram.com/p/NPJCZ/?taken-by=eddyizm
POST DELETED: https://www.instagram.com/p/NPSq-/?taken-by=eddyizm
POST DELETED: https://www.instagram.com/p/NPS6H/?taken-by=eddyizm
POST DELETED: https://www.instagram.com/p/NUlgG/?taken-by=eddyizm
POST DELETED: https://www.instagram.com/p/NUnRd/?taken-by=eddyizm
POST DELETED: https://www.instagram.com/p/NX6FM/?taken-by=eddyizm
POST DELETED: https://www.instagram.com/p/NYC8u/?taken-by=eddyizm
while loop done and exited successfully
2018-08-07 12:55:06
----------------------------------------------------------------------------------------------------- 
2018-08-07 12:55:08
--------------------------------------- end session ------------------------------------------------- 
----------------------------------------------------------------------------------------------------- 
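
The "file size: 0" and "file empty, going to scroll" lines in the log suggest a check on the URL file before each run. A minimal sketch of how such a check might look follows; needs_scroll is a hypothetical name rather than the function used in the repo, and a temporary file stands in for media_urls.txt.

```python
import os
import tempfile


def needs_scroll(path):
    """Return True when the URL file is missing or empty, meaning a new scroll is needed."""
    size = os.path.getsize(path) if os.path.exists(path) else 0
    print('file size: ' + str(size))
    if size == 0:
        print('file empty, going to scroll')
    return size == 0


with tempfile.TemporaryDirectory() as tmp:
    url_file = os.path.join(tmp, 'media_urls.txt')

    # Start with an empty file, as in the logged session.
    open(url_file, 'w', encoding='utf-8').close()
    empty_before = needs_scroll(url_file)

    # Once URLs have been saved, the scroll step can be skipped.
    with open(url_file, 'w', encoding='utf-8') as f:
        f.write('https://www.instagram.com/p/NJM0L/?taken-by=eddyizm\n')
    empty_after = needs_scroll(url_file)
```

Skipping the scroll when URLs are still queued matters here, since the logged scroll alone took well over an hour.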

Result:

After running for roughly one month, I have already removed over 1,600 old posts.




To contribute or use the code yourself: https://github.com/eddyizm/insta_delete
