podRacer - Functionality

Foundation

What it does

podRacer is built primarily in Python 3, with C++ bindings used to create the graphical user interface and other custom modules. The application is cross-platform and was written from the ground up to support both x86/x64 and ARM64 architectures.

Source: https://docs.python.org

podRacer has three core functionalities:

Process RSS feeds and locate related content
Organize concise reports from podcast metadata
Download media and automatically sort the content

Process, Organize and Download — that's what podRacer is designed to do.

How it works

podRacer is a multi-threaded application, which means it has room in the back for a car seat to take the kids to school on its way to work. Tasks such as fetch calls and downloads are handled by separate thread workers. This not only keeps the application stable in the event of network disruptions, but also allows each worker to be individually optimized by the application and the respective OS.

By using QRunnable and WorkerSignals, podRacer is able to designate a background task and communicate with it from the main application. This keeps the main application free to take on more work from the user. Here's how it does it:

Python

from PyQt6.QtCore import QRunnable, pyqtSlot
...

class Worker(QRunnable):
    def __init__(self, fn, *args, **kwargs):
        super(Worker, self).__init__()

        # The passed in function
        self.fn = fn

        # The passed in arguments
        self.args = args

        # Keyword arguments
        self.kwargs = kwargs

        # Create the worker signals to communicate task progress
        self.signals = WorkerSignals()
        ...

    # The thread's worker function
    @pyqtSlot()
    def run(self):

        # Runs the passed in function, with its arguments
        result = self.fn(*self.args, **self.kwargs)

        # Returns the results of the thread
        self.signals.result.emit(result)

        # Signal that the thread has completed its task
        self.signals.finished.emit()
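
The same hand-off can be tried without Qt installed. The sketch below is a standard-library analogue, not podRacer's code: `run_in_background`, `on_result` and `on_finished` are hypothetical names that stand in for the worker and its `result` and `finished` signals.

```python
from concurrent.futures import ThreadPoolExecutor

def run_in_background(pool, fn, on_result, on_finished, *args, **kwargs):
    """Submit fn to the pool, report its result, then report completion."""
    def task():
        try:
            # Run the passed in function and hand its result to the callback
            on_result(fn(*args, **kwargs))
        finally:
            # Unlike the Qt example, completion is reported even if fn raises
            on_finished()
    return pool.submit(task)

events = []
with ThreadPoolExecutor(max_workers=2) as pool:
    future = run_in_background(
        pool,
        lambda a, b: a + b,                              # stand-in for a fetch or download
        lambda value: events.append(("result", value)),  # plays the role of signals.result
        lambda: events.append(("finished",)),            # plays the role of signals.finished
        2, 3,
    )
    future.result()  # block until the background task has ended
```

The callbacks run on the worker thread here. In Qt, signals are what carry the result back to the main thread safely, which is why the original example uses WorkerSignals.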

You can also use a threadpool and create separate thread classes to handle designated workloads.
Note: I wrote this just as an example. If you're using this for your own application, please adapt it accordingly!

Python

from PyQt6.QtCore import QThreadPool
from PyQt6.QtWidgets import QMainWindow
import multiprocessing
...

class MainWindow(QMainWindow):
    def __init__(self, *args, **kwargs):
        super(MainWindow, self).__init__(*args, **kwargs)
        self.threadpool = QThreadPool()
        ...
        self.thread={}
        self.button.clicked.connect(lambda: self.start_worker(1))

    def threadRunner(self):

        # Create our thread and give it a function as the argument
        self.worker = Worker(self.some_function)

        # Connect result signal of our thread to thread_result
        self.worker.signals.result.connect(self.thread_result)

        # Connect finish signal of our thread to thread_complete
        self.worker.signals.finished.connect(self.thread_finished)

        # Hand the worker to the thread pool, which runs it on a background thread
        # (wrapping this call in multiprocessing.Process would call start() right away
        # and give the new process nothing to run)
        self.threadpool.start(self.worker)

    # Start Worker
    def start_worker(self, i):

        # Start Thread
        self.thread[i] = ThreadClass(parent=None, index=i)
        self.thread[i].start()

        # Connect Signal
        self.thread[i].any_signal.connect(self.background_task)

    def background_task(self):
        index = self.sender().index
        if index == 1:
            pass  # DO SOMETHING
        elif index == 2:
            pass  # DO SOMETHING ELSE
        elif index == 3:
            pass  # DO SOMETHING ELSE

    def thread_result(self, result):
        pass  # Signal Results

    def thread_finished(self):
        pass  # Signal Completion of Task

    def stop_worker(self, i):
        self.thread[i].stop()

if __name__ == '__main__':

    # Necessary for executables
    multiprocessing.freeze_support()

Fetch Calls

What it does

Sends a request to read certain properties of an RSS feed in XML / HTML format

Read more about it here: Starter Guide - Fetch Calls

How it works

With each fetch call, podRacer tests your network connection to ensure reliable transmissions.

network_test = http.client.HTTPSConnection("8.8.8.8", timeout=5)  # requires: import http.client
try:
    network_test.request("HEAD", "/")
    return True # A valid network connection has been established
...

During downloads, a background task monitors your system's performance and dynamically adjusts resource management, including CPU usage and network bandwidth. If your download speed is slow, or if your CPU is under-performing, podRacer will scale back its download thread worker to accommodate. While this may result in a slower download, it ensures other processes aren't affected.

def performance_limiter():
    p = psutil.Process(os.getpid())
    p.nice(psutil.BELOW_NORMAL_PRIORITY_CLASS) # Windows only; on Unix use p.nice(19)

Once the fetch request is received, podRacer will parse the XML data for key information. The 'Show Metadata' is passed into the user interface, which then displays this information to the user. Each 'item' referring to an episode is counted to deliver the total episode count.

XML

<!-- SHOW METADATA -->
<title>The Daily</title>
<description>This is what the news should sound like...</description>
<language>en</language>
<pubDate>Mon, 13 Jun 2022 10:00:00 +0000</pubDate>
<lastBuildDate>Mon, 13 Jun 2022 10:00:23 +0000</lastBuildDate>
<itunes:author>The New York Times</itunes:author>

...

<!-- EPISODE METADATA -->
<item>
    <title>The Incomplete Picture of the War in Ukraine</title>
    <description>In the nearly four months since the Russian invasion of Ukraine...</description>
    <pubDate>Mon, 13 Jun 2022 10:00:00 +0000</pubDate>
    <enclosure length="21322381" type="audio/mpeg" url="https://dts.podtrac.com/redirect.mp3/chrt.fm/track/8DB4DB/pdst.fm/e/nyt.simplecastaudio.com/03d8b493-87fc-4bd1-931f-8a8e9b945d8a/episodes/8f932970-ceda-4af8-83df-b022ed3ad0be/audio/128/default.mp3?aid=rss_feed&awCollectionId=03d8b493-87fc-4bd1-931f-8a8e9b945d8a&awEpisodeId=8f932970-ceda-4af8-83df-b022ed3ad0be&feed=54nAGcIl"/>
    <itunes:author>The New York Times</itunes:author>
    <itunes:duration>00:22:12</itunes:duration>
</item>

Python

# No. of Episodes
episode_count = 0

# Array of RSS Elements
items = rss_data.findAll('item')

# Get Show Title
show_title = rss_data.find('title').text

# Get Show Author
show_author = rss_data.find('itunes:author').text

# Get Latest Episode
latest_ep_date = rss_data.find('pubdate').text

# Get Episode Data
for item in items:

    # Create Dict
    rss_item = {}

    # Collect Title, Description and Publish Date
    rss_item['title'] = item.title.get_text(strip=False)
    rss_item['description'] = item.description.text
    rss_item['pubdate'] = item.pubdate.text.split(',')[1]

# Accumulate Episode Count
episode_count = len(items)
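
For readers who want to try the parsing step on its own, here is a minimal sketch using only the standard library's ElementTree rather than the parser shown above. `summarize_feed` and `SAMPLE_FEED` are hypothetical names, and the second episode in the sample is a placeholder, not real feed data.

```python
import xml.etree.ElementTree as ET

# Namespace used by the <itunes:...> tags
ITUNES_NS = {"itunes": "http://www.itunes.com/dtds/podcast-1.0.dtd"}

SAMPLE_FEED = """<rss version="2.0" xmlns:itunes="http://www.itunes.com/dtds/podcast-1.0.dtd">
  <channel>
    <title>The Daily</title>
    <itunes:author>The New York Times</itunes:author>
    <pubDate>Mon, 13 Jun 2022 10:00:00 +0000</pubDate>
    <item>
      <title>The Incomplete Picture of the War in Ukraine</title>
      <pubDate>Mon, 13 Jun 2022 10:00:00 +0000</pubDate>
    </item>
    <item>
      <title>Placeholder Episode</title>
      <pubDate>Fri, 10 Jun 2022 10:00:00 +0000</pubDate>
    </item>
  </channel>
</rss>"""

def summarize_feed(xml_text):
    """Return the show metadata and the episode count for an RSS document."""
    channel = ET.fromstring(xml_text).find("channel")
    items = channel.findall("item")
    return {
        # findtext only looks at direct children, so episode titles are not matched
        "title": channel.findtext("title"),
        "author": channel.findtext("itunes:author", namespaces=ITUNES_NS),
        "latest": channel.findtext("pubDate"),
        "episode_count": len(items),
        "episodes": [item.findtext("title") for item in items],
    }

summary = summarize_feed(SAMPLE_FEED)
```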

Update the interface with the show metadata

Qt (PyQt6)

# QLabels...
self.title = QLabel('Title')
self.author = QLabel('Author')
self.last_updated = QLabel('Last Updated:')
self.length = QLabel('Episode Count:')

...

# Update QLabel Text...
self.title.setText(show_title)
self.author.setText(show_author)
self.last_updated.setText(f"Last Updated: {latest_ep_date}")
self.length.setText(f"Episode Count: {episode_count}")

You can also generate XML data from text, which can then be used as an RSS feed. The easiest and most straightforward way to do this is by creating a spreadsheet. podRacer can read the information from your sheet and will generate an RSS feed that can be fetched locally and easily shared online.

The RSS data is structured to meet Apple's standard formatting practices for RSS feeds. Here's an example of how it retrieves the titles of shows within a spreadsheet using the openpyxl library.

## GET ALL SHOW TITLES
# If the sheet includes headers, the row start will begin right after the header
# The row end is determined by the max amount of rows within the document using 'sheet.max_row'
for data in range(row_start, row_end):

    ## READ VALUE OF CELL
    # The column is automatically determined if your header includes the substring 'show'.
    # If omitted, podRacer will ask the user to designate the show column.
    curr_value = sheet.cell(row=data, column=show_col).value

    ## IF THE CELL CONTAINS DATA...
    if curr_value is not None:

        ## ...AND THE DATA IS A VALID INPUT...
        # Having a validation method in place is highly advised to ensure the data is accurately handled.
        # In this case, the validator checks for patterns to see if this content matches the type of data - in this case a show title.
        # Duplicate data entries for multiple shows, no accounts of web protocols ('https') and checking for dates are some examples of this form of validation.
        if self.validate(curr_value, 'show_title'):

            ## ...ADD THE DATA TO OUR LIST OF SHOW TITLES
            self.list_of_show_titles.append(curr_value.strip())
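
The validator itself is not shown above, so the following is only a sketch of what such a check could look like. `looks_like_show_title` is a hypothetical function, and the two rules it applies (no web protocols, no dates) are taken from the examples mentioned in the comments. The date formats it tries are assumptions.

```python
import re
from datetime import datetime

# Date layouts to reject; an assumption, extend to match your spreadsheets
DATE_FORMATS = ("%Y-%m-%d", "%m/%d/%Y")

def looks_like_show_title(value):
    """Return True if a cell value plausibly holds a show title."""
    if not isinstance(value, str):
        return False
    text = value.strip()
    if not text:
        return False

    # Links belong in the media column, not the show column
    if re.match(r"^(https?|ftp)://", text, flags=re.IGNORECASE):
        return False

    # Reject cells that parse cleanly as a date
    for fmt in DATE_FORMATS:
        try:
            datetime.strptime(text, fmt)
            return False
        except ValueError:
            pass

    return True
```

A check like this is cheap to run on every cell, so it can sit directly inside the loop above without slowing down large imports.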

Once the necessary data has been gathered, it can be used to generate our XML feed. Show titles, episode titles and media links are required entries, while episode descriptions, publishing dates and other metadata are all optional but highly recommended.

Here is how we can use the parsed spreadsheet data to generate our new RSS feed.

## ITERATE OVER EACH EPISODE ENTRY AND ADD IT TO THE RSS FEED
for x in range(entry_count):

    ## GET THE EPISODE'S LINK
    ep_link = self.list_of_links[x]

    ## GET EPISODE'S TITLE
    ep_title = self.list_of_episodes[x]

    ## GET EPISODE'S PUBDATE
    ep_pubdate = self.list_of_pubdates[x]

    ## CREATE ITUNES ENTRY
    # This generates the iTunes tags used by Apple Podcasts
    itunes_entry = self.create_itunes_entry(ep_title)

    ## CREATE XML ENTRY
    # This generates the XML tags
    entry = self.create_rss_entry(ep_title, ep_link, ep_pubdate, itunes_entry)

    ## ADD THE XML ENTRY TO A LIST OF ENTRIES
    episode_entries.append(entry)

The XML tags are created using custom classes that format the metadata into the respective entry type within our XML data.

def create_itunes_entry(self, ep_title):

    return iTunesItem(
        author = "Generated by podRacer",
        image = "https://www.podracer.app/assets/images/icon.png",
        subtitle = ep_title,
        summary = ep_title)

def create_rss_entry(self, ep_title, ep_link, ep_pubdate, itunes):

    return Item(
        title = ep_title,
        author = "Generated by podRacer",
        pubDate = ep_pubdate,
        enclosure = Enclosure(url=ep_link, length=0, type='audio/mpeg'),
        extensions = [itunes]
    )

Each XML entry is then gathered to generate the complete XML feed.
Here's an example of how this is achieved.

## CREATE ITUNES FEED DATA
itunes = iTunes(
    author = 'podRacer',
    subtitle = show_title,
    summary = "podRacer Generated RSS Feed",
    image = "https://www.podracer.app/assets/images/icon.png",
    categories = iTunesCategory(name = 'Technology', subcategory = 'Software'),
    owner = iTunesOwner(name = 'podRacer', email = 'contact@mafshari.work'))

## ADD ENTRIES TO OUR RSS FEED
feed = Feed(
    title = show_title,
    link = 'https://podracer.app',
    description = "podRacer Generated RSS Feed",
    generator = 'podRacer v2.5',
    docs = 'https://www.podracer.app/documentation/',
    language = "en-US",
    lastBuildDate = self.latest_episode,
    items = episode_entries,
    extensions = [itunes])

## WRITE TO XML FILE
with open(show_rss_file, mode="w", encoding="utf-8") as xml:
    xml.write(feed.rss())
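
To make the shape of the generated XML easier to picture, here is a small sketch that builds a comparable feed with the standard library's ElementTree. It is not how podRacer produces its feeds (that is done with the classes above). `build_feed` and the episode dictionary keys are hypothetical.

```python
import xml.etree.ElementTree as ET

def build_feed(show_title, episodes):
    """Build a minimal RSS 2.0 document and return it as a string."""
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = show_title

    for ep in episodes:
        item = ET.SubElement(channel, "item")

        # Required entries: episode title and media link
        ET.SubElement(item, "title").text = ep["title"]
        ET.SubElement(item, "enclosure", url=ep["link"], length="0", type="audio/mpeg")

        # Optional entry: publishing date
        if ep.get("pubdate"):
            ET.SubElement(item, "pubDate").text = ep["pubdate"]

    return ET.tostring(rss, encoding="unicode")

feed_xml = build_feed("Example Show", [
    {"title": "Episode 1",
     "link": "https://example.com/ep1.mp3",
     "pubdate": "Mon, 13 Jun 2022 10:00:00 +0000"},
])
```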

These XML files can then be fetched locally or shared online as RSS feeds. If an import includes multiple shows, each show will have its own respective RSS feed generated. Alongside the XML data are metadata reports, created for every show in your import, similar to traditional fetch calls.

Metadata & Reporting

What it does

Using the XML data from a fetch request, podRacer is able to automatically generate comprehensive reports of podcasts, conveniently gathering metadata from each individual episode and creating an interactive webpage that can be easily hosted and shared.

You can view a sample report here: podRacer - Sample Metadata

How it works

A dataframe is created using the XML data, along with HTML, CSS and JavaScript files to give the report some style and functionality. By default, metadata reports have dark-mode enabled, with 'click-to-copy' functionality built-in.

Here is an example of how one might generate one of these reports using HTML, CSS and JavaScript.

Python

import pandas as pd
...

rss = pd.DataFrame(rss_items, columns=[...])
...

def build_report(meta_html, meta_css, meta_js, rss, title, directory):

    pd.set_option('colheader_justify', 'center')

    html_content = '''
    <html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
        <head>
            <meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
            <title>{show_title}</title>
            <link rel="stylesheet" type="text/css" href="assets/style.css"/>
        </head>
        <body>
            <h1>{show_title}</h1>
            {table}
        </body>
        <script src="assets/script.js"></script>
    </html>
    '''

    css_content = '''

    # Custom CSS...

    '''

    js_content = '''

    # Custom Javascript...

    '''

    # Create HTML file
    with open(meta_html, 'w+') as htmlFile:
        htmlFile.write(html_content.format(show_title=title, table=rss.to_html(classes='podracer')))

    # Create CSS file
    with open(meta_css, 'w+') as cssFile:
        cssFile.write(css_content)

    # Create JS file
    with open(meta_js, 'w+') as jsFile:
        jsFile.write(js_content)

In addition to the main report, smaller text files are created that each contain specific information, such as a list of all episode names or links to all the audio files for the podcast.

Once all the content has been gathered and the metadata files have been created, podRacer will validate the report and finally merge the three separate files into one.

# Custom method that combines the HTML/CSS/JS files into one report within the show's directory
self.merge_report(meta_html, meta_css, meta_js, title, directory)
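
The merge method is not shown here, so the following is only a guess at the general idea: swap the stylesheet link and the script reference for inline blocks. `inline_assets` is a hypothetical helper that works on strings rather than on files, and it assumes the exact tags used in the HTML template above.

```python
def inline_assets(html, css, js):
    """Return one HTML document with the stylesheet and script inlined."""
    link_tag = '<link rel="stylesheet" type="text/css" href="assets/style.css"/>'
    script_tag = '<script src="assets/script.js"></script>'

    # Replace the external references with inline blocks
    merged = html.replace(link_tag, f"<style>\n{css}\n</style>")
    merged = merged.replace(script_tag, f"<script>\n{js}\n</script>")
    return merged

page = (
    '<html><head>'
    '<link rel="stylesheet" type="text/css" href="assets/style.css"/>'
    '</head><body><h1>Report</h1>'
    '<script src="assets/script.js"></script>'
    '</body></html>'
)
report = inline_assets(page, "body { background: #111; }", "console.log('ready');")
```

Plain string replacement is brittle if the template changes, which is one reason to validate the report before merging.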

While it is generally advised to keep your HTML, CSS and JS separated (see separation of concerns), this convention is overlooked for the sake of having each report be one file instead of three. I could have easily made a PDF out of the whole thing and been done with it, but I like retaining the web formatting (HTML) to allow users the ability to easily upload and host their own reports and to easily adjust the styling without needing any third-party PDF editors.

A built-in theme picker for podRacer is currently in development.

Download Media

What it does

Whenever I'm downloading content from the web, I notice that I always repeat these three actions:

I rename the downloaded files to match the name of the content, regardless of what the original filename was
I organize the files within directories so I know exactly where to look to find what I'm looking for
I check if an existing file has been altered/updated and skip downloading it if it hasn't

Naturally, podRacer does all of these things for you without the need for any user intervention.

How it works

By using the metadata from a fetch call, podRacer is able to download media with ease while applying some organizational touches to keep everything neat and tidy.

Media links are parsed to check for certain things like domain hosts, SSL certificates, filenames, media types, and redundancies like redirects and affiliate links. Here's what a media link looks like from the XML data:

<enclosure url="https://dts.podtrac.com/redirect.mp3/chrt.fm/track/8DB4DB/pdst.fm/e/nyt.simplecastaudio.com/03d8b493-87fc-4bd1-931f-8a8e9b945d8a/episodes/8f932970-ceda-4af8-83df-b022ed3ad0be/audio/128/default.mp3?aid=rss_feed&awCollectionId=03d8b493-87fc-4bd1-931f-8a8e9b945d8a&awEpisodeId=8f932970-ceda-4af8-83df-b022ed3ad0be&feed=54nAGcIl"/>

podRacer automatically amends these links, so it only saves the parts we actually need to retrieve the content.

Here's an example of how that works:

Python

# Amend Download Links
def amend_link(self, link):

    # Checks web protocol
    protocol = self.check_protocol(link) # https://

    # Strip protocol from link
    link = link.split(protocol)[-1]

    # Checks for affiliate links
    podtrac = 'dts.podtrac.com/'
    pdst = 'pdst.fm/'
    ...

    # Checks for redirects
    redirect = 'redirect.mp3/'
    ...

    # List of things to remove
    garbage_collector = []

    # Find redundancies
    if podtrac in link:
        garbage_collector.append(podtrac)
    if redirect in link:
        garbage_collector.append(redirect)
    ...

    # Amend link by removing each redundancy outright
    # (replacing with a space would leave gaps inside the link)
    for garbage in garbage_collector:
        link = link.replace(garbage, '')

    # Remove any surrounding whitespace
    new_link = link.strip()

    # Return the link with the proper protocol added
    return f"{protocol}{new_link}"

Here is our updated media link:

<enclosure url="https://nyt.simplecastaudio.com/03d8b493-87fc-4bd1-931f-8a8e9b945d8a/episodes/8f932970-ceda-4af8-83df-b022ed3ad0be/audio/128/default.mp3"/>

Note: The file containing the media links keeps the full address of each link. Links are only amended when a download is initiated, so that the download process doesn't have to deal with redirects or invalid inputs. The original links are otherwise saved unaltered within the podcast's directory.
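
As an alternative take on the same clean-up, the sketch below uses `urllib.parse` and anchored patterns so that only leading tracking prefixes are removed and the query string is dropped. `strip_tracking` and `PREFIX_PATTERNS` are hypothetical, and the pattern list is an assumption covering just the services visible in the example link.

```python
import re
from urllib.parse import urlsplit

# Leading path segments added by tracking and redirect services (assumed list)
PREFIX_PATTERNS = [
    re.compile(r"^dts\.podtrac\.com/redirect\.mp3/"),
    re.compile(r"^chrt\.fm/track/[^/]+/"),
    re.compile(r"^pdst\.fm/e/"),
]

def strip_tracking(url):
    """Remove leading tracking prefixes and the query string from a media link."""
    parts = urlsplit(url)
    rest = parts.netloc + parts.path  # the query and fragment are dropped here

    # Keep peeling prefixes until none of the patterns match any more
    changed = True
    while changed:
        changed = False
        for pattern in PREFIX_PATTERNS:
            rest, hits = pattern.subn("", rest, count=1)
            if hits:
                changed = True

    return f"{parts.scheme}://{rest}"

example = ("https://dts.podtrac.com/redirect.mp3/chrt.fm/track/8DB4DB/pdst.fm/e/"
           "nyt.simplecastaudio.com/03d8b493-87fc-4bd1-931f-8a8e9b945d8a/episodes/"
           "8f932970-ceda-4af8-83df-b022ed3ad0be/audio/128/default.mp3?aid=rss_feed")
clean = strip_tracking(example)
```

Anchoring each pattern to the start of the remaining link avoids accidentally cutting a matching substring out of the middle of a legitimate path.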

Next up is checking for the file's audio format

Python

# Detect Audio Format
def audio_format(self, file):
    self.format = file.split('.')[-1]
    self.format = re.sub(r"[^a-zA-Z0-9.]+","",self.format)

    # If file requires tokenauth
    if 'tokentime' in self.format:
        self.format = self.format.split('tokentime')[0]

    return self.format
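
If a link still carries a query string, splitting on the last '.' can pick up part of the query. A hedged alternative, assuming the extension always lives in the URL path, is to parse the URL first. `audio_extension` is a hypothetical helper.

```python
import posixpath
from urllib.parse import urlsplit

def audio_extension(url):
    """Return the lowercase file extension from a URL's path, or '' if none."""
    path = urlsplit(url).path              # ignores ?query and #fragment
    ext = posixpath.splitext(path)[1]      # e.g. '.mp3'
    return ext.lstrip(".").lower()
```

For example, `audio_extension("https://example.com/audio/default.MP3?aid=rss_feed")` gives `"mp3"`, and a link with no extension gives an empty string that the caller can treat as "unknown format".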

We also have our filenames to think about.

Currently, our filename is 'default.mp3'. If you're just downloading one file from the web, it's not that much of a burden to have to rename it — but when downloading hundreds of files all at once, it would be nice if they weren't all called 'default.mp3', 'default(1).mp3', 'default(2).mp3', etc.

Because podRacer already has the name of the podcast, the names of all the episodes, the links to all the audio and what format the audio files are in, it uses this information in its download process to organize your content.

Python

# Download Media Method
...
self.download_media(f"{show_dir}/{media_dir}/{title}.{format}", media.content)
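
Episode titles often contain characters that are not allowed in filenames, so the path usually needs a little cleaning before it is used. The sketch below shows one way to do that. `media_path` is a hypothetical helper, and the set of replaced characters is an assumption based on what common file systems reject.

```python
import re
from pathlib import PurePosixPath

def media_path(show_dir, media_dir, title, fmt):
    """Build '<show_dir>/<media_dir>/<title>.<fmt>' with a filesystem-safe title."""
    # Swap characters that are invalid on common file systems for underscores
    safe_title = re.sub(r'[\\/:*?"<>|]', "_", title).strip()
    return PurePosixPath(show_dir) / media_dir / f"{safe_title}.{fmt}"

path = media_path("Example Show", "Audio", "Part 1/2: Intro?", "mp3")
```

`PurePosixPath` is used only to keep the example's output identical on every platform. A real download would more likely use `pathlib.Path`.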

Another aspect of the download process is cross-referencing files with those available for free. This is done to ensure the 'paid' content you're downloading isn't in fact identical to the free version available elsewhere. These 'freemium' episodes are automatically flagged and a list of episodes that match their free counterparts can be found within the show's metadata directory.

Note: This only occurs when downloading media from a feed that is different from the free one available through Apple Podcasts and other hosting platforms.

The validation checks are conducted by using podRacer's search feature. When downloading media from a 'non-free' RSS feed, a background task searches for the identical episode by making calls via the iTunes API. With this information, a set of comparisons is made to see if these assets are identical. Each comparison carries its own weight value, so that the differences most likely to settle the question count the most.

The 'heaviest' of these comparisons is the size of the files, measured in bytes. Here's how that's done.

Python

import math
import requests

# (function names are illustrative)
def get_file_size(url):

    ## READ HEADER
    resp = requests.request('HEAD', url)

    ## VERIFY CONTENT LENGTH KEY IN HEADER
    if 'Content-Length' not in resp.headers:
        return None

    ## CAST FILE SIZE AS INT AND RETURN IT IN BYTES
    return int(resp.headers['Content-Length'])

def format_file_size(file_size):

    ## CONVERT FILE SIZE
    if file_size == 0:
        return "0B"
    size_name = ("B", "KB", "MB", "GB", "TB", "PB", "EB", "ZB", "YB")
    i = int(math.floor(math.log(file_size, 1024)))
    p = math.pow(1024, i)
    s = round(file_size / p, 2)

    ## ALTERNATIVELY, RETURN FORMATTED FILE SIZE
    return "%s %s" % (s, size_name[i])
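
The weighting idea described above can be sketched as a simple score. Everything in this block is hypothetical: the field names, the weights and the threshold are placeholders chosen for illustration, with file size given the largest weight because the text calls it the 'heaviest' comparison.

```python
# Hypothetical weights; whole numbers keep the arithmetic exact
WEIGHTS = {"size": 5, "duration": 3, "title": 2}

def match_score(paid, free, weights=WEIGHTS):
    """Add up the weights of every field on which the two episodes agree."""
    return sum(
        weight
        for field, weight in weights.items()
        if paid.get(field) is not None and paid.get(field) == free.get(field)
    )

def is_freemium(paid, free, threshold=7):
    """Flag the paid episode if it matches its free counterpart closely enough."""
    return match_score(paid, free) >= threshold

paid_ep = {"size": 21322381, "duration": "00:22:12", "title": "Episode A"}
free_ep = {"size": 21322381, "duration": "00:22:12", "title": "Episode A (Preview)"}
flagged = is_freemium(paid_ep, free_ep)  # size and duration agree, title does not
```

With these placeholder numbers, a matching size alone is not enough to flag an episode, but size plus any one other field is.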

These checks are all done in the background and don't impede or interrupt a download session in any way. If you have audio assets that you want to check manually, a standalone analyzer is also available. Simply run a search for the show the audio is from, then run the analyzer via File > Run Analyzer. A Finder window will ask you to locate the audio file, and upon selection the analyzer will run for a few seconds and then tell you if the asset you have is 'free' or 'premium' content.

iTunes API

What it does

podRacer uses the iTunes API to find and organize various content.

Source: https://developer.apple.com/

This integration allows you to search for various Apple hosted content and gives podRacer the ability to collect certain metadata from podcasts.

How it works

A request is sent to the iTunes API using the following URL format:

https://itunes.apple.com/search?parameterkeyvalue

Here is a list of valid parameters to use this API:

Key	Description	Values
term	The URL-encoded text string you want to search for. For example: jack+johnson.	Any URL-encoded text string. Note: URL encoding replaces spaces with the plus (+) character and all characters except the following are encoded: letters, numbers, periods (.), dashes (-), underscores (_), and asterisks (*).
country	The two-letter country code for the store you want to search. The search uses the default store front for the specified country. For example: US. The default is US.	See ISO_3166-1_alpha-2 for a list of ISO Country Codes.
media	The media type you want to search for. For example: movie. The default is all.	movie, podcast, music, musicVideo, audiobook, shortFilm, tvShow, software, ebook, all
entity	The type of results you want returned, relative to the specified media type. For example: movieArtist for a movie media type search. The default is the track entity associated with the specified media type.	podcastAuthor, podcast
attribute	The attribute you want to search for in the stores, relative to the specified media type. For example, if you want to search for an artist by name specify `entity=allArtist&attribute=allArtistTerm`.	titleTerm, languageTerm, authorTerm, genreIndex, artistTerm, ratingIndex, keywordsTerm, descriptionTerm
callback	The name of the Javascript callback function you want to use when returning search results to your website.	wsSearchCB
limit	The number of search results you want the iTunes Store to return. For example: 25. The default is 50.	1 to 200
lang	The language, English or Japanese, you want to use when returning search results. Specify the language using the five-letter codename. For example: en_us. The default is en_us (English).	en_us, ja_jp
version	The search result key version you want to receive back from your search. The default is 2.	1,2
explicit	A flag indicating whether or not you want to include explicit content in your search results. The default is Yes.	Yes, No
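
To show how the parameters in this table combine into a request URL, here is a short sketch using `urllib.parse.urlencode`, which applies the plus-for-space encoding described for `term`. `build_search_url` is a hypothetical helper and the defaults it uses are this example's choices, not podRacer's.

```python
from urllib.parse import urlencode

SEARCH_ENDPOINT = "https://itunes.apple.com/search"

def build_search_url(term, country="US", media="podcast", limit=50):
    """Return a search URL with URL-encoded key/value parameters."""
    params = {"term": term, "country": country, "media": media, "limit": limit}
    return f"{SEARCH_ENDPOINT}?{urlencode(params)}"

url = build_search_url("jack johnson", media="music", limit=25)
# url == "https://itunes.apple.com/search?term=jack+johnson&country=US&media=music&limit=25"
```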

The Search API returns your search results in JavaScript Object Notation (JSON) format. JSON is built on two structures:

A collection of name/value pairs, also known as an object; this concept is similar to a Java Map object, a JavaScript object, or a Perl/Ruby hash. An object is an unordered set of name/value pairs, beginning with a left brace ( { ) and ending with a right brace ( } ). Each name is surrounded by double-quotes and followed by a colon ( : ); the name/value pairs are separated by commas ( , ).

An ordered list of values, also known as an array. An array is an ordered collection of values, beginning with a left bracket ( [ ) and ending with a right bracket ( ] ). Values are separated by commas ( , ).

The following example displays the JSON results for a song in the iTunes Store (encoded in UTF8):

JSON

{"wrapperType":"track",
    "kind":"song",
    "artistId":909253,
    "collectionId":120954021,
    "trackId":120954025,
    "artistName":"Jack Johnson",
    "collectionName":"Sing-a-Longs and Lullabies for the Film Curious George",
    "trackName":"Upside Down",
    "collectionCensoredName":"Sing-a-Longs and Lullabies for the Film Curious George",
    "trackCensoredName":"Upside Down",
    "artistViewUrl":"https://itunes.apple.com/WebObjects/MZStore.woa/wa/viewArtist?id=909253",
    "collectionViewUrl":"https://itunes.apple.com/WebObjects/MZStore.woa/wa/viewAlbum?i=120954025&id=120954021&s=143441",
    "trackViewUrl":"https://itunes.apple.com/WebObjects/MZStore.woa/wa/viewAlbum?i=120954025&id=120954021&s=143441",
    "previewUrl":"http://a1099.itunes.apple.com/r10/Music/f9/54/43/mzi.gqvqlvcq.aac.p.m4p",
    "artworkUrl60":"http://a1.itunes.apple.com/r10/Music/3b/6a/33/mzi.qzdqwsel.60x60-50.jpg",
    "artworkUrl100":"http://a1.itunes.apple.com/r10/Music/3b/6a/33/mzi.qzdqwsel.100x100-75.jpg",
    "collectionPrice":10.99,
    "trackPrice":0.99,
    "collectionExplicitness":"notExplicit",
    "trackExplicitness":"notExplicit",
    "discCount":1,
    "discNumber":1,
    "trackCount":14,
    "trackNumber":1,
    "trackTimeMillis":210743,
    "country":"USA",
    "currency":"USD",
    "primaryGenreName":"Rock"}

Using this information, podRacer is able to gather certain information, such as authors, availability, genre, artwork, etc.
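
As a rough illustration of that step, the sketch below pulls a few fields out of a response. The Search API wraps its matches in a `results` array alongside a `resultCount`; the sample here is trimmed down from the song example above. `summarize_results` is a hypothetical helper, and which fields podRacer actually reads is not documented here.

```python
import json

SAMPLE_RESPONSE = """
{"resultCount": 1,
 "results": [
   {"wrapperType": "track",
    "kind": "song",
    "artistName": "Jack Johnson",
    "trackName": "Upside Down",
    "primaryGenreName": "Rock",
    "artworkUrl100": "http://a1.itunes.apple.com/r10/Music/3b/6a/33/mzi.qzdqwsel.100x100-75.jpg"}
 ]}
"""

def summarize_results(payload):
    """Return author, title, genre and artwork for each result in a response."""
    data = json.loads(payload)
    return [
        {
            "author": result.get("artistName"),
            "title": result.get("trackName"),
            "genre": result.get("primaryGenreName"),
            "artwork": result.get("artworkUrl100"),
        }
        for result in data.get("results", [])
    ]

summary = summarize_results(SAMPLE_RESPONSE)
```

Using `.get()` keeps the helper from failing when a result type omits one of the keys.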

Search Engine

What it does

podRacer has a built-in search engine that uses the iTunes API to find podcasts using various search parameters. If a search query returns multiple results, a list of podcasts is displayed on the page, sorted by how likely each one is to be the show you're looking for, with the option of running a fetch call for each one.

If a search parameter is a direct match with a low likelihood of alternative matches, a fetch call is initiated automatically on your behalf.

How it works

When a new search is conducted, the tool first searches for any matching titles. If a matching title cannot be found, it passes the search parameter to the next query and searches for matching authors (artistName) instead.

Here's an example of how this works:

def itunes_api(search, country, type):
    search_request = requests.get("https://itunes.apple.com/search", params={
        'term': search,       # Radiolab
        'country': country,   # 'us'
        'entity': type,       # 'podcast'
        ...
    })

    return search_request.json()

def search_itunes(self, search):

    # Results from iTunes
    search_result = PodSearch.itunes_api('Radiolab', 'us', 'podcast')
    ...

    # RSS Feed URL from our search
    result_rss = result['results'][0]['feedUrl']

    # Run fetch call on RSS feed
    podRacer.fetch_RSS(result_rss)
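
The title-then-author fallback described above is not shown in the example, so here is a minimal sketch of that control flow. `find_podcasts` is hypothetical, and `search_fn` stands for any callable that performs the actual API request. `titleTerm` and `authorTerm` are the attribute values listed in the parameter table.

```python
def find_podcasts(term, search_fn):
    """Search by title first, then fall back to a search by author."""
    for attribute in ("titleTerm", "authorTerm"):
        results = search_fn(term=term, entity="podcast", attribute=attribute)
        if results:
            # Report which query produced the match along with the results
            return attribute, results
    return None, []

# Stand-in for a real API call so the flow can be tried offline
def fake_search(term, entity, attribute):
    catalog = {
        ("Radiolab", "titleTerm"): [{"collectionName": "Radiolab"}],
        ("WNYC Studios", "authorTerm"): [{"collectionName": "Radiolab"}],
    }
    return catalog.get((term, attribute), [])

matched_by, results = find_podcasts("WNYC Studios", fake_search)  # falls back to authorTerm
```

Passing the search function in as an argument keeps the fallback logic testable without a network connection.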

This method also applies to the Apple Music API, which is used to search for artists. You can run a Music API search the same way by using File > Apple Music > $artistName.

Both the Music and Podcasts apps have their own respective launchers and API calls through podRacer. This lets you easily find the content you're looking for, and it also gives you a quick way to reference content across the platform: you can pull up shows and artists in each respective app without having to interact with those apps separately.

Download podRacer and try its features out for yourself! If you have any questions about how it works, feel free to contact me or check out the support page.