Python PDF to Speech Web App - Build a simple Flask application to convert PDF files to audio.
Have you ever wanted to listen to a PDF instead of reading it? Whether you’re busy, multitasking, or simply prefer audio, you’re in luck! In this blog post, we will guide you through creating a PDF-to-Audio converter using Python. This tool will take a PDF file, extract its text, convert the text into audio, and save it as an MP3 file.
We’ll be using some powerful Python libraries to accomplish this task: PyMuPDF for reading PDF files, Google Text-to-Speech (gTTS) for converting text to speech, and Flask for creating a simple web interface. Let’s start!
Before we start coding, make sure you have Python installed on your computer. You can download Python from python.org. We will also need to install a few Python libraries. Open your terminal and run the following commands:
pip install Flask
pip install PyMuPDF
pip install gTTS
These commands will install Flask, PyMuPDF, and Google Text-to-Speech (gTTS).
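If you want to confirm the installation before moving on, a quick check like the following can help. It is only a sketch: the helper name missing_packages is made up for this post, and it relies on the three packages being imported as flask, fitz, and gtts.

```python
import importlib.util


def missing_packages(module_names):
    """Return the module names that cannot be found in this environment."""
    return [name for name in module_names
            if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # Import names differ from the pip names: PyMuPDF -> fitz, gTTS -> gtts.
    missing = missing_packages(["flask", "fitz", "gtts"])
    if missing:
        print("Not installed yet:", ", ".join(missing))
    else:
        print("All three packages are importable.")
```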
First, let’s create a project directory. Open your terminal and run:
mkdir pdf_to_audio
cd pdf_to_audio
Inside this directory, we will create the necessary files for our project:
app.py: This will contain our Flask application.
templates/index.html: This will be our HTML file for the web interface.
static/style.css: This will contain our CSS styles.
Let’s start by creating these files.
app.py
Open your text editor or IDE and create a new file named app.py. This file will contain our Flask application.
First, let’s import the necessary modules and set up our Flask app:
from flask import Flask, render_template, request, send_file, jsonify
import os
import fitz # PyMuPDF
from gtts import gTTS # Google Text-to-Speech
from concurrent.futures import ThreadPoolExecutor
import threading
app = Flask(__name__)
app.secret_key = "your-secret-key"  # Replace with your own secret key
from flask import Flask, render_template, request, send_file, jsonify: Imports the pieces of Flask, a Python web framework, that we need. They create the app, render templates, read incoming requests, send files, and turn Python data into JSON responses.
import os: Provides functions for interacting with the operating system. We use it here for file and directory operations.
import fitz: PyMuPDF, the library we use to read PDF files and extract their text.
from gtts import gTTS: Lets us convert text into spoken audio (MP3 files) through Google Translate’s text-to-speech API.
from concurrent.futures import ThreadPoolExecutor: A high-level interface for running functions asynchronously in a pool of threads.
import threading: Python’s threading module. We use its Lock to protect shared data when several threads run at the same time.
app = Flask(__name__): Creates a new Flask application instance.
app.secret_key: Sets the secret key for the Flask application. Replace the placeholder with your own value. Flask uses the key to sign session data and for other cryptographic operations, so it should be kept private and secure.
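One common way to produce a suitable key is Python’s standard secrets module. The sketch below is an assumption about how you might wire it up, not part of the original app: the environment variable name PDF_AUDIO_SECRET_KEY and the helper load_secret_key are invented for illustration.

```python
import os
import secrets


def load_secret_key(env_var="PDF_AUDIO_SECRET_KEY"):
    """Return the key from the environment, or a fresh random one.

    env_var is a hypothetical variable name chosen for this example.
    """
    key = os.environ.get(env_var)
    if key:
        return key
    # 32 random bytes rendered as 64 hexadecimal characters.
    return secrets.token_hex(32)


if __name__ == "__main__":
    print(load_secret_key())
```

You could then write app.secret_key = load_secret_key(). Keep in mind that a randomly generated fallback changes on every restart, which invalidates any existing sessions.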
We will keep track of the conversion status using a shared dictionary. We’ll use a lock to ensure thread-safe updates to this dictionary:
status = {
    "current_chunk": 0,
    "total_chunks": 0,
    "status": "idle"
}
status_lock = threading.Lock()

def update_status(current, total, state):
    with status_lock:
        status["current_chunk"] = current
        status["total_chunks"] = total
        status["status"] = state
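To see the locking pattern from the status code in isolation, here is a small self-contained sketch. The get_status_snapshot helper and the worker function are additions made for this example; they are not part of the app.

```python
import threading

status = {"current_chunk": 0, "total_chunks": 0, "status": "idle"}
status_lock = threading.Lock()


def update_status(current, total, state):
    # All three fields change together while the lock is held.
    with status_lock:
        status["current_chunk"] = current
        status["total_chunks"] = total
        status["status"] = state


def get_status_snapshot():
    # Hypothetical helper: return a copy taken while holding the lock,
    # so callers never see a half-updated dictionary.
    with status_lock:
        return dict(status)


def worker(total):
    # Stand-in for the conversion loop: report progress after each chunk.
    for i in range(total):
        update_status(i + 1, total, "processing")
    update_status(total, total, "done")


if __name__ == "__main__":
    t = threading.Thread(target=worker, args=(5,))
    t.start()
    t.join()
    print(get_status_snapshot())
```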
In the status-tracking code:
status is a dictionary that stores information about the current state of the conversion.
current_chunk: The number of the chunk (part) of the PDF text being processed.
total_chunks: The total number of chunks that need to be processed.
status: The current state of the conversion, which starts as “idle”.
status_lock = threading.Lock(): Creates a Lock object named status_lock. A Lock ensures that only one thread at a time can access a shared resource (here, the status dictionary), so updates to the dictionary are handled safely when multiple threads are involved.
update_status(current, total, state) takes three parameters:
current: The chunk number currently being processed.
total: The total number of chunks.
state: The state of the conversion (e.g., “processing” or “done”).
with status_lock: Runs the indented block while holding status_lock, which prevents other threads from modifying status at the same time. Inside the with block, the three assignments store current, total, and state under the current_chunk, total_chunks, and status keys.
In simpler terms:
We use status to keep track of where we are in converting a PDF to audio.
status_lock ensures that only one part of the program can update this dictionary at a time, preventing mix-ups.
The update_status function makes it easy to record which part of the PDF we’re on and whether the conversion is done or still going.
Next, we need a function to convert the PDF to text. We will use PyMuPDF for this:
def pdf_to_text(pdf_file):
    text = ""
    try:
        doc = fitz.open(pdf_file)
        for page_num in range(len(doc)):
            page = doc.load_page(page_num)
            text += page.get_text()
        doc.close()
    except Exception as e:
        print(f"Error reading PDF: {e}")
        return None
    return text
Here is how pdf_to_text works:
def pdf_to_text(pdf_file): Defines a function named pdf_to_text that takes pdf_file as a parameter. This parameter is expected to be the path to a PDF file.
text = "": Initializes text as an empty string. This variable accumulates the text extracted from the PDF.
fitz.open(pdf_file): Opens the PDF file specified by pdf_file using PyMuPDF (imported as fitz).
for page_num in range(len(doc)): Iterates through each page in the PDF.
page = doc.load_page(page_num): Loads the current page from the PDF.
text += page.get_text(): Retrieves the text content of the current page and appends it to the text variable.
doc.close(): Closes the PDF document after all pages have been processed.
except Exception as e: Handles any exception raised while reading the PDF. The function prints an error message describing the issue and returns None to indicate that it could not extract text.
return text: Returns the text extracted from all pages of the PDF.
In summary:
The pdf_to_text function reads a PDF file (pdf_file) and extracts its text content.
It uses PyMuPDF (fitz) to open the PDF and iterate through each page to get the text.
If anything goes wrong while reading the file, it prints an error and returns None.
We also need a function to convert text to MP3. We will use Google Text-to-Speech (gTTS) for this:
def text_to_mp3_chunk(chunk, chunk_index, output_directory):
    try:
        tts = gTTS(text=chunk, lang='en')
        temp_chunk_path = os.path.join(output_directory, f'chunk_{chunk_index}.mp3')
        tts.save(temp_chunk_path)
        print(f"Saved chunk {chunk_index} to {temp_chunk_path}")
    except Exception as e:
        print(f"Error converting chunk {chunk_index} to MP3: {e}")
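Because gTTS sends each chunk to an online service, a conversion can fail for reasons that have nothing to do with your text, such as a dropped connection. One possible refinement, sketched here as an assumption and not as part of the original function, is to retry a failed call a few times. The helper name call_with_retries and its parameters are invented for this example, and flaky_convert is a stand-in so the sketch runs without network access.

```python
import time


def call_with_retries(func, *args, attempts=3, delay_seconds=0.0):
    """Call func(*args); on an exception, retry up to `attempts` times in total.

    attempts is expected to be at least 1.
    """
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            return func(*args)
        except Exception as e:
            last_error = e
            print(f"Attempt {attempt} failed: {e}")
            if attempt < attempts:
                time.sleep(delay_seconds)
    # Every attempt failed: re-raise so the caller knows the chunk is missing.
    raise last_error


if __name__ == "__main__":
    calls = {"count": 0}

    def flaky_convert(chunk):
        # Stand-in for a network call that fails twice, then succeeds.
        calls["count"] += 1
        if calls["count"] < 3:
            raise RuntimeError("temporary failure")
        return f"converted {len(chunk)} characters"

    print(call_with_retries(flaky_convert, "hello world"))
```

For a helper like this to be useful, the wrapped function has to raise on failure. text_to_mp3_chunk as written catches the exception and only prints it, so you would wrap the gTTS calls themselves or re-raise inside the except block.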
Here’s a simplified explanation of text_to_mp3_chunk, which uses Google Text-to-Speech (gTTS) to convert a chunk of text into an MP3 file. The function takes three parameters:
chunk: The text content to be converted into speech.
chunk_index: The index of the current chunk.
output_directory: The directory where the MP3 file will be saved.
Inside the try block:
gTTS(text=chunk, lang='en'): Creates a gTTS (Google Text-to-Speech) object with the specified chunk of text and language (‘en’ for English).
temp_chunk_path = os.path.join(output_directory, f'chunk_{chunk_index}.mp3'): Constructs the path where the MP3 file will be saved. The file name includes chunk_index to differentiate between chunks.
tts.save(temp_chunk_path): Saves the synthesized speech as an MP3 file at temp_chunk_path.
print(f"Saved chunk {chunk_index} to {temp_chunk_path}"): Prints a confirmation message with the chunk index and the path of the saved file.
except Exception as e: If an error occurs (for example, the request to the text-to-speech service fails), the function prints an error message describing the issue.
In simpler terms:
The text_to_mp3_chunk function converts a piece of text (chunk) into spoken audio using Google Text-to-Speech.
It saves the audio as an MP3 file in output_directory.
To handle large texts, we will split the text into chunks, convert each chunk to MP3, and then combine these chunks into a single MP3 file:
def combine_mp3_chunks(chunk_count, output_directory, final_mp3_path):
    try:
        with open(final_mp3_path, 'wb') as final_mp3_file:
            for i in range(chunk_count):
                chunk_path = os.path.join(output_directory, f'chunk_{i}.mp3')
                with open(chunk_path, 'rb') as chunk_file:
                    final_mp3_file.write(chunk_file.read())
                os.remove(chunk_path)  # Clean up chunk file
        print(f"MP3 saved to {final_mp3_path}")
    except Exception as e:
        print(f"Error combining MP3 chunks: {e}")
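To check that the chunks end up in the right order, you can exercise the same idea on dummy files. The sketch below is a variation written for this post, not the function above: the name combine_files is hypothetical, it uses shutil.copyfileobj so that a chunk is streamed instead of being read fully into memory, and the byte strings stand in for real MP3 data.

```python
import os
import shutil
import tempfile


def combine_files(chunk_count, directory, final_path):
    """Append chunk_0.mp3 ... chunk_{n-1}.mp3 to final_path, in index order."""
    with open(final_path, "wb") as final_file:
        for i in range(chunk_count):
            chunk_path = os.path.join(directory, f"chunk_{i}.mp3")
            with open(chunk_path, "rb") as chunk_file:
                shutil.copyfileobj(chunk_file, final_file)
            os.remove(chunk_path)  # Clean up once the chunk is copied


def demo():
    """Create three dummy chunks, combine them, and return the result."""
    with tempfile.TemporaryDirectory() as tmp:
        for i, data in enumerate([b"AAA", b"BBB", b"CCC"]):
            with open(os.path.join(tmp, f"chunk_{i}.mp3"), "wb") as f:
                f.write(data)
        final_path = os.path.join(tmp, "final.mp3")
        combine_files(3, tmp, final_path)
        with open(final_path, "rb") as f:
            combined = f.read()
        return combined, sorted(os.listdir(tmp))


if __name__ == "__main__":
    print(demo())
```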
The combine_mp3_chunks function takes three parameters:
chunk_count: The total number of MP3 chunks to combine.
output_directory: The directory where the MP3 chunks are stored.
final_mp3_path: The path where the final combined MP3 file will be saved.
Inside the try block:
with open(final_mp3_path, 'wb') as final_mp3_file: Opens final_mp3_path in binary write mode ('wb'), creating a new file for the final combined MP3.
for i in range(chunk_count): Iterates through each chunk index from 0 to chunk_count - 1.
chunk_path = os.path.join(output_directory, f'chunk_{i}.mp3'): Constructs the path to each MP3 chunk file.
with open(chunk_path, 'rb') as chunk_file: Opens chunk_path in binary read mode ('rb') to read its content.
final_mp3_file.write(chunk_file.read()): Writes the content of the current chunk file (chunk_file) to the final MP3 file (final_mp3_file).
os.remove(chunk_path): Deletes the current chunk file (chunk_path) after its content has been written to the final MP3 file, to free disk space.
except Exception as e: If an error occurs (for example, a chunk file is missing), the function prints an error message describing the issue.
Let’s summarize this:
The combine_mp3_chunks function takes multiple MP3 chunks stored in output_directory, combines them into a single MP3 file (final_mp3_path), and deletes the individual chunk files afterward.
Now, let’s set up the routes for our Flask application:
The home route (/) will display the form to upload the PDF file.
The /status route will return the current status of the conversion.
The /download/<filename> route will handle the download of the MP3 file.
@app.route('/', methods=['GET', 'POST'])
def index():
    if request.method == 'POST':
        pdf_file = request.files['pdf_file']
        if pdf_file.filename == '':
            return "No file chosen"
        mp3_output_directory = './uploads'
        if not os.path.exists(mp3_output_directory):
            os.makedirs(mp3_output_directory)
        if pdf_file and pdf_file.filename.endswith('.pdf'):
            try:
                # Save the uploaded PDF file
                pdf_path = os.path.join(mp3_output_directory, pdf_file.filename)
                pdf_file.save(pdf_path)
                print(f"PDF saved to {pdf_path}")
                # Convert PDF to text
                text = pdf_to_text(pdf_path)
                if text:
                    print(f"Extracted text: {text[:500]}...")  # Print first 500 characters for verification
                    # Convert text to MP3
                    mp3_filename = f"{os.path.splitext(os.path.basename(pdf_path))[0]}.mp3"
                    mp3_path = os.path.join(mp3_output_directory, mp3_filename)
                    print(f"Converting text to MP3 and saving to {mp3_path}...")
                    # Splitting the text into chunks
                    text_chunks = [text[i:i + 1000] for i in range(0, len(text), 1000)]
                    update_status(0, len(text_chunks), 'processing')
                    # Process chunks in parallel using ThreadPoolExecutor
                    with ThreadPoolExecutor() as executor:
                        futures = [executor.submit(text_to_mp3_chunk, chunk, i, mp3_output_directory) for i, chunk in enumerate(text_chunks)]
                        for i, future in enumerate(futures):
                            future.result()  # Ensure each thread completes
                            update_status(i + 1, len(text_chunks), 'processing')
                    # Combine all chunks into a single MP3 file
                    combine_mp3_chunks(len(text_chunks), mp3_output_directory, mp3_path)
                    update_status(len(text_chunks), len(text_chunks), 'done')
                    return jsonify({"status": "done", "download_url": f"/download/{mp3_filename}"})
                else:
                    print("Error converting PDF to text.")
                    return "Error converting PDF to text. Please try again."
            except Exception as e:
                print(f"Error processing PDF: {e}")
                return "Error processing PDF. Please try again."
    return render_template('index.html')

@app.route('/status', methods=['GET'])
def get_status():
    with status_lock:
        return jsonify(status)

@app.route('/download/<filename>')
def download(filename):
    mp3_file_path = os.path.join('./uploads', filename)
    try:
        print(f"Attempting to send file: {mp3_file_path}")
        return send_file(mp3_file_path, as_attachment=True)
    except Exception as e:
        print(f"Error sending file: {e}")
        return "Error sending file. Please try again."

if __name__ == '__main__':
    app.run(debug=True)
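The route above cuts the text every 1000 characters, which can split a word in the middle at a chunk boundary. A possible alternative, offered here as a sketch, is to break on whitespace with the standard textwrap module. The function name split_text is made up for this example, and note that textwrap.wrap also turns line breaks into spaces.

```python
import textwrap


def split_text(text, max_chars=1000):
    """Split text into chunks of at most max_chars, breaking at whitespace.

    Words longer than max_chars are still broken, which is textwrap's default.
    """
    return textwrap.wrap(text, width=max_chars)


if __name__ == "__main__":
    sample = "The quick brown fox jumps over the lazy dog. " * 50
    chunks = split_text(sample, max_chars=200)
    print(len(chunks), "chunks; longest is", max(len(c) for c in chunks))
```

In the route, a call to this function would take the place of the list comprehension that builds text_chunks.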
Home route (/)
This route handles both GET and POST requests.
On a POST request:
It reads the uploaded file (pdf_file).
It saves the PDF to the ./uploads directory.
It extracts the text using pdf_to_text.
It splits the text into chunks and converts each chunk to MP3 using text_to_mp3_chunk.
It merges the chunk files into a single MP3 using combine_mp3_chunks.
It reports progress along the way using update_status.
On a GET request, or when the uploaded file is not a PDF, it renders the index.html template for file upload.
Status route (/status)
Handles GET requests to /status.
Uses status_lock to ensure thread-safe access to the status dictionary.
Returns the current conversion status (the status dictionary) as JSON.
Download route (/download/<filename>)
Handles requests to /download/<filename>.
Builds the file path (mp3_file_path) from filename and the ./uploads directory.
Sends the file (mp3_file_path) as an attachment for download using send_file.
Main block (__name__ == '__main__') and debug mode
Ensures the Flask application (app) runs only when this script is executed directly.
debug=True enables debug mode in Flask, providing detailed error messages and auto-reloading the server when code changes.
In simpler terms:
The application defines three routes (/, /status, /download/<filename>) for uploading a PDF, checking conversion status, and downloading the converted MP3.
Next, we need to create the HTML template for our web interface.
templates/index.html
Create a directory named templates in your project directory, and inside it, create a file named index.html. This file will contain the HTML for our web interface:
<!DOCTYPE html>
<html lang="en">
<head>
    <meta charset="UTF-8">
    <title>PDF to MP3 Converter</title>
    <link rel="stylesheet" href="/static/style.css">
    <script src="https://code.jquery.com/jquery-3.6.0.min.js"></script>
</head>
<body>
    <h1>PDF to MP3 Converter</h1>
    <form id="uploadForm" enctype="multipart/form-data">
        <input type="file" name="pdf_file" id="pdf_file" accept="application/pdf">
        <button type="submit">Upload and Convert</button>
    </form>
    <div id="status" style="display:none;">
        <p id="progress">Converting...</p>
        <progress id="progressBar" value="0" max="100"></progress>
    </div>
    <div id="downloadLink" style="display:none;">
        <a id="downloadAnchor" href="" target="_blank">Download Audio</a>
    </div>
    <script>
        $(document).ready(function(){
            $('#uploadForm').submit(function(event){
                event.preventDefault();
                var formData = new FormData(this);
                $('#status').show();
                $('#progress').text('Uploading PDF...');
                $('#progressBar').val(0);
                $.ajax({
                    url: '/',
                    type: 'POST',
                    data: formData,
                    processData: false,
                    contentType: false,
                    success: function(data){
                        if(data.status === 'done'){
                            $('#progress').text('Conversion complete!');
                            $('#progressBar').val(100);
                            $('#downloadAnchor').attr('href', data.download_url);
                            $('#downloadLink').show();
                        } else {
                            $('#progress').text('Converting PDF to MP3...');
                            checkStatus();
                        }
                    },
                    error: function(){
                        alert('Error uploading file.');
                    }
                });
            });
            function checkStatus() {
                $.ajax({
                    url: '/status',
                    type: 'GET',
                    success: function(data){
                        if(data.status === 'processing'){
                            $('#progress').text('Processing chunk ' + data.current_chunk + ' of ' + data.total_chunks);
                            $('#progressBar').val((data.current_chunk / data.total_chunks) * 100);
                            setTimeout(checkStatus, 1000);
                        } else if(data.status === 'done'){
                            $('#progress').text('Conversion complete!');
                            $('#progressBar').val(100);
                            $('#downloadAnchor').attr('href', data.download_url);
                            $('#downloadLink').show();
                        }
                    },
                    error: function(){
                        alert('Error checking status.');
                    }
                });
            }
        });
    </script>
</body>
</html>
Create a directory named static in your project directory, and inside it, create a file named style.css. This file will contain our CSS styles:
static/style.css
body {
    font-family: Arial, sans-serif;
    text-align: center;
    margin-top: 50px;
}
form {
    margin-bottom: 20px;
}
#status {
    margin-top: 20px;
}
#downloadLink {
    margin-top: 20px;
}
Now that we have all the necessary files, let’s run our Flask application. In your terminal, navigate to your project directory and run:
python app.py
Open your web browser and go to http://127.0.0.1:5000/. You should see the web interface for uploading a PDF file.
Congratulations! You have successfully created a PDF-to-Audio converter using Python. This project demonstrates how to integrate various Python libraries to build a useful tool. By combining Flask for the web interface, PyMuPDF for reading PDFs, and gTTS for text-to-speech conversion, we built a simple browser-based workflow for converting PDF files to MP3.
Feel free to expand this project by adding features like language selection for the text-to-speech conversion or supporting more file formats.
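Another extension worth considering before sharing the app is to sanitize uploaded file names, since the route uses pdf_file.filename directly when building a path. Werkzeug, which is installed with Flask, provides secure_filename in werkzeug.utils for this purpose. The sketch below shows the general idea using only the standard library; safe_pdf_name is a hypothetical helper and its rules are an assumption, not a complete defense.

```python
import os
import re


def safe_pdf_name(filename, default="upload.pdf"):
    """Return a conservative version of filename that stays in one directory."""
    # Drop any directory components, including Windows-style separators.
    name = os.path.basename(filename.replace("\\", "/"))
    # Keep letters, digits, dots, dashes, and underscores; replace the rest.
    name = re.sub(r"[^A-Za-z0-9._-]", "_", name).lstrip(".")
    # Fall back to a fixed name when the result is not a .pdf file name.
    if not name.lower().endswith(".pdf"):
        return default
    return name


if __name__ == "__main__":
    for candidate in ["report.pdf", "../../etc/passwd", "my file (1).pdf"]:
        print(candidate, "->", safe_pdf_name(candidate))
```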
The complete index.html and style.css files are shown above. Here is the full app.py file:
from flask import Flask, render_template, request, send_file, jsonify
import os
import fitz  # PyMuPDF
from gtts import gTTS  # Google Text-to-Speech
from concurrent.futures import ThreadPoolExecutor
import threading

app = Flask(__name__)
app.secret_key = "your-secret-key"  # Replace this with your own secret key

status = {
    "current_chunk": 0,
    "total_chunks": 0,
    "status": "idle"
}

# Lock for thread-safe status updates
status_lock = threading.Lock()

# Function to convert PDF to text using PyMuPDF (fitz)
def pdf_to_text(pdf_file):
    text = ""
    try:
        doc = fitz.open(pdf_file)
        for page_num in range(len(doc)):
            page = doc.load_page(page_num)
            text += page.get_text()
        doc.close()
    except Exception as e:
        print(f"Error reading PDF: {e}")
        return None
    return text

# Function to convert a chunk of text to MP3 using Google Text-to-Speech (gTTS)
def text_to_mp3_chunk(chunk, chunk_index, output_directory):
    try:
        tts = gTTS(text=chunk, lang='en')
        temp_chunk_path = os.path.join(output_directory, f'chunk_{chunk_index}.mp3')
        tts.save(temp_chunk_path)
        print(f"Saved chunk {chunk_index} to {temp_chunk_path}")
    except Exception as e:
        print(f"Error converting chunk {chunk_index} to MP3: {e}")

# Function to combine multiple MP3 chunks into a single MP3 file
def combine_mp3_chunks(chunk_count, output_directory, final_mp3_path):
    try:
        with open(final_mp3_path, 'wb') as final_mp3_file:
            for i in range(chunk_count):
                chunk_path = os.path.join(output_directory, f'chunk_{i}.mp3')
                with open(chunk_path, 'rb') as chunk_file:
                    final_mp3_file.write(chunk_file.read())
                os.remove(chunk_path)  # Clean up chunk file
        print(f"MP3 saved to {final_mp3_path}")
    except Exception as e:
        print(f"Error combining MP3 chunks: {e}")

def update_status(current, total, state):
    with status_lock:
        status["current_chunk"] = current
        status["total_chunks"] = total
        status["status"] = state

# Route for the home page
@app.route('/', methods=['GET', 'POST'])
def index():
    if request.method == 'POST':
        pdf_file = request.files['pdf_file']
        if pdf_file.filename == '':
            return "No file chosen"
        mp3_output_directory = './uploads'
        if not os.path.exists(mp3_output_directory):
            os.makedirs(mp3_output_directory)
        if pdf_file and pdf_file.filename.endswith('.pdf'):
            try:
                # Save the uploaded PDF file
                pdf_path = os.path.join(mp3_output_directory, pdf_file.filename)
                pdf_file.save(pdf_path)
                print(f"PDF saved to {pdf_path}")
                # Convert PDF to text
                text = pdf_to_text(pdf_path)
                if text:
                    print(f"Extracted text: {text[:500]}...")  # Print first 500 characters for verification
                    # Convert text to MP3
                    mp3_filename = f"{os.path.splitext(os.path.basename(pdf_path))[0]}.mp3"
                    mp3_path = os.path.join(mp3_output_directory, mp3_filename)
                    print(f"Converting text to MP3 and saving to {mp3_path}...")
                    # Splitting the text into chunks
                    text_chunks = [text[i:i + 1000] for i in range(0, len(text), 1000)]
                    update_status(0, len(text_chunks), 'processing')
                    # Process chunks in parallel using ThreadPoolExecutor
                    with ThreadPoolExecutor() as executor:
                        futures = [executor.submit(text_to_mp3_chunk, chunk, i, mp3_output_directory) for i, chunk in enumerate(text_chunks)]
                        for i, future in enumerate(futures):
                            future.result()  # Ensure each thread completes
                            update_status(i + 1, len(text_chunks), 'processing')
                    # Combine all chunks into a single MP3 file
                    combine_mp3_chunks(len(text_chunks), mp3_output_directory, mp3_path)
                    update_status(len(text_chunks), len(text_chunks), 'done')
                    return jsonify({"status": "done", "download_url": f"/download/{mp3_filename}"})
                else:
                    print("Error converting PDF to text.")
                    return "Error converting PDF to text. Please try again."
            except Exception as e:
                print(f"Error processing PDF: {e}")
                return "Error processing PDF. Please try again."
    return render_template('index.html')

# Route to get the current status
@app.route('/status', methods=['GET'])
def get_status():
    with status_lock:
        return jsonify(status)

# Route to download the generated MP3 file
@app.route('/download/<filename>')
def download(filename):
    mp3_file_path = os.path.join('./uploads', filename)
    try:
        print(f"Attempting to send file: {mp3_file_path}")
        return send_file(mp3_file_path, as_attachment=True)
    except Exception as e:
        print(f"Error sending file: {e}")
        return "Error sending file. Please try again."

if __name__ == '__main__':
    app.run(debug=True)