Extract Jira Tickets with Comment History - Python - Step-by-Step Guide

In this post, we'll explore a step-by-step guide on how to extract all Jira tickets with their comment history using Python. This can be a crucial task for analysis, reporting or backup.

Prerequisites

Jira Account

Ensure you have access to a Jira account with the necessary permissions to view and extract ticket data.

Python Environment

Install Python on your machine if it's not already installed. The code in this post was tested with Python 3.10.

Required Libraries

Install the necessary Python libraries using the following commands:

pip install jira
pip install pandas

1. Get API token

The first step is to get an API token with the right permissions. You can find more here:

In my case the steps were:

P.S. The token is shown only once, so copy it and store it somewhere safe in case you need it again.
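Since the token is only shown once, it helps to keep it out of the script itself. Below is a minimal sketch that reads the connection details from environment variables. The variable names JIRA_SERVER, JIRA_USERNAME and JIRA_API_TOKEN and the helper load_jira_credentials are my own choices for illustration, not something Jira or the jira library requires.

```python
import os

# Hypothetical variable names; pick whatever fits your setup.
REQUIRED_VARS = ("JIRA_SERVER", "JIRA_USERNAME", "JIRA_API_TOKEN")


def load_jira_credentials(env=os.environ):
    """Return (server, username, api_token) read from environment variables."""
    # Collect every variable that is unset or empty so the error names all of them
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return tuple(env[name] for name in REQUIRED_VARS)
```

The returned values can then be passed on when connecting, for example JIRA(server=server, basic_auth=(username, token)).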

2. Python Extract Ticket and Changelog

Below is the code that extracts the full status history of the Jira tickets whose comments match a keyword:

import pandas as pd
from jira import JIRA

# Connect with your site URL, account email and API token
jira = JIRA({"server": "https://xxx.atlassian.net/"}, basic_auth=("xxx@xxx.com", "xxx"))

# Find issues in project XXX whose comments contain the keyword, newest first
issues = jira.search_issues('comment ~ "keyword" and project = "XXX" ORDER BY created DESC', maxResults=None)

ls_status = []
for found in issues:
    issue_num = found.key
    # Fetch the issue again, this time with its changelog expanded
    issue = jira.issue(issue_num, expand="changelog,issuelinks")

    changelog = issue.changelog
    for history in changelog.histories:
        for item in history.items:
            # Keep only the changelog entries that record a status change
            if item.field == 'status':
                statuses = {'issue_num': issue_num, 'history.created': history.created,
                            'item.fromString': item.fromString, 'item.toString': item.toString}
                ls_status.append(statuses)

df = pd.DataFrame(ls_status)

2.1 Explain the code

Below is a detailed explanation of this code:

  • Libraries Imported:

    • pandas is used for data manipulation and analysis.
    • jira is used for interacting with the Jira API.
  • Jira Connection:

    • Use your Jira server URL and user credentials (email and API token).
  • Issue Search:

    • The JQL (Jira Query Language) query searches for issues that have a specific keyword in their comments, limits the search to a particular project, and orders the results by creation date in descending order.
    • Passing maxResults=None asks the library to retrieve all matching issues, fetching them in batches.
  • Changelog Processing:

    • The code iterates through the changelog histories of each issue.
    • For each history entry, it further iterates through the items in the history.
  • Status Change Detection:

    • If an item in the changelog corresponds to a change in the 'status' field, relevant information is extracted.
    • We extract information such as:
      • issue number (issue_num)
      • the timestamp of the status change (history.created)
      • the previous status (item.fromString)
      • and the new status (item.toString)
  • Data Storage:

    • The extracted status changes are collected in the ls_status list, which is then converted to the DataFrame df.
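To see what the changelog loop produces without connecting to Jira, here is a small sketch that runs the same filtering on stand-in objects. The helper status_changes and the fake_changelog data are hypothetical; they only mimic the attribute names used in the code above (histories, items, field, fromString, toString).

```python
from types import SimpleNamespace


def status_changes(issue_key, changelog):
    """Flatten a changelog into one dict per status transition."""
    rows = []
    for history in changelog.histories:
        for item in history.items:
            # Ignore changes to other fields such as assignee or priority
            if item.field == "status":
                rows.append({
                    "issue_num": issue_key,
                    "history.created": history.created,
                    "item.fromString": item.fromString,
                    "item.toString": item.toString,
                })
    return rows


# Stand-in changelog: one history entry holding a status change and an unrelated change
fake_changelog = SimpleNamespace(histories=[
    SimpleNamespace(
        created="2024-01-02T09:00:00.000+0000",
        items=[
            SimpleNamespace(field="status", fromString="To Do", toString="In Progress"),
            SimpleNamespace(field="assignee", fromString=None, toString="Alice"),
        ],
    ),
])

rows = status_changes("XXX-1", fake_changelog)
print(rows)
```

Only the status item survives the filter, so rows holds a single dict that can be passed to pd.DataFrame in the same way as ls_status.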

2.2 Data processing

In this section we will calculate how much time a ticket spends in a given status.

First we convert the Jira timestamps to pandas datetimes and derive the following columns:

  • date
  • week number
  • time
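
# Parse the Jira timestamps as timezone-aware UTC datetimes
df['history.created'] = pd.to_datetime(df['history.created'], utc=True)
# Timestamp of the next row of the same issue. This is the previous status change
# only if each issue's rows are ordered newest first (an assumption about the data).
df['history.created.prev'] = df.groupby('issue_num')['history.created'].shift(-1)

df['date'] = df['history.created'].dt.date
df['week'] = df['history.created'].dt.isocalendar().week
df['time'] = df['history.created'].dt.time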
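To make the three derived values concrete, here is a sketch for a single timestamp. It uses the standard library's datetime instead of pandas so it can be run on its own, and the raw value is made up in the layout Jira uses for its timestamps. Note that the ISO week number can belong to the following year near a year boundary.

```python
from datetime import datetime, timezone

raw = "2024-12-30T23:30:00.000+0200"  # made-up value, not real ticket data

# %z accepts UTC offsets written as +HHMM; convert to UTC afterwards
parsed = datetime.strptime(raw, "%Y-%m-%dT%H:%M:%S.%f%z").astimezone(timezone.utc)

date_part = parsed.date()             # calendar date in UTC
week_part = parsed.isocalendar()[1]   # ISO week number
time_part = parsed.time()             # time of day in UTC

print(date_part, week_part, time_part)  # 2024-12-30 1 21:30:00
```

30 December 2024 is a Monday and already counts as ISO week 1 of 2025, which is worth keeping in mind when grouping by week number alone.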

Next we will calculate how many hours a ticket stayed in a given status (here 'To Do'):

# Result columns, initialised with dtypes that match the values stored later
df['time_in_status'] = pd.Timedelta(0)
df['time_in_status_days'] = 0
df['time_in_status_hours'] = 0.0

n = 0  # counts how many times the current issue has left 'To Do'

for idx, row in df.iterrows():
    # Reset the counter when the rows move on to a different issue
    if idx > 0 and row['issue_num'] != df.loc[idx - 1, 'issue_num']:
        n = 0

    # Skip rows that have no earlier timestamp to compare against
    if row['item.fromString'] == 'To Do' and pd.notna(row['history.created.prev']):
        time_feed = row['history.created'] - row['history.created.prev']
        df.loc[idx, 'time_in_status_days'] = time_feed.days
        df.loc[idx, 'time_in_status_hours'] = time_feed.total_seconds() / 3600
        df.loc[idx, 'time_in_status'] = time_feed
        df.loc[idx, 'iteration'] = n
        n = n + 1
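The same calculation can also be written without an explicit loop. The sketch below is one possible vectorised variant on made-up data, assuming each issue's rows are ordered newest first. The sample DataFrame and the names prev, elapsed and left_todo are hypothetical.

```python
import pandas as pd

# Made-up status changes for two issues, newest change first within each issue
sample = pd.DataFrame({
    "issue_num": ["XXX-1", "XXX-1", "XXX-2"],
    "history.created": pd.to_datetime(
        ["2024-01-03T12:00:00+00:00", "2024-01-01T06:00:00+00:00", "2024-01-05T00:00:00+00:00"],
        utc=True,
    ),
    "item.fromString": ["To Do", "In Progress", "To Do"],
    "item.toString": ["In Progress", "To Do", "Done"],
})

# Timestamp of the earlier change of the same issue; NaT when there is none
prev = sample.groupby("issue_num")["history.created"].shift(-1)

# Time between two consecutive status changes, expressed in hours
elapsed = sample["history.created"] - prev
sample["time_in_status_hours"] = elapsed.dt.total_seconds() / 3600

# Keep only the rows where the ticket left 'To Do'
left_todo = sample[sample["item.fromString"] == "To Do"]
print(left_todo[["issue_num", "time_in_status_hours"]])
```

Here XXX-1 spent 54 hours in 'To Do', while XXX-2 has no earlier status change and gets NaN. For such rows the time since the ticket was created would have to come from the issue's created field, which this sketch does not cover.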

3. Extract All Ticket Comments in Jira

To extract ticket comments from Jira using Python, we can use the jira library:

from jira import JIRA

JIRA_SERVER = "xxx"
USERNAME = "xxx"
API_TOKEN = 'xxx'

jira = JIRA(server=JIRA_SERVER, basic_auth=(USERNAME, API_TOKEN))

issue_key = 'xxx'
issue = jira.issue(issue_key, expand='comments')

# Collect the author, timestamp and text of every comment on the issue
comments = []
for comment in issue.fields.comment.comments:
    comments.append({
        'author': comment.author.displayName,
        'created': comment.created,
        'body': comment.body,
    })

# Print the comments one by one, separated by a divider
for comment in comments:
    print(f"Author: {comment['author']}\nCreated: {comment['created']}\nComment: {comment['body']}\n---\n")

P.S. Replace the placeholder values for:

  • JIRA_SERVER
  • USERNAME
  • API_TOKEN
  • issue_key

The script above connects to Jira, retrieves the specified issue, and extracts the following for each comment:

  • the author's name
  • creation timestamp
  • comment body
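To go from one issue to many, the same fields can be collected into one row per comment. Below is a minimal sketch of that idea on stand-in objects. The helpers comment_rows, fake_comment and fake_issue are hypothetical and only mirror the attribute layout used in the script above.

```python
from types import SimpleNamespace


def comment_rows(issues):
    """Yield one dict per comment across all the given issues."""
    for issue in issues:
        for comment in issue.fields.comment.comments:
            yield {
                "issue_num": issue.key,
                "author": comment.author.displayName,
                "created": comment.created,
                "body": comment.body,
            }


def fake_comment(author, created, body):
    """Stand-in for a comment object (illustration only)."""
    return SimpleNamespace(author=SimpleNamespace(displayName=author), created=created, body=body)


def fake_issue(key, comments):
    """Stand-in for an issue object (illustration only)."""
    return SimpleNamespace(key=key, fields=SimpleNamespace(comment=SimpleNamespace(comments=comments)))


issues = [
    fake_issue("XXX-1", [fake_comment("Alice", "2024-01-02T09:00:00.000+0000", "Looks good")]),
    fake_issue("XXX-2", []),  # an issue without comments contributes no rows
]

rows = list(comment_rows(issues))
print(rows)
```

With real data, the issues would be fetched one at a time with jira.issue(...) as in the script above, and the resulting rows can be loaded with pd.DataFrame(rows) for further analysis.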

4. Resources

Below you can find additional resources. For example, you can learn how to use the requests library to access the Jira API endpoints:
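As a rough illustration of the direct REST approach, the sketch below prepares a GET request for one issue's comments with the requests library, without sending it. The placeholder values match the ones used earlier, the helper build_comment_request is hypothetical, and the version 2 REST path is an assumption that should be checked against your own Jira instance.

```python
import requests

# Placeholders: replace with your own values
JIRA_SERVER = "https://xxx.atlassian.net"
USERNAME = "xxx@xxx.com"
API_TOKEN = "xxx"


def build_comment_request(issue_key, start_at=0, max_results=50):
    """Prepare (but do not send) a GET request for one issue's comments."""
    url = f"{JIRA_SERVER}/rest/api/2/issue/{issue_key}/comment"
    request = requests.Request(
        "GET",
        url,
        params={"startAt": start_at, "maxResults": max_results},
        auth=(USERNAME, API_TOKEN),  # basic auth with email and API token
        headers={"Accept": "application/json"},
    )
    return request.prepare()


prepared = build_comment_request("XXX-1")
print(prepared.url)

# To actually send the request:
# with requests.Session() as session:
#     response = session.send(prepared, timeout=30)
#     response.raise_for_status()
#     comments = response.json()["comments"]
```

Preparing the request separately makes it easy to inspect the final URL and headers before anything goes over the network.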

Summary

In summary, this code fetches Jira issues based on a specified JQL query, then iterates through each issue's changelog to identify and record status changes.

The collected status change data is stored in a list (ls_status) and converted to a DataFrame (df) for further analysis or storage.

By following these steps, you can effortlessly extract Jira ticket data along with comment history using Python, facilitating more comprehensive insights and reporting for your project management needs.