File format identification
Why does file format identification matter?
Not all file formats are equally reliable or preservation-friendly. If you are planning to retain digital materials for the long term (more than 10 years), it is important to understand the file formats you’re working with, so you can consider the risks and plan ahead.
Unidentified or misidentified formats can’t be properly managed or preserved. Even if your digital materials are accessible now, changes in software or hardware used to access a file’s contents may result in it becoming inaccessible in future.
Accurately identifying file formats helps to:
- Predict whether files will remain accessible
- Flag potentially unsupported or obsolete formats
- Reduce the risk of data loss or inaccessibility over time
- Inform preservation planning and actions, such as migration and/or normalisation of files.
Overall, accurate file format identification underpins good long-term preservation, including preservation carried out by preservation systems.
Why file extensions aren’t enough
Unfortunately, looking at the file extension isn’t always a reliable way to identify the format you’re working with. File extensions can:
- Be wrong – files can be mislabelled due to human or computer error
- Provide limited information – some formats share the same extension but are structurally different, and need to be managed differently
- Be missing altogether.
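To illustrate why a file's contents are a better guide than its extension, here is a minimal Python sketch that checks the first few bytes of a file against a handful of widely documented leading byte signatures. The function names and the short signature list are assumptions made for illustration only. Real identification tools rely on far larger and more detailed signature sets, so treat this as a sketch of the idea rather than a substitute for them.

```python
from typing import Optional

# A small, illustrative subset of well-known leading byte signatures.
# This list is deliberately incomplete.
SIGNATURES = [
    (b"%PDF-", "PDF"),
    (b"\x89PNG\r\n\x1a\n", "PNG"),
    (b"\xff\xd8\xff", "JPEG"),
    (b"GIF87a", "GIF"),
    (b"GIF89a", "GIF"),
    # Many different formats are ZIP containers, so this match alone
    # cannot tell them apart.
    (b"PK\x03\x04", "ZIP container"),
]


def sniff_format(header: bytes) -> Optional[str]:
    """Return a format name if the header starts with a known signature."""
    for magic, name in SIGNATURES:
        if header.startswith(magic):
            return name
    return None


def identify_file(path: str) -> Optional[str]:
    """Read the first bytes of a file and look for a known signature."""
    with open(path, "rb") as handle:
        return sniff_format(handle.read(16))


# A PDF header is recognised regardless of what the file is called.
print(sniff_format(b"%PDF-1.4\n"))  # PDF
```

A file named `report.docx` whose contents begin with `%PDF-` would be reported as PDF here, which is exactly the kind of mismatch an extension check would miss.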
Tools to identify file formats
The National Archives (UK) provides the following free tools for identifying file formats.
| Tool name | Description |
|---|---|
| DROID | A software tool that uses internal file signatures to perform automated, batch identification of file formats. Uses the PRONOM registry. |
| PRONOM | A registry of technical information about different file formats. |
For a demonstration of DROID, refer to the Digital Preservation Coalition website.
If DROID can’t identify a file format, you can search the PRONOM registry manually, or contact the University’s Digital Preservation Program for further assistance.
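If you have exported identification results to a CSV file, a short script can help list the files that were left unidentified so you can follow them up. The sketch below is a hedged example: the column names `FILE_PATH`, `TYPE` and `PUID` are assumptions about the export layout and should be checked against your own file, and the sample rows are invented for illustration.

```python
import csv
import io


def unidentified_files(csv_lines, puid_col="PUID", type_col="TYPE",
                       path_col="FILE_PATH"):
    """Return paths of rows that have no format identifier recorded.

    The default column names are assumptions; pass different names if
    your export uses other headings.
    """
    missing = []
    for row in csv.DictReader(csv_lines):
        # Folders have no format of their own, so skip them.
        if (row.get(type_col) or "").strip().lower() == "folder":
            continue
        if not (row.get(puid_col) or "").strip():
            missing.append(row.get(path_col, ""))
    return missing


# Invented sample data standing in for a real export.
sample = io.StringIO(
    "FILE_PATH,TYPE,PUID\n"
    "/data/report.pdf,File,fmt/276\n"
    "/data,Folder,\n"
    "/data/mystery.bin,File,\n"
)
print(unidentified_files(sample))  # ['/data/mystery.bin']
```

To run this against a real export, open the file with `open(path, newline="", encoding="utf-8")` and pass the file object in place of `sample`.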
What can you do?
- Be aware of the file formats that you are working with.
- Run DROID to identify any file formats you are unsure of.
- When creating files that you intend to retain and use for the long term, choose sustainable file formats that are more likely to remain accessible over time.
- If you are working with existing files which pose preservation risks, consider migrating the files to sustainable formats.
- Seek advice from the Digital Preservation Program via email at digital-stewardship@unimelb.edu.au.