This table lists the analytic components currently in development. Its columns describe each component's capabilities, including the media types analyzed, supported actions, and whether the analysis is semantic or non-semantic. This table was last updated on March 19, 2024.
Source Attribution of Online News Images by Compression Analysis
Online news websites compress the digital images in their articles. We find that different sites often compress their images in distinctive ways, which enables source attribution through analysis of image compression settings.
This analytic attributes a news article to one of 30 news sources based on image compression statistics computed from a collected dataset of those sources' articles. The analytic assumes that the article comes from one of those 30 sources.
Supported media types:
News Articles with Images
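As an illustration of the kind of compression statistic such an analytic might use, the sketch below extracts the JPEG quantization tables embedded in an image file; these tables reflect the encoder settings a site applied and can serve as a simple compression fingerprint. The function name and feature layout here are illustrative assumptions, not the analytic's actual implementation.

```python
from PIL import Image
import numpy as np

def compression_features(image_source):
    """Extract a simple compression fingerprint from a JPEG image.

    `image_source` may be a file path or a file-like object. The
    quantization tables stored in the JPEG header encode the encoder's
    quality settings, so they vary with how a site compresses images.
    """
    img = Image.open(image_source)
    if img.format != "JPEG":
        raise ValueError("compression fingerprint requires a JPEG image")
    # img.quantization maps table id -> 64 quantization coefficients
    tables = [np.array(t, dtype=np.float32)
              for _, t in sorted(img.quantization.items())]
    return np.concatenate(tables)
```

Feature vectors like this could then feed a standard 30-way classifier over the collected dataset.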
GAN-Generated Image Detection
This analytic distinguishes computer-generated images, such as GAN images, from photos captured by cameras. The input to the network is a residual image, obtained by subtracting a median-filtered version of the image from the original pixels.
Supported media types:
Image
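A minimal sketch of the residual computation described above, assuming the common definition residual = image minus its median-filtered copy; the 3x3 kernel and the pure-NumPy filter are illustrative choices, not the analytic's actual pipeline:

```python
import numpy as np

def median_filter3(x):
    """3x3 median filter with edge padding, in pure NumPy."""
    p = np.pad(x, 1, mode="edge")
    h, w = x.shape
    # Stack the nine shifted views of the padded image, take per-pixel median.
    windows = np.stack([p[i:i + h, j:j + w] for i in range(3) for j in range(3)])
    return np.median(windows, axis=0)

def residual_image(pixels):
    """Subtract a median-filtered copy to suppress scene content,
    keeping the high-frequency noise where GAN fingerprints tend to live."""
    pixels = np.asarray(pixels, dtype=np.float32)
    return pixels - median_filter3(pixels)
```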
Detection of DeepFake Videos By Detecting Face Warping Artifacts
This project detects GAN-generated/manipulated images for eval 4 of SemaFor. Single image frames extracted from videos in the following training dataset were used for training: https://openaccess.thecvf.com/content_CVPR_2020/papers/Li_Celeb-DF_A_Large-Scale_Challenging_Dataset_for_DeepFake_Forensics_CVPR_2020_paper.pdf
The detection architecture was originally designed under the MediFor program to detect DeepFake videos using all frames of a video: https://github.com/yuezunli/DSP-FWA
Supported media types:
Video
CNN Detmatch Generated Image Detection
This analytic classifies CNN-generated fake images, such as GAN images, and recognizes the type of generator used to create them. The component was trained to detect ProGAN images, after applying augmentation, using the approach of Sheng-Yu Wang et al. (CVPR 2020).
Supported media types:
Image
Contrastive Domain Adaptation for AI-generated Text Detection
This analytic detects text generated by various generators without requiring labeled training data from the target generator. It also supports re-training on new generators if needed.
The following text generation architectures are supported by default: CTRL, FAIR_wmt19, GPT2_xl, GPT-3, GROVER_mega, XLM, and GPT-3.5.
Supported media types:
Text
Stylometric Detection of Machine Generated Text in Twitter Timelines
Tweets are inherently short, making it difficult for current state-of-the-art pre-trained language-model-based detectors to accurately detect at what point an AI starts to generate tweets in a given Twitter timeline. This component implements a novel algorithm that uses stylometric signals to aid detection of AI-generated tweets. It provides models that quantify stylistic changes in human and AI tweets for two related tasks: Task 1, discriminating between human- and AI-generated tweets, and Task 2, detecting if and when an AI starts to generate tweets in a given Twitter timeline. Extensive experiments demonstrate that the stylometric features are effective in augmenting state-of-the-art AI-generated text detectors.
Supported media types:
Text
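The change-point idea behind Task 2 can be illustrated with a toy sketch: score each tweet in timeline order with a single stylometric feature and pick the split that maximizes the gap between the mean before and after it. The single-feature signal and the exhaustive split search are illustrative simplifications, not the component's actual method:

```python
import numpy as np

def change_point(features):
    """Locate the index where a per-tweet stylometric signal shifts most.

    `features` is one stylometric score per tweet in timeline order
    (a toy stand-in for a full stylometric feature set). Returns the
    split point k that maximizes |mean(features[:k]) - mean(features[k:])|.
    """
    feats = np.asarray(features, dtype=np.float32)
    best_k, best_gap = 1, -1.0
    for k in range(1, len(feats)):
        gap = abs(feats[:k].mean() - feats[k:].mean())
        if gap > best_gap:
            best_k, best_gap = k, gap
    return best_k
```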
J-Guard: Journalism Guided Adversarially Robust Detection of AI-generated News
J-Guard steers existing supervised AI text detectors toward detecting AI-generated news while boosting adversarial robustness. By incorporating stylistic cues inspired by unique journalistic attributes, J-Guard effectively distinguishes real-world journalism from AI-generated news articles. Experiments on news articles generated by a wide array of AI models, including ChatGPT (GPT-3.5), demonstrate that J-Guard enhances detection capability while limiting the average performance decrease under adversarial attacks to as low as 7%.
Supported media types:
Text
Synthetic face GAN image detector
This is a GAN image detector for faces, focused on StyleGAN2 and its variants. The analytic always provides evidence, regardless of score. The evidence is a visual showing the provided probe (aligned to StyleGAN2 requirements) alongside the closest image to that probe in StyleGAN2 latent space. Close visual similarity suggests the image is GAN-generated, while a clear difference suggests the image is real, since it is hard to find a good fit in the latent space for a real image.
Supported media types:
Image
Synthetic GAN image attribution
This is a GAN image detector trained on a set of early generators, using datasets that preceded StyleGAN3 and diffusion-based generators. It provides no evidence.
Supported media types:
Image
Splicing image detector
This is a splicing image detector. Each image pixel is assigned a likelihood of being inauthentic, and the output is displayed as a greyscale heatmap. Initially developed for MediFor, it is general-purpose and not focused on faces. It learns inconsistencies in the optical properties of background and spliced regions. It was trained on different datasets and uses the latest weights, from December 2020. The output is a heatmap image with values 0-255, where higher values suggest tampered pixels.
Supported media types:
Image
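A minimal sketch of the output convention described above, mapping per-pixel likelihoods to a 0-255 greyscale heatmap; the function name and the assumed [0, 1] input range are illustrative, not the detector's actual interface:

```python
import numpy as np

def likelihood_to_heatmap(likelihood):
    """Map per-pixel splice likelihoods in [0, 1] to a 0-255 greyscale heatmap.

    255 marks the pixels most likely to be tampered, matching the
    convention described above.
    """
    lk = np.clip(np.asarray(likelihood, dtype=np.float32), 0.0, 1.0)
    return np.round(lk * 255).astype(np.uint8)
```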
Synthetic Audio Attribution for MTVC using Spectrogram Transformer
This component reads audio files, formats them as spectrograms, and then attributes them as generated by MTVC or not, with a trained Patchout faSt Spectrogram Transformer (PaSST).
Supported media types:
Audio
Synthetic Audio Attribution for RTVC using Spectrogram Transformer
This component reads audio files, formats them as spectrograms, and then attributes them as generated by RTVC or not, with a trained Patchout faSt Spectrogram Transformer (PaSST).
Supported media types:
Audio
Synthetic Audio Detection using Spectrogram Transformer Without Windowing
This component reads audio files, formats them as spectrograms, and then detects whether they are synthesized or authentic audio with a trained Patchout faSt Spectrogram Transformer (PaSST).
Supported media types:
Audio
Source code:
To be published soon
Synthetic Audio Detection using Spectrogram Transformer With 18 sec Windowing
This component reads audio files, formats them as spectrograms, and then detects whether they are synthesized or authentic audio with a trained Patchout faSt Spectrogram Transformer (PaSST). This component implements a windowing approach with window length 18 seconds and takes maximum to fuse decision of all windows.
Supported media types:
Audio
Source code:
To be published soon
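The windowing-and-max-fusion step shared by the windowed variants can be sketched as below. Here `score_fn` stands in for the trained PaSST model, and the sample rate and toy scorer are illustrative assumptions, not the component's actual configuration:

```python
import numpy as np

def windowed_max_score(samples, score_fn, sr=16000, window_sec=18):
    """Score audio in fixed-length windows and fuse by taking the max.

    `samples` is a non-empty 1-D array of audio samples; `score_fn` maps
    a window of samples to a synthetic-audio score. Taking the max over
    windows flags the clip if any single window looks synthetic.
    """
    win = int(window_sec * sr)
    scores = [score_fn(samples[i:i + win])
              for i in range(0, len(samples), win)]
    return max(scores)
```

The 12- and 24-second variants differ only in `window_sec`.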
Synthetic Audio Detection using Spectrogram Transformer With 12 sec Windowing
This component reads audio files, formats them as spectrograms, and then detects whether they are synthesized or authentic audio with a trained Patchout faSt Spectrogram Transformer (PaSST). This component implements a windowing approach with window length 12 seconds and takes maximum to fuse decision of all windows.
Supported media types:
Audio
Source code:
To be published soon
Synthetic Audio Detection using Spectrogram Transformer With 24 sec Windowing
This component reads audio files, formats them as spectrograms, and then detects whether they are synthesized or authentic audio with a trained Patchout faSt Spectrogram Transformer (PaSST). This component implements a windowing approach with window length 24 seconds and takes maximum to fuse decision of all windows.
Supported media types:
Audio
Source code:
To be published soon
Analyzing the Political Biases of Large Language Models and Its Impact on Misinformation Detection
This component provides a quantitative framework to evaluate the political biases of language models and investigates their impact on the performance of misinformation detection and hate speech detection.
Supported media types:
Text
Knowledge Card: Empowering Large Language Models with Modular and Specialized Information Sources for Misinformation Characterization
This component implements the Knowledge Card framework, in which a large language model interacts with a pool of small, independently trained, modular, and specialized language models. Knowledge Card enhances the knowledge access of a static LLM and boosts performance on misinformation characterization.
Supported media types:
Text
Investigating the Zero-Shot Generalization of Machine-Generated Text Detectors
This component provides a framework to evaluate how well existing machine-generated text detectors generalize across different LLM text generators.
Supported media types:
Text