Hi, I want to import a CSV table with tweets from Twitter. But I get the error message:
Failed to invoke procedure apoc.load.csv: Caused by: java.lang.ArrayIndexOutOfBoundsException: Index 4 out of bounds for length 4
My code is:
CALL apoc.load.csv("conversations_until_2021_06_18.tsv", {
sep: "TAB",
arraySep: ",",
skip: 100000,
mapping: {
hashtags: {array: true},
mentions: {array: true},
ref_id: {array: true},
reply_count: {type: "int"},
retweet_count: {type: "int"},
quote_count: {type: "int"},
like_count: {type: "int"}
}
}
)
YIELD map AS tweet
CREATE (t:Tweet)
SET t = tweet
I'm using Neo4j v4.3.1, Desktop v1.4.5
An example:
comment_type conversatoin_id text author_id tweet_id ref_type ref_id in_reply_to_user_id created_at mentions url hashtags like_count quote_count reply_count retweet_count reply_settings
side 1234 @url https://t.co/... 345 5678 replied_to 564465 4566 2021-04-28T15:55:42.000Z ABaerbockArminLaschet https://twitter.com/... NaN 0 0 0 0 everyone
One of my files works fine, but the second produces this error. According to this question, there may be a problem with a line in the file. But how do I find that line? I have no idea where an array of length 4 could be.
There seem to be some "fake tabs" in your file.
This can be solved by replacing them.
You can do it, for example, with IntelliJ: open the file, then use Replace and substitute [ ]+ (that is, 1 or more spaces) with \t (selecting the Regex option).
Obviously, this could also replace spaces that you want to keep; in that case, you have to replace only the ones you need.
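If you would rather script the replacement than do it in an editor, here is a minimal Python sketch of the same regex idea. The function name spaces_to_tabs and the min_spaces threshold of 2 are my own assumptions, not something from your file: it only touches runs of two or more spaces, so single spaces inside the tweet text survive. Text that legitimately contains double spaces would still be split, so check the result before importing.

```python
import re

def spaces_to_tabs(line, min_spaces=2):
    """Replace each run of at least `min_spaces` spaces with a single tab."""
    # min_spaces=2 is an assumption: single spaces are treated as normal text.
    pattern = r" {%d,}" % min_spaces
    return re.sub(pattern, "\t", line)

# Hypothetical line: two spaces where a tab separator was expected.
broken = "side\t1234  @url text with spaces"
print(repr(spaces_to_tabs(broken)))
```

To apply it to a whole file, you could read it line by line and write the converted lines to a new file, keeping the original as a backup.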
Yes, the error message is unclear, but unfortunately it seems to depend on an external library (http://opencsv.sourceforge.net/)
Thanks for the input. What do you mean by "fake tabs"? I have some text with spaces, so I cannot simply replace them.
I deleted some quotation marks and brackets from the file. Could this cause the problem?
I don't think the missing quotation marks cause the problem.
By "fake tabs" I mean column separators that are not actually tab characters
but multiple whitespace characters instead.
In simple terms, if you copy and paste your example into VS Code, IntelliJ, or any other editor
and search for "\t" (using the "Regular Expression" option),
you will see that some columns are not tab separated. For example, between 1234 and @url there are 2 spaces instead of 1 tab (this causes the error).
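To find the offending line without searching by hand, one option is to count the tab-separated fields per line and compare them with the header. This is only a rough sketch: find_bad_lines is a hypothetical helper, and a plain split on tabs does not model quoting or escaping the way the CSV library does, so it shows where the column count is off rather than exactly what the parser sees.

```python
def find_bad_lines(lines, sep="\t"):
    """Return (line_number, field_count) for rows whose field count differs from the header."""
    bad = []
    expected = None
    for number, line in enumerate(lines, start=1):
        count = len(line.rstrip("\r\n").split(sep))
        if expected is None:
            expected = count  # the first line is assumed to be the header
        elif count != expected:
            bad.append((number, count))
    return bad

# Hypothetical three-column sample; line 3 uses two spaces instead of a tab.
sample = [
    "comment_type\tconversation_id\ttext\n",
    "side\t1234\t@url\n",
    "side\t1234  @url\n",
]
print(find_bad_lines(sample))  # [(3, 2)]
```

For a real file you could pass an open file object instead of the list, for example `with open("conversations_until_2021_06_18.tsv", encoding="utf-8", newline="") as f: print(find_bad_lines(f))`. The utf-8 encoding is an assumption about your file.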
I think I found my problem. Some of my strings ended with a backslash \. This escapes the following tab, so two columns get merged into one. Using \\ seems to solve this.
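In case it helps someone else, this is roughly how the file can be preprocessed. It is a minimal sketch that assumes every backslash in the file is meant literally (no intentional escape sequences), so it simply doubles all of them; escape_backslashes is just a name I picked.

```python
def escape_backslashes(line):
    """Double every backslash so that none of them can escape a following tab."""
    # Assumption: the file contains no intentional escape sequences.
    return line.replace("\\", "\\\\")

# Hypothetical row: the text column ends with a backslash right before the tab.
row = "side\tsome text ending with \\\t345\n"
print(repr(escape_backslashes(row)))
```

The converted lines can then be written to a new file, which is the one passed to apoc.load.csv.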